AlmaBetter Blogs > Hypothesis Testing

How good are your assumptions?

Considering the current pandemic scenario, we might want to know whether a certain new drug developed by a pharmaceutical company will be able to reduce the number of active cases in the world. Of course, before trying the drug on a mass scale, the government will want to judge its effectiveness from the results of a few thousand test cases. For example, when a new vaccine is tested on a random sample of 2500 previously uninfected people, it is found that these people are 70% less likely to contract COVID-19 than those who did not take the vaccine. Now the question is: is this significant enough? Could this be just the result of chance, or is the vaccine really effective? It is in cases like these that hypothesis testing has a very important role to play.

Statistical Inference

Statistical inference is the process of drawing conclusions about a population based on a sample from that population. This technique comes in handy when it is physically impossible to collect information about the entire population. We can ask numerous questions about the population and may be able to draw conclusions from a single randomly drawn sample.

Suppose we survey 50 higher education institutions in India and find that, out of 950 professors in those institutions, 640 are male and 310 are female. Based on this statistic, can we claim that the preference for female professors in Indian institutions is less than 30%? Does the data provide enough evidence to infer a gender bias?

Steps in Hypothesis Testing

1. Given a claim, identify the null hypothesis and the alternative hypothesis, and express them both in symbolic form.
2. Given a claim and sample data, calculate the value of the test statistic.
3. Given a significance level, identify the critical value(s).
4. Given a value of the test statistic, identify the P-value.
5. State the conclusion of the hypothesis test in simple, non-technical terms.

Null hypothesis: The null hypothesis (H0) is a statement that the value of a population parameter (such as a proportion, mean, or standard deviation) is equal to some claimed value. It is the default assumption that you make about any event, and it is often quite mundane: it does not signify any progress or change.

Alternative hypothesis: The alternative hypothesis (Ha, also written H1) is the statement that the parameter has a value that somehow differs from the value stated in the null hypothesis. This statement is often progressive in nature. The symbolic form of the alternative hypothesis must use one of these symbols: ≠, <, >. Of the two symbolic expressions obtained from a claim, let the alternative hypothesis be the one not containing equality, and let the null hypothesis be the expression stating that the parameter equals the fixed value being considered.

Two salient features:

Rare event rule: Under the assumption that the null hypothesis is true, if the probability of the observed event is extremely small, i.e. the event is unlikely to have occurred by chance alone, then the assumption is probably incorrect and we reject the null hypothesis in favor of the alternative hypothesis.

In any case, we either reject or fail to reject the null hypothesis. We never accept a null hypothesis.

After formulating both hypotheses, we conduct the hypothesis test under the assumption that the null hypothesis is true. Our goal is to collect enough evidence in favor of the alternative hypothesis to be able to reject the null hypothesis. If our results are not statistically significant, we fail to reject the null hypothesis. To be able to say whether the results are statistically significant, we need to understand the concepts of the test statistic and the P-value.
But before discussing these terms in detail, let's learn about the significance level, the critical region, and the types of test we run based on our hypotheses.

Significance Level

The significance level, denoted alpha or α, is a measure of the strength of the evidence that must be present in our sample before we will reject the null hypothesis and conclude that the effect is statistically significant. The researcher determines the significance level before conducting the experiment.

The significance level is the probability of rejecting the null hypothesis when it is true. For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference. Lower significance levels indicate that we require stronger evidence before we will reject the null hypothesis.

In other words, the significance level expresses how much risk of a false positive you are willing to accept in your conclusion. If you set a high alpha (say 0.25), you have a better shot at supporting your alternative hypothesis, since you don't need to find as big a difference between your test groups. However, you also have a bigger chance of being wrong about your conclusion.

Critical region: the region of values of the test statistic that corresponds to rejection of the null hypothesis at the chosen significance level α.

Critical value: the boundary of the critical region. A test statistic beyond this value is statistically significant, and the null hypothesis is rejected.

Types of Test

The type of test that we conduct depends on the alternative hypothesis (Ha).

If Ha : parameter < value, then we conduct a left-tailed test, where the critical region lies in the left tail of the distribution.

If Ha : parameter > value, then we conduct a right-tailed test, where the critical region lies in the right tail of the distribution.

If Ha : parameter ≠ value, then we conduct a two-tailed test, where the critical regions lie in both the left and right tails of the distribution.
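The link between the significance level, the type of test and the critical value(s) can be sketched in a few lines of Python. This is only an illustration for a z-statistic, using the standard library's `statistics.NormalDist`; the function names and the tail labels "left", "right" and "two" are assumptions made for this sketch, not part of any library.

```python
from statistics import NormalDist


def critical_values(alpha, tail):
    """Return the critical z-value(s) for significance level alpha.

    tail is "left", "right" or "two" (labels assumed for this sketch).
    """
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "left":
        # All of alpha sits in the left tail.
        return (std_normal.inv_cdf(alpha),)
    if tail == "right":
        # All of alpha sits in the right tail.
        return (std_normal.inv_cdf(1 - alpha),)
    if tail == "two":
        # alpha is split equally between the two tails.
        upper = std_normal.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right' or 'two'")


def in_critical_region(z, alpha, tail):
    """Return True if the z-statistic falls in the critical (rejection) region."""
    crit = critical_values(alpha, tail)
    if tail == "left":
        return z < crit[0]
    if tail == "right":
        return z > crit[0]
    return z < crit[0] or z > crit[1]
```

With α = 0.05, the right-tailed critical value comes out near 1.645 and the two-tailed critical values near ±1.96, which matches the values usually read from a z table.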

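Test Statistic

A test statistic is a value computed from the sample data by a formula that lets researchers determine how likely the sample outcome would be if the null hypothesis were true. The value of the test statistic is used to make a decision regarding the null hypothesis. The larger the absolute value of the test statistic, the greater the distance, measured in standard deviations of the sampling distribution, between the sample mean and the population mean stated in the null hypothesis.

Examples of test statistics that are used in hypothesis testing are:

Test statistic for proportions:

z = (p̂ - p) / √(p(1 - p)/n)

where p̂ is the sample proportion, p is the proportion stated in the null hypothesis, and n is the sample size.

Test statistic for mean:

z = (x̄ - μ) / (σ/√n)

where x̄ is the sample mean, μ is the population mean stated in the null hypothesis, σ is the population standard deviation, and n is the sample size.

If the standard deviation σ of the population is not known, but the sample standard deviation s is, we use a t-statistic with (n-1) degrees of freedom instead of the z-statistic:

t = (x̄ - μ) / (s/√n)

When we want to test how far the sample variance (S²) deviates from the population variance (σ²), we use the χ² statistic with (n-1) degrees of freedom:

χ² = (n-1)S² / σ²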
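The four test statistics named above can be written as small Python functions. This is a minimal sketch using the standard textbook formulas; the function and argument names are chosen here for illustration and are not taken from the original post.

```python
from math import sqrt


def z_stat_proportion(p_hat, p0, n):
    """z-statistic for a sample proportion p_hat against the H0 proportion p0."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def z_stat_mean(x_bar, mu0, sigma, n):
    """z-statistic for a sample mean when the population sigma is known."""
    return (x_bar - mu0) / (sigma / sqrt(n))


def t_stat_mean(x_bar, mu0, s, n):
    """t-statistic and its degrees of freedom when only the sample s is known."""
    return (x_bar - mu0) / (s / sqrt(n)), n - 1


def chi2_stat_variance(sample_var, sigma0_sq, n):
    """Chi-square statistic and its degrees of freedom for a variance test."""
    return (n - 1) * sample_var / sigma0_sq, n - 1


# Professor survey from the introduction: 310 female professors out of 950,
# compared with the claimed value of 30%.
z_professors = z_stat_proportion(310 / 950, 0.30, 950)
```

For the professor survey the statistic is positive, because the sample proportion (about 32.6%) lies above the claimed 30%.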

P-value Approach

After calculating the test statistic for a given sample, we need to calculate its P-value. The P-value is the probability, assuming the null hypothesis is true, of observing a sample statistic at least as extreme as the one actually observed. It is given by the area under the tail(s) of the distribution, with limits defined by the type of test we conduct: the area to the left of the statistic for a left-tailed test, the area to the right for a right-tailed test, and twice the area in one tail for a two-tailed test. Let's take the example of the z-statistic, which is calculated assuming that the sample statistic follows a normal distribution; for large samples this holds approximately by the Central Limit Theorem.
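As a rough sketch of how the tail areas translate into a P-value for a z-statistic, the standard library's `statistics.NormalDist` can replace the z table. The function name and the tail labels are assumptions made for this example.

```python
from statistics import NormalDist


def p_value_from_z(z, tail):
    """P-value of a z-statistic for a "left", "right" or "two"-tailed test."""
    cdf = NormalDist().cdf  # cumulative area to the left under the standard normal
    if tail == "left":
        return cdf(z)
    if tail == "right":
        return 1 - cdf(z)
    if tail == "two":
        # Twice the area beyond |z| in one tail.
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right' or 'two'")
```

For instance, `p_value_from_z(1.96, "two")` is very close to 0.05, which is why ±1.96 serves as the two-tailed critical value at α = 0.05.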

Now, we use the significance level during hypothesis testing to help us determine which hypothesis the data support. We compare our P-value to our significance level. If the P-value is less than the significance level, we can reject the null hypothesis and conclude that the effect is statistically significant. In other words, the evidence in our sample is strong enough to reject the null hypothesis at the population level.

Example 1:

A principal at a certain school claims that the students in his school are of above-average intelligence. The IQ scores of a random sample of thirty students have a mean of 112.5. Is there sufficient evidence to support the principal's claim? The mean population IQ is 100 with a standard deviation of 15.

Step 1: State the null hypothesis. The accepted fact is that the population mean is 100, so: H0: μ = 100.

Step 2: State the alternative hypothesis. The claim is that the students have above-average IQ scores, so: H1: μ > 100. The fact that we are looking for scores "greater than" a certain point means that this is a right-tailed test.

Step 3: State the α level. If you aren't given an α level, use 5% (0.05).

Step 4: Find the test statistic using this formula:

z = (x̄ - μ) / (σ/√n)

For this set of data: z = (112.5 - 100) / (15/√30) = 4.56.

Step 5: Since this is a right-tailed test, calculate the P-value for the z-statistic as the area to its right. The P-value is less than 0.00001 (approximately 0.000003).

Step 6: Compare the P-value to the significance level α. If p < α, the results are significant and the null hypothesis is rejected. Otherwise, we fail to reject the null hypothesis. Since p < α = 0.05, we reject the null hypothesis in favor of the alternative hypothesis and conclude that the data support the principal's claim at the 5% significance level.

Example 2:

The average height of students in a batch is 100 cm and the standard deviation is 15 cm. However, the teacher believes that this has changed, so he decides to measure the height of 75 randomly chosen students in the batch. The average height of the sample comes out to be 105 cm. Is there enough evidence to suggest that the average height has changed?

Step 1: State the null hypothesis. In our case it is that the average height of students in the batch is 100: H0 : μ = 100.

Step 2: State the alternative hypothesis. The teacher claims that the actual value has changed. He doesn't know whether the average has gone up or down, but he believes that it is not 100 anymore: Ha : μ ≠ 100. Because the alternative hypothesis is written with a ≠ sign, the true mean could be either more than 100 or less than 100, so we perform a two-tailed test.

Step 3: State the α level. If you aren't given an α level, use 5% (0.05).

Step 4: Find the test statistic using this formula:

z = (x̄ - μ) / (σ/√n)

For this set of data: z = (105 - 100) / (15/√75) = 2.89.

Step 5: Since this is a two-tailed test, calculate the P-value for the z-statistic as twice the area in one tail. Using the z table, P-value = 0.003852.

Step 6: Compare the P-value to the significance level α. If p < α, the results are significant and the null hypothesis is rejected. Otherwise, we fail to reject the null hypothesis. Since p < α = 0.05, we reject the null hypothesis in favor of the alternative hypothesis and conclude that the data support the teacher's claim that the average height of students in the batch has changed.

Statistical Errors

A hypothesis test can reach the wrong conclusion. If the result of the test corresponds with reality, then a correct decision has been made. However, if the result of the test does not correspond with reality, then an error has occurred. There are two situations in which the decision is wrong: the null hypothesis may be true while we reject H0, or the alternative hypothesis Ha may be true while we fail to reject H0. Accordingly, two types of error are distinguished: Type I error and Type II error.
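Before looking at the two error types in detail, the arithmetic of Examples 1 and 2 above can be cross-checked with a short Python sketch. It assumes a z-test with known population standard deviation, as in both examples; the function name `z_test_mean` and the tail labels are hypothetical and chosen only for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar, mu0, sigma, n, tail, alpha=0.05):
    """Return (z, p_value, reject_h0) for a one-sample z-test on the mean."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf
    if tail == "right":
        p_value = 1 - cdf(z)
    elif tail == "left":
        p_value = cdf(z)
    elif tail == "two":
        p_value = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    # Reject H0 when the P-value is below the significance level.
    return z, p_value, p_value < alpha


# Example 1: IQ scores, H1: mu > 100 (right-tailed).
example_1 = z_test_mean(112.5, 100, 15, 30, "right")

# Example 2: heights, Ha: mu != 100 (two-tailed).
example_2 = z_test_mean(105, 100, 15, 75, "two")
```

Both calls should give a rejection of the null hypothesis at α = 0.05, with z close to 4.56 and 2.89 respectively. The P-values are tiny for Example 1 and roughly 0.004 for Example 2, in line with the z-table values.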

Type I error

The first kind of error is the rejection of a true null hypothesis as the result of a test procedure. This kind of error is called a Type I error (false positive) and is sometimes called an error of the first kind.

Type II error

The second kind of error is the failure to reject a false null hypothesis as the result of a test procedure. This kind of error is called a Type II error (false negative) and is also referred to as an error of the second kind.

Power of Hypothesis Test

The power of a hypothesis test is the probability that the test rejects the null hypothesis (H0) when a specific alternative hypothesis (Ha) is true, i.e. the probability of avoiding a Type II error. Statistical power ranges from 0 to 1, and as power increases, the probability of making a Type II error (wrongly failing to reject the null hypothesis) decreases. If the probability of making a Type II error is β, the power of the hypothesis test is 1 - β.
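To make the relation power = 1 - β concrete, here is a minimal sketch for a right-tailed z-test on the mean with known σ. The setting borrows μ0 = 100, σ = 15 and n = 30 from Example 1, while the "true" mean of 105 is a purely hypothetical value picked for illustration; the function name is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist


def power_right_tailed_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a right-tailed z-test when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)          # critical value under H0
    shift = (mu_true - mu0) / (sigma / sqrt(n))     # true effect in standard errors
    beta = std_normal.cdf(z_crit - shift)           # P(fail to reject H0 | mu_true)
    return 1 - beta


# Hypothetical true mean of 105 with the Example 1 setting.
power_105 = power_right_tailed_z(100, 105, 15, 30)

# If the true mean equals mu0, the rejection probability is the Type I error rate.
type_1_rate = power_right_tailed_z(100, 100, 15, 30)
```

When the true mean equals the null value, the function returns α itself, which ties the significance level to the Type I error rate. As the true mean moves further above 100, β shrinks and the power approaches 1.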

© 2022 AlmaBetter