# Tests of Means and Power

… a core topic in Quantitative Methods and Atlas104

### Topic description

This topic examines tests of means and the calculation of power – the probability of correctly rejecting a false null hypothesis.

The treatment of this topic on the Atlas follows almost precisely that in Chapter XII, Test of Means, and Chapter XIII, Power, of OnlineStatBook, Online Statistics Education – An Interactive Multimedia Course of Study, http://onlinestatbook.com/2/index.html, accessed 15 May 2016.

###### Topic learning outcome

Familiarity with the use of tests of means and power, including the following core concepts and terms.

[Note: Until Atlas pages are created for individual concepts in Quantitative Methods, the links in the concepts below point directly to the relevant pages in OnlineStatBook.]

###### Core concepts associated with this topic

Read and/or watch video for each of the concept pages above (top to bottom, starting in left column).

Read the Statistical Literacy exercise, Surgery for Weight Loss, and answer the question.

Read the Statistical Literacy exercise, Design of Studies for Alzheimer’s Drug, and answer the question.

Read the Angry Moods (AM) case study and complete the following exercises:

• Do athletes or non-athletes calm down more when angry? Conduct a t test to see if the difference between groups in Control-In scores is statistically significant.
• Do people in general have a higher Anger-Out or Anger-In score? Conduct a t test on the difference between means of these two scores. Are these two means independent or dependent? (relevant section)

Read the Smiles and Leniency (SL) case study and do the following exercises:

• Compare each mean to the neutral mean. Be sure to control for the familywise error rate. (relevant section)
• Does a “felt smile” lead to more leniency than other types of smiles? (a) Calculate L (the linear combination) using the following contrast weights false: -1, felt: 2, miserable: -1, neutral: 0. (b) Perform a significance test on this value of L. (relevant section)

Read the Animal Research (AR) case study and answer the following questions:

• Conduct an independent samples t test comparing males to females on the belief that animal research is necessary. (relevant section)
• Based on the t test you conducted in the previous problem, are you able to reject the null hypothesis if alpha = 0.05? What about if alpha = 0.1? (relevant section)
• Is there any evidence that the t test assumption of homogeneity of variance is violated in the t test you computed in the first problem above? (relevant section)

Read the ADHD Treatment (AT) case study and answer the following questions:

• Compare each dosage with the dosage below it (compare d0 and d15, d15 and d30, and d30 and d60). Remember that the patients completed the task after every dosage. (a) If the familywise error rate is .05, what is the alpha level you will use for each comparison when doing the Bonferroni correction? (b) Which differences are significant at this level? (relevant section)
• Does performance increase linearly with dosage?
• Plot a line graph of this data.
• Compute L for each patient. To do this, create a new variable where you multiply the following coefficients by their corresponding dosages and then sum up the total: (-3)d0 + (-1)d15 + (1)d30 + (3)d60 (see #8). What is the mean of L?
• Perform a significance test on L. Compute the 95% confidence interval for L. (relevant section)

###### Assessment questions

From http://onlinestatbook.com/2/tests_of_means/ch10_exercises.html and http://onlinestatbook.com/2/power/ch11_exercises.html, accessed 15 May 2016.

AQ104.09.01. Define power in your own words.

AQ104.09.02. List 3 measures one can take to increase the power of an experiment. Explain why your measures result in greater power.

AQ104.09.03. Population 1 mean = 36; Population 2 mean = 45; both population standard deviations are 10; sample size (per group) = 16. AQ104.09.03.1 What is the probability that a t test will find a significant difference between means at the 0.05 level? AQ104.09.03.2 Give results for both one- and two-tailed tests. Hint: the power of a one-tailed test at the 0.05 level is the power of a two-tailed test at 0.10.
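
As a rough illustration of how such a power calculation can be set up, the sketch below uses a normal approximation: it treats the standard deviation as known and works with the z distribution rather than the noncentral t distribution. That is an assumption made to keep the example short, so the exact power of a t test will be somewhat lower. The function name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(mean1, mean2, sd, n_per_group, alpha=0.05, two_tailed=True):
    """Approximate power for comparing two means (normal approximation).

    Assumes equal group sizes and a known, common standard deviation.
    """
    z = NormalDist()
    # Standard error of the difference between two independent means.
    se = sd * sqrt(2 / n_per_group)
    # Expected value of the test statistic when the alternative is true.
    shift = abs(mean2 - mean1) / se
    if two_tailed:
        crit = z.inv_cdf(1 - alpha / 2)
        # Probability of landing in either rejection region.
        return (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)
    crit = z.inv_cdf(1 - alpha)
    return 1 - z.cdf(crit - shift)


two = approx_power(36, 45, 10, 16, two_tailed=True)
one = approx_power(36, 45, 10, 16, two_tailed=False)
print(f"two-tailed: {two:.3f}, one-tailed: {one:.3f}")
```

Calling `approx_power(36, 45, 10, 16, alpha=0.10, two_tailed=True)` gives a value almost identical to the one-tailed result at 0.05, which is the relationship stated in the hint.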

AQ104.09.04. Rank order the following in terms of power. n is the sample size per group.

| | Population 1 Mean | n | Population 2 Mean | Standard Deviation |
|---|---|---|---|---|
| a | 29 | 20 | 43 | 12 |
| b | 34 | 15 | 40 | 6 |
| c | 105 | 24 | 50 | 27 |
| d | 170 | 2 | 120 | 10 |

AQ104.09.05. Alan, while snooping around his grandmother’s basement, stumbled upon a shiny object protruding from under a stack of boxes. When he reached for the object, a genie miraculously materialized and stated: “You have found my magic coin. If you flip this coin an infinite number of times you will notice that heads will show 60% of the time.” Soon after the genie’s declaration he vanished, never to be seen again. Alan, excited about his new magical discovery, approached his friend Ken and told him about what he had found. Ken was skeptical of his friend’s story; however, he told Alan to flip the coin 100 times and to record how many flips resulted in heads. AQ104.09.05.1 What is the probability that Alan will be able to convince Ken that his coin has special powers by finding a p value below 0.05 (one-tailed)? Use the Binomial Calculator (and some trial and error). AQ104.09.05.2 If Ken told Alan to flip the coin only 20 times, what is the probability that Alan will not be able to convince Ken (by failing to reject the null hypothesis at the 0.05 level)?
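
The "trial and error" with a binomial calculator can also be scripted. The sketch below is one possible way to do it, not the method prescribed by the source: it finds the smallest number of heads that would be significant under a fair coin, then asks how likely the magic coin is to reach that count. The helper names are hypothetical.

```python
from math import comb


def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def critical_heads(n, alpha=0.05, p_null=0.5):
    """Smallest head count whose one-tailed p value under the null is below alpha."""
    for k in range(n + 2):  # k = n + 1 has tail probability 0, so a value is always returned
        if upper_tail(n, k, p_null) < alpha:
            return k


def power(n, p_true=0.6, alpha=0.05):
    """Probability of reaching the critical head count when the coin lands heads with p_true."""
    return upper_tail(n, critical_heads(n, alpha), p_true)


for flips in (100, 20):
    print(flips, critical_heads(flips), round(power(flips), 3), round(1 - power(flips), 3))
```

The last printed column, `1 - power`, is the probability of failing to reject the null hypothesis, which is the quantity the 20-flip question asks about.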

AQ104.09.06. The scores of a random sample of 8 students on a physics test are as follows: 60, 62, 67, 69, 70, 72, 75, and 78.

AQ104.09.06.1 Test to see if the sample mean is significantly different from 65 at the .05 level. Report the t and p values.

AQ104.09.06.2 The researcher realizes that she accidentally recorded the score that should have been 76 as 67. Are these corrected scores significantly different from 65 at the .05 level? (relevant section)
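
A minimal sketch of the one-sample t statistic for these scores, using only the Python standard library. It returns t and the degrees of freedom; the p value would still need to be looked up with a t distribution calculator or table, since the standard library has no t distribution. The function name `one_sample_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(scores, mu0):
    """Return (t, df) for a test of the sample mean against mu0."""
    n = len(scores)
    # Estimated standard error of the mean (stdev uses n - 1 in the denominator).
    se = stdev(scores) / sqrt(n)
    return (mean(scores) - mu0) / se, n - 1


scores = [60, 62, 67, 69, 70, 72, 75, 78]
t, df = one_sample_t(scores, 65)
print(f"t = {t:.3f}, df = {df}")

# Corrected data: the score recorded as 67 should have been 76.
corrected = [76 if s == 67 else s for s in scores]
t_corrected, _ = one_sample_t(corrected, 65)
print(f"corrected t = {t_corrected:.3f}")
```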

AQ104.09.07. A (hypothetical) experiment is conducted on the effect of alcohol on perceptual motor ability. Ten subjects are each tested twice, once after having two drinks and once after having two glasses of water. The two tests were on two different days to give the alcohol a chance to wear off. Half of the subjects were given alcohol first and half were given water first. The scores of the 10 subjects are shown below. The first number for each subject is their performance in the “water” condition. Higher scores reflect better performance. Test to see if alcohol had a significant effect. Report the t and p values. (relevant section)

| water | alcohol |
|---|---|
| 16 | 13 |
| 15 | 13 |
| 11 | 10 |
| 20 | 18 |
| 19 | 17 |
| 14 | 11 |
| 13 | 10 |
| 15 | 15 |
| 14 | 11 |
| 16 | 16 |
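
Because each subject appears in both conditions, the scores are dependent, and one way to handle that is a correlated (paired) t test on the within-subject differences. The sketch below shows the mechanics under that assumption; `paired_t` is a hypothetical helper, and the p value would come from a t distribution calculator.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for a correlated t test on paired scores."""
    diffs = [a - b for a, b in zip(first, second)]
    # The test reduces to a one-sample t test of the differences against 0.
    se = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / se, len(diffs) - 1


water = [16, 15, 11, 20, 19, 14, 13, 15, 14, 16]
alcohol = [13, 13, 10, 18, 17, 11, 10, 15, 11, 16]
t, df = paired_t(water, alcohol)
print(f"t = {t:.3f}, df = {df}")
```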

AQ104.09.07. The scores on a (hypothetical) vocabulary test of a group of 20 year olds and a group of 60 year olds are shown below.

| 20 yr olds | 60 yr olds |
|---|---|
| 27 | 26 |
| 26 | 29 |
| 21 | 29 |
| 24 | 29 |
| 15 | 27 |
| 18 | 16 |
| 17 | 20 |
| 12 | 27 |
| 13 | |

AQ104.09.07.1 Test the mean difference for significance using the .05 level. (relevant section).
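
For two separate groups of unequal size, a pooled-variance independent groups t test is one common choice. The sketch below assumes homogeneity of variance and assumes the flattened data split into nine 20 year olds and eight 60 year olds, read row by row; that split is a reading of the data, not something the source states explicitly. `independent_t` is a hypothetical helper.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group1, group2):
    """Return (t, df) for a pooled-variance independent groups t test."""
    n1, n2 = len(group1), len(group2)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se, n1 + n2 - 2


# Assumed column assignment (see the note above).
age_20 = [27, 26, 21, 24, 15, 18, 17, 12, 13]
age_60 = [26, 29, 29, 29, 27, 16, 20, 27]
t, df = independent_t(age_20, age_60)
print(f"t = {t:.3f}, df = {df}")
```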

AQ104.09.08. The sampling distribution of a statistic is normally distributed with an estimated standard error of 12 (df = 20). AQ104.09.08.1 What is the probability that you would have gotten a mean of 107 (or more extreme) if the population parameter were 100? Is this probability significant at the .05 level (two-tailed)? AQ104.09.08.2 What is the probability that you would have gotten a mean of 95 or less (one-tailed)? AQ104.09.08.3 Is this probability significant at the .05 level? You may want to use the t Distribution calculator for this problem. (relevant section)

AQ104.09.09. How do you decide whether to use an independent groups t test or a correlated t test (test of dependent means)? (relevant section & relevant section)

AQ104.09.10. An experiment compared the ability of three groups of subjects to remember briefly-presented chess positions. The data are shown below.

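| Non-players | Beginners | Tournament players |
|---|---|---|
| 22.1 | 32.5 | 40.1 |
| 22.3 | 37.1 | 45.6 |
| 26.2 | 39.1 | 51.2 |
| 29.6 | 40.5 | 56.4 |
| 31.7 | 45.5 | 58.1 |
| 33.5 | 51.3 | 71.1 |
| 38.9 | 52.6 | 74.9 |
| 39.7 | 55.7 | 75.9 |
| 43.2 | 55.9 | 80.3 |
| 43.2 | 57.7 | 85.3 |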

AQ104.09.10.1 Using the Tukey HSD procedure, determine which groups are significantly different from each other at the .05 level. (relevant section)

AQ104.09.10.2 Now compare each pair of groups using t-tests. Make sure to control for the familywise error rate (at 0.05) by using the Bonferroni correction. Specify the alpha level you used.

AQ104.09.11. Below are data showing the results of six subjects on a memory test. The three scores per subject are their scores on three trials (a, b, and c) of a memory task.

AQ104.09.11.1 Are the subjects getting better each trial?

AQ104.09.11.2 Test the linear effect of trial for the data.

| a | b | c |
|---|---|---|
| 4 | 6 | 7 |
| 3 | 7 | 8 |
| 2 | 8 | 5 |
| 1 | 4 | 7 |
| 4 | 6 | 9 |
| 2 | 4 | 2 |

AQ104.09.11.3 Compute L for each subject using the contrast weights -1, 0, and 1. That is, compute (-1)(a) + (0)(b) + (1)(c) for each subject.

AQ104.09.11.4 Compute a one-sample t-test on this column (with the L values for each subject) you created. (relevant section)
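
The two steps above (compute L per subject, then test the column of L values against zero) can be sketched as follows. This is an illustrative outline using the standard library; the helper names are hypothetical, and the p value would still be read from a t distribution calculator.

```python
from math import sqrt
from statistics import mean, stdev


def contrast_scores(rows, weights):
    """Apply the contrast weights to each subject's scores, giving one L per subject."""
    return [sum(w * x for w, x in zip(weights, row)) for row in rows]


def one_sample_t(values, mu0=0):
    """Return (t, df) for a test of the mean of values against mu0."""
    se = stdev(values) / sqrt(len(values))
    return (mean(values) - mu0) / se, len(values) - 1


# Trials a, b, c for each of the six subjects.
trials = [(4, 6, 7), (3, 7, 8), (2, 8, 5), (1, 4, 7), (4, 6, 9), (2, 4, 2)]
L_values = contrast_scores(trials, (-1, 0, 1))
t, df = one_sample_t(L_values)
print(L_values, f"mean L = {mean(L_values):.3f}, t = {t:.3f}, df = {df}")
```

With weights -1, 0, and 1, each L is simply the third trial minus the first, so a mean of L reliably above zero indicates a linear improvement across trials.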

AQ104.09.12. Participants threw darts at a target. In one condition, they used their preferred hand; in the other condition, they used their other hand. All subjects performed in both conditions (the order of conditions was counterbalanced). Their scores are shown below.

| Preferred | Non-preferred |
|---|---|
| 12 | 7 |
| 7 | 9 |
| 11 | 8 |
| 13 | 10 |
| 10 | 9 |

AQ104.09.12.1 Which kind of t-test should be used?

AQ104.09.12.2 Calculate the two-tailed t and p values using this t test.

AQ104.09.12.3 Calculate the one-tailed t and p values using this t test.

AQ104.09.13. Assume the data in the previous problem were collected using two different groups of subjects: one group used their preferred hand and the other group used their non-preferred hand. Analyze the data and compare the results to those for the previous problem. (relevant section)

AQ104.09.14. You have 4 means, and you want to compare each mean to every other mean. AQ104.09.14.1 How many tests total are you going to compute? AQ104.09.14.2 What would be the chance of making at least one Type I error if the Type I error for each test was .05 and the tests were independent? (relevant section & relevant section) AQ104.09.14.3 Are the tests independent, and how does independence/non-independence affect the probability in AQ104.09.14.2?
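
The arithmetic behind multiple comparisons is short enough to sketch directly. The functions below are hypothetical helpers: the number of pairwise tests among k means, the familywise Type I error rate under the (idealized) assumption that the tests are independent, and the per-test alpha given by the Bonferroni correction.

```python
from math import comb


def n_pairwise(k):
    """Number of distinct pairs among k means."""
    return comb(k, 2)


def familywise_rate(alpha, m):
    """P(at least one Type I error) across m independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(familywise_alpha, m):
    """Per-test alpha that keeps the familywise rate at or below familywise_alpha."""
    return familywise_alpha / m


m = n_pairwise(4)
print(m, round(familywise_rate(0.05, m), 4), round(bonferroni_alpha(0.05, m), 4))
```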

AQ104.09.15. In an experiment, participants were divided into 4 groups. There were 20 participants in each group, so the degrees of freedom (error) for this study was 80 – 4 = 76. Tukey’s HSD test was performed on the data. AQ104.09.15.1 Calculate the p value for each pair based on the Q value given below. You will want to use the Studentized Range Calculator. AQ104.09.15.2 Which differences are significant at the .05 level? (relevant section)

| Comparison of Groups | Q |
|---|---|
| A – B | 3.4 |
| A – C | 3.8 |
| A – D | 4.3 |
| B – C | 1.7 |
| B – D | 3.9 |
| C – D | 3.7 |

AQ104.09.16. If you have 5 groups in your study, why shouldn’t you just compute a t test of each group mean with each other group mean? (relevant section)

AQ104.09.17. You are conducting a study to see if students do better when they study all at once or in intervals. One group of 12 participants took a test after studying for one hour continuously. The other group of 12 participants took a test after studying for three twenty minute sessions. The first group had a mean score of 75 and a variance of 120. The second group had a mean score of 86 and a variance of 100.

AQ104.09.17.1 What is the calculated t value? Are the mean test scores of these two groups significantly different at the .05 level?

AQ104.09.17.2 What would the t value be if there were only 6 participants in each group? Would the scores be significant at the .05 level?
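
When only group means and variances are reported, the independent groups t statistic can be computed from those summaries. The sketch below assumes equal group sizes and homogeneity of variance; `t_from_summary` is a hypothetical helper, and significance would be judged against a critical t for the returned degrees of freedom.

```python
from math import sqrt


def t_from_summary(mean1, var1, mean2, var2, n_per_group):
    """Return (t, df) for two equal-sized independent groups from summary statistics."""
    # With equal n, the pooled variance estimate is the average of the two variances.
    mse = (var1 + var2) / 2
    se = sqrt(2 * mse / n_per_group)
    return (mean1 - mean2) / se, 2 * (n_per_group - 1)


t12, df12 = t_from_summary(75, 120, 86, 100, 12)
t6, df6 = t_from_summary(75, 120, 86, 100, 6)
print(f"n = 12: t = {t12:.3f}, df = {df12}")
print(f"n = 6:  t = {t6:.3f}, df = {df6}")
```

Halving the sample size leaves the mean difference unchanged but inflates the standard error, so the t value shrinks toward zero.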

AQ104.09.17.3 A new test was designed to have a mean of 80 and a standard deviation of 10. A random sample of 20 students at your school take the test, and the mean score turns out to be 85. Does this score differ significantly from 80? To answer this problem, you may want to use the Normal Distribution Calculator. (relevant section)
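
Because the population standard deviation is specified by the test's design, this comparison can be treated as a z test rather than a t test. A minimal sketch under that assumption, with a hypothetical function name:

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n):
    """Return (z, two-tailed p) for a sample mean when sigma is known."""
    # Standard error of the mean uses the known population sigma.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed


z, p = z_test(85, 80, 10, 20)
print(f"z = {z:.3f}, two-tailed p = {p:.4f}")
```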

AQ104.09.18. You perform a one-sample t test and calculate a t statistic of 3.0. The mean of your sample was 1.3 and the standard deviation was 2.6. How many participants were used in this study? (relevant section)

AQ104.09.19. True/false: The contrasts (-3, 1, 1, 1) and (0, 0, -1, 1) are orthogonal. (relevant section)
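
Two contrasts are orthogonal when the sum of the products of their corresponding weights is zero. A short sketch of that check (the function name `orthogonal` is hypothetical):

```python
def orthogonal(weights_a, weights_b):
    """True if the products of corresponding contrast weights sum to zero."""
    if len(weights_a) != len(weights_b):
        raise ValueError("contrasts must have the same number of weights")
    return sum(a * b for a, b in zip(weights_a, weights_b)) == 0


print(orthogonal((-3, 1, 1, 1), (0, 0, -1, 1)))
print(orthogonal((1, -1, 0), (1, 0, -1)))
```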

AQ104.09.20. True/false: If you are making 4 comparisons between means, then based on the Bonferroni correction, you should use an alpha level of .01 for each test. (relevant section)

AQ104.09.21. True/false: Correlated t tests almost always have greater power than independent t tests. (relevant section)

AQ104.09.22. True/false: The graph below represents a violation of the homogeneity of variance assumption. (relevant section)

AQ104.09.23. True/false: When you are conducting a one-sample t test and you know the population standard deviation, you look up the critical t value in the table based on the degrees of freedom. (relevant section)

Questions from Case Studies:

The following questions use data from the Angry Moods (AM) case study.

AQ104.09.24. Do athletes or non-athletes calm down more when angry? Conduct a t test to see if the difference between groups in Control-In scores is statistically significant.

AQ104.09.25. Do people in general have a higher Anger-Out or Anger-In score?

AQ104.09.26. Conduct a t test on the difference between means of these two scores.

AQ104.09.27. Are these two means independent or dependent? (relevant section)

The following questions use data from the Smiles and Leniency (SL) case study.

AQ104.09.28. Compare each mean to the neutral mean. Be sure to control for the familywise error rate. (relevant section)

AQ104.09.29. Does a “felt smile” lead to more leniency than other types of smiles? AQ104.09.29.1 Calculate L (the linear combination) using the following contrast weights false: -1, felt: 2, miserable: -1, neutral: 0. AQ104.09.29.2 Perform a significance test on this value of L. (relevant section)

The following questions are from the Animal Research (AR) case study.

AQ104.09.30. Conduct an independent samples t test comparing males to females on the belief that animal research is necessary. (relevant section)

AQ104.09.31.1 Based on the t test you conducted in the previous problem, are you able to reject the null hypothesis if alpha = 0.05? AQ104.09.31.2 What about if alpha = 0.1? (relevant section)

AQ104.09.32. Is there any evidence that the t test assumption of homogeneity of variance is violated in the t test you computed in AQ104.09.30? (relevant section)

The following questions use data from the ADHD Treatment (AT) case study.

AQ104.09.33. Compare each dosage with the dosage below it (compare d0 and d15, d15 and d30, and d30 and d60). Remember that the patients completed the task after every dosage. AQ104.09.33.1 If the familywise error rate is .05, what is the alpha level you will use for each comparison when doing the Bonferroni correction? AQ104.09.33.2 Which differences are significant at this level? (relevant section)

AQ104.09.34. Does performance increase linearly with dosage? Plot a line graph of this data.

AQ104.09.34.1 Compute L for each patient. To do this, create a new variable where you multiply the following coefficients by their corresponding dosages and then sum up the total: (-3)d0 + (-1)d15 + (1)d30 + (3)d60 (see #8). AQ104.09.34.2 What is the mean of L?

AQ104.09.33. Perform a significance test on L. Compute the 95% confidence interval for L. (relevant section)

AQ104.09.34. The scores of a random sample of 8 students on a physics test are as follows: 60, 62, 67, 69, 70, 72, 75, and 78. Test to see if the sample mean is significantly different from 65 at the .05 level. Report the t and p values.

AQ104.09.35. The researcher realizes that she accidentally recorded the score that should have been 76 as 67. Are these corrected scores significantly different from 65 at the .05 level? (relevant section)

AQ104.09.36. A (hypothetical) experiment is conducted on the effect of alcohol on perceptual motor ability. Ten subjects are each tested twice, once after having two drinks and once after having two glasses of water. The two tests were on two different days to give the alcohol a chance to wear off. Half of the subjects were given alcohol first and half were given water first. The scores of the 10 subjects are shown below. The first number for each subject is their performance in the “water” condition. Higher scores reflect better performance. Test to see if alcohol had a significant effect. Report the t and p values. (relevant section)

| water | alcohol |
|---|---|
| 16 | 13 |
| 15 | 13 |
| 11 | 10 |
| 20 | 18 |
| 19 | 17 |
| 14 | 11 |
| 13 | 10 |
| 15 | 15 |
| 14 | 11 |
| 16 | 16 |

AQ104.09.37. The scores on a (hypothetical) vocabulary test of a group of 20 year olds and a group of 60 year olds are shown below.

| 20 yr olds | 60 yr olds |
|---|---|
| 27 | 26 |
| 26 | 29 |
| 21 | 29 |
| 24 | 29 |
| 15 | 27 |
| 18 | 16 |
| 17 | 20 |
| 12 | 27 |
| 13 | |

AQ104.09.37.1 Test the mean difference for significance using the .05 level. (relevant section).

AQ104.09.38. The sampling distribution of a statistic is normally distributed with an estimated standard error of 12 (df = 20). AQ104.09.38.1 What is the probability that you would have gotten a mean of 107 (or more extreme) if the population parameter were 100? Is this probability significant at the .05 level (two-tailed)? AQ104.09.38.2 What is the probability that you would have gotten a mean of 95 or less (one-tailed)? AQ104.09.38.3 Is this probability significant at the .05 level? You may want to use the t Distribution calculator for this problem. (relevant section)

AQ104.09.39. How do you decide whether to use an independent groups t test or a correlated t test (test of dependent means)? (relevant section & relevant section)

AQ104.09.40. An experiment compared the ability of three groups of subjects to remember briefly-presented chess positions. The data are shown below.

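| Non-players | Beginners | Tournament players |
|---|---|---|
| 22.1 | 32.5 | 40.1 |
| 22.3 | 37.1 | 45.6 |
| 26.2 | 39.1 | 51.2 |
| 29.6 | 40.5 | 56.4 |
| 31.7 | 45.5 | 58.1 |
| 33.5 | 51.3 | 71.1 |
| 38.9 | 52.6 | 74.9 |
| 39.7 | 55.7 | 75.9 |
| 43.2 | 55.9 | 80.3 |
| 43.2 | 57.7 | 85.3 |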

AQ104.09.40.1 Using the Tukey HSD procedure, determine which groups are significantly different from each other at the .05 level. (relevant section)

AQ104.09.40.2 Now compare each pair of groups using t-tests. Make sure to control for the familywise error rate (at 0.05) by using the Bonferroni correction. Specify the alpha level you used.

AQ104.09.41. Below are data showing the results of six subjects on a memory test. The three scores per subject are their scores on three trials (a, b, and c) of a memory task. Are the subjects getting better each trial? Test the linear effect of trial for the data.

| a | b | c |
|---|---|---|
| 4 | 6 | 7 |
| 3 | 7 | 8 |
| 2 | 8 | 5 |
| 1 | 4 | 7 |
| 4 | 6 | 9 |
| 2 | 4 | 2 |

AQ104.09.41.1 Compute L for each subject using the contrast weights -1, 0, and 1. That is, compute (-1)(a) + (0)(b) + (1)(c) for each subject.

AQ104.09.41.2 Compute a one-sample t-test on this column (with the L values for each subject) you created. (relevant section)

AQ104.09.42. Participants threw darts at a target. In one condition, they used their preferred hand; in the other condition, they used their other hand. All subjects performed in both conditions (the order of conditions was counterbalanced). Their scores are shown below.

| Preferred | Non-preferred |
|---|---|
| 12 | 7 |
| 7 | 9 |
| 11 | 8 |
| 13 | 10 |
| 10 | 9 |

AQ104.09.42.1 Which kind of t-test should be used?

AQ104.09.42.2 Calculate the two-tailed t and p values using this t test.

AQ104.09.42.3 Calculate the one-tailed t and p values using this t test.

AQ104.09.43. Assume the data in the previous problem were collected using two different groups of subjects: one group used their preferred hand and the other group used their non-preferred hand. Analyze the data and compare the results to those for the previous problem. (relevant section)

AQ104.09.44. You have 4 means, and you want to compare each mean to every other mean. AQ104.09.44.1 How many tests total are you going to compute? AQ104.09.44.2 What would be the chance of making at least one Type I error if the Type I error for each test was .05 and the tests were independent? (relevant section & relevant section) AQ104.09.44.3 Are the tests independent, and how does independence/non-independence affect the probability in AQ104.09.44.2?

AQ104.09.45. In an experiment, participants were divided into 4 groups. There were 20 participants in each group, so the degrees of freedom (error) for this study was 80 – 4 = 76. Tukey’s HSD test was performed on the data. AQ104.09.45.1 Calculate the p value for each pair based on the Q value given below. You will want to use the Studentized Range Calculator. AQ104.09.45.2 Which differences are significant at the .05 level? (relevant section)

| Comparison of Groups | Q |
|---|---|
| A – B | 3.4 |
| A – C | 3.8 |
| A – D | 4.3 |
| B – C | 1.7 |
| B – D | 3.9 |
| C – D | 3.7 |

AQ104.09.46. If you have 5 groups in your study, why shouldn’t you just compute a t test of each group mean with each other group mean? (relevant section)

AQ104.09.47. You are conducting a study to see if students do better when they study all at once or in intervals. One group of 12 participants took a test after studying for one hour continuously. The other group of 12 participants took a test after studying for three twenty minute sessions. The first group had a mean score of 75 and a variance of 120. The second group had a mean score of 86 and a variance of 100.

AQ104.09.47.1 What is the calculated t value? Are the mean test scores of these two groups significantly different at the .05 level?

AQ104.09.47.2 What would the t value be if there were only 6 participants in each group? Would the scores be significant at the .05 level?

AQ104.09.48. A new test was designed to have a mean of 80 and a standard deviation of 10. A random sample of 20 students at your school take the test, and the mean score turns out to be 85. Does this score differ significantly from 80? To answer this problem, you may want to use the Normal Distribution Calculator. (relevant section)

AQ104.09.49. You perform a one-sample t test and calculate a t statistic of 3.0. The mean of your sample was 1.3 and the standard deviation was 2.6. How many participants were used in this study? (relevant section)

AQ104.09.50. True/false: The contrasts (-3, 1, 1, 1) and (0, 0, -1, 1) are orthogonal. (relevant section)

AQ104.09.51. True/false: If you are making 4 comparisons between means, then based on the Bonferroni correction, you should use an alpha level of .01 for each test. (relevant section)

AQ104.09.52. True/false: Correlated t tests almost always have greater power than independent t tests. (relevant section)

AQ104.09.53. True/false: The graph below represents a violation of the homogeneity of variance assumption. (relevant section)

AQ104.09.54. True/false: When you are conducting a one-sample t test and you know the population standard deviation, you look up the critical t value in the table based on the degrees of freedom. (relevant section)

Questions from Case Studies:

The following questions use data from the Angry Moods (AM) case study.

AQ104.09.55. Do athletes or non-athletes calm down more when angry? Conduct a t test to see if the difference between groups in Control-In scores is statistically significant.

AQ104.09.56. Do people in general have a higher Anger-Out or Anger-In score? AQ104.09.56.1 Conduct a t test on the difference between means of these two scores. AQ104.09.56.2 Are these two means independent or dependent? (relevant section)

The following questions use data from the Smiles and Leniency (SL) case study.

AQ104.09.57. Compare each mean to the neutral mean. Be sure to control for the familywise error rate. (relevant section)

AQ104.09.58. Does a “felt smile” lead to more leniency than other types of smiles? AQ104.09.58.1 Calculate L (the linear combination) using the following contrast weights false: -1, felt: 2, miserable: -1, neutral: 0. AQ104.09.58.2 Perform a significance test on this value of L. (relevant section)

The following questions are from the Animal Research (AR) case study.

AQ104.09.59. Conduct an independent samples t test comparing males to females on the belief that animal research is necessary. (relevant section)

AQ104.09.60.1 Based on the t test you conducted in the previous problem, are you able to reject the null hypothesis if alpha = 0.05? AQ104.09.60.2 What about if alpha = 0.1? (relevant section)

AQ104.09.61. Is there any evidence that the t test assumption of homogeneity of variance is violated in the t test you computed in AQ104.09.59? (relevant section)

The following questions use data from the ADHD Treatment (AT) case study.

AQ104.09.62. Compare each dosage with the dosage below it (compare d0 and d15, d15 and d30, and d30 and d60). Remember that the patients completed the task after every dosage. AQ104.09.62.1 If the familywise error rate is .05, what is the alpha level you will use for each comparison when doing the Bonferroni correction? AQ104.09.62.2 Which differences are significant at this level? (relevant section)

AQ104.09.63. Does performance increase linearly with dosage? Plot a line graph of this data.

AQ104.09.64.1 Compute L for each patient. To do this, create a new variable where you multiply the following coefficients by their corresponding dosages and then sum up the total: (-3)d0 + (-1)d15 + (1)d30 + (3)d60 (see AQ104.09.62.). AQ104.09.64.2 What is the mean of L?

AQ104.09.65. Perform a significance test on L. Compute the 95% confidence interval for L. (relevant section)