Transformation, Chi Square, Distribution Free Tests, and Effect Size

… a core topic in Quantitative Methods and Atlas104

Topic description

This topic examines the statistical techniques of transformation, Chi Square, distribution free tests, and determination of effect size.

The treatment of this topic on the Atlas follows almost precisely that in Chapter XVI, Transformations, Chapter XVII, Chi Square, Chapter XVIII, Distribution Free Tests, and Chapter XIX, Effect Size, in OnlineStatBook, Online Statistics Education – An Interactive Multimedia Course of Study, http://onlinestatbook.com/2/index.html, accessed 15 May 2016.

Topic learning outcome

Familiarity with the application of transformation, Chi Square, distribution free tests, and determination of effect size to statistical problems, including the following core concepts and terms.

[Note: Until Atlas pages are created for individual concepts in Quantitative Methods, the links in the concepts below point directly to the relevant pages in OnlineStatBook.]

Core concepts associated with this topic

Log

Tukey’s Ladder of Powers

Box-Cox Transformations

Chi Square Distribution

One-Way Tables (Testing Goodness of Fit)

Testing Distributions Demo

Contingency Tables

2 x 2 Table Simulation

Benefits of Distribution-Free Tests

Randomized Tests – Two Conditions

Randomized Tests – Two or More Conditions

Randomization Tests – Association (Pearson’s r)

Randomized Tests – Contingency Tables (Fisher’s Exact Test)

Rank Randomization Tests – Two Conditions (Mann-Whitney U, Wilcoxon Rank Sum)

Rank Randomization Tests – Two or More Conditions (Kruskal-Wallis)

Rank Randomization Tests – Association (Spearman’s ρ)

Proportions

Difference between Means

Variance Explained

Readings

Read and/or watch video for each of the concept pages above (top to bottom, starting in left column).

Read the Statistical Literacy exercise, Stock Appreciation, and answer the question.

Read the Statistical Literacy exercise, A Spice Inhibits Liver Cancer, and answer the questions.

Read the Statistical Literacy exercise, Troponin Concentration and Ventricular Strain, and answer the question.

Read the Statistical Literacy exercise, Health Effects of Coffee, and answer the question.

Read the ADHD case study and do the following exercises:

  • Transform the data in the placebo condition (D0) with λ’s of .5, 0, -.5, and -1.
  • How does the skew in each of these compare to the skew in the raw data?
  • Which transformation leads to the least skew?
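The transformation exercise above can be sketched in a few lines of Python. This is a minimal illustration, not the case study's own procedure: it assumes the Box-Cox form (x^λ − 1)/λ with the natural log at λ = 0, measures skew as the third standardized moment, and uses made-up scores rather than the ADHD data. The names `box_cox`, `skew`, and `scores` are hypothetical.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform: (x**lam - 1) / lam, or ln(x) when lam == 0.

    Assumes every value is strictly positive.
    """
    if lam == 0:
        return [math.log(x) for x in values]
    return [(x ** lam - 1) / lam for x in values]

def skew(values):
    """Third standardized moment (population form, dividing by n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed scores, NOT the ADHD placebo (D0) data.
scores = [1, 2, 2, 3, 3, 4, 6, 9, 15, 30]

raw_skew = skew(scores)
skew_by_lambda = {lam: skew(box_cox(scores, lam)) for lam in (0.5, 0, -0.5, -1)}

print(f"raw skew: {raw_skew:.3f}")
for lam, value in skew_by_lambda.items():
    print(f"lambda = {lam:>4}: skew = {value:.3f}")

# The lambda whose transformed data has skew closest to zero.
least_skewed = min(skew_by_lambda, key=lambda lam: abs(skew_by_lambda[lam]))
print("least skew at lambda =", least_skewed)
```

Replacing `scores` with the D0 column from the case study would give the comparison the exercise asks for, under the same assumptions about the transform and the skew statistic.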

Read the SAT and GPA (SG) case study and do the following exercises:

  • Answer these items to determine if the math SAT scores are normally distributed. You may want to first standardize the scores. (relevant section)

(a) If these data were normally distributed, how many scores would you expect there to be in each of these brackets: (i) smaller than 1 SD below the mean, (ii) in between the mean and 1 SD below the mean, (iii) in between the mean and 1 SD above the mean, (iv) greater than 1 SD above the mean?

(b) How many scores are actually in each of these brackets?

(c) Conduct a Chi Square test to determine if the math SAT scores are normally distributed based on these expected and observed frequencies. (relevant section)

  • Compute Spearman’s ρ for the relationship between UGPA and SAT.
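For the normality check in items (a) to (c) above, the expected bracket counts and the Chi Square statistic can be sketched as follows. The observed counts are invented for illustration and are not the SAT data. The p value uses a closed form that holds only for 3 degrees of freedom, which assumes four brackets with the mean and SD treated as known; that choice of df is an assumption of this sketch.

```python
import math
from statistics import NormalDist

def expected_counts(n):
    """Expected counts in the four brackets (< -1 SD, -1 SD to mean,
    mean to +1 SD, > +1 SD) for n normally distributed scores."""
    z = NormalDist()
    tail = z.cdf(-1)                 # about 0.1587
    inner = z.cdf(0) - z.cdf(-1)     # about 0.3413
    return [n * p for p in (tail, inner, inner, tail)]

def chi_square(observed, expected):
    """Pearson Chi Square: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi_square_p_df3(x):
    """Upper-tail probability of a Chi Square value x with exactly 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Hypothetical observed counts for 100 standardized scores.
observed = [14, 38, 36, 12]
expected = expected_counts(sum(observed))
statistic = chi_square(observed, expected)
p_value = chi_square_p_df3(statistic)

print("expected:", [round(e, 2) for e in expected])
print(f"Chi Square = {statistic:.3f}, p = {p_value:.3f}")
```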

Read the Diet and Health (DH) case study and do the following exercise:

  • Conduct a Pearson Chi Square test to determine if there is any relationship between diet and outcome. Report the Chi Square and p values and state your conclusions. (relevant section)
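A Pearson Chi Square test on a contingency table, as in the exercise above, can be sketched like this. The table is hypothetical and is not the Diet and Health data. The helper `chi_square_p` is an illustrative stdlib-only routine that builds the upper-tail probability from a standard recurrence for the incomplete gamma function; a statistics package would normally be used instead.

```python
import math

def expected_frequencies(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]

def pearson_chi_square(table):
    """Return (Chi Square statistic, degrees of freedom) for a two-way table."""
    expected = expected_frequencies(table)
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df

def chi_square_p(x, df):
    """Upper-tail probability of Chi Square value x with integer df >= 1."""
    y = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)                  # exact for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))       # exact for df = 1
    while a < df / 2:                             # step df up by 2 each pass
        q += y ** a * math.exp(-y) / math.gamma(a + 1)
        a += 1
    return q

# Hypothetical 2 x 3 table (rows: two groups, columns: three outcomes).
table = [[30, 20, 10],
         [20, 20, 20]]
statistic, df = pearson_chi_square(table)
p_value = chi_square_p(statistic, df)
print(f"Chi Square = {statistic:.3f}, df = {df}, p = {p_value:.4f}")
```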

Read the Stereograms case study and do the following exercise.

  • Test the difference in central tendency between the two conditions using a rank-randomization test (with the normal approximation) with a one-tailed test. Give the Z and the p.

Read the Smiles and Leniency case study and do the following exercise:

  • Test the difference in central tendency between the four conditions using a rank-randomization test (with the normal approximation). Give the Chi Square and the p.

Sample assessment questions

From http://onlinestatbook.com/2/transformations/exercises.html, http://onlinestatbook.com/2/chi_square/ch14_exercises.html, http://onlinestatbook.com/2/distribution_free_tests/exercises.html, and http://onlinestatbook.com/2/effect_size/exercises.html, accessed 15 May 2016.

AQ104.12.01. When is a log transformation valuable?

AQ104.12.02. If the arithmetic mean of log10 transformed data were 3, what would be the geometric mean?

AQ104.12.03. Using Tukey’s ladder of transformation, transform the following data using a λ of 0.5: 9, 16, 25

AQ104.12.04. What value of λ in Tukey’s ladder decreases skew the most?

AQ104.12.05. What value of λ in Tukey’s ladder increases skew the most?

AQ104.12.06. In the ADHD case study, transform the data in the placebo condition (D0) with λ’s of .5, 0, -.5, and -1. How does the skew in each of these compare to the skew in the raw data? Which transformation leads to the least skew?

AQ104.12.07. Which of the two Chi Square distributions shown below (A or B) has the larger degrees of freedom? How do you know? (relevant section)

AQ104.12.07. Twelve subjects were each given two flavors of ice cream to taste and then were asked whether they liked them. Two of the subjects liked the first flavor and nine of them liked the second flavor. AQ104.12.07.1 Is it valid to use the Chi Square test to determine whether this difference in proportions is significant? AQ104.12.07.2 Why or why not? (relevant section)

AQ104.12.08. A die is suspected of being biased. It is rolled 25 times with the following result:

Outcome   Frequency
1         9
2         4
3         1
4         8
5         3
6         0

AQ104.12.08. Conduct a significance test to see if the die is biased. AQ104.12.08.1 What Chi Square value do you get and how many degrees of freedom does it have? AQ104.12.08.2 What is the p value? (relevant section)

AQ104.12.09. A recent experiment investigated the relationship between smoking and urinary incontinence. Of the 322 subjects in the study who were incontinent, 113 were smokers, 51 were former smokers, and 158 had never smoked. Of the 284 control subjects who were not incontinent, 68 were smokers, 23 were former smokers, and 193 had never smoked. AQ104.12.09.1 Create a table displaying these data. AQ104.12.09.2 What is the expected frequency in each cell? Conduct a significance test to see if there is a relationship between smoking and incontinence. AQ104.12.09.3 What Chi Square value do you get? AQ104.12.09.4 What p value do you get? AQ104.12.09.5 What do you conclude? (relevant section)

AQ104.12.10. At a school pep rally, a group of sophomore students organized a free raffle for prizes. They claim that they put the names of all of the students in the school in the basket and that they randomly drew 36 names out of this basket. Of the prize winners, 6 were freshmen, 14 were sophomores, 9 were juniors, and 7 were seniors. The results do not seem that random to you. You think it is a little fishy that sophomores organized the raffle and also won the most prizes. Your school is composed of 30% freshmen, 25% sophomores, 25% juniors, and 20% seniors. AQ104.12.10.1 What are the expected frequencies of winners from each class? AQ104.12.10.2 Conduct a significance test to determine whether the winners of the prizes were distributed throughout the classes as would be expected based on the percentage of students in each group. Report your Chi Square and p values. AQ104.12.10.3 What do you conclude? (relevant section)

AQ104.12.11. Some parents of the West Bay little leaguers think that they are noticing a pattern. There seems to be a relationship between the number on the kids’ jerseys and their position. These parents decide to record what they see. The hypothetical data appear below. Conduct a Chi Square test to determine if the parents’ suspicion that there is a relationship between jersey number and position is right. Report your Chi Square and p values. (relevant section)

Jersey   Infield   Outfield   Pitcher   Total
0-9      12        5          5         22
10-19    5         10         2         17
20+      4         4          7         15
Total    21        19         14        54

AQ104.12.12. True/false: A Chi Square distribution with 2 df has a larger mean than a Chi Square distribution with 12 df. (relevant section)

AQ104.12.13. True/false: A Chi Square test is often used to determine if there is a significant relationship between two continuous variables. (relevant section)

AQ104.12.14. True/false: Imagine that you want to determine if the spinner shown below is biased. You spin it 50 times and write down how many times the arrow lands in each section. You will reject the null hypothesis at the .05 level and determine that this spinner is biased if you calculate a Chi Square value of 7.82 or higher. (relevant section)

Questions from Case Studies:

The following question uses data from the SAT and GPA (SG) case study.

AQ104.12.15. Answer these items to determine if the math SAT scores are normally distributed. You may want to first standardize the scores. (relevant section)

AQ104.12.15.1 If these data were normally distributed, how many scores would you expect there to be in each of these brackets: (i) smaller than 1 SD below the mean, (ii) in between the mean and 1 SD below the mean, (iii) in between the mean and 1 SD above the mean, (iv) greater than 1 SD above the mean?

AQ104.12.15.2 How many scores are actually in each of these brackets?

AQ104.12.15.3 Conduct a Chi Square test to determine if the math SAT scores are normally distributed based on these expected and observed frequencies. (relevant section)

The following questions are from the Diet and Health (DH) case study.

AQ104.12.16. Conduct a Pearson Chi Square test to determine if there is any relationship between diet and outcome. Report the Chi Square and p values and state your conclusions. (relevant section)

The following questions are from ARTIST.

AQ104.12.17. A study compared members of a medical clinic who filed complaints with a random sample of members who did not complain. The study divided the complainers into two subgroups: those who filed complaints about medical treatment and those who filed nonmedical complaints. Here are the data on the total number in each group and the number who voluntarily left the medical clinic. Set up a two-way table. Analyze these data to see if there is a relationship between complaint (no, yes – medical, yes – nonmedical) and leaving the clinic (yes or no).

AQ104.12.18. Imagine that you believe there is a relationship between a person’s eye color and where he or she prefers to sit in a large lecture hall. You decide to collect data from a random sample of individuals and conduct a chi-square test of independence. What would your two-way table look like? Use the information to construct such a table, and be sure to label the different levels of each category.

AQ104.12.19. A geologist collects hand-specimen sized pieces of limestone from a particular area. A qualitative assessment of both texture and color is made with the following results. Is there evidence of association between color and texture for these limestones? Explain your answer.

AQ104.12.20. Suppose that college students are asked to identify their preferences in political affiliation (Democrat, Republican, or Independent) and in ice cream (chocolate, vanilla, or strawberry). Suppose that their responses are represented in the following two-way table (with some of the totals left for you to calculate).

AQ104.12.20.1 What proportion of the respondents prefer chocolate ice cream?

AQ104.12.20.2 What proportion of the respondents are Independents?

AQ104.12.20.3 What proportion of Independents prefer chocolate ice cream?

AQ104.12.20.4 What proportion of those who prefer chocolate ice cream are Independents?

AQ104.12.20.5 Analyze the data to determine if there is a relationship between political party preference and ice cream preference.

AQ104.12.21. NCAA collected data on graduation rates of athletes in Division I in the mid-1980s. Among 2,332 men, 1,343 had not graduated from college, and among 959 women, 441 had not graduated.

AQ104.12.21.1 Set up a two-way table to examine the relationship between gender and graduation.

AQ104.12.21.2 Identify a test procedure that would be appropriate for analyzing the relationship between gender and graduation. Carry out the procedure and state your conclusion.

AQ104.12.22. For the following data, how many ways could the data be arranged (including the original arrangement) so that the advantage of the Experimental Group mean over the Control Group mean is as large as or larger than in the original arrangement?

Experimental   Control
5              1
10             2
15             3
16             4
17             9

AQ104.12.23. For the data in Problem AQ104.12.22, how many ways can the data be rearranged?

AQ104.12.24. What is the one-tailed probability for a test of the difference?
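The counting logic behind a two-condition randomization test can be sketched as below. The data are hypothetical, with three scores per group, and are not the values from the problems above. Because group sizes are fixed, comparing sums of the experimental group is equivalent to comparing mean differences, and it avoids floating-point ties. The function name `randomization_test` is illustrative.

```python
from itertools import combinations
from math import comb

def randomization_test(experimental, control):
    """Exact one-tailed randomization test for two conditions.

    Returns (arrangements at least as extreme as observed,
             total arrangements, one-tailed probability).
    """
    pooled = experimental + control
    n_exp = len(experimental)
    observed_sum = sum(experimental)
    # Choose positions rather than values so duplicate scores are counted correctly.
    as_extreme = sum(
        1
        for idx in combinations(range(len(pooled)), n_exp)
        if sum(pooled[i] for i in idx) >= observed_sum
    )
    total = comb(len(pooled), n_exp)
    return as_extreme, total, as_extreme / total

# Hypothetical scores.
result = randomization_test([7, 9, 12], [3, 5, 8])
print(result)  # (2, 20, 0.1)
```

With these made-up scores, only two of the 20 possible assignments give the experimental group a sum of 28 or more, so the one-tailed probability is 2/20.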

AQ104.12.25. For the following data, how many ways can the data be rearranged?

T1   T2   Control
7    14   0
8    19   2
11   21   5

AQ104.12.26. In general, are rank randomization tests or randomization tests more powerful?

AQ104.12.27. What is the advantage of rank randomization tests over randomization tests?

AQ104.12.28. Test whether the difference among conditions for the data in Problem AQ104.12.22 is significant (one-tailed) at the .01 level using a rank randomization test.

Questions from Case Studies:

The following question uses data from the SAT and GPA case study.

AQ104.12.29. Compute Spearman’s ρ for the relationship between UGPA and SAT.
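Spearman’s ρ is Pearson’s r computed on ranks. A minimal sketch, assuming tied scores receive the average of the ranks they span, is shown below. The paired values are hypothetical and are not the case study data, and the function names are illustrative.

```python
import math

def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r applied to the ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical paired scores.
sat = [520, 580, 610, 640, 700]
gpa = [2.8, 3.1, 2.9, 3.6, 3.5]
rho = spearman_rho(sat, gpa)
print(f"rho = {rho:.2f}")
```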

The following question uses data from the Stereograms case study.

AQ104.12.30. Test the difference in central tendency between the two conditions using a rank-randomization test (with the normal approximation) with a one-tailed test. Give the Z and the p.

The following question uses data from the Smiles and Leniency case study.

AQ104.12.31. Test the difference in central tendency between the four conditions using a rank-randomization test (with the normal approximation). Give the Chi Square and the p.

AQ104.12.32. If the probability of a disease is .34 without treatment and .22 with treatment, then what is the

AQ104.12.32.1 absolute risk reduction
AQ104.12.32.2 relative risk reduction
AQ104.12.32.3 odds ratio
AQ104.12.32.4 number needed to treat
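The four effect-size measures for proportions can be sketched as follows, using hypothetical probabilities rather than the values in the question. One convention is assumed here and should be checked against the Proportions concept page: the odds ratio is taken as the odds with treatment divided by the odds without treatment. The function name `proportion_effect_sizes` is illustrative.

```python
def proportion_effect_sizes(p_control, p_treated):
    """Effect sizes for a difference between two proportions.

    p_control: probability of the outcome without treatment
    p_treated: probability of the outcome with treatment
    """
    arr = p_control - p_treated                   # absolute risk reduction
    rrr = arr / p_control                         # relative risk reduction
    odds_control = p_control / (1 - p_control)
    odds_treated = p_treated / (1 - p_treated)
    odds_ratio = odds_treated / odds_control      # assumed convention: treated / control
    nnt = 1 / arr                                 # number needed to treat
    return {"ARR": arr, "RRR": rrr, "OR": odds_ratio, "NNT": nnt}

# Hypothetical probabilities: .40 without treatment, .30 with treatment.
effects = proportion_effect_sizes(0.40, 0.30)
for name, value in effects.items():
    print(f"{name}: {value:.3f}")
```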

AQ104.12.33. When is it meaningful to compute the proportional difference between means?

AQ104.12.34. The mean for an experimental group is 12, the mean for the control group is 8, the MSE from the ANOVA is 16, and N, the number of observations, is 20. Compute g and d.
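A sketch of the standardized mean difference calculation follows, with hypothetical numbers rather than those in the question. It assumes the definitions used in the Difference between Means concept page as understood here: g is the mean difference divided by the square root of MSE, and d is g multiplied by the square root of N/(N − 2), where N is the total number of observations. Other texts define these statistics slightly differently, so treat the formulas as assumptions.

```python
import math

def hedges_g(mean_1, mean_2, mse):
    """g: mean difference in units of sqrt(MSE)."""
    return (mean_1 - mean_2) / math.sqrt(mse)

def cohens_d(g, n_total):
    """d from g, assuming d = g * sqrt(N / (N - 2)) with N total observations."""
    return g * math.sqrt(n_total / (n_total - 2))

# Hypothetical values: means of 10 and 7, MSE of 9, 12 observations in total.
g = hedges_g(10, 7, 9)
d = cohens_d(g, 12)
print(f"g = {g:.3f}, d = {d:.3f}")
```

Under these definitions d is always slightly larger in magnitude than g, and the two converge as N grows.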

AQ104.12.35. Two experiments investigated the same variables, but one of the experiments had subjects who differed greatly from each other whereas the subjects in the other experiment were relatively homogeneous. Which experiment would likely have the larger value of g?

AQ104.12.36. Why is ω² preferable to η²?

AQ104.12.37. What is the difference between η² and partial η²?

The following questions are from the Teacher Ratings case study.

AQ104.12.38. What are the values of d and g?

AQ104.12.39. What are the values of ω² and η²?

The following question is from the Smiles and Leniency case study.

AQ104.12.40. What are the values of ω² and η²?

The following question is from the Obesity and Bias case study.

AQ104.12.41. Compute ω² and partial ω² for the effect of “Weight” in a “Weight x Relatedness” ANOVA.

Page created by: Ian Clark, last modified on 16 May 2016.

Image: edX, Pragmatic Randomized Controlled Trials in Health Care, at https://www.edx.org/course/pragmatic-randomized-controlled-trials-kix-kipractihx-0, accessed 15 May 2016.