Describing Distributions

… a core topic in Quantitative Methods and Atlas104

Topic description

This topic examines the most common types of graphs and descriptive statistics for summarizing distributions.

The treatment of this topic on the Atlas follows almost precisely that in Chapter II, Graphing Distributions, and Chapter III, Summarizing Distributions, of OnlineStatBook, Online Statistics Education – An Interactive Multimedia Course of Study, http://onlinestatbook.com/2/index.html, accessed 14 May 2016.

Topic learning outcome

Familiarity with graphs and descriptive statistics, including the following core concepts and terms.

[Note: Until Atlas pages are created for individual concepts in Quantitative Methods, the links in the concepts below point directly to the relevant pages in OnlineStatBook.]

Core concepts associated with this topic

Read and/or watch the video for each of the concept pages above (top to bottom, starting in the left column).

Read the Statistical Literacy exercises “Are Commercial Vehicles in Texas Unsafe?” and “Linear By Design” and answer the questions.

Read the Angry Moods (AM) case study and complete the following exercises:

• Is there a difference in how much males and females use aggressive behavior to improve an angry mood? For the “Anger-Out” scores:

a. Create parallel box plots. (relevant section)

b. Create back-to-back stem and leaf displays. (You may have trouble finding a computer program that does this, so you may have to do it by hand.) (relevant section)

• Create parallel box plots for the Anger-In scores by sports participation. (relevant section)
• Plot a histogram of the distribution of the Control-Out scores. (relevant section)
• Create a bar graph comparing the mean Control-In score for the athletes and the non-athletes. What would be a better way to display this data? (relevant section)
• Plot parallel box plots of the Anger Expression Index by sports participation. Does it look like there are any outliers? Which group reported expressing more anger? (relevant section)
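Since few programs draw back-to-back stem and leaf displays, one option is to build them as text. The Python sketch below assumes non-negative integer scores, uses the tens digit as the stem and the units digit as the leaf, and runs on made-up scores for two hypothetical groups (not the Angry Moods data); `stem_leaf_groups` and `back_to_back` are illustrative helpers, not part of any library.

```python
from collections import defaultdict


def stem_leaf_groups(scores):
    """Group non-negative integer scores into stems (tens) and sorted leaves (units)."""
    groups = defaultdict(list)
    for score in scores:
        stem, leaf = divmod(score, 10)
        groups[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in groups.items()}


def back_to_back(left_scores, right_scores):
    """Return the text lines of a back-to-back stem and leaf display.

    Left-hand leaves are reversed so that both sides grow outward from the stem.
    """
    left = stem_leaf_groups(left_scores)
    right = stem_leaf_groups(right_scores)
    all_stems = set(left) | set(right)
    stems = range(min(all_stems), max(all_stems) + 1)  # keep empty stems visible
    left_text = {s: "".join(str(d) for d in reversed(left.get(s, []))) for s in stems}
    width = max(len(text) for text in left_text.values())
    lines = []
    for s in stems:
        right_text = "".join(str(d) for d in right.get(s, []))
        lines.append(f"{left_text[s]:>{width}} | {s} | {right_text}".rstrip())
    return lines


# Hypothetical scores for two groups (not the case-study data).
group_a = [18, 21, 23, 23, 30, 35]
group_b = [12, 19, 24, 27, 31]
for line in back_to_back(group_a, group_b):
    print(line)
```

Each row reads outward from the stem: on the left, `331 | 2` stands for 23, 23, and 21.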

Read the Flatulence (F) case study and complete the following exercises:

• Plot a histogram of the variable “per day.” (relevant section)
• Based on a histogram of the variable “perday”, do you think the mean or median of this variable is larger? Calculate the mean and median to see if you are right. (relevant section & relevant section)
• Create parallel box plots of “how long” as a function of gender. Why is the 25th percentile not showing? What can you say about the results? (relevant section)
• Create a stem and leaf plot of the variable “how long.” What can you say about the shape of the distribution? (relevant section)
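Without a plotting package, the bin counts behind a histogram can still be computed and drawn as rows of asterisks. This Python sketch assumes equal-width bins that include their lower edge and exclude their upper edge; the bin start and width are choices you make, `histogram_counts` is an illustrative helper, and the sample values are invented (not the Flatulence data).

```python
import math
import statistics


def histogram_counts(values, start, width):
    """Count values in equal-width bins [start, start + width), [start + width, ...), ...

    Assumes every value is >= start. Returns (lower_edge, count) pairs.
    """
    n_bins = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[math.floor((v - start) / width)] += 1
    return [(start + i * width, c) for i, c in enumerate(counts)]


# Hypothetical counts with a long right tail (not the case-study data).
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7, 9, 14]
for edge, count in histogram_counts(sample, start=0, width=3):
    print(f"{edge:>3} | {'*' * count}")

# A long right tail typically pulls the mean above the median.
print("mean:", statistics.mean(sample), "median:", statistics.median(sample))
```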

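Read the Stroop (S) case study and complete the following exercises:

• Compute the mean for “words”. (relevant section)
• Compute the mean and standard deviation for “colors”. (relevant section & relevant section)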

Read the Physicians’ Reactions (PR) case study and answer the following questions.

• Create box plots comparing the time expected to be spent with the average-weight and overweight patients. (relevant section)
• What is the mean expected time spent for the average-weight patients? What is the mean expected time spent for the overweight patients? (relevant section)
• What is the difference in means between the groups? By approximately how many standard deviations do the means differ? (relevant section & relevant section)
• Plot histograms of the time spent with the average-weight and overweight patients. (relevant section)
• To which group does the patient with the highest expected time belong?

Read the Smiles and Leniency (SL) case study and answer the following questions:

• Create parallel box plots for the four conditions. (relevant section)
• Find the mean, median, standard deviation, and interquartile range for the leniency scores of each of the four groups. (relevant section & relevant section)
• Create back-to-back stem and leaf displays for the false smile and neutral conditions. (It may be hard to find a computer program to do this for you, so be prepared to do it by hand.) (relevant section)

Read the ADHD Treatment (AT) case study and complete the following exercises:

• Create a line graph of the data. Do certain dosages appear to be more effective than others? (relevant section)
• What is the mean number of correct responses of the participants after taking the placebo (0 mg/kg)? (relevant section)
• Create a stem and leaf plot of the number of correct responses of the participants after taking the placebo (d0 variable). What can you say about the shape of the distribution? (relevant section)
• What are the standard deviation and the interquartile range of the d0 condition? (relevant section)
• Create box plots for the four conditions. You may have to rearrange the data to get a computer program to create the box plots.
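The rearrangement usually means going from “wide” data (one row per participant, one column per dosage) to “long” data (one row per participant and condition). A minimal Python sketch, assuming the four columns are named `d0`, `d15`, `d30`, and `d60`; only `d0` appears in the exercises above, so the other three names, like all the scores, are hypothetical, as are the helper names.

```python
def wide_to_long(rows, condition_columns):
    """Turn one-row-per-participant records into one record per (participant, condition)."""
    long_rows = []
    for participant, row in enumerate(rows, start=1):
        for column in condition_columns:
            long_rows.append({"participant": participant, "condition": column, "score": row[column]})
    return long_rows


def group_by_condition(long_rows):
    """Collect the scores for each condition (a grouped form many box-plot routines accept)."""
    groups = {}
    for record in long_rows:
        groups.setdefault(record["condition"], []).append(record["score"])
    return groups


# Hypothetical scores; only the d0 column name comes from the exercise text.
wide = [
    {"d0": 40, "d15": 44, "d30": 47, "d60": 46},
    {"d0": 31, "d15": 35, "d30": 39, "d60": 38},
]
long_format = wide_to_long(wide, ["d0", "d15", "d30", "d60"])
print(long_format[:4])
print(group_by_condition(long_format))
```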

Read the SAT and College GPA case study and answer the following question:

• Create histograms and stem and leaf displays of both high-school grade point average and university grade point average. In what way(s) do the distributions differ?

Assessment questions

AQ104.02.01. Name some ways to graph quantitative variables and some ways to graph qualitative variables. (relevant section & relevant section)

AQ104.02.02. Based on the frequency polygon displayed below, the most common test grade was around what score? Explain. (relevant section)

AQ104.02.03. An experiment compared the ability of three groups of participants to remember briefly-presented chess positions. The data are shown below. The numbers represent the total number of pieces correctly remembered from three chess positions. Create side-by-side box plots for these three groups. What can you say about the differences between these groups from the box plots? (relevant section)

Non-players   Beginners   Tournament players
22.1          32.5        40.1
22.3          37.1        45.6
26.2          39.1        51.2
29.6          40.5        56.4
31.7          45.5        58.1
33.5          51.3        71.1
38.9          52.6        74.9
39.7          55.7        75.9
43.2          55.9        80.3
43.2          57.7        85.3
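Before drawing the box plots for AQ104.02.03, it can help to compute the numbers they are built from. This Python sketch approximates the hinges by the 25th and 75th percentiles from `statistics.quantiles`, whose default method interpolates at rank P/100 × (N + 1); other programs use other percentile rules, so their hinges may differ slightly. Scores more than 1.5 H-spreads beyond a hinge are listed as outside values. `box_plot_summary` is an illustrative helper.

```python
import statistics


def box_plot_summary(scores):
    """Return the values a box plot is drawn from.

    Hinges are approximated by the 25th and 75th percentiles
    (statistics.quantiles with its default 'exclusive', (N + 1)-based method).
    """
    data = sorted(scores)
    lower_hinge, median, upper_hinge = statistics.quantiles(data, n=4)
    step = 1.5 * (upper_hinge - lower_hinge)  # 1.5 x H-spread
    lower_fence, upper_fence = lower_hinge - step, upper_hinge + step
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "lower_hinge": lower_hinge,
        "median": median,
        "upper_hinge": upper_hinge,
        "lower_adjacent": min(inside),  # whisker ends
        "upper_adjacent": max(inside),
        "outside": [x for x in data if x < lower_fence or x > upper_fence],
    }


chess = {
    "Non-players": [22.1, 22.3, 26.2, 29.6, 31.7, 33.5, 38.9, 39.7, 43.2, 43.2],
    "Beginners": [32.5, 37.1, 39.1, 40.5, 45.5, 51.3, 52.6, 55.7, 55.9, 57.7],
    "Tournament players": [40.1, 45.6, 51.2, 56.4, 58.1, 71.1, 74.9, 75.9, 80.3, 85.3],
}
for group, scores in chess.items():
    print(group, box_plot_summary(scores))
```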

AQ104.02.04. You have to decide between displaying your data with a histogram or with a stem and leaf display. What factor(s) would affect your choice? (relevant section & relevant section)

AQ104.02.05. In a box plot, what percent of the scores are between the lower and upper hinges? (relevant section)

AQ104.02.06. A student has decided to display the results of his project on the number of hours people in various countries slept per night. He compared the sleeping patterns of people from the US, Brazil, France, Turkey, China, Egypt, Canada, Norway, and Spain. He was planning on using a line graph to display this data. AQ104.02.06.1 Is a line graph appropriate? AQ104.02.06.2 What might be a better choice for a graph? (relevant section & relevant section)

AQ104.02.07. For the data from the 1977 Stat. and Biom. 200 class for eye color, construct: (relevant section)

AQ104.02.07.1 pie graph

AQ104.02.07.2 horizontal bar graph

AQ104.02.07.3 vertical bar graph

AQ104.02.07.4 a frequency table with the relative frequency of each eye color

Eye Color   Number of students
Brown       11
Blue        10
Green       4
Gray        1

(Question submitted by J. Warren, UNH)
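For AQ104.02.07.4, each relative frequency is a category's count divided by the total number of students (11 + 10 + 4 + 1 = 26). A short Python sketch that builds the table and, as a crude text stand-in for a horizontal bar graph, prints one `#` per student; it uses only the counts given in the question, and the names are illustrative.

```python
eye_colors = {"Brown": 11, "Blue": 10, "Green": 4, "Gray": 1}


def relative_frequencies(counts):
    """Return each category's count divided by the total count."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


shares = relative_frequencies(eye_colors)
print(f"{'Eye color':<10}{'Count':>6}{'Relative':>10}")
for color, n in eye_colors.items():
    # One '#' per student doubles as a rough horizontal bar graph.
    print(f"{color:<10}{n:>6}{shares[color]:>10.3f}  {'#' * n}")
```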

AQ104.02.08. A graph appears below showing the number of adults and children who prefer each type of soda. There were 130 adults and kids surveyed. Discuss some ways in which the graph below could be improved. (relevant section)

AQ104.02.09.1 Which of the box plots below has a large positive skew? AQ104.02.09.2 Which has a large negative skew? (relevant section & relevant section)

Questions from Case Studies:

The following questions are from the Angry Moods (AM) case study.

AQ104.02.10.1 (AM#6) Is there a difference in how much males and females use aggressive behavior to improve an angry mood?

AQ104.02.11. For the “Anger-Out” scores:

AQ104.02.11.1 Create parallel box plots. (relevant section)

AQ104.02.11.2 Create back-to-back stem and leaf displays. (You may have trouble finding a computer program that does this, so you may have to do it by hand.) (relevant section)

AQ104.02.12. (AM#9) Create parallel box plots for the Anger-In scores by sports participation. (relevant section)

AQ104.02.13. (AM#11) Plot a histogram of the distribution of the Control-Out scores. (relevant section)

AQ104.02.14.1 (AM#14) Create a bar graph comparing the mean Control-In score for the athletes and the non-athletes. AQ104.02.14.2 What would be a better way to display this data? (relevant section)

AQ104.02.15.1 (AM#18) Plot parallel box plots of the Anger Expression Index by sports participation. AQ104.02.15.2 Does it look like there are any outliers? AQ104.02.15.3 Which group reported expressing more anger? (relevant section)

The following questions are from the Flatulence (F) case study.

AQ104.02.16. (F#1) Plot a histogram of the variable “per day.” (relevant section)

AQ104.02.17.1 (F#7) Create parallel box plots of “how long” as a function of gender. AQ104.02.17.2 Why is the 25th percentile not showing? AQ104.02.17.3 What can you say about the results? (relevant section)

AQ104.02.18.1 (F#9) Create a stem and leaf plot of the variable “how long.” AQ104.02.18.2 What can you say about the shape of the distribution? (relevant section)

The following questions are from the Physicians’ Reactions (PR) case study.

AQ104.02.19. (PR#1) Create box plots comparing the time expected to be spent with the average-weight and overweight patients. (relevant section)

AQ104.02.20. (PR#4) Plot histograms of the time spent with the average-weight and overweight patients. (relevant section)

AQ104.02.21. (PR#5) To which group does the patient with the highest expected time belong?

The following questions are from the Smiles and Leniency (SL) case study.

AQ104.02.22. (SL#1) Create parallel box plots for the four conditions. (relevant section)

AQ104.02.23. (SL#3) Create back-to-back stem and leaf displays for the false smile and neutral conditions. (It may be hard to find a computer program to do this for you, so be prepared to do it by hand.) (relevant section)

The following questions are from the ADHD Treatment (AT) case study.

AQ104.02.24.1 (AT#3) Create a line graph of the data. AQ104.02.24.2 Do certain dosages appear to be more effective than others? (relevant section)

AQ104.02.25.1 (AT#5) Create a stem and leaf plot of the number of correct responses of the participants after taking the placebo (d0 variable). AQ104.02.25.2 What can you say about the shape of the distribution? (relevant section)

AQ104.02.26. Create box plots for the four conditions. You may have to rearrange the data to get a computer program to create the box plots.

The following question is from the SAT and College GPA case study.

AQ104.02.27.1 Create histograms and stem and leaf displays of both high-school grade point average and university grade point average. AQ104.02.27.2 In what way(s) do the distributions differ?

AQ104.02.28. The April 10th issue of the Journal of the American Medical Association reports a study on the effects of anti-depressants. The study involved 340 subjects who were being treated for major depression. The subjects were randomly assigned to receive one of three treatments: St. John’s wort (an herb), Zoloft (Pfizer’s cousin of Lilly’s Prozac) or placebo for an 8-week period. The following are the mean scores (approximately) for the three groups of subjects over the eight-week experiment. The first column is the baseline. Lower scores mean less depression. Create a graph to display these means.

Placebo   22.5   19.1   17.9   17.1   16.2   15.1   12.1   12.3
Wort      23     20.2   18.2   18     16.5   16.1   14.2   13
Zoloft    22.4   19.2   16.6   15.5   14.2   13.1   11.8   10.5

The following questions are from an outside source (Visit the site).

AQ104.02.29. For the graph below, which shows the heights of singers in a large chorus, write a complete description of the histogram. Be sure to comment on all the important features.

AQ104.02.30. Pretend you are constructing a histogram to describe the distribution of salaries for individuals who are 40 years or older but not yet retired. AQ104.02.30.1 What is on the Y-axis? Explain. AQ104.02.30.2 What is on the X-axis? AQ104.02.30.3 What would be the probable shape of the salary distribution? AQ104.02.30.4 Explain why.

AQ104.02.31. Make up a dataset of 12 numbers with a positive skew. Use a statistical program to compute the skew. AQ104.02.31.1 Is the mean larger than the median, as it usually is for distributions with a positive skew? AQ104.02.31.2 What is the value of the skew? (relevant section & relevant section)

AQ104.02.32. Repeat AQ104.02.31, but this time make the dataset have a negative skew. (relevant section & relevant section)
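To check answers to AQ104.02.31 and AQ104.02.32, skew can be computed directly. This Python sketch uses the population formula Σ(X − μ)³ / (Nσ³); many statistical programs apply a small-sample adjustment, so their values will differ somewhat, although the sign agrees. The two 12-number datasets are invented, the second being a mirror image of the first.

```python
import statistics


def skew(scores):
    """Population third-moment skew: sum((x - mean)**3) / (N * sigma**3)."""
    n = len(scores)
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores)
    return sum((x - mu) ** 3 for x in scores) / (n * sigma ** 3)


# Hypothetical 12-number datasets: a long right tail, then its mirror image.
positive = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7, 9, 14]
negative = [15 - x for x in positive]

for name, data in [("positive", positive), ("negative", negative)]:
    print(name, round(skew(data), 3),
          "mean > median:", statistics.fmean(data) > statistics.median(data))
```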

AQ104.02.33. Make up three data sets with 5 numbers each that have:

AQ104.02.33.1 the same mean but different standard deviations.
AQ104.02.33.2 the same mean but different medians.
AQ104.02.33.3 the same median but different means.
(relevant section & relevant section)

AQ104.02.34. Find the mean and median for the following three variables:
(relevant section)

A B C
8 4 6
5 4 2
7 6 3
1 3 4
3 4 1
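Hand calculations for AQ104.02.34 can be checked with Python's standard `statistics` module. This sketch simply enters the three columns above as lists; the variable names are illustrative.

```python
import statistics

variables = {
    "A": [8, 5, 7, 1, 3],
    "B": [4, 4, 6, 3, 4],
    "C": [6, 2, 3, 4, 1],
}
for name, values in variables.items():
    # statistics.median sorts a copy internally, so the lists need not be sorted.
    print(name, "mean =", statistics.mean(values), "median =", statistics.median(values))
```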

AQ104.02.35. A sample of 30 distance scores measured in yards has a mean of 7, a variance of 16, and a standard deviation of 4. AQ104.02.35.1 You want to convert all your distances from yards to feet, so you multiply each score in the sample by 3. What are the new mean, variance, and standard deviation? AQ104.02.35.2 You then decide that you only want to look at the distance past a certain point. Thus, after multiplying the original scores by 3, you decide to subtract 4 feet from each of the scores. Now what are the new mean, variance, and standard deviation? (relevant section)
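The rules behind this question: if every score X is transformed into Y = aX + b, the mean becomes a·mean + b, the variance becomes a²·variance, and the standard deviation becomes |a|·SD, so adding or subtracting a constant moves the center but leaves the spread unchanged. The Python sketch below checks these rules on a small made-up sample (not the 30 distance scores) with a = 3 and b = -4; `transformed_summary` is a hypothetical helper.

```python
import statistics


def transformed_summary(mean, variance, a, b):
    """Mean, variance, and SD of Y = a*X + b, given the mean and variance of X."""
    return a * mean + b, a ** 2 * variance, abs(a) * variance ** 0.5


# Hypothetical sample; population (N-denominator) formulas are used here,
# but the same rules hold for the N - 1 versions.
x = [2, 5, 6, 11]
y = [3 * value - 4 for value in x]  # multiply by 3, then subtract 4
predicted = transformed_summary(statistics.fmean(x), statistics.pvariance(x), a=3, b=-4)
observed = (statistics.fmean(y), statistics.pvariance(y), statistics.pstdev(y))
print("predicted:", predicted)
print("observed: ", observed)
```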

AQ104.02.36. You recorded the time in seconds it took for 8 participants to solve a puzzle. These times appear below. However, when the data was entered into the statistical program, the score that was supposed to be 22.1 was entered as 21.2. You had calculated the following measures of central tendency: the mean, the median, and the mean trimmed 25%. Which of these measures of central tendency will change when you correct the recording error? (relevant section & relevant section)

15.2
18.8
19.3
19.7
20.2
21.8
22.1
29.4

AQ104.02.37. For the test scores in AQ104.02.36, which measures of variability (range, standard deviation, variance) would be changed if the 22.1 data point had been erroneously recorded as 21.2? (relevant section)
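One way to check AQ104.02.36 and AQ104.02.37 is to compute every measure for both versions of the data and compare. This Python sketch assumes that a mean “trimmed 25%” discards 25% of the scores in total, half from each end (here one score at each end); some software instead trims the stated percentage from each end. Variance and standard deviation use the N − 1 denominator, and the helper names are illustrative.

```python
import statistics


def trimmed_mean(scores, total_proportion):
    """Mean after discarding total_proportion of the scores, half from each end."""
    data = sorted(scores)
    k = int(len(data) * total_proportion / 2)  # scores dropped from each end
    return statistics.fmean(data[k:len(data) - k])


def summarize(scores):
    return {
        "mean": statistics.fmean(scores),
        "median": statistics.median(scores),
        "trimmed_25": trimmed_mean(scores, 0.25),
        "range": max(scores) - min(scores),
        "variance": statistics.variance(scores),  # N - 1 denominator
        "sd": statistics.stdev(scores),
    }


correct = [15.2, 18.8, 19.3, 19.7, 20.2, 21.8, 22.1, 29.4]
entered = [21.2 if x == 22.1 else x for x in correct]  # the recording error

before, after = summarize(entered), summarize(correct)
for measure in before:
    changed = abs(before[measure] - after[measure]) > 1e-9
    print(f"{measure:<11} {before[measure]:8.3f} {after[measure]:8.3f} changed={changed}")
```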

AQ104.02.38. You know the minimum, the maximum, and the 25th, 50th, and 75th percentiles of a distribution. Which of the following measures of central tendency or variability can you determine?
(relevant section, relevant section & relevant section)

mean, median, mode, trimean, geometric mean,
range, interquartile range, variance, standard deviation

AQ104.02.39. For the numbers 1, 3, 4, 6, and 12:

AQ104.02.39.1 Find the value (v) for which Σ(X − v)² is minimized.

AQ104.02.39.2 Find the value (v) for which Σ|X − v| is minimized.
(relevant section)
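A brute-force check for AQ104.02.39: evaluate both sums over a fine grid of candidate values and keep the one that gives the smallest sum. The grid (0 to 13 in steps of 0.01) is an arbitrary choice for this Python sketch; the minimizers it finds can then be compared with the mean and the median of the five numbers.

```python
values = [1, 3, 4, 6, 12]


def sum_sq(v):
    """Sum of squared deviations from v."""
    return sum((x - v) ** 2 for x in values)


def sum_abs(v):
    """Sum of absolute deviations from v."""
    return sum(abs(x - v) for x in values)


# Candidate values 0.00, 0.01, ..., 13.00.
grid = [i / 100 for i in range(0, 1301)]
best_sq = min(grid, key=sum_sq)
best_abs = min(grid, key=sum_abs)
print("minimizes squared deviations:", best_sq)
print("minimizes absolute deviations:", best_abs)
```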

AQ104.02.40. Your younger brother comes home one day after taking a science test. He says that someone at school told him that “60% of the students in the class scored above the median test grade.” AQ104.02.40.1 What is wrong with this statement? AQ104.02.40.2 What if he said “60% of the students scored below the mean?” (relevant section)

AQ104.02.41. An experiment compared the ability of three groups of participants to remember briefly-presented chess positions. The data are shown below. The numbers represent the number of pieces correctly remembered from three chess positions. Compare the performance of each group. Consider spread as well as central tendency. (relevant section, relevant section & relevant section)

Non-players   Beginners   Tournament players
22.1          32.5        40.1
22.3          37.1        45.6
26.2          39.1        51.2
29.6          40.5        56.4
31.7          45.5        58.1
33.5          51.3        71.1
38.9          52.6        74.9
39.7          55.7        75.9
43.2          55.9        80.3
43.2          57.7        85.3
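For AQ104.02.41, a Python sketch that puts center and spread side by side for each group, using the chess data given in the question. The quartiles come from `statistics.quantiles` with its default method and the standard deviation uses the N − 1 denominator; both are choices, and other programs may report slightly different quartiles. `describe` is an illustrative helper.

```python
import statistics

chess = {
    "Non-players": [22.1, 22.3, 26.2, 29.6, 31.7, 33.5, 38.9, 39.7, 43.2, 43.2],
    "Beginners": [32.5, 37.1, 39.1, 40.5, 45.5, 51.3, 52.6, 55.7, 55.9, 57.7],
    "Tournament players": [40.1, 45.6, 51.2, 56.4, 58.1, 71.1, 74.9, 75.9, 80.3, 85.3],
}


def describe(scores):
    """Central tendency and spread for one group."""
    q1, median, q3 = statistics.quantiles(scores, n=4)  # default (N + 1)-based method
    return {
        "mean": statistics.fmean(scores),
        "median": median,
        "sd": statistics.stdev(scores),  # N - 1 denominator
        "iqr": q3 - q1,
        "range": max(scores) - min(scores),
    }


for group, scores in chess.items():
    stats = describe(scores)
    print(group, {name: round(value, 2) for name, value in stats.items()})
```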

AQ104.02.42. True/False: A bimodal distribution has two modes and two medians. (relevant section)

AQ104.02.43. True/False: The best way to describe a skewed distribution is to report the mean. (relevant section)

AQ104.02.44. True/False: When plotted on the same graph, a distribution with a mean of 50 and a standard deviation of 10 will look more spread out than will a distribution with a mean of 60 and a standard deviation of 5. (relevant section)

AQ104.02.45. Compare the mean, median, and trimean in terms of their sensitivity to extreme scores. (relevant section)
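A small experiment makes this comparison concrete: compute each measure, then replace the largest score with an extreme one and compute again. The data in this Python sketch are made up, the trimean is taken as (P25 + 2·P50 + P75)/4, and its percentiles come from `statistics.quantiles` with the default method, which is one of several percentile definitions.

```python
import statistics


def trimean(scores):
    """(P25 + 2*P50 + P75) / 4, with percentiles from statistics.quantiles' default method."""
    p25, p50, p75 = statistics.quantiles(scores, n=4)
    return (p25 + 2 * p50 + p75) / 4


def centers(scores):
    return {
        "mean": statistics.fmean(scores),
        "median": statistics.median(scores),
        "trimean": trimean(scores),
    }


# Hypothetical data, then the same data with the largest score made extreme.
base = [2, 3, 4, 5, 6, 7, 8, 9]
extreme = base[:-1] + [90]
print("base:   ", centers(base))
print("extreme:", centers(extreme))
```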

AQ104.02.46. If the mean time to respond to a stimulus is much higher than the median time to respond, what can you say about the shape of the distribution of response times? (relevant section)

AQ104.02.47. A set of numbers is transformed by taking the log base 10 of each number. The mean of the transformed data is 1.65. What is the geometric mean of the untransformed data? (relevant section)
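The geometric mean is the back-transformed mean of the logs: with base-10 logs, it equals 10 raised to the mean of the log10 values. This Python sketch checks that identity against `statistics.geometric_mean` (available since Python 3.8) on invented positive values; `geometric_mean_via_logs` is an illustrative helper.

```python
import math
import statistics


def geometric_mean_via_logs(values):
    """Back-transform the mean of base-10 logs: 10 ** mean(log10(x)). Values must be positive."""
    return 10 ** statistics.fmean([math.log10(v) for v in values])


data = [2, 8, 4, 16]  # hypothetical positive values
print(geometric_mean_via_logs(data), statistics.geometric_mean(data))
```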

AQ104.02.48. Which measure of central tendency is most often used for returns on investment?

AQ104.02.49. The histogram is in balance on the fulcrum. What are the mean, median, and mode of the distribution (approximate where necessary)?

Questions from Case Studies:

The following questions are from the Angry Moods (AM) case study.

AQ104.02.50. (AM#4) Does Anger-Out have a positive skew, a negative skew, or no skew? (relevant section)

AQ104.02.51.1 (AM#8) What is the range of the Anger-In scores? AQ104.02.51.2 What is the interquartile range? (relevant section)

AQ104.02.52.1 (AM#12) What is the overall mean Control-Out score? AQ104.02.52.2 What is the mean Control-Out score for the athletes? AQ104.02.52.3 What is the mean Control-Out score for the non-athletes? (relevant section)

AQ104.02.53.1 (AM#15) What is the variance of the Control-In scores for the athletes? AQ104.02.53.2 What is the variance of the Control-In scores for the non-athletes? (relevant section)

The following question is from the Flatulence (F) case study.

AQ104.02.54.1 (F#2) Based on a histogram of the variable “perday”, do you think the mean or median of this variable is larger? AQ104.02.54.2 Calculate the mean and median to see if you are right. (relevant section & relevant section)

The following questions are from the Stroop (S) case study.

AQ104.02.55. (S#1) Compute the mean for “words”. (relevant section)

AQ104.02.56. (S#2) Compute the mean and standard deviation for “colors”. (relevant section & relevant section)

The following questions are from the Physicians’ Reactions (PR) case study.

AQ104.02.57.1 (PR#2) What is the mean expected time spent for the average-weight patients? AQ104.02.57.2 What is the mean expected time spent for the overweight patients? (relevant section)

AQ104.02.58.1 (PR#3) What is the difference in means between the groups? AQ104.02.58.2 By approximately how many standard deviations do the means differ? (relevant section & relevant section)

The following question is from the Smiles and Leniency (SL) case study.

AQ104.02.59. (SL#2) Find the mean, median, standard deviation, and interquartile range for the leniency scores of each of the four groups. (relevant section & relevant section)

The following questions are from the ADHD Treatment (AT) case study.

AQ104.02.60. (AT#4) What is the mean number of correct responses of the participants after taking the placebo (0 mg/kg)? (relevant section)

AQ104.02.61. (AT#7) What are the standard deviation and the interquartile range of the d0 condition? (relevant section)