Regression

… a core topic in Quantitative Methods and Atlas104

Topic description

This topic examines the statistical technique of regression.

The treatment of this topic on the Atlas follows almost precisely that in Chapter XIV, Regression, of OnlineStatBook, Online Statistics Education – An Interactive Multimedia Course of Study, http://onlinestatbook.com/2/index.html, accessed 15 May 2016.

Topic learning outcome

Familiarity with the application of regression to statistical problems, including the following core concepts and terms.

[Note: Until Atlas pages are created for individual concepts in Quantitative Methods, the links in the concepts below point directly to the relevant pages in OnlineStatBook.]

Core concepts associated with this topic

Simple Linear Regression

Linear Fit Demo

Partitioning Sums of Squares

Standard Error of the Estimate

Inferential Statistics for b and r

Influential Observations

Regression Toward the Mean

Introduction to Multiple Regression

Readings

Read and/or watch the video for each of the concept pages above, working from top to bottom.

Read the Statistical Literacy exercise, Regression Toward the Mean in American Football, and answer the question.

Read the Angry Moods (AM) case study and complete the following exercises:

  • Find the regression line for predicting Anger-Out from Control-Out. What is the slope? What is the intercept?
  • Is the relationship at least approximately linear? Test to see if the slope is significantly different from 0.
  • What is the standard error of the estimate?
    (relevant section, relevant section, relevant section)

Read the SAT and GPA (SG) case study and answer the following questions:

  • Find the regression line for predicting the overall university GPA from the high school GPA. What is the slope? What is the y-intercept?
  • If someone had a 2.2 GPA in high school, what is the best estimate of his or her college GPA?
  • If someone had a 4.0 GPA in high school, what is the best estimate of his or her college GPA? (relevant section)

Read the Driving (D) case study and answer the following questions:

  • What is the correlation between age and how often the person chooses to drive in inclement weather? Is this correlation statistically significant at the .01 level? Are older people more or less likely to report that they drive in inclement weather? (relevant section, relevant section)
  • What is the correlation between how often a person chooses to drive in inclement weather and the percentage of accidents the person believes occur in inclement weather? Is this correlation significantly different from 0? (relevant section, relevant section)
  • Use linear regression to predict how often someone rides public transportation in inclement weather from what percentage of accidents that person thinks occur in inclement weather. (Pubtran by Accident) Create a scatter plot of this data and add a regression line. What is the slope? What is the intercept? Is the relationship at least approximately linear? Test if the slope is significantly different from 0. Comment on possible assumption violations for the test of the slope. What is the standard error of the estimate?
    (relevant section, relevant section, relevant section)

Assessment questions

From http://onlinestatbook.com/2/regression/regression_exercises.html, accessed 15 May 2016.

AQ104.10.01.1 What is the equation for a regression line? AQ104.10.01.2 What does each term in the line refer to? (relevant section)

AQ104.10.02. The formula for a regression equation based on a sample size of 25 observations is Y’ = 2X + 9. AQ104.10.02.1 What would be the predicted score for a person scoring 6 on X? AQ104.10.02.2 If someone’s predicted score was 14, what was this person’s score on X? (relevant section)
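A minimal Python sketch of the arithmetic, assuming only the line Y’ = 2X + 9 given in the question; `predict` and `solve_for_x` are hypothetical helper names, not taken from OnlineStatBook.

```python
def predict(x: float, b: float, a: float) -> float:
    """Predicted score Y' = bX + A for a given X."""
    return b * x + a


def solve_for_x(y_pred: float, b: float, a: float) -> float:
    """Invert Y' = bX + A to recover the X that produces a given predicted score."""
    if b == 0:
        raise ValueError("a flat line (b = 0) cannot be inverted")
    return (y_pred - a) / b


# Regression line from the question: Y' = 2X + 9
B, A = 2.0, 9.0
print(predict(6, B, A))       # predicted score for X = 6
print(solve_for_x(14, B, A))  # X whose predicted score is 14
```

The sample size of 25 plays no part in either calculation; it matters only for inferential statistics such as tests of the slope.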

AQ104.10.03. What criterion is used for deciding which regression line fits best? (relevant section)
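A small Python sketch of the quantity that least-squares fitting minimizes, the sum of squared errors of prediction. It borrows the five X, Y pairs listed under AQ104.10.06 purely as illustration and checks that nudging the least-squares slope or intercept makes that sum larger; the helper names are hypothetical.

```python
def sum_squared_error(xs, ys, b, a):
    """Sum of squared prediction errors, sum((Y - Y')^2), for the line Y' = bX + A."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Slope and intercept that minimize the sum of squared errors."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of cross-products
    ssx = sum((x - mx) ** 2 for x in xs)                    # sum of squares of X
    b = sp / ssx
    return b, my - b * mx


xs = [2, 4, 4, 5, 6]
ys = [5, 6, 7, 11, 12]
b, a = least_squares_line(xs, ys)
best = sum_squared_error(xs, ys, b, a)

# Moving the slope or intercept in either direction increases the error.
for db, da in [(0.1, 0), (-0.1, 0), (0, 0.5), (0, -0.5)]:
    print(db, da, sum_squared_error(xs, ys, b + db, a + da) > best)
```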

AQ104.10.04.1 What does the standard error of the estimate measure? AQ104.10.04.2 What is the formula for the standard error of the estimate? (relevant section)
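For reference, the formulas as usually written in this notation, where Y’ is the predicted score and N the number of pairs. The first describes a population; the second is the sample estimate, which divides by N − 2:

```latex
\sigma_{\text{est}} = \sqrt{\frac{\sum (Y - Y')^2}{N}}
\qquad
s_{\text{est}} = \sqrt{\frac{\sum (Y - Y')^2}{N - 2}}
```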

AQ104.10.05.1 In a regression analysis, the sum of squares for the predicted scores is 100 and the sum of squares error is 200. What is R²? AQ104.10.05.2 In a different regression analysis, 40% of the variance was explained. The sum of squares total is 1000. AQ104.10.05.3 What is the sum of squares of the predicted values? (relevant section)
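Both parts follow from R² = SSY’/SSY together with the partition SSY = SSY’ + SSE. A minimal Python sketch of that arithmetic; the function names are illustrative only.

```python
def r_squared(ss_predicted: float, ss_error: float) -> float:
    """Proportion of variance explained: SSY' / SSY, where SSY = SSY' + SSE."""
    ss_total = ss_predicted + ss_error
    return ss_predicted / ss_total


def ss_predicted_from_r2(r2: float, ss_total: float) -> float:
    """Sum of squares of the predicted values: SSY' = R^2 * SSY."""
    return r2 * ss_total


print(r_squared(100, 200))               # AQ104.10.05.1
print(ss_predicted_from_r2(0.40, 1000))  # AQ104.10.05.2-3
```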

AQ104.10.06. For the X,Y data below, compute:

AQ104.10.06.1 r and determine if it is significantly different from zero.
AQ104.10.06.2 the slope of the regression line and test if it differs significantly from zero.
AQ104.10.06.3 the 95% confidence interval for the slope.
(relevant section)

X Y
2 5
4 6
4 7
5 11
6 12
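A minimal Python sketch of the three computations, using only the standard library and the five X, Y pairs above. The critical value 3.182 (two-tailed, .05 level, df = 3) is hard-coded from a standard t table rather than computed; that substitution and the variable names are assumptions of this sketch.

```python
import math

xs = [2, 4, 4, 5, 6]
ys = [5, 6, 7, 11, 12]
n = len(xs)

mx, my = sum(xs) / n, sum(ys) / n
ssx = sum((x - mx) ** 2 for x in xs)                    # sum of squares of X
ssy = sum((y - my) ** 2 for y in ys)                    # sum of squares of Y
sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # sum of cross-products

r = sp / math.sqrt(ssx * ssy)      # Pearson correlation
b = sp / ssx                       # least-squares slope
a = my - b * mx                    # intercept
sse = sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))
s_est = math.sqrt(sse / (n - 2))   # standard error of the estimate
s_b = s_est / math.sqrt(ssx)       # standard error of the slope

df = n - 2
t_r = r * math.sqrt(df) / math.sqrt(1 - r ** 2)  # t for testing r = 0
t_b = b / s_b                                   # t for testing slope = 0

# Two-tailed critical t at the .05 level for df = 3, from a t table
# (scipy.stats.t.ppf(0.975, df) would compute it instead).
T_CRIT = 3.182
ci = (b - T_CRIT * s_b, b + T_CRIT * s_b)

print(f"r = {r:.3f}, t = {t_r:.3f}, df = {df}")
print(f"b = {b:.3f}, s_b = {s_b:.3f}, t = {t_b:.3f}")
print(f"95% CI for slope: ({ci[0]:.3f}, {ci[1]:.3f})")
```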

AQ104.10.07. What assumptions are needed to calculate the various inferential statistics of linear regression? (relevant section)

AQ104.10.08. The correlation between years of education and salary in a sample of 20 people from a certain company is .4. Is this correlation statistically significant at the .05 level? (relevant section)
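A short Python sketch of the usual test for a correlation, t = r√(N − 2)/√(1 − r²) with N − 2 degrees of freedom. The critical value 2.101 (two-tailed, .05 level, df = 18) is hard-coded from a t table, an assumption of this sketch rather than part of the exercise; `t_for_correlation` is a hypothetical helper name.

```python
import math


def t_for_correlation(r: float, n: int) -> float:
    """t statistic for testing whether a Pearson r differs from 0 (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.4, 20
t = t_for_correlation(r, n)
# Two-tailed critical t at the .05 level with df = 18, taken from a t table.
T_CRIT_DF18 = 2.101
print(f"t({n - 2}) = {t:.3f}; significant at .05: {abs(t) > T_CRIT_DF18}")
```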

AQ104.10.09. A sample of X and Y scores is taken, and a regression line is used to predict Y from X. If SSY’ = 300, SSE = 500, and N = 50, what is: (relevant section, relevant section)

AQ104.10.09.1 SSY?
AQ104.10.09.2 the standard error of the estimate?
AQ104.10.09.3 R²?
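A compact Python sketch, assuming the usual partition SSY = SSY’ + SSE and the sample formula for the standard error of the estimate with N − 2 in the denominator; the helper name `regression_summary` is hypothetical.

```python
import math


def regression_summary(ss_predicted: float, ss_error: float, n: int):
    """Return SSY, the standard error of the estimate, and R^2 from SSY' and SSE."""
    ss_total = ss_predicted + ss_error      # SSY = SSY' + SSE
    s_est = math.sqrt(ss_error / (n - 2))   # sample standard error of the estimate
    r2 = ss_predicted / ss_total            # proportion of variance explained
    return ss_total, s_est, r2


ssy, s_est, r2 = regression_summary(300, 500, 50)
print(ssy, round(s_est, 3), r2)
```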

AQ104.10.10. Using linear regression, find the predicted post-test score for someone with a score of 43 on the pre-test. (relevant section)

Pre Post
59 56
52 63
44 55
51 50
42 66
42 48
41 58
45 36
27 13
63 50
54 81
44 56
50 64
47 50
55 63
49 57
45 73
57 63
46 46
60 60
65 47
64 73
50 58
74 85
59 44

AQ104.10.11. The equation for a regression line predicting the number of hours of TV watched by children (Y) from the number of hours of TV watched by their parents (X) is Y’ = 4 + 1.2X. The sample size is 12.

AQ104.10.11.1 If the standard error of b is .4, is the slope statistically significant at the .05 level? (relevant section)
AQ104.10.11.2 If the mean of X is 8, what is the mean of Y? (relevant section)
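A short Python sketch of both parts, plugging in the values stated in the question. The critical value 2.228 (two-tailed, .05 level, df = 10) is hard-coded from a t table, which is an assumption of this sketch rather than part of the exercise.

```python
# Given in the question: Y' = 4 + 1.2X, N = 12, standard error of b = 0.4, mean of X = 8.
a, b, n, s_b, mean_x = 4.0, 1.2, 12, 0.4, 8.0

t = b / s_b   # t statistic for the slope
df = n - 2
# Two-tailed critical t at the .05 level with df = 10, taken from a t table.
T_CRIT_DF10 = 2.228
print(f"t({df}) = {t:.2f}; significant at .05: {abs(t) > T_CRIT_DF10}")

# The least-squares line always passes through (mean of X, mean of Y),
# so the mean of Y equals the prediction at the mean of X.
mean_y = a + b * mean_x
print(f"mean of Y = {mean_y:.1f}")
```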

AQ104.10.12. Based on the table below, compute the regression line that predicts Y from X. (relevant section)

MX MY sX sY r
10 12 2.5 3.0 -0.6
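The slope and intercept follow from the summary statistics through b = r·sY/sX and A = MY − b·MX. A minimal Python sketch of that arithmetic, using the values in the table:

```python
# Summary statistics from the table: means, standard deviations and r.
mean_x, mean_y = 10.0, 12.0
sd_x, sd_y = 2.5, 3.0
r = -0.6

b = r * sd_y / sd_x       # slope: b = r * sY / sX
a = mean_y - b * mean_x   # intercept: A = MY - b * MX
print(f"Y' = {b:.2f}X + {a:.2f}")
```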

AQ104.10.13. Does A or B have a larger standard error of the estimate? (relevant section)

AQ104.10.14. True/false: If the slope of a simple linear regression line is statistically significant, then the correlation will also always be significant. (relevant section)

AQ104.10.15.1 True/false: If the slope of the relationship between X and Y is larger for Population 1 than for Population 2, the correlation will necessarily be larger in Population 1 than in Population 2. AQ104.10.15.2 Why or why not? (relevant section)

AQ104.10.16. True/false: If the correlation is .8, then 40% of the variance is explained. (relevant section)

AQ104.10.17. True/false: If the actual Y score was 31, but the predicted score was 28, then the error of prediction is 3. (relevant section)

Questions from Case Studies:

The following question is from the Angry Moods (AM) case study.

AQ104.10.18. Find the regression line for predicting Anger-Out from Control-Out.

AQ104.10.18.1 What is the slope?
AQ104.10.18.2 What is the intercept?
AQ104.10.18.3 Is the relationship at least approximately linear?
AQ104.10.18.4 Test to see if the slope is significantly different from 0.
AQ104.10.18.5 What is the standard error of the estimate?
(relevant section, relevant section, relevant section)

The following question is from the SAT and GPA (SG) case study.

AQ104.10.19. Find the regression line for predicting the overall university GPA from the high school GPA.

AQ104.10.19.1 What is the slope?
AQ104.10.19.2 What is the y-intercept?
AQ104.10.19.3 If someone had a 2.2 GPA in high school, what is the best estimate of his or her college GPA?
AQ104.10.19.4 If someone had a 4.0 GPA in high school, what is the best estimate of his or her college GPA?
(relevant section)

The following questions are from the Driving (D) case study.

AQ104.10.20.1 What is the correlation between age and how often the person chooses to drive in inclement weather? AQ104.10.20.2 Is this correlation statistically significant at the .01 level? AQ104.10.20.3 Are older people more or less likely to report that they drive in inclement weather? (relevant section, relevant section)

AQ104.10.21.1 What is the correlation between how often a person chooses to drive in inclement weather and the percentage of accidents the person believes occur in inclement weather? AQ104.10.21.2 Is this correlation significantly different from 0? (relevant section, relevant section)

AQ104.10.22. (D#10) Use linear regression to predict how often someone rides public transportation in inclement weather from what percentage of accidents that person thinks occur in inclement weather. (Pubtran by Accident)

AQ104.10.22.1 Create a scatter plot of this data and add a regression line.
AQ104.10.22.2 What is the slope?
AQ104.10.22.3 What is the intercept?
AQ104.10.22.4 Is the relationship at least approximately linear?
AQ104.10.22.5 Test if the slope is significantly different from 0.
AQ104.10.22.6 Comment on possible assumption violations for the test of the slope.
AQ104.10.22.7 What is the standard error of the estimate?
(relevant section, relevant section, relevant section)

Page created by: Ian Clark, last modified on 12 June 2017.

Image: Introspective Mode, at http://www.introspective-mode.org/regression-procedures/, accessed 15 May 2016.