Chapter 7: Relationships Among Variables
II. Why use correlation?
a. Example: To determine the degree of relationship between performances on a distance run and a step test as measures of cardiovascular fitness.
b. Example: Skinfolds and hydrostatic weighing for body composition
III. Coefficient of Correlation
a. The coefficient of correlation is a quantitative value of the relationship between two or more variables; it can range from +1.00 to –1.00.
b. An r of +1.00 or –1.00 is a perfect correlation, and an r of 0.00 indicates no correlation.
c. A positive correlation is one in which a small value of one variable is associated with a small value of another variable, and a large value with a large value.
d. A negative correlation is one in which a small value of one variable is associated with a large value of another variable, and a large value with a small value.
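As a minimal illustration of how r is computed from raw scores, the Python sketch below uses the usual sums of cross-products and squares about the means. The function name `pearson_r` and the scores are hypothetical, chosen only to show a perfect positive and a perfect negative correlation.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / n
    my = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores, for illustration only
x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2, 4, 6, 8, 10]), 2))  # small with small, large with large: 1.0
print(round(pearson_r(x, [10, 8, 6, 4, 2]), 2))  # small with large: -1.0
```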
IV. Patterns of Relationships
a. Correlation and Causation
b. Correlation between two variables does not mean that one variable causes another
c. Correlation is necessary but not sufficient for causation.
V. Pearson Product Moment Correlation
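a. The Pearson product-moment correlation assumes that the relationship is linear.
b. It does not apply to a curvilinear relationship.
Example: Figure 7.4d
c. Sometimes a negative correlation coefficient results when the relationship is really positive, because of the way one of the variables is scored.
d. Example: correlate vertical jump with 40 m dash times. Better jumpers tend to be faster sprinters, and faster means lower times, so r is negative even though good performance on one test goes with good performance on the other.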
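The sign flip can be sketched in Python with made-up jump heights and dash times (the data and variable names are hypothetical). Rescoring the dash so that a larger number means better performance (here simply negating the time) reverses the sign of r without changing its size.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: better jumpers tend to be faster sprinters
jump_cm = [40, 45, 50, 55, 60, 65]
dash_s = [6.2, 6.0, 5.9, 5.7, 5.6, 5.4]  # lower time = better performance

r_time = pearson_r(jump_cm, dash_s)
# Rescore the dash so that a larger number means better performance
r_rescored = pearson_r(jump_cm, [-t for t in dash_s])
print(r_time < 0, r_rescored > 0)         # True True
print(math.isclose(r_time, -r_rescored))  # True
```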
VI. Meaning of the Coefficient of Correlation
a. Significance: see Appendix A.3.
b. df = N – 2 = 10 – 2 = 8
c. The r needed for significance with a two-tailed test at the .05 level is .632.
d. Ours is –0.54; because |–0.54| is less than .632, it is not significant.
e. The r needed is even higher at the .01 level.
f. As N increases the r needed for significance decreases.
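The table value can be reproduced from a t test of r. The Python sketch below uses the standard relations t = r√df / √(1 – r²) and critical r = t_crit / √(t_crit² + df). The critical t of 2.306 (two-tailed, .05, df = 8) is an assumed value taken from a standard t table, not from these notes. Because df appears in the denominator, the critical r shrinks as N grows.

```python
import math

def t_from_r(r, n):
    """t statistic for testing H0: rho = 0, with df = n - 2."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)

def critical_r(t_crit, df):
    """Smallest |r| that reaches significance, given the critical t for df."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

n, r = 10, -0.54
df = n - 2
T_CRIT_05_DF8 = 2.306  # assumed: two-tailed .05 critical t for df = 8, from a t table

print(round(critical_r(T_CRIT_05_DF8, df), 3))  # 0.632
print(round(t_from_r(r, n), 2))                 # -1.81
print(abs(r) >= critical_r(T_CRIT_05_DF8, df))  # False, so not significant
```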
VII. Coefficient of Determination
a. To interpret the meaningfulness of the correlation coefficient
b. Coefficient of Determination = r²
c. This indicates the portion of the total variance in one measure that can be accounted for or explained by the variance in the other measure.
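A one-line Python sketch of r², applied to the two coefficients that appear in this chapter (r = .67 and r = –0.54). Squaring drops the sign, so a negative r explains variance just as a positive one does.

```python
def coefficient_of_determination(r):
    """Proportion of variance in one measure accounted for by the other (r squared)."""
    return r ** 2

for r in (0.67, -0.54):
    r2 = coefficient_of_determination(r)
    print(f"r = {r:+.2f}  r^2 = {r2:.4f}  ({r2:.0%} shared variance)")
```

For r = .67 this gives r² = .4489, so about 45% of the variance in one measure is accounted for by the other.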
VIII. Using Correlation for Prediction
a. College entrance exams
b. % Body Fat with skinfolds
c. VO2max using submax cycle ergometer
IX. Prediction is Based on Correlation
a. Linear relationship: Y = a + bX
b. Y=the predicted score, a=the intercept, b=the slope of the regression line, X= the predictor
b = r(sy / sx )
a = My – bMx
Example: X = body weight, Y = strength
Mx = 98.00, My = 167.00
sx = 9.44, sy = 33.52
r = .67
b = r(sy / sx ) = (.67)(33.52/9.44) = 2.38
a = My – bMx = 167 - 2.38(98) = -66.24
Y = -66.24 + 2.38X
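The hand calculation above can be checked with a short Python sketch that applies b = r(sy / sx) and a = My – bMx to the summary statistics (function names are illustrative). Carrying full precision gives a ≈ –66.15. The worked example's –66.24 results from rounding b to 2.38 before computing a, so the two values differ only through rounding.

```python
def regression_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Return (a, b) for Y = a + bX, using b = r(sy/sx) and a = My - b*Mx."""
    b = r * (sd_y / sd_x)
    a = mean_y - b * mean_x
    return a, b

def predict(x, a, b):
    """Predicted score Y for predictor value X."""
    return a + b * x

# Body weight (X) and strength (Y) summary statistics from the example
a, b = regression_from_summary(98.00, 167.00, 9.44, 33.52, 0.67)
print(f"unrounded: b = {b:.2f}, a = {a:.2f}")  # b = 2.38, a = -66.15

# The worked example rounds b to 2.38 before computing a
b_hand = round(b, 2)
a_hand = 167.00 - b_hand * 98.00
print(f"hand calculation: b = {b_hand:.2f}, a = {a_hand:.2f}")  # b = 2.38, a = -66.24

# The regression line always passes through (Mx, My)
print(round(predict(98.00, a, b), 6))  # 167.0
```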
X. Residual Scores and Standard Error of Prediction
a. The difference between the actual score and the predicted score (actual – predicted) is called a residual score.
b. The mean of all the residual scores is zero.
c. The standard deviation of all the residual scores is the standard error of prediction (also called the standard error of estimate):
$$s_{y\cdot x} = s_y\sqrt{1 - r^2}$$
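The two properties of residuals can be checked numerically. The Python sketch below fits a least-squares line to hypothetical scores, confirms that the residuals average zero, and shows that their standard deviation equals sy√(1 – r²). It assumes N – 1 in the denominator of both standard deviations. Some texts divide the residual sum of squares by N – 2, which gives a slightly larger value. The last line applies the formula to the body weight / strength example.

```python
import math

def fit_line(x, y):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def sd(values):
    """Sample standard deviation with N - 1 in the denominator."""
    n = len(values)
    m = sum(values) / n
    return math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))

# Hypothetical scores, for illustration only
x = [2, 4, 5, 7, 9, 10]
y = [5, 9, 8, 13, 15, 20]

a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]  # actual minus predicted
r = b * sd(x) / sd(y)                                    # from b = r(sy / sx)

print(round(sum(residuals) / len(residuals), 10))        # mean residual, essentially 0
print(round(sd(residuals), 4), round(sd(y) * math.sqrt(1 - r ** 2), 4))  # the two agree

# Body weight / strength example: sy = 33.52, r = .67
print(round(33.52 * math.sqrt(1 - 0.67 ** 2), 2))        # 24.88
```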
XI. Partial Correlation
The symbol r12·3 denotes the correlation between variables 1 and 2 with variable 3 held constant.
Example: Correlation between shoe size and math scores; 1 = math achievement, 2 = shoe size, 3 = age. Both math achievement and shoe size are strongly related to age.
Correlations: r12 = .80, r13 = .90, r23 = .88
$$r_{12\cdot 3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$$
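r12·3 = .039, so with age held constant there is essentially no relationship between shoe size and math achievement.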
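The partial correlation formula translates directly into Python. This sketch uses the three correlations from the shoe size example. The function name is illustrative.

```python
import math

def partial_r(r12, r13, r23):
    """First-order partial correlation r12.3: variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / (math.sqrt(1 - r13 ** 2) * math.sqrt(1 - r23 ** 2))

# 1 = math achievement, 2 = shoe size, 3 = age (correlations from the example)
print(round(partial_r(0.80, 0.90, 0.88), 3))  # 0.039
```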
XII. Multiple Regression
a. Involves one dependent variable (usually a criterion of some sort) and two or more predictor variables (independent variables).
b. Using more than one predictor variable usually increases the accuracy of the prediction.
c. The multiple correlation (R) indicates the relationship between the criterion and a weighted sum of the predictor variables.
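For the special case of two predictors, R can be computed from the three zero-order correlations using R² = (ry1² + ry2² – 2ry1ry2r12) / (1 – r12²). The Python sketch below reuses the correlations from the partial correlation example, with math achievement as the criterion and age and shoe size as predictors. R² (.810) barely exceeds age alone (.90² = .81). This suggests that adding a predictor improves prediction only when it is not redundant with the predictors already in the equation. The function name is illustrative.

```python
import math

def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation R of a criterion with two predictors, from zero-order correlations."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)

# Criterion = math achievement; predictor 1 = age (r = .90),
# predictor 2 = shoe size (r = .80); correlation between predictors = .88
R = multiple_r_two_predictors(0.90, 0.80, 0.88)
print(f"R = {R:.3f}, R^2 = {R**2:.3f}, age alone r^2 = {0.90**2:.3f}")
```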
XIII. Selection Procedures in Multiple Regression
a. Full Model – all predictor variables included
b. Hierarchical – predictor variables are entered as blocks
c. Forward – Start with the predictor that has the highest correlation with the criterion, then add the predictor that contributes most to the prediction, and so on.
d. Backward – Start with all variables and first remove the predictor that contributes least to the prediction, then the next least.
e. Stepwise – A combination of forward and backward
f. Maximum R² – the program calculates the best combination of variables
XIV. Multiple Regression Prediction Equations
$$Y = a + b_1X_1 + b_2X_2 + \dots + b_iX_i$$
Example: LBW = 10.138 + 0.9259 (wt) – 0.1881 (thigh skinfold) + 0.637 (bi-iliac diameter) + 0.4888 (neck circumference) – 0.5951 (abdominal circumference)
(Behnke and Wilmore, 1974)
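Applying a multiple regression equation means evaluating Y = a + b1X1 + … + biXi for one person's scores. The Python sketch below does this with the lean body weight coefficients listed above. The function names are hypothetical. The notes do not state the measurement units, so inputs must be in whatever units the original equation was developed with.

```python
def predict_linear(intercept, coefficients, values):
    """Evaluate Y = a + b1*X1 + b2*X2 + ... + bk*Xk."""
    if len(coefficients) != len(values):
        raise ValueError("need one value per coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, values))

# Lean body weight equation (Behnke and Wilmore, 1974), coefficients as listed above,
# in the order: wt, thigh skinfold, bi-iliac diameter, neck circ., abdominal circ.
LBW_INTERCEPT = 10.138
LBW_COEFS = [0.9259, -0.1881, 0.637, 0.4888, -0.5951]

def lean_body_weight(wt, thigh_sf, bi_iliac, neck, abdominal):
    """Predicted LBW; units must match those used to develop the equation (not given here)."""
    return predict_linear(LBW_INTERCEPT, LBW_COEFS, [wt, thigh_sf, bi_iliac, neck, abdominal])
```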
XV. Problems with Multiple Regression
a. Generalizability – shrinkage is the loss of accuracy when using the regression equation to predict for individuals outside the sample.
b. Population specificity – The more accurately an equation predicts for the population on which it was developed, the less able it tends to be to predict for other populations.
XVI. Canonical Correlation
Extension of the multiple correlation (several predictors and one criterion) to several predictors and several criteria.
XVII. Factor Analysis
a. Many performance characteristics and variables are used to describe human behavior
b. Factor analysis is an approach to reducing a set of correlated measures to a smaller number of latent or hidden variables
c. It starts with calculating the intercorrelations of all the measures used.
d. The goal of FA is to discover the factors that best explain a group of measurements and describe the relation of each measure to the Factor or underlying construct.
e. Exploratory FA – many variables are reduced to an underlying set
f. Confirmatory FA – either supports or does not support a structure proposed from theory. This is the more useful of the two approaches.
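Factor analysis begins with the intercorrelations of all the measures. The Python sketch below builds that matrix for three hypothetical measures. The names and scores are made up for illustration. The matrix is symmetric and has 1.00 on the diagonal. Extracting the factors themselves would require a dedicated statistics package and is not shown.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def intercorrelation_matrix(measures):
    """Correlate every measure with every other measure."""
    names = list(measures)
    return {(p, q): pearson_r(measures[p], measures[q]) for p in names for q in names}

# Hypothetical scores on three measures for five people, for illustration only
measures = {
    "distance_run": [1800, 2100, 2400, 2000, 2600],
    "step_test": [70, 78, 85, 74, 90],
    "sit_and_reach": [30, 25, 33, 28, 27],
}

m = intercorrelation_matrix(measures)
for p in measures:
    print(p.ljust(14), " ".join(f"{m[(p, q)]:+.2f}" for q in measures))
```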
XVIII. Structural Modeling
a. Path analysis and linear structural relations are structural and causal modeling techniques that are used to explain the way certain characteristics relate to one another and attempt to imply cause.
b. The way variables influence one another is not always clear; for example,
X ← Y → X, or X → Y → Z
c. Whether they imply cause and effect depends on other things (e.g., control of all other variables, careful treatments, logical hypotheses, valid theories)