Chapter 7: Relationships Among Variables

I. Correlation is a statistical technique used to determine the relationship between two or more variables.

II. Why use correlation?

a. Example: to determine the degree of relationship between performance on a distance run and performance on a step test, both used as measures of cardiovascular fitness.

b. Example: to determine the relationship between skinfold measurements and hydrostatic weighing as measures of body composition.

III. Coefficient of Correlation

a. A quantitative value expressing the relationship between two or more variables; it can range from +1.00 to –1.00.

b. An r of +1.00 or –1.00 is a perfect correlation, and an r of 0.00 indicates no correlation.

c. A positive correlation exists when small values of one variable are associated with small values of the other variable, and large values with large values.

d. A negative correlation exists when small values of one variable are associated with large values of the other variable, and large values with small values.
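A minimal Python sketch of the Pearson coefficient computed from deviation scores. The function name `pearson_r` and all scores are made up for illustration. The two perfectly linear cases show why both +1.00 and –1.00 count as perfect correlations.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least two scores each")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations, and sums of squared deviations
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores, for illustration only
positive = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 6])  # small goes with small: r > 0
negative = pearson_r([1, 2, 3, 4, 5], [9, 7, 6, 4, 2])  # small goes with large: r < 0
perfect_pos = pearson_r([1, 2, 3], [2, 4, 6])           # exact rising straight line
perfect_neg = pearson_r([1, 2, 3], [6, 4, 2])           # exact falling straight line
print(round(positive, 2), round(negative, 2), round(perfect_pos, 2), round(perfect_neg, 2))
# prints: 0.85 -0.99 1.0 -1.0
```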

IV. Patterns of Relationships

a. Correlation and Causation

b. A correlation between two variables does not mean that one variable causes the other.

c. Correlation is necessary but not sufficient for causation.

V. Pearson Product Moment Correlation

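a. The Pearson product moment correlation assumes that the relationship is linear.

b. It does not apply to a curvilinear relationship; r can be close to zero even when the two variables are strongly related in a nonlinear way.

       Example: Figure 7.4d

c. Sometimes a negative correlation coefficient results when the relationship is really positive, because one of the variables is scored so that a lower value means a better performance.

d. Example: correlating vertical jump height with 40-m dash time gives a negative r, even though good jumpers tend to be good sprinters, because the faster sprinter has the lower (better) dash time.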
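Both cautions in this section can be checked with a short Python sketch. The data are hypothetical: an inverted-U relationship in which y is completely determined by x yet r is zero, and made-up vertical jump heights (cm) paired with 40-m dash times (s), where better jumpers run faster and r therefore comes out negative. Reversing the time scale so that higher means better for both tests flips the sign but leaves the size of r unchanged.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# 1. Curvilinear: y depends exactly on x, but not in a straight line
x = [-3, -2, -1, 0, 1, 2, 3]
y = [9 - v ** 2 for v in x]  # inverted U
r_curved = pearson_r(x, y)
print(r_curved)  # 0.0

# 2. Scoring direction: a lower dash time is the better performance
jump_cm = [40, 45, 50, 55, 60, 65]
dash_s = [6.1, 5.9, 5.8, 5.5, 5.4, 5.2]
r_time = pearson_r(jump_cm, dash_s)
# Reverse the time scale so that "higher = better" on both tests
r_reversed = pearson_r(jump_cm, [-t for t in dash_s])
print(round(r_time, 2), round(r_reversed, 2))  # same size, opposite sign
```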

VI. Meaning of the Coefficient of Correlation

a. Significance: see Appendix A.3.

b. For our example, N = 10, so df = N – 2 = 10 – 2 = 8.

c. The r needed for significance at the .05 level (two-tailed test) with df = 8 is 0.632.

d. Our r is –0.54. Because its absolute value (0.54) is less than 0.632, it is not significant.

e. The r needed is even higher at the .01 level (0.765 for df = 8).

f. As N increases, the r needed for significance decreases.
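The critical values above can be reproduced from a t table, since r is significant when |t| = |r|√(N − 2)/√(1 − r²) reaches the critical t for df = N − 2. In this Python sketch the critical t values (2.306 and 3.355 for df = 8, two-tailed) are typed in from a standard t table rather than computed, and the function names are made up.

```python
import math


def critical_r(t_crit, df):
    """Smallest |r| that reaches significance, given the critical t for df = N - 2."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


def t_for_r(r, n):
    """t statistic for testing whether r differs from zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


N = 10
df = N - 2
T_CRIT_05 = 2.306  # two-tailed, .05 level, df = 8 (taken from a t table)
T_CRIT_01 = 3.355  # two-tailed, .01 level, df = 8 (taken from a t table)

print(round(critical_r(T_CRIT_05, df), 3))  # 0.632
print(round(critical_r(T_CRIT_01, df), 3))  # 0.765

r = -0.54
significant = abs(t_for_r(r, N)) >= T_CRIT_05
print(significant)  # False: |t| is about 1.81, below 2.306
```

Plugging a larger df into `critical_r` shows item f directly: the denominator grows with df, so the critical r shrinks.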

VII. Coefficient of Determination

a. Used to interpret the meaningfulness of the correlation coefficient.

b. Coefficient of determination = r²

c. This indicates the portion of the total variance in one measure that can be accounted for or explained by the variance in the other measure.
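A short Python check of item c, using made-up paired scores: the squared Pearson r equals the proportion of the total sum of squares in Y that the least-squares line on X accounts for, 1 − SS_residual / SS_total. With the r = .67 used in the prediction example in this chapter, r² ≈ .45, so roughly 45% of the variance in one measure is explained by the other. The function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def explained_proportion(x, y):
    """1 - SS_residual / SS_total for the least-squares line predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical paired scores
x = [12, 15, 11, 18, 20, 14, 16, 19]
y = [30, 34, 29, 41, 39, 35, 33, 44]
r = pearson_r(x, y)
print(round(r ** 2, 4), round(explained_proportion(x, y), 4))  # the two values agree
print(round(0.67 ** 2, 2))  # 0.45
```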

VIII. Using Correlation for Prediction

a. College entrance exams

b. Percent body fat from skinfolds

c. VO2max from a submaximal cycle ergometer test

IX. Prediction is Based on Correlation

a. Linear relationship: Y = a + bX

b. Y = the predicted score, a = the intercept, b = the slope of the regression line, X = the predictor

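$$b = r\left(\frac{s_y}{s_x}\right)$$

$$a = M_y - bM_x$$

Example: X = body weight, Y = strength

$M_x = 98.00$, $s_x = 9.44$

$M_y = 167.00$, $s_y = 33.52$

$r = .67$

$$b = r\left(\frac{s_y}{s_x}\right) = (.67)\left(\frac{33.52}{9.44}\right) = 2.38$$

$$a = M_y - bM_x = 167 - 2.38(98) = -66.24$$

$$Y = -66.24 + 2.38X$$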
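The worked example can be reproduced in Python from the summary statistics alone. The function names are hypothetical. Carrying full precision gives a ≈ −66.15; the −66.24 above comes from rounding b to 2.38 before computing a, so the difference is only rounding.

```python
def regression_from_summary(mean_x, sd_x, mean_y, sd_y, r):
    """Slope b = r(sy/sx) and intercept a = My - b*Mx of the line Y = a + bX."""
    b = r * (sd_y / sd_x)
    a = mean_y - b * mean_x
    return a, b


# Body weight (X) and strength (Y) summary statistics from the example
a, b = regression_from_summary(mean_x=98.00, sd_x=9.44, mean_y=167.00, sd_y=33.52, r=0.67)
print(f"Y = {a:.2f} + {b:.2f}X")  # Y = -66.15 + 2.38X (full precision)

# Rounding b to two decimals first, as in the hand calculation
b_hand = round(b, 2)
a_hand = 167.00 - b_hand * 98.00
print(f"Y = {a_hand:.2f} + {b_hand:.2f}X")  # Y = -66.24 + 2.38X


def predict(x):
    """Predicted strength for a given body weight, using the full-precision line."""
    return a + b * x


print(round(predict(100), 1))  # about 171.8
```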

X. Residual Scores and Standard Error of Prediction

a. The difference between a person's actual score and the predicted score is called a residual score.

b. The mean of all the residual scores is zero.

c. The standard deviation of all the residual scores is the standard error of prediction (also called the standard error of estimate):

$$s_{y\cdot x} = s_y\sqrt{1 - r^2}$$
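A Python sketch with made-up body weight and strength scores, checking items b and c: the residuals (actual minus predicted) average zero, and their standard deviation equals s_y√(1 − r²) when both standard deviations use the same N − 1 denominator. Some texts instead divide the residual sum of squares by N − 2, which makes the standard error slightly larger; this sketch follows the formula as given here.

```python
import math
import statistics

# Hypothetical body weight (X) and strength (Y) scores, for illustration only
x = [85, 90, 95, 98, 100, 104, 110, 102]
y = [130, 150, 160, 170, 165, 185, 210, 166]

mx, my = statistics.fmean(x), statistics.fmean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)
r = sxy / math.sqrt(sxx * syy)

# Least-squares line Y = intercept + slope * X
slope = sxy / sxx
intercept = my - slope * mx
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]  # actual - predicted

mean_residual = statistics.fmean(residuals)
see_direct = statistics.stdev(residuals)                   # SD of the residual scores
see_formula = statistics.stdev(y) * math.sqrt(1 - r ** 2)  # s_y * sqrt(1 - r^2)
print(round(see_direct, 3), round(see_formula, 3))         # the two values agree

# The chapter's example: s_y = 33.52, r = .67
print(round(33.52 * math.sqrt(1 - 0.67 ** 2), 2))  # 24.88
```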

XI. Partial Correlation

The symbol r12·3 denotes the correlation between variables 1 and 2 with variable 3 held constant.

Example: the correlation between shoe size and math scores, where 1 = math achievement, 2 = shoe size, and 3 = age

Correlations: r12 = .80, r13 = .90, r23 = .88

 

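$$r_{12\cdot 3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$$

$$r_{12\cdot 3} = \frac{.80 - (.90)(.88)}{\sqrt{1 - .90^2}\,\sqrt{1 - .88^2}} = .039$$

With age held constant, the correlation between math achievement and shoe size drops from .80 to about .04; the original correlation reflects the fact that both increase with age.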
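A Python sketch of the partial correlation formula, applied to the shoe size example. It also illustrates, with made-up data, what holding a variable constant means: the partial correlation equals the ordinary correlation between the residuals of variables 1 and 2 after each has been regressed on variable 3. The helper names `partial_r` and `residualize` are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r12, r13, r23):
    """r12.3: correlation of variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / (math.sqrt(1 - r13 ** 2) * math.sqrt(1 - r23 ** 2))


def residualize(v, control):
    """Residuals of v after a least-squares regression on the control variable."""
    n = len(v)
    mv, mc = sum(v) / n, sum(control) / n
    slope = sum((c - mc) * (a - mv) for a, c in zip(v, control)) / sum((c - mc) ** 2 for c in control)
    return [a - (mv + slope * (c - mc)) for a, c in zip(v, control)]


print(round(partial_r(0.80, 0.90, 0.88), 3))  # 0.039 (shoe size example)

# Hypothetical data: 1 = math score, 2 = shoe size, 3 = age
math_score = [20, 25, 31, 38, 42, 50, 55, 61]
shoe_size = [3.0, 4.0, 4.5, 6.0, 6.5, 7.0, 8.5, 9.0]
age = [6, 7, 8, 9, 10, 11, 12, 13]
r12 = pearson_r(math_score, shoe_size)
r13 = pearson_r(math_score, age)
r23 = pearson_r(shoe_size, age)
via_formula = partial_r(r12, r13, r23)
via_residuals = pearson_r(residualize(math_score, age), residualize(shoe_size, age))
print(round(via_formula, 6), round(via_residuals, 6))  # the two routes agree
```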

XII. Multiple Regression

a. Involves one dependent variable (usually a criterion of some sort) and two or more predictor (independent) variables.

b. Using more than one predictor variable usually increases the accuracy of the prediction.

c. The multiple correlation (R) indicates the relationship between the criterion and a weighted sum of the predictor variables.
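For two predictors, the weights that give the best weighted sum can be written in terms of the simple correlations (standardized beta weights), and R² = β1·r_y1 + β2·r_y2. The Python sketch below uses hypothetical scores and checks that R computed this way equals the plain correlation between the criterion and the weighted sum of the standardized predictors. It also confirms that adding the second predictor does not lower R below the better single predictor's r. Variable names are made up.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def z_scores(v):
    """Standard scores using the population standard deviation."""
    m, s = statistics.fmean(v), statistics.pstdev(v)
    return [(a - m) / s for a in v]


# Hypothetical criterion (y) and two predictors (x1, x2)
y = [52, 60, 58, 70, 66, 75, 80, 72, 85, 90]
x1 = [20, 24, 22, 27, 25, 30, 31, 28, 34, 35]
x2 = [5, 4, 7, 6, 9, 7, 8, 10, 9, 11]

ry1, ry2, r12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)

# Standardized (beta) weights for the two-predictor case
beta1 = (ry1 - ry2 * r12) / (1 - r12 ** 2)
beta2 = (ry2 - ry1 * r12) / (1 - r12 ** 2)
R_formula = math.sqrt(beta1 * ry1 + beta2 * ry2)

# R as the simple correlation between y and the weighted sum of the predictors
weighted_sum = [beta1 * a + beta2 * b for a, b in zip(z_scores(x1), z_scores(x2))]
R_direct = pearson_r(y, weighted_sum)

print(round(R_formula, 4), round(R_direct, 4), round(max(abs(ry1), abs(ry2)), 4))
```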

XIII. Selection Procedures in Multiple Regression

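a. Full model – all predictor variables are included.

b. Hierarchical – predictor variables are entered in blocks.

c. Forward – start with the predictor that has the highest correlation with the criterion, then add the predictor that increases R the most, and so on.

d. Backward – start with all predictor variables, remove the one that contributes least to R, then the next least, and so on.

e. Stepwise – a combination of forward and backward: predictors are added as in forward selection, but a predictor already in the equation can be removed if it no longer contributes.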
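The forward procedure can be sketched in Python with plain least squares. This is an illustration under assumptions, not how any particular statistics program implements it: a predictor is added when it gives the largest gain in R², and the stopping rule (stop when the gain is below a hypothetical `min_gain` of .01) stands in for the significance tests real programs use. The data and names are made up. The first predictor picked is always the one with the highest simple correlation with the criterion, because with one predictor R² = r².

```python
def r_squared(y, predictors):
    """R^2 for an ordinary least-squares fit of y on the predictor columns, with intercept."""
    n = len(y)
    cols = [[1.0] * n] + [list(map(float, p)) for p in predictors]
    k = len(cols)
    # Normal equations (X'X) w = X'y
    A = [[sum(ci[t] * cj[t] for t in range(n)) for cj in cols] for ci in cols]
    rhs = [sum(ci[t] * y[t] for t in range(n)) for ci in cols]
    # Gaussian elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        rhs[c], rhs[p] = rhs[p], rhs[c]
        for row in range(c + 1, k):
            f = A[row][c] / A[c][c]
            for j in range(c, k):
                A[row][j] -= f * A[c][j]
            rhs[row] -= f * rhs[c]
    # Back substitution for the regression weights
    w = [0.0] * k
    for i in reversed(range(k)):
        w[i] = (rhs[i] - sum(A[i][j] * w[j] for j in range(i + 1, k))) / A[i][i]
    pred = [sum(w[i] * cols[i][t] for i in range(k)) for t in range(n)]
    my = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, pred))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def forward_select(y, candidates, min_gain=0.01):
    """Add, one at a time, the candidate that raises R^2 the most; stop when the gain is small."""
    chosen, current = [], 0.0
    while len(chosen) < len(candidates):
        r2_if_added = {
            name: r_squared(y, [candidates[c] for c in chosen] + [col])
            for name, col in candidates.items()
            if name not in chosen
        }
        best = max(r2_if_added, key=r2_if_added.get)
        if r2_if_added[best] - current < min_gain:
            break  # hypothetical stopping rule; real programs use an F or p criterion
        chosen.append(best)
        current = r2_if_added[best]
    return chosen, current


# Hypothetical criterion and three candidate predictors
y = [10, 12, 15, 11, 18, 20, 17, 22, 25, 24]
candidates = {
    "x1": [1, 2, 3, 2, 4, 5, 4, 6, 7, 6],
    "x2": [5, 3, 6, 2, 7, 4, 8, 5, 9, 6],
    "x3": [2, 2, 1, 3, 1, 2, 3, 1, 2, 3],
}
chosen, final_r2 = forward_select(y, candidates)
print(chosen, round(final_r2, 3))
```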

XIV. Maximum R² – the program determines the combination of predictor variables that yields the highest R².

Multiple Regression Prediction Equations

$$Y = a + b_1X_1 + b_2X_2 + \cdots + b_iX_i$$

Example (LBW = lean body weight): LBW = 10.138 + 0.9259 (wt) – 0.1881 (thigh skinfold) + 0.637 (bi-iliac diameter) + 0.4888 (neck circumference) – 0.5951 (abdominal circumference)

(Behnke and Wilmore, 1974)
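Applying a published prediction equation amounts to evaluating the weighted sum. A minimal Python sketch using the coefficients listed above; the function name and the input measurements are hypothetical, and the units (weight in kg, skinfold in mm, diameter and circumferences in cm) are an assumption that should be checked against the original source before use.

```python
def lean_body_weight(wt, thigh_skinfold, bi_iliac_diameter, neck_circ, abdominal_circ):
    """LBW from the Behnke and Wilmore (1974) equation as listed in these notes."""
    return (10.138
            + 0.9259 * wt
            - 0.1881 * thigh_skinfold
            + 0.637 * bi_iliac_diameter
            + 0.4888 * neck_circ
            - 0.5951 * abdominal_circ)


# Hypothetical measurements for one person (units assumed, see lead-in)
lbw = lean_body_weight(wt=75, thigh_skinfold=15, bi_iliac_diameter=28,
                       neck_circ=38, abdominal_circ=85)
print(round(lbw, 2))  # 62.59
```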

XV. Problems with Multiple Regression

a. Generalizability – shrinkage is the loss of accuracy when a regression equation is used to predict scores for individuals outside the sample on which it was developed.

b. Population specificity – the more accurately an equation predicts for the population it was developed on, the less able it is to predict for other populations.
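Shrinkage can be anticipated before a new sample is collected. One widely used correction, not given in these notes, is the adjusted R², 1 − (1 − R²)(N − 1)/(N − k − 1), where k is the number of predictors; it shrinks more when N is small relative to k. The Python sketch below uses illustrative values only, and cross-validating the equation on a new sample remains the more direct check.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a sample R^2 based on n subjects and k predictors."""
    if n - k - 1 <= 0:
        raise ValueError("need more subjects than predictors + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Same sample R^2 = .50 with five predictors, at three sample sizes
for n in (20, 30, 100):
    print(n, round(adjusted_r2(0.50, n, 5), 3))  # 0.321, 0.396, 0.473
```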

XVI. Canonical Correlation

An extension of multiple correlation (several predictors and one criterion) to several predictors and several criteria.

XVII. Factor Analysis

a. Many performance characteristics and variables are used to describe human behavior.

b. Factor analysis (FA) is an approach to reducing a set of correlated measures to a smaller number of latent (hidden) variables called factors.

c. It starts by calculating the intercorrelations among all the measures used.

d. The goal of FA is to discover the factors that best explain a group of measurements and to describe the relation of each measure to each factor, or underlying construct.

e. Exploratory FA – many variables are reduced to an underlying set of factors.

f. Confirmatory FA – tests whether the data support a factor structure proposed from theory. This is the more useful approach.
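The starting point of factor analysis, the matrix of intercorrelations among all measures, is straightforward to compute. A minimal Python sketch with a hypothetical fitness battery; extracting and rotating factors from this matrix is beyond a short example and is normally left to a statistics package.

```python
import math


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def intercorrelations(measures):
    """Matrix of Pearson correlations between every pair of measures."""
    names = list(measures)
    matrix = [[pearson_r(measures[a], measures[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical scores for six people on four tests
measures = {
    "vertical_jump": [45, 52, 38, 60, 48, 55],
    "long_jump": [190, 210, 175, 235, 200, 220],
    "sit_and_reach": [30, 25, 35, 28, 33, 26],
    "grip_strength": [40, 48, 35, 55, 42, 50],
}
names, matrix = intercorrelations(measures)
print(" " * 14 + "".join(f"{label[:12]:>14}" for label in names))
for label, row in zip(names, matrix):
    print(f"{label[:12]:<14}" + "".join(f"{v:>14.2f}" for v in row))
```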

XVIII. Structural Modeling

a. Path analysis and linear structural relations are structural, or causal, modeling techniques used to explain how certain characteristics relate to one another; they attempt to imply cause.

b. The way variables influence one another is not always clear; for example,

 X ← Y → X, or X → Y → Z

c. Whether the results imply cause and effect depends on other things (e.g., control of all other variables, careful treatments, logical hypotheses, valid theories).