Correlation Analysis

the aggregate of methods, based on the mathematical theory of correlation, for finding the correlation between two random attributes or factors. Correlation analysis of experimental data includes the following fundamental practical methods: (1) the construction of scatter diagrams and the compilation of correlation tables, (2) the calculation of sample correlation coefficients or correlation ratios, and (3) testing of a statistical hypothesis concerning the significance of a relationship. Further investigation consists of establishing the specific form of the relationship between the quantities. The relationship between three or more random attributes or factors is studied by the methods of multi-dimensional correlation analysis (computation of partial and multiple correlation coefficients and correlation ratios).

Scatter diagrams and correlation tables are auxiliary tools in the analysis of sample data. A scatter diagram is obtained by plotting the sample points on a coordinate plane. From the arrangement of the points, one can form a preliminary opinion about the form of the relationship between the random quantities (for example, whether, on average, one quantity increases or decreases as the other increases). For numerical analysis, the results are usually grouped and presented as a correlation table. Each cell of the correlation table (see) contains the frequency $n_{ij}$ of those $(x, y)$ pairs whose components fall within the corresponding group intervals of the two variables.

Assuming that the group intervals of each variable are of equal length, the interval midpoints $x_i$ and $y_j$ and the frequencies $n_{ij}$ are taken as the basis for the calculations.
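The grouping step can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: the function name `correlation_table` and its parameters (the lower bounds `x_min`, `y_min` and the common interval widths `x_width`, `y_width`) are hypothetical, and the sketch assumes every observation lies at or above the chosen lower bounds.

```python
import math

def correlation_table(pairs, x_min, x_width, y_min, y_width):
    """Group (x, y) pairs into a correlation table with equal-width intervals.

    Returns (x_mid, y_mid, table): the interval midpoints and a list of rows
    in which table[i][j] is n_ij, the number of pairs whose x falls in the
    i-th x-interval and whose y falls in the j-th y-interval.
    """
    cells = {}
    for x, y in pairs:
        i = math.floor((x - x_min) / x_width)  # index of the x-interval
        j = math.floor((y - y_min) / y_width)  # index of the y-interval
        if i < 0 or j < 0:
            raise ValueError("x_min and y_min must not exceed the data")
        cells[(i, j)] = cells.get((i, j), 0) + 1
    n_rows = max(i for i, _ in cells) + 1
    n_cols = max(j for _, j in cells) + 1
    # midpoints x_i and y_j of the group intervals
    x_mid = [x_min + (i + 0.5) * x_width for i in range(n_rows)]
    y_mid = [y_min + (j + 0.5) * y_width for j in range(n_cols)]
    # empty cells get frequency 0
    table = [[cells.get((i, j), 0) for j in range(n_cols)] for i in range(n_rows)]
    return x_mid, y_mid, table
```

For example, grouping four points with unit-width intervals starting at 0 (for x) and 1 (for y) yields a 3 by 3 table with midpoints 0.5, 1.5, 2.5 and 1.5, 2.5, 3.5.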

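The correlation coefficient and the correlation ratio provide more precise information on the nature and strength of the relationship than the scatter diagram does. The sample correlation coefficient is defined by the formula

$$\hat\rho = \frac{\sum_i \sum_j n_{ij}(x_i - \bar x)(y_j - \bar y)}{\sqrt{\sum_i n_{i\cdot}(x_i - \bar x)^2 \, \sum_j n_{\cdot j}(y_j - \bar y)^2}},$$

where

$$n_{i\cdot} = \sum_j n_{ij}, \quad n_{\cdot j} = \sum_i n_{ij}, \quad n = \sum_i \sum_j n_{ij}, \quad \bar x = \frac{1}{n}\sum_i n_{i\cdot} x_i, \quad \bar y = \frac{1}{n}\sum_j n_{\cdot j} y_j.$$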
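The sample correlation coefficient for grouped data can be computed directly from a correlation table. The Python sketch below assumes the table is a list of rows with `table[i][j]` equal to $n_{ij}$ and that `x` and `y` hold the interval midpoints; the name `grouped_correlation` is hypothetical.

```python
import math

def grouped_correlation(x, y, table):
    """Sample correlation coefficient computed from a correlation table.

    x[i], y[j]: interval midpoints; table[i][j]: frequency n_ij.
    """
    n_i = [sum(row) for row in table]                            # row totals n_i.
    n_j = [sum(row[j] for row in table) for j in range(len(y))]  # column totals n_.j
    n = sum(n_i)
    x_bar = sum(ni * xi for ni, xi in zip(n_i, x)) / n
    y_bar = sum(nj * yj for nj, yj in zip(n_j, y)) / n
    # weighted sum of cross-products of deviations
    s_xy = sum(table[i][j] * (x[i] - x_bar) * (y[j] - y_bar)
               for i in range(len(x)) for j in range(len(y)))
    # weighted sums of squared deviations (marginal distributions)
    s_xx = sum(ni * (xi - x_bar) ** 2 for ni, xi in zip(n_i, x))
    s_yy = sum(nj * (yj - y_bar) ** 2 for nj, yj in zip(n_j, y))
    return s_xy / math.sqrt(s_xx * s_yy)
```

For the 2 by 2 table `[[2, 1], [1, 2]]` with midpoints 0 and 1 in both variables, the sketch returns 1/3; a table with all frequencies on the main diagonal returns 1.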

For a large number of independent observations obeying the same distribution law, and for a proper choice of group intervals, the coefficient $\hat\rho$ is close to the true correlation coefficient $\rho$. Therefore, $\hat\rho$ has a clear meaning as a measure of relationship for those distributions for which $\rho$ serves as a natural measure of relationship (that is, for normal or nearly normal distributions). In all other cases, it is recommended to use as the measure of the strength of the relationship the correlation ratio $\eta$, whose interpretation does not depend on the form of the relationship being studied. The sample value $\hat\eta_{y/x}$ is computed from the data in the correlation table:

$$\hat\eta^2_{y/x} = \frac{\sum_i n_{i\cdot}(\bar y_i - \bar y)^2}{\sum_i \sum_j n_{ij}(y_j - \bar y)^2},$$

where the numerator characterizes the scatter of the conditional mean values $\bar y_i = \sum_j n_{ij} y_j / n_{i\cdot}$ (with $n_{i\cdot} = \sum_j n_{ij}$) about the unconditional mean $\bar y$; the sample value $\hat\eta^2_{x/y}$ is defined analogously. The quantity $\hat\eta^2_{y/x} - \hat\rho^2$ is used as a measure of the deviation of the relationship from linearity, since $\hat\eta^2_{y/x} \ge \hat\rho^2$ and $\hat\eta^2_{x/y} \ge \hat\rho^2$, and $\hat\rho^2 = \hat\eta^2_{y/x}$ only in the case of a linear relationship. For example, in an analysis of the correlation between the heights and the diameters of northern pines, the conditional mean heights for a given diameter were found to follow a nonlinear relationship. In this case the correlation ratio (of height to diameter) equals 0.813, and the correlation coefficient equals 0.762.
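The squared sample correlation ratio can be computed from the same kind of table. In the Python sketch below, `table[i][j]` is $n_{ij}$ and `y` holds the midpoints $y_j$; the names `correlation_ratio_sq` and `linearity_deviation` are hypothetical. The second helper simply evaluates $\eta^2 - \rho^2$, here applied to the pine figures quoted in the text.

```python
def correlation_ratio_sq(y, table):
    """Squared sample correlation ratio eta^2_{y/x} from a correlation table."""
    n_i = [sum(row) for row in table]
    n_j = [sum(row[j] for row in table) for j in range(len(y))]
    n = sum(n_i)
    y_bar = sum(nj * yj for nj, yj in zip(n_j, y)) / n  # unconditional mean
    between = 0.0
    for row, ni in zip(table, n_i):
        if ni == 0:
            continue  # an empty x-group has no conditional mean
        y_cond = sum(nij * yj for nij, yj in zip(row, y)) / ni  # conditional mean of y
        between += ni * (y_cond - y_bar) ** 2
    total = sum(nj * (yj - y_bar) ** 2 for nj, yj in zip(n_j, y))
    return between / total

def linearity_deviation(eta, rho):
    """eta^2 - rho^2, the measure of deviation from linearity."""
    return eta ** 2 - rho ** 2

# Pine example from the text: eta = 0.813, rho = 0.762
print(linearity_deviation(0.813, 0.762))  # about 0.080
```

A U-shaped table such as `[[0, 2], [2, 0], [0, 2]]` with `y = [0, 1]` gives $\hat\eta^2_{y/x} = 1$ even though the correlation coefficient for that table is 0, which illustrates why $\eta$ is preferred when the relationship is not linear.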

Testing a hypothesis about the significance of a relationship is based on knowledge of the distribution laws of the sample correlation characteristics. In the case of a normal distribution, the sample correlation coefficient $\hat\rho$ is considered significantly different from zero if the inequality

$$\hat\rho^2 > \left[1 + \frac{n-2}{t_\alpha^2}\right]^{-1}$$

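is fulfilled, where $t_\alpha$ is the critical value of Student's t-distribution with $n - 2$ degrees of freedom corresponding to the chosen significance level $\alpha$. If, however, it is known that $\rho \ne 0$, it is necessary to use Fisher's z-transformation

$$z = \frac{1}{2}\ln\frac{1 + \hat\rho}{1 - \hat\rho},$$

which is approximately normally distributed with mean close to $\frac{1}{2}\ln\frac{1 + \rho}{1 - \rho}$ and variance $1/(n - 3)$; this variance does not depend on $\rho$. From the approximate normality of $z$, confidence intervals for the true correlation coefficient $\rho$ can be determined.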
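Both procedures can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names `is_significant` and `fisher_ci` are hypothetical; the critical value `t_alpha` must be looked up separately (for example, from a table of Student's t-distribution) because only the standard library is used; and `z_crit = 1.96` assumes an approximate 95% two-sided interval.

```python
import math

def is_significant(r, n, t_alpha):
    """True if r^2 > [1 + (n - 2) / t_alpha^2]^(-1).

    t_alpha: critical value of Student's t with n - 2 degrees of freedom
    for the chosen significance level (supplied by the caller).
    """
    threshold = 1.0 / (1.0 + (n - 2) / t_alpha ** 2)
    return r * r > threshold

def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for rho via Fisher's z-transformation.

    z = (1/2) ln((1 + r) / (1 - r)) = atanh(r) is treated as normal with
    standard deviation 1 / sqrt(n - 3); the interval is mapped back with tanh.
    """
    if not -1.0 < r < 1.0 or n <= 3:
        raise ValueError("need -1 < r < 1 and n > 3")
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)
```

For $n = 20$ and a two-sided 5% level, $t_\alpha \approx 2.101$, so the inequality requires $|\hat\rho| > 0.444$ approximately: `is_significant(0.5, 20, 2.101)` is true while `is_significant(0.4, 20, 2.101)` is false. Note that the interval from `fisher_ci` is not symmetric about $\hat\rho$, because the back-transformation with tanh is nonlinear.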

When the attributes being studied are qualitative rather than quantitative, the usual measures of relationship do not apply. However, if the objects being studied can be ordered with respect to each attribute, that is, assigned sequential numbers called ranks (two numbers per object, one for each attribute), then one may use as a measure of relationship, for example, the rank-difference correlation coefficient

$$R = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)},$$

where $d_i$ is the difference between the ranks of the two attributes for the $i$th object and $n$ is the number of objects. From the degree to which $R$ deviates from zero, certain conclusions can be drawn about the degree of relationship between the qualitative attributes. For small samples, the hypothesis that the attributes are independent is tested with the aid of special tables; for $n > 10$, the approximate normality of the rank correlation coefficients is used to compute their critical values.
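The rank-difference coefficient is straightforward to compute once both rankings are known. The Python sketch below assumes the ranks are already assigned as integers 1 to $n$ with no ties (the formula as given does not correct for ties); the function name `rank_difference_coefficient` is hypothetical.

```python
def rank_difference_coefficient(ranks_a, ranks_b):
    """R = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)) for two rankings of n objects.

    ranks_a[k] and ranks_b[k] are the ranks (1..n, no ties assumed) of
    object k with respect to the two attributes.
    """
    n = len(ranks_a)
    if n != len(ranks_b) or n < 2:
        raise ValueError("need two rankings of the same n >= 2 objects")
    d2 = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))  # sum of d_i^2
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Identical rankings give $R = 1$, fully reversed rankings give $R = -1$, and swapping two adjacent pairs among five objects (`[1, 2, 3, 4, 5]` against `[2, 1, 4, 3, 5]`) gives $R = 0.8$.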

A. V. PROKHOROV