Correlation Analysis


the aggregate of methods, based on the mathematical theory of correlation, for finding the correlation between two random attributes or factors. Correlation analysis of experimental data includes the following fundamental practical methods: (1) the construction of scatter diagrams and the compilation of correlation tables, (2) the calculation of sample correlation coefficients or correlation ratios, and (3) testing of a statistical hypothesis concerning the significance of a relationship. Further investigation consists of establishing the specific form of the relationship between the quantities. The relationship between three or more random attributes or factors is studied by the methods of multi-dimensional correlation analysis (computation of partial and multiple correlation coefficients and correlation ratios).

Scatter diagrams and correlation tables are auxiliary methods in the analysis of sampled data. A scatter diagram is obtained by plotting the sample points on a coordinate plane. From the arrangement of the points on the diagram, it is possible to form a preliminary opinion about the form of the relationship between the random quantities (for example, whether, on average, one quantity increases or decreases with an increase in the other). For numerical analysis, the results are usually grouped and presented in the form of a correlation table. Each cell of the correlation table contains the frequency $n_{ij}$ of those (x, y) pairs whose components fall within the corresponding group intervals in each variable.

Assuming the lengths of the group intervals (in each of the variables) are equal, we choose the centers $x_i$ (and, respectively, $y_j$) of the intervals and the frequencies $n_{ij}$ as the basis for calculation.
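The grouping step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `correlation_table`, its parameters, and the convention of half-open intervals are not part of the article, and pairs falling outside the grid are simply ignored.

```python
from math import floor


def correlation_table(pairs, x_min, y_min, width_x, width_y, n_x, n_y):
    """Group (x, y) pairs into an n_x-by-n_y table of frequencies n_ij.

    Assumptions (illustrative, not from the article): intervals are
    half-open [lo, hi), have equal length within each variable, and
    pairs outside the grid are skipped.
    """
    table = [[0] * n_y for _ in range(n_x)]
    for x, y in pairs:
        i = floor((x - x_min) / width_x)  # index of the x interval
        j = floor((y - y_min) / width_y)  # index of the y interval
        if 0 <= i < n_x and 0 <= j < n_y:
            table[i][j] += 1
    # Interval centers x_i and y_j serve as representative values.
    x_centers = [x_min + (i + 0.5) * width_x for i in range(n_x)]
    y_centers = [y_min + (j + 0.5) * width_y for j in range(n_y)]
    return table, x_centers, y_centers
```

Calling `correlation_table(pairs, 0.0, 2.0, 2.0, 2.0, 3, 2)` on a list of observed pairs returns the frequency table together with the interval centers that later calculations use in place of the raw values.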

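The correlation coefficient and the correlation ratio provide more precise information on the nature and the strength of the relationship than does the scatter diagram. The sample correlation coefficient is defined by the formula

$$\hat\rho = \frac{\sum_i \sum_j n_{ij}\,(x_i - \bar x)(y_j - \bar y)}{\sqrt{\sum_i n_{i\cdot}\,(x_i - \bar x)^2 \; \sum_j n_{\cdot j}\,(y_j - \bar y)^2}},$$

where

$$n_{i\cdot} = \sum_j n_{ij}, \qquad n_{\cdot j} = \sum_i n_{ij}, \qquad n = \sum_i \sum_j n_{ij}, \qquad \bar x = \frac{1}{n}\sum_i n_{i\cdot}\,x_i, \qquad \bar y = \frac{1}{n}\sum_j n_{\cdot j}\,y_j.$$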
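As a rough illustration, the sample correlation coefficient can be computed from a correlation table as follows. The sketch assumes the usual grouped-data form of the coefficient: cell frequencies weight the products of deviations of the interval centers from their weighted means. The function name and argument layout are hypothetical.

```python
from math import sqrt


def grouped_correlation(table, x_centers, y_centers):
    """Sample correlation coefficient from a table of frequencies n_ij.

    table[i][j] is the frequency for the i-th x interval and the j-th
    y interval; x_centers and y_centers hold the interval centers.
    """
    row_totals = [sum(row) for row in table]        # n_i. = sum over j
    col_totals = [sum(col) for col in zip(*table)]  # n_.j = sum over i
    n = sum(row_totals)
    x_bar = sum(x * m for x, m in zip(x_centers, row_totals)) / n
    y_bar = sum(y * m for y, m in zip(y_centers, col_totals)) / n
    s_xy = sum(
        table[i][j] * (x - x_bar) * (y - y_bar)
        for i, x in enumerate(x_centers)
        for j, y in enumerate(y_centers)
    )
    s_xx = sum(m * (x - x_bar) ** 2 for x, m in zip(x_centers, row_totals))
    s_yy = sum(m * (y - y_bar) ** 2 for y, m in zip(y_centers, col_totals))
    return s_xy / sqrt(s_xx * s_yy)
```

A table whose mass lies entirely on the diagonal of a two-by-two grid gives a value of 1 up to rounding, while a table with equal frequencies in every cell gives 0.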

For a large number of independent observations obeying the same distribution law and for a proper choice of group intervals, the coefficient $\hat\rho$ is close to the true correlation coefficient ρ. Therefore, the use of $\hat\rho$ as a measure of relationship has a well-defined meaning only for those distributions for which ρ may serve as a natural measure of relationship (that is, for normal or almost normal distributions). In all other cases, it is recommended to use the correlation ratio η, whose interpretation does not depend on the form of the relationship being studied, as a characteristic of the strength of the relationship. The sample value $\hat\eta^2_{Y|X}$ is computed from the data in the correlation table:

$$\hat\eta^2_{Y|X} = \frac{\frac{1}{n}\sum_i n_{i\cdot}\,(\bar y_i - \bar y)^2}{\frac{1}{n}\sum_j n_{\cdot j}\,(y_j - \bar y)^2},$$

where $n_{i\cdot}$ and $n_{\cdot j}$ denote the row and column totals of the table, and the numerator characterizes the scatter of the conditional mean values $\bar y_i = \sum_j n_{ij}\,y_j / n_{i\cdot}$ about the unconditional mean $\bar y$ (the sample value $\hat\eta^2_{X|Y}$ is defined analogously). The quantity $\hat\eta^2_{Y|X} - \hat\rho^2$ is used as a measure of the deviation of the relationship from linearity, since in general $\hat\eta^2_{Y|X} \ge \hat\rho^2$ and $\hat\eta^2_{X|Y} \ge \hat\rho^2$, with equality only in the case of a linear relationship. For example, in an analysis of the correlation between the heights and the diameters of northern pines, it was found that the conditional mean values of the heights of the pines for a given diameter are linked by a nonlinear relationship. The correlation ratio (of height to diameter) in this case equals 0.813, and the coefficient of correlation equals 0.762.
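The correlation ratio of Y on X can be sketched in the same grouped-data setting: the weighted scatter of the conditional means about the overall mean, divided by the total weighted scatter of the y values. The function name is hypothetical, and skipping empty rows is a choice made for this sketch.

```python
def correlation_ratio_squared(table, y_centers):
    """Squared sample correlation ratio of Y on X from frequencies n_ij.

    table[i][j] is the frequency for the i-th x interval and the j-th
    y interval. Rows with no observations are skipped (an assumption
    of this sketch), since they have no conditional mean.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    y_bar = sum(y * m for y, m in zip(y_centers, col_totals)) / n

    between = 0.0
    for row, m in zip(table, row_totals):
        if m == 0:
            continue
        # Conditional mean of y within this x interval.
        y_bar_i = sum(f * y for f, y in zip(row, y_centers)) / m
        between += m * (y_bar_i - y_bar) ** 2

    total = sum(m * (y - y_bar) ** 2 for y, m in zip(y_centers, col_totals))
    return between / total
```

For a V-shaped table such as `[[0, 2], [2, 0], [0, 2]]`, the conditional means determine y exactly, so the squared ratio is 1 even though the linear correlation coefficient for equally spaced x centers is 0. The difference between the two is what signals nonlinearity.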

Testing a hypothesis concerning the significance of a relationship is based on knowledge of the distribution laws of the sample correlation characteristics. In the case of a normal distribution, the value of the sample correlation coefficient $\hat\rho$ is considered to be significantly different from zero if the inequality

$$\hat\rho^{\,2} > \left[1 + \frac{n - 2}{t_\alpha^2}\right]^{-1}$$

is fulfilled, where $t_\alpha$ is the critical value of Student's t-distribution with n − 2 degrees of freedom corresponding to the chosen significance level α. However, if it is known that ρ ≠ 0, then it is necessary to use Fisher's z-transformation (which does not depend on ρ or n):

$$z = \frac{1}{2}\ln\frac{1 + \hat\rho}{1 - \hat\rho}.$$

Confidence intervals for the true correlation coefficient ρ can be determined from the approximate normality of z.
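Both procedures can be sketched with the Python standard library. Several points are assumptions of this sketch rather than statements from the article: the function names are hypothetical, the critical value of Student's t is supplied by the caller (for example, from tables), and the standard error of z is taken as 1/√(n − 3), the usual large-sample approximation.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def is_significant(r, n, t_crit):
    """Check r**2 > [1 + (n - 2) / t_crit**2] ** -1.

    t_crit is the critical value of Student's t with n - 2 degrees of
    freedom at the chosen significance level, supplied by the caller.
    """
    return r * r > 1.0 / (1.0 + (n - 2) / t_crit ** 2)


def fisher_interval(r, n, level=0.95):
    """Approximate confidence interval for the true correlation.

    Uses Fisher's z = 0.5 * ln((1 + r) / (1 - r)), which equals
    atanh(r), and assumes z is roughly normal with standard error
    1 / sqrt(n - 3) (an assumption, not stated in the article).
    """
    z = atanh(r)
    se = 1.0 / sqrt(n - 3)
    q = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided normal quantile
    # Map the interval for z back to the correlation scale.
    return tanh(z - q * se), tanh(z + q * se)
```

With a hypothetical sample of n = 10 and the tabulated two-sided 5% critical value 2.306 for 8 degrees of freedom, `is_significant(0.762, 10, 2.306)` is true, whereas a coefficient of 0.5 from the same sample size does not pass the test.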

In the case when the attributes being studied are not quantitative but qualitative, the usual measures of relationship do not apply. However, if one can order the objects being studied with respect to each attribute, that is, assign to them sequential numbers, or ranks (two numbers for each object, corresponding to the two attributes), then one may use as a characteristic of relationship, for example, the rank-difference correlation coefficient:

$$R = 1 - \frac{6\sum_i d_i^2}{n(n^2 - 1)},$$

where $d_i$ is the difference between the ranks of the two attributes for the i-th object and n is the number of objects. From the degree of deviation of R from zero, it is possible to draw certain conclusions about the degree of relationship between the qualitative attributes. For small samples, the hypothesis of independence of the attributes is tested with the aid of special tables; for n > 10, the approximate normality of the distribution of these coefficients is used to compute their critical values.
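A minimal sketch of the rank-difference coefficient, assuming the standard form R = 1 − 6Σd²/(n(n² − 1)) and that each attribute's ranks are a permutation of 1, …, n with no ties; the function name is hypothetical.

```python
def rank_difference_correlation(ranks_a, ranks_b):
    """Rank-difference correlation coefficient for two rankings.

    ranks_a[i] and ranks_b[i] are the ranks of the i-th object under
    the two attributes. Assumes no tied ranks.
    """
    if len(ranks_a) != len(ranks_b):
        raise ValueError("both rankings must cover the same objects")
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

Identical rankings give R = 1, fully reversed rankings give R = −1, and swapping a few neighboring ranks gives a value in between.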

A. V. PROKHOROV
