# multivariate analysis

## multivariate analysis

[¦məl·tē′ver·ē·ət ə′nal·ə·səs]

the analysis of data collected on several different VARIABLES. For example, in a study of housing provision, data may be collected on the age, income and family size (the ‘variables’) of the population being studied. In analysing the data, the effect of each of these variables can be examined, and also the interaction between them.

There is a wide range of multivariate techniques available, but most aim to simplify the data in some way in order to clarify relationships between variables. The choice of method depends on the nature of the data, the type of problem and the objectives of the analysis. FACTOR ANALYSIS and principal component analysis are exploratory, and are used to find new underlying variables. CLUSTER ANALYSIS seeks to find natural groupings of objects or individuals. *Discriminant analysis* is a technique designed to clarify the differentiation between groups influenced by the independent variable(s). Other techniques, e.g. multiple REGRESSION ANALYSIS, aim to explain the variation in one variable by means of the variation in two or more independent variables. MANOVA (multivariate analysis of variance), an extension of the univariate ANALYSIS OF VARIANCE, is used when there are multiple dependent variables. An example of a multivariate technique for analysing categorical data is LOG-LINEAR ANALYSIS.

## Multivariate Analysis

the branch of mathematical statistics dealing with methods of studying statistical data concerning objects for which more than one quantitative or qualitative characteristic is measured. In the area of multivariate analysis that has been most intensively investigated, it is assumed that the results of individual observations are independent and obey the same multivariate normal distribution. The term “multivariate analysis” is sometimes applied in a narrow sense to this area.

In more precise language, multivariate analysis deals with data where the result $X_j$ of the $j$th observation can be expressed in terms of the vector $X_j = (X_{j1}, X_{j2}, \ldots, X_{js})$. Here, the random variable $X_{jk}$ has the mathematical expectation $\mu_k$ and variance $\sigma_k^2$, and the correlation coefficient between $X_{jk}$ and $X_{jl}$ is $\rho_{kl}$. Of great importance are the mathematical expectation vector $\mu = (\mu_1, \ldots, \mu_s)$ and the covariance matrix $\Sigma$ with elements $\sigma_k \sigma_l \rho_{kl}$, where $k, l = 1, \ldots, s$. This vector and matrix define completely the distribution of the vectors $X_1, \ldots, X_n$, which are the results of $n$ independent observations.

The choice of the multivariate normal distribution as the principal mathematical model for multivariate analysis can be justified in part by the following considerations: on the one hand, this model can be used in a great number of applications; on the other hand, only within the framework of this model can exact distributions of sample characteristics be calculated.

The sample mean

$$\bar{X} = \frac{1}{n} \sum_{j=1}^{n} X_j$$

and the sample covariance matrix

$$S = \frac{1}{n} \sum_{j=1}^{n} (X_j - \bar{X})(X_j - \bar{X})'$$

are maximum likelihood estimators of the corresponding parameters of the population; here, $(X_j - \bar{X})'$ is the transpose of $(X_j - \bar{X})$ (*see* MATRIX). The distribution of $\bar{X}$ is normal, $N(\mu, \Sigma/n)$. The joint distribution of the elements of the covariance matrix $S$, known as the Wishart distribution, is a natural generalization of the chi-square distribution and plays an important role in multivariate analysis.
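As a minimal sketch of the two estimators described above, the following NumPy code computes the sample mean vector and the maximum likelihood covariance estimate (divisor $n$). The function name `sample_mean_and_mle_cov` and the simulated parameter values are assumptions made for illustration only.

```python
import numpy as np


def sample_mean_and_mle_cov(X):
    """Return the sample mean vector and the ML covariance estimate.

    X is an (n, s) array: n observations (rows) of s characteristics.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    x_bar = X.mean(axis=0)
    centered = X - x_bar
    # Divisor n gives the maximum likelihood estimate under normality;
    # the unbiased estimate would use n - 1 instead.
    S = centered.T @ centered / n
    return x_bar, S


# Simulated data: these parameter values are illustrative assumptions.
rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.0],
                  [0.6, 1.0, 0.3],
                  [0.0, 0.3, 0.5]])
X = rng.multivariate_normal(mu, Sigma, size=500)

x_bar, S = sample_mean_and_mle_cov(X)
print("sample mean:", x_bar)
print("ML covariance estimate:\n", S)
```

With a few hundred observations the estimates should lie close to the chosen `mu` and `Sigma`, although the exact values depend on the random seed.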

A number of problems in multivariate analysis are more or less analogous to the corresponding univariate problems, for example, the problem of testing hypotheses on the equality of the means in two independent samples. Examples of other problems are the testing of hypotheses on the independence of particular groups of components of the vectors $X_j$ and the testing of such special hypotheses as the spherical symmetry of the distribution of the $X_j$. The need to understand the complicated relationships between the components of the random vectors $X_j$ leads to new problems. The method of principal components and the method of canonical correlations are used to reduce the number of random characteristics (that is, the number of dimensions) under consideration or to reduce the characteristics to independent random variables.
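The text does not name a specific test for the equality of two mean vectors. One standard choice, assumed here, is the two-sample Hotelling $T^2$ statistic, which reduces to the squared pooled $t$ statistic when there is a single characteristic. The sketch below assumes both samples are normal with a common covariance matrix; the function name and the simulated data are hypothetical.

```python
import numpy as np


def hotelling_t2_two_sample(X1, X2):
    """Two-sample Hotelling T^2 statistic, assuming a common covariance.

    Returns (T^2, F statistic, (numerator dof, denominator dof)).
    """
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    n1, p = X1.shape
    n2 = X2.shape[0]
    diff = X1.mean(axis=0) - X2.mean(axis=0)
    C1 = (X1 - X1.mean(axis=0)).T @ (X1 - X1.mean(axis=0))
    C2 = (X2 - X2.mean(axis=0)).T @ (X2 - X2.mean(axis=0))
    S_pooled = (C1 + C2) / (n1 + n2 - 2)
    # Quadratic form diff' S_pooled^{-1} diff, computed without an explicit inverse.
    quad = diff @ np.linalg.solve(S_pooled, diff)
    t2 = (n1 * n2) / (n1 + n2) * quad
    # Under the null hypothesis this scaled statistic follows F(p, n1 + n2 - p - 1).
    f_stat = (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * t2
    return t2, f_stat, (p, n1 + n2 - p - 1)


# Simulated samples whose means differ by 0.5 in every component (illustrative).
rng = np.random.default_rng(2)
A = rng.normal(loc=0.0, size=(40, 3))
B = rng.normal(loc=0.5, size=(50, 3))
t2, f_stat, dof = hotelling_t2_two_sample(A, B)
print("T^2 =", t2, "F =", f_stat, "dof =", dof)
```

A p-value would come from comparing `f_stat` with the F distribution on `dof` degrees of freedom; that step is left out to keep the sketch dependent on NumPy alone.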

In the method of principal components, the vectors $X_j$ are carried by a transformation into the vectors $Y_j = (Y_{j1}, \ldots, Y_{jr})$. The components of the $Y_j$ are chosen such that $Y_{j1}$ has the maximum variance among the normalized linear combinations of the components of $X_j$, $Y_{j2}$ has the maximum variance among the linear functions of the components of $X_j$ uncorrelated with $Y_{j1}$, and so on.
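The construction just described can be sketched through an eigendecomposition of the sample covariance matrix: the unit-norm eigenvectors give the normalized linear combinations, and the eigenvalues are the variances of the resulting components. This is one common way to compute principal components, not necessarily the procedure the entry has in mind; the function name `principal_components` and the simulated data are assumptions.

```python
import numpy as np


def principal_components(X, r):
    """Project an (n, s) data array onto its first r principal components.

    Returns (Y, W, variances): scores (n, r), weights (s, r), component variances.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    S = centered.T @ centered / X.shape[0]
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1][:r]
    W = eigvecs[:, order]          # unit-norm columns: the normalized combinations
    Y = centered @ W               # columns are mutually uncorrelated
    return Y, W, eigvals[order]


# Illustrative data: four observed characteristics driven by two latent ones.
rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 2))
mixing = np.array([[2.0, 0.5, 1.0, 0.0],
                   [0.0, 1.0, -1.0, 0.5]])
X = latent @ mixing + 0.1 * rng.normal(size=(300, 4))

Y, W, variances = principal_components(X, r=2)
print("component variances:", variances)
```

Because the data here are generated from two latent variables plus small noise, two components capture almost all of the variance, which is the dimension reduction the method is used for.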

In the method of canonical correlations, two sets of random variables (components of $X_j$) are replaced by smaller sets. First, linear combinations, one from each set of variables, are constructed so as to have maximum simple correlation with each other. These linear combinations are called the first pair of canonical variables, and their correlation is the first canonical correlation. The process is continued with the construction of further pairs of linear combinations. It is required, however, that each new canonical variable be uncorrelated with all previous ones. The method of canonical correlations indicates the maximum correlation between linear functions of two groups of components of the observation vector.
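A minimal sketch of this procedure, assuming both sample covariance blocks are nonsingular: each set of variables is whitened with a Cholesky factor, and a singular value decomposition of the whitened cross-covariance yields the canonical correlations and the weights of the canonical variables. The function name `canonical_correlations` and the simulated data are hypothetical.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations and canonical variables for two variable sets.

    X is (n, p), Y is (n, q). Returns (rho, U, V) with rho in descending order.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / n
    Syy = Yc.T @ Yc / n
    Sxy = Xc.T @ Yc / n
    Lx = np.linalg.cholesky(Sxx)   # Sxx = Lx Lx'
    Ly = np.linalg.cholesky(Syy)   # Syy = Ly Ly'
    # Whitened cross-covariance K = Lx^{-1} Sxy Ly^{-T}.
    K = np.linalg.solve(Lx, Sxy)
    K = np.linalg.solve(Ly, K.T).T
    Uk, rho, Vkt = np.linalg.svd(K, full_matrices=False)
    A = np.linalg.solve(Lx.T, Uk)      # weights for the first set
    B = np.linalg.solve(Ly.T, Vkt.T)   # weights for the second set
    return rho, Xc @ A, Yc @ B


# Illustrative data: the two sets share one latent variable z.
rng = np.random.default_rng(4)
n = 400
z = rng.normal(size=n)
X = np.column_stack([z + 0.5 * rng.normal(size=n), rng.normal(size=n)])
Y = np.column_stack([z + 0.5 * rng.normal(size=n),
                     rng.normal(size=n),
                     rng.normal(size=n)])

rho, U_can, V_can = canonical_correlations(X, Y)
print("canonical correlations:", rho)
```

In this simulated setting only one pair of canonical variables should show a sizeable correlation, since the two sets are linked through a single shared variable.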

The results of the method of principal components and the method of canonical correlations contribute to an understanding of the structure of the multivariate population under consideration. Also of use in this regard is factor analysis, in which the components of the random vectors $X_j$ are assumed to be linear functions of some unobserved factors that are to be studied.

Multivariate analysis also deals with the problem of differentiating two or more populations from the results of observations. One aspect of this problem is known as the discrimination problem and consists in the assignment of a new element to one of several populations on the basis of an analysis of samples of the populations. Another aspect involves dividing the elements of a population into groups that differ maximally, in some sense, from each other.
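For the discrimination problem, one simple rule can be sketched under assumptions that the entry does not state: two normal populations with a common covariance matrix, equal prior probabilities and equal misclassification costs. Under those assumptions the rule is linear in the observation. The function names `fit_linear_discriminant` and `classify` and the simulated samples are hypothetical.

```python
import numpy as np


def fit_linear_discriminant(X0, X1):
    """Fit a two-population linear discriminant rule from training samples.

    Returns (w, c): assign x to population 1 when x @ w > c, else to population 0.
    """
    X0 = np.asarray(X0, dtype=float)
    X1 = np.asarray(X1, dtype=float)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = X0.shape[0], X1.shape[0]
    # Pooled covariance estimate, assuming both populations share one covariance.
    S_pooled = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (n0 + n1 - 2)
    w = np.linalg.solve(S_pooled, m1 - m0)
    c = w @ (m0 + m1) / 2          # threshold at the midpoint of the two means
    return w, c


def classify(x, w, c):
    """Return 0 or 1 for each row of x (a single vector is treated as one row)."""
    return (np.atleast_2d(x) @ w > c).astype(int)


# Illustrative training samples from two populations with different means.
rng = np.random.default_rng(3)
cov = np.array([[1.0, 0.3],
                [0.3, 1.0]])
X0 = rng.multivariate_normal([0.0, 0.0], cov, size=200)
X1 = rng.multivariate_normal([3.0, 3.0], cov, size=200)

w, c = fit_linear_discriminant(X0, X1)
new_element = np.array([2.5, 2.0])
print("assigned population:", classify(new_element, w, c)[0])
```

The second aspect mentioned above, dividing a single population into maximally different groups, is a clustering problem and would need a different procedure than this supervised rule.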

A. V. PROKHOROV