regression analysis


regression analysis

[ri′gresh·ən ə‚nal·ə·səs]
The description of the nature of the relationship between two or more variables; it is concerned with the problem of describing or estimating the value of the dependent variable on the basis of one or more independent variables.
McGraw-Hill Dictionary of Scientific & Technical Terms, 6E, Copyright © 2003 by The McGraw-Hill Companies, Inc.
The following article is from The Great Soviet Encyclopedia (1979). It might be outdated or ideologically biased.

Regression Analysis


the branch of mathematical statistics that encompasses practical methods of studying a regression relation between variables on the basis of statistical data. The purposes of regression analysis include the determination of the general form of a regression equation, the construction of estimates of unknown parameters occurring in a regression equation, and the testing of statistical regression hypotheses.

When the relationship between two variables is studied on the basis of the observed values (x₁, y₁), …, (xₙ, yₙ) in accordance with regression theory, it is assumed that one of the variables, Y, has a certain probability distribution when the value of the other variable is fixed as x. This probability distribution is such that

E(Y | x) = g(x, β)

D(Y | x) = σ²h²(x)

where β denotes the set of unknown parameters that determine the function g(x), and h(x) is a known function of x—for example, it may have the constant value one. The choice of a regression model is determined by the assumptions regarding the form of the dependence of g(x, β) on x and β. The most natural model, from the standpoint of a unified method of estimating the unknown parameters β, is the regression model that is linear in β:

g(x, β) = β₀g₀(x) + … + βₖgₖ(x)
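
As an illustration of a model that is linear in β, the following minimal sketch evaluates g(x, β) as a weighted sum of known basis functions. The particular basis (a constant, x, and sin x), the coefficient values, and the names `predict`, `basis`, and `beta` are assumptions chosen for the example; they do not come from the article.

```python
import math

def predict(x, beta, basis):
    """Evaluate g(x, beta) = beta_0*g_0(x) + ... + beta_k*g_k(x)."""
    return sum(b * g(x) for b, g in zip(beta, basis))

# Hypothetical basis functions g_0, g_1, g_2 (illustrative only).
basis = [lambda x: 1.0, lambda x: x, math.sin]
# Hypothetical parameter values beta_0, beta_1, beta_2.
beta = [2.0, 0.5, -1.0]

value_at_zero = predict(0.0, beta, basis)
```

The model may be nonlinear in x (here through sin x) and still be linear in β, which is what allows a unified estimation method.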

Different assumptions may be made regarding the values of the variable x, depending on the nature of the observations and the aims of the analysis. In order to establish the relationship between the variables in an experiment, a model is used that is based on simplified but plausible assumptions. These assumptions are that the variable x is a controllable variable, whose value is assigned during the design of the experiment, and that the observed values of y are expressed in the form

yᵢ = g(xᵢ, β) + εᵢ,  i = 1, …, n

where the quantities εᵢ describe the errors. The errors are assumed to be independent under different measurements and to be identically distributed with zero mean and constant variance σ². The case where x is an uncontrollable variable differs in that the observed values (x₁, y₁), …, (xₙ, yₙ) constitute a sample from a bivariate population. In either case, the regression analysis is performed in the same way; the interpretation of the results, however, differs substantially. If both the variables are random, the relationship between them is studied by the methods of correlation analysis.
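
The observation model with a controllable x and additive errors can be simulated as follows. This is only a sketch: the normal error distribution, the straight-line form of g, and the names `simulate` and `line` are assumptions made for the example. The article requires only independent, identically distributed errors with zero mean and constant variance.

```python
import random

def line(x, beta):
    """Hypothetical regression function g(x, beta) = beta_0 + beta_1 * x."""
    return beta[0] + beta[1] * x

def simulate(xs, g, beta, sigma, seed=0):
    """Return observations y_i = g(x_i, beta) + e_i with independent errors.

    The errors are drawn from a normal distribution with mean 0 and
    standard deviation sigma (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    return [g(x, beta) + rng.gauss(0.0, sigma) for x in xs]

xs = [0.0, 1.0, 2.0, 3.0, 4.0]   # values assigned by the experimenter
ys = simulate(xs, line, (1.0, 2.0), sigma=0.5)
```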

A preliminary idea of the nature of the relation between g(x) and x can be obtained by plotting the points (xᵢ, ȳ(xᵢ)) in a scatter diagram, which is also called a correlation field when both variables are random. The ȳ(xᵢ) are the arithmetic means of the values of y that correspond to a fixed value xᵢ. For example, if the points fall near a straight line, a linear regression can be used as the approximation.
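
The points of such a diagram can be computed by averaging the y values observed at each distinct x. The sketch below assumes that x takes repeated values; the function name `conditional_means` and the data are invented for illustration.

```python
from collections import defaultdict

def conditional_means(xs, ys):
    """Return sorted (x, mean of y at that x) pairs for each distinct x."""
    groups = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[x].append(y)
    return sorted((x, sum(v) / len(v)) for x, v in groups.items())

# Made-up observations with two measurements at each value of x.
xs = [1, 1, 2, 2, 3, 3]
ys = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0]
points = conditional_means(xs, ys)
# points == [(1, 3.0), (2, 6.0), (3, 9.0)]
```

Here the averaged points lie on a straight line, which would suggest trying a linear regression.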

The standard method of estimating the regression line is based on the polynomial model (m ≥ 1):

y(x, β) = β₀ + β₁x + … + βₘxᵐ

One reason for the choice of this model is that every function continuous over some interval can be approximated by a polynomial to any desired degree of accuracy. The unknown regression coefficients β₀, …, βₘ and the unknown variance σ² are estimated by the method of least squares. The estimates β̂₀, …, β̂ₘ of the parameters β₀, …, βₘ obtained by this method are called the sample regression coefficients, and the equation

ŷ(x) = β̂₀ + β̂₁x + … + β̂ₘxᵐ
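
A least squares fit of the polynomial model can be sketched by forming and solving the normal equations (XᵀX)β̂ = Xᵀy, where row i of X is (1, xᵢ, …, xᵢᵐ). The helper names `solve` and `polyfit_ls` and the test data are assumptions for illustration, and the normal-equations route is one of several ways to compute the estimates.

```python
def solve(a, b):
    """Solve the square system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):                   # back substitution
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z

def polyfit_ls(xs, ys, degree):
    """Return least squares estimates [b_0, ..., b_m] for a degree-m polynomial."""
    rows = [[x ** p for p in range(degree + 1)] for x in xs]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

# Noise-free data generated from 1 + 2x + 3x^2, so the fit should recover it.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]
beta_hat = polyfit_ls(xs, ys, 2)
```

For high degrees the normal equations become ill-conditioned, so a library least squares routine such as `numpy.linalg.lstsq` is usually a better choice in practice.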

defines what is called the sample regression line. If the observed values are assumed to be normally distributed, this method leads to estimates of β₀, …, βₘ and of σ² that coincide with estimates obtained by the maximum likelihood method. The estimates obtained by the least squares method are in some sense best estimates even when the distribution is not normal. Thus, if a linear regression hypothesis is to be tested,

β̂₀ = ȳ − β̂₁x̄,  β̂₁ = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)²

where x̄ and ȳ are the arithmetic means of the xᵢ and yᵢ. The estimate ĝ(x) = β̂₀ + β̂₁x is an unbiased estimate of g(x); its variance is less than the variance of any other linear unbiased estimate. The assumption that the yᵢ have a normal distribution provides the most effective means of checking the accuracy of the constructed sample regression equation and of testing hypotheses on the parameters of the regression model. In this case, the construction of the confidence intervals for the true regression coefficients β₀, …, βₘ and the testing of the hypothesis that no regression relationship exists (βᵢ = 0, i = 1, …, m) are carried out by means of Student’s distribution.
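
For a straight-line model, the least squares estimates and the Student statistic for the hypothesis β₁ = 0 can be computed directly from the sample means. The sketch below uses the standard formulas for simple linear regression with n − 2 degrees of freedom; the function name `simple_ols` and the data are invented for illustration.

```python
import math

def simple_ols(xs, ys):
    """Fit y = b0 + b1*x by least squares; return (b0, b1, s2, t_stat).

    s2 is the residual variance estimate RSS / (n - 2), and t_stat is the
    Student statistic b1 / se(b1) for testing the hypothesis beta_1 = 0.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    s2 = rss / (n - 2)
    t_stat = b1 / math.sqrt(s2 / sxx)
    return b0, b1, s2, t_stat

# Made-up observations lying close to the line y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1, s2, t_stat = simple_ols(xs, ys)
```

The statistic is compared with a quantile of Student's distribution with n − 2 degrees of freedom. With n = 5, the two-sided 5% critical value for 3 degrees of freedom is about 3.182, and a larger |t| leads to rejection of the hypothesis that no regression relationship exists.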

In a more general situation, the observed values y1,…,yn are regarded as values of independent random variables with identical variances and the mathematical expectations

E yᵢ = β₁x₁ᵢ + … + βₖxₖᵢ,  i = 1, …, n

where the values of the xⱼᵢ, j = 1, …, k, are assumed known. This form of linear regression model is general in the sense that higher-order models in the variables x₁, …, xₖ reduce to it. Moreover, certain models that are nonlinear in β can also be reduced to this linear form by a suitable transformation.
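
One common example of such a transformation, chosen here as an assumption and not taken from the article, is the exponential model y = a·exp(bx). Taking logarithms gives ln y = ln a + bx, which is linear in the parameters ln a and b. The function name `fit_exponential` and the data are illustrative.

```python
import math

def fit_exponential(xs, ys):
    """Estimate a and b in y = a*exp(b*x) by least squares on (x, ln y).

    Requires all y > 0. The fit minimizes squared errors on the log scale,
    which is not the same as least squares on the original scale.
    """
    logs = [math.log(y) for y in ys]
    n = len(xs)
    x_bar = sum(xs) / n
    l_bar = sum(logs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxl = sum((x - x_bar) * (l - l_bar) for x, l in zip(xs, logs))
    b = sxl / sxx
    a = math.exp(l_bar - b * x_bar)
    return a, b

# Noise-free data from y = 2*exp(0.5*x), so the fit should recover a and b.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a_hat, b_hat = fit_exponential(xs, ys)
```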

Regression analysis is one of the most widespread methods of processing the results of observations made during the study of relationships in such fields as physics, biology, economics, and engineering. Such branches of mathematical statistics as analysis of variance and the design of experiments are also based on regression analysis. Regression analysis models are widely used in multivariate statistical analysis.




The Great Soviet Encyclopedia, 3rd Edition (1970-1979). © 2010 The Gale Group, Inc. All rights reserved.

regression analysis

In statistics, a mathematical method of modeling the relationships among two or more variables. It is used to predict the value of one variable given the values of the others. For example, a model might estimate sales based on age and gender. A regression analysis yields an equation that expresses the relationship. See correlation.
Copyright © 1981-2019 by The Computer Language Company Inc. All Rights reserved. THIS DEFINITION IS FOR PERSONAL USE ONLY. All other reproduction is strictly prohibited without permission from the publisher.