
Statistics | Endogeneity

Posted on: March 5, 2025


1. Classical Linear Model Assumptions

The classical linear model consists of six assumptions (Wooldridge, 2016, p. 92):

  1. Linear in parameters: $y = \mathbf{x}^T\beta + \epsilon$
  2. Random sample of $n$ independent observations $(\mathbf{x}_i, y_i)$ from the population
  3. No perfect collinearity
  4. Zero conditional mean: $E(\epsilon \mid \mathbf{x}) = 0$ for any $\mathbf{x}$
  5. Homoskedastic errors: $Var(\epsilon \mid \mathbf{x}) = \sigma^2$ for any $\mathbf{x}$ (Wooldridge, 2016, p. 106)
  6. Normality: $\epsilon$ is independent of $\mathbf{x}$ and $\epsilon \sim N(0, \sigma^2)$
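To make the assumptions concrete, here is a minimal Python sketch (using NumPy) that simulates data satisfying all six of them and checks that OLS recovers the true coefficients. The function names `simulate_clm` and `ols`, the coefficient values, and the sample size are hypothetical choices for illustration only.

```python
import numpy as np

def simulate_clm(n, beta, sigma=1.0, seed=0):
    """Draw n i.i.d. observations satisfying assumptions 1-6:
    linear model, random sample, no perfect collinearity, and normal
    errors drawn independently of x (hence zero conditional mean and
    homoskedasticity)."""
    rng = np.random.default_rng(seed)
    k = len(beta)
    # First column is an intercept; other regressors are independent standard normals
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    eps = rng.normal(0.0, sigma, size=n)  # drawn independently of X
    y = X @ beta + eps
    return X, y

def ols(X, y):
    """OLS estimate of beta via least squares."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

beta = np.array([1.0, 2.0, -0.5])
X, y = simulate_clm(n=10_000, beta=beta)
# Assumption 3: X has full column rank (no perfect collinearity)
assert np.linalg.matrix_rank(X) == X.shape[1]
print(ols(X, y))  # should be close to [1.0, 2.0, -0.5]
```

Increasing `n` should shrink the gap between the estimates and `beta`, which is what consistency predicts.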

Remarks:

2. Endogeneity: $Cov(\mathbf{x}, \epsilon) \neq \mathbf{0}$

Under the Modern Assumptions, OLS estimators are guaranteed to be consistent even when only a weaker version of assumption 4 holds, namely that the regressors are uncorrelated with the error term: $Cov(\mathbf{x}, \epsilon) = \mathbf{0}$. Remember that this condition is weaker than $E(\epsilon \mid \mathbf{x}) = 0$: zero conditional mean implies zero covariance, but zero covariance does not imply zero conditional mean.
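The gap between the two conditions can be illustrated with a standard counterexample, sketched below under assumed values: take $x \sim N(0, 1)$ and $\epsilon = x^2 - 1$. Then $E(\epsilon \mid x) = x^2 - 1 \neq 0$, yet $Cov(x, \epsilon) = E(x^3) = 0$, so OLS on $y = 1 + 2x + \epsilon$ (coefficients chosen arbitrarily for illustration) should still approach the true values in a large sample.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
x = rng.normal(size=n)
# E(eps | x) = x^2 - 1 is not zero, but Cov(x, eps) = E(x^3) = 0 for symmetric x
eps = x**2 - 1
y = 1.0 + 2.0 * x + eps

X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.cov(x, eps)[0, 1])  # sample covariance, near 0
print(beta_hat)              # near [1.0, 2.0]
```

This only demonstrates consistency; the usual unbiasedness argument relies on the stronger assumption 4.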

We call a regressor endogenous when it is correlated with the error term. In this case, the OLS estimators are no longer guaranteed to be either unbiased or consistent.
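For the simple regression $y = \beta_0 + \beta_1 x + \epsilon$ with a single regressor (a special case assumed here for readability, with $Var(x) > 0$), the OLS slope can be decomposed to show where the inconsistency comes from:

```latex
\hat{\beta}_1
= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
= \beta_1 + \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})\,\epsilon_i}{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}
\;\xrightarrow{p}\; \beta_1 + \frac{Cov(x, \epsilon)}{Var(x)}
```

When $x$ is endogenous, the second term does not vanish, so $\hat{\beta}_1$ converges to the wrong value; the direction of the asymptotic bias matches the sign of $Cov(x, \epsilon)$.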

Example: Omitted Variable Bias. When we omit a variable that is correlated with both the included independent variables and the dependent variable, the omitted variable ends up in the error term, so the included independent variables become correlated with the error term. As a result, the estimated coefficients of the included variables are distorted as well (i.e., inconsistent).
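The mechanism can be checked with a small simulation. The data-generating process below (the coefficients, the link $x_2 = \delta x_1 + v$, and the function name `ovb_simulation`) is a hypothetical setup chosen for illustration: it fits a "long" regression that includes $x_2$ and a "short" regression that omits it.

```python
import numpy as np

def ovb_simulation(n=100_000, beta1=1.0, beta2=2.0, delta=0.5, seed=2):
    """Compare the long regression (x1, x2) with the short regression (x1 only).

    Assumed true model: y = beta1*x1 + beta2*x2 + e, with x2 = delta*x1 + v,
    so x2 is correlated with x1 and also affects y.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = delta * x1 + rng.normal(size=n)
    y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)

    ones = np.ones(n)
    # Long regression: both regressors included
    long_coef, *_ = np.linalg.lstsq(np.column_stack([ones, x1, x2]), y, rcond=None)
    # Short regression: x2 omitted, so beta2*x2 is absorbed into the error term
    short_coef, *_ = np.linalg.lstsq(np.column_stack([ones, x1]), y, rcond=None)
    return long_coef[1], short_coef[1]

long_b1, short_b1 = ovb_simulation()
# Theory for the short slope: beta1 + beta2 * Cov(x1, x2) / Var(x1) = 1.0 + 2.0 * 0.5 = 2.0
print(long_b1, short_b1)  # roughly 1.0 and 2.0
```

The long regression recovers $\beta_1$, while the short regression's slope drifts toward $\beta_1 + \beta_2 \cdot Cov(x_1, x_2)/Var(x_1)$, the omitted-variable-bias formula for this simple setup.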

3. Dealing with Endogeneity

4. References