We are going to write a function that solves a linear regression. First, the basic concepts of linear regression are introduced.
Assume we want to find the best fit for the equation:
$$ Y = \beta _0 + \beta _1 X + e $$
What are the values for $\beta _0$ and $\beta _1$?
We estimate them by minimizing the errors, where the error is:
$$ e = Y - \beta _0 - \beta _1 X $$
There can be more than one explanatory variable in the fit:
$$ e = Y - \beta_0 - \beta_1 X_1 - \beta_2 X_2 - \beta_3 X_3 - \dots $$
We can write it in matrix notation like:
$$ e = Y - X\beta $$
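For concreteness, with $n$ observations and $k$ explanatory variables the stacked layout looks like this (the leading column of ones in $X$ carries the intercept $\beta_0$):
$$ \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} - \begin{pmatrix} 1 & X_{11} & \cdots & X_{1k} \\ 1 & X_{21} & \cdots & X_{2k} \\ \vdots & \vdots & & \vdots \\ 1 & X_{n1} & \cdots & X_{nk} \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} $$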
The matrix $X$ contains all the variables and the vector $\beta$ contains all the parameters; this is a more compact way to write the model. We use least squares minimization for the regression fit:
$$ \begin{align} \min_{\beta} \quad e'e &= (Y - X\beta)'(Y - X\beta) \\ &= Y'Y - 2\beta'X'Y + \beta'X'X\beta \end{align} $$
Taking the partial derivative of the sum of squared residuals with respect to $\beta$ and setting it equal to zero gives the first-order condition: $$ \frac{\partial e'e}{\partial \beta} = -2X'Y + 2X'X\beta = 0 $$
Rearranging into the normal equations $X'X\beta = X'Y$ and solving, $\beta$ is:
$$ \beta = (X'X)^{-1}X'Y $$
Well... it might seem confusing if you are meeting least squares for the first time, but you will use it a lot in econometrics. It is a very basic concept, and for now we are only interested in the solution/formula for $\beta$. We can define a function that fits the regression.
import numpy as np

def ols(x, y):
    # you need to make the calculations inside the function
    # and return the BETA coefficients
    return None  # return beta coefficients
You can use the np.dot() function for matrix multiplication, and X.T gives the transpose of $X$ in numpy. For instance, the multiplication $X'Y$ is np.dot(X.T, Y). Use the np.linalg.inv() function to take the inverse in $(X'X)^{-1}$.
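Following these hints, here is one possible way to complete the function. It is a minimal sketch of the formula $\beta = (X'X)^{-1}X'Y$ derived above, and it assumes $X'X$ is invertible:
def ols(x, y):
    # X'X and X'Y using matrix multiplication
    xtx = np.dot(x.T, x)
    xty = np.dot(x.T, y)
    # beta = (X'X)^{-1} X'Y; assumes X'X is invertible
    return np.dot(np.linalg.inv(xtx), xty)
Test if your function works. For that we need to create some random values.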
np.random.seed(1)  # make the random draws reproducible
X = np.concatenate((np.ones((10, 1)), np.random.random((10, 3))), axis=1)  # 10 observations: intercept column of ones + 3 random regressors
Y = np.random.randint(10, size=(10, 1))  # random integer responses
print(ols(X, Y))
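As a quick sanity check (an optional extra, not part of the exercise), the coefficients can be compared with numpy's built-in least squares solver, np.linalg.lstsq, which should give the same result:
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)  # numpy's own least squares solution
print(np.allclose(ols(X, Y), beta_lstsq))  # should print True if ols is correct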