Having relaxed the constraints of the deterministic model by introducing an error term $$\epsilon$$, we run into another problem: infinitely many regression lines fulfill the specifications of the probabilistic model.

We therefore need a strategy to select the particular regression line that best describes the data. In this section we discuss one of the most popular methods for this task, the so-called ordinary least squares (OLS) method.

As mentioned in the previous section, for each pair of values $$(x_i,y_i)$$ the error $$e_i$$ is calculated as $$y_i-\hat y_i$$. To obtain the best-fitting line for the given data, the error sum of squares, denoted by SSE, is minimized:

$\min\; SSE = \sum_{i=1}^n e_i^2=\sum_{i=1}^n (y_i - \hat y_i)^2$

For the simple linear model there exists an analytic solution for $$\beta_1$$

$\hat{\beta_1} = \frac{\sum_{i=1}^n ((x_i- \bar x) (y_i-\bar y))}{\sum_{i=1}^n (x_i-\bar x)^2} = \frac{cov(x,y)}{var(x)}\text{,}$

and $$\beta_0$$:

$\hat{\beta_0} = \bar y -\hat{\beta_1} \bar x$
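The two closed-form expressions above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `ols_fit` and `sse` and the small data set are hypothetical and chosen only for this example.

```python
def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) for the simple linear model."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of the slope formula
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = s_xy / s_xx
    # The fitted line passes through the point (x_bar, y_bar)
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


def sse(x, y, beta0, beta1):
    """Error sum of squares for the line y_hat = beta0 + beta1 * x."""
    return sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical example data, made up for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta0, beta1 = ols_fit(x, y)
print(beta0, beta1, sse(x, y, beta0, beta1))
```

Evaluating `sse` for any other intercept or slope, for example `sse(x, y, beta0, beta1 + 0.1)`, yields a larger value, which is what it means for the OLS line to minimize the SSE.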

The OLS estimate coincides with the maximum likelihood estimate of $$\beta$$ when the errors $$\epsilon$$ are uncorrelated, have equal variance (homoscedasticity), and follow a Gaussian distribution.
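
To see why the two estimates agree, consider the log-likelihood of the simple linear model under these conditions. The common error variance is written here as $$\sigma^2$$, a symbol introduced only for this sketch:

$\ln L(\beta_0,\beta_1,\sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i-\beta_0-\beta_1 x_i)^2$

The first term does not depend on $$\beta_0$$ or $$\beta_1$$, and the second term is the SSE scaled by the negative constant $$-1/(2\sigma^2)$$. Maximizing the log-likelihood with respect to $$\beta_0$$ and $$\beta_1$$ is therefore equivalent to minimizing the SSE.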