
Least Squares and Statistics

Last time I talked about the beautiful linear algebra interpretation of Least Squares, which adds a wonderful geometric picture to the analytical footwork we had to do before. This time I will borrow from a video I saw on Khan Academy a long time ago. It connects some of the elementary principles of statistics with regression.

The basic idea is similar to the analytical approach, but this time we will only fit a line to $$M$$ pairs $$(x_i, y_i)$$. So we have a model of the form

\begin{equation*}y(x) = m x + c.\end{equation*}

The sum of squared errors between model and data is then

\begin{align*}S := \sum_i \epsilon_i^2 &= \sum_i (m x_i + c - y_i)^2 \\ &= \sum_i \left(m^2 x_i^2 + 2 c m x_i + c^2 - 2 m x_i y_i - 2 c y_i + y_i^2\right)\end{align*}

Of course, we are looking for the global minimum of this sum with respect to the parameters $$m$$ and $$c$$. Since $$S$$ is a convex quadratic in $$m$$ and $$c$$, we find it by setting both partial derivatives to zero:

\begin{align*}0 \stackrel{!}{=} \frac{\partial S}{\partial m} &= \sum_i \left(2 m x_i^2 + 2 c x_i - 2 x_i y_i\right) \\ 0 \stackrel{!}{=} \frac{\partial S}{\partial c} &= \sum_i \left(2 m x_i + 2 c - 2 y_i\right)\end{align*}
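If you do not trust the algebra, the two partial derivatives can be checked numerically. The following is only a small sketch: the function names and the sample data are made up for illustration, and the check compares the formulas above against central finite differences at an arbitrary point $$(m, c)$$.

```python
def sse(m, c, xs, ys):
    """Sum of squared errors S for the line y = m*x + c."""
    return sum((m * x + c - y) ** 2 for x, y in zip(xs, ys))


def grad_sse(m, c, xs, ys):
    """Analytic partial derivatives (dS/dm, dS/dc) as written in the text."""
    dS_dm = sum(2 * m * x * x + 2 * c * x - 2 * x * y for x, y in zip(xs, ys))
    dS_dc = sum(2 * m * x + 2 * c - 2 * y for x, y in zip(xs, ys))
    return dS_dm, dS_dc


def numeric_grad(m, c, xs, ys, h=1e-6):
    """Central finite differences of S, used as an independent check."""
    dm = (sse(m + h, c, xs, ys) - sse(m - h, c, xs, ys)) / (2 * h)
    dc = (sse(m, c + h, xs, ys) - sse(m, c - h, xs, ys)) / (2 * h)
    return dm, dc


# Hypothetical data, not taken from the post.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

analytic = grad_sse(0.5, -1.0, xs, ys)
numeric = numeric_grad(0.5, -1.0, xs, ys)
print(analytic, numeric)
```

Because $$S$$ is quadratic, the two results should agree up to floating point rounding.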

We can conveniently drop the 2 that appears in front of all terms. We will now rewrite these equations by using the definition of the sample mean:

\begin{equation*}\overline{x} = \frac{\sum_i x_i}{M}\end{equation*}

With $$\overline{y}$$, $$\overline{x^2}$$ and $$\overline{x y}$$ defined analogously, this gives:

\begin{align*}0 &\stackrel{!}{=} m M \overline{x^2} + M c \overline{x} - M \overline{x y} \\ 0 &\stackrel{!}{=} m M \overline{x} + M c - M \overline{y}\end{align*}

Let's lose the $$M$$s and solve for $$m$$ and $$c$$.
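For anyone who wants the intermediate step, here is one way to carry out the elimination; it uses nothing beyond the two equations above. After dividing both by $$M$$, the second one yields $$c$$ directly, and substituting it into the first leaves an equation in $$m$$ alone:

\begin{align*}c &= \overline{y} - m \overline{x} \\ 0 &= m \overline{x^2} + \left(\overline{y} - m \overline{x}\right) \overline{x} - \overline{x y} \\ \Rightarrow\quad m \left(\overline{x}^2 - \overline{x^2}\right) &= \overline{x}\,\overline{y} - \overline{x y}\end{align*}

Dividing by the bracket, which is nonzero as long as the $$x_i$$ are not all identical, gives the result: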

\begin{align*}m &= \frac{\overline{x}\,\overline{y}- \overline{x y}}{\overline{x}^2 - \overline{x^2}} \\ c &= \overline{y} - m \overline{x}\end{align*}

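If you look closely at the expression for the slope $$m$$, you see that it is just the covariance of $$x$$ and $$y$$ divided by the variance of $$x$$: multiplying numerator and denominator by $$-1$$ gives $$\overline{x y} - \overline{x}\,\overline{y}$$ over $$\overline{x^2} - \overline{x}^2$$. I find this very intriguing.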
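To see the formulas at work, here is a minimal sketch in plain Python. The helper names (`mean`, `fit_line`, `slope_from_cov`) and the sample data are my own choices for illustration. It computes $$m$$ and $$c$$ from the sample means and, separately, the slope as covariance over variance, both normalised by $$M$$.

```python
def mean(values):
    """Sample mean of a non-empty sequence."""
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Slope m and intercept c from the mean-based formulas in the text."""
    x_bar = mean(xs)
    y_bar = mean(ys)
    xy_bar = mean([x * y for x, y in zip(xs, ys)])
    xx_bar = mean([x * x for x in xs])
    # The denominator is zero if all x values are identical.
    m = (x_bar * y_bar - xy_bar) / (x_bar ** 2 - xx_bar)
    c = y_bar - m * x_bar
    return m, c


def slope_from_cov(xs, ys):
    """Slope as covariance(x, y) / variance(x), both divided by M."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = mean([(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)])
    var = mean([(x - x_bar) ** 2 for x in xs])
    return cov / var


# Hypothetical data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

m, c = fit_line(xs, ys)
print(m, c, slope_from_cov(xs, ys))
```

For data that lies exactly on a line, both routes should recover that line's slope; for noisy data they should still agree with each other up to rounding.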

February 3 2012
