See also slides for curve fitting, regression, colab link

python: ols_and_kls.py

curve fitting.

how do we fit a curve to a set of data points?

Given a set of $n$ data points $S = \{(x^i, y^i)\}_{i=1}^{n}$, where

  • $x \in \mathbb{R}^{d}$
  • $y \in \mathbb{R}$ (or $\mathbb{R}^{k}$)

ols.

Ordinary Least Squares (OLS)

Let $\hat{y}^i$ be the model's prediction for $x^i$. The error on point $i$ is $d^i = \| y^i - \hat{y}^i \|$; OLS minimizes $\sum_{i=1}^{n} (y^i - \hat{y}^i)^2$.

In the 1-D case, ordinary least squares reduces to finding $a, b \in \mathbb{R}$ that solve

$$\min\limits_{a,b} \sum_{i=1}^{n} (a x^i + b - y^i)^2$$

optimal solution

$$\begin{aligned} a &= \frac{\overline{xy} - \overline{x} \cdot \overline{y}}{\overline{x^2} - (\overline{x})^2} = \frac{\text{Cov}(x,y)}{\text{Var}(x)} \\ b &= \overline{y} - a \overline{x} \end{aligned}$$

where $\overline{x} = \frac{1}{n} \sum{x^i}$, $\overline{y} = \frac{1}{n} \sum{y^i}$, $\overline{xy} = \frac{1}{n} \sum{x^i y^i}$, $\overline{x^2} = \frac{1}{n} \sum{(x^i)^2}$.
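
A minimal NumPy sketch of this closed-form solution (the synthetic data here is illustrative, not taken from ols_and_kls.py):

```python
import numpy as np

# illustrative data: a noisy line y = 2x + 1
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=50)

# sample means from the formulas above
x_bar, y_bar = x.mean(), y.mean()
xy_bar, x2_bar = (x * y).mean(), (x**2).mean()

a = (xy_bar - x_bar * y_bar) / (x2_bar - x_bar**2)  # Cov(x, y) / Var(x)
b = y_bar - a * x_bar

print(a, b)  # close to 2.0 and 1.0
```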

hyperplane

Hyperplane equation

$$\hat{y} = w_{0} + \sum_{j=1}^{d}{w_j x_j}$$

where $w_0$ is the y-intercept (bias).

Homogeneous hyperplane:

$$\begin{aligned} w_{0} &= 0 \\ \hat{y} &= \sum_{j=1}^{d}{w_j x_j} = \langle w, x \rangle = w^T x \end{aligned}$$
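
As a quick sketch (illustrative values), both cases are a dot product in NumPy:

```python
import numpy as np

w0 = 0.5                        # bias (y-intercept)
w = np.array([1.0, -2.0, 3.0])  # weights, d = 3
x = np.array([0.2, 0.4, 0.6])   # one input point

y_hat = w0 + w @ x   # general hyperplane
y_hat_h = w @ x      # homogeneous case (w0 = 0)
```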

Matrix form OLS:

$$X_{n\times d} = \begin{pmatrix} x_1^1 & \cdots & x_d^1 \\ \vdots & \ddots & \vdots \\ x_1^n & \cdots & x_d^n \end{pmatrix}, \quad Y_{n\times 1} = \begin{pmatrix} y^1 \\ \vdots \\ y^n \end{pmatrix}, \quad W_{d\times 1} = \begin{pmatrix} w_1 \\ \vdots \\ w_d \end{pmatrix}$$

$$\begin{aligned} \text{Obj} &: \sum_{i=1}^n (\hat{y}^i - y^i)^2 = \sum_{i=1}^n (\langle w, x^i \rangle - y^i)^2 \\ \text{Def} &: \Delta = \begin{pmatrix} \Delta_1 \\ \vdots \\ \Delta_n \end{pmatrix} = \begin{pmatrix} x_1^1 & \cdots & x_d^1 \\ \vdots & \ddots & \vdots \\ x_1^n & \cdots & x_d^n \end{pmatrix} \begin{pmatrix} w_1 \\ \vdots \\ w_d \end{pmatrix} - \begin{pmatrix} y^1 \\ \vdots \\ y^n \end{pmatrix} = \begin{pmatrix} \hat{y}^1 - y^1 \\ \vdots \\ \hat{y}^n - y^n \end{pmatrix} \end{aligned}$$

minimize over $W$:

$$\min\limits_{W \in \mathbb{R}^{d \times 1}} \|XW - Y\|_2^2$$

OLS solution

$$W^{\text{LS}} = (X^T X)^{-1} X^T Y$$

(assuming $X^T X$ is invertible)
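
A sketch of this solution in NumPy (synthetic data; solving the normal equations with `np.linalg.solve` avoids forming the inverse explicitly, and `np.linalg.lstsq` is the more numerically stable route):

```python
import numpy as np

# illustrative data: n points in d dimensions with known weights
rng = np.random.default_rng(0)
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])
Y = X @ w_true + rng.normal(0.0, 0.1, size=n)

# W^LS = (X^T X)^{-1} X^T Y, via the normal equations
W_ls = np.linalg.solve(X.T @ X, X.T @ Y)

# equivalent but more stable; also handles rank-deficient X
W_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(W_ls)     # both close to w_true
print(W_lstsq)
```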

Example:

$$\hat{y} = w_{0} + w_{1} x_{1} + w_{2} x_{2}$$

With

$$X_{n \times 2} = \begin{pmatrix} x^{1}_{1} & x^{1}_{2} \\ x^{2}_{1} & x^{2}_{2} \\ x^{3}_{1} & x^{3}_{2} \end{pmatrix}$$

(shown here for $n = 3$)

and

$$X^{'}_{n \times 3} = \begin{pmatrix} x^{1}_{1} & x^{1}_{2} & 1 \\ x^{2}_{1} & x^{2}_{2} & 1 \\ x^{3}_{1} & x^{3}_{2} & 1 \end{pmatrix}$$

With

$$W = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}$$

and

$$W^{'} = \begin{pmatrix} w_1 \\ w_2 \\ w_0 \end{pmatrix}$$

thus

$$X^{'} W^{'} = \begin{pmatrix} w_0 + \sum_i{w_i x_i^{1}} \\ \vdots \\ w_0 + \sum_i{w_i x_i^{n}} \end{pmatrix}$$

so the bias $w_0$ is learned like any other weight.
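
A short sketch of this bias-column trick (names like `X_prime` are illustrative):

```python
import numpy as np

# illustrative data with a true intercept of 1.0
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))
Y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0.0, 0.1, size=n)

# X': append a column of ones so w_0 is fit with the other weights
X_prime = np.hstack([X, np.ones((n, 1))])
W_prime, *_ = np.linalg.lstsq(X_prime, Y, rcond=None)

w1, w2, w0 = W_prime  # last entry is the bias w_0
print(w0, w1, w2)     # close to 1.0, 2.0, -3.0
```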

See also Bias and intercept