Linear regression
See also: slides on curve fitting and regression, the Colab link, bias and intercept
python: ols_and_kls.py
curve fitting
how do we fit a curve to a given distribution of data?
Given a set of $n$ data points $S = \{(x_i, y_i)\}_{i=1}^{n}$
- $x_i \in \mathbb{R}^d$
- $y_i \in \mathbb{R}$ (or $\mathbb{R}^k$)
Ordinary Least Squares (OLS)
Let $\hat{y}_i$ be the model's prediction for $x_i$; then $d_i = \|y_i - \hat{y}_i\|$ is the error. OLS minimizes $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
In the 1-D case, ordinary least squares amounts to finding $a, b \in \mathbb{R}$ that solve $\min_{a,b} \sum_{i=1}^{n} (a x_i + b - y_i)^2$
optimal solution
$$a = \frac{\overline{xy} - \bar{x}\,\bar{y}}{\overline{x^2} - (\bar{x})^2} = \frac{\mathrm{Cov}(x,y)}{\mathrm{Var}(x)}, \qquad b = \bar{y} - a\,\bar{x}$$
where $\bar{x} = \frac{1}{n}\sum x_i$, $\bar{y} = \frac{1}{n}\sum y_i$, $\overline{xy} = \frac{1}{n}\sum x_i y_i$, and $\overline{x^2} = \frac{1}{n}\sum x_i^2$
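As a quick sketch (on hypothetical noisy data, not data from the notes), the closed-form 1-D solution can be computed directly from these sample means:

```python
import numpy as np

# Hypothetical data for illustration: y ≈ 2x + 1 plus small noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.1, size=50)

# Closed-form 1-D OLS from sample means:
#   a = (mean(xy) - mean(x)·mean(y)) / (mean(x²) - mean(x)²) = Cov(x, y) / Var(x)
#   b = mean(y) - a·mean(x)
a = (np.mean(x * y) - np.mean(x) * np.mean(y)) / (np.mean(x**2) - np.mean(x) ** 2)
b = np.mean(y) - a * np.mean(x)

print(a, b)  # recovers approximately the true slope 2 and intercept 1
```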
hyperplane
$$\hat{y} = w_0 + \sum_{j=1}^{d} w_j x_j, \qquad w_0: \text{the } y\text{-intercept (bias)}$$
Homogeneous hyperplane ($w_0 = 0$):
$$\hat{y} = \sum_{j=1}^{d} w_j x_j = \langle w, x \rangle = w^T x$$
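The homogeneous prediction is just an inner product; a minimal sketch with hypothetical weights and one sample ($d = 3$):

```python
import numpy as np

# Hypothetical weight vector and a single sample with d = 3 features.
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 3.0])

# Homogeneous hyperplane: y_hat = <w, x> = w^T x (no bias term).
y_hat = w @ x
print(y_hat)  # 0.5*1 - 1.0*2 + 2.0*3 = 4.5
```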
Matrix form OLS:
$$X_{n\times d} = \begin{pmatrix} x_{11} & \cdots & x_{d1} \\ \vdots & \ddots & \vdots \\ x_{1n} & \cdots & x_{dn} \end{pmatrix}, \quad Y_{n\times 1} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad W_{d\times 1} = \begin{pmatrix} w_1 \\ \vdots \\ w_d \end{pmatrix}$$
where $x_{ji}$ denotes the $j$-th feature of the $i$-th sample.
Objective:
$$\sum_{i=1}^{n}(\hat{y}_i - y_i)^2 = \sum_{i=1}^{n}(\langle w, x_i \rangle - y_i)^2$$
where
$$\Delta = \begin{pmatrix} \Delta_1 \\ \vdots \\ \Delta_n \end{pmatrix} = XW - Y = \begin{pmatrix} \hat{y}_1 - y_1 \\ \vdots \\ \hat{y}_n - y_n \end{pmatrix}$$
$$\min_{W \in \mathbb{R}^{d\times 1}} \|XW - Y\|_2^2$$
with closed-form solution
$$W_{LS} = (X^T X)^{-1} X^T Y$$
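A sketch of the closed-form solution on hypothetical synthetic data (names and true weights are made up for illustration). Note that in practice one solves the normal equations rather than forming the explicit inverse, and `np.linalg.lstsq` solves the same least-squares problem via SVD:

```python
import numpy as np

# Hypothetical design matrix X (n=100 samples, d=3 features) and targets Y.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
Y = X @ w_true + rng.normal(0, 0.01, size=100)

# Normal equations: solve (X^T X) W = X^T Y instead of inverting X^T X.
W_ls = np.linalg.solve(X.T @ X, X.T @ Y)

# The same problem solved more robustly via SVD.
W_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(W_ls)     # close to w_true
print(W_lstsq)  # agrees with W_ls
```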
Example:
$$\hat{y} = w_0 + w_1 x_1 + w_2 x_2$$
With (three samples shown)
$$X_{3\times 2} = \begin{pmatrix} x_{11} & x_{21} \\ x_{12} & x_{22} \\ x_{13} & x_{23} \end{pmatrix}$$
and the augmented matrix, where a column of ones absorbs the bias,
$$X'_{3\times 3} = \begin{pmatrix} x_{11} & x_{21} & 1 \\ x_{12} & x_{22} & 1 \\ x_{13} & x_{23} & 1 \end{pmatrix}$$
With
$$W = \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}$$
and
$$W' = \begin{pmatrix} w_1 \\ w_2 \\ w_0 \end{pmatrix}$$
thus
$$X' W' = \begin{pmatrix} w_0 + \sum_j w_j x_{j1} \\ \vdots \\ w_0 + \sum_j w_j x_{jn} \end{pmatrix}$$
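The augmentation trick above can be sketched as follows (hypothetical noiseless data, with the true bias and weights chosen for illustration):

```python
import numpy as np

# Hypothetical data generated by y = w0 + w1*x1 + w2*x2 with w0=3, w1=1, w2=-2.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2))
Y = 3.0 + X @ np.array([1.0, -2.0])

# Augment X with a column of ones so the bias w0 is learned as an extra weight.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Solve OLS on the augmented matrix; the last entry of W' is the bias w0.
W_prime, *_ = np.linalg.lstsq(X_aug, Y, rcond=None)
w1, w2, w0 = W_prime
print(w0, w1, w2)  # recovers the bias and weights exactly (noiseless data)
```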