feed-forward neural network

Feed-forward networks are stateless and usually have no feedback loop.

universal approximation theorem
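
As a reminder, one informal way to write the result in Cybenko (1989). The notation ($G$, $\alpha_j$, $w_j$, $b_j$, $N$) is an assumption made here for illustration, not taken from these notes:

$$
G(x) = \sum_{j=1}^{N} \alpha_j \, \sigma(w_j^\top x + b_j), \qquad \sup_{x \in [0,1]^n} |G(x) - f(x)| < \varepsilon
$$

For any continuous $f$ on $[0,1]^n$, any $\varepsilon > 0$, and any continuous sigmoidal $\sigma$, there exist $N$, $\alpha_j$, $w_j$, $b_j$ such that the bound holds. In words: a single hidden layer with enough units can approximate $f$ arbitrarily well.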

regression

Think of just linear layers with some activation functions.

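import torch
import torch.nn as nn
import torch.optim as optim

class LinearRegression(nn.Module):
  def __init__(self, input_dim, output_dim):
    super().__init__()
    self.fc = nn.Linear(input_dim, output_dim)
  def forward(self, x): return self.fc(x)

# placeholder data so the loop runs: 32 samples, 224 features, 10 targets
X, Y = torch.randn(32, 224), torch.randn(32, 10)

model = LinearRegression(224, 10)
loss = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.005)

for ep in range(10):
  y_pred = model(X)
  l = loss(y_pred, Y)    # MSELoss expects (input, target)
  optimizer.zero_grad()  # clear gradients left over from the previous step
  l.backward()           # backpropagate
  optimizer.step()       # update the weights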

classification

Think of one-hot encoded targets, in the binary or multiclass case.
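
A minimal sketch of the multiclass case, assuming PyTorch as in the regression snippet. The sizes, the random data, and the name `LinearClassifier` are illustrative assumptions. `F.one_hot` shows what the one-hot targets look like; `nn.CrossEntropyLoss` itself takes raw logits together with integer class indices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

num_features, num_classes = 20, 3  # illustrative sizes

class LinearClassifier(nn.Module):
  def __init__(self, input_dim, num_classes):
    super().__init__()
    self.fc = nn.Linear(input_dim, num_classes)
  def forward(self, x): return self.fc(x)  # raw logits, no softmax here

X = torch.randn(64, num_features)              # random placeholder inputs
labels = torch.randint(0, num_classes, (64,))  # integer class indices
Y_onehot = F.one_hot(labels, num_classes)      # shape (64, 3), a single 1 per row

model = LinearClassifier(num_features, num_classes)
criterion = nn.CrossEntropyLoss()  # applies log-softmax internally
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for ep in range(20):
  logits = model(X)
  l = criterion(logits, labels)
  optimizer.zero_grad()
  l.backward()
  optimizer.step()
  losses.append(l.item())
```

The binary case is the same idea with two classes, or with a single output unit and `nn.BCEWithLogitsLoss`.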

backpropagation

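context: gradient-based training such as SGD needs the gradient of the loss, which sums the per-example gradients:

$$
\nabla_w L(w,b) = \sum_{i} \nabla_w \, l(f_{w,b}(x^i), y^i)
$$

Computing this naively is expensive: in a deep model the chain rule repeats the same intermediate derivatives for every layer.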

intuition: we want to minimize the error and optimize the weights, reusing the intermediate values saved during one forward pass.
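
A small sketch of that intuition, assuming a two-layer network with a sigmoid hidden layer and a squared-error loss (the shapes and variable names are made up for illustration). The backward pass is written by hand from the saved forward values, then checked against autograd.

```python
import torch

torch.manual_seed(0)

# Tiny two-layer network: h = sigmoid(x W1), y_hat = h W2, squared-error loss.
x = torch.randn(5, 4)
y = torch.randn(5, 1)
W1 = torch.randn(4, 3, requires_grad=True)
W2 = torch.randn(3, 1, requires_grad=True)

# Forward pass: keep the intermediates, the backward pass reuses them.
z = x @ W1
h = torch.sigmoid(z)
y_hat = h @ W2
loss = 0.5 * ((y_hat - y) ** 2).sum()

# Backward pass by hand: chain rule applied from the output back to the input.
d_y_hat = (y_hat - y).detach()        # dL/dy_hat
grad_W2 = h.detach().T @ d_y_hat      # dL/dW2
d_h = d_y_hat @ W2.detach().T         # dL/dh
d_z = d_h * (h * (1 - h)).detach()    # sigmoid'(z) = h (1 - h)
grad_W1 = x.T @ d_z                   # dL/dW1

# Autograd computes the same gradients into W1.grad and W2.grad.
loss.backward()
```

Each gradient is built from the one computed just before it, so no derivative is recomputed.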

vanishing gradient

happens in deeper networks, with respect to the partial derivatives of the loss.

Backpropagation applies the chain rule and propagates error signals backward from the output layer through all the hidden layers to the input. In very deep networks, this involves successive multiplication of gradients from each layer.

Thus with saturated neurons, where $\sigma'(x) \approx 0$, the product shrinks toward zero and the gradient does not reach the first layers.
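
A quick way to see this, sketched under assumed settings (20 sigmoid layers of width 32, PyTorch default initialisation, random inputs): compare the gradient norm at the first and the last linear layer after one backward pass.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

depth, width = 20, 32  # illustrative sizes
layers = []
for _ in range(depth):
  layers += [nn.Linear(width, width), nn.Sigmoid()]
model = nn.Sequential(*layers)

x = torch.randn(16, width)
model(x).sum().backward()

# Collect the linear layers in order, from input side to output side.
linears = [m for m in model if isinstance(m, nn.Linear)]
first = linears[0].weight.grad.norm().item()   # layer closest to the input
last = linears[-1].weight.grad.norm().item()   # layer closest to the output
print(f"first layer grad norm: {first:.3e}, last layer grad norm: {last:.3e}")
```

Since $\sigma'(x) \le 0.25$ everywhere, each extra sigmoid layer can only shrink the signal further.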

solution:

use activation functions that do not saturate (e.g. Leaky ReLU)
better initialisation
residual networks
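
The three remedies can be combined in one block. This is only a sketch: `ResidualBlock`, the sizes, and the choice of Kaiming initialisation are assumptions for illustration, not something these notes prescribe.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class ResidualBlock(nn.Module):
  def __init__(self, width, negative_slope=0.01):
    super().__init__()
    self.fc = nn.Linear(width, width)
    self.act = nn.LeakyReLU(negative_slope)  # non-saturating activation
    # Kaiming (He) initialisation matched to the Leaky ReLU slope.
    nn.init.kaiming_normal_(self.fc.weight, a=negative_slope, nonlinearity="leaky_relu")
    nn.init.zeros_(self.fc.bias)
  def forward(self, x):
    return x + self.act(self.fc(x))  # skip connection: identity path for the gradient

depth, width = 20, 32  # illustrative sizes
model = nn.Sequential(*[ResidualBlock(width) for _ in range(depth)])

x = torch.randn(16, width)
out = model(x)
out.sum().backward()

first = model[0].fc.weight.grad.norm().item()          # block closest to the input
last = model[depth - 1].fc.weight.grad.norm().item()   # block closest to the output
```

The `x + ...` term gives the gradient a path to the early blocks that does not pass through any activation derivative.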

regularization

Neural networks are usually prone to overfitting, given they are often over-parameterized.

We can usually add regularization terms to the objective function
Early stopping
Adding noise
Structural regularization, via adding dropout
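
A rough sketch of the first three items in one training loop, on synthetic data. The data, model size, `patience`, noise scale, and `weight_decay` value are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic regression data, split into train and validation sets.
X = torch.randn(200, 10)
Y = X @ torch.randn(10, 1) + 0.1 * torch.randn(200, 1)
X_train, Y_train, X_val, Y_val = X[:150], Y[:150], X[150:], Y[150:]

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
criterion = nn.MSELoss()
# weight_decay adds an L2 penalty on the parameters to each update.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-3)

best_val, patience, bad_epochs, epochs_run = float("inf"), 5, 0, 0
for ep in range(200):
  model.train()
  noisy_X = X_train + 0.05 * torch.randn_like(X_train)  # noise added to the inputs
  l = criterion(model(noisy_X), Y_train)
  optimizer.zero_grad()
  l.backward()
  optimizer.step()

  model.eval()
  with torch.no_grad():
    val = criterion(model(X_val), Y_val).item()
  epochs_run = ep + 1
  if val < best_val:
    best_val, bad_epochs = val, 0
  else:
    bad_epochs += 1
    if bad_epochs >= patience:  # early stopping: validation loss stopped improving
      break
```

In practice one would also save the weights at the best validation loss and restore them after stopping.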

dropout

a case of structural regularization

a technique that randomly drops each node with probability $p$ during training.
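
A short sketch with `nn.Dropout`, assuming $p = 0.5$ and an all-ones input so the effect is easy to read. PyTorch uses inverted dropout: surviving entries are scaled by $1/(1-p)$ in training mode, and the layer is the identity in evaluation mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

p = 0.5
drop = nn.Dropout(p=p)
x = torch.ones(1000)

drop.train()
y_train = drop(x)  # each entry zeroed with probability p, survivors scaled by 1/(1-p)

drop.eval()
y_eval = drop(x)   # identity at evaluation time

dropped_fraction = (y_train == 0).float().mean().item()
```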

Bibliography

Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4), 303–314.

