Notes
7 kNN
- non-parametric: the training data itself is the model
- testing is slow (every query is compared against the training data)
- learns nothing from the data
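a minimal NumPy sketch of the idea; the toy points, k, and the L2 distance are illustrative choices, not from the notes:

    import numpy as np

    # kNN "training" is just storing the data
    X_train = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    y_train = np.array([0, 0, 1, 1])

    def knn_predict(x, k=3):
        dists = np.linalg.norm(X_train - x, axis=1)  # distance to every stored point
        nearest = y_train[np.argsort(dists)[:k]]     # labels of the k closest points
        return np.bincount(nearest).argmax()         # majority vote

    print(knn_predict(np.array([0.9, 0.2])))  # -> 1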
8 parameterized learning
- parametric: the parameters W, b are the model
- testing is very fast (a single matrix multiply per example)
- learns the parameters from the training data
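a minimal sketch of the linear scoring function f(x) = Wx + b; the shapes (3 classes, 4 features) and random values are made up:

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.standard_normal((3, 4))  # 3 classes x 4 input features
    b = rng.standard_normal(3)
    x = rng.standard_normal(4)       # one flattened input example

    scores = W @ x + b               # one score per class
    print(scores.argmax())           # predicted class = highest score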
multi-class SVM loss / hinge loss
an SVM classifier uses a hinge-style loss; two variants:
- hinge loss
- squared hinge loss
worked example (per-image losses):
- L1 = 0 (correct prediction)
- L2 = 5.96 (incorrect prediction)
- L3 = 5.20 (incorrect prediction)
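a sketch of both variants on toy scores (the numbers here are illustrative, not the L1/L2/L3 values above):

    import numpy as np

    def hinge_loss(scores, y, delta=1.0, squared=False):
        # margin of every wrong class over the correct class score
        margins = np.maximum(0, scores - scores[y] + delta)
        margins[y] = 0  # the correct class contributes no loss
        return np.sum(margins ** 2) if squared else np.sum(margins)

    scores = np.array([3.2, 5.1, -1.7])           # toy scores, correct class y=0
    print(hinge_loss(scores, y=0))                # 2.9
    print(hinge_loss(scores, y=0, squared=True))  # 9.41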
cross-entropy loss and softmax classifiers
Softmax classifiers give you probabilities for each class label, while hinge loss gives you the margin.
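a matching sketch for the softmax / cross-entropy side, reusing the same toy scores:

    import numpy as np

    def cross_entropy(scores, y):
        exp = np.exp(scores - scores.max())  # shift for numerical stability
        probs = exp / exp.sum()              # softmax: scores -> probabilities
        return -np.log(probs[y])             # negative log-prob of the true class

    print(cross_entropy(np.array([3.2, 5.1, -1.7]), y=0))  # ~2.04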
9 optimization
GD and SGD
- basic gradient descent (GD): evaluate all training data, then update the weights once per epoch
- stochastic gradient descent (SGD): evaluate only a mini-batch of training data, then update the weights once per batch
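a sketch of the SGD loop; grad_fn, lr, and batch_size are placeholders (plain GD is the special case batch_size = n):

    import numpy as np

    def sgd(X, y, W, grad_fn, lr=0.01, batch_size=32, epochs=10):
        # grad_fn(X_batch, y_batch, W) is assumed to return dLoss/dW
        n = X.shape[0]
        for _ in range(epochs):
            idx = np.random.permutation(n)  # reshuffle each epoch
            for start in range(0, n, batch_size):
                batch = idx[start:start + batch_size]
                W -= lr * grad_fn(X[batch], y[batch], W)  # one update per batch
        return W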
Extensions to SGD
- momentum-based SGD
- Nesterov accelerated SGD
tip: use momentum-based SGD in practice
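a sketch of the momentum update (mu ~ 0.9 is a common choice); Nesterov differs only in evaluating the gradient at the looked-ahead point W + mu * v:

    import numpy as np

    def momentum_step(W, v, grad, lr=0.01, mu=0.9):
        # velocity = decaying sum of past gradients; smooths the descent direction
        v = mu * v - lr * grad
        return W + v, v

    W, v = np.zeros(3), np.zeros(3)
    W, v = momentum_step(W, v, grad=np.ones(3))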
regularization
penalties added to the original cost (sketched below):
- L1 regularization
- L2 regularization(weight decay)
- Elastic Net regularization
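a sketch of the three penalties; lam and alpha are tunable hyperparameters:

    import numpy as np

    def reg_penalty(W, lam=1e-4, alpha=0.5, kind="l2"):
        l1 = np.abs(W).sum()
        l2 = (W ** 2).sum()
        if kind == "l1":
            return lam * l1
        if kind == "l2":  # a.k.a. weight decay
            return lam * l2
        # elastic net: weighted mix of L1 and L2
        return lam * (alpha * l1 + (1 - alpha) * l2)

    # total cost = data_loss + reg_penalty(W)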
applied during the training process:
- dropout
- data augmentation
- early stopping (no-improvement-in-N)
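of these, dropout is the easiest to sketch; this is the inverted-dropout formulation with drop probability p (a common convention, assumed here):

    import numpy as np

    def dropout(a, p=0.5, train=True):
        if not train:
            return a  # no-op at test time with inverted dropout
        # zero each activation with probability p, rescale survivors by 1/(1-p)
        mask = (np.random.rand(*a.shape) >= p) / (1.0 - p)
        return a * mask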
History
- 20190709: created.