Lecture Notes
* These are my notes from studying Professor Andrew Ng's Deep Learning Specialization course on Coursera.
* I am writing these summaries in English to practice my English. If anything is wrong or sounds off, I would appreciate it if you let me know in a comment or kindly overlook it.
1. Neural networks overview
I learned about the new notation and the representation of the shallow neural network above. Square brackets, as in a^[1], denote the layer number, and a superscript (i), as in x^(i), denotes the i-th training example. For example, a^[2](i) is the activation of layer 2 computed on the i-th training example.
2. Neural network representation
We can denote the input layer x as a^[0], called layer zero, so technically there are three layers in a "two-layer neural network"; by convention the input layer is simply not counted.
A two-layer neural network consists of the following (parameter shapes are sketched in code after this list):
- Input layer a^[0]
- Hidden layer a^[1] (including W^[1], b^[1])
- Output layer a^[2] (including W^[2], b^[2])
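As an illustration (not code from the lecture), here is a minimal NumPy sketch of this representation for a single example, assuming n_x = 3 input features, n_h = 4 hidden units, a tanh hidden layer, and a sigmoid output layer; the sizes are arbitrary.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_x, n_h = 3, 4                          # input features, hidden units (illustrative)

# Layer 1 (hidden) and layer 2 (output) parameters
W1 = np.random.randn(n_h, n_x) * 0.01    # shape (n_h, n_x)
b1 = np.zeros((n_h, 1))                  # shape (n_h, 1)
W2 = np.random.randn(1, n_h) * 0.01      # shape (1, n_h)
b2 = np.zeros((1, 1))                    # shape (1, 1)

x  = np.random.randn(n_x, 1)             # a^[0], one example as a column vector
z1 = W1 @ x + b1                         # z^[1]
a1 = np.tanh(z1)                         # a^[1]
z2 = W2 @ a1 + b2                        # z^[2]
a2 = sigmoid(z2)                         # a^[2], the prediction y_hat
```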
3. Vectorizing Across Multiple Examples
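The point of this step is that instead of looping over the m training examples, you stack them as the columns of a matrix X of shape (n_x, m), so that one matrix product computes Z^[1] = W^[1]X + b^[1] (with b^[1] broadcast across columns) for all examples at once. A minimal NumPy sketch, with illustrative sizes and the same tanh/sigmoid choices as above:

```python
import numpy as np

n_x, n_h, m = 3, 4, 5                    # features, hidden units, examples (illustrative)
W1, b1 = np.random.randn(n_h, n_x), np.zeros((n_h, 1))
W2, b2 = np.random.randn(1, n_h), np.zeros((1, 1))

X  = np.random.randn(n_x, m)             # column i is x^(i)
Z1 = W1 @ X + b1                         # shape (n_h, m); b1 broadcasts over columns
A1 = np.tanh(Z1)                         # A1[:, i] is a^[1](i)
Z2 = W2 @ A1 + b2                        # shape (1, m)
A2 = 1.0 / (1.0 + np.exp(-Z2))           # A2[0, i] is the prediction for example i
```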
4. Activation functions
- Sigmoid function
  - Compared to sigmoid, tanh is pretty much strictly superior, except for the output layer of binary classification, where the output should lie in (0, 1).
  - Derivative: g'(z) = g(z)(1 - g(z))
- tanh
  - Derivative: g'(z) = 1 - (g(z))^2
- ReLU
- Leaky ReLU (all four are sketched in code after this list)
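A minimal NumPy sketch of the four activations, for reference; the 0.01 slope used for Leaky ReLU is a common default, not a value fixed by the lecture.

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """g(z) = tanh(z), output in (-1, 1) and zero-centered."""
    return np.tanh(z)

def relu(z):
    """g(z) = max(0, z)."""
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    """g(z) = z for z > 0, alpha * z otherwise (alpha is a small slope)."""
    return np.where(z > 0, z, alpha * z)
```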
5. Why do you need a non-linear activation function?
Z^[1] = W^[1]x + b^[1];  A^[1] = Z^[1]  (i.e., the identity activation g(z) = z)
Z^[2] = W^[2]A^[1] + b^[2] = W^[2]W^[1]x + W^[2]b^[1] + b^[2] = W'x + b'
"W'x + b'" is a linear function. If you don't have an activation function, then no matter how many layers your neural network has, all it's doing is just computing a linear activation function. So you might as well not have any hidden layers.
6. Derivatives of Activation Functions
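Collecting the derivatives used in backpropagation: the sigmoid and tanh formulas listed above, plus the piecewise derivatives of ReLU and Leaky ReLU. A minimal NumPy sketch (the value chosen at z = 0 for the ReLU variants is a convention, since the derivative is undefined there):

```python
import numpy as np

def sigmoid_prime(z):
    """g'(z) = g(z) * (1 - g(z)) for g(z) = sigmoid(z)."""
    g = 1.0 / (1.0 + np.exp(-z))
    return g * (1.0 - g)

def tanh_prime(z):
    """g'(z) = 1 - (g(z))^2 for g(z) = tanh(z)."""
    return 1.0 - np.tanh(z) ** 2

def relu_prime(z):
    """g'(z) = 1 if z > 0 else 0 (value at z = 0 chosen by convention)."""
    return (z > 0).astype(float)

def leaky_relu_prime(z, alpha=0.01):
    """g'(z) = 1 if z > 0 else alpha."""
    return np.where(z > 0, 1.0, alpha)
```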