
[Deep Learning Specialization] Neural Networks and Deep Learning - week 2

by seowit 2021. 8. 14.

๐Ÿ“œ ๊ฐ•์˜ ์ •๋ฆฌ 

* These are my notes from studying Andrew Ng's Deep Learning Specialization course on Coursera.

* I am writing these notes in English to practice the language. If you find mistakes or awkward phrasing, I would appreciate it if you let me know in the comments, or feel free to just pass over them.


Logistic Regression as a Neural Network

 

Binary classification takes an input x (for example, an image unrolled into a feature vector) and produces an output label y in {0, 1}. For example, y = 1 if the image contains a cat and y = 0 if it does not. Logistic regression outputs ŷ, an estimate of the probability that y = 1:

$$
\hat{y}=\sigma\left(w^{T} x+b\right) \text { , where } \sigma(z)=\frac{1}{1+e^{-z}}
$$

$$
\hat{y}^{(i)}=\sigma\left(w^{T} x^{(i)}+b\right) \text { , where } \sigma(z^{(i)})=\frac{1}{1+e^{-z^{(i)}}}
$$

$$z^{(i)}=w^{T} x^{(i)}+b$$

$^{(i)}$ denotes the i-th training example in the dataset X.

(Image source: Deep Learning Specialization - Andrew Ng)
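As a quick check of these formulas, here is a minimal NumPy sketch that computes ŷ = σ(wᵀx + b) for a single example (the variable names and toy values are mine, not from the course):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation: 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

# Toy example: nx = 3 features for a single input x.
x = np.array([0.5, -1.2, 3.0])   # input features, shape (nx,)
w = np.array([0.1, 0.4, -0.2])   # weights, shape (nx,)
b = 0.3                          # bias (scalar)

z = np.dot(w, x) + b             # z = w^T x + b
y_hat = sigmoid(z)               # y_hat is in (0, 1)
print(y_hat)
```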

    • Loss function
      • Loss func1. $\mathscr{L}(\hat{y}, y)=(y-\hat{y})^{2}$
      • Loss func2. $\mathscr{L}(\hat{y}, y)=-(y \log (\hat{y})+(1-y) \log (1-\hat{y}))$
      • Func2 is preferred over func1 because of optimization: with the squared error the optimization problem becomes non-convex and has many local optima, while the cross-entropy loss gives a convex problem with a single global optimum.
    • Cost function
      • The cost function J, which is applied to the parameters w and b, is the average of the loss function over all m training examples (a short NumPy sketch follows below).
      • $J(w, b)=\frac{1}{m} \sum_{i=1}^{m} \mathscr{L}\left(\hat{y}^{(i)}, y^{(i)}\right)=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log (\hat{y}^{(i)})+(1-y^{(i)}) \log (1-\hat{y}^{(i)})\right]$
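A minimal sketch of how this cost could be computed, assuming the shape conventions used later in the vectorization sections (X of shape (nx, m), labels Y of shape (1, m)); the toy data and function names are mine:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, X, Y):
    """Cross-entropy cost J(w, b) averaged over the m examples.
    X: (nx, m), Y: (1, m), w: (nx, 1), b: scalar."""
    A = sigmoid(np.dot(w.T, X) + b)   # predictions y_hat, shape (1, m)
    return -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

# Toy data: nx = 2 features, m = 4 examples.
X = np.array([[1.0, 2.0, -1.0, 0.5],
              [0.0, 1.5,  2.0, -0.5]])
Y = np.array([[1, 1, 0, 0]])
w = np.zeros((2, 1))
b = 0.0
print(cost(w, b, X, Y))   # about 0.693 = log(2) when all predictions are 0.5
```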

 

3. Gradient Descent

We want to find w, b to minimize J(w, b).

J is a convex function with a single global optimum because it is built on the cross-entropy loss $\mathscr{L}=-(y \log (\hat{y})+(1-y) \log (1-\hat{y}))$ rather than the squared error.

At each iteration, w and b take a step in the steepest downhill direction to move toward the optimal point (a toy numerical sketch appears below the figure source):

  • w := w - α*dw
  • b := b - α*db

where α is the learning rate and dw, db are the derivatives of J with respect to w and b.

(Image source: Deep Learning Specialization - Andrew Ng)
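To make the update rule concrete, here is a toy sketch that runs w := w - α*dw on a simple one-dimensional convex function J(w) = (w - 3)²; the function, starting point, and learning rate are arbitrary choices for illustration, not from the course:

```python
# Toy illustration of the update rule w := w - alpha * dw on the convex
# function J(w) = (w - 3)^2, whose derivative is dJ/dw = 2 * (w - 3).
alpha = 0.1          # learning rate (arbitrary)
w = 0.0              # arbitrary starting point

for step in range(100):
    dw = 2 * (w - 3)        # derivative of J at the current w
    w = w - alpha * dw      # step in the steepest downhill direction

print(w)   # converges toward the global optimum w = 3
```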

 

4. Derivatives

The slope (gradient, derivative) of a linear function is the same at every point on the graph.

5. More Derivative Examples

We can calculate a derivative in two ways: analytically, by deriving the derivative function with calculus, or numerically, by evaluating the function at x and at a nearby point such as x + 0.001 and taking the ratio of the change in the function value to the change in x.

(Image source: Deep Learning Specialization - Andrew Ng)
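For example, a small sketch comparing the derivative from calculus with the "nudge x by 0.001" estimate for f(x) = x² (the function and step size are just illustrative):

```python
def f(x):
    return x ** 2

x = 2.0
eps = 0.001

analytic = 2 * x                      # derivative from calculus: f'(x) = 2x
numeric = (f(x + eps) - f(x)) / eps   # finite-difference approximation

print(analytic, numeric)   # 4.0 vs. roughly 4.001
```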

 

6. Computation Graph 

The computations of a neural network are organized in terms of a forward pass, in which we compute the output of the network, followed by a backward pass (backpropagation), in which we compute the gradients.

The computation graph organizes these computations with blue arrows for the left-to-right (forward) computation and red arrows for the right-to-left (backward) computation.
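A small worked example in the spirit of the lecture's graph J = 3(a + b·c), showing the left-to-right forward pass and the right-to-left derivative computation (the specific values are illustrative):

```python
# Forward pass (left to right): compute J = 3 * (a + b * c) step by step.
a, b, c = 5, 3, 2
u = b * c          # u = 6
v = a + u          # v = 11
J = 3 * v          # J = 33

# Backward pass (right to left): propagate derivatives with the chain rule.
dJ_dv = 3                  # because J = 3v
dJ_du = dJ_dv * 1          # because v = a + u
dJ_da = dJ_dv * 1
dJ_db = dJ_du * c          # because u = b * c
dJ_dc = dJ_du * b

print(J, dJ_da, dJ_db, dJ_dc)   # 33, 3, 6, 9
```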

8. Logistic Regression Gradient Descent

For logistic regression with a single training example, going backward through the computation graph gives dz = a - y, dw_1 = x_1 * dz, dw_2 = x_2 * dz, and db = dz, where a = ŷ is the prediction.

(Image source: Deep Learning Specialization - Andrew Ng)
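A small numeric sketch of those per-example derivatives for two input features (all values are arbitrary; the variable names mirror the slide notation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One training example with two features.
x1, x2 = 1.0, 2.0
y = 1
w1, w2, b = 0.5, -0.3, 0.1

# Forward pass.
z = w1 * x1 + w2 * x2 + b
a = sigmoid(z)

# Backward pass: derivatives of the loss L(a, y) w.r.t. the parameters.
dz = a - y          # dL/dz
dw1 = x1 * dz       # dL/dw1
dw2 = x2 * dz       # dL/dw2
db = dz             # dL/db
print(dz, dw1, dw2, db)
```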

9. Gradient Descent on m examples

Figure 1. Gradient descent on m examples with explicit for loops (source: Deep Learning Specialization - Andrew Ng)

The value of dw in the code is accumulated over the training examples, so there is only one dw variable per feature rather than one per example.

But this method needs two for loops, so it takes too much time: the first loop runs over the m training examples ("for i = 1 to m"), and the second loop runs over the features when computing dw_1, dw_2, ..., and db. If we vectorize the weights w_1, ..., w_n into a single vector, we can get rid of the inner loop and use only one for loop. A sketch of the two-loop version follows below.
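Here is what that two-loop version might look like in plain Python, with the inner per-feature loop unrolled for just two features as on the slide (the toy data and exact variable names are my own guesses at the pseudo-code in Figure 1):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: 2 features, m = 3 examples; X has shape (2, m), Y has shape (m,).
X = np.array([[1.0, 2.0, -1.0],
              [0.5, -0.5, 1.5]])
Y = np.array([1, 0, 1])
w1, w2, b = 0.0, 0.0, 0.0
m = X.shape[1]

J, dw1, dw2, db = 0.0, 0.0, 0.0, 0.0     # cumulative accumulators
for i in range(m):                       # first loop: over the m examples
    z = w1 * X[0, i] + w2 * X[1, i] + b
    a = sigmoid(z)
    J += -(Y[i] * np.log(a) + (1 - Y[i]) * np.log(1 - a))
    dz = a - Y[i]
    dw1 += X[0, i] * dz                  # inner per-feature loop, unrolled
    dw2 += X[1, i] * dz
    db += dz

J, dw1, dw2, db = J / m, dw1 / m, dw2 / m, db / m
print(J, dw1, dw2, db)
```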


Python and Vectorization

 

1. Vectorization

 

(Image source: Deep Learning Specialization - Andrew Ng)

  • The NumPy library provides lots of built-in functions, so use them: they make the code both simpler and faster.
  • How to get rid of the inner for loop in Figure 1 (in Gradient Descent on m examples):
    • There are n repetitions to calculate dw_1, dw_2, ..., dw_n. If dw_1, ..., dw_n are vectorized into a single vector dw, the inner for loop can be replaced with dw += x_i * dz_i.
    • Matrix/vector multiplication example: see the timing sketch below.
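For instance, a sketch of the classic timing comparison between NumPy's built-in np.dot and an explicit Python loop (the array size and the use of time.time() are my choices; absolute timings will vary by machine):

```python
import time
import numpy as np

n = 1_000_000
a = np.random.rand(n)
b = np.random.rand(n)

# Vectorized dot product using NumPy's built-in function.
tic = time.time()
c = np.dot(a, b)
toc = time.time()
print("Vectorized:", c, (toc - tic) * 1000, "ms")

# Explicit for loop computing the same dot product.
tic = time.time()
c = 0.0
for i in range(n):
    c += a[i] * b[i]
toc = time.time()
print("For loop:  ", c, (toc - tic) * 1000, "ms")
```

On most machines the vectorized version is a few hundred times faster, which is the point of the demo.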

3. Vectorizing Logistic Regression

Z = np.dot(W_t, X) + b

X.shape = (nx, m) and w.shape = (nx, 1), so W_t = wᵀ has shape (1, nx) and Z.shape = (1, m); b is a scalar that Python broadcasts across the m columns.
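Putting the shapes together, a minimal sketch of the vectorized forward pass (names and random toy data are mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

nx, m = 3, 5
X = np.random.rand(nx, m)        # m training examples stacked as columns
w = np.random.rand(nx, 1)        # weight vector, shape (nx, 1)
b = 0.0                          # scalar bias, broadcast across the m columns

Z = np.dot(w.T, X) + b           # shape (1, m): one z^(i) per example
A = sigmoid(Z)                   # shape (1, m): one prediction y_hat^(i) per example
print(Z.shape, A.shape)
```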

4. Vectorizing Logistic Regression's Gradient Output
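The lecture also vectorizes the gradients; here is a minimal sketch under the same shape conventions, using the standard formulas dZ = A - Y, dw = (1/m)·X·dZᵀ, db = (1/m)·Σdz (the toy data is mine):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

nx, m = 3, 5
X = np.random.rand(nx, m)
Y = np.random.randint(0, 2, size=(1, m))   # labels in {0, 1}
w = np.zeros((nx, 1))
b = 0.0

A = sigmoid(np.dot(w.T, X) + b)   # forward pass, shape (1, m)
dZ = A - Y                        # shape (1, m)
dw = np.dot(X, dZ.T) / m          # shape (nx, 1), gradient of J w.r.t. w
db = np.sum(dZ) / m               # scalar, gradient of J w.r.t. b
print(dw.shape, db)
```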

5. Broadcasting in Python

* axis=0 means summing vertically (down the columns), and axis=1 means summing horizontally (across the rows).

How can you divide a three by four matrix by a one by four matrix? By broadcasting!

(Image source: Deep Learning Specialization - Andrew Ng)

  • General principle: if you take an (m, n) matrix and combine it with a (1, n) or (m, 1) matrix, Python will first copy the smaller one m (or n) times into an (m, n) matrix,
  • and then apply the addition, subtraction, multiplication, or division element-wise (see the sketch below).
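For example, a sketch of the food-calorie computation that the (3, 4) ÷ (1, 4) question above comes from (the numbers are roughly those on the slide, but treat them as illustrative):

```python
import numpy as np

# Calories from carbs / protein / fat (rows) for four foods (columns).
A = np.array([[56.0,   0.0,  4.4, 68.0],
              [ 1.2, 104.0, 52.0,  8.0],
              [ 1.8, 135.0, 99.0,  0.9]])

cal = A.sum(axis=0)                        # sum vertically -> shape (4,)
percentage = 100 * A / cal.reshape(1, 4)   # (3, 4) divided by (1, 4) via broadcasting
print(percentage)
```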

 

 
