Logistic regression is an algorithm for binary classification.
Binary classification maps an input x (for example, an image) to an output y in {0, 1}. For example, you might have an input image and
want to output a label that recognizes this image as either being a cat, in which case you output 1, or not-cat, in which case you output 0.
There is a notation guide in the lecture.
2. Logistic Regression
Logistic Regression is a learning algorithm that you use when the output labels Y in a supervised learning problem are all either zero or one, i.e., for binary classification problems.
$$ \text { Given }\left\{\left(x^{(1)}, y^{(1)}\right), \ldots,\left(x^{(m)}, y^{(m)}\right)\right\}, \text { want } \hat{y}^{(i)} \approx y^{(i)} $$ we generate the output Y hat as follows.
$$ \hat{y}=\sigma\left(w^{T} x+b\right) \text { , where } \sigma(z)=\frac{1}{1+e^{-z}} $$
The sigmoid function keeps $\hat{y}$ in (0, 1).
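As a minimal sketch in NumPy (the function names are my own, not the lecture's), this is one way to compute $\hat{y}$ for a single example:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, x, b):
    """y_hat = sigma(w^T x + b) for one example; w and x are (nx,) vectors."""
    return sigmoid(np.dot(w, x) + b)
```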
3. Logistic Regression Cost Function
$$ \hat{y}^{(i)}=\sigma\left(w^{T} x^{(i)}+b\right) \text { , where } \sigma(z^{(i)})=\frac{1}{1+e^{-z^{(i)}}} $$
$$z^{(i)}=w^{T} x^{(i)}+b$$
The superscript $^{(i)}$ refers to the $i$-th training example in the dataset.
Source: Deep Learning Specialization - Andrew Ng
Loss function
The loss function measures the difference between $\hat{y}$ and the true label $y$.
Loss func1. $\mathscr{L}=(y-\hat{y})^{2}$
Loss func2. $\mathscr{L}=-(y \log (\hat{y})+(1-y) \log (1-\hat{y}))$
Func2 is preferred over func1 for optimization reasons: with the squared-error loss, the logistic regression optimization problem is non-convex and has many local optima, while the cross-entropy loss of func2 gives a convex problem with a single global optimum.
Cost function
The cost function J, which is applied to the parameters w and b, is the average of the loss function over all training examples.
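Written out with the loss from func2, this is:

$$ J(w, b)=\frac{1}{m} \sum_{i=1}^{m} \mathscr{L}\left(\hat{y}^{(i)}, y^{(i)}\right)=-\frac{1}{m} \sum_{i=1}^{m}\left[y^{(i)} \log \hat{y}^{(i)}+\left(1-y^{(i)}\right) \log \left(1-\hat{y}^{(i)}\right)\right] $$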
The computations of a neural network are organized in terms of a forward pass, in which we compute the output of the NN, followed by a backward pass (backpropagation), which we use to compute gradients.
The computation graph organizes the computations with blue arrows for the left-to-right (forward) computation and red arrows for the right-to-left (backward) computation.
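A tiny sketch of both passes on the kind of example used in the lecture, $J = 3(a + bc)$:

```python
# Forward pass (left to right): J = 3 * (a + b * c)
a, b, c = 5.0, 3.0, 2.0
u = b * c          # u = 6
v = a + u          # v = 11
J = 3 * v          # J = 33

# Backward pass (right to left): chain rule, one node at a time
dJ_dv = 3.0                # dJ/dv
dJ_du = dJ_dv * 1.0        # dv/du = 1
dJ_da = dJ_dv * 1.0        # dv/da = 1  -> dJ/da = 3
dJ_db = dJ_du * c          # du/db = c  -> dJ/db = 6
dJ_dc = dJ_du * b          # du/dc = b  -> dJ/dc = 9
```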
8. Logistic Regression Gradient Descent
We compute the derivatives of the cost function J with respect to each parameter: w_1, w_2, and b.
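For a single example with $a = \hat{y} = \sigma(z)$, these derivatives (in the lecture's shorthand, where $dz$ stands for $\frac{\partial \mathscr{L}}{\partial z}$) work out to:

$$ dz = a - y, \qquad dw_1 = x_1\, dz, \qquad dw_2 = x_2\, dz, \qquad db = dz $$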
Source: Deep Learning Specialization - Andrew Ng
9. Gradient Descent on m examples
Figure 1
The value of dw in the code is cumulative, so there is only one dw variable.
But this method needs two for loops, so it takes too much time. The first loop runs over the examples, i = 1 to m; the second loop runs over the features to compute dw_1, dw_2, ..., dw_n. If we vectorize the variables w_1, ..., w_n into one vector, we can use only one for loop (see the sketches below).
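A minimal sketch of that two-loop version (NumPy; the shapes and names are my assumptions, not the lecture's exact code):

```python
import numpy as np

def grads_two_loops(w, b, X, y):
    """Cost and gradients over m examples with explicit loops.
    X: (nx, m) data, y: (m,) labels, w: (nx,) weights, b: scalar."""
    nx, m = X.shape
    J, db = 0.0, 0.0
    dw = np.zeros(nx)                  # one cumulative dw, as noted above
    for i in range(m):                 # first loop: over the m examples
        z = np.dot(w, X[:, i]) + b
        a = 1.0 / (1.0 + np.exp(-z))
        J += -(y[i] * np.log(a) + (1 - y[i]) * np.log(1 - a))
        dz = a - y[i]
        for j in range(nx):            # second loop: over the n features
            dw[j] += X[j, i] * dz
        db += dz
    return J / m, dw / m, db / m
```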
Python and Vectorization
1. Vectorization
Vectorization is basically the art of getting rid of explicit for loops in code.
np.dot(w, x): for 1-D arrays, this computes the inner product $w^T x$.
In the lecture's demo it is more than 300 times faster than an explicit for loop, because NumPy uses parallel (SIMD) instructions.
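A quick way to check this yourself (the exact factor depends on your hardware):

```python
import time
import numpy as np

n = 1_000_000
w = np.random.rand(n)
x = np.random.rand(n)

t0 = time.time()
c = np.dot(w, x)                  # vectorized inner product
t_vec = time.time() - t0

t0 = time.time()
c = 0.0
for i in range(n):                # explicit for loop
    c += w[i] * x[i]
t_loop = time.time() - t0

print(f"vectorized: {1000 * t_vec:.2f} ms, loop: {1000 * t_loop:.2f} ms")
```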
2. More Vectorization Examples
Neural Network programming guideline
Whenever possible, avoid explicit for-loops.
Source: Deep Learning Specialization - Andrew Ng
The NumPy library provides lots of built-in functions, so use them; they make code simpler and faster.
How to get rid of the inner for loop in Figure 1 (in Gradient Descent on m examples)
The inner loop repeats n times to calculate dw_1, dw_2, ..., dw_n. If dw_1, ..., dw_n are vectorized into one vector dw, the inner for loop can be replaced with dw += x^(i) * dz^(i), as in the sketch below.
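A sketch with the inner loop vectorized away (only the outer loop over the m examples remains):

```python
import numpy as np

def grads_one_loop(w, b, X, y):
    """Same gradients as before, but the feature loop is vectorized away.
    X: (nx, m), y: (m,), w: (nx,), b: scalar."""
    nx, m = X.shape
    dw, db = np.zeros(nx), 0.0
    for i in range(m):                 # only the loop over examples is left
        z = np.dot(w, X[:, i]) + b
        a = 1.0 / (1.0 + np.exp(-z))
        dz = a - y[i]
        dw += X[:, i] * dz             # replaces the dw_1 ... dw_n inner loop
        db += dz
    return dw / m, db / m
```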
Matrix multiplication example
3. Vectorizing Logistic Regression
Z = np.dot(w.T, X) + b
X.shape = (nx, m), w.shape = (nx, 1); b is a scalar, which Python broadcasts to a (1, m) row vector.
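Putting the forward and backward computations together, a sketch of one fully vectorized pass (the function name `propagate` is my own choice):

```python
import numpy as np

def propagate(w, b, X, Y):
    """One vectorized forward/backward pass over all m examples.
    X: (nx, m), Y: (1, m), w: (nx, 1), b: scalar (broadcast by Python)."""
    m = X.shape[1]
    Z = np.dot(w.T, X) + b            # (1, m)
    A = 1.0 / (1.0 + np.exp(-Z))      # (1, m), y_hat for every example
    dZ = A - Y                        # (1, m)
    dw = np.dot(X, dZ.T) / m          # (nx, 1)
    db = np.sum(dZ) / m               # scalar
    return dw, db
```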
* axis=0 means to sum vertically (down the columns); axis=1 means to sum horizontally (across the rows)
How can you divide a three by four matrix by a one by four matrix? By broadcasting!
Source: Deep Learning Specialization - Andrew Ng
The general principle is:
1. If you have an (m, n) matrix and you add (+), subtract (-), multiply (*), or divide (/) it with a (1, n) matrix, then the (1, n) matrix is copied m times into an (m, n) matrix.
2. Then the addition, subtraction, multiplication, or division is applied element-wise.
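For example, normalizing the columns of a (3, 4) matrix by its (1, 4) column sums:

```python
import numpy as np

A = np.array([[56.0, 0.0, 4.4, 68.0],
              [1.2, 104.0, 52.0, 8.0],
              [1.8, 135.0, 99.0, 0.9]])    # (3, 4)
cal = A.sum(axis=0)                        # (4,) column sums (vertical)
percentage = 100 * A / cal.reshape(1, 4)   # (3,4) / (1,4): row copied 3 times
print(percentage)
```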
6. A Note on Python/Numpy Vectors
"np.random.randn(5)" creates five random Gaussian variables - its shape is (5, ) called rank 1 array. More clear respresentation is "np.random.rand(5, 1)". It generate column vector, and have a transpose which shape is (1, 5).
Check the shapes of arrays with assert(a.shape == (5, 1)).
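A short illustration of why rank 1 arrays are error-prone:

```python
import numpy as np

a = np.random.randn(5)       # rank 1 array, shape (5,)
print(a.shape)               # (5,)
print(a.T.shape)             # (5,) -- transpose does nothing here
print(np.dot(a, a.T))        # a scalar, not an outer product

b = np.random.randn(5, 1)    # proper column vector, shape (5, 1)
print(b.T.shape)             # (1, 5)
print(np.dot(b, b.T).shape)  # (5, 5) outer product, as expected
assert b.shape == (5, 1)     # catch shape bugs early
```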