
[Convolutional Neural Networks] Special applications: Face recognition & Neural style transfer :: seoftware

by seowit 2021. 9. 27.

📜 Lecture summary

* These are my notes from studying Professor Andrew Ng's Convolutional Neural Networks course on Coursera.

* Image source: Deeplearning.AI

 


Face Recognition

What is Face Recognition?

Face recognition is a different problem from face verification. The biggest difference is that face verification is a 1:1 comparison (is this the claimed person?), whereas face recognition is a 1:N comparison (which of N known people, if any, is this?).

 

One Shot Learning

One-shot learning means classifying from a single photo (example) per person. It is one of the problems that face recognition has to solve. In the example below, the task is to look at the photo on the right and decide whether it shows the same person as the photo on the left. A similarity function is used to measure how alike the two images are. It is written d(img1, img2) and measures the degree of difference, so a smaller value means the images are more alike: if the result of d is at or below a threshold, the two images are judged to be the same person, and if it is above the threshold, different people.
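The thresholding rule above can be sketched as follows. This is only an illustration under stated assumptions: the "images" are tiny made-up feature vectors, `distance` is a squared Euclidean distance standing in for d, and the names `is_same_person`, `recognize`, `database`, and the threshold value are hypothetical, not taken from the lecture.

```python
def distance(img1, img2):
    """Toy stand-in for d(img1, img2): squared Euclidean distance (an assumption)."""
    return sum((a - b) ** 2 for a, b in zip(img1, img2))


def is_same_person(img1, img2, threshold):
    """Verification (1:1): same person if d is at or below the threshold."""
    return distance(img1, img2) <= threshold


def recognize(query, database, threshold):
    """Recognition (1:N): compare the query against one stored example per person."""
    best_name, best_d = None, None
    for name, reference in database.items():
        d = distance(query, reference)
        if d <= threshold and (best_d is None or d < best_d):
            best_name, best_d = name, d
    return best_name  # None means no stored person is close enough


# One reference example per person (the one-shot setting); values are made up.
database = {"alice": [0.1, 0.9], "bob": [0.8, 0.2]}
print(recognize([0.15, 0.85], database, threshold=0.05))  # closest match under the threshold
print(recognize([0.5, 0.5], database, threshold=0.05))    # nobody is close enough
```

Adding a new person only requires adding one entry to `database`; nothing has to be retrained, which is the point of comparing with d instead of training a classifier over a fixed set of people.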

 

Siamese Network

The function d mentioned above, which measures the similarity between two face images, can be implemented with a Siamese Network. The idea of a Siamese Network is to run the same conv layers, with the same parameters, on two different inputs and then compare the outputs. The paper that presents this is DeepFace, which I have looked at once before. I would like to do a paper review of it some time. Professor Ng also recommended reading the FaceNet paper, which comes up later.

 

The input image x is encoded as f(x). The network is then trained so that the distance function d gives a large value for different people and a small value for the same person. For example, given images x1 and x2 of two different people, x1 and x2 are encoded as f(x1) and f(x2), and training pushes d(f(x1), f(x2)) to be large. With N people, images are paired up two at a time across all N people and the process above is applied to each pair.
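A minimal sketch of the shared-encoder idea, assuming a toy encoder (one linear layer plus ReLU) in place of the real conv layers and d defined as the squared L2 distance between the two encodings. The names `encode`, `siamese_distance`, and `shared_weights`, and all numeric values, are hypothetical.

```python
def encode(x, weights):
    """Toy encoder f(x): one linear layer followed by ReLU (stands in for the conv layers)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]


def siamese_distance(x1, x2, weights):
    """d(x1, x2) = ||f(x1) - f(x2)||^2, with the SAME weights used for both inputs."""
    f1 = encode(x1, weights)
    f2 = encode(x2, weights)
    return sum((a - b) ** 2 for a, b in zip(f1, f2))


# Hypothetical 2x3 weight matrix shared by both branches.
shared_weights = [[1.0, 0.0, -1.0],
                  [0.5, 0.5, 0.5]]

x1 = [1.0, 2.0, 0.0]
x2 = [0.0, 2.0, 1.0]
print(siamese_distance(x1, x1, shared_weights))  # identical inputs give distance 0.0
print(siamese_distance(x1, x2, shared_weights))  # different inputs give a positive distance here
```

Because both inputs go through one set of weights, training only has to adjust a single encoder so that the resulting distances come out small for the same person and large for different people.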

Triplet Loss

The Triplet Loss Function is used to train the parameters of the network so that it produces good encodings of face images. Each training example needs three images: an Anchor image, a Positive image, and a Negative image. The Anchor image is the reference image, the Positive image is another image of the same person, and the Negative image is an image of a different person. The distance between Anchor and Positive, d(A, P), should come out smaller than the distance between Anchor and Negative, d(A, N), and this is where the margin parameter α is used. Without α, the condition d(A, P) - d(A, N) ≤ 0 is already satisfied when d(A, P) and d(A, N) are equal (for example, if the network outputs the same encoding for every image), so the network could learn something other than what was intended. Requiring d(A, P) - d(A, N) + α ≤ 0 with a positive margin, such as α = 0.2, rules this out.
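As a sketch of how the margin enters the loss, assuming d is the squared L2 distance between encodings and the loss for one triplet is max(d(A, P) - d(A, N) + α, 0). The function names and the example encodings are hypothetical, and the encodings are assumed to be already computed by the network.

```python
def squared_distance(u, v):
    """Squared L2 distance between two encodings."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(f_a, f_p, f_n, alpha=0.2):
    """max(d(A, P) - d(A, N) + alpha, 0) on already-computed encodings."""
    return max(squared_distance(f_a, f_p) - squared_distance(f_a, f_n) + alpha, 0.0)


anchor = [0.0, 1.0]
positive = [0.0, 1.0]
far_negative = [1.0, 0.0]

# Degenerate case: every image gets the same encoding, so d(A, P) = d(A, N) = 0.
# Without the margin the loss would be 0; with it, the loss equals alpha.
print(triplet_loss(anchor, anchor, anchor))          # 0.2
# A negative that is far enough away satisfies the margin and gives zero loss.
print(triplet_loss(anchor, positive, far_negative))  # 0.0
```

The first call shows why the margin matters: the collapsed encoding is penalized instead of being accepted as a perfect solution.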

 

Face Verification and Binary Classification

Besides triplet loss, face verification can also be solved as binary classification: decide whether two images show the same person (1) or different people (0). The embedding of each face is computed, and logistic regression is applied to the pair of embeddings.

In this setup the embedding of a known face does not have to be recomputed every time; it can be computed once, stored in a DB, and reused.
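A minimal sketch of the binary-classification approach with stored embeddings, assuming the logistic regression takes the element-wise absolute difference of the two embeddings as its features. The weights `w`, bias `b`, the dict standing in for the DB, and all values are hypothetical; in practice the parameters would be learned.

```python
import math


def same_person_probability(emb_i, emb_j, w, b):
    """Logistic regression on the element-wise absolute difference of two embeddings."""
    z = sum(wk * abs(a - c) for wk, a, c in zip(w, emb_i, emb_j)) + b
    return 1.0 / (1.0 + math.exp(-z))


# Embeddings computed once and stored (a dict stands in for the DB).
embedding_db = {"alice": [0.1, 0.9, 0.3], "bob": [0.8, 0.2, 0.5]}

# Hypothetical trained parameters: negative weights so that large differences lower the probability.
w = [-4.0, -4.0, -4.0]
b = 2.0

new_embedding = [0.1, 0.8, 0.3]  # encoding of the incoming photo
for name, stored in embedding_db.items():
    p = same_person_probability(new_embedding, stored, w, b)
    print(name, round(p, 3), "same" if p > 0.5 else "different")
```

Only the incoming photo has to pass through the network at query time; the stored embeddings are read straight from the DB.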

 


Neural Style Transfer

What is Neural Style Transfer?

What are deep ConvNets learning?

Visualizing the hidden units of each layer shows what kind of features they respond to. The closer a layer is to layer 1, the simpler its features (such as edges); deeper layers respond to more complex patterns.

Cost function

Style transfer needs a cost function for each of two things: content and style. The final cost function is the sum of J_content and J_style, each scaled by a weighting hyperparameter.
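Written out, the combined cost looks roughly like this. The symbols are assumptions following the usual notation for this topic: C is the content image, S the style image, G the generated image, and α and β are weighting hyperparameters (this α is unrelated to the triplet-loss margin above).

```latex
J(G) = \alpha \, J_{\text{content}}(C, G) + \beta \, J_{\text{style}}(S, G)
```

The generated image G is then updated by gradient descent to reduce J(G), so raising β relative to α pulls the result toward the style image.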

Content Cost function

Style Cost Function

 

1D and 3D generalizations

 
