๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ
TIL/Coursera(Google ML Bootcamp)

[Structuring Machine Learning Projects] Why ML Strategy :: seoftware

by seowit 2021. 9. 12.

๐Ÿ“œ ๊ฐ•์˜ ์ •๋ฆฌ 

 

* Coursera ๊ฐ•์˜ ์ค‘ Andrew Ng ๊ต์ˆ˜๋‹˜์˜ Structuring Machine Learning Projects ๊ฐ•์˜๋ฅผ ๊ณต๋ถ€ํ•˜๊ณ  ์ •๋ฆฌํ•œ ๋‚ด์šฉ์ž…๋‹ˆ๋‹ค.

* ์ด๋ฏธ์ง€ ์ถœ์ฒ˜ : Deeplearning.AI


Introduction to ML Strategy

There are some strategy for improving your system(model). These strategy can make your system work effectively and quickly.

 

๋จธ์‹ ๋Ÿฌ๋‹์„ ํšจ์œจ์ ์œผ๋กœ ํ•˜๋Š” ์‚ฌ๋žŒ๋“ค์€ ์–ด๋–ค ํšจ๊ณผ๋ฅผ ์œ„ํ•ด ์–ด๋–ค ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ํŠœ๋‹ํ•ด์•ผํ•˜๋Š”์ง€ ๋šœ๋ ทํ•œ ์•ˆ๋ชฉ์„ ๊ฐ€์ง€๊ณ  ์žˆ๋‹ค. ์ด๋Ÿฐ ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ํŠœ๋‹ํ•˜๋Š” ๊ณผ์ •์„ Orthogonalization(์ง๊ตํ™”)๋ผ๊ณ  ๋ถ€๋ฅธ๋‹ค.

 

 ์ „์ฒด์ ์ธ ์‹œ์Šคํ…œ์˜ ๊ณผ์ •์€ training set ํ•™์Šต → dev set์œผ๋กœ ํ™•์ธ → test set ์œผ๋กœ ํ…Œ์ŠคํŠธ → ์‹ค์ œ ์•ฑ์— ์ ์šฉ ์ด๋‹ค.

โœ” ๊ฐ ๊ณผ์ •์—์„œ ์„ฑ๋Šฅ์ด ์ž˜ ์•ˆ๋‚˜์˜จ๋‹ค๋ฉด ์ ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ๋ฐฉ๋ฒ•์— ๋Œ€ํ•ด ์•Œ์•„๋ณด์ž

  • Fit training set well on cost function : bigger network, better optimization algorithm(like Adam)
  • Fit dev set well on cost function : Regularization, Bigger training set
  • Fit test set well on cost function : Bigger dev set
  • Performs well in real world : change dev set or cost function

Andrew ๊ต์ˆ˜๋‹˜์€ early stopping ๊ธฐ์ˆ ์€ ์ž˜ ์•ˆ์“ด๋‹ค๊ณ  ํ•œ๋‹ค. Orthogonalization์€ ์–ด๋–ค ์•Œ๊ณ ๋ฆฌ์ฆ˜์ด ์‹œ์Šคํ…œ์˜ ๋‹ค๋ฅธ ๋ถ€๋ถ„์— ์˜ํ–ฅ์„ ์ฃผ๊ฑฐ๋‚˜ ๋ถ€์ž‘์šฉ์ด ์ƒ๊ธฐ์ง€ ์•Š๋„๋ก ํ•ด์•ผํ•˜๋Š”๋ฐ, early stopping์€ train set์„ fittingํ•˜๋Š” ๊ณผ์ •๊ณผ dev set์„ fittingํ•˜๋Š” ๊ณผ์ • ๋ชจ๋‘์— ์˜ํ–ฅ์„ ๋ผ์น˜๊ฒŒ ๋œ๋‹ค.

 

Setting Up Your Goal

1. Single Number Evaluation Metric

ํ•˜์ดํผํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ํŠœ๋‹ํ•˜๊ฑฐ๋‚˜ ์ƒˆ๋กœ์šด ์•Œ๊ณ ๋ฆฌ์ฆ˜ ๋ฐฉ๋ฒ•์„ ๋„์ž…ํ•  ๋•Œ, ์‹ค์ˆ˜ํ‰๊ฐ€์ง€ํ‘œ๊ฐ€ ์žˆ์œผ๋ฉด ํšจ์œจ์ด ๋” ์ข‹์•„์ง„๋‹ค.

๋ณดํ†ต ํŒ€ ๋‹จ์œ„๋กœ ๋จธ์‹ ๋Ÿฌ๋‹ ํ”„๋กœ์ ํŠธ๋ฅผ ์ง„ํ–‰ํ•  ๋•Œ Single Number Evaluation Metric(์‹ค์ˆ˜ ํ‰๊ฐ€ ์ง€ํ‘œ)์„ ์‚ฌ์šฉ์„ ๊ถŒ์žฅํ•œ๋‹ค.

Idea → Code  → Experiment์˜ ๋ฐ˜๋ณตํ•˜์—ฌ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ๊ณ„์† ๊ฐœ์„ ํ•œ๋‹ค. ๊ทธ ๊ณผ์ •์—์„œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์˜ ๊ฐœ์„  ์ •๋„๋ฅผ ํ‰๊ฐ€ํ•˜๋Š” ๋ฐฉ๋ฒ•์ด single number evalutation metric์ด๋‹ค.

์ข‹์€ ๋ฐฉ๋ฒ• ์ค‘ ํ•˜๋‚˜๋Š” Precision(์ •๋ฐ€๋„)์™€ Recall(์žฌํ˜„์œจ)์„ ํ™•์ธํ•˜๋Š” ๊ฒƒ์ด๋‹ค.

๊ณ ์–‘์ด๋ฅผ ๋ถ„๋ฅ˜ํ•˜๋Š” ๋ชจ๋ธ์ด ์žˆ๋‹ค๊ณ  ํ•  ๋•Œ, ์ •๋ฐ€๋„๋Š” ๋ชจ๋ธ์ด ๊ณ ์–‘์ด๊ฐ€ ๋งž๋‹ค๊ณ  ์˜ฌ๋ฐ”๋ฅด๊ฒŒ ํŒ๋‹จํ•˜๋Š” ํผ์„ผํŠธ๋ฅผ ์˜๋ฏธํ•˜๊ณ  ์žฌํ˜„์œจ์€ dev set์—์„œ ์‹ค์ œ ๊ณ ์–‘์ด์ธ ๊ฐ’ ์ค‘์— ๊ณ ์–‘์ด๋ผ๊ณ  ํŒ๋‹จํ•œ ํผ์„ผํŠธ๋ฅผ ์˜๋ฏธํ•œ๋‹ค.

There is a trade-off between Precision and Recall.์œ„์˜ ํ”ผ๊ทœ์–ด์—์„œ ์˜ค๋ฅธ์ชฝ ํ‘œ๋ฅผ ๋ณด๋ฉด Classifier A๊ฐ€ Recall ๊ฐ’์€ ๋” ๋†’๊ณ , CLassifier B๊ฐ€ Precision ๊ฐ’์€ ๋” ๋†’๋‹ค. ๋”ฐ๋ผ์„œ ์—ฌ๋Ÿฌ๊ฐœ์˜ Classifier๋ฅผ test ํ•ด๋ณด๊ณ  ๊ฐ€์žฅ ์ข‹์€ classifier๋ฅผ ์‚ฌ์šฉํ•ด์•ผ ํ•œ๋‹ค.

๊ต์ˆ˜๋‹˜์˜ ์ถ”์ฒœ์€ precision๊ณผ recall์„ ๊ฐ๊ฐ ๋ณด์ง€๋ง๊ณ  ์ด ๋‘˜์˜ ์กฐํ™”ํ‰๊ท ์ธ F1-score์„ ์ธก์ •์ง€ํ‘œ๋ฅผ ์‚ฌ์šฉํ•˜๋ผ!

 

2. Satisficing and Optimizing Metric

 

Accuracy๊ฐ€ optimizing Metric์ด๊ณ , Running time ๋“ฑ์˜ ์š”์†Œ๊ฐ€ Satisficing Metric์ด ๋œ๋‹ค.

๊ณ ๋ คํ•˜๋Š” ์š”์†Œ๊ฐ€ ์—ฌ๋Ÿฌ๊ฐœ๊ฐ€ ์žˆ๋‹ค๋ฉด, ๊ฐ€์žฅ ์ค‘์š”ํ•œ ์š”์†Œ๋ฅผ optimizing metric์œผ๋กœ ์ธก์ •์„ ํ•˜๊ณ , ๋‚˜๋จธ์ง€์— ๋Œ€ํ•ด์„œ๋Š” satisficing metric์œผ๋กœ ํ•œ๊ณ„์น˜๋ฅผ ์ •ํ•œ๋‹ค. ์œ„์˜ ์ด๋ฏธ์ง€์˜ ์˜ˆ์‹œ์˜ ๊ฒฝ์šฐ Running time์ด satisficing metric์ด๋ฏ€๋กœ running time์ด 100ms ์ดํ•˜์ธ ๊ฒƒ์œผ๋กœ satisficing metric์„ ์„ค์ •ํ•  ์ˆ˜ ์žˆ๋‹ค.

 

3. Train/dev/test distributions

dev set๊ณผ test set์˜ ๋ถ„ํฌ ํ˜•ํƒœ๋Š” ๊ฐ™์•„์•ผ ํ•œ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜๋ฅผ ์œ„ํ•ด ๋ฏธ๊ตญ, ์œ ๋Ÿฝ, ํ•œ๊ตญ, ๋ธŒ๋ผ์งˆ, ์ค‘๊ตญ ๋“ฑ์˜ ๋‚˜๋ผ๊ฐ€ ์žˆ๋Š”๋ฐ dev set์€ ๋ฏธ๊ตญ, ์œ ๋Ÿฝ, ํ•œ๊ตญ์„ ์‚ฌ์šฉํ•˜๊ณ  test set์—๋Š” ๋ธŒ๋ผ์งˆ, ์ค‘๊ตญ์„ ์‚ฌ์šฉํ•˜๋ฉด ์ด๋ฏธ์ง€์˜ ๋ถ„ํฌ๊ฐ€ ๋‹ฌ๋ผ์ง„๋‹ค. ๋”ฐ๋ผ์„œ ์ ์ ˆํ•˜๊ฒŒ ์„ž์€ ์ด๋ฏธ์ง€ ๋ฐ์ดํ„ฐ์…‹์„ ์‚ฌ์šฉํ•ด์•ผ ํ•œ๋‹ค.

 

4. Size of the Development and Test set

๋ฐ์ดํ„ฐ์˜ ๊ฐœ์ˆ˜๊ฐ€ 100, 1,000, 10,000๊ฐœ ์ •๋„๋กœ ์ž‘์•˜๋˜ ๊ณผ๊ฑฐ์—๋Š” train๊ณผ test ๋น„์œจ์ด 70:30์ด๊ฑฐ๋‚˜ train, valid, test์˜ ๋น„์œจ์ด 60:20:20 ์ด ๋ณดํŽธ์ ์ด์—ˆ๋‹ค

ํ˜„์žฌ๋Š” ๋ฐ์ดํ„ฐ๊ฐ€ ๋งŽ์•„์„œ 1,000,000๊ฐœ ์ •๋„์˜ ๋ฐ์ดํ„ฐ๋ฅผ ํ•™์Šต์‹œํ‚ค๋Š” ๊ฒฝ์šฐ๊ฐ€ ์žˆ์œผ๋ฏ€๋กœ 1ํผ์„ผํŠธ๋งŒ ํ•ด๋„ 10,000๊ฐœ์˜ ๋ฐ์ดํ„ฐ์—ฌ์„œ train, valid, test์˜ ๋น„์œจ์ด 98:1:1๋กœ train set์˜ ๋น„์œจ์ด ์••๋„์ ์ด๋‹ค.

5. When to change Dev/Test sets and Metrics?

weight๋ฅผ ์ฃผ์–ด์„œ metric์„ ๋ฐ”๊พธ๋Š” ๋ฐฉ๋ฒ•์ด ์žˆ๋‹ค. A ๋ถ„๋ฅ˜๊ธฐ๊ฐ€ B ๋ถ„๋ฅ˜๊ธฐ๋ณด๋‹ค cat classifier๋กœ์„œ์˜ ์„ฑ๋Šฅ์ด ๋” ์ข‹์ง€๋งŒ A๋Š” ํฌ๋ฅด๋…ธ๋ฅผ ๊ตฌ๋ถ„ํ•˜๋Š” ๋Šฅ๋ ฅ์ด B๋ณด๋‹ค ์—†๋‹ค๊ณ  ํ•˜๋ฉด, ์‚ฌ์šฉ์ž๋Š” B ๋ถ„๋ฅ˜๊ธฐ๋ฅผ ์„ ํƒํ•˜๊ฒŒ ๋œ๋‹ค. ๋”ฐ๋ผ์„œ metric์—์„œ ํฌ๋ฅด๋…ธ ์‚ฌ์ง„์— ๋Œ€ํ•ด์„œ๋Š” 10์„ ๊ณฑํ•˜๊ณ  ํฌ๋ฅด๋…ธ๊ฐ€ ์•„๋‹Œ ์‚ฌ์ง„ ์‚ฌ์ง„์— ๋Œ€ํ•ด์„œ๋Š” weight๋ฅผ 1์„ ๊ณฑํ•ด์„œ ํฌ๋ฅด๋…ธ ์‚ฌ์ง„์— ๋Œ€ํ•œ cost function value๋ฅผ ํฌ๊ฒŒ ๋งŒ๋“ค์–ด์ค„ ์ˆ˜ ์žˆ๋‹ค.

๋งŒ์•ฝ dev set์˜ ์‚ฌ์ง„์€ ๊ณ ํ™”์งˆ์— ์ „์ฒด ์‚ฌ์ง„์ผ ์ˆ˜ ์žˆ์ง€๋งŒ ์‹ค์ œ ์‚ฌ์šฉ์ž๊ฐ€ ์‚ฌ์šฉํ•˜๋Š” ์‚ฌ์ง„์€ ํ๋ฆฌ๊ณ , ๋ถ€๋ถ„์ ์ธ ์‚ฌ์ง„์ผ ์ˆ˜ ์žˆ๋‹ค. ๋”ฐ๋ผ์„œ ์ด ๋•Œ๋„ B classifier์˜ ์„ฑ๋Šฅ์ด ๋” ์ข‹๊ฒŒ ๋‚˜์˜ฌ ์ˆ˜ ์žˆ๊ณ , dev set์˜ dataset์„ ๋ฐ”๊ฟ”์ค˜์•ผ ํ•œ๋‹ค. 

 

Compare to Human-Level Performance

1. Why Human-level Performance?

๋จธ์‹ ๋Ÿฌ๋‹๊ณผ ์ธ๊ฐ„์ง€๋Šฅ์„ ๋น„๊ตํ•˜๋Š” ์ด์œ  : 1. ๋”ฅ๋Ÿฌ๋‹์˜ ๋ฐœ์ „์œผ๋กœ ๋จธ์‹ ๋Ÿฌ๋‹์˜ ์„ฑ๋Šฅ ํ–ฅ์ƒ 2. ๋จธ์‹ ๋Ÿฌ๋‹ ๋””์ž์ธ ๊ณผ์ •์ด ํšจ์œจ์ ์œผ๋กœ ๋ฐœ์ „

Bayes Error๋Š” ๋ชจ๋ธ์ด ์ตœ์ ํ™”๋˜์—ˆ์„ ๋•Œ์˜ ์ตœ๊ณ  ์„ฑ๋Šฅ์ผ ๋•Œ์˜ ์—๋Ÿฌ๋ฅผ ์˜๋ฏธํ•œ๋‹ค. human level performanc ์ตœ๋Œ€์น˜์˜ ๊ทผ์‚ฌ ๋˜๋Š” 0%๋กœ ๋ณผ ์ˆ˜ ์žˆ๋‹ค(๋…ธ์ด์ฆˆ ์‹ฌํ•œ ๋ฐ์ดํ„ฐ ์ œ์™ธ)

์œ„์˜ ์˜ˆ๋ฅผ ๋ณด๋ฉด ๋‹ค๋ฅธ ์กฐ๊ฑด(training error, dev error)์€ ๋ชจ๋‘ ๊ฐ™๊ณ  human error (โฉณBayes Error)๊ฐ’๋งŒ ๋‹ค๋ฅด๋‹ค. ๊ทธ๋ ‡์ง€๋งŒ human error๊ฐ€ 1%์ธ ๊ฒฝ์šฐ classifier๊ฐ€ bias์— ์ง‘์ค‘ํ•˜๊ฒŒ ๋˜๊ณ , human error๊ฐ€ 7.5%์ธ ๊ฒฝ์šฐ์—๋Š” variance์— ์ง‘์ค‘ํ•˜๊ฒŒ ๋œ๋‹ค๋Š” ์ฐจ์ด๊ฐ€ ์ƒ๊ธด๋‹ค.

Bayes error์™€ Training Error ์‚ฌ์ด์˜ ์ฐจ์ด๋ฅผ Avoidable Bias๋ผ๊ณ  ํ•˜๊ณ , Training error์™€ Dev error์˜ ์ฐจ์ด๋ฅผ Variance๋ผ๊ณ  ํ•œ๋‹ค. Avoidable bias ๊ฐ’์ด ๋” ํฐ ๊ฒฝ์šฐ์—๋Š” bias์— ์ง‘์ค‘์„ ํ•˜๊ณ , Variance๊ฐ€ ๋” ํฐ ๊ฒฝ์šฐ์—๋Š” variance์— ์ง‘์ค‘ํ•˜๋ฉด ๋œ๋‹ค.  

2. Improving your model performance

์ถ”๊ฐ€์ ์œผ๋กœ test set๊ณผ dev set๊ณผ์˜ ์ฐจ์ด๊ฐ€ ๋งŽ์ด๋‚˜๋ฉด, dev set์— overfit ๋œ ์ƒํƒœ์ด๋ฏ€๋กœ dev set์˜ ํฌ๊ธฐ๋ฅผ ๋Š˜๋ ค์•ผํ•œ๋‹ค.

๋Œ“๊ธ€