๐Ÿ˜ถ Abstract

- ์‹ฌ์ธต์‹ ๊ฒฝ๋ง์€ ํ•™์Šต์‹œํ‚ค๊ธฐ ๋”์šฑ ์–ด๋ ต๋‹ค.
- "residual learning framework"๋กœ ์ด์ „๊ณผ ๋‹ฌ๋ฆฌ ์ƒ๋‹นํžˆ ๊นŠ์€ ์‹ ๊ฒฝ๋ง์˜ training์„ ์‰ฝ๊ฒŒํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์†Œ๊ฐœํ•˜๊ณ ์ž ํ•œ๋‹ค.
unreferencedํ•จ์ˆ˜ ๋Œ€์‹  layer input์„ ์ฐธ์กฐํ•ด "learning residual function"์œผ๋กœ layer๋ฅผ ๋ช…์‹œ์ ์œผ๋กœ ์žฌ๊ตฌ์„ฑํ•˜์˜€๋‹ค.
์ด๋Ÿฐ ์ž”์ฐจ์‹ ๊ฒฝ๋ง(residual network)์ด ์ตœ์ ํ™”ํ•˜๊ธฐ ๋”์šฑ ์‰ฝ๊ณ  ์ƒ๋‹นํžˆ ์ฆ๊ฐ€๋œ ๊นŠ์ด์—์„œ ์ •ํ™•๋„๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ฒƒ์„ ๋ณด์—ฌ์ฃผ๋Š” ํฌ๊ด„์ ์ธ(comprehensive) ๊ฒฝํ—˜์  ์ฆ๊ฑฐ๋ฅผ ์ œ๊ณตํ•œ๋‹ค.
- ImageNet dataset์—์„œ VGGNet๋ณด๋‹ค 8๋ฐฐ ๋” ๊นŠ์ง€๋งŒ ๋ณต์žก์„ฑ์€ ๋‚ฎ์€ 152์ธต์˜ ์ž”์ฐจ์‹ ๊ฒฝ๋ง์— ๋Œ€ํ•ด ํ‰๊ฐ€ํ•œ๋‹ค.
์ด๋Ÿฐ ์ž”์ฐจ์‹ ๊ฒฝ๋ง์˜ ์•™์ƒ๋ธ”์€ ImageNet testset์—์„œ 3.57%์˜ ์˜ค๋ฅ˜๋ฅผ ๋‹ฌ์„ฑํ–ˆ๋Š”๋ฐ, ์ด๋Š” ILSVRC-2015์—์„œ 1์œ„๋ฅผ ์ฐจ์ง€ํ–ˆ๋‹ค.
๋˜ํ•œ 100์ธต๊ณผ 1000์ธต์„ ๊ฐ–๋Š” CIFAR-10์— ๋Œ€ํ•œ ๋ถ„์„๋„ ์ œ์‹œํ•œ๋‹ค.

- ๊นŠ์ด์— ๋Œ€ํ•œ ํ‘œํ˜„์€ ๋งŽ์€ ์‹œ๊ฐ์ ์ธ์ง€์ž‘์—…์—์„œ ๋งค์šฐ ์ค‘์š”ํ•˜๋‹ค.
์šฐ๋ฆฌ์˜ ๋งค์šฐ๊นŠ์€๋ฌ˜์‚ฌ๋Š” COCO Object Detection dataset์—์„œ ์ƒ๋Œ€์ ์œผ๋กœ 28% ํ–ฅ์ƒ๋œ ๊ฒฐ๊ณผ๋ฅผ ์–ป์—ˆ๋‹ค.
์‹ฌ์ธต์ž”์ฐจ์‹ ๊ฒฝ๋ง(Deep residual net)์€ ILSVRC. &. COCO 2015 ๋Œ€ํšŒ1์— ์ œ์ถœํ•œ ์ž๋ฃŒ์˜ ๊ธฐ๋ฐ˜์œผ๋กœ ImageNet๊ฐ์ง€, ImageNet Localization, COCO ๊ฐ์ง€, COCO segmentation ์ž‘์—…์—์„œ๋„ 1์œ„๋ฅผ ์ฐจ์ง€ํ–ˆ๋‹ค.

 

 

1. Introduction

Deep CNN์€ image classification์˜ ๋ŒํŒŒ๊ตฌ๋กœ ์ด์–ด์กŒ๋‹ค.
์‹ฌ์ธต์‹ ๊ฒฝ๋ง์€ ๋‹น์—ฐํ•˜๊ฒŒ๋„ ์ €·์ค‘·๊ณ ์ˆ˜์ค€์˜ ํŠน์ง•์„ ํ†ตํ•ฉํ•˜๊ณ , ๋ถ„๋ฅ˜๊ธฐ๋Š” ๋‹ค์ธต์„ ์ฒ˜์Œ๋ถ€ํ„ฐ ๋๊นŒ์ง€ ๋ถ„๋ฅ˜ํ•˜๋ฉฐ feature์˜ "level"์€ ๊นŠ์ด๊ฐ€ ๊นŠ์–ด์ง€๋ฉด์„œ ์ธต์ด ์Œ“์ผ์ˆ˜๋ก ํ’๋ถ€ํ•ด์ง„๋‹ค. (ํ˜„์žฌ ์‹ ๊ฒฝ๋ง์˜ ๊นŠ์ด๋Š” ์•„์ฃผ ์ค‘์š”ํ•˜๋‹ค๋Š” ๊ฒƒ์ด ์ •๋ก ์ด๋‹ค.)

- Depth์˜ ์ค‘์š”์„ฑ์— ๋Œ€ํ•ด ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์งˆ๋ฌธ์ด ๋ฐœ์ƒํ•œ๋‹ค: ๋” ๋งŽ์€ ์ธต์„ ์Œ“์€ ๊ฒƒ ๋งŒํผ ์‹ ๊ฒฝ๋ง์„ ํ•™์Šต์‹œํ‚ค๊ธฐ ๋” ์‰ฌ์šธ๊นŒ?
[
Is learning better networks as easy as stacking more layers?]
- ์ด ์งˆ๋ฌธ์˜ ๋‹ต์„ ์œ„ํ•œ ํฐ ์žฅ์• ๋ฌผ์€ ์•…๋ช…๋†’์€ gradient vanishing/exploding๋ฌธ์ œ๋กœ ์‹œ์ž‘๋ถ€ํ„ฐ ์ˆ˜๋ ด์„ ๋ฐฉํ•ดํ•˜๋Š” ๊ฒƒ์ด๋‹ค.
๋‹ค๋งŒ, ์ด ๋ฌธ์ œ๋Š” ์ดˆ๊ธฐํ™”๋ฅผ ์ •๊ทœํ™”ํ•˜๊ฑฐ๋‚˜ ์ค‘๊ฐ„์— ์ •๊ทœํ™”์ธต์„ ๋„ฃ์–ด ์ˆ˜์‹ญ๊ฐœ(tens)์˜ ์ธต์˜ ์‹ ๊ฒฝ๋ง์ด ์—ญ์ „ํŒŒ๋ฅผ ํ†ตํ•ด SGD๋ฅผ ์œ„ํ•œ ์ˆ˜๋ ด์„ ์‹œ์ž‘ํ•˜๋Š” ๋ฐฉ๋ฒ•๊ณผ ๊ฐ™์ด ๋Œ€๋ถ€๋ถ„ ๋‹ค๋ค„์กŒ๋‹ค.

Figure 1. Degradation Problem
[Degradation Problem]
- When deeper networks are able to start converging, a degradation problem is exposed: as network depth increases, accuracy gets saturated and then degrades rapidly.
- Unexpectedly, this degradation is not caused by overfitting; adding more layers to a suitably deep model leads to higher training error (as reported in our experiments).
The figure above is a representative example.

- training ์ •ํ™•๋„์˜ ์„ฑ๋Šฅ์ €ํ•˜๋Š” ๋ชจ๋“  ์‹œ์Šคํ…œ์ด optimize๋ฅผ ๋น„์Šทํ•œ ์ˆ˜์ค€์œผ๋กœ ์‰ฝ๊ฒŒ ํ•  ์ˆ˜ ์—†๋‹ค๋Š” ๊ฒƒ์„ ์‹œ์‚ฌํ•œ๋‹ค.
์ด๋ฅผ ์œ„ํ•ด ๋” ์–•์€ ๊ตฌ์กฐ์™€ ๋” ๋งŽ์€ ์ธต์„ ์ถ”๊ฐ€ํ•˜๋Š” ๋” ๊นŠ์€ ๊ตฌ์กฐ๋ฅผ ๊ณ ๋ คํ•ด๋ณด์ž.

[Shallow Architecture vs. Deeper Architecture]
There exists a solution by construction for the deeper model:
 - the added layers are identity mappings;
 - the other layers are copied from the learned shallower model.
The existence of this constructed solution indicates that a deeper model should produce no higher training error than its shallower counterpart.
But experiments show that our current solvers are unable to find solutions that are comparably good or better than this constructed solution.

cf. [Identity & Identity Mapping]
In the ResNet architecture, a residual block is built around an identity (= identity mapping).
- Instead of learning the full mapping from input to output, the block learns how to slightly adjust the input.
- An identity block is a kind of residual block whose skip connection performs an identity mapping.
- That is, the input is added directly to the block's output without any transformation, so an identity block preserves the input's dimensions.
- Identity blocks are used in residual networks to introduce non-linearity while preserving the input dimensions.
This helps the network learn more complex features and prevents the vanishing-gradient problem that can arise in very deep networks.

์ด ๋…ผ๋ฌธ์—์„œ, Degradation Problem์„ "deep residual learning framework"๋ฅผ ์ด์šฉํ•ด ๋‹ค๋ฃฐ ๊ฒƒ์ด๋‹ค.
- ๊ฐ ๋ช‡ ๊ฐœ์˜ ์ ์ธต(stacked layer)์ด ์›ํ•˜๋Š” ๊ธฐ๋ณธ์  ๋งตํ•‘์— ์ง์ ‘ ๋งž์ถ”๊ธฐ(fit)๋ฅผ ๋ฐ”๋ผ๋Š” ๋Œ€์‹ , ์ด๋Ÿฌํ•œ layer๊ฐ€ ์ž”์ฐจ๋งตํ•‘์— ์ ํ•ฉํ•˜๋„๋ก ๋ช…์‹œ์ ์œผ๋กœ ํ—ˆ์šฉํ•œ๋‹ค.


- We conduct comprehensive experiments on ImageNet to show the degradation problem and evaluate our method, confirming the following two points:
 ① Our extremely deep residual nets are easy to optimize.
   - The counterpart relatively "plain" nets that simply stack layers exhibit higher training error when the depth increases.
 ② Our deep residual nets can easily enjoy accuracy gains from greatly increased depth.
This suggests that the optimization difficulties and the effects of our method are not just akin to a particular dataset.


- On the ImageNet classification dataset, we obtain excellent results with extremely deep residual nets.
Our "152-layer residual net" is the deepest network ever presented on ImageNet, while still having lower complexity than VGG.
Our ensemble has 3.57% top-5 error on the ImageNet test set and won 1st place in the ILSVRC 2015 classification competition.
The extremely deep representations also show excellent generalization performance on other recognition tasks, leading us to further win 1st place in the following: ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation in the ILSVRC & COCO 2015 competitions.

 

 

 

 

2. Related Work

Residual Representations.
- In image recognition, VLAD (Vector of Locally Aggregated Descriptors) is a representation that encodes by the residual vectors with respect to a dictionary, and the Fisher Vector can be formulated as a probabilistic version of VLAD.
Both are powerful shallow representations for image retrieval and classification.
For vector quantization, encoding residual vectors has been shown to be more effective than encoding original vectors.

cf) VLAD is an image-processing technique for feature encoding that encodes local image features into a fixed-length vector representation.
In some tasks that use ResNet, instead of using the final softmax layer for classification, VLAD is used to encode features extracted from intermediate layers of ResNet, enabling a more fine-grained feature representation.

- ์ €์ˆ˜์ค€์˜ ๋น„์ „์—์„œ ํŽธ๋ฏธ๋ถ„๋ฐฉ์ •์‹(PDE.,Partial Differential Equations)์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๋Š” Mulit-Grid๋ฐฉ๋ฒ•์€ ์‹œ์Šคํ…œ์„ ์—ฌ๋Ÿฌ์ฒ™๋„(multiple scale)์—์„œ ํ•˜์œ„๋ฌธ์ œ(subproblem)๋กœ ์žฌ๊ตฌ์„ฑ(reformulate)ํ•œ๋‹ค. ์ด๋•Œ, ๊ฐ ํ•˜์œ„๋ฌธ์ œ๋Š” ๋” ๊ฑฐ์นœ ์ฒ™๋„์™€ ๋ฏธ์„ธํ•œ ์ฒ™๋„์‚ฌ์ด์—์„œ ์ž”์ฐจํ•ด๊ฒฐ(residual solution)์„ ๋‹ด๋‹นํ•œ๋‹ค.
Multi-Grid์˜ ๋Œ€์•ˆ์€ ๊ณ„์ธต์  ๊ธฐ๋ณธ ๋”•์…”๋„ˆ๋ฆฌ์˜ ์กฐ๊ฑดํ™”(hierarchical basis pre-conditioning)์ด๋‹ค.
์ด๋Š” ๋‘ ์ฒ™๋„ ์‚ฌ์ด์˜ ์ž”์ฐจ๋ฒกํ„ฐ๋ฅผ ๋‚˜ํƒ€๋‚ด๋Š” ๋ณ€์ˆ˜์— ์˜์กดํ•˜๋ฉฐ ์ด๋Ÿฐ ํ•ด๊ฒฐ๋ฒ•์€ ํ•ด๊ฒฐ์ฑ…์˜ ์ž”์ฐจํŠน์„ฑ(residual nature)์„ ๋ชจ๋ฅด๋Š” ๊ธฐ์กด์˜ ํ•ด๊ฒฐ์ฑ…๋ณด๋‹ค ํ›จ์”ฌ ๋น ๋ฅด๊ฒŒ ์ˆ˜๋ ดํ•˜๋Š” ๊ฒƒ์œผ๋กœ ๋‚˜ํƒ€๋‚ฌ์œผ๋ฉฐ ์ด๋Ÿฐ ๋ฐฉ๋ฒ•์€ ์ข‹์€ ์žฌ๊ตฌ์„ฑ(reformulation)์ด๋‚˜ ์ „์ œ์กฐ๊ฑด(preconditioning)์ด ์ตœ์ ํ™”๋ฅผ ๋‹จ์ˆœํ™” ํ•œ๋‹ค๋Š” ๊ฒƒ์„ ์‹œ์‚ฌํ•œ๋‹ค.



Shortcut Connections
- Practices and theories that lead to shortcut connections have been studied for a long time.
An early practice of training MLPs was to add a linear layer connected from the network input to the output.
A few intermediate layers are "directly connected" to auxiliary classifiers for addressing vanishing/exploding gradients.
In GoogLeNet (https://chan4im.tistory.com/149), an "Inception" layer is composed of a shortcut branch and a few deeper branches.

- ์šฐ๋ฆฌ์˜ ์—ฐ๊ตฌ๊ฐ€ ์ง„ํ–‰๋  ๋•Œ ๋™์‹œ์—, "highway networks"๋Š” gating function์ด ์žˆ๋Š” shortcut์—ฐ๊ฒฐ์„ ์ œ์‹œํ•˜์˜€๋‹ค.
์ด gate๋Š” parameter๊ฐ€ ์—†๋Š” identity shortcut๊ณผ ๋‹ฌ๋ฆฌ data์— ์˜์กดํ•˜๊ณ  parameter๋ฅผ ๊ฐ–๊ณ  ์žˆ๋‹ค.
gate shortcut์ด "closed"(= 0์— ์ ‘๊ทผํ•  ์ˆ˜๋ก) "hightway networks"์˜ layer๋Š” non-residual function์„ ๋‚˜ํƒ€๋‚ธ๋‹ค.
 ๋Œ€์กฐ์ ์œผ๋กœ, ์šฐ๋ฆฌ์˜ ๊ณต์‹์€ ํ•ญ์ƒ ์ž”์ฐจํ•จ์ˆ˜๋ฅผ ํ•™์Šตํ•œ๋‹ค.
์šฐ๋ฆฌ์˜ identity shortcut์€ ๊ฒฐ์ฝ” "closed"๋˜์ง€ ์•Š๊ณ  ํ•™์Šตํ•ด์•ผํ•  ์ถ”๊ฐ€ ์ž”์ฐจํ•จ์ˆ˜์™€ ๋ชจ๋“  ์ •๋ณด๊ฐ€ ํ•ญ์ƒ ํ†ต๊ณผํ•œ๋‹ค.
๊ฒŒ๋‹ค๊ฐ€ highway network๋Š” ๊นŠ์ด๊ฐ€ ๊ทน๋„๋กœ ์ฆ๊ฐ€(100๊ฐœ ์ด์ƒ์˜ ์ธต)ํ•˜์—ฌ๋„ ์ •ํ™•๋„์˜ ํ–ฅ์ƒ์„ ๋ณด์—ฌ์ฃผ์ง€ ์•Š์•˜๋‹ค.
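The contrast between a gated (highway-style) shortcut and the parameter-free identity shortcut can be sketched in NumPy. The gate form y = T(x)*H(x) + (1-T(x))*x is a simplification of the highway formulation, and all weights here are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway(x, h, w_t, b_t):
    # Highway-style layer: y = T(x) * H(x) + (1 - T(x)) * x, with a learned gate T.
    t = sigmoid(x @ w_t + b_t)
    return t * h(x) + (1.0 - t) * x

def residual(x, f):
    # Identity shortcut: parameter-free, the input always passes through.
    return f(x) + x

x = np.ones((2, 3))
h = lambda z: 2.0 * z                            # toy transform standing in for stacked layers
w_t = np.zeros((3, 3))
closed = highway(x, h, w_t, np.full(3, 100.0))   # gate saturates at 1: shortcut is "closed"
assert np.allclose(closed, h(x))                 # the input no longer passes through (non-residual)
assert np.allclose(residual(x, h), h(x) + x)     # identity shortcut always carries x
```

When the gate saturates, the highway layer collapses to a plain (non-residual) transform, whereas the identity shortcut unconditionally adds x to the learned residual.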

 

 

 

3. Deep Residual Learning

3.1. Residual Learning

 

 

3.2. Identity Mapping by Shortcuts

 

 

3.3. Network Architectures

Plain Network
  - plain ์‹ ๊ฒฝ๋ง์˜ ํ† ๋Œ€๋Š” ์ฃผ๋กœ VGGNet์˜ ์ฒ ํ•™์—์„œ ์˜๊ฐ์„ ๋ฐ›์•˜๋‹ค. (Fig.3, ์™ผ์ชฝ)
  - conv.layer๋Š” ๋Œ€๋ถ€๋ถ„ 3x3 filter๋ฅผ ์‚ฌ์šฉํ•˜๋ฉฐ, 2๊ฐ€์ง€์˜ ๊ฐ„๋‹จํ•œ ์„ค๊ณ„๋ฐฉ์‹์„ ๋”ฐ๋ฅธ๋‹ค.
    โ‘  ๋™์ผํ•œ ํฌ๊ธฐ์˜ ํŠน์ง•๋งต์ถœ๋ ฅ์— ๋Œ€ํ•˜ layer๋Š” ๋™์ผํ•œ ์ˆ˜์˜ filter๋ฅผ ๊ฐ–๋Š”๋‹ค.
    โ‘ก ํŠน์ง•๋งต ํฌ๊ธฐ๊ฐ€ 1/2(์ ˆ๋ฐ˜์œผ๋กœ ์ค„)์ด๋ฉด, layer๋‹น ์‹œ๊ฐ„๋ณต์žก์„ฑ์„ ์œ ์ง€ํ•ด์•ผ ํ•˜๊ธฐ์— filter์ˆ˜๊ฐ€ 2๋ฐฐ๊ฐ€ ๋œ๋‹ค.
  - ์šฐ๋ฆฐ stride=2์ธ conv.layer์— ์˜ํ•ด ์ง์ ‘ downsampling์„ ์ˆ˜ํ–‰ํ•œ๋‹ค. 
  - ์‹ ๊ฒฝ๋ง์€ Global AveragePooling๊ณผ Softmax๊ฐ€ ์žˆ๋Š” 1000-way Fully-Connected๋กœ ์ข…๋ฃŒ๋œ๋‹ค.
  - weight-layer์˜ ์ด ๊ฐœ์ˆ˜๋Š” ๊ทธ๋ฆผ3์˜ ์ค‘๊ฐ„๊ณผ ๊ฐ™์€ 34๊ฐœ์ด๋‹ค.

 Residual Network
  - Based on the above plain network, we insert shortcut connections (Fig. 3, right), which turn the network into its counterpart residual version.
  - The identity shortcuts (Eqn. (1)) can be directly used when the input and output are of the same dimensions (solid-line shortcuts in Fig. 3).

  - When the dimensions increase (dotted-line shortcuts in Fig. 3), we consider two options:
    ① The shortcut still performs identity mapping, with extra zero entries padded for increasing dimensions.
        This option introduces no extra parameters;

    ② The projection shortcut in Eqn. (2) is used to match dimensions (done by 1x1 convolutions).

  - For both options, when the shortcuts go across feature maps of two sizes, they are performed with stride=2.
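The two options can be sketched on an NHWC feature map. A 1x1 convolution over channels is just a matrix multiply on the last axis; stride handling is omitted for brevity, and the weights are hypothetical:

```python
import numpy as np

def shortcut_a(x, out_channels):
    # Option A: identity mapping with zero-padded channels; no extra parameters.
    pad = out_channels - x.shape[-1]
    return np.pad(x, ((0, 0), (0, 0), (0, 0), (0, pad)))

def shortcut_b(x, w):
    # Option B: 1x1-convolution projection; on NHWC this is a matmul over channels.
    return x @ w

x = np.ones((1, 4, 4, 16))             # NHWC feature map with 16 channels
a = shortcut_a(x, 32)
assert a.shape == (1, 4, 4, 32) and np.all(a[..., 16:] == 0)

w = np.zeros((16, 32))                 # hypothetical projection weights
assert shortcut_b(x, w).shape == (1, 4, 4, 32)
```

Option A carries the original activations and zeros elsewhere; option B spends parameters to mix channels, which is what the ablation in Table 3 compares.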


 

 

3.4. Implementation

- Our implementation follows the practice in [AlexNet, VGGNet]. [https://chan4im.tistory.com/145, https://chan4im.tistory.com/146]
- For scale augmentation [VGGNet], the image is resized with its shorter side randomly sampled in [256, 480].
- A 224x224 crop is randomly sampled from an image or its horizontal flip, with the per-pixel mean subtracted, and the standard color augmentation of [AlexNet] is used.

- Batch Normalization์„ ์ฑ„ํƒํ•˜์—ฌ BN๋…ผ๋ฌธ[https://chan4im.tistory.com/147]์—์„œ ๋‚˜์˜จ ๊ฒƒ ์ฒ˜๋Ÿผ Conv.layer ์งํ›„, activation์ด์ „์— ์‚ฌ์šฉ์„ ํ•ด์ฃผ์—ˆ๋‹ค.
- ReLU๋…ผ๋ฌธ[https://chan4im.tistory.com/150]์—์„œ ์ฒ˜๋Ÿผ weight๋ฅผ ์ดˆ๊ธฐํ™”ํ•˜๊ณ  ๋ชจ๋“  ๊ธฐ๋ณธ/์ž”์ฐจ์‹ ๊ฒฝ๋ง์„ ์ฒ˜์Œ๋ถ€ํ„ฐ trainingํ•œ๋‹ค.

- We use SGD with a mini-batch size of 256 (weight decay=0.0001 & momentum=0.9); the learning rate starts at 0.1 and is divided by 10 when the error plateaus (SGD is vulnerable to plateaus).
The models are trained for up to 60 x 10^4 iterations.
- Following the [Batch Normalization] paper, we do not use dropout.


- In testing, for comparison studies we adopt the standard 10-crop testing [AlexNet]; for best results, we adopt the fully-convolutional form as in the [VGG, Delving Deep into Rectifiers] papers.
We also average the scores at multiple scales, where the images are resized such that the shorter side is in {224, 256, 384, 480, 640}.

 

 

 

4. Experiments

4.1. ImageNet Classification


 Plain Networks.
  - We first evaluate 18-layer and 34-layer plain nets. [The 34-layer plain net is in Fig. 3 (middle); see Table 1 below for the detailed architectures.]




 - ํ‘œ 2์˜ ๊ฒฐ๊ณผ๋Š” ๋” ๊นŠ์€ 34-layer plain net์ด ๋” ์–•์€ 18-layer plain net๋ณด๋‹ค ๋” ๋†’์€ val_Error๊ฐ’์„ ๊ฐ–์Œ์„ ๋ณด์ธ๋‹ค.

์ด์œ ๋ฅผ ๋ฐํžˆ๊ธฐ ์œ„ํ•ด ๊ทธ๋ฆผ 4(์™ผ์ชฝ)์—์„œ training๊ณผ์ • ์ค‘ ๋ฐœ์ƒํ•œ training/validation error๋ฅผ ๋น„๊ตํ•œ๋‹ค
์œ„ ๊ทธ๋ฆผ์—์„œ ์šฐ๋ฆฌ๋Š” ์„ฑ๋Šฅ์ €ํ•˜๋ฌธ์ œ(Degradation Problem)์„ ๋ฐœ๊ฒฌํ–ˆ๋‹ค.
18-layer plain net์˜ solution space๊ฐ€ 34-layer plain net์„ ๋Œ€์ฒดํ•จ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ  34-layer plain net์€ ์ „์ฒด์ ์ธ training์ ˆ์ฐจ์— ๊ฑธ์ณ ๋†’์€ training error๋ฅผ ๊ฐ–๋Š”๋‹ค.


- ์šฐ๋ฆฐ ์ด๋Ÿฐ ์ตœ์ ํ™” ์–ด๋ ค์›€์ด gradient vanishing์œผ๋กœ ์ธํ•œ ๊ฐ€๋Šฅ์„ฑ์€ ๋‚ฎ๋‹ค๊ณ  ์ฃผ์žฅํ•œ๋‹ค.
์ด๋Ÿฐ plain์‹ ๊ฒฝ๋ง์€ ์ˆœ์ „ํŒŒ์‹ ํ˜ธ๊ฐ€ 0์ด ์•„๋‹Œ ๋ถ„์‚ฐ๊ฐ’์„ ๊ฐ–๋„๋ก ๋ณด์žฅํ•˜๋Š” Batch Normalization์œผ๋กœ ํ›ˆ๋ จ๋œ๋‹ค.
์šฐ๋ฆฐ ๋˜ํ•œ ์—ญ์ „ํŒŒ๋œ ๊ธฐ์šธ๊ธฐ๊ฐ€ BN๊ณผ ํ•จ๊ป˜ healthy norm์„ ๋‚˜ํƒ€๋‚ด๋Š” ๊ฒƒ์„ ํ™•์ธํ–ˆ๋‹ค.
๊ทธ๋ž˜์„œ ์ˆœ์ „ํŒŒ์™€ ์—ญ์ „ํŒŒ์˜ ์‹ ํ˜ธ๋“ค์ด ์‚ฌ๋ผ์ง€์ง€ ์•Š๋Š”๋‹ค.
(์‹ค์ œ๋กœ 34-layer plain net์€ ํ‘œ 3์—์„œ ๋ณด์ด๋“ฏ ์—ฌ์ „ํžˆ ๊ฒฝ์Ÿ๋ ฅ์žˆ๋Š” ์ •ํ™•๋„๋ฅผ ๋‹ฌ์„ฑํ•˜๊ธฐ์— solver๊ฐ€ ์–ด๋Š์ •๋„ ์ž‘๋™ํ•˜๋Š” ๊ฒƒ์„ ๋ณด์—ฌ์ค€๋‹ค.)
์šฐ๋ฆฐ deep plain์‹ ๊ฒฝ๋ง์ด ๊ธฐํ•˜๊ธ‰์ˆ˜์ ์œผ๋กœ ๋‚ฎ์€ ์ˆ˜๋ ด๋ฅ ์„ ๊ฐ€์งˆ  ์ˆ˜ ์žˆ์–ด์„œ training error๊ฐ์†Œ์— ์˜ํ–ฅ์„ ๋ฏธ์นœ๋‹ค๊ณ  ์ถ”์ธกํ•œ๋‹ค.





 Residual Networks.
๋‹ค์Œ์œผ๋กœ 18 ๋ฐ 34-layer residual nets (ResNets)์„ ํ‰๊ฐ€ํ•œ๋‹ค.
๊ทผ๊ฐ„์ด ๋˜๋Š” ๊ตฌ์กฐ๋Š” ์œ„์˜ plain์‹ ๊ฒฝ๋ง๊ณผ ๋™์ผํ•˜๋ฉฐ ๊ทธ๋ฆผ 3(์šฐ์ธก)์ฒ˜๋Ÿผ 3x3 filter์˜ ๊ฐ ์Œ์— shortcut connection์„ ์ถ”๊ฐ€ํ•œ๋‹ค.
์ฒซ ๋น„๊ต(ํ‘œ2์™€ ๊ทธ๋ฆผ4,์˜ค๋ฅธ์ชฝ)์—์„  ๋ชจ๋“  Shortcut์— identity mapping์„ ์‚ฌ์šฉํ•˜๊ณ  ์ฐจ์ˆ˜๋ฅผ ๋Š˜๋ฆฌ๊ธฐ ์œ„ํ•ด zero-padding์„ ํ•˜๊ธฐ์— [option A], plain์— ๋น„ํ•ด ์ถ”๊ฐ€์ ์ธ parameter๊ฐ€ ์—†๋‹ค.

[Three major observations from Table 2 and Fig. 4]
① First, the situation is reversed with residual learning.
  - The 34-layer ResNet is better than the 18-layer ResNet (by 2.8%).
  - More importantly, the 34-layer ResNet exhibits considerably lower training error and is generalizable to the validation data, which indicates that the degradation problem is well addressed in this setting and that we manage to obtain accuracy gains from increased depth.

② Second, compared to its plain counterpart, the 34-layer ResNet reduces the top-1 error rate by 3.5% (Table 2), resulting from the successfully reduced training error (Fig. 4, right vs. left).
This comparison verifies the effectiveness of residual learning on extremely deep systems.

③ Last, note that while the 18-layer plain/ResNet are comparably accurate (Table 2), the 18-layer ResNet converges faster (Fig. 4, right vs. left).
  - When the net is "not overly deep" (18 layers here), the current SGD solver is still able to find good solutions to the plain net.
  - In this case, the ResNet eases the optimization by providing faster convergence at the early stage.




Identity vs. Projection Shortcuts.
  - We have shown that parameter-free identity shortcuts help with training; next we investigate projection shortcuts (Eqn. (2)).
In Table 3 we compare three options:
(A) zero-padding shortcuts are used for increasing dimensions.
  - Here, all shortcuts are parameter-free (the same as Table 2 and Fig. 4, right);

(B) projection shortcuts are used for increasing dimensions.
  - Here, the other shortcuts are identity;

(C) all shortcuts are projections.

- Table 3 shows that all three options are better than the plain counterpart, with B slightly better than A and C marginally better than B (C > B > A).
We argue that B > A because the zero-padded dimensions in A indeed have no residual learning,
and that C > B because of the extra parameters introduced by the many (thirteen) projection shortcuts.
But the small differences among A/B/C indicate that projection shortcuts are not essential for addressing the degradation problem.
So, to reduce memory/time complexity and model size, we do not use option C in the rest of this paper.
Identity shortcuts are particularly important for not increasing the complexity of the bottleneck architectures introduced below.




 Deeper Bottleneck Architectures.

 [50-layer ResNet]
  - We replace each 2-layer block in the 34-layer net with a 3-layer bottleneck block, resulting in a 50-layer ResNet (Table 1).
  - We use option B for increasing dimensions.
  - This model has 3.8 billion FLOPs.


 [101-layer ResNet & 152-layer ResNet]
  - We construct 101-layer and 152-layer ResNets by using more 3-layer blocks (Table 1).
  - Remarkably, although the depth is significantly increased, the 152-layer ResNet (11.3 billion FLOPs) still has lower complexity than VGG-16/19 (15.3/19.6 billion FLOPs).
  - The 50/101/152-layer ResNets are considerably more accurate than the 34-layer ones (Tables 3 and 4).
  - We do not observe the degradation problem and thus enjoy significant accuracy gains from considerably increased depth.
  - The benefits of depth are witnessed for all evaluation metrics (Tables 3 and 4).


cf. FLOPs
FLOPS (floating-point operations per second) is a unit commonly used to express computer performance, based on how many floating-point operations a computer can perform in one second. (In papers such as this one, FLOPs instead counts the total number of floating-point operations in a model's forward pass.)







Comparison with State-of-the-art Methods.
  - In Table 4 we compare with the previous best single-model results; our baseline 34-layer ResNets have achieved very competitive accuracy.
  - Our 152-layer ResNet has a single-model top-5 error rate of 4.49%.
  - This single-model result outperforms all previous ensemble results (Table 5).


  - ๊นŠ์ด๊ฐ€ ๋‹ค๋ฅธ 6๊ฐœ์˜ ๋ชจ๋ธ์„ ๊ฒฐํ•ฉํ•˜์—ฌ ์•™์ƒ๋ธ”์„ ํ˜•์„ฑํ•˜์˜€๋Š”๋ฐ, testset์—์„œ top-5 error๊ฐ€ 3.57% ์˜€๋‹ค(ํ‘œ 5).
  (์ œ์ถœ ๋‹น์‹œ, ์•™์ƒ๋ธ” ๊ธฐ๋ฒ•์€ 152์ธต ๋ชจ๋ธ ๋‘ ๊ฐœ๋งŒ ํฌํ•จํ•˜์˜€์œผ๋ฉฐ
ILSVRC 2015์—์„œ 1์œ„๋ฅผ ์ฐจ์ง€ํ–ˆ๋‹ค.)

 

 

4.2. CIFAR-10 and Analysis

- ์šฐ๋ฆฌ๋Š” 10๊ฐœ์˜ ํด๋ž˜์Šค์—์„œ 5๋งŒ๊ฐœ์˜ traininset๊ณผ 1๋งŒ๊ฐœ์˜ test image๋กœ ๊ตฌ์„ฑ๋œ CIFAR-10 dataset์— ๋Œ€ํ•ด ๋” ๋งŽ์€ ์—ฐ๊ตฌ๋ฅผ ์ˆ˜ํ–‰ํ–ˆ๋‹ค.
์šฐ๋ฆฌ๋Š” training set์— ๋Œ€ํ•ด ํ›ˆ๋ จ๋˜๊ณ  testset์— ๋Œ€ํ•ด ํ‰๊ฐ€๋œ ์‹คํ—˜์„ ๊ฒฐ๊ณผ๋กœ ์ œ์‹œํ•œ๋‹ค.
์šฐ๋ฆฌ๋Š” ๊ทน๋„๋กœ ์‹ฌ์ธต์ ์ธ ์‹ ๊ฒฝ๋ง์˜ ๋™์ž‘์— ์ดˆ์ ์„ ๋งž์ถ˜๋‹ค.
์ตœ์ฒจ๋‹จ ๊ฒฐ๊ณผ๋ฅผ ์ถ”์ง„ํ•˜๋Š” ๊ฒƒ์—๋Š” ์ดˆ์ ์„ ๋งž์ถ”์ง€ ์•Š๊ธฐ์— ๋‹ค์Œ๊ณผ ๊ฐ™์ด ๋‹จ์ˆœํ•œ ๊ตฌ์กฐ๋ฅผ ์˜๋„์ ์œผ๋กœ ์‚ฌ์šฉํ•œ๋‹ค.


- The plain/residual architectures follow the form in Fig. 3 (middle/right).
The network inputs are 32x32 images with the per-pixel mean subtracted.
The first layer is a 3x3 convolution.
Then we use a stack of 6n layers with 3x3 convolutions on the feature maps of sizes {32, 16, 8} respectively, with 2n layers for each feature map size. The numbers of filters are {16, 32, 64} respectively.
The subsampling is performed by convolutions with stride=2.

The network ends with a global average pooling, a 10-way FC layer, and softmax, for a total of 6n+2 stacked weighted layers.
The following table summarizes the architecture:
When shortcut connections are used, they are connected to the pairs of 3x3 layers (totally 3n shortcuts).
On this dataset we use identity shortcuts in all cases (i.e., option A).
So our residual models have exactly the same depth, width, and number of parameters as the plain counterparts.



- We use weight decay=0.0001 and momentum=0.9, and adopt the weight initialization and batch normalization but no dropout. (The weight initialization follows https://chan4im.tistory.com/150.)
These models are trained with a mini-batch size of 128 on two GPUs.
We start with a learning rate of 0.1, divide it by 10 at 32k and 48k iterations,
and terminate training at 64k iterations, which is determined on a 45k/5k train/val split.

training์„ ์œ„ํ•ด [Supervised Net.,https://arxiv.org/abs/1409.5185]์— ์†Œ๊ฐœ๋œ ๊ฐ„๋‹จํ•œ Data Augmentation์„ ๋”ฐ๋ฅธ๋‹ค.
๊ฐ ๋ฉด์— 4๊ฐœ์˜ pixel์ด padding๋˜๋ฉฐ
32x32 crop์€ padding๋œ ์ด๋ฏธ์ง€ ํ˜น์€ horizontal flip ์ค‘ ๋ฌด์ž‘์œ„๋กœ ์ƒ˜ํ”Œ๋ง๋œ๋‹ค.
test๋ฅผ ์œ„ํ•ด ์›๋ณธ 32x32 ์ด๋ฏธ์ง€์˜ single view๋งŒ ํ‰๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.




- We compare n = {3, 5, 7, 9}, leading to 20, 32, 44, and 56-layer networks.
Fig. 6 (left) shows the behaviors of the plain nets.
The deep plain nets suffer from increased depth and exhibit higher training error when going deeper.
This phenomenon is similar to that on ImageNet (Fig. 4, left) and on MNIST (see [42]), suggesting that such an optimization difficulty is a fundamental problem.


๊ทธ๋ฆผ 6(๊ฐ€์šด๋ฐ)์€ ResNets์˜ ๋™์ž‘์„ ๋ณด์—ฌ์ค€๋‹ค.
๋˜ํ•œ ImageNet ์‚ฌ๋ก€(๊ทธ๋ฆผ 4, ์˜ค๋ฅธ์ชฝ)์™€ ์œ ์‚ฌํ•˜๊ฒŒ ResNets๋Š” ์ตœ์ ํ™” ์–ด๋ ค์›€์„ ๊ทน๋ณตํ•˜๊ณ  ๊นŠ์ด๊ฐ€ ์ฆ๊ฐ€ํ•  ๋•Œ ์ •ํ™•๋„๊ฐ€ ํ–ฅ์ƒ๋จ์„ ๋ณด์—ฌ์ค€๋‹ค.



- We further explore n = 18, which leads to a 110-layer ResNet.
In this case, we find that the initial learning rate of 0.1 is slightly too large to "start converging".
So we warm up the training with a lower learning rate of 0.01 until the training error is below 80% (about 400 iterations), then go back to 0.1 and continue training.
The rest of the training schedule is as done previously, and this 110-layer network converges well (Fig. 6, middle).
It has fewer parameters than other deep and thin networks such as FitNet [https://arxiv.org/abs/1412.6550] and Highway [https://arxiv.org/abs/1505.00387] (Table 6), yet achieves one of the state-of-the-art results (6.43%, Table 6).


 Analysis of Layer Responses.
  - Fig. 7 shows the standard deviations (std) of the layer responses.
Here, the responses are the outputs of each 3x3 layer, after BN and before other nonlinearity (ReLU/addition).
For ResNets, this analysis reveals the response strength of the residual functions.

- ๊ทธ๋ฆผ 7์€ ResNet์ด ์ผ๋ฐ˜์ ์ธ ์‘๋‹ต๋ณด๋‹ค ์ผ๋ฐ˜์ ์œผ๋กœ ๋” ์ž‘์€ ์‘๋‹ต์„ ๊ฐ€์ง€๊ณ  ์žˆ์Œ์„ ๋ณด์—ฌ์ค€๋‹ค.
์ด๋Ÿฐ ๊ฒฐ๊ณผ๋Š” ์ž”์ฐจ ํ•จ์ˆ˜๊ฐ€ ๋น„์ž”์ฐจ ํ•จ์ˆ˜๋ณด๋‹ค ์ผ๋ฐ˜์ ์œผ๋กœ 0์— ๊ฐ€๊นŒ์šธ ์ˆ˜ ์žˆ๋‹ค๋Š” ๊ธฐ๋ณธ ๊ฐ€์ •(3.1์ ˆ)์„ ๋’ท๋ฐ›์นจํ•œ๋‹ค.

- ๋˜ํ•œ ๊ทธ๋ฆผ 7์˜ ResNet-20, 56 ๋ฐ 110์˜ ๋น„๊ต์—์„œ ์ž…์ฆ๋œ ๋ฐ”์™€ ๊ฐ™์ด ๋” ๊นŠ์€ ResNet์ด ์‘๋‹ต์˜ ํฌ๊ธฐ๊ฐ€ ๋” ์ž‘๋‹ค๋Š” ๊ฒƒ์„ ์ฃผ๋ชฉํ•˜์ž.
์ฆ‰, ๋” ๋งŽ์€ ์ธต์ด ์žˆ์„ ๋•Œ, ResNets์˜ ๊ฐ๊ฐ์˜ ์ธต์€ ์‹ ํ˜ธ๋ฅผ ๋œ ์ˆ˜์ •ํ•˜๋Š” ๊ฒฝํ–ฅ์ด ์žˆ๋‹ค.



 Exploring Over 1000 Layers.
  - We explore an aggressively deep model of over 1000 layers.
We set n = 200, which leads to a 1202-layer network, trained as described above.
Our method shows no optimization difficulty, and this 10^3-layer network is able to achieve training error < 0.1% (Fig. 6, right).
Its test error is still fairly good (7.93%, Table 6).


- But there are still open problems on such aggressively deep models.
The testing result of this 1202-layer network is worse than that of our 110-layer network, although both have similar training error.
We argue that this is because of overfitting.

- The 1202-layer network may be unnecessarily large (19.4M parameters) for this small dataset.
In other works, strong regularization such as maxout/dropout is applied to obtain the best results on this dataset.

In this paper, we use no maxout/dropout, so as not to distract from the focus on the difficulties of optimization.
We simply impose regularization via deep and thin architectures by design, but combining with stronger regularization may improve results.

 

 

4.3. Object Detection on  PASCAL  and  MS COCO

- ์šฐ๋ฆฌ์˜ ๋ฐฉ๋ฒ•์€ ๋‹ค๋ฅธ ์ธ์‹๊ณผ์ œ์—์„œ๋„ ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ์ด ์ข‹๋‹ค.
- ํ‘œ 7๊ณผ 8์€ PASCAL VOC 2007๊ณผ 2012 [5] ๋ฐ COCO์— ๋Œ€ํ•œ Object Detection์˜ ๊ธฐ์ค€๊ฒฐ๊ณผ๋ฅผ ๋ณด์—ฌ์ค€๋‹ค.
- ์šฐ๋ฆฌ๋Š” Detection๋ฐฉ๋ฒ•์€ Faster R-CNN์„ ์‚ฌ์šฉํ•œ๋‹ค.
์ด๋•Œ, VGG-16์„ ResNet-101๋กœ ๋Œ€์ฒดํ•˜๋Š” ๊ฐœ์„  ์‚ฌํ•ญ์— ๊ด€์‹ฌ์„ ๋‘๊ณ  ์ฃผ๋ชฉํ•œ๋‹ค.
๋‘ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜๋Š” Detection์˜ ๊ตฌํ˜„(๋ถ€๋ก ์ฐธ์กฐ)์€ ๋™์ผ., ๊ฒฐ๊ณผ๋ฌผ์€ ๋” ๋‚˜์€ ์‹ ๊ฒฝ๋ง์— ๊ท€์†๋œ๋‹ค.

- ๊ฐ€์žฅ ์ฃผ๋ชฉํ•  ๋งŒํ•œ ๊ฒƒ์€ ๊นŒ๋‹ค๋กœ์šด COCO dataset์—์„œ COCO์˜ ํ‘œ์ค€ metric์ธ (mAP@[.5, .95])์ด 6.0% ์ฆ๊ฐ€ํ•˜์—ฌ 28%์˜ ์ƒ๋Œ€์  ๊ฐœ์„ ์„ ์–ป์—ˆ๋Š”๋ฐ, ์ด๋Š” ์˜ค์ง learned representation ๋•Œ๋ฌธ์ด๋‹ค.

- ์‹ฌ์ธต ResNet์„ ๊ธฐ๋ฐ˜์œผ๋กœ ILSVRC ๋ฐ COCO 2015 ๋Œ€ํšŒ์—์„œ ์—ฌ๋Ÿฌ ํŠธ๋ž™์—์„œ 1์œ„๋ฅผ ์ฐจ์ง€ํ–ˆ์Šต๋‹ˆ๋‹ค:
ImageNet Detection, ImageNet Localization, COCO Detection ๋ฐ COCO Segmentation.

- ์ž์„ธํ•œ ๋‚ด์šฉ์€ ๋ถ€๋ก์— ๊ธฐ์žฌ.

 

 

 

 

 

๐Ÿ˜ถ Appendix

A. Object Detection Baselines

• PASCAL VOC
  -
• MS COCO
  -

 

B. Object Detection Improvements

• MS COCO
  -

• PASCAL VOC
  -

• ImageNet Detection
  - 

 

C. ImageNet Localization

  -

  -

 

 

 

 

 

๐Ÿง ๋…ผ๋ฌธ ๊ฐ์ƒ_์ค‘์š”๊ฐœ๋… ํ•ต์‹ฌ ์š”์•ฝ

"Deep Residual Learning for Image Recognition"
์‹ฌ์ธต ์‹ ๊ฒฝ๋ง์—์„œ gradient vanishing problem์„ ํ•ด๊ฒฐํ•˜๋Š” ์‹ฌ์ธต ConvNet์„ ์œ„ํ•œ ์ƒˆ๋กœ์šด ๊ตฌ์กฐ๋ฅผ ์†Œ๊ฐœํ•˜๋Š” ์—ฐ๊ตฌ ๋…ผ๋ฌธ์œผ๋กœ ์ด ๋…ผ๋ฌธ์€ ResNet์ด๋ผ๋Š” ์‹ฌ์ธต ํ•ฉ์„ฑ๊ณฑ ์‹ ๊ฒฝ๋ง์„ ์œ„ํ•œ ์ƒˆ๋กœ์šด ์•„ํ‚คํ…์ฒ˜๋ฅผ ์ œ์•ˆํ•œ๋‹ค.

 

 

[ํ•ต์‹ฌ ๊ฐœ๋…]

1. Problem
- ์ด ๋…ผ๋ฌธ์€ ๋„คํŠธ์›Œํฌ ๊นŠ์ด๊ฐ€ ์ฆ๊ฐ€ํ•จ์— ๋”ฐ๋ผ ์‹ ๊ฒฝ๋ง์˜ ์ •ํ™•๋„๊ฐ€ ํฌํ™”๋˜๊ฑฐ๋‚˜ ์ €ํ•˜๋  ์ˆ˜ ์žˆ๋Š” ๊ธฐ์šธ๊ธฐ ์†Œ์‹ค ๋ฌธ์ œ๋กœ ์ธํ•ด ๋งค์šฐ ๊นŠ์€ ์‹ ๊ฒฝ๋ง์„ ํ›ˆ๋ จํ•˜๋Š” ๊ฒƒ์ด ์–ด๋ ต๋‹ค๋Š” ์ ์„ ๊ฐ•์กฐํ–ˆ๋‹ค.
  [Degradation Problem]
 - ๋” ๊นŠ์€ ์‹ ๊ฒฝ๋ง์ด ์ˆ˜๋ ด์„ ์‹œ์ž‘ํ•  ๋•Œ, ์‹ ๊ฒฝ๋ง๊นŠ์ด๊ฐ€ ์ฆ๊ฐ€ํ•˜๋ฉด ์ •ํ™•๋„๊ฐ€ ํฌํ™”์ƒํƒœ๊ฐ€ ๋˜๊ณ , ๊ทธ ๋‹ค์Œ์— ๋น ๋ฅด๊ฒŒ ์ €ํ•˜๋œ๋‹ค. 
 - ์ด ๋ฌธ์ œ์˜ ์˜ˆ์ƒ์น˜ ๋ชปํ•œ ๋ฌธ์ œ์ ์€ ๋ฐ”๋กœ overfitting์ด ์ด ๋ฌธ์ œ๋ฅผ ์•ผ๊ธฐํ•˜์ง€ ์•Š๋Š”๋‹ค๋Š” ์ ์ธ๋ฐ, ์ ์ ˆํ•œ ์‹ฌ์ธต๋ชจ๋ธ์— ๋” ๋งŽ์€ ์ธต์„ ์ถ”๊ฐ€ํ•˜๋ฉด ๋” ๋†’์€ training error๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค.

2. Solution
The paper proposes a new approach called residual learning, in which layers learn residual functions instead of directly learning the underlying mapping.
This approach is implemented via residual blocks that use skip connections, allowing the network to learn identity mappings.


[Shortcut Connection]
- Also called a skip connection.
- By adding a connection between a layer's input and a later layer's output, ResNet can bypass layers and pass the input "directly" to later layers.
- This lets the network learn residual functions, which makes it possible to train very deep networks.
- It also adds the original input to the output of a stack of conv layers, enabling the network to learn identity mappings.
This improves both training accuracy and speed.

[Residual Learning]
The paper introduces the concept of residual learning, in which layers learn residual functions with reference to the layer inputs instead of directly learning the desired underlying mapping.
This approach allows very deep networks to be trained while avoiding vanishing gradients.

[Residual Blocks]
ResNet is built from residual blocks, each consisting of several conv layers plus a skip connection that adds the original input to the block's output.
This lets the network learn residual functions, making the training of very deep networks possible.



3. Bottleneck Architecture
- The ResNet architecture consists of a very deep stack of residual blocks (e.g., 152 layers) and uses a bottleneck design that reduces computation while maintaining accuracy.
- The paper introduces a bottleneck block composed of three layers, joined to the shortcut with an Add operation:
① a 1x1 conv layer to reduce the number of channels,
② a 3x3 conv layer for feature learning,
③ another 1x1 conv layer to restore the number of channels.


cf. Pre-activation
ResNetV2, a follow-up that builds on this paper, introduced the concept of pre-activation:
 - batch normalization and ReLU are applied before each conv layer instead of after;
 - this improves training performance and reduces overfitting in very deep networks.



4. Results
- ResNet achieved state-of-the-art performance on ImageNet classification, outperforming previous methods with far deeper networks.
- ResNet also achieved better generalization performance across various datasets and tasks, such as object detection and semantic segmentation.

- (Training detail) Since the initial learning rate of 0.1 was found to be slightly too large to "start converging",
 training was warmed up at a lower learning rate of 0.01 until the training error fell below 80% (about 400 iterations), then continued at 0.1.

Overall, the ResNet paper is an innovative achievement: it proposed a new architecture that enables the training of very deep ConvNets, which became a core building block of many state-of-the-art models in deep learning, especially in computer vision.

 

 

 

 

 

๐Ÿง  ๋…ผ๋ฌธ์„ ์ฝ๊ณ  Architecture ์ƒ์„ฑ (with tensorflow)

import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense, ReLU, BatchNormalization, ZeroPadding2D, Activation, Add

def conv_bn(x, filters, kernel_size, strides=1, padding='same'):
    # Conv followed by BN, with no activation: used right before the residual addition.
    x = Conv2D(filters, kernel_size, strides=strides, padding=padding)(x)
    x = BatchNormalization()(x)
    return x

def conv_bn_relu(x, filters, kernel_size, strides=1, padding='same'):
    # Conv -> BN -> ReLU, the standard ordering used throughout the paper.
    return ReLU()(conv_bn(x, filters, kernel_size, strides=strides, padding=padding))

def identity_block(x, filters):
    # Bottleneck block with an identity shortcut: 1x1 (reduce) -> 3x3 -> 1x1 (restore).
    shortcut = x
    x = conv_bn_relu(x, filters, 1)
    x = conv_bn_relu(x, filters, 3)
    x = conv_bn(x, filters * 4, 1)      # no ReLU before the addition
    x = Add()([x, shortcut])
    x = ReLU()(x)
    return x

def projection_block(x, filters, strides):
    # Bottleneck block with a projection shortcut (option B): a 1x1 conv matches dimensions.
    shortcut = conv_bn(x, filters * 4, 1, strides)   # projection shortcut, no activation
    x = conv_bn_relu(x, filters, 1, strides)
    x = conv_bn_relu(x, filters, 3)
    x = conv_bn(x, filters * 4, 1)      # no ReLU before the addition
    x = Add()([x, shortcut])
    x = ReLU()(x)
    return x

def resnet(input_shape, num_classes, num_layers):
    # Number of bottleneck blocks per stage (conv2_x .. conv5_x), following Table 1.
    if num_layers == 50:
        num_blocks = [3, 4, 6, 3]
    elif num_layers == 101:
        num_blocks = [3, 4, 23, 3]
    elif num_layers == 152:
        num_blocks = [3, 8, 36, 3]
    else:
        raise ValueError('num_layers must be 50, 101, or 152')

    conv2_x, conv3_x, conv4_x, conv5_x = num_blocks

    inputs = Input(shape=input_shape)
    x = ZeroPadding2D(padding=(3, 3))(inputs)
    x = conv_bn_relu(x, 64, 7, strides=2)
    x = MaxPooling2D(pool_size=(3, 3), strides=2, padding='same')(x)

    x = projection_block(x, 64, strides=1)
    for _ in range(conv2_x - 1):
        x = identity_block(x, 64)

    x = projection_block(x, 128, strides=2)
    for _ in range(conv3_x - 1):
        x = identity_block(x, 128)

    x = projection_block(x, 256, strides=2)
    for _ in range(conv4_x - 1):
        x = identity_block(x, 256)

    x = projection_block(x, 512, strides=2)
    for _ in range(conv5_x - 1):
        x = identity_block(x, 512)

    x = GlobalAveragePooling2D()(x)
    outputs = Dense(num_classes, activation='softmax')(x)

    model = tf.keras.Model(inputs, outputs)
    return model




model = resnet(input_shape=(224, 224, 3), num_classes=200, num_layers=152)
model.summary()

 

Model: "ResNet152"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_24 (InputLayer)          [(None, 224, 224, 3  0           []                               
                                )]                                                                
                                                                                                  
 zero_padding2d_10 (ZeroPadding  (None, 230, 230, 3)  0          ['input_24[0][0]']               
 2D)                                                                                              
                                                                                                  
 conv2d_4159 (Conv2D)           (None, 115, 115, 64  9472        ['zero_padding2d_10[0][0]']      
                                )                                                                 
                                                                                                  
 batch_normalization_4158 (Batc  (None, 115, 115, 64  256        ['conv2d_4159[0][0]']            
 hNormalization)                )                                                                 
                                                                                                  
 re_lu_540 (ReLU)               (None, 115, 115, 64  0           ['batch_normalization_4158[0][0]'
                                )                                ]                                
                                                                                                  
 max_pooling2d_23 (MaxPooling2D  (None, 58, 58, 64)  0           ['re_lu_540[0][0]']              
 )                                                                                                
                                                                                                  
 conv2d_4161 (Conv2D)           (None, 58, 58, 64)   4160        ['max_pooling2d_23[0][0]']       
                                                                                                  
 batch_normalization_4160 (Batc  (None, 58, 58, 64)  256         ['conv2d_4161[0][0]']            
 hNormalization)                                                                                  
                                                                                                  
 re_lu_542 (ReLU)               (None, 58, 58, 64)   0           ['batch_normalization_4160[0][0]'
                                                                 ]                                
                                                                                                  
 conv2d_4162 (Conv2D)           (None, 58, 58, 64)   36928       ['re_lu_542[0][0]']              
                                                                                                  
 batch_normalization_4161 (Batc  (None, 58, 58, 64)  256         ['conv2d_4162[0][0]']            
 hNormalization)                                                                                  
                                                                                                  
 re_lu_543 (ReLU)               (None, 58, 58, 64)   0           ['batch_normalization_4161[0][0]'
                                                                 ]                                
                                                                                                  
 conv2d_4163 (Conv2D)           (None, 58, 58, 256)  16640       ['re_lu_543[0][0]']              
                                                                                                  
 conv2d_4160 (Conv2D)           (None, 58, 58, 256)  16640       ['max_pooling2d_23[0][0]']       
                                                                                                  
 batch_normalization_4162 (Batc  (None, 58, 58, 256)  1024       ['conv2d_4163[0][0]']            
 hNormalization)                                                                                  
                                                                                                  
 batch_normalization_4159 (Batc  (None, 58, 58, 256)  1024       ['conv2d_4160[0][0]']            
 hNormalization)                                                                                  
                                                                                                  
 re_lu_544 (ReLU)               (None, 58, 58, 256)  0           ['batch_normalization_4162[0][0]'
                                                                 ]                                
                                                                                                  
 re_lu_541 (ReLU)               (None, 58, 58, 256)  0           ['batch_normalization_4159[0][0]'
                                                                 ]                                
                                                                                                  
 add_1150 (Add)                 (None, 58, 58, 256)  0           ['re_lu_544[0][0]',              
                                                                  're_lu_541[0][0]']              
                                                                                                  
 re_lu_545 (ReLU)               (None, 58, 58, 256)  0           ['add_1150[0][0]']               
                                                                                                  
 conv2d_4164 (Conv2D)           (None, 58, 58, 64)   16448       ['re_lu_545[0][0]']  



...


conv2d_4312 (Conv2D)           (None, 8, 8, 512)    2359808     ['re_lu_741[0][0]']              
                                                                                                  
 batch_normalization_4311 (Batc  (None, 8, 8, 512)   2048        ['conv2d_4312[0][0]']            
 hNormalization)                                                                                  
                                                                                                  
 re_lu_742 (ReLU)               (None, 8, 8, 512)    0           ['batch_normalization_4311[0][0]'
                                                                 ]                                
                                                                                                  
 conv2d_4313 (Conv2D)           (None, 8, 8, 2048)   1050624     ['re_lu_742[0][0]']              
                                                                                                  
 batch_normalization_4312 (Batc  (None, 8, 8, 2048)  8192        ['conv2d_4313[0][0]']            
 hNormalization)                                                                                  
                                                                                                  
 re_lu_743 (ReLU)               (None, 8, 8, 2048)   0           ['batch_normalization_4312[0][0]'
                                                                 ]                                
                                                                                                  
 add_1199 (Add)                 (None, 8, 8, 2048)   0           ['re_lu_743[0][0]',              
                                                                  're_lu_740[0][0]']              
                                                                                                  
 re_lu_744 (ReLU)               (None, 8, 8, 2048)   0           ['add_1199[0][0]']               
                                                                                                  
 global_average_pooling2d_6 (Gl  (None, 2048)        0           ['re_lu_744[0][0]']              
 obalAveragePooling2D)                                                                            
                                                                                                  
 dense_6 (Dense)                (None, 200)          409800      ['global_average_pooling2d_6[0][0
                                                                 ]']                              
                                                                                                  
==================================================================================================
Total params: 58,780,744
Trainable params: 58,629,320
Non-trainable params: 151,424
__________________________________________________________________________________________________
