๐Ÿ˜ถ ์ดˆ๋ก (Abstract)

- Deep neural networks often work well when they are over-parameterized and trained with a massive amount of noise and regularization, such as weight decay and dropout.
Although dropout is a very widely used regularization technique for fully connected (FC) layers, it is often less effective for conv layers.
The reason for this lack of success is that activation units in conv layers are spatially correlated, so information can still flow through the conv layers despite dropout.
∴ A structured form of dropout is needed to regularize CNNs!

- This paper introduces such a structured form of dropout: DropBlock.
DropBlock drops units in a contiguous region of a feature map together.
Applying DropBlock to the skip connections, in addition to the conv layers, improved accuracy.
Also, gradually increasing the number of dropped units during training gave two benefits.
  i) better accuracy
  ii) robustness to the choice of hyperparameters

- In extensive experiments, DropBlock works better than dropout at regularizing CNNs.
On ImageNet, a ResNet-50 architecture with DropBlock reaches 78.13% accuracy, an improvement of more than 1.6% over the baseline. On COCO detection, DropBlock improves the average precision of RetinaNet from 36.8% to 38.4%.

 

 

 

1. Introduction

- Deep neural networks often work well when they are over-parameterized (= have many parameters) and trained with a massive amount of noise and regularization, such as weight decay and dropout.
Despite the great success dropout had when it was first introduced for CNNs, recent convolutional architectures rarely use it. [BN, ResNet, and other papers]
In most cases, dropout is used mainly in the fully connected layers of a CNN.

- We argue that the main drawback of dropout is that it "drops out features randomly".
Dropping features at random can work well for FC layers, but it is less effective for conv layers, where features are spatially correlated.
When features are correlated, information about the input is still passed on to the next layer even with dropout, and this can cause the network to overfit.
This intuition suggests that a more structured form of dropout is needed to regularize CNNs.

- This paper introduces DropBlock, a structured form of dropout that is particularly effective for regularizing CNNs.
In DropBlock, the features in a block, i.e., a contiguous region of a feature map, are dropped together.
Because DropBlock discards features in a correlated area, the network has to look elsewhere for evidence to fit the data (see Figure 1).


- In our experiments, DropBlock is much better than dropout across a range of models and datasets.
Adding DropBlock to ResNet-50 improves ImageNet image classification accuracy from 76.51% to 78.13%.
On COCO detection, DropBlock improves the AP of RetinaNet from 36.8% to 38.4%.

 

 

 

2. Related work

- Since its introduction, dropout has inspired a number of regularization methods for neural networks, such as DropConnect, maxout, StochasticDepth, DropPath, ScheduledDropPath, shake-shake regularization, and ShakeDrop regularization.
The basic principle behind these methods is to inject noise into the network so that it does not overfit the training data.
For CNNs, most of the successful methods above require the noise to be structured.
For example, in DropPath an entire layer of the network is zeroed out of training, not just a particular unit.
This layer-dropping strategy can work well for layers with many input or output branches, but it cannot be used for layers without any branches.
cf) Here a "block" is defined as a set of adjacent feature maps within a conv layer, and
a "branch" is defined as a set of consecutive blocks that share the same spatial resolution.


Our method, DropBlock, is more general in that it can be applied anywhere in a CNN.
It is closely related to SpatialDropout, where an entire channel is dropped from a feature map. Our experiments show that DropBlock is more effective than SpatialDropout.


- The development of such architecture-specific noise injection techniques is not unique to CNNs.
In fact, like CNNs, RNNs also require their own noise injection methods.
Currently, two of the most commonly used methods for injecting noise into recurrent connections are
Variational Dropout and ZoneOut.

- Our method is inspired by Cutout, a data augmentation method in which parts of the input examples are zeroed out.
DropBlock generalizes Cutout by applying it at every feature map in a CNN.
In our experiments, keeping a fixed zero-out ratio for DropBlock during training is not as robust as using a schedule in which the ratio increases during training.
In other words, it is better to set the DropBlock ratio small at the start of training and increase it linearly over time. This scheduling scheme is related to ScheduledDropPath.

 

 

 

3. DropBlock

- DropBlock is a simple method similar to dropout. The main difference from dropout is:
Dropout: drops out independent random units
DropBlock: drops contiguous regions from a feature map of a layer
DropBlock has two main parameters.
  i) block_size: the size of the block to be dropped
  ii) γ: controls how many activation units to drop
We experimented both with a DropBlock mask shared across feature channels and with a separate DropBlock mask for each feature channel. Algorithm 1 corresponds to the latter, which tends to work better in our experiments.

As with dropout, we do not apply DropBlock at inference time. This can be interpreted as evaluating an averaged prediction over an exponentially large ensemble of sub-networks.
These sub-networks include a special subset of the sub-networks covered by dropout, in which each network does not see contiguous parts of the feature maps.
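The masking step described above can be sketched for a single channel in plain Python. This is only an illustration of the idea, not the authors' implementation: the function names `dropblock_mask` and `apply_dropblock`, the list-of-lists layout, and the choice to sample block centres only where a full block fits are assumptions made for this sketch.

```python
import random


def dropblock_mask(height, width, block_size, gamma, rng=None):
    """Return a height x width list-of-lists mask (1 = keep, 0 = drop)."""
    rng = rng or random.Random()
    mask = [[1] * width for _ in range(height)]
    # Block centres are sampled only where a full block fits inside the map.
    before = block_size // 2
    after = block_size - 1 - before
    for r in range(before, height - after):
        for c in range(before, width - after):
            if rng.random() < gamma:
                # Zero the block_size x block_size square around the centre.
                for rr in range(r - before, r + after + 1):
                    for cc in range(c - before, c + after + 1):
                        mask[rr][cc] = 0
    return mask


def apply_dropblock(feature, mask):
    """Mask one channel and rescale by (number of units) / (number of kept units)."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    scale = total / kept if kept else 0.0
    return [[f * m * scale for f, m in zip(frow, mrow)]
            for frow, mrow in zip(feature, mask)]
```

With block_size = 1 every sampled centre drops a single unit, so the sketch reduces to ordinary dropout; larger blocks remove whole neighbourhoods at once.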



Setting the value of block_size



Setting the value of γ
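In practice γ is not set directly; it is derived from keep_prob. As I read the paper's Equation 1, γ = (1 - keep_prob) / block_size² · feat_size² / (feat_size - block_size + 1)², where feat_size is the side length of the feature map. The sketch below evaluates that expression; the function name `dropblock_gamma` and the assumption of a square feature map are mine.

```python
def dropblock_gamma(keep_prob, block_size, feat_size):
    """Bernoulli rate for block centres, derived from keep_prob.

    gamma = (1 - keep_prob) / block_size^2
            * feat_size^2 / (feat_size - block_size + 1)^2
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    if not 1 <= block_size <= feat_size:
        raise ValueError("block_size must be between 1 and feat_size")
    # Number of positions per axis where a block centre may fall.
    valid = feat_size - block_size + 1
    return (1.0 - keep_prob) / block_size ** 2 * feat_size ** 2 / valid ** 2
```

For block_size = 1 the expression collapses to 1 - keep_prob, which is the usual dropout rate. For larger blocks γ shrinks, because each sampled centre removes block_size² units.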



 •Scheduled DropBlock
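As noted in the Related work section, a fixed drop ratio works less well than one that starts small and grows linearly during training. A minimal sketch of such a schedule, expressed as keep_prob falling linearly from 1.0 to a target value, could look like this. The function name `scheduled_keep_prob` and the step-based parametrisation are assumptions for illustration.

```python
def scheduled_keep_prob(step, total_steps, target_keep_prob):
    """Linearly anneal keep_prob from 1.0 (nothing dropped) to the target."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 1.0 - progress * (1.0 - target_keep_prob)
```

The returned value would be fed to the DropBlock layer at each training step, so early training sees almost no dropping and the full regularization strength is reached only at the end.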

 

 

 

4. Experiments

In the following sections, we empirically investigate the effectiveness of DropBlock for image classification, object detection, and semantic segmentation.
For image classification, we apply DropBlock to ResNet-50 in extensive experiments.
To check whether the results transfer to a different architecture, we apply DropBlock to AmoebaNet, a state-of-the-art model, and show improvements.
Beyond image classification, we show that DropBlock helps when training RetinaNet for object detection and semantic segmentation.

 

 

4.1  ImageNet Classification

ILSVRC 2012 classification dataset
- train: 1.2M, valid: 50,000, test: 150,000 images, labeled with 1,000 classes
As in [GoogLeNet, DenseNet], we used horizontal flip, scale, and aspect ratio augmentation for training.
During evaluation, we applied a single crop rather than multiple crops.
Following common practice, we report classification accuracy on the validation set.


 • Implementation Details

We trained the models on Tensor Processing Units (TPUs) and used the publicly released TensorFlow implementations of ResNet-50 and AmoebaNet. [https://github.com/tensorflow/tpu/tree/master/models/official/resnet //  https://github.com/tensorflow/tpu/tree/master/models/experimental/amoeba_net ]

For all models we used the default
 - image size (224 x 224 for ResNet-50, 331 x 331 for AmoebaNet)
 - batch size (1024 for ResNet-50, 2048 for AmoebaNet)
 - and hyperparameter settings.
The only change was to increase the number of training epochs for ResNet-50 from 90 to 270.
The learning rate was decayed by a factor of 0.1 at epochs 125, 200, and 250.
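The step schedule above can be sketched as a small function. The boundaries (125, 200, 250) and the factor 0.1 come from the text; the base learning rate is not given in this post, so `base_lr` is left as a parameter, and the function name `step_decay_lr` is hypothetical.

```python
def step_decay_lr(epoch, base_lr, boundaries=(125, 200, 250), factor=0.1):
    """Multiply base_lr by `factor` once for every boundary already reached."""
    passed = sum(1 for boundary in boundaries if epoch >= boundary)
    return base_lr * factor ** passed
```

Calling it once per epoch gives a rate that stays at `base_lr` until epoch 125 and then drops tenfold at each boundary.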

The AmoebaNet models were trained for 340 epochs, with an exponential decay scheme for the learning rate.
Because the baseline models usually overfit under the longer training scheme, their validation accuracy is lower at the end of training.
For a fair comparison, we therefore report the highest validation accuracy over the full course of training.

 

 4.1.1  DropBlock in ResNet-50

ResNet-50 is a widely used CNN architecture for image recognition.
In the following experiments, we apply different regularization techniques to ResNet-50 and compare the results with DropBlock.
The results are summarized in Table 1.



 • Where to apply DropBlock
In ResNet, a building block consists of a few conv layers and a separate skip connection that performs identity mapping.
Every conv layer is followed by a Batch Normalization layer and a ReLU activation.
The output of a building block is the sum of the output of the convolution branch and the output of the skip connection.

A ResNet can be described in terms of building groups based on the spatial resolution of the feature activations; a building group consists of multiple building blocks.
We use group 4 to denote the last group of the ResNet (i.e., all layers in conv5_x).


In the following experiments, we study where to apply DropBlock in ResNet.
  ① applying DropBlock only after the conv layers, or
  ② applying DropBlock after both the conv layers and the skip connections.
To study how DropBlock performs on different feature groups, we applied it either to group 4 only or to both groups 3 and 4.


 • DropBlock vs. dropout
The original ResNet architecture does not apply any dropout in the model. For ease of discussion, we define the dropout baseline for ResNet as applying dropout to the convolution branches only.
By default, DropBlock is applied to both groups 3 and 4 with block_size = 7.
In all experiments, we decreased the parameter γ by a factor of 4 for group 3.
Figure 3-(a) shows that DropBlock outperforms dropout by 1.3% in top-1 accuracy.
The scheduled keep_prob makes DropBlock more robust to changes in keep_prob and adds an improvement for most values of keep_prob (Figure 3-(b)).


Using the best keep_prob found in Figure 3, we swept block_size from 1 up to a size that covers the full feature map.
Figure 4 shows that a larger block_size is generally better than a block_size of 1, and that the best DropBlock configuration is to apply block_size = 7 to both groups 3 and 4.


In all configurations, DropBlock and dropout follow a similar trend, and DropBlock shows a large gain over the best dropout result.
This is evidence that DropBlock is a more effective regularizer than dropout.



 • DropBlock vs. SpatialDropout
As with the dropout baseline, we define the SpatialDropout baseline as applying it to the convolution branches only.
SpatialDropout is better than dropout but worse than DropBlock.
(i.e., DropBlock > SpatialDropout > Dropout)
In Figure 4, we found that SpatialDropout can be too harsh when applied to the high-resolution feature maps of group 3.
DropBlock achieves the best result by dropping blocks of constant size in both groups 3 and 4.



 • Comparison with DropPath

Following ScheduledDropPath, we applied scheduled DropPath to all connections except the skip connections.
We trained models with different values of the keep_prob parameter. We also trained models with DropPath applied in all groups and, as in our other experiments, only in group 4 or only in groups 3 and 4.
The best validation accuracy, 77.10%, was obtained when DropPath was applied only to group 4 with keep_prob = 0.9.



 • Comparison with Cutout
We also compared against Cutout, a data augmentation method that randomly drops a fixed-size block from the input image.
Although Cutout improves accuracy on the CIFAR-10 dataset, as reported in the Cutout paper, it does not improve accuracy on the ImageNet dataset in our experiments.



 • Comparison with other regularization techniques
We compare DropBlock with two commonly used regularization techniques (data augmentation and label smoothing).
In Table 1, DropBlock performs better than both data augmentation and label smoothing.
Combining DropBlock with label smoothing and training for 290 epochs improves performance further, showing that regularization techniques can be complementary when training for longer.
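For readers unfamiliar with label smoothing, one of the baselines above, here is a minimal sketch of the standard formulation: the one-hot target is mixed with a uniform distribution over the classes. The smoothing strength `epsilon=0.1` is a common default that I am assuming; this post does not state the value used in the experiments, and the function name `smooth_labels` is hypothetical.

```python
def smooth_labels(true_class, num_classes, epsilon=0.1):
    """Return a smoothed target distribution for a single example."""
    # Every class receives an equal share of the epsilon mass.
    off_value = epsilon / num_classes
    target = [off_value] * num_classes
    # The true class additionally keeps the remaining (1 - epsilon) mass.
    target[true_class] += 1.0 - epsilon
    return target
```

The smoothed vector still sums to 1, so it can be used directly as the target of a cross-entropy loss.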

 

 4.1.2  DropBlock in AmoebaNet

- We also show the effectiveness of DropBlock on AmoebaNet-B, a recent state-of-the-art architecture found by evolutionary architecture search. This model uses dropout with a keep probability of 0.5, but only on the final softmax layer.

- We apply DropBlock after every Batch Normalization layer and also to the skip connections in the last 50% of the cells. In these cells the feature map resolution is 21x21 or 11x11 for a 331x331 input image.
Based on the experiments in the previous section, we used a keep_prob of 0.9 and set block_size = 11, the width of the last feature map.
DropBlock improves the top-1 accuracy of AmoebaNet-B from 82.25% to 82.52% (Table 2).

 

 

4.2  Experimental Analysis


DropBlock shows strong empirical results in improving ImageNet classification accuracy compared to dropout.

We hypothesize that dropout is insufficient because contiguous regions in conv layers are strongly correlated.
Even when a unit is dropped at random, information can still flow through the neighboring units.

In this section, we carry out an analysis to show that DropBlock is more effective at dropping semantic information.

 As a result, a model regularized by DropBlock is more robust than a model regularized by dropout.

We study this by applying DropBlock with block_size 1 and 7 during inference and observing the difference in performance.




 • DropBlock drops more semantic information

We first took a model trained without any regularization and tested it with DropBlock using block_size = 1 and block_size = 7.
The green curves in Figure 5 show that validation accuracy falls quickly as keep_prob decreases during inference.

This suggests that DropBlock removes semantic information and makes classification harder.

Accuracy falls more quickly with decreasing keep_prob for block_size = 1 than for block_size = 7, which the paper takes to suggest that DropBlock is more effective than dropout at removing semantic information.



 • Model trained with DropBlock is more robust

Next, we show that a model trained with a large block size, which removes more semantic information, is regularized more strongly.

We demonstrate this by taking a model trained with block_size = 7 and applying block_size = 1 during inference, and vice versa.

In Figure 5, the models trained with block_size = 1 and with block_size = 7 are both robust when block_size = 1 is applied during inference. However, the performance of the model trained with block_size = 1 falls more quickly as keep_prob decreases when block_size = 7 is applied during inference.

The results suggest that block_size = 7 is more robust and has the benefits of block_size = 1, but not the other way around.


 

• DropBlock learns spatially distributed representations


Because DropBlock is effective at removing semantic information in a contiguous region, we hypothesize that a model trained with DropBlock has to learn spatially distributed representations.

A model regularized by DropBlock has to learn multiple discriminative regions instead of focusing on a single discriminative region.

We use class activation maps (CAM) to visualize the conv5_3 class activations of ResNet-50 on the ImageNet validation set.

Figure 6 shows the class activations of the original model and of the models trained with DropBlock with block_size = 1 and block_size = 7.

In general, models trained with DropBlock learn spatially distributed representations that induce high class activations in multiple regions, whereas the model without regularization tends to focus on one or very few regions.

 

 

4.3  Object Detection in COCO

DropBlock is a generic regularization module for CNNs.
In this section, we show that DropBlock can also be applied to training an object detector on the COCO dataset.
We use RetinaNet for the experiments. Unlike image classification, which predicts a single label for an image, RetinaNet runs convolutionally on a multi-scale Feature Pyramid Network (FPN) to localize and classify objects at different scales and locations. We followed the model architecture and anchor definition of [Focal loss for dense object detection] to build the FPN and the classifier/regressor branches.


 • Where to apply DropBlock to RetinaNet model
 The RetinaNet model uses ResNet-FPN as its backbone.
For simplicity, we apply DropBlock to the ResNet in ResNet-FPN and use the best keep_prob found for ImageNet classification training.
DropBlock differs from the recent work [A-Fast-RCNN], which learns to drop a structured pattern on the features of region proposals.


• Training object detector from random initialization
 Training an object detector from random initialization has been considered a difficult task.
Recently, a few papers have tried to address this with new model architectures, large mini-batch sizes, and better normalization layers.
In our experiments, we look at the problem from the perspective of model regularization.
We tried DropBlock with keep_prob = 0.9, the same hyperparameter as for training the image classification model, and experimented with different block_size values.
Table 3 shows that the model trained from random initialization surpasses the ImageNet pre-trained model.
Adding DropBlock gives an additional 1.6% AP. The results suggest that model regularization is an important ingredient in training an object detector from scratch and that DropBlock is an effective regularization approach for object detection.


• Implementation details
 We use an open-source implementation of RetinaNet for the experiments.
The models were trained on TPUs with 64 images per batch.
During training, multi-scale jittering was applied to resize images between scales, after which the images were padded or cropped to a maximum dimension of 640.
During testing, only single-scale images with a maximum dimension of 640 were used.
A Batch Normalization layer was applied after every conv layer, including the classifier/regressor branches.

The model was trained for 150 epochs (280k training steps).
The initial learning rate of 0.08 was used for the first 120 epochs and decayed by a factor of 0.1 at epochs 120 and 140.
The model with ImageNet initialization was trained for 28 epochs, with learning rate decay at epochs 16 and 22.
For the focal loss we used α = 0.25 and γ = 1.5.
We used a weight decay of 0.0001 and a momentum of 0.9.
The model was trained on COCO train 2017 and evaluated on COCO val 2017.
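To make the α and γ settings above concrete, here is a minimal sketch of the binary focal loss from the RetinaNet paper, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), for a single prediction. Note that this γ is the focusing parameter of the focal loss, not the DropBlock γ from Section 3. The function name `focal_loss` and the clamping constant are assumptions for this sketch.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=1.5):
    """Binary focal loss for one prediction p = P(y = 1) with label y in {0, 1}."""
    # Probability and class weight assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # Guard against log(0) for completely wrong predictions.
    p_t = min(max(p_t, 1e-12), 1.0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

The factor (1 - p_t)^γ shrinks the loss of well-classified examples, so training is dominated by the hard ones; with γ = 0 the expression reduces to an α-weighted cross-entropy.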

 

 

4.4 Semantic Segmentation in PASCAL VOC

- We show that DropBlock also improves semantic segmentation models.
We use the PASCAL VOC 2012 dataset for the experiments and, following common practice, train on the augmented set of 10,582 training images and report mIOU on 1,449 test set images.
We adopt the open-source RetinaNet implementation for semantic segmentation.
The implementation uses a ResNet-FPN backbone to extract multi-scale features and attaches a fully convolutional network on top to predict the segmentation.
We use the default hyperparameters of the open-source code for training.
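The mIOU metric reported here is the mean over classes of intersection over union between predicted and ground-truth label maps. Below is a minimal sketch over flattened label lists. The function name `mean_iou` and the rule of skipping classes absent from both inputs are assumptions, and the actual benchmark accumulates counts over the whole dataset rather than per image.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target."""
    ious = []
    for cls in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union:  # skip classes that appear in neither label map
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

For example, with `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12.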


- Following the object detection experiments, we study the effect of DropBlock on training a model from random initialization.
We trained the model that starts from a pre-trained ImageNet model for 45 epochs and the randomly initialized model for 500 epochs.
We experimented with applying DropBlock to the ResNet-FPN backbone and to the fully convolutional network, and found that applying DropBlock to the fully convolutional network is more effective.
Applying DropBlock greatly improves mIOU when training from scratch and narrows the performance gap between training from an ImageNet pre-trained model and training from a randomly initialized model.


 

 

 

 

 

 

5. Discussion

- This paper introduces DropBlock, a technique for regularizing the training of CNNs.
DropBlock is a form of structured dropout that drops spatially correlated information.
By comparing dropout and DropBlock on ImageNet classification and COCO detection, we show that DropBlock is the more effective regularizer.
DropBlock consistently outperforms dropout across an extensive set of experiments.
We carried out an analysis to show that a model trained with DropBlock is more robust and has the benefits of a model trained with dropout. Class activation mapping suggests that a model regularized by DropBlock can learn more spatially distributed representations.

- Our experiments show that applying DropBlock to the skip connections, in addition to the conv layers, increases accuracy.
Also, gradually increasing the number of dropped units during training improves accuracy and makes the method more robust to the choice of hyperparameters.

 

 

 

 

 

 

 

 

🧐 Paper review: summary of the key concepts

"DropBlock: A regularization method for convolutional networks"

[Key concepts]
  1. Dropout is a widely used regularization method in deep learning, but because of the structure of conv layers it may not work well in CNNs.

  2. DropBlock is a regularization method designed specifically for CNNs.
    Instead of individual units, it randomly drops entire contiguous blocks of a feature map during training.

  3. DropBlock drops contiguous blocks instead of individual units, so it can be seen as a generalization of dropout: with block_size = 1 it resembles dropout, and with a block that covers the whole feature map it resembles SpatialDropout.

  4. DropBlock improved the generalization performance of CNNs on the tasks covered above: ImageNet classification, COCO detection, and PASCAL VOC segmentation.

  5. DropBlock can be integrated easily into existing CNN architectures and can be used together with other regularization methods such as weight decay and data augmentation.

  6. The paper also compares against DropPath, a related regularization method that randomly drops entire paths of connections in the network during training.

Overall, DropBlock is a strong and effective regularization technique designed specifically for CNNs; it has been shown to improve generalization on several benchmark datasets and is easy to integrate into existing architectures.

 

🧠 Building the architecture after reading the paper (with TensorFlow)

import tensorflow as tf


def drop_block(x, block_size, keep_prob, is_training):
    """DropBlock for an NHWC feature map with static height, width, channels.

    `is_training` is assumed to be a Python bool, not a tensor.
    """
    # Like dropout, DropBlock is only active during training.
    if not is_training or keep_prob >= 1.0:
        return x

    _, height, width, channels = x.get_shape().as_list()
    block_size = min(block_size, height, width)

    # gamma: Bernoulli rate for block centres, set so that roughly
    # (1 - keep_prob) of all units end up dropped.
    gamma = ((1.0 - keep_prob) * height * width / block_size ** 2
             / ((height - block_size + 1) * (width - block_size + 1)))

    # Sample centres only in the region where a whole block fits.
    seed_shape = tf.stack([tf.shape(x)[0], height - block_size + 1,
                           width - block_size + 1, channels])
    seeds = tf.cast(tf.random.uniform(seed_shape) < gamma, x.dtype)

    # Zero-pad the seed map back to (height, width).
    before = block_size // 2
    after = block_size - 1 - before
    seeds = tf.pad(seeds, [[0, 0], [before, after], [before, after], [0, 0]])

    # Max pooling grows every seed into a block_size x block_size square.
    dropped = tf.nn.max_pool2d(seeds, ksize=block_size, strides=1, padding='SAME')
    block_mask = 1.0 - dropped

    # Rescale by count(mask) / count_ones(mask) to keep the activation scale.
    keep_ratio = tf.maximum(tf.reduce_mean(block_mask), 1e-6)
    return x * block_mask / keep_ratio
