[Vision DeepLearning]. Semantic Segmentation_part 2. Models.

V2LLAIN 2023. 5. 7. 18:47


Previous post

Continued from https://chan4im.tistory.com/158.

[Vision DeepLearning]. Semantic Segmentation_part 1. Intro.


[FCN : Fully Convolutional Network]

FCN was the first convolutional network to tackle semantic segmentation successfully.
By removing the fully-connected layers, FCN is built from convolution and pooling layers only.

input: an m x n x 3 color image
output: an m x n x (C+1) tensor
- Here C is the number of classes; the +1 is there because the background is included.
- Map 0 of the output tensor is the background, and the remaining maps correspond to the classes.
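
To make the output format concrete, here is a minimal sketch of how an m x n x (C+1) score tensor can be turned into a per-pixel label map by taking the highest-scoring map at each pixel. The function name `label_map` and the toy scores are made up for illustration; they are not from the FCN paper.

```python
def label_map(scores):
    """scores[y][x] holds C+1 class scores; index 0 is the background.

    Returns an m x n map with the index of the best-scoring class per pixel.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# Toy 2 x 2 image with C = 2 classes plus background (hypothetical scores).
scores = [
    [[0.9, 0.05, 0.05], [0.1, 0.8, 0.1]],
    [[0.2, 0.2, 0.6], [0.7, 0.2, 0.1]],
]
labels = label_map(scores)
print(labels)
```

A pixel labeled 0 is background; labels 1..C are the object classes.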

FCN downsamples the m x n x 3 input image, making it progressively smaller.
An upsampling (= deconvolution) step is then needed to bring it back up to m x n.

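The more you downsample, the larger the receptive field gets and the lower the spatial resolution, so fine detail is lost.
The early feature maps, on the other hand, carry detailed information but lack global context.
∴ FCN combines feature maps from several scales to keep segmentation performance as high as possible.
→ by skip connection !!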
https://arxiv.org/abs/1411.4038
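
The multi-scale combination can be sketched roughly as follows: a coarse score map is upsampled 2x and merged with a finer one from an earlier layer. Two things here are assumptions rather than something this post states: the merge is an element-wise sum, and nearest-neighbor upsampling stands in for the learned upsampling to keep the sketch short. All names are hypothetical.

```python
def upsample2x_nearest(fm):
    """Repeat every value twice along both axes (stand-in for learned upsampling)."""
    out = []
    for row in fm:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def fuse_by_sum(coarse, fine):
    """Upsample the coarse map 2x and add it to the finer map, element by element."""
    up = upsample2x_nearest(coarse)
    if len(up) != len(fine) or len(up[0]) != len(fine[0]):
        raise ValueError("fine map must be exactly 2x the coarse map")
    return [[u + f for u, f in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine)]


coarse = [[1, 2]]                                  # deep layer: global but blurry
fine = [[10, 20, 30, 40], [50, 60, 70, 80]]        # shallow layer: detailed
fused = fuse_by_sum(coarse, fine)
print(fused)
```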

The easiest way to upsample is bilinear interpolation. (https://chan4im.tistory.com/84)
But that amounts to learning the downsampling while falling back on a fixed, classical method for the upsampling.
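
For reference, a minimal bilinear interpolation sketch in plain Python. It uses corner-aligned sampling (output corners map exactly onto input corners), which is just one of several conventions and is an assumption here. Note that there is nothing to learn in it: the weights depend only on pixel positions.

```python
def bilinear_resize(img, out_h, out_w):
    """Resize a 2-D list of numbers with bilinear interpolation (corner-aligned)."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for y in range(out_h):
        # Map the output row back to a fractional source row.
        sy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        wy = sy - y0
        row = []
        for x in range(out_w):
            sx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            wx = sx - x0
            # Blend horizontally on the two neighboring rows, then vertically.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


resized = bilinear_resize([[0, 2], [4, 6]], 3, 3)
print(resized)
```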

High performance can only be expected when both down- and upsampling are learned.
So FCN upsamples with transposed convolution [Dumoulin2016; https://arxiv.org/abs/1603.07285].

📌 Upsampling by **"Transposed Convolution"**

https://chan4im.tistory.com/160

[Paper review] - A guide to convolution arithmetic for deep learning (2016)
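
As a rough illustration of what a transposed convolution computes, the sketch below has each input value "stamp" a scaled copy of the kernel onto the output, with stamps placed `stride` pixels apart and overlaps summed. It is single-channel with no padding, and the kernel values are fixed by hand; in a real network they would be learned. The function name is hypothetical.

```python
def transposed_conv2d(x, kernel, stride=1):
    """Single-channel transposed convolution, no padding.

    Output size per axis: (in - 1) * stride + kernel_size.
    """
    in_h, in_w = len(x), len(x[0])
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (in_h - 1) * stride + k_h
    out_w = (in_w - 1) * stride + k_w
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(in_h):
        for j in range(in_w):
            # Stamp x[i][j] * kernel at offset (i * stride, j * stride).
            for a in range(k_h):
                for b in range(k_w):
                    out[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    return out


x = [[1, 2], [3, 4]]
ones = [[1, 1], [1, 1]]
up_s2 = transposed_conv2d(x, ones, stride=2)   # 2x2 -> 4x4, stamps do not overlap
up_s1 = transposed_conv2d(x, ones, stride=1)   # 2x2 -> 3x3, stamps overlap and add
print(up_s2)
print(up_s1)
```

With stride 2 and a 2x2 kernel the feature map doubles in size, which is the kind of learned 2x upsampling step used in the models that follow.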


The models introduced below are variants that appeared after FCN's success and improve on its performance in various ways.

[DeConvNet]

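DeConvNet combines FCN with an auto-encoder. [Noh2015]
In FCN the upsampling path was designed by hand, so the structure is complicated and awkward.
DeConvNet instead uses a standard, symmetric auto-encoder, which gives it the edge in structure, training, and performance.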
https://arxiv.org/abs/1505.04366
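
This post does not describe the decoder layers, so the following is an outside illustration: in [Noh2015] the decoder mirrors each pooling layer with an unpooling layer that reuses the recorded positions of the maxima ("switches"). The sketch assumes a single-channel map with even height and width, and all function names are hypothetical.

```python
def max_pool_with_switches(fm):
    """2x2 max pooling that also records the (row, col) of every maximum."""
    pooled, switches = [], []
    for i in range(0, len(fm), 2):
        pooled_row, switch_row = [], []
        for j in range(0, len(fm[0]), 2):
            window = [(fm[i + a][j + b], (i + a, j + b))
                      for a in range(2) for b in range(2)]
            value, pos = max(window, key=lambda t: t[0])
            pooled_row.append(value)
            switch_row.append(pos)
        pooled.append(pooled_row)
        switches.append(switch_row)
    return pooled, switches


def unpool(pooled, switches, out_h, out_w):
    """Put every pooled value back at its recorded position; the rest stays 0."""
    out = [[0] * out_w for _ in range(out_h)]
    for pooled_row, switch_row in zip(pooled, switches):
        for value, (r, c) in zip(pooled_row, switch_row):
            out[r][c] = value
    return out


fm = [
    [1, 2, 0, 1],
    [3, 4, 1, 5],
    [0, 0, 7, 1],
    [2, 1, 0, 0],
]
pooled, switches = max_pool_with_switches(fm)
restored = unpool(pooled, switches, 4, 4)
print(pooled)
print(restored)
```

The unpooled map is sparse; in the paper's design, the deconvolution layers that follow are what densify it again.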

[U-Net]

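U-net was presented at MICCAI (Medical Image Computing and Computer Assisted Intervention), one of the most prestigious conferences in medical image processing, and was developed for medical image segmentation. [Ronneberger2015]
(It's no secret that it also won the ISBI cell tracking challenge that same year.)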

U-net names the downsampling and upsampling stages as follows.
- Downsampling = contracting path
- Upsampling = expansive path

📌 Downsampling
What happens after passing through the convolution layers (→) and pooling layers (↓)?
As in other models, the spatial resolution of the feature map shrinks and the depth (= channels) grows.
Repeating this 4 times gives the 32x32x512 feature map at the very bottom.

📌 Deconvolution (≒ Upsampling)
Transposed convolution (= deconvolution) is used to scale the feature map back up toward the original image size.

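🌟 The key point of U-net is the shortcut connection, drawn as →!!
For each pair of corresponding layers in the downsampling and upsampling paths, the following happens.
- The feature map produced on the left side is center-cropped and passed to the expansive path.
- The right side concatenates the received tensor (HxWxC) with its own to build an (HxWx2C) tensor.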
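
The crop-and-concatenate step above can be sketched with nested lists in H x W x C layout. The 6x6 and 4x4 sizes and the function names are made up for illustration.

```python
def center_crop(fm, out_h, out_w):
    """Keep the central out_h x out_w region of an H x W x C feature map."""
    top = (len(fm) - out_h) // 2
    left = (len(fm[0]) - out_w) // 2
    return [row[left:left + out_w] for row in fm[top:top + out_h]]


def concat_channels(a, b):
    """Concatenate two H x W x C maps along the channel axis -> H x W x 2C."""
    return [[pa + pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def shape(fm):
    return (len(fm), len(fm[0]), len(fm[0][0]))


# Hypothetical sizes: a 6x6x2 contracting-path map meets a 4x4x2 expansive-path map.
encoder = [[[y, x] for x in range(6)] for y in range(6)]   # channels store (row, col)
decoder = [[[0, 0] for _ in range(4)] for _ in range(4)]

merged = concat_channels(center_crop(encoder, 4, 4), decoder)
print(shape(merged))
print(merged[0][0])
```

The crop is needed because the unpadded convolutions make the left-side map larger than the right-side map it is merged with.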

The original U-net uses Conv2D(filters, (3,3), strides=1, padding='valid').
So a 572x572 input image shrinks to 570x570 and then 568x568.

The pooling layers use a 2x2 filter, so the feature map size is halved.
So the 568x568 map becomes 284x284.
https://arxiv.org/abs/1505.04597
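
The size arithmetic can be traced end to end with a few lines of Python. The contracting half follows the numbers above; for the expansive half the sketch assumes each level doubles the size and then applies two unpadded 3x3 convolutions, which is how the 388x388 output comes about. The function name and the stage labels are hypothetical.

```python
def unet_sizes(size=572, depth=4):
    """Trace the spatial size through a U-net built from unpadded 3x3 convolutions."""
    trace = [("input", size)]
    for level in range(1, depth + 1):
        size -= 4                        # two 3x3 'valid' convs lose 2 pixels each
        trace.append((f"down{level} conv", size))
        size //= 2                       # 2x2 pooling halves the size
        trace.append((f"down{level} pool", size))
    size -= 4
    trace.append(("bottom conv", size))
    for level in range(1, depth + 1):
        size *= 2                        # assumed 2x upsampling per level
        trace.append((f"up{level} upsample", size))
        size -= 4                        # assumed two 3x3 'valid' convs per level
        trace.append((f"up{level} conv", size))
    return trace


trace = unet_sizes()
for name, size in trace:
    print(f"{name:>14}: {size} x {size}")
```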

[DeepLabv3+]

Networks based on an encoder-decoder structure shrink the input image by a factor of 16 to 32 and then upsample it back to the original size.
For example, U-net shrinks a 572x572 image down to 32x32 and then restores it to 388x388.

But! Because the image is shrunk so much before being restored, detailed features are likely to get lost.
→ DeepLabv3+ mitigates this weakness with dilated convolution (= atrous convolution). [Chen2018]
https://arxiv.org/abs/1802.02611

[Figure: dilated (= atrous) convolution]

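Atrous convolution has a dilation rate r; with r=1 it is the same as an ordinary convolution, as in the left picture.
With r=2, the 3x3 filter still looks at 9 pixels, but it skips one pixel between taps as in the right picture, so it spans a 5x5 window.
In other words, it covers a wider receptive field with the same number of weights!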
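
A minimal single-channel sketch of the idea, assuming no padding and stride 1. The filter taps are spaced `rate` pixels apart, so a k x k filter spans (k - 1) * rate + 1 pixels per axis. As is usual in deep learning code, the kernel is not flipped. The function names and the checkerboard-like test image are made up for illustration.

```python
def effective_size(k, rate):
    """Window spanned by a k x k filter with dilation rate `rate`."""
    return (k - 1) * rate + 1


def dilated_conv2d(x, kernel, rate=1):
    """Single-channel dilated (atrous) convolution, no padding, stride 1."""
    k = len(kernel)
    span = effective_size(k, rate)
    out_h = len(x) - span + 1
    out_w = len(x[0]) - span + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0
            for a in range(k):
                for b in range(k):
                    # Taps are `rate` pixels apart; rate=1 is ordinary convolution.
                    acc += x[i + a * rate][j + b * rate] * kernel[a][b]
            row.append(acc)
        out.append(row)
    return out


# 5x5 image with ones only where both row and column are even.
image = [[1 if y % 2 == 0 and c % 2 == 0 else 0 for c in range(5)] for y in range(5)]
ones = [[1] * 3 for _ in range(3)]

dense = dilated_conv2d(image, ones, rate=1)    # 3x3 window, 3x3 output
atrous = dilated_conv2d(image, ones, rate=2)   # 5x5 window, 1x1 output
print(effective_size(3, 1), effective_size(3, 2))
print(dense)
print(atrous)
```

With r=2 all nine taps land on the even positions, so the single output value sums nine ones, while the r=1 filter at the same top-left position only sees four of them.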

