📌 Table of Contents

1. Preview
2. Using Naïve Bayes
3. Common Misconception 2
4. Using RNNs
5. Using CNNs
6. Aside) Multi-Label Classification

😚 Closing Remarks...

1. Preview

Text Classification is the task of taking a text, sentence, or document as input and classifying it into one of a set of predefined classes. As shown below, its applications are diverse.

Task | Example Classes
Sentiment Analysis | positive / neutral / negative
Spam Detection | normal / spam
Intent Classification | command / question / chit-chat, etc.
Topic Classification | one class per topic
Category Classification | one class per category

Before deep learning, text classification was performed with a variety of methods, such as Naïve Bayes classification and Support Vector Machines.
This time, let's look at Naïve Bayes, the simplest pre-deep-learning classifier, along with several deep learning approaches.


2. Using Naïve Bayes

Naïve Bayes is a classification method built on a very strong assumption ("every feature is independent!").

Its performance is decent, but since NLP deals with words, which are discrete symbols, it leaves something to be desired.

2.1 MAP (Maximum A Posteriori)

โ—๏ธBayes Theorem

์ด๋•Œ, ๋Œ€๋ถ€๋ถ„์˜ ๋ฌธ์ œ์—์„œ evidence, P(D)๋Š” ๊ตฌํ•˜๊ธฐ ์–ด๋ ต๊ธฐ์—
P(c | D) ∝ P(D | c)โˆ™P(c) ์‹์œผ๋กœ ์ ‘๊ทผํ•˜๊ธฐ๋„ ํ•œ๋‹ค.
์•ž์˜ ์„ฑ์งˆ์„ ์ด์šฉํ•˜๋ฉด, ์ฃผ์–ด์ง„ data D์— ๋Œ€ํ•ด ํ™•๋ฅ ์„ ์ตœ๋Œ€๋กœ ํ•˜๋Š” ํด๋ž˜์Šค c๋ฅผ ๊ตฌํ•  ์ˆ˜ ์žˆ๋Š”๋ฐ,

โ—๏ธMAP
์ด์ฒ˜๋Ÿผ ์‚ฌํ›„ํ™•๋ฅ ์„ ์ตœ๋Œ€ํ™”ํ•˜๋Š” ํด๋ž˜์Šค c๋ฅผ ๊ตฌํ•˜๋Š” ๊ฒƒ์„ MAP(์‚ฌํ›„ํ™•๋ฅ ์ตœ๋Œ€ํ™”)๋ผ ํ•œ๋‹ค.
โ—๏ธMLE
์ด์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ๊ฐ€๋Šฅ๋„๋ฅผ ์ตœ๋Œ€ํ™”ํ•˜๋Š” ํด๋ž˜์Šค c๋ฅผ ๊ตฌํ•˜๋Š” ๊ฒƒ์„ MLE(์ตœ๋Œ€๊ฐ€๋Šฅ๋„์ถ”์ •)์ด๋ผ ํ•œ๋‹ค.
MLE๋Š” ์ฃผ์–ด์ง„ data D์™€ label C์— ๋Œ€ํ•ด ํ™•๋ฅ ๋ถ„ํฌ๋ฅผ ๊ทผ์‚ฌํ•˜๊ธฐ ์œ„ํ•œ 
parameter θ๋ฅผ ํ›ˆ๋ จํ•˜๊ธฐ์œ„ํ•œ ๋ฐฉ๋ฒ•์œผ๋กœ ์‚ฌ์šฉ๋œ๋‹ค.
MLE. vs. MAP
MAP๊ฐ€ ๊ฒฝ์šฐ์— ๋”ฐ๋ผ MLE๋ณด๋‹ค ๋” ์ •ํ™•ํ•  ์ˆ˜ ์žˆ๋‹ค. (โˆต ์‚ฌ์ „ํ™•๋ฅ ์ด ํฌํ•จ๋˜์–ด์žˆ์–ด์„œ)
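As a toy numeric illustration (the numbers here are made up): suppose for some document D the likelihoods are P(D | spam) = 0.4 and P(D | normal) = 0.3, with priors P(spam) = 0.2 and P(normal) = 0.8. MLE picks spam (0.4 > 0.3), but MAP picks normal, since P(D | normal)·P(normal) = 0.24 > P(D | spam)·P(spam) = 0.08. An informative prior can flip the decision.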

 

2.2 Naïve Bayes
Naïve Bayes operates based on MAP.
Assumption: it proceeds from the strong assumption that "every feature is independent!"
In most cases the posterior is hard to obtain directly, so the class is predicted from the product of the likelihood and the prior.

If the data consist of many different features, the features are sparse, so even the likelihood is hard to estimate.
This is where Naïve Bayes exerts its real power: under the assumption that each feature is independent, each word's likelihood can be estimated from its frequency of occurrence in the actual data corpus.

In this way, one simple assumption resolves the data sparsity problem, making MAP prediction of the correct class label possible in an easy and powerful way.

See 2.3 below for a detailed example and the formulas.

 

2.3 Sentiment Analysis Example
Suppose, as above, the classes are positive/negative and the data are documents.
Given the sentence 'I am happy to see this movie', let's decide whether it is positive or negative!

With Naïve Bayes, the probability of the word combination can be decomposed term by term.
That is, after assuming each word's occurrence probability is independent, the joint likelihood decomposes entirely into individual word likelihoods,
each of which can be computed from its occurrence frequency in the data D.

In this way, simply counting each word's per-class occurrence frequency in the corpus is enough for basic sentiment analysis.
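To make this concrete, here is a minimal count-based sketch in Python (the corpus, names, and numbers below are mine for illustration, not from the original post):

from collections import Counter

# A tiny made-up corpus: (tokenized document, class) pairs.
docs = [
    ("i am happy to see this".split(), "pos"),
    ("this movie is happy and great".split(), "pos"),
    ("i am sad".split(), "neg"),
    ("this movie is bad".split(), "neg"),
]

word_counts = {"pos": Counter(), "neg": Counter()}  # Count(w, c)
class_counts = Counter()                            # Count(c)
for words, c in docs:
    word_counts[c].update(words)
    class_counts[c] += 1

def score(words, c):
    # P(c) * prod_i P(w_i | c), with likelihoods estimated by relative frequency.
    p = class_counts[c] / sum(class_counts.values())
    total = sum(word_counts[c].values())
    for w in words:
        p *= word_counts[c][w] / total
    return p

sentence = "i am happy".split()
print(max(("pos", "neg"), key=lambda c: score(sentence, c)))  # pos
# Note: Count(happy, neg) = 0 zeroes out the entire 'neg' score --
# exactly the problem that add-one smoothing (2.4 below) addresses.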

 

2.4 Add-One Smoothing
The Naïve Bayes assumption makes occurrence probabilities independent, so occurrence counts from the corpus can be put to full use.
A problem arises here, though: what if Count(happy, neg) = 0? Then P(happy | neg) = 0.

No matter how absent a sample is from the data corpus, estimating its occurrence probability as 0 for that reason alone is very dangerous. Adding 1 to the numerator (the occurrence count), as below, fixes the problem easily (though it is of course not a perfect solution):

P(w | c) ≈ (Count(w, c) + 1) / (Σ_j Count(w_j, c) + |V|)
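Continuing the toy sketch from 2.3 (again my own illustration), add-one smoothing is a one-line change to the likelihood estimate:

# Vocabulary size |V| over the whole toy corpus from 2.3.
vocab_size = len({w for words, _ in docs for w in words})

def smoothed_likelihood(w, c):
    # Add-one (Laplace) smoothing: (Count(w, c) + 1) / (sum_j Count(w_j, c) + |V|)
    total = sum(word_counts[c].values())
    return (word_counts[c][w] + 1) / (total + vocab_size)

print(smoothed_likelihood("happy", "neg"))  # > 0 even though Count(happy, neg) = 0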

 

2.5 Strengths and Limitations
Strength: as easy and simple as counting occurrences, yet powerful!!
When there are too few labels and sentences to use deep learning, it can be a better alternative than complex deep learning methods.

Limitation: adding 'not' turns 'I am happy' into 'I am not happy', the exact opposite meaning.
In formula terms, P(not, happy) ≠ P(not)·P(happy).
The information carried by word order cannot be ignored either; the basic Naïve Bayes assumption that "each feature is independent" oversimplifies these properties of language, and that is where its limits lie.


3. Common Misconception 2

 

Should we perform lemmatization and stemming to remove affixes before running text classification??
For example, given the source sentence "나는 학교에 가요" ("I go to school"), stemming yields [나 학교 가] ("I school go").
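The Korean example above requires a morphological analyzer, but the same idea can be sketched in English with NLTK's Porter stemmer (a sketch of mine, assuming nltk is installed; not code from the post):

from nltk.stem import PorterStemmer  # assumes nltk is installed

stemmer = PorterStemmer()
# Inflectional variants collapse onto a single stem, shrinking the
# vocabulary and easing sparsity -- at the cost of grammatical detail.
print([stemmer.stem(w) for w in ["connect", "connected", "connecting", "connection"]])
# ['connect', 'connect', 'connect', 'connect']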

This pays off on a small corpus, offering a degree of compromise on the sparsity problem.
In particular, for traditional machine learning methods from before DNNs, it provided a good breakthrough for natural language, which is inherently discrete.

In the DNN era, however, dimensionality reduction (https://chan4im.tistory.com/197#n2) can be performed successfully, so sparsity is no longer a major obstacle; lemmatization, stemming, and the like can hardly be called obligatory anymore.

Also, the two sentences "나는 학교에 가요" ("I go to school") and "나만 학교에 가요" ("Only I go to school") belong to different classes, positive vs. negative, so applying lemmatization or stemming before text classification can be an undesirable approach.
It is therefore much better to attempt text classification with the neural network models described later.

If, while tuning and experimenting to improve performance, you come to suspect that a lack of corpus is the cause of degraded performance, it can still be a decent additional experiment.


4. Using RNNs

Now let's look at the text classification problem with DNNs.
The simplest method is to use an RNN, the neural architecture best able to exploit the fact that a sentence is sequential data.

For a sentence x made up of n words, the RNN's forward pass yields n hidden states.
Text classification can be done from the very last hidden state: the RNN encodes the input sentence in a form suited to the classification problem.
In other words, the RNN's output can be regarded as a sentence embedding vector.

 

4.1 Architecture
As we know, words in text are discrete values, so a sentence, being a collection of them, is a discrete value too.
That is, a sentence is sampled from a discrete probability distribution, so the input consists of one-hot vectors over multiple time-steps.

Taking mini-batches into account, the input will be a 3D tensor (n×m×|V|):
 ∙ n : mini-batch size (= number of documents processed at once)
 ∙ m : sentence length (= number of time-steps = number of words per document)
 ∙ |V| : vocabulary size (= total number of unique words/tokens in the dataset)


But a one-hot vector consists of a single 1 and |V|−1 zeros across its |V| dimensions.
There is no need to keep the whole one-hot vector just for efficient storage;
if each one-hot vector is represented as an integer between 0 and |V|−1, a 2D matrix (n×m) is quite enough.

Passing this one-hot-encoded (n×m) tensor through an embedding layer
yields a word embedding tensor.

After that, just pass the word embedding tensor through the RNN.
Note that we do not need to feed the RNN word embedding tensors or hidden states separately per time-step or per layer.

Finally, we select only the very last time-step and pass it through a softmax layer to obtain the discrete probability distribution P(y | x; θ).
The last time-step can be extracted by index slicing, e.g. H[:, -1].

๋ชจ๋ธ๊ตฌ์กฐ๋กœ ๋ณด๋ฉด ์•„๋ž˜์™€ ๊ฐ™๋‹ค.
๋งˆ์ง€๋ง‰์œผ๋กœ ์›ํ•ซ๋ฒกํ„ฐ y์ด๊ธฐ์— ์ธ๋ฑ์Šค์˜ ๋กœ๊ทธํ™•๋ฅ ๊ฐ’๋งŒ ์ตœ๋Œ€ํ™”ํ•˜๋ฉด ๋˜๋ฏ€๋กœ
CE Loss ์ˆ˜์‹์€ NLL(์Œ์˜ ๋กœ๊ทธ๊ฐ€๋Šฅ๋„)๋ฅผ ์ตœ์†Œํ™”ํ•˜๋Š” ๊ฒƒ๊ณผ ๋™์น˜์ด๋‹ค.
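This equivalence is easy to sanity-check in PyTorch (a quick check of my own, not code from the post):

import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)      # hypothetical scores: 4 samples, 3 classes
y = torch.tensor([0, 2, 1, 2])  # target class indices

ce = F.cross_entropy(logits, y)
nll = F.nll_loss(F.log_softmax(logits, dim=-1), y)
print(torch.allclose(ce, nll))  # True: CrossEntropy = LogSoftmax + NLL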

 

PyTorch implementation example
Example code implementing the formulas above in PyTorch, using an LSTM with several layers.
∙ Dropout is applied between the LSTM layers, and
∙ to optimize with the NLL (negative log-likelihood) loss, LogSoftmax is used to return log-probabilities.
import torch.nn as nn


class RNNClassifier(nn.Module):

    def __init__(
        self,
        input_size,
        word_vec_size,
        hidden_size,
        n_classes,
        n_layers=4,
        dropout_p=.3,
    ):
        self.input_size = input_size  # vocabulary_size
        self.word_vec_size = word_vec_size
        self.hidden_size = hidden_size
        self.n_classes = n_classes
        self.n_layers = n_layers
        self.dropout_p = dropout_p

        super().__init__()

        self.emb = nn.Embedding(input_size, word_vec_size)
        self.rnn = nn.LSTM(
            input_size=word_vec_size,
            hidden_size=hidden_size,
            num_layers=n_layers,
            dropout=dropout_p,
            batch_first=True,
            bidirectional=True,
        )
        self.generator = nn.Linear(hidden_size * 2, n_classes)
        # We use LogSoftmax + NLLLoss instead of Softmax + CrossEntropy
        self.activation = nn.LogSoftmax(dim=-1)

    def forward(self, x):
        # |x| = (batch_size, length)
        x = self.emb(x)
        # |x| = (batch_size, length, word_vec_size)
        x, _ = self.rnn(x)
        # |x| = (batch_size, length, hidden_size * 2)
        y = self.activation(self.generator(x[:, -1]))
        # |y| = (batch_size, n_classes)

        return y
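A quick smoke test of the classifier above (the sizes are arbitrary choices of mine):

import torch

model = RNNClassifier(input_size=100, word_vec_size=32, hidden_size=64, n_classes=2)
x = torch.randint(0, 100, (8, 10))  # 8 sentences, 10 word indices each
y = model(x)
print(y.shape)  # torch.Size([8, 2]); each row holds log-probabilities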


5. Using CNNs

5.1 Convolution Operation
5.2 Convolution Layer

์ž์„ธํ•œ ์„ค๋ช…์€ ์•„๋ž˜ ๋งํฌ ์ฐธ๊ณ  (https://chan4im.tistory.com/133)

 


5.3 Text Classification with CNN
Unlike an RNN, a CNN's structure focuses on recognizing patterns rather than on sequential information.
A CNN can detect patterns over the word combinations that matter for classification;
what it values most is the presence or absence of word-combination patterns that indicate a given class.

For example, the word 'good' acts as an important signal that is key to positive/negative classification.
So what if the model learns a filter that detects the pattern of the embedding vector for 'good'?
Words such as 'better', 'best', and 'great' will end up with vectors similar to that of 'good'.
→ Going further, filters that detect word sequence patterns (combinations of words) can be learned as well.

Briefly, the model structure is as follows.
First, the index value representing each one-hot vector is converted into a (1D) word embedding vector.
Then, stacking the word embedding vectors of all time-steps in the sentence produces a 2D matrix.
Applying the convolution operation to this matrix is where the CNN takes effect.



 

PyTorch implementation example
As in the RNN text classifier, to optimize with the NLL (negative log-likelihood) loss, LogSoftmax is used to return log-probabilities.
import torch
import torch.nn as nn


class CNNClassifier(nn.Module):

    def __init__(
        self,
        input_size,
        word_vec_size,
        n_classes,
        use_batch_norm=False,
        dropout_p=.5,
        window_sizes=[3, 4, 5],
        n_filters=[100, 100, 100],
    ):
        self.input_size = input_size  # vocabulary size
        self.word_vec_size = word_vec_size
        self.n_classes = n_classes
        self.use_batch_norm = use_batch_norm
        self.dropout_p = dropout_p
        # window_size: how many words one pattern covers.
        self.window_sizes = window_sizes
        # n_filters: how many patterns to learn for each window size.
        self.n_filters = n_filters

        super().__init__()

        self.emb = nn.Embedding(input_size, word_vec_size)
        # Use nn.ModuleList to register each sub-modules.
        self.feature_extractors = nn.ModuleList()
        for window_size, n_filter in zip(window_sizes, n_filters):
            self.feature_extractors.append(
                nn.Sequential(
                    nn.Conv2d(
                        in_channels=1, # We only use one embedding layer.
                        out_channels=n_filter,
                        kernel_size=(window_size, word_vec_size),
                    ),
                    nn.ReLU(),
                    nn.BatchNorm2d(n_filter) if use_batch_norm else nn.Dropout(dropout_p),
                )
            )

        # An input of generator layer is max values from each filter.
        self.generator = nn.Linear(sum(n_filters), n_classes)
        # We use LogSoftmax + NLLLoss instead of Softmax + CrossEntropy
        self.activation = nn.LogSoftmax(dim=-1)

    def forward(self, x):
        # |x| = (batch_size, length)
        x = self.emb(x)
        # |x| = (batch_size, length, word_vec_size)
        min_length = max(self.window_sizes)
        if min_length > x.size(1):
            # Because some inputs are not long enough to cover the maximum window size,
            # we add a zero tensor as padding.
            pad = x.new(x.size(0), min_length - x.size(1), self.word_vec_size).zero_()
            # |pad| = (batch_size, min_length - length, word_vec_size)
            x = torch.cat([x, pad], dim=1)
            # |x| = (batch_size, min_length, word_vec_size)

        # In a typical vision task the tensor would have 3 channels,
        # but in this case there is just 1 channel,
        # which is added by the 'unsqueeze' method below:
        x = x.unsqueeze(1)
        # |x| = (batch_size, 1, length, word_vec_size)

        cnn_outs = []
        for block in self.feature_extractors:
            cnn_out = block(x)
            # |cnn_out| = (batch_size, n_filter, length - window_size + 1, 1)

            # In the case of max pooling, we do not know the pooling size in advance,
            # because it depends on the length of the sentence.
            # Therefore, we use the functional interface from the 'nn.functional' package.
            # This is the beauty of PyTorch. :)
            cnn_out = nn.functional.max_pool1d(
                input=cnn_out.squeeze(-1),
                kernel_size=cnn_out.size(-2)
            ).squeeze(-1)
            # |cnn_out| = (batch_size, n_filter)
            cnn_outs += [cnn_out]
        # Merge output tensors from each convolution layer.
        cnn_outs = torch.cat(cnn_outs, dim=-1)
        # |cnn_outs| = (batch_size, sum(n_filters))
        y = self.activation(self.generator(cnn_outs))
        # |y| = (batch_size, n_classes)

        return y
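As with the RNN, a quick smoke test (arbitrary sizes of mine; torch is already imported above):

model = CNNClassifier(input_size=100, word_vec_size=32, n_classes=2)
x = torch.randint(0, 100, (8, 10))  # 8 sentences, 10 word indices each
y = model(x)
print(y.shape)  # torch.Size([8, 2])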



6. Aside) Multi-Label Classification

Multi-label classification: unlike ordinary softmax classification, several classes can be correct at the same time.

6.1 Binary Classification
Use sigmoid & BCELoss (because the binary classification setting follows a Bernoulli distribution).

The formula is as follows; BCE loss is a variant of the ordinary CE loss specialized for binary classification:

BCE(y, ŷ) = −(y·log ŷ + (1 − y)·log(1 − ŷ))

In this formula y is a discrete value of either 0 or 1,
and ŷ is the continuous output between 0 and 1 produced by the sigmoid.

 

6.2 Multi-Binary Classification
So how do we apply binary classification to the multi-label problem?

For a classification over n items, give the network's last layer n nodes and apply the sigmoid function to every one of them.
That is, a single model can perform several binary classification tasks at once.
What, then, is the final loss function? Simply the sum of the BCE losses over the n output nodes:

L(y, ŷ) = Σ_{i=1}^{n} BCE(y_i, ŷ_i)
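A minimal sketch of such a multi-binary head (the sizes here are hypothetical; nn.BCEWithLogitsLoss fuses the sigmoid and BCE for numerical stability):

import torch
import torch.nn as nn

encoder_output_size, n_labels = 256, 4  # hypothetical sizes

head = nn.Linear(encoder_output_size, n_labels)  # n output nodes, one per label
crit = nn.BCEWithLogitsLoss()  # applies sigmoid internally, then BCE per node

z = torch.randn(8, encoder_output_size)         # batch of encoded inputs
y = torch.randint(0, 2, (8, n_labels)).float()  # multi-hot targets: several 1s allowed
loss = crit(head(z), y)  # mean of the per-label BCE terms
loss.backward()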

 

6.3 Etc.
When a task is not binary, use softmax instead of sigmoid, and switch the loss to cross-entropy.


๋งˆ์น˜๋ฉฐ...

This time we covered text classification.
Text classification is a field whose usefulness is very high relative to the complexity of its models and the difficulty of writing the code.
Before neural networks, with the sparsity problem of discrete values still unsolved,
very simple and intuitive methods such as Naïve Bayes were used.
However, the Naïve Bayes approach could not help losing classification accuracy due to its strong assumption that "every feature is independent!"


With the introduction of deep learning, however, text classification became very efficient and accurate:
an RNN receives the words sequentially and predicts the class at the last time-step, while
a CNN detects patterns over the word combinations that matter for classification.

∙ An RNN concentrates more on the overall context and meaning of the sentence when classifying, while
∙ a CNN values most the presence or absence of word-combination patterns that indicate a given class.

Combining an RNN and a CNN into an ensemble model can therefore produce even better results.
Building on this and consulting other models, you should be able to achieve higher performance even on long sentences or difficult text.
