📌 Table of Contents

1. Preview
2. N-gram

3. Language Model Metrics
4. N-gram Practice with SRILM
5. NNLM (BOS, EOS)
6. Applications of Language Models (Speech Recognition / Machine Translation / OCR, etc.)

😚 Closing remarks...

1. Preview

1.1 LM (Language Model)
A language model (LM) is a model that represents the probability of a sentence.
Through these probability values, we can compute how likely the sentence itself is and predict the next word given the preceding words.
As a result, we can measure how natural and fluent a given sentence is.
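Writing $w_i$ for the i-th word (notation introduced here for clarity), an LM models the two quantities just mentioned:

$$P(w_1, w_2, \dots, w_n), \qquad P(w_n \mid w_1, w_2, \dots, w_{n-1})$$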

 

1.2 Hell difficulty: Korean
∙ Korean: a representative agglutinative language
∙ English: an isolating (+ inflectional) language
∙ Chinese: an isolating language

In an agglutinative language, a word's meaning and role are determined not so much by word order as by the affixes attached to the word, such as endings (어미) and particles (조사).
In other words, word order matters little and words can even be omitted, which puts probability computation between words at a disadvantage.

English and other Latin-alphabet languages have more regular word order, so they are less prone to this kind of ambiguity than Korean.
In addition, since in Korean the affixes and particles determine a word's meaning and role, many particles can attach to a single stem and derive a huge number of word forms, as below.
ex) 학교에, 학교에서, 학교에서도, 학교를, 학교로, 학교가, 학교조차도, . . .

Therefore, unless the endings are separated off, the vocabulary size grows exponentially; sparsity rises and the problem becomes even harder to solve.

 

1.3 Expressing a sentence as probabilities
By Bayes' theorem, the probability of a sentence can be expressed with conditional probabilities.
(Reference: https://chan4im.tistory.com/199#n2)
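Concretely, applying the theorem repeatedly (the chain rule) decomposes the sentence probability into a product of next-word probabilities:

$$P(w_1, w_2, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})$$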

2. N-gram

A method that estimates probabilities from the occurrence counts of only the few immediately preceding words, instead of conditioning on all preceding words (here, N = k+1).

2.1 Sparse Data Problem
An LM expresses a sentence's probability as a formula, and to approximate that probability we can count the frequency of each word sequence in a collected corpus.

Getting good probability estimates, however, is a hard problem:
no matter how many words we crawl, the number of possible word combinations is far larger.

Even for slightly longer word combinations,
the occurrence count may be unavailable in the corpus, so the numerator becomes 0 and the estimated probability becomes 0,
or even the denominator becomes 0, leaving the probability undefined.

We have, of course, covered the curse of dimensionality and sparsity before:
∙ Similarity & Ambiguity (https://chan4im.tistory.com/196)
∙ Word embedding (https://chan4im.tistory.com/197#n2)

2.2 Markov Assumption
To estimate the probability of word sequences from a corpus effectively, we must resolve the sparsity problem.
For this, the Markov assumption is introduced.

What is the Markov assumption?
The idea that the state probability at a given time depends only on the immediately preceding state(s).
That is, without examining every previously seen word,
we look at only the preceding k words (= the immediately preceding state) to estimate the next word's probability:

$$P(x_i \mid x_1, \dots, x_{i-1}) \approx P(x_i \mid x_{i-k}, \dots, x_{i-1})$$

Written as a formula, this is the approximation above; by simplifying the condition this way, we approximate the probability we actually want to compute.
Usually k takes a value from 0 to 3. (If k = 2, we consult the preceding 2 words to approximate the probability of the next word x_i.)

Applying the chain rule here and expressing it with log probabilities gives the following.
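$$P(x_{1:n}) \approx \prod_{i=1}^{n} P(x_i \mid x_{i-k}, \dots, x_{i-1}) \quad\Longrightarrow\quad \log P(x_{1:n}) \approx \sum_{i=1}^{n} \log P(x_i \mid x_{i-k}, \dots, x_{i-1})$$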

Estimating probabilities by counting only these short preceding combinations, instead of all preceding words, is called the N-gram method (where N = k+1).

The required corpus size and the value of N usually scale together:
as N grows, a given N-gram is more likely to be absent from our training corpus, making accurate estimation harder.



k	N-gram (N = k+1)	Name
0	1-gram	uni-gram
1	2-gram	bi-gram
2	3-gram	tri-gram

Accordingly, 3-gram is used most often; when the training data is very plentiful, 4-gram is sometimes used as well (in practice the gain is small).
∵ With 4-gram, model performance barely improves while the number of possible word combinations increases exponentially.


ex) 3-gram
∙ Under the 3-gram assumption, we can approximate the probability of x_i from just the occurrence count of the 3-word sequence and that of its leading 2 words:

$$P(x_i \mid x_{i-2}, x_{i-1}) \approx \frac{\mathrm{Count}(x_{i-2}, x_{i-1}, x_i)}{\mathrm{Count}(x_{i-2}, x_{i-1})}$$

That is, while the full-sentence probability cannot be estimated directly, introducing the Markov assumption lets us approximate the sentence's probability.
This way, we can estimate probabilities even for sentences never seen in the training corpus.

 

2.3 Generalization
Performance is determined by predictive ability on unseen samples absent from the training data (= generalization ability).
N-grams, too, gain some generalization against sparsity by adopting the Markov assumption.

Let's look at methods that improve generalization further.
Smoothing & Discounting
What if we turned raw occurrence counts directly into probability estimates?
The model then copes poorly with word sequences that never occur in the training corpus:
any unseen word sequence would be assigned probability 0.
∴ The word frequencies or probability values need to be smoothed.

The simplest method is to add 1 to the occurrence count of every word sequence.
Expressed as a formula:
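$$P(x_i \mid x_{i-k}, \dots, x_{i-1}) \approx \frac{\mathrm{Count}(x_{i-k}, \dots, x_{i-1}, x_i) + 1}{\mathrm{Count}(x_{i-k}, \dots, x_{i-1}) + |V|}$$

(|V| is the vocabulary size; adding 1 to every count is add-one, i.e. Laplace, smoothing.)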
This method is very simple and intuitive, but it is ill-suited to problems with severe sparsity, such as language modeling.
We covered related techniques with Naïve Bayes earlier (https://chan4im.tistory.com/199#n2).

Kneser-Ney Discounting
To address the sparsity problem of simple smoothing, KN (Kneser-Ney) discounting was proposed.

❗️Core idea
∙ When a word w appears after another word v, measure how diverse the words preceding w are (= how varied v is).
∙ A word that appears after a more diverse set of words is more likely to appear in unseen word sequences.

KN discounting models a continuation score, Score_continuation:
it assumes that the larger the set {v : Count(v, w) > 0} of words v that have appeared before w, the larger Score_continuation(w) should be.

Concretely, the size of the set {v : Count(v, w) > 0} of words v that appeared with w is divided by
the sum, over words w' ∈ W sampled from the whole vocabulary, of the sizes of the sets {v : Count(v, w') > 0}.
The formula is below.
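$$\mathrm{Score}_{\mathrm{continuation}}(w) = \frac{\big|\{v : \mathrm{Count}(v, w) > 0\}\big|}{\sum_{w' \in W} \big|\{v : \mathrm{Count}(v, w') > 0\}\big|}$$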

With this, we can define P_KN for bi-grams as in the formula below.
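In the standard formulation (with Count(w_{i-1}) denoting the unigram count of w_{i-1}):

$$P_{\mathrm{KN}}(w_i \mid w_{i-1}) = \frac{\max\big(\mathrm{Count}(w_{i-1}, w_i) - d,\ 0\big)}{\mathrm{Count}(w_{i-1})} + \lambda(w_{i-1})\,\mathrm{Score}_{\mathrm{continuation}}(w_i)$$

$$\text{where } \lambda(w_{i-1}) = \frac{d}{\mathrm{Count}(w_{i-1})}\,\big|\{w' : \mathrm{Count}(w_{i-1}, w') > 0\}\big|$$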
Here d is a constant, usually 0.75.
As you can see, KN discounting starts from a simple intuition but ends up with a fairly involved formula.
A slightly modified variant, Modified-KN discounting, is the method in common use.
cf) It can be used easily via the implementation in the SRILM language-modeling toolkit.
Interpolation
Let's generalize the LM through linear interpolation between multiple LMs.
Interpolating LMs means linearly mixing two different LMs at a fixed ratio (λ):

Interpolation is very useful when building an LM specialized for a particular domain: mixing in an LM built from a small domain-specific corpus strengthens the domain-specialized LM.

For example, suppose we are building a medical-domain speech recognition or machine translation system.
An LM built from a general-domain corpus may find medical terms and expressions unfamiliar.
Conversely, an LM built only from the specialized corpus may generalize far too poorly.

∙ General domain
 - P(진정제 | 준비,된) = 0.00001
 - P(사나이 | 준비,된) = 0.01

∙ Specialized domain
 - P(진정제 | 준비,된) = 0.09
 - P(약 | 준비,된) = 0.04

∙ Interpolation result (λ = 0.5)
 - P(진정제 | 준비,된) = 0.5*0.09 + (1-0.5)*0.00001 = 0.045005

(진정제: sedative, 사나이: man/guy, 약: medicine; the conditioning context 준비, 된 means 'prepared'.)

In the end, a word may appear with a meaning different from its general one, and word sequences rare in everyday conversation may occur far more often;
likewise, the specialized corpus will badly lack general word sequences.
To solve these problems, we mix the LMs built from each domain's corpus to specialize in the target domain.
Back-Off
Word sequences that are too long or too complex are extremely rare in an actual training corpus.
The Markov assumption thus enables generalization, and back-off goes one step further.

In the formula below, the probability of a given N-gram is interpolated with the probabilities of sequences shorter than N.
For example, interpolating a 3-gram probability with the 2-gram and 1-gram probabilities, as written below, exploits the probabilities of sequences shorter than N to obtain a stronger smoothing and generalization effect.
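For the 3-gram case just described (with the λ's summing to 1):

$$\tilde{P}(w_n \mid w_{n-2}, w_{n-1}) = \lambda_1 P(w_n \mid w_{n-2}, w_{n-1}) + \lambda_2 P(w_n \mid w_{n-1}) + \lambda_3 P(w_n)$$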

 

2.4 Conclusion
The N-gram approach approximates probabilities from occurrence counts, so it is very easy and convenient.

Prob)
Its weakness is equally clear: for word sequences that never appear in the training corpus, the probability cannot be estimated accurately.

Sol)
The Markov assumption simplifies the conditions required for a word combination,
and Smoothing and Back-Off further compensate for the weakness.

Still, these are not fundamental solutions, and the introduction of DNNs has brought great progress to the LMs used in speech recognition and machine translation.
Even in the DNN era, the N-gram approach can still be used to great effect: if the task is not generating sentences but scoring the fluency of a given sentence, an N-gram LM can still perform well without a complex DNN.
(The marginal gain a DNN buys there may not be worth the trouble.)


3. LM - Metrics

3.1 PPL (Perplexity)
Perplexity (PPL) is a quantitative, intrinsic evaluation metric for LMs.
PPL normalizes the probability value by the sentence length:

Since the sentence probability sits in the denominator, the higher the probability, the lower the PPL.
Therefore, the lower the PPL (and the larger the N of the N-gram), the better the model.

 

3.2 Interpreting PPL
If there are n equally probable choices at every time-step, the PPL is n:
PPL can be read as a branching factor, the number of branches fanning out at each step.

ex) With a vocabulary of 20,000 words, the PPL is 20,000 (provided all words are equally probable).

If a 3-gram LM measures a PPL of 30, however,
the model is on average torn between 30 candidate words:
predicting the next word amounts to choosing among 30 candidates.

 

3.3 PPL and Entropy
As mentioned before, for entropy, which is the average information content:
∙ low information content means a sharp distribution, and
∙ high information content means a flat distribution.

First, when sampling a sentence w_{1:n} of length n from the true LM distribution P(x) over the set W of possible sentences, the cross-entropy of our LM distribution P_θ(x) is given by:
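$$H(P, P_\theta) = -\sum_{w_{1:n} \in W} P(w_{1:n}) \log P_\theta(w_{1:n})$$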

We can approximate this expression via Monte Carlo sampling:
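$$H(P, P_\theta) \approx -\frac{1}{K} \sum_{k=1}^{K} \log P_\theta\big(w_{1:n}^{(k)}\big), \qquad w_{1:n}^{(k)} \sim P$$

With a single sample (K = 1), this reduces to $-\log P_\theta(w_{1:n})$.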



We can approximate the entropy H as above, but since a sentence is sequential data,
the notion of an entropy rate lets us express it as an average entropy per word,
and Monte Carlo sampling can likewise be applied.

Massaging the expression a little further gives:
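$$\mathcal{H}(P, P_\theta) \approx -\frac{1}{n} \log P_\theta(w_{1:n}) = -\frac{1}{n} \sum_{i=1}^{n} \log P_\theta(w_i \mid w_{<i})$$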

Recalling the PPL formula, it has a form very similar to this expression derived from the cross-entropy.



Finally, the relationship between PPL and cross-entropy is:
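$$\mathrm{PPL}(w_{1:n}) = P_\theta(w_{1:n})^{-\frac{1}{n}} = \exp\!\Big(-\frac{1}{n} \sum_{i=1}^{n} \log P_\theta(w_i \mid w_{<i})\Big) = \exp\big(\mathcal{H}(P, P_\theta)\big)$$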


∴ When learning the parameters θ via MLE, we can obtain the PPL by exponentiating the cross-entropy loss.


4. N-gram Practice with SRILM

SRILM is a toolkit that makes it easy to build and apply the n-gram models used in speech recognition, segmentation, machine translation (MT), and so on.

4.1 Installing SRILM
After filling in some brief information at (http://www.speech.sri.com/projects/srilm/download.html), you can download SRILM.

Then create a directory and unpack the archive inside it:
$ mkdir srilm
$ cd ./srilm
$ tar -xzvf ./srilm-1.7.2.tar.gz

 


Open the Makefile inside the directory, set the SRILM path on line 7, and uncomment it.
Then build SRILM with the make command:
$ vi ./Makefile

# set the SRILM path on line 7 and uncomment it

$ make


Once the build completes successfully, register the newly created directory under SRILM/bin in PATH and export it:
PATH={SRILM_PATH}/bin/{MACHINE}:$PATH
# e.g. PATH=/home/IHC/Workspace/nlp/srilm/bin/i686-m64:$PATH
export PATH


Check that ngram-count and ngram run correctly:
$ source ~/.profile
$ ngram-count -help
$ ngram -help

 

4.2 Preparing the dataset
As covered in the earlier preprocessing chapter, we use a file whose tokenization (segmentation) is complete.
We then split the file into training data and test data.

 

4.3 Basic usage
Main arguments of the programs used in SRILM:
∙ ngram-count : trains an LM
 - vocab : lexicon file name
 - text : training corpus file name
 - order : n-gram order
 - write : output count file name
 - unk : mark OOV as <unk>
 - kndiscountn : use Kneser-Ney discounting for N-grams of order n
∙ ngram : applies an LM
 - ppl : calculate perplexity for a test file
 - order : n-gram order
 - lm : LM file name
Building an LM
ex) Training a 3-gram with kndiscount enabled,
and writing out the LM and the vocabulary that constitutes it:
$ time ngram-count -order 3 -kndiscount -text <text_fn> -lm <output_lm_fn> -write_vocab <output_vocab_fn> -debug 2
Generating sentences
Let's use the LM built with the N-gram module to generate sentences.
After generation, as explained in the preprocessing post (https://chan4im.tistory.com/195), the segmentation must be restored (detokenized).
The example below chains a Linux pipeline, using regular expressions via sed to restore the segmentation (the ▁ markers follow the preprocessing post's tokenization scheme):
$ ngram -lm <input_lm_fn> -gen <n_sentence_to_generate> | sed "s/ //g" | sed "s/▁▁/ /g" | sed "s/▁//g" | sed "s/^\s//g"

If sed and regular expressions feel cumbersome every time, the same can be done in Python.
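For instance, a rough Python equivalent of the sed pipeline might look like the sketch below (assuming the same ▁ / ▁▁ marker scheme as above; adjust the markers to your tokenizer):

import sys

# Detokenize: drop the spaces between tokens, then turn the double
# marker (original whitespace) back into a real space and remove the
# single (subword-join) markers.
for line in sys.stdin:
    line = line.strip().replace(' ', '')
    line = line.replace('▁▁', ' ').replace('▁', '')
    print(line.strip())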

Evaluation
Performance can be evaluated with the following command:
$ ngram -ppl <test_fn> -lm <input_lm_fn> -order 3 -debug 2

Running it prints the number of OOVs (Out-of-Vocabulary words) and the PPL of the test sentences.
Refer mainly to ppl, which is averaged taking the number of sentences into account (not ppl1).
Interpolation
With SRILM you can perform not only plain smoothing (= discounting) but also interpolation;
in that case you need two separately trained LMs and a hyper-parameter λ for mixing them.

The following command performs the interpolation:
$ ngram -lm <input_lm_fn> -mix-lm <mix_lm_fn> -lambda <mix_ratio_between_0_and_1> -write-lm <output_lm_fn> -debug 2


Evaluating after interpolation, you may see performance gains depending on the case, and tuning λ can widen the gain further.


5. NNLM

5.1 Tackling sparsity
An N-gram-based LM is convenient, but when an N-gram or word combination is absent from the training corpus, its occurrence count cannot be computed, so no probability can be assigned and probabilities cannot be compared: it is quite weak at generalization.

NNLM (Neural Network Language Model) was introduced to compensate for this weakness.
Using word embedding, NNLM reduces words to dense vectors, so that words used similarly in the corpus obtain similar vectors; the higher similarity improves generalization and relieves sparsity.

NNLM comes in many forms; here we look at RNNLM, the most effective and common form, which uses LSTMs from the RNN family.

 

5.2 RNNLM (RNN Language Model)

Conventional LMs treat each word as a discrete datum, so they struggle with sparsity as the word sequence grows longer. Hence the Markov assumption was used: only up to the previous n-1 words enter the condition when approximating the probability.

RNNLM, however, creates dense vectors through word embedding,
which relieves sparsity, so it can put every word from the first word of the sentence up to the word just before the current one into the condition when approximating the probability.
Taking the log of both sides turns the product ∏ into a sum Σ:
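$$P_\theta(w_{1:n}) = \prod_{i=1}^{n} P_\theta(w_i \mid w_{<i}) \quad\Longrightarrow\quad \log P_\theta(w_{1:n}) = \sum_{i=1}^{n} \log P_\theta(w_i \mid w_{<i})$$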

 

5.3 Implementation: formulas and walkthrough

Here, x_0 and x_{n+1} are added at the start and end of the input sentence to represent BOS and EOS.
Step by step, the process is as follows.

∙ First, take the sentence x_{1:n}[:-1] as input.
∙ Then feed each time-step's token x_i into the embedding layer emb.
∙ This yields word embedding vectors of the chosen dimensionality.
❗️Caution: drop the EOS before feeding the input into the embedding layer.

โ—๏ธBOS (Beginning of Sentence): BOS๋Š” ๋ฌธ์žฅ์˜ ์‹œ์ž‘์„ ๋‚˜ํƒ€๋‚ด๋Š” ํŠน๋ณ„ํ•œ ํ† ํฐ ๋˜๋Š” ์‹ฌ๋ณผ๋กœ ์ฃผ๋กœ Seq2Seq๋ชจ๋ธ๊ณผ ๊ฐ™์€ ๋ชจ๋ธ์—์„œ ์ž…๋ ฅ ์‹œํ€€์Šค์˜ ์‹œ์ž‘์„ ํ‘œ์‹œํ•˜๋Š” ๋ฐ ์‚ฌ์šฉ๋œ๋‹ค.
ex) ๊ธฐ๊ณ„ ๋ฒˆ์—ญ ๋ชจ๋ธ์—์„œ ๋ฒˆ์—ญํ•  ๋ฌธ์žฅ์˜ ์‹œ์ž‘์„ BOS ํ† ํฐ์œผ๋กœ ํ‘œ์‹œํ•˜์—ฌ ๋ชจ๋ธ์—๊ฒŒ ๋ฌธ์žฅ์„ ์‹œ์ž‘ํ•˜๋ผ๊ณ  ์•Œ๋ ค์ค„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

โ—๏ธEOS (End of Sentence): EOS๋Š” ๋ฌธ์žฅ์˜ ๋์„ ๋‚˜ํƒ€๋‚ด๋Š” ํŠน๋ณ„ํ•œ ํ† ํฐ ๋˜๋Š” ์‹ฌ๋ณผ๋กœ ์ฃผ๋กœ Seq2Seq ๋ชจ๋ธ๊ณผ ๊ฐ™์€ ๋ชจ๋ธ์—์„œ ์ถœ๋ ฅ ์‹œํ€€์Šค์˜ ๋์„ ๋‚˜ํƒ€๋‚ด๋Š” ๋ฐ ์‚ฌ์šฉ๋ฉ๋‹ˆ๋‹ค.
ex) ๊ธฐ๊ณ„ ๋ฒˆ์—ญ ๋ชจ๋ธ์ด ๋ฒˆ์—ญ์„ ๋งˆ์ณค์„ ๋•Œ EOS ํ† ํฐ์„ ์ƒ์„ฑํ•˜์—ฌ ์ถœ๋ ฅ ์‹œํ€€์Šค๊ฐ€ ๋๋‚ฌ์Œ์„ ๋‚˜ํƒ€๋ƒ…๋‹ˆ๋‹ค.




RNN์€ ํ•ด๋‹น word_embedding_vector๋ฅผ ์ž…๋ ฅ์œผ๋กœ ๋ฐ›๊ณ 
RNN์˜ hidden_state_size์ธ hidden_size์˜ vector๋ฅผ ๋ฐ˜ํ™˜ํ•œ๋‹ค.
์ด๋•Œ, pytorch๋ฅผ ํ†ตํ•ด ๋ฌธ์žฅ์˜ ๋ชจ๋“  time-step์„ ํ•œ๋ฒˆ์— ๋ณ‘๋ ฌ๋กœ ๊ณ„์‚ฐํ•  ์ˆ˜ ์žˆ๋‹ค.



์—ฌ๊ธฐ tensor์— linear layer์™€ softmax๋ฅผ ์ ์šฉํ•ด ๊ฐ ๋‹จ์–ด์— ๋Œ€ํ•œ ํ™•๋ฅ ๋ถ„ํฌ์ธ (x_hat)_i+1๋ฅผ ๊ตฌํ•œ๋‹ค.

์—ฌ๊ธฐ์„œ LSTM์„ ์‚ฌ์šฉํ•ด RNN์„ ๋Œ€์ฒดํ•  ์ˆ˜ ์žˆ๋‹ค.

test dataset์— ๋Œ€ํ•ด PPL์„ ์ตœ์†Œํ™”ํ•˜๋Š” ๊ฒƒ์ด ๋ชฉํ‘œ์ด๋ฏ€๋กœ Cross Entropy Loss๋ฅผ ์‚ฌ์šฉํ•ด optimizing์„ ์ง„ํ–‰ํ•œ๋‹ค.
์ด๋•Œ, ์ฃผ์˜ํ•  ์ ์€ ์ž…๋ ฅ๊ณผ ๋ฐ˜๋Œ€๋กœ BOS๋ฅผ ์ œ๊ฑฐํ•œ ์ •๋‹ต์ธ x1:n[1:]์™€ ๋น„๊ตํ•œ๋‹ค๋Š” ๊ฒƒ์ด๋‹ค.
Pytorch ๊ตฌํ˜„์˜ˆ์ œ
import torch
import torch.nn as nn

import data_loader


class LanguageModel(nn.Module):

    def __init__(self, 
                 vocab_size,
                 word_vec_dim=512,
                 hidden_size=512,
                 n_layers=4,
                 dropout_p=.2,
                 max_length=255
                 ):
        self.vocab_size = vocab_size
        self.word_vec_dim = word_vec_dim
        self.hidden_size = hidden_size
        self.n_layers = n_layers
        self.dropout_p = dropout_p
        self.max_length = max_length

        super(LanguageModel, self).__init__()

        self.emb = nn.Embedding(vocab_size, 
                                word_vec_dim,
                                padding_idx=data_loader.PAD
                                )
        self.rnn = nn.LSTM(word_vec_dim,
                           hidden_size,
                           n_layers,
                           batch_first=True,
                           dropout=dropout_p
                           )
        self.out = nn.Linear(hidden_size, vocab_size, bias=True)
        self.log_softmax = nn.LogSoftmax(dim=2)

    def forward(self, x):
        # |x| = (batch_size, length)
        x = self.emb(x) 
        # |x| = (batch_size, length, word_vec_dim)
        x, (h, c) = self.rnn(x) 
        # |x| = (batch_size, length, hidden_size)
        x = self.out(x) 
        # |x| = (batch_size, length, vocab_size)
        y_hat = self.log_softmax(x)

        return y_hat

    def search(self, batch_size=64, max_length=255):
        x = torch.LongTensor(batch_size, 1).to(next(self.parameters()).device).zero_() + data_loader.BOS
        # |x| = (batch_size, 1)
        is_undone = x.new_ones(batch_size, 1).float()

        y_hats, indice = [], []
        h, c = None, None
        while is_undone.sum() > 0 and len(indice) < max_length:
            x = self.emb(x)
            # |x| = (batch_size, 1, word_vec_dim)

            x, (h, c) = self.rnn(x, (h, c)) if h is not None and c is not None else self.rnn(x)
            # |x| = (batch_size, 1, hidden_size)
            # project hidden states onto the vocabulary first, as in forward()
            y_hat = self.log_softmax(self.out(x))
            # |y_hat| = (batch_size, 1, vocab_size)
            y_hats += [y_hat]

            # y = torch.topk(y_hat, 1, dim = -1)[1].squeeze(-1)
            y = torch.multinomial(y_hat.exp().view(batch_size, -1), 1)
            y = y.masked_fill_((1. - is_undone).bool(), data_loader.PAD)
            is_undone = is_undone * torch.ne(y, data_loader.EOS).float()            
            # |y| = (batch_size, 1)
            # |is_undone| = (batch_size, 1)
            indice += [y]

            x = y

        y_hats = torch.cat(y_hats, dim=1)
        indice = torch.cat(indice, dim=-1)
        # |y_hat| = (batch_size, length, output_size)
        # |indice| = (batch_size, length)

        return y_hats, indice
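
As a quick sanity check, a minimal training-step sketch might look as follows (an illustration only, not the author's training code; it assumes data_loader exposes a PAD index and that each batch is a (batch_size, length) LongTensor with BOS prepended and EOS appended):

import torch
import torch.nn as nn

import data_loader  # assumed to define the PAD (and BOS/EOS) indices

model = LanguageModel(vocab_size=10000)
# NLLLoss pairs with the model's LogSoftmax output; PAD targets are ignored.
crit = nn.NLLLoss(ignore_index=data_loader.PAD)

x = torch.randint(0, 10000, (64, 20))  # toy batch standing in for real data
y_hat = model(x[:, :-1])               # input: EOS dropped
# |y_hat| = (batch_size, length - 1, vocab_size)
loss = crit(y_hat.transpose(1, 2), x[:, 1:])  # target: BOS dropped
loss.backward()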

 

5.4 Conclusion
NNLM greatly relieves the sparsity problem by using word embedding vectors.
As a result, it copes well even with word combinations absent from the training dataset.

The trade-off is that it costs more than N-grams in computation and training.


6. Applications of Language Models

Cases where a language model is used entirely on its own are very rare.

Still, the LM, arguably the most fundamental model in NLP, is very important, and it continues to advance today with DNNs.

As the base model of natural language generation, even where its standalone use is limited, its importance and the influence of its role are undeniable.

Representative application areas follow.

 

6.1 Speech Recognition
At phoneme-level classification, computers already outperform humans.
But unlike humans, they lack the ability to exploit surrounding context (a kind of intuition),
so recognition accuracy drops considerably in situations such as a change of topic.
Training and using a good LM can raise speech recognition accuracy.

The formula below roughly expresses speech recognition:
given an acoustic signal X, the goal is to find the sentence Ŷ that maximizes the probability.
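$$\hat{Y} = \underset{Y}{\arg\max}\; P(Y \mid X) = \underset{Y}{\arg\max}\; \frac{P(X \mid Y)\,P(Y)}{P(X)} = \underset{Y}{\arg\max}\; P(X \mid Y)\,P(Y)$$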


Expanding the formula with Bayes' theorem, the denominator P(X) can be dropped, since it does not depend on Y.
∙ P(X|Y) : the speech (acoustic) model (= the probability of that acoustic signal given the sentence)
∙ P(Y) : the language model (= the probability of the sentence)

 

6.2 Machine Translation
In machine translation, the language model plays an important role in constructing the translation system.
In classical statistical machine translation (SMT), similarly to speech recognition,
the LM combines with the translation model to produce natural sentences.

Today neural machine translation (NMT; https://chan4im.tistory.com/201) is dominant,
and the LM plays a very important role in NMT as well. (See the link for details.)

 

6.3 OCR (Optical Character Recognition)
Language models are also used when building optical character recognition (OCR) systems.
When recognizing characters extracted from an image, defining probabilities between characters yields higher performance.

So OCR, too, leans on a language model to recognize characters and handwriting.

 

6.4 Other generative models
Speech recognition, MT, and OCR can all be seen as a kind of natural language generation: producing sentences from given information.
Any machine-learning task whose output is a sentence falls under the category of natural language generation.


๋งˆ์น˜๋ฉฐ...

์ด๋ฒˆ์‹œ๊ฐ„์—๋Š” ์ฃผ์–ด์ง„ ๋ฌธ์žฅ์„ ํ™•๋ฅ ์ (stochastic)์œผ๋กœ ๋ชจ๋ธ๋งํ•˜๋Š” ๋ฐฉ๋ฒ•(LM)์„ ์•Œ์•„๋ณด์•˜๋‹ค.
NLP์—์„œ ๋ฌธ์žฅ์˜ˆ์ธก์˜ ํ•„์š”์„ฑ์€ DNN์ด์ „๋ถ€ํ„ฐ ์žˆ์–ด์™”๊ธฐ์—, N-gram ๋“ฑ์˜ ๋ฐฉ๋ฒ•์œผ๋กœ ๋งŽ์€ ๊ณณ์— ํ™œ์šฉ๋˜์—ˆ๋‹ค.

๋‹ค๋งŒ, N-gram๊ณผ ๊ฐ™์€ ๋ฐฉ์‹๋“ค์€ ์—ฌ์ „ํžˆ ๋‹จ์–ด๋ฅผ ๋ถˆ์—ฐ์†์ ์ธ ์กด์žฌ๋กœ ์ทจ๊ธ‰ํ•˜๊ธฐ์—
ํฌ์†Œ์„ฑ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜์ง€ ๋ชปํ•ด generalization์—์„œ ๋งŽ์€ ์–ด๋ ค์›€์„ ๊ฒช์—ˆ๋‹ค.

์ด๋ฅผ ์œ„ํ•ด Markov๊ฐ€์ •, Smoothing, Didcounting์œผ๋กœ N-gram์˜ ๋‹จ์ ์„ ๋ณด์™„ํ•˜๊ณ ์ž ํ–ˆ์ง€๋งŒ N-gram์€ ๊ทผ๋ณธ์ ์œผ๋กœ ์ถœํ˜„๋นˆ๋„์— ๊ธฐ๋ฐ˜ํ•˜๊ธฐ์— ์™„๋ฒฝํ•œ ํ•ด๊ฒฐ์ฑ…์ด ๋  ์ˆ˜๋Š” ์—†์—ˆ๋‹ค.


ํ•˜์ง€๋งŒ DNN์˜ ๋„์ž…์œผ๋กœ LM์„ ์‹œ๋„ํ•˜๋ฉด Generalization์ด ๊ฐ€๋Šฅํ•˜๋‹ค.
DNN์€ ๋น„์„ ํ˜•์  ์ฐจ์›์ถ•์†Œ์— ๋งค์šฐ ๋›ฐ์–ด๋‚œ ์„ฑ๋Šฅ์„ ๊ฐ–๊ธฐ์—, ํฌ์†Œ๋‹จ์–ด์กฐํ•ฉ์—๋„ ํšจ๊ณผ์  ์ฐจ์›์ถ•์†Œ๋ฅผ ํ†ตํ•ด ๋›ฐ์–ด๋‚œ ์„ฑ๋Šฅ์„ ๋‚ผ ์ˆ˜ ์žˆ๋‹ค.
๋”ฐ๋ผ์„œ inference time์—์„œ ์ฒ˜์Œ๋ณด๋Š” sequence data๊ฐ€ ์ฃผ์–ด์ง€๋”๋ผ๋„ ๊ธฐ์กด์— ๋น„ํ•ด ๊ธฐ์กด ํ•™์Šต์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ›Œ๋ฅญํ•œ ์˜ˆ์ธก์ด ๊ฐ€๋Šฅํ•˜๋‹ค.




์ง€๊ธˆ๊นŒ์ง€ LM์ด ์ •๋ง ๋งŽ์€ ๋ถ„์•ผ(์Œ์„ฑ์ธ์‹, TM, OCR)์—์„œ ์ดˆ์„์œผ๋กœ ๋‹ค์–‘ํ•˜๊ฒŒ ํ™œ์šฉ๋จ์„ ์•Œ ์ˆ˜ ์žˆ์—ˆ๋‹ค.
์ด์ œ, ์‹ ๊ฒฝ๋ง์„ ํ†ตํ•ด ๊ฐœ์„ ๋œ LM์œผ๋กœ ๋’ท ๋‚ด์šฉ๋“ค์—์„œ๋Š” ์ž์—ฐ์–ด์ƒ์„ฑ(ํŠนํžˆ TM;๋ฒˆ์—ญ)์— ๋Œ€ํ•ด ๋‹ค๋ค„๋ณผ ๊ฒƒ์ด๋‹ค.
https://chan4im.tistory.com/201 , https://chan4im.tistory.com/202
