
[Paper Review] ODQA/Latent Retrieval for Weakly Supervised Open Domain Question Answering

prefer_all 2022. 12. 22. 10:44

💬 Lines marked with 💬 record the mentor's comments and the Q&A from our study session.

💬 The first author of this paper is the third author of BERT.

https://arxiv.org/abs/1906.00300

Introduction

Limitations of prior ODQA work

1. From given evidence to an open corpus
    In practice, the evidence needed to answer a question is usually not provided together with the dataset.
2. Reliance on an IR (Information Retrieval) system
    To shrink the search space, systems rely on the output of an IR system. However, IR and QA are different:
    QA requires more language understanding than IR.
3. In QA, users ask about information they do not already know, so
    the retriever should be learned directly from QA data instead of relying on whatever an IR system returns.


This paper proposes a pipeline that does not depend on an existing IR system and instead trains the retriever jointly, using only question-answer data with no given evidence.

Learning the retriever from scratch over a huge open corpus is infeasible; pre-training it with ICT is what makes end-to-end training possible.

💬 Why is it called evidence rather than a passage?

A single Wikipedia article contains a lot of text, and we have no information about which passage is the gold passage, so we search over blocks instead. Any given block may or may not contain the answer, which is why it is called "evidence".

In actual training we cannot use an entire Wikipedia page, so the blocks are narrowed down with top-k selection before training.
The IR literature uses the term evidence a lot; more recently it is also called a passage.

Related Work

Most retrieval-based open-domain QA systems use the following notation.


Given a question q, the goal is to find an answer string a from an answer derivation (b, s), where b is an evidence block and s is a span within it. This can be expressed with a score function S as follows.
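Reconstructed in the paper's notation (the original equation image did not survive): the score of a derivation is the sum of a retriever term and a reader term.

```latex
S(b, s, q) = S_{\text{retr}}(b, q) + S_{\text{read}}(b, s, q)
```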

The inference process that produces the output a can be written as follows.
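Inference returns a* = TEXT(argmax over (b, s) of S(b, s, q)). The following is a minimal pure-Python sketch of that argmax, assuming the retriever and reader scores have already been computed; the dictionary layout and the name `predict_answer` are hypothetical, chosen only for illustration.

```python
from typing import Dict, Optional, Tuple


def predict_answer(
    retr_scores: Dict[str, float],
    read_scores: Dict[Tuple[str, str], float],
) -> Optional[str]:
    """Return TEXT(argmax_{b,s} S(b, s, q)) where S = S_retr + S_read.

    retr_scores maps a block id b to S_retr(b, q).
    read_scores maps (block id, span text) to S_read(b, s, q); identifying a
    span by its text is a simplification made for this sketch.
    """
    best_text: Optional[str] = None
    best_score = float("-inf")
    for (b, span_text), s_read in read_scores.items():
        total = retr_scores[b] + s_read  # S(b, s, q)
        if total > best_score:
            best_text, best_score = span_text, total
    return best_text


retr = {"b1": 2.0, "b2": 0.5}
read = {("b1", "seven"): 1.0, ("b2", "nine"): 2.0, ("b1", "1990"): 0.2}
print(predict_answer(retr, read))  # seven (3.0 beats 2.5 and 2.2)
```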


Since this paper focuses on proposing new component scoring functions, let us briefly look at how earlier work defined the component score functions S_retr and S_read.


Proposed Method


The paper proposes ORQA, an end-to-end model in which the Retriever and the Reader are trained jointly.

  • ORQA learns to find evidence in an open corpus using only question-answer pairs. ⇒ ORQA can retrieve any text in the open corpus, whereas previous QA systems are limited to the closed set returned by a black-box IR system.
  • The Retriever is pre-trained with the ICT technique. ⇒ It can learn more than word-matching features.


๋ชจ๋ธ ๊ตฌ์กฐ
ORQA ๋Š” Retriever ์™€ Reader ๋กœ ๋‚˜๋ˆŒ ์ˆ˜ ์žˆ๋‹ค.
Retriever ๋Š” open corpus ์—์„œ evidence๊ฐ€ ๋  ์ˆ˜ ์žˆ๋Š” evidence block ์„ ๊ฒ€์ƒ‰ํ•œ๋‹ค.
Reader ๋Š” Retriever ์—์„œ ๋‚˜์˜จ Top-k ์— ํ•ด๋‹นํ•˜๋Š” evidence block ์—์„œ answer ๊ฐ€ ๋  ์ˆ˜ ์žˆ๋Š” ํ›„๋ณด๋ฅผ ์ฐพ๊ณ  ๊ฐ€์žฅ ๋†’์€ ์ ์ˆ˜๋ฅผ ๊ฐ€์ง€๋Š” answer ๋ฅผ ์ตœ์ข…์ ์ธ answer ๋กœ ์„ ์ •ํ•œ๋‹ค.


Retriever component
The retrieval score is defined as follows.
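Reconstructed in the paper's notation (the equation image is missing): the question and the block are encoded by separate BERT encoders, each [CLS] vector is projected to 128 dimensions, and the score is their inner product.

```latex
h_q = W_q \,\mathrm{BERT}_Q(q)[\mathrm{CLS}], \qquad
h_b = W_b \,\mathrm{BERT}_B(b)[\mathrm{CLS}], \qquad
S_{\text{retr}}(b, q) = h_q^{\top} h_b
```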

q: question / b: evidence block / W: a matrix that projects BERT's representation vector to a 128-dimensional vector
The equation above can be illustrated as in the figure below. The k blocks with the highest scores are then selected (top-k); these are the k evidence blocks most likely to contain the answer.
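A minimal pure-Python sketch of the retriever score and top-k selection, assuming the BERT [CLS] vectors are already available; the real encoders are replaced by precomputed vectors, and the helper names `project`, `dot` and `retrieve_top_k` are hypothetical.

```python
import heapq
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def project(W: Matrix, cls_vec: Sequence[float]) -> Vector:
    """h = W @ BERT(x)[CLS]; W has shape (d_proj, d_bert), e.g. 128 x 768."""
    return [sum(w * x for w, x in zip(row, cls_vec)) for row in W]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(h_q: Vector, block_embs: List[Vector], k: int) -> List[int]:
    """Indices of the k blocks with the highest S_retr(b, q) = h_q . h_b."""
    scores = [dot(h_q, h_b) for h_b in block_embs]
    return heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)


blocks = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(retrieve_top_k([1.0, 0.2], blocks, 2))  # [0, 2]
```

Because the score is a plain inner product, the block vectors can be computed once and reused for every question, which is what makes the index described under Inference possible.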

 

Reader component

Following Lee et al. (2016), a span is represented by the concatenation of its end points, which is scored by a multi-layer perceptron to enable start/end interaction.
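In the paper's notation this is h_start = BERT_R(q, b)[START(s)], h_end = BERT_R(q, b)[END(s)] and S_read(b, s, q) = MLP([h_start; h_end]). Below is a minimal sketch assuming the per-token outputs of BERT_R are given as plain vectors; the one-hidden-layer ReLU MLP, the `max_len` span limit and all function names are assumptions made for illustration.

```python
from typing import List, Tuple

Vector = List[float]
MLPParams = Tuple[List[Vector], Vector, Vector, float]  # (W1, b1, w2, b2)


def mlp_score(x: Vector, params: MLPParams) -> float:
    """One-hidden-layer MLP with ReLU: w2 . relu(W1 x + b1) + b2."""
    W1, b1, w2, b2 = params
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bias)
              for row, bias in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def score_spans(token_vecs: List[Vector], max_len: int,
                params: MLPParams) -> List[Tuple[int, int, float]]:
    """S_read(b, s, q) = MLP([h_start; h_end]) for every span of <= max_len tokens.

    token_vecs stands in for the BERT_R(q, b) output at each token position.
    """
    spans = []
    for start in range(len(token_vecs)):
        for end in range(start, min(len(token_vecs), start + max_len)):
            # [h_start; h_end]: concatenating the two endpoint vectors lets the
            # MLP model interactions between the start and the end.
            span_repr = token_vecs[start] + token_vecs[end]
            spans.append((start, end, mlp_score(span_repr, params)))
    return spans
```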

 

Challenges
๋ชจ๋ธ์€ ๊ฐœ๋…์ ์œผ๋กœ ๊ฐ„๋‹จํ•˜์ง€๋งŒ, ํ•™์Šตํ•˜๋Š” ๊ณผ์ •์— ์žˆ์–ด์„œ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ์–ด๋ ค์›€์ด ์กด์žฌํ•œ๋‹ค.
1. open evidence corpus ๋Š” ๋„ˆ๋ฌด๋‚˜๋„ ํฐ search space ๋ฅผ ๋ณด์—ฌ์ค€๋‹ค. (๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” 1300๋งŒ ์ด์ƒ์˜ evidence block ์ด ์กด์žฌํ•œ๋‹ค๊ณ  ํ•œ๋‹ค.)
2. ๋„ˆ๋ฌด๋‚˜๋„ ํฐ search space ๋ฅผ ํƒ์ƒ‰ํ•˜๋Š” ๋ฐฉ๋ฒ•์€ ์ž ์žฌ์ ์ด๋‹ค. ๊ทธ๋ž˜์„œ teacher-forcing ๊ธฐ๋ฒ•์œผ๋กœ ํ•™์Šต์‹œํ‚ฌ ์ˆ˜ ์—†๋‹ค.
3. Latent-variable (์ž ์žฌ-๋ณ€์ˆ˜) ๋ฐฉ๋ฒ•์€ ๊ทน๋‹จ์ ์œผ๋กœ ๋ชจํ˜ธํ•œ ๊ฒฐ๊ณผ๋ฌผ์˜ ์ˆ˜๊ฐ€ ๋งŽ๊ธฐ ๋•Œ๋ฌธ์— ๋‹จ์ˆœํžˆ ์‚ฌ์šฉํ•˜๊ธฐ ์–ด๋ ต๋‹ค.

💬 Ambiguous results?

QA์—์„œ Passage/Evidence๋Š” ์‹ค์ œ liklihood ๊ณ„์‚ฐ ์‹œ ๋ชจ๋‘ ๊ณ ๋ คํ•  ์ˆ˜ ์—†๋‹ค. answer์˜ ํ™•๋ฅ  ๊ณ„์‚ฐ ์‹œ ์‚ฌ์‹ค ๋ชจ๋“  Passage์—์„œ answer ํ›„๋ณด๊ตฐ์˜ ํ™•๋ฅ ์„ ๊ณ„์‚ฐํ•ด์•ผ ํ•˜๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค. ๋ชจํ˜ธํ•˜๋‹ค๋Š” ๋ง์€ ์ด๋Ÿฌํ•œ ๋งฅ๋ฝ์—์„œ ๋‚˜์™”๋‹ค.

When the answer is ‘seven’, the word seven appears not only in the supportive evidence but also in spuriously ambiguous blocks. ⇒ Without understanding the context, the model may select an ambiguous evidence block just because the word seven matches.

 

ICT (Inverse Cloze Task) ⭐

์œ„์˜ challenge๋“ค์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” Retriever์„ ICT ๊ธฐ๋ฒ•์„ ํ†ตํ•ด Pretrainํ•˜๋Š” ๋ฐฉ์‹์„ ์ œ์•ˆํ•œ๋‹ค.
Cloze Task ๋Š” ๋งฅ๋ฝ์„ ๊ธฐ๋ฐ˜์œผ๋กœ ๋งˆ์Šคํ‚น๋œ text ๋ฅผ ์œ ์ถ”ํ•˜๋Š” ๊ฒƒ์ธ๋ฐ, ICT ๋Š” ๊ทธ ๋ฐ˜๋Œ€๋กœ text ๋ฅผ ํ†ตํ•ด ๋งฅ๋ฝ์„ ์œ ์ถ”ํ•˜๋Š” Task ์ด๋‹ค.

💬 Difference between MLM and ICT

The most common pre-training objective is MLM (Masked Language Model), which recovers masked tokens in order to learn good text representations. MLM works at the token level, while ICT works at the sentence level.

In the figure above, one sentence taken from the context at index 1 plays the role of the pseudo-question, and the context it was taken from becomes the pseudo-evidence. This can be written as follows.
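Reconstructed in the paper's notation: the pseudo-evidence block b has to be picked out among the blocks in the same training batch, which serve as negatives.

```latex
P_{\text{ICT}}(b \mid q) =
  \frac{\exp\big(S_{\text{retr}}(b, q)\big)}
       {\sum_{b' \in \text{BATCH}} \exp\big(S_{\text{retr}}(b', q)\big)}
```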

q: random sentence / b: text surrounding q
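A minimal pure-Python sketch of this in-batch softmax loss, assuming query and block embeddings have already been produced by the two encoders; the name `ict_loss` is hypothetical, and a real implementation would run as batched matrix operations on an accelerator.

```python
import math
from typing import List

Vector = List[float]


def ict_loss(query_embs: List[Vector], block_embs: List[Vector]) -> float:
    """Mean -log P_ICT(b_i | q_i): block i is the positive for query i and
    every other block in the batch acts as a negative."""
    total = 0.0
    for i, h_q in enumerate(query_embs):
        scores = [sum(a * b for a, b in zip(h_q, h_b)) for h_b in block_embs]
        m = max(scores)  # log-sum-exp trick for numerical stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]
    return total / len(query_embs)
```

With uninformative embeddings the loss equals log(batch size), so a larger batch means more negatives per query.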

💬 Difference between a pseudo-question and a question

As the figure above shows, a pseudo-question may not even be phrased as a question.
Pseudo-questions are not labeled; they are constructed by treating the text surrounding a given input as its positive.

ORQA treats a sentence from a Wikipedia block as a pseudo-query and learns not QA itself but the whole process of finding the related output.
In other words, it can be seen as two-stage learning: the first stage is pre-training (ICT) and the second is fine-tuning.

 

ICT์˜ ์žฅ์ ์€ ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

1. It can learn more than word-matching features.
    - In real QA, the answer is often found in a part of the text that the question does not mention. This is where QA differs from information retrieval.
    - That does not mean word matching is unnecessary; it is still one of the most important features. So the sentence is removed from its context in only 90% of the examples; in the remaining 10% it is kept, so the model also learns low-level word matching.
2. Even though the pre-training sentences do not match the questions seen during fine-tuning, zero-shot evidence retrieval is good enough to bootstrap the latent variable.
3. There is no mismatch between the evidence blocks seen during pre-training and those seen downstream.
    - So BERT_B, which encodes the evidence blocks, does not need further training; on the retriever side only the question encoder BERT_Q is fine-tuned.
4. The pre-trained retriever provides a bias that helps avoid spurious ambiguity.
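The 90% masking rule can be sketched as follows, assuming a passage is already split into sentences; the function name, the `mask_rate` parameter and the plain-string output are hypothetical choices for this illustration, not the paper's code.

```python
import random
from typing import List, Tuple


def make_ict_example(sentences: List[str], rng: random.Random,
                     mask_rate: float = 0.9) -> Tuple[str, str]:
    """Pick one sentence as the pseudo-question; the rest is the pseudo-evidence.

    With probability mask_rate the sentence is removed from its context;
    otherwise it stays in, so exact word overlap is sometimes the right cue.
    """
    i = rng.randrange(len(sentences))
    query = sentences[i]
    if rng.random() < mask_rate:
        context = sentences[:i] + sentences[i + 1:]
    else:
        context = list(sentences)
    return query, " ".join(context)


sents = ["Zebras have stripes.", "They live in Africa.", "Stripes may deter flies."]
print(make_ict_example(sents, random.Random(0)))
```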

 

Inference

- The evidence blocks do not need to be re-encoded during fine-tuning (they are already encoded by the pre-trained block encoder).
           Fixed block encoders already provide a useful representation for retrieval
- An index for quickly finding the maximum inner product can therefore be pre-compiled.
- Inference uses beam search over this pre-compiled index.
- After retrieving the top-k evidence blocks, reader scores are computed only for them (k = 5 in the paper).
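A minimal sketch of this inference flow, assuming block embeddings are precomputed once; the brute-force `BlockIndex` class is a hypothetical stand-in for a real maximum-inner-product-search index, and `reader` is any callable returning (span text, S_read) pairs for a block.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


class BlockIndex:
    """Brute-force inner-product index over fixed, precomputed block embeddings."""

    def __init__(self, block_embs: Sequence[Vector]):
        # Encoded once with the frozen block encoder and never updated.
        self.block_embs = [list(v) for v in block_embs]

    def search(self, h_q: Vector, k: int) -> List[Tuple[int, float]]:
        scored = ((i, sum(a * b for a, b in zip(h_q, h_b)))
                  for i, h_b in enumerate(self.block_embs))
        return heapq.nlargest(k, scored, key=lambda pair: pair[1])


def answer(h_q: Vector, index: BlockIndex, k: int,
           reader: Callable[[int], List[Tuple[str, float]]]) -> str:
    """Run the expensive reader only on the top-k blocks and return the
    span text with the highest S_retr + S_read."""
    best_text, best = "", float("-inf")
    for b, s_retr in index.search(h_q, k):
        for span_text, s_read in reader(b):
            if s_retr + s_read > best:
                best_text, best = span_text, s_retr + s_read
    return best_text
```

Only the question has to be encoded at query time; the reader never sees the blocks outside the top k.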


Learning

1. The distribution over answer derivations is:

q : question / b : index of an evidence block
s : span of text within block b / TOP(k) : the k evidence blocks retrieved according to S_retr
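With these symbols, the distribution (reconstructed in the paper's notation) normalizes over every span of every block in TOP(k):

```latex
P(b, s \mid q) =
  \frac{\exp\big(S(b, s, q)\big)}
       {\sum_{b' \in \text{TOP}(k)} \sum_{s' \in b'} \exp\big(S(b', s', q)\big)}
```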

2. Given the gold answer a, all correct derivations are found via beam search, and the marginal log-likelihood is optimized.

- a = TEXT(s) means that the text of span s exactly matches the answer string a.
- Since very few spans in the top-k documents will match a, the paper proposes an approach called Early Update.
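The marginal log-likelihood is L_full(q, a) = -log of the sum over b in TOP(k) and spans s in b with a = TEXT(s) of P(b, s | q). A minimal sketch, assuming the full scores S(b, s, q) for all top-k spans are already computed; the tuple layout and names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def full_loss(derivations: List[Tuple[int, str, float]],
              answer: str) -> Optional[float]:
    """L_full = -log sum_{(b, s): TEXT(s) == answer} P(b, s | q).

    derivations holds (block id, span text, S(b, s, q)) for every span in the
    top-k blocks. Returns None when no span matches the answer.
    """
    matching = [score for _, text, score in derivations if text == answer]
    if not matching:
        return None
    all_scores = [score for _, _, score in derivations]
    # -log(sum_matching exp / sum_all exp), computed in log space.
    return logsumexp(all_scores) - logsumexp(matching)
```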

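3. Early Update: the loss is computed over TOP(c) instead of TOP(k), i.e. over a much broader set of evidence blocks (c = 5000), so a block containing the answer is far more likely to be found. Because it uses only the retriever score, it stays cheap to compute.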

Here a ∈ TEXT(b) indicates that the answer a appears in the evidence block with index b.
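In the paper's notation, P_early(b | q) = exp(S_retr(b, q)) / sum over b' in TOP(c) of exp(S_retr(b', q)), and L_early(q, a) = -log of the sum of P_early(b | q) over blocks b in TOP(c) with a ∈ TEXT(b). A minimal sketch, assuming the top-c blocks and their retriever scores are given; the names are hypothetical, and plain substring matching is a simplification (it would also match "seven" inside "seventeen", a small instance of spurious ambiguity).

```python
import math
from typing import List, Optional, Tuple


def early_loss(top_c: List[Tuple[str, float]], answer: str) -> Optional[float]:
    """L_early = -log sum_{b in TOP(c), answer in TEXT(b)} P_early(b | q).

    top_c holds (block text, S_retr(b, q)) for the c highest-scoring blocks.
    Only the retriever score is used, so no reader runs on all c blocks.
    """
    def lse(xs: List[float]) -> float:
        m = max(xs)
        return m + math.log(sum(math.exp(x - m) for x in xs))

    matching = [s for text, s in top_c if answer in text]
    if not matching:
        return None
    return lse([s for _, s in top_c]) - lse(matching)
```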


4. The final loss includes both updates.

If no derivation matches the answer a, the example is discarded.
With random initialization almost every example would be expected to be discarded, but thanks to ICT pre-training fewer than 10% of the examples were discarded.

Contribution
It is the first ODQA system that jointly trains the Retriever and the Reader end-to-end using only question-answer pairs, without an IR system.


Experiments and Results

Dataset

  • Natural Question
    • The open version of the dataset: only questions with short answers are used, and the given evidence documents are removed.
    • Answers longer than 5 tokens are also discarded, since such long answers resemble extractive snippets.
  • WebQuestions
    • Contains questions sampled from the Google Suggest API
    • Answers are annotated (only their string representation is used)
  • CuratedTrec
    • A corpus of question-answer pairs from TREC QA data
    • The questions are real queries, e.g. from MSNSearch and AskJeeves logs
  • TriviaQA
    • Trivia question-answer pairs collected from the web
    • The unfiltered set is used; the supervised evidence is discarded
  • SQuAd
    • A dataset better suited to reading comprehension than to ODQA
    • Questions written by annotators, with answer spans selected from Wikipedia paragraphs
  • For datasets without a dev set, 10% of the training set is randomly held out as the dev set
  • When the test set is hidden (i.e., only a dev set is public), 10% of the training set is likewise used as the dev set, and the original dev set is used as the test set
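The 10% hold-out can be sketched as below; the function name and the fixed `seed` are hypothetical, and the paper does not specify how its random sample was drawn.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dev(train: Sequence[T], frac: float = 0.1,
              seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly hold out `frac` of the training set as a dev set.

    Returns (remaining training examples, dev examples).
    """
    items = list(train)
    random.Random(seed).shuffle(items)
    n_dev = round(len(items) * frac)
    return items[n_dev:], items[:n_dev]
```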

 

Dataset Biases

  • Evaluating on diverse QA pairs is important
    • because every existing dataset may have its own internal biases
  • Natural Questions, WebQuestions, CuratedTrec
    • The question askers did not know the answer when asking
    • Questions with a genuine information need
    • Hence they can be considered to have moderate bias
  • TriviaQA, SQuAD
    • The questions were not asked out of an information need

 

Implementation Details

Evidence corpus

  • English Wikipedia snapshot (Dec. 20, 2018)
  • Split into blocks of at most 288 wordpieces using the BERT tokenizer
  • Over 13 million evidence blocks
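A minimal sketch of splitting a document into blocks of at most 288 wordpieces, assuming a tokenizer callable is supplied (here `str.split` stands in for the BERT wordpiece tokenizer). This is a simple fixed-window split; the paper's exact splitting rule may differ, for example in how it treats sentence boundaries.

```python
from typing import Callable, List


def split_into_blocks(text: str, tokenize: Callable[[str], List[str]],
                      max_pieces: int = 288) -> List[List[str]]:
    """Greedily cut the token sequence into consecutive blocks of <= max_pieces."""
    pieces = tokenize(text)
    return [pieces[i:i + max_pieces] for i in range(0, len(pieces), max_pieces)]


doc = " ".join(["word"] * 600)
print([len(b) for b in split_into_blocks(doc, str.split)])  # [288, 288, 24]
```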

Hyperparameters

  • 12 transformer layers (hidden size 768)
  • 128-dimensional retrieval representations
  • Same optimizer as BERT
  • When pre-training the retriever with ICT
    • lr: 10^{-4}
    • batch size 4096
    • 100k steps
  • When fine-tuning
    • lr: 10^{-5}
    • batch size 1
    • 2 epochs on the larger datasets (NQ, TQA, SQuAD) and 20 epochs on the smaller ones (WebQuestions, CuratedTrec)
💬 The batch size is very large
This approach is sometimes criticized for not working with small batch sizes, since the other blocks in the batch serve as the negatives.
It would be worth looking at how ORQA leads into contrastive learning (e.g., how negatives are added).

 

Baseline

  • Compare against other retrieval methods by substituting their retrieval score S_{retr}(b, q)

BM25

  • The de facto state-of-the-art unsupervised retrieval method
  • Used as a baseline because it is robust both for IR tasks and for evidence retrieval for QA
  • Since BM25 is not trainable, the retrieved evidence stays fixed during fine-tuning.
  • final score = weighted sum of the BM25 score and the reader score
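A minimal sketch of this baseline, assuming pre-tokenized text. The Okapi BM25 formula is standard; k1 = 1.2 and b = 0.75 are common defaults rather than values from the paper, the non-negative IDF variant is one common choice, and the weight `w` in `final_score` is a hypothetical placeholder.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: List[str], docs: List[List[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of every tokenized doc for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    df = Counter(term for d in docs for term in set(d))  # document frequency
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def final_score(bm25: float, read: float, w: float = 1.0) -> float:
    """Weighted sum of the fixed BM25 score and the trainable reader score."""
    return w * bm25 + read
```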

Language Models

  • Another baseline: unsupervised neural retrieval is notoriously hard to get to outperform traditional IR
  • So unsupervised pooled representations from language models are tested as baselines
    • Two widely used 128-dimensional representations
      1. NNLM (context-independent embeddings)
      2. ELMo (small; context-dependent bidirectional LSTM)

 

Results

  • BM25 is a powerful retrieval system
  • ORQA outperforms BM25 on the datasets with a genuine information need (Natural Questions, WebQuestions, CuratedTrec).
  • However, on SQuAD and TriviaQA, where the question asker already knows the answer, the retrieval problem resembles traditional IR
  • and there the vectors compressed to 128 dimensions perform worse than BM25, which represents every word of the evidence exactly.
  • On SQuAD the gap between dev and test scores is large, because 100k questions were generated from only 536 documents (data bias)
    => For a good retrieval benchmark (one where scores do not drop sharply), one has to check
    1) whether the training examples are highly correlated, 2) whether this violates the IID assumption, and 3) whether the dataset suits learned retrieval. For these reasons the paper suggests that future ODQA models not use SQuAD.

 

Analysis

Masking Rate in the ICT
If the ICT masking rate is set to 1 (the sentence is always removed from its context), the retriever never learns that word overlap is a strong retrieval signal, and performance drops substantially.
If the sentence is never masked, the task reduces to memorization and does not generalize to QA.
Masking only 90% of the time lets ICT also teach word matching.


What I Learned

  • ORQA is the first method to train both the retriever and the reader end-to-end from QA pairs alone, without an IR system.
  • Using the Inverse Cloze Task (ICT) makes it possible to pre-train the retriever.