==================================
SIGIR'20 ABSTRACT
==================================
Abstract: We introduce Smart Shuffling, a cross-lingual embedding (CLE) method that draws on statistical word alignment approaches to leverage dictionaries, producing dense representations that are significantly more effective for cross-language information retrieval (CLIR) than prior CLE methods. This work is motivated by the observation that although neural approaches are successful for monolingual IR, they are less effective in the cross-lingual setting. We hypothesize that neural CLIR fails because typical cross-lingual embeddings "translate" query terms into related terms -- i.e., terms that appear in similar contexts -- in addition to, or sometimes rather than, synonyms in the target language. Adding related terms to a query (i.e., query expansion) can be valuable for retrieval, but must be balanced by also focusing on the original query. We find that prior neural CLIR models are unable to bridge this translation gap, apparently producing queries that drift from the intent of the source query. We conduct extrinsic evaluations of a range of CLE methods using CLIR performance, compare them to neural and statistical machine translation systems trained on the same translation data, and show a significant gap in effectiveness. Our experiments on standard CLIR collections across four languages indicate that Smart Shuffling fills the translation gap and provides significantly improved semantic matching quality. Such a representation allows us to exploit deep neural (re-)ranking methods for the CLIR task, leading to substantial improvements of up to a 21% gain in MAP, approaching human translation performance. Evaluations on bilingual lexicon induction show a comparable improvement.
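The translation-gap argument above is easy to visualize in embedding space. The following minimal sketch (toy vectors and vocabulary invented for illustration; this is not the Smart Shuffling method itself) shows how a CLE "translates" a query term via nearest-neighbor search, and how merely related target terms can compete with true synonyms in the top-k:

.. code-block:: python

    import numpy as np

    # Toy shared cross-lingual embedding space; all vectors are illustrative.
    rng = np.random.default_rng(0)
    src_vocab = ["dog", "cat"]
    tgt_vocab = ["perro", "gato", "mascota"]     # "mascota" (pet) is related, not a synonym
    src_emb = rng.normal(size=(2, 4))
    tgt_emb = np.vstack([src_emb[0] + 0.01,      # "perro": near-synonym of "dog"
                         src_emb[1] + 0.01,      # "gato":  near-synonym of "cat"
                         src_emb.mean(axis=0)])  # "mascota": shares context with both

    def translate(term_idx, k=2):
        """Return the k nearest target-language terms by cosine similarity."""
        q = src_emb[term_idx] / np.linalg.norm(src_emb[term_idx])
        t = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
        return [tgt_vocab[i] for i in np.argsort(-(t @ q))[:k]]

    print(translate(0))  # synonyms and merely related terms compete in the top-k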
Abstract: An essential task of information retrieval (IR) is to compute the probability of relevance of a document given a query. If we regard a query term or n-gram fragment as a relevance matching unit, most retrieval models first calculate the relevance evidence between the given query and the candidate document separately, and then accumulate this evidence into the final document relevance prediction. This kind of approach obeys classical probability theory, which is not fully consistent with human cognitive behavior in the actual retrieval process, due to the possible existence of interference effects between relevance matching units. In our work, we propose a Quantum Interference inspired Neural Matching model (QINM), which applies interference effects to guide the construction of additional evidence generated by the interaction between matching units in the retrieval process. Experimental results on two benchmark collections demonstrate that our approach outperforms quantum-inspired retrieval models and some well-known neural retrieval models in the ad-hoc retrieval task.
Abstract: Position bias is a critical problem in information retrieval when dealing with implicit yet biased user feedback data. Unbiased ranking methods typically rely on causality models and debias the user feedback through inverse propensity weighting. While practical, these methods still suffer from two major problems. First, when inferring a user click, the impact of contextual information, such as the documents that have already been examined, is often ignored. Second, only position bias is considered, while other issues resulting from user browsing behaviors are overlooked. In this paper, we propose an end-to-end Deep Recurrent Survival Ranking (DRSR) model, a unified framework that jointly models a user's various behaviors, to (i) consider the rich contextual information in the ranking list; and (ii) address the hidden issues underlying user behaviors, i.e., to mine observation patterns in queries without any clicks (non-click queries), and to model tracking logs that cannot truly reflect the user's browsing intents (untrusted observations). Specifically, we adopt a recurrent neural network to model the contextual information and estimate the conditional likelihood of user feedback at each position. We then incorporate survival analysis techniques with the probability chain rule to mathematically recover the unbiased joint probability of a user's various behaviors. DRSR can be easily combined with both point-wise and pair-wise learning objectives. Extensive experiments over two large-scale industrial datasets demonstrate significant performance gains of our model compared with the state of the art.
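The survival-analysis step can be illustrated with a small computation. Assuming discrete hazards h[k] (the conditional probability of a click at position k given no earlier click; DRSR estimates these with a recurrent network, whereas the values below are invented), the probability chain rule recovers a proper joint distribution over click positions and non-click queries:

.. code-block:: python

    import numpy as np

    # Conditional "hazard" h[k]: probability of a click at position k,
    # given no click at positions 0..k-1 (illustrative values only).
    h = np.array([0.30, 0.20, 0.10, 0.05])

    # Chain rule: P(first click at k) = prod_{j<k} (1 - h[j]) * h[k]
    survival = np.cumprod(np.concatenate(([1.0], 1 - h[:-1])))
    p_first_click = survival * h
    p_no_click = np.prod(1 - h)  # probability mass of a non-click query

    # The pieces form a proper distribution: they sum to 1.
    print(p_first_click, p_no_click, p_first_click.sum() + p_no_click)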
Abstract: Recent progress in Natural Language Understanding (NLU) is driving fast-paced advances in Information Retrieval (IR), largely owed to fine-tuning deep language models (LMs) for document ranking. While remarkably effective, the ranking models based on these LMs increase computational cost by orders of magnitude over prior approaches, particularly as they must feed each query-document pair through a massive neural network to compute a single relevance score. To tackle this, we present ColBERT, a novel ranking model that adapts deep LMs (in particular, BERT) for efficient retrieval. ColBERT introduces a late interaction architecture that independently encodes the query and the document using BERT and then employs a cheap yet powerful interaction step that models their fine-grained similarity. By delaying and yet retaining this fine-granular interaction, ColBERT can leverage the expressiveness of deep LMs while simultaneously gaining the ability to pre-compute document representations offline, considerably speeding up query processing. Crucially, ColBERT's pruning-friendly interaction mechanism enables leveraging vector-similarity indexes for end-to-end retrieval directly from millions of documents. We extensively evaluate ColBERT using two recent passage search datasets. Results show that ColBERT's effectiveness is competitive with existing BERT-based models (and outperforms every non-BERT baseline), while executing two orders-of-magnitude faster and requiring up to four orders-of-magnitude fewer FLOPs per query.
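A minimal sketch of the late-interaction scoring step (the MaxSim operator over independently encoded token embeddings); random matrices stand in for the BERT-encoded query and document:

.. code-block:: python

    import numpy as np

    def colbert_score(Q, D):
        """Late interaction: sum over query tokens of the maximum cosine
        similarity against any document token (MaxSim)."""
        Q = Q / np.linalg.norm(Q, axis=1, keepdims=True)
        D = D / np.linalg.norm(D, axis=1, keepdims=True)
        return (Q @ D.T).max(axis=1).sum()

    rng = np.random.default_rng(0)
    Q = rng.normal(size=(8, 128))    # 8 query token embeddings (encoded at query time)
    D = rng.normal(size=(180, 128))  # 180 document token embeddings, precomputed offline

    print(colbert_score(Q, D))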
Abstract: Deep pretrained transformer networks are effective at various ranking tasks, such as question answering and ad-hoc document ranking. However, their computational cost makes them prohibitively expensive to deploy in practice. Our proposed approach, called PreTTR (Precomputing Transformer Term Representations), considerably reduces the query-time latency of deep transformer networks (up to a 42x speedup on web document ranking), making these networks more practical to use in a real-time ranking scenario. Specifically, we precompute part of the document term representations at indexing time (without a query), and merge them with the query representation at query time to compute the final ranking score. Due to the large size of the token representations, we also propose an effective approach to reduce the storage requirement by training a compression layer to match attention scores. Our compression technique reduces the required storage by up to 95%, and it can be applied without substantial degradation in ranking performance.
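The precompute-then-merge idea can be sketched with a deliberately miniature numpy "transformer" (this toy layer and all sizes are stand-ins, not PreTTR's actual network or its compression layer): the lower layers run on the document alone at indexing time, and only the upper layers, which see the merged query-document sequence, run at query time:

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    d = 16

    def attn_layer(X, Wq, Wk, Wv):
        """Minimal single-head self-attention (residuals/FFN omitted for brevity)."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = np.exp((Q @ K.T) / np.sqrt(d))
        A /= A.sum(axis=1, keepdims=True)
        return A @ V

    # A toy 4-layer stack: the first 2 layers process the document alone
    # (offline), the last 2 process the merged [query; document] sequence.
    layers = [tuple(rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
              for _ in range(4)]

    doc = rng.normal(size=(30, d))     # 30 document token embeddings
    for W in layers[:2]:               # indexing time: precompute (and store)
        doc = attn_layer(doc, *W)

    query = rng.normal(size=(5, d))    # query tokens, known only at query time
    X = np.vstack([query, doc])        # merge query with precomputed doc reps
    for W in layers[2:]:               # query time: only the upper layers run
        X = attn_layer(X, *W)
    score = X[0] @ rng.normal(size=d)  # e.g., a score read off the first position
    print(score)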
Abstract: We present RML, the first known general reinforcement learning framework for relevance feedback that directly optimizes any desired retrieval metric, including precision-oriented, recall-oriented, and even diversity metrics: RML can be easily extended to directly optimize any arbitrary user satisfaction signal. Using the RML framework, we can select effective feedback terms and weight them appropriately, improving on past methods that fit parameters to feedback algorithms using heuristic approaches or methods that do not directly optimize for retrieval performance. Learning an effective relevance feedback model is not trivial, since the true feedback distribution is unknown. Experiments on standard TREC collections compare RML to existing feedback algorithms, demonstrate the effectiveness of RML at optimizing for MAP and α-nDCG, and show the impact on related measures.
Abstract: Fairness considerations have recently received growing attention, especially in the context of intelligent decision making systems. Explainable recommendation systems, in particular, may suffer from both explanation bias and performance disparity. We show that inactive users may be more susceptible to receiving unsatisfactory recommendations due to their insufficient training data, and that their recommendations may be biased by the training records of active users due to the nature of collaborative filtering, which leads to unfair treatment by the system. In this paper, we analyze different groups of users according to their level of activity, and find that bias exists in recommendation performance between different groups. Empirically, we find that such a performance gap is caused by the disparity of data distributions, specifically the knowledge graph path distribution in this work. We propose a fairness-constrained approach via heuristic re-ranking to mitigate this unfairness problem in the context of explainable recommendation over knowledge graphs. We experiment on several real-world datasets with state-of-the-art knowledge graph-based explainable recommendation algorithms. The promising results show that our algorithm not only provides high-quality explainable recommendations, but also reduces recommendation unfairness in several respects.
Abstract: Massive open online courses (MOOCs) are becoming an increasingly popular way to deliver education, providing large-scale, open-access learning opportunities for students to grasp knowledge. To attract students' interest, MOOC providers apply recommendation systems to recommend courses to students. However, as a course usually consists of a number of video lectures, each covering specific knowledge concepts, directly recommending courses overlooks students' interest in specific knowledge concepts. To fill this gap, in this paper, we study the problem of knowledge concept recommendation. We propose an end-to-end graph neural network based approach called Attentional Heterogeneous Graph Convolutional Deep Knowledge Recommender (ACKRec) for knowledge concept recommendation in MOOCs. Like other recommendation problems, it suffers from the sparsity issue. To address this issue, we leverage both content information and context information to learn the representation of entities via a graph convolutional network. In addition to students and knowledge concepts, we consider other types of entities (e.g., courses, videos, teachers) and construct a heterogeneous information network (HIN) to capture the rich semantic relationships among different types of entities and incorporate them into the representation learning process. Specifically, we use meta-paths on the HIN to guide the propagation of students' preferences. With the help of these meta-paths, the students' preference distribution with respect to a candidate knowledge concept can be captured. Furthermore, we propose an attention mechanism to adaptively fuse the context information from different meta-paths, in order to capture the different interests of different students. To learn the parameters of the proposed model, we utilize extended matrix factorization (MF). A series of experiments demonstrates the effectiveness of ACKRec across multiple popular metrics compared with state-of-the-art baseline methods. The promising results show that the proposed ACKRec is able to effectively recommend knowledge concepts to students pursuing online learning in MOOCs.
Abstract: Recently, deep learning has made significant progress in the task of sequential recommendation. Existing neural sequential recommenders typically adopt a generative approach trained with Maximum Likelihood Estimation (MLE). When context information (referred to as factors) is involved, it is difficult to analyze when and how each individual factor affects the final recommendation performance. For this purpose, we take a new perspective and introduce adversarial learning to sequential recommendation. In this paper, we present a Multi-Factor Generative Adversarial Network (MFGAN) for explicitly modeling the effect of context information on sequential recommendation. Specifically, our proposed MFGAN has two kinds of modules: a Transformer-based generator that takes user behavior sequences as input and recommends possible next items, and multiple factor-specific discriminators that evaluate the generated sub-sequence from the perspectives of different factors. To learn the parameters, we adopt the classic policy gradient method, and utilize the reward signals of the discriminators to guide the learning of the generator. Our framework is flexible enough to incorporate multiple kinds of factor information, and is able to trace how each factor contributes to the recommendation decision over time. Extensive experiments conducted on three real-world datasets demonstrate the superiority of our proposed model over state-of-the-art methods, in terms of both effectiveness and interpretability.
Abstract: Researchers have begun to utilize heterogeneous knowledge graphs (KGs) as auxiliary information in recommendation systems to mitigate the cold-start and sparsity issues. However, utilizing a graph neural network (GNN) to capture information in a KG and further applying it in a recommender system is still problematic, as such a network is unable to see each item's properties from multiple perspectives. To address these issues, we propose the multi-view item network (MVIN), a GNN-based recommendation model that provides superior recommendations by describing items from a unique mixed view combining user and entity angles. MVIN learns item representations from both the user view and the entity view. From the user view, user-oriented modules score and aggregate features to make recommendations from a personalized perspective, constructed according to KG entities and incorporating user click information. From the entity view, the mixing layer contrasts layer-wise GCN information to further obtain comprehensive features from internal entity-entity interactions in the KG. We evaluate MVIN on three real-world datasets: MovieLens-1M (ML-1M), LFM-1b 2015 (LFM-1b), and Amazon-Book (AZ-book). Results show that MVIN significantly outperforms state-of-the-art methods on these three datasets. Moreover, user-view cases show that MVIN indeed captures entities that attract users, and further analysis illustrates that mixing layers in a heterogeneous KG play a vital role in neighborhood information aggregation.
Abstract: Traditional recommender systems mainly aim to model inherent and long-term user preference, while dynamic user demands are also of great importance. Typically, a historical purchase affects the user's demand for related items. For instance, users tend to buy complementary items together (iPhone and AirPods) but not substitutive items (Powerbeats and AirPods), although substitutes of the bought item still cater to their preference. To better model the effects of the history sequence, previous studies introduce the semantics of item relations to capture user demands for recommendation. However, we argue that the temporal evolution of the effects caused by different relations cannot be neglected. In the example above, the user's demand for headphones can rise again after a long period, when a replacement is needed. To model the dynamic meanings of an item in different sequence contexts, we propose a novel method, Chorus, that takes both item relations and the corresponding temporal dynamics into consideration. Chorus derives the embedding of the target item in a knowledge-aware and time-aware way, where each item gets its basic representation and relation-related ones. We then devise temporal kernel functions to combine these representations dynamically, according to whether there are relational items in the history sequence as well as the elapsed time. The enhanced target item embedding is flexible enough to work with various algorithms to calculate the ranking score and generate recommendations. In extensive experiments on three real-world datasets, Chorus gains significant improvements over state-of-the-art baseline methods. Furthermore, the time-related parameters are highly interpretable and hence can strengthen the explainability of the recommendations.
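A hedged sketch of the time-aware combination step: relation-specific components of the target item's embedding are weighted by temporal kernels of the elapsed time since the triggering interaction. The exponential-decay kernel and every value below are illustrative assumptions, not Chorus's exact kernel functions:

.. code-block:: python

    import numpy as np

    def time_aware_embedding(base, rel_embs, deltas, scales):
        """Combine an item's basic representation with relation-related
        components, each weighted by a temporal kernel of the time elapsed
        since the triggering interaction (illustrative exponential decay)."""
        weights = np.exp(-np.asarray(deltas) / np.asarray(scales))
        return base + sum(w * r for w, r in zip(weights, rel_embs))

    base = np.ones(4)                               # the target item's basic embedding
    rel_embs = [np.array([0.5, 0.0, 0.0, 0.0]),     # complement recently bought: boost
                np.array([-0.5, 0.0, 0.0, 0.0])]    # substitute just bought: suppress
    print(time_aware_embedding(base, rel_embs, deltas=[2.0, 2.0], scales=[5.0, 5.0]))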
Abstract: Conventional Neural Text Generation (NTG) models determine the output distribution by applying maximum likelihood estimation on training corpora. However, as user preference for generated content can be constantly changing, an optimized text generator needs to adapt its output to this non-static nature. In this paper, our goal is to generate product descriptions on e-commerce platforms, and we explore this classic task from a novel perspective that allows the optimal output text to vary with ever-changing user preference. Specifically, we propose an evolutionary NTG model that lets the interactive environment fine-tune the pre-trained generative policy via Reinforcement Learning (RL). To this end, a dynamic context of textual fitness is established based on the user click behavior associated with previously generated content, to estimate reward/penalty signals for each output text. Our motivation is to leverage the click-through rate as a user-centric measurement of text quality, by which we can assess how likely a product description is to attract people's attention and follow shopping trends. Extensive experiments on a real e-commerce website demonstrate that the proposed approach achieves significant superiority over two static RL-based variants and four state-of-the-art NTG solutions.
Abstract: View-based 3D model retrieval has become an important task in both the computer vision and machine learning domains. Although deep learning methods have achieved excellent performance on view-based 3D model retrieval, the intrinsic correlations and the degree of view discrimination among the multiple views of a 3D model have not been effectively exploited. To obtain a more effective feature descriptor for 3D model retrieval, in this work, we propose the pairwise view weighted graph network (abbreviated PVWGN) for view-based 3D model retrieval, where non-local graph layers are embedded into the network architecture to automatically mine the intrinsic relationships among the multiple views of a 3D model. Furthermore, a view weighted layer is employed in PVWGN to adaptively assign a weight to each view according to its aggregation information. In addition, a pairwise discrimination loss function is designed to improve the feature discrimination of the 3D model. Most importantly, these three components are integrated into a unified framework. Extensive experimental results on the ModelNet40 and ModelNet10 3D model retrieval datasets show that PVWGN outperforms all state-of-the-art methods on the 3D model retrieval task, with mAPs of 93.2% and 96.2%, respectively.
Abstract: Cyberspace hosts abundant interactions between users and different kinds of objects, and their relations are often encapsulated as bipartite graphs. Detecting user communities in such heterogeneous graphs is an essential task for uncovering user information needs and further enhancing recommendation performance. While several major cyber domains carry high-quality graphs, unfortunately, most others can be quite sparse. However, as users may appear in multiple domains (graphs), their high-quality activities in the major domains can support community detection in the sparse ones; e.g., a user's behavior on Google can help thousands of applications locate his/her local community when s/he uses a Google ID to log into those applications. In this paper, we propose Pairwise Cross-graph Community Detection (PCCD) to cope with the sparse-graph problem by involving external graph knowledge to learn users' pairwise community closeness instead of detecting communities directly. In particular, to avoid taking in excessive propagated information, our model utilizes a two-level filtering module to select the most informative connections through both community-level and node-level filters. Subsequently, a Community Recurrent Unit (CRU) is designed to estimate pairwise user community closeness. Extensive experiments on two real-world graph datasets validate our model against several strong alternatives. Supplementary experiments also validate its robustness on graphs with varied sparsity scales.
Abstract: Network embedding effectively transforms complex network data into a low-dimensional vector space and has shown great performance in many real-world scenarios, such as link prediction, node classification, and similarity search. A plethora of methods have been proposed to learn node representations and achieve encouraging results. Nevertheless, little attention has been paid to the embedding technique for bipartite attributed networks, a typical data structure for modeling nodes from two distinct partitions. In this paper, we propose a novel model called BiANE, short for Bipartite Attributed Network Embedding. In particular, BiANE models not only the inter-partition proximity but also the intra-partition proximity. To effectively preserve the intra-partition proximity, we jointly model the attribute proximity and the structure proximity through a novel latent correlation training approach. Furthermore, we propose a dynamic positive sampling technique to overcome the efficiency drawbacks of existing dynamic negative sampling techniques. Extensive experiments have been conducted on several real-world networks, and the results demonstrate that our proposed approach significantly outperforms state-of-the-art methods.
Abstract: Fashion outfit recommendation has attracted increasing attention from online shopping services and fashion communities. Distinct from other scenarios (e.g., social networking or content sharing) that recommend a single item (e.g., a friend or a picture) to a user, outfit recommendation predicts user preference over a set of well-matched fashion items. Hence, high-quality personalized outfit recommendation should satisfy two requirements: 1) nice compatibility among fashion items, and 2) consistency with user preference. However, present works focus mainly on one of the two requirements and consider only either user-outfit or outfit-item relationships, thereby easily leading to suboptimal representations and limiting performance. In this work, we unify the two tasks of fashion compatibility modeling and personalized outfit recommendation. Towards this end, we develop a new framework, Hierarchical Fashion Graph Network (HFGN), to simultaneously model relationships among users, items, and outfits. In particular, we construct a hierarchical structure upon user-outfit interactions and outfit-item mappings. We then draw inspiration from recent graph neural networks and employ embedding propagation on this hierarchical graph, so as to aggregate item information into an outfit representation and then refine a user's representation via his/her historical outfits. Furthermore, we jointly train the two tasks to optimize these representations. To demonstrate the effectiveness of HFGN, we conduct extensive experiments on a benchmark dataset, where HFGN achieves significant improvements over state-of-the-art compatibility matching models like NGNN and outfit recommenders like FHN.
Abstract: Session-based recommendation (SBR) is a challenging task that aims at recommending items based on anonymous behavior sequences. Almost all existing solutions for SBR model user preference based only on the current session, without exploiting the other sessions, which may contain both relevant and irrelevant item transitions with respect to the current session. This paper proposes a novel approach, called Global Context Enhanced Graph Neural Networks (GCE-GNN), to exploit item transitions over all sessions in a more subtle manner for better inferring the user preference of the current session. Specifically, GCE-GNN learns two levels of item embeddings from a session graph and a global graph, respectively: (i) the session graph is used to learn session-level item embeddings by modeling pairwise item transitions within the current session; and (ii) the global graph is used to learn global-level item embeddings by modeling pairwise item transitions over all sessions. In GCE-GNN, we propose a novel global-level item representation learning layer, which employs a session-aware attention mechanism to recursively incorporate the neighbor embeddings of each node on the global graph. We also design a session-level item representation learning layer, which employs a GNN on the session graph to learn session-level item embeddings within the current session. Moreover, GCE-GNN aggregates the learned item representations at the two levels with a soft attention mechanism. Experiments on three benchmark datasets demonstrate that GCE-GNN consistently outperforms state-of-the-art methods.
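The two graph levels can be sketched directly from behavior sequences. The toy construction below (directed transition edges only, no GNN) contrasts the session graph of the current session with the global graph aggregated over all sessions:

.. code-block:: python

    from collections import defaultdict

    sessions = [["a", "b", "c"], ["b", "c", "d"], ["a", "c"]]  # anonymous sequences
    current = sessions[-1]

    # Session graph: pairwise item transitions within the current session only.
    session_edges = set(zip(current, current[1:]))

    # Global graph: transitions aggregated over all sessions, with counts
    # (the global layer attends over such cross-session neighborhoods).
    global_edges = defaultdict(int)
    for s in sessions:
        for u, v in zip(s, s[1:]):
            global_edges[(u, v)] += 1

    print(session_edges, dict(global_edges))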
Abstract: Interactive recommender systems (IRS) have drawn huge attention because of their flexible recommendation strategy and consideration of optimal long-term user experiences. To deal with dynamic user preferences and optimize accumulated utilities, researchers have introduced reinforcement learning (RL) into IRS. However, RL methods share a common issue of sample efficiency: a huge amount of interaction data is required to train an effective recommendation policy, owing to sparse user responses and a large action space consisting of a large number of candidate items. Moreover, it is infeasible to collect much data with explorative policies in online environments, as doing so will probably harm the user experience. In this work, we investigate the potential of leveraging a knowledge graph (KG), which provides rich side information for recommendation decision making, to deal with these issues of RL methods for IRS. Instead of learning RL policies from scratch, we make use of prior knowledge of the item correlations learned from the KG to (i) guide candidate selection for better candidate item retrieval, (ii) enrich the representations of items and user states, and (iii) propagate user preferences among correlated items over the KG to deal with the sparsity of user feedback. Comprehensive experiments have been conducted on two real-world datasets, demonstrating the superiority of our approach, with significant improvements over state-of-the-art methods.
Abstract: A knowledge graph (KG) contains well-structured external information and has been shown to be effective for high-quality recommendation. However, existing KG-enhanced recommendation methods have largely focused on exploring advanced neural network architectures to better investigate the structural information of the KG. For model learning, these methods mainly rely on Negative Sampling (NS) to optimize the models for both the KG embedding task and the recommendation task. Since NS is not robust (e.g., sampling a small fraction of negative instances may lose lots of useful information), it is reasonable to argue that these methods are insufficient to capture collaborative information among users, items, and entities. In this paper, we propose a novel Jointly Non-Sampling learning model for Knowledge graph enhanced Recommendation (JNSKR). Specifically, we first design a new efficient non-sampling optimization algorithm for knowledge graph embedding learning. The subgraphs are then encoded by the proposed attentive neural network to better characterize user preference over items. Through novel designs of memorization strategies and a joint learning framework, JNSKR not only models the fine-grained connections among users, items, and entities, but also efficiently learns model parameters from the whole training data (including all non-observed data) with a rather low time complexity. Experimental results on two public benchmarks show that JNSKR significantly outperforms state-of-the-art methods like RippleNet and KGAT. Remarkably, JNSKR also shows significant advantages in training efficiency (about 20 times faster than KGAT), which makes it more applicable to real-world large-scale systems.
Abstract: Modelling feature interactions is key in Click-Through Rate (CTR) prediction. State-of-the-art models usually include explicit feature interactions to better model non-linearity in a deep network, but enumerating all feature combinations of high orders is inefficient and brings challenges to network optimization. In this work, we use AutoML to seek useful high-order feature interactions to train on, without manual feature selection. For this purpose, we propose an end-to-end model, AutoGroup, which casts the selection of feature interactions as a structural optimization problem. In a nutshell, AutoGroup first automatically groups useful features into a number of feature sets. Then, it generates interactions of any order from these feature sets using a novel interaction function. The main contribution of AutoGroup is that it performs both dimensionality reduction and feature selection, which are not seen in previous models. Offline experiments on three public large-scale benchmark datasets demonstrate the superior performance and efficiency of AutoGroup over state-of-the-art models. Furthermore, a ten-day online A/B test verifies that AutoGroup can be reliably deployed in production and outperforms the commercial baseline by 10% on average in terms of CTR and CVR.
Abstract: For sequential recommendation, it is essential to capture and predict future or long-term user preferences in order to generate accurate recommendations over time. To improve this predictive capacity, we adopt reinforcement learning (RL) for developing effective sequential recommenders. However, user-item interaction data is likely to be sparse, complicated, and time-varying, so it is not easy to directly apply RL techniques to improve the performance of sequential recommendation. Inspired by the availability of knowledge graphs (KGs), we propose a novel Knowledge-guidEd Reinforcement Learning model (KERL for short) that fuses KG information into an RL framework for sequential recommendation. Specifically, we formalize the sequential recommendation task as a Markov Decision Process (MDP) and make three major technical extensions in this framework, covering the state representation, reward function, and learning algorithm. First, we propose to enhance the state representations with KG information, considering both exploitation and exploration. Second, we carefully design a composite reward function that computes both sequence-level and knowledge-level rewards. Third, we propose a new algorithm for learning the proposed model more effectively. To our knowledge, this is the first time that knowledge information has been explicitly discussed and utilized in RL-based sequential recommenders, especially for the exploration process. Extensive experimental results on both next-item and next-session recommendation tasks show that our model significantly outperforms the baselines on four real-world datasets.
Abstract: Since it can effectively address the sparsity and cold-start problems of collaborative filtering, the knowledge graph (KG) is widely studied and employed as side information in the field of recommender systems. However, most existing KG-based recommendation methods mainly focus on how to effectively encode the knowledge associations in the KG, without highlighting the crucial collaborative signals latent in user-item interactions. As such, the learned embeddings underutilize these two kinds of pivotal information and are insufficient to effectively represent the latent semantics of users and items in vector space. In this paper, we propose a novel method named Collaborative Knowledge-aware Attentive Network (CKAN), which explicitly encodes the collaborative signals via collaboration propagation and offers a natural way of combining collaborative signals with knowledge associations. Specifically, CKAN employs a heterogeneous propagation strategy to explicitly encode both kinds of information, and applies a knowledge-aware attention mechanism to discriminate the contributions of different knowledge-based neighbors. Compared with other KG-based methods, CKAN provides a new perspective on combining collaborative information with knowledge information. We apply the proposed model to four real-world datasets, and the empirical results demonstrate that CKAN significantly outperforms several compelling state-of-the-art baselines.
Abstract: In a large recommender system, the products (or items) may fall into many different categories or domains. Given two relevant domains (e.g., Book and Movie), users may have interactions with items in one domain but not in the other; in the latter domain, these users are considered cold-start users. How to effectively transfer users' preferences, based on their interactions in one domain, to the other relevant domain is the key issue in cross-domain recommendation. Inspired by the advances made in review-based recommendation, we propose to model user preference transfer at the aspect level, with aspects derived from reviews. To this end, we propose a cross-domain recommendation framework via an aspect transfer network for cold-start users (named CATN). CATN is devised to extract multiple aspects for each user and each item from their review documents, and to learn aspect correlations across domains with an attention mechanism. In addition, we further exploit auxiliary reviews from like-minded users to enhance a user's aspect representations. An end-to-end optimization framework is then utilized to strengthen the robustness of our model. On real-world datasets, the proposed CATN significantly outperforms SOTA models in terms of rating prediction accuracy. Further analysis shows that our model is able to reveal user aspect connections across domains at a fine level of granularity, making the recommendations explainable.
Abstract: Knowledge graphs have been widely adopted to improve recommendation accuracy. The multi-hop user-item connections on a knowledge graph also enable reasoning about why an item is recommended. However, reasoning over paths is a complex combinatorial optimization problem, and traditional recommendation methods usually adopt brute-force approaches to find feasible paths, which leads to issues with convergence and explainability. In this paper, we address these issues by better supervising the path-finding process. The key idea is to extract imperfect path demonstrations with minimal labeling effort and effectively leverage these demonstrations to guide path finding. In particular, we design a demonstration-based knowledge graph reasoning framework for explainable recommendation. We also propose an ADversarial Actor-Critic (ADAC) model for demonstration-guided path finding. Experiments on three real-world benchmarks show that our method converges more quickly than the state-of-the-art baseline and achieves better recommendation accuracy and explainability.
Abstract: Given an observed event, humans can easily predict the next event or reason about the preceding event, yet it is difficult for machines to perform such event reasoning. Event representation bridges this gap by modeling the process of event reasoning in a machine-readable format, which can then support a wide range of applications in information retrieval, e.g., question answering and information extraction. Existing work mainly resorts to joint training, integrating all levels of training loss in event chains by a simple loss summation, which is easily trapped in a local optimum. In addition, the scenario knowledge in event chains is not well investigated for event representation. In this paper, we propose a unified fine-tuning architecture incorporating scenario knowledge for event representation, i.e., UniFA-S, which mainly consists of a unified fine-tuning architecture (UniFA) and a scenario-level variational auto-encoder (S-VAE). In detail, UniFA employs multi-step fine-tuning to integrate all levels of training, and S-VAE applies a stochastic variable to implicitly represent scenario-level knowledge. We evaluate our proposal on two aspects, i.e., representation and inference abilities. For the representation ability, our ensemble model UniFA-S beats state-of-the-art baselines on two similarity tasks. For the inference ability, UniFA-S outperforms the best baseline, achieving 4.1%-8.2% improvements in terms of accuracy on various inference tasks.
Abstract: The Web is a canonical example of a competitive retrieval setting where many documents' authors consistently modify their documents to promote them in rankings. We present an automatic method for quality-preserving modification of document content --- i.e., maintaining content quality --- so that the document is ranked higher for a query by a non-disclosed ranking function whose rankings can be observed. The method replaces a passage in the document with some other passage. To select the two passages, we use a learning-to-rank approach with a bi-objective optimization criterion: rank promotion and content-quality maintenance. We used the approach as a bot in content-based ranking competitions. Analysis of the competitions demonstrates the merits of our approach with respect to human content modifications in terms of rank promotion, content-quality maintenance and relevance.
Abstract: Understanding how data workers interact with data and various pieces of information (e.g., code snippet examples) is key to designing systems that can better support them in exploring a given dataset. To date, however, there is a paucity of research studying the information seeking patterns and strategies adopted by data workers as they carry out data curation activities. In this work, we aim at understanding the behaviors of data workers in discovering data quality issues, and how these behavioral observations relate to their performance. Specifically, we investigate how data workers use information resources and tools to support their task completion. To this end, we collect a multi-modal dataset through a data-driven experiment that relies on eye-tracking technology with a purpose-designed platform built on top of iPython Notebook. The collected data reveal that: (i) searching in external resources is a prevalent action that can be leveraged to achieve better performance; (ii) 'copy-paste-modify' is a typical strategy for writing code to complete tasks; (iii) providing sample code within the system could help data workers get started with their task; and (iv) surfacing the underlying data is an effective way to support exploration. By investigating the behaviors prior to each search action, we also find that the most common reasons that trigger external search actions are the need to seek assistance in writing or debugging code and to search for relevant code to reuse. Our findings provide insights into patterns of interaction with various system components and information resources to perform data curation tasks, which has implications for the design of domain-specific IR systems for data workers, such as code-base search.
Abstract: Technological advancements have led to the increasing availability of erotic literature and pornographic novels online, which can be alluring to adolescents and children. Unfortunately, because of the inherent complexity of this indecent content and the sparseness of training data, detecting such readings in cyberspace is a challenging task, even though children can easily access them. In this study, we propose a novel framework, Joint Learning of Content and Human Attention (GoodMan), to identify indecent readings by augmenting natural language understanding models with large-scale human reading behaviors (dwell time per page) on portable devices. From the text modeling viewpoint, an innovative joint attention, trained by joint learning, is employed to orchestrate the content attention and human behavior attention via a BiGRU. From the data augmentation perspective, different users' reading behaviors on the same text can generate considerable training instances with joint attention, which is effective for addressing the cold-start problem. We conduct an extensive set of experiments on an online ebook dataset (with human reading behaviors on portable devices). The experimental results provide insights into the task and demonstrate the superiority of the proposed model over alternative solutions.
Abstract: Candidate generation is a critical task for recommendation systems and is technically challenging from two perspectives. On the one hand, a recommendation system requires comprehensive coverage of the candidates a user is interested in, yet typical deep user modeling approaches represent each user as a single vector, which can hardly capture a user's diverse interests. On the other hand, for the sake of practicability, the candidate generation process needs to be both accurate and efficient. Although existing "multi-channel structures", like memory networks, are more capable of representing a user's diverse interests, they may bring in substantial irrelevant candidates and lead to rapid growth in time cost. As a result, comprehensively acquiring a user's items of interest in a practical way remains a tough issue. In this work, a novel personalized candidate generation paradigm, Octopus, is proposed, which is remarkable for its comprehensiveness and elasticity. Similar to the conventional "multi-channel structures", Octopus also generates multiple vectors for the comprehensive representation of a user's diverse interests. However, Octopus's representation functions are formulated in a highly elastic way, whose scale and type are adaptively determined based on each user's individual background. Therefore, it not only identifies a user's items of interest comprehensively, but also rules out irrelevant candidates and helps maintain a feasible running cost. Extensive experiments are conducted with both industrial and publicly available datasets, where the effectiveness of Octopus is verified in comparison with state-of-the-art baseline approaches.
Abstract: Relevance is an essential concept in Information Retrieval (IR). Recent studies using brain imaging have significantly contributed to the understanding of this concept, but only as a binary notion, i.e., a document being judged as relevant or non-relevant. While such a binary division is prevalent in IR, seminal theories have proposed relevance as a graded variable, i.e., having different degrees. In this paper, we aim to investigate the brain activity associated with relevance when it is treated as a graded concept. Twenty-five participants provided graded relevance judgements in the context of a Question Answering (Q/A) task while being monitored with an electroencephalogram (EEG). Our findings show significant differences in event-related potentials (ERPs) in response to information segments processed in the context of high relevance, low relevance, and no relevance, supporting the concept of graded relevance. We speculate that differences in attentional engagement, semantic mismatch (between the question and answer), and memory processing underpin the electrophysiological responses to the graded relevance judgements. We believe our conclusions constitute an important step in unravelling the nature of graded relevance, and knowledge of the electrophysiological modulation for each grade of relevance will help to improve the design and evaluation of IR systems.
Abstract: In most real-world recommender systems, the observed rating data are subject to selection bias, and the data are thus missing-not-at-random. Developing a method to facilitate the learning of a recommender from such biased feedback is one of the most challenging problems, as it is widely known that naive approaches under selection bias often lead to suboptimal results. A well-established solution to the problem is propensity scoring. The propensity score is the probability of each record being observed, and unbiased performance estimation is possible by weighting each record by the inverse of its propensity. However, the performance of the propensity-based unbiased estimation approach is often affected by the choice of propensity estimation model or by the high-variance problem. To overcome these limitations, we propose a model-agnostic meta-learning method inspired by the asymmetric tri-training framework for unsupervised domain adaptation. The proposed method utilizes two predictors to generate data with reliable pseudo-ratings and another predictor to make the final predictions. In a theoretical analysis, a propensity-independent upper bound on the true performance metric is derived, and it is demonstrated that the proposed method can minimize this bound. We conduct comprehensive experiments using public real-world datasets. The results suggest that the previous propensity-based methods are largely affected by the choice of propensity model and by the variance problem caused by inverse propensity weighting. Moreover, we show that the proposed meta-learning method is robust to these issues and can facilitate the development of effective recommendations from biased explicit feedback.
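The propensity-weighting baseline that this work builds on can be stated in a few lines. In the toy numpy sketch below (all values invented), a naive average over observed entries is biased under missing-not-at-random feedback, while the inverse-propensity-scored (IPS) estimate is unbiased in expectation at the cost of higher variance:

.. code-block:: python

    import numpy as np

    y_true = np.array([5.0, 1.0, 4.0, 2.0])   # true ratings (illustrative)
    p      = np.array([0.8, 0.1, 0.6, 0.2])   # propensities: P(rating is observed)
    obs    = np.array([1,   0,   1,   1])     # which entries we actually see
    y_hat  = np.array([4.5, 2.0, 3.0, 2.5])   # some model's predictions

    err = (y_true - y_hat) ** 2
    naive = (obs * err).sum() / obs.sum()     # biased: overweights easy-to-observe data
    ips   = (obs * err / p).mean()            # unbiased in expectation, higher variance
    print(naive, ips)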
Abstract: Modeling users at large scale and modeling rare-interaction users are two major challenges in recommender systems, and they create big gaps between research and application. Faced with millions or even billions of users, it is hard to store and leverage personalized preferences with a user embedding matrix in real scenarios. Moreover, much research focuses on users with rich histories, while users with only one or a few interactions make up the biggest part of real systems. Previous studies make efforts to handle one of the above issues but rarely tackle the efficiency and cold-start problems together. In this work, a novel user preference representation called Preference Hash (PreHash) is proposed to model users at large scale, including rare-interaction ones. In PreHash, a series of buckets are generated based on users' historical interactions. Users with similar preferences, both warm and cold, are automatically assigned to the same buckets, and representations of the buckets are learned accordingly. Thanks to the designed hash buckets, only a limited number of parameters are stored, which saves a lot of memory and enables more efficient modeling. Furthermore, when a user makes new interactions, his/her buckets and representations are dynamically updated, which enables more effective understanding and modeling of the user. It is worth mentioning that PreHash is flexible enough to work with various recommendation algorithms by taking the place of their user embedding matrices. We combine it with multiple state-of-the-art recommendation methods and conduct various experiments. Comparative results on public datasets show that it not only improves recommendation performance but also significantly reduces the number of model parameters. To summarize, PreHash achieves significant improvements in both efficiency and effectiveness for recommender systems.
Abstract: Explanations have a large effect on how people respond to recommendations. However, there are many possible intentions a system may have in generating explanations for a given recommendation -- from increasing transparency, to enabling a faster decision, to persuading the recipient. As a good explanation for one goal may not be good for others, we address the questions of (1) how to robustly measure whether an explanation meets a given goal and (2) how the different goals interact with each other. Specifically, this paper presents a first proposal for how to measure the quality of explanations along seven common goal dimensions catalogued in the literature. We find that the seven goals are not independent, but rather exhibit strong structure. Proposing two novel explanation evaluation designs, we identify challenges in evaluation and provide more efficient measurement approaches for explanation quality.
Abstract: Information retrieval (IR) ranking models in production systems continually evolve in response to user feedback, insights from research, and new developments. Rather than investing all engineering resources to produce a single challenger to the existing system, a commercial provider might choose to explore multiple new ranking models simultaneously. However, even small changes to a complex model can have unintended consequences. In particular, the per-topic effectiveness profile is likely to change, and even when an overall improvement is achieved, gains are rarely observed for every query, introducing the risk that some users or queries may be negatively impacted by the new model if deployed into production. Risk adjustments that re-weight losses relative to gains and mitigate such behavior are available when making one-to-one system comparisons, but not for one-to-many or many-to-one comparisons. Moreover, no IR evaluation methodology integrates priors from previous or alternative rankers in a homogeneous inferential framework. In this work, we propose a Bayesian approach in which multiple challengers are compared to a single champion. We also show that risk can be incorporated, and demonstrate the benefits of doing so. Finally, we also consider the alternative scenario commonly encountered in academic research, in which a single challenger is compared against several previous champions.
Abstract: Replicability and reproducibility of experimental results are primary concerns in all areas of science, and IR is no exception. Besides the problem of moving the field towards more reproducible experimental practices and protocols, we also face a severe methodological issue: we do not have any means to assess whether an experiment has actually been replicated or reproduced. Moreover, we lack any reproducibility-oriented dataset that would allow us to develop such methods. To address these issues, we compare several measures that objectively quantify to what extent we have replicated or reproduced a system-oriented IR experiment. These measures operate at different levels of granularity, from the fine-grained comparison of ranked lists to the more general comparison of the obtained effects and significant differences. Moreover, we develop a reproducibility-oriented dataset, which allows us to validate our measures and which can also be used to develop future measures.
Abstract: For the offline evaluation of IR systems, some researchers have proposed to utilise pairwise document preference assessments instead of relevance assessments of individual documents, as it may be easier for assessors to make relative decisions rather than absolute ones. Simple preference-based evaluation measures such as ppref and wpref have been proposed, but the past decade did not see any wide use of such measures. One reason for this may be that, while these new measures have been reported to behave more or less similarly to traditional measures based on absolute assessments, whether they actually align with users' perceptions of search engine result pages (SERPs) has been unknown. The present study addresses exactly this question, after formally defining two classes of preference-based measures called Pref measures and Δ-measures. We show that the best of these measures perform at least as well as an average assessor in terms of agreement with users' SERP preferences, and that implicit document preferences (i.e., those suggested by a SERP that retrieves one document but not the other) play a much more important role than explicit preferences (i.e., those suggested by a SERP that retrieves one document above the other). We have released our dataset containing 119,646 document preferences, so that the feasibility of document preference-based evaluation can be further pursued by the IR community.
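As a reference point for preference-based evaluation, here is a simplified preference-agreement statistic in the spirit of ppref; this is a pedagogical reduction, not the exact ppref/wpref or Pref/Δ-measure definitions. It computes the fraction of judged pairs that a ranking orders correctly, treating unretrieved documents as ranked below all retrieved ones:

.. code-block:: python

    def pref_agreement(ranking, prefs):
        """Fraction of judged preference pairs (a preferred over b) that the
        ranking orders correctly; a simplified ppref-style statistic."""
        pos = {doc: r for r, doc in enumerate(ranking)}
        correct = sum(1 for a, b in prefs
                      if pos.get(a, 10**9) < pos.get(b, 10**9))
        return correct / len(prefs)

    ranking = ["d3", "d1", "d4", "d2"]
    prefs = [("d1", "d2"), ("d3", "d2"), ("d2", "d4")]  # assessor-stated preferences
    print(pref_agreement(ranking, prefs))               # 2 of 3 pairs honored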
Abstract: Following the success of Cranfield-like evaluation approaches in web search, web image search has also been evaluated with absolute judgments of (graded) relevance. However, recent research has found that collecting absolute relevance judgments may be difficult in image search scenarios due to the multi-dimensional nature of relevance for image results. Moreover, existing evaluation metrics based on absolute relevance judgments do not correlate well with search users' satisfaction perceptions in web image search. Unlike absolute relevance judgments, preference judgments do not require that relevance grades be pre-defined, i.e., how many levels to use and what those levels mean. Instead of considering each document in isolation, preference judgments consider a pair of documents and require judges to state their relative preference. Such preference judgments are usually more reliable than absolute judgments, since the presence of (at least) two items establishes a certain context. While preference judgments have been studied extensively for general web search, there exists no thorough investigation of how preference judgments and preference-based evaluation metrics can be used to evaluate web image search systems. Compared to general web search, web image search may be an even better fit for preference-based evaluation because of its grid-based presentation style. The limited need for fresh results in web image search also makes preference judgments more reusable than in general web search. In this paper, we provide a thorough comparison of variants of preference judgments for web image search. We find that, compared to strict preference judgments, weak preference judgments require less time and yield better inter-assessor agreement. We also study how the absolute relevance levels of two given images affect preference judgments between them. Furthermore, we propose a preference-based evaluation metric named Preference-Winning-Penalty (PWP) to evaluate and compare two different image search systems. The proposed PWP metric outperforms existing evaluation metrics based on absolute relevance judgments in terms of agreement with the system-level preferences of actual users.
Abstract: Evaluation metrics play an important role in the batch evaluation of IR systems. Based on a user model that describes how users interact with the ranked list, an evaluation metric is defined to link the relevance scores of a list of documents to an estimate of system effectiveness and user satisfaction. Therefore, the validity of an evaluation metric has two facets: whether the underlying user model can accurately predict user behavior, and whether the evaluation metric correlates well with user satisfaction. While a tremendous amount of work has been undertaken to design, evaluate, and compare different evaluation metrics, few studies have explored the consistency between these two facets of evaluation metrics. Specifically, we want to investigate whether metrics that are well calibrated with user behavior data can perform as well in estimating user satisfaction. To shed light on this research question, we compare the performance of various metrics within the C/W/L framework in estimating user satisfaction when they are optimized to fit observed user behavior. Experimental results on both self-collected and publicly available user search behavior datasets show that metrics optimized to fit users' click behavior can perform as well as those calibrated with user satisfaction feedback. We also investigate the reliability of the calibration process of evaluation metrics, to find out how much data is required for parameter tuning. Our findings provide empirical support for the consistency between user behavior modeling and satisfaction measurement, as well as guidance for tuning the parameters of evaluation metrics.
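A small sketch of how the C/W/L framework links a user model to a metric: a continuation probability C(i) at each rank induces examination probabilities and hence rank weights W(i). With a constant C(i) = p this yields RBP-style weights (normalised here over a truncated list, so the result is only RBP-like):

.. code-block:: python

    import numpy as np

    def cwl_expected_utility(rels, C):
        """C/W/L-style metric: C[i] is the probability of continuing past
        rank i; the weight of rank i is the (normalised) probability of
        ever examining it."""
        examine = np.cumprod(np.concatenate(([1.0], C[:-1])))  # reach rank i
        W = examine / examine.sum()
        return (W * rels).sum()

    rels = np.array([1.0, 0.0, 0.5, 1.0])   # relevance of the ranked documents
    p = 0.8                                 # constant continuation probability
    print(cwl_expected_utility(rels, np.full(len(rels), p)))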
Abstract: Recently, session search evaluation has received more attention, as a realistic search scenario usually involves multiple queries and interactions between users and systems. Evolved from model-based evaluation metrics for a single query, existing session-based metrics also follow a generic framework based on the cascade hypothesis. The cascade hypothesis assumes that lower-ranked search results and later-issued queries receive less attention from users and should therefore be assigned smaller weights when calculating evaluation metrics. This hypothesis has achieved much success in modeling search users' behavior and designing evaluation metrics, by explaining how users' attention decays on search engine result pages. However, recent studies have found that the recency effect also plays an important role in determining user satisfaction in search sessions. In particular, whether a user feels satisfied with the later-issued queries heavily influences his/her search satisfaction in the whole session. To incorporate both the cascade hypothesis and the recency effect into the design of session search evaluation metrics, we propose Recency-aware Session-based Metrics (RSMs), which simultaneously characterize users' examination process with a browsing model and their cognitive process with a utility accumulation model. With both self-constructed and publicly available user search behavior datasets, we show the effectiveness of the proposed RSMs by comparing them with existing session-based metrics in terms of correlation with user satisfaction. We also find that the influence of the cascade and recency effects varies dramatically among tasks with different difficulties and complexities, which suggests that we should use different model parameters for different types of search tasks. Our findings highlight the importance of investigating and utilizing cognitive effects besides examination hypotheses in search evaluation.
Abstract: Article 5(1)(c) of the European Union's General Data Protection Regulation (GDPR) requires that "personal data shall be [...] adequate, relevant, and limited to what is necessary in relation to the purposes for which they are processed ('data minimisation')". To date, the legal and computational definitions of 'purpose limitation' and 'data minimization' remain largely unclear. In particular, the interpretation of these principles is an open issue for information access systems that optimize for user experience through personalization and do not strictly require personal data collection for the delivery of basic service. In this paper, we identify a lack of a homogeneous interpretation of the data minimization principle and explore two operational definitions applicable in the context of personalization. The focus of our empirical study in the domain of recommender systems is on providing foundational insights about the (i) feasibility of different data minimization definitions, (ii) robustness of different recommendation algorithms to minimization, and (iii) performance of different minimization strategies. We find that the performance decrease incurred by data minimization might not be substantial, but that it might disparately impact different users -- a finding which has implications for the viability of different formal minimization definitions. Overall, our analysis uncovers the complexities of the data minimization problem in the context of personalization and maps the remaining computational and regulatory challenges.
Abstract: The rapid growth of e-commerce has made people accustomed to shopping online. Before making purchases on e-commerce websites, most consumers tend to rely on rating scores and review information to make purchase decisions. With this information, they can infer the quality of products to reduce the risk of purchase. Specifically, items with high rating scores and good reviews tend to be less risky, while items with low rating scores and bad reviews might be risky to purchase. On the other hand, purchase behaviors are also influenced by consumers' tolerance of risk, known as their risk attitudes. Economists have studied risk attitudes for decades. These studies reveal that people are not always rational when making decisions, and their risk attitudes may vary in different circumstances. Most existing work on recommender systems does not consider users' risk attitudes in modeling, which may lead to inappropriate recommendations. For example, suggesting a risky item to a risk-averse person or a conservative item to a risk-seeking person may degrade the user experience. In this paper, we propose a novel risk-aware recommendation framework that integrates machine learning and behavioral economics to uncover the risk mechanism behind users' purchasing behaviors. Concretely, we first develop statistical methods to estimate the risk distribution of each item and then draw on the Nobel-prize-winning Prospect Theory to model how users choose among probabilistic alternatives that involve risk, where the probabilities of the outcomes are uncertain. Experiments on several e-commerce datasets demonstrate that by taking user risk preferences into consideration, our approach can achieve better performance than many classical recommendation approaches, and further analyses also verify the advantages of risk-aware recommendation beyond accuracy.
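Since the abstract above builds on Prospect Theory, a short sketch of the standard Kahneman-Tversky functional forms may help. The parameter values are the canonical 1992 estimates and the toy prospect is our own illustration, not taken from the paper:

.. code-block:: python

    import numpy as np

    ALPHA, BETA = 0.88, 0.88   # diminishing sensitivity for gains / losses
    LAMBDA = 2.25              # loss aversion: losses loom larger than gains
    GAMMA = 0.61               # probability weighting curvature

    def value(x):
        """Prospect Theory value function: concave on gains, loss-averse."""
        x = np.asarray(x, dtype=float)
        return np.where(x >= 0, x ** ALPHA, -LAMBDA * (-x) ** BETA)

    def weight(p):
        """Inverse-S probability weighting: overweights small probabilities."""
        p = np.asarray(p, dtype=float)
        return p ** GAMMA / (p ** GAMMA + (1 - p) ** GAMMA) ** (1 / GAMMA)

    def prospect_utility(outcomes, probs):
        """Subjective utility of a risky prospect (simple, non-cumulative form)."""
        return float(np.sum(weight(probs) * value(outcomes)))

    # A risky item: 90% chance the buyer is happy (+1), 10% badly disappointed (-1).
    print(prospect_utility([1.0, -1.0], [0.9, 0.1]))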
Abstract: Factorization machines (FMs) have been widely adopted to model discrete feature interactions in recommender systems. Despite their great success, there is currently no study of their robustness to discrete adversarial perturbations: can modifying a certain number of the discrete input features dramatically change an FM's prediction? Although there exist robust training methods for FMs, they neglect the discrete property of the input features and lack an effective mechanism to verify model robustness. In our work, we propose the first method for certifying the robustness of factorization machines with respect to discrete perturbations of input features. If an instance is certifiably robust, it is guaranteed to be robust (within the considered space) no matter what the perturbations and attack models are. Likewise, we provide non-robustness certificates via the existence of discrete adversarial perturbations that change the FM's prediction. Through such robustness certificates, we show that FMs and the current robust training methods are vulnerable to discrete adversarial perturbations. This vulnerability makes the outcome unreliable and restricts the application of FMs. To enhance the FM's robustness against such perturbations, we present a robust training procedure whose core idea is to increase the number of instances that are certifiably robust. Extensive experiments on three real-world datasets demonstrate that our method significantly enhances the robustness of factorization machines with little impact on predictive accuracy.
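For context, a second-order FM scores an instance as y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j. The sketch below shows this prediction plus a brute-force single-flip probe; it only illustrates the attack surface, not the paper's certification method, and all names are ours:

.. code-block:: python

    import numpy as np

    def fm_predict(x, w0, w, V):
        """Second-order factorization machine, using the O(nk) identity
        for the pairwise interaction term."""
        linear = w0 + x @ w
        xv = x @ V                                       # shape (k,)
        pairwise = 0.5 * (xv ** 2 - (x ** 2) @ (V ** 2)).sum()
        return linear + pairwise

    def flips_prediction(x, i, w0, w, V):
        """Does toggling one binary feature change the score's sign?
        Exhausting single-bit flips is a brute-force (non-certified)
        robustness probe; the paper derives certificates instead."""
        x2 = x.copy(); x2[i] = 1.0 - x2[i]
        return np.sign(fm_predict(x, w0, w, V)) != np.sign(fm_predict(x2, w0, w, V))

    rng = np.random.default_rng(0)
    n, k = 10, 4
    x = (rng.random(n) < 0.3).astype(float)
    w0, w, V = 0.1, rng.normal(size=n), rng.normal(scale=0.1, size=(n, k))
    print([flips_prediction(x, i, w0, w, V) for i in range(n)])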
Abstract: Rankings are the primary interface through which many online platforms match users to items (e.g. news, products, music, video). In these two-sided markets, not only do the users draw utility from the rankings, but the rankings also determine the utility (e.g. exposure, revenue) for the item providers (e.g. publishers, sellers, artists, studios). It has already been noted that myopically optimizing utility to the users -- as done by virtually all learning-to-rank algorithms -- can be unfair to the item providers. We therefore present a learning-to-rank approach for explicitly enforcing merit-based fairness guarantees for groups of items (e.g. articles by the same publisher, tracks by the same artist). In particular, we propose a learning algorithm that ensures notions of amortized group fairness while simultaneously learning the ranking function from implicit feedback data. The algorithm takes the form of a controller that integrates unbiased estimators for both fairness and utility, dynamically adapting both as more data becomes available. In addition to its rigorous theoretical foundation and convergence guarantees, we find empirically that the algorithm is highly practical and robust.
Abstract: Truthfulness judgments are a fundamental step in the process of fighting misinformation, as they are crucial to train and evaluate classifiers that automatically distinguish true and false statements. Usually such judgments are made by experts, like journalists for political statements or medical doctors for medical statements. In this paper, we follow a different approach and rely on (non-expert) crowd workers. This of course leads to the following research question: Can crowdsourcing be reliably used to assess the truthfulness of information and to create large-scale labeled collections for information credibility systems? To address this issue, we present the results of an extensive study based on crowdsourcing: we collect thousands of truthfulness assessments over two datasets, and we compare expert judgments with crowd judgments, expressed on scales with various granularity levels. We also measure the political bias and the cognitive background of the workers, and quantify their effect on the reliability of the data provided by the crowd.
Abstract: Recommendation algorithms typically build models based on user-item interactions (e.g., clicks, likes, or ratings) to provide a personalized ranked list of items. These interactions are often distributed unevenly over different groups of items due to varying user preferences. However, we show that recommendation algorithms can inherit or even amplify this imbalanced distribution, leading to item under-recommendation bias. Concretely, we formalize the concepts of ranking-based statistical parity and equal opportunity as two measures of item under-recommendation bias. Then, we empirically show that one of the most widely adopted algorithms -- Bayesian Personalized Ranking -- produces biased recommendations, which motivates our effort to propose a novel debiased personalized ranking model. The debiased model improves the two proposed bias metrics while preserving recommendation performance. Experiments on three public datasets show strong bias reduction of the proposed model versus state-of-the-art alternatives.
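As a simplified illustration of what a ranking-based parity measure can look like (our own reading, not the paper's exact formulation): compare each item group's share of top-k recommendation slots with its share of the catalog, and treat a gap as under-recommendation.

.. code-block:: python

    import numpy as np

    def topk_slot_shares(rec_lists, item_group, n_groups=2, k=10):
        """Share of top-k recommendation slots received by each item group.
        rec_lists  : one ranked list of item ids per user.
        item_group : dict mapping item id -> group index.
        Under this reading of ranking-based statistical parity, slot
        shares should match the groups' catalog shares; conditioning on
        relevant items instead yields an equal-opportunity variant.
        """
        counts = np.zeros(n_groups)
        for ranking in rec_lists:
            for item in ranking[:k]:
                counts[item_group[item]] += 1
        return counts / counts.sum()

    group = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}     # groups hold 40% / 60% of items
    recs = [[0, 1, 2], [0, 2, 3], [1, 0, 4]]
    print(topk_slot_shares(recs, group, k=2))  # [0.83, 0.17]: group 0 over-served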
Abstract: From 2017 to 2019 the Text REtrieval Conference (TREC) held a challenge task on precision medicine using documents from medical publications (PubMed) and clinical trials. Despite the many performance measurements carried out in these evaluation campaigns, the scientific community is still rather unsure about the impact individual system features and their weights have on the overall system performance. In order to close this explanatory gap, we first determined optimal feature configurations using the Sequential Model-based Algorithm Configuration (SMAC) program and applied its output to a BM25-based search engine. We then ran an ablation study to systematically assess the individual contributions of relevant system features: BM25 parameters, query type and weighting schema, query expansion, stop word filtering, and keyword boosting. For evaluation, we employed the gold standard data from the three TREC Precision Medicine (TREC-PM) installments to assess the effectiveness of different features using the commonly shared infNDCG metric.
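For reference, the BM25 parameters ablated above are the usual k1 (term-frequency saturation) and b (document length normalization). A plain sketch of the scoring function, with toy collection statistics of our own invention:

.. code-block:: python

    import math

    def bm25_term(tf, df, doc_len, avg_doc_len, n_docs, k1=1.2, b=0.75):
        """Contribution of one query term to the BM25 score; k1 and b are
        the parameters tuned (among other features) in the ablation study."""
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        return idf * norm

    def bm25(query_terms, doc_tf, doc_len, avg_doc_len, n_docs, df):
        return sum(
            bm25_term(doc_tf.get(t, 0), df[t], doc_len, avg_doc_len, n_docs)
            for t in query_terms if t in df
        )

    df = {"melanoma": 120, "braf": 40}   # toy document frequencies
    print(bm25(["melanoma", "braf"], {"melanoma": 3, "braf": 1},
               doc_len=250, avg_doc_len=300, n_docs=10000, df=df))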
Abstract: Counterfactual Learning to Rank (LTR) algorithms learn a ranking model from logged user interactions, often collected using a production system. Employing such an offline learning approach has many benefits compared to an online one, but it is challenging as user feedback often contains high levels of bias. Unbiased LTR uses Inverse Propensity Scoring (IPS) to enable unbiased learning from logged user interactions. One of the major difficulties in applying Stochastic Gradient Descent (SGD) approaches to counterfactual learning problems is the large variance introduced by the propensity weights. In this paper, we show that the convergence rate of SGD approaches with IPS-weighted gradients suffers from this variance: convergence is slow, especially when the IPS weights are large. To overcome this limitation, we propose a novel learning algorithm, called CounterSample, that has provably better convergence than standard IPS-weighted gradient descent methods. We prove that CounterSample converges faster and complement our theoretical findings with empirical results from extensive experimentation in a number of biased LTR scenarios -- across optimizers, batch sizes, and different degrees of position bias.
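To make the variance problem concrete, here is a toy IPS-weighted SGD update. This is our own illustration of the baseline that CounterSample improves upon; the loss and data are stand-ins, not the paper's:

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 5, 100
    theta = np.zeros(d)
    X = rng.normal(size=(n, d))             # features of clicked documents
    rho = rng.uniform(0.05, 1.0, size=n)    # examination propensities

    def grad(theta, x):
        # gradient of a toy squared loss; a stand-in for the LTR loss
        return -(1.0 - theta @ x) * x

    # IPS-weighted SGD: each gradient is scaled by 1 / propensity, so the
    # few clicks with tiny propensities dominate the updates -- this is
    # the variance problem that slows convergence.
    for x, r in zip(X, rho):
        theta -= 0.01 * grad(theta, x) / r

    weights = 1.0 / rho
    print("max/mean IPS weight:", weights.max() / weights.mean())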
Abstract: Search result diversification aims to retrieve diverse results that cover as many subtopics related to the query as possible. Recent studies have shown that supervised diversification models are able to outperform heuristic approaches by automatically learning a diversification function rather than using manually designed score functions. The main challenge in training a diversification model is the lack of high-quality training samples. Because the ranker must model dependencies between documents, it is very hard for training algorithms to select effective positive and negative ranking lists to train a reliable ranking model, given a large number of candidate documents within which different documents are relevant to different subtopics. To tackle this problem, we propose a supervised diversification framework based on Generative Adversarial Networks (GANs). It consists of a generator and a discriminator interacting with each other in a minimax game. Specifically, the generator generates more confusing negative samples for the discriminator, and the discriminator sends back complementary ranking signals to the generator. Furthermore, we explicitly exploit subtopics in the generator, while focusing on modeling document similarity in the discriminator. Through such a minimax game, we are able to obtain better ranking models by combining ranking signals learned by the generator and the discriminator. Experimental results on the TREC Web Track dataset show that the proposed method significantly outperforms existing diversification methods.
Abstract: Counterfactual Learning to Rank (LTR) methods optimize ranking systems using logged user interactions that contain interaction biases. Existing methods are only unbiased if users are presented with all relevant items in every ranking, and no counterfactual unbiased LTR method currently exists for top-k rankings. We introduce a novel policy-aware counterfactual estimator for LTR metrics that can account for the effect of a stochastic logging policy. We prove that the policy-aware estimator is unbiased if every relevant item has a non-zero probability of appearing in the top-k ranking. Our experimental results show that the performance of our estimator is not affected by the size of k: for any k, the policy-aware estimator reaches the same retrieval performance while learning from top-k feedback as when learning from feedback on the full ranking. Lastly, we introduce novel extensions of traditional LTR methods to perform counterfactual LTR and to optimize top-k metrics. Together, our contributions introduce the first policy-aware unbiased LTR approach that learns from top-k feedback and optimizes top-k metrics. As a result, counterfactual LTR is now applicable to the very prevalent top-k ranking setting in search and recommendation.
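A sketch of the policy-aware idea as we read it: the propensity of observing a document marginalizes over the logging policy's rankings and is zero only if the document never enters the top-k. The function name and the examination model below are illustrative assumptions:

.. code-block:: python

    def policy_aware_propensity(doc, rankings, probs, examine, k):
        """P(doc is observed) under a stochastic logging policy:
        sum over rankings of P(ranking) * P(examined at doc's rank),
        where positions beyond the top-k are never examined."""
        rho = 0.0
        for ranking, p in zip(rankings, probs):
            topk = ranking[:k]
            if doc in topk:
                rho += p * examine[topk.index(doc)]
        return rho

    examine = [1.0, 0.7]                      # position-based examination, k=2
    rankings = [[0, 1, 2, 3], [2, 3, 0, 1]]   # support of the logging policy
    probs = [0.6, 0.4]
    # Doc 3 reaches the top-2 only under the second ranking (at rank 2),
    # so its propensity is 0.4 * 0.7 = 0.28; a click on doc 3 would be
    # counted with weight 1 / 0.28 in the counterfactual estimate.
    print(policy_aware_propensity(3, rankings, probs, examine, k=2))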
Abstract: In learning-to-rank for information retrieval, a ranking model is automatically learned from the data and then utilized to rank sets of retrieved documents. Therefore, an ideal ranking model would be a mapping from a document set to a permutation of that set, and should satisfy two critical requirements: (1) it should be able to model cross-document interactions so as to capture the local context information in a query; (2) it should be permutation-invariant, meaning that any permutation of the input documents would not change the output ranking. Previous studies on learning-to-rank either design uni-variate scoring functions that score each document separately, and thus fail to model cross-document interactions, or construct multivariate scoring functions that score documents sequentially, which inevitably sacrifices the permutation-invariance requirement. In this paper, we propose a neural learning-to-rank model called SetRank which directly learns a permutation-invariant ranking model defined on document sets of any size. SetRank employs a stack of (induced) multi-head self-attention blocks as its key component for jointly learning the embeddings of all retrieved documents. The self-attention mechanism not only helps SetRank capture local context information from cross-document interactions, but also learns permutation-equivariant representations for the input documents, thereby achieving a permutation-invariant ranking model. Experimental results on three benchmarks show that SetRank significantly outperforms baselines including traditional learning-to-rank models and state-of-the-art neural IR models.
Abstract: This paper concerns reinforcement learning (RL) of document ranking models for information retrieval (IR). One branch of RL approaches to ranking formalizes the ranking process as a Markov decision process (MDP) and determines the model parameters with policy gradient. Though preliminary success has been shown, these approaches are still far from achieving their full potential. Existing policy gradient methods directly utilize the absolute performance scores (returns) of the sampled document lists in their gradient estimates, which causes two limitations: 1) they fail to reflect the relative goodness of documents within the same query, which is usually close to the nature of IR ranking; 2) they generate high-variance gradient estimates, resulting in slow learning and low ranking accuracy. To deal with these issues, we propose a novel policy gradient algorithm in which the gradients are determined using pairwise comparisons of two document lists sampled within the same query. The algorithm, referred to as Pairwise Policy Gradient (PPG), repeatedly samples pairs of document lists, estimates the gradients with pairwise comparisons, and finally updates the model parameters. Theoretical analysis shows that PPG produces unbiased and low-variance gradient estimates. Experimental results demonstrate performance gains over state-of-the-art baselines in search result diversification and text retrieval.
Abstract: Community question-answering (CQA) has been established as a prominent web service enabling users to post questions and get answers from the community. Product Question Answering (PQA) is a special CQA framework where questions are asked (and answered) in the context of a specific product. Naturally, humorous questions are an integral part of such platforms, especially as some products attract humor due to their unreasonable price or their peculiar functionality, or in cases where users emphasize their critical point of view through humor. Detecting humorous questions in such systems is important for sellers, to better understand user engagement with their products. It is also important to signal to users the flippancy of humorous questions, and that answers to such questions should be taken with a grain of salt. In this study we present a deep-learning framework for detecting humorous questions in PQA systems. Our framework utilizes two properties of the questions - Incongruity and Subjectivity - demonstrating their contribution to humor detection. We evaluate our framework on a real-world dataset, demonstrating an accuracy of 90.8%, up to an 18.3% relative improvement over baseline methods. We then demonstrate the existence of product bias in PQA platforms, where some products attract more humorous questions than others. A classifier trained on unbiased data is outperformed by the biased classifier; however, it excels at differentiating between humorous and non-humorous questions that relate to the same product. To the best of our knowledge, this work is the first to detect humor in a PQA setting.
Abstract: In precision-oriented tasks like answer ranking, it is more important to rank many relevant answers highly than to retrieve all relevant answers. It follows that a good ranking strategy would be to learn how to identify the easiest correct answers first (i.e., assign a high ranking score to answers that have characteristics that usually indicate relevance, and a low ranking score to those with characteristics that do not), before incorporating more complex logic to handle difficult cases (e.g., semantic matching or reasoning). In this work, we apply this idea to the training of neural answer rankers using curriculum learning. We propose several heuristics to estimate the difficulty of a given training sample. We show that the proposed heuristics can be used to build a training curriculum that down-weights difficult samples early in the training process. As the training process progresses, our approach gradually shifts to weighting all samples equally, regardless of difficulty. We present a comprehensive evaluation of our proposed idea on three answer ranking datasets. Results show that our approach leads to superior performance of two leading neural ranking architectures, namely BERT and ConvKNRM, using both pointwise and pairwise losses. When applied to a BERT-based ranker, our method yields up to a 4% improvement in MRR and a 9% improvement in P@1 (compared to the model trained without a curriculum). This results in models that can achieve comparable performance to more expensive state-of-the-art techniques.
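A minimal sketch of a curriculum weighting schedule consistent with the description above: hard samples are down-weighted early, and all weights converge to 1 as training progresses. The linear schedule and difficulty scores are our own illustrative choices; the paper's difficulty heuristics are not reproduced here:

.. code-block:: python

    def curriculum_weight(difficulty, step, total_steps):
        """Sample weight that anneals from easiness-based to uniform.
        difficulty in [0, 1] (1 = hardest), estimated by some heuristic,
        e.g. how poorly a cheap ranker separates the training pair."""
        progress = min(step / total_steps, 1.0)
        return (1.0 - difficulty) * (1.0 - progress) + progress

    for step in (0, 500, 1000):
        easy = curriculum_weight(0.1, step, 1000)
        hard = curriculum_weight(0.9, step, 1000)
        print(f"step {step:4d}: easy={easy:.2f} hard={hard:.2f}")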
Abstract: Conversational search is one of the ultimate goals of information retrieval. Recent research approaches conversational search by simplified settings of response ranking and conversational question answering, where an answer is either selected from a given candidate set or extracted from a given passage. These simplifications neglect the fundamental role of retrieval in conversational search. To address this limitation, we introduce an open-retrieval conversational question answering (ORConvQA) setting, where we learn to retrieve evidence from a large collection before extracting answers, as a further step towards building functional conversational search systems. We create a dataset, OR-QuAC, to facilitate research on ORConvQA. We build an end-to-end system for ORConvQA, featuring a retriever, a reranker, and a reader that are all based on Transformers. Our extensive experiments on OR-QuAC demonstrate that a learnable retriever is crucial for ORConvQA. We further show that our system can make a substantial improvement when we enable history modeling in all system components. Moreover, we show that the reranker component contributes to the model performance by providing a regularization effect. Finally, further in-depth analyses are performed to provide new insights into ORConvQA.
Abstract: At LinkedIn, we want to create economic opportunity for everyone in the global workforce. A critical aspect of this goal is matching jobs with qualified applicants. To improve hiring efficiency and reduce the need to manually screen each applicant, we developed a new product where recruiters can ask screening questions online so that they can easily filter qualified candidates. To add screening questions to all 20M active jobs at LinkedIn, we propose a new task that aims to automatically generate screening questions for a given job posting. To solve this task, we develop a two-stage deep learning model called Job2Questions, where we apply a deep learning model to detect intent from the text description, and then rank the detected intents by their importance based on other contextual features. Since this is a new product with no historical data, we employ deep transfer learning to train complex models with limited training data. We launched the screening question product and our AI models to LinkedIn users and observed significant impact in the job marketplace. During our online A/B test, we observed a +53.10% screening question suggestion acceptance rate, +22.17% job coverage, +190% recruiter-applicant interaction, and a +11 Net Promoter Score. In sum, the deployed Job2Questions model helps recruiters find qualified applicants and job seekers find jobs they are qualified for.
Abstract: Community Question Answering (CQA) has become a primary means for people to acquire knowledge, where people are free to ask questions or submit answers. To enhance the efficiency of the service, similar question identification has become a core task in CQA, which aims to find a similar question from the archived repository whenever a new question is asked. However, it has long been a challenge to properly measure the similarity between two questions due to the inherent variation of natural language, i.e., there could be different ways to ask the same question, or different questions sharing similar expressions. To alleviate this problem, it is natural to involve the existing answers to enrich the archived questions. Traditional methods typically take a one-side usage, which leverages the answer as an expanded representation of the corresponding question. Unfortunately, this may introduce unexpected noise into the similarity computation, since answers are often long and diverse, leading to inferior performance. In this work, we propose a two-side usage, which leverages the answer as a bridge between the two questions. The key idea is based on our observation that similar questions can be addressed by similar parts of the answer while different questions may not. In other words, we can compare the matching patterns of the two questions over the same answer to measure their similarity. In this way, we propose a novel matching-over-matching model, named Match2, which compares the matching patterns between two question-answer pairs for similar question identification. Empirical experiments on two benchmark datasets demonstrate that our model can significantly outperform previous state-of-the-art methods on the similar question identification task.
Abstract: Many E-commerce sites now offer product-specific question answering platforms for users to communicate with each other by posting and answering questions during online shopping. However, the multiple answers provided by ordinary users usually vary diversely in their qualities and thus need to be appropriately ranked for each question to improve user satisfaction. It can be observed that product reviews usually provide useful information for a given question, and thus can assist the ranking process. In this paper, we investigate the answer ranking problem for product-related questions, with the relevant reviews treated as auxiliary information that can be exploited for facilitating the ranking. We propose an answer ranking model named MUSE which carefully models multiple semantic relations among the question, answers, and relevant reviews. Specifically, MUSE constructs a multi-semantic relation graph with the question, each answer, and each review snippet as nodes. Then a customized graph convolutional neural network is designed for explicitly modeling the semantic relevance between the question and answers, the content consistency among answers, and the textual entailment between answers and reviews. Extensive experiments on real-world E-commerce datasets across three product categories show that our proposed model achieves superior performance on the concerned answer ranking task.
Abstract: Most ranking models are trained only with displayed items (mostly hot items), but they are utilized to retrieve items in the entire space, which consists of both displayed and non-displayed items (mostly long-tail items). Due to this sample selection bias, the long-tail items lack sufficient records to learn good feature representations, i.e., they suffer from data sparsity and cold-start problems. The resulting distribution discrepancy between displayed and non-displayed items causes poor long-tail performance. To this end, we propose an entire space adaptation model (ESAM) to address this problem from the perspective of domain adaptation (DA). ESAM regards displayed and non-displayed items as the source and target domains, respectively. Specifically, we design an attribute correlation alignment that considers the correlation between high-level attributes of the items to achieve distribution alignment. Furthermore, we introduce two effective regularization strategies, i.e., center-wise clustering and self-training, to improve the DA process. Without requiring any auxiliary information or auxiliary domains, ESAM transfers knowledge from displayed items to non-displayed items to alleviate the distribution inconsistency. Experiments on two public datasets and a large-scale industrial dataset collected from Taobao demonstrate that ESAM achieves state-of-the-art performance, especially in the long-tail space. Besides, we deploy ESAM to the Taobao search engine, leading to significant improvement in online performance. The code is available at https://github.com/A-bone1/ESAM.git.
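The attribute correlation alignment can be pictured as a second-order-statistics penalty (in the spirit of CORAL) between displayed and non-displayed items. This is our sketch of the idea; the paper's exact formulation may differ:

.. code-block:: python

    import torch

    def correlation_alignment_loss(src_feat, tgt_feat):
        """Align the attribute-correlation structure of displayed (source)
        and non-displayed (target) items by matching the covariance
        matrices of their high-level features."""
        def cov(f):
            f = f - f.mean(dim=0, keepdim=True)
            return f.t() @ f / (f.shape[0] - 1)
        d = src_feat.shape[1]
        return ((cov(src_feat) - cov(tgt_feat)) ** 2).sum() / (4 * d * d)

    src = torch.randn(128, 32)   # embeddings of displayed (hot) items
    tgt = torch.randn(128, 32)   # embeddings of non-displayed (long-tail) items
    print(correlation_alignment_loss(src, tgt))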
Abstract: Pretrained contextualized language models such as BERT have achieved impressive results on various natural language processing benchmarks. Benefiting from multiple pretraining tasks and large-scale training corpora, pretrained models can capture complex syntactic word relations. In this paper, we use the deep contextualized language model BERT for the task of ad hoc table retrieval. We investigate how to encode table content considering the table structure and the input length limit of BERT. We also propose an approach that incorporates features from prior literature on table retrieval and jointly trains them with BERT. In experiments on public datasets, we show that our best approach outperforms the previous state-of-the-art method and BERT baselines by a large margin under different evaluation metrics.
Abstract: Edit-distance-based string similarity search has many applications such as spell correction, data de-duplication, and sequence alignment. However, computing edit distance is known to have high complexity, which makes string similarity search challenging for large datasets. In this paper, we propose a deep learning pipeline (called CNN-ED) that embeds edit distance into Euclidean distance for fast approximate similarity search. A convolutional neural network (CNN) is used to generate fixed-length vector embeddings for a dataset of strings and the loss function is a combination of the triplet loss and the approximation error. To justify our choice of using CNN instead of other structures (e.g., RNN) as the model, theoretical analysis is conducted to show that some basic operations in our CNN model preserve edit distance. Experimental results show that CNN-ED outperforms data-independent CGK embedding and RNN-based GRU embedding in terms of both accuracy and efficiency by a large margin. We also show that string similarity search can be significantly accelerated using CNN-based embeddings, sometimes by orders of magnitude.
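The loss described above combines a triplet term with an approximation term; a hedged sketch in PyTorch (the weighting and the exact distance targets are our assumptions):

.. code-block:: python

    import torch
    import torch.nn.functional as F

    def cnn_ed_loss(f_a, f_p, f_n, ed_ap, ed_an, margin=1.0, alpha=0.1):
        """Triplet loss over anchor/positive/negative embeddings, plus an
        approximation term pulling embedded Euclidean distances toward
        the true edit distances."""
        d_ap = F.pairwise_distance(f_a, f_p)
        d_an = F.pairwise_distance(f_a, f_n)
        triplet = F.relu(d_ap - d_an + margin).mean()
        approx = ((d_ap - ed_ap).abs() + (d_an - ed_an).abs()).mean()
        return triplet + alpha * approx

    # f_* would come from the CNN over encoded strings; random stand-ins here.
    f_a, f_p, f_n = (torch.randn(8, 64) for _ in range(3))
    ed_ap = torch.randint(1, 5, (8,)).float()    # ED(anchor, positive)
    ed_an = torch.randint(5, 12, (8,)).float()   # ED(anchor, negative)
    print(cnn_ed_loss(f_a, f_p, f_n, ed_ap, ed_an))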
Abstract: Motivated by the success of generative adversarial networks (GANs) in various domains including information retrieval, we propose a novel signed network embedding framework, ASiNE, which represents each node of a given signed network as a low-dimensional vector based on adversarial learning. To do this, we first design a generator G+ and a discriminator D+ that consider positive edges, as well as a generator G- and a discriminator D- that consider negative edges: (1) G+/G- aim to generate the most indistinguishable fake positive/negative edges, respectively; (2) D+/D- aim to discriminate between real positive/negative edges and fake positive/negative edges, respectively. Furthermore, under ASiNE, we propose two new strategies for effective signed network embedding: (1) an embedding space sharing strategy for learning both positive and negative edges; (2) a fake edge generation strategy based on balance theory. Through extensive experiments using five real-life signed networks, we verify the effectiveness of each of the strategies employed in ASiNE. We also show that ASiNE consistently and significantly outperforms all the state-of-the-art signed network embedding methods on all datasets and with all metrics in terms of the accuracy of sign prediction.
Abstract: Graph queries have emerged as one of the fundamental techniques to support modern search services, such as PageRank web search, social networking search and knowledge graph search. As such graphs are maintained globally and are very large (e.g., billions of nodes), we need to efficiently process graph queries across multiple geographically distributed datacenters, i.e., run geo-distributed graph queries. Existing graph computing frameworks may not work well for geographically distributed datacenters, because they implement a Bulk Synchronous Parallel model that requires excessive inter-datacenter transfers, thereby introducing extremely large latency for query processing. In this paper, we propose GeoGraph, a universal framework to support efficient geo-distributed graph query processing based on clustering datacenters and a meta-graph, while reducing the inter-datacenter communication. Our new framework can be applied to many types of graph algorithms without any modification. The framework is developed on top of Apache Giraph. The experiments were conducted with four important graph queries, i.e., shortest path, graph keyword search, subgraph isomorphism and PageRank. The evaluation results show that our proposed framework can achieve up to 82% faster convergence, 42% lower WAN bandwidth usage, and 45% less total monetary cost for the four graph queries, with input graphs stored across ten geo-distributed datacenters.
Abstract: In location-based services, such as navigation and ride-hailing, matching a query with Points-of-Interest (POIs) is an essential function for efficient destination retrieval. Indeed, due to space limits and real-time requirements, such services usually require intermediate POI matching results when only partial search keywords have been typed. While there are numerous retrieval models for general textual semantic matching, few attempts have been made for query-POI matching that consider the integration of rich spatio-temporal factors and dynamic user preferences. To this end, in this paper, we develop a spatio-temporal dual graph attention network (STDGAT), which can jointly model dynamic situational context and users' sequential behaviors for intelligent query-POI matching. Specifically, we first utilize a semantic representation block to model semantic correlations among incomplete texts as well as various spatio-temporal factors captured by location and time. Next, we propose a novel dual graph attention network to capture two types of query-POI relevance, where one models global query-POI interactions and the other models time-evolving user preferences over destination POIs. Moreover, we also incorporate spatio-temporal factors into the dual graph attention network so that the query-POI relevance can generalize to sophisticated situational contexts. After that, a pairwise fusion strategy is introduced to extract salient global feature representations for both queries and POIs. Finally, several cold-start strategies and training methods are proposed to improve matching effectiveness and training efficiency. Extensive experiments on two real-world datasets demonstrate the performance of our approach compared with state-of-the-art baselines. The results show that our model achieves significant improvement in matching accuracy even when only partial query keywords are given.
Abstract: Graph Convolution Network (GCN) has become the new state-of-the-art for collaborative filtering. Nevertheless, the reasons for its effectiveness in recommendation are not well understood. Existing work that adapts GCN to recommendation lacks thorough ablation analyses of GCN, which was originally designed for graph classification tasks and is equipped with many neural network operations. We empirically find that the two most common designs in GCNs -- feature transformation and nonlinear activation -- contribute little to the performance of collaborative filtering. Even worse, including them adds to the difficulty of training and degrades recommendation performance. In this work, we aim to simplify the design of GCN to make it more concise and appropriate for recommendation. We propose a new model named LightGCN, which includes only the most essential component of GCN -- neighborhood aggregation -- for collaborative filtering. Specifically, LightGCN learns user and item embeddings by linearly propagating them on the user-item interaction graph, and uses the weighted sum of the embeddings learned at all layers as the final embedding. Such a simple, linear, and neat model is much easier to implement and train, exhibiting substantial improvements (about 16.0% relative improvement on average) over Neural Graph Collaborative Filtering (NGCF) -- a state-of-the-art GCN-based recommender model -- under exactly the same experimental setting. Further analyses are provided on the rationality of the simple LightGCN from both analytical and empirical perspectives.
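Because LightGCN's propagation rule is fully linear, it fits in a few lines. A sketch on a toy user-item graph, here with uniform layer weights (the toy data and variable names are ours):

.. code-block:: python

    import numpy as np

    def lightgcn_embeddings(A_hat, E0, n_layers=3):
        """LightGCN propagation: no feature transform, no nonlinearity.
        E^{k+1} = A_hat @ E^{k}; the final embedding combines the
        embeddings of all layers (uniform weights in this sketch)."""
        layers = [E0]
        for _ in range(n_layers):
            layers.append(A_hat @ layers[-1])
        return np.mean(layers, axis=0)

    # Toy bipartite interaction graph: 2 users x 3 items.
    R = np.array([[1, 1, 0], [0, 1, 1]], dtype=float)
    A = np.block([[np.zeros((2, 2)), R], [R.T, np.zeros((3, 3))]])
    deg = A.sum(axis=1)
    A_hat = A / np.sqrt(np.outer(deg, deg))   # symmetric normalization
    E0 = np.random.default_rng(0).normal(size=(5, 8))
    E = lightgcn_embeddings(A_hat, E0)
    print(E[:2] @ E[2:].T)   # user-item scores via inner products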
Abstract: Group recommendation aims to suggest preferred items to a group of users rather than to an individual user. Most existing methods for group recommendation directly learn the inherent interests of groups and users or the inherent features of items, i.e., they independently model the inherent embeddings of groups, users or items. However, this independent view severely suffers from the cold-start problem when making recommendations for occasional groups that are temporarily formed by a set of users and have few interactions with items. In fact, groups, users and items are interdependent because they interact with one another. These interdependencies constitute an interaction graph that provides multiple views for modeling the embeddings of groups, users and items from their interacting counterparts, improving recommendation for occasional groups. To this end, we propose a model named GAME to learn Graphical and Attentive Multi-view Embeddings (i.e., representations) for groups, users and items from the independent view and the counterpart views based on the interaction graph. In the counterpart views, the embedding of a group, user or item is aggregated from its interacting counterparts based on an attention mechanism that derives an adaptive weight for each counterpart. For instance, a user's embedding may be aggregated from her interacting items or groups. Further, GAME applies neural collaborative filtering to investigate the interactions between the multi-view embeddings of groups (or users) and items for group recommendation. Finally, we conduct extensive experiments on two real datasets. The experimental results show that GAME outperforms other state-of-the-art models, especially on cold-start groups (i.e., occasional groups) and cold-start items.
Abstract: Traditional recommendation models that utilize only one type of user-item interaction face serious data sparsity and cold-start issues. Multi-behavior recommendation, which makes use of multiple types of user-item interactions such as clicks and favorites, can serve as an effective solution. Early efforts towards multi-behavior recommendation fail to capture the behaviors' different influence strengths on the target behavior. They also ignore the behavior semantics implied in the multi-behavior data. Both of these limitations prevent the data from being fully exploited to improve the recommendation performance on the target behavior. In this work, we approach this problem by innovatively constructing a unified graph to represent the multi-behavior data and proposing a new model named MBGCN (short for Multi-Behavior Graph Convolutional Network). By learning behavior strength with a user-item propagation layer and capturing behavior semantics with an item-item propagation layer, MBGCN can well address the limitations of existing works. Empirical results on two real-world datasets verify the effectiveness of our model in exploiting multi-behavior data. Our model outperforms the best baseline by 25.02% and 6.51% on average on the two datasets. Further studies on cold-start users confirm the practicability of our proposed model.
Abstract: Streaming session-based recommendation (SSR) is a challenging task that requires the recommender system to perform session-based recommendation (SR) in a streaming scenario. In real-world applications of e-commerce and social media, a sequence of user-item interactions generated within a certain period is grouped as a session, and these sessions arrive consecutively in the form of streams. Most recent SR research has focused on the static setting, where the training data is first acquired and then used to train a session-based recommender model. Such models need several epochs of training over the whole dataset, which is infeasible in the streaming setting. Besides, they can hardly capture long-term user interests because they neglect, or make only simple use of, the user information. Although some streaming recommendation strategies have been proposed recently, they are designed for streams of individual interactions rather than streams of sessions. In this paper, we propose a Global Attributed Graph (GAG) neural network model with a Wasserstein reservoir for the SSR problem. On the one hand, when a new session arrives, a session graph with a global attribute is constructed based on the current session and its associated user. Thus, the GAG can take both the global attribute and the current session into consideration to learn more comprehensive representations of the session and the user, yielding better recommendation performance. On the other hand, to adapt to the streaming session scenario, a Wasserstein reservoir is proposed to help preserve a representative sketch of the historical data. Extensive experiments on two real-world datasets have been conducted to verify the superiority of the GAG model over state-of-the-art methods.
Abstract: In many recommender systems, users and items are associated with attributes, and users show preferences for items. The attribute information describes users' (items') characteristics and has a wide range of applications, such as user profiling, item annotation, and feature-enhanced recommendation. As annotating user (item) attributes is a labor-intensive task, the attribute values are often incomplete, with many missing attribute values. Therefore, item recommendation and attribute inference have become two main tasks on these platforms. Researchers have long agreed that user (item) attributes and preference behavior are highly correlated. Some researchers have proposed to leverage one kind of data for the remaining task, and showed performance improvements. Nevertheless, these models either neglect the incompleteness of user (item) attributes or model the correlation of the two tasks with simple models, leading to suboptimal performance on both tasks. To this end, in this paper, we define these two tasks on an attributed user-item bipartite graph and propose an Adaptive Graph Convolutional Network (AGCN) approach for joint item recommendation and attribute inference. The key idea of AGCN is to iteratively perform two steps: 1) learning graph embedding parameters with previously learned approximated attribute values to facilitate the two tasks; 2) sending the approximated updated attribute values back to the attributed graph for better graph embedding learning. Therefore, AGCN can adaptively adjust the graph embedding learning parameters by incorporating both the given attributes and the estimated attribute values, providing weakly supervised information to refine the two tasks. Extensive experimental results on three real-world datasets clearly show the effectiveness of the proposed model.
Abstract: In recent years, the recommender system has become an indispensable function on all e-commerce platforms. The review rating data for a recommender system typically come from open platforms, which may attract a group of malicious users who deliberately insert fake feedback in an attempt to bias the recommender system in their favour. The presence of such attacks may violate the modeling assumptions that high-quality data are always available and that these data truly reflect users' interests and preferences. Therefore, it is of great practical significance to construct a robust recommender system that is able to generate stable recommendations even in the presence of shilling attacks. In this paper, we propose GraphRfi - a GCN-based user representation learning framework that performs robust recommendation and fraudster detection in a unified way. In its end-to-end learning process, the probability of a user being identified as a fraudster by the fraudster detection component automatically determines the contribution of this user's rating data to the recommendation component, while the prediction error output by the recommendation component acts as an important feature in the fraudster detection component. Thus, these two components can mutually enhance each other. Extensive experiments have been conducted, and the experimental results show the superiority of our GraphRfi in the two tasks - robust rating prediction and fraudster detection. Furthermore, GraphRfi is validated to be more robust to various types of shilling attacks than state-of-the-art recommender systems.
Abstract: Even though Automatic Speech Recognition (ASR) systems significantly improved over the last decade, they still introduce a lot of errors when they transcribe voice to text. One of the most common reasons for these errors is phonetic confusion between similar-sounding expressions. As a result, ASR transcriptions often contain "quasi-oronyms", i.e., words or phrases that sound similar to the source ones, but that have completely different semantics (e.g., "win" instead of "when" or "accessible on defecting" instead of "accessible and affecting"). These errors significantly affect the performance of downstream Natural Language Understanding (NLU) models (e.g., intent classification, slot filling, etc.) and impair user experience. To make NLU models more robust to such errors, we propose novel phonetic-aware text representations. Specifically, we represent ASR transcriptions at the phoneme level, aiming to capture pronunciation similarities, which are typically neglected in word-level representations (e.g., word embeddings). To train and evaluate our phoneme representations, we generate noisy ASR transcriptions of four existing datasets - Stanford Sentiment Treebank, SQuAD, TREC Question Classification and Subjectivity Analysis - and show that common neural network architectures exploiting the proposed phoneme representations can effectively handle noisy transcriptions and significantly outperform state-of-the-art baselines. Finally, we confirm these results by testing our models on real utterances spoken to the Alexa virtual assistant.
Abstract: This paper presents a knowledge graph enhanced personalized search model, KEPS. For each user and her queries, KEPS first conducts personalized entity linking on the queries and forms better intent representations; then it builds a knowledge-enhanced profile for the user, using memory networks to store the predicted search intents and linked entities in her search history. The knowledge-enhanced user profile and intent representation are then utilized by KEPS for better, knowledge-enhanced, personalized search. Furthermore, after providing personalized search results for each query, KEPS leverages the user's feedback (clicks on documents) to post-adjust the entity linking on previous queries. This fixes previous linking errors and improves ranking quality for future queries. Experiments on the public AOL search log demonstrate the advantage of knowledge in personalized search: personalized entity linking better reflects the user's search intent, the memory networks better maintain the user's subtle preferences, and the post-hoc linking adjustment fixes some linking errors using the received feedback signals. The three components together lead to significantly better ranking accuracy for KEPS.
Abstract: Graphs are used to model pairwise relations between entities in many real-world scenarios such as social networks. Graph Neural Networks (GNNs) have shown a superior ability to learn representations for graph-structured data, which leads to performance improvements in many graph-related tasks such as link prediction, node classification and graph classification. Most existing graph neural network models are designed for static graphs, while many real-world graphs are inherently dynamic, with new nodes and edges constantly emerging. Existing graph neural network models cannot utilize this dynamic information, which has been shown to enhance the performance of many graph analytic tasks such as community detection. Hence, in this paper, we propose DyGNN, a Dynamic Graph Neural Network model, which can model the dynamic information as the graph evolves. In particular, the proposed framework keeps updating node information by coherently capturing the sequential information of edges (interactions), the time intervals between edges, and information propagation. Experimental results on various dynamic graphs demonstrate the effectiveness of the proposed framework.
Abstract: Web search is a key digital literacy skill that can be particularly challenging for people with dyslexia, a common learning disability that affects reading and spelling skills in about 15% of the English-speaking population. In this paper, we collected and analyzed eye-tracking, search log, and self-report data from 27 participants (14 with dyslexia) to confirm that searchers with dyslexia struggle with all stages of the search process and have markedly different gaze patterns and search behavior that reflect the strategies used and challenges faced. Based on these findings, we discuss design implications to improve the cognitive accessibility of web search.
Abstract: Knowledge graphs have emerged as a key abstraction for organizing information in diverse domains and their embeddings are increasingly used to harness their information in various information retrieval and machine learning tasks. However, the ever growing size of knowledge graphs requires computationally efficient algorithms capable of scaling to graphs with millions of nodes and billions of edges. This paper presents DGL-KE, an open-source package to efficiently compute knowledge graph embeddings. DGL-KE introduces various novel optimizations that accelerate training on knowledge graphs with millions of nodes and billions of edges using multi-processing, multi-GPU, and distributed parallelism. These optimizations are designed to increase data locality, reduce communication overhead, overlap computations with memory accesses, and achieve high operation efficiency. Experiments on knowledge graphs consisting of over 86M nodes and 338M edges show that DGL-KE can compute embeddings in 100 minutes on an EC2 instance with 8 GPUs and 30 minutes on an EC2 cluster with 4 machines with 48 cores/machine. These results represent a 2× ~ 5× speedup over the best competing approaches. DGL-KE is available on https://github.com/awslabs/dgl-ke.
Abstract: In this paper, we study collaborative filtering in an interactive setting, in which the recommender agents iterate between making recommendations and updating the user profile based on the interactive feedback. The most challenging problem in this scenario is how to suggest items when the user profile has not been well established, i.e., recommending for cold-start users or warm-start users with drifting tastes. Existing approaches either rely on an overly pessimistic linear exploration strategy or adopt meta-learning-based algorithms in a fully exploitative way. In this work, to quickly catch up with the user's interests, we propose to represent the exploration policy with a neural network and directly learn it from the feedback data. Specifically, the exploration policy is encoded in the weights of multi-channel stacked self-attention neural networks and trained with efficient Q-learning by maximizing users' overall satisfaction with the recommender system. The key insight is that satisfied recommendations triggered by exploratory recommendations can be viewed as an exploration bonus (delayed reward) for the exploration's contribution to improving the quality of the user profile. Therefore, the proposed exploration policy, which balances learning the user profile against making accurate recommendations, can be directly optimized by maximizing users' long-term satisfaction with reinforcement learning. Extensive experiments and analysis conducted on three benchmark collaborative filtering datasets demonstrate the advantage of our method over state-of-the-art methods.
Abstract: Recent years have witnessed a growing trend of fashion compatibility modeling, which scores the matching degree of a given outfit and then provides people with dressing advice. Existing methods have primarily solved this problem by analyzing the discrete interactions among multiple complementary items. However, fashion items present certain occlusions and deformations when they are worn on the body. Therefore, discrete item interactions cannot capture fashion compatibility in a combined manner, as they neglect a crucial factor: the overall try-on appearance. In light of this, we propose a multi-modal try-on-guided compatibility modeling scheme to jointly characterize the discrete interactions and try-on appearance of an outfit. In particular, we first propose a multi-modal try-on template generator to automatically generate a try-on template from the visual and textual information of the outfit, depicting the overall look of its composing fashion items. Then, we introduce a new compatibility modeling scheme that integrates the outfit's try-on appearance into traditional discrete item interaction modeling. To support this proposal, we construct a large-scale real-world dataset from SSENSE, named FOTOS, consisting of 11,000 well-matched outfits and their corresponding realistic try-on images. Extensive experiments demonstrate its superiority over state-of-the-art methods.
Abstract: Existing spatial object recommendation algorithms generally treat objects identically when ranking them. However, spatial objects often cover different levels of spatial granularity and are thereby heterogeneous. For example, one user may prefer to be recommended a region (say Manhattan), while another user might prefer a venue (say a restaurant). Even for the same user, preferences can change at different stages of data exploration. In this paper, we study how to support top-k spatial object recommendations at varying levels of spatial granularity, enabling spatial objects at varying granularity, such as a city, suburb, or building, to serve as a Point of Interest (POI). To solve this problem, we propose the use of a POI tree, which captures spatial containment relationships between POIs. We design a novel multi-task learning model called MPR (short for Multi-level POI Recommendation), where each task aims to return the top-k POIs at a certain spatial granularity level. Each task consists of two subtasks: (i) attribute-based representation learning; (ii) interaction-based representation learning. The first subtask learns feature representations for both users and POIs, capturing attributes directly from their profiles. The second subtask incorporates user-POI interactions into the model. Additionally, MPR can provide insights into why certain recommendations are being made to a user based on three types of hints: user-aspect, POI-aspect, and interaction-aspect. We empirically validate our approach using two real-life datasets, and show promising performance improvements over several state-of-the-art methods.
Abstract: Many sellers on e-commerce platforms offer buyers product bundles, which package together two or more different items. The identification of such bundles is a necessary step in supporting a variety of related services, from recommendation to dynamic pricing. In this work, we present a comprehensive study of bundle identification on a large e-commerce website. Our analysis of bundle versus non-bundle listed items reveals several key differentiating characteristics, spanning the listing's title, image, and attributes. Following this, we experiment with a multi-modal classifier that takes advantage of these characteristics as features. Our analysis also shows that the bundle indicator input by sellers tends to be highly noisy and carries only a weak signal. The bundle identification task therefore faces the challenge of having a small set of manually-labeled clean examples and a larger set of noisy-labeled examples, in conjunction with class imbalance due to the relative scarcity of bundles. Our experiments with basic supervised classifiers, using the manually-labeled and/or the noisy-labeled data for training, demonstrate only moderate performance. We therefore turn to a semi-supervised approach and propose GREED, a self-training ensemble-based algorithm with greedy model selection. Our evaluation over two different meta-categories shows the superior performance of semi-supervised approaches for the bundle identification task, with GREED outperforming several semi-supervised alternatives. The combination of textual, image, and some metadata features is shown to yield the best performance, reaching an AUC of 0.89 and 0.92 for the two meta-categories, respectively.
Abstract: Electronic Health Record (EHR) coding is the task of assigning one or more International Classification of Diseases (ICD) codes to every EHR. Most previous work either ignores the hierarchical nature of the ICD codes or only focuses on parent-child relations. Moreover, existing EHR coding methods predict ICD codes at the leaf level, which has the largest number of ICD codes and the most fine-grained categories, making it difficult for models to make correct decisions. In order to address these problems, we model EHR coding as a path generation task. For this approach, we need to address two main challenges: (1) How do we model relations between EHRs and ICD codes, and relations between ICD codes? (2) How do we evaluate the quality of generated ICD paths in order to obtain a signal that can be used to supervise the learning? We propose a coarse-to-fine ICD path generation framework, named Reinforcement Path Generation Network (RPGNet), that implements EHR coding with a Path Generator (PG) and a Path Discriminator (PD). We address challenge (1) by introducing a Path Message Passing (PMP) module in the PG to encode three types of relation: between EHRs and ICD codes, between parent-child ICD codes, and between sibling ICD codes. To address challenge (2), we propose a PD component that estimates the reward for each ICD code in a generated path. RPGNet is trained with Reinforcement Learning (RL) in an adversarial manner. Experiments on the MIMIC-III benchmark dataset show that RPGNet significantly outperforms state-of-the-art methods in terms of micro-averaged F1 and micro-averaged AUC.
Abstract: Entity alignment (EA) aims to discover equivalent entities in knowledge graphs (KGs), which bridges heterogeneous sources of information and facilitates the integration of knowledge. Existing EA solutions mainly rely on structural information to align entities, typically through KG embedding. Nonetheless, in real-life KGs, only a few entities are densely connected to others, while the majority possess rather sparse neighborhood structures. We refer to the latter as long-tail entities, and observe that this phenomenon arguably limits the use of structural information for EA. To mitigate the issue, we revisit and investigate the conventional EA pipeline. For pre-alignment, we propose to amplify long-tail entities, which have relatively weak structural information, with entity name information that is generally available (but overlooked), in the form of concatenated power mean word embeddings. For alignment, under a novel complementary framework that consolidates structural and name signals, we identify an entity's degree as important guidance to effectively fuse the two different sources of information. To this end, a degree-aware co-attention network is conceived, which dynamically adjusts the significance of features in a degree-aware manner. For post-alignment, we propose to complement the original KGs with facts from their counterparts by using confident EA results as anchors via iterative training. Comprehensive experimental evaluations validate the superiority of our proposed techniques.
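Concatenated power mean word embeddings, mentioned above, are simple to compute: for each chosen power, a generalized mean is taken over the word vectors of the entity name, and the results are concatenated. A sketch, assuming the common choice of powers {1, +inf, -inf} (i.e., mean, max, and min; the paper's exact powers may differ):

.. code-block:: python

    import numpy as np

    def power_mean_embedding(word_vectors, powers=(1.0, np.inf, -np.inf)):
        """word_vectors: (n_words, dim) array for one entity name."""
        parts = []
        for p in powers:
            if p == np.inf:                      # limit case: element-wise max
                parts.append(word_vectors.max(axis=0))
            elif p == -np.inf:                   # limit case: element-wise min
                parts.append(word_vectors.min(axis=0))
            else:                                # generalized power mean
                parts.append(np.power(np.power(word_vectors, p).mean(axis=0), 1.0 / p))
        return np.concatenate(parts)             # shape: (len(powers) * dim,)

    name_vecs = np.random.rand(3, 300)           # e.g., a three-word entity name
    embedding = power_mean_embedding(name_vecs)  # 900-dimensional representation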
Abstract: In the process of visual perception, humans perceive not only the appearance of objects existing in a place but also their relationships (e.g. spatial layout). However, dominant works on visual place recognition are typically based on the assumption that two images depict the same place if they contain enough similar objects, while the relation information is neglected. In this paper, we propose a regional relation module which models the regional relationships and converts the convolutional feature maps to relational feature maps. We further design a cascaded pooling method to obtain discriminative relation descriptors by preventing the influence of confusing relations and preserving as much useful information as possible. Extensive experiments on two place recognition benchmarks demonstrate that training with the proposed regional relation module improves the appearance descriptors and that the relation descriptors are complementary to appearance descriptors. When these two kinds of descriptors are concatenated together, the resulting combined descriptors outperform the state-of-the-art methods.
Abstract: Recommender systems are feedback loop systems, which often face bias problems such as popularity bias, previous-model bias and position bias. In this paper, we focus on solving these bias problems in a recommender system via uniform data. Through empirical studies in online and offline settings, we observe that simple modeling with uniform data can alleviate the bias problems and improve performance. However, uniform data is scarce and expensive to collect in a real product. In order to use the valuable uniform data more effectively, we propose a general knowledge distillation framework for counterfactual recommendation that enables uniform data modeling through four approaches: (1) label-based distillation focuses on using the imputed labels as a carrier to provide useful de-biasing guidance; (2) feature-based distillation aims to filter out the representative causal and stable features; (3) sample-based distillation considers mutual learning and alignment of the information of the uniform and non-uniform data; and (4) model structure-based distillation constrains the training of the models from the perspective of embedded representation. We conduct extensive experiments on both public and product datasets, demonstrating that the proposed four methods achieve better performance than the baseline models in terms of AUC and NLL. Moreover, we discuss the relation between the proposed methods and previous works. We emphasize that counterfactual modeling with uniform data is a rich research area, and list some interesting and promising research topics worthy of further exploration. The source code is available at https://github.com/dgliu/SIGIR20_KDCRec.
Abstract: False-positive metrics can capture an important side of recommendation quality, focusing on the impact of suggestions that are disliked by users, as a complement to common metrics that only measure the number of successful recommendations. In this paper we research the extent to which false-positive metrics agree or disagree with true-positive metrics in the offline evaluation of recommender systems. We discover a surprising degree of systematic disagreement that was occasionally noted but not explained by previous authors in the literature. We find an explanation for the discrepancy between the metrics in the effect of popularity biases, which impact false-positive and true-positive metrics in very different ways: whereas true-positive metrics reward the recommendation of popular items, false-positive metrics penalize it. We determine the precise conditions and exceptions to these general trends, provide a formal explanation for our findings, and confirm and illustrate them empirically in experiments with different datasets.
Abstract: Patients are increasingly using the web for understanding medical information, making health decisions, and validating physicians' advice. However, most of this content is tailored to an expert audience, so people with inadequate health literacy often find it difficult to access, comprehend, and act upon this information. Medical text simplification aims to alleviate this problem by computationally simplifying medical text. Most text simplification methods employ neural seq-to-seq models for this task. However, training such models requires a corpus of aligned complex and simple sentences. Creating such a dataset manually is effort-intensive, while creating it automatically is prone to alignment errors. To overcome these challenges, we propose a denoising autoencoder based neural model for this task, which leverages the simplistic writing style of medical social media text. Experiments on four datasets show that our method significantly outperforms the best known medical text simplification models across multiple automated and human evaluation metrics. Our model achieves an improvement of up to 16.52% over the existing best performing model on SARI, the primary metric for evaluating text simplification models.
Abstract: Implicit feedback data is extensively explored in recommendation as it is easy to collect and generally applicable. However, predicting users' preferences on implicit feedback data is a challenging task since we can only observe positive (voted) samples and unvoted samples. It is difficult to distinguish the negative samples and the unlabeled positive samples among the unvoted ones. Existing works, such as Bayesian Personalized Ranking (BPR), sample unvoted items as negative samples uniformly and therefore suffer from a critical noisy-label issue. To address this gap, we design an adaptive sampler based on noisy-label robust learning for implicit feedback data. To formulate the issue, we first introduce Bayesian Point-wise Optimization (BPO) to learn a model, e.g., Matrix Factorization (MF), by maximum likelihood estimation. We predict users' preferences with the model and learn it by maximizing the likelihood of observed data labels, i.e., a user prefers her positive samples and has no interest in her unvoted samples. However, in reality, a user may have interest in some of her unvoted samples, which are indeed positive samples mislabeled as negative ones. We then consider the risk of these noisy labels, and propose a Noisy-label Robust BPO (NBPO). NBPO also maximizes the observation likelihood while connecting users' preferences and observed labels through the likelihood of label flipping, based on Bayes' theorem. In NBPO, a user prefers her true positive samples and shows no interest in her true negative samples, hence the optimization quality is dramatically improved. Extensive experiments on two public real-world datasets show the significant improvement of our proposed optimization methods.
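The label-flipping construction described above can be written compactly. A sketch of the likelihood, with notation assumed here (:math:`y_{ui}` the observed label, :math:`r_{ui}` the latent true preference; NBPO's full objective adds priors and the MF parameterization):

.. math::

    P(y_{ui} \mid u, i) \;=\; \sum_{r \in \{0, 1\}} P(y_{ui} \mid r_{ui} = r)\, P(r_{ui} = r \mid u, i)

If voted items are assumed never to be mislabeled, :math:`P(y_{ui} = 1 \mid r_{ui} = 0) = 0`, so only unvoted pairs carry a flipping probability, and maximizing this likelihood jointly learns the flipping probabilities and the underlying preference model.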
Abstract: Given the huge commercial value of recommender systems, there has been growing interest in improving their performance in recent years. The majority of existing methods have achieved great improvement on the click metric, but perform poorly on the conversion metric, possibly due to its extremely sparse feedback signal. To tackle this challenge, we design a novel deep hierarchical reinforcement learning based recommendation framework to model consumers' hierarchical purchase interest. Specifically, the high-level agent captures long-term, sparse conversion interest and automatically sets abstract goals for the low-level agent, while the low-level agent follows the abstract goals and captures short-term click interest via interacting with the real-time environment. To solve the inherent problems in hierarchical reinforcement learning, we propose a novel multi-goals abstraction based deep hierarchical reinforcement learning algorithm (MaHRL). Our proposed algorithm contains three contributions: 1) the high-level agent generates multiple goals to guide the low-level agent in different sub-periods, which reduces the difficulty of approaching high-level goals; 2) different goals share the same state encoder structure and its parameters, which increases the update frequency of the high-level agent and thus accelerates the convergence of our proposed algorithm; 3) a reward assignment mechanism is designed to allocate rewards to each goal so as to coordinate different goals in a consistent direction. We evaluate our proposed algorithm on a real-world e-commerce dataset and validate its effectiveness.
Abstract: Conversational and question-based recommender systems have gained increasing attention in recent years, with users enabled to converse with the system and better control recommendations. Nevertheless, research in the field is still limited compared to traditional recommender systems. In this work, we propose a novel question-based recommendation method, Qrec, to assist users in finding items interactively, by answering automatically constructed and algorithmically chosen questions. Previous conversational recommender systems ask users to express their preferences over items or item facets. Our model, instead, asks users to express their preferences over descriptive item features. The model is first trained offline by a novel matrix factorization algorithm, and then iteratively updates the user and item latent factors online via a closed-form solution based on the user's answers. Meanwhile, our model infers the underlying user belief and preferences over items to learn an optimal question-asking strategy using Generalized Binary Search, so as to ask the user a sequence of questions. Our experimental results demonstrate that our proposed matrix factorization model outperforms the traditional Probabilistic Matrix Factorization model. Further, our proposed Qrec model can greatly improve the performance of state-of-the-art baselines, and it is also effective in the case of cold-start user and item recommendations.
Abstract: As a fundamental yet significant process in personalized recommendation, candidate generation and suggestion effectively help users spot the most suitable items for them. Consequently, identifying substitutable (i.e., interchangeable) items opens up new opportunities to refine the quality of generated candidates. When a user is browsing a specific type of product (e.g., a laptop) to buy, the accurate recommendation of substitutes (e.g., better equipped laptops) can offer the user more suitable options to choose from, thus substantially increasing the chance of a successful purchase. However, existing methods merely treat this problem as mining pairwise item relationships, without considering users' personal preferences. Moreover, the substitutable relationships are implicitly identified through the learned latent representations of items, leading to uninterpretable recommendation results. In this paper, we propose attribute-aware collaborative filtering (A2CF) to perform substitute recommendation by addressing issues from both the personalization and interpretability perspectives. In A2CF, instead of directly modelling user-item interactions, we extract explicit and polarized item attributes from user reviews with sentiment analysis, whereafter the representations of attributes, users, and items are simultaneously learned. Then, by treating attributes as the bridge between users and items, we can thoroughly model the user-item preferences (i.e., personalization) and item-item relationships (i.e., substitution) for recommendation. In addition, A2CF is capable of generating intuitive interpretations by analyzing which attributes a user currently cares about most and comparing the recommended substitutes with her/his currently browsed items at an attribute level. The recommendation effectiveness and interpretation quality of A2CF are further demonstrated via extensive experiments on three real-life datasets.
Abstract: With the emergence of e-commerce services, billions of products are sold online every day. Detecting illegal products among these large-scale online products has become an important and practical research problem. In order to evade detection, malicious sellers usually use camouflaged text to describe their illegal products implicitly. This brings great challenges to current detection systems, since newly camouflaged text can hardly be learned from historical data and the distribution of illegal and normal products is extremely unbalanced. Rather than solving this problem as a classification task, as in most previous efforts, we reformulate it from the perspective of implicit entity linking, which aims to link a camouflaged description to a known product. In this paper, we introduce three types of context that can help infer the implicit entity from camouflaged descriptions and propose an end-to-end contextual representation model to capture the effect of the different contexts. Furthermore, we introduce a symmetric metric to model the matching score between the input title and the product by learning the mutual effects among the contexts. Experimental results on datasets collected from a real-world e-commerce site demonstrate the advantage of the proposed model over state-of-the-art methods.
Abstract: We present Distributed Equivalent Substitution (DES) training, a novel distributed training framework for large-scale recommender systems with dynamic sparse features. DES introduces fully synchronous training to large-scale recommender systems for the first time by reducing communication, thus making the training of commercial recommender systems converge faster and reach a better CTR. DES requires much less communication by substituting the weights-rich operators with computationally equivalent sub-operators and aggregating partial results instead of transmitting the huge sparse weights directly over the network. Due to the use of synchronous training on large-scale Deep Learning Recommendation Models (DLRMs), DES achieves a higher AUC (Area Under the ROC Curve). We successfully apply DES training to multiple popular DLRMs in industrial scenarios. Experiments show that our implementation outperforms the state-of-the-art PS-based training framework, achieving up to 68.7% communication savings and higher throughput compared to other PS-based recommender systems.
Abstract: In this work we focus on multi-turn passage retrieval as a crucial component of conversational search. One of the key challenges in multi-turn passage retrieval comes from the fact that the current turn query is often underspecified due to zero anaphora, topic change, or topic return. Context from the conversational history can be used to arrive at a better expression of the current turn query, defined as the task of query resolution. In this paper, we model the query resolution task as a binary term classification problem: for each term appearing in the previous turns of the conversation decide whether to add it to the current turn query or not. We propose QuReTeC (Query Resolution by Term Classification), a neural query resolution model based on bidirectional transformers. We propose a distant supervision method to automatically generate training data by using query-passage relevance labels. Such labels are often readily available in a collection either as human annotations or inferred from user interactions. We show that QuReTeC outperforms state-of-the-art models, and furthermore, that our distant supervision method can be used to substantially reduce the amount of human-curated data required to train QuReTeC. We incorporate QuReTeC in a multi-turn, multi-stage passage retrieval architecture and demonstrate its effectiveness on the TREC CAsT dataset.
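The distant-supervision labeling for binary term classification can be illustrated in a few lines: a history term is labeled positive when it appears in a passage that is relevant to the current turn but is missing from the current query (tokenization and the stopword list below are simplifying assumptions, not the paper's exact pipeline):

.. code-block:: python

    STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "do", "where"}

    def label_terms(history_turns, current_query, relevant_passage):
        history_terms = {t for turn in history_turns for t in turn.lower().split()}
        query_terms = set(current_query.lower().split())
        passage_terms = set(relevant_passage.lower().split())
        return {term: int(term in passage_terms and term not in query_terms)
                for term in history_terms - STOPWORDS}

    # "polar" and "bears" get positive labels for the under-specified
    # follow-up query, i.e., they should be added to resolve it.
    print(label_terms(["tell me about polar bears"],
                      "where do they live",
                      "polar bears live in the arctic circle"))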
Abstract: In session-based or sequential recommendation, it is important to consider a number of factors, such as long-term user engagement and multiple types of user-item interactions (e.g., clicks and purchases). The current state-of-the-art supervised approaches fail to model them appropriately. Casting the sequential recommendation task as a reinforcement learning (RL) problem is a promising direction. A major component of RL approaches is to train the agent through interactions with the environment. However, it is often problematic to train a recommender in an online fashion due to the requirement to expose users to irrelevant recommendations. As a result, learning the policy from logged implicit feedback is of vital importance, which is challenging due to the pure off-policy setting and the lack of negative rewards (feedback). In this paper, we propose self-supervised reinforcement learning for sequential recommendation tasks. Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL. The RL part acts as a regularizer that drives the supervised layer to focus on specific rewards (e.g., recommending items which may lead to purchases rather than clicks), while the self-supervised layer with cross-entropy loss provides strong gradient signals for parameter updates. Based on this approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC). We integrate the proposed frameworks with four state-of-the-art recommendation models. Experimental results on two real-world datasets demonstrate the effectiveness of our approach.
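A minimal PyTorch sketch of the SQN idea follows: one sequential backbone with two heads, trained with cross-entropy on the next item plus a one-step Q-learning loss (the backbone, reward scheme, and hyperparameters here are illustrative assumptions):

.. code-block:: python

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SQN(nn.Module):
        def __init__(self, n_items, dim=64):
            super().__init__()
            self.emb = nn.Embedding(n_items, dim)
            self.gru = nn.GRU(dim, dim, batch_first=True)
            self.sup_head = nn.Linear(dim, n_items)   # self-supervised head
            self.q_head = nn.Linear(dim, n_items)     # RL head

        def forward(self, seq):                        # seq: (batch, seq_len)
            h, _ = self.gru(self.emb(seq))
            state = h[:, -1]                           # last hidden state
            return self.sup_head(state), self.q_head(state)

    def sqn_loss(model, seq, next_item, reward, next_seq, gamma=0.5):
        logits, q = model(seq)
        ce = F.cross_entropy(logits, next_item)        # supervised signal
        with torch.no_grad():                          # one-step TD target
            _, q_next = model(next_seq)
            target = reward + gamma * q_next.max(dim=1).values
        q_taken = q.gather(1, next_item.unsqueeze(1)).squeeze(1)
        return ce + F.mse_loss(q_taken, target)        # RL part as regularizer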
Abstract: In this work, we aim to investigate the practical task of flexible fashion search with attribute manipulation, where users can retrieve the target fashion items by replacing the unwanted attributes of an available query image with the desired ones (e.g., changing the collar attribute from v-neck to round). Although several pioneering efforts have been dedicated to fulfilling this task, they mainly ignore the potential of generative models in enhancing the visual understanding of target fashion items. To this end, we propose an end-to-end generative attribute manipulation scheme, which consists of a generator and a discriminator. The generator produces a prototype image that meets the user's requirement of attribute manipulation over the query image, with the regularization of visual-semantic consistency and pixel-wise consistency. The discriminator aims to jointly fulfill semantic learning towards correct attribute manipulation and adversarial metric learning for fashion search. Pertaining to the adversarial metric learning, we provide two general paradigms: a pair-based scheme and a triplet-based scheme, where the fake generated prototype images that closely resemble the ground truth images of target items are incorporated as hard negative samples to boost the model performance. Extensive experiments on two real-world datasets verify the effectiveness of our scheme.
Abstract: Shilling attacks against collaborative filtering (CF) models are characterized by several fake user profiles injected into the system by an adversarial party to steer recommendation outcomes toward a malicious goal. The vulnerability of CF models is directly tied to their reliance on the underlying interaction data --- such as the user-item rating matrix (URM) --- to train their models, and to their inherent inability to distinguish genuine profiles from non-genuine ones. The majority of works analyzing shilling attacks so far have mainly focused on properties such as the confronted recommendation models, recommendation outputs, and the users under attack. The under-researched element has been the impact of data characteristics on the effectiveness of shilling attacks on CF models. Toward this goal, this work presents a systematic and in-depth study using an analytical modeling approach built on a regression model to test the hypothesis that URM properties can impact the outcome of CF recommenders under a shilling attack. We ran extensive experiments involving 97,200 simulations on three different domains (movie, business, and music), and showed that URM properties considerably affect the robustness of CF models in shilling attack scenarios. The obtained results can be of great help to system designers in understanding the cause of variations in recommender system performance due to a shilling attack.
Abstract: Most existing recommender systems leverage users' complete original behavioral logs, which are collected from mobile devices, stored by the service provider, and further fed into recommendation models. This may lead to a high risk of privacy leakage, since the recommendation service provider may not be trustworthy. Despite many research efforts on privacy-aware recommendation, the problem of building an effective recommender system that completely preserves user privacy is still open. In this work, we propose a general framework named differentially private local collaborative filtering for recommendation. The designed workflow consists of three steps. First, for the accumulated behavioral logs saved on users' devices, a differentially private protection mechanism is adopted to obfuscate the real interactions before reporting them to the server. Second, after collecting the obfuscated records from all users, the server runs an estimation model to calculate similarities between each pair of items. This step requires no user-relevant data, and thus it does not introduce any auxiliary privacy risk. Last, the server sends the estimated, user-irrelevant item-similarity matrix to each user device, and the recommendation results are inferred locally based on item similarities together with each user's locally stored original behavioral data. To verify our method's efficacy, we conduct extensive experiments on three real-world datasets, demonstrating that our proposed method achieves the best performance compared with state-of-the-art baselines. We further demonstrate that our method still works well under various privacy budgets and different data sparsity levels.
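Step one of the workflow, the on-device obfuscation, can be sketched with the classic randomized response mechanism, which satisfies epsilon-local differential privacy for binary data (the paper's exact mechanism may differ; this is only an illustration):

.. code-block:: python

    import math
    import random

    def randomized_response(interactions, epsilon=1.0):
        """interactions: list of 0/1 flags, one per item; each flag is
        reported truthfully with probability e^eps / (e^eps + 1)."""
        p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
        return [bit if random.random() < p_truth else 1 - bit
                for bit in interactions]

    noisy = randomized_response([1, 0, 0, 1, 0], epsilon=2.0)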
Abstract: Content-aware recommendation approaches are essential for providing meaningful recommendations for new (i.e., cold-start) items in a recommender system. We present a content-aware neural hashing-based collaborative filtering approach (NeuHash-CF), which generates binary hash codes for users and items, such that the highly efficient Hamming distance can be used for estimating user-item relevance. NeuHash-CF is modelled as an autoencoder architecture, consisting of two joint hashing components for generating user and item hash codes. Inspired by semantic hashing, the item hashing component generates a hash code directly from an item's content information (i.e., it generates cold-start and seen item hash codes in the same manner). This contrasts with existing state-of-the-art models, which treat the two item cases separately. The user hash codes are generated directly from the user id, through learning a user embedding matrix. We show experimentally that NeuHash-CF significantly outperforms state-of-the-art baselines by up to 12% NDCG and 13% MRR in cold-start recommendation settings, and by up to 4% in both NDCG and MRR in standard settings where all items are present during training. Our approach uses 2-4x shorter hash codes, while obtaining the same or better performance compared to the state of the art, thus also enabling a notable storage reduction.
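The pay-off of binary codes is that relevance estimation reduces to cheap bit operations. A small illustration (the codes and items below are made up; NeuHash-CF learns the codes rather than hand-picking them):

.. code-block:: python

    def hamming_distance(a: int, b: int) -> int:
        return bin(a ^ b).count("1")        # number of differing bits

    user_code = 0b1011001110100101                   # 16-bit user hash code
    item_codes = {"item_a": 0b1011001010100111,      # 2 bits away from the user
                  "item_b": 0b0100110001011010}      # 16 bits away
    ranked = sorted(item_codes,
                    key=lambda i: hamming_distance(user_code, item_codes[i]))
    print(ranked)                                    # ['item_a', 'item_b']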
Abstract: With its distinct privacy protection advantages, federated recommendation, which stores data locally on devices and trains recommender models federally, is becoming increasingly feasible. However, previous work on federated recommender systems does not take full account of the limitations on storage, RAM, energy and communication bandwidth in the mobile environment. Their models are too large to run easily on mobile devices. Moreover, existing federated recommenders need to fine-tune recommendation models on each device, which makes it hard for them to effectively exploit collaborative filtering (CF) information among users/devices. Our goal in this paper is to design a novel federated learning framework for rating prediction (RP) in this environment that operates on par with state-of-the-art fully centralized RP methods. To this end, we introduce a novel federated matrix factorization (MF) framework, named meta matrix factorization (MetaMF), that is able to generate private item embeddings and RP models with a meta network. Given a user, we first obtain a collaborative vector by collecting useful information with a collaborative memory (CM) module. Then, we employ a meta recommender (MR) module to generate private item embeddings and an RP model based on the collaborative vector in the server. To address the challenge of generating a large number of high-dimensional item embeddings, we devise a rise-dimensional generation (RG) strategy that first generates a low-dimensional item embedding matrix and a rise-dimensional matrix, and then multiplies them to obtain high-dimensional embeddings. Finally, we use the generated model to produce private RPs for the given user on her device. MetaMF shows high capacity even with a small RP model, allowing it to adapt to the limitations of the mobile environment. We conduct extensive experiments on four benchmark datasets to compare MetaMF with existing MF methods and find that MetaMF achieves competitive performance. Moreover, we find that MetaMF achieves higher RP performance than existing federated methods by better exploiting CF among users/devices.
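The rise-dimensional generation (RG) strategy is easy to see in numbers: instead of emitting a full n_items x d_high embedding table, the meta network emits a low-dimensional table plus a small rise matrix whose product has the full size. A sketch with illustrative dimensions (the random vector stands in for the meta network's output):

.. code-block:: python

    import torch

    n_items, d_low, d_high = 10000, 8, 64
    meta_out = torch.randn(n_items * d_low + d_low * d_high)  # stand-in output

    low_table = meta_out[: n_items * d_low].view(n_items, d_low)
    rise = meta_out[n_items * d_low:].view(d_low, d_high)
    item_embeddings = low_table @ rise                # (n_items, d_high)

    # The meta network generates 80,512 numbers instead of 640,000,
    # which is what makes per-user generation affordable on the server.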
Abstract: Many interactive online systems, such as social media platforms or news sites, provide personalized experiences through recommendations or news feed customization based on people's feedback and engagement on individual items (e.g., liking items). In this paper, we investigate how we can support a greater degree of user control in such systems by changing the way the system allows people to gauge the consequences of their feedback actions. To this end, we consider two important aspects of how the system responds to feedback actions: (i) immediacy, i.e., how quickly the system responds with an update, and (ii) visibility, i.e., whether or not changes will get highlighted. We used both an in-lab qualitative study and a large-scale crowd-sourced study to examine the impact of these factors on people's reported preferences and observed behavioral metrics. We demonstrate that UX design which enables people to preview the impact of their actions and highlights changes results in a higher reported transparency, an overall preference for this design, and a greater selectivity in which items are liked.
Abstract: Learning informative representations of users and items from interaction data is of crucial importance to collaborative filtering (CF). Existing embedding functions exploit user-item relationships to enrich the representations, evolving from a single user-item instance to the holistic interaction graph. Nevertheless, they largely model the relationships in a uniform manner, while neglecting the diversity of user intents in adopting items, which could be passing time, pursuing an interest, or shopping for others such as family members. Such a uniform approach to modeling user interests easily results in suboptimal representations, failing to model diverse relationships and disentangle user intents in representations. In this work, we pay special attention to user-item relationships at the finer granularity of user intents. We hence devise a new model, Disentangled Graph Collaborative Filtering (DGCF), to disentangle these factors and yield disentangled representations. Specifically, by modeling a distribution over intents for each user-item interaction, we iteratively refine the intent-aware interaction graphs and representations. Meanwhile, we encourage independence of different intents. This leads to disentangled representations, effectively distilling information pertinent to each intent. We conduct extensive experiments on three benchmark datasets, and DGCF achieves significant improvements over several state-of-the-art models like NGCF, DisenGCN, and MacridVAE. Further analyses offer insights into the advantages of DGCF on the disentanglement of user intents and the interpretability of representations. Our code is available at https://github.com/xiangwang1223/disentangled_graph_collaborative_filtering.
Abstract: Automated essay scoring (AES) is a promising, yet challenging task. Current state-of-the-art AES models ignore the domain difference and cannot effectively leverage data from different domains. In this paper, we propose a domain-adaptive framework to improve the domain adaptability of AES models. We design two domain-independent self-supervised tasks and train them jointly with the AES task. The self-supervised tasks enable the model to capture the shared knowledge across different domains and act as a regularizer that induces a shared feature space. We further propose to enhance the model's robustness to domain variation via a novel domain adversarial training technique. The main idea of the proposed domain adversarial training is to train the model with small, well-designed perturbations to make it robust to domain variation. We obtain the perturbations via a variation of the Fast Gradient Sign Method (FGSM). Our approach achieves new state-of-the-art performance in both in-domain and cross-domain experiments on the ASAP dataset. We also show that the proposed domain adaptation framework is architecture-free and can be successfully applied to different models.
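A generic FGSM-style perturbation on embeddings, in the spirit of the adversarial training described above (the model and loss here are placeholders; the paper uses a variation of FGSM, so details will differ):

.. code-block:: python

    import torch

    def fgsm_perturb(embeddings, loss, epsilon=0.05):
        """Return perturbed embeddings: x + eps * sign(grad_x loss)."""
        grad, = torch.autograd.grad(loss, embeddings, retain_graph=True)
        return embeddings + epsilon * grad.sign()

    emb = torch.randn(4, 16, requires_grad=True)   # stand-in essay embeddings
    loss = (emb ** 2).sum()                        # placeholder task loss
    adv_emb = fgsm_perturb(emb, loss)              # train on both emb and adv_emb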
Abstract: Online reviews play a critical role in persuading or dissuading users when making purchase decisions. And yet very few users take the time to write helpful reviews. Encouragingly, recent advances in deep neural networks offer good potential to produce review-like natural language content. However, there is a lack of large, high-quality labeled data at both the aspect and sentiment level for training. Hence, toward enabling a writing assistant framework that helps users post online reviews, this paper proposes a scalable labeling method for bootstrapping aspect and sentiment labels. Concretely, the proposed approach, Aspect Dependent Online REviews (ADORE), leverages the underlying distribution of reviews and a small seed set of labeled data through carefully designed review segmentation and label assignment. We then show how these labels can inform a generative model to produce aspect- and sentiment-aware reviews. We study the effectiveness of ADORE under various scenarios, such as how end-users perceive the quality of the labels and the aspect-aware generated reviews. Our experiments indicate that the proposed labeling process, along with a regularized joint generative model, leads to high-quality reviews with 90% accuracy.
Abstract: Many top-k document retrieval strategies have been proposed based on the WAND and MaxScore heuristics and yet, from recent work, it is surprisingly difficult to identify the "fastest" strategy. This becomes even more challenging when considering various retrieval criteria, like different ranking models and values of k. In this paper, we conduct the first extensive comparison between ten effective strategies, many of which were never compared before to our knowledge, examining their efficiency under five representative ranking models. Based on a careful analysis of the comparison, we propose LazyBM, a remarkably simple retrieval strategy that bridges the gap between the best performing WAND-based and MaxScore-based approaches. Empirically, LazyBM considerably outperforms all of the considered strategies across ranking models, values of k, and index configurations under both mean and tail query latency.
Abstract: Recommender systems are increasingly used to predict and serve content that aligns with user taste, yet the task of matching new users with relevant content remains a challenge. We consider podcasting to be an emerging medium with rapid growth in adoption, and discuss challenges that arise when applying traditional recommendation approaches to address the cold-start problem. Using music consumption behavior, we examine two main techniques for inferring Spotify users' preferences over more than 200k podcasts. Our results show significant improvements in consumption of up to 50% for both offline and online experiments. We provide extensive analysis of model performance and examine the degree to which music data as an input source introduces bias in recommendations.
Abstract: Email remains one of the most frequently used means of online communication. People spend a significant amount of time every day on email to exchange information, manage tasks and schedule events. Previous work has studied different ways of improving email productivity by prioritizing emails, suggesting automatic replies or identifying intents to recommend appropriate actions. The problem has mostly been posed as a supervised learning problem, where models of different complexities were proposed to classify an email message into a predefined taxonomy of intents or classes. The need for labeled data has always been one of the largest bottlenecks in training supervised models. This is especially the case for many real-world tasks, such as email intent classification, where large-scale annotated examples are either hard to acquire or unavailable due to privacy or data access constraints. Email users often take actions in response to intents expressed in an email (e.g., setting up a meeting in response to an email with a scheduling request). Such actions can be inferred from user interaction logs. In this paper, we propose to leverage user actions as a source of weak supervision, in addition to a limited set of annotated examples, to detect intents in emails. We develop an end-to-end robust deep neural network model for email intent identification that leverages both clean annotated data and noisy weak supervision, along with a self-paced learning mechanism. Extensive experiments on three different intent detection tasks show that our approach can effectively leverage the weakly supervised data to improve intent detection in emails.
Abstract: Unsupervised video quantization compresses original videos into compact binary codes so that video retrieval can be conducted efficiently. In this paper, we make a first attempt to combine quantization with video retrieval, in a method called 3D-UVQ, which obtains high retrieval accuracy with low storage cost. In the proposed framework, we address two main problems: 1) how to design an effective pipeline to perceive video contextual information for video feature extraction; and 2) how to quantize these features for efficient retrieval. To tackle these problems, we propose a 3D self-attention module to exploit the spatial and temporal contextual information, where each pixel is influenced by its surrounding pixels. By taking a further recurrent operation, each pixel can finally capture the global context from all pixels. Then, we propose gradient-based residual quantization, which consists of several quantization blocks that approximate the features gradually. Extensive experimental results on three benchmark datasets demonstrate that our method significantly outperforms the state-of-the-art. An ablation study shows that both the 3D self-attention module and the gradient-based residual quantization improve retrieval performance. Our model is publicly available at https://github.com/brownwolf/3D-UVQ.
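The residual quantization part works block by block: each codebook quantizes what the previous blocks could not explain, and only the code indices need to be stored. The mechanics in NumPy (the codebooks below are random placeholders; in 3D-UVQ they are learned with gradients, which is what makes the residuals shrink):

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    feature = rng.normal(size=64)                       # one video feature
    codebooks = [rng.normal(size=(256, 64)) for _ in range(4)]  # 4 blocks

    residual, codes = feature.copy(), []
    for book in codebooks:
        idx = int(np.argmin(((book - residual) ** 2).sum(axis=1)))
        codes.append(idx)                               # store 1 byte per block
        residual = residual - book[idx]                 # pass on the residual

    reconstruction = sum(book[i] for book, i in zip(codebooks, codes))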
Abstract: Next-basket recommendation (NBR) is prevalent in the e-commerce and retail industries. In this scenario, a user purchases a set of items (a basket) at a time. NBR performs sequential modeling and recommendation based on a sequence of baskets. NBR is in general more complex than the widely studied sequential (session-based) recommendation, which recommends the next item based on a sequence of items. Recurrent neural networks (RNNs) have proved to be very effective for sequential modeling, and have thus been adapted for NBR. However, we argue that existing RNNs cannot directly capture item frequency information in the recommendation scenario. Through careful analysis of real-world datasets, we find that personalized item frequency (PIF) information (which records the number of times that each item is purchased by a user) provides two critical signals for NBR. This has, however, been largely ignored by existing methods. Even though existing RNN-based methods have strong representation ability, our empirical results show that they fail to learn and capture PIF. As a result, existing methods cannot fully exploit the critical signals contained in PIF. Given this inherent limitation of RNNs, we propose a simple item frequency based k-nearest neighbors (kNN) method to directly utilize these critical signals. We evaluate our method on four public real-world datasets. Despite its relative simplicity, our method frequently outperforms the state-of-the-art NBR methods -- including deep learning based methods using RNNs -- when patterns associated with PIF play an important role in the data.
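A compact rendering of the frequency-based kNN idea: represent each user by their PIF vector, find similar users by cosine similarity, and score items by blending the user's own frequencies with the neighbors' (the blending weight and scoring details below are assumptions; the paper's exact formulation may differ):

.. code-block:: python

    import numpy as np

    def recommend(pif, user, k=3, alpha=0.7, top_n=5):
        """pif: (n_users, n_items) matrix of per-user purchase counts."""
        normed = pif / (np.linalg.norm(pif, axis=1, keepdims=True) + 1e-9)
        sims = normed @ normed[user]            # cosine similarity to all users
        sims[user] = -np.inf                    # exclude the user themselves
        neighbors = np.argsort(-sims)[:k]
        scores = alpha * pif[user] + (1 - alpha) * pif[neighbors].mean(axis=0)
        return np.argsort(-scores)[:top_n]      # top-n item ids for the basket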
Abstract: The rapid proliferation of new users and items on the social web has aggravated the gray-sheep user/long-tail item challenge in recommender systems. Historically, cross-domain co-clustering methods have successfully leveraged shared users and items across dense and sparse domains to improve inference quality. However, they rely on shared rating data and cannot scale to multiple sparse target domains (i.e., the one-to-many transfer setting). This, combined with the increasing adoption of neural recommender architectures, motivates us to develop scalable neural layer-transfer approaches for cross-domain learning. Our key intuition is to guide neural collaborative filtering with domain-invariant components shared across the dense and sparse domains, improving the user and item representations learned in the sparse domains. We leverage contextual invariances across domains to develop these shared modules, and demonstrate that with user-item interaction context, we can learn-to-learn informative representation spaces even with sparse interaction data. We show the effectiveness and scalability of our approach on two public datasets and a massive transaction dataset from Visa, a global payments technology company (19% Item Recall, 3x faster vs. training separate models for each domain). Our approach is applicable to both implicit and explicit feedback settings.
Abstract: Session-based recommendation (SR) has become an important and popular component of various e-commerce platforms, which aims to predict the next interacted item based on a given session. Most existing SR models only focus on exploiting the consecutive items in a session interacted by a certain user, to capture the transition pattern among the items. Although some of them have been proven effective, the following two insights are often neglected. First, a user's micro-behaviors, such as the manner in which the user locates an item and the activities that the user performs on an item (e.g., reading comments, adding to cart), offer a fine-grained and deep understanding of the user's preferences. Second, item attributes, also known as item knowledge, provide side information to model the transition pattern among interacted items and alleviate the data sparsity problem. These insights motivate us to propose a novel SR model, MKM-SR, which incorporates user Micro-behaviors and item Knowledge into Multi-task learning for Session-based Recommendation. Specifically, a given session is modeled at the micro-behavior level in MKM-SR, i.e., with a sequence of item-operation pairs rather than a sequence of items, to sufficiently capture the transition pattern in the session. Furthermore, we propose a multi-task learning paradigm that involves learning knowledge embeddings, which serves as an auxiliary task to promote the major SR task. It enables our model to obtain better session representations, resulting in more precise recommendation results. Extensive evaluations on two benchmark datasets demonstrate MKM-SR's superiority over state-of-the-art SR models, justifying the strategy of incorporating knowledge learning.
Abstract: There is increasing attention on next-item recommendation systems, which infer dynamic user preferences from sequential user interactions. While the semantics of an item can change over time and across users, the item correlations defined by user interactions in the short term can be distilled to capture such change and help uncover dynamic user preferences. Thus, we are motivated to develop a novel next-item recommendation framework empowered by sequential hypergraphs. Specifically, the framework: (i) adopts a hypergraph to represent the short-term item correlations and applies multiple convolutional layers to capture multi-order connections in the hypergraph; (ii) models the connections between different time periods with a residual gating layer; and (iii) is equipped with a fusion layer to incorporate both the dynamic item embedding and short-term user intent into the representation of each interaction before feeding it into the self-attention layer for dynamic user modeling. Through experiments on datasets from the e-commerce sites Amazon and Etsy and the information sharing platform Goodreads, the proposed model significantly outperforms the state-of-the-art in predicting the next interesting item for each user.
Abstract: The key to personalized search is to clarify the meaning of the current query based on the user's search history. Previous studies on personalization tried to build user profiles on the basis of historical data to tailor the ranking. However, we argue that user profile based methods do not really disambiguate the current query; they still retain some semantic bias when building user profiles. In this paper, we propose to encode history with context-aware representation learning to enhance the representation of the current query, which is a direct way to clarify the user's information need. Specifically, benefiting from the transformer architecture's ability to aggregate contextual information, we devise a query disambiguation model to parse the meaning of the current query in multiple stages. Moreover, to cover the cases where the current query is not sufficient to express the intent, we train a personalized language model to predict user intent from existing queries. Through the interaction of the two sub-models, we can generate the context-aware representation of the current query and re-rank the results based on it. Experimental results show that our model significantly improves over previous methods.
Abstract: The cold start problem is a long-standing challenge in recommender systems: how can we recommend for new users and new items without any historical interaction records? Recent ML-based approaches have made promising strides over traditional methods. These ML approaches typically combine the user-item interaction data of existing warm start users and items (as in CF-based methods) with auxiliary information about users and items, such as user profiles and item content information (as in content-based methods). However, such approaches face key drawbacks, including the error superimposition issue, where the auxiliary-to-CF transformation error increases the final recommendation error; the ineffective learning issue, where the long distance from the transformation functions to the model output layer hampers effective model learning; and the unified transformation issue, where applying the same transformation function to different users and items results in poor transformations. Hence, this paper proposes a novel model designed to overcome these drawbacks while delivering strong cold start performance. Its three unique features are: (i) a combined separate-training and joint-training framework to overcome the error superimposition issue and improve model quality; (ii) a Randomized Training mechanism to promote the effectiveness of model learning; and (iii) a Mixture-of-Experts Transformation mechanism to provide 'personalized' transformation functions. Extensive experiments on three datasets show the effectiveness of the proposed model over state-of-the-art alternatives.
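Feature (iii) above, the Mixture-of-Experts Transformation, can be sketched as a small gated layer: several expert transforms are mixed per input, so each user or item effectively gets its own transformation (sizes and the gating form are illustrative assumptions):

.. code-block:: python

    import torch
    import torch.nn as nn

    class MoETransform(nn.Module):
        def __init__(self, in_dim=32, out_dim=64, n_experts=4):
            super().__init__()
            self.experts = nn.ModuleList(
                [nn.Linear(in_dim, out_dim) for _ in range(n_experts)])
            self.gate = nn.Linear(in_dim, n_experts)

        def forward(self, x):                                # x: (batch, in_dim)
            weights = torch.softmax(self.gate(x), dim=-1)    # per-input mixing
            outs = torch.stack([e(x) for e in self.experts], dim=1)
            return (weights.unsqueeze(-1) * outs).sum(dim=1)

    transformed = MoETransform()(torch.randn(8, 32))          # (8, 64)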
Abstract: Asking clarifying questions in response to ambiguous or faceted queries has been recognized as a useful technique for various information retrieval systems, especially conversational search systems with limited bandwidth interfaces. Analyzing and generating clarifying questions have been studied recently but the accurate utilization of user responses to clarifying questions has been relatively less explored. In this paper, we enrich the representations learned by Transformer networks using a novel attention mechanism from external information sources that weights each term in the conversation. We evaluate this Guided Transformer model in a conversational search scenario that includes clarifying questions. In our experiments, we use two separate external sources, including the top retrieved documents and a set of different possible clarifying questions for the query. We implement the proposed representation learning model for two downstream tasks in conversational search; document retrieval and next clarifying question selection. Our experiments use a public dataset for search clarification and demonstrate significant improvements compared to competitive baselines.
Abstract: How users think, behave, and make decisions when interacting with information retrieval (IR) systems is a fundamental research problem in the area of interactive IR. There is substantial evidence from behavioral economics and the decision sciences demonstrating that in the context of decision-making under uncertainty, the carriers of value behind actions are gains and losses defined relative to a reference point, rather than the absolute final outcomes. This reference dependence effect is a systematic cognitive bias that has largely been ignored by formal interaction models built upon a series of unrealistic assumptions of user rationality. To address this gap, our work seeks to 1) understand the effects of reference points on search behavior and satisfaction at both the query and session levels; and 2) apply knowledge of reference dependence in predicting users' search decisions and variations in their level of satisfaction. Based on our experiments on three datasets collected from 1,840 task-based search sessions (5,225 query segments), we found that: 1) users' search satisfaction and many aspects of their search behaviors and decisions are significantly associated with relative gains, losses and the associated reference points; 2) users' judgments of session-level satisfaction are significantly affected by peak and end reference moments; and 3) compared to final-outcome-based baselines, models employing gain- and loss-based features often achieve significantly better performance in predicting search decisions and user satisfaction. Adopting a behavioral economics perspective enables us to leverage interdisciplinary insights in advancing IR research, and to increase the explanatory power of formal search models by providing them with a more realistic behavioral and psychological foundation.
Abstract: Today's conversational agents often generate responses that are not sufficiently informative. One way of making them more informative is through the use of external knowledge sources, in so-called Knowledge-Grounded Conversations (KGCs). In this paper, we target the Knowledge Selection (KS) task, a key ingredient in KGC, which is aimed at selecting the appropriate knowledge to be used in the next response. Existing approaches to KS are based on learned representations of the conversation context, that is, previous conversation turns, and use Maximum Likelihood Estimation (MLE) to optimize KS. Such approaches have two main limitations. First, they do not explicitly track what knowledge has been used in the conversation nor how topics have shifted during the conversation. Second, MLE often relies on a limited set of example conversations for training, from which it is hard to infer that facts retrieved from the knowledge source can be re-used in multiple conversation contexts, and vice versa. We propose the Dual Knowledge Interaction Network (DukeNet), a framework to address these challenges. DukeNet explicitly models knowledge tracking and knowledge shifting as dual tasks. We also design Dual Knowledge Interaction Learning (DukeL), an unsupervised learning scheme to train DukeNet by facilitating interactions between knowledge tracking and knowledge shifting, which, in turn, enables DukeNet to explore extra knowledge besides the knowledge encountered in the training set. This dual process also allows us to define rewards that help us optimize both knowledge tracking and knowledge shifting. Experimental results on two public KGC benchmarks show that DukeNet significantly outperforms state-of-the-art methods in terms of both automatic and human evaluations, indicating that DukeNet, enhanced by DukeL, can select more appropriate knowledge and hence generate more informative and engaging responses.
Abstract: For social bots, smooth emotional transitions are essential for delivering a genuine conversation experience to users. Yet, the task is challenging because emotion is too implicit and complicated to understand. Previous studies on retrieval-based conversational models only consider the semantic and functional dependencies of utterances. In this paper, to implement a more empathetic retrieval-based conversation system, we incorporate emotional factors into context-response matching from two aspects: 1) on top of semantic matching, we propose an emotion-aware transition network to model the dynamic emotional flow and enhance context-response matching in retrieval-based dialogue systems with learnt intrinsic emotion features, through a multi-task learning framework; 2) we design several flexible controlling mechanisms to customize social bots in terms of emotion. Extensive experiments on two benchmark datasets indicate that the proposed model can effectively track the flow of emotions throughout a human-machine conversation and significantly improve response selection in dialogues over the state-of-the-art baselines. We also empirically validate the emotion-control effects of our proposed model on three different emotional aspects. Finally, we apply these functionalities to a real IoT application.
Abstract: Past work in information-seeking conversation has demonstrated that people exhibit different conversational styles---for example, in word choice or prosody---that differences in style lead to poorer conversations, and that partners actively align their styles over time. One might assume that this would also be true for conversations with an artificial agent such as Cortana, Siri, or Alexa, and that agents should therefore track and mimic a user's style. We examine this hypothesis with reference to a lab study, where 24 participants carried out relatively long information-seeking tasks with an embodied conversational agent. The agent combined topical language models with a conversational dialogue engine, style recognition and alignment modules. We see that "style" can be measured in human-to-agent conversation, although it looks somewhat different from style in human-to-human conversation and does not correlate with self-reported preferences. There is evidence that people align their style to the agent, and that conversations run more smoothly if the agent detects, and aligns to, the human's style as well.
Abstract: Asking clarifying questions in response to search queries has been recognized as a useful technique for revealing the underlying intent of the query. Clarification has applications in retrieval systems with different interfaces, from traditional web search interfaces to limited bandwidth interfaces as in speech-only and small screen devices. Generation and evaluation of clarifying questions have recently been studied in the literature. However, user interaction with clarifying questions is relatively unexplored. In this paper, we conduct a comprehensive study by analyzing large-scale user interactions with clarifying questions in a major web search engine. In more detail, we analyze the user engagement received by clarifying questions based on different properties of the search queries, the clarifying questions, and their candidate answers. We further study click bias in the data, and show that even though reading clarifying questions and candidate answers does not take significant effort, there still exist some position and presentation biases in the data. We also propose a model for learning representations of clarifying questions based on the user interaction data as implicit feedback. The model is used for re-ranking a number of automatically generated clarifying questions for a given query. Evaluation on both click data and human labeled data demonstrates the high quality of the proposed method.
Abstract: Acquiring accurate summarization and sentiment from user reviews is an essential component of modern e-commerce platforms. Review summarization aims at generating a concise summary that describes the key opinions and sentiment of a review, while sentiment classification aims to predict a sentiment label indicating the sentiment attitude of a review. To effectively leverage the shared sentiment information in both the review summarization and sentiment classification tasks, we propose a novel dual-view model that jointly improves the performance of these two tasks. In our model, an encoder first learns a context representation for the review, then a summary decoder generates a review summary word by word. After that, a source-view sentiment classifier uses the encoded context representation to predict a sentiment label for the review, while a summary-view sentiment classifier uses the decoder hidden states to predict a sentiment label for the generated summary. During training, we introduce an inconsistency loss to penalize the disagreement between these two classifiers. It helps the decoder generate summaries whose sentiment is consistent with the review, and also helps the two sentiment classifiers learn from each other. Experimental results on four real-world datasets from different domains demonstrate the effectiveness of our model.
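One plausible form of the inconsistency loss described above is a divergence between the two classifiers' predicted sentiment distributions (the paper's exact formulation may differ; this is a sketch):

.. code-block:: python

    import torch
    import torch.nn.functional as F

    def inconsistency_loss(source_logits, summary_logits):
        """Penalize disagreement between source-view and summary-view
        sentiment predictions via KL divergence."""
        log_p_src = F.log_softmax(source_logits, dim=-1)
        p_sum = F.softmax(summary_logits, dim=-1)
        return F.kl_div(log_p_src, p_sum, reduction="batchmean")

    loss = inconsistency_loss(torch.randn(4, 3), torch.randn(4, 3))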
Abstract: Text classification in low-resource languages (e.g., Thai) is of great practical value for some information retrieval applications (e.g., sentiment-analysis-based restaurant recommendation). Due to the lack of a large-scale corpus for learning comprehensive text representations, bilingual text classification, which borrows linguistic knowledge from a rich-resource language, becomes a promising solution. Despite the success of bilingual methods, they largely ignore another source of semantic information---the writing system. Noting that most low-resource languages are phonographic languages, we argue that a logographic language (e.g., Chinese) can provide helpful information for improving text classification in some phonographic languages, since a logographic character (i.e., logogram) can represent a sememe or a whole concept, not only a phoneme or a sound. In this paper, by using both a phonographic labeled corpus and its machine-translated logographic corpus, we devise a framework to explore the central theme of utilizing logograms as a "semantic detection assistant". Specifically, from the logographic labeled corpus, we first devise a statistical-significance-based module to pick out informative text pieces. To represent them and further reduce the effects of translation errors, our approach is equipped with Gaussian embeddings whose covariances serve as reliable signals of translation errors. For a test document, all seeds' Gaussian representations are used to convolute the document and produce a logographic embedding, before being fused with its phonographic embedding for the final prediction. Extensive experiments validate the effectiveness of our approach, and further investigations show its generalizability and robustness.
Abstract: With the increasing availability of videos, how to edit them and present the most interesting parts to users, i.e., video highlights, has become an urgent need with many broad applications. As users' visual preferences are subjective and vary from person to person, previous generalized video highlight extraction models fail to tailor to users' unique preferences. In this paper, we study the problem of personalized video highlight recommendation with rich visual content. By dividing each video into non-overlapping segments, we formulate the problem as a personalized segment recommendation task with many new segments in the test stage. The key challenges of this problem lie in: cold-start users with limited video highlight records in the training data, and new segments without any user ratings at the test stage. To tackle these challenges, an intuitive idea is to formulate a user-item interaction graph and apply inductive graph neural network based models for better user and item embedding learning. However, such graph embedding models fail to generalize to unseen items, as they rely on item content features and item link information for item embedding calculation. To this end, we propose an inductive Graph based Transfer learning framework for personalized video highlight Recommendation (TransGRec). TransGRec is composed of two parts: a graph neural network followed by an item embedding transfer network. Specifically, the graph neural network part exploits the higher-order proximity between users and segments to alleviate the user cold-start problem. The transfer network is designed to approximate the learned item embeddings from the graph neural network by taking each item's visual content as input, in order to tackle the new segment problem in the test phase. We design two detailed implementations of the transfer learning optimization function and show how the two parts of TransGRec can be efficiently optimized under them. Note that our proposed framework is generally applicable to any inductive graph based recommendation model to address the new node problem without any link structure. Finally, extensive experimental results on a real-world dataset clearly show the effectiveness of our proposed model.
Abstract: While product recommendation algorithms on the Web are well-supported by a vast amount of interaction data, the same is not true on Voice. A promising approach to mitigate the issue is transfer learning, i.e., transferring the knowledge of customers' shopping behaviors learned from their shopping activities on the Web to Voice. Such a Web-to-Voice transfer is challenging due to customers' distinct shopping behaviors on Voice: customers are inclined to purchase more low-consideration products and are more likely to purchase certain products repeatedly. This paper presents TransV, a novel Web-to-Voice neural transfer network that allows for effective transfer of customers' shopping patterns from the Web to Voice, while taking into account customers' distinct purchase patterns on Voice. Our method extends the state-of-the-art self-attention neural architecture with a multi-level tri-factorization neural component, which allows it to explicitly capture the similarity and dissimilarity of customers' shopping patterns on the Web and Voice. To model repeated purchases, TransV adopts a recency-based copy mechanism that considers the impact of the recency of historical purchases on customers' repeated purchase behavior. Extensive validation on multiple real-world datasets, including two cross-platform datasets from Amazon.com and Amazon Alexa, shows that our method is able to improve voice-based recommendation substantially, by 26.8% compared with non-transfer learning methods.
Abstract: Document categorization, which aims to assign a topic label to each document, plays a fundamental role in a wide variety of applications. Despite the success of existing studies in conventional supervised document classification, they are less concerned with two real problems: (1) the presence of metadata: in many domains, text is accompanied by various additional information such as authors and tags. Such metadata serve as compelling topic indicators and should be leveraged in the categorization framework; (2) label scarcity: labeled training samples are expensive to obtain in some cases, where categorization needs to be performed using only a small set of annotated data. In recognition of these two challenges, we propose MetaCat, a minimally supervised framework to categorize text with metadata. Specifically, we develop a generative process describing the relationships between words, documents, labels, and metadata. Guided by the generative model, we embed text and metadata into the same semantic space to encode heterogeneous signals. Then, based on the same generative process, we synthesize training samples to address the bottleneck of label scarcity. We conduct a thorough evaluation on a wide range of datasets. Experimental results demonstrate the effectiveness of MetaCat over many competitive baselines.
Abstract: Aspect-based sentiment analysis is a substantial step towards text understanding which benefits numerous applications. Since most existing algorithms require a large amount of labeled data or substantial external language resources, applying them on a new domain or a new language is usually expensive and time-consuming. We aim to build an aspect-based sentiment analysis model from an unlabeled corpus with minimal guidance from users, i.e., only a small set of seed words for each aspect class and each sentiment class. We employ an autoencoder structure with attention to learn two dictionary matrices for aspect and sentiment respectively where each row of the dictionary serves as an embedding vector for an aspect or a sentiment class. We propose to utilize the user-given seed words to regularize the dictionary learning. In addition, we improve the model by joining the aspect and sentiment encoder in the reconstruction of sentiment in sentences. The joint structure enables sentiment embeddings in the dictionary to be tuned towards the aspect-specific sentiment words for each aspect, which benefits the classification performance. We conduct experiments on two real data sets to verify the effectiveness of our models.
Abstract: Cold-start problems are arguably the biggest challenges faced by collaborative filtering (CF) used in recommender systems. When few ratings are available, CF models typically fail to provide satisfactory recommendations for cold-start users or to display cold-start items on users' top-N recommendation lists. Data imputation has been a popular choice to deal with such problems in the context of CF, filling empty ratings with inferred scores. Different from (and complementary to) data imputation, this paper presents AR-CF, which stands for Augmented Reality CF, a novel framework for addressing the cold-start problems by generating virtual, but plausible, neighbors for cold-start users or items and augmenting the rating matrix with them as additional information for CF models. Notably, AR-CF not only directly tackles the cold-start problems, but is also effective in improving overall recommendation quality. Via extensive experiments on real-world datasets, AR-CF is shown to (1) significantly improve the accuracy of recommendations for cold-start users, (2) provide a meaningful number of cold-start items to display in users' top-N lists, and (3) also achieve the best accuracy in basic top-N recommendations, all in comparison with recent state-of-the-art methods.
Abstract: Studying competition and market structure at the product level instead of the brand level can provide firms with insights on cannibalization and product line optimization. However, it is computationally challenging to analyze product-level competition for the millions of products available on e-commerce platforms. We introduce Product2Vec, a method based on the representation learning algorithm Word2Vec, to study product-level competition when the number of products is large. The proposed model takes shopping baskets as inputs and, for every product, generates a low-dimensional embedding that preserves important product information. In order for the product embeddings to be useful for firms' strategic decision making, we leverage economic theories and causal inference to propose two modifications to Word2Vec. First, we create two measures, complementarity and exchangeability, that allow us to determine whether product pairs are complements or substitutes. Second, we combine these vectors with random-utility-based choice models to forecast demand. To accurately estimate price elasticities, i.e., how demand responds to changes in price, we modify Word2Vec by removing the influence of price from the product vectors. We show that, compared with state-of-the-art models, our approach is faster, and can produce more accurate demand forecasts and price elasticities.
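To make the two measures concrete, here is a toy sketch under our own assumptions (gensim 4.x attribute names; the price-debiasing modification is omitted): substitutes should have similar input vectors, while complements should pair one product's input vector with a high-affinity output (context) vector of the other.

.. code-block:: python

    import numpy as np
    from gensim.models import Word2Vec

    baskets = [
        ["hot_dogs", "buns", "ketchup"],
        ["hot_dogs", "buns", "mustard"],
        ["burgers", "buns", "ketchup"],
        ["pasta", "tomato_sauce", "parmesan"],
    ]
    model = Word2Vec(sentences=baskets, vector_size=16, window=5,
                     min_count=1, sg=1, negative=5, epochs=200, seed=0)

    def exchangeability(a: str, b: str) -> float:
        # Substitutes occur in similar contexts -> similar input vectors.
        return float(model.wv.similarity(a, b))

    def complementarity(a: str, b: str) -> float:
        # Complements co-occur -> input vector of `a` aligns with the
        # output (context) vector of `b` learned by negative sampling.
        va = model.wv[a]
        ub = model.syn1neg[model.wv.key_to_index[b]]
        return float(va @ ub / (np.linalg.norm(va) * np.linalg.norm(ub) + 1e-9))

    print(exchangeability("ketchup", "mustard"))  # substitutes: high
    print(complementarity("hot_dogs", "buns"))    # complements: high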
Abstract: Providing explanations for recommended items not only allows users to understand the reason for receiving recommendations but also provides users with an opportunity to refine recommendations by critiquing undesired parts of the explanation. While much research focuses on improving the explanation of recommendations, less effort has focused on interactive recommendation by allowing a user to critique explanations. Aside from traditional constraint- and utility-based critiquing systems, the only end-to-end deep learning based critiquing approach in the literature so far, CE-VNCF, suffers from unstable and inefficient training performance. In this paper, we propose a Variational Autoencoder (VAE) based critiquing system to mitigate these issues and improve overall performance. The proposed model generates keyphrase-based explanations of recommendations and allows users to critique the generated explanations to refine their personalized recommendations. Our experiments show promising results: (1) The proposed model is competitive in terms of general performance in comparison to state-of-the-art recommenders, despite having an augmented loss function to support explanation and critiquing. (2) The proposed model can generate high-quality explanations compared to user or item keyphrase popularity baselines. (3) The proposed model is more effective in refining recommendations based on critiquing than CE-VNCF, where the rank of critiquing-affected items drops while general recommendation performance remains stable. In summary, this paper presents a significantly improved method for multi-step deep critiquing based recommender systems based on the VAE framework.
Abstract: We study the problem of making item recommendations to ephemeral groups, which comprise users with limited or no historical activities together. Existing studies target persistent groups with substantial activity history, while ephemeral groups lack historical interactions. To overcome group interaction sparsity, we propose data-driven regularization strategies to exploit both the preference covariance amongst users who are in the same group, as well as the contextual relevance of users' individual preferences to each group. We make two contributions. First, we present a recommender architecture-agnostic framework GroupIM that can integrate arbitrary neural preference encoders and aggregators for ephemeral group recommendation. Second, we regularize the user-group latent space to overcome group interaction sparsity by: maximizing mutual information between representations of groups and group members; and dynamically prioritizing the preferences of highly informative members through contextual preference weighting. Our experimental results on several real-world datasets indicate significant performance improvements (31-62% relative NDCG@20) over state-of-the-art group recommendation techniques.
Abstract: Personalized recommendation plays an important role in many online services. Substantial research has been dedicated to learning embeddings of users and items to predict a user's preference for an item based on the similarity of the representations. In many settings, there is abundant relationship information, including user-item interaction history, user-user and item-item similarities. In an attempt to exploit these relationships to learn better embeddings, researchers have turned to the emerging field of Graph Convolutional Neural Networks (GCNs) and applied GCNs for recommendation. Although these prior works have demonstrated promising performance, directly applying GCNs to process the user-item bipartite graph is suboptimal because the GCNs do not consider the intrinsic differences between user nodes and item nodes. Additionally, existing large-scale graph neural networks use aggregation functions such as sum/mean/max pooling operations to generate a node embedding that considers the node's neighborhood (i.e., the adjacent nodes in the graph), and these simple aggregation strategies fail to preserve the relational information in the neighborhood. To resolve the above limitations, in this paper, we propose a novel framework NIA-GCN, which can explicitly model the relational information between neighbor nodes and exploit the heterogeneous nature of the user-item bipartite graph. We conduct empirical studies on four public benchmarks, demonstrating a significant improvement over state-of-the-art approaches. Furthermore, we generalize our framework to a commercial App store recommendation scenario. We observe significant improvement on a large-scale commercial dataset, demonstrating the practical potential of our proposed solution as a key component of a large-scale commercial recommender system. Finally, online experiments demonstrate that NIA-GCN outperforms the baseline by 10.19% and 9.95% on average in terms of CTR and CVR during a ten-day A/B test in a mainstream App store.
Abstract: Sequential recommender systems (SRS) have become a key technology in capturing users' dynamic interests and generating high-quality recommendations. Current state-of-the-art sequential recommender models are typically based on a sandwich-structured deep neural network, where one or more middle (hidden) layers are placed between the input embedding layer and the output softmax layer. In general, these models require a large number of parameters to obtain optimal performance. Despite their effectiveness, at some point further increasing the model size makes deployment on resource-constrained devices harder. To resolve these issues, we propose a compressed sequential recommendation framework, termed CpRec, in which two generic model shrinking techniques are employed. Specifically, we first propose a block-wise adaptive decomposition to approximate the input and softmax matrices by exploiting the fact that items in SRS obey a long-tailed distribution. To reduce the parameters of the middle layers, we introduce three layer-wise parameter sharing schemes. We instantiate CpRec using a deep convolutional neural network with dilated kernels, considering both recommendation accuracy and efficiency. Through extensive ablation studies, we demonstrate that the proposed CpRec can achieve up to 4-8 times compression rates on real-world SRS datasets. Meanwhile, CpRec is faster during training and inference, and in most cases outperforms its uncompressed counterpart.
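The block-wise decomposition can be pictured as frequency-bucketed factorized embeddings: frequent (head) items keep a full-width embedding, while rarer blocks get a narrower one plus a projection up to the shared model dimension. A sketch under our own assumptions (block boundaries and widths are invented; the paper's exact scheme may differ):

.. code-block:: python

    import torch
    import torch.nn as nn

    class BlockwiseEmbedding(nn.Module):
        def __init__(self, block_sizes=(10_000, 90_000, 900_000),
                     block_dims=(128, 32, 8), model_dim=128):
            super().__init__()
            bounds = [0]
            for n in block_sizes:
                bounds.append(bounds[-1] + n)
            self.register_buffer("offsets", torch.tensor(bounds))
            self.model_dim = model_dim
            # One embedding table per frequency block, narrower for tail items.
            self.embeds = nn.ModuleList(
                nn.Embedding(n, d) for n, d in zip(block_sizes, block_dims))
            # Project narrow blocks up to the shared model dimension.
            self.projs = nn.ModuleList(
                nn.Identity() if d == model_dim else nn.Linear(d, model_dim, bias=False)
                for d in block_dims)

        def forward(self, item_ids: torch.Tensor) -> torch.Tensor:
            out = torch.zeros(*item_ids.shape, self.model_dim,
                              device=item_ids.device)
            for k, (emb, proj) in enumerate(zip(self.embeds, self.projs)):
                mask = (item_ids >= self.offsets[k]) & (item_ids < self.offsets[k + 1])
                if mask.any():
                    out[mask] = proj(emb(item_ids[mask] - self.offsets[k]))
            return out

    emb = BlockwiseEmbedding()
    vecs = emb(torch.tensor([[5, 15_000, 500_000]]))  # head, torso, tail items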
Abstract: Tracking mouse cursor movements can be used to predict user attention on heterogeneous page layouts like SERPs. So far, previous work has relied heavily on handcrafted features, which is a time-consuming approach that often requires domain expertise. We investigate different representations of mouse cursor movements, including time series, heatmaps, and trajectory-based images, to build and contrast both recurrent and convolutional neural networks that can predict user attention to direct displays, such as SERP advertisements. Our models are trained over raw mouse cursor data and achieve competitive performance. We conclude that neural network models should be adopted for downstream tasks involving mouse cursor movements, since they can provide an invaluable implicit feedback signal for re-ranking and evaluation.
Abstract: The importance of e-commerce platforms has driven forward a growing body of research work on e-commerce search. We present the first large-scale and in-depth study of query reformulations performed by users of e-commerce search; the study is based on the query logs of eBay's search engine. We analyze various factors including the distribution of different types of reformulations, changes of search result pages retrieved for the reformulations, and clicks and purchases performed upon the retrieved results. We then turn to address a novel challenge in the e-commerce search realm: predicting whether a user will reformulate her query before presenting her the search results. Using a suite of prediction features, most of which are novel to this study, we attain high prediction quality. Some of the features operate prior to retrieval time, whereas others rely on the retrieved results. While the latter are substantially more effective than the former, we show that the integration of these two types of features is of merit. We also show that high prediction quality can be obtained without considering information from the past about the user or the query she posted. Nevertheless, using these types of information can further improve prediction quality.
Abstract: Finding images matching a user's intention has largely been based on matching a representation of the user's information need against an existing collection of images; for example, using an example image or a written query to express the information need and retrieving images that share similarities with the query or example image. However, such an approach is limited to retrieving only images that already exist in the underlying collection. Here, we present a methodology for generating images matching the user intention instead of retrieving them. The methodology utilizes a relevance feedback loop between a user and generative adversarial neural networks (GANs). GANs can generate novel photorealistic images which are initially not present in the underlying collection, but are generated in response to user feedback. We report experiments (N=29) where participants generate images in four different domains and for various search goals with textual and image targets. The results show that the generated images match the tasks and outperform images selected as baselines from a fixed image collection. Our results demonstrate that generating new information can be more useful for users than retrieving it from a collection of existing information.
Abstract: The rapid growth of user-generated videos on the Internet has intensified the need for text-based video retrieval systems. Traditional methods mainly favor the concept-based paradigm for retrieval with simple queries, and are usually ineffective for complex queries that carry far richer semantics. Recently, the embedding-based paradigm has emerged as a popular approach. It aims to map queries and videos into a shared embedding space where semantically similar texts and videos are much closer to each other. Despite its simplicity, it forgoes the exploitation of the syntactic structure of text queries, making it suboptimal for modeling complex queries. To facilitate video retrieval with complex queries, we propose a Tree-augmented Cross-modal Encoding method that jointly learns the linguistic structure of queries and the temporal representation of videos. Specifically, given a complex user query, we first recursively compose a latent semantic tree to structurally describe the text query. We then design a tree-augmented query encoder to derive a structure-aware query representation and a temporal attentive video encoder to model the temporal characteristics of videos. Finally, both the query and the videos are mapped into a joint embedding space for matching and ranking. This approach yields a better understanding and modeling of complex queries, thereby achieving better video retrieval performance. Extensive experiments on large-scale video retrieval benchmark datasets demonstrate the effectiveness of our approach.
Abstract: Hashing techniques have recently been successfully applied to solve similarity search problems in the information retrieval field because of their significantly reduced storage and high-speed search capabilities. However, the hash codes learned by most recent cross-modal hashing methods lack the ability to comprehensively preserve adequate information, resulting in less than desirable performance. To address this limitation, we propose a novel method termed Nonlinear Robust Discrete Hashing (NRDH) for cross-modal retrieval. The main idea behind NRDH is motivated by the success of neural networks, i.e., nonlinear descriptors, in the field of representation learning: using nonlinear descriptors instead of simple linear transformations is more in line with the complex relationships that exist between the common latent representation and heterogeneous multimedia data in the real world. In NRDH, we first learn a common latent representation through nonlinear descriptors to encode complementary and consistent information from the features of the heterogeneous multimedia data. Moreover, an asymmetric learning scheme is proposed to correlate the learned hash codes with the common latent representation. Empirically, we demonstrate that NRDH is able to successfully generate a comprehensive common latent representation that significantly improves the quality of the learned hash codes. Then, NRDH adopts a linear learning strategy to quickly learn the hash function with the learned hash codes. Extensive experiments performed on two benchmark datasets highlight the superiority of NRDH over several state-of-the-art methods.
Abstract: Personalized search is the task of tailoring the general document ranking list based on user interests to better satisfy the user's information need. Many personalized search models have been proposed and have demonstrated their capability to improve search quality. The general idea of most approaches is to build a user interest profile according to the user's search history, and then re-rank the documents based on the matching scores between the created user profile and candidate documents. In this paper, we propose to solve the problem of personalized search in an alternative way. There are many ambiguous words in natural language, such as 'Apple', and people with different knowledge backgrounds and interests have personalized understandings of these words. Therefore, for different users, such a word should have different semantic representations. Motivated by this idea, we design a personalized search model based on personal word embeddings, referred to as PEPS. Specifically, we train personal word embeddings for each user, in which the representation of each word is mainly decided by the user's personal data. Then, we obtain personalized word and contextual representations of the query and documents with an attention function. Finally, we use a matching model to calculate the matching score between the personalized query and document representations. Experiments on two datasets verify that our model can significantly improve over state-of-the-art personalization models.
Abstract: Voice shopping using natural language introduces new challenges related to customer queries, like handling mispronounced, misexpressed, and misunderstood queries. Voice null queries, which result in no offers, have a negative impact on customers' shopping experience. Query rewriting (QR) attempts to automatically replace null queries with alternatives that lead to relevant results. We present a new approach for pre-retrieval QR of voice shopping null queries. Our proposed QR framework first generates alternative queries using a search index-based approach that targets different potential failures in voice queries. Then, a machine-learning component ranks these alternatives, and the original query is amended by the selected alternative. We provide an experimental evaluation of our approach based on data logs of a commercial voice assistant and an e-commerce website, demonstrating that it outperforms several baselines by more than 22%. Our evaluation also highlights an interesting phenomenon, showing that web shopping null queries are considerably different, and apparently easier to fix, than voice queries. This further substantiates the use of specialized mechanisms for the voice domain. We believe that our proposed framework, mapping tail queries to head queries, is of independent interest since it can be extended and applied to other domains.
Abstract: Hashing-based cross-modal search, which aims to map multiple modality features into binary codes, has attracted increasing attention due to its storage and search efficiency, especially in large-scale database retrieval. Recent unsupervised deep cross-modal hashing methods have shown promising results. However, existing approaches typically suffer from two limitations: (1) They usually learn cross-modal similarity information separately or in a redundant fusion manner, which may fail to capture semantic correlations among instances from different modalities sufficiently and effectively. (2) They seldom consider sampling and weighting schemes for unsupervised cross-modal hashing, resulting in a lack of satisfactory discriminative ability in the hash codes. To overcome these limitations, we propose a novel unsupervised deep cross-modal hashing method called Joint-modal Distribution-based Similarity Hashing (JDSH) for large-scale cross-modal retrieval. First, we propose a novel cross-modal joint-training method that constructs a joint-modal similarity matrix to fully preserve the cross-modal semantic correlations among instances. Second, we propose a sampling and weighting scheme termed the Distribution-based Similarity Decision and Weighting (DSDW) method for unsupervised cross-modal hashing, which is able to generate more discriminative hash codes by pushing semantically similar instance pairs closer and pulling semantically dissimilar instance pairs apart. The experimental results demonstrate the superiority of JDSH compared with several unsupervised cross-modal hashing methods on two public datasets, NUS-WIDE and MIRFlickr.
Abstract: Image search engines rely on appropriately designed ranking features that capture various aspects of the content semantics as well as the historic popularity. In this work, we consider the role of colour in this relevance matching process. Our work is motivated by the observation that a significant fraction of user queries have an inherent colour associated with them. While some queries contain explicit colour mentions (such as 'black car' and 'yellow daisies'), other queries have implicit notions of colour (such as 'sky' and 'grass'). Furthermore, grounding queries in colour is not a mapping to a single colour, but a distribution in colour space. For instance, a search for 'trees' tends to have a bimodal distribution around the colours green and brown. We leverage historical clickthrough data to produce a colour representation for search queries and propose a recurrent neural network architecture to encode unseen queries into colour space. We also show how this embedding can be learnt alongside a cross-modal relevance ranker from impression logs where a subset of the result images were clicked. We demonstrate that the use of a query-image colour distance feature leads to an improvement in the ranker performance as measured by users' preferences of clicked versus skipped images.
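As a concrete rendering of the feature described above, here is an illustrative sketch (bin counts and the distance choice are our own, not the paper's): estimate a query's colour distribution from the histograms of its clicked images, then use a histogram distance to a candidate image as a ranking signal.

.. code-block:: python

    import numpy as np

    N_BINS = 8  # per RGB channel -> 512 colour bins in total

    def colour_histogram(image: np.ndarray) -> np.ndarray:
        """image: (H, W, 3) uint8 array -> normalized 512-bin histogram."""
        bins = image.astype(int) // (256 // N_BINS)
        flat = bins[..., 0] * N_BINS**2 + bins[..., 1] * N_BINS + bins[..., 2]
        hist = np.bincount(flat.ravel(), minlength=N_BINS**3).astype(float)
        return hist / hist.sum()

    def query_colour_distribution(clicked_images) -> np.ndarray:
        # Aggregate the clicked images' histograms into one distribution.
        return np.mean([colour_histogram(im) for im in clicked_images], axis=0)

    def colour_distance_feature(query_dist: np.ndarray,
                                candidate: np.ndarray) -> float:
        # Hellinger distance between the two colour distributions.
        h = colour_histogram(candidate)
        return float(np.sqrt(0.5 * ((np.sqrt(query_dist) - np.sqrt(h)) ** 2).sum()))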
Abstract: We address the web table retrieval task, aiming to retrieve and rank web tables as whole answers to a given information need. To this end, we formally define web tables as multimodal objects. We then suggest a neural ranking model, termed MTR, which makes a novel use of Gated Multimodal Units (GMUs) to learn a joint representation of the query and the different table modalities. We further enhance this model with a co-learning approach which utilizes automatically learned query-independent and query-dependent "helper" labels. We evaluate the proposed solution using both ad hoc queries (WikiTables) and natural language questions (GNQtables). Overall, we demonstrate that our approach surpasses the performance of previously studied state-of-the-art baselines.
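For reference, a Gated Multimodal Unit fuses two modality vectors with a learned, per-dimension gate. A compact PyTorch sketch follows; the dimensions are illustrative, and how MTR wires several table modalities together is not shown here:

.. code-block:: python

    import torch
    import torch.nn as nn

    class GatedMultimodalUnit(nn.Module):
        """Two-modality GMU: the gate z decides, per dimension, how much
        each modality contributes to the fused representation."""
        def __init__(self, dim_a: int, dim_b: int, dim_out: int):
            super().__init__()
            self.proj_a = nn.Linear(dim_a, dim_out)
            self.proj_b = nn.Linear(dim_b, dim_out)
            self.gate = nn.Linear(dim_a + dim_b, dim_out)

        def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
            ha = torch.tanh(self.proj_a(a))
            hb = torch.tanh(self.proj_b(b))
            z = torch.sigmoid(self.gate(torch.cat([a, b], dim=-1)))
            return z * ha + (1 - z) * hb

    # e.g., fuse an encoding of the table caption with one of the table body
    gmu = GatedMultimodalUnit(dim_a=768, dim_b=256, dim_out=128)
    fused = gmu(torch.randn(4, 768), torch.randn(4, 256))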
Abstract: Cross-modal hashing has been widely investigated recently for its efficiency in large-scale cross-media retrieval. However, most existing cross-modal hashing methods learn hash functions in a batch-based learning mode. This mode is not suitable for large-scale datasets due to its high memory consumption, and it loses efficiency when training on streaming data. Online cross-modal hashing can deal with the above problems by learning the hash model in an online learning process. However, existing online cross-modal hashing methods cannot update the hash codes of old data with the newly learned model. In this paper, we propose Online Collective Matrix Factorization Hashing (OCMFH) based on collective matrix factorization hashing (CMFH), which can adaptively update the hash codes of old data according to dynamic changes of the hash model without accessing the old data. Specifically, it learns discriminative hash codes for streaming data by collective matrix factorization in an online optimization scheme. Unlike conventional CMFH, which needs to load the entire dataset into memory, the proposed OCMFH retrains the hash functions using only newly arriving data points. Meanwhile, it generates hash codes for new data and updates the hash codes of old data with the latest updated hash model. In this way, the hash codes of new data and old data are well-matched. Furthermore, a zero-mean strategy is developed to solve the mean-varying problem in the online hash learning process. Extensive experiments on three benchmark datasets demonstrate the effectiveness and efficiency of OCMFH for online cross-media retrieval.
Abstract: The goal of cross-modal retrieval is to search for semantically similar instances in one modality by using a query from another modality. Existing approaches mainly consider the standard scenario, which requires that the source set used for training and the target set used for testing share the same scope of classes. However, they may not generalize well on the zero-shot cross-modal retrieval (ZS-CMR) task, where the target set contains unseen classes that are disjoint from the seen classes in the source set. This task is more challenging due to 1) the absence of the unseen classes during training, 2) inconsistent semantics across seen and unseen classes, and 3) the heterogeneous multimodal distributions between the source and target sets. To address these issues, we propose a novel Correlated Feature Synthesis and Alignment (CFSA) approach that integrates multimodal feature synthesis, common space learning, and knowledge transfer for ZS-CMR. Our CFSA first utilizes class-level word embeddings to guide two coupled Wasserstein generative adversarial networks (WGANs) to synthesize sufficient multimodal features with semantic correlation for stable training. Then the synthetic and true multimodal features are jointly mapped to a common semantic space via an effective distribution alignment scheme, where the cross-modal correlations of different semantic features are captured and the knowledge can be transferred to the unseen classes under the cycle-consistency constraint. Experiments on four benchmark datasets for image-text retrieval and two large-scale datasets for image-sketch retrieval show the remarkable improvements achieved by our CFSA method compared with a range of state-of-the-art approaches.
Abstract: With the increasing popularity of location-aware social media services, next-Point-of-Interest (POI) recommendation has gained significant research interest. The key challenge of next-POI recommendation is to precisely learn users' sequential movements from sparse check-in data. To this end, various embedding methods have been proposed to learn the representations of check-in data in Euclidean space. However, their ability to learn complex patterns, especially hierarchical structures, is limited by the dimensionality of the Euclidean space. Motivated by this limitation, we propose a new research direction that aims to learn the representations of check-in activities in a hyperbolic space, which yields two advantages. First, it can effectively capture the underlying hierarchical structures, which are implied by the power-law distributions of user movements. Second, it provides high representative strength and enables the check-in data to be effectively represented in a low-dimensional space. Specifically, to solve the next-POI recommendation task, we propose a novel hyperbolic metric embedding (HME) model, which projects the check-in data into a hyperbolic space. HME jointly captures sequential transitions, user preferences, and category and region information in a unified approach by learning embeddings in a shared hyperbolic space. To the best of our knowledge, this is the first study to explore a non-Euclidean embedding model for next-POI recommendation. We conduct extensive experiments on three check-in datasets to demonstrate the superiority of our hyperbolic embedding approach over state-of-the-art next-POI recommendation algorithms. Moreover, we conduct experiments on four additional online transaction datasets for next-item recommendation to further demonstrate the generality of our proposed model.
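The standard distance function for such models, assuming the Poincare ball model of hyperbolic space (whether HME uses this exact model is our assumption), can be written as follows:

.. code-block:: python

    import torch

    def poincare_distance(u: torch.Tensor, v: torch.Tensor,
                          eps: float = 1e-5) -> torch.Tensor:
        """Geodesic distance in the Poincare ball:
        d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
        Points near the boundary grow exponentially far apart, which is what
        lets a low-dimensional ball encode deep hierarchies."""
        alpha = torch.clamp(1 - (u * u).sum(-1), min=eps)
        beta = torch.clamp(1 - (v * v).sum(-1), min=eps)
        sq_dist = ((u - v) ** 2).sum(-1)
        x = 1 + 2 * sq_dist / (alpha * beta)
        return torch.acosh(torch.clamp(x, min=1 + eps))

    u = torch.tensor([0.1, 0.0])
    v = torch.tensor([0.0, 0.95])   # near the boundary of the unit ball
    print(poincare_distance(u, v))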
Abstract: Many sequential behaviors, such as purchasing items from time to time, selecting courses in different terms, and collecting event logs periodically, can be formalized as sequential sets of actions or elements, namely temporal sets. Predicting the subsequent set according to the historical sequence of sets can help us make better producing, scheduling, or operating decisions. However, most existing methods were designed for predicting time series or temporal events, and cannot be directly used for temporal sets prediction due to the difficulties of multi-level representations of items and sets, complex temporal dependencies of sets, and evolving dynamics of sequential behaviors. To address these issues, this paper proposes a novel sets prediction method, called DSNTSP (Dual Sequential Network for Temporal Sets Prediction). Our model first learns both item-level representations and set-level representations of set sequences separately based on a transformer framework. Then, a co-transformer module is proposed to capture the multiple temporal dependencies of items and sets. Finally, a gated neural module is designed to predict the subsequent set by fusing all the multi-level correlations and multiple temporal dependencies of items and sets. The experimental results on real-world datasets show that our method leads to significant and consistent improvements compared with other methods.
Abstract: Sequential recommendation and group recommendation are two important branches in the field of recommender system. While considerable efforts have been devoted to these two branches in an independent way, we combine them by proposing the novel sequential group recommendation problem which enables modeling group dynamic representations and is crucial for achieving better group recommendation performance. The major challenge of the problem is how to effectively learn dynamic group representations based on the sequential user-item interactions of group members in the past time frames. To address this, we devise a Group-aware Long- and Short-term Graph Representation Learning approach, namely GLS-GRL, for sequential group recommendation. Specifically, for a target group, we construct a group-aware long-term graph to capture user-item interactions and item-item co-occurrence in the whole history, and a group-aware short-term graph to contain the same information regarding only the current time frame. Based on the graphs, GLS-GRL performs graph representation learning to obtain long-term and short-term user representations, and further adaptively fuse them to gain integrated user representations. Finally, group representations are obtained by a constrained user-interacted attention mechanism which encodes the correlations between group members. Comprehensive experiments demonstrate that GLS-GRL achieves better performance than several strong alternatives coming from sequential recommendation and group recommendation methods, validating the effectiveness of the core components in GLS-GRL.
Abstract: Incorporating temporal information into recommender systems has recently attracted increasing attention from both the industrial and academic research communities. Existing methods mostly reduce the temporal information of behaviors to behavior sequences for subsequent RNN-based modeling. In such a simple manner, crucial time-related signals have been largely neglected. This paper aims to systematically investigate the effects of temporal information in sequential recommendations. In particular, we first identify two elementary temporal patterns of user behaviors: "absolute time patterns" and "relative time patterns", where the former highlights users' time-sensitive behaviors, e.g., people may frequently interact with specific products at certain time points, and the latter indicates how the time interval influences the relationship between two actions. To seamlessly incorporate this information into a unified model, we devise a neural architecture that jointly learns these temporal patterns to model users' dynamic preferences. Extensive experiments on real-world datasets demonstrate the superiority of our model compared with the state of the art.
Abstract: Inductive transfer learning has had a big impact on the computer vision and NLP domains, but has not been used in the area of recommender systems. Even though there has been a large body of research on generating recommendations based on modeling user-item interaction sequences, few of these works attempt to represent and transfer such models for serving downstream tasks where only limited data exists. In this paper, we address the task of effectively learning a single user representation that can be applied to a diversity of tasks, from cross-domain recommendations to user profile predictions. Fine-tuning a large pre-trained network and adapting it to downstream tasks is an effective way to solve such tasks. However, fine-tuning is parameter-inefficient, considering that an entire model needs to be re-trained for every new task. To overcome this issue, we develop a parameter-efficient transfer learning architecture, termed PeterRec, which can be configured on-the-fly to various downstream tasks. Specifically, PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks, which are small but as expressive as learning the entire network. We perform extensive ablation experiments to show the effectiveness of the learned user representation in five downstream tasks. Moreover, we show that PeterRec performs efficient transfer learning in multiple domains, where it achieves comparable or sometimes better performance relative to fine-tuning the entire model's parameters. Codes and datasets are available at https://github.com/fajieyuan/sigir2020_peterrec.
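The injected networks are in the spirit of bottleneck adapters: freeze the pre-trained blocks and train only small residual patches per task. A sketch under our assumptions (sizes and wiring are illustrative; see the paper and repository for the actual design):

.. code-block:: python

    import torch
    import torch.nn as nn

    class Adapter(nn.Module):
        """Small residual bottleneck trained per downstream task while the
        pre-trained weights stay frozen."""
        def __init__(self, dim: int, bottleneck: int = 16):
            super().__init__()
            self.down = nn.Linear(dim, bottleneck)
            self.up = nn.Linear(bottleneck, dim)
            nn.init.zeros_(self.up.weight)  # start as identity (residual = 0)
            nn.init.zeros_(self.up.bias)

        def forward(self, h: torch.Tensor) -> torch.Tensor:
            return h + self.up(torch.relu(self.down(h)))

    def patch_and_freeze(blocks: nn.ModuleList, dim: int) -> nn.ModuleList:
        # Pre-trained parameters remain unaltered; only adapters are trained.
        for p in blocks.parameters():
            p.requires_grad = False
        return nn.ModuleList(Adapter(dim) for _ in blocks)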
Abstract: Practical recommender systems need to be periodically retrained to refresh the model with new interaction data. To pursue high model fidelity, it is usually desirable to retrain the model on both historical and new data, since doing so can account for both long-term and short-term user preferences. However, a full model retraining could be very time-consuming and memory-costly, especially when the scale of historical data is large. In this work, we study the model retraining mechanism for recommender systems, a topic of high practical value that has been relatively little explored in the research community. Our first belief is that retraining the model on historical data is unnecessary, since the model has been trained on it before. Nevertheless, normal training on new data only may easily cause overfitting and forgetting issues, since the new data is of a smaller scale and contains less information about long-term user preferences. To address this dilemma, we propose a new training method that aims to abandon the historical data during retraining by learning to transfer the past training experience. Specifically, we design a neural network-based transfer component, which transforms the old model into a new model that is tailored for future recommendations. To learn the transfer component well, we optimize the "future performance" -- i.e., the recommendation accuracy evaluated in the next time period. Our Sequential Meta-Learning (SML) method offers a general training paradigm that is applicable to any differentiable model. We demonstrate SML on matrix factorization and conduct experiments on two real-world datasets. Empirical results show that SML not only achieves significant speed-ups, but also outperforms full model retraining in recommendation accuracy, validating the effectiveness of our proposals. We release our codes at: https://github.com/zyang1580/SML.
Abstract: Discrete-time event sequences from web users are commonly encountered in a variety of real-world domains, from digital marketing to recommender systems. Most of the literature in the area focuses on supervised prediction tasks for the user's next action or return time. However, little attention has been paid to the challenging task of modeling user evolution and behavior in a quantitative fashion, such as the progress of users' knowledge in an online tutorial, or user exploration and engagement. We propose methods to perform time-varying clustering along with event predictions in a unified, domain-agnostic framework. Our framework can help track the evolution of users as they interact with the platform. We evaluate our methods on three real-world datasets and show that our method performs on par with the supervised baseline on the prediction tasks while providing meaningful clusters.
Abstract: One popular thread of research in computational sarcasm detection involves modeling sarcasm as a contrast between positive and negative sentiment polarities or exploring more fine-grained categories of emotions such as happiness, sadness, surprise, and so on. Most current models, however, treat these affective features independently, without regard for the sequential information encoded among the affective states. In order to explore the role of transitions in affective states, we formulate the task of sarcasm detection as a sequence classification problem by leveraging the natural shifts in various emotions over the course of a piece of text. Experiments conducted on datasets from two different genres suggest that our proposed approach particularly benefits datasets with limited labeled data and longer instances of text.
Abstract: An accurate understanding of a user's query intent can help improve the performance of downstream tasks such as query scoping and ranking. In the e-commerce domain, recent work in query understanding focuses on query-to-product-category mapping. But a small yet significant percentage of queries (on our website, 1.5% or 33M queries in 2019) have non-commercial intent associated with them. These intents are usually associated with non-commercial information-seeking needs such as discounts, store hours, installation guides, etc. In this paper, we introduce Joint Query Intent Understanding (JointMap), a deep learning model that simultaneously learns two different high-level user intent tasks: 1) identifying a query's commercial vs. non-commercial intent, and 2) associating a set of relevant product categories in a taxonomy with a product query. The JointMap model works by leveraging the transfer bias that exists between these two related tasks through a joint-learning process. As curating a labeled dataset for these tasks can be expensive and time-consuming, we propose a distant supervision approach in conjunction with an active learning model to generate high-quality training datasets. To demonstrate the effectiveness of JointMap, we use search queries collected from a large commercial website. Our results show that JointMap significantly improves both "commercial vs. non-commercial" intent prediction and product category mapping, by 2.3% and 10% on average, over state-of-the-art deep learning methods. Our findings suggest a promising direction for modeling intent hierarchies in an e-commerce search engine.
Abstract: Work in information retrieval has traditionally focused on ranking and relevance: given a query, return some number of results ordered by relevance to the user. However, the problem of determining how many results to return, i.e., how to optimally truncate the ranked result list, has received less attention despite being of critical importance in a range of applications. Such truncation is a balancing act between the overall relevance, or usefulness, of the results and the user cost of processing more results. In this work, we propose Choppy, an assumption-free model based on the widely successful Transformer architecture, for the ranked list truncation problem. Needing nothing more than the relevance scores of the results, the model uses a powerful multi-head attention mechanism to directly optimize any user-defined IR metric. We show Choppy improves upon recent state-of-the-art methods.
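One way to read the approach, sketched under our own assumptions (the paper's exact architecture and loss may differ): encode the list of relevance scores with a Transformer, output a distribution over cut positions, and train against the expected value of the chosen metric.

.. code-block:: python

    import torch
    import torch.nn as nn

    class TruncationModel(nn.Module):
        def __init__(self, max_len: int = 300, d_model: int = 64):
            super().__init__()
            self.embed = nn.Linear(1, d_model)
            self.pos = nn.Parameter(torch.randn(max_len, d_model) * 0.02)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, 1)

        def forward(self, scores: torch.Tensor) -> torch.Tensor:
            # scores: (batch, list_len) -> one logit per candidate cut position
            h = self.embed(scores.unsqueeze(-1)) + self.pos[: scores.size(1)]
            return self.head(self.encoder(h)).squeeze(-1)

    def expected_metric_loss(cut_logits: torch.Tensor,
                             metric_at_k: torch.Tensor) -> torch.Tensor:
        # metric_at_k[b, k] = value of the target IR metric (e.g. F1) if the
        # list is truncated after position k; maximize its expectation.
        p = torch.softmax(cut_logits, dim=-1)
        return -(p * metric_at_k).sum(-1).mean()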
Abstract: Cyber attacks are increasingly prevalent and cause significant damage to individuals, businesses, and even countries. In particular, ransomware attacks have grown significantly over the last decade. We conduct the first study on mining insights about ransomware attacks by analyzing query logs from the Bing web search engine. We first extract ransomware-related queries and then build a machine learning model to identify queries where users are seeking support for ransomware attacks. We show that user search behavior and characteristics are correlated with ransomware attacks. We also analyze trends in the temporal and geographical space and validate our findings against publicly available information. Lastly, we conduct a case study on 'Nemty', a popular ransomware, to show that it is possible to derive accurate insights about cyber attacks through query log analysis.
Abstract: Product search is an important way for people to browse and purchase items on e-commerce platforms. While customers tend to make choices based on their personal tastes and preferences, analysis of commercial product search logs has shown that personalization does not always improve product search quality. Most existing product search techniques, however, conduct undifferentiated personalization across search sessions. They either use a fixed coefficient to control the influence of personalization or let personalization take effect all the time through an attention mechanism. The only notable exception is the recently proposed zero-attention model (ZAM), which can adaptively adjust the effect of personalization by allowing the query to attend to a zero vector. Nonetheless, in ZAM, personalization can be at most as important as the query, and the representations of items are static across the collection regardless of the items co-occurring in the user's historical purchases. Aware of these limitations, we propose a transformer-based embedding model (TEM) for personalized product search, which can dynamically control the influence of personalization by encoding the sequence of the query and the user's purchase history with a transformer architecture. Personalization can have a dominant impact when necessary, and interactions between items can be taken into consideration when computing attention weights. Experimental results show that TEM outperforms state-of-the-art personalized product retrieval models significantly.
Abstract: Interactive Information Retrieval (IIR) and Reinforcement Learning (RL) share many commonalities, including an agent that learns while interacting, a long-term and complex goal, and an algorithm that explores and adapts. To successfully apply RL methods to IIR, one challenge is to obtain sufficient relevance labels to train the RL agents, which are notoriously sample-inefficient. However, in a text corpus annotated for a given query, it is not the relevant documents but the irrelevant documents that predominate. This causes very unbalanced training experiences for the agent and prevents it from learning any effective policy. Our paper addresses this issue by using domain randomization to synthesize more relevant documents for training. Our experimental results on the Text REtrieval Conference (TREC) Dynamic Domain (DD) 2017 Track show that the proposed method is able to boost an RL agent's learning effectiveness by 22% when dealing with unseen situations.
Abstract: Predicting user engagement (e.g., click-through rate, conversion rate) on display ads plays a critical role in delivering the right ad to the right user in online advertising. Existing techniques, spanning Logistic Regression to Factorization Machines and their derivatives, focus on modeling the interactions among handcrafted features to predict user engagement. Little attention has been paid to how the ad fits with its context (e.g., the hosting webpage, user demographics). In this paper, we propose to include a metadata feature, which captures the visual appearance of the ad, in the user engagement prediction task. In particular, given a data sample, we combine both the basic context features, which have been widely used in existing prediction models, and the metadata feature, which is extracted from the ad using a state-of-the-art deep learning framework, to predict user engagement. To demonstrate the effectiveness of the proposed metadata feature, we compare the performance of widely used prediction models before and after integrating the metadata feature. Our experimental results on a real-world dataset demonstrate that the metadata feature is able to further improve prediction performance.
Abstract: Term frequency is a common method for identifying the importance of a term in a document. But term frequency ignores how a term interacts with its text context, which is key to estimating document-specific term weights. This paper proposes a Deep Contextualized Term Weighting framework (DeepCT) that maps the contextualized term representations from BERT into context-aware term weights for passage retrieval. The new, deep term weights can be stored in an ordinary inverted index for efficient retrieval. Experiments on two datasets demonstrate that DeepCT greatly improves the accuracy of first-stage passage retrieval algorithms.
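A minimal sketch of the idea with HuggingFace Transformers, under our own simplifications (the regression head below is untrained here; in the paper it is fit against term importance labels, and the quantization scale is a deployment detail):

.. code-block:: python

    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")
    weight_head = nn.Linear(bert.config.hidden_size, 1)  # would be trained

    def contextualized_term_weights(passage: str) -> dict:
        enc = tokenizer(passage, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = bert(**enc).last_hidden_state           # (1, seq, 768)
            w = weight_head(hidden).squeeze(-1).squeeze(0)   # weight per token
        tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
        # Quantize to small non-negative integers so each weight can be stored
        # as a pseudo term frequency in an ordinary inverted index.
        return {t: max(0, round(float(x) * 10)) for t, x in zip(tokens, w)}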
Abstract: Tabular data provide answers to a significant portion of search queries. However, reciting an entire result table is impractical in conversational search systems. We propose to generate natural language summaries as answers to describe the complex information contained in a table. Through crowdsourcing experiments, we build a new conversation-oriented, open-domain table summarization dataset. It includes annotated table summaries, which not only answer questions but also help people explore other information in the table. We utilize this dataset to develop automatic table summarization systems as SOTA baselines. Based on the experimental results, we identify challenges and point out future research directions that this resource will support.
Abstract: In cross-lingual text classification, one seeks to exploit labeled data from one language to train a text classification model that can then be applied to a completely different language. Recent multilingual representation models have made it much easier to achieve this. Still, there may be subtle differences between languages that are neglected when doing so. To address this, we present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations. The resulting model then serves as a teacher to induce labels for unlabeled target language samples that can be used during further adversarial training, allowing us to gradually adapt our model to the target language. Compared with a number of strong baselines, we observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
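The perturbation step can be approximated in one gradient step, as in standard adversarial training on embeddings; a sketch under our assumptions (whether the paper uses exactly this estimator is not stated here):

.. code-block:: python

    import torch

    def adversarial_perturbation(loss: torch.Tensor,
                                 embeddings: torch.Tensor,
                                 epsilon: float = 1.0) -> torch.Tensor:
        """One-step approximation of the max-loss, label-preserving input
        perturbation: move along the loss gradient, L2-normalized per token.
        `embeddings` must have requires_grad=True in the forward pass."""
        grad, = torch.autograd.grad(loss, embeddings, retain_graph=True)
        norm = grad.norm(p=2, dim=-1, keepdim=True).clamp_min(1e-12)
        return epsilon * grad / norm

    # Training step sketch: compute the clean loss, build the perturbation,
    # re-run the forward pass on (embeddings + delta), and minimize both
    # the clean and the adversarial losses.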
Abstract: We report the results of a crowdsourcing user study for evaluating the effectiveness of human-chatbot collaborative conversation systems, which aim to extend the ability of a human user to answer another person's requests in a conversation using a chatbot. We examine the quality of responses from two collaborative systems and compare them with human-only and chatbot-only settings. Our two systems both allow users to formulate responses based on a chatbot's top-ranked results as suggestions. But they encourage the synthesis of human and AI outputs to a different extent. Experimental results show that both systems significantly improved the informativeness of messages and reduced user effort compared with a human-only baseline while sacrificing the fluency and humanlikeness of the responses. Compared with a chatbot-only baseline, the collaborative systems provided comparably informative but more fluent and human-like messages.
Abstract: Identification of new concepts in scientific literature can help power faceted search, scientific trend analysis, knowledge-base construction, and more, but current methods are lacking. Manual identification can't keep up with the torrent of new publications, while the precision of existing automatic techniques is too low for many applications. We present an unsupervised concept extraction method for scientific literature that achieves much higher precision than previous work. Our approach relies on a simple but novel intuition: each scientific concept is likely to be introduced or popularized by a single paper that is disproportionately cited by subsequent papers mentioning the concept. From a corpus of computer science papers on arXiv, we find that our method achieves a Precision@1000 of 99%, compared to 86% for prior work, and a substantially better precision-yield trade-off across the top 15,000 extractions. To stimulate research in this area, we release our code and data.
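The stated intuition is directly computable; a toy rendering under our own data structures and scoring (the paper's actual extraction pipeline is more involved):

.. code-block:: python

    from collections import Counter

    def concept_score(phrase: str, mentions: dict, cites: dict):
        """mentions: phrase -> set of paper ids mentioning it;
        cites: paper id -> set of paper ids it cites.
        Returns the candidate origin paper and the fraction of mentioning
        papers that cite it (a high fraction suggests a real concept)."""
        papers = mentions.get(phrase, set())
        if not papers:
            return None, 0.0
        counts = Counter(c for p in papers for c in cites.get(p, set()))
        if not counts:
            return None, 0.0
        origin, n = counts.most_common(1)[0]
        return origin, n / len(papers)

    mentions = {"word2vec": {"p2", "p3", "p4", "p5"}}
    cites = {"p2": {"p1"}, "p3": {"p1"}, "p4": {"p0", "p1"}, "p5": {"p1"}}
    print(concept_score("word2vec", mentions, cites))  # ('p1', 1.0)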
Abstract: In this paper, we propose a reinforcement learning based, large-scale multi-objective ranking system for optimizing short-video recommendation on an industrial video sharing platform. Multiple competing ranking objectives and implicit selection bias in user feedback are the main challenges on real-world platforms. In order to address these challenges, we integrate a multi-gate mixture of experts and soft actor-critic into the ranking system. We demonstrate that our proposed framework can greatly reduce the loss compared with systems based only on single strategies.
Abstract: Voice-activated intelligent entertainment systems are prevalent in modern TVs. These systems require accurate automatic speech recognition (ASR) models to transcribe voice queries for further downstream language understanding tasks. Currently, labeling audio data for training is the main bottleneck in deploying accurate machine learning ASR models, especially when these models require up-to-date training data to adapt to the shifting customer needs. We present an auto-annotation system, which provides high quality training data without any hand-labeled audios by detecting speech recognition errors and providing possible fixes. Through our algorithm, the auto-annotated training data reaches an overall word error rate (WER) of 0.002; furthermore, we obtained a reduction of 0.907 in WER after applying the auto-suggested fixes.
Abstract: Identifying critical information in real time at the beginning of a disaster is a challenging but important task. This task has recently been addressed using domain adaptation approaches, which eliminate the need for labeled target data and can thus accelerate the process of identifying useful information. We propose to investigate the effectiveness of the Domain Reconstruction Classification Network (DRCN) approach on disaster tweets. DRCN adapts information from target data by reconstructing it with an autoencoder. Experimental results using a sequence-to-sequence autoencoder show that the DRCN approach can improve the performance of both supervised and domain adaptation baseline models.
Abstract: Recipe retrieval is a representative and useful application of cross-modal information retrieval. Recent studies have proposed frameworks for retrieving images of cuisines given textual ingredient lists and instructions. However, the textual form of ingredients easily causes information loss or inaccurate description, especially for cooking novices, who are often the main users of recipe retrieval systems. In this paper, we revisit the task of recipe retrieval by taking images of ingredients as input queries, and retrieving cuisine images by incorporating visual information of ingredients through a deep convolutional neural network. We build an image-to-image recipe retrieval system to validate the effect of ingredient image queries. We further combine the proposed solution with a state-of-the-art cross-modal recipe retrieval model to improve the overall performance of the recipe retrieval task.
Abstract: Graph-based models have been widely applied to fraud detection tasks. Owing to the development of Graph Neural Networks (GNNs), recent works have proposed many GNN-based fraud detectors based on either homogeneous or heterogeneous graphs. These works leverage existing GNNs and aggregate the neighborhood information to learn the node embeddings, which relies on the assumption that the neighbors share similar context, features, and relations. However, the inconsistency problem incurred by fraudsters, i.e., context inconsistency, feature inconsistency, and relation inconsistency, has hardly been investigated. In this paper, we introduce these inconsistencies and design a new GNN framework, GraphConsis, to tackle the inconsistency problem: (1) for the context inconsistency, we propose to combine the context embeddings with node features; (2) for the feature inconsistency, we design a consistency score to filter out inconsistent neighbors and generate corresponding sampling probabilities; (3) for the relation inconsistency, we learn relation attention weights associated with the sampled nodes. Empirical analysis on four datasets demonstrates that the inconsistency problem is critical in fraud detection tasks. Extensive experiments show the effectiveness of GraphConsis. We also release a GNN-based fraud detection toolbox with implementations of SOTA models. The code is available at https://github.com/safe-graph/DGFraud
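A minimal sketch of step (2) above, under our assumption (not stated in the abstract) that the consistency score is a Gaussian function of embedding distance; the bandwidth tau and threshold eps are illustrative hyper-parameters:

.. code-block:: python

    import torch

    def neighbor_sampling_probs(h_center, h_neighbors, tau=1.0, eps=1e-3):
        # Consistency score: Gaussian of the embedding distance between
        # the center node and each neighbor (our simplified reading).
        dist = torch.norm(h_neighbors - h_center.unsqueeze(0), dim=1)
        consistency = torch.exp(-dist ** 2 / tau)
        # Filter clearly inconsistent neighbors, then normalize into
        # neighbor-sampling probabilities.
        consistency = torch.where(consistency < eps,
                                  torch.zeros_like(consistency), consistency)
        return consistency / consistency.sum().clamp_min(1e-12)

    h_c = torch.randn(16)        # center node embedding (toy)
    h_n = torch.randn(5, 16)     # 5 neighbor embeddings (toy)
    print(neighbor_sampling_probs(h_c, h_n))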
Abstract: The identification of relevance with little textual context is a primary challenge in passage retrieval. We address this problem with a representation-based ranking approach that: (1) explicitly models the importance of each term using a contextualized language model; (2) performs passage expansion by propagating the importance to similar terms; and (3) grounds the representations in the lexicon, making them interpretable. Passage representations can be pre-computed at index time to reduce query-time latency. We call our approach EPIC (Expansion via Prediction of Importance with Contextualization). We show that EPIC significantly outperforms prior importance-modeling and document expansion approaches. We also observe that the performance is additive with the current leading first-stage retrieval methods, further narrowing the gap between inexpensive and cost-prohibitive passage ranking approaches. Specifically, EPIC achieves an MRR@10 of 0.304 on the MS-MARCO passage ranking dataset with 78ms average query latency on commodity hardware. We also find that the latency is further reduced to 68ms by pruning document representations, with virtually no difference in effectiveness.
Abstract: IR-based Question Answering (QA) systems typically use a sentence selector to extract the answer from retrieved documents. Recent studies have shown that powerful neural models based on the Transformer can provide an accurate solution to Answer Sentence Selection (AS2). Unfortunately, their computation cost prevents their use in real-world applications. In this paper, we show that standard and efficient neural rerankers can be used to reduce the number of sentence candidates fed to Transformer models without hurting accuracy, thus improving efficiency by up to four times. This is an important finding, as the internal representation of shallower neural models is dramatically different from the one used by a Transformer model, e.g., word vs. contextual embeddings.
Abstract: In cross-language information retrieval using probabilistic structured queries (PSQ), translation probabilities from statistical machine translation act as a bridge between the query and document vocabularies. These translation probabilities are typically estimated from a sentence-aligned corpus on a word-by-word basis without taking the context into account. Neural methods, by contrast, can learn to translate using the context around the words, and this can be used as a basis for estimating context-dependent translation probabilities. However, sparsity limits the accuracy of context-specific translation probabilities for rare words, which can be important in retrieval applications. This paper presents evidence that combining such context-dependent translation probabilities with context-independent translation probabilities learned from the same parallel corpus can yield improvements in the effectiveness of cross-language ranked retrieval.
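One simple way to realize such a combination is linear interpolation of the two distributions per query term; the sketch below uses a toy example and an illustrative weight lam, not necessarily the paper's exact scheme:

.. code-block:: python

    def interpolate_translation_probs(p_context, p_static, lam=0.5):
        # Mix a context-dependent distribution with a context-independent
        # one for a single query term, then re-normalize.
        terms = set(p_context) | set(p_static)
        mixed = {t: lam * p_context.get(t, 0.0)
                    + (1.0 - lam) * p_static.get(t, 0.0) for t in terms}
        z = sum(mixed.values()) or 1.0
        return {t: p / z for t, p in mixed.items()}

    # Hypothetical French translations of the English query term "bank":
    p_ctx = {"banque": 0.9, "rive": 0.1}                # neural, in context
    p_ibm = {"banque": 0.6, "rive": 0.3, "banc": 0.1}   # corpus-level
    print(interpolate_translation_probs(p_ctx, p_ibm))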
Abstract: Conversational systems such as digital assistants can help users perform many simple tasks upon request. Looking to the future, these systems will also need to fully support more complex, multi-step tasks (e.g., following cooking instructions) and help users complete those tasks, e.g., via useful and relevant suggestions made during the process. This paper takes the first step towards automatic generation of task-related suggestions. We introduce proactive suggestion generation as a novel natural language generation task, in which a decision is made to inject a suggestion into an ongoing user dialog and one is then automatically generated. We propose two types of stepwise suggestions: multiple-choice response generation and text generation. We provide several models for each type of suggestion, including binary and multi-class classification, and text generation.
Abstract: In this work, we focus on the contextual document ranking task, which deals with the challenge of user interaction modeling for conversational search. Given a history of user feedback behaviors, such as issuing a query, clicking a document, and skipping a document, we propose to introduce behavior awareness to a neural ranker, resulting in a Hierarchical Behavior Aware Transformers (HBA-Transformers) model. The hierarchy is composed of an intra-behavior attention layer and an inter-behavior attention layer to let the system effectively distinguish and model different user behaviors. Our extensive experiments on the AOL session dataset demonstrate that the hierarchical behavior aware architecture is more powerful than a simple combination of history behaviors. Besides, we analyze the conversational property of queries. We show that coherent sessions tend to be more conversational and thus are more demanding in terms of considering history user behaviors.
Abstract: In professional search tasks such as precision medicine literature search, queries often involve multiple aspects. To assess the relevance of a document, a searcher often painstakingly validates each aspect in the query and follows a task-specific logic to make a relevance decision. In such scenarios, we say the searcher makes a structured relevance judgment, as opposed to the traditional univariate (binary or graded) relevance judgment. Ideally, a search engine can support the searcher's workflow and follow the same steps to predict document relevance. This approach may not only yield highly effective retrieval models, but also open up opportunities for the model to explain its decision in the same "lingo" as the searcher. Using structured relevance judgment data from the TREC Precision Medicine track, we propose novel retrieval models that emulate how medical experts make structured relevance judgments. Our experiments demonstrate that these simple, explainable models can outperform complex, black-box learning-to-rank models.
Abstract: Search engines often provide only limited explanation on why results are ranked in a particular order. This lack of transparency prevents users from understanding results and can potentially give rise to biased or unfair systems. Opaque search engines may also hurt user trust in the presented ranking. This paper presents an investigation of system quality when different degrees of explanation are provided on search engine result pages. Our user study demonstrates that the inclusion of even simplistic explanations leads to better transparency, increased user trust and better search efficiency.
Abstract: We propose a Query by Example (QBE) setting for cross-lingual event retrieval. In this setting, a user describes a query event using example sentences in one language, and a retrieval system returns a ranked list of sentences that describe the query event, but from a corpus in a different language. One challenge in this setting is that a sentence may mention more than one event; hence, matching the query sentence against a document sentence results in noisy matches. We propose a Semantic Role Labeling (SRL) based approach to identify event spans in sentences and use a state-of-the-art sentence matching model, Sentence-BERT (SBERT), to match event spans in queries and documents without any supervision. To evaluate our approach, we construct an event retrieval dataset from ACE, an existing event detection dataset. Experimental results show that it is valuable to predict event spans in queries and documents, and our proposed unsupervised approach achieves superior performance compared to Query Likelihood (QL), Relevance Model 3 (RM3), and SBERT.
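A minimal sketch of the unsupervised matching step, assuming event spans have already been extracted by an SRL system; the sentence-transformers library and the multilingual checkpoint name are our illustrative choices, not necessarily the paper's setup:

.. code-block:: python

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

    query_spans = ["troops attacked the village"]        # English query event
    doc_spans = ["las tropas atacaron la aldea",         # Spanish candidates
                 "el presidente visitó la capital"]

    q = model.encode(query_spans, convert_to_tensor=True)
    d = model.encode(doc_spans, convert_to_tensor=True)
    print(util.cos_sim(q, d))   # rank document spans by cosine similarity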
Abstract: Recent interest in the design of information retrieval systems that can balance an ability to find relevant content with an ability to protect sensitive content creates a need for test collections that are annotated for both relevance and sensitivity. This paper describes the development of such a test collection that is based on the Avocado Research Email Collection. Four people created search topics as a basis for assessing relevance, and two personas describing the sensitivities of representative (but fictional) content creators were created as a basis for assessing sensitivity. These personas were based on interviews with potential donors of historically significant email collections and with archivists who currently manage access to such collections. Two annotators then created relevance and sensitivity judgments for 65 topics, divided approximately equally between the two personas. Annotator agreement statistics indicate fairly good external reliability for both relevance and sensitivity annotations, and a baseline sensitivity classifier trained and evaluated using cross-validation achieved better than 80% $F_1$, suggesting that the resulting collection will likely be useful as a basis for comparing alternative retrieval systems that seek to balance relevance and sensitivity.
Abstract: Word embeddings are essential components for many text data applications. In most work, "out-of-the-box" embeddings trained on general text corpora are used, but they can be less effective when applied to domain-specific settings. Thus, how to create "domain-aware" word embeddings is an interesting open research question. In this paper, we study three methods for creating domain-aware word embeddings based on both general and domain-specific text corpora, including concatenation of embedding vectors, weighted fusion of text data, and interpolation of aligned embedding vectors. Even though the investigated strategies are tailored for domain-specific tasks, they are general enough to be applied to any domain and are not specific to a single task. Experimental results show that all three methods can work well; however, the interpolation method consistently works best.
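A minimal sketch of the interpolation strategy, assuming the two embedding matrices share a vocabulary and have already been aligned (e.g., via an orthogonal Procrustes mapping); the mixing weight alpha is illustrative:

.. code-block:: python

    import numpy as np

    def interpolate_embeddings(e_general, e_domain, alpha=0.5):
        # Convex combination of aligned general and domain vectors,
        # row-normalized so cosine similarities stay well behaved.
        mixed = alpha * e_general + (1.0 - alpha) * e_domain
        return mixed / np.linalg.norm(mixed, axis=1, keepdims=True)

    e_gen = np.random.randn(1000, 300)   # general-corpus vectors (toy)
    e_dom = np.random.randn(1000, 300)   # domain-corpus vectors (toy)
    print(interpolate_embeddings(e_gen, e_dom).shape)    # (1000, 300)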
Abstract: We focus on improving the effectiveness of a Virtual Assistant (VA) in recognizing emerging entities in spoken queries. We introduce a method that uses historical user interactions to forecast which entities will gain in popularity and become trending, and it subsequently integrates the predictions within the Automated Speech Recognition (ASR) component of the VA. Experiments show that our proposed approach results in a 20% relative reduction in errors on emerging entity name utterances without degrading the overall recognition quality of the system.
Abstract: Recommender systems based on collaborative filtering are highly vulnerable to data poisoning attacks, where a determined attacker injects fake users with false user-item feedback, with an objective to either corrupt the recommender system or promote/demote a target set of items. Recently, differential privacy was explored as a defense technique against data poisoning attacks in the typical machine learning setting. In this paper, we study the effectiveness of differential privacy against such attacks on matrix factorization based collaborative filtering systems. Concretely, we conduct extensive experiments for evaluating robustness to injection of malicious user profiles by simulating common types of shilling attacks on real-world data and comparing the predictions of typical matrix factorization with differentially private matrix factorization.
Abstract: In this paper, we present results from an exploratory study to investigate users' behaviors and preferences for three different styles of search results presentation in a virtual reality (VR) head-mounted display (HMD). Prior work in 2D displays has suggested possible benefits of presenting information in ways that exploit users' spatial cognition abilities. We designed a VR system that displays search results in three different spatial arrangements: a list of 8 results, a 4x5 grid, and a 2x10 arc. These spatial display conditions were designed to differ in terms of the number of results displayed per page (8 vs 20) and the amount of head movement required to scan the results (list < grid < arc). Thirty-six participants completed 6 search trials in each display condition (18 total). For each trial, the participant was presented with a display of search results and asked to find a given target result or to indicate that the target was not present. We collected data about users' behaviors with and perceptions about the three display conditions using interaction data, questionnaires, and interviews. We explore the effects of display condition and target presence on behavioral measures (e.g., completion time, head movement, paging events, accuracy) and on users' perceptions (e.g., workload, ease of use, comfort, confidence, difficulty, and lostness). Our results suggest that there was no difference in accuracy among the display conditions, but that users completed tasks more quickly using the arc. However, users also expressed lower preferences for the arc, instead preferring the list and grid displays. Our findings extend prior research on visual search into the area of 3-dimensional result displays for interactive information retrieval in VR HMD environments.
Abstract: We present a study on the importance of information retrieval (IR) techniques for both the interpretability and the performance of neural question answering (QA) methods. We show that current state-of-the-art transformer methods (like RoBERTa) poorly encode simple information retrieval (IR) concepts such as lexical overlap between the query and the document. To mitigate this limitation, we introduce a supervised RoBERTa QA method that is trained to mimic the behavior of BM25 and the soft-matching idea behind embedding-based alignment methods. We show that fusing these simple lexical-matching IR concepts into transformer techniques improves a) their (lexical-matching) interpretability, b) retrieval performance, and c) QA performance on two multi-hop QA datasets. We further highlight the lexical-chasm-bridging capabilities of transformer methods by analyzing the attention distributions of the supervised RoBERTa classifier over context versus lexically-matched token pairs.
Abstract: Concept maps provide concise structured representations for documents regarding their important concepts and interaction links, which have been widely used for document summarization and downstream tasks. However, the construction of concept maps often relies heavily on heuristic design and auxiliary tools. Recent popular neural network models, on the other hand, are shown effective in tasks across various domains, but fall short in interpretability and are prone to overfitting. In this work, we bridge the gap between concept map construction and neural network models, by designing doc2graph, a novel weakly-supervised text-to-graph neural network, which generates concept maps in the middle and is trained towards document-level tasks like document classification. In our experiments, doc2graph outperforms both its traditional baselines and neural counterparts by significant margins in document classification, while producing high-quality interpretable concept maps as document structured summarization.
Abstract: We introduce a new metric for measuring the performance of multi-class classifiers. This metric is a generalization of the $F_1$ score, which is defined for binary classifiers, and offers significant improvements over other generalizations such as micro- and macro-averaging. In particular, one can select coefficients that weight the per-class precision and recall, as well as the overall class importance, with a robust mathematical interpretation. For certain parameter choices, our metric yields the macro-averaged statistic as a special case. We demonstrate the efficacy of this metric on an application in genealogical search.
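The abstract does not spell out the formula; one plausible instantiation of such a weighted generalization (our assumption, for illustration only) combines per-class precision $P_c$ and recall $R_c$ with class-importance weights $w_c$ and per-class trade-off parameters $\beta_c$:

.. math::

   F = \sum_{c=1}^{C} w_c \, \frac{(1+\beta_c^2)\, P_c R_c}{\beta_c^2 P_c + R_c},
   \qquad \sum_{c=1}^{C} w_c = 1.

Setting $w_c = 1/C$ and $\beta_c = 1$ recovers the macro-averaged $F_1$, matching the special case mentioned above.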
Abstract: In this study, we investigate interaction-based neural matching models for ad-hoc cross-lingual information retrieval (CLIR) using cross-lingual word embeddings (CLWEs). With experiments conducted on the CLEF collection over four language pairs, we evaluate and provide insight into different neural model architectures, different ways to represent query-document interactions and word-pair similarity distributions in CLIR. This study paves the way for learning an end-to-end CLIR system using CLWEs.
Abstract: Determining user geolocation is vital to various real-world applications on the internet, such as online marketing and event detection. To identify the geolocations of users, their behaviors on social media like published posts and social interactions can be strong evidence. However, most of the existing social media based approaches individually learn from text contexts and social networks. This separation can not only lead to sub-optimal performance but also ignore the distinct importance of two resources for different users. To address this challenge, we propose a novel end-to-end framework, Hybrid-attentive User Geolocation (HUG), to jointly model post texts and user interactions in social media. The hybrid attention mechanism is introduced to automatically determine the importance of texts and social networks for each user while social media posts and interactions are modeled by a graph attention network and a language attention network. Extensive experiments conducted on three benchmark geolocation datasets using Twitter data demonstrate that HUG significantly outperforms competitive baseline methods. The in-depth analysis also indicates the robustness and interpretability of HUG.
Abstract: With the exponential growth of information on the internet, users rely on search engines to find the documents they need. However, user queries are often short. The inherent ambiguity of short queries imposes great challenges on search engines trying to understand user intent. Query suggestion is one key technique for search engines to augment user queries so that they can better understand user intent. In the past, query suggestion has relied on either term-frequency-based methods with little semantic understanding of the query, or word-embedding-based methods with little personalization effort. Here, we present a sequence-to-sequence-model-based query suggestion framework that is capable of naturally modeling structured, personalized features and unstructured query texts. This capability opens up the opportunity to better understand query semantics and user intent at the same time. As the largest professional network, LinkedIn has the advantage of utilizing a rich amount of accurate member profile information to personalize query suggestions. We applied this framework to LinkedIn production traffic and showed that personalized query suggestions significantly improved member search experience as measured by key business metrics at LinkedIn.
Abstract: Although neural network models enjoy tremendous advantages in handling image and text data, tree-based models still remain competitive for learning-to-rank tasks with numerical data. A major strength of tree-based ranking models is their insensitivity to different feature scales, while neural ranking models may suffer from features with varying scales or skewed distributions. Feature transformation or normalization is a simple technique which preprocesses input features to mitigate their potential adverse impact on neural models. However, due to a lack of studies, it is unclear to what extent feature transformation can benefit neural ranking models. In this paper, we aim to answer this question by providing empirical evidence for learning-to-rank tasks. First, we present a list of commonly used feature transformation techniques and perform a comparative study on multiple learning-to-rank data sets. Then we propose a mixture feature transformation mechanism which can automatically derive a mixture of basic feature transformation functions to achieve the optimal performance. Our experiments show that applying feature transformation can substantially improve the performance of neural ranking models compared to directly using the raw features. In addition, the proposed mixture transformation method can further improve the performance of the ranking model without any additional human effort.
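As a sketch of what a learned mixture of basic transforms might look like (the set of basis functions and the softmax gating below are our assumptions, not the paper's exact mechanism):

.. code-block:: python

    import torch
    import torch.nn as nn

    class MixtureTransform(nn.Module):
        # Learns, per feature, a softmax mixture of identity, log1p,
        # and sqrt transforms applied to non-negative raw features.
        def __init__(self, n_features):
            super().__init__()
            self.logits = nn.Parameter(torch.zeros(n_features, 3))

        def forward(self, x):                       # x: (batch, n_features)
            basis = torch.stack(
                [x, torch.log1p(x), torch.sqrt(x + 1e-6)], dim=-1)
            w = torch.softmax(self.logits, dim=-1)  # per-feature weights
            return (basis * w).sum(dim=-1)

    x = torch.rand(4, 10) * 100.0                   # toy raw ranking features
    print(MixtureTransform(10)(x).shape)            # torch.Size([4, 10])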
Abstract: Peer reviews form an essential part of scientific communication. Research papers and proposals are reviewed by several peers before they are finally accepted or rejected. The procedure requires experts to review the research work; then the area/program chair or editor writes a meta-review summarizing the review comments and makes a decision based on the reviewers' recommendations. In this paper, we present MetaGen, a novel meta-review generation system which takes the peer reviews as input and produces an assistive meta-review. This can help the area/program chair write a meta-review and make the final decision on the paper/proposal, and it can thus also help speed up the review process for conferences/journals where a large number of submissions must be handled within a stipulated time. Our approach first generates an extractive draft and then uses a fine-tuned UniLM (Unified Language Model) to predict the acceptance decision and produce the final meta-review in an abstractive manner. To the best of our knowledge, this is the first work in the direction of meta-review generation. Evaluation based on ROUGE scores shows promising results, and comparison with a few state-of-the-art summarizers demonstrates the effectiveness of the system.
Abstract: Computing the similarity between two legal case documents is a challenging task, for which text-based and network-based measures have been proposed in the literature. All prior network-based similarity methods considered only a precedent citation network among case documents (PCNet). However, this approach misses an important source of legal knowledge: the hierarchy of legal statutes that are applicable in a given legal jurisdiction (e.g., a country). We propose to augment the PCNet with the hierarchy of legal statutes to form a heterogeneous network, Hier-SPCNet. Experiments over a set of Indian Supreme Court case documents show that Hier-SPCNet enables significantly better document similarity estimation compared to existing approaches using PCNet. We also show that the proposed network-based method can complement text-based measures for better estimation of legal document similarity.
Abstract: Internet insurance products differ markedly from traditional e-commerce goods in their complexity, low purchasing frequency, etc., so the cold-start problem is even more severe. In the traditional e-commerce field, several cross-domain recommendation (CDR) methods have been studied to infer the preferences of cold-start users based on their preferences in other domains. However, these CDR methods cannot be applied directly to the insurance domain due to product complexity. In this paper, we propose a Deep Cross-Domain Insurance Recommendation System (DCDIR) for cold-start users. Specifically, we first learn more effective user and item latent features in both domains. In the target domain, given the complexity of insurance products, we design a meta-path based method over an insurance product knowledge graph. In the source domain, we employ a GRU to model users' dynamic interests. We then learn a feature mapping function using multi-layer perceptrons. We apply DCDIR to our company's dataset and show that DCDIR significantly outperforms state-of-the-art solutions.
Abstract: Pre-trained models have brought significant improvements to many NLP tasks and have been extensively analyzed, but little is known about the effect of fine-tuning on specific tasks. Intuitively, people may agree that a pre-trained model already learns semantic representations of words (e.g., synonyms are closer to each other) and that fine-tuning further improves capabilities which require more complicated reasoning (e.g., coreference resolution, entity boundary detection, etc.). However, verifying these arguments analytically and quantitatively is a challenging task, and few works focus on this topic. In this paper, inspired by the observation that most probing tasks involve identifying matched pairs of phrases (e.g., coreference requires matching an entity and a pronoun), we propose a pairwise probe to understand BERT fine-tuning on the machine reading comprehension (MRC) task. Specifically, we identify five phenomena in MRC. Using pairwise probing tasks, we compare the performance of each layer's hidden representations in pre-trained and fine-tuned BERT. The proposed pairwise probe alleviates distraction from inaccurate model training and enables a robust and quantitative comparison. Our experimental analysis leads to highly confident conclusions: (1) fine-tuning has little effect on fundamental, low-level information and general semantic tasks; (2) for specific abilities required by downstream tasks, fine-tuned BERT is better than pre-trained BERT, and such gaps become obvious after the fifth layer.
Abstract: Adversarial attacks on reinforcement learning-based interactive recommendation systems are hard to detect at an early stage. We propose attack-agnostic detection for such systems. We first craft adversarial examples to show their diverse distributions and then augment recommendation systems by detecting potential attacks with a deep learning-based classifier trained on the crafted data. Finally, we study the attack strength and frequency of adversarial examples and evaluate our model on standard datasets with multiple crafting methods. Our extensive experiments show that most adversarial attacks are effective, and that both attack strength and attack frequency impact the attack performance. The strategically-timed attack achieves comparable attack performance with only 1/3 to 1/2 the attack frequency. Besides, our black-box detector trained with one crafting method generalizes over several other crafting methods.
Abstract: Bundle recommendation aims to recommend a bundle of items for a user to consume as a whole. Existing solutions integrate user-item interaction modeling into bundle recommendation by sharing model parameters or learning in a multi-task manner, which cannot explicitly model the affiliation between items and bundles, and fail to explore the decision-making involved when a user chooses bundles. In this work, we propose a graph neural network model named BGCN (short for Bundle Graph Convolutional Network) for bundle recommendation. BGCN unifies user-item interaction, user-bundle interaction, and bundle-item affiliation into a heterogeneous graph. With item nodes as the bridge, graph convolutional propagation between user and bundle nodes makes the learned representations capture item-level semantics. Through training with a hard-negative sampler, the user's fine-grained preferences for similar bundles are further distinguished. Empirical results on two real-world datasets demonstrate the strong performance gains of BGCN, which outperforms the state-of-the-art baselines by 10.77% to 23.18%.
Abstract: Answer selection plays a crucial role in natural language processing and has thus received much attention. Many recent works treat it as an ad-hoc retrieval problem in which ranking optimization plays a large part. Previous works mainly consider the similarity between the answer and the question, but rarely utilize the similarity and dissimilarity relationships within the answer candidate set. In this paper, we propose a similarity aggregation method to rerank the results produced by different baseline neural networks. The key idea of similarity aggregation is that true matches should not only be similar to other true matches but also dissimilar from false matches; moreover, inspired by multi-view verification, true answers should rank consistently with respect to the question across different baseline methods, and likewise for false answers. Empirical results on the public benchmark task of answer selection demonstrate that our method yields significant improvements over the baseline methods.
Abstract: Predicting tags for a given item and leveraging tags to assist item recommendation are two popular research topics in the field of recommender systems. Previous studies mostly focus on only one of them. However, we believe that these tasks are inherently correlated with each other: tags can provide additional information to profile items for more accurate recommendation, and user behaviors can help to infer item relationships that benefit the item tagging process. In order to take advantage of such mutually influential signals, we propose to integrate item tagging and tag-based recommendation into a unified model. We first design a basic framework in which the user-item interaction signals are leveraged to supervise the item tagging process. We then extend the basic model with a bootstrapping technique to circulate such mutual improvements between the two tasks. We conduct extensive experiments on real-world datasets to demonstrate our model's superiority.
Abstract: Text classification requires a deep understanding of the linguistic features in text; in particular, the intra-sentential (local) and inter-sentential (global) features. Models that operate on word sequences have been successfully used to capture the local features, yet they are not effective in capturing the global features in long text. We investigate graph-level extensions to such models and propose a novel architecture for combining alternative text features. It uses an attention mechanism to dynamically decide how much information to use from a sequence- or graph-level component. We evaluated different architectures on a range of text classification datasets, and graph-level extensions were found to improve performance on most benchmarks. In addition, the attention-based architecture, as adaptively learned from the data, outperforms the generic and fixed-value concatenation ones.
Abstract: In contrast to traditional IR, which retrieves a set of topically relevant documents given a user query, we investigate causal retrieval, which involves retrieving a set of documents that describe potential causes leading to an effect specified in the query. We argue that the nature of causal relevance should differ from that of traditional topical relevance: although the causally relevant documents would have partial term overlap with the topically relevant ones for a query, a majority of these documents are expected to use a different set of terms to describe the causes possibly leading to their effects. To address this, we propose a feedback model to estimate a distribution over terms which are relatively infrequent but associated with high weights in the topically relevant distribution, indicating potential causal relevance. Our experiments demonstrate that such a feedback model is substantially more effective than traditional IR models and a number of other causality heuristic baselines.
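As a sketch of the re-weighting idea (the IDF-style factor below is our illustrative choice; the paper's estimator may differ), terms with high relevance-model weight but low collection frequency are promoted:

.. code-block:: python

    import math

    def causal_feedback_weights(rm_weights, collection_freq, n_docs):
        # Up-weight terms that are heavy in the topical relevance model
        # yet rare in the collection, as candidate causal terms.
        return {t: w * math.log(1.0 + n_docs / (1 + collection_freq.get(t, 0)))
                for t, w in rm_weights.items()}

    rm = {"pollution": 0.30, "emissions": 0.25, "asthma": 0.05}   # toy RM
    cf = {"pollution": 50000, "emissions": 30000, "asthma": 800}  # toy DF
    print(causal_feedback_weights(rm, cf, n_docs=1000000))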
Abstract: Non-factoid question answering (QA) is one of the most extensive yet challenging application and research areas in retrieval-based question answering. In particular, answers to non-factoid questions can often be too lengthy and redundant to comprehend, which creates great demand for answer summarization in non-factoid QA. However, in current answer summarization studies, the multi-level interactions between QA pairs and the interrelations among different answer sentences are usually modeled separately. In this paper, we propose a unified model that bridges hierarchical and sequential context modeling for question-driven extractive answer summarization. Specifically, we design a hierarchical compare-aggregate method to integrate the interactions between QA pairs at both the word level and the sentence level into the final question and answer representations. We then apply a question-aware sequential extractor to produce a summary of the lengthy answer. Experimental results show that answer summarization benefits from both hierarchical and sequential context modeling, and our method achieves superior performance on WikiHowQA and PubMedQA.
Abstract: Predicting the survival of cancer patients holds significant meaning for public health and has attracted increasing attention in medical information communities. In this study, we propose a novel framework for cancer survival prediction named Multimodal Graph Neural Network (MGNN), which explores the features of real-world multimodal data such as gene expression, copy number alteration, and clinical data in a unified framework. In order to explore the inherent relations, we first construct bipartite graphs between patients and multimodal data. Subsequently, a graph neural network is adopted to obtain the embedding of each patient on the different bipartite graphs. Finally, a multimodal fusion neural layer is designed to fuse the features from the different modalities. The output of our method is the classification of each patient into short-term or long-term survival. Experimental results on a breast cancer dataset demonstrate that MGNN outperforms all baselines. Furthermore, we test the trained model on a lung cancer dataset, and the experimental results verify its strong robustness in comparison with state-of-the-art methods.
Abstract: Graph retrieval from a large corpus of graphs has a wide variety of applications, e.g., sentence retrieval using words and dependency parse trees for question answering, image retrieval using scene graphs, and molecule discovery from a set of existing molecular graphs. In such graph search applications, nodes, edges and associated features bear distinctive physical significance. Therefore, a unified, trainable search model that efficiently returns corpus graphs that are highly relevant to a query graph has immense potential impact. In this paper, we present an effective, feature and structure-aware, end-to-end trainable neural match scoring system for graphs. We achieve this by constructing the product graph between the query and a candidate graph in the corpus, and then conducting a family of random walks on the product graph, which are aggregated into the match score using a network whose parameters can be trained. Experiments show the efficacy of our method, compared to competitive baseline approaches.
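A toy version of the core construction: walks on the Kronecker product of the two adjacency matrices correspond to simultaneous walks on both graphs, so their count is a crude, untrained match score with uniform weights; the paper instead learns how walk statistics are aggregated:

.. code-block:: python

    import numpy as np

    def product_walk_score(A_q, A_c, k=3):
        # Total number of walks of length <= k on the product graph.
        P = np.kron(A_q, A_c)          # adjacency of the product graph
        walks, score = np.eye(P.shape[0]), 0.0
        for _ in range(k):
            walks = walks @ P
            score += walks.sum()
        return score

    A_q = np.array([[0, 1], [1, 0]], dtype=float)             # query: 1 edge
    A_c = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # candidate path
    print(product_walk_score(A_q, A_c))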
Abstract: Session-based recommendation aims to predict the next item that a user will interact with, based solely on anonymous sessions. In real-life scenarios, a user's preferences are usually varied, and distinguishing the different preferences within a session is important. However, previous studies focus mostly on modeling transitions between items, ignoring the mining of the user's various preferences. In this paper, we propose a Hierarchical Leaping Network (HLN) to explicitly model users' multiple preferences by grouping items that share some relationship. We first design a Leap Recurrent Unit (LRU), which is capable of skipping preference-unrelated items and accepting knowledge of previously learned preferences. We then introduce a Preference Manager (PM) to manage those learned preferences and produce an aggregated preference representation each time the LRU reruns. The final output of the PM, which contains the user's multiple preferences, is used to make recommendations. Experiments on two benchmark datasets demonstrate the effectiveness of HLN. Furthermore, the visualization of the explicitly learned subsequences also confirms our idea.
Abstract: Factorization Machine (FM)-based models can only reveal the relationship between a pair of features. DNN-based factorization models, which combine FM with a multi-layer perceptron (MLP) by feeding all feature embeddings to the MLP, can only reveal the relationships among features implicitly. Some other DNN-based methods apply CNNs to generate feature interactions. However, (1) they model feature interactions at the bit-wise level (where only part of an embedding is utilized to generate feature interactions), which cannot comprehensively express the semantics of features, and (2) they can only model interactions among neighboring features. To deal with the aforementioned problems, this paper proposes a Multi-Branch Convolutional Network (MBCN), which includes three branches: a standard convolutional layer, a dilated convolutional layer, and a bias layer. MBCN is able to explicitly model feature interactions of arbitrary order at the vector-wise level, fully expressing context-aware feature semantics. Extensive experiments on three public benchmark datasets demonstrate the superiority of MBCN compared to state-of-the-art baselines for context-aware top-k recommendation.
Abstract: Estimating session duration for an e-commerce search engine is important for various downstream applications, including user satisfaction prediction, personalization, and diversification of search results. Previous studies have shown that search session length is strongly correlated with a user's exploratory versus specific-purchase intent. Building on previous work [14], we hypothesize that early prediction of the session length distribution can be used to control the degree of exploration versus exploitation (loosely related to diversification versus personalization) for the Search Engine Result Pages (SERPs) in the remainder of the user's session. In this work, we aim to predict the user's session length early, which enables such control over the search results. Towards this end, based on previous work and strong empirical evidence, we hypothesize that session lengths are Weibull distributed, and we model the distribution's parameters with a Recurrent Neural Network over the actions in the user's search session. Through experimentation, we demonstrate that our method outperforms strong baselines on this task.
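A minimal sketch of this modeling idea, assuming a GRU whose final state emits the Weibull shape k and scale lam, trained with the Weibull negative log-likelihood (layer sizes and the softplus parameterization are our illustrative choices):

.. code-block:: python

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class WeibullSessionModel(nn.Module):
        def __init__(self, n_actions, dim=32):
            super().__init__()
            self.emb = nn.Embedding(n_actions, dim)
            self.gru = nn.GRU(dim, dim, batch_first=True)
            self.head = nn.Linear(dim, 2)

        def forward(self, actions):               # actions: (batch, seq)
            _, h = self.gru(self.emb(actions))
            k, lam = F.softplus(self.head(h[-1])).unbind(-1)
            return k + 1e-6, lam + 1e-6           # positive Weibull params

    def weibull_nll(k, lam, t):
        # Negative log-likelihood of observed session lengths t under
        # Weibull(k, lam): f(t) = (k/lam) (t/lam)^(k-1) exp(-(t/lam)^k).
        z = t / lam
        return -(torch.log(k / lam) + (k - 1) * torch.log(z) - z ** k).mean()

    model = WeibullSessionModel(n_actions=100)
    k, lam = model(torch.randint(0, 100, (8, 12)))    # toy action sequences
    print(weibull_nll(k, lam, t=torch.rand(8) * 10 + 0.1))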
Abstract: Many networks in real applications are constantly evolving through the creation and elimination of nodes and edges. Dynamic link prediction aims to infer whether there will be an edge between a pair of nodes, given the recent evolution history of the network. In this paper, we devise a flexible framework for link prediction on dynamic networks regularly archived as a series of snapshots. On the basis of node vectors learned on individual snapshots, a gated recurrent unit (GRU) network is utilized to model the evolution of each node vector and predict the node's representation in the future. The edge representation is then constructed not only from the interaction between the representations of the target node pair, but is also enriched with local neighborhood representations, i.e., historical embeddings of their common neighbors. Finally, a binary classifier is trained to perform link prediction. The framework can be instantiated with many outstanding off-the-shelf node embedding and binary classification methods. Extensive experiments on three different datasets demonstrate the effectiveness and flexibility of our proposed framework. Ablation studies show that the node vector evolution and the local neighborhood representation both have positive, but different, effects on dynamic link prediction across diverse networks.
Abstract: As a particularly prominent application of recommender systems for automated personalized service, music recommendation has been widely used on music network platforms and in music education and music therapy. Importantly, an individual's music preference at a given moment is closely related to personal experience with the music and music literacy, as well as to the temporal scenario, which should be captured without interrupting the user. Therefore, this paper proposes NRRS (Nonintrusive-Sensing and Reinforcement-Learning based Recommender System), a novel policy for music recommendation that integrates these prior research streams. Specifically, we develop a novel recommendation framework that senses, learns, and adapts to the user's current preference in real time during a listening session, based on wireless sensing and reinforcement learning. The established music recommendation prototype monitors an individual's vital signals while listening to music and captures song characteristics and individual dynamic preferences, yielding a better listening experience for users.
Abstract: Question-answering sentiment analysis (QASA) is a novel but meaningful sentiment analysis task based on question-answering online reviews. Existing neural network-based models for sentiment analysis of online reviews have already achieved great success. However, the syntactic and implicit semantic connections in the dependency tree have not been fully exploited, especially for Chinese, which has its own specific syntax. In this work, we propose a Residual-Duet Network that leverages textual and tree dependency information for Chinese question-answering sentiment analysis. In particular, we explore the synergies of graph embedding with structural dependency links to learn syntactic information. Transverse and longitudinal compression encoders are developed to capture sentiment evidence with disparate types of compression and different residual connections. We evaluate our model on three Chinese QASA datasets in different domains. Experimental results demonstrate the superiority of our proposed model in Chinese question-answering sentiment analysis.
Abstract: Automatically answering mathematical problems is a challenging task, since it requires not only linguistic understanding but also mathematical comprehension. Existing studies usually explore solutions for elementary math word problems described in natural language narratives, and are not capable of solving more general problems containing structured formulas. To this end, in this paper we propose a novel Neural Mathematical Solver (NMS) with enhanced formula structures. Specifically, we first frame the formulas in a given problem as a TeX dependency graph to preserve formula-enriched structures. Then, we design a formula graph network (FGN) to capture its mathematical relations. Next, we develop a novel architecture with two GRU models, connecting tokens from both the word space and the formula space, to learn the linguistic semantics of the answers. Extensive experiments on a large-scale dataset demonstrate that NMS not only achieves better answer prediction but also visualizes reasonable mathematical representations of problems.
Abstract: in their accompanying referral documents, which contain a mix of free text and structured data. By training a model to predict triage decisions from these referral documents, we can partially automate the triage process, resulting in more efficient and systematic triage decisions. One of the difficulties of this task is maintaining robustness against changes in triage priorities due to changes in policy, funding, staff, or other factors. This is reflected as changes in relationship between document features and triage labels, also known as concept drift. These changes must be detected so that the model can be retrained to reflect the new environment. We introduce a new concept drift detection algorithm for this domain called calibrated drift detection method (CDDM). We evaluated CDDM on benchmark and synthetic medical triage datasets, and find it competitive with state-of-the-art detectors, while also being less prone to false positives from feature drift.
Abstract: Text documents are often mapped to vectors of binary values, where 1 indicates the presence of a word and 0 indicates its absence. The vectors are then used to train predictive models. In tree-based ensemble models, predictions from some decision trees may be made purely from absent words. Such predictions should be trusted less, as absent words can be interpreted in multiple ways. In this work, we propose to improve the comprehensibility and accuracy of ensemble models by distinguishing word presence from absence. The presented method weights predictions based on word presence. Experimental results on 35 real text datasets indicate that our method outperforms state-of-the-art ensemble methods on various text classification tasks.
Abstract: Unsupervised multi-lingual language modeling has gained traction in the last few years, and poly-lingual topic models provide a mechanism to learn aligned document representations. However, training such models requires translation-aligned data across languages, which is not always available. Moreover, for short texts like tweets and search queries, training topic models remains a challenge. In this work, we present a novel strategy of creating a pseudo-parallel dataset and then training topic models for sponsored search retrieval, which also mitigates the short-text challenge. Our data augmentation strategy leverages the readily available bipartite click-through graph, which allows us to draw similar documents in different languages. The proposed methodology is evaluated on a sponsored search system whose performance is measured by how well the user intent, presented via the query, is matched with ads provided by the advertiser. Our experiments substantiate the effectiveness of the method on the EuroParl dataset and on live search-engine traffic.
Abstract: Significant development of communication technology over the past few years has motivated research in multi-modal summarization techniques. A majority of the previous works on multi-modal summarization focus on text and images. In this paper, we propose a novel extractive multi-objective optimization based model to produce a multi-modal summary containing text, images, and videos. Important objectives such as intra-modality salience, cross-modal redundancy and cross-modal similarity are optimized simultaneously in a multi-objective optimization framework to produce effective multi-modal output. The proposed model has been evaluated separately for different modalities, and has been found to perform better than state-of-the-art approaches.
Abstract: Popularity is often included in experimental evaluation to provide a reference performance for a recommendation task. To understand how the popularity baseline is defined and evaluated, we sample 12 papers from top-tier conferences including KDD, WWW, SIGIR, and RecSys, and 6 open-source toolkits. We note that the widely adopted MostPop baseline simply ranks items based on the number of interactions in the training data. We argue that the current evaluation of popularity (i) does not reflect the items that are popular at the time a user interacts with the system, and (ii) may recommend items released after a user's last interaction with the system. On the widely used MovieLens dataset, we show that the performance of popularity can be significantly improved, by 70% or more, if we consider the items that are popular at the time point when a user interacts with the system. We further show that, on the MovieLens dataset, users who rate fewer movies tend to follow the crowd and rate more popular movies, while movie lovers who rate a large number of movies rate movies based on their own preferences and interests. Through this study, we call for a re-visit of the popularity baseline in recommender systems to better reflect its effectiveness.
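A minimal pandas sketch of the time-aware variant, which ranks items by their interaction counts in a recency window ending at the interaction time (the 90-day window is an illustrative choice):

.. code-block:: python

    import pandas as pd

    def popularity_at_time(train, t, window="90D"):
        # Items popular at time t, not over the whole training set.
        recent = train[(train.ts <= t) & (train.ts > t - pd.Timedelta(window))]
        return recent.item_id.value_counts().index.tolist()

    train = pd.DataFrame({
        "item_id": [1, 1, 2, 2, 2, 3],
        "ts": pd.to_datetime(["2019-01-01", "2019-01-05", "2019-03-01",
                              "2019-03-02", "2019-03-03", "2018-06-01"]),
    })
    print(popularity_at_time(train, pd.Timestamp("2019-03-10")))  # [2, 1]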
Abstract: Related or ideal follow-up suggestions for a web query in search engines are often optimized based on several different parameters: relevance to the original query, diversity, click probability, etc. One or more rankers may be trained to score each suggestion from a candidate pool based on these factors. These scorers are usually trained as pairwise classification tasks, where each training example consists of a user query and a single suggestion from the list of candidates. We propose an architecture that takes all candidate suggestions associated with a given query and outputs a suggestion block. We discuss the benefits of such an architecture over traditional approaches and experiment with further enforcing each individual metric through mixed-objective training.
Abstract: Reinforcement learning (RL) has been successfully applied to recommender systems. However, the existing RL-based recommendation methods are limited by their unstructured state/action representations. To address this limitation, we propose a novel way that builds high-quality graph-structured states/actions according to the user-item bipartite graph. More specifically, we develop an end-to-end RL agent, termed Graph Convolutional Q-network (GCQN), which is able to learn effective recommendation policies based on the inputs of the proposed graph-structured representations. We show that GCQN achieves significant performance margins over the existing methods, across different datasets and task settings.
Abstract: The crowd is cheaper and easier to access than the oracle when collecting ground truth data for training and evaluating models. To ensure the quality of crowdsourced data, one can assign multiple crowd workers to a question and then aggregate the multiple answers of diverse quality into a golden one. In the areas of IR and NLP, the ground truth data for many tasks are text sequences. For aggregating multiple crowdsourced text sequences of diverse quality, methods adapted from existing answer aggregation approaches, which were proposed for labels (e.g., categories), focus only on one-sided reliability and do not fully utilize the rich information in text sequences. We thus propose a crowdsourced text sequence aggregation method which captures hybrid reliability information, i.e., the local question-wise reliability of text answers and the global dataset-wise reliability of crowd workers. For the local reliability, it also incorporates text similarities from a hybrid representation, i.e., text embeddings and word sequences. Experiments on real crowdsourced datasets show that our method outperforms baselines which utilize only one-sided reliability and one-sided representations, and that it can effectively leverage the rich information of text sequences.
Abstract: The main task of personalized recommendation is capturing users' interests based on their historical behaviors. Most recent advances in recommender systems focus on modeling users' preferences accurately using deep learning based approaches. Users' interests have two important properties: they are dynamic and evolve over time, and they have different resolutions, or temporal ranges to be precise, such as long-term and short-term preferences. Existing approaches either use Recurrent Neural Networks (RNNs) to address the drift in users' interests without considering different temporal ranges, or design two different networks to model long-term and short-term preferences separately. This paper presents a multi-resolution interest fusion model (MRIF) that takes both properties of users' interests into consideration. The proposed model is capable of capturing the dynamic changes in users' interests at different temporal ranges and provides an effective way to combine a group of multi-resolution user interests to make predictions. Experiments show that our method consistently outperforms state-of-the-art recommendation methods.
Abstract: This paper proposes a novel neural network, the joint training capsule network (JTCN), for the cold-start recommendation task. For fresh users, we propose to mimic high-level user preferences from side information rather than from the raw interaction history. Specifically, an attentive capsule layer is proposed to aggregate high-level user preferences from low-level interaction history via a dynamic routing-by-agreement mechanism. Moreover, JTCN jointly trains the loss for mimicking the user preference and the softmax loss for recommendation in an end-to-end manner. Experiments on two publicly available datasets demonstrate the effectiveness of the proposed model: JTCN improves over other state-of-the-art methods by at least 7.07% on CiteULike and 16.85% on Amazon in terms of Recall@100 for cold-start recommendation.
Abstract: Extracting topical information from documents is important for public opinion analysis, text classification, and information retrieval tasks. Compared with identifying a wide variety of topics in long documents, it is challenging to generate a concentrated topic distribution for each short message. Although this problem can be tackled by adjusting the hyper-parameters in traditional topic models such as Latent Dirichlet Allocation, it remains an open problem in neural topic modelling. In this paper, we focus on adapting the popular Auto-Encoding Variational Bayes based neural topic models to short texts by exploring Archimedean copulas to guide the estimated topic distributions derived from linearly projected samples of re-parameterized posterior distributions. Experimental results show the superiority of our method when compared with existing neural topic models in terms of perplexity, topic coherence, and classification accuracy.
Abstract: Classical ad-hoc retrieval models based on exact matching suffer from the lack of soft matching in text. Besides query expansion approaches, many existing neural IR approaches that exploit word embedding representations alleviate this issue to some extent. We observe that word embedding vectors are usually normalized in practice to retain the cosine similarities that are used to construct the query-document interaction matrix for most neural ranking models; these vectors are in fact mapped onto the surface of a high-dimensional hypersphere. Existing work in kernel-based ranking does not consider the kernel to be a distribution on a certain geometry even when the kernel's variable is a geometric quantity. We propose a kernel-based neural ranking model based on a statistical manifold, treating the interaction as a geodesic on the manifold, and propose a smoothed kernel pooling scheme at different similarity levels based on the Riemannian normal distribution. Extensive experiments conducted on a recent benchmark dataset against the state-of-the-art kernel-based neural ranking model demonstrate significant improvements brought by our model.
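For reference, the Gaussian kernel pooling that such models build on (the KNRM-style baseline; the proposed model replaces the Gaussian with a Riemannian normal kernel on the hypersphere) can be sketched as:

.. code-block:: python

    import torch

    def gaussian_kernel_pooling(sim, mus, sigma=0.1):
        # sim: (n_query_terms, n_doc_terms) cosine interaction matrix;
        # mus: kernel centers in [-1, 1].
        k = torch.exp(-((sim.unsqueeze(-1) - mus) ** 2) / (2 * sigma ** 2))
        soft_tf = k.sum(dim=1)                  # (n_query_terms, n_kernels)
        return torch.log1p(soft_tf).sum(dim=0)  # (n_kernels,) ranking features

    sim = torch.rand(5, 30) * 2 - 1             # toy similarity matrix
    mus = torch.linspace(-0.9, 0.9, steps=10)
    print(gaussian_kernel_pooling(sim, mus))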
Abstract: Multimodal Machine Comprehension ($\rm M^3C$) has been a challenging task that requires understanding both language and vision, as well as their integration and interaction. For example, the RecipeQA challenge, which provides several $\rm M^3C$ tasks, requires deep neural models to understand textual instructions, images of different steps, as well as the logic orders of food cooking. To address this challenge, we propose a Multi-Level Multi-Modal Transformer (MLMM-Trans) framework to integrate and understand multiple textual instructions and multiple images. Our model can conduct intensive attention mechanism at multiple levels of objects (e.g., step level and passage-image level) for sequences of different modalities. Experiments have shown that our model can achieve the state-of-the-art results on the three multimodal tasks of RecipeQA.
Abstract: By setting a typeface, each character of a Chinese text can be converted to a glyph pixel matrix. We propose to conduct text classification with such glyph features using bi-directional convolution. Although the pixel embedding can be applied to all languages, it is especially convenient for representing Chinese scripts due to the square shape of Chinese characters. We extract both the forward and backward n-gram features of the text via bi-directional convolutional operations and then concatenate them. A subsequent 1-dimensional max-over-time pooling is applied to the bi-directional feature maps, and then three fully connected layers are used to conduct text classification. The proposed model has a light-weight architecture that only contains a single-layer convolutional neural network. Experiments on several Chinese text classification datasets demonstrate the surprisingly fast training speed and superior performance of the proposed model in comparison with traditional methods.
Abstract: There are two main paradigms for exploiting review information for recommendation. One is document-level, i.e., concatenating all reviews of a user/item into a long document, which may neglect the different usefulness of individual reviews. The other is review-level, i.e., analyzing each review separately to learn user/item features. In fact, the two paradigms are complementary, and fusing them has the potential to learn more comprehensive features of users/items. Hence, we propose a unified framework to jointly learn document- and review-level representations of users/items. We design a document encoder to learn document-level features of users/items. Then, we use a review encoder to learn representations of reviews from words, and a user/item encoder to learn review-level features of users/items. Besides, different reviews from the same user may have different importance for different target items due to different item characteristics. We propose a cross attention model for user representation learning whose query vector is the embedding of the target item ID, and apply it to the above three encoders to select informative words and reviews for different target items. Extensive experiments validate the effectiveness of our method.
Abstract: A chatbot that converses like a human should be goal-oriented (i.e., be purposeful in conversation), which goes beyond language generation. However, existing goal-oriented dialogue systems often rely heavily on cumbersome hand-crafted rules or costly labelled datasets, which limits their applicability. In this paper, we propose Goal-oriented Chatbots (GoChat), a framework for training chatbots end-to-end to maximize the long-term return from offline multi-turn dialogue datasets. Our framework utilizes hierarchical reinforcement learning (HRL), where the high-level policy determines sub-goals to guide the conversation towards the final goal, and the low-level policy fulfills the sub-goals by generating the corresponding utterance in response. In experiments on a real-world dialogue dataset for anti-fraud in the financial domain, our approach outperforms previous methods on both the quality of response generation and the success rate of accomplishing the goal.
Abstract: Commonsense knowledge is fundamental for machines to reach human-level intelligence. However, conventional methods of commonsense extraction generally do not work well because commonsense is by nature usually not explicitly stated in texts or other data. Besides, commonsense knowledge graphs built in advance can hardly cover all the knowledge required for practical tasks, due to the incompleteness of knowledge graphs. In this paper, we propose an online commonsense oracle to achieve knowledge reasoning. Specifically, we focus on the on-demand inference of specific commonsense propositions. We use the capableOf relation as an example due to its notable significance in daily life. For more effective capableOf reasoning, informative supporting features derived from an existing commonsense knowledge graph and a Web search engine are exploited. Finally, we conduct extensive experiments, and the results demonstrate the effectiveness of our approach.
Abstract: To leverage entity and word semantics in entity linking, embedding models have been developed to represent entities, words, and their context such that candidate entities for each mention can be determined and ranked accurately using their embeddings. In this paper, we leverage human intelligence for embedding-based interactive entity linking. We adopt an active learning approach to select mentions for human annotation that can best improve entity linking accuracy while updating the embedding model. We propose two mention selection strategies based on: (1) the coherence of linked entities, and (2) the contextual closeness of candidate entities with respect to a mention. Our experiments show that our proposed interactive entity linking methods outperform their batch counterpart on all experimented datasets with a relatively small amount of human annotation.
Abstract: As an important branch of current dialogue systems, retrieval-based chatbots leverage information retrieval to select proper predefined responses. Various promising architectures have been designed for boosting response retrieval; however, few studies have exploited the effectiveness of pre-trained contextual language models. In this paper, we propose two approaches for adapting contextual language models to the dialogue response selection task. In detail, the Speaker Segmentation approach is designed to discriminate different speakers to fully utilize speaker characteristics. Besides, we propose the Dialogue Augmentation approach, i.e., cutting off real conversations at different time points, to enlarge the training corpora. Compared with previous works that use utterance-level representations, our augmented contextual language models are able to obtain whole contextual dialogue representations for deeper semantic understanding. Evaluation on three large-scale datasets demonstrates that our proposed approaches yield better performance than existing models.
Abstract: The increasing demand for on-device deep learning necessitates the deployment of deep models on mobile devices. However, directly deploying deep models on mobile devices presents both a capacity bottleneck and prohibitive privacy risks. To address these problems, we develop a Differentially Private Knowledge Distillation (DPKD) framework to enable on-device deep learning while preserving training data privacy. We modify the conventional Private Aggregation of Teacher Ensembles (PATE) paradigm by compressing the knowledge acquired by the ensemble of teachers into a student model in a differentially private manner. The student model is then trained on both the labeled public data and the distilled knowledge by adopting a mixed training algorithm. Extensive experiments on popular image datasets, as well as a real implementation on a mobile device, show that DPKD can not only benefit from the distilled knowledge but also provide a strong differential privacy guarantee (ε = 2) with only marginal decreases in accuracy.
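To make the privacy mechanism concrete, here is a minimal sketch of the PATE-style noisy vote aggregation that DPKD builds on; the Laplace scale, class count, and teacher count are illustrative assumptions, not values from the paper.

.. code-block:: python

    import numpy as np

    def noisy_aggregate(teacher_votes, num_classes, laplace_scale=2.0, rng=None):
        """Aggregate per-teacher predicted labels into one private label:
        Laplace noise on the vote histogram yields a differentially
        private argmax (the PATE mechanism that DPKD modifies)."""
        rng = rng or np.random.default_rng()
        counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
        counts += rng.laplace(scale=laplace_scale, size=num_classes)
        return int(np.argmax(counts))

    # Example: 50 teachers voting over 10 classes for one unlabeled sample.
    votes = np.random.default_rng(0).integers(0, 10, size=50)
    print(noisy_aggregate(votes, num_classes=10))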
Abstract: Most deep learning frameworks require users to pool their local data or model updates to a trusted server to train or maintain a global model. The assumption of a trusted server that has access to user information is ill-suited in many applications. To tackle this problem, we develop a new deep learning framework under an untrusted server setting, which includes three modules: (1) an embedding module, (2) a randomization module, and (3) a classifier module. For the randomization module, we propose a novel local differentially private (LDP) protocol to reduce the impact of the privacy parameter ε on accuracy, and to provide enhanced flexibility in choosing randomization probabilities for LDP. Analysis and experiments show that our framework delivers comparable or even better performance than the non-private framework and existing LDP protocols, demonstrating the advantages of our LDP protocol.
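For intuition about what a randomization module does, the following is a minimal sketch of classic k-ary randomized response, a standard LDP building block; the paper's actual protocol is more elaborate, and the domain and ε here are hypothetical.

.. code-block:: python

    import math
    import random

    def randomized_response(true_value, domain, epsilon):
        """k-ary randomized response: report the true value with
        probability p = e^eps / (e^eps + k - 1), otherwise report a
        uniformly random other value; this satisfies eps-local
        differential privacy for a single categorical report."""
        k = len(domain)
        p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
        if random.random() < p:
            return true_value
        return random.choice([v for v in domain if v != true_value])

    print(randomized_response("sports", ["sports", "news", "music"], epsilon=1.0))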
Abstract: Inspired by the recent discoveries in neuroscience, the study of the sparse binary projection model started to attract people's attention, shedding new light on image retrieval. Different from the classical work that tries to reduce the dimension of the data for faster retrieval speed, the model projects dense input samples into a higher-dimensional space and outputs sparse binary data representations after winner-take-all competition. Following the work along this line, this paper designed a new algorithm which obtains a high-quality sparse binary projection matrix through unsupervised training. Simple as it is, the algorithm reported significantly improved results over the state-of-the-art methods in both search accuracy and retrieval speed in a series of empirical evaluations on large-scale image retrieval tasks, which exhibited its promising potential in industrial applications.
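A minimal sketch of the project-then-winner-take-all encoding step the abstract describes, assuming a random projection matrix in place of the learned one and hypothetical dimensions.

.. code-block:: python

    import numpy as np

    def sparse_binary_project(X, W, k):
        """Project dense rows of X into a higher-dimensional space with
        projection matrix W, then keep only the k largest activations
        per row (winner-take-all) as a sparse binary code."""
        H = X @ W                                  # (n, d_high) activations
        codes = np.zeros_like(H, dtype=np.uint8)
        top_k = np.argpartition(-H, k, axis=1)[:, :k]
        np.put_along_axis(codes, top_k, 1, axis=1)
        return codes

    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 128))                   # 4 dense inputs
    W = (rng.random((128, 2048)) < 0.1).astype(float)   # random sparse projection
    print(sparse_binary_project(X, W, k=32).sum(axis=1))  # 32 ones per row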
Abstract: Language model pre-training has attracted a great deal of attention for tasks involving natural language understanding, and has been successfully applied to many downstream tasks with impressive results. Within information retrieval, many of these solutions are too costly to stand on their own, requiring multi-stage ranking architectures. Recent work has begun to consider how to "backport" salient aspects of these computationally expensive models to earlier stages of the retrieval pipeline. One such instance is DeepCT, which uses BERT to re-weight term importance in a given context at the passage level. This process, which is computed offline, results in an augmented inverted index with re-weighted term frequency values. In this work, we conduct an investigation of query processing efficiency over DeepCT indexes. Using a number of candidate generation algorithms, we reveal how term re-weighting can impact query processing latency, and explore how DeepCT can be used as a static index pruning technique to accelerate query processing without harming search effectiveness.
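As an illustration of static index pruning over a term-re-weighted index, a minimal sketch follows; the dictionary-based index layout and threshold are hypothetical, not DeepCT's actual data structures.

.. code-block:: python

    def prune_index(inverted_index, min_weight):
        """Drop postings whose re-weighted term score falls below a
        threshold, shrinking the index while keeping high-value postings.

        inverted_index: {term: [(doc_id, weight), ...]} where weight is a
        context-aware importance score (e.g., produced offline by a
        DeepCT-style model)."""
        pruned = {}
        for term, postings in inverted_index.items():
            kept = [(doc, w) for doc, w in postings if w >= min_weight]
            if kept:
                pruned[term] = kept
        return pruned

    index = {"neural": [(1, 0.9), (2, 0.05)], "the": [(1, 0.01), (2, 0.02)]}
    print(prune_index(index, min_weight=0.1))  # {'neural': [(1, 0.9)]}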
Abstract: Manually extracting relevant aspects and opinions from large volumes of user-generated text is a time-consuming process. Summaries, on the other hand, help readers with limited time budgets to quickly consume the key ideas from the data. State-of-the-art approaches for multi-document summarization, however, do not consider user preferences while generating summaries. In this work, we argue for the need, and propose a solution, for generating personalized aspect-based opinion summaries from large collections of online tourist reviews. We let our readers decide and control several attributes of the summary, such as the length and specific aspects of interest, among others. Specifically, we take an unsupervised approach to extract coherent aspects from tourist reviews posted on TripAdvisor. We then propose an Integer Linear Programming (ILP) based extractive technique to select an informative subset of opinions around the identified aspects while respecting the user-specified values for various control parameters. Finally, we evaluate and compare our summaries using crowdsourcing and ROUGE-based metrics and obtain competitive results.
Abstract: Recent studies on open-domain question answering have achieved prominent performance improvement using pre-trained language models such as BERT. State-of-the-art approaches typically follow the "retrieve and read" pipeline and employ a BERT-based reranker to filter retrieved documents before feeding them into the reader module. The BERT retriever takes as input the concatenation of the question and each retrieved document. Despite the success of these approaches in terms of QA accuracy, due to the concatenation, they can barely handle a high throughput of incoming questions, each with a large collection of retrieved documents. To address the efficiency problem, we propose DC-BERT, a decoupled contextual encoding framework that has dual BERT models: an online BERT which encodes the question only once, and an offline BERT which pre-encodes all the documents and caches their encodings. On the SQuAD Open and Natural Questions Open datasets, DC-BERT achieves a 10x speedup on document retrieval, while retaining most (about 98%) of the QA performance compared to state-of-the-art approaches for open-domain question answering.
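A toy sketch of the decoupled dual-encoder idea, with small linear layers standing in for the online and offline BERT models; the dimensions and the bilinear interaction layer are illustrative assumptions, not DC-BERT's actual architecture.

.. code-block:: python

    import torch
    import torch.nn as nn

    class DecoupledRanker(nn.Module):
        """Toy decoupled encoder in the spirit of DC-BERT: the question
        is encoded once online, documents are pre-encoded offline and
        cached, and a light interaction layer scores each cached pair."""

        def __init__(self, dim=128):
            super().__init__()
            self.q_enc = nn.Linear(300, dim)   # stand-in for the online BERT
            self.d_enc = nn.Linear(300, dim)   # stand-in for the offline BERT
            self.scorer = nn.Bilinear(dim, dim, 1)

        def encode_docs(self, docs):           # run offline, results cached
            return self.d_enc(docs)

        def forward(self, question, cached_docs):
            q = self.q_enc(question)            # encoded once per question
            return self.scorer(q.expand_as(cached_docs), cached_docs).squeeze(-1)

    model = DecoupledRanker()
    cache = model.encode_docs(torch.randn(100, 300))   # offline pass
    scores = model(torch.randn(1, 300), cache)         # fast online pass
    print(scores.shape)                                # torch.Size([100])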
Abstract: Session-based recommendation produces item predictions mainly based on anonymous sessions. Previous studies have leveraged collaborative information from neighbor sessions to boost the recommendation accuracy for a given ongoing session. However, previous work often selects the most recent sessions as candidate neighbors, thereby failing to identify the most related neighbors and to obtain an effective neighbor representation. In addition, few existing methods simultaneously consider the sequential signal and the most recent interest in an ongoing session. In this paper, we introduce an Intent-guided Collaborative Machine for Session-based Recommendation (ICM-SR). ICM-SR encodes an ongoing session by leveraging the prior sequential items and the last item to generate an accurate session representation, which is then used to produce initial item predictions as intent. After that, we design an intent-guided neighbor detector to locate the correct neighbor sessions. Finally, the representations of the current session and the neighbor sessions are adaptively combined by a gated fusion layer to produce the final item recommendations. Experiments conducted on two public benchmark datasets show that ICM-SR achieves a significant improvement in terms of Recall and MRR over the state-of-the-art baselines.
Abstract: Session-based recommendation aims to predict a user's actions at the next timestamp based on anonymous sessions. Previous work mainly focuses on the transition relationships between items that the user interacted with during an ongoing session, and generally fails to pay enough attention to how relevant those items are to the user's main intent. In this paper, we propose a Session-based Recommendation approach with an Importance Extraction Module, i.e., SR-IEM, that considers both a user's long-term and recent behavior in an ongoing session. We employ a modified self-attention mechanism to estimate item importance in a session, which is then used to predict the user's long-term preference. Item recommendations are produced by combining the user's long-term preference and their current interest as conveyed by the last item they interacted with. Comprehensive experiments are conducted on two publicly available benchmark datasets. The proposed SR-IEM model outperforms state-of-the-art baselines in terms of Recall and MRR for the task of session-based recommendation. In addition, compared to state-of-the-art models, SR-IEM has a reduced computational complexity.
Abstract: Diagnosis prediction aims to forecast the diseases that a patient might have at their next hospital visit, which is critical in Clinical Decision Support Systems (CDSS). Existing approaches mainly formulate diagnosis prediction as a multi-label classification problem and use discrete medical codes as the major features, while the structural information among medical codes and the time series data in clinical records are generally neglected. In this paper, we propose the Multi-modal Clinical Data based Hierarchical Multi-label model (MHM) to integrate discrete medical codes, structural information, and time series data into the same framework for the diagnosis prediction task. Experimental results on two real-world datasets demonstrate the superiority of the proposed MHM over state-of-the-art approaches.
Abstract: We investigate a growing body of work that seeks to improve recommender systems through the use of review text. Generally, these papers argue that since reviews 'explain' users' opinions, they ought to be useful to infer the underlying dimensions that predict ratings or purchases. Schemes to incorporate reviews range from simple regularizers to neural network approaches. Our initial findings reveal several discrepancies in reported results, partly due to, e.g., copying results across papers despite changes in experimental settings or data pre-processing. First, we attempt a comprehensive analysis to resolve these ambiguities. Further investigation calls for discussion of a much larger problem: the "importance" of user reviews for recommendation. Through a wide range of experiments, we observe several cases where state-of-the-art methods fail to outperform existing baselines, especially as we deviate from a few narrowly-defined settings where reviews are useful. We conclude by providing hypotheses for our observations that seek to characterize under what conditions reviews are likely to be helpful. Through this work, we aim to evaluate the direction in which the field is progressing and encourage robust empirical evaluation.
Abstract: In display advertising, predicting the conversion rate (CVR), i.e., the probability that a user takes a predefined action on an advertiser's website, is a fundamental task for estimating the value of displaying an advertisement to a user. There are two main challenges in CVR prediction due to delayed feedback. First, some positive labels are not correctly observed in the training data, because some conversions do not occur immediately after a click. Second, delay mechanisms are not uniform across instances, meaning some positive feedback is observed much more frequently than other feedback. It is widely acknowledged that these problems lead to severe bias in CVR prediction. To overcome these challenges, we propose two unbiased estimators: one for CVR prediction and the other for bias estimation. Subsequently, we propose a dual learning algorithm in which a CVR predictor and a bias estimator are trained in an alternating fashion using only observable conversions. The proposed algorithm is the first of its kind to address the two major challenges in a theoretically sophisticated manner. Empirical evaluations using synthetic datasets demonstrate the practical value of the proposed approach.
Abstract: Extractive-abstractive hybrid summarization can generate readable, concise summaries for long documents. Extraction-then-abstraction and extraction-with-abstraction are two representative approaches to hybrid summarization, but their general performance has yet to be evaluated by large-scale experiments. We examined two state-of-the-art hybrid summarization algorithms from three novel perspectives: we applied them to a form of headline generation not previously tried; we evaluated the generalization of the algorithms by testing them both within and across news domains; and we compared the automatic assessment of the algorithms to human comparative judgments. We find that an extraction-then-abstraction hybrid approach outperforms an extraction-with-abstraction approach, particularly for cross-domain headline generation.
Abstract: With the development of online education systems, a growing number of research works are focusing on Knowledge Tracing (KT), which aims to assess students' changing knowledge state and help them learn knowledge concepts more efficiently. However, given only student learning interactions, most existing KT methods neglect the individualization of students, i.e., that prior knowledge and learning rates differ from student to student. To this end, in this paper, we propose a novel Convolutional Knowledge Tracing (CKT) method to model individualization in KT. Specifically, for individualized prior knowledge, we measure it from students' historical learning interactions. For individualized learning rates, we design hierarchical convolutional layers to extract them based on continuous learning interactions of students. Extensive experiments demonstrate that CKT obtains better knowledge tracing results by modeling individualization in the learning process. Moreover, CKT can learn meaningful exercise embeddings automatically.
Abstract: Generating natural language descriptions for knowledge graphs (KGs) is an important task in intelligent writing. Recent models for this task substitute the sequence encoder in the commonly used encoder-decoder framework with a graph encoder. However, these models suffer from entity missing and entity repetition. In this paper, we propose a novel end-to-end generation model named G2T, which integrates a novel Graph Structure Enhanced Mechanism (GSEM) and a Copy Coverage Loss (CCL). Instead of considering graph structure only in the encoding phase, as most existing methods do, our GSEM fully utilizes graph structure in the decoding phase and helps to mitigate the entity missing problem. Moreover, our CCL further improves performance by avoiding the generation of repeated entities. With their help, our model is capable of generating fluent descriptions for KGs. The results of automatic and human evaluations show that our model outperforms the state-of-the-art models.
Abstract: In this paper, we study context-response matching with pre-trained contextualized representations for multi-turn response selection in retrieval-based chatbots. Existing models, such as CoVe and ELMo, are trained with limited context (often a single sentence or paragraph) and may not work well on multi-turn conversations, due to their hierarchical nature, informal language, and domain-specific words. To address these challenges, we propose pre-training hierarchical contextualized representations, including contextual word-level and sentence-level representations, by learning a dialogue generation model from large-scale conversations with a hierarchical encoder-decoder architecture. The two levels of representations are then blended into the input and output layers of a matching model, respectively. Experimental results on two benchmark conversation datasets indicate that the proposed hierarchical contextualized representations bring significant and consistent improvement to existing matching models for response selection.
Abstract: In product-to-product search and recommendation, the product image often plays a pivotal role for the user to determine the relevance of that product. The present study investigates the relationship between the users' visual intents (in terms of colour, texture and material, and design) and the amount of user feedback (namely, clicks, likes, and purchases) using real product data and crowdsourcing. Through the analysis, we found that visual relevance (i.e., relevance of a target product with respect to a particular visual intent) correlates with the amount of user feedback, and that visual relevance can be the cause of user feedback.
Abstract: In this paper, we propose a Multi-View Learning (MVL) framework for news recommendation which uses both a content view and a user-news interaction graph view. In the content view, we use a news encoder to learn news representations from different information such as titles, bodies, and categories. We obtain the representation of a user from his/her browsed news, conditioned on the candidate news article to be recommended. In the graph view, we propose to use a graph neural network to capture the user-news, user-user, and news-news relatedness in the user-news bipartite graph by modeling the interactions between different users and news. In addition, we propose to incorporate an attention mechanism into the graph neural network to model the importance of these interactions for more informative representation learning of users and news. Experiments on a real-world dataset validate the effectiveness of MVL.
Abstract: People use web image search with various search intents: from serious demands for work to just passing time by browsing images of a favorite actor. Such a diversity of intents can influence user satisfaction and evaluation metrics, both of which are important factors for providing a better image search environment. In this paper, we investigate this influence by using a publicly available one-month field study dataset. With respect to satisfaction, we take into consideration both query-level and task-level satisfaction provided by search users. Regarding the evaluation metrics, we use grid-based evaluation metrics that incorporate user behavior specific to image search. The results of our analysis indicate that both query/task satisfaction and grid-based evaluation metrics are influenced by the image search intent. Based on the results, we show possibilities for supporting users' search processes according to their search intents. We also discuss the remaining room for improvement in evaluation metrics through the development of intent-aware evaluation metrics for image search.
Abstract: Much cognitive research has shown the natural possibility of face-voice association, and this potential association has attracted much attention in the biometric cross-modal retrieval domain. Nevertheless, existing methods often fail to explicitly learn common embeddings for challenging face-voice association tasks. In this paper, we propose to learn a discriminative joint embedding for face-voice association, which can seamlessly train the face subnetwork and voice subnetwork to learn their high-level semantic features, while correlating them so that they can be compared directly and efficiently. Within the proposed approach, we introduce a bi-directional ranking constraint, an identity constraint, and a center constraint to learn the joint face-voice embedding, and adopt a bi-directional training strategy to train the deep correlated face-voice model. Meanwhile, an online hard negative mining technique is utilized to discriminatively construct hard triplets in a mini-batch manner, speeding up the learning process. Accordingly, the proposed approach can benefit various face-voice association tasks, including cross-modal verification, 1:2 matching, 1:N matching, and retrieval scenarios. Extensive experiments show improved performance in comparison with state-of-the-art methods.
Abstract: CTR (Click-Through Rate) prediction plays a central role in the domain of computational advertising and recommender systems. Several kinds of methods have been proposed in this field, such as Logistic Regression (LR), Factorization Machines (FM), and deep learning based methods like Wide&Deep, Neural Factorization Machines (NFM), and DeepFM. However, such approaches generally use the vector product of each pair of features, which ignores the different semantic spaces of feature interactions. In this paper, we propose a novel Tensor-based Feature interaction Network (TFNet) model, which introduces an operating tensor to elaborate feature interactions via multi-slice matrices in multiple semantic spaces. Extensive offline and online experiments show that TFNet: 1) outperforms the competitive compared methods on the typical Criteo and Avazu datasets; and 2) achieves large improvements in revenue and click rate in online A/B tests in the largest Chinese App recommender system, Tencent MyApp.
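The multi-slice bilinear interaction can be sketched in a few lines; the dimensions are hypothetical and TFNet's full parameterization certainly differs, but the core idea of one bilinear form per semantic space is captured.

.. code-block:: python

    import torch

    def tensor_interaction(x, y, T):
        """Multi-slice tensor feature interaction: for each semantic
        slice T[s] (a d x d matrix), compute the bilinear score
        x^T T[s] y, giving one interaction value per semantic space."""
        # x, y: (batch, d); T: (slices, d, d) -> scores: (batch, slices)
        return torch.einsum('bi,sij,bj->bs', x, T, y)

    d, slices = 16, 4
    x, y = torch.randn(8, d), torch.randn(8, d)
    T = torch.randn(slices, d, d)             # one learnable slice per space
    print(tensor_interaction(x, y, T).shape)  # torch.Size([8, 4])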
Abstract: A better understanding of users' reading behavior helps improve many information retrieval (IR) tasks, such as relevance estimation and document ranking. Existing research has already leveraged eye movement information to investigate users' reading processes during document-level relevance judgments, and the findings have been adopted to build more effective ranking models. Recently, fine-grained (e.g., passage- or sentence-level) relevance judgments have received much attention, driven by the requirements of conversational search and QA systems. However, there is still a lack of thorough investigation of users' reading behavior during these kinds of interaction processes. To shed light on this research question, we investigate how users allocate their attention to passages of a document during the relevance judgment process. With eye-tracking data collected in a laboratory study, we show that users pay more attention to the "key" passages which contain key useful information. Users tend to revisit these key passages several times to accumulate and verify the gathered information. With both content and user behavior features, we find that key passages can be predicted with supervised learning. We believe that this work contributes to better understanding users' reading behavior and may provide more explainability for relevance estimation.
Abstract: Many prediction tasks in real-world applications need to model multi-order feature interactions in a user's event sequence for better detection performance. However, existing popular solutions usually suffer from two key issues: 1) only focusing on feature interactions and failing to capture the sequence influence; 2) only focusing on sequence information but ignoring the internal feature relations of each event, thus failing to extract a better event representation. In this paper, we consider a two-level structure for capturing the hierarchical information over a user's event sequence: 1) learning event representations based on effective feature interactions; 2) modeling the sequence representation of the user's historical events. Experimental results on both industrial and public datasets clearly demonstrate that our model achieves significantly better performance compared with state-of-the-art baselines.
Abstract: Graph neural networks (GNNs) achieve remarkable success in graph-based semi-supervised node classification by leveraging information from neighboring nodes to improve the representation learning of the target node. The success of GNNs at node classification depends on the assumption that connected nodes tend to have the same label. However, such an assumption does not always hold, limiting the performance of GNNs at node classification. In this paper, we propose the label-consistency based graph neural network (LC-GNN), which leverages node pairs that are unconnected but share the same label to enlarge the receptive field of nodes in GNNs. Experiments on benchmark datasets demonstrate that the proposed LC-GNN outperforms traditional GNNs in graph-based semi-supervised node classification. We further show the superiority of LC-GNN in sparse scenarios with only a handful of labeled nodes.
Abstract: The ability to perform semantic reasoning over sentence pairs is essential for many natural language understanding tasks, e.g., natural language inference and machine reading comprehension. A recent significant improvement in these tasks comes from BERT. As reported, next sentence prediction (NSP) in BERT is of great significance for downstream problems with sentence-pair input. Despite its effectiveness, NSP still lacks the essential signal to distinguish between entailment and shallow correlation. To remedy this, we propose to augment the NSP task into a multi-class categorization task that includes previous sentence prediction (PSP). This task encourages the model to learn subtle semantics, thereby improving its ability of semantic understanding. Furthermore, by using a smoothing technique, the scopes of NSP and PSP are expanded into a broader range that includes close but non-successive sentences. This simple method yields remarkable improvement over vanilla BERT. Our method consistently improves performance on the NLI and MRC benchmarks by a large margin, including on the challenging HANS dataset.
Abstract: Deep Interest Network (DIN) is a state-of-the-art model which uses an attention mechanism to capture user interests from historical behaviors. User interests intuitively follow a hierarchical pattern, such that users generally show interest at a higher-level abstraction before moving to a lower-level one. Modelling such an interest hierarchy in an attention network can fundamentally improve the representation of user behaviors. We therefore propose an improvement over DIN to model an arbitrary interest hierarchy: Deep Interest with Hierarchical Attention Network (DHAN). In this model, a multi-dimensional hierarchical structure is introduced: the first attention layer attends to individual items, and the subsequent attention layers in the same dimension attend to higher-level hierarchies built on top of the lower corresponding layers. To enable the modelling of multiple hierarchy dimensions, an expanding mechanism is introduced to capture one-to-many hierarchies. This design enables DHAN to attach different importance to different hierarchical abstractions and thus fully capture a user's interests at different dimensions (e.g., category, price, or brand). To validate our model, a simplified DHAN is applied to Click-Through Rate (CTR) prediction, and we conduct experiments on three public datasets with a two-level one-dimensional hierarchy based only on category. The results show DHAN's superiority, with a significant AUC uplift of 12% to 21% over DIN. DHAN is also compared with another state-of-the-art model, Deep Interest Evolution Network (DIEN), which models temporal interest; the simplified DHAN achieves a slight AUC uplift of 1.0% to 1.7% over DIEN. A potential direction for future work is to combine DHAN and DIEN to model both temporal and hierarchical interests.
Abstract: Deep neural networks (DNNs) have been widely employed in recommender systems, including the incorporation of attention mechanisms for performance improvement. However, most existing attention-based models only apply item-level attention on the user side, restricting further enhancement of recommendation performance. In this paper, we propose a knowledge-enhanced recommendation model, ACAM, which incorporates item attributes distilled from knowledge graphs (KGs) as side information, and is built with an attribute-level co-attention mechanism to achieve performance gains. Specifically, each user and item in ACAM is first represented by a set of attribute embeddings. Then, user representations and item representations are augmented simultaneously by capturing the correlations between different attributes with a co-attention module. Our extensive experiments over two realistic datasets show that the user and item representations augmented by attribute-level co-attention give ACAM superiority over state-of-the-art deep models.
Abstract: In this paper, we propose a multi-source domain adaptation method with a Granger-causal objective (MDA-GC) for cross-domain sentiment classification. Specifically, for each source domain, we build an expert model by using a novel sentiment-guided capsule network, which captures the domain invariant knowledge that bridges the knowledge gap between the source and target domains. Then, an attention mechanism is devised to assign importance weights to a mixture of experts, each of which specializes in a different source domain. In addition, we propose a Granger causal objective to make the weights assigned to individual experts correlate strongly with their contributions to the decision at hand. Experimental results on a benchmark dataset demonstrate that the proposed MDA-GC model significantly outperforms the compared methods.
Abstract: The spaced repetition technique aims to improve human students' long-term memory retention by exploiting repeated, spaced reviews of learning contents. The study of spaced repetition focuses on designing an optimal policy to schedule the learning contents. To the best of our knowledge, none of the existing methods based on reinforcement learning take into account the varying time intervals between two adjacent learning events of a student, which, however, are essential for determining a real-world schedule. In this paper, we aim to learn a scheduling policy that fully exploits the varying time interval information with high sample efficiency. We propose the Time-Aware scheduler with Dyna-Style planning (TADS) approach: a sample-efficient reinforcement learning framework for realistic spaced repetition. TADS learns a Time-LSTM policy to select an optimal content according to the student's whole learning history and the time interval since the last learning event. Besides, Dyna-style planning is integrated into TADS to further improve the sample efficiency. We evaluate our approach on three environments built from synthetic data and real-world data based on well-recognized cognitive models. Empirical results demonstrate that TADS achieves superior performance against state-of-the-art algorithms.
Abstract: Session-based recommendation nowadays plays a vital role in many websites, which aims to predict users' actions based on anonymous sessions. There have emerged many studies that model a session as a sequence or a graph via investigating temporal transitions of items in a session. However, these methods compress a session into one fixed representation vector without considering the target items to be predicted. The fixed vector will restrict the representation ability of the recommender model, considering the diversity of target items and users' interests. In this paper, we propose a novel target attentive graph neural network (TAGNN) model for session-based recommendation. In TAGNN, target-aware attention adaptively activates different user interests with respect to varied target items. The learned interest representation vector varies with different target items, greatly improving the expressiveness of the model. Moreover, TAGNN harnesses the power of graph neural networks to capture rich item transitions in sessions. Comprehensive experiments conducted on real-world datasets demonstrate its superiority over state-of-the-art methods.
Abstract: E-commerce platforms greatly benefit from high-quality search that retrieves relevant results in response to search terms. For the sake of search relevance, Query Classification (QC) has been widely adopted to make search engines robust against low text quality and complex category hierarchies. Generally, QC solutions categorize search queries and direct users to the suggested categories from which the search results are then retrieved. In this way, the search scope is contextually constrained to increase search relevance. However, such operations risk deteriorating e-commerce metrics when irrelevant categories are suggested, so QC solutions are expected to demonstrate high accuracy. Unfortunately, existing QC methods mainly focus on the intrinsic performance of classifiers, while failing to consider post-inference optimization that could further improve reliability. To fill this research gap, we propose Query Classification with Multi-objective Backoff (QCMB). The proposed solution consists of two steps: 1) hierarchical text classification that classifies search queries into multi-level categories; and 2) multi-objective backoff that substitutes potentially misclassified leaf categories with appropriate ancestors, optimizing the trade-off between accuracy and depth. The proposed QCMB is evaluated using real-world search data from Trade Me, the largest e-commerce platform in New Zealand. Compared with the benchmarks, QCMB delivers superior solutions with flexible tuning to satisfy different users' demands. To the best of our knowledge, this work is the first attempt to enhance QC with multi-objective optimization.
Abstract: Recommender systems are among the most successful machine learning technologies for commerce. However, they can reinforce the closed feedback loop problem: the recommender system presents items to users, and the next recommendation model is then trained on the users' feedback to those items. Such a self-reinforcing pattern can cause data bias problems. Among the several debiasing methods, inverse propensity scoring (IPS) is a practical one for industry products, since it is relatively easy to reweight training samples and ameliorate the distribution shift problem. However, because of the deterministic policy problem and confounders in real-world data, it is hard to predict the propensity score accurately. Inspired by sample reweighting work for robust deep learning, we propose a novel influence function based method for recommendation modeling, and analyze how the influence function corrects the bias. In our experiments, the proposed method achieves better performance than state-of-the-art approaches.
Abstract: Conversational query rewriting aims to reformulate a concise conversational query to a fully specified, context-independent query that can be effectively handled by existing information retrieval systems. This paper presents a few-shot generative approach to conversational query rewriting. We develop two methods, based on rules and self-supervised learning, to generate weak supervision data using large amounts of ad hoc search sessions, and to fine-tune GPT-2 to rewrite conversational queries. On the TREC Conversational Assistance Track, our weakly supervised GPT-2 rewriter improves the state-of-the-art ranking accuracy by 12%, only using very limited amounts of manual query rewrites. In the zero-shot learning setting, the rewriter still gives a comparable result to previous state-of-the-art systems. Our analyses reveal that GPT-2 effectively picks up the task syntax and learns to capture context dependencies, even for hard cases that involve group references and long-turn dependencies.
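A minimal sketch of prompting GPT-2 for query rewriting via the transformers library; the "|||" separator and "Rewrite:" prompt format are assumptions of this sketch, and an off-the-shelf GPT-2 would need the weakly supervised fine-tuning described above to produce useful rewrites.

.. code-block:: python

    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # Conversation history followed by the concise query to rewrite; a
    # fine-tuned rewriter would be trained on such prompts paired with
    # the fully specified query as the continuation.
    prompt = "What is throat cancer? ||| Is it treatable? ||| Rewrite: "
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        **inputs, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id
    )
    # Decode only the generated continuation, i.e., the rewritten query.
    print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:]))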
Abstract: Recommendation reason generation, which aims to show the selling points of products to customers, plays a vital role in attracting customers' attention as well as improving user experience. A simple and effective way is to extract keywords directly from the knowledge base of products, i.e., attributes or titles, as the recommendation reason. However, generating recommendation reasons from product knowledge does not naturally reflect users' interests. Fortunately, on some E-commerce websites, there exists more and more user-generated content (user-content for short), i.e., product question-answering (QA) discussions, which reflect user-cared aspects. Therefore, in this paper, we consider generating the recommendation reason by taking into account not only the product attributes but also the customer-generated product QA discussions. In reality, adequate user-content is only available for the most popular commodities, whereas large numbers of long-tail or new products cannot gather a sufficient amount of user-content. To tackle this problem, we propose a user-inspired multi-source posterior transformer (MSPT), which induces the model to reflect users' interests with a posterior multiple-QA-discussions module, and to generate recommendation reasons containing the product attributes as well as the user-cared aspects. Experimental results show that our model is superior to traditional generative models. Additionally, the analysis shows that our model can focus more on the user-cared aspects than the baselines.
Abstract: Although BERT has shown its effectiveness in a number of IR-related tasks, especially document ranking, the understanding of its internal mechanism remains insufficient. To increase the explainability of the ranking process performed by BERT, we investigate a state-of-the-art BERT-based ranking model with a focus on its attention mechanism and interaction behavior. First, we look into the evolution of the attention distribution. It shows that at each step, BERT dumps redundant attention weights on tokens with high document frequency (such as periods). This may pose a potential threat to model robustness and should be considered in future studies. Second, we study how BERT models the interactions between query and document, and find that BERT aggregates document information into query token representations through their interactions, but extracts query-independent representations for document tokens. This indicates that it may be possible to transform BERT into a more efficient representation-focused model. These findings help us better understand the ranking process performed by BERT and may inspire future improvements.
Abstract: Multi-Choice Reading Comprehension (MCRC) is an essential task in which a machine selects the correct answer from multiple choices given a context document and a corresponding question. Existing methods usually make predictions based on a single-round reasoning process with an attention mechanism; however, this may be insufficient for tasks that require a more complex reasoning process. To effectively comprehend the context and select the correct answer from different perspectives, we propose the Read-Attend-Exclude (RAE) model, motivated by how human readers tackle MCRC through a multi-round reasoning process. Specifically, the RAE model includes four components: the Scan Reading Module, the Attended Intensive Reading Module, the Answer Exclusion Module, and the Gated Fusion Module, which makes the final decisions collectively based on the preceding three modules. Extensive experiments demonstrate the strong results of the proposed model on the DREAM dataset and the effectiveness of all proposed modules.
Abstract: Obtaining training data for Multi-Document Summarization (MDS) is time consuming and resource-intensive, so recent neural models can only be trained for limited domains. In this paper, we propose SummPip: an unsupervised method for multi-document summarization, in which we convert the original documents to a sentence graph, taking both linguistic and deep representations into account, then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary. Experiments on the Multi-News and DUC-2004 datasets show that our method is competitive with previous unsupervised methods and is even comparable to neural supervised approaches. In addition, human evaluation shows our system produces consistent and complete summaries compared to human-written ones.
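A minimal sketch of the cluster-then-compress idea, using plain tf-idf cosine similarities as the sentence graph; SummPip's actual graph combines linguistic links with deep representations, and the cluster count here is an arbitrary choice.

.. code-block:: python

    from sklearn.cluster import SpectralClustering
    from sklearn.feature_extraction.text import TfidfVectorizer

    sentences = [
        "The storm hit the coast on Monday.",
        "A hurricane made landfall early this week.",
        "Officials urged residents to evacuate.",
        "Evacuation orders were issued by the authorities.",
    ]
    # Build a sentence similarity graph; tf-idf rows are L2-normalized,
    # so the Gram matrix holds cosine similarities.
    tfidf = TfidfVectorizer().fit_transform(sentences)
    affinity = (tfidf @ tfidf.T).toarray()

    labels = SpectralClustering(
        n_clusters=2, affinity="precomputed", random_state=0
    ).fit_predict(affinity)
    for cluster in range(2):  # each cluster would then be compressed
        print(cluster, [s for s, l in zip(sentences, labels) if l == cluster])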
Abstract: Chinese word segmentation (CWS) is an important research topic in information retrieval (IR) and natural language processing (NLP). Significant progress has been made by deep neural networks with context features. However, these deep models may fail to deal with rare or ambiguous words, thus limiting the overall CWS performance. In this paper, we propose a lexicon-enhanced adaptive attention network (LAAN), which takes full advantage of external lexicons to deal with rare or ambiguous words. Specifically, we devise an adaptive attention mechanism to learn the lexicon-aware representation. In addition, we propose a fusion gate to effectively integrate the additional word information with context information to improve the performance of CWS. LAAN is evaluated on four benchmark datasets, and the experimental results demonstrate that LAAN has robust superiority over the compared methods.
Abstract: Existing sequential recommendation methods focus on modeling the temporal relationships of user behaviors and are good at using additional item information to improve performance. However, these methods rarely consider the influence of users' sequential subjective sentiments on their behaviors, even though temporal changes in human sentiment patterns sometimes play a decisive role in users' final preferences. To investigate the influence of temporal sentiments on user preferences, we propose generating preferences by guiding user behavior through sequential sentiments. Specifically, we design a dual-channel fusion mechanism. The main channel consists of sentiment-guided attention to match and guide sequential user behavior, and the secondary channel consists of sparse sentiment attention to assist in preference generation. In the experiments, we demonstrate the effectiveness of these two sentiment modeling mechanisms through ablation studies. Our approach outperforms current state-of-the-art sequential recommendation methods that incorporate sentiment factors.
Abstract: The abductive natural language inference task (αNLI) is proposed to evaluate the abductive reasoning ability of a learning system. In the αNLI task, two observations are given, and the goal is to pick the most plausible hypothesis out of a set of candidates. Existing methods simply formulate it as a classification problem, and thus a cross-entropy log-loss objective is used during training. However, discriminating true from false does not measure the plausibility of a hypothesis, since all hypotheses have a chance of happening; only their probabilities differ. To fill this gap, we switch to a ranking perspective that sorts the hypotheses in order of their plausibility. With this new perspective, a novel L2R2 approach is proposed under the learning-to-rank framework. First, training samples are reorganized into a ranking form, where the two observations and their hypotheses are treated as the query and a set of candidate documents, respectively. Then, an ESIM model or a pre-trained language model, e.g., BERT or RoBERTa, is adopted as the scoring function. Finally, the loss function for the ranking task can be either pair-wise or list-wise during training. Experimental results on the ART dataset reach the state of the art on the public leaderboard.
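The pair-wise variant of the ranking objective described above can be sketched directly; the scores here are dummies standing in for a BERT- or ESIM-style plausibility scorer, and the margin value is an arbitrary choice.

.. code-block:: python

    import torch
    import torch.nn as nn

    # Plausibility scores for a more plausible hypothesis and a less
    # plausible one, given the same pair of observations.
    score_pos = torch.tensor([2.1, 0.3, 1.5])
    score_neg = torch.tensor([1.0, 0.9, 1.4])

    # Pair-wise hinge loss: push the more plausible hypothesis above
    # the less plausible one by a margin, instead of classifying each
    # hypothesis as true or false in isolation.
    loss_fn = nn.MarginRankingLoss(margin=1.0)
    target = torch.ones_like(score_pos)   # +1 means score_pos should rank higher
    print(loss_fn(score_pos, score_neg, target))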
Abstract: Topic modelling is a popular unsupervised method for identifying the underlying themes in document collections that has many applications in information retrieval. A topic is usually represented by a list of terms ranked by their probability but, since these can be difficult to interpret, various approaches have been developed to assign descriptive labels to topics. Previous work on the automatic assignment of labels to topics has relied on a two-stage approach: (1) candidate labels are retrieved from a large pool (e.g. Wikipedia article titles); and then (2) re-ranked based on their semantic similarity to the topic terms. However, these extractive approaches can only assign candidate labels from a restricted set that may not include any suitable ones. This paper proposes using a sequence-to-sequence neural-based approach to generate labels that does not suffer from this limitation. The model is trained over a new large synthetic dataset created using distant supervision. The method is evaluated by comparing the labels it generates to ones rated by humans.
Abstract: Snippets are used in web search to help users assess the relevance of retrieved results to their query. Recently, specialized search engines have arisen that retrieve pro and con arguments on controversial issues. We argue that standard snippet generation is insufficient to represent the core reasoning of an argument. In this paper, we introduce the task of generating a snippet that represents the main claim and reason of an argument. We propose a query-independent extractive summarization approach to this task that uses a variant of PageRank to assess the importance of sentences based on their context and argumentativeness. In both automatic and manual evaluation, our approach outperforms strong baselines.
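A minimal sketch of the underlying PageRank recursion over a sentence similarity graph; the paper's variant additionally weights sentences by context and argumentativeness, which is omitted here, and the similarity values are dummies.

.. code-block:: python

    import numpy as np

    def pagerank(sim, damping=0.85, iters=50):
        """Power-iteration PageRank over a sentence similarity matrix;
        high-scoring sentences are candidates for the argument snippet."""
        n = sim.shape[0]
        # Row-normalize similarities into transition probabilities.
        M = sim / sim.sum(axis=1, keepdims=True)
        rank = np.full(n, 1.0 / n)
        for _ in range(iters):
            rank = (1 - damping) / n + damping * M.T @ rank
        return rank

    sim = np.array([[1.0, 0.5, 0.1],
                    [0.5, 1.0, 0.2],
                    [0.1, 0.2, 1.0]])
    print(pagerank(sim))  # one importance score per sentence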
Abstract: Looking for health information is one of the most popular activities online. However, the specificity of the language in this domain is frequently an obstacle to comprehension, especially for those with lower levels of health literacy. For this reason, search engines should consider the readability of health content and, if possible, adapt it to the user behind the search. In this work, we explore methods to automatically assess the readability of health content. We propose features capable of measuring the specificity of a medical text and of estimating the knowledge necessary to comprehend it. The features are based on information retrieval metrics and on the log-likelihood of a text under lay and medico-scientific language models. To evaluate our methods, we built and used a dataset composed of health articles from Simple English Wikipedia and the respective documents in ordinary Wikipedia. We achieved a maximum accuracy of 88% in binary classification (easy versus hard-to-read). We found that the choice of machine learning algorithm does not significantly affect performance. We also experimented with and compared different feature combinations. The features based on the log-likelihood of a text under lay and medico-scientific language models perform better than all the others.
Abstract: The Information Retrieval (IR) community has witnessed a flourishing development of deep neural networks; however, only a few have managed to beat strong baselines. Among them, models like DRMM and DUET were able to achieve better results thanks to their proper handling of exact match signals. Nowadays, the application of pre-trained language models to IR tasks has achieved impressive results exceeding all previous work. In this paper, we assume that established IR cues like exact term matching, proven to be valuable for deep neural models, can be used to augment the direct supervision from labeled data for training these pre-trained models. To study the effectiveness of this assumption, we propose MarkedBERT, a modified version of BERT, one of the most popular models pre-trained via language modeling tasks. MarkedBERT integrates exact match signals using a marking technique that locates and highlights exact-matched query-document terms with marker tokens. Experiments on the MS MARCO Passage Ranking task show that our rather simple approach is actually effective. We find that augmenting the input with marker tokens allows the model to focus on valuable text sequences for IR.
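The marking technique can be illustrated with a few lines of input preprocessing; the [e]/[/e] marker strings here are placeholders of this sketch rather than necessarily the paper's exact tokens, and real use would also add them to the tokenizer vocabulary.

.. code-block:: python

    def mark_exact_matches(query, document, open_tok="[e]", close_tok="[/e]"):
        """Wrap document terms that exactly match a query term in marker
        tokens, making the exact-match signal explicit in the input."""
        query_terms = {t.lower() for t in query.split()}
        marked = [
            f"{open_tok} {w} {close_tok}" if w.lower() in query_terms else w
            for w in document.split()
        ]
        return " ".join(marked)

    print(mark_exact_matches(
        "neural ranking", "A neural model for document ranking tasks"))
    # A [e] neural [/e] model for document [e] ranking [/e] tasks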
Abstract: Trip-qualifiers, such as 'trip-type' (vacation, work, etc.) and 'accompanied-by' (solo, friends, family, etc.), are potentially useful sources of information for improving the effectiveness of POI recommendation in a current context (with a given set of these constraints). Using such information is not straightforward because a user's text reviews of the POIs visited in the past do not explicitly contain such annotations (e.g., a positive review about a pub visit does not say whether the user was with friends or alone, or on a business trip or vacation). We propose to use a small, manually compiled knowledge resource to predict the associations between the review texts in a user profile and the likely trip contexts. We demonstrate that incorporating this information within an IR-based relevance modeling framework significantly improves POI recommendation.
Abstract: CAsT-19 is a new dataset that supports research on conversational information seeking. The corpus is 38,426,252 passages from the TREC Complex Answer Retrieval (CAR) and Microsoft MAchine Reading COmprehension (MARCO) datasets. Eighty information seeking dialogues (30 train, 50 test) are an average of 9 to 10 questions long. A dialogue may explore a topic broadly or drill down into subtopics. Questions contain ellipsis, implied context, mild topic shifts, and other characteristics of human conversation that may prevent them from being understood in isolation. Relevance assessments are provided for 30 training topics and 20 test topics. CAsT-19 promotes research on conversational information seeking by defining it as a task in which effective passage selection requires understanding a question's context (the dialogue history). It focuses attention on user modeling, analysis of prior retrieval results, transformation of questions into effective queries, and other topics that have been difficult to study with existing datasets.
Abstract: Perceptual Speed (PS) is a cognitive ability that is known to affect multiple factors in Information Retrieval (IR), such as a user's search performance and subjective experience. However, PS tests are difficult to administer, which limits the design of user-adaptive systems that can automatically infer PS to appropriately accommodate low-PS users. Consequently, this paper evaluated whether PS can be automatically classified from search behaviour using several machine learning models trained on features extracted from TREC Common Core search task logs. Our results are encouraging: given a user's interactions from one query, a Decision Tree was able to predict a user's PS as low or high with 86% accuracy. Additionally, we identified different behavioural components for specific PS tests, implying that each PS test measures different aspects of a person's cognitive ability. These findings motivate further work on how best to design search systems that can adapt to individual differences.
Abstract: Document indexing is a key component of efficient information retrieval (IR). After preprocessing steps such as stemming and stop-word removal, document indexes usually store term frequencies (tf). Along with tf (which only reflects the importance of a term in a document), traditional IR models use term discrimination values (TDVs) such as inverse document frequency (idf) to favor discriminative terms during retrieval. In this work, we propose to learn TDVs for document indexing with shallow neural networks that approximate traditional IR ranking functions such as TF-IDF and BM25. Our proposal outperforms traditional approaches in terms of both nDCG and recall, even with few positively labelled query-document pairs as learning data. When used to filter out vocabulary terms with zero discrimination value, our learned TDVs allow us both to significantly lower the memory footprint of the inverted index and to speed up the retrieval process (BM25 is up to 3 times faster), without degrading retrieval quality.
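A toy sketch of learnable TDVs, assuming one ReLU-gated weight per vocabulary term so that training can drive useless terms to exactly zero and drop them from the index; the paper's networks are trained to approximate full TF-IDF/BM25 scores, which this simplified scoring function only gestures at.

.. code-block:: python

    import torch
    import torch.nn as nn

    vocab_size = 10_000
    # One learnable discrimination value per vocabulary term; the ReLU
    # lets training push useless terms to exactly zero, after which
    # their postings can be removed from the inverted index.
    tdv = nn.Sequential(nn.Embedding(vocab_size, 1), nn.ReLU())

    def score(query_ids, doc_tf):
        """Toy TDV-weighted ranking score: sum over query terms of
        tdv(term) * tf(term, doc)."""
        weights = tdv(query_ids).squeeze(-1)   # (num_query_terms,)
        return (weights * doc_tf).sum()

    q = torch.tensor([12, 905, 4021])          # query term ids
    tf = torch.tensor([3.0, 1.0, 0.0])         # their tf in one document
    print(score(q, tf))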
Abstract: Learning to rank (LTR) is the de facto standard for web search, improving upon classical retrieval models by exploiting (in)direct relevance feedback from user judgments, interaction logs, etc. We investigate for the first time the effect of a sampling bias on LTR models due to the potential presence of near-duplicate web pages in the training data, and how (in)consistent relevance feedback of duplicates influences an LTR model's decisions. To examine this bias, we construct a series of specialized LTR datasets based on the ClueWeb09 corpus with varying amounts of near-duplicates. We devise worst-case and average-case train/test splits that are evaluated on popular pointwise, pairwise, and listwise LTR models. Our experiments demonstrate that duplication causes overfitting and thus less effective models, making a strong case for the benefits of systematic deduplication before training and model evaluation.
Abstract: Top-N recommendations are widely applied in various real-life domains and keep attracting intense attention from researchers and industry due to available multi-type information, new advances in AI models, and a deeper understanding of user satisfaction. While accuracy has been the prevailing issue of the recommendation problem for the last decades, other facets of the problem, namely diversity and explainability, have received much less attention. In this paper, we focus on enhancing the diversity of top-N recommendation, while ensuring the trade-off between accuracy and diversity. Thus, we propose an effective framework, DivKG, leveraging knowledge graph embedding and determinantal point processes (DPP). First, we capture different kinds of relations among users, items and additional entities through a knowledge graph structure. Then, we represent both entities and relations as k-dimensional vectors by optimizing a margin-based loss with all kinds of historical interactions. We use these representations to construct kernel matrices of a DPP in order to make top-N diversified predictions. We evaluate our framework on MovieLens datasets coupled with the IMDb dataset. Our empirical results show substantial improvement over the state-of-the-art regarding both accuracy and diversity metrics.
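For illustration, here is a small sketch of greedy MAP inference for a DPP whose kernel combines relevance scores with embedding similarities; the kernel construction and all data are assumptions for this example, not DivKG's actual code.

.. code-block:: python

    import numpy as np

    def greedy_dpp(L, k):
        """Greedily add the item that most increases the log-determinant
        of the kernel restricted to the selected set."""
        selected, candidates = [], list(range(L.shape[0]))
        for _ in range(k):
            best, best_gain = None, -np.inf
            for i in candidates:
                idx = selected + [i]
                gain = np.linalg.slogdet(L[np.ix_(idx, idx)])[1]
                if gain > best_gain:
                    best, best_gain = i, gain
            selected.append(best)
            candidates.remove(best)
        return selected

    # Toy kernel L = diag(q) @ S @ diag(q): q plays the role of relevance,
    # S is cosine similarity between (hypothetical) KG embeddings.
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(8, 4))
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)
    S = emb @ emb.T + 1e-6 * np.eye(8)      # jitter keeps L positive definite
    q = rng.uniform(0.5, 1.5, size=8)
    L = np.outer(q, q) * S

    print("diversified top-4:", greedy_dpp(L, 4))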
Abstract: Tools capable of automatic code generation have the potential to augment programmers' capabilities. While straightforward code retrieval is incorporated into many IDEs, an emerging area is explicit code generation. Code generation is currently approached as a Machine Translation task, with Recurrent Neural Network (RNN) based encoder-decoder architectures trained on code-description pairs. In this work we introduce and study modern Transformer architectures for this task. We further propose a new model called the Relevance Transformer that incorporates external knowledge using pseudo-relevance feedback. The Relevance Transformer biases the decoding process to be similar to existing retrieved code while enforcing diversity. We perform experiments on multiple standard benchmark datasets for code generation, including Django, Hearthstone, and CoNaLa. The results show improvements over state-of-the-art methods based on BLEU evaluation. The Relevance Transformer model shows the potential of Transformer-based architectures for code generation and introduces a method of incorporating pseudo-relevance feedback during inference.
Abstract: Semantic Hashing is a popular family of methods for efficient similarity search in large-scale datasets. In Semantic Hashing, documents are encoded as short binary vectors (i.e., hash codes), such that semantic similarity can be efficiently computed using the Hamming distance. Recent state-of-the-art approaches have utilized weak supervision to train better performing hashing models. Inspired by this, we present Semantic Hashing with Pairwise Reconstruction (PairRec), which is a discrete variational autoencoder based hashing model. PairRec first encodes weakly supervised training pairs (a query document and a semantically similar document) into two hash codes, and then learns to reconstruct the same query document from both of these hash codes (i.e., pairwise reconstruction). This pairwise reconstruction enables our model to encode local neighbourhood structures within the hash code directly through the decoder. We experimentally compare PairRec to traditional and state-of-the-art approaches, and obtain significant performance improvements in the task of document similarity search.
Abstract: We study whether it is possible to infer if a news headline is true or false using only the movement of the human eyes when reading news headlines. Our study with 55 participants who are eye-tracked when reading 108 news headlines (72 true, 36 false) shows that false headlines receive statistically significantly less visual attention than true headlines. We further build an ensemble learner that predicts news headline factuality using only eye-tracking measurements. Our model yields a mean AUC of 0.688 and is better at detecting false than true headlines. Through a model analysis, we find that eye-tracking 25 users when reading 3-6 headlines is sufficient for our ensemble learner.
Abstract: The scarcity of Arabic test collections has long hindered information retrieval (IR) research over the Arabic Web. In this work, we present ArTest, the first large-scale test collection designed for the evaluation of ad-hoc search over the Arabic Web. ArTest uses ArabicWeb16, a collection of around 150M Arabic Web pages, as the document collection, and includes 50 topics, 10,529 relevance judgments, and (more importantly) a rationale behind each judgment. To our knowledge, this is also the first IR test collection that includes the rationales of primary assessors (i.e., topic developers) for their relevance judgments, providing a useful resource for understanding the relevance phenomenon. Finally, ArTest is made publicly available to the research community.
Abstract: Neural networks, particularly Transformer-based architectures, have achieved significant performance improvements on several retrieval benchmarks. When the items being retrieved are documents, the time and memory cost of employing Transformers over a full sequence of document terms can be prohibitive. A popular strategy involves considering only the first n terms of the document. This can, however, result in a biased system that under-retrieves longer documents. In this work, we propose a local self-attention which considers a moving window over the document terms and for each term attends only to other terms in the same window. This local attention incurs a fraction of the compute and memory cost of attention over the whole document. The windowed approach also leads to more compact packing of padded documents in minibatches, resulting in additional savings. We also employ a learned saturation function and a two-staged pooling strategy to identify relevant regions of the document. The Transformer-Kernel pooling model with these changes can efficiently elicit relevance information from documents with thousands of tokens. We benchmark our proposed modifications on the document ranking task from the TREC 2019 Deep Learning track and observe significant improvements in retrieval quality as well as increased retrieval of longer documents at a moderate increase in compute and memory costs.
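A minimal sketch of the banded (windowed) attention pattern described above, assuming a single head and toy sizes; only the masking idea is essential.

.. code-block:: python

    import numpy as np

    def local_attention(Q, K, V, window):
        """Single-head attention where token i attends only to tokens j
        with |i - j| <= window, i.e. a banded attention matrix."""
        n, d = Q.shape
        scores = Q @ K.T / np.sqrt(d)
        i, j = np.indices((n, n))
        scores[np.abs(i - j) > window] = -1e9   # mask tokens outside the window
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        return weights @ V

    rng = np.random.default_rng(0)
    n, d = 12, 8                     # 12 tokens, 8-dimensional head (toy sizes)
    Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
    out = local_attention(Q, K, V, window=2)
    print(out.shape)                 # (12, 8); each token sees at most 5 tokens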
Abstract: Recommendation systems are often trained and evaluated based on users' interactions obtained through the use of an existing, already deployed, recommendation system. The deployed system will recommend some items and not others, so items have varying levels of exposure to users. As a result, the collected feedback dataset (including most public datasets) can be skewed towards the particular items favored by the deployed model. In this manner, training new recommender systems from interaction data obtained from a previous model creates a feedback loop, i.e., closed loop feedback. In this paper, we first introduce closed loop feedback and then investigate its effect on both the training and offline evaluation of recommendation models, in contrast to a further exploration of the users' preferences (obtained from randomly presented items). To achieve this, we make use of open loop datasets, where randomly selected items are presented to users for feedback. Our experiments using an open loop Yahoo! dataset reveal that there is a strong correlation between the deployed model and a new model that is trained based on the closed loop feedback. Moreover, with the aid of exploration we can decrease the effect of closed loop feedback and obtain new and better generalizable models.
Abstract: Users' historical interactions usually contain their interests and purchase habits, based on which personalised recommendations can be made. However, such user interactions are often sparse, leading to the well-known cold-start problem when a user has no or very few interactions. In this paper, we propose a new recommendation model, named Heterogeneous Graph Neural Recommender (HGNR), to tackle the cold-start problem while ensuring effective recommendations for all users. Our HGNR model learns user and item embeddings by using a Graph Convolutional Network on a heterogeneous graph, which is constructed from user-item interactions, social links, and semantic links predicted from the social network and textual reviews. Our extensive empirical experiments on three public datasets demonstrate that HGNR significantly outperforms competitive baselines in terms of the Normalised Discounted Cumulative Gain and Hit Ratio measures.
Abstract: Search engine ranking pipelines are commonly based on large ensembles of machine-learned decision trees. The tight constraints on query response time have recently motivated researchers to investigate algorithms that speed up the traversal of the additive ensemble or that terminate early the evaluation of documents that are unlikely to be ranked among the top-k. In this paper, we investigate the novel problem of query-level early exiting, aimed at deciding the profitability of early stopping the traversal of the ranking ensemble for all the candidate documents to be scored for a query, by simply returning a ranking based on the additive scores computed by a limited portion of the ensemble. Besides the obvious advantages in query latency and throughput, we address the possible positive impact on ranking effectiveness. To this end, we study the actual contribution of incremental portions of the tree ensemble to the ranking of the top-k documents scored for a given query. Our main finding is that queries exhibit different behaviors as scores are accumulated during the traversal of the ensemble and that query-level early stopping can remarkably improve ranking quality. We present a reproducible and comprehensive experimental evaluation, conducted on two public datasets, showing that query-level early exiting achieves an overall gain of up to 7.5% in terms of NDCG@10 with a speedup of the scoring process of up to 2.2x.
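A simplified sketch of query-level early exiting over an additive tree ensemble; the margin-based exit test below is a hypothetical stand-in for the paper's profitability criterion.

.. code-block:: python

    def score_with_early_exit(docs, trees, exit_point, margin, k=3):
        """Accumulate tree scores; after `exit_point` trees, stop early if
        the top-k docs are separated from the rest by at least `margin`."""
        scores = [0.0] * len(docs)
        used = 0
        for t, tree in enumerate(trees, start=1):
            for i, doc in enumerate(docs):
                scores[i] += tree(doc)
            used = t
            if t == exit_point:
                ranked = sorted(scores, reverse=True)
                if len(ranked) > k and ranked[k - 1] - ranked[k] >= margin:
                    break              # partial scores already decide the top-k
        order = sorted(range(len(docs)), key=lambda i: -scores[i])
        return order[:k], used

    # Toy ensemble: each "tree" is just a function doc -> partial score.
    trees = [lambda d, w=w: w * d["feature"] for w in (0.5, 0.4, 0.3, 0.2, 0.1)]
    docs = [{"feature": f} for f in (9.0, 1.0, 8.5, 0.5, 7.0, 0.2)]
    topk, used = score_with_early_exit(docs, trees, exit_point=2, margin=0.5)
    print(f"used {used} of {len(trees)} trees; top-3 docs: {topk}")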
Abstract: To fulfill their information needs, users submit sets of related queries to available search engines. Query logs record users' activities along with timestamps and additional search-related information. The analysis of these chronological query logs enables the modeling of search tasks from user interactions. Previous research relies on clicked URLs and surrounding queries to determine whether adjacent queries are part of the same search task, in order to segment the query logs properly. However, waiting for clicked URLs or future adjacent queries can render these methods infeasible in user-supporting applications that require model results on the fly. Therefore, we propose a model for sequential search log segmentation. The proposed model uses only query pairs and their time span, generating results suited for on-the-fly user-supporting applications, with improved accuracy over existing search segmentation approaches. We also show the advantages of fine-tuning the proposed model to adjust the architecture to a small annotated collection.
Abstract: Users convert their information needs into search queries, which are then run on available search engines. Query logs registered by search engines enable the automatic identification of the search tasks that users perform to fulfill their information needs. Search engine logs contain queries in multiple languages, but most existing methods for search task identification are not multilingual. Some methods rely on search-context training of custom embeddings or on external indexed collections that support a single language, making it challenging to support the multiple languages of queries run in search engines. Other methods depend on supervised components and user identifiers to model search tasks. The supervised components require labeled collections, which are difficult and costly to obtain in multiple languages. Also, the need for user identifiers renders these methods infeasible in user-agnostic scenarios. Hence, we propose an unsupervised multilingual approach for search task identification. The proposed approach is user agnostic, enabling its use in both user-independent and personalized scenarios. Furthermore, the multilingual query representation enables us to address the existing trade-off when mapping new queries to the identified search tasks.
Abstract: Personalised top-N item recommendation systems aim to generate a ranked list of interesting items for users based on their interactions (e.g. clicks, purchases and ratings). Recently, various sequential-based factorised approaches have been proposed to exploit deep neural networks to effectively capture users' dynamic preferences from their sequences of interactions. These factorised approaches usually rely on a pairwise ranking objective such as Bayesian Personalised Ranking (BPR) for optimisation. However, previous works have shown that optimising factorised approaches with BPR can hinder generalisation, which can degrade the quality of item recommendations. To address this challenge, we propose a Sequential-based Adversarial Optimisation (SAO) framework that effectively enhances the generalisation of sequential-based factorised approaches. Comprehensive experiments on six public datasets demonstrate the effectiveness of the SAO framework in enhancing the performance of the state-of-the-art sequential-based factorised approach in terms of NDCG by 3-14%.
Abstract: Task-based Virtual Personal Assistants (VPAs) such as the Google Assistant, Alexa, and Siri are increasingly being adopted for a wide variety of tasks. These tasks are grounded in real-world entities and actions (e.g., book a hotel, organise a conference, or request funds). In this work, we tackle the task of automatically constructing actionable knowledge graphs in response to a user query in order to support a wider variety of increasingly complex assistant tasks. We frame this as an entity property ranking task given a user query with annotated properties. We propose a new method for property ranking, CrossBERT. CrossBERT builds on the Bidirectional Encoder Representations from Transformers (BERT) and creates a new triplet network structure on cross query-property pairs that is used to rank properties. We also study the impact of using external evidence for query entities from textual entity descriptions. We perform experiments on two standard benchmark collections, the NTCIR-13 Actionable Knowledge Graph Generation (AKGG) task and the Entity Property Identification (EPI) task. The results demonstrate that CrossBERT significantly outperforms the best performing runs from AKGG and EPI, as well as previous state-of-the-art BERT-based models. In particular, CrossBERT significantly improves Recall and NDCG by approximately 2-12% over the BERT models across the two datasets.
Abstract: Active learning strategies are often deployed in technology-assisted review tasks, such as e-discovery and sensitivity review, to learn a classifier that can assist the reviewers with their task. In particular, an active learning strategy selects the documents that are expected to be the most useful for learning an effective classifier, so that these documents can be reviewed before the less useful ones. However, when reviewing for sensitivity, the order in which the documents are reviewed can impact the reviewers' ability to perform the review. Therefore, when deploying active learning in technology-assisted sensitivity review, we want to know when a sufficiently effective classifier has been learned, such that the active learning can stop and the reviewing order of the documents can be selected by the reviewer instead of the classifier. In this work, we propose two active learning stopping strategies for technology-assisted sensitivity review. We evaluate the effectiveness of our proposed approaches in comparison with three state-of-the-art stopping strategies from the literature. We show that our best performing approach results in a significantly more effective sensitivity classifier (+6.6% F2) than the best performing stopping strategy from the literature (McNemar's test, p<0.05).
Abstract: In a conversational context, a user expresses her multi-faceted information need as a sequence of natural-language questions, i.e., utterances. Starting from a given topic, the conversation evolves through user utterances and system replies. The retrieval of documents relevant to a given utterance in a conversation is challenging due to the ambiguity of natural language and to the difficulty of detecting possible topic shifts and semantic relationships among utterances. We adopt the 2019 TREC Conversational Assistant Track (CAsT) framework to experiment with a modular architecture performing: (i) topic-aware utterance rewriting, (ii) retrieval of candidate passages for the rewritten utterances, and (iii) neural-based re-ranking of candidate passages. We present a comprehensive experimental evaluation of the architecture assessed in terms of traditional IR metrics at small cutoffs. Experimental results show the effectiveness of our techniques, which achieve an improvement of up to $0.28$ (+93%) for P@1 and $0.19$ (+89.9%) for nDCG@3 w.r.t. the CAsT baseline.
Abstract: Recent literature on ranking systems (RS) has considered users' exposure when they are the object of the ranking. Although items are the object of reputation-based RS, users also play a central role in this class of algorithms: when ranking the items, user preferences are weighted by how relevant each user is on the platform (i.e., their reputation). In this paper, we formulate the concept of disparate reputation (DR) and study whether users characterized by sensitive attributes systematically get a lower reputation, leading to a final ranking that reflects their preferences less. We consider two demographic attributes, i.e., gender and age, and show that DR systematically occurs. Then, we propose a mitigation strategy that ensures that reputation is independent of the users' sensitive attributes. Experiments on real-world data show that our approach can overcome DR and also improve ranking effectiveness.
Abstract: Concerns regarding the footprint of societal biases in information retrieval (IR) systems have been raised in several previous studies. In this work, we examine various recent IR models from the perspective of the degree of gender bias in their retrieval results. To this end, we first provide a bias measurement framework that includes two metrics to quantify the degree of the unbalanced presence of gender-related concepts in a given IR model's ranking list. To examine IR models by means of the framework, we create a dataset of non-gendered queries, selected by human annotators. Applying these queries to the MS MARCO Passage retrieval collection, we then measure the gender bias of a BM25 model and several recent neural ranking models. The results show that while all models are strongly biased toward males, the neural models, and in particular the ones based on contextualized embedding models, significantly intensify gender bias. Our experiments also show an overall increase in the gender bias of neural models when they exploit transfer learning, namely when they use (already biased) pre-trained embeddings.
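As a rough illustration of quantifying the unbalanced presence of gendered terms in a ranking (the word lists and normalization below are invented for this sketch and are not the paper's metrics):

.. code-block:: python

    MALE = {"he", "him", "his", "man", "men"}
    FEMALE = {"she", "her", "hers", "woman", "women"}

    def gender_bias_at_k(ranked_docs, k=10):
        """Average (male - female) term-frequency difference over the top-k
        documents; positive values indicate a male skew."""
        bias = 0.0
        for doc in ranked_docs[:k]:
            tokens = doc.lower().split()
            m = sum(tokens.count(w) for w in MALE)
            f = sum(tokens.count(w) for w in FEMALE)
            bias += (m - f) / max(len(tokens), 1)
        return bias / k

    ranking = ["he said the man left", "she and her team won", "the men agreed"]
    print(round(gender_bias_at_k(ranking, k=3), 4))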
Abstract: It is often useful for an IR practitioner to analyze the similarity function of an IR model, or for a non-technical search engine user to understand why a document was shown at a certain rank, in terms of the three fundamental aspects of a similarity function, namely a) the frequency of a term in a document, b) the frequency of a term in a collection, and c) the length of a document. We propose a general methodology for approximating an IR model as the coefficients of a linear function of these three fundamental aspects (and an additional aspect of semantic similarity between terms for neural models), which can potentially help IR practitioners to optimize the relative importance of each aspect for specific document collections and types of queries. Our analysis shows that the coefficients, which represent the relative importance of the three fundamental aspects, are useful for comparing a model's different parametric instantiations or comparing across different models.
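A minimal sketch of the approximation idea, assuming access to (tf, idf, document length) features and the black-box model's scores; a least-squares fit then recovers the coefficients.

.. code-block:: python

    import numpy as np

    def explain_scorer(features, scores):
        """Fit a black-box relevance score as a linear combination of
        (tf, idf, doc_length) plus an intercept, via least squares."""
        A = np.hstack([features, np.ones((len(features), 1))])
        coef, *_ = np.linalg.lstsq(A, scores, rcond=None)
        return dict(zip(("tf", "idf", "dlen", "bias"), coef))

    rng = np.random.default_rng(0)
    feats = rng.uniform(size=(200, 3)) * [10, 8, 500]   # tf, idf, doc length
    # Stand-in "black box": a BM25-ish scorer we pretend we cannot inspect.
    scores = 1.2 * feats[:, 0] + 0.9 * feats[:, 1] - 0.004 * feats[:, 2]
    print(explain_scorer(feats, scores))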
Abstract: Recent advances in machine learning have led to emerging new approaches to deal with different kinds of biases that exist in the data. On the one hand, counterfactual learning copes with biases in the policy used for sampling (or logging) the data in order to evaluate and learn new policies. On the other hand, fairness-aware learning aims at learning fair models to avoid discrimination against certain individuals or groups. In this paper, we design a counterfactual framework to model fairness-aware learning, which benefits from counterfactual reasoning to achieve fairer decision support systems. We utilize a definition of fairness to determine the bandit feedback in the counterfactual setting that learns a classification strategy from the offline data, and balances classification performance against a fairness measure. In the experiments, we demonstrate that a counterfactual setting can be effectively exploited to learn fair models with competitive results compared to a well-known baseline system.
Abstract: Rankings are at the core of countless modern applications and thus play a major role in various decision making scenarios. When such rankings are produced by data-informed, machine learning-based algorithms, the potentially harmful biases contained in the data and algorithms are likely to be reproduced and even exacerbated. This has motivated recent research to investigate methodologies for fair ranking as a way to correct the aforementioned biases. Current approaches to fair ranking assume that the protected groups, i.e., the partition of the population potentially impacted by the biases, are known. However, in a realistic scenario, this assumption might not hold, as different biases may lead to different partitionings into protected groups. Accounting for only one such partition (i.e., grouping) would still lead to potential unfairness with respect to the other possible groupings. Therefore, in this paper, we study the problem of designing fair ranking algorithms without knowing in advance the groupings that will be used later to assess their fairness. The approach that we follow is to rely on a carefully chosen set of groupings when deriving the ranked lists, and we empirically investigate which selection strategies are the most effective. An efficient two-step greedy brute-force method is also proposed to implement our strategy. As a benchmark for this study, we adopt the dataset and setting of the TREC 2019 Fair Ranking track.
Abstract: Response retrieval is a subset of neural ranking in which a model selects a suitable response from a set of candidates given a conversation history. Retrieval-based chatbots are typically employed in information-seeking conversational systems such as customer support agents. To make pairwise comparisons between a conversation history and a candidate response, two approaches are common: cross-encoders performing full self-attention over the pair, and bi-encoders encoding the pair separately. The former gives better prediction quality but is too slow for practical use. In this paper, we propose a new cross-encoder architecture and transfer knowledge from this model to a bi-encoder model using distillation. This effectively boosts bi-encoder performance at no cost during inference time. We perform a detailed analysis of this approach on three response retrieval datasets.
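One plausible form of such distillation is to match the bi-encoder's score distribution over candidate responses to the cross-encoder's; the temperature-scaled KL below is a common choice, sketched as an assumption rather than the paper's exact loss.

.. code-block:: python

    import numpy as np

    def softmax(x):
        e = np.exp(x - np.max(x))
        return e / e.sum()

    def distill_loss(teacher_scores, student_scores, temperature=2.0):
        """KL(teacher || student) over the candidate distribution: the
        bi-encoder (student) learns to match the cross-encoder's (teacher's)
        soft ranking of the same candidates."""
        p = softmax(np.asarray(teacher_scores) / temperature)
        q = softmax(np.asarray(student_scores) / temperature)
        return float(np.sum(p * np.log(p / q)))

    teacher = [4.1, 1.3, 0.2, -0.5]   # cross-encoder scores for 4 candidates
    student = [2.0, 1.8, 0.1, -0.2]   # bi-encoder dot products, same candidates
    print(round(distill_loss(teacher, student), 4))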
Abstract: The ability to engage in mixed-initiative interaction is one of the core requirements for a conversational search system, yet how to achieve this is poorly understood. We propose a set of unsupervised metrics, termed ConversationShape, that highlights the role each conversation participant plays by comparing the distributions of vocabulary and utterance types. Using ConversationShape as a lens, we take a closer look at several conversational search datasets and compare them with other dialogue datasets to better understand the types of dialogue interaction they represent, whether driven by the information seeker or the assistant. We discover that deviations from the ConversationShape of a human-human dialogue of the same type are predictive of the quality of a human-machine dialogue.
Abstract: Unbiased counterfactual learning to rank (CLTR) requires click propensities to compensate for the difference between user clicks and the true relevance of search results via inverse propensity scoring (IPS). Current propensity estimation methods assume that user click behavior follows the position-based click model (PBM) and estimate click propensities based on this assumption. However, in reality, user clicks often follow the cascade model (CM), where users scan search results from top to bottom and each next click depends on the previous one. In this cascade scenario, PBM-based estimates of propensities are not accurate, which, in turn, hurts CLTR performance. In this paper, we propose a propensity estimation method for the cascade scenario, called cascade model-based inverse propensity scoring (CM-IPS). We show that CM-IPS keeps CLTR performance close to the full-information performance when user clicks follow the CM, while PBM-based CLTR has a significant gap towards the full-information performance. The opposite is true if user clicks follow the PBM instead of the CM. Finally, we suggest a way to select between CM- and PBM-based propensity estimation methods based on historical user clicks.
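Under the cascade model, a result is examined only if no earlier result was clicked, which directly yields the examination propensities that IPS divides by; a minimal sketch with hypothetical click probabilities:

.. code-block:: python

    def cascade_propensities(click_probs):
        """P(examined at rank k) = product over j < k of (1 - P(click at j)),
        since a cascade user stops at the first click."""
        props, exam = [], 1.0
        for p in click_probs:
            props.append(exam)
            exam *= 1.0 - p
        return props

    clicks = [0.45, 0.25, 0.15, 0.10, 0.05]   # per-rank click probabilities
    for rank, prop in enumerate(cascade_propensities(clicks), start=1):
        print(f"rank {rank}: examination propensity = {prop:.3f}")

An IPS-weighted learning objective would then up-weight each observed click by the inverse of its rank's examination propensity.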
Abstract: The ranking incentives of many authors of Web pages play an important role in Web dynamics. That is, authors who opt to have their pages highly ranked for queries of interest often respond to rankings for these queries by manipulating their pages; the goal is to improve the pages' future rankings. Various theoretical aspects of these dynamics have recently been studied using game theory. However, empirical analysis of the dynamics is highly constrained due to a lack of publicly available datasets. We present an initial such dataset that is based on TREC's ClueWeb09 dataset. Specifically, we used the WayBack Machine of the Internet Archive to build a document collection that contains past snapshots of ClueWeb documents which are highly ranked by some initial search performed for ClueWeb queries. Temporal analysis of document changes in this dataset reveals that findings recently presented for small-scale controlled ranking competitions between documents' authors also hold for Web data. Specifically, documents' authors tend to mimic the content of documents that were highly ranked in the past, and this practice can result in improved ranking.
Abstract: First Story Detection describes the task of identifying new events in a stream of documents. The UMass-FSD system is known for its strong performance in First Story Detection competitions. Recently, it has frequently been used as a high-accuracy baseline in research publications. We are the first to discover that UMass-FSD inadvertently leverages temporal bias. Interestingly, the discovered bias contrasts with previously known biases and performs significantly better. Our analysis reveals an increased contribution of temporally distant documents, resulting from an unusual way of handling incremental term statistics. We show that this form of temporal bias is also applicable to other well-known First Story Detection systems, where it improves the detection accuracy. To provide a more generalizable conclusion and demonstrate that the observed bias is not only an artefact of a particular implementation, we present a model that intentionally leverages a bias on temporal distance. Our model significantly improves the detection effectiveness of state-of-the-art First Story Detection systems.
Abstract: As deep learning based models are increasingly being used for information retrieval, a major challenge is to ensure the availability of test collections for measuring their quality. Test collections are usually generated by pooling the results of various retrieval systems, but until recently this did not include deep learning systems. This raises a major challenge for reusable evaluation: since deep learning based models use external resources (e.g. word embeddings) and advanced representations compared to traditional methods, they may return different types of relevant documents that were not identified in the original pooling. If so, test collections constructed using traditional methods could lead to biased and unfair evaluation results for deep learning systems. This paper uses simulated pooling to test the fairness and reusability of test collections, showing that especially when shallow pools (e.g. depth-10 pools) are used, pooling based only on traditional systems may lead to biased evaluation of deep learning systems.
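A toy sketch of simulated pooling: build a depth-k pool from one family of systems and measure how much of a held-out system's top-k has been judged at all (all run data below are hypothetical).

.. code-block:: python

    def pool(runs, depth):
        """Union of the top-`depth` documents of every contributing run."""
        return {doc for run in runs for doc in run[:depth]}

    def coverage(held_out_run, judged, k):
        """Fraction of the held-out system's top-k present in the pool;
        unjudged documents are the source of evaluation bias."""
        top = held_out_run[:k]
        return sum(d in judged for d in top) / len(top)

    trad = [["d1", "d2", "d3", "d4"],       # runs from "traditional" systems
            ["d2", "d1", "d5", "d3"],
            ["d3", "d6", "d1", "d2"]]
    neural = ["d7", "d1", "d8", "d2"]       # a system surfacing new documents

    judged = pool(trad, depth=2)
    print("pool:", sorted(judged))
    print("neural top-4 coverage:", coverage(neural, judged, k=4))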
Abstract: User intent is not restricted in human-to-machine conversations, and sometimes overshoots the scope of a designed system. Many tasks for understanding conversations require the elimination of such out-of-scope queries. We propose an out-of-scope intent detection method, called KLOOS, based on a novel feature extraction mechanism that captures the information accumulation of sequential word processing. Information is accumulated via the KL divergence between the intent distributions of consecutive words. The performance of our approach is compared with conventional classifiers and state-of-the-art language models fine-tuned for out-of-scope detection on three spoken query collections. The results show that KLOOS statistically significantly improves out-of-scope sensitivity in all cases, while the overall performance does not deteriorate in most cases.
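The core feature can be sketched as accumulating KL divergence between the intent distributions produced after consecutive words; the distributions below are invented for illustration.

.. code-block:: python

    import numpy as np

    def kl(p, q, eps=1e-9):
        p = np.asarray(p) + eps
        q = np.asarray(q) + eps
        p, q = p / p.sum(), q / q.sum()
        return float(np.sum(p * np.log(p / q)))

    def accumulated_information(intent_dists):
        """Sum of KL divergences between consecutive intent distributions:
        in-scope queries tend to converge, out-of-scope ones keep shifting."""
        return sum(kl(p, q) for p, q in zip(intent_dists, intent_dists[1:]))

    # Intent distributions over 3 in-scope classes after each word (toy data).
    in_scope = [[0.5, 0.3, 0.2], [0.7, 0.2, 0.1], [0.8, 0.15, 0.05]]
    out_of_scope = [[0.5, 0.3, 0.2], [0.1, 0.8, 0.1], [0.6, 0.1, 0.3]]
    print("in-scope:     ", round(accumulated_information(in_scope), 3))
    print("out-of-scope: ", round(accumulated_information(out_of_scope), 3))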
Abstract: The success of crowdsourcing-based annotation of text corpora depends on ensuring that crowdworkers are sufficiently well-trained to perform the annotation task accurately. To that end, a frequent approach to training annotators is to provide instructions and a few example cases that demonstrate how the task should be performed (referred to as the CONTROL approach). These globally defined "task-level examples", however, (i) often only cover the common cases that are encountered during an annotation task; and (ii) require effort from crowdworkers during the annotation process to find the most relevant example for the currently annotated sample. To overcome these limitations, we propose to support workers, in addition to task-level examples, with "task-instance level" examples that are semantically similar to the currently annotated data sample (referred to as Dynamic Examples for Annotation, DEXA). Such dynamic examples can be retrieved from collections previously labeled by experts, which are usually available as gold standard datasets. We evaluate DEXA on the complex task of annotating participants, interventions, and outcomes (known as PIO) in sentences of medical studies. The dynamic examples are retrieved using BioSent2Vec, an unsupervised semantic sentence similarity method specific to the biomedical domain. Results show that (i) workers in the DEXA approach reach on average much higher agreement (Cohen's Kappa) with experts than workers in the CONTROL approach (avg. Kappa of 0.68 with experts in DEXA vs. 0.40 in CONTROL); (ii) aggregating just three annotations by majority voting in the DEXA approach already reaches substantial agreement with experts of 0.78/0.75/0.69 for P/I/O (vs. 0.73/0.58/0.46 in CONTROL). Finally, (iii) we acquire explicit feedback from workers and show that in the majority of cases (avg. 72%) workers find the dynamic examples useful.
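Retrieving a dynamic example reduces to nearest-neighbour search over expert-labeled sentence embeddings; a sketch with random vectors standing in for BioSent2Vec embeddings.

.. code-block:: python

    import numpy as np

    def most_similar_example(sample_vec, gold_vecs, gold_labels):
        """Return the expert-labeled example whose embedding is closest
        (by cosine similarity) to the sentence being annotated."""
        g = gold_vecs / np.linalg.norm(gold_vecs, axis=1, keepdims=True)
        s = sample_vec / np.linalg.norm(sample_vec)
        return gold_labels[int(np.argmax(g @ s))]

    rng = np.random.default_rng(0)
    gold_vecs = rng.normal(size=(100, 64))    # stand-ins for gold embeddings
    gold_labels = [f"expert-annotated sentence #{i}" for i in range(100)]
    sample = rng.normal(size=64)              # the sentence being annotated
    print(most_similar_example(sample, gold_vecs, gold_labels))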
Abstract: Tools, computing environments, and datasets form the three critical ingredients for teaching and learning the practical aspects of experimental IR research. Assembling these ingredients can often be challenging, particularly in the context of short courses that cannot afford large startup costs. As an initial attempt to address these issues, we describe materials that we have developed for the "Introduction to IR" session at the ACM SIGIR/SIGKDD Africa Summer School on Machine Learning for Data Mining and Search (AFIRM 2020), which builds on three components: the open-source Lucene search library, cloud-based notebooks, and the MS MARCO dataset. We offer a self-reflective evaluation of our efforts and hope that our lessons shared can benefit future efforts.
Abstract: Misinformation such as fake news has drawn a lot of attention in recent years. It has serious consequences for society, politics and the economy. This has led to a rise of manually fact-checking websites such as Snopes and Politifact. However, the scale of misinformation limits their ability to verify claims. In this demonstration, we propose BRENDA, a browser extension which can be used to automate the entire process of credibility assessment of false claims. Behind the scenes, BRENDA uses a tested deep neural network architecture to automatically identify fact-check-worthy claims, classifies them, and presents the result along with evidence to the user. Since BRENDA is a browser extension, it facilitates fast automated fact checking for the end user without having to leave the Web page.
Abstract: Conversational Information Seeking (CIS) is an emerging area of Information Retrieval focused on interactive search systems. As a result, there is a need for new benchmark datasets and tools to enable their creation. In this demo we present the Agent Dialogue (AD) platform, an open-source system developed for researchers to perform Wizard-of-Oz CIS experiments. AD is a scalable cloud-native platform developed with Docker and Kubernetes, with a flexible and modular micro-service architecture built on production-grade state-of-the-art open-source tools (Kubernetes, gRPC streaming, React, and Firebase). It supports varied front-ends and has the ability to interface with multiple existing agent systems, including Google Assistant and open-source search libraries. It includes support for centralized structured logging as well as offline relevance annotation.
Abstract: Small pieces of data that are shared online, over time and across multiple social networks, have the potential to reveal more cumulatively than a person intends. This could result in harm, loss or detriment to them, depending on what information is revealed, who can access it, and how it is processed. But how aware are social network users of how much information they are actually disclosing? And if they could examine all their data, what cumulative revelations might be found that could potentially increase their risk of various online threats (social engineering, fraud, identity theft, loss of face, etc.)? In this paper, we present DataMirror, an initial prototype tool that enables social network users to aggregate their online data so that they can search, browse and visualise what they have put online. The aim of the tool is to investigate and explore people's awareness of the data self that they project online; not only in terms of the volume of information that they might share, but what it may mean when combined together, what pieces of sensitive information may be gleaned from their data, and what machine learning may infer about them given their data.
Abstract: Question answering (QA) over text passages is a problem of longstanding interest in information retrieval. Recently, the conversational setting has attracted attention, where a user asks a sequence of questions to satisfy her information needs around a topic. While this setup is a natural one and similar to humans conversing with each other, it introduces a key research challenge: understanding the context left implicit by the user in follow-up questions. In this work, we demonstrate CROWN (Conversational passage ranking by Reasoning Over Word Networks): an unsupervised yet effective system for conversational QA with passage responses, that supports several modes of context propagation over multiple turns. To this end, CROWN first builds a word proximity network (WPN) from large corpora to store statistically significant term co-occurrences. At answering time, passages are ranked by a combination of their similarity to the question, and coherence of query terms within: these factors are measured by reading off node and edge weights from the WPN. CROWN provides an interface that is both intuitive for end-users, and insightful for experts for reconfiguration to individual setups. CROWN was evaluated on TREC CAsT data, where it achieved above-median performance in a pool of neural methods.
Abstract: In this paper, we present FigExplorer, a novel general system that supports the retrieval and exploration of research article figures. Specifically, FigExplorer supports 1) figure retrieval using keyword queries, 2) exploration of figures related to a given figure, 3) exploration of a figure topic using the citation network, and 4) search result re-ranking using an example figure. The different functions were implemented using either classical IR models or neural network-based figure embeddings. Finally, the system was designed to facilitate the collection of user data for training and testing purposes, and it is flexible enough to be extended with new functions and algorithms. As an open-source system, FigExplorer can help advance the research, evaluation, and development of applications in this area.
Abstract: Systematic reviews constitute the cornerstone of Evidence-based Medicine. They can provide guidance for medical policy-making by synthesizing all available studies regarding a certain topic. However, conducting systematic reviews has become a laborious and time-consuming task due to the large amount and rapid growth of published literature. Technology-assisted review (TAR) approaches aim to accelerate the screening stage of systematic reviews by combining machine learning algorithms and human relevance feedback. In this work, we built an online active search system for systematic reviews, named APS, by applying a state-of-the-art TAR approach -- Continuous Active Learning. The system is built on top of the PubMed collection, a widely used database of biomedical literature, and allows users to conduct abstract screening for systematic reviews. We demonstrate the effectiveness and robustness of APS in detecting relevant literature and reducing workload for systematic reviews using the CLEF TAR 2017 benchmark.
Abstract: Systematic reviews are used widely in the biomedical and healthcare domains. They aim to provide a complete and exhaustive overview of the medical literature for a specific research question. Core to the construction of a systematic review is the search strategy, whose main component is a complex Boolean query, typically developed by information specialists (e.g., librarians). The aim of the search strategy is to retrieve relevant studies that will contribute to the outcomes of the systematic review. One barrier information specialists face when developing a search strategy is the enormous amount of medical literature that exists in databases. This vast amount of literature means that search strategies often suffer from biases (e.g., lack of expertise, overconfidence, limited knowledge of the domain) and are incomplete, or retrieve far too many studies (possibly as a result of the biases, but also due to the tools used to develop search strategies). Retrieving too many studies impacts the time and financial costs of the review, while retrieving too few studies may impact the outcomes of the review. Therefore, it is vital to support expert searchers in developing effective search strategies. In this paper, we present a novel end-to-end set of advanced tools for information specialists. These tools are tightly integrated into an existing open-source search strategy refinement package (searchrefiner). They aim to address the problems associated with search strategy development by providing a complete framework from query development, to refinement, to documentation. The implementation of these tools also offers a glimpse of the ease with which related tools may be implemented within the searchrefiner ecosystem. More information about the tools, including installation, documentation, and screenshots, is available on the searchrefiner website: https://ielab.io/searchrefiner.
Abstract: As the world's largest professional network, LinkedIn wants to create economic opportunity for everyone in the global workforce. One of its most critical missions is matching jobs with professionals. Improving job targeting accuracy and hiring efficiency aligns with LinkedIn's Member First motto. To achieve those goals, we need to understand unstructured job postings with noisy information. We applied deep transfer learning to create domain-specific job understanding models. With these, jobs are represented by professional entities, including titles, skills, companies, and assessment questions. To continuously improve LinkedIn's job understanding ability, we designed an expert feedback loop in which we integrated job understanding models into LinkedIn's products to collect job posters' feedback. In this demonstration, we present LinkedIn's job posting flow and demonstrate how the integrated deep job understanding work improves job posters' satisfaction and provides significant metric lifts in LinkedIn's job recommendation system.
Abstract: There exists a natural tension between encouraging a diverse ecosystem of open-source search engines and supporting fair, replicable comparisons across those systems. To balance these two goals, we examine two approaches to providing interoperability between the inverted indexes of several systems. The first takes advantage of internal abstractions around index structures and builds wrappers that allow one system to directly read the indexes of another. The second involves sharing indexes across systems via a data exchange specification that we have developed, called the Common Index File Format (CIFF). We demonstrate the first approach with the Java systems Anserini and Terrier, and the second approach with Anserini, JASSv2, OldDog, PISA, and Terrier. Together, these systems provide a wide range of implementations and features, with different research goals. Overall, we recommend CIFF as a low-effort approach to supporting independent innovation while enabling the types of fair evaluations that are critical for driving the field forward.
Abstract: In this work, we demonstrate a novel system, namely Web of Scholars, which integrates state-of-the-art mining techniques to search, mine, and visualize the complex networks behind scholars in the field of Computer Science. Relying on a knowledge graph, it provides services for fast, accurate, and intelligent semantic querying as well as powerful recommendations. In addition, in order to realize information sharing, it provides an open API to serve as the underlying architecture for advanced functions. Because Web of Scholars is built on a knowledge graph, it will be able to access more knowledge as more searches are performed. It can serve as a useful and interoperable tool for scholars to conduct in-depth analysis within the Science of Science.
Abstract: In this paper we present SPot, an automated tool for detecting operating segments and their related performance indicators from earnings reports. Due to their company-specific nature, operating segments cannot be detected using taxonomy-based approaches. Instead, we train a bidirectional RNN classifier that can distinguish between common metrics such as "revenue" and company-specific metrics that are likely to be operating segments, such as "iPhone" or "cloud services". SPot surfaces the results in an interactive web interface that allows users to trace and adjust performance metrics for each operating segment. This facilitates credit monitoring, enables users to perform competitive benchmarking more effectively, and can be used for trend analysis at company and sector levels.
Abstract: Many government and public organisations have a requirement to release their official documents to the public and therefore need to review such documents to identify and protect any sensitive information that they contain. When reviewing a document for sensitivity, reviewers often use information from other documents within the collection to assist in their decisions. However, it can be difficult for reviewers to find related documents in large digital collections while performing sensitivity review. Receptor is a new solution that aims to provide sensitivity reviewers with the ability to explore a collection of documents to discover latent relations, for example between entities and events, that can be a reliable indicator of sensitive information. The system provides novel scalable graph search and exploration functionalities as well as interactive visualisations of the latent relations between related entities, events, and documents to enable users to identify hidden patterns of sensitivity.
Abstract: The open nature of the Web enables users to produce and propagate any content without authentication, which has been exploited to spread thousands of unverified claims via millions of online documents. Maintenance of credible knowledge bases thus has to rely on fact checking, which constructs a trusted set of facts through credibility assessment. Due to an inherent lack of ground truth information and the ambiguity of language, fact checking cannot be done in a purely automated manner without compromising accuracy. However, state-of-the-art fact checking services rely mostly on human validation, which is costly, slow, and non-transparent. This paper presents FactCatch, a human-in-the-loop system that guides users in fact checking while aiming to minimise the invested effort. It supports incremental quality estimation, mistake mitigation, and pay-as-you-go instantiation of a high-quality fact database.
Abstract: The demand for a tool that summarizes emerging topics is increasing in modern life, since such a tool can deliver well-organized information to its users. Even though there are already a number of successful search systems, systems that automatically summarize and organize the content of emerging topics are still in their infancy. To fulfill this demand, we introduce an automated report generation system that generates a well-summarized, human-readable report for emerging topics. In this system, emerging topics are automatically discovered by a topic model and news articles are indexed by the discovered topics. Then, a topical summary and a timeline summary for each topic are generated by a topical multi-document summarizer and a timeline summarizer, respectively. To make the reports easier for users to digest, the proposed system provides two report modes: Today's Briefing, which summarizes five discovered topics of each day, and Full Report, which shows a long-term view of each topic with a detailed topical summary and an important event timeline.
Abstract: Ensuring reproducibility is key to all scientific domains. As Information Retrieval (IR) experiments are often composed of several steps that can be shared between tested models, and rely on various resources, it is difficult to keep track of all the experimental settings and to ensure experiments can be reproduced easily. In this demo paper, we present two managers, Experimaestro and Datamaestro, and their add-ons for IR, designed to help define and run experimental plans.
Abstract: In this paper, we present the design of QuAChIE, a Question Answering based Chinese Information Extraction system. QuAChIE mainly depends on a well-trained question answering model to extract high-quality triples. Each head entity and relation pair is regarded as a question, with the input text as the context. For the training and evaluation of each model in the system, we build a large-scale information extraction dataset using Wikidata and Wikipedia pages through distant supervision. The advanced models implemented on top of the pre-trained language model and the enormous distant supervision data enable QuAChIE to extract relation triples from documents with cross-sentence correlations. The experimental results on the test set and the case study based on the interactive demonstration show its satisfactory information extraction quality on Chinese document-level texts.
Abstract: We introduce Vis-Trec, an open-source cross-platform system which provides the capability to perform in-depth analysis of the results obtained from TREC-style evaluation campaigns. Vis-Trec allows researchers to dig deeper into their evaluations by providing various visualizations of the results based on performance percentiles, query difficulty, and comparative analysis of different methods using help-hurt diagrams at the query level. It also automatically organizes the obtained results in tabular LaTeX format that can be used for reporting evaluation findings. An added benefit of Vis-Trec is that it is developed in Python and is extensible by other developers. The source code, along with a functional version of the program, is released to the public.
Abstract: We present JASSjr, a minimalistic trec_eval compatible BM25-ranking search engine that can index small TREC data sets such as the Wall Street Journal collection. We do this for several reasons. First, to demonstrate how a term-at-a-time (TAAT) search engine works. Second, to demonstrate that a straightforward and competitive search engine, together with its indexer, can be written in under 600 lines of documented code. Third, as a way of providing a simple code-base for teaching Information Retrieval. We present two index-compatible versions (one in C/C++, the other in Java) that compile and run on MacOS, Linux, and Windows. Our code is released under the 2-clause BSD licence, and we provide several suggestions for extensions which might be used as exercises in an Information Retrieval course.
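In the same spirit, a term-at-a-time BM25 scorer fits in a few lines of Python (JASSjr itself is C/C++ and Java; the in-memory index layout and BM25 variant below are illustrative, not JASSjr's code).

.. code-block:: python

    import math
    from collections import defaultdict

    def bm25_taat(query, index, doc_len, k1=0.9, b=0.4):
        """Term-at-a-time scoring: fully process one query term's postings
        list (accumulating partial scores) before moving to the next term."""
        N = len(doc_len)
        avgdl = sum(doc_len.values()) / N
        acc = defaultdict(float)                 # doc id -> accumulated score
        for term in query:
            postings = index.get(term, {})
            idf = math.log(1 + (N - len(postings) + 0.5) / (len(postings) + 0.5))
            for doc, tf in postings.items():     # one postings list at a time
                norm = tf / (tf + k1 * (1 - b + b * doc_len[doc] / avgdl))
                acc[doc] += idf * norm
        return sorted(acc.items(), key=lambda kv: -kv[1])

    index = {"wall": {1: 3, 2: 1}, "street": {1: 2, 3: 4}, "journal": {1: 1}}
    doc_len = {1: 120, 2: 45, 3: 300}
    print(bm25_taat(["wall", "street"], index, doc_len))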
Abstract: With the rapid growth of B2B (Business-to-Business) commerce, how to efficiently respond to various customer questions is becoming an important issue. In this scenario, customer questions often involve many aspects of the products, so multiple customer service agents usually respond to different aspects respectively. To improve efficiency, we propose a human-machine cooperation solution called ServiceGroup, where relevant agents and customers are invited into the same group, and the system provides a series of intelligent functions, including question notification, question recommendation and knowledge extraction. With the assistance of our ServiceGroup, the response rate within 15 minutes is doubled. To date, ServiceGroup has supported thousands of enterprises by means of millions of groups in instant messaging software.
Abstract: Conversational information seeking (CIS) has been recognized as a major emerging research area in information retrieval. Such research will require data and tools to allow the implementation and study of conversational systems. This paper introduces Macaw, an open-source framework with a modular architecture for CIS research. Macaw supports multi-turn, multi-modal, and mixed-initiative interactions, and enables research on tasks such as document retrieval, question answering, recommendation, and structured data exploration. Its modular design encourages the study of new CIS algorithms, which can be evaluated in batch mode. It can also integrate with a user interface, which allows user studies and data collection in an interactive mode, where the back end can be fully algorithmic or a Wizard-of-Oz setup. Macaw is distributed under the MIT License.
Abstract: Entity linking is a standard component in modern retrieval systems that is often performed by third-party toolkits. Despite the plethora of open-source options, it is difficult to find a single system that has a modular architecture where certain components may be replaced, does not depend on external sources, can easily be updated to newer Wikipedia versions, and, most importantly, has state-of-the-art performance. The REL system presented in this paper aims to fill that gap. Building on state-of-the-art neural components from natural language processing research, it is provided as a Python package as well as a web API. We also report on an experimental comparison against both well-established systems and the current state-of-the-art on standard entity linking benchmarks.
Abstract: Purchase-related micro-behaviors, e.g., favorite, add to cart, read reviews, etc., provide implicit feedback on users' decision-making process. Such informative feedback can lead to fine-grained post-click conversion rate (CVR) modeling of the buying process. However, most existing works on CVR estimation either neglect this informative feedback, or model it as a sequential pattern with Recurrent Neural Networks. We argue such modeling could be inappropriate, since different orders of micro-behaviors may represent similar buying intentions, and micro-behaviors often correlate with each other. To this end, we propose to represent user micro-behaviors as a Purchase-related Micro-behavior Graph (PMG). Specifically, each node stands for one micro-behavior, and edge weights denote the connection strength. Based on this graph representation, we frame CVR estimation as a graph classification problem over the PMG instances. We propose a novel CVR model, namely the Graph-based Micro-behavior Conversion Model (GMCM), that utilizes Graph Convolutional Networks (GCN) to enhance conventional CVR modeling. In addition, we adopt multi-task learning and inverse propensity weighting to tackle two well-recognized issues in CVR estimation: data sparsity and sample selection bias. Extensive experiments on six large-scale production datasets demonstrate that the proposed methods outperform state-of-the-art CVR methods in an industrial setting.
Abstract: The Internet is changing the world, and adapting to the trend of internet sales will bring revenue to traditional insurance companies. Online insurance is still in its early stages of development, where the cold start problem (prospective customers) is one of the greatest challenges. In the traditional e-commerce field, several cross-domain recommendation (CDR) methods have been studied to infer the preferences of cold start users based on their preferences in other domains. However, these CDR methods cannot be applied to the insurance domain directly due to its specific properties. In this paper, we propose a novel framework called the Heterogeneous information network based Cross Domain Insurance Recommendation (HCDIR) system for cold start users. Specifically, we first try to learn more effective user and item latent features in both source and target domains. In the source domain, we employ a gated recurrent unit (GRU) to model users' dynamic interests. In the target domain, given the complexity of insurance products and the data sparsity problem, we construct an insurance heterogeneous information network (IHIN) based on data from PingAn Jinguanjia; the IHIN connects users, agents, insurance products and insurance product properties, giving us richer information. We then employ three-level (relational, node, and semantic) attention aggregations to obtain user and insurance product representations. After obtaining the latent features of overlapping users, a feature mapping between the two domains is learned by a multi-layer perceptron (MLP). We apply HCDIR to the Jinguanjia dataset, and show that HCDIR significantly outperforms state-of-the-art solutions.
Abstract: Event representation learning aims to embed news events into continuous vector spaces that capture syntactic and semantic information from text corpora, which benefits event-driven quantitative investments. However, the financial market's reaction to events is also influenced by the lead-lag effect, which is driven by internal relationships. Therefore, in this paper, we present a knowledge graph-based event embedding framework for quantitative investments. In particular, we first extract structured events from raw texts and simultaneously construct the knowledge graph from the mentioned entities and relations. Then, we leverage a joint model to merge the knowledge graph information into the objective function of an event embedding learning model. The learned representations are fed as inputs to downstream quantitative trading methods. Extensive experiments on a real-world dataset demonstrate the effectiveness of the event embeddings learned from financial news and knowledge graphs. We also deploy the framework for quantitative algorithmic trading. The accumulated portfolio return contributed by our method significantly outperforms other baselines.
Abstract: Recommender systems (RS), which predict user preference for a given item, have been widely deployed in most web-scale applications. Recently, knowledge graphs (KG) have attracted much attention in RS due to their abundant connective information. Existing methods either explore independent meta-paths for user-item pairs over the KG, or employ a graph neural network (GNN) on the whole KG to produce representations for users and items separately. Despite their effectiveness, the former type of method fails to fully capture the structural information implied in the KG, while the latter ignores the mutual effect between the target user and item during embedding propagation. In this work, we propose a new framework named Adaptive Target-Behavior Relational Graph network (ATBRG for short) to effectively capture the structural relations of target user-item pairs over the KG. Specifically, to associate the given target item with user behaviors over the KG, we propose graph connect and graph prune techniques to construct an adaptive target-behavior relational graph. To fully distill structural information from the sub-graph connected by rich relations in an end-to-end fashion, we elaborate on the model design of ATBRG, equipped with a relation-aware extractor layer and a representation activation layer. We perform extensive experiments on both industrial and benchmark datasets. Empirical results show that ATBRG consistently and significantly outperforms state-of-the-art methods. Moreover, ATBRG has achieved a performance improvement of 5.1% on the CTR metric after successful deployment in one popular recommendation scenario of the Taobao app.
Abstract: Since late December 2019, the Chinese city of Wuhan has reported an outbreak of atypical pneumonia, now known to be lung inflammation caused by the novel coronavirus (COVID-19). Cases have spread to other cities in China and to more than 180 countries and regions internationally. The World Health Organization (WHO) has officially declared the coronavirus outbreak a pandemic, and this public health emergency is perhaps one of the top concerns of 2020 for governments all over the world. To date, the coronavirus outbreak is still raging, with no sign of coming under control in many countries. In this paper, we aim to draw lessons from the COVID-19 outbreak in China and use the experience to help interventions against the coronavirus wherever needed. To this end, we have built a system that predicts hazard areas on the basis of confirmed infection cases with location information. The purpose is to warn people to avoid such hot zones and to reduce the risk of disease transmission through droplets or contact. We analyze data from the daily official information releases, which are publicly accessible. Based on standard classification frameworks with reinforcements incrementally learned day after day, we conduct thorough feature engineering from empirical studies, including geographical, demographic, temporal, statistical, and epidemiological features. Compared with heuristic baselines, our method achieves promising overall performance in terms of precision, recall, accuracy, F1 score, and AUC. We expect that our efforts can help in the battle against the virus, the common opponent of humankind.
Abstract: In this paper, we address text and image matching in cross-modal retrieval for the fashion industry. Unlike matching in the general domain, fashion matching must pay much more attention to the fine-grained information in fashion images and texts. Pioneering approaches detect regions of interest (i.e., RoIs) in images and use the RoI embeddings as image representations. In general, RoIs tend to represent "object-level" information in fashion images, while fashion texts are prone to describe more detailed information, e.g., styles and attributes. RoIs are thus not fine-grained enough for fashion text and image matching. To this end, we propose FashionBERT, which leverages patches as image features. With the pre-trained BERT model as the backbone network, FashionBERT learns high-level representations of texts and images. Meanwhile, we propose an adaptive loss to trade off multitask learning in FashionBERT modeling. Two tasks (i.e., text and image matching and cross-modal retrieval) are incorporated to evaluate FashionBERT. On a public dataset, experiments demonstrate that FashionBERT achieves significant performance improvements over baseline and state-of-the-art approaches. In practice, FashionBERT is applied in a concrete cross-modal retrieval application. We provide a detailed analysis of matching performance and inference efficiency.
Abstract: Personalized recommendation helps users access content of interest effectively. Current research on recommender systems mostly focuses on matching users with proper items based on user interests. However, significant efforts are missing to understand how recommendations influence user preferences and behaviors, e.g., if and how recommendations result in echo chambers. Extensive efforts have been made to examine this phenomenon in online media and social network systems. Meanwhile, there are growing concerns that recommender systems might lead to the self-reinforcement of users' interests due to narrowed exposure to items, which may be a potential cause of echo chambers. In this paper, we aim to analyze the echo chamber phenomenon in Alibaba Taobao, one of the largest e-commerce platforms in the world. An echo chamber is the effect of user interests being reinforced through repeated exposure to similar content. Based on this definition, we examine the presence of echo chambers in two steps. First, we explore whether user interests have been reinforced. Second, we check whether the reinforcement results from exposure to similar content. Our evaluations are strengthened with robust metrics, including cluster validity and statistical significance. Experiments are performed on extensive collections of real-world data consisting of user clicks, purchases, and browsing logs from Alibaba Taobao. The evidence suggests a tendency toward echo chambers in user click behaviors, while this is relatively mitigated in user purchase behaviors. Insights from the results guide the refinement of recommendation algorithms in real-world e-commerce systems.
Abstract: Query Auto-Completion (QAC) is a ubiquitous feature of modern textual search systems, suggesting possible ways of completing the query being typed by the user. Efficiency is crucial for real-time responsiveness when operating in a million-scale search space. Prior work has extensively advocated the use of a trie data structure for fast prefix-search operations in compact space. However, searching by prefix has little discovery power, in that only completions that are prefixed by the query are returned. This may negatively impact the effectiveness of the QAC system, with a consequent monetary loss for real applications like web search engines and eCommerce. In this work we describe the implementation that powers a new QAC system at eBay, and discuss its efficiency and effectiveness in relation to other state-of-the-art approaches. The solution is based on the combination of an inverted index with succinct data structures, a much less explored direction in the literature. This system is replacing the previous implementation, based on Apache SOLR, which was not always able to meet the required service-level agreement.
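As an illustration of why token-level inverted indexing has more discovery power than prefix-only search, consider this minimal sketch with toy data and popularity ranking; the actual eBay system additionally relies on succinct data structures and would treat the last typed token as a prefix, both omitted here::

    from collections import defaultdict

    # Toy completion log with popularity counts (illustrative data only).
    COMPLETIONS = {"iphone 11 case": 90, "case for iphone": 70, "iphone charger": 50}

    # Token-level inverted index: word -> completions containing it anywhere.
    index = defaultdict(set)
    for completion in COMPLETIONS:
        for word in completion.split():
            index[word].add(completion)

    def complete(query):
        """Return completions containing every typed word, ranked by popularity.
        Unlike prefix search on a trie, "iphone case" also surfaces
        "case for iphone", which is not prefixed by the query."""
        words = query.split()
        if not words:
            return []
        candidates = set.intersection(*(index[w] for w in words))
        return sorted(candidates, key=COMPLETIONS.get, reverse=True)

    print(complete("iphone case"))  # ['iphone 11 case', 'case for iphone']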
Abstract: Image galleries provide a rich source of diverse information about a product, which can be leveraged across many recommendation and retrieval applications. We study the problem of building a universal image gallery encoder through a multi-task learning (MTL) approach and demonstrate that it is indeed a practical way to achieve generalizability of learned representations to new downstream tasks. Additionally, we analyze the relative predictive performance of MTL-trained solutions against optimal and substantially more expensive solutions, and find signals that MTL can be a useful mechanism to address sparsity in low-resource binary tasks.
Abstract: Many consumer products are two-sided marketplaces, ranging from commerce products that connect buyers and sellers, such as Amazon, Alibaba, and Facebook Marketplace, to sharing-economy products that connect passengers to drivers or guests to hosts, like Uber and Airbnb. The search and recommender systems behind these products are typically optimized for objectives like click-through, purchase, or booking rates, which are mostly tied to the consumer side of the marketplace (namely buyers, passengers, or guests). For the long-term growth of these products, it is also crucial to consider the value to the providers (sellers, drivers, or hosts). However, optimizing ranking for such objectives is uncommon because it is challenging to measure the causal effect of ranking changes on providers. For instance, if we run a standard seller-side A/B test on Facebook Marketplace that exposes a small percentage of sellers, what we observe in the test would differ significantly from what would happen if the treatment were launched to all sellers. To overcome this challenge, we propose a counterfactual framework for seller-side A/B testing. The key idea is that items in the treatment group are ranked the same regardless of the experiment's exposure rate. Similarly, items in the control group are ranked where they would be if the status quo were applied to all sellers. Theoretically, we show that the framework satisfies the stable unit treatment value assumption, since the experience that sellers receive is affected only by their own treatment and is independent of the treatment of other sellers. Empirically, both seller-side and buyer-side online A/B tests are conducted on Facebook Marketplace to verify the framework.
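A minimal sketch of the stated key idea follows, assuming two scoring functions and a seller-bucketing predicate (all names hypothetical): each item keeps the position it would occupy if its own variant were launched to all sellers, so its rank does not depend on the exposure rate::

    def counterfactual_rank(items, score_control, score_treatment, in_treatment):
        """Sketch of counterfactual seller-side interleaving: treatment items take
        their positions from a full all-treatment ranking, control items from a
        full all-control (status quo) ranking."""
        by_treatment = sorted(items, key=score_treatment, reverse=True)
        by_control = sorted(items, key=score_control, reverse=True)
        rank_t = {item: r for r, item in enumerate(by_treatment)}
        rank_c = {item: r for r, item in enumerate(by_control)}
        # An item's counterfactual position depends only on its own seller's group.
        key = lambda item: rank_t[item] if in_treatment(item) else rank_c[item]
        return sorted(items, key=key)  # ties broken arbitrarily in this sketch

    # Hypothetical usage: two rankers scoring item ids, sellers hashed into buckets.
    ranked = counterfactual_rank(
        ["a", "b", "c", "d"],
        score_control=lambda i: {"a": 3, "b": 2, "c": 1, "d": 0}[i],
        score_treatment=lambda i: {"a": 0, "b": 3, "c": 2, "d": 1}[i],
        in_treatment=lambda i: i in {"b", "c"},
    )
    print(ranked)

Because each item's counterfactual position is a function of its own seller's assignment only, the sketch mirrors the stable-unit-treatment-value argument made above.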
Abstract: Implied semantics is a complex language act that can appear anywhere in cyberspace. The prevalence of implied spam texts, such as implied pornography, sarcasm, and abuse hidden within novels, tweets, microblogs, or reviews, can be extremely harmful to the physical and mental health of teenagers. The non-literal interpretation of implied text is hard for machine models to understand due to its high context-sensitivity and heavy use of figurative language. In this study, inspired by human reading comprehension, we propose a novel, simple, and effective deep neural framework, called the Skim and Intensive Reading Model (SIRM), for figuring out implied textual meaning. The proposed SIRM consists of three main components, namely the skim reading component, the intensive reading component, and an adversarial training component. N-gram features are quickly extracted by the skim reading component, a combination of several convolutional neural networks, as skim (entire) information. The intensive reading component enables a hierarchical investigation of both sentence-level and paragraph-level representations, which encapsulates the current (local) embedding and the contextual information (context) with a dense connection. More specifically, the contextual information includes near-neighbor information and the skim information mentioned above. Finally, besides the common training loss, we employ an adversarial loss as a penalty over the skim reading component to eliminate noisy information (noise) arising from special figurative words in the training data. To verify the effectiveness, robustness, and efficiency of the proposed architecture, we conduct extensive comparative experiments on an industrial novel dataset involving implied pornography and on three sarcasm benchmarks. Experimental results indicate that (1) the proposed model, which benefits from context and local modeling and from consideration of figurative language (noise), outperforms existing state-of-the-art solutions with comparable parameter scale and running speed; (2) SIRM yields superior robustness in terms of parameter-size sensitivity; and (3) compared with ablation and addition variants, the final framework is sufficiently efficient.
Abstract: Deep recommender systems have achieved promising performance on real-world recommendation tasks. They typically represent users and items in a low-dimensional embedding space and then feed the embeddings into subsequent deep network structures for prediction. Traditional deep recommender models often adopt uniform, fixed embedding sizes for all users and items. However, such a design is not optimal in terms of either recommendation performance or space complexity. In this paper, we propose to dynamically search the embedding sizes for different users and items, and introduce a novel embedding size adjustment policy network (ESAPN). ESAPN serves as an automated reinforcement learning agent that adaptively searches for appropriate embedding sizes for users and items. Different from existing works, our model performs hard selection over different embedding sizes, which leads to more accurate selection and decreases the storage space. We evaluate our model in the streaming setting on two real-world benchmark datasets. The results show that our proposed framework outperforms representative baselines. Moreover, our framework is shown to be robust to the cold-start problem and to reduce memory consumption by around 40%-90%. The implementation of the model is released.
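The hard-selection mechanism can be pictured with a small sketch, assuming a candidate size set, frequency-based state features, and up-projection to a shared dimension; these are all illustrative assumptions, and ESAPN's actual state, policy architecture, and RL training loop differ::

    import numpy as np

    rng = np.random.default_rng(0)
    CANDIDATE_SIZES = [8, 16, 32]   # assumed search space of embedding sizes
    FULL_DIM = 32                   # shared dimension consumed by the deep network
    N_USERS = 1000

    # One embedding table per candidate size, plus projections up to FULL_DIM.
    tables = {d: rng.normal(0.0, 0.1, (N_USERS, d)) for d in CANDIDATE_SIZES}
    proj = {d: rng.normal(0.0, 0.1, (d, FULL_DIM)) for d in CANDIDATE_SIZES}
    W_policy = rng.normal(0.0, 0.1, (2, len(CANDIDATE_SIZES)))  # untrained placeholder

    def select_size(state):
        """Hard selection: an argmax over candidate sizes rather than a soft
        mixture; a real agent would train this policy with reinforcement learning."""
        return CANDIDATE_SIZES[int(np.argmax(state @ W_policy))]

    def embed_user(user_id, state):
        d = select_size(state)               # pick one size for this user
        return tables[d][user_id] @ proj[d]  # project to the shared dimension

    state = np.array([np.log1p(42.0), 1.0])  # e.g. [log interaction count, bias]
    print(embed_user(7, state).shape)        # (32,)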
Abstract: Tabular data is the most common data format adopted by our customers, ranging from retail and finance to e-commerce, and tabular data classification plays an essential role in their businesses. In this paper, we present Network On Network (NON), a practical tabular data classification model based on deep neural networks that provides accurate predictions. Various deep methods have been proposed and promising progress has been made. However, most of them use operations such as neural networks and factorization machines to fuse the embeddings of different features directly, and linearly combine the outputs of those operations to get the final prediction. As a result, the intra-field information and the non-linear interactions between those operations (e.g., neural networks and factorization machines) are ignored. Intra-field information refers to the fact that the features inside each field belong to that same field. NON is proposed to take full advantage of intra-field information and non-linear interactions. It consists of three components: a field-wise network at the bottom to capture intra-field information, an across-field network in the middle to choose suitable operations in a data-driven manner, and an operation fusion network on top to deeply fuse the outputs of the chosen operations. Extensive experiments on six real-world datasets demonstrate that NON significantly outperforms state-of-the-art models. Furthermore, both qualitative and quantitative studies of the features in the embedding space show that NON can capture intra-field information effectively.
Abstract: Tagging has been recognized as a successful practice to boost relevance matching for information retrieval (IR), especially when items lack rich textual descriptions. A lot of research has been done on either multi-label text categorization or image annotation. However, there is a lack of published work that targets item tagging specifically for IR. Directly applying a traditional multi-label classification model to item tagging is sub-optimal, because it ignores characteristics unique to IR. In this work, we propose to formulate item tagging as a link prediction problem between item nodes and tag nodes. To enrich the representation of items, we leverage the query logs available in IR tasks and construct a query-item-tag tripartite graph. This formulation results in a TagGNN model that utilizes heterogeneous graph neural networks with multiple types of nodes and edges. Different from previous research, we also optimize both the full tag prediction and partial tag completion cases in a unified framework via a primary-dual loss mechanism. Experimental results on both open and industrial datasets show that our TagGNN approach outperforms state-of-the-art multi-label classification approaches.
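A minimal sketch of the tripartite-graph formulation, with toy logs and placeholder embeddings; a heterogeneous GNN would produce the embeddings by propagating over the query-item and item-tag edges, and every name below is illustrative::

    import numpy as np

    # Toy logs: which items were clicked for which queries, and known item tags.
    query_item = [("red dress", "item1"), ("summer dress", "item1"), ("red shoes", "item2")]
    item_tag = [("item1", "dress"), ("item2", "shoes")]

    # Tripartite graph as typed edge lists over a shared node id space.
    nodes = sorted({n for edge in query_item + item_tag for n in edge})
    idx = {n: i for i, n in enumerate(nodes)}
    edges = [(idx[a], idx[b]) for a, b in query_item + item_tag]

    # Placeholder node embeddings standing in for GNN outputs.
    rng = np.random.default_rng(0)
    emb = rng.normal(0.0, 0.1, (len(nodes), 16))

    def tag_score(item, tag):
        """Item tagging as link prediction: score a candidate item-tag edge."""
        return float(emb[idx[item]] @ emb[idx[tag]])

    print(tag_score("item1", "shoes"))  # rank candidate tags per item by this score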
Abstract: In an active e-commerce environment, customers process a large number of reviews when deciding whether or not to buy a product. Abstractive Multi-Review Summarization aims to help users efficiently consume the reviews that are most relevant to them. We propose the first large-scale abstractive multi-review summarization dataset, which leverages more than 17.9 billion raw reviews and uses novel aspect-alignment techniques based on aspect annotations. Furthermore, we demonstrate that one can generate higher-quality review summaries by using a novel aspect-alignment-based model. Results from both automatic and human evaluation show that the proposed dataset, together with the innovative aspect-alignment model, can generate high-quality and trustworthy review summaries.
Abstract: Click-through rate (CTR) prediction plays a key role in modern online personalization services. In practice, it is necessary to capture users' drifting interests by modeling sequential user behaviors to build an accurate CTR prediction model. However, as users accumulate more and more behavioral data on the platforms, it becomes non-trivial for sequential models to make use of each user's whole behavior history. First, directly feeding the long behavior sequence makes online inference time and system load infeasible. Second, such long histories contain much noise, which hampers sequential model learning. Current industrial solutions mainly truncate the sequences and feed only recent behaviors to the prediction model, which leads to the problem that sequential patterns such as periodicity or long-term dependencies reside not in the most recent behaviors but far back in the history. To tackle these issues, in this paper we approach the problem from the data perspective, instead of just designing more sophisticated yet complicated models, and propose the User Behavior Retrieval for CTR prediction (UBR4CTR) framework. In UBR4CTR, the most relevant and appropriate user behaviors are first retrieved from the entire user history sequence using a learnable search method. These retrieved behaviors, rather than simply the most recent ones, are then fed into a deep model to make the final prediction. It is highly feasible to deploy UBR4CTR into an industrial model pipeline at low cost. Experiments on three real-world large-scale datasets demonstrate the superiority and efficacy of our proposed framework and models.
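The retrieve-then-predict step can be sketched as follows, with a fixed attribute-overlap score standing in for UBR4CTR's learnable search method; the data layout and field names are made up for illustration::

    def retrieve_behaviors(history, target, k=3):
        """Keep the k behaviors most relevant to the target item, rather than
        simply the k most recent ones. The paper learns this search; a fixed
        feature-overlap score stands in here."""
        overlap = lambda b: len(set(b["attrs"]) & set(target["attrs"]))
        return sorted(history, key=overlap, reverse=True)[:k]

    history = [
        {"item": "phone case", "attrs": ["phone", "accessory"]},
        {"item": "sneakers", "attrs": ["shoes", "sport"]},
        {"item": "phone charger", "attrs": ["phone", "electronics"]},
    ]
    target = {"item": "phone x", "attrs": ["phone", "electronics"]}
    print(retrieve_behaviors(history, target, k=2))
    # The retrieved behaviors, not the full or truncated sequence, feed the CTR model.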
Abstract: Mobile devices have become an increasingly ubiquitous part of our everyday life. We use mobile services to perform a broad range of tasks (e.g., booking travel or office work), leading to often lengthy interactions within distinct apps and services. Existing mobile systems handle mostly simple user needs, where a single app is taken as the unit of interaction. To understand users' expectations and to provide context-aware services, it is important to model users' interactions in the task space. In this work, we first propose and evaluate a method for the automated segmentation of users' app usage logs into task units. We focus on two problems: (i) given a sequential pair of app usage logs, identify whether a task boundary exists between them, and (ii) given any pair of app usage logs, identify whether they belong to the same task. We model these as classification problems that use features from three aspects of app usage patterns: temporal, similarity, and log sequence. Our classifiers improve on traditional timeout segmentation, achieving over 89% performance on both problems. Secondly, we apply our best task classifier to a large-scale dataset of commercial mobile app usage logs to identify common tasks. We observe that users performed common tasks ranging from regular information checking to entertainment and booking dinner. Our proposed task identification approach provides the means to evaluate mobile services and applications with respect to task completion.
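A minimal sketch of pairwise features for the boundary decision, with one illustrative stand-in per feature family (temporal, similarity, sequence); the paper's concrete feature set is richer than this::

    from difflib import SequenceMatcher

    def pair_features(log_a, log_b):
        """Features for classifying whether a task boundary separates two
        consecutive app-usage log entries."""
        gap_seconds = log_b["ts"] - log_a["ts"]                               # temporal
        name_sim = SequenceMatcher(None, log_a["app"], log_b["app"]).ratio()  # similarity
        same_app = float(log_a["app"] == log_b["app"])                        # sequence
        return [gap_seconds, name_sim, same_app]

    x = pair_features({"app": "maps", "ts": 100}, {"app": "uber", "ts": 160})
    # x feeds a binary classifier (boundary / same task); the timeout baseline
    # mentioned above instead thresholds gap_seconds alone.
    print(x)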
Abstract: Many business documents processed in modern NLP and IR pipelines are visually rich: in addition to text, their semantics can also be captured by visual traits such as layout, format, and fonts. We study the problem of information extraction from visually rich documents (VRDs) and present a model that combines the power of large pre-trained language models and graph neural networks to efficiently encode both textual and visual information in business documents. We further introduce new fine-tuning objectives to improve in-domain unsupervised fine-tuning and better utilize large amounts of unlabeled in-domain data. We experiment on real-world invoice and resume datasets and show that the proposed method outperforms strong text-based RoBERTa baselines by 6.3% absolute F1 on invoices and 4.7% absolute F1 on resumes. When evaluated in a few-shot setting, our method requires up to 30x less annotation data than the baseline to achieve the same level of performance at ~90% F1.
Abstract: Recommender systems, an essential part of modern e-commerce, consist of two fundamental modules: Click-Through Rate (CTR) and Conversion Rate (CVR) prediction. While CVR has a direct impact on purchasing volume, its prediction is well known to be challenging due to the Sample Selection Bias (SSB) and Data Sparsity (DS) issues. Although existing methods, typically built on the sequential user behavior path "impression->click->purchase", are effective for dealing with the SSB issue, they still struggle to address the DS issue due to rare purchase training samples. Observing that users always take several purchase-related actions after clicking, we propose a novel idea of post-click behavior decomposition. Specifically, disjoint purchase-related Deterministic Actions (DAction) and Other Actions (OAction) are inserted between click and purchase in parallel, forming a novel sequential user behavior graph "impression->click->D(O)Action->purchase". Defining the model on this graph makes it possible to leverage all impression samples over the entire space and abundant extra supervised signals from D(O)Action, which effectively addresses the SSB and DS issues together. To this end, we devise a novel deep recommendation model named Elaborated Entire Space Supervised Multi-task Model (ESM2). According to the conditional probability rule defined on the graph, it employs multi-task learning to predict the decomposed sub-targets in parallel and composes them sequentially to formulate the final CVR. Extensive experiments in both offline and online environments demonstrate the superiority of ESM2 over state-of-the-art models. The source code and dataset will be released.
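One plausible reading of the chain-rule composition on this behavior graph, with notation assumed rather than taken from the paper, can be written out as a worked equation; since DAction and OAction are disjoint post-click branches, the OAction probability is the complement of the DAction probability::

    \begin{align*}
    p(\text{purchase} \mid \text{click})
      &= p(\text{DAction} \mid \text{click})\, p(\text{purchase} \mid \text{DAction}) \\
      &\quad + \underbrace{\bigl(1 - p(\text{DAction} \mid \text{click})\bigr)}_{p(\text{OAction} \mid \text{click})}\,
        p(\text{purchase} \mid \text{OAction}), \\
    p(\text{purchase} \mid \text{impression})
      &= p(\text{click} \mid \text{impression})\; p(\text{purchase} \mid \text{click}).
    \end{align*}

Under this reading, each factor is a sub-target with comparatively abundant supervision, and the composed product is trained over all impressions, which is how the SSB and DS issues can be addressed jointly.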
Abstract: A two-sided travel marketplace is an e-commerce platform where users can both host tours or activities and book them as guests. When a new guest visits the platform, given tens of thousands of available listings, a natural question is what kinds of activities or trips are the best fit. To answer this question, a recommender system needs both to understand the characteristics of its inventory and to know the preferences of each individual guest. In this work, we present our efforts in building a recommender system for Airbnb Experiences, a two-sided online marketplace for tours and activities. Traditional recommender systems rely on abundant user-listing interactions, but Airbnb Experiences is an emerging business where many listings and guests are new to the platform. Instead of passively waiting for data to accumulate, we propose novel approaches to identify the key features of a listing and to estimate guest preferences with limited data availability. In particular, we focus on extending the knowledge graph and utilizing location features. We extend the original knowledge graph to include more city-specific concepts, which enables us to better characterize inventory. In addition, since many users are new to the business and the limited information about cold-start guests consists of categorical features, such as locations and destinations, we propose to utilize categorical information by employing additive submodels. Extensive experiments have been conducted, and the results show the superiority of the proposed methods over state-of-the-art approaches. Results from an online A/B test show that the deployment of the categorical feature handling method leads to statistically significant growth in conversions and revenue, making it the most influential experiment in lifting the revenue of Airbnb Experiences in 2019.
Abstract: Capturing users' precise preferences is of great importance in various recommender systems (e.g., e-commerce platforms and online advertising sites), as it is the basis for presenting personalized, interesting product lists to individual users. Although significant progress has been made in considering relations between users and items, most existing recommendation techniques focus solely on a single type of user-item interaction. However, user-item interactive behavior is often multi-typed (e.g., page view, add-to-favorite, and purchase) and inter-dependent in nature. Overlooking these multiplex behavior relations makes it hard to recognize the multi-modal contextual signals across different types of interactions, which limits the feasibility of current recommendation methods. To tackle this challenge, this work proposes Memory-Augmented Transformer Networks (MATN) to enable recommendation with multiplex behavioral relational information and joint modeling of type-specific behavioral context and type-wise behavior inter-dependencies, in a fully automatic manner. In our MATN framework, we first develop a transformer-based multi-behavior relation encoder to make the learned interaction representations reflective of cross-type behavior relations. Furthermore, a memory attention network is proposed to supercharge MATN in capturing the contextual signals of different behavior types in a category-specific latent embedding space. Finally, a cross-behavior aggregation component is introduced to promote comprehensive collaboration across type-aware interaction behavior representations and to discriminate their inherent contributions in assisting recommendation. Extensive experiments on two benchmark datasets and a real-world e-commerce user behavior dataset demonstrate significant improvements obtained by MATN over baselines. Code is available at: https://github.com/akaxlh/MATN.
Abstract: Nowadays, e-commerce search has become an integral part of many people's shopping routines. Two critical challenges remain in today's e-commerce search: how to retrieve items that are semantically relevant but do not exactly match query terms, and how to retrieve items that are more personalized to different users for the same search query. In this paper, we present a novel approach called DPSR, which stands for Deep Personalized and Semantic Retrieval, to tackle this problem. Explicitly, we share our design decisions on how to architect a retrieval system that serves industry-scale traffic efficiently, and how to train a model that learns query and item semantics accurately. Based on offline evaluations and an online A/B test with live traffic, we show that the DPSR model outperforms existing models, and that the DPSR system can retrieve more personalized and semantically relevant items, significantly improving users' search experience by a +1.29% conversion rate, and by +10.03% for long-tail queries in particular. As a result, our DPSR system has been successfully deployed in JD.com's production search since 2019.
Abstract: While the World Wide Web provides a large amount of text in many languages, cross-lingual parallel data is more difficult to obtain. Despite its scarcity, this parallel cross-lingual data plays a crucial role in a variety of tasks in natural language processing, with applications in machine translation, cross-lingual information retrieval, and document classification, as well as learning cross-lingual representations. Here, we describe the end-to-end process of searching the web for parallel cross-lingual texts. We frame obtaining parallel text as a retrieval problem in which the goal is to retrieve cross-lingual parallel text from a large, multilingual web-crawled corpus. We introduce techniques for searching for cross-lingual parallel data based on language, content, and other metadata. We motivate and introduce multilingual sentence embeddings as a core tool, and demonstrate techniques and models that leverage them for identifying parallel documents and sentences, as well as techniques for retrieving and filtering this data. We describe several large-scale datasets curated using these techniques and show how training on sentences extracted from parallel or comparable documents mined from the web can improve machine translation models and facilitate cross-lingual NLP.
Abstract: Recent progress in deep learning has brought tremendous improvements in conversational AI, leading to a plethora of commercial conversational services that allow naturally spoken interactions and increasing the need for more human-centric interactions in IR. As a result, we have witnessed a resurgent interest in developing modern conversational information retrieval (CIR) systems in research communities and industry. This tutorial presents recent advances in CIR, focusing mainly on neural approaches and new applications developed in the past five years. Our goal is to provide a thorough and in-depth overview of the general definition of CIR, the components of CIR systems, new applications raised by its conversational aspects, and the (neural) techniques recently developed for it.
Abstract: Recommender systems have demonstrated great success in information seeking. However, traditional recommender systems work in a static way, estimating user preferences on items from past interaction history. This prevents recommender systems from capturing dynamic and fine-grained preferences of users. Conversational recommender systems bring a revolution to existing recommender systems. They are able to communicate with users through natural language, during which they can explicitly ask whether a user likes an attribute or not. With the preferred attributes, a recommender system can conduct more accurate and personalized recommendations. Therefore, while still a relatively new topic, conversational recommender systems attract great research attention. We identify four emerging directions: (1) the exploration and exploitation trade-off in the cold-start recommendation setting; (2) attribute-centric conversational recommendation; (3) strategy-focused conversational recommendation; and (4) dialogue understanding and response generation. This tutorial covers these four directions, providing a review of existing approaches and progress on the topic. By presenting the emerging and promising topic of conversational recommender systems, we aim to provide take-aways to practitioners to build their own systems. We also want to stimulate more ideas and discussions with audiences on core problems of this topic, such as task formalization, dataset collection, algorithm development, and evaluation, with the ambition of facilitating the development of conversational recommender systems.
Abstract: Reciprocal recommender systems, which recommend users to each other, have gained significant importance in various Internet services for connecting people in a personalized manner, such as online dating, recruitment, socializing, learning, or skill-sharing. Unlike classical item-to-user recommenders, a fundamental requirement in reciprocal recommendation is that both parties, namely the requesting user and the recommended user, must be satisfied with the "user match" recommendation in order for it to be deemed successful. Therefore, bidirectional preferences indicating mutual compatibility between pairs of users need to be estimated based on information fusion. This tutorial introduces the emerging and novel topic of reciprocal recommender systems, analyzing their information retrieval, data-driven preference modelling, and integration mechanisms for predicting suitable user matches. The tutorial will also discuss the current trends, practical uses, impact, and challenges of reciprocal recommenders in different application domains.
Abstract: The last few years have seen an explosion of research on the topic of automated question answering (QA), spanning the communities of information retrieval, natural language processing, and artificial intelligence. This tutorial covers the highlights of this very active period of growth for QA, to give the audience a grasp of the families of algorithms currently in use. We partition research contributions by the underlying source from which answers are retrieved: curated knowledge graphs, unstructured text, or hybrid corpora. We choose this dimension of partitioning as it is the most discriminative when it comes to algorithm design. Other key dimensions are covered within each sub-topic, such as the complexity of the questions addressed and the degrees of explainability and interactivity introduced in the systems. We conclude the tutorial with the most promising emerging trends in QA that will help new entrants into this field make the best decisions to take the community forward. Much has changed in the community since the last tutorial on QA at SIGIR 2016, and we believe that this timely overview will benefit a large number of conference participants.
Abstract: While great strides have been made in the field of search and recommendation, there are still challenges and opportunities in addressing information access issues that involve solving tasks and accomplishing goals for a wide variety of users. Specifically, we lack intelligent systems that can detect not only the request an individual is making (what), but also understand and utilize the intention (why) and strategies (how) while providing information. Many scholars in the fields of information retrieval, recommender systems, productivity (especially task management and time management), and artificial intelligence have recognized the importance of extracting and understanding people's tasks and the intentions behind performing those tasks in order to serve them better. However, we are still struggling to support people in task completion; e.g., in search and assistance, it has been challenging to move beyond single-query or single-turn interactions. The proliferation of intelligent agents has opened up new modalities for interacting with information, but these agents will need to work more intelligently in understanding context and helping users at the task level. This tutorial will introduce attendees to the issues of detecting, understanding, and using task and task-related information in an information episode (with or without active searching). Specifically, it will cover several recent theories, models, and methods that show how to represent tasks and use behavioral data to extract task information. It will then show how this knowledge or model can contribute to addressing emerging retrieval and recommendation problems.
Abstract: Search applications such as image search, app search and product search are crucial parts of web search, which we denote as vertical search services. This tutorial will introduce the research and applications of user behavior modeling for vertical search. The bulk of the tutorial is devoted to covering research into behavior patterns, user behavior models and applications of user behavior data to refine evaluation metrics and ranking models for web-based vertical search.
Abstract: Since Information Retrieval (IR) is an interactive process in general, it is important to study Interactive Information Retrieval (IIR), where we would attempt to model and optimize an entire interactive retrieval process (rather than a single query) with consideration of many different ways a user can potentially interact with a search engine. This tutorial systematically reviews the progress of research in IIR with an emphasis on the most recent progress in the development of models, algorithms, and evaluation strategies for IIR. It starts with a broad overview of research in IIR and then gives an introduction to formal models for IIR using a cooperative game framework and covering decision-theoretic models such as the Interface Card Model and Probability Ranking Principle for IIR. Next, it provides a review of some representative specific techniques and algorithms for IIR, such as various forms of feedback techniques and diversification of search results, followed by a discussion of how an IIR system should be evaluated and multiple strategies proposed recently for evaluating IIR using user simulation. The tutorial ends with a brief discussion of the major open challenges in IIR and some of the most promising future research directions.
Abstract: Nowadays, intelligent information systems, especially interactive information systems (e.g., conversational interaction systems like Siri and Cortana, news feed recommender systems, and interactive search engines), are ubiquitous in real-world applications. These systems either converse with users explicitly through natural language, or mine users' interests and respond to their requests implicitly. Interactivity has become a crucial element of intelligent information systems. Although interactive information systems have made significant progress, many challenges remain when applying these models to real-world scenarios. This half-day workshop explores challenges and potential research, development, and application directions in applied interactive information systems. We aim to discuss the issues of applying interactive information models to production systems, as well as to shed some light on the fundamental characteristics, i.e., interactivity and applicability, of different interactive tasks. We welcome practical, theoretical, experimental, and methodological studies that advance interactivity in intelligent information systems. The workshop aims to bring together a diverse set of practitioners and researchers interested in investigating the interaction between humans and information systems, in order to develop more intelligent information systems.
Abstract: This half-day workshop explores challenges and potential research directions for Information Retrieval (IR) in finance. The focus will be on stimulating discussions around accessing, searching, filtering, and analyzing financial documents in banking, insurance, and investment, such as financial statements, analyst reports, filing forms, and news articles. We welcome theoretical, experimental, and methodological studies that aim to advance techniques for managing and understanding financial documents, as well as to emphasize applicability in practical applications. The workshop aims to bring together a diverse set of researchers and practitioners interested in investigating relevant topics. In addition, to facilitate developing and testing relevant techniques, we hold a data challenge on quantifying analyst reports and news articles for the prediction of commodity prices.
Abstract: The BIRDS workshop aimed to foster the cross-fertilization of Information Science (IS), Information Retrieval (IR) and Data Science (DS). Recognising the commonalities and differences between these communities, the proposed full-day workshop brought together experts and researchers in IS, IR and DS to discuss how they can learn from each other to provide more user-driven data and information exploration and retrieval solutions. Therefore, the papers aimed to convey ideas on how to utilise, for instance, IS concepts and theories in DS and IR, or DS approaches to support users in data and information exploration.
Abstract: eCommerce Information Retrieval (IR) is receiving increasing attention in the academic literature and is an essential component of some of the largest web sites (e.g. Amazon, Alibaba, Taobao, eBay, Airbnb, Target, Facebook). eCommerce organisations consistently sponsor SIGIR, reflecting the importance of IR research to them. This workshop (1) brings together researchers and practitioners of eCommerce IR to discuss topics unique to it, (2) determines how to use eCommerce's unique combination of free text, structured data, and customer behavioral data to improve search relevance, and (3) examines how to build data sets and evaluate algorithms in this domain. Since eCommerce customers often do not know exactly what they want to buy, recommendations are valuable for inspiration, serendipitous discovery and basket building. The theme of this year's eCommerce IR workshop is integrating recommendations into search for eCommerce. In addition to the focus on recommender systems in eCommerce search, Rakuten France is sponsoring a data challenge on taxonomy classification using multi-modal (image, text and structured data) input. The data challenge reflects themes from the 2017--2019 SIGIR workshops.
Abstract: Search and recommender systems process rich natural language text data such as user queries and documents. Achieving high-quality search and recommendation results requires processing and understanding such information effectively and efficiently, where natural language processing (NLP) technologies are widely deployed. In recent years, the rapid development of deep learning technology has been proven successful for improving various NLP tasks, indicating their great potential of promoting search and recommender systems. Developing deep learning models for NLP in search and recommender systems involves various fundamental components including query / document understanding, retrieval & ranking, and language generation. In this workshop, we propose to discuss deep neural network based NLP technologies and their applications in search and recommendation, with the goal of understanding (1) Why deep NLP is helpful; (2) What are the challenges to develop and productionize it; (3) How to overcome the challenges; (4) Where deep NLP models produce the largest impact.
Abstract: In the digital era, information retrieval, text/knowledge mining, and NLP techniques are playing increasingly vital roles in the legal domain. While open datasets and innovative deep learning methodologies provide critical potential, efforts need to be made in the legal domain to transfer theoretical/algorithmic models into real applications that assist users, lawyers, judges, and the legal professions in solving real problems. The objective of this workshop is to aggregate studies and applications of text mining/retrieval and NLP automation in the context of classical and novel legal tasks, addressing the algorithmic, data, and social challenges of legal intelligence. Keynote and invited presentations from industry and academia will help fill the gap between ambition and execution in the legal domain.
Abstract: Information retrieval (IR) techniques, such as search, recommendation, and online advertising, satisfy users' information needs by suggesting personalized objects (information or services) at the appropriate time and place, and play a crucial role in mitigating the information overload problem. With the wide use of mobile applications, more and more information retrieval services have provided interactive functionality and products. Thus, learning from interaction becomes a crucial machine learning paradigm for interactive IR, which is based on reinforcement learning. With recent great advances in deep reinforcement learning (DRL), there has been increasing interest in developing DRL-based information retrieval techniques, which can continuously update information retrieval strategies according to users' real-time feedback and optimize the expected cumulative long-term satisfaction of users. Our workshop aims to provide a venue that brings together academic researchers and industry practitioners (i) to discuss the principles, limitations, and applications of DRL for information retrieval, and (ii) to foster research on innovative algorithms, novel techniques, and new applications of DRL for information retrieval.
Abstract: Explainable recommendation and search attempt to develop models or methods that not only generate high-quality recommendation or search results, but also offer interpretability of the models or explanations of the results to users or system designers, which can help improve system transparency, persuasiveness, trustworthiness, and effectiveness. This is even more important in personalized search and recommendation scenarios, where users would like to know why a particular product, web page, news report, or friend suggestion appears in their own search and recommendation lists. The workshop focuses on the research and application of explainable recommendation, search, and a broader scope of IR tasks. It will gather researchers as well as practitioners in the field for discussions, idea exchange, and research promotion. It will also generate insightful debates about the recent regulations regarding AI interpretability, for a broader community including but not limited to IR, machine learning, AI, data science, and beyond.
Abstract: The growth of social Web platforms in the past years has brought an increase in displays of online hate speech. This subject is considered a critical matter in the Web community, since it can be related to potentially dangerous actions that affect individuals and groups in the physical world. The automatic detection of this type of expression has been the focus of several investigations over the past few years. However, most research on this subject has been done for the English language and on rather limited datasets. In addition, although some works approach the problem from a multilingual perspective by analyzing different languages separately, a cross-lingual perspective on this problem has not been adopted so far. The main research proposal of this thesis is to characterize hate speech and other forms of online harassment from different perspectives, and to use these characterizations to create novel models for online hate speech detection across different languages and domains.
Abstract: With the advancement of the Web and the large number of legal documents being made available digitally, legal practitioners are now facing new challenges. It is now intractable for them to manually find relevant information (prior cases, related statutes, etc.) that would assist an ongoing case. From our discussions with law experts from India (faculty members of the Rajiv Gandhi School of Intellectual Property Law, India) as well as other countries such as the UK (Swansea University) and the USA (Thomson Reuters), we understand that it is important to develop assistive tools for several tasks in the legal domain, e.g., identifying relevant documents, summarizing legal text, and so on. Though legal information systems exist (e.g., Thomson Reuters WestLaw, LexisNexis), law practitioners are often not satisfied with the search results/summaries available on such systems. Developing assistive tools in the legal domain requires addressing certain research challenges. We outline the challenges and some initial progress made on each of them.
Abstract: This research will be devoted to the challenging and under-investigated task of multi-source answer generation for complex non-factoid questions. We will start by experimenting with generative models on one particular type of non-factoid question: instrumental/procedural questions, which often start with "how-to". For this, we will use a new dataset comprising more than 100,000 QA pairs crawled from a dedicated web resource, where each answer has a set of references to the articles it was written upon. We will also compare different ways of model evaluation to choose a metric that better correlates with human assessment. To be able to do this, we need to understand how people evaluate answers to non-factoid questions and to set formal criteria for what makes a good-quality answer. Eye-tracking and crowdsourcing methods will be employed to study how users interact with answers and evaluate them, and how answer features correlate with task complexity. We hope that our research will help redefine the way users interact and work with search engines, so as to finally transform IR into the answer retrieval systems that users have always desired.
Abstract: The ubiquitous availability of mobile devices with GPS capabilities and the popularity of social media platforms have created a rich source of textual data with spatio-temporal information. Other domains, such as crime incident descriptions and search engine queries, can also provide spatio-temporal textual data. These data sources can be used to discover space-time related insights into human behavior. This work focuses on modeling text that is associated with a particular time and place. We extend the traditional language modeling task from natural language processing to language modeling under spatio-temporal conditions. This task definition allows us to use the same evaluation framework used in language modeling. A model for spatio-temporal text data representation should be able to capture the patterns that guide how text is generated in a spatio-temporal context. We aim to develop neural network models for language modeling conditioned on spatio-temporal variables, with the ability to capture properties such as neighborhood, periodicity, and hierarchy.
Abstract: Users are increasingly relying on personalized recommendations (such as news, songs, products) for their daily information consumption. To deliver personalized content to users, Heterogeneous Information Network (HIN)-based recommender systems integrate various data collected from users into an often complex ranking model. The resulting recommendations might thus be puzzling for the users, leaving them wondering why some particular items are recommended to them or how these items relate to their actions on the platform. Therefore, to gain users' trust, it is crucial to provide them with explanations for their recommendations.
Abstract: Data from human-machine interaction can be used to improve the quality of artificial intelligence (AI) systems. When designing a system with humans in the loop, one of the questions to ask is how much human work is required to create a reliable data collection. Crowdsourcing has become a popular methodology for collecting annotations from crowd workers who successfully complete crowdsourcing tasks. However, if workers reach varying task completion stages without finally submitting their work, all their effort is unrewarded and their annotations discarded: task abandonment remains invisible within the platform. On the other hand, paid crowdsourcing dynamics are often influenced by large batches of similar tasks, allowing workers to learn and develop efficient work strategies. To date, however, there is limited research aimed at understanding how human annotators complete tasks over time. Even for complex tasks in lab studies, understanding how behavioral patterns evolve during task completion remains unexplored. The aim of this research is to study how these tasks are completed, to help reduce non-essential data collection costs, and to support workers in efficient task completion. The objective is to reveal what happens with humans in the loop, looking at three related aspects: cost, effort, and behavior. In particular, I explore (i) how to make the best use of a given budget to conduct data annotation experiments and collect labelled data; (ii) how the reward received by human workers can be affected by invisible actions in abandoned tasks; and (iii) how they complete tasks to collect the associated reward. I focus on the following research questions (RQs). RQ1: How can we increase the intrinsic value of tasks? RQ2: What is the blind effort of workers while completing tasks? RQ3: What are the patterns displayed by workers as they progress in tasks? Our findings bear implications for building cost-effective human-machine workflows. The proposed methodology has the potential to benefit AI practitioners building human-like products, and to enable domain experts and crowd workers to perform tasks effectively.
Abstract: Relevance in ad-hoc retrieval is a fundamental problem of text understanding. Developing neural network methods for this foundational task of Information Retrieval (IR) has the potential to impact many search domains. Recently, a new generation of Transformer-based [6] neural re-ranking models initiated a new era by providing substantial effectiveness increases in ad-hoc search tasks [1, 4, 5]. They operate on the full text of a query and a list of candidate documents from an initial retrieval. Using self-attention, they contextualize term occurrences conditioned on the sequence they are contained in. However, self-attention, because it is applied to a whole sequence and commonly uses many layers, is inefficient, and re-ranking models still depend on the bottleneck of the initial retrieval of candidate documents. In this thesis we plan to address these shortcomings. First, we plan to make contextualization efficient enough to be usable in resource-constrained environments, and second, we plan to use the efficient contextualization components to create a novel approach for learning to index and retrieve in a unified neural model. Hence, our main research questions are: RQ1: How can we balance the trade-off between efficiency and effectiveness in contextualized neural re-ranking? Solving efficiency problems with massively parallelized hardware is neither an economically nor an environmentally friendly solution. We wish to tackle the problem of efficient contextualization in two common scenarios of ad-hoc ranking: passage retrieval and document retrieval. As part of this PhD we already proposed a novel efficient Transformer-based passage re-ranking model: the TK (Transformer-Kernel) model [4], which utilizes shallow Transformer layers to contextualize query and passages separately, and the kernel-pooling technique [7] to score individual term interactions between a query and a passage. Additionally, we explored local attention as an effective approach for document-length ranking, with an extension of TK for long text (TKL) [3]. RQ2: What is needed to learn generalized contextualized representations for indexing and retrieval? Currently, IR systems are a patchwork of traditional and neural systems. Different parts of the pipeline are optimized in isolation and lack integration during training. To overcome bottlenecks and the sub-optimal integration of pipeline components, we plan to develop a unified neural index and ranking model. A unified indexing and ranking model needs to learn how to efficiently store document representations, for example via trained sparsity [8], followed by the integration of all second-generation contextualized re-ranking components. We plan to train this unified model in a true end-to-end approach, with gradients flowing through the complete system. The main challenge here is to generalize models beyond their training domain, to match the usability of traditional relevance models such as BM25, which can be used as a drop-in approach in almost any domain and language. As part of this PhD we already showed the importance of tuning the re-ranking depth as the interface between traditional and neural models [2]. Furthermore, we plan to create a truly novel indexing and ranking model that removes all bottlenecks and patchwork systems. Contextualization is a key element for focusing the model on the actual topic of a query and document, and for reducing the search space for previously ambiguous words and topics.
In parallel with our main research questions, we plan to keep explainability and thorough analysis of our approaches a priority throughout this PhD work.
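To make the kernel-pooling mechanism behind TK concrete, the following is a minimal sketch of scoring a query-passage pair from contextualized term embeddings. The tensor shapes, the number of kernels and the sigma value are illustrative assumptions, not the exact configuration of TK [4] or the original kernel-pooling work [7].

.. code-block:: python

    # Minimal kernel-pooling sketch over contextualized term embeddings;
    # hyper-parameters (11 kernels, sigma=0.1) are assumptions for illustration.
    import torch

    def kernel_pooling(q_emb, d_emb, mus, sigma=0.1):
        """q_emb: (|q|, dim), d_emb: (|d|, dim) contextualized term embeddings."""
        # Cosine similarity between every query term and every passage term.
        q = torch.nn.functional.normalize(q_emb, dim=-1)
        d = torch.nn.functional.normalize(d_emb, dim=-1)
        sim = q @ d.T                                         # (|q|, |d|)
        # Each RBF kernel soft-counts similarities near its centre mu_k.
        k = torch.exp(-((sim.unsqueeze(-1) - mus) ** 2) / (2 * sigma ** 2))
        per_query = torch.log(k.sum(dim=1).clamp(min=1e-10))  # (|q|, K)
        return per_query.sum(dim=0)                           # (K,) match features

    # 11 kernel centres spanning [-1, 1]; a learned linear layer would map
    # the resulting features to a relevance score.
    mus = torch.linspace(-1.0, 1.0, 11)
    features = kernel_pooling(torch.randn(4, 64), torch.randn(120, 64), mus)

In this design, the soft-match features are the only interaction between query and passage, which is what allows the expensive contextualization to run separately for each side.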
Abstract: This PhD thesis will explore conversational question answering with a special emphasis on incorporating user feedback. As preliminary work, we developed a conversational passage retrieval system in the scope of the TREC Conversational Assistance Track 2019. Our current focus is to develop methods based on reinforcement learning to incorporate implicit user feedback in the form of question reformulations for conversational QA over knowledge graphs. Finally, we plan to design a conversational QA system operating on heterogeneous sources.
Abstract: Modeling user behaviors in an interactive information seeking process is key to understanding users' information needs. Two prime characteristics of interactive information seeking are statefulness and participant initiative. Much work has focused on click modeling using search session logs. However, such a search session of user queries and clicks lacks the initiative information between user and system needed to model user state. In this research proposal, we focus on product search in E-commerce. We augment the search session with generalized initiative information, integrate long-term user features into search sessions, and formulate user modeling as a Markov Decision Process (MDP). We further investigate how the proposed user modeling would benefit Learning to Rank (LTR) in an interactive, stateful information seeking process.
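As a minimal, hypothetical illustration of the MDP framing, the sketch below fixes one possible choice of state, action and reward; the proposal itself does not commit to these definitions.

.. code-block:: python

    # Hypothetical MDP framing of a product-search session; the state/action/
    # reward definitions here are illustrative assumptions, not the proposal's.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class SessionState:
        """State: the interaction history so far plus long-term user features."""
        query: str
        clicked_items: List[str] = field(default_factory=list)
        user_profile: List[float] = field(default_factory=list)  # long-term features

    @dataclass
    class Transition:
        state: SessionState
        action: str      # e.g. the ranking decision / item shown to the user
        reward: float    # e.g. click = 1.0, purchase = 5.0 (illustrative values)

    def session_return(transitions: List[Transition], gamma: float = 0.9) -> float:
        """Discounted return of one session; an LTR policy learned in this
        MDP would be optimized to maximize this quantity."""
        return sum(gamma ** t * tr.reward for t, tr in enumerate(transitions))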
Abstract: Recommender systems lie at the heart of many online services such as E-commerce, social media platforms and advertising. To keep users engaged and satisfied with the displayed items, recommender systems usually leverage users' historical interactions, which capture their interests and purchase habits, to make personalised recommendations. Recently, Graph Neural Networks (GNNs) have emerged as a technique that can effectively learn representations from structured graph data. By treating the traditional user-item interaction matrix as a bipartite graph, many existing graph-based recommender systems (GBRS) have been shown to achieve state-of-the-art performance when employing GNNs. However, the existing GBRS approaches still have several limitations, which prevent the GNNs from achieving their full potential. In this work, we propose to enhance the performance of GBRS approaches along several research directions, namely leveraging additional item and user side information, extending the existing undirected graphs to account for social influence among users, and enhancing their underlying optimisation criterion. In the following, we describe these proposed research directions.
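As one concrete instance of GNN-based learning on the bipartite interaction graph, the sketch below shows a single propagation step with symmetric degree normalisation, in the style of LightGCN; the abstract does not commit to this particular architecture, so it is an illustrative assumption.

.. code-block:: python

    # One message-passing step on the bipartite user-item graph with
    # symmetric degree normalisation (LightGCN-style); illustrative only.
    import torch

    def propagate(user_emb, item_emb, interactions):
        """user_emb: (U, d), item_emb: (I, d), interactions: (U, I) binary."""
        du = interactions.sum(1, keepdim=True).clamp(min=1)  # user degrees
        di = interactions.sum(0, keepdim=True).clamp(min=1)  # item degrees
        norm_adj = interactions / torch.sqrt(du) / torch.sqrt(di)
        # Users aggregate the embeddings of their items, and vice versa.
        new_user = norm_adj @ item_emb                       # (U, d)
        new_item = norm_adj.T @ user_emb                     # (I, d)
        return new_user, new_item

    R = (torch.rand(100, 500) < 0.05).float()   # toy interaction matrix
    u, i = propagate(torch.randn(100, 32), torch.randn(500, 32), R)

Stacking several such steps lets each user representation absorb multi-hop collaborative signals; the side information and social edges proposed above would extend the graph beyond this plain bipartite structure.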
Abstract: Legal case retrieval is a specialized IR task that aims to retrieve supporting precedents given a query case. Different from traditional ad-hoc text retrieval, the query case is much longer and more complex than common keyword queries. Legal relevance between a supporting case and a query case is defined beyond general topical relevance, and making relevance judgments requires legal knowledge. It is thus difficult to collect a large-scale case retrieval dataset along with accurate relevance judgments. Therefore, legal case retrieval is more challenging. As a primary attempt, we propose to develop a retrieval model to tackle these challenges based on the benchmarks in this task. Moreover, we plan to investigate the practical interactions between legal practitioners and retrieval systems and further apply the resulting user behavior models to improve system performance. Beyond binary labels, we would like to take a deeper look at the decision process of relevance judgment in legal practice, which will benefit related tasks such as relevance estimation and result ranking.
Abstract: For many complex diseases, there is no "one size fits all" solution in practice: patients with the same diagnosis should be treated depending on their genetics, environment, lifestyle choices and so on. Precision medicine, which provides personalized treatment for a particular patient, has therefore drawn increasing attention. However, the large number of treatment options is overwhelming for clinicians trying to choose the best treatment for a particular patient. One effective way to alleviate this problem is a biomedical information retrieval (BMIR) system, which can automatically find relevant information and appropriate treatments among a mass of alternative treatments and cases. However, the biomedical literature and clinical trials contain a large number of synonymous, polysemous and context-dependent terms, causing a semantic gap between query and document in traditional biomedical information retrieval systems. Recently, deep learning-based biomedical information retrieval systems have been adopted to address this problem and have the potential to improve BMIR performance. With these approaches, the semantic information of query and document is encoded as low-dimensional feature vectors. Although most existing deep learning-based biomedical information retrieval systems achieve strong accuracy, they are usually black-box models that lack explainability. It is difficult for clinicians to understand their ranked results, which makes them doubt the effectiveness of these systems. Reasonable explanations help clinicians make better decisions via appropriate treatment logic inference, thus further enhancing the transparency, fairness and trustworthiness of biomedical information retrieval systems. Furthermore, knowledge graphs, which contain abundant real-world facts and entities, have drawn increasing attention; they are an effective way to provide accuracy and explainability for deep learning models and to reduce the knowledge gap between experts and the public. However, knowledge graphs are usually employed merely as a query expansion strategy in biomedical information retrieval systems, and it remains an open question how to build explainable biomedical information retrieval systems around them. Given the above, to alleviate the trade-off between accuracy and explainability in precision medicine, we propose to research Biomedical Information Retrieval incorporating Knowledge Graphs for Explainable Precision Medicine. In this work, we propose a neural biomedical information retrieval model that addresses the semantic gap problem and fully investigates the utility of knowledge graphs for explainable biomedical information retrieval; it soft-matches the query and document using semantic information instead of ranking by exact matches. On the one hand, our model encodes the semantic features of documents using convolutional neural networks, which have shown a strong ability to model textual information in recent years, and the relevance between query and document is measured via soft matches rather than exact matches. On the other hand, explainability is endowed to the biomedical information retrieval model by extending the utility of the knowledge graph: a graph-based strategy is designed to achieve this goal by building knowledge-aware paths with the help of attention scores.
Specifically, graph attention networks (GATs) would be adopted to model the query representation by summarizing high-order connectivity from the graph structure. With the help of GAT attention, weight scores are automatically assigned to build knowledge-aware propagation paths, which can be regarded as evidence for the explainable biomedical information retrieval system. Finally, the proposed system would be evaluated on the TREC Precision Medicine datasets.
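Since the GAT attention weights are exactly what would be surfaced as explanatory evidence, the following is a minimal single-head graph attention layer sketch; the dimensions and the attention scorer follow the standard GAT formulation rather than any system-specific design from this proposal.

.. code-block:: python

    # Minimal single-head graph attention layer; the attention weights
    # (alpha) over KG neighbours are the candidate "evidence" signal.
    import torch
    import torch.nn.functional as F

    class GATLayer(torch.nn.Module):
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.W = torch.nn.Linear(in_dim, out_dim, bias=False)
            self.a = torch.nn.Linear(2 * out_dim, 1, bias=False)

        def forward(self, h, adj):
            """h: (N, in_dim) entity embeddings, adj: (N, N) 0/1 adjacency."""
            z = self.W(h)                                    # (N, out_dim)
            n = z.size(0)
            pairs = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                               z.unsqueeze(0).expand(n, n, -1)], dim=-1)
            e = F.leaky_relu(self.a(pairs).squeeze(-1))      # (N, N) raw scores
            e = e.masked_fill(adj == 0, float('-inf'))
            alpha = torch.softmax(e, dim=-1)                 # attention weights
            # alpha[i, j]: contribution of neighbour j to entity i -- these
            # weights trace the knowledge-aware paths used as evidence.
            return alpha @ z, alpha

    layer = GATLayer(16, 16)
    adj = (torch.rand(5, 5) < 0.5).float().fill_diagonal_(1)
    out, attn = layer(torch.randn(5, 16), adj)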
Abstract: The quality of digital information on the web is disquieting due to the absence of careful checking. Consequently, a large volume of false textual information is being produced and disseminated. The focus of this doctoral study is to work towards evaluating the veracity of textual statements on the web. The major contributions to this growing area of research will be made in the following aspects: (1) improving stance detection and incorporating it into misinformation detection; (2) effectively utilizing noisy, unstructured user engagements on social media platforms; (3) designing a general framework for early misinformation detection. Findings of this research will provide a deeper understanding of how machine learning can be leveraged to automatically detect misinformation.
Abstract: Traditional Chinese Medicine (TCM) has been used by practitioners for millennia to prevent and treat disease, but has struggled to gain broad acceptance in the West. In 2019, the World Health Organisation officially recognized TCM as a form of medical treatment, a step towards internationalizing TCM and integrating it with Western medicine (WM). The proposed dissertation research aims to bridge eastern and western medical philosophies by applying named entity recognition (NER) and information retrieval (IR) models, supported by medical and cross-lingual knowledge graphs, to enhance retrieval performance as well as to increase model explainability.