==================================
SIGIR'21 ABSTRACT
==================================
Abstract: We tackle the task of video moment retrieval (VMR), which aims to localize a specific moment in a video according to a textual query. Existing methods primarily model the matching relationship between query and moment by complex cross-modal interactions. Despite their effectiveness, current models mostly exploit dataset biases while ignoring the video content, thus leading to poor generalizability. We argue that the issue is caused by the hidden confounder in VMR, i.e., the temporal location of moments, which spuriously correlates the model input and prediction. How to design matching models that are robust against temporal location biases is crucial but, as far as we know, has not yet been studied for VMR. To fill this research gap, we propose a causality-inspired VMR framework that builds a structural causal model to capture the true effect of query and video content on the prediction. Specifically, we develop a Deconfounded Cross-modal Matching (DCM) method to remove the confounding effects of moment location. It first disentangles the moment representation to infer the core feature of visual content, and then applies causal intervention on the disentangled multimodal input based on backdoor adjustment, which forces the model to fairly take each possible location of the target into consideration. Extensive experiments clearly show that our approach achieves significant improvements over state-of-the-art methods in terms of both accuracy and generalization.
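
The backdoor adjustment referred to above is the standard causal-intervention identity; as a reminder, this general form is not specific to DCM:

.. math::

   P(Y \mid \mathrm{do}(X)) \;=\; \sum_{z} P(Y \mid X, z)\, P(z)

where z denotes the confounder (here, the temporal location of the moment), so every candidate location is weighted by its prior probability rather than by its biased co-occurrence with the input.
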
Abstract: Recommender systems usually face popularity bias issues: from the data perspective, items exhibit an uneven (usually long-tail) distribution over interaction frequency; from the method perspective, collaborative filtering methods are prone to amplify the bias by over-recommending popular items. It is undoubtedly critical to consider popularity bias in recommender systems, and existing work mainly eliminates the bias effect with propensity-based unbiased learning or causal embeddings. However, we argue that not all biases in the data are bad, i.e., some items demonstrate higher popularity because of their better intrinsic quality. Blindly pursuing unbiased learning may remove the beneficial patterns in the data, degrading recommendation accuracy and user satisfaction. This work studies an unexplored problem in recommendation: how to leverage popularity bias to improve recommendation accuracy. The key lies in two aspects: how to remove the bad impact of popularity bias during training, and how to inject the desired popularity bias in the inference stage that generates top-K recommendations. This questions the causal mechanism of the recommendation generation process. Along this line, we find that item popularity plays the role of a confounder between the exposed items and the observed interactions, causing the bad effect of bias amplification. To achieve our goal, we propose a new training and inference paradigm for recommendation named Popularity-bias Deconfounding and Adjusting (PDA). It removes the confounding popularity bias in model training and adjusts the recommendation score with the desired popularity bias via causal intervention. We demonstrate the new paradigm on the latent factor model and perform extensive experiments on three real-world datasets from Kwai, Douban, and Tencent. Empirical studies validate that the deconfounded training helps discover users' real interests and that the inference adjustment with popularity bias can further improve recommendation accuracy. We release our code at https://github.com/zyang1580/PDA.
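
A minimal sketch of the inference-side adjustment described above, assuming the adjusted score is formed by scaling a popularity-free relevance score by a power of the predicted item popularity; the function names, the exponent ``gamma``, and the popularity estimate are illustrative rather than the exact PDA formulation:

.. code-block:: python

   import numpy as np

   def adjusted_scores(relevance, predicted_popularity, gamma=0.1):
       """Inject a desired amount of popularity bias at inference time.

       relevance:            popularity-deconfounded user-item scores, shape (n_items,)
       predicted_popularity: estimated future popularity of each item, shape (n_items,)
       gamma:                how much popularity bias to (re-)inject; 0 means none.
       """
       return relevance * np.power(predicted_popularity + 1e-8, gamma)

   relevance = np.array([0.9, 0.7, 0.8])
   popularity = np.array([0.02, 0.30, 0.10])   # e.g. normalized recent interaction counts
   top_k = np.argsort(-adjusted_scores(relevance, popularity))[:2]
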
Abstract: Recommender systems rely on user behavior data like ratings and clicks to build personalization models. However, the collected data is observational rather than experimental, causing various biases in the data that significantly affect the learned model. Most existing work on recommendation debiasing, such as inverse propensity scoring and imputation approaches, focuses on one or two specific biases and lacks the universal capacity to account for mixed or even unknown biases in the data. Towards this research gap, we first analyze the origin of biases from the perspective of risk discrepancy, which represents the difference between the expectation of the empirical risk and the true risk. Remarkably, we derive a general learning framework that summarizes most existing debiasing strategies by specifying some parameters of the general framework. This provides a valuable opportunity to develop a universal solution for debiasing, e.g., by learning the debiasing parameters from data. However, the training data lacks important signals of how the data is biased and what the unbiased data looks like. To move this idea forward, we propose AutoDebias, which leverages another (small) set of uniform data to optimize the debiasing parameters by solving a bi-level optimization problem with meta-learning. Through theoretical analyses, we derive the generalization bound for AutoDebias and prove its ability to acquire the appropriate debiasing strategy. Extensive experiments on two real datasets and a simulated dataset demonstrate the effectiveness of AutoDebias. The code is available at https://github.com/DongHande/AutoDebias.
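
A minimal sketch of the bi-level optimization with meta-learning described above, under simplifying assumptions: a toy linear model stands in for the recommender, per-sample weights stand in for the debiasing parameters, and a one-step look-ahead stands in for the inner optimization. None of the names or hyperparameters below come from AutoDebias itself.

.. code-block:: python

   import torch

   # Toy stand-ins: theta = recommender parameters, phi = debiasing parameters.
   theta = torch.randn(8, requires_grad=True)
   phi = torch.zeros(100, requires_grad=True)

   x_biased = torch.randn(100, 8);  y_biased = torch.randn(100)    # large biased log
   x_uniform = torch.randn(20, 8);  y_uniform = torch.randn(20)    # small uniform (unbiased) set

   opt_phi = torch.optim.Adam([phi], lr=1e-2)
   opt_theta = torch.optim.SGD([theta], lr=1e-1)

   for step in range(200):
       # Inner level: weighted empirical risk on the biased data, one virtual SGD step on theta.
       w = torch.sigmoid(phi)
       inner_loss = (w * (x_biased @ theta - y_biased) ** 2).mean()
       grad_theta = torch.autograd.grad(inner_loss, theta, create_graph=True)[0]
       theta_virtual = theta - 0.1 * grad_theta

       # Outer level: evaluate the look-ahead model on the uniform data and update phi.
       outer_loss = ((x_uniform @ theta_virtual - y_uniform) ** 2).mean()
       opt_phi.zero_grad()
       outer_loss.backward()
       opt_phi.step()

       # Real update of theta with the current (detached) debiasing weights.
       opt_theta.zero_grad()
       ((torch.sigmoid(phi).detach() * (x_biased @ theta - y_biased) ** 2).mean()).backward()
       opt_theta.step()
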
Abstract: Biases and de-biasing in recommender systems (RS) have become a research hotspot recently. This paper reveals an unexplored type of bias, i.e., sentiment bias. Through an empirical study, we find that many RS models provide more accurate recommendations on user/item groups having more positive feedback (i.e., positive users/items) than on user/item groups having more negative feedback (i.e., negative users/items). We show that sentiment bias is different from existing biases such as popularity bias: positive users/items do not have more user feedback (i.e., either more ratings or longer reviews). The existence of sentiment bias leads to low-quality recommendations to critical users and unfair recommendations for niche items. We discuss the factors that cause sentiment bias. Then, to fix the sources of sentiment bias, we propose a general de-biasing framework with three strategies manifesting in different regularizers that can be easily plugged into RS models without changing model architectures. Experiments on various RS models and benchmark datasets have verified the effectiveness of our de-biasing framework. To our best knowledge, sentiment bias and its de-biasing have not been studied before. We hope that this work can help strengthen the study of biases and de-biasing in RS.
Abstract: User feedback can be delayed in many streaming recommendation scenarios. For example, the feedback to a recommended coupon consists of immediate feedback on the click event and delayed feedback on the resultant conversion. Delayed feedback poses the challenge of training recommendation models on instances with incomplete labels. When applied to real products, the challenge becomes more severe, as streaming recommendation models need to be retrained very frequently and the training instances need to be collected over very short time scales. Existing approaches either simply ignore the unobserved feedback or heuristically adjust the feedback on a static instance set, resulting in biases in the training data and hurting the accuracy of the learned recommenders. In this paper, we propose a novel and theoretically sound counterfactual approach to adjusting the user feedback and learning the recommendation models, called CBDF (Counterfactual Bandit with Delayed Feedback). CBDF formulates streaming recommendation with delayed feedback as a sequential decision-making problem and models it with a batched bandit. To deal with the issue of delayed feedback, at each iteration (episode), a counterfactual importance sampling model is employed to re-weight the original feedback and generate modified rewards. Based on the modified rewards, a batched bandit is learned for conducting online recommendation at the next iteration. Theoretical analysis shows that the modified rewards are statistically unbiased, and that the learned bandit policy enjoys a sub-linear regret bound. Experimental results demonstrate that CBDF outperforms state-of-the-art baselines on a synthetic dataset, the Criteo dataset, and a dataset from Tencent's WeChat app.
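
A generic sketch of the underlying idea of re-weighting partially observed delayed feedback with inverse observation propensities; the delay model, function names, and numbers below are hypothetical illustrations, not CBDF's actual estimator:

.. code-block:: python

   import numpy as np

   def modified_rewards(clicks, conversions_observed, elapsed, p_observe_by):
       """Re-weight delayed conversion feedback by inverse observation propensity.

       clicks:                1 if the impression was clicked, else 0
       conversions_observed:  1 if a conversion was observed before the training cut-off
       elapsed:               time between the click and the cut-off
       p_observe_by:          probability that a true conversion is observed within
                              `elapsed` time (estimated from historical logs)
       """
       w = 1.0 / np.clip(p_observe_by(elapsed), 1e-3, 1.0)   # inverse observation propensity
       return clicks * conversions_observed * w              # unbiased in expectation

   # Usage with a hypothetical exponential delay model.
   rewards = modified_rewards(
       clicks=np.array([1, 1, 1]),
       conversions_observed=np.array([1, 0, 1]),
       elapsed=np.array([2.0, 0.5, 24.0]),
       p_observe_by=lambda t: 1.0 - np.exp(-t / 6.0),
   )
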
Abstract: Recently, exploiting a knowledge graph (KG) to enrich the semantic representation of a news article has been proven effective for news recommendation. These solutions focus on representation learning for news articles with additional information from the knowledge graph, and the user representations are mainly derived from these news representations afterwards. However, different users may hold different interests in the same news article. In other words, directly identifying the entities relevant to a user's interest and deriving the resultant user representation could enable better news recommendation and explanation. To this end, in this paper, we propose a novel knowledge-pruning-based recurrent graph convolutional network (named Kopra) for news recommendation. Instead of extracting relevant entities for a news article from the KG, Kopra is devised to identify the relevant entities from both a user's click history and the KG to derive the user representation. We first form an initial entity graph (namely, an interest graph) with seed entities extracted from news titles and abstracts. Then, a joint knowledge pruning and recurrent graph convolution (RGC) mechanism is introduced to augment each seed entity with relevant entities from the KG in a recurrent manner. That is, the entities in the neighborhood of each seed entity inside the KG that are irrelevant to the user's interest are pruned from the augmentation. With this recurrent pruning and graph convolution process, we can derive both the user's long-term and short-term representations based on her click history within long and short time periods, respectively. Finally, we introduce a max-pooling predictor over the long- and short-term user representations and the seed entities in the candidate news to calculate the ranking score for recommendation. Experimental results on two real-world datasets in two different languages suggest that the proposed Kopra obtains significantly better performance than a series of state-of-the-art alternatives. Moreover, the entity graph generated by Kopra greatly facilitates recommendation explanation.
Abstract: The most important task in personalized news recommendation is accurate matching between candidate news and user interest. Most existing news recommendation methods model candidate news from its textual content and user interest from clicked news independently. However, a news article may cover multiple aspects and entities, and a user usually has different kinds of interest. Independent modeling of candidate news and user interest may lead to inferior matching between news and users. In this paper, we propose a knowledge-aware interactive matching method for news recommendation. Our method interactively models candidate news and user interest to facilitate their accurate matching. We design a knowledge-aware news co-encoder to interactively learn representations for both clicked news and candidate news by capturing their relatedness in both semantics and entities with the help of knowledge graphs. We also design a user-news co-encoder to learn a candidate-news-aware user interest representation and a user-aware candidate news representation for better interest matching. Experiments on two real-world datasets validate that our method can effectively improve the performance of news recommendation.
Abstract: Neural graph-based Collaborative Filtering (CF) models learn user and item embeddings based on the user-item bipartite graph structure and have achieved state-of-the-art recommendation performance. In the ubiquitous implicit-feedback-based CF, users' unobserved behaviors are treated as unlinked edges in the user-item bipartite graph. As users' unobserved behaviors mix dislikes with unknown positive preferences, the fixed graph structure input misses potential positive preference links. In this paper, we study how to learn an enhanced graph structure for CF. We argue that node embedding learning and graph structure learning can mutually enhance each other in CF, as updated node embeddings are learned from the previous graph structure, and vice versa (i.e., the newly updated graph structure is optimized based on current node embedding results). Some previous works provided approaches to refine the graph structure. However, most of these graph learning models rely on node features for modeling, which are not available in CF. Besides, nearly all of their optimization goals compare the learned adaptive graph and the original graph from a local reconstruction perspective, and whether the global properties of the adaptive graph structure are modeled in the learning process remains unknown. To this end, in this paper, we propose an enhanced graph learning network, EGLN, for CF via mutual information maximization. The key idea of EGLN is twofold: first, we let the enhanced graph learning module and the node embedding module iteratively learn from each other without any feature input; second, we design a local-global consistency optimization function to capture the global properties in the enhanced graph learning process. Finally, extensive experimental results on three real-world datasets clearly show the effectiveness of our proposed model.
Abstract: Explainable recommendations provide the reasons why an item is recommended to a user, which often leads to increased user satisfaction and persuasiveness. An intuitive way to explain recommendations is by generating a synthetic personalized natural language review for a user-item pair. Although some approaches in the literature explain recommendations by generating reviews, the quality of the reviews is questionable. Besides, these methods usually take considerable time to train the underlying language model responsible for generating the text. In this work, we propose ReXPlug, an end-to-end framework with a plug-and-play way of explaining recommendations. ReXPlug predicts accurate ratings as well as exploits the Plug and Play Language Model to generate high-quality reviews. We train a simple sentiment classifier to control a pre-trained language model for generation, bypassing training the language model from scratch again. Such a simple and neat model is much easier to implement and train, and hence very efficient for generating reviews. We personalize the reviews by leveraging a special jointly-trained cross-attention network. Our detailed experiments show that ReXPlug outperforms many recent models across various datasets on rating prediction by utilizing textual reviews as a regularizer. Quantitative analysis shows that the reviews generated by ReXPlug are semantically close to the ground-truth reviews, while qualitative analysis demonstrates the high quality of the generated reviews, both from empirical and analytical viewpoints. Our implementation is available online.
Abstract: The key to personalized search is to build the user profile based on historical behaviour. To deal with users who lack historical data, group-based personalized models were proposed to incorporate the profiles of similar users when re-ranking the results. However, similar users are mostly found based on simple lexical or topical similarity in search behaviours. In this paper, we propose a neural-network-enhanced method to highlight similar users in semantic space. Furthermore, we argue that behaviour-based similar users are still insufficient to understand a new query when the user's historical activities are limited. To tackle this issue, we introduce the friend network into personalized search to determine the closeness between users in another way. Since friendship is often formed based on similar background or interests, plenty of personalized signals are naturally hidden in the friend network. Specifically, we propose a friend-network-enhanced personalized search model, which groups the user into multiple friend circles based on search behaviours and friend relations, respectively. These two types of friend circles are complementary for constructing a more comprehensive group profile to refine personalization. Experimental results show the significant improvement of our model over existing personalized search models.
Abstract: Visual search has become popular in recent years, allowing users to search by an image they take with their mobile device or upload from their photo library. One domain in which visual search is especially valuable is electronic commerce, where users seek items to purchase. In this work, we present an in-depth, comprehensive study of visual e-commerce search. We perform query log analysis of the mobile search application of one of the largest e-commerce platforms. We compare visual and textual search across a variety of characteristics, with a special focus on the retrieved results and user interaction with them. We also examine image query characteristics, refinement by attributes, and performance prediction for visual search queries. Our analysis points out a variety of differences between visual and textual e-commerce search. We discuss the implications of these differences for the design of future e-commerce search systems.
Abstract: A vital step towards the widespread adoption of neural retrieval models is their resource efficiency throughout the training, indexing and query workflows. The neural IR community has made great advances in training effective dual-encoder dense retrieval (DR) models recently. A dense text retrieval model uses a single vector representation per query and passage to score a match, which enables low-latency first-stage retrieval with a nearest neighbor search. Increasingly, training approaches require enormous compute power, as they either conduct negative passage sampling from a continuously refreshed index or require very large batch sizes. Instead of relying on more compute capability, we introduce an efficient topic-aware query and balanced margin sampling technique, called TAS-Balanced. We cluster queries once before training and sample queries from a single cluster per batch. We train our lightweight 6-layer DR model with a novel dual-teacher supervision that combines pairwise and in-batch negative teachers. Our method is trainable on a single consumer-grade GPU in under 48 hours. We show that our TAS-Balanced training method achieves state-of-the-art low-latency (64 ms per query) results on two TREC Deep Learning Track query sets. Evaluated on NDCG@10, we outperform BM25 by 44%, a plainly trained DR model by 19%, docT5query by 11%, and the previous best DR model by 5%. Additionally, TAS-Balanced produces the first dense retriever that outperforms every other method on recall at any cutoff on TREC-DL, and allows more resource-intensive re-ranking models to operate on fewer passages to improve results further.
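
The topic-aware part of the sampling strategy can be sketched as follows; the clustering backend, embedding source, and cluster count are illustrative choices, and the balanced margin sampling of passage pairs is omitted:

.. code-block:: python

   import numpy as np
   from sklearn.cluster import KMeans

   rng = np.random.default_rng(0)
   query_vecs = rng.normal(size=(10_000, 64))   # placeholder query embeddings

   # 1) Cluster queries once before training (topic-aware grouping).
   kmeans = KMeans(n_clusters=200, n_init=10, random_state=0).fit(query_vecs)
   clusters = [np.where(kmeans.labels_ == c)[0] for c in range(200)]

   def sample_batch(batch_size=32):
       # 2) Each batch draws queries from a single cluster, so in-batch negatives
       #    come from related queries and carry more training signal.
       c = rng.integers(len(clusters))
       return rng.choice(clusters[c], size=min(batch_size, len(clusters[c])), replace=False)

   batch_query_ids = sample_batch()
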
Abstract: Product search has been a crucial entry point to serve people shopping online. Most existing personalized product models follow the paradigm of representing and matching user intents and items in the semantic space, where finer-grained matching is totally discarded and the ranking of an item cannot be explained further than just user/item level similarity. In addition, while some models in existing studies have created dynamic user representations based on search context, their representations for items are static across all search sessions. This makes every piece of information about the item always equally important in representing the item during matching with various user intents. Aware of the above limitations, we propose a review-based transformer model (RTM) for personalized product search, which encodes the sequence of query, user reviews, and item reviews with a transformer architecture. RTM conducts review-level matching between the user and item, where each review has a dynamic effect according to the context in the sequence. This makes it possible to identify useful reviews to explain the scoring. Experimental results show that RTM significantly outperforms state-of-the-art personalized product search baselines.
Abstract: Twitter is currently a popular online social media platform which allows users to share their user-generated content. This publicly generated user data is also crucial to healthcare technologies because the discovered patterns can benefit them in several ways. One of the applications is automatically discovering mental health problems, e.g., depression. Previous studies on automatically detecting depressed users on online social media have largely relied upon user behaviour and linguistic patterns, including the user's social interactions. The downside is that these models are trained on much irrelevant content, which might not be crucial for detecting a depressed user. Besides, this content has a negative impact on the overall efficiency and effectiveness of the model. To overcome the shortcomings of existing automatic depression detection methods, we propose a novel computational framework for automatic depression detection that first selects relevant content through a hybrid extractive and abstractive summarization strategy over the sequence of all user tweets, leading to more fine-grained and relevant content. The content then goes to our novel deep learning framework, a unified learning machinery that couples a Convolutional Neural Network (CNN) with attention-enhanced Gated Recurrent Units (GRUs), leading to better empirical performance than existing strong baselines.
Abstract: In this paper, we address the personalized node ranking (PNR) problem for signed networks, which aims to rank nodes in an order most relevant to a given seed node in a signed network. The recently-proposed PNR methods introduce the concept of the signed random surfer, denoted as SRSurfer, that performs the score propagation between nodes using the balance theory. However, in real settings of signed networks, edge relationships often do not strictly follow the rules of the balance theory. Therefore, SRSurfer-based PNR methods frequently perform incorrect score propagation to nodes, thereby degrading the accuracy of PNR. To address this limitation, we propose a novel random-walk based PNR approach with sign verification, named as OBOE (lOok Before yOu lEap). Specifically, OBOE carefully verifies the score propagation of SRSurfer by using the topological features of nodes. Then, OBOE corrects all incorrect score propagation cases by exploiting the statistics of a given network. The experiments on 3 real-world signed networks show that OBOE consistently and significantly outperforms 5 competing methods with improvement up to 13%, 95%, and 249% in top-k PNR, bottom-k PNR, and troll identification tasks, respectively. All OBOE codes and datasets are available at: http://github.com/wonchang24/OBOE.
Abstract: Nowadays, detecting fake news on social media platforms has become a top priority, since the widespread dissemination of fake news may mislead readers and have negative effects. To date, many algorithms have been proposed to facilitate the detection of fake news, from hand-crafted feature extraction methods to deep learning approaches. However, these methods may suffer from the following limitations: (1) they fail to utilize multi-modal context information and extract high-order complementary information for each news item to enhance fake news detection; (2) they largely ignore the full hierarchical semantics of textual content to assist in learning a better news representation. To overcome these limitations, this paper proposes a novel hierarchical multi-modal contextual attention network (HMCAN) for fake news detection by jointly modeling the multi-modal context information and the hierarchical semantics of text in a unified deep model. Specifically, we employ BERT and ResNet to learn better representations for text and images, respectively. Then, we feed the obtained representations of images and text into a multi-modal contextual attention network to fuse both inter-modality and intra-modality relationships. Finally, we design a hierarchical encoding network to capture the rich hierarchical semantics for fake news detection. Extensive experiments on three public real-world datasets demonstrate that our proposed HMCAN achieves state-of-the-art performance.
Abstract: This paper describes a novel diffusion model, DyDiff-VAE, for information diffusion prediction on social media. Given the initial content and a sequence of forwarding users, DyDiff-VAE aims to estimate the propagation likelihood for other potential users and predict the corresponding user rankings. Inferring user interests from diffusion data lays the foundation of diffusion prediction, because users often forward information in which they are interested or information from those who share similar interests. Their interests also evolve over time as a result of dynamic social influence from neighbors and time-sensitive information gained inside/outside the social media platform. Existing works fail to model users' intrinsic interests from the diffusion data and assume user interests remain static over time. DyDiff-VAE advances the state of the art in two directions: (i) we propose a dynamic encoder to infer the evolution of user interests from observed diffusion data; (ii) we propose a dual attentive decoder to estimate the propagation likelihood by integrating information from both the initial cascade content and the forwarding user sequence. Extensive experiments on four real-world datasets from Twitter and YouTube demonstrate the advantages of the proposed model; we show that it achieves 43.3% relative gains over the best baseline on average. Moreover, it has the lowest run-time compared with recurrent neural network based models.
Abstract: Knowledge tracing, which dynamically estimates students' learning states by predicting their performance on answering questions, is an essential task in online education. One typical solution for knowledge tracing is based on Recurrent Neural Networks (RNNs), which represent students' knowledge states with the hidden states of RNNs. This type of method normally assumes that students have the same cognition level and knowledge acquisition sensitivity on the same question. Thus, these methods (i) predict students' responses by referring to their knowledge states and question representations, and (ii) update the knowledge states according to the question representations and students' responses. No explicit cognition level or knowledge acquisition sensitivity is considered in the above two processes. However, in real-world scenarios, students have different understandings of a question and acquire different knowledge after finishing the same question. In this paper, we propose a novel model called Individual Estimation Knowledge Tracing (IEKT), which estimates a student's cognition of the question before response prediction and assesses their knowledge acquisition sensitivity on the question before updating the knowledge state. In the experiments, we compare IEKT with 11 knowledge tracing baselines on four benchmark datasets, and the results show IEKT achieves state-of-the-art performance.
Abstract: As a natural language generation task, it is challenging to generate informative and coherent review text. In order to enhance the informativeness of the generated text, existing solutions typically learn to copy entities or triples from knowledge graphs (KGs). However, they lack overall consideration to select and arrange the incorporated knowledge, which tends to cause text incoherence. To address the above issue, we focus on improving entity-centric coherence of the generated reviews by leveraging the semantic structure of KGs. In this paper, we propose a novel Coherence Enhanced Text Planning model (CETP) based on knowledge graphs (KGs) to improve both global and local coherence for review generation. The proposed model learns a two-level text plan for generating a document: (1) the document plan is modeled as a sequence of sentence plans in order, and (2) the sentence plan is modeled as an entity-based subgraph from KG. Local coherence can be naturally enforced by KG subgraphs through intra-sentence correlations between entities. For global coherence, we design a hierarchical self-attentive architecture with both subgraph- and node-level attention to enhance the correlations between subgraphs. To our knowledge, we are the first to utilize a KG-based text planning model to enhance text coherence for review generation. Extensive experiments on three datasets confirm the effectiveness of our model on improving the content coherence of generated texts.
Abstract: The recommender systems, which merely leverage user-item interactions for user preference prediction (such as the collaborative filtering-based ones), often face dramatic performance degradation when the interactions of users or items are insufficient. In recent years, various types of side information have been explored to alleviate this problem. Among them, knowledge graph (KG) has attracted extensive research interests as it can encode users/items and their associated attributes in the graph structure to preserve the relation information. In contrast, less attention has been paid to the item-item co-occurrence information (i.e., co-view), which contains rich item-item similarity information. It provides information from a perspective different from the user/item-attribute graph and is also valuable for the CF recommendation models. In this work, we make an effort to study the potential of integrating both types of side information (i.e., KG and item-item co-occurrence data) for recommendation. To achieve the goal, we propose a unified graph-based recommendation model (UGRec), which integrates the traditional directed relations in KG and the undirected item-item co-occurrence relations simultaneously. In particular, for a directed relation, we transform the head and tail entities into the corresponding relation space to model their relation; and for an undirected co-occurrence relation, we project head and tail entities into a unique hyperplane in the entity space to minimize their distance. In addition, a head-tail relation-aware attentive mechanism is designed for fine-grained relation modeling.
Abstract: The huge number of machine learning (ML) methods has resulted in significant information overload. Faced with an overwhelming number of ML methods, it is challenging to select appropriate ones for the given dataset and task. In general, the names of ML methods or datasets are rather condensed, thus lacking specific explanations, while the rich latent relationships between ML entities are not fully explored. In this paper, we propose a description-enhanced machine learning knowledge graph-based approach - DEKR - to help recommend appropriate ML methods for given ML datasets. The proposed knowledge graph (KG) not only includes the connections between entities but also contains the descriptions of the dataset and method entities. DEKR fuses the structural information with the description information of entities in the knowledge graph. It is a deep hybrid recommendation framework, which incorporates the knowledge graph-based and text-based methods, overcoming the limitations of previous knowledge graph-based recommendation systems that ignore the description information. There are two key components of DEKR: 1) a graph neural network aggregating information from multi-order neighbors with attention to enrich the seed (i.e. dataset or method) node's own representation, and 2) a deep collaborative filtering network based on the description text to obtain the linear and nonlinear interactions of description features. Through extensive experiments, we demonstrated the efficiency of DEKR, which outperforms the current state-of-the-art baselines by a large margin.
Abstract: Aiming at expanding few-shot relations' coverage in knowledge graphs (KGs), few-shot knowledge graph completion (FKGC) has recently gained increasing research interest. Some existing models employ a few-shot relation's multi-hop neighbor information to enhance its semantic representation. However, noisy neighbor information might be amplified when the neighborhood is excessively sparse and no neighbor is available to represent the few-shot relation. Moreover, modeling and inferring complex relations of one-to-many (1-N), many-to-one (N-1), and many-to-many (N-N) with previous knowledge graph completion approaches requires high model complexity and a large number of training instances. Thus, inferring complex relations in the few-shot scenario is difficult for FKGC models due to limited training instances. In this paper, we propose a global-local framework for few-shot relational learning to address the above issues. At the global stage, a novel gated and attentive neighbor aggregator is built for accurately integrating the semantics of a few-shot relation's neighborhood, which helps filter out noisy neighbors even if a KG contains extremely sparse neighborhoods. At the local stage, a meta-learning based TransH (MTransH) method is designed to model complex relations and train our model in a few-shot learning fashion. Extensive experiments show that our model outperforms the state-of-the-art FKGC approaches on the frequently used benchmark datasets NELL-One and Wiki-One. Compared with the strong baseline model MetaR, our model achieves 5-shot FKGC performance improvements of 8.0% on NELL-One and 2.8% on Wiki-One by the metric Hits@10.
Abstract: Sponsored search ads appear next to search results when people look for products and services on search engines. In recent years, they have become one of the most lucrative channels for marketing. As the fundamental basis of search ads, relevance modeling has attracted increasing attention due to the significant research challenges and tremendous practical value. Most existing approaches solely rely on the semantic information in the input query-ad pair, while the pure semantic information in the short ads data is not sufficient to fully identify user's search intents. Our motivation lies in incorporating the tremendous amount of unsupervised user behavior data from the historical search logs as the complementary graph to facilitate relevance modeling. In this paper, we extensively investigate how to naturally fuse the semantic textual information with the user behavior graph, and further propose three novel AdsGNN models to aggregate topological neighborhood from the perspectives of nodes, edges and tokens. Furthermore, two critical but rarely investigated problems, domain-specific pre-training and long-tail ads matching, are studied thoroughly. Empirically, we evaluate the AdsGNN models over the large industry dataset, and the experimental results of online/offline tests consistently demonstrate the superiority of our proposal.
Abstract: Financial markets are moved by events such as the issuance of administrative orders. Participants in financial markets (e.g., traders) thus pay constant attention to financial news relevant to the financial asset (e.g., oil) of interest. Due to the large scale of the news stream, it is time- and labor-intensive to manually identify influential events that can move the price of a financial asset, pushing financial participants to embrace automatic financial event ranking, which has received relatively little scrutiny to date. In this work, we formulate the financial event ranking task, which aims to score financial news (documents) according to their influence on a given asset (query). To solve this task, we propose a Hybrid News Ranking framework that, from the asset perspective, evaluates the influence of news articles by comparing their contents; and from the event perspective, assesses the influence over all query assets. Moreover, we resolve the dilemma between the essential requirement of sufficient labels for training the framework and the unaffordable cost of hiring domain experts to label the news. In particular, we design a cost-friendly system for news labeling that leverages the knowledge within published financial analyst reports. In this way, we construct three financial event ranking datasets. Extensive experiments on the datasets validate the effectiveness of the proposed framework and the rationality of solving financial event ranking through learning to rank.
Abstract: Image-recipe retrieval, which aims at retrieving the relevant recipe from a food image and vice versa, is now attracting widespread attention, since sharing food-related images and recipes on the Internet has become a popular trend. Existing methods have formulated this problem as a typical cross-modal retrieval task by learning the image-recipe similarity. Though these methods have made inspiring achievements for image-recipe retrieval, they may still be less effective at jointly incorporating three crucial points: (1) the association between ingredients and instructions, (2) fine-grained image information, and (3) the latent alignment between recipes and images. To this end, we propose a novel framework named Hybrid Fusion with Intra- and Cross-Modality Attention (HF-ICMA) to learn accurate image-recipe similarity. Our HF-ICMA model adopts an intra-recipe fusion module to focus on the interaction between ingredients and instructions within a recipe, and further enriches the expressions of the two separate embeddings. Meanwhile, an image-recipe fusion module is devised to explore the potential relationship between fine-grained image regions and ingredients from the recipe, which jointly forms the final image-recipe similarity from both local and global aspects. Extensive experiments on the large-scale benchmark dataset Recipe1M show that our model significantly outperforms state-of-the-art approaches on various image-recipe retrieval scenarios.
Abstract: Recent advances in the e-commerce fashion industry have led to an exploration of novel ways to enhance buyer experience via improved personalization. Predicting a proper size for an item to recommend is an important personalization challenge, and is being studied in this work. Earlier works in this field either focused on modeling explicit buyer fitment feedback or modeling of only a single aspect of the problem (e.g., specific category, brand, etc.). More recent works proposed richer models, either content-based or sequence-based, better accounting for content-based aspects of the problem or better modeling the buyer's online journey. However, both these approaches fail in certain scenarios: either when encountering unseen items (sequence-based models) or when encountering new users (content-based models). To address the aforementioned gaps, we propose PreSizE -- a novel deep learning framework which utilizes Transformers for accurate size prediction. PreSizE models the effect of both content-based attributes, such as brand and category, and the buyer's purchase history on her size preferences. Using an extensive set of experiments on a large-scale e-commerce dataset, we demonstrate that PreSizE is capable of achieving superior prediction performance compared to previous state-of-the-art baselines. By encoding item attributes, PreSizE better handles cold-start cases with unseen items, and cases where buyers have little past purchase data. As a proof of concept, we demonstrate that size predictions made by PreSizE can be effectively integrated into an existing production recommender system yielding very effective features and significantly improving recommendations.
Abstract: Search engines are perceived as a reliable source for general information needs. However, finding the answer to medical questions using search engines can be challenging for an ordinary user. Content can be biased and results may present different opinions. In addition, interpreting medically related content can be difficult for users with no medical background. All of these can lead users to incorrect conclusions regarding health related questions. In this work we address this problem from two perspectives. First, to gain insight on users' ability to correctly answer medical questions using search engines, we conduct a comprehensive user study. We show that for questions regarding medical treatment effectiveness, participants struggle to find the correct answer and are prone to overestimating treatment effectiveness. We analyze participants' demographic traits according to age and education level and show that this problem persists in all demographic groups. We then propose a semi-automatic machine learning approach to find the correct answer to queries on medical treatment effectiveness as it is viewed by the medical community. The model relies on the opinions presented in medical papers related to the queries, as well as features representing their impact. We show that, compared to human behaviour, our method is less prone to bias. We compare various configurations of our inference model and a baseline method that determines treatment effectiveness based solely on the opinion of medical papers. The results bolster our confidence that our approach can pave the way to developing automatic bias-free tools that can help mediate complex health related content to users.
Abstract: Post-click conversion, as a strong signal indicating user preference, is salutary for building recommender systems. However, accurately estimating the post-click conversion rate (CVR) is challenging due to selection bias, i.e., the observed click events usually happen on users' preferred items. Currently, most existing methods utilize counterfactual learning to debias recommender systems. Among them, the doubly robust (DR) estimator has achieved competitive performance by combining the error-imputation-based (EIB) estimator and the inverse propensity score (IPS) estimator in a doubly robust way. However, inaccurate error imputation may result in higher variance than the IPS estimator. Worse still, existing methods typically use simple model-agnostic methods to estimate the imputation error, which are not sufficient to approximate the dynamically changing model-correlated target (i.e., the gradient direction of the prediction model). To solve these problems, we first derive the bias and variance of the DR estimator. Based on this, we propose a more robust doubly robust (MRDR) estimator to further reduce its variance while retaining its double robustness. Moreover, we propose a novel double learning approach for the MRDR estimator, which can convert the error imputation into general CVR estimation. Besides, we empirically verify that the proposed learning scheme can further eliminate the high-variance problem of imputation learning. To evaluate its effectiveness, extensive experiments are conducted on a semi-synthetic dataset and two real-world datasets. The results demonstrate the superiority of the proposed approach over state-of-the-art methods. The code is available at https://github.com/guosyjlu/MRDR-DL.
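
For reference, these are the textbook forms of the IPS and DR estimators that the abstract builds on; the notation is generic rather than taken from the paper:

.. math::

   \mathcal{E}_{\mathrm{IPS}} = \frac{1}{|\mathcal{D}|} \sum_{(u,i) \in \mathcal{D}} \frac{o_{u,i}\, e_{u,i}}{\hat{p}_{u,i}},
   \qquad
   \mathcal{E}_{\mathrm{DR}} = \frac{1}{|\mathcal{D}|} \sum_{(u,i) \in \mathcal{D}} \left[ \hat{e}_{u,i} + \frac{o_{u,i}\,\bigl(e_{u,i} - \hat{e}_{u,i}\bigr)}{\hat{p}_{u,i}} \right]

where, for a user-item pair, o denotes the observation (click) indicator, e the prediction error, ê the imputed error, and p̂ the estimated propensity. The DR estimator stays unbiased if either the imputed errors or the propensities are accurate; MRDR targets the extra variance that inaccurate imputation introduces.
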
Abstract: Counterfactual Learning to Rank (CLTR) has become an attractive research topic due to its capability of training rankers with click logs. However, CLTR inherently suffers from a large amount of bias caused by confounders, variables that affect both the observation (examination) behavior and click behavior. Recent efforts to correct bias mostly focus on position bias, which assumes that each observation in a ranking list is isolated and only depends on the position. Though effective, this assumption ignores that users often engage with documents in an interactive manner. Ignoring the interactions between observations/clicks would incur a large interactional observation bias no matter how much data is collected. In this work, we leverage the embedding method to develop an Interactional Observation-Based Model (IOBM) to estimate the observation probability. We argue that while there exist complex observed and unobserved confounders for observation/click interactions, it is sufficient to use the embedding as a proxy confounder to uncover the relevant information for predicting the observation propensity. Moreover, the embedding offers an alternative to a fully specified generative model for observation and decouples the complex interaction structure of observations/clicks. In our IOBM, we first learn the individual observation embedding to capture position and click information. Then, we learn the interactional observation embedding to uncover their local interaction structure. To filter out irrelevant information and reduce contextual bias, we utilize query context information and propose intra-observation attention and inter-observation attention, respectively. We conduct extensive experiments on two LTR benchmark datasets, demonstrating that the proposed IOBM consistently achieves better performance than the baseline models in various click situations, and verifying its effectiveness in eliminating interactional observation bias.
Abstract: In web search on debated topics, algorithmic and cognitive biases strongly influence how users consume and process information. Recent research has shown that this can lead to a search engine manipulation effect (SEME): when search result rankings are biased towards a particular viewpoint, users tend to adopt this favored viewpoint. To better understand the mechanisms underlying SEME, we present a pre-registered, 5 x 3 factorial user study investigating whether order effects (i.e., users adopting the viewpoint pertaining to higher-ranked documents) can cause SEME. For five different debated topics, we evaluated attitude change after exposing participants with mild pre-existing attitudes to search results that were overall viewpoint-balanced but reflected one of three levels of algorithmic ranking bias. We found that attitude change did not differ across levels of ranking bias and did not vary based on individual user differences. Our results thus suggest that order effects may not be an underlying mechanism of SEME. Exploratory analyses lend support to the presence of exposure effects (i.e., users adopting the majority viewpoint among the results they examine) as a contributing factor to users' attitude change. We discuss how our findings can inform the design of user bias mitigation strategies.
Abstract: Societal biases resonate in the retrieved contents of information retrieval (IR) systems, resulting in the reinforcement of existing stereotypes. Approaching this issue requires established measures of fairness with respect to the representation of various social groups in retrieval results, as well as methods to mitigate such biases, particularly in light of the advances in deep ranking models. In this work, we first provide a novel framework to measure fairness in the retrieved text contents of ranking models. Introducing a ranker-agnostic measurement, the framework also enables disentangling the effect of the collection on fairness from that of the rankers. To mitigate these biases, we propose AdvBert, a ranking model obtained by adapting adversarial bias mitigation for IR, which jointly learns to predict relevance and remove protected attributes. We conduct experiments on two passage retrieval collections (MSMARCO Passage Re-ranking and TREC Deep Learning 2019 Passage Re-ranking), which we extend with fairness annotations of a selected subset of queries regarding gender attributes. Our results on the MSMARCO benchmark show that (1) all ranking models are less fair in comparison with ranker-agnostic baselines, and (2) the fairness of Bert rankers significantly improves when using the proposed AdvBert models. Lastly, we investigate the trade-off between fairness and utility, showing that we can maintain the significant improvements in fairness without any significant loss in utility.
Abstract: The goal of one-class collaborative filtering (OCCF) is to identify user-item pairs that are positively related but have not yet interacted, where only a small portion of positive user-item interactions (e.g., users' implicit feedback) are observed. For discriminative modeling between positive and negative interactions, most previous work relied on negative sampling to some extent, which refers to treating unobserved user-item pairs as negative, as actual negative ones are unknown. However, the negative sampling scheme has critical limitations because it may choose "positive but unobserved" pairs as negative. This paper proposes a novel OCCF framework, named BUIR, which does not require negative sampling. To make the representations of positively related users and items similar to each other while avoiding a collapsed solution, BUIR adopts two distinct encoder networks that learn from each other; the first encoder is trained to predict the output of the second encoder as its target, while the second encoder provides consistent targets by slowly approximating the first encoder. In addition, BUIR effectively alleviates the data sparsity issue of OCCF by applying stochastic data augmentation to encoder inputs. Based on the neighborhood information of users and items, BUIR randomly generates augmented views of each positive interaction each time it encodes, then further trains the model with this self-supervision. Our extensive experiments demonstrate that BUIR consistently and significantly outperforms all baseline methods by a large margin, especially for highly sparse datasets in which any assumptions about negative interactions are less valid.
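
A minimal sketch of the two-encoder mechanism described above: an online encoder with a predictor chases a slowly updated, momentum-averaged target encoder, so no negative pairs are needed. Plain embedding tables stand in for the actual encoder networks, the neighborhood-based augmentation is omitted, and the dimensions, momentum value, and loss form are illustrative.

.. code-block:: python

   import copy
   import torch
   import torch.nn.functional as F

   dim, n_users, n_items = 64, 1000, 2000
   online_user = torch.nn.Embedding(n_users, dim)
   online_item = torch.nn.Embedding(n_items, dim)
   predictor = torch.nn.Linear(dim, dim)               # maps online outputs into target space
   target_user = copy.deepcopy(online_user)
   target_item = copy.deepcopy(online_item)
   for p in list(target_user.parameters()) + list(target_item.parameters()):
       p.requires_grad_(False)                          # target side gets no gradients

   opt = torch.optim.Adam(
       list(online_user.parameters()) + list(online_item.parameters()) + list(predictor.parameters()),
       lr=1e-3,
   )

   def momentum_update(online, target, m=0.99):
       # Target encoder slowly approximates the online encoder (exponential moving average).
       for po, pt in zip(online.parameters(), target.parameters()):
           pt.data = m * pt.data + (1.0 - m) * po.data

   users = torch.randint(0, n_users, (256,))
   items = torch.randint(0, n_items, (256,))            # observed positive pairs
   for _ in range(10):
       # Each side predicts the (stop-gradient) target representation of its positive partner.
       u_on, i_on = online_user(users), online_item(items)
       u_tg, i_tg = target_user(users).detach(), target_item(items).detach()
       loss = (1 - F.cosine_similarity(predictor(u_on), i_tg)).mean() \
            + (1 - F.cosine_similarity(predictor(i_on), u_tg)).mean()
       opt.zero_grad()
       loss.backward()
       opt.step()
       momentum_update(online_user, target_user)
       momentum_update(online_item, target_item)
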
Abstract: Session-based Recommender Systems (SRSs) have been actively developed to recommend the next item of an anonymous short item sequence (i.e., session). Unlike sequence-aware recommender systems where the whole interaction sequence of each user can be used to model both the short-term interest and the general interest of the user, the absence of user-dependent information in SRSs makes it difficult to directly derive the user's general interest from data. Therefore, existing SRSs have focused on how to effectively model the information about short-term interest within the sessions, but they are insufficient to capture the general interest of users. To this end, we propose a novel framework to overcome the limitation of SRSs, named ProxySR, which imitates the missing information in SRSs (i.e., general interest of users) by modeling proxies of sessions. ProxySR selects a proxy for the input session in an unsupervised manner, and combines it with the encoded short-term interest of the session. As a proxy is jointly learned with the short-term interest and selected by multiple sessions, a proxy learns to play the role of the general interest of a user and ProxySR learns how to select a suitable proxy for an input session. Moreover, we propose another real-world situation of SRSs where a few users are logged-in and leave their identifiers in sessions, and a revision of ProxySR for the situation. Our experiments on real-world datasets show that ProxySR considerably outperforms the state-of-the-art competitors, and the proxies successfully imitate the general interest of the users without any user-dependent information.
Abstract: Factorization-based models have achieved great success in online advertising and recommender systems due to their capability of efficiently modeling combinational features. These models encode feature interactions by the vector product between feature embeddings. Despite the improvement in generalization, the memory consumption of these models grows significantly, because they usually take hundreds to thousands of large categorical features as input. Several existing works try to reduce the memory footprint by hashing, randomized embedding composition, and dimensionality search, but they suffer from either substantial performance degradation or limited memory compression. To this end, in this paper, we propose an extremely memory-efficient Factorization Machine (xLightFM), where each category embedding is composited with latent vectors selected from codebooks. Based on the characteristics of each categorical feature, we further propose to adapt the codebook size with neural architecture search techniques for compositing the embedding of each categorical feature. This further pushes the limits of memory compression while incurring negligible degradation, or even some improvement, in prediction performance. We extensively evaluate the proposed algorithm on two real-world datasets. The results demonstrate that xLightFM can outperform the state-of-the-art lightweight factorization-based methods in terms of both prediction quality and memory footprint, and achieves more than 18x and 27x memory compression compared to the vanilla FM on these two datasets, respectively.
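
A minimal sketch of the codebook-composition idea: each categorical value stores only a few small codeword indices instead of a full embedding vector. The random assignments, codebook sizes, and sum-composition below are illustrative stand-ins, since xLightFM learns the assignments end-to-end and searches the codebook sizes with neural architecture search.

.. code-block:: python

   import torch

   n_categories = 1_000_000        # large vocabulary of categorical feature values
   n_codebooks, codewords, dim = 4, 256, 16

   # Shared codebooks replace a full (n_categories x dim) embedding table.
   codebooks = torch.nn.Parameter(torch.randn(n_codebooks, codewords, dim) * 0.01)
   # Each category keeps one codeword index per codebook (random here, learned in practice).
   assignments = torch.randint(0, codewords, (n_categories, n_codebooks))

   def embed(category_ids):
       # Compose each embedding from one codeword per codebook (summed here).
       picked = codebooks[torch.arange(n_codebooks), assignments[category_ids]]  # (batch, n_codebooks, dim)
       return picked.sum(dim=1)

   vecs = embed(torch.tensor([3, 42, 999_999]))   # (3, dim), built from 4 * 256 stored vectors
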
Abstract: Sequential recommendation aims at predicting users' preferences based on their historical behaviors. However, this recommendation strategy may not perform well in practice due to the sparsity of the real-world data. In this paper, we propose a novel counterfactual data augmentation framework to mitigate the impact of the imperfect training data and empower sequential recommendation models. Our framework is composed of a sampler model and an anchor model. The sampler model aims to generate new user behavior sequences based on the observed ones, while the anchor model is leveraged to provide the final recommendation list, which is trained based on both observed and generated sequences. We design the sampler model to answer the key counterfactual question: "what would a user like to buy if her previously purchased items had been different?". Beyond heuristic intervention methods, we leverage two learning-based methods to implement the sampler model, and thus, improve the quality of the generated sequences when training the anchor model. Additionally, we analyze the influence of the generated sequences on the anchor model in theory and achieve a trade-off between the information and the noise introduced by the generated sequences. Experiments on nine real-world datasets demonstrate our framework's effectiveness and generality.
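
A minimal sketch of the heuristic-intervention variant of the sampler mentioned above: part of an observed sequence is replaced to form "what if the user had bought different items" sequences, which then augment the training data of the anchor model. The item IDs and the replacement rule are placeholders; the paper's learned samplers replace the random choice with a model.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(0)

   def counterfactual_sequences(seq, n_items, n_samples=3, replace_frac=0.3):
       """Generate counterfactual behavior sequences by intervening on a user's history."""
       out = []
       for _ in range(n_samples):
           new_seq = list(seq)
           k = max(1, int(replace_frac * len(seq)))
           for pos in rng.choice(len(seq), size=k, replace=False):
               new_seq[pos] = int(rng.integers(n_items))   # a learned sampler would pick this item
           out.append(new_seq)
       return out

   observed = [12, 7, 33, 90, 5]
   augmented = counterfactual_sequences(observed, n_items=1000)
   # The anchor recommender is then trained on the observed plus augmented sequences.
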
Abstract: Deep learning has brought great progress to sequential recommendation (SR) tasks. With advanced network architectures, sequential recommender models can be stacked with many hidden layers, e.g., up to 100 layers on real-world recommendation datasets. Training such a deep network is difficult because it can be computationally very expensive and takes much longer, especially in situations where there are tens of billions of user-item interactions. To deal with this challenge, we present StackRec, a simple yet very effective and efficient training framework for deep SR models based on iterative layer stacking. Specifically, we first offer an important insight that hidden layers/blocks in a well-trained deep SR model have very similar distributions. Enlightened by this, we propose a stacking operation on the pre-trained layers/blocks to transfer knowledge from a shallower model to a deeper model, and then perform iterative stacking so as to yield a much deeper but easier-to-train SR model. We validate the performance of StackRec by instantiating it with four state-of-the-art SR models in three practical scenarios with real-world datasets. Extensive experiments show that StackRec achieves not only comparable performance, but also substantial acceleration in training time, compared to SR models that are trained from scratch. Codes are available at https://github.com/wangjiachun0426/StackRec.
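
A minimal sketch of the layer-stacking warm start described above, assuming a generic residual block; StackRec's exact stacking schedule and SR backbones differ, so this only illustrates how a trained shallow stack seeds a deeper one.

.. code-block:: python

   import copy
   import torch

   class Block(torch.nn.Module):
       def __init__(self, dim=64):
           super().__init__()
           self.ff = torch.nn.Sequential(
               torch.nn.Linear(dim, dim), torch.nn.ReLU(), torch.nn.Linear(dim, dim)
           )
       def forward(self, x):
           return x + self.ff(x)      # residual block, as in typical deep SR backbones

   def stack(model_blocks):
       """Double the depth by copying the trained blocks on top of themselves."""
       copied = [copy.deepcopy(b) for b in model_blocks]
       return torch.nn.ModuleList(list(model_blocks) + copied)

   shallow = torch.nn.ModuleList([Block() for _ in range(4)])
   # ... train `shallow` to convergence ...
   deep = stack(shallow)              # 8 blocks, warm-started from the 4 trained ones
   # ... continue training `deep`; iterating this stacking yields much deeper models ...
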
Abstract: Learning user representations based on historical behaviors lies at the core of modern recommender systems. Recent advances in sequential recommenders have convincingly demonstrated high capability in extracting effective user representations from the given behavior sequences. Despite significant progress, we argue that solely modeling the observed behavior sequences may end up with a brittle and unstable system due to the noisy and sparse nature of logged user interactions. In this paper, we propose to learn accurate and robust user representations, which are required to be less sensitive to (attacks on) noisy behaviors and to trust the indispensable behaviors more, by modeling counterfactual data distributions. Specifically, given an observed behavior sequence, the proposed CauseRec framework identifies dispensable and indispensable concepts at both the fine-grained item level and the abstract interest level. CauseRec conditionally samples user concept sequences from the counterfactual data distributions by replacing dispensable and indispensable concepts within the original concept sequence. With user representations obtained from the synthesized user sequences, CauseRec performs contrastive user representation learning by contrasting the counterfactual with the observational. We conduct extensive experiments on real-world public recommendation benchmarks and justify the effectiveness of CauseRec with multi-aspect model analysis. The results demonstrate that the proposed CauseRec outperforms state-of-the-art sequential recommenders by learning accurate and robust user representations.
Abstract: Sequential recommendation aims to leverage users' historical behaviors to predict their next interaction. Existing works have not yet addressed two main challenges in sequential recommendation. First, user behaviors in their rich historical sequences are often implicit and noisy preference signals that cannot sufficiently reflect users' actual preferences. In addition, users' dynamic preferences often change rapidly over time, and hence it is difficult to capture user patterns in their historical sequences. In this work, we propose a graph neural network model called SURGE (short for SeqUential Recommendation with Graph neural nEtworks) to address these two issues. Specifically, SURGE integrates different types of preferences in long-term user behaviors into clusters in the graph by re-constructing loose item sequences into tight item-item interest graphs based on metric learning. This helps explicitly distinguish users' core interests by forming dense clusters in the interest graph. Then, we perform cluster-aware and query-aware graph convolutional propagation and graph pooling on the constructed graph. It dynamically fuses and extracts users' currently activated core interests from noisy user behavior sequences. We conduct extensive experiments on both public and proprietary industrial datasets. Experimental results demonstrate significant performance gains of our proposed method compared to state-of-the-art methods. Further studies on sequence length confirm that our method can model long behavioral sequences effectively and efficiently.
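A rough sketch of turning a loose item sequence into an item-item interest graph via embedding similarity. The per-row top-k sparsification rule and the function name are illustrative assumptions; the metric-learning formulation in SURGE differs in its details.

```python
import torch
import torch.nn.functional as F

def build_interest_graph(item_emb: torch.Tensor, keep_ratio: float = 0.3) -> torch.Tensor:
    """item_emb: (L, d) embeddings of the L items in one user sequence.
    Returns a dense (L, L) weighted adjacency keeping only the strongest edges."""
    z = F.normalize(item_emb, dim=-1)
    sim = z @ z.t()                                  # (L, L) cosine similarities
    L = sim.size(0)
    k = max(1, int(keep_ratio * L))                  # edges kept per node
    row_threshold = sim.topk(k, dim=-1).values[:, -1:]  # k-th largest similarity per row
    return (sim >= row_threshold).float() * sim      # sparsify, keep edge weights
```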
Abstract: Sequential recommendation is the task of predicting the next items for users based on their interaction history. Modeling the dependence of the next action on the past actions accurately is crucial to this problem. Moreover, sequential recommendation often faces serious sparsity of item-to-item transitions in a user's action sequence, which limits the practical utility of such solutions. To tackle these challenges, we propose a Category-aware Collaborative Sequential Recommender. Our preliminary statistical tests demonstrate that the in-category item-to-item transitions are often much stronger indicators of the next items than the general item-to-item transitions observed in the original sequence. Our method makes use of item category in two ways. First, the recommender utilizes item category to organize a user's own actions to enhance dependency modeling based on her own past actions. It utilizes self-attention to capture in-category transition patterns, and determines which of the in-category transition patterns to consider based on the categories of recent actions. Second, the recommender utilizes the item category to retrieve users with similar in-category preferences to enhance collaborative learning across users, and thus conquer sparsity. It utilizes attention to incorporate in-category transition patterns from the retrieved users for the target user. Extensive experiments on two large datasets prove the effectiveness of our solution against an extensive list of state-of-the-art sequential recommendation models.
Abstract: Real-world events are quite often mentioned in texts. Estimating the occurrence time of event mentions has many applications in IR, QA, general document understanding and downstream NLP tasks. In this paper we propose an approach to temporal profiling of event mentions in text. Our method utilizes a news article archival collection for collecting temporal as well as textual information containing contemporary and retrospective event references. As we demonstrate in our experiments, the recent method that relies on secondary data sources like Wikipedia is insufficient to correctly estimate the event time, especially for minor or less well-known events that happened in the past. Our method then harnesses news article archives to effectively infer the occurrence time of past events, and is able to estimate the time at different temporal granularities (e.g., day, week, month, or year). As evidenced through extensive experiments, the proposed model outperforms the existing methods by a large margin at all granularities. We also demonstrate that our approach helps to answer arbitrary questions about past events, when incorporated into a QA framework operating over news article archives.
Abstract: Knowledge Graph (KG) reasoning that predicts missing facts for incomplete KGs has been widely explored. However, reasoning over Temporal KG (TKG) that predicts facts in the future is still far from resolved. The key to predicting future facts is to thoroughly understand the historical facts. A TKG is actually a sequence of KGs corresponding to different timestamps, where all concurrent facts in each KG exhibit structural dependencies and temporally adjacent facts carry informative sequential patterns. To capture these properties effectively and efficiently, we propose a novel Recurrent Evolution network based on Graph Convolution Network (GCN), called RE-GCN, which learns the evolutional representations of entities and relations at each timestamp by modeling the KG sequence recurrently. Specifically, for the evolution unit, a relation-aware GCN is leveraged to capture the structural dependencies within the KG at each timestamp. In order to capture the sequential patterns of all facts in parallel, the historical KG sequence is modeled auto-regressively by gated recurrent components. Moreover, the static properties of entities, such as entity types, are also incorporated via a static graph constraint component to obtain better entity representations. Fact prediction at future timestamps can then be realized based on the evolutional entity and relation representations. Extensive experiments demonstrate that the RE-GCN model obtains substantial performance and efficiency improvements for temporal reasoning tasks on six benchmark datasets. In particular, it achieves up to an 11.46% improvement in MRR for entity prediction with an up to 82-fold speedup compared to the state-of-the-art baseline.
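A minimal sketch of the recurrent-evolution idea, under simplifying assumptions: at each timestamp a (relation-agnostic) graph convolution aggregates co-occurring entities, and a GRU cell carries the entity states forward. The relation-aware GCN, relation evolution, and static graph constraint of RE-GCN are omitted here.

```python
import torch
import torch.nn as nn

class RecurrentEvolution(nn.Module):
    def __init__(self, num_entities: int, dim: int):
        super().__init__()
        self.ent = nn.Parameter(torch.randn(num_entities, dim) * 0.01)  # initial entity embeddings
        self.gcn = nn.Linear(dim, dim)   # one simplified propagation layer
        self.gru = nn.GRUCell(dim, dim)  # evolves entity states across timestamps

    def forward(self, adj_seq):
        """adj_seq: list of (N, N) normalized adjacency matrices, one KG snapshot per timestamp."""
        h = self.ent
        for adj in adj_seq:
            msg = torch.relu(self.gcn(adj @ h))  # structural dependencies within a snapshot
            h = self.gru(msg, h)                 # sequential pattern across snapshots
        return h                                 # evolutional entity representations for prediction
```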
Abstract: Timeline summarization aims at presenting long news stories in a compact manner. State-of-the-art approaches first select the most relevant dates from the original event timeline and then produce per-date news summaries. Date selection is driven by either per-date news content or date-level references. When coping with complex event data, characterized by inherent news flow redundancy, this pipeline may encounter relevant issues in both date selection and summarization due to a limited use of news content in date selection and no use of high-level temporal references (e.g., the past month). This paper proposes a paradigm shift in timeline summarization aimed at overcoming the above issues. It presents a new approach, namely Summarize Date First, which focuses on first generating date-level summaries and then selecting the most relevant dates on top of the summarized knowledge. In the latter stage, it performs date aggregations to consider high-level temporal references as well. The proposed pipeline also supports frequent incremental timeline updates more efficiently than previous approaches. We tested our unsupervised approach both on existing benchmark datasets and on a newly proposed benchmark dataset describing the COVID-19 news timeline. The achieved results were superior to state-of-the-art unsupervised methods and competitive against supervised ones.
Abstract: Reasoning in a temporal knowledge graph (TKG) is a critical task for information retrieval and semantic search. It is particularly challenging when the TKG is updated frequently. The model has to adapt to changes in the TKG for efficient training and inference while preserving its performance on historical knowledge. Recent work approaches TKG completion (TKGC) by augmenting the encoder-decoder framework with a time-aware encoding function. However, naively fine-tuning the model at every time step using these methods does not address the problems of 1) catastrophic forgetting, 2) the model's inability to identify the change of facts (e.g., the change of the political affiliation and end of a marriage), and 3) the lack of training efficiency. To address these challenges, we present the Time-aware Incremental Embedding (TIE) framework, which combines TKG representation learning, experience replay, and temporal regularization. We introduce a set of metrics that characterizes the intransigence of the model and propose a constraint that associates the deleted facts with negative labels. Experimental results on Wikidata12k and YAGO11k datasets demonstrate that the proposed TIE framework reduces training time by about ten times and improves on the proposed metrics compared to vanilla full-batch training. It comes without a significant loss in performance for any traditional measures. Extensive ablation studies reveal performance trade-offs among different evaluation metrics, which is essential for decision-making around real-world TKG applications.
Abstract: We introduce a modification of an established reinforcement learning method to facilitate the widespread use of temporal difference learning for IR: interpolated substate temporal difference (ISSTD) learning. While reinforcement learning methods have shown success in document ranking, these contributions have relied on relatively antiquated policy gradient methods like REINFORCE. These methods bring associated issues like high-variance gradient estimates and sample inefficiency, which present significant obstacles when training deep neural retrieval models. Within the reinforcement learning community, there exists a substantial body of work on alternative methods of training which revolve around temporal difference updates, such as Q-learning, Actor-Critic, or SARSA, that resolve some of the issues seen in REINFORCE. However, temporal difference methods require the full size of the state to be modeled internally within the ranking model, which is unrealistic for deep full-text retrieval or first-stage retrieval. We therefore propose ISSTD, operating on the substate, or individual documents in the case of matching models, and interpolating the temporal difference updates to the rest of the state. We provide theoretical guarantees on convergence, enabling drop-in use of ISSTD for any algorithm that relies on temporal difference updates. Furthermore, empirical results demonstrate the robustness of this approach for deep neural models, outperforming the current policy gradient approach for training deep neural retrieval models.
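A loose, hypothetical illustration of the temporal-difference flavour this builds on: a TD(0) update applied to per-document ("substate") value estimates, with a damped share of the update spread to similar documents. This is not the ISSTD algorithm itself; every name and the spreading rule are assumptions for illustration only.

```python
import numpy as np

def td_update(values, doc, next_doc, reward, sim, alpha=0.1, gamma=0.9, spread=0.1):
    """values: (D,) per-document value estimates; sim: (D, D) document-document similarity."""
    td_error = reward + gamma * values[next_doc] - values[doc]  # classic TD(0) error
    values[doc] += alpha * td_error                             # update the visited substate
    interp = alpha * spread * sim[doc] * td_error               # damped share for similar documents
    interp[doc] = 0.0                                           # the visited document is already updated
    values += interp
    return values

# e.g. values = td_update(np.zeros(1000), doc=3, next_doc=7, reward=1.0, sim=np.eye(1000))
```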
Abstract: Currently, the most popular method for open-domain Question Answering (QA) adopts the "Retriever and Reader" pipeline, where the retriever extracts a list of candidate documents from a large set of documents, a ranker then ranks the most relevant documents, and the reader extracts the answer from the candidates. Existing studies take a greedy strategy in the sense that they only use samples for ranking at the current hop, and ignore the global information across the whole set of documents. In this paper, we propose a purely rank-based framework, Thinking Path Re-Ranker (TPRR), which comprises a Thinking Path Ranker (TPR) for generating document sequences called "a path" and an External Path Reranker (EPR) for selecting the best path from candidate paths generated by TPR. Specifically, TPR leverages the scores of a dense model and conditional probabilities to score the full paths. Moreover, to further enhance the performance of the dense ranker in the iterative training, we propose a "thinking" negatives selection method in which the top-K candidates treated as negatives in the current hop are adjusted dynamically through supervised signals. After obtaining multiple supporting paths through TPR, the EPR component, which integrates several fine-grained training tasks for QA, is used to select the best path for answer extraction. We have tested our proposed solution on the multi-hop dataset HotpotQA with the full-wiki setting, and the results show that TPRR significantly outperforms the existing state-of-the-art models. Moreover, our method has held first place on the HotpotQA official leaderboard since Feb 1, 2021 under the Fullwiki setting. Code is available at https://gitee.com/mindspore/mindspore/tree/master/model_zoo/research/nlp/tprr.
Abstract: The rise of personal assistants has made conversational question answering (ConvQA) a very popular mechanism for user-system interaction. State-of-the-art methods for ConvQA over knowledge graphs (KGs) can only learn from crisp question-answer pairs found in popular benchmarks. In reality, however, such training data is hard to come by: users would rarely mark answers explicitly as correct or wrong. In this work, we take a step towards a more natural learning paradigm - from noisy and implicit feedback via question reformulations. A reformulation is likely to be triggered by an incorrect system response, whereas a new follow-up question could be a positive signal on the previous turn's answer. We present a reinforcement learning model, termed CONQUER, that can learn from a conversational stream of questions and reformulations. CONQUER models the answering process as multiple agents walking in parallel on the KG, where the walks are determined by actions sampled using a policy network. This policy network takes the question along with the conversational context as inputs and is trained via noisy rewards obtained from the reformulation likelihood. To evaluate CONQUER, we create and release ConvRef, a benchmark with about 11k natural conversations containing around 205k reformulations. Experiments show that CONQUER successfully learns from noisy reward signals, significantly improving over a state-of-the-art baseline.
Abstract: The quality variance in user-generated content is a major bottleneck to serving communities on online platforms. Current content ranking methods primarily evaluate textual and non-textual content features of each user post in isolation. In this paper, we demonstrate the utility of considering the implicit and explicit relational aspects across user content to assess its quality. First, we develop a modular platform-agnostic framework to represent the contrastive (or competing) and similarity-based relational aspects of user-generated content via independently induced content graphs. Second, we develop two complementary graph convolutional operators that enable feature contrast for competing content and feature smoothing/sharing for similar content. Depending on the edge semantics of each content graph, we embed its nodes via one of the above two mechanisms. We also show that our contrastive operator creates discriminative magnification across the embeddings of competing posts. Third, we show a surprising result: applying classical boosting techniques to combine final-layer embeddings across the content graphs significantly outperforms the typical stacking, fusion, or neighborhood embedding aggregation methods in graph convolutional architectures. We exhaustively validate our method via accepted answer prediction over fifty diverse Stack Exchange (https://stackexchange.com/) websites, with consistent relative gains of over 5% accuracy over state-of-the-art neural, multi-relational and textual baselines.
Abstract: Existing approaches for open-domain question answering (QA) are typically designed for questions that require either single-hop or multi-hop reasoning, which makes strong assumptions about the complexity of the questions to be answered. Also, multi-step document retrieval often incurs a higher number of relevant but non-supporting documents, which dampens the downstream noise-sensitive reader module for answer extraction. To address these challenges, we propose a unified QA framework to answer any-hop open-domain questions, which iteratively retrieves, reranks and filters documents, and adaptively determines when to stop the retrieval process. To improve the retrieval accuracy, we propose a graph-based reranking model that performs multi-document interaction as the core of our iterative reranking framework. Our method consistently achieves performance comparable to or better than the state-of-the-art on both single-hop and multi-hop open-domain QA datasets, including Natural Questions Open, SQuAD Open, and HotpotQA.
Abstract: When talking to dialog robots, users first have to activate the robot from standby mode with special wake words, such as "Hey Siri", which is apparently not user-friendly. The latest generation of dialog robots has been equipped with advanced sensors, like the camera, enabling multimodal activation. In this work, we work towards waking the robot without wake words. To accomplish this task, we present a Multimodal Activation Scheme (MAS), consisting of two key components: audio-visual consistency detection and semantic talking intention inference. The first is devised to measure the consistency between the audio and visual modalities in order to determine whether the heard speech comes from the detected user in front of the camera. Towards this end, two heterogeneous CNN-based networks are introduced to convolutionalize the fine-grained facial landmark features and the MFCC audio features, respectively. The second is to infer the semantic talking intention of the recorded speech, where the transcript of the speech is recognized and matrix factorization is utilized to uncover the latent human-robot talking topics. We ultimately devise different fusion strategies to unify these two components. To evaluate MAS, we construct a dataset containing 12,741 short videos recorded by 194 invited volunteers. Extensive experiments demonstrate the effectiveness of our scheme.
Abstract: Cognitive diagnosis (CD) is a fundamental issue in intelligent educational settings, which aims to discover the mastery levels of students on different knowledge concepts. In general, most previous works consider it as an inter-layer interaction modeling problem, e.g., student-exercise interactions in IRT or student-concept interactions in DINA, while the inner-layer structural relations, such as educational interdependencies among concepts, are still underexplored. Furthermore, there is a lack of comprehensive modeling for the student-exercise-concept hierarchical relations in CD systems. To this end, in this paper, we present a novel Relation map driven Cognitive Diagnosis (RCD) framework, uniformly modeling the interactive and structural relations via a multi-layer student-exercise-concept relation map. Specifically, we first represent students, exercises and concepts as individual nodes in a hierarchical layout, and construct three well-defined local relation maps to incorporate inter- and inner-layer relations, including a student-exercise interaction map, a concept-exercise correlation map and a concept dependency map. Then, we leverage a multi-level attention network to integrate node-level relation aggregation inside each local map and balance map-level aggregation across different maps. Finally, we design an extendable diagnosis function to predict students' performance and jointly train the networks. Extensive experimental results on real-world datasets clearly show the effectiveness and extendibility of our RCD in both diagnosis accuracy improvement and relation-aware representation learning.
Abstract: We propose Corder, a self-supervised contrastive learning framework for source code models. Corder is designed to alleviate the need for labeled data for code retrieval and code summarization tasks. The pre-trained Corder model can be used in two ways: (1) it can produce vector representations of code which can be applied to code retrieval tasks that do not have labeled data; (2) it can be used in a fine-tuning process for tasks that might still require labeled data, such as code summarization. The key innovation is that we train the source code model by asking it to recognize similar and dissimilar code snippets through a contrastive learning objective. To do so, we use a set of semantic-preserving transformation operators to generate code snippets that are syntactically diverse but semantically equivalent. Through extensive experiments, we have shown that the code models pretrained by Corder substantially outperform the other baselines for code-to-code retrieval, text-to-code retrieval, and code-to-text summarization tasks.
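A hedged sketch of such a contrastive objective: two semantic-preserving transformations of the same snippet form a positive pair, and all other snippets in the batch act as negatives (an NT-Xent-style loss; not Corder's exact implementation, and all names are illustrative).

```python
import torch
import torch.nn.functional as F

def contrastive_loss(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """z1, z2: (B, d) encoder outputs for two transformed views of the same B code snippets."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                        # (B, B) view-to-view similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)            # diagonal entries are the positive pairs
```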
Abstract: In the knowledge-grounded conversation (KGC) task, systems aim to produce more informative responses by leveraging external knowledge. KGC includes a vital part, knowledge selection, where conversational agents select the appropriate knowledge to be incorporated in the next response. Mixed initiative is an intrinsic feature of conversations where the user and the system can both take the initiative in suggesting new conversational directions. Knowledge selection can be driven by the user's initiative or by the system's initiative. For the former, the system usually selects knowledge according to the current user utterance that contains new topics or questions posed by the user; for the latter, the system usually selects knowledge according to the previously selected knowledge. No previous study has considered the mixed-initiative characteristics of knowledge selection to improve its performance. In this paper, we propose a mixed-initiative knowledge selection method (MIKe) for KGC, which explicitly distinguishes between user-initiative and system-initiative knowledge selection. Specifically, we introduce two knowledge selectors to model both of them separately, and design a novel initiative discriminator to discriminate the initiative type of knowledge selection at each conversational turn. A challenge for training MIKe is that we usually have no labels for indicating initiative. To tackle this challenge, we devise an initiative-aware self-supervised learning scheme that helps MIKe to learn to discriminate the initiative type via a self-supervised task. Experimental results on two datasets show that MIKe significantly outperforms state-of-the-art methods in terms of both automatic and human evaluations, indicating that it can select more appropriate knowledge and generate more informative and engaging responses.
Abstract: Conversational information seeking (CIS) is playing an increasingly important role in connecting people to information. Due to a lack of suitable resources, previous studies on CIS are limited to the study of conceptual frameworks, laboratory-based user studies, or a particular aspect of CIS (e.g., asking clarifying questions). In this work, we make three main contributions to facilitate research into CIS: (1) We formulate a pipeline for CIS with six sub-tasks: intent detection, keyphrase extraction, action prediction, query selection, passage selection, and response generation. (2) We release a benchmark dataset, called wizard of search engine (WISE), which allows for comprehensive and in-depth research on all aspects of CIS. (3) We design a neural architecture capable of training and evaluating both jointly and separately on the six sub-tasks, and devise a pre-train/fine-tune learning scheme that can reduce the requirements of WISE in scale by making full use of available data. We report useful characteristics of the CIS task based on statistics of the WISE dataset. We also show that our best performing model variant is able to achieve effective CIS. We release the dataset and code, as well as evaluation scripts, to facilitate future research by measuring further improvements in this important research direction.
Abstract: Medical dialogue generation aims to provide automatic and accurate responses to assist physicians to obtain diagnosis and treatment suggestions in an efficient manner. In medical dialogues, two key characteristics are relevant for response generation: patient states (such as symptoms, medication) and physician actions (such as diagnosis, treatments). In medical scenarios large-scale human annotations are usually not available, due to the high costs and privacy requirements. Hence, current approaches to medical dialogue generation typically do not explicitly account for patient states and physician actions, and focus on implicit representation instead. We propose an end-to-end variational reasoning approach to medical dialogue generation. To be able to deal with a limited amount of labeled data, we introduce both patient state and physician action as latent variables with categorical priors for explicit patient state tracking and physician policy learning, respectively. We propose a variational Bayesian generative approach to approximate posterior distributions over patient states and physician actions. We use an efficient stochastic gradient variational Bayes estimator to optimize the derived evidence lower bound, where a 2-stage collapsed inference method is proposed to reduce the bias during model training. A physician policy network composed of an action-classifier and two reasoning detectors is proposed for augmented reasoning ability. We conduct experiments on three datasets collected from medical platforms. Our experimental results show that the proposed method outperforms state-of-the-art baselines in terms of objective and subjective evaluation metrics. Our experiments also indicate that our proposed semi-supervised reasoning method achieves a comparable performance as state-of-the-art fully supervised learning baselines for physician policy learning.
Abstract: Personalized chatbots focus on endowing chatbots with a consistent personality to behave like real users, give more informative responses, and further act as personal assistants. Existing personalized approaches tried to incorporate several text descriptions as explicit user profiles. However, the acquisition of such explicit profiles is expensive and time-consuming, thus being impractical for large-scale real-world applications. Moreover, the restricted predefined profile neglects the language behavior of a real user and cannot be automatically updated together with the change of user interests. In this paper, we propose to learn implicit user profiles automatically from large-scale user dialogue history for building personalized chatbots. Specifically, leveraging the benefits of Transformer on language understanding, we train a personalized language model to construct a general user profile from the user's historical responses. To highlight the relevant historical responses to the input post, we further establish a key-value memory network of historical post-response pairs, and build a dynamic post-aware user profile. The dynamic profile mainly describes what and how the user has responded to similar posts in history. To explicitly utilize users' frequently used words, we design a personalized decoder to fuse two decoding strategies, including generating a word from the generic vocabulary and copying one word from the user's personalized vocabulary. Experiments on two real-world datasets show the significant improvement of our model compared with existing methods.
Abstract: Persona can function as prior knowledge for maintaining the consistency of dialogue systems. Most previous studies adopted the self persona in dialogue, i.e., the persona of the speaker whose response is to be selected from a set of candidates or directly generated, but few have noticed the role of the partner in dialogue. This paper makes an attempt to thoroughly explore the impact of utilizing personas that describe either self or partner speakers on the task of response selection in retrieval-based chatbots. Four persona fusion strategies are designed, which assume personas interact with contexts or responses in different ways. These strategies are implemented into three representative models for response selection, which are based on the Hierarchical Recurrent Encoder (HRE), Interactive Matching Network (IMN) and Bidirectional Encoder Representations from Transformers (BERT), respectively. Empirical studies on the Persona-Chat dataset show that the partner personas neglected in previous studies can improve the accuracy of response selection in the IMN- and BERT-based models. Besides, our BERT-based model implemented with the context-response-aware persona fusion strategy outperforms previous methods by margins larger than 2.7% on original personas and 4.6% on revised personas in terms of hits@1 (top-1 accuracy), achieving a new state-of-the-art performance on the Persona-Chat dataset.
Abstract: A one-hot encoding accompanied by a softmax loss has become the default configuration for dealing with the multiclass problem, and is also prevalent in deep learning (DL) based recommender systems (RS). The standard learning process of such methods is to fit the model outputs to a one-hot encoding of the ground truth, referred to as the hard target. However, it is known that these hard targets largely ignore the ambiguity of unobserved feedback in RS, and thus may lead to sub-optimal generalization performance. In this work, we propose SoftRec, a new RS optimization framework to enhance item recommendation. The core idea is to add additional supervisory signals - well-designed soft targets - for each instance so as to better guide the recommender learning. Meanwhile, we carefully investigate the impacts of specific soft target distributions by instantiating SoftRec with a series of strategies, including item-based, user-based, and model-based ones. To verify the effectiveness of SoftRec, we conduct extensive experiments on two public recommendation datasets using various deep recommendation architectures. The experimental results show that our methods achieve superior performance compared with the standard optimization approaches. Moreover, SoftRec also exhibits strong performance in cold-start scenarios where user-item interactions are sparser.
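An illustrative sketch (with assumed names and shapes) of training against soft targets: the hard one-hot label is blended with an auxiliary distribution over items, for instance one derived from item co-occurrence, and the model is fit with a KL objective. The specific item-, user-, and model-based strategies in the paper are not reproduced here.

```python
import torch
import torch.nn.functional as F

def soft_target_loss(logits, target_item, soft_dist, lam=0.2):
    """logits: (B, V) scores over items; target_item: (B,) ground-truth item ids;
    soft_dist: (B, V) auxiliary supervisory distribution; lam: blending weight."""
    hard = F.one_hot(target_item, num_classes=logits.size(-1)).float()
    target = (1.0 - lam) * hard + lam * soft_dist        # blended "soft" target
    log_probs = F.log_softmax(logits, dim=-1)
    return F.kl_div(log_probs, target, reduction="batchmean")
```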
Abstract: As users often express their preferences with binary behavior data (implicit feedback), such as clicking items or buying products, implicit feedback based Collaborative Filtering (CF) models predict the top ranked items a user might like by leveraging implicit user-item interaction data. For each user, the implicit feedback is divided into two sets: an observed item set with limited observed behaviors, and a large unobserved item set that is a mixture of negative item behaviors and unknown behaviors. Given any user preference prediction model, researchers have either designed ranking based optimization goals or relied on negative item mining techniques for better optimization. Despite the performance gain of these implicit feedback based models, the recommendation results are still far from satisfactory due to the sparsity of the observed item set for each user. To this end, in this paper, we explore the unique characteristics of implicit feedback and propose the Set2setRank framework for recommendation. The optimization criteria of Set2setRank are twofold: First, we design an item-to-item-set comparison that encourages each observed item from the sampled observed set to be ranked higher than any unobserved item from the sampled unobserved set. Second, we model a set-level comparison that encourages a margin between the distance summarized from the observed item set and the hardest unobserved item from the sampled negative set. Further, an adaptive sampling technique is designed to implement these two goals. We note that our proposed framework is model-agnostic, can be easily applied to most recommendation prediction approaches, and is time efficient in practice. Finally, extensive experiments on three real-world datasets demonstrate the superiority of our proposed approach.
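A rough sketch of the two ranking criteria just described, under assumed tensor shapes (per-user scores for a sampled observed set and a sampled unobserved set). The exact Set2setRank losses, set summaries, and adaptive sampling differ from this simplified version.

```python
import torch
import torch.nn.functional as F

def set2set_loss(obs_scores, unobs_scores, margin=0.5):
    """obs_scores: (B, P) scores of sampled observed items;
    unobs_scores: (B, N) scores of sampled unobserved items."""
    # (1) item-to-set: every observed item should be ranked above every unobserved item
    diff = obs_scores.unsqueeze(2) - unobs_scores.unsqueeze(1)    # (B, P, N) pairwise gaps
    item_to_set = F.softplus(-diff).mean()
    # (2) set-to-set: the observed-set summary should beat the hardest negative by a margin
    hardest_neg = unobs_scores.max(dim=1).values                  # (B,)
    set_summary = obs_scores.mean(dim=1)                          # (B,)
    set_to_set = F.relu(margin - (set_summary - hardest_neg)).mean()
    return item_to_set + set_to_set
```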
Abstract: With the booming of online social networks on the mobile internet, an emerging recommendation scenario has played a vital role in information acquisition for users, where users are no longer recommended a single item or item list, but a combination of heterogeneous and diverse objects (called a package, e.g., a package including news, a publisher, and friends viewing the news). Different from conventional recommendation where users are recommended the item itself, in package recommendation users show great interest in the explicitly displayed objects, which can have a significant influence on user behaviors. However, to the best of our knowledge, little effort has been made on package recommendation, and existing approaches can hardly model the complex interactions of diverse objects in a package. Thus, in this paper, we make a first study on package recommendation and propose an Intra- and inter-package attention network for Package Recommendation (IPRec). Specifically, for package modeling, an intra-package attention network is put forward to capture the object-level intention of the user interacting with the package, while an inter-package attention network acts as a package-level information encoder that captures collaborative features of neighboring packages. In addition, to capture users' preference representations, we present a user preference learner equipped with a fine-grained feature aggregation network and a coarse-grained package aggregation network. Extensive experiments on three real-world datasets demonstrate that IPRec significantly outperforms the state of the art. Moreover, the model analysis demonstrates the interpretability of our IPRec and the characteristics of user behaviors. Codes and datasets can be obtained at https://github.com/LeeChenChen/IPRec.
Abstract: Normalized discounted cumulative gain (NDCG) is one of the popular evaluation metrics for recommender systems and learning-to-rank problems. As it is non-differentiable, it cannot be optimized by gradient-based optimization procedures. In the last twenty years, a plethora of surrogate losses have been engineered that aim to make learning recommendation and ranking models that optimize NDCG possible. However, binary relevance implicit feedback settings still pose a significant challenge for such surrogate losses as they are usually designed and evaluated only for multi-level relevance feedback. In this paper, we address the limitations of directly optimizing the NDCG measure by proposing a guided learning approach (GuidedRec) that adopts recent advances in parameterized surrogate losses for NDCG. Starting from the observation that jointly learning a surrogate loss for NDCG and the recommendation model is very unstable, we design a stepwise approach that can be seamlessly applied to any recommender system model that uses a point-wise logistic loss function. The proposed approach guides the models towards optimizing the NDCG using an independent surrogate-loss model trained to approximate the true NDCG measure while maintaining the original logistic loss function as a stabilizer for the guiding procedure. In experiments on three recommendation datasets, we show that our guided surrogate learning approach yields models better optimized for NDCG than recent state-of-the-art approaches using engineered surrogate losses.
Abstract: Graph Convolutional Networks (GCNs) are powerful for collaborative filtering. The key component of GCNs is to explore neighborhood aggregation mechanisms to extract high-level representations of users and items. However, real-world user-item graphs are often incomplete and noisy. Aggregating misleading neighborhood information may lead to sub-optimal performance if GCNs are not regularized properly. Also, the real-world user-item graphs are often sparse and low rank. These two intrinsic graph properties are widely used in shallow matrix completion models, but far less studied in graph neural models. Here we propose Structured Graph Convolutional Networks (SGCNs) to enhance the performance of GCNs by exploiting graph structural properties of sparsity and low rank. To achieve sparsity, we attach each layer of a GCN with a trainable stochastic binary mask to prune noisy and insignificant edges, resulting in a clean and sparsified graph. To preserve its low-rank property, the nuclear norm regularization is applied. We jointly learn the parameters of stochastic binary masks and original GCNs by solving a stochastic binary optimization problem. An unbiased gradient estimator is further proposed to better backpropagate the gradients of binary variables. Experimental results demonstrate that SGCNs achieve better performance compared with the state-of-the-art GCNs.
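A minimal sketch of a trainable stochastic binary mask over graph edges with a straight-through gradient, the kind of mechanism described above for pruning noisy edges. The unbiased gradient estimator and nuclear-norm regularization from the paper are not reproduced; names here are illustrative.

```python
import torch
import torch.nn as nn

class EdgeMask(nn.Module):
    def __init__(self, num_edges: int):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_edges))  # one Bernoulli logit per edge

    def forward(self, edge_weights: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(self.logits)
        hard = torch.bernoulli(probs).detach()        # sampled 0/1 keep-or-prune decision
        mask = hard + probs - probs.detach()          # straight-through: hard forward, soft backward
        return edge_weights * mask                    # sparsified edge weights fed to the GCN
```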
Abstract: Graph structural information such as topologies or connectivities provides valuable guidance for graph convolutional networks (GCNs) to learn nodes' representations. Existing GCN models that capture nodes' structural information weight in- and out-neighbors equally or differentiate in- and out-neighbors globally without considering nodes' local topologies. We observe that in- and out-neighbors contribute differently for nodes with different local topologies. To explore the directional structural information for different nodes, we propose a GCN model with weighted structural features, named WGCN. WGCN first captures nodes' structural fingerprints via a direction and degree aware Random Walk with Restart algorithm, where the walk is guided by both edge direction and nodes' in- and out-degrees. Then, the interactions between nodes' structural fingerprints are used as the weighted node structural features. To further capture nodes' high-order dependencies and graph geometry, WGCN embeds graphs into a latent space to obtain nodes' latent neighbors and geometrical relationships. Based on nodes' geometrical relationships in the latent space, WGCN differentiates latent, in-, and out-neighbors with an attention-based geometrical aggregation. Experiments on transductive node classification tasks show that WGCN outperforms the baseline models consistently by up to 17.07% in terms of accuracy on five benchmark datasets.
Abstract: Deep learning techniques have ushered in significant progress in large-scale multi-modal retrieval. Nevertheless, these advanced techniques may be used nefariously to conduct searches that violate the privacy of individuals. In this paper, we propose a novel PrIvacy Protection method (PIP) against malicious multi-modal retrieval models, which proactively transfers original data into adversarial data with quasi-imperceptible perturbations before releasing them. Consequently, unauthorized malicious parties are not able to use deployed deep models to mine the desired sensitive information from the released data. In addition to privacy preservation, PIP simultaneously learns an effective multi-modal retrieval model to facilitate authorized uses, endowed with strong resilience to the perturbations. To the best of our knowledge, this is the first attempt to consider privacy issues in multi-modal retrieval and to encapsulate both privacy protection against unauthorized retrieval and robust multi-modal learning for authorized uses into a unified framework. This work is conducted in the challenging no-box and unsupervised settings, where neither target malicious models nor supervised information is known. The optimization objective of our versatile PIP is achieved through a two-player game between different components, with both the intra- and inter-modality graph alignments and the domain distribution alignment considered. Besides, a high-level similarity matrix is developed to obtain reliable guidance for learning. Empirically, we apply the proposed PIP to hashing-based multi-modal retrieval scenarios and prove its effectiveness on a range of benchmarks and tasks.
Abstract: Retrieving event instances from texts is pivotal to various natural language processing applications (e.g., automatic question answering and dialogue systems), and the first task to perform is event detection. There are two related sub-tasks therein, trigger identification and type classification, and the former is considered to play a dominant role. Nevertheless, it is notoriously challenging to predict event triggers correctly. To handle the task, existing work has made tremendous progress by incorporating manual features, data augmentation, neural networks, etc. Due to the scarcity of data and insufficient representation of trigger words, however, such methods still fail to precisely determine the spans of triggers (coined the trigger span detection problem). To address the challenge, we propose to learn discriminative neural representations (DNR) from texts. Specifically, our DNR model tackles the trigger span detection problem by exploiting two novel techniques: 1) a contrastive learning strategy, which enlarges the discrepancy between representations of words inside and outside triggers; and 2) a Mixspan strategy, which better trains the model to differentiate words near triggers' span boundaries. Extensive experiments on the ACE2005 and TAC2015 benchmarks demonstrate the superiority of our DNR model, leading to state-of-the-art performance.
Abstract: In any ranking system, the retrieval model outputs a single score for a document based on its belief on how relevant it is to a given search query. While retrieval models have continued to improve with the introduction of increasingly complex architectures, few works have investigated a retrieval model's belief in the score beyond the scope of a single value. We argue that capturing the model's uncertainty with respect to its own scoring of a document is a critical aspect of retrieval that allows for greater use of current models across new document distributions, collections, or even improving effectiveness for down-stream tasks. In this paper, we address this problem via an efficient Bayesian framework for retrieval models which captures the model's belief in the relevance score through a stochastic process while adding only negligible computational overhead. We evaluate this belief via a ranking based calibration metric showing that our approximate Bayesian framework significantly improves a retrieval model's ranking effectiveness through a risk aware reranking as well as its confidence calibration. Lastly, we demonstrate that this additional uncertainty information is actionable and reliable on down-stream tasks represented via cutoff prediction.
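A hedged sketch of one common way to obtain score uncertainty cheaply, Monte Carlo dropout, followed by a simple risk-aware adjustment (mean minus a multiple of the standard deviation). The paper's stochastic-process formulation and calibration metric are more involved than this; `model(query, docs)` returning per-document scores is an assumption.

```python
import torch

@torch.no_grad()
def risk_aware_scores(model, query, docs, n_samples: int = 20, risk: float = 1.0):
    model.train()  # keep dropout active at inference time (MC dropout)
    samples = torch.stack([model(query, docs) for _ in range(n_samples)])  # (S, D) sampled scores
    mean, std = samples.mean(dim=0), samples.std(dim=0)
    return mean - risk * std  # penalize documents the model is unsure about before reranking
```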
Abstract: Computing graph similarity is an important task in many graph-related applications such as retrieval in graph databases or graph clustering. While numerous measures have been proposed to capture the similarity between a pair of graphs, Graph Edit Distance (GED) and Maximum Common Subgraphs (MCS) are the two widely used measures in practice. GED and MCS are domain-agnostic measures of structural similarity between the graphs and define the similarity as a function of pairwise alignment of different entities (such as nodes, edges, and subgraphs) in the two graphs. The explicit explainability offered by the pairwise alignment provides transparency and justification of the similarity score, thus, GED and MCS have important practical applications. However, their exact computations are known to be NP-hard. While recently proposed neural-network based approximations have been shown to accurately compute these similarity scores, they have limited ability in providing comprehensive explanations compared to classical combinatorial algorithms, e.g., Beam search. This paper aims at efficiently approximating these domain-agnostic similarity measures through a neural network, and simultaneously learning the alignments (i.e., explanations) similar to those of classical intractable methods. Specifically, we formulate the similarity between a pair of graphs as the minimal "transformation" cost from one graph to another in the learnable node-embedding space. We show that, if node embedding is able to capture its neighborhood context closely, our proposed similarity function closely approximates both the alignment and the similarity score of classical methods. Furthermore, we also propose an efficient differentiable computation of our proposed objective for model training. Empirically, we demonstrate that the proposed method achieves up to 50%-100% reduction in the Mean Squared Error for the graph similarity approximation task and up to 20% improvement in the retrieval evaluation metrics for the graph retrieval task. The source code is available at https://github.com/khoadoan/GraphOTSim.
Abstract: Although conversational search has become a hot topic in both dialogue research and IR community, the real breakthrough has been limited by the scale and quality of datasets available. To address this fundamental obstacle, we introduce the Multimodal Multi-domain Conversational dataset (MMConv), a fully annotated collection of human-to-human role-playing dialogues spanning over multiple domains and tasks. The contribution is two-fold. First, beyond the task-oriented multimodal dialogues among user and agent pairs, dialogues are fully annotated with dialogue belief states and dialogue acts. More importantly, we create a relatively comprehensive environment for conducting multimodal conversational search with real user settings, structured venue database, annotated image repository as well as crowd-sourced knowledge database. A detailed description of the data collection procedure along with a summary of data structure and analysis is provided. Second, a set of benchmark results for dialogue state tracking, conversational recommendation, response generation as well as a unified model for multiple tasks are reported. We adopt the state-of-the-art methods for these tasks respectively to demonstrate the usability of the data, discuss limitations of current methods and set baselines for future studies.
Abstract: Given a collection of untrimmed and unsegmented videos, video corpus moment retrieval (VCMR) aims to retrieve a temporal moment (i.e., a fraction of a video) that semantically corresponds to a given text query. As video and text are from two distinct feature spaces, there are two general approaches to address VCMR: (i) to separately encode each modality's representation and then align the two modality representations for query processing, and (ii) to adopt fine-grained cross-modal interaction to learn multi-modal representations for query processing. While the second approach often leads to better retrieval accuracy, the first approach is far more efficient. In this paper, we propose a Retrieval and Localization Network with Contrastive Learning (ReLoCLNet) for VCMR. We adopt the first approach and introduce two contrastive learning objectives to refine the video encoder and text encoder to learn video and text representations separately but with better alignment for VCMR. The video contrastive learning (VideoCL) objective maximizes the mutual information between the query and the candidate video at the video level. The frame contrastive learning (FrameCL) objective aims to highlight, at the frame level, the moment region within a video that corresponds to the query. Experimental results show that, although ReLoCLNet encodes text and video separately for efficiency, its retrieval accuracy is comparable with baselines adopting cross-modal interaction learning.
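A small sketch of the video-level contrastive idea: a query embedding should score highest against its own video among all videos in the batch (a symmetric InfoNCE-style loss). The frame-level objective and the exact ReLoCLNet losses are omitted; names and the temperature are illustrative.

```python
import torch
import torch.nn.functional as F

def video_contrastive_loss(q: torch.Tensor, v: torch.Tensor, tau: float = 0.07):
    """q: (B, d) query embeddings; v: (B, d) embeddings of their paired videos."""
    q, v = F.normalize(q, dim=-1), F.normalize(v, dim=-1)
    logits = q @ v.t() / tau
    labels = torch.arange(q.size(0), device=q.device)
    # symmetric objective: retrieve the video given the query, and the query given the video
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels))
```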
Abstract: Learning user representations is a vital technique toward effective user modeling and personalized recommender systems. Existing approaches often derive an individual set of model parameters for each task by training on separate data. However, the representation of the same user potentially has some commonalities, such as preference and personality, even across different tasks. As such, these separately trained representations could be suboptimal in performance as well as inefficient in terms of parameter sharing. In this paper, we investigate how to continually learn user representations task by task, whereby new tasks are learned while using partial parameters from old ones. A new problem arises: when new tasks are trained, previously learned parameters are very likely to be modified, and as a result, an artificial neural network (ANN)-based model may permanently lose its capacity to serve well-trained previous tasks; this issue is termed catastrophic forgetting. To address this issue, we present Conure, the first continual, or lifelong, user representation learner --- i.e., one that learns new tasks over time without forgetting old ones. Specifically, we propose iteratively removing less important weights of old tasks in a deep user representation model, motivated by the fact that neural network models are usually over-parameterized. In this way, we can learn many tasks with a single model by reusing the important weights and modifying the less important weights to adapt to new tasks. We conduct extensive experiments on two real-world datasets with nine tasks and show that Conure largely exceeds the standard model that does not purposely preserve such old "knowledge", and performs competitively or sometimes better than models that are trained either individually for each task or simultaneously by merging all task data.
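A compact sketch of magnitude-based pruning, the kind of "remove less important weights" step described above: weights in the kept fraction would be frozen for old tasks while the rest are freed for new tasks. The per-task mask bookkeeping in Conure is richer than this; the function and its default ratio are assumptions.

```python
import torch

def prune_mask(weight: torch.Tensor, keep_ratio: float = 0.6) -> torch.Tensor:
    """Return a 0/1 mask keeping the largest-magnitude fraction of weights."""
    k = max(1, int(keep_ratio * weight.numel()))
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    return (weight.abs() >= threshold).float()  # 1 = important (frozen for old tasks), 0 = reusable
```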
Abstract: Understanding how knowledge is technically transferred across academic disciplines is very relevant for understanding and facilitating innovation. There are two challenges for this purpose, namely the semantic ambiguity and the asymmetric influence across disciplines. In this paper we investigate knowledge propagation and characterize semantic correlations for cross discipline paper recommendation. We adopt a generative model to represent a paper content as the probabilistic association with an existing hierarchically classified discipline to reduce the ambiguity of word semantics. The semantic correlation across disciplines is represented by an influence function, a correlation metric and a ranking mechanism. Then a user interest is represented as a probabilistic distribution over the target domain semantics and the correlated papers are recommended. Experimental results on real datasets show the effectiveness of our methods. We also discuss the intrinsic factors of results in an interpretable way. Compared with traditional word embedding based methods, our approach supports the evolution of domain semantics that accordingly lead to the update of semantic correlation. Another advantage of our approach is its flexibility and uniformity in supporting user interest specifications by either a list of papers or a query of key words, which is suited for practical scenarios.
Abstract: When a user starts exploring items from a new area of an e-commerce system, cross-domain recommendation techniques can help by transferring the abundant knowledge from the user's familiar domains to this new domain. However, this solution usually requires direct information sharing between service providers on the cloud, which may not always be available and brings privacy concerns. In this paper, we show that one can overcome these concerns through learning on edge devices such as smartphones and laptops. The cross-domain recommendation problem is formalized under a decentralized computing environment with multiple domain servers. We identify two key challenges for this setting: the unavailability of direct transfer and the heterogeneity of the domain-specific user representations. We then propose to learn and maintain a decentralized user encoding in each user's personal space. The optimization follows a variational inference framework that maximizes the mutual information between the user's encoding and the domain-specific user information from all her interacted domains. Empirical studies on real-world datasets exhibit the effectiveness of our proposed framework on recommendation tasks and its superiority over domain-pairwise transfer models. The resulting system offers reduced communication cost and an efficient inference mechanism that does not depend on the number of involved domains, and it allows flexible plug-in of domain-specific transfer models without significant interference with other domains.
Abstract: Representation learning on user-item graph for recommendation has evolved from using single ID or interaction history to exploiting higher-order neighbors. This leads to the success of graph convolution networks (GCNs) for recommendation such as PinSage and LightGCN. Despite effectiveness, we argue that they suffer from two limitations: (1) high-degree nodes exert larger impact on the representation learning, deteriorating the recommendations of low-degree (long-tail) items; and (2) representations are vulnerable to noisy interactions, as the neighborhood aggregation scheme further enlarges the impact of observed edges. In this work, we explore self-supervised learning on user-item graph, so as to improve the accuracy and robustness of GCNs for recommendation. The idea is to supplement the classical supervised task of recommendation with an auxiliary self-supervised task, which reinforces node representation learning via self-discrimination. Specifically, we generate multiple views of a node, maximizing the agreement between different views of the same node compared to that of other nodes. We devise three operators to generate the views --- node dropout, edge dropout, and random walk --- that change the graph structure in different manners. We term this new learning paradigm as Self-supervised Graph Learning (SGL), implementing it on the state-of-the-art model LightGCN. Through theoretical analyses, we find that SGL has the ability of automatically mining hard negatives. Empirical studies on three benchmark datasets demonstrate the effectiveness of SGL, which improves the recommendation accuracy, especially on long-tail items, and the robustness against interaction noises. Our implementations are available at https://github.com/wujcan/SGL.
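A sketch of one of the three view-generation operators, edge dropout, on a user-item graph stored in COO format; in this paradigm, two such views of the same node would form a positive pair for the auxiliary self-discrimination task. The function name, rescaling choice, and shapes are illustrative, not the released SGL code.

```python
import torch

def edge_dropout(edge_index: torch.Tensor, edge_weight: torch.Tensor, keep_prob: float = 0.8):
    """edge_index: (2, E) COO indices of the user-item graph; edge_weight: (E,) edge weights."""
    keep = torch.rand(edge_weight.size(0), device=edge_weight.device) < keep_prob
    # rescale kept edges so the expected aggregation magnitude is roughly preserved
    return edge_index[:, keep], edge_weight[keep] / keep_prob
```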
Abstract: Search result diversification aims to offer diverse documents that cover as many intents as possible. Most existing implicit diversification approaches model diversity through the similarity of document representation, which is indirect and unnatural. To handle the diversity more precisely, we measure the similarity of documents by their similarity of the intent coverage. Specifically, we build a classifier to judge whether two different documents contain the same intent based on the document's content. Then we construct an intent graph to present the complicated relationship of documents and the query. On the intent graph, documents are connected if they are similar, while the query and the document are gradually connected based on the document selection result. Then we employ graph convolutional networks (GCNs) to update the representation of the query and each document by aggregating its neighbors. By this means, we can obtain the context-aware query representation and the intent-aware document representations through the dynamic intent graph during the document selection process. Furthermore, these representations and intent graph features are fused into diversity features. Combined with the traditional relevance features, we obtain the final ranking score that balances the relevance and the diversity. Experimental results show that this implicit diversification model significantly outperforms all existing implicit diversification methods, and it can even beat the state-of-the-art explicit models.
Abstract: Recommender systems are playing a vital role in online platforms due to their ability to incorporate users' personal tastes. Beyond accuracy, diversity has been recognized as a key factor to broaden users' horizons as well as to promote enterprises' sales. However, the trade-off between accuracy and diversity remains a big challenge. More importantly, none of the existing methods has explored the domain and user biases toward diversity. In this paper, we focus on enhancing both domain-level and user-level adaptivity in diversified recommendation. Specifically, we first encode domain-level diversity into a generalized bi-lateral branch network with an adaptive balancing strategy. We further capture user-level diversity by developing a two-way adaptive metric learning backbone network inside each branch. We conduct extensive experiments on three real-world datasets. Results demonstrate that our proposed approach consistently outperforms the state-of-the-art baselines.
Abstract: Modern recommender systems often embed users and items into low-dimensional latent representations based on their observed interactions. In practical recommendation scenarios, users often exhibit various intents that drive them to interact with items through multiple behavior types (e.g., click, tag-as-favorite, purchase). However, the diversity of user behaviors is ignored in most existing approaches, which makes it difficult for them to capture heterogeneous relational structures across different types of interactive behaviors. Exploring multi-typed behavior patterns is of great importance to recommendation systems, yet is very challenging because of two aspects: i) the complex dependencies across different types of user-item interactions; ii) the diversity of such multi-behavior patterns may vary across users due to their personalized preferences. To tackle the above challenges, we propose a Multi-Behavior recommendation framework with a Graph Meta Network (MB-GMN) to incorporate multi-behavior pattern modeling into a meta-learning paradigm. Our developed MB-GMN empowers the user-item interaction learning with the capability of uncovering type-dependent behavior representations, which automatically distills the behavior heterogeneity and interaction diversity for recommendations. Extensive experiments on three real-world datasets show the effectiveness of MB-GMN by significantly boosting the recommendation performance as compared to various state-of-the-art baselines. The source code is available at https://github.com/akaxlh/MB-GMN.
Abstract: This paper investigates recommendation fairness among new items. While previous efforts have studied fairness in recommender systems and shown success in improving fairness, they mainly focus on scenarios where unfairness arises from biased prior user-feedback history (like clicks or views). Yet, it is unknown whether new items without any feedback history can be recommended fairly, and if unfairness does exist, how we can provide fair recommendations among these new items in such a cold-start scenario. In detail, we first formalize fairness among new items with the well-known concepts of equal opportunity and Rawlsian Max-Min fairness. We empirically show the prevalence of unfairness in cold-start recommender systems. Then we propose a novel learnable post-processing framework as a model blueprint for enhancing fairness, with which we propose two concrete models: a joint-learning generative model, and a score scaling model. Extensive experiments over four public datasets show the effectiveness of the proposed models for enhancing fairness while also preserving recommendation utility.
Abstract: Entity alignment (EA) is a prerequisite for enlarging the coverage of a unified knowledge graph. Previous EA approaches either restrain the performance due to inadequate information utilization or need labor-intensive pre-processing to get external or reliable information to perform the EA task. This paper proposes EASY, an effective end-to-end EA framework, which is able to (i) remove the labor-intensive pre-processing by fully discovering the name information provided by the entities themselves; and (ii) jointly fuse the features captured by the names of entities and the structural information of the graph to improve the EA results. Specifically, EASY first introduces NEAP, a highly effective name-based entity alignment procedure, to obtain an initial alignment that has reasonable accuracy and meanwhile does not require much memory consumption or any complex training process. Then, EASY invokes SRS, a novel structure-based refinement strategy, to iteratively correct the misaligned entities generated by NEAP to further enhance the entity alignment. Extensive experiments demonstrate the superiority of our proposed EASY with significant improvement against 13 existing state-of-the-art competitors.
Abstract: Prior work suggests that users conceptualize the organization of personal collections of digital files through the lens of similarity. However, it is unclear to what degree similar files are actually located near one another (e.g., in the same directory) in actual file collections, or whether leveraging file similarity can improve information retrieval and organization for disorganized collections of files. To this end, we conducted an online study combining automated analysis of 50 Google Drive and Dropbox users' cloud accounts with a survey asking about pairs of files from those accounts. We found that many files located in different parts of file hierarchies were similar in how they were perceived by participants, as well as in their algorithmically extractable features. Participants often wished to co-manage similar files (e.g., deleting one file implied deleting the other file) even if they were far apart in the file hierarchy. To further understand this relationship, we built regression models, finding several algorithmically extractable file features to be predictive of human perceptions of file similarity and desired file co-management. Our findings pave the way for leveraging file similarity to automatically recommend access, move, or delete operations based on users' prior interactions with similar files.
Abstract: Citations of scientific papers and patents reveal the knowledge flow and usually serve as the metric for evaluating their novelty and impacts in the field. Citation forecasting thus has various applications in the real world. Existing works on citation forecasting typically exploit the sequential properties of citation events, without exploring the citation network. In this paper, we propose to explore both the citation network and the related citation event sequences, which provide valuable information for future citation forecasting. We propose a novel Citation Network and Event Sequence (CINES) model to encode signals in the citation network and related citation event sequences into various types of embeddings for decoding to the arrivals of future citations. Moreover, we propose a temporal network attention and three alternative designs of bidirectional feature propagation to aggregate the retrospective and prospective aspects of publications in the citation network, coupled with the citation event sequence embeddings learned by a two-level attention mechanism for the citation forecasting. We evaluate our models and baselines on both a U.S. patent dataset and a DBLP dataset. Experimental results show that our models outperform the state-of-the-art methods, i.e., RMTPP, CYAN-RNN, Intensity-RNN, and PC-RNN, reducing the forecasting error by 37.76% - 75.32%.
Abstract: Conversational recommender systems (CRSs) have revolutionized the conventional recommendation paradigm by embracing dialogue agents to dynamically capture fine-grained user preferences. In a typical conversational recommendation scenario, a CRS first generates questions to let the user clarify her/his demands and then makes suitable recommendations. Hence, the ability to generate suitable clarifying questions is key to timely tracing users' dynamic preferences and achieving successful recommendations. However, existing CRSs fall short in asking high-quality questions because: (1) system-generated responses heavily depend on the performance of the dialogue policy agent, which has to be trained with a huge conversation corpus to cover all circumstances; and (2) current CRSs cannot fully utilize the learned latent user profiles for generating appropriate and personalized responses. To mitigate these issues, we propose the Knowledge-Based Question Generation System (KBQG), a novel framework for conversational recommendation. Distinct from previous conversational recommender systems, KBQG models a user's preference at a finer granularity by identifying the most relevant relations from a structured knowledge graph (KG). Conditioned on the varied importance of different relations, the generated clarifying questions are better at impelling users to provide more details on their preferences. Finally, accurate recommendations can be generated in fewer conversational turns. Furthermore, the proposed KBQG outperforms all baselines in our experiments on two real-world datasets.
Abstract: Large amounts of multi-modal information online make it difficult for users to obtain proper insights. In this paper, we introduce and formally define the concepts of supplementary and complementary multi-modal summaries in the context of the overlap of information covered by different modalities in the summary output. A new problem statement of combined complementary and supplementary multi-modal summarization (CCS-MMS) is formulated. The problem is then solved in several steps by utilizing the concepts of multi-objective optimization by devising a novel unsupervised framework. An existing multi-modal summarization data set is further extended by adding outputs in different modalities to establish the efficacy of the proposed technique. The results obtained by the proposed approach are compared with several strong baselines; ablation experiments are also conducted to empirically justify the proposed techniques. Furthermore, the proposed model is evaluated separately for different modalities quantitatively and qualitatively, demonstrating the superiority of our approach.
Abstract: Dense retrieval (DR) has the potential to resolve the query understanding challenge in conversational search by matching in the learned embedding space. However, this adaptation is challenging due to DR models' extra needs for supervision signals and the long-tail nature of conversational search. In this paper, we present a Conversational Dense Retrieval system, ConvDR, that learns contextualized embeddings for multi-turn conversational queries and retrieves documents solely using embedding dot products. In addition, we grant ConvDR few-shot ability using a teacher-student framework, where we employ an ad hoc dense retriever as the teacher, inherit its document encodings, and learn a student query encoder to mimic the teacher embeddings on oracle reformulated queries. Our experiments on TREC CAsT and OR-QuAC demonstrate ConvDR's effectiveness in both few-shot and fully-supervised settings. It outperforms previous systems that operate in the sparse word space, matches the retrieval accuracy of oracle query reformulations, and is also more efficient thanks to its simplicity. Our analyses reveal that the advantages of ConvDR come from its ability to capture informative context while ignoring the unrelated context in previous conversation rounds. This makes ConvDR more effective as conversations evolve while previous systems may get confused by the increased noise from previous turns. Our code is publicly available at https://github.com/thunlp/ConvDR.
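
For illustration, a minimal sketch of the teacher-student mimicking described above, assuming a frozen ad hoc dense "teacher" encoder and a trainable conversational "student" query encoder; the encoders below are hypothetical stand-ins, and ConvDR's full training also involves ranking supervision not shown here::

    import torch
    import torch.nn as nn

    dim = 32
    teacher = nn.Linear(dim, dim)                  # stand-in for the frozen ad hoc dense retriever
    student = nn.Linear(dim, dim)                  # stand-in for the conversational query encoder
    for p in teacher.parameters():
        p.requires_grad_(False)

    conv_query = torch.randn(8, dim)               # toy features of multi-turn conversational queries
    oracle_query = torch.randn(8, dim)             # toy features of oracle reformulated queries

    # The student mimics the teacher's embedding of the oracle reformulation.
    loss = nn.functional.mse_loss(student(conv_query), teacher(oracle_query))
    loss.backward()                                # only the student receives gradients
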
Abstract: We study the task of conversational fashion image retrieval via multiturn natural language feedback. Most previous studies are based on single-turn settings. Existing models for multiturn conversational fashion image retrieval have limitations, such as relying on traditional models, which leads to ineffective performance. We propose a novel framework that can effectively handle conversational fashion image retrieval with multiturn natural language feedback texts. One characteristic of the framework is that it searches for candidate images by exploiting the encoded reference image and feedback text information together with the conversation history. Furthermore, the image fashion attribute information is leveraged via a mutual attention strategy. Since there is no existing fashion dataset suitable for the multiturn setting of our task, we derive a large-scale multiturn fashion dataset via additional manual annotation efforts on an existing single-turn dataset. The experiments show that our proposed model significantly outperforms existing state-of-the-art methods.
Abstract: User and item attributes are essential side-information; their interactions (i.e., their co-occurrence in the sample data) can significantly enhance prediction accuracy in various recommender systems. We identify two different types of attribute interactions, inner interactions and cross interactions: inner interactions are those between only user attributes or those between only item attributes; cross interactions are those between user attributes and item attributes. Existing models do not distinguish these two types of attribute interactions, which may not be the most effective way to exploit the information carried by the interactions. To address this drawback, we propose a neural Graph Matching based Collaborative Filtering model (GMCF), which effectively captures the two types of attribute interactions through modeling and aggregating attribute interactions in a graph matching structure for recommendation. In our model, the two essential recommendation procedures, characteristic learning and preference matching, are explicitly conducted through graph learning (based on inner interactions) and node matching (based on cross interactions), respectively. Experimental results show that our model outperforms state-of-the-art models. Further studies verify the effectiveness of GMCF in improving the accuracy of recommendation.
Abstract: Next basket recommendation aims to infer a set of items that a user will purchase at the next visit by considering a sequence of baskets he/she has purchased previously. This task has drawn increasing attention from both the academic and industrial communities. The existing solutions mainly focus on sequential modeling over users' historical interactions. However, due to the diversity and randomness of users' behaviors, not all these baskets are relevant to help identify the user's next move. It is necessary to denoise the baskets and extract credibly relevant items to enhance recommendation performance. Unfortunately, this dimension is usually overlooked in the current literature. To this end, in this paper, we propose a Contrastive Learning Model (named CLEA) to automatically extract items relevant to the target item for next basket recommendation. Specifically, empowered by Gumbel Softmax, we devise a denoising generator to adaptively identify whether each item in a historical basket is relevant to the target item or not. With this process, we can obtain a positive sub-basket and a negative sub-basket for each basket of each user. Then, we derive the representation of each sub-basket based on its constituent items through a GRU-based context encoder, which expresses either relevant preference or irrelevant noise regarding the target item. After that, a novel two-stage anchor-guided contrastive learning process is designed to simultaneously guide this relevance learning without requiring any item-level relevance supervision. To the best of our knowledge, this is the first work to perform item-level denoising for a basket in an end-to-end fashion for next basket recommendation. Extensive experiments are conducted over four real-world datasets with diverse characteristics. The results demonstrate that our proposed CLEA achieves significantly better recommendation performance than the existing state-of-the-art alternatives. Moreover, further analysis shows that CLEA can successfully discover the items that are truly relevant to the recommendation decision.
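
For illustration, a minimal sketch (hypothetical shapes and module names) of the Gumbel-Softmax gating idea described above: a small generator scores each basket item against the target item, and a differentiable hard sample splits the basket into positive and negative sub-baskets::

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    dim, basket_size = 16, 5
    item_emb = torch.randn(basket_size, dim)       # items of one historical basket
    target_emb = torch.randn(dim)                  # target item embedding
    generator = nn.Linear(2 * dim, 2)              # scores each item: [irrelevant, relevant]

    pair = torch.cat([item_emb, target_emb.expand(basket_size, -1)], dim=-1)
    gate = F.gumbel_softmax(generator(pair), tau=0.5, hard=True)  # one-hot yet differentiable
    pos_mask = gate[:, 1:2]                        # 1 where the item is kept as relevant
    pos_subbasket = item_emb * pos_mask            # relevant sub-basket (masked items)
    neg_subbasket = item_emb * (1 - pos_mask)      # noise sub-basket
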
Abstract: Session-based recommendation (SBR) is widely used in e-commerce to predict an anonymous user's next click action from a short sequence. Many previous studies have shown the potential advantages of applying Graph Neural Networks (GNNs) to SBR tasks. However, existing SBR models that use GNNs to solve the user preference problem are trained on one single dataset to obtain one recommendation model, and a single dataset suffers from problems including an excessively sparse data source and long-distance relationships between items. Therefore, introducing dual transfer, which can enrich the data source, to SBR is necessary. To this end, a new method is proposed in this paper, called dual attention transfer based on multi-dimensional integration (DAT-MDI): (i) DAT uses a potential mapping method based on a slot attention mechanism to extract the user's representation information in different sessions across multiple domains. (ii) MDI combines a graph neural network over the graphs (session graph and global graph) with a gated recurrent unit (GRU) over the sequence to learn the item representation in each session. The multi-level session representations are then combined by a soft-attention mechanism. We conduct a variety of experiments on four benchmark datasets, which show the superiority of the DAT-MDI model over the state-of-the-art methods.
Abstract: There has been significant progress in the utilization of heterogeneous knowledge graphs (KG) as auxiliary information in recommendation systems. Reasoning over KG paths sheds light on the user's decision-making process. Previous methods focus on formulating this process as a multi-hop reasoning problem. However, without some form of guidance in the reasoning process, such a huge search space results in poor accuracy and little explanation diversity. In this paper, we propose UCPR, a user-centric path reasoning network that constantly guides the search from the aspect of user demand and enables explainable recommendations. In this network, a multi-view structure leverages not only local sequence reasoning information but also a panoramic view of the user's demand portfolio while inferring subsequent user decision-making steps. Experiments on five real-world benchmarks show UCPR is significantly more accurate than state-of-the-art methods. Besides, we show that the proposed model successfully identifies users' concerns and increases reasoning diversity to enhance explainability.
Abstract: We address how to robustly interpret natural language refinements (or critiques) in recommender systems. In particular, in human-human recommendation settings people frequently use soft attributes to express preferences about items, including concepts like the originality of a movie plot, the noisiness of a venue, or the complexity of a recipe. While binary tagging is extensively studied in the context of recommender systems, soft attributes often involve subjective and contextual aspects, which cannot be captured reliably in this way, nor be represented as objective binary truth in a knowledge base. This also adds important considerations when measuring soft attribute ranking. We propose a more natural representation as personalized relative statements, rather than as absolute item properties. We present novel data collection techniques and evaluation approaches, and a new public dataset. We also propose a set of scoring approaches, from unsupervised to weakly supervised to fully supervised, as a step towards interpreting and acting upon soft attribute based critiques.
Abstract: Online learning to rank (OLTR) uses interaction data, such as clicks, to dynamically update rankers. OLTR has been thought to capture user intent change over time - a task that is impossible for rankers trained on static datasets such as in offline and counterfactual learning to rank. However, this feature has never been demonstrated and empirically studied, as previous work only considered simulated online data with a single user intent or real online data with no explicit notion of intents and how they change over interactions. In this paper, we address this gap by studying the capability of OLTR algorithms to adapt to user intent change. Our empirical experiments show that the adaptation to intent change does vary across OLTR methods, and is also dependent on the amount of noise in the implicit feedback signal. This is an important result, as it highlights that intent change adaptation should be studied alongside online and offline performance. Investigating how OLTR algorithms adapt to intent change is challenging as current LTR datasets do not explicitly contain the required intent data. Along with the main findings reported in this paper related to intent change, we also contribute a methodology to investigate this aspect of OLTR methods. Specifically, we create a collection for OLTR with explicit intent change by adapting an existing TREC collection to this task. We further introduce methods to model and simulate click behaviour related to intent change, and we propose novel evaluation metrics tailored to study different aspects of how OLTR methods adapt to intent change.
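
For illustration, a minimal sketch (not the paper's exact simulator) of position-biased click simulation under an intent change, where the relevance labels switch to a second intent after a chosen step and a noise parameter controls clicks on non-relevant results::

    import numpy as np

    rng = np.random.default_rng(0)

    def simulate_clicks(ranking, rel_by_intent, intent, noise=0.1):
        """ranking: doc ids by rank; rel_by_intent[intent][doc] is 0/1 relevance."""
        clicks = []
        for rank, doc in enumerate(ranking):
            examine = 1.0 / (rank + 1)                       # simple position bias
            p_click = examine * (0.9 if rel_by_intent[intent][doc] else noise)
            clicks.append(rng.random() < p_click)
        return clicks

    # Relevance under two intents; the active intent switches halfway through.
    rel_by_intent = {0: {"d1": 1, "d2": 0, "d3": 0}, 1: {"d1": 0, "d2": 1, "d3": 1}}
    for step in range(100):
        intent = 0 if step < 50 else 1
        simulate_clicks(["d1", "d2", "d3"], rel_by_intent, intent)
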
Abstract: Learning from implicit feedback is challenging because of the difficult nature of the one-class problem: we can observe only positive examples. Most conventional methods use a pairwise ranking approach and negative samplers to cope with the one-class problem. However, such methods have two main drawbacks, particularly in large-scale applications: (1) the pairwise approach is severely inefficient due to its quadratic computational cost; and (2) even recent model-based samplers (e.g., IRGAN) cannot achieve practical efficiency due to the training of an extra model. In this paper, we propose a learning-to-rank approach that achieves convergence speed comparable to the pointwise counterpart while performing similarly to the pairwise counterpart in terms of ranking effectiveness. Our approach estimates the probability densities of positive items for each user within a rich class of distributions, viz. the exponential family. In our formulation, we derive a loss function and the appropriate negative sampling distribution based on maximum likelihood estimation. We also develop a practical technique for risk approximation and a regularisation scheme. We then show that our single-model approach is equivalent to an IRGAN variant under a certain condition. Through experiments on real-world datasets, our approach outperforms the pointwise and pairwise counterparts in terms of effectiveness and efficiency.
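
For illustration, one familiar exponential-family model over items is the softmax of user-item scores; the sketch below shows a sampled negative log-likelihood for that model with a uniform negative sampler. It conveys the flavour of MLE-based training with negative sampling, not the paper's exact loss or sampling distribution::

    import torch
    import torch.nn.functional as F

    n_items, dim = 1000, 16
    user_vec = torch.randn(dim, requires_grad=True)
    item_table = torch.randn(n_items, dim, requires_grad=True)

    pos_item = torch.tensor([3])                          # observed positive item
    neg_items = torch.randint(0, n_items, (20,))          # sampled negatives (uniform here)

    logits = torch.cat([item_table[pos_item] @ user_vec,     # positive score, shape (1,)
                        item_table[neg_items] @ user_vec])   # negative scores, shape (20,)
    loss = F.cross_entropy(logits.unsqueeze(0), torch.tensor([0]))  # positive sits at index 0
    loss.backward()
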
Abstract: Direct optimization of IR metrics has often been adopted as an approach to devise and develop ranking-based recommender systems. Most methods following this approach (e.g., TFMAP, CLiMF, Top-N-Rank) aim at optimizing the same metric being used for evaluation, under the assumption that this will lead to the best performance. A number of studies of this practice bring this assumption, however, into question. In this paper, we dig deeper into this issue in order to learn more about the effects of the choice of the metric to optimize on the performance of a ranking-based recommender system. We present an extensive experimental study conducted on different datasets in both pairwise and listwise learning-to-rank (LTR) scenarios, to compare the relative merit of four popular IR metrics, namely RR, AP, nDCG and RBP, when used for optimization and assessment of recommender systems in various combinations. For the first three, we follow the practice of loss function formulation available in the literature. For the fourth one, we propose novel loss functions inspired by RBP for both the pairwise and listwise scenarios. Our results confirm that the best performance is indeed not necessarily achieved when optimizing the same metric being used for evaluation. In fact, we find that RBP-inspired losses perform at least as well as other metrics in a consistent way, and offer clear benefits in several cases. Interestingly, RBP-inspired losses, while improving the recommendation performance for all users, may lead to an individual performance gain that is correlated with the activity level of a user in interacting with items: the more active the users, the more they benefit. Overall, our results challenge the assumption behind the current research practice of optimizing and evaluating the same metric, and point to RBP-based optimization instead as a promising alternative when learning to rank in the recommendation context.
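
For reference, Rank-Biased Precision (RBP) discounts relevance geometrically with a persistence parameter p; a minimal sketch of the metric is below (the paper's RBP-inspired losses build on this discount, but their exact form is not reproduced here)::

    import numpy as np

    def rbp(rel, p=0.8):
        """Rank-Biased Precision: rel holds relevance values (in [0, 1]) ordered by rank."""
        rel = np.asarray(rel, dtype=float)
        ranks = np.arange(len(rel))                   # 0-based rank
        return (1.0 - p) * np.sum(rel * p ** ranks)   # geometric discount per rank

    print(rbp([1, 0, 1, 0, 0], p=0.8))                # RBP of a toy ranked list
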
Abstract: Recent works show that Transformer-based learning-to-rank (LTR) approaches can outperform previous well-established ranking methods, such as gradient-boosted decision trees (GBDT), on document and passage re-ranking problems. A common assumption in these works is that the query and the result documents are comprised of purely textual information without explicit structure. In map search, the relevance of results is determined based on rich heterogeneous features - textual features derived from the query and the results, geospatial features such as proximity of a result to the user, structured features reflecting the address format of the result, and the perceived structure of the query. In this work, we propose a novel deep neural network LTR architecture, capable of seamlessly handling heterogeneous inputs, similar to GBDT-based methods. At the same time, unlike GBDT, the architecture does not require human input via (numerous) carefully-crafted features. Instead, features are inferred through a self-attention mechanism. Our model implements two lightweight attention layers optimized for ranking: the first layer computes query-result similarities, the second implements listwise ranking inference. We perform evaluation on several single language and one multilingual dataset. Our model outperforms by a wide margin other Transformer-based ranking architectures and has equal or better performance than GBDT models. Equally important, runtime inference is orders of magnitude faster than other Transformer architectures, significantly reducing hardware serving costs. The model is a low-cost alternative suitable to power ranking in industrial map search engines across a variety of languages and markets.
Abstract: In Mathematical Information Retrieval (MIR), formulae can be used in a query to match other similar formulae in documents. However, due to the structural complexity of formulae, specialized processing is needed for formula matching. Formulae may be represented by their appearance in Symbol Layout Trees (SLTs) or by their syntax in Operator Trees (OPTs). Previous approaches for formula retrieval used one or both of these representations and used unification to improve search results for inexact matches (e.g., allowing different variable names to match). On these representations, models for matching full expressions (trees), subexpressions, and paths have been used. Recently, embedding models have been used to represent formulae as vectors. In this paper, the effectiveness of retrieval models and formula representations is studied to identify their relative strengths and weaknesses. Then, a learning-to-rank model is proposed, using SVM-rank over similarity scores from different formula retrieval models as features. Results on the ARQMath formula retrieval task show that the proposed learning-to-rank model is effective, producing new state-of-the-art results.
Abstract: Legal case retrieval is a specialized IR task aiming to retrieve supporting cases given a query case. While recent research efforts are committed to improving the performance of automatic retrieval models, little attention has been paid to the practical search interactions between users and systems in this task. Therefore, we focus on investigating user behavior in the scenario of legal case retrieval. Specifically, we conducted a laboratory user study that involved 45 participants majoring in law to collect users' rich interactions and relevance assessments. With the collected data, we first analyzed the characteristics of the search process in legal case retrieval practice. We observed significant differences between legal case retrieval and general web search in various search behaviors. These differences highlight the necessity of investigating user behavior in legal case retrieval in depth and of re-thinking the application of related mechanisms developed based on user models in Web search. Then we investigated factors that influence search behavior from different perspectives, including task difficulty and domain expertise. Finally, we shed light on implicit feedback in legal case retrieval and designed a predictive model for relevance based on user behavior. Our work provides a better understanding of user interactions in the legal case retrieval process, which can benefit the design of the corresponding retrieval systems to support legal practitioners.
Abstract: Legal Judgment Prediction is a fundamental task in legal intelligence for the civil law system, which aims to automatically predict the judgment results of multiple subtasks, such as charge, law article, and term of penalty prediction. Existing studies mainly focus on the impact of the entire fact description on all subtasks. They ignore the practical judicial scenario, where judges adopt circumstances of crime (i.e., various parts of the fact) to decide judgment results. To this end, in this paper, we propose a circumstance-aware legal judgment prediction framework (i.e., NeurJudge) by exploring circumstances of crime. Specifically, NeurJudge utilizes the results of intermediate subtasks to separate the fact description into different circumstances and exploits them to make the predictions of other subtasks. In addition, considering the prevalence of confusing verdicts (i.e., charges and law articles), we further extend NeurJudge to a more comprehensive framework, denoted NeurJudge+. In particular, NeurJudge+ utilizes a label embedding method to incorporate the semantics of labels (i.e., charges and law articles) into facts to generate more expressive fact representations for the confusing-verdict problem. Extensive experimental results on two real-world datasets clearly validate the effectiveness of our proposed frameworks.
Abstract: Given a legal case and all law articles, Legal Judgment Prediction (LJP) is to predict the case's violated articles, charges and term of penalty. Naturally, these labels are entangled among different tasks and within a task. For example, each charge is only logically or semantically related to some fixed articles. Ignoring these constraints, LJP methods would predict unreliable results. To solve this problem, we first formalize LJP as a node classification problem over a global consistency graph derived from the training set. In terms of the node encoder, we utilize a masked transformer network to obtain case-aware node representations that are consistent among tasks and discriminative within a task. In terms of the node classifier, each node's label distribution depends on its neighbors' in this graph to achieve local consistency by relational learning. Both the node encoder and classifier are optimized by variational EM. Finally, we propose a novel measure to evaluate the self-consistency of classification results. Experimental results on two benchmark datasets demonstrate that the F1 improvement of our method is about 4.8% compared with SOTA methods.
Abstract: Legal judgment prediction (LJP) is an essential task for legal AI. Prior methods studied this topic in a pseudo setting by employing the judge-summarized case narrative as the input to predict the judgment; neglecting critical case life-cycle information available in a real court setting can threaten the quality of the case logic representation and the correctness of the prediction. In this paper, we introduce a novel challenging dataset from real courtrooms to predict the legal judgment in a reasonably encyclopedic manner by leveraging the genuine input of the case - the plaintiff's claims and the court debate data - from which the case's facts are automatically recognized by comprehensively understanding the multi-role dialogues of the court debate, and the model then learns to discriminate the claims so as to reach the final judgment through multi-task learning. An extensive set of experiments with a large civil trial dataset shows that the proposed model can more accurately characterize the interactions among claims, facts and debate for legal judgment prediction, achieving significant improvements over strong state-of-the-art baselines. Moreover, a user study conducted with real judges and law school students shows that the neural predictions are also interpretable and easily observed, thus enhancing trial efficiency and judgment quality.
Abstract: Contract element extraction (CEE) is the novel task of automatically identifying and extracting legally relevant elements such as contract dates, payments, and legislation references from contracts. Automatic methods for this task view it as a sequence labeling problem and dramatically reduce human labor. However, as contract genres and element types may vary widely, a significant challenge for this sequence labeling task is how to transfer knowledge from one domain to another, i.e., cross-domain CEE. Cross-domain CEE differs from cross-domain named entity recognition (NER) in two important ways. First, contract elements are far more fine-grained than named entities, which hinders the transfer of extractors. Second, the extraction zones for cross-domain CEE are much larger than for cross-domain NER. As a result, the contexts of elements from different domains can be more diverse. We propose a framework, the Bi-directional Feedback cLause-Element relaTion network (Bi-FLEET), for the cross-domain CEE task that addresses the above challenges. Bi-FLEET has three main components: (1) a context encoder, (2) a clause-element relation encoder, and (3) an inference layer. To incorporate invariant knowledge about element and clause types, a clause-element graph is constructed across domains and a hierarchical graph neural network is adopted in the clause-element relation encoder. To reduce the influence of context variations, a multi-task framework with a bi-directional feedback scheme is designed in the inference layer, conducting both clause classification and element extraction. The experimental results over both cross-domain NER and CEE tasks show that Bi-FLEET significantly outperforms state-of-the-art baselines.
Abstract: At present, most research on the fairness of recommender systems is conducted either from the perspective of customers or from the perspective of product (or service) providers. However, such a practice ignores the fact that when fairness is guaranteed to one side, the fairness and rights of the other side are likely to be reduced. In this paper, we consider recommendation scenarios from the perspective of both sides (customers and providers). From the perspective of providers, we consider the fairness of the providers' exposure in the recommender system. For customers, we consider the fairness of the reduced quality of recommendation results due to the introduction of fairness measures. We theoretically analyze the relationship between recommendation quality, customer fairness, and provider fairness, and design a two-sided fairness-aware recommendation model (TFROM) for both customers and providers. Specifically, we design two versions of TFROM for offline and online recommendation. The effectiveness of the model is verified on three real-world datasets. The experimental results show that TFROM provides better two-sided fairness while still maintaining a higher level of personalization than the baseline algorithms.
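
For illustration, a minimal sketch (with a hypothetical logarithmic position discount) of the provider-side quantity such a two-sided model must track: the exposure each provider receives from a ranked recommendation list::

    import math
    from collections import defaultdict

    def provider_exposure(ranking, item_provider):
        """ranking: item ids by position; item_provider maps item id -> provider id."""
        exposure = defaultdict(float)
        for pos, item in enumerate(ranking, start=1):
            exposure[item_provider[item]] += 1.0 / math.log2(pos + 1)  # position discount
        return dict(exposure)

    item_provider = {"i1": "p1", "i2": "p1", "i3": "p2", "i4": "p3"}
    print(provider_exposure(["i1", "i3", "i2", "i4"], item_provider))
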
Abstract: Recent work has proposed stochastic Plackett-Luce (PL) ranking models as a robust choice for optimizing relevance and fairness metrics. Unlike their deterministic counterparts that require heuristic optimization algorithms, PL models are fully differentiable. Theoretically, they can be used to optimize ranking metrics via stochastic gradient descent. However, in practice, the computation of the gradient is infeasible because it requires one to iterate over all possible permutations of items. Consequently, actual applications rely on approximating the gradient via sampling techniques. In this paper, we introduce a novel algorithm: PL-Rank, that estimates the gradient of a PL ranking model w.r.t. both relevance and fairness metrics. Unlike existing approaches that are based on policy gradients, PL-Rank makes use of the specific structure of PL models and ranking metrics. Our experimental analysis shows that PL-Rank has a greater sample-efficiency and is computationally less costly than existing policy gradients, resulting in faster convergence at higher performance. PL-Rank further enables the industry to apply PL models for more relevant and fairer real-world ranking systems.
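
For illustration, a minimal sketch of the stochastic model underlying this work: a Plackett-Luce ranking can be sampled by perturbing the log-scores with Gumbel noise and sorting, and a ranking metric such as DCG can then be estimated by Monte Carlo. PL-Rank itself derives a more sample-efficient gradient estimator, which is not reproduced here::

    import torch

    relevance = torch.tensor([0.0, 1.0, 0.0, 1.0])    # toy relevance labels
    log_scores = torch.tensor([2.0, 1.0, 0.5, 0.0])   # PL model (log-)scores for 4 items

    def sample_pl_ranking(scores):
        # Gumbel-max trick: adding Gumbel noise and sorting draws a ranking
        # from the Plackett-Luce distribution defined by the scores.
        u = torch.rand_like(scores).clamp(1e-10, 1.0 - 1e-10)
        gumbel = -torch.log(-torch.log(u))
        return torch.argsort(scores + gumbel, descending=True)

    def dcg(ranking, rel):
        discounts = 1.0 / torch.log2(torch.arange(len(ranking), dtype=torch.float) + 2.0)
        return (rel[ranking] * discounts).sum()

    # Monte Carlo estimate of the expected DCG under the PL ranking model.
    est = torch.stack([dcg(sample_pl_ranking(log_scores), relevance) for _ in range(100)]).mean()
    print(float(est))
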
Abstract: Existing fair ranking systems, especially those designed to be demographically fair, assume that accurate demographic information about individuals is available to the ranking algorithm. In practice, however, this assumption may not hold --- in real-world contexts like ranking job applicants or credit seekers, social and legal barriers may prevent algorithm operators from collecting peoples' demographic information. In these cases, algorithm operators may attempt to infer peoples' demographics and then supply these inferences as inputs to the ranking algorithm. In this study, we investigate how uncertainty and errors in demographic inference impact the fairness offered by fair ranking algorithms. Using simulations and three case studies with real datasets, we show how demographic inferences drawn from real systems can lead to unfair rankings. Our results suggest that developers should not use inferred demographic data as input to fair ranking algorithms, unless the inferences are extremely accurate.
Abstract: While implicit feedback (e.g., clicks, dwell times, etc.) is an abundant and attractive source of data for learning to rank, it can produce unfair ranking policies for both exogenous and endogenous reasons. Exogenous reasons typically manifest themselves as biases in the training data, which then get reflected in the learned ranking policy and often lead to rich-get-richer dynamics. Moreover, even after the correction of such biases, reasons endogenous to the design of the learning algorithm can still lead to ranking policies that do not allocate exposure among items in a fair way. To address both exogenous and endogenous sources of unfairness, we present the first learning-to-rank approach that addresses both presentation bias and merit-based fairness of exposure simultaneously. Specifically, we define a class of amortized fairness-of-exposure constraints that can be chosen based on the needs of an application, and we show how these fairness criteria can be enforced despite the selection biases in implicit feedback data. The key result is an efficient and flexible policy-gradient algorithm, called FULTR, which is the first to enable the use of counterfactual estimators for both utility estimation and fairness constraints. Beyond the theoretical justification of the framework, we show empirically that the proposed algorithm can learn accurate and fair ranking policies from biased and noisy feedback.
Abstract: Recommender systems are gaining increasing and critical impacts on humans and society since a growing number of users use them for information seeking and decision making. Therefore, it is crucial to address the potential unfairness problems in recommendations. Just like users have personalized preferences on items, users' demands for fairness are also personalized in many scenarios. Therefore, it is important to provide personalized fair recommendations for users to satisfy their personalized fairness demands. Besides, previous works on fair recommendation mainly focus on association-based fairness. However, it is important to advance from associative fairness notions to causal fairness notions for assessing fairness more properly in recommender systems. Based on the above considerations, this paper focuses on achieving personalized counterfactual fairness for users in recommender systems. To this end, we introduce a framework for achieving counterfactually fair recommendations through adversarial learning by generating feature-independent user embeddings for recommendation. The framework allows recommender systems to achieve personalized fairness for users while also covering non-personalized situations. Experiments on two real-world datasets with shallow and deep recommendation algorithms show that our method can generate fairer recommendations for users with a desirable recommendation performance.
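
For illustration, a minimal sketch (hypothetical module names, and omitting the recommendation loss) of the adversarial idea described above: a filter produces user embeddings from which a discriminator tries to predict a sensitive feature, and the filter is trained to fool the discriminator::

    import torch
    import torch.nn as nn

    dim, n_users = 16, 64
    raw_user_emb = torch.randn(n_users, dim)          # toy user embeddings
    sensitive = torch.randint(0, 2, (n_users,))       # e.g. a binary sensitive feature

    filter_net = nn.Linear(dim, dim)                  # produces "feature-independent" embeddings
    discriminator = nn.Linear(dim, 2)                 # tries to recover the sensitive feature
    f_opt = torch.optim.Adam(filter_net.parameters(), lr=1e-2)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-2)
    ce = nn.CrossEntropyLoss()

    for _ in range(10):
        # 1) Train the discriminator on the (detached) filtered embeddings.
        d_loss = ce(discriminator(filter_net(raw_user_emb).detach()), sensitive)
        d_opt.zero_grad()
        d_loss.backward()
        d_opt.step()
        # 2) Train the filter adversarially so the sensitive feature becomes unpredictable.
        adv_loss = -ce(discriminator(filter_net(raw_user_emb)), sensitive)
        f_opt.zero_grad()
        adv_loss.backward()
        f_opt.step()
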
Abstract: There is an increasing interest in studying adversarial attacks on image retrieval systems. However, most of the existing attack methods are based on the white-box setting, where the attackers have access to all the model and database details, which is a strong assumption for practical attacks. The generic transfer-based attack also requires substantial resources, yet its effect has been shown to be unreliable. In this paper, we make the first attempt at proposing a query-efficient decision-based attack framework for image retrieval (DAIR) to completely subvert the top-K retrieval results with human-imperceptible perturbations. We propose an optimization-based method with a smoothed utility function to overcome the challenging discrete nature of the problem. To further improve the query efficiency, we propose a novel sampling method that can efficiently achieve transferability between the surrogate and the target model. Our comprehensive experimental evaluation on benchmark datasets shows that our DAIR method significantly outperforms the state-of-the-art decision-based methods. We also demonstrate that real image retrieval engines (Bing Visual Search and Face++ engines) can be attacked successfully with only several hundred queries.
Abstract: Recent studies have shown that recommender systems are vulnerable: it is easy for attackers to inject well-designed malicious profiles into the system, leading to biased recommendations. Since such malicious profiles can appear plausible, it is imperative to establish a robust recommender system. Adversarial training has been extensively studied for robust recommendation. However, traditional adversarial training adds small perturbations to the parameters (inputs), which does not comply with the poisoning mechanism in the recommender system; thus, for practical models that are very good at fitting the existing data, it does not perform well. To address the above limitations, we propose adversarial poisoning training (APT). It simulates the poisoning process by injecting fake users (ERM users) who are dedicated to minimizing empirical risk to build a robust system. Besides, to generate ERM users, we explore an approximation approach to estimate each fake user's influence on the empirical risk. Although the strategy of "fighting fire with fire" seems counterintuitive, we theoretically prove that the proposed APT can boost the upper bound of poisoning robustness. We also deliver the first theoretical proof that adversarial training has a positive effect on enhancing recommendation robustness. Through extensive experiments with five poisoning attacks on four real-world datasets, the results show that the robustness improvement of APT significantly outperforms baselines. It is worth mentioning that APT also improves model generalization in most cases.
Abstract: In this work, we investigate the user identity linkage task across different social media platforms based on heterogeneous multi-modal posts and social connections. This task is non-trivial due to the following two challenges. 1) As each user involves both intra multi-modal posts and inter social connections, how to accurately perform user representation learning from both intra and inter perspectives constitutes the main challenge. 2) Even representations of the same identity on different platforms tend to be distinct (i.e., the semantic gap problem) owing to the discrepant data distributions of different platforms; hence, how to alleviate the semantic gap problem poses another tough challenge. To this end, we propose a novel adversarial-enhanced hybrid graph network (AHG-Net), consisting of three key components: user representation extraction, hybrid user representation learning, and adversarial learning. Specifically, AHG-Net first employs advanced deep learning techniques to extract the user's intermediate representations from his/her heterogeneous multi-modal posts and social connections. Then AHG-Net unifies intra-user representation learning and inter-user representation learning with a hybrid graph network. Finally, AHG-Net adopts adversarial learning to encourage the learned user representations of the same identity to be similar using a semantic discriminator. Towards evaluation, we create a multi-modal user identity linkage dataset by augmenting an existing dataset with 62,021 images collected from Twitter and Foursquare. Extensive experiments validate the superiority of the proposed network. Meanwhile, we release the dataset, code, and parameters to facilitate the research community.
Abstract: Visual-based recommender systems (VRSs) enhance recommendation performance by integrating users' feedback with the visual features of items' images. Recently, human-imperceptible image perturbations, known as adversarial samples, have been shown capable of altering VRS performance, for example, by pushing (promoting) or nuking (demoting) specific categories of products. One of the most effective adversarial defense methods is adversarial training (AT), which enhances the robustness of the model by incorporating adversarial samples into the training process and minimizing an adversarial risk. The effectiveness of AT has been verified in defending DNNs in supervised learning tasks such as image classification. However, the extent to which AT can protect deep VRSs against adversarial perturbation of images remains mostly under-investigated. This work focuses on the defensive side of VRSs and provides general insights that could be further exploited to broaden the frontier in the field. First, we introduce a suite of adversarial attacks against DNNs on top of VRSs, and defense strategies to counteract them. Next, we present an evaluation framework, named Visual Adversarial Recommender (VAR), to empirically investigate the performance of defended or undefended DNNs in various visually-aware item recommendation tasks. The results of large-scale experiments indicate alarming risks in protecting a VRS through DNN robustification. Source code and data are available at https://github.com/sisinflab/Visual-Adversarial-Recommendation.
Abstract: Image-text retrieval is a fundamental and crucial branch in information retrieval. Although much progress has been made in bridging vision and language, it remains challenging because of the difficult intra-modal reasoning and cross-modal alignment. Existing modality interaction methods have achieved impressive results on public datasets. However, they heavily rely on expert experience and empirical feedback towards the design of interaction patterns, therefore, lacking flexibility. To address these issues, we develop a novel modality interaction modeling network based upon the routing mechanism, which is the first unified and dynamic multimodal interaction framework towards image-text retrieval. In particular, we first design four types of cells as basic units to explore different levels of modality interactions, and then connect them in a dense strategy to construct a routing space. To endow the model with the capability of path decision, we integrate a dynamic router in each cell for pattern exploration. As the routers are conditioned on inputs, our model can dynamically learn different activated paths for different data. Extensive experiments on two benchmark datasets, i.e., Flickr30K and MS-COCO, verify the superiority of our model compared with several state-of-the-art baselines.
Abstract: Due to the popularity of video contents on the Internet, the information retrieval between videos and texts has attracted broad interest from researchers, which is a challenging cross-modal retrieval task. A common solution is to learn a joint embedding space to measure the cross-modal similarity. However, many existing approaches either pay more attention to textual information, video information, or cross-modal matching methods, but less to all three. We believe that a good video-text retrieval system should take into account all three points, fully exploiting the semantic information of both modalities and considering a comprehensive match. In this paper, we propose a Hierarchical Cross-Modal Graph Consistency Learning Network (HCGC) for video-text retrieval task, which considers multi-level graph consistency for video-text matching. Specifically, we first construct a hierarchical graph representation for the video, which includes three levels from global to local: video, clips and objects. Similarly, the corresponding text graph is constructed according to the semantic relationships among sentence, actions and entities. Then, in order to learn a better match between the video and text graph, we design three types of graph consistency (both direct and indirect): inter-graph parallel consistency, inter-graph cross consistency and intra-graph cross consistency. Extensive experimental results on different video-text datasets demonstrate the effectiveness of our approach on both text-to-video and video-to-text retrieval.
Abstract: In practical applications of cross-modal retrieval, test queries of the retrieval system may vary greatly and come from unknown categories. Meanwhile, due to the cost and difficulty of data collection as well as other issues, the available data for cross-modal retrieval are often imbalanced across different modalities. In this paper, we address two important issues to increase the robustness of cross-modal retrieval systems for real-world applications: handling test queries from unknown categories and modality-imbalanced training data. The first issue has not been addressed by existing methods, and the second issue was not well addressed in the related research. To tackle the above issues, we take advantage of prototype learning and propose a prototype-based adaptive network (PAN) for robust cross-modal retrieval. Our method leverages a unified prototype to represent each semantic category across modalities, which provides discriminative information about different categories, and takes the unified prototypes as anchors to learn cross-modal representations adaptively. Moreover, we propose a novel prototype propagation strategy to reconstruct balanced representations that preserve semantic consistency and modality heterogeneity. Experimental results on the benchmark datasets demonstrate the effectiveness of our method compared to the SOTA methods, and further robustness tests show the superiority of our method in solving the above issues.
Abstract: By reading reviews and product attributes, e-commerce question-answering task aims to automatically generate natural-sounding answers for product-related questions. Existing methods, however, typically assume that each review and each product attribute are semantically independent, ignoring the relation among all these multi-type texts. In this paper, we propose a review-attribute heterogeneous graph neural network (abbreviated as RAHGNN) to model the logical relation of all multi-type text. RAHGNN consists of four components: a review-attribute heterogeneous graph constructor, a question-aware input encoder, a heterogeneous graph relation analyzer, and a context-based answer decoder. Specifically, after constructing the heterogeneous graph with reviews and product attributes, we derive the initial representation of each review node and attribute node based on question attention network and key-value memory network respectively. RAHGNN analyzes the relation according to the subgraph structure and subgraph semantic meaning using node-level attention and semantic-level attention. Finally, the answer is generated by the recurrent neural network with the relation representation as context input. Extensive experimental results on a large-scale real-world e-commerce dataset not only show the superior performance of RAHGNN over state-of-the-art baselines, but also demonstrate its potentially good interpretability for multi-type text relation in product-aware answer generation.
Abstract: Traditionally, the task of cross-modal retrieval is tackled through joint embedding. However, the global matching used in joint embedding methods often fails to effectively describe matchings between local regions of the image and words in the text. Hence they may not be effective in capturing the relevance between the text and the image. In this work, we propose a heterogeneous attention network (HAN) for effective and efficient cross-modal retrieval. The proposed HAN represents an image by a set of bounding box features and a sentence by a set of word features. The relevance between the image and the sentence is determined by the set-to-set matching between the set of word features and the set of bounding box features. To enhance the matching effectiveness, we exploit the proposed heterogeneous attention layer to provide the cross-modal context for word features as well as bounding box features. Meanwhile, to optimize the metric more effectively, we propose a new soft-max triplet loss, which adaptively gives more attention to harder negatives and thus trains the proposed HAN in a more effective manner compared with the original triplet loss. Meanwhile, the proposed HAN is efficient, and its lightweight architecture only needs a single GPU card for training. Extensive experiments conducted on two public benchmarks demonstrate the effectiveness and efficiency of our HAN. This work has been deployed in production Baidu Search Ads and is part of the "PaddleBox'' platform.
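
For illustration, a hedged sketch of a softmax-weighted triplet objective in the spirit described above, where harder negatives receive larger weight; the paper's exact formulation may differ::

    import torch
    import torch.nn.functional as F

    def softmax_triplet_loss(anchor, positive, negatives, margin=0.2, temperature=10.0):
        pos_sim = F.cosine_similarity(anchor, positive, dim=-1)                  # scalar
        neg_sims = F.cosine_similarity(anchor.unsqueeze(0), negatives, dim=-1)   # (N,)
        weights = torch.softmax(temperature * neg_sims, dim=0)                   # harder negatives weigh more
        hinge = torch.clamp(margin - pos_sim + neg_sims, min=0.0)                # hinge term per negative
        return (weights * hinge).sum()

    anchor, positive, negatives = torch.randn(32), torch.randn(32), torch.randn(5, 32)
    print(float(softmax_triplet_loss(anchor, positive, negatives)))
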
Abstract: Click-through rate (CTR) prediction is one of the most central tasks in online advertising systems. Recent deep learning-based models that exploit feature embedding and high-order data nonlinearity have shown dramatic successes in CTR prediction. However, these models work poorly on cold-start ads with new IDs, whose embeddings are not well learned yet. In this paper, we propose Graph Meta Embedding (GME) models that can rapidly learn how to generate desirable initial embeddings for new ad IDs based on graph neural networks and meta learning. Previous works address this problem from the new ad itself, but ignore possibly useful information contained in existing old ads. In contrast, GMEs simultaneously consider two information sources: the new ad and existing old ads. For the new ad, GMEs exploit its associated attributes. For existing old ads, GMEs first build a graph to connect them with new ads, and then adaptively distill useful information. We propose three specific GMEs from different perspectives to explore what kind of information to use and how to distill information. In particular, GME-P uses Pre-trained neighbor ID embeddings, GME-G uses Generated neighbor ID embeddings and GME-A uses neighbor Attributes. Experimental results on three real-world datasets show that GMEs can significantly improve the prediction performance in both cold-start (i.e., no training data is available) and warm-up (i.e., a small number of training samples are collected) scenarios over five major deep learning-based CTR prediction models. GMEs can be applied to conversion rate (CVR) prediction as well.
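
For illustration, a minimal sketch (hypothetical shapes) of the general idea of generating an initial ID embedding for a brand-new ad by combining an attribute-derived embedding with an attention-weighted aggregation of related old ads' ID embeddings, roughly in the spirit of the GME variants::

    import torch
    import torch.nn as nn

    dim = 16
    attr_encoder = nn.Linear(8, dim)              # maps ad attributes into the embedding space
    attn = nn.Linear(2 * dim, 1)                  # scores each neighbouring old ad

    new_ad_attrs = torch.randn(8)                 # attributes of the cold-start ad
    neighbor_ids = torch.randn(4, dim)            # pre-trained ID embeddings of linked old ads

    self_emb = attr_encoder(new_ad_attrs)         # information from the new ad itself
    scores = attn(torch.cat([self_emb.expand(4, -1), neighbor_ids], dim=-1)).squeeze(-1)
    alpha = torch.softmax(scores, dim=0)          # attention weights over neighbours
    init_id_emb = self_emb + (alpha.unsqueeze(-1) * neighbor_ids).sum(dim=0)
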
Abstract: Recently, embedding techniques have achieved impressive success in recommender systems. However, embedding techniques are data-demanding and suffer from the cold-start problem. In particular, for a cold-start item that has only limited interactions, it is hard to train a reasonable item ID embedding, called a cold ID embedding, which is a major challenge for embedding techniques. A cold item ID embedding has two main problems: (1) a gap exists between the cold ID embedding and the deep model; (2) the cold ID embedding can be seriously affected by noisy interactions. However, most existing methods do not consider both issues of the cold-start problem simultaneously. To address these problems, we adopt two key ideas: (1) speed up the model fitting for the cold item ID embedding (fast adaptation); (2) alleviate the influence of noise. Along this line, we propose Meta Scaling and Shifting Networks to generate scaling and shifting functions for each item, respectively. The scaling function can directly transform cold item ID embeddings into the warm feature space, which fits the model better, and the shifting function is able to produce stable embeddings from the noisy embeddings. With the two meta networks, we propose the Meta Warm Up Framework (MWUF), which learns to warm up cold ID embeddings. Moreover, MWUF is a general framework that can be applied on top of various existing deep recommendation models. The proposed model is evaluated on three popular benchmarks, including both recommendation and advertising datasets. The evaluation results demonstrate its superior performance and compatibility.
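
For illustration, a minimal sketch (hypothetical feature inputs) of the meta scaling and shifting idea described above: a scaling network conditioned on the item's features rescales the cold ID embedding towards the warm feature space, and a shifting network conditioned on the embeddings of interacting users offsets the noisy embedding::

    import torch
    import torch.nn as nn

    dim, n_feat = 16, 10
    scale_net = nn.Sequential(nn.Linear(n_feat, dim), nn.Sigmoid())  # per-dimension scale
    shift_net = nn.Linear(dim, dim)                                  # offset from user context

    cold_id_emb = torch.randn(dim)                  # poorly trained ID embedding of a cold item
    item_features = torch.randn(n_feat)             # the item's side features
    user_context = torch.randn(7, dim).mean(dim=0)  # mean embedding of users who interacted with it

    warm_emb = scale_net(item_features) * cold_id_emb + shift_net(user_context)
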
Abstract: Meta-learning based recommendation systems alleviate the cold-start problem through a bi-level meta-optimization process: recommendation borrows prior experience from pre-trained, static system-level parameters and fine-tunes the model at the user level for new users. However, in most real-world recommendation systems it is more natural for users to arrive in a dynamic online sequence, which brings further challenges for existing meta-learning based recommendation: system-level updates begin before the user-level recommendation models have converged on the whole time series; stable and randomness-resistant bi-level gradient descent approaches are missing in the current meta-learning framework; and evaluation of learning abilities across different users, needed to explore the diversity of users, is lacking. In this paper, we propose an online regularized meta-leader recommendation approach named FORM to address these problems. To transfer meta-learning based recommenders to the online scenario, we develop a follow-the-meta-leader algorithm to learn stable online gradients. Regularized methods are then introduced to alleviate the volatility of online systems and produce sparse weight parameters. Besides, we design a scalable meta-trained learning rate based on the variance and learning shots of existing users to guide the model to adapt efficiently to new users. Extensive experiments on three public datasets and one commercial online advertisement dataset demonstrate our approach's effectiveness and stability; it outperforms other state-of-the-art methods and achieves stable and fast adaptation to new users.
Abstract: The cold start problem in recommender systems is a long-standing challenge, which requires recommending to new users (items) based on attributes without any historical interaction records. In these recommendation systems, warm users (items) have privileged collaborative signals from interaction records compared to cold start users (items), and these Collaborative Filtering (CF) signals are shown to deliver competitive recommendation performance. Many researchers have proposed to learn the correlation between the collaborative signal embedding space and the attribute embedding space to improve cold start recommendation, since user and item categorical attributes are available on many online platforms. However, cold start recommendation is still limited by the separate modeling of the two embedding spaces and by simple assumptions about the space transformation. As user-item interaction behaviors and user (item) attributes naturally form a heterogeneous graph structure, in this paper, we propose a privileged graph distillation model (PGD). The teacher model is composed of a heterogeneous graph structure for warm users and items with privileged CF links. The student model is composed of an entity-attribute graph without CF links. Specifically, the teacher model can learn better embeddings of each entity by injecting complex higher-order relationships from the constructed heterogeneous graph. The student model can learn the distilled output with privileged CF embeddings from the teacher embeddings. Our proposed model is generally applicable to different cold start scenarios with a new user, a new item, or a new user-new item pair. Finally, extensive experimental results on real-world datasets clearly show the effectiveness of our proposed model on different types of cold start problems, with average improvements of 6.6%, 5.6%, and 17.1% over state-of-the-art baselines on the three datasets, respectively.
Abstract: Current search systems provide effective support to users engaged in fact-finding and look-up oriented tasks. However, they provide relatively little support for users engaged in exploratory search tasks that involve cognitive and metacognitive activities such as learning, synthesis, planning, and reflection. We conducted a within-subject user study (N=24) that investigated the effects of a novel knowledge organization tool called the OrgBox, designed to assist users with organizing and synthesizing information, and metacognitive activities. The OrgBox included features to allow users to drag-drop information they found through search into "boxes" that could be created, labelled, and re-arranged. Study participants completed two exploratory search tasks, one with the OrgBox, and one with the OrgDoc, a baseline tool that included features of a rich-text editor (e.g., formatting, bullets) for taking notes. In this paper, we present results from our study comparing the OrgBox and OrgDoc tools. Specifically, we investigate if there were differences in participants' (1) search interactions, (2) saving and organizing behaviors (e.g., amount of information, structure of notes), (3) perceptions of the tasks, tool usability, and quality of their task outputs, and (4) perceptions of how the tools provided support for cognitive and metacognitive activities involved in the task. Our results show that when using the OrgBox tool, participants created more grouping sections in their notes and saved more text. In terms of metacognitive support, participants perceived the OrgBox tool to provide significantly higher levels of support for three types of metacognitive activity (monitoring/tracking, evaluation, and planning) without changing their perceptions of the task difficulty.
Abstract: Graph Convolutional Network (GCN) is an emerging technique for information retrieval (IR) applications. While GCN assumes the homophily property of a graph, real-world graphs are never perfect: the local structure of a node may contain discrepancy, e.g., the labels of a node's neighbors could vary. This pushes us to consider the discrepancy of local structure in GCN modeling. Existing work approaches this issue by introducing an additional module such as graph attention, which is expected to learn the contribution of each neighbor. However, such a module may not work reliably as expected, especially when the supervision signal is scarce, e.g., when the labeled data is small. Moreover, existing methods focus on modeling the nodes in the training data, and never consider the local structure discrepancy of testing nodes. This work focuses on the local structure discrepancy issue for testing nodes, which has received little scrutiny. From a novel perspective of causality, we investigate whether a GCN should trust the local structure of a testing node when predicting its label. To this end, we analyze the working mechanism of GCN with a causal graph, estimating the causal effect of a node's local structure on the prediction. The idea is simple yet effective: given a trained GCN model, we first intervene on the prediction by blocking the graph structure; we then compare the original prediction with the intervened prediction to assess the causal effect of the local structure on the prediction. In this way, we can eliminate the impact of local structure discrepancy and make more accurate predictions. Extensive experiments on seven node classification datasets show that our causality-based method effectively enhances the inference stage of GCN.
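The inference-time intervention can be illustrated with a toy, single-layer GCN in NumPy; the propagation rule, random weights, and identity-matrix intervention below are simplified stand-ins for the paper's trained model and causal-effect estimator.

.. code-block:: python

    import numpy as np

    def softmax(z):
        e = np.exp(z - z.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    def gcn_predict(a_hat, x, w):
        # single-layer GCN-style propagation: softmax(A_hat X W)
        return softmax(a_hat @ x @ w)

    # toy "trained" model: 5 nodes, 4 features, 3 classes
    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 4))
    w = rng.normal(size=(4, 3))
    a = np.eye(5); a[0, 1] = a[1, 0] = a[1, 2] = a[2, 1] = 1
    d = np.diag(1 / np.sqrt(a.sum(1)))
    a_hat = d @ a @ d

    p_with_graph = gcn_predict(a_hat, x, w)      # original prediction
    p_no_graph = gcn_predict(np.eye(5), x, w)    # intervention: block neighbours
    causal_effect = p_with_graph - p_no_graph    # effect of local structure, per node
    # a small (or harmful) effect suggests not trusting that node's local structure
    print(causal_effect[0])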
Abstract: Semi-supervised node classification on graphs is an important research problem, with many real-world applications in information retrieval such as content classification on a social network and query intent classification on an e-commerce query graph. While traditional approaches are largely transductive, recent graph neural networks (GNNs) integrate node features with network structures, thus enabling inductive node classification models that can be applied to new nodes or even new graphs in the same feature space. However, inter-graph differences still exist across graphs within the same domain. Thus, training just one global model (e.g., a state-of-the-art GNN) to handle all new graphs, whilst ignoring the inter-graph differences, can lead to suboptimal performance. In this paper, we study the problem of inductive node classification across graphs. Unlike existing one-model-fits-all approaches, we propose a novel meta-inductive framework called MI-GNN to customize the inductive model to each graph under a meta-learning paradigm. That is, MI-GNN does not directly learn an inductive model; it learns the general knowledge of how to train a model for semi-supervised node classification on new graphs. To cope with the differences across graphs, MI-GNN employs a dual adaptation mechanism at both the graph and task levels. More specifically, we learn a graph prior to adapt for the graph-level differences, and a task prior to adapt for the task-level differences conditioned on a graph. Extensive experiments on five real-world graph collections demonstrate the effectiveness of our proposed model.
Abstract: Lifelong learning capabilities are crucial for sentiment classifiers to process continuous streams of opinioned information on the Web. However, performing lifelong learning is non-trivial for deep neural networks, as continual training on incrementally available information inevitably results in catastrophic forgetting or interference. In this paper, we propose a novel iterative network pruning with uncertainty regularization method for lifelong sentiment classification (IPRLS), which leverages the principles of network pruning and weight regularization. By performing network pruning with uncertainty regularization in an iterative manner, IPRLS can adapt a single BERT model to work with continuously arriving data from multiple domains while avoiding catastrophic forgetting and interference. Specifically, we leverage an iterative pruning method to remove redundant parameters in large deep networks so that the freed-up space can then be employed to learn new tasks, tackling the catastrophic forgetting problem. Instead of keeping the old tasks fixed when learning new tasks, we also use an uncertainty regularization based on the Bayesian online learning framework to constrain the updates of old task weights in BERT, which enables positive backward transfer, i.e., learning new tasks improves performance on past tasks while protecting old knowledge from being lost. In addition, we propose a task-specific low-dimensional residual function in parallel to each layer of BERT, which makes IPRLS less prone to losing the knowledge saved in the base BERT network when learning a new task. Extensive experiments on 16 popular review corpora demonstrate that the proposed IPRLS method significantly outperforms the strong baselines for lifelong sentiment classification. For reproducibility, we submit the code and data at: https://github.com/siat-nlp/IPRLS.
Abstract: GNN-based anomaly detection has recently attracted considerable attention. Existing attempts have thus far focused on jointly learning the node representations and the classifier for detecting the anomalies. Inspired by the recent advances of self-supervised learning (SSL) on graphs, we explore another possibility of decoupling the node representation learning and the classification for anomaly detection. We conduct a preliminary study to show that decoupled training using existing graph SSL schemes to represent nodes can obtain performance gains over joint training, but it may deteriorate when the behavior patterns and the label semantics become highly inconsistent. To be less biased by the inconsistency, we propose a simple yet effective graph SSL scheme, called Deep Cluster Infomax (DCI) for node representation learning, which captures the intrinsic graph properties in more concentrated feature spaces by clustering the entire graph into multiple parts. We conduct extensive experiments on four real-world datasets for anomaly detection. The results demonstrate that decoupled training equipped with a proper SSL scheme can outperform joint training in AUC. Compared with existing graph SSL schemes, DCI can help decoupled training gain more improvements.
Abstract: In this study we address the problem of identifying the purchase-state of users, based on product-related questions they ask on an eCommerce website. We differentiate between questions asked before buying a product (pre-purchase) and after (post-purchase). First, we study the ambiguity that exists in the definition of purchase-states, and then investigate the linguistic characteristics of the questions in each state. We analyze the discrepancy between the language models of pre- and post-purchase questions, and offer two classification schemes for this task, both of which outperform human judgments. We additionally show the effectiveness of our classification models in improving real-world applications for both consumers and sellers.
Abstract: To better exploit search logs and model users' behavior patterns, numerous click models have been proposed to extract users' implicit interaction feedback. Most traditional click models are based on the probabilistic graphical model (PGM) framework, which requires manually designed dependencies and may oversimplify user behaviors. Recently, methods based on neural networks have been proposed to improve the prediction accuracy of user behaviors by enhancing the expressive ability and allowing flexible dependencies. However, they still suffer from the data sparsity and cold-start problems. In this paper, we propose a novel graph-enhanced click model (GraphCM) for web search. Firstly, we regard each query or document as a vertex, and propose novel homogeneous graph construction methods for queries and documents respectively, to fully exploit both intra-session and inter-session information for the sparsity and cold-start problems. Secondly, following the examination hypothesis, we separately model the attractiveness estimator and examination predictor to output the attractiveness scores and examination probabilities, where graph neural networks and neighbor interaction techniques are applied to extract the auxiliary information encoded in the pre-constructed homogeneous graphs. Finally, we apply combination functions to integrate examination probabilities and attractiveness scores into click predictions. Extensive experiments conducted on three real-world session datasets show that GraphCM not only outperforms the state-of-the-art models, but also achieves superior performance in addressing the data sparsity and cold-start problems.
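The examination hypothesis behind the attractiveness/examination factorization fits in a few lines; the product combination function below is one common instantiation, not necessarily the exact combination function learned by GraphCM.

.. code-block:: python

    import torch

    def click_probability(attractiveness, examination):
        """Examination hypothesis: P(click) = P(attractive) * P(examined)."""
        return attractiveness * examination

    # toy scores for the 5 results of one query session
    attr = torch.tensor([0.8, 0.6, 0.4, 0.3, 0.2])    # from the attractiveness estimator
    exam = torch.tensor([0.95, 0.7, 0.5, 0.3, 0.15])  # from the examination predictor
    print(click_probability(attr, exam))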
Abstract: Because of the superior feature representation ability of deep learning, various deep Click-Through Rate (CTR) models are deployed in commercial systems by industrial companies. To achieve better performance, it is necessary to train the deep CTR models on huge volumes of training data efficiently, which makes speeding up the training process an essential problem. Different from models with dense training data, the training data for CTR models is usually high-dimensional and sparse. To transform the high-dimensional sparse input into low-dimensional dense real-valued vectors, almost all deep CTR models adopt an embedding layer, which easily reaches hundreds of GB or even TB. Since a single GPU cannot accommodate all the embedding parameters, it is not reasonable to rely on data parallelism alone when performing distributed training. Therefore, existing distributed training platforms for recommendation adopt model parallelism. Specifically, they use the CPU (Host) memory of servers to maintain and update the embedding parameters and utilize GPU workers to conduct forward and backward computations. Unfortunately, these platforms suffer from two bottlenecks: (1) the latency of pull & push operations between Host and GPU; (2) parameter update and synchronization in the CPU servers. To address these bottlenecks, in this paper, we propose ScaleFreeCTR: a MixCache-based distributed training system for CTR models. Specifically, in ScaleFreeCTR, we still store the huge embedding table in CPU memory but utilize the GPU instead of the CPU to conduct embedding synchronization efficiently. To reduce the latency of data transfer between both GPU-Host and GPU-GPU, the MixCache mechanism and Virtual Sparse Id operation are proposed. Comprehensive experiments are conducted to demonstrate the effectiveness and efficiency of ScaleFreeCTR. In addition, our system will be open-sourced based on MindSpore in the near future.
Abstract: Click-through rate (CTR) prediction is a critical problem in web search, recommendation systems and online advertisement displaying. Learning good feature interactions is essential to reflecting users' preferences for items. Many CTR prediction models based on deep learning have been proposed, but researchers usually only pay attention to whether state-of-the-art performance is achieved, and ignore whether the entire framework is reasonable. In this work, we use the discrete choice model in economics to redefine the CTR prediction problem, and propose a general neural network framework built on the self-attention mechanism. We find that most existing CTR prediction models align with our proposed general framework. We also examine the expressive power and model complexity of our proposed framework, along with potential extensions to some existing models. Finally, we demonstrate and verify our insights through experimental results on public datasets.
Abstract: Recommendation is a prevalent and critical service in information systems. To provide personalized suggestions to users, industry players embrace machine learning, more specifically, building predictive models based on click behavior data. This is known as Click-Through Rate (CTR) prediction, which has become the gold standard for building personalized recommendation services. However, we argue that there is a significant gap between clicks and user satisfaction --- it is common that a user is "cheated" into clicking an item by the attractive title/cover of the item. This will severely hurt users' trust in the system if they find the actual content of the clicked item disappointing. What's even worse, optimizing CTR models on such flawed data will result in the Matthew Effect, making the seemingly attractive but actually low-quality items be recommended even more frequently. In this paper, we formulate the recommendation models as a causal graph that reflects the cause-effect factors in recommendation, and address the clickbait issue by performing counterfactual inference on the causal graph. We imagine a counterfactual world where each item has only exposure features (i.e., the features that the user can see before making a click decision). By estimating the click likelihood of a user in the counterfactual world, we are able to reduce the direct effect of exposure features and eliminate the clickbait issue. Experiments on real-world datasets demonstrate that our method significantly improves the post-click satisfaction of CTR models.
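A rough sketch of the counterfactual adjustment follows, under the simplifying assumption that the direct effect of exposure features can be approximated by a model fed only those features and subtracted from the full-model score; the paper's causal-graph formulation is more involved.

.. code-block:: python

    import numpy as np

    def debiased_score(score_full, score_exposure_only):
        """Counterfactual adjustment sketch: remove the direct effect of exposure
        features (title/cover) from the ranking score."""
        return score_full - score_exposure_only

    # toy scores for 4 candidate items
    full = np.array([0.9, 0.7, 0.6, 0.4])       # model fed exposure + content features
    exposure = np.array([0.8, 0.2, 0.5, 0.1])   # counterfactual: exposure features only
    print(np.argsort(-debiased_score(full, exposure)))  # clickbait-corrected ranking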
Abstract: Modeling powerful interactions is a critical challenge in Click-through rate (CTR) prediction, which is one of the most typical machine learning tasks in personalized advertising and recommender systems. Although developing hand-crafted interactions is effective for a small number of datasets, it generally requires laborious and tedious architecture engineering for extensive scenarios. In recent years, several neural architecture search (NAS) methods have been proposed for designing interactions automatically. However, existing methods only explore limited types and connections of operators for interaction generation, leading to low generalization ability. To address these problems, we propose a more general automated method for building powerful interactions named AutoPI. The main contributions of this paper are as follows: AutoPI adopts a more general search space in which the computational graph is generalized from existing network connections, and the interactive operators in the edges of the graph are extracted from representative hand-crafted works. It allows searching for various powerful feature interactions to produce higher AUC and lower Logloss in a wide variety of applications. Besides, AutoPI utilizes a gradient-based search strategy for exploration with a significantly low computational cost. Experimentally, we evaluate AutoPI on a diverse suite of benchmark datasets, demonstrating the generalizability and efficiency of AutoPI over hand-crafted architectures and state-of-the-art NAS algorithms.
Abstract: A taocode is a kind of specially coded text-link on taobao.com (the world's biggest online shopping website), through which users can share messages about products with each other. Analyzing taocodes can potentially facilitate understanding of the social relationships between users and, more excitingly, their online purchasing behaviors under the influence of taocode diffusion. This paper innovatively investigates the problem of online purchasing predictions from an information diffusion perspective, with taocode as a case study. Specifically, we conduct profound observational studies on a large-scale real-world dataset from Taobao, containing over 100M Taocode sharing records. Inspired by our observations, we propose InfNet, a dynamic GNN-based framework that models the information diffusion across Taocode. We then apply InfNet to item purchasing predictions. Extensive experiments on real-world datasets validate the effectiveness of InfNet compared with state-of-the-art baselines.
Abstract: Hashing has become increasingly important for large-scale image retrieval, of which low storage cost and fast searching are two key properties. However, existing methods adopt large neural networks, which are hard to deploy in resource-limited devices due to the unacceptable memory and runtime overhead. We argue that this huge overhead of neural networks somewhat violates the appealing properties of hashing. In this paper, we propose a novel deep hashing method, called Binary Neural Network Hashing (BNNH), for fast image retrieval. Specifically, we construct an efficient binarized network architecture to provide a lighter model and faster inference, which directly generates binary outputs as the desired hash codes without introducing quantization loss. Besides, in order to circumvent the huge performance degradation caused by the extremely quantized activations, we introduce a simple yet effective activation-aware loss to explicitly guide the updating of activations in intermediate layers. Extensive experiments conducted on three benchmarks show that the proposed method outperforms the state-of-the-art binarization methods by large margins and validate the efficiency of BNNH.
Abstract: Hashing, which represents data items as compact binary codes, has become an increasingly popular technique, e.g., for large-scale image retrieval, owing to its super fast search speed as well as its extremely economical memory consumption. However, existing hashing methods all try to learn binary codes from artificially balanced datasets, which are not commonly available in real-world scenarios. In this paper, we propose Long-Tail Hashing Network (LTHNet), a novel two-stage deep hashing approach that addresses the problem of learning to hash for more realistic datasets where the data labels roughly exhibit a long-tail distribution. Specifically, the first stage is to learn relaxed embeddings of the given dataset with its long-tail characteristic taken into account via an end-to-end deep neural network; the second stage is to binarize those obtained embeddings. A critical part of LTHNet is its dynamic meta-embedding module, extended with a determinantal point process, which can adaptively realize visual knowledge transfer between head and tail classes, and thus enrich image representations for hashing. Our experiments show that LTHNet achieves dramatic performance improvements over all state-of-the-art competitors on long-tail datasets, with little or no sacrifice on balanced datasets. Further analyses reveal that, while to our surprise directly manipulating class weights in the loss function has little effect, the extended dynamic meta-embedding module, the use of cross-entropy loss instead of square loss, and the relatively small batch size for training all contribute to LTHNet's success.
Abstract: Given a set S of n distinct keys, a function f that bijectively maps the keys of S into the range (0,...,n-1) is called a minimal perfect hash function for S. Algorithms that find such functions when n is large and retain constant evaluation time are of practical interest; for instance, search engines and databases typically use minimal perfect hash functions to quickly assign identifiers to static sets of variable-length keys such as strings. The challenge is to design an algorithm which is efficient in three different aspects: time to find f (construction time), time to evaluate f on a key of S (lookup time), and space of representation for f. Several algorithms have been proposed to trade off between these aspects. In 1992, Fox, Chen, and Heath (FCH) presented an algorithm at SIGIR providing very fast lookup evaluation. However, the approach received little attention because of its large construction time and higher space consumption compared to other subsequent techniques. Almost thirty years later we revisit their framework and present an improved algorithm that scales well to large sets and reduces space consumption altogether, without compromising the lookup time. We conduct an extensive experimental assessment and show that the algorithm finds functions that are competitive in space with state-of-the-art techniques and provide 2-4x better lookup time.
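To illustrate what a minimal perfect hash function does (this is a brute-force toy, not the FCH construction nor the improved algorithm above), the following sketch searches for a seed that makes a generic hash collision-free and minimal on a tiny key set.

.. code-block:: python

    import itertools
    import zlib

    def find_mphf_seed(keys):
        """Brute-force a seed so that hash(key, seed) % n is a bijection.
        Only feasible for tiny key sets; real algorithms (FCH and successors)
        scale to millions of keys with compact representations."""
        n = len(keys)
        for seed in itertools.count():
            slots = {zlib.crc32(k.encode() + seed.to_bytes(4, "little")) % n for k in keys}
            if len(slots) == n:      # no collisions: the function is minimal and perfect
                return seed

    keys = ["sigir", "retrieval", "hash", "index"]
    seed = find_mphf_seed(keys)
    f = lambda k: zlib.crc32(k.encode() + seed.to_bytes(4, "little")) % len(keys)
    print({k: f(k) for k in keys})   # each key gets a distinct identifier in 0..n-1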
Abstract: An emerging recipe for achieving state-of-the-art effectiveness in neural document re-ranking involves utilizing large pre-trained language models - e.g., BERT - to evaluate all individual passages in the document and then aggregating the outputs by pooling or additional Transformer layers. A major drawback of this approach is high query latency due to the cost of evaluating every passage in the document with BERT. To make matters worse, this high inference cost and latency varies based on the length of the document, with longer documents requiring more time and computation. To address this challenge, we adopt an intra-document cascading strategy, which prunes passages of a candidate document using a less expensive model, called ESM, before running a scoring model that is more expensive and effective, called ETM. We found it best to train ESM (short for Efficient Student Model) via knowledge distillation from the ETM (short for Effective Teacher Model) e.g., BERT. This pruning allows us to only run the ETM model on a smaller set of passages whose size does not vary by document length. Our experiments on the MS MARCO and TREC Deep Learning Track benchmarks suggest that the proposed Intra-Document Cascaded Ranking Model (IDCM) leads to over 400% lower query latency by providing essentially the same effectiveness as the state-of-the-art BERT-based document ranking models.
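The cascading idea can be sketched independently of BERT: a cheap scorer prunes the passages and an expensive scorer is run only on the survivors. The stand-in scorers and max-pooling aggregation below are illustrative, not the trained ESM/ETM models.

.. code-block:: python

    from typing import Callable, List

    def cascade_rank(passages: List[str],
                     cheap_score: Callable[[str], float],
                     expensive_score: Callable[[str], float],
                     k: int = 3) -> float:
        """Intra-document cascading sketch: a cheap student model prunes passages,
        an expensive teacher model scores only the survivors; the document score
        is aggregated (max-pooling here) over the surviving passages."""
        survivors = sorted(passages, key=cheap_score, reverse=True)[:k]
        return max(expensive_score(p) for p in survivors)

    # toy usage with stand-in scorers (real IDCM uses a distilled ESM and a BERT ETM)
    passages = ["p1 about neural ranking", "p2 off-topic", "p3 about BERT", "p4 filler"]
    cheap = lambda p: len(set(p.split()) & {"neural", "BERT", "ranking"})
    costly = lambda p: cheap(p) * 1.5   # pretend this is the expensive model
    print(cascade_rank(passages, cheap, costly, k=2))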
Abstract: Video retrieval is becoming increasingly important owing to the rapid emergence of videos on the Internet. The dominant paradigm for video retrieval learns video-text representations by pushing the distance between the similarity of positive pairs and that of negative pairs apart by a fixed margin. However, negative pairs used for training are sampled randomly, which means that the semantics between negative pairs may be related or even equivalent, while most methods still enforce dissimilar representations to decrease their similarity. This phenomenon leads to inaccurate supervision and poor performance in learning video-text representations. While most video retrieval methods overlook that phenomenon, we propose an adaptive margin that changes with the distance between positive and negative pairs to solve the aforementioned issue. First, we design the calculation framework of the adaptive margin, including the method of distance measurement and the function between the distance and the margin. Then, we explore a novel implementation called "Cross-Modal Generalized Self-Distillation" (CMGSD), which can be built on top of most video retrieval models with few modifications. Notably, CMGSD adds little computational overhead at training time and no computational overhead at test time. Experimental results on three widely used datasets demonstrate that the proposed method can yield significantly better performance than the corresponding backbone model, and it outperforms state-of-the-art methods by a large margin.
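One plausible way to implement a margin that adapts to the positive-negative distance is sketched below in PyTorch; the specific distance measure and distance-to-margin mapping are hypothetical, since the paper treats these design choices as part of its framework.

.. code-block:: python

    import torch

    def adaptive_margin_loss(sim_pos, sim_neg, emb_pos, emb_neg, base=0.2, scale=0.1):
        """Hinge loss whose margin shrinks when the negative is semantically close
        to the positive (one plausible instantiation of an adaptive margin)."""
        # cosine distance between the positive and negative items
        dist = 1 - torch.nn.functional.cosine_similarity(emb_pos, emb_neg, dim=-1)
        margin = base * torch.clamp(dist / (dist + scale), 0, 1)  # hypothetical mapping
        return torch.clamp(margin + sim_neg - sim_pos, min=0).mean()

    # toy batch of two query-positive-negative triples
    sim_pos, sim_neg = torch.tensor([0.7, 0.6]), torch.tensor([0.5, 0.65])
    emb_pos, emb_neg = torch.randn(2, 8), torch.randn(2, 8)
    print(adaptive_margin_loss(sim_pos, sim_neg, emb_pos, emb_neg))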
Abstract: Composing text and image for image retrieval (CTI-IR) is a new yet challenging task, for which the input query is not the conventional image or text but a composition, i.e., a reference image and its corresponding modification text. The key of CTI-IR lies in how to properly compose the multi-modal query to retrieve the target image. In a sense, pioneer studies mainly focus on composing the text with either the local visual descriptor or global feature of the reference image. However, they overlook the fact that the text modifications are indeed diverse, ranging from the concrete attribute changes, like "change it to long sleeves", to the abstract visual property adjustments, e.g., "change the style to professional". Thus, simply emphasizing the local or global feature of the reference image for the query composition is insufficient. In light of the above analysis, we propose a Comprehensive Linguistic-Visual Composition Network (CLVC-Net) for image retrieval. The core of CLVC-Net is that it designs two composition modules: fine-grained local-wise composition module and fine-grained global-wise composition module, targeting comprehensive multi-modal compositions. Additionally, a mutual enhancement module is designed to promote local-wise and global-wise composition processes by forcing them to share knowledge with each other. Extensive experiments conducted on three real-world datasets demonstrate the superiority of our CLVC-Net. We released the codes to benefit other researchers.
Abstract: Given a text/image query, image-text retrieval aims to find the relevant items in the database. Recently, visual-linguistic pre-training (VLP) methods have demonstrated promising accuracy on image-text retrieval and other visual-linguistic tasks. These VLP methods are typically pre-trained on a large amount of image-text pairs, then fine-tuned on various downstream tasks. Nevertheless, due to the natural modality incompleteness in image-text retrieval, i.e., the query is either image or text rather than an image-text pair, the naive application of VLP to image-text retrieval results in significant inefficiency. Moreover, existing VLP methods cannot extract comparable representations for a single-modal query and multi-modal database items. In this work, we propose a generative visual-linguistic pre-training approach, termed as GilBERT, to simultaneously learn generic representations of image-text data and complete the missing modality for incomplete pairs. In testing phase, the proposed GilBERT facilitates efficient vector-based retrieval by providing unified feature embedding for query and database items. Moreover, the generative training not only makes GilBERT compatible with non-parallel text/image corpus, but also enables GilBERT to model the image-text relationships without suffering massive randomly-sampled negative samples, leading to superior experimental performances. Extensive experiments demonstrate the advantages of GilBERT in image-text retrieval, in terms of both efficiency and accuracy.
Abstract: The recent growth of web video sharing platforms has increased the demand for systems that can efficiently browse, retrieve and summarize video content. Query-aware multi-video summarization is a promising technique that caters to this demand. In this work, we introduce a novel Query-Aware Hierarchical Pointer Network for Multi-Video Summarization, termed DeepQAMVS, that jointly optimizes multiple criteria: (1) conciseness, (2) representativeness of important query-relevant events and (3) chronological soundness. We design a hierarchical attention model that factorizes over three distributions, each collecting evidence from a different modality, followed by a pointer network that selects frames to include in the summary. DeepQAMVS is trained with reinforcement learning, incorporating rewards that capture representativeness, diversity, query-adaptability and temporal coherence. We achieve state-of-the-art results on the MVS1K dataset, with inference time scaling linearly with the number of input video frames.
Abstract: With the recent advances of conversational recommendation, the recommender system is able to actively and dynamically elicit user preferences via conversational interactions. To achieve this, the system periodically queries users' preferences on attributes and collects their feedback. However, most existing conversational recommender systems only enable the user to provide absolute feedback on the attributes. In practice, absolute feedback is usually limited, as users tend to provide biased feedback when expressing their preferences. Instead, users are often more inclined to express comparative preferences, since user preferences are inherently relative. To enable users to provide comparative preferences during conversational interactions, we propose a novel comparison-based conversational recommender system. The relative feedback, though more practical, is not easy to incorporate, since its feedback scale is always mismatched with users' absolute preferences. By effectively collecting and understanding the relative feedback in an interactive manner, we further propose a new bandit algorithm, which we call RelativeConUCB. Experiments on both synthetic and real-world datasets validate the advantage of our proposed method, compared to existing bandit algorithms in conversational recommender systems.
Abstract: Collaborative bandit learning, i.e., bandit algorithms that utilize collaborative filtering techniques to improve sample efficiency in online interactive recommendation, has attracted much research attention as it enjoys the best of both worlds. However, all existing collaborative bandit learning solutions impose a stationary assumption about the environment, i.e., both user preferences and the dependency among users are assumed static over time. Unfortunately, this assumption hardly holds in practice due to users' ever-changing interests and dependency relations, which inevitably costs a recommender system sub-optimal performance in practice. In this work, we develop a collaborative dynamic bandit solution to handle a changing environment for recommendation. We explicitly model the underlying changes in both user preferences and their dependency relation as a stochastic process. Individual user's preference is modeled by a mixture of globally shared contextual bandit models with a Dirichlet process prior. Collaboration among users is thus achieved via Bayesian inference over the global bandit models. To balance exploitation and exploration during the interactions, Thompson sampling is used for both model selection and arm selection. Our solution is proved to maintain a standard Õ(√T) Bayesian regret in this challenging environment. Extensive empirical evaluations on both synthetic and real-world datasets further confirmed the necessity of modeling a changing environment and our algorithm's practical advantages against several state-of-the-art online learning solutions.
Abstract: Web automation scripts (tasklets) are used by personal AI assistants to carry out human tasks such as reserving a car or buying movie tickets. Generating tasklets today is a tedious job which requires much manual effort. We propose Glider, an automated and scalable approach to generate tasklets from a natural language task query and a website URL. A major advantage of Glider is that it does not require any pre-training. Glider models tasklet extraction as a state space search, where agents can explore a website's UI and get rewarded when making progress towards task completion. The reward is computed based on the agent's navigating pattern and the similarity between its trajectory and the task query. A hierarchical reinforcement learning policy is used to efficiently find the action sequences that maximize the reward. To evaluate Glider, we used it to extract tasklets for tasks in various categories (shopping, real-estate, flights, etc.); in 79% of cases a correct tasklet was generated.
Abstract: Conversational recommender systems (CRS) enable the traditional recommender systems to explicitly acquire user preferences towards items and attributes through interactive conversations. Reinforcement learning (RL) is widely adopted to learn conversational recommendation policies to decide what attributes to ask, which items to recommend, and when to ask or recommend, at each conversation turn. However, existing methods mainly target at solving one or two of these three decision-making problems in CRS with separated conversation and recommendation components, which restrict the scalability and generality of CRS and fall short of preserving a stable training procedure. In the light of these challenges, we propose to formulate these three decision-making problems in CRS as a unified policy learning task. In order to systematically integrate conversation and recommendation components, we develop a dynamic weighted graph based RL method to learn a policy to select the action at each conversation turn, either asking an attribute or recommending items. Further, to deal with the sample efficiency issue, we propose two action selection strategies for reducing the candidate action space according to the preference and entropy information. Experimental results on two benchmark CRS datasets and a real-world E-Commerce application show that the proposed method not only significantly outperforms state-of-the-art methods but also enhances the scalability and stability of CRS.
Abstract: Today's open-domain conversational agents increase the informativeness of generated responses by leveraging external knowledge. Most of the existing approaches work only for scenarios with a massive amount of monolingual knowledge sources. For languages with limited availability of knowledge sources, it is not effective to use knowledge in the same language to generate informative responses. To address this problem, we propose the task of cross-lingual knowledge grounded conversation (CKGC), where we leverage large-scale knowledge sources in another language to generate informative responses. Two main challenges come with the task of cross-lingual knowledge grounded conversation: (1) knowledge selection and response generation in a cross-lingual setting; and (2) the lack of a test dataset for evaluation. To tackle the first challenge, we propose the curriculum self-knowledge distillation (CSKD) scheme, which utilizes a large-scale dialogue corpus in an auxiliary language to improve cross-lingual knowledge selection and knowledge expression in the target language via knowledge distillation. To tackle the second challenge, we collect a cross-lingual knowledge grounded conversation test dataset to facilitate relevant research in the future. Extensive experiments on the newly created dataset verify the effectiveness of our proposed curriculum self-knowledge distillation method for cross-lingual knowledge grounded conversation. In addition, we find that our proposed unsupervised method significantly outperforms the state-of-the-art baselines in cross-lingual knowledge selection.
Abstract: Review summarization aims to generate condensed text for online product reviews, and has attracted more and more attention in E-commerce platforms. In addition to the input review, the quality of generated summaries is highly related to the characteristics of users and products, e.g., their historical summaries, which could provide useful clues for the target summary generation. However, most previous works ignore the underlying interaction between the given input review and the corresponding historical summaries. Therefore, we aim to explore how to effectively incorporate the history information into the summary generation. In this paper, we propose a novel transformer-based reasoning framework for personalized review summarization. We design an elaborately adapted transformer network containing an encoder and a decoder, to fully infer the important and informative parts among the historical summaries in terms of the input review to generate more comprehensive summaries. In the encoder of our approach, we develop an inter- and intra-attention to involve the history information selectively to learn the personalized representation of the input review. In the decoder part, we propose to incorporate the constructed reasoning memory learning from historical summaries into the original transformer decoder, and design a memory-decoder attention module to retrieve more useful information for the final summary generation. Extensive experiments are conducted and the results show our approach could generate more reasonable summaries for recommendation, and outperform many competitive baseline methods.
Abstract: A typical journalistic convention in news articles is to deliver the most salient information in the beginning, also known as the lead bias. While this phenomenon can be exploited in generating a summary, it has a detrimental effect on teaching a model to discriminate and extract important information in general. We propose that this lead bias can be leveraged in our favor in a simple and effective way to pre-train abstractive news summarization models on large-scale unlabeled news corpora: predicting the leading sentences using the rest of an article. We collect a massive news corpus and conduct data cleaning and filtering via statistical analysis. We then apply self-supervised pre-training on this dataset to existing generation models BART and T5 for domain adaptation. Via extensive experiments on six benchmark datasets, we show that this approach can dramatically improve the summarization quality and achieve state-of-the-art results for zero-shot news summarization without any fine-tuning. For example, in the DUC2003 dataset, the ROUGE-1 score of BART increases 13.7% after the lead-bias pre-training. We deploy the model in Microsoft News and provide public APIs as well as a demo website for multi-lingual news summarization.
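The self-supervised example construction is straightforward to sketch: treat the leading sentences as the pseudo-summary target and the remainder of the article as the source. The sentence splitter and the number of lead sentences below are illustrative choices, not necessarily those used on the authors' corpus.

.. code-block:: python

    import re

    def lead_bias_pair(article: str, lead_sentences: int = 3):
        """Build a self-supervised pre-training example: predict the leading
        sentences (pseudo-summary) from the remainder of the article."""
        sents = re.split(r"(?<=[.!?])\s+", article.strip())
        target = " ".join(sents[:lead_sentences])
        source = " ".join(sents[lead_sentences:])
        return source, target

    article = ("A quake struck the coast early Monday. Officials reported no injuries. "
               "Power was briefly lost in two districts. Crews restored service by noon. "
               "Schools reopened the next day.")
    src, tgt = lead_bias_pair(article, lead_sentences=2)
    print("SOURCE:", src)
    print("TARGET:", tgt)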
Abstract: The task of natural language table retrieval (NLTR) seeks to retrieve semantically relevant tables based on natural language queries. Existing learning systems for this task often treat tables as plain text based on the assumption that tables are structured as dataframes. However, tables can have complex layouts which indicate diverse dependencies between subtable structures, such as nested headers. As a result, queries may refer to different spans of relevant content that is distributed across these structures. Moreover, such systems fail to generalize to novel scenarios beyond those seen in the training set. Prior methods are still distant from a generalizable solution to the NLTR problem, as they fall short in handling complex table layouts or queries over multiple granularities. To address these issues, we propose Graph-based Table Retrieval (GTR), a generalizable NLTR framework with multi-granular graph representation learning. In our framework, a table is first converted into a tabular graph, with cell nodes, row nodes and column nodes to capture content at different granularities. Then the tabular graph is input to a Graph Transformer model that can capture both table cell content and the layout structures. To enhance the robustness and generalizability of the model, we further incorporate a self-supervised pre-training task based on graph-context matching. Experimental results on two benchmarks show that our method leads to significant improvements over the current state-of-the-art systems. Further experiments demonstrate promising performance of our method on cross-dataset generalization, and enhanced capability of handling complex tables and fulfilling diverse query intents.
Abstract: Deep language models (deep LMs) are increasingly being used for full text retrieval or within cascade retrieval pipelines as later-stage re-rankers. A problem with using deep LMs is that, at query time, a slow inference step needs to be performed -- this hinders the practical adoption of these powerful retrieval models, or sensibly limits how many documents can be considered for re-ranking. We propose the novel, BERT-based, Term Independent Likelihood moDEl (TILDE), which ranks documents by both query and document likelihood. At query time, our model does not require the inference step of deep language model based retrieval approaches, thus providing consistent time savings, as the prediction of query terms' likelihood can be pre-computed and stored during index creation. This is achieved by relaxing the term dependence assumption made by the deep LMs. In addition, we have devised a novel bi-directional training loss which allows TILDE to maximise both query and document likelihood at the same time during training. At query time, TILDE can rely on its query likelihood component (TILDE-QL) alone, or on the combination of TILDE-QL and its document likelihood component (TILDE-DL), thus providing a flexible trade-off between efficiency and effectiveness. Exploiting both components provides the highest effectiveness at a higher computational cost, while relying only on TILDE-QL trades off effectiveness for faster response time due to no inference being required. TILDE is evaluated on the MS MARCO and TREC Deep Learning 2019 and 2020 passage ranking datasets. Empirical results show that, compared to other approaches that aim to make deep language models viable operationally, TILDE achieves competitive effectiveness coupled with low query latency.
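The efficiency idea, pre-computing per-term likelihoods for each document at indexing time and scoring a query by a simple sum under a term-independence assumption, can be sketched with a count-based stand-in for the BERT-derived likelihoods; the smoothing and out-of-vocabulary handling below are illustrative only.

.. code-block:: python

    import math
    from collections import Counter

    def index_doc(doc_tokens, vocab, alpha=0.1):
        """Pre-compute smoothed per-term log-likelihoods for one document
        (the expensive model call happens here, at indexing time, not at query time)."""
        counts = Counter(doc_tokens)
        total = len(doc_tokens)
        return {t: math.log((counts[t] + alpha) / (total + alpha * len(vocab))) for t in vocab}

    def score_query(query_tokens, doc_likelihoods):
        # term-independence assumption: just sum the pre-computed log-likelihoods
        return sum(doc_likelihoods.get(t, min(doc_likelihoods.values())) for t in query_tokens)

    vocab = {"deep", "ranking", "bert", "passage", "cat"}
    doc = "deep ranking with bert bert passage".split()
    likelihoods = index_doc(doc, vocab)
    print(score_query("bert passage ranking".split(), likelihoods))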
Abstract: Large-scale recommender systems mainly consist of two stages: matching and ranking. The matching stage (also known as the retrieval step) identifies a small fraction of relevant items from a billion-scale item corpus at low latency and computational cost. Item-to-item collaborative filtering (item-based CF) and embedding-based retrieval (EBR) have long been used in the industrial matching stage owing to their efficiency. However, item-based CF is hard to personalize, while EBR has difficulty in satisfying diversity. In this paper, we propose a novel matching architecture, Path-based Deep Network (PDN), which incorporates both personalization and diversity to enhance matching performance. Specifically, PDN is comprised of two modules: Trigger Net and Similarity Net. PDN utilizes Trigger Net to capture the user's interest in each of his/her interacted items. Similarity Net is devised to evaluate the similarity between each interacted item and the target item based on these items' profile and CF information. The final relevance between the user and the target item is calculated by explicitly considering the user's diverse interests, i.e., aggregating the relevance weights of the related two-hop paths (one hop of a path corresponds to a user-item interaction and the other to item-item relevance). Furthermore, we describe the architecture design of the proposed PDN in a leading real-world E-Commerce service (the Mobile Taobao App). Based on offline evaluations and an online A/B test, we show that PDN outperforms the existing solutions for the same task. The online results also demonstrate that PDN can retrieve more personalized and more diverse items to significantly improve user engagement. Currently, the PDN system has been successfully deployed in the Mobile Taobao App and handles major online traffic.
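The two-hop path aggregation can be sketched in a few lines; the sum aggregation and toy trigger/similarity weights below are illustrative, whereas PDN learns Trigger Net and Similarity Net end to end.

.. code-block:: python

    import numpy as np

    def pdn_score(trigger_weights, item_sims):
        """Path-based score sketch: aggregate relevance over two-hop paths
        user -> interacted item -> target item."""
        # one plausible aggregation; the paper learns both networks jointly
        return float(np.sum(trigger_weights * item_sims))

    # toy user with 3 interacted items and one candidate target item
    trigger = np.array([0.9, 0.2, 0.6])   # Trigger Net: user's interest in each interaction
    sims = np.array([0.7, 0.1, 0.8])      # Similarity Net: relatedness to the target item
    print(pdn_score(trigger, sims))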
Abstract: Ranking has always been one of the top concerns in information retrieval research. For decades, the lexical matching signal has dominated the ad-hoc retrieval process, but solely using this signal in retrieval may cause the vocabulary mismatch problem. In recent years, with the development of representation learning techniques, many researchers have turned to Dense Retrieval (DR) models for better ranking performance. Although several existing DR models have already obtained promising results, their performance improvement heavily relies on the sampling of training examples. Many effective sampling strategies are not efficient enough for practical usage, and for most of them there is still no theoretical analysis of how and why the performance improvement happens. To shed light on these research questions, we theoretically investigate different training strategies for DR models and try to explain why hard negative sampling performs better than random sampling. Through the analysis, we also find that there are many potential risks in static hard negative sampling, which is employed by many existing training methods. Therefore, we propose two training strategies, named the Stable Training Algorithm for dense Retrieval (STAR) and the query-side training Algorithm for Directly Optimizing Ranking pErformance (ADORE), respectively. STAR improves the stability of the DR training process by introducing random negatives. ADORE replaces the widely adopted static hard negative sampling method with a dynamic one to directly optimize the ranking performance. Experimental results on two publicly available retrieval benchmark datasets show that either strategy gains significant improvements over existing competitive baselines and that a combination of them leads to the best performance.
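A minimal sketch of a dense-retrieval training loss that mixes hard negatives with random negatives, the stabilising ingredient behind STAR, is shown below; the embedding sizes and temperature are illustrative, and ADORE's dynamic refresh of the hard negatives with the current model is only noted in the comments.

.. code-block:: python

    import torch
    import torch.nn.functional as F

    def dr_loss(q, pos, hard_negs, rand_negs, tau=0.05):
        """Contrastive loss sketch mixing hard negatives with random negatives
        (the stabilising idea behind STAR); ADORE would periodically re-retrieve
        the hard negatives with the current model instead of a static index."""
        negs = torch.cat([hard_negs, rand_negs], dim=0)
        logits = torch.cat([(q * pos).sum(-1, keepdim=True),  # positive similarity
                            q @ negs.T], dim=-1) / tau         # negative similarities
        labels = torch.zeros(q.size(0), dtype=torch.long)      # positive is index 0
        return F.cross_entropy(logits, labels)

    q = F.normalize(torch.randn(4, 16), dim=-1)      # 4 query embeddings
    pos = F.normalize(torch.randn(4, 16), dim=-1)    # their positive passages
    hard = F.normalize(torch.randn(8, 16), dim=-1)   # retrieved hard negatives
    rand = F.normalize(torch.randn(8, 16), dim=-1)   # in-batch / random negatives
    print(dr_loss(q, pos, hard, rand))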
Abstract: Pre-training and fine-tuning have achieved remarkable success in many downstream natural language processing (NLP) tasks. Recently, pre-training methods tailored for information retrieval (IR) have also been explored, and the latest success is the PROP method, which has reached new SOTA on a variety of ad-hoc retrieval benchmarks. The basic idea of PROP is to construct the representative words prediction (ROP) task for pre-training, inspired by the query likelihood model. Despite its exciting performance, the effectiveness of PROP might be bounded by the classical unigram language model adopted in the ROP task construction process. To tackle this problem, we propose a bootstrapped pre-training method (namely B-PROP) based on BERT for ad-hoc retrieval. The key idea is to use the powerful contextual language model BERT to replace the classical unigram language model for the ROP task construction, and to re-train BERT itself towards the tailored objective for IR. Specifically, we introduce a novel contrastive method, inspired by the divergence-from-randomness idea, to leverage BERT's self-attention mechanism to sample representative words from the document. By further fine-tuning on downstream ad-hoc retrieval tasks, our method achieves significant improvements over PROP and other baselines, and further pushes forward the SOTA on a variety of ad-hoc retrieval tasks.
Abstract: The evaluation of recommender systems relies on user preference data, which is difficult to acquire directly because of its subjective nature. Current recommender systems widely utilize users' historical interactions as implicit or explicit feedback, but such data usually suffers from various types of bias. Little work has been done on collecting and understanding user's personal preferences via third-party annotations. External assessments, that is, annotations made by assessors who are not the systems' users, have been widely used in information search scenarios. Is it possible to use external assessments to construct user preference labels? This paper presents the first attempt to incorporate external assessments into preference labeling and recommendation evaluation. The aim is to verify the possibility and reliability of external assessments for personalized recommender systems. We collect both users' real preferences and assessors' estimated preferences through a multi-role, multi-session user study. By investigating the inter-assessor agreement and user-assessor consistency, we demonstrate the reasonable stability and high accuracy of external preference assessments. Furthermore, we investigate the usage of external assessments in system evaluation. A higher degree of consistency with users' online feedback is observed, even better than traditional history-based online evaluation. Our findings show that external assessments can be used for assessing user preference labels and evaluating systems in personalized recommendation scenarios.
Abstract: The offline evaluation of search requires us to define a standard against which we measure the quality of results returned by a ranker. Frequently this standard is defined in absolute terms through relevance grades, but it can also be defined in relative terms through preferences. These preferences might be created through explicit preference judgments, derived from relevance grades, or inferred from clicks and other signals. Preferences from multiple sources might even be combined. In contrast to absolute grades, preferences avoid complex definitions of relevance, indicating only that a ranker should favor one result over another. Despite the simplicity and flexibility of preferences, widespread adoption has been limited by the lack of established evaluation measures. Recent work in this direction has taken two approaches: 1) measures based on weighted counts of agreements and disagreements between a set of preferences and an actual ranking generated by a ranker; and 2) measures that translate preferences into gain values for use with traditional measures, such as nDCG. Both approaches require methods for specifying weights or gains that have little or no theoretical foundation, and the values of these measures have no clear and meaningful interpretation. To address these problems, we propose an evaluation measure that computes the similarity between a directed multigraph of preferences and an actual ranking generated by a ranker. The measure computes an ordering for the vertices of the preference graph that maximizes its similarity to the actual ranking under a rank similarity measure. This maximum similarity becomes the value of the measure. Preference graphs are often acyclic, or nearly so, and to compute the measure we extend an approximate greedy algorithm that is known to produce good results for nearly acyclic graphs. For the rank similarity measure we employ Rank Biased Overlap (RBO) which was explicitly created to match the requirements of search and related applications. We validate the new measure over several collections of preferences explored in recent work.
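Since the proposed measure reports the maximum similarity under Rank-Biased Overlap, a truncated RBO between an ordering of the preference graph's vertices and the actual ranking can be sketched as follows; the full measure also handles the extrapolated tail and the greedy graph-ordering search, both omitted here.

.. code-block:: python

    def rbo(run_a, run_b, p=0.9):
        """Truncated Rank-Biased Overlap between two rankings (prefix evaluation;
        the full measure extrapolates the unseen tail, which is omitted here)."""
        depth = min(len(run_a), len(run_b))
        seen_a, seen_b, score = set(), set(), 0.0
        for d in range(1, depth + 1):
            seen_a.add(run_a[d - 1]); seen_b.add(run_b[d - 1])
            overlap = len(seen_a & seen_b) / d      # agreement at depth d
            score += (p ** (d - 1)) * overlap
        return (1 - p) * score

    # ordering of the preference graph's vertices vs. the ranker's actual output
    preferred_order = ["d3", "d1", "d4", "d2", "d5"]
    actual_ranking = ["d1", "d3", "d2", "d4", "d5"]
    print(round(rbo(preferred_order, actual_ranking), 3))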
Abstract: Traces of searcher behaviour, such as query reformulation or clicks, are commonly used to evaluate a running search engine. The underlying expectation is that these behaviours are proxies for something more important, such as relevance, utility, or satisfaction. Affective computing technology gives us the tools to help confirm some of these expectations, by examining visceral expressive responses during search sessions. However, work to date has only studied small populations in laboratory settings and with a limited number of contrived search tasks. In this study, we analysed longitudinal, in-situ, search behaviours of 152 information workers, over the course of several weeks while simultaneously tracking their facial expressions. Results from over 20,000 search sessions and 45,000 queries allow us to observe that indeed affective expressions are consistent with, and complementary to, existing "click-based'' metrics. On a query-level, searches that result in a short dwell time are associated with a decrease in smiles (expressions of "happiness'') and that if a query is reformulated the results of the reformulation are associated with an increase in smiling---suggesting a positive outcome as people converge on the information they need. On a session-level, sessions that feature reformulations are more commonly associated with fewer smiles and more furrowed brows (expressions of "anger/frustration''). Similarly, sessions with short-dwell clicks are also associated with fewer smiles. These data provide an insight into visceral aspects of search experience and present a new dimension for evaluating engine performance.
Abstract: Podcasts are spoken documents across a wide-range of genres and styles, with growing listenership across the world, and a rapidly lowering barrier to entry for both listeners and creators. The great strides in search and recommendation in research and industry have yet to see impact in the podcast space, where recommendations are still largely driven by word of mouth. In this perspective paper, we highlight the many differences between podcasts and other media, and discuss our perspective on challenges and future research directions in the domain of podcast information access.
Abstract: Evaluation efforts such as TREC, CLEF, NTCIR and FIRE, alongside public leaderboards such as MS MARCO, are intended to encourage research and track our progress, addressing big questions in our field. However, the goal is not simply to identify which run is "best", achieving the top score. The goal is to move the field forward by developing new robust techniques that work in many different settings and are adopted in research and practice. This paper uses the MS MARCO and TREC Deep Learning Track as our case study, comparing it to the case of TREC ad hoc ranking in the 1990s. We show how the design of the evaluation effort can encourage or discourage certain outcomes, and we raise questions about the internal and external validity of results. We provide some analysis of certain pitfalls, and a statement of best practices for avoiding such pitfalls. We summarize the progress of the effort so far, and describe our desired end state of "robust usefulness", along with steps that might be required to get us there.
Abstract: Recent research on conversational information seeking (CIS) mostly focuses on uni-modal interactions and information items. This perspective paper highlights the importance of moving towards developing and evaluating multi-modal conversational information seeking (MMCIS) systems, as they enable us to leverage richer context, overcome errors, and increase accessibility. We bridge the gap between the multi-modal and CIS research and provide a formal definition for MMCIS. We discuss potential opportunities and research challenges in designing, implementing, and evaluating MMCIS systems. Based on this research, we propose and implement a practical open-source framework for facilitating MMCIS research.
Abstract: Personalized news recommendation is a critical technology to help users find news of interest, and precisely matching users' interests with candidate news lies at the core of news recommendation. Existing studies generally learn a user's interest vector by aggregating his/her browsed news and then match it with the candidate news vector, which may lose the textual semantic matching signals for recommendation. In this paper, we propose an Attentive Multi-field Matching (AMM) framework for news recommendation which captures the semantic matching representations between each browsed news and the candidate news, and then aggregates them into the final user-news matching signal. In addition, our method incorporates multi-field information and designs a within-field and cross-field matching mechanism, which leverages complementary information from different fields (e.g., titles, abstracts and bodies) and obtains the multi-field matching representations. To achieve a comprehensive semantic understanding, we employ the popular language model BERT to learn the matching representation of each browsed-candidate news pair, and incorporate an attention mechanism in the aggregation procedure to characterize the importance of each matching representation for the final user-news matching signal. Experiments on real-world datasets validate the effectiveness of AMM.
Abstract: Mathematical Language Processing (MLP) deals with the automated processing and analysis of mathematical documents and relies heavily on good representations of mathematical symbols and texts. The aim of this work is to explore the modeling capabilities of state-of-the-art unsupervised deep learning methods to create such representations. Therefore, we pre-trained different instances of an ALBERT model on Mathematics StackExchange data and fine-tuned them on the task of Mathematical Answer Retrieval. Our evaluation shows that ALBERT outperforms all previous systems and is on par with current state-of-the-art systems for math retrieval, indicating strong capabilities in modeling mathematical posts. This implies that our approach can also be beneficial to various other tasks in MLP such as automatic proof checking or summarization of scientific texts.
Abstract: User simulation is needed for evaluating Interactive Information Retrieval (IIR) systems. However, for any user simulator to be useful, it must be reliable. In this paper, we propose a novel Tester-based approach to evaluating the reliability of user simulators: a Tester is constructed from a set of IR systems with an expected performance pattern and is then applied to a user simulator to check whether the simulator reproduces that pattern. We construct multiple Testers and apply them to a set of representative user simulators to empirically study the feasibility and effectiveness of the proposed Tester-based evaluation method. The results show that Tester-based evaluation is a feasible and effective method for evaluating user simulators and selecting reliable ones for evaluating IIR systems.
Abstract: Query categorization is an essential part of query intent understanding in e-commerce search. A common query categorization task is to select the relevant fine-grained product categories in a product taxonomy. For frequent queries, rich customer behavior (e.g., click-through data) can be used to infer the relevant product categories. However, for more rare queries, which cover a large volume of search traffic, relying solely on customer behavior may not suffice due to the lack of this signal. To improve categorization of rare queries, we adapt the Pseudo-Relevance Feedback (PRF) approach to utilize the latent knowledge embedded in semantically or lexically similar product documents to enrich the representation of the more rare queries. To this end, we propose a novel deep neural model named Attentive Pseudo Relevance Feedback Network (APRF-Net) to enhance the representation of rare queries for query categorization. To demonstrate the effectiveness of our approach, we collect search queries from a large commercial search engine, and compare APRF-Net to state-of-the-art deep learning models for text classification. Our results show that the APRF-Net significantly improves query categorization by 5.9% on F1@1 score over the baselines, which increases to 8.2% improvement for the rare (tail) queries. The findings of this paper can be leveraged for further improvements in search query representation and understanding.
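As a rough illustration of the pseudo-relevance-feedback idea behind APRF-Net (not the authors' architecture), the sketch below enriches a rare query's embedding with an attention-weighted average of its top-k most similar product-document embeddings; the function name, the number of feedback documents, and the mixing weight ``alpha`` are all hypothetical choices.

.. code-block:: python

    import numpy as np

    def prf_enrich(query_vec, doc_vecs, k=5, alpha=0.5):
        """Enrich a (rare) query embedding with pseudo-relevance feedback.

        query_vec: (d,) query embedding
        doc_vecs:  (n, d) product-document embeddings
        Returns an enriched (d,) query representation.
        """
        # Cosine similarities between the query and every document.
        q = query_vec / (np.linalg.norm(query_vec) + 1e-8)
        d = doc_vecs / (np.linalg.norm(doc_vecs, axis=1, keepdims=True) + 1e-8)
        sims = d @ q                               # (n,)

        # Take the top-k documents as pseudo-relevant feedback.
        top = np.argsort(-sims)[:k]
        weights = np.exp(sims[top])
        weights /= weights.sum()                   # attention-style weights

        feedback = weights @ doc_vecs[top]         # weighted centroid of feedback docs
        return alpha * query_vec + (1 - alpha) * feedback

    rng = np.random.default_rng(0)
    docs = rng.normal(size=(100, 32))
    rare_query = rng.normal(size=32)
    print(prf_enrich(rare_query, docs).shape)      # (32,)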
Abstract: Sequential recommendation characterizes evolving user patterns by modeling item sequences chronologically. Its essential target is to capture the item transition correlations. Recent developments in transformers have inspired the community to design effective sequence encoders, e.g., SASRec and BERT4Rec. However, we observe that these transformer-based models suffer from the cold-start issue, i.e., performing poorly for short sequences. Therefore, we propose to augment short sequences while still preserving the original sequential correlations. We introduce a new framework for Augmenting Sequential Recommendation with Pseudo-prior items (ASReP). We first pre-train a transformer with sequences in a reverse direction to predict prior items. Then, we use this transformer to generate fabricated historical items at the beginning of short sequences. Finally, we fine-tune the transformer using these augmented sequences in chronological order to predict the next item. Experiments on two real-world datasets verify the effectiveness of ASReP. The code is available at https://github.com/DyGRec/ASReP.
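A minimal sketch of the augmentation step in the spirit of ASReP (not its exact implementation), assuming a reverse-direction next-item predictor is already trained and passed in as the hypothetical callable ``predict_prior_item``; pseudo-prior items are prepended to a short sequence until a target length is reached.

.. code-block:: python

    from typing import Callable, List

    def augment_short_sequence(seq: List[int],
                               predict_prior_item: Callable[[List[int]], int],
                               target_len: int = 10) -> List[int]:
        """Prepend pseudo-prior items generated by a reverse-trained model."""
        augmented = list(seq)
        while len(augmented) < target_len:
            # The reverse-trained model reads the sequence right-to-left and
            # predicts the item that plausibly preceded it.
            prior = predict_prior_item(augmented)
            augmented.insert(0, prior)
        return augmented

    # Toy stand-in for the reverse-trained transformer: always predicts item 0.
    dummy_model = lambda seq: 0
    print(augment_short_sequence([42, 7, 13], dummy_model, target_len=6))
    # [0, 0, 0, 42, 7, 13]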
Abstract: How to quickly and reliably learn the preferences of new users remains a key challenge in the design of recommender systems. In this paper we introduce a new type of online learning algorithm, cluster-based bandits, to address this challenge. It exploits the fact that users can often be grouped into clusters based on the similarity of their preferences, which allows accelerated learning of new user preferences: the task becomes one of identifying which cluster a user belongs to, and typically there are far fewer clusters than there are items to be rated. Clustering by itself is not enough, however. Intra-cluster variability between users can be thought of as adding noise to user ratings. Deterministic methods such as decision trees perform poorly in the presence of such noise. We identify so-called distinguisher items that are particularly informative for deciding which cluster a new user belongs to despite the rating noise. Using these items, the cluster-based bandit algorithm is able to efficiently adapt to user responses and rapidly learn the correct cluster to assign to a new user.
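To make the distinguisher-item idea concrete, here is an illustrative sketch under simplifying assumptions (it ignores the bandit's exploration and noise handling, and all names are hypothetical): items whose mean ratings spread the most across cluster centroids are picked as distinguishers, and a new user is assigned to the closest centroid on those items.

.. code-block:: python

    import numpy as np

    def pick_distinguishers(centroids, n_items=3):
        """Pick items whose mean ratings differ the most across clusters."""
        spread = centroids.max(axis=0) - centroids.min(axis=0)     # per-item spread
        return np.argsort(-spread)[:n_items]

    def assign_cluster(centroids, item_ids, user_ratings):
        """Assign a new user to the cluster whose centroid best matches the
        ratings the user gave on the distinguisher items."""
        diffs = centroids[:, item_ids] - np.asarray(user_ratings)   # (k, n_items)
        return int(np.argmin((diffs ** 2).sum(axis=1)))

    # Toy example: 3 clusters, 5 items, ratings on a 1-5 scale.
    centroids = np.array([[5, 1, 3, 2, 4],
                          [1, 5, 3, 4, 2],
                          [3, 3, 3, 3, 3]], dtype=float)
    items = pick_distinguishers(centroids, n_items=2)    # most informative items
    answers = [5, 1]                                     # new user's ratings on them
    print(items, assign_cluster(centroids, items, answers))   # -> cluster 0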
Abstract: Online search latency is a major bottleneck in deploying large-scale pre-trained language models, e.g. BERT, in retrieval applications. Inspired by recent advances in transformer-based document expansion techniques, we propose to trade offline relevance weighting for online retrieval efficiency by utilizing the powerful BERT ranker to weight the neighbour documents collected by generated pseudo-queries for each document. In the online retrieval stage, the traditional query-document matching is reduced to the much less expensive query to pseudo-query matching, and a document rank list is quickly recalled according to the pre-computed neighbour documents. Extensive experiments on the standard MS MARCO dataset with both passage and document ranking tasks demonstrate promising results of our method in terms of both online efficiency and effectiveness.
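The sketch below illustrates the offline/online split described above under stated assumptions (it is not the authors' system): neighbour document lists are assumed to have been pre-ranked offline by an expensive ranker such as BERT, so the online step is only a cheap query to pseudo-query match followed by a cache lookup. All identifiers and sizes are hypothetical.

.. code-block:: python

    import numpy as np

    # Offline: each pseudo-query has an embedding and a neighbour list that was
    # pre-ranked by an expensive ranker, so no heavy model needs to run online.
    pseudo_query_vecs = np.random.default_rng(0).normal(size=(1000, 64))
    precomputed_neighbours = {i: [f"doc_{i}_{r}" for r in range(10)]
                              for i in range(1000)}       # hypothetical doc ids

    def online_retrieve(query_vec, top_pq=3):
        """Cheap online step: query -> nearest pseudo-queries -> cached doc lists."""
        sims = pseudo_query_vecs @ query_vec               # dot-product matching
        nearest = np.argsort(-sims)[:top_pq]
        ranked = []
        for pq in nearest:                                 # merge cached neighbour lists
            ranked.extend(precomputed_neighbours[int(pq)])
        return ranked

    print(online_retrieve(np.random.default_rng(1).normal(size=64))[:5])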
Abstract: In recent years, legal case retrieval has attracted much attention in the IR research community. It aims to retrieve supporting cases for a given query case and contributes to better legal systems. While using a legal case retrieval system, users often find it difficult to construct accurate queries that express their information needs, especially when they lack sufficient domain knowledge. Since conversational search has been widely recognized to fulfill users' complex and exploratory information needs, we investigate whether the conversational search paradigm can be adopted to improve users' legal case retrieval experience. We design a laboratory-based study to collect users' interaction behaviors and explicit feedback signals while using traditional and agent-mediated conversational legal case retrieval systems. Based on the collected data, we compare the search behavior and outcomes of these two interaction paradigms. Experimental results show that users achieve better retrieval performance with the conversational case retrieval system than with the traditional one. Moreover, the conversational system can also save users' efforts in formulating queries and examining results.
Abstract: While neural recommenders have become the state-of-the-art in recent years, the complexity of deep models still makes the generation of tangible explanations for end users a challenging problem. Existing methods are usually based on attention distributions over a variety of features, which are still questionable regarding their suitability as explanations, and rather unwieldy to grasp for an end user. Counterfactual explanations based on a small set of the user's own actions have been shown to be an acceptable solution to the tangibility problem. However, current work on such counterfactuals cannot be readily applied to neural models. In this work, we propose ACCENT, the first general framework for finding counterfactual explanations for neural recommenders. It extends recently-proposed influence functions for identifying training points most relevant to a recommendation, from a single to a pair of items, while deducing a counterfactual set in an iterative process. We use ACCENT to generate counterfactual explanations for two popular neural models, Neural Collaborative Filtering (NCF) and Relational Collaborative Filtering (RCF), and demonstrate its feasibility on a sample of the popular MovieLens 100K dataset.
Abstract: The two-tower architecture has been widely applied for learning item and user representations, which is important for large-scale recommender systems. Many two-tower models are trained using various in-batch negative sampling strategies, where the effects of such strategies inherently rely on the size of mini-batches. However, training two-tower models with a large batch size is inefficient, as it demands a large volume of memory for item and user contents and consumes a lot of time for feature encoding. Interestingly, we find that neural encoders can output relatively stable features for the same input after warming up in the training process. Based on such facts, we propose a simple yet effective sampling strategy called Cross-Batch Negative Sampling (CBNS), which takes advantage of the encoded item embeddings from recent mini-batches to boost the model training. Both theoretical analysis and empirical evaluations demonstrate the effectiveness and the efficiency of CBNS.
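To illustrate the cross-batch idea, here is a minimal numpy sketch (not the paper's implementation): a FIFO memory of item embeddings from recent mini-batches is appended to the in-batch negatives when computing a sampled-softmax loss. The queue size and temperature are hypothetical, and since this is numpy there are no gradients; in the real method the cached embeddings are treated as (approximately) stable features rather than back-propagated through.

.. code-block:: python

    import numpy as np
    from collections import deque

    memory = deque(maxlen=4096)      # FIFO queue of item embeddings from past batches

    def cbns_loss(user_emb, item_emb, temperature=0.07):
        """Softmax loss over in-batch plus cross-batch negatives.

        user_emb, item_emb: (B, d) embeddings of matched user/item pairs.
        """
        negatives = [item_emb]                         # in-batch candidates
        if memory:
            negatives.append(np.stack(memory))          # cross-batch negatives
        candidates = np.concatenate(negatives, axis=0)  # (B + M, d)

        logits = user_emb @ candidates.T / temperature  # (B, B + M)
        logits -= logits.max(axis=1, keepdims=True)     # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        labels = np.arange(user_emb.shape[0])           # i-th item is the positive
        loss = -log_probs[labels, labels].mean()

        memory.extend(item_emb)                         # enqueue current batch
        return loss

    rng = np.random.default_rng(0)
    for _ in range(3):                                  # a few toy training steps
        u, v = rng.normal(size=(32, 16)), rng.normal(size=(32, 16))
        print(round(cbns_loss(u, v), 3))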
Abstract: Users' clicks on Web search results are one of the key signals for evaluating and improving web search quality and have been widely used as part of current state-of-the-art Learning-To-Rank (LTR) models. With a large volume of search logs available for major search engines, effective models of searcher click behavior have emerged to evaluate and train LTR models. However, when modeling the users' click behavior, considering the bias of the behavior is imperative. In particular, when a search result is not clicked, it is not necessarily judged as not relevant by the user, but instead could have been simply missed, especially for lower-ranked results. If these kinds of biases in the click log data are not accounted for by the click models, the errors propagate to the resulting LTR ranking models or evaluation metrics. In this paper, we propose the De-biased Reinforcement Learning Click model (DRLC). The DRLC model relaxes previously made assumptions about the users' examination behavior and resulting latent states. To implement the DRLC model, convolutional neural networks are used as the value networks for reinforcement learning, trained to learn a policy to reduce bias in the click logs. To demonstrate the effectiveness of the DRLC model, we first compare performance with previous state-of-the-art approaches using established click prediction metrics, including log-likelihood and perplexity. We further show that DRLC also leads to improvements in ranking performance. Our experiments demonstrate the effectiveness of the DRLC model in learning to reduce bias in click logs, leading to improved modeling performance and showing the potential for using DRLC for improving Web search quality.
Abstract: With the new developments of natural language processing, increasing attention has been given to the task of Named Entity Recognition (NER). However, the vast majority of work focuses on a small number of large-scale annotated datasets with a limited number of entity types such as person, location and organization. While other datasets have been introduced with domain-specific entities, their smaller size largely limits the applicability of state-of-the-art deep models. Even though there are promising new approaches for performing zero-shot learning (ZSL), they are not designed for cross-domain settings. We propose Cross Domain Zero Shot Named Entity Recognition with Knowledge Graph (DOZEN), which learns the relations between entities across different domains from an existing ontology of external knowledge and a set of analogies linking entities and domains. Experiments performed on both large-scale and domain-specific datasets indicate that DOZEN is the most suitable option to extract unseen entities in a target dataset from a different domain.
Abstract: Unbiased recommender learning has been actively studied to alleviate the inherent bias of implicit datasets under the missing-not-at-random assumption. Existing studies solely address the bias of positive feedback but do not account for the bias of missing feedback, which leads to sub-optimal performance. This paper proposes a dual recommender learning framework that simultaneously eliminates the bias of clicked and unclicked data. Specifically, the proposed loss function adopts two propensity-weighting terms to effectively estimate the true positive and negative preferences from clicked and unclicked data. We also prove that the proposed loss function converges to the ideal loss function for both clicked and unclicked data. Because of its model-agnostic property, it can be applied to any existing unbiased learning model. Experimental results show that the proposed method outperforms state-of-the-art unbiased models by up to 5.54-24.56% for MAP@1 on three datasets.
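The loss below is an illustrative inverse-propensity-style objective, not the exact one proposed in the paper: clicked items are up-weighted by the inverse of a click propensity and unclicked items by the inverse of a (hypothetical) missing-feedback propensity, so that both positive and negative preferences are estimated with less bias. All variable names are assumptions.

.. code-block:: python

    import numpy as np

    def dual_ips_loss(scores, clicks, p_click, p_unclick, eps=1e-8):
        """Illustrative dual propensity-weighted binary cross-entropy.

        scores:    (n,) predicted preference scores in (0, 1)
        clicks:    (n,) 1 if the item was clicked, 0 otherwise
        p_click:   (n,) propensity of observing a click for a relevant item
        p_unclick: (n,) propensity of a relevant item remaining unclicked
        """
        pos_term = clicks / (p_click + eps) * np.log(scores + eps)
        neg_term = (1 - clicks) / (p_unclick + eps) * np.log(1 - scores + eps)
        return -(pos_term + neg_term).mean()

    rng = np.random.default_rng(0)
    scores = rng.uniform(0.05, 0.95, size=100)
    clicks = rng.integers(0, 2, size=100)
    p_c = rng.uniform(0.2, 0.9, size=100)
    p_u = rng.uniform(0.2, 0.9, size=100)
    print(round(dual_ips_loss(scores, clicks, p_c, p_u), 3))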
Abstract: Personalized news recommendation is an essential technique for online news services. News articles usually contain rich textual content, and accurate news modeling is important for personalized news recommendation. Existing news recommendation methods mainly model news texts based on traditional text modeling methods, which is not optimal for mining the deep semantic information in news texts. Pre-trained language models (PLMs) are powerful for natural language understanding, which has the potential for better news modeling. However, there is no public report that shows PLMs have been applied to news recommendation. In this paper, we report our work on pre-trained language models empowered news recommendation (PLM-NR). Offline experimental results on both monolingual and multilingual news recommendation datasets show that leveraging PLMs for news modeling can effectively improve the performance of news recommendation. Our PLM-NR models have been deployed to the Microsoft News platform, and online flight results show that they can achieve significant performance gains in both English-speaking and global markets.
Abstract: Recently, BERT has shown overwhelming performance in sequential recommendation by using a bidirectional attention mechanism. Although the bidirectional model effectively captures dynamics from user interactions, its training strategy does not fit well with the inference stage in sequential recommendation, which generally proceeds in a left-to-right way. To address this problem, we introduce a new recommendation system built upon BART, which is widely used in NLP tasks. BART uses a left-to-right decoder and injects noise into its bidirectional encoder, which can reduce the gap between training and inference. However, direct usage of BART for recommendation systems is challenging due to its model properties and the domain difference. BART is an auto-regressive generative model, and its noising transformation techniques were originally developed for text sequences. In this paper, we present a novel sequential recommendation model, Entangled BART for Recommendation (E-BART4Rec), that entangles a bidirectional encoder and an auto-regressive decoder with noisy transformations for user interactions. Unlike BART, where the final output only depends on the output of the decoder, E-BART4Rec dynamically integrates the output of the bidirectional encoder and the auto-regressive decoder based on a gating mechanism that calculates the importance of each output. We also apply noisy transformations that imitate real users' behaviors, such as item deletion, item cropping, item reversal, and item infilling, to the input of the encoder. Extensive experiments on widely used real-world datasets demonstrate that our models significantly outperform the baselines.
Abstract: Using entity aspect links, we improve upon the current state-of-the-art in entity retrieval. Entity retrieval is the task of retrieving relevant entities for search queries, such as "Antibiotic Use In Livestock". Entity aspect linking is a new technique to refine the semantic information of entity links. For example, while passages relevant to the query above may mention the entity "USA", there are many aspects of the USA of which only few, such as "USA/Agriculture", are relevant for this query. By using entity aspect links that indicate which aspect of an entity is being referred to in the context of the query, we obtain more specific relevance indicators for entities. We show that our approach improves upon all baseline methods, including the current state-of-the-art using a standard entity retrieval test collection. With this work, we release a large collection of entity-aspect-links for a large TREC corpus.
Abstract: Experimental evaluation is regarded as a critical element of any research activity in Information Retrieval, and is typically used to support assertions of the form "Technique A provides better retrieval effectiveness than does Technique B". Implicit in such claims are the characteristics of the data to which the results apply, in terms of both the queries used and the documents they were applied to. Here we explore the role of evaluation on a collection as a prediction of relative performance on collections that have different characteristics. In particular, by synthesizing new collections that vary from each other in a controlled way, we show that it is possible to explore the reliability of an IR evaluation pipeline, and to better understand the complex interrelationship between documents, queries, and metrics that is an important part of any experimental validation. Our results show that predictivity declines as the collection is varied, even in simple ways such as shifting in focus from one document source to another similar source.
Abstract: Deep cross-modal retrieval methods have shown their competitiveness among different cross-modal retrieval algorithms. Generally, these methods require a large amount of training data. However, aggregating large amounts of data incurs huge privacy risks and high maintenance costs. Inspired by the recent success of federated learning, we propose federated cross-modal retrieval (FedCMR), which learns the model with decentralized multi-modal data. Specifically, we first train the cross-modal retrieval model and learn the common space across multiple modalities in each client using its local data. Then, we jointly learn the common subspace of multiple clients on the trusted central server. Finally, each client updates the common subspace of its local model based on the aggregated common subspace on the server, so that all clients participating in the training can benefit from federated learning. Experimental results on four benchmark datasets demonstrate the effectiveness of the proposed method.
Abstract: Named entity recognition (NER) for Web queries is very challenging. Queries often do not consist of well-formed sentences, and contain very little context, with highly ambiguous queried entities. Code-mixed queries, with entities in a different language than the rest of the query, pose a particular challenge in domains like e-commerce (e.g. queries containing movie or product names). This work tackles NER for code-mixed queries, where entities and non-entity query terms co-exist simultaneously in different languages. Our contributions are twofold. First, to address the lack of code-mixed NER data we create EMBER, a large-scale dataset in six languages with four different scripts. Based on Bing query data, we include numerous language combinations that showcase real-world search scenarios. Secondly, we propose a novel gated architecture that enhances existing multi-lingual Transformers with a Mixture-of-Experts model to dynamically infuse multi-lingual gazetteers, allowing it to simultaneously differentiate and handle entities and non-entity query terms in multiple languages. Experimental evaluation on code-mixed queries in several languages shows that our approach efficiently utilizes gazetteers to recognize entities in code-mixed queries with an F1=68%, an absolute improvement of +31% over a non-gazetteer baseline.
Abstract: The rapidly rising ubiquity and dissemination of online information such as social media text and news improve users' access to financial markets; however, modeling these vast streams of irregular, temporal data poses a challenge. Such temporal streams of information show power-law dynamics, scale-free characteristics, and time irregularities that sequential models are unable to model accurately. In this work, we propose the first Hierarchical Time-Aware Hyperbolic LSTM (HTLSTM), which leverages the Riemannian manifold for encoding the scale-free nature of a sequence of text in a time-aware fashion. Through experiments on three financial tasks: stock trading, equity price movement prediction, and financial risk prediction, we demonstrate HTLSTM's applicability for modeling temporal sequences of online information. On real-world data from four global stock markets and three stock indices spanning data in English and Chinese, we make a step towards time-aware text modeling via hyperbolic geometry.
Abstract: Sequential recommendation (SR) has attracted much research attention in the past few years. Most existing attribute-integrated SR models do not directly model the complex relations between items and categorical attributes, nor do they exploit the power of the attribute sequence in predicting the next item. In this paper, we propose an Item Categorical Attribute Integrated Sequential Recommendation (ICAI-SR) framework, which consists of an Item-Attribute Aggregation (IAA) model and Entity Sequential (ES) models. In the IAA model, we employ a heterogeneous graph to represent the complex relations between items and different types of categorical attributes; an attention-based neighborhood aggregation is then designed to model the correlations between items and attributes. For ES models, there is one Item Sequential (IS) model and one or more Attribute Sequential (AS) models. With IS and AS models, not only the item sequence but also the attribute sequence is used to predict the next item during model training. ICAI-SR is instantiated by taking Gated Recurrent Unit (GRU) and Bidirectional Encoder Representations from Transformers (BERT) as ES models, resulting in ICAI-GRU and ICAI-BERT, respectively. Extensive experiments have been conducted on three public datasets to validate the performance of ICAI-SR. Experimental results show that ICAI-SR performs better than both basic SR models and a competitive attribute-integrated SR model.
Abstract: Query logs of search engines with instant search functionality are challenging for log analysis, since the log entries represent interactions at the keystroke level, rather than at the query level. To enable log analyses at the query level, a user's logged sequence of keystroke-level interactions needs to be mapped to distinct queries. This problem bears strong parallels to session detection in "standard" query logs (i.e., forming groups of subsequent queries on the same topic), but there are salient differences. In this paper, we present a new approach to identifying interactions belonging to the same query in instant query logs. In an experimental comparison, our new approach achieves an F2 score of 0.93, compared to 0.83 for a state-of-the-art cascading method for query log session detection.
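The abstract does not detail the proposed method, so as a point of reference only, the snippet below sketches a simple baseline-style heuristic for the task: consecutive keystroke-level entries are grouped into the same query when they are prefix-related and close in time. The threshold and data layout are hypothetical assumptions.

.. code-block:: python

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Entry:
        timestamp: float      # seconds
        text: str             # query box content at this keystroke

    def group_into_queries(entries: List[Entry], max_gap: float = 5.0) -> List[List[Entry]]:
        """Group keystroke-level log entries into distinct queries."""
        queries: List[List[Entry]] = []
        for e in entries:
            if queries:
                prev = queries[-1][-1]
                prefix_related = e.text.startswith(prev.text) or prev.text.startswith(e.text)
                if prefix_related and (e.timestamp - prev.timestamp) <= max_gap:
                    queries[-1].append(e)      # continuation of the same query
                    continue
            queries.append([e])                # start of a new query
        return queries

    log = [Entry(0.0, "s"), Entry(0.4, "se"), Entry(0.9, "sea"),
           Entry(9.0, "movie"), Entry(9.5, "movies")]
    print([q[-1].text for q in group_into_queries(log)])   # ['sea', 'movies']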
Abstract: The Transformer-Kernel (TK) model has demonstrated strong reranking performance on the TREC Deep Learning benchmark---and can be considered to be an efficient (but slightly less effective) alternative to other Transformer-based architectures that employ (i) large-scale pretraining (high training cost), (ii) joint encoding of query and document (high inference cost), and (iii) larger number of Transformer layers (both high training and high inference costs). Since then, a variant of the TK model, called TKL, has been developed that incorporates local self-attention to efficiently process longer input sequences in the context of document ranking. In this work, we propose a novel Conformer layer as an alternative approach to scale TK to longer input sequences. Furthermore, we incorporate query term independence and explicit term matching to extend the model to the full retrieval setting. We benchmark our models under the strictly blind evaluation setting of the TREC 2020 Deep Learning track and find that our proposed architecture changes lead to improved retrieval quality over TKL. Our best model also outperforms all non-neural runs ("trad") and two-thirds of the pretrained Transformer-based runs ("nnlm") on NDCG@10.
Abstract: Recommendation systems can help users process large amounts of information, and generative adversarial networks (GANs) show great potential in recommendation systems. In this paper, we propose a new GAN model to enhance the information flow within the generator based on the information flow between the original generator and discriminator. Our experimental results indicate that our model reduces the discrepancy between the generator and the discriminator. Both the generator and discriminator yield considerable performance improvements compared to other strong baselines. The improvements in NDCG@3 and MRR are significant, reaching 30.98% and 30.17%, respectively.
Abstract: Knowledge graphs are widely used in information retrieval as they can enhance our semantic understanding of queries and documents. The main idea is to consider entities and entity relationships as side information. Although existing work has achieved improvements in retrieval effectiveness by incorporating information from knowledge graphs into retrieval models, few studies have leveraged knowledge graphs in understanding users' search behavior. We investigate user behavior during session search from the perspective of a knowledge graph. We conduct a query log-based analysis of users' query reformulation and document clicking behavior. Based on a large-scale commercial query log and a knowledge graph, we find new user behavior patterns in terms of query reformulation and document clicking. Our study deepens our understanding of user behavior in session search and provides implications to help improve retrieval models with knowledge graphs.
Abstract: Accurately estimating the retrieval effectiveness of different queries representing distinct information needs is a problem in Information Retrieval (IR) that has been studied for over 20 years. Recent work showed that the problem can be significantly harder when multiple queries representing the same information need are used in prediction. By generalizing the existing evaluation framework of Query Performance Prediction (QPP), we explore the causes of these differences in prediction quality in the two scenarios. Our empirical analysis demonstrates that for most predictors, this difference is solely an artifact of the underlying differences in the query effectiveness distributions. Our detailed analysis also demonstrates key performance distribution properties under which QPP is most and least reliable.
Abstract: An embedding index that enables fast approximate nearest neighbor (ANN) search serves as an indispensable component of state-of-the-art deep retrieval systems. Traditional approaches, which often separate the two steps of embedding learning and index building, incur additional indexing time and decayed retrieval accuracy. In this paper, we propose a novel method called Poeem, which stands for product quantization based embedding index jointly trained with deep retrieval model, to unify the two separate steps within end-to-end training, by utilizing a few techniques including the gradient straight-through estimator, a warm start strategy, optimal space decomposition and Givens rotation. Extensive experimental results show that the proposed method not only improves retrieval accuracy significantly but also reduces the indexing time to almost none. We have open sourced our approach for the sake of comparison and reproducibility.
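To make the product-quantization component concrete, here is a toy numpy sketch of quantizing an embedding against per-subvector codebooks and measuring the reconstruction error. In Poeem the codebooks are trained jointly with the retrieval model (via the straight-through estimator), which is omitted here; the random codebooks and sizes below are purely illustrative.

.. code-block:: python

    import numpy as np

    def pq_quantize(x, codebooks):
        """Quantize a vector with product quantization.

        x:         (d,) embedding
        codebooks: (m, k, d // m) one codebook of k codewords per sub-vector
        Returns (codes, reconstruction).
        """
        m, k, sub = codebooks.shape
        parts = x.reshape(m, sub)                               # split into m sub-vectors
        dists = ((codebooks - parts[:, None, :]) ** 2).sum(-1)  # (m, k) squared distances
        codes = dists.argmin(axis=1)                            # nearest codeword per sub-vector
        recon = codebooks[np.arange(m), codes].reshape(-1)      # concatenate chosen codewords
        return codes, recon

    rng = np.random.default_rng(0)
    d, m, k = 64, 8, 256
    codebooks = rng.normal(size=(m, k, d // m))
    x = rng.normal(size=d)
    codes, recon = pq_quantize(x, codebooks)
    print(codes.shape, float(np.linalg.norm(x - recon)))        # (8,) plus the error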
Abstract: Neural information retrieval systems typically use a cascading pipeline, in which a first-stage model retrieves a candidate set of documents and one or more subsequent stages re-rank this set using contextualized language models such as BERT. In this paper, we propose DeepImpact, a new document term-weighting scheme suitable for efficient retrieval using a standard inverted index. Compared to existing methods, DeepImpact improves impact-score modeling and tackles the vocabulary-mismatch problem. In particular, DeepImpact leverages DocT5Query to enrich the document collection and, using a contextualized language model, directly estimates the semantic importance of tokens in a document, producing a single-value representation for each token in each document. Our experiments show that DeepImpact significantly outperforms prior first-stage retrieval approaches by up to 17% on effectiveness metrics w.r.t. DocT5Query, and, when deployed in a re-ranking scenario, can reach the same effectiveness of state-of-the-art approaches with up to 5.1x speedup in efficiency.
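As an illustration of how single-value impact scores support standard inverted-index retrieval, the snippet below builds a toy impact index and scores a query by summing the impacts of matching terms. In DeepImpact those scores come from a contextualized language model over a DocT5Query-expanded document; here they are simply given numbers, and all identifiers are hypothetical.

.. code-block:: python

    from collections import defaultdict

    def build_impact_index(doc_impacts):
        """doc_impacts: {doc_id: {token: impact}} -> token -> [(doc_id, impact)]."""
        index = defaultdict(list)
        for doc_id, impacts in doc_impacts.items():
            for token, score in impacts.items():
                index[token].append((doc_id, score))
        return index

    def score_query(index, query_tokens):
        """Document score = sum of per-token impacts, as in impact-based retrieval."""
        scores = defaultdict(float)
        for token in query_tokens:
            for doc_id, impact in index.get(token, []):
                scores[doc_id] += impact
        return sorted(scores.items(), key=lambda kv: -kv[1])

    # Hypothetical impacts (in DeepImpact these are predicted by a neural model).
    docs = {"d1": {"antibiotic": 2.1, "livestock": 1.7, "farm": 0.4},
            "d2": {"antibiotic": 1.2, "resistance": 2.4}}
    index = build_impact_index(docs)
    print(score_query(index, ["antibiotic", "livestock"]))   # d1 ranks above d2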
Abstract: Recent deployment of efficient billion-scale approximate nearest neighbor (ANN) search algorithms on GPUs has motivated information retrieval researchers to develop neural ranking models that learn low-dimensional dense representations for queries and documents and use ANN search for retrieval. However, optimizing these dense retrieval models poses several challenges including negative sampling for (pair-wise) training. A recent model, called ANCE, successfully uses dynamic negative sampling using ANN search. This paper improves upon ANCE by proposing a robust negative sampling strategy for scenarios where the training data lacks complete relevance annotations. This is of particular importance as obtaining large-scale training data with complete relevance judgment is extremely expensive. Our model uses a small validation set with complete relevance judgments to accurately estimate a negative sampling distribution for dense retrieval models. We also explore leveraging a lexical matching signal during training and pseudo-relevance feedback during evaluation for improved performance. Our experiments on the TREC Deep Learning Track benchmarks demonstrate the effectiveness of our solutions.
Abstract: Self-attention networks (SANs) have been intensively applied for sequential recommenders, but they are limited due to: (1) the quadratic complexity and vulnerability to over-parameterization in self-attention; (2) inaccurate modeling of sequential relations between items due to the implicit position encoding. In this work, we propose the low-rank decomposed self-attention networks (LightSANs) to overcome these problems. Particularly, we introduce the low-rank decomposed self-attention, which projects user's historical items into a small constant number of latent interests and leverages item-to-interest interaction to generate the context-aware representation. It scales linearly w.r.t. the user's historical sequence length in terms of time and space, and is more resilient to over-parameterization. Besides, we design the decoupled position encoding, which models the sequential relations between items more precisely. Extensive experimental studies are carried out on three real-world datasets, where LightSANs outperform the existing SANs-based recommenders in terms of both effectiveness and efficiency.
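A shapes-only numpy sketch of the item-to-interest idea (no learning, random matrices standing in for learned parameters, not the authors' code): the length-n item sequence is first summarized into k latent interests, and attention is then computed between items and interests, so the cost scales with n*k rather than n^2.

.. code-block:: python

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def low_rank_attention(items, k=8, seed=0):
        """items: (n, d) item representations -> (n, d) context-aware representations."""
        n, d = items.shape
        rng = np.random.default_rng(seed)
        W_pool = rng.normal(size=(d, k)) / np.sqrt(d)     # stand-in for learned parameters

        # 1) Summarize the length-n sequence into k latent interests: (k, d).
        pool = softmax(items @ W_pool, axis=0)            # per-interest weights over items
        interests = pool.T @ items

        # 2) Item-to-interest attention: an (n, k) map instead of the quadratic (n, n) one.
        attn = softmax(items @ interests.T / np.sqrt(d), axis=1)
        return attn @ interests                           # (n, d)

    seq = np.random.default_rng(1).normal(size=(200, 64))
    print(low_rank_attention(seq).shape)                  # (200, 64)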
Abstract: Sequential recommendation is intended to model the dynamic behavior regularity through users' behavior sequences. Recently, various deep learning techniques are applied to model the relation of items in the sequences. Despite their effectiveness, we argue that the aforementioned methods only consider the macro-structure of the behavior sequence, but neglect the micro-structure in the sequence which is important to sequential recommendation. To address the above limitation, we propose a novel model called Motif-aware Sequential Recommendation (MoSeR), which captures the motifs hidden in behavior sequences to model the micro-structure features. MoSeR extracts the motifs that contain both the last behavior and the target item. These motifs reflect the topological relations among local items in the form of directed graphs. Thus our method can make a more accurate prediction with the awareness of the inherent patterns between local items. Extensive experiments on three benchmark datasets demonstrate that our model outperforms the state-of-the-art sequential recommendation models.
Abstract: Autoencoder-based hybrid recommender systems have become popular recently because of their ability to learn user and item representations by reconstructing various information sources, including users' feedback on items (e.g., ratings) and side information of users and items (e.g., users' occupation and items' title). However, existing systems still use representations learned by matrix factorization (MF) to predict the rating, while using representations learned by neural networks as the regularizer. In this paper, we define the neural representation for prediction (NRP) framework and apply it to the autoencoder-based recommendation systems. We theoretically analyze how our objective function is related to the previous MF and autoencoder-based methods and explain what it means to use neural representations as the regularizer. We also apply the NRP framework to a direct neural network structure which predicts the ratings without reconstructing the user and item information. We conduct extensive experiments which confirm that neural representations are better for prediction than regularization and show that the NRP framework outperforms the state-of-the-art methods in the prediction task, with less training time and memory.
Abstract: Various researchers have recently explored the impact of different types of biases on information retrieval tasks such as ad hoc retrieval and question answering. While the impact of bias needs to be controlled in order to avoid increased prejudices, the literature has often viewed the relationship between increased retrieval utility (effectiveness) and reduced bias as a tradeoff where one can suffer from the other. In this paper, we empirically study this tradeoff and explore whether it would be possible to reduce bias while maintaining similar retrieval utility. We show this would be possible by revising the input query through a bias-aware pseudo-relevance feedback framework. We report our findings based on four widely used TREC corpora namely Robust04, Gov2, ClueWeb09 and ClueWeb12 and using two classes of bias metrics. The findings of this paper are significant as they are among the first to show that decrease in bias does not necessarily need to come at the cost of reduced utility.
Abstract: In this work, we address multi-modal information needs that contain text questions and images by focusing on passage retrieval for outside-knowledge visual question answering. This task requires access to outside knowledge, which in our case we define to be a large unstructured passage collection. We first conduct sparse retrieval with BM25 and study expanding the question with object names and image captions. We verify that visual clues play an important role and captions tend to be more informative than object names in sparse retrieval. We then construct a dual-encoder dense retriever, with the query encoder being LXMERT, a multi-modal pre-trained transformer. We further show that dense retrieval significantly outperforms sparse retrieval that uses object expansion. Moreover, dense retrieval matches the performance of sparse retrieval that leverages human-generated captions.
Abstract: Wikipedia, the largest open-collaborative online encyclopedia, is a corpus of documents bound together by internal hyperlinks. These links form the building blocks of a large network whose structure contains important information on the concepts covered in this encyclopedia. The presence of a link between two articles, materialised by an anchor text in the source page pointing to the target page, can increase readers' understanding of a topic. However, the process of linking follows specific editorial rules to avoid both under-linking and over-linking. In this paper, we study the transductive and the inductive tasks of link prediction on several subsets of the English Wikipedia and identify some key challenges behind automatic linking based on anchor text information. We propose an appropriate evaluation sampling methodology and compare several algorithms. Moreover, we propose baseline models that provide a good estimation of the overall difficulty of the tasks.
Abstract: Learning-to-rank systems often utilize user-item interaction data (e.g., clicks) to provide users with high-quality rankings. However, this data suffers from several biases, and if naively used as training data, it can lead to suboptimal ranking algorithms. Most existing bias-correcting methods focus on position bias, the fact that higher-ranked results are more likely to receive interaction, and address this bias by leveraging inverse propensity weighting. However, it is not always possible to accurately estimate propensity scores, and in addition to position bias, selection bias is often encountered in real-world recommender systems. Selection bias occurs because users are exposed to a truncated list of results, which gives a zero chance for some items to be observed and, therefore, interacted with, even if they are relevant. Here, we propose a new counterfactual method that uses a two-stage correction approach and jointly addresses selection and position bias in learning-to-rank systems without relying on propensity scores. Our experimental results show that our method is better than state-of-the-art propensity-independent methods and either better than or comparable to methods that make the strong assumption that the propensity model is known.
Abstract: Traditionally, recommender systems provide a list of suggestions to a user based on past interactions with items of this user. These recommendations are usually based on user preferences for items and generated with a delay. Critiquing recommender systems allow users to provide immediate feedback to recommendations with tags and receive a new set of recommendations in response. However, these systems often require rich item descriptions that contain relevance scores indicating the strength, with which a tag applies to an item. For example, this relevance score could indicate how violent the movie "The Godfather" is on a scale from 0 to 1. Retrieving these data is a very demanding process, as it requires users to explicitly indicate the degree to which a tag applies to an item. This process can be improved with machine learning methods that predict tag relevance. In this paper, we explore the dataset from a different study, where the authors collected relevance scores on movie-tag pairs. In particular, we define the tag relevance prediction problem, explore the inconsistency of relevance scores provided by users as a challenge of this problem and present a method, which outperforms the state-of-the-art method for predicting tag relevance. We found a moderate inconsistency of user relevance scores. We also found that users tend to disagree more on subjective tags, such as "good acting", "bad plot" or "quotable" than on objective tags, such as "animation", "cars" or "wedding", but the disagreement of users regarding objective tags is also moderate.
Abstract: Personalized news recommendation aims to alleviate information overload and help users find news of their interests. Accurately matching candidate news and users' interests is the key to news recommendation. Most existing methods separately encode each user and news item into vectors from news contents and then match the two vectors. However, a user's interests may differ across news articles, or across topics within a single article. It is therefore necessary to dynamically learn the user and news vectors and model their interaction. In this work, we present Recurrent Reasoning Memory Network over BERT (RMBERT) for news recommendation. Compared with other methods, our approach can leverage the content modeling ability of BERT. Moreover, the recurrent reasoning memory network, which performs a series of attention-based reasoning steps, can dynamically learn the user and news vectors and model their interaction at each step. As a result, our approach can better model users' interests. We conduct extensive experiments on a real-world news recommendation dataset and the results show that our approach significantly outperforms existing state-of-the-art methods.
Abstract: Recent studies show that neural models for natural language processing are usually fragile under adversarial attacks (e.g., character-level insertion and word-level synonym substitution), which exposes their lack of robustness. Most defense techniques are tailored to specific semantic-level attacks and cannot mitigate multi-level attacks simultaneously. Adversarial training has been shown to be effective in increasing model robustness. However, it often suffers from degradation on normal data, especially when the proportion of adversarial examples increases. To address this, we propose mixup regularized adversarial training (MRAT) against multi-level attacks. Our method can utilize multiple adversarial examples to increase intrinsic model robustness without sacrificing the performance on normal data. We evaluate our method on text classification and entailment tasks. Experimental results on different text encoders (BERT, LSTM and CNN) under multi-level attacks show that our method outperforms adversarial training consistently.
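A small sketch of the mixup regularization idea applied to clean and adversarial examples (feature-level interpolation with Beta-sampled coefficients). This is a generic illustration under stated assumptions, not the authors' exact training procedure; the adversarial example is assumed to keep its clean label, and all names are hypothetical.

.. code-block:: python

    import numpy as np

    def mixup(x_clean, y_clean, x_adv, y_adv, alpha=0.4, seed=0):
        """Interpolate clean and adversarial examples and their (one-hot) labels."""
        rng = np.random.default_rng(seed)
        lam = rng.beta(alpha, alpha, size=(x_clean.shape[0], 1))  # per-example mixing weight
        x_mix = lam * x_clean + (1 - lam) * x_adv
        y_mix = lam * y_clean + (1 - lam) * y_adv
        return x_mix, y_mix

    rng = np.random.default_rng(1)
    x, x_adv = rng.normal(size=(8, 128)), rng.normal(size=(8, 128))
    y = np.eye(2)[rng.integers(0, 2, size=8)]         # one-hot labels
    x_mix, y_mix = mixup(x, y, x_adv, y)               # adversarial label equals clean label
    print(x_mix.shape, y_mix.shape)                    # (8, 128) (8, 2)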
Abstract: A fundamental challenge for sequential recommenders is to capture the sequential patterns of users toward modeling how users transit among items. In many practical scenarios, however, there are a great number of cold-start users with only minimal logged interactions. As a result, existing sequential recommendation models will lose their predictive power due to the difficulties in learning sequential patterns over users with only limited interactions. In this work, we aim to improve sequential recommendation for cold-start users with a novel framework named MetaTL, which learns to model the transition patterns of users through meta-learning. Specifically, the proposed MetaTL: (i) formulates sequential recommendation for cold-start users as a few-shot learning problem; (ii) extracts the dynamic transition patterns among users with a translation-based architecture; and (iii) adopts meta transitional learning to enable fast learning for cold-start users with only limited interactions, leading to accurate inference of sequential interactions.
Abstract: Social influence is essential to social recommendation. Current influence-based social recommendation focuses on the explicit influence along observed social links. However, in real cases, implicit social influence can also impact users' preferences in an unobserved way. In this work, we consider two kinds of implicit influence: the Local Implicit Influence of persons along unobserved interpersonal relations, and the Global Implicit Influence of items broadcast to users. We improve state-of-the-art GNN-based social recommendation methods by modeling the two kinds of implicit influence separately. Local implicit influence is incorporated by predicting unobserved social relationships. Global implicit influence is incorporated by defining the global popularity of each item and personalizing the impact of this popularity on each user. In a GCN network, explicit and implicit influences are integrated to learn the social embeddings of users and items in social recommendation. Experimental results on Yelp preliminarily demonstrate the effectiveness of the proposed model.
Abstract: Neural passage retrieval is a new and promising approach in open retrieval question answering. In this work, we stress-test the Dense Passage Retriever (DPR)---a state-of-the-art (SOTA) open domain neural retrieval model---on closed and specialized target domains such as COVID-19, and find that it lags behind standard BM25 in this important real-world setting. To make DPR more robust under domain shift, we explore its fine-tuning with synthetic training examples, which we generate from unlabeled target domain text using a text-to-text generator. In our experiments, this noisy but fully automated target domain supervision gives DPR a sizable advantage over BM25 in out-of-domain settings, making it a more viable model in practice. Finally, an ensemble of BM25 and our improved DPR model yields the best results, further pushing the SOTA for open retrieval QA on multiple out-of-domain test sets.
Abstract: Session-based recommendation aims to predict the next item that is most likely to be clicked by an anonymous user, based on his/her clicking sequence within one visit. It becomes an essential function of many recommender systems since it protects privacy. However, as the accumulated session records keep increasing, it becomes challenging to model the user interests since they would drift when the time span is large. Efforts have been devoted to handling dynamic user interests by modeling all historical sessions at one time or conducting offline retraining regularly. These solutions are far from practical requirements in terms of efficiency and capturing timely user interests. To this end, we propose a memory-efficient framework - TASRec. It constructs a graph for each day to model the relations among items. Thus, the same item on different days could have different neighbors, corresponding to the drifting user interests. We design a tailored graph neural network to embed this dynamic graph of items and learn temporal augmented item representations. Based on this, we leverage a sequential neural architecture to predict the next item of a given sequence. Experiments on real-world datasets demonstrate that TASRec outperforms state-of-the-art session-based recommendation methods.
Abstract: Recently, much progress in natural language processing has been driven by deep contextualized representations pretrained on large corpora. Typically, the fine-tuning of these pretrained models for a specific downstream task is based on single-view learning, which is however inadequate, as a sentence can be interpreted differently from different perspectives. Therefore, in this work, we propose a text-to-text multi-view learning framework by incorporating an additional view---the text generation view---into a typical single-view passage ranking model. Empirically, the proposed approach improves ranking performance compared to its single-view counterpart. Component analysis is also reported in the paper.
Abstract: Educational recommender systems channel most of the research efforts on the effectiveness of the recommended items. While teachers have a central role in online platforms, the impact of recommender systems for teachers in terms of the exposure such systems give to the courses is an under-explored area. In this paper, we consider data coming from a real-world platform and analyze the distribution of the recommendations w.r.t. the geographical provenience of the teachers. We observe that data is highly imbalanced towards the United States, in terms of offered courses and of interactions. These imbalances are exacerbated by recommender systems, which overexpose the country w.r.t. its representation in the data, thus generating unfairness for teachers outside that country. To introduce equity, we propose an approach that regulates the share of recommendations given to the items produced in a country (visibility) and the position of the items in the recommended list (exposure).
Abstract: Cold-start problems are enormous challenges in practical recommender systems. One promising solution for this problem is cross-domain recommendation (CDR) which leverages rich information from an auxiliary (source) domain to improve the performance of recommender system in the target domain. In these CDR approaches, the family of Embedding and Mapping methods for CDR (EMCDR) is very effective, which explicitly learn a mapping function from source embeddings to target embeddings with overlapping users. However, these approaches suffer from one serious problem: the mapping function is only learned on limited overlapping users, and the function would be biased to the limited overlapping users, which leads to unsatisfying generalization ability and degrades the performance on cold-start users in the target domain. With the advantage of meta learning which has good generalization ability to novel tasks, we propose a transfer-meta framework for CDR (TMCDR) which has a transfer stage and a meta stage. In the transfer (pre-training) stage, a source model and a target model are trained on source and target domains, respectively. In the meta stage, a task-oriented meta network is learned to implicitly transform the user embedding in the source domain to the target feature space. In addition, the TMCDR is a general framework that can be applied upon various base models, e.g., MF, BPR, CML. By utilizing data from Amazon and Douban, we conduct extensive experiments on 6 cross-domain tasks to demonstrate the superior performance and compatibility of TMCDR.
Abstract: Click-through rate (CTR) prediction based on deep neural networks has made significant progress in recommendation systems. However, these methods often suffer from CTR underestimation due to insufficient impressions for long-tail items. When formalizing CTR prediction as a contextual bandit problem, exploration methods provide a natural solution to this issue. In this paper, we first benchmark state-of-the-art exploration methods in the recommendation system setting. We find that the combination of gradient-based uncertainty modeling and Thompson Sampling achieves a significant advantage. On the basis of this benchmark, we further propose a general enhancement strategy, Underestimation Refinement (UR), which explicitly incorporates the prior knowledge that insufficient impressions likely lead to CTR underestimation. This strategy is applicable to almost all existing exploration methods. Experimental results validate UR's effectiveness, achieving consistent improvement across all baseline exploration methods.
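As a generic illustration of Thompson Sampling-based exploration for CTR (the paper pairs it with gradient-based uncertainty modeling; here a simple Gaussian posterior over each item's CTR estimate plays that role, and all numbers are hypothetical), an item is chosen by sampling a score from its posterior rather than using the point estimate, which naturally gives under-explored long-tail items a share of impressions.

.. code-block:: python

    import numpy as np

    def thompson_select(ctr_mean, ctr_std, rng):
        """Sample a plausible CTR per item and show the highest sample."""
        sampled = rng.normal(ctr_mean, ctr_std)
        return int(np.argmax(sampled))

    rng = np.random.default_rng(0)
    # Item 2 has a slightly lower point estimate but high uncertainty
    # (few impressions), so Thompson Sampling still shows it sometimes.
    ctr_mean = np.array([0.050, 0.048, 0.045])
    ctr_std  = np.array([0.002, 0.002, 0.030])
    picks = [thompson_select(ctr_mean, ctr_std, rng) for _ in range(1000)]
    print([picks.count(i) for i in range(3)])   # item 2 gets a meaningful share of traffic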
Abstract: We study keyphrase extraction (KPE) from Web documents. Our key contribution is encoding Web documents to leverage structure, such as titles or anchors, by building a graph of words representing both (a) position-based proximity and (b) structural relations. We evaluate KPE performance on the real-world search engine NAVER and on human-annotated KPE benchmarks, and our method outperforms the state of the art in both settings.
Abstract: Because it significantly reduces cognitive effort, multimedia content has become an increasingly important information type nowadays. More and more descriptions are coupled with images to make them more attractive and persuasive. Currently, several text-image retrieval methods have been developed to improve the efficiency of this time-consuming and professional process. However, in practical retrieval applications, it is vivid and terse descriptions that are widely used, instead of shallow captions that merely describe what is contained. Therefore, most existing methods designed for caption-style text cannot achieve this purpose. To eliminate the mismatch, we introduce a novel problem of description-image retrieval and propose a specially designed method, named Adapted Graph Reasoning and Filtration (AGRF). In AGRF, we first leverage an adapted graph reasoning network to discover the combinations of visual objects in the image. Then, a cross-modal gate mechanism is proposed to cast aside those description-independent combinations. Experimental results on a real-world dataset demonstrate the advantages of AGRF over the state-of-the-art methods.
Abstract: Detecting sarcastic expressions could promote the understanding of natural language in social media. In this paper, we revisit sarcasm detection from a novel perspective, so as to account for long-range literal sentiment inconsistencies. More concretely, we explore a novel scenario of constructing an affective graph and a dependency graph for each sentence, based on the affective information retrieved from external affective commonsense knowledge and the syntactical information of the sentence. Based on these graphs, an Affective Dependency Graph Convolutional Network (ADGCN) framework is proposed to draw long-range incongruity patterns and inconsistent expressions over the context for sarcasm detection by interactively modeling the affective and dependency information. Experimental results on multiple benchmark datasets show that our proposed approach outperforms the current state-of-the-art methods in sarcasm detection.
Abstract: Digital-forms are commonly used for collecting structured information from users. However, filling digital-forms that include a large number of fields is tedious and error-prone. Auto-filling form fields for the user is highly beneficial for improving user experience and potentially collecting more valuable information (in cases where not all fields are mandatory). Online E-commerce marketplaces quite often utilize such forms to collect listing attributes from sellers. In this work, we describe Form-BERT -- a Transformer-based model which is optimized for auto-filling listing attributes given the following inputs: free-text, list of known attribute names, and zero or more attribute values. Form-BERT can be further used iteratively to leverage filled out attributes as the form filling progresses.
Abstract: Criminal Court View Generation is an essential task in legal intelligence, which aims to automatically generate sentences interpreting judgment results. The court view could be seen as the summary of crime circumstances in a case, including ADjudging Circumstance (ADC) and SEntencing Circumstance (SEC). However, different circumstances vary widely, and adopting them to generate court views directly may limit the generation performance. Therefore, it is necessary to identify the ADC and SEC related sentences in case facts and enhance them into the court view generation, respectively. To this end, in this paper, we propose a novel Circumstances enhanced Criminal Court View Generation (C3VG) method, consisting of the extraction and generation stage. Specifically, in the extraction stage, we design a Circumstances Selector to select ADC and SEC related sentences. After that, we apply them to two generators to generate the circumstances enhanced court views, respectively. After merging the two types of court views, we could obtain the final court views. We evaluate C3VG by conducting extensive experiments on a real-world dataset and experimental results clearly validate the effectiveness of our proposed model.
Abstract: Natural language query grounding in videos is a challenging task that requires comprehensive understanding of the query, the video, and the fusion of information across these modalities. Existing methods mostly emphasize query-to-video one-way interaction with a late fusion scheme, lacking effective ways to capture the relationship within and between query and video in a fine-grained manner. Moreover, current methods are often overly complicated, resulting in long training times. We propose a self-attention mechanism together with a cross-interaction multi-head attention mechanism in an early fusion scheme to capture video-query intra-dependencies as well as inter-relations from both directions (query-to-video and video-to-query). The cross-attention method can associate query words and video frames at any position and account for long-range dependencies in the video context. In addition, we propose a multi-task training objective that includes start/end prediction and moment segmentation. The moment segmentation task provides additional training signals that remedy the start/end prediction noise caused by annotator disagreement. Our simple yet effective architecture enables speedy training (within 1 hour on an AWS P3.2xlarge GPU instance) and instant inference. We show that the proposed method achieves superior performance compared to complex state-of-the-art methods, in particular surpassing the SOTA on high IoU metrics (R@1, IoU=0.7) by 3.52% absolute (11.09% relative) on the Charades-STA dataset.
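The bidirectional query-video cross-attention described above can be sketched with standard attention modules. The following is a minimal, hypothetical illustration using PyTorch's built-in multi-head attention; the dimensions, module names, and overall wiring are assumptions, not the authors' implementation.

.. code-block:: python

    # Hypothetical sketch of bidirectional query<->video cross-attention (early fusion).
    import torch
    import torch.nn as nn

    class BiCrossAttention(nn.Module):
        def __init__(self, dim=256, heads=8):
            super().__init__()
            self.q2v = nn.MultiheadAttention(dim, heads, batch_first=True)  # query attends to video
            self.v2q = nn.MultiheadAttention(dim, heads, batch_first=True)  # video attends to query

        def forward(self, query_feats, video_feats):
            # query_feats: (B, Lq, D) word features; video_feats: (B, Lv, D) frame features
            q_ctx, _ = self.q2v(query_feats, video_feats, video_feats)
            v_ctx, _ = self.v2q(video_feats, query_feats, query_feats)
            return q_ctx, v_ctx  # direction-specific, video-aware and query-aware representations

    q = torch.randn(2, 12, 256)   # 12 query tokens
    v = torch.randn(2, 64, 256)   # 64 video frames
    q_ctx, v_ctx = BiCrossAttention()(q, v)
    print(q_ctx.shape, v_ctx.shape)  # torch.Size([2, 12, 256]) torch.Size([2, 64, 256])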
Abstract: Fine-grained image-text retrieval is a challenging but vital technology in the field of multimedia analysis. Existing methods mainly focus on learning a common embedding space of images (or patches) and sentences (or words), whereby their mapped features in such an embedding space can be directly compared. Nevertheless, most existing image-text retrieval works rarely consider the shared semantic concepts that potentially correlate the heterogeneous modalities, which can enhance the discriminative power of learning such an embedding space. Toward this end, we propose a Cross-Graph Attention model (CGAM) to explicitly learn the shared semantic concepts, which can be well utilized to guide the feature learning process of each modality and promote common embedding learning. More specifically, we build a semantic-embedded graph for each modality, and smooth the discrepancy between the two modalities via a cross-graph attention model to obtain shared semantic-enhanced features. Meanwhile, we reconstruct image and text features via the shared semantic concepts and original embedding representations, and leverage a multi-head mechanism for similarity calculation. Accordingly, a semantic-enhanced cross-modal embedding between image and text is discriminatively obtained to benefit fine-grained retrieval with high retrieval performance. Extensive experiments on benchmark datasets show performance improvements in comparison with state-of-the-art methods.
Abstract: Spelling Error Correction (SEC), which detects and corrects spelling errors in a text, has a wide range of applications in human language understanding. Earlier solutions, including statistics-based methods and one-stage and two-stage machine learning-based methods, cannot build deeply bidirectional models, which significantly confines their learning ability. With the recently emerging masked language models, transformer-based networks have achieved remarkable success in SEC. However, current transformer-based Chinese SEC algorithms are all end-to-end methods, which suffer from high false alarm rates because they correct each character of the sentence regardless of its correctness. This issue becomes even more severe when only a small fraction of characters in the whole sentence are incorrect. To solve this problem, we propose a cloze-style detector-corrector framework (DCSpell) that first detects whether a character is erroneous before correcting it. Specifically, DCSpell employs the discriminator of ELECTRA as the Detector to detect the positions of incorrect characters. The Detector is trained with the sample-efficient replaced token detection pre-training task, and thus allows domain adaptation with a small amount of data. After that, a transformer-based Corrector is used to find the correct character for each detected position. It employs sentence pairs as input, which potentially incorporates knowledge of phonological and visual similarity. A confusion-set-based post-processing step is used to further improve performance. Experiments show that DCSpell achieves a 15.7% improvement on the SIGHAN dataset and a 6.6% improvement on a dataset transcribed from a real-world acoustic speech corpus compared to the state-of-the-art methods in terms of F1 score.
Abstract: Effectively predicting the size of information cascades is crucial for understanding the evolution of many social applications, such as influence maximization and fake news detection. Conventional methods face the challenge of data imbalance which, in turn, yields unsatisfactory prediction performance. To prevent the loss functions or metrics from being affected by extreme values and to ensure numerical stability, previous works reformulate the problem definition or adopt other types of evaluation metrics. However, solving the regression prediction of information cascades from a long-tailed distribution perspective remains underexplored. In this paper, we propose a general decoupled prediction solution -- first extracting the representation, then fine-tuning the regressor -- which combines the original prediction value and a weighted bias generated by a sub-network (SUB) that we design. Our experiments on long-tailed benchmarks demonstrate that our method significantly improves prediction accuracy over state-of-the-art methods and mitigates the long-tailed cascade prediction problem.
Abstract: Recently, the rapidly growing popularity of short videos on different Internet platforms has intensified the need for background music (BGM) retrieval systems. However, existing video-music retrieval methods based only on the visual modality cannot deliver promising performance on videos with fine-grained virtual content. In this paper, we additionally investigate the widely added voice-overs in short videos and propose a novel framework to retrieve BGM for fine-grained short videos. In our framework, we use self-attention (SA) and cross-modal attention (CMA) modules to explore the intra- and inter-relationships of different modalities, respectively. To balance the modalities, we dynamically assign different weights to the modal features via a fusion gate. To pair the query and the BGM embeddings, we introduce a triplet pseudo-label loss to constrain the semantics of the modal embeddings. As there are no existing virtual-content video-BGM retrieval datasets, we build and release two virtual-content video datasets, HoK400 and CFM400. Experimental results show that our method achieves superior performance and outperforms other state-of-the-art methods by large margins.
Abstract: Click-through rate (CTR) prediction plays an important role in online advertising and recommender systems. In practice, the training of CTR models depends on click data, which is intrinsically biased towards higher positions since higher positions naturally have higher CTR. Existing methods, such as actual-position training with fixed-position inference and inverse propensity weighted training with no-position inference, alleviate the bias problem to some extent. However, the different treatment of position information between training and inference inevitably leads to inconsistency and sub-optimal online performance. Meanwhile, the basic assumption of these methods, i.e., that the click probability is the product of examination probability and relevance probability, is oversimplified and insufficient to model the rich interaction between position and other information. In this paper, we propose a Deep Position-wise Interaction Network (DPIN) to efficiently combine all candidate items and positions for estimating CTR at each position, achieving consistency between offline and online as well as modeling the deep non-linear interaction among position, user, context, and item within the limits of serving performance. Following our new treatment of position bias in CTR prediction, we propose a new evaluation metric named PAUC (position-wise AUC) that is suitable for measuring ranking quality at a given position. Through extensive experiments on a real-world dataset, we show empirically that our method is both effective and efficient in solving the position bias problem. We have also deployed our method in production and observed statistically significant improvements over a highly optimized baseline in a rigorous A/B test.
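One way to read the proposed position-wise AUC is as an AUC computed separately over the impressions served at each position and then averaged. The abstract does not spell out the exact aggregation, so the following is only a hedged sketch under the assumption of impression-count weighting.

.. code-block:: python

    # Hedged sketch of a position-wise AUC (PAUC): per-position AUCs weighted by impression count.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    def pauc(labels, scores, positions):
        labels, scores, positions = map(np.asarray, (labels, scores, positions))
        total, weight = 0.0, 0
        for p in np.unique(positions):
            mask = positions == p
            if labels[mask].min() == labels[mask].max():
                continue  # AUC is undefined when only one class appears at this position
            total += roc_auc_score(labels[mask], scores[mask]) * mask.sum()
            weight += mask.sum()
        return total / weight if weight else float("nan")

    print(pauc(labels=[1, 0, 0, 1, 0, 1],
               scores=[0.9, 0.2, 0.4, 0.8, 0.1, 0.3],
               positions=[1, 1, 1, 2, 2, 2]))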
Abstract: Click-through rate (CTR) prediction is a crucial task in many applications (e.g., recommender systems). Recently, deep learning based models have been proposed and successfully applied to CTR prediction by focusing on feature interaction or user interest based on the item-to-item relevance between user behaviors and the candidate item. However, these existing models neglect the user-to-user relevance between the target user and those who like the candidate item, which can reflect the preference of the target user. To this end, in this paper, we propose a novel Deep User Match Network (DUMN) which measures user-to-user relevance for CTR prediction. Specifically, in DUMN, we design a User Representation Layer to learn a unified user representation which contains the user's latent interest based on user behaviors. Then, a User Match Layer is designed to measure user-to-user relevance by matching the target user with those who have interacted with the candidate item and modeling their similarities in the user representation space. Extensive experimental results on three public real-world datasets validate the effectiveness of DUMN compared with state-of-the-art methods.
Abstract: Given a long text, a summarization system aims to produce a shorter highlight while keeping the important information of the original text. For customer service, the summaries of most dialogues between an agent and a user focus on several fixed key points, such as the user's question, the user's purpose, and the agent's solution. Traditional extractive methods struggle to extract all predefined key points exactly. Furthermore, there is a lack of large-scale, high-quality extractive summarization datasets containing key points. To address these challenges, we propose a Distant Supervision based Machine Reading Comprehension model for extractive Summarization (DSMRC-S). DSMRC-S transforms the summarization task into a machine reading comprehension problem, fetching key points from the original text exactly according to predefined questions. In addition, a distant supervision method is proposed to alleviate the lack of eligible extractive summarization datasets. We conduct experiments on a large-scale summarization dataset collected in customer service scenarios, and the results show that the proposed DSMRC-S outperforms strong baseline methods by 4 points on ROUGE-L.
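To make the summarization-as-reading-comprehension idea concrete, the sketch below phrases each predefined key point as a question and lets an off-the-shelf extractive QA model pull the answer span from the dialogue. The questions and the default Hugging Face pipeline model are illustrative stand-ins, not the DSMRC-S setup itself.

.. code-block:: python

    # Illustrative only: predefined key points as questions answered by an extractive QA model.
    from transformers import pipeline

    qa = pipeline("question-answering")  # downloads a default extractive QA model
    dialogue = ("User: My order #123 arrived damaged and I would like a refund. "
                "Agent: Sorry about that, I have issued a full refund to your card.")
    key_point_questions = {
        "user_question": "What problem does the user report?",
        "agent_solution": "What solution does the agent provide?",
    }
    summary = {k: qa(question=q, context=dialogue)["answer"]
               for k, q in key_point_questions.items()}
    print(summary)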
Abstract: Social media have brought threats like cyberbullying, which can lead to stress, anxiety, depression, and, in some severe cases, suicide attempts. Detecting cyberbullying can help to warn/block bullies and provide support to victims. However, very few studies have used self-attention-based language models like BERT for cyberbullying detection, and they typically only report BERT's performance without examining in depth the reasons for it. In this work, we examine the use of BERT for cyberbullying detection on various datasets and attempt to explain its performance by analyzing its attention weights and gradient-based feature importance scores for textual and linguistic features. Our results show that attention weights do not correlate with feature importance scores and thus do not explain the model's performance. Additionally, they suggest that BERT relies on syntactical biases in the datasets to assign feature importance scores to class-related words rather than cyberbullying-related linguistic features.
Abstract: In modern clinical medicine, the electrocardiogram (ECG) is a common diagnostic technique for cardiovascular diseases. The purpose of this paper is to propose a novel model-based clustering approach for analyzing ECG data. Our approach is composed of two modules: representation learning and ECG data clustering. In the representation learning module, a deep generative model referred to as the hyperspherical variational recurrent autoencoder (HVRAE) is developed to extract representations of observed ECG data, based on the variational autoencoder (VAE) with long short-term memory (LSTM) networks. In the ECG data clustering module, we develop a nonparametric hidden Markov model (NHMM) based on the Dirichlet process, in which the number of hidden states is inferred automatically during the learning process. Moreover, the emission density of each hidden state of our NHMM follows a mixture of von Mises-Fisher (VMF) distributions, which have a better capability for modeling ECG representations than other commonly used distributions (such as the Gaussian distribution). To learn the proposed VMF-based NHMM, we theoretically develop an effective learning algorithm based on variational Bayes. The merits of our model-based clustering approach for analyzing ECG data are verified through experiments on publicly available ECG datasets.
Abstract: We revisit the Bipartite Graph Partitioning approach to document reordering (Dhulipala et al., KDD 2016), and consider a range of algorithmic and heuristic refinements that lead to faster computation of index-minimizing document orderings. Our final implementation executes approximately four times faster than the reference implementation we commence with, and obtains the same, or slightly better, compression effectiveness on three large text collections.
Abstract: The delayed feedback problem is one of the imperative challenges in online advertising, caused by the highly diversified feedback delay of a conversion, which varies from a few minutes to several days. It is hard to design an appropriate online learning system under such non-identical delays across different types of ads and users. In this paper, we propose to tackle the delayed feedback problem in online advertising by "Following the Prophet" (FTP for short). The key insight is that, if the feedback came instantly for all the logged samples, we could obtain a model without delayed feedback, namely the "prophet". Although the prophet cannot be obtained during online learning, we show that we can predict the prophet's predictions with an aggregation policy on top of a set of multi-task predictions, where each task captures the feedback patterns of a different period. We propose the objective and optimization approach for the policy, and use the logged data to imitate the prophet. Extensive experiments on three real-world advertising datasets show that our method outperforms the previous state-of-the-art baselines.
Abstract: In this paper, we propose the GAIPS framework for efficient maximum inner product search (MIPS) on GPU. We observe that a query can usually find a good lower bound on its maximum inner product among some large-norm items that take up only a small portion of the dataset, and we utilize this fact to facilitate pruning. In addition, we design norm-based, residue-based, and hash-based pruning techniques to avoid computation for items that are unlikely to be the MIPS results. Experimental results show that, compared with FAISS, the state-of-the-art GPU-based similarity search framework, GAIPS has significantly shorter query processing time at the same recall.
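The norm-based pruning mentioned above relies on the bound q·x <= ||q||·||x||: once a good lower bound on the maximum inner product is known, items whose norm bound falls below it can be skipped. The CPU sketch below illustrates this single idea on assumed random data; GAIPS itself runs on GPU and combines it with residue- and hash-based pruning.

.. code-block:: python

    # Illustrative norm-based pruning for exact MIPS (visit large-norm items first).
    import numpy as np

    def mips_norm_pruning(query, items):
        norms = np.linalg.norm(items, axis=1)
        order = np.argsort(-norms)            # large-norm items first: good early lower bound
        qnorm = np.linalg.norm(query)
        best_val, best_idx, scanned = -np.inf, -1, 0
        for i in order:
            if qnorm * norms[i] <= best_val:  # this upper bound also holds for all remaining items
                break
            scanned += 1
            val = float(query @ items[i])
            if val > best_val:
                best_val, best_idx = val, i
        return best_idx, best_val, scanned

    rng = np.random.default_rng(0)
    items = rng.normal(size=(10000, 64))
    idx, val, scanned = mips_norm_pruning(rng.normal(size=64), items)
    print(idx, round(val, 3), f"scanned {scanned}/10000 items")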
Abstract: Identifying user intents from natural language utterances is a crucial step in conversational systems that has been extensively studied as a supervised classification problem. However, in practice, new intents emerge after deploying an intent detection model. Thus, these models should seamlessly adapt and classify utterances with both seen and unseen intents -- unseen intents emerge after deployment and they do not have training data. The few existing models that target this setting rely heavily on the training data of seen intents and consequently overfit to these intents, resulting in a bias to misclassify utterances with unseen intents into seen ones. We propose RIDE: an intent detection model that leverages commonsense knowledge in an unsupervised fashion to overcome the issue of training data scarcity. RIDE computes robust and generalizable relationship meta-features that capture deep semantic relationships between utterances and intent labels; these features are computed by considering how the concepts in an utterance are linked to those in an intent label via commonsense knowledge. Our extensive experimental analysis on three widely-used intent detection benchmarks shows that relationship meta-features significantly improve the detection of both seen and unseen intents and that RIDE outperforms the state-of-the-art models.
Abstract: In this work, we establish a context graph from both conversation utterances and external knowledge, and develop a novel graph-based encoder to better understand the conversation context. Specifically, the encoder fuses the information in the context graph stage-by-stage and provides global context-graph-aware representations of each node in the graph to facilitate knowledge-grounded response generation. On a large-scale conversation corpus, we validate the effectiveness of the proposed approach and demonstrate the benefit of knowledge in conversation understanding.
Abstract: Conversational agents are drawing a lot of attention in the information retrieval (IR) community, thanks in part to the advancements in language understanding enabled by large contextualized language models. IR researchers long ago recognized the importance of a sound evaluation of new approaches. Yet, the development of evaluation techniques for conversational search is still an overlooked problem. Currently, most evaluation approaches rely on procedures directly drawn from ad-hoc search evaluation, treating utterances in a conversation as independent events, as if they were just separate topics, instead of accounting for the conversation context. We overcome this issue by proposing a framework for defining evaluation measures that are aware of the conversation context and the utterances' semantic dependencies. In particular, we model conversations as Directed Acyclic Graphs (DAGs), where self-explanatory utterances are root nodes, while anaphoric utterances are linked to the sentences that contain their missing semantic information. Then, we propose a family of hierarchical, dependence-aware aggregations of the evaluation metrics driven by the conversational graph. In our experiments, we show that utterances from the same conversation are 20% more correlated than utterances from different conversations. Thanks to the proposed framework, we are able to include such correlation in our aggregations and be more accurate when determining which pairs of conversational systems are deemed significantly different.
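As a rough illustration of the dependence-aware aggregation idea, the sketch below links each anaphoric utterance to the utterance carrying its missing context and averages every utterance's metric value together with the values of the utterances it depends on. The mean-over-dependencies choice is just one simple possibility, not necessarily the aggregation proposed in the paper.

.. code-block:: python

    # Hedged sketch: conversation-level score aware of utterance dependencies (DAG).
    import networkx as nx

    def dag_aware_score(per_utterance_scores, dependencies):
        g = nx.DiGraph(dependencies)               # edge (u, v): utterance u depends on v
        g.add_nodes_from(per_utterance_scores)
        utt_scores = []
        for u in per_utterance_scores:
            context = nx.descendants(g, u) | {u}   # the utterance plus everything it depends on
            utt_scores.append(sum(per_utterance_scores[c] for c in context) / len(context))
        return sum(utt_scores) / len(utt_scores)

    scores = {1: 0.8, 2: 0.5, 3: 0.9}   # e.g. per-utterance nDCG of a system's responses
    deps = [(2, 1), (3, 2)]             # utterance 2 is anaphoric to 1, and 3 to 2
    print(dag_aware_score(scores, deps))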
Abstract: Being able to generate informative and coherent dialogue responses is crucial when designing human-like open-domain dialogue systems. Encoder-decoder-based dialogue models tend to produce generic and dull responses during the decoding step because the most predictable response is likely to be a non-informative response instead of the most suitable one. To alleviate this problem, we propose to train the generation model in a bidirectional manner by adding a backward reasoning step to the vanilla encoder-decoder training. The proposed backward reasoning step pushes the model to produce more informative and coherent content because the forward generation step's output is used to infer the dialogue context in the backward direction. The advantage of our method is that the forward generation and backward reasoning steps are trained simultaneously through the use of a latent variable to facilitate bidirectional optimization. Our method can improve response quality without introducing side information (e.g., a pre-trained topic model). The proposed bidirectional response generation method achieves state-of-the-art performance for response quality.
Abstract: There has been significant progress in utilizing heterogeneous knowledge graphs (KGs) as auxiliary information in recommendation systems. However, existing KG-aware recommendation models rely solely on Euclidean space, neglecting hyperbolic space, which has already been shown to possess a superior ability to separate embeddings by providing more "room". We propose a knowledge-based hyperbolic propagation framework (KBHP) which includes hyperbolic components for calculating the importance of KG attributes to achieve better knowledge propagation. In addition to the original relations in the knowledge graph, we propose a user purchase relation to better represent logical patterns in hyperbolic space, which bridges users and items for modeling user preference. Experiments on four real-world benchmarks show that KBHP is significantly more accurate than state-of-the-art models. We further visualize the generated embeddings to demonstrate that the proposed model successfully clusters attributes that are relevant to items and highlights those that contain useful information for the recommendation.
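The extra "room" attributed to hyperbolic space comes from distances growing rapidly towards the boundary of the Poincaré ball, which suits hierarchical KG structure. The snippet below only computes the standard Poincaré-ball distance to convey that intuition; KBHP's actual hyperbolic operations are more involved and are not reproduced here.

.. code-block:: python

    # Poincare-ball distance (curvature -1); points must lie strictly inside the unit ball.
    import numpy as np

    def poincare_distance(u, v):
        u, v = np.asarray(u, float), np.asarray(v, float)
        num = 2 * np.sum((u - v) ** 2)
        den = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
        return np.arccosh(1 + num / den)

    print(poincare_distance([0.10, 0.00], [0.20, 0.10]))  # near the origin: close to Euclidean
    print(poincare_distance([0.95, 0.00], [0.00, 0.95]))  # near the boundary: much larger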
Abstract: Transfer learning leverages knowledge from a source domain with rich data to a target domain with sparse data. However, the difference between the source and target data distribution weakens the transferability. To bridge this gap, we focus on selecting source instances that are closely related to and have the same distribution as the target domain. In this paper, we propose a novel Adaptive Clustering Transfer Learning (ACTL) method to improve transferability. Specifically, we simultaneously train the instance selector and the transfer learning model. The selector adaptively conducts clustering on the training data and learns the weights for source instances. The weight will activate or inhibit the contribution of the corresponding source instance during transfer learning. Meanwhile, the transfer learning model guides the selector to learn the weight appropriately according to the objective function. To evaluate the effectiveness of our method, we conduct experiments on two different tasks including recommender system and text matching. Experimental results show that our method consistently outperforms competing methods and the selected source instances share a similar data distribution with the target domain.
Abstract: Most existing Visual Question Answering (VQA) systems tend to overly rely on language bias and hence fail to reason from the visual clues. To address this issue, we propose a novel Language-Prior Feedback (LPF) objective function that re-balances the proportion of each answer's loss value in the total VQA loss. The LPF first calculates a modulating factor that determines the language bias using a question-only branch. Then, the LPF assigns a self-adaptive weight to each training sample during training. With this reweighting mechanism, the LPF ensures that the total VQA loss is reshaped into a more balanced form. By this means, the samples that require visual information for prediction are used more effectively during training. Our method is simple to implement, model-agnostic, and end-to-end trainable. We conduct extensive experiments and the results show that the LPF (1) brings a significant improvement over various VQA models, and (2) achieves competitive performance on the bias-sensitive VQA-CP v2 benchmark.
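The reweighting idea can be sketched as follows: a question-only branch estimates how predictable the ground-truth answer is from language alone, and that estimate down-weights the sample's contribution to the VQA loss. The focal-style modulating factor (1 - p_lang)^gamma used below is an illustrative assumption, not necessarily the exact form of the LPF.

.. code-block:: python

    # Hedged sketch of language-prior-based loss reweighting for VQA.
    import torch
    import torch.nn.functional as F

    def lpf_style_loss(vqa_logits, question_only_logits, targets, gamma=2.0):
        # vqa_logits, question_only_logits: (B, num_answers); targets: (B,) answer indices
        with torch.no_grad():
            p_lang = F.softmax(question_only_logits, dim=-1).gather(1, targets[:, None]).squeeze(1)
            weight = (1.0 - p_lang) ** gamma   # strongly language-predictable samples get small weight
        per_sample = F.cross_entropy(vqa_logits, targets, reduction="none")
        return (weight * per_sample).mean()

    vqa = torch.randn(4, 1000)
    q_only = torch.randn(4, 1000)
    answers = torch.randint(0, 1000, (4,))
    print(lpf_style_loss(vqa, q_only, answers))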
Abstract: Different from traditional task-oriented and open-domain dialogue systems, insurance agents aim to engage customers in order to help them satisfy specific demands and to provide emotional companionship. As a result, customer-to-agent dialogues are usually very long, and many of their turns are pure chit-chat without any useful marketing clues. This brings challenges to the dialogue state tracking task in insurance marketing. To deal with these long and sparse dialogues, we propose a new dialogue state tracking architecture containing three components: a dialogue encoder, a Smart History Collector (SHC), and a dialogue state classifier. SHC, a deliberately designed memory network, effectively selects relevant dialogue history via slot-attention and then updates the dialogue history memory. With SHC, our model is able to keep track of the vital information and filter out pure chit-chat. Experimental results demonstrate that our proposed LS-DST significantly outperforms the state-of-the-art baselines on a real insurance dialogue dataset.
Abstract: Medical triage chatbots are widely used in pre-diagnosis by asking symptom and medical history-related questions. Information collected from patients through an online chatbot system is often incomplete and imprecise, and thus it is hard to achieve precise triaging. In this paper, we propose the Multi-relational Hyperbolic Diagnosis Predictor (MHDP) --- a novel multi-relational hyperbolic graph neural network-based approach --- to build a disease predictive model. More specifically, in MHDP, we generate a heterogeneous graph consisting of symptom, patient, and diagnosis nodes, and then derive node representations by aggregating neighborhood information recursively in the hyperbolic space. Experiments conducted on two real-world datasets demonstrate that the proposed MHDP approach surpasses state-of-the-art baselines.
Abstract: User preference prediction is the task of learning user interests through user-item interactions. Most existing studies capture user interests based on historical behaviors without considering specific scenario information. However, users may have special interests in these specific scenarios, and sometimes user historical behaviors are limited. In this paper, we propose a Meta-Learned Specific Scenario Interest Network (Meta-SSIN) to predict user preference for a target item by capturing scenario-specific interests. Meta-SSIN uses multiple independent meta-learning modules to model historical behaviors in each scenario. The independent modules can capture special interests based on limited behaviors. Experimental results on three datasets show that Meta-SSIN outperforms state-of-the-art methods.
Abstract: Federated learning (FL) is becoming an increasingly popular machine learning paradigm in application scenarios where sensitive data available at various local sites cannot be shared due to privacy protection regulations. In FL, the sensitive data never leaves the local sites and only model parameters are shared with a global aggregator. Nonetheless, it has recently been shown that, under some circumstances, the private data can be reconstructed from the model parameters, which implies that data leakage can occur in FL. In this paper, we draw attention to another risk associated with FL: Even if federated algorithms are individually privacy-preserving, combining them into pipelines is not necessarily privacy-preserving. We provide a concrete example from genome-wide association studies, where the combination of federated principal component analysis and federated linear regression allows the aggregator to retrieve sensitive patient data by solving an instance of the multidimensional subset sum problem. This supports the increasing awareness in the field that, for FL to be truly privacy-preserving, measures have to be undertaken to protect against data leakage at the aggregator.
Abstract: While previous work comparing statistical significance tests for IR system evaluation has focused on paired data tests (e.g., for evaluating two systems using a common test collection), two-sample tests must be used when the reproducibility of IR experiments across different test collections must be examined. Using real runs and a test collection from the NTCIR-15 WWW-3 Task, the present study compares the properties of three two-sample significance tests for comparing two systems: Student's t-test (i.e., the classical parametric test), the Wilcoxon rank sum test (i.e., the classical nonparametric test), and the randomisation test (i.e., a population-free method that utilises modern computational power). In terms of the false positive rate (i.e., the chance of detecting a statistical significance even though the two samples of evaluation measure scores come from the same system), the three tests behave similarly, although the Wilcoxon rank sum test appears to be slightly more robust than the other two for very small topic set sizes (e.g., 10 topics each) with a large significance level (e.g., α=0.10). On the other hand, the t-test and the Wilcoxon rank sum test are very similar to each other from the following two viewpoints: "How often do they both detect a nonexistent difference?" and "How often do they both overlook a true difference?" Compared to the two classical significance tests, the randomisation test behaves markedly differently in terms of the above two viewpoints. Hence, we suggest that researchers should at least be aware of the above properties of the three two-sample tests when choosing from them.
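For reference, a two-sample randomisation test of the kind compared above can be implemented in a few lines: the observed difference in mean scores between the two systems' (disjoint) topic sets is compared against differences obtained by repeatedly re-assigning the pooled scores at random. The scores and trial count below are made up for illustration.

.. code-block:: python

    # Two-sample randomisation (permutation) test on per-topic evaluation scores.
    import numpy as np

    def randomisation_test(scores_a, scores_b, trials=10000, seed=0):
        rng = np.random.default_rng(seed)
        scores_a, scores_b = np.asarray(scores_a, float), np.asarray(scores_b, float)
        observed = abs(scores_a.mean() - scores_b.mean())
        pooled = np.concatenate([scores_a, scores_b])
        n_a = len(scores_a)
        count = 0
        for _ in range(trials):
            rng.shuffle(pooled)
            count += abs(pooled[:n_a].mean() - pooled[n_a:].mean()) >= observed
        return (count + 1) / (trials + 1)   # two-sided p-value with add-one smoothing

    sys1 = [0.31, 0.42, 0.28, 0.55, 0.47, 0.39, 0.33, 0.50, 0.29, 0.44]  # e.g. nDCG, 10 topics
    sys2 = [0.36, 0.48, 0.35, 0.58, 0.52, 0.45, 0.40, 0.57, 0.34, 0.49]  # 10 different topics
    print(randomisation_test(sys1, sys2))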
Abstract: Dialogue Relation Extraction (DRE) is a new kind of relation extraction task over multi-turn dialogues. Different from previous tasks, speaker-specific relations are implicitly mixed together in both a local utterance window and a speaker context. To tackle both the local and speaker dependency challenges, we explicitly construct a unified mention co-occurrence graph, within a local utterance window or over all utterances of a speaker, from different entities. For each dialogue, a position-enhanced graph attention network over this graph is proposed to obtain position-aware mention representations in terms of both contexts. A gate function is utilized to help obtain a sufficiently discriminative representation for each relation from the original and position-aware mention representations. For each entity pair in the dialogue, a pairwise attention mechanism is deployed to aggregate those discriminative mention representations into a pair representation, which is fed into a standard multi-label classifier for relation label prediction. Experimental results on two benchmarks show that the performance improvement of the proposed method is at least 1.6% and 3.2%, respectively, compared with the SOTA.
Abstract: Unplanned intensive care unit (ICU) readmission rate is an important metric for evaluating the quality of hospital care. Efficient and accurate prediction of ICU readmission risk can not only help prevent patients from inappropriate discharge and potential dangers, but also reduce associated costs of healthcare. In this paper, we propose a new method that uses medical text of Electronic Health Records (EHRs) for prediction, which provides an alternative perspective to previous studies that heavily depend on numerical and time-series features of patients. More specifically, we extract discharge summaries of patients from their EHRs, and represent them with multiview graphs enhanced by an external knowledge graph. Graph convolutional networks are then used for representation learning. Experimental results prove the effectiveness of our method, yielding state-of-the-art performance for this task.
Abstract: Demographics of online users, such as age and gender, play an important role in personalized web applications, particularly in the news domain. However, it is difficult to directly obtain the demographic information of online users. Past works have attempted to predict user demographics based on reading patterns obtained from news browsing data. However, such data can be very limited. Luckily, in recent years, posts and comments have become much more prevalent among online users, and the comments from users of different demographics exhibit differences in content and writing style. Thus, comments can provide additional clues for demographic prediction. In this paper, we study predicting users' demographics based on both news browsing data and the associated user-generated comments. To this end, we make novel use of a recently introduced BERT-based model to embed each comment in the context of its associated article. We experiment on real-world datasets, and explore the contribution of both browsing data and user-generated data in the task of predicting three different user attributes: gender, location type (e.g., rural vs. urban), and mobile device. Finally, we show that our approach can effectively improve the performance of such predictions and outperforms baseline methods.
Abstract: A proactive dialogue system has the ability to proactively lead the conversation. Different from the general chatbots which only react to the user, proactive dialogue systems can be used to achieve some goals, e.g., to recommend some items to the user. Background knowledge is essential to enable smooth and natural transitions in dialogue. In this paper, we propose a new multi-task learning framework for retrieval-based knowledge-grounded proactive dialogue. To determine the relevant knowledge to be used, we frame knowledge prediction as a complementary task and use explicit signals to supervise its learning. The final response is selected according to the predicted knowledge, the goal to achieve, and the context. Experimental results show that explicit modeling of knowledge prediction and goal selection can greatly improve the final response selection. Our code is available at https://github.com/DaoD/KPN/.
Abstract: Few-shot intent detection is a challenging task due to the scarcity of annotations. In this paper, we propose a Pseudo Siamese Network (PSN) to generate labeled data for few-shot intents and alleviate this problem. PSN consists of two identical subnetworks with the same structure but different weights: an action network and an object network. Each subnetwork is a transformer-based variational autoencoder that tries to model the latent distribution of different components in the sentence. The action network is learned to understand action tokens, and the object network focuses on object-related expressions. It provides an interpretable framework for generating an utterance with an action and an object existing in a given intent. Experiments on two real-world datasets show that PSN achieves state-of-the-art performance for the generalized few-shot intent detection task.
Abstract: Previous studies on financial news focus mainly on news articles that explicitly mention the target financial instruments, and may suffer from data sparsity. As taking other related news into consideration, e.g., sector-related news, is a crucial part of real-world decision-making, we explore the use of news without explicit target mentions to enrich the information available to the prediction model. We develop a neural network framework that jointly learns with a news selection mechanism to extract implicit information from the chaotic daily news pool. Our proposed model, called the news distilling network (NDN), takes advantage of neural representation learning and collaborative filtering to capture the relationship between stocks and news. With NDN, we learn latent stock and news representations to facilitate similarity measurement, and apply a gating mechanism to prevent noisy news representations from flowing to a higher-level encoding stage, which encodes the selected news representation of each day. Extensive experiments on real-world stock market data demonstrate the effectiveness of our framework and show improvements over previous techniques.
Abstract: Given a set of required skills, the objective of the team formation problem is to form a team of experts that cover the required skills. Most existing approaches are based on graph methods, such as minimum-cost spanning trees. These approaches, due to their limited view of the network, fail to capture complex interactions among experts and are computationally intractable. More recent approaches adopt neural architectures to learn a mapping between the skills and experts space. While they are more effective, these techniques face two main limitations: (1) they consider a fixed representation for both skills and experts, and (2) they overlook the significant amount of past collaboration network information. We learn dense representations for skills and experts based on previous collaborations and bootstrap the training process through transfer learning. We also propose to fine-tune the representation of skills and experts while learning the mapping function. Our experiments over the DBLP dataset verify that our proposed architecture is able to outperform the state-of-the-art graph and neural methods over both ranking and quality metrics.
Abstract: With the rapid growth of digital data on the Internet, rumor detection on social media has been vital. Existing deep learning-based methods have achieved promising results due to their ability to learn high-level representations of rumors. Despite the success, we argue that these approaches require large reliable labeled data to train, which is time-consuming and data-inefficient. To address this challenge, we present a new solution, Rumor Detection on social media with Event Augmentations (RDEA), which innovatively integrates three augmentation strategies by modifying both reply attributes and event structure to extract meaningful rumor propagation patterns and to learn intrinsic representations of user engagement. Moreover, we introduce contrastive self-supervised learning for the efficient implementation of event augmentations and alleviate limited data issues. Extensive experiments conducted on two public datasets demonstrate that RDEA achieves state-of-the-art performance over existing baselines. Besides, we empirically show the robustness of RDEA when labeled data are limited.
Abstract: Millions of trademarks were registered last year in China, and thousands of applications are submitted daily. A trademark must be unique in the category it belongs to. Therefore, each new trademark application needs to be checked against all the existing ones in its category. A trademark can be a text string (characters, words or phrases), a figure (symbol or design), or both. In this study, we focus on textual trademarks in Chinese, and propose a model for finding similar trademarks for a given one. This neural network model exploits the semantic, phonetic and visual similarities between two textual trademarks. We evaluated our model on a dataset that was built from real trademark application data. Our evaluation shows that the proposed model outperforms other approaches.
Abstract: Biomedical literature retrieval has greatly benefited from recent advances in neural language modeling. In particular, fine-tuning pretrained contextual language models has shown impressive results in recent biomedical retrieval evaluation campaigns. Nevertheless, current approaches neglect the inherent structure available from biomedical abstracts, which are (often explicitly) organised into semantically coherent sections such as background, methods, results, and conclusions. In this paper, we investigate the suitability of leveraging biomedical abstract sections for fine-tuning pretrained contextual language models at a finer granularity. Our results on two TREC biomedical test collections demonstrate the effectiveness of the proposed structured fine-tuning regime in contrast to a standard fine-tuning that does not leverage structure. Through an ablation study, we show that models fine-tuned on individual sections are able to capture potentially useful word contexts that may be otherwise ignored by structure-agnostic models.
Abstract: In real-world search, recommendation, and advertising systems, a multi-stage ranking architecture is commonly adopted. Such an architecture usually consists of matching, pre-ranking, ranking, and re-ranking stages. In the pre-ranking stage, vector-product based models with a representation-focused architecture are commonly adopted for the sake of system efficiency. However, this brings a significant loss to the effectiveness of the system. In this paper, a novel pre-ranking approach is proposed which supports complicated models with an interaction-focused architecture. It achieves a better tradeoff between effectiveness and efficiency by utilizing the proposed learnable Feature Selection method based on feature Complexity and variational Dropout (FSCD). Evaluations in a real-world e-commerce sponsored search system demonstrate that, with the proposed pre-ranking, the effectiveness of the system is significantly improved while an identical amount of computational resources is consumed compared to systems with conventional pre-ranking models.
Abstract: Existing emotion-aware conversational models usually focus on controlling the response content to align with a specific emotion class, whereas empathy is the ability to understand and care about the feelings and experiences of others. Hence, it is critical to learn the causes that evoke the users' emotions, a.k.a. emotion causes, for empathetic responding. To gather emotion causes in online environments, we leverage counseling strategies and develop an empathetic chatbot that utilizes the causal emotion information. On a real-world online dataset, we verify the effectiveness of the proposed approach by comparing our chatbot with several SOTA methods using automatic metrics, expert-based human judgements, as well as user-based online evaluation.
Abstract: Accurate skill retrieval is a key factor in the success of modern conversational AI agents. The major challenges lie in the ambiguity of human spoken language and the wide spectrum of candidate skills. In this paper, we make the first attempt to attack the problem by implementing a user-feedback-enhanced reranking strategy, and propose a self-adaptive dialogue system (AdaDial) for conversational AI agents. In AdaDial, we consider estimating user feedback and adjusting the ranking strategy as a closed loop. In particular, we propose a scalable schema for user feedback estimation and a feedback-enhanced reranking model with customized feature encoding, target-attention-based feature assembling, and multi-task learning. As a result, AdaDial achieves self-adaptivity at both the individual and system levels. Online experimental results demonstrate that AdaDial can not only retrieve desired skills for different users in different scenarios, but also correct its regular strategy according to negative feedback. AdaDial has been deployed on a large-scale conversational AI agent with tens of millions of daily queries, and is bringing continued positive impact on user experience.
Abstract: Disinformation and fake news have posed detrimental effects on individuals and society in recent years, attracting broad attention to fake news detection. The majority of existing fake news detection algorithms focus on mining news content and/or the surrounding exogenous context for discovering deceptive signals, while the endogenous preference of a user when he/she decides to spread a piece of fake news or not is ignored. Confirmation bias theory indicates that a user is more likely to spread a piece of fake news when it confirms his/her existing beliefs/preferences. Users' historical social engagements, such as posts, provide rich information about users' preferences toward news and have great potential to advance fake news detection. However, the work on exploring user preference for fake news detection is somewhat limited. Therefore, in this paper, we study the novel problem of exploiting user preference for fake news detection. We propose a new framework, UPFD, which simultaneously captures various signals from user preferences by joint content and graph modeling. Experimental results on real-world datasets demonstrate the effectiveness of the proposed framework. We release our code and data as a benchmark for GNN-based fake news detection: https://github.com/safe-graph/GNN-FakeNews.
Abstract: Query-biased Summarization (QBS) aims to produce a query-dependent summary of a retrieved document to reduce the human effort for inspecting the full-text content. Typical summarization approaches extract document snippets that overlap with the query and show them to searchers. Such QBS methods show relevant information in a document but do not inform searchers what is missing. Our study focuses on reducing user effort in finding relevant documents by exposing the information in the query that is missing in the retrieved results. We use a classical approach, DSPApprox, to find terms or phrases relevant to a query. Then, we identify which terms or phrases are missing in a document, present them in a search interface, and ask crowd workers to judge document relevance based on snippets and missing information. Experimental results show both benefits and limitations of our method compared with traditional ones that only show relevant snippets.
Abstract: Variational Autoencoders (VAEs) have been shown to be effective for recommender systems with implicit feedback (e.g., browsing history, purchasing patterns, etc.). However, little attention has been given to ensembles of VAEs that can learn user and item representations jointly. We introduce the Joint Variational Autoencoder (JoVA), an ensemble of two VAEs which jointly learns both user and item representations to predict user preferences. This design allows JoVA to capture user-user and item-item correlations simultaneously. We also introduce JoVA-Hinge, an extension of JoVA with a hinge-based pairwise loss function, to further specialize it for recommendation with implicit feedback. Our extensive experiments on four real-world datasets demonstrate that JoVA-Hinge outperforms a broad set of state-of-the-art methods under a variety of commonly-used metrics. Our empirical results also illustrate the effectiveness of JoVA-Hinge in handling users with limited training data.
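A hinge-based pairwise loss of the kind added in JoVA-Hinge simply requires an observed (positive) item to be scored above a sampled unobserved (negative) item by a margin. The minimal PyTorch sketch below shows the loss term only; the margin value and negative-sampling scheme are assumptions, and the actual JoVA-Hinge objective also includes the VAE terms.

.. code-block:: python

    # Minimal hinge-based pairwise ranking loss for implicit feedback.
    import torch

    def hinge_pairwise_loss(pos_scores, neg_scores, margin=1.0):
        # pos_scores, neg_scores: (B,) predicted preference scores for positive/negative items
        return torch.clamp(margin - (pos_scores - neg_scores), min=0.0).mean()

    pos = torch.tensor([2.1, 0.3, 1.5])
    neg = torch.tensor([0.4, 0.9, 1.6])
    print(hinge_pairwise_loss(pos, neg))  # penalises pairs where pos does not beat neg by the margin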
Abstract: The COVID-19 pandemic has brought about a proliferation of harmful news articles online, with sources lacking credibility and misrepresenting scientific facts. Misinformation has real consequences for consumer health search, i.e., users searching for health information. In the context of multi-stage ranking architectures, there has been little work exploring whether they prioritize correct and credible information over misinformation. We find that, indeed, training models on standard relevance ranking datasets like MS MARCO passage---which have been curated to contain mostly credible information---yields models that might also promote harmful misinformation. To rectify this, we propose a label prediction technique that can separate helpful from harmful content. Our design leverages pretrained sequence-to-sequence transformer models for both relevance ranking and label prediction. Evaluated at the TREC 2020 Health Misinformation Track, our techniques represent the top-ranked system: Our best submitted run was 19.2 points higher than the second-best run based on the primary metric, a 68% relative improvement. Additional post-hoc experiments show that we can boost effectiveness by another 3.5 points.
Abstract: When a human asks questions online, or when a conversational virtual agent asks a human questions, questions that trigger emotions or contain details may be more likely to receive responses or answers. We explore how to automatically rewrite natural language questions to improve the response rate from people. In particular, a new task of Visual Question Rewriting (VQR) is introduced to explore how visual information can be used to improve the new question(s). A dataset containing ~4K bland-attractive question-image triples is collected. We develop baseline sequence-to-sequence models and more advanced transformer-based models, which take a bland question and a related image as input and output a rewritten question that is expected to be more attractive. Offline experiments and Mechanical Turk based evaluations show that it is possible to rewrite bland questions in a more detailed and attractive way to increase the response rate, and that images can be helpful.
Abstract: Carrying abundant side information, knowledge graphs (KGs) have shown great potential in mitigating the sparsity of collaborative filtering (CF) for recommendation. Although graph neural networks (GNNs) have been successfully employed to learn user preferences from KG and CF signals simultaneously, most models suffer from inferior performance due to deficient designs, i.e., 1) making no distinction between users, items, and KG entities, 2) confounding KG signals with CF signals, and 3) completely neglecting the effects of edges, which are vital for graph information propagation. In this paper, we propose a quad-channel graph model (X-2ch) to tackle these problems. First, rather than lodging KG entities on the graph as nodes, X-2ch distills KG information and embeds it as edge attributes in a bi-directional manner to model the natural user-item interaction process. Second, X-2ch introduces a novel quad-channel learning scheme, including a collaborative user-item update and a CF-KG attentive propagation, to holistically capture the interconnectivity of users and items while preserving their distinct properties. Experiments on two real-world benchmarks show substantial improvements over state-of-the-art baselines.
Abstract: Due to high annotation costs, making the best use of existing human-created training data is an important research direction. We therefore carry out a systematic evaluation of the transferability of BERT-based neural ranking models across five English datasets. Previous studies focused primarily on zero-shot and few-shot transfer from a large dataset to a dataset with a small number of queries. In contrast, each of our collections has a substantial number of queries, which enables a full-shot evaluation mode and improves the reliability of our results. Furthermore, since source dataset licences often prohibit commercial use, we compare transfer learning to training on pseudo-labels generated by a BM25 scorer. We find that training on pseudo-labels---possibly with subsequent fine-tuning using a modest number of annotated queries---can produce a competitive or better model compared to transfer learning. Yet, it is necessary to improve the stability and/or effectiveness of few-shot training, which, sometimes, can degrade the performance of a pretrained model.
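Generating training pseudo-labels with a BM25 scorer, as mentioned above, can be done with any lexical retrieval library. The sketch below uses the rank_bm25 package and treats the top-ranked document per query as a pseudo-positive and lower-ranked documents as negatives; the package choice, cutoff, and toy corpus are assumptions for illustration, not the authors' exact procedure.

.. code-block:: python

    # Illustrative BM25 pseudo-labelling for training a neural ranker without human judgements.
    from rank_bm25 import BM25Okapi

    docs = ["neural ranking models for text retrieval",
            "bm25 is a classical lexical scoring function",
            "transfer learning across retrieval collections"]
    queries = ["lexical scoring", "neural rankers"]

    bm25 = BM25Okapi([d.split() for d in docs])
    pseudo_labels = []
    for q in queries:
        scores = bm25.get_scores(q.split())
        ranked = sorted(range(len(docs)), key=lambda i: -scores[i])
        pseudo_labels.append({"query": q,
                              "positive": docs[ranked[0]],              # top-1 as pseudo-positive
                              "negatives": [docs[i] for i in ranked[1:]]})
    print(pseudo_labels[0])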
Abstract: In this paper, we propose a novel abstractive text summarization method with hierarchical multi-scale abstraction modeling and dynamic memory (called MADY). First, we propose a hierarchical multi-scale abstraction modeling method to capture the temporal dependencies of the document from multiple hierarchical levels of abstraction, which mimics how human beings comprehend an article, by learning fine timescales for low-level abstraction layers and coarse timescales for high-level abstraction layers. By applying this adaptive updating mechanism, the high-level abstraction layers are updated less frequently and are expected to remember long-term dependencies better than the low-level abstraction layers. Second, we propose a dynamic key-value memory-augmented attention network to keep track of the attention history and comprehensive context information for the salient facets within the input document. In this way, our model can avoid generating repetitive words and faulty summaries. Extensive experiments on two widely-used datasets demonstrate the effectiveness of the proposed MADY model in terms of both automatic evaluation and human evaluation. For reproducibility, we release the code and data at: https://github.com/siat-nlp/MADY.git.
Abstract: Recent AI research has witnessed increasing interest in automatically designing the architecture of deep neural networks, a direction coined neural architecture search (NAS). Network architectures found automatically via NAS methods have outperformed manually designed architectures on some NLP tasks. However, training a large number of model configurations for efficient NAS is computationally expensive, creating a substantial barrier to applying NAS methods in real-life applications. In this paper, we propose to accelerate neural architecture search for natural language processing based on knowledge distillation (called KD-NAS). Specifically, instead of searching for the optimal network architecture on the validation set conditioned on the optimal network weights on the training set, we learn the optimal network by minimizing the knowledge loss transferred from a pre-trained teacher network to the searched network based on the Earth Mover's Distance (EMD). Experiments on five datasets show that our method achieves promising performance compared to strong competitors in terms of both accuracy and search speed. For reproducibility, we release the code at: https://github.com/lxk00/KD-NAS-EMD.
Abstract: Owing to their remarkable capability of extracting effective graph embeddings, graph convolutional networks (GCNs) and their variants have been successfully applied to a broad range of tasks, such as node classification, link prediction, and graph classification. Traditional GCN models suffer from the issues of overfitting and oversmoothing, while some recent techniques like DropEdge can alleviate these issues and thus enable the development of deep GCNs. However, training GCN models is non-trivial, as they are sensitive to the choice of hyperparameters such as the dropout rate and weight decay, especially for deep GCN models. In this paper, we aim to automate the training of GCN models through hyperparameter optimization. To be specific, we propose a self-tuning GCN approach with an alternate training algorithm, and further extend our approach by incorporating a population-based training scheme. Experimental results on three benchmark datasets demonstrate the effectiveness of our approaches for optimizing multi-layer GCNs, compared with several representative baselines.
Abstract: We propose AutoName, an unsupervised framework that extracts a name for a set of query entities from a large-scale text corpus. Entity-set naming is useful in many tasks related to natural language processing and information retrieval such as session-based and conversational information seeking. Previous studies mainly extract set names from knowledge bases which provide highly reliable entity relations, but suffer from limited coverage of entities and set names that represent broad semantic classes. To address these problems, AutoName generates hypernym-anchored candidate phrases via probing a pre-trained language model and the entities' context in documents. Phrases are then clustered to identify ones that describe common concepts among query entities. Finally, AutoName ranks refined phrases based on the co-occurrences of their words with query entities and the conceptual integrity of their respective clusters. We built a new benchmark dataset for this task, consisting of 130 entity sets with name labels. Experimental results show that AutoName generates coherent and meaningful set names and significantly outperforms all baselines.
Abstract: Cross-lingual text representations have gained popularity lately and act as the backbone of many tasks such as unsupervised machine translation and cross-lingual information retrieval, to name a few. However, evaluating such representations is difficult in domains beyond standard benchmarks due to the necessity of obtaining domain-specific parallel language data across different pairs of languages. In this paper, we propose Backretrieval, an automatic metric for evaluating the quality of cross-lingual textual representations that uses images as a proxy in a paired image-text evaluation dataset. Experimentally, Backretrieval is shown to correlate highly with ground-truth metrics on annotated datasets, and our analysis shows statistically significant improvements over baselines. Our experiments conclude with a case study on a recipe dataset without parallel cross-lingual data. We illustrate how to judge cross-lingual embedding quality with Backretrieval, and validate the outcome with a small human study.
Abstract: Critiquing is a method for conversational recommendation that incrementally adapts recommendations in response to user preference feedback. Recent advances in critiquing have leveraged the power of VAE-CF recommendation in a critiquable-explainable (CE-VAE) framework that updates latent user preference embeddings based on their critiques of keyphrase-based explanations. However, the CE-VAE has two key drawbacks: (i) it uses a second VAE head to facilitate explanations and critiquing, which can sacrifice recommendation performance of the first VAE head due to multiobjective training, and (ii) it requires iterating an inverse decoding-encoding loop for multi-step critiquing that yields poor performance. To address these deficiencies, we propose a novel Bayesian Keyphrase critiquing VAE (BK-VAE) framework that builds on the strengths of VAE-CF, but avoids the problematic second head of CE-VAE. Instead, the BK-VAE uses a Concept Activation Vector (CAV) inspired approach to determine the alignment of item keyphrase properties with latent user preferences in VAE-CF. BK-VAE leverages this alignment in a Bayesian framework to model uncertainty in a user's latent preferences and to perform posterior updates to these preference beliefs after each critique --- essentially achieving CE-VAE's explanation and critique inversion through a simple application of Bayes rule. Our empirical evaluation on two datasets demonstrates that BK-VAE matches or dominates CE-VAE in both recommendation and multi-step critiquing performance.
Abstract: We propose a simple and effective strategy for data augmentation for low-resource machine reading comprehension (MRC). Our approach first pretrains the answer extraction components of an MRC system on augmented data that contains the approximate context of the correct answers, before training it on the exact answer spans. The approximate context helps the answer extraction components narrow down the location of the answers. We demonstrate that our simple strategy substantially improves both document retrieval and answer extraction performance by providing larger context for the answers and additional training data. In particular, our method significantly improves the performance of a BERT-based retriever (15.12%) and answer extractor (4.33% F1) on TechQA, a complex, low-resource MRC task. Further, our data augmentation strategy yields significant improvements of up to 3.9% exact match (EM) and 2.7% F1 for answer extraction on PolicyQA, another practical but moderately sized QA dataset that also contains long answer spans.
Abstract: Multi-label learning algorithms have attracted increasing attention in recent years. This is mainly because real-world data is generally associated with multiple, non-exclusive labels, which could correspond to different objects, scenes, actions, and attributes. In this paper, we consider the following challenging multi-label stream scenario: new labels emerge continuously in changing environments and are assigned to the previous data. In this setting, data mining solutions must be able to learn the new concepts and avoid catastrophic forgetting simultaneously. We propose a novel continual and interactive feature distillation-based learning framework (CIFDM) to effectively classify instances with novel labels. We utilize the knowledge from previous tasks to learn new knowledge for the current task. The system then compresses historical and novel knowledge and preserves it while waiting for new emerging tasks. CIFDM consists of three components: 1) a knowledge bank that stores the existing feature-level compressed knowledge and predicts the labels observed so far; 2) a pioneer module that aims to learn and predict newly emerged labels based on the knowledge bank; 3) an interactive knowledge compression function that compresses and transfers the new knowledge to the bank, and then applies the current compressed knowledge to initialize the label embedding of the pioneer for the next task.
Abstract: In this paper, we propose a clustering-based online news topic detection and tracking (TDT) approach based on a hierarchical Bayesian nonparametric framework that allows topics to be shared across different news stories in a corpus. Our approach is formulated using the hierarchical Pitman-Yor process mixture model with the inverted Beta-Liouville (IBL) distribution as its component density, which has been shown to model text data better than the widely used Gaussian distribution. Moreover, we theoretically develop a convergence-guaranteed online learning algorithm that can effectively learn the proposed TDT model from a stream of news stories based on variational Bayes. The merits of our TDT approach are illustrated by comparing it with other well-defined clustering-based TDT approaches on different news data sets.
Abstract: Hypergraphs can capture higher-order relations between subsets of objects instead of only pairwise relations as in graphs. Hypergraph clustering is an important task in information retrieval and machine learning. We study the problem of distributed hypergraph clustering in the message passing communication model using small communication cost. We propose an algorithm framework for distributed hypergraph clustering based on spectral hypergraph sparsification. For an n-vertex hypergraph G with hyperedges of maximum size r distributed arbitrarily at s sites and a parameter ε ∈ (0,1), our algorithm can produce a vertex set with conductance O(√((1+ε)/(1-ε)) · √φ_G), where φ_G is the conductance of G, using communication cost Õ(n r² s / ε^{O(1)}) (Õ hides a polylogarithmic factor). The theoretical results are complemented with extensive experiments that demonstrate the efficiency and effectiveness of the proposed algorithm on different real-world datasets. Our source code is publicly available at github.com/chunjiangzhu/dhgc.
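As background for the bound above, the conductance φ_G can be stated for an unweighted hypergraph as follows (a standard definition; the paper may use a weighted variant):

.. math::

   \phi(S) = \frac{\big|\{ e \in E : e \cap S \neq \emptyset,\ e \cap (V \setminus S) \neq \emptyset \}\big|}
                  {\min\big(\mathrm{vol}(S),\, \mathrm{vol}(V \setminus S)\big)},
   \qquad
   \phi_G = \min_{\emptyset \neq S \subsetneq V} \phi(S),

where vol(S) is the total degree of the vertices in S and a hyperedge counts as cut whenever it has vertices on both sides of the partition.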
Abstract: We present a Composite Code Sparse Autoencoder (CCSA) approach for Approximate Nearest Neighbor (ANN) search of document representations based on Siamese-BERT models. In Information Retrieval (IR), the ranking pipeline is generally decomposed into two stages: the first stage focuses on retrieving a candidate set from the whole collection, and the second stage re-ranks the candidates by relying on more complex models. Recently, Siamese-BERT models have been used as first-stage rankers to replace or complement the traditional bag-of-words models. However, indexing and searching a large document collection requires efficient similarity search on dense vectors, and this is why ANN techniques come into play. Since composite codes are naturally sparse, we show how CCSA can learn an efficient parallel inverted index thanks to a uniformity regularizer. Our experiments on MS MARCO reveal that for the same quantization budget and recall@1000 targets, CCSA is able to outperform IVF (inverted-index file) with product quantization on both
Abstract: Social recommendation aims to fuse social links with user-item interactions to alleviate the cold-start problem for rating prediction. Recent developments of Graph Neural Networks (GNNs) motivate endeavors to design GNN-based social recommendation frameworks to aggregate both social and user-item interaction information simultaneously. However, most existing methods neglect the social inconsistency problem, which intuitively suggests that social links are not necessarily consistent with the rating prediction process. Social inconsistency can be observed from both context-level and relation-level. Therefore, we intend to empower the GNN model with the ability to tackle the social inconsistency problem. We propose to sample consistent neighbors by relating sampling probability with consistency scores between neighbors. Besides, we employ the relation attention mechanism to assign consistent relations with high importance factors for aggregation. Experiments on two real-world datasets verify the model effectiveness.
Abstract: We propose a novel domain-specific generative pre-training (DSGPT) method for text generation and apply it to the product title and review summarization problems on E-commerce mobile display. First, we adopt a decoder-only transformer architecture, which fits fine-tuning tasks well by combining input and output all together. Second, we demonstrate that utilizing only a small amount of pre-training data from related domains is powerful. Pre-training a language model from a general corpus such as Wikipedia or the Common Crawl requires a tremendous time and resource commitment, and can be wasteful if the downstream tasks are limited in variety. Our DSGPT is pre-trained on a limited dataset, the Chinese short text summarization dataset (LCSTS). Third, our model does not require product-related human-labeled data. For the title summarization task, the state of the art explicitly uses additional background knowledge in the training and prediction stages. In contrast, our model implicitly captures this knowledge and achieves significant improvements over other methods after fine-tuning on the public Taobao.com dataset. For the review summarization task, we utilize a JD.com in-house dataset and observe similar improvements over standard machine translation methods, which lack the flexibility of fine-tuning. Our proposed work can be simply extended to other domains for a wide range of text generation tasks.
Abstract: Recently, BERT has achieved significant progress for sentence matching via word-level cross-sentence attention. However, performance drops significantly when siamese BERT-networks are used to derive two sentence embeddings, which fall short of capturing global semantics since the word-level attention between two sentences is absent. In this paper, we propose a Dual-view distilled BERT (DvBERT) for sentence matching with sentence embeddings. Our method deals with a sentence pair from two distinct views, i.e., the Siamese View and the Interaction View. The Siamese View is the backbone where we generate sentence embeddings. The Interaction View integrates cross-sentence interaction as multiple teachers to boost the representation ability of the sentence embeddings. Experiments on six STS tasks show that our method outperforms state-of-the-art sentence embedding methods.
Abstract: Representation learning of examination papers is the cornerstone of Examination Paper Analysis (EPA) in the education domain, including Paper Difficulty Prediction (PDR) and Finding Similar Papers (FSP). Previous works mainly focus on the representation learning of each test item, but few notice the hierarchical document structure of examination papers. To this end, in this paper, we propose a novel Examination Organization Encoder (EOE) to learn a robust representation of an examination paper from its hierarchical document structure. Specifically, we first propose a syntax parser that recovers the hierarchical document structure and converts an examination paper into an Examination Organization Tree (EOT), where the test items are the leaf nodes and the internal nodes are summarizations of their child nodes. Then, we apply a two-layer GRU-based module to obtain the representation of each leaf node. After that, we design a subtree encoder module that aggregates the leaf-node representations and computes an embedding for each layer in the EOT. Finally, we feed all the layer embeddings into an output module to obtain the examination paper representation, which can be used for downstream tasks. Extensive experiments on real-world data demonstrate the effectiveness and interpretability of our method.
Abstract: Cross features play an important role in click-through rate (CTR) prediction. Most of the existing methods adopt a DNN-based model to capture cross features in an implicit manner. These implicit methods may lead to sub-optimal performance due to their limitations in explicit semantic modeling. Although traditional statistical explicit semantic cross features can address the problems of these implicit methods, they still suffer from several challenges, including a lack of generalization and expensive memory cost. Few works focus on tackling these challenges. In this paper, we take the first step in learning explicit semantic cross features and propose Pre-trained Cross Feature learning Graph Neural Networks (PCF-GNN), a GNN-based pre-trained model aiming at generating cross features in an explicit fashion. Extensive experiments are conducted on both public and industrial datasets, where PCF-GNN shows competence in both performance and memory-efficiency in various tasks.
Abstract: Deep neural network (DNN) models have been widely used for click-through rate (CTR) prediction in online advertising. The training framework typically consists of embedding layers and multi-layer perceptrons (MLP). At Baidu Search Ads (a.k.a. Phoenix Nest), the new generation of the CTR training platform is PaddleBox, a GPU-based parameter server system. In this paper, we present Baidu's recently updated CTR training framework, called Gating-enhanced Multi-task Neural Networks (GemNN). In particular, we develop a neural network based multi-task learning model to predict CTR in a coarse-to-fine manner, which gradually reduces ad candidates and allows parameter sharing from upstream tasks to downstream tasks to improve the training efficiency. Also, we introduce a gating mechanism between embedding layers and MLP to learn feature interactions and control the information flow fed to the MLP layers. We have launched our solution in the Baidu PaddleBox platform and observed considerable improvements in both offline and online evaluations. It is now part of the current production system.
Abstract: We address the poor generalization of few-shot learning models for event detection (ED) using transfer learning and representation regularization. In particular, we propose to transfer knowledge from open-domain word sense disambiguation into few-shot learning models for ED to improve their generalization to new event types. We also propose a novel training signal derived from dependency graphs to regularize the representation learning for ED. Moreover, we evaluate few-shot learning models for ED with a large-scale human-annotated ED dataset to obtain more reliable insights for this problem. Our comprehensive experiments demonstrate that the proposed model outperforms state-of-the-art baseline models in the few-shot learning and supervised learning settings for ED. Code and data splits are available at https://github.com/laiviet/ed-fsl.
Abstract: Graph pooling, which summarizes the information in a large graph into a compact form, is essential in hierarchical graph representation learning. Existing graph pooling methods either suffer from high computational complexity or cannot capture the global dependencies between graphs before and after pooling. To address these problems, we propose Coarsened Graph Infomax Pooling (CGIPool), which maximizes the mutual information between the input and the coarsened graph of each pooling layer to preserve graph-level dependencies. To achieve mutual information neural maximization, we apply contrastive learning and propose a self-attention-based algorithm for learning positive and negative samples. Extensive experimental results on seven datasets illustrate the superiority of CGIPool compared to state-of-the-art methods.
Abstract: Graph neural architecture search has received a lot of attention as Graph Neural Networks (GNNs) have recently been successfully applied to non-Euclidean data. However, exploring all possible GNN architectures in the huge search space is too time-consuming or impossible for big graph data. In this paper, we propose a parallel graph architecture search (GraphPAS) framework for graph neural networks. In GraphPAS, we explore the search space in parallel by designing a sharing-based evolutionary learning scheme, which improves search efficiency without losing accuracy. Additionally, architecture information entropy is used dynamically to set the mutation selection probability, which reduces space exploration. The experimental results show that GraphPAS outperforms state-of-the-art models in both efficiency and accuracy.
Abstract: Conversion Rate (CVR) prediction in modern industrial e-commerce platforms is becoming increasingly important, which directly contributes to the final revenue. In order to address the well-known sample selection bias (SSB) and data sparsity (DS) issues encountered during CVR modeling, the abundant labeled macro behaviors (i.e., user's interactions with items) are used. Nonetheless, we observe that several purchase-related micro behaviors (i.e., user's interactions with specific components on the item detail page) can supplement fine-grained cues for CVR prediction. Motivated by this observation, we propose a novel CVR prediction method by Hierarchically Modeling both Micro and Macro behaviors (HM3). Specifically, we first construct a complete user sequential behavior graph to hierarchically represent micro behaviors and macro behaviors as one-hop and two-hop post-click nodes. Then, we embody HM3 as a multi-head deep neural network, which predicts six probability variables corresponding to explicit sub-paths in the graph. They are further combined into the prediction targets of four auxiliary tasks as well as the final CVR according to the conditional probability rule defined on the graph. By employing multi-task learning and leveraging the abundant supervisory labels from micro and macro behaviors, HM3 can be trained end-to-end and address the SSB and DS issues. Extensive experiments on both offline and online settings demonstrate the superiority of the proposed HM3 over representative state-of-the-art methods.
Abstract: BERT-based Neural Ranking Models (NRMs) can be classified according to how the query and document are encoded through BERT's self-attention layers - bi-encoder versus cross-encoder. Bi-encoder models are highly efficient because all the documents can be pre-processed before the query time, but their performance is inferior compared to cross-encoder models. Both models utilize a ranker that receives BERT representations as the input and generates a relevance score as the output. In this work, we propose a method where multi-teacher distillation is applied to a cross-encoder NRM and a bi-encoder NRM to produce a bi-encoder NRM with two rankers. The resulting student bi-encoder achieves an improved performance by simultaneously learning from a cross-encoder teacher and a bi-encoder teacher and also by combining relevance scores from the two rankers. We call this method TRMD (Two Rankers and Multi-teacher Distillation). In the experiments, TwinBERT and ColBERT are considered as baseline bi-encoders. When monoBERT is used as the cross-encoder teacher, together with either TwinBERT or ColBERT as the bi-encoder teacher, TRMD produces a student bi-encoder that performs better than the corresponding baseline bi-encoder. For P@20, the maximum improvement was 11.4%, and the average improvement was 6.8%. As an additional experiment, we considered producing cross-encoder students with TRMD, and found that it could also improve the cross-encoders.
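The two-ranker, multi-teacher distillation idea described above can be sketched in a few lines of PyTorch (illustrative only; the layer sizes, loss weights, and the way the BERT representations are produced here are our assumptions, not the paper's):

.. code-block:: python

   import torch
   import torch.nn as nn

   class TwoRankerStudent(nn.Module):
       """Bi-encoder student with two ranking heads, each distilled from a different teacher."""
       def __init__(self, dim=768):
           super().__init__()
           self.ranker_from_cross = nn.Linear(2 * dim, 1)  # head distilled from the cross-encoder teacher
           self.ranker_from_bi = nn.Linear(2 * dim, 1)     # head distilled from the bi-encoder teacher

       def forward(self, q_vec, d_vec):
           pair = torch.cat([q_vec, d_vec], dim=-1)
           s1 = self.ranker_from_cross(pair).squeeze(-1)
           s2 = self.ranker_from_bi(pair).squeeze(-1)
           return s1, s2, s1 + s2  # final relevance score combines both rankers

   def distill_step(student, optimizer, q_vec, d_vec, t_cross, t_bi):
       """One optimization step: each head regresses towards its teacher's relevance scores."""
       s1, s2, _ = student(q_vec, d_vec)
       loss = nn.functional.mse_loss(s1, t_cross) + nn.functional.mse_loss(s2, t_bi)
       optimizer.zero_grad()
       loss.backward()
       optimizer.step()
       return loss.item()

   # toy usage with random "BERT" representations and random teacher scores
   student = TwoRankerStudent()
   opt = torch.optim.Adam(student.parameters(), lr=1e-3)
   q, d = torch.randn(8, 768), torch.randn(8, 768)
   distill_step(student, opt, q, d, t_cross=torch.randn(8), t_bi=torch.randn(8))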
Abstract: Text style transfer is an important issue for conversational agents as it may adapt utterance production to specific dialogue situations. It consists in introducing a given style within a sentence while preserving its semantics. Within this scope, different strategies have been proposed that either rely on parallel data or take advantage of non-supervised techniques. In this paper, we follow the latter approach and show that the sequential introduction of different loss functions into the learning process can boost the performance of a standard model. We also evidence that combining different style classifiers that either focus on global or local textual information improves sentence generation. Experiments on the Yelp dataset show that our methodology strongly competes with the current state-of-the-art models across style accuracy, grammatical correctness, and content preservation.
Abstract: Network representation learning aims to generate an embedding for each node in a network, which facilitates downstream machine learning tasks such as node classification and link prediction. Current work mainly focuses on transductive network representation learning, i.e. generating fixed node embeddings, which is not suitable for real-world applications. Therefore, we propose a new inductive network representation learning method called MNCI by mining neighborhood and community influences in temporal networks. We propose an aggregator function that integrates neighborhood influence with community influence to generate node embeddings at any time. We conduct extensive experiments on several real-world datasets and compare MNCI with several state-of-the-art baseline methods on various tasks, including node classification and network visualization. The experimental results show that MNCI achieves better performance than baselines.
Abstract: Transformer-based models, and especially pre-trained language models like BERT, have shown great success on a variety of Natural Language Processing and Information Retrieval tasks. However, such models have difficulty processing long documents due to the quadratic complexity of the self-attention mechanism. Recent works either truncate long documents or segment them into passages that can be treated by a standard BERT model. A hierarchical architecture, such as a transformer, can then be adopted to build a document-level representation on top of the representations of the individual passages. However, these approaches either lose information or have high computational complexity (and, in the latter case, are both time and energy consuming). We follow here a slightly different approach in which one first selects key blocks of a long document by local query-block pre-ranking, and then aggregates a few blocks to form a short document that can be processed by a model such as BERT. Experiments conducted on standard Information Retrieval datasets demonstrate the effectiveness of the proposed approach.
Abstract: Knowledge graph embedding aims to represent entities and relations in a continuous feature space while preserving the structure of a knowledge graph. Most existing knowledge graph embedding methods either focus only on a flat structure of the given knowledge graph or exploit the predefined types of entities to explore an enriched structure. In this paper, we define the metagraph of a knowledge graph by proposing a new affinity metric that measures the structural similarity between entities, and then grouping close entities by hypergraph clustering. Without any prior information about entity types, a set of semantically close entities is successfully merged into one super-entity in our metagraph representation. We propose the metagraph-based pre-training model of knowledge graph embedding where we first learn representations in the metagraph and initialize the entities and relations in the original knowledge graph with the learned representations. Experimental results show that our method is effective in improving the accuracy of state-of-the-art knowledge graph embedding methods.
Abstract: Modern search engine ranking pipelines are commonly based on large machine-learned ensembles of regression trees. We propose LEAR, a novel - learned - technique aimed to reduce the average number of trees traversed by documents to accumulate the scores, thus reducing the overall query response time. LEAR exploits a classifier that predicts whether a document can early exit the ensemble because it is unlikely to be ranked among the final top-k results. The early exit decision occurs at a sentinel point, i.e., after having evaluated a limited number of trees, and the partial scores are exploited to filter out non-promising documents. We evaluate LEAR by deploying it in a production-like setting, adopting a state-of-the-art algorithm for ensembles traversal. We provide a comprehensive experimental evaluation on two public datasets. The experiments show that LEAR has a significant impact on the efficiency of the query processing without hindering its ranking quality. In detail, on a first dataset, LEAR is able to achieve a speedup of 3x without any loss in NDCG@10, while on a second dataset the speedup is larger than 5x with a negligible NDCG@10 loss (< 0.05%).
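The early-exit idea can be illustrated with a short sketch (the data layout and the exit classifier below are hypothetical; the actual LEAR classifier and the ensemble-traversal algorithm it plugs into are more sophisticated):

.. code-block:: python

   def score_with_early_exit(doc_tree_scores, sentinel, exit_classifier):
       """Accumulate per-tree scores for one document, but stop at the sentinel
       if the classifier predicts the document cannot reach the final top-k.

       doc_tree_scores: list of per-tree score contributions for this document
       sentinel: number of trees evaluated before the early-exit decision
       exit_classifier: callable mapping a partial score to True (keep) / False (drop)
       """
       partial = sum(doc_tree_scores[:sentinel])
       if not exit_classifier(partial):
           return partial, True   # exited early; score is only partial
       return partial + sum(doc_tree_scores[sentinel:]), False

   # toy usage: drop documents whose partial score after 100 trees is below a learned cutoff
   scores = [0.01] * 500
   print(score_with_early_exit(scores, sentinel=100, exit_classifier=lambda s: s >= 0.5))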
Abstract: Considering the temporal order of user-item interactions for recommendation forms a novel class of recommendation algorithms in recent years, among which sequential recommendation models are the most popular approaches. Although, theoretically, such fine-grained modeling should be beneficial to the recommendation performance, these sequential models in practice greatly suffer from the issue of data sparsity as there are a huge number of combinations for item sequences. To address the issue, we propose LSTPR, a graph-based matrix factorization model that incorporates both high-order graph information and long short-term user preferences into the modeling process. LSTPR explicitly distinguishes long-term and short-term user preferences and enriches the sparse interactions via random surfing on the user-item graph. Experiments on three recommendation datasets with temporal user-item information demonstrate that the proposed LSTPR model achieves significantly better performance than the seven baseline methods.
Abstract: The increase of group polarization on social media seriously impacts the health of public discourse and information dissemination. At present, detecting polarized structures in signed networks is well-motivated for studying group polarization on social media. However, most studies restrict the number of polarized structures to only two, neglecting the real-world scenario where signed networks consist of multiple polarized structures, which is an unreasonable assumption. To overcome the limitations of existing work, in this paper we present a novel cohesive subgraph model based on structural clusterable theory, named the maximal multipolarized clique (MMC), which can be partitioned into k polarized subcliques such that the edges within subcliques are positive and the edges between subcliques are negative. This paper formulates the problem of Maximal Multipolarized Cliques Search (MMCS) in signed networks, which is proved to be NP-hard. To address this problem, we first devise powerful pruning rules to reduce the signed network significantly and further develop an efficient algorithm to search all maximal multipolarized cliques in the reduced signed network. The experimental results on real-world signed networks demonstrate the efficiency and effectiveness of our algorithm.
Abstract: Knowledge Graphs (KGs) are widely used in various information retrieval applications. Despite the large scale of KGs, they still suffer from incompleteness. Conventional approaches to Knowledge Graph Completion (KGC) require a large number of training instances for each relation. However, long-tail relations that have only a few related triples are ubiquitous in KGs. Therefore, it is very difficult to complete the long-tail relations. In this paper, we propose a meta pattern learning framework (MetaP) to predict new facts of relations under a challenging setting where there is only one reference for each relation. Patterns in data are representative regularities used to classify data. Triples in KGs also conform to relation-specific patterns which can be used to measure the validity of triples. Our model extracts the patterns effectively through a convolutional pattern learner and measures the validity of triples accurately by matching query patterns with reference patterns. Extensive experiments demonstrate the effectiveness of our method. Besides, we build a few-shot KGC dataset of COVID-19 to assist the research process on the new coronavirus.
Abstract: Multi-task learning (MTL) is an open and challenging problem in various real-world applications. The typical way of conducting multi-task learning is to establish some global parameter sharing mechanism across all tasks or to assign each task an individual set of parameters with cross-connections between tasks. However, in most existing approaches, all tasks simply share all features, thoroughly or proportionally, without distinguishing which features are actually helpful. As a result, some tasks are disturbed by features that are unhelpful to them but useful for other tasks, leading to undesired negative transfer between tasks. In this paper, we design a novel architecture named the Multiple-level Sparse Sharing Model (MSSM), which can learn features selectively and share knowledge across all tasks efficiently. MSSM first employs a field-level sparse connection module (FSCM) to enable much more expressive combinations of feature fields to be learned for generalization across tasks, while still allowing task-specific features to be customized for each task. Furthermore, a cell-level sparse sharing module (CSSM) recognizes the sharing pattern through a set of coding variables that selectively choose which cells to route for a given task. Extensive experimental results on several real-world datasets show that MSSM significantly outperforms SOTA models in terms of AUC and LogLoss metrics.
Abstract: In this paper, we propose an augmented Graph Convolutional Network (GCN) mechanism wherein additional information of local interaction patterns between a node with its neighbors (specifically, in the form of distribution of cosine similarity values of a pre-trained node vector with its neighbors) is used to enrich a node's representation prior to training a GCN. This provides additional information about the structural properties of a node, which the standard convolution operation in a GCN can then leverage for obtaining potentially improved effectiveness in a down-stream task. Our experiments demonstrate that adding these node interaction patterns (NIPs) along with an additional noise-contrastive pairwise document similarity objective within a GCN improves the linked document classification task.
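As a rough illustration of the node interaction patterns (NIPs) mentioned above, one can histogram the cosine similarities between a pre-trained node vector and those of its neighbours and append that distribution to the node features before running the GCN. The binning and normalisation below are our own assumptions, not the paper's exact construction:

.. code-block:: python

   import numpy as np

   def nip_features(node_vecs, adjacency, n_bins=10):
       """For each node, the distribution (histogram) of cosine similarities
       between its pre-trained vector and those of its neighbours.

       node_vecs: (n, d) array of pre-trained node vectors
       adjacency: dict mapping node index -> list of neighbour indices
       """
       normed = node_vecs / (np.linalg.norm(node_vecs, axis=1, keepdims=True) + 1e-9)
       feats = np.zeros((len(node_vecs), n_bins))
       for v, nbrs in adjacency.items():
           if not nbrs:
               continue
           sims = normed[nbrs] @ normed[v]                  # cosine similarities in [-1, 1]
           hist, _ = np.histogram(sims, bins=n_bins, range=(-1.0, 1.0))
           feats[v] = hist / hist.sum()
       return feats

   x = np.random.randn(4, 16)
   print(nip_features(x, {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}))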
Abstract: Rapidly growing online podcast archives contain diverse content on a wide range of topics. These archives form an important resource for entertainment and professional use, but their value can only be realized if users can rapidly and reliably locate content of interest. Search for relevant content can be based on metadata provided by content creators, but also on transcripts of the spoken content itself. Excavating relevant content from deep within these audio streams for diverse types of information needs requires varying the approach to systems prototyping. We describe a set of diverse podcast information needs and different approaches to assessing retrieved content for relevance. We use these information needs in an investigation of the utility and effectiveness of these information sources. Based on our analysis, we recommend approaches for indexing and retrieving podcast content for ad hoc search.
Abstract: Extreme multi-label classification (XMLC) refers to the task of tagging instances with small subsets of relevant labels coming from an extremely large set of all possible labels. Recently, XMLC has been widely applied to diverse web applications such as automatic content labeling, online advertising, or recommendation systems. In such environments, the label distribution is often highly imbalanced, consisting mostly of very rare tail labels, and relevant labels can be missing. As a remedy to these problems, the propensity model has been introduced and applied within several XMLC algorithms. In this work, we focus on the problem of optimal predictions under this model for probabilistic label trees, a popular approach for XMLC problems. We introduce an inference procedure, based on the A*-search algorithm, that efficiently finds the optimal solution, assuming that all probabilities and propensities are known. We demonstrate the attractiveness of this approach in a wide empirical study on popular XMLC benchmark datasets.
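To give a feel for tree-based inference, here is a generic best-first traversal that returns the top-k leaves of a probabilistic label tree ranked by path probability. This is not the paper's procedure, which additionally incorporates propensities into the objective; it only illustrates why an A*-style search over such trees is efficient: since conditional probabilities are at most 1, a node's path probability upper-bounds every leaf below it, so the first k leaves popped are the k most probable ones.

.. code-block:: python

   import heapq

   def top_k_leaves(root, k):
       """Best-first search over a probabilistic label tree.

       Each node is a dict with 'prob' (conditional probability given its parent),
       an optional 'children' list, and a 'label' on leaves.
       """
       heap = [(-root['prob'], 0, root)]   # max-heap via negated priority; counter breaks ties
       counter, results = 1, []
       while heap and len(results) < k:
           neg_p, _, node = heapq.heappop(heap)
           if not node.get('children'):
               results.append((node['label'], -neg_p))   # leaf: emit its path probability
               continue
           for child in node['children']:
               heapq.heappush(heap, (neg_p * child['prob'], counter, child))
               counter += 1
       return results

   tree = {'prob': 1.0, 'children': [
       {'prob': 0.7, 'children': [{'prob': 0.9, 'label': 'a'}, {'prob': 0.2, 'label': 'b'}]},
       {'prob': 0.4, 'children': [{'prob': 0.8, 'label': 'c'}]}]}
   print(top_k_leaves(tree, k=2))   # [('a', 0.63), ('c', 0.32...)]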
Abstract: Distant supervision (DS) has been widely used to automatically construct (noisy) labeled data for relation extraction (RE). To address the noisy label problem, most models have adopted the multi-instance learning paradigm by representing entity pairs as a bag of sentences. However, this strategy depends on multiple assumptions (e.g., all sentences in a bag share the same relation), which may be invalid in real-world applications. Besides, it cannot work well on long-tail entity pairs which have few supporting sentences in the dataset. In this work, we propose a new paradigm named retrieval-augmented distantly supervised relation extraction (ReadsRE), which can incorporate large-scale open-domain knowledge (e.g., Wikipedia) into the retrieval step. ReadsRE seamlessly integrates a neural retriever and a relation predictor in an end-to-end framework. We demonstrate the effectiveness of ReadsRE on the well-known NYT10 dataset. The experimental results verify that ReadsRE can effectively retrieve meaningful sentences (i.e., denoise), and relieve the problem of long-tail entity pairs in the original dataset through incorporating external open-domain corpus. Through comparisons, we show ReadsRE outperforms other baselines for this task.
Abstract: Co-clustering of document-term matrices has proved to be more effective than one-sided clustering. By their nature, text data are also generally unbalanced and directional. Recently, the von Mises-Fisher (vMF) mixture model was proposed to handle unbalanced data while harnessing the directional nature of text. In this paper we propose a novel co-clustering approach based on a matrix formulation of vMF model-based co-clustering. This formulation leads to a flexible method for text co-clustering that can easily incorporate both word-word semantic relationships and document-document similarities. By contrast with existing methods, which generally use an additive incorporation of similarities, we propose a dual multiplicative regularization that better encapsulates the underlying text data structure. Extensive evaluations on various real-world text datasets demonstrate the superior performance of our proposed approach over baseline and competitive methods, both in terms of clustering results and co-cluster topic coherence.
Abstract: Click-through rate (CTR) prediction aims to recall the advertisements that users are interested in and to lead users to click, which is of critical importance for a variety of online advertising systems. In practice, CTR prediction is generally formulated as a conventional binary classification problem, where the clicked advertisements are positive samples and the others are negative samples. However, directly treating unclicked advertisements as negative samples suffers from a severe label noise issue, since there are many reasons why users are interested in certain advertisements but do not click them. To address this serious issue, we propose a reinforcement learning based noise filtering approach, dubbed RLNF, which employs a noise filter to select effective negative samples. In RLNF, the selected effective negative samples are used to enhance the CTR prediction model, and meanwhile the effectiveness of the noise filter is enhanced through reinforcement learning using the performance of the CTR prediction model as the reward. By alternating the enhancement of the noise filter and of the CTR prediction model, the performance of both is improved. In our experiments, we equip 7 state-of-the-art CTR prediction models with RLNF. Extensive experiments on a public dataset and an industrial dataset show that RLNF significantly improves the performance of all 7 CTR prediction models, indicating both the effectiveness and the generality of RLNF.
Abstract: Graph Neural Networks (GNNs) have achieved state-of-the-art performance in many high-impact applications such as fraud detection, information retrieval, and recommender systems due to their powerful representation learning capabilities. Some nascent efforts have been concentrated on simplifying the structures of GNN models, in order to reduce the computational complexity. However, the dynamic nature of these applications requires GNN structures to be evolving over time, which has been largely overlooked so far. To bridge this gap, in this paper, we propose a simplified and dynamic graph neural network model, called SDG. It is efficient, effective, and provides interpretable predictions. In particular, in SDG, we replace the traditional message-passing mechanism of GNNs with the designed dynamic propagation scheme based on the personalized PageRank tracking process. We conduct extensive experiments and ablation studies to demonstrate the effectiveness and efficiency of our proposed SDG. We also design a case study on fake news detection to show the interpretability of SDG.
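The personalized-PageRank-style propagation that replaces message passing can be sketched in a few lines (a generic formulation; SDG's tracking of the dynamically changing graph is more involved than this static version):

.. code-block:: python

   import numpy as np

   def ppr_propagate(adj, features, alpha=0.15, n_iter=50):
       """Propagate node features with a personalized-PageRank-style scheme:
       Z <- alpha * X + (1 - alpha) * A_hat @ Z, where A_hat is the row-normalized adjacency.
       """
       deg = adj.sum(axis=1, keepdims=True)
       a_hat = adj / np.maximum(deg, 1)          # row-normalized adjacency
       z = features.copy()
       for _ in range(n_iter):
           z = alpha * features + (1 - alpha) * a_hat @ z
       return z

   adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
   x = np.eye(3)
   print(ppr_propagate(adj, x))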
Abstract: Searching in a domain-specific corpus of structured documents (e.g., e-commerce, media streaming services, job-seeking platforms) is often managed as a traditional retrieval task or through faceted search. Semantic Query Labeling --- the task of locating the constituent parts of a query and assigning domain-specific predefined semantic labels to each of them --- allows leveraging the structure of documents during retrieval while leaving the keyword-based query formulation unaltered. Due to both the lack of a publicly available dataset and the high cost of producing one, there have been few published works in this regard. In this paper, based on the assumption that a corpus already contains the information the users search for, we propose a method for the automatic generation of semantically labeled queries and show that a semantic tagger --- based on BERT, gazetteer-based features, and Conditional Random Fields --- trained on our synthetic queries achieves results comparable to those obtained by the same model trained on real-world data. We also provide a large dataset of manually annotated queries in the movie domain suitable for studying Semantic Query Labeling. We hope that the public availability of this dataset will stimulate future research in this area.
Abstract: Leaderboards are a ubiquitous part of modern research in applied machine learning. By design, they sort entries into some linear order, where the top-scoring entry is recognized as the "state of the art" (SOTA). Due to the rapid progress being made today, particularly with neural models, the top entry in a leaderboard is replaced with some regularity. These are touted as improvements in the state of the art. Such pronouncements, however, are almost never qualified with significance testing. In the context of the MS MARCO document ranking leaderboard, we pose a specific question: How do we know if a run is significantly better than the current SOTA? Against the backdrop of recent IR debates on scale types, our study proposes an evaluation framework that explicitly treats certain outcomes as distinct and avoids aggregating them into a single-point metric. Empirical analysis of SOTA runs from the MS MARCO document ranking leaderboard reveals insights about how one run can be "significantly better" than another that are obscured by the current official evaluation metric (MRR@100).
Abstract: In neural Information Retrieval, ongoing research is directed towards improving the first retriever in ranking pipelines. Learning dense embeddings to conduct retrieval using efficient approximate nearest neighbors methods has proven to work well. Meanwhile, there has been a growing interest in learning sparse representations for documents and queries, that could inherit from the desirable properties of bag-of-words models such as the exact matching of terms and the efficiency of inverted indexes. In this work, we present a new first-stage ranker based on explicit sparsity regularization and a log-saturation effect on term weights, leading to highly sparse representations and competitive results with respect to state-of-the-art dense and sparse methods. Our approach is simple, trained end-to-end in a single stage. We also explore the trade-off between effectiveness and efficiency, by controlling the contribution of the sparsity regularization.
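The log-saturation effect and the sparsity penalty on term weights can be pictured with a small sketch (our own simplified formulation over a toy vocabulary; the actual model derives the term logits from a BERT-style masked-language-model head, which we do not reproduce here):

.. code-block:: python

   import torch

   def term_weights(logits):
       """Log-saturated, non-negative term weights over the vocabulary."""
       return torch.log1p(torch.relu(logits))

   def flops_regularizer(weights):
       """A FLOPS-style sparsity penalty: sum over terms of the squared mean
       activation within the batch, pushing average per-term usage towards zero."""
       return (weights.mean(dim=0) ** 2).sum()

   logits = torch.randn(4, 1000)   # batch of 4 "documents", toy vocabulary of 1000 terms
   w = term_weights(logits)
   print(w.gt(0).float().mean().item(), flops_regularizer(w).item())

Scaling the regularizer up or down then trades effectiveness against the sparsity (and hence efficiency) of the learned representations, which is the trade-off the abstract refers to.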
Abstract: Technology Assisted Review (TAR) aims to minimise the manual judgements required to identify relevant documents. Reductions in workload depend on a reviewer being able to make an informed decision about when to stop examining documents. Counting processes offer a theoretically sound approach to creating stopping criteria for TAR approaches based on analysis of the rate at which relevant documents are observed. This paper introduces two modifications to existing approaches: application of a Cox Process (a counting process which has not previously been used for this problem) and use of a rate function based on a power law. Experiments on the CLEF 2017 e-Health TAR collection demonstrate that these approaches produce results superior to those reported previously.
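As background for the rate-function modification, a power-law rate over document rank can be written as follows (an illustrative form; the paper's exact parameterisation and the Cox-process machinery are richer than this):

.. math::

   \lambda(i) = a\, i^{-b},
   \qquad
   \mathbb{E}\big[\#\{\text{relevant documents in ranks } 1,\dots,n\}\big]
   = \sum_{i=1}^{n} \lambda(i) \;\approx\; \int_{1}^{n} a\, x^{-b}\, dx,

so fitting a and b to the relevant documents seen so far yields an estimate of how many relevant documents remain beyond rank n, which can then drive the stopping decision.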
Abstract: Deep hashing methods have been intensively studied and successfully applied to massive, fast image retrieval. However, inheriting the deficiencies of deep neural networks, deep hashing models can be easily fooled by adversarial examples, which brings a serious security risk to hashing based retrieval. In this paper, we propose a novel targeted attack method and the first defense scheme for deep hashing based retrieval. Specifically, a simple yet effective PrototypeNet is designed to generate a category-level semantic embedding (dubbed prototype code) regarded as the semantic representative of the target label, which preserves semantic similarity with relevant labels and dissimilarity with irrelevant labels. Subsequently, we conduct the targeted attack by minimizing the Hamming distance between the hash code of the adversarial sample and the prototype code. Moreover, we provide an adversarial training algorithm to improve the adversarial robustness of deep hashing networks. Extensive experiments demonstrate that our method can produce high-quality adversarial samples with superior targeted attack performance over state-of-the-art methods. Importantly, our adversarial defense framework can significantly boost the robustness of hashing networks against adversarial attacks on deep hashing based retrieval. The code is available at https://github.com/xunguangwang/Targeted-Attack-and-Defense-for-Deep-Hashing.
Abstract: We propose VADEC, a multi-task framework that exploits the correlation between the categorical and dimensional models of emotion representation for better subjectivity analysis. Focusing primarily on the effective detection of emotions from tweets, we jointly train multi-label emotion classification and multi-dimensional emotion regression, thereby utilizing the inter-relatedness between the tasks. Co-training especially helps in improving the performance of the classification task as we outperform the strongest baselines with 3.4%, 11%, and 3.9% gains in Jaccard Accuracy, Macro-F1, and Micro-F1 scores respectively on the AIT dataset [17]. We also achieve state-of-the-art results with 11.3% gains averaged over six different metrics on the SenWave dataset [27]. For the regression task, VADEC, when trained with SenWave, achieves 7.6% and 16.5% gains in Pearson Correlation scores over the current state-of-the-art on the EMOBANK dataset [5] for the Valence (V) and Dominance (D) affect dimensions respectively. We conclude our work with a case study on COVID-19 tweets posted by Indians that further helps in establishing the efficacy of our proposed solution.
Abstract: Unsupervised ensemble learning aims to estimate ground-truth labels by integrating noisy and unreliable labeling results from multiple annotators. Although many techniques have been proposed to deal with this challenging task, there still exist some "tough" instances with noisy labels that are misclassified after the integration, which significantly affects the classification performance. This paper introduces a novel approach to improve label accuracy based on unsupervised ensemble learning. First, we apply the expectation maximization (EM) algorithm to aggregate labels for all the instances. Then we identify the instances that are most likely to be "tough" through a two-stage filtering method. Finally, an ensemble of AdaBoost-based classification models is trained on the high-quality dataset and predicts new labels for these "tough" instances. The results of an empirical investigation on a binary classification task show that: (1) our approach can identify "tough" instances from the input dataset effectively; (2) our approach achieves better performance in improving the accuracy of labels produced by unsupervised ensemble algorithms.
Abstract: Supervised summarization has made significant improvements in recent years by leveraging cutting-edge deep learning technologies. However, the success of supervised methods relies on the availability of a large quantity of human-generated summaries of documents, which is highly costly and difficult to obtain in general. This paper proposes an unsupervised approach to extractive text summarization, which uses an automatically constructed sentence graph from each document to select salient sentences for summarization based on both the similarities and the relative distances in the neighborhood of each sentence. We further generalize our approach from single-document summarization to a multi-document setting by aggregating document-level graphs via proximity-based cross-document edges. In our experiments on benchmark datasets, the proposed approach achieves competitive or better results than previous state-of-the-art unsupervised extractive summarization methods in both single-document and multi-document settings, and its performance is competitive with strong supervised baselines.
Abstract: Searchers often make a choice in a matter of seconds on SERPs. As the result of a dynamic cognitive process, choice is ultimately reflected in motor movement and can thus be modeled by tracking the computer mouse. However, because not all movements have equal value, it is important to understand how they and, critically, their sequence length impact model performance. We study three different SERP scenarios where searchers (1) noticed an advertisement, (2) abandoned the page, and (3) became frustrated. We model these scenarios with recurrent neural nets and study the effect of padding and truncating mouse sequences to different lengths. We find that it is sometimes possible to predict the aforementioned tasks using just 2 seconds of movement. Ultimately, by efficiently recording the right amount of data, we can save valuable bandwidth and storage, respect users' privacy, and increase the speed at which machine learning models can be trained and deployed. Considering the web scale, doing so will have a net benefit on our environment.
Abstract: BlockMax WAND (BMW) and its variants can effectively prune low-scoring documents for fast top-k disjunctive query processing. This paper studies a boosting approach that further accelerates document retrieval by executing BMW, or one of its variants, on a sequence of posting windows whose order is prioritized to tighten the threshold bound earlier. This optimization adds opportunities to safely eliminate more of the operations involved in posting-block visitation and document score evaluation. This paper evaluates such index navigation for BMW and two of its variants.
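A stripped-down view of the window-ordering idea follows. The data structures are hypothetical and, for simplicity, each document's full score is assumed to live in a single window; real BMW operates on per-term block-max posting lists with accumulated scores and safe skipping rules, which this sketch does not model:

.. code-block:: python

   import heapq

   def prioritized_window_processing(windows, k):
       """Process posting windows in descending order of their score upper bound,
       so the top-k threshold tightens as early as possible and later windows whose
       bound falls below it can be skipped.

       windows: list of (upper_bound, [(doc_id, score), ...]) tuples
       """
       topk, skipped = [], 0
       for bound, postings in sorted(windows, key=lambda w: -w[0]):
           if len(topk) == k and bound <= topk[0][0]:
               skipped += 1                     # bound cannot beat the current threshold
               continue
           for doc_id, score in postings:
               if len(topk) < k:
                   heapq.heappush(topk, (score, doc_id))
               elif score > topk[0][0]:
                   heapq.heapreplace(topk, (score, doc_id))
       return sorted(topk, reverse=True), skipped

   wins = [(0.9, [(1, 0.8), (2, 0.5)]), (0.4, [(3, 0.3)]), (0.2, [(4, 0.1)])]
   print(prioritized_window_processing(wins, k=2))   # the two low-bound windows are skipped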
Abstract: Named entity processing over historical texts is increasingly used due to the massive numbers of documents and archives being stored in digital libraries. However, because annotated resources of a historical nature are scarce, information extraction performance falls behind that on contemporary texts. In this paper, we introduce the development of the NewsEye resource, a multilingual dataset for named entity recognition and linking enriched with stances towards named entities. The dataset comprises diachronic historical newspaper material published between 1850 and 1950 in French, German, Finnish, and Swedish. Such a historical resource is essential in the context of developing and evaluating named entity processing systems. It also allows enhancing the performance of existing approaches on historical documents, which enables adequate and efficient semantic indexing of historical documents in digital cultural heritage collections.
Abstract: Deep Learning Hard (DL-HARD) is a new annotated dataset designed to more effectively evaluate neural ranking models on complex topics. It builds on TREC Deep Learning (DL) topics by extensively annotating them with question intent categories, answer types, wikified entities, topic categories, and result type metadata from a commercial web search engine. Based on this data, we introduce a framework for identifying challenging queries. DL-HARD contains fifty topics from the official DL 2019/2020 evaluation benchmark, half of which are newly and independently assessed. We perform experiments using the official submitted runs to DL on DL-HARD and find substantial differences in metrics and the ranking of participating systems. Overall, DL-HARD is a new resource that promotes research on neural ranking methods by focusing on challenging and complex topics.
Abstract: Legal case retrieval is of vital importance for ensuring justice in different kinds of law systems and has recently received increasing attention in information retrieval (IR) research. However, the relevance judgment criteria of previous retrieval datasets are either not applicable to non-cited relationship cases or not instructive enough for future datasets to follow. Besides, most existing benchmark datasets do not focus on the selection of queries. In this paper, we construct the Chinese Legal Case Retrieval Dataset (LeCaRD), which contains 107 query cases and over 43,000 candidate cases. Queries and results are adopted from criminal cases published by the Supreme People's Court of China. In particular, to address the difficulty in relevance definition, we propose a series of relevance judgment criteria designed by our legal team and corresponding candidate case annotations are conducted by legal experts. Also, we develop a novel query sampling strategy that takes both query difficulty and diversity into consideration. For dataset evaluation, we implemented several existing retrieval models on LeCaRD as baselines. The dataset is now available to the public together with the complete data processing details.
Abstract: In information retrieval (IR), documents that match the query are retrieved. Search engines usually conflate word variants into a common stem when indexing documents because queries and documents do not need to use exactly the same word variant for the documents to be relevant. Stemmers are known to be effective in many languages for IR. However, there are still languages where stemmers or morphological analyzers are missing; this is the case for Amharic which is the working language of Ethiopia. Morphological analysis is the key to derive stems, roots (primary lexical units) and grammatical markers of words such as person, tense and negation markers. This paper presents morphologically annotated Amharic lexicons as well as stem-based and root-based morphologically annotated corpora which could be used by the research community as benchmark collections either to evaluate morphological analyzers or information retrieval for Amharic. Such resources are believed to foster research in Amharic IR.
Abstract: Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. It aims to provide effective, reproducible, and easy-to-use first-stage retrieval in a multi-stage ranking architecture. Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, pre-built indexes, and evaluation scripts for many commonly used IR test collections. We aim to support, out of the box, the entire research lifecycle of efforts aimed at improving ranking with modern neural approaches. In particular, Pyserini supports sparse retrieval (e.g., BM25 scoring using bag-of-words representations), dense retrieval (e.g., nearest-neighbor search on transformer-encoded representations), as well as hybrid retrieval that integrates both approaches. This paper provides an overview of toolkit features and presents empirical results that illustrate its effectiveness on two popular ranking tasks. Around this toolkit, our group has built a culture of reproducibility through shared norms and tools that enable rigorous automated testing.
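For readers unfamiliar with the toolkit, sparse first-stage retrieval with a pre-built index looks roughly like this; the API names follow the Pyserini documentation from this period and may differ in later releases:

.. code-block:: python

   from pyserini.search import SimpleSearcher

   # download a pre-built BM25 index and run bag-of-words retrieval
   searcher = SimpleSearcher.from_prebuilt_index('msmarco-passage')
   hits = searcher.search('what is information retrieval', k=10)
   for i, hit in enumerate(hits):
       print(f'{i + 1:2} {hit.docid:15} {hit.score:.4f}')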
Abstract: Experimental validation is key to the development of Information Retrieval (IR) systems. The standard evaluation paradigm requires a test collection with documents, queries, and relevance judgments. Creating test collections requires significant human effort, mainly for providing relevance judgments. As a result, there are still many domains and languages that, to this day, lack a proper evaluation testbed. Portuguese is an example of a major world language that has been overlooked in terms of IR research -- the only test collection available is composed of news articles from 1994 and a hundred queries. With the aim of bridging this gap, in this paper, we developed REGIS (Retrieval Evaluation for Geoscientific Information Systems), a test collection for the geoscientific domain in Portuguese. REGIS contains 20K documents and 34 query topics along with relevance assessments. We describe the procedures for document collection, topic creation, and relevance assessment. In addition, we report on results of standard IR techniques on REGIS so that they can serve as a baseline for future research.
Abstract: The TREC Deep Learning (DL) Track studies ad hoc search in the large data regime, meaning that a large set of human-labeled training data is available. Results so far indicate that the best models with large data may be deep neural networks. This paper supports the reuse of the TREC DL test collections in three ways. First we describe the data sets in detail, documenting clearly and in one place some details that are otherwise scattered in track guidelines, overview papers and in our associated MS MARCO leaderboard pages. We intend this description to make it easy for newcomers to use the TREC DL data. Second, because there is some risk of iteration and selection bias when reusing a data set, we describe the best practices for writing a paper using TREC DL data, without overfitting. We provide some illustrative analysis. Finally we address a number of issues around the TREC DL data, including an analysis of reusability.
Abstract: In IR evaluation based on depth-k pooling, there are several strategies to order the pooled documents for relevance assessors. Among them, the simplest approach is to completely randomise the order "so assessors cannot tell if a document was highly ranked by some system or how many systems (or which systems) retrieved the document." An approach that is in sharp contrast to the above is the prioritisation approach taken by NTCIRPOOL, a tool widely used at NTCIR. NTCIRPOOL sorts the pooled documents by "pseudorelevance," a statistic that reflects the popularity of each document within the depth-k pools. Although these two strategies have coexisted for over two decades, the IR research community has yet to reach a consensus as to what advantages each of these two strategies actually offer. To help researchers directly address this question using their favourite methods of analysis, we have released a large-scale data set called WWW3E8. It comprises eight independent sets of qrels for the 160 English topics of the NTCIR-15 WWW-3 task: four qrels files constructed using the randomisation approach, and another four constructed using the prioritisation approach of NTCIRPOOL. Each qrels file covers 32,375 topic-document pairs; hence, WWW3E8 contains a total of 259,000 relevance labels. Moreover, the data set contains the raw English subtask run files from the WWW-3 task, the randomised and prioritised pool files, and topic-by-run score matrices of the official measures used in the task. Hence, researchers interested in the above research question regarding document ordering can utilise WWW3E8 as a common ground to directly compare the two strategies.
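The two orderings being compared can be reproduced in a few lines, taking pseudorelevance simply as the number of pooled runs that retrieved a document (a simplification; NTCIRPOOL's exact statistic may weight ranks differently):

.. code-block:: python

   import random
   from collections import Counter

   def order_pool(run_rankings, depth, prioritise=True, seed=0):
       """run_rankings: list of ranked docid lists, one per run.
       Pool the top-`depth` documents of every run, then order them either by
       pseudorelevance (how many runs retrieved them) or completely at random."""
       pooled = Counter()
       for ranking in run_rankings:
           pooled.update(set(ranking[:depth]))
       docs = list(pooled)
       if prioritise:
           docs.sort(key=lambda d: -pooled[d])       # prioritisation approach
       else:
           random.Random(seed).shuffle(docs)         # randomisation approach
       return docs

   runs = [['d1', 'd2', 'd3'], ['d2', 'd4', 'd1'], ['d2', 'd5', 'd6']]
   print(order_pool(runs, depth=2))                  # d2 first: retrieved by all three runs
   print(order_pool(runs, depth=2, prioritise=False))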
Abstract: AMUSE (Advanced MUSic Explorer) was created in 2006 as an open-source Java framework for various music information retrieval tasks like feature extraction, feature processing, classification, and evaluation. In contrast to toolboxes which focus on individual MIR-related algorithms, it is possible with AMUSE, for instance, to extract features with Librosa, process them based on events estimated by MIRtoolbox, classify with WEKA or Keras, and validate the models with one's own classification performance measures. We present several substantial contributions to AMUSE since its first presentation at ISMIR 2010. They include the annotation editor for single and multiple tracks, the support of multi-label and multi-class classification, and new plugins which operate with Keras, Librosa, and Sonic Annotator. Other integrated methods are the structural complexity processing, the chord vector feature, the aggregation of features around estimated onset events, and the evaluation of time event extractors. Further advancements include more flexible feature extraction with different parameters like frame sizes, the possibility to integrate additional tasks beyond algorithms related to supervised classification, the marking of features which can be ignored for a classification task, the extension of algorithm parameters with external code (e.g., the structure of a Keras neural net), etc.
Abstract: Machine understanding of user utterances in conversational systems is of utmost importance for enabling engaging and meaningful conversations with users. Entity Linking (EL) is one of the means of text understanding, with proven efficacy for various downstream tasks in information retrieval. In this paper, we study entity linking for conversational systems. To develop a better understanding of what EL in a conversational setting entails, we analyze a large number of dialogues from existing conversational datasets and annotate references to concepts, named entities, and personal entities using crowdsourcing. Based on the annotated dialogues, we identify the main characteristics of conversational entity linking. Further, we report on the performance of traditional EL systems on our Conversational Entity Linking dataset, ConEL, and present an extension to these methods to better fit the conversational setting. The resources released with this paper include annotated datasets, detailed descriptions of crowdsourcing setups, as well as the annotations produced by various EL systems. These new resources allow for an investigation of how the role of entities in conversations is different from that in documents or isolated short text utterances like queries and tweets, and complement existing conversational datasets.
Abstract: The amount of near-duplicates in web crawls like the ClueWeb or Common Crawl demands from their users either to develop a preprocessing pipeline for deduplication, which is costly both computationally and in person hours, or to accept the undesired effects that near-duplicates have on the reliability and validity of experiments. We introduce ChatNoir-CopyCat-21, which simplifies deduplication significantly. It comes in two parts: (1) a compilation of near-duplicate documents within the ClueWeb09, the ClueWeb12, and two Common Crawl snapshots, as well as between selections of these crawls, and (2) a software library that implements the deduplication of arbitrary document sets. Our analysis shows that 14-52% of the documents within a crawl and around 0.7-2.5% between the crawls are near-duplicates. Two showcases demonstrate the application and usefulness of our resource.
Abstract: Recommender systems have been shown to be an effective way to alleviate the over-choice problem and provide accurate and tailored recommendations. However, the impressive number of proposed recommendation algorithms, splitting strategies, evaluation protocols, metrics, and tasks has made rigorous experimental evaluation particularly challenging. Puzzled and frustrated by the continuous recreation of appropriate evaluation benchmarks, experimental pipelines, hyperparameter optimization, and evaluation procedures, we have developed an exhaustive framework to address such needs. Elliot is a comprehensive recommendation framework that aims to run and reproduce an entire experimental pipeline by processing a simple configuration file. The framework loads, filters, and splits the data considering a vast set of strategies (13 splitting methods and 8 filtering approaches, from temporal training-test splitting to nested K-fold cross-validation). Elliot (https://github.com/sisinflab/elliot) optimizes hyperparameters (51 strategies) for several recommendation algorithms (50), selects the best models, compares them with the baselines providing intra-model statistics, computes metrics (36) spanning from accuracy to beyond-accuracy, bias, and fairness, and conducts statistical analysis (Wilcoxon and paired t-test).
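As a minimal sketch of the configuration-driven workflow described above: the run_experiment entry point follows Elliot's documented usage, while the configuration file path is hypothetical and its contents should follow the schema in the Elliot documentation.

.. code-block:: python

    # Minimal sketch of driving an Elliot experiment from Python (illustrative only).
    # The referenced YAML file is hypothetical; its keys (dataset, splitting, models,
    # metrics, top_k, ...) must follow the schema documented by Elliot.
    from elliot.run import run_experiment

    run_experiment("config_files/my_experiment.yml")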
Abstract: There is increasing recognition of the need for human-centered AI that learns from human feedback. However, most current AI systems focus more on the model design, but less on human participation as part of the pipeline. In this work, we propose a Human-in-the-Loop (HitL) graph reasoning paradigm and develop a corresponding dataset named HOOPS for the task of KG-driven conversational recommendation. Specifically, we first construct a KG interpreting diverse user behaviors and identify pertinent attribute entities for each user-item pair. Then we simulate conversational turns that reflect the human decision-making process of choosing suitable items by transparently tracing the KG structure. We also provide a benchmark method with reported performance on the dataset to ascertain the feasibility of HitL graph reasoning for recommendation, and show that it provides novel opportunities for the research community.
Abstract: Shared text collections continue to be vital infrastructure for IR research. The COVID-19 pandemic offered an opportunity to create a test collection that captured the rapidly changing information space during a pandemic, and the TREC-COVID effort was created to build such a collection using the TREC framework. This paper examines the quality of the resulting TREC-COVID test collections, and in doing so, offers a critique of the state-of-the-art in building reusable IR test collections. The largest of the collections, called 'TREC-COVID Complete', is found to be on par with previous TREC ad hoc collections, with existing quality tests uncovering no apparent problems. Yet the lack of any way to definitively demonstrate the collection's quality and its violation of previously used quality heuristics suggest much work remains to be done to understand the factors affecting collection quality.
Abstract: Managing the data for Information Retrieval (IR) experiments can be challenging. Dataset documentation is scattered across the Internet and once one obtains a copy of the data, there are numerous different data formats to work with. Even basic formats can have subtle dataset-specific nuances that need to be considered for proper use. To help mitigate these challenges, we introduce a new robust and lightweight tool (ir_datasets) for acquiring, managing, and performing typical operations over datasets used in IR. We primarily focus on textual datasets used for ad-hoc search. This tool provides both a Python and command line interface to numerous IR datasets and benchmarks. To our knowledge, this is the most extensive tool of its kind. Integrations with popular IR indexing and experimentation toolkits demonstrate the tool's utility. We also provide documentation of these datasets through the ir_datasets catalog: https://ir-datasets.com/. The catalog acts as a hub for information on datasets used in IR, providing core information about what data each benchmark provides as well as links to more detailed information. We welcome community contributions and intend to continue to maintain and grow this tool.
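As a concrete illustration of the Python interface mentioned above, here is a minimal sketch of loading a dataset and iterating over its queries and relevance judgments; "msmarco-passage/dev/small" is just one of the many identifiers listed in the catalog.

.. code-block:: python

    # Minimal sketch of the ir_datasets Python API; see https://ir-datasets.com/ for the
    # full catalog of dataset identifiers and per-dataset field descriptions.
    import ir_datasets

    dataset = ir_datasets.load("msmarco-passage/dev/small")

    for query in dataset.queries_iter():
        print(query.query_id, query.text)     # namedtuple fields vary by dataset

    for qrel in dataset.qrels_iter():
        print(qrel.query_id, qrel.doc_id, qrel.relevance)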
Abstract: Wikipedia is the largest online encyclopedia, used by algorithms and web users as a central hub of reliable information on the web. The quality and reliability of Wikipedia content is maintained by a community of volunteer editors. Machine learning and information retrieval algorithms could help scale up editors' manual efforts around Wikipedia content reliability. However, there is a lack of large-scale data to support the development of such research. To fill this gap, in this paper, we propose Wiki-Reliability, the first dataset of English Wikipedia articles annotated with a wide set of content reliability issues. To build this dataset, we rely on Wikipedia "templates". Templates are tags used by expert Wikipedia editors to indicate content issues, such as the presence of "non-neutral point of view" or "contradictory articles", and serve as a strong signal for detecting reliability issues in a revision. We select the 10 most popular reliability-related templates on Wikipedia, and propose an effective method to label almost 1M samples of Wikipedia article revisions as positive or negative with respect to each template. Each positive/negative example in the dataset comes with the full article text and 20 features from the revision's metadata. We provide an overview of the possible downstream tasks enabled by such data, and show that Wiki-Reliability can be used to train large-scale models for content reliability prediction. We release all data and code for public use.
Abstract: The milestone improvements brought about by deep representation learning and pre-training techniques have led to large performance gains across downstream NLP, IR and Vision tasks. Multimodal modeling techniques aim to leverage large high-quality visio-linguistic datasets for learning complementary information across image and text modalities. In this paper, we introduce the Wikipedia-based Image Text (WIT) Dataset to better facilitate multimodal, multilingual learning. WIT is composed of a curated set of 37.5 million entity-rich image-text examples with 11.5 million unique images across 108 Wikipedia languages. Its size enables WIT to be used as a pretraining dataset for multimodal models, as we show when applied to downstream tasks such as image-text retrieval. WIT has four main and unique advantages. First, WIT is the largest multimodal dataset by the number of image-text examples, by a factor of 3x (at the time of writing). Second, WIT is massively multilingual (first of its kind) with coverage over 100+ languages (each of which has at least 12K examples) and provides cross-lingual texts for many images. Third, WIT represents a more diverse set of concepts and real world entities relative to what previous datasets cover. Lastly, WIT provides a very challenging real-world test set, as we empirically illustrate using an image-text retrieval task as an example. WIT Dataset is available for download and use via a Creative Commons license here: https://github.com/google-research-datasets/wit.
Abstract: This paper introduces a new test collection for ad-hoc dataset retrieval, which has been developed through a shared task called Data Search at the fifteenth NTCIR. This test collection consists of dataset collections derived from the US and Japanese governments' open data sites (i.e., Data.gov and e-Stat), as well as English and Japanese topics for these collections. In organizing the shared task at NTCIR, we conducted relevance judgments for datasets retrieved by 74 search systems, and included them in the test collection. In addition to the detailed description of the test collection, we conducted an in-depth analysis on the test collection, and revealed (1) what techniques were used and effective, (2) what topics were difficult, and (3) that there is large topic variability in the dataset retrieval task.
Abstract: We introduce a novel dataset of real multi-destination trips booked through Booking.com's online travel platform. The dataset consists of 1.5 million reservations representing 359,000 unique journeys made across 39,000 destinations. As such, the data is particularly well suited to model sequential recommendation and retrieval problems in a high cardinality target space. To preserve user privacy and protect business-sensitive statistics, the data is fully anonymized, sampled and limited to five user origin markets. Even so, the dataset is representative of the general travel purchase behavior and therefore presents a uniquely valuable resource for Machine Learning and information retrieval researchers. This work provides an overview of the dataset. It reports several benchmark results for relevant recommendation problems obtained as part of the recently held Booking.com data challenge during the WSDM WebTour workshop.
Abstract: Recently, research on explainable recommender systems has drawn much attention from both academia and industry, resulting in a variety of explainable models. As a consequence, their evaluation approaches vary from model to model, which makes it quite difficult to compare the explainability of different models. To achieve a standard way of evaluating recommendation explanations, we provide three benchmark datasets for EXplanaTion RAnking (denoted as EXTRA), on which explainability can be measured by ranking-oriented metrics. Constructing such datasets, however, poses great challenges. First, user-item-explanation triplet interactions are rare in existing recommender systems, so how to find alternatives becomes a challenge. Our solution is to identify nearly identical sentences from user reviews. This idea then leads to the second challenge, i.e., how to efficiently categorize the sentences in a dataset into different groups, since estimating the similarity between all pairs of sentences has quadratic runtime complexity. To mitigate this issue, we provide a more efficient method based on Locality Sensitive Hashing (LSH) that can detect near-duplicates in sub-linear time for a given query. Moreover, we make our code publicly available to allow researchers in the community to create their own datasets.
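To make the LSH step concrete, here is a generic near-duplicate grouping sketch using MinHash LSH via the datasketch library; it illustrates the sub-linear querying technique in general and is not the authors' implementation.

.. code-block:: python

    # Generic MinHash-LSH near-duplicate detection (illustrative; not EXTRA's code).
    from datasketch import MinHash, MinHashLSH

    sentences = {
        "s1": "great battery life and sound quality",
        "s2": "battery life and sound quality are great",
        "s3": "the screen scratches far too easily",
    }

    def minhash(text, num_perm=128):
        m = MinHash(num_perm=num_perm)
        for token in text.lower().split():
            m.update(token.encode("utf8"))
        return m

    lsh = MinHashLSH(threshold=0.5, num_perm=128)
    signatures = {sid: minhash(text) for sid, text in sentences.items()}
    for sid, sig in signatures.items():
        lsh.insert(sid, sig)

    # Querying returns candidate near-duplicates without comparing against every sentence.
    print(lsh.query(signatures["s1"]))        # e.g., ['s1', 's2']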
Abstract: Natural language dialogue systems have attracted great attention recently. As many dialogue models are data-driven, high-quality datasets are essential to these systems. In this paper, we introduce Pchatbot, a large-scale dialogue dataset that contains two subsets collected from Weibo and Judicial forums respectively. To adapt the raw dataset to dialogue systems, we elaborately normalize the raw dataset via processes such as anonymization, deduplication, segmentation, and filtering. The scale of Pchatbot is significantly larger than existing Chinese datasets, which might benefit data-driven models. Besides, current dialogue datasets for personalized chatbots usually contain several persona sentences or attributes. Different from existing datasets, Pchatbot provides anonymized user IDs and timestamps for both posts and responses. This enables the development of personalized dialogue models that directly learn implicit user personality from the user's dialogue history. Our preliminary experimental study benchmarks several state-of-the-art dialogue models to provide a comparison for future work. The dataset can be publicly accessed at Github: https://github.com/qhjqhj00/Pchatbot.
Abstract: This paper presents a test collection for contextual point of interest (POI) recommendation in a narrative-driven scenario. There, user history is not available; instead, user requests are described in natural language. The requests in our collection are manually collected from social sharing websites, and are annotated with various types of metadata, including location, categories, constraints, and example POIs. These requests are to be resolved from a dataset of POIs, which are collected from a popular online directory, and are further linked to a geographical knowledge base and enriched with relevant web snippets. Graded relevance assessments are collected using crowdsourcing, by pooling both manual and automatic recommendations, where the latter serve as baselines for future performance comparison. This resource supports the development of novel approaches for end-to-end POI recommendation as well as for specific semantic annotation tasks on natural language requests.
Abstract: The harvesting, management, and analysis of thematic document collections is a major challenge in a wide variety of applications. While the criteria for compiling such collections are individual, the entire process is largely standardized. Therefore, it is not efficient to build new systems over and over again to take over these tasks. In this work, we introduce Seer-Dock, a novel and easy-to-deploy general-purpose dockerized framework to build a scholarly document harvesting and management system. It is based on CiteSeerX, the most widely used scholarly search engine. Seer-Dock uses Docker containers for all components and thus enables its users to rapidly deploy a full-fledged document collection and management system on any operating system platform and tailor it to the specific needs of an application domain. Moreover, it is easy to scale, orchestrate, maintain, and recover. In this resource paper, we introduce the architecture of Seer-Dock and its components. Like its kernel CiteSeerX, Seer-Dock is available under an Apache 2 open source license.
Abstract: Multimodal IR, spanning text corpus, knowledge graph and images, called outside knowledge visual question answering (OKVQA), is of much recent interest. However, the popular data set has serious limitations. A surprisingly large fraction of queries do not assess the ability to integrate cross-modal information. Instead, some are independent of the image, some depend on speculation, some require OCR or are otherwise answerable from the image alone. To add to the above limitations, frequency-based guessing is very effective because of (unintended) widespread answer overlaps between the train and test folds. Overall, it is hard to determine when state-of-the-art systems exploit these weaknesses rather than really infer the answers, because they are opaque and their 'reasoning' process is uninterpretable. An equally important limitation is that the dataset is designed for the quantitative assessment only of the end-to-end answer retrieval task, with no provision for assessing the correct (semantic) interpretation of the input query. In response, we identify a key structural idiom in OKVQA, viz., S3 (select, substitute and search), and build a new data set and challenge around it. Specifically, the questioner identifies an entity in the image and asks a question involving that entity which can be answered only by consulting a knowledge graph or corpus passage mentioning the entity. Our challenge consists of (i) OKVQA_S3, a subset of OKVQA annotated based on the structural idiom and (ii) S3VQA, a new dataset built from scratch. We also present a neural but structurally transparent OKVQA system, S3, that explicitly addresses our challenge dataset, and outperforms recent competitive baselines. We make our code and data available at https://s3vqa.github.io/.
Abstract: Evaluation is crucial in the development process of task-oriented dialogue systems. As an evaluation method, user simulation allows us to tackle issues such as scalability and cost-efficiency, making it a viable choice for large-scale automatic evaluation. To help build a human-like user simulator that can measure the quality of a dialogue, we propose the following task: simulating user satisfaction for the evaluation of task-oriented dialogue systems. The purpose of the task is to increase the evaluation power of user simulations and to make the simulation more human-like. To overcome a lack of annotated data, we propose a user satisfaction annotation dataset, USS, that includes 6,800 dialogues sampled from multiple domains, spanning real-world e-commerce dialogues, task-oriented dialogues constructed through Wizard-of-Oz experiments, and movie recommendation dialogues. All user utterances in those dialogues, as well as the dialogues themselves, have been labeled based on a 5-level satisfaction scale. We also share three baseline methods for user satisfaction prediction and action prediction tasks. Experiments conducted on the USS dataset suggest that distributed representations outperform feature-based methods. A model based on hierarchical GRUs achieves the best performance in in-domain user satisfaction prediction, while a BERT-based model has better cross-domain generalization ability.
Abstract: Click logs are valuable resources for a variety of information retrieval (IR) tasks. This includes query understanding/analysis, as well as learning effective IR models particularly when the models require large amounts of training data. We release a large-scale domain-specific dataset of click logs, obtained from user interactions of the Trip Database health web search engine. Our click log dataset comprises approximately 5.2 million user interactions collected between 2013 and 2020. We use this dataset to create a standard IR evaluation benchmark - TripClick - with around 700,000 unique free-text queries and 1.3 million pairs of query-document relevance signals, whose relevance is estimated by two click-through models. As such, the collection is one of the few datasets offering the necessary data richness and scale to train neural IR models with a large amount of parameters, and notably the first in the health domain. Using TripClick, we conduct experiments to evaluate a variety of IR models, showing the benefits of exploiting this data to train neural architectures. In particular, the evaluation results show that the best performing neural IR model significantly improves the performance by a large margin relative to classical IR models, especially for more frequent queries.
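As a simplified illustration of how click logs can be turned into query-document relevance signals, the following computes a naive click-through-rate estimate; the actual TripClick benchmark uses two dedicated click models, not this heuristic.

.. code-block:: python

    # Naive CTR-based relevance estimation from raw click logs (illustrative only).
    from collections import defaultdict

    log = [  # (query, shown_doc, clicked)
        ("back pain", "doc1", True),
        ("back pain", "doc1", False),
        ("back pain", "doc2", False),
    ]

    shows = defaultdict(int)
    clicks = defaultdict(int)
    for query, doc, clicked in log:
        shows[(query, doc)] += 1
        clicks[(query, doc)] += int(clicked)

    ctr = {pair: clicks[pair] / shows[pair] for pair in shows}
    print(ctr)   # {('back pain', 'doc1'): 0.5, ('back pain', 'doc2'): 0.0}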
Abstract: We describe the development, characteristics and availability of a test collection for the task of Web table retrieval, which uses the large-scale Web Table Corpora extracted from the Common Crawl. Since a Web table usually has rich context information such as the page title and surrounding paragraphs, we not only provide relevance judgments of query-table pairs, but also relevance judgments of query-table context pairs with respect to a query, which are ignored by previous test collections. To facilitate future research with this benchmark, we provide details about how the dataset is pre-processed and also baseline results from both traditional and recently proposed table retrieval methods. Our experimental results show that proper usage of context labels can benefit previous table retrieval methods.
Abstract: Chatty Goose is an open-source Python conversational search framework that provides strong, reproducible reranking pipelines built on recent advances in neural models. The framework comprises extensible modular components that integrate with popular libraries such as Transformers by HuggingFace and ParlAI by Facebook. Our aim is to lower the barrier of entry for research in conversational search by providing reproducible baselines that researchers can build on top of. We provide an overview of the framework and demonstrate how to instantiate a new system from scratch. Chatty Goose incorporates improvements to components that we introduced in the TREC 2019 Conversational Assistance Track (CAsT), where our submission represented the top-performing system. Using our framework, a comparable run can be reproduced with just a few lines of code.
Abstract: Dark jargon terms are an essential part of underground conversation. They are benign-looking, but have hidden, sometimes sinister meanings and are used by participants of underground forums for illicit behavior. For example, the dark term "rat" is often used in lieu of "Remote Access Trojan". We present a novel online platform that caters to the understanding of underground conversation with latent meaning. Our system enables researchers, law enforcement agents and "white-hat" hackers to gain invaluable insights into underground communication by providing them with a tool to (1) look up dark jargon terms in a dictionary; (2) explore the usage of dark jargon over time and interpret their meaning; (3) collaborate and contribute their own research findings. Furthermore, we introduce a novel dark jargon interpretation method that leverages masked language modeling with a transformer-based architecture.
Abstract: OpenMatch is a Python-based library that serves Neural Information Retrieval (Neu-IR) research. It provides self-contained neural and traditional IR modules, making it easy to build customized and higher-capacity IR systems. To bring the advantages of Neu-IR models to users, OpenMatch provides implementations of recent neural IR models, detailed experiment instructions, and advanced few-shot training methods. OpenMatch reproduces the ranking results of previous work on widely-used IR benchmarks, liberating users from surplus labor in baseline reimplementation. Our OpenMatch-based solutions achieve top-ranked empirical results on various ranking tasks, such as ad hoc retrieval and conversational retrieval, illustrating the convenience of OpenMatch for building an effective IR system. The library, experimental methodologies and results of OpenMatch are all publicly available at https://github.com/thunlp/OpenMatch.
Abstract: We present a search engine aimed at helping clinicians find targeted treatments for children with cancer. Childhood cancer is a leading cause of death, and clinicians increasingly seek treatments that are tailored to an individual patient, particularly their tumour genetics. Finding treatments that are specific to paediatrics and match individual genetics is a real challenge amongst the vast and growing body of medical literature and clinical trials. We aim to help clinicians through a search system tailored to this problem. The system retrieves PubMed articles and clinical trials. Entity extraction is performed to highlight genes, drugs and cancers: three key information types clinicians care about. Query suggestion helps clinicians formulate otherwise difficult queries, and results are presented as a knowledge graph to aid result interpretability. The proposed system aims both to significantly reduce the effort of searching for targeted treatments and to potentially find life-saving treatments that may have otherwise been missed. Demo details at http://health-search.csiro.au/oscar/.
Abstract: Mathematical Information Retrieval (MIR) has been actively studied in recent years and many fruitful results have emerged. Among those, the Approach Zero system is one of the few math-aware search engines that is able to perform substructure matching efficiently. Furthermore, it has been deployed in ARQMath2020, the most recent community-wide MIR evaluation, as a strong baseline due to its empirical effectiveness and ability to handle structured math content. However, in order to implement a retrieval model that handles structured queries efficiently, Approach Zero is written in C from the ground up, requiring special pipelines for processing math content and queries. Thus, the system is not conveniently accessible and reusable to the community as a research tool. In this paper, we present PyA0, an easy-to-use Python toolkit built on Approach Zero that improves its accessibility to researchers. We introduce the toolkit interface and report evaluation results on popular MIR datasets to demonstrate the effectiveness and efficiency of our toolkit. We have made PyA0 source code publicly accessible at https://github.com/approach0/pya0, which includes a link to a notebook demo.
Abstract: With the Web growing every day and computers getting ever more powerful, research in the field of computational argumentation becomes more and more important. One of its research branches is argument retrieval, which aims at finding and presenting users the best arguments for their queries. Several systems already exist for this purpose, all having the same goal but reaching it in different ways. In line with existing work, an argument consists of a claim supported or attacked by a premise. Now that argument retrieval has become a separate task in the CLEF lab Touché, displaying the ranking is becoming increasingly important. In this paper we present QuARk, a GUI that allows users to retrieve arguments from a focused debate collection for their queries. Since we strictly distinguished between frontend and backend and kept the communication between them simple, QuARk can be extended to integrate various argument retrieval systems, assuming some modifications are made. In order to demonstrate the GUI, we show the integration of a complex retrieval algorithm that we also presented in the CLEF lab Touché. Our retrieval process consists of two parts. In the first step, it finds the claims most similar to the query; here, the user can select between different standard IR similarity methods. The second step ranks the premises directly related to those claims; here, the user can choose to rank the arguments by a quantitative, qualitative, or combined measure.
Abstract: We present the IR Anthology, a corpus of information retrieval publications accessible via a metadata browser and a full-text search engine. Following the example of the well-known ACL Anthology, the IR Anthology serves as a hub for researchers interested in information retrieval. Our search engine ChatNoir indexes the publications' full texts, enabling a focused search and linking users to the respective publisher's site for personal access. Listing more than 40,000 publications at the time of writing, the IR Anthology can be freely accessed at https://IR.webis.de.
Abstract: Medical decision-making is guided by the results of rich medical research and clinical trials. Doctors, practitioners and researchers urgently need to stay up to date with the most recent research and clinical outputs to make correct decisions, especially when a new virus such as Covid-19 is causing a global epidemic. However, medical literature on a certain topic may cover different aspects and thus be archived in different literature databases, so that searching for all related articles becomes a laborious and time-consuming task. This becomes worse when the number of published articles on the given topic grows rapidly. In this work, we build an online knowledge hub (http://covid19knowledgehub.herokuapp.com/) particularly for Covid-19 related research articles, in response to requirements from researchers in a hospital. The system is built on top of nine medical research article databases, which cover a wide range of medical aspects. It allows users to easily retrieve and explore articles from multiple literature databases in one place. The system also provides statistics of article distributions to offer an overview of the status of research on this topic. This real-demand-driven system is deployed in a research team of Renmin Hospital, Wuhan University, and largely reduces their time for searching the latest articles. Although this project focuses on Covid-19 related research articles, the underlying approach could be applied to any topic in any domain.
Abstract: The Federal Reserve System (the Fed) plays a significant role in shaping monetary policy and financial conditions worldwide. Although it is important to analyse the Fed's communications to extract useful information, they are generally long-form and complex due to the ambiguous and esoteric nature of their content. In this paper, we present FedNLP, an interpretable multi-component Natural Language Processing (NLP) system to decode Federal Reserve communications. This system is designed for end-users to explore how NLP techniques can assist their holistic understanding of the Fed's communications with NO coding. Behind the scenes, FedNLP uses multiple NLP models, ranging from traditional machine learning algorithms to deep neural network architectures, for each downstream task. The demonstration shows multiple results at once, including sentiment analysis, a summary of the document, a prediction of the Federal Funds Rate movement, and a visualization for interpreting the prediction model's result. Our application system and demonstration are available at https://fednlp.net.
Abstract: In the context of social media, geolocation inference on news or events has become a very important task. In this paper, we present the GeoWINE (Geolocation-based Wiki-Image-News-Event retrieval) demonstrator, an effective modular system for multimodal retrieval which expects only a single image as input. The GeoWINE system consists of five modules for retrieving related information from various sources. The first module is a state-of-the-art model for geolocation estimation of images. The second module performs a geospatial-based query for entity retrieval using the Wikidata knowledge graph. The third module exploits four different image embedding representations, which are used to retrieve the entities most similar to the input image. The last two modules perform news and event retrieval from EventRegistry and the Open Event Knowledge Graph (OEKG). GeoWINE provides an intuitive interface for end-users and can be reconfigured by experts for individual setups. GeoWINE achieves promising results in entity label prediction for images on the Google Landmarks dataset. The demonstrator is publicly available at http://cleopatra.ijs.si/geowine/.
Abstract: We describe a search-assistance tool called the OrgBox, created to support users' organizational, cognitive, and metacognitive activities while performing exploratory search tasks. The OrgBox tool allows users to create and label "boxes" to organize information found during a search. The OrgBox was integrated with a custom-built search system that allowed users to save, organize, and synthesize information using drag-and-drop actions. The OrgBox tool also encourages users to engage in metacognitive activities during their search process (e.g., planning their next steps, monitoring their progress, evaluating the information found so far). In this paper, we describe the features and implementation of the OrgBox tool. We also summarize results of two user studies conducted using the OrgBox that show its cognitive and metacognitive benefits to users.
Abstract: The World Wide Web and social media platforms have become popular sources for news and information. Typically, multimodal information, e.g., image and text, is used to convey information more effectively and to attract attention. While in most cases image content is decorative or depicts additional information, it has also been leveraged to spread misinformation and rumors in recent years. In this paper, we present a web-based demo application that automatically quantifies the cross-modal relations of entities (persons, locations, and events) in image and text. The applications are manifold. For example, the system can help users to explore multimodal articles more efficiently, or can assist human assessors and fact-checking efforts in the verification of the credibility of news stories, tweets, or other multimodal documents.
Abstract: Explainable AI (XAI) is currently a vibrant research topic. However, the absence of ground truth explanations makes it difficult to evaluate XAI systems such as Explainable Search. We present an Explainable Search system with a focus on evaluating the XAI aspect of Trustworthiness along with the retrieval performance. We present SIMFIC 2.0 (Similarity in Fiction), an enhanced version of a recent explainable search system. The system retrieves books similar to a selected book in a query-by-example setting. The motivation is to explain the notion of similarity in fiction books. We extract hand-crafted interpretable features for fiction books and provide global explanations by fitting a linear regression, and local explanations based on similarity measures. The Trustworthiness facet is evaluated using user studies, while the ranking performance is compared by analysis of user clicks. Eye tracking is used to investigate user attention to the explanation elements when interacting with the interface. Initial experiments show statistically significant results on the Trustworthiness of the system, paving the way for interesting research directions that are being investigated.
Abstract: Collecting participant search logs is an integral part of interactive IR research. Existing approaches are either piecemeal solutions, require cumbersome setups, or both. We present YASBIL, a two-component logging solution comprising a browser extension and a WordPress plugin. The browser extension logs the browsing activity on the participants' machines. The WordPress plugin collects the logged data into the researcher's data server. The logging works on any webpage, without the need to own or have knowledge about the HTML structure of the webpage. YASBIL also offers ethical data transparency and security towards participants, by enabling them to view and obtain copies of the logged data, as well as securely upload the data to the researcher's server over an HTTPS connection. We posit that ease of installation and use will make YASBIL especially suitable for remote user studies, and longitudinal studies in IR.
Abstract: Fine-grained logging of interactions in user studies is important for studying user behaviour, among other reasons. However, in many research scenarios, the way interactions are logged is usually tied to a monolithic system. We present a generic, application-independent service for logging interactions in web pages, specifically targeting user studies. Our service, Big Brother, can be dropped into existing user interfaces with almost no configuration required by researchers. Big Brother has already been used in several user studies to record interactions in a number of user study research scenarios, such as lab-based and crowdsourcing environments. We further demonstrate the ability of Big Brother to scale to very large user studies through benchmarking experiments. Big Brother also provides a number of additional tools for visualising and analysing interactions. Big Brother significantly lowers the barrier to entry for logging user interactions by providing a minimal but powerful, zero-configuration service for researchers and practitioners of user studies that can scale to thousands of concurrent sessions. We have made the source code and releases for Big Brother available for download at https://github.com/hscells/bigbro.
Abstract: Understanding and comparing the behavior of retrieval models is a fundamental challenge that requires going beyond examining average effectiveness and per-query metrics, because these do not reveal key differences in how ranking models' behavior impacts individual results. DiffIR is a new open-source web tool to assist with qualitative ranking analysis by visually 'diffing' system rankings at the individual result level for queries where behavior significantly diverges. Using one of several configurable similarity measures, it identifies queries for which the compared models produce substantially different rankings and provides a visual web interface to compare the rankings side-by-side. DiffIR additionally supports a model-specific visualization approach based on custom term importance weight files. These support studying the behavior of interpretable models, such as neural retrieval methods that produce document scores based on a similarity matrix or based on a single document passage. Observations from this tool can complement neural probing approaches like ABNIRML to generate quantitative tests. We provide an illustrative use case of DiffIR by studying the qualitative differences between recently developed neural ranking models on a standard TREC benchmark dataset.
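A generic sketch of the per-query ranking comparison described above, using Jaccard overlap of the top-k result sets as the similarity measure; this illustrates the idea rather than DiffIR's implementation, and the run data is invented.

.. code-block:: python

    # Flag queries where two systems' top-k rankings diverge (illustrative only).
    def topk_jaccard(run_a, run_b, k=10):
        a, b = set(run_a[:k]), set(run_b[:k])
        return len(a & b) / len(a | b)

    runs_a = {"q1": ["d1", "d2", "d3"], "q2": ["d9", "d4", "d5"]}
    runs_b = {"q1": ["d1", "d2", "d3"], "q2": ["d7", "d8", "d1"]}

    diverging = sorted(
        ((qid, topk_jaccard(runs_a[qid], runs_b[qid], k=3)) for qid in runs_a),
        key=lambda pair: pair[1],
    )
    print(diverging)   # queries with the lowest overlap surface first, e.g. q2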
Abstract: In this paper, we demonstrate the Information Interactions in Virtual Reality (IIVR) system designed and implemented to study how users interact with abstract information objects in immersive virtual environments in the context of information retrieval. Virtual reality displays are quickly growing as social and personal computing media, and understanding user interactions in these immersive environments is imperative. As a step towards effective information retrieval in such emerging platforms, our system is central to upcoming studies to observe how users engage in information triaging tasks in Virtual Reality (VR). In these studies, we will observe the effects of (1) information layouts and (2) types of interactions in VR. We believe this early system motivates researchers in understanding and designing meaningful interactions for future VR information retrieval applications.
Abstract: This demo system presents a browser extension that allows the reader of a health news article to quickly retrieve related medical/health research papers. This system can help news editors and readers fact-check health news for incorrect or exaggerated claims, such as making causal claims from correlational findings or generalising results from animal studies to humans. Linking health news to the original research papers is not a trivial task, as links are largely missing in science news reports. To link health news to medical literature, our system includes a new named-entity recognition function to extract journal names, and a new Elasticsearch-based search engine to incorporate rich metadata into the search strategy. This paper also introduces a new dataset for evaluating the performance of the proposed search system.
Abstract: Existing chat services that organisations and individuals use today often provide a way to search through previously sent messages. However, many of these chat services provide only limited search functionality, typically exact matching on individual messages. In this paper, we introduce a new task for addressing this problem, called searching for conversations, whereby the aim is to retrieve and rank groups of related messages given a search query. We promote this task by providing a platform for research and development called PECAN. Our platform provides all the necessary functionality researchers need to conduct experiments on searching for conversations. Our system is also generic so as to support organisations and individuals who wish to search through their chat message archives. We release PECAN to the wider community as an Open Source project available for download at https://github.com/ielab/pecan.
Abstract: User behaviors and experiences are fundamental to information retrieval systems, but are often difficult to collect, bringing challenges to both applications and research. Recently, researchers have been exploring more fine-grained user behavior than simple clicks, such as time patterns and mouse/scroll patterns, on their own specific laboratory experimental platforms. However, the lack of publicly available toolkits for logging user behaviors and experiences makes it difficult to run field studies and remote user experiments in real scenarios. In this work, we propose a Privacy-Aware Remote User Logging Tool for remotely collecting user behaviors and explicit experience feedback, with special care for user privacy. With this tool, participants can conduct user experiments remotely without time and location constraints, giving researchers the possibility to observe users' more natural behaviors and experiences.
Abstract: With the advances in precision medicine, identifying clinical trials relevant to a specific patient profile becomes more challenging. Often very specific molecular-level patient features need to be matched for a trial to be deemed relevant. Clinical trials contain strict inclusion and exclusion criteria, often written in free text. Patient profiles are also semi-structured, with some important information hidden in clinical notes. We present a search system that, given a patient profile, searches over clinical trials for potential matches. It enables users to leverage the powerful Apache Lucene query syntax in combination with state-of-the-art Divergence From Randomness retrieval coupled with a BERT-based neural ranking component. This system aims to assist in clinical decision making.
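As an illustration of the kind of Lucene-syntax query such a system can accept: the field names below (condition, gene, min_age) are invented for illustration and do not reflect the system's actual index schema, but the phrase, boolean, and range syntax is standard Lucene.

.. code-block:: python

    # Hypothetical Lucene-syntax query over clinical-trial criteria (illustrative fields).
    patient_query = (
        'condition:"non-small cell lung cancer" '
        "AND gene:(EGFR OR ALK) "
        "AND min_age:[18 TO 65]"
    )
    print(patient_query)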
Abstract: In e-commerce applications, customers search and discover one or more products using queries. Some of these queries are broad and diverse, with multiple intents. Therefore, relying purely on anonymized and aggregated customer historical behavioral data is not sufficient to train machine-learned models. For example, customers may click on and purchase a galaxy charger for a "samsung galaxy s9" query. The item is not an exact match for the customer query. However, it serves as a complement to the original query and may be purchased. To prevent such potential mismatches from surfacing in search results, e-commerce systems rely on machine-learned models trained on human-annotated data. There are two challenges in collecting human-annotated data. First, the human annotation process does not scale and it is hard to obtain large volumes of annotations in multiple languages. Second, annotators must query existing systems to obtain samples for auditing, resulting in very few mismatched examples (data skewness) and counterfactual biases. In this talk, we address these challenges using two recent advances in deep learning. To address the data skewness, we generate hard negative examples using positive examples. The key idea here is to generate synthetic data using a Variational Encoder Decoder (VED) architecture. We show how a modified loss function with a novel combiner (to combine the VED with the classifier) can avoid policy-based gradients and other heuristics. To address the sparsity of data in less popular languages, we combine data across all languages using language-agnostic representation learning. The side information we use aligns the items across languages in the same latent space. We show that our approaches significantly improve upon state-of-the-art baselines, by over 25% in F1 score for the variational model, and over 20% in F1 score for the multilingual model.
Abstract: Healthy online discourse is becoming less and less accessible beneath the growing noise of controversy, mis- and dis-information, and toxic speech. While IR is crucial in detecting harmful speech, researchers must work across disciplines to develop interventions, and partner with industry to deploy them rapidly and effectively. In this position paper, we argue that both detecting online information disorders and deploying novel, real-world content moderation tools are crucial in promoting empathy in social networks, and maintaining free expression and discourse. We detail our insights from studying different social networks such as Parler and Reddit. Finally, we discuss the joys and challenges as a lab-grown startup working with both academia and other industrial partners in finding a path toward a better, more trustworthy online ecosystem.
Abstract: In online marketplaces, an increasing number of producers depend on search and recommendation systems to connect them with consumers to make a living. In this talk, we discuss how these systems will need to evolve from the traditional formulations by incorporating the producer value into their objectives. Jointly optimizing the ranking functions behind these systems on both consumer and producer values is a new direction and raises many technical challenges. To overcome these, we lay out an end-to-end solution and present the results of applying this solution on Facebook Marketplace.
Abstract: Personalization is omnipresent in our life, with applications ranging from entertainment and commercial uses to smart devices and medical treatments. The integration of personalization in various products turned rapidly from an unnecessary luxury to a commodity that is expected by customers. While different machine learning fields present state-of-the-art advances and super-human performance, personalization applications are often late adopters of novel solutions due to their complex framing and multiple stakeholders with different business goals. The role of personalisation applications is also ambiguous: it is unclear, for instance, whether models just predict a user's next action or proactively affect the user's selections. This talk focuses on examining the role of recommenders and their ability to adapt to customer feedback. Key topics such as causality and active exploration are depicted with real examples and demonstrated alongside business considerations and implementation challenges. It relies on recent advances in the field and on work conducted at Booking.com, where we implement personalization models on one of the world's leading online travel platforms.
Abstract: Graph convolutional networks (GCNs), which have recently become the new state-of-the-art method for graph node classification, recommendation and other applications, have not yet been successfully applied to industrial-scale search engines. In this proposal, we introduce our approach, namely SearchGCN, for embedding-based candidate retrieval in one of the largest e-commerce search engines in the world. Empirical studies demonstrate that SearchGCN learns better embedding representations than existing methods, especially for long-tail queries and items. Thus, SearchGCN has been deployed in JD.com's search production since July 2020.
Abstract: We present AliMe Avatar, a Vtuber designed for live-streaming sales in the E-commerce field. To support the emerging live shopping mode, the core of our digital avatar is to enable customers to understand products and encourage customers to purchase in a virtual broadcasting room. Based on computer graphics & vision, natural language processing, and speech recognition & synthesis, our AI avatar is able to offer three kinds of key capabilities: custom appearance, product broadcasting, and multi-modal interaction. Currently, it has been launched online in the Taobao app, broadcasts 700+ hours and serves hundreds of thousands of customers per day. In this paper, we mainly focus on the product broadcasting part, demonstrate the system, present the underlying techniques, and share our experience in dealing with live-streaming E-commerce.
Abstract: Cold-start is the most difficult and time-consuming phase when building a question-answering-based chatbot for a new business scenario, because sufficient training data must first be collected. In this paper, we propose AliMe DA, a practical data augmentation (DA) framework that consists of data production, denoising and consumption, to alleviate this problem. We show how our DA approach can be used to substantially enhance annotation productivity and also improve downstream model performance. More importantly, we provide best practices for data augmentation, including how to choose and employ appropriate methods at each stage of our framework, and share our observations on the applicable scenarios of data augmentation in the era of pre-trained language models.
Abstract: Games of skill are an excellent source of recreation and relaxation. Games are also the safest and most readily accessible constructs for social interaction and community affairs, which potentially opens up new avenues for realising personal worth, social acceptance, respect & recognition. However, when these games are played with real money, ensuring game prudence, whereby users play real-money skill games only for entertainment purposes and do so well within their means, becomes necessary. It is paramount for the wellness of players and also ensures that online gaming remains sheer entertainment. In this proposal, we present an automated, data-driven, AI-powered Responsible Game Play (RGP) framework and tool which has been integrated into our online skill gaming platform. The RGP pipeline is a combination of: a) a couple of anomaly-detection rule-based engines; b) a deep learning pipeline which models the game play characteristics of healthy and engaged players to identify potentially risky players; and c) an ML-based local expert which leverages users' longitudinal behavioral patterns and constructs new features using techniques from the adjacent AIOps and signal processing domains. We integrate psychometric assessment to nudge and course-correct at-risk players proactively, ahead of time.
Abstract: Credit cards, deposits, loans, pension funds, mutual funds: which of these products is relevant to a bank's clients, and at what time in their banking journey? We propose a modeling framework for item recommendation using a Transformer encoder [6] and a novel input data representation accounting for the temporal context of item ownership and user metadata. We evaluate the model on a large dataset from Bank Santander. Our system outperforms the industry baselines Amazon Personalize [1] and XGBoost [4], a top-performing model in the Santander Kaggle competition [2]. We achieve 56.6% top-3 precision, significantly outperforming Amazon Personalize and the XGBoost model, which reach 21.5% and 37.9% top-3 precision, respectively. We engineered an original way of representing input data as a sequence and found that this specific representation, with our Transformer-based architecture, improves the model's performance. We hope that our contribution paves the way for the democratization of recommender systems in banking, and the use of the Transformer model for product recommendation in industry.
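For reference, per-user top-3 precision of the kind reported above can be computed as follows; this is a generic, illustrative sketch (product names and ownership data are invented), not the authors' evaluation code, and the reported figures average such scores over many clients.

.. code-block:: python

    # Generic top-k precision for a single client (illustrative only).
    def precision_at_k(recommended, acquired_next, k=3):
        hits = sum(1 for item in recommended[:k] if item in acquired_next)
        return hits / k

    recommended = ["credit_card", "pension_fund", "mutual_fund"]
    acquired_next = {"credit_card"}            # products the client actually took up next
    print(precision_at_k(recommended, acquired_next))   # 0.333...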
Abstract: Search systems have unprecedented influence on how and what information people access. These gateways to information on the one hand create easy and universal access to online information, and on the other hand create biases that have been shown to cause knowledge disparity and poor decisions for information seekers. Most of the algorithms for indexing, retrieval, and ranking are heavily driven by the underlying data, which is itself biased. In addition, orderings of the search results create position bias and exposure bias due to their considerable focus on relevance and user satisfaction. These and other forms of bias that are implicitly, and sometimes explicitly, woven into search systems are becoming increasing threats to information seeking and sense-making processes. In this tutorial, we will introduce the issues of biases in data, in algorithms, and overall in search processes and show how we could think about and create systems that are fairer, with increasing diversity and transparency. Specifically, the tutorial will present several fundamental concepts such as relevance, novelty, diversity, bias, and fairness using socio-technical terminologies taken from various communities, and dive deeper into metrics and frameworks that allow us to understand, extract, and materialize them. The tutorial will cover some of the most recent works in this area and show how this interdisciplinary research has opened up new challenges and opportunities for communities such as SIGIR.
Abstract: The Probability Ranking Principle (PRP) [31], which assumes that each document has a unique and independent probability to satisfy a particular information need, is one of the fundamental principles for ranking. Traditionally, heuristic ranking features and well-known learning-to-rank approaches have been designed by following the PRP principle. Recently, neural IR models, which adopt deep learning to enhance ranking performance, have also obeyed the PRP principle. Though it has been widely used for nearly five decades, in-depth analysis shows that PRP is not an optimal principle for ranking, due to its independence assumption that each document should be independent of the rest of the candidates. Counter-examples include pseudo-relevance feedback [24], interactive information retrieval [46], search result diversification [10], etc. To solve the problem, researchers recently proposed to model the dependencies among the documents when designing ranking models. A number of ranking models have been proposed and state-of-the-art ranking performances have been achieved. This tutorial aims to give a comprehensive survey of these recently developed ranking models that go beyond the PRP principle. The tutorial tries to categorize these models based on their intrinsic assumptions: assuming that the documents are independent, sequentially dependent, or globally dependent. In this way, we expect that researchers focusing on ranking in search and recommendation can gain a new angle of view on the design of ranking models, which in turn can stimulate new ideas for developing novel ranking models.
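As a brief formal sketch of the distinction drawn above, in standard notation rather than anything taken from the tutorial itself: under the PRP each candidate is scored from the query and that document alone, whereas the dependent ranking models surveyed also condition on the other candidates.

.. math::

   \underbrace{s(d_i) = P(R = 1 \mid q, d_i)}_{\text{PRP: independent scoring}}
   \qquad \text{vs.} \qquad
   \underbrace{s(d_i) = f\big(q,\, d_i,\, \{d_j\}_{j \neq i}\big)}_{\text{dependent ranking models}}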
Abstract: This tutorial on Deep Learning on Graphs for Natural Language Processing (DLG4NLP) will cover relevant and interesting topics on applying deep learning on graphs to NLP, including automatic graph construction for NLP, graph representation learning for NLP, advanced GNN-based models (e.g., graph2seq, graph2tree, and graph2graph) for NLP, and the applications of GNNs in various NLP tasks (e.g., machine translation, natural language generation, information extraction and semantic parsing). In addition, a hands-on demonstration session will be included to help the audience gain practical experience in applying GNNs to solve challenging NLP problems using our recently developed open-source library, Graph4NLP, the first library for researchers and practitioners for easy use of GNNs for various NLP tasks.
Abstract: Recently, there has been growing attention on fairness considerations in machine learning. As one of the most pervasive applications of machine learning, recommender systems are gaining increasing and critical impact on humans and society, since a growing number of users use them for information seeking and decision making. Therefore, it is crucial to address the potential unfairness problems in recommendation, which may hurt users' or providers' satisfaction in recommender systems as well as the interests of the platforms. The tutorial focuses on the foundations and algorithms for fairness in recommendation. It also presents a brief introduction to fairness in basic machine learning tasks such as classification and ranking. The tutorial will introduce taxonomies of current fairness definitions and of evaluation metrics for fairness concerns. We will introduce previous works on fairness in recommendation and also put forward future fairness research directions. The tutorial aims at introducing and communicating fairness-in-recommendation methods to the community, as well as gathering researchers and practitioners interested in this research direction for discussions, idea communication, and research promotion.
Abstract: Information retrieval (IR) in nature is a process of sequential decision making. The system repeatedly interacts with the users to refine its understanding of the users' information needs, improve its estimation of result relevance, and thus increase the utility of its returned results (e.g., the result rankings). Distinct from traditional IR solutions that rigidly execute an offline trained policy, interactive information retrieval emphasizes online policy learning. This, however, is fundamentally difficult for at least three reasons. First, the system only collects user feedback on the presented results, aka, the bandit feedback. Second, users' feedback is known to be noisy and biased. Third, as a result, the system always faces the conflicting goals of improving its policy by presenting currently underestimated results to users versus satisfying the users by ranking the currently estimated best results on top. In this tutorial, we will first motivate the need for online policy learning in interactive IR, by highlighting its importance in several real-world IR problems where online sequential decision making is necessary, such as web search and recommendations. We will carefully address the new challenges that arose in such a solution paradigm, including sample complexity, costly and even outdated feedback, and ethical considerations in online learning (such as fairness and privacy) in interactive IR. We will prepare the technical discussions by first introducing several classical interactive learning strategies from machine learning literature, and then fully dive into the recent research developments for addressing the aforementioned fundamental challenges in interactive IR. Note that the tutorial on "Interactive Information Retrieval: Models, Algorithms, and Evaluation" will provide a broad overview on the general conceptual framework and formal models in interactive IR, while this tutorial covers the online policy learning solutions for interactive IR with bandit feedback.
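As a concrete instance of the explore-exploit tension with bandit feedback described above, here is a minimal epsilon-greedy sketch; this is a standard textbook strategy rather than anything specific to the tutorial, and the result names and reward signal are invented for illustration.

.. code-block:: python

    # Minimal epsilon-greedy bandit sketch: balance presenting the currently best result
    # (exploit) against trying underestimated results (explore).
    import random

    arms = ["result_a", "result_b", "result_c"]    # candidate results to present
    estimates = {arm: 0.0 for arm in arms}         # running estimate of click value
    counts = {arm: 0 for arm in arms}
    epsilon = 0.1

    def choose_arm():
        if random.random() < epsilon:
            return random.choice(arms)             # explore an underestimated result
        return max(arms, key=estimates.get)        # exploit the current best estimate

    def update(arm, reward):
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    arm = choose_arm()
    update(arm, reward=1.0)                        # e.g., the user clicked the result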
Abstract: Since Information Retrieval (IR) is an interactive process in general, it is important to study Interactive Information Retrieval (IIR), where we would attempt to model and optimize an entire interactive retrieval process (rather than a single query) with consideration of many different ways a user can potentially interact with a search engine. This tutorial systematically reviews the progress of research in IIR with an emphasis on the most recent progress in the development of models, algorithms, and evaluation strategies for IIR, ending with a brief discussion of the major open challenges in IIR and some of the most promising future research directions.
Abstract: The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query. Although the most common formulation of text ranking is search, instances of the task can also be found in many natural language processing applications. This tutorial, based on a forthcoming book, provides an overview of text ranking with neural network architectures known as transformers, of which BERT is the best-known example. The combination of transformers and self-supervised pretraining has, without exaggeration, revolutionized the fields of natural language processing (NLP), information retrieval (IR), and beyond. We provide a synthesis of existing work as a single point of entry for both researchers and practitioners. Our coverage is grouped into two categories: transformer models that perform reranking in multi-stage ranking architectures and learned dense representations that perform ranking directly. Two themes pervade our treatment: techniques for handling long documents and techniques for addressing the tradeoff between effectiveness (result quality) and efficiency (query latency). Although transformer architectures and pretraining techniques are recent innovations, many aspects of their application are well understood. Nevertheless, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, we also attempt to prognosticate the future.
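To make the first of the two categories above concrete, here is a minimal sketch of reranking with a transformer cross-encoder in a multi-stage pipeline: a first-stage retriever supplies candidates and a BERT-style sequence-classification model rescores each (query, passage) pair. The checkpoint name, query, and passages are illustrative assumptions, not taken from the book:

.. code-block:: python

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    model_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # assumed checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    model.eval()

    query = "what causes tides"
    candidates = [  # hypothetical first-stage retrieval results
        "Tides are caused by the gravitational pull of the moon and sun.",
        "The stock market rose sharply on Monday.",
    ]

    # Encode each (query, passage) pair and score it with the cross-encoder.
    inputs = tokenizer([query] * len(candidates), candidates,
                       padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = model(**inputs).logits.squeeze(-1)  # one relevance score per pair

    # Sort candidates by the model's relevance score (reranking).
    for score, passage in sorted(zip(scores.tolist(), candidates), reverse=True):
        print(f"{score:.2f}  {passage}")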
Abstract: There is strong interest in leveraging reinforcement learning (RL) for information retrieval (IR) applications, including search, recommendation, and advertising. In 2020 alone, the term "reinforcement learning" was mentioned in more than 60 different papers published by ACM SIGIR. It has also been reported that Internet companies like Google and Alibaba have started to gain competitive advantages from their RL-based search and recommendation engines. This full-day tutorial gives IR researchers and practitioners who have little or no experience with RL the opportunity to learn the fundamentals of modern RL in a practical, hands-on setting. Furthermore, some representative applications of RL in IR systems will be introduced and discussed. By attending this tutorial, participants will acquire a good knowledge of modern RL concepts and standard algorithms such as REINFORCE and DQN. This knowledge will help them better understand some of the latest IR publications involving RL, as well as prepare them to tackle their own practical IR problems using RL techniques and tools. Please refer to the tutorial website (https://rl-starterpack.github.io/) for more information.
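For readers unfamiliar with the algorithms named in this abstract, the following is a minimal sketch of REINFORCE on a toy one-step decision problem. It is not taken from the tutorial materials; the per-action reward probabilities, learning rate, and number of steps are illustrative assumptions:

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    reward_prob = np.array([0.2, 0.8, 0.5])  # assumed per-action reward probabilities
    theta = np.zeros(3)                      # policy parameters (softmax preferences)
    alpha = 0.1                              # learning rate

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    for step in range(5000):
        probs = softmax(theta)
        a = rng.choice(3, p=probs)                 # sample an action from the policy
        r = float(rng.random() < reward_prob[a])   # observe a stochastic reward
        grad_log_pi = -probs                       # d log pi(a) / d theta for softmax
        grad_log_pi[a] += 1.0
        theta += alpha * r * grad_log_pi           # REINFORCE update: r * grad log pi

    print(softmax(theta))  # probability mass should concentrate on the best action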
Abstract: Stance detection (also known as stance classification and stance prediction) is a problem related to social media analysis, natural language processing, and information retrieval, which aims to determine the position of a person, from a piece of text they produce, towards a target (a concept, idea, event, etc.) that is either explicitly specified in the text or only implied. The output of the stance detection procedure is usually drawn from the set: Favor, Against, None. In this tutorial, we will define the core concepts and research problems related to stance detection, present historical and contemporary approaches to stance detection, provide pointers to related resources (datasets and tools), and cover outstanding issues and application areas of stance detection. As solutions to stance detection can contribute to significant tasks including trend analysis, opinion surveys, user reviews, personalization, and predictions for referendums and elections, it will continue to stand as an important research problem, currently mostly on textual content, and particularly on social media. Finally, we believe that image and video content will soon become a common subject of stance detection research.
Abstract: Most current machine learning approaches to IR---including search and recommendation tasks---are designed around the basic idea of matching, working from the perceptual and similarity-learning perspective. This includes both the learning of features from data, such as representation learning, and the learning of similarity matching functions from data, such as neural function learning. Though many such models are widely used in practical ranking systems for search and recommendation, their design philosophy limits them to the correlative signals in the data. However, advancing from correlative learning to causal learning in search and recommendation is an important problem, because causal modeling can help us think beyond the observational data for representation learning and ranking. More specifically, causal learning can bring benefits to the IR community on various dimensions, including but not limited to Explainable IR models, Unbiased IR models, Fairness-aware IR models, Robust IR models and Cognitive Reasoning IR models. This workshop focuses on the research and application of causal modeling in search, recommendation, and a broader scope of IR tasks. The workshop will gather both researchers and practitioners in the field for discussions, idea exchange, and research promotion. It will also generate insightful debates about the recent regulations on AI ethics, reaching a broader community including but not limited to IR, machine learning, AI, data science, and beyond. The workshop homepage is available online at https://csr21.github.io/.
Abstract: Modern information retrieval (IR) consists of a series of processes, including query expansion, candidate item recall, item ranking, and item re-ranking. The final ranked item list is exposed to the user, who accordingly provides feedback through expected actions such as browsing and clicking. This whole process can be formulated as a decision-making process in which the agent is the IR system and the environment is the specific user. The decision-making process can be one-step or sequential, depending on the scenario or the way the problem is formulated. Since 2013, deep reinforcement learning (DRL) has been a fast-developing technique for decision-making tasks: the high capacity of deep learning models is incorporated into the reinforcement learning framework so that the agent can successfully handle complex decision making. In recent years, a number of publications have attempted to leverage DRL techniques for different IR tasks such as ad hoc retrieval, learning to rank, and interactive recommendation. Nonetheless, the fundamental theory, the principles of RL methods, and recognized experimental protocols for decision making in IR have not been well developed, making it challenging to evaluate the correctness of a proposed method or to judge whether reported experimental performance is valid. We propose the second DRL4IR workshop at SIGIR 2021, which provides a venue for academic researchers and industry practitioners to present recent progress on DRL techniques for IR. More importantly, participants are expected to discuss the fundamental principles of formulating a decision-making IR task, the underlying theory, and the practical effectiveness of experimental protocol design, which would foster further research on novel methodologies, innovative experimental findings, and new applications of DRL for information retrieval. DRL4IR at SIGIR'20 was one of the most popular workshops and attracted over 200 conference attendees. This year, we will pay more attention to fundamental research topics and recent applications, and expect about 300 participants.
Abstract: eCommerce Information Retrieval (IR) is receiving increasing attention in the academic literature and is an essential component of some of the world's largest web sites (e.g., Airbnb, Alibaba, Amazon, eBay, Facebook, Flipkart, Lowe's, Taobao, and Target). SIGIR has for several years seen sponsorship from eCommerce organisations, reflecting the importance of IR research to them. The purpose of this workshop is (1) to bring together researchers and practitioners of eCommerce IR to discuss topics unique to it, (2) to determine how to use eCommerce's unique combination of free text, structured data, and customer behavioral data to improve search relevance, and (3) to examine how to build datasets and evaluate algorithms in this domain. Since eCommerce customers often do not know exactly what they want to buy (i.e. navigational and spearfishing queries are rare), recommendations are valuable for inspiration and serendipitous discovery as well as basket building. The theme of this year's eCommerce IR workshop is ensuring fairness in search and recommendations for eCommerce. The workshop includes papers on this topic as well as a panel focused on this area. In addition, Coveo is sponsoring an eCommerce data challenge on session-based prediction for predicting the next action with a special subtask on cart abandonment. The data challenge reflects themes from prior SIGIR workshops in 2017, 2018, 2019, and 2020.
Abstract: Over 20 years ago, Information Retrieval (IR) researchers began their quest for sound IR systems for children. The path was not straightforward. Challenges posed by interface design, relevance determination, diverse contexts, ethics, and many more, were taken up and explored from different perspectives. Large projects such as Puppy-IR and the International Children's Digital Library gave this field a certain boost; still, there is neither a sound solution for children in the search area in 2021 nor a roadmap to get there. What is the reason for this? Does the field cry out for specific IR solutions developed on a small scale for very small sub-fields and specific target groups? Are there some significant unforeseen barriers that hinder researchers? What about obstacles natural to areas of study such as this one that require a multidisciplinary approach or involve protected populations? With this workshop, we want to bring together as many key experts as possible from research and industry who focus on IR for children to understand why, unlike other IR areas, this one has not flourished and look for the biggest challenges for the next 10 years. We are not only thinking of traditional researchers and designers but also of those who develop and use IR systems for fields, such as in music, film, and education, as a way to push past this immobility and look at the problem from new, and perhaps more stimulating, perspectives.
Abstract: Information retrieval plays a crucial role in the patent domain. With the success of deep learning (DL) in other domains, patent practitioners and researchers are increasingly developing DL-based approaches to support experts in the patenting process or to automate patent analysis. AI-enhanced information retrieval systems can improve patent search, but they also require large amounts of annotated data. When working with patent data, particular challenges arise that call for adaptation of, and novel approaches to, general IR and AI methods. With this workshop series, we want to establish a two-way communication channel between industry and academia from relevant fields in information retrieval, such as natural language processing (NLP), text and data mining (TDM), and semantic technologies (ST), in order to explore and transfer new knowledge, methods and technologies for the benefit of industrial applications, as well as to support interdisciplinary research in applied sciences for the intellectual property (IP) and neighbouring domains.
Abstract: The use of simulation techniques is not foreign to information retrieval. In the past, simulation has been employed, for example, for constructing test collections and for model performance prediction and analysis in a broad array of information access scenarios. Nevertheless, a standardized methodology for performance evaluation via simulation has not yet been developed. The goal of this workshop is to create a forum for researchers and practitioners to promote methodology development and more widespread use of simulation for evaluation by: (1) identifying problem settings and application scenarios; (2) sharing tools, techniques, and experiences; (3) characterizing potentials and limitations; and (4) developing a research agenda.
Abstract: In a world of overwhelming choices, recommender systems have become indispensable as personalized information filters, which makes them of great interest to both industry and academia. Recently, deep learning based algorithms have provided inspiring insights and brought us many state-of-the-art recommender systems [1]. However, there exists a widening gap between researchers and engineers: academics usually prefer end-to-end recommender systems, whereas engineers tend to design a recommender system as a three-stage pipeline that preprocesses the original data, recalls a candidate subset, and ranks the recommendation results. Such a pipeline can greatly benefit system development and project management, as has been proven by many Internet companies over recent years. To bridge the gap between end-to-end and pipeline approaches, we propose three research directions and relevant methodologies: label propagation based methods for preprocessing, a graph neural network based negative sampling strategy for recall, and a graph-based model of implicit interactions for ranking. In the following, we describe them in detail.
Abstract: Large data collections containing millions of math formulae are available online. Retrieving math expressions from these collections is challenging, as the structural complexity of formulae requires specialized processing. When searching for mathematical content, accurate measures of formula similarity can help with tasks such as document ranking, query recommendation, and result set clustering. While there have been many attempts at embedding words and graphs, formula embedding is still in its early stages. This research aims to introduce an embedding model for mathematical formulae and accompanying text that can be used in math information retrieval. First, embedding models for isolated formulae are introduced, using intrinsic measures to study the effectiveness and efficiency of retrieval with those embeddings. Those results support the second goal of this research, which is to develop joint embedding models for formulae and text that can support the full range of content encountered in math retrieval. This can be seen as a special case of multimodal embedding, thus potentially benefiting from related research that jointly models other cases in which text and structured representations are co-present, such as chemistry. I summarize the research questions as follows: RQ1: How can we effectively provide an embedding model for isolated mathematical formulae? RQ2: How should the joint embedding of text and formulae be done? RQ3: How can evaluation of math search be grounded in a representative task? For RQ1, I propose to first study simple models that walk the tree structure, in order to examine the effectiveness and efficiency of the formula embedding model, and then move to more advanced models. I have introduced the Tangent-CFT [2] model. As my next step for formula embedding, I plan to look at deep neural network models that have been applied to graph embedding. After studying an embedding model for isolated formulae, in RQ2 I plan to focus on making use of the text surrounding formulae. I will consider four possible approaches to constructing a joint embedding model: (1) linearizing the tree structure of formulae to sequences and then applying a single sequence embedding model to the text and the linearized formula, similar to [1]; (2) forming separate embeddings for text and formulae, then unifying the two embedding spaces using seed alignments obtained either through supervision or heuristics; (3) extracting a tree out of the text and then applying a structure embedding model to both trees; or (4) combining results from specialized embedding models. For example, if the task is retrieval (ranking), then in the simplest scenario the results can be combined with methods such as Reciprocal Rank Fusion (RRF) or CombMNZ (a small RRF sketch follows this abstract). I will then study how text and formula embedding models should be combined. One possible solution is to do retrieval using each of the embeddings and then combine the results. Another approach is to learn a model that provides a unified embedding capturing both formula and text features. Yet another approach to a joint embedding model is to convert text to a tree structure; I can then treat this as a tree-to-tree translation problem. For both RQ1 and RQ2, I plan to first study the effectiveness of the proposed embedding in formula retrieval before proceeding to the text+formula condition. Results will be compared with the best-reported results on the ARQMath [3] question answering task.
While part of this research focuses on creating an embedding model for math, I also need a standard evaluation protocol and dataset. In a planned three-year sequence of ARQMath labs, I aim to answer RQ3 and provide high-quality training, devtest, and test sets for math search. Importantly, ARQMath also serves as a platform for operationalizing a repeatable community-consensus definition for relevance in isolated formula search.
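Of the fusion methods mentioned in the abstract above, Reciprocal Rank Fusion is the simplest to illustrate. The sketch below (document identifiers, runs, and the constant k=60 are illustrative assumptions) fuses a hypothetical formula-embedding run with a hypothetical text-embedding run:

.. code-block:: python

    from collections import defaultdict

    def rrf(ranked_lists, k=60):
        """Fuse several ranked lists: each document scores sum(1 / (k + rank))."""
        scores = defaultdict(float)
        for ranking in ranked_lists:
            for rank, doc_id in enumerate(ranking, start=1):
                scores[doc_id] += 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    formula_run = ["d3", "d1", "d7", "d2"]   # hypothetical formula-embedding ranking
    text_run = ["d1", "d2", "d3", "d9"]      # hypothetical text-embedding ranking
    print(rrf([formula_run, text_run]))      # documents favored by both runs rise to the top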
Abstract: How to model the performance of a retrieval system before deploying it has puzzled Information Retrieval (IR) researchers for a long time. Currently, the evaluation of IR systems relies on empirical experiments. Empirical evaluation means that we need experimental collections, and building them is expensive in terms of both time and money. Exploiting already available collections to predict the performance of a system on new collections would dramatically reduce such costs. With the research line described in this work, we plan to study the development of predictive models for the performance of IR systems. In particular, the proposed research line will investigate Generalized Linear Mixed Models and Causal Inference. Furthermore, we highlight the importance of modelling performance as distributions rather than point estimates.
Abstract: Determining the reliability of online data is a challenge that has recently received increasing attention. In particular, unreliable health-related content has become pervasive during the COVID-19 pandemic. The main objective of this Ph.D. thesis is to study how end-users judge the correctness and credibility of online content and to provide them with a series of tools to assist them in assessing content reliability. To that end, we need to determine which sources of evidence may help to better assess the reliability of health-related online content, and how to combine them through learning. Finally, I will also study which presentation aspects might help end-users to better assess reliability, since previous research has shown that the format and layout of information items, combined with user-based biases, influence their final assessments.
Abstract: This research presents a recommender system designed on the basis of a bottom-up knowledge base built from textbooks. While the ontologies usually applied to such tasks are hand-crafted, our automated approach is a possible answer to the knowledge acquisition bottleneck. We extract concept hierarchies from section titles and use co-occurrences in book sections as evidence for possible contextual relationships between the entities mentioned therein. Motivated by a legal use case of recommending upcoming changes in law, the design targets three major challenges: bridging the different abstraction levels between entities in legal documents and in the parliament protocols announcing norm changes, engineering an explainable retrieval mechanism on top of the knowledge base, and offering decent usability despite a high-recall requirement. Although the system is developed for a specific legal use case, many aspects are generally applicable in the fields of recommender systems, information retrieval and information extraction, entity resolution, explainable artificial intelligence, and usability. We also validate selected parts of the system design on other applications, such as educational media research.
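As a rough illustration of the co-occurrence evidence described in the abstract above (the entities and sections below are invented; the actual system extracts them from textbooks and legal sources), section-level co-occurrence counting might look like this:

.. code-block:: python

    from collections import Counter
    from itertools import combinations

    # Each set holds the entities mentioned in one book section (illustrative data).
    sections = [
        {"data protection", "personal data", "consent"},
        {"consent", "contract", "personal data"},
        {"contract", "liability"},
    ]

    # Count how many sections each pair of entities co-occurs in.
    cooccurrence = Counter()
    for entities in sections:
        for pair in combinations(sorted(entities), 2):
            cooccurrence[pair] += 1

    # Pairs seen in more sections are stronger candidates for a contextual
    # relationship (an edge) in the bottom-up knowledge base.
    for pair, count in cooccurrence.most_common(3):
        print(pair, count)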
Abstract: Multi-document summarization is one of the most important tasks in the field of Natural Language Processing (NLP), and it has gained increasing attention in recent years. It aims to generate one summary across several topic-related documents. Compared with extractive summarization, abstractive summarization produces summaries that are more similar to human-written ones. Proposing effective and efficient abstractive multi-document summarization models is therefore significant to the NLP community. Existing deep learning based multi-document summarization models rely on the exceptional ability of neural networks to extract distinct features. However, they miss important linguistic knowledge, such as dependencies between words, even though the linguistic information in texts carries meaningful knowledge about the input documents. Besides, how models automatically evaluate summary quality is crucial to designing a high-performance summarization model, since the evaluation indicator objectively measures the effectiveness of a method. In this proposal, we bring forward two research questions and corresponding solutions for the abstractive multi-document summarization task.
Abstract: A study conducted by the International Data Corporation predicted that by the year 2021, the total amount of digital information resources would have reached the 40 zettabyte mark [2]. According to a rule formulated by Merrill Lynch, 80 to 90% of these resources are unstructured [7]. Despite this, users expect digital libraries to provide them with fast and interpretable access to digital information resources that will satisfy their information need. Math information retrieval emerged as a subfield of information retrieval in 2008 [8], when it became clear that standard information retrieval techniques used for text documents are inadequate to accurately retrieve documents in digital mathematical libraries.
Abstract: Research on Query Performance Prediction (QPP) focuses on estimating the effectiveness of retrieval results in the absence of human relevance judgments. Accurately estimating the result of a search performed in response to a query has been extensively studied over the past two decades. With the rising popularity of virtual assistants and evolving research on complex information needs, the need for reliable QPP methods as well as the number of potential applications significantly increase. In this work, we focus on improving the evaluation framework of QPP. As we see the existing evaluation as a considerable limitation on the improvement of QPP methods, a reliable and improved evaluation framework would constitute a stepping stone toward a breakthrough in QPP. The existing evaluation framework in QPP mainly relies on measuring the correlation coefficient between the per-query prediction scores and the actual per-query system effectiveness measure, usually Average Precision (AP). The QPP method that achieves higher correlation is considered superior. However, Hauff et al. demonstrate that higher correlation does not vouch for more accurate prediction. The authors additionally advocate the use of Fisher's transformation and Confidence Intervals (CIs) to determine statistically significant differences between multiple correlation coefficients. Furthermore, the existing evaluation methodology holds only for a specific combination of corpus, retrieval method, and set of queries, and does not necessarily hold if any of these is changed. That is, the existing evaluation is not agnostic to the different components, so any conclusions about the relative prediction quality of QPP methods should be taken with a grain of salt. In the proposed research we aim to develop a better evaluation technique to reliably compare the performance of QPP methods. We intend to develop a new evaluation framework and standards that will simultaneously enable the utilization of query variants and take into consideration other confounding factors in QPP evaluation. Specifically, we raise the following research questions: (i) What limitations exist in the current evaluation practices of QPP? (ii) What are the best approaches to perform detailed failure analysis of query performance predictor results? (iii) How do existing QPP methods differ in performance on a set of topics (distinct information needs) represented by a single query versus a set of multiple queries representing the same information need? (iv) How do the existing and new evaluation methodologies align with user satisfaction? To answer the first two research questions, Faggioli et al. proposed a new evaluation framework for QPP. In the proposed framework, an error is calculated for each query, resulting in a distribution of per-query errors over a set of queries. This distribution of errors enables the authors to apply an N-way ANalysis Of VAriance (ANOVA) followed by a post-hoc analysis, Tukey's Honestly Significant Difference (HSD) test, to determine statistically significant differences between the multiple factors involved in QPP evaluation. Separating the different components in the evaluation process allows reaching more reliable conclusions regarding the effect of each component on the prediction process. As a preliminary study, Zendel et al.
compared multiple existing QPP methods on the aforementioned tasks: predicting the effectiveness of different queries representing different topics, and of different query variants representing the same topic. They found that the difference in AP between the queries is an important confounding factor that affects prediction quality. Future work will focus on developing a reliable evaluation framework for QPP both for queries from different topics and for query variants of the same topic. A suitable framework should enable rigorous statistical analysis with decomposition and quantification of the different factors that affect QPP. In addition, a subsequent user study will explore how the new evaluation framework aligns with user satisfaction with QPP results.
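For context, the standard correlation-based evaluation that this proposal critiques, together with the Fisher-transformation confidence interval advocated by Hauff et al., can be sketched as follows. The predictor scores and per-query AP values below are made-up illustrative numbers, not real measurements:

.. code-block:: python

    import math
    import numpy as np

    pred = np.array([0.31, 0.12, 0.55, 0.40, 0.22, 0.61, 0.08, 0.47])  # predictor scores
    ap   = np.array([0.35, 0.10, 0.48, 0.42, 0.30, 0.58, 0.05, 0.39])  # per-query AP

    r = np.corrcoef(pred, ap)[0, 1]          # Pearson correlation with true effectiveness
    n = len(pred)
    z = 0.5 * math.log((1 + r) / (1 - r))    # Fisher's z-transformation of r
    se = 1.0 / math.sqrt(n - 3)              # standard error of z
    lo, hi = z - 1.96 * se, z + 1.96 * se    # 95% CI in z-space
    to_r = lambda v: (math.exp(2 * v) - 1) / (math.exp(2 * v) + 1)  # back-transform
    print(f"r = {r:.3f}, 95% CI = [{to_r(lo):.3f}, {to_r(hi):.3f}]")

With so few queries the interval is wide, which is precisely why comparing predictors by a single correlation value, without CIs or per-query error analysis, can be misleading.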
Abstract: Scientific article summarization is a challenging task, not least due to the lack of large annotated corpora. In this research proposal, we present an approach to constructing a large annotated corpus for scientific articles using semi-supervised/automatic annotation approaches. We intend to apply deep learning methods to grow a small annotated seed corpus. Then, we will measure the quality of the annotated corpus on downstream informative summarization using various evaluation techniques.
Abstract: Social media plays an important role as a source of information during crisis events. It allows for more rapid dissemination of critical information than traditional news media, as its users can provide immediate information from the locations where events are unfolding. Several studies have addressed the automatic detection of crisis-related messages to contribute to disaster management and humanitarian assistance. However, most of them have focused on a particular language (usually English) or type of event, which limits their applicability to other contexts. The lack of labeled data in different languages and types of disasters poses a major obstacle to the application of supervised learning-based approaches to more diverse scenarios. To address this problem, this research aims to characterize messages related to diverse crisis domains in a language-agnostic manner in order to construct multilingual crisis detectors. To achieve this, we propose a comprehensive evaluation of transfer learning performance in terms of crisis domain and language, comparing different data representations and classification techniques.
Abstract: Web search increasingly provides a platform for users to seek advice on important personal decisions but may be biased in several different ways. One result of such biases is the search engine manipulation effect (SEME): when a list of search results relates to a debated topic (e.g., veganism) and promotes documents pertaining to a particular viewpoint (e.g., by ranking them higher), users tend to adopt this advantaged viewpoint. However, the detection and mitigation of SEME are complicated by the current lack of empirical understanding of its underlying mechanisms. This dissertation aims to investigate which algorithmic and cognitive biases play a role in SEME concerning debated topics, and to what degree. RQ1. What set of labels can accurately represent viewpoints of textual documents on debated topics? Studying algorithmic and cognitive biases in the context of web search on debated topics requires accurate labeling of documents. RQ1 investigates how best to represent viewpoints of textual documents on debated topics. The first step in this work was introducing perspectives as an additional dimension of viewpoint labels for textual documents (i.e., adding people's underlying motivations for taking a given stance) and showing how they can be automatically discovered using Joint Topic Models. My future research will evaluate whether viewpoint labels consisting of stances and perspectives are accurate representations (or whether more nuanced notions are necessary) and describe how to obtain these labels. The work on RQ1 will result in a framework to accurately represent viewpoints on debated topics expressed by textual documents. This will allow for algorithmic assessment of viewpoint-related ranking bias in search results and alignment of document viewpoints with users' viewpoints. RQ2. What methods can automatically measure viewpoint-related ranking bias in search results? Several methods have been proposed to measure ranking bias, fairness, and diversity in search results. RQ2 investigates which of these (or novel) methods can be used to assess viewpoint-related ranking bias. The first contribution to RQ2 was demonstrating how to assess viewpoint-related ranking bias in search results using ranking fairness metrics for categorical viewpoint labels, and evaluating which specific methods work best in which situation. Going forward, I plan to develop methods that assess viewpoint-related ranking bias in more complex settings. Furthermore, I aim to assess viewpoint-related ranking bias in real search results on debated topics. This work will contribute novel evaluation metrics that measure viewpoint-related ranking bias in search results, a set of guidelines for when and how to use them together with a web-based demo, as well as directions for practitioners regarding viewpoint-related ranking bias in real search results. RQ3. What cognitive biases may contribute to the process of attitude change on debated topics in users of web search engines? Being able to measure algorithmic ranking bias is not yet enough to understand its effect on human behavior. RQ3 aims at understanding which specific cognitive biases are responsible for SEME; i.e., what reasoning mistakes users make when they change their attitudes after viewing search results. The first contribution to RQ3 was evaluating in a user study whether order effects alone can cause SEME.
We found that this may not be the case and describe exploratory results that show that exposure effects may play a more important role in causing SEME than previously anticipated. My future work in this area will consider findings from RQ1 and RQ2 to draw more realistic scenarios of SEME and study interactions between algorithmic and different cognitive biases. The result of this work will be a set of guidelines for how SEME could be avoided by mitigating cognitive user biases in web search.
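As one concrete but deliberately simplified way to operationalize the viewpoint-related ranking bias discussed under RQ2 in the abstract above (this is not the dissertation's actual metric), one can compare each viewpoint's share of rank-discounted exposure with its share of the result list. The viewpoint labels and the example ranking below are illustrative assumptions:

.. code-block:: python

    import math
    from collections import defaultdict

    def exposure_share(labels):
        """DCG-style exposure (1 / log2(rank + 1)) aggregated per viewpoint label."""
        exposure = defaultdict(float)
        for rank, label in enumerate(labels, start=1):
            exposure[label] += 1.0 / math.log2(rank + 1)
        total = sum(exposure.values())
        return {label: e / total for label, e in exposure.items()}

    # Hypothetical viewpoint labels of a 8-result ranking on a debated topic.
    ranking = ["pro", "pro", "pro", "con", "pro", "con", "con", "neutral"]
    shares = exposure_share(ranking)
    proportions = {v: ranking.count(v) / len(ranking) for v in set(ranking)}
    for v in proportions:
        print(f"{v}: exposure {shares[v]:.2f} vs. proportion {proportions[v]:.2f}")

A viewpoint whose exposure share clearly exceeds its share of the list is advantaged by the ranking, which is the kind of imbalance the RQ2 metrics are meant to detect.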