Abstract: Using implicit feedback collected from user clicks as training labels for learning-to-rank algorithms is a well-developed paradigm that has been extensively studied and used in modern IR systems. Using user clicks as ranking features, on the other hand, has not been fully explored in existing literature. Despite its potential in improving short-term system performance, whether the incorporation of user clicks as ranking features is beneficial for learning-to-rank systems in the long term is still questionable. Two of the most important problems are (1) the explicit bias introduced by noisy user behavior, and (2) the implicit bias, which we refer to as the exploitation bias, introduced by the dynamic training and serving of learning-to-rank systems with behavior features. In this paper, we explore the possibility of incorporating user clicks as both training labels and ranking features for learning to rank. We formally investigate the problems in feature collection and model training, and propose a counterfactual feature projection function and a novel uncertainty-aware learning to rank framework. Experiments on public datasets show that ranking models learned with the proposed framework can significantly outperform models built with raw click features and algorithms that rank items without considering model uncertainty.
Abstract: In this paper we study how to effectively exploit implicit feedback in Dense Retrievers (DRs). We consider the specific case in which click data from a historic click log is available as implicit feedback. We then exploit such historic implicit interactions to improve the effectiveness of a DR. A key challenge that we study is the effect that biases in the click signal, such as position bias, have on the DRs. To overcome the problems associated with the presence of such bias, we propose the Counterfactual Rocchio (CoRocchio) algorithm for exploiting implicit feedback in Dense Retrievers. We demonstrate both theoretically and empirically that dense query representations learnt with CoRocchio are unbiased with respect to position bias and lead to higher retrieval effectiveness. We make available the implementations of the proposed methods and the experimental framework, along with all results at https://github.com/ielab/Counterfactual-DR.
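To make the idea concrete, here is a minimal sketch of an inverse-propensity-weighted (counterfactual) Rocchio update on dense query embeddings, written under stated assumptions rather than as the authors' exact implementation: the hyperparameters `alpha` and `beta`, the propensity clipping, and the mean-centroid form are all illustrative.

```python
import numpy as np

def counterfactual_rocchio(query_emb, clicked_doc_embs, propensities,
                           alpha=1.0, beta=0.5, clip=0.1):
    """IPS-weighted Rocchio update of a dense query vector (illustrative sketch).

    query_emb:        (d,) original dense query embedding
    clicked_doc_embs: (n, d) embeddings of documents clicked in the historic log
    propensities:     (n,) estimated examination probabilities (position bias)
    """
    p = np.clip(np.asarray(propensities, dtype=float), clip, 1.0)  # variance control
    weights = 1.0 / p                                # debias: up-weight low-propensity clicks
    centroid = (weights[:, None] * clicked_doc_embs).sum(axis=0) / weights.sum()
    return alpha * query_emb + beta * centroid       # feedback-enhanced query representation

# toy usage: clicks observed at lower ranks contribute more after reweighting
q_new = counterfactual_rocchio(np.random.randn(8), np.random.randn(3, 8),
                               propensities=[0.9, 0.5, 0.2])
```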
Abstract: Implicit feedback has been widely used to build commercial recommender systems. Because observed feedback represents users' click logs, there is a semantic gap between true relevance and observed feedback. More importantly, observed feedback is usually biased towards popular items, thereby overestimating the actual relevance of popular items. Although existing studies have developed unbiased learning methods using inverse propensity weighting (IPW) or causal reasoning, they solely focus on eliminating the popularity bias of items. In this paper, we propose a novel unbiased recommender learning model, namely BIlateral SElf-unbiased Recommender (BISER), to eliminate the exposure bias of items caused by recommender models. Specifically, BISER consists of two key components: (i) self-inverse propensity weighting (SIPW) to gradually mitigate the bias of items without incurring high computational costs; and (ii) bilateral unbiased learning (BU) to bridge the gap between two complementary models in model predictions, i.e., user- and item-based autoencoders, alleviating the high variance of SIPW. Extensive experiments show that BISER consistently outperforms state-of-the-art unbiased recommender models over several datasets, including Coat, Yahoo! R3, MovieLens, and CiteULike.
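The self-inverse propensity weighting idea can be illustrated with a point-wise loss whose propensities come from the model's own (detached) exposure estimate; the clipping constant and the specific loss form below are assumptions for illustration, not BISER's exact objective.

```python
import torch

def self_ipw_loss(pred_relevance, observed_click, self_propensity, clip=0.1):
    """Inverse-propensity-weighted loss with model-estimated propensities (sketch).

    pred_relevance:  (n,) predicted relevance probabilities in (0, 1)
    observed_click:  (n,) 0/1 implicit feedback (float tensor)
    self_propensity: (n,) exposure probability estimated by the model itself
    """
    p = self_propensity.detach().clamp(min=clip)   # stop-gradient and clip for variance
    w = observed_click / p                          # IPW weight on observed positives
    loss = -(w * torch.log(pred_relevance + 1e-8)
             + (1.0 - w) * torch.log(1.0 - pred_relevance + 1e-8))
    return loss.mean()
```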
Abstract: Most recommender systems evaluate model performance offline through either: 1) a normal biased test on factual interactions; or 2) a debiased test with records from a randomized controlled trial. In fact, both tests only reflect part of the whole picture: factual interactions are collected from the recommendation policy, so fitting them better implies benefiting the platform with a higher click or conversion rate; in contrast, the debiased test eliminates system-induced biases and thus is more reflective of users' true preferences. Nevertheless, we find that existing models exhibit a trade-off between the two tests, and methods that perform well on both are lacking. In this work, we aim to develop a win-win recommendation method that is strong on both tests. This is non-trivial, since it requires learning a model that can make accurate predictions in both the factual environment (i.e., the normal biased test) and the counterfactual environment (i.e., the debiased test). Towards this goal, we perform environment-aware recommendation modeling by considering both environments. In particular, we propose an Interpolative Distillation (InterD) framework, which interpolates the biased and debiased models at the user-item pair level by distilling a student model. We conduct experiments on three real-world datasets with both tests. Empirical results justify the rationality and effectiveness of InterD, which stands out on both tests and especially demonstrates remarkable gains on less popular items.
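Illustratively, interpolative distillation can be sketched as training a student against a per-pair interpolation of a biased teacher and a debiased teacher; the weight `w_ui` and the MSE distillation loss below are placeholders rather than InterD's exact formulation.

```python
import torch
import torch.nn.functional as F

def interpolative_distillation_loss(student_pred, biased_pred, debiased_pred, w_ui):
    """Distill a student toward a pair-level mixture of two teachers (sketch).

    All tensors have shape (batch,); w_ui in [0, 1] decides, per user-item pair,
    how much the biased teacher is trusted over the debiased one.
    """
    teacher = w_ui * biased_pred + (1.0 - w_ui) * debiased_pred  # pair-level interpolation
    return F.mse_loss(student_pred, teacher.detach())            # student matches the mixture
```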
Abstract: Recent years have witnessed the strong accuracy of graph-based Collaborative Filtering (CF) models for recommender systems. By taking the user-item interaction behavior as a graph, these graph-based CF models borrow the success of Graph Neural Networks (GNN) and iteratively perform neighborhood aggregation to propagate the collaborative signals. While conventional CF models are known to face the challenge of popularity bias that favors popular items, one may wonder "Do the existing graph-based CF models alleviate or exacerbate the popularity bias of recommender systems?" To answer this question, we first investigate the two-fold performance w.r.t. accuracy and novelty for existing graph-based CF methods. The empirical results show that the symmetric neighborhood aggregation adopted by most existing graph-based CF models exacerbates the popularity bias, and this phenomenon becomes more serious as the depth of graph propagation increases. Further, we theoretically analyze the cause of popularity bias for graph-based CF. Then, we propose a simple yet effective plugin, namely r-AdjNorm, to achieve an accuracy-novelty trade-off by controlling the normalization strength in the neighborhood aggregation process. Meanwhile, r-AdjNorm can be smoothly applied to existing graph-based CF backbones without additional computation. Finally, experimental results on three benchmark datasets show that our proposed method can improve novelty without sacrificing accuracy under various graph-based CF backbones.
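The normalization-strength idea can be sketched as follows, assuming a LightGCN-style propagation step; r = 0.5 recovers the standard symmetric normalization, while other values re-weight the influence of high-degree (popular) nodes. The exact parameterization used by r-AdjNorm may differ; this is only an illustration.

```python
import numpy as np

def normalized_adj(adj, r=0.5):
    """D^{-r} A D^{-(1-r)} normalization of an adjacency matrix (illustrative sketch).

    r = 0.5 gives the usual symmetric normalization; smaller or larger r changes
    how strongly popular (high-degree) nodes dominate neighborhood aggregation.
    """
    deg = adj.sum(axis=1)
    deg = np.where(deg > 0, deg, 1.0)            # guard against isolated nodes
    return np.diag(deg ** (-r)) @ adj @ np.diag(deg ** (-(1.0 - r)))

# one propagation step with adjustable normalization strength
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
embeddings = np.random.randn(3, 4)
propagated = normalized_adj(adj, r=0.4) @ embeddings
```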
Abstract: Recommender systems usually face popularity bias. From the popularity distribution shift perspective, the normal paradigm trained on exposed items (most of which are hot items) learns that recommending popular items more frequently can achieve lower loss, thus injecting popularity information into the item property embedding, e.g., the id embedding. From the long-tail distribution shift perspective, the sparse interactions of long-tail items lead to insufficient learning of these items. The resultant distribution discrepancy between hot and long-tail items would not only inherit the bias but also amplify it. Existing work addresses this issue with inverse propensity scoring (IPS) or causal embeddings. However, we argue that not all popularity bias is harmful: some items show higher popularity because of better quality or because they conform to current trends, and they deserve more recommendations. Blindly seeking unbiased learning may inhibit high-quality or fashionable items. To make better use of the popularity bias, we propose a co-training disentangled domain adaptation network (CD$^2$AN), which can co-train both biased and unbiased models. Specifically, for the popularity distribution shift, CD$^2$AN disentangles the item property representation and the popularity representation from the item property embedding. For the long-tail distribution shift, we introduce additional unexposed items (most of which are long-tail items) to align the distributions of hot and long-tail item property representations. Further, from the instance perspective, we carefully design an item similarity regularization to learn comprehensive item representations, which encourages item pairs with more effective co-occurrence patterns to have more similar item property representations. Based on offline evaluations and online A/B tests, we show that CD$^2$AN outperforms existing debiased solutions. CD$^2$AN has been successfully deployed in the Mobile Taobao App and is handling major online traffic.
Abstract: Collaborative Filtering (CF) has emerged as a fundamental paradigm for parameterizing users and items into a latent representation space, using their correlative patterns from interaction data. Among various CF techniques, the development of GNN-based recommender systems, e.g., PinSage and LightGCN, has offered state-of-the-art performance. However, two key challenges have not been well explored in existing solutions: i) The over-smoothing effect of deeper graph-based CF architectures may cause indistinguishable user representations and degradation of recommendation results. ii) The supervision signals (i.e., user-item interactions) are usually scarce and skewed in distribution in reality, which limits the representation power of CF paradigms. To tackle these challenges, we propose a new self-supervised recommendation framework, Hypergraph Contrastive Collaborative Filtering (HCCF), to jointly capture local and global collaborative relations with a hypergraph-enhanced cross-view contrastive learning architecture. In particular, the designed hypergraph structure learning enhances the discrimination ability of the GNN-based CF paradigm by comprehensively capturing the complex high-order dependencies among users. Additionally, our HCCF model effectively integrates the hypergraph structure encoding with self-supervised learning to reinforce the representation quality of recommender systems, based on hypergraph self-discrimination. Extensive experiments on three benchmark datasets demonstrate the superiority of our model over various state-of-the-art recommendation methods, and its robustness against sparse user interaction data. The implementation codes are available at https://github.com/akaxlh/HCCF.
Abstract: Learning informative representations of users and items from the historical interactions is crucial to collaborative filtering (CF). Existing CF approaches usually model interactions solely within the Euclidean space. However, the sophisticated user-item interactions inherently present highly non-Euclidean anatomy with various types of geometric patterns (i.e., tree-likeness and cyclic structures). The Euclidean-based models may be inadequate to fully uncover the intent factors beneath such hybrid-geometry interactions. To remedy this deficiency, in this paper, we study the novel problem of Geometric Disentangled Collaborative Filtering (GDCF), which aims to reveal and disentangle the latent intent factors across multiple geometric spaces. A novel generative GDCF model is proposed to learn geometric disentangled representations by inferring the high-level concepts associated with user intentions and various geometries. Empirically, our proposal is extensively evaluated over five real-world datasets, and the experimental results demonstrate the superiority of GDCF.
Abstract: Collaborative filtering is one of the most common scenarios and popular research topics in recommender systems. Among existing methods, latent factor models, i.e., learning a specific embedding for each user/item by reconstructing the observed interaction matrix, have shown excellent performance. However, such user-specific and item-specific embeddings are intrinsically transductive, making it difficult for them to deal with new users and new items unseen during training. Besides, the number of model parameters heavily depends on the number of all users and items, restricting their scalability to real-world applications. To solve the above challenges, in this paper, we propose a novel model-agnostic and scalable Inductive Embedding Module for collaborative filtering, namely INMO. INMO generates inductive embeddings for users (items) by characterizing their interactions with some template items (template users), instead of employing an embedding lookup table. Based on theoretical analysis, we further propose an effective indicator for the selection of template users and template items. Our proposed INMO can be attached to existing latent factor models as a pre-module, inheriting the expressiveness of the backbone models while bringing inductive ability and reducing model parameters. We validate the generality of INMO by attaching it to Matrix Factorization (MF) and LightGCN, which are two representative latent factor models for collaborative filtering. Extensive experiments on three public benchmarks demonstrate the effectiveness and efficiency of INMO in both transductive and inductive recommendation scenarios.
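A minimal sketch of the inductive idea, under the assumption of a simple mean aggregation: a user's embedding is computed from that user's interaction pattern over a fixed set of template items rather than looked up in a per-user table, so users unseen during training still receive embeddings. INMO's actual module is richer than this.

```python
import torch
import torch.nn as nn

class InductiveUserEmbedding(nn.Module):
    """Embed users from their interactions with template items (illustrative sketch)."""

    def __init__(self, num_template_items, emb_dim):
        super().__init__()
        # one learnable vector per *template item*, not per user
        self.template_emb = nn.Parameter(0.01 * torch.randn(num_template_items, emb_dim))

    def forward(self, interaction_rows):
        # interaction_rows: (batch, num_template_items) 0/1 history over template items
        counts = interaction_rows.sum(dim=1, keepdim=True).clamp(min=1.0)
        return interaction_rows @ self.template_emb / counts  # mean of interacted templates

# a brand-new user unseen during training still gets an embedding
module = InductiveUserEmbedding(num_template_items=100, emb_dim=16)
new_user = torch.zeros(1, 100)
new_user[0, [3, 7, 42]] = 1.0
user_vec = module(new_user)
```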
Abstract: Due to the widespread presence of implicit feedback, recommendation based on it has been a long-standing research problem in academia and industry. However, it suffers from extreme sparsity, since each user only interacts with a few items. One well-known and well-performing method is to treat all of each user's uninteracted items as negatives with low confidence. This method intrinsically imposes an implicit regularization that penalizes large deviations of each user's preferences for uninteracted items from a constant. However, such methods have to assume a constant-rating prior for uninteracted items, which may be questionable. In this paper, we propose a novel ring-based regularization that penalizes significant differences between each user's preferences for an item and for some other items. The ring structure, described by an item graph, determines which other items are selected for each item in the regularization. The regularization not only avoids introducing prior ratings but also, according to theoretical analysis, implicitly penalizes remarkable preference differences across all items. However, optimizing recommenders with the regularization still poses computational challenges, so we develop a scalable alternating least squares algorithm by carefully designing the gradient computation. Therefore, as long as each item is connected with a sublinear/constant number of other items in the item graph, the overall learning algorithm is comparably efficient to existing algorithms. The proposed regularization is extensively evaluated on several public recommendation datasets, where the results show that it leads to considerable improvements in recommendation performance.
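The regularizer can be read as penalizing, for every user, squared differences between the predicted preference for an item and for its neighbors in the item graph. The dense loop below only illustrates the objective; it is not the paper's scalable alternating least squares solver.

```python
import numpy as np

def item_graph_regularizer(pred, item_edges, lam=0.1):
    """Penalize preference gaps across item-graph edges (illustrative sketch).

    pred:       (num_users, num_items) predicted preference matrix
    item_edges: iterable of (i, j) item index pairs connected in the item graph
    """
    reg = 0.0
    for i, j in item_edges:
        diff = pred[:, i] - pred[:, j]       # per-user preference gap for the item pair
        reg += float(diff @ diff)            # sum of squared gaps over users
    return lam * reg
```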
Abstract: Recommender systems aim to provide personalized services to users and are playing an increasingly important role in our daily lives. The key of recommender systems is to predict how likely users will interact with items based on their historical online behaviors, e.g., clicks, add-to-cart, purchases, etc. To exploit these user-item interactions, there are increasing efforts on considering the user-item interactions as a user-item bipartite graph and then performing information propagation in the graph via Graph Neural Networks (GNNs). Given the power of GNNs in graph representation learning, these GNNs-based recommendation methods have remarkably boosted the recommendation performance. Despite their success, most existing GNNs-based recommender systems overlook the existence of interactions caused by unreliable behaviors (e.g., random/bait clicks) and uniformly treat all the interactions, which can lead to sub-optimal and unstable performance. In this paper, we investigate the drawbacks (e.g., non-adaptive propagation and non-robustness) of existing GNN-based recommendation methods. To address these drawbacks, we introduce a principled graph trend collaborative filtering method and propose the Graph Trend Filtering Networks for recommendations (GTN) that can capture the adaptive reliability of the interactions. Comprehensive experiments and ablation studies are presented to verify and understand the effectiveness of the proposed framework. Our implementation based on PyTorch is available: https://github.com/wenqifan03/GTN-SIGIR2022.
Abstract: Recently, graph neural networks (GNN) have been successfully applied to recommender systems as an effective collaborative filtering (CF) approach. However, existing GNN-based CF models suffer from noisy user-item interaction data, which seriously affects the effectiveness and robustness in real-world applications. Although there have been several studies on data denoising in recommender systems, they either neglect direct intervention of noisy interaction in the message-propagation of GNN, or fail to preserve the diversity of recommendation when denoising. To tackle the above issues, this paper presents a novel GNN-based CF model, named Robust Graph Collaborative Filtering (RGCF), to denoise unreliable interactions for recommendation. Specifically, RGCF consists of a graph denoising module and a diversity preserving module. The graph denoising module is designed for reducing the impact of noisy interactions on the representation learning of GNN, by adopting both a hard denoising strategy (i.e., discarding interactions that are confidently estimated as noise) and a soft denoising strategy (i.e., assigning reliability weights for each remaining interaction). In the diversity preserving module, we build up a diversity augmented graph and propose an auxiliary self-supervised task based on mutual information maximization (MIM) for enhancing the denoised representation and preserving the diversity of recommendation. These two modules are integrated in a multi-task learning manner that jointly improves the recommendation performance. We conduct extensive experiments on three real-world datasets and three synthesized datasets. Experiment results show that RGCF is more robust against noisy interactions and achieves significant improvement compared with baseline models.
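The hard/soft denoising combination can be sketched as: estimate a reliability score per observed interaction, drop interactions below a threshold (hard), and use the remaining scores as propagation weights (soft). The cosine-similarity reliability estimate below is an assumption used only for illustration.

```python
import torch
import torch.nn.functional as F

def denoise_interactions(user_emb, item_emb, edges, hard_threshold=0.1):
    """Hard + soft denoising of observed user-item interactions (illustrative sketch).

    edges: (n, 2) long tensor of observed (user_index, item_index) pairs.
    Returns the kept edges and a soft reliability weight for each of them.
    """
    u = F.normalize(user_emb[edges[:, 0]], dim=1)
    i = F.normalize(item_emb[edges[:, 1]], dim=1)
    reliability = (u * i).sum(dim=1).clamp(min=0.0)   # cosine similarity in [0, 1]
    keep = reliability >= hard_threshold              # hard: discard confident noise
    return edges[keep], reliability[keep]             # soft: weight what remains
```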
Abstract: User simulation has been a cost-effective technique for evaluating conversational recommender systems. However, building a human-like simulator is still an open challenge. In this work, we focus on how users reformulate their utterances when a conversational agent fails to understand them. First, we perform a user study, involving five conversational agents across different domains, to identify common reformulation types and their transition relationships. A common pattern that emerges is that persistent users would first try to rephrase, then simplify, before giving up. Next, to incorporate the observed reformulation behavior in a user simulator, we introduce the task of reformulation sequence generation: to generate a sequence of reformulated utterances with a given intent (rephrase or simplify). We develop methods by extending transformer models guided by the reformulation type and perform further filtering based on estimated reading difficulty. We demonstrate the effectiveness of our approach using both automatic and human evaluation.
Abstract: Conversational question answering (ConvQA) tackles sequential information needs where contexts in follow-up questions are left implicit. Current ConvQA systems operate over homogeneous sources of information: either a knowledge base (KB), or a text corpus, or a collection of tables. This paper addresses the novel issue of jointly tapping into all of these together, this way boosting answer coverage and confidence. We present CONVINSE, an end-to-end pipeline for ConvQA over heterogeneous sources, operating in three stages: i) learning an explicit structured representation of an incoming question and its conversational context, ii) harnessing this frame-like representation to uniformly capture relevant evidences from KB, text, and tables, and iii) running a fusion-in-decoder model to generate the answer. We construct and release the first benchmark, ConvMix, for ConvQA over heterogeneous sources, comprising 3000 real-user conversations with 16000 questions, along with entity annotations, completed question utterances, and question paraphrases. Experiments demonstrate the viability and advantages of our method, compared to state-of-the-art baselines.
Abstract: Generating fluent and informative natural responses while maintaining representative internal states for search optimization is critical for conversational search systems. Existing approaches either 1) predict structured dialog acts first and then generate natural responses; or 2) map conversation context to natural responses directly in an end-to-end manner. Both kinds of approaches have shortcomings. The former suffers from error accumulation, while the semantic associations between structured acts and natural responses are confined to a single direction. The latter emphasizes generating natural responses but fails to predict structured acts. Therefore, we propose a neural co-generation model that generates the two concurrently. The key lies in a shared latent space shaped by two informed priors. Specifically, we design structured dialog act and natural response auto-encoding as two auxiliary tasks in an interconnected network architecture. It allows for concurrent generation and bidirectional semantic associations. The shared latent space also enables asynchronous reinforcement learning for further joint optimization. Experiments show that our model achieves significant performance improvements.
Abstract: Conversational recommender systems (CRSs) provide recommendations through interactive conversations. CRSs typically provide recommendations through relatively straightforward interactions, where the system continuously inquires about a user's explicit attribute-aware preferences and then decides which items to recommend. In addition, topic tracking is often used to provide natural-sounding responses. However, merely tracking topics is not enough to recognize a user's real preferences in a dialogue. In this paper, we address the problem of accurately recognizing and maintaining user preferences in CRSs. Three challenges come with this problem: (1) An ongoing dialogue only provides the user's short-term feedback; (2) Annotations of user preferences are not available; and (3) There may be complex semantic correlations among items that feature in a dialogue. We tackle these challenges by proposing an end-to-end variational reasoning approach to the task of conversational recommendation. We model both long-term preferences and short-term preferences as latent variables with topical priors for explicit long-term and short-term preference exploration, respectively. We use an efficient stochastic gradient variational Bayesian (SGVB) estimator for optimizing the derived evidence lower bound. A policy network is then used to predict topics for a clarification utterance or items for a recommendation response. The use of explicit sequences of preferences with multi-hop reasoning in a heterogeneous knowledge graph helps to provide more accurate conversational recommendation results. Extensive experiments conducted on two benchmark datasets show that our proposed method outperforms state-of-the-art baselines in terms of both objective and subjective evaluation metrics.
Abstract: Conversational search is a crucial and promising branch in information retrieval. In this paper, we reveal that not all historical conversational turns are necessary for understanding the intent of the current query. The redundant noisy turns in the context largely hinder the improvement of search performance. However, enhancing the context denoising ability for conversational search is quite challenging due to data scarcity and the difficulty of simultaneously learning conversational query encoding and context denoising. To address these issues, in this paper, we present a novel Curriculum cOntrastive conTExt Denoising framework, COTED, towards few-shot conversational dense retrieval. Under a curriculum training order, we progressively endow the model with the capability of context denoising via contrastive learning between noised samples and denoised samples generated by a new conversation data augmentation strategy. Three curriculums tailored to conversational search are exploited in our framework. Extensive experiments on two few-shot conversational search datasets, i.e., CAsT-19 and CAsT-20, validate the effectiveness and superiority of our method compared with state-of-the-art baselines.
Abstract: Recently, pre-training methods have shown remarkable success in task-oriented dialog (TOD) systems. However, most existing pre-trained models for TOD focus on either dialog understanding or dialog generation, but not both. In this paper, we propose SPACE, a novel unified pre-trained dialog model learning from large-scale dialog corpora with limited annotations, which can be effectively fine-tuned on a wide range of downstream dialog tasks. Specifically, SPACE consists of four successive components in a single transformer to maintain a task-flow in TOD systems: (i) a dialog encoding module to encode dialog history, (ii) a dialog understanding module to extract semantic vectors from either user queries or system responses, (iii) a dialog policy module to generate a policy vector that contains high-level semantics of the response, and (iv) a dialog generation module to produce appropriate responses. We design a dedicated pre-training objective for each component. Concretely, we pre-train the dialog encoding module with span mask language modeling to learn contextualized dialog information. To capture the structured dialog semantics, we pre-train the dialog understanding module via a novel tree-induced semi-supervised contrastive learning objective with the help of extra dialog annotations. In addition, we pre-train the dialog policy module by minimizing the ℒ2 distance between its output policy vector and the semantic vector of the response for policy optimization. Finally, the dialog generation model is pre-trained by language modeling. Results show that SPACE achieves state-of-the-art performance on eight downstream dialog benchmarks, including intent prediction, dialog state tracking, and end-to-end dialog modeling. We also show that SPACE has a stronger few-shot ability than existing models under the low-resource setting.
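The policy pre-training objective described above can be sketched as a plain L2 distance between the policy module's output vector and the semantic vector of the gold response; treating the target as a constant (stop-gradient) is an assumption made here for illustration.

```python
import torch
import torch.nn.functional as F

def policy_pretrain_loss(policy_vector, response_semantic_vector):
    """L2 objective aligning the policy vector with the response semantics (sketch).

    policy_vector:            (batch, d) output of the dialog policy module
    response_semantic_vector: (batch, d) semantic vector of the gold response
                              produced by the dialog understanding module
    """
    target = response_semantic_vector.detach()   # assumed: no gradient into the target
    return F.mse_loss(policy_vector, target)     # mean squared L2 distance over the batch
```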
Abstract: Maintaining a consistent persona is essential for building a human-like conversational model. However, the lack of attention to the partner makes such models more egocentric: they tend to show their persona by all means, such as stiffly twisting the topic, pulling the conversation toward their own interests regardless of the partner, and rambling about their persona with little curiosity about the partner. In this work, we propose COSPLAY (COncept Set guided PersonaLized dialogue generation Across both partY personas), which considers both parties as a "team": expressing self-persona while keeping curiosity toward the partner, leading responses around mutual personas, and finding the common ground. Specifically, we first represent the self-persona, partner persona, and mutual dialogue all in concept sets. Then, we propose the Concept Set framework with a suite of knowledge-enhanced operations to process them, such as set algebras, set expansion, and set distance. Using these operations as a medium, we train the model by utilizing 1) concepts of both party personas, 2) the concept relationship between them, and 3) their relationship to the future dialogue. Extensive experiments on a large public dataset, Persona-Chat, demonstrate that our model outperforms state-of-the-art baselines in generating less egocentric, more human-like, and higher-quality responses in both automatic and human evaluations.
Abstract: Proactive dialogue systems are able to lead the conversation to a goal topic and have great potential in bargaining, persuasion, and negotiation. However, the current corpus-based learning manner limits their practical application in real-world scenarios. To this end, we contribute to advancing the study of proactive dialogue policy to a more natural and challenging setting, i.e., interacting dynamically with users. Further, we call attention to non-cooperative user behavior - the user talks about off-path topics when he/she is not satisfied with the previous topics introduced by the agent. We argue that the targets of reaching the goal topic quickly and maintaining high user satisfaction do not always converge, because the topics close to the goal and the topics the user prefers may not be the same. Towards this issue, we propose a new solution named I-Pro that can learn a Proactive policy in the Interactive setting. Specifically, we learn the trade-off via a learned goal weight, which consists of four factors (dialogue turn, goal completion difficulty, user satisfaction estimation, and cooperative degree). The experimental results demonstrate that I-Pro significantly outperforms baselines in terms of effectiveness and interpretability.
Abstract: Conversational recommender systems (CRS) aim to provide high-quality recommendations in conversations. However, most conventional CRS models mainly focus on the dialogue understanding of the current session, ignoring other rich multi-aspect information about the central subjects (i.e., users) in recommendation. In this work, we highlight that the user's historical dialogue sessions and look-alike users are essential sources of user preferences besides the current dialogue session in CRS. To systematically model the multi-aspect information, we propose a User-Centric Conversational Recommendation (UCCR) model, which returns to the essence of user preference learning in CRS tasks. Specifically, we propose a historical session learner to capture users' multi-view preferences from knowledge, semantic, and consuming views as supplements to the current preference signals. A multi-view preference mapper is then employed to learn the intrinsic correlations among different views in current and historical sessions via self-supervised objectives. We also design a temporal look-alike user selector to understand users via their similar users. The learned multi-aspect, multi-view user preferences are then used for recommendation and dialogue generation. In experiments, we conduct comprehensive evaluations on both Chinese and English CRS datasets. The significant improvements over competitive models in both recommendation and dialogue generation verify the superiority of UCCR.
Abstract: Asking clarifying questions is an interactive way to effectively clarify user intent. When a user submits a query, the search engine will return a clarifying question with several clickable items of sub-intents for clarification. According to the existing definition, the key to asking high-quality questions is to generate good descriptions for submitted queries and provided items. However, existing methods, which are mainly based on static knowledge bases, struggle to find descriptions for many queries because of the lack of entities within these queries and their corresponding items. For such queries, they are unable to generate an informative question. To alleviate this problem, we propose leveraging the top search results of the query to help generate better descriptions, because we deem that the top retrieved documents contain rich and relevant contexts of the query. Specifically, we first design a rule-based algorithm to extract description candidates from search results and rank them by various human-designed features. Then, we apply a learning-to-rank model and a generative model for generalization and to further improve the quality of clarifying questions. Experimental results show that our proposed methods can generate more readable and informative questions compared with existing methods. The results prove that search results can be utilized to improve users' search experience for search clarification in conversational search systems.
Abstract: Traditional dialogue summarization models rely on a large-scale manually-labeled corpus and lack the ability to generalize to new domains, so domain adaptation from a labeled source domain to an unlabeled target domain is important in practical summarization scenarios. However, existing domain adaptation work in dialogue summarization generally requires large-scale pre-training using extensive external data. To explore lightweight fine-tuning methods, in this paper, we propose an efficient Adversarial Disentangled Prompt Learning (ADPL) model for domain adaptation in dialogue summarization. We introduce three kinds of prompts: a domain-invariant prompt (DIP), a domain-specific prompt (DSP), and a task-oriented prompt (TOP). DIP aims to disentangle and transfer the shared knowledge from the source domain and target domain in an adversarial way, which improves the accuracy of predictions about domain-invariant information and enhances the ability to generalize to new domains. DSP is designed to guide our model to focus on domain-specific knowledge using domain-related features. TOP is designed to capture task-oriented knowledge to generate high-quality summaries. Instead of fine-tuning the whole pre-trained language model (PLM), we only update the prompt networks and keep the PLM fixed. Experimental results in the zero-shot setting show that the novel design of prompts can yield more coherent, faithful, and relevant summaries than baselines using prefix-tuning, and performs on par with fine-tuning while being more efficient. Overall, our work introduces a prompt-based perspective to zero-shot learning for the dialogue summarization task and provides valuable findings and insights for future research.
Abstract: Conversational recommender systems (CRS) enable traditional recommender systems to interact with users by asking questions about attributes and recommending items. The attribute-level and item-level feedback of users can be utilized to estimate users' preferences. However, existing works do not fully exploit the advantage of explicit item feedback --- they only use the item feedback in rather implicit ways such as updating the latent user and item representation. Since CRS has multiple chances to interact with users, leveraging the context in the conversation may help infer users' implicit feedback (e.g., some specific attributes) when recommendations get rejected. To address the limitations of existing methods, we propose a new CRS framework called Conversational Recommender with Implicit Feedback (CRIF). CRIF formulates the conversational recommendation scheme as a four-phase process consisting of offline representation learning, tracking, decision, and inference. In the inference module, by fully utilizing the relation between users' attribute-level and item-level feedback, our method can explicitly deduce users' implicit preferences. Therefore, CRIF is able to achieve more accurate user preference estimation. Besides, in the decision module, to better utilize the attribute-level and item-level feedback, we adopt inverse reinforcement learning to learn a flexible decision strategy that selects the suitable action at each conversation turn. Through extensive experiments on four benchmark CRS datasets, we validate the effectiveness of our approach, which significantly outperforms the state-of-the-art CRS methods.
Abstract: Data sparsity is a long-standing problem in recommender systems. To alleviate it, Cross-Domain Recommendation (CDR) has attracted a surge of interest; it utilizes the rich user-item interaction information from a related source domain to improve performance on the sparse target domain. Recent CDR approaches pay attention to aggregating source domain information to generate better user representations for the target domain. However, they focus on designing more powerful interaction encoders to learn both domains simultaneously, but fail to model the different user preferences of different domains. In particular, domain-specific preferences of the source domain usually provide information that is useless for enhancing performance in the target domain, and directly aggregating the domain-shared and domain-specific information together may hurt target-domain performance. This work considers a key challenge of CDR: How do we transfer shared information across domains? Grounded in information theory, we propose DisenCDR, a novel model to disentangle the domain-shared and domain-specific information. To reach our goal, we propose two mutual-information-based disentanglement regularizers. Specifically, an exclusive regularizer enforces that the user domain-shared representations and domain-specific representations encode exclusive information. An information regularizer encourages the user domain-shared representations to encode predictive information for both domains. Based on them, we further derive a tractable bound of our disentanglement objective to learn desirable disentangled representations. Extensive experiments show that DisenCDR achieves significant improvements over state-of-the-art baselines on four real-world datasets.
Abstract: The goal of cross-domain retrieval (CDR) is to search for instances of the same category in one domain by using a query from another domain. Existing CDR approaches mainly consider the standard scenario that the cross-domain data for both training and testing come from the same categories and underlying distributions. However, these methods cannot be well extended to the newly emerging task of universal cross-domain retrieval (UCDR), where the testing data belong to the domain and categories not present during training. Compared to CDR, the UCDR task is more challenging due to (1) visually diverse data from multi-source domains, (2) the domain shift between seen and unseen domains, and (3) the semantic shift across seen and unseen categories. To tackle these problems, we propose a novel model termed Structure-Aware Semantic-Aligned Network (SASA) to align the heterogeneous representations of multi-source domains without loss of generalizability for the UCDR task. Specifically, we leverage the advanced Vision Transformer (ViT) as the backbone and devise a distillation-alignment ViT (DAViT) with a novel token-based strategy, which incorporates two complementary distillation and alignment tokens into the ViT architecture. In addition, the distillation token is devised to improve the generalizability of our model by structure information preservation and the alignment token is used to improve discriminativeness with trainable categorical prototypes. Extensive experiments on three large-scale benchmarks, i.e., Sketchy, TU-Berlin, and DomainNet, demonstrate the superiority of our SASA method over the state-of-the-art UCDR and ZS-SBIR methods.
Abstract: Interactive recommender systems (IRS) have received wide attention in recent years. To capture users' dynamic preferences and maximize their long-term engagement, IRS are usually formulated as reinforcement learning (RL) problems. Despite the promise to solve complex decision-making problems, RL-based methods generally require a large amount of online interaction, restricting their applications due to economic considerations. One possible direction to alleviate this issue is cross-domain recommendation that aims to leverage abundant logged interaction data from a source domain (e.g., adventure genre in movie recommendation) to improve the recommendation quality in the target domain (e.g., crime genre). Nevertheless, prior studies mostly focus on adapting the static representations of users/items. Few have explored how the temporally dynamic user-item interaction patterns transform across domains. Motivated by the above consideration, we propose DACIR, a novel Doubly-Adaptive deep RL-based framework for Cross-domain Interactive Recommendation. We first pinpoint how users behave differently in two domains and highlight the potential to leverage the shared user dynamics to boost IRS. To transfer static user preferences across domains, DACIR enforces consistency of item representation by aligning embeddings into a shared latent space. In addition, given the user dynamics in IRS, DACIR calibrates the dynamic interaction patterns in two domains via reward correlation. Once the double adaptation narrows the cross-domain gap, we are able to learn a transferable policy for the target recommender by leveraging logged data. Experiments on real-world datasets validate the superiority of our approach, which consistently achieves significant improvements over the baselines.
Abstract: Cross-domain Named Entity Recognition (NER) aims to transfer knowledge from the source domain to the target domain, alleviating expensive labeling costs in the target domain. Most prior studies acquire domain-invariant features under the end-to-end sequence-labeling framework, where each token is assigned a compositional label (e.g., B-LOC). However, this complicated labeling scheme may increase the complexity of cross-domain transfer, which leads to sub-optimal results, especially when entity categories differ significantly across domains. In this paper, we aim to explore task decomposition in cross-domain NER. Concretely, we suggest a modular learning approach in which two sub-tasks (entity span detection and type classification) are learned by separate functional modules that perform their respective cross-domain transfer with corresponding strategies. Compared with the compositional labeling scheme, the label spaces are smaller and closer across domains, especially in entity span detection, leading to easier transfer in each sub-task. We then combine the two sub-tasks to achieve the final result with a modular interaction mechanism, and deploy adversarial regularization for generalized and robust learning in low-resource target domains. Extensive experiments over 10 diverse domain pairs demonstrate that the proposed method is superior to state-of-the-art cross-domain NER methods in an end-to-end fashion (an average absolute F1 score increase of about 6.4%). Further analyses show the effectiveness of modular task decomposition and its great potential in cross-domain NER.
Abstract: Cross-Domain Recommendation (CDR) has been popularly studied to utilize knowledge from different domains to solve the cold-start problem in recommender systems. Most existing CDR models assume that both the source and target domains share the same overlapped user set for knowledge transfer. However, only a small proportion of users are simultaneously active on both the source and target domains in practical CDR tasks. In this paper, we focus on the Partially Overlapped Cross-Domain Recommendation (POCDR) problem, that is, how to leverage the information of both the overlapped and non-overlapped users to improve recommendation performance. Existing approaches cannot fully utilize the useful knowledge behind the non-overlapped users across domains, which limits model performance when the majority of users turn out to be non-overlapped. To address this issue, we propose an end-to-end Dual-autoencoder with Variational Domain-invariant Embedding Alignment (VDEA) model, a cross-domain recommendation framework for the POCDR problem, which utilizes dual variational autoencoders with both local and global embedding alignment to exploit domain-invariant user embeddings. VDEA first adopts variational inference to capture collaborative user preferences, and then utilizes Gromov-Wasserstein distribution co-clustering optimal transport to cluster users with similar rating interaction behaviors. Our empirical studies on the Douban and Amazon datasets demonstrate that VDEA significantly outperforms state-of-the-art models, especially under the POCDR setting.
Abstract: Click-through rate (CTR) prediction plays an important role in online advertising and recommendation systems, which aims at estimating the probability of a user clicking on a specific item. Feature interaction modeling and user interest modeling methods are two popular domains in CTR prediction, and they have been studied extensively in recent years. However, these methods still suffer from two limitations. First, traditional methods regard item attributes as ID features, while neglecting structure information and relation dependencies among attributes. Second, when mining user interests from user-item interactions, current models ignore user intents and item intents for different attributes, which lacks interpretability. Based on this observation, in this paper, we propose a novel approach Hierarchical Intention Embedding Network (HIEN), which considers dependencies of attributes based on bottom-up tree aggregation in the constructed attribute graph. HIEN also captures user intents for different item attributes as well as item intents based on our proposed hierarchical attention mechanism. Extensive experiments on both public and production datasets show that the proposed model significantly outperforms the state-of-the-art methods. In addition, HIEN can be applied as an input module to state-of-the-art CTR prediction methods, bringing further performance lift for these existing models that might already be intensively used in real systems.
Abstract: Click-Through Rate (CTR) prediction has been widely used in many machine learning tasks such as online advertising and personalization recommendation. Unfortunately, given a domain-specific dataset, searching effective feature interaction operations and combinations from a huge candidate space requires significant expert experience and computational costs. Recently, Neural Architecture Search (NAS) has achieved great success in discovering high-quality network architectures automatically. However, due to the diversity of feature interaction operations and combinations, the existing NAS-based work that treats the architecture search as a black-box optimization problem over a discrete search space suffers from low efficiency. Therefore, it is essential to explore a more efficient architecture search method. To achieve this goal, we propose NAS-CTR, a differentiable neural architecture search approach for CTR prediction. First, we design a novel and expressive architecture search space and a continuous relaxation scheme to make the search space differentiable. Second, we formulate the architecture search for CTR prediction as a joint optimization problem with discrete constraints on architectures and leverage proximal iteration to solve the constrained optimization problem. Additionally, a straightforward yet effective method is proposed to eliminate the aggregation of skip connections. Extensive experimental results reveal that NAS-CTR can outperform the SOTA human-crafted architectures and other NAS-based methods in both test accuracy and search efficiency.
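The continuous relaxation can be sketched DARTS-style: each candidate interaction operation receives an architecture weight, and the block outputs a softmax-weighted mixture, which makes the choice differentiable. The candidate operations listed below and the omission of the paper's proximal-iteration constraints are simplifications for illustration.

```python
import torch
import torch.nn as nn

class RelaxedInteractionBlock(nn.Module):
    """Softmax mixture over candidate interaction operations (illustrative sketch)."""

    def __init__(self, dim):
        super().__init__()
        self.candidates = nn.ModuleList([
            nn.Identity(),                                   # skip / pass-through
            nn.Linear(dim, dim),                             # linear transform
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),   # non-linear transform
        ])
        # one architecture weight per candidate operation
        self.alpha = nn.Parameter(torch.zeros(len(self.candidates)))

    def forward(self, x):
        weights = torch.softmax(self.alpha, dim=0)           # continuous relaxation
        return sum(w * op(x) for w, op in zip(weights, self.candidates))
```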
Abstract: CTR prediction has been widely used in the real world. Many methods model feature interaction to improve their performance. However, most methods only learn a fixed representation for each feature without considering the varying importance of each feature under different contexts, resulting in inferior performance. Recently, several methods tried to learn vector-level weights for feature representations to address the fixed representation issue. However, they only produce linear transformations to refine the fixed feature representations, which are still not flexible enough to capture the varying importance of each feature under different contexts. In this paper, we propose a novel module named Feature Refinement Network (FRNet), which learns context-aware feature representations at bit-level for each feature in different contexts. FRNet consists of two key components: 1) Information Extraction Unit (IEU), which captures contextual information and cross-feature relationships to guide context-aware feature refinement; and 2) Complementary Selection Gate (CSGate), which adaptively integrates the original and complementary feature representations learned in IEU with bit-level weights. Notably, FRNet is orthogonal to existing CTR methods and thus can be applied in many existing methods to boost their performance. Comprehensive experiments are conducted to verify the effectiveness, efficiency, and compatibility of FRNet.
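The bit-level gating idea can be sketched as a sigmoid gate that mixes the original and the complementary (IEU-produced) feature representations element by element; the gate parameterization below is an illustrative assumption, not FRNet's exact design.

```python
import torch
import torch.nn as nn

class BitLevelGate(nn.Module):
    """Mix original and complementary feature representations bit-wise (sketch)."""

    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)   # one weight per embedding dimension ("bit")

    def forward(self, original, complementary):
        # original, complementary: (batch, num_fields, dim)
        g = torch.sigmoid(self.gate(torch.cat([original, complementary], dim=-1)))
        return g * original + (1.0 - g) * complementary   # context-aware refined features
```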
Abstract: Click-Through Rate (CTR) prediction, which aims to estimate the probability that a user will click an item, is an essential component of online advertising. Existing methods mainly attempt to mine user interests from users' historical behaviors, which contain users' directly interacted items. Although these methods have made great progress, they are often limited by the recommender system's direct exposure and inactive interactions, and thus fail to mine all potential user interests. To tackle these problems, we propose Neighbor-Interaction based CTR prediction (NI-CTR), which considers this task under a Heterogeneous Information Network (HIN) setting. In short, Neighbor-Interaction based CTR prediction leverages the local neighborhood of the target user-item pair in the HIN to predict their linkage. In order to guide the representation learning of the local neighborhood, we further consider different kinds of interactions among the local neighborhood nodes from both explicit and implicit perspectives, and propose a novel Graph-Masked Transformer (GMT) to effectively incorporate these kinds of interactions and produce highly representative embeddings for the target user-item pair. Moreover, in order to improve model robustness against neighbor sampling, we enforce a consistency regularization loss over the neighborhood embedding. We conduct extensive experiments on two real-world datasets with millions of instances, and the experimental results show that our proposed method outperforms state-of-the-art CTR models significantly. Meanwhile, comprehensive ablation studies verify the effectiveness of every component of our model. Furthermore, we have deployed this framework on the WeChat Official Account Platform with billions of users. The online A/B tests demonstrate an average CTR improvement of 21.9% against all online baselines.
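Graph-masked attention can be sketched as standard self-attention whose logits are blocked wherever the chosen relation graph has no edge, so a head can be restricted to one kind of interaction among the sampled neighborhood nodes. The single-head, unbatched version below is purely illustrative.

```python
import torch

def graph_masked_attention(node_feats, adj_mask, w_q, w_k, w_v):
    """Single-head self-attention restricted by a graph mask (illustrative sketch).

    node_feats: (n, d) features of the sampled neighborhood nodes
    adj_mask:   (n, n) boolean matrix, True where attention is allowed
                (assumed True on the diagonal so every node attends to itself)
    w_q, w_k, w_v: (d, d_k) projection matrices
    """
    q, k, v = node_feats @ w_q, node_feats @ w_k, node_feats @ w_v
    scores = q @ k.t() / (q.shape[-1] ** 0.5)
    scores = scores.masked_fill(~adj_mask, float("-inf"))   # block non-edges
    return torch.softmax(scores, dim=-1) @ v
```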
Abstract: Accurate estimation of the post-click conversion rate (CVR) is critical for building recommender systems, which has long been confronted with sample selection bias and data sparsity issues. Methods in the Entire Space Multi-task Model (ESMM) family leverage the sequential pattern of user actions, i.e., $impression \rightarrow click \rightarrow conversion$, to address the data sparsity issue. However, they still fail to ensure the unbiasedness of CVR estimates. In this paper, we theoretically demonstrate that ESMM suffers from the following two problems: (1) Inherent Estimation Bias (IEB) for CVR estimation, where the CVR estimate is inherently higher than the ground truth; and (2) Potential Independence Priority (PIP) for CTCVR estimation, where ESMM might overlook the causality from click to conversion. To this end, we devise a principled approach named Entire Space Counterfactual Multi-task Modelling (ESCM$^2$), which employs a counterfactual risk minimizer as a regularizer in ESMM to address both the IEB and PIP issues simultaneously. Extensive experiments on offline datasets and online environments demonstrate that our proposed ESCM$^2$ can largely mitigate the inherent IEB and PIP issues and achieve better performance than baseline models.
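The counterfactual regularizer can be sketched as an inverse-propensity-scored CVR risk over the entire impression space, with the model's own CTR estimate acting as the propensity; the way the three terms are combined and the clipping constant below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def escm2_style_loss(ctr_pred, cvr_pred, click, conversion, lam=1.0, clip=0.05):
    """ESMM-style losses plus an IPS counterfactual CVR regularizer (sketch).

    ctr_pred, cvr_pred: (n,) predicted click and post-click conversion probabilities
    click, conversion:  (n,) observed 0/1 labels (float) over all impressions
    """
    ctr_loss = F.binary_cross_entropy(ctr_pred, click)
    ctcvr_loss = F.binary_cross_entropy(ctr_pred * cvr_pred, click * conversion)
    # counterfactual risk: weight clicked impressions by 1 / p(click)
    propensity = ctr_pred.detach().clamp(min=clip)
    ips_cvr = (click / propensity) * F.binary_cross_entropy(
        cvr_pred, conversion, reduction="none")
    return ctr_loss + ctcvr_loss + lam * ips_cvr.mean()
```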
Abstract: The related work section is an important component of a scientific paper, which highlights the contribution of the target paper in the context of the reference papers. Authors can save time and effort by using an automatically generated related work section as a draft to complete the final related work. Most existing related work section generation methods rely on extracting off-the-shelf sentences to make a comparative discussion about the target work and the reference papers. However, such sentences need to be written in advance and are hard to obtain in practice. Hence, in this paper, we propose an abstractive target-aware related work generator (TAG), which can generate related work sections consisting of new sentences. Concretely, we first propose a target-aware graph encoder, which models the relationships between the reference papers and the target paper with target-centered attention mechanisms. In the decoding process, we propose a hierarchical decoder that attends to the nodes of different levels in the graph with keyphrases as semantic indicators. Finally, to generate a more informative related work section, we propose multi-level contrastive optimization objectives, which aim to maximize the mutual information between the generated related work and the references, and to minimize that with the non-references. Extensive experiments on two public scholarly datasets show that the proposed model brings substantial improvements over several strong baselines in terms of automatic and tailored human evaluations.
Abstract: Information seeking in an academic digital library is complex in nature, often spanning multiple search sessions. Resuming academic search tasks requires significant cognitive effort as searchers must re-acquaint themselves with previous search session activities and previously discovered documents before resuming their search. Further, some academic searchers may find it convenient to initiate such searches on their mobile devices during short gaps in time (e.g., between classes), and resume them later in a desktop environment when they can use the extra screen space and more convenient document storage capabilities of their computers. To support such searching, we have developed an academic digital library search interface that assists searchers in managing cross-session search tasks even when moving between mobile and desktop environments. Using a controlled laboratory study we compared our approach (Dilex) to a standard academic digital library search interface. We found increased user engagement in both the initial (mobile) and resumed (desktop) search activities, and that participants spent more time on the search results pages and had an increased degree of interaction with information and personalization features during the resumed tasks. These results provide evidence that the participants were able to make effective use of the visualization features in Dilex, which enabled them to readily resume their search tasks and stay engaged in the search activities. This work represents an example of how semi-automatic search task/session management and visualization features can support cross-session search, and how designing for both mobile and desktop use can support cross-device search.
Abstract: With the ever-increasing popularity of microservice architecture, a considerable number of enterprises and organizations have encapsulated their complex business services into various lightweight functions and published them as accessible APIs (Application Programming Interfaces). Through keyword search, a software developer can select a set of APIs from a massive number of candidates to implement the functions of a complex mashup, which reduces the development cost significantly. However, traditional keyword search methods for APIs often suffer from several critical issues, such as functional incompatibility and limited diversity in search results, which may lead to mashup creation failures and lower development productivity. To deal with these challenges, this paper designs DAWAR, a diversity-aware Web API recommendation approach that finds diversified and compatible APIs for mashup creation. Specifically, the API recommendation problem for mashup creation is modeled as a graph search problem that aims to find the minimal group Steiner trees in a correlation graph of APIs. DAWAR innovatively employs determinantal point processes to diversify the recommended results. Empirical evaluation is performed on commonly-used real-world datasets, and the statistical results show that DAWAR is able to achieve significant improvements in terms of recommendation diversity, accuracy, and compatibility.
Abstract: Knowledge tracing (KT), which aims at predicting a learner's knowledge mastery, plays an important role in computer-aided educational systems. The goal of KT is to provide personalized learning paths for learners by diagnosing their mastery of each knowledge concept, thus improving learning efficiency. In recent years, many deep learning models have been applied to tackle the KT task and have shown promising results. However, most existing methods simplify the exercising records as knowledge sequences, which fails to explore the rich information that exists in exercises. Besides, the existing diagnosis results of knowledge tracing are not convincing enough since they neglect hierarchical relations between exercises. To solve the above problems, we propose a hierarchical graph knowledge tracing model called HGKT to explore the latent complex relations between exercises. Specifically, we introduce the concept of problem schema to construct a hierarchical exercise graph that can model the exercise learning dependencies. Moreover, we employ two attention mechanisms to highlight important historical states of learners. In the testing stage, we present a knowledge&schema diagnosis matrix that can trace the transition of mastery of knowledge and problem schema, which can be more easily applied to different applications. Extensive experiments show the effectiveness and interpretability of our proposed model.
Abstract: Computerized Adaptive Testing (CAT) is a promising testing mode in personalized online education (e.g., GRE), which aims at measuring a student's proficiency accurately while reducing test length. The "adaptive" is reflected in its selection algorithm, which retrieves the best-suited questions for a student based on his/her estimated proficiency at each test step. Although there are many sophisticated selection algorithms for improving CAT's effectiveness, they are restricted and perturbed by the accuracy of the current proficiency estimate, and thus lack robustness. To this end, we investigate a general method to enhance the robustness of existing algorithms by leveraging the student's "multi-facet" nature during tests. Specifically, we present a generic optimization criterion, Robust Adaptive Testing (RAT), for proficiency estimation via fusing multiple estimates at each step, which maintains a multi-facet description of the student's potential proficiency. We further provide theoretical analyses of such an estimator's desirable statistical properties: asymptotic unbiasedness, efficiency, and consistency. Extensive experiments on perturbed synthetic data and three real-world datasets show that selection algorithms in our RAT framework are robust and yield substantial improvements.
Abstract: Knowledge Tracing (KT), which aims to assess students' dynamic knowledge states when practicing on various questions, is a fundamental research task for offering intelligent services in online learning systems. Researchers have devoted significant efforts to developing KT models with impressive performance. However, in existing KT methods, the related question difficulty level, which directly affects students' knowledge state in learning, has not been effectively explored and employed. In this paper, we focus on exploring the question difficulty effect on learning to improve student's knowledge state assessment and propose the DIfficulty Matching Knowledge Tracing (DIMKT) model. Specifically, we first explicitly incorporate the difficulty level into the question representation. Then, to establish the relation between students' knowledge state and the question difficulty level during the practice process, we accordingly design an adaptive sequential neural network in three stages: (1) measuring students' subjective feelings of the question difficulty before practice; (2) estimating students' personalized knowledge acquisition while answering questions of different difficulty levels; (3) updating students' knowledge state in varying degrees to match the question difficulty level after practice. Finally, we conduct extensive experiments on real-world datasets, and the results demonstrate that DIMKT outperforms state-of-the-art KT models. Moreover, DIMKT shows superior interpretability by exploring the question difficulty effect when making predictions. Our codes are available at https://github.com/shshen-closer/DIMKT.
Abstract: The truncation of ranking lists predicted by retrieval models is vital to ensure users' search experience. Particularly, in specific vertical domains where documents are usually complicated and extensive (e.g., legal cases), the cost of browsing results is much higher than traditional IR tasks (e.g., Web search) and setting a reasonable cut-off position is quite necessary. While it is straightforward to apply existing result list truncation approaches to legal case retrieval, the effectiveness of these methods is limited because they only focus on simple document statistics and usually fail to capture the context information of documents in the ranking list. These existing efforts also treat result list truncation as an isolated task instead of a component in the entire ranking process, limiting the usage of truncation in practical systems. To tackle these limitations, we propose LeCut, a ranking list truncation model for legal case retrieval. LeCut utilizes contextual features of the retrieval task to capture the semantic-level similarity between documents and decides the best cut-off position with attention mechanisms. We further propose a Joint Optimization of Truncation and Reranking (JOTR) framework based on LeCut to improve the performance of truncation and retrieval tasks simultaneously. Comparison against competitive baselines on public benchmark datasets demonstrates the effectiveness of LeCut and JOTR. A case study is conducted to visualize the cut-off positions of LeCut and the process of how JOTR improves both retrieval and truncation tasks.
Abstract: Cold-start diagnosis prediction is a challenging task for AI in healthcare, where often only a few visits per patient and a few observations per disease can be exploited. Although meta-learning is widely adopted to address the data sparsity problem in general domains, directly applying it to healthcare data is less effective, since it is unclear how to capture both the temporal relations in clinical visits and the complicated relations among syndromic diseases for precise personalized diagnosis. To this end, we first propose a novel Meta-learning framework for cold-start diagnosis prediction in healthCare data (MetaCare). By explicitly encoding the effects of disease progress over time as a generalization prior, MetaCare dynamically predicts future diagnosis and timestamp for infrequent patients. Then, to model complicated relations among rare diseases, we propose to utilize domain knowledge of hierarchical relations among diseases, and further perform diagnosis subtyping to mine the latent syndromic relations among diseases. Finally, to tailor the generic meta-learning framework with personalized parameters, we design a hierarchical patient subtyping mechanism and bridge the modeling of both infrequent patients and rare diseases. We term the joint model as MetaCare++. Extensive experiments on two real-world benchmark datasets show significant performance gains brought by MetaCare++, yielding average improvements of 7.71% for diagnosis prediction and 13.94% for diagnosis time prediction over the state-of-the-art baselines.
Abstract: Biomedical natural language processing often involves the interpretation of patient descriptions, for instance for diagnosis or for recommending treatments. Current methods, based on biomedical language models, have been found to struggle with such tasks. Moreover, retrieval augmented strategies have only had limited success, as it is rare to find sentences which express the exact type of knowledge that is needed for interpreting a given patient description. For this reason, rather than attempting to retrieve explicit medical knowledge, we instead propose to rely on a nearest neighbour strategy. First, we retrieve text passages that are similar to the given patient description, and are thus likely to describe patients in similar situations, while also mentioning some hypothesis (e.g., a possible diagnosis of the patient). We then judge the likelihood of the hypothesis based on the similarity of the retrieved passages. Identifying similar cases is challenging, however, as descriptions of similar patients may superficially look rather different, in part because they often contain an abundance of irrelevant details. To address this challenge, we propose a strategy that relies on a distantly supervised cross-encoder. Despite its conceptual simplicity, we find this strategy to be effective in practice.
Abstract: Attributed networks, as a manifestation of data in non-Euclidean domains, have a wide range of applications in the real world, such as molecular property prediction, social network analysis and anomaly detection. Node classification, as a fundamental research problem in attributed networks, has attracted increasing attention among research communities. However, most existing models cannot be directly applied to data with limited labeled instances (i.e., the few-shot scenario). Few-shot node classification on attributed networks is gradually becoming a research hotspot. Although several methods aim to integrate meta-learning with graph neural networks to address this problem, some limitations remain. First, they all assume node representation learning using graph neural networks in homophilic graphs, and hence obtain suboptimal performance when applied to heterophilic graphs. Second, existing models based on meta-learning entirely depend on instance-based statistics, which in few-shot settings are unavoidably degraded by data noise or outliers. Third, most previous models treat all sampled tasks equally and fail to adapt to their uniqueness, which has a significant impact on the overall performance of the model. To address these three limitations, we propose a novel graph meta-learning framework called Meta-GPS (Graph learning based on Prototype and Scaling & shifting transformation). More specifically, we introduce an efficient method for learning expressive node representations even on heterophilic graphs and propose utilizing a prototype-based approach to initialize parameters in meta-learning. Moreover, we also leverage the S$^2$ (scaling & shifting) transformation to learn effective transferable knowledge from diverse tasks. Extensive experimental results on six real-world datasets demonstrate the superiority of our proposed framework, which outperforms other state-of-the-art baselines by up to 13% absolute improvement in terms of related metrics.
Abstract: Fashion Compatibility Modeling (FCM) is a new yet challenging task, which aims to automatically assess the matching degree among a set of complementary items. Most existing methods evaluate fashion compatibility from a common perspective, but overlook the user's personal preference. Inspired by this, a few pioneers have studied Personalized Fashion Compatibility Modeling (PFCM). Despite their significance, these PFCM methods mainly concentrate on the user and item entities, as well as their interactions, but ignore the attribute entities, which contain rich semantics. To address this problem, we propose to fully explore the related entities and their relations involved in PFCM to boost the PFCM performance. This is, however, non-trivial due to the heterogeneous contents of different entities, embeddings for new users, and various high-order relations. Towards these ends, we present a novel metapath-guided personalized fashion compatibility modeling scheme, dubbed MG-PFCM. In particular, we creatively build a heterogeneous graph to unify the three types of entities (i.e., users, items, and attributes) and their relations (i.e., user-item interactions, item-item matching relations, and item-attribute association relations). Thereafter, we design a multi-modal content-oriented user embedding module to learn user representations by inheriting the contents of their interacted items. Meanwhile, we define user-oriented and item-oriented metapaths, and perform metapath-guided heterogeneous graph learning to enhance the user and item embeddings. In addition, we introduce contrastive regularization to improve the model performance. We conduct extensive experiments on a real-world benchmark dataset, which verify the superiority of our proposed scheme over several cutting-edge baselines. As a byproduct, we have released our source code to benefit other researchers.
Abstract: Health thread recommendation methods aim to suggest the most relevant existing threads for a user. Most of the existing methods tend to rely on modeling the post contents to retrieve relevant answers. However, posts written by users with different clinical conditions can be lexically similar, as unrelated diseases (e.g., Angina and Osteoporosis) may share the same symptoms (e.g., back pain), so lexically similar threads may still be irrelevant to a user. Therefore, it is critical to not only consider the connections between users and threads, but also the descriptions of users' symptoms and clinical conditions. In this paper, towards this problem of thread recommendation in online healthcare forums, we propose a knowledge graph enhanced Threads Recommendation (KETCH) model, which leverages graph neural networks to model the interactions among users and threads, and learn their representations. In our model, the users, threads and posts are three types of nodes in a graph, linked through their associations. KETCH uses a message passing strategy, aggregating information along the network. In addition, we introduce a knowledge-enhanced attention mechanism to capture the latent conditions and symptoms. We also apply the method to the task of predicting the side effects of drugs, to show that KETCH has the potential to complement the medical knowledge graph. Compared with the best results of seven competing methods, in terms of MRR, KETCH outperforms all methods by at least 0.125 on the MedHelp dataset, 0.048 on the Patient dataset, and 0.092 on the HealthBoards dataset. We release the source code of KETCH at: https://github.com/cuilimeng/KETCH.
Abstract: Online healthcare services can provide unlimited and in-time medical information to users, which promotes social good and breaks the barriers of location. However, understanding the user intents behind medical-related queries is a challenging problem. Medical search queries are usually short and noisy, lack strict syntactic structure, and require professional background knowledge to understand the medical terms. The medical intents are fine-grained, making them hard to recognize. In addition, many intents have only a few labeled examples. To handle these problems, we propose a few-shot learning method for medical search query intent recognition called MEDIC. We extract co-click queries from user search logs as weak supervision to compensate for the lack of labeled data. We also design a new query encoder which learns to represent queries as a combination of semantic knowledge recorded in an external medical knowledge graph, syntactic knowledge which marks the grammatical role of each word in the query, and generic knowledge which is captured by language models pretrained on large-scale text corpora. Experimental results on a real medical search query intent recognition dataset validate the effectiveness of MEDIC.
Abstract: As a crucial component of most modern deep recommender systems, feature embedding maps high-dimensional sparse user/item features into low-dimensional dense embeddings. However, these embeddings are usually assigned a unified dimension, which suffers from the following issues: (1) high memory usage and computation cost; and (2) sub-optimal performance due to inferior dimension assignments. In order to alleviate the above issues, some works focus on automated embedding dimension search by formulating it as a hyper-parameter optimization or embedding pruning problem. However, they either require a well-designed search space for hyperparameters or need time-consuming optimization procedures. In this paper, we propose a Single-Shot Embedding Dimension Search method, called SSEDS, which can efficiently assign dimensions for each feature field via a single-shot embedding pruning operation while maintaining the recommendation accuracy of the model. Specifically, it introduces a criterion for identifying the importance of each embedding dimension for each feature field. As a result, SSEDS could automatically obtain mixed-dimensional embeddings by explicitly reducing redundant embedding dimensions based on the corresponding dimension importance ranking and the predefined parameter budget. Furthermore, the proposed SSEDS is model-agnostic, meaning that it could be integrated into different base recommendation models. Extensive offline experiments are conducted on two widely used public datasets for the CTR (Click Through Rate) prediction task, and the results demonstrate that SSEDS can still achieve strong recommendation performance even after pruning 90% of the parameters. Moreover, SSEDS has also been deployed on the WeChat Subscription platform for practical recommendation services. The 7-day online A/B test results show that SSEDS can significantly improve the performance of the online recommendation model while reducing resource consumption.
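A minimal sketch of the single-shot pruning idea described above: score every embedding dimension of every feature field with an importance criterion and keep the highest-scoring dimensions that fit within a parameter budget. The gradient-times-weight saliency, the score-per-parameter ordering, and all names are assumptions for illustration, not the exact SSEDS criterion.

import numpy as np

def prune_embedding_dims(saliency: dict, vocab_sizes: dict, budget: int) -> dict:
    """Keep the globally most important embedding dimensions within a budget.

    saliency[field]: 1-D array, one importance score per embedding dimension
        of that field (e.g. aggregated |gradient * weight| over a batch).
    vocab_sizes[field]: vocabulary size of the field, i.e. the parameter cost
        of keeping one dimension of that field.
    Returns the kept dimension indices per field.
    """
    candidates = []
    for field, scores in saliency.items():
        for d, s in enumerate(scores):
            # rank dimensions by importance per parameter spent
            candidates.append((s / vocab_sizes[field], field, d))
    candidates.sort(reverse=True)

    kept, used = {f: [] for f in saliency}, 0
    for _, field, d in candidates:
        cost = vocab_sizes[field]
        if used + cost <= budget:
            kept[field].append(d)
            used += cost
    return kept

# toy example: two fields, embedding size 4 each
saliency = {"user_id": np.array([0.9, 0.1, 0.4, 0.05]),
            "item_id": np.array([0.8, 0.7, 0.02, 0.3])}
vocab_sizes = {"user_id": 1000, "item_id": 5000}
print(prune_embedding_dims(saliency, vocab_sizes, budget=12000))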
Abstract: With the development of deep learning techniques, deep recommendation models have achieved remarkable improvements in terms of recommendation accuracy. However, due to the large number of candidate items in practice and the high cost of preference computation, these methods also suffer from low recommendation efficiency. The recently proposed tree-based deep recommendation models alleviate the problem by directly learning the tree structure and representations under the guidance of recommendation objectives. However, such models have two shortcomings. First, the max-heap assumption in the hierarchical tree, in which the preference for a parent node should be the maximum of the preferences for its children, is difficult to satisfy with their binary classification objectives. Second, the learned index only includes a single tree, in contrast to the widely used multiple-tree index, leaving an opportunity to improve recommendation accuracy. To this end, we propose a Deep Forest-based Recommender (DeFoRec for short) for efficient recommendation. In DeFoRec, all the trees generated during the training process are retained to form a forest. When learning the node representations of each tree, DeFoRec satisfies the max-heap assumption as much as possible and mimics beam search behavior over the tree in the training stage. This is achieved by regarding the training task as multi-class classification over tree nodes at the same level. However, the number of tree nodes grows exponentially with the level, so we train the preference model with the guidance of the sampled-softmax technique. Experiments are conducted on real-world datasets, validating the effectiveness of the proposed preference model learning method and tree learning method.
Abstract: Deep neural networks (DNNs) demonstrate significant advantages in improving ranking performance in retrieval tasks. Driven by the recent developments in optimization and generalization of DNNs, learning a neural ranking model online from its interactions with users becomes possible. However, the required exploration for model learning has to be performed in the entire neural network parameter space, which is prohibitively expensive and limits the application of such online solutions in practice. In this work, we propose an efficient exploration strategy for online interactive neural ranker learning based on bootstrapping. Our solution is based on an ensemble of ranking models trained with perturbed user click feedback. The proposed method eliminates explicit confidence set construction and the associated computational overhead, which enables online neural ranker training to be efficiently executed in practice with theoretical guarantees. Extensive comparisons with an array of state-of-the-art OL2R algorithms on two public learning to rank benchmark datasets demonstrate the effectiveness and computational efficiency of our proposed neural OL2R solution.
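The bootstrapping idea can be sketched as follows: maintain an ensemble of rankers, update each member on its own perturbed copy of the click feedback, and let the disagreement across members drive exploration. The linear scorer, the Gaussian click perturbation, and the mean-plus-spread ranking rule below are illustrative assumptions rather than the paper's exact algorithm.

import numpy as np

rng = np.random.default_rng(0)

def perturb_clicks(clicks: np.ndarray) -> np.ndarray:
    """One bootstrap view of binary click feedback via additive noise
    (illustrative; other perturbation schemes are possible)."""
    return clicks + rng.normal(0.0, 0.5, size=clicks.shape)

class LinearRankerEnsemble:
    """Ensemble of linear scorers, each updated on its own perturbed copy
    of the clicks; disagreement across members acts as an implicit
    confidence signal for exploration."""

    def __init__(self, n_models: int, n_features: int, lr: float = 0.1):
        self.weights = rng.normal(0, 0.01, size=(n_models, n_features))
        self.lr = lr

    def update(self, doc_feats: np.ndarray, clicks: np.ndarray) -> None:
        for m in range(self.weights.shape[0]):
            target = perturb_clicks(clicks)
            pred = doc_feats @ self.weights[m]
            grad = doc_feats.T @ (pred - target) / len(clicks)
            self.weights[m] -= self.lr * grad  # least-squares style update

    def rank(self, doc_feats: np.ndarray) -> np.ndarray:
        scores = doc_feats @ self.weights.T          # (docs, models)
        optimistic = scores.mean(1) + scores.std(1)  # explore where members disagree
        return np.argsort(-optimistic)

ensemble = LinearRankerEnsemble(n_models=5, n_features=10)
feats = rng.normal(size=(20, 10))
ensemble.update(feats, clicks=rng.integers(0, 2, size=20).astype(float))
print(ensemble.rank(feats)[:5])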
Abstract: Session-based recommender systems (SBRs) are becoming increasingly popular because they can predict user interests without relying on long-term user profiles and support login-free recommendation. Modern recommender systems operate in a fully server-based fashion. To cater to millions of users, frequent model maintenance and high-speed processing of concurrent user requests are required, which comes at the cost of a huge carbon footprint. Meanwhile, users need to upload their behavior data, even including the immediate environmental context, to the server, raising public concern about privacy. On-device recommender systems circumvent these two issues with cost-conscious settings and local inference. However, due to the limited memory and computing resources, on-device recommender systems are confronted with two fundamental challenges: (1) how to reduce the size of regular models to fit edge devices, and (2) how to retain the original capacity. Previous research mostly adopts tensor decomposition techniques to compress regular recommendation models with low compression rates so as to avoid drastic performance degradation. In this paper, we explore ultra-compact models for next-item recommendation by loosening the constraint of dimensionality consistency in tensor decomposition. To compensate for the capacity loss caused by compression, we develop a self-supervised knowledge distillation framework which enables the compressed model (student) to distill the essential information lying in the raw data, and improves long-tail item recommendation through an embedding-recombination strategy with the original model (teacher). Extensive experiments on two benchmarks demonstrate that, with a 30x size reduction, the compressed model comes with almost no accuracy loss, and even outperforms its uncompressed counterpart. The code is released at https://github.com/xiaxin1998/OD-Rec.
Abstract: Many IR collections contain forbidden documents (F-docs), i.e., documents that should not be retrieved to the searcher. In an ideal scenario F-docs are clearly flagged, hence the ranker can filter them out, guaranteeing that no F-doc will be exposed. However, in real-world scenarios, filtering algorithms are prone to errors. Therefore, an IR evaluation system should also measure filtering quality in addition to ranking quality. Typically, filtering is considered a classification task and is evaluated independently of ranking quality. However, due to the mutual affinity between the two, it is desirable to evaluate ranking quality while filtering decisions are being made. In this work we propose nDCGf, a novel extension of the nDCGmin metric [14], which measures both the ranking and the filtering quality of the search results. We show both theoretically and empirically that while nDCGmin is not suitable for the simultaneous ranking and filtering task, nDCGf is a reliable metric in this case. We experiment with three datasets for which ranking and filtering are both required. In the PR dataset our task is to rank product reviews while filtering those marked as spam. Similarly, in the CQA dataset our task is to rank a list of human answers per question while filtering bad answers. We also experiment with the TREC web-track datasets, where F-docs are explicitly labeled, sorting participant runs according to their ranking and filtering quality, and demonstrating the stability, sensitivity, and reliability of nDCGf for this task. We propose a learning to rank and filter (LTRF) framework that is specifically designed to optimize nDCGf by learning a ranking model and optimizing a filtering threshold used for discarding documents with lower scores. We experiment with several loss functions, demonstrating their success in learning an effective LTRF model for the simultaneous ranking and filtering task.
Abstract: The dramatic improvements in core information retrieval tasks engendered by neural rankers create a need for novel evaluation methods. If every ranker returns highly relevant items in the top ranks, it becomes difficult to recognize meaningful differences between them and to build reusable test collections. Several recent papers explore pairwise preference judgments as an alternative to traditional graded relevance assessments. Rather than viewing items one at a time, assessors view items side-by-side and indicate the one that provides the better response to a query, allowing fine-grained distinctions. If we employ preference judgments to identify the probably best items for each query, we can measure rankers by their ability to place these items as high as possible. We frame the problem of finding best items as a dueling bandits problem. While many papers explore dueling bandits for online ranker evaluation via interleaving, they have not been considered as a framework for offline evaluation via human preference judgments. We review the literature for possible solutions. For human preference judgments, any usable algorithm must tolerate ties, since two items may appear nearly equal to assessors, and it must minimize the number of judgments required for any specific pair, since each such comparison requires an independent assessor. Since the theoretical guarantees provided by most algorithms depend on assumptions that are not satisfied by human preference judgments, we simulate selected algorithms on representative test cases to provide insight into their practical utility. Based on these simulations, one algorithm stands out for its potential. Our simulations suggest modifications to further improve its performance. Using the modified algorithm, we collect over 10,000 preference judgments for pools derived from submissions to the TREC 2021 Deep Learning Track, confirming its suitability. We test the idea of best-item evaluation and suggest ideas for further theoretical and practical progress.
Abstract: The use of offline effectiveness metrics is one of the cornerstones of evaluation in information retrieval. Static resources that include test collections and sets of topics, the corresponding relevance judgments connecting them, and metrics that map document rankings from a retrieval system to numeric scores have been used for multiple decades as an important way of comparing systems. The basis behind this experimental structure is that the metric score for a system can serve as a surrogate measurement for user satisfaction. Here we introduce a user behavior framework that extends the C/W/L family. The essence of the new framework - which we call C/W/L/A - is that the user actions that are undertaken while reading the ranking can be considered separately from the benefit that each user will have derived as they exit the ranking. This split structure allows the great majority of current effectiveness metrics to be systematically categorized, and thus their relative properties and relationships to be better understood; and at the same time permits a wide range of novel combinations to be considered. We then carry out experiments using relevance judgments, document rankings, and user satisfaction data from two distinct sources, comparing the patterns of metric scores generated, and showing that those metrics vary quite markedly in terms of their ability to predict user satisfaction.
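For reference, the C/W/L machinery that the C/W/L/A framework extends can be summarized as follows, in standard notation; the additional A component separating reading actions from exit benefit is defined in the paper itself and is not reproduced here.

% A user model is specified by a continuation probability C(i), the chance
% of inspecting rank i+1 after inspecting rank i. This induces inspection
% weights W(i) and a metric value M (the expected rate of gain):
W(i) = \frac{\prod_{j=1}^{i-1} C(j)}{\sum_{k=1}^{\infty} \prod_{j=1}^{k-1} C(j)},
\qquad
M = \sum_{i=1}^{\infty} W(i)\, r_i ,
% where r_i is the (gain-mapped) relevance of the item at rank i.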
Abstract: Most information retrieval effectiveness evaluation metrics assume that systems appending irrelevant documents at the bottom of the ranking are as effective as (or no worse than) systems that have a stopping criterion to 'truncate' the ranking at the right position and avoid retrieving those irrelevant documents at the end. It can be argued, however, that such truncated rankings are more useful to the end user. It is thus important to understand how to measure retrieval effectiveness in this scenario. In this paper we provide both theoretical and experimental contributions. We first define formal properties to analyze how effectiveness metrics behave when evaluating truncated rankings. Our theoretical analysis shows that de-facto standard metrics do not satisfy desirable properties for evaluating truncated rankings: only Observational Information Effectiveness (OIE) -- a metric based on Shannon's information theory -- satisfies them all. We then perform experiments to compare several metrics on nine TREC datasets. According to our experimental results, the most appropriate metrics for truncated rankings are OIE and a novel extension of Rank-Biased Precision that adds a user effort factor penalizing the retrieval of irrelevant documents.
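For context, the standard Rank-Biased Precision that the extension mentioned above builds on is given below; the exact form of the effort-penalized variant is defined in the paper and not reproduced here.

% Rank-Biased Precision with persistence parameter p in (0, 1):
\mathrm{RBP} = (1 - p) \sum_{i=1}^{d} p^{\,i-1} r_i ,
% where r_i is the relevance of the document at rank i and d is the
% (possibly truncated) ranking length.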
Abstract: Offline evaluation of information retrieval and recommendation has traditionally focused on distilling the quality of a ranking into a scalar metric such as average precision or normalized discounted cumulative gain. We can use this metric to compare the performance of multiple systems for the same request. Although evaluation metrics provide a convenient summary of system performance, they also collapse subtle differences across users into a single number and can carry assumptions about user behavior and utility not supported across retrieval scenarios. We propose recall-paired preference (RPP), a metric-free evaluation method based on directly computing a preference between ranked lists. RPP simulates multiple user subpopulations per query and compares systems across these pseudo-populations. Our results across multiple search and recommendation tasks demonstrate that RPP substantially improves discriminative power while correlating well with existing metrics and being equally robust to incomplete data.
Abstract: A fundamental goal of Information Retrieval (IR) is to satisfy searchers' information need (IN). Advances in neuroimaging technologies have allowed for interdisciplinary research to investigate the brain activity associated with the realisation of IN. While these studies have been informative, they were not able to capture, with a high temporal resolution, the cognitive processes underlying the realisation of IN and the interplay between them. This paper aims to investigate this research question by inferring the variability of brain activity based on the contrast of a state of IN with two other (no-IN) scenarios. To do so, we employed Electroencephalography (EEG) and constructed an Event-Related Potential (ERP) analysis of the brain signals captured while participants were experiencing the realisation of IN. In particular, the brain signals of 24 healthy participants were captured while performing a Question-Answering (Q/A) task. Our results show a link between the early stages of processing, corresponding to awareness, and late activity, reflecting memory control mechanisms. Our findings also show that participants exhibited an early N1-P2 complex indexing awareness processes, indicating that the realisation of IN is manifested in the brain before it reaches the user's consciousness. This research contributes novel insights into a better understanding of IN and informs the design of IR systems to better satisfy it.
Abstract: Search engines and recommendation systems attempt to continually improve the quality of the experience they afford to their users. Refining the ranker that produces the lists displayed in response to user requests is an important component of this process. A common practice is for the service providers to make changes (e.g. new ranking features, different ranking models) and A/B test them on a fraction of their users to establish the value of the change. An alternative approach estimates the effectiveness of the proposed changes offline, utilising previously collected clickthrough data on the old ranker to posit what the user behaviour on ranked lists produced by the new ranker would have been. A majority of offline evaluation approaches invoke the well studied inverse propensity weighting to adjust for biases inherent in logged data. In this paper, we propose the use of parametric estimates for these propensities. Specifically, by leveraging well known learning-to-rank methods as subroutines, we show how accurate offline evaluation can be achieved when the new rankings to be evaluated differ from the logged ones.
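The inverse propensity weighting that such offline estimators build on can be written, in one common form, as below; the paper's specific contribution, estimating the propensities parametrically with learning-to-rank subroutines, is not reproduced here.

% IPS estimate of the utility of a new ranker \pi from logged interactions
% (q_j, d_j, c_j) collected under the logging ranker, where c_j is the
% click indicator and p_j the examination propensity of d_j in the logged
% ranking:
\hat{V}(\pi) = \frac{1}{n} \sum_{j=1}^{n}
\frac{c_j \,\lambda\!\big(\mathrm{rank}(d_j \mid \pi, q_j)\big)}{p_j},
% where \lambda(\cdot) is a rank discount, e.g. 1/\log_2(1 + \mathrm{rank}).
% Unbiasedness requires p_j > 0 for every logged click.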
Abstract: Web search heavily relies on click-through behavior as an essential feedback signal for performance evaluation and improvement. Traditionally, click is usually treated as a positive implicit feedback signal of relevance or usefulness, while non-click is regarded as a signal of irrelevance or uselessness. However, there are many cases where users satisfy their information need with the contents shown on the Search Engine Result Page (SERP). This raises the problem of measuring the usefulness of non-click results and modeling user satisfaction in such circumstances. For a long period, understanding non-click results is challenging owing to the lack of user interactions. In recent years, the rapid development of neuroimaging technologies constitutes a paradigm shift in various industries, e.g., search, entertainment, and education. Therefore, we benefit from these technologies and apply them to bridge the gap between the human mind and the external search system in non-click situations. To this end, we analyze the differences in brain signals between the examination of non-click search results in different usefulness levels. Inspired by these findings, we conduct supervised learning tasks to estimate the usefulness of non-click results with brain signals and conventional information (i.e., content and context factors). Furthermore, we devise two re-ranking methods, i.e., a Personalized Method (PM) and a Generalized Intent modeling Method (GIM), for search result re-ranking with the estimated usefulness. Results show that it is feasible to utilize brain signals to improve usefulness estimation performance and enhance human-computer interactions by search result re-ranking.
Abstract: Existing explainable recommender systems have mainly modeled relationships between recommended and already experienced products, and shaped explanation types accordingly (e.g., movie "x" starred by actress "y" recommended to a user because that user watched other movies with "y" as an actress). However, none of these systems has investigated the extent to which properties of a single explanation (e.g., the recency of interaction with that actress) and of a group of explanations for a recommended list (e.g., the diversity of the explanation types) can influence the perceived explanation quality. In this paper, we conceptualized three novel properties that model the quality of the explanations (linking interaction recency, shared entity popularity, and explanation type diversity) and proposed re-ranking approaches able to optimize for these properties. Experiments on two public data sets showed that our approaches can increase explanation quality according to the proposed properties, fairly across demographic groups, while preserving recommendation utility. The source code and data are available at https://github.com/giacoballoccu/explanation-quality-recsys.
Abstract: As an essential operation of legal retrieval, legal case matching plays a central role in intelligent legal systems. This task has a high demand on the explainability of matching results because of its critical impacts on downstream applications --- the matched legal cases may provide supportive evidence for the judgments of target cases and thus influence the fairness and justice of legal decisions. Focusing on this challenging task, we propose a novel and explainable method, namely IOT-Match, with the help of computational optimal transport, which formulates the legal case matching problem as an inverse optimal transport (IOT) problem. Different from most existing methods, which merely focus on the sentence-level semantic similarity between legal cases, our IOT-Match learns to extract rationales from paired legal cases based on both the semantics and the legal characteristics of their sentences. The extracted rationales are further applied to generate faithful explanations and conduct matching. Moreover, the proposed IOT-Match is robust to the alignment label insufficiency issue that is common in practical legal case matching tasks, making it suitable for both supervised and semi-supervised learning paradigms. To demonstrate the superiority of our IOT-Match method and construct a benchmark for the explainable legal case matching task, we not only extend the well-known Challenge of AI in Law (CAIL) dataset but also build a new Explainable Legal cAse Matching (ELAM) dataset, which contains a large number of legal cases with detailed and explainable annotations. Experiments on these two datasets show that our IOT-Match outperforms state-of-the-art methods consistently on matching prediction, rationale extraction, and explanation generation.
Abstract: It has been shown that the interpretability of search results is enhanced when query aspects covered by documents are explicitly provided. However, existing work on aspect-oriented explanation of search results explains each document independently. These explanations thus cannot describe the differences between documents. This issue is also true for existing models on query aspect generation. Furthermore, these models provide a single query aspect for each document, even though documents often cover multiple query aspects. To overcome these limitations, we propose LiEGe, an approach that jointly explains all documents in a search result list. LiEGe provides semantic representations at two levels of granularity -- documents and their tokens -- using different interaction signals including cross-document interactions. These allow listwise modeling of a search result list as well as the generation of coherent explanations for documents. To appropriately explain documents that cover multiple query aspects, we introduce two settings for search result explanation: comprehensive and novelty explanation generation. LiEGe is trained and evaluated for both settings. We evaluate LiEGe on datasets built from Wikipedia and real query logs of the Bing search engine. Our experimental results demonstrate that LiEGe outperforms all baselines, with improvements that are substantial and statistically significant.
Abstract: Existing research on fairness-aware recommendation has mainly focused on the quantification of fairness and the development of fair recommendation models, neither of which studies a more substantial problem -- identifying the underlying reason for model disparity in recommendation. This information is critical for recommender system designers to understand the intrinsic recommendation mechanism, and provides decision makers with insights on how to improve model fairness. Fortunately, with the rapid development of Explainable AI, we can use model explainability to gain insights into model (un)fairness. In this paper, we study the problem of explainable fairness, which helps to gain insights about why a system is fair or unfair, and guides the design of fair recommender systems with a more informed and unified methodology. Particularly, we focus on a common setting with feature-aware recommendation and exposure unfairness, but the proposed explainable fairness framework is general and can be applied to other recommendation settings and fairness definitions. We propose a Counterfactual Explainable Fairness framework, called CEF, which generates explanations about model fairness that can improve the fairness without significantly hurting the performance. The CEF framework formulates an optimization problem to learn the "minimal" change of the input features that changes the recommendation results to a certain level of fairness. Based on the counterfactual recommendation result of each feature, we calculate an explainability score in terms of the fairness-utility trade-off to rank all the feature-based explanations, and select the top ones as fairness explanations. Experimental results on several real-world datasets validate that our method is able to effectively provide explanations for the model disparities, and that these explanations achieve a better fairness-utility trade-off when used for recommendation than all the baselines.
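Schematically, the counterfactual search described above can be viewed as an optimization of the following kind; this is a sketch in generic notation, not necessarily CEF's exact objective or constraint form.

% Search over feature perturbations \Delta for a minimal change that
% reaches a target fairness level:
\min_{\Delta} \;\; \lVert \Delta \rVert_2^2
\quad \text{s.t.} \quad
\Psi_{\mathrm{fair}}\big(R(x + \Delta)\big) \le \epsilon ,
% where R(\cdot) produces the recommendation list from input features,
% \Psi_{\mathrm{fair}} is a disparity measure (e.g. an exposure gap between
% item groups), and \epsilon is the target fairness level. Features whose
% small perturbations yield large fairness gains at small utility cost
% would then receive high explainability scores.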
Abstract: Variational autoencoders (VAEs) have been widely applied in recommendation. One reason is that their amortized inference is beneficial for overcoming data sparsity. However, they are still rarely explored in explainable recommendation, which generates natural language explanations. Thus, we aim to extend VAEs to explainable recommendation. In this task, we find that a VAE can generate acceptable explanations for users with few relevant training samples; however, it tends to generate less personalized explanations for users with relatively sufficient samples than autoencoders (AEs) do. We conjecture that information shared by different users in the VAE disturbs the information for a specific user. To deal with this problem, we present PErsonalized VAE (PEVAE), which generates personalized natural language explanations for explainable recommendation. Moreover, we propose two novel mechanisms to aid our model in generating more personalized explanations: 1) Self-Adaption Fusion (SAF), which manipulates the latent space in a self-adaptive manner to control the influence of shared information; in this way, our model can enjoy the advantage of overcoming data sparsity while generating more personalized explanations for users with relatively sufficient training samples; and 2) DEpendence Maximization (DEM), which strengthens the dependence between recommendations and explanations by maximizing their mutual information; this makes the explanation more specific to the input user-item pair and thus improves the personalization of the generated explanations. Extensive experiments show that PEVAE can generate more personalized explanations, and further analyses demonstrate the practical effect of our proposed methods.
Abstract: Prior research on exposure fairness in the context of recommender systems has focused mostly on disparities in the exposure of individual or groups of items to individual users of the system. The problem of how individual or groups of items may be systemically under- or over-exposed to groups of users, or even all users, has received relatively less attention. However, such systemic disparities in information exposure can result in observable social harms, such as withholding economic opportunities from historically marginalized groups (allocative harm) or amplifying gendered and racialized stereotypes (representational harm). Previously, Diaz et al. developed the expected exposure metric -- which incorporates existing user browsing models previously developed for information retrieval -- to study fairness of content exposure to individual users. We extend their proposed framework to formalize a family of exposure fairness metrics that model the problem jointly from the perspective of both the consumers and the producers. Specifically, we consider group attributes for both types of stakeholders to identify and mitigate fairness concerns that go beyond individual users and items towards more systemic biases in recommendation. Furthermore, we study and discuss the relationships between the different exposure fairness dimensions proposed in this paper, and demonstrate how stochastic ranking policies can be optimized towards said fairness goals.
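For reference, expected exposure under a stochastic ranking policy can be instantiated with a patience-style browsing model as below; this is one common instantiation rather than the only user model compatible with the framework.

% Expected exposure of item d under a stochastic ranking policy \pi,
% with browsing-persistence parameter \gamma:
\epsilon_d = \mathbb{E}_{\sigma \sim \pi}\big[\gamma^{\,\mathrm{rank}(d \mid \sigma) - 1}\big].
% System exposure is then compared against a target exposure (e.g. equal
% expected exposure among equally relevant items); group-level variants
% aggregate \epsilon_d over producer and consumer groups.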
Abstract: There are several measures for fairness in ranking, based on different underlying assumptions and perspectives. Plackett-Luce (PL) optimization with the REINFORCE algorithm can be used for optimizing black-box objective functions over permutations, and in particular for optimizing fairness measures. However, though effective for queries with a moderate number of repeating sessions, PL optimization has room for improvement for queries with a small number of repeating sessions. In this paper, we present a novel way of representing permutation distributions, based on the notion of permutation graphs. Similar to PL, our distribution representation, called PPG, can be used for black-box optimization of fairness. Different from PL, where pointwise logits are used as the distribution parameters, in PPG pairwise inversion probabilities together with a reference permutation construct the distribution. As such, the reference permutation can be set to the best sampled permutation regarding the objective function, making PPG suitable for both deterministic and stochastic rankings. Our experiments show that PPG, while comparable to PL for larger session repetitions (i.e., stochastic ranking), improves over PL for optimizing fairness metrics for queries with one session (i.e., deterministic ranking). Additionally, when accurate utility estimations are available, e.g., in tabular models, the performance of PPG in fairness optimization is significantly boosted compared to lower-quality utility estimations from a learning to rank model, leading to a large performance gap with PL. Finally, the pairwise probabilities make it possible to impose pairwise constraints such as "item $d_1$ should always be ranked higher than item $d_2$." Such constraints can be used to simultaneously optimize the fairness metric and control another objective such as ranking performance.
Abstract: Information access systems, such as search and recommender systems, often use ranked lists to present results believed to be relevant to the user's information need. Evaluating these lists for their fairness, along with other traditional metrics, provides a more complete understanding of an information access system's behavior beyond accuracy or utility constructs. To measure the (un)fairness of rankings, particularly with respect to the protected group(s) of producers or providers, several metrics have been proposed in the last several years. However, an empirical and comparative analysis of these metrics -- showing their applicability to specific scenarios or real data, their conceptual similarities, and their differences -- is still lacking. We aim to bridge the gap between the theoretical and practical application of these metrics. In this paper we describe several fair ranking metrics from the existing literature in a common notation, enabling direct comparison of their approaches and assumptions, and empirically compare them on the same experimental setup and data sets in the context of three information access tasks. We also provide a sensitivity analysis to assess the impact of the design choices and parameter settings that go into these metrics and point to additional work needed to improve fairness measurement.
Abstract: There is growing interest in designing recommender systems that aim at being fair towards item producers or their least satisfied users. Inspired by the domain of inequality measurement in economics, this paper explores the use of generalized Gini welfare functions (GGFs) as a means to specify the normative criterion that recommender systems should optimize for. GGFs weight individuals depending on their ranks in the population, giving more weight to worse-off individuals to promote equality. Depending on these weights, GGFs minimize the Gini index of item exposure to promote equality between items, or focus on the performance on specific quantiles of least satisfied users. GGFs for ranking are challenging to optimize because they are non-differentiable. We resolve this challenge by leveraging tools from non-smooth optimization and projection operators used in differentiable sorting. We present experiments using real datasets with up to 15k users and items, which show that our approach obtains better trade-offs than the baselines on a variety of recommendation tasks and fairness criteria.
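In standard notation, a generalized Gini welfare function over a utility vector takes the following form, which also makes the source of the non-differentiability explicit.

% GGF over a utility vector u = (u_1, ..., u_n) (item exposures or user
% utilities), with nonincreasing weights:
\mathrm{GGF}_{w}(u) = \sum_{i=1}^{n} w_i \, u_{(i)},
\qquad w_1 \ge w_2 \ge \dots \ge w_n \ge 0,
% where u_{(1)} \le u_{(2)} \le \dots \le u_{(n)} sorts the utilities in
% increasing order, so larger weights fall on worse-off individuals. The
% sorting operator is what makes GGFs non-differentiable, motivating the
% non-smooth optimization and differentiable-sorting tools described above.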
Abstract: In recent years, it has become clear that rankings delivered in many areas need not only be useful to the users but also respect fairness of exposure for the item producers. We consider the problem of finding ranking policies that achieve a Pareto-optimal tradeoff between these two aspects. Several methods were proposed to solve it; for instance a popular one is to use linear programming with a Birkhoff-von Neumann decomposition. These methods, however, are based on a classical Position Based exposure Model (PBM), which assumes independence between the items (hence the exposure only depends on the rank). In many applications, this assumption is unrealistic and the community increasingly moves towards considering other models that include dependences, such as the Dynamic Bayesian Network (DBN) exposure model. For such models, computing (exact) optimal fair ranking policies remains an open question. In this paper, we answer this question by leveraging a new geometrical method based on the so-called expohedron proposed recently for the PBM (Kletti et al., WSDM'22). We lay out the structure of a new geometrical object (the DBN-expohedron), and propose for it a Carathéodory decomposition algorithm of complexity $O(n^3)$, where n is the number of documents to rank. Such an algorithm enables expressing any feasible expected exposure vector as a distribution over at most n rankings; furthermore we show that we can compute the whole set of Pareto-optimal expected exposure vectors with the same complexity $O(n^3)$. Our work constitutes the first exact algorithm able to efficiently find a Pareto-optimal distribution of rankings. It is applicable to a broad range of fairness notions, including classical notions of meritocratic and demographic fairness. We empirically evaluate our method on the TREC2020 and MSLR datasets and compare it to several baselines in terms of Pareto-optimality and speed.
Abstract: Fairness of exposure is a commonly used notion of fairness for ranking systems. It is based on the idea that all items or item groups should get exposure proportional to the merit of the item or the collective merit of the items in the group. Often, stochastic ranking policies are used to ensure fairness of exposure. Previous work unrealistically assumes that we can reliably estimate the expected exposure for all items in each ranking produced by the stochastic policy. In this work, we discuss how to approach fairness of exposure in cases where the policy contains rankings for which, due to inter-item dependencies, we cannot reliably estimate the exposure distribution. In such cases, we cannot determine whether the policy can be considered fair. Our contributions in this paper are twofold. First, we define a method for finding stochastic policies that avoid showing rankings with unknown exposure distribution to the user, without having to compromise user utility or item fairness. Second, we extend the study of fairness of exposure to the top-k setting and also assess our method in this setting. We find that our method can significantly reduce the number of rankings with unknown exposure distribution without a drop in user utility or fairness compared to existing fair ranking methods, both for full-length and top-k rankings. This is an important first step in developing fair ranking methods for cases where we have incomplete knowledge about the user's behaviour.
Abstract: Recently, there has been a rising awareness that when machine learning (ML) algorithms are used to automate choices, they may treat or affect individuals unfairly, with legal, ethical, or economic consequences. Recommender systems are prominent examples of such ML systems that assist users in making high-stakes judgments. A common trend in previous research on fairness in recommender systems is that the majority of works treat user and item fairness concerns separately, ignoring the fact that recommender systems operate in a two-sided marketplace. In this work, we present an optimization-based re-ranking approach that seamlessly integrates fairness constraints from both the consumer and the producer side in a joint objective framework. We demonstrate through large-scale experiments on 8 datasets that our proposed method is capable of improving both consumer and producer fairness without reducing overall recommendation quality, demonstrating the role algorithms may play in minimizing data biases.
Abstract: Retrieving relevant documents from a corpus is typically based on the semantic similarity between the document content and query text. The inclusion of structural relationship between documents can benefit the retrieval mechanism by addressing semantic gaps. However, incorporating these relationships requires tractable mechanisms that balance structure with semantics and take advantage of the prevalent pre-train/fine-tune paradigm. We propose here a holistic approach to learning document representations by integrating intra-document content with inter-document relations. Our deep metric learning solution analyzes the complex neighborhood structure in the relationship network to efficiently sample similar/dissimilar document pairs and defines a novel quintuplet loss function that simultaneously encourages document pairs that are semantically relevant to be closer and structurally unrelated to be far apart in the representation space. Furthermore, the separation margins between the documents are varied flexibly to encode the heterogeneity in relationship strengths. The model is fully fine-tunable and natively supports query projection during inference. We demonstrate that it outperforms competing methods on multiple datasets for document retrieval tasks.
Abstract: Conventional methods for query autocompletion aim to predict which completed query a user will select from a list. A shortcoming of this approach is that users often do not know which query will provide the best retrieval performance on the current information retrieval system, meaning that any query autocompletion methods trained to mimic user behavior can lead to suboptimal query suggestions. To overcome this limitation, we propose a new approach that explicitly optimizes the query suggestions for downstream retrieval performance. We formulate this as a problem of ranking a set of rankings, where each query suggestion is represented by the downstream item ranking it produces. We then present a learning method that ranks query suggestions by the quality of their item rankings. The algorithm is based on a counterfactual learning approach that is able to leverage feedback on the items (e.g., clicks, purchases) to evaluate query suggestions through an unbiased estimator, thus avoiding the assumption that users write or select optimal queries. We establish theoretical support for the proposed approach and provide learning-theoretic guarantees. We also present empirical results on publicly available datasets, and demonstrate real-world applicability using data from an online shopping store.
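A minimal Python sketch of the evaluation idea just described, scoring a candidate query suggestion by an IPS-corrected estimate of the utility of the item ranking it produces; the position-based propensity model, the DCG-style discount, and all names are illustrative assumptions rather than the paper's estimator.

import math
from collections import defaultdict

def ips_utility(logged_clicks, new_ranking, propensity):
    """Unbiased (IPS) estimate of the downstream utility of `new_ranking`.

    logged_clicks: list of (item_id, click, logged_rank) tuples collected
        under the logging policy for this query.
    new_ranking: item ids in the order the candidate suggestion retrieves them.
    propensity: maps a logged rank to its examination probability
        (position-based model assumed here).
    """
    pos = {item: r for r, item in enumerate(new_ranking, start=1)}
    total = 0.0
    for item, click, logged_rank in logged_clicks:
        if not click or item not in pos:
            continue
        discount = 1.0 / math.log2(1 + pos[item])    # DCG-style discount
        total += discount / propensity(logged_rank)  # IPS correction
    return total

def rank_suggestions(suggestions, logs, retrieve, propensity):
    """Order candidate query suggestions by estimated item-ranking utility
    instead of by imitation of user selections."""
    scored = [(ips_utility(logs[s], retrieve(s), propensity), s)
              for s in suggestions]
    return [s for _, s in sorted(scored, reverse=True)]

# toy usage with hypothetical data
logs = defaultdict(list, {"red shoes": [("i1", 1, 1), ("i7", 1, 4), ("i3", 0, 2)]})
retrieve = lambda q: ["i7", "i1", "i9"]   # stand-in retrieval function
propensity = lambda rank: 1.0 / rank      # assumed position bias
print(rank_suggestions(["red shoes"], logs, retrieve, propensity))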
Abstract: Learning to Rank (L2R) is the core task of many Information Retrieval systems. Recently, a great effort has been put into exploring Deep Neural Networks (DNNs) for L2R, with significant results. However, risk-sensitiveness, an important and recent advance in the L2R arena that reduces variability and increases trust, has not been incorporated into Deep Neural L2R yet. Risk-sensitive measures are important for assessing the risk that an IR system performs worse than a set of baseline IR systems for several queries. However, the risk-sensitive measures described in the literature have a non-smooth behavior, making them difficult, if not impossible, to optimize with DNNs. In this work we solve this difficult problem by proposing a family of new loss functions that support smooth risk-sensitive optimization. The proposed losses introduce two important contributions: (i) the substitution of the traditional NDCG or MAP metrics in risk-sensitive measures with smooth loss functions that evaluate the correlation between the predicted and the true relevance order of documents for a given query, and (ii) the use of distinct versions of the same DNN architecture as baselines by means of a multi-dropout technique during the smooth risk-sensitive optimization, avoiding the inconvenience of assessing multiple IR systems as part of DNN training. We empirically demonstrate significant achievements of the proposed loss functions when used with recent DNN methods in the context of well-known web-search datasets such as WEB10K, YAHOO, and MQ2007. Our solutions reach improvements of 8% in effectiveness (NDCG) while improving risk-sensitiveness by around 5% when applied together with a state-of-the-art Self-Attention DNN-L2R architecture. Furthermore, the proposed losses are capable of reducing the losses over the best evaluated baselines by 28% and of significantly improving over the risk-sensitive state-of-the-art non-DNN method (by up to 13.3%) while keeping (or even increasing) overall effectiveness. All these results ultimately establish a new level for the state-of-the-art on risk-sensitiveness and DNN-L2R research.
Abstract: Building a multi-stage cascade ranking system is a commonly used solution for balancing efficiency and effectiveness in modern information retrieval (IR) applications, such as recommendation and web search. Despite its popularity in practice, the literature specifically on multi-stage cascade ranking systems is relatively scarce. The common practice is to train the rankers of each stage independently using the same user feedback data (a.k.a. impression data), disregarding the data flow and the possible interactions between stages. This straightforward solution can lead to a sub-optimal system because of the sample selection bias (SSB) issue, which is especially damaging for cascade rankers due to the negative effect accumulated across the multiple stages. Worse still, the interactions between the rankers of each stage are not fully exploited. This paper provides an elaborate analysis of this commonly used solution to reveal its limitations. By studying the essence of cascade ranking, we propose a joint training framework named RankFlow to alleviate the SSB issue and exploit the interactions between the cascade rankers, which is the first systematic solution for this topic. We propose a paradigm for training cascade rankers that emphasizes the importance of fitting rankers on stage-specific data distributions instead of the unified user feedback distribution. We design the RankFlow framework based on this paradigm: the training data of each stage is generated by its preceding stages, while the guidance signals come not only from the logs but also from its successors. Extensive experiments are conducted on various IR scenarios, including recommendation, web search and advertisement. The results verify the efficacy and superiority of RankFlow.
Abstract: Pseudo-relevance feedback (PRF) has proven to be an effective query reformulation technique to improve retrieval accuracy. It aims to alleviate the mismatch of linguistic expressions between a query and its potentially relevant documents. Existing PRF methods independently treat revised queries originating from the same query but using different numbers of feedback documents, resulting in severe query drift. Without comparing the effects of two different revisions of the same query, a PRF model may incorrectly focus on the additional irrelevant information introduced by the larger amount of feedback, and thus produce a reformulated query that is less effective than the revision using less feedback. Ideally, if a PRF model can distinguish between irrelevant and relevant information in the feedback, the more feedback documents there are, the better the revised query will be. To bridge this gap, we propose the Loss-over-Loss (LoL) framework to compare the reformulation losses between different revisions of the same query during training. Concretely, we revise an original query multiple times in parallel using different amounts of feedback and compute their reformulation losses. Then, we introduce an additional regularization loss on these reformulation losses to penalize revisions that use more feedback but incur larger losses. With such comparative regularization, the PRF model is expected to learn to suppress the extra irrelevant information by comparing the effects of different revised queries. Further, we present a differentiable query reformulation method to implement this framework. This method revises queries in the vector space and directly optimizes the retrieval performance of query vectors, and it is applicable to both sparse and dense retrieval models. Empirical evaluation demonstrates the effectiveness and robustness of our method for two typical sparse and dense retrieval models.
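A minimal sketch of the comparative regularization idea described above: given the reformulation losses of several revisions of the same query, penalize any revision that uses more feedback documents yet incurs a larger loss. The function name, margin, and toy values are assumptions for illustration.

```python
import torch

def lol_regularizer(losses, n_feedback, margin=0.0):
    """losses[i] is the reformulation loss of the revision using n_feedback[i] documents."""
    reg = losses.new_zeros(())
    for i in range(len(losses)):
        for j in range(len(losses)):
            if n_feedback[i] > n_feedback[j]:
                # Penalize revisions that use more feedback but incur a larger loss.
                reg = reg + torch.relu(losses[i] - losses[j] + margin)
    return reg

losses = torch.tensor([0.9, 0.7, 0.8], requires_grad=True)   # e.g. revisions with 5, 10, 20 docs
total = losses.sum() + 0.1 * lol_regularizer(losses, n_feedback=[5, 10, 20])
total.backward()
```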
Abstract: Stance detection aims to identify whether the author of a text is in favor of, against, or neutral to a given target. The main challenge of this task is two-fold: few-shot learning resulting from the varying targets and the lack of contextual information about the targets. Existing works mainly focus on solving the second issue by designing attention-based models or introducing noisy external knowledge, while the first issue remains under-explored. In this paper, inspired by the potential capability of pre-trained language models (PLMs) to serve as knowledge bases and few-shot learners, we propose to introduce prompt-based fine-tuning for stance detection. PLMs can provide essential contextual information for the targets and enable few-shot learning via prompts. Considering the crucial role of the target in the stance detection task, we design target-aware prompts and propose a novel verbalizer. Instead of mapping each label to a concrete word, our verbalizer maps each label to a vector and picks the label that best captures the correlation between the stance and the target. Moreover, to alleviate the possible defect of dealing with varying targets with a single hand-crafted prompt, we propose to distill the information learned from multiple prompts. Experimental results show the superior performance of our proposed model in both full-data and few-shot scenarios.
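A minimal sketch of a vector verbalizer of the kind described above: each stance label owns a learnable vector and is scored by similarity against the [MASK] position's hidden state. The class count, hidden size, and cosine scoring are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorVerbalizer(nn.Module):
    def __init__(self, hidden_dim=768, num_labels=3):   # favor / against / neutral
        super().__init__()
        self.label_vectors = nn.Parameter(torch.randn(num_labels, hidden_dim) * 0.02)

    def forward(self, mask_hidden):                      # (batch, hidden_dim) from the [MASK] token
        # Cosine similarity between the [MASK] representation and each label vector.
        return F.normalize(mask_hidden, dim=-1) @ F.normalize(self.label_vectors, dim=-1).T

verbalizer = VectorVerbalizer()
logits = verbalizer(torch.randn(4, 768))                 # (4, 3) stance scores
loss = F.cross_entropy(logits, torch.tensor([0, 2, 1, 1]))
```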
Abstract: Dense retrieval has shown promising results in many information retrieval (IR) related tasks, whose foundation is high-quality text representation learning for effective search. Some recent studies have shown that autoencoder-based language models are able to boost the dense retrieval performance using a weak decoder. However, we argue that 1) it is not discriminative to decode all the input texts and, 2) even a weak decoder has the bypass effect on the encoder. Therefore, in this work, we introduce a novel contrastive span prediction task to pre-train the encoder alone, but still retain the bottleneck ability of the autoencoder. In this way, we can 1) learn discriminative text representations efficiently with the group-wise contrastive learning over spans and, 2) avoid the bypass effect of the decoder thoroughly. Comprehensive experiments over publicly available retrieval benchmark datasets show that our approach can outperform existing pre-training methods for dense retrieval significantly.
Abstract: With the rapid growth of interaction data, many clustering methods have been proposed to discover interaction patterns as prior knowledge beneficial to downstream tasks. Considering that an interaction can be seen as an action occurring among multiple objects, most existing methods model the objects and their pair-wise relations as nodes and links in graphs. However, they only model and leverage part of the information in real entire interactions, i.e., they either decompose the entire interaction into several pair-wise sub-interactions for simplification, or only focus on clustering some specific types of objects, which limits the performance and explainability of clustering. To tackle this issue, we propose to Co-cluster the Interactions via Attentive Hypergraph neural network (CIAH). In particular, with a more comprehensive modeling of interactions by hypergraphs, we propose an attentive hypergraph neural network to encode the entire interactions, where an attention mechanism is utilized to select important attributes for explanations. Then, we introduce a saliency-based method to guide the attention to be more consistent with the real importance of attributes, namely saliency-based consistency. Moreover, we propose a novel co-clustering method to perform joint clustering over the representations of interactions and the corresponding distributions of attribute selection, namely cluster-based consistency. Extensive experiments demonstrate that our CIAH significantly outperforms state-of-the-art clustering methods on both public datasets and real industrial datasets.
Abstract: Neural text matching models have been used in a range of applications such as question answering and natural language inference, and have yielded a good performance. However, these neural models are of a limited adaptability, resulting in a decline in performance when encountering test examples from a different dataset or even a different task. The adaptability is particularly important in the few-shot setting: in many cases, there is only a limited amount of labeled data available for a target dataset or task, while we may have access to a richly labeled source dataset or task. However, adapting a model trained on the abundant source data to a few-shot target dataset or task is challenging. To tackle this challenge, we propose a Meta-Weight Regulator (MWR), which is a meta-learning approach that learns to assign weights to the source examples based on their relevance to the target loss. Specifically, MWR first trains the model on the uniformly weighted source examples, and measures the efficacy of the model on the target examples via a loss function. By iteratively performing a (meta) gradient descent, high-order gradients are propagated to the source examples. These gradients are then used to update the weights of source examples, in a way that is relevant to the target performance. As MWR is model-agnostic, it can be applied to any backbone neural model. Extensive experiments are conducted with various backbone text matching models, on four widely used datasets and two tasks. The results demonstrate that our proposed approach significantly outperforms a number of existing adaptation methods and effectively improves the cross-dataset and cross-task adaptability of the neural text matching models in the few-shot setting.
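A minimal sketch of meta-gradient example reweighting in the spirit of MWR, using a functional linear classifier so that the virtual update stays differentiable. The inner learning rate, the linear model, and all variable names are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def forward(X, W, b):
    return X @ W + b

def source_example_weights(W, b, X_src, y_src, X_tgt, y_tgt, inner_lr=0.1):
    eps = torch.zeros(X_src.size(0), requires_grad=True)        # per-example weights
    src_losses = F.cross_entropy(forward(X_src, W, b), y_src, reduction="none")
    gW, gb = torch.autograd.grad((eps * src_losses).sum(), (W, b), create_graph=True)
    W_v, b_v = W - inner_lr * gW, b - inner_lr * gb              # virtual model update
    tgt_loss = F.cross_entropy(forward(X_tgt, W_v, b_v), y_tgt)  # efficacy on target data
    g_eps = torch.autograd.grad(tgt_loss, eps)[0]                # high-order gradient w.r.t. eps
    w = torch.clamp(-g_eps, min=0.0)                             # upweight helpful source examples
    return w / (w.sum() + 1e-8)

W = torch.randn(16, 2, requires_grad=True)
b = torch.zeros(2, requires_grad=True)
weights = source_example_weights(W, b, torch.randn(32, 16), torch.randint(0, 2, (32,)),
                                 torch.randn(8, 16), torch.randint(0, 2, (8,)))
```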
Abstract: Relation prediction on knowledge graphs (KGs) aims to infer missing valid triples from observed ones. Although this task has been deeply studied, most previous studies are limited to the transductive setting and cannot handle emerging entities. In fact, the inductive setting is closer to real-life scenarios because it allows entities in the testing phase to be unseen during training. However, it is challenging to conduct inductive relation prediction precisely, as it requires both entity-independent relation modeling and discrete logical reasoning for interpretability. To this end, we propose a novel model, ConGLR, to incorporate a context graph with logical reasoning. First, the enclosing subgraph w.r.t. the target head and tail entities is extracted and initialized by double radius labeling, and a context graph involving relational paths, relations and entities is introduced. Second, two graph convolutional networks (GCNs) with information interaction between entities and relations are employed to process the subgraph and the context graph, respectively. Considering the influence of different edges and target relations, we introduce edge-aware and relation-aware attention mechanisms for the subgraph GCN. Finally, by treating the relational path as the rule body and the target relation as the rule head, we integrate neural computation and logical reasoning to obtain inductive scores. To focus each module on its specific modeling goal, the stop-gradient operation is applied to the information interaction between the context graph and subgraph GCNs during training. In this way, ConGLR satisfies the two inductive requirements at the same time. Extensive experiments demonstrate that ConGLR obtains outstanding performance against state-of-the-art baselines on twelve inductive dataset versions of three common KGs.
Abstract: Multimodal Knowledge Graphs (MKGs), which organize visual-text factual knowledge, have recently been successfully applied to tasks such as information retrieval, question answering, and recommendation. Since most MKGs are far from complete, extensive knowledge graph completion studies have been proposed, focusing on multimodal entity and relation extraction and link prediction. However, different tasks and modalities require changes to the model architecture, and not all images/objects are relevant to the text input, which hinders applicability to diverse real-world scenarios. In this paper, we propose a hybrid transformer with multi-level fusion to address these issues. Specifically, we leverage a hybrid transformer architecture with unified input-output for diverse multimodal knowledge graph completion tasks. Moreover, we propose multi-level fusion, which integrates visual and text representations via coarse-grained prefix-guided interaction and fine-grained correlation-aware fusion modules. We conduct extensive experiments to validate that our MKGformer can obtain SOTA performance on four datasets for multimodal link prediction, multimodal RE, and multimodal NER. The code is available at https://github.com/zjunlp/MKGformer.
Abstract: Knowledge graph completion (KGC) aims to infer missing knowledge triples based on known facts in a knowledge graph. Current KGC research mostly follows an entity ranking protocol, wherein the effectiveness is measured by the predicted rank of a masked entity in a test triple. The overall performance is then given by a micro(-average) metric over all individual answer entities. Due to the incomplete nature of large-scale knowledge bases, such an entity ranking setting is likely affected by unlabelled top-ranked positive examples, raising questions about whether the current evaluation protocol is sufficient to guarantee a fair comparison of KGC systems. To this end, this paper presents a systematic study on whether and how label sparsity affects the current KGC evaluation with the popular micro metrics. Specifically, inspired by the TREC paradigm for large-scale information retrieval (IR) experimentation, we create a relatively "complete" judgment set based on a sample from the popular FB15k-237 dataset following the TREC pooling method. According to our analysis, it comes as a surprise that switching from the original labels to our "complete" labels results in a drastic change in the system ranking of 13 popular KGC models in terms of micro metrics. Further investigation indicates that the IR-like macro(-average) metrics are more stable and discriminative under different settings, while being less affected by label sparsity. Thus, for KGC evaluation, we recommend conducting TREC-style pooling to balance human effort and label completeness, and also reporting the IR-like macro metrics to reflect the ranking nature of the KGC task.
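A small sketch contrasting a micro(-average) metric with an IR-like macro(-average) one on toy reciprocal-rank records; grouping by query and the toy numbers are assumptions.

```python
from collections import defaultdict

# Each record is (group_id, reciprocal_rank) for one answer entity.
records = [("q1", 1.0), ("q1", 0.5), ("q1", 0.25), ("q2", 0.1)]

def micro_mrr(records):
    return sum(rr for _, rr in records) / len(records)           # every answer counts equally

def macro_mrr(records):
    per_group = defaultdict(list)
    for gid, rr in records:
        per_group[gid].append(rr)
    group_means = [sum(v) / len(v) for v in per_group.values()]  # average within each query first
    return sum(group_means) / len(group_means)

print(micro_mrr(records))   # 0.4625 -- dominated by the query with many answers
print(macro_mrr(records))   # ~0.342 -- each query contributes equally
```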
Abstract: Knowledge graphs (KGs) consisting of a large number of triples have become widespread recently, and many knowledge graph embedding (KGE) methods have been proposed to embed the entities and relations of a KG into continuous vector spaces. Such embedding methods simplify the operations of conducting various in-KG tasks (e.g., link prediction) and out-of-KG tasks (e.g., question answering). They can be viewed as general solutions for representing KGs. However, existing KGE methods are not applicable to inductive settings, where a model trained on source KGs is tested on target KGs with entities unseen during model training. Existing works focusing on KGs in inductive settings can only solve the inductive relation prediction task. They cannot handle other out-of-KG tasks as generally as KGE methods, since they do not produce embeddings for entities. In this paper, to achieve inductive knowledge graph embedding, we propose a model, MorsE, which does not learn embeddings for entities but instead learns transferable meta-knowledge that can be used to produce entity embeddings. Such meta-knowledge is modeled by entity-independent modules and learned by meta-learning. Experimental results show that our model significantly outperforms corresponding baselines for in-KG and out-of-KG tasks in inductive settings.
Abstract: Previous entity linking methods in knowledge graphs (KGs) mostly link textual mentions to the corresponding entities. However, they fall short when processing multimodal data, in which the text is often too short to provide enough context. Consequently, we conceive the idea of introducing valuable information from other modalities, and propose a novel multimodal entity linking method with gated hierarchical multimodal fusion and contrastive training (GHMFC). Firstly, in order to discover fine-grained inter-modal correlations, GHMFC extracts hierarchical features of text and vision through a multi-modal co-attention mechanism: textual-guided visual attention and visual-guided textual attention. The former obtains weighted visual features under the guidance of textual information, while the latter produces weighted textual features under the guidance of visual information. Afterwards, gated fusion is used to evaluate the importance of the hierarchical features of different modalities and integrate them into the final multimodal representations of mentions. Subsequently, contrastive training with two types of contrastive losses is designed to learn more generic multimodal features and reduce noise. Finally, the linked entities are selected by calculating the cosine similarity between the representations of mentions and entities in KGs. To evaluate the proposed method, this paper releases two new open multimodal entity linking datasets: WikiMEL and Richpedia-MEL. Experimental results demonstrate that GHMFC can learn meaningful multimodal representations and significantly outperforms most of the baseline methods.
Abstract: Given a text query, the text-to-video retrieval task aims to find the relevant videos in the database. Recently, model-based (MDB) methods have demonstrated superior accuracy to embedding-based (EDB) methods due to their excellent capacity for modeling local video/text correspondences, especially when equipped with large-scale pre-training schemes like ClipBERT. Generally speaking, MDB methods take a text-video pair as input and harness deep models to predict the mutual similarity, while EDB methods first utilize modality-specific encoders to extract embeddings for text and video, then evaluate the distance based on the extracted embeddings. Notably, MDB methods cannot produce explicit representations for text and video; instead, they have to exhaustively pair the query with every database item to predict their mutual similarities in the inference stage, which results in significant inefficiency in practical applications. In this work, we propose a novel EDB method, CRET (Cross-modal REtrieval Transformer), which not only demonstrates promising efficiency in retrieval tasks, but also achieves better accuracy than existing MDB methods. The gains are mainly attributed to our proposed Cross-modal Correspondence Modeling (CCM) module and Gaussian Estimation of Embedding Space (GEES) loss. Specifically, the CCM module is composed of transformer decoders and a set of decoder centers. With the help of the learned decoder centers, the text/video embeddings can be efficiently aligned, without suffering from pairwise model-based inference. Moreover, to balance the information loss and computational overhead when sampling frames from a given video, we present a novel GEES loss, which implicitly conducts dense sampling in the video embedding space without incurring heavy computational cost. Extensive experiments show that without pre-training on extra datasets, our proposed CRET outperforms the state-of-the-art MDB methods that were pre-trained on additional datasets, while still showing promising efficiency in retrieval tasks.
Abstract: Zero-Shot Cross-Modal Retrieval (ZS-CMR) has recently drawn increasing attention as it focuses on a practical retrieval scenario, i.e., the multimodal test set consists of unseen classes that are disjoint with seen classes in the training set. The recently proposed methods typically adopt the generative model as the main framework to learn a joint latent embedding space to alleviate the modality gap. Generally, these methods largely rely on auxiliary semantic embeddings for knowledge transfer across classes and unconsciously neglect the effect of the data reconstruction manner in the adopted generative model. To address this issue, we propose a novel ZS-CMR model termed Multimodal Disentanglement Variational AutoEncoders (MDVAE), which consists of two coupled disentanglement variational autoencoders (DVAEs) and a fusion-exchange VAE (FVAE). Specifically, DVAE is developed to disentangle the original representations of each modality into modality-invariant and modality-specific features. FVAE is designed to fuse and exchange information of multimodal data by the reconstruction and alignment process without pre-extracted semantic embeddings. Moreover, an advanced counter-intuitive cross-reconstruction scheme is further proposed to enhance the informativeness and generalizability of the modality-invariant features for more effective knowledge transfer. The comprehensive experiments on four image-text retrieval and two image-sketch retrieval datasets consistently demonstrate that our method establishes the new state-of-the-art performance.
Abstract: Recently, large-scale pre-training methods like CLIP have made great progress in multi-modal research such as text-video retrieval. In CLIP, transformers are vital for modeling complex multi-modal relations. However, in the vision transformer of CLIP, the essential visual tokenization process, which produces discrete visual token sequences, generates many homogeneous tokens due to the redundant nature of consecutive and similar frames in videos. This significantly increases computation costs and hinders the deployment of video retrieval models in web applications. In this paper, to reduce the number of redundant video tokens, we design a multi-segment token clustering algorithm to find the most representative tokens and drop the non-essential ones. As frame redundancy occurs mostly in consecutive frames, we divide videos into multiple segments and conduct segment-level clustering. Center tokens from each segment are later concatenated into a new sequence, while their original spatial-temporal relations are well maintained. We instantiate two clustering algorithms to efficiently find deterministic medoids and iteratively partition groups in high-dimensional space. Through this token clustering and center selection procedure, we successfully reduce computation costs by removing redundant visual tokens. This method further enhances segment-level semantic alignment between video and text representations, enforcing the spatio-temporal interactions of tokens from within-segment frames. Our method, coined CenterCLIP, surpasses the existing state of the art by a large margin on typical text-video benchmarks, while reducing the training memory cost by 35% and accelerating the inference speed by 14% in the best case. The code is available at https://github.com/mzhaoshuai/CenterCLIP.
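A minimal sketch of segment-level token reduction by deterministic medoid selection: split the frame tokens into segments and keep, per segment, the token that minimizes the summed distance to the rest. Keeping a single medoid per segment and using Euclidean distance are simplifying assumptions; the paper's clustering variants are more elaborate.

```python
import torch

def segment_medoids(tokens, segment_len):
    """tokens: (num_frame_tokens, dim); returns the concatenated medoid tokens."""
    kept = []
    for start in range(0, tokens.size(0), segment_len):
        seg = tokens[start:start + segment_len]
        dists = torch.cdist(seg, seg).sum(dim=1)    # summed distance to the rest of the segment
        kept.append(seg[dists.argmin()])
        # Temporal order is preserved because segments are visited in order.
    return torch.stack(kept)

video_tokens = torch.randn(12 * 49, 512)            # e.g. 12 frames x 49 patch tokens
reduced = segment_medoids(video_tokens, segment_len=4 * 49)
print(reduced.shape)                                 # torch.Size([3, 512])
```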
Abstract: Multi-modal hashing learns binary hash codes with extremely low storage cost and high retrieval speed, and can thus support efficient multi-modal retrieval. However, most existing methods still suffer from three important problems: 1) limited semantic representation capability due to shallow learning; 2) mandatory feature-level multi-modal fusion that ignores the heterogeneous multi-modal semantic gaps; and 3) direct coarse pairwise semantic preservation that cannot effectively capture fine-grained semantic correlations. To solve these problems, in this paper, we propose a Bit-aware Semantic Transformer Hashing (BSTH) framework to excavate bit-wise semantic concepts and simultaneously align the heterogeneous modalities for multi-modal hash learning at the concept level. Specifically, the bit-wise implicit semantic concepts are learned with a transformer in a self-attention manner, which can achieve implicit semantic alignment at the fine-grained concept level and reduce the heterogeneous modality gaps. Then, concept-level multi-modal fusion is performed to enhance the semantic representation capability of each implicit concept, and the fused concept representations are further encoded to the corresponding hash bits via bit-wise hash functions. Further, to supervise the bit-aware transformer module, a label prototype learning module is developed to learn prototype embeddings for all categories, which capture the explicit semantic correlations at the category level by considering co-occurrence priors. Experiments on three widely tested multi-modal retrieval datasets demonstrate the superiority of the proposed method from various aspects.
Abstract: Multi-modal Product Summary Generation is a new yet challenging task, which aims to generate a concise and readable summary for a product given its multi-modal content, e.g., its long text description and image. Although existing methods have achieved great success, they still suffer from three key limitations: 1) overlook the benefit of pre-training, 2) lack the representation-level supervision, and 3) ignore the diversity of the seller-generated data. To address these limitations, in this work, we propose a Vision-to-Prompt based multi-modal product summary generation framework, dubbed as V2P, where a Generative Pre-trained Language Model (GPLM) is adopted as the backbone. In particular, to maintain the original text capability of the GPLM and fully utilize the high-level concepts contained in the product image, we design V2P with two key components: vision-based prominent attribute prediction, and attribute prompt-guided summary generation. The first component works on obtaining the vital semantic attributes of the product from its image by the Swin Transformer, while the second component aims to generate the summary based on the product's long text description and the attribute prompts yielded by the first component with a GPLM. Towards comprehensive supervision over the second component, apart from the conventional output-level supervision, we introduce the representation-level regularization. Meanwhile, we design the data augmentation-based robustness regularization to handle the diverse inputs and improve the robustness of the second component. Extensive experiments on a large-scale Chinese dataset verify the superiority of our model over cutting-edge methods.
Abstract: Near-duplicate video retrieval (NDVR) aims to find copies or transformations of a query video in a massive video database. It plays an important role in many video-related applications, including copyright protection, tracing, and filtering. Video representation and similarity search are crucial to any video retrieval system. To derive effective video representations, most video retrieval systems require a large amount of manually annotated data for training, making them costly and inefficient. In addition, most retrieval systems are based on frame-level features for video similarity search, making them expensive in terms of both storage and search. To address the above issues, we propose a video representation learning (VRL) approach that effectively tackles both shortcomings. It first learns video representations from unlabeled videos via contrastive learning to avoid the expensive cost of manual annotation. Then, it exploits a transformer structure to aggregate frame-level features into clip-level features to reduce both storage space and search complexity. It can learn complementary and discriminative information from the interactions among clip frames, as well as acquire invariance to frame permutation and missing frames to support more flexible retrieval. Comprehensive experiments on two challenging near-duplicate video retrieval datasets, namely FIVR-200K and SVD, verify the effectiveness of our proposed VRL approach, which achieves the best video retrieval performance in terms of both accuracy and efficiency.
Abstract: Image retrieval with hybrid-modality queries, also known as composing text and image for image retrieval (CTI-IR), is a retrieval task where the search intention is expressed in a more complex query format, involving both vision and text modalities. For example, a target product image is searched using a reference product image along with text about changing certain attributes of the reference image as the query. It is a more challenging image retrieval task that requires both semantic space learning and cross-modal fusion. Previous approaches that attempt to deal with both aspects achieve unsatisfactory performance. In this paper, we decompose the CTI-IR task into a three-stage learning problem to progressively learn the complex knowledge for image retrieval with hybrid-modality queries. We first leverage the semantic embedding space for open-domain image-text retrieval, and then transfer the learned knowledge to the fashion-domain with fashion-related pre-training tasks. Finally, we enhance the pre-trained model from single-query to hybrid-modality query for the CTI-IR task. Furthermore, as the contribution of individual modality in the hybrid-modality query varies for different retrieval scenarios, we propose a self-supervised adaptive weighting strategy to dynamically determine the importance of image and text in the hybrid-modality query for better retrieval. Extensive experiments show that our proposed model significantly outperforms state-of-the-art methods in the mean of Recall@K by 24.9% and 9.5% on the Fashion-IQ and Shoes benchmark datasets respectively.
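A minimal sketch of an adaptive weighting module for a hybrid-modality query: a small gate predicts scalar importances for the reference-image and modification-text embeddings before fusing them. The gating MLP and the additive fusion are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AdaptiveQueryFusion(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 2))

    def forward(self, img_emb, txt_emb):
        # Predict how much each modality should contribute for this particular query.
        weights = torch.softmax(self.gate(torch.cat([img_emb, txt_emb], dim=-1)), dim=-1)
        w_img, w_txt = weights[:, :1], weights[:, 1:]
        return w_img * img_emb + w_txt * txt_emb      # composed query embedding

fusion = AdaptiveQueryFusion()
query = fusion(torch.randn(4, 512), torch.randn(4, 512))   # (4, 512)
```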
Abstract: Moment retrieval in videos is a challenging task that aims to retrieve the most relevant video moment in an untrimmed video given a sentence description. Previous methods tend to perform self-modal learning and cross-modal interaction in a coarse manner, which neglect fine-grained clues contained in video content, query context, and their alignment. To this end, we propose a novel Multi-Granularity Perception Network (MGPN) that perceives intra-modality and inter-modality information at a multi-granularity level. Specifically, we formulate moment retrieval as a multi-choice reading comprehension task and integrate human reading strategies into our framework. A coarse-grained feature encoder and a co-attention mechanism are utilized to obtain a preliminary perception of intra-modality and inter-modality information. Then a fine-grained feature encoder and a conditioned interaction module are introduced to enhance the initial perception inspired by how humans address reading comprehension problems. Moreover, to alleviate the huge computation burden of some existing methods, we further design an efficient choice comparison module and reduce the hidden size with imperceptible quality loss. Extensive experiments on Charades-STA, TACoS, and ActivityNet Captions datasets demonstrate that our solution outperforms existing state-of-the-art methods.
Abstract: Video moment retrieval aims at finding the start and end timestamps of a moment (part of a video) described by a given natural language query. Fully supervised methods need complete temporal boundary annotations to achieve promising results, which is costly since the annotator needs to watch the whole moment. Weakly supervised methods only rely on the paired video and query, but their performance is relatively poor. In this paper, we take a closer look at the annotation process and propose a new paradigm called "glance annotation". This paradigm requires the timestamp of only one single random frame, which we refer to as a "glance", within the temporal boundary of the fully supervised counterpart. We argue this is beneficial because, compared to weak supervision, only trivial annotation cost is added yet much more performance potential is provided. Under the glance annotation setting, we propose a method named Video moment retrieval via Glance Annotation (ViGA) based on contrastive learning. ViGA cuts the input video into clips and contrasts between clips and queries, in which glance-guided Gaussian-distributed weights are assigned to all clips. Our extensive experiments indicate that ViGA achieves better results than the state-of-the-art weakly supervised methods by a large margin, and is even comparable to fully supervised methods in some cases.
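A minimal sketch of glance-guided weighting: clips whose centers lie closer to the glanced timestamp receive larger Gaussian weights when contrasted with the query. The clip length and the width sigma are illustrative assumptions.

```python
import torch

def glance_weights(num_clips, clip_len, glance_time, sigma=2.0):
    centers = (torch.arange(num_clips) + 0.5) * clip_len        # clip centers in seconds
    w = torch.exp(-(centers - glance_time) ** 2 / (2 * sigma ** 2))
    return w / w.sum()                                          # normalized clip weights

print(glance_weights(num_clips=8, clip_len=2.0, glance_time=5.3))
```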
Abstract: Keyphrases can concisely describe the high-level topics discussed in a document that usually possesses hierarchical topic structures. Thus, it is crucial to understand the hierarchical topic structures and employ it to guide the keyphrase identification. However, integrating the hierarchical topic information into a deep keyphrase generation model is unexplored. In this paper, we focus on how to effectively exploit the hierarchical topic to improve the keyphrase generation performance (HTKG). Specifically, we propose a novel hierarchical topic-guided variational neural sequence generation method for keyphrase generation, which consists of two major modules: a neural hierarchical topic model that learns the latent topic tree across the whole corpus of documents, and a variational neural keyphrase generation model to generate keyphrases under hierarchical topic guidance. Finally, these two modules are jointly trained to help them learn complementary information from each other. To the best of our knowledge, this is the first attempt to leverage the neural hierarchical topic to guide keyphrase generation. The experimental results demonstrate that our method significantly outperforms the existing state-of-the-art methods across five benchmark datasets.
Abstract: Machine reading comprehension has attracted wide attention, since it explores the potential of models for text understanding. To further equip machines with reasoning capability, the challenging task of logical reasoning has been proposed. Previous works on logical reasoning have proposed strategies to extract the logical units from different aspects. However, it remains a challenge to model the long-distance dependencies among the logical units. It is also demanding to uncover the logical structures of the text and to fuse the discrete logic into the continuous text embedding. To tackle the above issues, we propose an end-to-end model, Logiformer, which utilizes a two-branch graph transformer network for logical reasoning over text. Firstly, we introduce different extraction strategies to split the text into two sets of logical units, and construct the logical graph and the syntax graph respectively. The logical graph models the causal relations for the logical branch, while the syntax graph captures the co-occurrence relations for the syntax branch. Secondly, to model the long-distance dependencies, the node sequence from each graph is fed into a fully connected graph transformer structure. The two adjacency matrices are viewed as attention biases for the graph transformer layers, which map the discrete logical structures into the continuous text embedding space. Thirdly, a dynamic gate mechanism and a question-aware self-attention module are introduced before the answer prediction to update the features. The reasoning process provides interpretability by employing logical units that are consistent with human cognition. The experimental results show the superiority of our model, which outperforms the state-of-the-art single model on two logical reasoning benchmarks.
Abstract: An opinion tag is a sequence of words on a specific aspect of a product or service. Opinion tags reflect key characteristics of product reviews and help users quickly understand their content in e-commerce portals. The task of abstractive opinion tagging has previously been proposed to automatically generate a ranked list of opinion tags for a given review. However, current models for opinion tagging are not personalized, even though personalization is an essential ingredient of engaging user interactions, especially in e-commerce. In this paper, we focus on the task of personalized abstractive opinion tagging. There are two main challenges when developing models for the end-to-end generation of personalized opinion tags: sparseness of reviews and difficulty to integrate multi-type signals, i.e., explicit review signals and implicit behavioral signals. To address these challenges, we propose an end-to-end model, named POT, that consists of three main components: (1) a review-based explicit preference tracker component based on a hierarchical heterogeneous review graph to track user preferences from reviews; (2)a behavior-based implicit preference tracker component using a heterogeneous behavior graph to track the user preferences from implicit behaviors; and (3) a personalized rank-aware tagging component to generate a ranked sequence of personalized opinion tags. In our experiments, we evaluate POT on a real-world dataset collected from e-commerce platforms and the results demonstrate that it significantly outperforms strong baselines.
Abstract: Entity Set Expansion (ESE) is a promising task which aims to expand entities of the target semantic class described by a small seed entity set. Various NLP and IR applications will benefit from ESE due to its ability to discover knowledge. Although previous ESE methods have achieved great progress, most of them still lack the ability to handle hard negative entities (i.e., entities that are difficult to distinguish from the target entities), since two entities may or may not belong to the same semantic class depending on the granularity level of analysis. To address this challenge, we devise an entity-level masked language model with contrastive learning to refine the representation of entities. In addition, we propose ProbExpan, a novel probabilistic ESE framework that utilizes the entity representations obtained by the aforementioned language model to expand entities. Extensive experiments and detailed analyses on three datasets show that our method outperforms previous state-of-the-art methods.
Abstract: Cross-Lingual Summarization (CLS) is a task that extracts important information from a source document and summarizes it into a summary in another language. It is a challenging task that requires a system to understand, summarize, and translate at the same time, making it highly related to Monolingual Summarization (MS) and Machine Translation (MT). In practice, the training resources for Machine Translation are far larger than those for cross-lingual and monolingual summarization. Thus, incorporating a Machine Translation corpus into CLS would be beneficial for its performance. However, existing work only leverages a simple multi-task framework to bring Machine Translation in, lacking deeper exploration. In this paper, we propose a novel task, Cross-lingual Summarization with Compression rate (CSC), to benefit Cross-Lingual Summarization with a large-scale Machine Translation corpus. By introducing the compression rate, the information ratio between the source and the target text, we regard the MT task as a special CLS task with a compression rate of 100%. Hence, the two tasks can be trained as a unified task, sharing knowledge more effectively. However, a huge gap exists between the MT task and the CLS task, as samples with compression rates between 30% and 90% are extremely rare. Hence, to bridge these two tasks smoothly, we propose an effective data augmentation method to produce document-summary pairs with different compression rates. The proposed method not only improves the performance of the CLS task, but also provides controllability to generate summaries of desired lengths. Experiments demonstrate that our method outperforms various strong baselines on three cross-lingual summarization datasets. We release our code and data at https://github.com/ybai-nlp/CLS_CR.
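A minimal sketch of the compression-rate view that unifies MT and CLS: an MT pair keeps roughly all of the source content, while a CLS pair compresses it heavily. Whitespace token counting and the toy pairs are simplifying assumptions.

```python
def compression_rate(source: str, target: str) -> float:
    # Information ratio between target and source, approximated by token counts.
    return len(target.split()) / max(len(source.split()), 1)

mt_pair  = ("the cat sat on the mat", "le chat était assis sur le tapis")
cls_pair = ("a long source document with many sentences about one event ...", "event summary")

print(round(compression_rate(*mt_pair), 2))    # ~1.2: MT keeps essentially all content
print(round(compression_rate(*cls_pair), 2))   # ~0.18: genuine summarization
```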
Abstract: Prediction over event sequences is critical for many real-world applications in Information Retrieval and Natural Language Processing. Future Event Generation (FEG) is a challenging task in event sequence prediction because it requires not only fluent text generation but also commonsense reasoning to maintain the logical coherence of the entire event story. In this paper, we propose a novel explainable FEG framework, Coep. It highlights and integrates two types of event knowledge, sequential knowledge of direct event-event relations and inferential knowledge that reflects the intermediate character psychology between events, such as intents, causes, reactions, which intrinsically pushes the story forward. To alleviate the knowledge forgetting issue, we design two modules, IM and GM, for each type of knowledge, which are combined via prompt tuning. First, IM focuses on understanding inferential knowledge to generate commonsense explanations and provide a soft prompt vector for GM. We also design a contrastive discriminator for better generalization ability. Second, GM generates future events by modeling direct sequential knowledge with the guidance of IM. Automatic and human evaluation demonstrate that our approach can generate more coherent, specific, and logical future events.
Abstract: Event argument extraction (EAE) is an important information extraction task, which aims to identify the arguments of an event described in a given text and classify the roles played by them. A key characteristic of realistic EAE data is that the instance numbers of different roles follow an obvious long-tail distribution. However, the training and evaluation paradigms of existing EAE models either tend to neglect the performance on "tail roles", or change the role instance distribution for model training to an unrealistic uniform distribution. Though some generic methods can alleviate the class imbalance in long-tail datasets, they usually sacrifice the performance on "head classes" as a trade-off. To address the above issues, we propose to train our model on realistic long-tail EAE datasets and evaluate the average performance over all roles. Inspired by the Mixture of Experts (MoE), we propose a Routing-Balanced Dual Expert Framework (RBDEF), which divides all roles into two scopes, "head" and "tail", and assigns the classification of head and tail roles to two separate experts. At inference time, each encoded instance is allocated to one of the two experts by a routing mechanism. To reduce routing errors caused by the imbalance of role instances, we design a Balanced Routing Mechanism (BRM), which transfers several head roles to the tail expert to balance the routing load, and employs a tri-filter routing strategy to reduce the misallocation of the tail expert's instances. To enable effective learning of tail roles with scarce instances, we devise Target-Specialized Meta Learning (TSML) to train the tail expert. Different from other meta-learning algorithms that only search for a generic parameter initialization applied equally to all tasks, TSML can adaptively adjust its search path to obtain a specialized initialization for the tail expert, thereby extending its benefits to the learning of tail roles. In experiments, RBDEF significantly outperforms state-of-the-art EAE models and advanced methods for long-tail data.
Abstract: Event detection (ED) is a pivotal task for information retrieval, which aims at identifying event triggers and classifying them into pre-defined event types. In real-world applications, events are usually annotated with numerous fine-grained types, which often gives rise to a long-tail distribution of types and the co-occurrence of events. Existing studies explore event correlations without fully utilizing them, which may limit the capability of event detection. This paper simultaneously incorporates both type-level and instance-level event correlations, and proposes a novel framework, termed CorED. Specifically, we devise an adaptive graph-based type encoder to capture type-level correlations, learning type representations not only from their training data but also from their relevant types, thus leading to more informative type representations, especially for low-resource types. Besides, we devise an instance interactive decoder to capture instance-level correlations, which predicts event instance types conditioned on the contextual typed event instances, leveraging co-occurring events as remarkable evidence in prediction. We conduct experiments on two public benchmarks, the MAVEN and ACE-2005 datasets. Empirical results demonstrate the benefit of combining type-level and instance-level correlations, and the model achieves effective performance on both benchmarks.
Abstract: Learning which Point-of-Interest (POI) a user will visit next is a challenging task for personalized recommender systems due to the large search space of possible POIs in the region. A recurring problem among existing works that makes it difficult to learn and perform well is the sparsity of the User-POI matrix. In this paper, we propose our Hierarchical Multi-Task Graph Recurrent Network (HMT-GRN) approach, which alleviates the data sparsity problem by learning different User-Region matrices of lower sparsities in a multi-task setting. We then perform a Hierarchical Beam Search (HBS) on the different region and POI distributions to hierarchically reduce the search space with increasing spatial granularity and predict the next POI. Our HBS provides efficiency gains by reducing the search space, resulting in speedups of 5 to 7 times over an exhaustive approach. In addition, we also propose a novel selectivity layer to predict if the next POI has been visited before by the user to balance between personalization and exploration. Experimental results on two real-world Location-Based Social Network (LBSN) datasets show that our model significantly outperforms baseline and the state-of-the-art methods.
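A minimal sketch of a hierarchical beam search of the kind described above: keep only the top-B regions, then rank the POIs inside those regions by the product of region and within-region POI probabilities. The toy distributions, beam width, and scoring are assumptions.

```python
import math

def hierarchical_beam_search(region_probs, poi_probs, beam=2, top_k=3):
    # Stage 1: restrict the search space to the `beam` most likely regions.
    top_regions = sorted(region_probs, key=region_probs.get, reverse=True)[:beam]
    candidates = []
    # Stage 2: score only POIs inside the surviving regions.
    for r in top_regions:
        for poi, p in poi_probs[r].items():
            candidates.append((poi, math.log(region_probs[r]) + math.log(p)))
    return sorted(candidates, key=lambda x: x[1], reverse=True)[:top_k]

region_probs = {"downtown": 0.6, "harbour": 0.3, "airport": 0.1}
poi_probs = {
    "downtown": {"cafe_a": 0.5, "museum_b": 0.3, "park_c": 0.2},
    "harbour": {"pier_d": 0.7, "market_e": 0.3},
    "airport": {"lounge_f": 1.0},
}
print(hierarchical_beam_search(region_probs, poi_probs))  # search restricted to 2 of 3 regions
```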
Abstract: Next POI recommendation intends to forecast users' immediate future movements given their current status and historical information, yielding great value for both users and service providers. However, this problem is notably complex because various data trends need to be considered together, including spatial locations, temporal contexts, users' preferences, etc. Most existing studies view next POI recommendation as a sequence prediction problem while omitting the collaborative signals from other users. Instead, we propose a user-agnostic global trajectory flow map and a novel Graph Enhanced Transformer model (GETNext) to better exploit the extensive collaborative signals for a more accurate next POI prediction, and to alleviate the cold-start problem in the meantime. GETNext incorporates global transition patterns, users' general preferences, spatio-temporal context, and time-aware category embeddings together into a transformer model to predict users' future moves. With this design, our model outperforms state-of-the-art methods by a large margin and also sheds light on the cold-start challenges in recommendation problems involving spatio-temporal information.
Abstract: Next Point-of-Interest (POI) recommendation plays a critical role in many location-based applications as it provides personalized suggestions on attractive destinations for users. Since users' next movement is highly related to the historical visits, sequential methods such as recurrent neural networks are widely used in this task for modeling check-in behaviors. However, existing methods mainly focus on modeling the sequential regularity of check-in sequences but pay little attention to the intrinsic characteristics of POIs, neglecting the entanglement of the diverse influence stemming from different aspects of POIs. In this paper, we propose a novel Disentangled Representation-enhanced Attention Network (DRAN) for next POI recommendation, which leverages the disentangled representations to explicitly model different aspects and corresponding influence for representing a POI more precisely. Specifically, we first design a propagation rule to learn graph-based disentangled representations by refining two types of POI relation graphs, making full use of the distance-based and transition-based influence for representation learning. Then, we extend the attention architecture to aggregate personalized spatio-temporal information for modeling dynamic user preferences on the next timestamp, while maintaining the different components of disentangled representations independent. Extensive experiments on two real-world datasets demonstrate the superior performance of our model to state-of-the-art approaches. Further studies confirm the effectiveness of DRAN in representation disentanglement.
Abstract: News recommendation aims to help users of online news platforms find their preferred news articles. Existing news recommendation methods usually learn models from historical user behaviors on news. However, these behaviors are usually biased with respect to news providers. Models trained on biased user data may capture and even amplify the biases towards news providers, and are unfair to some minority news providers. In this paper, we propose a provider fairness-aware news recommendation framework (named ProFairRec), which can learn news recommendation models that are fair to different news providers from biased user data. The core idea of ProFairRec is to learn provider-fair news representations and provider-fair user representations to achieve provider fairness. To learn provider-fair representations from biased data, we employ provider-biased representations to inherit the provider bias from the data. Provider-fair and provider-biased news representations are learned from news content and provider IDs respectively, and are further aggregated to build fair and biased user representations based on user click history. All of these representations are used in model training, while only the fair representations are used for user-news matching to achieve fair news recommendation. Besides, we propose an adversarial learning task on news provider discrimination to prevent the provider-fair news representations from encoding provider bias. We also propose an orthogonal regularization on the provider-fair and provider-biased representations to further reduce the provider bias in the provider-fair representations. Moreover, ProFairRec is a general framework and can be applied to different news recommendation methods. Extensive experiments on a public dataset verify that our ProFairRec approach can effectively improve the provider fairness of many existing methods while maintaining their recommendation accuracy.
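A minimal sketch of an orthogonality penalty between provider-fair and provider-biased representations: push the two vectors of each news item towards zero cosine similarity. The squared-cosine form is an assumption about how such a regularizer can look, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def orthogonal_regularization(fair_repr, biased_repr):
    """fair_repr, biased_repr: (batch, dim) representations of the same news items."""
    cos = F.cosine_similarity(fair_repr, biased_repr, dim=-1)
    return (cos ** 2).mean()          # zero when the two representations are orthogonal

fair = torch.randn(8, 128, requires_grad=True)
biased = torch.randn(8, 128)
reg = orthogonal_regularization(fair, biased)
reg.backward()
```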
Abstract: Pre-travel out-of-town recommendation aims to recommend Points-of-Interest (POIs) to users who plan to travel out of their hometown in the near future yet have not decided where to go, i.e., their destination regions and POIs both remain unknown. It is a non-trivial task since the search space is vast and different out-of-town regions may lead to distinct travel experiences, which eventually complicates decision-making. Besides, users' out-of-town travel behaviors are affected not only by their personalized preferences but also, heavily, by others' travel behaviors. To this end, we propose a Crowd-Aware Pre-Travel Out-of-town Recommendation framework (CAPTOR) consisting of two major modules: a spatial-affined conditional random field (SA-CRF) and a crowd behavior memory network (CBMN). Specifically, SA-CRF captures the spatial affinity among POIs while preserving the inherent information of POIs. CBMN is then proposed to maintain the crowd travel behaviors w.r.t. each region through three affiliated blocks that read and write the memory adaptively. We devise an elaborate metric space with a dynamic mapping mechanism, in which users and POIs are distinguishable both inherently and geographically. Extensive experiments on two real-world nationwide datasets validate the effectiveness of CAPTOR on the pre-travel out-of-town recommendation task.
Abstract: News recommendation for anonymous readers is a useful but challenging task for many news portals, where interactions between readers and articles are limited to a temporary login session. Previous works tend to formulate session-based recommendation as a next-item prediction task, while neglecting the implicit feedback from user behaviors, which indicates what users really like or dislike. Hence, we propose a comprehensive framework to model user behaviors through positive feedback (i.e., the articles they spend more time on) and negative feedback (i.e., the articles they choose to skip without clicking). Moreover, the framework implicitly models the user using their session start time, and the article using its initial publishing time, in what we call neutral feedback. Empirical evaluation on three real-world news datasets shows that the framework delivers more accurate, diverse and even unexpected recommendations than other state-of-the-art session-based recommendation approaches.
Abstract: Non-factoid question answering (NFQA) is a challenging and under-researched task that requires constructing long-form answers, such as explanations or opinions, to open-ended non-factoid questions - NFQs. There is still little understanding of the categories of NFQs that people tend to ask, what form of answers they expect to see in return, and what the key research challenges of each category are. This work presents the first comprehensive taxonomy of NFQ categories and the expected structure of answers. The taxonomy was constructed with a transparent methodology and extensively evaluated via crowdsourcing. The most challenging categories were identified through an editorial user study. We also release a dataset of categorised NFQs and a question category classifier. Finally, we conduct a quantitative analysis of the distribution of question categories using major NFQA datasets, showing that the NFQ categories that are the most challenging for current NFQA systems are poorly represented in these datasets. This imbalance may lead to insufficient system performance for challenging categories. The new taxonomy, along with the category classifier, will aid research in the area, helping to create more balanced benchmarks and to focus models on addressing specific categories.
Abstract: Designing natural language processing (NLP) models that produce predictions by first extracting a set of relevant input sentences, i.e., rationales, is gaining importance for improving model interpretability and producing supporting evidence for users. Current unsupervised approaches are designed to extract rationales that maximize prediction accuracy, which is invariably obtained by exploiting spurious correlations in datasets, and leads to unconvincing rationales. In this paper, we introduce unsupervised generative models to extract dual-purpose rationales, which must not only be able to support a subsequent answer prediction, but also support a reproduction of the input query. We show that such models can produce more meaningful rationales, that are less influenced by dataset artifacts, and as a result, also achieve the state-of-the-art on rationale extraction metrics on four datasets from the ERASER benchmark, significantly improving upon previous unsupervised methods. Our multi-task model is scalable and enables using state-of-the-art pretrained language models to design explainable question answering systems.
Abstract: Current question answering systems are insufficient when confronting real-life scenarios, as they can hardly be aware of whether a question is answerable given its context. Hence, there has been a recent pursuit of detecting the unanswerability of a question and attributing it to a cause. Attribution of unanswerability requires the system to choose an appropriate cause for an unanswerable question. As the task is challenging even for human beings, it is expensive to acquire labeled data, which makes it a low-data-regime problem. Moreover, the causes themselves are semantically abstract and complex, and the process of attribution is heavily question- and context-dependent. Thus, a capable model has to carefully appreciate the causes and then judiciously contrast the question with its context in order to assign it to the right cause. In response to these challenges, we present PTAU, which refers to and implements the high-level human reading strategy of reading with anticipation. Specifically, PTAU leverages the recent prompt-tuning paradigm, and is further enhanced with two innovatively conceived modules: 1) a cause-oriented template module that constructs continuous templates towards a certain attribution class in a high-dimensional vector space; and 2) a semantics-aware label module that exploits label semantics through contrastive learning to render the classes distinguishable. Extensive experiments demonstrate that the proposed design better enlightens not only the attribution model, but also current question answering models, leading to superior performance.
Abstract: Community question answering (CQA) has become increasingly prevalent in recent years, providing platforms for users with various backgrounds to obtain information and share knowledge. However, the redundancy and lengthiness of crowd-sourced answers limit the performance of answer selection, thus leading to difficulties in reading or even misunderstandings for community users. To solve these problems, we propose dual graph question-answer attention networks (DGQAN) for the answer selection task. To fully understand the internal structure of the question and the corresponding answer, we first construct a dual CQA concept graph with graph convolution networks using the original question and answer text. Specifically, our CQA concept graph exploits the correlation information between question-answer pairs to construct two sub-graphs (QSubject-Answer and QBody-Answer), respectively. Further, a novel dual attention mechanism is incorporated to model both the internal and external semantic relations among questions and answers. More importantly, we conduct experiments to investigate the impact of each layer in the BERT model. The experimental results show that the DGQAN model achieves state-of-the-art performance on three datasets (SemEval-2015, 2016, and 2017), outperforming all the baseline models.
Abstract: Retailers such as grocery stores or e-marketplaces often have vast selections of items for users to choose from. Predicting a user's next purchases has gained attention recently, in the form of next basket recommendation (NBR), as it facilitates navigating extensive assortments for users. Neural network-based models that focus on learning basket representations are the dominant approach in the recent literature. However, these methods do not consider the specific characteristics of the grocery shopping scenario, where users shop for grocery items on a regular basis, and grocery items are repurchased frequently by the same user. In this paper, we first gain a data-driven understanding of users' repeat consumption behavior through an empirical study on six public and proprietary grocery shopping transaction datasets. We discover that, averaged over all datasets, over 54% of NBR performance in terms of recall comes from repeat items: items that users have already purchased in their history, which constitute only 1% of the total collection of items on average. A NBR model with a strong focus on previously purchased items can potentially achieve high performance. We introduce ReCANet, a repeat consumption-aware neural network that explicitly models the repeat consumption behavior of users in order to predict their next basket. ReCANet significantly outperforms state-of-the-art models for the NBR task, in terms of recall and nDCG. We perform an ablation study and show that all of the components of ReCANet contribute to its performance, and demonstrate that a user's repetition ratio has a direct influence on the treatment effect of ReCANet.
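A minimal sketch of the repeat-consumption analysis: for each user, measure which fraction of the ground-truth next basket consists of items already purchased in that user's history. Data and names are toy assumptions.

```python
def repeat_ratio(history_baskets, next_basket):
    """Fraction of the next basket that the user has purchased before."""
    seen = set().union(*history_baskets) if history_baskets else set()
    repeats = [item for item in next_basket if item in seen]
    return len(repeats) / max(len(next_basket), 1)

history = [{"milk", "bread"}, {"milk", "apples"}]
next_basket = {"milk", "bread", "cheese"}
print(repeat_ratio(history, next_basket))   # 2/3 of the next basket are repeat items
```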
Abstract: Recommender systems usually face the issue of filter bubbles: over-recommending homogeneous items based on user features and historical interactions. Filter bubbles will grow along the feedback loop and inadvertently narrow user interests. Existing work usually mitigates filter bubbles by incorporating objectives apart from accuracy such as diversity and fairness. However, they typically sacrifice accuracy, hurting model fidelity and user experience. Worse still, users have to passively accept the recommendation strategy and influence the system in an inefficient manner with high latency, e.g., by continually providing feedback (e.g., like and dislike) until the system recognizes the user intention. This work proposes a new recommender prototype called User-Controllable Recommender System (UCRS), which enables users to actively control the mitigation of filter bubbles. Functionally, 1) UCRS can alert users if they are deeply stuck in filter bubbles. 2) UCRS supports four kinds of control commands for users to mitigate the bubbles at different granularities. 3) UCRS can respond to the controls and adjust the recommendations on the fly. The key to adjusting lies in blocking the effect of out-of-date user representations on recommendations, which contain historical information inconsistent with the control commands. As such, we develop a causality-enhanced User-Controllable Inference (UCI) framework, which can quickly revise the recommendations based on user controls in the inference stage and utilize counterfactual inference to mitigate the effect of out-of-date user representations. Experiments on three datasets validate that the UCI framework can effectively recommend more desired items based on user controls, showing promising performance w.r.t. both accuracy and diversity.
Abstract: Knowledge graph (KG), integrating complex information and containing rich semantics, is widely considered as side information to enhance recommendation systems. However, most of the existing KG-based methods concentrate on encoding the structural information in the graph, without utilizing the collaborative signals in user-item interaction data, which are important for understanding user preferences. Therefore, the representations learned by these models are insufficient for representing semantic information of users and items in the recommendation environment. The combination of both kinds of data provides a good chance to solve this problem, but it faces the following challenges: i) the inner correlations in user-item interaction data are difficult to capture from one side of the user or item; ii) capturing the knowledge associations on the whole KG would introduce noises and variously influence the recommendation results; iii) the semantic gap between both kinds of data is hard to alleviate. To tackle this research gap, we propose a novel duet representation learning framework named KADM to fuse local information (user-item interaction data) and global information (external knowledge graph) for the top-N recommendation, which is composed of two separate sub-models. One learns the local representations by discovering the inner correlations in local information with a knowledge-aware co-attention mechanism, and another learns the global representations by encoding the knowledge associations in global information with a relation-aware attention network. The two sub-models are jointly trained as part of the semantic fusion network to compute the user preferences, which weighs the contributions of the two sub-models under the specific context. We conduct experiments on two real-world datasets, and the evaluations show that KADM significantly outperforms state-of-the-art methods. Further ablation studies confirm that the duet architecture performs significantly better than either sub-model on the recommendation tasks.
Abstract: Although Graph Convolutional Networks (GCNs) have shown tremendous success in recommender systems and collaborative filtering (CF), the mechanism by which they, and especially their core component (i.e., neighborhood aggregation), contribute to recommendation has not been well studied. To unveil the effectiveness of GCNs for recommendation, we first analyze them from a spectral perspective and discover two important findings: (1) only a small portion of spectral graph features that emphasize the neighborhood smoothness and difference contribute to the recommendation accuracy, whereas most graph information can be considered as noise that even reduces the performance, and (2) repetition of the neighborhood aggregation emphasizes smoothed features and filters out noise information in an ineffective way. Based on the two findings above, we propose a new GCN learning scheme for recommendation by replacing neighborhood aggregation with a simple yet effective Graph Denoising Encoder (GDE), which acts as a band-pass filter to capture important graph features. We show that our proposed method alleviates the over-smoothing problem and is comparable to an indefinite-layer GCN that can take any-hop neighborhood into consideration. Finally, we dynamically adjust the gradients over the negative samples to expedite model training without introducing additional complexity. Extensive experiments on five real-world datasets show that our proposed method not only outperforms state-of-the-art methods but also achieves a 12x speedup over LightGCN.
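To make the band-pass intuition above concrete, here is a minimal Python sketch that keeps only the smoothest spectral components of the normalized interaction matrix and scores users against items with the resulting reconstruction. This is a deliberate simplification (a plain truncated SVD), not the paper's GDE module; the cut-off n_smooth is an assumed hyperparameter.

```python
import numpy as np

def spectral_bandpass_scores(R: np.ndarray, n_smooth: int = 64) -> np.ndarray:
    """Score users x items using only the smoothest spectral components of the
    normalized interaction matrix (a crude low/band-pass filter).

    R is a binary user-item interaction matrix; n_smooth is the number of
    retained singular directions (illustrative only).
    """
    # Symmetric normalization: R_norm = D_u^{-1/2} R D_i^{-1/2}
    du = np.clip(R.sum(axis=1, keepdims=True), 1, None)
    di = np.clip(R.sum(axis=0, keepdims=True), 1, None)
    R_norm = R / np.sqrt(du) / np.sqrt(di)
    # Keep only the leading spectral components (smooth graph features)
    U, s, Vt = np.linalg.svd(R_norm, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :n_smooth], s[:n_smooth], Vt[:n_smooth]
    return (U_k * s_k) @ Vt_k  # denoised preference scores
```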
Abstract: Most modern recommender systems predict users' preferences with two components: user and item embedding learning, followed by the user-item interaction modeling. By utilizing the auxiliary review information accompanied with user ratings, many of the existing review-based recommendation models enriched user/item embedding learning ability with historical reviews or better modeled user-item interactions with the help of available user-item target reviews. Though significant progress has been made, we argue that current solutions for review-based recommendation suffer from two drawbacks. First, as review-based recommendation can be naturally formed as a user-item bipartite graph with edge features from corresponding user-item reviews, how to better exploit this unique graph structure for recommendation? Second, while most current models suffer from limited user behaviors, can we exploit the unique self-supervised signals in the review-aware graph to guide two recommendation components better? To this end, in this paper, we propose a novel Review-aware Graph Contrastive Learning (RGCL) framework for review-based recommendation. Specifically, we first construct a review-aware user-item graph with feature-enhanced edges from reviews, where each edge feature is composed of both the user-item rating and the corresponding review semantics. This graph with feature-enhanced edges can help attentively learn each neighbor node weight for user and item representation learning. After that, we design two additional contrastive learning tasks (i.e., Node Discrimination and Edge Discrimination) to provide self-supervised signals for the two components in recommendation process. Finally, extensive experiments over five benchmark datasets demonstrate the superiority of our proposed RGCL compared to the state-of-the-art baselines.
Abstract: Contrastive learning (CL) recently has spurred a fruitful line of research in the field of recommendation, since its ability to extract self-supervised signals from the raw data is well-aligned with recommender systems' needs for tackling the data sparsity issue. A typical pipeline of CL-based recommendation models is first augmenting the user-item bipartite graph with structure perturbations, and then maximizing the node representation consistency between different graph augmentations. Although this paradigm turns out to be effective, what underlies the performance gains is still a mystery. In this paper, we first experimentally disclose that, in CL-based recommendation models, CL operates by learning more uniform user/item representations that can implicitly mitigate the popularity bias. Meanwhile, we reveal that the graph augmentations, which used to be considered necessary, just play a trivial role. Based on this finding, we propose a simple CL method which discards the graph augmentations and instead adds uniform noises to the embedding space for creating contrastive views. A comprehensive experimental study on three benchmark datasets demonstrates that, though it appears strikingly simple, the proposed method can smoothly adjust the uniformity of learned representations and has distinct advantages over its graph augmentation-based counterparts in terms of recommendation accuracy and training efficiency. The code is released at https://github.com/Coder-Yu/QRec.
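As a sketch of the augmentation-free contrastive view construction described above, the snippet below perturbs embeddings with small signed uniform noise and applies a standard InfoNCE loss. The noise magnitude, temperature, and function names are assumptions for illustration rather than the released implementation.

```python
import torch
import torch.nn.functional as F

def noisy_view(emb: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """Create a contrastive view by adding small signed uniform noise aligned
    with the embedding's sign pattern (eps is an assumed magnitude)."""
    noise = F.normalize(torch.rand_like(emb), dim=-1) * torch.sign(emb)
    return emb + eps * noise

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    """Standard InfoNCE between two views of the same batch of nodes."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                       # (B, B) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

# usage: cl_loss = info_nce(noisy_view(user_emb), noisy_view(user_emb))
```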
Abstract: In recommendation systems, the choice of loss function is critical since a good loss may significantly improve the model performance. However, manually designing a good loss is a big challenge due to the complexity of the problem. A large fraction of previous work focuses on handcrafted loss functions, which requires significant expertise and human effort. In this paper, inspired by the recent development of automated machine learning, we propose an automatic loss function generation framework, AutoLossGen, which is able to generate loss functions directly constructed from basic mathematical operators without prior knowledge of loss structure. More specifically, we develop a controller model driven by reinforcement learning to generate loss functions, and develop an iterative and alternating optimization schedule to update the parameters of both the controller model and the recommender model. One challenge for automatic loss generation in recommender systems is the extreme sparsity of recommendation datasets, which leads to the sparse reward problem for loss generation and search. To solve the problem, we further develop a reward filtering mechanism for efficient and effective loss generation. Experimental results show that our framework manages to create tailored loss functions for different recommendation models and datasets, and the generated loss gives better recommendation performance than commonly used baseline losses. Besides, most of the generated losses are transferable, i.e., the loss generated based on one model and dataset also works well for another model or dataset. Source code of the work is available at https://github.com/rutgerswiselab/AutoLossGen.
Abstract: Online recommendation requires handling rapidly changing user preferences. Deep reinforcement learning (DRL) is an effective means of capturing users' dynamic interests during interactions with recommender systems. Generally, it is challenging to train a DRL agent in online recommender systems because of the sparse rewards caused by the large action space (e.g., candidate item space) and comparatively few user interactions. Leveraging experience replay (ER) has been extensively studied to conquer the issue of sparse rewards. However, existing ER methods adapt poorly to the complex environment of online recommender systems and are inefficient in learning an optimal strategy from past experience. As a step towards filling this gap, we propose a novel state-aware experience replay model, in which the agent selectively discovers the most relevant and salient experiences and is guided to find the optimal policy for online recommendations. In particular, a locality-sensitive hashing method is proposed to selectively retain the most meaningful experience at scale, and a prioritized reward-driven strategy is designed to replay more valuable experiences with a higher chance. We formally show that the proposed method guarantees upper and lower bounds on experience replay and optimizes the space complexity, as well as empirically demonstrate our model's superiority to several existing experience replay methods over three benchmark simulation platforms.
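The locality-sensitive hashing idea mentioned above can be pictured with a toy buffer that hashes states with random hyperplanes and keeps only the highest-reward experience per bucket. This is a generic sketch under our own assumptions (hash width, reward-based retention), not the proposed state-aware model.

```python
import numpy as np

class LSHExperienceBuffer:
    """Toy replay buffer that keeps at most one experience per LSH bucket,
    approximating 'retain the most meaningful experiences at scale'."""

    def __init__(self, state_dim: int, n_bits: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_bits, state_dim))  # random hyperplanes
        self.buckets = {}

    def _hash(self, state: np.ndarray) -> int:
        bits = (self.planes @ state > 0).astype(int)
        return int("".join(map(str, bits)), 2)

    def add(self, state, action, reward, next_state):
        key = self._hash(np.asarray(state, dtype=float))
        kept = self.buckets.get(key)
        # keep the higher-reward experience within a bucket (one simple heuristic)
        if kept is None or reward > kept[2]:
            self.buckets[key] = (state, action, reward, next_state)

    def sample(self, k: int):
        if not self.buckets:
            return []
        items = list(self.buckets.values())
        idx = np.random.choice(len(items), size=min(k, len(items)), replace=False)
        return [items[i] for i in idx]
```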
Abstract: Recommender systems have become a fundamental service in most E-Commerce platforms, in which the matching stage aims to retrieve potentially relevant candidate items to users for further ranking. Recently, some efforts on extracting multi-interests from user's historical behaviors have demonstrated superior performance. However, the historical behaviors are not noise-free due to the possible misclicks or disturbances. Existing works mainly overlook the fact that the interests of a user are not only reflected by the historical behaviors, but also inherently regulated by the profile information. Hence, we are interested in exploiting the benefit of user profile in multi-interest learning to enhance candidate matching performance. To this end, a user-aware multi-interest learning framework (named UMI) is proposed in this paper to exploit both user profile and behavior information for candidate matching. Specifically, UMI consists of two main components: dual-attention routing and interest refinement. In the dual-attention routing, we firstly introduce a user-guided attention network to identify the important historical items with respect to the user profile. Then, the resultant importance weights are leveraged via the dual-attentive capsule network to extract the user's multi-interests. Afterwards, the extracted interests are utilized to highlight the corresponding user profile features for interest refinement, such that different user profiles can be incorporated into interest learning for diverse user preference understanding. Besides, to improve the model's discriminative capacity, we further devise a harder-negatives strategy to support model optimization. Extensive experiments show that UMI significantly outperforms state-of-the-art multi-interest modeling alternatives. Currently, UMI has been successfully deployed at Taobao App in Alibaba, serving hundreds of millions of users.
Abstract: As the final stage of the multi-stage recommender system (MRS), reranking directly affects users' experience and satisfaction, thus playing a critical role in MRS. Despite the improvement achieved in the existing work, three issues are yet to be solved. First, users' historical behaviors contain rich preference information, such as users' long- and short-term interests, but are not fully exploited in reranking. Previous work typically treats items in history as equally important, neglecting the dynamic interaction between the history and candidate items. Second, existing reranking models focus on learning interactions at the item level while ignoring the fine-grained feature-level interactions. Lastly, estimating the reranking score on the ordered initial list before reranking may lead to the early scoring problem, thereby yielding suboptimal reranking performance. To address the above issues, we propose a framework named Multi-level Interaction Reranking (MIR). MIR combines low-level cross-item interaction and high-level set-to-list interaction, where we view the candidate items to be reranked as a set and the users' behavior history in chronological order as a list. We design a novel SLAttention structure for modeling the set-to-list interactions with personalized long- and short-term interests. Moreover, feature-level interactions are incorporated to capture the fine-grained influence among items. We design MIR in such a way that any permutation of the input items would not change the output ranking, and we prove this theoretically. Extensive experiments on three public and proprietary datasets show that MIR significantly outperforms the state-of-the-art models using various ranking and utility metrics.
Abstract: Modern recommender systems aim to improve user experience. As reinforcement learning (RL) naturally fits this objective---maximizing a user's reward per session---it has become an emerging topic in recommender systems. Developing RL-based recommendation methods, however, is not trivial due to the offline training challenge. Specifically, the keystone of traditional RL is to train an agent with large amounts of online exploration, making lots of 'errors' in the process. In the recommendation setting, though, we cannot afford the price of making 'errors' online. As a result, the agent needs to be trained through offline historical implicit feedback, collected under different recommendation policies; traditional RL algorithms may lead to sub-optimal policies under these offline training settings. Here we propose a new learning paradigm---namely Prompt-Based Reinforcement Learning (PRL)---for the offline training of RL-based recommendation agents. While traditional RL algorithms attempt to map state-action input pairs to their expected rewards (e.g., Q-values), PRL directly infers actions (i.e., recommended items) from state-reward inputs. In short, the agents are trained to predict a recommended item given the prior interactions and an observed reward value---with simple supervised learning. At deployment time, this historical (training) data acts as a knowledge base, while the state-reward pairs are used as a prompt. The agents are thus used to answer the question: Which item should be recommended given the prior interactions & the prompted reward value? We implement PRL with four notable recommendation models and conduct experiments on two real-world e-commerce datasets. Experimental results demonstrate the superior performance of our proposed methods.
Abstract: Knowledge graph (KG) plays an increasingly important role in recommender systems. Recently, graph neural network (GNN)-based models have gradually become the mainstream approach to knowledge-aware recommendation (KGR). However, GNN-based KGR models suffer from a natural deficiency, namely the sparse supervision signal problem, which may make their actual performance drop to some extent. Inspired by the recent success of contrastive learning in mining supervised signals from the data itself, in this paper we focus on exploring contrastive learning in KG-aware recommendation and propose a novel multi-level cross-view contrastive learning mechanism, named MCCLK. Different from traditional contrastive learning methods, which generate two graph views by uniform data augmentation schemes such as corruption or dropping, we comprehensively consider three different graph views for KG-aware recommendation, including a global-level structural view and local-level collaborative and semantic views. Specifically, we consider the user-item graph as the collaborative view, the item-entity graph as the semantic view, and the user-item-entity graph as the structural view. MCCLK hence performs contrastive learning across the three views on both local and global levels, mining comprehensive graph feature and structure information in a self-supervised manner. Besides, in the semantic view, a k-nearest-neighbor (kNN) item-item semantic graph construction module is proposed to capture the important item-item semantic relations that are usually ignored by previous work. Extensive experiments conducted on three benchmark datasets show the superior performance of our proposed method over state-of-the-art methods. The implementations are available at: https://github.com/CCIIPLab/MCCLK.
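For the kNN item-item semantic graph construction mentioned above, a minimal version could look like the following, connecting each item to its k most cosine-similar items from some item/entity embedding table. The value of k and the source of the embeddings are illustrative assumptions, not MCCLK's released code.

```python
import numpy as np

def knn_item_graph(item_emb: np.ndarray, k: int = 10) -> np.ndarray:
    """Return a binary adjacency matrix linking each item to its k most
    cosine-similar items (self-loops excluded, result symmetrized)."""
    norms = np.clip(np.linalg.norm(item_emb, axis=1, keepdims=True), 1e-12, None)
    normed = item_emb / norms
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)                 # no self-edges
    adj = np.zeros_like(sim)
    topk = np.argpartition(-sim, kth=k, axis=1)[:, :k]
    rows = np.repeat(np.arange(sim.shape[0]), k)
    adj[rows, topk.ravel()] = 1.0
    return np.maximum(adj, adj.T)                  # symmetrize
```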
Abstract: Off-policy learning has drawn huge attention in recommender systems (RS), as it provides an opportunity for reinforcement learning to abandon expensive online training. However, off-policy learning from logged data suffers from biases caused by the policy shift between the target policy and the logging policy. Consequently, most off-policy learning resorts to inverse propensity scoring (IPS), which, however, tends to overfit to exposed (or recommended) items and thus fails to explore unexposed items. In this paper, we propose meta graph enhanced off-policy learning (MGPolicy), which is the first recommendation model for correcting the off-policy bias via contextual information. In particular, we explicitly leverage rich semantics in meta graphs for user state representation, and then train the candidate generation model to promote an efficient search in the action space. Moreover, MGPolicy is designed with counterfactual risk minimization, which can correct the policy learning bias and ultimately yield an effective target policy to maximize the long-run rewards for the recommendation. We extensively evaluate our method through a series of simulations and large-scale real-world datasets, achieving favorable results compared with state-of-the-art methods. Our code is currently available online.
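The counterfactual risk minimization referred to above typically reduces to a clipped inverse-propensity-scored policy objective. The sketch below shows that generic form; the variable names, the clipping constant, and the particular estimator variant are our assumptions, not MGPolicy's exact design.

```python
import torch

def ips_policy_loss(log_pi_target: torch.Tensor,
                    logging_propensity: torch.Tensor,
                    reward: torch.Tensor,
                    clip: float = 10.0) -> torch.Tensor:
    """Clipped inverse-propensity-scored objective for off-policy learning.

    log_pi_target      -- log pi_theta(a|s) of the logged action under the target policy
    logging_propensity -- pi_0(a|s) of the logging policy (assumed recorded in the log)
    reward             -- observed reward for the logged action
    """
    weight = torch.clamp(
        torch.exp(log_pi_target) / logging_propensity.clamp_min(1e-6), max=clip)
    # maximize IPS-weighted expected reward <=> minimize its negative
    return -(weight.detach() * reward * log_pi_target).mean()
```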
Abstract: Recommendation systems make predictions chiefly based on users' historical interaction data (e.g., items previously clicked or purchased). There is a risk of privacy leakage when collecting the users' behavior data for building the recommendation model. However, existing privacy-preserving solutions are designed for tackling the privacy issue only during the model training [32] and results collection [40] phases. The problem of privacy leakage still exists when directly sharing the private user interaction data with organizations or releasing them to the public. To address this problem, in this paper, we present a User Privacy Controllable Synthetic Data Generation model (short for UPC-SDG), which generates synthetic interaction data for users based on their privacy preferences. The generation model aims to provide certain privacy guarantees while maximizing the utility of the generated synthetic data at both data level and item level. Specifically, at the data level, we design a selection module that selects those items that contribute less to a user's preferences from the user's interaction data. At the item level, a synthetic data generation module is proposed to generate a synthetic item corresponding to the selected item based on the user's preferences. Furthermore, we also present a privacy-utility trade-off strategy to balance the privacy and utility of the synthetic data. Extensive experiments and ablation studies have been conducted on three publicly accessible datasets to justify our method, demonstrating its effectiveness in generating synthetic data under users' privacy preferences.
Abstract: Knowledge graph (KG) plays an increasingly important role to improve the recommendation performance and interpretability. A recent technical trend is to design end-to-end models based on the information propagation schemes. However, existing propagation-based methods fail to (1) model the underlying hierarchical structures and relations, and (2) capture the high-order collaborative signals of items for learning high-quality user and item representations. In this paper, we propose a new model, called Hierarchy-Aware Knowledge Gated Network (HAKG), to tackle the aforementioned problems. Technically, we model users and items (that are captured by a user-item graph), as well as entities and relations (that are captured in a KG) in hyperbolic space, and design a new hyperbolic aggregation scheme to gather relational contexts over KG. Meanwhile, we introduce a novel angle constraint to preserve characteristics of items in the embedding space. Furthermore, we propose the dual item embeddings design to represent and propagate collaborative signals and knowledge associations separately, and leverage the gated aggregation to distill discriminative information for better capturing user behavior patterns. Experimental results on three benchmark datasets show that, HAKG achieves significant improvement over the state-of-the-art methods like CKAN, Hyper-Know, and KGIN. Further analyses on the learned hyperbolic embeddings confirm that HAKG can offer meaningful insights into the hierarchies of data.
Abstract: Constrained by the statistics-based machine learning framework, existing knowledge-aware recommendation methods are prone to spurious correlations: a spurious correlation refers to a knowledge fact that appears causal to the user behaviors (as inferred by the recommender) but in fact is not. To tackle this issue, we present a novel approach to discovering and alleviating the potential spurious correlations from a counterfactual perspective. To be specific, our approach consists of two counterfactual generators and a recommender. The counterfactual generators are designed to generate counterfactual interactions via reinforcement learning, while the recommender is implemented with two different graph neural networks to aggregate the information from the KG and user-item interactions respectively. The counterfactual generators and recommender are integrated in a mutually collaborative way. With this approach, the recommender helps the counterfactual generators better identify potential spurious correlations and generate high-quality counterfactual interactions, while the counterfactual generators help the recommender weaken the influence of the potential spurious correlations simultaneously. Extensive experiments on three real-world datasets have shown the effectiveness of the proposed approach by comparing it with a number of competitive baselines. Our implementation code is available at: https://github.com/RUCAIBox/CGKR.
Abstract: The ubiquity of implicit feedback makes it the default choice for building modern recommender systems. Generally speaking, observed interactions are considered as positive samples, while unobserved interactions are considered as negative ones. However, implicit feedback is inherently noisy because of the ubiquitous presence of noisy-positive and noisy-negative interactions. Recently, some studies have noticed the importance of denoising implicit feedback for recommendations, and enhanced the robustness of recommendation models to some extent. Nonetheless, they typically fail to (1) capture the hard yet clean interactions for learning comprehensive user preference, and (2) provide a universal denoising solution that can be applied to various kinds of recommendation models. In this paper, we thoroughly investigate the memorization effect of recommendation models, and propose a new denoising paradigm, i.e., Self-Guided Denoising Learning (SGDL), which is able to collect memorized interactions at the early stage of the training (i.e., ''noise-resistant'' period), and leverage those data as denoising signals to guide the following training (i.e., ''noise-sensitive'' period) of the model in a meta-learning manner. Besides, our method can automatically switch its learning phase at the memorization point from memorization to self-guided learning, and select clean and informative memorized data via a novel adaptive denoising scheduler to improve the robustness. We incorporate SGDL with four representative recommendation models (i.e., NeuMF, CDAE, NGCF and LightGCN) and different loss functions (i.e., binary cross-entropy and BPR loss). The experimental results on three benchmark datasets demonstrate the effectiveness of SGDL over the state-of-the-art denoising methods like T-CE, IR, DeCA, and even state-of-the-art robust graph-based methods like SGCN and SGL.
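A much-simplified proxy for the memorization idea above is to track per-interaction losses during the early epochs and mark interactions whose loss stays low as "memorized" denoising signals. The fixed threshold and the averaging rule here are illustrative stand-ins, not SGDL's adaptive scheduler.

```python
import torch

def collect_memorized(per_sample_loss_history: torch.Tensor,
                      threshold: float = 0.5) -> torch.Tensor:
    """Mark interactions as 'memorized' if their loss stayed low across the
    early 'noise-resistant' epochs (a simplified proxy, not the paper's scheme).

    per_sample_loss_history: (n_early_epochs, n_interactions) tensor of losses.
    Returns a boolean mask over interactions.
    """
    mean_early_loss = per_sample_loss_history.mean(dim=0)
    return mean_early_loss < threshold
```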
Abstract: User cold-start is a major challenge in building personalized recommender systems. Due to the lack of sufficient interactions, it is difficult to effectively model new users. One of the main solutions is to obtain an initial model through meta-learning (mainly gradient-based methods) and adapt it to new users with a few steps of gradient descent. Although these methods have achieved remarkable performance, they are still far from being usable in real-world applications due to their high-demand data processing, heavy computational burden, and inability to perform effective user-incremental update. In this paper, we propose a deployable and continuable meta-learning-based recommendation (DCMR) approach, which can achieve fast user-incremental updating with task replay and first-order gradient descent. Specifically, we introduce a dual-constrained task sampler, distillation-based loss functions, and an adaptive controller in this framework to balance the trade-off between stability and plasticity in updating. In summary, DCMR can be updated while serving new users; in other words, it learns continuously and rapidly from a sequential user stream and is able to make recommendations at any time. The extensive experiments conducted on three benchmark datasets illustrate the superiority of our model.
Abstract: Knowledge Graphs (KGs) have been utilized as useful side information to improve recommendation quality. In those recommender systems, knowledge graph information often contains fruitful facts and inherent semantic relatedness among items. However, the success of such methods relies on high-quality knowledge graphs, and the learned representations may suffer from two challenges: i) The long-tail distribution of entities results in sparse supervision signals for KG-enhanced item representation; ii) Real-world knowledge graphs are often noisy and contain topic-irrelevant connections between items and entities. Such KG sparsity and noise make the item-entity dependent relations deviate from reflecting their true characteristics, which significantly amplifies the noise effect and hinders the accurate representation of users' preferences. To fill this research gap, we design a general Knowledge Graph Contrastive Learning framework (KGCL) that alleviates the information noise for knowledge graph-enhanced recommender systems. Specifically, we propose a knowledge graph augmentation schema to suppress KG noise in information aggregation, and derive more robust knowledge-aware representations for items. In addition, we exploit additional supervision signals from the KG augmentation process to guide a cross-view contrastive learning paradigm, giving a greater role to unbiased user-item interactions in gradient descent and further suppressing the noise. Extensive experiments on three public datasets demonstrate the consistent superiority of our KGCL over state-of-the-art techniques. KGCL also achieves strong performance in recommendation scenarios with sparse user-item interactions, long-tail and noisy KG entities. Our implementation codes are available at https://github.com/yuh-yang/KGCL-SIGIR22.
Abstract: Current dense retrievers are not robust to out-of-domain and outlier queries, i.e. their effectiveness on these queries is much poorer than what one would expect. In this paper, we consider a specific instance of such queries: queries that contain typos. We show that a small character level perturbation in queries (as caused by typos) highly impacts the effectiveness of dense retrievers. We then demonstrate that the root cause of this resides in the input tokenization strategy employed by BERT. In BERT, tokenization is performed using the BERT's WordPiece tokenizer and we show that a token with a typo will significantly change the token distributions obtained after tokenization. This distribution change translates to changes in the input embeddings passed to the BERT-based query encoder of dense retrievers. We then turn our attention to devising dense retriever methods that are robust to such queries with typos, while still being as performant as previous methods on queries without typos. For this, we use CharacterBERT as the backbone encoder and an efficient yet effective training method, called Self-Teaching (ST), that distills knowledge from queries without typos into the queries with typos. Experimental results show that CharacterBERT in combination with ST achieves significantly higher effectiveness on queries with typos compared to previous methods. Along with these results and the open-sourced implementation of the methods, we also provide a new passage retrieval dataset consisting of real-world queries with typos and associated relevance assessments on the MS MARCO corpus, thus supporting the research community in the investigation of effective and robust dense retrievers. Code, experimental results and dataset are made available at https://github.com/ielab/CharacterBERT-DR.
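The Self-Teaching objective described above can be pictured as a distillation between the score distributions produced for a clean query (teacher) and its typoed variant (student). The sketch below uses a naive character-deletion typo generator and a KL term; both are simplifications under our own assumptions, not the released CharacterBERT-DR training code.

```python
import random
import torch
import torch.nn.functional as F

def random_typo(query: str, rate: float = 0.1) -> str:
    """Inject simple character deletions as a stand-in for realistic typo generation."""
    chars = [c for c in query if not (c.isalpha() and random.random() < rate)]
    return "".join(chars) or query

def self_teaching_loss(scores_clean: torch.Tensor,
                       scores_typo: torch.Tensor,
                       tau: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) between score distributions over the same candidate
    passages: teacher from the clean query (no gradient), student from the typoed query."""
    teacher = F.softmax(scores_clean.detach() / tau, dim=-1)
    student = F.log_softmax(scores_typo / tau, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean")
```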
Abstract: Pre-trained language models such as BERT have been a key ingredient to achieve state-of-the-art results on a variety of tasks in natural language processing and, more recently, also in information retrieval. Recent research even claims that BERT is able to capture factual knowledge about entity relations and properties, the information that is commonly obtained from knowledge graphs. This paper investigates the following question: Do BERT-based entity retrieval models benefit from additional entity information stored in knowledge graphs? To address this research question, we map entity embeddings into the same input space as a pre-trained BERT model and inject these entity embeddings into the BERT model. This entity-enriched language model is then employed on the entity retrieval task. We show that the entity-enriched BERT model improves effectiveness on entity-oriented queries over a regular BERT model, establishing a new state-of-the-art result for the entity retrieval task, with substantial improvements for complex natural language queries and queries requesting a list of entities with a certain property. Additionally, we show that the entity information provided by our entity-enriched model particularly helps queries related to less popular entities. Last, we observe empirically that the entity-enriched BERT models enable fine-tuning on limited training data, which otherwise would not be feasible due to the known instabilities of BERT in few-sample fine-tuning, thereby contributing to data-efficient training of BERT for entity search.
Abstract: Entity-oriented search systems often learn vector representations of entities via the introductory paragraph from the Wikipedia page of the entity. As such representations are the same for every query, our hypothesis is that the representations are not ideal for IR tasks. In this work, we present BERT Entity Representations (BERT-ER) which are query-specific vector representations of entities obtained from text that describes how an entity is relevant for a query. Using BERT-ER in a downstream entity ranking system, we achieve a performance improvement of 13-42% (Mean Average Precision) over a system that uses the BERT embedding of the introductory paragraph from Wikipedia on two large-scale test collections. Our approach also outperforms entity ranking systems using entity embeddings from Wikipedia2Vec, ERNIE, and E-BERT. We show that our entity ranking system using BERT-ER can increase precision at the top of the ranking by promoting relevant entities to the top. With this work, we release our BERT models and query-specific entity embeddings fine-tuned for the entity ranking task.
Abstract: Pre-trained language models (PLMs), such as BERT and ERNIE, have achieved outstanding performance in many natural language understanding tasks. Recently, PLM-based information retrieval models have also been investigated and have shown state-of-the-art effectiveness, e.g., MORES, PROP and ColBERT. However, most PLM-based rankers focus only on single-level relevance matching (e.g., character-level), while ignoring information at other granularities (e.g., words and phrases), which easily leads to ambiguity in query understanding and inaccurate matching in web search. In this paper, we aim to improve the state-of-the-art PLM ERNIE for web search, by modeling multi-granularity context information with awareness of word importance in queries and documents. In particular, we propose a novel H-ERNIE framework, which includes a query-document analysis component and a hierarchical ranking component. The query-document analysis component has several individual modules which generate the necessary variables, such as word segmentation, word importance analysis, and word tightness analysis. Based on these variables, the importance-aware multiple-level correspondences are sent to the ranking model. The hierarchical ranking model includes a multi-layer transformer module to learn the character-level representations, a word-level matching module, and a phrase-level matching module with word importance. Each of these modules models query-document matching from a different perspective. Also, these levels communicate with each other to achieve overall accurate matching. We discuss the time complexity of the proposed framework, and show that it can be efficiently implemented in real applications. The offline and online experiments on both public data sets and a commercial search engine illustrate the effectiveness of the proposed H-ERNIE framework.
Abstract: Passage re-ranking aims to obtain a permutation over the candidate passage set returned by the retrieval stage. Re-rankers have been greatly advanced by pre-trained language models (PLMs) due to their overwhelming advantages in natural language understanding. However, existing PLM-based re-rankers may easily suffer from vocabulary mismatch and a lack of domain-specific knowledge. To alleviate these problems, explicit knowledge contained in a knowledge graph is carefully introduced in our work. Specifically, we employ an existing knowledge graph, which is incomplete and noisy, and are the first to apply it to the passage re-ranking task. To leverage reliable knowledge, we propose a novel knowledge graph distillation method and obtain a knowledge meta graph as the bridge between query and passage. To align both kinds of embeddings in the latent space, we employ a PLM as the text encoder and a graph neural network over the knowledge meta graph as the knowledge encoder. Besides, a novel knowledge injector is designed for the dynamic interaction between the text and knowledge encoders. Experimental results demonstrate the effectiveness of our method, especially on queries requiring in-depth domain knowledge.
Abstract: Pre-trained language models (PLMs) have achieved great success in the area of Information Retrieval. Studies show that applying these models to ad-hoc document ranking can achieve better retrieval effectiveness. However, on the Web, most information is organized in the form of HTML web pages. In addition to the pure text content, the structure of the content organized by HTML tags is also an important part of the information delivered on a web page. Currently, such structured information is totally ignored by pre-trained models which are trained solely based on text content. In this paper, we propose to leverage large-scale web pages and their DOM (Document Object Model) tree structures to pre-train models for information retrieval. We argue that using the hierarchical structure contained in web pages, we can get richer contextual information for training better language models. To exploit this kind of information, we devise four pre-training objectives based on the structure of web pages, then pre-train a Transformer model towards these tasks jointly with traditional masked language model objective. Experimental results on two authoritative ad-hoc retrieval datasets prove that our model can significantly improve ranking performance compared to existing pre-trained models.
Abstract: Vector quantization (VQ) based ANN indexes, such as Inverted File System (IVF) and Product Quantization (PQ), have been widely applied to embedding based document retrieval thanks to their competitive time and memory efficiency. Originally, VQ is learned to minimize the reconstruction loss, i.e., the distortions between the original dense embeddings and the reconstructed embeddings after quantization. Unfortunately, such an objective is inconsistent with the goal of selecting ground-truth documents for the input query, which may cause severe loss of retrieval quality. Recent works identify such a defect, and propose to minimize the retrieval loss through contrastive learning. However, these methods intensively rely on queries with ground-truth documents, whose performance is limited by the insufficiency of labeled data. In this paper, we propose Distill-VQ, which unifies the learning of IVF and PQ within a knowledge distillation framework. In Distill-VQ, the dense embeddings are leveraged as "teachers", which predict the query's relevance to the sampled documents. The VQ modules are treated as the "students", which are learned to reproduce the predicted relevance, such that the reconstructed embeddings may fully preserve the retrieval result of the dense embeddings. By doing so, Distill-VQ is able to derive substantial training signals from the massive unlabeled data, which significantly contributes to the retrieval quality. We perform comprehensive explorations for the optimal conduct of knowledge distillation, which may provide useful insights for the learning of VQ based ANN indexes. We also experimentally show that the labeled data is no longer a necessity for high-quality vector quantization, which indicates Distill-VQ's strong applicability in practice. The evaluations are performed on MS MARCO and Natural Questions benchmarks, where Distill-VQ notably outperforms the SOTA VQ methods in Recall and MRR. Our code is available at https://github.com/staoxiao/LibVQ.
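At its core, the distillation objective described above matches the relevance distribution induced by the reconstructed (quantized) document embeddings to the one induced by the original dense embeddings. The following sketch shows that listwise KL form with the quantizer abstracted away; the tensor shapes and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def vq_distillation_loss(query_emb: torch.Tensor,
                         doc_emb: torch.Tensor,
                         doc_emb_reconstructed: torch.Tensor,
                         tau: float = 1.0) -> torch.Tensor:
    """Listwise KL distillation: dense embeddings act as the teacher, the
    quantizer's reconstructed embeddings act as the student (quantizer omitted).

    query_emb: (B, d); doc_emb and doc_emb_reconstructed: (B, n_docs, d)
    """
    teacher_scores = torch.einsum("bd,bnd->bn", query_emb, doc_emb)
    student_scores = torch.einsum("bd,bnd->bn", query_emb, doc_emb_reconstructed)
    teacher = F.softmax(teacher_scores.detach() / tau, dim=-1)
    student = F.log_softmax(student_scores / tau, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean")
```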
Abstract: Recently, pre-training methods tailored for IR tasks have achieved great success. However, as the mechanisms behind the performance improvement remain under-investigated, the interpretability and robustness of these pre-trained models still need to be improved. Axiomatic IR aims to identify a set of desirable properties expressed mathematically as formal constraints to guide the design of ranking models. Existing studies have already shown that considering certain axioms may help improve the effectiveness and interpretability of IR models. However, there are still few efforts to incorporate these IR axioms into pre-training methodologies. To shed light on this research question, we propose a novel pre-training method with Axiomatic Regularization for ad hoc Search (ARES). In the ARES framework, a number of existing IR axioms are re-organized to generate training samples to be fitted in the pre-training process. These training samples then guide neural rankers to learn the desirable ranking properties. Compared to existing pre-training approaches, ARES is more intuitive and explainable. Experimental results on multiple publicly available benchmark datasets have shown the effectiveness of ARES in both full-resource and low-resource (e.g., zero-shot and few-shot) settings. An intuitive case study also indicates that ARES has learned useful knowledge that existing pre-trained models (e.g., BERT and PROP) fail to possess. This work provides insights into improving the interpretability of pre-trained models and guidance on incorporating IR axioms or human heuristics into pre-training methods.
Abstract: Multi-scenario learning (MSL) enables a service provider to cater for users' fine-grained demands by separating services for different user sectors, e.g., by user's geographical region. Under each scenario there is a need to optimize multiple task-specific targets e.g., click through rate and conversion rate, known as multi-task learning (MTL). Recent solutions for MSL and MTL are mostly based on the multi-gate mixture-of-experts (MMoE) architecture. MMoE structure is typically static and its design requires domain-specific knowledge, making it less effective in handling both MSL and MTL. In this paper, we propose a novel Automatic Expert Selection framework for Multi-scenario and Multi-task search, named AESM2. AESM2 integrates both MSL and MTL into a unified framework with an automatic structure learning. Specifically, AESM2 stacks multi-task layers over multi-scenario layers. This hierarchical design enables us to flexibly establish intrinsic connections between different scenarios, and at the same time also supports high-level feature extraction for different tasks. At each multi-scenario/multi-task layer, a novel expert selection algorithm is proposed to automatically identify scenario-/task-specific and shared experts for each input. Experiments over two real-world large-scale datasets demonstrate the effectiveness of AESM2 over a battery of strong baselines. Online A/B test also shows substantial performance gain on multiple metrics. Currently, AESM2 has been deployed online for serving major traffic.
Abstract: Multimodal sentiment analysis has been studied under the assumption that all modalities are available. However, such a strong assumption does not always hold in practice, and most of multimodal fusion models may fail when partial modalities are missing. Several works have addressed the missing modality problem; but most of them only considered the single modality missing case, and ignored the practically more general cases of multiple modalities missing. To this end, in this paper, we propose a Tag-Assisted Transformer Encoder (TATE) network to handle the problem of missing uncertain modalities. Specifically, we design a tag encoding module to cover both the single modality and multiple modalities missing cases, so as to guide the network's attention to those missing modalities. Besides, we adopt a new space projection pattern to align common vectors. Then, a Transformer encoder-decoder network is utilized to learn the missing modality features. At last, the outputs of the Transformer encoder are used for the final sentiment classification. Extensive experiments are conducted on CMU-MOSI and IEMOCAP datasets, showing that our method can achieve significant improvements compared with several baselines.
Abstract: The fine-grained sentiment classification (FGSC) task and the fine-grained controllable text generation (FGSG) task are two representative applications of sentiment analysis, which together form a pair of inverse tasks: the former aims to infer the fine-grained sentiment polarities given a text piece, while the latter generates text content that describes the input fine-grained opinions. Most of the existing work solves the FGSC and the FGSG tasks in isolation, ignoring the complementary benefits between them. This paper combines FGSC and FGSG as a joint dual learning system, encouraging them to learn from each other's advantages. Based on the dual learning framework, we further propose decoupling the feature representations in the two tasks into fine-grained aspect-oriented opinion variables and content variables respectively, by performing mutual disentanglement learning upon them. We also propose to transform the difficult "data-to-text" generation fashion widely used in FGSG into an easier text-to-text generation fashion by creating surrogate natural language text as the model inputs. Experimental results on 7 sentiment analysis benchmarks including both document-level and sentence-level datasets show that our method significantly outperforms the current strong-performing baselines on both the FGSC and FGSG tasks. Automatic and human evaluations demonstrate that our FGSG model successfully generates fluent, diverse and rich content conditioned on fine-grained sentiments.
Abstract: Cross-domain sentiment classification (CDSC) aims to use the transferable semantics learned from the source domain to predict the sentiment of reviews in the unlabeled target domain. Existing studies in this task attach more attention to the sequence modeling of sentences while largely ignoring the rich domain-invariant semantics embedded in graph structures (i.e., the part-of-speech tags and dependency relations). As an important aspect of exploring characteristics of language comprehension, adaptive graph representations have played an essential role in recent years. To this end, in the paper, we aim to explore the possibility of learning invariant semantic features from graph-like structures in CDSC. Specifically, we present Graph Adaptive Semantic Transfer (GAST) model, an adaptive syntactic graph embedding method that is able to learn domain-invariant semantics from both word sequences and syntactic graphs. More specifically, we first raise a POS-Transformer module to extract sequential semantic features from the word sequences as well as the part-of-speech tags. Then, we design a Hybrid Graph Attention (HGAT) module to generate syntax-based semantic features by considering the transferable dependency relations. Finally, we devise an Integrated aDaptive Strategy (IDS) to guide the joint learning process of both modules. Extensive experiments on four public datasets indicate that GAST achieves comparable effectiveness to a range of state-of-the-art models.
Abstract: Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis task designed to identify the polarity of a target aspect. Some works introduce various attention mechanisms to fully mine the relevant context words of different aspects, and use the traditional cross-entropy loss to fine-tune the models for the ABSA task. However, the attention mechanism paying partial attention to aspect-unrelated words inevitably introduces irrelevant noise. Moreover, the cross-entropy loss lacks discriminative learning of features, which makes it difficult to exploit the implicit information of intra-class compactness and inter-class separability. To overcome these challenges, we propose an Aspect Feature Distillation and Enhancement Network (AFDEN) for the ABSA task. We first propose a dual-feature extraction module to extract aspect-related and aspect-unrelated features through the attention mechanisms and graph convolutional networks. Then, to eliminate the interference of aspect-unrelated words, we design a novel aspect-feature distillation module containing a gradient reverse layer that learns aspect-unrelated contextual features through adversarial training, and an aspect-specific orthogonal projection layer to further project aspect-related features into the orthogonal space of aspect-unrelated features. Finally, we propose an aspect-feature enhancement module that leverages supervised contrastive learning to capture the implicit information between the same sentiment labels and between different sentiment labels. Experimental results on three public datasets demonstrate that our AFDEN model achieves state-of-the-art performance and verify the effectiveness and robustness of our model.
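Two of the building blocks named above, the gradient reverse layer and the aspect-specific orthogonal projection, are standard enough to sketch directly. The per-example projection below is one plausible reading of the abstract, not the authors' exact formulation; the lambda scaling is an assumed hyperparameter.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, reversed (scaled) gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd: float = 1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def orthogonal_projection(aspect_feat: torch.Tensor,
                          unrelated_feat: torch.Tensor) -> torch.Tensor:
    """Project aspect-related features onto the orthogonal complement of the
    aspect-unrelated features, per example."""
    u = F.normalize(unrelated_feat, dim=-1)
    parallel = (aspect_feat * u).sum(dim=-1, keepdim=True) * u
    return aspect_feat - parallel

# usage: adversarial branch sees GradReverse.apply(unrelated_feat, 1.0)
```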
Abstract: Recently, the aspect-opinion term pairs (AOTP) extraction task has gained substantial importance in the domain of aspect-based sentiment analysis. It intends to extract the potential pair of each aspect term with its corresponding opinion term present in a user review. Some existing studies heavily relied on the annotated aspect terms and/or opinion terms, or adopted external knowledge/resources to figure out the task. Therefore, in this study, we propose a novel end-to-end solution, called an Interactive AOTP (IAOTP) model, for exploring AOTP. The IAOTP model first tracks the boundary of each token in given aspect-specific and opinion-specific representations through a span-based operation. Next, it generates the candidate AOTP by formulating the dyadic relations between tokens through the Biaffine transformation. Then, it computes the positioning information to capture the significant distance relationship that each candidate pair holds. And finally, it jointly models collaborative interactions and prediction of AOTP through a 2D self-attention. Besides the IAOTP model, this study also proposes an independent aspect/opinion encoding model (a RS model) that formulates relational semantics to obtain aspect-specific and opinion-specific representations that can effectively perform the extraction of aspect and opinion terms. Detailed experiments conducted on the publicly available benchmark datasets for AOTP, aspect terms, and opinion terms extraction tasks, clearly demonstrate the significantly improved performance of our models relative to other competitive state-of-the-art baselines.
Abstract: A large-scale recommender system usually consists of recall and ranking modules. The goal of ranking modules (aka rankers) is to elaborately discriminate users' preference on item candidates proposed by recall modules. With the success of deep learning techniques in various domains, we have witnessed the mainstream rankers evolve from traditional models to deep neural models. However, the way that we design and use rankers remains unchanged: offline training the model, freezing the parameters, and deploying it for online serving. Actually, the candidate items are determined by specific user requests, in which underlying distributions (e.g., the proportion of items for different categories, the proportion of popular or new items) are highly different from one another in a production environment. The classical parameter-frozen inference manner cannot adapt to dynamic serving circumstances, making rankers' performance compromised. In this paper, we propose a new training and inference paradigm, termed as Ada-Ranker, to address the challenges of dynamic online serving. Instead of using parameter-frozen models for universal serving, Ada-Ranker can adaptively modulate parameters of a ranker according to the data distribution of the current group of item candidates. We first extract distribution patterns from the item candidates. Then, we modulate the ranker by the patterns to make the ranker adapt to the current data distribution. Finally, we use the revised ranker to score the candidate list. In this way, we empower the ranker with the capacity of adapting from a global model to a local model which better handles the current task. As a first study, we examine our Ada-Ranker paradigm in the sequential recommendation scenario. Experiments on three datasets demonstrate that Ada-Ranker can effectively enhance various base sequential models and also outperform a comprehensive set of competitive baselines.
Abstract: Side information fusion for sequential recommendation (SR) aims to effectively leverage various side information to enhance the performance of next-item prediction. Most state-of-the-art methods build on self-attention networks and focus on exploring various solutions to integrate the item embedding and side information embeddings before the attention layer. However, our analysis shows that the early integration of various types of embeddings limits the expressiveness of attention matrices due to a rank bottleneck and constrains the flexibility of gradients. Also, it involves mixed correlations among the different heterogeneous information resources, which brings extra disturbance to attention calculation. Motivated by this, we propose Decoupled Side Information Fusion for Sequential Recommendation (DIF-SR), which moves the side information from the input to the attention layer and decouples the attention calculation of various side information and item representation. We theoretically and empirically show that the proposed solution allows higher-rank attention matrices and flexible gradients to enhance the modeling capacity of side information fusion. Also, auxiliary attribute predictors are proposed to further activate the beneficial interaction between side information and item representation learning. Extensive experiments on four real-world datasets demonstrate that our proposed solution stably outperforms state-of-the-art SR models. Further studies show that our proposed solution can be readily incorporated into current attention-based SR models and significantly boost performance. Our source code is available at https://github.com/AIM-SE/DIF-SR.
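The decoupling described above can be read as computing attention logits separately for the item sequence and for each side-information sequence, fusing them before the softmax so that side information never enters the value path. The single-head sketch below illustrates that fusion; the projection-free form and the additive fusion are our own simplifications, not DIF-SR's released architecture.

```python
import torch
import torch.nn.functional as F

def decoupled_attention(item_emb: torch.Tensor,
                        side_embs: list,
                        d_k: int) -> torch.Tensor:
    """Single-head sketch: attention logits are computed per information source
    and summed; values come from item embeddings only.

    item_emb: (B, L, d); each element of side_embs: (B, L, d_s)
    """
    def logits(x: torch.Tensor) -> torch.Tensor:
        # learned query/key projections omitted for brevity
        return x @ x.transpose(-2, -1) / d_k ** 0.5

    attn_logits = logits(item_emb) + sum(logits(s) for s in side_embs)
    attn = F.softmax(attn_logits, dim=-1)
    return attn @ item_emb
```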
Abstract: For sequential recommenders, the coarse-grained yet sparse sequential signals mined from massive user-item interactions have become the bottleneck to further improving recommendation performance. To alleviate the sparseness problem, exploiting auxiliary semantic features (e.g., textual descriptions, visual images and knowledge graphs) to enrich contextual information has become a mainstream methodology. Though effective, we argue that these different heterogeneous features certainly include much noise which may overwhelm the valuable sequential signals, and therefore can easily lead to negative collaboration (i.e., 1 + 1 < 2). How to design a flexible strategy to select proper auxiliary information and alleviate the negative collaboration towards a better recommendation is still an interesting and open question. Unfortunately, few works have addressed this challenge in sequential recommendation. In this paper, we introduce a Multi-Agent RL-based Information Selection model (named MARIS) to explore an effective collaboration between different kinds of auxiliary information and sequential signals in an automatic way. Specifically, MARIS formalizes the auxiliary feature selection as a cooperative multi-agent Markov decision process. For each auxiliary feature type, MARIS resorts to using an agent to determine whether a specific kind of auxiliary feature should be imported to achieve a positive collaboration. In between, a QMIX network is utilized to coordinate their joint selection actions and produce an episode corresponding to an effective combination of different auxiliary features for the whole historical sequence. Considering the lack of supervised selection signals, we further devise a novel reward-guided sampling strategy that combines exploitation and exploration for episode sampling. By preserving episodes in a replay buffer, MARIS learns the action-value function and the reward alternately for optimization. Extensive experiments on four real-world datasets demonstrate that our model obtains significant performance improvements over state-of-the-art recommendation models.
Abstract: Sequential recommendation aims at identifying the next item that is preferred by a user based on their behavioral history. Compared to conventional sequential models that leverage attention mechanisms and RNNs, recent efforts mainly follow two directions for improvement: multi-interest learning and graph convolutional aggregation. Specifically, multi-interest methods, such as ComiRec and MIMN, focus on extracting different interests for a user by performing historical item clustering, while graph convolution methods, including TGSRec and SURGE, elect to refine user preferences based on multi-level correlations between historical items. Unfortunately, neither of them realizes that these two types of solutions can mutually complement each other, by aggregating multi-level user preferences to achieve more precise multi-interest extraction for a better recommendation. To this end, in this paper, we propose a unified multi-grained neural model (named MGNM) via a combination of multi-interest learning and graph convolutional aggregation. Concretely, MGNM first learns the graph structure and information aggregation paths of the historical items for a user. It then performs graph convolution to derive item representations in an iterative fashion, in which the complex preferences at different levels can be well captured. Afterwards, a novel sequential capsule network is proposed to inject sequential patterns into the multi-interest extraction process, leading to more precise interest learning in a multi-grained manner. Experiments on three real-world datasets from different scenarios demonstrate the superiority of MGNM against several state-of-the-art baselines. The performance gain over the best baseline is up to 27.10% and 25.17% in terms of NDCG@5 and HIT@5 respectively, which is one of the largest gains in the recent development of sequential recommendation. Further analysis also demonstrates that MGNM is robust and effective at understanding user preferences at multi-grained levels.
Abstract: In most real-world recommender systems, users interact with items in a sequential and multi-behavioral manner. Exploring the fine-grained relationship of items behind the users' multi-behavior interactions is critical in improving the performance of recommender systems. Despite the great successes, existing methods seem to have limitations on modelling heterogeneous item-level multi-behavior dependencies, capturing diverse multi-behavior sequential dynamics, or alleviating data sparsity problems. In this paper, we show it is possible to derive a framework to address all the above three limitations. The proposed framework MB-STR, a Multi-Behavior Sequential Transformer Recommender, is equipped with the multi-behavior transformer layer (MB-Trans), the multi-behavior sequential pattern generator (MB-SPG) and the behavior-aware prediction module (BA-Pred). Compared with a typical transformer, we design MB-Trans to capture multi-behavior heterogeneous dependencies as well as behavior-specific semantics, propose MB-SPG to encode the diverse sequential patterns among multiple behaviors, and incorporate BA-Pred to better leverage multi-behavior supervision. Comprehensive experiments on three real-world datasets show the effectiveness of MB-STR by significantly boosting the recommendation performance compared with various competitive baselines. Further ablation studies demonstrate the superiority of different modules of MB-STR.
Abstract: Sequential recommendation is a popular task in academic research and close to real-world application scenarios, where the goal is to predict the next action(s) of the user based on his/her previous sequence of actions. In the training process of recommender systems, the loss function plays an essential role in guiding the optimization of recommendation models to generate accurate suggestions for users. However, most existing sequential recommendation techniques focus on designing algorithms or neural network architectures, and few efforts have been made to tailor loss functions that fit naturally into the practical application scenario of sequential recommender systems. Ranking-based losses, such as cross-entropy and Bayesian Personalized Ranking (BPR), are widely used in the sequential recommendation area. We argue that such objective functions suffer from two inherent drawbacks: i) the dependencies among elements of a sequence are overlooked in these loss formulations; ii) instead of balancing accuracy (quality) and diversity, only generating accurate results has been overemphasized. We therefore propose two new loss functions based on the Determinantal Point Process (DPP) likelihood, which can be adaptively applied to estimate the subsequent item or items. The DPP-distributed item set captures natural dependencies among temporal actions, and a quality vs. diversity decomposition of the DPP kernel pushes us to go beyond accuracy-oriented loss functions. Experimental results using the proposed loss functions on three real-world datasets show marked improvements over state-of-the-art sequential recommendation methods in both quality and diversity metrics.
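To make the quality-versus-diversity decomposition concrete, here is a hedged PyTorch sketch of an L-ensemble DPP log-likelihood with L = diag(q) S diag(q), where q plays the role of per-item quality (e.g., predicted relevance) and S is a similarity kernel over item embeddings; the negated likelihood could then serve as a training loss. The kernel construction and the jitter term are illustrative assumptions, not the paper's exact loss functions.

```python
import torch


def dpp_log_likelihood(item_emb, quality, subset_idx, jitter=1e-6):
    """Log-likelihood of an observed item subset under an L-ensemble DPP.

    L = diag(q) @ S @ diag(q), with S a similarity kernel over item embeddings
    and q a positive per-item quality score (e.g., sigmoid of predicted relevance).
    Maximizing this likelihood for observed next-item sets trades off quality
    against diversity.
    """
    feats = torch.nn.functional.normalize(item_emb, dim=-1)
    S = feats @ feats.T                                    # diversity kernel
    L = quality.unsqueeze(1) * S * quality.unsqueeze(0)    # quality-scaled kernel
    L_sub = L[subset_idx][:, subset_idx]
    # log P(subset) = log det(L_subset) - log det(L + I); jitter keeps L_subset PD
    return (torch.logdet(L_sub + jitter * torch.eye(L_sub.size(0)))
            - torch.logdet(L + torch.eye(L.size(0))))


# usage sketch: loss = -dpp_log_likelihood(candidate_embs, torch.sigmoid(scores), observed_idx)
```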
Abstract: As a step beyond traditional personalized recommendation, group recommendation is the task of suggesting items that can satisfy a group of users. In group recommendation, the core is to design preference aggregation functions to obtain a quality summary of all group members' preferences. Such user and group preferences are commonly represented as points in the vector space (i.e., embeddings), where multiple user embeddings are compressed into one to facilitate ranking for group-item pairs. However, the resulting group representations, as points, lack adequate flexibility and capacity to account for the multi-faceted user preferences. Also, the point embedding-based preference aggregation is a less faithful reflection of a group's decision-making process, where all users have to agree on a certain value in each embedding dimension instead of a negotiable interval. In this paper, we propose a novel representation of groups via the notion of hypercubes, which are subspaces containing innumerable points in the vector space. Specifically, we design the hypercube recommender (CubeRec) to adaptively learn group hypercubes from user embeddings with minimal information loss during preference aggregation, and to leverage a revamped distance metric to measure the affinity between group hypercubes and item points. Moreover, to counteract the long-standing issue of data sparsity in group recommendation, we make full use of the geometric expressiveness of hypercubes and innovatively incorporate self-supervision by intersecting two groups. Experiments on four real-world datasets have validated the superiority of CubeRec over state-of-the-art baselines.
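The hypercube idea can be sketched with a (center, offset) box per group and a distance that penalizes items lying outside the box more heavily than items inside it; the specific aggregation and the down-weighting factor alpha below are assumptions for illustration, not CubeRec's learned modules.

```python
import torch


def hypercube_item_distance(center, offset, item, alpha=0.5):
    """Distance between a group hypercube and an item point.

    center, offset: (d,) box centre and non-negative half-widths
    item:           (d,) item embedding
    The outside-the-box distance dominates; the inside-the-box distance is
    down-weighted by alpha, so any point inside the cube counts as a near match.
    """
    delta = (item - center).abs()
    dist_out = torch.clamp(delta - offset, min=0).norm(p=1)  # how far outside the box
    dist_in = torch.minimum(delta, offset).norm(p=1)         # distance to centre, capped at the faces
    return dist_out + alpha * dist_in


def group_hypercube(member_embs):
    """A simple aggregation: centre = member mean, offset = elementwise spread."""
    center = member_embs.mean(dim=0)
    offset = (member_embs - center).abs().max(dim=0).values
    return center, offset
```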
Abstract: Session-based recommendation (SBR) aims to predict a user's next clicked item based on an anonymous yet short interaction sequence. Previous SBR models, which rely only on the limited short-term transition information without utilizing extra valuable knowledge, have suffered a lot from the problem of data sparsity. This paper proposes a novel mirror graph enhanced neural model for session-based recommendation (MGS), to exploit item attribute information over item embeddings for more accurate preference estimation. Specifically, MGS utilizes two kinds of graphs to learn item representations. One is a session graph generated from the user interaction sequence, describing users' preferences based on transition patterns. Another is a mirror graph built by an attribute-aware module that selects the most attribute-representative information for each session item by integrating items' attribute information. We apply an iterative dual refinement mechanism to propagate information between the session and mirror graphs. To further guide the training process of the attribute-aware module, we also introduce a contrastive learning strategy that compares two mirror graphs generated for the same session by randomly sampling the attribute-same neighbors. Experiments on three real-world datasets exhibit that the performance of MGS surpasses many state-of-the-art models.
Abstract: Session-based recommendation aims to predict items that an anonymous user would like to purchase based on her short behavior sequence. The current approaches towards session-based recommendation only focus on modeling users' interest preferences, while they all ignore a key attribute of an item, i.e., the price. Many marketing studies have shown that the price factor significantly influences users' behaviors and the purchase decisions of users are determined by both price and interest preferences simultaneously. However, it is nontrivial to incorporate price preferences for session-based recommendation. Firstly, it is hard to handle heterogeneous information from various features of items to capture users' price preferences. Secondly, it is difficult to model the complex relations between price and interest preferences in determining user choices. To address the above challenges, we propose a novel method Co-guided Heterogeneous Hypergraph Network (CoHHN) for session-based recommendation. Towards the first challenge, we devise a heterogeneous hypergraph to represent heterogeneous information and rich relations among them. A dual-channel aggregating mechanism is then designed to aggregate various information in the heterogeneous hypergraph. After that, we extract users' price preferences and interest preferences via attention layers. As to the second challenge, a co-guided learning scheme is designed to model the relations between price and interest preferences and enhance the learning of each other. Finally, we predict user actions based on item features and users' price and interest preferences. Extensive experiments on three real-world datasets demonstrate the effectiveness of the proposed CoHHN. Further analysis reveals the significance of price for session-based recommendation.
Abstract: Session-based recommendation aims to predict the next click action (e.g., item) of anonymous users based on a fixed number of previous actions. Recently, Graph Neural Networks (GNNs) have shown superior performance in various applications. Inspired by the success of GNNs, tremendous endeavors have been devoted to introducing GNNs into session-based recommendation, achieving significant results. Nevertheless, due to the highly diverse types of potential information in sessions, existing GNN-based methods perform differently on different session datasets, leading to the need for efficient design of neural networks adapted to various session recommendation scenarios. To address this problem, we propose Automated neural architecture search for Graph-based Session Recommendation, namely AutoGSR, a framework that provides a practical and general solution to automatically find the optimal GNN-based session recommendation model. In AutoGSR, we propose two novel GNN operations to build an expressive and compact search space. Building upon the search space, we employ a differentiable search algorithm to search for the optimal graph neural architecture. Furthermore, to consider all types of session information together, we propose to learn the item meta knowledge, which acts as prior knowledge for guiding the optimization of final session representations. Comprehensive experiments on three real-world datasets demonstrate that AutoGSR is able to find effective neural architectures and achieve state-of-the-art results. To the best of our knowledge, we are the first to study neural architecture search for session-based recommendation.
Abstract: As an emerging paradigm, session-based recommendation is aimed at recommending the next item based on a set of anonymous sessions. Effectively representing a session that is normally a short interaction sequence renders a major technical challenge. In view of the limitations of pioneering studies that explore collaborative information from other sessions, in this paper we propose a new direction to enhance session representations by learning multi-faceted session-independent global item relations. In particular, we identify three types of advantageous global item relations, including negative relations that have not been studied before, and propose different graph construction methods to capture such relations. We then devise a novel multi-faceted global item relation (MGIR) model to encode different relations using different aggregation layers and generate enhanced session representations by fusing positive and negative relations. Our solution is flexible to accommodate new item relations and can easily integrate existing session representation learning methods to generate better representations from global relation enhanced session information. Extensive experiments on three benchmark datasets demonstrate the superiority of our model over a large number of state-of-the-art methods. Specifically, we show that learning negative relations is critical for session-based recommendation.
Abstract: Social media enable users to share their feelings and emotional struggles. They also offer an opportunity to provide community support to suicidal users. Recent studies on suicide risk assessment have explored the user's historic timeline and information from their social network to analyze their emotional state. However, such methods often require a large amount of user-centric data. A less intrusive alternative is to only use conversation trees arising from online community responses. Modeling such online conversations between the community and a person in distress is an important context for understanding that person's mental state. However, it is not trivial to model the vast number of conversation trees on social media, since each comment has a diverse influence on a user in distress. Typically, a handful of comments/posts receive a significantly high number of replies, which results in scale-free dynamics in the conversation tree. Moreover, psychological studies suggest that it is important to capture the fine-grained temporal irregularities in the release of vast volumes of comments, since suicidal users react quickly to online community support. Motivated by these observations and psychological studies, we propose HCN, a Hyperbolic Conversation Network, which is a less user-intrusive method for suicide ideation detection. HCN leverages the hyperbolic space to represent the scale-free dynamics of online conversations. Through extensive quantitative, qualitative, and ablative experiments on real-world Twitter data, we find that HCN outperforms state-of-the-art methods, while using 98% less user-specific data, and while maintaining a 74% lower carbon footprint and a 94% smaller model size. We also find that the comments within the first half hour are most important for identifying at-risk users.
Abstract: This paper develops a novel unsupervised algorithm for belief representation learning in polarized networks that (i) uncovers the latent dimensions of the underlying belief space and (ii) jointly embeds users and content items (that they interact with) into that space in a manner that facilitates a number of downstream tasks, such as stance detection, stance prediction, and ideology mapping. Inspired by total correlation in information theory, we propose the Information-Theoretic Variational Graph Auto-Encoder (InfoVGAE) that learns to project both users and content items (e.g., posts that represent user views) into an appropriate disentangled latent space. To better disentangle latent variables in that space, we develop a total correlation regularization module and a Proportional-Integral (PI) control module, and adopt a rectified Gaussian distribution to ensure orthogonality. The latent representation of users and content can then be used to quantify their ideological leaning and detect/predict their stances on issues. We evaluate the performance of the proposed InfoVGAE on three real-world datasets, of which two are collected from Twitter and one from U.S. Congress voting records. The evaluation results show that our model outperforms state-of-the-art unsupervised models by reducing user clustering errors by 10.5% and achieving 12.1% higher F1 scores for stance separation of content items. In addition, InfoVGAE produces results comparable to those of supervised models. We also discuss its performance on stance prediction and user ranking within ideological groups.
Abstract: Detecting cyberbullying from memes is highly challenging because of the presence of implicit affective content, which is often sarcastic, and their multi-modality (image + text). The current work is, to the best of our knowledge, the first attempt at investigating the role of sentiment, emotion and sarcasm in identifying cyberbullying from multi-modal memes in a code-mixed language setting. As a contribution, we have created a benchmark multi-modal meme dataset called MultiBully annotated with bully, sentiment, emotion and sarcasm labels collected from open-source Twitter and Reddit platforms. Moreover, the severity of the cyberbullying posts is also investigated by adding a harmfulness score to each of the memes. The created dataset consists of two modalities, text and image. Most of the texts in our dataset are in code-mixed form, which captures the seamless transitions between languages for multilingual users. Two different multimodal multitask frameworks (BERT+ResNET-Feedback and CLIP-CentralNet) have been proposed for cyberbullying detection (CD), the three auxiliary tasks being sentiment analysis (SA), emotion recognition (ER) and sarcasm detection (SAR). Experimental results indicate that compared to uni-modal and single-task variants, the proposed frameworks improve the performance of the main task, i.e., CD, by 3.18% and 3.10% in terms of accuracy and F1 score, respectively.
Abstract: Increased social media use has contributed to the greater prevalence of abusive, rude, and offensive textual comments. Machine learning models have been developed to detect toxic comments online, yet these models tend to show biases against users with marginalized or minority identities (e.g., females and African Americans). Established research in debiasing toxicity classifiers often (1) takes a static or batch approach, assuming that all information is available and then making a one-time decision; and (2) uses a generic strategy to mitigate different biases (e.g., gender and racial biases) that assumes the biases are independent of one another. However, in real scenarios, the input typically arrives as a sequence of comments/words over time instead of all at once. Thus, decisions based on partial information must be made while additional input is arriving. Moreover, social bias is complex by nature. Each type of bias is defined within its unique context, which, consistent with intersectionality theory within the social sciences, might be correlated with the contexts of other forms of bias. In this work, we consider debiasing toxicity detection as a sequential decision-making process where different biases can be interdependent. In particular, we study debiasing toxicity detection with two aims: (1) to examine whether different biases tend to correlate with each other; and (2) to investigate how to jointly mitigate these correlated biases in an interactive manner to minimize the total amount of bias. At the core of our approach is a framework built upon theories of sequential Markov Decision Processes that seeks to maximize the prediction accuracy and minimize the bias measures tailored to individual biases. Evaluations on two benchmark datasets empirically validate the hypothesis that biases tend to be correlated and corroborate the effectiveness of the proposed sequential debiasing strategy.
Abstract: The diffusion of rumors on social media generally follows a propagation tree structure, which provides valuable clues on how an original message is transmitted and responded to by users over time. Recent studies reveal that rumor verification and stance detection are two relevant tasks that can jointly enhance each other despite their differences. For example, rumors can be debunked by cross-checking the stances conveyed by their relevant posts, and stances are also conditioned on the nature of the rumor. However, stance detection typically requires a large training set of labeled stances at post level, which are rare and costly to annotate. Inspired by the Multiple Instance Learning (MIL) scheme, we propose a novel weakly supervised joint learning framework for rumor verification and stance detection which only requires bag-level class labels concerning the rumor's veracity. Specifically, based on the propagation trees of source posts, we convert the two multi-class problems into multiple MIL-based binary classification problems where each binary model is focused on differentiating a target class (of rumor or stance) from the remaining classes. Then, we propose a hierarchical attention mechanism to aggregate the binary predictions, including (1) a bottom-up/top-down tree attention layer to aggregate binary stances into binary veracity; and (2) a discriminative attention layer to aggregate the binary classes into finer-grained classes. Extensive experiments conducted on three Twitter-based datasets demonstrate promising performance of our model on both claim-level rumor detection and post-level stance classification compared with state-of-the-art methods.
Abstract: Many recent Natural Language Processing (NLP) task formulations, such as question answering and fact verification, are implemented as a two-stage cascading architecture. In the first stage an IR system retrieves "relevant" documents containing the knowledge, and in the second stage an NLP system performs reasoning to solve the task. Optimizing the IR system for retrieving relevant documents ensures that the NLP system has sufficient information to operate over. These recent NLP task formulations raise interesting and exciting challenges for IR, where the end-user of an IR system is not a human with an information need, but another system exploiting the documents retrieved by the IR system to perform reasoning and address the user information need. Among these challenges, as we will show, is that noise from the IR system, such as retrieving spurious or irrelevant documents, can negatively impact the accuracy of the downstream reasoning module. Hence, there is the need to balance maximizing relevance while minimizing noise in the IR system. This paper presents experimental results on two NLP tasks implemented as a two-stage cascading architecture. We show how spurious or irrelevant retrieved results from the first stage can induce errors in the second stage. We use these results to ground our discussion of the research challenges that the IR community should address in the context of these knowledge-intensive NLP tasks.
Abstract: Cross-lingual information retrieval (CLIR) aims to provide access to information across languages. Recent pre-trained multilingual language models have brought large improvements to natural language tasks, including cross-lingual ad-hoc retrieval. However, pseudo-relevance feedback (PRF), a family of techniques for improving ranking using the contents of top initially retrieved items, has not been explored with neural CLIR retrieval models. Two of the challenges are incorporating feedback from long documents and cross-language knowledge transfer. To address these challenges, we propose a novel neural CLIR architecture, NCLPRF, capable of incorporating PRF feedback from multiple, potentially long documents, which enables improvements to the query representation in the shared semantic space between query and document languages. The additional information that the feedback documents provide in a target language can enrich the query representation, bringing it closer to relevant documents in the embedding space. The proposed model exhibits significant improvements over traditional and SOTA neural CLIR baselines across three CLIR test collections in Chinese, Russian, and Persian.
Abstract: Session-based Recommendation (SBR) refers to the task of predicting the next item based on short-term user behaviors within an anonymous session. However, session embedding learned by a non-linear encoder is usually not in the same representation space as item embeddings, resulting in the inconsistent prediction issue while recommending items. To address this issue, we propose a simple and effective framework named CORE, which can unify the representation space for both the encoding and decoding processes. Firstly, we design a representation-consistent encoder that takes the linear combination of input item embeddings as session embedding, guaranteeing that sessions and items are in the same representation space. Besides, we propose a robust distance measuring method to prevent overfitting of embeddings in the consistent representation space. Extensive experiments conducted on five public real-world datasets demonstrate the effectiveness and efficiency of the proposed method. The code is available at: https://github.com/RUCAIBox/CORE.
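A minimal sketch of the representation-consistent encoding described above: the session vector is a linear combination of its item embeddings (so it stays in the item embedding space) and is scored against candidates with a temperature-scaled cosine similarity standing in for the paper's robust distance; the uniform weights and the temperature value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def core_style_scores(session_item_embs, all_item_embs, weights=None, tau=0.07):
    """Score candidates for a session whose embedding lives in the item space.

    session_item_embs: (L, d) embeddings of the items in the session
    all_item_embs:     (N, d) embeddings of the candidate catalogue
    weights:           (L,) combination weights; uniform here, learned in CORE
    """
    if weights is None:
        weights = torch.full((session_item_embs.size(0),),
                             1.0 / session_item_embs.size(0))
    session_emb = weights @ session_item_embs          # stays in the item space
    session_emb = F.normalize(session_emb, dim=-1)
    items = F.normalize(all_item_embs, dim=-1)
    return (items @ session_emb) / tau                 # logits over the catalogue
```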
Abstract: Learning individual-level treatment effect is a fundamental problem in causal inference and has received increasing attention in many areas, especially in the user growth area which concerns many internet companies. Recently, disentangled representation learning methods that decompose covariates into three latent factors, including instrumental, confounding and adjustment factors, have witnessed great success in treatment effect estimation. However, it remains an open problem how to learn the underlying disentangled factors precisely. Specifically, previous methods fail to obtain independent disentangled factors, which is a necessary condition for identifying treatment effect. In this paper, we propose Disentangled Representations for Counterfactual Regression via Mutual Information Minimization (MIM-DRCFR), which uses a multi-task learning framework to share information when learning the latent factors and incorporates MI minimization learning criteria to ensure the independence of these factors. Extensive experiments including public benchmarks and real-world industrial user growth datasets demonstrate that our method performs much better than state-of-the-art methods.
Abstract: Recently micro-videos have become more popular in social media platforms such as TikTok and Instagram. Engagements in these platforms are facilitated by multi-modal recommendation systems. Indeed, such multimedia content can involve diverse modalities, often represented as visual, acoustic, and textual features to the recommender model. Existing works in micro-video recommendation tend to unify the multi-modal channels, thereby treating each modality with equal importance. However, we argue that these approaches are not sufficient to encode item representations with multiple modalities, since the used methods cannot fully disentangle the users' tastes on different modalities. To tackle this problem, we propose a novel learning method named Multi-Modal Graph Contrastive Learning (MMGCL), which aims to explicitly enhance multi-modal representation learning in a self-supervised learning manner. In particular, we devise two augmentation techniques to generate the multiple views of a user/item: modality edge dropout and modality masking. Furthermore, we introduce a novel negative sampling technique that allows the model to learn the correlation between modalities and ensures the effective contribution of each modality. Extensive experiments conducted on two micro-video datasets demonstrate the superiority of our proposed MMGCL method over existing state-of-the-art approaches in terms of both recommendation performance and training convergence speed.
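The two augmentations named above can be sketched as simple graph and feature transforms; the dropout rate, the zero-masking choice, and the dictionary-based data layout are assumptions for illustration only.

```python
import torch


def modality_edge_dropout(edge_index_by_mod, p=0.2):
    """Randomly drop a fraction of edges within each modality-specific graph."""
    out = {}
    for mod, edge_index in edge_index_by_mod.items():   # edge_index: (2, E) tensor
        keep = torch.rand(edge_index.size(1)) > p
        out[mod] = edge_index[:, keep]
    return out


def modality_masking(feats_by_mod, mask_mod):
    """Zero out one modality's features to build a second contrastive view."""
    return {mod: torch.zeros_like(x) if mod == mask_mod else x
            for mod, x in feats_by_mod.items()}
```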
Abstract: Sequential recommendation methods are very important in modern recommender systems because they can well capture users' dynamic interests from their interaction history, and make accurate recommendations for users, thereby helping enterprises succeed in business. However, despite the great success of existing sequential recommendation methods, they focus too much on item-level modeling of users' click history and lack holistic information about the user's entire click history (such as click order and click time). To tackle this problem, inspired by recent advances in pre-training techniques in the field of natural language processing, we build a new pre-training task based on the original BERT pre-training framework and incorporate temporal information. Specifically, we propose a new model called the REarrange Sequence prE-training and Time embedding model via BERT for sequential Recommendation (RESETBERT4Rec). It further captures the information of the user's whole click history by adding a rearrange sequence prediction task to the original BERT pre-training framework, while integrating different views of time information. Comprehensive experiments on two public datasets as well as one e-commerce dataset demonstrate that RESETBERT4Rec achieves state-of-the-art performance over existing baselines.
Abstract: Sequential recommender systems (SRSs) have become a research hotspot recently due to their powerful ability in capturing users' dynamic preferences. The key idea behind SRSs is to model the sequential dependencies over the user-item interactions. However, we argue that users' preferences are not only determined by the items they view or purchase but also affected by the item-providers with which users have interacted. For instance, in a short-video scenario, a user may click on a video because he/she is attracted to either the video content or simply the video-provider, as the vlogger is his/her idol. Motivated by the above observations, in this paper, we propose IPSRec, a novel Item-Provider co-learning framework for Sequential Recommendation. Specifically, we propose two representation learning methods (single-stream and cross-stream) to learn comprehensive item and user representations based on the user's historical item sequence and provider sequence. Then, contrastive learning is employed to further enhance the user embeddings in a self-supervised manner, which treats the representations of a specific user learned from the item side and the item-provider side as the positive pair and treats the representations of different users in the batch as the negative samples. Extensive experiments on three real-world SRS datasets demonstrate that IPSRec achieves substantially better results than the strong competitors. For reproducibility, our code and data are available at https://github.com/siat-nlp/IPSRec.
Abstract: Recommender Systems (RS), as an efficient tool for discovering items of interest to users from a very large corpus, have attracted more and more attention from academia and industry. As the initial stage of RS, large-scale matching is fundamental yet challenging. A typical recipe is to learn user and item representations with a two-tower architecture and then calculate the similarity score between the two representation vectors, which, however, still struggles to properly deal with negative samples. In this paper, we find that the common practice of randomly sampling negative samples from the entire space and treating them equally is not an optimal choice, since the negative samples from different sub-spaces at different stages have different importance to a matching model. To address this issue, we propose a novel method named Unbiased Model-Agnostic Matching Approach (UMA2). It consists of two basic modules: 1) a General Matching Model (GMM), which is model-agnostic and can be implemented as any embedding-based two-tower model; and 2) a Negative Samples Debias Network (NSDN), which discriminates negative samples by borrowing the idea of Inverse Propensity Weighting (IPW) and re-weighs the loss in GMM. UMA2 seamlessly integrates these two modules in an end-to-end multi-task learning framework. Extensive experiments on both a real-world offline dataset and an online A/B test demonstrate its superiority over state-of-the-art methods.
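A hedged sketch of the IPW-style re-weighting idea in a two-tower matching loss: each sampled negative is weighted by the inverse of an estimated propensity (here supplied as a hypothetical `neg_propensity` input in place of the paper's debias network). The binary-cross-entropy form and the clamping constant are illustrative assumptions, not UMA2's exact loss.

```python
import torch
import torch.nn.functional as F


def ipw_matching_loss(user_emb, pos_item_emb, neg_item_embs, neg_propensity):
    """Two-tower matching loss with IPW-style re-weighting of negatives.

    user_emb:       (batch, d) user-tower outputs
    pos_item_emb:   (batch, d) matching item-tower outputs
    neg_item_embs:  (num_neg, d) sampled negatives shared across the batch
    neg_propensity: (num_neg,) estimated probability of drawing each negative;
                    its inverse re-weights that negative's contribution.
    """
    pos_logit = (user_emb * pos_item_emb).sum(-1)            # (batch,)
    neg_logits = user_emb @ neg_item_embs.T                  # (batch, num_neg)
    w = 1.0 / neg_propensity.clamp(min=1e-3)                 # inverse propensity weights
    pos_loss = F.softplus(-pos_logit)                        # -log sigmoid(pos_logit)
    neg_loss = (w * F.softplus(neg_logits)).sum(-1)          # weighted -log(1 - sigmoid(neg))
    return (pos_loss + neg_loss).mean()
```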
Abstract: Existing methods usually identify causal relations between events at the mention level, which takes each event mention pair as a separate input. As a result, they either suffer from conflicts among causal relations predicted separately or require a set of additional constraints to resolve such conflicts. We propose to study this task in a more realistic setting, where event-level causality identification can be made. The advantage is twofold: 1) by modeling different mentions of an event as a single unit, there are no more conflicts among predicted results and no extra constraints are needed; 2) with the use of diverse knowledge sources (e.g., co-occurrence and coreference relations), a rich graph-based event structure can be induced from the document to support event-level causal inference. A graph convolutional network is used to encode such structural information, aiming to capture the local and non-local dependencies among nodes. Results show that our model achieves the best performance under both mention- and event-level settings, outperforming a number of strong baselines by at least 2.8% in F1 score.
Abstract: A data lake is a repository for massive raw and heterogeneous data, which includes multiple data models with different data schemas and query interfaces. Keyword search can extract valuable information for users without knowledge of the underlying schemas and query languages. However, conventional keyword searches are restricted to a certain data model and cannot easily adapt to a data lake. In this paper, we study a novel keyword search method for data lakes. To achieve high accuracy and efficiency, we introduce canonical graphs and then integrate semantically related vertices based on vertex representations. A matching-entity-based keyword search algorithm is presented to find answers across multiple data sources. Finally, an extensive experimental study shows the effectiveness and efficiency of our solution.
Abstract: Engaging all content providers, including newcomers or minority demographic groups, is crucial for online platforms to keep growing and working. Hence, while building recommendation services, the interests of those providers should be valued. In this paper, we consider providers as grouped based on a common characteristic in settings in which certain provider groups have low representation of items in the catalog and, thus, in the user interactions. Then, we envision a scenario wherein platform owners seek to control the degree of exposure to such groups in the recommendation process. To support this scenario, we rely on disparate exposure measures that characterize the gap between the share of recommendations given to groups and the target level of exposure pursued by the platform owners. We then propose a re-ranking procedure that ensures desired levels of exposure are met. Experiments show that, while supporting certain groups of providers by granting them the target exposure, beyond-accuracy objectives experience significant gains with negligible impact on recommendation utility.
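One way to picture such a re-ranking procedure is a greedy pass that trades relevance against the gap between a group's realized exposure and its target share; the penalty form and weight below are assumptions for illustration, not the paper's disparate exposure measures.

```python
def rerank_with_target_exposure(candidates, target_share, k, lam=0.5):
    """Greedy re-ranking toward a target exposure share per provider group.

    candidates:   list of (item_id, group, relevance) tuples
    target_share: dict group -> desired share of the top-k slots
    At each slot, an item's score is its relevance minus a penalty proportional
    to how far its group's exposure would rise above the target share.
    """
    chosen, counts = [], {g: 0 for g in target_share}
    pool = list(candidates)
    for pos in range(min(k, len(pool))):
        def score(cand):
            _, group, rel = cand
            new_share = (counts[group] + 1) / (pos + 1)
            return rel - lam * max(0.0, new_share - target_share[group])
        best = max(pool, key=score)
        pool.remove(best)
        chosen.append(best[0])
        counts[best[1]] += 1
    return chosen
```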
Abstract: Brain-inspired hyperdimensional computing (HDC) has been introduced as an alternative computing paradigm to achieve efficient and robust learning. HDC simulates cognitive tasks by mapping all data points to patterns of neural activity in a high-dimensional space, and has demonstrated promising performance in a wide range of applications such as robotics, biomedical signal processing, and genome sequencing. Language tasks, generally solved using machine learning methods, are widely deployed on low-power embedded devices. However, existing HDC solutions suffer from major challenges that impede deployment on low-power embedded devices: the storage and computation overhead of HDC models grows dramatically with (i) the number of dimensions and (ii) the complex similarity metric used during inference. In this paper, we propose a novel ensemble framework for language tasks, termed L3E-HD, which enables efficient HDC on low-power edge devices. L3E-HD accelerates inference by mapping data points to a high-dimensional binary space to simplify similarity search, the most costly and frequent operation in HDC. Through marrying HDC with the ensemble technique, L3E-HD also addresses the severe accuracy degradation induced by the compression of the dimension and precision of the model. Our experiments show that the ensemble technique is naturally a perfect fit for boosting HDC. We find that our L3E-HD, which is faster, more efficient, and more accurate than conventional machine learning methods, can even surpass the accuracy of the full-precision model at a smaller model size. Code is released at: https://github.com/MXHX7199/SIGIR22-EnsembleHDC.
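The core efficiency argument, replacing a complex similarity metric on real-valued hypervectors with a bit-wise comparison in a binary space, can be sketched in a few lines of NumPy; the dimensionality, the sign binarization, and the toy language labels are assumptions for illustration only.

```python
import numpy as np


def binarize(hv):
    """Map a real-valued hypervector to {+1, -1}."""
    return np.where(hv >= 0, 1, -1).astype(np.int8)


def hamming_similarity(a, b):
    """Similarity of two binary hypervectors: fraction of matching positions.

    This bit-wise comparison replaces cosine similarity on real vectors, the
    kind of simplification that makes inference cheap on edge hardware.
    """
    return np.count_nonzero(a == b) / a.size


# toy usage: class prototypes and a query, each a 10,000-d binary hypervector
rng = np.random.default_rng(0)
prototypes = {c: binarize(rng.standard_normal(10_000)) for c in ("en", "fr", "de")}
query = binarize(rng.standard_normal(10_000))
prediction = max(prototypes, key=lambda c: hamming_similarity(prototypes[c], query))
```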
Abstract: With the success of deep learning, click-through rate (CTR) predictions are transitioning from shallow approaches to deep architectures. Current deep CTR prediction usually follows the Embedding & MLP paradigm, where the model embeds categorical features into latent semantic space. This paper introduces a novel embedding technique called neural statistics that instead learns explicit semantics of categorical features by incorporating feature engineering as an innate prior into the deep architecture in an end-to-end manner. Besides, since the statistical information changes over time, we study how to adapt to the distribution shift in the MLP module efficiently. Offline experiments on two public datasets validate the effectiveness of neural statistics against state-of-the-art models. We also apply it to a large-scale recommender system via online A/B tests, where the user's satisfaction is significantly improved.
Abstract: Graph Neural Networks (GNNs) provide a class of powerful architectures that are effective for graph-based collaborative filtering. Nevertheless, GNNs are known to be vulnerable to adversarial perturbations. Adversarial training is a simple yet effective way to improve the robustness of neural models. For example, many prior studies inject adversarial perturbations into either node features or hidden layers of GNNs. However, perturbing graph structures has been far less studied in recommendation. To bridge this gap, we propose AdvGraph to model adversarial graph perturbations during the training of GNNs. Our AdvGraph is mainly based on min-max robust optimization, where a universal graph perturbation is obtained through an inner maximization while the outer optimization aims to compute the model parameters of GNNs. However, directly optimizing the inner problem is challenging due to the discrete nature of the graph perturbations. To address this issue, an unbiased gradient estimator is further proposed to compute the gradients of discrete variables. Extensive experiments demonstrate that our AdvGraph is able to enhance the generalization performance of GNN-based recommenders.
Abstract: While Graph Convolutional Networks (GCNs) have been extended to various fields of artificial intelligence with their powerful representation capabilities, recent studies have revealed that their ability to capture the part-whole structure of the graph is limited. Furthermore, though many GCN variants have been proposed and have obtained state-of-the-art results, they face the problem that much early-layer information may be lost during the graph convolution step. To this end, we present a Graph Capsule Network with a Dual Adaptive Mechanism (DA-GCN) to tackle the above challenges. Specifically, the dual adaptive mechanism captures the part-whole structure of the graph through two modules. One is an adaptive node interaction module to explore the potential relationship between interactive nodes. The other is an adaptive attention-based graph dynamic routing that selects appropriate graph capsules, so that only favorable graph capsules are gathered and redundant graph capsules are suppressed, better capturing the whole structure of the graph. Experiments demonstrate that our proposed algorithm achieves the most advanced or competitive results on all datasets.
Abstract: Hierarchical Text Classification (HTC) is a challenging task where a document can be assigned to multiple hierarchically structured categories within a taxonomy. The majority of prior studies consider HTC as a flat multi-label classification problem, which inevitably leads to the "label inconsistency" problem. In this paper, we formulate HTC as a sequence generation task and introduce a sequence-to-tree framework (Seq2Tree) for modeling the hierarchical label structure. Moreover, we design a constrained decoding strategy with a dynamic vocabulary to secure the label consistency of the results. Compared with previous works, the proposed approach achieves significant and consistent improvements on three benchmark datasets.
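The constrained decoding idea can be illustrated by a dynamic vocabulary that, at each step, exposes only the children of the last generated label, so every decoded sequence is a valid root-to-node path in the taxonomy; the toy taxonomy and the `ROOT` convention below are assumptions, and in practice the returned set would be used to mask the decoder's output logits.

```python
def constrained_next_labels(taxonomy, generated_path):
    """Dynamic vocabulary for one decoding step.

    taxonomy: dict mapping a label to its list of child labels ("ROOT" at the top).
    Only children of the most recently generated label may be produced next, so
    label consistency holds by construction.
    """
    last = generated_path[-1] if generated_path else "ROOT"
    return taxonomy.get(last, [])


# toy taxonomy: ROOT -> {science, sports}, science -> {physics, biology}
taxonomy = {"ROOT": ["science", "sports"],
            "science": ["physics", "biology"],
            "sports": ["soccer"]}
assert constrained_next_labels(taxonomy, ["science"]) == ["physics", "biology"]
```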
Abstract: In the era of big data, eXtreme Multi-label Classification (XMC) has already become one of the most essential research tasks for dealing with enormous label spaces in machine learning applications. Instead of assessing every individual label, most XMC methods rely on label trees or filters to derive short ranked label lists as predictions, thereby reducing computational overhead. Specifically, existing studies obtain ranked label lists with a fixed length for prediction and evaluation. However, such predictions are unreasonable since data points have varied numbers of relevant labels. Excessively small or large list lengths in evaluation, such as Precision@5 and Recall@100, can also lead to ignoring other relevant labels or tolerating many irrelevant labels. In this paper, we aim to provide reasonable predictions for extreme multi-label classification with dynamic numbers of predicted labels. In particular, we propose a novel framework, Model-Agnostic List Truncation with Ordinal Regression (MALTOR), to leverage the ranking properties and truncate long ranked label lists for better accuracy. Extensive experiments conducted on six large-scale real-world benchmark datasets demonstrate that MALTOR significantly outperforms statistical baseline methods and conventional ranked list truncation methods in ad-hoc retrieval with both linear and deep XMC models. The results of an ablation study also show the effectiveness of each individual component of our proposed MALTOR.
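As a sketch of list truncation via ordinal regression, the cumulative-link head below compares a single learned score of the ranked list against ordered thresholds and predicts how many labels to keep; the feature design, threshold parameterization, and decision rule are assumptions, not MALTOR's actual components.

```python
import torch
import torch.nn as nn


class OrdinalTruncationHead(nn.Module):
    """Predict where to cut a ranked label list with a cumulative-link model.

    A single score g(x) is compared against max_len - 1 learned thresholds;
    P(cut position > k) = sigmoid(g(x) - theta_k). The predicted length is the
    number of thresholds the score exceeds with probability above one half.
    """

    def __init__(self, feat_dim, max_len):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)
        self.thresholds = nn.Parameter(torch.arange(max_len - 1, dtype=torch.float))

    def forward(self, list_features):
        g = self.score(list_features)                 # (batch, 1)
        return torch.sigmoid(g - self.thresholds)     # (batch, max_len - 1)

    def predict_length(self, list_features):
        probs = self.forward(list_features)
        return 1 + (probs > 0.5).sum(dim=-1)          # keep at least one label
```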
Abstract: Target-oriented opinion words extraction (TOWE) is a subtask of aspect-based sentiment analysis (ABSA). Given a sentence and an aspect term occurring in the sentence, TOWE extracts the corresponding opinion words for the aspect term. TOWE has two types of instances. In the first type, aspect terms are associated with at least one opinion word, while in the second type, aspect terms do not have corresponding opinion words. However, previous studies trained and evaluated their models with only the first type of instance, resulting in a sample selection bias problem. Specifically, TOWE models were trained with only the first type of instance, while these models would be utilized to make inferences on the entire space with both types of instances. Thus, the generalization performance is hurt. Moreover, the performance of these models on the first type of instance cannot reflect their performance on the entire space. To validate the sample selection bias problem, four popular TOWE datasets containing only aspect terms associated with at least one opinion word are extended to additionally include aspect terms without corresponding opinion words. Experimental results on these datasets show that training TOWE models on the entire space significantly improves model performance and that evaluating TOWE models only on the first type of instance overestimates model performance.
Abstract: Current conversational passage retrieval systems cast conversational search into ad-hoc search by using an intermediate query resolution step that places the user's question in the context of the conversation. While the proposed methods have proven effective, they still assume the availability of large-scale question resolution and conversational search datasets. To waive the dependency on the availability of such data, we adapt a pre-trained token-level dense retriever on ad-hoc search data to perform conversational search with no additional fine-tuning. The proposed method allows the user question to be contextualized within the conversation history, but restricts the matching to the question and the potential answer. Our experiments demonstrate the effectiveness of the proposed approach. We also perform an analysis that provides insights into how contextualization works in the latent space, in essence introducing a bias towards salient terms from the conversation.
Abstract: Graph Convolutional Neural Networks (GNN) based recommender systems are state-of-the-art since they can capture the high order collaborative signals between users and items. However, they suffer from the feature leakage problem since label information determined by edges can be leaked into node embeddings through the GNN aggregation procedure guided by the same set of edges, leading to poor generalization. We propose the accurate removal algorithm to generate the final embedding. For each edge, the embeddings of the two end nodes are evaluated on a graph with that edge removed. We devise an algebraic trick to efficiently compute this procedure without explicitly constructing separate graphs for the LightGCN model. Experiments on four datasets demonstrate that our algorithm can perform better on datasets with sparse interactions, while the training time is significantly reduced.
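For a single LightGCN propagation layer with symmetric normalization, the leave-one-edge-out embedding can indeed be recovered algebraically from the full-graph embedding, which conveys the flavor of the trick; the one-layer restriction and variable names below are simplifying assumptions, and deeper layers require additional bookkeeping beyond this sketch.

```python
import torch


def leave_one_edge_out_user_emb(e_u_full, e_i0, deg_u, deg_i):
    """Single-layer LightGCN user embedding with one edge (u, i) removed.

    e_u_full: user embedding after one propagation layer on the *full* graph,
              e_u = sum_j e_j^(0) / sqrt(d_u * d_j)
    e_i0:     layer-0 embedding of the removed neighbour item i
    deg_u/deg_i: degrees of u and i in the full graph (deg_u must exceed 1)
    Removing (u, i) deletes i's term and changes u's degree from d_u to d_u - 1,
    so the corrected embedding follows algebraically without rebuilding the graph.
    """
    numer = deg_u ** 0.5 * e_u_full - e_i0 / deg_i ** 0.5
    return numer / (deg_u - 1) ** 0.5
```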
Abstract: The exposure sequence is being actively studied for user interest modeling in Click-Through Rate (CTR) prediction. However, the existing methods for exposure sequence modeling bring an extensive computational burden and neglect noise problems, resulting in excessive latency and limited performance in online recommenders. In this paper, we propose to address the high latency and noise problems via Gating-adapted wavelet multiresolution analysis (Gama), which can effectively denoise the extremely long exposure sequence and adaptively capture the implied multi-dimensional user interests with linear computational complexity. This is the first attempt to integrate a non-parametric multiresolution analysis technique into a deep neural network to model user exposure sequences. Extensive experiments on a large-scale benchmark dataset and a real production dataset confirm the effectiveness of Gama for exposure sequence modeling, especially in cold-start scenarios. Benefiting from its low latency and high effectiveness, Gama has been deployed in our real large-scale industrial recommender, successfully serving hundreds of millions of users.
Abstract: Deep neural networks (DNN) based recommender models often require numerous parameters to achieve remarkable performance. However, this inevitably brings redundant neurons, a phenomenon referred to as over-parameterization. In this paper, we plan to exploit such redundancy phenomena for recommender systems (RS), and propose a top-N item recommendation framework called PCRec that leverages collaborative training of two recommender models of the same network structure, termed peer collaboration. We first introduce two criteria to identify the importance of parameters of a given recommender model. Then, we rejuvenate the unimportant parameters by copying parameters from its peer network. After such an operation and retraining, the original recommender model is endowed with more representation capacity by possessing more functional model parameters. To show its generality, we instantiate PCRec by using three well-known recommender models. We conduct extensive experiments on two real-world datasets, and show that PCRec yields significantly better performance than its counterpart with the same model (parameter) size.
Abstract: Neural information retrieval architectures based on transformers such as BERT are able to significantly improve system effectiveness over traditional sparse models such as BM25. Though highly effective, these neural approaches are very expensive to run, making them difficult to deploy under strict latency constraints. To address this limitation, recent studies have proposed new families of learned sparse models that try to match the effectiveness of learned dense models, while leveraging the traditional inverted index data structure for efficiency. Current learned sparse models learn the weights of terms in documents and, sometimes, queries; however, they exploit different vocabulary structures, document expansion techniques, and query expansion strategies, which can make them slower than traditional sparse models such as BM25. In this work, we propose a novel indexing and query processing technique that exploits a traditional sparse model's "guidance" to efficiently traverse the index, allowing the more effective learned model to execute fewer scoring operations. Our experiments show that our guided processing heuristic is able to boost the efficiency of the underlying learned sparse model by a factor of four without any measurable loss of effectiveness.
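A deliberately simplified, two-pass rendition of the guidance idea: the cheap traditional model decides which documents the expensive learned sparse model gets to score. The real technique interleaves both models inside a dynamic-pruning index traversal, so the budgeted re-ranking below is only an assumption-laden approximation of it.

```python
def guided_scoring(bm25_scores, learned_score, budget):
    """Let a cheap sparse model decide which documents the learned model scores.

    bm25_scores:   dict doc_id -> BM25 score from the traditional index
    learned_score: callable doc_id -> learned-sparse score (the expensive part)
    Only the `budget` documents ranked highest by BM25 are scored by the learned
    model, cutting the number of expensive scoring operations.
    """
    candidates = sorted(bm25_scores, key=bm25_scores.get, reverse=True)[:budget]
    reranked = {doc: learned_score(doc) for doc in candidates}
    return sorted(reranked, key=reranked.get, reverse=True)
```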
Abstract: Recent works show the possibility of transferring the CLIP (Contrastive Language-Image Pretraining) model for video-text retrieval with promising performance. However, due to the domain gap between static images and videos, CLIP-based video-text retrieval models with interaction-based matching perform far worse than models with representation-based matching. In this paper, we propose a novel image animation strategy to transfer the image-text CLIP model to video-text retrieval effectively. By imitating the video shooting components, we convert a widely used image-language corpus to synthesized video-text data for pretraining. To reduce the time complexity of interaction matching, we further propose a coarse-to-fine framework which consists of dual encoders for fast candidate searching and a cross-modality interaction module for fine-grained re-ranking. The coarse-to-fine framework with the synthesized video-text pretraining provides significant gains in retrieval accuracy while preserving efficiency. Comprehensive experiments conducted on MSR-VTT, MSVD, and VATEX datasets demonstrate the effectiveness of our approach.
Abstract: Explicit feedback---user input regarding their interest in an item---is the most helpful information for recommendation as it comes directly from the user and shows their direct interest in the item. Most approaches either treat the recommendation given such feedback as a typical regression problem or regard such data as implicit and then directly adopt approaches for implicit feedback; both methods, however, tend to yield unsatisfactory performance in top-k recommendation. In this paper, we propose interaction-level preference ranking (IPR), a novel pairwise ranking embedding learning approach to better utilize explicit feedback for recommendation. Experiments conducted on three real-world datasets show that IPR yields the best results compared to six strong baselines.
Abstract: News recommendation aims to match news with personalized user interest. Existing methods for news recommendation usually model user interest from historical clicked news without the consideration of candidate news. However, each user usually has multiple interests, and it is difficult for these methods to accurately match a candidate news with a specific user interest. In this paper, we present a candidate-aware user modeling method for personalized news recommendation, which can incorporate candidate news into user modeling for better matching between candidate news and user interest. We propose a candidate-aware self-attention network that uses candidate news as clue to model candidate-aware global user interest. In addition, we propose a candidate-aware CNN network to incorporate candidate news into local behavior context modeling and learn candidate-aware short-term user interest. Besides, we use a candidate-aware attention network to aggregate previously clicked news weighted by their relevance with candidate news to build candidate-aware user representation. Experiments on real-world datasets show the effectiveness of our method in improving news recommendation performance.
Abstract: Emoji recommendation is an important task to help users find appropriate emojis from thousands of candidates based on a short tweet text. Traditional emoji recommendation methods lack personalization and ignore user historical information when selecting emojis. In this paper, we propose a personalized emoji recommendation model with dynamic user preference (PERD) which contains a text encoder and a personalized attention mechanism. In the text encoder, a BERT model is used to learn dense and low-dimensional representations of tweets. In the personalized attention module, users' dynamic preferences are learned according to the semantic and sentimental similarity between historical tweets and the tweet awaiting emoji recommendation. Informative historical tweets are selected and highlighted. Experiments are carried out on two real-world datasets from Sina Weibo and Twitter. Experimental results validate the superiority of our approach on personalized emoji recommendation.
Abstract: Social recommendation with Graph Neural Networks (GNNs) learns to represent cold users by fusing user-user social relations with user-item interactions, thereby alleviating the cold-start problem associated with recommender systems. Despite being well adapted to social relations and user-item interactions, these supervised models are still susceptible to popularity bias. Contrastive learning helps resolve this dilemma by identifying the properties that distinguish positive from negative samples. However, its previous combinations with recommender systems do not consider social relationships or cold-start cases, and they primarily focus on collaborative features between users and items, leaving the similarity between items under-utilized. In this work, we propose socially-aware dual contrastive learning for cold-start recommendation, where cold users can be modeled in the same way as warm users. To take full advantage of social relations, we create dynamic node embeddings for each user by aggregating information from different neighbors according to each different query item, in the form of user-item pairs. We further design a dual-branch self-supervised contrastive objective to account for user-item collaborative features and item-item mutual information, respectively. On one hand, our framework eliminates popularity bias with proper negative sampling in contrastive learning, without extra ground-truth supervision. On the other hand, we extend previous contrastive learning methods to provide a solution to the cold-start problem with social relations included. Extensive experiments on two real-world social recommendation datasets demonstrate its effectiveness.
Abstract: Neural Multi-task Learning is gaining popularity as a way to learn multiple tasks jointly within a single model. While related research continues to break new ground, two major limitations still remain, including (i) poor generalization to scenarios where tasks are loosely correlated; and (ii) under-investigation of the global commonality and local characteristics of tasks. Our aim is to bridge these gaps by presenting a neural multi-task learning model coined Hierarchical Task-aware Multi-headed Attention Network (HTMN). HTMN explicitly distinguishes task-specific features from task-shared features to reduce the impact caused by weak correlation between tasks. The proposed method highlights two parts: a Multi-level Task-aware Experts Network that identifies task-shared global features and task-specific local features, and a Hierarchical Multi-Head Attention Network that hybridizes global and local features to profile more robust and adaptive representations for each task. Afterwards, each task tower receives its hybrid task-adaptive representation to perform task-specific predictions. Extensive experiments on two real datasets show that HTMN consistently outperforms the compared methods on a variety of prediction tasks.
Abstract: In this paper, we bridge the heterogeneity gap between different modalities and improve image-text retrieval by taking advantage of auxiliary image-to-text and text-to-image generative features with contrastive learning. Concretely, contrastive learning is devised to narrow the distance between the aligned image-text pairs and push apart the distance between the unaligned pairs from both inter- and intra-modality perspectives with the help of cross-modal retrieval features and auxiliary generative features. In addition, we devise a support-set regularization term to further improve contrastive learning by constraining the distance between each image/text and its corresponding cross-modal support-set information contained in the same semantic category. To evaluate the effectiveness of the proposed method, we conduct experiments on three benchmark datasets (i.e., MIRFLICKR-25K, NUS-WIDE, MS COCO). Experimental results show that our model significantly outperforms the strong baselines for cross-modal image-text retrieval. For reproducibility, we make the code and data publicly available at: https://github.com/Hambaobao/CRCGS.
Abstract: Previous studies on event-level sentiment analysis (SA) usually model the event as a topic, a category, or target terms, while the structured arguments (e.g., subject, object, time, and location) that have potential effects on the sentiment are not well studied. In this paper, we redefine the task as structured event-level SA and propose an End-to-End Event-level Sentiment Analysis (E3SA) approach to solve this issue. Specifically, we explicitly extract and model the event structure information for enhancing event-level SA. Extensive experiments demonstrate the great advantages of our proposed approach over state-of-the-art methods. Noting the lack of a suitable dataset, we also release a large-scale real-world dataset with event arguments and sentiment labelling to promote further research.
Abstract: Recently, modeling temporal patterns of user-item interactions has attracted much attention in recommender systems. We argue that existing methods ignore the variety of temporal patterns in user behaviors. We define the subset of user behaviors that are irrelevant to the target item as noise, which limits the performance of target-related time cycle modeling and affects the recommendation performance. In this paper, we propose Denoising Time Cycle Modeling (DiCycle), a novel approach that denoises user behaviors and selects the subset of user behaviors that are highly related to the target item. DiCycle is able to explicitly model diverse time cycle patterns for recommendation. Extensive experiments are conducted on both public benchmarks and a real-world dataset, demonstrating the superior performance of DiCycle over state-of-the-art recommendation methods.
Abstract: Compared to other language tasks, applying pre-trained language models (PLMs) to search ranking often requires more nuance and training signal. In this paper, we identify and study two mismatches between pre-training and ranking fine-tuning: the training schema gap, regarding the differences in training objectives and model architectures, and the task knowledge gap, considering the discrepancy between the knowledge needed in ranking and that learned during pre-training. To mitigate these gaps, we propose the Pre-trained, Prompt-learned and Pre-finetuned Neural Ranker (P3 Ranker). P3 Ranker leverages prompt-based learning to convert the ranking task into a pre-training-like schema and uses pre-finetuning to initialize the model on intermediate supervised tasks. Experiments on MS MARCO and Robust04 show the superior performance of P3 Ranker in few-shot ranking. Analyses reveal that P3 Ranker is able to better adapt to the ranking task through prompt-based learning and to draw on the ranking-oriented knowledge gleaned in pre-finetuning, resulting in data-efficient PLM adaptation. Our code is available at https://github.com/NEUIR/P3Ranker.
Abstract: The main focus of our work is the problem of multi-objective optimization (MOO) when providing a final list of recommendations to the user. Currently, system designers can tune MOO by setting the importance of individual objectives, usually in some kind of weighted-average setting. However, this does not necessarily translate into the presence of such objectives in the final results. In contrast, we would like to allow system designers or end-users to directly quantify the required relative ratios of individual objectives in the resulting recommendations, e.g., the final results should contain 60% relevance, 30% diversity, and 10% novelty. If individual objectives are transformed to represent quality on the same scale, such result-conditioning expressions may greatly contribute towards the tuneability and explainability of recommendations, as well as the user's control over them. To achieve this, we propose an iterative algorithm inspired by the mandates allocation problem in public elections. The algorithm is applicable as long as per-item marginal gains of individual objectives can be calculated. The effectiveness of the algorithm is evaluated on several settings of the relevance-novelty-diversity optimization problem. Furthermore, we also outline several options to scale individual objectives so that they represent similar value for the user.
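To make the allocation idea concrete, the following sketch (not the authors' implementation; the item names, gain values, and the D'Hondt-style quotient are illustrative assumptions) shows how target ratios such as 60/30/10 can decide which objective claims each slot of the final list, with the highest-marginal-gain item filling that slot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-item marginal gains, assumed already normalized to a shared [0, 1] scale.
candidates = {f"item_{i}": {"relevance": rng.random(),
                            "diversity": rng.random(),
                            "novelty": rng.random()} for i in range(50)}
target_ratios = {"relevance": 0.6, "diversity": 0.3, "novelty": 0.1}

def recommend(candidates, target_ratios, k=10):
    chosen = []
    slots = {obj: 0 for obj in target_ratios}   # slots already "won" by each objective
    pool = dict(candidates)
    for _ in range(k):
        # D'Hondt-style quotient: the objective whose target ratio is least satisfied wins the slot.
        obj = max(target_ratios, key=lambda o: target_ratios[o] / (slots[o] + 1))
        # The item with the highest marginal gain for that objective fills the slot.
        best = max(pool, key=lambda item: pool[item][obj])
        chosen.append((best, obj))
        slots[obj] += 1
        del pool[best]
    return chosen

for item, obj in recommend(candidates, target_ratios):
    print(f"{item:8s} selected for {obj}")
```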
Abstract: Rich user behavior information is of great importance for capturing and understanding user interest in click-through rate (CTR) prediction. To improve this richness, collecting long-term behaviors has become a typical approach in academia and industry, but at the cost of increased online storage and latency. Recently, researchers have proposed several approaches to shorten long-term behavior sequences and then model user interests. These approaches reduce the online cost efficiently but do not handle the noisy information in long-term user behavior well, which may significantly deteriorate the performance of CTR prediction. To obtain a better cost/performance trade-off, we propose a novel Adversarial Filtering Model (ADFM) to model long-term user behavior. ADFM uses a hierarchical aggregation representation to compress the raw behavior sequence and then learns to remove useless behavior information with an adversarial filtering mechanism. The selected user behaviors are fed into an interest extraction module for CTR prediction. Experimental results on public datasets and an industrial dataset demonstrate that our method achieves significant improvements over state-of-the-art models.
Abstract: User modeling is important for news recommendation. Existing methods usually first encode a user's clicked news into news embeddings independently and then aggregate them into a user embedding. However, the word-level interactions across different clicked news from the same user, which contain rich detailed clues for inferring user interest, are ignored by these methods. In this paper, we propose a fine-grained and fast user modeling framework (FUM) to model user interest from fine-grained behavior interactions for news recommendation. The core idea of FUM is to concatenate the clicked news into a long document and transform user modeling into a document modeling task with both intra-news and inter-news word-level interactions. Since the vanilla Transformer cannot efficiently handle long documents, we apply an efficient transformer named Fastformer to model fine-grained behavior interactions. Extensive experiments on two real-world datasets verify that FUM can effectively and efficiently model user interest for news recommendation.
Abstract: Recent work has shown that more effective dense retrieval models can be obtained by distilling ranking knowledge from an existing base re-ranking model. In this paper, we propose a generic curriculum learning based optimization framework called CL-DRD that controls the difficulty level of the training data produced by the re-ranking (teacher) model. CL-DRD iteratively optimizes the dense retrieval (student) model by increasing the difficulty of the knowledge distillation data made available to it. In more detail, we initially provide the student model with coarse-grained preference pairs between documents in the teacher's ranking, and progressively move towards finer-grained pairwise document ordering requirements. In our experiments, we apply a simple implementation of the CL-DRD framework to enhance two state-of-the-art dense retrieval models. Experiments on three public passage retrieval datasets demonstrate the effectiveness of our proposed framework.
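As an illustration of the curriculum idea, the sketch below generates distillation pairs from a teacher ranking whose minimum rank gap shrinks at each stage, so early stages see only coarse preferences and later stages approach full pairwise ordering; the gap schedule is an assumption, not the CL-DRD paper's exact recipe.

```python
def distillation_pairs(teacher_ranking, stage):
    """Return (positive_doc, negative_doc) pairs drawn from a teacher ranking.

    stage 0: coarse pairs - top documents vs. documents far down the list.
    Higher stages: progressively smaller rank gaps, approaching the full
    pairwise ordering implied by the teacher.
    """
    n = len(teacher_ranking)
    min_gap = max(1, n // (2 ** (stage + 1)))   # the required rank gap shrinks with the stage
    pairs = []
    for i, pos_doc in enumerate(teacher_ranking):
        for j in range(i + min_gap, n):
            pairs.append((pos_doc, teacher_ranking[j]))
    return pairs

ranking = [f"d{r}" for r in range(8)]
for stage in range(3):
    print(f"stage {stage}: {len(distillation_pairs(ranking, stage))} pairs")
```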
Abstract: Concept drift in stream data has been well studied in machine learning applications. In the field of recommender systems, this issue is also widely observed and is known as temporal dynamics in user behavior. Furthermore, during contingencies such as the COVID-19 pandemic, people shift their behavior patterns dramatically and tend to imitate others' opinions. Such changes in user behavior are not always rational: irrational behavior may impair the knowledge learned by the algorithm, cause herd effects, and aggravate popularity bias in recommender systems. However, related research usually pays attention to the concept drift of individuals and overlooks the synergistic effect among users in the same social group. We conduct a study on user behavior to detect collaborative concept drifts among users. We also empirically show that an increase in individuals' experience can weaken herding effects. Our results suggest that CF models are highly impacted by herd behavior, and our findings could provide useful implications for the design of future recommender algorithms.
Abstract: There is essential information in the underlying structure of words and phrases in natural language questions, and this structure has been extensively studied. In this paper, we study one particular structure, referred to as frozen phrases, that is highly expected to transfer as a whole from questions to answer passages. Frozen phrases, if detected, can be helpful in open-domain Question Answering (QA), where identifying the localized context of a given input question is crucial. An interesting question is whether frozen phrases can be accurately detected. We cast the problem as a sequence-labeling task and create synthetic data from existing QA datasets to train a model. We further plug this model into a sparse retriever that is made aware of the detected phrases. Our experiments reveal that detecting frozen phrases whose presence in answer documents is highly plausible yields significant improvements in retrieval as well as in the end-to-end accuracy of open-domain QA models.
Abstract: Session-based recommendation (SBR) aims at next-item prediction given a short behavior session. Existing solutions fail to address two main challenges: 1) user interests appear as dynamically coupled intents, and 2) sessions always contain noisy signals. To address them, in this paper we propose a hypergraph-based solution, HIDE. Specifically, HIDE first constructs a hypergraph for each session to model the possible interest transitions from distinct perspectives. HIDE then disentangles the intents underlying each item click in micro and macro manners. In the micro-disentanglement, we perform intent-aware embedding propagation on the session hypergraph to adaptively activate disentangled intents from noisy data. In the macro-disentanglement, we introduce an auxiliary intent-classification task to encourage the independence of different intents. Finally, we generate intent-specific representations for the given session to make the final recommendation. Benchmark evaluations demonstrate the significant performance gain of HIDE over state-of-the-art methods.
Abstract: The task of temporal language grounding (TLG) aims to locate, in an untrimmed video, the video moment that matches a given textual query, and has attracted considerable research attention. Typical retrieval-based TLG methods are inefficient due to pre-segmented candidate moments, while localization-based TLG solutions adopt reinforcement learning, resulting in unstable convergence. Therefore, performing the TLG task efficiently and stably is a non-trivial problem. Toward this end, we contribute a solution, Point Prompt Tuning (PPT), which formulates the task as a prompt-based multi-modal problem and integrates multiple sub-tasks to improve performance. Specifically, a flexible prompt strategy is first designed to rewrite the query so that it contains the query itself together with start and end points. Thereafter, a multi-modal Transformer is adopted to fully learn the multi-modal context. Meanwhile, we design several sub-tasks to constrain the framework, namely a matching task and a localization task. Finally, the start and end points of the matched video moment are predicted directly, in a simple yet stable manner. Extensive experiments on two real-world datasets verify the effectiveness of our proposed solution.
Abstract: Scaling reinforcement learning (RL) to recommender systems (RS) is promising, since maximizing the expected cumulative rewards of an RL agent meets the objective of RS, i.e., improving customers' long-term satisfaction. A key approach to this goal is offline RL, which aims to learn policies from logged data rather than from expensive online interactions. In this paper, we propose Value Penalized Q-learning (VPQ), a novel uncertainty-based offline RL algorithm that penalizes unstable Q-values in the regression target using uncertainty-aware weights, achieving a conservative Q-function without the need to estimate the behavior policy, which makes it suitable for RS with a large number of items. Experiments on two real-world datasets show that the proposed method serves as a gain plug-in for existing RS models.
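The following minimal sketch shows one way an uncertainty-penalized regression target can be formed, using the disagreement of a Q-network ensemble as the uncertainty estimate; the toy network, the ensemble size, and the simple mean-minus-std weighting are assumptions and not VPQ's exact formulation.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Toy Q-network over concatenated (state, action) features."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def penalized_target(q_ensemble, reward, next_state, next_action, gamma=0.99, beta=1.0):
    # Stack ensemble estimates of Q(s', a'): shape (ensemble, batch).
    q_next = torch.stack([q(next_state, next_action) for q in q_ensemble])
    mean, std = q_next.mean(dim=0), q_next.std(dim=0)
    # High-variance (unstable) Q estimates are pushed down, giving a conservative target
    # without requiring an estimate of the behavior policy.
    return reward + gamma * (mean - beta * std)

ensemble = [QNet(8) for _ in range(4)]
s2, a2, r = torch.randn(16, 8), torch.randn(16, 8), torch.randn(16)
print(penalized_target(ensemble, r, s2, a2).shape)   # torch.Size([16])
```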
Abstract: Recommendation for cold-start users, who have very limited data, is a canonical challenge in recommender systems. Existing deep recommender systems utilize user content features and behaviors to produce personalized recommendations, yet they often face significant performance degradation on cold-start users compared to existing users, due to the following challenges: (1) cold-start users may have a quite different feature distribution from existing users, and (2) the few behaviors of cold-start users are hard to exploit. In this paper, we propose a recommender system called Cold-Transformer to alleviate these problems. Specifically, we design context-based Embedding Adaption to offset the differences in feature distribution. It transforms the embedding of cold-start users into a warm state that more closely resembles existing users, so as to represent the corresponding user preferences. Furthermore, to exploit the few behaviors of cold-start users and characterize the user context, we propose Label Encoding, which models Fused Behaviors of positive and negative feedback simultaneously, as these are relatively more abundant. Finally, to perform large-scale industrial recommendation, we keep the two-tower architecture that decouples the user and the target item. Extensive experiments on public and industrial datasets show that Cold-Transformer significantly outperforms state-of-the-art methods, including those that are deeply coupled and less scalable.
Abstract: Dialogue systems are evaluated depending on their type and purpose. Two categories are often distinguished: (1) task-oriented dialogue systems (TDSs), which are typically evaluated on utility, i.e., their ability to complete a specified task, and (2) open-domain chatbots, which are evaluated on the user experience, i.e., based on their ability to engage a person. What is the influence of user experience on the user satisfaction rating of TDSs as opposed to, or in addition to, utility? We collect data by providing an additional annotation layer for dialogues sampled from the ReDial dataset, a widely used conversational recommendation dataset. Unlike prior work, we annotate the sampled dialogues at both the turn and dialogue level on six dialogue aspects: relevance, interestingness, understanding, task completion, efficiency, and interest arousal. The annotations allow us to study how different dialogue aspects influence user satisfaction. We introduce a comprehensive set of user experience aspects derived from the annotators' open comments that can influence users' overall impression. We find that the concept of satisfaction varies across annotators and dialogues, and show that a relevant turn is significant for some annotators, while for others, an interesting turn is all they need. Our analysis indicates that the proposed user experience aspects provide a fine-grained analysis of user satisfaction that is not captured by a monolithic overall human rating.
Abstract: The popularization of social media generates a large amount of user-oriented data, in which text data especially attracts researchers and speculators who infer user attributes (e.g., age, gender) to fulfill their intents. Generally, this line of work casts attribute inference as a text classification problem and has started to leverage graph neural networks for higher-level text representations. However, these text graphs are constructed on words, suffering from high memory consumption and ineffectiveness when only a few labeled texts are available. To address this challenge, we design a text-graph-based few-shot learning model for social media attribute inference. Our model builds a text graph with texts as nodes and edges learned from current text representations via manifold learning and message passing. To further use unlabeled texts to improve few-shot performance, a knowledge distillation scheme is devised to optimize the problem. This offers a trade-off between expressiveness and complexity. Experiments on social media datasets demonstrate the state-of-the-art performance of our model on attribute inference with considerably fewer labeled texts.
Abstract: In real-world recommendation systems, the preferences of users are often affected by long-term constant interests and short-term temporal needs. Recently proposed Transformer-based models have proved superior in sequential recommendation, modeling temporal dynamics globally via the remarkable self-attention mechanism. However, the uniform item-item interactions in the original self-attention are cumbersome and fail to capture the drift of users' local preferences, which contain abundant short-term patterns. In this paper, we propose a novel interpretable convolutional self-attention, which efficiently captures both short- and long-term patterns with a progressive attention distribution. Specifically, a down-sampling convolution module is proposed to segment the overall long behavior sequence into a series of local subsequences. The segments then interact with each item in the self-attention layer to produce locality-aware contextual representations, during which the quadratic complexity of the original self-attention is reduced to nearly linear complexity. Moreover, to further enhance robust feature learning in the context of Transformers, an unsymmetrical positional encoding strategy is carefully designed. Extensive experiments are carried out on real-world datasets, e.g., ML-1M, Amazon Books, and Yelp, indicating that the proposed method outperforms state-of-the-art methods w.r.t. both effectiveness and efficiency.
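A toy illustration of the down-sampling idea follows: a strided 1-D convolution summarizes the long behavior sequence into local segments, and items then attend to these segment summaries instead of to every other item, reducing the attention cost from quadratic in L to roughly O(L·L/stride). The kernel size, stride, and the use of a standard multi-head attention layer are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

B, L, D, stride = 2, 200, 64, 8
items = torch.randn(B, L, D)                     # long behavior sequence of item embeddings

# One summary vector per local segment via a strided 1-D convolution.
downsample = nn.Conv1d(D, D, kernel_size=stride, stride=stride)
segments = downsample(items.transpose(1, 2)).transpose(1, 2)   # (B, L // stride, D)

# Items (queries) attend to segment summaries (keys/values) instead of all other items.
attn = nn.MultiheadAttention(D, num_heads=4, batch_first=True)
context, _ = attn(items, segments, segments)

print(segments.shape, context.shape)             # (2, 25, 64) (2, 200, 64)
```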
Abstract: With the wide adoption of mobile devices and web applications, location-based social networks (LBSNs) offer large-scale individual-level location-related activities and experiences. Next point-of-interest (POI) recommendation is one of the most important tasks in LBSNs, aiming to make personalized recommendations of next suitable locations to users by discovering preferences from users' historical activities. Noticeably, LBSNs have offered unparalleled access to abundant heterogeneous relational information about users and POIs (including user-user social relations, such as families or colleagues; and user-POI visiting relations). Such relational information holds great potential to facilitate the next POI recommendation. However, most existing methods either focus on merely the user-POI visits, or handle different relations based on over-simplified assumptions while neglecting relational heterogeneities. To fill these critical voids, we propose a novel framework, MEMO, which effectively utilizes the heterogeneous relations with a multi-network representation learning module, and explicitly incorporates the inter-temporal user-POI mutual influence with the coupled recurrent neural networks. Extensive experiments on real-world LBSN data validate the superiority of our framework over the state-of-the-art next POI recommendation methods.
Abstract: Abstractive summarization of podcasts is motivated by the growing popularity of podcasts and the needs of their listeners. Podcasting is a markedly different domain from news and other media that are commonly studied in the context of automatic summarization. As such, the qualities of a good podcast summary are as yet unknown. Using a collection of podcast summaries produced by different algorithms, alongside human judgments of summary quality obtained from the TREC 2020 Podcasts Track, we study the correlations between various automatic evaluation metrics and human judgments, as well as the linguistic aspects of summaries that result in strong evaluations.
Abstract: Zero-shot intent classification is a vital and challenging task in dialogue systems, which aims to deal with numerous fast-emerging, unacquainted intents without annotated training data. To obtain more satisfactory performance, the crucial points lie in two aspects: extracting better utterance features and strengthening the model's generalization ability. In this paper, we propose a simple yet effective meta-learning paradigm for zero-shot intent classification. To learn better semantic representations for utterances, we introduce a new mixture attention mechanism, which encodes the pertinent word occurrence patterns by leveraging distributional signature attention and multi-layer perceptron attention simultaneously. To strengthen the model's ability to transfer from seen classes to unseen classes, we reformulate zero-shot intent classification with a meta-learning strategy, which trains the model by simulating multiple zero-shot classification tasks on seen categories and promotes the model's generalization ability with a meta-adapting procedure on mimicked unseen categories. Extensive experiments on two real-world dialogue datasets in different languages show that our model outperforms other strong baselines on both standard and generalized zero-shot intent classification tasks.
Abstract: Given the ubiquitous existence of graph-structured data, learning node representations for downstream tasks ranging from node classification and link prediction to graph classification is of crucial importance. Regarding missing-link inference in diverse networks, we revisit link prediction techniques and identify the importance of both structural and attribute information. However, the available techniques either rely heavily on the network topology, which can be spurious in practice, or cannot integrate graph topology and features properly. To bridge this gap, we propose a bicomponent structural and attribute learning framework (BSAL) that is designed to adaptively leverage information from the topology and feature spaces. Specifically, BSAL constructs a semantic topology via the node attributes and then obtains embeddings for this semantic view, which provides a flexible and easy-to-implement way to adaptively incorporate the information carried by the node attributes. The semantic embedding and the topology embedding are then fused using an attention mechanism for the final prediction. Extensive experiments show the superior performance of our proposal, which significantly outperforms baselines on diverse research benchmarks.
Abstract: Useful tips extracted from product reviews help customers make a more informed purchase decision and use the product in a better, easier, and safer way. In this work, we argue that extracted tips should be examined based on the amount of support and opposition they receive from all product reviews. A classifier, developed for this purpose, determines the degree to which a tip is supported or contradicted by a single review sentence. These support levels are then aggregated over all review sentences, providing a global support score and a global contradiction score that reflect the support level of all reviews for the given tip, thus improving customer confidence in the tip's validity. By analyzing a large set of tips extracted from product reviews, we propose a novel taxonomy for categorizing tips as highly supported, highly contradicted, controversial (supported and contradicted), and anecdotal (neither supported nor contradicted).
Abstract: Session-based recommendation has recently attracted more and more research effort. Most existing approaches aim to discover users' potential preferences or interests from anonymous session data. This ignores the fact that such sequential behavior data usually reflect the session user's underlying demand, i.e., a semantic-level factor, and therefore estimating the underlying demands of a session becomes a challenging task. To tackle this issue, this paper proposes a novel demand-aware graph neural network model. In particular, a demand modeling component is designed to extract the multiple underlying demands of each session. Then, the demand-aware graph neural network first constructs session demand graphs and then learns demand-aware item embeddings to make the recommendation. A mutual information loss is further designed to enhance the quality of the learnt embeddings. Extensive experiments have been performed on two real-world datasets, and the proposed model achieves state-of-the-art performance.
Abstract: Stance detection aims to identify the stance of a text towards a target. Different from conventional stance detection, Zero-Shot Stance Detection (ZSSD) needs to predict the stances towards unseen targets during the inference stage. As humans, we generally tend to reason about the stance towards a new target by linking it with related knowledge learned from known ones. Therefore, in this paper, to better generalize the target-related stance features learned from known targets to unseen ones, we incorporate targeted background knowledge from Wikipedia into the model. The background knowledge can be considered a bridge connecting the meanings of known targets and unseen ones, which improves the generalization and reasoning ability of the model in dealing with ZSSD. Extensive experimental results demonstrate that our model outperforms the state-of-the-art methods on the ZSSD task.
Abstract: Zero-shot cross-lingual event argument extraction (EAE) is a challenging yet practical problem in Information Extraction. Most previous works rely heavily on external structured linguistic features, which are not easily accessible in real-world scenarios. This paper investigates a translation-based method to implicitly project annotations from the source language to the target language. With the use of translation-based parallel corpora, no additional linguistic features are required during training and inference. As a result, the proposed approach is more cost-effective than previous works on zero-shot cross-lingual EAE. Moreover, our implicit annotation projection approach introduces less noise and hence is more effective and robust than explicit ones. Experimental results show that our model achieves the best performance, outperforming a number of competitive baselines. A thorough analysis further demonstrates the effectiveness of our model compared to explicit annotation projection approaches.
Abstract: Sequential recommendation aims to model dynamic user behavior from historical interactions. Self-attentive methods have proven effective at capturing short-term dynamics and long-term preferences. Despite their success, these approaches still struggle to model sparse data, from which they fail to learn high-quality item representations. We propose to model user dynamics from shopping intents and interacted items simultaneously. The learned intents are coarse-grained and serve as prior knowledge for item recommendation. To this end, we present a coarse-to-fine self-attention framework, namely CaFe, which explicitly learns coarse-grained and fine-grained sequential dynamics. Specifically, CaFe first learns intents from coarse-grained sequences, which are dense and hence provide high-quality user intent representations. Then, CaFe fuses intent representations into item encoder outputs to obtain improved item representations. Finally, we infer recommended items based on the representations of items and corresponding intents. Experiments on sparse datasets show that CaFe outperforms state-of-the-art self-attentive recommenders by 44.03% NDCG@5 on average.
Abstract: User modeling is critical for personalization. Existing methods usually train user models on task-specific labeled data, which may be insufficient. In fact, there are usually abundant unlabeled user behavior data that encode rich universal user information, and pre-training user models on them can empower user modeling in many downstream tasks. In this paper, we propose a user model pre-training method named UserBERT, which learns universal user models on unlabeled user behavior data with two contrastive self-supervision tasks. The first is masked behavior prediction and discrimination, aiming to model the contexts of user behaviors. The second is behavior sequence matching, aiming to capture user interests that are stable across different periods. In addition, we propose a medium-hard negative sampling framework to select informative negative samples for better contrastive pre-training. Extensive experiments validate the effectiveness of UserBERT in user model pre-training.
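A possible reading of medium-hard negative sampling is sketched below: candidate negatives are ranked by similarity to the anchor behavior-sequence embedding, the very hardest (possible false negatives) and the easiest (uninformative) candidates are discarded, and negatives are drawn from the middle band. The band boundaries and the use of cosine similarity are assumptions for illustration only.

```python
import torch

def medium_hard_negatives(anchor, candidates, n_neg=4, low=0.3, high=0.8):
    # Cosine similarity between the anchor and each candidate behavior-sequence embedding.
    sims = torch.nn.functional.cosine_similarity(anchor.unsqueeze(0), candidates)
    order = sims.argsort(descending=True)        # hardest (most similar) first
    lo, hi = int(low * len(order)), int(high * len(order))
    band = order[lo:hi]                          # medium-hard region
    idx = band[torch.randperm(len(band))[:n_neg]]
    return candidates[idx]

anchor = torch.randn(64)
pool = torch.randn(100, 64)
print(medium_hard_negatives(anchor, pool).shape)  # torch.Size([4, 64])
```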
Abstract: Programming-based Pre-trained Language Models (PPLMs) such as CodeBERT have achieved great success in many downstream code-related tasks. Since the memory and computational complexity of self-attention in the Transformer grow quadratically with the sequence length, PPLMs typically limit the code length to 512 tokens. However, code in real-world applications, for example in code search, is generally long and cannot be processed efficiently by existing PPLMs. To solve this problem, in this paper we present SASA, a Structure-Aware Sparse Attention mechanism, which reduces complexity and improves performance on long-code understanding tasks. The key components of SASA are top-k sparse attention and Abstract Syntax Tree (AST)-based structure-aware attention. With top-k sparse attention, the most crucial attention relations can be obtained at a lower computational cost. As the code structure represents the logic of the code statements, complementing the sequential characteristics of code, we further introduce AST structures into attention. Extensive experiments on CodeXGLUE tasks show that SASA achieves better performance than the competing baselines.
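The sketch below illustrates the top-k selection rule of sparse attention by masking all but the k highest-scoring keys per query; an efficient implementation such as SASA's would avoid materializing the full score matrix, and the AST-based structure-aware branch is not shown. Shapes and k are illustrative.

```python
import torch

def topk_sparse_attention(q, k, v, top_k=32):
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5           # (B, L, L) full scores (for illustration)
    kth = scores.topk(top_k, dim=-1).values[..., -1:]      # k-th largest score per query
    scores = scores.masked_fill(scores < kth, float("-inf"))  # keep only the top-k keys
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 512, 64)
print(topk_sparse_attention(q, k, v).shape)   # torch.Size([2, 512, 64])
```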
Abstract: When searching the web for answers to health questions, people can make incorrect decisions that have a negative effect on their lives if the search results contain misinformation. To reduce health misinformation in search results, we need to be able to detect documents with correct answers and promote them over documents containing misinformation. Determining the correct answer has been a difficult hurdle to overcome for participants in the TREC Health Misinformation Track. In the 2021 track, automatic runs were not allowed to use the known answer to a topic's health question, and as a result, the top automatic run had a compatibility-difference score of 0.043 while the top manual run, which used the known answer, had a score of 0.259. The compatibility-difference score measures the ability of methods to rank correct and credible documents before incorrect and non-credible documents. By using an existing set of health questions and their known answers, we show it is possible to learn which web hosts are trustworthy, from which we can predict the correct answers to the 2021 health questions with an accuracy of 76%. Using our predicted answers, we can promote documents that we predict contain this answer and achieve a compatibility-difference score of 0.129, which is a three-fold increase in performance over the best previous automatic method.
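The host-trust idea can be pictured with the following simplified sketch, in which a host's trust is the fraction of its training documents that supported the known correct answer, and new answers are chosen by trust-weighted voting; the data layout, host names, and the neutral prior for unseen hosts are illustrative assumptions, not the track's actual format.

```python
from collections import defaultdict

# (question_id, host, answer_the_document_supports) -- hypothetical training triples.
training_docs = [("q1", "cdc.gov", "yes"), ("q1", "quackblog.net", "no"),
                 ("q2", "cdc.gov", "no"),  ("q2", "quackblog.net", "yes")]
known_answers = {"q1": "yes", "q2": "no"}

# Host trust = fraction of the host's documents that agreed with the known answer.
correct, total = defaultdict(int), defaultdict(int)
for qid, host, supported in training_docs:
    total[host] += 1
    correct[host] += int(supported == known_answers[qid])
trust = {h: correct[h] / total[h] for h in total}

def predict_answer(docs):
    """docs: list of (host, supported_answer) retrieved for a new question."""
    votes = defaultdict(float)
    for host, answer in docs:
        votes[answer] += trust.get(host, 0.5)   # unseen hosts get a neutral prior
    return max(votes, key=votes.get)

print(trust)                                                          # cdc.gov trusted, quackblog.net not
print(predict_answer([("cdc.gov", "no"), ("quackblog.net", "yes")]))  # 'no'
```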
Abstract: Binary pointwise labels (aka implicit feedback) are heavily leveraged by deep-learning-based recommendation algorithms nowadays. In this paper, we argue that the limited expressiveness of these labels may fail to accommodate varying degrees of user preference and thus lead to conflicts during model training, which we call annotation bias. To solve this issue, we find that the soft-labeling property of pairwise labels can be utilized to alleviate the bias of pointwise labels. To this end, we propose a momentum contrast framework that combines pointwise and pairwise learning for recommendation. The framework has a three-tower network structure: one user network and two item networks. The two item networks are used for computing the pointwise and pairwise losses, respectively. To alleviate the influence of the annotation bias, we perform a momentum update to ensure a consistent item representation. Extensive experiments on real-world datasets demonstrate the superiority of our method against state-of-the-art recommendation algorithms.
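One common way to realize such a momentum update, sketched below under the assumption of two parallel item towers, is to keep one tower gradient-free and move its parameters as an exponential moving average of the trained tower; the layer sizes and the momentum coefficient are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

item_q = nn.Linear(32, 16)                 # item tower updated by gradients (e.g., pairwise loss)
item_k = nn.Linear(32, 16)                 # momentum item tower kept consistent for the other loss
item_k.load_state_dict(item_q.state_dict())
for p in item_k.parameters():
    p.requires_grad_(False)

@torch.no_grad()
def momentum_update(m=0.999):
    # pk <- m * pk + (1 - m) * pq : slow exponential moving average of the trained tower.
    for pq, pk in zip(item_q.parameters(), item_k.parameters()):
        pk.mul_(m).add_(pq, alpha=1.0 - m)

momentum_update()   # called once per training step after the gradient update
```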
Abstract: Different from large-scale platforms such as Taobao and Amazon, CVR modeling in small-scale recommendation scenarios is more challenging due to the severe Data Distribution Fluctuation (DDF) issue. DDF prevents existing CVR models from being effective, since 1) several months of data are needed to train CVR models sufficiently in small scenarios, leading to a considerable distribution discrepancy between training and online serving; and 2) e-commerce promotions have significant impacts on small scenarios, leading to distribution uncertainty in the upcoming time period. In this work, we propose a novel CVR method named MetaCVR from a meta-learning perspective to address the DDF issue. First, a base CVR model, which consists of a Feature Representation Network (FRN) and output layers, is designed and trained sufficiently with samples spanning several months. Then we treat time periods with different data distributions as different occasions and obtain positive and negative prototypes for each occasion using the corresponding samples and the pre-trained FRN. Subsequently, a Distance Metric Network (DMN) is devised to calculate the distance metrics between each sample and all prototypes, helping to mitigate the distribution uncertainty. Finally, we develop an Ensemble Prediction Network (EPN) which incorporates the outputs of the FRN and DMN to make the final CVR prediction. In this stage, we freeze the FRN and train the DMN and EPN with samples from the recent time period, thereby effectively easing the distribution discrepancy. To the best of our knowledge, this is the first study of CVR prediction targeting the DDF issue in small-scale recommendation scenarios. Experimental results on real-world datasets validate the superiority of MetaCVR, and an online A/B test shows that our model achieves impressive gains of 11.92% on PCVR and 8.64% on GMV.
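To picture the prototype-based distance computation, the snippet below measures how far each sample representation lies from the positive and negative prototypes of every occasion; Euclidean distance and the tensor layout are assumptions, since the actual Distance Metric Network may learn its own metric.

```python
import torch

def prototype_distances(sample_repr, prototypes):
    # sample_repr: (B, D); prototypes: (num_occasions, 2, D) holding a positive
    # and a negative prototype per occasion.
    diff = sample_repr[:, None, None, :] - prototypes[None]   # (B, O, 2, D)
    return diff.norm(dim=-1)                                  # (B, O, 2) distance metrics

x = torch.randn(16, 32)          # FRN representations of a batch of samples
protos = torch.randn(6, 2, 32)   # prototypes for 6 occasions
print(prototype_distances(x, protos).shape)   # torch.Size([16, 6, 2])
```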
Abstract: A human-like user simulator that anticipates users' satisfaction scores, actions, and utterances can help goal-oriented dialogue systems evaluate the conversation and refine their dialogue strategies. However, little work has experimented with user simulators that can generate users' utterances. In this paper, we propose a deep-learning-based user simulator that predicts users' satisfaction scores and actions while jointly generating users' utterances in a multi-task manner. In particular, we show that 1) the proposed deep text-to-text multi-task neural model achieves state-of-the-art performance on the users' satisfaction score and action prediction tasks, and 2) in an ablation analysis, the user satisfaction score prediction, action prediction, and utterance generation tasks boost each other's performance via positive transfer across the tasks. The source code and model checkpoints used for the experiments in this paper are available at: https://github.com/kimdanny/user-simulation-t5.
Abstract: The wide dissemination of fake news is increasingly threatening both individuals and society. Fake news detection aims to train a model on past news and detect fake news in the future. Though great efforts have been made, existing fake news detection methods overlook the unintended entity bias in real-world data, which seriously hurts models' generalization ability to future data. For example, 97% of news pieces in 2010-2017 containing the entity 'Donald Trump' are real in our data, but the percentage falls to merely 33% in 2018. This leads a model trained on the former set to hardly generalize to the latter, as it tends to predict news pieces about 'Donald Trump' as real to lower the training loss. In this paper, we propose an entity debiasing framework (ENDEF) which generalizes fake news detection models to future data by mitigating entity bias from a cause-effect perspective. Based on the causal graph among entities, news contents, and news veracity, we separately model the contribution of each cause (entities and contents) during training. In the inference stage, we remove the direct effect of the entities to mitigate entity bias. Extensive offline experiments on English and Chinese datasets demonstrate that the proposed framework can largely improve the performance of base fake news detectors, and online tests verify its superiority in practice. To the best of our knowledge, this is the first work to explicitly improve the generalization ability of fake news detection models to future data. The code has been released at https://github.com/ICTMCG/ENDEF-SIGIR2022.
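The cause-separation step can be schematized as follows: during training the prediction combines a content-branch logit with an entity-only logit, while at inference the entity branch is dropped so the direct entity effect no longer sways the verdict. The linear heads stand in for whatever base detector and entity encoder are used; they are placeholders, not ENDEF's actual modules.

```python
import torch
import torch.nn as nn

class DebiasedDetector(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.content_head = nn.Linear(dim, 1)   # stands in for the base fake-news model
        self.entity_head = nn.Linear(dim, 1)    # models the entity-only shortcut

    def forward(self, content_feat, entity_feat, inference=False):
        logit = self.content_head(content_feat)
        if not inference:                        # the entity effect only guides training
            logit = logit + self.entity_head(entity_feat)
        return torch.sigmoid(logit)

model = DebiasedDetector()
c, e = torch.randn(4, 64), torch.randn(4, 64)
print(model(c, e).shape, model(c, e, inference=True).shape)
```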
Abstract: Dialogue topic segmentation is a challenging task in which dialogues are split into segments with pre-defined topics. Existing works on topic segmentation adopt a two-stage paradigm consisting of text segmentation and segment labeling. However, such methods tend to focus on the local context during segmentation, and the inter-segment dependency is not well captured. Besides, the ambiguity and labeling noise in dialogue segment boundaries bring further challenges to existing models. In this work, we propose the Parallel Extraction Network with Neighbor Smoothing (PEN-NS) to address these issues. Specifically, we propose the parallel extraction network to perform segment extraction, optimizing the bipartite matching cost of segments to capture inter-segment dependency. Furthermore, we propose neighbor smoothing to handle segment-boundary noise and ambiguity. Experiments on a dialogue-based and a document-based topic segmentation dataset show that PEN-NS outperforms state-of-the-art models significantly.
Abstract: Dense retrieval is becoming one of the standard approaches for document and passage ranking. The dual-encoder architecture is widely adopted for scoring question-passage pairs due to its efficiency and high performance. Typically, dense retrieval models are evaluated on clean and curated datasets. However, when deployed in real-life applications, these models encounter noisy user-generated text, and the performance of state-of-the-art dense retrievers can substantially deteriorate when they are exposed to such noise. In this work, we study the robustness of dense retrievers against typos in the user question. We observe a significant drop in the performance of the dual-encoder model when encountering typos, and explore ways to improve its robustness by combining data augmentation with contrastive learning. Our experiments on two large-scale passage ranking and open-domain question answering datasets show that our proposed approach outperforms competing approaches. Additionally, we perform a thorough robustness analysis. Finally, we provide insights on how different typos affect the robustness of embeddings differently, and on how our method alleviates the effect of some typos but not of others.
Abstract: The common approach to using clusters of similar documents for ad hoc document retrieval is to rank the clusters in response to the query and then transform the cluster ranking into a document ranking. We present a novel supervised approach to transform cluster rankings into document rankings. The approach allows us to simultaneously utilize different clusterings and the resultant cluster rankings, which helps to improve the modeling of the document similarity space. Empirical evaluation shows that using our approach results in performance that substantially transcends the state-of-the-art in cluster-based document retrieval.
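A toy version of the cluster-to-document transformation is shown below, where each document accumulates the scores of the clusters it belongs to across several clusterings; the plain summation replaces the supervised weighting of the actual approach and is purely illustrative, as are the cluster identifiers and scores.

```python
from collections import defaultdict

# Each clustering maps cluster_id -> (cluster score from the cluster ranking, member documents).
clusterings = [
    {"c1": (0.9, ["d1", "d2"]), "c2": (0.4, ["d3"])},
    {"k1": (0.7, ["d2", "d3"]), "k2": (0.2, ["d1"])},
]

# A document inherits evidence from every cluster containing it, across all clusterings.
doc_scores = defaultdict(float)
for clustering in clusterings:
    for score, members in clustering.values():
        for doc in members:
            doc_scores[doc] += score

print(sorted(doc_scores.items(), key=lambda kv: -kv[1]))   # d2 ranked first
```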
Abstract: Collaborative filtering algorithms capture underlying consumption patterns, including those specific to particular demographics or protected information of users, e.g., gender, race, and location. These encoded biases can influence the decisions of a recommendation system (RS) towards further separation of the content provided to various demographic subgroups, and raise privacy concerns regarding the disclosure of users' protected attributes. In this work, we investigate the possibility of, and the challenges in, removing specific protected information of users from the learned interaction representations of an RS algorithm while maintaining its effectiveness. Specifically, we incorporate adversarial training into the state-of-the-art MultVAE architecture, resulting in a novel model, Adversarial Variational Auto-Encoder with Multinomial Likelihood (Adv-MultVAE), which aims to remove the implicit information of protected attributes while preserving recommendation performance. We conduct experiments on the MovieLens-1M and LFM-2b-DemoBias datasets and evaluate the effectiveness of the bias mitigation method based on the inability of external attackers to reveal users' gender information from the model. Compared with the baseline MultVAE, the results show that Adv-MultVAE, with marginal deterioration in performance (w.r.t. NDCG and recall), largely mitigates inherent biases in the model on both datasets.
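Adversarial removal of a protected attribute is often implemented with a gradient-reversal layer, as in the hypothetical sketch below: an auxiliary classifier tries to predict gender from the latent code while the reversed gradient pushes the encoder to discard that signal. Whether Adv-MultVAE uses this exact mechanism, and the toy dimensions used here, are assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) the gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

latent = torch.randn(8, 32, requires_grad=True)   # stands in for the VAE latent code
adversary = nn.Linear(32, 2)                      # tries to predict gender from the latent
logits = adversary(GradReverse.apply(latent, 1.0))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 2, (8,)))
loss.backward()                                   # gradients reaching the encoder are reversed
print(latent.grad.shape)                          # torch.Size([8, 32])
```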