Abstract: Using implicit feedback collected from user clicks as training labels for learning-to-rank algorithms is a well-developed paradigm that has been extensively studied and used in modern IR systems. Using user clicks as ranking features, on the other hand, has not been fully explored in existing literature. Despite its potential in improving short-term system performance, whether the incorporation of user clicks as ranking features is beneficial for learning-to-rank systems in the long term is still questionable. Two of the most important problems are (1) the explicit bias introduced by noisy user behavior, and (2) the implicit bias, which we refer to as the exploitation bias, introduced by the dynamic training and serving of learning-to-rank systems with behavior features. In this paper, we explore the possibility of incorporating user clicks as both training labels and ranking features for learning to rank. We formally investigate the problems in feature collection and model training, and propose a counterfactual feature projection function and a novel uncertainty-aware learning to rank framework. Experiments on public datasets show that ranking models learned with the proposed framework can significantly outperform models built with raw click features and algorithms that rank items without considering model uncertainty.
Abstract: In this paper we study how to effectively exploit implicit feedback in Dense Retrievers (DRs). We consider the specific case in which click data from a historic click log is available as implicit feedback. We then exploit such historic implicit interactions to improve the effectiveness of a DR. A key challenge that we study is the effect that biases in the click signal, such as position bias, have on the DRs. To overcome the problems associated with the presence of such bias, we propose the Counterfactual Rocchio (CoRocchio) algorithm for exploiting implicit feedback in Dense Retrievers. We demonstrate both theoretically and empirically that dense query representations learnt with CoRocchio are unbiased with respect to position bias and lead to higher retrieval effectiveness. We make available the implementations of the proposed methods and the experimental framework, along with all results at https://github.com/ielab/Counterfactual-DR.
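To make the idea concrete, here is a minimal sketch of an inverse-propensity-weighted (counterfactual) Rocchio update on dense query embeddings, written under stated assumptions rather than as the authors' exact implementation: the hyperparameters `alpha` and `beta`, the propensity clipping, and the mean-centroid form are all illustrative.

```python
import numpy as np

def counterfactual_rocchio(query_emb, clicked_doc_embs, propensities,
                           alpha=1.0, beta=0.5, clip=0.1):
    """IPS-weighted Rocchio update of a dense query vector (illustrative sketch).

    query_emb:        (d,) original dense query embedding
    clicked_doc_embs: (n, d) embeddings of documents clicked in the historic log
    propensities:     (n,) estimated examination probabilities (position bias)
    """
    p = np.clip(np.asarray(propensities, dtype=float), clip, 1.0)  # variance control
    weights = 1.0 / p                                # debias: up-weight low-propensity clicks
    centroid = (weights[:, None] * clicked_doc_embs).sum(axis=0) / weights.sum()
    return alpha * query_emb + beta * centroid       # feedback-enhanced query representation

# toy usage: clicks observed at lower ranks contribute more after reweighting
q_new = counterfactual_rocchio(np.random.randn(8), np.random.randn(3, 8),
                               propensities=[0.9, 0.5, 0.2])
```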
Abstract: Implicit feedback has been widely used to build commercial recommender systems. Because observed feedback represents users' click logs, there is a semantic gap between true relevance and observed feedback. More importantly, observed feedback is usually biased towards popular items, thereby overestimating the actual relevance of popular items. Although existing studies have developed unbiased learning methods using inverse propensity weighting (IPW) or causal reasoning, they solely focus on eliminating the popularity bias of items. In this paper, we propose a novel unbiased recommender learning model, namely BIlateral SElf-unbiased Recommender (BISER), to eliminate the exposure bias of items caused by recommender models. Specifically, BISER consists of two key components: (i) self-inverse propensity weighting (SIPW) to gradually mitigate the bias of items without incurring high computational costs; and (ii) bilateral unbiased learning (BU) to bridge the gap between two complementary models in model predictions, i.e., user- and item-based autoencoders, alleviating the high variance of SIPW. Extensive experiments show that BISER consistently outperforms state-of-the-art unbiased recommender models over several datasets, including Coat, Yahoo! R3, MovieLens, and CiteULike.
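The self-inverse propensity weighting idea can be illustrated with a point-wise loss whose propensities come from the model's own (detached) exposure estimate; the clipping constant and the specific loss form below are assumptions for illustration, not BISER's exact objective.

```python
import torch

def self_ipw_loss(pred_relevance, observed_click, self_propensity, clip=0.1):
    """Inverse-propensity-weighted loss with model-estimated propensities (sketch).

    pred_relevance:  (n,) predicted relevance probabilities in (0, 1)
    observed_click:  (n,) 0/1 implicit feedback (float tensor)
    self_propensity: (n,) exposure probability estimated by the model itself
    """
    p = self_propensity.detach().clamp(min=clip)   # stop-gradient and clip for variance
    w = observed_click / p                          # IPW weight on observed positives
    loss = -(w * torch.log(pred_relevance + 1e-8)
             + (1.0 - w) * torch.log(1.0 - pred_relevance + 1e-8))
    return loss.mean()
```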
Abstract: Most recommender systems evaluate model performance offline through either: 1) a normal biased test on factual interactions; or 2) a debiased test with records from a randomized controlled trial. In fact, both tests only reflect part of the whole picture: factual interactions are collected from the recommendation policy, so fitting them better implies benefiting the platform with a higher click or conversion rate; in contrast, the debiased test eliminates system-induced biases and thus is more reflective of users' true preferences. Nevertheless, we find that existing models exhibit a trade-off between the two tests, and methods that perform well on both are lacking. In this work, we aim to develop a win-win recommendation method that is strong on both tests. This is non-trivial, since it requires learning a model that can make accurate predictions in both the factual environment (i.e., the normal biased test) and the counterfactual environment (i.e., the debiased test). Towards this goal, we perform environment-aware recommendation modeling by considering both environments. In particular, we propose an Interpolative Distillation (InterD) framework, which interpolates the biased and debiased models at the user-item pair level by distilling a student model. We conduct experiments on three real-world datasets with both tests. Empirical results justify the rationality and effectiveness of InterD, which stands out on both tests and especially demonstrates remarkable gains on less popular items.
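Illustratively, interpolative distillation can be sketched as training a student against a per-pair interpolation of a biased teacher and a debiased teacher; the weight `w_ui` and the MSE distillation loss below are placeholders rather than InterD's exact formulation.

```python
import torch
import torch.nn.functional as F

def interpolative_distillation_loss(student_pred, biased_pred, debiased_pred, w_ui):
    """Distill a student toward a pair-level mixture of two teachers (sketch).

    All tensors have shape (batch,); w_ui in [0, 1] decides, per user-item pair,
    how much the biased teacher is trusted over the debiased one.
    """
    teacher = w_ui * biased_pred + (1.0 - w_ui) * debiased_pred  # pair-level interpolation
    return F.mse_loss(student_pred, teacher.detach())            # student matches the mixture
```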
Abstract: Recent years have witnessed the strong accuracy of graph-based Collaborative Filtering (CF) models for recommender systems. By taking the user-item interaction behavior as a graph, these graph-based CF models borrow the success of Graph Neural Networks (GNN) and iteratively perform neighborhood aggregation to propagate the collaborative signals. While conventional CF models are known to face the challenge of popularity bias that favors popular items, one may wonder "Do the existing graph-based CF models alleviate or exacerbate the popularity bias of recommender systems?" To answer this question, we first investigate the two-fold performance w.r.t. accuracy and novelty for existing graph-based CF methods. The empirical results show that the symmetric neighborhood aggregation adopted by most existing graph-based CF models exacerbates the popularity bias, and this phenomenon becomes more serious as the depth of graph propagation increases. Further, we theoretically analyze the cause of popularity bias for graph-based CF. Then, we propose a simple yet effective plugin, namely r-AdjNorm, to achieve an accuracy-novelty trade-off by controlling the normalization strength in the neighborhood aggregation process. Meanwhile, r-AdjNorm can be smoothly applied to existing graph-based CF backbones without additional computation. Finally, experimental results on three benchmark datasets show that our proposed method can improve novelty without sacrificing accuracy under various graph-based CF backbones.
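The normalization-strength idea can be sketched as follows, assuming a LightGCN-style propagation step; r = 0.5 recovers the standard symmetric normalization, while other values re-weight the influence of high-degree (popular) nodes. The exact parameterization used by r-AdjNorm may differ; this is only an illustration.

```python
import numpy as np

def normalized_adj(adj, r=0.5):
    """D^{-r} A D^{-(1-r)} normalization of an adjacency matrix (illustrative sketch).

    r = 0.5 gives the usual symmetric normalization; smaller or larger r changes
    how strongly popular (high-degree) nodes dominate neighborhood aggregation.
    """
    deg = adj.sum(axis=1)
    deg = np.where(deg > 0, deg, 1.0)            # guard against isolated nodes
    return np.diag(deg ** (-r)) @ adj @ np.diag(deg ** (-(1.0 - r)))

# one propagation step with adjustable normalization strength
adj = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
embeddings = np.random.randn(3, 4)
propagated = normalized_adj(adj, r=0.4) @ embeddings
```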
Abstract: Recommender systems usually face popularity bias. From the popularity distribution shift perspective, the normal paradigm trained on exposed items (most of which are hot items) learns that recommending popular items more frequently can achieve lower loss, thus injecting popularity information into the item property embedding, e.g., the id embedding. From the long-tail distribution shift perspective, the sparse interactions of long-tail items lead to insufficient learning of these items. The resultant distribution discrepancy between hot and long-tail items would not only inherit the bias but also amplify it. Existing work addresses this issue with inverse propensity scoring (IPS) or causal embeddings. However, we argue that not all popularity bias is harmful: some items show higher popularity because of better quality or because they conform to current trends, and they deserve more recommendations. Blindly seeking unbiased learning may inhibit high-quality or fashionable items. To make better use of the popularity bias, we propose a co-training disentangled domain adaptation network (CD$^2$AN), which can co-train both biased and unbiased models. Specifically, for the popularity distribution shift, CD$^2$AN disentangles the item property representation and the popularity representation from the item property embedding. For the long-tail distribution shift, we introduce additional unexposed items (most of which are long-tail items) to align the distributions of hot and long-tail item property representations. Further, from the instance perspective, we carefully design an item similarity regularization to learn comprehensive item representations, which encourages item pairs with more effective co-occurrence patterns to have more similar item property representations. Based on offline evaluations and online A/B tests, we show that CD$^2$AN outperforms existing debiased solutions. CD$^2$AN has been successfully deployed in the Mobile Taobao App and is handling major online traffic.
Abstract: Collaborative Filtering (CF) has emerged as a fundamental paradigm for parameterizing users and items into a latent representation space, using their correlative patterns from interaction data. Among various CF techniques, the development of GNN-based recommender systems, e.g., PinSage and LightGCN, has offered state-of-the-art performance. However, two key challenges have not been well explored in existing solutions: i) The over-smoothing effect of deeper graph-based CF architectures may cause indistinguishable user representations and degradation of recommendation results. ii) The supervision signals (i.e., user-item interactions) are usually scarce and skewed in distribution in reality, which limits the representation power of CF paradigms. To tackle these challenges, we propose a new self-supervised recommendation framework, Hypergraph Contrastive Collaborative Filtering (HCCF), to jointly capture local and global collaborative relations with a hypergraph-enhanced cross-view contrastive learning architecture. In particular, the designed hypergraph structure learning enhances the discrimination ability of the GNN-based CF paradigm by comprehensively capturing the complex high-order dependencies among users. Additionally, our HCCF model effectively integrates the hypergraph structure encoding with self-supervised learning to reinforce the representation quality of recommender systems, based on hypergraph self-discrimination. Extensive experiments on three benchmark datasets demonstrate the superiority of our model over various state-of-the-art recommendation methods, and its robustness against sparse user interaction data. The implementation codes are available at https://github.com/akaxlh/HCCF.
Abstract: Learning informative representations of users and items from the historical interactions is crucial to collaborative filtering (CF). Existing CF approaches usually model interactions solely within the Euclidean space. However, the sophisticated user-item interactions inherently present highly non-Euclidean anatomy with various types of geometric patterns (i.e., tree-likeness and cyclic structures). The Euclidean-based models may be inadequate to fully uncover the intent factors beneath such hybrid-geometry interactions. To remedy this deficiency, in this paper, we study the novel problem of Geometric Disentangled Collaborative Filtering (GDCF), which aims to reveal and disentangle the latent intent factors across multiple geometric spaces. A novel generative GDCF model is proposed to learn geometric disentangled representations by inferring the high-level concepts associated with user intentions and various geometries. Empirically, our proposal is extensively evaluated over five real-world datasets, and the experimental results demonstrate the superiority of GDCF.
Abstract: Collaborative filtering is one of the most common scenarios and popular research topics in recommender systems. Among existing methods, latent factor models, i.e., learning a specific embedding for each user/item by reconstructing the observed interaction matrix, have shown excellent performance. However, such user-specific and item-specific embeddings are intrinsically transductive, making it difficult for them to deal with new users and new items unseen during training. Besides, the number of model parameters heavily depends on the number of all users and items, restricting their scalability to real-world applications. To solve the above challenges, in this paper, we propose a novel model-agnostic and scalable Inductive Embedding Module for collaborative filtering, namely INMO. INMO generates inductive embeddings for users (items) by characterizing their interactions with some template items (template users), instead of employing an embedding lookup table. Based on theoretical analysis, we further propose an effective indicator for the selection of template users and template items. Our proposed INMO can be attached to existing latent factor models as a pre-module, inheriting the expressiveness of the backbone models while bringing inductive ability and reducing model parameters. We validate the generality of INMO by attaching it to Matrix Factorization (MF) and LightGCN, which are two representative latent factor models for collaborative filtering. Extensive experiments on three public benchmarks demonstrate the effectiveness and efficiency of INMO in both transductive and inductive recommendation scenarios.
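A minimal sketch of the inductive idea, under the assumption of a simple mean aggregation: a user's embedding is computed from that user's interaction pattern over a fixed set of template items rather than looked up in a per-user table, so users unseen during training still receive embeddings. INMO's actual module is richer than this.

```python
import torch
import torch.nn as nn

class InductiveUserEmbedding(nn.Module):
    """Embed users from their interactions with template items (illustrative sketch)."""

    def __init__(self, num_template_items, emb_dim):
        super().__init__()
        # one learnable vector per *template item*, not per user
        self.template_emb = nn.Parameter(0.01 * torch.randn(num_template_items, emb_dim))

    def forward(self, interaction_rows):
        # interaction_rows: (batch, num_template_items) 0/1 history over template items
        counts = interaction_rows.sum(dim=1, keepdim=True).clamp(min=1.0)
        return interaction_rows @ self.template_emb / counts  # mean of interacted templates

# a brand-new user unseen during training still gets an embedding
module = InductiveUserEmbedding(num_template_items=100, emb_dim=16)
new_user = torch.zeros(1, 100)
new_user[0, [3, 7, 42]] = 1.0
user_vec = module(new_user)
```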
Abstract: Due to the widespread presence of implicit feedback, recommendation based on it has been a long-standing research problem in academia and industry. However, it suffers from extreme sparsity, since each user only interacts with a few items. One well-known and well-performing method is to treat all of each user's uninteracted items as negatives with low confidence. This method intrinsically imposes an implicit regularization that penalizes large deviations of each user's preferences for uninteracted items from a constant. However, such methods have to assume a constant-rating prior for uninteracted items, which may be questionable. In this paper, we propose a novel ring-based regularization that penalizes significant differences between each user's preferences for an item and for some other items. The ring structure, described by an item graph, determines which other items are selected for each item in the regularization. The regularization not only avoids introducing prior ratings but also, according to theoretical analysis, implicitly penalizes remarkable preference differences across all items. However, optimizing recommenders with the regularization still poses computational challenges, so we develop a scalable alternating least squares algorithm by carefully designing the gradient computation. Therefore, as long as each item is connected with a sublinear/constant number of other items in the item graph, the overall learning algorithm is comparably efficient to existing algorithms. The proposed regularization is extensively evaluated on several public recommendation datasets, where the results show that it leads to considerable improvements in recommendation performance.
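The regularizer can be read as penalizing, for every user, squared differences between the predicted preference for an item and for its neighbors in the item graph. The dense loop below only illustrates the objective; it is not the paper's scalable alternating least squares solver.

```python
import numpy as np

def item_graph_regularizer(pred, item_edges, lam=0.1):
    """Penalize preference gaps across item-graph edges (illustrative sketch).

    pred:       (num_users, num_items) predicted preference matrix
    item_edges: iterable of (i, j) item index pairs connected in the item graph
    """
    reg = 0.0
    for i, j in item_edges:
        diff = pred[:, i] - pred[:, j]       # per-user preference gap for the item pair
        reg += float(diff @ diff)            # sum of squared gaps over users
    return lam * reg
```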
Abstract: Recommender systems aim to provide personalized services to users and are playing an increasingly important role in our daily lives. The key of recommender systems is to predict how likely users will interact with items based on their historical online behaviors, e.g., clicks, add-to-cart, purchases, etc. To exploit these user-item interactions, there are increasing efforts on considering the user-item interactions as a user-item bipartite graph and then performing information propagation in the graph via Graph Neural Networks (GNNs). Given the power of GNNs in graph representation learning, these GNNs-based recommendation methods have remarkably boosted the recommendation performance. Despite their success, most existing GNNs-based recommender systems overlook the existence of interactions caused by unreliable behaviors (e.g., random/bait clicks) and uniformly treat all the interactions, which can lead to sub-optimal and unstable performance. In this paper, we investigate the drawbacks (e.g., non-adaptive propagation and non-robustness) of existing GNN-based recommendation methods. To address these drawbacks, we introduce a principled graph trend collaborative filtering method and propose the Graph Trend Filtering Networks for recommendations (GTN) that can capture the adaptive reliability of the interactions. Comprehensive experiments and ablation studies are presented to verify and understand the effectiveness of the proposed framework. Our implementation based on PyTorch is available: https://github.com/wenqifan03/GTN-SIGIR2022.
Abstract: Recently, graph neural networks (GNN) have been successfully applied to recommender systems as an effective collaborative filtering (CF) approach. However, existing GNN-based CF models suffer from noisy user-item interaction data, which seriously affects the effectiveness and robustness in real-world applications. Although there have been several studies on data denoising in recommender systems, they either neglect direct intervention of noisy interaction in the message-propagation of GNN, or fail to preserve the diversity of recommendation when denoising. To tackle the above issues, this paper presents a novel GNN-based CF model, named Robust Graph Collaborative Filtering (RGCF), to denoise unreliable interactions for recommendation. Specifically, RGCF consists of a graph denoising module and a diversity preserving module. The graph denoising module is designed for reducing the impact of noisy interactions on the representation learning of GNN, by adopting both a hard denoising strategy (i.e., discarding interactions that are confidently estimated as noise) and a soft denoising strategy (i.e., assigning reliability weights for each remaining interaction). In the diversity preserving module, we build up a diversity augmented graph and propose an auxiliary self-supervised task based on mutual information maximization (MIM) for enhancing the denoised representation and preserving the diversity of recommendation. These two modules are integrated in a multi-task learning manner that jointly improves the recommendation performance. We conduct extensive experiments on three real-world datasets and three synthesized datasets. Experiment results show that RGCF is more robust against noisy interactions and achieves significant improvement compared with baseline models.
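The hard/soft denoising combination can be sketched as: estimate a reliability score per observed interaction, drop interactions below a threshold (hard), and use the remaining scores as propagation weights (soft). The cosine-similarity reliability estimate below is an assumption used only for illustration.

```python
import torch
import torch.nn.functional as F

def denoise_interactions(user_emb, item_emb, edges, hard_threshold=0.1):
    """Hard + soft denoising of observed user-item interactions (illustrative sketch).

    edges: (n, 2) long tensor of observed (user_index, item_index) pairs.
    Returns the kept edges and a soft reliability weight for each of them.
    """
    u = F.normalize(user_emb[edges[:, 0]], dim=1)
    i = F.normalize(item_emb[edges[:, 1]], dim=1)
    reliability = (u * i).sum(dim=1).clamp(min=0.0)   # cosine similarity in [0, 1]
    keep = reliability >= hard_threshold              # hard: discard confident noise
    return edges[keep], reliability[keep]             # soft: weight what remains
```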
Abstract: User simulation has been a cost-effective technique for evaluating conversational recommender systems. However, building a human-like simulator is still an open challenge. In this work, we focus on how users reformulate their utterances when a conversational agent fails to understand them. First, we perform a user study, involving five conversational agents across different domains, to identify common reformulation types and their transition relationships. A common pattern that emerges is that persistent users would first try to rephrase, then simplify, before giving up. Next, to incorporate the observed reformulation behavior in a user simulator, we introduce the task of reformulation sequence generation: to generate a sequence of reformulated utterances with a given intent (rephrase or simplify). We develop methods by extending transformer models guided by the reformulation type and perform further filtering based on estimated reading difficulty. We demonstrate the effectiveness of our approach using both automatic and human evaluation.
Abstract: Conversational question answering (ConvQA) tackles sequential information needs where contexts in follow-up questions are left implicit. Current ConvQA systems operate over homogeneous sources of information: either a knowledge base (KB), or a text corpus, or a collection of tables. This paper addresses the novel issue of jointly tapping into all of these together, this way boosting answer coverage and confidence. We present CONVINSE, an end-to-end pipeline for ConvQA over heterogeneous sources, operating in three stages: i) learning an explicit structured representation of an incoming question and its conversational context, ii) harnessing this frame-like representation to uniformly capture relevant evidences from KB, text, and tables, and iii) running a fusion-in-decoder model to generate the answer. We construct and release the first benchmark, ConvMix, for ConvQA over heterogeneous sources, comprising 3000 real-user conversations with 16000 questions, along with entity annotations, completed question utterances, and question paraphrases. Experiments demonstrate the viability and advantages of our method, compared to state-of-the-art baselines.
Abstract: Generating fluent and informative natural responses while maintaining representative internal states for search optimization is critical for conversational search systems. Existing approaches either 1) predict structured dialog acts first and then generate natural responses; or 2) map conversation context to natural responses directly in an end-to-end manner. Both kinds of approaches have shortcomings. The former suffers from error accumulation, while the semantic associations between structured acts and natural responses are confined to a single direction. The latter emphasizes generating natural responses but fails to predict structured acts. Therefore, we propose a neural co-generation model that generates the two concurrently. The key lies in a shared latent space shaped by two informed priors. Specifically, we design structured dialog act and natural response auto-encoding as two auxiliary tasks in an interconnected network architecture. It allows for concurrent generation and bidirectional semantic associations. The shared latent space also enables asynchronous reinforcement learning for further joint optimization. Experiments show that our model achieves significant performance improvements.
Abstract: Conversational recommender systems (CRSs) provide recommendations through interactive conversations. CRSs typically provide recommendations through relatively straightforward interactions, where the system continuously inquires about a user's explicit attribute-aware preferences and then decides which items to recommend. In addition, topic tracking is often used to provide natural-sounding responses. However, merely tracking topics is not enough to recognize a user's real preferences in a dialogue. In this paper, we address the problem of accurately recognizing and maintaining user preferences in CRSs. Three challenges come with this problem: (1) An ongoing dialogue only provides the user's short-term feedback; (2) Annotations of user preferences are not available; and (3) There may be complex semantic correlations among items that feature in a dialogue. We tackle these challenges by proposing an end-to-end variational reasoning approach to the task of conversational recommendation. We model both long-term preferences and short-term preferences as latent variables with topical priors for explicit long-term and short-term preference exploration, respectively. We use an efficient stochastic gradient variational Bayesian (SGVB) estimator for optimizing the derived evidence lower bound. A policy network is then used to predict topics for a clarification utterance or items for a recommendation response. The use of explicit sequences of preferences with multi-hop reasoning in a heterogeneous knowledge graph helps to provide more accurate conversational recommendation results. Extensive experiments conducted on two benchmark datasets show that our proposed method outperforms state-of-the-art baselines in terms of both objective and subjective evaluation metrics.
Abstract: Conversational search is a crucial and promising branch in information retrieval. In this paper, we reveal that not all historical conversational turns are necessary for understanding the intent of the current query. The redundant noisy turns in the context largely hinder the improvement of search performance. However, enhancing the context denoising ability for conversational search is quite challenging due to data scarcity and the difficulty of simultaneously learning conversational query encoding and context denoising. To address these issues, in this paper, we present a novel Curriculum cOntrastive conTExt Denoising framework, COTED, towards few-shot conversational dense retrieval. Under a curriculum training order, we progressively endow the model with the capability of context denoising via contrastive learning between noised samples and denoised samples generated by a new conversation data augmentation strategy. Three curriculums tailored to conversational search are exploited in our framework. Extensive experiments on two few-shot conversational search datasets, i.e., CAsT-19 and CAsT-20, validate the effectiveness and superiority of our method compared with state-of-the-art baselines.
Abstract: Recently, pre-training methods have shown remarkable success in task-oriented dialog (TOD) systems. However, most existing pre-trained models for TOD focus on either dialog understanding or dialog generation, but not both. In this paper, we propose SPACE, a novel unified pre-trained dialog model learning from large-scale dialog corpora with limited annotations, which can be effectively fine-tuned on a wide range of downstream dialog tasks. Specifically, SPACE consists of four successive components in a single transformer to maintain a task-flow in TOD systems: (i) a dialog encoding module to encode dialog history, (ii) a dialog understanding module to extract semantic vectors from either user queries or system responses, (iii) a dialog policy module to generate a policy vector that contains high-level semantics of the response, and (iv) a dialog generation module to produce appropriate responses. We design a dedicated pre-training objective for each component. Concretely, we pre-train the dialog encoding module with span mask language modeling to learn contextualized dialog information. To capture the structured dialog semantics, we pre-train the dialog understanding module via a novel tree-induced semi-supervised contrastive learning objective with the help of extra dialog annotations. In addition, we pre-train the dialog policy module by minimizing the ℒ2 distance between its output policy vector and the semantic vector of the response for policy optimization. Finally, the dialog generation model is pre-trained by language modeling. Results show that SPACE achieves state-of-the-art performance on eight downstream dialog benchmarks, including intent prediction, dialog state tracking, and end-to-end dialog modeling. We also show that SPACE has a stronger few-shot ability than existing models under the low-resource setting.
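The policy pre-training objective described above can be sketched as a plain L2 distance between the policy module's output vector and the semantic vector of the gold response; treating the target as a constant (stop-gradient) is an assumption made here for illustration.

```python
import torch
import torch.nn.functional as F

def policy_pretrain_loss(policy_vector, response_semantic_vector):
    """L2 objective aligning the policy vector with the response semantics (sketch).

    policy_vector:            (batch, d) output of the dialog policy module
    response_semantic_vector: (batch, d) semantic vector of the gold response
                              produced by the dialog understanding module
    """
    target = response_semantic_vector.detach()   # assumed: no gradient into the target
    return F.mse_loss(policy_vector, target)     # mean squared L2 distance over the batch
```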
Abstract: Maintaining a consistent persona is essential for building a human-like conversational model. However, the lack of attention to the partner makes such models more egocentric: they tend to show their persona by all means, such as stiffly twisting the topic, pulling the conversation toward their own interests regardless of the partner, and rambling about their persona with little curiosity about the partner. In this work, we propose COSPLAY (COncept Set guided PersonaLized dialogue generation Across both partY personas), which considers both parties as a "team": expressing self-persona while keeping curiosity toward the partner, leading responses around mutual personas, and finding the common ground. Specifically, we first represent the self-persona, partner persona, and mutual dialogue all in concept sets. Then, we propose the Concept Set framework with a suite of knowledge-enhanced operations to process them, such as set algebras, set expansion, and set distance. Using these operations as a medium, we train the model by utilizing 1) concepts of both party personas, 2) the concept relationship between them, and 3) their relationship to the future dialogue. Extensive experiments on a large public dataset, Persona-Chat, demonstrate that our model outperforms state-of-the-art baselines in generating less egocentric, more human-like, and higher-quality responses in both automatic and human evaluations.
Abstract: Proactive dialogue systems are able to lead the conversation to a goal topic and have great potential in bargaining, persuasion, and negotiation. However, the current corpus-based learning manner limits their practical application in real-world scenarios. To this end, we contribute to advancing the study of proactive dialogue policy to a more natural and challenging setting, i.e., interacting dynamically with users. Further, we call attention to non-cooperative user behavior - the user talks about off-path topics when he/she is not satisfied with the previous topics introduced by the agent. We argue that the targets of reaching the goal topic quickly and maintaining high user satisfaction do not always converge, because the topics close to the goal and the topics the user prefers may not be the same. Towards this issue, we propose a new solution named I-Pro that can learn a Proactive policy in the Interactive setting. Specifically, we learn the trade-off via a learned goal weight, which consists of four factors (dialogue turn, goal completion difficulty, user satisfaction estimation, and cooperative degree). The experimental results demonstrate that I-Pro significantly outperforms baselines in terms of effectiveness and interpretability.
Abstract: Conversational recommender systems (CRS) aim to provide high-quality recommendations in conversations. However, most conventional CRS models mainly focus on the dialogue understanding of the current session, ignoring other rich multi-aspect information about the central subjects (i.e., users) in recommendation. In this work, we highlight that the user's historical dialogue sessions and look-alike users are essential sources of user preferences besides the current dialogue session in CRS. To systematically model the multi-aspect information, we propose a User-Centric Conversational Recommendation (UCCR) model, which returns to the essence of user preference learning in CRS tasks. Specifically, we propose a historical session learner to capture users' multi-view preferences from knowledge, semantic, and consuming views as supplements to the current preference signals. A multi-view preference mapper is then employed to learn the intrinsic correlations among different views in current and historical sessions via self-supervised objectives. We also design a temporal look-alike user selector to understand users via their similar users. The learned multi-aspect, multi-view user preferences are then used for recommendation and dialogue generation. In experiments, we conduct comprehensive evaluations on both Chinese and English CRS datasets. The significant improvements over competitive models in both recommendation and dialogue generation verify the superiority of UCCR.
Abstract: Asking clarifying questions is an interactive way to effectively clarify user intent. When a user submits a query, the search engine will return a clarifying question with several clickable items of sub-intents for clarification. According to the existing definition, the key to asking high-quality questions is to generate good descriptions for submitted queries and provided items. However, existing methods, which are mainly based on static knowledge bases, struggle to find descriptions for many queries because of the lack of entities within these queries and their corresponding items. For such queries, they are unable to generate an informative question. To alleviate this problem, we propose leveraging the top search results of the query to help generate better descriptions, because we deem that the top retrieved documents contain rich and relevant contexts of the query. Specifically, we first design a rule-based algorithm to extract description candidates from search results and rank them by various human-designed features. Then, we apply a learning-to-rank model and a generative model for generalization and to further improve the quality of clarifying questions. Experimental results show that our proposed methods can generate more readable and informative questions compared with existing methods. The results prove that search results can be utilized to improve users' search experience for search clarification in conversational search systems.
Abstract: Traditional dialogue summarization models rely on a large-scale manually-labeled corpus and lack the ability to generalize to new domains, so domain adaptation from a labeled source domain to an unlabeled target domain is important in practical summarization scenarios. However, existing domain adaptation work in dialogue summarization generally requires large-scale pre-training using extensive external data. To explore lightweight fine-tuning methods, in this paper, we propose an efficient Adversarial Disentangled Prompt Learning (ADPL) model for domain adaptation in dialogue summarization. We introduce three kinds of prompts: a domain-invariant prompt (DIP), a domain-specific prompt (DSP), and a task-oriented prompt (TOP). DIP aims to disentangle and transfer the shared knowledge from the source domain and target domain in an adversarial way, which improves the accuracy of predictions about domain-invariant information and enhances the ability to generalize to new domains. DSP is designed to guide our model to focus on domain-specific knowledge using domain-related features. TOP is designed to capture task-oriented knowledge to generate high-quality summaries. Instead of fine-tuning the whole pre-trained language model (PLM), we only update the prompt networks and keep the PLM fixed. Experimental results in the zero-shot setting show that the novel design of prompts can yield more coherent, faithful, and relevant summaries than baselines using prefix-tuning, and performs on par with fine-tuning while being more efficient. Overall, our work introduces a prompt-based perspective to zero-shot learning for the dialogue summarization task and provides valuable findings and insights for future research.
Abstract: Conversational recommender systems (CRS) enable traditional recommender systems to interact with users by asking questions about attributes and recommending items. The attribute-level and item-level feedback of users can be utilized to estimate users' preferences. However, existing works do not fully exploit the advantage of explicit item feedback --- they only use the item feedback in rather implicit ways such as updating the latent user and item representation. Since CRS has multiple chances to interact with users, leveraging the context in the conversation may help infer users' implicit feedback (e.g., some specific attributes) when recommendations get rejected. To address the limitations of existing methods, we propose a new CRS framework called Conversational Recommender with Implicit Feedback (CRIF). CRIF formulates the conversational recommendation scheme as a four-phase process consisting of offline representation learning, tracking, decision, and inference. In the inference module, by fully utilizing the relation between users' attribute-level and item-level feedback, our method can explicitly deduce users' implicit preferences. Therefore, CRIF is able to achieve more accurate user preference estimation. Besides, in the decision module, to better utilize the attribute-level and item-level feedback, we adopt inverse reinforcement learning to learn a flexible decision strategy that selects the suitable action at each conversation turn. Through extensive experiments on four benchmark CRS datasets, we validate the effectiveness of our approach, which significantly outperforms the state-of-the-art CRS methods.
Abstract: Data sparsity is a long-standing problem in recommender systems. To alleviate it, Cross-Domain Recommendation (CDR) has attracted a surge of interest; it utilizes the rich user-item interaction information from a related source domain to improve performance on the sparse target domain. Recent CDR approaches pay attention to aggregating source domain information to generate better user representations for the target domain. However, they focus on designing more powerful interaction encoders to learn both domains simultaneously, but fail to model the different user preferences of different domains. In particular, domain-specific preferences of the source domain usually provide information that is useless for enhancing performance in the target domain, and directly aggregating the domain-shared and domain-specific information together may hurt target-domain performance. This work considers a key challenge of CDR: How do we transfer shared information across domains? Grounded in information theory, we propose DisenCDR, a novel model to disentangle the domain-shared and domain-specific information. To reach our goal, we propose two mutual-information-based disentanglement regularizers. Specifically, an exclusive regularizer enforces that the user domain-shared representations and domain-specific representations encode exclusive information. An information regularizer encourages the user domain-shared representations to encode predictive information for both domains. Based on them, we further derive a tractable bound of our disentanglement objective to learn desirable disentangled representations. Extensive experiments show that DisenCDR achieves significant improvements over state-of-the-art baselines on four real-world datasets.
Abstract: The goal of cross-domain retrieval (CDR) is to search for instances of the same category in one domain by using a query from another domain. Existing CDR approaches mainly consider the standard scenario that the cross-domain data for both training and testing come from the same categories and underlying distributions. However, these methods cannot be well extended to the newly emerging task of universal cross-domain retrieval (UCDR), where the testing data belong to the domain and categories not present during training. Compared to CDR, the UCDR task is more challenging due to (1) visually diverse data from multi-source domains, (2) the domain shift between seen and unseen domains, and (3) the semantic shift across seen and unseen categories. To tackle these problems, we propose a novel model termed Structure-Aware Semantic-Aligned Network (SASA) to align the heterogeneous representations of multi-source domains without loss of generalizability for the UCDR task. Specifically, we leverage the advanced Vision Transformer (ViT) as the backbone and devise a distillation-alignment ViT (DAViT) with a novel token-based strategy, which incorporates two complementary distillation and alignment tokens into the ViT architecture. In addition, the distillation token is devised to improve the generalizability of our model by structure information preservation and the alignment token is used to improve discriminativeness with trainable categorical prototypes. Extensive experiments on three large-scale benchmarks, i.e., Sketchy, TU-Berlin, and DomainNet, demonstrate the superiority of our SASA method over the state-of-the-art UCDR and ZS-SBIR methods.
Abstract: Interactive recommender systems (IRS) have received wide attention in recent years. To capture users' dynamic preferences and maximize their long-term engagement, IRS are usually formulated as reinforcement learning (RL) problems. Despite the promise to solve complex decision-making problems, RL-based methods generally require a large amount of online interaction, restricting their applications due to economic considerations. One possible direction to alleviate this issue is cross-domain recommendation that aims to leverage abundant logged interaction data from a source domain (e.g., adventure genre in movie recommendation) to improve the recommendation quality in the target domain (e.g., crime genre). Nevertheless, prior studies mostly focus on adapting the static representations of users/items. Few have explored how the temporally dynamic user-item interaction patterns transform across domains. Motivated by the above consideration, we propose DACIR, a novel Doubly-Adaptive deep RL-based framework for Cross-domain Interactive Recommendation. We first pinpoint how users behave differently in two domains and highlight the potential to leverage the shared user dynamics to boost IRS. To transfer static user preferences across domains, DACIR enforces consistency of item representation by aligning embeddings into a shared latent space. In addition, given the user dynamics in IRS, DACIR calibrates the dynamic interaction patterns in two domains via reward correlation. Once the double adaptation narrows the cross-domain gap, we are able to learn a transferable policy for the target recommender by leveraging logged data. Experiments on real-world datasets validate the superiority of our approach, which consistently achieves significant improvements over the baselines.
Abstract: Cross-domain Named Entity Recognition (NER) aims to transfer knowledge from the source domain to the target domain, alleviating expensive labeling costs in the target domain. Most prior studies acquire domain-invariant features under the end-to-end sequence-labeling framework, where each token is assigned a compositional label (e.g., B-LOC). However, this complicated labeling scheme may increase the complexity of cross-domain transfer, which leads to sub-optimal results, especially when entity categories differ significantly across domains. In this paper, we aim to explore task decomposition in cross-domain NER. Concretely, we suggest a modular learning approach in which two sub-tasks (entity span detection and type classification) are learned by separate functional modules that perform their respective cross-domain transfer with corresponding strategies. Compared with the compositional labeling scheme, the label spaces are smaller and closer across domains, especially in entity span detection, leading to easier transfer in each sub-task. We then combine the two sub-tasks to achieve the final result with a modular interaction mechanism, and deploy adversarial regularization for generalized and robust learning in low-resource target domains. Extensive experiments over 10 diverse domain pairs demonstrate that the proposed method is superior to state-of-the-art cross-domain NER methods in an end-to-end fashion (an average absolute F1 score increase of about 6.4%). Further analyses show the effectiveness of modular task decomposition and its great potential in cross-domain NER.
Abstract: Cross-Domain Recommendation (CDR) has been popularly studied to utilize knowledge from different domains to solve the cold-start problem in recommender systems. Most existing CDR models assume that both the source and target domains share the same overlapped user set for knowledge transfer. However, only a small proportion of users are simultaneously active on both the source and target domains in practical CDR tasks. In this paper, we focus on the Partially Overlapped Cross-Domain Recommendation (POCDR) problem, that is, how to leverage the information of both the overlapped and non-overlapped users to improve recommendation performance. Existing approaches cannot fully utilize the useful knowledge behind the non-overlapped users across domains, which limits model performance when the majority of users turn out to be non-overlapped. To address this issue, we propose an end-to-end Dual-autoencoder with Variational Domain-invariant Embedding Alignment (VDEA) model, a cross-domain recommendation framework for the POCDR problem, which utilizes dual variational autoencoders with both local and global embedding alignment to exploit domain-invariant user embeddings. VDEA first adopts variational inference to capture collaborative user preferences, and then utilizes Gromov-Wasserstein distribution co-clustering optimal transport to cluster users with similar rating interaction behaviors. Our empirical studies on the Douban and Amazon datasets demonstrate that VDEA significantly outperforms state-of-the-art models, especially under the POCDR setting.
Abstract: Click-through rate (CTR) prediction plays an important role in online advertising and recommendation systems, which aims at estimating the probability of a user clicking on a specific item. Feature interaction modeling and user interest modeling methods are two popular domains in CTR prediction, and they have been studied extensively in recent years. However, these methods still suffer from two limitations. First, traditional methods regard item attributes as ID features, while neglecting structure information and relation dependencies among attributes. Second, when mining user interests from user-item interactions, current models ignore user intents and item intents for different attributes, which lacks interpretability. Based on this observation, in this paper, we propose a novel approach Hierarchical Intention Embedding Network (HIEN), which considers dependencies of attributes based on bottom-up tree aggregation in the constructed attribute graph. HIEN also captures user intents for different item attributes as well as item intents based on our proposed hierarchical attention mechanism. Extensive experiments on both public and production datasets show that the proposed model significantly outperforms the state-of-the-art methods. In addition, HIEN can be applied as an input module to state-of-the-art CTR prediction methods, bringing further performance lift for these existing models that might already be intensively used in real systems.
Abstract: Click-Through Rate (CTR) prediction has been widely used in many machine learning tasks such as online advertising and personalization recommendation. Unfortunately, given a domain-specific dataset, searching effective feature interaction operations and combinations from a huge candidate space requires significant expert experience and computational costs. Recently, Neural Architecture Search (NAS) has achieved great success in discovering high-quality network architectures automatically. However, due to the diversity of feature interaction operations and combinations, the existing NAS-based work that treats the architecture search as a black-box optimization problem over a discrete search space suffers from low efficiency. Therefore, it is essential to explore a more efficient architecture search method. To achieve this goal, we propose NAS-CTR, a differentiable neural architecture search approach for CTR prediction. First, we design a novel and expressive architecture search space and a continuous relaxation scheme to make the search space differentiable. Second, we formulate the architecture search for CTR prediction as a joint optimization problem with discrete constraints on architectures and leverage proximal iteration to solve the constrained optimization problem. Additionally, a straightforward yet effective method is proposed to eliminate the aggregation of skip connections. Extensive experimental results reveal that NAS-CTR can outperform the SOTA human-crafted architectures and other NAS-based methods in both test accuracy and search efficiency.
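The continuous relaxation can be sketched DARTS-style: each candidate interaction operation receives an architecture weight, and the block outputs a softmax-weighted mixture, which makes the choice differentiable. The candidate operations listed below and the omission of the paper's proximal-iteration constraints are simplifications for illustration.

```python
import torch
import torch.nn as nn

class RelaxedInteractionBlock(nn.Module):
    """Softmax mixture over candidate interaction operations (illustrative sketch)."""

    def __init__(self, dim):
        super().__init__()
        self.candidates = nn.ModuleList([
            nn.Identity(),                                   # skip / pass-through
            nn.Linear(dim, dim),                             # linear transform
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),   # non-linear transform
        ])
        # one architecture weight per candidate operation
        self.alpha = nn.Parameter(torch.zeros(len(self.candidates)))

    def forward(self, x):
        weights = torch.softmax(self.alpha, dim=0)           # continuous relaxation
        return sum(w * op(x) for w, op in zip(weights, self.candidates))
```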
Abstract: CTR prediction has been widely used in the real world. Many methods model feature interaction to improve their performance. However, most methods only learn a fixed representation for each feature without considering the varying importance of each feature under different contexts, resulting in inferior performance. Recently, several methods tried to learn vector-level weights for feature representations to address the fixed representation issue. However, they only produce linear transformations to refine the fixed feature representations, which are still not flexible enough to capture the varying importance of each feature under different contexts. In this paper, we propose a novel module named Feature Refinement Network (FRNet), which learns context-aware feature representations at bit-level for each feature in different contexts. FRNet consists of two key components: 1) Information Extraction Unit (IEU), which captures contextual information and cross-feature relationships to guide context-aware feature refinement; and 2) Complementary Selection Gate (CSGate), which adaptively integrates the original and complementary feature representations learned in IEU with bit-level weights. Notably, FRNet is orthogonal to existing CTR methods and thus can be applied in many existing methods to boost their performance. Comprehensive experiments are conducted to verify the effectiveness, efficiency, and compatibility of FRNet.
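The bit-level gating idea can be sketched as a sigmoid gate that mixes the original and the complementary (IEU-produced) feature representations element by element; the gate parameterization below is an illustrative assumption, not FRNet's exact design.

```python
import torch
import torch.nn as nn

class BitLevelGate(nn.Module):
    """Mix original and complementary feature representations bit-wise (sketch)."""

    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)   # one weight per embedding dimension ("bit")

    def forward(self, original, complementary):
        # original, complementary: (batch, num_fields, dim)
        g = torch.sigmoid(self.gate(torch.cat([original, complementary], dim=-1)))
        return g * original + (1.0 - g) * complementary   # context-aware refined features
```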
Abstract: Click-Through Rate (CTR) prediction, which aims to estimate the probability that a user will click an item, is an essential component of online advertising. Existing methods mainly attempt to mine user interests from users' historical behaviors, which contain users' directly interacted items. Although these methods have made great progress, they are often limited by the recommender system's direct exposure and inactive interactions, and thus fail to mine all potential user interests. To tackle these problems, we propose Neighbor-Interaction based CTR prediction (NI-CTR), which considers this task under a Heterogeneous Information Network (HIN) setting. In short, Neighbor-Interaction based CTR prediction leverages the local neighborhood of the target user-item pair in the HIN to predict their linkage. In order to guide the representation learning of the local neighborhood, we further consider different kinds of interactions among the local neighborhood nodes from both explicit and implicit perspectives, and propose a novel Graph-Masked Transformer (GMT) to effectively incorporate these kinds of interactions and produce highly representative embeddings for the target user-item pair. Moreover, in order to improve model robustness against neighbor sampling, we enforce a consistency regularization loss over the neighborhood embedding. We conduct extensive experiments on two real-world datasets with millions of instances, and the experimental results show that our proposed method outperforms state-of-the-art CTR models significantly. Meanwhile, comprehensive ablation studies verify the effectiveness of every component of our model. Furthermore, we have deployed this framework on the WeChat Official Account Platform with billions of users. The online A/B tests demonstrate an average CTR improvement of 21.9% against all online baselines.
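Graph-masked attention can be sketched as standard self-attention whose logits are blocked wherever the chosen relation graph has no edge, so a head can be restricted to one kind of interaction among the sampled neighborhood nodes. The single-head, unbatched version below is purely illustrative.

```python
import torch

def graph_masked_attention(node_feats, adj_mask, w_q, w_k, w_v):
    """Single-head self-attention restricted by a graph mask (illustrative sketch).

    node_feats: (n, d) features of the sampled neighborhood nodes
    adj_mask:   (n, n) boolean matrix, True where attention is allowed
                (assumed True on the diagonal so every node attends to itself)
    w_q, w_k, w_v: (d, d_k) projection matrices
    """
    q, k, v = node_feats @ w_q, node_feats @ w_k, node_feats @ w_v
    scores = q @ k.t() / (q.shape[-1] ** 0.5)
    scores = scores.masked_fill(~adj_mask, float("-inf"))   # block non-edges
    return torch.softmax(scores, dim=-1) @ v
```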
Abstract: Accurate estimation of the post-click conversion rate (CVR) is critical for building recommender systems, which has long been confronted with sample selection bias and data sparsity issues. Methods in the Entire Space Multi-task Model (ESMM) family leverage the sequential pattern of user actions, i.e., $impression \rightarrow click \rightarrow conversion$, to address the data sparsity issue. However, they still fail to ensure the unbiasedness of CVR estimates. In this paper, we theoretically demonstrate that ESMM suffers from the following two problems: (1) Inherent Estimation Bias (IEB) for CVR estimation, where the CVR estimate is inherently higher than the ground truth; and (2) Potential Independence Priority (PIP) for CTCVR estimation, where ESMM might overlook the causality from click to conversion. To this end, we devise a principled approach named Entire Space Counterfactual Multi-task Modelling (ESCM$^2$), which employs a counterfactual risk minimizer as a regularizer in ESMM to address both the IEB and PIP issues simultaneously. Extensive experiments on offline datasets and online environments demonstrate that our proposed ESCM$^2$ can largely mitigate the inherent IEB and PIP issues and achieve better performance than baseline models.
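The counterfactual regularizer can be sketched as an inverse-propensity-scored CVR risk over the entire impression space, with the model's own CTR estimate acting as the propensity; the way the three terms are combined and the clipping constant below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def escm2_style_loss(ctr_pred, cvr_pred, click, conversion, lam=1.0, clip=0.05):
    """ESMM-style losses plus an IPS counterfactual CVR regularizer (sketch).

    ctr_pred, cvr_pred: (n,) predicted click and post-click conversion probabilities
    click, conversion:  (n,) observed 0/1 labels (float) over all impressions
    """
    ctr_loss = F.binary_cross_entropy(ctr_pred, click)
    ctcvr_loss = F.binary_cross_entropy(ctr_pred * cvr_pred, click * conversion)
    # counterfactual risk: weight clicked impressions by 1 / p(click)
    propensity = ctr_pred.detach().clamp(min=clip)
    ips_cvr = (click / propensity) * F.binary_cross_entropy(
        cvr_pred, conversion, reduction="none")
    return ctr_loss + ctcvr_loss + lam * ips_cvr.mean()
```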
Abstract: The related work section is an important component of a scientific paper, which highlights the contribution of the target paper in the context of the reference papers. Authors can save time and effort by using an automatically generated related work section as a draft to complete the final related work. Most existing related work section generation methods rely on extracting off-the-shelf sentences to make a comparative discussion about the target work and the reference papers. However, such sentences need to be written in advance and are hard to obtain in practice. Hence, in this paper, we propose an abstractive target-aware related work generator (TAG), which can generate related work sections consisting of new sentences. Concretely, we first propose a target-aware graph encoder, which models the relationships between the reference papers and the target paper with target-centered attention mechanisms. In the decoding process, we propose a hierarchical decoder that attends to the nodes of different levels in the graph with keyphrases as semantic indicators. Finally, to generate a more informative related work section, we propose multi-level contrastive optimization objectives, which aim to maximize the mutual information between the generated related work and the references, and to minimize that with the non-references. Extensive experiments on two public scholarly datasets show that the proposed model brings substantial improvements over several strong baselines in terms of automatic and tailored human evaluations.
Abstract: Information seeking in an academic digital library is complex in nature, often spanning multiple search sessions. Resuming academic search tasks requires significant cognitive effort as searchers must re-acquaint themselves with previous search session activities and previously discovered documents before resuming their search. Further, some academic searchers may find it convenient to initiate such searches on their mobile devices during short gaps in time (e.g., between classes), and resume them later in a desktop environment when they can use the extra screen space and more convenient document storage capabilities of their computers. To support such searching, we have developed an academic digital library search interface that assists searchers in managing cross-session search tasks even when moving between mobile and desktop environments. Using a controlled laboratory study we compared our approach (Dilex) to a standard academic digital library search interface. We found increased user engagement in both the initial (mobile) and resumed (desktop) search activities, and that participants spent more time on the search results pages and had an increased degree of interaction with information and personalization features during the resumed tasks. These results provide evidence that the participants were able to make effective use of the visualization features in Dilex, which enabled them to readily resume their search tasks and stay engaged in the search activities. This work represents an example of how semi-automatic search task/session management and visualization features can support cross-session search, and how designing for both mobile and desktop use can support cross-device search.
Abstract: With the ever-increasing popularity of microservice architecture, a considerable number of enterprises and organizations have encapsulated their complex business services into various lightweight functions and published them as accessible APIs (Application Programming Interfaces). Through keyword search, a software developer can select a set of APIs from a massive number of candidates to implement the functions of a complex mashup, which reduces the development cost significantly. However, traditional keyword search methods for APIs often suffer from several critical issues, such as functional incompatibility and limited diversity in search results, which may lead to mashup creation failures and lower development productivity. To deal with these challenges, this paper designs DAWAR, a diversity-aware Web API recommendation approach that finds diversified and compatible APIs for mashup creation. Specifically, the API recommendation problem for mashup creation is modeled as a graph search problem that aims to find the minimal group Steiner trees in a correlation graph of APIs. DAWAR innovatively employs determinantal point processes to diversify the recommended results. Empirical evaluation is performed on commonly-used real-world datasets, and the statistical results show that DAWAR is able to achieve significant improvements in terms of recommendation diversity, accuracy, and compatibility.
Abstract: Knowledge tracing (KT), which aims at predicting a learner's knowledge mastery, plays an important role in computer-aided educational systems. The goal of KT is to provide personalized learning paths for learners by diagnosing their mastery of each knowledge concept, thus improving learning efficiency. In recent years, many deep learning models have been applied to tackle the KT task and have shown promising results. However, most existing methods simplify the exercising records as knowledge sequences, which fails to explore the rich information that exists in exercises. Besides, the existing diagnosis results of knowledge tracing are not convincing enough since they neglect hierarchical relations between exercises. To solve the above problems, we propose a hierarchical graph knowledge tracing model called HGKT to explore the latent complex relations between exercises. Specifically, we introduce the concept of problem schema to construct a hierarchical exercise graph that can model the exercise learning dependencies. Moreover, we employ two attention mechanisms to highlight important historical states of learners. In the testing stage, we present a knowledge&schema diagnosis matrix that can trace the transition of mastery of knowledge and problem schema, which can be more easily applied to different applications. Extensive experiments show the effectiveness and interpretability of our proposed model.
Abstract: Computerized Adaptive Testing (CAT) is a promising testing mode in personalized online education (e.g., GRE), which aims at measuring a student's proficiency accurately while reducing test length. The "adaptive" is reflected in its selection algorithm, which retrieves the best-suited questions for a student based on his/her estimated proficiency at each test step. Although there are many sophisticated selection algorithms for improving CAT's effectiveness, they are restricted and perturbed by the accuracy of the current proficiency estimate, and thus lack robustness. To this end, we investigate a general method to enhance the robustness of existing algorithms by leveraging the student's "multi-facet" nature during tests. Specifically, we present a generic optimization criterion, Robust Adaptive Testing (RAT), for proficiency estimation via fusing multiple estimates at each step, which maintains a multi-facet description of the student's potential proficiency. We further provide theoretical analyses of such an estimator's desirable statistical properties: asymptotic unbiasedness, efficiency, and consistency. Extensive experiments on perturbed synthetic data and three real-world datasets show that selection algorithms in our RAT framework are robust and yield substantial improvements.
Abstract: Knowledge Tracing (KT), which aims to assess students' dynamic knowledge states when practicing on various questions, is a fundamental research task for offering intelligent services in online learning systems. Researchers have devoted significant efforts to developing KT models with impressive performance. However, in existing KT methods, the related question difficulty level, which directly affects students' knowledge state in learning, has not been effectively explored and employed. In this paper, we focus on exploring the question difficulty effect on learning to improve student's knowledge state assessment and propose the DIfficulty Matching Knowledge Tracing (DIMKT) model. Specifically, we first explicitly incorporate the difficulty level into the question representation. Then, to establish the relation between students' knowledge state and the question difficulty level during the practice process, we accordingly design an adaptive sequential neural network in three stages: (1) measuring students' subjective feelings of the question difficulty before practice; (2) estimating students' personalized knowledge acquisition while answering questions of different difficulty levels; (3) updating students' knowledge state in varying degrees to match the question difficulty level after practice. Finally, we conduct extensive experiments on real-world datasets, and the results demonstrate that DIMKT outperforms state-of-the-art KT models. Moreover, DIMKT shows superior interpretability by exploring the question difficulty effect when making predictions. Our codes are available at https://github.com/shshen-closer/DIMKT.
Abstract: The truncation of ranking lists predicted by retrieval models is vital to ensure users' search experience. Particularly, in specific vertical domains where documents are usually complicated and extensive (e.g., legal cases), the cost of browsing results is much higher than traditional IR tasks (e.g., Web search) and setting a reasonable cut-off position is quite necessary. While it is straightforward to apply existing result list truncation approaches to legal case retrieval, the effectiveness of these methods is limited because they only focus on simple document statistics and usually fail to capture the context information of documents in the ranking list. These existing efforts also treat result list truncation as an isolated task instead of a component in the entire ranking process, limiting the usage of truncation in practical systems. To tackle these limitations, we propose LeCut, a ranking list truncation model for legal case retrieval. LeCut utilizes contextual features of the retrieval task to capture the semantic-level similarity between documents and decides the best cut-off position with attention mechanisms. We further propose a Joint Optimization of Truncation and Reranking (JOTR) framework based on LeCut to improve the performance of truncation and retrieval tasks simultaneously. Comparison against competitive baselines on public benchmark datasets demonstrates the effectiveness of LeCut and JOTR. A case study is conducted to visualize the cut-off positions of LeCut and the process of how JOTR improves both retrieval and truncation tasks.
Abstract: Cold-start diagnosis prediction is a challenging task for AI in healthcare, where often only a few visits per patient and a few observations per disease can be exploited. Although meta-learning is widely adopted to address the data sparsity problem in general domains, directly applying it to healthcare data is less effective, since it is unclear how to capture both the temporal relations in clinical visits and the complicated relations among syndromic diseases for precise personalized diagnosis. To this end, we first propose a novel Meta-learning framework for cold-start diagnosis prediction in healthCare data (MetaCare). By explicitly encoding the effects of disease progress over time as a generalization prior, MetaCare dynamically predicts future diagnosis and timestamp for infrequent patients. Then, to model complicated relations among rare diseases, we propose to utilize domain knowledge of hierarchical relations among diseases, and further perform diagnosis subtyping to mine the latent syndromic relations among diseases. Finally, to tailor the generic meta-learning framework with personalized parameters, we design a hierarchical patient subtyping mechanism and bridge the modeling of both infrequent patients and rare diseases. We term the joint model as MetaCare++. Extensive experiments on two real-world benchmark datasets show significant performance gains brought by MetaCare++, yielding average improvements of 7.71% for diagnosis prediction and 13.94% for diagnosis time prediction over the state-of-the-art baselines.
Abstract: Biomedical natural language processing often involves the interpretation of patient descriptions, for instance for diagnosis or for recommending treatments. Current methods, based on biomedical language models, have been found to struggle with such tasks. Moreover, retrieval augmented strategies have only had limited success, as it is rare to find sentences which express the exact type of knowledge that is needed for interpreting a given patient description. For this reason, rather than attempting to retrieve explicit medical knowledge, we instead propose to rely on a nearest neighbour strategy. First, we retrieve text passages that are similar to the given patient description, and are thus likely to describe patients in similar situations, while also mentioning some hypothesis (e.g., a possible diagnosis of the patient). We then judge the likelihood of the hypothesis based on the similarity of the retrieved passages. Identifying similar cases is challenging, however, as descriptions of similar patients may superficially look rather different, in part because they often contain an abundance of irrelevant details. To address this challenge, we propose a strategy that relies on a distantly supervised cross-encoder. Despite its conceptual simplicity, we find this strategy to be effective in practice.
Abstract: Attributed networks, as a manifestation of data in non-Euclidean domains, have a wide range of applications in the real world, such as molecular property prediction, social network analysis and anomaly detection. Node classification, as a fundamental research problem in attributed networks, has attracted increasing attention among research communities. However, most existing models cannot be directly applied to data with limited labeled instances (i.e., the few-shot scenario). Few-shot node classification on attributed networks is gradually becoming a research hotspot. Although several methods aim to integrate meta-learning with graph neural networks to address this problem, some limitations remain. First, they all assume node representation learning using graph neural networks in homophilic graphs, and hence obtain suboptimal performance when applied to heterophilic graphs. Second, existing models based on meta-learning entirely depend on instance-based statistics, which in few-shot settings are unavoidably degraded by data noise or outliers. Third, most previous models treat all sampled tasks equally and fail to adapt to their uniqueness, which has a significant impact on the overall performance of the model. To address these three limitations, we propose a novel graph meta-learning framework called Meta-GPS (Graph learning based on Prototype and Scaling & shifting transformation). More specifically, we introduce an efficient method for learning expressive node representations even on heterophilic graphs and propose utilizing a prototype-based approach to initialize parameters in meta-learning. Moreover, we also leverage the S$^2$ (scaling & shifting) transformation to learn effective transferable knowledge from diverse tasks. Extensive experimental results on six real-world datasets demonstrate the superiority of our proposed framework, which outperforms other state-of-the-art baselines by up to 13% absolute improvement in terms of related metrics.
Abstract: Fashion Compatibility Modeling (FCM) is a new yet challenging task, which aims to automatically assess the matching degree among a set of complementary items. Most existing methods evaluate fashion compatibility from a common perspective, but overlook the user's personal preference. Inspired by this, a few pioneers have studied Personalized Fashion Compatibility Modeling (PFCM). Despite their significance, these PFCM methods mainly concentrate on the user and item entities, as well as their interactions, but ignore the attribute entities, which contain rich semantics. To address this problem, we propose to fully explore the related entities and their relations involved in PFCM to boost the PFCM performance. This is, however, non-trivial due to the heterogeneous contents of different entities, embeddings for new users, and various high-order relations. Towards these ends, we present a novel metapath-guided personalized fashion compatibility modeling scheme, dubbed MG-PFCM. In particular, we creatively build a heterogeneous graph to unify the three types of entities (i.e., users, items, and attributes) and their relations (i.e., user-item interactions, item-item matching relations, and item-attribute association relations). Thereafter, we design a multi-modal content-oriented user embedding module to learn user representations by inheriting the contents of their interacted items. Meanwhile, we define user-oriented and item-oriented metapaths, and perform metapath-guided heterogeneous graph learning to enhance the user and item embeddings. In addition, we introduce contrastive regularization to improve the model performance. We conduct extensive experiments on a real-world benchmark dataset, which verify the superiority of our proposed scheme over several cutting-edge baselines. As a byproduct, we have released our source code to benefit other researchers.
Abstract: Health thread recommendation methods aim to suggest the most relevant existing threads for a user. Most of the existing methods tend to rely on modeling the post contents to retrieve relevant answers. However, posts written by users with different clinical conditions can be lexically similar, as unrelated diseases (e.g., Angina and Osteoporosis) may share the same symptoms (e.g., back pain), so lexically similar threads may still be irrelevant to a user. Therefore, it is critical to not only consider the connections between users and threads, but also the descriptions of users' symptoms and clinical conditions. In this paper, towards this problem of thread recommendation in online healthcare forums, we propose a knowledge graph enhanced Threads Recommendation (KETCH) model, which leverages graph neural networks to model the interactions among users and threads, and learn their representations. In our model, the users, threads and posts are three types of nodes in a graph, linked through their associations. KETCH uses a message passing strategy, aggregating information along the network. In addition, we introduce a knowledge-enhanced attention mechanism to capture the latent conditions and symptoms. We also apply the method to the task of predicting the side effects of drugs, to show that KETCH has the potential to complement the medical knowledge graph. Compared with the best results of seven competing methods, in terms of MRR, KETCH outperforms all methods by at least 0.125 on the MedHelp dataset, 0.048 on the Patient dataset, and 0.092 on the HealthBoards dataset. We release the source code of KETCH at: https://github.com/cuilimeng/KETCH.
Abstract: Online healthcare services can provide unlimited and in-time medical information to users, which promotes social good and breaks the barriers of location. However, understanding the user intents behind medical-related queries is a challenging problem. Medical search queries are usually short and noisy, lack strict syntactic structure, and require professional background knowledge to understand the medical terms. The medical intents are fine-grained, making them hard to recognize. In addition, many intents have only a few labeled examples. To handle these problems, we propose a few-shot learning method for medical search query intent recognition called MEDIC. We extract co-click queries from user search logs as weak supervision to compensate for the lack of labeled data. We also design a new query encoder which learns to represent queries as a combination of semantic knowledge recorded in an external medical knowledge graph, syntactic knowledge which marks the grammatical role of each word in the query, and generic knowledge which is captured by language models pretrained on large-scale text corpora. Experimental results on a real medical search query intent recognition dataset validate the effectiveness of MEDIC.
Abstract: As a crucial component of most modern deep recommender systems, feature embedding maps high-dimensional sparse user/item features into low-dimensional dense embeddings. However, these embeddings are usually assigned a unified dimension, which suffers from the following issues: (1) high memory usage and computation cost; and (2) sub-optimal performance due to inferior dimension assignments. In order to alleviate the above issues, some works focus on automated embedding dimension search by formulating it as a hyper-parameter optimization or embedding pruning problem. However, they either require a well-designed search space for hyperparameters or need time-consuming optimization procedures. In this paper, we propose a Single-Shot Embedding Dimension Search method, called SSEDS, which can efficiently assign dimensions for each feature field via a single-shot embedding pruning operation while maintaining the recommendation accuracy of the model. Specifically, it introduces a criterion for identifying the importance of each embedding dimension for each feature field. As a result, SSEDS could automatically obtain mixed-dimensional embeddings by explicitly reducing redundant embedding dimensions based on the corresponding dimension importance ranking and the predefined parameter budget. Furthermore, the proposed SSEDS is model-agnostic, meaning that it could be integrated into different base recommendation models. Extensive offline experiments are conducted on two widely used public datasets for the CTR (Click Through Rate) prediction task, and the results demonstrate that SSEDS can still achieve strong recommendation performance even after pruning 90% of the parameters. Moreover, SSEDS has also been deployed on the WeChat Subscription platform for practical recommendation services. The 7-day online A/B test results show that SSEDS can significantly improve the performance of the online recommendation model while reducing resource consumption.
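A minimal sketch of the single-shot pruning idea described above: score every embedding dimension of every feature field with an importance criterion and keep the highest-scoring dimensions that fit within a parameter budget. The gradient-times-weight saliency, the score-per-parameter ordering, and all names are assumptions for illustration, not the exact SSEDS criterion.

import numpy as np

def prune_embedding_dims(saliency: dict, vocab_sizes: dict, budget: int) -> dict:
    """Keep the globally most important embedding dimensions within a budget.

    saliency[field]: 1-D array, one importance score per embedding dimension
        of that field (e.g. aggregated |gradient * weight| over a batch).
    vocab_sizes[field]: vocabulary size of the field, i.e. the parameter cost
        of keeping one dimension of that field.
    Returns the kept dimension indices per field.
    """
    candidates = []
    for field, scores in saliency.items():
        for d, s in enumerate(scores):
            # rank dimensions by importance per parameter spent
            candidates.append((s / vocab_sizes[field], field, d))
    candidates.sort(reverse=True)

    kept, used = {f: [] for f in saliency}, 0
    for _, field, d in candidates:
        cost = vocab_sizes[field]
        if used + cost <= budget:
            kept[field].append(d)
            used += cost
    return kept

# toy example: two fields, embedding size 4 each
saliency = {"user_id": np.array([0.9, 0.1, 0.4, 0.05]),
            "item_id": np.array([0.8, 0.7, 0.02, 0.3])}
vocab_sizes = {"user_id": 1000, "item_id": 5000}
print(prune_embedding_dims(saliency, vocab_sizes, budget=12000))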
Abstract: With the development of deep learning techniques, deep recommendation models have achieved remarkable improvements in terms of recommendation accuracy. However, due to the large number of candidate items in practice and the high cost of preference computation, these methods also suffer from low recommendation efficiency. The recently proposed tree-based deep recommendation models alleviate the problem by directly learning the tree structure and representations under the guidance of recommendation objectives. However, such models have two shortcomings. First, the max-heap assumption in the hierarchical tree, in which the preference for a parent node should be the maximum of the preferences for its children, is difficult to satisfy with their binary classification objectives. Second, the learned index only includes a single tree, in contrast to the widely used multiple-tree index, leaving an opportunity to improve recommendation accuracy. To this end, we propose a Deep Forest-based Recommender (DeFoRec for short) for efficient recommendation. In DeFoRec, all the trees generated during the training process are retained to form a forest. When learning the node representations of each tree, DeFoRec satisfies the max-heap assumption as much as possible and mimics beam search behavior over the tree in the training stage. This is achieved by regarding the training task as multi-class classification over tree nodes at the same level. However, the number of tree nodes grows exponentially with the level, so we train the preference model with the guidance of the sampled-softmax technique. Experiments are conducted on real-world datasets, validating the effectiveness of the proposed preference model learning method and tree learning method.
Abstract: Deep neural networks (DNNs) demonstrate significant advantages in improving ranking performance in retrieval tasks. Driven by the recent developments in optimization and generalization of DNNs, learning a neural ranking model online from its interactions with users becomes possible. However, the required exploration for model learning has to be performed in the entire neural network parameter space, which is prohibitively expensive and limits the application of such online solutions in practice. In this work, we propose an efficient exploration strategy for online interactive neural ranker learning based on bootstrapping. Our solution is based on an ensemble of ranking models trained with perturbed user click feedback. The proposed method eliminates explicit confidence set construction and the associated computational overhead, which enables online neural ranker training to be efficiently executed in practice with theoretical guarantees. Extensive comparisons with an array of state-of-the-art OL2R algorithms on two public learning to rank benchmark datasets demonstrate the effectiveness and computational efficiency of our proposed neural OL2R solution.
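The bootstrapping idea can be sketched as follows: maintain an ensemble of rankers, update each member on its own perturbed copy of the click feedback, and let the disagreement across members drive exploration. The linear scorer, the Gaussian click perturbation, and the mean-plus-spread ranking rule below are illustrative assumptions rather than the paper's exact algorithm.

import numpy as np

rng = np.random.default_rng(0)

def perturb_clicks(clicks: np.ndarray) -> np.ndarray:
    """One bootstrap view of binary click feedback via additive noise
    (illustrative; other perturbation schemes are possible)."""
    return clicks + rng.normal(0.0, 0.5, size=clicks.shape)

class LinearRankerEnsemble:
    """Ensemble of linear scorers, each updated on its own perturbed copy
    of the clicks; disagreement across members acts as an implicit
    confidence signal for exploration."""

    def __init__(self, n_models: int, n_features: int, lr: float = 0.1):
        self.weights = rng.normal(0, 0.01, size=(n_models, n_features))
        self.lr = lr

    def update(self, doc_feats: np.ndarray, clicks: np.ndarray) -> None:
        for m in range(self.weights.shape[0]):
            target = perturb_clicks(clicks)
            pred = doc_feats @ self.weights[m]
            grad = doc_feats.T @ (pred - target) / len(clicks)
            self.weights[m] -= self.lr * grad  # least-squares style update

    def rank(self, doc_feats: np.ndarray) -> np.ndarray:
        scores = doc_feats @ self.weights.T          # (docs, models)
        optimistic = scores.mean(1) + scores.std(1)  # explore where members disagree
        return np.argsort(-optimistic)

ensemble = LinearRankerEnsemble(n_models=5, n_features=10)
feats = rng.normal(size=(20, 10))
ensemble.update(feats, clicks=rng.integers(0, 2, size=20).astype(float))
print(ensemble.rank(feats)[:5])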
Abstract: Session-based recommender systems (SBRs) are becoming increasingly popular because they can predict user interests without relying on long-term user profiles and support login-free recommendation. Modern recommender systems operate in a fully server-based fashion. To cater to millions of users, frequent model maintenance and high-speed processing of concurrent user requests are required, which comes at the cost of a huge carbon footprint. Meanwhile, users need to upload their behavior data, even including the immediate environmental context, to the server, raising public concern about privacy. On-device recommender systems circumvent these two issues with cost-conscious settings and local inference. However, due to the limited memory and computing resources, on-device recommender systems are confronted with two fundamental challenges: (1) how to reduce the size of regular models to fit edge devices, and (2) how to retain the original capacity. Previous research mostly adopts tensor decomposition techniques to compress regular recommendation models with low compression rates so as to avoid drastic performance degradation. In this paper, we explore ultra-compact models for next-item recommendation by loosening the constraint of dimensionality consistency in tensor decomposition. To compensate for the capacity loss caused by compression, we develop a self-supervised knowledge distillation framework which enables the compressed model (student) to distill the essential information lying in the raw data, and improves long-tail item recommendation through an embedding-recombination strategy with the original model (teacher). Extensive experiments on two benchmarks demonstrate that, with a 30x size reduction, the compressed model comes with almost no accuracy loss, and even outperforms its uncompressed counterpart. The code is released at https://github.com/xiaxin1998/OD-Rec.
Abstract: Many IR collections contain forbidden documents (F-docs), i.e., documents that should not be retrieved to the searcher. In an ideal scenario F-docs are clearly flagged, hence the ranker can filter them out, guaranteeing that no F-doc will be exposed. However, in real-world scenarios, filtering algorithms are prone to errors. Therefore, an IR evaluation system should also measure filtering quality in addition to ranking quality. Typically, filtering is considered a classification task and is evaluated independently of ranking quality. However, due to the mutual affinity between the two, it is desirable to evaluate ranking quality while filtering decisions are being made. In this work we propose nDCGf, a novel extension of the nDCGmin metric [14], which measures both the ranking and the filtering quality of the search results. We show both theoretically and empirically that while nDCGmin is not suitable for the simultaneous ranking and filtering task, nDCGf is a reliable metric in this case. We experiment with three datasets for which ranking and filtering are both required. In the PR dataset our task is to rank product reviews while filtering those marked as spam. Similarly, in the CQA dataset our task is to rank a list of human answers per question while filtering bad answers. We also experiment with the TREC web-track datasets, where F-docs are explicitly labeled, sorting participant runs according to their ranking and filtering quality, and demonstrating the stability, sensitivity, and reliability of nDCGf for this task. We propose a learning to rank and filter (LTRF) framework that is specifically designed to optimize nDCGf by learning a ranking model and optimizing a filtering threshold used for discarding documents with lower scores. We experiment with several loss functions, demonstrating their success in learning an effective LTRF model for the simultaneous ranking and filtering task.
Abstract: The dramatic improvements in core information retrieval tasks engendered by neural rankers create a need for novel evaluation methods. If every ranker returns highly relevant items in the top ranks, it becomes difficult to recognize meaningful differences between them and to build reusable test collections. Several recent papers explore pairwise preference judgments as an alternative to traditional graded relevance assessments. Rather than viewing items one at a time, assessors view items side-by-side and indicate the one that provides the better response to a query, allowing fine-grained distinctions. If we employ preference judgments to identify the probably best items for each query, we can measure rankers by their ability to place these items as high as possible. We frame the problem of finding best items as a dueling bandits problem. While many papers explore dueling bandits for online ranker evaluation via interleaving, they have not been considered as a framework for offline evaluation via human preference judgments. We review the literature for possible solutions. For human preference judgments, any usable algorithm must tolerate ties, since two items may appear nearly equal to assessors, and it must minimize the number of judgments required for any specific pair, since each such comparison requires an independent assessor. Since the theoretical guarantees provided by most algorithms depend on assumptions that are not satisfied by human preference judgments, we simulate selected algorithms on representative test cases to provide insight into their practical utility. Based on these simulations, one algorithm stands out for its potential. Our simulations suggest modifications to further improve its performance. Using the modified algorithm, we collect over 10,000 preference judgments for pools derived from submissions to the TREC 2021 Deep Learning Track, confirming its suitability. We test the idea of best-item evaluation and suggest ideas for further theoretical and practical progress.
Abstract: The use of offline effectiveness metrics is one of the cornerstones of evaluation in information retrieval. Static resources that include test collections and sets of topics, the corresponding relevance judgments connecting them, and metrics that map document rankings from a retrieval system to numeric scores have been used for multiple decades as an important way of comparing systems. The basis behind this experimental structure is that the metric score for a system can serve as a surrogate measurement for user satisfaction. Here we introduce a user behavior framework that extends the C/W/L family. The essence of the new framework - which we call C/W/L/A - is that the user actions that are undertaken while reading the ranking can be considered separately from the benefit that each user will have derived as they exit the ranking. This split structure allows the great majority of current effectiveness metrics to be systematically categorized, and thus their relative properties and relationships to be better understood; and at the same time permits a wide range of novel combinations to be considered. We then carry out experiments using relevance judgments, document rankings, and user satisfaction data from two distinct sources, comparing the patterns of metric scores generated, and showing that those metrics vary quite markedly in terms of their ability to predict user satisfaction.
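For reference, the C/W/L machinery that the C/W/L/A framework extends can be summarized as follows, in standard notation; the additional A component separating reading actions from exit benefit is defined in the paper itself and is not reproduced here.

% A user model is specified by a continuation probability C(i), the chance
% of inspecting rank i+1 after inspecting rank i. This induces inspection
% weights W(i) and a metric value M (the expected rate of gain):
W(i) = \frac{\prod_{j=1}^{i-1} C(j)}{\sum_{k=1}^{\infty} \prod_{j=1}^{k-1} C(j)},
\qquad
M = \sum_{i=1}^{\infty} W(i)\, r_i ,
% where r_i is the (gain-mapped) relevance of the item at rank i.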
Abstract: Most information retrieval effectiveness evaluation metrics assume that systems appending irrelevant documents at the bottom of the ranking are as effective as (or no worse than) systems that have a stopping criterion to 'truncate' the ranking at the right position and avoid retrieving those irrelevant documents at the end. It can be argued, however, that such truncated rankings are more useful to the end user. It is thus important to understand how to measure retrieval effectiveness in this scenario. In this paper we provide both theoretical and experimental contributions. We first define formal properties to analyze how effectiveness metrics behave when evaluating truncated rankings. Our theoretical analysis shows that de-facto standard metrics do not satisfy desirable properties for evaluating truncated rankings: only Observational Information Effectiveness (OIE) -- a metric based on Shannon's information theory -- satisfies them all. We then perform experiments to compare several metrics on nine TREC datasets. According to our experimental results, the most appropriate metrics for truncated rankings are OIE and a novel extension of Rank-Biased Precision that adds a user effort factor penalizing the retrieval of irrelevant documents.
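For context, the standard Rank-Biased Precision that the extension mentioned above builds on is given below; the exact form of the effort-penalized variant is defined in the paper and not reproduced here.

% Rank-Biased Precision with persistence parameter p in (0, 1):
\mathrm{RBP} = (1 - p) \sum_{i=1}^{d} p^{\,i-1} r_i ,
% where r_i is the relevance of the document at rank i and d is the
% (possibly truncated) ranking length.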
Abstract: Offline evaluation of information retrieval and recommendation has traditionally focused on distilling the quality of a ranking into a scalar metric such as average precision or normalized discounted cumulative gain. We can use this metric to compare the performance of multiple systems for the same request. Although evaluation metrics provide a convenient summary of system performance, they also collapse subtle differences across users into a single number and can carry assumptions about user behavior and utility not supported across retrieval scenarios. We propose recall-paired preference (RPP), a metric-free evaluation method based on directly computing a preference between ranked lists. RPP simulates multiple user subpopulations per query and compares systems across these pseudo-populations. Our results across multiple search and recommendation tasks demonstrate that RPP substantially improves discriminative power while correlating well with existing metrics and being equally robust to incomplete data.
Abstract: A fundamental goal of Information Retrieval (IR) is to satisfy searchers' information need (IN). Advances in neuroimaging technologies have allowed for interdisciplinary research to investigate the brain activity associated with the realisation of IN. While these studies have been informative, they were not able to capture, with a high temporal resolution, the cognitive processes underlying the realisation of IN and the interplay between them. This paper aims to investigate this research question by inferring the variability of brain activity based on the contrast of a state of IN with two other (no-IN) scenarios. To do so, we employed Electroencephalography (EEG) and constructed an Event-Related Potential (ERP) analysis of the brain signals captured while participants were experiencing the realisation of IN. In particular, the brain signals of 24 healthy participants were captured while performing a Question-Answering (Q/A) task. Our results show a link between the early stages of processing, corresponding to awareness, and late activity, reflecting memory control mechanisms. Our findings also show that participants exhibited an early N1-P2 complex indexing awareness processes, indicating that the realisation of IN is manifested in the brain before it reaches the user's consciousness. This research contributes novel insights into a better understanding of IN and informs the design of IR systems to better satisfy it.
Abstract: Search engines and recommendation systems attempt to continually improve the quality of the experience they afford to their users. Refining the ranker that produces the lists displayed in response to user requests is an important component of this process. A common practice is for the service providers to make changes (e.g. new ranking features, different ranking models) and A/B test them on a fraction of their users to establish the value of the change. An alternative approach estimates the effectiveness of the proposed changes offline, utilising previously collected clickthrough data on the old ranker to posit what the user behaviour on ranked lists produced by the new ranker would have been. A majority of offline evaluation approaches invoke the well studied inverse propensity weighting to adjust for biases inherent in logged data. In this paper, we propose the use of parametric estimates for these propensities. Specifically, by leveraging well known learning-to-rank methods as subroutines, we show how accurate offline evaluation can be achieved when the new rankings to be evaluated differ from the logged ones.
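The inverse propensity weighting that such offline estimators build on can be written, in one common form, as below; the paper's specific contribution, estimating the propensities parametrically with learning-to-rank subroutines, is not reproduced here.

% IPS estimate of the utility of a new ranker \pi from logged interactions
% (q_j, d_j, c_j) collected under the logging ranker, where c_j is the
% click indicator and p_j the examination propensity of d_j in the logged
% ranking:
\hat{V}(\pi) = \frac{1}{n} \sum_{j=1}^{n}
\frac{c_j \,\lambda\!\big(\mathrm{rank}(d_j \mid \pi, q_j)\big)}{p_j},
% where \lambda(\cdot) is a rank discount, e.g. 1/\log_2(1 + \mathrm{rank}).
% Unbiasedness requires p_j > 0 for every logged click.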
Abstract: Web search heavily relies on click-through behavior as an essential feedback signal for performance evaluation and improvement. Traditionally, click is usually treated as a positive implicit feedback signal of relevance or usefulness, while non-click is regarded as a signal of irrelevance or uselessness. However, there are many cases where users satisfy their information need with the contents shown on the Search Engine Result Page (SERP). This raises the problem of measuring the usefulness of non-click results and modeling user satisfaction in such circumstances. For a long period, understanding non-click results is challenging owing to the lack of user interactions. In recent years, the rapid development of neuroimaging technologies constitutes a paradigm shift in various industries, e.g., search, entertainment, and education. Therefore, we benefit from these technologies and apply them to bridge the gap between the human mind and the external search system in non-click situations. To this end, we analyze the differences in brain signals between the examination of non-click search results in different usefulness levels. Inspired by these findings, we conduct supervised learning tasks to estimate the usefulness of non-click results with brain signals and conventional information (i.e., content and context factors). Furthermore, we devise two re-ranking methods, i.e., a Personalized Method (PM) and a Generalized Intent modeling Method (GIM), for search result re-ranking with the estimated usefulness. Results show that it is feasible to utilize brain signals to improve usefulness estimation performance and enhance human-computer interactions by search result re-ranking.
Abstract: Existing explainable recommender systems have mainly modeled relationships between recommended and already experienced products, and shaped explanation types accordingly (e.g., movie "x" starred by actress "y" recommended to a user because that user watched other movies with "y" as an actress). However, none of these systems has investigated the extent to which properties of a single explanation (e.g., the recency of interaction with that actress) and of a group of explanations for a recommended list (e.g., the diversity of the explanation types) can influence the perceived explanation quality. In this paper, we conceptualized three novel properties that model the quality of the explanations (linking interaction recency, shared entity popularity, and explanation type diversity) and proposed re-ranking approaches able to optimize for these properties. Experiments on two public data sets showed that our approaches can increase explanation quality according to the proposed properties, fairly across demographic groups, while preserving recommendation utility. The source code and data are available at https://github.com/giacoballoccu/explanation-quality-recsys.
Abstract: As an essential operation of legal retrieval, legal case matching plays a central role in intelligent legal systems. This task has a high demand on the explainability of matching results because of its critical impacts on downstream applications --- the matched legal cases may provide supportive evidence for the judgments of target cases and thus influence the fairness and justice of legal decisions. Focusing on this challenging task, we propose a novel and explainable method, namely IOT-Match, with the help of computational optimal transport, which formulates the legal case matching problem as an inverse optimal transport (IOT) problem. Different from most existing methods, which merely focus on the sentence-level semantic similarity between legal cases, our IOT-Match learns to extract rationales from paired legal cases based on both the semantics and the legal characteristics of their sentences. The extracted rationales are further applied to generate faithful explanations and conduct matching. Moreover, the proposed IOT-Match is robust to the alignment label insufficiency issue that is common in practical legal case matching tasks, making it suitable for both supervised and semi-supervised learning paradigms. To demonstrate the superiority of our IOT-Match method and construct a benchmark for the explainable legal case matching task, we not only extend the well-known Challenge of AI in Law (CAIL) dataset but also build a new Explainable Legal cAse Matching (ELAM) dataset, which contains a large number of legal cases with detailed and explainable annotations. Experiments on these two datasets show that our IOT-Match outperforms state-of-the-art methods consistently on matching prediction, rationale extraction, and explanation generation.
Abstract: It has been shown that the interpretability of search results is enhanced when query aspects covered by documents are explicitly provided. However, existing work on aspect-oriented explanation of search results explains each document independently. These explanations thus cannot describe the differences between documents. This issue is also true for existing models on query aspect generation. Furthermore, these models provide a single query aspect for each document, even though documents often cover multiple query aspects. To overcome these limitations, we propose LiEGe, an approach that jointly explains all documents in a search result list. LiEGe provides semantic representations at two levels of granularity -- documents and their tokens -- using different interaction signals including cross-document interactions. These allow listwise modeling of a search result list as well as the generation of coherent explanations for documents. To appropriately explain documents that cover multiple query aspects, we introduce two settings for search result explanation: comprehensive and novelty explanation generation. LiEGe is trained and evaluated for both settings. We evaluate LiEGe on datasets built from Wikipedia and real query logs of the Bing search engine. Our experimental results demonstrate that LiEGe outperforms all baselines, with improvements that are substantial and statistically significant.
Abstract: Existing research on fairness-aware recommendation has mainly focused on the quantification of fairness and the development of fair recommendation models, neither of which studies a more substantial problem -- identifying the underlying reason for model disparity in recommendation. This information is critical for recommender system designers to understand the intrinsic recommendation mechanism, and provides decision makers with insights on how to improve model fairness. Fortunately, with the rapid development of Explainable AI, we can use model explainability to gain insights into model (un)fairness. In this paper, we study the problem of explainable fairness, which helps to gain insights about why a system is fair or unfair, and guides the design of fair recommender systems with a more informed and unified methodology. Particularly, we focus on a common setting with feature-aware recommendation and exposure unfairness, but the proposed explainable fairness framework is general and can be applied to other recommendation settings and fairness definitions. We propose a Counterfactual Explainable Fairness framework, called CEF, which generates explanations about model fairness that can improve the fairness without significantly hurting the performance. The CEF framework formulates an optimization problem to learn the "minimal" change of the input features that changes the recommendation results to a certain level of fairness. Based on the counterfactual recommendation result of each feature, we calculate an explainability score in terms of the fairness-utility trade-off to rank all the feature-based explanations, and select the top ones as fairness explanations. Experimental results on several real-world datasets validate that our method is able to effectively provide explanations for the model disparities, and that these explanations achieve a better fairness-utility trade-off when used for recommendation than all the baselines.
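Schematically, the counterfactual search described above can be viewed as an optimization of the following kind; this is a sketch in generic notation, not necessarily CEF's exact objective or constraint form.

% Search over feature perturbations \Delta for a minimal change that
% reaches a target fairness level:
\min_{\Delta} \;\; \lVert \Delta \rVert_2^2
\quad \text{s.t.} \quad
\Psi_{\mathrm{fair}}\big(R(x + \Delta)\big) \le \epsilon ,
% where R(\cdot) produces the recommendation list from input features,
% \Psi_{\mathrm{fair}} is a disparity measure (e.g. an exposure gap between
% item groups), and \epsilon is the target fairness level. Features whose
% small perturbations yield large fairness gains at small utility cost
% would then receive high explainability scores.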
Abstract: Variational autoencoders (VAEs) have been widely applied in recommendation. One reason is that their amortized inference is beneficial for overcoming data sparsity. However, they are still rarely explored in explainable recommendation, which generates natural language explanations. Thus, we aim to extend VAEs to explainable recommendation. In this task, we find that a VAE can generate acceptable explanations for users with few relevant training samples; however, it tends to generate less personalized explanations for users with relatively sufficient samples than autoencoders (AEs) do. We conjecture that information shared by different users in the VAE disturbs the information for a specific user. To deal with this problem, we present PErsonalized VAE (PEVAE), which generates personalized natural language explanations for explainable recommendation. Moreover, we propose two novel mechanisms to aid our model in generating more personalized explanations: 1) Self-Adaption Fusion (SAF), which manipulates the latent space in a self-adaptive manner to control the influence of shared information; in this way, our model can enjoy the advantage of overcoming data sparsity while generating more personalized explanations for users with relatively sufficient training samples; and 2) DEpendence Maximization (DEM), which strengthens the dependence between recommendations and explanations by maximizing their mutual information; this makes the explanation more specific to the input user-item pair and thus improves the personalization of the generated explanations. Extensive experiments show that PEVAE can generate more personalized explanations, and further analyses demonstrate the practical effect of our proposed methods.
Abstract: Prior research on exposure fairness in the context of recommender systems has focused mostly on disparities in the exposure of individual or groups of items to individual users of the system. The problem of how individual or groups of items may be systemically under- or over-exposed to groups of users, or even all users, has received relatively less attention. However, such systemic disparities in information exposure can result in observable social harms, such as withholding economic opportunities from historically marginalized groups (allocative harm) or amplifying gendered and racialized stereotypes (representational harm). Previously, Diaz et al. developed the expected exposure metric -- which incorporates existing user browsing models previously developed for information retrieval -- to study fairness of content exposure to individual users. We extend their proposed framework to formalize a family of exposure fairness metrics that model the problem jointly from the perspective of both the consumers and the producers. Specifically, we consider group attributes for both types of stakeholders to identify and mitigate fairness concerns that go beyond individual users and items towards more systemic biases in recommendation. Furthermore, we study and discuss the relationships between the different exposure fairness dimensions proposed in this paper, and demonstrate how stochastic ranking policies can be optimized towards said fairness goals.
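For reference, expected exposure under a stochastic ranking policy can be instantiated with a patience-style browsing model as below; this is one common instantiation rather than the only user model compatible with the framework.

% Expected exposure of item d under a stochastic ranking policy \pi,
% with browsing-persistence parameter \gamma:
\epsilon_d = \mathbb{E}_{\sigma \sim \pi}\big[\gamma^{\,\mathrm{rank}(d \mid \sigma) - 1}\big].
% System exposure is then compared against a target exposure (e.g. equal
% expected exposure among equally relevant items); group-level variants
% aggregate \epsilon_d over producer and consumer groups.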
Abstract: There are several measures for fairness in ranking, based on different underlying assumptions and perspectives. Plackett-Luce (PL) optimization with the REINFORCE algorithm can be used for optimizing black-box objective functions over permutations, and in particular for optimizing fairness measures. However, though effective for queries with a moderate number of repeating sessions, PL optimization has room for improvement for queries with a small number of repeating sessions. In this paper, we present a novel way of representing permutation distributions, based on the notion of permutation graphs. Similar to PL, our distribution representation, called PPG, can be used for black-box optimization of fairness. Different from PL, where pointwise logits are used as the distribution parameters, in PPG pairwise inversion probabilities together with a reference permutation construct the distribution. As such, the reference permutation can be set to the best sampled permutation regarding the objective function, making PPG suitable for both deterministic and stochastic rankings. Our experiments show that PPG, while comparable to PL for larger session repetitions (i.e., stochastic ranking), improves over PL for optimizing fairness metrics for queries with one session (i.e., deterministic ranking). Additionally, when accurate utility estimations are available, e.g., in tabular models, the performance of PPG in fairness optimization is significantly boosted compared to lower-quality utility estimations from a learning to rank model, leading to a large performance gap with PL. Finally, the pairwise probabilities make it possible to impose pairwise constraints such as "item $d_1$ should always be ranked higher than item $d_2$." Such constraints can be used to simultaneously optimize the fairness metric and control another objective such as ranking performance.
Abstract: Information access systems, such as search and recommender systems, often use ranked lists to present results believed to be relevant to the user's information need. Evaluating these lists for their fairness, along with other traditional metrics, provides a more complete understanding of an information access system's behavior beyond accuracy or utility constructs. To measure the (un)fairness of rankings, particularly with respect to the protected group(s) of producers or providers, several metrics have been proposed in the last several years. However, an empirical and comparative analysis of these metrics -- showing their applicability to specific scenarios or real data, their conceptual similarities, and their differences -- is still lacking. We aim to bridge the gap between the theoretical and practical application of these metrics. In this paper we describe several fair ranking metrics from the existing literature in a common notation, enabling direct comparison of their approaches and assumptions, and empirically compare them on the same experimental setup and data sets in the context of three information access tasks. We also provide a sensitivity analysis to assess the impact of the design choices and parameter settings that go into these metrics and point to additional work needed to improve fairness measurement.
Abstract: There is growing interest in designing recommender systems that aim at being fair towards item producers or their least satisfied users. Inspired by the domain of inequality measurement in economics, this paper explores the use of generalized Gini welfare functions (GGFs) as a means to specify the normative criterion that recommender systems should optimize for. GGFs weight individuals depending on their ranks in the population, giving more weight to worse-off individuals to promote equality. Depending on these weights, GGFs minimize the Gini index of item exposure to promote equality between items, or focus on the performance on specific quantiles of least satisfied users. GGFs for ranking are challenging to optimize because they are non-differentiable. We resolve this challenge by leveraging tools from non-smooth optimization and projection operators used in differentiable sorting. We present experiments using real datasets with up to 15k users and items, which show that our approach obtains better trade-offs than the baselines on a variety of recommendation tasks and fairness criteria.
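In standard notation, a generalized Gini welfare function over a utility vector takes the following form, which also makes the source of the non-differentiability explicit.

% GGF over a utility vector u = (u_1, ..., u_n) (item exposures or user
% utilities), with nonincreasing weights:
\mathrm{GGF}_{w}(u) = \sum_{i=1}^{n} w_i \, u_{(i)},
\qquad w_1 \ge w_2 \ge \dots \ge w_n \ge 0,
% where u_{(1)} \le u_{(2)} \le \dots \le u_{(n)} sorts the utilities in
% increasing order, so larger weights fall on worse-off individuals. The
% sorting operator is what makes GGFs non-differentiable, motivating the
% non-smooth optimization and differentiable-sorting tools described above.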
Abstract: In recent years, it has become clear that rankings delivered in many areas need not only be useful to the users but also respect fairness of exposure for the item producers. We consider the problem of finding ranking policies that achieve a Pareto-optimal tradeoff between these two aspects. Several methods were proposed to solve it; for instance a popular one is to use linear programming with a Birkhoff-von Neumann decomposition. These methods, however, are based on a classical Position Based exposure Model (PBM), which assumes independence between the items (hence the exposure only depends on the rank). In many applications, this assumption is unrealistic and the community increasingly moves towards considering other models that include dependences, such as the Dynamic Bayesian Network (DBN) exposure model. For such models, computing (exact) optimal fair ranking policies remains an open question. In this paper, we answer this question by leveraging a new geometrical method based on the so-called expohedron proposed recently for the PBM (Kletti et al., WSDM'22). We lay out the structure of a new geometrical object (the DBN-expohedron), and propose for it a Carathéodory decomposition algorithm of complexity $O(n^3)$, where n is the number of documents to rank. Such an algorithm enables expressing any feasible expected exposure vector as a distribution over at most n rankings; furthermore we show that we can compute the whole set of Pareto-optimal expected exposure vectors with the same complexity $O(n^3)$. Our work constitutes the first exact algorithm able to efficiently find a Pareto-optimal distribution of rankings. It is applicable to a broad range of fairness notions, including classical notions of meritocratic and demographic fairness. We empirically evaluate our method on the TREC2020 and MSLR datasets and compare it to several baselines in terms of Pareto-optimality and speed.
Abstract: Fairness of exposure is a commonly used notion of fairness for ranking systems. It is based on the idea that all items or item groups should get exposure proportional to the merit of the item or the collective merit of the items in the group. Often, stochastic ranking policies are used to ensure fairness of exposure. Previous work unrealistically assumes that we can reliably estimate the expected exposure for all items in each ranking produced by the stochastic policy. In this work, we discuss how to approach fairness of exposure in cases where the policy contains rankings for which, due to inter-item dependencies, we cannot reliably estimate the exposure distribution. In such cases, we cannot determine whether the policy can be considered fair. Our contributions in this paper are twofold. First, we define a method for finding stochastic policies that avoid showing rankings with unknown exposure distribution to the user, without having to compromise user utility or item fairness. Second, we extend the study of fairness of exposure to the top-k setting and also assess our method in this setting. We find that our method can significantly reduce the number of rankings with unknown exposure distribution without a drop in user utility or fairness compared to existing fair ranking methods, both for full-length and top-k rankings. This is an important first step in developing fair ranking methods for cases where we have incomplete knowledge about the user's behaviour.
Abstract: Recently, there has been a rising awareness that when machine learning (ML) algorithms are used to automate choices, they may treat or affect individuals unfairly, with legal, ethical, or economic consequences. Recommender systems are prominent examples of such ML systems that assist users in making high-stakes judgments. A common trend in previous research on fairness in recommender systems is that the majority of works treat user and item fairness concerns separately, ignoring the fact that recommender systems operate in a two-sided marketplace. In this work, we present an optimization-based re-ranking approach that seamlessly integrates fairness constraints from both the consumer and the producer side in a joint objective framework. We demonstrate through large-scale experiments on 8 datasets that our proposed method is capable of improving both consumer and producer fairness without reducing overall recommendation quality, demonstrating the role algorithms may play in minimizing data biases.
Abstract: Retrieving relevant documents from a corpus is typically based on the semantic similarity between the document content and query text. The inclusion of structural relationship between documents can benefit the retrieval mechanism by addressing semantic gaps. However, incorporating these relationships requires tractable mechanisms that balance structure with semantics and take advantage of the prevalent pre-train/fine-tune paradigm. We propose here a holistic approach to learning document representations by integrating intra-document content with inter-document relations. Our deep metric learning solution analyzes the complex neighborhood structure in the relationship network to efficiently sample similar/dissimilar document pairs and defines a novel quintuplet loss function that simultaneously encourages document pairs that are semantically relevant to be closer and structurally unrelated to be far apart in the representation space. Furthermore, the separation margins between the documents are varied flexibly to encode the heterogeneity in relationship strengths. The model is fully fine-tunable and natively supports query projection during inference. We demonstrate that it outperforms competing methods on multiple datasets for document retrieval tasks.
Abstract: Conventional methods for query autocompletion aim to predict which completed query a user will select from a list. A shortcoming of this approach is that users often do not know which query will provide the best retrieval performance on the current information retrieval system, meaning that any query autocompletion methods trained to mimic user behavior can lead to suboptimal query suggestions. To overcome this limitation, we propose a new approach that explicitly optimizes the query suggestions for downstream retrieval performance. We formulate this as a problem of ranking a set of rankings, where each query suggestion is represented by the downstream item ranking it produces. We then present a learning method that ranks query suggestions by the quality of their item rankings. The algorithm is based on a counterfactual learning approach that is able to leverage feedback on the items (e.g., clicks, purchases) to evaluate query suggestions through an unbiased estimator, thus avoiding the assumption that users write or select optimal queries. We establish theoretical support for the proposed approach and provide learning-theoretic guarantees. We also present empirical results on publicly available datasets, and demonstrate real-world applicability using data from an online shopping store.
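A minimal Python sketch of the evaluation idea just described, scoring a candidate query suggestion by an IPS-corrected estimate of the utility of the item ranking it produces; the position-based propensity model, the DCG-style discount, and all names are illustrative assumptions rather than the paper's estimator.

import math
from collections import defaultdict

def ips_utility(logged_clicks, new_ranking, propensity):
    """Unbiased (IPS) estimate of the downstream utility of `new_ranking`.

    logged_clicks: list of (item_id, click, logged_rank) tuples collected
        under the logging policy for this query.
    new_ranking: item ids in the order the candidate suggestion retrieves them.
    propensity: maps a logged rank to its examination probability
        (position-based model assumed here).
    """
    pos = {item: r for r, item in enumerate(new_ranking, start=1)}
    total = 0.0
    for item, click, logged_rank in logged_clicks:
        if not click or item not in pos:
            continue
        discount = 1.0 / math.log2(1 + pos[item])    # DCG-style discount
        total += discount / propensity(logged_rank)  # IPS correction
    return total

def rank_suggestions(suggestions, logs, retrieve, propensity):
    """Order candidate query suggestions by estimated item-ranking utility
    instead of by imitation of user selections."""
    scored = [(ips_utility(logs[s], retrieve(s), propensity), s)
              for s in suggestions]
    return [s for _, s in sorted(scored, reverse=True)]

# toy usage with hypothetical data
logs = defaultdict(list, {"red shoes": [("i1", 1, 1), ("i7", 1, 4), ("i3", 0, 2)]})
retrieve = lambda q: ["i7", "i1", "i9"]   # stand-in retrieval function
propensity = lambda rank: 1.0 / rank      # assumed position bias
print(rank_suggestions(["red shoes"], logs, retrieve, propensity))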
Abstract: Learning to Rank (L2R) is the core task of many Information Retrieval systems. Recently, a great effort has been put into exploring Deep Neural Networks (DNNs) for L2R, with significant results. However, risk-sensitiveness, an important and recent advance in the L2R arena that reduces variability and increases trust, has not been incorporated into Deep Neural L2R yet. Risk-sensitive measures are important for assessing the risk that an IR system performs worse than a set of baseline IR systems for several queries. However, the risk-sensitive measures described in the literature have a non-smooth behavior, making them difficult, if not impossible, to optimize with DNNs. In this work we solve this difficult problem by proposing a family of new loss functions that support smooth risk-sensitive optimization. The proposed losses introduce two important contributions: (i) the substitution of the traditional NDCG or MAP metrics in risk-sensitive measures with smooth loss functions that evaluate the correlation between the predicted and the true relevance order of documents for a given query, and (ii) the use of distinct versions of the same DNN architecture as baselines by means of a multi-dropout technique during the smooth risk-sensitive optimization, avoiding the inconvenience of assessing multiple IR systems as part of DNN training. We empirically demonstrate significant achievements of the proposed loss functions when used with recent DNN methods in the context of well-known web-search datasets such as WEB10K, YAHOO, and MQ2007. Our solutions reach improvements of 8% in effectiveness (NDCG) while improving risk-sensitiveness by around 5% when applied together with a state-of-the-art Self-Attention DNN-L2R architecture. Furthermore, the proposed losses are capable of reducing the losses over the best evaluated baselines by 28% and of significantly improving over the risk-sensitive state-of-the-art non-DNN method (by up to 13.3%) while keeping (or even increasing) overall effectiveness. All these results ultimately establish a new level for the state-of-the-art on risk-sensitiveness and DNN-L2R research.
Abstract: Building a multi-stage cascade ranking system is a commonly used solution for balancing efficiency and effectiveness in modern information retrieval (IR) applications, such as recommendation and web search. Despite its popularity in practice, the literature specifically on multi-stage cascade ranking systems is relatively scarce. The common practice is to train the rankers of each stage independently using the same user feedback data (a.k.a. impression data), disregarding the data flow and the possible interactions between stages. This straightforward solution can lead to a sub-optimal system because of the sample selection bias (SSB) issue, which is especially damaging for cascade rankers due to the negative effect accumulated across the multiple stages. Worse still, the interactions between the rankers of each stage are not fully exploited. This paper provides an elaborate analysis of this commonly used solution to reveal its limitations. By studying the essence of cascade ranking, we propose a joint training framework named RankFlow to alleviate the SSB issue and exploit the interactions between the cascade rankers, which is the first systematic solution for this topic. We propose a paradigm for training cascade rankers that emphasizes the importance of fitting rankers on stage-specific data distributions instead of the unified user feedback distribution. We design the RankFlow framework based on this paradigm: the training data of each stage is generated by its preceding stages, while the guidance signals come not only from the logs but also from its successors. Extensive experiments are conducted on various IR scenarios, including recommendation, web search and advertisement. The results verify the efficacy and superiority of RankFlow.
Abstract: Pseudo-relevance feedback (PRF) has proven to be an effective query reformulation technique to improve retrieval accuracy. It aims to alleviate the mismatch of linguistic expressions between a query and its potentially relevant documents. Existing PRF methods independently treat revised queries originating from the same query but using different numbers of feedback documents, resulting in severe query drift. Without comparing the effects of two different revisions of the same query, a PRF model may incorrectly focus on the additional irrelevant information introduced by the larger amount of feedback, and thus produce a reformulated query that is less effective than the revision using less feedback. Ideally, if a PRF model can distinguish between irrelevant and relevant information in the feedback, the more feedback documents there are, the better the revised query will be. To bridge this gap, we propose the Loss-over-Loss (LoL) framework to compare the reformulation losses between different revisions of the same query during training. Concretely, we revise an original query multiple times in parallel using different amounts of feedback and compute their reformulation losses. Then, we introduce an additional regularization loss on these reformulation losses to penalize revisions that use more feedback but incur larger losses. With such comparative regularization, the PRF model is expected to learn to suppress the extra irrelevant information by comparing the effects of different revised queries. Further, we present a differentiable query reformulation method to implement this framework. This method revises queries in the vector space and directly optimizes the retrieval performance of query vectors, and it is applicable to both sparse and dense retrieval models. Empirical evaluation demonstrates the effectiveness and robustness of our method for two typical sparse and dense retrieval models.
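A minimal sketch of the comparative regularization idea described above: given the reformulation losses of several revisions of the same query, penalize any revision that uses more feedback documents yet incurs a larger loss. The function name, margin, and toy values are assumptions for illustration.

```python
import torch

def lol_regularizer(losses, n_feedback, margin=0.0):
    """losses[i] is the reformulation loss of the revision using n_feedback[i] documents."""
    reg = losses.new_zeros(())
    for i in range(len(losses)):
        for j in range(len(losses)):
            if n_feedback[i] > n_feedback[j]:
                # Penalize revisions that use more feedback but incur a larger loss.
                reg = reg + torch.relu(losses[i] - losses[j] + margin)
    return reg

losses = torch.tensor([0.9, 0.7, 0.8], requires_grad=True)   # e.g. revisions with 5, 10, 20 docs
total = losses.sum() + 0.1 * lol_regularizer(losses, n_feedback=[5, 10, 20])
total.backward()
```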
Abstract: Stance detection aims to identify whether the author of a text is in favor of, against, or neutral to a given target. The main challenge of this task is two-fold: few-shot learning resulting from the varying targets and the lack of contextual information about the targets. Existing works mainly focus on solving the second issue by designing attention-based models or introducing noisy external knowledge, while the first issue remains under-explored. In this paper, inspired by the potential capability of pre-trained language models (PLMs) to serve as knowledge bases and few-shot learners, we propose to introduce prompt-based fine-tuning for stance detection. PLMs can provide essential contextual information for the targets and enable few-shot learning via prompts. Considering the crucial role of the target in the stance detection task, we design target-aware prompts and propose a novel verbalizer. Instead of mapping each label to a concrete word, our verbalizer maps each label to a vector and picks the label that best captures the correlation between the stance and the target. Moreover, to alleviate the possible defect of dealing with varying targets with a single hand-crafted prompt, we propose to distill the information learned from multiple prompts. Experimental results show the superior performance of our proposed model in both full-data and few-shot scenarios.
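A minimal sketch of a vector verbalizer of the kind described above: each stance label owns a learnable vector and is scored by similarity against the [MASK] position's hidden state. The class count, hidden size, and cosine scoring are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorVerbalizer(nn.Module):
    def __init__(self, hidden_dim=768, num_labels=3):   # favor / against / neutral
        super().__init__()
        self.label_vectors = nn.Parameter(torch.randn(num_labels, hidden_dim) * 0.02)

    def forward(self, mask_hidden):                      # (batch, hidden_dim) from the [MASK] token
        # Cosine similarity between the [MASK] representation and each label vector.
        return F.normalize(mask_hidden, dim=-1) @ F.normalize(self.label_vectors, dim=-1).T

verbalizer = VectorVerbalizer()
logits = verbalizer(torch.randn(4, 768))                 # (4, 3) stance scores
loss = F.cross_entropy(logits, torch.tensor([0, 2, 1, 1]))
```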
Abstract: Dense retrieval has shown promising results in many information retrieval (IR) related tasks, whose foundation is high-quality text representation learning for effective search. Some recent studies have shown that autoencoder-based language models are able to boost the dense retrieval performance using a weak decoder. However, we argue that 1) it is not discriminative to decode all the input texts and, 2) even a weak decoder has the bypass effect on the encoder. Therefore, in this work, we introduce a novel contrastive span prediction task to pre-train the encoder alone, but still retain the bottleneck ability of the autoencoder. In this way, we can 1) learn discriminative text representations efficiently with the group-wise contrastive learning over spans and, 2) avoid the bypass effect of the decoder thoroughly. Comprehensive experiments over publicly available retrieval benchmark datasets show that our approach can outperform existing pre-training methods for dense retrieval significantly.
Abstract: With the rapid growth of interaction data, many clustering methods have been proposed to discover interaction patterns as prior knowledge beneficial to downstream tasks. Considering that an interaction can be seen as an action occurring among multiple objects, most existing methods model the objects and their pair-wise relations as nodes and links in graphs. However, they only model and leverage part of the information in real entire interactions, i.e., they either decompose the entire interaction into several pair-wise sub-interactions for simplification, or only focus on clustering some specific types of objects, which limits the performance and explainability of clustering. To tackle this issue, we propose to Co-cluster the Interactions via Attentive Hypergraph neural network (CIAH). In particular, with a more comprehensive modeling of interactions by hypergraphs, we propose an attentive hypergraph neural network to encode the entire interactions, where an attention mechanism is utilized to select important attributes for explanations. Then, we introduce a saliency-based method to guide the attention to be more consistent with the real importance of attributes, namely saliency-based consistency. Moreover, we propose a novel co-clustering method to perform joint clustering over the representations of interactions and the corresponding distributions of attribute selection, namely cluster-based consistency. Extensive experiments demonstrate that our CIAH significantly outperforms state-of-the-art clustering methods on both public datasets and real industrial datasets.
Abstract: Neural text matching models have been used in a range of applications such as question answering and natural language inference, and have yielded a good performance. However, these neural models are of a limited adaptability, resulting in a decline in performance when encountering test examples from a different dataset or even a different task. The adaptability is particularly important in the few-shot setting: in many cases, there is only a limited amount of labeled data available for a target dataset or task, while we may have access to a richly labeled source dataset or task. However, adapting a model trained on the abundant source data to a few-shot target dataset or task is challenging. To tackle this challenge, we propose a Meta-Weight Regulator (MWR), which is a meta-learning approach that learns to assign weights to the source examples based on their relevance to the target loss. Specifically, MWR first trains the model on the uniformly weighted source examples, and measures the efficacy of the model on the target examples via a loss function. By iteratively performing a (meta) gradient descent, high-order gradients are propagated to the source examples. These gradients are then used to update the weights of source examples, in a way that is relevant to the target performance. As MWR is model-agnostic, it can be applied to any backbone neural model. Extensive experiments are conducted with various backbone text matching models, on four widely used datasets and two tasks. The results demonstrate that our proposed approach significantly outperforms a number of existing adaptation methods and effectively improves the cross-dataset and cross-task adaptability of the neural text matching models in the few-shot setting.
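A minimal sketch of meta-gradient example reweighting in the spirit of MWR, using a functional linear classifier so that the virtual update stays differentiable. The inner learning rate, the linear model, and all variable names are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def forward(X, W, b):
    return X @ W + b

def source_example_weights(W, b, X_src, y_src, X_tgt, y_tgt, inner_lr=0.1):
    eps = torch.zeros(X_src.size(0), requires_grad=True)        # per-example weights
    src_losses = F.cross_entropy(forward(X_src, W, b), y_src, reduction="none")
    gW, gb = torch.autograd.grad((eps * src_losses).sum(), (W, b), create_graph=True)
    W_v, b_v = W - inner_lr * gW, b - inner_lr * gb              # virtual model update
    tgt_loss = F.cross_entropy(forward(X_tgt, W_v, b_v), y_tgt)  # efficacy on target data
    g_eps = torch.autograd.grad(tgt_loss, eps)[0]                # high-order gradient w.r.t. eps
    w = torch.clamp(-g_eps, min=0.0)                             # upweight helpful source examples
    return w / (w.sum() + 1e-8)

W = torch.randn(16, 2, requires_grad=True)
b = torch.zeros(2, requires_grad=True)
weights = source_example_weights(W, b, torch.randn(32, 16), torch.randint(0, 2, (32,)),
                                 torch.randn(8, 16), torch.randint(0, 2, (8,)))
```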
Abstract: Relation prediction on knowledge graphs (KGs) aims to infer missing valid triples from observed ones. Although this task has been deeply studied, most previous studies are limited to the transductive setting and cannot handle emerging entities. In fact, the inductive setting is closer to real-life scenarios because it allows entities in the testing phase to be unseen during training. However, it is challenging to conduct inductive relation prediction precisely, as it requires both entity-independent relation modeling and discrete logical reasoning for interpretability. To this end, we propose a novel model, ConGLR, to incorporate a context graph with logical reasoning. First, the enclosing subgraph w.r.t. the target head and tail entities is extracted and initialized by double radius labeling, and a context graph involving relational paths, relations and entities is introduced. Second, two graph convolutional networks (GCNs) with information interaction between entities and relations are employed to process the subgraph and the context graph, respectively. Considering the influence of different edges and target relations, we introduce edge-aware and relation-aware attention mechanisms for the subgraph GCN. Finally, by treating the relational path as the rule body and the target relation as the rule head, we integrate neural computation and logical reasoning to obtain inductive scores. To focus each module on its specific modeling goal, the stop-gradient operation is applied to the information interaction between the context graph and subgraph GCNs during training. In this way, ConGLR satisfies the two inductive requirements at the same time. Extensive experiments demonstrate that ConGLR obtains outstanding performance against state-of-the-art baselines on twelve inductive dataset versions of three common KGs.
Abstract: Multimodal Knowledge Graphs (MKGs), which organize visual-text factual knowledge, have recently been successfully applied to tasks such as information retrieval, question answering, and recommendation. Since most MKGs are far from complete, extensive knowledge graph completion studies have been proposed, focusing on multimodal entity and relation extraction and link prediction. However, different tasks and modalities require changes to the model architecture, and not all images/objects are relevant to the text input, which hinders applicability to diverse real-world scenarios. In this paper, we propose a hybrid transformer with multi-level fusion to address these issues. Specifically, we leverage a hybrid transformer architecture with unified input-output for diverse multimodal knowledge graph completion tasks. Moreover, we propose multi-level fusion, which integrates visual and text representations via coarse-grained prefix-guided interaction and fine-grained correlation-aware fusion modules. We conduct extensive experiments to validate that our MKGformer can obtain SOTA performance on four datasets for multimodal link prediction, multimodal RE, and multimodal NER. The code is available at https://github.com/zjunlp/MKGformer.
Abstract: Knowledge graph completion (KGC) aims to infer missing knowledge triples based on known facts in a knowledge graph. Current KGC research mostly follows an entity ranking protocol, wherein the effectiveness is measured by the predicted rank of a masked entity in a test triple. The overall performance is then given by a micro(-average) metric over all individual answer entities. Due to the incomplete nature of large-scale knowledge bases, such an entity ranking setting is likely affected by unlabelled top-ranked positive examples, raising questions about whether the current evaluation protocol is sufficient to guarantee a fair comparison of KGC systems. To this end, this paper presents a systematic study on whether and how label sparsity affects the current KGC evaluation with the popular micro metrics. Specifically, inspired by the TREC paradigm for large-scale information retrieval (IR) experimentation, we create a relatively "complete" judgment set based on a sample from the popular FB15k-237 dataset following the TREC pooling method. According to our analysis, it comes as a surprise that switching from the original labels to our "complete" labels results in a drastic change in the system ranking of 13 popular KGC models in terms of micro metrics. Further investigation indicates that the IR-like macro(-average) metrics are more stable and discriminative under different settings, while being less affected by label sparsity. Thus, for KGC evaluation, we recommend conducting TREC-style pooling to balance human effort and label completeness, and also reporting the IR-like macro metrics to reflect the ranking nature of the KGC task.
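A small sketch contrasting a micro(-average) metric with an IR-like macro(-average) one on toy reciprocal-rank records; grouping by query and the toy numbers are assumptions.

```python
from collections import defaultdict

# Each record is (group_id, reciprocal_rank) for one answer entity.
records = [("q1", 1.0), ("q1", 0.5), ("q1", 0.25), ("q2", 0.1)]

def micro_mrr(records):
    return sum(rr for _, rr in records) / len(records)           # every answer counts equally

def macro_mrr(records):
    per_group = defaultdict(list)
    for gid, rr in records:
        per_group[gid].append(rr)
    group_means = [sum(v) / len(v) for v in per_group.values()]  # average within each query first
    return sum(group_means) / len(group_means)

print(micro_mrr(records))   # 0.4625 -- dominated by the query with many answers
print(macro_mrr(records))   # ~0.342 -- each query contributes equally
```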
Abstract: Knowledge graphs (KGs) consisting of a large number of triples have become widespread recently, and many knowledge graph embedding (KGE) methods have been proposed to embed the entities and relations of a KG into continuous vector spaces. Such embedding methods simplify the operations of conducting various in-KG tasks (e.g., link prediction) and out-of-KG tasks (e.g., question answering). They can be viewed as general solutions for representing KGs. However, existing KGE methods are not applicable to inductive settings, where a model trained on source KGs is tested on target KGs with entities unseen during model training. Existing works focusing on KGs in inductive settings can only solve the inductive relation prediction task. They cannot handle other out-of-KG tasks as generally as KGE methods, since they do not produce embeddings for entities. In this paper, to achieve inductive knowledge graph embedding, we propose a model, MorsE, which does not learn embeddings for entities but instead learns transferable meta-knowledge that can be used to produce entity embeddings. Such meta-knowledge is modeled by entity-independent modules and learned by meta-learning. Experimental results show that our model significantly outperforms corresponding baselines for in-KG and out-of-KG tasks in inductive settings.
Abstract: Previous entity linking methods in knowledge graphs (KGs) mostly link textual mentions to the corresponding entities. However, they fall short when processing multimodal data, in which the text is often too short to provide enough context. Consequently, we conceive the idea of introducing valuable information from other modalities, and propose a novel multimodal entity linking method with gated hierarchical multimodal fusion and contrastive training (GHMFC). Firstly, in order to discover fine-grained inter-modal correlations, GHMFC extracts hierarchical features of text and vision through a multi-modal co-attention mechanism: textual-guided visual attention and visual-guided textual attention. The former obtains weighted visual features under the guidance of textual information, while the latter produces weighted textual features under the guidance of visual information. Afterwards, gated fusion is used to evaluate the importance of the hierarchical features of different modalities and integrate them into the final multimodal representations of mentions. Subsequently, contrastive training with two types of contrastive losses is designed to learn more generic multimodal features and reduce noise. Finally, the linked entities are selected by calculating the cosine similarity between the representations of mentions and entities in KGs. To evaluate the proposed method, this paper releases two new open multimodal entity linking datasets: WikiMEL and Richpedia-MEL. Experimental results demonstrate that GHMFC can learn meaningful multimodal representations and significantly outperforms most of the baseline methods.
Abstract: Given a text query, the text-to-video retrieval task aims to find the relevant videos in the database. Recently, model-based (MDB) methods have demonstrated superior accuracy to embedding-based (EDB) methods due to their excellent capacity for modeling local video/text correspondences, especially when equipped with large-scale pre-training schemes like ClipBERT. Generally speaking, MDB methods take a text-video pair as input and harness deep models to predict the mutual similarity, while EDB methods first utilize modality-specific encoders to extract embeddings for text and video, then evaluate the distance based on the extracted embeddings. Notably, MDB methods cannot produce explicit representations for text and video; instead, they have to exhaustively pair the query with every database item to predict their mutual similarities in the inference stage, which results in significant inefficiency in practical applications. In this work, we propose a novel EDB method, CRET (Cross-modal REtrieval Transformer), which not only demonstrates promising efficiency in retrieval tasks, but also achieves better accuracy than existing MDB methods. The gains are mainly attributed to our proposed Cross-modal Correspondence Modeling (CCM) module and Gaussian Estimation of Embedding Space (GEES) loss. Specifically, the CCM module is composed of transformer decoders and a set of decoder centers. With the help of the learned decoder centers, the text/video embeddings can be efficiently aligned, without suffering from pairwise model-based inference. Moreover, to balance the information loss and computational overhead when sampling frames from a given video, we present a novel GEES loss, which implicitly conducts dense sampling in the video embedding space without incurring heavy computational cost. Extensive experiments show that without pre-training on extra datasets, our proposed CRET outperforms the state-of-the-art MDB methods that were pre-trained on additional datasets, while still showing promising efficiency in retrieval tasks.
Abstract: Zero-Shot Cross-Modal Retrieval (ZS-CMR) has recently drawn increasing attention as it focuses on a practical retrieval scenario, i.e., the multimodal test set consists of unseen classes that are disjoint with seen classes in the training set. The recently proposed methods typically adopt the generative model as the main framework to learn a joint latent embedding space to alleviate the modality gap. Generally, these methods largely rely on auxiliary semantic embeddings for knowledge transfer across classes and unconsciously neglect the effect of the data reconstruction manner in the adopted generative model. To address this issue, we propose a novel ZS-CMR model termed Multimodal Disentanglement Variational AutoEncoders (MDVAE), which consists of two coupled disentanglement variational autoencoders (DVAEs) and a fusion-exchange VAE (FVAE). Specifically, DVAE is developed to disentangle the original representations of each modality into modality-invariant and modality-specific features. FVAE is designed to fuse and exchange information of multimodal data by the reconstruction and alignment process without pre-extracted semantic embeddings. Moreover, an advanced counter-intuitive cross-reconstruction scheme is further proposed to enhance the informativeness and generalizability of the modality-invariant features for more effective knowledge transfer. The comprehensive experiments on four image-text retrieval and two image-sketch retrieval datasets consistently demonstrate that our method establishes the new state-of-the-art performance.
Abstract: Recently, large-scale pre-training methods like CLIP have made great progress in multi-modal research such as text-video retrieval. In CLIP, transformers are vital for modeling complex multi-modal relations. However, in the vision transformer of CLIP, the essential visual tokenization process, which produces discrete visual token sequences, generates many homogeneous tokens due to the redundant nature of consecutive and similar frames in videos. This significantly increases computation costs and hinders the deployment of video retrieval models in web applications. In this paper, to reduce the number of redundant video tokens, we design a multi-segment token clustering algorithm to find the most representative tokens and drop the non-essential ones. As frame redundancy occurs mostly in consecutive frames, we divide videos into multiple segments and conduct segment-level clustering. Center tokens from each segment are later concatenated into a new sequence, while their original spatial-temporal relations are well maintained. We instantiate two clustering algorithms to efficiently find deterministic medoids and iteratively partition groups in high-dimensional space. Through this token clustering and center selection procedure, we successfully reduce computation costs by removing redundant visual tokens. This method further enhances segment-level semantic alignment between video and text representations, enforcing the spatio-temporal interactions of tokens from within-segment frames. Our method, coined CenterCLIP, surpasses the existing state of the art by a large margin on typical text-video benchmarks, while reducing the training memory cost by 35% and accelerating the inference speed by 14% in the best case. The code is available at https://github.com/mzhaoshuai/CenterCLIP.
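A minimal sketch of segment-level token reduction by deterministic medoid selection: split the frame tokens into segments and keep, per segment, the token that minimizes the summed distance to the rest. Keeping a single medoid per segment and using Euclidean distance are simplifying assumptions; the paper's clustering variants are more elaborate.

```python
import torch

def segment_medoids(tokens, segment_len):
    """tokens: (num_frame_tokens, dim); returns the concatenated medoid tokens."""
    kept = []
    for start in range(0, tokens.size(0), segment_len):
        seg = tokens[start:start + segment_len]
        dists = torch.cdist(seg, seg).sum(dim=1)    # summed distance to the rest of the segment
        kept.append(seg[dists.argmin()])
        # Temporal order is preserved because segments are visited in order.
    return torch.stack(kept)

video_tokens = torch.randn(12 * 49, 512)            # e.g. 12 frames x 49 patch tokens
reduced = segment_medoids(video_tokens, segment_len=4 * 49)
print(reduced.shape)                                 # torch.Size([3, 512])
```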
Abstract: Multi-modal hashing learns binary hash codes with extremely low storage cost and high retrieval speed, and can thus support efficient multi-modal retrieval. However, most existing methods still suffer from three important problems: 1) limited semantic representation capability due to shallow learning; 2) mandatory feature-level multi-modal fusion that ignores the heterogeneous multi-modal semantic gaps; and 3) direct coarse pairwise semantic preservation that cannot effectively capture fine-grained semantic correlations. To solve these problems, in this paper, we propose a Bit-aware Semantic Transformer Hashing (BSTH) framework to excavate bit-wise semantic concepts and simultaneously align the heterogeneous modalities for multi-modal hash learning at the concept level. Specifically, the bit-wise implicit semantic concepts are learned with a transformer in a self-attention manner, which can achieve implicit semantic alignment at the fine-grained concept level and reduce the heterogeneous modality gaps. Then, concept-level multi-modal fusion is performed to enhance the semantic representation capability of each implicit concept, and the fused concept representations are further encoded to the corresponding hash bits via bit-wise hash functions. Further, to supervise the bit-aware transformer module, a label prototype learning module is developed to learn prototype embeddings for all categories, which capture the explicit semantic correlations at the category level by considering co-occurrence priors. Experiments on three widely tested multi-modal retrieval datasets demonstrate the superiority of the proposed method from various aspects.
Abstract: Multi-modal Product Summary Generation is a new yet challenging task, which aims to generate a concise and readable summary for a product given its multi-modal content, e.g., its long text description and image. Although existing methods have achieved great success, they still suffer from three key limitations: 1) overlook the benefit of pre-training, 2) lack the representation-level supervision, and 3) ignore the diversity of the seller-generated data. To address these limitations, in this work, we propose a Vision-to-Prompt based multi-modal product summary generation framework, dubbed as V2P, where a Generative Pre-trained Language Model (GPLM) is adopted as the backbone. In particular, to maintain the original text capability of the GPLM and fully utilize the high-level concepts contained in the product image, we design V2P with two key components: vision-based prominent attribute prediction, and attribute prompt-guided summary generation. The first component works on obtaining the vital semantic attributes of the product from its image by the Swin Transformer, while the second component aims to generate the summary based on the product's long text description and the attribute prompts yielded by the first component with a GPLM. Towards comprehensive supervision over the second component, apart from the conventional output-level supervision, we introduce the representation-level regularization. Meanwhile, we design the data augmentation-based robustness regularization to handle the diverse inputs and improve the robustness of the second component. Extensive experiments on a large-scale Chinese dataset verify the superiority of our model over cutting-edge methods.
Abstract: Near-duplicate video retrieval (NDVR) aims to find copies or transformations of a query video in a massive video database. It plays an important role in many video-related applications, including copyright protection, tracing, and filtering. Video representation and similarity search are crucial to any video retrieval system. To derive effective video representations, most video retrieval systems require a large amount of manually annotated data for training, making them costly and inefficient. In addition, most retrieval systems are based on frame-level features for video similarity search, making them expensive in terms of both storage and search. To address the above issues, we propose a video representation learning (VRL) approach that effectively tackles both shortcomings. It first learns video representations from unlabeled videos via contrastive learning to avoid the expensive cost of manual annotation. Then, it exploits a transformer structure to aggregate frame-level features into clip-level features to reduce both storage space and search complexity. It can learn complementary and discriminative information from the interactions among clip frames, as well as acquire invariance to frame permutation and missing frames to support more flexible retrieval. Comprehensive experiments on two challenging near-duplicate video retrieval datasets, namely FIVR-200K and SVD, verify the effectiveness of our proposed VRL approach, which achieves the best video retrieval performance in terms of both accuracy and efficiency.
Abstract: Image retrieval with hybrid-modality queries, also known as composing text and image for image retrieval (CTI-IR), is a retrieval task where the search intention is expressed in a more complex query format, involving both vision and text modalities. For example, a target product image is searched using a reference product image along with text about changing certain attributes of the reference image as the query. It is a more challenging image retrieval task that requires both semantic space learning and cross-modal fusion. Previous approaches that attempt to deal with both aspects achieve unsatisfactory performance. In this paper, we decompose the CTI-IR task into a three-stage learning problem to progressively learn the complex knowledge for image retrieval with hybrid-modality queries. We first leverage the semantic embedding space for open-domain image-text retrieval, and then transfer the learned knowledge to the fashion-domain with fashion-related pre-training tasks. Finally, we enhance the pre-trained model from single-query to hybrid-modality query for the CTI-IR task. Furthermore, as the contribution of individual modality in the hybrid-modality query varies for different retrieval scenarios, we propose a self-supervised adaptive weighting strategy to dynamically determine the importance of image and text in the hybrid-modality query for better retrieval. Extensive experiments show that our proposed model significantly outperforms state-of-the-art methods in the mean of Recall@K by 24.9% and 9.5% on the Fashion-IQ and Shoes benchmark datasets respectively.
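A minimal sketch of an adaptive weighting module for a hybrid-modality query: a small gate predicts scalar importances for the reference-image and modification-text embeddings before fusing them. The gating MLP and the additive fusion are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AdaptiveQueryFusion(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 2))

    def forward(self, img_emb, txt_emb):
        # Predict how much each modality should contribute for this particular query.
        weights = torch.softmax(self.gate(torch.cat([img_emb, txt_emb], dim=-1)), dim=-1)
        w_img, w_txt = weights[:, :1], weights[:, 1:]
        return w_img * img_emb + w_txt * txt_emb      # composed query embedding

fusion = AdaptiveQueryFusion()
query = fusion(torch.randn(4, 512), torch.randn(4, 512))   # (4, 512)
```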
Abstract: Moment retrieval in videos is a challenging task that aims to retrieve the most relevant video moment in an untrimmed video given a sentence description. Previous methods tend to perform self-modal learning and cross-modal interaction in a coarse manner, which neglect fine-grained clues contained in video content, query context, and their alignment. To this end, we propose a novel Multi-Granularity Perception Network (MGPN) that perceives intra-modality and inter-modality information at a multi-granularity level. Specifically, we formulate moment retrieval as a multi-choice reading comprehension task and integrate human reading strategies into our framework. A coarse-grained feature encoder and a co-attention mechanism are utilized to obtain a preliminary perception of intra-modality and inter-modality information. Then a fine-grained feature encoder and a conditioned interaction module are introduced to enhance the initial perception inspired by how humans address reading comprehension problems. Moreover, to alleviate the huge computation burden of some existing methods, we further design an efficient choice comparison module and reduce the hidden size with imperceptible quality loss. Extensive experiments on Charades-STA, TACoS, and ActivityNet Captions datasets demonstrate that our solution outperforms existing state-of-the-art methods.
Abstract: Video moment retrieval aims at finding the start and end timestamps of a moment (part of a video) described by a given natural language query. Fully supervised methods need complete temporal boundary annotations to achieve promising results, which is costly since the annotator needs to watch the whole moment. Weakly supervised methods only rely on the paired video and query, but their performance is relatively poor. In this paper, we take a closer look at the annotation process and propose a new paradigm called "glance annotation". This paradigm requires the timestamp of only one single random frame, which we refer to as a "glance", within the temporal boundary of the fully supervised counterpart. We argue this is beneficial because, compared to weak supervision, only trivial annotation cost is added yet much more performance potential is provided. Under the glance annotation setting, we propose a method named Video moment retrieval via Glance Annotation (ViGA) based on contrastive learning. ViGA cuts the input video into clips and contrasts between clips and queries, in which glance-guided Gaussian-distributed weights are assigned to all clips. Our extensive experiments indicate that ViGA achieves better results than the state-of-the-art weakly supervised methods by a large margin, and is even comparable to fully supervised methods in some cases.
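A minimal sketch of glance-guided weighting: clips whose centers lie closer to the glanced timestamp receive larger Gaussian weights when contrasted with the query. The clip length and the width sigma are illustrative assumptions.

```python
import torch

def glance_weights(num_clips, clip_len, glance_time, sigma=2.0):
    centers = (torch.arange(num_clips) + 0.5) * clip_len        # clip centers in seconds
    w = torch.exp(-(centers - glance_time) ** 2 / (2 * sigma ** 2))
    return w / w.sum()                                          # normalized clip weights

print(glance_weights(num_clips=8, clip_len=2.0, glance_time=5.3))
```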
Abstract: Keyphrases can concisely describe the high-level topics discussed in a document that usually possesses hierarchical topic structures. Thus, it is crucial to understand the hierarchical topic structures and employ it to guide the keyphrase identification. However, integrating the hierarchical topic information into a deep keyphrase generation model is unexplored. In this paper, we focus on how to effectively exploit the hierarchical topic to improve the keyphrase generation performance (HTKG). Specifically, we propose a novel hierarchical topic-guided variational neural sequence generation method for keyphrase generation, which consists of two major modules: a neural hierarchical topic model that learns the latent topic tree across the whole corpus of documents, and a variational neural keyphrase generation model to generate keyphrases under hierarchical topic guidance. Finally, these two modules are jointly trained to help them learn complementary information from each other. To the best of our knowledge, this is the first attempt to leverage the neural hierarchical topic to guide keyphrase generation. The experimental results demonstrate that our method significantly outperforms the existing state-of-the-art methods across five benchmark datasets.
Abstract: Machine reading comprehension has attracted wide attention, since it explores the potential of models for text understanding. To further equip machines with reasoning capability, the challenging task of logical reasoning has been proposed. Previous works on logical reasoning have proposed strategies to extract the logical units from different aspects. However, it remains a challenge to model the long-distance dependencies among the logical units. It is also demanding to uncover the logical structures of the text and to fuse the discrete logic into the continuous text embedding. To tackle the above issues, we propose an end-to-end model, Logiformer, which utilizes a two-branch graph transformer network for logical reasoning over text. Firstly, we introduce different extraction strategies to split the text into two sets of logical units, and construct the logical graph and the syntax graph respectively. The logical graph models the causal relations for the logical branch, while the syntax graph captures the co-occurrence relations for the syntax branch. Secondly, to model the long-distance dependencies, the node sequence from each graph is fed into a fully connected graph transformer structure. The two adjacency matrices are viewed as attention biases for the graph transformer layers, which map the discrete logical structures into the continuous text embedding space. Thirdly, a dynamic gate mechanism and a question-aware self-attention module are introduced before the answer prediction to update the features. The reasoning process provides interpretability by employing logical units that are consistent with human cognition. The experimental results show the superiority of our model, which outperforms the state-of-the-art single model on two logical reasoning benchmarks.
Abstract: An opinion tag is a sequence of words on a specific aspect of a product or service. Opinion tags reflect key characteristics of product reviews and help users quickly understand their content in e-commerce portals. The task of abstractive opinion tagging has previously been proposed to automatically generate a ranked list of opinion tags for a given review. However, current models for opinion tagging are not personalized, even though personalization is an essential ingredient of engaging user interactions, especially in e-commerce. In this paper, we focus on the task of personalized abstractive opinion tagging. There are two main challenges when developing models for the end-to-end generation of personalized opinion tags: sparseness of reviews and difficulty to integrate multi-type signals, i.e., explicit review signals and implicit behavioral signals. To address these challenges, we propose an end-to-end model, named POT, that consists of three main components: (1) a review-based explicit preference tracker component based on a hierarchical heterogeneous review graph to track user preferences from reviews; (2)a behavior-based implicit preference tracker component using a heterogeneous behavior graph to track the user preferences from implicit behaviors; and (3) a personalized rank-aware tagging component to generate a ranked sequence of personalized opinion tags. In our experiments, we evaluate POT on a real-world dataset collected from e-commerce platforms and the results demonstrate that it significantly outperforms strong baselines.
Abstract: Entity Set Expansion (ESE) is a promising task which aims to expand entities of the target semantic class described by a small seed entity set. Various NLP and IR applications will benefit from ESE due to its ability to discover knowledge. Although previous ESE methods have achieved great progress, most of them still lack the ability to handle hard negative entities (i.e., entities that are difficult to distinguish from the target entities), since two entities may or may not belong to the same semantic class depending on the granularity level of analysis. To address this challenge, we devise an entity-level masked language model with contrastive learning to refine the representation of entities. In addition, we propose ProbExpan, a novel probabilistic ESE framework that utilizes the entity representations obtained by the aforementioned language model to expand entities. Extensive experiments and detailed analyses on three datasets show that our method outperforms previous state-of-the-art methods.
Abstract: Cross-Lingual Summarization (CLS) is a task that extracts important information from a source document and summarizes it into a summary in another language. It is a challenging task that requires a system to understand, summarize, and translate at the same time, making it highly related to Monolingual Summarization (MS) and Machine Translation (MT). In practice, the training resources for Machine Translation are far larger than those for cross-lingual and monolingual summarization. Thus, incorporating a Machine Translation corpus into CLS would be beneficial for its performance. However, existing work only leverages a simple multi-task framework to bring Machine Translation in, lacking deeper exploration. In this paper, we propose a novel task, Cross-lingual Summarization with Compression rate (CSC), to benefit Cross-Lingual Summarization with a large-scale Machine Translation corpus. By introducing the compression rate, the information ratio between the source and the target text, we regard the MT task as a special CLS task with a compression rate of 100%. Hence, the two tasks can be trained as a unified task, sharing knowledge more effectively. However, a huge gap exists between the MT task and the CLS task, as samples with compression rates between 30% and 90% are extremely rare. Hence, to bridge these two tasks smoothly, we propose an effective data augmentation method to produce document-summary pairs with different compression rates. The proposed method not only improves the performance of the CLS task, but also provides controllability to generate summaries of desired lengths. Experiments demonstrate that our method outperforms various strong baselines on three cross-lingual summarization datasets. We release our code and data at https://github.com/ybai-nlp/CLS_CR.
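A minimal sketch of the compression-rate view that unifies MT and CLS: an MT pair keeps roughly all of the source content, while a CLS pair compresses it heavily. Whitespace token counting and the toy pairs are simplifying assumptions.

```python
def compression_rate(source: str, target: str) -> float:
    # Information ratio between target and source, approximated by token counts.
    return len(target.split()) / max(len(source.split()), 1)

mt_pair  = ("the cat sat on the mat", "le chat était assis sur le tapis")
cls_pair = ("a long source document with many sentences about one event ...", "event summary")

print(round(compression_rate(*mt_pair), 2))    # ~1.2: MT keeps essentially all content
print(round(compression_rate(*cls_pair), 2))   # ~0.18: genuine summarization
```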
Abstract: Prediction over event sequences is critical for many real-world applications in Information Retrieval and Natural Language Processing. Future Event Generation (FEG) is a challenging task in event sequence prediction because it requires not only fluent text generation but also commonsense reasoning to maintain the logical coherence of the entire event story. In this paper, we propose a novel explainable FEG framework, Coep. It highlights and integrates two types of event knowledge, sequential knowledge of direct event-event relations and inferential knowledge that reflects the intermediate character psychology between events, such as intents, causes, reactions, which intrinsically pushes the story forward. To alleviate the knowledge forgetting issue, we design two modules, IM and GM, for each type of knowledge, which are combined via prompt tuning. First, IM focuses on understanding inferential knowledge to generate commonsense explanations and provide a soft prompt vector for GM. We also design a contrastive discriminator for better generalization ability. Second, GM generates future events by modeling direct sequential knowledge with the guidance of IM. Automatic and human evaluation demonstrate that our approach can generate more coherent, specific, and logical future events.
Abstract: Event argument extraction (EAE) is an important information extraction task, which aims to identify the arguments of an event described in a given text and classify the roles played by them. A key characteristic of realistic EAE data is that the instance numbers of different roles follow an obvious long-tail distribution. However, the training and evaluation paradigms of existing EAE models either tend to neglect the performance on "tail roles", or change the role instance distribution for model training to an unrealistic uniform distribution. Though some generic methods can alleviate the class imbalance in long-tail datasets, they usually sacrifice the performance on "head classes" as a trade-off. To address the above issues, we propose to train our model on realistic long-tail EAE datasets and evaluate the average performance over all roles. Inspired by the Mixture of Experts (MoE), we propose a Routing-Balanced Dual Expert Framework (RBDEF), which divides all roles into two scopes, "head" and "tail", and assigns the classification of head and tail roles to two separate experts. At inference time, each encoded instance is allocated to one of the two experts by a routing mechanism. To reduce routing errors caused by the imbalance of role instances, we design a Balanced Routing Mechanism (BRM), which transfers several head roles to the tail expert to balance the routing load, and employs a tri-filter routing strategy to reduce the misallocation of the tail expert's instances. To enable effective learning of tail roles with scarce instances, we devise Target-Specialized Meta Learning (TSML) to train the tail expert. Different from other meta-learning algorithms that only search for a generic parameter initialization applied equally to all tasks, TSML can adaptively adjust its search path to obtain a specialized initialization for the tail expert, thereby extending its benefits to the learning of tail roles. In experiments, RBDEF significantly outperforms state-of-the-art EAE models and advanced methods for long-tail data.
Abstract: Event detection (ED) is a pivotal task for information retrieval, which aims at identifying event triggers and classifying them into pre-defined event types. In real-world applications, events are usually annotated with numerous fine-grained types, which often gives rise to a long-tail distribution of types and the co-occurrence of events. Existing studies explore event correlations without fully utilizing them, which may limit the capability of event detection. This paper simultaneously incorporates both type-level and instance-level event correlations, and proposes a novel framework, termed CorED. Specifically, we devise an adaptive graph-based type encoder to capture type-level correlations, learning type representations not only from their training data but also from their relevant types, thus leading to more informative type representations, especially for low-resource types. Besides, we devise an instance interactive decoder to capture instance-level correlations, which predicts event instance types conditioned on the contextual typed event instances, leveraging co-occurring events as remarkable evidence in prediction. We conduct experiments on two public benchmarks, the MAVEN and ACE-2005 datasets. Empirical results demonstrate the benefit of combining type-level and instance-level correlations, and the model achieves effective performance on both benchmarks.
Abstract: Learning which Point-of-Interest (POI) a user will visit next is a challenging task for personalized recommender systems due to the large search space of possible POIs in the region. A recurring problem among existing works that makes it difficult to learn and perform well is the sparsity of the User-POI matrix. In this paper, we propose our Hierarchical Multi-Task Graph Recurrent Network (HMT-GRN) approach, which alleviates the data sparsity problem by learning different User-Region matrices of lower sparsities in a multi-task setting. We then perform a Hierarchical Beam Search (HBS) on the different region and POI distributions to hierarchically reduce the search space with increasing spatial granularity and predict the next POI. Our HBS provides efficiency gains by reducing the search space, resulting in speedups of 5 to 7 times over an exhaustive approach. In addition, we also propose a novel selectivity layer to predict if the next POI has been visited before by the user to balance between personalization and exploration. Experimental results on two real-world Location-Based Social Network (LBSN) datasets show that our model significantly outperforms baseline and the state-of-the-art methods.
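A minimal sketch of a hierarchical beam search of the kind described above: keep only the top-B regions, then rank the POIs inside those regions by the product of region and within-region POI probabilities. The toy distributions, beam width, and scoring are assumptions.

```python
import math

def hierarchical_beam_search(region_probs, poi_probs, beam=2, top_k=3):
    # Stage 1: restrict the search space to the `beam` most likely regions.
    top_regions = sorted(region_probs, key=region_probs.get, reverse=True)[:beam]
    candidates = []
    # Stage 2: score only POIs inside the surviving regions.
    for r in top_regions:
        for poi, p in poi_probs[r].items():
            candidates.append((poi, math.log(region_probs[r]) + math.log(p)))
    return sorted(candidates, key=lambda x: x[1], reverse=True)[:top_k]

region_probs = {"downtown": 0.6, "harbour": 0.3, "airport": 0.1}
poi_probs = {
    "downtown": {"cafe_a": 0.5, "museum_b": 0.3, "park_c": 0.2},
    "harbour": {"pier_d": 0.7, "market_e": 0.3},
    "airport": {"lounge_f": 1.0},
}
print(hierarchical_beam_search(region_probs, poi_probs))  # search restricted to 2 of 3 regions
```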
Abstract: Next POI recommendation intends to forecast users' immediate future movements given their current status and historical information, yielding great value for both users and service providers. However, this problem is notably complex because various data trends need to be considered together, including spatial locations, temporal contexts, users' preferences, etc. Most existing studies view next POI recommendation as a sequence prediction problem while omitting the collaborative signals from other users. Instead, we propose a user-agnostic global trajectory flow map and a novel Graph Enhanced Transformer model (GETNext) to better exploit the extensive collaborative signals for a more accurate next POI prediction, and to alleviate the cold-start problem in the meantime. GETNext incorporates global transition patterns, users' general preferences, spatio-temporal context, and time-aware category embeddings together into a transformer model to predict users' future moves. With this design, our model outperforms state-of-the-art methods by a large margin and also sheds light on the cold-start challenges in recommendation problems involving spatio-temporal information.
Abstract: Next Point-of-Interest (POI) recommendation plays a critical role in many location-based applications as it provides personalized suggestions on attractive destinations for users. Since users' next movement is highly related to the historical visits, sequential methods such as recurrent neural networks are widely used in this task for modeling check-in behaviors. However, existing methods mainly focus on modeling the sequential regularity of check-in sequences but pay little attention to the intrinsic characteristics of POIs, neglecting the entanglement of the diverse influence stemming from different aspects of POIs. In this paper, we propose a novel Disentangled Representation-enhanced Attention Network (DRAN) for next POI recommendation, which leverages the disentangled representations to explicitly model different aspects and corresponding influence for representing a POI more precisely. Specifically, we first design a propagation rule to learn graph-based disentangled representations by refining two types of POI relation graphs, making full use of the distance-based and transition-based influence for representation learning. Then, we extend the attention architecture to aggregate personalized spatio-temporal information for modeling dynamic user preferences on the next timestamp, while maintaining the different components of disentangled representations independent. Extensive experiments on two real-world datasets demonstrate the superior performance of our model to state-of-the-art approaches. Further studies confirm the effectiveness of DRAN in representation disentanglement.
Abstract: News recommendation aims to help users of online news platforms find their preferred news articles. Existing news recommendation methods usually learn models from historical user behaviors on news. However, these behaviors are usually biased with respect to news providers. Models trained on biased user data may capture and even amplify the biases towards news providers, and are unfair to some minority news providers. In this paper, we propose a provider fairness-aware news recommendation framework (named ProFairRec), which can learn news recommendation models that are fair to different news providers from biased user data. The core idea of ProFairRec is to learn provider-fair news representations and provider-fair user representations to achieve provider fairness. To learn provider-fair representations from biased data, we employ provider-biased representations to inherit the provider bias from the data. Provider-fair and provider-biased news representations are learned from news content and provider IDs respectively, and are further aggregated to build fair and biased user representations based on user click history. All of these representations are used in model training, while only the fair representations are used for user-news matching to achieve fair news recommendation. Besides, we propose an adversarial learning task on news provider discrimination to prevent the provider-fair news representations from encoding provider bias. We also propose an orthogonal regularization on the provider-fair and provider-biased representations to further reduce the provider bias in the provider-fair representations. Moreover, ProFairRec is a general framework and can be applied to different news recommendation methods. Extensive experiments on a public dataset verify that our ProFairRec approach can effectively improve the provider fairness of many existing methods while maintaining their recommendation accuracy.
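A minimal sketch of an orthogonality penalty between provider-fair and provider-biased representations: push the two vectors of each news item towards zero cosine similarity. The squared-cosine form is an assumption about how such a regularizer can look, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def orthogonal_regularization(fair_repr, biased_repr):
    """fair_repr, biased_repr: (batch, dim) representations of the same news items."""
    cos = F.cosine_similarity(fair_repr, biased_repr, dim=-1)
    return (cos ** 2).mean()          # zero when the two representations are orthogonal

fair = torch.randn(8, 128, requires_grad=True)
biased = torch.randn(8, 128)
reg = orthogonal_regularization(fair, biased)
reg.backward()
```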
Abstract: Pre-travel out-of-town recommendation aims to recommend Points-of-Interest (POIs) to users who plan to travel out of their hometown in the near future yet have not decided where to go, i.e., their destination regions and POIs both remain unknown. It is a non-trivial task since the search space is vast and different out-of-town regions may lead to distinct travel experiences, which eventually complicates decision-making. Besides, users' out-of-town travel behaviors are affected not only by their personalized preferences but also, heavily, by others' travel behaviors. To this end, we propose a Crowd-Aware Pre-Travel Out-of-town Recommendation framework (CAPTOR) consisting of two major modules: a spatial-affined conditional random field (SA-CRF) and a crowd behavior memory network (CBMN). Specifically, SA-CRF captures the spatial affinity among POIs while preserving the inherent information of POIs. CBMN is then proposed to maintain the crowd travel behaviors w.r.t. each region through three affiliated blocks that read and write the memory adaptively. We devise an elaborate metric space with a dynamic mapping mechanism, in which users and POIs are distinguishable both inherently and geographically. Extensive experiments on two real-world nationwide datasets validate the effectiveness of CAPTOR on the pre-travel out-of-town recommendation task.
Abstract: News recommendation for anonymous readers is a useful but challenging task for many news portals, where interactions between readers and articles are limited to a temporary login session. Previous works tend to formulate session-based recommendation as a next-item prediction task, while neglecting the implicit feedback from user behaviors, which indicates what users really like or dislike. Hence, we propose a comprehensive framework to model user behaviors through positive feedback (i.e., the articles they spend more time on) and negative feedback (i.e., the articles they choose to skip without clicking). Moreover, the framework implicitly models the user using their session start time, and the article using its initial publishing time, in what we call neutral feedback. Empirical evaluation on three real-world news datasets shows that the framework delivers more accurate, diverse and even unexpected recommendations than other state-of-the-art session-based recommendation approaches.
Abstract: Non-factoid question answering (NFQA) is a challenging and under-researched task that requires constructing long-form answers, such as explanations or opinions, to open-ended non-factoid questions - NFQs. There is still little understanding of the categories of NFQs that people tend to ask, what form of answers they expect to see in return, and what the key research challenges of each category are. This work presents the first comprehensive taxonomy of NFQ categories and the expected structure of answers. The taxonomy was constructed with a transparent methodology and extensively evaluated via crowdsourcing. The most challenging categories were identified through an editorial user study. We also release a dataset of categorised NFQs and a question category classifier. Finally, we conduct a quantitative analysis of the distribution of question categories using major NFQA datasets, showing that the NFQ categories that are the most challenging for current NFQA systems are poorly represented in these datasets. This imbalance may lead to insufficient system performance for challenging categories. The new taxonomy, along with the category classifier, will aid research in the area, helping to create more balanced benchmarks and to focus models on addressing specific categories.
Abstract: Designing natural language processing (NLP) models that produce predictions by first extracting a set of relevant input sentences, i.e., rationales, is gaining importance for improving model interpretability and producing supporting evidence for users. Current unsupervised approaches are designed to extract rationales that maximize prediction accuracy, which is invariably obtained by exploiting spurious correlations in datasets, and leads to unconvincing rationales. In this paper, we introduce unsupervised generative models to extract dual-purpose rationales, which must not only be able to support a subsequent answer prediction, but also support a reproduction of the input query. We show that such models can produce more meaningful rationales, that are less influenced by dataset artifacts, and as a result, also achieve the state-of-the-art on rationale extraction metrics on four datasets from the ERASER benchmark, significantly improving upon previous unsupervised methods. Our multi-task model is scalable and enables using state-of-the-art pretrained language models to design explainable question answering systems.
Abstract: Current question answering systems are insufficient when confronting real-life scenarios, as they can hardly be aware of whether a question is answerable given its context. Hence, there has been a recent pursuit of detecting the unanswerability of a question and attributing it to a cause. Attribution of unanswerability requires the system to choose an appropriate cause for an unanswerable question. As the task is challenging even for human beings, it is expensive to acquire labeled data, which makes it a low-data-regime problem. Moreover, the causes themselves are semantically abstract and complex, and the process of attribution is heavily question- and context-dependent. Thus, a capable model has to carefully appreciate the causes and then judiciously contrast the question with its context in order to assign it to the right cause. In response to these challenges, we present PTAU, which refers to and implements the high-level human reading strategy of reading with anticipation. Specifically, PTAU leverages the recent prompt-tuning paradigm, and is further enhanced with two innovatively conceived modules: 1) a cause-oriented template module that constructs continuous templates towards a certain attribution class in a high-dimensional vector space; and 2) a semantics-aware label module that exploits label semantics through contrastive learning to render the classes distinguishable. Extensive experiments demonstrate that the proposed design better enlightens not only the attribution model, but also current question answering models, leading to superior performance.
Abstract: Community question answering (CQA) has become increasingly prevalent in recent years, providing platforms for users with various backgrounds to obtain information and share knowledge. However, the redundancy and lengthiness of crowd-sourced answers limit the performance of answer selection, thus leading to difficulties in reading or even misunderstandings for community users. To solve these problems, we propose dual graph question-answer attention networks (DGQAN) for the answer selection task. To fully understand the internal structure of the question and the corresponding answer, we first construct a dual CQA concept graph with graph convolution networks using the original question and answer text. Specifically, our CQA concept graph exploits the correlation information between question-answer pairs to construct two sub-graphs (QSubject-Answer and QBody-Answer), respectively. Further, a novel dual attention mechanism is incorporated to model both the internal and external semantic relations among questions and answers. More importantly, we conduct experiments to investigate the impact of each layer in the BERT model. The experimental results show that the DGQAN model achieves state-of-the-art performance on three datasets (SemEval-2015, 2016, and 2017), outperforming all the baseline models.
Abstract: Retailers such as grocery stores or e-marketplaces often have vast selections of items for users to choose from. Predicting a user's next purchases has gained attention recently, in the form of next basket recommendation (NBR), as it facilitates navigating extensive assortments for users. Neural network-based models that focus on learning basket representations are the dominant approach in the recent literature. However, these methods do not consider the specific characteristics of the grocery shopping scenario, where users shop for grocery items on a regular basis, and grocery items are repurchased frequently by the same user. In this paper, we first gain a data-driven understanding of users' repeat consumption behavior through an empirical study on six public and proprietary grocery shopping transaction datasets. We discover that, averaged over all datasets, over 54% of NBR performance in terms of recall comes from repeat items: items that users have already purchased in their history, which constitute only 1% of the total collection of items on average. A NBR model with a strong focus on previously purchased items can potentially achieve high performance. We introduce ReCANet, a repeat consumption-aware neural network that explicitly models the repeat consumption behavior of users in order to predict their next basket. ReCANet significantly outperforms state-of-the-art models for the NBR task, in terms of recall and nDCG. We perform an ablation study and show that all of the components of ReCANet contribute to its performance, and demonstrate that a user's repetition ratio has a direct influence on the treatment effect of ReCANet.
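A minimal sketch of the repeat-consumption analysis: for each user, measure which fraction of the ground-truth next basket consists of items already purchased in that user's history. Data and names are toy assumptions.

```python
def repeat_ratio(history_baskets, next_basket):
    """Fraction of the next basket that the user has purchased before."""
    seen = set().union(*history_baskets) if history_baskets else set()
    repeats = [item for item in next_basket if item in seen]
    return len(repeats) / max(len(next_basket), 1)

history = [{"milk", "bread"}, {"milk", "apples"}]
next_basket = {"milk", "bread", "cheese"}
print(repeat_ratio(history, next_basket))   # 2/3 of the next basket are repeat items
```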
Abstract: Recommender systems usually face the issue of filter bubbles: over-recommending homogeneous items based on user features and historical interactions. Filter bubbles will grow along the feedback loop and inadvertently narrow user interests. Existing work usually mitigates filter bubbles by incorporating objectives apart from accuracy such as diversity and fairness. However, they typically sacrifice accuracy, hurting model fidelity and user experience. Worse still, users have to passively accept the recommendation strategy and influence the system in an inefficient manner with high latency, e.g., by continually providing feedback (e.g., like and dislike) until the system recognizes the user intention. This work proposes a new recommender prototype called User-Controllable Recommender System (UCRS), which enables users to actively control the mitigation of filter bubbles. Functionally, 1) UCRS can alert users if they are deeply stuck in filter bubbles. 2) UCRS supports four kinds of control commands for users to mitigate the bubbles at different granularities. 3) UCRS can respond to the controls and adjust the recommendations on the fly. The key to adjusting lies in blocking the effect of out-of-date user representations on recommendations, which contain historical information inconsistent with the control commands. As such, we develop a causality-enhanced User-Controllable Inference (UCI) framework, which can quickly revise the recommendations based on user controls in the inference stage and utilize counterfactual inference to mitigate the effect of out-of-date user representations. Experiments on three datasets validate that the UCI framework can effectively recommend more desired items based on user controls, showing promising performance w.r.t. both accuracy and diversity.
Abstract: Knowledge graph (KG), integrating complex information and containing rich semantics, is widely considered as side information to enhance recommendation systems. However, most of the existing KG-based methods concentrate on encoding the structural information in the graph, without utilizing the collaborative signals in user-item interaction data, which are important for understanding user preferences. Therefore, the representations learned by these models are insufficient for representing semantic information of users and items in the recommendation environment. The combination of both kinds of data provides a good chance to solve this problem, but it faces the following challenges: i) the inner correlations in user-item interaction data are difficult to capture from one side of the user or item; ii) capturing the knowledge associations on the whole KG would introduce noises and variously influence the recommendation results; iii) the semantic gap between both kinds of data is hard to alleviate. To tackle this research gap, we propose a novel duet representation learning framework named KADM to fuse local information (user-item interaction data) and global information (external knowledge graph) for the top-N recommendation, which is composed of two separate sub-models. One learns the local representations by discovering the inner correlations in local information with a knowledge-aware co-attention mechanism, and another learns the global representations by encoding the knowledge associations in global information with a relation-aware attention network. The two sub-models are jointly trained as part of the semantic fusion network to compute the user preferences, which weighs the contributions of the two sub-models under the specific context. We conduct experiments on two real-world datasets, and the evaluations show that KADM significantly outperforms state-of-the-art methods. Further ablation studies confirm that the duet architecture performs significantly better than either sub-model on the recommendation tasks.
Abstract: Although Graph Convolutional Networks (GCNs) have shown tremendous success in recommender systems and collaborative filtering (CF), the mechanism by which they, and especially their core component (i.e., neighborhood aggregation), contribute to recommendation has not been well studied. To unveil the effectiveness of GCNs for recommendation, we first analyze them from a spectral perspective and discover two important findings: (1) only a small portion of spectral graph features that emphasize the neighborhood smoothness and difference contribute to the recommendation accuracy, whereas most graph information can be considered as noise that even reduces the performance, and (2) repetition of the neighborhood aggregation emphasizes smoothed features and filters out noise information in an ineffective way. Based on the two findings above, we propose a new GCN learning scheme for recommendation by replacing neighborhood aggregation with a simple yet effective Graph Denoising Encoder (GDE), which acts as a band-pass filter to capture important graph features. We show that our proposed method alleviates the over-smoothing problem and is comparable to an indefinite-layer GCN that can take any-hop neighborhood into consideration. Finally, we dynamically adjust the gradients over the negative samples to expedite model training without introducing additional complexity. Extensive experiments on five real-world datasets show that our proposed method not only outperforms state-of-the-art methods but also achieves a 12x speedup over LightGCN.
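To make the band-pass intuition above concrete, here is a minimal Python sketch that keeps only the smoothest spectral components of the normalized interaction matrix and scores users against items with the resulting reconstruction. This is a deliberate simplification (a plain truncated SVD), not the paper's GDE module; the cut-off n_smooth is an assumed hyperparameter.

```python
import numpy as np

def spectral_bandpass_scores(R: np.ndarray, n_smooth: int = 64) -> np.ndarray:
    """Score users x items using only the smoothest spectral components of the
    normalized interaction matrix (a crude low/band-pass filter).

    R is a binary user-item interaction matrix; n_smooth is the number of
    retained singular directions (illustrative only).
    """
    # Symmetric normalization: R_norm = D_u^{-1/2} R D_i^{-1/2}
    du = np.clip(R.sum(axis=1, keepdims=True), 1, None)
    di = np.clip(R.sum(axis=0, keepdims=True), 1, None)
    R_norm = R / np.sqrt(du) / np.sqrt(di)
    # Keep only the leading spectral components (smooth graph features)
    U, s, Vt = np.linalg.svd(R_norm, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :n_smooth], s[:n_smooth], Vt[:n_smooth]
    return (U_k * s_k) @ Vt_k  # denoised preference scores
```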
Abstract: Most modern recommender systems predict users' preferences with two components: user and item embedding learning, followed by the user-item interaction modeling. By utilizing the auxiliary review information accompanied with user ratings, many of the existing review-based recommendation models enriched user/item embedding learning ability with historical reviews or better modeled user-item interactions with the help of available user-item target reviews. Though significant progress has been made, we argue that current solutions for review-based recommendation suffer from two drawbacks. First, as review-based recommendation can be naturally formed as a user-item bipartite graph with edge features from corresponding user-item reviews, how to better exploit this unique graph structure for recommendation? Second, while most current models suffer from limited user behaviors, can we exploit the unique self-supervised signals in the review-aware graph to guide two recommendation components better? To this end, in this paper, we propose a novel Review-aware Graph Contrastive Learning (RGCL) framework for review-based recommendation. Specifically, we first construct a review-aware user-item graph with feature-enhanced edges from reviews, where each edge feature is composed of both the user-item rating and the corresponding review semantics. This graph with feature-enhanced edges can help attentively learn each neighbor node weight for user and item representation learning. After that, we design two additional contrastive learning tasks (i.e., Node Discrimination and Edge Discrimination) to provide self-supervised signals for the two components in recommendation process. Finally, extensive experiments over five benchmark datasets demonstrate the superiority of our proposed RGCL compared to the state-of-the-art baselines.
Abstract: Contrastive learning (CL) recently has spurred a fruitful line of research in the field of recommendation, since its ability to extract self-supervised signals from the raw data is well-aligned with recommender systems' needs for tackling the data sparsity issue. A typical pipeline of CL-based recommendation models is first augmenting the user-item bipartite graph with structure perturbations, and then maximizing the node representation consistency between different graph augmentations. Although this paradigm turns out to be effective, what underlies the performance gains is still a mystery. In this paper, we first experimentally disclose that, in CL-based recommendation models, CL operates by learning more uniform user/item representations that can implicitly mitigate the popularity bias. Meanwhile, we reveal that the graph augmentations, which used to be considered necessary, just play a trivial role. Based on this finding, we propose a simple CL method which discards the graph augmentations and instead adds uniform noises to the embedding space for creating contrastive views. A comprehensive experimental study on three benchmark datasets demonstrates that, though it appears strikingly simple, the proposed method can smoothly adjust the uniformity of learned representations and has distinct advantages over its graph augmentation-based counterparts in terms of recommendation accuracy and training efficiency. The code is released at https://github.com/Coder-Yu/QRec.
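As a sketch of the augmentation-free contrastive view construction described above, the snippet below perturbs embeddings with small signed uniform noise and applies a standard InfoNCE loss. The noise magnitude, temperature, and function names are assumptions for illustration rather than the released implementation.

```python
import torch
import torch.nn.functional as F

def noisy_view(emb: torch.Tensor, eps: float = 0.1) -> torch.Tensor:
    """Create a contrastive view by adding small signed uniform noise aligned
    with the embedding's sign pattern (eps is an assumed magnitude)."""
    noise = F.normalize(torch.rand_like(emb), dim=-1) * torch.sign(emb)
    return emb + eps * noise

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    """Standard InfoNCE between two views of the same batch of nodes."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau                       # (B, B) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

# usage: cl_loss = info_nce(noisy_view(user_emb), noisy_view(user_emb))
```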
Abstract: In recommendation systems, the choice of loss function is critical since a good loss may significantly improve the model performance. However, manually designing a good loss is a big challenge due to the complexity of the problem. A large fraction of previous work focuses on handcrafted loss functions, which requires significant expertise and human effort. In this paper, inspired by the recent development of automated machine learning, we propose an automatic loss function generation framework, AutoLossGen, which is able to generate loss functions directly constructed from basic mathematical operators without prior knowledge of loss structure. More specifically, we develop a controller model driven by reinforcement learning to generate loss functions, and develop an iterative and alternating optimization schedule to update the parameters of both the controller model and the recommender model. One challenge for automatic loss generation in recommender systems is the extreme sparsity of recommendation datasets, which leads to the sparse reward problem for loss generation and search. To solve the problem, we further develop a reward filtering mechanism for efficient and effective loss generation. Experimental results show that our framework manages to create tailored loss functions for different recommendation models and datasets, and the generated loss gives better recommendation performance than commonly used baseline losses. Besides, most of the generated losses are transferable, i.e., the loss generated based on one model and dataset also works well for another model or dataset. Source code of the work is available at https://github.com/rutgerswiselab/AutoLossGen.
Abstract: Online recommendation requires handling rapidly changing user preferences. Deep reinforcement learning (DRL) is an effective means of capturing users' dynamic interests during interactions with recommender systems. Generally, it is challenging to train a DRL agent in online recommender systems because of the sparse rewards caused by the large action space (e.g., candidate item space) and comparatively few user interactions. Leveraging experience replay (ER) has been extensively studied to conquer the issue of sparse rewards. However, existing ER methods adapt poorly to the complex environment of online recommender systems and are inefficient in learning an optimal strategy from past experience. As a step towards filling this gap, we propose a novel state-aware experience replay model, in which the agent selectively discovers the most relevant and salient experiences and is guided to find the optimal policy for online recommendations. In particular, a locality-sensitive hashing method is proposed to selectively retain the most meaningful experience at scale, and a prioritized reward-driven strategy is designed to replay more valuable experiences with a higher chance. We formally show that the proposed method guarantees upper and lower bounds on experience replay and optimizes the space complexity, as well as empirically demonstrate our model's superiority to several existing experience replay methods over three benchmark simulation platforms.
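The locality-sensitive hashing idea mentioned above can be pictured with a toy buffer that hashes states with random hyperplanes and keeps only the highest-reward experience per bucket. This is a generic sketch under our own assumptions (hash width, reward-based retention), not the proposed state-aware model.

```python
import numpy as np

class LSHExperienceBuffer:
    """Toy replay buffer that keeps at most one experience per LSH bucket,
    approximating 'retain the most meaningful experiences at scale'."""

    def __init__(self, state_dim: int, n_bits: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_bits, state_dim))  # random hyperplanes
        self.buckets = {}

    def _hash(self, state: np.ndarray) -> int:
        bits = (self.planes @ state > 0).astype(int)
        return int("".join(map(str, bits)), 2)

    def add(self, state, action, reward, next_state):
        key = self._hash(np.asarray(state, dtype=float))
        kept = self.buckets.get(key)
        # keep the higher-reward experience within a bucket (one simple heuristic)
        if kept is None or reward > kept[2]:
            self.buckets[key] = (state, action, reward, next_state)

    def sample(self, k: int):
        if not self.buckets:
            return []
        items = list(self.buckets.values())
        idx = np.random.choice(len(items), size=min(k, len(items)), replace=False)
        return [items[i] for i in idx]
```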
Abstract: Recommender systems have become a fundamental service in most E-Commerce platforms, in which the matching stage aims to retrieve potentially relevant candidate items to users for further ranking. Recently, some efforts on extracting multi-interests from user's historical behaviors have demonstrated superior performance. However, the historical behaviors are not noise-free due to the possible misclicks or disturbances. Existing works mainly overlook the fact that the interests of a user are not only reflected by the historical behaviors, but also inherently regulated by the profile information. Hence, we are interested in exploiting the benefit of user profile in multi-interest learning to enhance candidate matching performance. To this end, a user-aware multi-interest learning framework (named UMI) is proposed in this paper to exploit both user profile and behavior information for candidate matching. Specifically, UMI consists of two main components: dual-attention routing and interest refinement. In the dual-attention routing, we firstly introduce a user-guided attention network to identify the important historical items with respect to the user profile. Then, the resultant importance weights are leveraged via the dual-attentive capsule network to extract the user's multi-interests. Afterwards, the extracted interests are utilized to highlight the corresponding user profile features for interest refinement, such that different user profiles can be incorporated into interest learning for diverse user preference understanding. Besides, to improve the model's discriminative capacity, we further devise a harder-negatives strategy to support model optimization. Extensive experiments show that UMI significantly outperforms state-of-the-art multi-interest modeling alternatives. Currently, UMI has been successfully deployed at Taobao App in Alibaba, serving hundreds of millions of users.
Abstract: As the final stage of the multi-stage recommender system (MRS), reranking directly affects users' experience and satisfaction, thus playing a critical role in MRS. Despite the improvement achieved in the existing work, three issues are yet to be solved. First, users' historical behaviors contain rich preference information, such as users' long- and short-term interests, but are not fully exploited in reranking. Previous work typically treats items in history as equally important, neglecting the dynamic interaction between the history and candidate items. Second, existing reranking models focus on learning interactions at the item level while ignoring the fine-grained feature-level interactions. Lastly, estimating the reranking score on the ordered initial list before reranking may lead to the early scoring problem, thereby yielding suboptimal reranking performance. To address the above issues, we propose a framework named Multi-level Interaction Reranking (MIR). MIR combines low-level cross-item interaction and high-level set-to-list interaction, where we view the candidate items to be reranked as a set and the users' behavior history in chronological order as a list. We design a novel SLAttention structure for modeling the set-to-list interactions with personalized long- and short-term interests. Moreover, feature-level interactions are incorporated to capture the fine-grained influence among items. We design MIR in such a way that any permutation of the input items would not change the output ranking, and we prove this theoretically. Extensive experiments on three public and proprietary datasets show that MIR significantly outperforms the state-of-the-art models using various ranking and utility metrics.
Abstract: Modern recommender systems aim to improve user experience. As reinforcement learning (RL) naturally fits this objective---maximizing a user's reward per session---it has become an emerging topic in recommender systems. Developing RL-based recommendation methods, however, is not trivial due to the offline training challenge. Specifically, the keystone of traditional RL is to train an agent with large amounts of online exploration, making lots of 'errors' in the process. In the recommendation setting, though, we cannot afford the price of making 'errors' online. As a result, the agent needs to be trained through offline historical implicit feedback, collected under different recommendation policies; traditional RL algorithms may lead to sub-optimal policies under these offline training settings. Here we propose a new learning paradigm---namely Prompt-Based Reinforcement Learning (PRL)---for the offline training of RL-based recommendation agents. While traditional RL algorithms attempt to map state-action input pairs to their expected rewards (e.g., Q-values), PRL directly infers actions (i.e., recommended items) from state-reward inputs. In short, the agents are trained to predict a recommended item given the prior interactions and an observed reward value---with simple supervised learning. At deployment time, this historical (training) data acts as a knowledge base, while the state-reward pairs are used as a prompt. The agents are thus used to answer the question: Which item should be recommended given the prior interactions & the prompted reward value? We implement PRL with four notable recommendation models and conduct experiments on two real-world e-commerce datasets. Experimental results demonstrate the superior performance of our proposed methods.
Abstract: Knowledge graph (KG) plays an increasingly important role in recommender systems. Recently, graph neural network (GNN)-based models have gradually become the mainstream approach to knowledge-aware recommendation (KGR). However, GNN-based KGR models suffer from a natural deficiency, namely the sparse supervision signal problem, which may make their actual performance drop to some extent. Inspired by the recent success of contrastive learning in mining supervised signals from the data itself, in this paper we focus on exploring contrastive learning in KG-aware recommendation and propose a novel multi-level cross-view contrastive learning mechanism, named MCCLK. Different from traditional contrastive learning methods, which generate two graph views by uniform data augmentation schemes such as corruption or dropping, we comprehensively consider three different graph views for KG-aware recommendation, including a global-level structural view and local-level collaborative and semantic views. Specifically, we consider the user-item graph as the collaborative view, the item-entity graph as the semantic view, and the user-item-entity graph as the structural view. MCCLK hence performs contrastive learning across the three views on both local and global levels, mining comprehensive graph feature and structure information in a self-supervised manner. Besides, in the semantic view, a k-nearest-neighbor (kNN) item-item semantic graph construction module is proposed to capture the important item-item semantic relations that are usually ignored by previous work. Extensive experiments conducted on three benchmark datasets show the superior performance of our proposed method over state-of-the-art methods. The implementations are available at: https://github.com/CCIIPLab/MCCLK.
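For the kNN item-item semantic graph construction mentioned above, a minimal version could look like the following, connecting each item to its k most cosine-similar items from some item/entity embedding table. The value of k and the source of the embeddings are illustrative assumptions, not MCCLK's released code.

```python
import numpy as np

def knn_item_graph(item_emb: np.ndarray, k: int = 10) -> np.ndarray:
    """Return a binary adjacency matrix linking each item to its k most
    cosine-similar items (self-loops excluded, result symmetrized)."""
    norms = np.clip(np.linalg.norm(item_emb, axis=1, keepdims=True), 1e-12, None)
    normed = item_emb / norms
    sim = normed @ normed.T
    np.fill_diagonal(sim, -np.inf)                 # no self-edges
    adj = np.zeros_like(sim)
    topk = np.argpartition(-sim, kth=k, axis=1)[:, :k]
    rows = np.repeat(np.arange(sim.shape[0]), k)
    adj[rows, topk.ravel()] = 1.0
    return np.maximum(adj, adj.T)                  # symmetrize
```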
Abstract: Off-policy learning has drawn huge attention in recommender systems (RS), as it provides an opportunity for reinforcement learning to abandon expensive online training. However, off-policy learning from logged data suffers from biases caused by the policy shift between the target policy and the logging policy. Consequently, most off-policy learning resorts to inverse propensity scoring (IPS), which, however, tends to overfit to exposed (or recommended) items and thus fails to explore unexposed items. In this paper, we propose meta graph enhanced off-policy learning (MGPolicy), which is the first recommendation model for correcting the off-policy bias via contextual information. In particular, we explicitly leverage rich semantics in meta graphs for user state representation, and then train the candidate generation model to promote an efficient search in the action space. Moreover, MGPolicy is designed with counterfactual risk minimization, which can correct the policy learning bias and ultimately yield an effective target policy to maximize the long-run rewards for the recommendation. We extensively evaluate our method through a series of simulations and large-scale real-world datasets, achieving favorable results compared with state-of-the-art methods. Our code is currently available online.
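The counterfactual risk minimization referred to above typically reduces to a clipped inverse-propensity-scored policy objective. The sketch below shows that generic form; the variable names, the clipping constant, and the particular estimator variant are our assumptions, not MGPolicy's exact design.

```python
import torch

def ips_policy_loss(log_pi_target: torch.Tensor,
                    logging_propensity: torch.Tensor,
                    reward: torch.Tensor,
                    clip: float = 10.0) -> torch.Tensor:
    """Clipped inverse-propensity-scored objective for off-policy learning.

    log_pi_target      -- log pi_theta(a|s) of the logged action under the target policy
    logging_propensity -- pi_0(a|s) of the logging policy (assumed recorded in the log)
    reward             -- observed reward for the logged action
    """
    weight = torch.clamp(
        torch.exp(log_pi_target) / logging_propensity.clamp_min(1e-6), max=clip)
    # maximize IPS-weighted expected reward <=> minimize its negative
    return -(weight.detach() * reward * log_pi_target).mean()
```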
Abstract: Recommendation systems make predictions chiefly based on users' historical interaction data (e.g., items previously clicked or purchased). There is a risk of privacy leakage when collecting the users' behavior data for building the recommendation model. However, existing privacy-preserving solutions are designed for tackling the privacy issue only during the model training [32] and results collection [40] phases. The problem of privacy leakage still exists when directly sharing the private user interaction data with organizations or releasing them to the public. To address this problem, in this paper, we present a User Privacy Controllable Synthetic Data Generation model (short for UPC-SDG), which generates synthetic interaction data for users based on their privacy preferences. The generation model aims to provide certain privacy guarantees while maximizing the utility of the generated synthetic data at both data level and item level. Specifically, at the data level, we design a selection module that selects those items that contribute less to a user's preferences from the user's interaction data. At the item level, a synthetic data generation module is proposed to generate a synthetic item corresponding to the selected item based on the user's preferences. Furthermore, we also present a privacy-utility trade-off strategy to balance the privacy and utility of the synthetic data. Extensive experiments and ablation studies have been conducted on three publicly accessible datasets to justify our method, demonstrating its effectiveness in generating synthetic data under users' privacy preferences.
Abstract: Knowledge graph (KG) plays an increasingly important role to improve the recommendation performance and interpretability. A recent technical trend is to design end-to-end models based on the information propagation schemes. However, existing propagation-based methods fail to (1) model the underlying hierarchical structures and relations, and (2) capture the high-order collaborative signals of items for learning high-quality user and item representations. In this paper, we propose a new model, called Hierarchy-Aware Knowledge Gated Network (HAKG), to tackle the aforementioned problems. Technically, we model users and items (that are captured by a user-item graph), as well as entities and relations (that are captured in a KG) in hyperbolic space, and design a new hyperbolic aggregation scheme to gather relational contexts over KG. Meanwhile, we introduce a novel angle constraint to preserve characteristics of items in the embedding space. Furthermore, we propose the dual item embeddings design to represent and propagate collaborative signals and knowledge associations separately, and leverage the gated aggregation to distill discriminative information for better capturing user behavior patterns. Experimental results on three benchmark datasets show that, HAKG achieves significant improvement over the state-of-the-art methods like CKAN, Hyper-Know, and KGIN. Further analyses on the learned hyperbolic embeddings confirm that HAKG can offer meaningful insights into the hierarchies of data.
Abstract: Constrained by the statistics-based machine learning framework, existing knowledge-aware recommendation methods are prone to spurious correlations: a spurious correlation refers to a knowledge fact that appears causal to the user behaviors (as inferred by the recommender) but in fact is not. To tackle this issue, we present a novel approach to discovering and alleviating the potential spurious correlations from a counterfactual perspective. To be specific, our approach consists of two counterfactual generators and a recommender. The counterfactual generators are designed to generate counterfactual interactions via reinforcement learning, while the recommender is implemented with two different graph neural networks to aggregate the information from the KG and user-item interactions respectively. The counterfactual generators and recommender are integrated in a mutually collaborative way. With this approach, the recommender helps the counterfactual generators better identify potential spurious correlations and generate high-quality counterfactual interactions, while the counterfactual generators help the recommender weaken the influence of the potential spurious correlations simultaneously. Extensive experiments on three real-world datasets have shown the effectiveness of the proposed approach by comparing it with a number of competitive baselines. Our implementation code is available at: https://github.com/RUCAIBox/CGKR.
Abstract: The ubiquity of implicit feedback makes it the default choice for building modern recommender systems. Generally speaking, observed interactions are considered as positive samples, while unobserved interactions are considered as negative ones. However, implicit feedback is inherently noisy because of the ubiquitous presence of noisy-positive and noisy-negative interactions. Recently, some studies have noticed the importance of denoising implicit feedback for recommendations, and enhanced the robustness of recommendation models to some extent. Nonetheless, they typically fail to (1) capture the hard yet clean interactions for learning comprehensive user preference, and (2) provide a universal denoising solution that can be applied to various kinds of recommendation models. In this paper, we thoroughly investigate the memorization effect of recommendation models, and propose a new denoising paradigm, i.e., Self-Guided Denoising Learning (SGDL), which is able to collect memorized interactions at the early stage of the training (i.e., ''noise-resistant'' period), and leverage those data as denoising signals to guide the following training (i.e., ''noise-sensitive'' period) of the model in a meta-learning manner. Besides, our method can automatically switch its learning phase at the memorization point from memorization to self-guided learning, and select clean and informative memorized data via a novel adaptive denoising scheduler to improve the robustness. We incorporate SGDL with four representative recommendation models (i.e., NeuMF, CDAE, NGCF and LightGCN) and different loss functions (i.e., binary cross-entropy and BPR loss). The experimental results on three benchmark datasets demonstrate the effectiveness of SGDL over the state-of-the-art denoising methods like T-CE, IR, DeCA, and even state-of-the-art robust graph-based methods like SGCN and SGL.
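A much-simplified proxy for the memorization idea above is to track per-interaction losses during the early epochs and mark interactions whose loss stays low as "memorized" denoising signals. The fixed threshold and the averaging rule here are illustrative stand-ins, not SGDL's adaptive scheduler.

```python
import torch

def collect_memorized(per_sample_loss_history: torch.Tensor,
                      threshold: float = 0.5) -> torch.Tensor:
    """Mark interactions as 'memorized' if their loss stayed low across the
    early 'noise-resistant' epochs (a simplified proxy, not the paper's scheme).

    per_sample_loss_history: (n_early_epochs, n_interactions) tensor of losses.
    Returns a boolean mask over interactions.
    """
    mean_early_loss = per_sample_loss_history.mean(dim=0)
    return mean_early_loss < threshold
```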
Abstract: User cold-start is a major challenge in building personalized recommender systems. Due to the lack of sufficient interactions, it is difficult to effectively model new users. One of the main solutions is to obtain an initial model through meta-learning (mainly gradient-based methods) and adapt it to new users with a few steps of gradient descent. Although these methods have achieved remarkable performance, they are still far from being usable in real-world applications due to their high-demand data processing, heavy computational burden, and inability to perform effective user-incremental update. In this paper, we propose a deployable and continuable meta-learning-based recommendation (DCMR) approach, which can achieve fast user-incremental updating with task replay and first-order gradient descent. Specifically, we introduce a dual-constrained task sampler, distillation-based loss functions, and an adaptive controller in this framework to balance the trade-off between stability and plasticity in updating. In summary, DCMR can be updated while serving new users; in other words, it learns continuously and rapidly from a sequential user stream and is able to make recommendations at any time. The extensive experiments conducted on three benchmark datasets illustrate the superiority of our model.
Abstract: Knowledge Graphs (KGs) have been utilized as useful side information to improve recommendation quality. In those recommender systems, knowledge graph information often contains fruitful facts and inherent semantic relatedness among items. However, the success of such methods relies on high-quality knowledge graphs, and the learned representations may suffer from two challenges: i) The long-tail distribution of entities results in sparse supervision signals for KG-enhanced item representation; ii) Real-world knowledge graphs are often noisy and contain topic-irrelevant connections between items and entities. Such KG sparsity and noise make the item-entity dependent relations deviate from reflecting their true characteristics, which significantly amplifies the noise effect and hinders the accurate representation of users' preferences. To fill this research gap, we design a general Knowledge Graph Contrastive Learning framework (KGCL) that alleviates the information noise for knowledge graph-enhanced recommender systems. Specifically, we propose a knowledge graph augmentation schema to suppress KG noise in information aggregation, and derive more robust knowledge-aware representations for items. In addition, we exploit additional supervision signals from the KG augmentation process to guide a cross-view contrastive learning paradigm, giving a greater role to unbiased user-item interactions in gradient descent and further suppressing the noise. Extensive experiments on three public datasets demonstrate the consistent superiority of our KGCL over state-of-the-art techniques. KGCL also achieves strong performance in recommendation scenarios with sparse user-item interactions, long-tail and noisy KG entities. Our implementation codes are available at https://github.com/yuh-yang/KGCL-SIGIR22.
Abstract: Current dense retrievers are not robust to out-of-domain and outlier queries, i.e. their effectiveness on these queries is much poorer than what one would expect. In this paper, we consider a specific instance of such queries: queries that contain typos. We show that a small character level perturbation in queries (as caused by typos) highly impacts the effectiveness of dense retrievers. We then demonstrate that the root cause of this resides in the input tokenization strategy employed by BERT. In BERT, tokenization is performed using the BERT's WordPiece tokenizer and we show that a token with a typo will significantly change the token distributions obtained after tokenization. This distribution change translates to changes in the input embeddings passed to the BERT-based query encoder of dense retrievers. We then turn our attention to devising dense retriever methods that are robust to such queries with typos, while still being as performant as previous methods on queries without typos. For this, we use CharacterBERT as the backbone encoder and an efficient yet effective training method, called Self-Teaching (ST), that distills knowledge from queries without typos into the queries with typos. Experimental results show that CharacterBERT in combination with ST achieves significantly higher effectiveness on queries with typos compared to previous methods. Along with these results and the open-sourced implementation of the methods, we also provide a new passage retrieval dataset consisting of real-world queries with typos and associated relevance assessments on the MS MARCO corpus, thus supporting the research community in the investigation of effective and robust dense retrievers. Code, experimental results and dataset are made available at https://github.com/ielab/CharacterBERT-DR.
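The Self-Teaching objective described above can be pictured as a distillation between the score distributions produced for a clean query (teacher) and its typoed variant (student). The sketch below uses a naive character-deletion typo generator and a KL term; both are simplifications under our own assumptions, not the released CharacterBERT-DR training code.

```python
import random
import torch
import torch.nn.functional as F

def random_typo(query: str, rate: float = 0.1) -> str:
    """Inject simple character deletions as a stand-in for realistic typo generation."""
    chars = [c for c in query if not (c.isalpha() and random.random() < rate)]
    return "".join(chars) or query

def self_teaching_loss(scores_clean: torch.Tensor,
                       scores_typo: torch.Tensor,
                       tau: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) between score distributions over the same candidate
    passages: teacher from the clean query (no gradient), student from the typoed query."""
    teacher = F.softmax(scores_clean.detach() / tau, dim=-1)
    student = F.log_softmax(scores_typo / tau, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean")
```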
Abstract: Pre-trained language models such as BERT have been a key ingredient to achieve state-of-the-art results on a variety of tasks in natural language processing and, more recently, also in information retrieval. Recent research even claims that BERT is able to capture factual knowledge about entity relations and properties, the information that is commonly obtained from knowledge graphs. This paper investigates the following question: Do BERT-based entity retrieval models benefit from additional entity information stored in knowledge graphs? To address this research question, we map entity embeddings into the same input space as a pre-trained BERT model and inject these entity embeddings into the BERT model. This entity-enriched language model is then employed on the entity retrieval task. We show that the entity-enriched BERT model improves effectiveness on entity-oriented queries over a regular BERT model, establishing a new state-of-the-art result for the entity retrieval task, with substantial improvements for complex natural language queries and queries requesting a list of entities with a certain property. Additionally, we show that the entity information provided by our entity-enriched model particularly helps queries related to less popular entities. Last, we observe empirically that the entity-enriched BERT models enable fine-tuning on limited training data, which otherwise would not be feasible due to the known instabilities of BERT in few-sample fine-tuning, thereby contributing to data-efficient training of BERT for entity search.
Abstract: Entity-oriented search systems often learn vector representations of entities via the introductory paragraph from the Wikipedia page of the entity. As such representations are the same for every query, our hypothesis is that the representations are not ideal for IR tasks. In this work, we present BERT Entity Representations (BERT-ER) which are query-specific vector representations of entities obtained from text that describes how an entity is relevant for a query. Using BERT-ER in a downstream entity ranking system, we achieve a performance improvement of 13-42% (Mean Average Precision) over a system that uses the BERT embedding of the introductory paragraph from Wikipedia on two large-scale test collections. Our approach also outperforms entity ranking systems using entity embeddings from Wikipedia2Vec, ERNIE, and E-BERT. We show that our entity ranking system using BERT-ER can increase precision at the top of the ranking by promoting relevant entities to the top. With this work, we release our BERT models and query-specific entity embeddings fine-tuned for the entity ranking task.
Abstract: Pre-trained language models (PLMs), such as BERT and ERNIE, have achieved outstanding performance in many natural language understanding tasks. Recently, PLM-based information retrieval models have also been investigated and have shown state-of-the-art effectiveness, e.g., MORES, PROP and ColBERT. However, most PLM-based rankers focus only on single-level relevance matching (e.g., character-level), while ignoring information at other granularities (e.g., words and phrases), which easily leads to ambiguity in query understanding and inaccurate matching in web search. In this paper, we aim to improve the state-of-the-art PLM ERNIE for web search, by modeling multi-granularity context information with awareness of word importance in queries and documents. In particular, we propose a novel H-ERNIE framework, which includes a query-document analysis component and a hierarchical ranking component. The query-document analysis component has several individual modules which generate the necessary variables, such as word segmentation, word importance analysis, and word tightness analysis. Based on these variables, the importance-aware multiple-level correspondences are sent to the ranking model. The hierarchical ranking model includes a multi-layer transformer module to learn the character-level representations, a word-level matching module, and a phrase-level matching module with word importance. Each of these modules models query-document matching from a different perspective. Also, these levels communicate with each other to achieve overall accurate matching. We discuss the time complexity of the proposed framework, and show that it can be efficiently implemented in real applications. The offline and online experiments on both public data sets and a commercial search engine illustrate the effectiveness of the proposed H-ERNIE framework.
Abstract: Passage re-ranking aims to obtain a permutation over the candidate passage set returned by the retrieval stage. Re-rankers have been greatly advanced by pre-trained language models (PLMs) due to their overwhelming advantages in natural language understanding. However, existing PLM-based re-rankers may easily suffer from vocabulary mismatch and a lack of domain-specific knowledge. To alleviate these problems, explicit knowledge contained in a knowledge graph is carefully introduced in our work. Specifically, we employ an existing knowledge graph, which is incomplete and noisy, and are the first to apply it to the passage re-ranking task. To leverage reliable knowledge, we propose a novel knowledge graph distillation method and obtain a knowledge meta graph as the bridge between query and passage. To align both kinds of embeddings in the latent space, we employ a PLM as the text encoder and a graph neural network over the knowledge meta graph as the knowledge encoder. Besides, a novel knowledge injector is designed for the dynamic interaction between the text and knowledge encoders. Experimental results demonstrate the effectiveness of our method, especially on queries requiring in-depth domain knowledge.
Abstract: Pre-trained language models (PLMs) have achieved great success in the area of Information Retrieval. Studies show that applying these models to ad-hoc document ranking can achieve better retrieval effectiveness. However, on the Web, most information is organized in the form of HTML web pages. In addition to the pure text content, the structure of the content organized by HTML tags is also an important part of the information delivered on a web page. Currently, such structured information is totally ignored by pre-trained models which are trained solely based on text content. In this paper, we propose to leverage large-scale web pages and their DOM (Document Object Model) tree structures to pre-train models for information retrieval. We argue that using the hierarchical structure contained in web pages, we can get richer contextual information for training better language models. To exploit this kind of information, we devise four pre-training objectives based on the structure of web pages, then pre-train a Transformer model towards these tasks jointly with traditional masked language model objective. Experimental results on two authoritative ad-hoc retrieval datasets prove that our model can significantly improve ranking performance compared to existing pre-trained models.
Abstract: Vector quantization (VQ) based ANN indexes, such as Inverted File System (IVF) and Product Quantization (PQ), have been widely applied to embedding based document retrieval thanks to their competitive time and memory efficiency. Originally, VQ is learned to minimize the reconstruction loss, i.e., the distortions between the original dense embeddings and the reconstructed embeddings after quantization. Unfortunately, such an objective is inconsistent with the goal of selecting ground-truth documents for the input query, which may cause severe loss of retrieval quality. Recent works identify such a defect, and propose to minimize the retrieval loss through contrastive learning. However, these methods intensively rely on queries with ground-truth documents, whose performance is limited by the insufficiency of labeled data. In this paper, we propose Distill-VQ, which unifies the learning of IVF and PQ within a knowledge distillation framework. In Distill-VQ, the dense embeddings are leveraged as "teachers", which predict the query's relevance to the sampled documents. The VQ modules are treated as the "students", which are learned to reproduce the predicted relevance, such that the reconstructed embeddings may fully preserve the retrieval result of the dense embeddings. By doing so, Distill-VQ is able to derive substantial training signals from the massive unlabeled data, which significantly contributes to the retrieval quality. We perform comprehensive explorations for the optimal conduct of knowledge distillation, which may provide useful insights for the learning of VQ based ANN indexes. We also experimentally show that the labeled data is no longer a necessity for high-quality vector quantization, which indicates Distill-VQ's strong applicability in practice. The evaluations are performed on MS MARCO and Natural Questions benchmarks, where Distill-VQ notably outperforms the SOTA VQ methods in Recall and MRR. Our code is available at https://github.com/staoxiao/LibVQ.
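At its core, the distillation objective described above matches the relevance distribution induced by the reconstructed (quantized) document embeddings to the one induced by the original dense embeddings. The following sketch shows that listwise KL form with the quantizer abstracted away; the tensor shapes and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

def vq_distillation_loss(query_emb: torch.Tensor,
                         doc_emb: torch.Tensor,
                         doc_emb_reconstructed: torch.Tensor,
                         tau: float = 1.0) -> torch.Tensor:
    """Listwise KL distillation: dense embeddings act as the teacher, the
    quantizer's reconstructed embeddings act as the student (quantizer omitted).

    query_emb: (B, d); doc_emb and doc_emb_reconstructed: (B, n_docs, d)
    """
    teacher_scores = torch.einsum("bd,bnd->bn", query_emb, doc_emb)
    student_scores = torch.einsum("bd,bnd->bn", query_emb, doc_emb_reconstructed)
    teacher = F.softmax(teacher_scores.detach() / tau, dim=-1)
    student = F.log_softmax(student_scores / tau, dim=-1)
    return F.kl_div(student, teacher, reduction="batchmean")
```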
Abstract: Recently, pre-training methods tailored for IR tasks have achieved great success. However, as the mechanisms behind the performance improvement remain under-investigated, the interpretability and robustness of these pre-trained models still need to be improved. Axiomatic IR aims to identify a set of desirable properties expressed mathematically as formal constraints to guide the design of ranking models. Existing studies have already shown that considering certain axioms may help improve the effectiveness and interpretability of IR models. However, there are still few efforts to incorporate these IR axioms into pre-training methodologies. To shed light on this research question, we propose a novel pre-training method with Axiomatic Regularization for ad hoc Search (ARES). In the ARES framework, a number of existing IR axioms are re-organized to generate training samples to be fitted in the pre-training process. These training samples then guide neural rankers to learn the desirable ranking properties. Compared to existing pre-training approaches, ARES is more intuitive and explainable. Experimental results on multiple publicly available benchmark datasets have shown the effectiveness of ARES in both full-resource and low-resource (e.g., zero-shot and few-shot) settings. An intuitive case study also indicates that ARES has learned useful knowledge that existing pre-trained models (e.g., BERT and PROP) fail to possess. This work provides insights into improving the interpretability of pre-trained models and guidance on incorporating IR axioms or human heuristics into pre-training methods.
Abstract: Multi-scenario learning (MSL) enables a service provider to cater for users' fine-grained demands by separating services for different user sectors, e.g., by user's geographical region. Under each scenario there is a need to optimize multiple task-specific targets e.g., click through rate and conversion rate, known as multi-task learning (MTL). Recent solutions for MSL and MTL are mostly based on the multi-gate mixture-of-experts (MMoE) architecture. MMoE structure is typically static and its design requires domain-specific knowledge, making it less effective in handling both MSL and MTL. In this paper, we propose a novel Automatic Expert Selection framework for Multi-scenario and Multi-task search, named AESM2. AESM2 integrates both MSL and MTL into a unified framework with an automatic structure learning. Specifically, AESM2 stacks multi-task layers over multi-scenario layers. This hierarchical design enables us to flexibly establish intrinsic connections between different scenarios, and at the same time also supports high-level feature extraction for different tasks. At each multi-scenario/multi-task layer, a novel expert selection algorithm is proposed to automatically identify scenario-/task-specific and shared experts for each input. Experiments over two real-world large-scale datasets demonstrate the effectiveness of AESM2 over a battery of strong baselines. Online A/B test also shows substantial performance gain on multiple metrics. Currently, AESM2 has been deployed online for serving major traffic.
Abstract: Multimodal sentiment analysis has been studied under the assumption that all modalities are available. However, such a strong assumption does not always hold in practice, and most of multimodal fusion models may fail when partial modalities are missing. Several works have addressed the missing modality problem; but most of them only considered the single modality missing case, and ignored the practically more general cases of multiple modalities missing. To this end, in this paper, we propose a Tag-Assisted Transformer Encoder (TATE) network to handle the problem of missing uncertain modalities. Specifically, we design a tag encoding module to cover both the single modality and multiple modalities missing cases, so as to guide the network's attention to those missing modalities. Besides, we adopt a new space projection pattern to align common vectors. Then, a Transformer encoder-decoder network is utilized to learn the missing modality features. At last, the outputs of the Transformer encoder are used for the final sentiment classification. Extensive experiments are conducted on CMU-MOSI and IEMOCAP datasets, showing that our method can achieve significant improvements compared with several baselines.
Abstract: The fine-grained sentiment classification (FGSC) task and the fine-grained controllable text generation (FGSG) task are two representative applications of sentiment analysis, which together form a pair of inverse tasks: the former aims to infer the fine-grained sentiment polarities given a text piece, while the latter generates text content that describes the input fine-grained opinions. Most of the existing work solves the FGSC and the FGSG tasks in isolation, ignoring the complementary benefits between them. This paper combines FGSC and FGSG as a joint dual learning system, encouraging them to learn from each other's advantages. Based on the dual learning framework, we further propose decoupling the feature representations in the two tasks into fine-grained aspect-oriented opinion variables and content variables respectively, by performing mutual disentanglement learning upon them. We also propose to transform the difficult "data-to-text" generation fashion widely used in FGSG into an easier text-to-text generation fashion by creating surrogate natural language text as the model inputs. Experimental results on 7 sentiment analysis benchmarks including both document-level and sentence-level datasets show that our method significantly outperforms the current strong-performing baselines on both the FGSC and FGSG tasks. Automatic and human evaluations demonstrate that our FGSG model successfully generates fluent, diverse and rich content conditioned on fine-grained sentiments.
Abstract: Cross-domain sentiment classification (CDSC) aims to use the transferable semantics learned from the source domain to predict the sentiment of reviews in the unlabeled target domain. Existing studies in this task attach more attention to the sequence modeling of sentences while largely ignoring the rich domain-invariant semantics embedded in graph structures (i.e., the part-of-speech tags and dependency relations). As an important aspect of exploring characteristics of language comprehension, adaptive graph representations have played an essential role in recent years. To this end, in the paper, we aim to explore the possibility of learning invariant semantic features from graph-like structures in CDSC. Specifically, we present Graph Adaptive Semantic Transfer (GAST) model, an adaptive syntactic graph embedding method that is able to learn domain-invariant semantics from both word sequences and syntactic graphs. More specifically, we first raise a POS-Transformer module to extract sequential semantic features from the word sequences as well as the part-of-speech tags. Then, we design a Hybrid Graph Attention (HGAT) module to generate syntax-based semantic features by considering the transferable dependency relations. Finally, we devise an Integrated aDaptive Strategy (IDS) to guide the joint learning process of both modules. Extensive experiments on four public datasets indicate that GAST achieves comparable effectiveness to a range of state-of-the-art models.
Abstract: Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis task designed to identify the polarity of a target aspect. Some works introduce various attention mechanisms to fully mine the relevant context words of different aspects, and use the traditional cross-entropy loss to fine-tune the models for the ABSA task. However, the attention mechanism paying partial attention to aspect-unrelated words inevitably introduces irrelevant noise. Moreover, the cross-entropy loss lacks discriminative learning of features, which makes it difficult to exploit the implicit information of intra-class compactness and inter-class separability. To overcome these challenges, we propose an Aspect Feature Distillation and Enhancement Network (AFDEN) for the ABSA task. We first propose a dual-feature extraction module to extract aspect-related and aspect-unrelated features through the attention mechanisms and graph convolutional networks. Then, to eliminate the interference of aspect-unrelated words, we design a novel aspect-feature distillation module containing a gradient reverse layer that learns aspect-unrelated contextual features through adversarial training, and an aspect-specific orthogonal projection layer to further project aspect-related features into the orthogonal space of aspect-unrelated features. Finally, we propose an aspect-feature enhancement module that leverages supervised contrastive learning to capture the implicit information between the same sentiment labels and between different sentiment labels. Experimental results on three public datasets demonstrate that our AFDEN model achieves state-of-the-art performance and verify the effectiveness and robustness of our model.
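Two of the building blocks named above, the gradient reverse layer and the aspect-specific orthogonal projection, are standard enough to sketch directly. The per-example projection below is one plausible reading of the abstract, not the authors' exact formulation; the lambda scaling is an assumed hyperparameter.

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, reversed (scaled) gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lambd: float = 1.0):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

def orthogonal_projection(aspect_feat: torch.Tensor,
                          unrelated_feat: torch.Tensor) -> torch.Tensor:
    """Project aspect-related features onto the orthogonal complement of the
    aspect-unrelated features, per example."""
    u = F.normalize(unrelated_feat, dim=-1)
    parallel = (aspect_feat * u).sum(dim=-1, keepdim=True) * u
    return aspect_feat - parallel

# usage: adversarial branch sees GradReverse.apply(unrelated_feat, 1.0)
```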
Abstract: Recently, the aspect-opinion term pairs (AOTP) extraction task has gained substantial importance in the domain of aspect-based sentiment analysis. It intends to extract the potential pair of each aspect term with its corresponding opinion term present in a user review. Some existing studies heavily relied on the annotated aspect terms and/or opinion terms, or adopted external knowledge/resources to figure out the task. Therefore, in this study, we propose a novel end-to-end solution, called an Interactive AOTP (IAOTP) model, for exploring AOTP. The IAOTP model first tracks the boundary of each token in given aspect-specific and opinion-specific representations through a span-based operation. Next, it generates the candidate AOTP by formulating the dyadic relations between tokens through the Biaffine transformation. Then, it computes the positioning information to capture the significant distance relationship that each candidate pair holds. And finally, it jointly models collaborative interactions and prediction of AOTP through a 2D self-attention. Besides the IAOTP model, this study also proposes an independent aspect/opinion encoding model (a RS model) that formulates relational semantics to obtain aspect-specific and opinion-specific representations that can effectively perform the extraction of aspect and opinion terms. Detailed experiments conducted on the publicly available benchmark datasets for AOTP, aspect terms, and opinion terms extraction tasks, clearly demonstrate the significantly improved performance of our models relative to other competitive state-of-the-art baselines.
Abstract: A large-scale recommender system usually consists of recall and ranking modules. The goal of ranking modules (aka rankers) is to elaborately discriminate users' preference on item candidates proposed by recall modules. With the success of deep learning techniques in various domains, we have witnessed the mainstream rankers evolve from traditional models to deep neural models. However, the way that we design and use rankers remains unchanged: offline training the model, freezing the parameters, and deploying it for online serving. Actually, the candidate items are determined by specific user requests, in which underlying distributions (e.g., the proportion of items for different categories, the proportion of popular or new items) are highly different from one another in a production environment. The classical parameter-frozen inference manner cannot adapt to dynamic serving circumstances, making rankers' performance compromised. In this paper, we propose a new training and inference paradigm, termed as Ada-Ranker, to address the challenges of dynamic online serving. Instead of using parameter-frozen models for universal serving, Ada-Ranker can adaptively modulate parameters of a ranker according to the data distribution of the current group of item candidates. We first extract distribution patterns from the item candidates. Then, we modulate the ranker by the patterns to make the ranker adapt to the current data distribution. Finally, we use the revised ranker to score the candidate list. In this way, we empower the ranker with the capacity of adapting from a global model to a local model which better handles the current task. As a first study, we examine our Ada-Ranker paradigm in the sequential recommendation scenario. Experiments on three datasets demonstrate that Ada-Ranker can effectively enhance various base sequential models and also outperform a comprehensive set of competitive baselines.
Abstract: Side information fusion for sequential recommendation (SR) aims to effectively leverage various side information to enhance the performance of next-item prediction. Most state-of-the-art methods build on self-attention networks and focus on exploring various solutions to integrate the item embedding and side information embeddings before the attention layer. However, our analysis shows that the early integration of various types of embeddings limits the expressiveness of attention matrices due to a rank bottleneck and constrains the flexibility of gradients. Also, it involves mixed correlations among the different heterogeneous information resources, which brings extra disturbance to attention calculation. Motivated by this, we propose Decoupled Side Information Fusion for Sequential Recommendation (DIF-SR), which moves the side information from the input to the attention layer and decouples the attention calculation of various side information and item representation. We theoretically and empirically show that the proposed solution allows higher-rank attention matrices and flexible gradients to enhance the modeling capacity of side information fusion. Also, auxiliary attribute predictors are proposed to further activate the beneficial interaction between side information and item representation learning. Extensive experiments on four real-world datasets demonstrate that our proposed solution stably outperforms state-of-the-art SR models. Further studies show that our proposed solution can be readily incorporated into current attention-based SR models and significantly boost performance. Our source code is available at https://github.com/AIM-SE/DIF-SR.
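The decoupling described above can be read as computing attention logits separately for the item sequence and for each side-information sequence, fusing them before the softmax so that side information never enters the value path. The single-head sketch below illustrates that fusion; the projection-free form and the additive fusion are our own simplifications, not DIF-SR's released architecture.

```python
import torch
import torch.nn.functional as F

def decoupled_attention(item_emb: torch.Tensor,
                        side_embs: list,
                        d_k: int) -> torch.Tensor:
    """Single-head sketch: attention logits are computed per information source
    and summed; values come from item embeddings only.

    item_emb: (B, L, d); each element of side_embs: (B, L, d_s)
    """
    def logits(x: torch.Tensor) -> torch.Tensor:
        # learned query/key projections omitted for brevity
        return x @ x.transpose(-2, -1) / d_k ** 0.5

    attn_logits = logits(item_emb) + sum(logits(s) for s in side_embs)
    attn = F.softmax(attn_logits, dim=-1)
    return attn @ item_emb
```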
Abstract: For sequential recommenders, the coarse-grained yet sparse sequential signals mined from massive user-item interactions have become the bottleneck to further improving recommendation performance. To alleviate the sparseness problem, exploiting auxiliary semantic features (e.g., textual descriptions, visual images and knowledge graphs) to enrich contextual information has become a mainstream methodology. Though effective, we argue that these different heterogeneous features certainly include much noise which may overwhelm the valuable sequential signals, and therefore can easily lead to negative collaboration (i.e., 1 + 1 < 2). How to design a flexible strategy to select proper auxiliary information and alleviate the negative collaboration towards a better recommendation is still an interesting and open question. Unfortunately, few works have addressed this challenge in sequential recommendation. In this paper, we introduce a Multi-Agent RL-based Information Selection model (named MARIS) to explore an effective collaboration between different kinds of auxiliary information and sequential signals in an automatic way. Specifically, MARIS formalizes the auxiliary feature selection as a cooperative multi-agent Markov decision process. For each auxiliary feature type, MARIS resorts to using an agent to determine whether a specific kind of auxiliary feature should be imported to achieve a positive collaboration. In between, a QMIX network is utilized to coordinate their joint selection actions and produce an episode corresponding to an effective combination of different auxiliary features for the whole historical sequence. Considering the lack of supervised selection signals, we further devise a novel reward-guided sampling strategy that combines exploitation and exploration for episode sampling. By preserving episodes in a replay buffer, MARIS learns the action-value function and the reward alternately for optimization. Extensive experiments on four real-world datasets demonstrate that our model obtains significant performance improvements over state-of-the-art recommendation models.
Abstract: Sequential recommendation aims at identifying the next item that is preferred by a user based on their behavioral history. Compared to conventional sequential models that leverage attention mechanisms and RNNs, recent efforts mainly follow two directions for improvement: multi-interest learning and graph convolutional aggregation. Specifically, multi-interest methods, such as ComiRec and MIMN, focus on extracting different interests for a user by performing historical item clustering, while graph convolution methods, including TGSRec and SURGE, elect to refine user preferences based on multi-level correlations between historical items. Unfortunately, neither of them realizes that these two types of solutions can mutually complement each other, by aggregating multi-level user preferences to achieve more precise multi-interest extraction for a better recommendation. To this end, in this paper, we propose a unified multi-grained neural model (named MGNM) via a combination of multi-interest learning and graph convolutional aggregation. Concretely, MGNM first learns the graph structure and information aggregation paths of the historical items for a user. It then performs graph convolution to derive item representations in an iterative fashion, in which the complex preferences at different levels can be well captured. Afterwards, a novel sequential capsule network is proposed to inject sequential patterns into the multi-interest extraction process, leading to more precise interest learning in a multi-grained manner. Experiments on three real-world datasets from different scenarios demonstrate the superiority of MGNM against several state-of-the-art baselines. The performance gain over the best baseline is up to 27.10% and 25.17% in terms of NDCG@5 and HIT@5 respectively, which is one of the largest gains in the recent development of sequential recommendation. Further analysis also demonstrates that MGNM is robust and effective at understanding user preferences at multi-grained levels.
Abstract: In most real-world recommender systems, users interact with items in a sequential and multi-behavioral manner. Exploring the fine-grained relationship of items behind the users' multi-behavior interactions is critical in improving the performance of recommender systems. Despite the great successes, existing methods seem to have limitations on modelling heterogeneous item-level multi-behavior dependencies, capturing diverse multi-behavior sequential dynamics, or alleviating data sparsity problems. In this paper, we show it is possible to derive a framework to address all the above three limitations. The proposed framework MB-STR, a Multi-Behavior Sequential Transformer Recommender, is equipped with the multi-behavior transformer layer (MB-Trans), the multi-behavior sequential pattern generator (MB-SPG) and the behavior-aware prediction module (BA-Pred). Compared with a typical transformer, we design MB-Trans to capture multi-behavior heterogeneous dependencies as well as behavior-specific semantics, propose MB-SPG to encode the diverse sequential patterns among multiple behaviors, and incorporate BA-Pred to better leverage multi-behavior supervision. Comprehensive experiments on three real-world datasets show the effectiveness of MB-STR by significantly boosting the recommendation performance compared with various competitive baselines. Further ablation studies demonstrate the superiority of different modules of MB-STR.
Abstract: Sequential recommendation is a popular task in academic research and close to real-world application scenarios, where the goal is to predict the next action(s) of the user based on his/her previous sequence of actions. In the training process of recommender systems, the loss function plays an essential role in guiding the optimization of recommendation models to generate accurate suggestions for users. However, most existing sequential recommendation techniques focus on designing algorithms or neural network architectures, and few efforts have been made to tailor loss functions that fit naturally into the practical application scenario of sequential recommender systems. Ranking-based losses, such as cross-entropy and Bayesian Personalized Ranking (BPR), are widely used in the sequential recommendation area. We argue that such objective functions suffer from two inherent drawbacks: i) the dependencies among elements of a sequence are overlooked in these loss formulations; ii) instead of balancing accuracy (quality) and diversity, only generating accurate results has been overemphasized. We therefore propose two new loss functions based on the Determinantal Point Process (DPP) likelihood, which can be adaptively applied to estimate the subsequent item or items. The DPP-distributed item set captures natural dependencies among temporal actions, and a quality vs. diversity decomposition of the DPP kernel pushes us to go beyond accuracy-oriented loss functions. Experimental results using the proposed loss functions on three real-world datasets show marked improvements over state-of-the-art sequential recommendation methods in both quality and diversity metrics.
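To make the quality-versus-diversity decomposition concrete, here is a hedged PyTorch sketch of an L-ensemble DPP log-likelihood with L = diag(q) S diag(q), where q plays the role of per-item quality (e.g., predicted relevance) and S is a similarity kernel over item embeddings; the negated likelihood could then serve as a training loss. The kernel construction and the jitter term are illustrative assumptions, not the paper's exact loss functions.

```python
import torch


def dpp_log_likelihood(item_emb, quality, subset_idx, jitter=1e-6):
    """Log-likelihood of an observed item subset under an L-ensemble DPP.

    L = diag(q) @ S @ diag(q), with S a similarity kernel over item embeddings
    and q a positive per-item quality score (e.g., sigmoid of predicted relevance).
    Maximizing this likelihood for observed next-item sets trades off quality
    against diversity.
    """
    feats = torch.nn.functional.normalize(item_emb, dim=-1)
    S = feats @ feats.T                                    # diversity kernel
    L = quality.unsqueeze(1) * S * quality.unsqueeze(0)    # quality-scaled kernel
    L_sub = L[subset_idx][:, subset_idx]
    # log P(subset) = log det(L_subset) - log det(L + I); jitter keeps L_subset PD
    return (torch.logdet(L_sub + jitter * torch.eye(L_sub.size(0)))
            - torch.logdet(L + torch.eye(L.size(0))))


# usage sketch: loss = -dpp_log_likelihood(candidate_embs, torch.sigmoid(scores), observed_idx)
```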
Abstract: As a step beyond traditional personalized recommendation, group recommendation is the task of suggesting items that can satisfy a group of users. In group recommendation, the core is to design preference aggregation functions to obtain a quality summary of all group members' preferences. Such user and group preferences are commonly represented as points in the vector space (i.e., embeddings), where multiple user embeddings are compressed into one to facilitate ranking for group-item pairs. However, the resulting group representations, as points, lack adequate flexibility and capacity to account for the multi-faceted user preferences. Also, the point embedding-based preference aggregation is a less faithful reflection of a group's decision-making process, where all users have to agree on a certain value in each embedding dimension instead of a negotiable interval. In this paper, we propose a novel representation of groups via the notion of hypercubes, which are subspaces containing innumerable points in the vector space. Specifically, we design the hypercube recommender (CubeRec) to adaptively learn group hypercubes from user embeddings with minimal information loss during preference aggregation, and to leverage a revamped distance metric to measure the affinity between group hypercubes and item points. Moreover, to counteract the long-standing issue of data sparsity in group recommendation, we make full use of the geometric expressiveness of hypercubes and innovatively incorporate self-supervision by intersecting two groups. Experiments on four real-world datasets have validated the superiority of CubeRec over state-of-the-art baselines.
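The hypercube idea can be sketched with a (center, offset) box per group and a distance that penalizes items lying outside the box more heavily than items inside it; the specific aggregation and the down-weighting factor alpha below are assumptions for illustration, not CubeRec's learned modules.

```python
import torch


def hypercube_item_distance(center, offset, item, alpha=0.5):
    """Distance between a group hypercube and an item point.

    center, offset: (d,) box centre and non-negative half-widths
    item:           (d,) item embedding
    The outside-the-box distance dominates; the inside-the-box distance is
    down-weighted by alpha, so any point inside the cube counts as a near match.
    """
    delta = (item - center).abs()
    dist_out = torch.clamp(delta - offset, min=0).norm(p=1)  # how far outside the box
    dist_in = torch.minimum(delta, offset).norm(p=1)         # distance to centre, capped at the faces
    return dist_out + alpha * dist_in


def group_hypercube(member_embs):
    """A simple aggregation: centre = member mean, offset = elementwise spread."""
    center = member_embs.mean(dim=0)
    offset = (member_embs - center).abs().max(dim=0).values
    return center, offset
```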
Abstract: Session-based recommendation (SBR) aims to predict a user's next clicked item based on an anonymous yet short interaction sequence. Previous SBR models, which rely only on the limited short-term transition information without utilizing extra valuable knowledge, have suffered a lot from the problem of data sparsity. This paper proposes a novel mirror graph enhanced neural model for session-based recommendation (MGS), to exploit item attribute information over item embeddings for more accurate preference estimation. Specifically, MGS utilizes two kinds of graphs to learn item representations. One is a session graph generated from the user interaction sequence, describing users' preferences based on transition patterns. Another is a mirror graph built by an attribute-aware module that selects the most attribute-representative information for each session item by integrating items' attribute information. We apply an iterative dual refinement mechanism to propagate information between the session and mirror graphs. To further guide the training process of the attribute-aware module, we also introduce a contrastive learning strategy that compares two mirror graphs generated for the same session by randomly sampling the attribute-same neighbors. Experiments on three real-world datasets exhibit that the performance of MGS surpasses many state-of-the-art models.
Abstract: Session-based recommendation aims to predict items that an anonymous user would like to purchase based on her short behavior sequence. The current approaches towards session-based recommendation only focus on modeling users' interest preferences, while they all ignore a key attribute of an item, i.e., the price. Many marketing studies have shown that the price factor significantly influences users' behaviors and the purchase decisions of users are determined by both price and interest preferences simultaneously. However, it is nontrivial to incorporate price preferences for session-based recommendation. Firstly, it is hard to handle heterogeneous information from various features of items to capture users' price preferences. Secondly, it is difficult to model the complex relations between price and interest preferences in determining user choices. To address the above challenges, we propose a novel method Co-guided Heterogeneous Hypergraph Network (CoHHN) for session-based recommendation. Towards the first challenge, we devise a heterogeneous hypergraph to represent heterogeneous information and rich relations among them. A dual-channel aggregating mechanism is then designed to aggregate various information in the heterogeneous hypergraph. After that, we extract users' price preferences and interest preferences via attention layers. As to the second challenge, a co-guided learning scheme is designed to model the relations between price and interest preferences and enhance the learning of each other. Finally, we predict user actions based on item features and users' price and interest preferences. Extensive experiments on three real-world datasets demonstrate the effectiveness of the proposed CoHHN. Further analysis reveals the significance of price for session-based recommendation.
Abstract: Session-based recommendation aims to predict the next click action (e.g., item) of anonymous users based on a fixed number of previous actions. Recently, Graph Neural Networks (GNNs) have shown superior performance in various applications. Inspired by the success of GNNs, tremendous endeavors have been devoted to introducing GNNs into session-based recommendation, achieving significant results. Nevertheless, due to the highly diverse types of potential information in sessions, existing GNN-based methods perform differently on different session datasets, leading to the need for efficient design of neural networks adapted to various session recommendation scenarios. To address this problem, we propose Automated neural architecture search for Graph-based Session Recommendation, namely AutoGSR, a framework that provides a practical and general solution to automatically find the optimal GNN-based session recommendation model. In AutoGSR, we propose two novel GNN operations to build an expressive and compact search space. Building upon the search space, we employ a differentiable search algorithm to search for the optimal graph neural architecture. Furthermore, to consider all types of session information together, we propose to learn the item meta knowledge, which acts as prior knowledge for guiding the optimization of final session representations. Comprehensive experiments on three real-world datasets demonstrate that AutoGSR is able to find effective neural architectures and achieve state-of-the-art results. To the best of our knowledge, we are the first to study neural architecture search for session-based recommendation.
Abstract: As an emerging paradigm, session-based recommendation is aimed at recommending the next item based on a set of anonymous sessions. Effectively representing a session that is normally a short interaction sequence renders a major technical challenge. In view of the limitations of pioneering studies that explore collaborative information from other sessions, in this paper we propose a new direction to enhance session representations by learning multi-faceted session-independent global item relations. In particular, we identify three types of advantageous global item relations, including negative relations that have not been studied before, and propose different graph construction methods to capture such relations. We then devise a novel multi-faceted global item relation (MGIR) model to encode different relations using different aggregation layers and generate enhanced session representations by fusing positive and negative relations. Our solution is flexible to accommodate new item relations and can easily integrate existing session representation learning methods to generate better representations from global relation enhanced session information. Extensive experiments on three benchmark datasets demonstrate the superiority of our model over a large number of state-of-the-art methods. Specifically, we show that learning negative relations is critical for session-based recommendation.
Abstract: Social media enable users to share their feelings and emotional struggles. They also offer an opportunity to provide community support to suicidal users. Recent studies on suicide risk assessment have explored the user's historic timeline and information from their social network to analyze their emotional state. However, such methods often require a large amount of user-centric data. A less intrusive alternative is to only use conversation trees arising from online community responses. Modeling such online conversations between the community and a person in distress is an important context for understanding that person's mental state. However, it is not trivial to model the vast number of conversation trees on social media, since each comment has a diverse influence on a user in distress. Typically, a handful of comments/posts receive a significantly high number of replies, which results in scale-free dynamics in the conversation tree. Moreover, psychological studies suggest that it is important to capture the fine-grained temporal irregularities in the release of vast volumes of comments, since suicidal users react quickly to online community support. Motivated by these observations and psychological studies, we propose HCN, a Hyperbolic Conversation Network, which is a less user-intrusive method for suicide ideation detection. HCN leverages the hyperbolic space to represent the scale-free dynamics of online conversations. Through extensive quantitative, qualitative, and ablative experiments on real-world Twitter data, we find that HCN outperforms state-of-the-art methods, while using 98% less user-specific data, and while maintaining a 74% lower carbon footprint and a 94% smaller model size. We also find that the comments within the first half hour are most important for identifying at-risk users.
Abstract: This paper develops a novel unsupervised algorithm for belief representation learning in polarized networks that (i) uncovers the latent dimensions of the underlying belief space and (ii) jointly embeds users and content items (that they interact with) into that space in a manner that facilitates a number of downstream tasks, such as stance detection, stance prediction, and ideology mapping. Inspired by total correlation in information theory, we propose the Information-Theoretic Variational Graph Auto-Encoder (InfoVGAE) that learns to project both users and content items (e.g., posts that represent user views) into an appropriate disentangled latent space. To better disentangle latent variables in that space, we develop a total correlation regularization module and a Proportional-Integral (PI) control module, and adopt a rectified Gaussian distribution to ensure orthogonality. The latent representation of users and content can then be used to quantify their ideological leaning and detect/predict their stances on issues. We evaluate the performance of the proposed InfoVGAE on three real-world datasets, of which two are collected from Twitter and one from U.S. Congress voting records. The evaluation results show that our model outperforms state-of-the-art unsupervised models by reducing user clustering errors by 10.5% and achieving 12.1% higher F1 scores for stance separation of content items. In addition, InfoVGAE produces results comparable to those of supervised models. We also discuss its performance on stance prediction and user ranking within ideological groups.
Abstract: Detecting cyberbullying from memes is highly challenging because of the presence of implicit affective content, which is often sarcastic, and their multi-modality (image + text). The current work is, to the best of our knowledge, the first attempt at investigating the role of sentiment, emotion and sarcasm in identifying cyberbullying from multi-modal memes in a code-mixed language setting. As a contribution, we have created a benchmark multi-modal meme dataset called MultiBully annotated with bully, sentiment, emotion and sarcasm labels collected from open-source Twitter and Reddit platforms. Moreover, the severity of the cyberbullying posts is also investigated by adding a harmfulness score to each of the memes. The created dataset consists of two modalities, text and image. Most of the texts in our dataset are in code-mixed form, which captures the seamless transitions between languages for multilingual users. Two different multimodal multitask frameworks (BERT+ResNET-Feedback and CLIP-CentralNet) have been proposed for cyberbullying detection (CD), the three auxiliary tasks being sentiment analysis (SA), emotion recognition (ER) and sarcasm detection (SAR). Experimental results indicate that compared to uni-modal and single-task variants, the proposed frameworks improve the performance of the main task, i.e., CD, by 3.18% and 3.10% in terms of accuracy and F1 score, respectively.
Abstract: Increased social media use has contributed to the greater prevalence of abusive, rude, and offensive textual comments. Machine learning models have been developed to detect toxic comments online, yet these models tend to show biases against users with marginalized or minority identities (e.g., females and African Americans). Established research in debiasing toxicity classifiers often (1) takes a static or batch approach, assuming that all information is available and then making a one-time decision; and (2) uses a generic strategy to mitigate different biases (e.g., gender and racial biases) that assumes the biases are independent of one another. However, in real scenarios, the input typically arrives as a sequence of comments/words over time instead of all at once. Thus, decisions based on partial information must be made while additional input is arriving. Moreover, social bias is complex by nature. Each type of bias is defined within its unique context, which, consistent with intersectionality theory within the social sciences, might be correlated with the contexts of other forms of bias. In this work, we consider debiasing toxicity detection as a sequential decision-making process where different biases can be interdependent. In particular, we study debiasing toxicity detection with two aims: (1) to examine whether different biases tend to correlate with each other; and (2) to investigate how to jointly mitigate these correlated biases in an interactive manner to minimize the total amount of bias. At the core of our approach is a framework built upon theories of sequential Markov Decision Processes that seeks to maximize the prediction accuracy and minimize the bias measures tailored to individual biases. Evaluations on two benchmark datasets empirically validate the hypothesis that biases tend to be correlated and corroborate the effectiveness of the proposed sequential debiasing strategy.
Abstract: The diffusion of rumors on social media generally follows a propagation tree structure, which provides valuable clues on how an original message is transmitted and responded to by users over time. Recent studies reveal that rumor verification and stance detection are two relevant tasks that can jointly enhance each other despite their differences. For example, rumors can be debunked by cross-checking the stances conveyed by their relevant posts, and stances are also conditioned on the nature of the rumor. However, stance detection typically requires a large training set of labeled stances at post level, which are rare and costly to annotate. Inspired by the Multiple Instance Learning (MIL) scheme, we propose a novel weakly supervised joint learning framework for rumor verification and stance detection which only requires bag-level class labels concerning the rumor's veracity. Specifically, based on the propagation trees of source posts, we convert the two multi-class problems into multiple MIL-based binary classification problems where each binary model is focused on differentiating a target class (of rumor or stance) from the remaining classes. Then, we propose a hierarchical attention mechanism to aggregate the binary predictions, including (1) a bottom-up/top-down tree attention layer to aggregate binary stances into binary veracity; and (2) a discriminative attention layer to aggregate the binary classes into finer-grained classes. Extensive experiments conducted on three Twitter-based datasets demonstrate promising performance of our model on both claim-level rumor detection and post-level stance classification compared with state-of-the-art methods.
Abstract: Many recent Natural Language Processing (NLP) task formulations, such as question answering and fact verification, are implemented as a two-stage cascading architecture. In the first stage an IR system retrieves "relevant" documents containing the knowledge, and in the second stage an NLP system performs reasoning to solve the task. Optimizing the IR system for retrieving relevant documents ensures that the NLP system has sufficient information to operate over. These recent NLP task formulations raise interesting and exciting challenges for IR, where the end-user of an IR system is not a human with an information need, but another system exploiting the documents retrieved by the IR system to perform reasoning and address the user information need. Among these challenges, as we will show, is that noise from the IR system, such as retrieving spurious or irrelevant documents, can negatively impact the accuracy of the downstream reasoning module. Hence, there is the need to balance maximizing relevance while minimizing noise in the IR system. This paper presents experimental results on two NLP tasks implemented as a two-stage cascading architecture. We show how spurious or irrelevant retrieved results from the first stage can induce errors in the second stage. We use these results to ground our discussion of the research challenges that the IR community should address in the context of these knowledge-intensive NLP tasks.
Abstract: Cross-lingual information retrieval (CLIR) aims to provide access to information across languages. Recent pre-trained multilingual language models have brought large improvements to natural language tasks, including cross-lingual ad-hoc retrieval. However, pseudo-relevance feedback (PRF), a family of techniques for improving ranking using the contents of top initially retrieved items, has not been explored with neural CLIR retrieval models. Two of the challenges are incorporating feedback from long documents and cross-language knowledge transfer. To address these challenges, we propose a novel neural CLIR architecture, NCLPRF, capable of incorporating PRF feedback from multiple, potentially long documents, which enables improvements to the query representation in the shared semantic space between query and document languages. The additional information that the feedback documents provide in a target language can enrich the query representation, bringing it closer to relevant documents in the embedding space. The proposed model exhibits significant improvements over traditional and SOTA neural CLIR baselines across three CLIR test collections in Chinese, Russian, and Persian.
Abstract: Session-based Recommendation (SBR) refers to the task of predicting the next item based on short-term user behaviors within an anonymous session. However, session embedding learned by a non-linear encoder is usually not in the same representation space as item embeddings, resulting in the inconsistent prediction issue while recommending items. To address this issue, we propose a simple and effective framework named CORE, which can unify the representation space for both the encoding and decoding processes. Firstly, we design a representation-consistent encoder that takes the linear combination of input item embeddings as session embedding, guaranteeing that sessions and items are in the same representation space. Besides, we propose a robust distance measuring method to prevent overfitting of embeddings in the consistent representation space. Extensive experiments conducted on five public real-world datasets demonstrate the effectiveness and efficiency of the proposed method. The code is available at: https://github.com/RUCAIBox/CORE.
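A minimal sketch of the representation-consistent encoding described above: the session vector is a linear combination of its item embeddings (so it stays in the item embedding space) and is scored against candidates with a temperature-scaled cosine similarity standing in for the paper's robust distance; the uniform weights and the temperature value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F


def core_style_scores(session_item_embs, all_item_embs, weights=None, tau=0.07):
    """Score candidates for a session whose embedding lives in the item space.

    session_item_embs: (L, d) embeddings of the items in the session
    all_item_embs:     (N, d) embeddings of the candidate catalogue
    weights:           (L,) combination weights; uniform here, learned in CORE
    """
    if weights is None:
        weights = torch.full((session_item_embs.size(0),),
                             1.0 / session_item_embs.size(0))
    session_emb = weights @ session_item_embs          # stays in the item space
    session_emb = F.normalize(session_emb, dim=-1)
    items = F.normalize(all_item_embs, dim=-1)
    return (items @ session_emb) / tau                 # logits over the catalogue
```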
Abstract: Learning individual-level treatment effect is a fundamental problem in causal inference and has received increasing attention in many areas, especially in the user growth area which concerns many internet companies. Recently, disentangled representation learning methods that decompose covariates into three latent factors, including instrumental, confounding and adjustment factors, have witnessed great success in treatment effect estimation. However, it remains an open problem how to learn the underlying disentangled factors precisely. Specifically, previous methods fail to obtain independent disentangled factors, which is a necessary condition for identifying treatment effect. In this paper, we propose Disentangled Representations for Counterfactual Regression via Mutual Information Minimization (MIM-DRCFR), which uses a multi-task learning framework to share information when learning the latent factors and incorporates MI minimization learning criteria to ensure the independence of these factors. Extensive experiments including public benchmarks and real-world industrial user growth datasets demonstrate that our method performs much better than state-of-the-art methods.
Abstract: Recently micro-videos have become more popular in social media platforms such as TikTok and Instagram. Engagements in these platforms are facilitated by multi-modal recommendation systems. Indeed, such multimedia content can involve diverse modalities, often represented as visual, acoustic, and textual features to the recommender model. Existing works in micro-video recommendation tend to unify the multi-modal channels, thereby treating each modality with equal importance. However, we argue that these approaches are not sufficient to encode item representations with multiple modalities, since the used methods cannot fully disentangle the users' tastes on different modalities. To tackle this problem, we propose a novel learning method named Multi-Modal Graph Contrastive Learning (MMGCL), which aims to explicitly enhance multi-modal representation learning in a self-supervised learning manner. In particular, we devise two augmentation techniques to generate the multiple views of a user/item: modality edge dropout and modality masking. Furthermore, we introduce a novel negative sampling technique that allows the model to learn the correlation between modalities and ensures the effective contribution of each modality. Extensive experiments conducted on two micro-video datasets demonstrate the superiority of our proposed MMGCL method over existing state-of-the-art approaches in terms of both recommendation performance and training convergence speed.
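The two augmentations named above can be sketched as simple graph and feature transforms; the dropout rate, the zero-masking choice, and the dictionary-based data layout are assumptions for illustration only.

```python
import torch


def modality_edge_dropout(edge_index_by_mod, p=0.2):
    """Randomly drop a fraction of edges within each modality-specific graph."""
    out = {}
    for mod, edge_index in edge_index_by_mod.items():   # edge_index: (2, E) tensor
        keep = torch.rand(edge_index.size(1)) > p
        out[mod] = edge_index[:, keep]
    return out


def modality_masking(feats_by_mod, mask_mod):
    """Zero out one modality's features to build a second contrastive view."""
    return {mod: torch.zeros_like(x) if mod == mask_mod else x
            for mod, x in feats_by_mod.items()}
```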
Abstract: Sequential recommendation methods are very important in modern recommender systems because they can well capture users' dynamic interests from their interaction history, and make accurate recommendations for users, thereby helping enterprises succeed in business. However, despite the great success of existing sequential recommendation methods, they focus too much on item-level modeling of users' click history and lack holistic information about the user's entire click history (such as click order and click time). To tackle this problem, inspired by recent advances in pre-training techniques in the field of natural language processing, we build a new pre-training task based on the original BERT pre-training framework and incorporate temporal information. Specifically, we propose a new model called the REarrange Sequence prE-training and Time embedding model via BERT for sequential Recommendation (RESETBERT4Rec). It further captures the information of the user's whole click history by adding a rearrange sequence prediction task to the original BERT pre-training framework, while integrating different views of time information. Comprehensive experiments on two public datasets as well as one e-commerce dataset demonstrate that RESETBERT4Rec achieves state-of-the-art performance over existing baselines.
Abstract: Sequential recommender systems (SRSs) have become a research hotspot recently due to their powerful ability in capturing users' dynamic preferences. The key idea behind SRSs is to model the sequential dependencies over the user-item interactions. However, we argue that users' preferences are not only determined by the items they view or purchase but also affected by the item-providers with which users have interacted. For instance, in a short-video scenario, a user may click on a video because he/she is attracted to either the video content or simply the video-provider, as the vlogger is his/her idol. Motivated by the above observations, in this paper, we propose IPSRec, a novel Item-Provider co-learning framework for Sequential Recommendation. Specifically, we propose two representation learning methods (single-stream and cross-stream) to learn comprehensive item and user representations based on the user's historical item sequence and provider sequence. Then, contrastive learning is employed to further enhance the user embeddings in a self-supervised manner, which treats the representations of a specific user learned from the item side and the item-provider side as the positive pair and treats the representations of different users in the batch as the negative samples. Extensive experiments on three real-world SRS datasets demonstrate that IPSRec achieves substantially better results than the strong competitors. For reproducibility, our code and data are available at https://github.com/siat-nlp/IPSRec.
Abstract: Recommender Systems (RS), as an efficient tool for discovering items of interest to users from a very large corpus, have attracted more and more attention from academia and industry. As the initial stage of RS, large-scale matching is fundamental yet challenging. A typical recipe is to learn user and item representations with a two-tower architecture and then calculate the similarity score between the two representation vectors, which, however, still struggles to properly deal with negative samples. In this paper, we find that the common practice of randomly sampling negative samples from the entire space and treating them equally is not an optimal choice, since the negative samples from different sub-spaces at different stages have different importance to a matching model. To address this issue, we propose a novel method named Unbiased Model-Agnostic Matching Approach (UMA2). It consists of two basic modules: 1) a General Matching Model (GMM), which is model-agnostic and can be implemented as any embedding-based two-tower model; and 2) a Negative Samples Debias Network (NSDN), which discriminates negative samples by borrowing the idea of Inverse Propensity Weighting (IPW) and re-weighs the loss in GMM. UMA2 seamlessly integrates these two modules in an end-to-end multi-task learning framework. Extensive experiments on both a real-world offline dataset and an online A/B test demonstrate its superiority over state-of-the-art methods.
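A hedged sketch of the IPW-style re-weighting idea in a two-tower matching loss: each sampled negative is weighted by the inverse of an estimated propensity (here supplied as a hypothetical `neg_propensity` input in place of the paper's debias network). The binary-cross-entropy form and the clamping constant are illustrative assumptions, not UMA2's exact loss.

```python
import torch
import torch.nn.functional as F


def ipw_matching_loss(user_emb, pos_item_emb, neg_item_embs, neg_propensity):
    """Two-tower matching loss with IPW-style re-weighting of negatives.

    user_emb:       (batch, d) user-tower outputs
    pos_item_emb:   (batch, d) matching item-tower outputs
    neg_item_embs:  (num_neg, d) sampled negatives shared across the batch
    neg_propensity: (num_neg,) estimated probability of drawing each negative;
                    its inverse re-weights that negative's contribution.
    """
    pos_logit = (user_emb * pos_item_emb).sum(-1)            # (batch,)
    neg_logits = user_emb @ neg_item_embs.T                  # (batch, num_neg)
    w = 1.0 / neg_propensity.clamp(min=1e-3)                 # inverse propensity weights
    pos_loss = F.softplus(-pos_logit)                        # -log sigmoid(pos_logit)
    neg_loss = (w * F.softplus(neg_logits)).sum(-1)          # weighted -log(1 - sigmoid(neg))
    return (pos_loss + neg_loss).mean()
```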
Abstract: Existing methods usually identify causal relations between events at the mention level, which takes each event mention pair as a separate input. As a result, they either suffer from conflicts among causal relations predicted separately or require a set of additional constraints to resolve such conflicts. We propose to study this task in a more realistic setting, where event-level causality identification can be made. The advantage is twofold: 1) by modeling different mentions of an event as a single unit, there are no more conflicts among predicted results and no extra constraints are needed; 2) with the use of diverse knowledge sources (e.g., co-occurrence and coreference relations), a rich graph-based event structure can be induced from the document to support event-level causal inference. A graph convolutional network is used to encode such structural information, aiming to capture the local and non-local dependencies among nodes. Results show that our model achieves the best performance under both mention- and event-level settings, outperforming a number of strong baselines by at least 2.8% in F1 score.
Abstract: A data lake is a repository for massive raw and heterogeneous data, which includes multiple data models with different data schemas and query interfaces. Keyword search can extract valuable information for users without knowledge of the underlying schemas and query languages. However, conventional keyword searches are restricted to a certain data model and cannot easily adapt to a data lake. In this paper, we study a novel keyword search method for data lakes. To achieve high accuracy and efficiency, we introduce canonical graphs and then integrate semantically related vertices based on vertex representations. A matching-entity-based keyword search algorithm is presented to find answers across multiple data sources. Finally, an extensive experimental study shows the effectiveness and efficiency of our solution.
Abstract: Engaging all content providers, including newcomers or minority demographic groups, is crucial for online platforms to keep growing and working. Hence, while building recommendation services, the interests of those providers should be valued. In this paper, we consider providers as grouped based on a common characteristic in settings in which certain provider groups have low representation of items in the catalog and, thus, in the user interactions. Then, we envision a scenario wherein platform owners seek to control the degree of exposure to such groups in the recommendation process. To support this scenario, we rely on disparate exposure measures that characterize the gap between the share of recommendations given to groups and the target level of exposure pursued by the platform owners. We then propose a re-ranking procedure that ensures desired levels of exposure are met. Experiments show that, while supporting certain groups of providers by granting them the target exposure, beyond-accuracy objectives experience significant gains with negligible impact on recommendation utility.
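One way to picture such a re-ranking procedure is a greedy pass that trades relevance against the gap between a group's realized exposure and its target share; the penalty form and weight below are assumptions for illustration, not the paper's disparate exposure measures.

```python
def rerank_with_target_exposure(candidates, target_share, k, lam=0.5):
    """Greedy re-ranking toward a target exposure share per provider group.

    candidates:   list of (item_id, group, relevance) tuples
    target_share: dict group -> desired share of the top-k slots
    At each slot, an item's score is its relevance minus a penalty proportional
    to how far its group's exposure would rise above the target share.
    """
    chosen, counts = [], {g: 0 for g in target_share}
    pool = list(candidates)
    for pos in range(min(k, len(pool))):
        def score(cand):
            _, group, rel = cand
            new_share = (counts[group] + 1) / (pos + 1)
            return rel - lam * max(0.0, new_share - target_share[group])
        best = max(pool, key=score)
        pool.remove(best)
        chosen.append(best[0])
        counts[best[1]] += 1
    return chosen
```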
Abstract: Brain-inspired hyperdimensional computing (HDC) has been introduced as an alternative computing paradigm to achieve efficient and robust learning. HDC simulates cognitive tasks by mapping all data points to patterns of neural activity in a high-dimensional space, and has demonstrated promising performance in a wide range of applications such as robotics, biomedical signal processing, and genome sequencing. Language tasks, generally solved using machine learning methods, are widely deployed on low-power embedded devices. However, existing HDC solutions suffer from major challenges that impede deployment on low-power embedded devices: the storage and computation overhead of HDC models grows dramatically with (i) the number of dimensions and (ii) the complex similarity metric used during inference. In this paper, we propose a novel ensemble framework for language tasks, termed L3E-HD, which enables efficient HDC on low-power edge devices. L3E-HD accelerates inference by mapping data points to a high-dimensional binary space to simplify similarity search, the most costly and frequent operation in HDC. Through marrying HDC with the ensemble technique, L3E-HD also addresses the severe accuracy degradation induced by the compression of the dimension and precision of the model. Our experiments show that the ensemble technique is naturally a perfect fit for boosting HDC. We find that our L3E-HD, which is faster, more efficient, and more accurate than conventional machine learning methods, can even surpass the accuracy of the full-precision model at a smaller model size. Code is released at: https://github.com/MXHX7199/SIGIR22-EnsembleHDC.
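The core efficiency argument, replacing a complex similarity metric on real-valued hypervectors with a bit-wise comparison in a binary space, can be sketched in a few lines of NumPy; the dimensionality, the sign binarization, and the toy language labels are assumptions for illustration only.

```python
import numpy as np


def binarize(hv):
    """Map a real-valued hypervector to {+1, -1}."""
    return np.where(hv >= 0, 1, -1).astype(np.int8)


def hamming_similarity(a, b):
    """Similarity of two binary hypervectors: fraction of matching positions.

    This bit-wise comparison replaces cosine similarity on real vectors, the
    kind of simplification that makes inference cheap on edge hardware.
    """
    return np.count_nonzero(a == b) / a.size


# toy usage: class prototypes and a query, each a 10,000-d binary hypervector
rng = np.random.default_rng(0)
prototypes = {c: binarize(rng.standard_normal(10_000)) for c in ("en", "fr", "de")}
query = binarize(rng.standard_normal(10_000))
prediction = max(prototypes, key=lambda c: hamming_similarity(prototypes[c], query))
```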
Abstract: With the success of deep learning, click-through rate (CTR) predictions are transitioning from shallow approaches to deep architectures. Current deep CTR prediction usually follows the Embedding & MLP paradigm, where the model embeds categorical features into latent semantic space. This paper introduces a novel embedding technique called neural statistics that instead learns explicit semantics of categorical features by incorporating feature engineering as an innate prior into the deep architecture in an end-to-end manner. Besides, since the statistical information changes over time, we study how to adapt to the distribution shift in the MLP module efficiently. Offline experiments on two public datasets validate the effectiveness of neural statistics against state-of-the-art models. We also apply it to a large-scale recommender system via online A/B tests, where the user's satisfaction is significantly improved.
Abstract: Graph Neural Networks (GNNs) provide a class of powerful architectures that are effective for graph-based collaborative filtering. Nevertheless, GNNs are known to be vulnerable to adversarial perturbations. Adversarial training is a simple yet effective way to improve the robustness of neural models. For example, many prior studies inject adversarial perturbations into either node features or hidden layers of GNNs. However, perturbing graph structures has been far less studied in recommendation. To bridge this gap, we propose AdvGraph to model adversarial graph perturbations during the training of GNNs. Our AdvGraph is mainly based on min-max robust optimization, where a universal graph perturbation is obtained through an inner maximization while the outer optimization aims to compute the model parameters of GNNs. However, directly optimizing the inner problem is challenging due to the discrete nature of the graph perturbations. To address this issue, an unbiased gradient estimator is further proposed to compute the gradients of discrete variables. Extensive experiments demonstrate that our AdvGraph is able to enhance the generalization performance of GNN-based recommenders.
Abstract: While Graph Convolutional Networks (GCNs) have been extended to various fields of artificial intelligence with their powerful representation capabilities, recent studies have revealed that their ability to capture the part-whole structure of the graph is limited. Furthermore, though many GCN variants have been proposed and have obtained state-of-the-art results, they face the problem that much early-layer information may be lost during the graph convolution step. To this end, we present a Graph Capsule Network with a Dual Adaptive Mechanism (DA-GCN) to tackle the above challenges. Specifically, the dual adaptive mechanism captures the part-whole structure of the graph through two modules. One is an adaptive node interaction module to explore the potential relationship between interactive nodes. The other is an adaptive attention-based graph dynamic routing that selects appropriate graph capsules, so that only favorable graph capsules are gathered and redundant graph capsules are suppressed, better capturing the whole structure of the graph. Experiments demonstrate that our proposed algorithm achieves the most advanced or competitive results on all datasets.
Abstract: Hierarchical Text Classification (HTC) is a challenging task where a document can be assigned to multiple hierarchically structured categories within a taxonomy. The majority of prior studies consider HTC as a flat multi-label classification problem, which inevitably leads to the "label inconsistency" problem. In this paper, we formulate HTC as a sequence generation task and introduce a sequence-to-tree framework (Seq2Tree) for modeling the hierarchical label structure. Moreover, we design a constrained decoding strategy with a dynamic vocabulary to secure the label consistency of the results. Compared with previous works, the proposed approach achieves significant and consistent improvements on three benchmark datasets.
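The constrained decoding idea can be illustrated by a dynamic vocabulary that, at each step, exposes only the children of the last generated label, so every decoded sequence is a valid root-to-node path in the taxonomy; the toy taxonomy and the `ROOT` convention below are assumptions, and in practice the returned set would be used to mask the decoder's output logits.

```python
def constrained_next_labels(taxonomy, generated_path):
    """Dynamic vocabulary for one decoding step.

    taxonomy: dict mapping a label to its list of child labels ("ROOT" at the top).
    Only children of the most recently generated label may be produced next, so
    label consistency holds by construction.
    """
    last = generated_path[-1] if generated_path else "ROOT"
    return taxonomy.get(last, [])


# toy taxonomy: ROOT -> {science, sports}, science -> {physics, biology}
taxonomy = {"ROOT": ["science", "sports"],
            "science": ["physics", "biology"],
            "sports": ["soccer"]}
assert constrained_next_labels(taxonomy, ["science"]) == ["physics", "biology"]
```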
Abstract: In the era of big data, eXtreme Multi-label Classification (XMC) has already become one of the most essential research tasks for dealing with enormous label spaces in machine learning applications. Instead of assessing every individual label, most XMC methods rely on label trees or filters to derive short ranked label lists as predictions, thereby reducing computational overhead. Specifically, existing studies obtain ranked label lists with a fixed length for prediction and evaluation. However, such predictions are unreasonable since data points have varied numbers of relevant labels. Excessively small or large list lengths in evaluation, such as Precision@5 and Recall@100, can also lead to ignoring other relevant labels or tolerating many irrelevant labels. In this paper, we aim to provide reasonable predictions for extreme multi-label classification with dynamic numbers of predicted labels. In particular, we propose a novel framework, Model-Agnostic List Truncation with Ordinal Regression (MALTOR), to leverage the ranking properties and truncate long ranked label lists for better accuracy. Extensive experiments conducted on six large-scale real-world benchmark datasets demonstrate that MALTOR significantly outperforms statistical baseline methods and conventional ranked list truncation methods in ad-hoc retrieval with both linear and deep XMC models. The results of an ablation study also show the effectiveness of each individual component of our proposed MALTOR.
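As a sketch of list truncation via ordinal regression, the cumulative-link head below compares a single learned score of the ranked list against ordered thresholds and predicts how many labels to keep; the feature design, threshold parameterization, and decision rule are assumptions, not MALTOR's actual components.

```python
import torch
import torch.nn as nn


class OrdinalTruncationHead(nn.Module):
    """Predict where to cut a ranked label list with a cumulative-link model.

    A single score g(x) is compared against max_len - 1 learned thresholds;
    P(cut position > k) = sigmoid(g(x) - theta_k). The predicted length is the
    number of thresholds the score exceeds with probability above one half.
    """

    def __init__(self, feat_dim, max_len):
        super().__init__()
        self.score = nn.Linear(feat_dim, 1)
        self.thresholds = nn.Parameter(torch.arange(max_len - 1, dtype=torch.float))

    def forward(self, list_features):
        g = self.score(list_features)                 # (batch, 1)
        return torch.sigmoid(g - self.thresholds)     # (batch, max_len - 1)

    def predict_length(self, list_features):
        probs = self.forward(list_features)
        return 1 + (probs > 0.5).sum(dim=-1)          # keep at least one label
```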
Abstract: Target-oriented opinion words extraction (TOWE) is a subtask of aspect-based sentiment analysis (ABSA). Given a sentence and an aspect term occurring in the sentence, TOWE extracts the corresponding opinion words for the aspect term. TOWE has two types of instances. In the first type, aspect terms are associated with at least one opinion word, while in the second type, aspect terms do not have corresponding opinion words. However, previous studies trained and evaluated their models with only the first type of instance, resulting in a sample selection bias problem. Specifically, TOWE models were trained with only the first type of instance, while these models would be utilized to make inferences on the entire space with both types of instances. Thus, the generalization performance is hurt. Moreover, the performance of these models on the first type of instance cannot reflect their performance on the entire space. To validate the sample selection bias problem, four popular TOWE datasets containing only aspect terms associated with at least one opinion word are extended to additionally include aspect terms without corresponding opinion words. Experimental results on these datasets show that training TOWE models on the entire space significantly improves model performance and that evaluating TOWE models only on the first type of instance overestimates model performance.
Abstract: Current conversational passage retrieval systems cast conversational search into ad-hoc search by using an intermediate query resolution step that places the user's question in the context of the conversation. While the proposed methods have proven effective, they still assume the availability of large-scale question resolution and conversational search datasets. To waive the dependency on the availability of such data, we adapt a pre-trained token-level dense retriever on ad-hoc search data to perform conversational search with no additional fine-tuning. The proposed method allows the user question to be contextualized within the conversation history, but restricts the matching to the question and the potential answer. Our experiments demonstrate the effectiveness of the proposed approach. We also perform an analysis that provides insights into how contextualization works in the latent space, in essence introducing a bias towards salient terms from the conversation.
Abstract: Graph Convolutional Neural Networks (GNN) based recommender systems are state-of-the-art since they can capture the high order collaborative signals between users and items. However, they suffer from the feature leakage problem since label information determined by edges can be leaked into node embeddings through the GNN aggregation procedure guided by the same set of edges, leading to poor generalization. We propose the accurate removal algorithm to generate the final embedding. For each edge, the embeddings of the two end nodes are evaluated on a graph with that edge removed. We devise an algebraic trick to efficiently compute this procedure without explicitly constructing separate graphs for the LightGCN model. Experiments on four datasets demonstrate that our algorithm can perform better on datasets with sparse interactions, while the training time is significantly reduced.
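For a single LightGCN propagation layer with symmetric normalization, the leave-one-edge-out embedding can indeed be recovered algebraically from the full-graph embedding, which conveys the flavor of the trick; the one-layer restriction and variable names below are simplifying assumptions, and deeper layers require additional bookkeeping beyond this sketch.

```python
import torch


def leave_one_edge_out_user_emb(e_u_full, e_i0, deg_u, deg_i):
    """Single-layer LightGCN user embedding with one edge (u, i) removed.

    e_u_full: user embedding after one propagation layer on the *full* graph,
              e_u = sum_j e_j^(0) / sqrt(d_u * d_j)
    e_i0:     layer-0 embedding of the removed neighbour item i
    deg_u/deg_i: degrees of u and i in the full graph (deg_u must exceed 1)
    Removing (u, i) deletes i's term and changes u's degree from d_u to d_u - 1,
    so the corrected embedding follows algebraically without rebuilding the graph.
    """
    numer = deg_u ** 0.5 * e_u_full - e_i0 / deg_i ** 0.5
    return numer / (deg_u - 1) ** 0.5
```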
Abstract: The exposure sequence is being actively studied for user interest modeling in Click-Through Rate (CTR) prediction. However, the existing methods for exposure sequence modeling bring an extensive computational burden and neglect noise problems, resulting in excessive latency and limited performance in online recommenders. In this paper, we propose to address the high latency and noise problems via Gating-adapted wavelet multiresolution analysis (Gama), which can effectively denoise the extremely long exposure sequence and adaptively capture the implied multi-dimensional user interests with linear computational complexity. This is the first attempt to integrate a non-parametric multiresolution analysis technique into a deep neural network to model user exposure sequences. Extensive experiments on a large-scale benchmark dataset and a real production dataset confirm the effectiveness of Gama for exposure sequence modeling, especially in cold-start scenarios. Benefiting from its low latency and high effectiveness, Gama has been deployed in our real large-scale industrial recommender, successfully serving hundreds of millions of users.
Abstract: Deep neural networks (DNN) based recommender models often require numerous parameters to achieve remarkable performance. However, this inevitably brings redundant neurons, a phenomenon referred to as over-parameterization. In this paper, we plan to exploit such redundancy phenomena for recommender systems (RS), and propose a top-N item recommendation framework called PCRec that leverages collaborative training of two recommender models of the same network structure, termed peer collaboration. We first introduce two criteria to identify the importance of parameters of a given recommender model. Then, we rejuvenate the unimportant parameters by copying parameters from its peer network. After such an operation and retraining, the original recommender model is endowed with more representation capacity by possessing more functional model parameters. To show its generality, we instantiate PCRec by using three well-known recommender models. We conduct extensive experiments on two real-world datasets, and show that PCRec yields significantly better performance than its counterpart with the same model (parameter) size.
Abstract: Neural information retrieval architectures based on transformers such as BERT are able to significantly improve system effectiveness over traditional sparse models such as BM25. Though highly effective, these neural approaches are very expensive to run, making them difficult to deploy under strict latency constraints. To address this limitation, recent studies have proposed new families of learned sparse models that try to match the effectiveness of learned dense models, while leveraging the traditional inverted index data structure for efficiency. Current learned sparse models learn the weights of terms in documents and, sometimes, queries; however, they exploit different vocabulary structures, document expansion techniques, and query expansion strategies, which can make them slower than traditional sparse models such as BM25. In this work, we propose a novel indexing and query processing technique that exploits a traditional sparse model's "guidance" to efficiently traverse the index, allowing the more effective learned model to execute fewer scoring operations. Our experiments show that our guided processing heuristic is able to boost the efficiency of the underlying learned sparse model by a factor of four without any measurable loss of effectiveness.
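A deliberately simplified, two-pass rendition of the guidance idea: the cheap traditional model decides which documents the expensive learned sparse model gets to score. The real technique interleaves both models inside a dynamic-pruning index traversal, so the budgeted re-ranking below is only an assumption-laden approximation of it.

```python
def guided_scoring(bm25_scores, learned_score, budget):
    """Let a cheap sparse model decide which documents the learned model scores.

    bm25_scores:   dict doc_id -> BM25 score from the traditional index
    learned_score: callable doc_id -> learned-sparse score (the expensive part)
    Only the `budget` documents ranked highest by BM25 are scored by the learned
    model, cutting the number of expensive scoring operations.
    """
    candidates = sorted(bm25_scores, key=bm25_scores.get, reverse=True)[:budget]
    reranked = {doc: learned_score(doc) for doc in candidates}
    return sorted(reranked, key=reranked.get, reverse=True)
```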
Abstract: Recent works show the possibility of transferring the CLIP (Contrastive Language-Image Pretraining) model for video-text retrieval with promising performance. However, due to the domain gap between static images and videos, CLIP-based video-text retrieval models with interaction-based matching perform far worse than models with representation-based matching. In this paper, we propose a novel image animation strategy to transfer the image-text CLIP model to video-text retrieval effectively. By imitating the video shooting components, we convert a widely used image-language corpus to synthesized video-text data for pretraining. To reduce the time complexity of interaction matching, we further propose a coarse-to-fine framework which consists of dual encoders for fast candidate searching and a cross-modality interaction module for fine-grained re-ranking. The coarse-to-fine framework with the synthesized video-text pretraining provides significant gains in retrieval accuracy while preserving efficiency. Comprehensive experiments conducted on MSR-VTT, MSVD, and VATEX datasets demonstrate the effectiveness of our approach.
Abstract: Explicit feedback---user input regarding their interest in an item---is the most helpful information for recommendation as it comes directly from the user and shows their direct interest in the item. Most approaches either treat the recommendation given such feedback as a typical regression problem or regard such data as implicit and then directly adopt approaches for implicit feedback; both methods, however, tend to yield unsatisfactory performance in top-k recommendation. In this paper, we propose interaction-level preference ranking (IPR), a novel pairwise ranking embedding learning approach to better utilize explicit feedback for recommendation. Experiments conducted on three real-world datasets show that IPR yields the best results compared to six strong baselines.
Abstract: News recommendation aims to match news with personalized user interest. Existing methods for news recommendation usually model user interest from historical clicked news without the consideration of candidate news. However, each user usually has multiple interests, and it is difficult for these methods to accurately match a candidate news with a specific user interest. In this paper, we present a candidate-aware user modeling method for personalized news recommendation, which can incorporate candidate news into user modeling for better matching between candidate news and user interest. We propose a candidate-aware self-attention network that uses candidate news as clue to model candidate-aware global user interest. In addition, we propose a candidate-aware CNN network to incorporate candidate news into local behavior context modeling and learn candidate-aware short-term user interest. Besides, we use a candidate-aware attention network to aggregate previously clicked news weighted by their relevance with candidate news to build candidate-aware user representation. Experiments on real-world datasets show the effectiveness of our method in improving news recommendation performance.
Abstract: Emoji recommendation is an important task to help users find appropriate emojis from thousands of candidates based on a short tweet text. Traditional emoji recommendation methods lack personalization and ignore user historical information when selecting emojis. In this paper, we propose a personalized emoji recommendation model with dynamic user preference (PERD) which contains a text encoder and a personalized attention mechanism. In the text encoder, a BERT model is used to learn dense and low-dimensional representations of tweets. In the personalized attention module, users' dynamic preferences are learned according to the semantic and sentimental similarity between historical tweets and the tweet awaiting emoji recommendation. Informative historical tweets are selected and highlighted. Experiments are carried out on two real-world datasets from Sina Weibo and Twitter. Experimental results validate the superiority of our approach on personalized emoji recommendation.
Abstract: Social recommendation with Graph Neural Networks (GNNs) learns to represent cold users by fusing user-user social relations with user-item interactions, thereby alleviating the cold-start problem associated with recommender systems. Despite being well adapted to social relations and user-item interactions, these supervised models are still susceptible to popularity bias. Contrastive learning helps resolve this dilemma by identifying the properties that distinguish positive from negative samples. However, its previous combinations with recommender systems do not consider social relationships or cold-start cases, and they primarily focus on collaborative features between users and items, leaving the similarity between items under-utilized. In this work, we propose socially-aware dual contrastive learning for cold-start recommendation, where cold users can be modeled in the same way as warm users. To take full advantage of social relations, we create dynamic node embeddings for each user by aggregating information from different neighbors according to each different query item, in the form of user-item pairs. We further design a dual-branch self-supervised contrastive objective to account for user-item collaborative features and item-item mutual information, respectively. On one hand, our framework eliminates popularity bias with proper negative sampling in contrastive learning, without extra ground-truth supervision. On the other hand, we extend previous contrastive learning methods to provide a solution to the cold-start problem with social relations included. Extensive experiments on two real-world social recommendation datasets demonstrate its effectiveness.
Abstract: Neural Multi-task Learning is gaining popularity as a way to learn multiple tasks jointly within a single model. While related research continues to break new ground, two major limitations still remain, including (i) poor generalization to scenarios where tasks are loosely correlated; and (ii) under-investigation of the global commonality and local characteristics of tasks. Our aim is to bridge these gaps by presenting a neural multi-task learning model coined Hierarchical Task-aware Multi-headed Attention Network (HTMN). HTMN explicitly distinguishes task-specific features from task-shared features to reduce the impact caused by weak correlation between tasks. The proposed method highlights two parts: a Multi-level Task-aware Experts Network that identifies task-shared global features and task-specific local features, and a Hierarchical Multi-Head Attention Network that hybridizes global and local features to profile more robust and adaptive representations for each task. Afterwards, each task tower receives its hybrid task-adaptive representation to perform task-specific predictions. Extensive experiments on two real datasets show that HTMN consistently outperforms the compared methods on a variety of prediction tasks.
Abstract: In this paper, we bridge the heterogeneity gap between different modalities and improve image-text retrieval by taking advantage of auxiliary image-to-text and text-to-image generative features with contrastive learning. Concretely, contrastive learning is devised to narrow the distance between the aligned image-text pairs and push apart the distance between the unaligned pairs from both inter- and intra-modality perspectives with the help of cross-modal retrieval features and auxiliary generative features. In addition, we devise a support-set regularization term to further improve contrastive learning by constraining the distance between each image/text and its corresponding cross-modal support-set information contained in the same semantic category. To evaluate the effectiveness of the proposed method, we conduct experiments on three benchmark datasets (i.e., MIRFLICKR-25K, NUS-WIDE, MS COCO). Experimental results show that our model significantly outperforms the strong baselines for cross-modal image-text retrieval. For reproducibility, we make the code and data publicly available at: https://github.com/Hambaobao/CRCGS.
Abstract: Previous studies on event-level sentiment analysis (SA) usually model the event as a topic, a category, or target terms, while the structured arguments (e.g., subject, object, time, and location) that have potential effects on the sentiment are not well studied. In this paper, we redefine the task as structured event-level SA and propose an End-to-End Event-level Sentiment Analysis (E3SA) approach to solve this issue. Specifically, we explicitly extract and model the event structure information for enhancing event-level SA. Extensive experiments demonstrate the great advantages of our proposed approach over state-of-the-art methods. Noting the lack of a suitable dataset, we also release a large-scale real-world dataset with event arguments and sentiment labelling to promote further research.
Abstract: Recently, modeling temporal patterns of user-item interactions has attracted much attention in recommender systems. We argue that existing methods ignore the variety of temporal patterns in user behaviors. We define the subset of user behaviors that are irrelevant to the target item as noise, which limits the performance of target-related time cycle modeling and affects the recommendation performance. In this paper, we propose Denoising Time Cycle Modeling (DiCycle), a novel approach that denoises user behaviors and selects the subset of user behaviors that are highly related to the target item. DiCycle is able to explicitly model diverse time cycle patterns for recommendation. Extensive experiments are conducted on both public benchmarks and a real-world dataset, demonstrating the superior performance of DiCycle over state-of-the-art recommendation methods.
Abstract: Compared to other language tasks, applying pre-trained language models (PLMs) to search ranking often requires more nuance and training signal. In this paper, we identify and study two mismatches between pre-training and ranking fine-tuning: the training schema gap, regarding the differences in training objectives and model architectures, and the task knowledge gap, considering the discrepancy between the knowledge needed in ranking and that learned during pre-training. To mitigate these gaps, we propose the Pre-trained, Prompt-learned and Pre-finetuned Neural Ranker (P3 Ranker). P3 Ranker leverages prompt-based learning to convert the ranking task into a pre-training-like schema and uses pre-finetuning to initialize the model on intermediate supervised tasks. Experiments on MS MARCO and Robust04 show the superior performance of P3 Ranker in few-shot ranking. Analyses reveal that P3 Ranker is able to better adapt to the ranking task through prompt-based learning and to draw on the ranking-oriented knowledge gleaned in pre-finetuning, resulting in data-efficient PLM adaptation. Our code is available at https://github.com/NEUIR/P3Ranker.
Abstract: The main focus of our work is the problem of multi-objective optimization (MOO) when providing a final list of recommendations to the user. Currently, system designers can tune MOO by setting the importance of individual objectives, usually in some kind of weighted-average setting. However, this does not necessarily translate into the presence of such objectives in the final results. In contrast, we would like to allow system designers or end-users to directly quantify the required relative ratios of individual objectives in the resulting recommendations, e.g., the final results should contain 60% relevance, 30% diversity, and 10% novelty. If individual objectives are transformed to represent quality on the same scale, such result-conditioning expressions may greatly contribute towards the tuneability and explainability of recommendations, as well as the user's control over them. To achieve this, we propose an iterative algorithm inspired by the mandates allocation problem in public elections. The algorithm is applicable as long as per-item marginal gains of individual objectives can be calculated. The effectiveness of the algorithm is evaluated on several settings of the relevance-novelty-diversity optimization problem. Furthermore, we also outline several options to scale individual objectives so that they represent similar value for the user.
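To make the allocation idea concrete, the following sketch (not the authors' implementation; the item names, gain values, and the D'Hondt-style quotient are illustrative assumptions) shows how target ratios such as 60/30/10 can decide which objective claims each slot of the final list, with the highest-marginal-gain item filling that slot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-item marginal gains, assumed already normalized to a shared [0, 1] scale.
candidates = {f"item_{i}": {"relevance": rng.random(),
                            "diversity": rng.random(),
                            "novelty": rng.random()} for i in range(50)}
target_ratios = {"relevance": 0.6, "diversity": 0.3, "novelty": 0.1}

def recommend(candidates, target_ratios, k=10):
    chosen = []
    slots = {obj: 0 for obj in target_ratios}   # slots already "won" by each objective
    pool = dict(candidates)
    for _ in range(k):
        # D'Hondt-style quotient: the objective whose target ratio is least satisfied wins the slot.
        obj = max(target_ratios, key=lambda o: target_ratios[o] / (slots[o] + 1))
        # The item with the highest marginal gain for that objective fills the slot.
        best = max(pool, key=lambda item: pool[item][obj])
        chosen.append((best, obj))
        slots[obj] += 1
        del pool[best]
    return chosen

for item, obj in recommend(candidates, target_ratios):
    print(f"{item:8s} selected for {obj}")
```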
Abstract: Rich user behavior information is of great importance for capturing and understanding user interest in click-through rate (CTR) prediction. To improve this richness, collecting long-term behaviors has become a typical approach in academia and industry, but at the cost of increased online storage and latency. Recently, researchers have proposed several approaches to shorten long-term behavior sequences and then model user interests. These approaches reduce the online cost efficiently but do not handle the noisy information in long-term user behavior well, which may significantly deteriorate the performance of CTR prediction. To obtain a better cost/performance trade-off, we propose a novel Adversarial Filtering Model (ADFM) to model long-term user behavior. ADFM uses a hierarchical aggregation representation to compress the raw behavior sequence and then learns to remove useless behavior information with an adversarial filtering mechanism. The selected user behaviors are fed into an interest extraction module for CTR prediction. Experimental results on public datasets and an industrial dataset demonstrate that our method achieves significant improvements over state-of-the-art models.
Abstract: User modeling is important for news recommendation. Existing methods usually first encode a user's clicked news into news embeddings independently and then aggregate them into a user embedding. However, the word-level interactions across different clicked news from the same user, which contain rich detailed clues for inferring user interest, are ignored by these methods. In this paper, we propose a fine-grained and fast user modeling framework (FUM) to model user interest from fine-grained behavior interactions for news recommendation. The core idea of FUM is to concatenate the clicked news into a long document and transform user modeling into a document modeling task with both intra-news and inter-news word-level interactions. Since the vanilla Transformer cannot efficiently handle long documents, we apply an efficient transformer named Fastformer to model fine-grained behavior interactions. Extensive experiments on two real-world datasets verify that FUM can effectively and efficiently model user interest for news recommendation.
Abstract: Recent work has shown that more effective dense retrieval models can be obtained by distilling ranking knowledge from an existing base re-ranking model. In this paper, we propose a generic curriculum learning based optimization framework called CL-DRD that controls the difficulty level of the training data produced by the re-ranking (teacher) model. CL-DRD iteratively optimizes the dense retrieval (student) model by increasing the difficulty of the knowledge distillation data made available to it. In more detail, we initially provide the student model with coarse-grained preference pairs between documents in the teacher's ranking, and progressively move towards finer-grained pairwise document ordering requirements. In our experiments, we apply a simple implementation of the CL-DRD framework to enhance two state-of-the-art dense retrieval models. Experiments on three public passage retrieval datasets demonstrate the effectiveness of our proposed framework.
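As an illustration of the curriculum idea, the sketch below generates distillation pairs from a teacher ranking whose minimum rank gap shrinks at each stage, so early stages see only coarse preferences and later stages approach full pairwise ordering; the gap schedule is an assumption, not the CL-DRD paper's exact recipe.

```python
def distillation_pairs(teacher_ranking, stage):
    """Return (positive_doc, negative_doc) pairs drawn from a teacher ranking.

    stage 0: coarse pairs - top documents vs. documents far down the list.
    Higher stages: progressively smaller rank gaps, approaching the full
    pairwise ordering implied by the teacher.
    """
    n = len(teacher_ranking)
    min_gap = max(1, n // (2 ** (stage + 1)))   # the required rank gap shrinks with the stage
    pairs = []
    for i, pos_doc in enumerate(teacher_ranking):
        for j in range(i + min_gap, n):
            pairs.append((pos_doc, teacher_ranking[j]))
    return pairs

ranking = [f"d{r}" for r in range(8)]
for stage in range(3):
    print(f"stage {stage}: {len(distillation_pairs(ranking, stage))} pairs")
```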
Abstract: Concept drift in stream data has been well studied in machine learning applications. In the field of recommender systems, this issue is also widely observed and is known as temporal dynamics in user behavior. Furthermore, during contingencies such as the COVID-19 pandemic, people shift their behavior patterns dramatically and tend to imitate others' opinions. Such changes in user behavior are not always rational: irrational behavior may impair the knowledge learned by the algorithm, cause herd effects, and aggravate popularity bias in recommender systems. However, related research usually pays attention to the concept drift of individuals and overlooks the synergistic effect among users in the same social group. We conduct a study on user behavior to detect collaborative concept drifts among users. We also empirically show that an increase in individuals' experience can weaken herding effects. Our results suggest that CF models are highly impacted by herd behavior, and our findings could provide useful implications for the design of future recommender algorithms.
Abstract: There is essential information in the underlying structure of words and phrases in natural language questions, and this structure has been extensively studied. In this paper, we study one particular structure, referred to as frozen phrases, that is highly expected to transfer as a whole from questions to answer passages. Frozen phrases, if detected, can be helpful in open-domain Question Answering (QA), where identifying the localized context of a given input question is crucial. An interesting question is whether frozen phrases can be accurately detected. We cast the problem as a sequence-labeling task and create synthetic data from existing QA datasets to train a model. We further plug this model into a sparse retriever that is made aware of the detected phrases. Our experiments reveal that detecting frozen phrases whose presence in answer documents is highly plausible yields significant improvements in retrieval as well as in the end-to-end accuracy of open-domain QA models.
Abstract: Session-based recommendation (SBR) aims at next-item prediction given a short behavior session. Existing solutions fail to address two main challenges: 1) user interests appear as dynamically coupled intents, and 2) sessions always contain noisy signals. To address them, in this paper we propose a hypergraph-based solution, HIDE. Specifically, HIDE first constructs a hypergraph for each session to model the possible interest transitions from distinct perspectives. HIDE then disentangles the intents underlying each item click in micro and macro manners. In the micro-disentanglement, we perform intent-aware embedding propagation on the session hypergraph to adaptively activate disentangled intents from noisy data. In the macro-disentanglement, we introduce an auxiliary intent-classification task to encourage the independence of different intents. Finally, we generate intent-specific representations for the given session to make the final recommendation. Benchmark evaluations demonstrate the significant performance gain of HIDE over state-of-the-art methods.
Abstract: The task of temporal language grounding (TLG) aims to locate, in an untrimmed video, the video moment that matches a given textual query, and has attracted considerable research attention. Typical retrieval-based TLG methods are inefficient due to pre-segmented candidate moments, while localization-based TLG solutions adopt reinforcement learning, resulting in unstable convergence. Therefore, performing the TLG task efficiently and stably is a non-trivial problem. Toward this end, we contribute a solution, Point Prompt Tuning (PPT), which formulates the task as a prompt-based multi-modal problem and integrates multiple sub-tasks to improve performance. Specifically, a flexible prompt strategy is first designed to rewrite the query so that it contains the query itself together with start and end points. Thereafter, a multi-modal Transformer is adopted to fully learn the multi-modal context. Meanwhile, we design several sub-tasks to constrain the framework, namely a matching task and a localization task. Finally, the start and end points of the matched video moment are predicted directly, in a simple yet stable manner. Extensive experiments on two real-world datasets verify the effectiveness of our proposed solution.
Abstract: Scaling reinforcement learning (RL) to recommender systems (RS) is promising, since maximizing the expected cumulative rewards of an RL agent meets the objective of RS, i.e., improving customers' long-term satisfaction. A key approach to this goal is offline RL, which aims to learn policies from logged data rather than from expensive online interactions. In this paper, we propose Value Penalized Q-learning (VPQ), a novel uncertainty-based offline RL algorithm that penalizes unstable Q-values in the regression target using uncertainty-aware weights, achieving a conservative Q-function without the need to estimate the behavior policy, which makes it suitable for RS with a large number of items. Experiments on two real-world datasets show that the proposed method serves as a gain plug-in for existing RS models.
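The following minimal sketch shows one way an uncertainty-penalized regression target can be formed, using the disagreement of a Q-network ensemble as the uncertainty estimate; the toy network, the ensemble size, and the simple mean-minus-std weighting are assumptions and not VPQ's exact formulation.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Toy Q-network over concatenated (state, action) features."""
    def __init__(self, dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def penalized_target(q_ensemble, reward, next_state, next_action, gamma=0.99, beta=1.0):
    # Stack ensemble estimates of Q(s', a'): shape (ensemble, batch).
    q_next = torch.stack([q(next_state, next_action) for q in q_ensemble])
    mean, std = q_next.mean(dim=0), q_next.std(dim=0)
    # High-variance (unstable) Q estimates are pushed down, giving a conservative target
    # without requiring an estimate of the behavior policy.
    return reward + gamma * (mean - beta * std)

ensemble = [QNet(8) for _ in range(4)]
s2, a2, r = torch.randn(16, 8), torch.randn(16, 8), torch.randn(16)
print(penalized_target(ensemble, r, s2, a2).shape)   # torch.Size([16])
```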
Abstract: Recommendation for cold-start users, who have very limited data, is a canonical challenge in recommender systems. Existing deep recommender systems utilize user content features and behaviors to produce personalized recommendations, yet they often face significant performance degradation on cold-start users compared to existing users, due to the following challenges: (1) cold-start users may have a quite different feature distribution from existing users, and (2) the few behaviors of cold-start users are hard to exploit. In this paper, we propose a recommender system called Cold-Transformer to alleviate these problems. Specifically, we design context-based Embedding Adaption to offset the differences in feature distribution. It transforms the embedding of cold-start users into a warm state that more closely resembles existing users, so as to represent the corresponding user preferences. Furthermore, to exploit the few behaviors of cold-start users and characterize the user context, we propose Label Encoding, which models Fused Behaviors of positive and negative feedback simultaneously, as these are relatively more abundant. Finally, to perform large-scale industrial recommendation, we keep the two-tower architecture that decouples the user and the target item. Extensive experiments on public and industrial datasets show that Cold-Transformer significantly outperforms state-of-the-art methods, including those that are deeply coupled and less scalable.
Abstract: Dialogue systems are evaluated depending on their type and purpose. Two categories are often distinguished: (1) task-oriented dialogue systems (TDSs), which are typically evaluated on utility, i.e., their ability to complete a specified task, and (2) open-domain chatbots, which are evaluated on the user experience, i.e., based on their ability to engage a person. What is the influence of user experience on the user satisfaction rating of TDSs as opposed to, or in addition to, utility? We collect data by providing an additional annotation layer for dialogues sampled from the ReDial dataset, a widely used conversational recommendation dataset. Unlike prior work, we annotate the sampled dialogues at both the turn and dialogue level on six dialogue aspects: relevance, interestingness, understanding, task completion, efficiency, and interest arousal. The annotations allow us to study how different dialogue aspects influence user satisfaction. We introduce a comprehensive set of user experience aspects derived from the annotators' open comments that can influence users' overall impression. We find that the concept of satisfaction varies across annotators and dialogues, and show that a relevant turn is significant for some annotators, while for others, an interesting turn is all they need. Our analysis indicates that the proposed user experience aspects provide a fine-grained analysis of user satisfaction that is not captured by a monolithic overall human rating.
Abstract: The popularization of social media generates a large amount of user-oriented data, in which text data especially attracts researchers and speculators who infer user attributes (e.g., age, gender) to fulfill their intents. Generally, this line of work casts attribute inference as a text classification problem and has started to leverage graph neural networks for higher-level text representations. However, these text graphs are constructed on words, suffering from high memory consumption and ineffectiveness when only a few labeled texts are available. To address this challenge, we design a text-graph-based few-shot learning model for social media attribute inference. Our model builds a text graph with texts as nodes and edges learned from current text representations via manifold learning and message passing. To further use unlabeled texts to improve few-shot performance, a knowledge distillation scheme is devised to optimize the problem. This offers a trade-off between expressiveness and complexity. Experiments on social media datasets demonstrate the state-of-the-art performance of our model on attribute inference with considerably fewer labeled texts.
Abstract: In real-world recommendation systems, the preferences of users are often affected by long-term constant interests and short-term temporal needs. Recently proposed Transformer-based models have proved superior in sequential recommendation, modeling temporal dynamics globally via the remarkable self-attention mechanism. However, the uniform item-item interactions in the original self-attention are cumbersome and fail to capture the drift of users' local preferences, which contain abundant short-term patterns. In this paper, we propose a novel interpretable convolutional self-attention, which efficiently captures both short- and long-term patterns with a progressive attention distribution. Specifically, a down-sampling convolution module is proposed to segment the overall long behavior sequence into a series of local subsequences. The segments then interact with each item in the self-attention layer to produce locality-aware contextual representations, during which the quadratic complexity of the original self-attention is reduced to nearly linear complexity. Moreover, to further enhance robust feature learning in the context of Transformers, an unsymmetrical positional encoding strategy is carefully designed. Extensive experiments are carried out on real-world datasets, e.g., ML-1M, Amazon Books, and Yelp, indicating that the proposed method outperforms state-of-the-art methods w.r.t. both effectiveness and efficiency.
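A toy illustration of the down-sampling idea follows: a strided 1-D convolution summarizes the long behavior sequence into local segments, and items then attend to these segment summaries instead of to every other item, reducing the attention cost from quadratic in L to roughly O(L·L/stride). The kernel size, stride, and the use of a standard multi-head attention layer are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

B, L, D, stride = 2, 200, 64, 8
items = torch.randn(B, L, D)                     # long behavior sequence of item embeddings

# One summary vector per local segment via a strided 1-D convolution.
downsample = nn.Conv1d(D, D, kernel_size=stride, stride=stride)
segments = downsample(items.transpose(1, 2)).transpose(1, 2)   # (B, L // stride, D)

# Items (queries) attend to segment summaries (keys/values) instead of all other items.
attn = nn.MultiheadAttention(D, num_heads=4, batch_first=True)
context, _ = attn(items, segments, segments)

print(segments.shape, context.shape)             # (2, 25, 64) (2, 200, 64)
```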
Abstract: With the wide adoption of mobile devices and web applications, location-based social networks (LBSNs) offer large-scale individual-level location-related activities and experiences. Next point-of-interest (POI) recommendation is one of the most important tasks in LBSNs, aiming to make personalized recommendations of next suitable locations to users by discovering preferences from users' historical activities. Noticeably, LBSNs have offered unparalleled access to abundant heterogeneous relational information about users and POIs (including user-user social relations, such as families or colleagues; and user-POI visiting relations). Such relational information holds great potential to facilitate the next POI recommendation. However, most existing methods either focus on merely the user-POI visits, or handle different relations based on over-simplified assumptions while neglecting relational heterogeneities. To fill these critical voids, we propose a novel framework, MEMO, which effectively utilizes the heterogeneous relations with a multi-network representation learning module, and explicitly incorporates the inter-temporal user-POI mutual influence with the coupled recurrent neural networks. Extensive experiments on real-world LBSN data validate the superiority of our framework over the state-of-the-art next POI recommendation methods.
Abstract: Abstractive summarization of podcasts is motivated by the growing popularity of podcasts and the needs of their listeners. Podcasting is a markedly different domain from news and other media that are commonly studied in the context of automatic summarization. As such, the qualities of a good podcast summary are as yet unknown. Using a collection of podcast summaries produced by different algorithms, alongside human judgments of summary quality obtained from the TREC 2020 Podcasts Track, we study the correlations between various automatic evaluation metrics and human judgments, as well as the linguistic aspects of summaries that result in strong evaluations.
Abstract: Zero-shot intent classification is a vital and challenging task in dialogue systems, which aims to deal with numerous fast-emerging, unacquainted intents without annotated training data. To obtain more satisfactory performance, the crucial points lie in two aspects: extracting better utterance features and strengthening the model's generalization ability. In this paper, we propose a simple yet effective meta-learning paradigm for zero-shot intent classification. To learn better semantic representations for utterances, we introduce a new mixture attention mechanism, which encodes the pertinent word occurrence patterns by leveraging distributional signature attention and multi-layer perceptron attention simultaneously. To strengthen the model's ability to transfer from seen classes to unseen classes, we reformulate zero-shot intent classification with a meta-learning strategy, which trains the model by simulating multiple zero-shot classification tasks on seen categories and promotes the model's generalization ability with a meta-adapting procedure on mimicked unseen categories. Extensive experiments on two real-world dialogue datasets in different languages show that our model outperforms other strong baselines on both standard and generalized zero-shot intent classification tasks.
Abstract: Given the ubiquitous existence of graph-structured data, learning node representations for downstream tasks ranging from node classification and link prediction to graph classification is of crucial importance. Regarding missing-link inference in diverse networks, we revisit link prediction techniques and identify the importance of both structural and attribute information. However, the available techniques either rely heavily on the network topology, which can be spurious in practice, or cannot integrate graph topology and features properly. To bridge this gap, we propose a bicomponent structural and attribute learning framework (BSAL) that is designed to adaptively leverage information from the topology and feature spaces. Specifically, BSAL constructs a semantic topology via the node attributes and then obtains embeddings for this semantic view, which provides a flexible and easy-to-implement way to adaptively incorporate the information carried by the node attributes. The semantic embedding and the topology embedding are then fused using an attention mechanism for the final prediction. Extensive experiments show the superior performance of our proposal, which significantly outperforms baselines on diverse research benchmarks.
Abstract: Useful tips extracted from product reviews help customers make a more informed purchase decision and use the product in a better, easier, and safer way. In this work, we argue that extracted tips should be examined based on the amount of support and opposition they receive from all product reviews. A classifier, developed for this purpose, determines the degree to which a tip is supported or contradicted by a single review sentence. These support levels are then aggregated over all review sentences, providing a global support score and a global contradiction score that reflect the support level of all reviews for the given tip, thus improving customer confidence in the tip's validity. By analyzing a large set of tips extracted from product reviews, we propose a novel taxonomy for categorizing tips as highly supported, highly contradicted, controversial (supported and contradicted), and anecdotal (neither supported nor contradicted).
Abstract: Session-based recommendation has recently attracted more and more research effort. Most existing approaches aim to discover users' potential preferences or interests from anonymous session data. This ignores the fact that such sequential behavior data usually reflect the session user's underlying demand, i.e., a semantic-level factor, and therefore estimating the underlying demands of a session becomes a challenging task. To tackle this issue, this paper proposes a novel demand-aware graph neural network model. In particular, a demand modeling component is designed to extract the multiple underlying demands of each session. Then, the demand-aware graph neural network first constructs session demand graphs and then learns demand-aware item embeddings to make the recommendation. A mutual information loss is further designed to enhance the quality of the learnt embeddings. Extensive experiments have been performed on two real-world datasets, and the proposed model achieves state-of-the-art performance.
Abstract: Stance detection aims to identify the stance of a text towards a target. Different from conventional stance detection, Zero-Shot Stance Detection (ZSSD) needs to predict the stances towards unseen targets during the inference stage. As humans, we generally tend to reason about the stance towards a new target by linking it with related knowledge learned from known ones. Therefore, in this paper, to better generalize the target-related stance features learned from known targets to unseen ones, we incorporate targeted background knowledge from Wikipedia into the model. The background knowledge can be considered a bridge connecting the meanings of known targets and unseen ones, which improves the generalization and reasoning ability of the model in dealing with ZSSD. Extensive experimental results demonstrate that our model outperforms the state-of-the-art methods on the ZSSD task.
Abstract: Zero-shot cross-lingual event argument extraction (EAE) is a challenging yet practical problem in Information Extraction. Most previous works rely heavily on external structured linguistic features, which are not easily accessible in real-world scenarios. This paper investigates a translation-based method to implicitly project annotations from the source language to the target language. With the use of translation-based parallel corpora, no additional linguistic features are required during training and inference. As a result, the proposed approach is more cost-effective than previous works on zero-shot cross-lingual EAE. Moreover, our implicit annotation projection approach introduces less noise and hence is more effective and robust than explicit ones. Experimental results show that our model achieves the best performance, outperforming a number of competitive baselines. A thorough analysis further demonstrates the effectiveness of our model compared to explicit annotation projection approaches.
Abstract: Sequential recommendation aims to model dynamic user behavior from historical interactions. Self-attentive methods have proven effective at capturing short-term dynamics and long-term preferences. Despite their success, these approaches still struggle to model sparse data, from which they fail to learn high-quality item representations. We propose to model user dynamics from shopping intents and interacted items simultaneously. The learned intents are coarse-grained and serve as prior knowledge for item recommendation. To this end, we present a coarse-to-fine self-attention framework, namely CaFe, which explicitly learns coarse-grained and fine-grained sequential dynamics. Specifically, CaFe first learns intents from coarse-grained sequences, which are dense and hence provide high-quality user intent representations. Then, CaFe fuses intent representations into item encoder outputs to obtain improved item representations. Finally, we infer recommended items based on the representations of items and corresponding intents. Experiments on sparse datasets show that CaFe outperforms state-of-the-art self-attentive recommenders by 44.03% NDCG@5 on average.
Abstract: User modeling is critical for personalization. Existing methods usually train user models on task-specific labeled data, which may be insufficient. In fact, there are usually abundant unlabeled user behavior data that encode rich universal user information, and pre-training user models on them can empower user modeling in many downstream tasks. In this paper, we propose a user model pre-training method named UserBERT, which learns universal user models on unlabeled user behavior data with two contrastive self-supervision tasks. The first is masked behavior prediction and discrimination, aiming to model the contexts of user behaviors. The second is behavior sequence matching, aiming to capture user interests that are stable across different periods. In addition, we propose a medium-hard negative sampling framework to select informative negative samples for better contrastive pre-training. Extensive experiments validate the effectiveness of UserBERT in user model pre-training.
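A possible reading of medium-hard negative sampling is sketched below: candidate negatives are ranked by similarity to the anchor behavior-sequence embedding, the very hardest (possible false negatives) and the easiest (uninformative) candidates are discarded, and negatives are drawn from the middle band. The band boundaries and the use of cosine similarity are assumptions for illustration only.

```python
import torch

def medium_hard_negatives(anchor, candidates, n_neg=4, low=0.3, high=0.8):
    # Cosine similarity between the anchor and each candidate behavior-sequence embedding.
    sims = torch.nn.functional.cosine_similarity(anchor.unsqueeze(0), candidates)
    order = sims.argsort(descending=True)        # hardest (most similar) first
    lo, hi = int(low * len(order)), int(high * len(order))
    band = order[lo:hi]                          # medium-hard region
    idx = band[torch.randperm(len(band))[:n_neg]]
    return candidates[idx]

anchor = torch.randn(64)
pool = torch.randn(100, 64)
print(medium_hard_negatives(anchor, pool).shape)  # torch.Size([4, 64])
```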
Abstract: Programming-based Pre-trained Language Models (PPLMs) such as CodeBERT have achieved great success in many downstream code-related tasks. Since the memory and computational complexity of self-attention in the Transformer grow quadratically with the sequence length, PPLMs typically limit the code length to 512 tokens. However, code in real-world applications, for example in code search, is generally long and cannot be processed efficiently by existing PPLMs. To solve this problem, in this paper we present SASA, a Structure-Aware Sparse Attention mechanism, which reduces complexity and improves performance on long-code understanding tasks. The key components of SASA are top-k sparse attention and Abstract Syntax Tree (AST)-based structure-aware attention. With top-k sparse attention, the most crucial attention relations can be obtained at a lower computational cost. As the code structure represents the logic of the code statements, complementing the sequential characteristics of code, we further introduce AST structures into attention. Extensive experiments on CodeXGLUE tasks show that SASA achieves better performance than the competing baselines.
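The sketch below illustrates the top-k selection rule of sparse attention by masking all but the k highest-scoring keys per query; an efficient implementation such as SASA's would avoid materializing the full score matrix, and the AST-based structure-aware branch is not shown. Shapes and k are illustrative.

```python
import torch

def topk_sparse_attention(q, k, v, top_k=32):
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5           # (B, L, L) full scores (for illustration)
    kth = scores.topk(top_k, dim=-1).values[..., -1:]      # k-th largest score per query
    scores = scores.masked_fill(scores < kth, float("-inf"))  # keep only the top-k keys
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 512, 64)
print(topk_sparse_attention(q, k, v).shape)   # torch.Size([2, 512, 64])
```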
Abstract: When searching the web for answers to health questions, people can make incorrect decisions that have a negative effect on their lives if the search results contain misinformation. To reduce health misinformation in search results, we need to be able to detect documents with correct answers and promote them over documents containing misinformation. Determining the correct answer has been a difficult hurdle to overcome for participants in the TREC Health Misinformation Track. In the 2021 track, automatic runs were not allowed to use the known answer to a topic's health question, and as a result, the top automatic run had a compatibility-difference score of 0.043 while the top manual run, which used the known answer, had a score of 0.259. The compatibility-difference score measures the ability of methods to rank correct and credible documents before incorrect and non-credible documents. By using an existing set of health questions and their known answers, we show it is possible to learn which web hosts are trustworthy, from which we can predict the correct answers to the 2021 health questions with an accuracy of 76%. Using our predicted answers, we can promote documents that we predict contain this answer and achieve a compatibility-difference score of 0.129, which is a three-fold increase in performance over the best previous automatic method.
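The host-trust idea can be pictured with the following simplified sketch, in which a host's trust is the fraction of its training documents that supported the known correct answer, and new answers are chosen by trust-weighted voting; the data layout, host names, and the neutral prior for unseen hosts are illustrative assumptions, not the track's actual format.

```python
from collections import defaultdict

# (question_id, host, answer_the_document_supports) -- hypothetical training triples.
training_docs = [("q1", "cdc.gov", "yes"), ("q1", "quackblog.net", "no"),
                 ("q2", "cdc.gov", "no"),  ("q2", "quackblog.net", "yes")]
known_answers = {"q1": "yes", "q2": "no"}

# Host trust = fraction of the host's documents that agreed with the known answer.
correct, total = defaultdict(int), defaultdict(int)
for qid, host, supported in training_docs:
    total[host] += 1
    correct[host] += int(supported == known_answers[qid])
trust = {h: correct[h] / total[h] for h in total}

def predict_answer(docs):
    """docs: list of (host, supported_answer) retrieved for a new question."""
    votes = defaultdict(float)
    for host, answer in docs:
        votes[answer] += trust.get(host, 0.5)   # unseen hosts get a neutral prior
    return max(votes, key=votes.get)

print(trust)                                                          # cdc.gov trusted, quackblog.net not
print(predict_answer([("cdc.gov", "no"), ("quackblog.net", "yes")]))  # 'no'
```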
Abstract: Binary pointwise labels (aka implicit feedback) are heavily leveraged by deep-learning-based recommendation algorithms nowadays. In this paper, we argue that the limited expressiveness of these labels may fail to accommodate varying degrees of user preference and thus lead to conflicts during model training, which we call annotation bias. To solve this issue, we find that the soft-labeling property of pairwise labels can be utilized to alleviate the bias of pointwise labels. To this end, we propose a momentum contrast framework that combines pointwise and pairwise learning for recommendation. The framework has a three-tower network structure: one user network and two item networks. The two item networks are used for computing the pointwise and pairwise losses, respectively. To alleviate the influence of the annotation bias, we perform a momentum update to ensure a consistent item representation. Extensive experiments on real-world datasets demonstrate the superiority of our method against state-of-the-art recommendation algorithms.
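One common way to realize such a momentum update, sketched below under the assumption of two parallel item towers, is to keep one tower gradient-free and move its parameters as an exponential moving average of the trained tower; the layer sizes and the momentum coefficient are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

item_q = nn.Linear(32, 16)                 # item tower updated by gradients (e.g., pairwise loss)
item_k = nn.Linear(32, 16)                 # momentum item tower kept consistent for the other loss
item_k.load_state_dict(item_q.state_dict())
for p in item_k.parameters():
    p.requires_grad_(False)

@torch.no_grad()
def momentum_update(m=0.999):
    # pk <- m * pk + (1 - m) * pq : slow exponential moving average of the trained tower.
    for pq, pk in zip(item_q.parameters(), item_k.parameters()):
        pk.mul_(m).add_(pq, alpha=1.0 - m)

momentum_update()   # called once per training step after the gradient update
```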
Abstract: Different from large-scale platforms such as Taobao and Amazon, CVR modeling in small-scale recommendation scenarios is more challenging due to the severe Data Distribution Fluctuation (DDF) issue. DDF prevents existing CVR models from being effective, since 1) several months of data are needed to train CVR models sufficiently in small scenarios, leading to a considerable distribution discrepancy between training and online serving; and 2) e-commerce promotions have significant impacts on small scenarios, leading to distribution uncertainty in the upcoming time period. In this work, we propose a novel CVR method named MetaCVR from a meta-learning perspective to address the DDF issue. First, a base CVR model, which consists of a Feature Representation Network (FRN) and output layers, is designed and trained sufficiently with samples spanning several months. Then we treat time periods with different data distributions as different occasions and obtain positive and negative prototypes for each occasion using the corresponding samples and the pre-trained FRN. Subsequently, a Distance Metric Network (DMN) is devised to calculate the distance metrics between each sample and all prototypes, helping to mitigate the distribution uncertainty. Finally, we develop an Ensemble Prediction Network (EPN) which incorporates the outputs of the FRN and DMN to make the final CVR prediction. In this stage, we freeze the FRN and train the DMN and EPN with samples from the recent time period, thereby effectively easing the distribution discrepancy. To the best of our knowledge, this is the first study of CVR prediction targeting the DDF issue in small-scale recommendation scenarios. Experimental results on real-world datasets validate the superiority of MetaCVR, and an online A/B test shows that our model achieves impressive gains of 11.92% on PCVR and 8.64% on GMV.
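To picture the prototype-based distance computation, the snippet below measures how far each sample representation lies from the positive and negative prototypes of every occasion; Euclidean distance and the tensor layout are assumptions, since the actual Distance Metric Network may learn its own metric.

```python
import torch

def prototype_distances(sample_repr, prototypes):
    # sample_repr: (B, D); prototypes: (num_occasions, 2, D) holding a positive
    # and a negative prototype per occasion.
    diff = sample_repr[:, None, None, :] - prototypes[None]   # (B, O, 2, D)
    return diff.norm(dim=-1)                                  # (B, O, 2) distance metrics

x = torch.randn(16, 32)          # FRN representations of a batch of samples
protos = torch.randn(6, 2, 32)   # prototypes for 6 occasions
print(prototype_distances(x, protos).shape)   # torch.Size([16, 6, 2])
```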
Abstract: A human-like user simulator that anticipates users' satisfaction scores, actions, and utterances can help goal-oriented dialogue systems evaluate the conversation and refine their dialogue strategies. However, little work has experimented with user simulators that can generate users' utterances. In this paper, we propose a deep-learning-based user simulator that predicts users' satisfaction scores and actions while jointly generating users' utterances in a multi-task manner. In particular, we show that 1) the proposed deep text-to-text multi-task neural model achieves state-of-the-art performance on the users' satisfaction score and action prediction tasks, and 2) in an ablation analysis, the user satisfaction score prediction, action prediction, and utterance generation tasks boost each other's performance via positive transfer across the tasks. The source code and model checkpoints used for the experiments in this paper are available at: https://github.com/kimdanny/user-simulation-t5.
Abstract: The wide dissemination of fake news is increasingly threatening both individuals and society. Fake news detection aims to train a model on past news and detect fake news in the future. Though great efforts have been made, existing fake news detection methods overlook the unintended entity bias in real-world data, which seriously hurts models' generalization ability to future data. For example, 97% of news pieces in 2010-2017 containing the entity 'Donald Trump' are real in our data, but the percentage falls to merely 33% in 2018. This leads a model trained on the former set to hardly generalize to the latter, as it tends to predict news pieces about 'Donald Trump' as real to lower the training loss. In this paper, we propose an entity debiasing framework (ENDEF) which generalizes fake news detection models to future data by mitigating entity bias from a cause-effect perspective. Based on the causal graph among entities, news contents, and news veracity, we separately model the contribution of each cause (entities and contents) during training. In the inference stage, we remove the direct effect of the entities to mitigate entity bias. Extensive offline experiments on English and Chinese datasets demonstrate that the proposed framework can largely improve the performance of base fake news detectors, and online tests verify its superiority in practice. To the best of our knowledge, this is the first work to explicitly improve the generalization ability of fake news detection models to future data. The code has been released at https://github.com/ICTMCG/ENDEF-SIGIR2022.
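The cause-separation step can be schematized as follows: during training the prediction combines a content-branch logit with an entity-only logit, while at inference the entity branch is dropped so the direct entity effect no longer sways the verdict. The linear heads stand in for whatever base detector and entity encoder are used; they are placeholders, not ENDEF's actual modules.

```python
import torch
import torch.nn as nn

class DebiasedDetector(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.content_head = nn.Linear(dim, 1)   # stands in for the base fake-news model
        self.entity_head = nn.Linear(dim, 1)    # models the entity-only shortcut

    def forward(self, content_feat, entity_feat, inference=False):
        logit = self.content_head(content_feat)
        if not inference:                        # the entity effect only guides training
            logit = logit + self.entity_head(entity_feat)
        return torch.sigmoid(logit)

model = DebiasedDetector()
c, e = torch.randn(4, 64), torch.randn(4, 64)
print(model(c, e).shape, model(c, e, inference=True).shape)
```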
Abstract: Dialogue topic segmentation is a challenging task in which dialogues are split into segments with pre-defined topics. Existing works on topic segmentation adopt a two-stage paradigm consisting of text segmentation and segment labeling. However, such methods tend to focus on the local context during segmentation, and the inter-segment dependency is not well captured. Besides, the ambiguity and labeling noise in dialogue segment boundaries bring further challenges to existing models. In this work, we propose the Parallel Extraction Network with Neighbor Smoothing (PEN-NS) to address these issues. Specifically, we propose the parallel extraction network to perform segment extraction, optimizing the bipartite matching cost of segments to capture inter-segment dependency. Furthermore, we propose neighbor smoothing to handle segment-boundary noise and ambiguity. Experiments on a dialogue-based and a document-based topic segmentation dataset show that PEN-NS outperforms state-of-the-art models significantly.
Abstract: Dense retrieval is becoming one of the standard approaches for document and passage ranking. The dual-encoder architecture is widely adopted for scoring question-passage pairs due to its efficiency and high performance. Typically, dense retrieval models are evaluated on clean and curated datasets. However, when deployed in real-life applications, these models encounter noisy user-generated text, and the performance of state-of-the-art dense retrievers can substantially deteriorate when they are exposed to such noise. In this work, we study the robustness of dense retrievers against typos in the user question. We observe a significant drop in the performance of the dual-encoder model when encountering typos, and explore ways to improve its robustness by combining data augmentation with contrastive learning. Our experiments on two large-scale passage ranking and open-domain question answering datasets show that our proposed approach outperforms competing approaches. Additionally, we perform a thorough robustness analysis. Finally, we provide insights on how different typos affect the robustness of embeddings differently, and on how our method alleviates the effect of some typos but not of others.
Abstract: The common approach to using clusters of similar documents for ad hoc document retrieval is to rank the clusters in response to the query and then transform the cluster ranking into a document ranking. We present a novel supervised approach to transform cluster rankings into document rankings. The approach allows us to simultaneously utilize different clusterings and the resultant cluster rankings, which helps to improve the modeling of the document similarity space. Empirical evaluation shows that using our approach results in performance that substantially transcends the state-of-the-art in cluster-based document retrieval.
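A toy version of the cluster-to-document transformation is shown below, where each document accumulates the scores of the clusters it belongs to across several clusterings; the plain summation replaces the supervised weighting of the actual approach and is purely illustrative, as are the cluster identifiers and scores.

```python
from collections import defaultdict

# Each clustering maps cluster_id -> (cluster score from the cluster ranking, member documents).
clusterings = [
    {"c1": (0.9, ["d1", "d2"]), "c2": (0.4, ["d3"])},
    {"k1": (0.7, ["d2", "d3"]), "k2": (0.2, ["d1"])},
]

# A document inherits evidence from every cluster containing it, across all clusterings.
doc_scores = defaultdict(float)
for clustering in clusterings:
    for score, members in clustering.values():
        for doc in members:
            doc_scores[doc] += score

print(sorted(doc_scores.items(), key=lambda kv: -kv[1]))   # d2 ranked first
```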
Abstract: Collaborative filtering algorithms capture underlying consumption patterns, including those specific to particular demographics or protected information of users, e.g., gender, race, and location. These encoded biases can influence the decisions of a recommendation system (RS) towards further separation of the content provided to various demographic subgroups, and raise privacy concerns regarding the disclosure of users' protected attributes. In this work, we investigate the possibility of, and the challenges in, removing specific protected information of users from the learned interaction representations of an RS algorithm while maintaining its effectiveness. Specifically, we incorporate adversarial training into the state-of-the-art MultVAE architecture, resulting in a novel model, Adversarial Variational Auto-Encoder with Multinomial Likelihood (Adv-MultVAE), which aims to remove the implicit information of protected attributes while preserving recommendation performance. We conduct experiments on the MovieLens-1M and LFM-2b-DemoBias datasets and evaluate the effectiveness of the bias mitigation method based on the inability of external attackers to reveal users' gender information from the model. Compared with the baseline MultVAE, the results show that Adv-MultVAE, with marginal deterioration in performance (w.r.t. NDCG and recall), largely mitigates inherent biases in the model on both datasets.
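Adversarial removal of a protected attribute is often implemented with a gradient-reversal layer, as in the hypothetical sketch below: an auxiliary classifier tries to predict gender from the latent code while the reversed gradient pushes the encoder to discard that signal. Whether Adv-MultVAE uses this exact mechanism, and the toy dimensions used here, are assumptions.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) the gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None

latent = torch.randn(8, 32, requires_grad=True)   # stands in for the VAE latent code
adversary = nn.Linear(32, 2)                      # tries to predict gender from the latent
logits = adversary(GradReverse.apply(latent, 1.0))
loss = nn.functional.cross_entropy(logits, torch.randint(0, 2, (8,)))
loss.backward()                                   # gradients reaching the encoder are reversed
print(latent.grad.shape)                          # torch.Size([8, 32])
```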