==================================
SIGIR'20 ABSTRACT
==================================
Abstract: We introduce Smart Shuffling, a cross-lingual embedding (CLE) method that draws on statistical word alignment approaches to leverage dictionaries, producing dense representations that are significantly more effective for cross-language information retrieval (CLIR) than prior CLE methods. This work is motivated by the observation that although neural approaches are successful for monolingual IR, they are less effective in the cross-lingual setting. We hypothesize that neural CLIR fails because typical cross-lingual embeddings "translate" query terms into related terms -- i.e., terms that appear in similar contexts -- in addition to, or sometimes rather than, synonyms in the target language. Adding related terms to a query (i.e., query expansion) can be valuable for retrieval, but must be balanced by also focusing on the original query. We find that prior neural CLIR models are unable to bridge this translation gap, apparently producing queries that drift from the intent of the source query. We conduct extrinsic evaluations of a range of CLE methods using CLIR performance, compare them to neural and statistical machine translation systems trained on the same translation data, and show a significant gap in effectiveness. Our experiments on standard CLIR collections across four languages indicate that Smart Shuffling fills the translation gap and provides significantly improved semantic matching quality. Such a representation allows us to exploit deep neural (re-)ranking methods for the CLIR task, leading to substantial improvements of up to a 21% gain in MAP, approaching human translation performance. Evaluations on bilingual lexicon induction show a comparable improvement.
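The translation-gap argument above is easy to visualize in embedding space. The following minimal sketch (toy vectors and vocabulary invented for illustration; this is not the Smart Shuffling method itself) shows how a CLE "translates" a query term via nearest-neighbor search, and how merely related target terms can compete with true synonyms in the top-k:

.. code-block:: python

    import numpy as np

    # Toy shared cross-lingual embedding space; all vectors are illustrative.
    rng = np.random.default_rng(0)
    src_vocab = ["dog", "cat"]
    tgt_vocab = ["perro", "gato", "mascota"]     # "mascota" (pet) is related, not a synonym
    src_emb = rng.normal(size=(2, 4))
    tgt_emb = np.vstack([src_emb[0] + 0.01,      # "perro": near-synonym of "dog"
                         src_emb[1] + 0.01,      # "gato":  near-synonym of "cat"
                         src_emb.mean(axis=0)])  # "mascota": shares context with both

    def translate(term_idx, k=2):
        """Return the k nearest target-language terms by cosine similarity."""
        q = src_emb[term_idx] / np.linalg.norm(src_emb[term_idx])
        t = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)
        return [tgt_vocab[i] for i in np.argsort(-(t @ q))[:k]]

    print(translate(0))  # synonyms and merely related terms compete in the top-k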
Abstract: An essential task of information retrieval (IR) is to compute the probability of relevance of a document given a query. If we regard a query term or n-gram fragment as a relevance matching unit, most retrieval models first calculate the relevance evidence between the given query and the candidate document separately, and then accumulate this evidence into the final document relevance prediction. This kind of approach obeys classical probability theory, which is not fully consistent with human cognitive behavior in the actual retrieval process, due to the possible existence of interference effects between relevance matching units. In our work, we propose a Quantum Interference inspired Neural Matching model (QINM), which applies interference effects to guide the construction of additional evidence generated by the interaction between matching units in the retrieval process. Experimental results on two benchmark collections demonstrate that our approach outperforms quantum-inspired retrieval models and some well-known neural retrieval models in the ad-hoc retrieval task.
Abstract: Position bias is a critical problem in information retrieval when dealing with implicit yet biased user feedback data. Unbiased ranking methods typically rely on causality models and debias the user feedback through inverse propensity weighting. While practical, these methods still suffer from two major problems. First, when inferring a user click, the impact of contextual information, such as the documents that have already been examined, is often ignored. Second, only position bias is considered, while other issues resulting from user browsing behaviors are overlooked. In this paper, we propose an end-to-end Deep Recurrent Survival Ranking (DRSR) model, a unified framework that jointly models a user's various behaviors, to (i) consider the rich contextual information in the ranking list; and (ii) address the hidden issues underlying user behaviors, i.e., to mine observation patterns in queries without any clicks (non-click queries), and to model tracking logs that cannot truly reflect the user's browsing intents (untrusted observations). Specifically, we adopt a recurrent neural network to model the contextual information and estimate the conditional likelihood of user feedback at each position. We then incorporate survival analysis techniques with the probability chain rule to mathematically recover the unbiased joint probability of a user's various behaviors. DRSR can be easily combined with both point-wise and pair-wise learning objectives. Extensive experiments over two large-scale industrial datasets demonstrate significant performance gains of our model compared with the state of the art.
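The survival-analysis step can be illustrated with a small computation. Assuming discrete hazards h[k] (the conditional probability of a click at position k given no earlier click; DRSR estimates these with a recurrent network, whereas the values below are invented), the probability chain rule recovers a proper joint distribution over click positions and non-click queries:

.. code-block:: python

    import numpy as np

    # Conditional "hazard" h[k]: probability of a click at position k,
    # given no click at positions 0..k-1 (illustrative values only).
    h = np.array([0.30, 0.20, 0.10, 0.05])

    # Chain rule: P(first click at k) = prod_{j<k} (1 - h[j]) * h[k]
    survival = np.cumprod(np.concatenate(([1.0], 1 - h[:-1])))
    p_first_click = survival * h
    p_no_click = np.prod(1 - h)  # probability mass of a non-click query

    # The pieces form a proper distribution: they sum to 1.
    print(p_first_click, p_no_click, p_first_click.sum() + p_no_click)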
Abstract: Recent progress in Natural Language Understanding (NLU) is driving fast-paced advances in Information Retrieval (IR), largely owed to fine-tuning deep language models (LMs) for document ranking. While remarkably effective, the ranking models based on these LMs increase computational cost by orders of magnitude over prior approaches, particularly as they must feed each query-document pair through a massive neural network to compute a single relevance score. To tackle this, we present ColBERT, a novel ranking model that adapts deep LMs (in particular, BERT) for efficient retrieval. ColBERT introduces a late interaction architecture that independently encodes the query and the document using BERT and then employs a cheap yet powerful interaction step that models their fine-grained similarity. By delaying and yet retaining this fine-granular interaction, ColBERT can leverage the expressiveness of deep LMs while simultaneously gaining the ability to pre-compute document representations offline, considerably speeding up query processing. Crucially, ColBERT's pruning-friendly interaction mechanism enables leveraging vector-similarity indexes for end-to-end retrieval directly from millions of documents. We extensively evaluate ColBERT using two recent passage search datasets. Results show that ColBERT's effectiveness is competitive with existing BERT-based models (and outperforms every non-BERT baseline), while executing two orders-of-magnitude faster and requiring up to four orders-of-magnitude fewer FLOPs per query.
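A minimal sketch of the late-interaction scoring step (the MaxSim operator over independently encoded token embeddings); random matrices stand in for the BERT-encoded query and document:

.. code-block:: python

    import numpy as np

    def colbert_score(Q, D):
        """Late interaction: sum over query tokens of the maximum cosine
        similarity against any document token (MaxSim)."""
        Q = Q / np.linalg.norm(Q, axis=1, keepdims=True)
        D = D / np.linalg.norm(D, axis=1, keepdims=True)
        return (Q @ D.T).max(axis=1).sum()

    rng = np.random.default_rng(0)
    Q = rng.normal(size=(8, 128))    # 8 query token embeddings (encoded at query time)
    D = rng.normal(size=(180, 128))  # 180 document token embeddings, precomputed offline

    print(colbert_score(Q, D))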
Abstract: Deep pretrained transformer networks are effective at various ranking tasks, such as question answering and ad-hoc document ranking. However, their computational cost makes them prohibitively expensive to deploy in practice. Our proposed approach, called PreTTR (Precomputing Transformer Term Representations), considerably reduces the query-time latency of deep transformer networks (up to a 42x speedup on web document ranking), making these networks more practical to use in a real-time ranking scenario. Specifically, we precompute part of the document term representations at indexing time (without a query), and merge them with the query representation at query time to compute the final ranking score. Due to the large size of the token representations, we also propose an effective approach to reduce the storage requirement by training a compression layer to match attention scores. Our compression technique reduces the required storage by up to 95%, and it can be applied without substantial degradation in ranking performance.
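The precompute-then-merge idea can be sketched with a deliberately miniature numpy "transformer" (this toy layer and all sizes are stand-ins, not PreTTR's actual network or its compression layer): the lower layers run on the document alone at indexing time, and only the upper layers, which see the merged query-document sequence, run at query time:

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    d = 16

    def attn_layer(X, Wq, Wk, Wv):
        """Minimal single-head self-attention (residuals/FFN omitted for brevity)."""
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        A = np.exp((Q @ K.T) / np.sqrt(d))
        A /= A.sum(axis=1, keepdims=True)
        return A @ V

    # A toy 4-layer stack: the first 2 layers process the document alone
    # (offline), the last 2 process the merged [query; document] sequence.
    layers = [tuple(rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
              for _ in range(4)]

    doc = rng.normal(size=(30, d))     # 30 document token embeddings
    for W in layers[:2]:               # indexing time: precompute (and store)
        doc = attn_layer(doc, *W)

    query = rng.normal(size=(5, d))    # query tokens, known only at query time
    X = np.vstack([query, doc])        # merge query with precomputed doc reps
    for W in layers[2:]:               # query time: only the upper layers run
        X = attn_layer(X, *W)
    score = X[0] @ rng.normal(size=d)  # e.g., a score read off the first position
    print(score)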
Abstract: We present RML, the first known general reinforcement learning framework for relevance feedback that directly optimizes any desired retrieval metric, including precision-oriented, recall-oriented, and even diversity metrics: RML can be easily extended to directly optimize any arbitrary user satisfaction signal. Using the RML framework, we can select effective feedback terms and weight them appropriately, improving on past methods that fit parameters to feedback algorithms using heuristic approaches or methods that do not directly optimize for retrieval performance. Learning an effective relevance feedback model is not trivial, since the true feedback distribution is unknown. Experiments on standard TREC collections compare RML to existing feedback algorithms, demonstrate the effectiveness of RML at optimizing for MAP and α-nDCG, and show the impact on related measures.
Abstract: Fairness considerations have recently received growing attention, especially in the context of intelligent decision making systems. Explainable recommendation systems, in particular, may suffer from both explanation bias and performance disparity. We show that inactive users may be more susceptible to receiving unsatisfactory recommendations due to their insufficient training data, and that their recommendations may be biased by the training records of active users due to the nature of collaborative filtering, which leads to unfair treatment by the system. In this paper, we analyze different groups of users according to their level of activity, and find that bias exists in recommendation performance between different groups. Empirically, we find that such a performance gap is caused by the disparity of data distributions, specifically the knowledge graph path distribution in this work. We propose a fairness-constrained approach via heuristic re-ranking to mitigate this unfairness problem in the context of explainable recommendation over knowledge graphs. We experiment on several real-world datasets with state-of-the-art knowledge graph-based explainable recommendation algorithms. The promising results show that our algorithm not only provides high-quality explainable recommendations, but also reduces recommendation unfairness in several respects.
Abstract: Massive open online courses (MOOCs) are becoming an increasingly popular way to deliver education, providing large-scale, open-access learning opportunities for students to grasp knowledge. To attract students' interest, MOOC providers apply recommendation systems to recommend courses to students. However, as a course usually consists of a number of video lectures, each covering specific knowledge concepts, directly recommending courses overlooks students' interest in specific knowledge concepts. To fill this gap, in this paper, we study the problem of knowledge concept recommendation. We propose an end-to-end graph neural network based approach called Attentional Heterogeneous Graph Convolutional Deep Knowledge Recommender (ACKRec) for knowledge concept recommendation in MOOCs. Like other recommendation problems, it suffers from the sparsity issue. To address this issue, we leverage both content information and context information to learn the representation of entities via a graph convolutional network. In addition to students and knowledge concepts, we consider other types of entities (e.g., courses, videos, teachers) and construct a heterogeneous information network (HIN) to capture the rich semantic relationships among different types of entities and incorporate them into the representation learning process. Specifically, we use meta-paths on the HIN to guide the propagation of students' preferences. With the help of these meta-paths, the students' preference distribution with respect to a candidate knowledge concept can be captured. Furthermore, we propose an attention mechanism to adaptively fuse the context information from different meta-paths, in order to capture the different interests of different students. To learn the parameters of the proposed model, we utilize extended matrix factorization (MF). A series of experiments demonstrates the effectiveness of ACKRec across multiple popular metrics compared with state-of-the-art baseline methods. The promising results show that the proposed ACKRec is able to effectively recommend knowledge concepts to students pursuing online learning in MOOCs.
Abstract: Recently, deep learning has made significant progress in the task of sequential recommendation. Existing neural sequential recommenders typically adopt a generative approach trained with Maximum Likelihood Estimation (MLE). When context information (referred to as factors) is involved, it is difficult to analyze when and how each individual factor affects the final recommendation performance. For this purpose, we take a new perspective and introduce adversarial learning to sequential recommendation. In this paper, we present a Multi-Factor Generative Adversarial Network (MFGAN) for explicitly modeling the effect of context information on sequential recommendation. Specifically, our proposed MFGAN has two kinds of modules: a Transformer-based generator that takes user behavior sequences as input and recommends possible next items, and multiple factor-specific discriminators that evaluate the generated sub-sequence from the perspectives of different factors. To learn the parameters, we adopt the classic policy gradient method, and utilize the reward signals of the discriminators to guide the learning of the generator. Our framework is flexible enough to incorporate multiple kinds of factor information, and is able to trace how each factor contributes to the recommendation decision over time. Extensive experiments conducted on three real-world datasets demonstrate the superiority of our proposed model over state-of-the-art methods, in terms of both effectiveness and interpretability.
Abstract: Researchers have begun to utilize heterogeneous knowledge graphs (KGs) as auxiliary information in recommendation systems to mitigate the cold-start and sparsity issues. However, utilizing a graph neural network (GNN) to capture information in a KG and further applying it in a recommender system is still problematic, as such a network is unable to see each item's properties from multiple perspectives. To address these issues, we propose the multi-view item network (MVIN), a GNN-based recommendation model that provides superior recommendations by describing items from a unique mixed view combining user and entity angles. MVIN learns item representations from both the user view and the entity view. From the user view, user-oriented modules score and aggregate features to make recommendations from a personalized perspective, constructed according to KG entities and incorporating user click information. From the entity view, the mixing layer contrasts layer-wise GCN information to further obtain comprehensive features from internal entity-entity interactions in the KG. We evaluate MVIN on three real-world datasets: MovieLens-1M (ML-1M), LFM-1b 2015 (LFM-1b), and Amazon-Book (AZ-book). Results show that MVIN significantly outperforms state-of-the-art methods on these three datasets. Moreover, user-view cases show that MVIN indeed captures entities that attract users, and further analysis illustrates that mixing layers in a heterogeneous KG play a vital role in neighborhood information aggregation.
Abstract: Traditional recommender systems mainly aim to model inherent and long-term user preference, while dynamic user demands are also of great importance. Typically, a historical purchase affects the user's demand for related items. For instance, users tend to buy complementary items together (iPhone and AirPods) but not substitutive items (Powerbeats and AirPods), although substitutes of the bought item still cater to their preference. To better model the effects of the history sequence, previous studies introduce the semantics of item relations to capture user demands for recommendation. However, we argue that the temporal evolution of the effects caused by different relations cannot be neglected. In the example above, the user's demand for headphones can rise again after a long period, when a replacement is needed. To model the dynamic meanings of an item in different sequence contexts, we propose a novel method, Chorus, that takes both item relations and the corresponding temporal dynamics into consideration. Chorus derives the embedding of the target item in a knowledge-aware and time-aware way, where each item gets its basic representation and relation-related ones. We then devise temporal kernel functions to combine these representations dynamically, according to whether there are relational items in the history sequence as well as the elapsed time. The enhanced target item embedding is flexible enough to work with various algorithms to calculate the ranking score and generate recommendations. In extensive experiments on three real-world datasets, Chorus gains significant improvements over state-of-the-art baseline methods. Furthermore, the time-related parameters are highly interpretable and hence can strengthen the explainability of the recommendations.
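A hedged sketch of the time-aware combination step: relation-specific components of the target item's embedding are weighted by temporal kernels of the elapsed time since the triggering interaction. The exponential-decay kernel and every value below are illustrative assumptions, not Chorus's exact kernel functions:

.. code-block:: python

    import numpy as np

    def time_aware_embedding(base, rel_embs, deltas, scales):
        """Combine an item's basic representation with relation-related
        components, each weighted by a temporal kernel of the time elapsed
        since the triggering interaction (illustrative exponential decay)."""
        weights = np.exp(-np.asarray(deltas) / np.asarray(scales))
        return base + sum(w * r for w, r in zip(weights, rel_embs))

    base = np.ones(4)                               # the target item's basic embedding
    rel_embs = [np.array([0.5, 0.0, 0.0, 0.0]),     # complement recently bought: boost
                np.array([-0.5, 0.0, 0.0, 0.0])]    # substitute just bought: suppress
    print(time_aware_embedding(base, rel_embs, deltas=[2.0, 2.0], scales=[5.0, 5.0]))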
Abstract: Conventional Neural Text Generation (NTG) models determine the output distribution by applying maximum likelihood estimation on training corpora. However, as user preference for generated content can be constantly changing, an optimized text generator needs to adapt its output to this non-static nature. In this paper, our goal is to generate product descriptions on e-commerce platforms, and we explore this classic task from a novel perspective that allows the optimal output text to vary with ever-changing user preference. Specifically, we propose an evolutionary NTG model that lets the interactive environment fine-tune the pre-trained generative policy via Reinforcement Learning (RL). To this end, a dynamic context of textual fitness is established based on the user click behavior associated with previously generated content, to estimate reward/penalty signals for each output text. Our motivation is to leverage the click-through rate as a user-centric measurement of text quality, by which we can assess how likely a product description is to attract people's attention and follow shopping trends. Extensive experiments on a real e-commerce website demonstrate that the proposed approach achieves significant superiority over two static RL-based variants and four state-of-the-art NTG solutions.
Abstract: View-based 3D model retrieval has become an important task in both the computer vision and machine learning domains. Although deep learning methods have achieved excellent performance on view-based 3D model retrieval, the intrinsic correlations and the degree of view discrimination among the multiple views of a 3D model have not been effectively exploited. To obtain a more effective feature descriptor for 3D model retrieval, in this work, we propose the pairwise view weighted graph network (abbreviated PVWGN) for view-based 3D model retrieval, where non-local graph layers are embedded into the network architecture to automatically mine the intrinsic relationships among the multiple views of a 3D model. Furthermore, a view weighted layer is employed in PVWGN to adaptively assign a weight to each view according to its aggregation information. In addition, a pairwise discrimination loss function is designed to improve the feature discrimination of the 3D model. Most importantly, these three components are integrated into a unified framework. Extensive experimental results on the ModelNet40 and ModelNet10 3D model retrieval datasets show that PVWGN outperforms all state-of-the-art methods on the 3D model retrieval task, with mAPs of 93.2% and 96.2%, respectively.
Abstract: Cyberspace hosts abundant interactions between users and different kinds of objects, and their relations are often encapsulated as bipartite graphs. Detecting user communities in such heterogeneous graphs is an essential task for uncovering user information needs and further enhancing recommendation performance. While several major cyber domains carry high-quality graphs, unfortunately, most others can be quite sparse. However, as users may appear in multiple domains (graphs), their high-quality activities in the major domains can support community detection in the sparse ones; e.g., a user's behavior on Google can help thousands of applications locate his/her local community when s/he uses a Google ID to log into those applications. In this paper, we propose Pairwise Cross-graph Community Detection (PCCD) to cope with the sparse-graph problem by involving external graph knowledge to learn users' pairwise community closeness instead of detecting communities directly. In particular, to avoid taking in excessive propagated information, our model utilizes a two-level filtering module to select the most informative connections through both community-level and node-level filters. Subsequently, a Community Recurrent Unit (CRU) is designed to estimate pairwise user community closeness. Extensive experiments on two real-world graph datasets validate our model against several strong alternatives. Supplementary experiments also validate its robustness on graphs with varied sparsity scales.
Abstract: Network embedding effectively transforms complex network data into a low-dimensional vector space and has shown great performance in many real-world scenarios, such as link prediction, node classification, and similarity search. A plethora of methods have been proposed to learn node representations and achieve encouraging results. Nevertheless, little attention has been paid to the embedding technique for bipartite attributed networks, a typical data structure for modeling nodes from two distinct partitions. In this paper, we propose a novel model called BiANE, short for Bipartite Attributed Network Embedding. In particular, BiANE models not only the inter-partition proximity but also the intra-partition proximity. To effectively preserve the intra-partition proximity, we jointly model the attribute proximity and the structure proximity through a novel latent correlation training approach. Furthermore, we propose a dynamic positive sampling technique to overcome the efficiency drawbacks of existing dynamic negative sampling techniques. Extensive experiments have been conducted on several real-world networks, and the results demonstrate that our proposed approach significantly outperforms state-of-the-art methods.
Abstract: Fashion outfit recommendation has attracted increasing attention from online shopping services and fashion communities. Distinct from other scenarios (e.g., social networking or content sharing) that recommend a single item (e.g., a friend or a picture) to a user, outfit recommendation predicts user preference over a set of well-matched fashion items. Hence, high-quality personalized outfit recommendation should satisfy two requirements: 1) nice compatibility among fashion items, and 2) consistency with user preference. However, present works focus mainly on one of the two requirements and consider only either user-outfit or outfit-item relationships, thereby easily leading to suboptimal representations and limiting performance. In this work, we unify the two tasks of fashion compatibility modeling and personalized outfit recommendation. Towards this end, we develop a new framework, Hierarchical Fashion Graph Network (HFGN), to simultaneously model relationships among users, items, and outfits. In particular, we construct a hierarchical structure upon user-outfit interactions and outfit-item mappings. We then draw inspiration from recent graph neural networks and employ embedding propagation on this hierarchical graph, so as to aggregate item information into an outfit representation and then refine a user's representation via his/her historical outfits. Furthermore, we jointly train the two tasks to optimize these representations. To demonstrate the effectiveness of HFGN, we conduct extensive experiments on a benchmark dataset, where HFGN achieves significant improvements over state-of-the-art compatibility matching models like NGNN and outfit recommenders like FHN.
Abstract: Session-based recommendation (SBR) is a challenging task that aims at recommending items based on anonymous behavior sequences. Almost all existing solutions for SBR model user preference based only on the current session, without exploiting the other sessions, which may contain both relevant and irrelevant item transitions with respect to the current session. This paper proposes a novel approach, called Global Context Enhanced Graph Neural Networks (GCE-GNN), to exploit item transitions over all sessions in a more subtle manner for better inferring the user preference of the current session. Specifically, GCE-GNN learns two levels of item embeddings from a session graph and a global graph, respectively: (i) the session graph is used to learn session-level item embeddings by modeling pairwise item transitions within the current session; and (ii) the global graph is used to learn global-level item embeddings by modeling pairwise item transitions over all sessions. In GCE-GNN, we propose a novel global-level item representation learning layer, which employs a session-aware attention mechanism to recursively incorporate the neighbor embeddings of each node on the global graph. We also design a session-level item representation learning layer, which employs a GNN on the session graph to learn session-level item embeddings within the current session. Moreover, GCE-GNN aggregates the learned item representations at the two levels with a soft attention mechanism. Experiments on three benchmark datasets demonstrate that GCE-GNN consistently outperforms state-of-the-art methods.
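The two graph levels can be sketched directly from behavior sequences. The toy construction below (directed transition edges only, no GNN) contrasts the session graph of the current session with the global graph aggregated over all sessions:

.. code-block:: python

    from collections import defaultdict

    sessions = [["a", "b", "c"], ["b", "c", "d"], ["a", "c"]]  # anonymous sequences
    current = sessions[-1]

    # Session graph: pairwise item transitions within the current session only.
    session_edges = set(zip(current, current[1:]))

    # Global graph: transitions aggregated over all sessions, with counts
    # (the global layer attends over such cross-session neighborhoods).
    global_edges = defaultdict(int)
    for s in sessions:
        for u, v in zip(s, s[1:]):
            global_edges[(u, v)] += 1

    print(session_edges, dict(global_edges))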
Abstract: Interactive recommender systems (IRS) have drawn huge attention because of their flexible recommendation strategy and consideration of optimal long-term user experiences. To deal with dynamic user preferences and optimize accumulated utilities, researchers have introduced reinforcement learning (RL) into IRS. However, RL methods share a common issue of sample efficiency: a huge amount of interaction data is required to train an effective recommendation policy, owing to sparse user responses and a large action space consisting of a large number of candidate items. Moreover, it is infeasible to collect much data with explorative policies in online environments, as doing so will probably harm the user experience. In this work, we investigate the potential of leveraging a knowledge graph (KG), which provides rich side information for recommendation decision making, to deal with these issues of RL methods for IRS. Instead of learning RL policies from scratch, we make use of prior knowledge of the item correlations learned from the KG to (i) guide candidate selection for better candidate item retrieval, (ii) enrich the representations of items and user states, and (iii) propagate user preferences among correlated items over the KG to deal with the sparsity of user feedback. Comprehensive experiments have been conducted on two real-world datasets, demonstrating the superiority of our approach, with significant improvements over state-of-the-art methods.
Abstract: A knowledge graph (KG) contains well-structured external information and has been shown to be effective for high-quality recommendation. However, existing KG-enhanced recommendation methods have largely focused on exploring advanced neural network architectures to better investigate the structural information of the KG. For model learning, these methods mainly rely on Negative Sampling (NS) to optimize the models for both the KG embedding task and the recommendation task. Since NS is not robust (e.g., sampling a small fraction of negative instances may lose lots of useful information), it is reasonable to argue that these methods are insufficient to capture collaborative information among users, items, and entities. In this paper, we propose a novel Jointly Non-Sampling learning model for Knowledge graph enhanced Recommendation (JNSKR). Specifically, we first design a new efficient non-sampling optimization algorithm for knowledge graph embedding learning. The subgraphs are then encoded by the proposed attentive neural network to better characterize user preference over items. Through novel designs of memorization strategies and a joint learning framework, JNSKR not only models the fine-grained connections among users, items, and entities, but also efficiently learns model parameters from the whole training data (including all non-observed data) with a rather low time complexity. Experimental results on two public benchmarks show that JNSKR significantly outperforms state-of-the-art methods like RippleNet and KGAT. Remarkably, JNSKR also shows significant advantages in training efficiency (about 20 times faster than KGAT), which makes it more applicable to real-world large-scale systems.
Abstract: Modelling feature interactions is key in Click-Through Rate (CTR) prediction. State-of-the-art models usually include explicit feature interactions to better model non-linearity in a deep network, but enumerating all feature combinations of high orders is inefficient and brings challenges to network optimization. In this work, we use AutoML to seek useful high-order feature interactions to train on, without manual feature selection. For this purpose, we propose an end-to-end model, AutoGroup, which casts the selection of feature interactions as a structural optimization problem. In a nutshell, AutoGroup first automatically groups useful features into a number of feature sets. Then, it generates interactions of any order from these feature sets using a novel interaction function. The main contribution of AutoGroup is that it performs both dimensionality reduction and feature selection, which are not seen in previous models. Offline experiments on three public large-scale benchmark datasets demonstrate the superior performance and efficiency of AutoGroup over state-of-the-art models. Furthermore, a ten-day online A/B test verifies that AutoGroup can be reliably deployed in production and outperforms the commercial baseline by 10% on average in terms of CTR and CVR.
Abstract: For sequential recommendation, it is essential to capture and predict future or long-term user preferences in order to generate accurate recommendations over time. To improve this predictive capacity, we adopt reinforcement learning (RL) for developing effective sequential recommenders. However, user-item interaction data is likely to be sparse, complicated, and time-varying, so it is not easy to directly apply RL techniques to improve the performance of sequential recommendation. Inspired by the availability of knowledge graphs (KGs), we propose a novel Knowledge-guidEd Reinforcement Learning model (KERL for short) that fuses KG information into an RL framework for sequential recommendation. Specifically, we formalize the sequential recommendation task as a Markov Decision Process (MDP) and make three major technical extensions in this framework, covering the state representation, reward function, and learning algorithm. First, we propose to enhance the state representations with KG information, considering both exploitation and exploration. Second, we carefully design a composite reward function that computes both sequence-level and knowledge-level rewards. Third, we propose a new algorithm for learning the proposed model more effectively. To our knowledge, this is the first time that knowledge information has been explicitly discussed and utilized in RL-based sequential recommenders, especially for the exploration process. Extensive experimental results on both next-item and next-session recommendation tasks show that our model significantly outperforms the baselines on four real-world datasets.
Abstract: Since it can effectively address the sparsity and cold-start problems of collaborative filtering, the knowledge graph (KG) is widely studied and employed as side information in the field of recommender systems. However, most existing KG-based recommendation methods mainly focus on how to effectively encode the knowledge associations in the KG, without highlighting the crucial collaborative signals latent in user-item interactions. As such, the learned embeddings underutilize these two kinds of pivotal information and are insufficient to effectively represent the latent semantics of users and items in vector space. In this paper, we propose a novel method named Collaborative Knowledge-aware Attentive Network (CKAN), which explicitly encodes the collaborative signals via collaboration propagation and offers a natural way of combining collaborative signals with knowledge associations. Specifically, CKAN employs a heterogeneous propagation strategy to explicitly encode both kinds of information, and applies a knowledge-aware attention mechanism to discriminate the contributions of different knowledge-based neighbors. Compared with other KG-based methods, CKAN provides a new perspective on combining collaborative information with knowledge information. We apply the proposed model to four real-world datasets, and the empirical results demonstrate that CKAN significantly outperforms several compelling state-of-the-art baselines.
Abstract: In a large recommender system, the products (or items) may fall into many different categories or domains. Given two relevant domains (e.g., Book and Movie), users may have interactions with items in one domain but not in the other; in the latter domain, these users are considered cold-start users. How to effectively transfer users' preferences, based on their interactions in one domain, to the other relevant domain is the key issue in cross-domain recommendation. Inspired by the advances made in review-based recommendation, we propose to model user preference transfer at the aspect level, with aspects derived from reviews. To this end, we propose a cross-domain recommendation framework via an aspect transfer network for cold-start users (named CATN). CATN is devised to extract multiple aspects for each user and each item from their review documents, and to learn aspect correlations across domains with an attention mechanism. In addition, we further exploit auxiliary reviews from like-minded users to enhance a user's aspect representations. An end-to-end optimization framework is then utilized to strengthen the robustness of our model. On real-world datasets, the proposed CATN significantly outperforms SOTA models in terms of rating prediction accuracy. Further analysis shows that our model is able to reveal user aspect connections across domains at a fine level of granularity, making the recommendations explainable.
Abstract: Knowledge graphs have been widely adopted to improve recommendation accuracy. The multi-hop user-item connections on a knowledge graph also enable reasoning about why an item is recommended. However, reasoning over paths is a complex combinatorial optimization problem, and traditional recommendation methods usually adopt brute-force approaches to find feasible paths, which leads to issues with convergence and explainability. In this paper, we address these issues by better supervising the path-finding process. The key idea is to extract imperfect path demonstrations with minimal labeling effort and effectively leverage these demonstrations to guide path finding. In particular, we design a demonstration-based knowledge graph reasoning framework for explainable recommendation. We also propose an ADversarial Actor-Critic (ADAC) model for demonstration-guided path finding. Experiments on three real-world benchmarks show that our method converges more quickly than the state-of-the-art baseline and achieves better recommendation accuracy and explainability.
Abstract: Given an observed event, humans can easily predict the next event or reason about the preceding event, yet it is difficult for machines to perform such event reasoning. Event representation bridges this gap by modeling the process of event reasoning in a machine-readable format, which can then support a wide range of applications in information retrieval, e.g., question answering and information extraction. Existing work mainly resorts to joint training, integrating all levels of training loss in event chains by a simple loss summation, which is easily trapped in a local optimum. In addition, the scenario knowledge in event chains is not well investigated for event representation. In this paper, we propose a unified fine-tuning architecture incorporating scenario knowledge for event representation, i.e., UniFA-S, which mainly consists of a unified fine-tuning architecture (UniFA) and a scenario-level variational auto-encoder (S-VAE). In detail, UniFA employs multi-step fine-tuning to integrate all levels of training, and S-VAE applies a stochastic variable to implicitly represent scenario-level knowledge. We evaluate our proposal on two aspects, i.e., representation and inference abilities. For the representation ability, our ensemble model UniFA-S beats state-of-the-art baselines on two similarity tasks. For the inference ability, UniFA-S outperforms the best baseline, achieving 4.1%-8.2% improvements in terms of accuracy on various inference tasks.
Abstract: The Web is a canonical example of a competitive retrieval setting where many documents' authors consistently modify their documents to promote them in rankings. We present an automatic method for quality-preserving modification of document content --- i.e., maintaining content quality --- so that the document is ranked higher for a query by a non-disclosed ranking function whose rankings can be observed. The method replaces a passage in the document with some other passage. To select the two passages, we use a learning-to-rank approach with a bi-objective optimization criterion: rank promotion and content-quality maintenance. We used the approach as a bot in content-based ranking competitions. Analysis of the competitions demonstrates the merits of our approach with respect to human content modifications in terms of rank promotion, content-quality maintenance and relevance.
Abstract: Understanding how data workers interact with data and various pieces of information (e.g., code snippet examples) is key to designing systems that can better support them in exploring a given dataset. To date, however, there is a paucity of research studying the information seeking patterns and strategies adopted by data workers as they carry out data curation activities. In this work, we aim at understanding the behaviors of data workers in discovering data quality issues, and how these behavioral observations relate to their performance. Specifically, we investigate how data workers use information resources and tools to support their task completion. To this end, we collect a multi-modal dataset through a data-driven experiment that relies on eye-tracking technology with a purpose-designed platform built on top of iPython Notebook. The collected data reveal that: (i) searching in external resources is a prevalent action that can be leveraged to achieve better performance; (ii) 'copy-paste-modify' is a typical strategy for writing code to complete tasks; (iii) providing sample code within the system could help data workers get started with their task; and (iv) surfacing the underlying data is an effective way to support exploration. By investigating the behaviors prior to each search action, we also find that the most common reasons that trigger external search actions are the need to seek assistance in writing or debugging code and to search for relevant code to reuse. Our findings provide insights into patterns of interaction with various system components and information resources to perform data curation tasks, which has implications for the design of domain-specific IR systems for data workers, such as code-base search.
Abstract: Technological advancements have led to the increasing availability of erotic literature and pornographic novels online, which can be alluring to adolescents and children. Unfortunately, because of the inherent complexity of this indecent content and the sparseness of training data, detecting such readings in cyberspace is a challenging task, even though children can easily access them. In this study, we propose a novel framework, Joint Learning of Content and Human Attention (GoodMan), to identify indecent readings by augmenting natural language understanding models with large-scale human reading behaviors (dwell time per page) on portable devices. From the text modeling viewpoint, an innovative joint attention, trained by joint learning, is employed to orchestrate the content attention and human behavior attention via a BiGRU. From the data augmentation perspective, different users' reading behaviors on the same text can generate considerable training instances with joint attention, which is effective for addressing the cold-start problem. We conduct an extensive set of experiments on an online ebook dataset (with human reading behaviors on portable devices). The experimental results provide insights into the task and demonstrate the superiority of the proposed model over alternative solutions.
Abstract: Candidate generation is a critical task for recommendation systems and is technically challenging from two perspectives. On the one hand, a recommendation system requires comprehensive coverage of the candidates a user is interested in, yet typical deep user modeling approaches represent each user as a single vector, which can hardly capture a user's diverse interests. On the other hand, for the sake of practicability, the candidate generation process needs to be both accurate and efficient. Although existing "multi-channel structures", like memory networks, are more capable of representing a user's diverse interests, they may bring in substantial irrelevant candidates and lead to rapid growth in time cost. As a result, comprehensively acquiring a user's items of interest in a practical way remains a tough issue. In this work, a novel personalized candidate generation paradigm, Octopus, is proposed, which is remarkable for its comprehensiveness and elasticity. Similar to the conventional "multi-channel structures", Octopus also generates multiple vectors for the comprehensive representation of a user's diverse interests. However, Octopus's representation functions are formulated in a highly elastic way, whose scale and type are adaptively determined based on each user's individual background. Therefore, it not only identifies a user's items of interest comprehensively, but also rules out irrelevant candidates and helps maintain a feasible running cost. Extensive experiments are conducted with both industrial and publicly available datasets, where the effectiveness of Octopus is verified in comparison with state-of-the-art baseline approaches.
Abstract: Relevance is an essential concept in Information Retrieval (IR). Recent studies using brain imaging have significantly contributed to the understanding of this concept, but only as a binary notion, i.e., a document being judged as relevant or non-relevant. While such a binary division is prevalent in IR, seminal theories have proposed relevance as a graded variable, i.e., having different degrees. In this paper, we aim to investigate the brain activity associated with relevance when it is treated as a graded concept. Twenty-five participants provided graded relevance judgements in the context of a Question Answering (Q/A) task while being monitored with an electroencephalogram (EEG). Our findings show significant differences in event-related potentials (ERPs) in response to information segments processed in the context of high relevance, low relevance, and no relevance, supporting the concept of graded relevance. We speculate that differences in attentional engagement, semantic mismatch (between the question and answer), and memory processing underpin the electrophysiological responses to the graded relevance judgements. We believe our conclusions constitute an important step in unravelling the nature of graded relevance, and knowledge of the electrophysiological modulation for each grade of relevance will help to improve the design and evaluation of IR systems.
Abstract: In most real-world recommender systems, the observed rating data are subject to selection bias, and the data are thus missing-not-at-random. Developing a method to facilitate the learning of a recommender from such biased feedback is one of the most challenging problems, as it is widely known that naive approaches under selection bias often lead to suboptimal results. A well-established solution to the problem is propensity scoring. The propensity score is the probability of each record being observed, and unbiased performance estimation is possible by weighting each record by the inverse of its propensity. However, the performance of the propensity-based unbiased estimation approach is often affected by the choice of propensity estimation model or by the high-variance problem. To overcome these limitations, we propose a model-agnostic meta-learning method inspired by the asymmetric tri-training framework for unsupervised domain adaptation. The proposed method utilizes two predictors to generate data with reliable pseudo-ratings and another predictor to make the final predictions. In a theoretical analysis, a propensity-independent upper bound on the true performance metric is derived, and it is demonstrated that the proposed method can minimize this bound. We conduct comprehensive experiments using public real-world datasets. The results suggest that the previous propensity-based methods are largely affected by the choice of propensity model and by the variance problem caused by inverse propensity weighting. Moreover, we show that the proposed meta-learning method is robust to these issues and can facilitate the development of effective recommendations from biased explicit feedback.
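The propensity-weighting baseline that this work builds on can be stated in a few lines. In the toy numpy sketch below (all values invented), a naive average over observed entries is biased under missing-not-at-random feedback, while the inverse-propensity-scored (IPS) estimate is unbiased in expectation at the cost of higher variance:

.. code-block:: python

    import numpy as np

    y_true = np.array([5.0, 1.0, 4.0, 2.0])   # true ratings (illustrative)
    p      = np.array([0.8, 0.1, 0.6, 0.2])   # propensities: P(rating is observed)
    obs    = np.array([1,   0,   1,   1])     # which entries we actually see
    y_hat  = np.array([4.5, 2.0, 3.0, 2.5])   # some model's predictions

    err = (y_true - y_hat) ** 2
    naive = (obs * err).sum() / obs.sum()     # biased: overweights easy-to-observe data
    ips   = (obs * err / p).mean()            # unbiased in expectation, higher variance
    print(naive, ips)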
Abstract: Modeling users at large scale and modeling rare-interaction users are two major challenges in recommender systems, and they create big gaps between research and application. Faced with millions or even billions of users, it is hard to store and leverage personalized preferences with a user embedding matrix in real scenarios. Moreover, much research focuses on users with rich histories, while users with only one or a few interactions make up the biggest part of real systems. Previous studies make efforts to handle one of the above issues but rarely tackle the efficiency and cold-start problems together. In this work, a novel user preference representation called Preference Hash (PreHash) is proposed to model users at large scale, including rare-interaction ones. In PreHash, a series of buckets are generated based on users' historical interactions. Users with similar preferences, both warm and cold, are automatically assigned to the same buckets, and representations of the buckets are learned accordingly. Thanks to the designed hash buckets, only a limited number of parameters are stored, which saves a lot of memory and enables more efficient modeling. Furthermore, when a user makes new interactions, his/her buckets and representations are dynamically updated, which enables more effective understanding and modeling of the user. It is worth mentioning that PreHash is flexible enough to work with various recommendation algorithms by taking the place of their user embedding matrices. We combine it with multiple state-of-the-art recommendation methods and conduct various experiments. Comparative results on public datasets show that it not only improves recommendation performance but also significantly reduces the number of model parameters. To summarize, PreHash achieves significant improvements in both efficiency and effectiveness for recommender systems.
Abstract: Explanations have a large effect on how people respond to recommendations. However, there are many possible intentions a system may have in generating explanations for a given recommendation -- from increasing transparency, to enabling a faster decision, to persuading the recipient. As a good explanation for one goal may not be good for others, we address the questions of (1) how to robustly measure whether an explanation meets a given goal and (2) how the different goals interact with each other. Specifically, this paper presents a first proposal for how to measure the quality of explanations along seven common goal dimensions catalogued in the literature. We find that the seven goals are not independent, but rather exhibit strong structure. Proposing two novel explanation evaluation designs, we identify challenges in evaluation and provide more efficient measurement approaches for explanation quality.
Abstract: Information retrieval (IR) ranking models in production systems continually evolve in response to user feedback, insights from research, and new developments. Rather than investing all engineering resources to produce a single challenger to the existing system, a commercial provider might choose to explore multiple new ranking models simultaneously. However, even small changes to a complex model can have unintended consequences. In particular, the per-topic effectiveness profile is likely to change, and even when an overall improvement is achieved, gains are rarely observed for every query, introducing the risk that some users or queries may be negatively impacted by the new model if deployed into production. Risk adjustments that re-weight losses relative to gains and mitigate such behavior are available when making one-to-one system comparisons, but not for one-to-many or many-to-one comparisons. Moreover, no IR evaluation methodology integrates priors from previous or alternative rankers in a homogeneous inferential framework. In this work, we propose a Bayesian approach in which multiple challengers are compared to a single champion. We also show that risk can be incorporated, and demonstrate the benefits of doing so. Finally, we also consider the alternative scenario commonly encountered in academic research, in which a single challenger is compared against several previous champions.
Abstract: Replicability and reproducibility of experimental results are primary concerns in all areas of science, and IR is no exception. Besides the problem of moving the field towards more reproducible experimental practices and protocols, we also face a severe methodological issue: we do not have any means to assess whether an experiment has actually been replicated or reproduced. Moreover, we lack any reproducibility-oriented dataset that would allow us to develop such methods. To address these issues, we compare several measures that objectively quantify to what extent we have replicated or reproduced a system-oriented IR experiment. These measures operate at different levels of granularity, from the fine-grained comparison of ranked lists to the more general comparison of the obtained effects and significant differences. Moreover, we develop a reproducibility-oriented dataset, which allows us to validate our measures and which can also be used to develop future measures.
Abstract: For the offline evaluation of IR systems, some researchers have proposed to utilise pairwise document preference assessments instead of relevance assessments of individual documents, as it may be easier for assessors to make relative decisions rather than absolute ones. Simple preference-based evaluation measures such as ppref and wpref have been proposed, but the past decade did not see any wide use of such measures. One reason for this may be that, while these new measures have been reported to behave more or less similarly to traditional measures based on absolute assessments, whether they actually align with users' perceptions of search engine result pages (SERPs) has been unknown. The present study addresses exactly this question, after formally defining two classes of preference-based measures called Pref measures and Δ-measures. We show that the best of these measures perform at least as well as an average assessor in terms of agreement with users' SERP preferences, and that implicit document preferences (i.e., those suggested by a SERP that retrieves one document but not the other) play a much more important role than explicit preferences (i.e., those suggested by a SERP that retrieves one document above the other). We have released our dataset containing 119,646 document preferences, so that the feasibility of document preference-based evaluation can be further pursued by the IR community.
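As a reference point for preference-based evaluation, here is a simplified preference-agreement statistic in the spirit of ppref; this is a pedagogical reduction, not the exact ppref/wpref or Pref/Δ-measure definitions. It computes the fraction of judged pairs that a ranking orders correctly, treating unretrieved documents as ranked below all retrieved ones:

.. code-block:: python

    def pref_agreement(ranking, prefs):
        """Fraction of judged preference pairs (a preferred over b) that the
        ranking orders correctly; a simplified ppref-style statistic."""
        pos = {doc: r for r, doc in enumerate(ranking)}
        correct = sum(1 for a, b in prefs
                      if pos.get(a, 10**9) < pos.get(b, 10**9))
        return correct / len(prefs)

    ranking = ["d3", "d1", "d4", "d2"]
    prefs = [("d1", "d2"), ("d3", "d2"), ("d2", "d4")]  # assessor-stated preferences
    print(pref_agreement(ranking, prefs))               # 2 of 3 pairs honored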
Abstract: Following the success of Cranfield-like evaluation approaches in web search, web image search has also been evaluated with absolute judgments of (graded) relevance. However, recent research has found that collecting absolute relevance judgments may be difficult in image search scenarios due to the multi-dimensional nature of relevance for image results. Moreover, existing evaluation metrics based on absolute relevance judgments do not correlate well with search users' satisfaction perceptions in web image search. Unlike absolute relevance judgments, preference judgments do not require that relevance grades be pre-defined, i.e., how many levels to use and what those levels mean. Instead of considering each document in isolation, preference judgments consider a pair of documents and require judges to state their relative preference. Such preference judgments are usually more reliable than absolute judgments, since the presence of (at least) two items establishes a certain context. While preference judgments have been studied extensively for general web search, there exists no thorough investigation of how preference judgments and preference-based evaluation metrics can be used to evaluate web image search systems. Compared to general web search, web image search may be an even better fit for preference-based evaluation because of its grid-based presentation style. The limited need for fresh results in web image search also makes preference judgments more reusable than in general web search. In this paper, we provide a thorough comparison of variants of preference judgments for web image search. We find that, compared to strict preference judgments, weak preference judgments require less time and yield better inter-assessor agreement. We also study how the absolute relevance levels of two given images affect preference judgments between them. Furthermore, we propose a preference-based evaluation metric named Preference-Winning-Penalty (PWP) to evaluate and compare two different image search systems. The proposed PWP metric outperforms existing evaluation metrics based on absolute relevance judgments in terms of agreement with the system-level preferences of actual users.
Abstract: Evaluation metrics play an important role in the batch evaluation of IR systems. Based on a user model that describes how users interact with the ranked list, an evaluation metric is defined to link the relevance scores of a list of documents to an estimate of system effectiveness and user satisfaction. Therefore, the validity of an evaluation metric has two facets: whether the underlying user model can accurately predict user behavior, and whether the evaluation metric correlates well with user satisfaction. While a tremendous amount of work has been undertaken to design, evaluate, and compare different evaluation metrics, few studies have explored the consistency between these two facets of evaluation metrics. Specifically, we want to investigate whether metrics that are well calibrated with user behavior data can perform as well in estimating user satisfaction. To shed light on this research question, we compare the performance of various metrics within the C/W/L framework in estimating user satisfaction when they are optimized to fit observed user behavior. Experimental results on both self-collected and publicly available user search behavior datasets show that metrics optimized to fit users' click behavior can perform as well as those calibrated with user satisfaction feedback. We also investigate the reliability of the calibration process of evaluation metrics, to find out how much data is required for parameter tuning. Our findings provide empirical support for the consistency between user behavior modeling and satisfaction measurement, as well as guidance for tuning the parameters of evaluation metrics.
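A small sketch of how the C/W/L framework links a user model to a metric: a continuation probability C(i) at each rank induces examination probabilities and hence rank weights W(i). With a constant C(i) = p this yields RBP-style weights (normalised here over a truncated list, so the result is only RBP-like):

.. code-block:: python

    import numpy as np

    def cwl_expected_utility(rels, C):
        """C/W/L-style metric: C[i] is the probability of continuing past
        rank i; the weight of rank i is the (normalised) probability of
        ever examining it."""
        examine = np.cumprod(np.concatenate(([1.0], C[:-1])))  # reach rank i
        W = examine / examine.sum()
        return (W * rels).sum()

    rels = np.array([1.0, 0.0, 0.5, 1.0])   # relevance of the ranked documents
    p = 0.8                                 # constant continuation probability
    print(cwl_expected_utility(rels, np.full(len(rels), p)))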
Abstract: Recently, session search evaluation has received more attention, as a realistic search scenario usually involves multiple queries and interactions between users and systems. Evolved from model-based evaluation metrics for a single query, existing session-based metrics also follow a generic framework based on the cascade hypothesis. The cascade hypothesis assumes that lower-ranked search results and later-issued queries receive less attention from users and should therefore be assigned smaller weights when calculating evaluation metrics. This hypothesis has achieved much success in modeling search users' behavior and designing evaluation metrics, by explaining how users' attention decays on search engine result pages. However, recent studies have found that the recency effect also plays an important role in determining user satisfaction in search sessions. In particular, whether a user feels satisfied with the later-issued queries heavily influences his/her search satisfaction in the whole session. To incorporate both the cascade hypothesis and the recency effect into the design of session search evaluation metrics, we propose Recency-aware Session-based Metrics (RSMs), which simultaneously characterize users' examination process with a browsing model and their cognitive process with a utility accumulation model. With both self-constructed and publicly available user search behavior datasets, we show the effectiveness of the proposed RSMs by comparing them with existing session-based metrics in terms of correlation with user satisfaction. We also find that the influence of the cascade and recency effects varies dramatically among tasks with different difficulties and complexities, which suggests that we should use different model parameters for different types of search tasks. Our findings highlight the importance of investigating and utilizing cognitive effects besides examination hypotheses in search evaluation.
Abstract: Article 5(1)(c) of the European Union's General Data Protection Regulation (GDPR) requires that "personal data shall be [...] adequate, relevant, and limited to what is necessary in relation to the purposes for which they are processed ('data minimisation')". To date, the legal and computational definitions of 'purpose limitation' and 'data minimization' remain largely unclear. In particular, the interpretation of these principles is an open issue for information access systems that optimize for user experience through personalization and do not strictly require personal data collection for the delivery of basic service. In this paper, we identify a lack of a homogeneous interpretation of the data minimization principle and explore two operational definitions applicable in the context of personalization. The focus of our empirical study in the domain of recommender systems is on providing foundational insights about the (i) feasibility of different data minimization definitions, (ii) robustness of different recommendation algorithms to minimization, and (iii) performance of different minimization strategies. We find that the performance decrease incurred by data minimization might not be substantial, but that it might disparately impact different users -- a finding which has implications for the viability of different formal minimization definitions. Overall, our analysis uncovers the complexities of the data minimization problem in the context of personalization and maps the remaining computational and regulatory challenges.
Abstract: The rapid growth of e-commerce has made people accustomed to shopping online. Before making purchases on e-commerce websites, most consumers tend to rely on rating scores and review information to make purchase decisions. With this information, they can infer the quality of products to reduce the risk of purchase. Specifically, items with high rating scores and good reviews tend to be less risky, while items with low rating scores and bad reviews might be risky to purchase. On the other hand, purchase behaviors are also influenced by consumers' tolerance of risk, known as their risk attitudes. Economists have studied risk attitudes for decades. These studies reveal that people are not always rational when making decisions, and their risk attitudes may vary in different circumstances. Most existing work on recommender systems does not consider users' risk attitudes in modeling, which may lead to inappropriate recommendations. For example, suggesting a risky item to a risk-averse person or a conservative item to a risk-seeking person may degrade the user experience. In this paper, we propose a novel risk-aware recommendation framework that integrates machine learning and behavioral economics to uncover the risk mechanism behind users' purchasing behaviors. Concretely, we first develop statistical methods to estimate the risk distribution of each item and then draw on the Nobel-prize-winning Prospect Theory to model how users choose among probabilistic alternatives that involve risk, where the probabilities of the outcomes are uncertain. Experiments on several e-commerce datasets demonstrate that by taking user risk preferences into consideration, our approach can achieve better performance than many classical recommendation approaches, and further analyses also verify the advantages of risk-aware recommendation beyond accuracy.
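Since the abstract above builds on Prospect Theory, a short sketch of the standard Kahneman-Tversky functional forms may help. The parameter values are the canonical 1992 estimates and the toy prospect is our own illustration, not taken from the paper:

.. code-block:: python

    import numpy as np

    ALPHA, BETA = 0.88, 0.88   # diminishing sensitivity for gains / losses
    LAMBDA = 2.25              # loss aversion: losses loom larger than gains
    GAMMA = 0.61               # probability weighting curvature

    def value(x):
        """Prospect Theory value function: concave on gains, loss-averse."""
        x = np.asarray(x, dtype=float)
        return np.where(x >= 0, x ** ALPHA, -LAMBDA * (-x) ** BETA)

    def weight(p):
        """Inverse-S probability weighting: overweights small probabilities."""
        p = np.asarray(p, dtype=float)
        return p ** GAMMA / (p ** GAMMA + (1 - p) ** GAMMA) ** (1 / GAMMA)

    def prospect_utility(outcomes, probs):
        """Subjective utility of a risky prospect (simple, non-cumulative form)."""
        return float(np.sum(weight(probs) * value(outcomes)))

    # A risky item: 90% chance the buyer is happy (+1), 10% badly disappointed (-1).
    print(prospect_utility([1.0, -1.0], [0.9, 0.1]))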
Abstract: Factorization machines (FMs) have been widely adopted to model discrete feature interactions in recommender systems. Despite their great success, there is currently no study of their robustness to discrete adversarial perturbations: can modifying a certain number of the discrete input features dramatically change an FM's prediction? Although there exist robust training methods for FMs, they neglect the discrete property of the input features and lack an effective mechanism to verify model robustness. In our work, we propose the first method for certifying the robustness of factorization machines with respect to discrete perturbations of input features. If an instance is certifiably robust, it is guaranteed to be robust (within the considered space) no matter what the perturbations and attack models are. Likewise, we provide non-robustness certificates via the existence of discrete adversarial perturbations that change the FM's prediction. Through such robustness certificates, we show that FMs and the current robust training methods are vulnerable to discrete adversarial perturbations. This vulnerability makes the outcome unreliable and restricts the application of FMs. To enhance the FM's robustness against such perturbations, we present a robust training procedure whose core idea is to increase the number of instances that are certifiably robust. Extensive experiments on three real-world datasets demonstrate that our method significantly enhances the robustness of factorization machines with little impact on predictive accuracy.
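For context, a second-order FM scores an instance as y(x) = w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j. The sketch below shows this prediction plus a brute-force single-flip probe; it only illustrates the attack surface, not the paper's certification method, and all names are ours:

.. code-block:: python

    import numpy as np

    def fm_predict(x, w0, w, V):
        """Second-order factorization machine, using the O(nk) identity
        for the pairwise interaction term."""
        linear = w0 + x @ w
        xv = x @ V                                       # shape (k,)
        pairwise = 0.5 * (xv ** 2 - (x ** 2) @ (V ** 2)).sum()
        return linear + pairwise

    def flips_prediction(x, i, w0, w, V):
        """Does toggling one binary feature change the score's sign?
        Exhausting single-bit flips is a brute-force (non-certified)
        robustness probe; the paper derives certificates instead."""
        x2 = x.copy(); x2[i] = 1.0 - x2[i]
        return np.sign(fm_predict(x, w0, w, V)) != np.sign(fm_predict(x2, w0, w, V))

    rng = np.random.default_rng(0)
    n, k = 10, 4
    x = (rng.random(n) < 0.3).astype(float)
    w0, w, V = 0.1, rng.normal(size=n), rng.normal(scale=0.1, size=(n, k))
    print([flips_prediction(x, i, w0, w, V) for i in range(n)])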
Abstract: Rankings are the primary interface through which many online platforms match users to items (e.g. news, products, music, video). In these two-sided markets, not only do the users draw utility from the rankings, but the rankings also determine the utility (e.g. exposure, revenue) for the item providers (e.g. publishers, sellers, artists, studios). It has already been noted that myopically optimizing utility to the users -- as done by virtually all learning-to-rank algorithms -- can be unfair to the item providers. We therefore present a learning-to-rank approach for explicitly enforcing merit-based fairness guarantees for groups of items (e.g. articles by the same publisher, tracks by the same artist). In particular, we propose a learning algorithm that ensures notions of amortized group fairness while simultaneously learning the ranking function from implicit feedback data. The algorithm takes the form of a controller that integrates unbiased estimators for both fairness and utility, dynamically adapting both as more data becomes available. In addition to its rigorous theoretical foundation and convergence guarantees, we find empirically that the algorithm is highly practical and robust.
Abstract: Truthfulness judgments are a fundamental step in the process of fighting misinformation, as they are crucial to train and evaluate classifiers that automatically distinguish true and false statements. Usually such judgments are made by experts, like journalists for political statements or medical doctors for medical statements. In this paper, we follow a different approach and rely on (non-expert) crowd workers. This of course leads to the following research question: Can crowdsourcing be reliably used to assess the truthfulness of information and to create large-scale labeled collections for information credibility systems? To address this issue, we present the results of an extensive study based on crowdsourcing: we collect thousands of truthfulness assessments over two datasets, and we compare expert judgments with crowd judgments, expressed on scales with various granularity levels. We also measure the political bias and the cognitive background of the workers, and quantify their effect on the reliability of the data provided by the crowd.
Abstract: Recommendation algorithms typically build models based on user-item interactions (e.g., clicks, likes, or ratings) to provide a personalized ranked list of items. These interactions are often distributed unevenly over different groups of items due to varying user preferences. However, we show that recommendation algorithms can inherit or even amplify this imbalanced distribution, leading to item under-recommendation bias. Concretely, we formalize the concepts of ranking-based statistical parity and equal opportunity as two measures of item under-recommendation bias. Then, we empirically show that one of the most widely adopted algorithms -- Bayesian Personalized Ranking -- produces biased recommendations, which motivates our effort to propose a novel debiased personalized ranking model. The debiased model improves the two proposed bias metrics while preserving recommendation performance. Experiments on three public datasets show strong bias reduction of the proposed model versus state-of-the-art alternatives.
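As a simplified illustration of what a ranking-based parity measure can look like (our own reading, not the paper's exact formulation): compare each item group's share of top-k recommendation slots with its share of the catalog, and treat a gap as under-recommendation.

.. code-block:: python

    import numpy as np

    def topk_slot_shares(rec_lists, item_group, n_groups=2, k=10):
        """Share of top-k recommendation slots received by each item group.
        rec_lists  : one ranked list of item ids per user.
        item_group : dict mapping item id -> group index.
        Under this reading of ranking-based statistical parity, slot
        shares should match the groups' catalog shares; conditioning on
        relevant items instead yields an equal-opportunity variant.
        """
        counts = np.zeros(n_groups)
        for ranking in rec_lists:
            for item in ranking[:k]:
                counts[item_group[item]] += 1
        return counts / counts.sum()

    group = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1}     # groups hold 40% / 60% of items
    recs = [[0, 1, 2], [0, 2, 3], [1, 0, 4]]
    print(topk_slot_shares(recs, group, k=2))  # [0.83, 0.17]: group 0 over-served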
Abstract: From 2017 to 2019 the Text REtrieval Conference (TREC) held a challenge task on precision medicine using documents from medical publications (PubMed) and clinical trials. Despite the many performance measurements carried out in these evaluation campaigns, the scientific community is still rather unsure about the impact individual system features and their weights have on the overall system performance. In order to close this explanatory gap, we first determined optimal feature configurations using the Sequential Model-based Algorithm Configuration (SMAC) program and applied its output to a BM25-based search engine. We then ran an ablation study to systematically assess the individual contributions of relevant system features: BM25 parameters, query type and weighting schema, query expansion, stop word filtering, and keyword boosting. For evaluation, we employed the gold standard data from the three TREC Precision Medicine (TREC-PM) installments to assess the effectiveness of different features using the commonly shared infNDCG metric.
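For reference, the BM25 parameters ablated above are the usual k1 (term-frequency saturation) and b (document length normalization). A plain sketch of the scoring function, with toy collection statistics of our own invention:

.. code-block:: python

    import math

    def bm25_term(tf, df, doc_len, avg_doc_len, n_docs, k1=1.2, b=0.75):
        """Contribution of one query term to the BM25 score; k1 and b are
        the parameters tuned (among other features) in the ablation study."""
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        norm = tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_doc_len))
        return idf * norm

    def bm25(query_terms, doc_tf, doc_len, avg_doc_len, n_docs, df):
        return sum(
            bm25_term(doc_tf.get(t, 0), df[t], doc_len, avg_doc_len, n_docs)
            for t in query_terms if t in df
        )

    df = {"melanoma": 120, "braf": 40}   # toy document frequencies
    print(bm25(["melanoma", "braf"], {"melanoma": 3, "braf": 1},
               doc_len=250, avg_doc_len=300, n_docs=10000, df=df))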
Abstract: Counterfactual Learning to Rank (LTR) algorithms learn a ranking model from logged user interactions, often collected using a production system. Employing such an offline learning approach has many benefits compared to an online one, but it is challenging as user feedback often contains high levels of bias. Unbiased LTR uses Inverse Propensity Scoring (IPS) to enable unbiased learning from logged user interactions. One of the major difficulties in applying Stochastic Gradient Descent (SGD) approaches to counterfactual learning problems is the large variance introduced by the propensity weights. In this paper, we show that the convergence rate of SGD approaches with IPS-weighted gradients suffers from this variance: convergence is slow, especially when the IPS weights are large. To overcome this limitation, we propose a novel learning algorithm, called CounterSample, that has provably better convergence than standard IPS-weighted gradient descent methods. We prove that CounterSample converges faster and complement our theoretical findings with empirical results from extensive experimentation in a number of biased LTR scenarios -- across optimizers, batch sizes, and different degrees of position bias.
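To make the variance problem concrete, here is a toy IPS-weighted SGD update. This is our own illustration of the baseline that CounterSample improves upon; the loss and data are stand-ins, not the paper's:

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 5, 100
    theta = np.zeros(d)
    X = rng.normal(size=(n, d))             # features of clicked documents
    rho = rng.uniform(0.05, 1.0, size=n)    # examination propensities

    def grad(theta, x):
        # gradient of a toy squared loss; a stand-in for the LTR loss
        return -(1.0 - theta @ x) * x

    # IPS-weighted SGD: each gradient is scaled by 1 / propensity, so the
    # few clicks with tiny propensities dominate the updates -- this is
    # the variance problem that slows convergence.
    for x, r in zip(X, rho):
        theta -= 0.01 * grad(theta, x) / r

    weights = 1.0 / rho
    print("max/mean IPS weight:", weights.max() / weights.mean())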
Abstract: Search result diversification aims to retrieve diverse results that cover as many subtopics related to the query as possible. Recent studies have shown that supervised diversification models are able to outperform heuristic approaches by automatically learning a diversification function rather than using manually designed score functions. The main challenge in training a diversification model is the lack of high-quality training samples. Because the ranker must model dependencies between documents, it is very hard for training algorithms to select effective positive and negative ranking lists to train a reliable ranking model, given a large number of candidate documents within which different documents are relevant to different subtopics. To tackle this problem, we propose a supervised diversification framework based on Generative Adversarial Networks (GANs). It consists of a generator and a discriminator interacting with each other in a minimax game. Specifically, the generator generates more confusing negative samples for the discriminator, and the discriminator sends back complementary ranking signals to the generator. Furthermore, we explicitly exploit subtopics in the generator, while focusing on modeling document similarity in the discriminator. Through such a minimax game, we are able to obtain better ranking models by combining ranking signals learned by the generator and the discriminator. Experimental results on the TREC Web Track dataset show that the proposed method significantly outperforms existing diversification methods.
Abstract: Counterfactual Learning to Rank (LTR) methods optimize ranking systems using logged user interactions that contain interaction biases. Existing methods are only unbiased if users are presented with all relevant items in every ranking, and no counterfactual unbiased LTR method currently exists for top-k rankings. We introduce a novel policy-aware counterfactual estimator for LTR metrics that can account for the effect of a stochastic logging policy. We prove that the policy-aware estimator is unbiased if every relevant item has a non-zero probability of appearing in the top-k ranking. Our experimental results show that the performance of our estimator is not affected by the size of k: for any k, the policy-aware estimator reaches the same retrieval performance while learning from top-k feedback as when learning from feedback on the full ranking. Lastly, we introduce novel extensions of traditional LTR methods to perform counterfactual LTR and to optimize top-k metrics. Together, our contributions introduce the first policy-aware unbiased LTR approach that learns from top-k feedback and optimizes top-k metrics. As a result, counterfactual LTR is now applicable to the very prevalent top-k ranking setting in search and recommendation.
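A sketch of the policy-aware idea as we read it: the propensity of observing a document marginalizes over the logging policy's rankings and is zero only if the document never enters the top-k. The function name and the examination model below are illustrative assumptions:

.. code-block:: python

    def policy_aware_propensity(doc, rankings, probs, examine, k):
        """P(doc is observed) under a stochastic logging policy:
        sum over rankings of P(ranking) * P(examined at doc's rank),
        where positions beyond the top-k are never examined."""
        rho = 0.0
        for ranking, p in zip(rankings, probs):
            topk = ranking[:k]
            if doc in topk:
                rho += p * examine[topk.index(doc)]
        return rho

    examine = [1.0, 0.7]                      # position-based examination, k=2
    rankings = [[0, 1, 2, 3], [2, 3, 0, 1]]   # support of the logging policy
    probs = [0.6, 0.4]
    # Doc 3 reaches the top-2 only under the second ranking (at rank 2),
    # so its propensity is 0.4 * 0.7 = 0.28; a click on doc 3 would be
    # counted with weight 1 / 0.28 in the counterfactual estimate.
    print(policy_aware_propensity(3, rankings, probs, examine, k=2))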
Abstract: In learning-to-rank for information retrieval, a ranking model is automatically learned from the data and then utilized to rank sets of retrieved documents. Therefore, an ideal ranking model would be a mapping from a document set to a permutation of that set, and should satisfy two critical requirements: (1) it should be able to model cross-document interactions so as to capture the local context information in a query; (2) it should be permutation-invariant, meaning that any permutation of the input documents would not change the output ranking. Previous studies on learning-to-rank either design uni-variate scoring functions that score each document separately, and thus fail to model cross-document interactions, or construct multivariate scoring functions that score documents sequentially, which inevitably sacrifices the permutation-invariance requirement. In this paper, we propose a neural learning-to-rank model called SetRank which directly learns a permutation-invariant ranking model defined on document sets of any size. SetRank employs a stack of (induced) multi-head self-attention blocks as its key component for jointly learning the embeddings of all retrieved documents. The self-attention mechanism not only helps SetRank capture local context information from cross-document interactions, but also learns permutation-equivariant representations for the input documents, thereby achieving a permutation-invariant ranking model. Experimental results on three benchmarks show that SetRank significantly outperforms baselines including traditional learning-to-rank models and state-of-the-art neural IR models.
Abstract: This paper concerns reinforcement learning (RL) of document ranking models for information retrieval (IR). One branch of RL approaches to ranking formalizes the ranking process as a Markov decision process (MDP) and determines the model parameters with policy gradient. Though preliminary success has been shown, these approaches are still far from achieving their full potential. Existing policy gradient methods directly utilize the absolute performance scores (returns) of the sampled document lists in their gradient estimates, which causes two limitations: 1) they fail to reflect the relative goodness of documents within the same query, which is usually close to the nature of IR ranking; 2) they generate high-variance gradient estimates, resulting in slow learning and low ranking accuracy. To deal with these issues, we propose a novel policy gradient algorithm in which the gradients are determined using pairwise comparisons of two document lists sampled within the same query. The algorithm, referred to as Pairwise Policy Gradient (PPG), repeatedly samples pairs of document lists, estimates the gradients with pairwise comparisons, and finally updates the model parameters. Theoretical analysis shows that PPG produces unbiased and low-variance gradient estimates. Experimental results demonstrate performance gains over state-of-the-art baselines in search result diversification and text retrieval.
Abstract: Community question-answering (CQA) has been established as a prominent web service enabling users to post questions and get answers from the community. Product Question Answering (PQA) is a special CQA framework where questions are asked (and answered) in the context of a specific product. Naturally, humorous questions are an integral part of such platforms, especially as some products attract humor due to their unreasonable price or their peculiar functionality, or in cases where users emphasize their critical point of view through humor. Detecting humorous questions in such systems is important for sellers, to better understand user engagement with their products. It is also important to signal to users the flippancy of humorous questions, and that answers to such questions should be taken with a grain of salt. In this study we present a deep-learning framework for detecting humorous questions in PQA systems. Our framework utilizes two properties of the questions - Incongruity and Subjectivity - demonstrating their contribution to humor detection. We evaluate our framework on a real-world dataset, demonstrating an accuracy of 90.8%, up to an 18.3% relative improvement over baseline methods. We then demonstrate the existence of product bias in PQA platforms, where some products attract more humorous questions than others. A classifier trained on unbiased data is outperformed by the biased classifier; however, it excels at differentiating between humorous and non-humorous questions that relate to the same product. To the best of our knowledge, this work is the first to detect humor in a PQA setting.
Abstract: In precision-oriented tasks like answer ranking, it is more important to rank many relevant answers highly than to retrieve all relevant answers. It follows that a good ranking strategy would be to learn how to identify the easiest correct answers first (i.e., assign a high ranking score to answers that have characteristics that usually indicate relevance, and a low ranking score to those with characteristics that do not), before incorporating more complex logic to handle difficult cases (e.g., semantic matching or reasoning). In this work, we apply this idea to the training of neural answer rankers using curriculum learning. We propose several heuristics to estimate the difficulty of a given training sample. We show that the proposed heuristics can be used to build a training curriculum that down-weights difficult samples early in the training process. As the training process progresses, our approach gradually shifts to weighting all samples equally, regardless of difficulty. We present a comprehensive evaluation of our proposed idea on three answer ranking datasets. Results show that our approach leads to superior performance of two leading neural ranking architectures, namely BERT and ConvKNRM, using both pointwise and pairwise losses. When applied to a BERT-based ranker, our method yields up to a 4% improvement in MRR and a 9% improvement in P@1 (compared to the model trained without a curriculum). This results in models that can achieve comparable performance to more expensive state-of-the-art techniques.
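A minimal sketch of a curriculum weighting schedule consistent with the description above: hard samples are down-weighted early, and all weights converge to 1 as training progresses. The linear schedule and difficulty scores are our own illustrative choices; the paper's difficulty heuristics are not reproduced here:

.. code-block:: python

    def curriculum_weight(difficulty, step, total_steps):
        """Sample weight that anneals from easiness-based to uniform.
        difficulty in [0, 1] (1 = hardest), estimated by some heuristic,
        e.g. how poorly a cheap ranker separates the training pair."""
        progress = min(step / total_steps, 1.0)
        return (1.0 - difficulty) * (1.0 - progress) + progress

    for step in (0, 500, 1000):
        easy = curriculum_weight(0.1, step, 1000)
        hard = curriculum_weight(0.9, step, 1000)
        print(f"step {step:4d}: easy={easy:.2f} hard={hard:.2f}")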
Abstract: Conversational search is one of the ultimate goals of information retrieval. Recent research approaches conversational search by simplified settings of response ranking and conversational question answering, where an answer is either selected from a given candidate set or extracted from a given passage. These simplifications neglect the fundamental role of retrieval in conversational search. To address this limitation, we introduce an open-retrieval conversational question answering (ORConvQA) setting, where we learn to retrieve evidence from a large collection before extracting answers, as a further step towards building functional conversational search systems. We create a dataset, OR-QuAC, to facilitate research on ORConvQA. We build an end-to-end system for ORConvQA, featuring a retriever, a reranker, and a reader that are all based on Transformers. Our extensive experiments on OR-QuAC demonstrate that a learnable retriever is crucial for ORConvQA. We further show that our system can make a substantial improvement when we enable history modeling in all system components. Moreover, we show that the reranker component contributes to the model performance by providing a regularization effect. Finally, further in-depth analyses are performed to provide new insights into ORConvQA.
Abstract: At LinkedIn, we want to create economic opportunity for everyone in the global workforce. A critical aspect of this goal is matching jobs with qualified applicants. To improve hiring efficiency and reduce the need to manually screen each applicant, we developed a new product where recruiters can ask screening questions online so that they can easily filter qualified candidates. To add screening questions to all 20M active jobs at LinkedIn, we propose a new task that aims to automatically generate screening questions for a given job posting. To solve this task, we develop a two-stage deep learning model called Job2Questions, where we apply a deep learning model to detect intent from the text description, and then rank the detected intents by their importance based on other contextual features. Since this is a new product with no historical data, we employ deep transfer learning to train complex models with limited training data. We launched the screening question product and our AI models to LinkedIn users and observed significant impact in the job marketplace. During our online A/B test, we observed a +53.10% screening question suggestion acceptance rate, +22.17% job coverage, +190% recruiter-applicant interaction, and a +11 Net Promoter Score. In sum, the deployed Job2Questions model helps recruiters find qualified applicants and job seekers find jobs they are qualified for.
Abstract: Community Question Answering (CQA) has become a primary means for people to acquire knowledge, where people are free to ask questions or submit answers. To enhance the efficiency of the service, similar question identification has become a core task in CQA, which aims to find a similar question from the archived repository whenever a new question is asked. However, it has long been a challenge to properly measure the similarity between two questions due to the inherent variation of natural language, i.e., there could be different ways to ask the same question, or different questions sharing similar expressions. To alleviate this problem, it is natural to involve the existing answers to enrich the archived questions. Traditional methods typically take a one-side usage, which leverages the answer as an expanded representation of the corresponding question. Unfortunately, this may introduce unexpected noise into the similarity computation, since answers are often long and diverse, leading to inferior performance. In this work, we propose a two-side usage, which leverages the answer as a bridge between the two questions. The key idea is based on our observation that similar questions can be addressed by similar parts of the answer while different questions may not. In other words, we can compare the matching patterns of the two questions over the same answer to measure their similarity. In this way, we propose a novel matching-over-matching model, named Match2, which compares the matching patterns between two question-answer pairs for similar question identification. Empirical experiments on two benchmark datasets demonstrate that our model can significantly outperform previous state-of-the-art methods on the similar question identification task.
Abstract: Many E-commerce sites now offer product-specific question answering platforms for users to communicate with each other by posting and answering questions during online shopping. However, the multiple answers provided by ordinary users usually vary diversely in their qualities and thus need to be appropriately ranked for each question to improve user satisfaction. It can be observed that product reviews usually provide useful information for a given question, and thus can assist the ranking process. In this paper, we investigate the answer ranking problem for product-related questions, with the relevant reviews treated as auxiliary information that can be exploited for facilitating the ranking. We propose an answer ranking model named MUSE which carefully models multiple semantic relations among the question, answers, and relevant reviews. Specifically, MUSE constructs a multi-semantic relation graph with the question, each answer, and each review snippet as nodes. Then a customized graph convolutional neural network is designed for explicitly modeling the semantic relevance between the question and answers, the content consistency among answers, and the textual entailment between answers and reviews. Extensive experiments on real-world E-commerce datasets across three product categories show that our proposed model achieves superior performance on the concerned answer ranking task.
Abstract: Most ranking models are trained only with displayed items (mostly hot items), but they are utilized to retrieve items in the entire space, which consists of both displayed and non-displayed items (mostly long-tail items). Due to this sample selection bias, the long-tail items lack sufficient records to learn good feature representations, i.e., they suffer from data sparsity and cold-start problems. The resulting distribution discrepancy between displayed and non-displayed items causes poor long-tail performance. To this end, we propose an entire space adaptation model (ESAM) to address this problem from the perspective of domain adaptation (DA). ESAM regards displayed and non-displayed items as the source and target domains, respectively. Specifically, we design an attribute correlation alignment that considers the correlation between high-level attributes of the items to achieve distribution alignment. Furthermore, we introduce two effective regularization strategies, i.e., center-wise clustering and self-training, to improve the DA process. Without requiring any auxiliary information or auxiliary domains, ESAM transfers knowledge from displayed items to non-displayed items to alleviate the distribution inconsistency. Experiments on two public datasets and a large-scale industrial dataset collected from Taobao demonstrate that ESAM achieves state-of-the-art performance, especially in the long-tail space. Besides, we deploy ESAM to the Taobao search engine, leading to significant improvement in online performance. The code is available at https://github.com/A-bone1/ESAM.git.
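The attribute correlation alignment can be pictured as a second-order-statistics penalty (in the spirit of CORAL) between displayed and non-displayed items. This is our sketch of the idea; the paper's exact formulation may differ:

.. code-block:: python

    import torch

    def correlation_alignment_loss(src_feat, tgt_feat):
        """Align the attribute-correlation structure of displayed (source)
        and non-displayed (target) items by matching the covariance
        matrices of their high-level features."""
        def cov(f):
            f = f - f.mean(dim=0, keepdim=True)
            return f.t() @ f / (f.shape[0] - 1)
        d = src_feat.shape[1]
        return ((cov(src_feat) - cov(tgt_feat)) ** 2).sum() / (4 * d * d)

    src = torch.randn(128, 32)   # embeddings of displayed (hot) items
    tgt = torch.randn(128, 32)   # embeddings of non-displayed (long-tail) items
    print(correlation_alignment_loss(src, tgt))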
Abstract: Pretrained contextualized language models such as BERT have achieved impressive results on various natural language processing benchmarks. Benefiting from multiple pretraining tasks and large-scale training corpora, pretrained models can capture complex syntactic word relations. In this paper, we use the deep contextualized language model BERT for the task of ad hoc table retrieval. We investigate how to encode table content considering the table structure and the input length limit of BERT. We also propose an approach that incorporates features from prior literature on table retrieval and jointly trains them with BERT. In experiments on public datasets, we show that our best approach outperforms the previous state-of-the-art method and BERT baselines by a large margin under different evaluation metrics.
Abstract: Edit-distance-based string similarity search has many applications such as spell correction, data de-duplication, and sequence alignment. However, computing edit distance is known to have high complexity, which makes string similarity search challenging for large datasets. In this paper, we propose a deep learning pipeline (called CNN-ED) that embeds edit distance into Euclidean distance for fast approximate similarity search. A convolutional neural network (CNN) is used to generate fixed-length vector embeddings for a dataset of strings and the loss function is a combination of the triplet loss and the approximation error. To justify our choice of using CNN instead of other structures (e.g., RNN) as the model, theoretical analysis is conducted to show that some basic operations in our CNN model preserve edit distance. Experimental results show that CNN-ED outperforms data-independent CGK embedding and RNN-based GRU embedding in terms of both accuracy and efficiency by a large margin. We also show that string similarity search can be significantly accelerated using CNN-based embeddings, sometimes by orders of magnitude.
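The loss described above combines a triplet term with an approximation term; a hedged sketch in PyTorch (the weighting and the exact distance targets are our assumptions):

.. code-block:: python

    import torch
    import torch.nn.functional as F

    def cnn_ed_loss(f_a, f_p, f_n, ed_ap, ed_an, margin=1.0, alpha=0.1):
        """Triplet loss over anchor/positive/negative embeddings, plus an
        approximation term pulling embedded Euclidean distances toward
        the true edit distances."""
        d_ap = F.pairwise_distance(f_a, f_p)
        d_an = F.pairwise_distance(f_a, f_n)
        triplet = F.relu(d_ap - d_an + margin).mean()
        approx = ((d_ap - ed_ap).abs() + (d_an - ed_an).abs()).mean()
        return triplet + alpha * approx

    # f_* would come from the CNN over encoded strings; random stand-ins here.
    f_a, f_p, f_n = (torch.randn(8, 64) for _ in range(3))
    ed_ap = torch.randint(1, 5, (8,)).float()    # ED(anchor, positive)
    ed_an = torch.randint(5, 12, (8,)).float()   # ED(anchor, negative)
    print(cnn_ed_loss(f_a, f_p, f_n, ed_ap, ed_an))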
Abstract: Motivated by the success of generative adversarial networks (GANs) in various domains including information retrieval, we propose a novel signed network embedding framework, ASiNE, which represents each node of a given signed network as a low-dimensional vector based on adversarial learning. To do this, we first design a generator G+ and a discriminator D+ that consider positive edges, as well as a generator G- and a discriminator D- that consider negative edges: (1) G+/G- aim to generate the most indistinguishable fake positive/negative edges, respectively; (2) D+/D- aim to discriminate between real positive/negative edges and fake positive/negative edges, respectively. Furthermore, under ASiNE, we propose two new strategies for effective signed network embedding: (1) an embedding space sharing strategy for learning both positive and negative edges; (2) a fake edge generation strategy based on balance theory. Through extensive experiments using five real-life signed networks, we verify the effectiveness of each of the strategies employed in ASiNE. We also show that ASiNE consistently and significantly outperforms all the state-of-the-art signed network embedding methods on all datasets and with all metrics in terms of the accuracy of sign prediction.
Abstract: Graph queries have emerged as one of the fundamental techniques to support modern search services, such as PageRank web search, social networking search and knowledge graph search. As such graphs are maintained globally and are very large (e.g., billions of nodes), we need to efficiently process graph queries across multiple geographically distributed datacenters, i.e., run geo-distributed graph queries. Existing graph computing frameworks may not work well for geographically distributed datacenters, because they implement a Bulk Synchronous Parallel model that requires excessive inter-datacenter transfers, thereby introducing extremely large latency for query processing. In this paper, we propose GeoGraph, a universal framework to support efficient geo-distributed graph query processing based on clustering datacenters and a meta-graph, while reducing the inter-datacenter communication. Our new framework can be applied to many types of graph algorithms without any modification. The framework is developed on top of Apache Giraph. The experiments were conducted with four important graph queries, i.e., shortest path, graph keyword search, subgraph isomorphism and PageRank. The evaluation results show that our proposed framework can achieve up to 82% faster convergence, 42% lower WAN bandwidth usage, and 45% less total monetary cost for the four graph queries, with input graphs stored across ten geo-distributed datacenters.
Abstract: In location-based services, such as navigation and ride-hailing, matching a query with Points-of-Interest (POIs) is an essential function for efficient destination retrieval. Indeed, due to space limits and real-time requirements, such services usually require intermediate POI matching results when only partial search keywords have been typed. While there are numerous retrieval models for general textual semantic matching, few attempts have been made for query-POI matching that consider the integration of rich spatio-temporal factors and dynamic user preferences. To this end, in this paper, we develop a spatio-temporal dual graph attention network (STDGAT), which can jointly model dynamic situational context and users' sequential behaviors for intelligent query-POI matching. Specifically, we first utilize a semantic representation block to model semantic correlations among incomplete texts as well as various spatio-temporal factors captured by location and time. Next, we propose a novel dual graph attention network to capture two types of query-POI relevance, where one models global query-POI interactions and the other models time-evolving user preferences over destination POIs. Moreover, we also incorporate spatio-temporal factors into the dual graph attention network so that the query-POI relevance can generalize to sophisticated situational contexts. After that, a pairwise fusion strategy is introduced to extract salient global feature representations for both queries and POIs. Finally, several cold-start strategies and training methods are proposed to improve matching effectiveness and training efficiency. Extensive experiments on two real-world datasets demonstrate the performance of our approach compared with state-of-the-art baselines. The results show that our model achieves significant improvement in matching accuracy even when only partial query keywords are given.
Abstract: Graph Convolution Network (GCN) has become the new state-of-the-art for collaborative filtering. Nevertheless, the reasons for its effectiveness in recommendation are not well understood. Existing work that adapts GCN to recommendation lacks thorough ablation analyses of GCN, which was originally designed for graph classification tasks and is equipped with many neural network operations. We empirically find that the two most common designs in GCNs -- feature transformation and nonlinear activation -- contribute little to the performance of collaborative filtering. Even worse, including them adds to the difficulty of training and degrades recommendation performance. In this work, we aim to simplify the design of GCN to make it more concise and appropriate for recommendation. We propose a new model named LightGCN, which includes only the most essential component of GCN -- neighborhood aggregation -- for collaborative filtering. Specifically, LightGCN learns user and item embeddings by linearly propagating them on the user-item interaction graph, and uses the weighted sum of the embeddings learned at all layers as the final embedding. Such a simple, linear, and neat model is much easier to implement and train, exhibiting substantial improvements (about 16.0% relative improvement on average) over Neural Graph Collaborative Filtering (NGCF) -- a state-of-the-art GCN-based recommender model -- under exactly the same experimental setting. Further analyses are provided on the rationality of the simple LightGCN from both analytical and empirical perspectives.
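Because LightGCN's propagation rule is fully linear, it fits in a few lines. A sketch on a toy user-item graph, here with uniform layer weights (the toy data and variable names are ours):

.. code-block:: python

    import numpy as np

    def lightgcn_embeddings(A_hat, E0, n_layers=3):
        """LightGCN propagation: no feature transform, no nonlinearity.
        E^{k+1} = A_hat @ E^{k}; the final embedding combines the
        embeddings of all layers (uniform weights in this sketch)."""
        layers = [E0]
        for _ in range(n_layers):
            layers.append(A_hat @ layers[-1])
        return np.mean(layers, axis=0)

    # Toy bipartite interaction graph: 2 users x 3 items.
    R = np.array([[1, 1, 0], [0, 1, 1]], dtype=float)
    A = np.block([[np.zeros((2, 2)), R], [R.T, np.zeros((3, 3))]])
    deg = A.sum(axis=1)
    A_hat = A / np.sqrt(np.outer(deg, deg))   # symmetric normalization
    E0 = np.random.default_rng(0).normal(size=(5, 8))
    E = lightgcn_embeddings(A_hat, E0)
    print(E[:2] @ E[2:].T)   # user-item scores via inner products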
Abstract: Group recommendation aims to suggest preferred items to a group of users rather than to an individual user. Most existing methods for group recommendation directly learn the inherent interests of groups and users or the inherent features of items, i.e., they independently model the inherent embeddings of groups, users or items. However, this independent view severely suffers from the cold-start problem when making recommendations for occasional groups that are temporarily formed by a set of users and have few interactions with items. In fact, groups, users and items are interdependent because they interact with one another. These interdependencies constitute an interaction graph that provides multiple views for modeling the embeddings of groups, users and items from their interacting counterparts, improving recommendation for occasional groups. To this end, we propose a model named GAME to learn Graphical and Attentive Multi-view Embeddings (i.e., representations) for groups, users and items from the independent view and the counterpart views based on the interaction graph. In the counterpart views, the embedding of a group, user or item is aggregated from its interacting counterparts based on an attention mechanism that derives an adaptive weight for each counterpart. For instance, a user's embedding may be aggregated from her interacting items or groups. Further, GAME applies neural collaborative filtering to investigate the interactions between the multi-view embeddings of groups (or users) and items for group recommendation. Finally, we conduct extensive experiments on two real datasets. The experimental results show that GAME outperforms other state-of-the-art models, especially on cold-start groups (i.e., occasional groups) and cold-start items.
Abstract: Traditional recommendation models that utilize only one type of user-item interaction face serious data sparsity and cold-start issues. Multi-behavior recommendation, which makes use of multiple types of user-item interactions such as clicks and favorites, can serve as an effective solution. Early efforts towards multi-behavior recommendation fail to capture the behaviors' different influence strengths on the target behavior. They also ignore the behavior semantics implied in the multi-behavior data. Both of these limitations prevent the data from being fully exploited to improve the recommendation performance on the target behavior. In this work, we approach this problem by innovatively constructing a unified graph to represent the multi-behavior data and proposing a new model named MBGCN (short for Multi-Behavior Graph Convolutional Network). By learning behavior strength with a user-item propagation layer and capturing behavior semantics with an item-item propagation layer, MBGCN can well address the limitations of existing works. Empirical results on two real-world datasets verify the effectiveness of our model in exploiting multi-behavior data. Our model outperforms the best baseline by 25.02% and 6.51% on average on the two datasets. Further studies on cold-start users confirm the practicability of our proposed model.
Abstract: Streaming session-based recommendation (SSR) is a challenging task that requires the recommender system to perform session-based recommendation (SR) in a streaming scenario. In real-world applications of e-commerce and social media, a sequence of user-item interactions generated within a certain period is grouped as a session, and these sessions arrive consecutively in the form of streams. Most recent SR research has focused on the static setting, where the training data is first acquired and then used to train a session-based recommender model. Such models need several epochs of training over the whole dataset, which is infeasible in the streaming setting. Besides, they can hardly capture long-term user interests because they neglect, or make only simple use of, the user information. Although some streaming recommendation strategies have been proposed recently, they are designed for streams of individual interactions rather than streams of sessions. In this paper, we propose a Global Attributed Graph (GAG) neural network model with a Wasserstein reservoir for the SSR problem. On the one hand, when a new session arrives, a session graph with a global attribute is constructed based on the current session and its associated user. Thus, the GAG can take both the global attribute and the current session into consideration to learn more comprehensive representations of the session and the user, yielding better recommendation performance. On the other hand, to adapt to the streaming session scenario, a Wasserstein reservoir is proposed to help preserve a representative sketch of the historical data. Extensive experiments on two real-world datasets have been conducted to verify the superiority of the GAG model over state-of-the-art methods.
Abstract: In many recommender systems, users and items are associated with attributes, and users show preferences for items. The attribute information describes users' (items') characteristics and has a wide range of applications, such as user profiling, item annotation, and feature-enhanced recommendation. As annotating user (item) attributes is a labor-intensive task, the attribute values are often incomplete, with many missing attribute values. Therefore, item recommendation and attribute inference have become two main tasks on these platforms. Researchers have long agreed that user (item) attributes and preference behavior are highly correlated. Some researchers have proposed to leverage one kind of data for the remaining task, and showed performance improvements. Nevertheless, these models either neglect the incompleteness of user (item) attributes or model the correlation of the two tasks with simple models, leading to suboptimal performance on both tasks. To this end, in this paper, we define these two tasks on an attributed user-item bipartite graph and propose an Adaptive Graph Convolutional Network (AGCN) approach for joint item recommendation and attribute inference. The key idea of AGCN is to iteratively perform two steps: 1) learning graph embedding parameters with previously learned approximated attribute values to facilitate the two tasks; 2) sending the approximated updated attribute values back to the attributed graph for better graph embedding learning. Therefore, AGCN can adaptively adjust the graph embedding learning parameters by incorporating both the given attributes and the estimated attribute values, providing weakly supervised information to refine the two tasks. Extensive experimental results on three real-world datasets clearly show the effectiveness of the proposed model.
Abstract: In recent years, the recommender system has become an indispensable function on all e-commerce platforms. The review rating data for a recommender system typically come from open platforms, which may attract a group of malicious users who deliberately insert fake feedback in an attempt to bias the recommender system in their favour. The presence of such attacks may violate the modeling assumptions that high-quality data are always available and that these data truly reflect users' interests and preferences. Therefore, it is of great practical significance to construct a robust recommender system that is able to generate stable recommendations even in the presence of shilling attacks. In this paper, we propose GraphRfi - a GCN-based user representation learning framework that performs robust recommendation and fraudster detection in a unified way. In its end-to-end learning process, the probability of a user being identified as a fraudster by the fraudster detection component automatically determines the contribution of this user's rating data to the recommendation component, while the prediction error output by the recommendation component acts as an important feature in the fraudster detection component. Thus, these two components can mutually enhance each other. Extensive experiments have been conducted, and the experimental results show the superiority of our GraphRfi in the two tasks - robust rating prediction and fraudster detection. Furthermore, GraphRfi is validated to be more robust to various types of shilling attacks than state-of-the-art recommender systems.
Abstract: Even though Automatic Speech Recognition (ASR) systems significantly improved over the last decade, they still introduce a lot of errors when they transcribe voice to text. One of the most common reasons for these errors is phonetic confusion between similar-sounding expressions. As a result, ASR transcriptions often contain "quasi-oronyms", i.e., words or phrases that sound similar to the source ones, but that have completely different semantics (e.g., "win" instead of "when" or "accessible on defecting" instead of "accessible and affecting"). These errors significantly affect the performance of downstream Natural Language Understanding (NLU) models (e.g., intent classification, slot filling, etc.) and impair user experience. To make NLU models more robust to such errors, we propose novel phonetic-aware text representations. Specifically, we represent ASR transcriptions at the phoneme level, aiming to capture pronunciation similarities, which are typically neglected in word-level representations (e.g., word embeddings). To train and evaluate our phoneme representations, we generate noisy ASR transcriptions of four existing datasets - Stanford Sentiment Treebank, SQuAD, TREC Question Classification and Subjectivity Analysis - and show that common neural network architectures exploiting the proposed phoneme representations can effectively handle noisy transcriptions and significantly outperform state-of-the-art baselines. Finally, we confirm these results by testing our models on real utterances spoken to the Alexa virtual assistant.
Abstract: This paper presents a knowledge graph enhanced personalized search model, KEPS. For each user and her queries, KEPS first conducts personalized entity linking on the queries and forms better intent representations; then it builds a knowledge-enhanced profile for the user, using memory networks to store the predicted search intents and linked entities in her search history. The knowledge-enhanced user profile and intent representation are then utilized by KEPS for better, knowledge-enhanced, personalized search. Furthermore, after providing personalized search results for each query, KEPS leverages the user's feedback (clicks on documents) to post-adjust the entity linking on previous queries. This fixes previous linking errors and improves ranking quality for future queries. Experiments on the public AOL search log demonstrate the advantage of knowledge in personalized search: personalized entity linking better reflects the user's search intent, the memory networks better maintain the user's subtle preferences, and the post-hoc linking adjustment fixes some linking errors using the received feedback signals. The three components together lead to significantly better ranking accuracy for KEPS.
Abstract: Graphs are used to model pairwise relations between entities in many real-world scenarios such as social networks. Graph Neural Networks (GNNs) have shown a superior ability to learn representations for graph-structured data, which leads to performance improvements in many graph-related tasks such as link prediction, node classification and graph classification. Most existing graph neural network models are designed for static graphs, while many real-world graphs are inherently dynamic, with new nodes and edges constantly emerging. Existing graph neural network models cannot utilize this dynamic information, which has been shown to enhance the performance of many graph analytic tasks such as community detection. Hence, in this paper, we propose DyGNN, a Dynamic Graph Neural Network model, which can model the dynamic information as the graph evolves. In particular, the proposed framework keeps updating node information by coherently capturing the sequential information of edges (interactions), the time intervals between edges, and information propagation. Experimental results on various dynamic graphs demonstrate the effectiveness of the proposed framework.
Abstract: Web search is a key digital literacy skill that can be particularly challenging for people with dyslexia, a common learning disability that affects reading and spelling skills in about 15% of the English-speaking population. In this paper, we collected and analyzed eye-tracking, search log, and self-report data from 27 participants (14 with dyslexia) to confirm that searchers with dyslexia struggle with all stages of the search process and have markedly different gaze patterns and search behavior that reflect the strategies used and challenges faced. Based on these findings, we discuss design implications to improve the cognitive accessibility of web search.
Abstract: Knowledge graphs have emerged as a key abstraction for organizing information in diverse domains and their embeddings are increasingly used to harness their information in various information retrieval and machine learning tasks. However, the ever growing size of knowledge graphs requires computationally efficient algorithms capable of scaling to graphs with millions of nodes and billions of edges. This paper presents DGL-KE, an open-source package to efficiently compute knowledge graph embeddings. DGL-KE introduces various novel optimizations that accelerate training on knowledge graphs with millions of nodes and billions of edges using multi-processing, multi-GPU, and distributed parallelism. These optimizations are designed to increase data locality, reduce communication overhead, overlap computations with memory accesses, and achieve high operation efficiency. Experiments on knowledge graphs consisting of over 86M nodes and 338M edges show that DGL-KE can compute embeddings in 100 minutes on an EC2 instance with 8 GPUs and 30 minutes on an EC2 cluster with 4 machines with 48 cores/machine. These results represent a 2× ~ 5× speedup over the best competing approaches. DGL-KE is available on https://github.com/awslabs/dgl-ke.
Abstract: In this paper, we study collaborative filtering in an interactive setting, in which the recommender agents iterate between making recommendations and updating the user profile based on the interactive feedback. The most challenging problem in this scenario is how to suggest items when the user profile has not been well established, i.e., recommending for cold-start users or warm-start users with drifting tastes. Existing approaches either rely on an overly pessimistic linear exploration strategy or adopt meta-learning-based algorithms in a fully exploitative way. In this work, to quickly catch up with the user's interests, we propose to represent the exploration policy with a neural network and directly learn it from the feedback data. Specifically, the exploration policy is encoded in the weights of multi-channel stacked self-attention neural networks and trained with efficient Q-learning by maximizing users' overall satisfaction with the recommender system. The key insight is that satisfied recommendations triggered by exploratory recommendations can be viewed as an exploration bonus (delayed reward) for the exploration's contribution to improving the quality of the user profile. Therefore, the proposed exploration policy, which balances learning the user profile against making accurate recommendations, can be directly optimized by maximizing users' long-term satisfaction with reinforcement learning. Extensive experiments and analysis conducted on three benchmark collaborative filtering datasets demonstrate the advantage of our method over state-of-the-art methods.
Abstract: Recent years have witnessed a growing trend of fashion compatibility modeling, which scores the matching degree of a given outfit and then provides people with dressing advice. Existing methods have primarily solved this problem by analyzing the discrete interactions among multiple complementary items. However, fashion items present certain occlusions and deformations when they are worn on the body. Therefore, discrete item interactions cannot capture fashion compatibility in a combined manner, as they neglect a crucial factor: the overall try-on appearance. In light of this, we propose a multi-modal try-on-guided compatibility modeling scheme to jointly characterize the discrete interactions and try-on appearance of an outfit. In particular, we first propose a multi-modal try-on template generator to automatically generate a try-on template from the visual and textual information of the outfit, depicting the overall look of its composing fashion items. Then, we introduce a new compatibility modeling scheme that integrates the outfit's try-on appearance into traditional discrete item interaction modeling. To support this proposal, we construct a large-scale real-world dataset from SSENSE, named FOTOS, consisting of 11,000 well-matched outfits and their corresponding realistic try-on images. Extensive experiments demonstrate its superiority over state-of-the-art methods.
Abstract: Existing spatial object recommendation algorithms generally treat objects identically when ranking them. However, spatial objects often cover different levels of spatial granularity and are thereby heterogeneous. For example, one user may prefer to be recommended a region (say Manhattan), while another user might prefer a venue (say a restaurant). Even for the same user, preferences can change at different stages of data exploration. In this paper, we study how to support top-k spatial object recommendations at varying levels of spatial granularity, enabling spatial objects at varying granularity, such as a city, suburb, or building, to serve as a Point of Interest (POI). To solve this problem, we propose the use of a POI tree, which captures spatial containment relationships between POIs. We design a novel multi-task learning model called MPR (short for Multi-level POI Recommendation), where each task aims to return the top-k POIs at a certain spatial granularity level. Each task consists of two subtasks: (i) attribute-based representation learning; (ii) interaction-based representation learning. The first subtask learns feature representations for both users and POIs, capturing attributes directly from their profiles. The second subtask incorporates user-POI interactions into the model. Additionally, MPR can provide insights into why certain recommendations are being made to a user based on three types of hints: user-aspect, POI-aspect, and interaction-aspect. We empirically validate our approach using two real-life datasets, and show promising performance improvements over several state-of-the-art methods.
Abstract: Many sellers on e-commerce platforms offer buyers product bundles, which package together two or more different items. The identification of such bundles is a necessary step in supporting a variety of related services, from recommendation to dynamic pricing. In this work, we present a comprehensive study of bundle identification on a large e-commerce website. Our analysis of bundle versus non-bundle listed items reveals several key differentiating characteristics, spanning the listing's title, image, and attributes. Following this, we experiment with a multi-modal classifier that takes advantage of these characteristics as features. Our analysis also shows that the bundle indicator input by sellers tends to be highly noisy and carries only a weak signal. The bundle identification task therefore faces the challenge of having a small set of manually-labeled clean examples and a larger set of noisy-labeled examples, in conjunction with class imbalance due to the relative scarcity of bundles. Our experiments with basic supervised classifiers, using the manually-labeled and/or the noisy-labeled data for training, demonstrate only moderate performance. We therefore turn to a semi-supervised approach and propose GREED, a self-training ensemble-based algorithm with greedy model selection. Our evaluation over two different meta-categories shows the superior performance of semi-supervised approaches for the bundle identification task, with GREED outperforming several semi-supervised alternatives. The combination of textual, image, and some metadata features is shown to yield the best performance, reaching an AUC of 0.89 and 0.92 for the two meta-categories, respectively.
Abstract: Electronic Health Record (EHR) coding is the task of assigning one or more International Classification of Diseases (ICD) codes to every EHR. Most previous work either ignores the hierarchical nature of the ICD codes or only focuses on parent-child relations. Moreover, existing EHR coding methods predict ICD codes at the leaf level, which has the largest number of ICD codes and the most fine-grained categories, making it difficult for models to make correct decisions. In order to address these problems, we model EHR coding as a path generation task. For this approach, we need to address two main challenges: (1) How do we model relations between EHRs and ICD codes, and relations between ICD codes? (2) How do we evaluate the quality of generated ICD paths in order to obtain a signal that can be used to supervise the learning? We propose a coarse-to-fine ICD path generation framework, named Reinforcement Path Generation Network (RPGNet), that implements EHR coding with a Path Generator (PG) and a Path Discriminator (PD). We address challenge (1) by introducing a Path Message Passing (PMP) module in the PG to encode three types of relation: between EHRs and ICD codes, between parent-child ICD codes, and between sibling ICD codes. To address challenge (2), we propose a PD component that estimates the reward for each ICD code in a generated path. RPGNet is trained with Reinforcement Learning (RL) in an adversarial manner. Experiments on the MIMIC-III benchmark dataset show that RPGNet significantly outperforms state-of-the-art methods in terms of micro-averaged F1 and micro-averaged AUC.
Abstract: Entity alignment (EA) aims to discover equivalent entities in knowledge graphs (KGs), which bridges heterogeneous sources of information and facilitates the integration of knowledge. Existing EA solutions mainly rely on structural information to align entities, typically through KG embedding. Nonetheless, in real-life KGs, only a few entities are densely connected to others, while the majority possess rather sparse neighborhood structures. We refer to the latter as long-tail entities, and observe that this phenomenon arguably limits the use of structural information for EA. To mitigate the issue, we revisit and investigate the conventional EA pipeline. For pre-alignment, we propose to amplify long-tail entities, which have relatively weak structural information, with entity name information that is generally available (but overlooked), in the form of concatenated power mean word embeddings. For alignment, under a novel complementary framework that consolidates structural and name signals, we identify an entity's degree as important guidance to effectively fuse the two different sources of information. To this end, a degree-aware co-attention network is conceived, which dynamically adjusts the significance of features in a degree-aware manner. For post-alignment, we propose to complement the original KGs with facts from their counterparts by using confident EA results as anchors via iterative training. Comprehensive experimental evaluations validate the superiority of our proposed techniques.
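Concatenated power mean word embeddings, mentioned above, are simple to compute: for each chosen power, a generalized mean is taken over the word vectors of the entity name, and the results are concatenated. A sketch, assuming the common choice of powers {1, +inf, -inf} (i.e., mean, max, and min; the paper's exact powers may differ):

.. code-block:: python

    import numpy as np

    def power_mean_embedding(word_vectors, powers=(1.0, np.inf, -np.inf)):
        """word_vectors: (n_words, dim) array for one entity name."""
        parts = []
        for p in powers:
            if p == np.inf:                      # limit case: element-wise max
                parts.append(word_vectors.max(axis=0))
            elif p == -np.inf:                   # limit case: element-wise min
                parts.append(word_vectors.min(axis=0))
            else:                                # generalized power mean
                parts.append(np.power(np.power(word_vectors, p).mean(axis=0), 1.0 / p))
        return np.concatenate(parts)             # shape: (len(powers) * dim,)

    name_vecs = np.random.rand(3, 300)           # e.g., a three-word entity name
    embedding = power_mean_embedding(name_vecs)  # 900-dimensional representation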
Abstract: In the process of visual perception, humans perceive not only the appearance of objects existing in a place but also their relationships (e.g. spatial layout). However, dominant works on visual place recognition are typically based on the assumption that two images depict the same place if they contain enough similar objects, while the relation information is neglected. In this paper, we propose a regional relation module which models the regional relationships and converts the convolutional feature maps to relational feature maps. We further design a cascaded pooling method to obtain discriminative relation descriptors by preventing the influence of confusing relations and preserving as much useful information as possible. Extensive experiments on two place recognition benchmarks demonstrate that training with the proposed regional relation module improves the appearance descriptors and that the relation descriptors are complementary to appearance descriptors. When these two kinds of descriptors are concatenated together, the resulting combined descriptors outperform the state-of-the-art methods.
Abstract: Recommender systems are feedback loop systems, which often face bias problems such as popularity bias, previous-model bias and position bias. In this paper, we focus on solving these bias problems in a recommender system via uniform data. Through empirical studies in online and offline settings, we observe that simple modeling with uniform data can alleviate the bias problems and improve performance. However, uniform data is scarce and expensive to collect in a real product. In order to use the valuable uniform data more effectively, we propose a general knowledge distillation framework for counterfactual recommendation that enables uniform data modeling through four approaches: (1) label-based distillation focuses on using the imputed labels as a carrier to provide useful de-biasing guidance; (2) feature-based distillation aims to filter out the representative causal and stable features; (3) sample-based distillation considers mutual learning and alignment of the information of the uniform and non-uniform data; and (4) model structure-based distillation constrains the training of the models from the perspective of embedded representation. We conduct extensive experiments on both public and product datasets, demonstrating that the proposed four methods achieve better performance than the baseline models in terms of AUC and NLL. Moreover, we discuss the relation between the proposed methods and previous works. We emphasize that counterfactual modeling with uniform data is a rich research area, and list some interesting and promising research topics worthy of further exploration. The source code is available at https://github.com/dgliu/SIGIR20_KDCRec.
Abstract: False-positive metrics can capture an important side of recommendation quality, focusing on the impact of suggestions that are disliked by users, as a complement to common metrics that only measure the number of successful recommendations. In this paper we research the extent to which false-positive metrics agree or disagree with true-positive metrics in the offline evaluation of recommender systems. We discover a surprising degree of systematic disagreement that was occasionally noted but not explained by previous authors in the literature. We find an explanation for the discrepancy between the metrics in the effect of popularity biases, which impact false-positive and true-positive metrics in very different ways: whereas true-positive metrics reward the recommendation of popular items, false-positive metrics penalize it. We determine the precise conditions and exceptions to these general trends, provide a formal explanation for our findings, and confirm and illustrate them empirically in experiments with different datasets.
Abstract: Patients are increasingly using the web for understanding medical information, making health decisions, and validating physicians' advice. However, most of this content is tailored to an expert audience, so people with inadequate health literacy often find it difficult to access, comprehend, and act upon this information. Medical text simplification aims to alleviate this problem by computationally simplifying medical text. Most text simplification methods employ neural seq-to-seq models for this task. However, training such models requires a corpus of aligned complex and simple sentences. Creating such a dataset manually is effort-intensive, while creating it automatically is prone to alignment errors. To overcome these challenges, we propose a denoising autoencoder based neural model for this task, which leverages the simplistic writing style of medical social media text. Experiments on four datasets show that our method significantly outperforms the best known medical text simplification models across multiple automated and human evaluation metrics. Our model achieves an improvement of up to 16.52% over the existing best performing model on SARI, the primary metric for evaluating text simplification models.
Abstract: Implicit feedback data is extensively explored in recommendation as it is easy to collect and generally applicable. However, predicting users' preferences on implicit feedback data is a challenging task since we can only observe positive (voted) samples and unvoted samples. It is difficult to distinguish the negative samples and the unlabeled positive samples among the unvoted ones. Existing works, such as Bayesian Personalized Ranking (BPR), sample unvoted items as negative samples uniformly and therefore suffer from a critical noisy-label issue. To address this gap, we design an adaptive sampler based on noisy-label robust learning for implicit feedback data. To formulate the issue, we first introduce Bayesian Point-wise Optimization (BPO) to learn a model, e.g., Matrix Factorization (MF), by maximum likelihood estimation. We predict users' preferences with the model and learn it by maximizing the likelihood of observed data labels, i.e., a user prefers her positive samples and has no interest in her unvoted samples. However, in reality, a user may have interest in some of her unvoted samples, which are indeed positive samples mislabeled as negative ones. We then consider the risk of these noisy labels, and propose a Noisy-label Robust BPO (NBPO). NBPO also maximizes the observation likelihood while connecting users' preferences and observed labels through the likelihood of label flipping, based on Bayes' theorem. In NBPO, a user prefers her true positive samples and shows no interest in her true negative samples, hence the optimization quality is dramatically improved. Extensive experiments on two public real-world datasets show the significant improvement of our proposed optimization methods.
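The label-flipping construction described above can be written compactly. A sketch of the likelihood, with notation assumed here (:math:`y_{ui}` the observed label, :math:`r_{ui}` the latent true preference; NBPO's full objective adds priors and the MF parameterization):

.. math::

    P(y_{ui} \mid u, i) \;=\; \sum_{r \in \{0, 1\}} P(y_{ui} \mid r_{ui} = r)\, P(r_{ui} = r \mid u, i)

If voted items are assumed never to be mislabeled, :math:`P(y_{ui} = 1 \mid r_{ui} = 0) = 0`, so only unvoted pairs carry a flipping probability, and maximizing this likelihood jointly learns the flipping probabilities and the underlying preference model.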
Abstract: Given the huge commercial value of recommender systems, there has been growing interest in improving their performance in recent years. The majority of existing methods have achieved great improvement on the click metric, but perform poorly on the conversion metric, possibly due to its extremely sparse feedback signal. To tackle this challenge, we design a novel deep hierarchical reinforcement learning based recommendation framework to model consumers' hierarchical purchase interest. Specifically, the high-level agent captures long-term, sparse conversion interest and automatically sets abstract goals for the low-level agent, while the low-level agent follows the abstract goals and captures short-term click interest via interacting with the real-time environment. To solve the inherent problems in hierarchical reinforcement learning, we propose a novel multi-goals abstraction based deep hierarchical reinforcement learning algorithm (MaHRL). Our proposed algorithm contains three contributions: 1) the high-level agent generates multiple goals to guide the low-level agent in different sub-periods, which reduces the difficulty of approaching high-level goals; 2) different goals share the same state encoder structure and its parameters, which increases the update frequency of the high-level agent and thus accelerates the convergence of our proposed algorithm; 3) a reward assignment mechanism is designed to allocate rewards to each goal so as to coordinate different goals in a consistent direction. We evaluate our proposed algorithm on a real-world e-commerce dataset and validate its effectiveness.
Abstract: Conversational and question-based recommender systems have gained increasing attention in recent years, with users enabled to converse with the system and better control recommendations. Nevertheless, research in the field is still limited compared to traditional recommender systems. In this work, we propose a novel question-based recommendation method, Qrec, to assist users in finding items interactively, by answering automatically constructed and algorithmically chosen questions. Previous conversational recommender systems ask users to express their preferences over items or item facets. Our model, instead, asks users to express their preferences over descriptive item features. The model is first trained offline by a novel matrix factorization algorithm, and then iteratively updates the user and item latent factors online via a closed-form solution based on the user's answers. Meanwhile, our model infers the underlying user belief and preferences over items to learn an optimal question-asking strategy using Generalized Binary Search, so as to ask the user a sequence of questions. Our experimental results demonstrate that our proposed matrix factorization model outperforms the traditional Probabilistic Matrix Factorization model. Further, our proposed Qrec model can greatly improve the performance of state-of-the-art baselines, and it is also effective in the case of cold-start user and item recommendations.
Abstract: As a fundamental yet significant process in personalized recommendation, candidate generation and suggestion effectively help users spot the most suitable items for them. Consequently, identifying substitutable (i.e., interchangeable) items opens up new opportunities to refine the quality of generated candidates. When a user is browsing a specific type of product (e.g., a laptop) to buy, the accurate recommendation of substitutes (e.g., better equipped laptops) can offer the user more suitable options to choose from, thus substantially increasing the chance of a successful purchase. However, existing methods merely treat this problem as mining pairwise item relationships, without considering users' personal preferences. Moreover, the substitutable relationships are implicitly identified through the learned latent representations of items, leading to uninterpretable recommendation results. In this paper, we propose attribute-aware collaborative filtering (A2CF) to perform substitute recommendation by addressing issues from both the personalization and interpretability perspectives. In A2CF, instead of directly modelling user-item interactions, we extract explicit and polarized item attributes from user reviews with sentiment analysis, whereafter the representations of attributes, users, and items are simultaneously learned. Then, by treating attributes as the bridge between users and items, we can thoroughly model the user-item preferences (i.e., personalization) and item-item relationships (i.e., substitution) for recommendation. In addition, A2CF is capable of generating intuitive interpretations by analyzing which attributes a user currently cares about most and comparing the recommended substitutes with her/his currently browsed items at an attribute level. The recommendation effectiveness and interpretation quality of A2CF are further demonstrated via extensive experiments on three real-life datasets.
Abstract: With the emergence of e-commerce services, billions of products are sold online every day. Detecting illegal products among these large-scale online products has become an important and practical research problem. In order to evade detection, malicious sellers usually use camouflaged text to describe their illegal products implicitly. This brings great challenges to current detection systems, since newly camouflaged text can hardly be learned from historical data and the distribution of illegal and normal products is extremely unbalanced. Rather than solving this problem as a classification task, as in most previous efforts, we reformulate it from the perspective of implicit entity linking, which aims to link a camouflaged description to a known product. In this paper, we introduce three types of context that can help infer the implicit entity from camouflaged descriptions and propose an end-to-end contextual representation model to capture the effect of the different contexts. Furthermore, we introduce a symmetric metric to model the matching score between the input title and the product by learning the mutual effects among the contexts. Experimental results on datasets collected from a real-world e-commerce site demonstrate the advantage of the proposed model over state-of-the-art methods.
Abstract: We present Distributed Equivalent Substitution (DES) training, a novel distributed training framework for large-scale recommender systems with dynamic sparse features. DES introduces fully synchronous training to large-scale recommender systems for the first time by reducing communication, thus making the training of commercial recommender systems converge faster and reach a better CTR. DES requires much less communication by substituting the weights-rich operators with computationally equivalent sub-operators and aggregating partial results instead of transmitting the huge sparse weights directly over the network. Due to the use of synchronous training on large-scale Deep Learning Recommendation Models (DLRMs), DES achieves a higher AUC (Area Under the ROC Curve). We successfully apply DES training to multiple popular DLRMs in industrial scenarios. Experiments show that our implementation outperforms the state-of-the-art PS-based training framework, achieving up to 68.7% communication savings and higher throughput compared to other PS-based recommender systems.
Abstract: In this work we focus on multi-turn passage retrieval as a crucial component of conversational search. One of the key challenges in multi-turn passage retrieval comes from the fact that the current turn query is often underspecified due to zero anaphora, topic change, or topic return. Context from the conversational history can be used to arrive at a better expression of the current turn query, defined as the task of query resolution. In this paper, we model the query resolution task as a binary term classification problem: for each term appearing in the previous turns of the conversation decide whether to add it to the current turn query or not. We propose QuReTeC (Query Resolution by Term Classification), a neural query resolution model based on bidirectional transformers. We propose a distant supervision method to automatically generate training data by using query-passage relevance labels. Such labels are often readily available in a collection either as human annotations or inferred from user interactions. We show that QuReTeC outperforms state-of-the-art models, and furthermore, that our distant supervision method can be used to substantially reduce the amount of human-curated data required to train QuReTeC. We incorporate QuReTeC in a multi-turn, multi-stage passage retrieval architecture and demonstrate its effectiveness on the TREC CAsT dataset.
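The distant-supervision labeling for binary term classification can be illustrated in a few lines: a history term is labeled positive when it appears in a passage that is relevant to the current turn but is missing from the current query (tokenization and the stopword list below are simplifying assumptions, not the paper's exact pipeline):

.. code-block:: python

    STOPWORDS = {"the", "a", "an", "of", "to", "in", "is", "do", "where"}

    def label_terms(history_turns, current_query, relevant_passage):
        history_terms = {t for turn in history_turns for t in turn.lower().split()}
        query_terms = set(current_query.lower().split())
        passage_terms = set(relevant_passage.lower().split())
        return {term: int(term in passage_terms and term not in query_terms)
                for term in history_terms - STOPWORDS}

    # "polar" and "bears" get positive labels for the under-specified
    # follow-up query, i.e., they should be added to resolve it.
    print(label_terms(["tell me about polar bears"],
                      "where do they live",
                      "polar bears live in the arctic circle"))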
Abstract: In session-based or sequential recommendation, it is important to consider a number of factors, such as long-term user engagement and multiple types of user-item interactions (e.g., clicks and purchases). The current state-of-the-art supervised approaches fail to model them appropriately. Casting the sequential recommendation task as a reinforcement learning (RL) problem is a promising direction. A major component of RL approaches is to train the agent through interactions with the environment. However, it is often problematic to train a recommender in an online fashion due to the requirement to expose users to irrelevant recommendations. As a result, learning the policy from logged implicit feedback is of vital importance, which is challenging due to the pure off-policy setting and the lack of negative rewards (feedback). In this paper, we propose self-supervised reinforcement learning for sequential recommendation tasks. Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL. The RL part acts as a regularizer that drives the supervised layer to focus on specific rewards (e.g., recommending items which may lead to purchases rather than clicks), while the self-supervised layer with cross-entropy loss provides strong gradient signals for parameter updates. Based on this approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC). We integrate the proposed frameworks with four state-of-the-art recommendation models. Experimental results on two real-world datasets demonstrate the effectiveness of our approach.
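A minimal PyTorch sketch of the SQN idea follows: one sequential backbone with two heads, trained with cross-entropy on the next item plus a one-step Q-learning loss (the backbone, reward scheme, and hyperparameters here are illustrative assumptions):

.. code-block:: python

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SQN(nn.Module):
        def __init__(self, n_items, dim=64):
            super().__init__()
            self.emb = nn.Embedding(n_items, dim)
            self.gru = nn.GRU(dim, dim, batch_first=True)
            self.sup_head = nn.Linear(dim, n_items)   # self-supervised head
            self.q_head = nn.Linear(dim, n_items)     # RL head

        def forward(self, seq):                        # seq: (batch, seq_len)
            h, _ = self.gru(self.emb(seq))
            state = h[:, -1]                           # last hidden state
            return self.sup_head(state), self.q_head(state)

    def sqn_loss(model, seq, next_item, reward, next_seq, gamma=0.5):
        logits, q = model(seq)
        ce = F.cross_entropy(logits, next_item)        # supervised signal
        with torch.no_grad():                          # one-step TD target
            _, q_next = model(next_seq)
            target = reward + gamma * q_next.max(dim=1).values
        q_taken = q.gather(1, next_item.unsqueeze(1)).squeeze(1)
        return ce + F.mse_loss(q_taken, target)        # RL part as regularizer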
Abstract: In this work, we aim to investigate the practical task of flexible fashion search with attribute manipulation, where users can retrieve the target fashion items by replacing the unwanted attributes of an available query image with the desired ones (e.g., changing the collar attribute from v-neck to round). Although several pioneering efforts have been dedicated to fulfilling this task, they mainly ignore the potential of generative models in enhancing the visual understanding of target fashion items. To this end, we propose an end-to-end generative attribute manipulation scheme, which consists of a generator and a discriminator. The generator produces a prototype image that meets the user's requirement of attribute manipulation over the query image, with the regularization of visual-semantic consistency and pixel-wise consistency. The discriminator aims to jointly fulfill semantic learning towards correct attribute manipulation and adversarial metric learning for fashion search. Pertaining to the adversarial metric learning, we provide two general paradigms: a pair-based scheme and a triplet-based scheme, where the fake generated prototype images that closely resemble the ground truth images of target items are incorporated as hard negative samples to boost the model performance. Extensive experiments on two real-world datasets verify the effectiveness of our scheme.
Abstract: Shilling attacks against collaborative filtering (CF) models are characterized by several fake user profiles injected into the system by an adversarial party to steer recommendation outcomes toward a malicious goal. The vulnerability of CF models is directly tied to their reliance on the underlying interaction data --- such as the user-item rating matrix (URM) --- to train their models, and to their inherent inability to distinguish genuine profiles from non-genuine ones. The majority of works analyzing shilling attacks so far have mainly focused on properties such as the confronted recommendation models, recommendation outputs, and the users under attack. The under-researched element has been the impact of data characteristics on the effectiveness of shilling attacks on CF models. Toward this goal, this work presents a systematic and in-depth study using an analytical modeling approach built on a regression model to test the hypothesis that URM properties can impact the outcome of CF recommenders under a shilling attack. We ran extensive experiments involving 97,200 simulations on three different domains (movie, business, and music), and showed that URM properties considerably affect the robustness of CF models in shilling attack scenarios. The obtained results can be of great help to system designers in understanding the cause of variations in recommender system performance due to a shilling attack.
Abstract: Most existing recommender systems leverage users' complete original behavioral logs, which are collected from mobile devices, stored by the service provider, and further fed into recommendation models. This may lead to a high risk of privacy leakage, since the recommendation service provider may not be trustworthy. Despite many research efforts on privacy-aware recommendation, the problem of building an effective recommender system that completely preserves user privacy is still open. In this work, we propose a general framework named differentially private local collaborative filtering for recommendation. The designed workflow consists of three steps. First, for the accumulated behavioral logs saved on users' devices, a differentially private protection mechanism is adopted to obfuscate the real interactions before reporting them to the server. Second, after collecting the obfuscated records from all users, the server runs an estimation model to calculate similarities between each pair of items. This step requires no user-relevant data, and thus it does not introduce any auxiliary privacy risk. Last, the server sends the estimated, user-irrelevant item-similarity matrix to each user device, and the recommendation results are inferred locally based on item similarities together with each user's locally stored original behavioral data. To verify our method's efficacy, we conduct extensive experiments on three real-world datasets, demonstrating that our proposed method achieves the best performance compared with state-of-the-art baselines. We further demonstrate that our method still works well under various privacy budgets and different data sparsity levels.
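Step one of the workflow, the on-device obfuscation, can be sketched with the classic randomized response mechanism, which satisfies epsilon-local differential privacy for binary data (the paper's exact mechanism may differ; this is only an illustration):

.. code-block:: python

    import math
    import random

    def randomized_response(interactions, epsilon=1.0):
        """interactions: list of 0/1 flags, one per item; each flag is
        reported truthfully with probability e^eps / (e^eps + 1)."""
        p_truth = math.exp(epsilon) / (math.exp(epsilon) + 1)
        return [bit if random.random() < p_truth else 1 - bit
                for bit in interactions]

    noisy = randomized_response([1, 0, 0, 1, 0], epsilon=2.0)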
Abstract: Content-aware recommendation approaches are essential for providing meaningful recommendations for new (i.e., cold-start) items in a recommender system. We present a content-aware neural hashing-based collaborative filtering approach (NeuHash-CF), which generates binary hash codes for users and items, such that the highly efficient Hamming distance can be used for estimating user-item relevance. NeuHash-CF is modelled as an autoencoder architecture, consisting of two joint hashing components for generating user and item hash codes. Inspired by semantic hashing, the item hashing component generates a hash code directly from an item's content information (i.e., it generates cold-start and seen item hash codes in the same manner). This contrasts with existing state-of-the-art models, which treat the two item cases separately. The user hash codes are generated directly from the user id, through learning a user embedding matrix. We show experimentally that NeuHash-CF significantly outperforms state-of-the-art baselines by up to 12% NDCG and 13% MRR in cold-start recommendation settings, and by up to 4% in both NDCG and MRR in standard settings where all items are present during training. Our approach uses 2-4x shorter hash codes, while obtaining the same or better performance compared to the state of the art, thus also enabling a notable storage reduction.
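The pay-off of binary codes is that relevance estimation reduces to cheap bit operations. A small illustration (the codes and items below are made up; NeuHash-CF learns the codes rather than hand-picking them):

.. code-block:: python

    def hamming_distance(a: int, b: int) -> int:
        return bin(a ^ b).count("1")        # number of differing bits

    user_code = 0b1011001110100101                   # 16-bit user hash code
    item_codes = {"item_a": 0b1011001010100111,      # 2 bits away from the user
                  "item_b": 0b0100110001011010}      # 16 bits away
    ranked = sorted(item_codes,
                    key=lambda i: hamming_distance(user_code, item_codes[i]))
    print(ranked)                                    # ['item_a', 'item_b']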
Abstract: With its distinct privacy protection advantages, federated recommendation, which stores data locally on devices and trains recommender models federally, is becoming increasingly feasible. However, previous work on federated recommender systems does not take full account of the limitations on storage, RAM, energy and communication bandwidth in the mobile environment. Their models are too large to run easily on mobile devices. Moreover, existing federated recommenders need to fine-tune recommendation models on each device, which makes it hard for them to effectively exploit collaborative filtering (CF) information among users/devices. Our goal in this paper is to design a novel federated learning framework for rating prediction (RP) in this environment that operates on par with state-of-the-art fully centralized RP methods. To this end, we introduce a novel federated matrix factorization (MF) framework, named meta matrix factorization (MetaMF), that is able to generate private item embeddings and RP models with a meta network. Given a user, we first obtain a collaborative vector by collecting useful information with a collaborative memory (CM) module. Then, we employ a meta recommender (MR) module to generate private item embeddings and an RP model based on the collaborative vector in the server. To address the challenge of generating a large number of high-dimensional item embeddings, we devise a rise-dimensional generation (RG) strategy that first generates a low-dimensional item embedding matrix and a rise-dimensional matrix, and then multiplies them to obtain high-dimensional embeddings. Finally, we use the generated model to produce private RPs for the given user on her device. MetaMF shows high capacity even with a small RP model, allowing it to adapt to the limitations of the mobile environment. We conduct extensive experiments on four benchmark datasets to compare MetaMF with existing MF methods and find that MetaMF achieves competitive performance. Moreover, we find that MetaMF achieves higher RP performance than existing federated methods by better exploiting CF among users/devices.
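The rise-dimensional generation (RG) strategy is easy to see in numbers: instead of emitting a full n_items x d_high embedding table, the meta network emits a low-dimensional table plus a small rise matrix whose product has the full size. A sketch with illustrative dimensions (the random vector stands in for the meta network's output):

.. code-block:: python

    import torch

    n_items, d_low, d_high = 10000, 8, 64
    meta_out = torch.randn(n_items * d_low + d_low * d_high)  # stand-in output

    low_table = meta_out[: n_items * d_low].view(n_items, d_low)
    rise = meta_out[n_items * d_low:].view(d_low, d_high)
    item_embeddings = low_table @ rise                # (n_items, d_high)

    # The meta network generates 80,512 numbers instead of 640,000,
    # which is what makes per-user generation affordable on the server.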
Abstract: Many interactive online systems, such as social media platforms or news sites, provide personalized experiences through recommendations or news feed customization based on people's feedback and engagement on individual items (e.g., liking items). In this paper, we investigate how we can support a greater degree of user control in such systems by changing the way the system allows people to gauge the consequences of their feedback actions. To this end, we consider two important aspects of how the system responds to feedback actions: (i) immediacy, i.e., how quickly the system responds with an update, and (ii) visibility, i.e., whether or not changes will get highlighted. We used both an in-lab qualitative study and a large-scale crowd-sourced study to examine the impact of these factors on people's reported preferences and observed behavioral metrics. We demonstrate that UX design which enables people to preview the impact of their actions and highlights changes results in a higher reported transparency, an overall preference for this design, and a greater selectivity in which items are liked.
Abstract: Learning informative representations of users and items from interaction data is of crucial importance to collaborative filtering (CF). Existing embedding functions exploit user-item relationships to enrich the representations, evolving from a single user-item instance to the holistic interaction graph. Nevertheless, they largely model the relationships in a uniform manner, while neglecting the diversity of user intents in adopting items, which could be passing time, pursuing an interest, or shopping for others such as family members. Such a uniform approach to modeling user interests easily results in suboptimal representations, failing to model diverse relationships and disentangle user intents in representations. In this work, we pay special attention to user-item relationships at the finer granularity of user intents. We hence devise a new model, Disentangled Graph Collaborative Filtering (DGCF), to disentangle these factors and yield disentangled representations. Specifically, by modeling a distribution over intents for each user-item interaction, we iteratively refine the intent-aware interaction graphs and representations. Meanwhile, we encourage independence of different intents. This leads to disentangled representations, effectively distilling information pertinent to each intent. We conduct extensive experiments on three benchmark datasets, and DGCF achieves significant improvements over several state-of-the-art models like NGCF, DisenGCN, and MacridVAE. Further analyses offer insights into the advantages of DGCF on the disentanglement of user intents and the interpretability of representations. Our code is available at https://github.com/xiangwang1223/disentangled_graph_collaborative_filtering.
Abstract: Automated essay scoring (AES) is a promising, yet challenging task. Current state-of-the-art AES models ignore the domain difference and cannot effectively leverage data from different domains. In this paper, we propose a domain-adaptive framework to improve the domain adaptability of AES models. We design two domain-independent self-supervised tasks and train them jointly with the AES task. The self-supervised tasks enable the model to capture the shared knowledge across different domains and act as a regularizer that induces a shared feature space. We further propose to enhance the model's robustness to domain variation via a novel domain adversarial training technique. The main idea of the proposed domain adversarial training is to train the model with small, well-designed perturbations to make it robust to domain variation. We obtain the perturbations via a variation of the Fast Gradient Sign Method (FGSM). Our approach achieves new state-of-the-art performance in both in-domain and cross-domain experiments on the ASAP dataset. We also show that the proposed domain adaptation framework is architecture-free and can be successfully applied to different models.
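A generic FGSM-style perturbation on embeddings, in the spirit of the adversarial training described above (the model and loss here are placeholders; the paper uses a variation of FGSM, so details will differ):

.. code-block:: python

    import torch

    def fgsm_perturb(embeddings, loss, epsilon=0.05):
        """Return perturbed embeddings: x + eps * sign(grad_x loss)."""
        grad, = torch.autograd.grad(loss, embeddings, retain_graph=True)
        return embeddings + epsilon * grad.sign()

    emb = torch.randn(4, 16, requires_grad=True)   # stand-in essay embeddings
    loss = (emb ** 2).sum()                        # placeholder task loss
    adv_emb = fgsm_perturb(emb, loss)              # train on both emb and adv_emb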
Abstract: Online reviews play a critical role in persuading or dissuading users when making purchase decisions. And yet very few users take the time to write helpful reviews. Encouragingly, recent advances in deep neural networks offer good potential to produce review-like natural language content. However, there is a lack of large, high-quality labeled data at both the aspect and sentiment level for training. Hence, toward enabling a writing assistant framework that helps users post online reviews, this paper proposes a scalable labeling method for bootstrapping aspect and sentiment labels. Concretely, the proposed approach, Aspect Dependent Online REviews (ADORE), leverages the underlying distribution of reviews and a small seed set of labeled data through carefully designed review segmentation and label assignment. We then show how these labels can inform a generative model to produce aspect- and sentiment-aware reviews. We study the effectiveness of ADORE under various scenarios, such as how end-users perceive the quality of the labels and the aspect-aware generated reviews. Our experiments indicate that the proposed labeling process, along with a regularized joint generative model, leads to high-quality reviews with 90% accuracy.
Abstract: Many top-k document retrieval strategies have been proposed based on the WAND and MaxScore heuristics and yet, from recent work, it is surprisingly difficult to identify the "fastest" strategy. This becomes even more challenging when considering various retrieval criteria, like different ranking models and values of k. In this paper, we conduct the first extensive comparison between ten effective strategies, many of which were never compared before to our knowledge, examining their efficiency under five representative ranking models. Based on a careful analysis of the comparison, we propose LazyBM, a remarkably simple retrieval strategy that bridges the gap between the best performing WAND-based and MaxScore-based approaches. Empirically, LazyBM considerably outperforms all of the considered strategies across ranking models, values of k, and index configurations under both mean and tail query latency.
Abstract: Recommender systems are increasingly used to predict and serve content that aligns with user taste, yet the task of matching new users with relevant content remains a challenge. We consider podcasting to be an emerging medium with rapid growth in adoption, and discuss challenges that arise when applying traditional recommendation approaches to address the cold-start problem. Using music consumption behavior, we examine two main techniques for inferring Spotify users' preferences over more than 200k podcasts. Our results show significant improvements in consumption of up to 50% for both offline and online experiments. We provide extensive analysis of model performance and examine the degree to which music data as an input source introduces bias in recommendations.
Abstract: Email remains one of the most frequently used means of online communication. People spend a significant amount of time every day on email to exchange information, manage tasks and schedule events. Previous work has studied different ways of improving email productivity by prioritizing emails, suggesting automatic replies or identifying intents to recommend appropriate actions. The problem has mostly been posed as a supervised learning problem, where models of different complexities were proposed to classify an email message into a predefined taxonomy of intents or classes. The need for labeled data has always been one of the largest bottlenecks in training supervised models. This is especially the case for many real-world tasks, such as email intent classification, where large-scale annotated examples are either hard to acquire or unavailable due to privacy or data access constraints. Email users often take actions in response to intents expressed in an email (e.g., setting up a meeting in response to an email with a scheduling request). Such actions can be inferred from user interaction logs. In this paper, we propose to leverage user actions as a source of weak supervision, in addition to a limited set of annotated examples, to detect intents in emails. We develop an end-to-end robust deep neural network model for email intent identification that leverages both clean annotated data and noisy weak supervision, along with a self-paced learning mechanism. Extensive experiments on three different intent detection tasks show that our approach can effectively leverage the weakly supervised data to improve intent detection in emails.
Abstract: Unsupervised video quantization compresses original videos into compact binary codes so that video retrieval can be conducted efficiently. In this paper, we make a first attempt to combine quantization with video retrieval, in a method called 3D-UVQ, which obtains high retrieval accuracy with low storage cost. In the proposed framework, we address two main problems: 1) how to design an effective pipeline to perceive video contextual information for video feature extraction; and 2) how to quantize these features for efficient retrieval. To tackle these problems, we propose a 3D self-attention module to exploit the spatial and temporal contextual information, where each pixel is influenced by its surrounding pixels. By taking a further recurrent operation, each pixel can finally capture the global context from all pixels. Then, we propose gradient-based residual quantization, which consists of several quantization blocks that approximate the features gradually. Extensive experimental results on three benchmark datasets demonstrate that our method significantly outperforms the state-of-the-art. An ablation study shows that both the 3D self-attention module and the gradient-based residual quantization improve retrieval performance. Our model is publicly available at https://github.com/brownwolf/3D-UVQ.
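The residual quantization part works block by block: each codebook quantizes what the previous blocks could not explain, and only the code indices need to be stored. The mechanics in NumPy (the codebooks below are random placeholders; in 3D-UVQ they are learned with gradients, which is what makes the residuals shrink):

.. code-block:: python

    import numpy as np

    rng = np.random.default_rng(0)
    feature = rng.normal(size=64)                       # one video feature
    codebooks = [rng.normal(size=(256, 64)) for _ in range(4)]  # 4 blocks

    residual, codes = feature.copy(), []
    for book in codebooks:
        idx = int(np.argmin(((book - residual) ** 2).sum(axis=1)))
        codes.append(idx)                               # store 1 byte per block
        residual = residual - book[idx]                 # pass on the residual

    reconstruction = sum(book[i] for book, i in zip(codebooks, codes))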
Abstract: Next-basket recommendation (NBR) is prevalent in the e-commerce and retail industries. In this scenario, a user purchases a set of items (a basket) at a time. NBR performs sequential modeling and recommendation based on a sequence of baskets. NBR is in general more complex than the widely studied sequential (session-based) recommendation, which recommends the next item based on a sequence of items. Recurrent neural networks (RNNs) have proved to be very effective for sequential modeling, and have thus been adapted for NBR. However, we argue that existing RNNs cannot directly capture item frequency information in the recommendation scenario. Through careful analysis of real-world datasets, we find that personalized item frequency (PIF) information (which records the number of times that each item is purchased by a user) provides two critical signals for NBR. This has, however, been largely ignored by existing methods. Even though existing RNN-based methods have strong representation ability, our empirical results show that they fail to learn and capture PIF. As a result, existing methods cannot fully exploit the critical signals contained in PIF. Given this inherent limitation of RNNs, we propose a simple item frequency based k-nearest neighbors (kNN) method to directly utilize these critical signals. We evaluate our method on four public real-world datasets. Despite its relative simplicity, our method frequently outperforms the state-of-the-art NBR methods -- including deep learning based methods using RNNs -- when patterns associated with PIF play an important role in the data.
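A compact rendering of the frequency-based kNN idea: represent each user by their PIF vector, find similar users by cosine similarity, and score items by blending the user's own frequencies with the neighbors' (the blending weight and scoring details below are assumptions; the paper's exact formulation may differ):

.. code-block:: python

    import numpy as np

    def recommend(pif, user, k=3, alpha=0.7, top_n=5):
        """pif: (n_users, n_items) matrix of per-user purchase counts."""
        normed = pif / (np.linalg.norm(pif, axis=1, keepdims=True) + 1e-9)
        sims = normed @ normed[user]            # cosine similarity to all users
        sims[user] = -np.inf                    # exclude the user themselves
        neighbors = np.argsort(-sims)[:k]
        scores = alpha * pif[user] + (1 - alpha) * pif[neighbors].mean(axis=0)
        return np.argsort(-scores)[:top_n]      # top-n item ids for the basket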
Abstract: The rapid proliferation of new users and items on the social web has aggravated the gray-sheep user/long-tail item challenge in recommender systems. Historically, cross-domain co-clustering methods have successfully leveraged shared users and items across dense and sparse domains to improve inference quality. However, they rely on shared rating data and cannot scale to multiple sparse target domains (i.e., the one-to-many transfer setting). This, combined with the increasing adoption of neural recommender architectures, motivates us to develop scalable neural layer-transfer approaches for cross-domain learning. Our key intuition is to guide neural collaborative filtering with domain-invariant components shared across the dense and sparse domains, improving the user and item representations learned in the sparse domains. We leverage contextual invariances across domains to develop these shared modules, and demonstrate that with user-item interaction context, we can learn-to-learn informative representation spaces even with sparse interaction data. We show the effectiveness and scalability of our approach on two public datasets and a massive transaction dataset from Visa, a global payments technology company (19% Item Recall, 3x faster vs. training separate models for each domain). Our approach is applicable to both implicit and explicit feedback settings.
Abstract: Session-based recommendation (SR) has become an important and popular component of various e-commerce platforms, which aims to predict the next interacted item based on a given session. Most existing SR models only focus on exploiting the consecutive items in a session interacted by a certain user, to capture the transition pattern among the items. Although some of them have been proven effective, the following two insights are often neglected. First, a user's micro-behaviors, such as the manner in which the user locates an item and the activities that the user performs on an item (e.g., reading comments, adding to cart), offer a fine-grained and deep understanding of the user's preferences. Second, item attributes, also known as item knowledge, provide side information to model the transition pattern among interacted items and alleviate the data sparsity problem. These insights motivate us to propose a novel SR model, MKM-SR, which incorporates user Micro-behaviors and item Knowledge into Multi-task learning for Session-based Recommendation. Specifically, a given session is modeled at the micro-behavior level in MKM-SR, i.e., with a sequence of item-operation pairs rather than a sequence of items, to sufficiently capture the transition pattern in the session. Furthermore, we propose a multi-task learning paradigm that involves learning knowledge embeddings, which serves as an auxiliary task to promote the major SR task. It enables our model to obtain better session representations, resulting in more precise recommendation results. Extensive evaluations on two benchmark datasets demonstrate MKM-SR's superiority over state-of-the-art SR models, justifying the strategy of incorporating knowledge learning.
Abstract: There is increasing attention on next-item recommendation systems, which infer dynamic user preferences from sequential user interactions. While the semantics of an item can change over time and across users, the item correlations defined by user interactions in the short term can be distilled to capture such change and help uncover dynamic user preferences. Thus, we are motivated to develop a novel next-item recommendation framework empowered by sequential hypergraphs. Specifically, the framework: (i) adopts a hypergraph to represent the short-term item correlations and applies multiple convolutional layers to capture multi-order connections in the hypergraph; (ii) models the connections between different time periods with a residual gating layer; and (iii) is equipped with a fusion layer to incorporate both the dynamic item embedding and short-term user intent into the representation of each interaction before feeding it into the self-attention layer for dynamic user modeling. Through experiments on datasets from the e-commerce sites Amazon and Etsy and the information sharing platform Goodreads, the proposed model significantly outperforms the state-of-the-art in predicting the next interesting item for each user.
Abstract: The key to personalized search is to clarify the meaning of the current query based on the user's search history. Previous studies on personalization tried to build user profiles on the basis of historical data to tailor the ranking. However, we argue that user profile based methods do not really disambiguate the current query; they still retain some semantic bias when building user profiles. In this paper, we propose to encode history with context-aware representation learning to enhance the representation of the current query, which is a direct way to clarify the user's information need. Specifically, benefiting from the transformer architecture's ability to aggregate contextual information, we devise a query disambiguation model to parse the meaning of the current query in multiple stages. Moreover, to cover the cases where the current query is not sufficient to express the intent, we train a personalized language model to predict user intent from existing queries. Through the interaction of the two sub-models, we can generate the context-aware representation of the current query and re-rank the results based on it. Experimental results show that our model significantly improves over previous methods.
Abstract: The cold start problem is a long-standing challenge in recommender systems: how can we recommend for new users and new items without any historical interaction records? Recent ML-based approaches have made promising strides over traditional methods. These ML approaches typically combine the user-item interaction data of existing warm start users and items (as in CF-based methods) with auxiliary information about users and items, such as user profiles and item content information (as in content-based methods). However, such approaches face key drawbacks, including the error superimposition issue, where the auxiliary-to-CF transformation error increases the final recommendation error; the ineffective learning issue, where the long distance from the transformation functions to the model output layer hampers effective model learning; and the unified transformation issue, where applying the same transformation function to different users and items results in poor transformations. Hence, this paper proposes a novel model designed to overcome these drawbacks while delivering strong cold start performance. Its three unique features are: (i) a combined separate-training and joint-training framework to overcome the error superimposition issue and improve model quality; (ii) a Randomized Training mechanism to promote the effectiveness of model learning; and (iii) a Mixture-of-Experts Transformation mechanism to provide 'personalized' transformation functions. Extensive experiments on three datasets show the effectiveness of the proposed model over state-of-the-art alternatives.
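Feature (iii) above, the Mixture-of-Experts Transformation, can be sketched as a small gated layer: several expert transforms are mixed per input, so each user or item effectively gets its own transformation (sizes and the gating form are illustrative assumptions):

.. code-block:: python

    import torch
    import torch.nn as nn

    class MoETransform(nn.Module):
        def __init__(self, in_dim=32, out_dim=64, n_experts=4):
            super().__init__()
            self.experts = nn.ModuleList(
                [nn.Linear(in_dim, out_dim) for _ in range(n_experts)])
            self.gate = nn.Linear(in_dim, n_experts)

        def forward(self, x):                                # x: (batch, in_dim)
            weights = torch.softmax(self.gate(x), dim=-1)    # per-input mixing
            outs = torch.stack([e(x) for e in self.experts], dim=1)
            return (weights.unsqueeze(-1) * outs).sum(dim=1)

    transformed = MoETransform()(torch.randn(8, 32))          # (8, 64)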
Abstract: Asking clarifying questions in response to ambiguous or faceted queries has been recognized as a useful technique for various information retrieval systems, especially conversational search systems with limited bandwidth interfaces. Analyzing and generating clarifying questions have been studied recently but the accurate utilization of user responses to clarifying questions has been relatively less explored. In this paper, we enrich the representations learned by Transformer networks using a novel attention mechanism from external information sources that weights each term in the conversation. We evaluate this Guided Transformer model in a conversational search scenario that includes clarifying questions. In our experiments, we use two separate external sources, including the top retrieved documents and a set of different possible clarifying questions for the query. We implement the proposed representation learning model for two downstream tasks in conversational search; document retrieval and next clarifying question selection. Our experiments use a public dataset for search clarification and demonstrate significant improvements compared to competitive baselines.
Abstract: How users think, behave, and make decisions when interacting with information retrieval (IR) systems is a fundamental research problem in the area of interactive IR. There is substantial evidence from behavioral economics and the decision sciences demonstrating that in the context of decision-making under uncertainty, the carriers of value behind actions are gains and losses defined relative to a reference point, rather than the absolute final outcomes. This reference dependence effect is a systematic cognitive bias that has largely been ignored by formal interaction models built upon a series of unrealistic assumptions of user rationality. To address this gap, our work seeks to 1) understand the effects of reference points on search behavior and satisfaction at both the query and session levels; and 2) apply knowledge of reference dependence in predicting users' search decisions and variations in their level of satisfaction. Based on our experiments on three datasets collected from 1,840 task-based search sessions (5,225 query segments), we found that: 1) users' search satisfaction and many aspects of their search behaviors and decisions are significantly associated with relative gains, losses and the associated reference points; 2) users' judgments of session-level satisfaction are significantly affected by peak and end reference moments; and 3) compared to final-outcome-based baselines, models employing gain- and loss-based features often achieve significantly better performance in predicting search decisions and user satisfaction. Adopting a behavioral economics perspective enables us to leverage interdisciplinary insights in advancing IR research, and to increase the explanatory power of formal search models by providing them with a more realistic behavioral and psychological foundation.
Abstract: Today's conversational agents often generate responses that are not sufficiently informative. One way of making them more informative is through the use of external knowledge sources, in so-called Knowledge-Grounded Conversations (KGCs). In this paper, we target the Knowledge Selection (KS) task, a key ingredient in KGC, which is aimed at selecting the appropriate knowledge to be used in the next response. Existing approaches to KS are based on learned representations of the conversation context, that is, previous conversation turns, and use Maximum Likelihood Estimation (MLE) to optimize KS. Such approaches have two main limitations. First, they do not explicitly track what knowledge has been used in the conversation nor how topics have shifted during the conversation. Second, MLE often relies on a limited set of example conversations for training, from which it is hard to infer that facts retrieved from the knowledge source can be re-used in multiple conversation contexts, and vice versa. We propose the Dual Knowledge Interaction Network (DukeNet), a framework to address these challenges. DukeNet explicitly models knowledge tracking and knowledge shifting as dual tasks. We also design Dual Knowledge Interaction Learning (DukeL), an unsupervised learning scheme to train DukeNet by facilitating interactions between knowledge tracking and knowledge shifting, which, in turn, enables DukeNet to explore extra knowledge besides the knowledge encountered in the training set. This dual process also allows us to define rewards that help us optimize both knowledge tracking and knowledge shifting. Experimental results on two public KGC benchmarks show that DukeNet significantly outperforms state-of-the-art methods in terms of both automatic and human evaluations, indicating that DukeNet, enhanced by DukeL, can select more appropriate knowledge and hence generate more informative and engaging responses.
Abstract: For social bots, smooth emotional transitions are essential for delivering a genuine conversation experience to users. Yet, the task is challenging because emotion is too implicit and complicated to understand. Previous studies on retrieval-based conversational models only consider the semantic and functional dependencies of utterances. In this paper, to implement a more empathetic retrieval-based conversation system, we incorporate emotional factors into context-response matching from two aspects: 1) on top of semantic matching, we propose an emotion-aware transition network to model the dynamic emotional flow and enhance context-response matching in retrieval-based dialogue systems with learnt intrinsic emotion features, through a multi-task learning framework; 2) we design several flexible controlling mechanisms to customize social bots in terms of emotion. Extensive experiments on two benchmark datasets indicate that the proposed model can effectively track the flow of emotions throughout a human-machine conversation and significantly improve response selection in dialogues over the state-of-the-art baselines. We also empirically validate the emotion-control effects of our proposed model on three different emotional aspects. Finally, we apply these functionalities to a real IoT application.
Abstract: Past work in information-seeking conversation has demonstrated that people exhibit different conversational styles---for example, in word choice or prosody---that differences in style lead to poorer conversations, and that partners actively align their styles over time. One might assume that this would also be true for conversations with an artificial agent such as Cortana, Siri, or Alexa, and that agents should therefore track and mimic a user's style. We examine this hypothesis with reference to a lab study, where 24 participants carried out relatively long information-seeking tasks with an embodied conversational agent. The agent combined topical language models with a conversational dialogue engine, style recognition and alignment modules. We see that "style" can be measured in human-to-agent conversation, although it looks somewhat different from style in human-to-human conversation and does not correlate with self-reported preferences. There is evidence that people align their style to the agent, and that conversations run more smoothly if the agent detects, and aligns to, the human's style as well.
Abstract: Asking clarifying questions in response to search queries has been recognized as a useful technique for revealing the underlying intent of the query. Clarification has applications in retrieval systems with different interfaces, from traditional web search interfaces to limited bandwidth interfaces as in speech-only and small screen devices. Generation and evaluation of clarifying questions have recently been studied in the literature. However, user interaction with clarifying questions is relatively unexplored. In this paper, we conduct a comprehensive study by analyzing large-scale user interactions with clarifying questions in a major web search engine. In more detail, we analyze the user engagement received by clarifying questions based on different properties of the search queries, the clarifying questions, and their candidate answers. We further study click bias in the data, and show that even though reading clarifying questions and candidate answers does not take significant effort, there still exist some position and presentation biases in the data. We also propose a model for learning representations of clarifying questions based on the user interaction data as implicit feedback. The model is used for re-ranking a number of automatically generated clarifying questions for a given query. Evaluation on both click data and human labeled data demonstrates the high quality of the proposed method.
Abstract: Acquiring accurate summarization and sentiment from user reviews is an essential component of modern e-commerce platforms. Review summarization aims at generating a concise summary that describes the key opinions and sentiment of a review, while sentiment classification aims to predict a sentiment label indicating the sentiment attitude of a review. To effectively leverage the shared sentiment information in both the review summarization and sentiment classification tasks, we propose a novel dual-view model that jointly improves the performance of these two tasks. In our model, an encoder first learns a context representation for the review, then a summary decoder generates a review summary word by word. After that, a source-view sentiment classifier uses the encoded context representation to predict a sentiment label for the review, while a summary-view sentiment classifier uses the decoder hidden states to predict a sentiment label for the generated summary. During training, we introduce an inconsistency loss to penalize the disagreement between these two classifiers. It helps the decoder generate summaries whose sentiment is consistent with the review, and also helps the two sentiment classifiers learn from each other. Experimental results on four real-world datasets from different domains demonstrate the effectiveness of our model.
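One plausible form of the inconsistency loss described above is a divergence between the two classifiers' predicted sentiment distributions (the paper's exact formulation may differ; this is a sketch):

.. code-block:: python

    import torch
    import torch.nn.functional as F

    def inconsistency_loss(source_logits, summary_logits):
        """Penalize disagreement between source-view and summary-view
        sentiment predictions via KL divergence."""
        log_p_src = F.log_softmax(source_logits, dim=-1)
        p_sum = F.softmax(summary_logits, dim=-1)
        return F.kl_div(log_p_src, p_sum, reduction="batchmean")

    loss = inconsistency_loss(torch.randn(4, 3), torch.randn(4, 3))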
Abstract: Text classification in low-resource languages (e.g., Thai) is of great practical value for some information retrieval applications (e.g., sentiment-analysis-based restaurant recommendation). Due to the lack of a large-scale corpus for learning comprehensive text representations, bilingual text classification, which borrows linguistic knowledge from a rich-resource language, becomes a promising solution. Despite the success of bilingual methods, they largely ignore another source of semantic information---the writing system. Noting that most low-resource languages are phonographic languages, we argue that a logographic language (e.g., Chinese) can provide helpful information for improving text classification in some phonographic languages, since a logographic character (i.e., logogram) can represent a sememe or a whole concept, not only a phoneme or a sound. In this paper, by using both a phonographic labeled corpus and its machine-translated logographic corpus, we devise a framework to explore the central theme of utilizing logograms as a "semantic detection assistant". Specifically, from the logographic labeled corpus, we first devise a statistical-significance-based module to pick out informative text pieces. To represent them and further reduce the effects of translation errors, our approach is equipped with Gaussian embeddings whose covariances serve as reliable signals of translation errors. For a test document, all seeds' Gaussian representations are used to convolute the document and produce a logographic embedding, before being fused with its phonographic embedding for the final prediction. Extensive experiments validate the effectiveness of our approach, and further investigations show its generalizability and robustness.
Abstract: With the increasing availability of videos, how to edit them and present the most interesting parts to users, i.e., video highlights, has become an urgent need with many broad applications. As users' visual preferences are subjective and vary from person to person, previous generalized video highlight extraction models fail to tailor to users' unique preferences. In this paper, we study the problem of personalized video highlight recommendation with rich visual content. By dividing each video into non-overlapping segments, we formulate the problem as a personalized segment recommendation task with many new segments in the test stage. The key challenges of this problem lie in: cold-start users with limited video highlight records in the training data, and new segments without any user ratings at the test stage. To tackle these challenges, an intuitive idea is to formulate a user-item interaction graph and apply inductive graph neural network based models for better user and item embedding learning. However, such graph embedding models fail to generalize to unseen items, as they rely on item content features and item link information for item embedding calculation. To this end, we propose an inductive Graph based Transfer learning framework for personalized video highlight Recommendation (TransGRec). TransGRec is composed of two parts: a graph neural network followed by an item embedding transfer network. Specifically, the graph neural network part exploits the higher-order proximity between users and segments to alleviate the user cold-start problem. The transfer network is designed to approximate the learned item embeddings from the graph neural network by taking each item's visual content as input, in order to tackle the new segment problem in the test phase. We design two detailed implementations of the transfer learning optimization function and show how the two parts of TransGRec can be efficiently optimized under them. Note that our proposed framework is generally applicable to any inductive graph based recommendation model to address the new node problem without any link structure. Finally, extensive experimental results on a real-world dataset clearly show the effectiveness of our proposed model.
Abstract: While product recommendation algorithms on the Web are well-supported by a vast amount of interaction data, the same is not true on Voice. A promising approach to mitigate the issue is transfer learning, i.e., transferring the knowledge of customers' shopping behaviors learned from their shopping activities on the Web to Voice. Such a Web-to-Voice transfer is challenging due to customers' distinct shopping behaviors on Voice: customers are inclined to purchase more low-consideration products and are more likely to purchase certain products repeatedly. This paper presents TransV, a novel Web-to-Voice neural transfer network that allows for effective transfer of customers' shopping patterns from the Web to Voice, while taking into account customers' distinct purchase patterns on Voice. Our method extends the state-of-the-art self-attention neural architecture with a multi-level tri-factorization neural component, which allows it to explicitly capture the similarity and dissimilarity of customers' shopping patterns on the Web and Voice. To model repeated purchases, TransV adopts a recency-based copy mechanism that considers the impact of the recency of historical purchases on customers' repeated purchase behavior. Extensive validation on multiple real-world datasets, including two cross-platform datasets from Amazon.com and Amazon Alexa, shows that our method is able to improve voice-based recommendation substantially, by 26.8% compared with non-transfer learning methods.
Abstract: Document categorization, which aims to assign a topic label to each document, plays a fundamental role in a wide variety of applications. Despite the success of existing studies in conventional supervised document classification, they are less concerned with two real problems: (1) the presence of metadata: in many domains, text is accompanied by various additional information such as authors and tags. Such metadata serve as compelling topic indicators and should be leveraged in the categorization framework; (2) label scarcity: labeled training samples are expensive to obtain in some cases, where categorization needs to be performed using only a small set of annotated data. In recognition of these two challenges, we propose MetaCat, a minimally supervised framework to categorize text with metadata. Specifically, we develop a generative process describing the relationships between words, documents, labels, and metadata. Guided by the generative model, we embed text and metadata into the same semantic space to encode heterogeneous signals. Then, based on the same generative process, we synthesize training samples to address the bottleneck of label scarcity. We conduct a thorough evaluation on a wide range of datasets. Experimental results demonstrate the effectiveness of MetaCat over many competitive baselines.
Abstract: Aspect-based sentiment analysis is a substantial step towards text understanding which benefits numerous applications. Since most existing algorithms require a large amount of labeled data or substantial external language resources, applying them on a new domain or a new language is usually expensive and time-consuming. We aim to build an aspect-based sentiment analysis model from an unlabeled corpus with minimal guidance from users, i.e., only a small set of seed words for each aspect class and each sentiment class. We employ an autoencoder structure with attention to learn two dictionary matrices for aspect and sentiment respectively where each row of the dictionary serves as an embedding vector for an aspect or a sentiment class. We propose to utilize the user-given seed words to regularize the dictionary learning. In addition, we improve the model by joining the aspect and sentiment encoder in the reconstruction of sentiment in sentences. The joint structure enables sentiment embeddings in the dictionary to be tuned towards the aspect-specific sentiment words for each aspect, which benefits the classification performance. We conduct experiments on two real data sets to verify the effectiveness of our models.
Abstract: Cold-start problems are arguably the biggest challenges faced by collaborative filtering (CF) used in recommender systems. When few ratings are available, CF models typically fail to provide satisfactory recommendations for cold-start users or to display cold-start items on users' top-N recommendation lists. Data imputation has been a popular choice to deal with such problems in the context of CF, filling empty ratings with inferred scores. Different from (and complementary to) data imputation, this paper presents AR-CF, which stands for Augmented Reality CF, a novel framework for addressing the cold-start problems by generating virtual, but plausible, neighbors for cold-start users or items and augmenting the rating matrix with them as additional information for CF models. Notably, AR-CF not only directly tackles the cold-start problems, but is also effective in improving overall recommendation quality. Via extensive experiments on real-world datasets, AR-CF is shown to (1) significantly improve the accuracy of recommendations for cold-start users, (2) provide a meaningful number of cold-start items to display in users' top-N lists, and (3) also achieve the best accuracy in basic top-N recommendations, all in comparison with recent state-of-the-art methods.
Abstract: Studying competition and market structure at the product level instead of the brand level can provide firms with insights on cannibalization and product line optimization. However, it is computationally challenging to analyze product-level competition for the millions of products available on e-commerce platforms. We introduce Product2Vec, a method based on the representation learning algorithm Word2Vec, to study product-level competition when the number of products is large. The proposed model takes shopping baskets as inputs and, for every product, generates a low-dimensional embedding that preserves important product information. In order for the product embeddings to be useful for firms' strategic decision making, we leverage economic theories and causal inference to propose two modifications to Word2Vec. First, we create two measures, complementarity and exchangeability, that allow us to determine whether product pairs are complements or substitutes. Second, we combine these vectors with random-utility-based choice models to forecast demand. To accurately estimate price elasticities, i.e., how demand responds to changes in price, we modify Word2Vec by removing the influence of price from the product vectors. We show that, compared with state-of-the-art models, our approach is faster, and can produce more accurate demand forecasts and price elasticities.
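To make the two measures concrete, here is a toy sketch under our own assumptions (gensim 4.x attribute names; the price-debiasing modification is omitted): substitutes should have similar input vectors, while complements should pair one product's input vector with a high-affinity output (context) vector of the other.

.. code-block:: python

    import numpy as np
    from gensim.models import Word2Vec

    baskets = [
        ["hot_dogs", "buns", "ketchup"],
        ["hot_dogs", "buns", "mustard"],
        ["burgers", "buns", "ketchup"],
        ["pasta", "tomato_sauce", "parmesan"],
    ]
    model = Word2Vec(sentences=baskets, vector_size=16, window=5,
                     min_count=1, sg=1, negative=5, epochs=200, seed=0)

    def exchangeability(a: str, b: str) -> float:
        # Substitutes occur in similar contexts -> similar input vectors.
        return float(model.wv.similarity(a, b))

    def complementarity(a: str, b: str) -> float:
        # Complements co-occur -> input vector of `a` aligns with the
        # output (context) vector of `b` learned by negative sampling.
        va = model.wv[a]
        ub = model.syn1neg[model.wv.key_to_index[b]]
        return float(va @ ub / (np.linalg.norm(va) * np.linalg.norm(ub) + 1e-9))

    print(exchangeability("ketchup", "mustard"))  # substitutes: high
    print(complementarity("hot_dogs", "buns"))    # complements: high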
Abstract: Providing explanations for recommended items not only allows users to understand the reason for receiving recommendations but also provides users with an opportunity to refine recommendations by critiquing undesired parts of the explanation. While much research focuses on improving the explanation of recommendations, less effort has focused on interactive recommendation by allowing a user to critique explanations. Aside from traditional constraint- and utility-based critiquing systems, the only end-to-end deep learning based critiquing approach in the literature so far, CE-VNCF, suffers from unstable and inefficient training performance. In this paper, we propose a Variational Autoencoder (VAE) based critiquing system to mitigate these issues and improve overall performance. The proposed model generates keyphrase-based explanations of recommendations and allows users to critique the generated explanations to refine their personalized recommendations. Our experiments show promising results: (1) The proposed model is competitive in terms of general performance in comparison to state-of-the-art recommenders, despite having an augmented loss function to support explanation and critiquing. (2) The proposed model can generate high-quality explanations compared to user or item keyphrase popularity baselines. (3) The proposed model is more effective in refining recommendations based on critiquing than CE-VNCF, where the rank of critiquing-affected items drops while general recommendation performance remains stable. In summary, this paper presents a significantly improved method for multi-step deep critiquing based recommender systems based on the VAE framework.
Abstract: We study the problem of making item recommendations to ephemeral groups, which comprise users with limited or no historical activities together. Existing studies target persistent groups with substantial activity history, while ephemeral groups lack historical interactions. To overcome group interaction sparsity, we propose data-driven regularization strategies to exploit both the preference covariance amongst users who are in the same group, as well as the contextual relevance of users' individual preferences to each group. We make two contributions. First, we present a recommender architecture-agnostic framework GroupIM that can integrate arbitrary neural preference encoders and aggregators for ephemeral group recommendation. Second, we regularize the user-group latent space to overcome group interaction sparsity by: maximizing mutual information between representations of groups and group members; and dynamically prioritizing the preferences of highly informative members through contextual preference weighting. Our experimental results on several real-world datasets indicate significant performance improvements (31-62% relative NDCG@20) over state-of-the-art group recommendation techniques.
Abstract: Personalized recommendation plays an important role in many online services. Substantial research has been dedicated to learning embeddings of users and items to predict a user's preference for an item based on the similarity of the representations. In many settings, there is abundant relationship information, including user-item interaction history, user-user and item-item similarities. In an attempt to exploit these relationships to learn better embeddings, researchers have turned to the emerging field of Graph Convolutional Neural Networks (GCNs) and applied GCNs for recommendation. Although these prior works have demonstrated promising performance, directly applying GCNs to process the user-item bipartite graph is suboptimal because the GCNs do not consider the intrinsic differences between user nodes and item nodes. Additionally, existing large-scale graph neural networks use aggregation functions such as sum/mean/max pooling operations to generate a node embedding that considers the node's neighborhood (i.e., the adjacent nodes in the graph), and these simple aggregation strategies fail to preserve the relational information in the neighborhood. To resolve the above limitations, in this paper, we propose a novel framework NIA-GCN, which can explicitly model the relational information between neighbor nodes and exploit the heterogeneous nature of the user-item bipartite graph. We conduct empirical studies on four public benchmarks, demonstrating a significant improvement over state-of-the-art approaches. Furthermore, we generalize our framework to a commercial App store recommendation scenario. We observe significant improvement on a large-scale commercial dataset, demonstrating the practical potential of our proposed solution as a key component of a large-scale commercial recommender system. Finally, online experiments demonstrate that NIA-GCN outperforms the baseline by 10.19% and 9.95% on average in terms of CTR and CVR during a ten-day A/B test in a mainstream App store.
Abstract: Sequential recommender systems (SRS) have become a key technology in capturing users' dynamic interests and generating high-quality recommendations. Current state-of-the-art sequential recommender models are typically based on a sandwich-structured deep neural network, where one or more middle (hidden) layers are placed between the input embedding layer and the output softmax layer. In general, these models require a large number of parameters to obtain optimal performance. Despite their effectiveness, at some point further increasing the model size makes deployment on resource-constrained devices harder. To resolve these issues, we propose a compressed sequential recommendation framework, termed CpRec, in which two generic model shrinking techniques are employed. Specifically, we first propose a block-wise adaptive decomposition to approximate the input and softmax matrices by exploiting the fact that items in SRS obey a long-tailed distribution. To reduce the parameters of the middle layers, we introduce three layer-wise parameter sharing schemes. We instantiate CpRec using a deep convolutional neural network with dilated kernels, considering both recommendation accuracy and efficiency. Through extensive ablation studies, we demonstrate that the proposed CpRec can achieve up to 4-8 times compression rates on real-world SRS datasets. Meanwhile, CpRec is faster during training and inference, and in most cases outperforms its uncompressed counterpart.
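The block-wise decomposition can be pictured as frequency-bucketed factorized embeddings: frequent (head) items keep a full-width embedding, while rarer blocks get a narrower one plus a projection up to the shared model dimension. A sketch under our own assumptions (block boundaries and widths are invented; the paper's exact scheme may differ):

.. code-block:: python

    import torch
    import torch.nn as nn

    class BlockwiseEmbedding(nn.Module):
        def __init__(self, block_sizes=(10_000, 90_000, 900_000),
                     block_dims=(128, 32, 8), model_dim=128):
            super().__init__()
            bounds = [0]
            for n in block_sizes:
                bounds.append(bounds[-1] + n)
            self.register_buffer("offsets", torch.tensor(bounds))
            self.model_dim = model_dim
            # One embedding table per frequency block, narrower for tail items.
            self.embeds = nn.ModuleList(
                nn.Embedding(n, d) for n, d in zip(block_sizes, block_dims))
            # Project narrow blocks up to the shared model dimension.
            self.projs = nn.ModuleList(
                nn.Identity() if d == model_dim else nn.Linear(d, model_dim, bias=False)
                for d in block_dims)

        def forward(self, item_ids: torch.Tensor) -> torch.Tensor:
            out = torch.zeros(*item_ids.shape, self.model_dim,
                              device=item_ids.device)
            for k, (emb, proj) in enumerate(zip(self.embeds, self.projs)):
                mask = (item_ids >= self.offsets[k]) & (item_ids < self.offsets[k + 1])
                if mask.any():
                    out[mask] = proj(emb(item_ids[mask] - self.offsets[k]))
            return out

    emb = BlockwiseEmbedding()
    vecs = emb(torch.tensor([[5, 15_000, 500_000]]))  # head, torso, tail items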
Abstract: Tracking mouse cursor movements can be used to predict user attention on heterogeneous page layouts like SERPs. So far, previous work has relied heavily on handcrafted features, which is a time-consuming approach that often requires domain expertise. We investigate different representations of mouse cursor movements, including time series, heatmaps, and trajectory-based images, to build and contrast both recurrent and convolutional neural networks that can predict user attention to direct displays, such as SERP advertisements. Our models are trained over raw mouse cursor data and achieve competitive performance. We conclude that neural network models should be adopted for downstream tasks involving mouse cursor movements, since they can provide an invaluable implicit feedback signal for re-ranking and evaluation.
Abstract: The importance of e-commerce platforms has driven forward a growing body of research work on e-commerce search. We present the first large-scale and in-depth study of query reformulations performed by users of e-commerce search; the study is based on the query logs of eBay's search engine. We analyze various factors including the distribution of different types of reformulations, changes of search result pages retrieved for the reformulations, and clicks and purchases performed upon the retrieved results. We then turn to address a novel challenge in the e-commerce search realm: predicting whether a user will reformulate her query before presenting her the search results. Using a suite of prediction features, most of which are novel to this study, we attain high prediction quality. Some of the features operate prior to retrieval time, whereas others rely on the retrieved results. While the latter are substantially more effective than the former, we show that the integration of these two types of features is of merit. We also show that high prediction quality can be obtained without considering information from the past about the user or the query she posted. Nevertheless, using these types of information can further improve prediction quality.
Abstract: Finding images matching a user's intention has largely been based on matching a representation of the user's information need against an existing collection of images; for example, using an example image or a written query to express the information need and retrieving images that share similarities with the query or example image. However, such an approach is limited to retrieving only images that already exist in the underlying collection. Here, we present a methodology for generating images matching the user intention instead of retrieving them. The methodology utilizes a relevance feedback loop between a user and generative adversarial neural networks (GANs). GANs can generate novel photorealistic images which are initially not present in the underlying collection, but are generated in response to user feedback. We report experiments (N=29) where participants generate images in four different domains and for various search goals with textual and image targets. The results show that the generated images match the tasks and outperform images selected as baselines from a fixed image collection. Our results demonstrate that generating new information can be more useful for users than retrieving it from a collection of existing information.
Abstract: The rapid growth of user-generated videos on the Internet has intensified the need for text-based video retrieval systems. Traditional methods mainly favor the concept-based paradigm for retrieval with simple queries, and are usually ineffective for complex queries that carry far richer semantics. Recently, the embedding-based paradigm has emerged as a popular approach. It aims to map queries and videos into a shared embedding space where semantically similar texts and videos are much closer to each other. Despite its simplicity, it forgoes the exploitation of the syntactic structure of text queries, making it suboptimal for modeling complex queries. To facilitate video retrieval with complex queries, we propose a Tree-augmented Cross-modal Encoding method that jointly learns the linguistic structure of queries and the temporal representation of videos. Specifically, given a complex user query, we first recursively compose a latent semantic tree to structurally describe the text query. We then design a tree-augmented query encoder to derive a structure-aware query representation and a temporal attentive video encoder to model the temporal characteristics of videos. Finally, both the query and the videos are mapped into a joint embedding space for matching and ranking. This approach yields a better understanding and modeling of complex queries, thereby achieving better video retrieval performance. Extensive experiments on large-scale video retrieval benchmark datasets demonstrate the effectiveness of our approach.
Abstract: Hashing techniques have recently been successfully applied to solve similarity search problems in the information retrieval field because of their significantly reduced storage and high-speed search capabilities. However, the hash codes learned by most recent cross-modal hashing methods lack the ability to comprehensively preserve adequate information, resulting in less than desirable performance. To address this limitation, we propose a novel method termed Nonlinear Robust Discrete Hashing (NRDH) for cross-modal retrieval. The main idea behind NRDH is motivated by the success of neural networks, i.e., nonlinear descriptors, in the field of representation learning: using nonlinear descriptors instead of simple linear transformations is more in line with the complex relationships that exist between the common latent representation and heterogeneous multimedia data in the real world. In NRDH, we first learn a common latent representation through nonlinear descriptors to encode complementary and consistent information from the features of the heterogeneous multimedia data. Moreover, an asymmetric learning scheme is proposed to correlate the learned hash codes with the common latent representation. Empirically, we demonstrate that NRDH is able to successfully generate a comprehensive common latent representation that significantly improves the quality of the learned hash codes. Then, NRDH adopts a linear learning strategy to quickly learn the hash function with the learned hash codes. Extensive experiments performed on two benchmark datasets highlight the superiority of NRDH over several state-of-the-art methods.
Abstract: Personalized search is the task of tailoring the general document ranking list based on user interests to better satisfy the user's information need. Many personalized search models have been proposed and have demonstrated their capability to improve search quality. The general idea of most approaches is to build a user interest profile according to the user's search history, and then re-rank the documents based on the matching scores between the created user profile and candidate documents. In this paper, we propose to solve the problem of personalized search in an alternative way. There are many ambiguous words in natural language, such as 'Apple', and people with different knowledge backgrounds and interests have personalized understandings of these words. Therefore, for different users, such a word should have different semantic representations. Motivated by this idea, we design a personalized search model based on personal word embeddings, referred to as PEPS. Specifically, we train personal word embeddings for each user, in which the representation of each word is mainly decided by the user's personal data. Then, we obtain personalized word and contextual representations of the query and documents with an attention function. Finally, we use a matching model to calculate the matching score between the personalized query and document representations. Experiments on two datasets verify that our model can significantly improve over state-of-the-art personalization models.
Abstract: Voice shopping using natural language introduces new challenges related to customer queries, like handling mispronounced, misexpressed, and misunderstood queries. Voice null queries, which result in no offers, have a negative impact on customers' shopping experience. Query rewriting (QR) attempts to automatically replace null queries with alternatives that lead to relevant results. We present a new approach for pre-retrieval QR of voice shopping null queries. Our proposed QR framework first generates alternative queries using a search index-based approach that targets different potential failures in voice queries. Then, a machine-learning component ranks these alternatives, and the original query is amended by the selected alternative. We provide an experimental evaluation of our approach based on data logs of a commercial voice assistant and an e-commerce website, demonstrating that it outperforms several baselines by more than 22%. Our evaluation also highlights an interesting phenomenon, showing that web shopping null queries are considerably different, and apparently easier to fix, than voice queries. This further substantiates the use of specialized mechanisms for the voice domain. We believe that our proposed framework, mapping tail queries to head queries, is of independent interest since it can be extended and applied to other domains.
Abstract: Hashing-based cross-modal search, which aims to map multiple modality features into binary codes, has attracted increasing attention due to its storage and search efficiency, especially in large-scale database retrieval. Recent unsupervised deep cross-modal hashing methods have shown promising results. However, existing approaches typically suffer from two limitations: (1) They usually learn cross-modal similarity information separately or in a redundant fusion manner, which may fail to capture semantic correlations among instances from different modalities sufficiently and effectively. (2) They seldom consider sampling and weighting schemes for unsupervised cross-modal hashing, resulting in a lack of satisfactory discriminative ability in the hash codes. To overcome these limitations, we propose a novel unsupervised deep cross-modal hashing method called Joint-modal Distribution-based Similarity Hashing (JDSH) for large-scale cross-modal retrieval. First, we propose a novel cross-modal joint-training method that constructs a joint-modal similarity matrix to fully preserve the cross-modal semantic correlations among instances. Second, we propose a sampling and weighting scheme termed the Distribution-based Similarity Decision and Weighting (DSDW) method for unsupervised cross-modal hashing, which is able to generate more discriminative hash codes by pushing semantically similar instance pairs closer and pulling semantically dissimilar instance pairs apart. The experimental results demonstrate the superiority of JDSH compared with several unsupervised cross-modal hashing methods on two public datasets, NUS-WIDE and MIRFlickr.
Abstract: Image search engines rely on appropriately designed ranking features that capture various aspects of the content semantics as well as the historic popularity. In this work, we consider the role of colour in this relevance matching process. Our work is motivated by the observation that a significant fraction of user queries have an inherent colour associated with them. While some queries contain explicit colour mentions (such as 'black car' and 'yellow daisies'), other queries have implicit notions of colour (such as 'sky' and 'grass'). Furthermore, grounding queries in colour is not a mapping to a single colour, but a distribution in colour space. For instance, a search for 'trees' tends to have a bimodal distribution around the colours green and brown. We leverage historical clickthrough data to produce a colour representation for search queries and propose a recurrent neural network architecture to encode unseen queries into colour space. We also show how this embedding can be learnt alongside a cross-modal relevance ranker from impression logs where a subset of the result images were clicked. We demonstrate that the use of a query-image colour distance feature leads to an improvement in the ranker performance as measured by users' preferences of clicked versus skipped images.
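As a concrete rendering of the feature described above, here is an illustrative sketch (bin counts and the distance choice are our own, not the paper's): estimate a query's colour distribution from the histograms of its clicked images, then use a histogram distance to a candidate image as a ranking signal.

.. code-block:: python

    import numpy as np

    N_BINS = 8  # per RGB channel -> 512 colour bins in total

    def colour_histogram(image: np.ndarray) -> np.ndarray:
        """image: (H, W, 3) uint8 array -> normalized 512-bin histogram."""
        bins = image.astype(int) // (256 // N_BINS)
        flat = bins[..., 0] * N_BINS**2 + bins[..., 1] * N_BINS + bins[..., 2]
        hist = np.bincount(flat.ravel(), minlength=N_BINS**3).astype(float)
        return hist / hist.sum()

    def query_colour_distribution(clicked_images) -> np.ndarray:
        # Aggregate the clicked images' histograms into one distribution.
        return np.mean([colour_histogram(im) for im in clicked_images], axis=0)

    def colour_distance_feature(query_dist: np.ndarray,
                                candidate: np.ndarray) -> float:
        # Hellinger distance between the two colour distributions.
        h = colour_histogram(candidate)
        return float(np.sqrt(0.5 * ((np.sqrt(query_dist) - np.sqrt(h)) ** 2).sum()))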
Abstract: We address the web table retrieval task, aiming to retrieve and rank web tables as whole answers to a given information need. To this end, we formally define web tables as multimodal objects. We then suggest a neural ranking model, termed MTR, which makes a novel use of Gated Multimodal Units (GMUs) to learn a joint representation of the query and the different table modalities. We further enhance this model with a co-learning approach which utilizes automatically learned query-independent and query-dependent "helper" labels. We evaluate the proposed solution using both ad hoc queries (WikiTables) and natural language questions (GNQtables). Overall, we demonstrate that our approach surpasses the performance of previously studied state-of-the-art baselines.
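For reference, a Gated Multimodal Unit fuses two modality vectors with a learned, per-dimension gate. A compact PyTorch sketch follows; the dimensions are illustrative, and how MTR wires several table modalities together is not shown here:

.. code-block:: python

    import torch
    import torch.nn as nn

    class GatedMultimodalUnit(nn.Module):
        """Two-modality GMU: the gate z decides, per dimension, how much
        each modality contributes to the fused representation."""
        def __init__(self, dim_a: int, dim_b: int, dim_out: int):
            super().__init__()
            self.proj_a = nn.Linear(dim_a, dim_out)
            self.proj_b = nn.Linear(dim_b, dim_out)
            self.gate = nn.Linear(dim_a + dim_b, dim_out)

        def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
            ha = torch.tanh(self.proj_a(a))
            hb = torch.tanh(self.proj_b(b))
            z = torch.sigmoid(self.gate(torch.cat([a, b], dim=-1)))
            return z * ha + (1 - z) * hb

    # e.g., fuse an encoding of the table caption with one of the table body
    gmu = GatedMultimodalUnit(dim_a=768, dim_b=256, dim_out=128)
    fused = gmu(torch.randn(4, 768), torch.randn(4, 256))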
Abstract: Cross-modal hashing has been widely investigated recently for its efficiency in large-scale cross-media retrieval. However, most existing cross-modal hashing methods learn hash functions in a batch-based learning mode. This mode is not suitable for large-scale datasets due to its high memory consumption, and it loses efficiency when training on streaming data. Online cross-modal hashing can deal with the above problems by learning the hash model in an online learning process. However, existing online cross-modal hashing methods cannot update the hash codes of old data with the newly learned model. In this paper, we propose Online Collective Matrix Factorization Hashing (OCMFH) based on collective matrix factorization hashing (CMFH), which can adaptively update the hash codes of old data according to dynamic changes of the hash model without accessing the old data. Specifically, it learns discriminative hash codes for streaming data by collective matrix factorization in an online optimization scheme. Unlike conventional CMFH, which needs to load the entire dataset into memory, the proposed OCMFH retrains the hash functions using only newly arriving data points. Meanwhile, it generates hash codes for new data and updates the hash codes of old data with the latest updated hash model. In this way, the hash codes of new data and old data are well-matched. Furthermore, a zero-mean strategy is developed to solve the mean-varying problem in the online hash learning process. Extensive experiments on three benchmark datasets demonstrate the effectiveness and efficiency of OCMFH for online cross-media retrieval.
Abstract: The goal of cross-modal retrieval is to search for semantically similar instances in one modality by using a query from another modality. Existing approaches mainly consider the standard scenario, which requires that the source set used for training and the target set used for testing share the same scope of classes. However, they may not generalize well on the zero-shot cross-modal retrieval (ZS-CMR) task, where the target set contains unseen classes that are disjoint from the seen classes in the source set. This task is more challenging due to 1) the absence of the unseen classes during training, 2) inconsistent semantics across seen and unseen classes, and 3) the heterogeneous multimodal distributions between the source and target sets. To address these issues, we propose a novel Correlated Feature Synthesis and Alignment (CFSA) approach that integrates multimodal feature synthesis, common space learning, and knowledge transfer for ZS-CMR. Our CFSA first utilizes class-level word embeddings to guide two coupled Wasserstein generative adversarial networks (WGANs) to synthesize sufficient multimodal features with semantic correlation for stable training. Then the synthetic and true multimodal features are jointly mapped to a common semantic space via an effective distribution alignment scheme, where the cross-modal correlations of different semantic features are captured and the knowledge can be transferred to the unseen classes under the cycle-consistency constraint. Experiments on four benchmark datasets for image-text retrieval and two large-scale datasets for image-sketch retrieval show the remarkable improvements achieved by our CFSA method compared with a range of state-of-the-art approaches.
Abstract: With the increasing popularity of location-aware social media services, next-Point-of-Interest (POI) recommendation has gained significant research interest. The key challenge of next-POI recommendation is to precisely learn users' sequential movements from sparse check-in data. To this end, various embedding methods have been proposed to learn the representations of check-in data in Euclidean space. However, their ability to learn complex patterns, especially hierarchical structures, is limited by the dimensionality of the Euclidean space. Motivated by this limitation, we propose a new research direction that aims to learn the representations of check-in activities in a hyperbolic space, which yields two advantages. First, it can effectively capture the underlying hierarchical structures, which are implied by the power-law distributions of user movements. Second, it provides high representative strength and enables the check-in data to be effectively represented in a low-dimensional space. Specifically, to solve the next-POI recommendation task, we propose a novel hyperbolic metric embedding (HME) model, which projects the check-in data into a hyperbolic space. HME jointly captures sequential transitions, user preferences, and category and region information in a unified approach by learning embeddings in a shared hyperbolic space. To the best of our knowledge, this is the first study to explore a non-Euclidean embedding model for next-POI recommendation. We conduct extensive experiments on three check-in datasets to demonstrate the superiority of our hyperbolic embedding approach over state-of-the-art next-POI recommendation algorithms. Moreover, we conduct experiments on four additional online transaction datasets for next-item recommendation to further demonstrate the generality of our proposed model.
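The standard distance function for such models, assuming the Poincare ball model of hyperbolic space (whether HME uses this exact model is our assumption), can be written as follows:

.. code-block:: python

    import torch

    def poincare_distance(u: torch.Tensor, v: torch.Tensor,
                          eps: float = 1e-5) -> torch.Tensor:
        """Geodesic distance in the Poincare ball:
        d(u, v) = arcosh(1 + 2 * ||u - v||^2 / ((1 - ||u||^2) * (1 - ||v||^2)))
        Points near the boundary grow exponentially far apart, which is what
        lets a low-dimensional ball encode deep hierarchies."""
        alpha = torch.clamp(1 - (u * u).sum(-1), min=eps)
        beta = torch.clamp(1 - (v * v).sum(-1), min=eps)
        sq_dist = ((u - v) ** 2).sum(-1)
        x = 1 + 2 * sq_dist / (alpha * beta)
        return torch.acosh(torch.clamp(x, min=1 + eps))

    u = torch.tensor([0.1, 0.0])
    v = torch.tensor([0.0, 0.95])   # near the boundary of the unit ball
    print(poincare_distance(u, v))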
Abstract: Many sequential behaviors, such as purchasing items from time to time, selecting courses in different terms, and collecting event logs periodically, can be formalized as sequential sets of actions or elements, namely temporal sets. Predicting the subsequent set according to the historical sequence of sets can help us make better producing, scheduling, or operating decisions. However, most existing methods were designed for predicting time series or temporal events, and cannot be directly used for temporal sets prediction due to the difficulties of multi-level representations of items and sets, complex temporal dependencies of sets, and evolving dynamics of sequential behaviors. To address these issues, this paper proposes a novel sets prediction method, called DSNTSP (Dual Sequential Network for Temporal Sets Prediction). Our model first learns both item-level representations and set-level representations of set sequences separately based on a transformer framework. Then, a co-transformer module is proposed to capture the multiple temporal dependencies of items and sets. Finally, a gated neural module is designed to predict the subsequent set by fusing all the multi-level correlations and multiple temporal dependencies of items and sets. The experimental results on real-world datasets show that our method leads to significant and consistent improvements compared with other methods.
Abstract: Sequential recommendation and group recommendation are two important branches in the field of recommender system. While considerable efforts have been devoted to these two branches in an independent way, we combine them by proposing the novel sequential group recommendation problem which enables modeling group dynamic representations and is crucial for achieving better group recommendation performance. The major challenge of the problem is how to effectively learn dynamic group representations based on the sequential user-item interactions of group members in the past time frames. To address this, we devise a Group-aware Long- and Short-term Graph Representation Learning approach, namely GLS-GRL, for sequential group recommendation. Specifically, for a target group, we construct a group-aware long-term graph to capture user-item interactions and item-item co-occurrence in the whole history, and a group-aware short-term graph to contain the same information regarding only the current time frame. Based on the graphs, GLS-GRL performs graph representation learning to obtain long-term and short-term user representations, and further adaptively fuse them to gain integrated user representations. Finally, group representations are obtained by a constrained user-interacted attention mechanism which encodes the correlations between group members. Comprehensive experiments demonstrate that GLS-GRL achieves better performance than several strong alternatives coming from sequential recommendation and group recommendation methods, validating the effectiveness of the core components in GLS-GRL.
Abstract: Incorporating temporal information into recommender systems has recently attracted increasing attention from both the industrial and academic research communities. Existing methods mostly reduce the temporal information of behaviors to behavior sequences for subsequent RNN-based modeling. In such a simple manner, crucial time-related signals have been largely neglected. This paper aims to systematically investigate the effects of temporal information in sequential recommendations. In particular, we first identify two elementary temporal patterns of user behaviors: "absolute time patterns" and "relative time patterns", where the former highlights users' time-sensitive behaviors, e.g., people may frequently interact with specific products at certain time points, and the latter indicates how the time interval influences the relationship between two actions. To seamlessly incorporate this information into a unified model, we devise a neural architecture that jointly learns these temporal patterns to model users' dynamic preferences. Extensive experiments on real-world datasets demonstrate the superiority of our model compared with the state of the art.
Abstract: Inductive transfer learning has had a big impact on the computer vision and NLP domains, but has not been used in the area of recommender systems. Even though there has been a large body of research on generating recommendations based on modeling user-item interaction sequences, few of these works attempt to represent and transfer such models for serving downstream tasks where only limited data exists. In this paper, we address the task of effectively learning a single user representation that can be applied to a diversity of tasks, from cross-domain recommendations to user profile predictions. Fine-tuning a large pre-trained network and adapting it to downstream tasks is an effective way to solve such tasks. However, fine-tuning is parameter-inefficient, considering that an entire model needs to be re-trained for every new task. To overcome this issue, we develop a parameter-efficient transfer learning architecture, termed PeterRec, which can be configured on-the-fly to various downstream tasks. Specifically, PeterRec allows the pre-trained parameters to remain unaltered during fine-tuning by injecting a series of re-learned neural networks, which are small but as expressive as learning the entire network. We perform extensive ablation experiments to show the effectiveness of the learned user representation in five downstream tasks. Moreover, we show that PeterRec performs efficient transfer learning in multiple domains, where it achieves comparable or sometimes better performance relative to fine-tuning the entire model's parameters. Codes and datasets are available at https://github.com/fajieyuan/sigir2020_peterrec.
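The injected networks are in the spirit of bottleneck adapters: freeze the pre-trained blocks and train only small residual patches per task. A sketch under our assumptions (sizes and wiring are illustrative; see the paper and repository for the actual design):

.. code-block:: python

    import torch
    import torch.nn as nn

    class Adapter(nn.Module):
        """Small residual bottleneck trained per downstream task while the
        pre-trained weights stay frozen."""
        def __init__(self, dim: int, bottleneck: int = 16):
            super().__init__()
            self.down = nn.Linear(dim, bottleneck)
            self.up = nn.Linear(bottleneck, dim)
            nn.init.zeros_(self.up.weight)  # start as identity (residual = 0)
            nn.init.zeros_(self.up.bias)

        def forward(self, h: torch.Tensor) -> torch.Tensor:
            return h + self.up(torch.relu(self.down(h)))

    def patch_and_freeze(blocks: nn.ModuleList, dim: int) -> nn.ModuleList:
        # Pre-trained parameters remain unaltered; only adapters are trained.
        for p in blocks.parameters():
            p.requires_grad = False
        return nn.ModuleList(Adapter(dim) for _ in blocks)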
Abstract: Practical recommender systems need to be periodically retrained to refresh the model with new interaction data. To pursue high model fidelity, it is usually desirable to retrain the model on both historical and new data, since doing so can account for both long-term and short-term user preferences. However, a full model retraining could be very time-consuming and memory-costly, especially when the scale of historical data is large. In this work, we study the model retraining mechanism for recommender systems, a topic of high practical value that has been relatively little explored in the research community. Our first belief is that retraining the model on historical data is unnecessary, since the model has been trained on it before. Nevertheless, normal training on new data only may easily cause overfitting and forgetting issues, since the new data is of a smaller scale and contains less information about long-term user preferences. To address this dilemma, we propose a new training method that aims to abandon the historical data during retraining by learning to transfer the past training experience. Specifically, we design a neural network-based transfer component, which transforms the old model into a new model that is tailored for future recommendations. To learn the transfer component well, we optimize the "future performance" -- i.e., the recommendation accuracy evaluated in the next time period. Our Sequential Meta-Learning (SML) method offers a general training paradigm that is applicable to any differentiable model. We demonstrate SML on matrix factorization and conduct experiments on two real-world datasets. Empirical results show that SML not only achieves significant speed-ups, but also outperforms full model retraining in recommendation accuracy, validating the effectiveness of our proposals. We release our codes at: https://github.com/zyang1580/SML.
Abstract: Discrete-time event sequences from web users are commonly encountered in a variety of real-world domains, from digital marketing to recommender systems. Most of the literature in the area focuses on supervised prediction tasks for the user's next action or return time. However, little attention has been paid to the challenging task of modeling user evolution and behavior in a quantitative fashion, such as the progress of users' knowledge in an online tutorial, or user exploration and engagement. We propose methods to perform time-varying clustering along with event predictions in a unified, domain-agnostic framework. Our framework can help track the evolution of users as they interact with the platform. We evaluate our methods on three real-world datasets and show that our method performs on par with the supervised baseline on the prediction tasks while providing meaningful clusters.
Abstract: One popular thread of research in computational sarcasm detection involves modeling sarcasm as a contrast between positive and negative sentiment polarities or exploring more fine-grained categories of emotions such as happiness, sadness, surprise, and so on. Most current models, however, treat these affective features independently, without regard for the sequential information encoded among the affective states. In order to explore the role of transitions in affective states, we formulate the task of sarcasm detection as a sequence classification problem by leveraging the natural shifts in various emotions over the course of a piece of text. Experiments conducted on datasets from two different genres suggest that our proposed approach particularly benefits datasets with limited labeled data and longer instances of text.
Abstract: An accurate understanding of a user's query intent can help improve the performance of downstream tasks such as query scoping and ranking. In the e-commerce domain, recent work in query understanding focuses on query-to-product-category mapping. But a small yet significant percentage of queries (on our website, 1.5% or 33M queries in 2019) have non-commercial intent associated with them. These intents are usually associated with non-commercial information-seeking needs such as discounts, store hours, installation guides, etc. In this paper, we introduce Joint Query Intent Understanding (JointMap), a deep learning model that simultaneously learns two different high-level user intent tasks: 1) identifying a query's commercial vs. non-commercial intent, and 2) associating a set of relevant product categories in a taxonomy with a product query. The JointMap model works by leveraging the transfer bias that exists between these two related tasks through a joint-learning process. As curating a labeled dataset for these tasks can be expensive and time-consuming, we propose a distant supervision approach in conjunction with an active learning model to generate high-quality training datasets. To demonstrate the effectiveness of JointMap, we use search queries collected from a large commercial website. Our results show that JointMap significantly improves both "commercial vs. non-commercial" intent prediction and product category mapping, by 2.3% and 10% on average, over state-of-the-art deep learning methods. Our findings suggest a promising direction for modeling intent hierarchies in an e-commerce search engine.
Abstract: Work in information retrieval has traditionally focused on ranking and relevance: given a query, return some number of results ordered by relevance to the user. However, the problem of determining how many results to return, i.e., how to optimally truncate the ranked result list, has received less attention despite being of critical importance in a range of applications. Such truncation is a balancing act between the overall relevance, or usefulness, of the results and the user cost of processing more results. In this work, we propose Choppy, an assumption-free model based on the widely successful Transformer architecture, for the ranked list truncation problem. Needing nothing more than the relevance scores of the results, the model uses a powerful multi-head attention mechanism to directly optimize any user-defined IR metric. We show Choppy improves upon recent state-of-the-art methods.
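One way to read the approach, sketched under our own assumptions (the paper's exact architecture and loss may differ): encode the list of relevance scores with a Transformer, output a distribution over cut positions, and train against the expected value of the chosen metric.

.. code-block:: python

    import torch
    import torch.nn as nn

    class TruncationModel(nn.Module):
        def __init__(self, max_len: int = 300, d_model: int = 64):
            super().__init__()
            self.embed = nn.Linear(1, d_model)
            self.pos = nn.Parameter(torch.randn(max_len, d_model) * 0.02)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.head = nn.Linear(d_model, 1)

        def forward(self, scores: torch.Tensor) -> torch.Tensor:
            # scores: (batch, list_len) -> one logit per candidate cut position
            h = self.embed(scores.unsqueeze(-1)) + self.pos[: scores.size(1)]
            return self.head(self.encoder(h)).squeeze(-1)

    def expected_metric_loss(cut_logits: torch.Tensor,
                             metric_at_k: torch.Tensor) -> torch.Tensor:
        # metric_at_k[b, k] = value of the target IR metric (e.g. F1) if the
        # list is truncated after position k; maximize its expectation.
        p = torch.softmax(cut_logits, dim=-1)
        return -(p * metric_at_k).sum(-1).mean()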
Abstract: Cyber attacks are increasingly prevalent and cause significant damage to individuals, businesses, and even countries. In particular, ransomware attacks have grown significantly over the last decade. We conduct the first study on mining insights about ransomware attacks by analyzing query logs from the Bing web search engine. We first extract ransomware-related queries and then build a machine learning model to identify queries where users are seeking support for ransomware attacks. We show that user search behavior and characteristics are correlated with ransomware attacks. We also analyze trends in the temporal and geographical space and validate our findings against publicly available information. Lastly, we conduct a case study on 'Nemty', a popular ransomware, to show that it is possible to derive accurate insights about cyber attacks through query log analysis.
Abstract: Product search is an important way for people to browse and purchase items on e-commerce platforms. While customers tend to make choices based on their personal tastes and preferences, analysis of commercial product search logs has shown that personalization does not always improve product search quality. Most existing product search techniques, however, conduct undifferentiated personalization across search sessions. They either use a fixed coefficient to control the influence of personalization or let personalization take effect all the time through an attention mechanism. The only notable exception is the recently proposed zero-attention model (ZAM), which can adaptively adjust the effect of personalization by allowing the query to attend to a zero vector. Nonetheless, in ZAM, personalization can be at most as important as the query, and the representations of items are static across the collection regardless of the items co-occurring in the user's historical purchases. Aware of these limitations, we propose a transformer-based embedding model (TEM) for personalized product search, which can dynamically control the influence of personalization by encoding the sequence of the query and the user's purchase history with a transformer architecture. Personalization can have a dominant impact when necessary, and interactions between items can be taken into consideration when computing attention weights. Experimental results show that TEM outperforms state-of-the-art personalized product retrieval models significantly.
Abstract: Interactive Information Retrieval (IIR) and Reinforcement Learning (RL) share many commonalities, including an agent that learns while interacting, a long-term and complex goal, and an algorithm that explores and adapts. To successfully apply RL methods to IIR, one challenge is to obtain sufficient relevance labels to train the RL agents, which are notoriously sample-inefficient. However, in a text corpus annotated for a given query, it is not the relevant documents but the irrelevant documents that predominate. This causes very unbalanced training experiences for the agent and prevents it from learning any effective policy. Our paper addresses this issue by using domain randomization to synthesize more relevant documents for training. Our experimental results on the Text REtrieval Conference (TREC) Dynamic Domain (DD) 2017 Track show that the proposed method is able to boost an RL agent's learning effectiveness by 22% when dealing with unseen situations.
Abstract: Predicting user engagement (e.g., click-through rate, conversion rate) on display ads plays a critical role in delivering the right ad to the right user in online advertising. Existing techniques, spanning Logistic Regression to Factorization Machines and their derivatives, focus on modeling the interactions among handcrafted features to predict user engagement. Little attention has been paid to how the ad fits with its context (e.g., the hosting webpage, user demographics). In this paper, we propose to include a metadata feature, which captures the visual appearance of the ad, in the user engagement prediction task. In particular, given a data sample, we combine both the basic context features, which have been widely used in existing prediction models, and the metadata feature, which is extracted from the ad using a state-of-the-art deep learning framework, to predict user engagement. To demonstrate the effectiveness of the proposed metadata feature, we compare the performance of widely used prediction models before and after integrating the metadata feature. Our experimental results on a real-world dataset demonstrate that the metadata feature is able to further improve prediction performance.
Abstract: Term frequency is a common method for identifying the importance of a term in a document. But term frequency ignores how a term interacts with its text context, which is key to estimating document-specific term weights. This paper proposes a Deep Contextualized Term Weighting framework (DeepCT) that maps the contextualized term representations from BERT into context-aware term weights for passage retrieval. The new, deep term weights can be stored in an ordinary inverted index for efficient retrieval. Experiments on two datasets demonstrate that DeepCT greatly improves the accuracy of first-stage passage retrieval algorithms.
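A minimal sketch of the idea with HuggingFace Transformers, under our own simplifications (the regression head below is untrained here; in the paper it is fit against term importance labels, and the quantization scale is a deployment detail):

.. code-block:: python

    import torch
    import torch.nn as nn
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    bert = AutoModel.from_pretrained("bert-base-uncased")
    weight_head = nn.Linear(bert.config.hidden_size, 1)  # would be trained

    def contextualized_term_weights(passage: str) -> dict:
        enc = tokenizer(passage, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = bert(**enc).last_hidden_state           # (1, seq, 768)
            w = weight_head(hidden).squeeze(-1).squeeze(0)   # weight per token
        tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
        # Quantize to small non-negative integers so each weight can be stored
        # as a pseudo term frequency in an ordinary inverted index.
        return {t: max(0, round(float(x) * 10)) for t, x in zip(tokens, w)}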
Abstract: Tabular data provide answers to a significant portion of search queries. However, reciting an entire result table is impractical in conversational search systems. We propose to generate natural language summaries as answers to describe the complex information contained in a table. Through crowdsourcing experiments, we build a new conversation-oriented, open-domain table summarization dataset. It includes annotated table summaries, which not only answer questions but also help people explore other information in the table. We utilize this dataset to develop automatic table summarization systems as SOTA baselines. Based on the experimental results, we identify challenges and point out future research directions that this resource will support.
Abstract: In cross-lingual text classification, one seeks to exploit labeled data from one language to train a text classification model that can then be applied to a completely different language. Recent multilingual representation models have made it much easier to achieve this. Still, there may be subtle differences between languages that are neglected when doing so. To address this, we present a semi-supervised adversarial training process that minimizes the maximal loss for label-preserving input perturbations. The resulting model then serves as a teacher to induce labels for unlabeled target language samples that can be used during further adversarial training, allowing us to gradually adapt our model to the target language. Compared with a number of strong baselines, we observe significant gains in effectiveness on document and intent classification for a diverse set of languages.
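The perturbation step can be approximated in one gradient step, as in standard adversarial training on embeddings; a sketch under our assumptions (whether the paper uses exactly this estimator is not stated here):

.. code-block:: python

    import torch

    def adversarial_perturbation(loss: torch.Tensor,
                                 embeddings: torch.Tensor,
                                 epsilon: float = 1.0) -> torch.Tensor:
        """One-step approximation of the max-loss, label-preserving input
        perturbation: move along the loss gradient, L2-normalized per token.
        `embeddings` must have requires_grad=True in the forward pass."""
        grad, = torch.autograd.grad(loss, embeddings, retain_graph=True)
        norm = grad.norm(p=2, dim=-1, keepdim=True).clamp_min(1e-12)
        return epsilon * grad / norm

    # Training step sketch: compute the clean loss, build the perturbation,
    # re-run the forward pass on (embeddings + delta), and minimize both
    # the clean and the adversarial losses.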
Abstract: We report the results of a crowdsourcing user study for evaluating the effectiveness of human-chatbot collaborative conversation systems, which aim to extend the ability of a human user to answer another person's requests in a conversation using a chatbot. We examine the quality of responses from two collaborative systems and compare them with human-only and chatbot-only settings. Our two systems both allow users to formulate responses based on a chatbot's top-ranked results as suggestions. But they encourage the synthesis of human and AI outputs to a different extent. Experimental results show that both systems significantly improved the informativeness of messages and reduced user effort compared with a human-only baseline while sacrificing the fluency and humanlikeness of the responses. Compared with a chatbot-only baseline, the collaborative systems provided comparably informative but more fluent and human-like messages.
Abstract: Identification of new concepts in scientific literature can help power faceted search, scientific trend analysis, knowledge-base construction, and more, but current methods are lacking. Manual identification can't keep up with the torrent of new publications, while the precision of existing automatic techniques is too low for many applications. We present an unsupervised concept extraction method for scientific literature that achieves much higher precision than previous work. Our approach relies on a simple but novel intuition: each scientific concept is likely to be introduced or popularized by a single paper that is disproportionately cited by subsequent papers mentioning the concept. From a corpus of computer science papers on arXiv, we find that our method achieves a Precision@1000 of 99%, compared to 86% for prior work, and a substantially better precision-yield trade-off across the top 15,000 extractions. To stimulate research in this area, we release our code and data.
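The stated intuition is directly computable; a toy rendering under our own data structures and scoring (the paper's actual extraction pipeline is more involved):

.. code-block:: python

    from collections import Counter

    def concept_score(phrase: str, mentions: dict, cites: dict):
        """mentions: phrase -> set of paper ids mentioning it;
        cites: paper id -> set of paper ids it cites.
        Returns the candidate origin paper and the fraction of mentioning
        papers that cite it (a high fraction suggests a real concept)."""
        papers = mentions.get(phrase, set())
        if not papers:
            return None, 0.0
        counts = Counter(c for p in papers for c in cites.get(p, set()))
        if not counts:
            return None, 0.0
        origin, n = counts.most_common(1)[0]
        return origin, n / len(papers)

    mentions = {"word2vec": {"p2", "p3", "p4", "p5"}}
    cites = {"p2": {"p1"}, "p3": {"p1"}, "p4": {"p0", "p1"}, "p5": {"p1"}}
    print(concept_score("word2vec", mentions, cites))  # ('p1', 1.0)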
Abstract: In this paper, we propose a reinforcement learning based, large-scale multi-objective ranking system for optimizing short-video recommendation on an industrial video sharing platform. Multiple competing ranking objectives and implicit selection bias in user feedback are the main challenges on real-world platforms. In order to address these challenges, we integrate a multi-gate mixture of experts and soft actor-critic into the ranking system. We demonstrate that our proposed framework can greatly reduce the loss compared with systems based only on single strategies.
Abstract: Voice-activated intelligent entertainment systems are prevalent in modern TVs. These systems require accurate automatic speech recognition (ASR) models to transcribe voice queries for further downstream language understanding tasks. Currently, labeling audio data for training is the main bottleneck in deploying accurate machine learning ASR models, especially when these models require up-to-date training data to adapt to the shifting customer needs. We present an auto-annotation system, which provides high quality training data without any hand-labeled audios by detecting speech recognition errors and providing possible fixes. Through our algorithm, the auto-annotated training data reaches an overall word error rate (WER) of 0.002; furthermore, we obtained a reduction of 0.907 in WER after applying the auto-suggested fixes.
Abstract: Identifying critical information in real time at the beginning of a disaster is a challenging but important task. This task has recently been addressed using domain adaptation approaches, which eliminate the need for labeled target data and can thus accelerate the process of identifying useful information. We propose to investigate the effectiveness of the Domain Reconstruction Classification Network (DRCN) approach on disaster tweets. DRCN adapts information from target data by reconstructing it with an autoencoder. Experimental results using a sequence-to-sequence autoencoder show that the DRCN approach can improve the performance of both supervised and domain adaptation baseline models.
Abstract: Recipe retrieval is a representative and useful application of cross-modal information retrieval. Recent studies have proposed frameworks for retrieving images of cuisines given textual ingredient lists and instructions. However, the textual form of ingredients easily causes information loss or inaccurate description, especially for cooking novices, who are often the main users of recipe retrieval systems. In this paper, we revisit the task of recipe retrieval by taking images of ingredients as input queries, and retrieving cuisine images by incorporating visual information of ingredients through a deep convolutional neural network. We build an image-to-image recipe retrieval system to validate the effect of ingredient image queries. We further combine the proposed solution with a state-of-the-art cross-modal recipe retrieval model to improve the overall performance of the recipe retrieval task.
Abstract: Graph-based models have been widely applied to fraud detection tasks. Owing to the development of Graph Neural Networks (GNNs), recent works have proposed many GNN-based fraud detectors based on either homogeneous or heterogeneous graphs. These works leverage existing GNNs and aggregate the neighborhood information to learn the node embeddings, which relies on the assumption that the neighbors share similar context, features, and relations. However, the inconsistency problem incurred by fraudsters, i.e., context inconsistency, feature inconsistency, and relation inconsistency, has hardly been investigated. In this paper, we introduce these inconsistencies and design a new GNN framework, GraphConsis, to tackle the inconsistency problem: (1) for the context inconsistency, we propose to combine the context embeddings with node features; (2) for the feature inconsistency, we design a consistency score to filter out inconsistent neighbors and generate corresponding sampling probabilities; (3) for the relation inconsistency, we learn relation attention weights associated with the sampled nodes. Empirical analysis on four datasets demonstrates that the inconsistency problem is critical in fraud detection tasks. Extensive experiments show the effectiveness of GraphConsis. We also release a GNN-based fraud detection toolbox with implementations of SOTA models. The code is available at https://github.com/safe-graph/DGFraud
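A minimal sketch of step (2) above, under our assumption (not stated in the abstract) that the consistency score is a Gaussian function of embedding distance; the bandwidth tau and threshold eps are illustrative hyper-parameters:

.. code-block:: python

    import torch

    def neighbor_sampling_probs(h_center, h_neighbors, tau=1.0, eps=1e-3):
        # Consistency score: Gaussian of the embedding distance between
        # the center node and each neighbor (our simplified reading).
        dist = torch.norm(h_neighbors - h_center.unsqueeze(0), dim=1)
        consistency = torch.exp(-dist ** 2 / tau)
        # Filter clearly inconsistent neighbors, then normalize into
        # neighbor-sampling probabilities.
        consistency = torch.where(consistency < eps,
                                  torch.zeros_like(consistency), consistency)
        return consistency / consistency.sum().clamp_min(1e-12)

    h_c = torch.randn(16)        # center node embedding (toy)
    h_n = torch.randn(5, 16)     # 5 neighbor embeddings (toy)
    print(neighbor_sampling_probs(h_c, h_n))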
Abstract: The identification of relevance with little textual context is a primary challenge in passage retrieval. We address this problem with a representation-based ranking approach that: (1) explicitly models the importance of each term using a contextualized language model; (2) performs passage expansion by propagating the importance to similar terms; and (3) grounds the representations in the lexicon, making them interpretable. Passage representations can be pre-computed at index time to reduce query-time latency. We call our approach EPIC (Expansion via Prediction of Importance with Contextualization). We show that EPIC significantly outperforms prior importance-modeling and document expansion approaches. We also observe that the performance is additive with the current leading first-stage retrieval methods, further narrowing the gap between inexpensive and cost-prohibitive passage ranking approaches. Specifically, EPIC achieves an MRR@10 of 0.304 on the MS-MARCO passage ranking dataset with 78ms average query latency on commodity hardware. We also find that the latency is further reduced to 68ms by pruning document representations, with virtually no difference in effectiveness.
Abstract: IR-based Question Answering (QA) systems typically use a sentence selector to extract the answer from retrieved documents. Recent studies have shown that powerful neural models based on the Transformer can provide an accurate solution to Answer Sentence Selection (AS2). Unfortunately, their computation cost prevents their use in real-world applications. In this paper, we show that standard and efficient neural rerankers can be used to reduce the number of sentence candidates fed to Transformer models without hurting accuracy, thus improving efficiency by up to four times. This is an important finding, as the internal representation of shallower neural models is dramatically different from the one used by a Transformer model, e.g., word vs. contextual embeddings.
Abstract: In cross-language information retrieval using probabilistic structured queries (PSQ), translation probabilities from statistical machine translation act as a bridge between the query and document vocabularies. These translation probabilities are typically estimated from a sentence-aligned corpus on a word-by-word basis without taking the context into account. Neural methods, by contrast, can learn to translate using the context around the words, and this can be used as a basis for estimating context-dependent translation probabilities. However, sparsity limits the accuracy of context-specific translation probabilities for rare words, which can be important in retrieval applications. This paper presents evidence that combining such context-dependent translation probabilities with context-independent translation probabilities learned from the same parallel corpus can yield improvements in the effectiveness of cross-language ranked retrieval.
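One simple way to realize such a combination is linear interpolation of the two distributions per query term; the sketch below uses a toy example and an illustrative weight lam, not necessarily the paper's exact scheme:

.. code-block:: python

    def interpolate_translation_probs(p_context, p_static, lam=0.5):
        # Mix a context-dependent distribution with a context-independent
        # one for a single query term, then re-normalize.
        terms = set(p_context) | set(p_static)
        mixed = {t: lam * p_context.get(t, 0.0)
                    + (1.0 - lam) * p_static.get(t, 0.0) for t in terms}
        z = sum(mixed.values()) or 1.0
        return {t: p / z for t, p in mixed.items()}

    # Hypothetical French translations of the English query term "bank":
    p_ctx = {"banque": 0.9, "rive": 0.1}                # neural, in context
    p_ibm = {"banque": 0.6, "rive": 0.3, "banc": 0.1}   # corpus-level
    print(interpolate_translation_probs(p_ctx, p_ibm))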
Abstract: Conversational systems such as digital assistants can help users perform many simple tasks upon request. Looking to the future, these systems will also need to fully support more complex, multi-step tasks (e.g., following cooking instructions) and help users complete those tasks, e.g., via useful and relevant suggestions made during the process. This paper takes the first step towards automatic generation of task-related suggestions. We introduce proactive suggestion generation as a novel natural language generation task, in which a decision is made to inject a suggestion into an ongoing user dialog and one is then automatically generated. We propose two types of stepwise suggestions: multiple-choice response generation and text generation. We provide several models for each type of suggestion, including binary and multi-class classification, and text generation.
Abstract: In this work, we focus on the contextual document ranking task, which deals with the challenge of user interaction modeling for conversational search. Given a history of user feedback behaviors, such as issuing a query, clicking a document, and skipping a document, we propose to introduce behavior awareness to a neural ranker, resulting in a Hierarchical Behavior Aware Transformers (HBA-Transformers) model. The hierarchy is composed of an intra-behavior attention layer and an inter-behavior attention layer to let the system effectively distinguish and model different user behaviors. Our extensive experiments on the AOL session dataset demonstrate that the hierarchical behavior aware architecture is more powerful than a simple combination of history behaviors. Besides, we analyze the conversational property of queries. We show that coherent sessions tend to be more conversational and thus are more demanding in terms of considering history user behaviors.
Abstract: In professional search tasks such as precision medicine literature search, queries often involve multiple aspects. To assess the relevance of a document, a searcher often painstakingly validates each aspect in the query and follows a task-specific logic to make a relevance decision. In such scenarios, we say the searcher makes a structured relevance judgment, as opposed to the traditional univariate (binary or graded) relevance judgment. Ideally, a search engine can support the searcher's workflow and follow the same steps to predict document relevance. This approach may not only yield highly effective retrieval models, but also open up opportunities for the model to explain its decision in the same "lingo" as the searcher. Using structured relevance judgment data from the TREC Precision Medicine track, we propose novel retrieval models that emulate how medical experts make structured relevance judgments. Our experiments demonstrate that these simple, explainable models can outperform complex, black-box learning-to-rank models.
Abstract: Search engines often provide only limited explanation on why results are ranked in a particular order. This lack of transparency prevents users from understanding results and can potentially give rise to biased or unfair systems. Opaque search engines may also hurt user trust in the presented ranking. This paper presents an investigation of system quality when different degrees of explanation are provided on search engine result pages. Our user study demonstrates that the inclusion of even simplistic explanations leads to better transparency, increased user trust and better search efficiency.
Abstract: We propose a Query by Example (QBE) setting for cross-lingual event retrieval. In this setting, a user describes a query event using example sentences in one language, and a retrieval system returns a ranked list of sentences that describe the query event, but from a corpus in a different language. One challenge in this setting is that a sentence may mention more than one event; hence, matching the query sentence against a document sentence results in noisy matches. We propose a Semantic Role Labeling (SRL) based approach to identify event spans in sentences and use a state-of-the-art sentence matching model, Sentence-BERT (SBERT), to match event spans in queries and documents without any supervision. To evaluate our approach, we construct an event retrieval dataset from ACE, an existing event detection dataset. Experimental results show that it is valuable to predict event spans in queries and documents, and our proposed unsupervised approach achieves superior performance compared to Query Likelihood (QL), Relevance Model 3 (RM3), and SBERT.
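A minimal sketch of the unsupervised matching step, assuming event spans have already been extracted by an SRL system; the sentence-transformers library and the multilingual checkpoint name are our illustrative choices, not necessarily the paper's setup:

.. code-block:: python

    from sentence_transformers import SentenceTransformer, util

    model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

    query_spans = ["troops attacked the village"]        # English query event
    doc_spans = ["las tropas atacaron la aldea",         # Spanish candidates
                 "el presidente visitó la capital"]

    q = model.encode(query_spans, convert_to_tensor=True)
    d = model.encode(doc_spans, convert_to_tensor=True)
    print(util.cos_sim(q, d))   # rank document spans by cosine similarity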
Abstract: Recent interest in the design of information retrieval systems that can balance an ability to find relevant content with an ability to protect sensitive content creates a need for test collections that are annotated for both relevance and sensitivity. This paper describes the development of such a test collection that is based on the Avocado Research Email Collection. Four people created search topics as a basis for assessing relevance, and two personas describing the sensitivities of representative (but fictional) content creators were created as a basis for assessing sensitivity. These personas were based on interviews with potential donors of historically significant email collections and with archivists who currently manage access to such collections. Two annotators then created relevance and sensitivity judgments for 65 topics, divided approximately equally between the two personas. Annotator agreement statistics indicate fairly good external reliability for both relevance and sensitivity annotations, and a baseline sensitivity classifier trained and evaluated using cross-validation achieved better than 80% $F_1$, suggesting that the resulting collection will likely be useful as a basis for comparing alternative retrieval systems that seek to balance relevance and sensitivity.
Abstract: Word embeddings are essential components for many text data applications. In most work, "out-of-the-box" embeddings trained on general text corpora are used, but they can be less effective when applied to domain-specific settings. Thus, how to create "domain-aware" word embeddings is an interesting open research question. In this paper, we study three methods for creating domain-aware word embeddings based on both general and domain-specific text corpora, including concatenation of embedding vectors, weighted fusion of text data, and interpolation of aligned embedding vectors. Even though the investigated strategies are tailored for domain-specific tasks, they are general enough to be applied to any domain and are not specific to a single task. Experimental results show that all three methods can work well; however, the interpolation method consistently works best.
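A minimal sketch of the interpolation strategy, assuming the two embedding matrices share a vocabulary and have already been aligned (e.g., via an orthogonal Procrustes mapping); the mixing weight alpha is illustrative:

.. code-block:: python

    import numpy as np

    def interpolate_embeddings(e_general, e_domain, alpha=0.5):
        # Convex combination of aligned general and domain vectors,
        # row-normalized so cosine similarities stay well behaved.
        mixed = alpha * e_general + (1.0 - alpha) * e_domain
        return mixed / np.linalg.norm(mixed, axis=1, keepdims=True)

    e_gen = np.random.randn(1000, 300)   # general-corpus vectors (toy)
    e_dom = np.random.randn(1000, 300)   # domain-corpus vectors (toy)
    print(interpolate_embeddings(e_gen, e_dom).shape)    # (1000, 300)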
Abstract: We focus on improving the effectiveness of a Virtual Assistant (VA) in recognizing emerging entities in spoken queries. We introduce a method that uses historical user interactions to forecast which entities will gain in popularity and become trending, and it subsequently integrates the predictions within the Automated Speech Recognition (ASR) component of the VA. Experiments show that our proposed approach results in a 20% relative reduction in errors on emerging entity name utterances without degrading the overall recognition quality of the system.
Abstract: Recommender systems based on collaborative filtering are highly vulnerable to data poisoning attacks, where a determined attacker injects fake users with false user-item feedback, with an objective to either corrupt the recommender system or promote/demote a target set of items. Recently, differential privacy was explored as a defense technique against data poisoning attacks in the typical machine learning setting. In this paper, we study the effectiveness of differential privacy against such attacks on matrix factorization based collaborative filtering systems. Concretely, we conduct extensive experiments for evaluating robustness to injection of malicious user profiles by simulating common types of shilling attacks on real-world data and comparing the predictions of typical matrix factorization with differentially private matrix factorization.
Abstract: In this paper, we present results from an exploratory study to investigate users' behaviors and preferences for three different styles of search results presentation in a virtual reality (VR) head-mounted display (HMD). Prior work in 2D displays has suggested possible benefits of presenting information in ways that exploit users' spatial cognition abilities. We designed a VR system that displays search results in three different spatial arrangements: a list of 8 results, a 4x5 grid, and a 2x10 arc. These spatial display conditions were designed to differ in terms of the number of results displayed per page (8 vs 20) and the amount of head movement required to scan the results (list < grid < arc). Thirty-six participants completed 6 search trials in each display condition (18 total). For each trial, the participant was presented with a display of search results and asked to find a given target result or to indicate that the target was not present. We collected data about users' behaviors with and perceptions about the three display conditions using interaction data, questionnaires, and interviews. We explore the effects of display condition and target presence on behavioral measures (e.g., completion time, head movement, paging events, accuracy) and on users' perceptions (e.g., workload, ease of use, comfort, confidence, difficulty, and lostness). Our results suggest that there was no difference in accuracy among the display conditions, but that users completed tasks more quickly using the arc. However, users also expressed lower preferences for the arc, instead preferring the list and grid displays. Our findings extend prior research on visual search into the area of 3-dimensional result displays for interactive information retrieval in VR HMD environments.
Abstract: We present a study on the importance of information retrieval (IR) techniques for both the interpretability and the performance of neural question answering (QA) methods. We show that current state-of-the-art transformer methods (like RoBERTa) poorly encode simple information retrieval (IR) concepts such as lexical overlap between the query and the document. To mitigate this limitation, we introduce a supervised RoBERTa QA method that is trained to mimic the behavior of BM25 and the soft-matching idea behind embedding-based alignment methods. We show that fusing these simple lexical-matching IR concepts into transformer techniques improves a) their (lexical-matching) interpretability, b) retrieval performance, and c) QA performance on two multi-hop QA datasets. We further highlight the lexical-chasm-bridging capabilities of transformer methods by analyzing the attention distributions of the supervised RoBERTa classifier over context versus lexically-matched token pairs.
Abstract: Concept maps provide concise structured representations for documents regarding their important concepts and interaction links, which have been widely used for document summarization and downstream tasks. However, the construction of concept maps often relies heavily on heuristic design and auxiliary tools. Recent popular neural network models, on the other hand, are shown effective in tasks across various domains, but fall short in interpretability and are prone to overfitting. In this work, we bridge the gap between concept map construction and neural network models, by designing doc2graph, a novel weakly-supervised text-to-graph neural network, which generates concept maps in the middle and is trained towards document-level tasks like document classification. In our experiments, doc2graph outperforms both its traditional baselines and neural counterparts by significant margins in document classification, while producing high-quality interpretable concept maps as document structured summarization.
Abstract: We introduce a new metric for measuring the performance of multi-class classifiers. This metric is a generalization of the $F_1$ score, which is defined for binary classifiers, and offers significant improvements over other generalizations such as micro- and macro-averaging. In particular, one can select coefficients that weight the per-class precision and recall, as well as the overall class importance, with a robust mathematical interpretation. For certain parameter choices, our metric yields the macro-averaged statistic as a special case. We demonstrate the efficacy of this metric on an application in genealogical search.
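The abstract does not spell out the formula; one plausible instantiation of such a weighted generalization (our assumption, for illustration only) combines per-class precision $P_c$ and recall $R_c$ with class-importance weights $w_c$ and per-class trade-off parameters $\beta_c$:

.. math::

   F = \sum_{c=1}^{C} w_c \, \frac{(1+\beta_c^2)\, P_c R_c}{\beta_c^2 P_c + R_c},
   \qquad \sum_{c=1}^{C} w_c = 1.

Setting $w_c = 1/C$ and $\beta_c = 1$ recovers the macro-averaged $F_1$, matching the special case mentioned above.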
Abstract: In this study, we investigate interaction-based neural matching models for ad-hoc cross-lingual information retrieval (CLIR) using cross-lingual word embeddings (CLWEs). With experiments conducted on the CLEF collection over four language pairs, we evaluate and provide insight into different neural model architectures, different ways to represent query-document interactions and word-pair similarity distributions in CLIR. This study paves the way for learning an end-to-end CLIR system using CLWEs.
Abstract: Determining user geolocation is vital to various real-world applications on the internet, such as online marketing and event detection. To identify the geolocations of users, their behaviors on social media like published posts and social interactions can be strong evidence. However, most of the existing social media based approaches individually learn from text contexts and social networks. This separation can not only lead to sub-optimal performance but also ignore the distinct importance of two resources for different users. To address this challenge, we propose a novel end-to-end framework, Hybrid-attentive User Geolocation (HUG), to jointly model post texts and user interactions in social media. The hybrid attention mechanism is introduced to automatically determine the importance of texts and social networks for each user while social media posts and interactions are modeled by a graph attention network and a language attention network. Extensive experiments conducted on three benchmark geolocation datasets using Twitter data demonstrate that HUG significantly outperforms competitive baseline methods. The in-depth analysis also indicates the robustness and interpretability of HUG.
Abstract: With the exponential growth of information on the internet, users rely on search engines to find the documents they need. However, user queries are often short. The inherent ambiguity of short queries imposes great challenges on search engines trying to understand user intent. Query suggestion is one key technique for search engines to augment user queries so that they can better understand user intent. In the past, query suggestion has relied on either term-frequency-based methods with little semantic understanding of the query, or word-embedding-based methods with little personalization effort. Here, we present a sequence-to-sequence-model-based query suggestion framework that is capable of naturally modeling structured, personalized features and unstructured query texts. This capability opens up the opportunity to better understand query semantics and user intent at the same time. As the largest professional network, LinkedIn has the advantage of utilizing a rich amount of accurate member profile information to personalize query suggestions. We applied this framework to LinkedIn production traffic and showed that personalized query suggestions significantly improved member search experience as measured by key business metrics at LinkedIn.
Abstract: Although neural network models enjoy tremendous advantages in handling image and text data, tree-based models still remain competitive for learning-to-rank tasks with numerical data. A major strength of tree-based ranking models is their insensitivity to different feature scales, while neural ranking models may suffer from features with varying scales or skewed distributions. Feature transformation or normalization is a simple technique which preprocesses input features to mitigate their potential adverse impact on neural models. However, due to a lack of studies, it is unclear to what extent feature transformation can benefit neural ranking models. In this paper, we aim to answer this question by providing empirical evidence for learning-to-rank tasks. First, we present a list of commonly used feature transformation techniques and perform a comparative study on multiple learning-to-rank data sets. Then we propose a mixture feature transformation mechanism which can automatically derive a mixture of basic feature transformation functions to achieve the optimal performance. Our experiments show that applying feature transformation can substantially improve the performance of neural ranking models compared to directly using the raw features. In addition, the proposed mixture transformation method can further improve the performance of the ranking model without any additional human effort.
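As a sketch of what a learned mixture of basic transforms might look like (the set of basis functions and the softmax gating below are our assumptions, not the paper's exact mechanism):

.. code-block:: python

    import torch
    import torch.nn as nn

    class MixtureTransform(nn.Module):
        # Learns, per feature, a softmax mixture of identity, log1p,
        # and sqrt transforms applied to non-negative raw features.
        def __init__(self, n_features):
            super().__init__()
            self.logits = nn.Parameter(torch.zeros(n_features, 3))

        def forward(self, x):                       # x: (batch, n_features)
            basis = torch.stack(
                [x, torch.log1p(x), torch.sqrt(x + 1e-6)], dim=-1)
            w = torch.softmax(self.logits, dim=-1)  # per-feature weights
            return (basis * w).sum(dim=-1)

    x = torch.rand(4, 10) * 100.0                   # toy raw ranking features
    print(MixtureTransform(10)(x).shape)            # torch.Size([4, 10])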
Abstract: Peer reviews form an essential part of scientific communication. Research papers and proposals are reviewed by several peers before they are finally accepted or rejected. The procedure requires experts to review the research work; then the area/program chair or editor writes a meta-review summarizing the review comments and makes a decision based on the reviewers' recommendations. In this paper, we present MetaGen, a novel meta-review generation system which takes the peer reviews as input and produces an assistive meta-review. This can help the area/program chair write a meta-review and make the final decision on the paper/proposal, and it can thus also help speed up the review process for conferences/journals where a large number of submissions must be handled within a stipulated time. Our approach first generates an extractive draft and then uses a fine-tuned UniLM (Unified Language Model) to predict the acceptance decision and produce the final meta-review in an abstractive manner. To the best of our knowledge, this is the first work in the direction of meta-review generation. Evaluation based on ROUGE scores shows promising results, and comparison with a few state-of-the-art summarizers demonstrates the effectiveness of the system.
Abstract: Computing the similarity between two legal case documents is a challenging task, for which text-based and network-based measures have been proposed in the literature. All prior network-based similarity methods considered only a precedent citation network among case documents (PCNet). However, this approach misses an important source of legal knowledge: the hierarchy of legal statutes that are applicable in a given legal jurisdiction (e.g., a country). We propose to augment the PCNet with the hierarchy of legal statutes to form a heterogeneous network, Hier-SPCNet. Experiments over a set of Indian Supreme Court case documents show that Hier-SPCNet enables significantly better document similarity estimation compared to existing approaches using PCNet. We also show that the proposed network-based method can complement text-based measures for better estimation of legal document similarity.
Abstract: Internet insurance products differ markedly from traditional e-commerce goods in their complexity, low purchasing frequency, etc., so the cold-start problem is even more severe. In the traditional e-commerce field, several cross-domain recommendation (CDR) methods have been studied to infer the preferences of cold-start users based on their preferences in other domains. However, these CDR methods cannot be applied directly to the insurance domain due to product complexity. In this paper, we propose a Deep Cross-Domain Insurance Recommendation System (DCDIR) for cold-start users. Specifically, we first learn more effective user and item latent features in both domains. In the target domain, given the complexity of insurance products, we design a meta-path based method over an insurance product knowledge graph. In the source domain, we employ a GRU to model users' dynamic interests. We then learn a feature mapping function using multi-layer perceptrons. We apply DCDIR to our company's dataset and show that DCDIR significantly outperforms state-of-the-art solutions.
Abstract: Pre-trained models have brought significant improvements to many NLP tasks and have been extensively analyzed, but little is known about the effect of fine-tuning on specific tasks. Intuitively, people may agree that a pre-trained model already learns semantic representations of words (e.g., synonyms are closer to each other) and that fine-tuning further improves capabilities which require more complicated reasoning (e.g., coreference resolution, entity boundary detection, etc.). However, verifying these arguments analytically and quantitatively is a challenging task, and few works focus on this topic. In this paper, inspired by the observation that most probing tasks involve identifying matched pairs of phrases (e.g., coreference requires matching an entity and a pronoun), we propose a pairwise probe to understand BERT fine-tuning on the machine reading comprehension (MRC) task. Specifically, we identify five phenomena in MRC. Using pairwise probing tasks, we compare the performance of each layer's hidden representations in pre-trained and fine-tuned BERT. The proposed pairwise probe alleviates distraction from inaccurate model training and enables a robust and quantitative comparison. Our experimental analysis leads to highly confident conclusions: (1) fine-tuning has little effect on fundamental, low-level information and general semantic tasks; (2) for specific abilities required by downstream tasks, fine-tuned BERT is better than pre-trained BERT, and such gaps become obvious after the fifth layer.
Abstract: Adversarial attacks on reinforcement learning-based interactive recommendation systems are hard to detect at an early stage. We propose attack-agnostic detection for such systems. We first craft adversarial examples to show their diverse distributions and then augment recommendation systems by detecting potential attacks with a deep learning-based classifier trained on the crafted data. Finally, we study the attack strength and frequency of adversarial examples and evaluate our model on standard datasets with multiple crafting methods. Our extensive experiments show that most adversarial attacks are effective, and that both attack strength and attack frequency impact the attack performance. The strategically-timed attack achieves comparable attack performance with only 1/3 to 1/2 the attack frequency. Besides, our black-box detector trained with one crafting method generalizes over several other crafting methods.
Abstract: Bundle recommendation aims to recommend a bundle of items for a user to consume as a whole. Existing solutions integrate user-item interaction modeling into bundle recommendation by sharing model parameters or learning in a multi-task manner, which cannot explicitly model the affiliation between items and bundles, and fail to explore the decision-making involved when a user chooses bundles. In this work, we propose a graph neural network model named BGCN (short for Bundle Graph Convolutional Network) for bundle recommendation. BGCN unifies user-item interaction, user-bundle interaction, and bundle-item affiliation into a heterogeneous graph. With item nodes as the bridge, graph convolutional propagation between user and bundle nodes makes the learned representations capture item-level semantics. Through training with a hard-negative sampler, the user's fine-grained preferences for similar bundles are further distinguished. Empirical results on two real-world datasets demonstrate the strong performance gains of BGCN, which outperforms the state-of-the-art baselines by 10.77% to 23.18%.
Abstract: Answer selection plays a crucial role in natural language processing and has thus received much attention. Many recent works treat it as an ad-hoc retrieval problem in which ranking optimization plays a large part. Previous works mainly consider the similarity between the answer and the question, but rarely utilize the similarity and dissimilarity relationships within the answer candidate set. In this paper, we propose a similarity aggregation method to rerank the results produced by different baseline neural networks. The key idea of similarity aggregation is that true matches should not only be similar to other true matches but also dissimilar from false matches; moreover, inspired by multi-view verification, true answers should rank consistently with respect to the question across different baseline methods, and likewise for false answers. Empirical results on the public benchmark task of answer selection demonstrate that our method yields significant improvements over the baseline methods.
Abstract: Predicting tags for a given item and leveraging tags to assist item recommendation are two popular research topics in the field of recommender systems. Previous studies mostly focus on only one of them. However, we believe that these tasks are inherently correlated with each other: tags can provide additional information to profile items for more accurate recommendation, and user behaviors can help to infer item relationships that benefit the item tagging process. In order to take advantage of such mutually influential signals, we propose to integrate item tagging and tag-based recommendation into a unified model. We first design a basic framework in which the user-item interaction signals are leveraged to supervise the item tagging process. We then extend the basic model with a bootstrapping technique to circulate such mutual improvements between the two tasks. We conduct extensive experiments on real-world datasets to demonstrate our model's superiority.
Abstract: Text classification requires a deep understanding of the linguistic features in text; in particular, the intra-sentential (local) and inter-sentential (global) features. Models that operate on word sequences have been successfully used to capture the local features, yet they are not effective in capturing the global features in long text. We investigate graph-level extensions to such models and propose a novel architecture for combining alternative text features. It uses an attention mechanism to dynamically decide how much information to use from a sequence- or graph-level component. We evaluated different architectures on a range of text classification datasets, and graph-level extensions were found to improve performance on most benchmarks. In addition, the attention-based architecture, as adaptively learned from the data, outperforms the generic and fixed-value concatenation ones.
Abstract: In contrast to traditional IR, which retrieves a set of topically relevant documents given a user query, we investigate causal retrieval, which involves retrieving a set of documents that describe potential causes leading to an effect specified in the query. We argue that the nature of causal relevance should differ from that of traditional topical relevance: although the causally relevant documents would have partial term overlap with the topically relevant ones for a query, a majority of these documents are expected to use a different set of terms to describe the causes possibly leading to their effects. To address this, we propose a feedback model to estimate a distribution over terms which are relatively infrequent but associated with high weights in the topically relevant distribution, indicating potential causal relevance. Our experiments demonstrate that such a feedback model is substantially more effective than traditional IR models and a number of other causality heuristic baselines.
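As a sketch of the re-weighting idea (the IDF-style factor below is our illustrative choice; the paper's estimator may differ), terms with high relevance-model weight but low collection frequency are promoted:

.. code-block:: python

    import math

    def causal_feedback_weights(rm_weights, collection_freq, n_docs):
        # Up-weight terms that are heavy in the topical relevance model
        # yet rare in the collection, as candidate causal terms.
        return {t: w * math.log(1.0 + n_docs / (1 + collection_freq.get(t, 0)))
                for t, w in rm_weights.items()}

    rm = {"pollution": 0.30, "emissions": 0.25, "asthma": 0.05}   # toy RM
    cf = {"pollution": 50000, "emissions": 30000, "asthma": 800}  # toy DF
    print(causal_feedback_weights(rm, cf, n_docs=1000000))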
Abstract: Non-factoid question answering (QA) is one of the most extensive yet challenging application and research areas in retrieval-based question answering. In particular, answers to non-factoid questions can often be too lengthy and redundant to comprehend, which creates great demand for answer summarization in non-factoid QA. However, in current answer summarization studies, the multi-level interactions between QA pairs and the interrelations among different answer sentences are usually modeled separately. In this paper, we propose a unified model that bridges hierarchical and sequential context modeling for question-driven extractive answer summarization. Specifically, we design a hierarchical compare-aggregate method to integrate the interactions between QA pairs at both the word level and the sentence level into the final question and answer representations. We then apply a question-aware sequential extractor to produce a summary of the lengthy answer. Experimental results show that answer summarization benefits from both hierarchical and sequential context modeling, and our method achieves superior performance on WikiHowQA and PubMedQA.
Abstract: Predicting the survival of cancer patients holds significant meaning for public health and has attracted increasing attention in medical information communities. In this study, we propose a novel framework for cancer survival prediction named Multimodal Graph Neural Network (MGNN), which explores the features of real-world multimodal data such as gene expression, copy number alteration, and clinical data in a unified framework. In order to explore the inherent relations, we first construct bipartite graphs between patients and multimodal data. Subsequently, a graph neural network is adopted to obtain the embedding of each patient on the different bipartite graphs. Finally, a multimodal fusion neural layer is designed to fuse the features from the different modalities. The output of our method is the classification of each patient into short-term or long-term survival. Experimental results on a breast cancer dataset demonstrate that MGNN outperforms all baselines. Furthermore, we test the trained model on a lung cancer dataset, and the experimental results verify its strong robustness in comparison with state-of-the-art methods.
Abstract: Graph retrieval from a large corpus of graphs has a wide variety of applications, e.g., sentence retrieval using words and dependency parse trees for question answering, image retrieval using scene graphs, and molecule discovery from a set of existing molecular graphs. In such graph search applications, nodes, edges and associated features bear distinctive physical significance. Therefore, a unified, trainable search model that efficiently returns corpus graphs that are highly relevant to a query graph has immense potential impact. In this paper, we present an effective, feature and structure-aware, end-to-end trainable neural match scoring system for graphs. We achieve this by constructing the product graph between the query and a candidate graph in the corpus, and then conducting a family of random walks on the product graph, which are aggregated into the match score using a network whose parameters can be trained. Experiments show the efficacy of our method, compared to competitive baseline approaches.
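A toy version of the core construction: walks on the Kronecker product of the two adjacency matrices correspond to simultaneous walks on both graphs, so their count is a crude, untrained match score with uniform weights; the paper instead learns how walk statistics are aggregated:

.. code-block:: python

    import numpy as np

    def product_walk_score(A_q, A_c, k=3):
        # Total number of walks of length <= k on the product graph.
        P = np.kron(A_q, A_c)          # adjacency of the product graph
        walks, score = np.eye(P.shape[0]), 0.0
        for _ in range(k):
            walks = walks @ P
            score += walks.sum()
        return score

    A_q = np.array([[0, 1], [1, 0]], dtype=float)             # query: 1 edge
    A_c = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # candidate path
    print(product_walk_score(A_q, A_c))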
Abstract: Session-based recommendation aims to predict the next item that a user will interact with, based solely on anonymous sessions. In real-life scenarios, a user's preferences are usually varied, and distinguishing the different preferences within a session is important. However, previous studies focus mostly on modeling transitions between items, ignoring the mining of the user's various preferences. In this paper, we propose a Hierarchical Leaping Network (HLN) to explicitly model users' multiple preferences by grouping items that share some relationship. We first design a Leap Recurrent Unit (LRU), which is capable of skipping preference-unrelated items and accepting knowledge of previously learned preferences. We then introduce a Preference Manager (PM) to manage those learned preferences and produce an aggregated preference representation each time the LRU reruns. The final output of the PM, which contains the user's multiple preferences, is used to make recommendations. Experiments on two benchmark datasets demonstrate the effectiveness of HLN. Furthermore, the visualization of the explicitly learned subsequences also confirms our idea.
Abstract: Factorization Machine (FM)-based models can only reveal the relationship between a pair of features. DNN-based factorization models, which combine FM with a multi-layer perceptron (MLP) by feeding all feature embeddings to the MLP, can only reveal the relationships among features implicitly. Some other DNN-based methods apply CNNs to generate feature interactions. However, (1) they model feature interactions at the bit-wise level (where only part of an embedding is utilized to generate feature interactions), which cannot comprehensively express the semantics of features, and (2) they can only model interactions among neighboring features. To deal with the aforementioned problems, this paper proposes a Multi-Branch Convolutional Network (MBCN), which includes three branches: a standard convolutional layer, a dilated convolutional layer, and a bias layer. MBCN is able to explicitly model feature interactions of arbitrary order at the vector-wise level, fully expressing context-aware feature semantics. Extensive experiments on three public benchmark datasets demonstrate the superiority of MBCN compared to state-of-the-art baselines for context-aware top-k recommendation.
Abstract: Estimating session duration for an e-commerce search engine is important for various downstream applications, including user satisfaction prediction, personalization, and diversification of search results. Previous studies have shown that search session length is strongly correlated with a user's exploratory versus specific-purchase intent. Building on previous work [14], we hypothesize that early prediction of the session length distribution can be used to control the degree of exploration versus exploitation (loosely related to diversification versus personalization) for the Search Engine Result Pages (SERPs) in the remainder of the user's session. In this work, we aim to predict the user's session length early, which enables such control over the search results. Towards this end, based on previous work and strong empirical evidence, we hypothesize that session lengths are Weibull distributed, and we model the distribution's parameters with a Recurrent Neural Network over the actions in the user's search session. Through experimentation, we demonstrate that our method outperforms strong baselines on this task.
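A minimal sketch of this modeling idea, assuming a GRU whose final state emits the Weibull shape k and scale lam, trained with the Weibull negative log-likelihood (layer sizes and the softplus parameterization are our illustrative choices):

.. code-block:: python

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class WeibullSessionModel(nn.Module):
        def __init__(self, n_actions, dim=32):
            super().__init__()
            self.emb = nn.Embedding(n_actions, dim)
            self.gru = nn.GRU(dim, dim, batch_first=True)
            self.head = nn.Linear(dim, 2)

        def forward(self, actions):               # actions: (batch, seq)
            _, h = self.gru(self.emb(actions))
            k, lam = F.softplus(self.head(h[-1])).unbind(-1)
            return k + 1e-6, lam + 1e-6           # positive Weibull params

    def weibull_nll(k, lam, t):
        # Negative log-likelihood of observed session lengths t under
        # Weibull(k, lam): f(t) = (k/lam) (t/lam)^(k-1) exp(-(t/lam)^k).
        z = t / lam
        return -(torch.log(k / lam) + (k - 1) * torch.log(z) - z ** k).mean()

    model = WeibullSessionModel(n_actions=100)
    k, lam = model(torch.randint(0, 100, (8, 12)))    # toy action sequences
    print(weibull_nll(k, lam, t=torch.rand(8) * 10 + 0.1))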
Abstract: Many networks in real applications are constantly evolving through the creation and elimination of nodes and edges. Dynamic link prediction aims to infer whether there will be an edge between a pair of nodes, given the recent evolution history of the network. In this paper, we devise a flexible framework for link prediction on dynamic networks regularly archived as a series of snapshots. On the basis of node vectors learned on individual snapshots, a gated recurrent unit (GRU) network is utilized to model the evolution of each node vector and predict the node's representation in the future. The edge representation is then constructed not only from the interaction between the representations of the target node pair, but is also enriched with local neighborhood representations, i.e., historical embeddings of their common neighbors. Finally, a binary classifier is trained to perform link prediction. The framework can be instantiated with many outstanding off-the-shelf node embedding and binary classification methods. Extensive experiments on three different datasets demonstrate the effectiveness and flexibility of our proposed framework. Ablation studies show that the node vector evolution and the local neighborhood representation both have positive, but different, effects on dynamic link prediction across diverse networks.
Abstract: As a particularly prominent application of recommender systems for automated personalized service, music recommendation has been widely used on music network platforms and in music education and music therapy. Importantly, an individual's music preference at a given moment is closely related to personal experience with the music and music literacy, as well as to the temporal scenario, which should be captured without interrupting the user. Therefore, this paper proposes NRRS (Nonintrusive-Sensing and Reinforcement-Learning based Recommender System), a novel policy for music recommendation that integrates these prior research streams. Specifically, we develop a novel recommendation framework that senses, learns, and adapts to the user's current preference in real time during a listening session, based on wireless sensing and reinforcement learning. The established music recommendation prototype monitors an individual's vital signals while listening to music and captures song characteristics and individual dynamic preferences, yielding a better listening experience for users.
Abstract: Question-answering sentiment analysis (QASA) is a novel but meaningful sentiment analysis task based on question-answering online reviews. Existing neural network-based models for sentiment analysis of online reviews have already achieved great success. However, the syntactic and implicit semantic connections in the dependency tree have not been fully exploited, especially for Chinese, which has its own specific syntax. In this work, we propose a Residual-Duet Network that leverages textual and tree dependency information for Chinese question-answering sentiment analysis. In particular, we explore the synergies of graph embedding with structural dependency links to learn syntactic information. Transverse and longitudinal compression encoders are developed to capture sentiment evidence with disparate types of compression and different residual connections. We evaluate our model on three Chinese QASA datasets in different domains. Experimental results demonstrate the superiority of our proposed model in Chinese question-answering sentiment analysis.
Abstract: Automatically answering mathematical problems is a challenging task, since it requires not only linguistic understanding but also mathematical comprehension. Existing studies usually explore solutions for elementary math word problems described in natural language narratives, and are not capable of solving more general problems containing structured formulas. To this end, in this paper we propose a novel Neural Mathematical Solver (NMS) with enhanced formula structures. Specifically, we first frame the formulas in a given problem as a TeX dependency graph to preserve formula-enriched structures. Then, we design a formula graph network (FGN) to capture its mathematical relations. Next, we develop a novel architecture with two GRU models, connecting tokens from both the word space and the formula space, to learn the linguistic semantics of the answers. Extensive experiments on a large-scale dataset demonstrate that NMS not only achieves better answer prediction but also visualizes reasonable mathematical representations of problems.
Abstract: in their accompanying referral documents, which contain a mix of free text and structured data. By training a model to predict triage decisions from these referral documents, we can partially automate the triage process, resulting in more efficient and systematic triage decisions. One of the difficulties of this task is maintaining robustness against changes in triage priorities due to changes in policy, funding, staff, or other factors. This is reflected as changes in relationship between document features and triage labels, also known as concept drift. These changes must be detected so that the model can be retrained to reflect the new environment. We introduce a new concept drift detection algorithm for this domain called calibrated drift detection method (CDDM). We evaluated CDDM on benchmark and synthetic medical triage datasets, and find it competitive with state-of-the-art detectors, while also being less prone to false positives from feature drift.
Abstract: Text documents are often mapped to vectors of binary values, where 1 indicates the presence of a word and 0 indicates its absence. The vectors are then used to train predictive models. In tree-based ensemble models, predictions from some decision trees may be made purely from absent words. Such predictions should be trusted less, as absent words can be interpreted in multiple ways. In this work, we propose to improve the comprehensibility and accuracy of ensemble models by distinguishing word presence from absence. The presented method weights predictions based on word presence. Experimental results on 35 real text datasets indicate that our method outperforms state-of-the-art ensemble methods on various text classification tasks.
Abstract: Unsupervised multi-lingual language modeling has gained traction in the last few years, and poly-lingual topic models provide a mechanism to learn aligned document representations. However, training such models requires translation-aligned data across languages, which is not always available. Moreover, for short texts like tweets and search queries, training topic models remains a challenge. In this work, we present a novel strategy of creating a pseudo-parallel dataset and then training topic models for sponsored search retrieval, which also mitigates the short-text challenge. Our data augmentation strategy leverages the readily available bipartite click-through graph, which allows us to draw similar documents in different languages. The proposed methodology is evaluated on a sponsored search system whose performance is measured by how well the user intent, presented via the query, is matched with ads provided by the advertiser. Our experiments substantiate the effectiveness of the method on the EuroParl dataset and on live search-engine traffic.
Abstract: Significant development of communication technology over the past few years has motivated research in multi-modal summarization techniques. A majority of the previous works on multi-modal summarization focus on text and images. In this paper, we propose a novel extractive multi-objective optimization based model to produce a multi-modal summary containing text, images, and videos. Important objectives such as intra-modality salience, cross-modal redundancy and cross-modal similarity are optimized simultaneously in a multi-objective optimization framework to produce effective multi-modal output. The proposed model has been evaluated separately for different modalities, and has been found to perform better than state-of-the-art approaches.
Abstract: Popularity is often included in experimental evaluation to provide a reference performance for a recommendation task. To understand how the popularity baseline is defined and evaluated, we sample 12 papers from top-tier conferences including KDD, WWW, SIGIR, and RecSys, and 6 open-source toolkits. We note that the widely adopted MostPop baseline simply ranks items based on the number of interactions in the training data. We argue that the current evaluation of popularity (i) does not reflect the items that are popular at the time a user interacts with the system, and (ii) may recommend items released after a user's last interaction with the system. On the widely used MovieLens dataset, we show that the performance of popularity can be significantly improved, by 70% or more, if we consider the items that are popular at the time point when a user interacts with the system. We further show that, on the MovieLens dataset, users who rate fewer movies tend to follow the crowd and rate more popular movies, while movie lovers who rate a large number of movies rate movies based on their own preferences and interests. Through this study, we call for a re-visit of the popularity baseline in recommender systems to better reflect its effectiveness.
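A minimal pandas sketch of the time-aware variant, which ranks items by their interaction counts in a recency window ending at the interaction time (the 90-day window is an illustrative choice):

.. code-block:: python

    import pandas as pd

    def popularity_at_time(train, t, window="90D"):
        # Items popular at time t, not over the whole training set.
        recent = train[(train.ts <= t) & (train.ts > t - pd.Timedelta(window))]
        return recent.item_id.value_counts().index.tolist()

    train = pd.DataFrame({
        "item_id": [1, 1, 2, 2, 2, 3],
        "ts": pd.to_datetime(["2019-01-01", "2019-01-05", "2019-03-01",
                              "2019-03-02", "2019-03-03", "2018-06-01"]),
    })
    print(popularity_at_time(train, pd.Timestamp("2019-03-10")))  # [2, 1]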
Abstract: Related or ideal follow-up suggestions for a web query in search engines are often optimized based on several different parameters: relevance to the original query, diversity, click probability, etc. One or more rankers may be trained to score each suggestion from a candidate pool based on these factors. These scorers are usually trained as pairwise classification tasks, where each training example consists of a user query and a single suggestion from the list of candidates. We propose an architecture that takes all candidate suggestions associated with a given query and outputs a suggestion block. We discuss the benefits of such an architecture over traditional approaches and experiment with further enforcing each individual metric through mixed-objective training.
Abstract: Reinforcement learning (RL) has been successfully applied to recommender systems. However, the existing RL-based recommendation methods are limited by their unstructured state/action representations. To address this limitation, we propose a novel way that builds high-quality graph-structured states/actions according to the user-item bipartite graph. More specifically, we develop an end-to-end RL agent, termed Graph Convolutional Q-network (GCQN), which is able to learn effective recommendation policies based on the inputs of the proposed graph-structured representations. We show that GCQN achieves significant performance margins over the existing methods, across different datasets and task settings.
Abstract: The crowd is cheaper and easier to access than the oracle when collecting ground truth data for training and evaluating models. To ensure the quality of crowdsourced data, one can assign multiple crowd workers to a question and then aggregate the multiple answers of diverse quality into a golden one. In the areas of IR and NLP, the ground truth data for many tasks are text sequences. For aggregating multiple crowdsourced text sequences of diverse quality, methods adapted from existing answer aggregation approaches, which were proposed for labels (e.g., categories), focus only on one-sided reliability and do not fully utilize the rich information in text sequences. We thus propose a crowdsourced text sequence aggregation method which captures hybrid reliability information, i.e., the local question-wise reliability of text answers and the global dataset-wise reliability of crowd workers. For the local reliability, it also incorporates text similarities from a hybrid representation, i.e., text embeddings and word sequences. Experiments on real crowdsourced datasets show that our method outperforms baselines which utilize only one-sided reliability and one-sided representations, and that it can effectively leverage the rich information of text sequences.
Abstract: The main task of personalized recommendation is capturing users' interests based on their historical behaviors. Most recent advances in recommender systems focus on modeling users' preferences accurately using deep learning based approaches. Users' interests have two important properties: they are dynamic and evolve over time, and they have different resolutions, or temporal ranges to be precise, such as long-term and short-term preferences. Existing approaches either use Recurrent Neural Networks (RNNs) to address the drift in users' interests without considering different temporal ranges, or design two different networks to model long-term and short-term preferences separately. This paper presents a multi-resolution interest fusion model (MRIF) that takes both properties of users' interests into consideration. The proposed model is capable of capturing the dynamic changes in users' interests at different temporal ranges and provides an effective way to combine a group of multi-resolution user interests to make predictions. Experiments show that our method consistently outperforms state-of-the-art recommendation methods.
Abstract: This paper proposes a novel neural network, the joint training capsule network (JTCN), for the cold-start recommendation task. For fresh users, we propose to mimic high-level user preferences from side information rather than from the raw interaction history. Specifically, an attentive capsule layer is proposed to aggregate high-level user preferences from low-level interaction history via a dynamic routing-by-agreement mechanism. Moreover, JTCN jointly trains the loss for mimicking the user preference and the softmax loss for recommendation in an end-to-end manner. Experiments on two publicly available datasets demonstrate the effectiveness of the proposed model: JTCN improves over other state-of-the-art methods by at least 7.07% on CiteULike and 16.85% on Amazon in terms of Recall@100 for cold-start recommendation.
Abstract: Extracting topical information from documents is important for public opinion analysis, text classification, and information retrieval tasks. Compared with identifying a wide variety of topics in long documents, it is challenging to generate a concentrated topic distribution for each short message. Although this problem can be tackled by adjusting the hyper-parameters in traditional topic models such as Latent Dirichlet Allocation, it remains an open problem in neural topic modelling. In this paper, we focus on adapting the popular Auto-Encoding Variational Bayes based neural topic models to short texts by exploring Archimedean copulas to guide the estimated topic distributions derived from linearly projected samples of re-parameterized posterior distributions. Experimental results show the superiority of our method when compared with existing neural topic models in terms of perplexity, topic coherence, and classification accuracy.
Abstract: Classical ad-hoc retrieval models based on exact matching suffer from the lack of soft matching in text. Besides query expansion approaches, many existing neural IR approaches that exploit word embedding representations alleviate this issue to some extent. We observe that word embedding vectors are usually normalized in practice to retain the cosine similarities that are used to construct the query-document interaction matrix for most neural ranking models; these vectors are in fact mapped onto the surface of a high-dimensional hypersphere. Existing work in kernel-based ranking does not consider the kernel to be a distribution on a certain geometry even when the kernel's variable is a geometric quantity. We propose a kernel-based neural ranking model based on a statistical manifold, treating the interaction as a geodesic on the manifold, and propose a smoothed kernel pooling scheme at different similarity levels based on the Riemannian normal distribution. Extensive experiments conducted on a recent benchmark dataset against the state-of-the-art kernel-based neural ranking model demonstrate significant improvements brought by our model.
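For reference, the Gaussian kernel pooling that such models build on (the KNRM-style baseline; the proposed model replaces the Gaussian with a Riemannian normal kernel on the hypersphere) can be sketched as:

.. code-block:: python

    import torch

    def gaussian_kernel_pooling(sim, mus, sigma=0.1):
        # sim: (n_query_terms, n_doc_terms) cosine interaction matrix;
        # mus: kernel centers in [-1, 1].
        k = torch.exp(-((sim.unsqueeze(-1) - mus) ** 2) / (2 * sigma ** 2))
        soft_tf = k.sum(dim=1)                  # (n_query_terms, n_kernels)
        return torch.log1p(soft_tf).sum(dim=0)  # (n_kernels,) ranking features

    sim = torch.rand(5, 30) * 2 - 1             # toy similarity matrix
    mus = torch.linspace(-0.9, 0.9, steps=10)
    print(gaussian_kernel_pooling(sim, mus))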
Abstract: Multimodal Machine Comprehension ($\rm M^3C$) has been a challenging task that requires understanding both language and vision, as well as their integration and interaction. For example, the RecipeQA challenge, which provides several $\rm M^3C$ tasks, requires deep neural models to understand textual instructions, images of different steps, as well as the logic orders of food cooking. To address this challenge, we propose a Multi-Level Multi-Modal Transformer (MLMM-Trans) framework to integrate and understand multiple textual instructions and multiple images. Our model can conduct intensive attention mechanism at multiple levels of objects (e.g., step level and passage-image level) for sequences of different modalities. Experiments have shown that our model can achieve the state-of-the-art results on the three multimodal tasks of RecipeQA.
Abstract: By setting a typeface, each character of a Chinese text can be converted to a glyph pixel matrix. We propose to conduct text classification with such glyph features using bi-directional convolution. Although the pixel embedding can be applied to all languages, it is especially convenient for representing Chinese scripts due to the square shape of Chinese characters. We extract both the forward and backward n-gram features of the text via bi-directional convolutional operations and then concatenate them. A subsequent 1-dimensional max-over-time pooling is applied to the bi-directional feature maps, and then three fully connected layers are used to conduct text classification. The proposed model has a light-weight architecture that only contains a single-layer convolutional neural network. Experiments on several Chinese text classification datasets demonstrate the surprisingly fast training speed and superior performance of the proposed model in comparison with traditional methods.
Abstract: There are two main paradigms for exploiting review information for recommendation. One is document-level, i.e., concatenating all reviews of a user/item into a long document, which may neglect the different usefulness of individual reviews. The other is review-level, i.e., analyzing each review separately to learn user/item features. In fact, the two paradigms are complementary, and fusing them has the potential to learn more comprehensive features of users/items. Hence, we propose a unified framework to jointly learn document- and review-level representations of users/items. We design a document encoder to learn document-level features of users/items. Then, we use a review encoder to learn representations of reviews from words, and a user/item encoder to learn review-level features of users/items. Besides, different reviews from the same user may have different importance for different target items due to different item characteristics. We propose a cross attention model for user representation learning whose query vector is the embedding of the target item ID, and apply it to the above three encoders to select informative words and reviews for different target items. Extensive experiments validate the effectiveness of our method.
Abstract: A chatbot that converses like a human should be goal-oriented (i.e., be purposeful in conversation), which goes beyond language generation. However, existing goal-oriented dialogue systems often rely heavily on cumbersome hand-crafted rules or costly labelled datasets, which limits their applicability. In this paper, we propose Goal-oriented Chatbots (GoChat), a framework for training chatbots end-to-end to maximize the long-term return from offline multi-turn dialogue datasets. Our framework utilizes hierarchical reinforcement learning (HRL), where the high-level policy determines sub-goals to guide the conversation towards the final goal, and the low-level policy fulfills the sub-goals by generating the corresponding utterance in response. In experiments on a real-world dialogue dataset for anti-fraud in the financial domain, our approach outperforms previous methods on both the quality of response generation and the success rate of accomplishing the goal.
Abstract: Commonsense knowledge is fundamental for machines to reach human-level intelligence. However, conventional methods of commonsense extraction generally do not work well because commonsense is by nature usually not explicitly stated in texts or other data. Besides, commonsense knowledge graphs built in advance can hardly cover all the knowledge required for practical tasks, due to the incompleteness of knowledge graphs. In this paper, we propose an online commonsense oracle to achieve knowledge reasoning. Specifically, we focus on the on-demand inference of specific commonsense propositions. We use the capableOf relation as an example due to its notable significance in daily life. For more effective capableOf reasoning, informative supporting features derived from an existing commonsense knowledge graph and a Web search engine are exploited. Finally, we conduct extensive experiments, and the results demonstrate the effectiveness of our approach.
Abstract: To leverage entity and word semantics in entity linking, embedding models have been developed to represent entities, words, and their context such that candidate entities for each mention can be determined and ranked accurately using their embeddings. In this paper, we leverage human intelligence for embedding-based interactive entity linking. We adopt an active learning approach to select mentions for human annotation that can best improve entity linking accuracy while updating the embedding model. We propose two mention selection strategies based on: (1) the coherence of linked entities, and (2) the contextual closeness of candidate entities with respect to a mention. Our experiments show that our proposed interactive entity linking methods outperform their batch counterpart on all experimented datasets with a relatively small amount of human annotation.
Abstract: As an important branch of current dialogue systems, retrieval-based chatbots leverage information retrieval to select proper predefined responses. Various promising architectures have been designed for boosting response retrieval; however, few studies have exploited the effectiveness of pre-trained contextual language models. In this paper, we propose two approaches for adapting contextual language models to the dialogue response selection task. In detail, the Speaker Segmentation approach is designed to discriminate different speakers to fully utilize speaker characteristics. Besides, we propose the Dialogue Augmentation approach, i.e., cutting off real conversations at different time points, to enlarge the training corpora. Compared with previous works that use utterance-level representations, our augmented contextual language models are able to obtain whole contextual dialogue representations for deeper semantic understanding. Evaluation on three large-scale datasets demonstrates that our proposed approaches yield better performance than existing models.
Abstract: The increasing demand for on-device deep learning necessitates the deployment of deep models on mobile devices. However, directly deploying deep models on mobile devices presents both a capacity bottleneck and prohibitive privacy risks. To address these problems, we develop a Differentially Private Knowledge Distillation (DPKD) framework to enable on-device deep learning while preserving training data privacy. We modify the conventional Private Aggregation of Teacher Ensembles (PATE) paradigm by compressing the knowledge acquired by the ensemble of teachers into a student model in a differentially private manner. The student model is then trained on both the labeled public data and the distilled knowledge by adopting a mixed training algorithm. Extensive experiments on popular image datasets, as well as a real implementation on a mobile device, show that DPKD can not only benefit from the distilled knowledge but also provide a strong differential privacy guarantee (ε = 2) with only marginal decreases in accuracy.
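To make the privacy mechanism concrete, here is a minimal sketch of the PATE-style noisy vote aggregation that DPKD builds on; the Laplace scale, class count, and teacher count are illustrative assumptions, not values from the paper.

.. code-block:: python

    import numpy as np

    def noisy_aggregate(teacher_votes, num_classes, laplace_scale=2.0, rng=None):
        """Aggregate per-teacher predicted labels into one private label:
        Laplace noise on the vote histogram yields a differentially
        private argmax (the PATE mechanism that DPKD modifies)."""
        rng = rng or np.random.default_rng()
        counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
        counts += rng.laplace(scale=laplace_scale, size=num_classes)
        return int(np.argmax(counts))

    # Example: 50 teachers voting over 10 classes for one unlabeled sample.
    votes = np.random.default_rng(0).integers(0, 10, size=50)
    print(noisy_aggregate(votes, num_classes=10))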
Abstract: Most deep learning frameworks require users to pool their local data or model updates to a trusted server to train or maintain a global model. The assumption of a trusted server that has access to user information is ill-suited in many applications. To tackle this problem, we develop a new deep learning framework under an untrusted server setting, which includes three modules: (1) an embedding module, (2) a randomization module, and (3) a classifier module. For the randomization module, we propose a novel local differentially private (LDP) protocol to reduce the impact of the privacy parameter ε on accuracy, and to provide enhanced flexibility in choosing randomization probabilities for LDP. Analysis and experiments show that our framework delivers comparable or even better performance than the non-private framework and existing LDP protocols, demonstrating the advantages of our LDP protocol.
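For intuition about what a randomization module does, the following is a minimal sketch of classic k-ary randomized response, a standard LDP building block; the paper's actual protocol is more elaborate, and the domain and ε here are hypothetical.

.. code-block:: python

    import math
    import random

    def randomized_response(true_value, domain, epsilon):
        """k-ary randomized response: report the true value with
        probability p = e^eps / (e^eps + k - 1), otherwise report a
        uniformly random other value; this satisfies eps-local
        differential privacy for a single categorical report."""
        k = len(domain)
        p = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
        if random.random() < p:
            return true_value
        return random.choice([v for v in domain if v != true_value])

    print(randomized_response("sports", ["sports", "news", "music"], epsilon=1.0))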
Abstract: Inspired by the recent discoveries in neuroscience, the study of the sparse binary projection model started to attract people's attention, shedding new light on image retrieval. Different from the classical work that tries to reduce the dimension of the data for faster retrieval speed, the model projects dense input samples into a higher-dimensional space and outputs sparse binary data representations after winner-take-all competition. Following the work along this line, this paper designed a new algorithm which obtains a high-quality sparse binary projection matrix through unsupervised training. Simple as it is, the algorithm reported significantly improved results over the state-of-the-art methods in both search accuracy and retrieval speed in a series of empirical evaluations on large-scale image retrieval tasks, which exhibited its promising potential in industrial applications.
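A minimal sketch of the project-then-winner-take-all encoding step the abstract describes, assuming a random projection matrix in place of the learned one and hypothetical dimensions.

.. code-block:: python

    import numpy as np

    def sparse_binary_project(X, W, k):
        """Project dense rows of X into a higher-dimensional space with
        projection matrix W, then keep only the k largest activations
        per row (winner-take-all) as a sparse binary code."""
        H = X @ W                                  # (n, d_high) activations
        codes = np.zeros_like(H, dtype=np.uint8)
        top_k = np.argpartition(-H, k, axis=1)[:, :k]
        np.put_along_axis(codes, top_k, 1, axis=1)
        return codes

    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 128))                   # 4 dense inputs
    W = (rng.random((128, 2048)) < 0.1).astype(float)   # random sparse projection
    print(sparse_binary_project(X, W, k=32).sum(axis=1))  # 32 ones per row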
Abstract: Language model pre-training has attracted a great deal of attention for tasks involving natural language understanding, and has been successfully applied to many downstream tasks with impressive results. Within information retrieval, many of these solutions are too costly to stand on their own, requiring multi-stage ranking architectures. Recent work has begun to consider how to "backport" salient aspects of these computationally expensive models to earlier stages of the retrieval pipeline. One such instance is DeepCT, which uses BERT to re-weight term importance in a given context at the passage level. This process, which is computed offline, results in an augmented inverted index with re-weighted term frequency values. In this work, we conduct an investigation of query processing efficiency over DeepCT indexes. Using a number of candidate generation algorithms, we reveal how term re-weighting can impact query processing latency, and explore how DeepCT can be used as a static index pruning technique to accelerate query processing without harming search effectiveness.
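As an illustration of static index pruning over a term-re-weighted index, a minimal sketch follows; the dictionary-based index layout and threshold are hypothetical, not DeepCT's actual data structures.

.. code-block:: python

    def prune_index(inverted_index, min_weight):
        """Drop postings whose re-weighted term score falls below a
        threshold, shrinking the index while keeping high-value postings.

        inverted_index: {term: [(doc_id, weight), ...]} where weight is a
        context-aware importance score (e.g., produced offline by a
        DeepCT-style model)."""
        pruned = {}
        for term, postings in inverted_index.items():
            kept = [(doc, w) for doc, w in postings if w >= min_weight]
            if kept:
                pruned[term] = kept
        return pruned

    index = {"neural": [(1, 0.9), (2, 0.05)], "the": [(1, 0.01), (2, 0.02)]}
    print(prune_index(index, min_weight=0.1))  # {'neural': [(1, 0.9)]}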
Abstract: Manually extracting relevant aspects and opinions from large volumes of user-generated text is a time-consuming process. Summaries, on the other hand, help readers with limited time budgets to quickly consume the key ideas from the data. State-of-the-art approaches for multi-document summarization, however, do not consider user preferences while generating summaries. In this work, we argue for the need, and propose a solution, for generating personalized aspect-based opinion summaries from large collections of online tourist reviews. We let our readers decide and control several attributes of the summary, such as the length and specific aspects of interest, among others. Specifically, we take an unsupervised approach to extract coherent aspects from tourist reviews posted on TripAdvisor. We then propose an Integer Linear Programming (ILP) based extractive technique to select an informative subset of opinions around the identified aspects while respecting the user-specified values for various control parameters. Finally, we evaluate and compare our summaries using crowdsourcing and ROUGE-based metrics and obtain competitive results.
Abstract: Recent studies on open-domain question answering have achieved prominent performance improvement using pre-trained language models such as BERT. State-of-the-art approaches typically follow the "retrieve and read" pipeline and employ a BERT-based reranker to filter retrieved documents before feeding them into the reader module. The BERT retriever takes as input the concatenation of the question and each retrieved document. Despite the success of these approaches in terms of QA accuracy, due to the concatenation, they can barely handle a high throughput of incoming questions, each with a large collection of retrieved documents. To address the efficiency problem, we propose DC-BERT, a decoupled contextual encoding framework that has dual BERT models: an online BERT which encodes the question only once, and an offline BERT which pre-encodes all the documents and caches their encodings. On the SQuAD Open and Natural Questions Open datasets, DC-BERT achieves a 10x speedup on document retrieval, while retaining most (about 98%) of the QA performance compared to state-of-the-art approaches for open-domain question answering.
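A toy sketch of the decoupled dual-encoder idea, with small linear layers standing in for the online and offline BERT models; the dimensions and the bilinear interaction layer are illustrative assumptions, not DC-BERT's actual architecture.

.. code-block:: python

    import torch
    import torch.nn as nn

    class DecoupledRanker(nn.Module):
        """Toy decoupled encoder in the spirit of DC-BERT: the question
        is encoded once online, documents are pre-encoded offline and
        cached, and a light interaction layer scores each cached pair."""

        def __init__(self, dim=128):
            super().__init__()
            self.q_enc = nn.Linear(300, dim)   # stand-in for the online BERT
            self.d_enc = nn.Linear(300, dim)   # stand-in for the offline BERT
            self.scorer = nn.Bilinear(dim, dim, 1)

        def encode_docs(self, docs):           # run offline, results cached
            return self.d_enc(docs)

        def forward(self, question, cached_docs):
            q = self.q_enc(question)            # encoded once per question
            return self.scorer(q.expand_as(cached_docs), cached_docs).squeeze(-1)

    model = DecoupledRanker()
    cache = model.encode_docs(torch.randn(100, 300))   # offline pass
    scores = model(torch.randn(1, 300), cache)         # fast online pass
    print(scores.shape)                                # torch.Size([100])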
Abstract: Session-based recommendation produces item predictions mainly based on anonymous sessions. Previous studies have leveraged collaborative information from neighbor sessions to boost the recommendation accuracy for a given ongoing session. However, previous work often selects the most recent sessions as candidate neighbors, thereby failing to identify the most related neighbors and to obtain an effective neighbor representation. In addition, few existing methods simultaneously consider the sequential signal and the most recent interest in an ongoing session. In this paper, we introduce an Intent-guided Collaborative Machine for Session-based Recommendation (ICM-SR). ICM-SR encodes an ongoing session by leveraging the prior sequential items and the last item to generate an accurate session representation, which is then used to produce initial item predictions as intent. After that, we design an intent-guided neighbor detector to locate the correct neighbor sessions. Finally, the representations of the current session and the neighbor sessions are adaptively combined by a gated fusion layer to produce the final item recommendations. Experiments conducted on two public benchmark datasets show that ICM-SR achieves a significant improvement in terms of Recall and MRR over the state-of-the-art baselines.
Abstract: Session-based recommendation aims to predict a user's actions at the next timestamp based on anonymous sessions. Previous work mainly focuses on the transition relationships between items that the user interacted with during an ongoing session, and generally fails to pay enough attention to how relevant those items are to the user's main intent. In this paper, we propose a Session-based Recommendation approach with an Importance Extraction Module, i.e., SR-IEM, that considers both a user's long-term and recent behavior in an ongoing session. We employ a modified self-attention mechanism to estimate item importance in a session, which is then used to predict the user's long-term preference. Item recommendations are produced by combining the user's long-term preference and their current interest as conveyed by the last item they interacted with. Comprehensive experiments are conducted on two publicly available benchmark datasets. The proposed SR-IEM model outperforms state-of-the-art baselines in terms of Recall and MRR for the task of session-based recommendation. In addition, compared to state-of-the-art models, SR-IEM has a reduced computational complexity.
Abstract: Diagnosis prediction aims to forecast the diseases that a patient might have at their next hospital visit, which is critical in Clinical Decision Support Systems (CDSS). Existing approaches mainly formulate diagnosis prediction as a multi-label classification problem and use discrete medical codes as the major features, while the structural information among medical codes and the time series data in clinical records are generally neglected. In this paper, we propose the Multi-modal Clinical Data based Hierarchical Multi-label model (MHM) to integrate discrete medical codes, structural information, and time series data into the same framework for the diagnosis prediction task. Experimental results on two real-world datasets demonstrate the superiority of the proposed MHM over state-of-the-art approaches.
Abstract: We investigate a growing body of work that seeks to improve recommender systems through the use of review text. Generally, these papers argue that since reviews 'explain' users' opinions, they ought to be useful to infer the underlying dimensions that predict ratings or purchases. Schemes to incorporate reviews range from simple regularizers to neural network approaches. Our initial findings reveal several discrepancies in reported results, partly due to, e.g., copying results across papers despite changes in experimental settings or data pre-processing. First, we attempt a comprehensive analysis to resolve these ambiguities. Further investigation calls for discussion of a much larger problem: the "importance" of user reviews for recommendation. Through a wide range of experiments, we observe several cases where state-of-the-art methods fail to outperform existing baselines, especially as we deviate from a few narrowly-defined settings where reviews are useful. We conclude by providing hypotheses for our observations that seek to characterize under what conditions reviews are likely to be helpful. Through this work, we aim to evaluate the direction in which the field is progressing and encourage robust empirical evaluation.
Abstract: In display advertising, predicting the conversion rate (CVR), i.e., the probability that a user takes a predefined action on an advertiser's website, is a fundamental task for estimating the value of displaying an advertisement to a user. There are two main challenges in CVR prediction due to delayed feedback. First, some positive labels are not correctly observed in the training data, because some conversions do not occur immediately after a click. Second, delay mechanisms are not uniform across instances, meaning some positive feedback is observed much more frequently than other feedback. It is widely acknowledged that these problems lead to severe bias in CVR prediction. To overcome these challenges, we propose two unbiased estimators: one for CVR prediction and the other for bias estimation. Subsequently, we propose a dual learning algorithm in which a CVR predictor and a bias estimator are trained in an alternating fashion using only observable conversions. The proposed algorithm is the first of its kind to address the two major challenges in a theoretically sophisticated manner. Empirical evaluations using synthetic datasets demonstrate the practical value of the proposed approach.
Abstract: Extractive-abstractive hybrid summarization can generate readable, concise summaries for long documents. Extraction-then-abstraction and extraction-with-abstraction are two representative approaches to hybrid summarization, but their general performance has yet to be evaluated by large-scale experiments. We examined two state-of-the-art hybrid summarization algorithms from three novel perspectives: we applied them to a form of headline generation not previously tried; we evaluated the generalization of the algorithms by testing them both within and across news domains; and we compared the automatic assessment of the algorithms to human comparative judgments. We find that an extraction-then-abstraction hybrid approach outperforms an extraction-with-abstraction approach, particularly for cross-domain headline generation.
Abstract: With the development of online education systems, a growing number of research works are focusing on Knowledge Tracing (KT), which aims to assess students' changing knowledge state and help them learn knowledge concepts more efficiently. However, given only student learning interactions, most existing KT methods neglect the individualization of students, i.e., that prior knowledge and learning rates differ from student to student. To this end, in this paper, we propose a novel Convolutional Knowledge Tracing (CKT) method to model individualization in KT. Specifically, for individualized prior knowledge, we measure it from students' historical learning interactions. For individualized learning rates, we design hierarchical convolutional layers to extract them based on continuous learning interactions of students. Extensive experiments demonstrate that CKT obtains better knowledge tracing results by modeling individualization in the learning process. Moreover, CKT can learn meaningful exercise embeddings automatically.
Abstract: Generating natural language descriptions for knowledge graphs (KGs) is an important task in intelligent writing. Recent models for this task substitute the sequence encoder in the commonly used encoder-decoder framework with a graph encoder. However, these models suffer from entity missing and entity repetition. In this paper, we propose a novel end-to-end generation model named G2T, which integrates a novel Graph Structure Enhanced Mechanism (GSEM) and a Copy Coverage Loss (CCL). Instead of considering graph structure only in the encoding phase, as most existing methods do, our GSEM fully utilizes graph structure in the decoding phase and helps to mitigate the entity missing problem. Moreover, our CCL further improves performance by avoiding the generation of repeated entities. With their help, our model is capable of generating fluent descriptions for KGs. The results of automatic and human evaluations show that our model outperforms the state-of-the-art models.
Abstract: In this paper, we study context-response matching with pre-trained contextualized representations for multi-turn response selection in retrieval-based chatbots. Existing models, such as CoVe and ELMo, are trained with limited context (often a single sentence or paragraph) and may not work well on multi-turn conversations, due to their hierarchical nature, informal language, and domain-specific words. To address these challenges, we propose pre-training hierarchical contextualized representations, including contextual word-level and sentence-level representations, by learning a dialogue generation model from large-scale conversations with a hierarchical encoder-decoder architecture. The two levels of representations are then blended into the input and output layers of a matching model, respectively. Experimental results on two benchmark conversation datasets indicate that the proposed hierarchical contextualized representations bring significant and consistent improvement to existing matching models for response selection.
Abstract: In product-to-product search and recommendation, the product image often plays a pivotal role for the user to determine the relevance of that product. The present study investigates the relationship between the users' visual intents (in terms of colour, texture and material, and design) and the amount of user feedback (namely, clicks, likes, and purchases) using real product data and crowdsourcing. Through the analysis, we found that visual relevance (i.e., relevance of a target product with respect to a particular visual intent) correlates with the amount of user feedback, and that visual relevance can be the cause of user feedback.
Abstract: In this paper, we propose a Multi-View Learning (MVL) framework for news recommendation which uses both a content view and a user-news interaction graph view. In the content view, we use a news encoder to learn news representations from different information such as titles, bodies, and categories. We obtain the representation of a user from his/her browsed news, conditioned on the candidate news article to be recommended. In the graph view, we propose to use a graph neural network to capture the user-news, user-user, and news-news relatedness in the user-news bipartite graph by modeling the interactions between different users and news. In addition, we propose to incorporate an attention mechanism into the graph neural network to model the importance of these interactions for more informative representation learning of users and news. Experiments on a real-world dataset validate the effectiveness of MVL.
Abstract: People use web image search with various search intents: from serious demands for work to just passing time by browsing images of a favorite actor. Such a diversity of intents can influence user satisfaction and evaluation metrics, both of which are important factors for providing a better image search environment. In this paper, we investigate this influence by using a publicly available one-month field study dataset. With respect to satisfaction, we take into consideration both query-level and task-level satisfaction provided by search users. Regarding the evaluation metrics, we use grid-based evaluation metrics that incorporate user behavior specific to image search. The results of our analysis indicate that both query/task satisfaction and grid-based evaluation metrics are influenced by the image search intent. Based on the results, we show possibilities for supporting users' search processes according to their search intents. We also discuss the remaining room for improvement in evaluation metrics through the development of intent-aware evaluation metrics for image search.
Abstract: Much cognitive research has shown the natural possibility of face-voice association, and this potential association has attracted much attention in the biometric cross-modal retrieval domain. Nevertheless, existing methods often fail to explicitly learn common embeddings for challenging face-voice association tasks. In this paper, we propose to learn a discriminative joint embedding for face-voice association, which can seamlessly train the face subnetwork and voice subnetwork to learn their high-level semantic features, while correlating them so that they can be compared directly and efficiently. Within the proposed approach, we introduce a bi-directional ranking constraint, an identity constraint, and a center constraint to learn the joint face-voice embedding, and adopt a bi-directional training strategy to train the deep correlated face-voice model. Meanwhile, an online hard negative mining technique is utilized to discriminatively construct hard triplets in a mini-batch manner, speeding up the learning process. Accordingly, the proposed approach can benefit various face-voice association tasks, including cross-modal verification, 1:2 matching, 1:N matching, and retrieval scenarios. Extensive experiments show improved performance in comparison with state-of-the-art methods.
Abstract: CTR (Click-Through Rate) prediction plays a central role in the domain of computational advertising and recommender systems. Several kinds of methods have been proposed in this field, such as Logistic Regression (LR), Factorization Machines (FM), and deep learning based methods like Wide&Deep, Neural Factorization Machines (NFM), and DeepFM. However, such approaches generally use the vector product of each pair of features, which ignores the different semantic spaces of feature interactions. In this paper, we propose a novel Tensor-based Feature interaction Network (TFNet) model, which introduces an operating tensor to elaborate feature interactions via multi-slice matrices in multiple semantic spaces. Extensive offline and online experiments show that TFNet: 1) outperforms the competitive compared methods on the typical Criteo and Avazu datasets; and 2) achieves large improvements in revenue and click rate in online A/B tests in the largest Chinese App recommender system, Tencent MyApp.
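The multi-slice bilinear interaction can be sketched in a few lines; the dimensions are hypothetical and TFNet's full parameterization certainly differs, but the core idea of one bilinear form per semantic space is captured.

.. code-block:: python

    import torch

    def tensor_interaction(x, y, T):
        """Multi-slice tensor feature interaction: for each semantic
        slice T[s] (a d x d matrix), compute the bilinear score
        x^T T[s] y, giving one interaction value per semantic space."""
        # x, y: (batch, d); T: (slices, d, d) -> scores: (batch, slices)
        return torch.einsum('bi,sij,bj->bs', x, T, y)

    d, slices = 16, 4
    x, y = torch.randn(8, d), torch.randn(8, d)
    T = torch.randn(slices, d, d)             # one learnable slice per space
    print(tensor_interaction(x, y, T).shape)  # torch.Size([8, 4])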
Abstract: A better understanding of users' reading behavior helps improve many information retrieval (IR) tasks, such as relevance estimation and document ranking. Existing research has already leveraged eye movement information to investigate users' reading processes during document-level relevance judgments, and the findings have been adopted to build more effective ranking models. Recently, fine-grained (e.g., passage- or sentence-level) relevance judgments have received much attention, driven by the requirements of conversational search and QA systems. However, there is still a lack of thorough investigation of users' reading behavior during these kinds of interaction processes. To shed light on this research question, we investigate how users allocate their attention to passages of a document during the relevance judgment process. With eye-tracking data collected in a laboratory study, we show that users pay more attention to the "key" passages which contain key useful information. Users tend to revisit these key passages several times to accumulate and verify the gathered information. With both content and user behavior features, we find that key passages can be predicted with supervised learning. We believe that this work contributes to better understanding users' reading behavior and may provide more explainability for relevance estimation.
Abstract: Many prediction tasks in real-world applications need to model multi-order feature interactions in a user's event sequence for better detection performance. However, existing popular solutions usually suffer from two key issues: 1) only focusing on feature interactions and failing to capture the sequence influence; 2) only focusing on sequence information but ignoring the internal feature relations of each event, thus failing to extract a better event representation. In this paper, we consider a two-level structure for capturing the hierarchical information over a user's event sequence: 1) learning event representations based on effective feature interactions; 2) modeling the sequence representation of the user's historical events. Experimental results on both industrial and public datasets clearly demonstrate that our model achieves significantly better performance compared with state-of-the-art baselines.
Abstract: Graph neural networks (GNNs) achieve remarkable success in graph-based semi-supervised node classification by leveraging information from neighboring nodes to improve the representation learning of the target node. The success of GNNs at node classification depends on the assumption that connected nodes tend to have the same label. However, such an assumption does not always hold, limiting the performance of GNNs at node classification. In this paper, we propose the label-consistency based graph neural network (LC-GNN), which leverages node pairs that are unconnected but share the same label to enlarge the receptive field of nodes in GNNs. Experiments on benchmark datasets demonstrate that the proposed LC-GNN outperforms traditional GNNs in graph-based semi-supervised node classification. We further show the superiority of LC-GNN in sparse scenarios with only a handful of labeled nodes.
Abstract: The ability to perform semantic reasoning over sentence pairs is essential for many natural language understanding tasks, e.g., natural language inference and machine reading comprehension. A recent significant improvement in these tasks comes from BERT. As reported, next sentence prediction (NSP) in BERT is of great significance for downstream problems with sentence-pair input. Despite its effectiveness, NSP still lacks the essential signal to distinguish between entailment and shallow correlation. To remedy this, we propose to augment the NSP task into a multi-class categorization task that includes previous sentence prediction (PSP). This task encourages the model to learn subtle semantics, thereby improving its ability of semantic understanding. Furthermore, by using a smoothing technique, the scopes of NSP and PSP are expanded into a broader range that includes close but non-successive sentences. This simple method yields remarkable improvement over vanilla BERT. Our method consistently improves performance on the NLI and MRC benchmarks by a large margin, including on the challenging HANS dataset.
Abstract: Deep Interest Network (DIN) is a state-of-the-art model which uses an attention mechanism to capture user interests from historical behaviors. User interests intuitively follow a hierarchical pattern, such that users generally show interest at a higher-level abstraction before moving to a lower-level one. Modelling such an interest hierarchy in an attention network can fundamentally improve the representation of user behaviors. We therefore propose an improvement over DIN to model an arbitrary interest hierarchy: Deep Interest with Hierarchical Attention Network (DHAN). In this model, a multi-dimensional hierarchical structure is introduced: the first attention layer attends to individual items, and the subsequent attention layers in the same dimension attend to higher-level hierarchies built on top of the lower corresponding layers. To enable the modelling of multiple hierarchy dimensions, an expanding mechanism is introduced to capture one-to-many hierarchies. This design enables DHAN to attach different importance to different hierarchical abstractions and thus fully capture a user's interests at different dimensions (e.g., category, price, or brand). To validate our model, a simplified DHAN is applied to Click-Through Rate (CTR) prediction, and we conduct experiments on three public datasets with a two-level one-dimensional hierarchy based only on category. The results show DHAN's superiority, with a significant AUC uplift of 12% to 21% over DIN. DHAN is also compared with another state-of-the-art model, Deep Interest Evolution Network (DIEN), which models temporal interest; the simplified DHAN achieves a slight AUC uplift of 1.0% to 1.7% over DIEN. A potential direction for future work is to combine DHAN and DIEN to model both temporal and hierarchical interests.
Abstract: Deep neural networks (DNNs) have been widely employed in recommender systems, including the incorporation of attention mechanisms for performance improvement. However, most existing attention-based models only apply item-level attention on the user side, restricting further enhancement of recommendation performance. In this paper, we propose a knowledge-enhanced recommendation model, ACAM, which incorporates item attributes distilled from knowledge graphs (KGs) as side information, and is built with an attribute-level co-attention mechanism to achieve performance gains. Specifically, each user and item in ACAM is first represented by a set of attribute embeddings. Then, user representations and item representations are augmented simultaneously by capturing the correlations between different attributes with a co-attention module. Our extensive experiments over two realistic datasets show that the user and item representations augmented by attribute-level co-attention give ACAM superiority over state-of-the-art deep models.
Abstract: In this paper, we propose a multi-source domain adaptation method with a Granger-causal objective (MDA-GC) for cross-domain sentiment classification. Specifically, for each source domain, we build an expert model by using a novel sentiment-guided capsule network, which captures the domain invariant knowledge that bridges the knowledge gap between the source and target domains. Then, an attention mechanism is devised to assign importance weights to a mixture of experts, each of which specializes in a different source domain. In addition, we propose a Granger causal objective to make the weights assigned to individual experts correlate strongly with their contributions to the decision at hand. Experimental results on a benchmark dataset demonstrate that the proposed MDA-GC model significantly outperforms the compared methods.
Abstract: The spaced repetition technique aims to improve human students' long-term memory retention by exploiting repeated, spaced reviews of learning contents. The study of spaced repetition focuses on designing an optimal policy to schedule the learning contents. To the best of our knowledge, none of the existing methods based on reinforcement learning take into account the varying time intervals between two adjacent learning events of a student, which, however, are essential for determining a real-world schedule. In this paper, we aim to learn a scheduling policy that fully exploits the varying time interval information with high sample efficiency. We propose the Time-Aware scheduler with Dyna-Style planning (TADS) approach: a sample-efficient reinforcement learning framework for realistic spaced repetition. TADS learns a Time-LSTM policy to select an optimal content according to the student's whole learning history and the time interval since the last learning event. Besides, Dyna-style planning is integrated into TADS to further improve the sample efficiency. We evaluate our approach on three environments built from synthetic data and real-world data based on well-recognized cognitive models. Empirical results demonstrate that TADS achieves superior performance against state-of-the-art algorithms.
Abstract: Session-based recommendation nowadays plays a vital role in many websites, which aims to predict users' actions based on anonymous sessions. There have emerged many studies that model a session as a sequence or a graph via investigating temporal transitions of items in a session. However, these methods compress a session into one fixed representation vector without considering the target items to be predicted. The fixed vector will restrict the representation ability of the recommender model, considering the diversity of target items and users' interests. In this paper, we propose a novel target attentive graph neural network (TAGNN) model for session-based recommendation. In TAGNN, target-aware attention adaptively activates different user interests with respect to varied target items. The learned interest representation vector varies with different target items, greatly improving the expressiveness of the model. Moreover, TAGNN harnesses the power of graph neural networks to capture rich item transitions in sessions. Comprehensive experiments conducted on real-world datasets demonstrate its superiority over state-of-the-art methods.
Abstract: E-commerce platforms greatly benefit from high-quality search that retrieves relevant results in response to search terms. For the sake of search relevance, Query Classification (QC) has been widely adopted to make search engines robust against low text quality and complex category hierarchies. Generally, QC solutions categorize search queries and direct users to the suggested categories from which the search results are then retrieved. In this way, the search scope is contextually constrained to increase search relevance. However, such operations risk deteriorating e-commerce metrics when irrelevant categories are suggested, so QC solutions are expected to demonstrate high accuracy. Unfortunately, existing QC methods mainly focus on the intrinsic performance of classifiers, while failing to consider post-inference optimization that could further improve reliability. To fill this research gap, we propose Query Classification with Multi-objective Backoff (QCMB). The proposed solution consists of two steps: 1) hierarchical text classification that classifies search queries into multi-level categories; and 2) multi-objective backoff that substitutes potentially misclassified leaf categories with appropriate ancestors, optimizing the trade-off between accuracy and depth. The proposed QCMB is evaluated using real-world search data from Trade Me, the largest e-commerce platform in New Zealand. Compared with the benchmarks, QCMB delivers superior solutions with flexible tuning to satisfy different users' demands. To the best of our knowledge, this work is the first attempt to enhance QC with multi-objective optimization.
Abstract: Recommender systems are among the most successful machine learning technologies for commerce. However, they can reinforce the closed feedback loop problem: the recommender system presents items to users, and the next recommendation model is then trained on the users' feedback to those items. Such a self-reinforcing pattern can cause data bias problems. Among the several debiasing methods, inverse propensity scoring (IPS) is a practical one for industry products, since it is relatively easy to reweight training samples and ameliorate the distribution shift problem. However, because of the deterministic policy problem and confounders in real-world data, it is hard to predict the propensity score accurately. Inspired by sample reweighting work for robust deep learning, we propose a novel influence function based method for recommendation modeling, and analyze how the influence function corrects the bias. In our experiments, the proposed method achieves better performance than state-of-the-art approaches.
Abstract: Conversational query rewriting aims to reformulate a concise conversational query to a fully specified, context-independent query that can be effectively handled by existing information retrieval systems. This paper presents a few-shot generative approach to conversational query rewriting. We develop two methods, based on rules and self-supervised learning, to generate weak supervision data using large amounts of ad hoc search sessions, and to fine-tune GPT-2 to rewrite conversational queries. On the TREC Conversational Assistance Track, our weakly supervised GPT-2 rewriter improves the state-of-the-art ranking accuracy by 12%, only using very limited amounts of manual query rewrites. In the zero-shot learning setting, the rewriter still gives a comparable result to previous state-of-the-art systems. Our analyses reveal that GPT-2 effectively picks up the task syntax and learns to capture context dependencies, even for hard cases that involve group references and long-turn dependencies.
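A minimal sketch of prompting GPT-2 for query rewriting via the transformers library; the "|||" separator and "Rewrite:" prompt format are assumptions of this sketch, and an off-the-shelf GPT-2 would need the weakly supervised fine-tuning described above to produce useful rewrites.

.. code-block:: python

    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # Conversation history followed by the concise query to rewrite; a
    # fine-tuned rewriter would be trained on such prompts paired with
    # the fully specified query as the continuation.
    prompt = "What is throat cancer? ||| Is it treatable? ||| Rewrite: "
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        **inputs, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id
    )
    # Decode only the generated continuation, i.e., the rewritten query.
    print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:]))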
Abstract: Recommendation reason generation, which aims to show the selling points of products to customers, plays a vital role in attracting customers' attention as well as improving user experience. A simple and effective way is to extract keywords directly from the knowledge base of products, i.e., attributes or titles, as the recommendation reason. However, generating recommendation reasons from product knowledge does not naturally reflect users' interests. Fortunately, on some E-commerce websites, there exists more and more user-generated content (user-content for short), i.e., product question-answering (QA) discussions, which reflect user-cared aspects. Therefore, in this paper, we consider generating the recommendation reason by taking into account not only the product attributes but also the customer-generated product QA discussions. In reality, adequate user-content is only available for the most popular commodities, whereas large numbers of long-tail or new products cannot gather a sufficient amount of user-content. To tackle this problem, we propose a user-inspired multi-source posterior transformer (MSPT), which induces the model to reflect users' interests with a posterior multiple-QA-discussions module, and to generate recommendation reasons containing the product attributes as well as the user-cared aspects. Experimental results show that our model is superior to traditional generative models. Additionally, the analysis shows that our model can focus more on the user-cared aspects than the baselines.
Abstract: Although BERT has shown its effectiveness in a number of IR-related tasks, especially document ranking, the understanding of its internal mechanism remains insufficient. To increase the explainability of the ranking process performed by BERT, we investigate a state-of-the-art BERT-based ranking model with a focus on its attention mechanism and interaction behavior. First, we look into the evolution of the attention distribution. It shows that at each step, BERT dumps redundant attention weights on tokens with high document frequency (such as periods). This may pose a potential threat to model robustness and should be considered in future studies. Second, we study how BERT models the interactions between query and document, and find that BERT aggregates document information into query token representations through their interactions, but extracts query-independent representations for document tokens. This indicates that it may be possible to transform BERT into a more efficient representation-focused model. These findings help us better understand the ranking process performed by BERT and may inspire future improvements.
Abstract: Multi-Choice Reading Comprehension (MCRC) is an essential task in which a machine selects the correct answer from multiple choices given a context document and a corresponding question. Existing methods usually make predictions based on a single-round reasoning process with an attention mechanism; however, this may be insufficient for tasks that require a more complex reasoning process. To effectively comprehend the context and select the correct answer from different perspectives, we propose the Read-Attend-Exclude (RAE) model, motivated by how human readers tackle MCRC through a multi-round reasoning process. Specifically, the RAE model includes four components: the Scan Reading Module, the Attended Intensive Reading Module, the Answer Exclusion Module, and the Gated Fusion Module, which makes the final decisions collectively based on the preceding three modules. Extensive experiments demonstrate the strong results of the proposed model on the DREAM dataset and the effectiveness of all proposed modules.
Abstract: Obtaining training data for Multi-Document Summarization (MDS) is time consuming and resource-intensive, so recent neural models can only be trained for limited domains. In this paper, we propose SummPip: an unsupervised method for multi-document summarization, in which we convert the original documents to a sentence graph, taking both linguistic and deep representations into account, then apply spectral clustering to obtain multiple clusters of sentences, and finally compress each cluster to generate the final summary. Experiments on the Multi-News and DUC-2004 datasets show that our method is competitive with previous unsupervised methods and is even comparable to neural supervised approaches. In addition, human evaluation shows our system produces consistent and complete summaries compared to human-written ones.
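A minimal sketch of the cluster-then-compress idea, using plain tf-idf cosine similarities as the sentence graph; SummPip's actual graph combines linguistic links with deep representations, and the cluster count here is an arbitrary choice.

.. code-block:: python

    from sklearn.cluster import SpectralClustering
    from sklearn.feature_extraction.text import TfidfVectorizer

    sentences = [
        "The storm hit the coast on Monday.",
        "A hurricane made landfall early this week.",
        "Officials urged residents to evacuate.",
        "Evacuation orders were issued by the authorities.",
    ]
    # Build a sentence similarity graph; tf-idf rows are L2-normalized,
    # so the Gram matrix holds cosine similarities.
    tfidf = TfidfVectorizer().fit_transform(sentences)
    affinity = (tfidf @ tfidf.T).toarray()

    labels = SpectralClustering(
        n_clusters=2, affinity="precomputed", random_state=0
    ).fit_predict(affinity)
    for cluster in range(2):  # each cluster would then be compressed
        print(cluster, [s for s, l in zip(sentences, labels) if l == cluster])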
Abstract: Chinese word segmentation (CWS) is an important research topic in information retrieval (IR) and natural language processing (NLP). Significant progress has been made by deep neural networks with context features. However, these deep models may fail to deal with rare or ambiguous words, thus limiting the overall CWS performance. In this paper, we propose a lexicon-enhanced adaptive attention network (LAAN), which takes full advantage of external lexicons to deal with rare or ambiguous words. Specifically, we devise an adaptive attention mechanism to learn the lexicon-aware representation. In addition, we propose a fusion gate to effectively integrate the additional word information with context information to improve the performance of CWS. LAAN is evaluated on four benchmark datasets, and the experimental results demonstrate that LAAN has robust superiority over the compared methods.
Abstract: Existing sequential recommendation methods focus on modeling the temporal relationships of user behaviors and are good at using additional item information to improve performance. However, these methods rarely consider the influence of users' sequential subjective sentiments on their behaviors, even though temporal changes in human sentiment patterns sometimes play a decisive role in users' final preferences. To investigate the influence of temporal sentiments on user preferences, we propose generating preferences by guiding user behavior through sequential sentiments. Specifically, we design a dual-channel fusion mechanism. The main channel consists of sentiment-guided attention to match and guide sequential user behavior, and the secondary channel consists of sparse sentiment attention to assist in preference generation. In the experiments, we demonstrate the effectiveness of these two sentiment modeling mechanisms through ablation studies. Our approach outperforms current state-of-the-art sequential recommendation methods that incorporate sentiment factors.
Abstract: The abductive natural language inference task (αNLI) is proposed to evaluate the abductive reasoning ability of a learning system. In the αNLI task, two observations are given, and the goal is to pick the most plausible hypothesis out of a set of candidates. Existing methods simply formulate it as a classification problem, and thus a cross-entropy log-loss objective is used during training. However, discriminating true from false does not measure the plausibility of a hypothesis, since all hypotheses have a chance of happening; only their probabilities differ. To fill this gap, we switch to a ranking perspective that sorts the hypotheses in order of their plausibility. With this new perspective, a novel L2R2 approach is proposed under the learning-to-rank framework. First, training samples are reorganized into a ranking form, where the two observations and their hypotheses are treated as the query and a set of candidate documents, respectively. Then, an ESIM model or a pre-trained language model, e.g., BERT or RoBERTa, is adopted as the scoring function. Finally, the loss function for the ranking task can be either pair-wise or list-wise during training. Experimental results on the ART dataset reach the state of the art on the public leaderboard.
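The pair-wise variant of the ranking objective described above can be sketched directly; the scores here are dummies standing in for a BERT- or ESIM-style plausibility scorer, and the margin value is an arbitrary choice.

.. code-block:: python

    import torch
    import torch.nn as nn

    # Plausibility scores for a more plausible hypothesis and a less
    # plausible one, given the same pair of observations.
    score_pos = torch.tensor([2.1, 0.3, 1.5])
    score_neg = torch.tensor([1.0, 0.9, 1.4])

    # Pair-wise hinge loss: push the more plausible hypothesis above
    # the less plausible one by a margin, instead of classifying each
    # hypothesis as true or false in isolation.
    loss_fn = nn.MarginRankingLoss(margin=1.0)
    target = torch.ones_like(score_pos)   # +1 means score_pos should rank higher
    print(loss_fn(score_pos, score_neg, target))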
Abstract: Topic modelling is a popular unsupervised method for identifying the underlying themes in document collections that has many applications in information retrieval. A topic is usually represented by a list of terms ranked by their probability but, since these can be difficult to interpret, various approaches have been developed to assign descriptive labels to topics. Previous work on the automatic assignment of labels to topics has relied on a two-stage approach: (1) candidate labels are retrieved from a large pool (e.g. Wikipedia article titles); and then (2) re-ranked based on their semantic similarity to the topic terms. However, these extractive approaches can only assign candidate labels from a restricted set that may not include any suitable ones. This paper proposes using a sequence-to-sequence neural-based approach to generate labels that does not suffer from this limitation. The model is trained over a new large synthetic dataset created using distant supervision. The method is evaluated by comparing the labels it generates to ones rated by humans.
Abstract: Snippets are used in web search to help users assess the relevance of retrieved results to their query. Recently, specialized search engines have arisen that retrieve pro and con arguments on controversial issues. We argue that standard snippet generation is insufficient to represent the core reasoning of an argument. In this paper, we introduce the task of generating a snippet that represents the main claim and reason of an argument. We propose a query-independent extractive summarization approach to this task that uses a variant of PageRank to assess the importance of sentences based on their context and argumentativeness. In both automatic and manual evaluation, our approach outperforms strong baselines.
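A minimal sketch of the underlying PageRank recursion over a sentence similarity graph; the paper's variant additionally weights sentences by context and argumentativeness, which is omitted here, and the similarity values are dummies.

.. code-block:: python

    import numpy as np

    def pagerank(sim, damping=0.85, iters=50):
        """Power-iteration PageRank over a sentence similarity matrix;
        high-scoring sentences are candidates for the argument snippet."""
        n = sim.shape[0]
        # Row-normalize similarities into transition probabilities.
        M = sim / sim.sum(axis=1, keepdims=True)
        rank = np.full(n, 1.0 / n)
        for _ in range(iters):
            rank = (1 - damping) / n + damping * M.T @ rank
        return rank

    sim = np.array([[1.0, 0.5, 0.1],
                    [0.5, 1.0, 0.2],
                    [0.1, 0.2, 1.0]])
    print(pagerank(sim))  # one importance score per sentence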
Abstract: Looking for health information is one of the most popular activities online. However, the specificity of the language in this domain is frequently an obstacle to comprehension, especially for those with lower levels of health literacy. For this reason, search engines should consider the readability of health content and, if possible, adapt it to the user behind the search. In this work, we explore methods to automatically assess the readability of health content. We propose features capable of measuring the specificity of a medical text and of estimating the knowledge necessary to comprehend it. The features are based on information retrieval metrics and on the log-likelihood of a text under lay and medico-scientific language models. To evaluate our methods, we built and used a dataset composed of health articles from Simple English Wikipedia and the respective documents in ordinary Wikipedia. We achieved a maximum accuracy of 88% in binary classification (easy versus hard-to-read). We found that the choice of machine learning algorithm does not significantly affect performance. We also experimented with and compared different feature combinations. The features based on the log-likelihood of a text under lay and medico-scientific language models perform better than all the others.
Abstract: The Information Retrieval (IR) community has witnessed a flourishing development of deep neural networks; however, only a few have managed to beat strong baselines. Among them, models like DRMM and DUET were able to achieve better results thanks to their proper handling of exact match signals. Nowadays, the application of pre-trained language models to IR tasks has achieved impressive results exceeding all previous work. In this paper, we assume that established IR cues like exact term matching, proven to be valuable for deep neural models, can be used to augment the direct supervision from labeled data for training these pre-trained models. To study the effectiveness of this assumption, we propose MarkedBERT, a modified version of BERT, one of the most popular models pre-trained via language modeling tasks. MarkedBERT integrates exact match signals using a marking technique that locates and highlights exact-matched query-document terms with marker tokens. Experiments on the MS MARCO Passage Ranking task show that our rather simple approach is actually effective. We find that augmenting the input with marker tokens allows the model to focus on valuable text sequences for IR.
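The marking technique can be illustrated with a few lines of input preprocessing; the [e]/[/e] marker strings here are placeholders of this sketch rather than necessarily the paper's exact tokens, and real use would also add them to the tokenizer vocabulary.

.. code-block:: python

    def mark_exact_matches(query, document, open_tok="[e]", close_tok="[/e]"):
        """Wrap document terms that exactly match a query term in marker
        tokens, making the exact-match signal explicit in the input."""
        query_terms = {t.lower() for t in query.split()}
        marked = [
            f"{open_tok} {w} {close_tok}" if w.lower() in query_terms else w
            for w in document.split()
        ]
        return " ".join(marked)

    print(mark_exact_matches(
        "neural ranking", "A neural model for document ranking tasks"))
    # A [e] neural [/e] model for document [e] ranking [/e] tasks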
Abstract: Trip-qualifiers, such as 'trip-type' (vacation, work, etc.) and 'accompanied-by' (solo, friends, family, etc.), are potentially useful sources of information for improving the effectiveness of POI recommendation in a current context (with a given set of these constraints). Using such information is not straightforward because a user's text reviews of the POIs visited in the past do not explicitly contain such annotations (e.g., a positive review about a pub visit does not say whether the user was with friends or alone, or on a business trip or vacation). We propose to use a small, manually compiled knowledge resource to predict the associations between the review texts in a user profile and the likely trip contexts. We demonstrate that incorporating this information within an IR-based relevance modeling framework significantly improves POI recommendation.
Abstract: CAsT-19 is a new dataset that supports research on conversational information seeking. The corpus is 38,426,252 passages from the TREC Complex Answer Retrieval (CAR) and Microsoft MAchine Reading COmprehension (MARCO) datasets. Eighty information seeking dialogues (30 train, 50 test) are an average of 9 to 10 questions long. A dialogue may explore a topic broadly or drill down into subtopics. Questions contain ellipsis, implied context, mild topic shifts, and other characteristics of human conversation that may prevent them from being understood in isolation. Relevance assessments are provided for 30 training topics and 20 test topics. CAsT-19 promotes research on conversational information seeking by defining it as a task in which effective passage selection requires understanding a question's context (the dialogue history). It focuses attention on user modeling, analysis of prior retrieval results, transformation of questions into effective queries, and other topics that have been difficult to study with existing datasets.
Abstract: Perceptual Speed (PS) is a cognitive ability that is known to affect multiple factors in Information Retrieval (IR), such as a user's search performance and subjective experience. However, PS tests are difficult to administer, which limits the design of user-adaptive systems that can automatically infer PS to appropriately accommodate low-PS users. Consequently, this paper evaluated whether PS can be automatically classified from search behaviour using several machine learning models trained on features extracted from TREC Common Core search task logs. Our results are encouraging: given a user's interactions from one query, a Decision Tree was able to predict a user's PS as low or high with 86% accuracy. Additionally, we identified different behavioural components for specific PS tests, implying that each PS test measures different aspects of a person's cognitive ability. These findings motivate further work on how best to design search systems that can adapt to individual differences.
Abstract: Document indexing is a key component of efficient information retrieval (IR). After preprocessing steps such as stemming and stop-word removal, document indexes usually store term frequencies (tf). Along with tf (which only reflects the importance of a term in a document), traditional IR models use term discrimination values (TDVs) such as inverse document frequency (idf) to favor discriminative terms during retrieval. In this work, we propose to learn TDVs for document indexing with shallow neural networks that approximate traditional IR ranking functions such as TF-IDF and BM25. Our proposal outperforms traditional approaches in terms of both nDCG and recall, even with few positively labelled query-document pairs as learning data. When used to filter out vocabulary terms with zero discrimination value, our learned TDVs allow us both to significantly lower the memory footprint of the inverted index and to speed up the retrieval process (BM25 is up to 3 times faster), without degrading retrieval quality.
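A toy sketch of learnable TDVs, assuming one ReLU-gated weight per vocabulary term so that training can drive useless terms to exactly zero and drop them from the index; the paper's networks are trained to approximate full TF-IDF/BM25 scores, which this simplified scoring function only gestures at.

.. code-block:: python

    import torch
    import torch.nn as nn

    vocab_size = 10_000
    # One learnable discrimination value per vocabulary term; the ReLU
    # lets training push useless terms to exactly zero, after which
    # their postings can be removed from the inverted index.
    tdv = nn.Sequential(nn.Embedding(vocab_size, 1), nn.ReLU())

    def score(query_ids, doc_tf):
        """Toy TDV-weighted ranking score: sum over query terms of
        tdv(term) * tf(term, doc)."""
        weights = tdv(query_ids).squeeze(-1)   # (num_query_terms,)
        return (weights * doc_tf).sum()

    q = torch.tensor([12, 905, 4021])          # query term ids
    tf = torch.tensor([3.0, 1.0, 0.0])         # their tf in one document
    print(score(q, tf))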
Abstract: Learning to rank (LTR) is the de facto standard for web search, improving upon classical retrieval models by exploiting (in)direct relevance feedback from user judgments, interaction logs, etc. We investigate for the first time the effect of a sampling bias on LTR models due to the potential presence of near-duplicate web pages in the training data, and how (in)consistent relevance feedback of duplicates influences an LTR model's decisions. To examine this bias, we construct a series of specialized LTR datasets based on the ClueWeb09 corpus with varying amounts of near-duplicates. We devise worst-case and average-case train/test splits that are evaluated on popular pointwise, pairwise, and listwise LTR models. Our experiments demonstrate that duplication causes overfitting and thus less effective models, making a strong case for the benefits of systematic deduplication before training and model evaluation.
Abstract: Top-N recommendations are widely applied in various real-life domains and keep attracting intense attention from researchers and industry due to available multi-type information, new advances in AI models, and a deeper understanding of user satisfaction. While accuracy has been the prevailing issue of the recommendation problem for the last decades, other facets of the problem, namely diversity and explainability, have received much less attention. In this paper, we focus on enhancing the diversity of top-N recommendation, while ensuring the trade-off between accuracy and diversity. Thus, we propose an effective framework, DivKG, leveraging knowledge graph embedding and determinantal point processes (DPP). First, we capture different kinds of relations among users, items and additional entities through a knowledge graph structure. Then, we represent both entities and relations as k-dimensional vectors by optimizing a margin-based loss with all kinds of historical interactions. We use these representations to construct kernel matrices of a DPP in order to make top-N diversified predictions. We evaluate our framework on MovieLens datasets coupled with the IMDb dataset. Our empirical results show substantial improvement over the state-of-the-art regarding both accuracy and diversity metrics.
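For illustration, here is a small sketch of greedy MAP inference for a DPP whose kernel combines relevance scores with embedding similarities; the kernel construction and all data are assumptions for this example, not DivKG's actual code.

.. code-block:: python

    import numpy as np

    def greedy_dpp(L, k):
        """Greedily add the item that most increases the log-determinant
        of the kernel restricted to the selected set."""
        selected, candidates = [], list(range(L.shape[0]))
        for _ in range(k):
            best, best_gain = None, -np.inf
            for i in candidates:
                idx = selected + [i]
                gain = np.linalg.slogdet(L[np.ix_(idx, idx)])[1]
                if gain > best_gain:
                    best, best_gain = i, gain
            selected.append(best)
            candidates.remove(best)
        return selected

    # Toy kernel L = diag(q) @ S @ diag(q): q plays the role of relevance,
    # S is cosine similarity between (hypothetical) KG embeddings.
    rng = np.random.default_rng(0)
    emb = rng.normal(size=(8, 4))
    emb /= np.linalg.norm(emb, axis=1, keepdims=True)
    S = emb @ emb.T + 1e-6 * np.eye(8)      # jitter keeps L positive definite
    q = rng.uniform(0.5, 1.5, size=8)
    L = np.outer(q, q) * S

    print("diversified top-4:", greedy_dpp(L, 4))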
Abstract: Tools capable of automatic code generation have the potential to augment programmers' capabilities. While straightforward code retrieval is incorporated into many IDEs, an emerging area is explicit code generation. Code generation is currently approached as a Machine Translation task, with Recurrent Neural Network (RNN) based encoder-decoder architectures trained on code-description pairs. In this work we introduce and study modern Transformer architectures for this task. We further propose a new model called the Relevance Transformer that incorporates external knowledge using pseudo-relevance feedback. The Relevance Transformer biases the decoding process to be similar to existing retrieved code while enforcing diversity. We perform experiments on multiple standard benchmark datasets for code generation, including Django, Hearthstone, and CoNaLa. The results show improvements over state-of-the-art methods based on BLEU evaluation. The Relevance Transformer model shows the potential of Transformer-based architectures for code generation and introduces a method of incorporating pseudo-relevance feedback during inference.
Abstract: Semantic Hashing is a popular family of methods for efficient similarity search in large-scale datasets. In Semantic Hashing, documents are encoded as short binary vectors (i.e., hash codes), such that semantic similarity can be efficiently computed using the Hamming distance. Recent state-of-the-art approaches have utilized weak supervision to train better performing hashing models. Inspired by this, we present Semantic Hashing with Pairwise Reconstruction (PairRec), which is a discrete variational autoencoder based hashing model. PairRec first encodes weakly supervised training pairs (a query document and a semantically similar document) into two hash codes, and then learns to reconstruct the same query document from both of these hash codes (i.e., pairwise reconstruction). This pairwise reconstruction enables our model to encode local neighbourhood structures within the hash code directly through the decoder. We experimentally compare PairRec to traditional and state-of-the-art approaches, and obtain significant performance improvements in the task of document similarity search.
Abstract: We study whether it is possible to infer if a news headline is true or false using only the movement of the human eyes when reading news headlines. Our study with 55 participants who are eye-tracked when reading 108 news headlines (72 true, 36 false) shows that false headlines receive statistically significantly less visual attention than true headlines. We further build an ensemble learner that predicts news headline factuality using only eye-tracking measurements. Our model yields a mean AUC of 0.688 and is better at detecting false than true headlines. Through a model analysis, we find that eye-tracking 25 users when reading 3-6 headlines is sufficient for our ensemble learner.
Abstract: The scarcity of Arabic test collections has long hindered information retrieval (IR) research over the Arabic Web. In this work, we present ArTest, the first large-scale test collection designed for the evaluation of ad-hoc search over the Arabic Web. ArTest uses ArabicWeb16, a collection of around 150M Arabic Web pages, as the document collection, and includes 50 topics, 10,529 relevance judgments, and (more importantly) a rationale behind each judgment. To our knowledge, this is also the first IR test collection that includes the rationales of primary assessors (i.e., topic developers) for their relevance judgments, providing a useful resource for understanding the relevance phenomenon. Finally, ArTest is made publicly available to the research community.
Abstract: Neural networks, particularly Transformer-based architectures, have achieved significant performance improvements on several retrieval benchmarks. When the items being retrieved are documents, the time and memory cost of employing Transformers over a full sequence of document terms can be prohibitive. A popular strategy involves considering only the first n terms of the document. This can, however, result in a biased system that under-retrieves longer documents. In this work, we propose a local self-attention which considers a moving window over the document terms and for each term attends only to other terms in the same window. This local attention incurs a fraction of the compute and memory cost of attention over the whole document. The windowed approach also leads to more compact packing of padded documents in minibatches, resulting in additional savings. We also employ a learned saturation function and a two-staged pooling strategy to identify relevant regions of the document. The Transformer-Kernel pooling model with these changes can efficiently elicit relevance information from documents with thousands of tokens. We benchmark our proposed modifications on the document ranking task from the TREC 2019 Deep Learning track and observe significant improvements in retrieval quality as well as increased retrieval of longer documents at a moderate increase in compute and memory costs.
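A minimal sketch of the banded (windowed) attention pattern described above, assuming a single head and toy sizes; only the masking idea is essential.

.. code-block:: python

    import numpy as np

    def local_attention(Q, K, V, window):
        """Single-head attention where token i attends only to tokens j
        with |i - j| <= window, i.e. a banded attention matrix."""
        n, d = Q.shape
        scores = Q @ K.T / np.sqrt(d)
        i, j = np.indices((n, n))
        scores[np.abs(i - j) > window] = -1e9   # mask tokens outside the window
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        return weights @ V

    rng = np.random.default_rng(0)
    n, d = 12, 8                     # 12 tokens, 8-dimensional head (toy sizes)
    Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
    out = local_attention(Q, K, V, window=2)
    print(out.shape)                 # (12, 8); each token sees at most 5 tokens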
Abstract: Recommendation systems are often trained and evaluated based on users' interactions obtained through the use of an existing, already deployed, recommendation system. The deployed system will recommend some items and not others, so items have varying levels of exposure to users. As a result, the collected feedback dataset (including most public datasets) can be skewed towards the particular items favored by the deployed model. In this manner, training new recommender systems from interaction data obtained from a previous model creates a feedback loop, i.e., closed loop feedback. In this paper, we first introduce closed loop feedback and then investigate its effect on both the training and offline evaluation of recommendation models, in contrast to a further exploration of the users' preferences (obtained from randomly presented items). To achieve this, we make use of open loop datasets, where randomly selected items are presented to users for feedback. Our experiments using an open loop Yahoo! dataset reveal that there is a strong correlation between the deployed model and a new model that is trained based on the closed loop feedback. Moreover, with the aid of exploration we can decrease the effect of closed loop feedback and obtain new and better generalizable models.
Abstract: Users' historical interactions usually contain their interests and purchase habits, based on which personalised recommendations can be made. However, such user interactions are often sparse, leading to the well-known cold-start problem when a user has no or very few interactions. In this paper, we propose a new recommendation model, named Heterogeneous Graph Neural Recommender (HGNR), to tackle the cold-start problem while ensuring effective recommendations for all users. Our HGNR model learns user and item embeddings by using a Graph Convolutional Network on a heterogeneous graph, which is constructed from user-item interactions, social links, and semantic links predicted from the social network and textual reviews. Our extensive empirical experiments on three public datasets demonstrate that HGNR significantly outperforms competitive baselines in terms of the Normalised Discounted Cumulative Gain and Hit Ratio measures.
Abstract: Search engine ranking pipelines are commonly based on large ensembles of machine-learned decision trees. The tight constraints on query response time have recently motivated researchers to investigate algorithms that speed up the traversal of the additive ensemble or that terminate early the evaluation of documents that are unlikely to be ranked among the top-k. In this paper, we investigate the novel problem of query-level early exiting, aimed at deciding the profitability of early stopping the traversal of the ranking ensemble for all the candidate documents to be scored for a query, by simply returning a ranking based on the additive scores computed by a limited portion of the ensemble. Besides the obvious advantages in query latency and throughput, we address the possible positive impact on ranking effectiveness. To this end, we study the actual contribution of incremental portions of the tree ensemble to the ranking of the top-k documents scored for a given query. Our main finding is that queries exhibit different behaviors as scores are accumulated during the traversal of the ensemble and that query-level early stopping can remarkably improve ranking quality. We present a reproducible and comprehensive experimental evaluation, conducted on two public datasets, showing that query-level early exiting achieves an overall gain of up to 7.5% in terms of NDCG@10 with a speedup of the scoring process of up to 2.2x.
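A simplified sketch of query-level early exiting over an additive tree ensemble; the margin-based exit test below is a hypothetical stand-in for the paper's profitability criterion.

.. code-block:: python

    def score_with_early_exit(docs, trees, exit_point, margin, k=3):
        """Accumulate tree scores; after `exit_point` trees, stop early if
        the top-k docs are separated from the rest by at least `margin`."""
        scores = [0.0] * len(docs)
        used = 0
        for t, tree in enumerate(trees, start=1):
            for i, doc in enumerate(docs):
                scores[i] += tree(doc)
            used = t
            if t == exit_point:
                ranked = sorted(scores, reverse=True)
                if len(ranked) > k and ranked[k - 1] - ranked[k] >= margin:
                    break              # partial scores already decide the top-k
        order = sorted(range(len(docs)), key=lambda i: -scores[i])
        return order[:k], used

    # Toy ensemble: each "tree" is just a function doc -> partial score.
    trees = [lambda d, w=w: w * d["feature"] for w in (0.5, 0.4, 0.3, 0.2, 0.1)]
    docs = [{"feature": f} for f in (9.0, 1.0, 8.5, 0.5, 7.0, 0.2)]
    topk, used = score_with_early_exit(docs, trees, exit_point=2, margin=0.5)
    print(f"used {used} of {len(trees)} trees; top-3 docs: {topk}")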
Abstract: To fulfill their information needs, users submit sets of related queries to available search engines. Query logs record users' activities along with timestamps and additional search-related information. The analysis of these chronological query logs enables the modeling of search tasks from user interactions. Previous research relies on clicked URLs and surrounding queries to determine whether adjacent queries are part of the same search task, in order to segment the query logs properly. However, waiting for clicked URLs or future adjacent queries can render these methods infeasible in user-supporting applications that require model results on the fly. Therefore, we propose a model for sequential search log segmentation. The proposed model uses only query pairs and their time span, generating results suited for on-the-fly user-supporting applications, with improved accuracy over existing search segmentation approaches. We also show the advantages of fine-tuning the proposed model to adjust the architecture to a small annotated collection.
Abstract: Users convert their information needs into search queries, which are then run on available search engines. Query logs registered by search engines enable the automatic identification of the search tasks that users perform to fulfill their information needs. Search engine logs contain queries in multiple languages, but most existing methods for search task identification are not multilingual. Some methods rely on search-context training of custom embeddings or on external indexed collections that support a single language, making it challenging to support the multiple languages of queries run in search engines. Other methods depend on supervised components and user identifiers to model search tasks. The supervised components require labeled collections, which are difficult and costly to obtain in multiple languages. Also, the need for user identifiers renders these methods infeasible in user-agnostic scenarios. Hence, we propose an unsupervised multilingual approach for search task identification. The proposed approach is user agnostic, enabling its use in both user-independent and personalized scenarios. Furthermore, the multilingual query representation enables us to address the existing trade-off when mapping new queries to the identified search tasks.
Abstract: Personalised top-N item recommendation systems aim to generate a ranked list of interesting items for users based on their interactions (e.g. clicks, purchases and ratings). Recently, various sequential-based factorised approaches have been proposed to exploit deep neural networks to effectively capture users' dynamic preferences from their sequences of interactions. These factorised approaches usually rely on a pairwise ranking objective such as Bayesian Personalised Ranking (BPR) for optimisation. However, previous works have shown that optimising factorised approaches with BPR can hinder generalisation, which can degrade the quality of item recommendations. To address this challenge, we propose a Sequential-based Adversarial Optimisation (SAO) framework that effectively enhances the generalisation of sequential-based factorised approaches. Comprehensive experiments on six public datasets demonstrate the effectiveness of the SAO framework in enhancing the performance of the state-of-the-art sequential-based factorised approach in terms of NDCG by 3-14%.
Abstract: Task-based Virtual Personal Assistants (VPAs) such as the Google Assistant, Alexa, and Siri are increasingly being adopted for a wide variety of tasks. These tasks are grounded in real-world entities and actions (e.g., book a hotel, organise a conference, or request funds). In this work, we tackle the task of automatically constructing actionable knowledge graphs in response to a user query in order to support a wider variety of increasingly complex assistant tasks. We frame this as an entity property ranking task given a user query with annotated properties. We propose a new method for property ranking, CrossBERT. CrossBERT builds on the Bidirectional Encoder Representations from Transformers (BERT) and creates a new triplet network structure on cross query-property pairs that is used to rank properties. We also study the impact of using external evidence for query entities from textual entity descriptions. We perform experiments on two standard benchmark collections, the NTCIR-13 Actionable Knowledge Graph Generation (AKGG) task and the Entity Property Identification (EPI) task. The results demonstrate that CrossBERT significantly outperforms the best performing runs from AKGG and EPI, as well as previous state-of-the-art BERT-based models. In particular, CrossBERT significantly improves Recall and NDCG by approximately 2-12% over the BERT models across the two datasets.
Abstract: Active learning strategies are often deployed in technology-assisted review tasks, such as e-discovery and sensitivity review, to learn a classifier that can assist the reviewers with their task. In particular, an active learning strategy selects the documents that are expected to be the most useful for learning an effective classifier, so that these documents can be reviewed before the less useful ones. However, when reviewing for sensitivity, the order in which the documents are reviewed can impact the reviewers' ability to perform the review. Therefore, when deploying active learning in technology-assisted sensitivity review, we want to know when a sufficiently effective classifier has been learned, such that the active learning can stop and the reviewing order of the documents can be selected by the reviewer instead of the classifier. In this work, we propose two active learning stopping strategies for technology-assisted sensitivity review. We evaluate the effectiveness of our proposed approaches in comparison with three state-of-the-art stopping strategies from the literature. We show that our best performing approach results in a significantly more effective sensitivity classifier (+6.6% F2) than the best performing stopping strategy from the literature (McNemar's test, p<0.05).
Abstract: In a conversational context, a user expresses her multi-faceted information need as a sequence of natural-language questions, i.e., utterances. Starting from a given topic, the conversation evolves through user utterances and system replies. The retrieval of documents relevant to a given utterance in a conversation is challenging due to the ambiguity of natural language and to the difficulty of detecting possible topic shifts and semantic relationships among utterances. We adopt the 2019 TREC Conversational Assistant Track (CAsT) framework to experiment with a modular architecture performing: (i) topic-aware utterance rewriting, (ii) retrieval of candidate passages for the rewritten utterances, and (iii) neural-based re-ranking of candidate passages. We present a comprehensive experimental evaluation of the architecture assessed in terms of traditional IR metrics at small cutoffs. Experimental results show the effectiveness of our techniques, which achieve an improvement of up to $0.28$ (+93%) for P@1 and $0.19$ (+89.9%) for nDCG@3 w.r.t. the CAsT baseline.
Abstract: Recent literature on ranking systems (RS) has considered users' exposure when they are the object of the ranking. Although items are the object of reputation-based RS, users also play a central role in this class of algorithms: when ranking the items, user preferences are weighted by how relevant each user is on the platform (i.e., their reputation). In this paper, we formulate the concept of disparate reputation (DR) and study whether users characterized by sensitive attributes systematically get a lower reputation, leading to a final ranking that reflects their preferences less. We consider two demographic attributes, i.e., gender and age, and show that DR systematically occurs. Then, we propose a mitigation strategy that ensures that reputation is independent of the users' sensitive attributes. Experiments on real-world data show that our approach can overcome DR and also improve ranking effectiveness.
Abstract: Concerns regarding the footprint of societal biases in information retrieval (IR) systems have been raised in several previous studies. In this work, we examine various recent IR models from the perspective of the degree of gender bias in their retrieval results. To this end, we first provide a bias measurement framework that includes two metrics to quantify the degree of the unbalanced presence of gender-related concepts in a given IR model's ranking list. To examine IR models by means of the framework, we create a dataset of non-gendered queries, selected by human annotators. Applying these queries to the MS MARCO Passage retrieval collection, we then measure the gender bias of a BM25 model and several recent neural ranking models. The results show that while all models are strongly biased toward males, the neural models, and in particular the ones based on contextualized embedding models, significantly intensify gender bias. Our experiments also show an overall increase in the gender bias of neural models when they exploit transfer learning, namely when they use (already biased) pre-trained embeddings.
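As a rough illustration of quantifying the unbalanced presence of gendered terms in a ranking (the word lists and normalization below are invented for this sketch and are not the paper's metrics):

.. code-block:: python

    MALE = {"he", "him", "his", "man", "men"}
    FEMALE = {"she", "her", "hers", "woman", "women"}

    def gender_bias_at_k(ranked_docs, k=10):
        """Average (male - female) term-frequency difference over the top-k
        documents; positive values indicate a male skew."""
        bias = 0.0
        for doc in ranked_docs[:k]:
            tokens = doc.lower().split()
            m = sum(tokens.count(w) for w in MALE)
            f = sum(tokens.count(w) for w in FEMALE)
            bias += (m - f) / max(len(tokens), 1)
        return bias / k

    ranking = ["he said the man left", "she and her team won", "the men agreed"]
    print(round(gender_bias_at_k(ranking, k=3), 4))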
Abstract: It is often useful for an IR practitioner to analyze the similarity function of an IR model, or for a non-technical search engine user to understand why a document was shown at a certain rank, in terms of the three fundamental aspects of a similarity function, namely a) the frequency of a term in a document, b) the frequency of a term in a collection, and c) the length of a document. We propose a general methodology for approximating an IR model as the coefficients of a linear function of these three fundamental aspects (and an additional aspect of semantic similarity between terms for neural models), which can potentially help IR practitioners to optimize the relative importance of each aspect for specific document collections and types of queries. Our analysis shows that the coefficients, which represent the relative importance of the three fundamental aspects, are useful for comparing a model's different parametric instantiations or comparing across different models.
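A minimal sketch of the approximation idea, assuming access to (tf, idf, document length) features and the black-box model's scores; a least-squares fit then recovers the coefficients.

.. code-block:: python

    import numpy as np

    def explain_scorer(features, scores):
        """Fit a black-box relevance score as a linear combination of
        (tf, idf, doc_length) plus an intercept, via least squares."""
        A = np.hstack([features, np.ones((len(features), 1))])
        coef, *_ = np.linalg.lstsq(A, scores, rcond=None)
        return dict(zip(("tf", "idf", "dlen", "bias"), coef))

    rng = np.random.default_rng(0)
    feats = rng.uniform(size=(200, 3)) * [10, 8, 500]   # tf, idf, doc length
    # Stand-in "black box": a BM25-ish scorer we pretend we cannot inspect.
    scores = 1.2 * feats[:, 0] + 0.9 * feats[:, 1] - 0.004 * feats[:, 2]
    print(explain_scorer(feats, scores))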
Abstract: Recent advances in machine learning have led to emerging new approaches to deal with different kinds of biases that exist in the data. On the one hand, counterfactual learning copes with biases in the policy used for sampling (or logging) the data in order to evaluate and learn new policies. On the other hand, fairness-aware learning aims at learning fair models to avoid discrimination against certain individuals or groups. In this paper, we design a counterfactual framework to model fairness-aware learning, which benefits from counterfactual reasoning to achieve fairer decision support systems. We utilize a definition of fairness to determine the bandit feedback in the counterfactual setting that learns a classification strategy from the offline data, and balances classification performance against a fairness measure. In the experiments, we demonstrate that a counterfactual setting can be effectively exploited to learn fair models with competitive results compared to a well-known baseline system.
Abstract: Rankings are at the core of countless modern applications and thus play a major role in various decision making scenarios. When such rankings are produced by data-informed, machine learning-based algorithms, the potentially harmful biases contained in the data and algorithms are likely to be reproduced and even exacerbated. This has motivated recent research to investigate methodologies for fair ranking as a way to correct the aforementioned biases. Current approaches to fair ranking assume that the protected groups, i.e., the partition of the population potentially impacted by the biases, are known. However, in a realistic scenario, this assumption might not hold, as different biases may lead to different partitionings into protected groups. Accounting for only one such partition (i.e., grouping) would still lead to potential unfairness with respect to the other possible groupings. Therefore, in this paper, we study the problem of designing fair ranking algorithms without knowing in advance the groupings that will be used later to assess their fairness. The approach that we follow is to rely on a carefully chosen set of groupings when deriving the ranked lists, and we empirically investigate which selection strategies are the most effective. An efficient two-step greedy brute-force method is also proposed to implement our strategy. As a benchmark for this study, we adopt the dataset and setting of the TREC 2019 Fair Ranking track.
Abstract: Response retrieval is a subset of neural ranking in which a model selects a suitable response from a set of candidates given a conversation history. Retrieval-based chatbots are typically employed in information-seeking conversational systems such as customer support agents. To make pairwise comparisons between a conversation history and a candidate response, two approaches are common: cross-encoders performing full self-attention over the pair, and bi-encoders encoding the pair separately. The former gives better prediction quality but is too slow for practical use. In this paper, we propose a new cross-encoder architecture and transfer knowledge from this model to a bi-encoder model using distillation. This effectively boosts bi-encoder performance at no cost during inference time. We perform a detailed analysis of this approach on three response retrieval datasets.
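One plausible form of such distillation is to match the bi-encoder's score distribution over candidate responses to the cross-encoder's; the temperature-scaled KL below is a common choice, sketched as an assumption rather than the paper's exact loss.

.. code-block:: python

    import numpy as np

    def softmax(x):
        e = np.exp(x - np.max(x))
        return e / e.sum()

    def distill_loss(teacher_scores, student_scores, temperature=2.0):
        """KL(teacher || student) over the candidate distribution: the
        bi-encoder (student) learns to match the cross-encoder's (teacher's)
        soft ranking of the same candidates."""
        p = softmax(np.asarray(teacher_scores) / temperature)
        q = softmax(np.asarray(student_scores) / temperature)
        return float(np.sum(p * np.log(p / q)))

    teacher = [4.1, 1.3, 0.2, -0.5]   # cross-encoder scores for 4 candidates
    student = [2.0, 1.8, 0.1, -0.2]   # bi-encoder dot products, same candidates
    print(round(distill_loss(teacher, student), 4))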
Abstract: The ability to engage in mixed-initiative interaction is one of the core requirements for a conversational search system, yet how to achieve this is poorly understood. We propose a set of unsupervised metrics, termed ConversationShape, that highlights the role each conversation participant plays by comparing the distributions of vocabulary and utterance types. Using ConversationShape as a lens, we take a closer look at several conversational search datasets and compare them with other dialogue datasets to better understand the types of dialogue interaction they represent, whether driven by the information seeker or the assistant. We discover that deviations from the ConversationShape of a human-human dialogue of the same type are predictive of the quality of a human-machine dialogue.
Abstract: Unbiased counterfactual learning to rank (CLTR) requires click propensities to compensate for the difference between user clicks and the true relevance of search results via inverse propensity scoring (IPS). Current propensity estimation methods assume that user click behavior follows the position-based click model (PBM) and estimate click propensities based on this assumption. However, in reality, user clicks often follow the cascade model (CM), where users scan search results from top to bottom and each next click depends on the previous one. In this cascade scenario, PBM-based estimates of propensities are not accurate, which, in turn, hurts CLTR performance. In this paper, we propose a propensity estimation method for the cascade scenario, called cascade model-based inverse propensity scoring (CM-IPS). We show that CM-IPS keeps CLTR performance close to the full-information performance when user clicks follow the CM, while PBM-based CLTR has a significant gap towards the full-information performance. The opposite is true if user clicks follow the PBM instead of the CM. Finally, we suggest a way to select between CM- and PBM-based propensity estimation methods based on historical user clicks.
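Under the cascade model, a result is examined only if no earlier result was clicked, which directly yields the examination propensities that IPS divides by; a minimal sketch with hypothetical click probabilities:

.. code-block:: python

    def cascade_propensities(click_probs):
        """P(examined at rank k) = product over j < k of (1 - P(click at j)),
        since a cascade user stops at the first click."""
        props, exam = [], 1.0
        for p in click_probs:
            props.append(exam)
            exam *= 1.0 - p
        return props

    clicks = [0.45, 0.25, 0.15, 0.10, 0.05]   # per-rank click probabilities
    for rank, prop in enumerate(cascade_propensities(clicks), start=1):
        print(f"rank {rank}: examination propensity = {prop:.3f}")

An IPS-weighted learning objective would then up-weight each observed click by the inverse of its rank's examination propensity.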
Abstract: The ranking incentives of many authors of Web pages play an important role in Web dynamics. That is, authors who opt to have their pages highly ranked for queries of interest often respond to rankings for these queries by manipulating their pages; the goal is to improve the pages' future rankings. Various theoretical aspects of these dynamics have recently been studied using game theory. However, empirical analysis of the dynamics is highly constrained due to a lack of publicly available datasets. We present an initial such dataset that is based on TREC's ClueWeb09 dataset. Specifically, we used the WayBack Machine of the Internet Archive to build a document collection that contains past snapshots of ClueWeb documents which are highly ranked by some initial search performed for ClueWeb queries. Temporal analysis of document changes in this dataset reveals that findings recently presented for small-scale controlled ranking competitions between documents' authors also hold for Web data. Specifically, documents' authors tend to mimic the content of documents that were highly ranked in the past, and this practice can result in improved ranking.
Abstract: First Story Detection describes the task of identifying new events in a stream of documents. The UMass-FSD system is known for its strong performance in First Story Detection competitions. Recently, it has frequently been used as a high-accuracy baseline in research publications. We are the first to discover that UMass-FSD inadvertently leverages temporal bias. Interestingly, the discovered bias contrasts with previously known biases and performs significantly better. Our analysis reveals an increased contribution of temporally distant documents, resulting from an unusual way of handling incremental term statistics. We show that this form of temporal bias is also applicable to other well-known First Story Detection systems, where it improves the detection accuracy. To provide a more generalizable conclusion and demonstrate that the observed bias is not only an artefact of a particular implementation, we present a model that intentionally leverages a bias on temporal distance. Our model significantly improves the detection effectiveness of state-of-the-art First Story Detection systems.
Abstract: As deep learning based models are increasingly being used for information retrieval, a major challenge is to ensure the availability of test collections for measuring their quality. Test collections are usually generated by pooling the results of various retrieval systems, but until recently this did not include deep learning systems. This raises a major challenge for reusable evaluation: since deep learning based models use external resources (e.g. word embeddings) and advanced representations compared to traditional methods, they may return different types of relevant documents that were not identified in the original pooling. If so, test collections constructed using traditional methods could lead to biased and unfair evaluation results for deep learning systems. This paper uses simulated pooling to test the fairness and reusability of test collections, showing that especially when shallow pools (e.g. depth-10 pools) are used, pooling based only on traditional systems may lead to biased evaluation of deep learning systems.
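A toy sketch of simulated pooling: build a depth-k pool from one family of systems and measure how much of a held-out system's top-k has been judged at all (all run data below are hypothetical).

.. code-block:: python

    def pool(runs, depth):
        """Union of the top-`depth` documents of every contributing run."""
        return {doc for run in runs for doc in run[:depth]}

    def coverage(held_out_run, judged, k):
        """Fraction of the held-out system's top-k present in the pool;
        unjudged documents are the source of evaluation bias."""
        top = held_out_run[:k]
        return sum(d in judged for d in top) / len(top)

    trad = [["d1", "d2", "d3", "d4"],       # runs from "traditional" systems
            ["d2", "d1", "d5", "d3"],
            ["d3", "d6", "d1", "d2"]]
    neural = ["d7", "d1", "d8", "d2"]       # a system surfacing new documents

    judged = pool(trad, depth=2)
    print("pool:", sorted(judged))
    print("neural top-4 coverage:", coverage(neural, judged, k=4))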
Abstract: User intent is not restricted in human-to-machine conversations, and sometimes overshoots the scope of a designed system. Many tasks for understanding conversations require the elimination of such out-of-scope queries. We propose an out-of-scope intent detection method, called KLOOS, based on a novel feature extraction mechanism that captures the information accumulation of sequential word processing. Information is accumulated via the KL divergence between the intent distributions of consecutive words. The performance of our approach is compared with conventional classifiers and state-of-the-art language models fine-tuned for out-of-scope detection on three spoken query collections. The results show that KLOOS statistically significantly improves out-of-scope sensitivity in all cases, while the overall performance does not deteriorate in most cases.
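The core feature can be sketched as accumulating KL divergence between the intent distributions produced after consecutive words; the distributions below are invented for illustration.

.. code-block:: python

    import numpy as np

    def kl(p, q, eps=1e-9):
        p = np.asarray(p) + eps
        q = np.asarray(q) + eps
        p, q = p / p.sum(), q / q.sum()
        return float(np.sum(p * np.log(p / q)))

    def accumulated_information(intent_dists):
        """Sum of KL divergences between consecutive intent distributions:
        in-scope queries tend to converge, out-of-scope ones keep shifting."""
        return sum(kl(p, q) for p, q in zip(intent_dists, intent_dists[1:]))

    # Intent distributions over 3 in-scope classes after each word (toy data).
    in_scope = [[0.5, 0.3, 0.2], [0.7, 0.2, 0.1], [0.8, 0.15, 0.05]]
    out_of_scope = [[0.5, 0.3, 0.2], [0.1, 0.8, 0.1], [0.6, 0.1, 0.3]]
    print("in-scope:     ", round(accumulated_information(in_scope), 3))
    print("out-of-scope: ", round(accumulated_information(out_of_scope), 3))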
Abstract: The success of crowdsourcing-based annotation of text corpora depends on ensuring that crowdworkers are sufficiently well-trained to perform the annotation task accurately. To that end, a frequent approach to training annotators is to provide instructions and a few example cases that demonstrate how the task should be performed (referred to as the CONTROL approach). These globally defined "task-level examples", however, (i) often only cover the common cases that are encountered during an annotation task; and (ii) require effort from crowdworkers during the annotation process to find the most relevant example for the currently annotated sample. To overcome these limitations, we propose to support workers, in addition to task-level examples, with "task-instance level" examples that are semantically similar to the currently annotated data sample (referred to as Dynamic Examples for Annotation, DEXA). Such dynamic examples can be retrieved from collections previously labeled by experts, which are usually available as gold standard datasets. We evaluate DEXA on the complex task of annotating participants, interventions, and outcomes (known as PIO) in sentences of medical studies. The dynamic examples are retrieved using BioSent2Vec, an unsupervised semantic sentence similarity method specific to the biomedical domain. Results show that (i) workers in the DEXA approach reach on average much higher agreement (Cohen's Kappa) with experts than workers in the CONTROL approach (avg. Kappa of 0.68 with experts in DEXA vs. 0.40 in CONTROL); (ii) aggregating just three annotations by majority voting in the DEXA approach already reaches substantial agreement with experts of 0.78/0.75/0.69 for P/I/O (vs. 0.73/0.58/0.46 in CONTROL). Finally, (iii) we acquire explicit feedback from workers and show that in the majority of cases (avg. 72%) workers find the dynamic examples useful.
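Retrieving a dynamic example reduces to nearest-neighbour search over expert-labeled sentence embeddings; a sketch with random vectors standing in for BioSent2Vec embeddings.

.. code-block:: python

    import numpy as np

    def most_similar_example(sample_vec, gold_vecs, gold_labels):
        """Return the expert-labeled example whose embedding is closest
        (by cosine similarity) to the sentence being annotated."""
        g = gold_vecs / np.linalg.norm(gold_vecs, axis=1, keepdims=True)
        s = sample_vec / np.linalg.norm(sample_vec)
        return gold_labels[int(np.argmax(g @ s))]

    rng = np.random.default_rng(0)
    gold_vecs = rng.normal(size=(100, 64))    # stand-ins for gold embeddings
    gold_labels = [f"expert-annotated sentence #{i}" for i in range(100)]
    sample = rng.normal(size=64)              # the sentence being annotated
    print(most_similar_example(sample, gold_vecs, gold_labels))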
Abstract: Tools, computing environments, and datasets form the three critical ingredients for teaching and learning the practical aspects of experimental IR research. Assembling these ingredients can often be challenging, particularly in the context of short courses that cannot afford large startup costs. As an initial attempt to address these issues, we describe materials that we have developed for the "Introduction to IR" session at the ACM SIGIR/SIGKDD Africa Summer School on Machine Learning for Data Mining and Search (AFIRM 2020), which builds on three components: the open-source Lucene search library, cloud-based notebooks, and the MS MARCO dataset. We offer a self-reflective evaluation of our efforts and hope that our lessons shared can benefit future efforts.
Abstract: Misinformation such as fake news has drawn a lot of attention in recent years. It has serious consequences for society, politics and the economy. This has led to a rise of manually fact-checking websites such as Snopes and Politifact. However, the scale of misinformation limits their ability to verify claims. In this demonstration, we propose BRENDA, a browser extension which can be used to automate the entire process of credibility assessment of false claims. Behind the scenes, BRENDA uses a tested deep neural network architecture to automatically identify fact-check-worthy claims, classifies them, and presents the result along with evidence to the user. Since BRENDA is a browser extension, it facilitates fast automated fact checking for the end user without having to leave the Web page.
Abstract: Conversational Information Seeking (CIS) is an emerging area of Information Retrieval focused on interactive search systems. As a result, there is a need for new benchmark datasets and tools to enable their creation. In this demo we present the Agent Dialogue (AD) platform, an open-source system developed for researchers to perform Wizard-of-Oz CIS experiments. AD is a scalable cloud-native platform developed with Docker and Kubernetes, with a flexible and modular micro-service architecture built on production-grade state-of-the-art open-source tools (Kubernetes, gRPC streaming, React, and Firebase). It supports varied front-ends and has the ability to interface with multiple existing agent systems, including Google Assistant and open-source search libraries. It includes support for centralized structured logging as well as offline relevance annotation.
Abstract: Small pieces of data that are shared online, over time and across multiple social networks, have the potential to reveal more cumulatively than a person intends. This could result in harm, loss or detriment to them, depending on what information is revealed, who can access it, and how it is processed. But how aware are social network users of how much information they are actually disclosing? And if they could examine all their data, what cumulative revelations might be found that could potentially increase their risk of various online threats (social engineering, fraud, identity theft, loss of face, etc.)? In this paper, we present DataMirror, an initial prototype tool that enables social network users to aggregate their online data so that they can search, browse and visualise what they have put online. The aim of the tool is to investigate and explore people's awareness of the data self that they project online; not only in terms of the volume of information that they might share, but what it may mean when combined together, what pieces of sensitive information may be gleaned from their data, and what machine learning may infer about them given their data.
Abstract: Question answering (QA) over text passages is a problem of longstanding interest in information retrieval. Recently, the conversational setting has attracted attention, where a user asks a sequence of questions to satisfy her information needs around a topic. While this setup is a natural one and similar to humans conversing with each other, it introduces a key research challenge: understanding the context left implicit by the user in follow-up questions. In this work, we demonstrate CROWN (Conversational passage ranking by Reasoning Over Word Networks): an unsupervised yet effective system for conversational QA with passage responses, that supports several modes of context propagation over multiple turns. To this end, CROWN first builds a word proximity network (WPN) from large corpora to store statistically significant term co-occurrences. At answering time, passages are ranked by a combination of their similarity to the question, and coherence of query terms within: these factors are measured by reading off node and edge weights from the WPN. CROWN provides an interface that is both intuitive for end-users, and insightful for experts for reconfiguration to individual setups. CROWN was evaluated on TREC CAsT data, where it achieved above-median performance in a pool of neural methods.
Abstract: In this paper, we present FigExplorer, a novel general system that supports the retrieval and exploration of research article figures. Specifically, FigExplorer supports 1) figure retrieval using keyword queries, 2) exploration of figures related to a given figure, 3) exploration of a figure topic using the citation network, and 4) search result re-ranking using an example figure. The different functions were implemented using either classical IR models or neural network-based figure embeddings. Finally, the system was designed to facilitate the collection of user data for training and testing purposes, and it is flexible enough to be extended with new functions and algorithms. As an open-source system, FigExplorer can help advance the research, evaluation, and development of applications in this area.
Abstract: Systematic reviews constitute the cornerstone of Evidence-based Medicine. They can provide guidance for medical policy-making by synthesizing all available studies regarding a certain topic. However, conducting systematic reviews has become a laborious and time-consuming task due to the large amount and rapid growth of published literature. Technology-assisted review (TAR) approaches aim to accelerate the screening stage of systematic reviews by combining machine learning algorithms and human relevance feedback. In this work, we built an online active search system for systematic reviews, named APS, by applying a state-of-the-art TAR approach -- Continuous Active Learning. The system is built on top of the PubMed collection, a widely used database of biomedical literature, and allows users to conduct abstract screening for systematic reviews. We demonstrate the effectiveness and robustness of APS in detecting relevant literature and reducing workload for systematic reviews using the CLEF TAR 2017 benchmark.
Abstract: Systematic reviews are used widely in the biomedical and healthcare domains. They aim to provide a complete and exhaustive overview of the medical literature for a specific research question. Core to the construction of a systematic review is the search strategy, whose main component is a complex Boolean query, typically developed by information specialists (e.g., librarians). The aim of the search strategy is to retrieve relevant studies that will contribute to the outcomes of the systematic review. One barrier information specialists face when developing a search strategy is the enormous amount of medical literature that exists in databases. This vast amount of literature means that search strategies often suffer from biases (e.g., lack of expertise, overconfidence, limited knowledge of the domain) and are incomplete, or retrieve far too many studies (possibly as a result of the biases, but also due to the tools used to develop search strategies). Retrieving too many studies impacts the time and financial costs of the review, while retrieving too few studies may impact the outcomes of the review. Therefore, it is vital to support expert searchers in developing effective search strategies. In this paper, we present a novel end-to-end set of advanced tools for information specialists. These tools are tightly integrated into an existing open-source search strategy refinement package (searchrefiner). They aim to address the problems associated with search strategy development by providing a complete framework from query development, to refinement, to documentation. The implementation of these tools also offers a glimpse of the ease with which related tools may be implemented within the searchrefiner ecosystem. More information about the tools, including installation, documentation, and screenshots, is available on the searchrefiner website: https://ielab.io/searchrefiner.
Abstract: As the world's largest professional network, LinkedIn wants to create economic opportunity for everyone in the global workforce. One of its most critical missions is matching jobs with professionals. Improving job targeting accuracy and hiring efficiency aligns with LinkedIn's Member First motto. To achieve those goals, we need to understand unstructured job postings with noisy information. We applied deep transfer learning to create domain-specific job understanding models. With these, jobs are represented by professional entities, including titles, skills, companies, and assessment questions. To continuously improve LinkedIn's job understanding ability, we designed an expert feedback loop in which we integrated job understanding models into LinkedIn's products to collect job posters' feedback. In this demonstration, we present LinkedIn's job posting flow and demonstrate how the integrated deep job understanding work improves job posters' satisfaction and provides significant metric lifts in LinkedIn's job recommendation system.
Abstract: There exists a natural tension between encouraging a diverse ecosystem of open-source search engines and supporting fair, replicable comparisons across those systems. To balance these two goals, we examine two approaches to providing interoperability between the inverted indexes of several systems. The first takes advantage of internal abstractions around index structures and builds wrappers that allow one system to directly read the indexes of another. The second involves sharing indexes across systems via a data exchange specification that we have developed, called the Common Index File Format (CIFF). We demonstrate the first approach with the Java systems Anserini and Terrier, and the second approach with Anserini, JASSv2, OldDog, PISA, and Terrier. Together, these systems provide a wide range of implementations and features, with different research goals. Overall, we recommend CIFF as a low-effort approach to supporting independent innovation while enabling the types of fair evaluations that are critical for driving the field forward.
Abstract: In this work, we demonstrate a novel system, namely Web of Scholars, which integrates state-of-the-art mining techniques to search, mine, and visualize the complex networks behind scholars in the field of Computer Science. Relying on a knowledge graph, it provides services for fast, accurate, and intelligent semantic querying as well as powerful recommendations. In addition, in order to realize information sharing, it provides an open API to serve as the underlying architecture for advanced functions. Because Web of Scholars is built on a knowledge graph, it will be able to access more knowledge as more searches are performed. It can serve as a useful and interoperable tool for scholars to conduct in-depth analysis within the Science of Science.
Abstract: In this paper we present SPot, an automated tool for detecting operating segments and their related performance indicators from earnings reports. Due to their company-specific nature, operating segments cannot be detected using taxonomy-based approaches. Instead, we train a bidirectional RNN classifier that can distinguish between common metrics such as "revenue" and company-specific metrics that are likely to be operating segments, such as "iPhone" or "cloud services". SPot surfaces the results in an interactive web interface that allows users to trace and adjust performance metrics for each operating segment. This facilitates credit monitoring, enables users to perform competitive benchmarking more effectively, and can be used for trend analysis at company and sector levels.
Abstract: Many government and public organisations have a requirement to release their official documents to the public and therefore need to review such documents to identify and protect any sensitive information that they contain. When reviewing a document for sensitivity, reviewers often use information from other documents within the collection to assist in their decisions. However, it can be difficult for reviewers to find related documents in large digital collections while performing sensitivity review. Receptor is a new solution that aims to provide sensitivity reviewers with the ability to explore a collection of documents to discover latent relations, for example between entities and events, that can be a reliable indicator of sensitive information. The system provides novel scalable graph search and exploration functionalities as well as interactive visualisations of the latent relations between related entities, events, and documents to enable users to identify hidden patterns of sensitivity.
Abstract: The open nature of the Web enables users to produce and propagate any content without authentication, which has been exploited to spread thousands of unverified claims via millions of online documents. Maintenance of credible knowledge bases thus has to rely on fact checking, which constructs a trusted set of facts through credibility assessment. Due to an inherent lack of ground truth information and the ambiguity of language, fact checking cannot be done in a purely automated manner without compromising accuracy. However, state-of-the-art fact checking services rely mostly on human validation, which is costly, slow, and non-transparent. This paper presents FactCatch, a human-in-the-loop system that guides users in fact checking while aiming to minimise the invested effort. It supports incremental quality estimation, mistake mitigation, and pay-as-you-go instantiation of a high-quality fact database.
Abstract: The demand for a tool that summarizes emerging topics is increasing in modern life, since such a tool can deliver well-organized information to its users. Even though there are already a number of successful search systems, systems that automatically summarize and organize the content of emerging topics are still in their infancy. To fulfill this demand, we introduce an automated report generation system that generates a well-summarized, human-readable report for emerging topics. In this system, emerging topics are automatically discovered by a topic model and news articles are indexed by the discovered topics. Then, a topical summary and a timeline summary for each topic are generated by a topical multi-document summarizer and a timeline summarizer, respectively. To make the reports easier for users to digest, the proposed system provides two report modes: Today's Briefing, which summarizes five discovered topics of each day, and Full Report, which shows a long-term view of each topic with a detailed topical summary and an important event timeline.
Abstract: Ensuring reproducibility is key to all scientific domains. As Information Retrieval (IR) experiments are often composed of several steps that can be shared between tested models, and rely on various resources, it is difficult to keep track of all the experimental settings and to ensure experiments can be reproduced easily. In this demo paper, we present two managers, Experimaestro and Datamaestro, and their add-ons for IR, designed to help define and run experimental plans.
Abstract: In this paper, we present the design of QuAChIE, a Question Answering based Chinese Information Extraction system. QuAChIE mainly depends on a well-trained question answering model to extract high-quality triples. Each head entity and relation pair is regarded as a question, with the input text as the context. For the training and evaluation of each model in the system, we build a large-scale information extraction dataset using Wikidata and Wikipedia pages through distant supervision. The advanced models implemented on top of the pre-trained language model and the enormous distant supervision data enable QuAChIE to extract relation triples from documents with cross-sentence correlations. The experimental results on the test set and the case study based on the interactive demonstration show its satisfactory information extraction quality on Chinese document-level texts.
Abstract: We introduce Vis-Trec, an open-source cross-platform system which provides the capability to perform in-depth analysis of the results obtained from TREC-style evaluation campaigns. Vis-Trec allows researchers to dig deeper into their evaluations by providing various visualizations of the results based on performance percentiles, query difficulty, and comparative analysis of different methods using help-hurt diagrams at the query level. It also automatically organizes the obtained results in tabular LaTeX format that can be used for reporting evaluation findings. An added benefit of Vis-Trec is that it is developed in Python and is extensible by other developers. The source code, along with a functional version of the program, is released to the public.
Abstract: We present JASSjr, a minimalistic trec_eval compatible BM25-ranking search engine that can index small TREC data sets such as the Wall Street Journal collection. We do this for several reasons. First, to demonstrate how a term-at-a-time (TAAT) search engine works. Second, to demonstrate that a straightforward and competitive search engine, together with its indexer, can be written in under 600 lines of documented code. Third, as a way of providing a simple code-base for teaching Information Retrieval. We present two index-compatible versions (one in C/C++, the other in Java) that compile and run on MacOS, Linux, and Windows. Our code is released under the 2-clause BSD licence, and we provide several suggestions for extensions which might be used as exercises in an Information Retrieval course.
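In the same spirit, a term-at-a-time BM25 scorer fits in a few lines of Python (JASSjr itself is C/C++ and Java; the in-memory index layout and BM25 variant below are illustrative, not JASSjr's code).

.. code-block:: python

    import math
    from collections import defaultdict

    def bm25_taat(query, index, doc_len, k1=0.9, b=0.4):
        """Term-at-a-time scoring: fully process one query term's postings
        list (accumulating partial scores) before moving to the next term."""
        N = len(doc_len)
        avgdl = sum(doc_len.values()) / N
        acc = defaultdict(float)                 # doc id -> accumulated score
        for term in query:
            postings = index.get(term, {})
            idf = math.log(1 + (N - len(postings) + 0.5) / (len(postings) + 0.5))
            for doc, tf in postings.items():     # one postings list at a time
                norm = tf / (tf + k1 * (1 - b + b * doc_len[doc] / avgdl))
                acc[doc] += idf * norm
        return sorted(acc.items(), key=lambda kv: -kv[1])

    index = {"wall": {1: 3, 2: 1}, "street": {1: 2, 3: 4}, "journal": {1: 1}}
    doc_len = {1: 120, 2: 45, 3: 300}
    print(bm25_taat(["wall", "street"], index, doc_len))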
Abstract: With the rapid growth of B2B (Business-to-Business) commerce, how to efficiently respond to various customer questions is becoming an important issue. In this scenario, customer questions often involve many aspects of the products, so multiple customer service agents usually respond to different aspects respectively. To improve efficiency, we propose a human-machine cooperation solution called ServiceGroup, where relevant agents and customers are invited into the same group, and the system provides a series of intelligent functions, including question notification, question recommendation and knowledge extraction. With the assistance of our ServiceGroup, the response rate within 15 minutes is doubled. To date, ServiceGroup has supported thousands of enterprises by means of millions of groups in instant messaging software.
Abstract: Conversational information seeking (CIS) has been recognized as a major emerging research area in information retrieval. Such research will require data and tools to allow the implementation and study of conversational systems. This paper introduces Macaw, an open-source framework with a modular architecture for CIS research. Macaw supports multi-turn, multi-modal, and mixed-initiative interactions, and enables research on tasks such as document retrieval, question answering, recommendation, and structured data exploration. Its modular design encourages the study of new CIS algorithms, which can be evaluated in batch mode. It can also integrate with a user interface, which allows user studies and data collection in an interactive mode, where the back end can be fully algorithmic or a Wizard-of-Oz setup. Macaw is distributed under the MIT License.
Abstract: Entity linking is a standard component in modern retrieval systems that is often performed by third-party toolkits. Despite the plethora of open-source options, it is difficult to find a single system that has a modular architecture where certain components may be replaced, does not depend on external sources, can easily be updated to newer Wikipedia versions, and, most importantly, has state-of-the-art performance. The REL system presented in this paper aims to fill that gap. Building on state-of-the-art neural components from natural language processing research, it is provided as a Python package as well as a web API. We also report on an experimental comparison against both well-established systems and the current state-of-the-art on standard entity linking benchmarks.
Abstract: Purchase-related micro-behaviors, e.g., favorite, add to cart, read reviews, etc., provide implicit feedback on users' decision-making process. Such informative feedback can lead to fine-grained post-click conversion rate (CVR) modeling of the buying process. However, most existing works on CVR estimation either neglect this informative feedback, or model it as a sequential pattern with Recurrent Neural Networks. We argue such modeling could be inappropriate, since different orders of micro-behaviors may represent similar buying intentions, and micro-behaviors often correlate with each other. To this end, we propose to represent user micro-behaviors as a Purchase-related Micro-behavior Graph (PMG). Specifically, each node stands for one micro-behavior, and edge weights denote the connection strength. Based on this graph representation, we frame CVR estimation as a graph classification problem over the PMG instances. We propose a novel CVR model, namely the Graph-based Micro-behavior Conversion Model (GMCM), that utilizes Graph Convolutional Networks (GCN) to enhance conventional CVR modeling. In addition, we adopt multi-task learning and inverse propensity weighting to tackle two well-recognized issues in CVR estimation: data sparsity and sample selection bias. Extensive experiments on six large-scale production datasets demonstrate that the proposed methods outperform state-of-the-art CVR methods in an industrial setting.
Abstract: The Internet is changing the world, and adapting to the trend of internet sales will bring revenue to traditional insurance companies. Online insurance is still in its early stages of development, where the cold start problem (prospective customers) is one of the greatest challenges. In the traditional e-commerce field, several cross-domain recommendation (CDR) methods have been studied to infer the preferences of cold start users based on their preferences in other domains. However, these CDR methods cannot be applied to the insurance domain directly due to its specific properties. In this paper, we propose a novel framework called the Heterogeneous information network based Cross Domain Insurance Recommendation (HCDIR) system for cold start users. Specifically, we first try to learn more effective user and item latent features in both source and target domains. In the source domain, we employ a gated recurrent unit (GRU) to model users' dynamic interests. In the target domain, given the complexity of insurance products and the data sparsity problem, we construct an insurance heterogeneous information network (IHIN) based on data from PingAn Jinguanjia; the IHIN connects users, agents, insurance products and insurance product properties, giving us richer information. We then employ three-level (relational, node, and semantic) attention aggregations to obtain user and insurance product representations. After obtaining the latent features of overlapping users, a feature mapping between the two domains is learned by a multi-layer perceptron (MLP). We apply HCDIR to the Jinguanjia dataset, and show that HCDIR significantly outperforms state-of-the-art solutions.
Abstract: Event representation learning aims to embed news events into continuous vector spaces that capture syntactic and semantic information from text corpora, which benefits event-driven quantitative investments. However, the financial market's reaction to events is also influenced by the lead-lag effect, which is driven by internal relationships. Therefore, in this paper, we present a knowledge graph-based event embedding framework for quantitative investments. In particular, we first extract structured events from raw texts and simultaneously construct the knowledge graph from the mentioned entities and relations. Then, we leverage a joint model to merge the knowledge graph information into the objective function of an event embedding learning model. The learned representations are fed as inputs to downstream quantitative trading methods. Extensive experiments on a real-world dataset demonstrate the effectiveness of the event embeddings learned from financial news and knowledge graphs. We also deploy the framework for quantitative algorithmic trading. The accumulated portfolio return contributed by our method significantly outperforms other baselines.
Abstract: Recommender systems (RS), which predict user preference for a given item, have been widely deployed in most web-scale applications. Recently, knowledge graphs (KG) have attracted much attention in RS due to their abundant connective information. Existing methods either explore independent meta-paths for user-item pairs over the KG, or employ a graph neural network (GNN) on the whole KG to produce representations for users and items separately. Despite their effectiveness, the former type of method fails to fully capture the structural information implied in the KG, while the latter ignores the mutual effect between the target user and item during embedding propagation. In this work, we propose a new framework named Adaptive Target-Behavior Relational Graph network (ATBRG for short) to effectively capture the structural relations of target user-item pairs over the KG. Specifically, to associate the given target item with user behaviors over the KG, we propose graph connect and graph prune techniques to construct an adaptive target-behavior relational graph. To fully distill structural information from the sub-graph connected by rich relations in an end-to-end fashion, we elaborate on the model design of ATBRG, equipped with a relation-aware extractor layer and a representation activation layer. We perform extensive experiments on both industrial and benchmark datasets. Empirical results show that ATBRG consistently and significantly outperforms state-of-the-art methods. Moreover, ATBRG has achieved a performance improvement of 5.1% on the CTR metric after successful deployment in one popular recommendation scenario of the Taobao app.
Abstract: Since late December 2019, the Chinese city of Wuhan has reported an outbreak of atypical pneumonia, now known to be lung inflammation caused by the novel coronavirus (COVID-19). Cases have spread to other cities in China and to more than 180 countries and regions internationally. The World Health Organization (WHO) has officially declared the coronavirus outbreak a pandemic, and this public health emergency is perhaps one of the top concerns of 2020 for governments all over the world. To date, the coronavirus outbreak is still raging, with no sign of coming under control in many countries. In this paper, we aim to draw lessons from the COVID-19 outbreak in China and use the experience to help interventions against the coronavirus wherever needed. To this end, we have built a system that predicts hazard areas on the basis of confirmed infection cases with location information. The purpose is to warn people to avoid such hot zones and to reduce the risk of disease transmission through droplets or contact. We analyze data from the daily official information releases, which are publicly accessible. Based on standard classification frameworks with reinforcements incrementally learned day after day, we conduct thorough feature engineering from empirical studies, including geographical, demographic, temporal, statistical, and epidemiological features. Compared with heuristic baselines, our method achieves promising overall performance in terms of precision, recall, accuracy, F1 score, and AUC. We expect that our efforts can help in the battle against the virus, the common opponent of humankind.
Abstract: In this paper, we address text and image matching in cross-modal retrieval for the fashion industry. Unlike matching in the general domain, fashion matching must pay much more attention to the fine-grained information in fashion images and texts. Pioneering approaches detect regions of interest (i.e., RoIs) in images and use the RoI embeddings as image representations. In general, RoIs tend to represent "object-level" information in fashion images, while fashion texts are prone to describe more detailed information, e.g., styles and attributes. RoIs are thus not fine-grained enough for fashion text and image matching. To this end, we propose FashionBERT, which leverages patches as image features. With the pre-trained BERT model as the backbone network, FashionBERT learns high-level representations of texts and images. Meanwhile, we propose an adaptive loss to trade off multitask learning in FashionBERT modeling. Two tasks (i.e., text and image matching and cross-modal retrieval) are incorporated to evaluate FashionBERT. On a public dataset, experiments demonstrate that FashionBERT achieves significant performance improvements over baseline and state-of-the-art approaches. In practice, FashionBERT is applied in a concrete cross-modal retrieval application. We provide a detailed analysis of matching performance and inference efficiency.
Abstract: Personalized recommendation helps users access content of interest effectively. Current research on recommender systems mostly focuses on matching users with proper items based on user interests. However, significant efforts are missing to understand how recommendations influence user preferences and behaviors, e.g., if and how recommendations result in echo chambers. Extensive efforts have been made to examine this phenomenon in online media and social network systems. Meanwhile, there are growing concerns that recommender systems might lead to the self-reinforcement of users' interests due to narrowed exposure to items, which may be a potential cause of echo chambers. In this paper, we aim to analyze the echo chamber phenomenon in Alibaba Taobao, one of the largest e-commerce platforms in the world. An echo chamber is the effect of user interests being reinforced through repeated exposure to similar content. Based on this definition, we examine the presence of echo chambers in two steps. First, we explore whether user interests have been reinforced. Second, we check whether the reinforcement results from exposure to similar content. Our evaluations are strengthened with robust metrics, including cluster validity and statistical significance. Experiments are performed on extensive collections of real-world data consisting of user clicks, purchases, and browsing logs from Alibaba Taobao. The evidence suggests a tendency toward echo chambers in user click behaviors, while this is relatively mitigated in user purchase behaviors. Insights from the results guide the refinement of recommendation algorithms in real-world e-commerce systems.
Abstract: Query Auto-Completion (QAC) is a ubiquitous feature of modern textual search systems, suggesting possible ways of completing the query being typed by the user. Efficiency is crucial for real-time responsiveness when operating in a million-scale search space. Prior work has extensively advocated the use of a trie data structure for fast prefix-search operations in compact space. However, searching by prefix has little discovery power, in that only completions that are prefixed by the query are returned. This may negatively impact the effectiveness of the QAC system, with a consequent monetary loss for real applications like web search engines and eCommerce. In this work we describe the implementation that powers a new QAC system at eBay, and discuss its efficiency and effectiveness in relation to other state-of-the-art approaches. The solution is based on the combination of an inverted index with succinct data structures, a much less explored direction in the literature. This system is replacing the previous implementation, based on Apache SOLR, which was not always able to meet the required service-level agreement.
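As an illustration of why token-level inverted indexing has more discovery power than prefix-only search, consider this minimal sketch with toy data and popularity ranking; the actual eBay system additionally relies on succinct data structures and would treat the last typed token as a prefix, both omitted here::

    from collections import defaultdict

    # Toy completion log with popularity counts (illustrative data only).
    COMPLETIONS = {"iphone 11 case": 90, "case for iphone": 70, "iphone charger": 50}

    # Token-level inverted index: word -> completions containing it anywhere.
    index = defaultdict(set)
    for completion in COMPLETIONS:
        for word in completion.split():
            index[word].add(completion)

    def complete(query):
        """Return completions containing every typed word, ranked by popularity.
        Unlike prefix search on a trie, "iphone case" also surfaces
        "case for iphone", which is not prefixed by the query."""
        words = query.split()
        if not words:
            return []
        candidates = set.intersection(*(index[w] for w in words))
        return sorted(candidates, key=COMPLETIONS.get, reverse=True)

    print(complete("iphone case"))  # ['iphone 11 case', 'case for iphone']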
Abstract: Image galleries provide a rich source of diverse information about a product, which can be leveraged across many recommendation and retrieval applications. We study the problem of building a universal image gallery encoder through a multi-task learning (MTL) approach and demonstrate that it is indeed a practical way to achieve generalizability of learned representations to new downstream tasks. Additionally, we analyze the relative predictive performance of MTL-trained solutions against optimal and substantially more expensive solutions, and find signals that MTL can be a useful mechanism to address sparsity in low-resource binary tasks.
Abstract: Many consumer products are two-sided marketplaces, ranging from commerce products that connect buyers and sellers, such as Amazon, Alibaba, and Facebook Marketplace, to sharing-economy products that connect passengers to drivers or guests to hosts, like Uber and Airbnb. The search and recommender systems behind these products are typically optimized for objectives like click-through, purchase, or booking rates, which are mostly tied to the consumer side of the marketplace (namely buyers, passengers, or guests). For the long-term growth of these products, it is also crucial to consider the value to the providers (sellers, drivers, or hosts). However, optimizing ranking for such objectives is uncommon because it is challenging to measure the causal effect of ranking changes on providers. For instance, if we run a standard seller-side A/B test on Facebook Marketplace that exposes a small percentage of sellers, what we observe in the test would differ significantly from what would happen if the treatment were launched to all sellers. To overcome this challenge, we propose a counterfactual framework for seller-side A/B testing. The key idea is that items in the treatment group are ranked the same regardless of the experiment's exposure rate. Similarly, items in the control group are ranked where they would be if the status quo were applied to all sellers. Theoretically, we show that the framework satisfies the stable unit treatment value assumption, since the experience that sellers receive is affected only by their own treatment and is independent of the treatment of other sellers. Empirically, both seller-side and buyer-side online A/B tests are conducted on Facebook Marketplace to verify the framework.
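A minimal sketch of the stated key idea follows, assuming two scoring functions and a seller-bucketing predicate (all names hypothetical): each item keeps the position it would occupy if its own variant were launched to all sellers, so its rank does not depend on the exposure rate::

    def counterfactual_rank(items, score_control, score_treatment, in_treatment):
        """Sketch of counterfactual seller-side interleaving: treatment items take
        their positions from a full all-treatment ranking, control items from a
        full all-control (status quo) ranking."""
        by_treatment = sorted(items, key=score_treatment, reverse=True)
        by_control = sorted(items, key=score_control, reverse=True)
        rank_t = {item: r for r, item in enumerate(by_treatment)}
        rank_c = {item: r for r, item in enumerate(by_control)}
        # An item's counterfactual position depends only on its own seller's group.
        key = lambda item: rank_t[item] if in_treatment(item) else rank_c[item]
        return sorted(items, key=key)  # ties broken arbitrarily in this sketch

    # Hypothetical usage: two rankers scoring item ids, sellers hashed into buckets.
    ranked = counterfactual_rank(
        ["a", "b", "c", "d"],
        score_control=lambda i: {"a": 3, "b": 2, "c": 1, "d": 0}[i],
        score_treatment=lambda i: {"a": 0, "b": 3, "c": 2, "d": 1}[i],
        in_treatment=lambda i: i in {"b", "c"},
    )
    print(ranked)

Because each item's counterfactual position is a function of its own seller's assignment only, the sketch mirrors the stable-unit-treatment-value argument made above.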
Abstract: Implied semantics is a complex language act that can appear anywhere in cyberspace. The prevalence of implied spam texts, such as implied pornography, sarcasm, and abuse hidden within novels, tweets, microblogs, or reviews, can be extremely harmful to the physical and mental health of teenagers. The non-literal interpretation of implied text is hard for machine models to understand due to its high context-sensitivity and heavy use of figurative language. In this study, inspired by human reading comprehension, we propose a novel, simple, and effective deep neural framework, called the Skim and Intensive Reading Model (SIRM), for figuring out implied textual meaning. The proposed SIRM consists of three main components, namely the skim reading component, the intensive reading component, and an adversarial training component. N-gram features are quickly extracted by the skim reading component, a combination of several convolutional neural networks, as skim (entire) information. The intensive reading component enables a hierarchical investigation of both sentence-level and paragraph-level representations, which encapsulates the current (local) embedding and the contextual information (context) with a dense connection. More specifically, the contextual information includes near-neighbor information and the skim information mentioned above. Finally, besides the common training loss, we employ an adversarial loss as a penalty over the skim reading component to eliminate noisy information (noise) arising from special figurative words in the training data. To verify the effectiveness, robustness, and efficiency of the proposed architecture, we conduct extensive comparative experiments on an industrial novel dataset involving implied pornography and on three sarcasm benchmarks. Experimental results indicate that (1) the proposed model, which benefits from context and local modeling and from consideration of figurative language (noise), outperforms existing state-of-the-art solutions with comparable parameter scale and running speed; (2) SIRM yields superior robustness in terms of parameter-size sensitivity; and (3) compared with ablation and addition variants, the final framework is sufficiently efficient.
Abstract: Deep recommender systems have achieved promising performance on real-world recommendation tasks. They typically represent users and items in a low-dimensional embedding space and then feed the embeddings into subsequent deep network structures for prediction. Traditional deep recommender models often adopt uniform, fixed embedding sizes for all users and items. However, such a design is not optimal in terms of either recommendation performance or space complexity. In this paper, we propose to dynamically search the embedding sizes for different users and items, and introduce a novel embedding size adjustment policy network (ESAPN). ESAPN serves as an automated reinforcement learning agent that adaptively searches for appropriate embedding sizes for users and items. Different from existing works, our model performs hard selection over different embedding sizes, which leads to more accurate selection and decreases the storage space. We evaluate our model in the streaming setting on two real-world benchmark datasets. The results show that our proposed framework outperforms representative baselines. Moreover, our framework is shown to be robust to the cold-start problem and to reduce memory consumption by around 40%-90%. The implementation of the model is released.
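The hard-selection mechanism can be pictured with a small sketch, assuming a candidate size set, frequency-based state features, and up-projection to a shared dimension; these are all illustrative assumptions, and ESAPN's actual state, policy architecture, and RL training loop differ::

    import numpy as np

    rng = np.random.default_rng(0)
    CANDIDATE_SIZES = [8, 16, 32]   # assumed search space of embedding sizes
    FULL_DIM = 32                   # shared dimension consumed by the deep network
    N_USERS = 1000

    # One embedding table per candidate size, plus projections up to FULL_DIM.
    tables = {d: rng.normal(0.0, 0.1, (N_USERS, d)) for d in CANDIDATE_SIZES}
    proj = {d: rng.normal(0.0, 0.1, (d, FULL_DIM)) for d in CANDIDATE_SIZES}
    W_policy = rng.normal(0.0, 0.1, (2, len(CANDIDATE_SIZES)))  # untrained placeholder

    def select_size(state):
        """Hard selection: an argmax over candidate sizes rather than a soft
        mixture; a real agent would train this policy with reinforcement learning."""
        return CANDIDATE_SIZES[int(np.argmax(state @ W_policy))]

    def embed_user(user_id, state):
        d = select_size(state)               # pick one size for this user
        return tables[d][user_id] @ proj[d]  # project to the shared dimension

    state = np.array([np.log1p(42.0), 1.0])  # e.g. [log interaction count, bias]
    print(embed_user(7, state).shape)        # (32,)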
Abstract: Tabular data is the most common data format adopted by our customers, ranging from retail and finance to e-commerce, and tabular data classification plays an essential role in their businesses. In this paper, we present Network On Network (NON), a practical tabular data classification model based on deep neural networks that provides accurate predictions. Various deep methods have been proposed and promising progress has been made. However, most of them use operations such as neural networks and factorization machines to fuse the embeddings of different features directly, and linearly combine the outputs of those operations to get the final prediction. As a result, the intra-field information and the non-linear interactions between those operations (e.g., neural networks and factorization machines) are ignored. Intra-field information refers to the fact that the features inside each field belong to that same field. NON is proposed to take full advantage of intra-field information and non-linear interactions. It consists of three components: a field-wise network at the bottom to capture intra-field information, an across-field network in the middle to choose suitable operations in a data-driven manner, and an operation fusion network on top to deeply fuse the outputs of the chosen operations. Extensive experiments on six real-world datasets demonstrate that NON significantly outperforms state-of-the-art models. Furthermore, both qualitative and quantitative studies of the features in the embedding space show that NON can capture intra-field information effectively.
Abstract: Tagging has been recognized as a successful practice to boost relevance matching for information retrieval (IR), especially when items lack rich textual descriptions. A lot of research has been done on either multi-label text categorization or image annotation. However, there is a lack of published work that targets item tagging specifically for IR. Directly applying a traditional multi-label classification model to item tagging is sub-optimal, because it ignores characteristics unique to IR. In this work, we propose to formulate item tagging as a link prediction problem between item nodes and tag nodes. To enrich the representation of items, we leverage the query logs available in IR tasks and construct a query-item-tag tripartite graph. This formulation results in a TagGNN model that utilizes heterogeneous graph neural networks with multiple types of nodes and edges. Different from previous research, we also optimize both the full tag prediction and partial tag completion cases in a unified framework via a primary-dual loss mechanism. Experimental results on both open and industrial datasets show that our TagGNN approach outperforms state-of-the-art multi-label classification approaches.
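A minimal sketch of the tripartite-graph formulation, with toy logs and placeholder embeddings; a heterogeneous GNN would produce the embeddings by propagating over the query-item and item-tag edges, and every name below is illustrative::

    import numpy as np

    # Toy logs: which items were clicked for which queries, and known item tags.
    query_item = [("red dress", "item1"), ("summer dress", "item1"), ("red shoes", "item2")]
    item_tag = [("item1", "dress"), ("item2", "shoes")]

    # Tripartite graph as typed edge lists over a shared node id space.
    nodes = sorted({n for edge in query_item + item_tag for n in edge})
    idx = {n: i for i, n in enumerate(nodes)}
    edges = [(idx[a], idx[b]) for a, b in query_item + item_tag]

    # Placeholder node embeddings standing in for GNN outputs.
    rng = np.random.default_rng(0)
    emb = rng.normal(0.0, 0.1, (len(nodes), 16))

    def tag_score(item, tag):
        """Item tagging as link prediction: score a candidate item-tag edge."""
        return float(emb[idx[item]] @ emb[idx[tag]])

    print(tag_score("item1", "shoes"))  # rank candidate tags per item by this score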
Abstract: In an active e-commerce environment, customers process a large number of reviews when deciding whether or not to buy a product. Abstractive Multi-Review Summarization aims to help users efficiently consume the reviews that are most relevant to them. We propose the first large-scale abstractive multi-review summarization dataset, which leverages more than 17.9 billion raw reviews and uses novel aspect-alignment techniques based on aspect annotations. Furthermore, we demonstrate that one can generate higher-quality review summaries by using a novel aspect-alignment-based model. Results from both automatic and human evaluation show that the proposed dataset, together with the innovative aspect-alignment model, can generate high-quality and trustworthy review summaries.
Abstract: Click-through rate (CTR) prediction plays a key role in modern online personalization services. In practice, it is necessary to capture users' drifting interests by modeling sequential user behaviors to build an accurate CTR prediction model. However, as users accumulate more and more behavioral data on the platforms, it becomes non-trivial for sequential models to make use of each user's whole behavior history. First, directly feeding the long behavior sequence makes online inference time and system load infeasible. Second, such long histories contain much noise, which hampers sequential model learning. Current industrial solutions mainly truncate the sequences and feed only recent behaviors to the prediction model, which leads to the problem that sequential patterns such as periodicity or long-term dependencies reside not in the most recent behaviors but far back in the history. To tackle these issues, in this paper we approach the problem from the data perspective, instead of just designing more sophisticated yet complicated models, and propose the User Behavior Retrieval for CTR prediction (UBR4CTR) framework. In UBR4CTR, the most relevant and appropriate user behaviors are first retrieved from the entire user history sequence using a learnable search method. These retrieved behaviors, rather than simply the most recent ones, are then fed into a deep model to make the final prediction. It is highly feasible to deploy UBR4CTR into an industrial model pipeline at low cost. Experiments on three real-world large-scale datasets demonstrate the superiority and efficacy of our proposed framework and models.
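The retrieve-then-predict step can be sketched as follows, with a fixed attribute-overlap score standing in for UBR4CTR's learnable search method; the data layout and field names are made up for illustration::

    def retrieve_behaviors(history, target, k=3):
        """Keep the k behaviors most relevant to the target item, rather than
        simply the k most recent ones. The paper learns this search; a fixed
        feature-overlap score stands in here."""
        overlap = lambda b: len(set(b["attrs"]) & set(target["attrs"]))
        return sorted(history, key=overlap, reverse=True)[:k]

    history = [
        {"item": "phone case", "attrs": ["phone", "accessory"]},
        {"item": "sneakers", "attrs": ["shoes", "sport"]},
        {"item": "phone charger", "attrs": ["phone", "electronics"]},
    ]
    target = {"item": "phone x", "attrs": ["phone", "electronics"]}
    print(retrieve_behaviors(history, target, k=2))
    # The retrieved behaviors, not the full or truncated sequence, feed the CTR model.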
Abstract: Mobile devices have become an increasingly ubiquitous part of our everyday life. We use mobile services to perform a broad range of tasks (e.g., booking travel or office work), leading to often lengthy interactions within distinct apps and services. Existing mobile systems handle mostly simple user needs, where a single app is taken as the unit of interaction. To understand users' expectations and to provide context-aware services, it is important to model users' interactions in the task space. In this work, we first propose and evaluate a method for the automated segmentation of users' app usage logs into task units. We focus on two problems: (i) given a sequential pair of app usage logs, identify whether a task boundary exists between them, and (ii) given any pair of app usage logs, identify whether they belong to the same task. We model these as classification problems that use features from three aspects of app usage patterns: temporal, similarity, and log sequence. Our classifiers improve on traditional timeout segmentation, achieving over 89% performance on both problems. Secondly, we apply our best task classifier to a large-scale dataset of commercial mobile app usage logs to identify common tasks. We observe that users performed common tasks ranging from regular information checking to entertainment and booking dinner. Our proposed task identification approach provides the means to evaluate mobile services and applications with respect to task completion.
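A minimal sketch of pairwise features for the boundary decision, with one illustrative stand-in per feature family (temporal, similarity, sequence); the paper's concrete feature set is richer than this::

    from difflib import SequenceMatcher

    def pair_features(log_a, log_b):
        """Features for classifying whether a task boundary separates two
        consecutive app-usage log entries."""
        gap_seconds = log_b["ts"] - log_a["ts"]                               # temporal
        name_sim = SequenceMatcher(None, log_a["app"], log_b["app"]).ratio()  # similarity
        same_app = float(log_a["app"] == log_b["app"])                        # sequence
        return [gap_seconds, name_sim, same_app]

    x = pair_features({"app": "maps", "ts": 100}, {"app": "uber", "ts": 160})
    # x feeds a binary classifier (boundary / same task); the timeout baseline
    # mentioned above instead thresholds gap_seconds alone.
    print(x)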
Abstract: Many business documents processed in modern NLP and IR pipelines are visually rich: in addition to text, their semantics can also be captured by visual traits such as layout, format, and fonts. We study the problem of information extraction from visually rich documents (VRDs) and present a model that combines the power of large pre-trained language models and graph neural networks to efficiently encode both textual and visual information in business documents. We further introduce new fine-tuning objectives to improve in-domain unsupervised fine-tuning and better utilize large amounts of unlabeled in-domain data. We experiment on real-world invoice and resume datasets and show that the proposed method outperforms strong text-based RoBERTa baselines by 6.3% absolute F1 on invoices and 4.7% absolute F1 on resumes. When evaluated in a few-shot setting, our method requires up to 30x less annotation data than the baseline to achieve the same level of performance at ~90% F1.
Abstract: Recommender systems, an essential part of modern e-commerce, consist of two fundamental modules: Click-Through Rate (CTR) and Conversion Rate (CVR) prediction. While CVR has a direct impact on purchasing volume, its prediction is well known to be challenging due to the Sample Selection Bias (SSB) and Data Sparsity (DS) issues. Although existing methods, typically built on the sequential user behavior path "impression->click->purchase", are effective for dealing with the SSB issue, they still struggle to address the DS issue due to rare purchase training samples. Observing that users always take several purchase-related actions after clicking, we propose a novel idea of post-click behavior decomposition. Specifically, disjoint purchase-related Deterministic Actions (DAction) and Other Actions (OAction) are inserted between click and purchase in parallel, forming a novel sequential user behavior graph "impression->click->D(O)Action->purchase". Defining the model on this graph makes it possible to leverage all impression samples over the entire space and abundant extra supervised signals from D(O)Action, which effectively addresses the SSB and DS issues together. To this end, we devise a novel deep recommendation model named Elaborated Entire Space Supervised Multi-task Model (ESM2). According to the conditional probability rule defined on the graph, it employs multi-task learning to predict the decomposed sub-targets in parallel and composes them sequentially to formulate the final CVR. Extensive experiments in both offline and online environments demonstrate the superiority of ESM2 over state-of-the-art models. The source code and dataset will be released.
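One plausible reading of the chain-rule composition on this behavior graph, with notation assumed rather than taken from the paper, can be written out as a worked equation; since DAction and OAction are disjoint post-click branches, the OAction probability is the complement of the DAction probability::

    \begin{align*}
    p(\text{purchase} \mid \text{click})
      &= p(\text{DAction} \mid \text{click})\, p(\text{purchase} \mid \text{DAction}) \\
      &\quad + \underbrace{\bigl(1 - p(\text{DAction} \mid \text{click})\bigr)}_{p(\text{OAction} \mid \text{click})}\,
        p(\text{purchase} \mid \text{OAction}), \\
    p(\text{purchase} \mid \text{impression})
      &= p(\text{click} \mid \text{impression})\; p(\text{purchase} \mid \text{click}).
    \end{align*}

Under this reading, each factor is a sub-target with comparatively abundant supervision, and the composed product is trained over all impressions, which is how the SSB and DS issues can be addressed jointly.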
Abstract: A two-sided travel marketplace is an e-commerce platform where users can both host tours or activities and book them as guests. When a new guest visits the platform, given tens of thousands of available listings, a natural question is what kinds of activities or trips are the best fit. To answer this question, a recommender system needs both to understand the characteristics of its inventory and to know the preferences of each individual guest. In this work, we present our efforts in building a recommender system for Airbnb Experiences, a two-sided online marketplace for tours and activities. Traditional recommender systems rely on abundant user-listing interactions, but Airbnb Experiences is an emerging business where many listings and guests are new to the platform. Instead of passively waiting for data to accumulate, we propose novel approaches to identify the key features of a listing and to estimate guest preferences with limited data availability. In particular, we focus on extending the knowledge graph and utilizing location features. We extend the original knowledge graph to include more city-specific concepts, which enables us to better characterize inventory. In addition, since many users are new to the business and the limited information about cold-start guests consists of categorical features, such as locations and destinations, we propose to utilize categorical information by employing additive submodels. Extensive experiments have been conducted, and the results show the superiority of the proposed methods over state-of-the-art approaches. Results from an online A/B test show that the deployment of the categorical feature handling method leads to statistically significant growth in conversions and revenue, making it the most influential experiment in lifting the revenue of Airbnb Experiences in 2019.
Abstract: Capturing users' precise preferences is of great importance in various recommender systems (e.g., e-commerce platforms and online advertising sites), as it is the basis for presenting personalized, interesting product lists to individual users. Although significant progress has been made in considering relations between users and items, most existing recommendation techniques focus solely on a single type of user-item interaction. However, user-item interactive behavior is often multi-typed (e.g., page view, add-to-favorite, and purchase) and inter-dependent in nature. Overlooking these multiplex behavior relations makes it hard to recognize the multi-modal contextual signals across different types of interactions, which limits the feasibility of current recommendation methods. To tackle this challenge, this work proposes Memory-Augmented Transformer Networks (MATN) to enable recommendation with multiplex behavioral relational information and joint modeling of type-specific behavioral context and type-wise behavior inter-dependencies, in a fully automatic manner. In our MATN framework, we first develop a transformer-based multi-behavior relation encoder to make the learned interaction representations reflective of cross-type behavior relations. Furthermore, a memory attention network is proposed to supercharge MATN in capturing the contextual signals of different behavior types in a category-specific latent embedding space. Finally, a cross-behavior aggregation component is introduced to promote comprehensive collaboration across type-aware interaction behavior representations and to discriminate their inherent contributions in assisting recommendation. Extensive experiments on two benchmark datasets and a real-world e-commerce user behavior dataset demonstrate significant improvements obtained by MATN over baselines. Code is available at: https://github.com/akaxlh/MATN.
Abstract: Nowadays, e-commerce search has become an integral part of many people's shopping routines. Two critical challenges remain in today's e-commerce search: how to retrieve items that are semantically relevant but do not exactly match query terms, and how to retrieve items that are more personalized to different users for the same search query. In this paper, we present a novel approach called DPSR, which stands for Deep Personalized and Semantic Retrieval, to tackle this problem. Explicitly, we share our design decisions on how to architect a retrieval system that serves industry-scale traffic efficiently, and how to train a model that learns query and item semantics accurately. Based on offline evaluations and an online A/B test with live traffic, we show that the DPSR model outperforms existing models, and that the DPSR system can retrieve more personalized and semantically relevant items, significantly improving users' search experience by a +1.29% conversion rate, and by +10.03% for long-tail queries in particular. As a result, our DPSR system has been successfully deployed in JD.com's production search since 2019.
Abstract: While the World Wide Web provides a large amount of text in many languages, cross-lingual parallel data is more difficult to obtain. Despite its scarcity, this parallel cross-lingual data plays a crucial role in a variety of tasks in natural language processing, with applications in machine translation, cross-lingual information retrieval, and document classification, as well as learning cross-lingual representations. Here, we describe the end-to-end process of searching the web for parallel cross-lingual texts. We frame obtaining parallel text as a retrieval problem in which the goal is to retrieve cross-lingual parallel text from a large, multilingual web-crawled corpus. We introduce techniques for searching for cross-lingual parallel data based on language, content, and other metadata. We motivate and introduce multilingual sentence embeddings as a core tool, and demonstrate techniques and models that leverage them for identifying parallel documents and sentences, as well as techniques for retrieving and filtering this data. We describe several large-scale datasets curated using these techniques and show how training on sentences extracted from parallel or comparable documents mined from the web can improve machine translation models and facilitate cross-lingual NLP.
Abstract: Recent progress in deep learning has brought tremendous improvements in conversational AI, leading to a plethora of commercial conversational services that allow naturally spoken interactions and increasing the need for more human-centric interactions in IR. As a result, we have witnessed a resurgent interest in developing modern conversational information retrieval (CIR) systems in research communities and industry. This tutorial presents recent advances in CIR, focusing mainly on neural approaches and new applications developed in the past five years. Our goal is to provide a thorough and in-depth overview of the general definition of CIR, the components of CIR systems, new applications raised by its conversational aspects, and the (neural) techniques recently developed for it.
Abstract: Recommender systems have demonstrated great success in information seeking. However, traditional recommender systems work in a static way, estimating user preferences on items from past interaction history. This prevents recommender systems from capturing dynamic and fine-grained preferences of users. Conversational recommender systems bring a revolution to existing recommender systems. They are able to communicate with users through natural language, during which they can explicitly ask whether a user likes an attribute or not. With the preferred attributes, a recommender system can conduct more accurate and personalized recommendations. Therefore, while still a relatively new topic, conversational recommender systems attract great research attention. We identify four emerging directions: (1) the exploration and exploitation trade-off in the cold-start recommendation setting; (2) attribute-centric conversational recommendation; (3) strategy-focused conversational recommendation; and (4) dialogue understanding and response generation. This tutorial covers these four directions, providing a review of existing approaches and progress on the topic. By presenting the emerging and promising topic of conversational recommender systems, we aim to provide take-aways to practitioners to build their own systems. We also want to stimulate more ideas and discussions with audiences on core problems of this topic, such as task formalization, dataset collection, algorithm development, and evaluation, with the ambition of facilitating the development of conversational recommender systems.
Abstract: Reciprocal recommender systems, which recommend users to each other, have gained significant importance in various Internet services for connecting people in a personalized manner, such as online dating, recruitment, socializing, learning, or skill-sharing. Unlike classical item-to-user recommenders, a fundamental requirement in reciprocal recommendation is that both parties, namely the requesting user and the recommended user, must be satisfied with the "user match" recommendation in order for it to be deemed successful. Therefore, bidirectional preferences indicating mutual compatibility between pairs of users need to be estimated based on information fusion. This tutorial introduces the emerging and novel topic of reciprocal recommender systems, analyzing their information retrieval, data-driven preference modelling, and integration mechanisms for predicting suitable user matches. The tutorial will also discuss the current trends, practical uses, impact, and challenges of reciprocal recommenders in different application domains.
Abstract: The last few years have seen an explosion of research on the topic of automated question answering (QA), spanning the communities of information retrieval, natural language processing, and artificial intelligence. This tutorial covers the highlights of this very active period of growth for QA, to give the audience a grasp of the families of algorithms currently in use. We partition research contributions by the underlying source from which answers are retrieved: curated knowledge graphs, unstructured text, or hybrid corpora. We choose this dimension of partitioning as it is the most discriminative when it comes to algorithm design. Other key dimensions are covered within each sub-topic, such as the complexity of the questions addressed and the degrees of explainability and interactivity introduced in the systems. We conclude the tutorial with the most promising emerging trends in QA that will help new entrants into this field make the best decisions to take the community forward. Much has changed in the community since the last tutorial on QA at SIGIR 2016, and we believe that this timely overview will benefit a large number of conference participants.
Abstract: While great strides have been made in the field of search and recommendation, there are still challenges and opportunities in addressing information access issues that involve solving tasks and accomplishing goals for a wide variety of users. Specifically, we lack intelligent systems that can detect not only the request an individual is making (what), but also understand and utilize the intention (why) and strategies (how) while providing information. Many scholars in the fields of information retrieval, recommender systems, productivity (especially task management and time management), and artificial intelligence have recognized the importance of extracting and understanding people's tasks and the intentions behind performing those tasks in order to serve them better. However, we are still struggling to support people in task completion; e.g., in search and assistance, it has been challenging to move beyond single-query or single-turn interactions. The proliferation of intelligent agents has opened up new modalities for interacting with information, but these agents will need to work more intelligently in understanding context and helping users at the task level. This tutorial will introduce attendees to the issues of detecting, understanding, and using task and task-related information in an information episode (with or without active searching). Specifically, it will cover several recent theories, models, and methods that show how to represent tasks and use behavioral data to extract task information. It will then show how this knowledge or model can contribute to addressing emerging retrieval and recommendation problems.
Abstract: Search applications such as image search, app search and product search are crucial parts of web search, which we denote as vertical search services. This tutorial will introduce the research and applications of user behavior modeling for vertical search. The bulk of the tutorial is devoted to covering research into behavior patterns, user behavior models and applications of user behavior data to refine evaluation metrics and ranking models for web-based vertical search.
Abstract: Since Information Retrieval (IR) is an interactive process in general, it is important to study Interactive Information Retrieval (IIR), where we would attempt to model and optimize an entire interactive retrieval process (rather than a single query) with consideration of many different ways a user can potentially interact with a search engine. This tutorial systematically reviews the progress of research in IIR with an emphasis on the most recent progress in the development of models, algorithms, and evaluation strategies for IIR. It starts with a broad overview of research in IIR and then gives an introduction to formal models for IIR using a cooperative game framework and covering decision-theoretic models such as the Interface Card Model and Probability Ranking Principle for IIR. Next, it provides a review of some representative specific techniques and algorithms for IIR, such as various forms of feedback techniques and diversification of search results, followed by a discussion of how an IIR system should be evaluated and multiple strategies proposed recently for evaluating IIR using user simulation. The tutorial ends with a brief discussion of the major open challenges in IIR and some of the most promising future research directions.
Abstract: Nowadays, intelligent information systems, especially interactive information systems (e.g., conversational interaction systems like Siri and Cortana, news feed recommender systems, and interactive search engines), are ubiquitous in real-world applications. These systems either converse with users explicitly through natural language, or mine users' interests and respond to their requests implicitly. Interactivity has become a crucial element of intelligent information systems. Although interactive information systems have made significant progress, many challenges remain when applying these models to real-world scenarios. This half-day workshop explores challenges and potential research, development, and application directions in applied interactive information systems. We aim to discuss the issues of applying interactive information models to production systems, as well as to shed some light on the fundamental characteristics, i.e., interactivity and applicability, of different interactive tasks. We welcome practical, theoretical, experimental, and methodological studies that advance interactivity in intelligent information systems. The workshop aims to bring together a diverse set of practitioners and researchers interested in investigating the interaction between humans and information systems, in order to develop more intelligent information systems.
Abstract: This half-day workshop explores challenges and potential research directions for Information Retrieval (IR) in finance. The focus will be on stimulating discussions around accessing, searching, filtering, and analyzing financial documents in banking, insurance, and investment, such as financial statements, analyst reports, filing forms, and news articles. We welcome theoretical, experimental, and methodological studies that aim to advance techniques for managing and understanding financial documents, as well as to emphasize applicability in practical applications. The workshop aims to bring together a diverse set of researchers and practitioners interested in investigating relevant topics. In addition, to facilitate developing and testing relevant techniques, we hold a data challenge on quantifying analyst reports and news articles for the prediction of commodity prices.
Abstract: The BIRDS workshop aimed to foster the cross-fertilization of Information Science (IS), Information Retrieval (IR) and Data Science (DS). Recognising the commonalities and differences between these communities, the proposed full-day workshop brought together experts and researchers in IS, IR and DS to discuss how they can learn from each other to provide more user-driven data and information exploration and retrieval solutions. Therefore, the papers aimed to convey ideas on how to utilise, for instance, IS concepts and theories in DS and IR, or DS approaches to support users in data and information exploration.
Abstract: eCommerce Information Retrieval (IR) is receiving increasing attention in the academic literature and is an essential component of some of the largest web sites (e.g. Amazon, Alibaba, Taobao, eBay, Airbnb, Target, Facebook). eCommerce organisations consistently sponsor SIGIR, reflecting the importance of IR research to them. This workshop (1) brings together researchers and practitioners of eCommerce IR to discuss topics unique to it, (2) determines how to use eCommerce's unique combination of free text, structured data, and customer behavioral data to improve search relevance, and (3) examines how to build data sets and evaluate algorithms in this domain. Since eCommerce customers often do not know exactly what they want to buy, recommendations are valuable for inspiration, serendipitous discovery and basket building. The theme of this year's eCommerce IR workshop is integrating recommendations into search for eCommerce. In addition to the focus on recommender systems in eCommerce search, Rakuten France is sponsoring a data challenge on taxonomy classification using multi-modal (image, text and structured data) input. The data challenge reflects themes from the 2017--2019 SIGIR workshops.
Abstract: Search and recommender systems process rich natural language text data such as user queries and documents. Achieving high-quality search and recommendation results requires processing and understanding such information effectively and efficiently, where natural language processing (NLP) technologies are widely deployed. In recent years, the rapid development of deep learning technology has been proven successful for improving various NLP tasks, indicating their great potential of promoting search and recommender systems. Developing deep learning models for NLP in search and recommender systems involves various fundamental components including query / document understanding, retrieval & ranking, and language generation. In this workshop, we propose to discuss deep neural network based NLP technologies and their applications in search and recommendation, with the goal of understanding (1) Why deep NLP is helpful; (2) What are the challenges to develop and productionize it; (3) How to overcome the challenges; (4) Where deep NLP models produce the largest impact.
Abstract: In the digital era, information retrieval, text/knowledge mining, and NLP techniques are playing increasingly vital roles in the legal domain. While open datasets and innovative deep learning methodologies provide critical potential, efforts need to be made in the legal domain to transfer theoretical/algorithmic models into real applications that assist users, lawyers, judges, and the legal professions in solving real problems. The objective of this workshop is to aggregate studies and applications of text mining/retrieval and NLP automation in the context of classical and novel legal tasks, addressing the algorithmic, data, and social challenges of legal intelligence. Keynote and invited presentations from industry and academia will help fill the gap between ambition and execution in the legal domain.
Abstract: Information retrieval (IR) techniques, such as search, recommendation, and online advertising, satisfy users' information needs by suggesting personalized objects (information or services) at the appropriate time and place, and play a crucial role in mitigating the information overload problem. With the wide use of mobile applications, more and more information retrieval services have provided interactive functionality and products. Thus, learning from interaction becomes a crucial machine learning paradigm for interactive IR, which is based on reinforcement learning. With recent great advances in deep reinforcement learning (DRL), there has been increasing interest in developing DRL-based information retrieval techniques, which can continuously update information retrieval strategies according to users' real-time feedback and optimize the expected cumulative long-term satisfaction of users. Our workshop aims to provide a venue that brings together academic researchers and industry practitioners (i) to discuss the principles, limitations, and applications of DRL for information retrieval, and (ii) to foster research on innovative algorithms, novel techniques, and new applications of DRL for information retrieval.
Abstract: Explainable recommendation and search attempt to develop models or methods that not only generate high-quality recommendation or search results, but also offer interpretability of the models or explanations of the results to users or system designers, which can help improve system transparency, persuasiveness, trustworthiness, and effectiveness. This is even more important in personalized search and recommendation scenarios, where users would like to know why a particular product, web page, news report, or friend suggestion appears in their own search and recommendation lists. The workshop focuses on the research and application of explainable recommendation, search, and a broader scope of IR tasks. It will gather researchers as well as practitioners in the field for discussions, idea exchange, and research promotion. It will also generate insightful debates about the recent regulations regarding AI interpretability, for a broader community including but not limited to IR, machine learning, AI, data science, and beyond.
Abstract: The growth of social Web platforms in the past years has brought an increase in displays of online hate speech. This subject is considered a critical matter in the Web community, since it can be related to potentially dangerous actions that affect individuals and groups in the physical world. The automatic detection of this type of expression has been the focus of several investigations over the past few years. However, most research on this subject has been done for the English language and on rather limited datasets. In addition, although some works approach the problem from a multilingual perspective by analyzing different languages separately, a cross-lingual perspective on this problem has not been adopted so far. The main research proposal of this thesis is to characterize hate speech and other forms of online harassment from different perspectives, and to use these characterizations to create novel models for online hate speech detection across different languages and domains.
Abstract: With the advancement of the Web and the large number of legal documents being made available digitally, legal practitioners are now facing new challenges. It is now intractable for them to manually find relevant information (prior cases, related statutes, etc.) that would assist an ongoing case. From our discussions with law experts from India (faculty members of the Rajiv Gandhi School of Intellectual Property Law, India) as well as other countries such as the UK (Swansea University) and the USA (Thomson Reuters), we understand that it is important to develop assistive tools for several tasks in the legal domain, e.g., identifying relevant documents, summarizing legal text, and so on. Though legal information systems exist (e.g., Thomson Reuters WestLaw, LexisNexis), law practitioners are often not satisfied with the search results/summaries available on such systems. Developing assistive tools in the legal domain requires addressing certain research challenges. We outline the challenges and some initial progress made on each of them.
Abstract: This research will be devoted to the challenging and under-investigated task of multi-source answer generation for complex non-factoid questions. We will start by experimenting with generative models on one particular type of non-factoid question: instrumental/procedural questions, which often start with "how-to". For this, we will use a new dataset comprising more than 100,000 QA pairs crawled from a dedicated web resource, where each answer has a set of references to the articles it was written upon. We will also compare different ways of model evaluation to choose a metric that better correlates with human assessment. To be able to do this, we need to understand how people evaluate answers to non-factoid questions and to set formal criteria for what makes a good-quality answer. Eye-tracking and crowdsourcing methods will be employed to study how users interact with answers and evaluate them, and how answer features correlate with task complexity. We hope that our research will help redefine the way users interact and work with search engines, so as to finally transform IR into the answer retrieval systems that users have always desired.
Abstract: The ubiquitous availability of mobile devices with GPS capabilities and the popularity of social media platforms have created a rich source of textual data with spatio-temporal information. Other domains, such as crime incident descriptions and search engine queries, can also provide spatio-temporal textual data. These data sources can be used to discover space-time related insights into human behavior. This work focuses on modeling text that is associated with a particular time and place. We extend the traditional language modeling task from natural language processing to language modeling under spatio-temporal conditions. This task definition allows us to use the same evaluation framework used in language modeling. A model for spatio-temporal text data representation should be able to capture the patterns that guide how text is generated in a spatio-temporal context. We aim to develop neural network models for language modeling conditioned on spatio-temporal variables, with the ability to capture properties such as neighborhood, periodicity, and hierarchy.
Abstract: Users are increasingly relying on personalized recommendations (such as news, songs, products) for their daily information consumption. To deliver personalized content to users, Heterogeneous Information Network (HIN)-based recommender systems integrate various data collected from users into an often complex ranking model. The resulting recommendations might thus be puzzling for the users, leaving them wondering why some particular items are recommended to them or how these items relate to their actions on the platform. Therefore, to gain users' trust, it is crucial to provide them with explanations for their recommendations.
Abstract: Data from human-machine interaction can be used to improve the quality of artificial intelligence (AI) systems. When designing a system with humans in the loop, one of the questions to ask is how much human work is required to create a reliable data collection. Crowdsourcing has become a popular methodology for collecting annotations from crowd workers who successfully complete crowdsourcing tasks. However, if workers reach varying task completion stages without finally submitting their work, all their effort is unrewarded and their annotations discarded: task abandonment remains invisible within the platform. On the other hand, paid crowdsourcing dynamics are often influenced by large batches of similar tasks, allowing workers to learn and develop efficient work strategies. To date, however, there is limited research aimed at understanding how human annotators complete tasks over time. Even for complex tasks in lab studies, understanding how behavioral patterns evolve during task completion remains unexplored. The aim of this research is to study how these tasks are completed, to help reduce non-essential data collection costs, and to support workers in efficient task completion. The objective is to reveal what happens with humans in the loop, looking at three related aspects: cost, effort, and behavior. In particular, I explore (i) how to make the best use of a given budget to conduct data annotation experiments and collect labelled data; (ii) how the reward received by human workers can be affected by invisible actions in abandoned tasks; and (iii) how they complete tasks to collect the associated reward. I focus on the following research questions (RQs). RQ1: How can we increase the intrinsic value of tasks? RQ2: What is the blind effort of workers while completing tasks? RQ3: What are the patterns displayed by workers as they progress in tasks? Our findings bear implications for building cost-effective human-machine workflows. The proposed methodology has the potential to benefit AI practitioners building human-like products, and to enable domain experts and crowd workers to perform tasks effectively.
Abstract: Relevance in ad-hoc retrieval is a fundamental problem of text understanding. Developing neural network methods for this foundational task of Information Retrieval (IR) has the potential to impact many search domains. Recently, a new generation of Transformer-based [6] neural re-ranking models initiated a new era by providing substantial effectiveness increases in ad-hoc search tasks [1, 4, 5]. They operate on the full text of a query and a list of candidate documents from an initial retrieval. Using self-attention, they contextualize term occurrences conditioned on the sequence they are contained in. However, self-attention, because it is applied to a whole sequence and commonly uses many layers, is inefficient, and re-ranking models still depend on the bottleneck of the initial retrieval of candidate documents. In this thesis we plan to address these shortcomings. First, we plan to make contextualization efficient enough to be usable in resource-constrained environments, and second, we plan to use the efficient contextualization components to create a novel approach for learning to index and retrieve in a unified neural model. Hence, our main research questions are: RQ1: How can we balance the trade-off between efficiency and effectiveness in contextualized neural re-ranking? Solving efficiency problems with massively parallelized hardware is neither an economically nor an environmentally friendly solution. We wish to tackle the problem of efficient contextualization in two common scenarios of ad-hoc ranking: passage retrieval and document retrieval. As part of this PhD we already proposed a novel efficient Transformer-based passage re-ranking model: the TK (Transformer-Kernel) model [4], which utilizes shallow Transformer layers to contextualize query and passages separately, and the kernel-pooling technique [7] to score individual term interactions between a query and a passage. Additionally, we explored local attention as an effective approach for document-length ranking, with an extension of TK for long text (TKL) [3]. RQ2: What is needed to learn generalized contextualized representations for indexing and retrieval? Currently, IR systems are a patchwork of traditional and neural systems. Different parts of the pipeline are optimized in isolation and lack integration during training. To overcome bottlenecks and the sub-optimal integration of pipeline components, we plan to develop a unified neural index and ranking model. A unified indexing and ranking model needs to learn how to efficiently store document representations, for example via trained sparsity [8], followed by the integration of all second-generation contextualized re-ranking components. We plan to train this unified model in a true end-to-end approach, with gradients flowing through the complete system. The main challenge here is to generalize models beyond their training domain, to match the usability of traditional relevance models such as BM25, which can be used as a drop-in approach in almost any domain and language. As part of this PhD we already showed the importance of tuning the re-ranking depth as the interface between traditional and neural models [2]. Furthermore, we plan to create a truly novel indexing and ranking model that removes all bottlenecks and patchwork systems. Contextualization is a key element for focusing the model on the actual topic of a query and document, and for reducing the search space for previously ambiguous words and topics.
In parallel with our main research questions, we plan to keep explainability and thorough analysis of our approaches a priority throughout this PhD work.
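To make the kernel-pooling mechanism behind TK concrete, the following is a minimal sketch of scoring a query-passage pair from contextualized term embeddings. The tensor shapes, the number of kernels and the sigma value are illustrative assumptions, not the exact configuration of TK [4] or the original kernel-pooling work [7].

.. code-block:: python

    # Minimal kernel-pooling sketch over contextualized term embeddings;
    # hyper-parameters (11 kernels, sigma=0.1) are assumptions for illustration.
    import torch

    def kernel_pooling(q_emb, d_emb, mus, sigma=0.1):
        """q_emb: (|q|, dim), d_emb: (|d|, dim) contextualized term embeddings."""
        # Cosine similarity between every query term and every passage term.
        q = torch.nn.functional.normalize(q_emb, dim=-1)
        d = torch.nn.functional.normalize(d_emb, dim=-1)
        sim = q @ d.T                                         # (|q|, |d|)
        # Each RBF kernel soft-counts similarities near its centre mu_k.
        k = torch.exp(-((sim.unsqueeze(-1) - mus) ** 2) / (2 * sigma ** 2))
        per_query = torch.log(k.sum(dim=1).clamp(min=1e-10))  # (|q|, K)
        return per_query.sum(dim=0)                           # (K,) match features

    # 11 kernel centres spanning [-1, 1]; a learned linear layer would map
    # the resulting features to a relevance score.
    mus = torch.linspace(-1.0, 1.0, 11)
    features = kernel_pooling(torch.randn(4, 64), torch.randn(120, 64), mus)

In this design, the soft-match features are the only interaction between query and passage, which is what allows the expensive contextualization to run separately for each side.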
Abstract: This PhD thesis will explore conversational question answering with a special emphasis on incorporating user feedback. As preliminary work, we developed a conversational passage retrieval system in the scope of the TREC Conversational Assistance Track 2019. Our current focus is to develop methods based on reinforcement learning to incorporate implicit user feedback in the form of question reformulations for conversational QA over knowledge graphs. Finally, we plan to design a conversational QA system operating on heterogeneous sources.
Abstract: Modeling user behaviors in an interactive information seeking process is key to understanding users' information needs. Two prime characteristics of interactive information seeking are statefulness and participant initiative. Much work has focused on click modeling using search session logs. However, such a search session of user queries and clicks lacks the initiative information between user and system needed to model user state. In this research proposal, we focus on product search in E-commerce. We augment the search session with generalized initiative information, integrate long-term user features into search sessions, and formulate user modeling as a Markov Decision Process (MDP). We further investigate how the proposed user modeling would benefit Learning to Rank (LTR) in an interactive, stateful information seeking process.
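As a minimal, hypothetical illustration of the MDP framing, the sketch below fixes one possible choice of state, action and reward; the proposal itself does not commit to these definitions.

.. code-block:: python

    # Hypothetical MDP framing of a product-search session; the state/action/
    # reward definitions here are illustrative assumptions, not the proposal's.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class SessionState:
        """State: the interaction history so far plus long-term user features."""
        query: str
        clicked_items: List[str] = field(default_factory=list)
        user_profile: List[float] = field(default_factory=list)  # long-term features

    @dataclass
    class Transition:
        state: SessionState
        action: str      # e.g. the ranking decision / item shown to the user
        reward: float    # e.g. click = 1.0, purchase = 5.0 (illustrative values)

    def session_return(transitions: List[Transition], gamma: float = 0.9) -> float:
        """Discounted return of one session; an LTR policy learned in this
        MDP would be optimized to maximize this quantity."""
        return sum(gamma ** t * tr.reward for t, tr in enumerate(transitions))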
Abstract: Recommender systems lie at the heart of many online services such as E-commerce, social media platforms and advertising. To keep users engaged and satisfied with the displayed items, recommender systems usually leverage users' historical interactions, which capture their interests and purchase habits, to make personalised recommendations. Recently, Graph Neural Networks (GNNs) have emerged as a technique that can effectively learn representations from structured graph data. By treating the traditional user-item interaction matrix as a bipartite graph, many existing graph-based recommender systems (GBRS) have been shown to achieve state-of-the-art performance when employing GNNs. However, the existing GBRS approaches still have several limitations, which prevent the GNNs from achieving their full potential. In this work, we propose to enhance the performance of GBRS approaches along several research directions, namely leveraging additional item and user side information, extending the existing undirected graphs to account for social influence among users, and enhancing their underlying optimisation criterion. In the following, we describe these proposed research directions.
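As one concrete instance of GNN-based learning on the bipartite interaction graph, the sketch below shows a single propagation step with symmetric degree normalisation, in the style of LightGCN; the abstract does not commit to this particular architecture, so it is an illustrative assumption.

.. code-block:: python

    # One message-passing step on the bipartite user-item graph with
    # symmetric degree normalisation (LightGCN-style); illustrative only.
    import torch

    def propagate(user_emb, item_emb, interactions):
        """user_emb: (U, d), item_emb: (I, d), interactions: (U, I) binary."""
        du = interactions.sum(1, keepdim=True).clamp(min=1)  # user degrees
        di = interactions.sum(0, keepdim=True).clamp(min=1)  # item degrees
        norm_adj = interactions / torch.sqrt(du) / torch.sqrt(di)
        # Users aggregate the embeddings of their items, and vice versa.
        new_user = norm_adj @ item_emb                       # (U, d)
        new_item = norm_adj.T @ user_emb                     # (I, d)
        return new_user, new_item

    R = (torch.rand(100, 500) < 0.05).float()   # toy interaction matrix
    u, i = propagate(torch.randn(100, 32), torch.randn(500, 32), R)

Stacking several such steps lets each user representation absorb multi-hop collaborative signals; the side information and social edges proposed above would extend the graph beyond this plain bipartite structure.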
Abstract: Legal case retrieval is a specialized IR task that aims to retrieve supporting precedents given a query case. Different from traditional ad-hoc text retrieval, the query case is much longer and more complex than common keyword queries. Legal relevance between a supporting case and a query case is defined beyond general topical relevance, and making relevance judgments requires legal knowledge. It is thus difficult to collect a large-scale case retrieval dataset along with accurate relevance judgments. Therefore, legal case retrieval is more challenging. As a primary attempt, we propose to develop a retrieval model to tackle these challenges based on the benchmarks in this task. Moreover, we plan to investigate the practical interactions between legal practitioners and retrieval systems and further apply the resulting user behavior models to improve system performance. Beyond binary labels, we would like to take a deeper look at the decision process of relevance judgment in legal practice, which will benefit related tasks such as relevance estimation and result ranking.
Abstract: For many complex diseases, there is no "one size fits all" solution in practice: patients with the same diagnosis should be treated depending on their genetics, environment, lifestyle choices and so on. Precision medicine, which provides personalized treatment for a particular patient, has therefore drawn increasing attention. However, the large number of treatment options is overwhelming for clinicians trying to choose the best treatment for a particular patient. One effective way to alleviate this problem is a biomedical information retrieval (BMIR) system, which can automatically find relevant information and appropriate treatments among a mass of alternative treatments and cases. However, the biomedical literature and clinical trials contain a large number of synonymous, polysemous and context-dependent terms, causing a semantic gap between query and document in traditional biomedical information retrieval systems. Recently, deep learning-based biomedical information retrieval systems have been adopted to address this problem and have the potential to improve BMIR performance. With these approaches, the semantic information of query and document is encoded as low-dimensional feature vectors. Although most existing deep learning-based biomedical information retrieval systems achieve strong accuracy, they are usually black-box models that lack explainability. It is difficult for clinicians to understand their ranked results, which makes them doubt the effectiveness of these systems. Reasonable explanations help clinicians make better decisions via appropriate treatment logic inference, thus further enhancing the transparency, fairness and trustworthiness of biomedical information retrieval systems. Furthermore, knowledge graphs, which contain abundant real-world facts and entities, have drawn increasing attention; they are an effective way to provide accuracy and explainability for deep learning models and to reduce the knowledge gap between experts and the public. However, knowledge graphs are usually employed merely as a query expansion strategy in biomedical information retrieval systems, and it remains an open question how to build explainable biomedical information retrieval systems around them. Given the above, to alleviate the trade-off between accuracy and explainability in precision medicine, we propose to research Biomedical Information Retrieval incorporating Knowledge Graphs for Explainable Precision Medicine. In this work, we propose a neural biomedical information retrieval model that addresses the semantic gap problem and fully investigates the utility of knowledge graphs for explainable biomedical information retrieval; it soft-matches the query and document using semantic information instead of ranking by exact matches. On the one hand, our model encodes the semantic features of documents using convolutional neural networks, which have shown a strong ability to model textual information in recent years, and the relevance between query and document is measured via soft matches rather than exact matches. On the other hand, explainability is endowed to the biomedical information retrieval model by extending the utility of the knowledge graph: a graph-based strategy is designed to achieve this goal by building knowledge-aware paths with the help of attention scores.
Specifically, graph attention networks (GATs) would be adopted to model the query representation by summarizing high-order connectivity from the graph structure. With the help of GAT attention, weight scores are automatically assigned to build knowledge-aware propagation paths, which can be regarded as evidence for the explainable biomedical information retrieval system. Finally, the proposed system would be evaluated on the TREC Precision Medicine datasets.
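Since the GAT attention weights are exactly what would be surfaced as explanatory evidence, the following is a minimal single-head graph attention layer sketch; the dimensions and the attention scorer follow the standard GAT formulation rather than any system-specific design from this proposal.

.. code-block:: python

    # Minimal single-head graph attention layer; the attention weights
    # (alpha) over KG neighbours are the candidate "evidence" signal.
    import torch
    import torch.nn.functional as F

    class GATLayer(torch.nn.Module):
        def __init__(self, in_dim, out_dim):
            super().__init__()
            self.W = torch.nn.Linear(in_dim, out_dim, bias=False)
            self.a = torch.nn.Linear(2 * out_dim, 1, bias=False)

        def forward(self, h, adj):
            """h: (N, in_dim) entity embeddings, adj: (N, N) 0/1 adjacency."""
            z = self.W(h)                                    # (N, out_dim)
            n = z.size(0)
            pairs = torch.cat([z.unsqueeze(1).expand(n, n, -1),
                               z.unsqueeze(0).expand(n, n, -1)], dim=-1)
            e = F.leaky_relu(self.a(pairs).squeeze(-1))      # (N, N) raw scores
            e = e.masked_fill(adj == 0, float('-inf'))
            alpha = torch.softmax(e, dim=-1)                 # attention weights
            # alpha[i, j]: contribution of neighbour j to entity i -- these
            # weights trace the knowledge-aware paths used as evidence.
            return alpha @ z, alpha

    layer = GATLayer(16, 16)
    adj = (torch.rand(5, 5) < 0.5).float().fill_diagonal_(1)
    out, attn = layer(torch.randn(5, 16), adj)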
Abstract: The quality of digital information on the web is disquieting due to the absence of careful checking. Consequently, a large volume of false textual information is being produced and disseminated. The focus of this doctoral study is to work towards evaluating the veracity of textual statements on the web. The major contributions to this growing area of research will be made in the following aspects: (1) improving stance detection and incorporating it into misinformation detection; (2) effectively utilizing noisy, unstructured user engagements on social media platforms; (3) designing a general framework for early misinformation detection. Findings of this research will provide a deeper understanding of how machine learning can be leveraged to automatically detect misinformation.
Abstract: Traditional Chinese Medicine (TCM) has been used by practitioners for millennia to prevent and treat disease, but has struggled to gain broad acceptance in the West. In 2019, the World Health Organisation officially recognized TCM as a form of medical treatment, a step towards internationalizing TCM and integrating it with Western medicine (WM). The proposed dissertation research aims to bridge eastern and western medical philosophies by applying named entity recognition (NER) and information retrieval (IR) models, supported by medical and cross-lingual knowledge graphs, to enhance retrieval performance as well as to increase model explainability.