Don’t Read Too Much into It: Adaptive Computation for Open-Domain Question Answering will appear at EMNLP 2020. We propose an adaptive computation method that significantly reduces the computational cost of ODQA systems while retaining similar performance.
Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension has been accepted at TACL 2020, and we’ll be presenting it at EMNLP 2020. We’ve also publicly released the dataset, and you can try to beat our best model and submit to our online leaderboard through Dynabench.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks has been accepted at NeurIPS 2020. We’ve also released a blog post and the code as part of the HuggingFace ecosystem. Check out a demo of RAG here.
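For a quick taste of the HuggingFace integration, here is a minimal sketch of querying a pretrained RAG checkpoint; the `facebook/rag-sequence-nq` model name and the dummy-index shortcut follow the transformers documentation (the full Wikipedia index is a much larger download):

```python
# Minimal sketch of querying a pretrained RAG model via HuggingFace
# transformers; uses a dummy retrieval index for illustration, since the
# full Wikipedia index is a large download.
from transformers import RagTokenizer, RagRetriever, RagSequenceForGeneration

tokenizer = RagTokenizer.from_pretrained("facebook/rag-sequence-nq")
retriever = RagRetriever.from_pretrained(
    "facebook/rag-sequence-nq", index_name="exact", use_dummy_dataset=True
)
model = RagSequenceForGeneration.from_pretrained(
    "facebook/rag-sequence-nq", retriever=retriever
)

# Retrieve supporting passages and generate an answer in one call
inputs = tokenizer("who wrote the origin of species", return_tensors="pt")
generated = model.generate(input_ids=inputs["input_ids"])
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```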
KILT: a Benchmark for Knowledge Intensive Language Tasks is now available on arXiv! KILT is a set of tools and data to accelerate research progress on open-domain and knowledge-intensive NLP, including open-domain QA, fact checking, relation extraction, and entity linking. KILT will make your work easier, more comparable and reproducible, and allow researchers to share components more easily.
Check out the code, leaderboard, and HuggingFace integrations.
Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets is now on arXiv! Do you use NaturalQuestions, TriviaQA, or WebQuestions? It turns out 60% of test set answers are also in the train set. More surprisingly, 30% of test questions have a close paraphrase in the train set. We look at what this means for models. Annotations and code are available here.
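To make the overlap measurement concrete, here is a rough sketch of how answer overlap between splits can be computed; the normalisation and data format are illustrative assumptions, not the exact pipeline used in the paper (which also relies on human paraphrase annotation):

```python
# Rough sketch of measuring test-train answer overlap; the normalisation
# is an illustrative assumption, not the paper's exact pipeline.
import string

def normalise(answer: str) -> str:
    # Lowercase and strip punctuation so "Paris." matches "paris"
    table = str.maketrans("", "", string.punctuation)
    return answer.lower().translate(table).strip()

def answer_overlap(train_answers, test_answers):
    train_set = {normalise(a) for a in train_answers}
    overlapping = sum(1 for a in test_answers if normalise(a) in train_set)
    return overlapping / len(test_answers)

# Toy example:
train = ["Charles Darwin", "Paris", "1945"]
test = ["charles darwin", "London", "Paris."]
print(f"{answer_overlap(train, test):.0%} of test answers appear in train")
# -> 67% of test answers appear in train
```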
R4C: A Benchmark for Evaluating RC Systems to Get the Right Answer for the Right Reason was presented at ACL 2020. It previously won the “Best Linguistic Resource” award at the 26th annual meeting of the Japanese Association for Natural Language Processing.
“How Context Affects Language Models' Factual Predictions” has been awarded Best Paper at AKBC 2020!
Affiliated Faculty (Lecturer)
Senior Research Fellow, Principal Investigator for H2020 CLARIFY
Now a senior lecturer at the University of Cambridge
Now a Research Scientist at DeepMind
Now a PhD student at MIT
Now a research associate at the University of Sheffield
Now back to being a PhD student at the University of Tokyo
Now a master's student at Tohoku University
Now back to being a PhD student at the Chinese Academy of Sciences
Now a Research Manager at Facebook
Now a postdoc at the University of Ghent
Now a research scientist at Preferred Networks (PFN)
Now back to being a PhD student at Xerox Research Centre Europe
Now a Research Scientist at Facebook
Now an associate professor at the University of Copenhagen
Now an assistant professor at Tohoku University
Now a PhD student at the University of Washington
V. Ivan Sanchez
Now an NLP researcher at Lenovo
Now back to being a PhD student at MIT
Now a student at Toyota Technological Institute at Chicago
Now an ML engineer at PolyAI
In this work we investigate this annotation methodology and apply it in three different settings, collecting a total of 36,000 samples with progressively stronger models in the annotation loop. This allows us to explore questions such as the reproducibility of the adversarial effect, transfer from data collected with varying model-in-the-loop strengths, and generalisation to data collected without a model. We find that training on adversarially collected samples leads to strong generalisation to non-adversarially collected datasets, yet with progressive performance deterioration as the model in the loop grows stronger. Furthermore, we find that stronger models can still learn from datasets collected with substantially weaker models in the loop. When trained on data collected with a BiDAF model in the loop, RoBERTa achieves 39.9 F1 on questions that it cannot answer when trained on SQuAD, only marginally lower than when trained on data collected using RoBERTa itself (41.0 F1).
KILT is a resource for training, evaluating, and analyzing NLP models on Knowledge Intensive Language Tasks. KILT has been built from 11 datasets representing 5 tasks. All these datasets have been grounded in a single pre-processed Wikipedia dump, allowing for fairer and more consistent evaluation and enabling new task setups such as multitask and transfer learning with minimal effort. KILT also provides tools to analyze and understand the predictions made by models, as well as the evidence they provide for their predictions.
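As an illustration, a minimal sketch of loading one KILT task through the HuggingFace datasets library; the `kilt_tasks` hub name, the `nq` (Natural Questions) config, and the field names are assumptions based on the HuggingFace integration mentioned above and may differ from the released version:

```python
# Minimal sketch of loading a KILT task via HuggingFace datasets; the
# "kilt_tasks" hub name, "nq" config, and field names are assumptions
# based on the HuggingFace integration and may change.
from datasets import load_dataset

nq = load_dataset("kilt_tasks", "nq", split="validation")
example = nq[0]
print(example["input"])                        # the question
print(example["output"][0]["answer"])          # a gold answer
# Provenance grounds each answer in the shared Wikipedia snapshot:
print(example["output"][0]["provenance"][0]["wikipedia_id"])
```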
MLQA is a multi-way aligned extractive QA evaluation benchmark containing QA instances in 7 languages: English, Arabic, German, Spanish, Hindi, Vietnamese, and Simplified Chinese.
A collection of 32k task instances based on real-world rules, with crowd-generated questions and scenarios that require both interpreting the rules and applying background knowledge.