PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them has been released on arXiv! Using a huge collection of 65M automatically generated QA pairs, we build QA models that are up to two times faster and more accurate, yielding a flexible QA system that can be optimised for memory, speed, or accuracy. Check out the PAQ data here; code and models are coming soon!
Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval has been accepted at ICLR 2021. This simple recursive retrieval method achieves state-of-the-art results without requiring additional resources such as hyperlink networks.
Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets has been accepted at EACL 2021. If you’re doing open-domain QA, evaluate your model on our test sets to see whether it generalises or is just memorising the training set. Get the data here.
Don’t Read Too Much into It: Adaptive Computation for Open-Domain Question Answering will appear at EMNLP 2020. We propose an adaptive computation method that significantly reduces the computational cost of ODQA systems while retaining similar performance.
Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension has been accepted at TACL 2020, and we’ll be presenting it at EMNLP 2020. We’ve also publicly released the dataset and you can try to beat our best model and submit to our online leaderboard through Dynabench.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks has been accepted at NeurIPS 2020. We’ve also published a blog post and released the code as part of the HuggingFace ecosystem. Check out a demo of RAG here.
KILT: a Benchmark for Knowledge Intensive Language Tasks is now available on arXiv! KILT is a set of tools and data to accelerate research progress on open-domain and knowledge-intensive NLP, including open-domain QA, fact checking, relation extraction, and entity linking. KILT will make your work easier, more comparable, and more reproducible, and allow researchers to share components more easily.
Check out the code, leaderboard, and HuggingFace integrations.
Affiliated Faculty (Lecturer)
Senior Research Fellow, Principal Investigator for H2020 CLARIFY
Now a senior lecturer at the University of Cambridge
Now a Research Scientist at DeepMind
Now a PhD student at MIT
Now a research associate at the University of Sheffield
Now back to being a Ph.D. student at the University of Tokyo.
Now a master’s student at Tohoku University
Now back to being a PhD student at the Chinese Academy of Sciences.
Now a Research Manager at Facebook
Now a postdoc at the University of Ghent
Now a research scientist at Preferred Networks (PFN)
Now back to being a PhD student at Xerox Research Centre Europe
Now a Research Scientist at Facebook
Now an associate professor at the University of Copenhagen
Now an assistant professor at Tohoku University
Now a PhD student at the University of Washington
V. Ivan Sanchez
Now an NLP researcher at Lenovo
Now back to being a PhD student at MIT
Now a student at Toyota Technological Institute at Chicago
Now a ML engineer at PolyAI
Pretrained Language Models for Biomedical and Clinical Tasks: Understanding and Extending the State-of-the-Art. In ClinicalNLP 2020 @ EMNLP 2020.
In this work we investigate this annotation methodology and apply it in three different settings, collecting a total of 36,000 samples with progressively stronger models in the annotation loop. This allows us to explore questions such as the reproducibility of the adversarial effect, transfer from data collected with varying model-in-the-loop strengths, and generalisation to data collected without a model. We find that training on adversarially collected samples leads to strong generalisation to non-adversarially collected datasets, yet with progressive performance deterioration with increasingly stronger models-in-the-loop. Furthermore, we find that stronger models can still learn from datasets collected with substantially weaker models-in-the-loop. When trained on data collected with a BiDAF model in the loop, RoBERTa achieves 39.9 F1 on questions that it cannot answer when trained on SQuAD, only marginally lower than when trained on data collected using RoBERTa itself (41.0 F1).
KILT is a resource for training, evaluating, and analysing NLP models on Knowledge Intensive Language Tasks. KILT has been built from 11 datasets representing 5 tasks. All these datasets are grounded in a single pre-processed Wikipedia dump, allowing for fairer and more consistent evaluation and enabling new task setups such as multitask and transfer learning with minimal effort. KILT also provides tools to analyse and understand the predictions made by models, as well as the evidence they provide for their predictions.
MLQA, a multi-way aligned extractive QA evaluation benchmark, contains QA instances in 7 languages: English, Arabic, German, Spanish, Hindi, Vietnamese, and Simplified Chinese.
A collection of 32k task instances based on real-world rules and crowd-generated questions and scenarios requiring both the interpretation of rules and the application of background knowledge.