Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks has been accepted at NeurIPS 2020. We’ve also released a blog post and the code as part of the HuggingFace ecosystem. Check out a demo of RAG here.
KILT: a Benchmark for Knowledge Intensive Language Tasks is now available on ArXiv! KILT is a set of tools and data to accelerate research progress on open-domain and knowledge-intensive NLP, including open-domain QA, fact checking, relation extraction, and entity linking. KILT will make your work easier, more comparable and reproducible, and allow researchers to share components more easily.
Check out the code, leaderboard, and HuggingFace integrations.
Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets is now on ArXiv! Do you use NaturalQuestions, TriviaQA, or WebQuestions? It turns out 60% of test set answers also appear in the train set. More surprisingly, 30% of test questions have a close paraphrase in the train set. We look at what this means for models. Annotations and code available here:
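As a rough illustration of the kind of measurement described above (this is a minimal sketch with made-up helper names and toy data, not the paper’s released code): answer overlap can be estimated by normalising answer strings and checking how many test answers also occur among train answers.

```python
import re
import string

def normalize(answer: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    answer = answer.lower()
    answer = "".join(ch for ch in answer if ch not in string.punctuation)
    answer = re.sub(r"\b(a|an|the)\b", " ", answer)
    return " ".join(answer.split())

def answer_overlap(train_answers, test_answers) -> float:
    """Fraction of test answers whose normalised form appears in the train set."""
    train_set = {normalize(a) for a in train_answers}
    hits = sum(1 for a in test_answers if normalize(a) in train_set)
    return hits / len(test_answers)

# Toy example (made-up data): "paris" and "beatles" match after normalisation.
train = ["The Beatles", "Paris", "Albert Einstein"]
test = ["paris", "Einstein", "beatles"]
print(answer_overlap(train, test))  # 2 of 3 test answers overlap
```

Paraphrase overlap between questions is a harder measurement and, as the paper notes, requires annotation rather than simple string matching.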
R4C: A Benchmark for Evaluating RC Systems to Get the Right Answer for the Right Reason was presented at ACL 2020. It previously won the “Best Linguistic Resource” award at the 26th annual meeting of the Japanese Association for Natural Language Processing.
“How Context Affects Language Models' Factual Predictions” has been awarded Best Paper at AKBC 2020!
Pasquale’s paper Learning Reasoning Strategies in End-to-End Differentiable Proving will appear at ICML 2020! We propose a neuro-symbolic reasoning model that learns, via gradient-based optimisation, to dynamically select and generate rules conditioned on the goal during the reasoning process.
UCL NLP members authored two chapters in Knowledge Graphs for eXplainable Artificial Intelligence: Foundations, Applications and Challenges, published by IOS Press!
Senior Research Associate, Principal Investigator for H2020 CLARIFY
Visiting PhD Student
MRes Computational Statistics and Machine Learning
Now a senior lecturer at University of Cambridge
Now a PhD student at MIT
Now a research associate at University of Sheffield
Saku is a Ph.D. student at the University of Tokyo, interested in natural language understanding by machines.
Now a master’s student at Tohoku University
Now a Research Manager at Facebook
Now a post-doc at Ghent University
Now a research scientist at Preferred Networks (PFN)
Now back to being a PhD student at Xerox Research Centre Europe
Now a Research Scientist at Facebook
Now an assistant professor at University of Copenhagen
Now an assistant professor at Tohoku University
Now a PhD student at University of Washington
V. Ivan Sanchez
Now an NLP researcher at Lenovo
Now back to being a PhD student at MIT
Now a student at Toyota Technological Institute at Chicago
Now a ML engineer at PolyAI
Cape is an open-source, large-scale, open-domain Question Answering system.
stat-nlp-book is an interactive Statistical NLP book in Python, used for our StatNLP course from 2016 onwards.
stat-nlp-book-scala is an interactive Statistical NLP book in Scala, used for our StatNLP course in 2015/16.
Jack the Reader is a Machine Reading framework for Question Answering, Natural Language Inference, and Link Prediction - see the paper here.
Neural Theorem Prover is an end-to-end differentiable logic reasoner, implementing the model described in End-to-end Differentiable Proving.
Inferbeddings is a link prediction framework that allows including First-Order background knowledge via adversarial training - the model is described in Adversarial Sets for Regularising Neural Link Predictors.
wolfe is a framework for building rich machine learning models, based on functional programming, factor graphs, optimization and composition.
ucleed is a biomedical event extractor that ranked first in several tracks of the BioNLP 2011 shared task.
thebeast is a Markov Logic inference and learning engine.
What’s Wrong With My NLP? is a visualizer for NLP problems.
MLQA, a multi-way aligned extractive QA evaluation benchmark, contains QA instances in seven languages: English, Arabic, German, Spanish, Hindi, Vietnamese, and Simplified Chinese.
A collection of 32k task instances based on real-world rules, with crowd-generated questions and scenarios that require both interpreting the rules and applying background knowledge.