Our paper, Question and Answer Test-Train Overlap in Open-Domain Question Answering, has won a Best Paper award at EACL 2021! Congrats to authors Patrick, Pontus and Sebastian.
Our paper Complex Query Answering with Neural Link Predictors, based on Erik’s MSc thesis supervised by Pasquale, won an Outstanding Paper Award at ICLR 2021!
PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them has been released on arXiv! Using a huge collection of 65M automatically generated QA pairs, we build QA models that are twice as fast and more accurate, yielding a flexible QA system that can be optimised for memory, speed or accuracy. Check out the PAQ data here; code and models are coming soon!
Answering Complex Open-Domain Questions with Multi-Hop Dense Retrieval has been accepted at ICLR 2021. This simple recursive retrieval approach achieves state-of-the-art results without requiring additional resources such as hyperlink networks.
Stereotype and Skew: Quantifying Gender Bias in Pre-trained and Fine-tuned Language Models has been accepted at EACL 2021. We propose two intuitive metrics, skew and stereotype, that quantify and analyse the gender bias present in contextual language models when tackling the WinoBias pronoun resolution task.
Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets has been accepted at EACL 2021. If you’re doing open-domain QA, evaluate your model on our test sets to see whether it generalizes or is just memorizing the training set. Get the data here.
Complex Query Answering with Neural Link Predictors, our state-of-the-art approach for answering complex queries on large and incomplete Knowledge Graphs, will appear at ICLR 2021 as an Oral – top 2% of all publications!
Affiliated Faculty (Lecturer)
Senior Research Fellow, Principal Investigator for H2020 CLARIFY
Now a senior lecturer at University of Cambridge
Now a Research Scientist at DeepMind
Now a PhD student at MIT
Now a research associate at University of Sheffield
Now back to being a PhD student at the University of Tokyo.
Now a master’s student at Tohoku University
Now back to being a PhD student at the Chinese Academy of Sciences.
Now a Research Manager at Facebook
Now a post-doc at University of Ghent
Now a research scientist at Preferred Networks (PFN)
Now back to being a PhD student at Xerox Research Centre Europe
Now a Research Scientist at Facebook
Now an associate professor at University of Copenhagen
Now an assistant professor at Tohoku University
Now a PhD student at University of Washington
V. Ivan Sanchez
Now an NLP researcher at Lenovo
Now back to being a PhD student at MIT
Now a student at Toyota Technological Institute at Chicago
Now an ML engineer at PolyAI
Pretrained Language Models for Biomedical and Clinical Tasks: Understanding and Extending the State-of-the-Art. In ClinicalNLP 2020 @ EMNLP 2020.
In this work we investigate this annotation methodology and apply it in three different settings, collecting a total of 36,000 samples with progressively stronger models in the annotation loop. This allows us to explore questions such as the reproducibility of the adversarial effect, transfer from data collected with varying model-in-the-loop strengths, and generalisation to data collected without a model. We find that training on adversarially collected samples leads to strong generalisation to non-adversarially collected datasets, yet with progressive performance deterioration with increasingly stronger models-in-the-loop. Furthermore, we find that stronger models can still learn from datasets collected with substantially weaker models-in-the-loop. When trained on data collected with a BiDAF model in the loop, RoBERTa achieves 39.9 F1 on questions that it cannot answer when trained on SQuAD, only marginally lower than when trained on data collected using RoBERTa itself (41.0 F1).
KILT is a resource for training, evaluating and analyzing NLP models on Knowledge Intensive Language Tasks. KILT has been built from 11 datasets representing 5 tasks. All these datasets have been grounded in a single pre-processed Wikipedia dump, allowing for fairer and more consistent evaluation and enabling new task setups such as multitask and transfer learning with minimal effort. KILT also provides tools to analyze and understand the predictions made by models, as well as the evidence they provide for their predictions.
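To make the shared-grounding idea concrete, here is a minimal sketch of how a KILT-style record ties an input query to answers with provenance in the common Wikipedia dump. The field names follow the released JSONL format, but the example record and helper function below are illustrative assumptions, not taken from the actual data; check the KILT repository for the exact schema.

```python
import json

# Illustrative KILT-style record (assumed values): each example carries an
# "input" (the query) and a list of "output" entries, where each output may
# hold a gold answer plus provenance pointing into the shared Wikipedia dump.
record_json = """
{
  "id": "example-0",
  "input": "Who wrote Hamlet?",
  "output": [
    {
      "answer": "William Shakespeare",
      "provenance": [
        {"wikipedia_id": "32897", "title": "Hamlet"}
      ]
    }
  ]
}
"""

record = json.loads(record_json)

def gold_answers(rec):
    """Collect all gold answer strings from a KILT-style record."""
    return [o["answer"] for o in rec.get("output", []) if "answer" in o]

def provenance_titles(rec):
    """Collect the Wikipedia page titles cited as evidence."""
    return [
        p["title"]
        for o in rec.get("output", [])
        for p in o.get("provenance", [])
    ]

print(gold_answers(record))       # ['William Shakespeare']
print(provenance_titles(record))  # ['Hamlet']
```

Because every dataset is re-grounded in the same dump, the same `provenance` fields can be compared across all 11 datasets, which is what enables the consistent evaluation described above.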
MLQA is a multi-way aligned extractive QA evaluation benchmark containing QA instances in 7 languages: English, Arabic, German, Spanish, Hindi, Vietnamese and Simplified Chinese.
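Since MLQA is an extractive benchmark distributed in a SQuAD-style JSON layout, a record can be traversed as sketched below. The sample record and helper are hypothetical illustrations of that layout, not actual dataset content.

```python
# Hypothetical SQuAD-style record, as used by extractive QA benchmarks
# like MLQA: articles contain paragraphs, each with a context passage
# and a list of questions whose answers are spans of that context.
sample = {
    "data": [{
        "title": "Hamlet",
        "paragraphs": [{
            "context": "Hamlet is a tragedy written by William Shakespeare.",
            "qas": [{
                "id": "q1",
                "question": "Who wrote Hamlet?",
                "answers": [
                    {"text": "William Shakespeare", "answer_start": 31}
                ],
            }]
        }]
    }]
}

def iter_qa(dataset):
    """Yield (question, gold answer texts) pairs from a SQuAD-style dict."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                yield qa["question"], [a["text"] for a in qa["answers"]]

for question, answers in iter_qa(sample):
    print(question, answers)  # Who wrote Hamlet? ['William Shakespeare']
```

The "multi-way aligned" property means the same question appears across language pairs (e.g. an English question over a German context), so the same traversal applies to every language combination.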
A collection of 32k task instances based on real-world rules and crowd-generated questions and scenarios requiring both the interpretation of rules and the application of background knowledge.