Amazon scientists, in collaboration with researchers from the University of Sheffield, are making a large-scale fact extraction and verification dataset publicly available for the first time. The dataset, which comprises more than 185,000 evidence-backed claims, is intended to catalyze research and development on fact extraction and verification in software applications and cloud-based services that perform automatic information extraction.

Arpit Mittal, Amazon senior machine learning scientist

In a blog post published earlier today, Amazon scientist Arpit Mittal says the dataset could be used to train artificial intelligence (AI) systems to extract verifiable information, adding that this effort could further advance AI systems capable of answering any question with verifiable information.

In addition to making the dataset publicly available, the researchers are organizing a public machine-learning competition, inviting academic and industry colleagues to tackle the problem. Results from the competition will be presented at the Fact Extraction and Verification (FEVER) workshop, to be held later this year in conjunction with the 2018 Conference on Empirical Methods in Natural Language Processing.

More details about the dataset and broader initiative are available on the project website and in the researchers’ FEVER paper, which will be published next month in the proceedings of the 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2018).