New project helps Amazon create dataset to advance multilingual language understanding research

Researchers at the National Robotarium, hosted by Heriot-Watt University and the University of Edinburgh, have created a Spoken Language Understanding Resource Package (SLURP) aimed at making it easier for AI and machines to understand spoken questions and commands from humans.

One of the items included in the package is an open dataset in English spanning 18 domains. Amazon recently localised and translated the English-only SLURP dataset into 50 typologically diverse languages, creating a new multilingual dataset called MASSIVE. 

Although virtual assistants built on spoken-language understanding, such as Alexa, have made major advances in capability over the past decade, academic and industrial natural language understanding (NLU) efforts worldwide still cover only a small subset of the world's 7,000+ languages. One difficulty in creating massively multilingual NLU models is the lack of labelled data for training and evaluation.

The newly created MASSIVE dataset, which contains one million labelled utterances spanning 51 languages and is released alongside open-source code, fills this gap and helps advance the state of the art in massively multilingual NLU research.

In conjunction with the dataset release, Amazon has launched a global competition challenging researchers to build the best spoken-language understanding systems using the dataset. Results from the competition will be presented at Empirical Methods in Natural Language Processing (EMNLP), a leading conference on natural language processing to be held later this year.

The National Robotarium, based in Edinburgh, is a world-leading centre for robotics and artificial intelligence that accelerates research from laboratory to market to deliver substantial benefits for society.

It is supported by £21 million from the UK Government and £1.4 million from the Scottish Government through the £1.3 billion Edinburgh and South East Scotland City Region Deal, a 15-year investment programme jointly funded by both governments and regional partners.

Project lead and Professor of Conversational AI at the National Robotarium, Verena Rieser, said:

“Virtual assistants have until now only supported a tiny fraction of the world's 7,000-plus languages. The technology is an increasing presence in homes and businesses, so it's exciting that the National Robotarium has played a part in making it much more relevant and accessible for potentially millions more people. Significantly, it shows the practical applications of AI in the real world and underlines the importance of improving our conversational interactions with AI technology.”

National Robotarium CEO, Stewart Miller, said:

“Industry collaboration that impacts both business and society is a key focus of the research being developed at the National Robotarium. Helping people around the world to use voice AI systems in their native tongue is an excellent example of the solutions we're delivering to global challenges and local needs.

“With the combined robotic and AI experience of Heriot-Watt and the University of Edinburgh, the National Robotarium is paving the way for the UK to take a globally significant role at the forefront of developments in AI and machine learning technology.”