European AI start-ups race to improve chatbots’ language skills
European start-ups are racing to solve a problem with popular artificial intelligence chatbots: the quality of responses in languages other than English.
Helsinki-based Silo AI is set to launch an initiative on Wednesday to help build new large language models, which underlie generative AI products such as OpenAI’s ChatGPT and Google’s Bard, in European languages including Swedish, Icelandic, Norwegian and Danish.
The Finnish company joins other groups working to improve the technology behind chatbots, which give realistic answers to written prompts, in languages such as German, Hebrew and Arabic.
The moves come as companies around the world start adopting AI software built by the likes of Microsoft-backed OpenAI and Google, causing critics to express concerns about an over-reliance on a powerful, closed technology built by a small group of mostly US participants.
“A European initiative needs to . . . capture knowledge from a European perspective and we can control what kind of data is being fed into it,” said Peter Sarlin, chief executive of Silo AI.
Google’s Bard currently only works in English. OpenAI’s ChatGPT supports dozens of languages, including European languages, Hindi, Farsi and others. However, it is not equally accurate across all languages, according to those who have tested it extensively.
Silo AI is attempting to solve the issue by assembling a team of experienced AI academics from across Europe. They will build, train and operate Scandinavian-language models on the continent’s most powerful supercomputer, LUMI, which is located in Finland and has been modified to run generative AI software.
The new team’s initiative, known as SiloGen, plans to expand to more languages over time.
The issue is not purely linguistic, however. Creating models in Europe can ensure the quality of the data used for training is representative of the culture and ethics of countries outside the US, including on matters of privacy, said Sarlin.
Silo AI’s model will also be open-source, meaning it can be analysed and adapted by anyone wanting to deploy it. This is in contrast to OpenAI and Google’s closed models, with which companies may be reluctant to share their confidential or proprietary data.
Other European efforts include OpenGPT-X and LEAM, which are both German-led initiatives to develop open-source language models. The models of OpenGPT-X are being built in conjunction with German AI start-up Aleph Alpha.
When it launched last year, the group behind OpenGPT-X warned that the lack of access to details of models such as GPT-4 threatened Europe’s “digital sovereignty and market independence” in AI, which might hamper the growth of European companies and research.
Marco Trombetti, chief executive of Italian digital translation company Translated, said leading chatbots had been programmed to deliver their best results in English, which was “not fair to the rest of the world”.
To counter this, his company has created a live translation tool for ChatGPT that works in 60 languages and is aimed at improving the tool’s answers.
“It’s like a leap five years backwards, in terms of the technology, for the non-English speaking world, which effectively creates a two-speed world,” said Trombetti of the current generative AI tools.
Such concerns are not only being voiced in Europe. The Israel Innovation Authority spent Shk7.5mn (about $2.1mn) on creating the Association of Natural Language Processing. The group is trying to reverse the “poor and insufficient quality of Hebrew and Arabic speech recognition in various types of computerised systems”, said Dror Bin, the authority’s chief executive.
Bin said that with limited funding for AI research in Arabic-speaking countries and relatively few Hebrew speakers in the world, the fear was that these languages would be left behind as AI products are integrated into commercial applications such as Microsoft Office and Google Workspace.
“The quality of understanding and recognising human speech in Hebrew and Arabic is lower and constitutes a barrier to realising and applying advanced services,” he added.
Additional reporting by John Thornhill