WhistleBuzz – Smart News on AI, Business, Politics & Global Trends

Africa has thousands of languages. Can we train AI for all of them?

By Editor-In-Chief, December 13, 2025

How can you teach a language model to read if there is nothing for it to read? This is the problem facing developers across the African continent as they try to train AI to understand and respond to prompts in local languages.

To train a language model, you need data. For languages like English, developers have articles, books, and manuals readily available on the internet. But for most of Africa’s languages (estimated at between 1,500 and 3,000), very few written resources exist. Vukosi Marivate, a computer science professor at the University of Pretoria in South Africa, uses Wikipedia article counts to illustrate the gap. English has more than 7 million articles. Tigrinya, spoken by about 9 million people in Ethiopia and Eritrea, has 335, while Akan, Ghana’s most widely spoken mother tongue, has none.

Of these thousands of languages, only 42 are currently supported by language models. Of Africa’s 23 scripts and alphabets, only three are supported: Latin, Arabic, and Ge’ez (used in the Horn of Africa). This underdevelopment “comes from a financial perspective,” says Chinasa T. Okolo, founder of Technēcultură, a research institute working to advance global equity in AI. “Even though there are more people who speak Swahili than Finnish, Finland is a better market for companies like Apple and Google.”

Okolo warns that unless more language models are developed, the implications for the entire continent could be dire. “We’re going to continue to see people being excluded from opportunities,” she told CNN. As the continent looks to develop its own AI infrastructure and capabilities, those who do not speak one of these 42 languages risk being left behind.


To avoid this, Okolo says, AI developers across the continent “need to rethink the way they approach model development in the first place.”

This is what Marivate has done. He heads the South African arm of the African Next Voices project, which has made recordings in 18 languages across South Africa, Kenya and Nigeria. Over two years, the three teams collected 9,000 hours of recordings from people of different ages and locations, creating a dataset that AI developers across the continent can use to train their models.

The researchers sometimes gave native speakers scripts to read, but mostly they gave prompts, recorded the responses, and transcribed them. For isiNdebele, a language spoken in South Africa and Zimbabwe, the team had such difficulty finding written material that it relied on a government manual for goat herders to create prompts.

African Next Voices has not collected enough data to train a large language model (LLM) like ChatGPT or Gemini, which can cover thousands of topics in detail. Instead, Marivate said, the project focused its recordings on the topics it considered most important, such as health and agriculture.

Using a small dataset to build a general-purpose model leads to high error rates, whereas a small, focused dataset can deliver high accuracy within the limited scope of a specialized model, explained Nyalleng Moorosi, a researcher at the Distributed AI Research Institute (DAIR) who is not affiliated with the African Next Voices project.

For her, it’s a matter of “prioritizing error.” “If someone just wants to know what’s going on in downtown Nairobi, mistakes there can be tolerated,” Moorosi says, but mistakes in models dealing with topics such as banking or health care can have serious consequences.

“We need to make sure that the people building these models understand the culture enough to understand the consequences and understand the weight of these errors,” Moorosi told CNN.


Words and symbols have multiple meanings, she says. For example, the St George’s Cross has associations with right-wing politics in the UK, but this is not obvious to someone from Ghana or Lesotho. The problem is especially pronounced in languages with fewer resources. “There is a lot of contextual knowledge, but very little documentation,” she says.

A DAIR investigation found that social media websites were unable to recognize and remove hate speech related to ethnic violence in Ethiopia, in part because automated systems and human moderators were unfamiliar with the slang terminology used.

Without this cultural understanding, Moorosi says, it is impossible for “AI systems to behave and make decisions that are consistent with our beliefs and values.”

While many Africans speak multiple languages, including African and European languages that are already supported by language models, Moorosi believes the goal should be “to make AI accessible in all languages, even those with a single speaker.” All languages deserve expression or preservation.

But lack of data is not the only challenge facing AI developers in Africa. Most African languages have not been codified through dictionaries or grammatical studies. In Kinyarwanda, the language of Rwanda, there are three common ways to spell the country’s name: uRwanda, Urwanda, and u Rwanda. Without spelling rules, even the most basic text processing becomes difficult.
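
One way to see why inconsistent spelling complicates text processing is a simple word count. The sketch below is illustrative only and is not taken from any project described in this article: spelling_key is a hypothetical helper, the list of mentions is invented, and folding case and spaces is just one assumed normalization strategy that would not cover most real orthographic variation.

```python
from collections import Counter


def spelling_key(word: str) -> str:
    """Collapse case and internal spaces so variant spellings share one key.

    Hypothetical helper for illustration; real orthographic variation
    usually needs language-specific rules, not just case folding.
    """
    return "".join(word.split()).casefold()


# Variant spellings of the country's name; the frequencies are made up.
mentions = ["uRwanda", "Urwanda", "uRwanda", "Urwanda", "Urwanda"]

# Exact matching treats every spelling as a different word.
exact = Counter(mentions)

# Grouping by the normalized key merges the variants into one entry.
grouped = Counter(spelling_key(m) for m in mentions)

print(exact)
print(grouped)
```

With exact matching, a search or frequency count splits one word across several entries. Any fix depends on someone first deciding which spellings count as the same word, which is exactly the codification work that most of these languages still lack.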

Another problem is the lack of data centers. The African Union has warned that only 10% of the continent’s data center needs were being met in 2024, creating a bottleneck for Africa’s AI ambitions.

The concern for Marivate is that if models are not created for these small languages, they will “die.” If developers were to create a dataset for a language that might not even have a writing system, “the model would have to change,” he added.

The African Next Voices project has just finished collecting and transcribing its data. Marivate said that while he is not currently working on a new language, he is already thinking about which one to take on next.


