WhistleBuzz – Smart News on AI, Business, Politics & Global Trends
International

Africa has thousands of languages. Can we train AI for all of them?

By Editor-In-Chief · December 13, 2025 · 5 Mins Read


How do you teach a language model to read if there is nothing to read? This is the problem facing developers across the African continent as they seek to train AI to understand and respond to prompts in local languages.

To train a language model, you need data. For languages like English, developers have articles, books, and manuals easily accessible on the internet. However, for most of Africa’s languages (estimated to number between 1,500 and 3,000), very few written resources are available. Vukosi Marivate, a computer science professor at the University of Pretoria in South Africa, uses the number of Wikipedia articles in a language to illustrate how much data is available. There are over 7 million articles in English. Tigrinya, spoken by about 9 million people in Ethiopia and Eritrea, has 335, while Akan, Ghana’s most widely spoken mother tongue, has none.

Of these thousands of languages, only 42 are currently supported by language models. Of Africa’s 23 scripts and alphabets, only three are supported: Latin, Arabic, and Ge’ez (used in the Horn of Africa). This underdevelopment “comes from a financial perspective,” says Chinasa T. Okolo, founder of Technēculturǎ, a research institute working to advance global equity in AI. “Even though there are more people who speak Swahili than Finnish, Finland is a better market for companies like Apple and Google.”

Okolo warns that unless more language models are developed, the implications for the entire continent could be dire. “We’re going to continue to see people being excluded from opportunities,” she told CNN. As the continent looks to develop its own AI infrastructure and capabilities, those who do not speak one of these 42 languages risk being left behind.

Chinasa T. Okolo, founder of Technēculturǎ.

To avoid this, Okolo says, AI developers across the continent “need to rethink the way they approach model development in the first place.”

This is what Marivate did. He heads the South African arm of the African Next Voices project, which has made recordings in 18 languages across South Africa, Kenya and Nigeria. Over two years, the three teams collected 9,000 hours of recordings from people of different ages and locations, creating a dataset that AI developers across the continent can use to train their models.

The researchers sometimes gave native speakers scripts to read, but mostly they gave prompts, recorded the responses, and transcribed them. In the case of isiNdebele, a language spoken in South Africa and Zimbabwe, they had great difficulty finding written material, so they relied on government manuals for goat herders to create prompts.

African Next Voices has not collected enough data to train large language models (LLMs) like ChatGPT and Gemini, which can cover thousands of topics in detail. But Marivate said he focused the recordings on specific topics he considered most important, such as health and agriculture.

Using small datasets to create generalized models results in high error rates, while small, focused datasets can have high accuracy within the limited scope of specialized models, explained Nyalleng Moorosi, a researcher at the Distributed AI Research Institute (DAIR) who is not affiliated with the African Next Voices project.

For her, it’s a matter of “prioritizing error.” “If someone just wants to know what’s going on in downtown Nairobi, mistakes there can be tolerated,” Moorosi says, but mistakes in models dealing with topics such as banking or health care can have serious consequences.

“We need to make sure that the people building these models understand the culture enough to understand the consequences and understand the weight of these errors,” Moorosi told CNN.

Nyalleng Moorosi, researcher at the Distributed AI Research Institute.

Words and symbols have multiple meanings, she says. For example, the St George’s Cross has associations with right-wing politics in the UK, but this is not obvious to someone from Ghana or Lesotho. This problem is especially noticeable in languages with fewer resources. “There is a lot of contextual knowledge, but very little documentation,” she says.

A DAIR investigation found that social media websites were unable to recognize and remove hate speech related to ethnic violence in Ethiopia, in part because automated systems and human moderators were unfamiliar with the slang terminology used.

Without this cultural understanding, Moorosi says, it is impossible for “AI systems to behave and make decisions that are consistent with our beliefs and values.”

While many Africans speak multiple languages, including African and European languages that are already supported by language models, Moorosi believes the goal should be “to make AI accessible in all languages, even those with a single speaker.” All languages deserve expression or preservation.

But lack of data is not the only challenge facing AI developers in Africa. Most African languages have not been codified through dictionaries or grammatical studies. In Kinyarwanda, the language of Rwanda, there are three common ways to spell the country’s name: uRwanda, Urwanda, and uRwanda. Without spelling rules, even the most basic text processing becomes difficult.
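To make the spelling problem concrete, here is a minimal Python sketch of how unstandardized spellings can split a simple word count. It is an illustration under stated assumptions, not a method used by any of the researchers quoted here: the token list, the `count_forms` helper, and the `CANONICAL` table are all hypothetical, and the choice of which spelling counts as canonical is arbitrary.

```python
from collections import Counter


def count_forms(tokens, canonical=None):
    """Count surface forms, optionally mapping known variants to one spelling."""
    canonical = canonical or {}
    return Counter(canonical.get(token, token) for token in tokens)


# Two of the spellings quoted above, as they might appear across a corpus.
# The list itself is made up for illustration.
tokens = ["uRwanda", "Urwanda", "uRwanda", "Urwanda"]

# Hypothetical variant table; which form is treated as canonical is arbitrary.
CANONICAL = {"Urwanda": "uRwanda"}

raw = count_forms(tokens)                # variants counted as different words
merged = count_forms(tokens, CANONICAL)  # variants folded into one entry

print(raw)
print(merged)
```

Without the table, the same word is counted as two separate entries with two occurrences each. With it, the four occurrences collapse into one entry. Building such a table requires someone to decide which spellings are equivalent, which is the kind of codification most of these languages lack.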

Another problem is the lack of data centers. The African Union has warned that only 10% of the continent’s data center needs were being met in 2024, creating a bottleneck for Africa’s AI hopes.

Marivate’s concern is that if models are not created for these smaller languages, the languages will “die.” If developers were to create a dataset for a language that might not even have a writing system, “the model would have to change,” he added.

The African Next Voices project has just finished collecting and transcribing data. Marivate said that while he is not currently working on a new language, he is already thinking about which language to develop next.


