Google Launches WAXAL: Giving 100 Million Africans a Voice in AI
- Discovery Community

Google and African Institutions Launch WAXAL to Power AI in 21 African Languages
Google, in partnership with leading African universities and community organisations, has unveiled WAXAL, a large-scale open speech dataset designed specifically for African languages. The initiative aims to unlock a new generation of artificial intelligence tools that can genuinely understand and speak the languages used across the continent.
Built with thousands of hours of real speech data, WAXAL is set to support research, education, and locally developed technologies. More importantly, it tackles one of AI’s most persistent problems: the historical exclusion of African languages from the data used to train modern systems.
What Google Announced
On February 2, 2026, Google and a coalition of African universities and research institutions officially launched WAXAL. The project’s mission is to make it easier for researchers, developers, and startups to create speech technology that reflects how Africans actually speak.
WAXAL contains over 1,250 hours of transcribed speech across 21 Sub-Saharan African languages, as well as more than 20 hours of studio-quality recordings for building high-fidelity synthetic voices. The dataset is now publicly available, allowing anyone working on African speech technology to build without starting from zero.
In practical terms, this means AI models can finally be trained to recognise, understand, and generate speech in languages that have long been overlooked by mainstream technology.
Why African Speech Data Has Been a Major Challenge
Speech-based AI systems depend on vast amounts of training data. Without recordings of people speaking a language in different accents, speeds, and contexts, models struggle with tasks like transcription and voice recognition.
Africa is home to more than 2,000 languages, yet only a small fraction have had enough digital data to support modern AI development. As a result, many voice-enabled tools, from virtual assistants to automated customer service platforms, either work poorly in local languages or do not support them at all.
WAXAL directly addresses this gap by providing foundational data for building:
- Speech recognition systems
- Text-to-speech tools
- Voice-driven applications tailored to African users
Languages Covered by WAXAL
The first release of WAXAL includes a broad linguistic and geographic spread:
Acholi, Akan, Dagaare, Dagbani, Dholuo, Ewe, Fante, Fulani (Fula), Hausa, Igbo, Ikposo (Kposo), Kikuyu, Lingala, Luganda, Malagasy, Masaaba, Nyankole, Rukiga, Shona, Soga (Lusoga), Swahili, and Yoruba.
This mix includes both widely spoken languages and those that have historically received limited technological attention, helping to reduce long-standing digital inequality.
Built in Africa, Not Just for Africa
One of WAXAL’s most significant features is how it was created. The dataset was built over three years by African academic and community organisations working directly with local speakers, rather than being collected remotely.
Institutions such as Makerere University (Uganda), the University of Ghana, and Digital Umuganda (Rwanda) led data collection efforts, with technical support from Google Research Africa. Importantly, these institutions retain full ownership of the data, setting a new standard for equitable AI development partnerships.
Aisha Walcott-Bryant, Head of Google Research Africa, described WAXAL as “scientific infrastructure” rather than just a dataset. She noted that it gives African students, researchers, and entrepreneurs the tools to build technology “on their own terms, in their own languages,” with the potential to impact more than 100 million people.
Implications for Education, Health, and Business
The benefits of WAXAL extend beyond academic research. With reliable speech data, developers can create tools that:
- Support education in local languages
- Improve access to healthcare information
- Enable voice-based services for people with limited literacy
At the University of Ghana, more than 7,000 volunteers contributed their voices to the project. According to Professor Isaac Wiafe, the dataset has already helped train a new generation of AI researchers and inspired innovation in agriculture, education, and health technology.
Similarly, Joyce Nakatumba-Nabende of Makerere University explained that WAXAL has strengthened local research capacity in Uganda, enabling student-led and faculty-led projects focused on speech technologies grounded in real community needs.
Why This Matters Now
As AI tools become more embedded in daily life, language access is rapidly becoming a new form of digital inclusion. Without support for local languages, millions risk exclusion from essential services such as banking, healthcare, and government platforms that rely on voice interaction.
By releasing WAXAL as an open dataset, Google and its African partners are lowering the barriers to innovation. Researchers no longer need to spend years collecting speech data before building useful tools. Startups can prototype faster, and universities can focus on improving models rather than assembling basic resources.
What Comes Next
WAXAL is available starting today through Google’s Africa blog, and its open nature allows it to evolve as more researchers and developers build upon it. While it does not solve every challenge facing African AI development, it addresses one of the most fundamental: access to data.
For a continent rich in linguistic diversity, WAXAL could mark the difference between being an afterthought in AI development and becoming a driving force in shaping how intelligent systems understand human speech.




