As a medical doctor in Nigeria, Tobi Olatunji knows the stress of practicing in Africa’s busy hospitals. As a machine-learning scientist, he has a prescription for it.
“I worked at one of West Africa’s largest hospitals, where I would routinely see more than 30 patients a day — it’s a very hard job,” said Olatunji.
The need to write detailed patient notes and fill out forms makes it even harder. Paper records slowed the pace of medical research, too.
In his first years of practice, Olatunji imagined a program to plow through the mounds of paperwork, freeing doctors to help more patients.
A Side Trip in Tech
With encouragement from med school mentors, Olatunji got a master’s degree in medical informatics from the University of San Francisco and another in computer science at Georgia Tech. He started working as a machine-learning scientist in the U.S. by day and writing code on nights and weekends to help digitize Africa’s hospitals.
A pilot test during the pandemic hit a snag.
The first few doctors to use the code took 45 minutes to finish their patient notes. Feeling awkward in front of a keyboard, some health workers said they prefer pen and paper.
“We made a hard decision to invest in natural language processing and speech recognition,” he said. It’s technology he was already familiar with in his day job.
Building AI Models
“The combination of medical terminology and thick African accents produced horrible results with most existing speech-to-text software, so we knew there would be no shortcut to training our own models,” he said.
The Intron team evaluated several commercial and open-source speech recognition frameworks and large language models before choosing to build with NVIDIA NeMo, a software framework for text-based generative AI. In addition, the resulting models were trained on NVIDIA GPUs in the cloud.
“We initially tried to train with CPUs as the cheapest option, but it took forever, so we started with a single GPU and eventually grew to using several of them in the cloud,” he said.
The resulting Transcribe app captures doctors’ dictated messages with more than 92% accuracy across more than 200 African accents. It slashes the time they spend on paperwork by 6x on average, according to an ongoing study Intron is conducting across hospitals in four African countries.
“Even the doctor with the fastest typing skills in the study got a 40% speedup,” he said of the software now in use at several hospitals across Africa.
Listening to Africa’s Voices
Olatunji knew his models needed high quality audio data. So, the company created an app to capture sound bites of medical terms spoken in different accents.
To date, the app’s gathered more than a million clips from more than 7,000 people across 24 countries, including 13 African nations. It’s one of the largest datasets of its type, parts of which have been released as open source to support African speech research.
Today, Intron refreshes its models every other month as more data comes in.
Nurturing Diversity in Medtech
Very little research exists on speech recognition for African accents in a clinical setting. So, working with Africa’s tech communities like DSN, Masakhane and Zindi, Intron launched AfriSpeech-200, a developer challenge to kickstart research using its data.
Similarly, for all its sophistication, medtech lags in diversity and inclusion, so Olatunji recently launched an effort that addresses that issue, too.
Bio-RAMP Lab is a global community of minority researchers working on problems they care about at the intersection of AI and healthcare. The group already has a half dozen papers under review at major conferences.
“For seven years, I was the only Black person on every team I worked on,” he said. “There were no Black scientists or managers, even in my job interviews.”
Meanwhile, Intron is even helping hospitals in Africa find creative ways to acquire the hardware they need. It’s another challenge on the way to opening up huge opportunities.
“Once healthcare data gets digitized, you unlock a whole new world for research into areas like predictive models that can be early warning systems for epidemics — we can’t do it without data,” Olatunji said.
Watch a masterclass (starting at 20:30) with Olatunji, HuggingFace and NVIDIA on AI for speech recognition.