
AI on Air: Exploring GPT-4o

DALL-E illustration showcasing my audio demos and conversations with GPT-4o

This week, OpenAI announced the release of GPT-4o, the latest iteration of its language model with new capabilities across multiple modalities. The “o” in GPT-4o stands for “omni,” highlighting its enhanced ability to reason in real time across audio, vision, and text.

This makes it especially useful for those working with audio, allowing easy multilingual communication and improved audio analysis.

As a media technologist with years of experience in audio at KUNM FM, NPR News in Washington, and National Geographic, and currently leading AI trainings for newsrooms, I have explored GPT-4o’s potential through various practical applications.

This post delves into four real-world examples and discusses the benefits and drawbacks of using GPT-4o compared to traditional methods, including the environmental costs associated with AI usage.

Scenario 1: KUNM FM Audio Story

Drawing from my experience working at KUNM in Albuquerque, New Mexico, where there is a growing Hispanic community, I asked GPT-4o to translate a news story from English into Spanish and then, for fun, into German.

The AI handled the task, demonstrating its ability to facilitate multilingual communication in real time. This capability could be significant for media outlets aiming to reach a broader, more diverse audience.

Scenario 2: Helen, My Grandmother

In this scenario, I shared a recording of my late grandmother from 1958 and requested a translation into Persian. GPT-4o translated the content, preserving some of the emotional context. Although the accent wasn’t perfect on some words, it was still impressive.

This highlights GPT-4o’s potential in preserving and sharing oral histories and personal stories across different languages and cultures, which is especially relevant to my work in global heritage and cultural preservation.

Scenario 3: The Long Sought Podcast

For the third example, I played a podcast episode and asked GPT-4o to translate it into Spanish. The AI provided a summary and then translated a synthetic TTS voice segment into French.

This demonstrates GPT-4o’s versatility in handling various audio formats and content types, making it a new tool for podcasters looking to reach international audiences. My background in podcasting and audio storytelling underscores the importance of such a tool for expanding reach and accessibility.

Scenario 4: The Times of Karachi

In the final example, I pasted a news story from the Times of Karachi and asked GPT-4o to translate it into Urdu. The AI not only provided a translation but also offered a way to verify and improve its output through feedback from native speakers. I’ll be checking this Urdu translation with several Pakistani journalists and will offer feedback.

This collaborative approach ensures the quality and reliability of translations, crucial for maintaining journalistic integrity, especially when working with international partners, as I have done at NPR News.

Advantages of GPT-4o in the Media Industry:

  1. Efficiency and Speed: GPT-4o can process and translate content almost instantaneously, significantly reducing the time required for multicultural content creation.
  2. Multilingual Capabilities: As these translations become more reliable, GPT-4o can enable media outlets to reach a global audience, breaking down language barriers.

Drawbacks and Considerations:

  1. Quality Assurance: While OpenAI presents GPT-4o as highly accurate, it is not infallible. Human oversight is still necessary to ensure translations are contextually appropriate and culturally sensitive.
  2. Environmental Impact: The data centers powering AI models like GPT-4o consume significant energy and water resources for cooling. This environmental cost is a critical consideration for the sustainable use of AI technologies. It is essential for companies to adopt sustainable practices, such as using renewable energy sources and optimizing data center efficiency, to mitigate these impacts.
  3. Job Displacement: Automation may lead to job displacement in roles traditionally performed by human translators and content creators. However, it also creates opportunities for new roles focused on AI oversight and integration.

OpenAI says GPT-4o has safety built in by design and that it uses filtered training data and refined behavior to ensure safe outputs. The company notes that it has done extensive testing and received feedback from over 70 experts who helped identify and mitigate risks.

A Personal Note

As I navigate the intersection of technology and media, tools like GPT-4o remind me, with some caution, of the transformative potential we have at our fingertips. They open doors to new possibilities, not just for reaching global audiences but also for preserving and sharing the voices and stories that matter most to us.

DALL-E illustrating TulipAI trainings with diverse journalists from around the world engaged in AI in the newsroom activities.


AI on Air: Exploring GPT-4o was originally published in Chatbots Life on Medium, where people are continuing the conversation by highlighting and responding to this story.
