AI Opener: OpenAI’s Sutskever in Conversation With Jensen Huang

Like old friends catching up over coffee, two industry icons reflected on how modern AI got its start, where it’s at today and where it needs to go next.

Jensen Huang, founder and CEO of NVIDIA, interviewed AI pioneer Ilya Sutskever in a fireside chat at GTC. The talk was recorded a day after the launch of GPT-4, the most powerful AI model to date from OpenAI, the research company Sutskever co-founded.

They talked at length about GPT-4 and its forerunners, including ChatGPT. That generative AI model, though only a few months old, is already the fastest-growing computer application in history.

Their conversation touched on the capabilities, limits and inner workings of the deep neural networks that are capturing the imaginations of hundreds of millions of users.

Compared to ChatGPT, GPT-4 marks a “pretty substantial improvement across many dimensions,” said Sutskever, noting the new model can read images as well as text.

“In some future version, [users] might get a diagram back” in response to a query, he said.

Under the Hood With GPT

“There’s a misunderstanding that ChatGPT is one large language model, but there’s a system around it,” said Huang.

In a sign of that complexity, Sutskever said OpenAI uses two levels of training.

The first stage focuses on accurately predicting the next word in a series. Here, “what the neural net learns is some representation of the process that produced the text, and that’s a projection of the world,” he said.
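As a rough illustration of that first stage, here is a minimal sketch of the next-word (next-token) prediction objective in PyTorch. The tiny embedding-plus-linear model and random token IDs are placeholders for exposition, not OpenAI's actual architecture or data.

    # First stage (sketch): predict the next token in a sequence.
    # The toy embedding-plus-linear "model" stands in for a transformer stack.
    import torch
    import torch.nn as nn

    vocab_size, d_model = 1000, 64
    model = nn.Sequential(
        nn.Embedding(vocab_size, d_model),
        nn.Linear(d_model, vocab_size),
    )

    tokens = torch.randint(0, vocab_size, (8, 128))   # a batch of token sequences
    inputs, targets = tokens[:, :-1], tokens[:, 1:]   # each position predicts the next token

    logits = model(inputs)                            # (batch, seq_len - 1, vocab_size)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, vocab_size), targets.reshape(-1)
    )
    loss.backward()                                   # gradients for one optimizer step

Repeated over a vast text corpus, minimizing this loss is what pushes the network toward the "projection of the world" Sutskever describes.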

The second “is where we communicate to the neural network what we want, including guardrails … so it becomes more reliable and precise,” he added.
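OpenAI's actual second stage combines instruction tuning with reinforcement learning from human feedback; as a simplified, hypothetical sketch of the supervised part, the same next-token loss can be applied to a curated prompt-and-response pair, with the loss computed only on the response tokens.

    # Second stage (sketch): fine-tune on a prompt/response pair so the model
    # learns what we want it to say. Data and model here are hypothetical;
    # the real pipeline also uses reinforcement learning from human feedback.
    import torch
    import torch.nn as nn

    vocab_size, d_model = 1000, 64
    model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size))

    prompt = torch.randint(0, vocab_size, (1, 16))     # the user's request
    response = torch.randint(0, vocab_size, (1, 32))   # the desired answer
    sequence = torch.cat([prompt, response], dim=1)

    inputs, targets = sequence[:, :-1], sequence[:, 1:]
    logits = model(inputs)

    # Score only the response tokens: the prompt is context, not a target.
    mask = torch.zeros_like(targets, dtype=torch.bool)
    mask[:, prompt.size(1) - 1:] = True

    loss = nn.functional.cross_entropy(logits[mask], targets[mask])
    loss.backward()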

Present at the Creation

While he’s at the swirling center of modern AI today, Sutskever was also present at its creation.

In 2012, he was among the first to show the power of deep neural networks trained on massive datasets. In the ImageNet competition that year, the AlexNet model he demonstrated with AI pioneers Geoff Hinton and Alex Krizhevsky recognized images far more accurately than any competing approach.

Huang referred to their work as the Big Bang of AI.

The results “broke the record by such a large margin, it was clear there was a discontinuity here,” Huang said.

The Power of Parallel Processing

Part of that breakthrough came from the parallel processing the team applied to its model with GPUs.

“The ImageNet dataset and a convolutional neural network were a great fit for GPUs that made it unbelievably fast to train something unprecedented,” Sutskever said.
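That fit comes down to the fact that convolutions and matrix multiplies are massively parallel operations. As a minimal illustration in PyTorch (a toy network, not AlexNet), the same training code runs on a GPU simply by moving the model and data there:

    # Sketch: a small convolutional network on ImageNet-sized inputs.
    # Moving the model and batch to a GPU lets every convolution and
    # matrix multiply run in parallel across thousands of cores.
    import torch
    import torch.nn as nn

    device = "cuda" if torch.cuda.is_available() else "cpu"

    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(16 * 112 * 112, 10),
    ).to(device)

    images = torch.randn(32, 3, 224, 224, device=device)   # a batch of 224x224 RGB images
    labels = torch.randint(0, 10, (32,), device=device)

    loss = nn.functional.cross_entropy(model(images), labels)
    loss.backward()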

That early work ran on two GeForce GTX 580 GPUs in a University of Toronto lab. Today, tens of thousands of the latest NVIDIA A100 and H100 Tensor Core GPUs in the Microsoft Azure cloud service handle training and inference on models like ChatGPT.

“In the 10 years we’ve known each other, the models you’ve trained [have grown by] about a million times,” Huang said. “No one in computer science would have believed the computation done in that time would be a million times larger.”

“I had a very strong belief that bigger is better, and a goal at OpenAI was to scale,” said Sutskever.
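To put that million-fold figure in perspective, a quick back-of-the-envelope calculation (an illustration, not a quote from the talk) shows what it implies per year:

    # A million-fold increase over roughly ten years implies the
    # year-over-year growth factor below: about 4x per year,
    # i.e. a doubling roughly every six months.
    growth = 1_000_000 ** (1 / 10)
    print(f"~{growth:.1f}x per year")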

A Billion Words

Along the way, the two shared a laugh.

“Humans hear a billion words in a lifetime,” Sutskever said.

“Does that include the words in my own head?” Huang shot back.

“Make it 2 billion,” Sutskever deadpanned.

The Future of AI

They ended their nearly hour-long talk discussing the outlook for AI.

Asked if GPT-4 has reasoning capabilities, Sutskever suggested the term is hard to define and the capability may still be on the horizon.

“We’ll keep seeing systems that astound us with what they can do,” he said. “The frontier is in reliability, getting to a point where we can trust what it can do, and that if it doesn’t know something, it says so,” he added.

“Your body of work is incredible … truly remarkable,” said Huang in closing the session. “This has been one of the best beyond-Ph.D. descriptions of the state of the art of large language models.”

To get all the news from GTC, watch the keynote below.
