How GPT-5 Thinks — OpenAI VP of Research Jerry Tworek

Update: 2025-10-16

Description

What does it really mean when GPT-5 “thinks”? In this conversation, OpenAI’s VP of Research Jerry Tworek explains how modern reasoning models work in practice: why pretraining and reinforcement learning (RL/RLHF) are both essential, what that on-screen “thinking” actually does, and when extra test-time compute helps (or doesn’t). We trace the evolution from o1 (a tech demo good at puzzles) to o3 (the tool-use shift) to GPT-5 (Jerry calls it “o3.1-ish”), and talk through verifiers, reward design, and the real trade-offs behind “auto” reasoning modes.

We also go inside OpenAI: how research is organized, why collaboration is unusually transparent, and how the company ships fast without losing rigor. Jerry shares the backstory on competitive-programming results like ICPC, what they signal (and what they don’t), and where agents and tool use are genuinely useful today. Finally, we zoom out: could pretraining + RL be the path to AGI?

This is the MAD Podcast: AI for the 99%. If you’re curious about how these systems actually work (without needing a PhD), this episode is your map to the current AI frontier.

OpenAI

Website - https://openai.com

X/Twitter - https://x.com/OpenAI

Jerry Tworek

LinkedIn - https://www.linkedin.com/in/jerry-tworek-b5b9aa56

X/Twitter - https://x.com/millionint

FirstMark

Website - https://firstmark.com

X/Twitter - https://twitter.com/FirstMarkCap

Matt Turck (Managing Director)

LinkedIn - https://www.linkedin.com/in/turck/

X/Twitter - https://twitter.com/mattturck

(00:00) Intro

(01:01) What Reasoning Actually Means in AI

(02:32) Chain of Thought: Models Thinking in Words

(05:25) How Models Decide Thinking Time

(07:24) Evolution from o1 to o3 to GPT-5

(11:00) Before OpenAI: Growing up in Poland, Dropping out of School, Trading

(20:32) Working on Robotics and Rubik's Cube Solving

(23:02) A Day in the Life: Talking to Researchers

(24:06) How Research Priorities Are Determined

(26:53) Collaboration vs IP Protection at OpenAI

(29:32) Shipping Fast While Doing Deep Research

(31:52) Using OpenAI's Own Tools Daily

(32:43) Pre-Training Plus RL: The Modern AI Stack

(35:10) Reinforcement Learning 101: Training Dogs

(40:17) The Evolution of Deep Reinforcement Learning

(42:09) When GPT-4 Seemed Underwhelming at First

(45:39) How RLHF Made GPT-4 Actually Useful

(48:02) Unsupervised vs Supervised Learning

(49:59) GRPO and How DeepSeek Accelerated US Research

(53:05) What It Takes to Scale Reinforcement Learning

(55:36) Agentic AI and Long-Horizon Thinking

(59:19) Alignment as an RL Problem

(1:01:11) Winning ICPC World Finals Without Specific Training

(1:05:53) Applying RL Beyond Math and Coding

(1:09:15) The Path from Here to AGI

(1:12:23) Pure RL vs Language Models
