Latent Space: The AI Engineer Podcast

World Models & General Intuition: Khosla's largest bet since LLMs & OpenAI

Update: 2025-12-06

Description

From building Medal into a 12M-user game clipping platform with 3.8B highlight moments to turning down a reported $500M offer from OpenAI (https://www.theinformation.com/articles/openai-offered-pay-500-million-startup-videogame-data) and raising a $134M seed from Khosla (https://techcrunch.com/2025/10/16/general-intuition-lands-134m-seed-to-teach-agents-spatial-reasoning-using-video-game-clips/) to spin out General Intuition, Pim is betting that world models trained on peak human gameplay are the next frontier after LLMs.

We sat down with Pim to dig into why game highlights are “episodic memory for simulation” (and how Medal’s privacy-first action labels became a world-model goldmine: https://medal.tv/blog/posts/enabling-state-of-the-art-security-and-protections-on-medals-new-apm-and-controller-overlay-features), what it takes to build fully vision-based agents that see only frames and output actions in real time, how General Intuition transfers from games to real-world video and then into robotics, and why world models and LLMs are complementary rather than rivals. We also cover what founders with proprietary datasets should know before selling or licensing to labs, and his bet that spatial-temporal foundation models will power 80% of future atoms-to-atoms interactions in both simulation and the real world.

We discuss:

  • How Medal’s 3.8B action-labeled highlight clips became a privacy-preserving goldmine for world models

  • Building fully vision-based agents that only see frames and output actions yet play like (and sometimes better than) humans

  • Transferring from arcade-style games to realistic games to real-world video using the same perception–action recipe

  • Why world models need actions, memory, and partial observability (smoke, occlusion, camera shake) vs. “just” pretty video generation

  • Distilling giant policies into tiny real-time models that still navigate, hide, and peek corners like real players

  • Pim’s path from RuneScape private servers, Tourette’s, and reverse engineering to leading a frontier world-model lab

  • How data-rich founders should think about valuing their datasets, negotiating with big labs, and deciding when to go independent

  • GI’s first customers: replacing brittle behavior trees in games, engines, and controller-based robots with a “frames in, actions out” API

  • Using Medal clips as “episodic memory of simulation” to move from imitation learning to RL via world models and negative events

  • The 2030 vision: spatial–temporal foundation models that power the majority of atoms-to-atoms interactions in simulation and the real world

Chapters

  • 00:00:00 Introduction and Medal's Gaming Data Advantage
  • 00:02:08 Exclusive Demo: Vision-Based Gaming Agents
  • 00:06:17 Action Prediction and Real-World Video Transfer
  • 00:08:41 World Models: Interactive Video Generation
  • 00:13:42 From RuneScape to AI: Pim's Founder Journey
  • 00:16:45 The Research Foundations: Diamond, Genie, and SEMA
  • 00:33:03 Vinod Khosla's Largest Seed Bet Since OpenAI
  • 00:35:04 Data Moats and Why GI Stayed Independent
  • 00:38:42 Self-Teaching AI Fundamentals: The Francois Fleuret Course
  • 00:40:28 Defining World Models vs Video Generation
  • 00:41:52 Why Simulation Complexity Favors World Models
  • 00:43:30 World Labs, Yann LeCun, and the Spatial Intelligence Race
  • 00:50:08 Business Model: APIs, Agents, and Game Developer Partnerships
  • 00:58:57 From Imitation Learning to RL: Making Clips Playable
  • 01:00:15 Open Research, Academic Partnerships, and Hiring
  • 01:02:09 2030 Vision: 80 Percent of Atoms-to-Atoms AI Interactions
