#201 - GPT 4.5, Sonnet 3.7, Grok 3, Phi 4
Description
Our 201st episode with a summary and discussion of last week's big AI news!
Recorded on 03/02/2025
Join our brand new Discord here! https://discord.gg/nTyezGSKwP
Hosted by Andrey Kurenkov and guest host Sharon Zhou
Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai
Check out our text newsletter and comment on the podcast at https://lastweekin.ai/.
In this episode:
- The release of GPT-4.5 from OpenAI, Anthropic's Claude 3.7 Sonnet, and Grok 3 from xAI, comparing their features, costs, and capabilities.
- Discussion on new tools and applications including Sesame's new voice assistant and Google's AI coding assistant, Gemini Code Assist, highlighting their unique benefits.
- OpenAI's continued user growth despite competition, pricing for Google's Veo 2 video model, and HP acquiring and shutting down Humane's AI Pin.
- Insights into new research on alignment and specification gaming in LLMs, including papers on fine-tuning causing broad misalignment and Google's multi-agent system for scientific collaboration.
Timestamps + Links:
- (00:02:33) OpenAI announces GPT-4.5, warns it’s not a frontier AI model
- (00:07:22) Anthropic launches a new AI model that ‘thinks’ as long as you want
- (00:11:14) New Grok 3 release tops LLM leaderboards
- (00:16:43) Sesame is the first voice assistant I’ve ever wanted to talk to more than once
- (00:18:30) Google launches a free AI coding assistant with very high usage caps
- (00:20:45) Rabbit shows off the AI agent it should have launched with
- (00:22:23) Mistral’s Le Chat tops 1M downloads in just 14 days
- Applications & Business
- (00:24:06) OpenAI Tops 400 Million Users Despite DeepSeek’s Emergence
- (00:27:37) Google’s new AI video model Veo 2 will cost 50 cents per second
- (00:29:52) HP is buying Humane and shutting down the AI Pin
- Projects & Open Source
- (00:31:44) Microsoft launches next-gen Phi AI models
- (00:33:47) OpenAI introduces SWE-Lancer: A Benchmark for Evaluating Model Performance on Real-World Freelance Software Engineering Work
- (00:37:12) SWE-Bench+: Enhanced Coding Benchmark for LLMs
- Research & Advancements
- (00:40:00) Towards an AI co-scientist
- (00:42:52) Magma: A Foundation Model for Multimodal AI Agents
- Policy & Safety