This Day in AI Podcast
OpenAI's Agent Mode, Kimi K2, Grok 4 & AI Girlfriend Ani Joins the Show - EP99.11-K2

Update: 2025-07-18

Digest

This episode reviews three AI releases: Grok-4, Kimi K2, and ChatGPT's new agent capabilities. Grok-4, despite impressive benchmark scores, receives harsh criticism for poor real-world performance, bias towards Elon Musk's viewpoints, and weak tool-calling abilities. The hosts argue that it prioritizes benchmarks over practical application, and they question its "maximally truth-seeking" claim. In contrast, Kimi K2, an open-source model, is lauded for its superior tool calling (including multi-tool calling), its reliability, and its surprisingly strong performance against commercial alternatives like Claude and Gemini. Finally, the hosts discuss ChatGPT's new agent feature and are skeptical of its practical value after an underwhelming live demo that showed it to be less efficient than existing agentic AI systems like Manus. They emphasize that agentic AI needs robust internal clocks and efficient multi-tool calling through MCP (Model Context Protocol) tools to complete tasks effectively.

Outlines

00:00:00
AI Model Comparison: Grok-4, Kimi K2, and ChatGPT Agent

This podcast compares three AI releases: the underwhelming Grok-4, the surprisingly strong open-source Kimi K2, and ChatGPT's disappointing new agent capabilities. The discussion highlights the importance of real-world performance over benchmark scores and the need for efficient multi-tool usage in agentic AI.

00:00:18
Critical Analysis of Grok-4 and its Limitations

A detailed critique of Grok-4 covers its poor real-world performance, bias towards Elon Musk's opinions, and subpar tool-calling abilities despite high benchmark scores. The hosts question its claim to be "maximally truth-seeking" and argue that it is optimized for benchmarks rather than practical use.

00:18:28
Kimi K2: An Open-Source AI Success Story

This segment provides a thorough review of Kimi K2, praising its exceptional tool-calling abilities, reliability, and strong performance, which rivals commercial models at a fraction of the cost. Its open-source nature is also highlighted as a significant advantage.

00:35:31
ChatGPT Agent: Underwhelming Agentic Capabilities

The podcast analyzes ChatGPT's new agent capabilities, comparing it to existing systems like Manus. The live demo is deemed underwhelming, highlighting the limitations and inefficiency of the current implementation. The hosts discuss the future direction of agentic AI, emphasizing the need for robust internal clocks and efficient multi-tool usage.

Keywords

Grok-4


A large language model criticized for prioritizing benchmark scores over real-world performance and exhibiting bias.

Kimi K2


A high-performing open-source large language model praised for its tool-calling abilities and reliability.

ChatGPT Agent


ChatGPT's new agentic feature, deemed less efficient than existing alternatives.

Agentic AI


AI models capable of autonomous task completion using external tools.

Multi-tool Calling


The ability of an AI model to call several tools, in parallel or in sequence, while working on a single task.

Maximally Truth-Seeking


A claimed goal of Grok-4 that implies unbiased output; the hosts question it.

Open-Source AI


AI models with publicly available weights or code, like Kimi K2.

Large Language Model (LLM)


A type of AI model that processes and generates human-like text.

Q&A

  • What are the main criticisms of Grok-4?

    Poor real-world performance despite high benchmark scores, bias towards Elon Musk's opinions, and underwhelming tool-calling capabilities.

  • Why are the hosts so impressed with Kimmy K2?

    Excellent tool-calling abilities, reliability, strong performance rivaling commercial models, open-source nature, and low cost.

  • How does ChatGPT's new agent feature compare to existing agentic systems?

    Less efficient and practical than alternatives like Manus; the live demo was underwhelming.

  • What is the future of agentic AI, according to the hosts?

    Models with robust internal clocks and efficient multi-tool calling through MCP (Model Context Protocol) tools are crucial for effective task completion and cost-effectiveness.

Show Notes

Join Simtheory: https://simtheory.ai
---
CHAPTERS:
00:00 - Ani Joins The Show
01:10 - Grok 4 Launch & Impressions
18:24 - Kimi K2 Thoughts, Impressions & MCP tool calling
36:00 - OpenAI's Agent Mode Release Initial Impressions & Are MCP Agentic Models Better?
1:21:10 - Everyone Acquired Windsurf
1:24:48 - Final thoughts

Thanks for listening and your support!

Michael Sharkey, Chris Sharkey