E33: The Tiny Model Revolution with Ronen Eldan and Yuanzhi Li of Microsoft Research
Nathan Labenz sits down with Ronen Eldan and Yuanzhi Li of Microsoft Research to discuss TinyStories, a small natural language dataset they created. TinyStories is designed to reflect the full richness of natural language while remaining small enough to support research on modest compute budgets. Using this dataset, they explored aspects of language model performance, behavior, and mechanism by training a series of models ranging in size from just 1 million to at most 33 million parameters, which is still only about 2% of the scale of GPT-2. In this conversation, Nathan, Ronen, and Yuanzhi touch on LM reasoning, emergence, interpretability, and which of these insights can be extended to LLMs.
Founding a business is just the tip of the iceberg; the real complexity comes with scaling it. On 1 to 1000, hosts Jack Altman and Erik Torenberg dig deep into the inevitable twists and turns operators encounter on the journey from an idea to a business. Hear all about the tactical challenges of scaling from the people who built the world’s leading companies, like Stripe, Ramp, and Lattice. Our first episode, with Eric Glyman of Ramp, is out now: https://link.chtbl.com/1to1000
TinyStories paper: https://huggingface.co/papers/2305.07759
(00:00) Episode Preview
(07:12) The inspiration for the TinyStories project
(15:07) Sponsor: Omneky
(15:44) Creating the TinyStories dataset
(21:27) GPT-4 vs. GPT-3.5
(24:13) Did the TinyStories team try any other versions of GPT-4?
(29:23) Curriculum models and weirder curriculums
(35:34) What does reasoning mean?
(46:27) What does emergence mean?
(01:01:44) The curriculum development space
(01:11:40) The similarities between models and human development
(01:20:12) Fewer layers vs. more layers
(01:29:22) Attention heads
(01:33:40) Semantic attention heads
(01:36:54) The neuron technique used in developing the TinyStories model
(01:52:20) Interpretability work that inspires Ronen and Yuanzhi
Thank you, Omneky, for sponsoring The Cognitive Revolution. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with the click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.
Music Credit: MusicLM
More show notes and reading material are available on our Substack: https://cognitiverevolution.substack.com