ANTHOLOGY — Open source AI
This week on The Changelog we’re taking you to the hallway track of The Linux Foundation’s Open Source Summit North America 2023 in Vancouver, Canada. Today’s anthology episode features: Beyang Liu (Co-founder and CTO at Sourcegraph), Denny Lee (Developer Advocate at Databricks), and Stella Biderman (Executive Director and Head of Research at EleutherAI).
Special thanks to our friends at GitHub for sponsoring us to attend this conference as part of Maintainer Month.
Changelog++ members get a bonus 3 minutes at the end of this episode and zero ads. Join today!
- DevCycle – Build better software with DevCycle. Feature flags, without the tech debt. DevCycle is a Feature Flag Management platform designed to help you build maintainable code at scale.
- Sentry – See the untested code causing errors - or whether it’s partially or fully covered - directly in your stack trace, so you can avoid similar errors from happening in the future. Use the code CHANGELOG and get the team plan free for three months.
- Rocky Linux – Enterprise Linux, the open source community way.
- Beyang Liu – Twitter, GitHub
- Denny Lee – Mastodon, Twitter, GitHub, LinkedIn
- Stella Biderman – Twitter, GitHub, LinkedIn, Website
- Adam Stacoviak – Mastodon, Twitter, GitHub, LinkedIn, Website
- Jerod Santo – Mastodon, Twitter, GitHub, LinkedIn
The common denominator for these conversations is open source AI.
Beyang Liu and his team at Sourcegraph are focused on enabling more developers to understand code. Their approach to a completely open source, model-agnostic coding assistant called Cody is of significant interest to us.
Denny Lee and the team at Databricks recently released Dolly 2.0, the first open source, instruction-following LLM that has been fine-tuned on a human-generated instruction dataset and is licensed for research and commercial use. They want to be the platform of choice for the future of AI development.
Stella Biderman gave the keynote address on generative AI at the conference and works at the base layer doing open source research, model training, and AI ethics. Stella trained the EleutherAI Pythia model family that Databricks used to create Dolly 2.0.
- Cody from Sourcegraph - Read, write, and understand code 10x faster with AI. Cody answers code questions and writes code for you by reading your entire codebase and the code graph.
- Free Dolly: Introducing the World’s First Truly Open Instruction-Tuned LLM
- EleutherAI - Empowering Open Source Artificial Intelligence Research
Something missing or broken? PRs welcome!
(00:00 ) - This week on The Changelog
(01:44 ) - Sponsor: DevCycle
(04:38 ) - Start the show!
(05:46 ) - We met Beyang 10 years ago!
(06:08 ) - The mission of Sourcegraph
(07:22 ) - Adam still Googles, just less
(08:30 ) - Plugins make models interesting
(09:35 ) - When did you start thinking about this?
(12:16 ) - This is a "Eureka!" moment in time
(13:11 ) - The gospel of text-based input
(15:44 ) - Is this the future interface of Sourcegraph?
(17:21 ) - Iterating the interface
(17:59 ) - How can you access Cody?
(18:27 ) - Cody is open source
(20:13 ) - How does it get code intelligence?
(21:58 ) - What about privacy?
(26:11 ) - GPT for X
(26:53 ) - Cody vs Copilot
(29:25 ) - Open source + model agnostic
(31:22 ) - What's next?
(33:19 ) - How high up the stack can AI tooling go?
(36:07 ) - Is this a step change to plateau?
(38:21 ) - The ultimate flattener
(42:56 ) - Will AI swallow all of programming?
(45:52 ) - Sponsor: Sentry
(50:08 ) - We're fine-tuned
(50:51 ) - JIT conference presenter
(52:32 ) - This time 4 weeks ago
(53:54 ) - Let's generate our own data
(55:05 ) - All 15,000 Q&A data is open
(56:12 ) - Verbose is not always desirable
(56:42 ) - I want my own Dolly 2.0
(58:14 ) - How did you collect the Q&A data?
(1:00:39 ) - We thought we'd need more data
(1:01:40 ) - Dolly proved it could be done
(1:03:24 ) - Google's leaked memo
(1:06:06 ) - Databricks' play in this chess game
(1:08:45 ) - Turning AI on our transcripts
(1:11:03 ) - Chain or foundational model?
(1:12:42 ) - Sponsor: Rocky Linux
(1:15:19 ) - The base layer
(1:16:27 ) - What should the world know?
(1:17:40 ) - Where does the money come from?
(1:18:13 ) - Training LLMs is NOT that expensive
(1:22:07 ) - Focused on open source AI research
(1:25:49 ) - Interpreting LLMs
(1:28:30 ) - Influencing the properties of the model
(1:31:40 ) - Do you have fear of where this is going?
(1:32:58 ) - Connecting with Stella and team
(1:34:07 ) - Stella's news source is their Discord server
(1:36:22 ) - Outro