ThursdAI - The top AI news from the past week

Author: From Weights & Biases - join AI Evangelist Alex Volkov and a panel of experts to cover everything important that happened in the world of AI in the past week.



From Weights & Biases - ThursdAI, the podcast that keeps you ahead of the AI curve. Hosted by AI Evangelist Alex Volkov with a changing panel of expert guests, discussing every important piece of AI news and updates from the past week, open source and more.
Hello hello everyone, this is Alex, typing these words from beautiful Seattle (really, it only rained once while I was here!) where I'm attending Microsoft's biggest developer conference, BUILD. This week we saw OpenAI get in the news from multiple angles, none of them positive, and Microsoft clapped back at Google from last week with tons of new AI product announcements (Copilot vs Gemini) and a few new PCs with NPUs (Neural Processing Units) that run alongside the CPU/GPU combo we're familiar with. Those NPUs allow local AI to run on these devices, making them AI-native devices!

While I'm here I also had the pleasure to participate in the original AI Tinkerers meetup thanks to my friend Joe Heitzberg, who operates and runs AI Tinkerers (of which we are a local branch in Denver), and it was amazing to see tons of folks who listen to ThursdAI + read the newsletter and talk about Weave and evaluations with all of them! (Btw, on the left is Vik from Moondream, which we covered multiple times.)

Ok, let's get to the news:

TL;DR of all topics covered:

* Open Source LLMs
* HuggingFace commits $10M in ZeroGPU (X)
* Microsoft open sources Phi-3 mini, Phi-3 small (7B), medium (14B) and vision models w/ 128K context (Blog, Demo)
* Mistral 7B 0.3 - Base + Instruct (HF)
* LMSys created a "hard prompts" category (X)
* Cohere for AI releases Aya 23 - 3 models, 101 languages (X)
* Big CO LLMs + APIs
* Microsoft Build recap - new AI-native PCs, Recall functionality, Copilot everywhere (will post a dedicated episode on Sunday)
* OpenAI pauses GPT-4o Sky voice after Scarlett Johansson complained
* Microsoft AI PCs - Copilot+ PCs (Blog)
* Anthropic - Scaling Monosemanticity paper - mapping the features of an LLM (X, Paper)
* Vision & Video
* OpenBNB - MiniCPM-Llama3-V 2.5 (X, HuggingFace)
* Voice & Audio
* OpenAI pauses Sky voice due to ScarJo hiring legal counsel
* Tools & Hardware
* Humane is looking to sell (blog)

Open Source LLMs

Microsoft open sources Phi-3 mini, Phi-3 small (7B), medium (14B) and 
vision models w/ 128K context (Blog, Demo)

Just in time for Build, Microsoft has open sourced the rest of the Phi family of models, specifically the small (7B) and the medium (14B) models, on top of the mini one we already knew as Phi-3. All the models have a small context version (4K and 8K) and a large one that goes up to 128K (though they recommend using the small one if you don't need the whole context), and all can run on device super quick. These models have an MIT license, so use them as you will, and they deliver incredible performance relative to their size on benchmarks.

Phi-3 mini received an interesting split in the vibes: it was really good for reasoning tasks, but not very creative in its writing, so some folks dismissed it. But it's hard to dismiss these new releases, especially when the benchmarks are that great! LMSys just updated their arena to include a hard prompts category (X), which selects for complex, specific and knowledge-based prompts and scores the models on those. Phi-3 mini actually gets a big boost in ELO ranking when filtered on hard prompts and beats GPT-3.5 😮 Can't wait to see how the small and medium versions perform on the arena.

Mistral gives us function calling in the Mistral 7B 0.3 update (HF)

Just in time for the Mistral hackathon in Paris, Mistral has released an update to the 7B model (and will likely update the MoE 8x7B and 8x22B Mixtrals too) with function calling and a new vocab. This is awesome all around because function calling is important for agentic capabilities, and it's about time all companies have it. Apparently the way Mistral built it in matches the Cohere Command R way and is already supported in Ollama, using raw mode. 
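Whatever the exact wire format each vendor uses, function calling boils down to the same loop: the model sees tool schemas, emits a structured call, and the application executes it and returns the result. Here's a minimal illustrative sketch (my own toy example with a faked model reply, not Mistral's actual token format):

```python
import json

# Toy sketch of the function-calling loop (not Mistral's exact wire format):
# the model is shown tool schemas, replies with a JSON "tool call",
# and the application dispatches to the matching function.

TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",  # hypothetical tool
}

tool_schema = {
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {"city": {"type": "string"}},
}

# Pretend this is the model's reply after seeing the schema and the
# user question "What's the weather in Paris?"
model_reply = '{"name": "get_weather", "arguments": {"city": "Paris"}}'

call = json.loads(model_reply)
result = TOOLS[call["name"]](**call["arguments"])
print(result)  # Sunny in Paris
```

In a real agent loop, `result` would be appended back into the conversation so the model can compose a final answer.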
Big CO LLMs + APIs

OpenAI is not having a good week - Sky voice paused, employees complain

OpenAI is in hot water this week, starting with pausing the Sky voice (arguably the best, most natural sounding voice out of the ones that launched) due to complaints from Scarlett Johansson about this voice being similar to hers. Scarlett's appearance in the movie Her, and Sam Altman tweeting "her" to celebrate the release of the incredible GPT-4o voice mode, were all talked about when ScarJo released a statement saying she was shocked when her friends and family told her that OpenAI's new voice mode sounds just like her. Spoiler: it doesn't really, and they hired an actress and have had this voice out since September last year, as they outlined in their blog following ScarJo's complaint. Now, whether or not there's legal precedent here, given that Sam Altman reached out to Scarlett twice, including once a few days before the event, I won't speculate, but for me, personally, not only does Sky not sound like ScarJo, it was my favorite voice even before they demoed it, and I'm really sad that it's paused. I think it's unfair to the actress who was hired for her voice. See her own statement:

Microsoft Build - Copilot all the things

I have recorded a BUILD recap with Ryan Carson from Intel AI and will be posting that as its own episode on Sunday, so look forward to that, but for now, here are the highlights from BUILD:

* Copilot everywhere - Microsoft builds Copilot as a platform
* AI-native laptops with NPU chips for local AI
* Recall - an on-device AI that lets you search through everything you saw or typed with natural language
* GitHub Copilot Workspace + Extensions
* Microsoft stepping into education by sponsoring Khan Academy free for all teachers in the US
* Copilot Team member and Agent - Copilot will do things proactively as your team member
* GPT-4o voice mode is coming to Windows and to websites!

Hey, if you like reading this, can you share it with 1 friend? 
It'll be an awesome way to support this pod/newsletter!

Anthropic releases the Scaling Monosemanticity paper

This is quite a big thing that happened this week for mechanistic interpretability and alignment, with Anthropic releasing a new paper and examples of their understanding of what an LLM "thinks". They have done incredible work in this area, and now they have scaled it up all the way to production models like Claude Haiku, which shows that this work can actually understand which "features" are causing which tokens to be output. In the work they highlighted features such as "deception", "bad code" and even a funny one called "Golden Gate Bridge", and showed that clamping these features can affect the model's outputs. Once these features have been identified, they can be turned on or off with various levels of strength; for example, they turned the Golden Gate Bridge feature up to the maximum, and the model thought it was the Golden Gate Bridge. While a funny example, they also found features for racism, bad/wrong code, inner conflict, gender bias, sycophancy and more. You can play around with some examples here, and definitely read the full blog if this interests you, but overall it shows incredible promise in alignment and steerability of models going forward at large scale.

This week's Buzz (What I learned with WandB this week)

I was demoing Weave all week long in Seattle, first at the AI Tinkerers event, and then at MSFT BUILD. They had me record a pre-recorded video of my talk, and then do a 5 minute demo on stage, which (was not stressful at all!) so here's the pre-recorded video that turned out really good! Also, we're sponsoring the Mistral Hackathon this weekend in Paris, so if you're in the EU and want to hack with us, please go - it's hosted by Cerebral Valley and HuggingFace and us →

Vision

Phi-3 mini Vision

In addition to Phi-3 small and Phi-3 medium, Microsoft released Phi-3 mini with vision, which does an incredible job understanding text and images! 
(You can demo it right here.) Interestingly, Phi-3 mini with vision has a 128K context window, which is amazing, and it even beats Mistral 7B as a language model! Give it a try.

OpenBNB - MiniCPM-Llama3-V 2.5 (X, HuggingFace, Demo)

Two state of the art vision models in one week? Well, that's incredible. A company I hadn't heard of, OpenBNB, has released MiniCPM 7B trained on top of Llama3, and they claim that it outperforms Phi-3 vision. They claim that it has GPT-4 vision level performance, achieving a 700+ score on OCRBench, surpassing proprietary models such as GPT-4o, GPT-4V-0409, Qwen-VL-Max and Gemini Pro. In my tests, Phi-3 performed a bit better; I showed both the same picture, and Phi was more factual on the hard prompts:

Phi-3 Vision:

And that's it for this week's newsletter. Look out for the Sunday special full MSFT Build recap, and definitely give the whole talk a listen, it's full of my co-hosts and their great analysis of this week's events!

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit
Wow, holy s**t, insane, overwhelming, incredible, "the future is here!", "still not there" - there are many more words to describe this past week. (TL;DR at the end of the blogpost.)

I had a feeling it was going to be a big week, and the companies did NOT disappoint, so this is going to be a very big newsletter as well. As you may have read last week, I was very lucky to be in San Francisco the weekend before Google I/O, to co-host a hackathon with the Meta Llama-3 team, and it was a blast. I will add my notes on that in This week's Buzz section.

Then on Monday, we all got to watch the crazy announcements from OpenAI, namely a new flagship model called GPT-4o (we were right, it previously was im-also-a-good-gpt2-chatbot) that's twice as fast, 50% cheaper (in English - significantly more so in other languages, more on that later) and is Omni (that's the "o"), which means it is trained end to end with voice, vision and text on the inputs, and can generate text, voice and images on the output. A true MMIO (multimodal on inputs and outputs - that's not the official term) is here, and it has some very very surprising capabilities that blew us all away, namely the ability to ask the model to "talk faster" or "more sarcasm in your voice" or "sing like a pirate". Though we didn't yet get that functionality with the GPT-4o model, it is absolutely and incredibly exciting. Oh, and it's available to everyone for free! That's GPT-4 level intelligence, for free, for everyone, without having to log in!

What's also exciting was how immediate it was. Apparently not only is the model itself faster (unclear if it's due to newer GPUs or distillation or some other crazy advancements, or all of the above), but training an end to end omnimodel reduces the latency, making it an incredibly immediate conversation partner, one that you can interrupt, ask to recover from a mistake, and that can hold a conversation very very well. 
So well, in fact, that it seemed like the Waifu future (digital girlfriends/wives) is very close for some folks who would want it. While we didn't get to try it (we got GPT-4o but not the new voice mode, as Sam confirmed), OpenAI released a bunch of videos of their employees chatting with Omni (that's my nickname, use it if you'd like), and many online highlighted how thirsty/flirty it sounded. I downloaded all the videos for an X thread and named one girlfriend.mp4 - well, just judge for yourself why:

Ok, that's not all that OpenAI updated or shipped. They also updated the tokenizer, which is incredible news for folks all around, specifically the rest of the world. The new tokenizer reduces the previous "foreign language tax" by a LOT, making the model way, way cheaper for the rest of the world as well.

One last announcement from OpenAI was the desktop app experience, and this one I actually got to use a bit, and it's incredible. MacOS only for now, this app comes with a launcher shortcut (kind of like Raycast) that lets you talk to ChatGPT right then and there, without opening a new tab, without additional interruptions, and it can even understand what you see on the screen, help you understand code or jokes, or look up information. Here's just one example I had over at X. And sure, you could always do this with another tab, but the ability to do it without a context switch is a huge win.

OpenAI had to do their demo 1 day before Google I/O, but even during the excitement about Google I/O, they announced that Ilya is not only alive, but is also departing from OpenAI, which was followed by an announcement from Jan Leike (who co-headed the superalignment team together with Ilya) that he left as well. This to me seemed like well executed timing to dampen the Google news a bit. 
Google is BACK, backer than ever - Alex's Google I/O recap

On Tuesday morning I showed up to the Shoreline Amphitheatre in Mountain View, together with a creators/influencers delegation, as we all watched the incredible firehose of announcements that Google had prepared for us.

TL;DR - Google is adding Gemini and AI into all its products across Workspace (Gmail, Chat, Docs) and into other cloud services like Photos, where you'll now be able to ask your photo library for specific moments. They introduced over 50 product updates, and I don't think it makes sense to cover all of them here, so I'll focus on what we do best.

"Google will do the Googling for you"

Gemini 1.5 Pro is now their flagship model (remember Ultra? Where is that? 🤔) and has been extended to 2M tokens in the context window! Additionally, we got a new model called Gemini Flash, which is way faster and very cheap (up to 128K, then it becomes 2x more expensive). Gemini Flash is multimodal as well and has a 1M context window, making it an incredible deal if you have any type of videos to process, for example.

Kind of hidden but important was a caching announcement, which IMO is a big deal - big enough that it could pose a serious risk to RAG-based companies. Google claims they have a way to cache the LLM activation layers for most of your context, so a developer won't have to pay for repeatedly sending the same thing over and over again (which happens in most chat applications), and it will significantly speed up work with larger context windows.

They also mentioned Gemini Nano, an on-device Gemini that's also multimodal, which can, for example, monitor calls in real time for older folks and alert them about being scammed, and in one of the cooler announcements, Nano is going to be baked into the Chrome browser. 
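To see why that caching announcement matters, here's a quick back-of-envelope sketch (hypothetical numbers of my own, not Google's):

```python
# Back-of-envelope illustration (my own hypothetical numbers) of why
# context caching matters: in a chat app, the same long context is
# normally re-sent and re-processed on every conversation turn.

context_tokens = 500_000   # hypothetical long document kept in context
turns = 20                 # messages in the conversation

without_cache = context_tokens * turns  # context tokens processed, naive
with_cache = context_tokens             # processed once, then reused

print(without_cache // with_cache)  # 20x fewer context tokens processed
```

The real savings depend on how the provider prices cached versus fresh tokens, but the shape of the win is this linear-in-turns factor.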
With Gemma also being upgraded, there's not a product at Google that Gemini is not going to get infused into, and while they counted 131 "AI" mentions during the keynote, I'm pretty sure Gemini was mentioned way more!

Project Astra - a universal AI agent helpful in everyday life

After a few of the announcements from Sundar, (newly knighted) Sir Demis Hassabis came out, talked about DeepMind research and AlphaFold 3, and then turned to Project Astra. This demo was really cool and kind of similar to the GPT-4o conversation, but also different. I'll let you just watch it yourself:

TK: project astra demo

And this is no fake - they actually had booths with Project Astra test stations, and I got to chat with it (I came back 3 times) and had a personal demo from Josh Woodward (VP of Labs), and it works, and works fast! It sometimes disconnects and sometimes there are misunderstandings, like when multiple folks are speaking, but overall it's very very impressive. Remember the infamous rubber ducky video that was edited by Google and caused a major uproar when we found out? It's basically that, on steroids, and real, and quite quite fast. Astra has a decent short term memory, so if you ask it where something was, it will remember, and Google cleverly used that trick to also show that they are working on augmented reality glasses with Astra built in, which would make amazing sense.

Open Source LLMs

Google open sources PaliGemma VLM

Giving us something in the open source department, adding to previous models like RecurrentGemma, Google has uploaded a whopping 116 different checkpoints of a new VLM called PaliGemma to the hub - a state of the art vision model at 3B. It's optimized for finetuning on different workloads such as visual Q&A, image and short video captioning, and even segmentation! They also mentioned that Gemma 2 is coming next month and will be a 27B parameter model that's optimized to run on a single TPU/GPU. 
Nous Research Hermes 2 Θ (Theta) - their first merge!

Collaborating with Charles Goddard from Arcee (the creators of MergeKit), Teknium and friends merged the recently trained Hermes 2 Pro with Llama 3 Instruct to get a model that performs well on all the tasks that Llama-3 is good at, while maintaining the capabilities of Hermes (function calling, JSON mode).

Yi releases 1.5 with Apache 2 license

The folks at 01.AI released Yi 1.5, with 6B, 9B and 34B sizes (base and chat finetunes), showing decent benchmarks on math and Chinese. The 34B beats Llama on some of these tasks while being 2x smaller, which is very impressive.

This week's Buzz - Llama-3 hackathon with Meta

Before all the craziness that was announced this week, I participated in and judged the first ever Llama-3 hackathon. It was quite incredible, with over 350 hackers participating and Groq, Lambda, Meta, Ollama and others sponsoring and giving talks and workshops - an incredible 24 hours at Shack15 in SF (where Cerebral Valley hosts their hackathons). The winning hacks were really innovative, ranging from completely open source smart glasses for under $20, to an LLM debate platform with an LLM judge on any moral issue, and one project that was able to jailbreak Llama by doing some advanced LLM arithmetic. Kudos to the teams for winning, and it was amazing to see how many of them adopted Weave as their observability framework, as it was really easy to integrate. Oh, and I got to co-judge with the 🤗 of HuggingFace.

This is all the notes for this week, even though there was a LOT more. Check out the TL;DR, and see you here next week, when I'll be recording from Seattle, where I'll be participating in the Microsoft BUILD event, so we'll see Microsoft's answer to Google I/O as well. If you're coming to BUILD, come by our booth and give me a high five! 
TL;DR of all topics covered:

* OpenAI Announcements
* GPT-4o
* Voice mode
* Desktop App
* Google IO recap:
* Google Gemini
* Gemini 1.5 Pro: available globally to developers with a 2-million-token context window, enabling it to handle larger and more complex tasks.
* Gemini 1.5 Flash: a faster and less expensive version of Gemini, optimized for tasks requiring low latency.
* Gemini Nano with multimodality: an on-device model that processes various inputs like text, photos, audio, web content, and social videos.
* Project Astra: an AI agent capable of understanding and responding to live video and audio in real time.
* Google Search
* AI Overviews in search results: provides quick summaries and relevant information for complex sear
Hey 👋 (show notes and links a bit below)

This week has been a great AI week; however, it does feel a bit like the "quiet before the storm", with Google I/O on Tuesday next week (which I'll be covering from the ground in Shoreline!) and rumors that OpenAI is not just going to let Google have all the spotlight!

Early this week, we got 2 new models on LMSys, im-a-good-gpt2-chatbot and im-also-a-good-gpt2-chatbot, and we've now confirmed that they are from OpenAI. Folks have been testing them with logic puzzles and role play and have been saying great things, so maybe that's what we'll get from OpenAI soon?

Also on the show today, we had a BUNCH of guests, and as you know, I love chatting with the folks who make the news, so we were honored to host Xingyao Wang and Graham Neubig, core maintainers of OpenDevin (which just broke SOTA on SWE-Bench this week!), and then we had friends of the pod Tanishq Abraham and Parmita Mishra dive deep into AlphaFold 3 from Google (both are medical/bio experts). Also this week, OpenUI from Chris Van Pelt (co-founder & CIO at Weights & Biases) has been blowing up, taking the #1 GitHub trending spot, and I had the pleasure to invite Chris and chat about it on the show!

Let's delve into this (yes, this is I, Alex the human, using "delve" as a joke, don't get triggered 😉)

TL;DR of all topics covered (trying something new, my raw notes with all the links and bullet points are at the end of the newsletter):

* Open Source LLMs
* OpenDevin getting SOTA on SWE-Bench with 21% (X, Blog)
* DeepSeek V2 - 236B (21B active) MoE (X, Try It)
* Weights & Biases OpenUI blows past 11K stars (X, Github, Try It)
* LLama-3 120B Chonker Merge from Maxime Labonne (X, HF)
* Alignment Lab open sources Buzz - 31M rows training dataset (X, HF)
* xLSTM - new transformer alternative (X, Paper, Critique)
* Benchmarks & Eval updates
* LLama-3 still in 6th place (LMsys analysis)
* Reka Core gets awesome 7th place and Qwen-Max breaks top 10 (X)
* No upsets in LLM leaderboard
* Big CO LLMs + APIs
* Google DeepMind announces AlphaFold-3 (Paper, Announcement)
* OpenAI publishes their Model Spec (Spec)
* OpenAI tests 2 models on LMsys (im-also-a-good-gpt2-chatbot & im-a-good-gpt2-chatbot)
* OpenAI joins Coalition for Content Provenance and Authenticity (Blog)
* Voice & Audio
* Udio adds in-painting - change parts of songs (X)
* 11Labs joins the AI Audio race (X)
* AI Art & Diffusion & 3D
* ByteDance PuLID - new high quality ID customization (Demo, Github, Paper)
* Tools & Hardware
* Went to the Museum with Rabbit R1 (My Thread)
* Co-Hosts and Guests
* Graham Neubig (@gneubig) & Xingyao Wang (@xingyaow_) from Open Devin
* Chris Van Pelt (@vanpelt) from Weights & Biases
* Nisten Tahiraj (@nisten) - Cohost
* Tanishq Abraham (@iScienceLuvr)
* Parmita Mishra (@prmshra)
* Wolfram Ravenwolf (@WolframRvnwlf)
* Ryan Carson (@ryancarson)

Open Source LLMs

OpenDevin getting a whopping 21% on SWE-Bench (X, Blog)

OpenDevin started as a tweet from our friend Junyang Lin (on the Qwen team at Alibaba) looking for an open source alternative to the very popular Devin code agent from Cognition Lab (recently valued at $2B 🤯), and 8 weeks later, with tons of open source contributions and >100 contributors, they have almost 25K stars on GitHub and now claim a state of the art score on the very hard SWE-Bench Lite benchmark, beating Devin and SWE-Agent (which scored 18%). They have done so by using the CodeAct framework developed by Xingyao, and it's honestly incredible to see how an open source project can catch up to and beat a very well funded AI lab within 8 weeks! Kudos to the OpenDevin folks for the organization, and amazing results!

DeepSeek V2 - huge MoE with 236B (21B active) parameters (X, Try It)

The folks at DeepSeek are releasing this huge MoE (the biggest we've seen in terms of experts) with 160 experts, and 6 experts activated per forward pass. A similar trend to the Snowflake team's, just extended even longer. 
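The "160 experts, 6 active" idea is easy to sketch: a router scores all experts per token and only the top-k actually run, which is how a 236B-parameter model can activate only ~21B per forward pass. Here's a toy numpy illustration (my own sketch, not DeepSeek's implementation):

```python
import numpy as np

# Toy sketch of MoE routing (illustrative only, not DeepSeek's code):
# a router scores every expert for the input, and only the top-k
# experts run, so most parameters sit idle on any given forward pass.

rng = np.random.default_rng(0)
n_experts, top_k, d = 160, 6, 8

experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]  # tiny "experts"
router = rng.normal(size=(d, n_experts))                       # routing weights

def moe_forward(x):
    scores = x @ router
    chosen = np.argsort(scores)[-top_k:]      # indices of the top-k experts
    weights = np.exp(scores[chosen])
    weights /= weights.sum()                  # softmax over the chosen experts
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))
    return out, chosen

out, chosen = moe_forward(rng.normal(size=d))
print(len(chosen), "experts active out of", n_experts)
```

Real MoE layers route per token inside each transformer block and add load-balancing losses, but the compute-saving mechanism is exactly this top-k selection.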
They also introduce a lot of technical details and optimizations to the KV cache. With benchmark results getting close to GPT-4, DeepSeek wants to take the crown of being the cheapest smartest model you can run - not only in open source, btw; they are now offering this model at an incredible $0.28/1M tokens. That's 28 cents per 1M tokens! The cheapest closest models in price were Haiku at $0.25 and GPT-3.5 at $0.50. This is quite an incredible deal for a model with 32K (128K in open source) context and these metrics. Also notable is the training cost: they claim that it took them 1/5 the price of what Llama-3 cost Meta, which is also incredible. Unfortunately, running this model locally is a no-go for most of us 🙂

I would mention here that metrics are not everything, as this model fails quite humorously on my basic logic tests.

Llama-3 120B Chonker Merge from Maxime Labonne (X, HF)

We've covered merges before, and we've had the awesome Maxime Labonne talk to us at length about model merging on ThursdAI, but I've been waiting for Llama-3 merges, and Maxime did NOT disappoint! A whopping 120B Llama (Maxime added 50 layers to the 70B Llama-3) is doing the rounds, and folks are claiming that Maxime achieved AGI 😂 It's really funny - this model is... something else. Here's just one example that Maxime shared, as it goes into an existential crisis about a very simple logic question, a question that Llama-3 answers okay with some help, but this... I've never seen this. Don't forget that merging involves no additional training; it's mixing layers from the same model, so... we still have no idea what merging does to a model, but... some brain damage is definitely occurring. Oh, and it also comes up with new words! 
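Back to DeepSeek's pricing for a second - the gap is easiest to see with quick arithmetic on the rates quoted above (a sketch with those quoted input rates, not an official calculator; always check providers for current pricing):

```python
# Quick cost comparison using the per-1M-token rates quoted above
# (input-token rates; providers change pricing, so treat as a snapshot).

PRICES_PER_MILLION = {
    "deepseek-v2": 0.28,
    "claude-3-haiku": 0.25,
    "gpt-3.5": 0.50,
}

def cost_usd(model: str, tokens: int) -> float:
    """Dollar cost of processing `tokens` tokens at the listed rate."""
    return tokens / 1_000_000 * PRICES_PER_MILLION[model]

# e.g. a hypothetical 50M-token batch job:
for model in PRICES_PER_MILLION:
    print(f"{model}: ${cost_usd(model, 50_000_000):.2f}")
```

At these rates, even a 50M-token job costs under $30 on any of the three, which is why "cheapest smartest model" is now a real competition.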
Big CO LLMs + APIs

OpenAI publishes Model Spec (X, Spec, Blog)

OpenAI published, and invites engagement and feedback on, their internal set of rules for how their models should behave (Anthropic has something similar with Constitutional AI). I specifically liked the new chain of command (Platform > Developer > User > Tool) rebranding they added to the models, making OpenAI the Platform, changing "system" prompts to "developer", and having the user be the user. Very welcome renaming and clarifications (h/t Swyx for his analysis). Here is a summarized version of OpenAI's new rules of robotics (thanks to Ethan Mollick):

* Follow the chain of command: Platform > Developer > User > Tool
* Comply with applicable laws
* Don't provide info hazards
* Protect people's privacy
* Don't respond with NSFW content

A very welcome effort from OpenAI - showing this spec in the open and inviting feedback is greatly appreciated! This comes on top of a pretty big week for OpenAI: announcing an integration with Stack Overflow, joining the Coalition for Content Provenance and Authenticity, embedding watermarks in SORA and DALL-E images, and telling us they have built a classifier that detects AI images with 96% certainty!

im-a-good-gpt2-chatbot and im-also-a-good-gpt2-chatbot

Following last week's gpt2-chat mystery, Sam Altman trolled us with this tweet. And then we got 2 new models on LMSys, im-a-good-gpt2-chatbot and im-also-a-good-gpt2-chatbot, and the timeline exploded with folks trying all their best logic puzzles on these two models, trying to understand what they are. Are they GPT-5? GPT-4.5? 
Maybe a smaller version of GPT-2 that's pretrained on tons of new tokens? I think we may see the answer soon, but it's clear that both these models are really good, doing well on logic (better than Llama-70B, and sometimes Claude Opus as well). And the speculation is pretty much over: we know OpenAI is behind them after seeing this oopsie on the Arena 😂 You can try these models as well; they seem to be very favored in the random selection of models, but they show up only in battle mode, so you have to try a few times.

DeepMind announces AlphaFold 3 (Paper, Announcement)

Developed by DeepMind and Isomorphic Labs, AlphaFold previously predicted the structure of every protein known to science, and now AlphaFold 3 has been announced, which can predict the structure of other biological complexes as well, paving the way for new drugs and treatments. What's new here is that they are using diffusion - yes, like Stable Diffusion - starting with noise and then denoising to get a structure, and this method is 50% more accurate than existing methods. If you'd like more info about this very important paper, look no further than the awesome Two Minute Papers YouTube channel, which did a thorough analysis here, and listen to the Isomorphic Labs podcast with Weights & Biases CEO Lukas on Gradient Dissent. They also released AlphaFold Server, a free research tool allowing scientists to access these capabilities and predict structures for non-commercial use; however, it seems that it's somewhat limited (from a conversation we had with a researcher on stage).

This week's Buzz (What I learned with WandB this week)

This week was amazing for open source and Weights & Biases - not every week does a side project from a CIO blow up on... well, everywhere. 
#1 trending on GitHub for TypeScript and #6 overall, OpenUI (Github) has passed 12K stars as people are super excited about being able to build UIs with LLMs, but in open source. I had the awesome pleasure of hosting Chris on the show as he talked about the inspiration and future plans, and he gave everyone his email to send him feedback (a decision which I hope he doesn't regret 😂), so definitely check out the last part of the show for that. Meanwhile, here's my quick tutorial and reaction about OpenUI, but just give it a try here and build something cool!

Vision

I was shared some news, but respecting the team, I decided not to include it in the newsletter ahead of time - expect open source to come close to GPT-4V next week 👀

Voice & Audio

11Labs joins the AI music race (X)

Breaking news from 11Labs that happened during the show (but we didn't notice) is that they are stepping into the AI music scene and
Hey 👋 Look, it May or May not be the first AI newsletter you get in May, but it's for sure going to be a very information-dense one. We had an amazing conversation on the live recording today; over 1K folks joined to listen to the first May updates from ThursdAI. As you May know by now, I just love giving the stage to the folks who are the creators of the actual news I cover from week to week, and this week, we again had 2 of those conversations.

First we chatted with Piotr Padlewski from Reka, an author on the new Vibe-Eval paper & dataset which they published this week. We've had Yi and Max from Reka on the show before, but it was Piotr's first time, and he was super knowledgeable and really fun to chat with. Specifically, as we at Weights & Biases launch a new product called Weave (which you should check out), I'm getting a LOT more interested in evaluations and LLM scoring, and in fact, we started the whole show today with a full segment on evals, vibe checks, and a new paper from Scale about overfitting.

The second deep dive was with my friend Idan Gazit, from GitHub Next, about the new iteration of GitHub Copilot, called Copilot Workspace. 
It was a great one, and you should definitely give that one a listen as well.

TL;DR of all topics covered + show notes

* Scores and Evals
* No notable changes, LLama-3 is still #6 on LMsys
* gpt2-chat came and went (in-depth chan writeup)
* Scale checked for data contamination on GSM8K using GSM-1K (Announcement, Paper)
* Vibe-Eval from Reka - a set of multimodal evals (Announcement, Paper, HF dataset)
* Open Source LLMs
* Gradient releases 1M context window LLama-3 finetune (X)
* MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4 (X, HF)
* Nous Research - Hermes Pro 2 - LLama 3 8B (X, HF)
* AI Town is running on Macs thanks to Pinokio (X)
* LMStudio releases their CLI - LMS (X, Github)
* Big CO LLMs + APIs
* Github releases Copilot Workspace (Announcement)
* AI21 - releases Jamba Instruct w/ 256K context (Announcement)
* Google shows Med-Gemini with some great results (Announcement)
* Claude releases iOS app and Team accounts (X)
* This week's Buzz
* We're heading to SF to sponsor the biggest Llama-3 hackathon ever with Cerebral Valley (X)
* Check out my video for Weave, our new product, it's just 3 minutes (Youtube)
* Vision & Video
* InternLM open sourced a bunch of Llama-3 and Phi based VLMs (HUB)
* And they are MLXd by the "The Bloke" of MLX, Prince Canuma (X)
* AI Art & Diffusion & 3D
* ByteDance releases Hyper-SD - Stable Diffusion in a single inference step (Demo)
* Tools & Hardware
* Still haven't opened the AI Pin, and the Rabbit R1 just arrived, will open later today
* Co-Hosts and Guests
* Piotr Padlewski (@PiotrPadlewski) from Reka AI
* Idan Gazit (@idangazit) from Github Next
* Wing Lian (@winglian)
* Nisten Tahiraj (@nisten)
* Yam Peleg (@yampeleg)
* LDJ (@ldjconfirmed)
* Wolfram Ravenwolf (@WolframRvnwlf)
* Ryan Carson (@ryancarson)

Scores and Evaluations

New corner in today's pod and newsletter, given the focus this week on new models and comparing them to existing models.

What is gpt2-chat and who put it on LMSys? 
(and how do we even know it's good?) For a very brief period this week, a new mysterious model appeared on LMSys, called gpt2-chat. It only appeared on the Arena, and did not show up on the leaderboard, and yet, tons of sleuths from 4chan to Reddit to X started trying to figure out what this model was and wasn't. Folks started analyzing the tokenizer, the output schema, tried to get the system prompt and gauge the context length. Many folks were hoping that this is an early example of GPT4.5 or something else entirely. It did NOT help that uncle SAMA posted a first tweet and then edited it to remove the hyphen, and it was unclear if he was trolling again, or foreshadowing a completely new release, or an old GPT-2 retrained on newer data, or something else. The model was really surprisingly good, solving logic puzzles better than Claude Opus, showing quite amazing step-by-step thinking, and able to provide remarkably informative, rational, and relevant replies. The average output quality across many different domains places it on, at least, the same level as high-end models such as GPT-4 and Claude Opus. Whatever this model was, the hype around it made LMSYS add a clarification to their terms and temporarily take the model off for now. And we're waiting to hear more news about what it is. 
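As a side note on how those Arena rankings work: LMSys aggregates pairwise human votes into per-model ratings (their leaderboard uses a Bradley-Terry style fit; the simple Elo update below is just a toy sketch of the idea, with a made-up opponent name):

```python
# Toy Elo aggregation over pairwise "arena" votes.
# Hypothetical vote data; LMSys actually fits a Bradley-Terry model in batch.

def expected(r_a: float, r_b: float) -> float:
    # Probability that A beats B implied by the current ratings.
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict, winner: str, loser: str, k: float = 32) -> None:
    # Winner gains, loser loses, proportionally to how "surprising" the win was.
    surprise = 1 - expected(ratings[winner], ratings[loser])
    ratings[winner] += k * surprise
    ratings[loser] -= k * surprise

ratings = {"gpt2-chat": 1000.0, "some-baseline": 1000.0}
for _ in range(10):                     # gpt2-chat wins 10 votes in a row
    update(ratings, "gpt2-chat", "some-baseline")

print(ratings["gpt2-chat"] > ratings["some-baseline"])  # True
```

Each win against an equally rated opponent moves ratings by at most k/2, and a streak of wins separates two models quickly, which is roughly why a strong mystery model stands out after only a modest number of votes.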
Reka AI gives us Vibe-Eval, a new multimodal evaluation dataset and score (Announcement, Paper, HF dataset) Reka keeps surprising: with only 20 people in the company, their latest Reka Core model is very good at multimodality, and to prove it, they just released a new paper + a new method of evaluating multimodal prompts on VLMs (Vision-enabled Language Models). Their new open benchmark + open dataset pairs image prompts with gold-standard reference responses, and I was very happy to hear from one of the authors of the paper, @PiotrPadlewski, on the pod, where he mentioned that they were trying to create a dataset that was going to be very hard for their own model (Reka Core) and just decided to keep evaluating other models on it. They had two main objectives: (i) vibe checking multimodal chat models for day-to-day tasks and (ii) deeply challenging and probing the capabilities of present frontier models. To this end, the hard set contains > 50% questions that all frontier models answer incorrectly. Chatting with Piotr about it, he mentioned that not only did they do a dataset, they actually used Reka Core as a judge to score the replies from all models on that dataset, and found that using their model in this way roughly correlates to non-expert human judgement! Very very interesting stuff. The "hard" set is ... well, hard! Piotr concluded that if folks want to do research, they will provide free API access to Reka for that, so hit them up over DMs if you want to take this eval for a spin on your new shiny VLM (or indeed verify the metrics they put up). Scale tests for eval dataset contamination with GSM-1K (Announcement, Paper) Scale is one of the most prominent companies in AI you may never have heard of; they are valued at $13B and have pivoted from data processing for autonomous vehicles to being the darling of the government, with agreements from the DoD for data pipelines and evaluation for the US military. 
They have released a new paper as well, creating (but not releasing) a new dataset that matches the GSM8K (Grade School Math) dataset and evaluation that many frontier companies love to showcase in their release benchmarks, with some surprising results! So Scale folks created (but did not release) a dataset called GSM-1K, which tracks and is similar to the public GSM-8K dataset, and tested a bunch of existing models on their new one, to see the correlation, and if the difference was very stark, assume that some models overfitted (or even had their dataset contaminated) on the publicly available GSM8K. On one end, models like Mistral or Phi do up to 10% worse on GSM1k compared to GSM8k. On the other end, models like Gemini, Claude, or GPT show basically no signs of being overfit. The authors go on to say that overfitting doesn't necessarily mean it's a bad model, and highlight Phi-3, which has a 10% difference on their new GSM-1K score compared to GSM-8K, but still answers 68% of their dataset correctly, while being a tiny 3.8B parameter model. It seems that Scale has noticed how much interest there is in actually understanding how models perform, and is stepping into the evaluation game by building (but not releasing, so they don't leak) datasets. Jim Fan's tweet (and Scale CEO Alex Wang's QT) seems to agree that this is the right positioning for Scale (as they don't have models of their own and so can be neutral, like Moody's). Open Source LLMs LLama-3 gets 1M context window + Other LLama-3 news In the second week of the LLama-3 corner, we are noticing a significant ramp in all things Llama-3, first with the context length. The same folks from last week, Gradient, have spent cycles and upscaled/stretched LLama-3 to a whopping 1 million tokens in the context window (Llama-3 8B Gradient Instruct 1048k), with a very decent Needle in the Haystack result. The main problem? 
Transformers have quadratic attention scaling issues for longer context, so this isn't something that you'd be able to run on your mac (nay, on your cluster) any time soon, and it's almost only theoretical at this point. The upside? We had Wing Lian (from Axolotl) on the show, and he talked about a new method called LoRD (which is now part of MergeKit), which is a way to extract LoRAs from models. Think of it as LLM arithmetic: you take the base model (llama-3 in this case) and the finetune (Llama-3 8B Gradient Instruct 1048k) and simply run a command like so: mergekit-extract-lora llama-3-8B-gradient-instruct-1048K llama-3-8B just-the-context-lora [--no-lazy-unpickle] --rank=desired_rank And boom, in theory, you have a tiny extracted LoRA file that captures only the difference between these two models, the base and its finetune. It's really exciting stuff to be able to do brain surgery on these models and extract only one specific essence! First LLama-3 finetunes that beat the instruct version Folks at Nous Research give us a new Hermes-Pro on top of Llama-8B (X, HF) that is beating the llama-3 instruct on benchmarks, which is apparently very hard to do, given that Meta created a LOT of human-labeled instructions (10M or so) and gave us a really really good instruct model. Nous Hermes 2 pro is also giving Llama-3 additional superpowers like function calling and tool use, specifically mentioning that this is the model to use if you do any type of agentic stuff. This new version of Hermes maintains its excellent general task and conversation capabilities - but also excels at Function Calling, JSON Structured Outputs, and has improved on several other metrics as well, scoring a 90% on our function calling eval.
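The LoRD trick described above — recovering a small low-rank "LoRA" delta between a finetune and its base — is easy to illustrate with plain SVD on the weight difference. A toy numpy sketch (illustrating the concept only, not MergeKit's actual implementation, which works layer by layer on real checkpoints):

```python
# Toy LoRA extraction: if a finetune differs from its base by a low-rank
# update (as LoRA training guarantees), SVD of the delta recovers two small
# factor matrices that reproduce it. Random toy weights, not real models.
import numpy as np

rng = np.random.default_rng(0)
d, rank = 64, 4

base = rng.normal(size=(d, d))
# Pretend finetuning added a rank-4 update, like a merged LoRA would.
A = rng.normal(size=(d, rank))
B = rng.normal(size=(rank, d))
finetuned = base + A @ B

# "Extraction": SVD the difference, keep only the top `rank` components.
delta = finetuned - base
U, S, Vt = np.linalg.svd(delta)
lora_down = U[:, :rank] * S[:rank]   # (d, rank)
lora_up = Vt[:rank, :]               # (rank, d)

# The two small factors reconstruct the full delta to machine precision.
err = np.abs(delta - lora_down @ lora_up).max()
print(err < 1e-8)  # True
```

The payoff is size: instead of storing a second full model you keep two d-by-rank matrices, which is the "tiny LoRA file" the command above produces.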
Hey hey folks, happy ThursdAI 🎉 Not a lot of house-keeping here, just a reminder that if you're listening or reading from Europe, our European conference is happening May 15 in London, and you're more than welcome to join us there. I will have quite a few event updates in the upcoming show as well. Besides this, this week has been a very exciting one for smaller models, as Microsoft teased and then released Phi-3 with an MIT license, a tiny model that can run on most macs with just 3.8B parameters, and is really punching above its weight. To a surprising and even eyebrow-raising degree! Let's get into it 👇 TL;DR of all topics covered: * Open Source LLMs * Microsoft open sources Phi-3 (X, HF)* LLama3 70B top 5 (now top 6) on LMsys (LMsys Arena)* Snowflake open sources Arctic - A massive hybrid MoE (X, Try it, HF)* Evolutionary Model merges support in MergeKit (Blog)* Llama-3 8B finetunes roundup - Longer Context (128K) and Dolphin & Bagel Finetunes* HuggingFace FINEWEB - a massive 45TB (the GPT4 of datasets) and 15T tokens high-quality web data dataset (HF)* Cohere open sourced their chat interface (X)* Apple open sources OpenElm 4 models + training library called corenet (HF, Github, Paper)* Big CO LLMs + APIs* Google Gemini 1.5 pro is #2 on LMsys arena * Devin is now worth 2BN and Perplexity is also a Unicorn * A newcomer called Augment (backed by Eric Schmidt) is now coming out of stealth (X)* Vision & Video* Adobe releases VideoGigaGAN - high quality upscaler with temporal consistency (paper)* TLDraw autocomplete UI demo (X)* This Week's Buzz - What I learned in WandB this week* Joe Spisak's talk about Llama3 on Stage at WandB Fully connected (Full Talk, TLDR)* Voice & Audio* (previously releases conversational Voice AI platform (X)* AI Art & Diffusion & 3D* like LMsys but for image 
generation model + leaderboard from FAL (try it)* Tools & Hardware* Rabbit R1 release party & no shipping update in sight* I'm disillusioned about my AI Pin and will return it Open Source LLMs Llama-3 1 week-aversary 🎂 - Leaderboard ranking + finetunes Well, it's exactly 1 week since we got Llama-3 from Meta and as expected, the rankings show a very very good story. (also it was downloaded over 1.2M times and already has 600 derivatives on HuggingFace) Just on Monday, Llama-3 70B (the bigger version) took the incredible 5th place (now down to 6th) on LMSys, and more surprising, given that the Arena now has category filters (you can filter by English only, Longer chats, Coding etc), if you switch to English Only, this model shows up 2nd and was number 1 for a brief period of time. So just to sum up, an open weights model that you can run on most current consumer hardware is taking over GPT-4-0409, Claude Opus etc. This seems dubious, because well, while it's amazing, it's clearly not at the level of Opus/latest GPT-4 if you've used it; in fact it fails some basic logic questions in my tests, but it's a good reminder that it's really hard to know which model outperforms which, that the arena ALSO has a bias (of who is using it, for example), and that evals are not a perfect way to explain which models are better. However, LMsys is a big component of the overall vibes-based eval in our community and Llama-3 is definitely a significant drop and it's really really good (even the smaller one) One not so surprising thing about it, is that the Instruct version is also really really good, so much so, that the first finetune, Eric Hartford's Dolphin (Dolphin-2.8-LLama3-70B), is improving just a little bit over Meta's own instruct version, which is done very well. Per Joe Spisak's (Product Director @ Meta AI) chat at the Weights & Biases conference last week (which you can watch below) he said "I would say the magic is in post-training. 
That's where we are spending most of our time these days. Uh, that's where we're generating a lot of human annotations." and they, with their annotation partners, generated up to 10 million annotation pairs, both PPO and DPO, and then did instruct finetuning. So much so that Jeremy Howard suggests finetuning their instruct version rather than the base model they released. We also covered that despite the first reactions to the 8K context window, the community quickly noticed that extending the context window for LLama-3 is possible, via existing techniques like RoPE scaling, YaRN and a new PoSE method. Wing Lian (maintainer of the Axolotl finetuning library) is stretching the model to almost a 128K context window and doing NIH tests and it seems very promising! Microsoft releases Phi-3 (Announcement, Paper, Model) Microsoft didn't really let Meta take the open models spotlight, and comes with an incredible report and follows up with a model release that's MIT licensed, tiny (3.8B parameters) and performs very very well even against Llama-3 70B. Phi is a set of models from Microsoft that train on a synthetic high-quality dataset modeled after the textbooks-are-all-you-need/TinyStories approach. The chart is quite incredible, the smallest (mini) Phi-3 is beating Llama-3-8B AND Mixtral on MMLU scores, BigBench and Humaneval. Again to simplify, this TINY 3.8B model, half the size of 1 Mixtral expert, beats Mixtral and the newly released Llama-3-8B on most benchmarks, not to mention GPT-3.5! It's honestly quite a crazy chart to look at, which raises the question, did this model train on these benchmarks? 🤔 I still haven't seen definitive proof that the folks at Microsoft trained on any benchmark data, I did see engagement from them and a complete denial, however we did see a few attempts at using Phi-3 where the quantized versions and wrong end-token formatting seemed very prevalent in shaping the early opinion that this model's performance is detached from its very high scores. 
Not to mention that, the model being new, there's confusion about how to use it; see the thread from Anton Bacaj about HuggingFace potentially using the wrong end token to finish conversations. Now to the actual performance of this tiny model: I asked it a simple logic-based question that trips many models, even ones good with logic (Opus and GPT-4 usually answer it correctly), and it performed very well (here's a comparison with LLama-3-70B, which didn't do as well). Additionally, their tokenizer is very interesting, they have all these terms that receive a full token, things like function_list, calc, ghreview, ghissue, and others, which highlight some interesting potential use-cases they have planned for this set of models, or give us a hint at its training process and why it's so good. Snowflake open sources Arctic - a massive 480B MoE Hybrid with Apache 2 license (X, Try it, HF) Snowflake is a name I haven't yet used on ThursdAI and this field is getting crowded, but they just released something interesting (+ a LOT of open source, including training code, checkpoints, research insights etc.) The thing I found most interesting is the massive 128-expert MoE combined with the hybrid architecture. Not quite an MoE and definitely not a dense model. They claim to have found that training many-but-condensed experts with more expert choices is working well for them, based on DeepSpeed research. 
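For readers new to the MoE idea: a small router scores every expert per token but only the top-k experts actually run, which is how a model with huge total parameters stays cheap per token. A minimal sketch of top-k gating with toy dimensions and random weights (not Snowflake's actual architecture):

```python
# Minimal top-k mixture-of-experts routing sketch (toy sizes, random weights).
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

router = rng.normal(size=(d_model, n_experts))            # gating projection
experts = rng.normal(size=(n_experts, d_model, d_model))  # tiny "expert" mats

def moe_forward(x: np.ndarray) -> np.ndarray:
    logits = x @ router                    # score all experts for this token
    chosen = np.argsort(logits)[-top_k:]   # indices of the top-k experts
    gates = np.exp(logits[chosen] - logits[chosen].max())
    gates /= gates.sum()                   # softmax over the chosen experts
    # Only top_k of n_experts are evaluated: that's the MoE compute saving.
    return sum(g * (x @ experts[i]) for i, g in zip(chosen, gates))

token = rng.normal(size=d_model)
out = moe_forward(token)
print(out.shape)  # (16,)
```

With 128 experts and a small top-k, the active parameter count per token is a small fraction of the total, which is the trade-off Arctic leans on.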
You can give this model a try here and I have, using the same 2 questions I had for Phi and LLama, and found the model not that great at logic to be honest, but it was really fast considering the total size, so inference optimization for this type of architecture is definitely geared towards Enterprise (as well as training cost, they claim it cost just under $2 million to train) Big CO LLMs + APIs Not a lot of super interesting things in this corner, besides Gemini 1.5 pro (the one with the 1M context window) finally appearing in the Arena and taking the amazing #2 spot (pushing Llama-3 70B to number 6 on the same day it just appeared in there lol) This is very impressive, and I gotta wonder what happened with Gemini Ultra if pro with larger context beats it outright. It's indeed very good, but not THAT good if you use it on simple logic problems and don't use the whole context length. I suspect that we'll hear much more about their AI stuff during the upcoming Google IO (which I was invited to and am going to cover) Additionally, we've had quite a few AI Unicorns born, with Perplexity becoming a freshly minted Unicorn with an additional round of funding and Devin, the 6-month-old agent startup, getting to a 2 billion valuation 😮 This week's Buzz (What I learned with WandB this week) It's been exactly 1 week since our conference in SF and since Joe Spisak by complete chance announced Meta LLama-3 live on stage a few hours after it was officially announced. In this week's buzz, I'm very happy to bring you that recording, as promised last week. I will also share that our newly announced LLM observability tool Weave launched officially during the conference and it'll be my job to get you to use it 🙂 And shoutout to those in the ThursdAI community who already used it and provided feedback, it's really helpful! AI Art & Diffusion The fine folks at Fal have launched the Arena for images, and called it.... 
๐Ÿ™‚ It's a adversarial arena with different image generators, all hosted on Fal I assume, that lets the user choose which images are "better" which is a vague term. But it's really fun, give it a try! Tools & HardwareRabbit R1 first impressionsWe finally got a tease of R1 from Rabbit, as the first customers started receiving this device (where's mine?? I didn't even get a tracking number) Based on the presentation (which I watched so you don't have to) the response time, which was one of the most talked about negative pieces of AI Pin seems very decent. We're going to see a lot of reviews, but I'm very excited about my Rabbit ๐Ÿ‘ ๐Ÿ‡ Apparently
Happy LLama 3 day folks! After a lot of rumors, speculations, and apparently pressure from the big Zuck himself, we can finally call April 18th, 2024, LLaMa 3 day! I am writing this from the lobby of the Marriott hotel in SF, where our annual conference, called Fully Connected, is happening, and I recorded today's episode from my hotel room. I really wanna shout out how awesome it was to meet folks who are listeners of the ThursdAI pod and newsletter subscribers, participate in the events, and give high fives. During our conference, we had the pleasure of having Joe Spisak, the Product Director of LLaMa at Meta, actually announce LLaMa3 on stage! It was so exhilarating, I was sitting in the front row, and then had a good chat with Joe outside of the show 🙌 The first part of the show was of course LLaMa 3 focused, we had such a great time chatting about the amazing new 8B and 70B models we got, and salivating after the announced but not yet released 400B model of LLaMa 3 😮 We also covered a BUNCH of other news from this week, which was already packed with tons of releases and AI news, and I was happy to share my experiences running a workshop a day before our conference, with a focus on LLM evaluations. (If there's interest, I can share my notebooks and maybe even record a video walkthrough, let me know in the comments) Ok let's dive in 👇 Happy LLama 3 day 🔥 The technical details Meta has finally given us what we've all been waiting for, incredibly expensive (2 clusters of 24K H100s over 15 Trillion tokens) open weights models: the smaller 8B one and the larger 70B one. 
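To put "2 clusters of 24K H100s over 15 Trillion tokens" in perspective, the common back-of-envelope rule is that training takes roughly 6 FLOPs per parameter per token (an approximation, not a figure Meta disclosed):

```python
# Back-of-envelope training compute for the 70B model via the 6*N*D rule
# (~6 FLOPs per parameter per training token; a rough community heuristic).
params = 70e9    # 70B parameters
tokens = 15e12   # 15T training tokens
flops = 6 * params * tokens
print(f"{flops:.1e} total training FLOPs")  # 6.3e+24 total training FLOPs
```

Even spread across tens of thousands of GPUs, that works out to weeks of continuous training at realistic utilization, which is why the quoted cluster sizes matter.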
We got both instruction finetuned and base models, which are great for finetuners, and it's worth mentioning that this is a dense model (not a mixture of experts; all the parameters are accessible to the model during inference). It is REALLY good at benchmarks, with the 8B model beating the previous generation's LLaMa 2 70B on pretty much all benchmarks, and the new 70B is inching up on the bigger releases from the past month or two, like Claude Haiku and even Sonnet! The only downsides are the 8K context window + non-multimodality, but both are coming according to Joe Spisak, who announced LLama3 on stage at our show Fully Connected 🔥 I was sitting in the front row and was very excited to ask him questions later! By the way, Joe did go into details they haven't yet talked about publicly (see? I told you to come to our conference! and some of you did!) and I've been live-tweeting his whole talk + the chat outside with the "extra" spicy questions and Joe's winks haha, you can read that thread here The additional info Meta has also partnered with both Google and Bing (take that OpenAI) and inserted LLama 3 into the search boxes of Facebook, Instagram, Messenger and Whatsapp, plus deployed it to a new product called (you can try it there now), and is now serving LLama 3 to more than 4 Billion people across all of those apps, talk about compute cost! Llama 3 also has a new Tokenizer (that Joe encouraged us to "not sleep on") and a bunch of new security tools like Purple LLama and LLama Guard. The PyTorch team's recently released finetuning library, TorchTune, now supports LLama3 finetuning natively out of the box as well (and integrates Wandb as its first-party experiment tracking tool). If you'd like more details directly from Joe, I was live tweeting his whole talk, and am working on getting the slides from our team. We'll likely have a recording as well, will post it as soon as we have it. 
Here's a TL;DR (with my notes for the first time) of everything else we talked about, but given today is LLaMa day, and I still have to do Fully Connected demos, I will "open source" my notes and refer you to the podcast episode to hear more detail about everything else that happened today 🫡 TL;DR of all topics covered: * Meta releases LLama 3 - 8B, 70B and later 400B (Announcement, Models, Try it, Run Locally)* Open Source LLMs * Meta LLama 3 8B, 70B and later 400B (X, Blog)* Trained on 15T tokens! * 70B and 8B models released + Instruction finetuning* 8K context length, not multimodal* 70B gets 82% on MMLU and 81.7% on HumanEval* 128K vocab tokenizer* Dense model, not MoE* Both instruction tuned on human annotated datasets* Open Access* The model already uses RoPE * Bigxtral instruct 0.1 (Blog, Try it)* Instruct model of the best Apache 2 model around* Released a comparison chart that everyone started "fixing" * 🤖 Mixtral 8x22B is Mistral AI's latest open AI model, with unmatched performance and efficiency * 🗣 It is fluent in 5 languages: English, French, Italian, German, Spanish* 🧮 Has strong math and coding capabilities * 🧠 Uses only 39B parameters out of 141B total, very cost efficient* 🗜 Can recall info from large documents thanks to 64K token context window* 🆓 Released under permissive open source license for anyone to use* 🏆 Outperforms other open models on reasoning, knowledge and language benchmarks * 🌐 Has strong multilingual abilities, outperforming others in 4 languages* 🧪 Excellent basis for customization through fine-tuning* New Tokenizer from Mistral (Docs)* Focusing on Tool Use with tokens 🔥* WizardLM-2 8x22B, 70B and 7B (X, HF)* Released it and then pulled it back from HF and Github due to not passing Microsoft's toxicity testing* Big CO LLMs + APIs* OpenAI gives us Batch API + Assistants API v2 * Batch is 50% of the cost, win win win* Assistants API V2 - new RAG* new file search tool* up to 10,000 files per assistant* new 
vector store* Reka gives us Reka Core (X, Try)* Multimodal that understands video as well* 20-person team* Video understanding is very close to Gemini * 128K context * Core has strong reasoning abilities including for language, math and complex analysis.* 32-language support * HuggingFace iOS chat bot now * This week's Buzz* Me + team led a workshop a day before the conference (Workshop Thread)* Fully Connected in SF was an incredible success, over 1000 AI attendees + Meta AI announcement on stage 🔥 * PyTorch's new TorchTune finetuning library with first-class WandB support (X)* Vision & Video* Microsoft VASA-1 animated avatars (X, Blog)* Amazing level of animation from 1 picture + Sound* Harry Potter portraits are here* They likely won't release this during an election year* Looks very good, close to EMO but no code* 📺 Videos show faces speaking naturally with head movements and lip sync* 🔬 Researchers are exploring applications in education, accessibility and more* HuggingFace updates IDEFICS2 8B VLM (X, HF)* Apache 2 license* Competitive with 30B models* 12 point increase in VQAv2, 30 point increase in TextVQA (compared to Idefics 1)* > 10x fewer parameters than Idefics 1* Supports image resolution up to 980 x 980+* Better OCR capabilities (thanks to more than 6TB of OCR pre-training data)* Adobe shows Firefly video + SORA support (X)* Voice & Audio* Rewind AI is now Limitless (X)* New service & brand name* Transcription to you * Hardware device that looks sleek * 100 hours * Privacy support in cloud* AI Art & Diffusion & 3D* Stability - Stable Diffusion 3 is here * Available via API only* Partnered with Fireworks HQ for the release* Needs a Stability AI membership to use / access $$* Big step up in composition and notorious issues like hands, "AI faces" etc.* Seems to prefer simpler prompts.* Way more copyright-friendly. It's hard to get any kind of brands/logos. 
* Text is amazing.* Others* New AIrChat with amazing transcription is out, come join us in our AI corner there* Humane AI pin was almost killed by MKBHD review* Rabbit reviews incoming That's all for this week, next week we have an amazing guest, see you then! 🫡
This week was absolutely bonkers. For starters, for the first time ever, we got an Open Weights model (Command R+) to jump over GPT-4 in human rankings on LMsys, this is huge! Then on Tuesday, it seems that all the companies just wanted to one-up one another: first Gemini 1.5 released with updates, made available in 180 countries, adding audio mode + tons of API improvements and system prompts; then less than an hour later, OpenAI gave us a "majorly improved" GPT-4 Turbo version (2024-04-09) that is now back to being the BEST LLM IN THE WORLD; and to cap that day off, Mistral did the thing again, the thing being, dropping a torrent link in a tweet with no explanations. What was in that torrent is a Mixtral 8x22B MoE (which we started calling Bixtral) which comes with an Apache 2 license and seems to be VERY good! We also saw the first finetune from HuggingFace/KAIST folks less than 48 hours later (the authors of said finetune actually came on the show 🎉). Fully Connected is a week from today! If you haven't yet signed up, use the THURSDAI promo code and come hear from Richard Socher, Jerry Liu (LlamaIndex CEO), Karoly (TwoMinutePapers), Joe Spisak (Meta) and leaders from NVIDIA, Snowflake, Microsoft, Coatue, Adobe, Siemens, Lambda and tons more 👇 TL;DR of all topics covered:* Open Source LLMs* 🔥 Mistral releases Mixtral 8x22 Apache 2 licensed MoE model (Torrent, TRY IT)* Cohere CMDR+ jumps to No. 6 on LMSys and beats GPT4 (X)* CodeGemma, RecurrentGemma & Gemma Instruct 1.1 (Announcement)* Auto-code-rover gets 22% on SWE bench (Announcement)* HuggingFace - Zephyr 141B-A35B - First Bixtral Finetune (Announcement)* Mistral 22B - 1 single expert extracted from MoE (Announcement, HF)* This week's Buzz - Weights & Biases updates* FullyConnected is in 1 week! 
(Come meet us)* Big CO LLMs + APIs* 🔥 GPT-4 turbo is back to being the number 1 AI with an 88.2% HumanEval score (X)* Gemini 1.5 Pro now understands audio, uses unlimited files, acts on your commands, and lets devs build incredible things with JSON mode (X)* LLama 3 coming out in less than a month (confirmed by Meta folks)* XAI Grok now powers news summaries on X (Example)* Cohere new Rerank 3 (X)* Voice & Audio* HuggingFace trained Parler-TTS (Announcement, Github)* Udio finally launched its service (Announcement, Leak, Try It)* Suno has added explore mode* Hardware* Humane AI pin has started shipping - reviews are not amazing Open Source LLMs Command R+, the first open weights model that beats last year's GPT-4 versions This is massive, really a milestone to be discussed: even though tons of other news happened, it's the first time an open weights model is beating GPT-4, not on a narrow case (coding, medical), but on a general human evaluation on the arena. This happened just a year after GPT-4 first came out, and is really really impressive. Command R+ has been getting a lot of great attention from the community as well, folks were really surprised by the overall quality, not to mention the multilingual abilities of CommandR+. Mixtral 8x22B MoE with 65K context and Apache 2 license (Bigstral) Despite the above, Cohere's time in the sun (i.e. top open weights model on lmsys) may not be that long if the folks at Mistral have anything to say about it! Mistral decided to cap the crazy Tuesday release day with another groundbreaking tweet of theirs which includes a torrent link and nothing else (since then they of course uploaded the model to the hub), giving us what potentially will unseat Command R from the rankings. The previous Mixtral (8x7B) signaled the age of MoEs, and each expert in that was a Mistral 7B-sized model, but for this new affectionately named Bixtral model, each expert is a massive 22B-sized model. We only got a base version of it, which is incredible in its own right, but 
it's not instruction finetuned yet, and the finetuner community is already cooking really hard! Though it's hard because this model requires a lot of compute to finetune, and not only GPUs; Matt Shumer came on the pod and mentioned that GPUs weren't actually the main issue, it was system RAM when the finetune was finished. The curious thing about it was watching the loss and the eval loss: "it [Bixtral] learns much faster than other models" - Matt Shumer. Matt was trying to run finetunes for Bigstral and had a lot of interesting stuff to share, definitely check out that conversation on the pod. Bigstral is... big, and it's not super possible to run it on consumer hardware.... yet, because Nisten somehow got it to run on CPU only 🤯 using Justine Tunney's LLM kernels (from last week) and LLama.cpp at 9 tok/s, which is kinda crazy. HuggingFace + KAIST release Zephyr 141B-A35B (First Mixtral 8x22 finetune) And that was fast, less than 48 hours after the torrent drop, we already see the first instruction finetune from folks at HuggingFace and KAIST AI. They give us a new finetune using ORPO, a technique by KAIST that significantly improves finetuning ability (they finetuned Bigstral with 7k capybara instructions for 1.3 hours on 4 nodes of 8 x H100s). They used the distilled Capybara Dataset (from LDJ and Argilla) to give this model a bit more clarity and instruction following. You can find the model on the hub here, but now the question is, how would one run this? 😅 Btw the authors of the finetune and the ORPO paper from KAIST, Jiwoo Hong and Noah Lee, came on the pod and chatted about this finetune and ORPO, which was awesome! 
Definitely check this conversation out. Big CO LLMs + APIs Gemini 1.5 Pro updates - Audio Mode, JSON, System prompts, and becomes free Google really pulled out all the stops for this updated release of Gemini 1.5 Pro, its flagship 1M-context-window model. It's now available for free in over 180 countries, and has a new audio mode where you can upload up to 9.5 hours of audio (which is crazy on its own), and it's not merely transcription; it seems that they baked an audio encoder in there, so the model can understand some tonality and even some dogs barking in the background! In fact, instead of me writing it down, how about I show you an example of Gemini itself extracting everything I said about it during the show? Here's a screenshot of me uploading 2+ hours of raw unedited audio from the show today: You can see the Google AI studio (which is a very clean product!) and the new system message, the ability to turn the safety filters off (thank you!) and the audio mode. Not to mention the 250K tokens 😂 that my audio cost this model. Mind you, the highest context window after Gemini is Claude 3 with 200K. Google also significantly improved the APIs, and gave access to a new file upload API that allows uploads of files up to 2GB (to support this amazing context and multimodality) 🔥 OpenAI - GPT-4 Turbo, a new and "majorly improved" version Remember when Gemini 1.5 was announced? 
You may not remember that specific day, because an hour after that, OpenAI published SORA and blew our collective minds. Well, OpenAI is at it again, but this time it didn't quite work the same way: an hour after the Gemini 1.5 updates came out, OpenAI released GPT4-Turbo-April-9, aka gpt-4-turbo-2024-04-09, and basically all they said was that it's "majorly improved". The technical stuff first: they combined the tool use (function calling) API with the Vision API, which brings feature parity with Anthropic. The vibes are currently good, folks are seeing improvements across the board in logic and code creation; specifically, the folks at Cursor posted an example (and enabled this model in their IDE) where it writes higher quality code. As I'm writing these words, LMSys updated us that this new model shot up to the top of the arena, taking the mantle back from Opus as the best AI we have, and also a confirmation from OpenAI that this model is now powering the chatGPT interface 👏 OpenAI also just open sourced a repo to show what they used to get these exact scores for the new GPT-4, and they are impressive. This week's Buzz (What I learned with WandB this week) Final Call! Fully Connected, our very own annual conference, is about to commence (hehe, of course it's happening on a ThursdAI, I still have to think about how to record the show next week) Please feel free to use the code THURSDAI to sign up and come see us. As a reminder, we're also running a workshop a day before, where we're going to showcase Weave and give practical examples for LLM builders, and it's going to be a lot of fun! Looking forward to seeing some of you there! Audio & Voice Udio launches a Suno-competitor AI music service For the past week+ I've seen tons of AI-plugged folks in SF post about "a new AI for music is coming and it's going to be amazing". 
Well, it's finally here. It's called Udio, and it gives Suno a run for its money for sure. With the ability to create full tracks, create intros and outros, remix, and a much-needed AI-enhanced prompting mode, Udio looks very, very polished and sounds GOOD! Here is an example of a classical music track that's been going viral:

I've played a few more examples on the show itself, and you can check out the trending creations on their page. Interestingly, this is probably a diffusion model, and so folks have been squeezing all kinds of non-musical stuff out of there, including stand-up comedy with a full laugh track.

Suno adds explore mode

Meanwhile, Suno is not going down without a fight and has released an amazing new page where they generated thousands of samples for hundreds of interesting/weird sound styles, letting you discover and learn about different musical styles. I really liked it, so I recorded a short reaction video:

Phew, somehow we made it; we were able to summarize the huge news this week in under two hours + a newsletter! The one thing I haven't been able to do is actually try out much of the stuff I talked about, so after writing this, I'll take a little break and delve into some of the other things I haven't yet tried 👀

See you guys next week in limited capacity (maybe, we'll see) and until then, have a
Happy first ThursdAI of April folks, did you have fun on April Fools? 👀 I hope you did. I made a poll on my feed and 70% did not participate in April Fools, which makes me a bit sad! Well, all right, time to dive into the news of this week, and of course there are TONS of news, but I want to start with our own breaking news!

That's right, we at Weights & Biases have breaking news of our own today: we've launched our new product called Weave! Weave is our new toolkit to track, version and evaluate LLM apps, so from now on, we have Models (what you probably know as Weights & Biases) and Weave. So if you're writing any kind of RAG system, anything that uses Claude or OpenAI, Weave is for you! I'll be focusing on Weave and sharing more on the topic, but today I encourage you to listen to the launch conversation I had with Tim & Scott from the Weave team here at WandB, as they and the rest of the team worked their asses off for this release and we want to celebrate the launch 🎉

TL;DR of all topics covered:

* Open Source LLMs
* Cohere - CommandR PLUS - 104B RAG optimized Sonnet competitor (Announcement, HF)
* Princeton SWE-agent - OSS Devin - gets 12.29% on SWE-bench (Announcement, Github)
* Jamba paper is out (Paper)
* Mozilla LLamaFile now goes 5x faster on CPUs (Announcement, Blog)
* Deepmind - Mixture of Depth paper (Thread, ArXiv)
* Big CO LLMs + APIs
* Cloudflare AI updates (Blog)
* Anthropic adds function calling support (Announcement, Docs)
* Groq lands function calling (Announcement, Docs)
* OpenAI is now open to customers without login requirements
* Replit Code Repair - 7B finetune of deep-seek that outperforms Opus (X)
* Google announced Gemini Prices + Logan joins (X)
* This week's Buzz - oh so much BUZZ!
* Weave launch! Check Weave out!
(Weave Docs, Github)
* Sign up with Promo Code THURSDAI at
* Voice & Audio
* OpenAI Voice Engine will not be released to developers (Blog)
* Stable Audio v2 dropped (Announcement, Try here)
* Lightning Whisper MLX - 10x faster than whisper.cpp (Announcement, Github)
* AI Art & Diffusion & 3D
* Dall-e now has in-painting (Announcement)
* Deep dive
* Jamba deep dive with Roi Cohen from AI21 and Maxime Labonne

Open Source LLMs

Cohere releases Command R+, 104B RAG focused model (Blog)

Cohere surprised us, and just 2.5 weeks after releasing Command-R (which became very popular and is No 10 on the LMSys arena), gave us its big brother, Command R PLUS. With 128K tokens in the context window, this model is multilingual as well, supporting 10 languages, and it even improves tokenization for those languages (a first!). The main focus from Cohere is advanced function calling / tool use, and RAG of course, and this model specializes in those tasks, beating even GPT-4 Turbo. It's clear that Cohere is positioning themselves as RAG leaders, as evidenced by this accompanying tutorial on starting with RAG apps, and this model further solidifies their place as the experts in this field. Congrats folks, and thanks for the open weights 🫡

SWE-Agent from Princeton

Folks remember Devin? The agent with a nice UI from a super-cracked team, which got 13% on SWE-bench, a very hard (for LLMs) benchmark that requires solving real world issues? Well, now we have an open source agent that comes very, very close to that, called SWE-agent. SWE-agent has a dedicated terminal and tools, and utilizes something called ACI (Agent Computer Interface), allowing the agent to navigate, search, and edit code. The dedicated terminal in a docker environment really helps, as evidenced by a massive 12.3% score on SWE-bench, where GPT-4 gets only 1.4%!
Worth mentioning that SWE-bench is a very hard benchmark that was created by the same folks who released SWE-agent, and here are some videos of them showing the agent off; this is truly an impressive achievement!

Deepmind publishes Mixture of Depth (arXiv)

Thanks to Hassan, who read the paper and wrote a deep dive. This paper by Deepmind shows their research into optimizing model inference: apparently there's a way to train LLMs without affecting their performance, which later allows them to significantly reduce compute on some generated tokens.

🧠 Transformer models currently spread compute uniformly, but Mixture-of-Depths allows models to dynamically allocate compute as needed
💰 Dynamically allocating compute based on the difficulty of predicting each token leads to significant compute savings
⏳ Predicting the first token after a period is much harder than within-sentence tokens, so more compute is needed
🗑 Most current compute is wasted since difficulty varies between tokens

We're looking forward to seeing models trained with this, as it seems to be a very big deal for optimizing LLM inference.

Thank you for reading ThursdAI - Best way to support us is to just share this with folks 👇

Big CO LLMs + APIs

Anthropic and Groq announce function calling / tool use support, Cohere takes it one step further

In yet another example of how OpenAI is leading not only in models, but in developer experience, most models and API providers are now using the same messages API structure. Back in June of 2023, OpenAI gave us function calling, and finally the industry is aligning to this format, which is now being rebranded as "tool use". If you're unfamiliar with the concept, tool use allows a developer to specify what tools the model has access to in addition to just spitting out tokens: think browsing the web, using RAG to get more information, checking the weather, or... turning off a lightbulb in your smart home.
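To make that concrete, a tool definition in this shared format is just a name, a description, and a JSON schema for the arguments. Here's a minimal, hypothetical weather tool as a sketch; the exact field names (e.g. "input_schema") vary slightly between vendors, so check each provider's docs:

```python
# A hypothetical "get_weather" tool definition in the JSON-schema style that
# OpenAI's function calling popularized and that other providers now mirror.
# Field names vary slightly between vendors; this is illustrative only.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Denver'"},
        },
        "required": ["city"],
    },
}
```

The model never runs the tool itself; it only emits the name and arguments, and your code does the rest.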
The LLM then decides, based on user input, whether a specific tool needs to be called, responds with the tool and the parameters it needs, then expects the result of that tool, and finally is able to respond to the user with the complete information. So this week we got Command R, Groq and Anthropic all adding support for tool use, which is incredible for developer experience across the board and will allow developers to move between all those APIs. Cohere goes one step further with something they call multi-step tool use, which is a significant step up and is very interesting to explore, as it gives their models the ability to rank and order tool execution, and observe the responses. (Anthropic Docs, Groq Docs)

Cloudflare Workers AI is now in GA + workers in Python

If you've been following ThursdAI, you know I'm a huge Cloudflare fan. I've built my startup on top of the Cloudflare Workers platform, and I gave them early feedback about having to step into AI in a big way. And they did, with Workers AI, which is now in GA. Workers AI lets developers in the Cloudflare ecosystem run LLMs (they mostly feature open source LLMs, which is incredible), host vectors, run Whisper, and basically build end-to-end serverless apps that are powered by AI (they have GPUs in 150 cities around the world). This week Cloudflare also announced the ability to write workers in Python, which was sorely missing for some folks (like me!) who love FastAPI for example. While it's not a full Python environment, the depth to which they had to go in order to allow Python to execute on their edge is kind of ridiculous; read up on it here. I'm hoping to work with them to bring Weave into the workers for Python soon 🤞 because building AI applications with Cloudflare is so simple; they even have a HuggingFace integration which allows you to bring models into your CF environment with 1 click.
This week's Buzz - SO MUCH BUZZ

Hey, well first of all, I can now offer you 15% off a ticket to our conference, so use THURSDAI when you check out and get a ticket here. Now that Weave is out, it's safe to say that our workshop on April 17 (same link as above) is going to be focused on LLM evaluations, and yes, I will be talking about how to use Weave to build LLM applications in production safely. If this field is new to you, please sign up and come to the workshop!

JAMBA deep dive with Roi @ AI21 and Maxime Labonne

As always, what I cover in this newsletter is only the highlights of what we talked about, but there was so much more; I really recommend you listen to the episode. Think of this week's episode as 2 episodes (maybe I should re-release the deep dive as a separate episode), because we had a long conversation with Roi Cohen, who's a PM @ AI21, and Maxime Labonne (author of LazyMergeKit and the first finetune of JAMBA); it's really worth tuning into that interview. Here's a little snippet:

Aaaand this is it for this week. Or you know what? Maybe it's not! I shared this on X, but if you don't follow me on X: I decided to prank my whole feed by saying that I'm basically changing careers and becoming a Russian AI DJ, called DJ Thursday, who will only play AI generated music. The weird thing is how many people were like, yeah ok, this makes sense for you 😅 So here's my April Fools joke (one of them), hope you enjoy the high quality of these tunes, and see you all next week 🫡

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit
Hey everyone, this is Alex, and can you believe that we're almost done with Q1 2024? March 2024 was kind of crazy of course, so I'm excited to see what April brings (besides the Weights & Biases conference in SF called Fully Connected, which I encourage you to attend and say hi to me and the team!)

This week we have tons of exciting stuff on the leaderboards: say hello to the new best AI in the world, Opus (+ some other surprises). In open source we had new MoEs (one from the Mosaic/Databricks folks, which tops the open source game; one from AI21 called Jamba, which shows that a transformers alternative/hybrid can actually scale; and a tiny MoE from Alibaba), as well as an incredible emotion TTS from Hume. I also had the pleasure to finally sit down with friend of the pod Tanishq Abraham and Paul Scotti from MedArc and chat about MindEye 2, and how they teach AI to read minds using diffusion models 🤯🧠

Thank you for reading ThursdAI - Recaps of the most high signal AI weekly spaces. This post is public so feel free to share it.

TL;DR of all topics covered:

* AI Leaderboard updates
* Claude Opus is number 1 LLM on arena (and in the world)
* Claude Haiku passes GPT4-0613
* 🔥 Starling 7B beta is the best Apache 2 model on LMsys, passing GPT3.5
* Open Source LLMs
* Databricks/Mosaic DBRX - a new top Open Access model (X, HF)
* 🔥 AI21 - Jamba 52B - Joint Attention Mamba MoE (Blog, HuggingFace)
* Alibaba - Qwen1.5-MoE-A2.7B (Announcement, HF)
* Starling - 7B that beats GPT3.5 on lmsys (HF)
* LISA beats LORA as the frontrunner PeFT (X, Paper)
* Mistral 0.2 Base released (Announcement)
* Big CO LLMs + APIs
* Emad leaves stability 🥺
* Apple rumors - Baidu, Gemini, Anthropic, who else?
(X)
* This week's buzz
* WandB Workshop in SF confirmed April 17 - LLM evaluations (sign up here)
* Vision & Video
* Sora showed some demos by actual artists, Air Head was great (Video)
* Tencent Aniportait - generate Photorealistic Animated avatars (X)
* MedArc - MindEye 2 - fMRI signals to diffusion models (X)
* Voice & Audio
* Hume demos EVI - empathic voice analysis & generation (X, demo)
* AI Art & Diffusion & 3D
* Adobe firefly adds structure reference and style transfer - (X, Demo)
* Discussion
* Deep dive into MindEye 2 with Tanishq & Paul from MedArc
* Is narrow finetuning done-for with larger context + cheaper prices - debate

🥇🥈🥉 Leaderboard updates from LMSys (Arena)

This week's updates to the LMSys arena are significant. (Reminder: LMSys uses a mix of MT-Bench, LLM-as-evaluator, and user ELO scores, where users play with these models and choose which answer they prefer.) For the first time since the LMSys arena launched, the top model is NOT GPT-4 based. It's now Claude's Opus, which isn't surprising if you've used the model. What IS surprising is that Haiku, its tiniest, fastest brother, is now well positioned at number 6, beating a GPT-4 version from the summer, Mistral Large and other models, all while being dirt cheap. We also had an incredible showing from the only Apache 2.0 licensed model in the top 15, Starling LM 7B beta, which is now 13th on the chart: an incredible finetune of a finetune (OpenChat) of Mistral 7B. 👏 Yes, you can now run a GPT3.5-beating model on your Mac, fully offline 👏 Incredible.
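If you're curious how those user-preference ELO scores actually move, the core mechanic is a simple pairwise update after each "which answer do you prefer" vote. A minimal sketch with illustrative constants (LMSys has since moved to a Bradley-Terry style fit, so this is the intuition, not their exact implementation):

```python
def elo_update(r_a, r_b, a_wins, k=32):
    """One pairwise ELO update after a user prefers model A (a_wins=1.0),
    model B (a_wins=0.0), or ties (0.5). k controls how fast ratings move."""
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))  # predicted win prob of A
    r_a += k * (a_wins - expected_a)
    r_b += k * ((1 - a_wins) - (1 - expected_a))
    return r_a, r_b

# Two equally rated models; the user prefers A:
print(elo_update(1000, 1000, 1.0))  # → (1016.0, 984.0)
```

The upshot: an upset (a low-rated model beating a high-rated one) moves ratings a lot, while an expected win barely moves them, which is how the arena converges with enough votes.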
Open Source LLMs (Welcome to the MoEs)

Mosaic/Databricks gave us DBRX 132B MoE - trained on 12T tokens (X, Blog, HF)

Absolutely crushing the previous records, Mosaic has released the top open access model (one you can download, run and finetune) in a while, beating LLama 70B, Grok-1 (314B) and pretty much every other non-closed-source model in the world, not only on metrics and evals, but also on inference speed. It uses a Mixture of Experts (MoE) architecture with 16 experts, different ones of which activate for different tokens; this allows it to have 36 billion active parameters, compared to 13 billion for Mixtral. DBRX has strong capabilities in math, code, and natural language understanding. The real kicker is the size: it was pre-trained on 12 trillion tokens of text and code with a maximum context length of 32,000 tokens, which is just incredible considering that LLama 2 was just 2T tokens. And the funny thing is, they call this DBRX-medium 👀 Wonder what large is all about.

Graph credit Awni Hannun from MLX (Source)

You can play with DBRX here, and you'll see that it is SUPER fast. Not sure what Databricks magic they did there, or how much money they spent (ballpark of ~$10M), but it's truly an awesome model to see in open access! 👏

AI21 releases JAMBA - a hybrid Transformer + Mamba 52B MoE (Blog, HF)

Oh don't I love #BreakingNews on the show! Just a few moments before ThursdAI, AI21 dropped this bombshell of a model, which is not quite the best around (see above) but has a few very interesting things going for it.
First, it's a hybrid architecture model, capturing the best of the Transformer and Mamba architectures, and achieving incredible performance at larger context window sizes (Transformer hardware requirements scale quadratically with attention/context window). AI21 are the first to show (and take the bet) that hybrid architecture models actually scale well and are performant (this model comes close to the Mixtral MoE on many benchmarks), while also being significantly cost advantageous and faster at inference on longer context windows. In fact, they claim that Jamba is the only model in its size class that fits up to 140K context on a single GPU! This is a massive effort and a very well received one, not only because this model is Apache 2.0 licensed (thank you AI21 👏) but also because it is now the longest context window model in the open weights (up to 256K), and we've yet to see the incredible amount of finetuning/optimization that the open source community can do once they set their mind to it! (See Wing from Axolotl adding support for finetuning Jamba the same day it released.) Can't wait to see the benchmarks for this model once it's properly instruction fine-tuned.
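To make the MoE mechanic behind DBRX, Mixtral and Jamba concrete: a learned router scores each token, only the top-k experts run on it, and their outputs are mixed by the normalized router scores. Here's a toy numpy sketch with made-up shapes and linear "experts" standing in for real FFN blocks; it's illustrative only, not any model's actual code:

```python
import numpy as np

def moe_forward(x, router_w, expert_ws, top_k=2):
    """Toy top-k MoE layer for a single token vector x of shape (d,).

    router_w:  (n_experts, d) router matrix -> one score per expert
    expert_ws: list of (d, d) matrices standing in for expert FFNs
    Only top_k experts run, which is why active params << total params."""
    scores = router_w @ x
    top = np.argsort(scores)[-top_k:]      # indices of the chosen experts
    gates = np.exp(scores[top])
    gates /= gates.sum()                   # softmax over chosen experts only
    return sum(g * (expert_ws[i] @ x) for g, i in zip(gates, top))
```

Because only a few experts touch any given token, total parameter count and per-token compute decouple, which is roughly how DBRX's 132B total parameters yield only ~36B active ones.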
Small MoE from Alibaba - Qwen 1.5 - MoE - A2.7B (Blog, HF)

What a week for Mixture of Experts models! We got an additional MoE from the awesome Qwen team, where they show that training an A2.7B (the full model is actually 14B, but only 2.7B parameters are activated at the same time) is cheaper: a 75% reduction in training costs and a 174% improvement in inference speed!

Also in open source: LISA beats LoRA for the best parameter-efficient training

📰 LISA is a new method for memory-efficient large language model fine-tuning presented in a Hugging Face paper
💪 LISA achieves better performance than LoRA with less time on models up to 70B parameters
🧠 Deep networks are better suited to LISA, providing more memory savings than shallow networks
💾 Gradient checkpointing greatly benefits LISA by only storing gradients for unfrozen layers
📈 LISA can fine-tune models with up to 7B parameters on a single 24GB GPU
🚀 Code implementation in LMFlow is very simple, only requiring 2 lines of code
🤔 LISA outperforms full parameter training in instruction following tasks

Big CO LLMs + APIs

Emad departs from Stability AI

In a very surprising (perhaps unsurprising to some) move, Emad Mostaque, founder and ex-CEO of Stability, announced his departure and a new focus on decentralized AI. For me personally (and I know countless others), our love for open source AI started with Stable Diffusion 1.4: downloading the weights, understanding that we can create AI on our machines, playing around with it. It wasn't easy; Stability was sued to oblivion (I think LAION is still down from a lawsuit), but we got tons of incredible open source from Stability, and tons of incredible people who work/worked there. Big shoutout to Emad, and very excited to see what he does next.

Throwback to NeurIPS where Emad borrowed my GPU Poor hat and wore it ironically 😂 Promised me a Stability hat but...
I won't hold it against him 🙂

This week's Buzz (What I learned with WandB this week)

I'm so stoked about the workshop we're running before the annual Fully Connected conference in SF! Come hear about evaluations, better prompting with Claude, and tons of insights that we have to share in our workshop, and of course, join the main event on April 18 with the whole Weights & Biases crew!

Vision

Sora was given to artists, they created ... art

Here's a short by a company called ShyKids, who got access to SORA alongside other artists. It's so incredibly human, and I love the way they used storytelling to overcome technological issues like lack of consistency between shots. Watch it and enjoy imagining a world where you could create something like this without leaving your living room. This also shows that human creativity and art are still deep in the middle of all these creations, even with tools like SORA.

MindEye 2.0 - faster fMRI-to-image

We had the awesome pleasure of hosting Tanishq Abraham and Paul Scotti, who recently released a significantly better version of their fMRI-to-image model called MindEye 2.0, shortening the data requirement from 40 hours to just 1 hour of fMRI data. This is quite remarkable, and I would encourage you to listen to the full interview that's coming out this Sunday on ThursdAI.

Voice

Hume announces EVI - their empathic voice interface (Announcement, Demo)

This one is big, folks. I was really blown away (see my blind reaction below). Hume announced EVI, a text to speech generator that can reply with emotions! It's really something, and it has to be experienced. This is in addition to Hume already understanding emotions via voice/imagery, and the whole end-to-end conversation with an LLM that understands what I feel is quite novel and exciting!
The Fine-Tuning Disillusionment on X

Quite a few folks noticed a sort of disillusionment with finetuning coming from some prominent pro-open-source, pro-fine-tuning accounts, leading me to post this: And we of course had to have a conversation about it; Hamel Husain also wrote a response blog called "Is Finetuning still valuable". I'll let you listen to the conversation, but I will say, like w
March madness... I know for some folks this means basketball or something, but since this is an AI newsletter, and this March was indeed mad, I am claiming it. This week seemed to get madder from one day to the next, and the AI announcements kept coming throughout the recording; I used the "breaking news" button a few times during this week's show!

This week we covered tons of corporate AI drama in the BigCO segment, from the Inflection → Microsoft move, to Apple-Gemini rumors, to the Nvidia GTC conference. But we also had a bunch of open source to go over, including an exciting glimpse into the 01 from Open Interpreter, whose founder Killian (of the ThursdAI mafia haha) joined to chat briefly after an all-nighter release push! Another returning FOTP (friend of the pod), Matt Shumer, joined as we did a little deep dive into prompting Claude, and how he went viral (seems to happen a lot to Matt) with a project of his to make Claude write prompts for itself! Definitely worth a listen; it's the first segment after the TL;DR on the pod 👂 this week.

Btw, did you already check out Fully Connected?
It's the annual Weights & Biases conference in SF next month, and tickets are flying. I'm going to be there and actually run a workshop one day prior; I would love to invite you to join as well!

TL;DR of all topics covered:

* Open Source LLMs
* xAI open sources Grok (X, Blog, HF, Github)
* Sakana AI releases a new paper + 2 JP merged SOTA models (X, Paper, Blogpost)
* Open Interpreter announces 01 - the Linux for AI devices (X, Project)
* LM studio new modes (X)
* Big CO LLMs + APIs
* Nvidia GTC conference - Blackwell platform, NIMs and Gr00t robotics
* Jensen interviewed transformers authors
* Apple rumored to look at a deal including GEMINI
* Apple releases a multi modal MM1 paper (X)
* Inflection founders leave to head Microsoft AI
* Google opens up Gemini 1.5 with 1M context access to all (X)
* Vision & Video
* NVIDIA + MIT release VILA (13B, 7B and 2.7B) (X, HuggingFace, Paper)
* This week's BUZZ
* Fully Connected is coming, sign up here, get tickets, join us.
* I'm running a workshop in SF a day before on improving your LLM step by step, including exciting announcements (same link)
* Voice & Audio
* Suno V3 launched officially (X, Blog, Play with it)
* Distil-whisper-v3 - more accurate, and a 6x faster version of whisper large (X, Code)
* AI Art & Diffusion & 3D
* Stability presents SD3 TURBO - 4 steps to get the same high quality generation (Paper)
* Stability open sources Stable Video 3D (Blog, Models)
* Tools & Others
* Neuralink interview with the first Human NeuroNaut - Nolan (X)
* Lex & Sama released a podcast, barely any news
* Matt Shumer releases his Claude Prompt engineer (X, Metaprompt, Matt's Collab)

Open Source LLMs

xAI open sources Grok (X, Blog, HF, Github)

Well, Space Uncle Elon had a huge week, from sending Starship into orbit successfully to open sourcing an LLM for us, and a huge one at that. Grok is a 314B parameter behemoth with a mixture-of-experts architecture, with two of its eight experts active for any given token (~86B active parameters).
It's released as a base model, and maybe that's why it was received with initial excitement, but then nobody in the GPU-poor compute category has the ability to run/finetune it! In terms of performance, it barely beats out Mixtral while being almost 10x larger, which just shows that... data is important, maybe more important than Github stars, as Arthur (CEO of Mistral) helpfully pointed out to Igor (founder of xAI). Still, big props to the team for training and releasing this model under an Apache 2 license.

Sakana AI launches 2 new models using evolutionary algorithm merging

Yeah, that's a mouthful. I've been following Hardmaru (David Ha) for a while, from before he joined Sakana, and only when the founder (and a co-author on transformers) Llion Jones talked about it on stage at GTC did the pieces connect. Sakana means fish in Japanese, and the idea behind this lab is to create things using nature-inspired methods like evolutionary algorithms. The first thing they open sourced was 2 new SOTA Japanese LLMs, beating significantly larger models by using merging (which we covered with Maxime previously, and whom Sakana actually shouted out in their work).

Open Interpreter announces 01 Light - the Linux of AI hardware devices

Breaking news indeed! After we saw the release of the R1 go viral in January, Killian (with whom we chatted previously in our most-favorited episode of last year) posted that if someone were to build the open source version of the R1, it would be super cool and fit the vision of Open Interpreter very well. And then MANY people did (more than 200), and the 01 project got started. Fast forward a few months, and we now have a first glimpse of (and the ability to actually pre-order) the 01 Light, their first device: a button that communicates with your computer (and in the future, with their cloud) and interacts with a local agent that runs code and can learn how to do things with a skill library.
It's all very, very exciting, and seeing how this idea went from an announcement on X to hundreds of folks collaborating and pushing it into the open has been incredible. We'll definitely do a deeper dive into capabilities and the whole project once the launch craziness dies down a bit (Killian joined us at the height of the launch all-nighter haha). This is poised to be the first open source AI device, complete with .stl files for 3D printing at home, chip designs, and the ability to run end to end locally on your Mac, and we really applaud the team for this release 🫡

Big CO LLMs + APIs

Nvidia GTC annual conference - new Blackwell platform, NIMs, robotics and everything AI + a chat with the transformer avengers

This week Nvidia had their annual GTC conference, where Jensen announced a ton of stuff, but the highlights were the new Blackwell chip (the next iteration of the H100) and the GB200 racks with a whopping 720 PFlops of compute (to put this number in perspective: the first DGX that Jensen delivered to OpenAI in 2016 was 0.17 petaflops). They also announced partnerships with pretty much everyone under the sun, a new way to deliver packaged AI experiences called NIMs (which we at Weights & Biases support as well), and a new foundational operating system for robotics called GR00T, led by Dr Jim Fan.

Jensen also had the whole cast of original transformers authors together on stage (and in the green room) for an hour, for the first time, to chat about, well... transformers. I really need to find the whole video and post it, because it's hidden inside the Nvidia GTC website, but it was a very fun chat, where the team reminisced about the naming and shared their thoughts on the future of LLMs. They also covered each author's company (all of them have since left Google) and what they all do. It was a great chat.
Microsoft buys Inflection (almost) and Apple considers buying Gemini

In other huge AI player news, 2 of the 3 founders of Inflection AI left to start Microsoft AI (together with some of the staff), namely Mustafa, who founded Inflection, then helped raise $1.8B, got up to 22K H100 GPUs, released Inflection 2.5 (which comes close to GPT-4), and then decided to leave. Inflection also pivoted away from consumer (Pi was a very nice AI to chat with) into API services, and apparently Microsoft will pay Inflection $650 million in the form of a licensing deal.

Meanwhile, there are rumors that Apple is eyeing Gemini to integrate into iOS, which is very weird given the recent bad press about Gemini (unless Apple doesn't want to deal with the same bad press themselves?), and it's even weirder given the latest push from Apple into open source. Folks at Apple this week released a new paper called MM1, outlining a new multimodal model they have trained (but not released), and show that it beats Gemini at visual understanding. It was also great to see that the authors of that model shouted out the Weights & Biases crew that helped them through their work on this paper 👏

Nolan - the first NeuroNaut (first human with a Neuralink implanted)

Just as I was summing up the notes for this week, Neuralink pinged that they were going to go live soon, and I tuned in to see a 20-year-old paraplegic gamer being interviewed by a Neuralink employee, very cheerful, all while playing a chess game with his brain. We've come a really long way since the monkey playing Pong, and Nolan described the experience of using Neuralink to control his Mac cursor as "like using The Force".
It was all kind of mind-blowing, and even though brain implants are nothing new, the fidelity and the wireless connection + the very quick surgery made this demo such a nonchalant thing that Nolan didn't even stop playing chess while being interviewed, probably not realizing that millions of people would be watching. They have a bunch of ML interpreting the signals that Nolan sends from his brain wirelessly, and while this is very exciting (Nolan is preparing for this Halloween as Professor X from X-Men, because, well, he's in fact a telekinesis-enabled human), Elon claimed that their next target is fixing blindsight (and that it already works on monkeys), presumably via camera input being triggered in the visual cortex. Back in November 2022, I watched the Neuralink keynote and geeked out so hard about the section where Dan Adams, one of the neuroscientists at Neuralink, talked about how it's possible to trigger/stimulate the visual cortex to fix blindness and generate an image.

Well, this is it folks. We talked about tons of other stuff of course, but these are the main points that made the cut into the newsletter. As always, if you want to support this newsletter/podcast, please share it with friends ❤️ Hope to see you in SF in April (I'll be giving more reminders, don't worry) and see you here next ThursdAI 🫡

P.S - I said Intel a bunch of times when I meant Nvidia; apologies, I didn't notice until after publishing 😅

This is a public episode. If you'd like to discuss this with other subscribers or get access to
"...Happy birthday dear ThursdAIiiiiiiii, happy birthday to youuuuuu ๐ŸŽ‚"What a day! Today is ฯ€-day (March 14th), 2024. For some reason it's important, not only because it's GPT-4 anniversary, or Claude 1 anniversary, or even that Starship flew to space, but also ๐Ÿฅ it's ThursdAI BirthdAI ๐ŸŽ‰ Yeah, you heard that right, last year following GPT-4 release, I hopped into a twitter space with a few friends, and started chatting about AI, and while some friends came and went, I never stopped, in fact, I decided to leave my 15 year career in software, and focus on AI, learning publicly, sharing my learnings with as many people as possible and it's been glorious. And so today, I get to celebrate a little ๐Ÿ’ƒI also get to reminisce about the state of AI that we were at, back exactly a year ago. Context windows were tiny, GPT-4 came out with 8K (we casually now have models with 200K that cost $0.25/1M tokens), GPT-4 also showed unprecedented levels vision capabilities back then, and now, we have 1.3B parameters models that have similar level of visual understanding, open source was nascent (in fact, LLama.cpp only had it's first commit 4 days prior to GPT4 launch, Stanford released the first Alpaca finetune of Llama just a day prior. Hell even the chatGPT API only came out a few days before, so there was barely any products built with AI out there. Not to mention that folks were only starting to figure out what vector DBs were, what RAG is, how to prompt, and that it's possible to run these things in a loop and create agents! Other fields evolved as well, just hit play on this song I generated for ThursdAI with Suno V3 alpha, I canโ€™t stop listening to it and imagining that this was NOT possible even a few months agoIt's all so crazy and happening so fast, that annual moments like these propose a great opportunity to pause the acceleration for a sec. and contextualize it, and bask in the techno-optimism glory of aren't we lucky to live in these times? 
I sure am, and for me the ThursdAI birthday gift is being able to share my excitement with all of you! Thank you for being a subscriber; the best way you can support ThursdAI is to share this with a friend and tag us on socials 🫡 TL;DR of all topics covered: * Open Source LLMs * Together releases Sequoia speculative decoding (X, Blog)* Hermes Pro from NousResearch - Tool use and function calling (X, HF, Github)* Big CO LLMs + APIs* Anthropic releases Claude 3 Haiku (Announcement, Blog)* Cohere CMD+R (Announcement, HF)* This week's Buzz* Early bird tickets for Fully Connected in SF are flying, come meet the Weights & Biases team. We're also going to be running a workshop a day before, come join us! (X)* Vision & Video* Deepseek VLM 1.3B and 7B (X, Announcement, HF)* Voice & Audio* Made a song with Suno v3 Alpha for ThursdAI, it's a banger (Song)* Hardware & Robotics (New)* OpenAI now powers Figure - the humanoid robot company (X)* Cerebras announces the fastest AI chip on earth (X)* Extropic made an announcement about their TPU - Thermodynamic Processing Unit* Tools & Agents* Devin from Cognition Labs (Announcement, 47 minute demo) Agents for your house and your Github tasks Say hello to Devin from Cognition Labs (Announcement, Real world demo) By far the most excitement I've seen on my X feed this week was about Cognition Labs' new agent called Devin, which they call the first AI software engineer. 
You should really watch the video, and then watch a few other videos, because, well, only a few folks are getting access, and yours truly is not one of them. It seems like a very well-publicized launch, backed by tons of VC folks, and everybody kept highlighting the innovative UI Devin has: a very polished UX/dev experience with access to a browser (where you can authenticate and it can pick up doing tasks), a terminal (where you can scroll back and forth in time to see what it did when), plus a chat window, a planning window, and an IDE where it writes code that you can scrub through as well. Folks were also going crazy about the founder's (and team's) math ability and IOI gold medals; this video went viral featuring Scott, the founder of Cognition, in his youth obliterating this competition… poor Victoria 😅 Regardless of their incredible math abilities, Devin is actually pretty solid, specifically on the UI side, and again, like with the AutoGPT hype of yesteryear, we see the same issues: it's nice, but Cognition's hiring page is still looking for human software engineers. Tune into the last 30 minutes of the pod today, as we had tons of folks discuss the implications of an AI "software engineer" and whether or not coding skills are still required/desired. Short answer: yes, don't skip, learn coding. Devin is going to be there to assist but likely will not replace you. 🤖 OpenAI + Figure give GPT-4 hands (or give Figure eyes/ears/mouth) OK, this demo you must just see before reading the rest. OpenAI recently announced a partnership with Figure, a humanoid robotics company, and just this week they released a demo of this integration. Using GPT-4 vision and text-to-speech capabilities (with a new, somewhat raspy voice and human-like intonations), the bot listens to the human giving it instructions, sees the world in front of it, and is able to perform the tasks the human has asked for via voice. 
This feels like a significant jump in capabilities for these bots, and while it was a given that the two technologies (actuator-based robotics and LLMs) would meet soon, this shows the first I, Robot-like moment. It'll still be a while until you can have one of these do your dishes or fold your laundry, but it does feel like an eventuality at this point, whereas before it just felt like sci-fi. Kudos on this integration, and I can't wait until Optimus from Tesla adds Grok brains and makes you laugh nervously at its cringe jokes 😅 This week's Buzz We're coming to SF in April! Our annual Fully Connected conference will feature keynote speakers from foundational AI companies, industry, our founders, and tons of Weights & Biases users. We'll also be running a workshop (I'm one of the workshop folks) a day before, so keep an eye on that; it'll likely be included in your ticket (which is still 50% off for early bird). Open Source LLMs Nous Research gives us Tool Use with Hermes 2 Pro (Announcement) Getting JSON structured output and giving models the ability to respond with not only text, but specific instructions for which functions to run (aka tool use), is paramount for developers. OpenAI first released this back in June, and since then I've been waiting for open source to catch up. And catch up they did, with Nous releasing their first attempt at continued training of the renowned Hermes 7B Mistral-based model, with tool use and structured output! If you're building agents, or any type of RAG system with additional tools, you will definitely be very happy; give Hermes Pro a try! This one is not a simple download-and-run, you have to do some coding, but luckily the folks at Nous provided plenty of examples in their Github. 
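Nous's GitHub examples show tool definitions going into the system prompt and the model answering with function calls wrapped in `<tool_call>` tags containing JSON. As a rough sketch of the client side (the tag format follows the published examples; treat the exact details as assumptions, not Nous's official parser):

```python
import json
import re

# Hermes 2 Pro (per Nous's GitHub examples) emits function calls wrapped in
# <tool_call>...</tool_call> tags containing a JSON object with "name" and
# "arguments". This tiny parser extracts them from raw model output.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(model_output: str) -> list[dict]:
    """Return every {"name": ..., "arguments": {...}} dict found in the output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(model_output):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than crash the agent loop
    return calls

# A hypothetical response (the tool name/arguments here are made up):
sample = (
    "Let me check the weather for you.\n"
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Denver"}}</tool_call>'
)
print(parse_tool_calls(sample))
```

In an agent loop you would dispatch each parsed call to a real function, then feed the result back to the model in a follow-up message.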
Deepseek gives us a new vision model - Deepseek VL 1.3B & 7B (Announcement) Absolutely punching above its weight, this very high quality vision model from the Deepseek folks is a sign of what's coming: smaller models performing incredibly well on specific tasks. While the top is getting crowded with Claude, GPT-4V and Gemini, which are generic, for narrow tasks we're getting tiny models that can load fully into memory, run extremely fast, and perform very well, even in the browser. Big CO LLMs + APIs Anthropic gives us the smallest/fastest/cheapest Claude 3 - Haiku After releasing Opus and Sonnet earlier, Anthropic has reclaimed their throne as the leading AI lab we always knew them to be. Many friends of the pod prefer Opus for many things now, and I keep seeing this sentiment online; folks are even considering cancelling ChatGPT for the first time since... well, ever? Meanwhile Sonnet, their middle model, is taking an interesting place near the top of the LMSys arena human-rated rankings, beating all GPT-4 variants besides the Turbo ones. And now Anthropic has given us Haiku, the smallest of the three Claudes, the fastest, and the cheapest by far. With a 200K context window and vision capabilities, this model crushes GPT-3.5 on many benchmarks and becomes the de-facto cheapest model to run. It only costs $0.25/1M tokens, half the price of GPT-3.5, but just look at the performance. One thing to note: Anthropic still doesn't support function calling/tool use. Cohere releases a new model for retrieval and enterprise purposes - CMD+R Cohere gets a second wind with a great release + open weights approach, releasing Command+R (pronounced Commander), a model focused on enterprise use, scalability, and tool use. It supports 10 languages and 128K context, and beats GPT-3.5 and Gemini 1.0 on several tasks, namely on KILT - Knowledge Intensive Language Tasks. 
The tool use capabilities and the ability to ground information in retrieved context make this a great model specifically for RAG purposes. The model is 35B and is available non-commercially on the hub. Together makes inference go BRRR with Sequoia, a new speculative decoding method Together's Sequoia shows a way to speed up Llama 2 70B by up to 8x and run it on a single consumer GPU. Being able to run AI locally can mean a few things. It can mean making smaller models better, and we've seen that again and again over the past year. Another way is... speculative decoding: lowering the inference TBT (time between tokens) by improving decoding algorithms, using tiny draft models, and methods like offloading. The large model essentially remains the same, while a smaller (draft) model helps guide the inference and makes it seem much faster. These methods compound, and while Sequoia from Together is new, it shows great promise.
Hello hello everyone, happy spring! Can you believe it? It's already spring! We have tons of AI news for you to cover, starting with the most impactful one: did you already use Claude 3? Anthropic decided to celebrate Claude 1's birthday early (which btw is also ThursdAI's birthday and GPT-4's release date, March 14th, 2023) and gave us 3 new Claudes! Opus, Sonnet and Haiku. TL;DR of all topics covered: * Big CO LLMs + APIs* 🔥 Anthropic releases Claude Opus, Sonnet, Haiku (Announcement, try it)* Inflection updates Pi 2.5 - claims GPT4/Gemini equivalent with 40% less compute (announcement)* Elon sues OpenAI (link)* OpenAI responds (link)* ex-Google employee was charged with trading AI secrets with China (article)* Open Source LLMs * 01AI open sources - Yi 9B (Announcement)* AnswerAI - Jeremy Howard, Johno & Tim Dettmers - train 70B at home with FSDP/QLoRA (X, Blog)* GaLore - Training 7B on a single consumer-grade GPU (24GB) (X)* Nous open sources Genstruct 7B - instruction-generation model (Hugging Face)* Yam's GEMMA-7B Hebrew (X)* This week's Buzz* Weights & Biases is coming to SF in April! Our annual conference called Fully Connected is open for registration (Get your tickets and see us in SF)* Vision & Video* Vik releases Moondream 2 (Link)* Voice & Audio* Suno v3 alpha is blowing minds (Link)* AI Art & Diffusion & 3D* SD3 research paper is here (Link)* Tripo + Stability release TripoSR - FAST image-2-3D (link, Demo, FAST demo)* The story of how I started a competition between inference providers to get us sub-1.5s Playground image gen (X) Big CO LLMs + APIs Anthropic releases Claude 3 Opus, Sonnet and Haiku This was by far the biggest news of this week, specifically because the top keeps getting saturated with top-of-the-line models! Claude Opus is actually preferable to many folks in blind studies over some GPT-4 versions, and as we were recording the pod, LMSys released their rankings: Claude Opus beats Gemini and is now 3rd in user preference on the LMSys rank. 
Their release is vast: they announced 3 new models but only gave us access to 2 of them, teasing that Haiku is much faster/cheaper than other options in that weight class out there. In addition to going head-to-head with GPT-4, Claude 3 is now finally also multimodal on inputs, meaning it can take images and understand graphs and charts. They also promised significantly fewer refusals and improved accuracy by almost 2x. One incredible thing Claude always had was a 200K context window, and here they announced that they will support up to 1M, but for now we still only get 200K. We were also promised support for function calling and structured output, but apparently that's "coming soon"; still great to see that they are aiming for it! We were all really impressed with Claude Opus, from folks on stage who mentioned that it's easier to talk to and feels less sterile than GPT-4, to coding abilities that are not "lazy" and don't tell you to continue writing the rest of the code yourself in comments, to even folks who are jailbreaking the guardrails and getting Claude to speak about the "I" and metacognition. Speaking of metacognition sparks, one of the prompt engineers on the team shared a funny story about running a needle-in-a-haystack analysis, where Claude Opus responded with: I suspect this pizza topping "fact" may have been inserted as a joke or to test if I was paying attention. This split the X AI folks in two: many claiming "OMG it's self aware", and many others calling for folks to relax, since like other models, this is still just spitting out token by token. I additionally like the openness with which the Anthropic folks shared their (very simple but carefully crafted) system prompt. My personal take: I've always liked Claude, even v2 was great until they nixed the long context for the free tier. This is a very strong, viable alternative to GPT-4 if you don't need DALL-E, code interpreter features, the GPTs store, or the voice features on iOS. 
If you're using the API to build, you can self-register and you'll get an API key immediately, but going to production will still take time and talking to their sales folks. Open Source LLMs 01 AI open sources Yi 9B The announcement claims that "It stands out as the top-performing similar-sized language model friendly to developers, excelling in code and math." but it's a much bigger model, trained on 3T tokens. I find it confusing to create a category of models between 7B and almost 12B. This week's Buzz (What I learned with WandB this week) We're coming to SF! Come join Weights & Biases at our annual conference in the heart of San Francisco, hear from industry leaders about how to build models in production, and meet most of the team! (I'll be there as well!) AI Art & Diffusion Last week, just last week, we covered the open sourcing of the awesome Playground 2.5 model, which looked really good in user testing. I really wanted to incorporate it into my little demo, but couldn't run it locally, so I asked a few friends, and I gotta say, I love how competitive but open the inference providers can get! Between Modal, Fal and Fireworks, I somehow started a performance competition that got these folks to serve the Playground 2.5 model in sub-1.5 seconds per generation. I recorded the story to highlight the awesome folks who worked on this; they deserve the shoutout! You can try super fast Playground generation on FAL and Fireworks. Stability releases the Stable Diffusion 3 research paper + model coming soon Stability released the research paper for SD3, the latest iteration of their flagship image model. While this field is getting a little saturated (we now have DALL-E, MidJourney, Adobe Firefly, Playground, SDXL, Stable Cascade and Ideogram), SD is definitely aiming for the title. They released a few metrics claiming that on user preference for visual aesthetics, typography and prompt following, SD3 beats all of the above. 
They also mentioned the architecture, which is MM-DiT - a multimodal diffusion transformer (DiTs were used for SORA from OpenAI as well), and that they used 50% synthetic captions generated with CogVLM, which is quite impressive. Emad has mentioned that access to SD3 will start rolling out soon! TripoSR (Demo) We previously covered LUMA's models for text-to-3D, and now we have image-to-3D, open-sourced by the folks at Tripo and Stability AI. TripoSR is able to generate 3D shapes from images super super fast, and here's a very nice flow that @blizaine demonstrated of how to use these models to actually bring 3D objects into their environment in a few steps. And that's it for today folks. We of course chatted about a LOT more stuff; I really welcome you to listen to the episode and skip around in the chapters, and see you next week, as we celebrate ThursdAI's birthday (and GPT-4's and Claude 1's) 🎉 P.S - as I always do, after writing and editing all by hand (promise) I decided to use Opus as my editor, to tell me how my writing was, what I forgot to mention (it has the context from the whole transcription!) and suggest fixes. For some reason I asked Opus for a message to you, the reader. Here it is, take it as you will 👍 Full Transcript for the deep divers: [00:00:00] Alex Volkov: Right, folks. So I think recording has started. And then let's do our usual. Welcome. Welcome, everyone. Those who know the sound from week to week. This is Alex Volkov. You're listening to ThursdAI, March 7th. I'm an AI evangelist with Weights & Biases, who you can see here on stage as well. So, you know, you see the little square thing, give it a follow. Follow us on socials as well. And, uh, today is obviously Thursday.[00:00:45] Alex Volkov: Uh, Thursday was a lot of stuff to talk about. Um, so, let's talk about it. Uh, I think, I think, um, our week is strange, right? Our week starts at the Friday. Almost, not even Friday. 
The updates that I need to deliver to you start at the end of the previous ThursdAI. So as, as something happens, uh, and I, I have a knowledge cutoff, actually, at some point we considered calling this podcast knowledge cutoff.[00:01:14] Alex Volkov: Um, I have a knowledge cutoff after Thursday afternoon, let's say when I start and send the newsletter, but then AI stuff keeps happening. And, uh, Then we need to start taking notes and taking stock of everything that happened and I think on Friday We had the the lawsuit from Elon and there's a whole bunch of stuff to talk about and then obviously on Monday We had some big news.[00:01:37] Alex Volkov: So As always I'm gonna just run through all the updates. There's not a lot today There's not a ton of updates this week, but definitely there's a few interesting things. Let me un save as well And then I'll just say hi to a few, a few of the folks that I got on stage here to chat. Um, we got Vic, and Vic is going to give us an update about, about something interesting. Uh, Vic, feel free to just unmute and introduce yourself briefly. And then we're going to go through the updates.[00:02:07] Vik: Hey, my name is Vivek, uh, I've been training ML models for the last two years or so. Um, recently released a new model called Moondream 2. It's a very small vision language model that excels at a lot of real world use cases that you could use to build computer vision applications today, so I'm very excited to chat about that.[00:02:30] Alex Volkov: Awesome. And, uh, we have Akshay as well. Akshay, it's been a while since you joined us. What's up, man? How are you?[00:02:36] Vik: Greetings of the day everyone, and it's lovely to join again. Uh, I have been listening, I have been here in the audience. Uh, for each and every ThursdAI, and, uh, I've been building some exciting stuff, so I've not been joining much, but, uh, things are going great.[00:02:54] Alex Volkov: Awesome. 
And, uh, for the first time, I think, or second time we're talking with Siv. Hey, Siv.[00:03:01] Far El: Hey, how's it going, everyone? Uh, just a little background on me. Um
Happy leap year day everyone, very excited to bring you a special once-every-4-years edition of ThursdAI 👏 (Today is also Dune 2 day (I'm going to see the movie right after I write these here words) and, well... to some folks, these are the bull market ₿ days as well. So congrats to all who weathered the bear market!) This week we had another great show, with many updates and a deep dive, and again I was able to cover most of the news AND bring you a little bit of a deep dive into a very interesting concept called Matryoshka Representation Learning (aka 🪆 embeddings), with two of the authors on the paper joining me on the pod! TL;DR of all topics covered: * AI Art & Diffusion & 3D* Playground releases a new diffusion foundational model Playground V2.5 (DEMO)* Alibaba teasing EMO - incredible animating faces (example)* Ideogram 1.0 announced - SOTA text generation (Announcement)* Open Source LLMs * Gemma update - hard to finetune, not better than 7B Mistral* Llama 3 will release in June 2024, not anytime soon* Starcoder 2 + Stack V2 (Announcement)* Berkeley Function-Calling Leaderboard (Announcement)* Argilla released OpenHermesPreferences, the largest open dataset for RLHF & DPO (Announcement)* STORM from Stanford to write long documents (Thread)* Big CO LLMs + APIs* Mistral releases Mistral Large & Le Chat (Announcement, Le Chat)* Microsoft + Mistral strike a deal (Blog)* Google teases GENIE - a model that makes images into interactive games (announcement)* OpenAI allowing fine-tune on GPT 3.5* Wordpress & Tumblr preparing to sell user data to OpenAI & Midjourney* Other* Modular releases their MAX inference engine, compatible with PyTorch, Tensorflow & ONNX models (Announcement)* Interview with MRL (Matryoshka Representation Learning) authors (in audio only) AI Art & Diffusion Ideogram 1.0 launches - superb text generation! Ideogram, founded by ex-Google Imagen folks, which we reported on before, finally announces 1.0, focusing on superb text generation in images. 
It's really great, and I've generated a few owls already (don't ask, hooot) and I don't think I will stop. This is superb for meme creation and answering in multimedia, and it's fast as well; I'm very pleased! They also announced an investment round from a16z to go with their 1.0 release. Definitely give them a try. Playground V2.5 Suhail Doshi and Playground release a new foundational image model called Playground v2.5, and it looks awesome, very realistic, and honestly looks like it beats MJ and DALL-E on many simple prompts. They also announced that this model received higher user preference scores based on 1K prompts (which we didn't get to see), but they have released this model into the wild; you can download it and play with a free demo provided by the Modal folks. Another SORA moment? Alibaba teases EMO 🤯 (website) OK, this one has to be talked about. Alibaba released quite a few preview videos + a paper about something called EMO, a way to animate a talking/singing avatar from just 1 image. It broke my brain, and I couldn't stop staring at it. Honestly, it's quite something. This model animates not only the mouth: eyes blink, there are emotions, hair moves, even earrings, and most impressive, the whole larynx muscle structure seems to be animated as well! Just look at this video, and then look at it again. The Github repo was created but no code released, and I really hope we get this code at some point, because animating videos with this fidelity + something like SORA could mean so many possible creations! I wrote this tweet only two weeks ago, and I'm already feeling that it's outdated and we're farther along the curve with EMO. What a great release! 
And just because it's so mind-blowing, here are a few more EMO videos for you to enjoy: Open Source LLMs Starcoder 2 + The Stack V2 Folks at Hugging Face and BigCode have released a beast on us, StarCoder 2 ⭐️ The most complete open code LLM 🤖 StarCoder 2 is the next iteration of StarCoder, comes in 3 sizes, and is trained on 600+ programming languages and over 4 trillion tokens from Stack v2. It outperforms StarCoder 1 by a wide margin and has the best overall performance across 5 benchmarks 🚀🤯 TL;DR: * 3B, 7B & 15B parameter versions* 16384 token context window* Trained on 3-4T tokens (depending on size)* 600+ programming languages* 15B model achieves 46% on HumanEval* Grouped Query Attention and Sliding Window Attention* Trained on 1024 x H100 NVIDIA GPUs* Commercial-friendly license* Can be used for local Copilots The Stack v2 is a massive (10x) upgrade over the previous Stack dataset, containing 900B+ tokens 😮 Big CO LLMs + APIs 🔥 Mistral announces Mistral Large + Le Chat + Microsoft partnership Today, we are releasing Mistral Large, our latest model. Mistral Large is vastly superior to Mistral Medium, handles 32k tokens of context, and is natively fluent in English, French, Spanish, German, and Italian. We have also updated Mistral Small on our API to a model that is significantly better (and faster) than Mixtral 8x7B. Lastly, we are introducing Le Chat, a chat interface (currently in beta) on top of our models. Two important notes here: one, they now support function calling on all Mistral models in their API, which is a huge deal, and two, updating Mistral Small to a "significantly better and faster" model than Mixtral 8x7B is quite the hint! I also want to highlight Arthur's tweet clarifying their commitment to open source, because it's very important. 
They released a new website that again had mentions of "don't train on our models", which they removed; the new website had also dropped the section committing them to open weights, and they quickly put a much bigger section back up! This week's Buzz (What I learned with WandB this week) I mentioned this before, but it may shock new subscribers: ThursdAI isn't the only (nor the first!) podcast from Weights & Biases. Our CEO Lukas has a long-standing podcast that's about to hit 100 episodes, and this week he interviewed John Halamka of Mayo Clinic. It's a fascinating interview, specifically because Mayo Clinic just recently announced a multi-year collaboration with Cerebras about bringing AI to everyone who googles their symptoms and ends up on Mayo Clinic websites anyway, and apparently John has been in AI for longer than I've been alive, so he's incredibly well positioned to do this and bring us the AI medicine future! Modular announces MAX (Modular Accelerated Xecution) Developer Edition Preview (blog) Modular, the company that created the Mojo language with Chris Lattner, has now announced the second part of their stack, coming to all of us, called MAX. It's an inference engine with Mojo built in, that supports PyTorch, Tensorflow and ONNX, and is supposedly going to run the same AI models we run now, significantly faster. 
MAX is a unified set of tools and libraries that unlock performance, programmability and portability for your AI inference pipelines. Right now they support only CPU inference, where they significantly boost performance; however, they are planning GPU support soon as well, and promise up to 5x faster AI inference for most models like Mistral, Llama, etc. I personally think this is a huge development, and while it's still early, it's definitely worth a look. Given the incredible speed performances we've been seeing lately, from Groq (as we chatted with them last week) to Modular, we're well on our way to running huge models faster, and small models instantly! 🪆 MRL (Matryoshka Embeddings) interview with Aditya & Prateek OpenAI recently released 2 new embedding models that replaced their ada-002 embeddings, and when they released them, they mentioned a new way of shortening dimensions. Soon after, on X, the authors of the 2022 MRL (Matryoshka Representation Learning) paper spoke up and said that this new "method" is actually MRL, the concept they came up with and presented at NeurIPS. Since then I've seen many folks explore Matryoshka embeddings, from Bo Wang to Connor Shorten, and I wanted to get in on the action! It was quite exciting to hear from Aditya and Prateek about MRL: how they are able to significantly reduce embedding size by packing the most important information into the first dimensions, the implications for retrieval speed, the significant boost in use-cases post the ChatGPT LLM boom, and more! Definitely give this one a listen if you're interested; the interview starts at 01:19:00 on the pod. Thank you for reading, I really appreciate you coming back here week to week, and if you enjoy this content, please share with 1 friend and give us a ⭐ rating on Apple Podcasts. Here's a nice Ideogram image as a preemptive thank you! 
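The core Matryoshka trick can be shown in a few lines: keep only the first k dimensions of an embedding and re-normalize, and similarity is roughly preserved. The vectors below are made up for illustration; real Matryoshka behavior depends on the model having been trained with the MRL objective, which is what packs the information into the leading dimensions:

```python
import math

# MRL-style truncation: keep the first k dims of an embedding and L2-normalize.
# With an MRL-trained model, cosine similarity computed on the truncated
# vectors closely tracks similarity on the full vectors, at a fraction of the
# storage and retrieval cost. (Toy 8-dim vectors here are purely illustrative.)

def truncate_and_renormalize(vec: list[float], k: int) -> list[float]:
    head = vec[:k]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # assumes unit-norm inputs

# Two pretend full-size embeddings, front-loaded like MRL output would be.
e1 = truncate_and_renormalize([0.9, 0.3, 0.2, 0.1, 0.05, 0.05, 0.02, 0.01], 8)
e2 = truncate_and_renormalize([0.8, 0.4, 0.3, 0.1, 0.05, 0.04, 0.03, 0.01], 8)

full_sim = cosine(e1, e2)
short_sim = cosine(truncate_and_renormalize(e1, 4), truncate_and_renormalize(e2, 4))
print(round(full_sim, 3), round(short_sim, 3))  # the two numbers land very close
```

This is also essentially what OpenAI's `dimensions` parameter on the new embedding models does for you server-side: return a shortened, re-normalized prefix of the full embedding.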
As always, hereโ€™s the full transcript[00:00:00] Intro and welcome[00:00:00][00:00:00] Alex Volkov: Hey, you're on ThursdAI. This is Alex. Happy Leap Year Special Edition. Today's February 29th. We had a great show today. So great that got carried away during the recap, and it's almost twice as long as it usually is. The recap, not the show. But no worries. As always, if you're short on time, the first 25 minutes or so of this almost two hour podcast will catch you up on everything that happened in AI this week.[00:00:29] Alex Volkov: If you're using Apple Podcasts, or any other modern podcatcher, you can also skip to the chapters, that I'm outlining every week and listen to the part that interests you, and only to that part.[00:00:39] Alex Volkov: This week. After the newsy updates, we also had a deep dive into something called Matryoshka Embeddings, with the authors of the MRL paper, Aditya and Pratik.[00:00:49] Alex Volkov: And thank you guys, and I really enjoyed chatting with them both. And we geeked out on why OpenAI decided to release something they came up with two years ago and how it affects the AI industry post the LLM explosion world. So definitely give them a listen![00:01:05] Alex Volkov: at the end of this episode. A brief TLDR, then a full news conversation you're used
Hey, this is Alex. Ok, let's start with the big news: holy crap, this week was a breakthrough week for speed! We had both Groq explode in popularity, and ByteDance release an updated SDXL model called Lightning, able to generate full-blown 1024px SDXL images in 300ms. I've been excited about what real-time LLM/diffusion can bring, and with both of these released the same week, I just had to go and test them out together. Additionally, we had Google step into a big open weights role and give us Gemma, 2 open weights models, 2B and 7B (which is closer to 9B, per Junyang), and it was great to see Google committing to releasing at least some models in the open. We also had breaking news: Emad from Stability announced SD3, which looks really great, Google will pay Reddit $200M for AI training on their data, & a few more things. TL;DR of all topics covered: * Big CO LLMs + APIs* Groq custom LPU inference does 400T/s Llama/Mistral generation (X, Demo)* Google image generation is in hot waters and was reportedly paused (refuses to generate white people)* Gemini 1.5 long context is very impressive to folks (Matt Shumer, Ethan Mollick)* Open Weights LLMs * Google releases GEMMA, open weights 2B and 7B models (Announcement, Models)* Teknium releases Nous Hermes DPO (Announcement, HF)* Vision & Video* YOLOv9 - SOTA real-time object detector is out (Announcement, Code)* This week's Buzz (What I learned in WandB this week)* Went to SF to cohost an event with A16Z, Nous, Mistral (Thread, My Report)* AI Art & Diffusion & 3D* ByteDance presents SDXL-Lightning (Try here, Model)* Stability announces Stable Diffusion 3 (Announcement)* Tools* Replit releases a new experimental Figma plugin for UI → Code (Announcement)* Arc browser adds "AI pinch to understand" summarization (Announcement) Big CO LLMs + APIs Groq's new LPU shows extreme performance for LLMs - up to 400T/s (example)* Groq created a novel processing unit known as the Tensor Streaming Processor (TSP), which they categorize 
as a Linear Processor Unit (LPU). Unlike traditional GPUs, which are parallel processors with hundreds of cores designed for graphics rendering, LPUs are architected to deliver deterministic performance for AI computations.* Analogy: they know where all the cars are going when everyone wakes up for work (when they compile) and how fast they all drive (compute latency), so they can get rid of traffic lights (routers) and turn lanes (backpressure) by telling everyone when to leave the house.* Why would we need something like this? Some folks point out that average human reading speed is only ~30 tokens/s; I created an example that uses near-instant Groq Mixtral + Lightning SDXL to create images with Mixtral as my prompt manager. Open Source Weights LLMs Google Gemma - 2B and 7B open weights models (demo)* 4 hours after release, llama.cpp added support, Ollama and LM Studio added support, Tri Dao added Flash Attention support* Vocab size is 256K* 8K context window* Tokenizer similar to Llama* Folks are... 
not that impressed, as far as I've seen* Trained on 6 trillion tokens* Google also released Gemma.cpp (local CPU inference) - Announcement Nous/Teknium re-release Nous Hermes with a DPO finetune (Announcement)* DPO RLHF is performing better than previous models* Models are GGUF and can be found here* DPO enables improvements across the board This week's Buzz (What I learned with WandB this week)* Alex was in SF last week* A16Z + 20-something cohosts, including Weights & Biases, talked about the importance of open source* Huge shoutout to Rajko and Marco from A16Z, and tons of open source folks who joined* Nous, Ollama, LlamaIndex, LMSys folks, Replicate, Perplexity, Mistral, Github, as well as Eric Hartford, Jon Durbin, Haotian Liu, HuggingFace, and tons of other great folks from Mozilla, the Linux Foundation, and Percy from Together/Stanford. Also had a chance to check out one of the smol dinners in SF; they go really hard. Had a great time showing folks the Vision Pro, chatting about AI, seeing incredible demos, and chatting about meditation and spirituality all at the same time! AI Art & Diffusion ByteDance presents SDXL-Lightning (Try here)* Lightning-fast SDXL with 2, 4 or 8 steps* Results much closer to the original SDXL than the Turbo version from a few months ago Stability announces Stable Diffusion 3 (waitlist) Uses a Diffusion Transformer architecture (like SORA). Impressive multi-subject prompt following: "Prompt: a painting of an astronaut riding a pig wearing a tutu holding a pink umbrella, on the ground next to the pig is a robin bird wearing a top hat, in the corner are the words 'stable diffusion'" Tools* Replit announces a new Figma design → code plugin That's it for today, definitely check out the full conversation with Mark Heaps from Groq on the pod, and see you next week! 🫡 ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. 
To receive new posts and support my work, consider becoming a free or paid subscriber.

Full Transcript:

[00:00:00] Alex Volkov: Hey, this is Alex. This week on ThursdAI, we had an hour conversation with Groq, a new and very exciting AI inference chip that exploded in popularity all over social media after showing a 5x, yes, 5x improvement in AI inference. 500 tokens per second for Llama 70B and Mixtral.

[00:00:32] Alex Volkov: We also talked about Google's new open weights Gemma model, Google's image generation issues, which led them to take down the abilities of this image generation to generate people. We covered the new, incredibly fast SDXL Lightning, and we had breaking news for Stable Diffusion 3, which is a diffusion transformer that's coming out of Stability AI.

[00:01:03] Alex Volkov: And a bunch of other news. All that after this short intro into Weights & Biases.

[00:01:10] AI teams are all asking the same question: How can we better manage our model development workflow? The path to production is increasingly complex, and it can get chaotic keeping track of thousands of experiments and models. Messy spreadsheets and ad hoc notebooks aren't going to cut it. The best AI teams need a better solution.

[00:01:33] And better tools. They need Weights & Biases, the AI developer platform, to unlock their productivity and achieve production ML at scale. Replace messy spreadsheets with an automated system of record for experiments.

[00:01:52] Communicate about model evaluation, and collaboratively review results across the team. Clean up disorganized buckets of models with a unified registry. Automatically capture full model lineage, all the data and code used for training and testing. Seamlessly connect to compute to scale up training, and run large scale sweeps efficiently to optimize models.

[00:02:20] Analyze the performance of large language models, and monitor LLM usage and costs with live, customizable dashboards.
Get your team on the same page to bridge the gaps from ideation to production. Use Weights & Biases to build, manage, and deploy better models, faster.

[00:02:41] Alex Volkov: Wasn't this cool? This is Kari. She is an original PM on the Weights & Biases team. She's been there for a long time, and recently we used her voice to narrate this new video that we have up on the website. And I figured I'd put it in here because it works even without the video. And I thought it was super cool.

[00:03:01] Alex Volkov: And people ask me, what does Weights & Biases do? And hopefully this answers some of those questions. Now I want to switch gears and say, basically, that the format for this week is a little different. We had the folks from Groq and Matt Shumer at the beginning of the pod, and then we kept talking about everything else, like Gemma and Gemini and everything else.

[00:03:24] Alex Volkov: So the first hour of this is going to be an interview with the Groq folks, specifically with Mark Heaps, and the next hour afterwards is going to be the deep dive into topics. If you're listening to this on Apple Podcasts, for example, you should be able to just view chapters and skip to a chapter that you'd prefer.

[00:03:51] Alex Volkov: I want to just do a quick recap of ThursdAI for February 22nd, everything we've talked about for today. And we started the space with two guests, I guess, Matt Shumer and Mark Heaps from, and that's Groq with a Q at the end, not Grok with a K at the end. So not like xAI's Grok. Groq exploded on our timelines recently with just incredible viral videos of them performing LLM inference on Llama 2 70B and Mixtral with around 400 or 500 tokens a second, which is...

[00:04:34] Alex Volkov: Five times as much as the previous super fast API inference that we've seen from Perplexity and from Together. And they're serving Llama 2 70B with 500 tokens a second. And so we've had Mark from Groq talk to us for almost an hour about how this is even possible.
So we had a very nice deep dive with Mark, and definitely, if you missed this, please check this out on the recorded portion as well.

[00:04:58] Alex Volkov: And then we also had Matt, who works at HyperWrite, and he's been playing with these tools, and he told us about the demos that he was able to build, and how much of a difference this speed of inference makes. We've talked about their custom chip called the LPU, and we've talked about the fact that the company's been around for a while, and they did not expect this explosion in virality, but they're very happy that they chose this direction correctly.

[00:05:21] Alex Volkov: Very great interview, great conversation, and I invite you to listen to this as well. We covered that Google's image generation is now in hot water, and was reportedly paused because it's injecting prompt modifications that are, let's say, not that great. And many people noticed that historical figures are being generated in different races, and different multicultural adjustments are happening to your prompts, which is not great.

[00:05:46] Alex Volkov: This blew up on Twitter, and even outside of Twitter, I think folks started writing this in actual media. Google, en
Holy SH*T. These two words have been said on this episode multiple times, way more than ever before I want to say, and it's because we got two incredibly exciting breaking news announcements in a very short amount of time (in the span of 3 hours), and the OpenAI announcement came as we were recording the space, so you'll get to hear our live reaction to this insanity. We also had 3 deep-dives, which I am posting on this week's episode: we chatted with Yi Tay and Max Bane from Reka, which trained and released a few new foundational multimodal models this week, and with Dome and Pablo from Stability, who released a new diffusion model called Stable Cascade, and finally had a great time hanging with Swyx (from Latent Space), where I finally got a chance to turn the microphone back on him for a conversation about Swyx's background, Latent Space, and AI Engineer. I was also very happy to be in SF today of all days, as my day is not over yet: there's still an event which we cohost together with A16Z, folks from Nous Research, Ollama and a bunch of other great folks, just look at all these logos! Open Source FTW 👏

TL;DR of all topics covered:

* Breaking AI News
* 🔥 OpenAI releases SORA - text to video generation (Sora Blogpost with examples)
* 🔥 Google teases Gemini 1.5 with a whopping 1 MILLION token context window (X, Blog)
* Open Source LLMs
* Nvidia releases Chat With RTX local models (Blog, Download)
* Cohere open sources Aya 101 - a 12.8B model supporting 101 languages (X, HuggingFace)
* Nomic releases Nomic Embed 1.5 with Matryoshka embeddings (X)
* Big CO LLMs + APIs
* Andrej Karpathy leaves OpenAI (Announcement)
* OpenAI adds memory to chatGPT (X)
* This week's Buzz (What I learned at WandB this week)
* We launched a new course with Hamel Husain on enterprise model management (Course)
* Vision & Video
* Reka releases Reka-Flash, 21B & Reka Edge MM models (Blog, Demo)
* Voice & Audio
* WhisperKit runs on WatchOS now!
(X)
* AI Art & Diffusion & 3D
* Stability releases Stable Cascade - new AI model based on Würstchen v3 (Blog, Demo)
* Tools & Others
* Goody2ai - A very good and aligned AI that does NOT want to break the rules (try it)

🔥 Let's start with Breaking News (in the order of how they happened)

Google teases Gemini 1.5 with a whopping 1M context window

This morning, Jeff Dean released a thread full of crazy multimodal examples from their new 1.5 Gemini model, which can handle up to 1M tokens in the context window. The closest model so far was Claude 2.1, and that was not multimodal. They also claim they are researching up to 10M tokens in the context window. The thread was chock full of great examples, some of which highlighted the multimodality of this incredible model, like being able to pinpoint and give a timestamp of an exact moment in an hour-long movie, just by getting a sketch as input. This, honestly, blew me away. Using the incredibly large context window, they were able to break the WHOLE 1-hour movie down into frames, provide additional text tokens on top of it, and the model had near perfect recall. They used Greg Kamradt's needle-in-the-haystack analysis on text, video and audio and showed incredible, near perfect recall, which highlights how much advancement we got in the area of context windows. Just for reference, less than a year ago we had a chart from Mosaic when they released MPT, with a Y axis topping out at 60K; the graph above goes to 1 MILLION, and we're less than a year apart. Not only that, Gemini Pro 1.5 is also multimodal. I have to give props to the Gemini team, this is quite a huge leap for them, and for the rest of the industry this is a significant jump in what users will expect going forward!
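The needle-in-a-haystack methodology mentioned above is simple at its core: bury one fact at a known depth in a long context, ask the model to retrieve it, and score recall. Here's a hypothetical minimal harness (the `ask_model` callable stands in for whatever model API is under test; real runs also sweep total context length and often use an LLM judge instead of substring matching):

```python
def build_haystack(filler_sentences, needle, depth_fraction):
    """Bury the needle at a relative depth (0.0 = start, 1.0 = end) in filler text."""
    k = int(len(filler_sentences) * depth_fraction)
    return " ".join(filler_sentences[:k] + [needle] + filler_sentences[k:])

def score_recall(model_answer, expected):
    """Crude scoring: did the model's answer contain the buried fact?"""
    return 1.0 if expected.lower() in model_answer.lower() else 0.0

def run_sweep(ask_model, filler_sentences, needle, expected, depths):
    """Test retrieval at several depths; returns {depth: 0.0 or 1.0}."""
    results = {}
    for d in depths:
        context = build_haystack(filler_sentences, needle, d)
        prompt = context + "\n\nWhat is the magic number mentioned above?"
        results[d] = score_recall(ask_model(prompt), expected)
    return results
```

Plotting the sweep results as a depth-by-context-length grid gives exactly the kind of recall heatmaps Gemini 1.5 was shown acing.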
No longer will we be told "hey, your context is too long" 🤞

A friend of the pod, Enrico Shippole, joined the stage (you may remember him from our deep dive into extending Llama's context window to 128K) and showed that a bunch of new research makes all this possible for open source too, so we're waiting for OSS to catch up to the big G. I will sum up with this: Google is the big dog here, they invented transformers, they worked on this for a long time, and it's amazing to see them show up like this, like they used to do, and blow us away! Kudos 👏

OpenAI teases SORA - a new giant leap in text to video generation

You know what? I will not write any analysis, I will just post a link to the blogpost and upload some videos that the fine folks at OpenAI just started releasing out of the blue. You can see a ton more videos on Sam's Twitter and on the official SORA website. Honestly, I was so impressed with all of them that I downloaded a bunch and edited them all into the trailer for the show!

Open Source LLMs

Nvidia releases Chat With RTX - chat with notes, documents, and video

Using a Gradio interface and packing 2 local models, Nvidia releases a bundle of packaged open source AI, including RAG and even YouTube transcription chat! Chat with RTX supports various file formats, including text, pdf, doc/docx, and xml. Simply point the application at the folder containing your files and it'll load them into the library in a matter of seconds. Additionally, you can provide the url of a YouTube playlist and the app will load the transcriptions of the videos in the playlist, enabling you to query the content they cover.

Chat for Developers

The Chat with RTX tech demo is built from the TensorRT-LLM RAG developer reference project available from GitHub. Developers can use that reference to develop and deploy their own RAG-based applications for RTX, accelerated by TensorRT-LLM.

This week's Buzz (What I learned with WandB this week)

We just released a new course!
Hamel Husain released a course on enterprise model management!

Course name: Enterprise Model Management
Course link:
Who is this for: The course is targeted at enterprise ML practitioners working with models: MLOps engineers, ML team leaders, ML engineers. It shows, both at a conceptual and a technical level, how to get the most value out of the W&B Model Registry and automations. Attached is also a screenshot of a slide from the course on what different personas (MLOps, ML exec, etc.) get from the Model Registry.
What can they expect: Learn how to store, version, and evaluate models like top enterprise companies do today, using an LLM training & evaluation example. Big value props: improved compliance, collaboration, and disciplined model development.

Vision & Video

Reka releases Reka Flash and Reka Edge multimodal models

Reka was co-founded by Yi Tay, previously from DeepMind. They trained and released 2 foundational multimodal models. I tried them and was blown away by their ability not only to understand text and perform VERY well on benchmarks (73.5 MMLU / 65.2 on HumanEval), but also to show incredible (honestly, never before seen by me) multimodal capabilities, including understanding video! Here's a thread of me getting my head continuously blown away by the quality of the tonality of this multimodality (sorry... 😅). I uploaded a bunch of video examples and was blown away: it understands tonality (the dive dive Diiiiive example), understands scene boundaries, and does incredible OCR between scenes (the Jason/Alex example from speakers).

AI Art & Diffusion

Stable Cascade (link)

Stability AI introduced a new text-to-image generation model called Stable Cascade that uses a three-stage approach to produce high-quality images with a compressed latent space, making it more efficient to train and use than previous models. It achieved better results than other models in evaluations while having faster inference speeds.
The company released code to train, fine-tune, and use control models like inpainting with Stable Cascade to enable further customization and experimentation. Stability AI aims to lower barriers to AI development through models like this one. Nate did a comparison between a much slower SDXL and Stable Cascade here:

Here's the transcript for the whole episode, you definitely should check it out! It was really one of the coolest shows we had, and we had over 2K folks listening in!

[00:00:00] Alex Volkov: Hey, this is Alex Volkov, you're on ThursdAI, and I just gotta record this intro real quick, because today marks one of the more singular days in AI that I remember since I started recording ThursdAIs, which was itself a singular day, March 14th, 11 months ago, when GPT 4 was released and announced. We've since then had a few days like this, GPT Dev Day was one such day, and today marks another one.

[00:00:38] Alex Volkov: Google has released an update to their model, talking about 1 million tokens in the context window, basically unlimited. And then, just an hour or two later, OpenAI said, you know what, we also have something in store, and released the most incredible jump in capability of video generation, text to video generation.

[00:01:02] Alex Volkov: It's called SORA, and what you hear is us recording live, knowing only about Google, which came out an hour and a half before we started recording, and then somewhere in the middle, I think minute 35 or something, you'll hear our live reaction to the incredibly mind blowing advancement in text to video that OpenAI just released.

[00:01:31] Alex Volkov: And I just wanted to record this as I'm finishing up the editing and about to start writing the newsletter, to say, days like this really are the reason why I'm all in on AI and I'm very excited about the changes and advancements.

[00:01:49] Alex Volkov: And I'm sure there will be more days like this going forward.
We've yet to see what Apple comes up with, we've yet to really see what Meta comes up with, Llama 3, etc. And, yeah, I just hope you enjoyed this, and I don't have a lot of words here besides just letting you listen to the rest of the episode, and saying that I was very happy to be in San Francisco for this, the place where most of this happens, and I was very happy to be in the company of good friends, both in the virtual world, those on stage in our Twitt
Hihi, this is Alex, from Weights & Biases, coming to you live from Yosemite! Well, actually I'm writing these words from a fake virtual Yosemite that appears above my kitchen counter, as I'm now a Vision Pro user, and I will force myself to work inside this thing and tell you if it's worth it. I will also be on the lookout for anything AI related in this new spatial computing paradigm, like THIS for example! But back to reality for a second, we had quite the show today! We had the awesome pleasure of having Junyang Justin Lin, a dev lead at Alibaba, join us and talk about Qwen 1.5 and QwenVL, and then we had a deep dive into quite a few acronyms I've been seeing on my timeline lately, namely DSPy, ColBERT and (the funniest one) RAGatouille, and we had a chat with Connor from Weaviate and Benjamin, the author of RAGatouille, about what it all means! Really really cool show today, hope you don't only read the newsletter but also listen, on Spotify, Apple or right here on Substack.

TL;DR of all topics covered:

* Open Source LLMs
* Alibaba releases a BUNCH of new Qwen 1.5 models, including a tiny .5B one (X announcement)
* Abacus fine-tunes Smaug, top of the HF leaderboard, based on Qwen 72B (X)
* LMSys adds more open source models, sponsored by Together (X)
* Jina Embeddings fine tune for code
* Big CO LLMs + APIs
* Google rebranding Bard to Gemini and launching Gemini Ultra (Gemini)
* OpenAI adds image metadata (Announcement)
* OpenAI keys are now restricted per key (Announcement)
* Vision & Video
* Bria - RMBG 1.4 - open source BG removal that runs in your browser (X, DEMO)
* Voice & Audio
* MetaVoice, a new Apache 2.0 licensed TTS (Announcement)
* AI Art & Diffusion & 3D
* Microsoft added DALL-E editing with "Designer" (X thread)
* Stability AI releases update to SVD - video 1.1 launches with a webUI, much nicer videos
* Deep Dive with Benjamin Clavie and Connor Shorten show notes:
* Benjamin's announcement of RAGatouille (X)
* Connor's chat with Omar Khattab (author of DSPy and ColBERT) - Weaviate
Podcast
* Very helpful intro to ColBERT + RAGatouille - Notion

Open Source LLMs

Alibaba releases Qwen 1.5 - ranging from .5B to 72B (DEMO)

With 6 sizes, including 2 novel ones, from as little as a .5B parameter model, to an interesting 4B, all the way to a whopping 72B, Alibaba open sources additional Qwen checkpoints. We had the honor of having friend of the pod Junyang Justin Lin on again, and he talked to us about how these sizes were selected, that even though this model beats Mistral Medium on some benchmarks, it remains to be seen how well it performs on human evaluations, and shared a bunch of details about open sourcing this. The models were released with all the latest and greatest quantizations, significantly improved context length (32K), and support for both Ollama and LM Studio (which I helped make happen, and am very happy for the way the ThursdAI community is growing and connecting!)

We also had a chat about QwenVL Plus and QwenVL Max, their API-only versions of their best vision enabled models, and had the awesome Piotr Skalski from Roboflow on stage to chat with Junyang about those models! To me, a success of ThursdAI is when the authors of things we talk about come on the show, and this is Junyang's second appearance, which he joined at midnight at the start of the Chinese New Year, so greatly appreciated, and definitely give him a listen!

Abacus Smaug climbs to the top of the HuggingFace leaderboard

Junyang also mentioned that Smaug is now at the top of the leaderboards. Coming from Abacus, this is a finetune of the previous Qwen-72B, not even this new one. The first model to achieve an average score of 80, this is an impressive appearance from Abacus; though they haven't released any new data, they said they are planning to!
They also said that they are planning to finetune Miqu, which we covered last time: the leak from Mistral that was acknowledged by Arthur Mensch, the CEO of Mistral. The techniques that Abacus used to finetune Smaug will be released in an upcoming paper!

Big CO LLMs + APIs

Welcome Gemini Ultra (bye bye Bard)

Bard is no longer, get ready to meet Gemini. It's really funny, because we keep getting confusing naming from huge companies like Google and Microsoft. Just a week ago, Bard with Gemini Pro shot up the LMSYS charts, after the regular Gemini Pro API scores were not as close, and now we are supposed to forget that Bard even existed? 🤔 Anyhow, here we are, big G's answer to GPT-4, exactly 10 months, 3 weeks, 4 days and 8 hours later, but who's counting?

So what do we actually get? A $20/month advanced tier for Gemini Advanced (which will have Ultra 1.0; the naming confusion continues). We get a longer context (how much?) + iOS and Android apps (though I couldn't find it on iOS, maybe it wasn't rolled out yet). Gemini now also replaces Google Assistant for those with Androids who opt in (MKBHD was somewhat impressed but not super impressed), and Google is leaning into their advantage, including home support!

* Looks like Gemini is ONLY optimized for English as well

We had quite the conversation on stage from folks who upgraded and started using it, including noticing that Gemini is a better role player, and less bland, but also that it doesn't yet support uploading documents besides images, and that the context window is very limited: some said 8K and some 32K, but definitely on the lower side.

Also from Google: a llama.cpp wrapper called localllm (Blog)

OpenAI watermarks DALL-E images and adds per-key API limits (finally) (Blog)

OpenAI is using something called C2PA for pictures made by DALL-E 3, whether you're chatting with ChatGPT or using their API. It's a way to show that DALL-E 3 actually created those images. But it's just for images right now, not for text or voice stuff.
Adding this info can make the files up to 32% bigger, but it doesn't mess with the quality. The tags tell you if the source was DALL-E 3, ChatGPT, or the API by including special signatures and such. Just a heads up, though: this C2PA thing isn't perfect, and the metadata could get wiped either on purpose or by mistake. They also released an update to the developer experience that allows you to track usage and also restrict usage per API key! Very, very needed and helpful!

This week's Buzz (What I learned with WandB this week)

The first part of the live series with the Growth ML team was live and AWESOME!

Vision

BRIA - open source background removal (non commercial)

BRIA AI (@bria_ai), Feb 6, 2024: "📷 Introducing Open-Source Background Removal by @BriaAI 📷 Now live on @huggingface, RMBG v1.4 excels in separating foreground from background across diverse categories, surpassing current open models. See demo [] #BriaAI #OpenSource #AI @briaai" (hub)

Voice & Audio

MetaVoice - a new Apache 2.0 licensed TTS

* 1.2B parameter model.
* Trained on 100K hours of data.
* Supports zero-shot voice cloning.
* Short & long-form synthesis.
* Emotional speech.
* Best part: Apache 2.0 licensed. 🔥

Powered by a simple yet robust architecture:
> Encodec (Multi-Band Diffusion) and GPT + Encoder Transformer LM.
> DeepFilterNet to clear up MBD artefacts.

That's it for us this week. This time I bring you both the news segment AND the deep dive in one conversation, hope it's not super long, see you here next ThursdAI! 👍

Full Transcript:

[00:00:00] Intro and housekeeping

[00:00:00] Alex Volkov: You're on ThursdAI, and I think it's time for us to get started with the recording and the introduction.

[00:00:26] Alex Volkov: Happy, happy Thursday everyone! Today is February 8th, 2024.
I don't know, this is the second calendar year that ThursdAI is happening in, so I don't know if I need to mention the year or not, but we're well on our way into 2024 and you're here on ThursdAI. ThursdAI is the space, the newsletter, and the podcast to keep you up to date with all of the very interesting things that are happening in the very fast moving world of AI.

[00:00:58] Alex Volkov: Hopefully by now, all of you already have ThursdAI in your podcast app, wherever you get your podcasts: Spotify, recently YouTube as well, which is weird. But with this introduction, I will just say hello myself, basically. Hey everyone. My name is Alex Volkov. I'm an AI evangelist with Weights & Biases.

[00:01:15] Alex Volkov: Weights & Biases is the reason why this comes to life for you. And there's going to be a little segment about Weights & Biases in the middle here as well. And I'm joined on stage, often, and pretty much every week, by great friends, experts in their fields, as we talk about everything AI related. This week, especially, we're going to have some interesting things.

[00:01:34] Alex Volkov: Those of you who come back week after week, thank you, and we love that you're part of the community. It's great to see how many people just return, and those of you who are new, we're here every week, and the community doesn't stop after we finish the space. There's a bunch of spaces. I think our friend AlignmentLab had a space that went on for the full week, I think.

[00:01:55] Alex Volkov: I don't know if he ever slept. That's maybe why he's not here on stage.
But we're here every week for the two hours, to give you updates for the first hour, and definitely some very interesting deep dives that have been happening for the past few weeks. I want to shout out some friends of ours that were recently featured in the deep dives.

[00:02:16] Alex Volkov: We've talked with Maxime Labonne, who trained the Beagle series and then also gave a deep dive with us about model merging. That was really fun. And on the last deep dive, we talked with the Lilac folks, and they're building an open source tool that lets you peer into huge datasets, like imagine millions of rows in datasets, and they chunk and cluster these. And we've talked about the importance of datasets in the creation of LLMs, or large language models.

[00:02:46] Alex Volkov: And they've taken the huge datasets of the folks who usually come up on ThursdAI. Teknium from Nous Research just
Hello hello everyone, welcome to another special episode (some podcasts call them just... episodes I guess, but here you get AI news every ThursdAI, and on Sunday you get the deeper dives). BTW, I'm writing these words looking at a 300 inch monitor that's hovering above my usual workstation in the Apple Vision Pro, and while this is an AI newsletter, and I've yet to find a connecting link (there are like 3 AI apps in there right now, one fairly boring chatbot, and Siri... don't get me started on Siri), I'll definitely be covering my experience in the next ThursdAI, because, well, I love everything new and technological. AI is a huge part of it, but not the ONLY part!

📖 It's all about the (big) datasets

Ok, back to the matter at hand. If you've used, finetuned, trained or heard about an AI model, you may or may not realize how important the dataset the model was trained on is. We often talk of this model, that model, and often the only difference is additional data that folks (who I sometimes refer to as alchemists) have collected, curated and structured. Creating/curating/editing those datasets is an art and a science. For example, three friends of the pod, namely LDJ with Capybara, Austin with OpenChat and Teknium with Hermes, have been consistently taking off-the-shelf open source models and making them smarter, more instruction tuned, and better for specific purposes. These datasets are paired with different techniques as well. For example, lately the so-called DPO (Direct Preference Optimization) is a technique that has shown promise, since it not only shows a model which answer is correct for a specific query, it shows an incorrect answer as well, and trains the model to prefer one over the other.
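For the curious, the "prefer one over the other" objective can be sketched in a few lines. This is a hypothetical, simplified per-pair version of the DPO loss, where each argument is the summed log-probability of a full completion under either the policy being trained or a frozen reference model:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (chosen vs rejected completion).

    The loss shrinks as the policy widens the chosen-vs-rejected margin
    relative to the frozen reference model.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(beta * margin)): zero margin gives log(2), large margin -> 0
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

Minimizing this over a dataset of (query, chosen, rejected) triples is what pushes the model toward the preferred answers without needing a separate reward model.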
(see the recent Capybara DPO improvement by Argilla, which improved model metrics across every evaluation) These datasets can range from super high quality 16K rows, to millions of rows (Teknium's recently released Hermes, one of the higher quality datasets, comes in at just a tad over 1 million rows), and oftentimes it's an amalgamation of different other datasets into one. In the case of Hermes, Teknium compiled this 1 million chats from at least 15 different datasets, some his own, some by folks like Jon Durbin, Garage bAInd, and ShareGPT, which was compiled by scraping the very popular website, from folks who used the ShareGPT extension to share their GPT-4 conversations. It's quite remarkable how much of these datasets is just conversations that users had with GPT-4!

Lilac brings Garden

With that backdrop of information, today on the pod we've got the co-founders of Lilac, Nikhil Thorat and Daniel Smilkov, who came on to chat about the new thing they just released, called Lilac Garden. Lilac is an open source tool (you can find it RIGHT HERE) which is built to help make dataset creation, curation and classification more science than art, and to help visualize the data, cluster it and make it easily available. In the case of Hermes, that could be more than a million rows of data. On the pod, I talk with Nikhil and Daniel about the origin of what they both did at Google, working on TensorFlow.js and then something called "Know Your Data", and how eventually they realized that in this era of LLMs, open sourcing a tool that can understand huge datasets, run LLM based classifiers on top of them, or even train specific ones, is important and needed! To strengthen the point, two friends of the pod (Teknium was in the crowd sending us 👍), LDJ and Austin (aka Alignment Lab) were on stage with us and basically said that "It was pretty much the dark ages before Lilac", since something like the OpenOrca dataset is a whopping 4M rows of text.
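At its simplest, the kind of grouping a tool like this does over millions of rows boils down to embedding each row and assigning it to the most similar labeled centroid. A toy, hypothetical sketch (real pipelines use learned embedding models with hundreds of dimensions, not the hand-made 3-dimensional vectors here):

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def assign_clusters(row_embeddings, labeled_centroids):
    """Group dataset rows under the most similar category centroid."""
    clusters = defaultdict(list)
    for row_id, emb in row_embeddings.items():
        best = max(labeled_centroids,
                   key=lambda name: cosine(emb, labeled_centroids[name]))
        clusters[best].append(row_id)
    return dict(clusters)
```

Doing this naively over 4M rows is exactly the part that takes hours or days on a laptop, which is the pain point a hosted service addresses.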
Visualizations in the Garden

So what does Lilac actually look like? Here's a quick visualization of the top categories of texts from OpenOrca's 4 million rows, grouped by category title and showing each cluster. You can see here that translation requests make up 66% (around 200K rows) of the translation category, and you can scroll on and on, add filters, and really dissect this whole thing up and down. The categorization is created by running Lilac on your dataset, which uses embedding algorithms and other neat tricks to quickly chunk and put labels on the categories (AKA classifying them). Btw, you can see this view and play around with it yourself here.

But running this on your own local machine can be a drag, and can take hours if not days for bigger datasets, sometimes hanging and not even working 100%, so the Lilac folks created Lilac Garden, a hosted solution where you provide a dataset and they classify something like 4M rows in 4-5 hours or so, which is definitely not possible on local machines. If you're into that kind of thing, again, Lilac is open source, so you don't have to sign up or pay them, but if speed and this view matter to you, definitely check Lilac out!

RWKV with Eugene (PicoCreator)

On the news segment of ThursdAI we mentioned Eagle, which is the 5th version of RWKV, an attention-free, potential alternative to Transformers that's being developed fully in the open. Later in the show we had the honor of having PicoCreator, one of the front-running folks in the RWKV effort, which is an attempt to see if Transformers can be beaten with a different type of architecture (an RNN) that doesn't require the attention mechanism. Attention brings the problem of quadratic scaling, making LLMs hard and expensive to run the more context is provided.
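The quadratic-vs-linear difference can be made concrete with a back-of-the-envelope cost model (a hypothetical simplification that ignores constants, MLP layers and memory traffic; it only illustrates how cost grows with context length):

```python
def attention_cost(n_ctx, d_model):
    """Self-attention score matrix: every token attends to every token."""
    return n_ctx * n_ctx * d_model  # O(n^2) in context length

def recurrent_cost(n_ctx, d_state):
    """RNN-style (RWKV-like) layer: fixed work per token against a running state."""
    return n_ctx * d_state * d_state  # O(n) in context length

# Doubling the context doubles the recurrent cost but quadruples attention cost.
```

This is why a 10x longer context costs roughly 100x more attention compute but only roughly 10x more recurrent compute, which is the gap the RWKV folks are chasing.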
Eugene had some technical issues and joined in the middle of the pod, so we didn't have a full deep-dive; however, I figured it's important to bring this info to you guys, as these efforts may yield AI that runs 10-100x cheaper, and potentially faster on devices, using almost infinite context lengths. RWKV and other attempts like StripedHyena (Together AI) and Mamba (from Tri Dao) are worth watching, as they may supersede or join with Transformers to create the next jump in LLM capabilities.

That's all for this Sunday. Needless to say, with the Vision Pro releasing on a Friday, it's been a full weekend of future exploration, which is the main driver in my personal life!

P.S. - if you read through to here, you get a gift! A teaser: I have done something different on the pod, and recorded a human interest podcast x AI, for the first time. I mostly bring the news and sometimes deep dives like this one, but this story I couldn't ignore, so stay tuned if you're into dating x AI, and how technology disrupts our lives, and whether this is all moral or not, as I recorded an episode with Sasha Jadan and his new fiancée Karina, which his AI bot picked out for him after swiping and matching with over 5200 girls on Tinder. The AI also... suggested he propose, which he did. It was a very interesting conversation that I plan to upload soon!

That's it from me this week, see you all on ThursdAI, and don't forget, if you liked this, do me a solid, listen to the pod and then leave a review or a 5 star (at least a 4?) on Apple Podcasts 🙏 This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit
TL;DR of all topics covered + Show notes* Open Source LLMs* Meta releases Code-LLama 70B - 67.8% HumanEval (Announcement, HF instruct version, HuggingChat, Perplexity)* Together added function calling + JSON mode to Mixtral, Mistral and CodeLLama* RWKV (non transformer based) Eagle-7B - (Announcement, Demo, Yam's Thread)* Someone leaks Miqu, Mistral confirms it's an old version of their model* Olmo from Allen Institute - fully open source 7B model (Data, Weights, Checkpoints, Training code) - Announcement* Datasets & Embeddings* Teknium open sources Hermes dataset (Announcement, Dataset, Lilac)* Lilac announces Garden - LLM powered clustering cloud for datasets (Announcement)* BAAI releases BGE-M3 - Multi-lingual (100+ languages), 8K context, multi functional embeddings (Announcement, Github, technical report)* Nomic AI releases Nomic Embed - fully open source embeddings (Announcement, Tech Report)* Big CO LLMs + APIs* Bard with Gemini Pro becomes 2nd best LLM in the world per LMSYS, beating 2 out of 3 GPT4 versions (Thread)* OpenAI launches GPT mention feature, it's powerful! (Thread)* Vision & Video* 🔥 LLaVa 1.6 - 34B achieves SOTA vision model for open source models (X, Announcement, Demo)* Voice & Audio* Argmax releases WhisperKit - super optimized (and on device) whisper for iOS/Macs (X, Blogpost, Github)* Tools* Infinite Craft - Addictive concept-combining game using LLama 2 First day of the second month of 2024 folks, how was your Jan? Not too bad I hope? We definitely got quite a show today, the live recording turned into a procession of breaking news, authors who came up, a deeper interview and of course...
news. This podcast episode is focusing only on the news, but you should know that we had deeper chats with Eugene (PicoCreator) from RWKV, and a deeper dive into Lilac, a dataset curation and segmentation tool, with founders Nikhil & Daniel, and also, we got a breaking news segment and Nathan joined us to talk about the latest open source from AI2 👍 Besides that, oof what a week. It started out with the news that the new Bard API (apparently Gemini Pro + internet access) is now the 2nd best LLM in the world (according to LMSYS at least), then there was the whole thing with Miqu, which turned out to be, yes, a leak of an earlier version of a Mistral model, and they acknowledged it, and finally the release of LLaVa 1.6, becoming the SOTA of open source vision models, was very interesting! Open Source LLMs Meta releases CodeLLama 70B Benches 67.8% on HumanEval (without fine-tuning) and is already available on HuggingChat, Perplexity, TogetherAI, quantized for MLX on Apple Silicon, and has several finetunes, including SQLCoder which beats GPT-4 on SQL. Has a 16K context window, and is one of the top open models for code. Eagle-7B RWKV based model I was honestly a bit disappointed by the multilingual performance compared to the 1.8B Stable LM, but the folks on stage told me not to compare this in a traditional sense to a transformer model, rather to look at the potential here. So we had Eugene from the RWKV team join on stage and talk through the architecture, the fact that RWKV is the first AI model in the Linux Foundation and will always be open source, and that they are working on bigger models!
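Since the full interview comes later, here's a toy sketch of the general idea behind an attention-free recurrent model: a fixed-size state is folded forward one token at a time, so memory doesn't grow with context. To be clear, this is not RWKV's actual math (RWKV adds learned time-decay, channel mixing and more), just the shape of a linear recurrence:

```python
# Toy sketch of an attention-free recurrent update: a fixed-size state is
# updated once per token, so memory stays constant no matter how long the
# context is. NOT RWKV's real equations, just a decayed running sum.
def run_recurrence(tokens, decay=0.9):
    state = 0.0  # fixed-size state (a single float here; vectors in practice)
    for value in tokens:
        state = decay * state + (1 - decay) * value  # blend new token into state
    return state

# The state occupies the same memory for 10 tokens or 10,000:
short = run_recurrence([1.0] * 10)
long = run_recurrence([1.0] * 10_000)
print(short, long)  # both converge toward 1.0; memory use is identical
```

Contrast this with attention, where every new token must be compared against the full cached history.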
That interview will be released soon. Olmo from AI2 - new fully open source 7B model (announcement) This announcement came as breaking news, I got a tiny ping just before Nathan dropped a magnet link on X, and then they followed up with the Olmo release and announcement. A fully open source 7B model, including checkpoints, weights, Weights & Biases logs (coming soon), dataset (Dolma) and just... everything you can ask about this model, they said they will tell you. Incredible to see how open this effort is, and kudos to the team for such transparency. They also released a 1B version of Olmo, and you can read the technical report here. Big CO LLMs + APIs Mistral handles the leak rumors This week the AI twittersphere went ablaze again, this time over an incredibly dubious (quantized only) version of a model called Miqu that performed incredibly well on benchmarks, which nobody expected (and I'm not linking to it on purpose), and it started a set of rumors that maybe this was a leaked version of Mistral Medium. Remember, Mistral Medium was the 4th best LLM in the world per LMSYS, and it was rumored to be a Mixture of Experts, just larger than the 8x7B of Mistral. So things didn't add up, and they kept not adding up, as folks speculated that this is a LLama 70B vocab model etc., and eventually this drama came to an end when Arthur Mensch, the CEO of Mistral, did the thing Mistral is known for, and just acknowledged that the leak was indeed an early version of a model they trained quickly once they got access to their cluster, and that it indeed was based on LLama 70B, which they have since stopped using. Leaks like this suck, especially for a company that ...
gives us the 7th best LLM in the world, completely Apache 2 licensed, and it really shows that they dealt with this leak with honor! Arthur also proceeded to do a very Mistral thing and opened a pull request to the Miqu HuggingFace readme with an attribution that looks like this, with the comment "Might consider attribution" 🫳🎤 Bard (with Gemini Pro) beats all but the best GPT4 on lmsys (and I'm still not impressed, help) This makes no sense, and yet, here we are. Definitely a new version of Bard (with Gemini Pro), as they call it on the arena since January 25, is now better than most other models, and it could potentially be because it has internet access? But so does Perplexity and it's nowhere close, which is weird, and it was a weird result that got me and the rest of the team in the ThursdAI green room chat talking for hours! Including getting folks who usually don't reply, to reply 😆 It's been a great conversation; where we finally left off is: Gemini Pro is decent, but I personally don't think it beats GPT4. However, most users don't care about which model serves what, rather which of the 2 choices LMSYS has shown them answered what they asked. And if that question has Google search power behind it, that's likely one of the reasons people prefer it. To be honest, when I tried the LMSYS version of Bard, it showed me a 502 response (which I don't think they include in the ELO score 🤔), but when I tried the updated Bard for a regular task, it performed worse (in my case) than a 1.6B parameter model running locally. Folks from Google replied and said that it's not that the model is bad, it's that I used a person's name, and the model just... refused to answer.
😵‍💫 When I removed the last name it did perform ok, nowhere near GPT-4 though. In other news, they updated Bard once again today with the ability to draw images, and again, and I'm sorry if this turns out to be a negative review but, again, Google, what's going on? The quality of this image generation is subpar, at least to me and other folks. I'll let you judge which image was created with IMAGEN (and trust me, I cherry picked) and which one was DALL-E for the same exact prompt. This week's Buzz (What I learned with WandB this week) Folks, the growth ML team in WandB (aka the team I'm on, the best WandB team, duh) is going live! That's right, we're going live on Monday, 2:30 PM Pacific, on all our socials (X, LinkedIn, Youtube) as I'm hosting my team, and we do a recap of a very special week in December, a week where we paused other work and built LLM powered projects for the company! I really wanted to highlight the incredible projects, struggles, challenges and learnings of what it takes to take an AI idea and integrate it, even for a company our size that works with AI often, and I think it's going to turn out super cool, so you all are invited to check out the live stream! Btw, this whole endeavor is an initiative by yours truly, not some boring corporate thing I was forced to do, so if you like the content here, join the live and let us know how it went! OpenAI releases a powerful new feature, @mentions for GPTs This is honestly so great, it went under the radar for many folks, so I had to record a video to explain why this is awesome: you can now @mention GPTs from the store, and they will get the context of your current conversation, no longer do you need to switch between GPT windows. This opens the door for powerful combinations, and I show some in the video below: Apple is coming to AI Not the Apple Vision Pro, that's coming tomorrow and I will definitely tell you how it is!
(I am getting one and am very excited, it better be good) No, today on the Apple earnings call, Tim Cook finally said the word AI, and said that they are incredibly excited about this tech, and that we'll get to see something from them this year. Which makes sense, given the MLX stuff, the Neural Engine, ML-Ferret and the tons of other stuff we've seen from them this year, Apple is definitely going to step in in a big way! Vision & Video LLaVa 1.6 - SOTA in open source VLM models! (demo) Wow, what a present we got from Haotian Liu and the folks at LLaVa, they upgraded the LLaVa architecture and released a few more models, ranging from 7B to 34B, and created the best open source state of the art vision models! It's significantly better at OCR (really, give it a go, it's really impressive) and they exchanged the LLM backbone with Mistral and Hermes Yi-34B.* Better OCR and higher res* Uses several bases like Mistral and NousHermes 34B* Uses lmsys SGLang for faster responses (which we covered a few weeks ago)* SoTA Performance! LLaVA-1.6 achieves the best performance compared with open-source LMMs such as CogVLM or Yi-VL. Compared with commercial ones, it catches up to Gemini Pro and outperforms Qwen-VL-Plus on selected benchmarks.* Low Training Cost. LLaVA-1.6 is trained with 32 GPUs for ~1 day, with 1.3M data samples in total. The compute / training data cost is 100-1000 times smaller than others. Honestly it's quite stunningly good, howev
Hey everyone, we have an exciting interview today with Maxime Labonne. Maxime is a senior Machine Learning Scientist at JPMorgan, the author of the Hands-on GNNs book and his own ML blog, creator of LazyMergeKit (which we cover on the pod) and holds a PhD in Artificial Intelligence from the Institut Polytechnique de Paris. Maxime has been mentioned on ThursdAI a couple of times before, as he released the first Phi mixture-of-experts, and has previously finetuned OpenHermes using DPO techniques, which resulted in NeuralChat7B. For the past couple of months, following AI on X, it was hard not to see Maxime's efforts show up on the timeline, and one of the main reasons I invited Maxime to chat was the release of NeuralBeagle7B, which at the time of writing was the top performing 7B model on the LLM leaderboard, and was specifically a merge of a few models. Model merging Model merging has been around for a while but recently has been heating up, and Maxime has a lot to do with that: he recently checked, and his wrapper on top of MergeKit by Charles Goddard (which is the library that put model merging into the mainstream), called LazyMergeKit, was behind >50% of the merged models on the HuggingFace hub leaderboard. Maxime also authored a model merging blogpost on Hugging Face and wrote quite a few articles and shared code that helped others put merged models out. Modern day Alchemy This blogpost is a great resource on what model merging actually does, so I won't go into depth on the algorithms, please refer to it if you want a deep dive, but in a nutshell, model merging is a technique that applies algorithms to the weights of a few models, even a few instances of the same model (like Mistral7B), and creates a new model that often performs better than the previous ones, without additional training!
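As a toy illustration of that nutshell (my own sketch, not MergeKit's code): the simplest merge method is plain linear interpolation of two models' weights. Real tools operate per-tensor over full checkpoints and offer fancier methods like SLERP, TIES and DARE, but the core move is just arithmetic on weights:

```python
# Toy sketch of the simplest merge method: linear interpolation ("lerp")
# of two models' weights, represented here as flat lists of floats.
def lerp_merge(weights_a, weights_b, t=0.5):
    """Blend two weight lists: t=0 returns model A, t=1 returns model B."""
    assert len(weights_a) == len(weights_b), "models must share an architecture"
    return [(1 - t) * a + t * b for a, b in zip(weights_a, weights_b)]

model_a = [0.2, -1.0, 0.5]   # pretend layer weights from model A
model_b = [0.4,  1.0, 0.1]   # the same layer from model B
merged = lerp_merge(model_a, model_b, t=0.5)
print(merged)  # halfway between A and B: no GPU, no training, just arithmetic
```

This is why the barrier of entry is so low: merging runs on a CPU in seconds, which explains the flood of merged models on the leaderboard.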
Since this is algorithmic, it doesn't require beefy GPUs burning power to keep training or finetuning, and since the barrier of entry is very low, we get some cool and crazy results as you'll see below. Yeah, crazy as it sounds, this method can also create models of non-standard sizes, like 10B or 120B models, since it's slicing pieces of other models and stitching them together in new ways. If you recall, we had a deep dive with Jon Durbin who released Bagel, and Jon specifically mentioned that he created Bagel (named after Everything Everywhere All at Once) as a good base for merges that includes all the prompt formats; you can read and listen to that episode here. This merge frenzy made HuggingFace change the leaderboard and add a checkbox that hides model merges, because they are flooding the leaderboard and often require much less effort than actually pre-training or even finetuning a model. And quite often the top of the leaderboard was overrun with model merges, like in this example of Bagel and its merges by CloudYu (which are not the top ones but still in the top 10 as I write this). On why it works? Nisten summarized this pretty well in his now famous copypasta tweet, and I've confirmed with Maxime that this is his current understanding as well: it's quite unclear why this seems to perform so well, but that of course doesn't stop the "folks who look for AI Waifus" from merging. It's even forcing folks like Nathan Lambert to start paying attention even though he didn't want to! (Still waiting on your writeup Nathan!)
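The non-standard-sizes trick mentioned above (so-called frankenmerges) can be sketched the same way; this is a toy with layer names standing in for real tensors, loosely modeled on MergeKit-style passthrough stacking:

```python
# Toy sketch of a "frankenmerge": build a new, larger model by stacking
# slices of layers from existing models. Layers are just labels here; real
# tools copy actual tensors. This is how odd sizes like ~10B or ~120B appear.
def passthrough_merge(slices):
    """slices: list of (model_layers, start, end) -> new stacked layer list."""
    merged = []
    for layers, start, end in slices:
        merged.extend(layers[start:end])
    return merged

model = [f"layer_{i}" for i in range(32)]  # a stand-in 32-layer model
# Repeat overlapping chunks of the SAME model to get a deeper one:
franken = passthrough_merge([(model, 0, 24), (model, 8, 32)])
print(len(model), "->", len(franken))  # 32 -> 48 layers, no training involved
```

Why stacking duplicated layers often still produces a coherent model is exactly the open question the copypasta tweet pokes at.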
UPDATE: As of today, Monday Jan 29th, Nathan just released a super comprehensive deep dive into merges, which you can read here 👇👍 YALL + Automated LLM Evaluation Maxime has also worked on so many models of his own that he built a convenient little leaderboard to track their performance, which he called YALL, Yet Another LLM Leaderboard, and it's on HuggingFace. You can see that NeuralBeagle is the top dog (sorry, I literally could not resist). It uses the Nous evaluations, and Maxime has created an automation called LLM AutoEval that makes it really simple to run evaluations, which you can run in a Colab super easily. LLM AutoEval is on Github. Merge-aology! Since chatting, Maxime has released a Colab and later a HuggingFace space that takes model names and shows the genealogy, nay, Merge-aology of a model, which models it was merged from, and it's pretty crazy how deep this rabbit hole goes, and crazier still that these models perform very well after all of these lobotomies! Try it out here. I really hope you enjoy this special deep dive, I definitely learned a BUNCH from this conversation with Maxime, and I'm very happy that he came on!
What A SHOW folks, I almost don't want to write anything in the newsletter, to MAKE you listen haha, but I will, I know many of you don't like listening to me babble. But if you choose one episode to listen to instead of just skimming the show-notes, make it this one. We've had 2 deep dives, one into the exciting world of multi-modality, where we chatted with the creator of Moondream1, Vik, and the co-founders of Prophetic, Wes and Eric, about their EEG/fMRI multimodal transformer (that's right!), and then we had a DEEP dive into the new Hourglass Diffusion Transformers with Tanishq from MedArc/Stability. More than 1300 tuned in to the live show 🔥 and I've got some incredible feedback on the fly, which I cherish, so if you have friends who don't already know about ThursdAI, why not share this with them as well? TL;DR of all topics covered: * Open Source LLMs * Stability AI releases StableLM 1.6B params (X, Blog, HF)* InternLM2-Math - SOTA on math LLMs (90% GPT4 perf.) (X, Demo, Github)* MedArc analysis for best open source use for medical research finds Qwen-72 the best open source doctor (X)* Big CO LLMs + APIs* Google teases LUMIERE - incredibly powerful video generation (TTV and ITV) (X, Blog, ArXiv)* 🤗 HuggingFace announces Google partnership (Announcement)* OpenAI releases 2 new embedding models, tweaks turbo models and cuts costs (My analysis, Announcement)* Google to add 3 new AI features to Chrome (X, Blog)* Vision & Video* Adept Fuyu Heavy - 3rd best multimodal model in the world while being 20x smaller than GPT4V, Gemini Ultra (X, Blog)* FireLLaVa - First LLaVa model with commercial permissive license from Fireworks (X, Blog, HF, DEMO)* Vikhyatk releases Moondream1 - tiny 1.6B VLM trained on Phi 1 (X, Demo, HF)* This week's buzz 🐝🪄 - What I learned in WandB this week* New course announcement from Jason Liu & WandB - LLM Engineering: Structured Outputs (Course link)* Voice & Audio* Meta W2V-BERT - Speech encoder for low resource languages (announcement)* 11 labs has
dubbing studio (my dubbing test)* AI Art & Diffusion & 3D* Instant ID - zero shot face transfer diffusion model (Demo)* 🔥 Hourglass Diffusion (HDiT) paper - High Resolution Image synthesis - (X, Blog, Paper, Github)* Tools & Others* Prophetic announces MORPHEUS-1, their EEG/fMRI multimodal ultrasonic transformer for Lucid Dream induction (Announcement)* NSF announces NAIRR with partnership from all major government agencies & labs including OAI, WandB (Blog)* Runway adds multiple motion brushes for added creativity (X, How to) Open Source LLMs Stability releases StableLM 1.6B tiny LLM Super super fast tiny model, I was able to run this in LMStudio, which just released an update supporting it. Punches above its weight, specifically on other languages like German/Spanish/French/Italian (beats Phi). Has a surprisingly decent MT-Bench score as well. The license is not commercial per se, but requires a specific Stability AI membership. I was able to get above 120 tok/sec with this model in LM-Studio and it was quite reasonable, and honestly, it's quite ridiculous how fast we've gotten to a point where we have an AI model that can weigh less than 1GB and has this level of performance 🤯 Vision & Video & Multimodality Tiny VLM Moondream1 (1.6B) performs really well (Demo) New friend of the pod Vik Hyatk trained Moondream1, a tiny multimodal VLM built with LLaVa on top of Phi 1 (not 2 cause.. issues), and while it's not commercially viable, it's really impressive how fast and how good it is. Here's an example featuring two of my dear friends talking about startups, and you can see how impressively this TINY vision enabled model understands the scene. This is not cherry picked, this is literally the first image I tried, and my first result. The image features two men sitting in chairs, engaged in a conversation. One man is sitting on the left side of the image, while the other is on the right side. They are both looking at a laptop placed on a table in front of them.
The laptop is open and displaying a presentation, possibly related to their discussion. In the background, there is a TV mounted on the wall, and a cup can be seen placed on a surface nearby. The scene suggests a casual and collaborative environment where the two men are sharing ideas or discussing a topic. Vik joined us on the pod to talk about why he didn't go with Phi-2, and he also mentioned that Phi-1.5 was retroactively also MIT'd, its license literally says MIT now on HF 👍 Great conversation, tune in for that at around 00:31:35. Adept is teasing Fuyu Heavy - their CHONKY VLM Adept previously released Persimmon, and then the Fuyu VLM (which is a type of persimmon, we see you Adept), and now tease the release of Fuyu Heavy, a much bigger model that can compete with or come close to GPT4V and Gemini Ultra on MMMU and MMLU (text) while being approximately 20x smaller. While we don't yet get to play with this, they show some great promise in the benchmarks: ⭐️ Performance: Excels at multimodal reasoning and matches/exceeds text-based benchmarks. ❗️ Challenges Faced: Dealt with issues related to image data, model stability, and pre-training data scarcity. ✅ Evaluations: Outperforms Gemini Pro on MMLU and MMMU benchmarks. AI Summary by Arc Browser (haha see how I cheated here?
I sometimes do shortcut summaries using Arc Max, it's dope, try it) Fireworks AI releases FireLLaVa - with a commercially available license FireLLaVA is the first commercially permissive open-source LLaVA model, a type of multi-modality model called a Vision-Language Model (VLM) that can understand both visual and textual inputs.* The original LLaVA model was limited for commercial use as it was trained on data generated by GPT-4, which has non-commercial licenses.* Fireworks recreated the LLaVA training data using an open-source language model, CodeLlama 34B Instruct, to make a commercially viable version.* FireLLaVA performs comparably to the original LLaVA model on benchmarks, showing open-source models can generate high-quality data for VLM training.* FireLLaVA is available via HuggingFace and through Fireworks' prediction API, enabling new visual capabilities for applications. Vik and I chatted about this, and while Fireworks didn't release the datasets, they did release an example of how to start collecting them, and it's clear that everyone is clamoring after great vision / image datasets 👍 Really hoping that many great datasets for multimodal AIs will come out in 2024, giving us increasingly better multimodal LMMs 👍 Big CO LLMs + APIs Google announces LUMIERE, a video generation model that shows an incredible push in consistency (Blog) Supports multiple tasks like image to video, text to video, video inpainting, video stylization and more, and it looks incredible. It seems that they have cracked both spatial and temporal consistency, something that's severely lacking in previous video generation attempts, which makes character consistency quite remarkable.
Of course, as with other incredible Google papers, we never know if we'll ever see this model or be able to play with it, here's hoping 🤞 Google will add 3 new AI features to Chrome * Chrome is introducing 3 new experimental AI features to make browsing more efficient:* Tab Organizer: Chrome will automatically group similar tabs to help with multitasking* Custom themes: Users can generate unique browser themes using text prompts and AI image generation* Writing help: Chrome will offer suggestions to help users draft messages and posts on websites* They are currently only available to US users who opt in on the Experimental Features page. I think this development is super super important because making AI accessible via the incredible Chrome platform to billions of people is going to put Gemini in front of grandmas, students, everyone. Quite impressive, and the compute needed to pull something like this off is also quite mindboggling! 👍 Of course, they are not the first browser to add AI, I love the Arc Browser and it has AI previews that I use quite often! This week's Buzz (What I learned with Weights & Biases this week) Have you, like many of us, had trouble getting structured output (JSON, other structures) from LLMs? Jason also had this problem, that's why he authored the Instructor library, which makes it easy to guide the LLM to give structured output using Pydantic. Jason has presented at the AI Engineer conference, and recently collaborated with Weights & Biases to launch a free course on how to guide your LLM to give structured outputs! COURSE LINK Jason is also an independent consultant working with companies on their AI implementations and has many battle tested examples from implementations across the board, which he shared with us on the pod.
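To show the shape of the idea, here's a dependency-free sketch of the core loop: ask for JSON, validate it against an expected schema, and re-ask with the error message on failure. This is not Instructor's actual API (Instructor patches the OpenAI client and validates with real Pydantic models), and `fake_llm` below is a hypothetical stand-in for a model call:

```python
import json

# Sketch of Instructor-style structured outputs: validate the model's JSON
# and retry with the validation error if it doesn't match the schema.
SCHEMA = {"name": str, "age": int}  # expected fields and types

def validate(raw: str) -> dict:
    data = json.loads(raw)
    for key, typ in SCHEMA.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"field {key!r} must be {typ.__name__}")
    return data

def fake_llm(prompt: str) -> str:
    # Stand-in for a real API: first reply is malformed, a retry prompt
    # containing the error message "fixes" it.
    if "must be" in prompt:
        return '{"name": "Alice", "age": 30}'
    return '{"name": "Alice", "age": "thirty"}'

def get_structured(prompt: str, retries: int = 2) -> dict:
    for _ in range(retries + 1):
        raw = fake_llm(prompt)
        try:
            return validate(raw)
        except (ValueError, json.JSONDecodeError) as err:
            prompt += f"\nYour last reply was invalid: {err}. Reply with JSON only."
    raise RuntimeError("model never produced valid JSON")

print(get_structured("Extract the person as JSON."))  # {'name': 'Alice', 'age': 30}
```

The validate-and-retry loop is the whole trick; Pydantic just makes the schema and error messages much richer.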
Give this short course a try if you haven't yet, it's really high quality content, in addition to tons of other stuff we have there, for free 👍 Voice & Audio 11Labs has a new overdub studio and it's working really well. Check out this short segment of myself speaking in dubbed Russian! It really sounds like me; I sent it to my mom to see if she'd fall for it 😆 She didn't. AI Art & Diffusion Hourglass Diffusion Transformers New high resolution diffusion architecture from the K-diffusion and RoPE teams (X, Blog, Paper, Github) The paper presents a new method called HDiT (Hourglass Diffusion Transformers) that shows promise in training models on high resolution images without incurring the significant hardware costs that come with scaling image sizes, replacing latent diffusion models, enabling O(n) complexity and scaling well. Utilizing tricks and best practices for transformer architectures, like RoPE (which we've covered on ThursdAI before), cosine similarity self-attention, RMSNorm, GeGLU, etc., and using something called local self-attention, this paper shows incredible promise for high resolution architectures for image creation tools. We had the pleasure to host Tanishq Abraham, one of the co-authors (and CEO of MedArc, Director of Research at Stability + PhD at 19) to walk us through the p