ThursdAI - The top AI news from the past week

Author: From Weights & Biases. Join AI Evangelist Alex Volkov and a panel of experts as they cover everything important that happened in the world of AI over the past week.


Description

From Weights & Biases - ThursdAI, the podcast that keeps you ahead of the AI curve. Hosted by AI Evangelist Alex Volkov with a rotating panel of expert guests, discussing every important piece of AI news and every update from the past week: open source and more.
50 Episodes
Hey ๐Ÿ‘‹ Look it May or May not be the first AI newsletter you get in May, but it's for sure going to be a very information dense one. As we had an amazing conversation on the live recording today, over 1K folks joined to listen to the first May updates from ThursdAI. As you May know by now, I just love giving the stage to folks who are the creators of the actual news I get to cover from week to week, and this week, we had again, 2 of those conversations. First we chatted with Piotr Padlewski from Reka, the author on the new Vibe-Eval paper & Dataset which they published this week. We've had Yi and Max from Reka on the show before, but it was Piotr's first time and he was super super knowledgeable, and was really fun to chat with. Specifically, as we at Weights & Biases launch a new product called Weave (which you should check out at https://wandb.me/weave) I'm getting more a LOT more interested in Evaluations and LLM scoring, and in fact, we started the whole show today with a full segment on Evals, Vibe checks and covered a new paper from Scale about overfitting. The second deep dive was with my friend Idan Gazit, from GithubNext, about the new iteration of Github Copilot, called Copilot Workspace. It was a great one, and you should definitely give that one a listen as wellTL;DR of all topics covered + show notes * Scores and Evals* No notable changes, LLama-3 is still #6 on LMsys* gpt2-chat came and went (in depth chan writeup)* Scale checked for Data Contamination on GSM8K using GSM-1K (Announcement, Paper)* Vibes-Eval from Reka - a set of multimodal evals (Announcement, Paper, HF dataset)* Open Source LLMs * Gradient releases 1M context window LLama-3 finetune (X)* MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4 (X, HF)* Nous Research - Hermes Pro 2 - LLama 3 8B (X, HF)* AI Town is running on Macs thanks to Pinokio (X)* LMStudio releases their CLI - LMS (X, Github)* Big CO LLMs + APIs* Github releases Copilot Workspace (Announcement)* AI21 - releases Jamba Instruct w/ 256K context (Announcement)* Google shows Med-Gemini with some great results (Announcement)* Claude releases IOS app and Team accounts (X)* This weeks Buzz* We're heading to SF to sponsor the biggest LLama-3 hackathon ever with Cerebral Valley (X)* Check out my video for Weave our new product, it's just 3 minutes (Youtube)* Vision & Video* Intern LM open sourced a bunch of LLama-3 and Phi based VLMs (HUB)* And they are MLXd by the "The Bloke" of MLX, Prince Canuma (X)* AI Art & Diffusion & 3D* ByteDance releases Hyper-SD - Stable Diffusion in a single inference step (Demo)* Tools & Hardware* Still haven't open the AI Pin, and Rabbit R1 just arrived, will open later today* Co-Hosts and Guests* Piotr Padlewski (@PiotrPadlewski) from Reka AI* Idan Gazit (@idangazit) from Github Next* Wing Lian (@winglian)* Nisten Tahiraj (@nisten)* Yam Peleg (@yampeleg)* LDJ (@ldjconfirmed)* Wolfram Ravenwolf (@WolframRvnwlf)* Ryan Carson (@ryancarson)Scores and EvaluationsNew corner in today's pod and newsletter given the focus this week on new models and comparing them to existing models.What is GPT2-chat and who put it on LMSys? (and how do we even know it's good?)For a very brief period this week, a new mysterious model appeared on LMSys, and was called gpt2-chat. It only appeared on the Arena, and did not show up on the leaderboard, and yet, tons of sleuths from 4chan to reddit to X started trying to figure out what this model was and wasn't. 
Folks started analyzing the tokenizer, the output schema, tried to get the system prompt and gauge the context length. Many folks were hoping that this is an early example of GPT4.5 or something else entirely. It did NOT help that uncle SAMA first posted the first tweet and then edited it to remove the - and it was unclear if he's trolling again or foreshadowing a completely new release or an old GPT-2 but retrained on newer data or something. The model was really surprisingly good, solving logic puzzles better than Claude Opus, and having quite amazing step by step thinking, and able to provide remarkably informative, rational, and relevant replies. The average output quality across many different domains places it on, at least, the same level as high-end models such as GPT-4 and Claude Opus.Whatever this model was, the hype around it made LMSYS add a clarification to their terms and temporarily take off the model now. And we're waiting to hear more news about what it is. Reka AI gives us Vibe-Eval a new multimodal evaluation dataset and score (Announcement, Paper, HF dataset)Reka keeps surprising, with only 20 people in the company, their latest Reka Core model is very good in multi modality, and to prove it, they just released a new paper + a new method of evaluating multi modal prompts on VLMS (Vision enabled Language Models) Their new Open Benchmark + Open Dataset is consistent of this format: And I was very happy to hear from one of the authors on the paper @PiotrPadlewski on the pod, where he mentioned that they were trying to create a dataset that was going to be very hard for their own model (Reka Core) and just decided to keep evaluating other models on it. They had 2 main objectives : (i) vibe checking multimodal chat models for day-to-day tasks and (ii) deeply challenging and probing the capabilities of present frontier models. To this end, the hard set contains > 50% questions that all frontier models answer incorrectlyChatting with Piotr about it, he mentioned that not only did they do a dataset, they actually used Reka Core as a Judge to score the replies from all models on that dataset and found that using their model in this way roughly correlates to non-expert human judgement! Very very interesting stuff. The "hard" set is ... well hard! Piotr concluded that if folks want to do research, they will provide free API access to Reka for that, so hit them up over DMs if you want to take this eval for a spin on your new shiny VLM (or indeed verify the metrics they put up) Scale tests for eval dataset contamination with GSM-1K (Announcement, Paper)Scale.ai is one of the most prominent companies in AI you may never have heard of, they are valued at $13B dollars and have pivoted from data processing for autonomous vehicles to being the darling of the government, with agreements from the DoD for data pipeline and evaluation for US Military. They have released a new paper as well, creating (but not releasing) a new dataset that matches the GSM8K (Grade School Math) dataset and evaluation that many frontier companies love to showcase in their release benchmarks with some surprising results! So Scale folks created (but not released) a dataset called GSK 1K, which tracks and is similar to the public GSM-8K dataset, and tested a bunch of existing models on their new one, to see the correlation, and if the different was very stark, assume that some models overfitted (or even had their dataset contaminated) on the publicly available GSM8K. 
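To make that comparison concrete, here is a minimal sketch (not Scale's actual pipeline, and the GSM1k set itself was not released) of how you could measure the kind of gap they report; `model_answer` and the two datasets are placeholders:

```python
# Hypothetical sketch: estimate the "contamination gap" between a public
# benchmark (GSM8K) and a freshly written, held-out set of similar problems.
# `model_answer` and the datasets are placeholders, not Scale's pipeline.

def accuracy(model_answer, problems):
    """Fraction of problems where the model's final answer matches the answer key."""
    correct = sum(model_answer(p["question"]) == p["answer"] for p in problems)
    return correct / len(problems)

def contamination_gap(model_answer, public_set, heldout_set):
    public_acc = accuracy(model_answer, public_set)
    heldout_acc = accuracy(model_answer, heldout_set)
    # A large positive gap suggests the model has (over)fit the public set.
    return public_acc - heldout_acc

# gap = contamination_gap(my_model, gsm8k_test, gsm1k_style_heldout)
# gap near 0     -> no sign of overfitting (what Scale reports for GPT/Claude/Gemini)
# gap around 0.1 -> roughly the 10% drop reported for some smaller models
```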
On one end, models like Mistral or Phi do up to 10% worse on GSM1k compared to GSM8k. On the other end, models like Gemini, Claude, or GPT show basically no signs of being overfit. The authors go on to say that overfitting doesn't necessarily mean it's a bad model, and highlight Phi-3, which shows a 10% gap between the new GSM-1K score and GSM-8K but still answers 68% of the new set correctly while being a tiny 3.8B parameter model. It seems that Scale has noticed how much interest there is in actually understanding how models perform, and is stepping into the evaluation game by building (but not releasing, so they don't leak) datasets. A tweet from Jim Fan (QT'd by Scale CEO Alex Wang) seems to agree that this is the right positioning for Scale (as they don't have models of their own and so can be neutral, like Moody's).

Open Source LLMs

LLama-3 gets 1M context window + other LLama-3 news

In the second week of the LLama-3 corner, we are noticing a significant ramp in all things Llama-3, first with the context length. The same folks from last week, Gradient, have spent cycles and upscaled/stretched LLama-3 to a whopping 1 million tokens in the context window (Llama-3 8B Gradient Instruct 1048k), with a very decent Needle in the Haystack result. The main problem? Transformers have quadratic attention scaling issues for longer context, so this isn't something that you'd be able to run on your Mac (nay, on your cluster) any time soon, and it's almost only theoretical at this point. The upside? We had Wing Lian (from Axolotl) on the show, and he talked about a new method called LoRD (which is now part of MergeKit), which is a way to extract LoRAs from models. Think of it as LLM arithmetic: you take the base model (Llama-3 in this case) and the finetune (Llama-3 8B Gradient Instruct 1048k) and simply run a command like so:

mergekit-extract-lora llama-3-8B-gradient-instruct-1048K llama-3-8B just-the-context-lora [--no-lazy-unpickle] --rank=desired_rank

And boom, in theory you have a tiny extracted LoRA file that is only the difference between these two models, the base and its finetune. It's really exciting stuff to be able to do brain surgery on these models and extract only one specific essence!

First LLama-3 finetunes that beat the instruct version

The folks at Nous Research give us a new Hermes-Pro on top of Llama-3 8B (X, HF) that is beating the Llama-3 instruct on benchmarks, which is apparently very hard to do, given that Meta created a LOT of human-labeled instructions (10M or so) and gave us a really, really good instruct model. Nous Hermes 2 Pro is also giving Llama-3 additional superpowers like function calling and tool use, specifically mentioning that this is the model to use if you do any type of agentic stuff:

This new version of Hermes maintains its excellent general task and conversation capabilities - but also excels at Function Calling, JSON Structured Outputs, and has improved on several other metrics as well, scoring a 90% on our function calling e
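If the extracted adapter lands in the standard PEFT format (my assumption here, not something stated above), applying it back onto the base model is only a few lines; model and adapter paths below are placeholders:

```python
# Hypothetical sketch: load the base Llama-3 8B model and apply a LoRA adapter
# extracted with mergekit-extract-lora. Assumes the adapter was saved in the
# standard PEFT format; model/adapter paths are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

# "just-the-context-lora" is the output directory from the mergekit command above
model = PeftModel.from_pretrained(base, "just-the-context-lora")

# Optionally bake the delta back into the weights and save a standalone model
merged = model.merge_and_unload()
merged.save_pretrained("llama-3-8b-plus-context-lora")
```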
Hey hey folks, happy ThursdAI ๐ŸŽ‰ Not a lot of house-keeping here, just a reminder that if you're listening or reading from Europe, our European fullyconnected.com conference is happening in May 15 in London, and you're more than welcome to join us there. I will have quite a few event updates in the upcoming show as well. Besides this, this week has been a very exciting one for smaller models, as Microsoft teased and than released Phi-3 with MIT license, a tiny model that can run on most macs with just 3.8B parameters, and is really punching above it's weights. To a surprising and even eyebrow raising degree! Let's get into it ๐Ÿ‘‡ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.TL;DR of all topics covered: * Open Source LLMs * Microsoft open sources Phi-3 (X, HF)* LLama3 70B top5 (no top 6) on LMsys (LMsys Arena)* Snowflake open sources Arctic - A massive hybrid MoE (X, Try it, HF)* Evolutionary Model merges support in MergeKit (Blog)* Llama-3 8B finetunes roundup - Longer Context (128K) and Dolphin & Bagel Finetunes* HuggingFace FINEWEB - a massive 45TB (the GPT4 of datasets) and 15T tokens high quality web data dataset (HF)* Cohere open sourced their chat interface (X)* Apple open sources OpenElm 4 models + training library called corenet (HF, Github, Paper)* Big CO LLMs + APIs* Google Gemini 1.5 pro is #2 on LMsys arena * Devin is now worth 2BN and Perplexity is also a Unicorn * A new comer called Augment (backed by Eric Schmidt) is now coming out of stealth (X)* Vision & Video* Adobe releases VideoGigaGAN - high quality upscaler with temporal consistency (paper)* TLDraw autocomplete UI demo (X)* This Weeks Buzz - What I learned in WandB this week* Joe Spisak talk about Llama3 on Stage at WandB Fully connected (Full Talk, TLDR)* Voice & Audio* Play.ai (previously play.ht) releases conversational Voice AI platform (X)* AI Art & Diffusion & 3D* IMGsys.org- like LMsys but for image generation model + leaderboard from FAL (try it)* Tools & Hardware* Rabbit R1 release party & no shipping update in sight* I'm disillusioned about my AI Pin and will return itOpen Source LLMs Llama-3 1 week-aversary ๐ŸŽ‚ - Leaderboard ranking + finetunes Well, it's exactly 1 week since we got Llama-3 from Meta and as expected, the rankings show a very very good story. (also it was downloaded over 1.2M times and already has 600 derivatives on HuggingFace) Just on Monday, Llama-3 70B (the bigger version) took the incredible 5th place (now down to 6th) on LMSys, and more surprising, given that the Arena now has category filters (you can filter by English only, Longer chats, Coding etc) if you switch to English Only, this model shows up 2nd and was number 1 for a brief period of time. So just to sum up, an open weights model that you can run on most current consumer hardware is taking over GPT-4-04-94, Claude Opus etc' This seems dubious, because well, while it's amazing, it's clearly not at the level of Opus/Latest GPT-4 if you've used it, in fact it fails some basic logic questions in my tests, but it's a good reminder that it's really hard to know which model outperforms which and that the arena ALSO has a bias, of which people are using it for example and that evals are not a perfect way to explain which models are better. 
However, LMsys is a big component of the overall vibes based eval in our community and Llama-3 is definitely a significant drop and it's really really good (even the smaller one) One not so surprising thing about it, is that the Instruct version is also really really good, so much so, that the first finetunes of Eric Hartfords Dolphin (Dolphin-2.8-LLama3-70B) is improving just a little bit over Meta's own instruct version, which is done very well. Per Joe Spisak (Program Manager @ Meta AI) chat at the Weights & Biases conference last week (which you can watch below) he said "I would say the magic is in post-training. That's where we are spending most of our time these days. Uh, that's where we're generating a lot of human annotations." and they with their annotation partners, generated up to 10 million annotation pairs, both PPO and DPO and then did instruct finetuning. So much so that Jeremy Howard suggests to finetune their instruct version rather than the base model they released.We also covered that despite the first reactions to the 8K context window, the community quickly noticed that extending context window for LLama-3 is possible, via existing techniques like Rope scaling, YaRN and a new PoSE method. Wing Lian (Maintainer of Axolotl finetuneing library) is stretching the model to almost 128K context window and doing NIH tests and it seems very promising! Microsoft releases Phi-3 (Announcement, Paper, Model)Microsoft didn't really let Meta take the open models spotlight, and comes with an incredible report and follow up with a model release that's MIT licened, tiny (3.8B parameters) and performs very very well even against Llama-3 70B. Phi is a set of models from Microsoft that train on synthetic high-quality dataset modeled after textbooks-is-all-you-need/TinyStories approach. The chart is quite incredible, the smallest (mini) Phi-3 is beating Llama-3-8B AND Mixtral on MMLU scores, BigBench and Humaneval. Again to simplify, this TINY 3.8B model, half the size of 1 Mixtral expert, beats Mixtral and newly released Llama-3-8B on most benchmark, not to mention GPT-3.5! It's honestly quite a crazy chart to look at, which raises the question, did this model train on these benchmarks? ๐Ÿค” I still haven't seen definitive proof that the folks at Microsoft trained on any benchmarks data, I did see engagement from them and a complete denial, however we did see a few attempts at using Phi-3 and the quantized versions and the wrong end token formatting seem to be very prevalent in shaping the early opinion that this model performance is detached from it's very high scoring. Not to mention that model being new, there's confusion about how to use it, see thread from Anton Bacaj about HuggingFace potentially using the wrong end token to finish conversations. Now to an actual performance of this tiny model, I asked it a simple logic based question that trips many models even ones good with logic (Opus and GPT-4 answer it correctly usually) and it performed very well (here a comparison with LLama-3-70B which didn't do as well)Additionally, their tokenizer is very interesting, they have all these terms that receive a full token, things like function_list, calc, ghreview, ghissue, and others, which highlight some interesting potential use-cases they have planned for this set of models or give us a hint at it's training process and how come it's so very good. 
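Given that many of the early "Phi-3 is broken" reports traced back to the wrong end-of-turn token, the safest way to try it yourself is to let the tokenizer's own chat template build the prompt. A minimal sketch (the model id is the one Microsoft published; generation settings are just examples):

```python
# Minimal sketch: prompt Phi-3-mini through its own chat template so the
# correct special/end tokens are used (several early bad results came from
# hand-rolled prompts with the wrong end token). Settings are illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

messages = [{"role": "user", "content": "I have 3 apples and eat 2 pears. How many apples do I have?"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

out = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```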
Snowflake open sources Arctic - a massive 480B MoE Hybrid with Apache 2 license (X, Try it, HF)Snowflake is a name I haven't yet used on ThursdAI and this field is getting crowded, but they just released something interesting (+ a LOT of open source, including training code, checkpoints, research insights etc')The thing I found most interesting is, the massive 128 experts MoE but also the Hybrid architecture. Not quite an MoE and definitely not a dense model. They claim to have found that training Many-but-condensed experts with more expert choices is working well for them based on DeepSpeed research. You can give this model a try here and I have, using the same 2 questions I had for Phi and LLama and found the model not that great at logic to be honest, but it was really fast considering the total size, so inference optimization for this type of architecture is definitely geared towards Enterprise (as well as training cost, they claim it cost just under $2 million dollars to train) Big CO LLMs + APIsNot a lot of super interesting things in this corner, besides Gemini 1.5 pro (the one with 1M context window) finally appearing in the Arena and taking the amazing #2 spot (pushing Llama-3 8B to number 6 on the same day it just appeared in there lol) This is very impressive, and I gotta wonder what happened with Gemini Ultra if pro with larger context beats it outright. It's indeed very good, but not THAT good if you use it om simple logic problems and don't use the whole context length. I suspect that we'll hear much more about their AI stuff during the upcoming Google IO (which I was invited to and am going to cover) Additionally, we've had quite a few AI Unicorns born, with Perplexity becoming a freshly mint Unicorn with an additional round of funding and Devin, the 6-month old agent startup getting to a 2 billion valuation ๐Ÿ˜ฎ This weeks Buzz (What I learned with WandB this week)It's been exactly 1 week since our conference in SF and since Joe Spisak by complete chance announced Meta LLama - 3 live on stage a few hours after it was officially announced. In this weeks buzz, I'm very happy to bring you that recording, as promised last week. I will also share that our newly announced new LLM observability tool Weave launched officially during the conference and it'll be my job to get you to use it ๐Ÿ™‚ And shoutout to those in the ThursdAI community who already used and provided feedback, it's really helpful! AI Art & DiffusionThe fine folks at FAL.ai have launched the LMsys.org for images, and called it.... IMGsys.org ๐Ÿ™‚ It's a adversarial arena with different image generators, all hosted on Fal I assume, that lets the user choose which images are "better" which is a vague term. But it's really fun, give it a try! Tools & HardwareRabbit R1 first impressionsWe finally got a tease of R1 from Rabbit, as the first customers started receiving this device (where's mine?? I didn't even get a tracking number) Based on the presentation (which I watched so you don't have to) the response time, which was one of the most talked about negative pieces of AI Pin seems very decent. We're going to see a lot of reviews, but I'm very excited about my Rabbit ๐Ÿ‘ ๐Ÿ‡ Apparently
Happy LLama 3 day folks! After a lot of rumors, speculations, and apparently pressure from the big Zuck himself, we finally can call April 18th, 2024, LLaMa 3 day! I am writing this, from a lobby of the Mariott hotel in SF, where our annual conference is happening called Fully Connected, and I recorded today's episode from my hotel room. I really wanna shout out how awesome it was to meet folks who are listeners of the ThursdAI pod and newsletter subscribers, participate in the events, and give high fives. During our conference, we had the pleasure to have Joe Spisak, the Product Director of LLaMa at Meta, to actually announce LLaMa3 on stage! It was so exhilarating, I was sitting in the front row, and then had a good chat with Joe outside of the show ๐Ÿ™Œ The first part of the show was of course, LLaMa 3 focused, we had such a great time chatting about the amazing new 8B and 70B models we got, and salivating after the announced but not yet released 400B model of LLaMa 3 ๐Ÿ˜ฎ We also covered a BUNCH of other news from this week, that was already packed with tons of releases, AI news and I was happy to share my experiences running a workshop a day before our conference, with focus on LLM evaluations. (If there's an interest, I can share my notebooks and maybe even record a video walkthrough, let me know in the comments) Ok let's dive in ๐Ÿ‘‡ Happy LLama 3 day ๐Ÿ”ฅ The technical detailsMeta has finally given us what we're all waiting for, an incredibly expensive (2 clusters of 24K H100s over 15 Trillion tokens) open weights models, the smaller 8B one and the larger 70B one. We got both instruction fine tune and base models, which are great for finetuners, and worth mentioning that this is a dense model (not a mixture of experts, all the parameters are accessible for the model during inference) It is REALLY good at benchmarks, with the 7B model beating the previous (LLaMa 2 70B) on pretty much all benchmarks, and the new 70B is inching on the bigger releases from the past month or two, like Claude Haiku and even Sonnet! The only downsides are the 8K context window + non multimodality, but both are coming according to Joe Spisak who announced LLama3 on stage at our show Fully Connected ๐Ÿ”ฅ I was sitting in the front row and was very excited to ask him questions later! By the way, Joe did go into details they haven't yet talked about pulblicly (see? I told you to come to our conference! and some of you did!) and I've been live-tweeting his whole talk + the chat outside with the "extra" spicy questions and Joes winks haha, you can read that thread hereThe additional infoMeta has also partnered with both Google and Bing (take that OpenAI) and inserted LLama 3 into the search boxes of Facebook, Instagram, Messenger and Whatsapp plus deployed it to a new product called meta.ai (you can try it there now) and is now serving LLama 3 to more than 4 Billion people across all of those apps, talk about compute cost! Llama 3 also has a new Tokenizer (that Joe encouraged us to "not sleep on") and a bunch of new security tools like Purple LLama and LLama Guard. PyTorch team recently released finetuning library called TorchTune is now supporting LLama3 finetuning natively out of the box as well (and integrates Wandb as it's first party experiment tracking tool) If you'd like more details, directly from Joe, I was live tweeting his whole talk, and am working at getting the slides from our team. We'll likely have a recording as well, will post it as soon as we have it. 
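If you want to poke at the instruct model yourself (see the "Run Locally" link in the TL;DR below), a minimal transformers sketch looks roughly like this; you need to accept Meta's license on the Hub first, and the generation settings are only illustrative:

```python
# Minimal sketch: run Meta-Llama-3-8B-Instruct with a recent transformers
# version that accepts chat messages directly in the pipeline. Requires
# accepting Meta's license on Hugging Face; dtype/device settings are examples.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize what changed between Llama 2 and Llama 3."},
]
out = pipe(messages, max_new_tokens=200)
print(out[0]["generated_text"][-1]["content"])  # the newly generated assistant turn
```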
Here's a TL;DR (with my notes for the first time) of everything else we talked about, but given today is LLaMa day, and I still have to do fully connected demos, I will "open source" my notes and refer you to the podcast episode to hear more detail about everything else that happened today ๐Ÿซก TL;DR of all topics covered: * Meta releases LLama 3 -8B, 70B and later 400B (Announcement, Models, Try it, Run Locally)* Open Source LLMs * Meta LLama 3 8B, 70B and later 400B (X, Blog)* Trained 15T tokens! * 70B and 8B modes released + Instruction finetuning* 8K context length , not multi modal* 70B gets 82% on MMLU and 81.7% on HumanEval* 128K vocab tokenizer* Dense model not MoE* Both instruction tuned on human annotated datasets* Open Access* The model already uses RoPe * Bigxtral instruct 0.1 (Blog, Try it)* Instruct model of the best Apache 2 model around* Release a comparison chart that everyone started "fixing" * ๐Ÿค– Mixtral 8x22B is Mistral AI's latest open AI model, with unmatched performance and efficiencyย * ๐Ÿ—ฃ It is fluent in 5 languages: English, French, Italian, German, Spanish* ๐Ÿงฎ Has strong math and coding capabilities ย * ๐Ÿง  Uses only 39B parameters out of 141B total, very cost efficient* ๐Ÿ—œ Can recall info from large documents thanks to 64K token context window* ๐Ÿ†“ Released under permissive open source license for anyone to use* ๐Ÿ† Outperforms other open models on reasoning, knowledge and language benchmarks ย * ๐ŸŒ Has strong multilingual abilities, outperforming others in 4 languages* ๐Ÿงช Excellent basis for customization through fine-tuning* New Tokenizer from Mistral (Docs)* Focusing on Tool Use with tokens ๐Ÿ”ฅ* WizardLM-2 8x22B, 70B and 7B (X, HF)* Released it and then pulled it back from HF and Github due to microsoft toxicity not passing* Big CO LLMs + APIs* OpenAI gives us Batch API + Assistants API v2 * Batch is 50% cost and win win win* Assistants API V2 - new RAG* new file search tool* up to 10,000 files per assistant* new vector store* Reka gives us Reka Core (X, Try)* Multimodal that understands video as well* 20 people team* Video understanding is very close to Gemini * 128K context * Core has strong reasoning abilities including for language, math and complex analysis.* 32 languages support * HuggingFace ios chat bot now * This weeks Buzz* Me + team led a workshop a day before the conference (Workshop Thread)* Fully Connected in SF was an incredible success, over 1000 AI attendies + Meta AI announcement on stage ๐Ÿ”ฅ * PyTorch new TorchTune finetuning library with first class WandB support (X)* Vision & Video* Microsoft VASA-1 animated avatars (X, Blog)* Amazing level of animation from 1 picture + Sound* Harry Potter portraits are here* They likely won't release this during Election year* Looks very good ,close to EMO but no code* ๐Ÿ“บ Videos show faces speaking naturally with head movements and lip sync* ๐Ÿ”ฌ Researchers are exploring applications in education, accessibility and more* HuggingFace updates IDEFICS2 8B VLM (X, HF)* Apache 2 license* Competitive with 30B models* 12 point increase in VQAv2, 30 point increase in TextVQA (compared to Idefics 1)* > 10x fewer parameters than Idefics 1* Supports image resolution up to 980 x 980+* Better OCR capabilities (thanks to more than 6TB of OCR pre-training data)* Adobe shows Firefly video + SORA support (X)* Voice & Audio* Rewind AI is now Limitless (X)* New service & Brand name* Transcription to you * Hardware device that looks sleek * 100hours * Privacy support in cloud* AI Art & Diffusion & 3D* 
Stability - Stable Diffusion 3 is here
* Available via API only
* Partnered with Fireworks HQ for the release
* Needs a Stability AI membership to use / access $$
* Big step up in composition and in notorious issues like hands, "AI faces" etc.
* Seems to prefer simpler prompts
* Way more copyright-friendly - it's hard to get any kind of brands/logos
* Text is amazing
* Others
* New AIrChat with amazing transcription is out, come join us in our AI corner there
* Humane AI Pin was almost killed by the MKBHD review
* Rabbit reviews incoming

That's all for this week, next week we have an amazing guest, see you then! 🫡 This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
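One item from the TL;DR of the episode above worth making concrete is OpenAI's new Batch API (the "50% cost" one). The flow is: upload a JSONL file of requests, create a batch, then poll it. A rough sketch, with file names and the model chosen only as examples:

```python
# Rough sketch of the OpenAI Batch API flow: upload a JSONL file of
# chat.completions requests, create a 24h batch, then poll it.
# File names and the model are placeholders.
from openai import OpenAI

client = OpenAI()

# requests.jsonl: one JSON object per line, e.g.
# {"custom_id": "q-1", "method": "POST", "url": "/v1/chat/completions",
#  "body": {"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "2+2?"}]}}
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

status = client.batches.retrieve(batch.id)
print(status.status)  # validating -> in_progress -> completed
# When completed, download the results file via client.files.content(status.output_file_id)
```

Results come back as another JSONL file keyed by your custom_id, which is what makes the discount easy to take advantage of for offline evals and bulk data generation.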
this week was absolutely bonkers. For starters, for the first time ever, we got an Open Weights model (Command R+) to jump over GPT-4 in human rankings on LMsys, this is huge!Then on Tuesday, it seems that all the companies just wanted to one up one another, first Gemini 1.5 released with updates, made it available in 180 countries, added audio mode + tons of API improvements and system prompts, then less than an hour later, OpenAI has given us a "majorly improved" GPT-4 Turbo version (2024-04-09) that is now back to being the BEST LLM IN THE WORLD and to cap that day off, Mistral did the thing again, the thing being, dropping a torrent link in a tweet with no explanations.What was in that torrent is a Mixtral 8x22B MoE (which we started calling Bixtral) which comes with an Apache2 license and seems to be VERY good!We also saw the first finetune from HuggingFace/KAIST folks less than 48 hours later (the authors of said finetune actually came on the show ๐ŸŽ‰ )Fully Connected is a week from today! If you haven't yet signed up, use THURSDAI promo code and come hear from Richard Socher (You.com), Jerry Liu (Ilamaindex CEO), Karoly (TwoMinutePapers), Joe Spisak (Meta) and and leaders from NVIDIA, Snowflake, Microsoft, Coatue, Adobe, Siemens, Lambda and tons more ๐Ÿ‘‡TL;DR of all topics covered:* Open Source LLMs* ๐Ÿ”ฅ Mistral releases Mixtral 8x22 Apache 2 licensed MoE model (Torrent, TRY IT)* Cohere CMDR+ jumps to no 6 on LMSys and beats GPT4 (X)* CodeGemma, RecurrentGemma & Gemma Instruct 1.1 (Announcement)* Auto-code-rover gets 22% on SWE bench (Announcement)* HuggingFace - Zephyr 141B-A35B - First Bixtral Finetune (Announcement)* Mistral 22B - 1 single expert extracted from MoE (Announcement, HF)* This weeks Buzz - Weights & Biases updates* FullyConnected is in 1 week! 
(Come meet us)* Big CO LLMs + APIs* ๐Ÿ”ฅ GPT-4 turbo is back to being number 1 AI with 88.2% Human Eval score (X)* Gemini 1.5 Pro now understands audio, uses unlimited files, acts on your commands, and lets devs build incredible things with JSON mode (X)* LLama 3 coming out in less than a month (confirmed by Meta folks)* XAI Grok now powers news summaries on X (Example)* Cohere new Rerank 3 (X)* Voice & Audio* HuggingFace trained Parler-TTS (Announcement, Github)* Udio finally launched it's service (Announcement, Leak, Try It)* Suno has added explore mode (suno.ai/explore)* Hardware* Humane AI pin has started shipping - reviews are not amazingOpen Source LLMsCommand R+ first open weights model that beats last year GPT4 versionsThis is massive, really a milestone to be discussed, and even though tons of other news happened, the first time an open weights model is beating GPT-4 not on a narrow case (coding, medical) but on a general human evaluation on the arena.This happened just a year after GPT-4 first came out, and is really really impressive.Command R+ has been getting a lot of great attention from the community as well, folks were really surprised by the overall quality, not to mention the multilingual abilities of CommandR+Mixtral 8x22B MoE with 65K context and Apache 2 license (Bigstral)Despite the above, Cohere time in the sun (ie top open weights model on lmsys) may not be that long if the folks at Mistral have anything to say about it!Mistral decided to cap the crazy Tuesday release day with another groundbreaking tweet of theirs which includes a torrent link and nothing else (since then they of course uploaded the model to the hub) giving us what potentially will unseat Command R from the rankings.The previous Mixtral (8x7B) signaled the age of MoEs and each expert in that was activated from Mistral 7B, but for this new affectionally named Bixtral model, each expert is a 22B sized massive model.We only got a base version of it, which is incredible on it's own right, but it's not instruction finetuned yet, and the finetuner community is already cooking really hard! Though it's hard because this model requires a lot of compute to finetune, and not only GPUs, Matt Shumer came on the pod and mentioned that GPUs weren't actually the main issue, it was system RAM when the finetune was finished.The curious thing about it was watching the loss and the eval loss. it [Bixtral] learns much faster than other models - Matt ShumerMatt was trying to run Finetunes for Bigstral and had a lot of interesting stuff to share, definitely check out that conversation on the pod.Bigstral is... big, and it's not super possible to run it on consumer hardware.... yet, because Nisten somehow got it to run on CPU only ๐Ÿคฏ using Justin Tuneys LLM kernels (from last week) and LLama.cpp with 9tok/s which is kinda crazy.HuggingFace + KAIST release Zephyr 141B-A35B (First Mixtral 8x22 finetune)And that was fast, less than 48 hours after the torrent drop, we already see the first instruction finetune from folks at HuggingFace and KAIST AI.They give us a new finetune using ORPO, a technique by KAIST that significantly improves finetuning ability (they finetuned Bigstral with 7k capybara instructions for 1.3 hours on 4 nodes of 8 x H100s)They used the distilled Capybara Dataset (From LDJ and Argilla) to give this model a bit more clarity and instruction following.You can find the model on the hub here, and the question is, but now the question is would one run this? 
๐Ÿ˜…Btw the authors of the finetune and the ORPO paper from KAIST, Jiwoo Hong and Noah Lee came on the pod and chatted about this finetune and ORPO which was awesome! Definitely check this conversation out.Big CO LLMs + APIsGemini 1.5 Pro updates - Audio Mode, JSON, System prompts and becomes freeGoogle really pulled out all the stops for this updated release of Gemini 1.5 Pro, it's flagship, 1M context window model.Its now available for free to over 180 countries, has a new audio mode where you can upload up to 9.5 hours of audio (which is crazy on it's own) and it's not merely transcription, it seems that they baked an audio encoder in there so the model can understand some tonality and even some dogs barking in the background!In fact, instead of me writing down, how about I show you an example of Gemini itself extracting everything I said about it during the show? Here's a screenshot of me uploading 2+ hours of raw unedited audio form the show today:You can see the Google AI studio (which is a very clean product!) and the new system message, the ability to turn the safety filters off (thank you!) and the audio mode. Not to mention the 250K tokens ๐Ÿ˜‚ that my audio cost this model. Mind you, the highest context window after Gemini is Claude 3 with 200K.Google also significantly improves the APIs, and gave access to a new file upload API that allows up to 2GB files uploaded (to support this amazing context and multimodality) ๐Ÿ”ฅOpenAI - GPT 4 turbo a new and "Majorly improved version"Remember when Gemini 1.5 was announced? You may not remember that specific day, because an hour after that, OpenAI published SORA and blew our collective minds off.Well, OpenAI is at it again, but this time it didn't quite work the same way, but an hour after Gemini 1.5 updates came out, OpenAI released GPT4-Turbo-April-9 aka (gpt-4-turbo-2024-04-09) and basically all they said that it was "majorly improved"The technical stuff first, they combined the tool use (function calling) API with the Vision API, which is feature parity with Anthropic).The vibes are currently good, folks are seeing improvements across the board in logic and code creation, specifically the folks at Cursor posted an example (and enabled this model in their IDE) where it writes higher quality code.As Iโ€™m writing these words, LMSys updated us that this new model shot up to the top of the arena taking the Mantle back from Opus as the best AI we have, and also a confirmation from OpenAI that this model is now powering the chatGPT interface ๐Ÿ‘OpenAI also just open sourced a repo to show what they used to get these exact scores for the new GPT-4 and they are impressiveThis weeks Buzz (What I learned with WandB this week)Final Call! Fully Connected, our very own annual conference is about to commence(hehe of course it's happening on a ThursdAI, I still have to think about how to record the show next week)Please feel free to use the code THURSDAI to sign up and come see us.As a reminder, we're also running a workshop a day before, where we're going to showcase Weave and give practical examples for LLM builders, and it's going to be a lot of fun! Looking forward to see some of you there!Audio & VoiceUdio launches a suno competitor AI Music serviceFor the past week+ I've seen tons of AI plugged folks in SF post about "a new AI for music is coming and it's going to be amazing". 
Well it's finally here, called Udio, and it gives Suno a run for its money for sure. With the ability to create full tracks, create intros and outros, remix, and much-needed AI-enhanced prompting, Udio does look very, very polished and sounds GOOD! Here is an example of a classical music track that's been going viral:

I've played a few more examples on the show itself, and you can check out the trending creations on their page. Interestingly, this is probably a diffusion model, so folks have been squeezing all kinds of stuff that's not only musical out of it, including stand-up comedy with a full laugh track.

Suno adds explore mode

Meanwhile, Suno is not going down without a fight and has released this amazing new page where they generated thousands of samples for hundreds of interesting/weird sound styles, letting you get exposed to and learn about different musical styles. I really liked it, so I recorded a short reaction video:

Phew, somehow we made it, we were able to summarize the huge news this week in under two hours + a newsletter! The one thing I haven't been able to do is actually try out much of the stuff I talked about, so after writing this, I will take a little break and delve into some of the other things I haven't yet tried 👀 See you guys next week in limited capacity (maybe, we'll see) and until then, have a
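Circling back to the Gemini 1.5 audio mode from earlier in this episode: the upload-then-prompt flow looked roughly like the sketch below with the google-generativeai SDK at the time. Treat the function and model names as my best recollection rather than gospel, and the file path is a placeholder.

```python
# Rough sketch of uploading a long audio file to Gemini 1.5 Pro and asking a
# question over it, per the audio mode described above. Names and model id
# reflect the google-generativeai SDK as I recall it; verify against the docs.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder

# The File API accepts large uploads (the episode mentions up to ~2GB per file)
audio = genai.upload_file("thursdai_raw_recording.mp3")

model = genai.GenerativeModel("gemini-1.5-pro-latest")
response = model.generate_content(
    [audio, "List everything Alex said about Gemini in this recording."]
)
print(response.text)
```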
Happy first ThursdAI of April folks, did you have fun on April Fools? 👀 I hope you did! I made a poll on my feed and 70% did not participate in April Fools, which makes me a bit sad! Well, alright, time to dive into the news of this week, and of course there are TONS of news, but I want to start with our own breaking news! That's right, we at Weights & Biases have breaking news of our own today: we've launched our new product called Weave! Weave is our new toolkit to track, version and evaluate LLM apps, so from now on, we have Models (what you probably know as Weights & Biases) and Weave. So if you're writing any kind of RAG system, anything that uses Claude or OpenAI, Weave is for you! I'll be focusing on Weave and sharing more on the topic, but today I encourage you to listen to the launch conversation I had with Tim & Scott from the Weave team here at WandB, as they and the rest of the team worked their asses off for this release and we want to celebrate the launch 🎉

TL;DR of all topics covered:
* Open Source LLMs
* Cohere - CommandR PLUS - 104B RAG optimized Sonnet competitor (Announcement, HF)
* Princeton SWE-agent - OSS Devin - gets 12.29% on SWE-bench (Announcement, Github)
* Jamba paper is out (Paper)
* Mozilla LLamaFile now goes 5x faster on CPUs (Announcement, Blog)
* Deepmind - Mixture of Depth paper (Thread, ArXiv)
* Big CO LLMs + APIs
* Cloudflare AI updates (Blog)
* Anthropic adds function calling support (Announcement, Docs)
* Groq lands function calling (Announcement, Docs)
* OpenAI is now open to customers without login requirements
* Replit Code Repair - 7B finetune of deep-seek that outperforms Opus (X)
* Google announced Gemini Prices + Logan joins (X)
* This weeks Buzz - oh so much BUZZ!
* Weave launch! Check Weave out! (Weave Docs, Github)
* Sign up with Promo Code THURSDAI at fullyconnected.com
* Voice & Audio
* OpenAI Voice Engine will not be released to developers (Blog)
* Stable Audio v2 dropped (Announcement, Try here)
* Lightning Whisper MLX - 10x faster than whisper.cpp (Announcement, Github)
* AI Art & Diffusion & 3D
* Dall-e now has in-painting (Announcement)
* Deep dive
* Jamba deep dive with Roi Cohen from AI21 and Maxime Labonne

Open Source LLMs

Cohere releases Command R+, 104B RAG focused model (Blog)

Cohere surprised us, and just 2.5 weeks after releasing Command-R (which became very popular and is No 10 on the LMsys arena), gave us its big brother, Command R PLUS. With 128K tokens in the context window, this model is multilingual as well, supporting 10 languages, and its tokenizer is even optimized for those languages (a first!). The main focus from Cohere is advanced function calling / tool use, and RAG of course, and this model specializes in those tasks, beating even GPT-4 Turbo. It's clear that Cohere is positioning themselves as RAG leaders, as evidenced by the accompanying tutorial on starting with RAG apps, and this model further solidifies their place as the experts in this field. Congrats folks, and thanks for the open weights 🫡

SWE-Agent from Princeton

Folks remember Devin? The agent with a nice UI, from a super cracked team, that got 13% on SWE-bench, a very hard (for LLMs) benchmark that requires solving real-world issues? Well, now we have an open source agent that comes very, very close to that, called SWE-agent. SWE-agent has a dedicated terminal and tools, and utilizes something called ACI (Agent Computer Interface), allowing the agent to navigate, search, and edit code.
The dedicated terminal in a docker environment really helps as evident by a massive 12.3% score on SWE-bench where GPT-4 gets only 1.4%! Worth mentioning that SWE-bench is a very hard benchmark that was created by the folks who released SWE-agent, and here's some videos of them showing the agent off, this is truly an impressive achievement!Deepmind publishes Mixture of Depth (arXiv)Thanks to Hassan who read the paper and wrote a deep dive, this paper by Deepmind shows their research into optimizing model inference. Apparently there's a way to train LLMs without affecting their performance, which later allows to significantly reduce compute on some generated tokens. ๐Ÿง  Transformer models currently spread compute uniformly, but Mixture-of-Depths allows models to dynamically allocate compute as needed๐Ÿ’ฐ Dynamically allocating compute based on difficulty of predicting each token leads to significant compute savings โณ Predicting the first token after a period is much harder than within-sentence tokens, so more compute is needed ๐Ÿ—‘ Most current compute is wasted since difficulty varies between tokensWe're looking forward to seeing models trained with this, as this seems to be a very big deal in how to optimize inference for LLMs. Thank you for reading ThursdAI - Best way to support us is to just share this with folks ๐Ÿ‘‡Big CO LLMs + APIsAnthropic and Groq announce function calling / tool use support, Cohere takes it one step furtherIn yet another example of how OpenAI is leading not only in models, but in developer experience, most models and API providers are now using the same messages API structure. Back in June of 2023, OpenAI gave us function calling, and finally the industry is aligning to this format, which is now being rebranded as "tool use" If you're unfamiliar with the concept, tool use allows a developer to specify what tools the model can have in addition to just spitting out tokens, think browsing the web, or using RAG to get more information, or check the weather, or... turn off a lighbulb in your smart home. The LLM then decides based on user input, if a specific tool needs to be called, responds with the tool and parameters it needs to the developer, and then expects the result of that tool, and finally, is able to respond to the user with the complete information. So this week we've got Command R, Groq and Anthropic all adding support for tool use, which is incredible for developer experience across the board and will allow developers to move between all those APIs. Cohere goes one step further with something they call Multi Step tool use, which is a significant step up and is very interesting to explore, as it gives their models the ability to rank and order tool execution, and ovserve their responses.Anthropic Docs https://docs.anthropic.com/claude/docs/tool-useGroq Docs https://console.groq.com/docs/tool-useCohere Docs https://docs.cohere.com/docs/multi-step-tool-useCloudflare AI is now in GA + workers in PythonIf you've been following ThursdAI, you know I'm a huge Cloudflare fan. I've built my startup (https://targum.video) on top of Cloudflare workers platform, and I gave them early feedback about having to step into AI in a big way. And they did, with workers AI which is now in GA. 
Workers AI lets developers in the Cloudflare ecosystem run LLMs (they mostly feature Opensource LLMs which is incredible), host vectors, run whisper and basically have end to end serverless apps that are powered by AI (they have GPUs in 150 cities around the world)This week Clouflare announced also the ability to write workers in Python, which was sorely missing for some folks (like me!) who love FastAPI for example, and while it's not a full python environment, the depth to which they had to go in order to allow python to execute on their edge is kind of ridiculous, read up on it hereI'm hoping to work with them to bring weave into the workers for python soon ๐Ÿคž because building AI applications with Cloudflare is so simple, they even have a HuggingFace integration which allows you to bring models into your CF environment with 1 click. This weeks Buzz - SO MUCH BUZZHey, well first of all, I now can offer you a 15% off a ticket to our conference, so use THURSDAI when you checkout and get a ticket hereNow that Weave is out, it's possible to say that our workshop on April 17 (same link as above) is going to be focused on LLM evaluations and yes, I will be talking about how to use weave to build LLM applications in production safely. If this field is new to you, please sign up and come to the workshop!JAMBA deep dive with Roi @ AI21 and Maxime LabonneAs always, what I cover in this newsletter are only the highlights of what we talked about, but there was so much more, I really recommend you to listen to the episode. This of this weeks episode as 2 episodes (maybe I should re-release the deep dive as a separate episode) because we had a long conversation with Roi Cohen who's a PM @ AI21 and Maxime Labonne (Author of LazyMergeKit and first finetune of JAMBA), it's really worth tuning into that interview. Here's a little snippet: Aaaand this is it for this week, or you know what? Maybe it's not! I shared this on X but if you don't follow me on X, I decided to prank my whole feed by saying that I'm basically changing careers and becoming a Russian AI DJ, called DJ Thursday and I will only play AI generated music. The weird thing, how many people were like, yeah ok, this makes sense for you ๐Ÿ˜… So here's my April Fools (one of them) joke, hope you enjoy the high quality of these tunes and see you all next week ๐Ÿซก This is a public episode. If youโ€™d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
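Before moving on, to ground the tool use section from this episode: here's a stripped-down sketch of the Anthropic flavor of it. The get_weather tool is a made-up example, and the model id is just one of the Claude 3 models.

```python
# Stripped-down sketch of tool use ("function calling") with the Anthropic API,
# as discussed above. The get_weather tool is a made-up example.
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "Should I bring an umbrella in London today?"}],
)

# If the model decides to call the tool, one content block is a tool_use block
# with the arguments; you run the tool yourself and send the result back.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```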
Hey everyone, this is Alex and can you believe that we're almost done with Q1 2024? March 2024 was kind of crazy of course, so I'm of course excited to see what April brings (besides Weights & Biases conference in SF called Fully Connected, which I encourage you to attend and say Hi to me and the team!) This week we have tons of exciting stuff on the leaderboards, say hello to the new best AI in the world Opus (+ some other surprises), in the open source we had new MoEs (one from Mosaic/Databricks folks, which tops the open source game, one from AI21 called Jamba that shows that a transformers alternative/hybrid can actually scale) and tiny MoE from Alibaba, as well as an incredible Emotion TTS from Hume. I also had the pleasure to finally sit down with friend of the pod Tanishq Abraham and Paul Scotti from MedArc and chatted about MindEye 2, how they teach AI to read minds using diffusion models ๐Ÿคฏ๐Ÿง ๐Ÿ‘๏ธThank you for reading ThursdAI - Recaps of the most high signal AI weekly spaces. This post is public so feel free to share it.TL;DR of all topics covered: * AI Leaderboard updates* Claude Opus is number 1 LLM on arena (and in the world)* Claude Haiku passes GPT4-0613* ๐Ÿ”ฅ Starling 7B beta is the best Apache 2 model on LMsys, passing GPT3.5* Open Source LLMs * Databricks/Mosaic DBRX - a new top Open Access model (X, HF)* ๐Ÿ”ฅ AI21 - Jamba 52B - Joint Attention Mamba MoE (Blog, HuggingFace)* Alibaba - Qwen1.5-MoE-A2.7B (Announcement, HF)* Starling - 7B that beats GPT3.5 on lmsys (HF)* LISA beats LORA as the frontrunner PeFT (X, Paper)* Mistral 0.2 Base released (Announcement)* Big CO LLMs + APIs* Emad leaves stability ๐Ÿฅบ* Apple rumors - Baidu, Gemini, Anthropic, who else? (X)* This weeks buzz* WandB Workshop in SF confirmed April 17 - LLM evaluations (sign up here)* Vision & Video* Sora showed some demos by actual artists, Air Head was great (Video)* Tencent Aniportait - generate Photorealistic Animated avatars (X)* MedArc - MindEye 2 - fMRI signals to diffusion models (X) * Voice & Audio* Hume demos EVI - empathic voice analysis & generation (X, demo)* AI Art & Diffusion & 3D* Adobe firefly adds structure reference and style transfer - (X, Demo)* Discussion* Deep dive into MindEye 2 with Tanishq & Paul from MedArc* Is narrow finetuning done-for with larger context + cheaper prices - debate๐Ÿฅ‡๐Ÿฅˆ๐Ÿฅ‰Leaderboards updates from LMSys (Arena)This weeks updates to the LMsys arena are significant. (Reminder in LMsys they use a mix of MT-Bench, LLM as an evaluation and user ELO scores where users play with these models and choose which answer they prefer)For the first time since the Lmsys arena launched, the top model is NOT GPT-4 based. It's now Claude's Opus, but that's not surprising if you used the model, what IS surprising is that Haiku, it's tiniest, fastest brother is now well positioned at number 6, beating a GPT4 version from the summer, Mistral Large and other models while being dirt cheap. We also have an incredible show from the only Apache 2.0 licensed model in the top 15, Starling LM 7B beta, which is now 13th on the chart, with incredible finetune of a finetune (OpenChat) or Mistral 7B. ๐Ÿ‘ Yes, you can now run a GPT3.5 beating model, on your mac, fully offline ๐Ÿ‘ Incredible. 
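Since the arena mechanics come up every week here, this toy sketch shows the ELO-style update that turns pairwise "which answer is better" votes into a ranking. LMSys' real leaderboard uses a Bradley-Terry style fit with confidence intervals, so treat this purely as intuition:

```python
# Toy sketch of ELO-style ratings from pairwise arena votes, to illustrate the
# mechanism described above. The actual LMSys leaderboard uses a Bradley-Terry
# style fit; model names and votes below are made up.
from collections import defaultdict

K = 32  # update step size

def expected(r_a: float, r_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(ratings, model_a, model_b, winner):
    ea = expected(ratings[model_a], ratings[model_b])
    score_a = 1.0 if winner == model_a else 0.0
    ratings[model_a] += K * (score_a - ea)
    ratings[model_b] += K * ((1.0 - score_a) - (1.0 - ea))

ratings = defaultdict(lambda: 1000.0)
votes = [("claude-3-opus", "gpt-4-0613", "claude-3-opus"),
         ("starling-7b-beta", "gpt-3.5-turbo", "starling-7b-beta")]
for a, b, winner in votes:
    update(ratings, a, b, winner)
print(dict(ratings))
```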
Open Source LLMs (Welcome to MoE's)Mosaic/Databricks gave us DBRX 132B MoE - trained on 12T tokens (X, Blog, HF)Absolutely crushing the previous records, Mosaic has released the top open access model (one you can download and run and finetune) in a while, beating LLama 70B, Grok-1 (314B) and pretty much every other non closed source model in the world not only on metrics and evals, but also on inference speedIt uses a Mixture of Experts (MoE) architecture with 16 experts that each activate for different tokens. this allows it to have 36 billion actively parameters compared to 13 billion for Mixtral. DBRX has strong capabilities in math, code, and natural language understanding. The real kicker is the size, It was pre-trained on 12 trillion tokens of text and code with a maximum context length of 32,000 tokens, which is just incredible, considering that LLama 2 was just 2T tokens. And the funny thing is, they call this DBRX-medium ๐Ÿ‘€ Wonder what large is all about.Graph credit Awni Hannun from MLX (Source)You can play with the DBRX here and you'll see that it is SUPER fast, not sure what Databricks magic they did there, or how much money they spent (ballpark of ~$10M) but it's truly an awesome model to see in the open access! ๐Ÿ‘ AI21 releases JAMBA - a hybrid Transformer + Mamba 58B MoE (Blog, HF)Oh don't I love #BreakingNews on the show! Just a few moments before ThursdAI, AI21 dropped this bombshell of a model, which is not quite the best around (see above) but has a few very interesting things going for it. First, it's a hybrid architecture model, capturing the best of Transformers and Mamba architectures, and achieving incredible performance on the larger context window size (Transformers hardware requirements scale quadratically with attention/context window)AI21 are the first to show (and take the bet) that hybrid architecture models actually scale well, and are performant (this model comes close to Mixtral MoE on many benchmarks) while also being significantly cost advantageous and faster on inference on longer context window. In fact they claim that Jamba is the only model in its size class that fits up to 140K context on a single GPU! ย  This is a massive effort and a very well received one, not only because this model is Apache 2.0 license (thank you AI21 ๐Ÿ‘) but also because this is now the longest context window model in the open weights (up to 256K) and we've yet to see the incredible amount of finetuning/optimizations that the open source community can do once they set their mind to it! (see Wing from Axolotl, add support for finetuning Jamba the same day it released) Can't wait to see the benchmarks for this model once it's properly instruction fine-tuned. 
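For readers new to the MoE jargon in this section ("16 experts", "36 billion active parameters"), here is a tiny, purely illustrative top-k routing layer; real DBRX/Mixtral routers are more involved, but the active-vs-total parameter distinction falls out of exactly this pattern:

```python
# Purely illustrative top-k MoE routing layer, to make the "active vs total
# parameters" distinction above concrete. Real DBRX/Mixtral routers differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=16, top_k=4):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(n_experts)]
        )
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        weights = F.softmax(self.router(x), dim=-1)          # (tokens, n_experts)
        top_w, top_idx = weights.topk(self.top_k, dim=-1)     # keep only k experts per token
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        # Only the selected experts run for each token: these are the "active" parameters.
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = top_idx[:, slot] == e
                if mask.any():
                    w = top_w[mask, slot].unsqueeze(-1)
                    out[mask] += w * self.experts[e](x[mask])
        return out

print(TinyMoE()(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```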
Small MoE from Alibaba - Qwen 1.5 - MoE - A2.7B (Blog, HF)

What a week for Mixture of Experts models! We got an additional MoE from the awesome Qwen team, where they show that training an A2.7B (the full model is actually 14B, but only 2.7B parameters are activated at a time) is cheaper: a 75% reduction in training costs and a 174% improvement in inference speed!

Also in open source: LISA beats LoRA for the best parameter-efficient training
📰 LISA is a new method for memory-efficient large language model fine-tuning presented in a Hugging Face paper
💪 LISA achieves better performance than LoRA with less time on models up to 70B parameters
🧠 Deep networks are better suited to LISA, providing more memory savings than shallow networks
💾 Gradient checkpointing greatly benefits LISA by only storing gradients for unfrozen layers
📈 LISA can fine-tune models with up to 7B parameters on a single 24GB GPU
🚀 Code implementation in LMFlow is very simple, only requiring 2 lines of code
🤔 LISA outperforms full parameter training in instruction following tasks

Big CO LLMs + APIs

Emad departs from Stability AI

In a very surprising (perhaps unsurprising to some) move, Emad Mostaque, founder and ex-CEO of Stability, announced his departure and a new focus on decentralized AI. For me personally (and I know countless others), we all started our love for open source AI with Stable Diffusion 1.4: downloading the weights, understanding that we can create AI on our own machines, playing around with it. It wasn't easy, Stability was sued to oblivion, and I think LAION is still down from a lawsuit, but we got tons of incredible open source from Stability, and tons of incredible people who work/worked there. Big shoutout to Emad, and very excited to see what he does next.

Throwback to NeurIPS, where Emad borrowed my GPU Poor hat and wore it ironically 😂 He promised me a Stability hat but... I won't hold it against him 🙂

This weeks Buzz (What I learned with WandB this week)

I'm so stoked about the workshop we're running before the annual Fully Connected conference in SF! Come hear about evaluations, better prompting with Claude, and tons of insights that we have to share in our workshop, and of course, join the main event on April 18 with the whole Weights & Biases crew!

Vision

Sora was given to artists, and they created... art

Here's a short by a company called ShyKids, who got access to SORA alongside other artists. It's so incredibly human, and I love the way they used storytelling to overcome technological issues like lack of consistency between shots. Watch it and enjoy imagining a world where you could create something like this without leaving your living room. This also shows that human creativity and art are still deep in the middle of all these creations, even with tools like SORA.

MindEye 2.0 - faster fMRI-to-image

We had the awesome pleasure of hosting Tanishq Abraham and Paul Scotti, who recently released a significantly better version of their fMRI-to-image model called MindEye 2.0, shortening the data it needs from 40 hours of fMRI recordings to just 1 hour. This is quite remarkable, and I would encourage you to listen to the full interview that's coming out this Sunday on ThursdAI.

Voice

Hume announces EVI - their empathic text to speech model (Announcement, Demo)

This one is big, folks. I really was blown away (see my blind reaction below): Hume announced EVI, a text to speech generator that can reply with emotions! It's really something, and it has to be seen to be experienced.
This is in addition to Hume already having an understanding of emotions via voice/imagery, and the whole end-to-end conversation with an LLM that understands what I feel is quite novel and exciting!

The Fine-Tuning Disillusionment on X

Quite a few folks noticed a sort of disillusionment with finetuning coming from some prominent pro open source, pro fine-tuning accounts, leading me to post this: And we of course had to have a conversation about it, and Hamel Husain wrote this response blog called "Is Finetuning still valuable". I'll let you listen to the conversation, but I will say, like w
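As promised in the LISA section above, here is a toy sketch of the core idea: keep the model frozen and randomly unfreeze just a couple of transformer blocks each optimization step, so gradients (and optimizer state) only exist for a small slice of the network at any time. This is only an illustration of the concept on a small stand-in model, not the official LMFlow implementation.

```python
# Toy LISA-style loop: freeze everything, then unfreeze k randomly sampled
# transformer blocks per step. Gradients are only stored for those blocks.
import random
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in model; LISA targets much larger LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
layers = model.transformer.h                      # the stack of transformer blocks
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def resample_active_layers(k: int = 2):
    """Freeze everything, then unfreeze k randomly sampled transformer blocks."""
    for p in model.parameters():
        p.requires_grad = False
    for block in random.sample(list(layers), k):
        for p in block.parameters():
            p.requires_grad = True

batch = tok("ThursdAI is a weekly AI news podcast.", return_tensors="pt")
for step in range(3):
    resample_active_layers(k=2)                   # new random subset each step
    out = model(**batch, labels=batch["input_ids"])
    out.loss.backward()                           # gradients only for the unfrozen blocks
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: loss {out.loss.item():.3f}")
```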
March madness... I know for some folks this means basketball or something, but since this is an AI newsletter, and this March was indeed mad, I am claiming it. This week seemed madder from one day to another. And the ai announcements kept coming throughout the recording, I used the "breaking news" button a few times during this week's show! This week we covered tons of corporate AI drama in the BigCO segment, from Inflection โ†’ Microsoft move, to Apple Gemini rumors, to Nvidia GTC conference, but we also had a bunch of OpenSource to go over, including an exciting glimpse into the O1 from Open Interpreter, which the founder Killian (of the ThursdAI mafia haha) joined to chat about briefly after an all nighter release push! Another returning FOTP (friend of the pod) Matt Shumer joined as we did a little deep dive into prompting Claude, and how he went viral (seems to happen a lot to Matt) with a project of his to make Claude write prompts for itself! Definitely worth a listen, it's the first segment post the TL'DR on the pod ๐Ÿ‘‚ this week.Btw, did you already check out fully connected? It's the annual Weights & Biases conference in SF next month, and tickets are flying, I'm going to be there and actually do a workshop one day prior, would love to invite you to join as well!TL;DR of all topics covered: * Open Source LLMs* Xai open sources Grok (X, Blog, HF, Github) * Sakana AI releases a new paper + 2 JP merged SOTA models (X, Paper, Blogpost)* Open Interpreter announces O1 - the Linux for AI devices (X, Project)* LM studio new modes (X)* Big CO LLMs + APIs* Nvidia GTC conference - Blackwell platform, NIMs and Gr00t robotics* Jensen interviewed transformers authors * Apple rumored to look at a deal including GEMINI* Apple releases a multi modal MM1 paper (X)* Inflection founders leave to head Microsoft AI* Google opens up Gemini 1.5 with 1M context access to all (X)* Vision & Video* NVIDIA + MIT release VILA (13B, 7B and 2.7B) (X, HuggingFace, Paper)* This week's BUZZ* Fully Connected is coming, sign up here, get tickets, join us. * I'm running a workshop in SF a day before on improving your LLM step by step including exciting announcements (same link)* Voice & Audio* Suno V3 launched officially (X, Blog, Play with it)* Distil-whisper-v3 - more accurate, and 6x version of whisper large (X, Code)* AI Art & Diffusion & 3D* Stability presents SD3 TURBO - 4 steps to get same high quality generation (Paper)* Stability open sources Stable Video 3D (Blog, Models)* Tools & Others* Neuralink interview with the first Human NeuroNaut - Nolan (X)* Lex & Sama released a podcast, barely any news* Matt Shumer releases his Claude Prompt engineer (X, Metaprompt, Matt's Collab)Open Source LLMs Xai open sources Grok (X, Blog, HF, Github) Well, Space Uncle Elon has a huge week, from sending starship into orbit successfully to open sourcing an LLM for us, and a huge one at that. Grok is a 314B parameter behemoth, with a mixture of experts architecture of 80B per expert and two active at the same time. It's released as a base model, and maybe that's why it was received with initial excitement but then, nobody in the GPU poor compute category has the ability to run/finetune it! In terms of performance, it barely beats out Mixtral, while being almost 10x larger, which just shows that.... data is important, maybe more important than Github stars as Arthur (CEO Mistral) helpfully pointed out to Igor (founder of Xai). 
Still big props to the team for training and releasing this model under an Apache 2 license.

Sakana AI launches 2 new models using evolutionary algo merging

Yeah, that's a mouthful. I've been following Hardmaru (David Ha) for a while before he joined Sakana, and only when the founder (and a co-author on transformers) Llion Jones talked about it on stage at GTC did the dots connect. Sakana means fish in Japanese, and the idea behind this lab is to create things using nature-inspired methods like evolutionary algorithms. The first thing they open sourced was 2 new SOTA Japanese models, beating significantly larger models, by using merging (which we covered with Maxime previously, whom Sakana actually shouted out in their work).

Open Interpreter announces 01 Light - the Linux of AI hardware devices

Breaking news indeed. After we saw the release of R1 go viral in January, Killian (with whom we chatted previously in our most favorited episode of last year) posted that if someone wants to build the open source version of R1, it'll be super cool and fit with the vision of Open Interpreter very well. And then MANY people did (more than 200), and the O1 project got started. Fast forward a few months, we now have a first glimpse of (and the ability to actually pre-order) the O1 Light, their first device: a button that communicates with your computer (and in the future, with their cloud) and interacts with a local agent that runs code and can learn how to do things with a skill library. It's all very very exciting, and to see how this idea went from an announcement on X to hundreds of folks collaborating and pushing this out in the open has been incredible. We'll definitely do a deeper dive into capabilities and the whole project once the launch craziness dies down a bit (Killian joined us at the peak of the launch all-nighter haha). This is poised to be the first open source AI device, complete with .stl files for 3d printing at home, chip designs, and the ability to run end to end locally on your Mac, and we really applaud the team for this release 🫡

Big CO LLMs + APIs

Nvidia GTC annual conference - New Blackwell platform, NIMs, Robotics and everything AI + a chat with the transformer avengers

This week Nvidia had their annual GTC conference, where Jensen announced a ton of stuff, but the highlights were the new Blackwell chip (the next iteration of the H100) and the GB200 racks with a whopping 720 PFlops of compute (to put this number in perspective: the first DGX that Jensen delivered to OpenAI in 2016 was 0.17 Petaflops). They also announced partnerships with pretty much everyone under the sun, a new way to deliver packaged AI experiences called NIMs (which we at Weights & Biases support as well), and a new foundational operating system for robotics called GR00T led by Dr Jim Fan. Jensen also had the whole original transformers author cast together on stage (and in the green room) for an hour, for the first time, to chat about, well... transformers. I really need to find the whole video and post it because it's hidden inside the Nvidia GTC website, but it was a very fun chat, where the team reminisced about the naming and their thoughts on the future of LLMs. They also covered each individual company (all of them have left Google since then) and what they all do. It was a great chat.
Microsoft buys Inflection (almost) and Apple considers buying Gemini

In other huge AI player news, 2 of the 3 founders of Inflection AI left to start Microsoft AI (together with some of the staff), namely Mustafa, who founded Inflection, then helped raise 1.8B dollars, get up to 22K H100 GPUs, release Inflection 2.5 that comes close to GPT4, and then decided to leave. Inflection also pivoted away from consumer (Pi was a very nice AI to chat with) into API services, and apparently Microsoft will pay Inflection around $650 million in the form of a licensing deal. Meanwhile there are rumors that Apple is eyeing Gemini to integrate into iOS, which is very weird given the recent bad press about Gemini (unless Apple doesn't want to deal with the same bad press?), and it's even weirder given the latest push from Apple into Open Source. Folks at Apple this week released a new paper called MM1, outlining a new multi modal model they have trained (but not released), and show that it beats Gemini on visual understanding. It was also great to see that the authors of that model shouted out the Weights & Biases crew that helped them through their work on this paper 👏

Nolan - the first NeuroNaut (first human with a Neuralink implanted)

Just as I was summing up the notes for this week, Neuralink pinged that they were going to go live soon, and I tuned in to see a 20yo quadriplegic gamer getting interviewed by a Neuralink employee, being very cheerful, while also playing a chess game, all with his brain. We've come a really long way since the monkey playing Pong, and Nolan described the experience of using Neuralink to control his Mac cursor as "like using The Force". It was all kind of mind-blowing, and even though brain implants are nothing new, the fidelity and the wireless connection + the very quick surgery made this demo such a nonchalant thing that Nolan didn't even stop playing chess while being interviewed, probably not realizing that millions of people would be watching. They have a bunch of ML models interpreting the signals that Nolan sends from his brain wirelessly, and while this is very exciting, and Nolan prepares for this Halloween as Professor X from X-Men (because, well, he's in fact a telekinesis enabled human), Elon claimed that their next target is fixing blindness with a product they call Blindsight (and that it already works on monkeys), presumably via camera input being triggered in the visual cortex. Back in November 2022, I watched the Neuralink keynote and geeked out so hard about the section where Dan Adams, one of the neuroscientists at Neuralink, talked about how it's possible to trigger / stimulate the visual cortex to fix blindness and then generate an image. Well, this is it folks. We talked about tons of other stuff of course, but these are the main points that made the cut into the newsletter. As always, if you want to support this newsletter/podcast, please share it with friends ❤️ Hope to see you in SF in April (I'll be giving more reminders, don't worry) and see you here next ThursdAI 🫡 P.S - I said Intel a bunch of times when I meant Nvidia, apologies, didn't notice until post publishing 😅
"...Happy birthday dear ThursdAIiiiiiiii, happy birthday to youuuuuu ๐ŸŽ‚"What a day! Today is ฯ€-day (March 14th), 2024. For some reason it's important, not only because it's GPT-4 anniversary, or Claude 1 anniversary, or even that Starship flew to space, but also ๐Ÿฅ it's ThursdAI BirthdAI ๐ŸŽ‰ Yeah, you heard that right, last year following GPT-4 release, I hopped into a twitter space with a few friends, and started chatting about AI, and while some friends came and went, I never stopped, in fact, I decided to leave my 15 year career in software, and focus on AI, learning publicly, sharing my learnings with as many people as possible and it's been glorious. And so today, I get to celebrate a little ๐Ÿ’ƒI also get to reminisce about the state of AI that we were at, back exactly a year ago. Context windows were tiny, GPT-4 came out with 8K (we casually now have models with 200K that cost $0.25/1M tokens), GPT-4 also showed unprecedented levels vision capabilities back then, and now, we have 1.3B parameters models that have similar level of visual understanding, open source was nascent (in fact, LLama.cpp only had it's first commit 4 days prior to GPT4 launch, Stanford released the first Alpaca finetune of Llama just a day prior. Hell even the chatGPT API only came out a few days before, so there was barely any products built with AI out there. Not to mention that folks were only starting to figure out what vector DBs were, what RAG is, how to prompt, and that it's possible to run these things in a loop and create agents! Other fields evolved as well, just hit play on this song I generated for ThursdAI with Suno V3 alpha, I canโ€™t stop listening to it and imagining that this was NOT possible even a few months agoIt's all so crazy and happening so fast, that annual moments like these propose a great opportunity to pause the acceleration for a sec. and contextualize it, and bask in the techno-optimism glory of aren't we lucky to live in these times? I sure am, and for me it's the ThursdAI birthday gift to be able to share my excitement with all of you! Thank you for being a subscriber, the best way you can support ThursdAI is to share this with a friend and tag us on socials ๐ŸซกTL;DR of all topics covered: * Open Source LLMs * Together releases Sequoia speculative decoding (X, Blog)* Hermes Pro from NousResearch - Tool use and function calling (X, HF, Github)* Big CO LLMs + APIs* Anthropic releases Claude 3 Haiku (Announcement, Blog)* Cohere CMD+R (Announcement, HF)* This weeks Buzz* Early bird tickets for Fully Connected in SF are flying, come meet the Weights & Biases team. We're also going to be running a workshop a day before, come join us! (X)* Vision & Video* Deepseek VLM 1.3B and 7B (X,Announcement, HF)* Voice & Audio* Made a song with Suno v3 Alpha for ThursdAI, it's a banger (Song)* Hardware & Robotics (New)* OpenAI now powers Figure - the humanoid robot company (X)* Cerebras announces the fastest AI chip on earth (X)* Extropic made an announcement about their TPU - Thermodynamic Processing Unit* Tools & Agents* Devin from Cognition Labs (Announcement, 47 minute demo)Agents for your house and your Github tasksSay hello to Devin from Cognition Labs (Announcement, Real world demo)By far the most excited I've seen my X feed be this week, was excitement about Cognition Labs new agent called Devin, which they call the first AI software engineer. 
You should really watch the video, and then watch a few other videos, because, well, only a few folks are getting access, and yours truly is not one of them. It seems like a very well-publicized launch, backed by tons of VC folks, and everybody kept highlighting the innovative UI that Devin has: a very polished UX/UI/Dev experience with access to a browser (where you can authenticate and it can pick up doing tasks), a terminal (where you can scroll back and forth in time to see what it did when), but also a chat window and a planning window + an IDE where it writes code and you can scrub through that as well. Folks were also going crazy about the founder's (and team's) math ability and IOI gold medals; this video went viral featuring Scott, the founder of Cognition, in his youth obliterating this competition… poor Victoria 😅

Regardless of their incredible math abilities, Devin is actually pretty solid, specifically on the UI side, and again, like with the AutoGPT hype of yesteryear, we see the same issues: it's nice, but Cognition's hiring page is still looking for human software engineers. Tune into the last 30 minutes of the pod today as we had tons of folks discuss the implications of an AI "software engineer" and whether or not coding skills are still required/desired. Short answer is: yes, don't skip it, learn coding. Devin is going to be there to assist but likely will not replace you.

🤖 OpenAI + Figure give GPT-4 hands (or give Figure eyes/ears/mouth)

Ok, this demo you must just see before reading the rest of it. OpenAI recently announced a partnership with Figure, a humanoid robotics company, and just this week they released a demo of this integration. Using GPT4-Vision and text to speech capabilities (with a new, somewhat raspy voice and human like intonations), the bot listens to the human giving it instructions, sees the world in front of it, and is able to perform tasks that the human has asked it to do via voice. This feels like a significant jump in capabilities for these bots, and while it was a given that the two technologies (actuator based robotics and LLMs) would meet soon, this shows the first I, Robot like moment. It'll still be a while until you can have this one do your dishes or fold your laundry, but it does feel like an eventuality at this point, whereas before, it just felt like sci-fi. Kudos on this integration, and can't wait until Optimus from Tesla adds Grok brains and makes you laugh nervously at its cringe jokes 😅

This weeks Buzz

We're coming to SF in April! Our annual Fully Connected conference will feature keynote speakers from foundational AI companies, industry, our founders and tons of Weights & Biases users. We'll also be running a workshop (I'm one of the workshop folks) a day before, so keep an eye on that, it'll likely be included in your ticket (which is still 50% off for early bird).

Open Source LLMs

Nous Research gives us Tool Use with Hermes 2 Pro (Announcement)

Getting JSON structured output and giving models the ability to respond with not only text, but specific instructions for which functions to run (aka tool use), is paramount for developers. OpenAI first released this back in June, and since then I've been waiting for Open Source to catch up. And catch up they did, with Nous releasing their first attempt at continued training of the renowned Hermes 7B Mistral based model, with tool use and structured output!
If you're building agents, or any type of RAG system with additional tools, you will definitely be very happy as well, give Hermes Pro a try! This one is not a simple download and run, you have to do some coding, and luckily the folks at Nous provided us with plenty of examples in their Github.

Deepseek gives us a new Vision model - Deepseek VL 1.3B & 7B (Announcement)

Absolutely punching above its weight, this very high quality vision model from the Deepseek folks is just a sign of what's coming: smaller models performing incredibly well on several tasks. While the top is getting crowded with Claude, GPT4-V and Gemini, which are generic, on specific tasks we're getting tiny models that can offload fully into memory, run hella fast, and perform very well on narrow tasks, even in the browser.

Big CO LLMs + APIs

Anthropic gives us the smallest/fastest/cheapest Claude 3 - Haiku

After releasing Opus and Sonnet earlier, Anthropic has reclaimed their throne as the leading AI lab we always knew them to be. Many friends of the pod prefer Opus for many things now, and I keep seeing this sentiment online; folks are even considering cancelling chatGPT for the first time since... well, ever? Meanwhile Sonnet, their middle model, is taking an interesting place near the top of the LMsys arena human rated rankings, beating all GPT-4 versions besides the Turbo ones. And now Anthropic has given us Haiku, the smallest of the three Claudes, the fastest, and the cheapest by far. With a 200K context window and vision capabilities, this model crushes GPT3.5 on many benchmarks and becomes the de-facto cheapest model to run. It only costs $0.25/1M tokens, which is half the price of GPT3.5, but just look at the performance. One thing to note: Anthropic still doesn't support function calling/tool use.

Cohere releases a new model for retrieval and enterprise purposes - CMD+R

Cohere gets a second wind with a great release + open weights approach, and releases Command+R (pronounced Commander), a model focused on enterprise uses, scalability and tool use. It supports 10 languages, 128K context, and beats GPT3.5 and Gemini 1.0 on several tasks, namely on KILT - Knowledge Intensive Language Tasks. The tool use capabilities and the ability to ground information in retrieved context make this specifically a great model to use for RAG purposes. The model is 34B and is available non commercially on the hub.

Together makes inference go BRRR with Sequoia, a new speculative decoding method

Together's Sequoia shows a way to speed up Llama2-70B and be able to run it on a single consumer GPU with an 8x speed up. Being able to run AI locally can mean a few things. It can mean making smaller models better, and we've seen this again and again for the past year. Another way is... speculative decoding: lowering the inference TBT (time between tokens) by enhancing decoding algorithms, using tiny draft models, and methods like offloading. The large model essentially remains the same, while a smaller (draft) model helps guide the inference and make it seem much faster. These methods compound, and while Sequoia from Together is new, it shows great promise.
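To make the speculative decoding idea concrete, here is a toy, greedy-only sketch: a small draft model proposes a few tokens cheaply, the big target model checks them all in a single forward pass, and we keep the longest agreeing prefix. This is just the basic concept, not Together's Sequoia algorithm (which adds smarter token trees and offloading), and the GPT-2 checkpoints are stand-ins for a real draft/target pair.

```python
# Toy greedy speculative decoding: draft proposes k tokens, target verifies them
# in one forward pass, and we accept the longest prefix the target agrees with.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")                   # shared tokenizer for both models
draft = AutoModelForCausalLM.from_pretrained("distilgpt2")    # small, fast draft model
target = AutoModelForCausalLM.from_pretrained("gpt2-large")   # large, slow target model

def speculative_step(input_ids: torch.Tensor, k: int = 4) -> torch.Tensor:
    # 1) Draft model proposes k tokens greedily, one at a time (cheap).
    proposal = input_ids
    for _ in range(k):
        logits = draft(proposal).logits[:, -1, :]
        proposal = torch.cat([proposal, logits.argmax(-1, keepdim=True)], dim=-1)

    # 2) Target model scores the whole proposal in ONE forward pass.
    target_logits = target(proposal).logits

    # 3) Accept drafted tokens while the target would have picked the same ones.
    accepted = input_ids
    for i in range(input_ids.shape[1], proposal.shape[1]):
        target_choice = target_logits[:, i - 1, :].argmax(-1, keepdim=True)
        accepted = torch.cat([accepted, target_choice], dim=-1)
        if not torch.equal(target_choice, proposal[:, i:i + 1]):
            break  # disagreement: keep the target's token and stop accepting drafts
    # (a full implementation would also grab one bonus token when everything matches)
    return accepted

ids = tok("Speculative decoding works by", return_tensors="pt").input_ids
for _ in range(5):
    ids = speculative_step(ids)
print(tok.decode(ids[0]))
```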
Hello hello everyone, happy spring! Can you believe it? It's already spring! We have tons of AI news for you to cover, starting with the most impactful one: did you already use Claude 3? Anthropic decided to celebrate Claude 1's birthday early (which btw is also ThursdAI's birthday and the GPT4 release date, March 14th, 2023) and gave us 3 new Claudes! Opus, Sonnet and Haiku.

TL;DR of all topics covered:
* Big CO LLMs + APIs
* 🔥 Anthropic releases Claude Opus, Sonnet, Haiku (Announcement, try it)
* Inflection updates Pi 2.5 - claims GPT4/Gemini equivalent with 40% less compute (announcement)
* Elon sues OpenAI (link)
* OpenAI responds (link)
* ex-Google employee was charged with trading AI secrets with China (article)
* Open Source LLMs
* 01AI open sources - Yi 9B (Announcement)
* AnswerAI - Jeremy Howard, Johno & Tim Dettmers - train 70B at home with FSDP/QLoRA (X, Blog)
* GaLORE - Training 7B on a single consumer-grade GPU (24GB) (X)
* Nous open sources Genstruct 7B - instruction-generation model (Hugging Face)
* Yam's GEMMA-7B Hebrew (X)
* This weeks Buzz
* Weights & Biases is coming to SF in April! Our annual conference called Fully Connected is open for registration (Get your tickets and see us in SF)
* Vision & Video
* Vik releases Moondream 2 (Link)
* Voice & Audio
* Suno v3 alpha is blowing minds (Link)
* AI Art & Diffusion & 3D
* SD3 research paper is here (Link)
* Tripo + Stability release TripoSR - FAST image-2-3D (link, Demo, FAST demo)
* Story how I created a competition of inference providers to get us sub 1.5s playground image gen (X)

Big CO LLMs + APIs

Anthropic releases Claude 3 Opus, Sonnet and Haiku

This was by far the biggest news of this week, specifically because the top keeps getting saturated with top of the line models! Claude Opus is actually preferred by many folks in blind studies over some GPT-4 versions, and as we were recording the pod, LMSys released their rankings: Claude Opus beats Gemini and is now 3rd in user preference on the LMSys rank. Their release is vast; they announced 3 new models but only gave us access to 2 of them, teasing that Haiku is much faster / cheaper than other options in that weight class out there. In addition to being head to head with GPT-4, Claude 3 is now finally also multimodal on inputs, meaning it can take images, and understand graphs and charts. They also promised significantly fewer refusals and improved accuracy by almost 2x. One incredible thing that Claude always had was a 200K context window, and here they announced that they will be supporting up to 1M, but for now we still only get 200K. We were also promised support for function calling and structured output, but apparently that's "coming soon"; still great to see that they are aiming for it! We were all really impressed with Claude Opus, from folks on stage who mentioned that it's easier to talk to and feels less sterile than GPT-4, to coding abilities that are not "lazy" and don't tell you to continue writing the rest of the code yourself in comments, to even folks who are jailbreaking the guardrails and getting Claude to speak about the "I" and metacognition.
Speaking of meta-cognition sparks, one of the prompt engineers on the team shared a funny story about doing a needle-in-haystack analysis, where Claude Opus responded with "I suspect this pizza topping 'fact' may have been inserted as a joke or to test if I was paying attention". This split the X AI folks in 2: many claiming "OMG it's self aware", and many others calling for folks to relax, since like other models, this is still just spitting out token by token. I additionally like the openness with which the Anthropic folks shared the (very simple but carefully crafted) system prompt.

My personal take: I've always liked Claude, even v2 was great until they nixed the long context for the free tier. This is a very strong, viable alternative to GPT4 if you don't need DALL-E or code interpreter features, or the GPTs store or the voice features on iOS. If you're using the API to build, you can self register at https://console.anthropic.com and you'll get an API key immediately (see the minimal call sketch at the end of this section), but going to production will still take time and talking to their sales folks.

Open Source LLMs

01 AI open sources Yi 9B

Announcement claims that "It stands out as the top-performing similar-sized language model friendly to developers, excelling in code and math." but it's a much bigger model, trained on 3T tokens. I find it confusing to create a category of models between 7B and almost 12B.

This weeks Buzz (What I learned with WandB this week)

We're coming to SF! Come join Weights & Biases at our annual conference in the heart of San Francisco, get to hear from industry leaders about how to build models in production, and meet most of the team! (I'll be there as well!)

AI Art & Diffusion

Last week, just last week, we covered the open sourcing of the awesome Playground 2.5 model, which looked really good in user testing. I really wanted to incorporate it into my little demo, but couldn't run it locally, so I asked a few friends, and I gotta say, I love how competitive but open the inference providers can get! Between Modal, Fal and Fireworks, I somehow started a performance competition that got these folks to serve the Playground 2.5 model at sub 1.5 seconds per generation. Recorded the story to highlight the awesome folks who worked on this, they deserve the shoutout! You can try super fast Playground generation on FAL and Fireworks.

Stability releases Stable Diffusion 3 research paper + Model coming soon

Stability released the research paper for SD3, the latest iteration of their flagship image model. While this field is getting a little saturated (we now have DALL-E, MidJourney, Adobe Firefly, Playground, SDXL, Stable Cascade and Ideogram), SD is definitely aiming for the title. They released a few metrics claiming that on user preference, Visual Aesthetics, Typography and Prompt following, SD3 beats all of the above. They also mentioned the architecture, which is MM-DiT - a multi modal diffusion transformer architecture (DiTs were used for SORA from OpenAI as well) and that they used 50% synthetic captions with CogVLM, which is quite impressive. Emad has mentioned that access to SD3 will start rolling out soon!

TripoSR (Demo)

We previously covered LUMA models to generate text to 3D, and now we have image to 3D that's open sourced by the folks at Tripo and Stability AI. TripoSR is able to generate 3D shapes from images super super fast, and here's a very nice flow that @blizaine demonstrated of how to use these models to actually bring 3D objects into their environment using a few steps.
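One more practical note on Claude 3 from earlier in this section: once you have an API key from the Anthropic console, a first call looks roughly like the minimal sketch below, using the official anthropic Python SDK (pip install anthropic). The model name and prompt are just examples.

```python
# Minimal sketch of a Claude 3 Messages API call with the official `anthropic` SDK.
# The ANTHROPIC_API_KEY environment variable is picked up automatically.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-opus-20240229",                 # the Opus release at the time of writing
    max_tokens=512,
    system="You are a concise assistant.",          # optional system prompt
    messages=[{"role": "user", "content": "Summarize this week's AI news in one sentence."}],
)
print(message.content[0].text)
```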
And that's it for today folks, we of course chatted about a LOT more stuff, I really welcome you to listen to the episode and skip around in the chapters, and see you next week, as we celebrate ThursdAI's birthday (and GPT4 and Claude1) ๐ŸŽ‰ P.S - as I always do, after writing and editing all by hand (promise) I decided to use Opus to be my editor and tell me how was my writing, what did I forget to mention (it has the context form the whole transcription!) and suggest fixes. For some reason I asked Opus for a message to you, the reader. Here it is, take it as you will ๐Ÿ‘ Full Transcript for the deep divers: [00:00:00] Alex Volkov: Right, folks. So I think recording has started. And then let's do our usual. Welcome. Welcome, everyone. Those who know the sound from week to week. This is Alex Volkov. You're listening to ThursdAI, March 7th. I'm an AI evangelist with Weights Biases, who you can see here on stage as well. So, you know, you see the little square thing, give it a follow. Follow us on socials as well. And, uh, today is obviously Thursday.[00:00:45] Alex Volkov: Uh, Thursday was a lot of stuff to talk about. Um, so, let's talk about it. Uh, I think, I think, um, our week is strange, right? Our week starts at the Friday. Almost, not even Friday. The updates that I need to deliver to you start at the end of the previous ThursdAI. So as, as something happens, uh, and I, I have a knowledge cutoff, actually, at some point we considered calling this podcast knowledge cutoff.[00:01:14] Alex Volkov: Um, I have a knowledge cutoff after Thursday afternoon, let's say when I start and send the newsletter, but then AI stuff keeps happening. And, uh, Then we need to start taking notes and taking stock of everything that happened and I think on Friday We had the the lawsuit from Elon and there's a whole bunch of stuff to talk about and then obviously on Monday We had some big news.[00:01:37] Alex Volkov: So As always I'm gonna just run through all the updates. There's not a lot today There's not a ton of updates this week, but definitely there's a few interesting things. Let me un save as well And then I'll just say hi to a few, a few of the folks that I got on stage here to chat. Um, we got Vic, and Vic is going to give us an update about, about something interesting. Uh, Vic, feel free to just unmute and introduce yourself briefly. And then we're going to go through the updates.[00:02:07] Vik: Hey, my name is Vivek, uh, I've been training ML models for the last two years or so. Um, recently released a new model called OneDream2. It's a very small vision language model that excels at a lot of real world use cases that you could use to build computer vision applications today, so I'm very excited to chat about that.[00:02:30] Alex Volkov: Awesome. And, uh, we have Akshay as well. Akshay, it's been a while since you joined us. What's up, man? How are you?[00:02:36] Vik: Greetings of the day everyone, and it's lovely to join again. Uh, I have been listening, I have been here in the audience. Uh, for each and every ThursdAI, and, uh, I've been building some exciting stuff, so I've not been joining much, but, uh, things are going great.[00:02:54] Alex Volkov: Awesome. And, uh, for the first time, I think, or second time we're talking with Siv. Hey, Siv.[00:03:01] Far El: Hey, how's it going, everyone? Uh, just a little background on me. Um
Happy leap year day everyone, very excited to bring you a special once-in-a-4 year edition of ThursdAI ๐Ÿ‘ (Today is also Dune 2 day (am going to see the movie right after I write these here words) and well.. to some folks, this is the bull market โ‚ฟ days as well. So congrats to all who weathered the bear market!)This week we had another great show, with many updates, and a deep dive, and again, I was able to cover most of the news AND bring you a little bit of a deep dive into a very interesting concept called Matryoshka Representation Learning (aka ๐Ÿช† embeddings) and two of the authors on paper to chat with me on the pod! TL;DR of all topics covered: * AI Art & Diffusion & 3D* Playground releases a new diffusion foundational model Playground V2.5 (DEMO)* Alibaba teasing EMO - incredible animating faces (example)* Ideogram 1.0 announced - SOTA text generation (Annoucement)* Open Source LLMs * Gemma update - hard to finetune, not better than 7B mistral* LLama 3 will release in June 2024, not anytime soon* Starcoder 2 + stack V2 (Announcement)* Berkeley Function-Calling leaderboard Leaderboard (Announcement)* Argilla released OpenHermesPreferences the largest open dataset for RLHF & DPO (Announcement)* STORM from Stanford to write long documents (Thread)* Big CO LLMs + APIs* Mistral releases Mistral Large & Le Chat (Announcement, Le Chat)* Microsoft + Mistral strike a deal (Blog)* Google teases GENIE - model makes images into interactive games (announcement)* OpenAI allowing fine-tune on GPT 3.5* Wordpress & Tumbler preparing to sell user data to OpenAI & Midjourney* Other* Mojo releases their MAX inference engine, compatible with PyTorch, Tensorflow & ONNX models (Announcement)* Interview with MRL (Matryoshka Representation Learning) authors (in audio only)AI Art & Diffusion Ideogram 1.0 launches - superb text generation! Ideogram, founded by ex google Imagen folks, which we reported on before, finally announces 1.0, and focuses on superb image generation. It's really great, and I generated a few owls already (don't ask, hooot) and I don't think I will stop. This is superb for meme creation, answering in multimedia, and is fast as well, I'm very pleased! They also announced a round investment from A16Z to go with their 1.0 release, definitely give them a tryPlayground V2.5 Suhail Doshi and Playground release a new foundational image model called Playground v2.5 and it looks awesome, very realistic and honestly looks like it beats MJ and DALL-E on many simple prompts.They also announced that this model received higher user preference scores based on 1K prompts (which we didn't get to see) but they have released this model into the wild, you can download it and play with a free demo provided by modal folksAnother SORA moment? Alibaba teases EMO ๐Ÿคฏ (website)Ok this one has to be talked about, Alibaba released quite a few preview videos + paper about something called EMO, a way to animate a talking/singing Avatars from just 1 image. It broke my brain, and I couldn't stop staring at it. Honestly, it's quite quite something. This model animates not only the mouth, eyes are blinking, there are emotions, hairs move, even earrings, and the most impressive, the whole Larynx muscle structure seem to be animated as well! Just look at this video, and then look at it again. The Github repo was created but no code released and I really hope we get this code at some point, because animating videos with this fidelity + something like SORA can mean so many possible creations! 
I wrote this tweet only two weeks ago, and I'm already feeling that it's outdated and we're farther along that curve with EMO. What a great release! And just because it's so mind-blowing, here are a few more EMO videos for you to enjoy:

Open Source LLMs

Starcoder 2 + The Stack V2

Folks at Hugging Face and BigCode have released a beast on us, StarCoder 2 ⭐️ The most complete open Code-LLM 🤖 StarCoder 2 is the next iteration of StarCoder and comes in 3 sizes, trained on 600+ programming languages and over 4 trillion tokens from The Stack v2. It outperforms StarCoder 1 by a margin and has the best overall performance across 5 benchmarks 🚀🤯

TL;DR:
* 3B, 7B & 15B parameter versions
* 16384 token context window
* Trained on 3-4T tokens (depending on size)
* 600+ programming languages
* 15B model achieves 46% on HumanEval
* Grouped Query Attention and Sliding Window Attention
* Trained on 1024 x H100 NVIDIA GPUs
* Commercial-friendly license
* Can be used for local Copilots

The Stack v2 is a massive (10x) upgrade on the previous Stack dataset, containing 900B+ tokens 😮

Big CO LLMs + APIs

🔥 Mistral announces Mistral-Large + Le Chat + Microsoft partnership

Today, we are releasing Mistral Large, our latest model. Mistral Large is vastly superior to Mistral Medium, handles 32k tokens of context, and is natively fluent in English, French, Spanish, German, and Italian. We have also updated Mistral Small on our API to a model that is significantly better (and faster) than Mixtral 8x7B. Lastly, we are introducing Le Chat, a chat interface (currently in beta) on top of our models.

Two important notes here: one, they now support function calling on all Mistral models in their API, which is a huge deal, and two, the update of Mistral Small to a "significantly better and faster" model than Mixtral 8x7B is quite the hint! I also want to highlight Arthur's tweet clarifying their commitment to Open Source, because it's very important. They released a new website; it again had mentions of "don't train on our models", which they removed, and the new website also had removed the section that committed them to open weights, and they put a much bigger section back up quickly!

This weeks Buzz (What I learned with WandB this week)

I mentioned this before, but this may shock new subscribers: ThursdAI isn't the only (nor the first!) podcast from Weights & Biases. Our CEO Lukas has a long standing podcast that's about to hit 100 episodes, and this week he interviewed the CEO of Mayo Clinic - John Halamka. It's a fascinating interview, specifically because Mayo Clinic just recently announced a multi-year collaboration with Cerebras about bringing AI to everyone who googles their symptoms and ends up on Mayo Clinic websites anyway, and apparently John has been in AI for longer than I've been alive, so he's incredibly well positioned to do this and bring us the AI medicine future!

Modular announces MAX (Modular Accelerated Xecution) Developer Edition Preview (blog)

Modular, Chris Lattner's company that created the Mojo language, has now announced the second part of their stack, coming to all of us, and it's called MAX. It's an inference engine with Mojo built in that supports PyTorch, TensorFlow and ONNX models and is supposedly going to run the same AI models we run now, significantly faster.
MAX is a unified set of tools and libraries that unlock performance, programmability and portability for your AI inference pipelinesRight now they support only CPU inference, and significantly boost performance on CPU, however, they are planning GPU support soon as well, and promise up to 5x faster AI inference for most models like Mistral, LLama etc I personally think this is a huge development, and while it's still early, definitely worth taking a look at the incredible speed performances that we are seeing lately, from Groq (as we chatted with them last week) and Modular, we're are very well on our way to run huge models faster, and small models instantly! ๐Ÿช† MRL (Matryoshka Embeddings) interview with Aditya & Prateek Recently OpenAi has released 2 new embeddings models recently that replaced their ada-002 embeddings, and when they released it, they mentioned a new way of shortening dimensions. Soon after, on X, the authors of a 2022 paper MRL (Matryoshka Representation Learning) spoke out and said that this new "method" is actually MRL, the concept they came up with and presented at NeurIPS. Since then I saw many folks explore Matryoshka embeddings, from Bo Wang to Connor Shorten and I wanted to get in on the action! It's quite exciting to have heard from Aditya and Prateek about MRL, how they are able to significantly reduce embeddings size by packing the most important information into the first dimentions, the implications of this for speed of retrieval, the significant boost in use-cases post the chatGPT LLM boom and more! Definitely give this one a listen if you're interested, the interview starts at 01:19:00 on the pod. Thank you for reading, I really appreciate you coming back here week to week, and if you enjoy this content, please share with 1 friend and give us a โญ rating on Apple Pod? Here's a nice Ideogram image as a preemptive thank you! As always, hereโ€™s the full transcript[00:00:00] Intro and welcome[00:00:00][00:00:00] Alex Volkov: Hey, you're on ThursdAI. This is Alex. Happy Leap Year Special Edition. Today's February 29th. We had a great show today. So great that got carried away during the recap, and it's almost twice as long as it usually is. The recap, not the show. But no worries. As always, if you're short on time, the first 25 minutes or so of this almost two hour podcast will catch you up on everything that happened in AI this week.[00:00:29] Alex Volkov: If you're using Apple Podcasts, or any other modern podcatcher, you can also skip to the chapters, that I'm outlining every week and listen to the part that interests you, and only to that part.[00:00:39] Alex Volkov: This week. After the newsy updates, we also had a deep dive into something called Matryoshka Embeddings, with the authors of the MRL paper, Aditya and Pratik.[00:00:49] Alex Volkov: And thank you guys, and I really enjoyed chatting with them both. And we geeked out on why OpenAI decided to release something they came up with two years ago and how it affects the AI industry post the LLM explosion world. So definitely give them a listen![00:01:05] Alex Volkov: at the end of this episode. A brief TLDR, then a full news conversation you're used
Hey, this is Alex,Ok let's start with the big news, holy crap this week was a breakthrough week for speed! We had both Groq explode in popularity, and ByteDance release an updated SDXL model called Lightning, able to generate full blown SDXL 1024 images in 300ms. I've been excited about seeing what real time LLM/Diffusion can bring, and with both of these news release the same week, I just had to go and test them out together: Additionally, we had Google step into a big open weights role, and give us Gemma, 2 open weights models 2B and 7B (which is closer to 9B per Junyang) and it was great to see google committing to releasing at least some models in the open. We also had breaking news, Emad from Stability announced SD3, which looks really great, Google to pay Reddit 200M for AI training on their data & a few more things. TL;DR of all topics covered: * Big CO LLMs + APIs* Groq custom LPU inference does 400T/s Llama/Mistral generation (X, Demo)* Google image generation is in Hot Waters and was reportedly paused (refuses to generate white people)* Gemini 1.5 long context is very impressive to folks (Matt Shumer, Ethan Mollick)* Open Weights LLMs * Google releases GEMMA, open weights 2B and 7B models (Announcement, Models)* Teknium releases Nous Hermes DPO (Announcement, HF)* Vision & Video* YoLo V9 - SOTA real time object detector is out (Announcement, Code)* This weeks Buzz (What I learned in WandB this week)* Went to SF to cohost an event with A16Z, Nous, Mistral (Thread, My Report)* AI Art & Diffusion & 3D* ByteDance presents SDXL-Lightning (Try here, Model)* Stability announces Stable Diffusion 3 (Announcement)* Tools* Replit releases a new experimental Figma plugin for UI โ†’ Code (Announcement)* Arc browser adds "AI pinch to understand" summarization (Announcement)Big CO LLMs + APIsGroq's new LPU show extreme performance for LLMs - up to 400T/s (example)* Groq created a novel processing unit known as the Tensor Streaming Processor (TSP) which they categorize as a Linear Processor Unit (LPU). Unlike traditional GPUs that are parallel processors with hundreds of cores designed for graphics rendering, LPUs are architected to deliver deterministic performance for AI computations.* Analogy: They know where all the cars are going when everyone wakes up for work (when they compile) and how fast they all drive (compute latency) so they can get rid of traffic lights (routers) and turn lanes (backpressure) by telling everyone when to leave the house.* Why would we need something like this? Some folks are saying that average human reading is only 30T/s, I created an example that uses near instant Groq Mixtral + Lightning SDXL to just create images with Mixtral as my prompt managerOpen Source Weights LLMs Google Gemma - 2B and 7B open weights models (demo)* 4 hours after release, Llama.cpp added support, Ollama and LM Studio added support, Tri dao added Flash attention support* Vocab size is 256K* 8K context window* Tokenizer similar to LLama* Folks are... 
not that impressed as far as I've seen* Trained on 6 trillion tokens* Google also released Gemma.cpp (local CPU inference) - AnnouncementNous/Teknium re-release Nous Hermes with DPO finetune (Announcement)* DPO RLHF is performing better than previous models* Models are GGUF and can be found here* DPO enables Improvements across the boardThis weeks Buzz (What I learned with WandB this week)* Alex was in SF last week* A16Z + 20 something cohosts including Weights & Biases talked about importance of open source* Huge Shoutout Rajko and Marco from A16Z, and tons of open source folks who joined* Nous, Ollama, LLamaIndex, LMSys folks, Replicate, Perplexity, Mistral, Github, as well as Eric Hartford, Jon Durbin, Haotian Liu, HuggingFace, tons of other great folks from Mozilla, linux foundation and Percy from Together/StanfordAlso had a chance to checkout one of the smol dinners in SF, they go really hard, had a great time showing folks the Vision Pro, chatting about AI, seeing incredible demos and chat about meditation and spirituality all at the same time! AI Art & DiffusionByteDance presents SDXL-Lightning (Try here)* Lightning fast SDXL with 2, 4 or 8 steps* Results much closer to original SDXL than turbo version from a few months agoStability announces Stable Diffusion 3 (waitlist)Uses a Diffusion Transformer architecture (like SORA)Impressive multi subject prompt following: "Prompt: a painting of an astronaut riding a pig wearing a tutu holding a pink umbrella, on the ground next to the pig is a robin bird wearing a top hat, in the corner are the words "stable diffusion"Tools* Replit announces a new Figma designโ†’ code plugin Thatโ€™s it for today, definitely check out the full conversation with Mark Heaps from Groq on the pod, and see you next week! ๐Ÿซก ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.Full Transcript: [00:00:00] Alex Volkov: Hey, this is Alex. This week on ThursdAI, we had an hour conversation with Grok, a new and very exciting AI inference chip that exploded in popularity all over social media after showing a 5x, yes, 5x improvement in AI inference. 500 tokens per second for Lama70B and Mistral.[00:00:32] Alex Volkov: We also talked about Google's new OpenWeights GEMMA model, Google's image generation issues, which led them to take down the abilities of this image generation to generate people. We covered new, incredibly fast SDXL lightning, and we had breaking news for Stable Diffusion 3, which is a diffusion transformer that's coming out of Stability AI.[00:01:03] Alex Volkov: and a bunch of other news. All that after this short intro into Weights Biases.[00:01:10] AI teams are all asking the same question. How can we better manage our model development workflow? The path to production is increasingly complex, and it can get chaotic keeping track of thousands of experiments and models. Messy spreadsheets and ad hoc notebooks aren't going to cut it. The best AI teams need a better solution.[00:01:33] And better tools. They need Weights Biases, the AI developer platform, to unlock their productivity and achieve production ML at scale. Replace messy spreadsheets with an automated system of record for experiments.[00:01:52] Communicate about model evaluation. and collaboratively review results across the team. Clean up disorganized buckets of models with a unified registry. 
Automatically capture full model lineage, all the data and code used for training and testing. Seamlessly connect to compute to scale up training. And run large scale sweeps efficiently to optimize models.[00:02:20] Analyze the performance of large language models. And monitor LLM usage and costs with live, customizable dashboards. Get your team on the same page to bridge the gaps from ideation to production. Use Weights Biases to build, manage, and deploy better models, faster.[00:02:41] Alex Volkov: Wasn't this cool? This is Kari. She is a original PM on the Weights Biases team. She's been there for a long time and recently we used her voice to narrate this new video that we have up on the website. And I figured I'd put it in here because it works even without the video. And I thought it was super cool.[00:03:01] Alex Volkov: And people ask me, what does Weights Biases do? And hopefully this answers some of those questions. Now I want to switch gears and say, basically. that the format for this week is a little different. We had the folks from Grok and Matt Schumer at the beginning of the pod, and then we kept talking about everything else, like Gemma and Gemini and everything else.[00:03:24] Alex Volkov: So the first hour of this is going to be an interview with the Grok folks, specifically with Mark Heaps and the next hour afterwards is going to be the deep dive into topics. If you're listening to this on Apple podcast, for example, you should be able to just view chapters and skip to a chapter that you'd prefer. .[00:03:51] Alex Volkov: I want to just do a quick recap of ThursdAI for February 22nd everything we've talked about for today and we started the space with a with two I guess Matt Schumer and mark Heaps from, and that's Groq with a Q at the end, not Groq with a K at the end. So not like X ais Groq. Groq is explo on our timelines recently with just incredible viral videos of them performing l la inference on LAMA two 70 B and Mixtral with around 400 or 500 tokens a second, which is.[00:04:34] Alex Volkov: Five times as much as the previous super fast API inference that we've seen for perplexity and from together. And they're serving like Lama 270B with 500 tokens a second. And so we've had Mark from Groq talk to us for almost an hour about how this is even possible. So we had a very nice deep dive with Mark and definitely if you miss this, please check this out on, on the recorded portion as well.[00:04:58] Alex Volkov: And then we also had Matt, who works at HyperWrite, and he's been playing with these tools, and he told us about the demos that he was able to build, and How much of a difference this speed of inference makes. We've talked about their custom chip called LPU, and we've talked about the fact that the company's been around for a while, and they did not expect this explosion in virality, but they're very happy that they chose this direction correctly.[00:05:21] Alex Volkov: Very great interview, great conversation, and I invite you to listen to this as well. We covered that Google image generation is now in hot waters, and was reportedly paused because it's in injecting prompt stuff that they're not that great, let's say. And many people notice that historical figures are being generated in different races, and different multicultural adjustments are happening to your prompts, which is not great.[00:05:46] Alex Volkov: This blew up on Twitter, and even outside of Twitter, I think folks started writing this in actual Media Google, en
Holy SH*T. These two words have been said on this episode multiple times, way more than ever before I want to say, and it's because we got 2 incredibly exciting breaking news announcements in a very very short amount of time (in the span of 3 hours), and the OpenAI announcement came as we were recording the space, so you'll get to hear our live reaction to this insanity. We also had 3 deep-dives, which I am posting in this week's episode: we chatted with Yi Tay and Max Bane from Reka, who trained and released a few new foundational multi modal models this week, and with Dome and Pablo from Stability, who released a new diffusion model called Stable Cascade, and finally had a great time hanging with Swyx (from Latent Space) and got a chance to turn the microphone back on him, and had a conversation about Swyx's background, Latent Space, and AI Engineer. I was also very happy to be in SF today of all days, as my day is not over yet, there's still an event which we cohost together with A16Z, folks from Nous Research, Ollama and a bunch of other great folks, just look at all these logos! Open Source FTW 👏

TL;DR of all topics covered:
* Breaking AI News
* 🔥 OpenAI releases SORA - text to video generation (Sora Blogpost with examples)
* 🔥 Google teases Gemini 1.5 with a whopping 1 MILLION tokens context window (X, Blog)
* Open Source LLMs
* Nvidia releases Chat With RTX local models (Blog, Download)
* Cohere open sources Aya 101 - 101 languages supporting 12.8B model (X, HuggingFace)
* Nomic releases Nomic Embed 1.5 + with Matryoshka embeddings (X)
* Big CO LLMs + APIs
* Andrej Karpathy leaves OpenAI (Announcement)
* OpenAI adds memory to chatGPT (X)
* This weeks Buzz (What I learned at WandB this week)
* We launched a new course with Hamel Husain on enterprise model management (Course)
* Vision & Video
* Reka releases Reka-Flash, 21B & Reka Edge MM models (Blog, Demo)
* Voice & Audio
* WhisperKit runs on WatchOS now! (X)
* AI Art & Diffusion & 3D
* Stability releases Stable Cascade - new AI model based on Würstchen v3 (Blog, Demo)
* Tools & Others
* Goody2ai - A very good and aligned AI that does NOT want to break the rules (try it)

🔥 Let's start with Breaking News (in the order in which they happened)

Google teases Gemini 1.5 with a whopping 1M context window

This morning, Jeff Dean released a thread full of crazy multi modal examples of their new Gemini 1.5 model, which can handle up to 1M tokens in the context window. The closest model to that so far was Claude 2.1, and that was not multi modal. They also claim they are researching up to 10M tokens in the context window. The thread was chock full of great examples, some of which highlighted the multimodality of this incredible model, like being able to pinpoint and give a timestamp of an exact moment in an hour long movie, just from getting a sketch as input. This honestly blew me away. They were able to use the incredibly large context window, break the WHOLE 1 hour movie down into frames, provide additional text tokens on top of it, and the model had near perfect recall. They used Greg Kamradt's needle in the haystack analysis on text, video and audio and showed incredible, near perfect recall, which highlights how much advancement we got in the area of context windows. Just for reference, less than a year ago, we had this chart from Mosaic when they released MPT.
This graph's Y axis tops out at 60K; the graph above goes to 1 MILLION, and we're less than a year apart. Not only that, Gemini Pro 1.5 is also multi modal. I've got to give props to the Gemini team, this is quite a huge leap for them, and for the rest of the industry this is a significant jump in what users will expect going forward! No longer will we be told "hey, your context is too long" 🤞 A friend of the pod, Enrico Shippole, joined the stage (you may remember him from our deep dive into extending Llama's context window to 128K) and showed that a bunch of new research makes all this possible for open source as well, so we're waiting for OSS to catch up to the big G. I will sum up with this: Google is the big dog here, they invented transformers, they worked on this for a long time, and it's amazing to see them show up like this, like they used to do, and blow us away! Kudos 👏

OpenAI teases SORA - a new giant leap in text to video generation

You know what? I will not write any analysis, I will just post a link to the blogpost and upload some videos that the fine folks at OpenAI just started releasing out of the blue. You can see a ton more videos on Sam's twitter and on the official SORA website. Honestly I was so impressed with all of them that I downloaded a bunch and edited them all into the trailer for the show!

Open Source LLMs

Nvidia releases Chat With RTX

Chat With Notes, Documents, and Video

Using a Gradio interface and packing 2 local models, Nvidia releases a bundle with open source AI packaged in, including RAG and even YouTube transcription chat! Chat with RTX supports various file formats, including text, pdf, doc/docx, and xml. Simply point the application at the folder containing your files and it'll load them into the library in a matter of seconds. Additionally, you can provide the url of a YouTube playlist and the app will load the transcriptions of the videos in the playlist, enabling you to query the content they cover.

Chat for Developers

The Chat with RTX tech demo is built from the TensorRT-LLM RAG developer reference project available from GitHub. Developers can use that reference to develop and deploy their own RAG-based applications for RTX, accelerated by TensorRT-LLM.

This weeks Buzz (What I learned with WandB this week)

We just released a new course! Hamel Husain released a course on enterprise model management!

Course name: Enterprise Model Management
Course Link: wandb.me/emm-course
Who is this for: The course is targeted at enterprise ML practitioners working with models: MLOps engineers, ML team leaders, ML engineers. It shows, both at a conceptual and a technical level, how to get the most value out of W&B Model Registry and automations. Attached is also a screenshot of a slide from the course on what different personas (MLOps, ML exec etc) get from Model Registry.
What can they expect: Learn how to store, version, and evaluate models like top enterprise companies today, using an LLM training & evaluation example. Big value props: improved compliance, collaboration, and disciplined model development.

Vision & Video

Reka releases Reka Flash and Reka Edge multimodal models

Reka, co-founded by Yi Tay (previously of DeepMind), trained and released 2 foundational multimodal models. I tried them and was blown away by their ability to not only understand text and perform VERY well on metrics (73.5 MMLU / 65.2 on HumanEval) but also boast incredible (honestly, never before seen by me) multi modal capabilities, including understanding video!
Here's a thread of me getting my head continuously blown away by the quality of the tonality of this multimodality (sorry...๐Ÿ˜…)I uploaded a bunch of video examples and was blown away, it understands tonality (with the dive dive Diiiiive example) understands scene boundaries, and does incredible OCR between scenes (the Jason/Alex example from speakers) AI Art & DiffusionStable Cascade (link)Stability AI introduced a new text-to-image generation model called Stable Cascade that uses a three-stage approach to produce high-quality images with a compressed latent space, making it more efficient to train and use than previous models. It achieved better results than other models in evaluations while having faster inference speeds. The company released code to train, fine-tune, and use control models like inpainting with Stable Cascade to enable further customization and experimentation. Stability AI aims to lower barriers to AI development through models like this one.Nate did a comparison between a much slower SDXL and Stable Cascade here: Hereโ€™s the transcript for the whole episode, you definitely should check it out! It was really one of the coolest shows we had, and we had over 2K folks listening in! [00:00:00] Alex Volkov: Hey, this is Alex Volkov, you're on ThursdAI, and I just gotta record this intro real quick, because today marks one of the more singular days in AI that I remember since I started recording ThursdAIs, which was itself a singular day, March 14th, 11 months ago, when GPT 4 was released and announced. We since then had a few days like this GPT Dev Day was one such day, and today marks another one.[00:00:38] Alex Volkov: Google has released an update to their model, talking about 1 million tokens in the context window, basically unlimited. And then, just a few, just an hour or two later, OpenAI said, you know what, we also have something in store, and released the most incredible jump. Incapability of video generation, text to video generation.[00:01:02] Alex Volkov: It's called SORA, and what you hear is us recording live, knowing only about Google, which came out an hour and a half before we started recording, and then somewhere in the middle, I think minute 35 or something, you'll hear our live reaction to the Incredibly mind blowing advancement in text to video that OpenAI just released.[00:01:31] Alex Volkov: And I just wanted to record this as I'm finishing up the editing and about to start writing the newsletter, to say, days like this really are the reason why I'm all in on AI and I'm very excited about the changes and advancements.[00:01:49] Alex Volkov: And I'm sure there will be more days like this going forward. We've yet to see what Apple came up with, we've yet to really see what Meta comes up with Llama 3, etc. And, yeah, I just wish you enjoyed this and I don't have a lot of words here besides just letting you listen to the rest of the episode and say that I was very happy to be in San Francisco for this, the place where most of this happens, and I was very happy to be in company of good friends, both in the virtual world those on stage in our Twitt
Hihi, this is Alex, from Weights & Biases, coming to you live, from Yosemite! Well, actually I'm writing these words from a fake virtual Yosemite that appears above my kitchen counter, as I'm now a Vision Pro user, and I will force myself to work inside this thing and tell you if it's worth it. I will also be on the lookout for anything AI related in this new spatial computing paradigm, like THIS for example! But back to reality for a second, we had quite the show today! We had an awesome time having Junyang Justin Lin, a dev lead at Alibaba, join us and talk about Qwen 1.5 and QwenVL, and then we had a deep dive into quite a few acronyms I've been seeing on my timeline lately, namely DSPy, ColBERT and (the funniest one) RAGatouille, and we had a chat with Connor from Weaviate and Benjamin, the author of RAGatouille, about what it all means! Really really cool show today, hope you don't only read the newsletter but listen on Spotify, Apple or right here on Substack. TL;DR of all topics covered: * Open Source LLMs * Alibaba releases a BUNCH of new QWEN 1.5 models including a tiny .5B one (X announcement)* Abacus fine-tunes Smaug, top of the HF leaderboard, based on Qwen 72B (X)* LMsys adds more open source models, sponsored by Together (X)* Jina Embeddings fine tune for code* Big CO LLMs + APIs* Google rebranding Bard to Gemini and launching Gemini Ultra (Gemini)* OpenAI adds image metadata (Announcement)* OpenAI keys are now restricted per key (Announcement)* Vision & Video* Bria - RMBG 1.4 - Open Source BG removal that runs in your browser (X, DEMO)* Voice & Audio* Meta voice, a new apache2 licensed TTS - (Announcement)* AI Art & Diffusion & 3D* Microsoft added DALL-E editing with "designer" (X thread)* Stability AI releases update to SVD - video 1.1 launches with a webUI, much nicer videos* Deep Dive with Benjamin Clavie and Connor Shorten show notes:* Benjamin's announcement of RAGatouille (X)* Connor's chat with Omar Khattab (author of DSPy and ColBERT) - Weaviate Podcast* Very helpful intro to ColBERT + RAGatouille - Notion
Open Source LLMs
Alibaba releases Qwen 1.5 - ranges from .5B to 72B (DEMO)
With 6 sizes, including 2 novel new ones, from as little as a .5B parameter model, to an interesting 4B, all the way to a whopping 72B, Alibaba open sources additional QWEN checkpoints. We had the honor of having friend of the pod Junyang Justin Lin on again, and he talked to us about how these sizes were selected, that even though this model beats Mistral Medium on some benchmarks it remains to be seen how well it performs on human evaluations, and shared a bunch of details about open sourcing this. The models were released with all the latest and greatest quantizations, significantly improved context length (32K) and support for both Ollama and LM Studio (which I helped make happen and am very happy about the way the ThursdAI community is growing and connecting!) We also had a chat about QwenVL Plus and QwenVL Max, their API-only versions of their best vision enabled models, and had the awesome Piotr Skalski from Roboflow on stage to chat with Junyang about those models! To me, a success for ThursdAI is when the authors of things we talk about come on the show, and this is Junyang's second appearance, which he joined at midnight at the start of the Chinese New Year, so greatly appreciated, and definitely give him a listen!
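If you want to poke at the new checkpoints yourself, here's a minimal sketch using the Hugging Face transformers chat template. The 0.5B chat repo id is taken from the release; swap in whichever size your hardware can handle, and treat the exact generation settings here as my own assumptions.

```python
# Minimal sketch: chatting with a Qwen 1.5 chat checkpoint via transformers.
# Needs a recent transformers version; repo id from the Qwen 1.5 release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-0.5B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize this week's open source AI news in one sentence."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```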
Abacus Smaug climbs to top of the hugging face leaderboard Junyang also mentioned that Smaug is now at the top of the leaderboards, coming from Abacus, this is a finetune of the previous Qwen-72B, not even this new one. First model to achieve an average score of 80, this is an impressive appearance from Abacus, though they haven't released any new data, they said they are planning to! They also said that they are planning to finetune Miqu, which we covered last time, the leak from Mistral that was acknowledged by Arthur Mensch the CEO of Mistral.The techniques that Abacus used to finetune Smaug will be released an upcoming paper! Big CO LLMs + APIsWelcome Gemini Ultra (bye bye Bard) Bard is no longer, get ready to meet Gemini. it's really funny because we keep getting cofusing naming from huge companies like Google and Microsoft. Just a week ago, Bard with Gemini Pro shot up to the LMSYS charts, after regular gemini pro API were not as close. and now we are suppose to forget that Bard even existed? ๐Ÿค” Anyhow, here we are, big G answer to GPT4, exactly 10 months 3 weeks 4 days 8 hours, but who's counting? So what do we actually get? a $20/m advanced tier for Gemini Advanced (which will have Ultra 1.0) the naming confusion continues. We get a longer context (how much?) + IOS and android apps (though I couldn't find it in IOS, maybe it wasn't yet rolled out)Gemini now also replaces google assistant for those with androids who opt in (MKBHD was somewhat impressed but not super impressed) but google is leaning into their advantage including home support! * Looks like Gemini is ONLY optimized for English as well We had quite the conversation on stage from folks who upgraded and started using, including noticing that Gemini is a better role player, and less bland, but also that they don't yet support uploading documents besides images, and that the context window is very limited, some said 8K and some 32K but definitely on the lower side. Also from Google : a llama.cpp wrapper called localllm (Blog)OpenAI watermarks DALL-E images and adds per key API limits (finally) (Blog)OpenAI's using something calledC2PA for pictures made by DALL-E 3, whether you're chatting with ChatGPT or using their API. It's a way to show that DALL-E 3 actually created those images. But it's just for images right now, not for text or voice stuff. Adding this info can make the files up to 32% bigger, but it doesn't mess with the quality. The tags tell you if the source was DALL-E 3, ChatGPT, or the API by including special signatures and stuff. Just a heads up, though, this C2PA thing isn't perfect. The metadata could get wiped either on purpose or by mistake.They also released an update to the developer experience that allows you to track usage but also restrict usage per API key! Very very needed and helpful! This weeks Buzz (What I learned with WandB this week)First part of the live series with the Growth ML team was live and AWESOME! VisionBRIA - Open-Source background removal (non commercial)BRIA AI@bria_ai_Feb 6, 2024๐Ÿ“ท Introducing Open-Source Background Removal by @BriaAI ๐Ÿ“ท Now live on @huggingface, RMBG v1.4 excels in separating foreground from background across diverse categories, surpassing current open models. See demo [https://t.co/DDwncjkYqi] #BriaAI #OpenSource #AI @briaai https://t.co/BlhjMMNWxaVoiceMetaVoice (hub)1.2B parameter model.Trained on 100K hours of data.Supports zero-shot voice cloning.Short & long-form synthesis.Emotional speech.Best part: Apache 2.0 licensed. 
๐Ÿ”ฅPowered by a simple yet robust architecture: > Encodec (Multi-Band Diffusion) and GPT + Encoder Transformer LM. > DeepFilterNet to clear up MBD artefacts.That's it for us this week, this time I bring you both the news segment AND the deepdive in one conversation, hope it's not super long, see you here next ThursdAI! ๐Ÿ‘Full Transcript: [00:00:00] Intro and housekeeping[00:00:00] โ€‹[00:00:00] Alex Volkov: You're on ThursdAI, and I think it's time for us to get started with the recording and the introduction.[00:00:26] Alex Volkov: Happy, happy Thursday everyone! Today is February 8th, 2024. I don't know, This is the second calendar year the Thursday is happening in, so I don't know if I need to mention the year or not but we're well on our way into 2024 and you're here on Thursday, I, the Thursday I is the space, the newsletter, and the podcast to keep you up to date with all of the very interesting things that are happening in the very fast moving world of ai.[00:00:58] Alex Volkov: Hopefully by now, all of you already have ThursdAI in your podcast, wherever you get a podcast, Spotify, recently YouTube as well, which is weird. But with this introduction, I will just say, hello myself, basically. Hey everyone. My name is Alex Volkov. I'm an AI evangelist with Weights & Biases.[00:01:15] Alex Volkov: Weights & Biases is the reason why this comes to life to you. And there's going to be a little segment about Weights & Biases in the middle here as well, and I'm joined on stage. Often, and pretty much every week by great friends, experts in their fields. As we talk about everything AI related this week, especially we're going to have some interesting things.[00:01:34] Alex Volkov: Those of you who come back week after week. Thank you, and we love that you're part of the community, and it's great to see how many people just return, and those of you who are new, we're here every week and The community doesn't stop after we finish the space. There's a bunch of spaces. I think our friend AlignmentLab had the space that went on for the full week, I think.[00:01:55] Alex Volkov: I don't know if he ever slept. That's maybe why he's not here on stage. But we're here every week for the two hours to give you updates for the first hour and definitely some very interesting deep dives that has been happening, that have been happening for the past few Weeks, I want to say, so I just want to shout out some friends of ours that recently we were featured in the deep dives.[00:02:16] Alex Volkov: We've talked with Maxime Lubon, who trained the Beagle series and then also gave a deep dive with us about model merging. That was really fun. And on the last deep dive, we talked with the Lilac folks and they're building an open source tool. That lets you peer into huge data sets, like imagine millions of rows, data sets, and they chunk and cluster this. And we've talked about the importance of data sets in creation of LLMs or large language models.[00:02:46] Alex Volkov: And they've taken the huge data sets of the folks to usually come up on ThursdAI. Technium from Nous Research just
Hello hello everyone, welcome to another special episode (some podcasts call them just.. episodes I guess, but here you get AI news every ThurdsdAI, and on Sunday you get the deeper dives) BTW, I'm writing these words, looking at a 300 inch monitor that's hovering above my usual workstation in the Apple Vision Pro, and while this is an AI newsletter, and I've yet to find a connecting link (there's like 3 AI apps in there right now, one fairly boring chatbot, and Siri... don't get me started on Siri), I'll definitely be covering my experience in the next ThursdAI, because well, I love everything new and technological, AI is a huge part of it, but not the ONLY part! ๐Ÿ“– It's all about the (big) Datasets Ok back to the matter at hand, if you've used, finetuned, trained or heard about an AI model, you may or may not realize how important the dataset the model was trained with is. We often talk of this model, that model, and often the only different is, additional data that folks (who I sometimes refer to as alchemists) have collected, curated and structured, and creating/curating/editing those datasets is an art and a science. For example, three friends of the pod, namely LDJ with Capybara, Austin with OpenChat and Teknium with Hermes, have been consistently taking of the shelves open source models and making them smarter, more instruction tuned, better for specific purposes. These datasets are paired with different techniques as well, for example, lately the so-called DPO (Direct preference optimization) is a technique that showed promise, since it not only shows a model which answer is the correct for a specific query, it shows an incorrect answer as well, and trains the model to prefer one over the other. (see the recent Capybara DPO improvement by Argilla, which improved model metrics across every evaluation)These datasets can range from super high quality 16K rows, to millions of rows (Teknium's recently released Hermes, one of the higher quality datasets comes in at just a tad over exactly 1 million rows) and often times it's an amalgamation of different other datasets into 1. In the case of Hermes, Teknium has compiled this 1 million chats from at least 15 different datasets, some his own, some by folks like Jon Durbin, Garage bAInd, and shareGPT, from LMsys.org, which was complied by scraping the very popular sharegpt.com website, from folks who used the shareGPT extension to share they GPT4 conversations. It's quite remarkable how much of these datasets are just, conversations that users had with GPT-4! Lilac brings GardenWith that backdrop of information, today on the pod we've got the co-founders of Lilac, Nikhil Thorat and Daniel Smilkov, who came on to chat about the new thing they just released called Lilac Garden. Lilac is an open source tool (you can find it RIGHT HERE) which is built to help make dataset creation, curation and classification, more science than art, and help visualize the data, cluster it and make it easily available. In the case of Hermes, that could be more than millions of rows of data.On the pod, I talk with Nikhil and Daniel about the origin of what they both did at Google, working on Tensorflow.js and then something called "know your data" and how eventually they realized that in this era of LLMs, open sourcing a tool that can understand huge datasets, run LLM based classifiers on top of them, or even train specific ones, is important and needed! 
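For a rough mental model of what "understanding a huge dataset" involves under the hood, here's a toy sketch of the embed-then-cluster recipe that tools in this space build on. This is not Lilac's actual implementation, and the embedding model and libraries here are just my assumptions; Lilac layers a lot more on top, including LLM-powered labels and a UI.

```python
# Toy version of the embed-then-cluster recipe (not Lilac's actual code).
from collections import Counter
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

rows = [
    "Translate the following sentence to French: ...",
    "Write a Python function that reverses a string.",
    "Summarize this article about climate policy: ...",
    # ...imagine 4 million of these
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = embedder.encode(rows)

labels = KMeans(n_clusters=2, random_state=0).fit_predict(embeddings)
print(Counter(labels))  # cluster sizes -- the "66% are translation requests" style breakdown
```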
To strengthen the point, two friends of the pod (Teknium was in the crowd sending us 👍), LDJ and Austin (aka Alignment Lab), were on stage with us and basically said that "it was pretty much the dark ages before Lilac", since something like the OpenOrca dataset is a whopping 4M rows of text.
Visualizations in the Garden
So what does Lilac actually look like? Here's a quick visualization of the top categories of text from OpenOrca's 4 million rows, grouped by category title and showing each cluster. You can see here that Translation requests make up 66% (around 200K rows) of the translation category, and you can scroll on and on, add filters, and really dissect this whole thing up and down. The categorization is created by running Lilac on your dataset, which uses embedding algorithms and other neat tricks to quickly chunk the data and put labels on the categories (AKA classifying them). Btw, you can see this view and play around with it yourself here.
But running this on your own local machine can be a drag, and can take hours if not days for bigger datasets, sometimes hanging and not finishing at all, so the Lilac folks created Lilac Garden, a hosted solution where you provide a dataset and they classify something like 4M rows in 4-5 hours or so, which is definitely not possible on local machines. If you're into that kind of thing, again, Lilac is open source, so you don't have to sign up or pay them, but if speed and this view matter to you, definitely check Lilac out!
RWKV with Eugene (Pico Creator)
On the news segment of ThursdAI we mentioned Eagle, which is the 5th version of RWKV, an attention-free, potential alternative to Transformers that's being developed fully in the open. Later in the show we had the honor of having PicoCreator, one of the front-running folks in the RWKV effort, which is an attempt to see if Transformers can be beaten with a different type of architecture (an RNN) that doesn't use the standard attention mechanism, which brings the problem of quadratic attention scaling and makes LLMs hard and expensive to run as more context is provided. Eugene had some technical issues so he joined in the middle of the pod and we didn't have a full deep-dive, however, I figured it's important to bring this info to you, as these efforts may yield AI that runs 10-100x cheaper and potentially faster on devices, using almost infinite context lengths. RWKV and other attempts like StripedHyena (Together AI) and Mamba (from Tri Dao) are worth watching as they may supersede or join with Transformers to create the next jump in LLM capabilities.
That's all for this Sunday. Needless to say, with the Vision Pro releasing on a Friday, it's been a full weekend of future exploration, which is the main driver in my personal life! P.S - if you read through to here, you get a gift! A teaser: I have done something different on the pod and recorded a human interest x AI episode for the first time. I mostly bring the news and sometimes deep dives like this one, but this story I couldn't ignore, so stay tuned if you're into dating x AI, and how technology disrupts our lives and whether this is all moral or not, as I recorded an episode with Sasha Jadan and his new fiancee Karina, whom his AI bot picked out for him after swiping and matching with over 5200 girls on Tinder. The AI also... suggested he propose, which he did. It was a very interesting conversation that I plan to upload soon!
That's it from me this week, see you all on ThursdAI and don't forget, if you liked this, do me a solid, listen to the pod and then leave a review or a 5 star (at least a 4?) on Apple podcasts 🙏 This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
TL;DR of all topics covered + Show notes* Open Source LLMs* Meta releases Code-LLama 70B - 67.8% HumanEval (Announcement, HF instruct version, HuggingChat, Perplexity)* Together added function calling + JSON mode to Mixtral, Mistral and CodeLLama* RWKV (non transformer based) Eagle-7B - (Announcement, Demo, Yam's Thread)* Someone leaks Miqu, Mistral confirms it's an old version of their model* Olmo from Allen Institute - fully open source 7B model (Data, Weights, Checkpoints, Training code) - Announcement* Datasets & Embeddings* Teknium open sources Hermes dataset (Announcement, Dataset, Lilac)* Lilac announces Garden - LLM powered clustering cloud for datasets (Announcement)* BAAI releases BGE-M3 - Multi-lingual (100+ languages), 8K context, multi functional embeddings (Announcement, Github, technical report)* Nomic AI releases Nomic Embed - fully open source embeddings (Announcement, Tech Report)* Big CO LLMs + APIs* Bard with Gemini Pro becomes 2nd LLM in the world per LMsys beating 2 out of 3 GPT4 (Thread)* OpenAI launches GPT mention feature, it's powerful! (Thread)* Vision & Video* ๐Ÿ”ฅ LLaVa 1.6 - 34B achieves SOTA vision model for open source models (X, Announcement, Demo)* Voice & Audio* Argmax releases WhisperKit - super optimized (and on device) whisper for IOS/Macs (X, Blogpost, Github)* Tools* Infinite Craft - Addicting concept combining game using LLama 2 (neal.fun/infinite-craft/)Haaaapy first of the second month of 2024 folks, how was your Jan? Not too bad I hope? We definitely got quite a show today, the live recording turned into a proceeding of breaking news, authors who came up, deeper interview and of course... news.This podcast episode is focusing only on the news, but you should know, that we had deeper chats with Eugene (PicoCreator) from RWKV, and a deeper dive into dataset curation and segmentation tool called Lilac, with founders Nikhil & Daniel, and also, we got a breaking news segment and (from ) joined us to talk about the latest open source from AI2 ๐Ÿ‘Besides that, oof what a week, started out with the news that the new Bard API (apparently with Gemini Pro + internet access) is now the 2nd best LLM in the world (According to LMSYS at least), then there was the whole thing with Miqu, which turned out to be, yes, a leak from an earlier version of a Mistral model, that leaked, and they acknowledged it, and finally the main release of LLaVa 1.6 to become the SOTA of vision models in the open source was very interesting!Open Source LLMsMeta releases CodeLLama 70BBenches 67% on MMLU (without fine-tuninig) and already available on HuggingChat, Perplexity, TogetherAI, Quantized for MLX on Apple Silicon and has several finetunes, including SQLCoder which beats GPT-4 on SQLHas 16K context window, and is one of the top open models for codeEagle-7B RWKV based modelI was honestly disappointed a bit for the multilingual compared to 1.8B stable LM , but the folks on stage told me to not compare this in a transitional sense to a transformer model ,rather look at the potential here. So we had Eugene, from the RWKV team join on stage and talk through the architecture, the fact that RWKV is the first AI model in the linux foundation and will always be open source, and that they are working on bigger models! 
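For intuition on why an attention-free, RNN-style design is interesting at all, here's a toy contrast. This is not RWKV's actual math, just the general shape of the trade-off: attention keeps a KV cache that grows with every token and re-reads it on every step, while a recurrent model folds the whole history into a fixed-size state.

```python
# Toy contrast (not RWKV's actual formulation): per-token cost and memory
# for attention-style vs recurrent-style sequence mixing.
import numpy as np

d = 8  # hidden size
rng = np.random.default_rng(0)

# Attention-style: the KV cache grows with every token, and each new token
# attends over the whole history -- O(n) memory and O(n) work per step.
kv_cache = []
def attention_step(x):
    kv_cache.append(x)
    scores = np.array([k @ x for k in kv_cache])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return sum(w * k for w, k in zip(weights, kv_cache))

# Recurrent-style: the whole history is folded into one fixed-size state --
# O(1) memory and O(1) work per step, no matter how long the context gets.
state = np.zeros(d)
def recurrent_step(x, decay=0.9):
    global state
    state = decay * state + x
    return state

for _ in range(1000):
    x = rng.standard_normal(d)
    attention_step(x)
    recurrent_step(x)

print(len(kv_cache), "cached vectors vs one state of shape", state.shape)
```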
That interview will be released soonOlmo from AI2 - new fully open source 7B model (announcement)This announcement came as Breaking News, I got a tiny ping just before Nathan dropped a magnet link on X, and then they followed up with the Olmo release and announcement.A fully open source 7B model, including checkpoints, weights, Weights & Biases logs (coming soon), dataset (Dolma) and just... everything that you can ask, they said they will tell you about this model. Incredible to see how open this effort is, and kudos to the team for such transparency.They also release a 1B version of Olmo, and you can read the technical report hereBig CO LLMs + APIsMistral handles the leak rumorsThis week the AI twitter sphere went ablaze again, this time with an incredibly dubious (quantized only) version of a model that performed incredible on benchmarks, that nobody expected, called MIQU, and i'm not linking to it on purpose, and it started a set of rumors that maybe this was a leaked version of Mistral Medium. Remember, Mistral Medium was the 4th best LLM in the world per LMSYS, it was rumored to be a Mixture of Experts, just larger than the 8x7B of Mistral.So things didn't add up, and they kept not adding up, as folks speculated that this is a LLama 70B vocab model etc', and eventually this drama came to an end, when Arthur Mensch, the CEO of Mistral, did the thing Mistral is known for, and just acknowleged that the leak was indeed an early version of a model, they trained once they got access to their cluster, super quick and that it indeed was based on LLama 70B, which they since stopped using.Leaks like this suck, especially for a company that ... gives us the 7th best LLM in the world, completely apache 2 licensed and it's really showing that they dealt with this leak with honor!Arthur also proceeded to do a very Mistral thing and opened a pull request to the Miqu HuggingFace readme with an attribution that looks like this, with the comment "Might consider attribution" ๐Ÿซณ๐ŸŽคBard (with Gemini Pro) beats all but the best GPT4 on lmsys (and I'm still not impressed, help)This makes no sense, and yet, here we are. Definitely a new version of Bard (with gemini pro) as they call it, from January 25 on the arena, now is better than most other models, and it's could potentially be because it has internet access?But so does perplexity and it's no where close, which is weird, and it was a weird result that got me and the rest of the team in the ThursdAI green room chat talking for hours! Including getting folks who usually don't reply, to reply ๐Ÿ˜† It's been a great conversation, where we finally left off is, Gemini Pro is decent, but I personally don't think it beats GPT4, however most users don't care about which models serves what, rather which of the 2 choices LMSYS has shown them answered what they asked. And if that question has a google search power behind it, it's likely one of the reasons people prefer it.To be honest, when I tried the LMSYS version of Bard, it showed me a 502 response (which I don't think they include in the ELO score ๐Ÿค”) but when I tried the updated Bard for a regular task, it performed worse (in my case) than a 1.6B parameter model running locally.Folks from google replied and said that it's not that they model is bad, it's that I used a person's name, and the model just.. refused to answer. 
๐Ÿ˜ตโ€๐Ÿ’ซ When I removed a last name it did perform ok, no where near close to GPT 4 though.In other news, they updated Bard once again today, with the ability to draw images, and again, and I'm sorry if this turns to be a negative review but, again, google what's going on?The quality in this image generation is subpar, at least to mea and other folks, I'll let you judge which image was created with IMAGEN (and trust me, I cherry picked) and which one was DALLE for the same exact promptThis weeks Buzz (What I learned with WandB this week)Folks, the growth ML team in WandB (aka the team I'm on, the best WandB team duh) is going live!That's right, we're going live on Monday, 2:30 PM pacific, on all our socials (X, LinkedIn, Youtube) as I'm hosting my team, and we do a recap of a very special week in December, a week where we paused other work, and built LLM powered projects for the company!I really wanted to highlight the incredible projects, struggles, challenges and learnings of what it takes to take an AI idea, and integrated it, even for a company our size that works with AI often, and I think it's going to turn out super cool, so you all are invited to check out the live stream!Btw, this whole endeavor is an initiative by yours truly, not like some boring corporate thing I was forced to do, so if you like the content here, join the live and let us know how it went!OpenAI releases a powerful new feature, @mentions for GPTsThis is honestly so great, it went under the radar for many folks, so I had to record a video to expalin why this is awesome, you can now @mention GPTs from the store, and they will get the context of your current conversation, no longer you need to switch between GPT windows.This opens the door for powerful combinations, and I show some in the video below:Apple is coming to AINot the Apple Vision Pro, that's coming tomorrow and I will definitely tell you how it is! (I am getting one and am very excited, it better be good)No, today on the Apple earnings call, Tim Cook finally said the word AI, and said that they are incredibly excited about this tech, and that we'll get to see something from them this year.Which makes sense, given the MLX stuff, the Neural Engine, the Ml-Ferret and the tons of other stuff we've seen from them this year, Apple is definitely going to step in a big way!Vision & VideoLLaVa 1.6 - SOTA in open source VLM models! (demo)Wow, what a present we got for Haotian Liu and the folks at LLaVa, they upgraded the LlaVa architecture and released a few more models, raging from 7B to 34B, and created the best open source state of the art vision models! It's significantly better at OCR (really, give it a go, it's really impressive) and they exchanged the LLM backbone with Mistral and Hermes Yi-34B.* Better OCR and higher res* Uses several bases like Mistral and NousHermes 34B* Uses lmsys SGlang for faster responses (which we covered a few weeks ago)* SoTA Performance! LLaVA-1.6 achieves the best performance compared with open-source LMMs such as CogVLM or Yi-VL. Compared with commercial ones, it catches up to Gemini Pro and outperforms Qwen-VL-Plus on selected benchmarks.* Low Training Cost. LLaVA-1.6 is trained with 32 GPUs for ~1 day, with 1.3M data samples in total. The compute / training data cost is 100-1000 times smaller than others.Honestly it's quite stunningly good, howev
Hey everyone, we have an exciting interview today with Maxime Labonne. Maxime is a senior Machine Learning Scientist at JPMorgan, the author of Hands on GNNs book and his own ML Blog, creator of LazyMergeKit (which we cover on the pod) and holds a PHD in Artificial Intelligence from the Institut Polytechnique de Paris. Maxime has been mentioned on ThursdAI a couple of times before, as he released the first Phi mixture-of-experts, and has previously finetuned OpenHermes using DPO techniques which resulted in NeuralChat7B For the past couple of months, following AI on X, it was hard not to see Maxime's efforts show up on the timeline, and one of the main reasons I invited Maxime to chat was the release of NeuralBeagle7B, which at the time of writing was the top performing 7B model on the LLM leaderboard, and was specifically a merge of a few models. Model mergingModel merging has been around for a while but recently has been heating up, and Maxime has a lot to do with that, as he recently checked, and his wrapper on top of MergeKit by Charles Goddard (which is the library that put model merging into the mainstream) called LazyMergeKit was in charge of >50% of the merged models on HuggingFace hub leaderboard. Maxime also authored a model merging blogpost on Hugging Face and wrote quite a few articles and shared code that helped others to put merged models out. Modern day AlchemyThis blogpost is a great resource on what model merging actually does, so I won't go into depth of what the algorithms are, please refer to that if you want a deep dive, but in a nutshell, model merging is a technique to apply algorithms to the weights of a few models, even a few instances of the same model (like Mistral7B) and create a new model, that often performs better than the previous ones, without additional training! Since this is algorithmic, it doesn't require beefy GPUs burning power to keep training or finetuning, and since the barrier of entry is very low, we get some cool and crazy results as you'll see below. Yeah, quite crazy as it sounds, this method can also create models of non standard sizes, like 10B or 120B models, since it's slicing pieces of other models and stitching them together in new ways. If you recall, we had a deep dive with Jon Durbin who released Bagel, and Jon specifically mentioned that he created Bagel (based on everything everywhere all at once) as a good base for merges, that will include all the prompt formats, you can read and listen to that episode hereThis merge frenzy, made HuggingFace change the leaderboard, and add a checkbox that hides model merges, because they are flooding the leaderboard, and often, and require much smaller effort than actually pre-training or even finetuning a modelAnd quite often the top of the leaderboard was overrun with model merges like in this example of Bagel and it's merges by CloudYu (which are not the top ones but still in the top 10 as I write this) ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.On why it works? Nisten summarized this pretty well in this now famous copypasta tweet and I've confirmed with Maxime that this is his current understanding as well, it's quite unclear why this seems to perform so well, but it of course doesn't stop the "folks who look for AI Waifus" to keep merging.Following folks like Nathan Lambert from interconnects.ai to start paying attention even though he didn't want to! 
(Still waiting on your writeup Nathan!) UPDATE: As of today, Monday Jan 29th, a super comprehensive deep dive into merges has just been released, which you can read here 👇👏
YALL + Automated LLM Evaluation
Maxime has also worked on so many models of his own that he built a convenient little leaderboard to track their performance, which he called YALL, Yet Another LLM Leaderboard, and it's on HuggingFace. You can see that NeuralBeagle is the top dog (sorry, I literally could not resist). It uses the Nous evaluations, and Maxime has created an automation called LLM AutoEval that makes it really simple to run evaluations, which you can run in a Colab super easily. LLM AutoEval is on Github.
Merge-aology!
Since chatting, Maxime has released a Colab and later a HuggingFace space that takes model names and shows the genealogy, nay, Merge-aology, of the models, i.e. which models they were merged from. It's pretty crazy how deep this rabbit hole goes, and crazier still that these models perform very well after all of these lobotomies! Try it out here: https://huggingface.co/spaces/mlabonne/model-family-tree
I really hope you enjoy this special deep dive, I definitely learned a TON from this conversation with Maxime, and I'm very happy that he came on! This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
What A SHOW folks, I almost don't want to write anything in the newsletter to MAKE you listen haha, but I will, I know many of you don't like listening to me babble. But if you choose one episode to listen to instead of just skimming the show-notes, make it this one. We had 2 deep dives, one into the exciting world of multi-modality, where we chatted with the creator of Moondream1, Vik, and the co-founders of Prophetic, Wes and Eric, about their EEG/fMRI multimodal transformer (that's right!), and then we had a DEEP dive into the new Hourglass Diffusion Transformers with Tanishq from MedArc/Stability. More than 1300 tuned in to the live show 🔥 and I've got some incredible feedback on the fly, which I cherish, so if you have friends who don't already know about ThursdAI, why not share this with them as well? TL;DR of all topics covered: * Open Source LLMs * Stability AI releases StableLM 1.6B params (X, Blog, HF)* InternLM2-Math - SOTA on math LLMs (90% GPT4 perf.) (X, Demo, Github)* MedArc analysis of the best open source models for medical research finds Qwen-72 the best open source doctor (X)* Big CO LLMs + APIs* Google teases LUMIERE - incredibly powerful video generation (TTV and ITV) (X, Blog, ArXiv)* 🤗 HuggingFace announces Google partnership (Announcement)* OpenAI releases 2 new embedding models, tweaks turbo models and cuts costs (My analysis, Announcement)* Google to add 3 new AI features to Chrome (X, Blog)* Vision & Video* Adept Fuyu Heavy - Third in the world multimodal model while being 20x smaller than GPT4V, Gemini Ultra (X, Blog)* FireLLaVa - First LLaVa model with a commercially permissive license, from Fireworks (X, Blog, HF, DEMO)* Vikhyatk releases Moondream1 - tiny 1.6B VLM trained on Phi 1 (X, Demo, HF)* This weeks Buzz 🐝🪄 - What I learned in WandB this week* New course announcement from Jason Liu & WandB - LLM Engineering: Structured Outputs (Course link)* Voice & Audio* Meta W2V-BERT - Speech encoder for low resource languages (announcement)* 11 labs has dubbing studio (my dubbing test)* AI Art & Diffusion & 3D* Instant ID - zero shot face transfer diffusion model (Demo)* 🔥 Hourglass Diffusion (HDiT) paper - High Resolution Image synthesis - (X, Blog, Paper, Github)* Tools & Others* Prophetic announces MORPHEUS-1, their EEG/fMRI multimodal ultrasonic transformer for Lucid Dream induction (Announcement)* NSF announces NAIRR with partnership from all major government agencies & labs including OAI, WandB (Blog)* Runway adds multiple motion brushes for added creativity (X, How to)
Open Source LLMs
Stability releases StableLM 1.6B tiny LLM
Super super fast tiny model, I was able to run this in LM Studio, which just released an update supporting it. It punches above its weight, specifically on other languages like German/Spanish/French/Italian (beats Phi), and has a surprisingly decent MT-Bench score as well. The license is not commercial per se, but requires a specific Stability AI membership. I was able to get above 120 tok/sec with this model in LM Studio and it was quite reasonable, and honestly, it's quite ridiculous how fast we've gotten to a point where we have an AI model that can weigh less than 1GB and has this level of performance 🤯
Vision & Video & Multimodality
Tiny VLM Moondream1 (1.6B) performs really well (Demo)
New friend of the pod Vik trained Moondream1, a tiny multimodal VLM built with LLaVa on top of Phi 1 (not 2 cause.. issues), and while it's not commercially viable, it's really impressive how fast and how good it is.
Here's an example featuring two of my dear friends talking about startups, and you can see how impressive this TINY vision enabled model can understand this scene. This is not cherry picked, this is literally the first image I tried with and my first result. The image features two men sitting in chairs, engaged in a conversation. One man is sitting on the left side of the image, while the other is on the right side. They are both looking at a laptop placed on a table in front of them. The laptop is open and displaying a presentation, possibly related to their discussion.In the background, there is a TV mounted on the wall, and a cup can be seen placed on a surface nearby. The scene suggests a casual and collaborative environment where the two men are sharing ideas or discussing a topic.Vik joined us on the pod to talk about why he didn't go with Phi-2, he also mentioned that Phi-1.5 was retroactively also MIT'd, it's license literally says MIT now on HF ๐Ÿ‘ Great conversation, tune in for that at around 00:31:35Adept is teasing FuYu Large - their CHONKY VLMAdept previously released Persimmon, and then Fuyu VLM (which is a type of persimmon we see you adept) and now tease the release for Fuyu Heavy, a much bigger model that can compete or come close to GPT4V and GeminiUltra on MMMU and MMLU (text) while being 20x smaller approx. While we don't yet get to play with this, they show some great promise in the benchmarksโญ๏ธ Performance: Excels at multimodal reasoning and matches/exceeds text-based benchmarks.โ—๏ธ Challenges Faced: Dealt with issues related to image data, model stability, and pre-training data scarcity.โœ… Evaluations: Outperforms Gemini Pro on MMLU and MMMU benchmarks.AI Summary by Arc Browser (haha see how I cheated here? I sometimes do shortcut summaries using Arc Max, it's dope, try it) https://t.co/BZi6EKhS5RFireworks AI releases FireLLaVa - with a commercially available licenseย FireLLaVA is the first commercially permissive open-source LLaVA model, a type of multi-modality model called a Vision-Language Model (VLM) that can understand both visual and textual inputs.* The original LLaVA model was limited for commercial use as it was trained on data generated by GPT-4, which has non-commercial licenses.ย * Fireworks.ai recreated the LLaVA training data using an open-source language model, CodeLlama 34B Instruct, to make a commercially viable version.- * FireLLaVA performs comparably to the original LLaVA model on benchmarks, showing open-source models can generate high-quality data for VLM training.* FireLLaVA is available via HuggingFace and through Fireworks.ai's prediction API, enabling new visual capabilities for applications.Vik and I chatted about this, and while Fireworks didn't release datasets, they did release an example of how to start collecting them, and it's clear that everyone is clamoring after great vision / image datasets ๐Ÿ‘Really hoping that many great dataset for multimodal AIs will come out in 2024 giving us increasingly better multi modal LMMs ๐Ÿ‘Big CO LLMs + APIs (Blog)GOOGLE announces LUMIERE video generation model that shows incredible push in consistency Supports multiple tasks like image to video, text to video, video inpainting, Video stylezation and more, looks incredible. It seemed that they have cracked both spatial and temporal consistency, something that's severly lacking in previous video generation attempts, and makes character consistency quite remarkable. 
Of course, as with other google incredible papers, we never know if we'll ever see this model or be able to play with it, here's hoping ๐ŸคžGoogle will add 3 new AI features to chrome* Chrome is introducing 3 new experimental AI features to make browsing more efficient:* Tab Organizer: Chrome will automatically group similar tabs to help with multitasking* Custom themes: Users can generate unique browser themes using text prompts and AI image generation* Writing help: Chrome will offer suggestions to help users draft messages and posts on websites- They are currently only available to US users who opt-in on the Experimental Features pageย I think this development is super super important because making AI accessible via the incredible Chrome platform to billions of people, is going to put Gemini in front of grandmas, students, everyone. Qutie impressive and the compute needed to pull something like this off is also quite mindboggling! ๐Ÿ‘ Of course, they are not the first browser to add AI, I love the Arc Browser and it has AI previews that I use quite often! This weeks Buzz (What I learned with Weights & Biases this week)Have you like many of us have trouble getting structure output (JSON, other stuctures) from LLMS? Jason also had this problem, that's why he authored the Instructor Library, which makes it easy to guide the LLM to give structured output using Pydantic. Jason has presented at Ai Engineer conference, and recently collaborated with Weights & Biases to launch a free course in how to guide your LLM to give structured outputs! COURSE LINKJason is also an independent consultant working with companies on their AI implementations and has many battle tested examples from implementations across the board, which he shared with us on the pod. Give this short course a try if you haven't yet, it's really high quality content, in addition to tons of other stuff we have there, for free ๐Ÿ‘Voice & Audio 11Labs has a new overdub studio and it's really working wellCheck out this short segment of myself, speaking in dubbed Russian! Itโ€™s really sounds like me, sent to my mom to see if she falls for it ๐Ÿ˜† She didnโ€™tAI Art & DiffusionHourglass Diffusion TransformersNew high resolution diffusion architecture from K-diffusion and RoPE team (X, Blog, Paper, Github)Paper presents a new method called HDiT ( HourGlass Diffusion Transformers) that shows promise in training models with high resolution images without incurring the significant hardware costs that go with scaling image sizes, replaces the latent diffusion models enabling O(n) complexity and scaling well. Utilizing tricks and best practices for transformers architectures, like RoPe (that we've covered on ThursdAI before) cosine similarity self-attention, RMSNorm, GeGLU, etc. and using something called local self attention, this paper shows incredible promise for high resolution architectures for image creation tools. We had the pleasure to host Tanishq Abraham, one of the co-authors (and CEO of MedArc, Director of research with Stability + PHD at 19) to walk us through the p
👋 Hey there, been quite a week, it started slow and whoah, the last two days were jam-packed with news, I was barely able to keep up! But thankfully, the motto of ThursdAI is: we stay up to date so you don't have to! We had a milestone, 1.1K listeners tuned into the live show recording, it's quite the number, and I'm humbled to present the conversation and updates to that many people. If you're reading this but never joined live, welcome! We're going live every week on ThursdAI, 8:30AM pacific time.
TL;DR of all topics covered: * Open Source LLMs * Nous Hermes Mixtral finetune (X, HF DPO version, HF SFT version)* NeuralBeagle14-7B - From Maxime Labonne (X, HF)* It's the best-performing 7B parameter model on the Open LLM Leaderboard (when released, now 4th)* We had a full conversation with Maxime about merging that will release as a standalone episode on Sunday! * LMsys - SGLang - 5x performance on inference (X, Blog, Github)* NeuralMagic applying #SparseGPT to famous models to compress them with 50% sparsity (X, Paper)* Big CO LLMs + APIs* 🔥 Google Deepmind solves geometry at Olympiad level with 100M synthetic data (Announcement, Blog)* Meta announces Llama3 is training, will have 350,000 H100 GPUs (X)* Open AI releases guidelines for upcoming elections and removes restrictions for war use (Blog)* Sam Altman (in Davos) doesn't think that AGI will change things as much as people think (X)* Samsung S24 has AI everywhere, including real time translation of calls (X)* Voice & Audio* Meta releases MAGNet (X, HF)* AI Art & Diffusion & 3D* Stable diffusion runs 100% in the browser with WebGPU, Diffusers.js (X thread)* DeciAI - Deci Diffusion - A text-to-image 732M-parameter model that's 2.6x faster and 61% cheaper than Stable Diffusion 1.5 with on-par image quality* Tools & Hardware* Rabbit R1 announces a deal with Perplexity, giving a full year of Perplexity Pro to Rabbit R1 users, and Perplexity will be the default search engine on Rabbit (link)
Open Source LLMs
Nous Research releases their first Mixtral finetune, in 2 versions, DPO and SFT (X, DPO HF)
This is the first Mixtral finetune from Teknium1 and the Nous team, trained on the Hermes dataset. It comes in two variants, SFT and SFT+DPO, and is a really really capable model, they call it their flagship! This is the first Mixtral finetune to beat Mixtral Instruct, and is potentially the best open source model available right now! 👏 It's already available at places like Together endpoints, with GGUF versions by The Bloke, and I've been running this model on my mac for the past few days. Quite remarkable considering we're only in January and this is the best open chat model available to us. Make sure you use ample system prompting for it, as it was trained with system prompts in mind.
LMsys new 5x inference speedup with SGLang & RadixAttention (Blog)
LMSys introduced SGLang, a new interface and runtime for improving the efficiency of large language model (LLM) inference. It claims to provide up to 5x faster inference speeds compared to existing systems like Guidance and vLLM. SGLang was designed to better support complex LLM programs through features like control flow, prompting techniques, and external interaction.
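A big part of the speedup comes from reusing work across requests. Here's a toy sketch of the prefix-reuse idea, not SGLang's actual code: when many requests share the same prompt prefix (think a long system prompt), the KV cache for that prefix can be computed once, and only the new suffix needs a fresh prefill.

```python
# Toy sketch of KV-cache prefix reuse; a real system (like the RadixAttention
# technique described below) organizes this as a radix tree over tokens and
# also handles eviction, batching, etc.
def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

class PrefixCache:
    def __init__(self):
        self.entries = {}  # token tuple -> (pretend) per-token KV cache

    def get_or_compute(self, tokens, prefill):
        # Find the cached request sharing the longest prefix with this one.
        best_len, best_kv = 0, []
        for cached_tokens, cached_kv in self.entries.items():
            n = common_prefix_len(cached_tokens, tokens)
            if n > best_len:
                best_len, best_kv = n, cached_kv[:n]
        # Only the uncached suffix pays for a forward pass.
        kv = best_kv + prefill(tokens[best_len:])
        self.entries[tuple(tokens)] = kv
        return kv

cache = PrefixCache()
system_prompt = list(range(100))               # pretend token ids for a shared system prompt
prefill = lambda toks: [t * 2 for t in toks]   # stand-in for the real prefill step
cache.get_or_compute(system_prompt + [7, 8], prefill)
cache.get_or_compute(system_prompt + [9], prefill)  # reuses the 100-token prefix
```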
SGLang co-designs the frontend language and the backend runtime. On the backend, it proposes a new technique called RadixAttention to automatically handle various patterns of key-value cache reuse, improving performance. Early users like LLaVa reported SGLang providing significantly faster inference speeds in their applications compared to other options. The LMSys team released code on GitHub for others to try it out.
Big CO LLMs + APIs
Meta AI announcements (link)
This #BreakingNews came during our space: Mark Zuckerberg posted a video on Instagram saying that Llama3 is currently training, and will be open sourced! He also said that Meta will have 350K (that's not a typo, 350,000) H100 GPUs by the end of the year, and a total of ~600,000 H100-equivalent compute power (including other GPUs), which is… 🤯 (and this is the reason why I had to give him double GPU rich hats)
Deepmind releases AlphaGeometry (blog)
Solving geometry at the Olympiad gold-medalist level with 100M synthetic examples. AlphaGeometry is an AI system developed by Google DeepMind that can solve complex geometry problems on par with human Olympiad gold medalists. It uses a "neuro-symbolic" approach, combining a neural language model with a symbolic deduction engine to leverage the strengths of both. The language model suggests useful geometric constructs to add to diagrams, guiding the deduction engine towards solutions. It was trained on over 100 million synthetic geometry examples generated from 1 billion random diagrams. On a benchmark of 30 official Olympiad problems, it solved 25 within time limits, similar to the average human medalist.
OpenAI releases guidelines for upcoming elections (Blog)
- OpenAI is taking steps to prevent their AI tools like DALL-E and ChatGPT from being abused or used to spread misinformation around elections
- They are refining usage policies for ChatGPT and enforcing limits on political campaigning, impersonating candidates, and discouraging voting
- OpenAI is working on technology to detect if images were generated by DALL-E and labeling AI-generated content for more transparency
- They are partnering with organizations in the US and other countries to provide users with authoritative voting information through ChatGPT
- OpenAI's goal is to balance the benefits of their AI while mitigating risks around election integrity and democratic processes
Microsoft announces Copilot Pro
Microsoft announced new options for accessing Copilot, including Copilot Pro, a $20/month premium subscription that provides access to the latest AI models and enhanced image creation. Copilot for Microsoft 365 is now generally available for small businesses with no user minimum, and available for additional business plans.
This weeks Buzz (What I learned with WandB this week)
Did you know that ThursdAI is not the FIRST podcast at Weights & Biases? (Shocking, I know!) Lukas, our CEO, has been a long time host of the Gradient Dissent pod, and this week we had two of the more prolific AI investors on as guests, Elad Gil and Sarah Guo. It's definitely worth a listen, it's more of a standard 1:1 or sometimes 1:2 interview, so after you finish with ThursdAI, if you're seeking more of a deep dive, it's definitely recommended to extend your knowledge.
AI Art & Diffusion
Zero shot face adapted image gen - 3 different tech approaches
What used to take ages now takes seconds with zero-shot: there are quite a few approaches to generate images with real human faces, in a zero-shot capacity, providing just a few faces.
Gradio folks call it Zero-shot face-adapted image generation and there are 3 tools to generate those: 1โƒฃIPAdapter 2โƒฃPhotoMaker 3โƒฃInstantIDHereโ€™s a great summary thread from Gradio folks for this fast advancing field! Remember we had to finetune on faces for a long time? Dreambooth and then LORAs, and now we have this exciting development. Tools & HardwareRabbit R1 partners with PerplexityThe R1 device that was just announced, is about to sell through itโ€™s first 50K in just a few days, which is remarkable. I definitely pre-ordered one, and canโ€™t wait to get my hands on it. Jesse the founder has been all over X, getting incredible recognition, and after a few conversations with Aravind Srinivas, they agreed to make a deal right on X.Today they hopped on a space and announced that all the first 100K early buyers of Rabbit are going to get a full year PRO subscription of Perplexity (one of the best AI search engines out there) for free! I sure as heck didnโ€™t expect it, but the email was sent just a few minutes after the X space, and now guess who uses perplexity pro? Hereโ€™s an example of a perplexity searching ThursdAI content (it doesnโ€™t always get it right tho)! I guess thatโ€™s it for today, as Iโ€™m writing this, there are incredible other stuff getting released, Codium open sourced AlphaCodium (hereโ€™s a link to the founder talking about it) but I didnโ€™t have a second to dive into this, hopefully will bring Imatar to ThursdAI next time and chat about it! Have a great weekend all ๐Ÿซก (please give us a good review on Apple Itunes, apparently it really helps discovery!) Full Transcription for convenience: [00:00:02] Alex Volkov: Hey everyone, happy Thursday. My name is Alex Volkov. I'm an AI evangelist with Weights Biases, and this is Thursday AI.[00:00:13] Alex Volkov: We had such a great show today, over 1100 of you tuned in to the live recording, which is incredible.[00:00:30] I also wanted to say that if you're not subscribed to thursdai.news newsletter, please go ahead and do because I send a full blog with the links to the show notes and to the speakers that we have on stage, and you should be able to follow up.[00:00:46] Alex Volkov: There's a bunch of multimedia, like videos, that are not coming through in the audio only podcast format. So please subscribe to ThursdayEye. News as well. This live recording, we also hosted Maxime Lebon, who's a senior machine learning scientist with J.[00:01:04] Alex Volkov: P. Morgan, and the author of several models, and Merged models, lately the Neural Beagle model that we've talked about. We had a great conversation with Maxime. And that full episode will be posted as a Sunday special evergreen content episode. So please stay tuned for that.[00:01:29] Alex Volkov: It's been an incredibly illuminating conversation in the world of merging and merge kit and everything else that Maxim does and it was a super cool conversation. So that's coming soon.[00:01:41] Alex Volkov: And, as I've been doing recently, the following is going to be a 7 minute segment, from the end of the live recording, summarizing everything we've talked about.[00:01:54] Alex Volkov: I hope you've been enjoying these TLDR intros. Please let me know in the comments if this is something that's helpful to you.[00:02:05] ThursdAI Jan18 TL;DR
ThursdAI - Sunday special deep dive, interviews with Joao and Jon, AI agent Crews and Bagel Merges. Happy Sunday dear reader, As you know by now, the ThursdAI pod is not a standard interview based podcast, we don't focus on a 1:1 guest/host conversation, but from time to time we do! And this week I was very lucky to have one invited guest and one surprise guest, and I'm very happy to bring you both these conversations today.
Get your Crew together - interview with João Moura, creator of CrewAI
We'll first hear from João Moura, the creator of CrewAI, the latest agent framework. João is a director of AI eng. at Clearbit (acquired by HubSpot recently) and created CrewAI for himself, to automate many of the things he didn't want to keep doing, for example, posting more on LinkedIn. Crew has been getting a lot of engagement lately, and we go into the conversation about it with João: it's been trending #1 on Github, and received #2 product of the day when Chris Messina hunted it (to João's complete surprise) on Product Hunt. CrewAI is built on top of Langchain, and is an agent framework focusing on orchestration of role-playing, autonomous agents. In our chat with João we go into the inspiration, the technical challenges and the success of CrewAI so far, how maintenance for Crew is now partly a family effort, and what's next for Crew.
Merges and Bagels - chat with Jon Durbin about Bagel, DPO and merging
The second part of today's pod was a conversation with Jon Durbin, a self described AI tinkerer and software engineer. Jon is a Sr. applied AI researcher at Convai, and is well known in our AI circles as a master finetuner and dataset curator. This interview was not scheduled, but I'm very happy it happened! If you've been following along with the AI / finetuning space, Jon's Airoboros dataset and set of models have often been mentioned and cited, and Jon's latest work on the Bagel models took the lead on the HuggingFace open LLM leaderboard.
So when I mentioned on X (as I often do) that I'm going to mention this on ThursdAI, Jon came up to the space and we had a great conversation, in which he shared a LOT of deep insights into finetuning, DPO (Direct Preference Optimization) and merging. The series of Bagel datasets and models was inspired by the Everything Everywhere All at Once movie (which is a great movie, watch it if you haven't!) and alludes to Jon trying to throw as many datasets together as he could, but not only datasets! There has been a lot of interest in merging models recently, specifically many folks are using MergeKit to merge models with other models (and often a model with itself) to create larger/better models, without additional training or GPU requirements. This is solely an engineering thing, some call it frankensteining, some frankenmerging.
If you want to learn about merging, Maxime Labonne (the author of Phixtral) has co-authored a great deep-dive on the Huggingface blog, it's a great resource to quickly get up to speed. So given the merging excitement, Jon has set out to create a model that can be an incredible merge base; many models use different prompt techniques, and Jon has tried to cover as many as possible. Jon also released a few versions of the Bagel models, DPO and non DPO, and we had a brief conversation about why the DPO versions are more factual and better at math, but not great for role playing (which is, unsurprisingly, what many agents are using these models for) or creative writing. The answer is, as always, dataset mix!
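If you're wondering what a merge actually looks like mechanically, here's a toy sketch of the simplest possible case: a plain 50/50 weight average of two finetunes that share the same architecture. The repo ids are hypothetical, and MergeKit's real methods (SLERP, TIES, passthrough "frankenmerges", etc.) are more involved, but the spirit is the same: the new weights are computed directly from the old ones, with no training run required.

```python
# Toy sketch: a plain 50/50 weight average of two finetunes with identical
# architectures. The repo ids below are placeholders, not real models.
from transformers import AutoModelForCausalLM

model_a = AutoModelForCausalLM.from_pretrained("your-org/finetune-a")
model_b = AutoModelForCausalLM.from_pretrained("your-org/finetune-b")

state_a = model_a.state_dict()
state_b = model_b.state_dict()

merged = {name: 0.5 * state_a[name] + 0.5 * state_b[name] for name in state_a}

model_a.load_state_dict(merged)
model_a.save_pretrained("my-merged-model")  # no GPU hours spent on training
```

This is also why a "merge base" like Bagel matters: the averaging itself is indifferent to what you feed it, so starting from a model that already understands many prompt formats makes for a much friendlier ingredient.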
I learned a TON from this brief conversation with Jon, and if you're interested in the incredible range of techniques in the open source LLM world, DPO and merging are definitely at the forefront of this space right now, and Jon is right at the crossroads of them, so it's definitely worth a listen. I hope to get Jon back to share more in future episodes, so stay tuned!
So I'm in San Francisco, again...
As I mentioned in the previous newsletter, I was invited to step in for a colleague and fly to SF to help co-host a hackathon with friends from TogetherCompute and LangChain, at AGI House in Hillsborough, CA. The hackathon was under the "Finetune vs RAG" theme, because, well, we don't know what works better, and for what purpose.
The keynote speaker was Tri Dao, Chief Scientist @ Together and the creator of Flash Attention, who talked about SSMs (state space models) and Mamba. Harrison from LangChain gave a talk with a deep dive into 5 techniques for knowledge assistants, starting with basic RAG and going all the way to agents 👍
I also gave a talk, but I couldn't record a cool gif like this for myself; thanks to Lizzy I got a pic as well 🙂 Here is the link to my slides if interested (SLIDES).
More than 150 hackers got together to try and find this out, and it was quite a blast for me to participate and meet many of the folks hacking, hear what they worked on, what worked, what didn't, and how they used WandB, Together and LangChain to achieve some of the incredible results they hacked together in a very short time.
The projects showcased a range of creative applications leveraging RAG, finetuning and large language models. Several projects like Magic RAG, CareerNavigator-AI and CompetitionAI used RAG for document retrieval and knowledge enhancement. Others like rags2pizza and Naturalist DALL-E focused more on finetuning models for specific domains. Some projects compared finetuning and RAG, finding that combining both gives superior performance over using either alone, but that result wasn't conclusive.
My vote as a judge (a role I did not expect) eventually went to the team that built the OptiMUS project: they had generated a synthetic dataset, cleaned it up, finetuned a model on it, and showed how they want to optimize AI agents. They used WandB to track their work, and I hope they take this project forward and keep making advancements in AI. Congrats on the win, Ali and Shayan, hope you enjoy the WandB-branded AirPods (even I don't have those) and the Meta Quest, well deserved!
Thank you for tuning in! See you next week!
Full Transcription:
[00:00:00] Alex Volkov: Hi. Welcome back to ThursdAI, the Sunday special episode. This is Alex Volkov, and I'm recording this in a gorgeous space in San Francisco, where I was invited to judge a hackathon. And now I'm hanging out with a few friends from Cerebral Valley, so thank you, Valley folks, for letting me use this place for recording. Today we have a special episode for you. As you can tell if you hear this on a Sunday, today's not a Thursday. We oftentimes have special guests on the pod, where conversations go deeper.
[00:00:45] Alex Volkov: And usually I reserve that slot for a Sunday special release. So this is what you're hearing now. In today's episode, we actually have two conversations, although I only planned on one. And the first part is the planned part, where you hear from João Moura. He is a director of AI at Clearbit, now acquired by HubSpot.
And he's also the creator of CrewAI, an agentic AI framework that runs by orchestrating digital AI agents and having them work together.
[00:01:19] Alex Volkov: And I think you'll hear from João why this piqued interest for many folks, specifically because, as we caught up with João,
[00:01:29] Alex Volkov: CrewAI was trending on GitHub and getting number two on Product Hunt at the same time. And it's a really cool framework. And I think the underlying power of this is that it can use open source, local models. A lot of previous agent attempts used GPT-4, for example, and CrewAI can use things like Mistral or Mixtral running in LM Studio or Ollama on your Mac, which I think is super cool.
[00:01:55] Alex Volkov: And I think on-device AI, plus something like this framework, is going to be very, very powerful. It was a great conversation with João. And surprising to me, the second guest was not planned. However, you may have heard on the previous ThursdAI that the Bagel series of models from a self-proclaimed AI tinkerer, Jon Durbin, have taken over the leaderboards on Hugging Face, including a bunch of merges, and we haven't done a deep dive into merges, MergeKit, and frankenstein models.
[00:02:32] Alex Volkov: But if you've been with ThursdAI for a while, you've probably heard about them. Merging is a technique to take a model, or different models, and without any GPU computation create bigger or different models, using a dissection and recombination process of the layers of those models, just based on weights, without any training or continued fine-tuning, which is incredibly interesting.
[00:02:58] Alex Volkov: And Jon goes into this a little bit, and he created Bagel based on the inspiration of, well, I'll let you hear this at the end. And it's a very fascinating conversation. I took a lot from it, and unfortunately we didn't have time for a long deep dive, but I learned a lot from Jon, and hopefully he'll come back on the podcast and we'll be able to dive even deeper and talk with Jon about how to create datasets, why DPO is better than PPO, and all of these great things. So we had two great guests, and I had a blast having them on the pod, and I probably should do more of these deep dives.
[00:03:37] Alex Volkov: So please let me know what you think. Don't forget to subscribe to the newsletter, where I send a summary, and in the newsletter you'll find my trip report, quote unquote, for the hackathon. It was co-sponsored with Together AI and LangChain, and Harrison was there, and I gave a brief talk as well. And I took a bunch of pictures.
[00:03:57] Alex Volkov: So if you're hearing this in your car, check out the newsletter afterwards at thursdai.news.
[00:04:02] Alex Volkov: And with that, I give you our first guest, João Moura. All right, everyone. Welcome back to ThursdAI. And we have a great guest today. João Moura from, I want to say, Clearbit? If I'm not mistaken.
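Since CrewAI comes up throughout this episode, here's the minimal sketch referenced earlier of what wiring up a crew looks like in code. It is not João's code: the roles and tasks are invented for illustration, and constructor fields have changed a bit between CrewAI releases (expected_output on Task, for example), so check the current docs. By default agents call the OpenAI API via the OPENAI_API_KEY environment variable, but as mentioned in the conversation, you can point them at a local model instead (for example something served by Ollama or LM Studio) via an llm argument.

```python
# pip install crewai  (a small illustrative crew; field names may differ between versions)
from crewai import Agent, Task, Crew, Process

# Two role-playing agents with different goals; the crew hands work from one to the next.
researcher = Agent(
    role="AI news researcher",
    goal="Collect the most notable AI releases from the past week",
    backstory="You scan papers, GitHub and X for important AI news.",
    allow_delegation=False,
    verbose=True,
    # llm=...  # optionally a local model, e.g. via a LangChain Ollama wrapper
)
writer = Agent(
    role="LinkedIn writer",
    goal="Turn research notes into a short, punchy LinkedIn post",
    backstory="You write accurate summaries for a technical audience.",
    verbose=True,
)

research = Task(
    description="List 5 notable open source LLM releases from this week, one line each.",
    expected_output="A bullet list with 5 items.",
    agent=researcher,
)
write_post = Task(
    description="Write a ~150 word LinkedIn post based on the research notes.",
    expected_output="A single LinkedIn-ready paragraph.",
    agent=writer,
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research, write_post],
    process=Process.sequential,  # run the tasks in order, passing context along
)
print(crew.kickoff())
```

The "posting more on LinkedIn" use case João built CrewAI for maps pretty directly onto a two-agent crew like this one.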
Hey hey everyone, how are you this fine ThursdAI? 👋 I'm gud, thanks for asking!
I'm continuing my experiment of spilling the beans and telling you about everything we talked about in advance, both on the pod and in the newsletter, so let me know if this is the right way to go or not; for the busy ones it seems that it is. If you don't have an hour and 15 minutes, here's a short video recap of everything we chatted about:
ThursdAI - Jan 11 2024 TL;DR
TL;DR of all topics covered + Show notes
* Open Source LLMs
* 🔥 Donut from Jon Durbin is now top of the LLM leaderboard (X, HF, Wolfram's deep dive and scoring)
* OpenChat January Update - Best open source 7B LLM (X, Hugging Face)
* Our friends at NousResearch announce a seed round of $5.2M as their models pass 1.2 million downloads (X)
* Argilla improved (Distilabeled?) the DPO-enhanced Neural Hermes with higher quality DPO pairs (X)
* New MoEs are coming out like hotcakes - Phixtral and DeepSeek MoE (X, Omar Thread, Phixtral Thread)
* Microsoft makes Phi MIT licensed 👍
* Big CO LLMs + APIs
* OpenAI adds personalization & team tiers (Teams announcement)
* OpenAI launches GPT store (Store announcement, Store link)
* Mistral Medium tops the LMsys human evaluation arena, is the best LLM overall after GPT-4 👍 (X)
* Hardware
* Rabbit R1 is announced, $200 one-time with no subscription, everybody has a take (X)
* This week's Buzz from Weights & Biases
* Hackathon with Together, Langchain and WandB (and ME!) this weekend in AGI House (X, Signup)
* Video
* ByteDance releases MagicVideo-V2, video gen that looks great and beats Pika Labs in human tests (X)
* AI Art & Diffusion & 3D
* Luma launched their online version of Genie and it's coming to the API (X)
* Show notes and links mentioned
* MergeKit (github)
* Jon Durbin's Contextual DPO dataset (HuggingFace)
* Phixtral from Maxime Labonne (X, HuggingFace)
* WandGPT - our custom Weights & Biases GPT (GPT store)
* Visual Weather GPT by me - https://chatg.pt/artweather
* Ask OpenAI to not train on your chats - https://privacy.openai.com/policies
AI Hardware
It seems the X conversation had a new thing this week: the AI hardware startup Rabbit showcased their new $200 device (no subscriptions!) at CES, and everyone and their mom had an opinion! We had quite a long conversation about that with (his first time on ThursdAI 👍), as we both pre-ordered one; however, there were quite a few red flags, like for example, GPUs are costly, so how would an AI device that runs its AI in the cloud cost just a one-time 200 bucks??
There were other interesting things they showed during the demo, and I'll let you watch the full 30 minutes, and if you want to read more, here's a great deeper dive into this from .
UPDATE: As I'm writing this, the CEO of Rabbit (who's also on the board of Teenage Engineering, the amazing company that designed this device) tweeted that they sold out the initial first AND second batch of 10K units, netting a nice $2M in hardware sales in 48 hours!
Open Source LLMs
Mixtral paper dropped (ArXiv, Morgan's take)
Mistral finally published the paper on Mixtral of experts, the MoE that's the absolute best open source model right now, and it's quite the paper. Nisten did a full paper reading with explanations on an X space, which I co-hosted, and we had almost 3K people tune in to listen.
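Before the paper notes below, a quick mechanical picture of what "mixture of experts" means, for anyone new to the architecture: each transformer block has several feed-forward "experts", and a small router picks the top 2 of them per token and mixes their outputs. The toy PyTorch layer below is a generic illustration of that idea, not Mistral's implementation; dimensions, normalization details and load-balancing losses are all simplified away.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoELayer(nn.Module):
    """Toy sparse MoE feed-forward block: a router picks 2 of n experts per token."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                       # x: (tokens, d_model)
        logits = self.router(x)                 # (tokens, n_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # renormalize over the chosen experts only
        out = torch.zeros_like(x)
        for slot in range(self.top_k):          # for each of the 2 chosen slots...
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e        # ...route the tokens assigned to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

tokens = torch.randn(4, 512)
print(Top2MoELayer()(tokens).shape)             # torch.Size([4, 512])
```

That routing is why Mixtral holds roughly 47B parameters in total but only spends about the compute of a ~13B dense model per token: just 2 of the 8 experts run for any given token.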
Here's the link to the live reading X space by Nisten. And here's some notes, courtesy of Morgan McGuire (who's my boss at WandB btw 🙌):
Strong retrieval across the entire context window
Mixtral achieves a 100% retrieval accuracy regardless of the context length or the position of the passkey in the sequence.
Experts don't seem to activate based on topic
Surprisingly, we do not observe obvious patterns in the assignment of experts based on the topic. For instance, at all layers, the distribution of expert assignment is very similar for ArXiv papers (written in LaTeX), for biology (PubMed Abstracts), and for Philosophy (PhilPapers) documents.
However... The selection of experts appears to be more aligned with the syntax rather than the domain.
Datasets - No info was provided as to which datasets Mixtral used to pretrain their incredible models 😭
Upsampled multilingual data
Compared to Mistral 7B, we significantly upsample the proportion of multilingual data during pretraining. The extra capacity allows Mixtral to perform well on multilingual benchmarks while maintaining a high accuracy in English.
Mixtral Instruct Training
We train Mixtral – Instruct using supervised fine-tuning (SFT) on an instruction dataset followed by Direct Preference Optimization (DPO) on a paired feedback dataset, and it was trained on @CoreWeave.
Jon Durbin's Donut is the 🤴 of open source this week
6 of the top 10 are Donut-based models or merges of it. If you remember Airoboros, Donut includes that dataset, and there are two varieties there, the DPO and the non-DPO versions of Bagel, including two merges from Cloudyu, which are non-trained merges made with MergeKit, based on Donut. Jon's pro tip for selecting DPO vs non-DPO models is:
"FYI, the DPO version is more factual, truthful, better at math, etc., but is not great for RP, creative writing, etc. Use non-DPO for those tasks!"
Donut includes an impressive number of datasets mixed together, which are all linked from the model card, but here they are:
"ai2_arc, airoboros, apps, belebele, bluemoon, boolq, capybara, cinematika, drop, emobank, gutenberg, lmsys_chat_1m, mathinstruct, mmlu, natural_instructions, openbookqa, pippa, piqa, python_alpaca, rosetta_code, slimorca, spider, squad_v2, synthia, winogrande, airoboros 3.1 vs airoboros 2.2.1, helpsteer, orca_dpo_pairs"
Jon also shared his end-of-year WandB report and has trained a whopping 917 models this year, for a total of ~2500 hours, and is in the top 10% of the most active users (among 800K or so users).
I didn't know that Jon was going to join, but I was so happy he joined the live recording; we ended up chatting for 20 minutes, and there were so many nuggets in that conversation (how to prepare DPO datasets, which other datasets Jon has been releasing, and just a bunch more gold) that I decided to CUT that out and post it as a separate special deep-dive episode that's going to get released as the Sunday special. Stay tuned for that!
Nous Research announces $5.2 million seed funding round as they cross 1.1 million model downloads on the hub
Congrats to Karan, Emozilla, Teknium, Bowen, Shivani and the rest of the Nous team on this great news!
๐Ÿ‘ We expect to hear more from them in the coming year, with a consistent commitment to open source, keep open sourcing the best models, and the upcoming Forge news!With investors like Balaji, OSS capital, Vipul from Together, Nous completes the $5.2M seed round, and we had Karan (one of the co-founders of Nous) on the pod to chat to use about what they are planning to do with that money and what are their continuous commitments to open source!In addition, they just recently passed 1.1 million downloads on the hub with Nous-Hermes-2-34B being their best model! ๐ŸคดOpenChat Jan update becomes the leading open source 7B model (X, Hugging Face)This update mainly enhanced training methodology, in-context learning & coding skills, outperforming the last 1210 release on 7 out of 8 benchmarks! and scores 71.3 on HumanEval, 65.8% on MMLU ๐Ÿ‘The previous version of OpenChat trails just behind OpenHermes on the human evals on Lmsys arena, but both are incredible 7B models.Argilla- Argilla used their Distilabel tool to build a preference dataset from ratings and critiques of AI response pairs, taking around 3 hoursย - The original dataset assumed the GPT-4/3.5 responses were always best, but Argilla found this was not always the case- Their dataset confirmed ~4,000 pairs had the same rating, 7,000 pairs were unchanged, and ~2,000 times the rejected response was preferred ย - Improving existing DPO datasets with higher quality pairs is important for model fine-tuning- They are releasing an improved version of the popular Orca Pairs DPO dataset from Intel, and a new OpenHermes model outperforming baselines with 54% fewer DPO pairsBig CO LLMs + APIsOpenAI has a big week, launches GPTs store and team pro accounts (Blog)Things of note about the store:* My GPTs are getting feedback and crossed 10K chats , was #6 on lifestyle and the disappeared, but has gained 2x more chats in 24 hours since the store has launched!* Discoverability is great, trending GPTs are shown clearly, and folks are getting a lot of exposure* Copycats already started copying a bunch of the great GPTs, see this example of what happens when you search for Gymstreak, most of the top GPTs are already being copy-catted.Team accounts:$25/mo per user for annual plans and at least 2 teamsThe biggest confusion was from folks who didn't understand that OpenAI trains on Pro conversations, and there's an option to Opt-out!This weeks Buzz (What I learned with WandB this week)Weights and Biases (and ME!) are going to AGI house to lead a Rag vs Finetune hackathon with cool prizes!There's still time to RSVP, will incredible guests speakers, this Hackathon is organized together with... LangChain, TogetherCompute and AGI house - If you're in the SF area, and you wanna hack on some cool RAG things and get awesome prizes (and meet me!) join the waitlist here https://partiful.com/e/AlntdLtxh9Jh1J6PcsmaVision & VideoLuma released GENIE on Web and IOS, if you remember, we covered the GENIE text-to-3d model they first released on discord a while ago, and now it's incorporated into the luma website, and is significantly higher quality 3D assets.The generations are free for now, and they look awesome! Here are some of mine, I created a Bee holding a Wand (get it? WandB? 
Friend of the pod and recent Luma hire Arthur Islamov jumped on and also told us that this is coming to the API, so developers will be able to automate asset creation, generate tons of 3D objects programmatically, and use cool prompt techniques to make sure the assets come out a bit better.
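And, as promised above, a tiny generic illustration of the Argilla idea: instead of assuming the GPT-4 answer in a pairs dataset is always the "chosen" one, re-rate both answers and swap the pair when the judge disagrees. Their real pipeline uses Distilabel with an LLM judge; here the rate_fn judge is hypothetical, and the column names follow Intel's orca_dpo_pairs.

```python
# Rough sketch of the relabeling idea, not Argilla's actual Distilabel pipeline.
from datasets import load_dataset

ds = load_dataset("Intel/orca_dpo_pairs", split="train")

def relabel(example, rate_fn):
    """rate_fn is a hypothetical judge (e.g. an LLM grader) returning a score per answer."""
    chosen_score = rate_fn(example["question"], example["chosen"])
    rejected_score = rate_fn(example["question"], example["rejected"])
    if rejected_score > chosen_score:          # the "rejected" answer was actually better
        example["chosen"], example["rejected"] = example["rejected"], example["chosen"]
    example["tie"] = chosen_score == rejected_score  # ties (~4,000 of them) can be dropped
    return example

# cleaned = ds.map(lambda ex: relabel(ex, my_judge)).filter(lambda ex: not ex["tie"])
```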