
Dr. Dan Shiebler, Head of ML at Abnormal Security, joins Jon Krohn this week and unveils the intricacies of cybercrime detection and email protection, and the role of AI in future challenges.
This episode is brought to you by Grafbase (https://grafbase.com), the unified data layer, by ODSC (https://odsc.com/), the Open Data Science Conference, and by Modelbit (https://modelbit.com), for deploying models in seconds. Interested in sponsoring a SuperDataScience Podcast episode? Visit JonKrohn.com/podcast for sponsorship information.
In this episode you will learn:
• The heuristic and “intermediate” ML models that they develop at Abnormal Security [07:08]
• How Dan uses LLMs at Abnormal Security [15:46]
• How false negatives are individually the biggest classification error to avoid in cybersecurity [20:49]
• How head-to-head competitor analysis helps refine models [34:34]
• Resilient ML in cybersecurity [38:36]
• Abnormal Security’s routine for updating their models [52:37]
• AI's impact on the urban world [1:09:57]
• How to stay updated in data science and AI [1:13:46]
Additional materials: www.superdatascience.com/717
Jon Krohn's 94-year-old grandmother, Annie, who's bursting with life and wisdom, shares her recipe for lifelong happiness and how relationships and daily intentions play an integral role. Annie also shares her curious take on modern technology. Get inspired by her infectious joy and perspective on life.
Additional materials: www.superdatascience.com/716
Join us as Dr. Allen Downey, renowned author and professor, shares insights from his upcoming book 'Probably Overthinking It,' breaking down underused techniques like Survival Analysis, explaining common paradoxes, and discussing the dynamic Overton Window.
This episode is brought to you by the Zerve data science dev environment (https://zerve.ai), by Modelbit (https://modelbit.com), for deploying models in seconds, and by Grafbase (https://grafbase.com), the unified data layer.
In this episode you will learn:
• Why interpreting data is not always easy [06:21]
• What is Survival Analysis [15:32]
• Preston's Paradox [22:09]
• Are you Normal? [36:52]
• How to better prepare for rare “Black Swan” events [42:48]
• What is an Overton Window? [53:06]
• What is the base rate fallacy? [1:23:31]
• How to protect yourself from biased samples [1:33:39]
• Simpson’s Paradox [1:42:43]
Additional materials: www.superdatascience.com/715
In this Friday episode, guest Tim Albiges explores with host Jon Krohn how people with blindness can have a lucrative and fulfilling career in data science, how Tim’s PhD thesis applied machine learning to help diagnose chronic respiratory diseases, and the communication tools that blind people can use to live a full and independent life.
Additional materials: www.superdatascience.com/714
Artificial General Intelligence, RLHF’s application in AI, and how entrepreneurs can enter the AI industry: Meta’s AI Research Scientist Thomas Scialom gives us behind-the-scenes insights into developing Llama 2 and what’s in the works for Llama 3. With host Jon Krohn, he discusses the future of Artificial General Intelligence, why the Galactica science-focused LLM was taken down, and what he learned from it.
This episode is brought to you by AWS Inferentia (https://go.aws/3zWS0au), by Grafbase (https://grafbase.com), the unified data layer, and by Modelbit (https://modelbit.com), for deploying models in seconds.
In this episode you will learn:
• Llama 2: Behind the Scenes of Today’s Top Open-Source LLM [05:04]
• Responsible use of Llama 2 [15:26]
• Toolformer: LLM That Learns How to Use External Tools [24:57]
• Galactica: The Science-Specific LLM and Why It Was Brought Down [36:57]
• Is AGI Around the Corner? [57:03]
• Advice for AI entrepreneurs [1:05:46]
• How Thomas develops and manages large-scale AI projects [1:14:42]
Additional materials: www.superdatascience.com/713
Code Llama might just be starting the revolution for how data scientists code. In this Five-Minute Friday, host Jon Krohn investigates the suite of models under the free-to-use Code Llama and how to find the best fit for your project’s needs.
Additional materials: www.superdatascience.com/712
In this episode, host Jon Krohn explores with his guest Ajay Jain, Co-Founder of Genmo.ai, how creative general intelligence could take the video industry by storm. They also discuss the models that got Genmo to this point, the applications of NeRF, and how understanding human psychology is so essential to developing models that output high-fidelity video.
This episode is brought to you by the Zerve data science dev environment (https://zerve.ai), by Grafbase (https://grafbase.com), the unified data layer, and by Modelbit (https://modelbit.com), for deploying models in seconds.
In this episode you will learn:
• About Genmo.ai and the term “creative general intelligence” [03:47]
• Why Ajay started Genmo.ai [09:26]
• The increased performance of multimodal models [21:12]
• All about Denoising Diffusion Probabilistic Models (DDPMs) [31:03]
• The application of Neural Radiance Fields (NeRF) [55:26]
• Predicting pedestrian behavior at Uber [1:01:50]
• How to save money in the process of training models [1:12:42]
Additional materials: www.superdatascience.com/711
Discover the power of Large Language Models with Kris Ograbek as he unravels the intricacies of LangChain and showcases a chatbot in action, all while putting our host Jon Krohn in the hot seat!
Additional materials: www.superdatascience.com/710
Meta's Senior Research Director, Dr. Laurens van der Maaten, takes center stage to unravel the captivating realm of AI innovation. Learn about his groundbreaking contributions, including pioneering the t-SNE dimensionality reduction technique and harnessing AI for novel protein synthesis, climate change mitigation, and wearable materials simulation. Join us to explore the transformative power of AI across diverse domains and gain a glimpse into its future societal implications.
This episode is brought to you by AWS Inferentia (https://go.aws/3zWS0au), by Modelbit (https://modelbit.com), for deploying models in seconds, and by Grafbase (https://grafbase.com), the unified data layer.
In this episode you will learn:
• Large-scale learning of image recognition models on web data [05:05]
• Evolutionary Scale Modeling protein models [16:45]
• Fighting climate change by building an A.I. model [29:49]
• The CrypTen privacy-preserving ML framework [38:36]
• Concerns about adversarial examples [53:25]
• Laurens’ t-SNE algorithm [58:56]
• How to make a big impact [1:07:25]
Additional materials: www.superdatascience.com/709
On this week’s Five-Minute Friday, host Jon Krohn gives five reasons why he is so excited about ChatGPT’s Code Interpreter and walks listeners through its capabilities with a practical example.
Additional materials: www.superdatascience.com/708
LLM Vicuña, Chatbot Arena, and the race to increase LLM context windows: This episode’s guest Joey Gonzalez talks to Jon Krohn about developing models and platforms that leverage and improve LLMs, as well as the future of AI development and access.
This episode is brought to you by the AWS Insiders Podcast (https://pod.link/1608453414), by Modelbit (https://modelbit.com), for deploying models in seconds, and by Grafbase (https://grafbase.com), the unified data layer.
In this episode you will learn:
• Vicuña: How the revolutionary LLM came to be [03:35]
• Chatbot Arena: The leading LLM leaderboard [09:47]
• Trusting LLM results [17:54]
• Gorilla: The open-source ChatGPT plugin alternative [32:13]
• About LMSYS and long context windows [47:48]
• Open- vs closed-source LLMs: Which is better? [1:01:39]
• Aqueduct [1:16:49]
• Founding GraphLab [1:27:02]
• How AI will positively impact society in the coming decades [1:33:23]
Additional materials: www.superdatascience.com/707
In this episode, Caterina Constantinescu dives deep into Large Language Models (LLMs), spotlighting top leaderboards, evaluation benchmarks, and real-world user perceptions. Plus, discover the challenges of dataset contamination and the intricacies of platforms like HELM and Chatbot Arena.
Additional materials: www.superdatascience.com/706
Join Jon Krohn as he chats with Syngenta Group's Feroz Sheikh, Jeremy Groeteke, and Thomas Jung about the digital revolution in agriculture. Learn how data science is transforming farming, from precision techniques to global food solutions. A compelling blend of tech and nature.
This episode is brought to you by AWS Inferentia (https://go.aws/3zWS0au) and by Modelbit (https://modelbit.com), for deploying models in seconds.
In this episode you will learn:
• What is precision agriculture? [09:43]
• What is computational agronomy? [12:30]
• How Syngenta helps growers optimize yields [21:37]
• How to bridge the gap between R&D and the real world [33:58]
• What is generative chemistry? [37:52]
• How generative chemistry accelerates the discovery of new compounds [41:55]
• How you could make a big social impact in agriculture with data science [56:22]
• How to go about designing ML models for agriculture [1:00:27]
Additional materials: www.superdatascience.com/705
Take on the world of GPT and learn to develop your own, commercially successful Large Language Models (LLMs) with Jon Krohn’s comprehensive, guided training video for generative AI. Get to grips with the technology, learn which tools to use, and find out how to get an eye for business-viable models with Jon’s (ad-)free educational video.
Additional materials: www.superdatascience.com/704
Statistics history, interdisciplinarity, and data and society. Chris Wiggins talks with Jon Krohn about the power dynamics of data, the transformation of the field of biology through data-driven approaches to genetic sequencing, and the New York Times’ data science team’s cutting-edge approach to accommodating its tech stack.
This episode is brought to you by the AWS Insiders Podcast (https://pod.link/1608453414) and by Modelbit (https://modelbit.com), for deploying models in seconds.
In this episode you will learn:
• The importance of the humanities in data science [09:18]
• How data science “rearranges” power [17:19]
• An overview of How Data Happened [20:36]
• The controversial nature of Bayes' theorem [29:16]
• Why we need to consider data ethics [34:00]
• How biology came to adopt data science into its field [45:44]
• The data science tech stack at the New York Times [49:18]
Additional materials: www.superdatascience.com/703
This week, Jon Krohn is examining Meta's newly released open-source large language model, Llama 2, highlighting its commercial prospects, immense capacity, model variety, and unique 'time awareness' feature. He also discusses its innovative two-stage RLHF approach that enhances its performance.
Additional materials: www.superdatascience.com/702
Dr. Raluca Ada Popa, renowned computer scientist, entrepreneur, and President of Opaque Systems, joins Jon Krohn to share her insights on securely interacting with AI APIs like OpenAI's GPT-4, the pros and cons of open vs. closed-source AI development, and the seamless operation of compute pipelines across multiple clouds.
This episode is brought to you by AWS Inferentia (https://go.aws/3zWS0au) and by Modelbit (https://modelbit.com), for deploying models in seconds.
In this episode you will learn:
• What is a confidential computing platform? [04:31]
• How to get started with confidential computing [12:10]
• The challenges of confidential computing and LLMs [21:11]
• How to safeguard your data while using commercial LLMs like GPT-4 [38:00]
• Open-source vs closed-source [52:28]
• Raluca's PreVail cybersecurity company [1:01:50]
• Combining entrepreneurship and an academic career [1:04:03]
• DARE Program [1:10:39]
Additional materials: www.superdatascience.com/701
Yoga and Hindu mythology: This special episode continues the thread of our centenary episodes, SDS 500: Yoga Nidra with Jes Allen and SDS 600: Yoga Nidra Practice with Steve Fazzari, which talked through guided meditation techniques to help improve posture and sleep and to expand consciousness. Inspired by these sessions, host Jon Krohn explores Hindu mythology via Alan Watts’ “The Dream of Life”.
Additional materials: www.superdatascience.com/700
Model deployment, data warehouse options for running models, and how to best leverage BI tools: Harry Glaser and Jon Krohn discuss how Modelbit turns ML models in notebooks into production-ready models automatically, reducing the time and effort spent ‘translating’ work from one environment to another. Harry also explains why automating this task matters, and how developments in ML modeling have let entire teams analyze data, whatever their level of expertise.
This episode is brought to you by the AWS Insiders Podcast (https://pod.link/1608453414).
In this episode you will learn:
• What the modern data stack is [03:28]
• Version control for data scientists [13:30]
• CI/CD, load balancing and logging [20:38]
• Snowflake vs. Redshift [30:10]
• How tools like Looker and Tableau help monitor models [35:26]
Additional materials: www.superdatascience.com/699
Company-wide AI adoption can take a lot of persuasion. Rehgan Avon talks to host Jon Krohn about why AI has become necessary for forward-thinking businesses and the steps to implement AI in an institution so that everyone benefits.
Additional materials: www.superdatascience.com/698
I found this podcast really helpful for anyone who wants to better their knowledge of machine learning. I am especially interested in the data processing. If you want to deepen your knowledge of this topic, check this article https://techlogitic.net/categorization-and-data-labeling-for-supervised-machine-learning/. It has some pretty useful information and professional tips from experts in data annotation and tagging.
a really nice and quick overview with just the right amount of detail.
Great thanks to you and your endeavors for this pod. I learnt a lot. Welcome to Jon, wish you the best 👏👍
😢
you are the best
nice summarisation: data analysts look at the past, and data scientists look at the past and the future
Great talk, very inspiring. thanks.
Sleeps 3 hrs a day, not a good example for a healthy person. Sleep well and keep the brain more refreshed and healthy. #health
a lot of extra, unrelated stuff. Dude I appreciate your effort but you need to be specific and respect audiences' time.
I didn't know of Gabriela de Queiroz, but now, listening to this podcast (I've already listened about 5 times), I'm completely enchanted. It's great to discover a professional of this caliber out in the world, and to learn that she is Brazilian.
Great job!👍 It's so interesting to listen to your podcasts! Thanks for sharing your knowledge and helping people get into the data business 🙌👍
Hi, thanks for doing this podcast. As a data engineer who commutes a lot, I gain a lot from your podcasts. One suggestion I would like to give: it would be better if you did not interrupt the speaker until they complete their flow.
What an amazing episode! Adrian rocks!! Congratulations!
Thanks for this advice !
Great episode! I wish he touched on how to connect Sparklyr to data viz like Tableau!
thought this was one of my stoic podcast episodes! Great message.
Great episode! I'd love to access the show notes, but I'm having an issue pulling up the link.
Great job !!! Thank you Kirill
great episode, but please mute/tone down background music
So what you are saying is that I should aim for the stars so even if I fall I will land on the moon. This is motivational for me.