The Gradient: Perspectives on AI

Author: Daniel Bashir


Description

Interviews with various people who research, build, or use AI, including academics, engineers, artists, entrepreneurs, and more.
133 Episodes
Episode 130

I spoke with David Pfau about:

* Spectral learning and ML
* Learning to disentangle manifolds and (projective) representation theory
* Deep learning for computational quantum mechanics
* Picking and pursuing research problems and directions

David’s work is really (times k for some very large value of k) interesting—I’ve been inspired to descend a number of rabbit holes because of it. (If you listen to this episode, you might become as cool as this guy.)

While I’m at it—I’m still hovering around 40 ratings on Apple Podcasts. It’d mean a lot if you’d consider helping me bump that up!

Enjoy—and let me know what you think!

David is a staff research scientist at Google DeepMind. He is also a visiting professor at Imperial College London in the Department of Physics, where he supervises work on applications of deep learning to computational quantum mechanics. His research interests span artificial intelligence, machine learning, and scientific computing.

Find me on Twitter for updates on new episodes, and reach me at editor@thegradient.pub for feedback, ideas, and guest suggestions. I spend a lot of time on this podcast—if you like my work, you can support me on Patreon :) You can also support upkeep for the full Gradient team/project through a paid subscription on Substack!

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (00:52) David Pfau the “critic”
* (02:05) Scientific applications of deep learning — David’s interests
* (04:57) Brain / neural network analogies
* (09:40) Modern ML systems and theories of the brain
* (14:19) Desirable properties of theories
* (18:07) Spectral Inference Networks
* (19:15) Connections to FermiNet / computational physics, a series of papers
* (33:52) Deep slow feature analysis — interpretability and findings on eigenfunctions
* (39:07) Following up on eigenfunctions (there are indeed only so many hours in a day; I have been asking the Substack people if they can ship 40-hour days, but I don’t think they’ve gotten to it yet)
* (42:17) Power iteration and intuitions (see the sketch after the links)
* (45:23) Projective representation theory
* (46:00) ???
* (46:54) Geomancer and learning to decompose a manifold from data
* (47:45) We consider the question of whether you will spend 90 more minutes of this podcast episode (there are not 90 more minutes left in this podcast episode, but there could have been)
* (1:08:47) Learning embeddings
* (1:11:12) The “unexpected emergent property” of Geomancer
* (1:14:43) Learned embeddings and disentangling and preservation of topology
* N.B. I still haven’t managed to do this in Colab because I keep crashing my instance when I use s3o4d :(
* (1:21:07) What’s missing from the ~ current (deep learning) paradigm ~
* (1:29:04) LLMs as swiss-army knives
* (1:32:05) RL and human learning — TD learning in the brain
* (1:37:43) Models that cover the Pareto Front (image below)
* (1:46:54) AI accelerators and doubling down on transformers
* (1:48:27) On Slow Research — chasing big questions and what makes problems attractive
* (1:53:50) Future work on Geomancer
* (1:55:35) Finding balance in pursuing interesting and lucrative work
* (2:00:40) Outro

Links:

* Papers
* Natural Quantum Monte Carlo Computation of Excited States (2023)
* Making sense of raw input (2021)
* Integrable Nonparametric Flows (2020)
* Disentangling by Subspace Diffusion (2020)
* Ab initio solution of the many-electron Schrödinger equation with deep neural networks (2020)
* Spectral Inference Networks (2018)
* Connecting GANs and Actor-Critic Methods (2016)
* Learning Structure in Time Series for Neuroscience and Beyond (2015, dissertation)
* Robust learning of low-dimensional dynamics from large neural ensembles (2013)
* Probabilistic Deterministic Infinite Automata (2010)
* Other
* On Slow Research
* “I just want to put this out here so that no one ever says ‘we can just get around the data limitations of LLMs with self-play’ ever again.”

Get full access to The Gradient at thegradientpub.substack.com/subscribe
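Since the power-iteration segment (42:17) leans on intuition, here is a minimal numpy sketch of the standard algorithm; a textbook illustration for listeners, not code from David's papers. Repeated matrix-vector products rotate a random vector toward the dominant eigenvector, and the Rayleigh quotient recovers the eigenvalue.

```python
import numpy as np

def power_iteration(A, num_iters=1000, tol=1e-10):
    """Estimate the dominant eigenpair of a square matrix A by repeated products."""
    v = np.random.default_rng(0).standard_normal(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        w = A @ v                        # one matrix-vector product per step
        norm = np.linalg.norm(w)
        if norm == 0.0:                  # v landed in the null space; give up
            break
        w /= norm
        converged = np.linalg.norm(w - v) < tol
        v = w
        if converged:
            break
    return v @ A @ v, v                  # (Rayleigh quotient, eigenvector)

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
lam, vec = power_iteration(A)
print(lam)  # ≈ 3.618, the dominant eigenvalue (5 + sqrt(5)) / 2
```

Spectral Inference Networks (linked above) can be thought of, loosely, as lifting this kind of eigenproblem to function spaces parameterized by neural networks.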
Episode 129

I spoke with Dan Hart and Michelle Michael about:

* Developing NSWEduChat, an AI-powered chatbot designed and delivered by the NSW Department of Education for students and teachers
* The challenges in effectively teaching students as technology develops
* Understanding and defining the importance of the classroom

Enjoy—and let me know what you think!

Dan Hart is Head of AI, and Michelle Michael is Director of Educational Support and Rural Initiatives, at the New South Wales (NSW) Department of Education.

Find me on Twitter for updates on new episodes, and reach me at editor@thegradient.pub for feedback, ideas, and guest suggestions. I spend a lot of time on this podcast—if you like my work, you can support me on Patreon :) You can also support upkeep for the full Gradient team/project through a paid subscription on Substack!

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (00:48) How NSWEduChat came to be, educational principles for AI use
* (02:37) Educational environment in New South Wales
* (04:41) How educators have adapted to new challenges for teaching and assessment
* (07:47) Considering technology advancement while teaching and assessing students
* (12:14) Educating teachers and students about how to use AI tools
* (15:03) AI in the classroom and enabling teachers
* (19:44) Product-first thinking for educational AI
* (22:15) Red teaming and testing
* (24:02) Benchmarking, chatbots as an assistant
* (26:35) The importance of the classroom
* (28:10) Media coverage and hype
* (30:35) Measurement and the benchmarking process/methodology
* (34:50) Principles for how chatbots should interact with students
* (44:29) Producing good educational outcomes at scale
* (46:41) Operating with speed and effectiveness while implementing governance
* (49:03) How the experience of building technologies evolves
* (51:45) Identifying good technologists and educators for development and use
* (55:07) Teaching standards and how AI impacts teachers
* (57:01) How technologists incorporate teaching standards and expertise in their work
* (1:00:03) NSWEduChat model details
* (1:02:55) Value alignment for NSWEduChat
* (1:05:40) Practicing caution in filtering chatbot responses
* (1:07:35) Equity and personalized instruction — how NSWEduChat can help
* (1:10:19) Helping students become “the students they could be”
* (1:13:39) Outro

Links:

* NSWEduChat
* Guardian article on NSWEduChat

Get full access to The Gradient at thegradientpub.substack.com/subscribe
Episode 129

I spoke with Kristin Lauter about:

* Elliptic curve cryptography and homomorphic encryption
* Standardizing cryptographic protocols
* Machine learning on encrypted data
* Attacking post-quantum cryptography with AI

Enjoy—and let me know what you think!

Kristin is Senior Director of FAIR Labs North America (2022–present), based in Seattle. Her current research areas are AI4Crypto and Private AI. She joined FAIR (Facebook AI Research) in 2021, after 22 years at Microsoft Research (MSR), where she was Partner Research Manager on the senior leadership team of MSR Redmond. Before joining Microsoft in 1999, she was Hildebrandt Assistant Professor of Mathematics at the University of Michigan (1996–1999). She has been an Affiliate Professor of Mathematics at the University of Washington since 2008. She received all her advanced degrees from the University of Chicago: BA (1990), MS (1991), and PhD (1996) in Mathematics. She is best known for her work on elliptic curve cryptography, supersingular isogeny graphs in cryptography, homomorphic encryption (SEALcrypto.org), Private AI, and AI4Crypto. She served as President of the Association for Women in Mathematics from 2015 to 2017 and on the Council of the American Mathematical Society from 2014 to 2017.

Find me on Twitter for updates on new episodes, and reach me at editor@thegradient.pub for feedback, ideas, and guest suggestions. I spend a lot of time on this podcast—if you like my work, you can support me on Patreon :) You can also support upkeep for the full Gradient team/project through a paid subscription on Substack!

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (01:10) Llama 3 and encrypted data — where do we want to be?
* (04:20) Tradeoffs: individual privacy vs. aggregated value in e.g. social media forums
* (07:48) Kristin’s shift in views on privacy
* (09:40) Earlier work on elliptic curve cryptography — applications and theory
* (10:50) Inspirations from algebra, number theory, and algebraic geometry
* (15:40) On algebra vs. analysis and on clear thinking
* (18:38) Elliptic curve cryptography and security, algorithms and concrete running time
* (21:31) Cryptographic protocols and setting standards
* (26:36) Supersingular isogeny graphs (and higher-dimensional supersingular isogeny graphs)
* (32:26) Hard problems for cryptography and finding new problems
* (36:42) Guaranteeing security for cryptographic protocols and mathematical foundations
* (40:15) Private AI: Crypto-Nets / running neural nets on homomorphically encrypted data
* (42:10) Polynomial approximations, activation functions, and expressivity
* (44:32) Scaling up, Llama 2 inference on encrypted data
* (46:10) Transitioning between MSR and FAIR, industry research
* (52:45) An efficient algorithm for integer lattice reduction (AI4Crypto)
* (56:23) Local minima, convergence and limit guarantees, scaling
* (58:27) SALSA: Attacking Lattice Cryptography with Transformers
* (58:38) Learning With Errors (LWE) vs. standard ML assumptions (see the toy sketch after the links)
* (1:02:25) Powers of small primes and faster learning
* (1:04:35) LWE and linear regression on a torus
* (1:07:30) Secret recovery algorithms and transformer accuracy
* (1:09:10) Interpretability / encoding information about secrets
* (1:09:45) Future work / scaling up
* (1:12:08) Reflections on working as a mathematician among technologists

Links:

* Kristin’s Meta, Wikipedia, Google Scholar, and Twitter pages
* Papers and sources mentioned/referenced:
* The Advantages of Elliptic Curve Cryptography for Wireless Security (2004)
* Cryptographic Hash Functions from Expander Graphs (2007, introducing Supersingular Isogeny Graphs)
* Families of Ramanujan Graphs and Quaternion Algebras (2008 — the higher-dimensional analogues of Supersingular Isogeny Graphs)
* Cryptographic Cloud Storage (2010)
* Can homomorphic encryption be practical? (2011)
* ML Confidential: Machine Learning on Encrypted Data (2012)
* CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy (2016)
* A community effort to protect genomic data sharing, collaboration and outsourcing (2017)
* The Homomorphic Encryption Standard (2022)
* Private AI: Machine Learning on Encrypted Data (2022)
* SALSA: Attacking Lattice Cryptography with Transformers (2022)
* SalsaPicante: A Machine Learning Attack on LWE with Binary Secrets
* SALSA VERDE: a machine learning attack on LWE with sparse small secrets
* Salsa Fresca: Angular Embeddings and Pre-Training for ML Attacks on Learning With Errors
* The cool and the cruel: separating hard parts of LWE secrets
* An efficient algorithm for integer lattice reduction (2023)

Get full access to The Gradient at thegradientpub.substack.com/subscribe
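To make the Learning With Errors discussion (58:38) concrete, here is a toy sketch of how LWE samples are generated; the parameters are illustrative stand-ins, not those of any deployed scheme. The point the SALSA line of work exploits is that recovering the secret from pairs (A, b) looks like a noisy regression problem, which is what invites ML-based attacks.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, q = 64, 512, 3329                 # toy dimension, sample count, modulus (illustrative)

s = rng.integers(0, q, size=n)          # the secret vector an attacker wants
A = rng.integers(0, q, size=(m, n))     # public uniformly random matrix
e = np.rint(rng.normal(0.0, 2.0, size=m)).astype(np.int64)  # small "error" noise
b = (A @ s + e) % q                     # published LWE samples: the pairs (A, b)

# Without e, solving A s = b (mod q) is ordinary linear algebra; with e,
# recovering s is the LWE problem, conjectured hard even for quantum computers.
print(A.shape, b.shape)
```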
Episode 128

I spoke with Sergiy Nesterenko about:

* Developing an automated system for designing PCBs
* Difficulties in human and automated PCB design
* Building a startup at the intersection of different areas of expertise

By the way — I hit 40 ratings on Apple Podcasts (and am at 66 on Spotify). It’d mean a lot (really, a lot) if you’d consider leaving a rating or a review. I read everything, and it’s very heartening and helpful to hear what you think.

Enjoy, and let me know what you think!

Sergiy is founder and CEO of Quilter. He spent 5 years at SpaceX developing radiation-hardened avionics for the second stages of SpaceX’s Falcon 9 and Falcon Heavy rockets, before discovering a big problem: designing printed circuit boards for all the electronics in these rockets was tedious, manual, and error-prone. So in 2019, he founded Quilter to build the next generation of AI-powered tooling for electrical engineers.

I spend a lot of time on this podcast—if you like my work, you can support me on Patreon :)

Reach me at editor@thegradient.pub for feedback, ideas, guest suggestions.

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (00:45) Quilter origins and difficulties in designing PCBs
* (04:12) PCBs and schematic implementations
* (06:40) Iteration cycles and simulations
* (08:35) Octilinear traces and first-principles design for PCBs
* (12:38) The design space of PCBs
* (15:27) Benchmarks for PCB design
* (20:05) RL and PCB design
* (22:48) PCB details, track widths
* (25:09) Board functionality and aesthetics
* (27:53) PCB designers and automation
* (30:24) Quilter as a compiler
* (33:56) Gluing social worlds and bringing together expertise
* (36:00) Process knowledge vs. first-principles thinking
* (42:05) Example boards
* (44:45) Auto-routers for PCBs
* (48:43) Difficulties for scaling to larger boards
* (50:42) Customers and skepticism
* (53:42) On experiencing negative feedback
* (56:42) Maintaining stamina while building Quilter
* (1:00:00) Endgame for Quilter and future directions
* (1:03:24) Outro

Links:

* Quilter homepage
* Other pages/features mentioned:
* Thin-to-thick traces
* Octilinear trace routing
* Comment from Tom Fleet

Get full access to The Gradient at thegradientpub.substack.com/subscribe
Episode 127

I spoke with Christopher Thi Nguyen about:

* How we lose control of our values
* The tradeoffs of legibility, aggregation, and simplification
* Gamification and its risks

Enjoy—and let me know what you think!

C. Thi Nguyen is (as of July 2020) Associate Professor of Philosophy at the University of Utah. His research focuses on how social structures and technology can shape our rationality and our agency. He has published on trust, expertise, group agency, community art, cultural appropriation, aesthetic value, echo chambers, moral outrage porn, and games. He received his PhD from UCLA. Once, he was a food writer for the Los Angeles Times.

I spend a lot of time on this podcast—if you like my work, you can support me on Patreon :)

Reach me at editor@thegradient.pub for feedback, ideas, guest suggestions.

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (01:10) The ubiquity of James C. Scott
* (06:03) Legibility and measurement
* (12:50) Value capture, classes and measurement
* (17:30) Political value choice in ML
* (23:30) Why value collapse happens
* (33:00) Blackburn, “Hume and Thick Connexions” — projectivism and legibility
* (36:20) Heuristics and decision-making
* (40:08) Institutional classification systems
* (46:55) Back to Hume
* (48:27) Epistemic arms races, stepping outside our conceptual architectures
* (56:40) The “what to do” question
* (1:04:00) Gamification, aesthetic engagement
* (1:14:51) Echo chambers and defining utility
* (1:22:10) Progress, AGI millenarianism
* (disclaimer: I don’t know what’s going to happen with the world, either.)
* (1:26:04) Parting visions
* (1:30:02) Outro

Links:

* Christopher’s Twitter and homepage
* Games: Agency as Art
* Papers referenced:
* Transparency is Surveillance
* Games and the art of agency
* Autonomy and Aesthetic Engagement
* Art as a Shelter from Science
* Value Capture
* Hostile Epistemology
* Hume and Thick Connexions (Simon Blackburn)

Get full access to The Gradient at thegradientpub.substack.com/subscribe
Episode 126

I spoke with Vivek Natarajan about:

* Improving access to medical knowledge with AI
* How an LLM for medicine should behave
* Aspects of training Med-PaLM and AMIE
* How to facilitate appropriate amounts of trust in users of medical AI systems

Vivek Natarajan is a Research Scientist at Google Health AI advancing biomedical AI to help scale world-class healthcare to everyone. Vivek is particularly interested in building large language models and multimodal foundation models for biomedical applications, and leads the Google Brain moonshot behind Med-PaLM, Google’s flagship medical large language model. Med-PaLM has been featured in Scientific American, The Economist, STAT News, CNBC, Forbes, and New Scientist, among others.

I spend a lot of time on this podcast—if you like my work, you can support me on Patreon :)

Reach me at editor@thegradient.pub for feedback, ideas, guest suggestions.

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (00:35) The concept of an “AI doctor”
* (06:54) Accessibility to medical expertise
* (10:31) Enabling doctors to do better/different work
* (14:35) Med-PaLM
* (15:30) Instruction tuning, desirable traits in LLMs for medicine
* (23:41) Axes for evaluation of medical QA systems
* (30:03) Medical LLMs and scientific consensus
* (35:32) Demographic data and patient interventions
* (40:14) Data contamination in Med-PaLM
* (42:45) Grounded claims about capabilities
* (45:48) Building trust
* (50:54) Genetic Discovery enabled by a LLM
* (51:33) Novel hypotheses in genetic discovery
* (57:10) Levels of abstraction for hypotheses
* (1:01:10) Directions for continued progress
* (1:03:05) Conversational Diagnostic AI
* (1:03:30) Objective Structured Clinical Examination as an evaluative framework
* (1:09:08) Relative importance of different types of data
* (1:13:52) Self-play — conversational dispositions and handling patients
* (1:16:41) Chain of reasoning and information retention
* (1:20:00) Performance in different areas of medical expertise
* (1:22:35) Towards accurate differential diagnosis
* (1:31:40) Feedback mechanisms and expertise, disagreement among clinicians
* (1:35:26) Studying trust, user interfaces
* (1:38:08) Self-trust in using medical AI models
* (1:41:39) UI for medical AI systems
* (1:43:50) Model reasoning in complex scenarios
* (1:46:33) Prompting
* (1:48:41) Future outlooks
* (1:54:53) Outro

Links:

* Vivek’s Twitter and homepage
* Papers:
* Towards Expert-Level Medical Question Answering with LLMs (2023)
* LLMs encode clinical knowledge (2023)
* Towards Generalist Biomedical AI (2024)
* AMIE
* Genetic Discovery enabled by a LLM (2023)

Get full access to The Gradient at thegradientpub.substack.com/subscribe
Episode 125

“False universalism freaks me out. It doesn’t freak me out as a first principle because of epistemic violence; it freaks me out because it works.”

I spoke with Professor Thomas Mullaney about:

* Telling stories about your work and balancing what feels meaningful with practical realities
* Destabilizing our understandings of the technologies we feel familiar with, and the work of researching the history of the Chinese typewriter
* The personal nature of research

The Chinese Typewriter and The Chinese Computer are two of the best books I’ve read in a very long time. And they’re not just good and interesting, but important to read, for the history they tell and the ideas and arguments they present—I can’t recommend them and Professor Mullaney’s other work enough.

Tom is Professor of History and Professor of East Asian Languages and Cultures, by courtesy. He is also the Kluge Chair in Technology and Society at the Library of Congress, and a Guggenheim Fellow. He is the author or lead editor of 8 books, including The Chinese Computer, The Chinese Typewriter (winner of the Fairbank prize), Your Computer is on Fire, and Coming to Terms with the Nation: Ethnic Classification in Modern China.

I spend a lot of time on this podcast—if you like my work, you can support me on Patreon :)

Reach me at editor@thegradient.pub for feedback, ideas, guest suggestions.

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (01:00) “In Their Own Words” interview: on telling stories about your work
* (07:42) Clashing narratives and authenticity/inauthenticity in pursuing your work
* (15:48) Why Professor Mullaney pursued studying the Chinese typewriter
* (18:20) Worldmaking, transforming the physical world to fit our descriptive models
* (30:07) Internal and illegible continuities/coherence in work
* (31:45) The role of a “self”
* (43:06) The 2008 Beijing Olympics and false (alphabetical) universalism, projectivism
* (1:04:23) “Kicking the ladder” and the personal nature of research
* (1:18:07) The “Technolinguistic Chinese Exclusion Act” — the situatedness of historians in their work
* (1:33:00) Is the Chinese typewriter project finished? / on the resolution of problems
* (1:43:35) Outro

Links:

* Professor Mullaney’s homepage and Twitter
* In Their Own Words: Thomas Mullaney
* Books:
* The Chinese Computer: A Global History of the Information Age
* The Chinese Typewriter: A History
* Coming to Terms with the Nation: Ethnic Classification in Modern China

Get full access to The Gradient at thegradientpub.substack.com/subscribe
Episode 124

“You may think you’re doing a priori reasoning, but actually you’re just over-generalizing from your current experience of technology.”

I spoke with Professor Seth Lazar about:

* Why managing near-term and long-term risks isn’t always zero-sum
* How to think through axioms and systems in political philosophy
* Coordination problems, economic incentives, and other difficulties in developing publicly beneficial AI

Seth is Professor of Philosophy at the Australian National University, an Australian Research Council (ARC) Future Fellow, and a Distinguished Research Fellow of the University of Oxford Institute for Ethics in AI. He has worked on the ethics of war, self-defense, and risk, and now leads the Machine Intelligence and Normative Theory (MINT) Lab, where he directs research projects on the moral and political philosophy of AI.

Reach me at editor@thegradient.pub for feedback, ideas, guest suggestions.

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (00:54) Ad read — MLOps conference
* (01:32) The allocation of attention — attention, moral skill, and algorithmic recommendation
* (03:53) Attention allocation as an independent good (or bad)
* (08:22) Axioms in political philosophy
* (11:55) Explaining judgments, multiplying entities, parsimony, intuitive disgust
* (15:05) AI safety / catastrophic risk concerns
* (22:10) Superintelligence arguments, reasoning about technology
* (28:42) Attacking current and future harms from AI systems — does one draw resources from the other?
* (35:55) GPT-2, model weights, related debates
* (39:11) Power and economics — coordination problems, company incentives
* (50:42) Morality tales, relationship between safety and capabilities
* (55:44) Feasibility horizons, prediction uncertainty, and doing moral philosophy
* (1:02:28) What is a feasibility horizon?
* (1:08:36) Safety guarantees, speed of improvements, the “Pause AI” letter
* (1:14:25) Sociotechnical lenses, narrowly technical solutions
* (1:19:47) Experiments for responsibly integrating AI systems into society
* (1:26:53) Helpful/honest/harmless and antagonistic AI systems
* (1:33:35) Managing incentives conducive to developing technology in the public interest
* (1:40:27) Interdisciplinary academic work, disciplinary purity, power in academia
* (1:46:54) How we can help legitimize and support interdisciplinary work
* (1:50:07) Outro

Links:

* Seth’s Linktree and Twitter
* Resources:
* Attention, moral skill, and algorithmic recommendation
* Catastrophic AI Risk slides

Get full access to The Gradient at thegradientpub.substack.com/subscribe
Episode 123

I spoke with Suhail Doshi about:

* Why benchmarks aren’t prepared for tomorrow’s AI models
* How he thinks about artists in a world with advanced AI tools
* Building a unified computer vision model that can generate, edit, and understand pixels

Suhail is a software engineer and entrepreneur known for founding Mixpanel, Mighty Computing, and Playground AI (they’re hiring!).

Reach me at editor@thegradient.pub for feedback, ideas, guest suggestions.

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (00:54) Ad read — MLOps conference
* (01:30) Suhail is *not* in pivot hell but he *is* all-in on 50% AI-generated music
* (03:45) AI and music, similarities to Playground
* (07:50) Skill vs. creative capacity in art
* (12:43) What we look for in music and art
* (15:30) Enabling creative expression
* (18:22) Building a unified computer vision model, underinvestment in computer vision
* (23:14) Enhancing the aesthetic quality of images: color and contrast, benchmarks vs. user desires
* (29:05) “Benchmarks are not prepared for how powerful these models will become”
* (31:56) Personalized models and personalized benchmarks
* (36:39) Engaging users and benchmark development
* (39:27) What a foundation model for graphics requires
* (45:33) Text-to-image is insufficient
* (46:38) DALL-E 2 and Imagen comparisons, FID
* (49:40) Compositionality
* (50:37) Why Playground focuses on images vs. 3D, video, etc.
* (54:11) Open source and Playground’s strategy
* (57:18) When to stop open-sourcing?
* (1:03:38) Suhail’s thoughts on AGI discourse
* (1:07:56) Outro

Links:

* Playground homepage
* Suhail on Twitter

Get full access to The Gradient at thegradientpub.substack.com/subscribe
Episode 122

I spoke with Azeem Azhar about:

* The speed of progress in AI
* Historical context for some of the terminology we use and how we think about technology
* What we might want our future to look like

Azeem is an entrepreneur, investor, and adviser. He is the creator of Exponential View, a global platform for in-depth technology analysis, and the host of the Bloomberg Original series Exponentially.

Reach me at editor@thegradient.pub for feedback, ideas, guest suggestions.

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (00:32) Ad read — MLOps conference
* (01:05) Problematizing the term “exponential”
* (07:35) Moore’s Law as social contract, speed of technological growth and impedances
* (14:45) Academic incentives, interdisciplinary work, rational agents and historical context
* (21:24) Monolithic scaling
* (26:38) Investment in scaling
* (31:22) On Sam Altman
* (36:25) Uses of “AGI,” “intelligence”
* (41:32) Historical context for terminology
* (48:58) AI and teaching
* (53:51) On the technology-human divide
* (1:06:26) New technologies and the futures we want
* (1:10:50) Inevitability narratives
* (1:17:01) Rationality and objectivity
* (1:21:13) Cultural affordances and intellectual history
* (1:26:15) Centralized and decentralized AI systems
* (1:32:54) Instruction tuning and helpful/honest/harmless
* (1:39:18) Azeem’s future outlook
* (1:46:15) Outro

Links:

* Azeem’s website and Twitter
* Exponential View

Get full access to The Gradient at thegradientpub.substack.com/subscribe
Episode 122

I spoke with Professor David Thorstad about:

* The practical difficulties of doing interdisciplinary work
* Why theories of human rationality should account for boundedness, heuristics, and other cognitive limitations
* Why EA epistemics suck (ok, it’s a little more nuanced than that)

Professor Thorstad is an Assistant Professor of Philosophy at Vanderbilt University, a Senior Research Affiliate at the Global Priorities Institute at Oxford, and a Research Affiliate at the MINT Lab at Australian National University. One strand of his research asks how cognitively limited agents should decide what to do and believe. A second strand asks how altruists should use limited funds to do good effectively.

Reach me at editor@thegradient.pub for feedback, ideas, guest suggestions.

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (01:15) David’s interest in rationality
* (02:45) David’s crisis of confidence, models abstracted from psychology
* (05:00) Blending formal models with studies of the mind
* (06:25) Interaction between academic communities
* (08:24) Recognition of and incentives for interdisciplinary work
* (09:40) Movement towards interdisciplinary work
* (12:10) The Standard Picture of rationality
* (14:11) Why the Standard Picture was attractive
* (16:30) Violations of and rebellion against the Standard Picture
* (19:32) Mistakes made by critics of the Standard Picture
* (22:35) Other competing programs vs. the Standard Picture
* (26:27) Characterizing Bounded Rationality
* (27:00) A worry: faculties criticizing themselves
* (29:28) Self-improving critique and longtermism
* (30:25) Central claims in bounded rationality and controversies
* (32:33) Heuristics and formal theorizing
* (35:02) Violations of the Standard Picture, vindicatory epistemology
* (37:03) The Reason Responsive Consequentialist View (RRCV)
* (38:30) Objective and subjective pictures
* (41:35) Reason responsiveness
* (43:37) There are no epistemic norms for inquiry
* (44:00) Norms vs. reasons
* (45:15) Arguments against epistemic nihilism for belief
* (47:30) Norms and self-delusion
* (49:55) Difficulty of holding beliefs for pragmatic reasons
* (50:50) The Gibbardian picture, inquiry as an action
* (52:15) Thinking how to act and thinking how to live — the power of inquiry
* (53:55) Overthinking and conducting inquiry
* (56:30) Is thinking how to inquire an all-things-considered matter?
* (58:00) Arguments for the RRCV
* (1:00:40) Deciding on minimal criteria for the view, stereotyping
* (1:02:15) Eliminating stereotypes from the theory
* (1:04:20) Theory construction in epistemology and moral intuition
* (1:08:20) Refusing theories for moral reasons and disciplinary boundaries
* (1:10:30) The argument from minimal criteria, evaluating against competing views
* (1:13:45) Comparing to other theories
* (1:15:00) The explanatory argument
* (1:17:53) Parfit and Railton, norms of friendship vs. utility
* (1:20:00) Should you call out your friend for being a womanizer
* (1:22:00) Vindicatory Epistemology
* (1:23:05) Panglossianism and meliorative epistemology
* (1:24:42) Heuristics and recognition-driven investigation
* (1:26:33) Rational inquiry leading to irrational beliefs — metacognitive processing
* (1:29:08) Stakes of inquiry and costs of metacognitive processing
* (1:30:00) When agents are incoherent, focuses on inquiry
* (1:32:05) Indirect normative assessment and its consequences
* (1:37:47) Against the Singularity Hypothesis
* (1:39:00) Superintelligence and the ontological argument
* (1:41:50) Hardware growth and general intelligence growth, AGI definitions
* (1:43:55) Difficulties in arguing for hyperbolic growth (see the short note after the links)
* (1:46:07) Chalmers and the proportionality argument
* (1:47:53) Arguments for/against diminishing growth, research productivity, Moore’s Law
* (1:50:08) On progress studies
* (1:52:40) Improving research productivity and technology growth
* (1:54:00) Mistakes in the moral mathematics of existential risk, longtermist epistemics
* (1:55:30) Cumulative and per-unit risk
* (1:57:37) Back and forth with longtermists, time of perils
* (1:59:05) Background risk — risks we can and can’t intervene on, total existential risk
* (2:00:56) The case for longtermism is inflated
* (2:01:40) Epistemic humility and longtermism
* (2:03:15) Knowledge production — reliable sources, blog posts vs. peer review
* (2:04:50) Compounding potential errors in knowledge
* (2:06:38) Group deliberation dynamics, academic consensus
* (2:08:30) The scope of longtermism
* (2:08:30) Money in effective altruism and processes of inquiry
* (2:10:15) Swamping longtermist options
* (2:12:00) Washing out arguments and justified belief
* (2:13:50) The difficulty of long-term forecasting and interventions
* (2:15:50) Theory of change in the bounded rationality program
* (2:18:45) Outro

Links:

* David’s homepage and Twitter and blog
* Papers mentioned/read:
* Bounded rationality and inquiry
* Why bounded rationality (in epistemology)?
* Against the newer evidentialists
* The accuracy-coherence tradeoff in cognition
* There are no epistemic norms of inquiry
* Permissive metaepistemology
* Global priorities and effective altruism
* What David likes about EA
* Against the singularity hypothesis (+ blog posts)
* Three mistakes in the moral mathematics of existential risk (+ blog posts)
* The scope of longtermism
* Epistemics

Get full access to The Gradient at thegradientpub.substack.com/subscribe
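For the hyperbolic-growth segment (1:43:55), one distinction in the debate is worth writing down, since “exponential” and “hyperbolic” are often run together. A standard toy model of a self-improving quantity x(t) with feedback exponent ε > 0 (my illustration, not notation from the paper):

```latex
\frac{dx}{dt} = x^{1+\varepsilon}
\;\Longrightarrow\;
x(t) = x_0 \left(1 - \varepsilon\, x_0^{\varepsilon}\, t\right)^{-1/\varepsilon},
\qquad
t^{*} = \frac{1}{\varepsilon\, x_0^{\varepsilon}} .
```

The solution diverges at the finite time t*, whereas the exponential case (ε = 0) grows forever without a singularity; so a literal “singularity” claim has to defend the ε > 0 feedback assumption, which is roughly where Thorstad’s skepticism about hardware and research-productivity growth bites.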
Episode 121

I spoke with Professor Ryan Tibshirani about:

* Differences between the ML and statistics communities in scholarship, terminology, and other areas
* Trend filtering
* Why you can’t just use garbage prediction functions when doing conformal prediction

Ryan is a Professor in the Department of Statistics at UC Berkeley. He is also a Principal Investigator in the Delphi group. From 2011-2022, he was a faculty member in Statistics and Machine Learning at Carnegie Mellon University. From 2007-2011, he did his Ph.D. in Statistics at Stanford University.

Reach me at editor@thegradient.pub for feedback, ideas, guest suggestions.

The Gradient Podcast on: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (01:10) Ryan’s background and path into statistics
* (07:00) Cultivating taste as a researcher
* (11:00) Conversations within the statistics community
* (18:30) Use of terms, disagreements over stability and definitions
* (23:05) Nonparametric regression
* (23:55) Background on trend filtering
* (33:48) Analysis and synthesis frameworks in problem formulation
* (39:45) Neural networks as a specific take on synthesis
* (40:55) Divided differences, falling factorials, and discrete splines
* (41:55) Motivations and background
* (48:07) Divided differences vs. derivatives, approximation and efficiency
* (51:40) Conformal prediction
* (52:40) Motivations
* (1:10:20) Probabilistic guarantees in conformal prediction, choice of predictors (see the sketch after the links)
* (1:14:25) Assumptions: i.i.d. and exchangeability — conformal prediction beyond exchangeability
* (1:25:00) Next directions
* (1:28:12) Epidemic forecasting — COVID-19 impact and trends survey
* (1:29:10) Survey methodology
* (1:38:20) Data defect correlation and its limitations for characterizing datasets
* (1:46:14) Outro

Links:

* Ryan’s homepage
* Works read/mentioned:
* Nonparametric Regression
* Adaptive Piecewise Polynomial Estimation via Trend Filtering (2014)
* Divided Differences, Falling Factorials, and Discrete Splines: Another Look at Trend Filtering and Related Problems (2020)
* Distribution-free Inference
* Distribution-Free Predictive Inference for Regression (2017)
* Conformal Prediction Under Covariate Shift (2019)
* Conformal Prediction Beyond Exchangeability (2023)
* Delphi and COVID-19 research
* Flexible Modeling of Epidemics
* Real-Time Estimation of COVID-19 Infections
* The US COVID-19 Trends and Impact Survey and Big data, big problems: Responding to “Are we there yet?”

Get full access to The Gradient at thegradientpub.substack.com/subscribe
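As context for the “garbage prediction functions” point and the coverage guarantees discussed around (1:10:20), here is a minimal sketch of split conformal prediction, assuming numpy and scikit-learn. Under exchangeability the interval has at least 1 - α marginal coverage for any fitted predictor; what a bad predictor costs you is width, not validity, which is one reason the choice of predictor still matters.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=2000)

# Split: fit on one half, calibrate on the other.
X_fit, y_fit = X[:1000], y[:1000]
X_cal, y_cal = X[1000:], y[1000:]
model = LinearRegression().fit(X_fit, y_fit)

# Conformity scores: absolute residuals on the calibration set.
alpha = 0.1
scores = np.abs(y_cal - model.predict(X_cal))
n = len(scores)
k = int(np.ceil((n + 1) * (1 - alpha)))   # finite-sample-corrected quantile index
qhat = np.sort(scores)[k - 1]

# Interval for a new point: [prediction - qhat, prediction + qhat].
x_new = rng.normal(size=(1, 3))
pred = model.predict(x_new)[0]
print(pred - qhat, pred + qhat)            # ~90% marginal coverage guarantee
```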
In episode 120 of The Gradient Podcast, Daniel Bashir speaks to Sasha Luccioni.

Sasha is the AI and Climate Lead at HuggingFace, where she spearheads research, consulting, and capacity-building to elevate the sustainability of AI systems. A founding member of Climate Change AI (CCAI) and a board member of Women in Machine Learning (WiML), Sasha is passionate about catalyzing impactful change, organizing events, and serving as a mentor to under-represented minorities within the AI community.

Have suggestions for future podcast guests (or other feedback)? Let us know here or reach Daniel at editor@thegradient.pub

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (00:43) Sasha’s background
* (01:52) How Sasha became interested in sociotechnical work
* (03:08) Larger models and theory of change for AI/climate work
* (07:18) Quantifying emissions for ML systems (see the sketch after the links)
* (09:40) Aggregate inference vs. training costs
* (10:22) Hardware and data center locations
* (15:10) More efficient hardware vs. bigger models — Jevons paradox
* (17:55) Uninformative experiments, takeaways for individual scientists, knowledge sharing, failure reports
* (27:10) Power Hungry Processing: systematic comparisons of ongoing inference costs
* (28:22) General vs. task-specific models
* (31:20) Architectures and efficiency
* (33:45) Sequence-to-sequence architectures vs. decoder-only
* (36:35) Hardware efficiency/utilization
* (37:52) Estimating the carbon footprint of BLOOM and lifecycle assessment
* (40:50) Stable Bias
* (46:45) Understanding model biases and representations
* (52:07) Future work
* (53:45) Metaethical perspectives on benchmarking for AI ethics
* (54:30) “Moral benchmarks”
* (56:50) Reflecting on “ethicality” of systems
* (59:00) Transparency and ethics
* (1:00:05) Advice for picking research directions
* (1:02:58) Outro

Links:

* Sasha’s homepage and Twitter
* Papers read/discussed:
* Climate Change / Carbon Emissions of AI Models:
* Quantifying the Carbon Emissions of Machine Learning
* Power Hungry Processing: Watts Driving the Cost of AI Deployment?
* Tackling Climate Change with Machine Learning
* CodeCarbon
* Responsible AI:
* Stable Bias: Analyzing Societal Representations in Diffusion Models
* Metaethical Perspectives on ‘Benchmarking’ AI Ethics
* Measuring Data
* Mind your Language (Model): Fact-Checking LLMs and their Role in NLP Research and Practice

Get full access to The Gradient at thegradientpub.substack.com/subscribe
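For the emissions-quantification discussion (07:18): the CodeCarbon package linked above estimates energy use and CO2-equivalent for a block of code. A minimal sketch, assuming codecarbon’s documented EmissionsTracker interface; the workload here is a stand-in for a real training loop.

```python
from codecarbon import EmissionsTracker  # pip install codecarbon

tracker = EmissionsTracker(project_name="toy-workload")
tracker.start()

# Stand-in for a training or inference workload.
total = sum(i * i for i in range(10_000_000))

emissions = tracker.stop()  # estimated emissions in kg CO2-equivalent
print(f"{emissions:.6f} kg CO2eq")
```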
In episode 119 of The Gradient Podcast, Daniel Bashir speaks to Professor Michael Sipser.

Professor Sipser is the Donner Professor of Mathematics and a member of the Computer Science and Artificial Intelligence Laboratory at MIT.

He received his PhD from UC Berkeley in 1980 and joined the MIT faculty that same year. He was Chairman of Applied Mathematics from 1998 to 2000 and served as Head of the Mathematics Department from 2004 to 2014. He served as interim Dean of Science from 2013 to 2014 and then as Dean of Science from 2014 to 2020.

He was a research staff member at IBM Research in 1980, spent the 1985-86 academic year on the faculty of the EECS department at Berkeley and at MSRI, and was a Lady Davis Fellow at Hebrew University in 1988. His research areas are algorithms and complexity theory, specifically efficient error-correcting codes, interactive proof systems, randomness, quantum computation, and establishing the inherent computational difficulty of problems. He is the author of the widely used textbook Introduction to the Theory of Computation (Third Edition, Cengage, 2012).

Have suggestions for future podcast guests (or other feedback)? Let us know here or reach Daniel at editor@thegradient.pub

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (01:40) Professor Sipser’s background
* (04:35) On interesting questions
* (09:00) Different kinds of research problems
* (13:00) What makes certain problems difficult
* (18:48) Nature of the P vs. NP problem
* (24:42) Identifying interesting problems
* (28:50) Lower bounds on the size of sweeping automata
* (29:50) Why sweeping automata + headway to P vs. NP
* (36:40) Insights from sweeping automata, infinite analogues to finite automata problems
* (40:45) Parity circuits
* (43:20) Probabilistic restriction method
* (47:20) Relativization and the polynomial time hierarchy
* (55:10) P vs. NP
* (57:23) The non-connection between GO’s polynomial space hardness and AlphaGo
* (1:00:40) On handicapping Turing Machines vs. oracle strategies
* (1:04:25) The Natural Proofs Barrier and approaches to P vs. NP
* (1:11:05) Debates on methods for P vs. NP
* (1:15:04) On the possibility of solving P vs. NP
* (1:18:20) On academia and its role
* (1:27:51) Outro

Links:

* Professor Sipser’s homepage
* Papers discussed/read:
* Halting space-bounded computations (1978)
* Lower bounds on the size of sweeping automata (1979)
* GO is Polynomial-Space Hard (1980)
* A complexity theoretic approach to randomness (1983)
* Parity, circuits, and the polynomial-time hierarchy (1984)
* A follow-up to Furst-Saxe-Sipser
* The Complexity of Finite Functions (1991)

Get full access to The Gradient at thegradientpub.substack.com/subscribe
In episode 118 of The Gradient Podcast, Daniel Bashir speaks to Andrew Lee.

Andrew is co-founder and CEO of Shortwave, a company dedicated to building a better product experience for email, particularly by leveraging AI. He previously co-founded and was CTO at Firebase.

Have suggestions for future podcast guests (or other feedback)? Let us know here or reach Daniel at editor@thegradient.pub

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (01:43) Andrew’s previous work, Firebase
* (04:48) Benefits of lacking experience in building Firebase
* (08:55) On “abstract reasoning” vs. empirical capabilities
* (10:30) Shortwave’s AI system as a black box
* (11:55) Motivations for Shortwave
* (17:10) Why is Google not innovating on email?
* (21:53) Shortwave’s overarching product vision and pivots
* (27:40) Shortwave AI features
* (33:20) AI features for email and security concerns
* (35:45) Shortwave’s AI Email Assistant + architecture
* (43:40) Issues with chaining LLM calls together
* (45:25) Understanding implicit context in utterances, modularization without loss of context
* (48:56) Performance for AI assistant, batching and pipelining
* (55:10) Prompt length
* (57:00) On shipping fast
* (1:00:15) AI improvements that Andrew is following
* (1:03:10) Outro

Links:

* Andrew’s blog and Twitter
* Shortwave
* Introducing Ghostwriter
* Everything we shipped for AI Launch Week
* A deep dive into the world’s smartest email AI

Get full access to The Gradient at thegradientpub.substack.com/subscribe
“You get more of what you engage with. Everyone who complains about coverage should understand that every click, every quote tweet, every argument is registered by these publications as engagement. If what you want is really meaty, dispassionate, balanced, and fair explainers, you need to click on that, you need to read the whole thing, you need to share it, talk about it, comment on it. We get the media that we deserve.”

In episode 117 of The Gradient Podcast, Daniel Bashir speaks to Joss Fong.

Joss is a producer focused on science and technology, and was a founding member of the Vox video team. Her work has been recognized by the AAAS Kavli Science Journalism Awards, the Online Journalism Awards, and the News & Documentary Emmys. She holds a master’s degree in science, health, and environmental reporting from NYU.

Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at editor@thegradient.pub

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (01:32) Joss’s path into videomaking, J-school
* (07:45) Consumption and creation in explainer journalism
* (10:45) Finding clarity in information
* (13:15) Communication of ML research
* (15:55) Video journalism and science communication as separate and overlapping disciplines
* (19:41) Evolution of videos and videomaking
* (26:33) Explaining AI and communicating mental models
* (30:47) Meeting viewers in the middle, competing for attention
* (34:07) Explanatory techniques in Glad You Asked
* (37:10) Storytelling and communicating scientific information
* (40:57) “Is Beauty Culture Hurting Us?” and participating in video narratives
* (46:37) AI beauty filters
* (52:59) Obvious bias in generative AI
* (59:31) Definitions and ideas of progress, humanities and technology
* (1:05:08) “Iterative development” and outsourcing quality control to the public
* (1:07:10) Disagreement about (tech) journalism’s purpose
* (1:08:51) Incentives in newsrooms and journalistic organizations
* (1:12:04) AI for video generation and implications, limits of creativity
* (1:17:20) Skill and creativity
* (1:22:35) Joss’s new YouTube channel!
* (1:23:29) Outro

Links:

* Joss’s website and playlist of selected work
* AI-focused videos:
* AI Art, Explained (2022)
* AI can do your homework. Now what? (2023)
* Computers just got a lot better at writing (2020)
* Facebook showed this ad to 95% women. Is that a problem? (2020)
* What facial recognition steals from us (2019)
* The big debate about the future of work (2017)
* AI and Creativity short film for Runway’s AIFF (2023)
* Others:
* Is Beauty Culture Hurting Us? from Glad You Asked (2020)
* Joss’s Scientific American videos :)

Get full access to The Gradient at thegradientpub.substack.com/subscribe
In episode 116 of The Gradient Podcast, Daniel Bashir speaks to Kate Park.

Kate is the Director of Product at Scale AI. Prior to joining Scale, Kate worked on Tesla Autopilot as the AI team’s first and lead product manager, building the industry’s first data engine. She has also published research on spoken natural language processing, as well as a travel memoir.

Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at editor@thegradient.pub

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (01:11) Kate’s background
* (03:22) Tesla and cameras vs. Lidar, importance of data
* (05:12) “Data is key”
* (07:35) Data vs. architectural improvements
* (09:36) Effort for data scaling
* (10:55) Transfer of capabilities in self-driving
* (13:44) Data flywheels and edge cases, deployment
* (15:48) Transition to Scale
* (18:52) Perspectives on shifting to transformers and data
* (21:00) Data engines for NLP vs. for vision
* (25:32) Model evaluation for LLMs in data engines
* (27:15) InstructGPT and data for RLHF
* (29:15) Benchmark tasks for assessing potential labelers
* (32:07) Biggest challenges for data engines
* (33:40) Expert AI trainers
* (36:22) Future work in data engines
* (38:25) Need for human labeling when bootstrapping new domains or tasks
* (41:05) Outro

Links:

* Scale Data Engine
* OpenAI case study

Get full access to The Gradient at thegradientpub.substack.com/subscribe
In episode 115 of The Gradient Podcast, Daniel Bashir speaks to Ben Wellington.

Ben is the Deputy Head of Feature Forecasting at Two Sigma, a financial sciences company. Ben has been at Two Sigma for more than 15 years, and currently leads efforts focused on natural language processing and feature forecasting. He is also the author of the data science blog I Quant NY, which has influenced local government policy, including changes in NYC street infrastructure and the design of NYC subway vending machines. Ben is a Visiting Assistant Professor in the Urban and Community Planning program at the Pratt Institute in Brooklyn, where he teaches statistics using urban open data. He holds a Ph.D. in Computer Science from New York University.

Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at editor@thegradient.pub

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (01:30) Ben’s background
* (04:30) Why Ben was interested in NLP
* (05:48) Ben’s work on translational equivalence, dominant techniques
* (10:14) Scaling, large datasets at Two Sigma
* (12:50) Applying ML techniques to quantitative finance, features in financial ML systems
* (17:27) Baselines and time-dependence in constructing features, human knowledge
* (19:23) Black box models in finance
* (24:00) Two Sigma’s presence in the AI research community
* (26:55) Short- and long-term research initiatives at Two Sigma
* (30:42) How ML fits into Two Sigma’s investment strategy
* (34:05) Alpha and competition in investing
* (36:13) Temporality in data
* (40:38) Challenges for finance/AI and beating the market
* (44:36) Reproducibility
* (49:47) I Quant NY and storytelling with data
* (56:43) Descriptive statistics and stories
* (1:01:05) Benefits of simple methods
* (1:07:11) Outro

Links:

* Ben’s work on translational equivalence and scalable discriminative learning
* Two Sigma Insights
* Storytelling with data and I Quant NY

Get full access to The Gradient at thegradientpub.substack.com/subscribe
“There is this move from generality in a relative sense of ‘we are not as specialized as insects’ to generality in the sense of omnipotent, omniscient, godlike capabilities. And I think there’s something very dangerous that happens there, which is you start thinking of the word ‘general’ in completely unhinged ways.”

In episode 114 of The Gradient Podcast, Daniel Bashir speaks to Venkatesh Rao.

Venkatesh is a writer and consultant. He has been writing the widely read Ribbonfarm blog since 2007, and more recently, the popular Ribbonfarm Studio Substack newsletter. He is the author of Tempo, a book on timing and decision-making, and is currently working on his second book, on the foundations of temporality. He has been an independent consultant since 2011, supporting senior executives in the technology industry. His work in recent years has focused on the AI, semiconductor, sustainability, and protocol technology sectors. He holds a PhD in control theory (2003) from the University of Michigan. He is currently based in the Seattle area, and enjoys dabbling in robotics in his spare time. You can learn more about his work at venkateshrao.com.

Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at editor@thegradient.pub

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (01:38) Origins of Ribbonfarm and Venkat’s academic background
* (04:23) Voice and recurring themes in Venkat’s work
* (11:45) Patch models and multi-agent systems: integrating philosophy of language, balancing realism with tractability
* (21:00) More on abstractions vs. tractability in Venkat’s work
* (29:07) Scaling of industrial value systems, characterizing AI as a discipline
* (39:25) Emergent science, intelligence and abstractions, presuppositions in science, generality and universality, cameras and engines
* (55:05) Psychometric terms
* (1:09:07) Inductive biases (yes, I mentioned the No Free Lunch Theorem and then just talked about the definition of inductive bias and not the actual theorem 🤡)
* (1:18:13) LLM training and efficiency, comparing LLMs to humans
* (1:23:35) Experiential age, analogies for knowledge transfer
* (1:30:50) More clarification on the analogy
* (1:37:20) Massed Muddler Intelligence and protocols
* (1:38:40) Introducing protocols and the Summer of Protocols
* (1:49:15) Evolution of protocols, hardness
* (1:54:20) LLMs, protocols, time, future visions, and progress
* (2:01:33) Protocols, drifting from value systems, friction, compiling explicit knowledge
* (2:14:23) Directions for ML people in protocols research
* (2:18:05) Outro

Links:

* Venkat’s Twitter and homepage
* Mediocre Computing
* Summer of Protocols and 2024 Call for Applications (apply!)
* Essays discussed:
* Patch models and their applications to multivehicle command and control
* From Mediocre Computing:
* Text is All You Need
* Magic, Mundanity, and Deep Protocolization
* A Camera, Not an Engine
* Massed Muddler Intelligence
* On protocols:
* The Unreasonable Sufficiency of Protocols
* Protocols Don’t Build Pyramids
* Protocols in (Emergency) Time
* Atoms, Institutions, Blockchains

Get full access to The Gradient at thegradientpub.substack.com/subscribe
In episode 113 of The Gradient Podcast, Daniel Bashir speaks to Professor Sasha Rush.

Professor Rush is an Associate Professor at Cornell University and a Researcher at HuggingFace. His research aims to develop natural language processing systems that are safe, fast, and controllable. His group is interested primarily in tasks that involve text generation, and they study data-driven probabilistic methods that combine deep-learning based models with probabilistic controls. He is also interested in open-source NLP and deep learning, and develops projects to make deep learning systems safer, clearer, and easier to use.

Have suggestions for future podcast guests (or other feedback)? Let us know here or reach us at editor@thegradient.pub

Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS

Follow The Gradient on Twitter

Outline:

* (00:00) Intro
* (01:47) Professor Rush’s background
* (03:23) Professor Rush’s reflections on prior work—importance of learning and inference
* (04:58) How much engineering matters in deep learning, the Rush vs. Frankle Bet
* (07:12) On encouraging and incubating good research
* (10:50) Features of good research environments
* (12:36) 5% bets in Professor Rush’s research: State-Space Models (SSMs) as an alternative to Transformers (see the sketch after the links)
* (15:58) SSMs vs. Transformers
* (18:53) Probabilistic Context-Free Grammars—are (P)CFGs worth paying attention to?
* (20:53) Sequence-level knowledge distillation: approximating sequence-level distributions
* (25:08) Pruning and knowledge distillation — orthogonality of efficiency techniques
* (26:33) Broader thoughts on efficiency
* (28:31) Works on prompting
* (28:58) Prompting and In-Context Learning
* (30:05) Thoughts on mechanistic interpretability
* (31:25) Multitask prompted training enables zero-shot task generalization
* (33:48) How many data points is a prompt worth?
* (35:13) Directions for controllability in LLMs
* (39:11) Controllability and safety
* (41:23) Open-source work, deep learning libraries
* (42:08) A story about Professor Rush’s post-doc at FAIR
* (43:51) The impact of PyTorch
* (46:08) More thoughts on deep learning libraries
* (48:48) Levels of abstraction, PyTorch as an interface to motivate research
* (50:23) Empiricism and research commitments
* (53:32) Outro

Links:

* Research
* Early work / PhD:
* Dual Decomposition and LP Relaxations
* Vine Pruning for Efficient Multi-Pass Dependency Parsing
* Improved Parsing and POS Tagging Using Inter-Sentence Dependency Constraints
* Research — interpretable and controllable natural language generation:
* Compound Probabilistic Context-Free Grammars for Grammar Induction
* Multitask prompted training enables zero-shot task generalization
* Research — deep generative models:
* A Neural Attention Model for Abstractive Sentence Summarization
* Learning Neural Templates for Text Generation
* How many data points is a prompt worth?
* Research — efficient algorithms and hardware for speech, translation, dialogue:
* Sequence-Level Knowledge Distillation
* Open-source work:
* NamedTensor
* Torch Struct

Get full access to The Gradient at thegradientpub.substack.com/subscribe
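On the state-space-model thread (12:36): the heart of an SSM layer is a discretized linear recurrence, which keeps a fixed-size state per step instead of attending over the whole history. A minimal single-channel sketch with made-up toy parameters (illustrative only, not any specific published architecture):

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Compute y_t = C x_t where x_t = A x_{t-1} + B u_t, scanning over inputs u."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_t in u:             # O(1) state per step: the recurrent inference mode
        x = A @ x + B * u_t
        ys.append(C @ x)
    return np.array(ys)

d = 4
A = 0.9 * np.eye(d)           # stable toy dynamics (spectral radius < 1)
B = np.ones(d)
C = np.ones(d) / d
y = ssm_scan(A, B, C, u=np.sin(np.linspace(0.0, 6.0, 50)))
print(y.shape)  # (50,)
```

Because the map from u to y here is linear and time-invariant, the same computation can be trained in a convolutional view over the whole sequence, which is roughly the efficiency argument for SSMs as a Transformer alternative discussed in the episode.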
Comments (1)

Fabiano PS

I'm not an academic; this talk goes deep into the academic meta, and for me that's really not interesting. 2/10

May 4th