Unlocking Cells' Secrets: Diffusion, Deconvolution, & Discovery with Siyu He, author of Squidiff & CORAL
Digest
This podcast discusses the application of AI in biological research, focusing on two key projects: Squidiff, a diffusion model that predicts cellular responses to perturbations from transcriptome data; and CORAL, a system that integrates low- and high-resolution tissue data via graph neural networks to create detailed tissue maps. The conversation highlights the accelerating pace of AI in biology, which could deliver a century's worth of progress in the next decade. Challenges in accessing and using biological data are addressed, with emphasis on the need for a large-scale initiative, similar to the Manhattan Project, to unlock this data for AI-driven healthcare advancements. Existing projects like Google DeepMind's virtual cells and the Chan Zuckerberg Initiative's "Billion Cells" project illustrate current efforts to build large-scale biological models. The podcast concludes with an optimistic outlook on AI's potential to revolutionize healthcare by improving disease treatment and extending lifespans, while acknowledging the importance of responsible AI development.
Outlines

Introduction to AI in Cellular Systems and Disease Mechanisms
Introduction of Siyu He's research using AI with single-cell and spatial transcriptomics to understand cellular systems and disease mechanisms. This includes discussion of the complexity of biological systems and the challenge of understanding cause-and-effect relationships.

Squidiff and CORAL: AI Models for Cellular and Tissue Analysis
Detailed explanation of Squidiff, a diffusion model that predicts cellular responses, and CORAL, a system that integrates low- and high-resolution tissue data using graph neural networks. The models' efficiency and potential to save researchers time and resources are highlighted.

The Rapid Advancement of AI in Biology
Discussion on the rapid progress in AI and its potential to revolutionize biological research, potentially leading to a century's worth of progress in the next decade.

Biological Data Liberation and AI-Driven Healthcare
Discussion on the challenges of limited biological data and the potential for a large-scale initiative to unlock this data for AI-driven medical advancements, including ethical considerations of data privacy and the potential benefits of increased data availability.

Current AI Projects and Future Applications in Biological Modeling
Exploration of existing projects like Google DeepMind's virtual cells and the Chan Zuckerberg Initiative's "Billion Cells" project, showcasing current efforts in creating large-scale biological models using AI. Potential applications, including virtual doctors and personalized medicine, are discussed.

The Future of AI in Healthcare: Optimism and Cautions
The conversation concludes with an optimistic outlook on AI's potential to revolutionize healthcare, emphasizing the positive impact on disease treatment and lifespan extension. Concerns about uncontrolled AI development are acknowledged, but the focus remains on the potential benefits of responsible AI development in the medical field.
Keywords
Transcriptome
The full set of RNA transcripts in a cell, used as a readout of gene expression and crucial for understanding cellular activity and disease mechanisms. Single-cell RNA sequencing provides high-resolution transcriptome profiling.
Diffusion Model
A generative AI model used in Squidiff to predict cellular responses by adding noise to transcriptome data and learning to remove it.
Graph Neural Network (GNN)
A neural network used in CORAL to analyze graph-structured data, modeling relationships between cells in tissues.
Spatial Transcriptomics
Technology measuring gene expression while preserving cell location, providing information on cell-cell interactions and tissue architecture.
Single-cell RNA Sequencing
Technology profiling gene expression of individual cells, revealing cellular heterogeneity and enabling detailed analysis of cellular states.
AI-driven Healthcare
Application of AI to improve healthcare, including disease diagnosis, treatment development, and personalized medicine.
Biological Data Liberation
Making restricted biological data accessible for research and development, navigating privacy concerns for medical advancements.
Virtual Cells
Digital representations of biological cells, enabling simulation of cellular processes and accelerating drug discovery.
Large-Scale Biological Models
Complex computational models simulating entire biological systems using AI and vast amounts of biological data.
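To make the "adding and removing noise" idea in the Diffusion Model entry concrete, here is a minimal sketch of the noise-adding (forward) half of a standard denoising diffusion process, applied to a toy expression vector. It is not taken from the Squidiff paper: the linear schedule, its parameter values, the function names, and the four-gene example are all assumptions chosen for illustration. The noise-removing half is a trained neural network and is not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule: one beta per diffusion step."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta_t), the fraction of signal variance kept."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(x0, t, alpha_bars, rng):
    """Draw x_t given x_0: sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + noise_scale * e for x, e in zip(x0, noise)]
    return xt, noise


rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

# Toy stand-in for one cell's normalized expression of four genes (made up).
expression = [2.3, 0.0, 1.1, 4.7]

slightly_noisy, _ = add_noise(expression, t=10, alpha_bars=alpha_bars, rng=rng)
mostly_noise, _ = add_noise(expression, t=999, alpha_bars=alpha_bars, rng=rng)
```

At an early step the vector stays close to the original profile, while at the last step almost no signal remains. A denoising network is trained to predict the added noise so that this process can be run in reverse to generate new profiles.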
Q&A
What are the main goals of Squidiff and CORAL?
Squidiff predicts cellular responses efficiently; CORAL integrates low- and high-resolution tissue data for a comprehensive tissue view.
How can these AI models accelerate biological research and clinical applications?
They reduce experiment time and cost, enabling hypothesis testing and personalized medicine through prediction of individual patient responses.
What are the major obstacles to leveraging AI for advancements in healthcare?
Limited biological data availability due to privacy regulations and data silos.
How could a large-scale initiative accelerate AI-driven healthcare advancements?
By unlocking vast amounts of biological data, funding research, and fostering collaboration.
What are some examples of current AI projects impacting biological modeling?
Google DeepMind's virtual cells and the Chan Zuckerberg Initiative's "Billion Cells" project.
Show Notes
In this episode of the Cognitive Revolution, we hear from Siyu He, a postdoc at Stanford specializing in biomedical data science. Siyu discusses the methods and implications of two recent AI-driven biological research papers, Squidiff and CORAL. The conversation explores the use of AI models to analyze complex cellular systems and disease mechanisms, focusing on transcriptome and tissue sample analyses. Squidiff aims to simulate cellular transcriptomes to predict outcomes under various conditions, significantly expediting traditionally lengthy and expensive biological experiments. CORAL extends this by integrating different levels of biological data, enabling a more comprehensive understanding of tissue structures and cellular interactions. The discussion also covers the challenges of using synthetic data to validate AI models and the potential for AI to accelerate scientific discovery in biomedical research. The episode highlights both the future possibilities and the current limitations of research at the intersection of AI and biology.
Squidiff: https://www.biorxiv.org/content/10.1101/2024.11.16.623974v1.full.pdf
CORAL: https://www.biorxiv.org/content/10.1101/2025.02.01.636038v1.full.pdf
SPONSORS:
Oracle Cloud Infrastructure (OCI): Oracle's next-generation cloud platform delivers blazing-fast AI and ML performance with 50% less for compute and 80% less for outbound networking compared to other cloud providers. OCI powers industry leaders like Vodafone and Thomson Reuters with secure infrastructure and application development capabilities. New U.S. customers can get their cloud bill cut in half by switching to OCI before March 31, 2024 at https://oracle.com/cognitive
Shopify: Shopify is revolutionizing online selling with its market-leading checkout system and robust API ecosystem. Its exclusive library of cutting-edge AI apps empowers e-commerce businesses to thrive in a competitive market. Cognitive Revolution listeners can try Shopify for just $1 per month at https://shopify.com/cognitive
NetSuite: Over 41,000 businesses trust NetSuite by Oracle, the #1 cloud ERP, to future-proof their operations. With a unified platform for accounting, financial management, inventory, and HR, NetSuite provides real-time insights and forecasting to help you make quick, informed decisions. Whether you're earning millions or hundreds of millions, NetSuite empowers you to tackle challenges and seize opportunities. Download the free CFO's guide to AI and machine learning at https://netsuite.com/cognitive
CHAPTERS:
(00:00) About the Episode
(03:37) Introduction and Guest Welcome
(04:00) Setting the Big Picture Context
(04:37) Exploring the Squidiff and CORAL Papers
(08:31) Understanding Transcriptomes
(11:17) Single Cell RNA Sequencing Technology
(15:32) Motivation Behind Squidiff (Part 1)
(17:14) Sponsors: Oracle Cloud Infrastructure (OCI) | Shopify
(19:41) Motivation Behind Squidiff (Part 2)
(25:56) Training Data and Model Architecture (Part 1)
(31:38) Sponsors: NetSuite
(33:11) Training Data and Model Architecture (Part 2)
(37:18) Diffusion Models in Biology
(46:07) In Silico Experiments and Applications
(54:25) Clarifying the Validation Process
(55:36) Validation Strategies and Real Data
(58:26) Challenges in Modeling and Predictions
(01:02:14) Accelerating Research with AI Models
(01:07:31) Future Directions and Collaboration
(01:10:46) Introduction to CORAL Paper
(01:13:09) Spatial Transcriptomics and Proteomics
(01:17:10) Challenges in Integrating Spatial Data
(01:31:53) Synthetic Data and Model Validation
(01:36:42) The Future of AI in Healthcare
(01:43:31) Outro
SOCIAL LINKS:
Website: https://www.cognitiverevolution.ai
Twitter (Podcast): https://x.com/cogrev_podcast
Twitter (Nathan): https://x.com/labenz
LinkedIn: https://linkedin.com/in/nathanlabenz/

