Bioinfohazards: Jassi Pannu on Controlling Dangerous Data from which AI Models Learn

Update: 2026-03-11

Digest

This podcast delves into the complex intersection of artificial intelligence and biological research, highlighting both its immense potential for advancement and the significant biosecurity risks it presents. The discussion emphasizes the urgent need for robust access control systems for biological data to prevent AI-driven misuse, particularly concerning the design of dangerous pathogens. It covers current biosecurity challenges, the risks associated with gain-of-function research, and how AI can accelerate biological discovery while also enabling sophisticated biological experiment design by extremist groups. The episode explores strategies for mitigating these risks, including the feasibility of targeted data controls that can reduce AI model performance on dangerous tasks without hindering beneficial applications.

A comprehensive "defense in depth" strategy is proposed, encompassing pre-synthesis screening, global pathogen surveillance, and frontline defenses. The podcast details a proposed tiered framework (BDL 0-4) for controlling access to biological data based on its potential for harm, inspired by biosafety levels. Empirical evidence supporting the effectiveness of data holdouts in AI models is presented, alongside discussions on operationalizing data controls through Trusted Research Environments (TREs).

Further topics include the evolution of gene synthesis screening, the future of autonomous cloud labs and their cybersecurity needs, and the importance of inter-company communication in the gene synthesis industry. The "defense in depth" strategy is broken down into four pillars: delay, deter, detect, and defend, with detailed explanations of each. The discussion also touches upon environmental defenses like air sterilization and the challenges of deterrence. Finally, the podcast concludes with a call to action for proactive measures in biological security and a reflection on the perils of unchecked biological knowledge.

Outlines

00:00:00
Introduction to Biosecurity and AI Risks

Introduction to the podcast and guest Jassi Pannu, discussing the critical need for access control systems for biological data to prevent AI-driven misuse, particularly concerning dangerous pathogen design. Overview of current biosecurity challenges, including virus detection, patient data aggregation, and the vaccine development pipeline.

00:00:54
Gain-of-Function Research and AI's Accelerating Threat

Discussion on the risks of gain-of-function research, its legality despite defunding, and the increasing threat from extremist groups due to AI advancements enabling sophisticated biological experiment design. Examples of advanced AI capabilities, such as troubleshooting lab experiments from images and AI agents making research progress autonomously, highlighting AI's potential to accelerate biological discovery and exploit data.

00:02:45
Strategic Data Controls for AI Safety

The good news: strategically excluding specific datasets (like human virus DNA) can reduce AI model performance on dangerous tasks without hindering desirable capabilities, suggesting targeted data controls are feasible.

00:03:03
Comprehensive Biosecurity Strategies and Data Frameworks

Discussion of a comprehensive "defense in depth" strategy for biosecurity, including pre-synthesis screening, global pathogen surveillance (e.g., wastewater monitoring), and frontline defenses like PPE and sterilization. Detailed exploration of biological data types, the abundance of sequence data, and the proposed tiered framework (BDL 0-4) for controlling access to functional data linked to dangerous pathogen properties.

00:41:02
Regulation of Gain-of-Function and AI Biological Risks

Examination of the legality and regulation of gain-of-function research, the challenges in controlling information versus physical samples, and the reasons for limited legislative action on rare, high-consequence events. Categorization of AI models (LLMs, bio-design tools, foundation models) and their associated risks in the biological domain, emphasizing the potential for misuse and the need for careful consideration of capabilities.

01:01:43
Differentiating Desired AI Capabilities and Harm Pathways

Discussion on identifying and controlling AI capabilities in biology, distinguishing between beneficial applications (e.g., virtual cell models) and potentially destabilizing ones (e.g., viral design), focusing on offense dominance. Analysis of how AI-driven biological advances can be used for harm, comparing direct vs. multi-step pathways and the expertise required, and questioning the likelihood of biological weapons being chosen over existing options.

01:08:03
The Biological Data Level (BDL) Framework Explained

Detailed explanation of the proposed Biological Data Level (BDL) framework, inspired by biosafety levels, to control access to biological data based on its potential for harm, preserving open access for most data.

01:14:16
Empirical Evidence and Strategies for Data Control

Discussion of empirical studies on data holdouts in AI models like ESM3 and EVO2, demonstrating that removing specific datasets (e.g., human virus sequences) significantly reduces their performance on dangerous tasks. Exploration of strategies for controlling AI models trained on sensitive data, the role of private agreements and consensus, and the progress in regulating wet lab gain-of-function research and AI biosecurity efforts.

01:23:02
Operationalizing Controls: Trusted Research Environments

Discussion on operationalizing data controls through Trusted Research Environments (TREs), where researchers bring code to secure data, potentially managed by institutions or private actors, to facilitate research while maintaining security.

01:26:41
Monitoring Future Threats and AI Arms Race

Examining monitoring systems like DNA synthesis screening and wastewater monitoring, and discussing the potential for an AI-driven arms race in biosecurity, with a focus on achieving a defense-dominant world.

01:31:44
Gene Synthesis, Cloud Labs, and Information Infrastructure

Discusses gene synthesis screening, including KYC, and its evolution into a more cost-effective, voluntary system. Explores the need for mandatory screening to prevent misuse of pathogen sequences. Envisions a future with autonomous cloud labs controlled remotely and highlights the need for robust cybersecurity. Addresses the lack of a unified information infrastructure for sharing data between gene synthesis companies.

01:35:20
A Layered Defense-Dominant Strategy for Biosecurity

Proposes a "defense in depth" strategy for biological security, breaking it down into four pillars: delay, deter, detect, and defend. Details these pillars, including limiting access, punishing misuse, surveillance systems, and countermeasures. Explores environmental defenses like air sterilization and calls for proactive measures in biological security.

Keywords

Biosecurity


The protection of a nation's people, territory, and interests against biological threats, including preventing the development and use of biological weapons and mitigating outbreaks.

Gain-of-Function Research


Research that modifies pathogens to increase transmissibility or lethality, carrying risks of accidental release or intentional misuse.

AI Agents


Autonomous software that can perform tasks and make decisions without direct human intervention, raising concerns about potential misuse in biological research.

Data Controls


Mechanisms to restrict access to or regulate the use of sensitive biological data to prevent misuse in developing dangerous biological capabilities.

Trusted Research Environments (TREs)


Secure platforms allowing researchers to analyze sensitive data without it leaving its secure location, facilitating beneficial research while minimizing misuse risks.

Biosafety Levels (BSL)


A system of containment precautions for working with infectious agents, dictating safety measures based on the risk of the agent.

Defense in Depth


A security strategy using multiple layers of defense (delay, deter, detect, defend) to protect against biological threats.

Synthetic Biology


The design and construction of new biological parts, devices, and systems for useful purposes, enabling the creation of novel organisms.

Gene Synthesis Screening


A process to vet orders for DNA sequences, preventing the creation of dangerous pathogens through customer checks and sequence assessment.

Autonomous Cloud Lab


A remotely controlled biological laboratory requiring minimal human intervention, aiming to accelerate biological research and development.

Q&A

  • What are the primary concerns regarding AI's role in biological data and research?

    AI can accelerate the discovery and design of dangerous pathogens, potentially enabling extremist groups or lone actors to create novel viruses with high transmissibility and lethality, moving threats from theoretical to practical.

  • How does the proposed Biological Data Level (BDL) framework aim to mitigate risks?

    The BDL framework, tiered from 0 to 4, categorizes biological data based on its potential for harm. It aims to keep most data open access while imposing increasing controls on functional data linked to dangerous pathogen properties like transmissibility and immune evasion.

  • What is gain-of-function research and why is it controversial?

    Gain-of-function research modifies pathogens to enhance traits like transmissibility or virulence. It's controversial due to the risk of accidental lab release or intentional misuse, potentially creating pandemic-level threats, despite its potential for understanding viruses.

  • Can excluding certain data from AI training significantly reduce its dangerous capabilities?

    Yes, empirical studies on models like EVO2 and ESM3 show that strategically removing datasets, such as sequences of human-infecting viruses, dramatically reduces their performance on dangerous tasks like viral design, without compromising beneficial capabilities.

  • What are Trusted Research Environments (TREs) and how do they fit into biosecurity?

    TREs are secure platforms where researchers can analyze data without it leaving its secure location. They are proposed as a way to operationalize data controls, allowing legitimate researchers to access sensitive biological data for beneficial purposes while preventing misuse.

  • Why is it difficult for governments to legislate against certain types of biological research?

    Governments are better at legislating frequent occurrences. High-consequence events like pandemics are rare, leading to fluctuating policy focus. Additionally, controlling information versus physical samples presents a significant challenge for regulation.

  • What is gene synthesis screening and why is it important?

    Gene synthesis screening involves vetting orders for DNA sequences to prevent the creation of dangerous pathogens. It includes Know Your Customer (KYC) checks and assessing the nature of the requested sequences, aiming to enhance biosecurity.

  • What are the four pillars of a defense-dominant strategy for biological security?

    The four pillars are: Delay (limiting access to concerning capabilities), Deterrence (punishing the use of biological weapons), Detection (surveillance systems for new pathogens), and Defense (vaccines, countermeasures, and environmental controls).

  • What are some examples of environmental defenses against biological threats?

    Environmental defenses include centrally filtered water systems, screens on windows to prevent insect-borne diseases, and emerging technologies like Far UV and glycol vapors for passively sterilizing the air.

  • What is the main challenge with deterrence as a strategy against biological weapons?

    Deterrence relies on the assumption that actors are rational and respond to punishment. However, this strategy breaks down if the actor is irrational or does not respond to typical punishment mechanisms.

Show Notes

Jassi Pannu, Assistant Professor at Johns Hopkins, explains how rapidly advancing AI is transforming biological research and raising the risk of engineered pandemics. They map today’s biosecurity landscape, from pathogen detection and DNA sequencing to vaccine development, and examine how frontier models can already troubleshoot lab work and bypass data safeguards. The conversation introduces a proposed Biosecurity Data Level framework to restrict only the most dangerous functional biological data while preserving open science. They close with a broader defense-in-depth strategy—Delay, Deter, Detect, Defend—including DNA synthesis screening, global pathogen surveillance, and practical tools like Far UV sterilization.




LINKS:



Sponsors:


VCX:

VCX, by Fundrise, is the public ticker for private tech, giving everyday investors access to high-growth private companies in AI, space, defense tech, and more. Learn how to invest at https://getvcx.com


Framer:

Framer is an enterprise-grade website builder that lets business teams design, launch, and optimize their .com with AI-powered wireframing, real-time collaboration, and built-in analytics. Start building for free and get 30% off a Framer Pro annual plan at https://framer.com/cognitive


Claude:

Claude is the AI collaborator that understands your entire workflow, from drafting and research to coding and complex problem-solving. Start tackling bigger problems with Claude and unlock Claude Pro’s full capabilities at https://claude.ai/tcr


Tasklet:

Tasklet is an AI agent that automates your work 24/7; just describe what you want in plain English and it gets the job done. Try it for free and use code COGREV for 50% off your first month at https://tasklet.ai




CHAPTERS:


(00:00) About the Episode

(05:59) From outbreak to vaccine

(17:08) Threat actors and data (Part 1)

(21:23) Sponsors: VCX | Framer

(23:53) Threat actors and data (Part 2)

(31:05) Gain-of-function research risks (Part 1)

(37:39) Sponsors: Claude | Tasklet

(41:03) Gain-of-function research risks (Part 2)

(48:05) AI models in biology

(01:00:51) Dangerous AI capabilities

(01:07:59) Biosecurity data level framework

(01:18:58) Policy, governance, and infrastructure

(01:28:53) Defense in depth vision

(01:40:43) Episode Outro

(01:45:02) Outro




PRODUCED BY:


https://aipodcast.ing







Erik Torenberg, Nathan Labenz