Aviv Regev on Genomics’ Quality Problem: Ion Genomics Newsletter, June 16, 2026
Roche Axelios at ESHG, shareholder lawsuits, Adaptive to focus on MRD, the limits of training virtual cell models, The Love Hypothesis movie gets a release date, and more.
Genomics has a quality problem.
“For a long time — this is really an ethos, both of very established biological methods and also of genomics — the idea [behind] figuring out biology was that it is a quality problem,” said Aviv Regev, Genentech’s head of research and early development, in a June 12 presentation at Rahul Satija’s Single-Cell Genomics Day virtual workshop. “[People thought] ‘We need a great way to look at everything with exceptional precision in each experiment,’ but what I’m going to try and convince you, at least today, is that when the quantity of data grows large enough, this quantity becomes a new quality. It’s still far smaller than all the possibilities, but it’s big enough to let AI models systematically figure out the rest of the problem for us.”
Even recently, computational modeling in genomics faced sheer numbers that made progress impossible. “Imagine we wanted to [...] be able to predict, for any sequence of DNA, say just over 80 letters long, what level of expression it would drive,” Regev said. “There’s 4^80 possible sequences of length 80. If I made all of them and put them in a tube, its mass would be more than 10 times the mass of the Earth. I can’t do experiments as big as that.”
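For anyone who wants to sanity-check that figure, here is a rough back-of-the-envelope sketch. It is my own arithmetic, not something shown in the talk. It assumes one single-stranded copy of every 80-letter sequence and an average of roughly 330 g/mol per nucleotide, which is a common rule of thumb; the constant names are made up for illustration.

```python
# Rough estimate of the mass of one copy of every possible 80-letter DNA sequence.
# All constants below are assumptions for illustration, not figures from the talk.

SEQ_LENGTH = 80                 # letters per sequence
ALPHABET_SIZE = 4               # A, C, G, T

GRAMS_PER_MOL_PER_NT = 330.0    # assumed average mass per nucleotide (single-stranded)
AVOGADRO = 6.02214076e23        # molecules per mole
EARTH_MASS_KG = 5.972e24        # approximate mass of the Earth

# Number of distinct sequences: 4^80
n_sequences = ALPHABET_SIZE ** SEQ_LENGTH

# Mass of a single 80-letter molecule, in grams
grams_per_molecule = SEQ_LENGTH * GRAMS_PER_MOL_PER_NT / AVOGADRO

# Total mass of one copy of each sequence, in kilograms
total_mass_kg = n_sequences * grams_per_molecule / 1000.0
earth_masses = total_mass_kg / EARTH_MASS_KG

print(f"{n_sequences:.2e} possible sequences")
print(f"{total_mass_kg:.2e} kg, or about {earth_masses:.1f} Earth masses")
```

Under these assumptions the total lands at roughly ten Earth masses, in line with Regev's claim. A lighter per-nucleotide estimate or double-stranded DNA would shift the answer by a factor of two or so, but not the order of magnitude.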
However, CRISPR-based Perturb-seq approaches can help, Regev showed in her presentation, weaving a through line from the early days of the single-cell sequencing field, which she and Satija, a former member of her lab, helped build. Even as she laid out the staggering, soul-dwarfing scale of the problem facing virtual cell modelers like her, it appeared as though she had already begun laying a path to making it tractable.
There is no better person to explain how Perturb-seq experiments can contribute to the AI revolution already underway in biology. Regev not only helped develop the concept of these pooled CRISPR screens a decade ago, she has also been putting them into practice at Genentech. Just a few weeks ago, I heard from my podcast guest Alexander Nevue that she has developed a new spatial version of this method. In parallel, she has been at the forefront of creating a vision for how new foundation AI models can help create virtual cells and help systematically pick apart the many rules of biology that have remained hidden to humans.
I’ve been covering this area for nearly two years now and at every turn, Perturb-seq is brought up as a way to create the data that the next waves of models will need to be trained on. The idea is simple enough: CRISPR lets you systematically poke at the genome and you can read out the effects in the transcription of a single cell. But how that fits into a grander plan of interrogating all of human biology had remained elusive to me. Regev’s talk helped.
“This Perturb-seq approach, especially when driven together with computation, can be applied in many settings to decipher the regulation of pathways … to combine with genetic signal to find functions for variants that increase risk of human disease in co-cultures and organoids, or animals,” she said, “but also to find targets that can yield desired therapeutic effects, or to give cell therapies desired properties, and even identify combinations of interventions.”
But Perturb-seq on its own is just a measure of RNA expression, she noted, and to make a multi-modal virtual cell, biologists would need to “measure everything everywhere all at once,” a challenge in its own right. New AI models, however, are good at translating between different data “languages,” and she showed how her team was able to use AI to predict spatial transcriptomic profiles from H&E staining images. Not only that, they could basically generate virtual Perturb-seq experiment data from those H&E images. 🤯
“So, we can do much more for less and get much more than you think, and that might be it, right? Well, not so fast, because there’s even bigger numbers yet when we consider combinations.
So, let’s look at these numbers super briefly: testing, experimentally, five-way combinations over 20,000 genes would mean profiling 10^21 cells for just one cell type under one condition. For reference, an adult human body has on the order of 10^13 cells, so it’s not going to happen. If we look at the pairwise combinations, that’s 10^8 [to] 10^10 cells, one cell type, one condition: unlikely to happen. And if you just look at 600 genes in the Lipopolysaccharide response [in immune cells], that’s 180,000 cells.
18 million cells can be done as a comprehensive experiment, absolutely, but it’s a lot, and it’s just one cell type in one condition, and somehow doesn’t feel like the right thing to do. And so what if we used AI?
What if, instead of 180,000 pairs, we used AI to predict the 600 pairs most informative to test in order to get a better model, and then complete the rest? … Such iterative Perturb-seq is actually a real lab-in-the-loop. So now we don’t just start with cells, but also with predictions to select the first 600 combinations, plus a slew of positive and negative controls. We collect the data, we feed it again to the AI, we generate more predictions, we feed the next 600 and as we run through a couple of these iterations, we get a model that outperforms on predictions, versus a model that got to see as much data — but where the iterations were a strong baseline of random pairs over the same set of impactful genes.
In this way, as we work iteratively through sequences, expression, and images across systems and levels, we will be able to build a foundation model of cell biology, or a virtual cell that is a match to these massive numbers of biology.”
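The combination counts in that passage can be reproduced with a few lines of arithmetic. This is a sketch of my own, not Regev's calculation. The talk excerpt does not say how many cells are profiled per combination, so `CELLS_PER_COMBINATION = 100` below is purely my assumption, chosen because it makes the quoted figures line up.

```python
from math import comb

# Assumption (not stated in the talk): cells profiled per perturbation combination.
CELLS_PER_COMBINATION = 100

GENOME_GENES = 20_000   # genes considered genome-wide, as quoted
LPS_GENES = 600         # genes in the LPS response set, as quoted

# Unordered combinations of genes ("n choose k")
five_way_combos = comb(GENOME_GENES, 5)
genome_pairs = comb(GENOME_GENES, 2)
lps_pairs = comb(LPS_GENES, 2)

# Cells needed if every combination is profiled CELLS_PER_COMBINATION times
five_way_cells = five_way_combos * CELLS_PER_COMBINATION
genome_pair_cells = genome_pairs * CELLS_PER_COMBINATION
lps_pair_cells = lps_pairs * CELLS_PER_COMBINATION

print(f"five-way, genome-wide: {five_way_combos:.1e} combos, {five_way_cells:.1e} cells")
print(f"pairwise, genome-wide: {genome_pairs:.1e} pairs, {genome_pair_cells:.1e} cells")
print(f"pairwise, 600 genes:   {lps_pairs:,} pairs, {lps_pair_cells:,} cells")
```

With that assumption, the five-way experiment comes out on the order of 10^21 cells, genome-wide pairs span roughly 10^8 combinations to 10^10 cells, and the 600-gene set gives about 180,000 pairs and about 18 million cells. The 600 pairs per round that Regev describes would be a fraction of a percent of the 600-gene pair space.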
In addition to providing the big-picture takes, Regev’s talk showed how the little pieces she’s built along the way have led to actual insights for Genentech, albeit one chunk of (disease) biology at a time. It’s tantalizing to think about how putting it all together could help systematically explore myriad cell types, signaling circuits, tissues, and diseases.
The livestream was publicly available; however, it’s unclear whether this talk will be posted on the Satija Lab’s YouTube page. For anyone who missed it, I’m going to explore how to cover it more deeply. This is something I’ve been considering as part of my value proposition to premium subscribers. How would you like to get more of a peek behind the curtain in these cases?
If you haven’t already, check out the latest Ion Genomics podcast with Wall Street analyst Kyle Mikson. He’s one of the sharpest people following research tools and diagnostics companies like Illumina and Natera. We discussed everything we could think of that happened over the recent first quarter earnings season, including his highlights from the ASCO meeting.

Elsewhere on Substack
Roche Drips Axelios Updates at ESHG
Roche’s Axelios sequencer is proving to be able to get results from very low DNA inputs, according to updates from customers captured yesterday at the annual meeting of the European Society of Human Genetics by sequencing expert Albert Vilella on his Substack.
Broad Clinical Labs has introduced an amplification-free workflow, which only requires 500 ng of DNA input. This allows labs to sequence 16 genomes in 13 hours. And Centogene, a German diagnostics company, has found an application for Axelios in rare disease testing by sequencing low amounts of DNA taken from dried blood spots. “Utilizing just 75 ng of input DNA from 16 historical samples, the platform reliably detected all known clinically relevant variants,” Vilella wrote, noting that the Centogene team “specifically praised the system’s robust handling of challenging homopolymer stretches and its overall coverage quality.”
A Roche exec said that by the time Axelios hits the market, it will be able to do whole-genome sequencing, including from FFPE and cell-free DNA samples in blood, as well as single-cell RNA sequencing — thanks to a partnership with 10x Genomics. By the end of the year, Roche plans to offer bulk RNA sequencing, targeted sequencing, methylation sequencing, and proteomics workflows. Earlier this year, Roche and Olink partnered to integrate the Olink Explore HT with Axelios.
Proteins Predict Lung Cancer
Precision medicine evangelist Eric Topol covered the implications of a new study published in Cell that was able to predict lung cancer using protein biomarkers five years before it was diagnosed.
“The new study takes us to an unprecedented position for identifying and thoroughly validating a 14-protein signature for lung cancer,” he wrote, adding that the researchers took advantage of new high-throughput proteomics methods, a 3,000-protein panel from Olink.
On social media, he noted: “Challenging dogma, the proteins are not coming from cancerous cells!”
Other Genomics News
Samsung Invests $175M in Element Biosciences
The funding was part of a Series E financing round; however, other investors in the round and the total amount raised were not disclosed. The deal makes Samsung the largest shareholder in the San Diego-based sequencing startup.
Grail, GeneDx Face Shareholder Suits
An investor has launched a class action suit against liquid biopsy maker Grail, alleging that company officials misled investors about the chances of success in its key NHS-Galleri trial. Recall that earlier this year, Grail disclosed that the study of multi-cancer early detection tests had not met its primary endpoint, something Mikson and I discussed on last week’s podcast.
Similarly, a GeneDx shareholder has filed a securities class action lawsuit against the company, alleging it misled investors about its recent acquisition of Fabric Genomics, as reported by Huanjia Zhang of GenomeWeb.
Adaptive Biotechnologies to Focus on MRD, Split Business
Adaptive Biotechnologies announced that it is planning to concentrate on MRD testing and kick its immune medicine business out of the nest. The units have been operating separately after a 2024 strategic review.
George Church-Advised AI Startup Raises $50M Seed Round
Radical Numerics, a Bay Area startup aiming to build “general biological intelligence” with AI, raised $50 million, led by Emergence Capital. The company said it has identified applications in cancer diagnostics, drug target identification, and biosecurity. Harvard University genomics expert George Church, a scientific advisor, has already updated his disclosure statement.
Guardant Health Nabs 27th CDx Indication for Guardant360 CDx Assay
The FDA has approved the Guardant360 CDx liquid biopsy test as a companion diagnostic for Boehringer Ingelheim’s Hernexeos (zongertinib), a targeted therapy for adults with HER2 (ERBB2)-mutant advanced non-small cell lung cancer (NSCLC) as an initial treatment option.
University of Exeter to Profile UK Biobank Samples Using Illumina Methylation Array Technology
Funding of £16 million ($21.5 million) comes primarily from the Novo Nordisk Foundation, with additional support from Illumina.
What I’m Reading
A new paper from a leading expert on AI in genomics raises interesting questions for the field about the limits of training virtual cells.
“We evaluated the role of the size and diversity of the training dataset in the performance of single-cell foundation models and found little gain in increasing dataset size beyond a set point,” the authors of a paper published June 9 in Nature Methods wrote in a research briefing accompanying their paper. They were led by Fabian Theis of Germany’s Technical University of Munich.
“Have we reached the limits of data scaling, or only the limits of current objectives?” Theis wrote in a post on LinkedIn. “My guess is that the next generation of biological foundation models will depend less on simply collecting more cells and more on finding the right representation learning principles for biology.”
It’s potentially disheartening news for the makers of single-cell assays, but echoes comments made by virtual cell expert Christina Theodoris during her appearance on the Ion Genomics podcast. “As we increase the diversity, that actually has even more impact than just the pure numbers,” she told me.
Elsewhere on the Internet
The Love Hypothesis, a charming novel about a grad student working on pancreatic cancer diagnostics and her campus run-ins with a surly computational biology professor, has been getting the Hollywood treatment. Last week, we finally got a release date.
https://www.instagram.com/lilireinhart/reel/DZctn1rDcR9/?hl=en
If by chance someone on author Ali Hazelwood’s press team is reading this, expect to receive an invitation to podcast with me shortly.





