Issue #61 // Orthogonal Validation Over Replication
The biology we study is constrained by the biology we can measure. The solution to this problem isn't more reproducibility; it's orthogonal validation using different measurement approaches.
I recently shared the following note, which was a reflection on the fact that molecular biology research typically relies on indirect measurement techniques to infer the presence, abundance, and activity of genes, proteins, and metabolites (as well as the downstream consequences).1
Anyone who works with biological data knows this intuitively, yet it's easy to forget that our measurement technologies themselves are not perfectly transparent and can distort our observations in subtle ways. In computational biology we address this issue through quality control, normalization, and batch correction. But the unsettling reality is that a portion of what we consider to be biological signal might instead reflect variations in how different samples interact with our measurement technologies2.
For example, RNA-sequencing is prone to GC-content bias and amplification preferences, stemming from the physicochemical properties of DNA/RNA, which affect which transcripts we detect and how we quantify them. Similarly, mass spectrometry-based proteomics favors certain peptides based on ionization efficiency and often misses low-abundance proteins entirely, while antibody-based methods depend on epitope accessibility that may not reflect true protein abundance or activity. These errors aren't random, so they don't simply average out given enough data. They're systematic biases inherent to how each measurement technology interrogates biological systems.
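To make the GC-bias point concrete, here's a minimal simulation (entirely synthetic data, not a real dataset or pipeline) in which a GC-dependent capture bias gets baked into measured expression values:

```python
# Synthetic illustration: a GC-dependent capture bias masquerading as signal.
import numpy as np

rng = np.random.default_rng(0)
n = 500
gc = rng.uniform(0.3, 0.7, n)            # GC fraction per transcript
true_expr = rng.normal(8.0, 1.0, n)      # "true" log-expression
gc_bias = 4.0 * (gc - 0.5)               # simulated technology bias
measured = true_expr + gc_bias + rng.normal(0.0, 0.5, n)

# Diagnostic: correlate GC content with measured values. A clearly nonzero
# correlation is a technology-driven trend that, from within this one
# modality, is indistinguishable from biology.
rho = np.corrcoef(gc, measured)[0, 1]
print(f"correlation (GC vs measured expression): {rho:.2f}")
```

In real data the bias term is unknown, which is precisely the problem: you see only `measured`, never `true_expr`, so a GC-correlated trend could be artifact or biology.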
This brings us to something I've been thinking about a lot recently: when different sequencing runs, technicians, or reagent lots produce different results, we attribute the discrepancy to batch effects that need correction. But batch correction, by definition, can only address variation between uses of the same measurement technology. It doesn't, and cannot, address the inherent way that technology shapes what we ultimately measure. We're always observing biology filtered through the lens of our chosen instrumentation, limitations and biases included, and no amount of normalization or batch correction can remove these systematic distortions. If this is true (and I'm open to the possibility that my logic is flawed), then it follows that we often can't distinguish between technology-intrinsic bias and true biological signal when working within a single measurement modality. Using the RNA-sequencing example again, consider a given reproducible finding: can we be certain we're identifying true transcriptional regulation, or might we be capturing a systematic bias in how RNA-seq handles certain transcript features?
When we apply batch correction, we make assumptions about what constitutes technical versus biological variation. However, these assumptions can't be validated from within a given measurement technology. And if biological and technical variation correlate (for example, diseased tissue being processed differently than controls, samples being collected at different time points, or population structure confounded with processing batch), our corrections may inadvertently remove real biological signal or introduce spurious patterns.
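A toy example (synthetic data, deliberately extreme) makes the confounding failure mode explicit. When batch perfectly confounds condition, even the simplest "correction", per-batch mean-centering, removes the biological effect along with the technical one:

```python
# Toy demonstration: batch correction under perfect batch/condition confounding.
import numpy as np

rng = np.random.default_rng(1)
n = 100
# Batch 1 holds all controls, batch 2 all treated samples.
batch = np.repeat([1, 2], n)
condition_effect = np.where(batch == 2, 2.0, 0.0)   # real biology
batch_effect = np.where(batch == 2, 1.0, 0.0)       # technical shift
y = 5.0 + condition_effect + batch_effect + rng.normal(0.0, 0.3, 2 * n)

raw_diff = y[batch == 2].mean() - y[batch == 1].mean()

# "Correct" by centering each batch; the biology is wiped out with the artifact.
y_corrected = y.copy()
for b in (1, 2):
    y_corrected[batch == b] -= y_corrected[batch == b].mean()
corrected_diff = y_corrected[batch == 2].mean() - y_corrected[batch == 1].mean()

print(f"group difference before correction: {raw_diff:.2f}")
print(f"group difference after correction:  {corrected_diff:.2f}")
```

Real designs are rarely this pathological, but partial confounding produces the same failure in milder form: the correction can't tell which part of the between-batch shift was biology.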
This is where an indiscriminate fixation on scientific reproducibility, while foundational to science, can introduce a blind spot. We treat replication across labs, datasets, and study cohorts as the gold standard for biological truth, but reproducible doesn't always mean correct; it can simply mean that our systematic errors are consistent. If a given set of measurement protocols introduces similar biases and downstream analysis pipelines make the same normalization assumptions, we'll get perfectly reproducible results that all reflect the same technical artifacts. As a result, some of our most reproducible findings, the ones that replicate across dozens of studies, might be measuring properties of our measurement technologies rather than properties of biology.
Of course, not all reproducibility is created equal. Replication using identical protocols in different labs is weaker evidence than replication using varied protocols within the same measurement technology. But both are fundamentally limited by the fact that they're looking at biology through the same technological lens. The solution isn't more reproducibility through that same lens; it's orthogonal validation using fundamentally different measurement approaches. When RNA-sequencing, high-throughput proteomics, and functional assays all independently point to the same conclusion despite having different technical biases, you've likely triangulated on a true biological signal3.
This is also why the reflexive integration of multi-omics data can be counterproductive. Sometimes we don’t want to merge transcriptomics and proteomics into a unified model—we want each to speak independently, then ask whether they’re telling the same story. If they converge despite looking at biology through entirely different technical lenses, the signal is real. If they diverge, we’ve learned something about either the biology or our measurement limitations. The goal isn’t consensus through integration; it’s triangulation through independence4.
I’ve previously written about this topic, as it relates to measuring muscle oxygenation, in Dampening the Noise: Making Sense of Variability In Biometric Measurements.
Andrew Carroll recently commented on this in The Virtual Cell Will Be More Like GWAS Than AlphaFold, where he states: “…But there are two areas where we still have obstacles to overcome. The first is that experimental batch effects are very strong in these assays. Which sequencer you use, which kits you use, how you prepare the RNA, how you handle the cells before preparation can all have large effects. So the majority of the difference between two datasets can be effectively “non-biological” in the sense that what you would learn doesn’t correspond to the parts of the biology you want to learn.”
This doesn’t necessarily mean taking all of these measurements at the same point in time, as different omics technologies capture biological processes at different time scales, which needs to be accounted for when interpreting data. For more on this topic see Issue #45 // Time: The Fourth Dimension In Multiomics Data Analysis.
In practice, this could be as simple as performing two separate differential expression analyses using transcriptomics and proteomics data from the same experiment. From these outputs, you can check whether paired genes and proteins are up- and down-regulated to a similar degree (acknowledging that RNA and protein abundances don’t correlate perfectly even in ideal conditions, so we’re looking for directional consistency rather than quantitative matching). Or, where measured genes and proteins do not align well, you can check whether both datasets yield similar pathway enrichment. Additionally, if you were to perform weighted co-expression network analysis with both datasets, comparing control vs treatment networks, you could see whether the same patterns emerge (for example, network density increasing across conditions), whether similar functional modules form, whether corresponding networks across data types have similar graphlet structures, etc. The specifics here aren’t prescriptive; the key is asking whether independent measurement modalities converge on the same biological interpretation despite their different technical biases.
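The first of these checks, directional consistency, can be sketched in a few lines. The gene names and log fold changes below are made up for illustration; in practice they would come from two independent differential expression analyses:

```python
# Hypothetical sketch: directional concordance of DE results across
# transcriptomics and proteomics for shared gene/protein pairs.
rna_lfc  = {"GENE1": 1.8, "GENE2": -0.9, "GENE3": 0.4, "GENE4": -1.2, "GENE5": 0.1}
prot_lfc = {"GENE1": 0.9, "GENE2": -0.5, "GENE3": -0.2, "GENE4": -0.8, "GENE6": 1.0}

# Restrict to features measured in both modalities.
shared = sorted(set(rna_lfc) & set(prot_lfc))

# Count pairs where RNA and protein fold changes point the same direction.
agree = sum((rna_lfc[g] > 0) == (prot_lfc[g] > 0) for g in shared)
concordance = agree / len(shared)

print(f"{len(shared)} shared features, directional concordance = {concordance:.0%}")
```

Note that only the sign is compared, not the magnitude, reflecting the point above that we want directional consistency rather than quantitative matching between RNA and protein.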



