Issue #70 // Why Interpretable Models Outperform Black Boxes in Biology
The Case for Mechanistic Understanding and Interpretability in an Age of Black Box ML
I recently had an interesting conversation on the tension between accuracy and interpretability with the head of ML at a small biotech working on liquid biopsies, which grew out of two earlier pieces: How to Develop Predictive Biomarkers and What Actually Makes Someone Good at ML in Computational Biology. One of the more interesting facets of the discussion was on how accuracy and interpretability can act as competitive moats, but through totally different mechanisms.
Accuracy, in a sense, is a commodity that can be bought. Given enough money, you can buy your way to a better model: more high-quality training data, more compute, larger teams, and greater access to proprietary datasets. This gives early movers a huge competitive advantage, but given enough resources a well-funded competitor can close almost any accuracy gap. Interpretability, on the other hand, doesn’t work that way; it can’t be purchased at scale. Instead, it requires deep domain expertise, biological intuition, and the kind of hard-won insight that you can’t list on a procurement order. It’s a bit of a dark art, and that’s what makes it defensible.
But, the more interesting question isn’t which moat is harder to breach. It’s which produces better science. In this piece, we’ll explore how interpretable models don’t just offer a strategic advantage; in biology specifically, they outperform black boxes where it matters most (i.e., clinical translation, generalization to new patient cohorts). Additionally, unlike accuracy, that advantage isn’t one that more spending can replicate.
Accuracy is a property of a model’s performance on a test set: it tells you how often the model is right, but nothing about why. Interpretability, on the other hand, describes a model’s relationship to the system it’s modeling, meaning the degree to which the model’s internal logic mirrors the underlying biology. The distinction between accuracy and interpretability can be made clearer with an example.
After building a biological knowledge graph, you can predict whether an edge exists between two nodes using a graph neural network (GNN). Alternatively, you can predict that same edge with a graph traversal, following chains of known biological relationships to their logical conclusion. The GNN might outperform the traversal on a test set, but the traversal tells you how those nodes are connected and why that connection is plausible. One approach gives you an answer while the other gives you a hypothesis. This same tension pops up when trying to predict treatment response. For example, a deep learning model trained on reverse phase protein array data may predict an HR-/HER2- breast cancer patient’s response to immunotherapy with high accuracy. A 5-gene signature, by contrast, will likely perform worse on held-out data, but it has the advantage of telling you which biological processes are doing the work, and it gives a clinician tangible information to act on.
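To make the traversal side of the knowledge graph example concrete, here is a minimal sketch of edge prediction by path enumeration. Everything in it is an assumption made for illustration: the node names (drug_X, gene_A, and so on), the relation labels, the find_paths and explain helpers, and the three-hop limit are hypothetical, and a real knowledge graph would need a proper graph library and curated relations.

```python
from collections import deque

# Hypothetical toy knowledge graph as (source, relation, target) triples.
TRIPLES = [
    ("drug_X", "inhibits", "gene_A"),
    ("gene_A", "activates", "pathway_P"),
    ("pathway_P", "drives", "disease_D"),
    ("drug_X", "binds", "gene_B"),
    ("gene_B", "expressed_in", "tissue_T"),
]


def build_adjacency(triples):
    """Map each source node to its outgoing (relation, target) pairs."""
    adjacency = {}
    for source, relation, target in triples:
        adjacency.setdefault(source, []).append((relation, target))
    return adjacency


def find_paths(triples, start, goal, max_hops=3):
    """Breadth-first enumeration of simple directed paths with at most max_hops edges."""
    adjacency = build_adjacency(triples)
    queue = deque([(start, [])])
    paths = []
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            paths.append(path)
            continue
        if len(path) >= max_hops:
            continue
        # Nodes already on this path, so that no path revisits a node.
        visited = {start} | {step[2] for step in path}
        for relation, target in adjacency.get(node, []):
            if target not in visited:
                queue.append((target, path + [(node, relation, target)]))
    return paths


def explain(path):
    """Render a path as a readable chain of relationships."""
    return " -> ".join([path[0][0]] + [f"[{rel}] {tgt}" for _, rel, tgt in path])


paths = find_paths(TRIPLES, "drug_X", "disease_D")
for p in paths:
    print(explain(p))
```

On this toy graph the traversal returns a single chain linking drug_X to disease_D through gene_A and pathway_P. The output is the chain itself rather than a score, which is the sense in which a traversal hands you a hypothesis you can go and test.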
In most domains, we accept the trade-off between accuracy and interpretability without much deliberation. A bit of interpretability is worth giving up if the model generalizes reasonably well and ships on time. But, in biology there’s a third consideration that changes the calculus entirely: the model needs to suggest the next experiment.
This is why the standard machine learning framework—train, validate, test, deploy—doesn’t map cleanly onto biological research. In a recommendation system, deployment is the goal. In biology, deployment is the beginning of the interesting part. A classifier that predicts tumor response with 95% accuracy but offers no mechanistic hypothesis about why certain tumors respond is a dead end from a scientific standpoint. An interpretable model that achieves 88% accuracy but identifies a specific regulatory relationship—say, that tumors with elevated hypoxia signatures and low immune infiltration are the ones failing to respond—gives you a lever to pull. It tells you what to measure in the next cohort, what to interrogate in the next cell line experiment, and what pathway to target in the next therapeutic design.
George Church made a similar point in a recent interview with the Lifespan Research Institute, and I tend to agree with him:
Would you prefer a weaker but more interpretable AI or a stronger but less interpretable one?
GC: I lean on the interpretability side. It’s not an either-or, but… we’re in science. Few engineers are willing to just pull a rabbit out of a hat, just a black box. Scientists and engineers, by and large, want to know the mechanism. The FDA likes to know mechanisms. Typically, the autocatalytic loop where you learn something and then you invent something is better if it’s mechanistically grounded. So, I lean pretty heavily in the direction of interpretability, explainability, transparency, et cetera, and also it’s safer.
I just honestly think that we will soon be faced with this dilemma, where we will have to choose between the power of the model to do things and its actual interpretability, but maybe we’re not there yet.
GC: If you look at the human scientist experience, the most powerful sciences are the ones that are better articulated mechanistically on a solid foundation rather than black boxes. The black boxes tend to include artifacts, dead ends. Most of the progress in science and engineering has been part of community efforts with strong mechanistic underpinnings.
There’s another, subtler, version of this argument that goes beyond experimental utility. Interpretable models that align with biological mechanisms are more likely to generalize across contexts, because they’ve learned something about the causal structure of the system rather than correlations in a particular dataset. A gene expression signature that works as a classifier in one cohort may fail in another if the cohort-specific technical and demographic variables are confounded with the biology. For example, if your training cohort happens to be predominantly collected at a single institution, on a particular sequencing platform, with a skewed demographic composition, the model will quietly bake all of that in. Then, when it’s applied to a new cohort with different technical and demographic variables, it often fails (and, more frustratingly, it fails without telling you why).
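One rough way to check for this kind of confounding before training anything is to ask how well a technical variable alone predicts the outcome. The sketch below is an illustration under stated assumptions, not an established procedure: the sample counts are made up, and the site names and both helper functions are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sample metadata: (collection_site, responder_label).
# Counts are invented so that response is heavily confounded with site.
samples = (
    [("site_A", 1)] * 40 + [("site_A", 0)] * 10
    + [("site_B", 1)] * 10 + [("site_B", 0)] * 40
)


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def covariate_only_accuracy(records):
    """Accuracy of predicting the label from the covariate alone,
    using the majority label within each covariate level."""
    by_level = defaultdict(list)
    for level, label in records:
        by_level[level].append(label)
    correct = sum(max(Counter(labels).values()) for labels in by_level.values())
    return correct / len(records)


baseline = majority_baseline_accuracy([label for _, label in samples])
site_only = covariate_only_accuracy(samples)
print(f"baseline={baseline:.2f}, site_only={site_only:.2f}")
```

In this made-up cohort, knowing only the collection site gets you to 80% accuracy against a 50% baseline. Any expression feature that tracks site would look predictive here for reasons that have nothing to do with tumor biology.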
A model built around a mechanistically interpretable pathway is more robust to this kind of distribution shift because the mechanism travels with the biology, not with the dataset. For example, a hypoxia signature exists in a tumor regardless of where the sample was collected or how it was sequenced. Anchoring your model to that kind of structure gives it something real to hold onto across contexts. This isn’t always true, of course. Mechanisms can be context dependent, as previously discussed in Issue #55 // Molecular Moonlighting. But, even when interpretable models fail, they tend to fail informatively. If your 5-gene signature stops being predictive in a new cohort, that’s a hypothesis: perhaps hypoxia isn’t the rate-limiting factor in this population, or perhaps there’s a moderating variable you haven’t accounted for. A black box achieving 0.51 AUC on held-out data gives you nothing to work with. The failure mode of an interpretable model is, in a sense, still science.
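As a small illustration of what failing informatively can look like in practice, the sketch below compares per-gene AUC between a training cohort and a new cohort and flags the genes that lost discriminative power. This diagnostic is my own assumption about how one might probe a failing signature, not a standard method. The gene names, expression values, and the 0.2 drop threshold are all hypothetical, and the signature is cut to two genes for brevity.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def per_gene_auc(expression, labels):
    """AUC of each signature gene taken on its own."""
    return {gene: auc(values, labels) for gene, values in expression.items()}


def degraded_genes(train_auc, new_auc, min_drop=0.2):
    """Genes whose AUC fell by at least min_drop between cohorts."""
    return [g for g in train_auc if train_auc[g] - new_auc[g] >= min_drop]


# Hypothetical expression values; the first three samples in each cohort are responders.
labels = [1, 1, 1, 0, 0, 0]
train_cohort = {"gene_1": [5, 6, 7, 1, 2, 3], "gene_2": [4, 5, 6, 1, 2, 3]}
new_cohort = {"gene_1": [5, 6, 7, 1, 2, 3], "gene_2": [1, 4, 6, 2, 3, 5]}

train_auc = per_gene_auc(train_cohort, labels)
new_auc = per_gene_auc(new_cohort, labels)
print(degraded_genes(train_auc, new_auc))
```

Here gene_1 separates responders equally well in both toy cohorts, while gene_2 drops to near chance in the new one. That points to a specific component of the signature to investigate, which is more than a single aggregate AUC would tell you.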