Issue #58 // Graphs As Metaphor In Biological Systems
Representing Biological Systems As Multi-Layered Graphs
There’s a particular kind of frustration that comes with studying living systems, one that doesn’t quite exist in physics or chemistry. You can spend years characterizing how a protein behaves in a carefully controlled experiment, publish your findings, and then watch as someone else discovers that same protein doing something completely different in another context. This anecdote isn’t personal; I haven’t spent years studying a single protein, learning everything there is to know about it. But I have experienced a similar widening of my aperture: in undergrad, I learned that a protein’s structure determines its function; in grad school, I learned the specific functions of a number of different oncoproteins; and in my doctoral research, I saw that protein function is highly context-dependent, inherently relational, and not always entirely clear.
Along the way, I also learned to think about biological systems as graphs—networks of nodes connected by edges, which I wrote about at length in Issue #51 // Mapping Biology’s Dark Matter. At the molecular level, the nodes might be genes, proteins, or metabolites. The edges represent their relationships: how they interact, regulate each other, compete for resources, are co-expressed, and more. Zoom out, and those molecular nodes aggregate into functional modules such as signaling pathways, molecular functions, and biological processes. Zoom out further still, and you’re looking at cells, tissues, entire organisms, and populations. It’s graphs all the way up.
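To make the metaphor concrete, here’s a minimal sketch (in Python, with entirely made-up node and edge names) of what a typed molecular graph might look like as a data structure—nodes for molecules, edges labeled by the kind of relationship they represent:

```python
from collections import defaultdict

# A toy molecular graph. Every name here is hypothetical, chosen only to
# illustrate the structure: nodes can be genes, proteins, or metabolites,
# and each edge carries a relationship type, since "encodes",
# "phosphorylates", and "co-expressed" are different kinds of evidence.
edges = [
    ("GeneA", "ProteinA", "encodes"),
    ("ProteinA", "ProteinB", "phosphorylates"),
    ("ProteinB", "MetaboliteX", "produces"),
    ("ProteinA", "ProteinC", "co-expressed"),
]

# Adjacency list keyed by source node.
graph = defaultdict(list)
for src, dst, kind in edges:
    graph[src].append((dst, kind))

def neighbors(node, edge_type=None):
    """Nodes one hop from `node`, optionally filtered by edge type."""
    return [dst for dst, kind in graph[node]
            if edge_type is None or kind == edge_type]

print(neighbors("ProteinA"))                    # every outgoing relationship
print(neighbors("ProteinA", "phosphorylates"))  # just the enzymatic edge
```

The point of the typed edges is exactly the one made above: “interacts with” is not one relationship but many, and a representation that collapses them loses information.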
Most molecular biology research, when you strip away the particulars, does one of two things: it identifies a new node, or it identifies a new edge. A novel protein gets characterized—that’s a node. Someone discovers that this protein phosphorylates another protein under certain conditions—that’s an edge. The problem, and the thing that makes biology uniquely maddening, is that we don’t know what all the nodes are, even for relatively well-studied processes. And even when we do know the nodes, mapping the edges between them is extraordinarily slow work.
I’ve written before about molecular moonlighting, which is the phenomenon where a single protein performs multiple, often unrelated functions depending on context. It’s a good example of why the graph metaphor matters. If you study a protein in isolation, or even in a small set of pairwise interactions, you get an incomplete picture. You might tag it with a fluorophore, watch how it moves when you perturb the system, and conclude that it plays one specific role. But that role might be contingent on a dozen other factors you weren’t measuring: the presence of particular cofactors, the cell’s metabolic state, or what other proteins are competing for the same binding sites. Biology is full of higher-order feedback loops. A protein’s function isn’t intrinsic—it’s emergent from its position in the network.
The traditional experimental toolkit reflects these limitations¹. If you wanted to understand a biological process twenty years ago, you could really only look at a few nodes at a time. Fluorescent tagging let you watch a handful of proteins and knockout models let you remove a node and observe the ripple effects. These approaches worked, in the sense that they produced real knowledge, but they were constrained by throughput. Designing an experiment that could simultaneously account for the behavior of dozens or hundreds of interacting components was essentially impossible.
The last fifteen to twenty years have been different. Next-generation sequencing technologies didn’t just make it cheaper to sequence DNA—they enabled a fundamentally different mode of inquiry. Suddenly you could measure the expression levels of thousands of genes in a cell at once. You could characterize tens of thousands of protein-protein interactions in a single experiment. RNA-seq, ChIP-seq, ATAC-seq, single-cell sequencing, spatial transcriptomics—each of these represents a way to observe many more nodes and edges simultaneously. The graph becomes denser, more complete.
What becomes visible at this scale are the things that pairwise studies miss. Redundancy, for example. Biological systems are full of backup mechanisms, parallel pathways that only become apparent when you knock out one component and discover that the system compensates. Or context-dependency; a protein that’s essential in one cell type might be practically irrelevant in another, not because the protein itself has changed, but because the surrounding network is different. These aren’t edge (pardon the pun) cases. They’re the norm.
The knowledge graph framework also clarifies what kinds of questions remain hard. Even with high-throughput tools, we’re still working with incomplete graphs. Some nodes are easier to observe than others—genes are straightforward, but post-translational modifications or transient protein complexes are slippery. Some edges are easier to infer: co-expression is a weak signal, but it’s measurable at scale. Physical protein-protein interactions are stronger evidence, but harder to capture comprehensively. Causal relationships—this protein activates that pathway, under these conditions—are the hardest of all, because biology resists clean causality². The graph is dynamic. The edges change depending on the state of the system.
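As a toy illustration of why co-expression is a weak but scalable signal: in practice an edge is often drawn when two genes’ expression profiles correlate above some threshold. Here’s a sketch with invented expression values and an arbitrary cutoff—correlation is cheap to compute at scale, but it says nothing about mechanism or direction:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy expression profiles across five samples (illustrative values only).
gene_a = [2.0, 4.1, 6.0, 8.2, 10.1]
gene_b = [1.1, 2.0, 3.2, 3.9, 5.0]
gene_c = [5.0, 1.2, 4.8, 2.1, 3.3]

# Draw a (weak-evidence) co-expression edge only above a chosen threshold.
THRESHOLD = 0.8
print(pearson(gene_a, gene_b) > THRESHOLD)  # True: profiles rise together
print(pearson(gene_a, gene_c) > THRESHOLD)  # False: no consistent relationship
```

Even a confident edge here is just “these two vary together”—the hard, slow work is upgrading that to a physical or causal claim.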
There’s an intellectual style I admire, the kind George Church exemplifies, where technical rigor coexists with a willingness to make unexpected connections across disciplines. The graph metaphor invites this. Network theory was developed to study social systems, electrical grids, and the internet³. The mathematics of centrality and modularity don’t care whether the nodes are people or proteins. This isn’t just analogy; there are real conceptual tools from network science that translate directly. The idea of network motifs, recurring patterns of connections that show up across different systems, has been productively applied to gene regulatory networks. The concept of scale-free networks, where a few highly connected hubs dominate, describes both the World Wide Web and protein interaction networks.
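As a concrete example of a network motif, here’s a sketch that scans a toy directed network for the feed-forward loop—X regulates Y, X regulates Z, and Y regulates Z—one of the motifs famously overrepresented in gene regulatory networks. The regulator and gene names are hypothetical:

```python
from itertools import permutations

# Hypothetical directed regulatory network (names are illustrative).
edges = {
    ("TF1", "TF2"), ("TF1", "GeneX"), ("TF2", "GeneX"),  # feed-forward loop
    ("TF3", "GeneY"), ("GeneY", "TF3"),                  # two-node feedback
}

def feed_forward_loops(edges):
    """Find (X, Y, Z) triples where X->Y, X->Z, and Y->Z all exist:
    the feed-forward loop motif."""
    nodes = {n for e in edges for n in e}
    return [
        (x, y, z)
        for x, y, z in permutations(nodes, 3)
        if (x, y) in edges and (x, z) in edges and (y, z) in edges
    ]

print(feed_forward_loops(edges))  # [('TF1', 'TF2', 'GeneX')]
```

Nothing in the code knows these are transcription factors—the same scan works on a social network or a circuit diagram, which is exactly the portability the metaphor buys.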
But the metaphor also has limits, and it’s worth being clear about them. Real biological systems aren’t static graphs. They’re dynamic, probabilistic, and context-dependent in ways that are difficult to capture in a single representation. An edge between two proteins might exist in one cellular state and disappear in another. The graph itself is a moving target. Still, as an organizing framework, it works. It lets us think clearly about what we know and what we don’t and it suggests where the next productive questions might be. The project of biology from this perspective is one of gradual graph completion. We’re filling in missing nodes, adding edges, and refining our understanding of which connections matter under which conditions. The pace has accelerated, but the work remains enormous. Even for the relatively well-characterized model organisms—yeast, worms, flies—the graphs are incomplete. For human biology, where the complexity is orders of magnitude higher, we’re still sketching the outline.
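One way to picture the “moving target” quality is to make the edge set a function of cellular state rather than a fixed structure. This is only a sketch, with invented states and proteins:

```python
# A state-dependent graph: the edge set depends on cellular state.
# All states and protein names here are illustrative, not real data.
edges_by_state = {
    "proliferating": {("ProteinA", "ProteinB"), ("ProteinB", "ProteinC")},
    "stressed": {("ProteinA", "ProteinC")},  # the A-B edge disappears
}

def has_edge(state, src, dst):
    """An 'edge' only exists relative to a cellular state."""
    return (src, dst) in edges_by_state.get(state, set())

print(has_edge("proliferating", "ProteinA", "ProteinB"))  # True
print(has_edge("stressed", "ProteinA", "ProteinB"))       # False
```

A single static snapshot, in this framing, is a projection of the real object—useful, but guaranteed to be missing edges that exist in states you didn’t measure.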
What feels different now isn’t just the speed of data generation, but the possibility of asking questions that were previously out of reach. If you can measure thousands of variables simultaneously, you can start to model how networks respond to perturbations. You can identify vulnerabilities—nodes whose removal would collapse an entire pathway. You can predict how a system might compensate for damage. This is the promise of systems biology, computational biology, network medicine: not just cataloging parts, but understanding how they fit together.
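As a toy version of that vulnerability analysis, here’s a sketch that knocks out one node at a time and asks whether a signal can still reach the pathway’s output. The pathway topology and names are made up, but the logic—redundant paths survive a knockout, single points of failure don’t—is the real idea:

```python
from collections import deque

# Hypothetical signaling pathway: can the signal still reach the output
# if we knock out one node? (All names are illustrative.)
graph = {
    "Receptor": ["KinaseA", "KinaseB"],
    "KinaseA": ["TF"],
    "KinaseB": ["TF"],
    "TF": ["Output"],
    "Output": [],
}

def reachable(graph, start, target, knockout=None):
    """BFS from `start`, skipping the knocked-out node; True if `target` is reached."""
    if start == knockout:
        return False
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt != knockout and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Redundant kinases: knocking out one leaves the pathway intact...
print(reachable(graph, "Receptor", "Output", knockout="KinaseA"))  # True
# ...but the transcription factor is a single point of failure.
print(reachable(graph, "Receptor", "Output", knockout="TF"))       # False
```

This is the compensation phenomenon from earlier in graph terms: redundancy is just the existence of a second path, and a vulnerability is a node that sits on every path.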
The frustration I mentioned earlier doesn’t go away. Biological systems remain irreducibly complex. But the graph gives us a way to think about that complexity that’s both rigorous and generative. It lets us map what we know, acknowledge what we don’t, and chart a course forward. The edges are still hard to draw, the nodes still incomplete, but at least we can see the shape of the problem.
¹ I first started thinking about the ideas in this paragraph after reading Yuri Lazebnik’s Can a biologist fix a radio?—Or, what I learned while studying apoptosis, which makes clear the limitations of reducing biological networks to their component parts.
² Mapping causal relationships also requires time-series data with a sufficient number of time points, and at the right temporal scale, which is relatively rare in molecular biology. I recently published an open-source tool called CausalEdge, which identifies causal influences in this type of data when it is available.
³ Interestingly, a number of the concepts I discussed in Issue #49: Information Theory Gets to The Heart of Biometric Analysis were also first developed to study electrical grids and communication over phone lines.

Beyond its heuristic value, graph approaches actually encourage one to think about biology in information processing terms. That is the single most important role of the paradigm.
Isn’t “shape of the problem” the irreducible becoming something that is reducible?
Only it isn’t a “mechanical” irreducibility so much as reducibility at the whole-system level.
The shape in this case would be the dynamic network patterns, the conceptual patterns that biology has so far missed.
I am beginning to form the idea that biology’s true purpose isn’t the enumeration of nodes or edges (the classic taxonomy stamp collecting), but rather the elucidation of hierarchical patterns in dynamic complex networks.