A Picture Worth 1,000 Deals

Visualizing VC syndicates with a co-occurrence matrix

4 min read · Apr 2, 2021

Mike Bostock, a key developer of D3.js and co-creator of ObservableHQ, created a diagram that visualizes character co-occurrences in Victor Hugo’s Les Misérables. What strikes me about this diagram is how a story traditionally told through a novel, a musical, and films can collapse into a single, atemporal matrix. This is what it looks like:

Follow the link above to see a higher-resolution animation

Bostock notes the following about using an adjacency matrix rather than a traditional node-link diagram to visualize complex networks:

A network can be represented by an adjacency matrix, where each cell ij represents an edge from vertex i to vertex j. Here, vertices represent characters in a book, while edges represent co-occurrence in a chapter.
While path-following is harder in a matrix view than in a node-link diagram, matrices have other advantages. As networks get large and highly connected, node-link diagrams often devolve into giant hairballs of line crossings. Line crossings are impossible with matrix views.
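To make the adjacency-matrix idea concrete, here is a minimal Python sketch that turns lists of characters per chapter into a symmetric co-occurrence matrix, where cell [i][j] counts the chapters that characters i and j share. The chapter lists are a made-up toy input, not the actual Les Misérables data, and `adjacency_matrix` is a hypothetical name; this is not Bostock’s code.

```python
from itertools import combinations


def adjacency_matrix(chapters):
    """Build a symmetric co-occurrence matrix from per-chapter character lists.

    Cell [i][j] counts the chapters in which characters i and j both appear.
    """
    characters = sorted({name for chapter in chapters for name in chapter})
    index = {name: i for i, name in enumerate(characters)}
    matrix = [[0] * len(characters) for _ in characters]
    for chapter in chapters:
        # Deduplicate so a character listed twice in one chapter counts once.
        present = sorted({index[name] for name in chapter})
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1  # co-occurrence is undirected, so mirror it
    return characters, matrix


# Toy input (hypothetical): the characters appearing in three chapters.
chapters = [
    ["Valjean", "Javert"],
    ["Valjean", "Cosette", "Marius"],
    ["Marius", "Cosette"],
]
characters, matrix = adjacency_matrix(chapters)
```

Because each cell is a count rather than a line on a canvas, adding more characters only adds rows and columns; nothing can cross.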

It sounds like a Michael Scott-ism, but in my line of work (venture capital investing), relationships are everything. Like the characters in Les Misérables, investors can be viewed as nodes in a large, highly connected network of founders, friends, employees, LPs, and, of course, other investors. This underlying network differs from the Les Misérables example for two reasons, among 24,601 others:

With VC relationships, connections between nodes are opaque to outside observers. We see pieces of the network through TechCrunch articles (“Investor A co-invested with Investor B”), Twitter followers (“Investor A followed Founder A”), and the like, but wrapping our heads around the entire network would be a Sisyphean endeavor at best.
In Les Misérables, connections are determined by a self-contained, static book. The network is literally spelled out for us. There’s also only one flavor of node we have to deal with: “character.” The VC network, by contrast, introduces different types of nodes (“investor,” “founder,” “LP,” etc.), which complicates things.

What follows is purely a function of my low fascination threshold. After learning about Bostock’s visualization, I kept wondering whether it was possible to create a co-occurrence matrix that visualizes relationships within the VC industry. The biggest challenges I foresaw were data reliability and data acquisition. I thought, “If we restrict VC networks to mean firm-level connections implied by syndicated financing rounds, then we might be able to work with a reliable, robust dataset.” Chasing this hunch, I reached out to the wonderful team at PitchBook, Inc., who offered to share aggregated syndicate data from thousands of deals over the last few years. With my data reliability and acquisition problems solved, the only thing left to do was evaluate whether any patterns emerged from the data.

The animation below depicts round co-occurrences (i.e., distinct VC firms participating in the same financing round, including follow-ons) among the top 250 US-based VC investors by cumulative 2017–2020 deal count. The matrix is ordered by vector similarity, as defined by an adapted cosine similarity function. For clustering, we used hierarchical clustering with fast optimal leaf ordering.
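The pipeline described above can be approximated in Python with NumPy and SciPy. This is a sketch under stated assumptions, not the project’s own code: rounds are assumed to arrive as lists of firm names (the PitchBook data format is not described here), plain cosine similarity stands in for the unspecified adapted version, and average linkage is an assumed choice. SciPy’s `linkage(..., optimal_ordering=True)` reorders the dendrogram so that the distance between successive leaves is minimal, which is the optimal leaf ordering idea. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import squareform


def top_investors(rounds, n):
    """Return the n firms that appear in the most rounds (ties broken by name)."""
    deal_counts = Counter(firm for firms in rounds for firm in set(firms))
    ranked = sorted(deal_counts, key=lambda f: (-deal_counts[f], f))
    return ranked[:n]


def cooccurrence_matrix(rounds, investors):
    """Count how many rounds each pair of distinct selected firms shared."""
    index = {firm: i for i, firm in enumerate(investors)}
    m = np.zeros((len(investors), len(investors)))
    for firms in rounds:
        # Keep only the selected firms; a set drops duplicate listings.
        present = sorted({index[f] for f in firms if f in index})
        for i, j in combinations(present, 2):
            m[i, j] += 1
            m[j, i] += 1
    return m


def cosine_similarity(m):
    """Plain cosine similarity between rows (assumed stand-in for the adapted variant)."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for firms with no co-investors
    unit = m / norms
    return unit @ unit.T


def leaf_order(m):
    """Row/column order from average-linkage clustering with optimal leaf ordering."""
    distance = 1.0 - cosine_similarity(m)
    np.fill_diagonal(distance, 0.0)
    distance = np.clip(distance, 0.0, None)  # drop tiny negative float noise
    condensed = squareform(distance, checks=False)
    z = linkage(condensed, method="average", optimal_ordering=True)
    return leaves_list(z)


# Toy input: each round lists the participating firms (hypothetical data).
rounds = [
    ["Firm A", "Firm B"],
    ["Firm A", "Firm B", "Firm C"],
    ["Firm C", "Firm D"],
    ["Firm D", "Firm E"],
    ["Firm A", "Firm E"],
]
investors = top_investors(rounds, n=4)  # the article used the top 250
matrix = cooccurrence_matrix(rounds, investors)
order = leaf_order(matrix)
reordered = matrix[np.ix_(order, order)]
print([investors[i] for i in order])
```

Reindexing with `np.ix_(order, order)` permutes rows and columns together, so firms with similar co-investment profiles end up adjacent and clusters show up as blocks along the diagonal.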

If you’re like me, this visualization sparks about 100 questions: Is the data normalized? Aren’t there way more than 250 investors that need to be considered? Why not delineate between early- and growth-stage investors? Investing alongside a firm doesn’t necessarily mean a relationship was built, right? What do the clusters and bridges mean? And most importantly, how is this useful? While these are all valid questions, the last one is the one I’m most interested in addressing in this article. For now, I see two ways this project could turn into something useful:

A tool for entrepreneurs (1) to meet investors who are one or more degrees of separation outside their network, (2) to maximize network potential when fundraising (i.e., optimize for investors who can facilitate introductions to a wide range of other investors), or (3) to localize network potential when fundraising (i.e., optimize for investors within a particular cluster).
A tool for fund investors (e.g., funds of funds and other LPs) (1) to identify firms that invest alongside or orthogonal to an existing portfolio, and (2) to identify emerging managers who invest alongside established firms but are less access-constrained.
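The entrepreneur use case maps onto a standard graph search. As a hypothetical sketch: treat each firm as a node, connect two firms if they shared at least one round, and run breadth-first search from the investors a founder already knows to get every other firm’s degrees of separation. Counting distinct co-investors is a deliberately naive stand-in for “network potential”; the article does not define that metric, and all names and data here are illustrative.

```python
from collections import deque


def degrees_of_separation(graph, start_investors):
    """Breadth-first search from the investors a founder already knows.

    graph maps each firm to the set of firms it has co-invested with.
    Returns {firm: hops}; 0 means the firm is already in the founder's network.
    """
    hops = {firm: 0 for firm in start_investors}
    queue = deque(start_investors)
    while queue:
        firm = queue.popleft()
        for neighbor in graph.get(firm, ()):
            if neighbor not in hops:  # first visit is the shortest path in BFS
                hops[neighbor] = hops[firm] + 1
                queue.append(neighbor)
    return hops


def rank_by_reach(graph):
    """Naive proxy for network potential: firms with the most distinct co-investors first."""
    return sorted(graph, key=lambda firm: (-len(graph[firm]), firm))


# Toy co-investment graph (hypothetical data).
co_investors = {
    "Firm A": {"Firm B"},
    "Firm B": {"Firm A", "Firm C"},
    "Firm C": {"Firm B", "Firm D"},
    "Firm D": {"Firm C"},
}
print(degrees_of_separation(co_investors, ["Firm A"]))
# {'Firm A': 0, 'Firm B': 1, 'Firm C': 2, 'Firm D': 3}
```

Firms with one or more hops are the “outside the network” candidates; a cluster-localized variant could restrict the search to firms within one cluster from the reordered matrix.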

Two examples of detected clusters are shown below:

**Left**: Notice the common thread of healthcare investing among these investors. **Right**: Spectral lines emerge from investors YC, SV Angel, and Founders Fund.

To summarize, I applied Mike Bostock’s visualization of character co-occurrences in chapters of Les Misérables to firm co-occurrences in rounds of VC funding. The resulting matrix shows signs of clustering that could lead to a useful tool for entrepreneurs and fund investors alike.

Disclaimer: Views are my own and may not reflect those of my employer.

Source(s): Bailey York at PitchBook, Inc. helped with data acquisition; John McCambridge played an integral role in clustering the data; and code from Jean-Daniel Fekete’s GitHub repository was used throughout this project.


Written by James Detweiler