HNSW: navigating a proximity graph
What if finding the nearest neighbor became a short stroll of a few hops, instead of a scan across millions of vectors?
In the previous chapter, we ran into two walls. Comparing the query to every vector gives the exact answer, but that scan costs : impractical on millions of vectors. And high dimension blurs the very notion of proximity. The chapter ended with a promise: what if vectors were connected by a network of shortcuts, so that starting from any point and always moving toward a neighbor closer to the query, you arrive in a few hops near the goal?
That idea has a name, HNSW, and it is today the most widely used index in vector databases. This chapter builds it piece by piece. Two questions will guide us: why can a rule as naive as “always jump to the closest neighbor” work, and what do we need to add to the network for it to truly work?
What if vectors held hands
Shift your perspective for a moment. Until now, the database was a bag of vectors with no relationships: to answer a query, we were forced to touch them all. Let us give them links. Each vector knows a handful of close neighbors, and only them. The result is a proximity graph Proximity graph A structure where each vector (a node) is connected by edges to a handful of its nearest neighbors. Instead of a relationless bag of vectors that forces a full scan, you get a network you can walk through, hop by hop, to approach a query without visiting every point. It is the foundation of graph-based indexes such as HNSW. : points are nodes, and an edge connects two points judged to be close.
The analogy is that of a social network. You do not know all eight billion people on earth, just a few dozen close contacts. Yet to pass a message to a stranger on the other side of the world, you do not need to speak to everyone: you hand it to whichever of your contacts seems closest to the recipient, who passes it on in turn, and from hand to hand the message arrives. That is exactly what we will do with a query, moving from neighbor to neighbor.
Greedy navigation
The movement rule is disarmingly simple. We stand on a starting node. We look at the distance from the query to each of its neighbors. We jump to the neighbor closest to the query. We repeat. We stop when no neighbor is closer to the query than the node we are already on. This is greedy search Greedy search A movement strategy in a proximity graph: at each step you hop to the neighbor closest to the query, and you stop as soon as no neighbor is closer than the current node. Fast and short-sighted, it takes the best local move without planning, which does not guarantee reaching the true nearest neighbor: it can get stuck in a local minimum. : at each step, the best local move, without ever looking back or planning ahead.
At each hop, we strictly close in on the query. Since there are finitely many nodes and we never go back, the walk is guaranteed to terminate. The interesting question is not whether it ends, but where.
And there, an unpleasant surprise. The walk stops as soon as it reaches a node whose every neighbor is farther from the query than itself. Nothing guarantees that this node is the true nearest neighbor in the whole database. It may be a simple local minimum Local minimum A graph node whose immediate neighbors are all farther from the query than itself, even though a much better point exists elsewhere in the graph, out of direct reach. A greedy search wrongly stops there, believing it has found the nearest neighbor. Widening the beam (keeping several candidates) lets it escape. : a hollow from which we can no longer descend with the available links, while a much better point exists elsewhere, just out of reach. Greedy search is fast, but it can be wrong.
Keeping multiple tracks: the beam width
How do we escape a hollow? By not putting all our eggs in one basket. Rather than keeping only the single best current node, we maintain a list of the best candidates encountered so far, and keep exploring their neighbors as long as one of them can still improve the list. That number is called the beam width (or candidate list size in some texts).
With , we recover exactly greedy navigation, with all its traps. With , we always keep a fallback track: if the best path hits a local minimum, the second candidate can go around the obstacle and reach a region the first would never have seen. The larger is, the more we explore, the fewer mistakes we make, but the more we compute. The tension from the previous chapter reappears, intact: we will need to measure what we gain and what we lose.
Under the hood: small worlds and construction
We have the search mechanics. The fundamental question remains: why do a few hops suffice? If the graph only connected each point to its immediate nearest neighbors, the answer would be disappointing. To travel from one end of the cloud to the other, you would have to cross the whole space step by step, one small move at a time: a path as long as the cloud is wide. We would have gained nothing.
The secret: long-range shortcuts
The solution comes from a famous observation about social networks. In the “six degrees of separation” experiment, two strangers picked at random anywhere on Earth are connected by a chain of only about six relationships. How, with only close friends? Because there exist a few long-range links: a childhood friend who moved to the other side of the world, a colleague from a different industry. These rare bridges dramatically shorten all distances. A network that combines many local links with a few long-range links is called a small-world Small-world network A network that combines many local links (to nearby neighbors) with a few rare long-range links (to distant regions). Those shortcuts collapse path lengths: to cross the network, the number of hops grows like the logarithm of the number of nodes rather than like their count. It is the principle behind six degrees of separation, and the core of HNSW's efficiency. network.
The effect on path lengths is spectacular. On a purely local graph, the number of hops to cross it grows with the size of the network. Add a few well-placed shortcuts, and that number collapses.
This line reads: without shortcuts, the number of hops grows like the number of points; with well-distributed long-range shortcuts, it grows only like the logarithm of the number of points. Going from to means going from a million hops to about twenty. That is the entire gain of HNSW: a search that cost becomes a stroll of on the order of steps. The component a little further below stacks these very shortcuts into layers: you will follow, layer by layer, the descent that the upper layers make possible.
Building the structure, one point at a time
We still need to build such a graph, and in particular its hierarchy of layers. HNSW builds it incrementally: points are inserted one by one, and each one decides for itself how high up the hierarchy it will climb.
When a new point arrives, it draws a level at random, according to a heavily skewed distribution. The vast majority of points stay at level zero, the base layer. Fewer and fewer reach level one, even fewer level two, and so on. Concretely, a number is drawn at random between zero and one, and the level is:
This formula reads: the level is the integer part of minus the logarithm of , multiplied by a constant . Since is almost always small, the level is almost always zero; it is large only for the rare draws where is tiny. Those rare points, promoted to high levels, become the relay nodes of the upper layers.
Once its level is known, the point is inserted into every layer up to that level. In each one, it is connected to its nearest neighbors already present, being the number of connections per node, fixed in advance. And to wire it into the right place without comparing everything, we use the search we just learned: we descend the hierarchy greedily to find where it lands. HNSW thus uses its own navigation to build itself.
The hierarchy: a skip-list in vector space
Now let us stack these layers. At the very top, a handful of points connected by large hops. Lower down, more points and shorter hops. At the very bottom, the base layer, which contains all points with their finest-grained neighbors.
The search exploits this hierarchy like a parachute descent. We enter through the single point at the summit. In that very sparse layer, a few greedy long hops quickly bring us to the right region of the space. We then drop one level: the best point found becomes the entry point for the layer below, denser, where shorter hops refine the position. We repeat down to the base layer, where we widen the beam to to precisely harvest the nearest neighbors.
This is exactly the idea of a skip-list, that structure that accelerates search in a sorted list by adding increasingly spaced index levels, transposed not to a sorted line but to a vector space. Hence the full name: Hierarchical Navigable Small World, or HNSW HNSW Hierarchical Navigable Small World. An approximate search index that stacks proximity graphs in layers, sparse and coarse at the top, dense and fine at the bottom. A greedy walk descends layer by layer to find the nearest neighbors in about log n hops. Two knobs: M (neighbors per node, paid in memory) and ef (beam width, paid in time). , the hierarchical navigable small-world graph.
Navigate the hierarchy yourself
The component below builds an HNSW index on a small cloud of points in two dimensions, with its stacked layers. Place the query, then step through the descent layer by layer and watch the path draw itself. Always compare the neighbor found to the true nearest neighbor, which the exhaustive oracle marks for you: that is your recall in real time.
Click inside the frame to move the query.
Three questions to ask yourself while exploring:
- At , how often does the descent miss the true neighbor when you move the query? What happens when you raise ?
- Mentally disable the upper layers by imagining a single level: would the path be longer? Count the hops from the base layer alone, then those of the full descent.
- Lower down to very few neighbors per node. Does the graph fragment? Can the descent still reach all regions?
The two dials: M and ef
HNSW is essentially tuned with two numbers, and it is important to understand their distinct roles, because one acts at construction time and the other at search time.
is the number of neighbors per node, fixed when building the index. The larger is, the more richly connected the graph is, so navigation is safer and recall is higher, but the index uses more memory, because all those edges must be stored. is paid in RAM, once and for all.
is the beam width at search time. It can be changed for each query, without rebuilding anything. The larger is, the more candidates are explored, the higher recall climbs, but the slower each query becomes. is paid in time, for every query.
Exercises
In one sentence
HNSW connects vectors in a hierarchical small-world graph, where simple greedy navigation, widened by an adjustable beam width , finds the nearest neighbor in on the order of hops instead of , at the cost of a recall we choose to tune between memory () and time ().
1. What does greedy navigation do in a proximity graph?
2. Why add long-range shortcuts to the graph?
3. What is the difference between parameters M and ef?
Towards chapter 4
HNSW is elegant, but it is not the only way to avoid a full scan, and it is not a free solution. Its graph hierarchy is expensive in memory, and updating it or filtering results within it is far from straightforward. Other families of indexes make very different trade-offs: partitioning space into cells and visiting only a few of them, or compressing vectors to store far more in memory at the cost of some precision. How do we make sense of all this? Chapter 4 maps the landscape of approximate nearest neighbor indexes, places HNSW among its cousins, and provides a framework for choosing based on what you have, memory, speed, or recall, because you can never maximize all three at once.
Sources
- Malkov, Y. A. & Yashunin, D. A. (2018). “Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs.” IEEE Transactions on Pattern Analysis and Machine Intelligence. arXiv:1603.09320
- Malkov, Y., Ponomarenko, A., Logvinov, A. & Krylov, V. (2014). “Approximate Nearest Neighbor Algorithm Based on Navigable Small World Graphs.” Information Systems 45, 61-68. DOI 10.1016/j.is.2013.10.006
Further reading
- Watts, D. J. & Strogatz, S. H. (1998). “Collective Dynamics of ‘Small-World’ Networks.” Nature 393, 440-442. DOI 10.1038/30918
- Kleinberg, J. (2000). “The Small-World Phenomenon: An Algorithmic Perspective.” STOC. DOI 10.1145/335305.335325
- Bruch, S. (2024). Foundations of Vector Retrieval. Springer. A modern textbook covering graph-based indexes, including HNSW, and their trade-offs. arXiv:2401.09350