skip to content
Newvick's blog

How skip lists work and why databases use them

Table of Contents

Intro

When you want fast lookups in a sorted collection, a very common choice is a B-tree. But there’s a lesser known alternative, skip lists, that shows up in systems like LSM-based databases and Redis. Let’s see why.

At their core, a skip list is just a sorted linked list. The interesting part is that some nodes have “express lanes” that let you skip over others.

Before we get into that, it’s useful to think about what problem they solve. You want to store key value pairs and search through it to find the key value pair you want.

Skip lists vs B-trees

In databases, B-trees are the default choice for indexing. They support search, insert, and delete in O(log n). That’s quite good! So how does that compare to skip lists?

OperationB-TreeSkip List
SearchO(log n)O(log n)
InsertionO(log n)O(log n)
DeletionO(log n)O(log n)

They have the same average time complexity. But time complexity doesn’t tell the whole story. They bring advantages that don’t show up in Big-O math. So why would you want to use a skip list?

antirez (creator of Redis), summed it up quite well: skip lists are lightweight, cache-friendly, and dead simple to implement. In his words:

  1. They are not very memory intensive. It’s up to you basically. Changing parameters about the probability of a node to have a given number of levels will make then less memory intensive than btrees.
  2. A sorted set is often target of many ZRANGE or ZREVRANGE operations, that is, traversing the skip list as a linked list. With this operation the cache locality of skip lists is at least as good as with other kind of balanced trees.
  3. They are simpler to implement, debug, and so forth. For instance thanks to the skip list simplicity I received a patch (already in Redis master) with augmented skip lists implementing ZRANK in O(log(N)). It required little changes to the code.

How skip lists work

Now, how do they work?

Let’s build it up the skip list piece by piece.

Plain linked list (diagram a)

We start with a sorted linked list. To find a value, you walk node by node until you hit it. In the worst case, that’s O(n) steps.

diagram a

Adding shortcuts (diagram b)

How can we do better? What if we give ourselves a shortcut (“express lane”) every other node? Now searching is faster. To find 9: head -> 6 -> 9. Notice we skip over 3 and 7.

diagram b diagram b2

Improving even more (diagram c)

Why stop there? Let’s add an additional express lane that skips over every four nodes. Searching for 9 is even quicker. We can jump to it directly in one step!

diagram c diagram c2

Searching in a skip list

Using our “express lane” analogy, the trick to a general search process is to use those express lanes to quickly cover ground and only drop down when you’ve gone too far. This lets you avoid scanning every single node.

Here’s the search process step by step:

  1. Start at the highest level from the head node
  2. Move forward until the next node would overshoot the value you’re looking for
  3. Drop down a level and continue searching
  4. Repeat until either:
    • you land exactly on the node or
    • you reach the bottom and it isn’t there

Each new linked list has 1/2 as many entries as the one below it. You can continue repeating this process. In this “ideal” skip list, every other node from level i is promoted up to the next level i+1.

Random promotion

Instead, what if you do it randomly? At each node, flip a coin. Heads: promote it to a higher level. Tails: leave it where it is.

This randomness ensures that, on average, you have enough shortcuts to make searches fast without having to carefully rebalance anything (unlike in a B-tree).

Comparing deterministic vs random (figures d and e)

The difference:

  • Figure d: a neat pyramid-like structure from promoting every other node
  • Figure e: a messier look due to random promotion. For example, node 6 climbs all the way to the top while node 21 never gets promoted at all
diagram d diagram e

This randomization process assumes a probability of 1/2. But you can tweak that value.

Why randomness helps

What’s the benefit of building up each level randomly? Compared to balanced tree structures that have to maintain more complex balance information, skip lists use randomization which makes implementation much easier. Practically, skip lists can do everything a binary tree can do.

Conclusion

In short, skip lists give you performance of balanced trees, without the balancing complexity. That simplicity, is why you’ll find them in in-memory data structures (in Redis and LSM engines).

Further reading

💬 Have thoughts on this post? Send me an email or use this form

If you're interested in updates, you can subscribe below or via the RSS feed