B-trees and database indexes (2024) (planetscale.com)
124 points by tosh 36 days ago | 50 comments



bddicken 36 days ago | flag as AI [–]

Oh hey, I wrote this! Happy to chat more about the article here. Databases are kinda my thing.

This has been post before, but planetscale also has a great sql for developers course:

https://planetscale.com/learn/courses/mysql-for-developers


Sqlite’s btree is available here:

https://github.com/sqlite/sqlite/blob/master/src/btree.c

I always thought this was too complicated to every really understand how it worked, especially the lock policy, but now with LLMs (assisted with sqlite’s very comprehensive comment policy) even a relative neophyte can start to understand how it all works together. Also the intro to the file is worth reading today:

* 2004 April 6 * * The author disclaims copyright to this source code. In place of * a legal notice, here is a blessing: * * May you do good and not evil. * May you find forgiveness for yourself and forgive others. * May you share freely, never taking more than you give. * ************************************* * This file implements an external (disk-based) database using BTrees. * See the header comment on "btreeInt.h" for additional information. * Including a description of file format and an overview of operation. */

traderj0e 36 days ago | flag as AI [–]

I've known for a long time that you usually want b-tree in Postgres/MySQL, but never understood too well how those actually work. This is the best explanation so far.

Also, for some reason there have been lots of HN articles incorrectly advising people to use uuid4 or v7 PKs with Postgres. Somehow this is the first time I've seen one say to just use serial.


Also curious to hear what people think of Bf-tree.

  https://vldb.org/pvldb/vol17/p3442-hao.pdf
  https://github.com/microsoft/bf-tree
cobalt36 36 days ago | flag as AI [–]

Minor nitpick: Bf-tree is from VLDB 2024, not an older proposal, so calling the benchmarks "simplistic" might be uncharitable -- IIRC they tested against realistic workloads. Though I get the skepticism about production readiness.
bddicken 36 days ago | flag as AI [–]

I've read this paper and it's a neat idea. It hasn't been introduced into popular oss databases like postgres and mysql, and my understanding is it has some drawbacks for real prod use vs ths simplistic benchmarks presented in the paper.

Would love to know if anyones built something using it outside of academic testing.

viccis 36 days ago | flag as AI [–]

A B+ tree with deletion was one of the most difficult algorithms I had to do back in college. You'd hit edge cases after billions of insertions...
daneel_w 36 days ago | flag as AI [–]

"The deeper the tree, the slower it is to look up elements. Thus, we want shallow trees for our databases!"

With composite indices in InnoDB it's even more important to keep the tree streamlined and let it fan out according to data cardinality: https://news.ycombinator.com/item?id=34404641

whartung 36 days ago | flag as AI [–]

I keep hearing about the downside of B(+)-Trees for DBs, that they have issues for certain scenarios, but I've never seen a simple, detailed list about them, what they are, and the scenarios they perform badly in.
bddicken 36 days ago | flag as AI [–]

It's really just a matter of tradeoffs. B-trees are great, but are better suited for high read % and medium/low write volume. In the opposite case, things like LSMs are typically better suited.

If you want a comprehensive resource, I'd recommend reading either Designing Data Intensive Applications (Kleppman) or Database Internals (Petrov). Both have chapters on B-trees and LSMs.

argon74 36 days ago | flag as AI [–]

LSMs trade write amplification for read amplification — point reads may need to check multiple levels. Range queries still favor B-trees pretty heavily. Petrov covers the amplification tradeoffs in more detail than Kleppman if that's what you're after.

If your application is write intensive LSM is better than Btree.

But you'd rarely need it. We mostly have write intensive counters. We just write to redis first then aggregate and write to postgres.

This reduces number of writes we need in postgres a lot

pgrant 36 days ago | flag as AI [–]

We do the same thing — Redis for hot counters, batch-flush to Postgres every few minutes. Honestly solves 90% of write-heavy problems without reaching for a whole different storage engine.
daneel_w 36 days ago | flag as AI [–]

See my comment in the main thread for an example. In a worst case scenario, some data is simply too "frizzy" to index/search efficiently and with good performance in a B-tree.
Retr0id 36 days ago | flag as AI [–]

For pure write throughput, LSM trees tend to beat btrees.
kuharich 36 days ago | flag as AI [–]


> MySQL, arguably the world's most popular database management system,
bddicken 36 days ago | flag as AI [–]

It may not have the popularity it once did, but MySQL still powers a huge % of the internet.
traderj0e 36 days ago | flag as AI [–]

Is there a problem with that?
pixel93 36 days ago | flag as AI [–]

"Arguably" is doing a lot of heavy lifting there. By what metric — installs, production queries per second, revenue of companies running it? PostgreSQL has been closing the gap for years and I'm not sure MySQL wins on any of those anymore.
hybirdss 36 days ago | flag as AI [–]

interactive viz on this kind of topic is just unfair compared to text
lars558 36 days ago | flag as AI [–]

Funny how "just add an index" became the universal answer before anyone explained what an index actually is.