BlueSky's Massive Database Sharding Refactor in TypeScript

Published: 03 December 2023
on channel: Stephen Blum

BlueSky just updated their AT protocol, short for Authenticated Transfer Protocol, and I think it's worth your attention. They decided to shard their database so that every user now gets their own personal data sector. It doesn't get more extreme in terms of database sharding than this, unless, of course, you decide to dedicate one database row to each user.

That would be absurd, yet technically, it qualifies as a database. Here's the juicy part. They gave us an opportunity to rejig the PDS, the Personal Data Server.

Now, this is a major part of the AT protocol, particularly useful when you're dealing with large-scale distribution involving multiple servers and computers. At that scale, reliability and redundancy have to be part of the discussion. The PDS now works as a single-tenant SQLite store, with each user owning their own SQLite file.

Here's how they did it. They take the first two characters of a SHA-256 hash of an ID (it looks like the user ID or person ID) and use them as a directory name, which spreads the files out and keeps each directory at a manageable size. The actual database for that ID gets stored in this folder.
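To make the prefix idea concrete, here is a minimal sketch. It assumes the hash is hex-encoded, and the helper name shardPrefix is made up for illustration; it is not taken from the BlueSky code.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper (name is an assumption): returns the first two
// hex characters of the SHA-256 digest of an ID.
function shardPrefix(id: string): string {
  return createHash("sha256").update(id).digest("hex").slice(0, 2);
}

// Two hex characters give 16 * 16 = 256 possible directory names ("00" to "ff").
const SHARD_COUNT = 16 ** 2;

console.log(SHARD_COUNT); // prints 256
console.log(shardPrefix("abc")); // prints "ba"
```

Because the hash output is spread evenly, a large number of users should land roughly equally across those 256 directories, so no single directory ends up holding everything.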

It might sound simple, but the number of commits it took was huge. On top of that, they did all this in TypeScript, which is very unexpected. You'd usually see the likes of Rust or other compiled languages for this kind of work, but no, it's all in TypeScript.

Beyond that, we have the on-disk directory for the database itself, the SQLite store. The code returns both the directory and the database location.
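A function that returns both values might look roughly like the sketch below. The function name getLocation, the baseDir parameter, the per-ID subfolder, and the store.sqlite file name are all assumptions made for illustration, not a copy of the real implementation.

```typescript
import { createHash } from "node:crypto";
import * as path from "node:path";

interface StoreLocation {
  directory: string;  // folder that holds this user's files
  dbLocation: string; // full path to this user's SQLite file
}

// Hypothetical sketch: the layout and file name here are assumptions.
function getLocation(baseDir: string, id: string): StoreLocation {
  // The first two hex characters of the SHA-256 hash pick the shard directory.
  const prefix = createHash("sha256").update(id).digest("hex").slice(0, 2);
  // Assumed layout: <baseDir>/<prefix>/<id>/store.sqlite
  const directory = path.join(baseDir, prefix, id);
  const dbLocation = path.join(directory, "store.sqlite");
  return { directory, dbLocation };
}

const loc = getLocation("/data", "abc");
console.log(loc.directory);
console.log(loc.dbLocation);
```

Returning the directory alongside the database path lets the caller create the folder first and then open the SQLite file inside it.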