What to do when your database grows too big

Let’s be clear: a single monolithic database goes a long way, and the majority of companies run fine with one.

However, once you outgrow its capacity, these are some ways to work around it.

Compress data#

Most databases now support compression in some form, for example PostgreSQL (TOAST) and MySQL (InnoDB page compression).

DoorDash compresses payloads before caching them in Redis: https://www.google.com/amp/s/doordash.engineering/2019/01/02/speeding-up-redis-with-compression/amp/
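Client-side compression like this is easy to sketch. The snippet below is a minimal illustration using `zlib` from the Python standard library, not DoorDash’s actual scheme; the one-byte marker and the 64-byte threshold are assumptions, since tiny values can grow when compressed.

```python
import zlib

def compress_value(value: bytes, min_size: int = 64) -> bytes:
    """Compress a cache value only if it is large enough to be worth it."""
    if len(value) < min_size:
        return b"\x00" + value          # marker 0: stored raw
    return b"\x01" + zlib.compress(value)  # marker 1: zlib-compressed

def decompress_value(stored: bytes) -> bytes:
    marker, payload = stored[:1], stored[1:]
    return payload if marker == b"\x00" else zlib.decompress(payload)

# Repetitive JSON-ish payloads compress very well.
payload = b'{"user_id": 42, "name": "alice"}' * 100
stored = compress_value(payload)
assert decompress_value(stored) == payload
assert len(stored) < len(payload)
```

The same bytes-in, bytes-out wrapper can sit in front of any key-value store; the marker byte lets old, uncompressed entries keep working during a rollout.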

Delete data#

The way to work with billions of rows is not to work with billions of rows.

Keep the database slim.

Do not keep soft-deleted data around forever.

Grab uses TTLs to remove expired data.

Zendesk removes old data from the main DB and keeps it in a separate historical DB.

The spatie/laravel-deleted-models package for Laravel moves deleted rows to a separate table: https://github.com/spatie/laravel-deleted-models
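The TTL and historical-DB ideas above combine naturally: move expired rows to an archive table, then delete them from the hot table, inside one transaction. A minimal sketch with SQLite (the `events` schema and 30-day retention are made up for illustration):

```python
import sqlite3
import time

TTL_SECONDS = 30 * 24 * 3600  # hypothetical 30-day retention window

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT, created_at INTEGER);
    CREATE TABLE events_archive (id INTEGER PRIMARY KEY, payload TEXT, created_at INTEGER);
""")

now = int(time.time())
conn.executemany(
    "INSERT INTO events (payload, created_at) VALUES (?, ?)",
    [("fresh", now), ("stale", now - TTL_SECONDS - 1)],
)

def purge_expired(conn, now):
    """Copy expired rows to the archive table, then delete them from the hot table."""
    cutoff = now - TTL_SECONDS
    with conn:  # one transaction, so rows are never lost or duplicated
        conn.execute(
            "INSERT INTO events_archive SELECT * FROM events WHERE created_at < ?",
            (cutoff,),
        )
        conn.execute("DELETE FROM events WHERE created_at < ?", (cutoff,))

purge_expired(conn, now)
assert conn.execute("SELECT payload FROM events").fetchall() == [("fresh",)]
assert conn.execute("SELECT payload FROM events_archive").fetchall() == [("stale",)]
```

In production the purge would run as a periodic job, delete in small batches to avoid long locks, and the archive would usually live in a separate, cheaper database.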


Split the data into chunks#

Cons: choosing a shard key is difficult, and there are no cross-shard operations (e.g. joins, aggregations).

Robinhood shards data at the application level.

Shopify splits data by tenant (shop).

Facebook shards its databases aggressively, using the Tupperware orchestrator to manage instances.

Notion sharded its Postgres fleet by workspace: https://www.notion.so/blog/sharding-postgres-at-notion


CockroachDB shards data automatically, splitting tables into ranges.
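Application-level sharding, as Robinhood and Notion describe, boils down to a stable function from shard key to physical database. A sketch (the shard list and `shop-…` keys are hypothetical; the point is to use a deterministic hash, not Python’s per-process-salted `hash()`):

```python
import hashlib

SHARD_DSNS = [  # hypothetical connection strings, one per physical shard
    "postgres://db-shard-0/app",
    "postgres://db-shard-1/app",
    "postgres://db-shard-2/app",
    "postgres://db-shard-3/app",
]

def shard_for(tenant_id: str) -> str:
    """Route a tenant to a shard with a hash that is stable across processes."""
    digest = hashlib.sha256(tenant_id.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(SHARD_DSNS)
    return SHARD_DSNS[index]

# The mapping is deterministic: the same tenant always lands on the same shard.
assert shard_for("shop-123") == shard_for("shop-123")
assert shard_for("shop-123") in SHARD_DSNS
```

Every query then opens with a `shard_for(...)` lookup, which is exactly why cross-shard joins stop being available: two tenants may live on different servers. Real systems usually hash into many logical shards and map those to machines, so shards can be moved without rehashing every key.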


Use a bottomless database#

Pros: cheap, durable, and virtually limitless storage.

Cons: higher latency, since data may need to be fetched from cloud object storage, and the approach is not widely adopted yet.

Neon: serverless Postgres with storage offloaded to object storage.

ClickHouse storage tiering

Timescale storage tiering for Postgres



Dragonfly: can quickly spin up a new replica

sqld (libSQL) bottomless replication to object storage
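The storage-tiering variants above all share one idea: keep recent data on fast local disk and demote the rest to cheap object storage, falling back to it on reads. A toy sketch of that read path (the dicts stand in for a local table and an S3-like cold store; the one-week window is an arbitrary assumption):

```python
import time

HOT_RETENTION = 7 * 24 * 3600  # hypothetical: keep one week of data in the hot tier

hot_store: dict = {}   # stands in for fast, size-limited local storage
cold_store: dict = {}  # stands in for cheap object storage (e.g. S3)

def put(key, value, ts):
    """New writes always land in the hot tier."""
    hot_store[key] = (value, ts)

def demote_cold(now):
    """Move rows past the hot-retention window to the cheap tier."""
    for key, (value, ts) in list(hot_store.items()):
        if now - ts > HOT_RETENTION:
            cold_store[key] = hot_store.pop(key)

def get(key):
    """Read the hot tier first; fall back to the slower cold tier."""
    return (hot_store.get(key) or cold_store.get(key))[0]

now = time.time()
put("today", b"fresh metrics", now)
put("last-month", b"old metrics", now - 40 * 24 * 3600)
demote_cold(now)

assert "last-month" in cold_store and "last-month" not in hot_store
assert get("last-month") == b"old metrics"  # transparently served from cold tier
```

The real systems do this per page, segment, or chunk rather than per key, but the trade-off is the same as in the cons above: reads that miss the hot tier pay an object-storage round trip.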


Storage size is only one dimension along which you may need to scale out your database; scaling database reads and database writes are others.

© Fadhil Yaacob 2023
