Microservice: Shared database

If you are familiar with microservice architecture, you might have heard of the pattern “Database per service”. There are popular articles by Chris Richardson, AWS, and Microsoft on this topic. These articles suggest that microservices should not share data via direct database access, but only via inter-microservice RPC/API calls. This pattern ensures that each microservice’s owner owns the data and the schema, so that they can evolve independently. The microservices communicate only via the established interface of the API calls.

I have worked at several companies and I’ve seen both a shared database and a separate database per service work well. Since the available references mostly cover database per service only, I’m going to instead share my experience working with a shared database in a microservice architecture.

Software engineering is always about making trade-offs and not being dogmatic about tech decisions.

Shared database service done right#

I used to work at a startup (now no longer active) that used a microservice architecture in a monorepo. The entire team I worked with was just around 10 engineers. In my opinion, choosing a shared database was the right call at the time.

For context, although we were working in a relatively small team, we built and operated around 10 microservices, each of which listened to events and processed them asynchronously. There was a lot of message passing between the microservices, but all of them accessed the same databases at the same time.

The data access layer or the ORM was shared across all of the microservices.

One way you could achieve this is with a monorepo setup where the data access layer is a shared library that can be imported by the other microservices. In the context of a Go project, the setup would look something like this:

.
├── data_access_layer
├── microservice1
├── microservice2
└── go.work
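With Go workspaces, the go.work file at the repo root ties the modules together so each microservice can import the shared layer directly without publishing it anywhere. A minimal sketch (module layout assumed from the tree above):

```
go 1.22

use (
	./data_access_layer
	./microservice1
	./microservice2
)
```

Each subdirectory is its own Go module with a go.mod, and the workspace makes local changes to data_access_layer immediately visible to every service during development.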

Alternatively, the data access layer could be a package published to a package registry, which is then imported by all microservices in the system. However, I personally like the monorepo setup better, because each time you make a change, you can test that all the microservices in the system work as expected and then deploy confidently.

Either way, this data access layer serves as a common way to serialize and deserialize data in the data store of your choice. You can use tools like sqlc, jet, or gorm to auto-generate the code that translates the table schema into objects, or you can hand-write your data models. This approach is database-agnostic, which means you could write the data access layer for Mongo, Cassandra, or any other database.

One important distinction you need to make is that each time the data model changes, all microservices have to be redeployed with the latest data access layer. This ensures that all services reference the latest data schema when interacting with the database.

Another important detail is that when you make a schema change, you have to make sure it is always backward compatible. This also applies to the database-per-service approach, where the API has to stay backward compatible so that you don’t break existing behaviour.

If you are using SQL databases, the schema & the migrations have to be managed centrally.
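One common way to manage this centrally is a single, numbered migrations directory inside the shared layer, applied from one place during deployment. The file names below are illustrative; the `.up.sql` naming follows tools like golang-migrate:

```
data_access_layer/
└── migrations
    ├── 0001_create_users.up.sql
    ├── 0002_add_users_email.up.sql
    └── 0003_create_orders.up.sql
```

Because there is exactly one migration history, no two microservices can disagree about what the current schema is.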

Advantages of using shared database for microservice architecture are:

  • You can perform JOINs across microservices easily, since the data access layer/ORM has access to all tables, thus avoiding N+1 issues
  • You can use DB transactions to ensure operation atomicity
  • You don’t have to copy/pipe data over from one service to another just to be able to query it from another microservice. That typically requires message brokers, which are expensive to operate and maintain.
  • In a monorepo setup, any change you make to the data_access_layer directory has to pass the tests of all the other microservices in the repo.
  • You can enforce Foreign Key constraints on the tables because they are aware of each other

The disadvantages of this approach are:

  • Since the data model is shared across all microservices, the schema design might not best suit each microservice’s access patterns
  • Since the database is shared across all microservices, the load from one microservice might interfere with the others

In my opinion, this pattern works well when you are working in a small team and you prefer a microservice over a monolith architecture.

A big company that adopted this pattern is Datadog, though they eventually migrated to the database-per-service pattern.

The wrong way of doing a shared database service#

In my opinion, if you are repeatedly declaring the table schema in many places, you are doing it wrong. If data model changes are not propagated immediately to the other microservices, you could face a lot of issues down the road.

For example, when you repeatedly declare the same ORM models in every repository, you’ll likely lose track of which one has the latest schema. Your system becomes brittle: some microservice could be accessing a field/column that no longer exists or is deprecated, and some person/team might forget to update the data model in some repos after many months.

Comparing this to the Database per service pattern#

It’s commonly stated that Database per service is the best practice when architecting microservices. However, it comes at a price.

You can read about its advantages in the articles I shared earlier. However, the common trade-offs when working with the database-per-service pattern are:

  • Since you cannot perform JOINs in the database, it’s common to fall into the trap of N+1 queries when you want to join data across services. Otherwise, the service owner has to build a completely new endpoint that accepts multiple arguments to avoid the N+1 problem.
  • Since the data is owned and isolated by one team, any other team that wants a copy of the data has to stream it into a local copy that they have to maintain themselves. This is commonly done using Kafka data fan-out.
  • You cannot enforce Foreign Key constraints among the tables because they should not know about each other.
  • Since you can only share data via RPC, each microservice has to be deployed with enough capacity to handle internal traffic load

Conclusion#

Building a system for 10 engineers is definitely not the same as building for 1,000 engineers. If you confuse or conflate the two situations, you’ll probably end up in a bad place.

© Fadhil Yaacob 2026