Examining Notion's Backend Architecture

Introduction

Notion, a leading productivity and note-taking platform, has built a user base of more than 30 million users worldwide. This post explores the technical architecture behind Notion's platform and how it supports millions of concurrent users.

Data Model

Unlike conventional note-taking applications, which store whole documents, Notion organizes data into modular entities called blocks. Blocks are the fundamental units of information. They form a flexible hierarchy that can hold many data types, including text, images, lists, data rows, and even entire pages.

Figure: Hierarchy of Blocks

Here is an example of how Notion uses blocks to store a page with a paragraph and a few to-do items.

Figure: Example of Blocks

The page is stored in a page block that contains the page's unique identifier, its title, references to child blocks, a reference to the parent page, and a reference to the workspace.

Figure: Page Block Example

Child blocks store the content of the page. Here is an example of a to-do block that stores a completed to-do item. It looks very similar to the page block: it stores its own unique identifier, the text and status of the to-do item, references to child blocks, and references to its parent and its workspace.

Figure: To-do Block Example
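The two block records described above can be sketched as a single data structure. This is a minimal illustration in Python, not Notion's actual schema: the `Block` class and every field name (`properties`, `content`, `parent_id`, `workspace_id`) are assumptions chosen to mirror the fields listed in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import uuid


@dataclass
class Block:
    """One block record. Field names are illustrative, not Notion's schema."""
    type: str                                   # e.g. "page" or "to_do"
    properties: dict                            # type-specific data (title, checked, ...)
    workspace_id: str                           # workspace the block belongs to
    parent_id: Optional[str] = None             # parent block ID, None for a root page
    content: List[str] = field(default_factory=list)   # ordered child block IDs
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


workspace_id = str(uuid.uuid4())

page = Block(
    type="page",
    properties={"title": "Weekend plans"},
    workspace_id=workspace_id,
)

todo = Block(
    type="to_do",
    properties={"title": "Buy groceries", "checked": True},  # a completed item
    workspace_id=workspace_id,
    parent_id=page.id,
)

# The page stores only a reference to the child, not the child's content.
page.content.append(todo.id)
```

Both records share one shape and differ only in `type` and `properties`, which is why the to-do block looks so similar to the page block.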

Blocks provide flexibility that traditional document formats lack. For example, because the content of a page is not stored directly inside the page block, the same content block can be included in multiple pages, or multiple times in the same page.

Figure: Page Inside Page
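A short sketch shows why storing references makes this kind of reuse possible. The in-memory `blocks` dictionary, the block IDs, and the `render` helper are all hypothetical; the only idea taken from the text is that a page holds child references rather than child content.

```python
from typing import List

# Hypothetical in-memory block store: block ID -> record.
blocks = {
    "page-a": {"type": "page", "title": "Page A", "content": ["shared", "shared"]},
    "page-b": {"type": "page", "title": "Page B", "content": ["shared"]},
    "shared": {"type": "text", "title": "Reusable paragraph", "content": []},
}


def render(block_id: str, depth: int = 0) -> List[str]:
    """Resolve child references recursively into indented lines of text."""
    block = blocks[block_id]
    lines = ["  " * depth + block["title"]]
    for child_id in block["content"]:
        lines.extend(render(child_id, depth + 1))
    return lines


page_a = render("page-a")   # includes the shared block twice
page_b = render("page-b")   # includes the same shared block once
```

Editing the single `shared` record changes what both pages render, since neither page holds its own copy.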

Another example of this flexibility is the ability to place a page inside a table that is itself included in another page.

Figure: Page Inside Table

Database

Notion stores its blocks in PostgreSQL, hosted on the Amazon RDS for PostgreSQL cloud service. In the beginning, Notion used a single database server to handle all requests. As the number of users grew, Notion’s team scaled vertically, increasing the capacity of the virtual server to keep up with the load. Eventually, even the largest database instance available from Amazon could not handle the load, and the team had to split the single database server into a cluster of servers. As of July 2023, the cluster consisted of 96 database servers. Each server hosts one database that is logically partitioned into 5 shards, for 480 logical shards in total, with each shard represented as a PostgreSQL schema.

Figure: Database Cluster Details
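The cluster layout reduces to simple arithmetic, sketched below. The server and shard counts come from the text; the contiguous assignment of shards to servers and the `schemaNNN` naming are assumptions made for illustration.

```python
from typing import Tuple

SERVERS = 96             # database servers, as stated above
SHARDS_PER_SERVER = 5    # logical shards (PostgreSQL schemas) per server
TOTAL_SHARDS = SERVERS * SHARDS_PER_SERVER   # 96 * 5 = 480


def locate(shard: int) -> Tuple[int, str]:
    """Map a logical shard number to (server index, schema name).

    Contiguous assignment and the schema naming are assumptions.
    """
    if not 0 <= shard < TOTAL_SHARDS:
        raise ValueError(f"shard must be in [0, {TOTAL_SHARDS})")
    server = shard // SHARDS_PER_SERVER
    return server, f"schema{shard:03d}"
```

Keeping many small logical shards on each server means a shard can, in principle, move to another server as a unit, without changing how rows are assigned to shards.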

The data is partitioned into logical shards using the workspace ID as the partition key. This ensures that all blocks belonging to one workspace are stored in the same database, so related blocks can be stored and updated in a single transaction and stay consistent.

Figure: Database Partition Key
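Routing by partition key can be sketched in a few lines. The shard count of 480 follows from the figures above (96 servers with 5 shards each). The mapping itself, the UUID's integer value modulo the shard count, is an assumption, since the text does not say how Notion derives a shard from a workspace ID. The workspace ID and block IDs are made up.

```python
import uuid

TOTAL_SHARDS = 480  # 96 servers x 5 logical shards, from the figures above


def shard_for_workspace(workspace_id: str) -> int:
    """Derive a logical shard number from a workspace UUID (assumed scheme)."""
    return uuid.UUID(workspace_id).int % TOTAL_SHARDS


workspace_id = "3f2b8c1e-5d4a-4e6f-9a7b-0c1d2e3f4a5b"   # hypothetical workspace
block_ids = ["block-1", "block-2", "block-3"]            # hypothetical blocks

# Every block is routed by its workspace ID, never by its own ID,
# so all blocks of a workspace resolve to the same shard.
shards = {shard_for_workspace(workspace_id) for _ in block_ids}
```

Because the block ID never enters the calculation, an update that touches several blocks in one workspace always lands on one database and can run inside one transaction.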

Back-end Services

Client applications access Notion's database through a dedicated API server that runs on a cluster of Node.js web servers. Database connections go through PgBouncer, a connection pooler, which improves performance and scalability.

Figure: Back-end Architecture
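The idea behind connection pooling is that many requests share a small, fixed set of database connections instead of each opening its own. The toy pool below illustrates only that idea. It is not how PgBouncer is implemented or configured, and `FakeConnection` and `Pool` are hypothetical names.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a real PostgreSQL connection (hypothetical)."""

    def __init__(self, number: int) -> None:
        self.number = number


class Pool:
    """Fixed-size pool: connections are created once and then reused."""

    def __init__(self, size: int) -> None:
        self._idle = queue.Queue()
        for n in range(size):
            self._idle.put(FakeConnection(n))

    @contextmanager
    def connection(self):
        conn = self._idle.get()       # blocks while all connections are busy
        try:
            yield conn
        finally:
            self._idle.put(conn)      # hand the connection back for reuse


pool = Pool(size=2)
used = []
for _ in range(5):                    # five requests share two connections
    with pool.connection() as conn:
        used.append(conn.number)
```

With a large fleet of web servers, a pooler keeps the number of connections each PostgreSQL server has to maintain well below the number of clients.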

Strengths and Limitations

Notion's architecture shows that a relational database can scale to a very large user base. Relational databases offer advantages such as transactional integrity, but scaling them demands careful planning and execution. Let's look at the strengths and limitations of Notion's architecture.

Strengths:

  • 👍 The block data model supports features that are not available in traditional documents, such as reusing the same content in multiple pages and storing pages inside tables.
  • 👍 Blocks are better units for synchronization than traditional documents. Synchronizing blocks instead of whole documents decreases the network traffic sent from clients and reduces conflicts when multiple users work on the same document.
  • 👍 A relational database allowed Notion’s engineering team to start small, quickly build the first version, and then gradually iterate and scale the system from a few hundred to millions of active users.
  • 👍 A relational database makes it possible to rely on transactions to guarantee data consistency.
  • 👍 Using multiple database servers in a cluster partitioned by workspace allows fine-tuning the size of individual servers to match their load, since some workspaces have a higher load than others.
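The synchronization point in the list above can be made concrete with a small sketch. It compares the payload for one edited block against the payload for the whole page. The 200-block page, the `version` field, and the `changed_blocks` helper are hypothetical and do not describe Notion's actual sync protocol.

```python
import json

# Hypothetical page of 200 text blocks, keyed by block ID.
server_blocks = {
    f"b{i}": {"text": f"Paragraph {i}", "version": 1} for i in range(200)
}

# The client edits a single block.
client_blocks = {k: dict(v) for k, v in server_blocks.items()}
client_blocks["b42"] = {"text": "Paragraph 42 (edited)", "version": 2}


def changed_blocks(local: dict, remote: dict) -> dict:
    """Return only the blocks that are new or differ from the remote copy."""
    return {k: v for k, v in local.items() if remote.get(k) != v}


delta = changed_blocks(client_blocks, server_blocks)
delta_bytes = len(json.dumps(delta))          # payload when syncing blocks
full_bytes = len(json.dumps(client_blocks))   # payload when syncing the document
```

Only `b42` is sent, and two users editing different blocks of the same page never touch the same record, which is where the reduction in conflicts comes from.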

Limitations:

  • 👎 The relational database does not scale automatically. As the number of users grew, the engineering team had to spend months preparing and executing complex migrations of data from a single database server to shards on a cluster of servers.
  • 👎 Since each database stores data for multiple tenants/workspaces, PostgreSQL’s built-in backup and restore functionality cannot be used to restore the data of an individual tenant.
  • 👎 The block data model makes end-to-end encryption more difficult to implement than it is for traditional documents. Even if the content of each block is encrypted, the backend still needs to know the relations between blocks to load them from the database.

In summary, Notion's architecture shows that relational databases can support very large user bases, and it also highlights the complexity of scaling and maintaining such systems.