Denormalization in NoSQL databases is the process of combining or nesting related data into a single document or structure, instead of separating it into multiple tables or collections as in traditional relational databases. This approach can lead to better performance and faster retrieval of data by avoiding the need for complex joins or multiple queries.

In document-based NoSQL databases, such as MongoDB, denormalization can be achieved by embedding one document as a subdocument within another document. For example, instead of having separate collections for user and address, the address information can be nested within the user document.
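The user and address case can be sketched in a few lines of Python. This is only an illustration: plain dicts stand in for collections, and the names `users`, `addresses`, `address_id`, and `embed_address` are hypothetical, not part of any database API.

```python
# Hypothetical data: plain dicts stand in for two separate collections.
users = {"u1": {"_id": "u1", "name": "Ada", "address_id": "a1"}}
addresses = {"a1": {"_id": "a1", "city": "London", "zip": "N1"}}


def embed_address(user, addresses):
    """Return a copy of `user` with the referenced address nested inside it."""
    # Copy every field except the reference, which the subdocument replaces.
    embedded = {k: v for k, v in user.items() if k != "address_id"}
    address = addresses[user["address_id"]]
    # Drop the address's own _id: as a subdocument it no longer needs one.
    embedded["address"] = {k: v for k, v in address.items() if k != "_id"}
    return embedded


user_doc = embed_address(users["u1"], addresses)
print(user_doc)
```

After the transformation, reading a user also returns the address, with no second lookup.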

In graph-based NoSQL databases, such as Neo4j, denormalization can be achieved by adding direct relationships between nodes that would otherwise be connected only through intermediate nodes, so that a query can follow one relationship instead of several.
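As a rough sketch of the graph case, the Python below models a graph as nested dicts rather than using a real graph database. The node names, the relationship names (`PLACED`, `CONTAINS`, `BOUGHT`), and both helper functions are assumptions made for illustration.

```python
# Hypothetical graph: node -> {relationship name: [neighbor nodes]}.
graph = {
    "alice": {"PLACED": ["order1"]},
    "order1": {"CONTAINS": ["book"]},
    "book": {},
}


def two_hop(graph, start, rel1, rel2):
    """Follow rel1 and then rel2, passing through the intermediate nodes."""
    result = []
    for middle in graph[start].get(rel1, []):
        for target in graph[middle].get(rel2, []):
            if target not in result:
                result.append(target)
    return result


def add_shortcut(graph, start, rel1, rel2, shortcut):
    """Denormalize: store the two-hop result as one direct relationship."""
    graph[start][shortcut] = two_hop(graph, start, rel1, rel2)


add_shortcut(graph, "alice", "PLACED", "CONTAINS", "BOUGHT")
print(graph["alice"]["BOUGHT"])
```

The direct relationship is a copy of information the graph already holds, so it has to be refreshed whenever the underlying path changes.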

Denormalization can lead to data duplication and larger documents, so weigh these costs against the read-performance gains before deciding to denormalize a database.

There are two techniques for schema design:

1 - Embedded Document Pattern

In document-based databases, nesting refers to the practice of including one entity within another as a subdocument.

{
  "_id": "123456789",
  "name": "Example Product",
  "description": "This is an example product.",
  "price": 9.99,
  "reviews": [
    {
      "rating": 4,
      "comment": "This product works well.",
      "author": "User1"
    },
    {
      "rating": 5,
      "comment": "I love this product!",
      "author": "User2"
    }
  ]
}

Suppose you are building an e-commerce application, and you need to store information about a product and its reviews. Instead of creating separate documents for the product and its reviews and then linking them using references, you can use the Embedded Document Pattern to store the reviews as a nested document within the product document.

In this example, the “reviews” field is an array of nested documents that contain information about each review. By using the Embedded Document Pattern, you can easily query for all the reviews of a particular product without having to perform additional queries or join operations. Additionally, this approach can help improve performance by reducing the number of database requests required to retrieve all the data needed for a particular operation.
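The single-read behavior described above can be sketched in Python. A dict stands in for the product collection, and `product_with_reviews` and `average_rating` are hypothetical helpers rather than database calls.

```python
# A dict stands in for the product collection (illustration only).
products = {
    "123456789": {
        "_id": "123456789",
        "name": "Example Product",
        "price": 9.99,
        "reviews": [
            {"rating": 4, "comment": "This product works well.", "author": "User1"},
            {"rating": 5, "comment": "I love this product!", "author": "User2"},
        ],
    }
}


def product_with_reviews(products, product_id):
    """One lookup returns the product together with all embedded reviews."""
    return products[product_id]


def average_rating(product):
    """Compute the mean rating from the embedded reviews, or None if empty."""
    reviews = product["reviews"]
    if not reviews:
        return None
    return sum(r["rating"] for r in reviews) / len(reviews)


doc = product_with_reviews(products, "123456789")
print(len(doc["reviews"]), average_rating(doc))
```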

Pros

  • Obtaining all related information becomes possible with a single query.
  • Implementing joins in application code or using populate/lookup (Join) is not necessary.
  • Updating related information can be done in one atomic operation. In MongoDB, for example, a write to a single document is atomic, including changes to its embedded subdocuments.

Cons

  • Document-based databases have a limit on the size of a single document. For example, MongoDB has a 16 MB limit.
  • The level of nesting or embedding of subdocuments is another factor to consider. MongoDB, for instance, can support embedded data up to a depth of 100.
  • Both limits matter most when large or unbounded amounts of data, such as an ever-growing list of reviews, are embedded in a single document.
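A rough pre-flight check for the two limits above might look like the following Python sketch. Two assumptions apply: the length of the JSON encoding is used only as an approximation of the real BSON size, and the depth counting convention (each dict or list adds one level) is my own and may not match how MongoDB counts levels. All function names are hypothetical.

```python
import json

MAX_DOC_BYTES = 16 * 1024 * 1024  # the 16 MB document limit cited above
MAX_DEPTH = 100                   # the nesting limit cited above


def approx_size_bytes(doc):
    """Rough proxy: UTF-8 length of the JSON encoding, not the BSON size."""
    return len(json.dumps(doc).encode("utf-8"))


def nesting_depth(value):
    """Count nested dict/list levels; a scalar has depth 0."""
    if isinstance(value, dict):
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if isinstance(value, list):
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0


def fits_limits(doc):
    """True if the document stays within both approximate limits."""
    return (approx_size_bytes(doc) <= MAX_DOC_BYTES
            and nesting_depth(doc) <= MAX_DEPTH)


product = {"name": "Example Product", "reviews": [{"rating": 4}, {"rating": 5}]}
print(nesting_depth(product), fits_limits(product))
```

A check like this is most useful before appending to an array that can grow without bound.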

2 - Extended Reference Pattern

Embedding gives good read performance and data consistency in many circumstances. However, in some cases a normalized model performs better: splitting the data into multiple collections provides greater flexibility in executing queries.

This pattern stores related data in separate documents and uses references to link them together, rather than embedding the data as nested documents within a single document. Note that MongoDB's own pattern catalog uses the name Extended Reference for a variant that also copies a few frequently read fields alongside the reference; the example here uses plain references.

User Document

{
  "_id": "123456789",
  "name": "John Smith",
  "email": "john.smith@example.com",
  "posts": [
    "987654321",
    "543216789"
  ]
}

Post Document

{
  "_id": "987654321",
  "author": "123456789",
  "text": "This is my first post."
}

Suppose you are building a social media application, and you need to store information about users and their posts. Instead of embedding the posts as nested documents within the user document, you can store each post as its own document in a separate collection and use references to link the posts to the user document.

In this example, the “posts” field in the user document is an array of references to the post documents that belong to the user. Each post document contains a reference to the user document that authored the post.

By using the Extended Reference Pattern, you can easily query for all the posts of a particular user by performing a lookup operation using the references stored in the user document. Additionally, this approach can help avoid issues with data duplication and consistency that can arise when data is embedded as nested documents within a single document.
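The lookup described above can be sketched in Python as two steps. Dicts stand in for the user and post collections, `posts_for_user` is a hypothetical helper, and the text of the second post is a placeholder because the original example does not show that document.

```python
# Dicts stand in for the two collections (illustration only).
users = {
    "123456789": {
        "_id": "123456789",
        "name": "John Smith",
        "email": "john.smith@example.com",
        "posts": ["987654321", "543216789"],
    }
}
posts = {
    "987654321": {"_id": "987654321", "author": "123456789",
                  "text": "This is my first post."},
    # Placeholder content: this document is not shown in the example above.
    "543216789": {"_id": "543216789", "author": "123456789",
                  "text": "(placeholder second post)"},
}


def posts_for_user(users, posts, user_id):
    """Resolve a user's post references in two steps."""
    user = users[user_id]  # step 1: fetch the user document
    # Step 2: fetch each referenced post, skipping dangling references.
    return [posts[pid] for pid in user["posts"] if pid in posts]


for post in posts_for_user(users, posts, "123456789"):
    print(post["_id"], post["text"])
```

The second step is the cost of referencing: a database would need either a second query or a join-style lookup to do the same work.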

Pros

  • Splitting up the data results in smaller documents.
  • Data can be retrieved selectively, since most queries do not need every piece of related data.
  • Avoids duplicating data.

Cons

  • At least two queries or the populate/lookup (join) function are required to retrieve data from referenced documents.

Which is Better?

When considering schema modeling and relationships:

  • One to one: embedding method.
  • One to many: referencing method.
  • Many to many: referencing method.

Unless there is a compelling reason not to, prefer embedding.

The need to access an object on its own is a compelling reason not to embed it.

Avoid joins and populate (lookups) if possible, but use them if they provide a better schema design.

If there are more than a few hundred documents on the many side, do not embed them.

If there are more than a few thousand documents on the many side, do not use an array of ObjectId references.
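These rules of thumb can be condensed into a small decision helper. The Python sketch below is only one possible reading: the thresholds 300 and 3000 are arbitrary stand-ins for "a few hundred" and "a few thousand", and the suggestion to keep the reference on the child document for very large sets is my assumption about the alternative, which the rules above do not spell out.

```python
EMBED_LIMIT = 300      # arbitrary stand-in for "a few hundred"
ID_ARRAY_LIMIT = 3000  # arbitrary stand-in for "a few thousand"


def choose_model(many_side_count, accessed_on_its_own=False):
    """Suggest a modeling approach from the rules of thumb (a sketch)."""
    if many_side_count > ID_ARRAY_LIMIT:
        # Assumption: store the parent's id on each child document instead.
        return "reference from child"
    if accessed_on_its_own or many_side_count > EMBED_LIMIT:
        return "array of references"
    # Default: prefer embedding unless there is a reason not to.
    return "embed"


print(choose_model(1))
print(choose_model(5, accessed_on_its_own=True))
print(choose_model(1000))
print(choose_model(50000))
```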