MongoDB Maximum Document Size

Mehvish Ashiq Feb 16, 2024
This tutorial describes the default maximum size of a document in MongoDB. It also explains the alternative to use when the data exceeds that limit.

We will also learn how to use the maximum BSON document size efficiently.

MongoDB Maximum Document Size

In MongoDB, documents (objects) are stored in BSON format. BSON (Binary JSON) is a binary serialization of JSON-like documents.

This format extends JSON with data types that JSON itself does not have.

For instance, BSON has Date and BinData types that are not available in JSON. According to the MongoDB documentation, the size limit for a single BSON document is 16MB (16 mebibytes, or 16,777,216 bytes).
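To get a feel for how close a document is to the limit, we can estimate its BSON size by hand. The sketch below follows the BSON layout for a handful of common types only; the function names (bson_size, fits_in_one_document) are made up for this example, and the int32/int64 choice for Python integers is an assumption about how a driver would encode them. A real driver does its own encoding, so treat the result as an estimate.

```python
import datetime

MAX_BSON_SIZE = 16 * 1024 * 1024  # 16 MiB limit on a single BSON document


def _value_size(value):
    """Return the encoded size in bytes of one BSON value (subset of types)."""
    if value is None:
        return 0                                   # null carries no payload
    if isinstance(value, bool):                    # check bool before int
        return 1
    if isinstance(value, int):
        # Assumption: ints that fit in 32 bits are stored as int32, others as int64
        return 4 if -2**31 <= value < 2**31 else 8
    if isinstance(value, float):
        return 8
    if isinstance(value, datetime.datetime):
        return 8                                   # milliseconds since epoch as int64
    if isinstance(value, str):
        return 4 + len(value.encode("utf-8")) + 1  # length + UTF-8 bytes + NUL
    if isinstance(value, bytes):
        return 4 + 1 + len(value)                  # length + subtype + bytes
    if isinstance(value, dict):
        return bson_size(value)
    if isinstance(value, (list, tuple)):
        # Arrays are stored as documents whose keys are "0", "1", ...
        return bson_size({str(i): v for i, v in enumerate(value)})
    raise TypeError(f"unsupported type: {type(value).__name__}")


def bson_size(doc):
    """Estimate the BSON size of a dict: int32 length + elements + NUL."""
    size = 4 + 1
    for key, value in doc.items():
        size += 1 + len(key.encode("utf-8")) + 1   # type byte + key + NUL
        size += _value_size(value)
    return size


def fits_in_one_document(doc):
    """True if the estimated size does not exceed the 16 MiB limit."""
    return bson_size(doc) <= MAX_BSON_SIZE
```

For example, bson_size({"hello": "world"}) gives 22 bytes, while a document holding a single 16 MiB binary field no longer fits because of the few bytes of overhead around the payload.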

The limit exists so that a single document cannot use an excessive amount of RAM or, during transmission, bandwidth. Remember also that BSON documents can be nested up to 100 levels deep, where each array or object adds one level.
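The nesting rule can be checked before a document is sent to the server. This is a minimal sketch: nesting_depth is a hypothetical helper, and it assumes the top-level document counts as the first level, which may differ slightly from how the server counts.

```python
MAX_NESTING_DEPTH = 100  # nesting limit for BSON documents


def nesting_depth(value):
    """Count nesting levels; every dict (object) or list (array) adds one."""
    if isinstance(value, dict):
        children = value.values()
    elif isinstance(value, (list, tuple)):
        children = value
    else:
        return 0  # scalars do not add a level
    return 1 + max((nesting_depth(child) for child in children), default=0)


def within_nesting_limit(doc):
    """True if the document is nested no deeper than the limit."""
    return nesting_depth(doc) <= MAX_NESTING_DEPTH
```

A document such as {"a": {"b": [1, {"c": 2}]}} has a depth of 4 under this convention: two objects, one array, and one more object.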

Data volumes keep growing, so our data may well exceed the 16MB size limit for a BSON document.

In that case, MongoDB provides the GridFS API to store files larger than 16MB.

What Is the GridFS API

GridFS is a MongoDB specification for storing and retrieving large files that exceed the BSON document limit (16MB), for instance, audio, video, or image files. It works like a file system, but the data is stored in MongoDB collections.

GridFS divides the file into chunks and stores each chunk in a separate document. By default, a chunk holds 255KB of data, and only the last chunk of a file can be smaller. GridFS uses two collections, named fs.files and fs.chunks by default: fs.files stores each file's metadata, and fs.chunks stores the chunks.
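The chunking idea can be sketched without a database. The code below only imitates what a GridFS driver does: split_into_chunks is a hypothetical helper, the dictionaries stand in for fs.chunks documents, and the n field (the position of a chunk within its file) is taken from the GridFS specification rather than from this tutorial.

```python
CHUNK_SIZE = 255 * 1024  # default GridFS chunk size: 261,120 bytes


def split_into_chunks(data, files_id, chunk_size=CHUNK_SIZE):
    """Split raw bytes into chunk documents shaped like fs.chunks entries."""
    return [
        {
            "files_id": files_id,                      # link to the parent file
            "n": n,                                    # position within the file
            "data": data[offset:offset + chunk_size],  # at most chunk_size bytes
        }
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
```

A 600,000-byte payload becomes three chunk documents this way: two full chunks of 261,120 bytes and a last one with the remaining 77,760 bytes.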

Every chunk is identified by a unique _id (ObjectId) field, and each file's document in fs.files serves as the parent of its chunks. The files_id field in an fs.chunks document links the chunk to its parent.
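To make the parent and chunk relationship concrete, here is a small in-memory sketch of reading a file back. The lists of dictionaries stand in for the fs.files and fs.chunks collections, read_file is a hypothetical helper, and the n and length fields come from the GridFS specification; a real application would let the driver's GridFS API do this work.

```python
def read_file(files_id, files, chunks):
    """Reassemble a file from its chunk documents (in-memory stand-in)."""
    # The parent document in "fs.files" holds the metadata.
    parent = next(f for f in files if f["_id"] == files_id)
    # files_id selects the file's chunks; n puts them back in order.
    parts = sorted(
        (c for c in chunks if c["files_id"] == files_id),
        key=lambda c: c["n"],
    )
    data = b"".join(c["data"] for c in parts)
    if len(data) != parent["length"]:
        raise ValueError("chunks do not add up to the recorded file length")
    return data


# Hypothetical sample data: one parent document and its two chunks.
sample_files = [{"_id": 1, "filename": "note.txt", "length": 10, "chunkSize": 6}]
sample_chunks = [
    {"_id": 102, "files_id": 1, "n": 1, "data": b"orld"},
    {"_id": 101, "files_id": 1, "n": 0, "data": b"hello "},
]
```

Calling read_file(1, sample_files, sample_chunks) joins the chunks in order of n, even though they are listed out of order here.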

You can go through this article to understand the syntax while using GridFS.

Use Default BSON Document Size Limit Efficiently

The BSON document size limit (16MB) is generous. For instance, the whole uncompressed text of The War of the Worlds is only about 364KB (as HTML). Still, some data will exceed it.

If your data exceeds the limit, you can use the GridFS API that we discussed earlier or plan how to use the 16MB efficiently.

Suppose we want to develop an application, XYZ, that needs four data types: Booleans, numbers, strings, and dates (represented as UNIX timestamps in milliseconds).

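With a 16MB size limit, one document has room for around two million raw 64-bit values (16,777,216 / 8 = 2,097,152), which covers numbers and dates. The practical count is lower because BSON also stores a type byte and a field name (or array index) for every value. A Boolean value itself takes one byte.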
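The arithmetic behind this estimate can be worked through in a few lines. The sketch below assumes the values sit in a single array field of an otherwise empty document; the field name "values" and the helper int64_array_capacity are made up for the example.

```python
MAX_BSON_SIZE = 16 * 1024 * 1024  # 16,777,216 bytes

# Upper bound: raw 8-byte values with no per-element overhead.
raw_capacity = MAX_BSON_SIZE // 8


def int64_array_capacity(field_name="values", limit=MAX_BSON_SIZE):
    """Count the 64-bit values that fit in {field_name: [v0, v1, ...]}."""
    # Outer document: int32 length + type byte + field name + NUL + final NUL.
    size = 4 + 1 + len(field_name.encode("utf-8")) + 1 + 1
    # Array wrapper: int32 length + final NUL.
    size += 4 + 1
    count = 0
    while True:
        # Element: type byte + index as a decimal string + NUL + 8-byte value.
        element = 1 + len(str(count)) + 1 + 8
        if size + element > limit:
            return count
        size += element
        count += 1
```

Here raw_capacity is 2,097,152, while int64_array_capacity() comes out at roughly one million, because every array element also carries its type byte and its index as a string key.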

Here, string values need special attention because BSON stores them as UTF-8, where every character takes between one and four bytes, so their size grows with their length. We need to optimize the size of all the fields (columns) that contain string values.
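A quick way to see how much space a string takes is to encode it as UTF-8 and count the bytes. In this sketch, utf8_size and bson_string_size are hypothetical helper names; the second one adds the length prefix and the terminating NUL byte that BSON stores around a string value.

```python
def utf8_size(text):
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def bson_string_size(text):
    """Size of a BSON string value: int32 length + UTF-8 bytes + NUL."""
    return 4 + utf8_size(text) + 1


# One character each, written as escapes: a, e-acute, euro sign, an emoji.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]
sizes = [utf8_size(s) for s in samples]  # [1, 2, 3, 4]
```

ASCII text costs one byte per character, but accented letters, many symbols, and emoji cost two to four bytes each, so character counts alone can underestimate the stored size.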

We can try the following ways to reduce the size of a column that holds string values.

  1. We can serialize the values to JSON and compress the result, for example zip(JSON.stringify(column.values)), where zip() stands for whichever compression function is available.

  2. We can create a dictionary and insert all unique string type values into the dictionary. Then, replace the string values with indexes.

    This approach is useful if we have many repeated string values in a field. It will not help if someone wants to store a column of hashes, because hashes rarely repeat, but they can use the GridFS API.

  3. We can also split the column into various chunks and save these chunks in some other documents linked to the main document.
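The three approaches above can be sketched in a few lines. This is only an illustration under stated assumptions: zlib stands in for the unspecified zip() function, all helper names are invented for the example, and the parent_id and n fields are one possible way to link chunk documents to the main document.

```python
import json
import zlib


def compress_column(values):
    """Approach 1: serialize the values to JSON, then compress the bytes."""
    return zlib.compress(json.dumps(values).encode("utf-8"))


def decompress_column(blob):
    """Reverse of compress_column."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def dictionary_encode(values):
    """Approach 2: keep each unique string once and store indexes instead."""
    dictionary, positions, indexes = [], {}, []
    for value in values:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indexes.append(positions[value])
    return dictionary, indexes


def dictionary_decode(dictionary, indexes):
    """Reverse of dictionary_encode."""
    return [dictionary[i] for i in indexes]


def split_column(values, parent_id, rows_per_chunk):
    """Approach 3: spread the values over documents linked to the parent."""
    return [
        {"parent_id": parent_id, "n": n, "values": values[start:start + rows_per_chunk]}
        for n, start in enumerate(range(0, len(values), rows_per_chunk))
    ]
```

With a column such as ["red", "green", "red", "blue"] repeated many times, dictionary_encode returns the dictionary ["red", "green", "blue"] plus a list of small integers, and compress_column shrinks the JSON text considerably because of the repetition. The compressed bytes would be stored in a binary (BinData) field.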

There is a reference article demonstrating all these approaches.

Mehvish Ashiq is a former Java Programmer and a Data Science enthusiast who leverages her expertise to help others to learn and grow by creating interesting, useful, and reader-friendly content in Computer Programming, Data Science, and Technology.
