Project Nested Fields in MongoDB

Mehvish Ashiq Nov 05, 2023
  1. Understanding Nested Fields in MongoDB
  2. Use the $project Aggregation Stage to Project Nested Fields in MongoDB
  3. Use the $unset Aggregation Stage to Get Nested Fields Excluding the Specified Ones in MongoDB
  4. Use a forEach() Loop to Get Nested Fields in MongoDB
  5. Use the mapReduce() Method to Project Nested Fields in MongoDB
  6. Use the $addFields Aggregation Stage to Project Nested Fields in MongoDB
  7. Use the Dot Notation to Project Nested Fields in MongoDB
  8. Use the $map and the $mergeObjects Aggregation Pipeline to Project Nested Fields in MongoDB
  9. Conclusion
MongoDB, a NoSQL database, offers powerful features for handling complex data structures. One common task is projecting nested fields, which involves extracting specific elements from nested documents.

In this guide, we will learn how to project nested fields while querying data in MongoDB using the $project, $unset, and $addFields aggregation stages, a forEach() loop, the mapReduce() method, dot notation, and the $map and $mergeObjects aggregation operators. Each method comes with an explanation and example code.

Understanding Nested Fields in MongoDB

Nested fields in MongoDB are fields that live inside embedded documents, that is, documents stored as values within other documents. Embedded documents can contain their own fields, including further embedded documents and arrays, forming a hierarchical structure.

Consider the following example:

{
   "_id": 1,
   "name": "John Doe",
   "address": {
      "street": "123 Main St",
      "city": "New York",
      "country": "USA"
   }
}

In this document, the address field is nested, containing subfields like street, city, and country.
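As a rough illustration of what a dotted path such as address.city means, the plain JavaScript sketch below walks a path one key at a time. The getByPath helper is hypothetical and only mimics the idea; it is not part of MongoDB or its shell.

```javascript
// Hypothetical helper: follow a dotted path such as "address.city" key by key.
function getByPath(doc, path) {
  return path.split(".").reduce(
    // Stop descending as soon as an intermediate value is null or undefined.
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

const person = {
  _id: 1,
  name: "John Doe",
  address: { street: "123 Main St", city: "New York", country: "USA" },
};

const city = getByPath(person, "address.city");          // nested field that exists
const missing = getByPath(person, "address.zip.code");   // path that does not exist
console.log(city, missing);
```

A path that runs past a missing field simply yields undefined here, which loosely mirrors how MongoDB treats a missing field rather than raising an error.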

In MongoDB, we can retrieve all documents using the find() method, but what if we only want access to specific nested fields? This is where we use projection.

We can project nested fields in various ways. Here, we will learn about the following solutions to project nested fields:

  1. Use the $project aggregation stage
  2. Use the $unset aggregation stage
  3. Use the forEach() loop
  4. Use the mapReduce() function
  5. Use the $addFields aggregation stage
  6. Use the dot notation
  7. Use the $map and $mergeObjects aggregation pipeline

To learn the above approaches, let’s create a collection named nested containing one document. You can run the query given below to follow along.

Example Code:

// MongoDB version 5.0.8

db.nested.insertOne(
    {
        "name": {
            "first_name": "Mehvish",
            "last_name": "Ashiq",
         },
         "contact": {
            "phone":{"type": "manager", "number": "123456"},
            "email":{ "type": "office", "mail": "delfstack@example.com"}
         },
         "country_name" : "Australien",
         "posting_locations" : [
             {
                 "city_id" : 19398,
                 "city_name" : "Bondi Beach (Sydney)"
             },
             {
                  "city_id" : 31101,
                  "city_name" : "Rushcutters Bay (Sydney)"
             },
             {
                  "city_id" : 31022,
                  "city_name" : "Wolly Creek (Sydney)"
             }
          ],
          "regions" : {
              "region_id" : 796,
              "region_name" : "Australien: New South Wales (Sydney)"
          }
    }
);

Use the db.nested.find().pretty(); command on the Mongo shell to see the inserted data.

Use the $project Aggregation Stage to Project Nested Fields in MongoDB

The $project stage is a fundamental component of the MongoDB Aggregation Pipeline. It allows for the reshaping and transformation of documents, enabling precise control over which fields are included or excluded from the output.

Syntax:

db.collection.aggregate([
   {
      $project: {
         field1: <expression>,
         field2: <expression>,
         ...
      }
   }
])
  • db.collection.aggregate([...]): Initiates the aggregation pipeline on a specific collection.
  • $project: Specifies the $project stage.
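  • field1: <expression>, field2: <expression>, …: Define the fields of the output documents. The <expression> can be 1 or true to include an existing field, 0 or false to exclude it, a field path such as "$outer.inner", or any other aggregation expression that computes a value.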

When dealing with nested documents, it’s common to need specific fields from within those nested structures. The $project stage excels at this task.

Example Code:

// MongoDB version 5.0.8

var current_location = "posting_locations";
var project = {};
project["id"] = "$"+current_location+".city_id";
project["name"] = "$"+current_location+".city_name";
project["regions"] = 1;

var find = {};
find[current_location] = {"$exists":true};

db.nested.aggregate([
    { $match : find },
    { $project : project }
]).pretty()

Output:

{
  _id: ObjectId("..."),
  regions: {
    region_id: 796,
    region_name: 'Australien: New South Wales (Sydney)'
  },
  id: [
    19398,
    31101,
    31022
  ],
  name: [
    'Bondi Beach (Sydney)',
    'Rushcutters Bay (Sydney)',
    'Wolly Creek (Sydney)'
  ]
}

Here, we save the name of the first-level field, posting_locations, in a variable called current_location.

Then, we use that variable to build the field paths "$posting_locations.city_id" and "$posting_locations.city_name" and store them in the project object, using bracket notation to create its id and name properties. Because posting_locations is an array, each path resolves to an array holding that field's value from every element. Setting project["regions"] to 1 includes the regions field as it is.

Next, we build another object named find, which uses $exists to match only the documents that contain a posting_locations field. In the aggregate() method, the $match stage filters the documents with find, and the $project stage selects the fields to output, whether nested or at the first level.

If we only want to project the specified nested fields without any filter query, we can drop the $match stage, as in the following solution.

Example Code:

// MongoDB version 5.0.8

var current_location = "posting_locations";
db.nested.aggregate([
    {
        $project: {
            "_id": 0,
            "city_id": "$" + current_location + ".city_id",
            "city_name": "$" + current_location + ".city_name",
            "regions": 1
        }
    }
]).pretty();

Output:

{
  regions: {
    region_id: 796,
    region_name: 'Australien: New South Wales (Sydney)'
  },
  city_id: [
    19398,
    31101,
    31022
  ],
  city_name: [
    'Bondi Beach (Sydney)',
    'Rushcutters Bay (Sydney)',
    'Wolly Creek (Sydney)'
  ]
}
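Both outputs above contain arrays for the city fields because the field path passes through the posting_locations array. The plain JavaScript sketch below imitates that behavior under simplified assumptions; resolvePath is a hypothetical helper, not a MongoDB function, and unlike the server it does not skip elements that lack the field.

```javascript
// Hypothetical emulation of resolving a field path such as
// "$posting_locations.city_id": when the walk meets an array,
// the remaining keys are applied to every element of that array.
function resolvePath(value, keys) {
  if (keys.length === 0) return value;
  if (Array.isArray(value)) {
    return value.map((item) => resolvePath(item, keys));
  }
  if (value === null || typeof value !== "object") return undefined;
  return resolvePath(value[keys[0]], keys.slice(1));
}

const nestedDoc = {
  posting_locations: [
    { city_id: 19398, city_name: "Bondi Beach (Sydney)" },
    { city_id: 31101, city_name: "Rushcutters Bay (Sydney)" },
    { city_id: 31022, city_name: "Wolly Creek (Sydney)" },
  ],
  regions: { region_id: 796 },
};

const cityIds = resolvePath(nestedDoc, "posting_locations.city_id".split("."));
const regionId = resolvePath(nestedDoc, "regions.region_id".split("."));
console.log(cityIds, regionId);
```

A path through an embedded document yields a single value, while a path through an array of documents yields one value per element.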

Use the $unset Aggregation Stage to Get Nested Fields Excluding the Specified Ones in MongoDB

The $unset stage is a crucial component of MongoDB’s aggregation pipeline. It allows you to remove specific fields from documents, effectively excluding them from the final output.

This can be particularly useful when dealing with complex nested documents.

Example Code:

// MongoDB version 5.0.8

db.nested.aggregate([
    { $unset: ["posting_locations.city_id", "contact", "regions", "name", "_id"] }
]).pretty()

Output:

{
  country_name: 'Australien',
  posting_locations: [
    {
      city_name: 'Bondi Beach (Sydney)'
    },
    {
      city_name: 'Rushcutters Bay (Sydney)'
    },
    {
      city_name: 'Wolly Creek (Sydney)'
    }
  ]
}

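Here, we use the $unset aggregation stage, which removes the specified field or array of fields from every document that passes through the pipeline.

Remember that we use dot notation to reach fields inside embedded documents or arrays of documents; "posting_locations.city_id" removes city_id from every element of the posting_locations array. The stage does nothing for a field that does not exist in a document.

Do not confuse this stage with the $unset update operator used in updateOne() and updateMany(). When the update operator is combined with the positional $ operator to match an array element, it replaces the matching element with null instead of removing it from the array, which keeps the array size and element positions consistent.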
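To make the effect of removing dotted paths more concrete, here is a minimal plain JavaScript sketch that imitates it on an in-memory object. The unsetPath and unsetFields helpers are hypothetical names introduced for illustration; they only approximate the stage and are not MongoDB APIs.

```javascript
// Hypothetical emulation of removing a dotted path from a document.
// Returns a copy; the input object is left untouched.
function unsetPath(value, keys) {
  // Through an array, apply the same path to every element.
  if (Array.isArray(value)) return value.map((item) => unsetPath(item, keys));
  if (value === null || typeof value !== "object") return value;
  const copy = { ...value };
  const [head, ...rest] = keys;
  if (!(head in copy)) return copy; // missing field: nothing to remove
  if (rest.length === 0) delete copy[head];
  else copy[head] = unsetPath(copy[head], rest);
  return copy;
}

function unsetFields(doc, paths) {
  return paths.reduce((acc, path) => unsetPath(acc, path.split(".")), doc);
}

const original = {
  country_name: "Australien",
  posting_locations: [
    { city_id: 19398, city_name: "Bondi Beach (Sydney)" },
    { city_id: 31101, city_name: "Rushcutters Bay (Sydney)" },
  ],
  regions: { region_id: 796 },
};

const trimmed = unsetFields(original, [
  "posting_locations.city_id",
  "regions",
  "no_such_field",
]);
console.log(JSON.stringify(trimmed, null, 2));
```

The array keeps its length and order; only the named field disappears from each element, and the path that matches nothing is ignored.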

Use a forEach() Loop to Get Nested Fields in MongoDB

The forEach() loop in MongoDB is a JavaScript method that allows you to iterate over elements in an array or documents in a cursor. This can be particularly useful when dealing with nested arrays or documents.

Syntax For Arrays:

array.forEach(function(currentValue, index, arr), thisValue)
  • currentValue: The current element being processed in the array.
  • index (Optional): The index of the current element being processed in the array.
  • arr (Optional): The array that forEach() is being applied to.
  • thisValue (Optional): A value to use as this when executing the callback function.

Syntax For Cursors (used in queries):

cursor.forEach(function(doc))
  • doc: The current document being processed in the cursor.

Note: The callback function used with forEach() can take up to three arguments for arrays (currentValue, index, and arr), but for cursors, it typically takes a single argument (doc) since cursors represent a stream of documents.

Example Code:

// MongoDB version 5.0.8

var bulk = db.newcollection.initializeUnorderedBulkOp(),
counter = 0;

db.nested.find().forEach(function(doc) {
    var document = {};
    document["name"] = doc.name.first_name + " " + doc.name.last_name;
    document["phone"] = doc.contact.phone.number;
    document["mail"] = doc.contact.email.mail;
    bulk.insert(document);
    counter++;
    if (counter % 1000 == 0) {
        bulk.execute();
        bulk = db.newcollection.initializeUnorderedBulkOp();
    }
});

if (counter % 1000 != 0) { bulk.execute(); }

Output:

{
  acknowledged: true,
  insertedCount: 1,
  insertedIds: {
    '0': ObjectId("...")
  },
  matchedCount: 0,
  modifiedCount: 0,
  deletedCount: 0,
  upsertedCount: 0,
  upsertedIds: {}
}

Next, execute the command below on your Mongo shell to see the projected fields.

// MongoDB version 5.0.8

db.newcollection.find().pretty();

Output:

{
  _id: ObjectId("..."),
  name: 'Mehvish Ashiq',
  phone: '123456',
  mail: 'delfstack@example.com'
}

To understand this example code, suppose we want to grab certain nested fields and insert them into a new collection. Inserting the transformed documents one at a time costs one round trip to the server per document, which becomes slow when the nested collection is large.

We can avoid this slow insert performance by using the Bulk API. An unordered bulk operation, created with initializeUnorderedBulkOp(), queues the inserts on the client and sends them to the server in groups when execute() is called; the result of execute() reports how many documents were inserted.

So, we use the Bulk API to insert the desired data structure into the newcollection collection, where each new document is built inside the forEach() loop of the nested collection’s cursor. To create the new properties, we use bracket notation.

For this code, we assume a large amount of data, so we send the operations to the server in batches of 1000. The final if statement executes whatever is left over when the number of documents is not a multiple of 1000.

As a result, we get good performance because we make one request to the server for every 1000 inserts instead of one request per insert.
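The batching logic can be tried without a database. The sketch below is plain JavaScript under stated assumptions: insertInBatches is a hypothetical helper, the flush callback stands in for bulk.execute(), and the 2,500 source documents are made up for the demonstration.

```javascript
// Hypothetical stand-in for the bulk writer: `flush` receives one batch per call.
function insertInBatches(docs, batchSize, flush) {
  let batch = [];
  for (const doc of docs) {
    // Same transformation as in the forEach() example above.
    batch.push({
      name: doc.name.first_name + " " + doc.name.last_name,
      phone: doc.contact.phone.number,
      mail: doc.contact.email.mail,
    });
    if (batch.length === batchSize) {
      flush(batch);
      batch = [];
    }
  }
  if (batch.length > 0) flush(batch); // send the final partial batch
}

// Made-up source documents shaped like the one in the nested collection.
const source = Array.from({ length: 2500 }, (_, i) => ({
  name: { first_name: "User", last_name: String(i) },
  contact: {
    phone: { number: "123456" },
    email: { mail: `user${i}@example.com` },
  },
}));

const batchSizes = [];
insertInBatches(source, 1000, (batch) => batchSizes.push(batch.length));
console.log(batchSizes);
```

With 2,500 documents and a batch size of 1000, the callback runs three times: two full batches and one partial batch of 500.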

Use the mapReduce() Method to Project Nested Fields in MongoDB

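The mapReduce() method in MongoDB allows for flexible data processing by applying JavaScript functions to collections. It performs two primary steps: mapping and reducing. Note that map-reduce is deprecated starting in MongoDB 5.0, and an aggregation pipeline is the recommended alternative for new code; the method still runs on the version used here.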

The mapping step processes each document, while the reducing step aggregates and summarizes the mapped data.

Syntax:

db.collection.mapReduce(
   <mapFunction>,
   <reduceFunction>,
   {
      out: <output>,
      query: <query>,
      sort: <sort>,
      limit: <limit>,
      finalize: <finalize>,
      scope: <scope>,
      jsMode: <boolean>,
      verbose: <boolean>,
      bypassDocumentValidation: <boolean>
   }
)

Here’s an explanation of the parameters:

  • <mapFunction> (function): This is a JavaScript function that processes each document and emits key-value pairs. It takes the form function() {...}.
  • <reduceFunction> (function): This function aggregates and processes the mapped values. It takes the form function(key, values) {...}. MongoDB does not call it for a key that has only a single value, so for such keys the emitted value is passed through unchanged.
  • out (string or document): Specifies where to output the results. It can be either a collection name (string) or a document that defines the output options. For example: { out: "output_collection" }.
  • query (document): Specifies the query filter to select documents for processing. This parameter is optional.
  • sort (document): Specifies the order in which documents are processed. This parameter is optional.
  • limit (number): Limits the number of documents processed by mapReduce(). This parameter is optional.
  • finalize (function): An optional JavaScript function that can be used to further process the result values after the reduce step.
  • scope (document): A document that defines the global variables accessible in the map and reduce functions.
  • jsMode (Boolean): Specifies whether the intermediate data is converted to BSON between the map and reduce steps. When set to true, the data stays in JavaScript objects, which can be faster for small result sets. The default is false.
  • verbose (Boolean): When set to true, the result includes timing information about the operation.
  • bypassDocumentValidation (Boolean): When set to true, the map-reduce operation bypasses document validation during the operation.

Keep in mind that both the mapFunction and reduceFunction should be valid JavaScript functions. The mapFunction emits key-value pairs, and the reduceFunction aggregates the values for a specific key.

Remember to replace placeholders like <mapFunction>, <reduceFunction>, etc., with your actual functions or values when using mapReduce() in MongoDB.

When dealing with nested documents, mapReduce() provides a powerful mechanism to selectively project specific fields. This can be especially useful when you need to extract and process specific elements from deeply nested data structures.

Example Code:

// MongoDB version 5.0.8

function map() {
    for(var i in this.posting_locations) {
         emit({
             "country_id" : this.country_id,
             "city_id" : this.posting_locations[i].city_id,
             "region_id" : this.regions.region_id
         },1);
    }
}

function reduce(id,docs) {
      return Array.sum(docs);
}

db.nested.mapReduce(map,reduce,{ out : "map_reduce_output" } )

Now, run the following query to see the output.

// MongoDB version 5.0.8
db.map_reduce_output.find().pretty();

Output:

{
        "_id" : {
                "country_id" : undefined,
                "city_id" : 19398,
                "region_id" : 796
        },
        "value" : 1
}
{
        "_id" : {
                "country_id" : undefined,
                "city_id" : 31022,
                "region_id" : 796
        },
        "value" : 1
}
{
        "_id" : {
                "country_id" : undefined,
                "city_id" : 31101,
                "region_id" : 796
        },
        "value" : 1
}

For this example code, we use the mapReduce() function to perform map-reduce on all documents of the nested collection. For that, we follow the three-step process briefly explained below.

  • Define the map() function to process every input document. In this function, the this keyword refers to the current document being processed by the map-reduce operation, and each call to emit(key, value) associates a value with a key. Our key is an embedded document built from nested fields. The sample document has no country_id field (only country_name), which is why country_id appears as undefined in the output.
  • Define the corresponding reduce() function, which is where the aggregation of data takes place. It takes two arguments (a key and the array of values emitted for that key); in our code example, they are named id and docs.

    Remember that the elements of docs are the values passed to emit() in the map() function. At this step, the reduce() function reduces the docs array to the sum of its elements. Since every key here is emitted only once, each value in the output is 1.

  • Finally, we perform map-reduce on all the documents in the nested collection by using the map() and reduce() functions. We use out to save the output in the specified collection, which is map_reduce_output in this case.
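The three steps above can be sketched in plain JavaScript to show how emitted pairs are grouped by key before reduce runs. This is a hypothetical in-memory emulation (runMapReduce is not a MongoDB function), and emit is passed to the map function as a parameter here, whereas inside MongoDB it is available globally.

```javascript
// Hypothetical in-memory emulation of map-reduce; not the server implementation.
function runMapReduce(docs, mapFn, reduceFn) {
  const groups = new Map(); // serialized key -> { key, values }
  const emit = (key, value) => {
    const id = JSON.stringify(key);
    if (!groups.has(id)) groups.set(id, { key, values: [] });
    groups.get(id).values.push(value);
  };
  // Call map once per document, with `this` bound to the document.
  for (const doc of docs) mapFn.call(doc, emit);
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    // reduce is skipped for keys that received only one value.
    value: values.length === 1 ? values[0] : reduceFn(key, values),
  }));
}

const inputDocs = [
  {
    posting_locations: [{ city_id: 19398 }, { city_id: 31101 }, { city_id: 31022 }],
    regions: { region_id: 796 },
  },
];

const results = runMapReduce(
  inputDocs,
  function (emit) {
    for (const location of this.posting_locations) {
      emit({ city_id: location.city_id, region_id: this.regions.region_id }, 1);
    }
  },
  (key, values) => values.reduce((sum, v) => sum + v, 0)
);
console.log(results);
```

Each distinct combination of city_id and region_id becomes one output entry, and its value is the number of times that key was emitted.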

Use the $addFields Aggregation Stage to Project Nested Fields in MongoDB

The $addFields stage in MongoDB’s aggregation framework allows for the addition of new fields to documents in the result set. It is particularly valuable when you want to augment existing documents with additional information or extract nested fields.

The $addFields stage is used within an aggregation pipeline. Its syntax is as follows:

{
   $addFields: {
      newField1: expression1,
      newField2: expression2,
      // ...
   }
}
  • newField: The name of the new field to be added.
  • expression: An expression that defines the value of the new field. This can be a direct value, a computation, or a reference to an existing field.

Consider a collection of documents representing users:

{
   "_id": 1,
   "name": "John Doe",
   "address": {
      "street": "123 Main St",
      "city": "New York",
      "country": "USA"
   }
}

Here, the address field is nested, containing subfields like street, city, and country.

Let’s explore how to use the $addFields stage to extract and project nested fields from our example documents.

db.users.aggregate([
   {
      $addFields: {
         "street": "$address.street",
         "city": "$address.city",
         "country": "$address.country"
      }
   }
])

The db.users.aggregate([...]) call runs an aggregation pipeline on the users collection, and the $addFields stage adds new fields to each document while keeping all of its existing fields.

Here, we add three new top-level fields (street, city, and country) to each document. Their values are copied from the nested address document, which itself remains in the output.

Output:

{
	"_id": 1,
	"address": {
	  "city": "New York",
	  "country": "USA",
	  "street": "123 Main St"
	},
	"city": "New York",
	"country": "USA",
	"name": "John Doe",
	"street": "123 Main St"
}
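The difference between adding fields and projecting them can be shown with plain JavaScript objects. The sketch below is only an analogy: object spread plays the role of $addFields, and building a fresh object plays the role of $project; neither line calls MongoDB.

```javascript
const user = {
  _id: 1,
  name: "John Doe",
  address: { street: "123 Main St", city: "New York", country: "USA" },
};

// $addFields analogue: keep every existing field and add the new ones.
const withAddedFields = {
  ...user,
  street: user.address.street,
  city: user.address.city,
  country: user.address.country,
};

// $project analogue: keep only _id and the fields that are listed.
const projected = {
  _id: user._id,
  street: user.address.street,
  city: user.address.city,
};

console.log(Object.keys(withAddedFields));
console.log(Object.keys(projected));
```

The first result still carries name and address next to the new fields, while the second contains nothing beyond what was listed.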

Use the Dot Notation to Project Nested Fields in MongoDB

Dot notation is a powerful and intuitive approach for projecting specific fields within nested documents. It involves using dots (.) to navigate through the document’s structure.

Syntax:

{ "outerField.innerField": 1, "outerField.anotherInnerField": 1, ... }

Let’s break down the components:

  • "outerField.innerField": 1: This notation indicates that we want to include the innerField from the outerField nested document. The value 1 is used to include the field.
  • "outerField.anotherInnerField": 1: Similarly, this includes the anotherInnerField from the outerField nested document.

Let’s delve into practical examples to illustrate the application of dot notation for nested field projection in MongoDB.

Example 1: Basic Field Projection

Consider a collection of user documents with nested address fields:

db.users.insertMany([
   {
      "_id": 1,
      "name": "John Doe",
      "address": {
         "street": "123 Main St",
         "city": "New York",
         "country": "USA"
      }
   },
   {
      "_id": 2,
      "name": "Jane Doe",
      "address": {
         "street": "456 Oak Ave",
         "city": "Los Angeles",
         "country": "USA"
      }
   }
])

Now, let’s use dot notation to project specific fields:

db.users.find({}, { "address.street": 1, "address.city": 1 })

In this example, the query instructs MongoDB to include the street and city fields from the address nested document. The result will be:

{ "_id": 1, "address": { "street": "123 Main St", "city": "New York" } }
{ "_id": 2, "address": { "street": "456 Oak Ave", "city": "Los Angeles" } }

Example 2: Projecting Multiple Nested Fields

You can project multiple nested fields in a single query:

db.users.find({}, { "address.street": 1, "address.city": 1, "address.country": 1 })

The result will include the street, city, and country fields from the address nested document.

{
    "_id": 1,
    "address": {
      "city": "New York",
      "country": "USA",
      "street": "123 Main St"
    }
},
{
    "_id": 2,
    "address": {
      "city": "Los Angeles",
      "country": "USA",
      "street": "456 Oak Ave"
    }
}

Example 3: Excluding Specific Fields

Dot notation can also be used to exclude specific fields:

db.users.find({}, { "address.country": 0 })

In this query, the country field from the address nested document will be excluded.

{
    "_id": 1,
    "address": {
      "city": "New York",
      "street": "123 Main St"
    },
    "name": "John Doe"
},
{
    "_id": 2,
    "address": {
      "city": "Los Angeles",
      "street": "456 Oak Ave"
    },
    "name": "Jane Doe"
}
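To see why an inclusion projection keeps the nested shape of the document, the plain JavaScript sketch below rebuilds the result of Example 1 in memory. The projectInclude helper is hypothetical and simplified: it keeps _id by default, as find() does, but it does not handle arrays.

```javascript
// Hypothetical emulation of an inclusion projection with dotted paths.
function projectInclude(doc, paths) {
  // _id is included by default unless it is explicitly excluded.
  const result = "_id" in doc ? { _id: doc._id } : {};
  for (const path of paths) {
    const keys = path.split(".");
    let source = doc;
    let target = result;
    for (let i = 0; i < keys.length; i++) {
      const key = keys[i];
      if (source === null || typeof source !== "object" || !(key in source)) break;
      if (i === keys.length - 1) {
        target[key] = source[key]; // copy the leaf value
      } else {
        if (!(key in target)) target[key] = {}; // recreate the parent document
        target = target[key];
        source = source[key];
      }
    }
  }
  return result;
}

const users = [
  { _id: 1, name: "John Doe", address: { street: "123 Main St", city: "New York", country: "USA" } },
  { _id: 2, name: "Jane Doe", address: { street: "456 Oak Ave", city: "Los Angeles", country: "USA" } },
];

const projectedUsers = users.map((u) => projectInclude(u, ["address.street", "address.city"]));
console.log(JSON.stringify(projectedUsers));
```

The selected subfields stay inside an address document in the result instead of being lifted to the top level, which is the main difference from the $project and $addFields examples that use "$address.street" as a value.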

Use the $map and the $mergeObjects Aggregation Pipeline to Project Nested Fields in MongoDB

The $map operator is a fundamental aggregation function in MongoDB. It applies an expression to each element in an array and returns an array with the modified elements.

The $mergeObjects operator combines multiple objects into a single document. This is particularly useful when you want to merge fields from different documents or when dealing with nested documents.

Scenario: Extracting Nested Fields

Consider the following example, where each document contains information about a person, including their name and address:

{
   "_id": 1,
   "name": "John Doe",
   "address": {
      "street": "123 Main St",
      "city": "New York",
      "country": "USA"
   }
}

We want to extract the street and city fields from the address nested document.

Method 1: Using $map and $mergeObjects

db.people.aggregate([
   {
      $project: {
         address: {
            $mergeObjects: [
               {
                  street: { $map: { input: [{}], as: 'el', in: '$address.street' } }
               },
               {
                  city: { $map: { input: [{}], as: 'el', in: '$address.city' } }
               }
            ]
         }
      }
   }
])

First, the $project stage is used to shape the output document.

Then, $mergeObjects combines multiple objects. Here, we create two separate objects for street and city.

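Inside each object, $map iterates over an array with a single empty object {} and evaluates the $address.street or $address.city expression once for that element. Because $map always returns an array, each extracted value comes back wrapped in a single-element array, as the output shows.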

Finally, the result is placed in the address field.

Output:

{
	"_id": 1,
	"address": {
	  "city": [
		"New York"
	  ],
	  "street": [
		"123 Main St"
	  ]
	}
}

Method 2: Combining $map and $mergeObjects for Multiple Documents

db.people.aggregate([
   {
      $project: {
         addresses: {
            $map: {
               input: [{}],
               as: 'el',
               in: {
                  $mergeObjects: [
                     {
                        street: '$$el.address.street',
                        city: '$$el.address.city'
                     }
                  ]
               }
            }
         }
      }
   }
])

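Here, we use $map to iterate over an array with a single empty object {}, and $mergeObjects builds one object per element. Note that $$el refers to the current array element, which is the empty object, so $$el.address.street and $$el.address.city resolve to missing fields and are left out of the result. To read the values from the document itself, reference them as $address.street and $address.city instead.

The result is placed in an array called addresses, which for this pipeline contains a single empty document.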

Output:

{
	"_id": 1,
	"addresses": [
	  {}
	]
}

Method 3: Handling Arrays of Nested Documents

db.people.aggregate([
   {
      $project: {
         addresses: {
            $map: {
               input: '$addresses',
               as: 'el',
               in: {
                  $mergeObjects: [
                     {
                        street: '$$el.address.street',
                        city: '$$el.address.city'
                     }
                  ]
               }
            }
         }
      }
   }
])

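This pipeline assumes that each document has an addresses array whose elements each contain an address nested document.

We use $map to iterate over each element in the addresses array. Inside the $map, $mergeObjects combines the street and city fields from the address nested document of each element. The sample document shown earlier has no addresses field, so $map receives a missing input and returns null, which is what the output shows.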

Output:

{
    "_id": 1,
    "addresses": null
}
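The idea behind Method 3 is easier to follow on a document that actually has an addresses array. The plain JavaScript sketch below is an analogy under stated assumptions: the document and its label field are made up, Array.prototype.map stands in for $map, and Object.assign stands in for $mergeObjects.

```javascript
// Made-up document that does have an `addresses` array, unlike the sample above.
const personWithAddresses = {
  _id: 1,
  addresses: [
    { label: "home", address: { street: "123 Main St", city: "New York", country: "USA" } },
    { label: "work", address: { street: "456 Oak Ave", city: "Los Angeles", country: "USA" } },
  ],
};

// map() visits each array element (like $map with as: 'el');
// Object.assign merges several objects into one (like $mergeObjects).
const projectedAddresses = personWithAddresses.addresses.map((el) =>
  Object.assign(
    {},
    { label: el.label },
    { street: el.address.street, city: el.address.city }
  )
);

console.log(JSON.stringify(projectedAddresses, null, 2));
```

Each element of the output array holds only the fields picked from the corresponding input element, and the country field is dropped.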

Conclusion

In this guide, we’ve explored seven effective methods for projecting nested fields in MongoDB, a versatile NoSQL database. These methods equip you to handle complex data structures efficiently.

  1. $project Aggregation Stage: Reshapes and transforms documents, enabling precise control over included/excluded fields.
  2. $unset Aggregation Stage: Removes specific fields from documents, useful for complex nested structures.
  3. forEach() Loop: JavaScript method for iterating over elements in an array or cursor, handy for nested arrays/documents.
  4. mapReduce() Method: Applies JavaScript functions to collections, involving mapping and reducing steps.
  5. $addFields Aggregation Stage: Adds new fields to documents, augmenting them with extra information or extracting nested fields.
  6. Dot Notation: Intuitively navigates through nested documents, enabling precise field selection.
  7. $map and $mergeObjects Pipeline: Employs operators to efficiently project specific fields from complex nested structures.

By incorporating these techniques, you’ll adeptly process and transform data to suit your application’s needs.

Mehvish Ashiq is a former Java Programmer and a Data Science enthusiast who leverages her expertise to help others to learn and grow by creating interesting, useful, and reader-friendly content in Computer Programming, Data Science, and Technology.
