How to Insert Pandas Data Frame Into MongoDB Using PyMongo

Hira Arif Feb 02, 2024
MongoDB is an open-source document-oriented database that stores and queries data as flexible, JSON-like documents. It is schemaless: documents in the same collection need not share a structure, and queries are themselves written as JSON-like documents rather than SQL.

MongoDB works well as a backend database for applications that need fast access to changing data and whose deployments vary over time, such as web apps and APIs.

A Pandas data frame is a Python data structure used for data analysis and manipulation. It organizes data in rows and columns, like an Excel sheet or a database table. This tutorial explains how to insert Pandas data frames into MongoDB using PyMongo.

Insert Pandas Data Frame Into MongoDB Using PyMongo

To insert a Pandas data frame into MongoDB, we need the following Python libraries.

  1. pandas

    PS C:\> pip install pandas
    
  2. json

    This module is part of the Python standard library, so it does not need to be installed with pip.
    
  3. pymongo

    PS C:\> pip install pymongo
    

Let’s create a client by running the below code.

Example Code (saved in demo.py):

from pymongo import MongoClient


def create_connection():
    connection = None
    try:
        connection = MongoClient("mongodb://localhost:27017/")
        print("Connection made!!")
    except Exception as e:
        print(e)
    return connection


client = create_connection()

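From the Python package pymongo, we import the class MongoClient. The function create_connection() above uses that class to create a client for the MongoDB server running locally on port 27017. Note that MongoClient connects lazily: the constructor does not contact the server, so the message "Connection made!!" only confirms that the client object was created, and an unreachable server shows up as an error on the first real operation.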

The function create_connection() then returns the client object. Next, let’s run the code below, which creates a database named companyDB.

Example Code (saved in demo.py):

def create_database(client, db_name):
    db = None
    try:
        db = client[db_name]
        print(f"Database {db_name} created!!")
    except Exception as e:
        print(e)
    return db


db_name = "companyDB"  # name of your database
db = create_database(client, db_name)

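The function create_database() takes client and db_name as arguments and returns a handle to the database with that name (companyDB here), which we store in the variable db. MongoDB creates the database lazily, so it only appears on the server once the first document is inserted. In case of any error, this function prints the exception without stopping the program.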

Now, let’s run the below code to create a collection.

Example Code (saved in demo.py):

def create_collection(db, collection_name):
    collection = None
    try:
        collection = db[collection_name]
        print(f"Collection {collection_name} created!!!")
    except Exception as e:
        print(e)
    return collection


collection_name = "startups"  # name of your collection
collection = create_collection(db, collection_name)

The function create_collection() above returns a handle to the collection with the specified name in the provided database. This is the collection into which we insert the Pandas data frame.

The code below inserts the Pandas data frame into that collection.
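As with databases, MongoDB normally creates a collection only when the first document is written to it, so the "created" message above does not guarantee that the collection exists on the server yet. A minimal sketch of how this could be checked is shown here; the helper name collection_exists() is hypothetical, and it expects a database handle such as the db object from this tutorial.

```python
def collection_exists(db, collection_name):
    """Return True if collection_name already exists in db on the server.

    Hypothetical helper for illustration; db is a PyMongo database handle.
    """
    # list_collection_names() queries the server, so it reflects the real
    # state of the database rather than the handles held by the client.
    return collection_name in db.list_collection_names()
```

With the objects from this tutorial, collection_exists(db, "startups") would be expected to return False on a fresh database before any insert and True afterwards.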

Example Code (saved in demo.py):

import json
import pandas as pd


def insert_records(collection, records):
    rows = None
    try:
        rows = collection.insert_many(records)
        print(f"{len(rows.inserted_ids)} records added successfully")
    except Exception as e:
        print(e)
    return rows


df_file = "50_Startups.csv"
df = pd.read_csv(df_file)

# Convert each row to a dict: transpose, serialize to JSON, then parse it back
records = list(json.loads(df.T.to_json()).values())
insert_records(collection, records)

To insert the Pandas data frame into MongoDB, we first read the CSV file using the pandas library. PyMongo inserts Python dictionaries, one per document, so we serialize the transposed data frame with the to_json() function and parse the result back with json.loads() to get one dictionary per row.

The function insert_records() takes the collection and the converted data frame records as arguments and inserts the records into the collection. We use the insert_many() function to insert multiple records at once.
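The JSON round trip is one way to get a list of row dictionaries; pandas can also produce that list directly with to_dict(orient="records"). The sketch below compares the two on a small made-up data frame. The column names name and profit are assumptions for illustration, since the columns of 50_Startups.csv are not shown in this tutorial.

```python
import json

import pandas as pd

# Hypothetical sample data standing in for the CSV file.
df = pd.DataFrame({"name": ["Alpha", "Beta"], "profit": [10.5, 20.0]})

# Approach used in this tutorial: transpose, serialize to JSON, parse back.
via_json = list(json.loads(df.T.to_json()).values())

# Alternative: ask pandas for one dict per row directly.
via_dict = df.to_dict(orient="records")

print(via_json)
print(via_dict)
```

One difference to keep in mind: to_json() writes missing values (NaN) as null, which becomes None after json.loads(), whereas to_dict() keeps them as floating-point NaN.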

Finally, now that we have inserted the data frame into the database, we close the connection by running the code below.

Example Code (saved in demo.py):

# Close Connection
def close_connection(client):
    if client:
        client.close()
        print("Connection closed!!")


close_connection(client)

The function close_connection() uses the client.close() function to close the connection if it exists.

Now, we run the Python file demo.py as:

PS C:\> python demo.py

Output (printed on console):

[Image: console output]

The data frame inserted into MongoDB is shown below.

[Image: database output]