How to Use MongoDB as File Storage in PHP

When it comes to building scalable storage for big files, MongoDB and its GridFS specification, which the MongoDB drivers implement, are a widely used solution. In this tutorial, you will learn how to use MongoDB as file storage in PHP.

GridFS lets you query any part of the files collection in the same way as any other query: MongoDB loads the data it needs into your working set (the set of data MongoDB requires within a given time frame to maintain optimal performance) by paging it into RAM.

Its read performance varies: it can be excellent for small files served directly from RAM and noticeably lower for large files. That is expected, as most computers do not have 600+ GB of RAM, yet a single mongod instance can still hold a 600+ GB partition of a single file.

An important thing to consider here is that the default chunk size is 255KB (256KB in older versions), which results in a massive number of documents for a 600GB file collection; however, you can change this setting in most drivers. GridFS uses the same locks as any other collection (database level in 2.2+ or global level pre-2.2).
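To get a feel for how many chunk documents a large file produces, the arithmetic can be sketched in PHP. This is only an illustration: it assumes the 255KB default chunk size that current drivers use and a 64-bit PHP build, and the function name `gridfsChunkCount` is made up for this example.

```php
<?php
// Hypothetical helper: number of chunk documents needed for one file.
function gridfsChunkCount(int $fileBytes, int $chunkBytes = 255 * 1024): int
{
    // Integer ceiling division, so a partial last chunk still counts.
    return intdiv($fileBytes + $chunkBytes - 1, $chunkBytes);
}

$sixHundredGb = 600 * 1024 ** 3; // 600GB expressed in bytes
echo gridfsChunkCount($sixHundredGb), PHP_EOL; // prints 2467238
```

Passing a larger chunk size as the second argument shows how quickly the document count drops when you raise the setting in your driver.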

Under these locks (database level in 2.2+ or global level pre-2.2), reads and writes can interfere with each other, so you need to understand this behavior to ensure a consistent read of a document that is being written.

Because GridFS is implemented in the driver, the server has no idea about GridFS, and there is no special resolution of GridFS data on the server side. Depending on your scenario specifics, such as traffic and the number of concurrent writes/reads, there is a possibility of contention, so read/write operations may have to wait on a lock in some cases. The following section shows how to implement GridFS with MongoDB in your PHP project.

Use GridFS From MongoDB as File Storage

Files that grow beyond the 16MB size limit of a BSON document require a more robust storage system. GridFS handles big files by dividing them into chunks and storing each chunk as a separate document instead of storing the file as a single document. Its default chunk size is 255KB, and it stores the chunks in one collection and the file metadata in another.

In some situations, storing large files in a MongoDB database is more efficient than using a system-level filesystem, for example when the filesystem limits the number of files in a directory. GridFS is also a good fit for accessing information from portions of large files without loading the whole file into memory.
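The reason a portion of a file can be read without loading the rest is that a byte range maps onto a small set of chunk numbers. The sketch below illustrates that mapping under the assumption of a 255KB chunk size; the helper name `chunksForByteRange` is hypothetical and not part of any driver.

```php
<?php
// Hypothetical helper: which chunk numbers cover a byte range of a stored file.
function chunksForByteRange(int $offset, int $length, int $chunkBytes = 255 * 1024): array
{
    if ($offset < 0 || $length < 1) {
        throw new InvalidArgumentException('Offset must be >= 0 and length must be >= 1.');
    }

    $first = intdiv($offset, $chunkBytes);               // chunk holding the first byte
    $last = intdiv($offset + $length - 1, $chunkBytes);  // chunk holding the last byte

    return range($first, $last);
}

// Reading 500,000 bytes starting at byte 1,000,000 touches only three chunks.
print_r(chunksForByteRange(1000000, 500000)); // chunk numbers 3, 4 and 5
```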

Furthermore, it helps keep files and their metadata synchronized and deployed across many systems: when you use geographically distributed replica sets (data distributed to several mongod instances), MongoDB distributes files and their metadata efficiently and automatically. You can store and retrieve GridFS files in two ways: with a MongoDB driver or with the mongofiles command-line tool.

GridFS works with two collections: the chunks collection and the files collection. For a bucket named fs, the chunks collection is fs.chunks and the files collection is fs.files.
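To make the two collections more concrete, here is a simplified in-memory model of what a driver writes for a single file. It is a sketch, not driver code: the function name `splitIntoGridFsDocuments` is invented, the hex string stands in for a real ObjectId, and real documents carry additional fields such as the upload date.

```php
<?php
// Hypothetical in-memory model of the documents stored for one file.
function splitIntoGridFsDocuments(string $filename, string $contents, int $chunkBytes = 255 * 1024): array
{
    $fileId = bin2hex(random_bytes(12)); // stand-in for a real ObjectId

    // One metadata document goes into the files collection.
    $fileDocument = [
        '_id' => $fileId,
        'filename' => $filename,
        'length' => strlen($contents),
        'chunkSize' => $chunkBytes,
    ];

    // One document per chunk goes into the chunks collection.
    $chunkDocuments = [];
    $pieces = $contents === '' ? [] : str_split($contents, $chunkBytes);
    foreach ($pieces as $n => $piece) {
        $chunkDocuments[] = ['files_id' => $fileId, 'n' => $n, 'data' => $piece];
    }

    return ['fs.files' => $fileDocument, 'fs.chunks' => $chunkDocuments];
}

// A tiny chunk size keeps the demonstration readable.
$result = splitIntoGridFsDocuments('notes.txt', 'abcdefghij', 4);
echo count($result['fs.chunks']), PHP_EOL; // prints 3
```

Each chunk document points back to its file through `files_id`, and `n` records its position, which is what allows the file to be reassembled in order.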

/*
Prerequisites

i) MongoDB 3.0 or higher
ii) PHP driver for MongoDB

[Remember: install both on the same server]

This example stores a video larger than 1GB on a MongoDB server.

The source code must live in its own directory, so create one with `mkdir src`.
To connect your scripts to the storage, create a settings file in that directory.
Your new directory will contain a `Settings.php` file with the following PHP contents:
*/

<?php
    # Remember: since PHP 7.0, `define` also accepts an array, so these could be a single constant
    define("USERNAME_MONGODB", "[your_username]");
    define("PASSWORD_MONGODB", "[your_password]");
    define("DATABASE_MONGODB", "[your_databaseName]");
    define("SERVERIP_MONGODB", "[your_serverIP or your_hostname]");

    // End of Settings.php. The following script (StoreFile.php) stores the file passed as the first CLI argument
    require_once(__DIR__ . '/Settings.php');

    if (!isset($argv[1]))
    {
        die("Please pass the path of the file you want to store on the MongoDB server." . PHP_EOL);
    }

    $storage_filepath = $argv[1];

    if (!file_exists($storage_filepath) || is_dir($storage_filepath))
    {
        die("Invalid! Please, re-check your filepath.");
    }


    function mongo_Connect($your_username, $your_password, $your_database, $your_host_serverIP)
    {
        $con_MongoDB = new Mongo("mongodb://{$your_username}:{$your_password}@{$your_host_serverIP}"); // Connect to Mongo Server
        $con_database = $con_MongoDB -> selectDB($your_database); // Connect to Database
        return $con_database;
    }

    $con_database = mongo_Connect(
        USERNAME_MONGODB,
        PASSWORD_MONGODB,
        DATABASE_MONGODB,
        SERVERIP_MONGODB
    );

    $grid_FS = $con_database -> getGridFS();

    # You can stick any metadata here, e.g., the uploader's ID, the date of execution, etc.
    $add_metadata = array("date" => new MongoDate());
    $grid_FS -> storeFile($storage_filepath, array("metadata" => $add_metadata));

    // Run the script with: php StoreFile.php "filename.com.avi"

    // The next script lists the stored files with their sizes
    require_once(__DIR__ . '/Settings.php');

    function _mongoConnect($your_username, $your_password, $your_database, $your_host_serverIP)
    {
        $con_MongoDB = new Mongo("mongodb://{$your_username}:{$your_password}@{$your_host_serverIP}"); // Connect to Mongo Server
        $con_database = $con_MongoDB -> selectDB($your_database); // Connect to Database
        return $con_database;
    }

    $con_database = _mongoConnect(
        USERNAME_MONGODB,
        PASSWORD_MONGODB,
        DATABASE_MONGODB,
        SERVERIP_MONGODB
    );

    $grid_FS = $con_database -> getGridFS();

    # Loop over the files and output their names and file sizes
    $storage_files = $grid_FS -> find();

    while (($your_file = $storage_files -> getNext()) != null)
    {
        print $your_file -> getFilename() . "\t" . $your_file -> getSize() . PHP_EOL;
    }

    // The next script (RetrieveFiles.php) retrieves files
    require_once(__DIR__ . '/Settings.php');

    if (!isset($argv[1]))
    {
        die("Please pass the filename of the file you want to retrieve from the MongoDB server." . PHP_EOL);
    }

    # The name is looked up in GridFS, so the file does not need to exist locally.
    $storage_filepath = $argv[1];

    function mongoConnect($your_username, $your_password, $your_database, $your_host_serverIP)
    {
        $con_MongoDB = new Mongo("mongodb://{$your_username}:{$your_password}@{$your_host_serverIP}"); // Connect to Mongo Server
        $con_database = $con_MongoDB -> selectDB($your_database); // Connect to Database
        return $con_database;
    }

    $con_database = mongoConnect(
        USERNAME_MONGODB,
        PASSWORD_MONGODB,
        DATABASE_MONGODB,
        SERVERIP_MONGODB
    );

    $grid_FS = $con_database -> getGridFS();

    # in the following code, you can search for the filepath passed in as the first argument
    $search_fileParams = array("filename" => $storage_filepath);

    if (false)
    {
        # If you used absolute paths when storing files, then you could use the following to download a folder's contents.
        $folder = '/path/to/folder';

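        # Refer to https://secure.php.net/manual/en/class.mongoregex.php
        # The pattern needs a closing delimiter, and the folder must be escaped for use in a regex.
        $file_get_filename = new MongoRegex("/^" . preg_quote($folder, "/") . "/");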

        $search_fileParams = array(
            'filename' => $file_get_filename
        );
    }

    # Alternatively, use findOne($search_fileParams) if you expect only one file with the provided name.
    $storage_files = $grid_FS -> find($search_fileParams);

    while (($your_file = $storage_files -> getNext()) != null)
    {
        # Prefix a random string in case a file with the same name already exists in the current directory.
        $random_string = substr(str_shuffle(md5(microtime())), 0, 10);
        $outputFile_path = __DIR__ . '/' . $random_string . "_" . basename($your_file -> getFilename());
        $your_file -> write($outputFile_path);
        print "Retrieved: " . $outputFile_path . PHP_EOL;
    }

    // use the script like
    // php RetrieveFiles.php "my_filename.mp4"
?>

Output:

Retrieved: <script_directory>/<random_string>_my_filename.mp4
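The scripts above rely on the legacy `mongo` extension (classes such as `Mongo`, `MongoDate`, and `MongoGridFS`), which does not run on PHP 7 or later. As a rough sketch of the same store and retrieve steps on a current setup, the helpers below accept any bucket object that offers the `uploadFromStream()` and `downloadToStreamByName()` methods of `MongoDB\GridFS\Bucket` from the `mongodb/mongodb` Composer library. The function names, and the connection string and paths in the usage comment, are illustrative assumptions. Unlike the scripts above, this sketch stores only the base name of the path as the filename.

```php
<?php
// Sketch only: $bucket is expected to be a MongoDB\GridFS\Bucket or a compatible object.

// Upload a local file and return the _id that the bucket assigns to it.
function storeFileInBucket($bucket, string $path, array $metadata = [])
{
    $source = fopen($path, 'rb');
    if ($source === false) {
        throw new RuntimeException("Cannot open {$path} for reading.");
    }

    try {
        return $bucket->uploadFromStream(basename($path), $source, ['metadata' => $metadata]);
    } finally {
        if (is_resource($source)) {
            fclose($source);
        }
    }
}

// Download the most recent file stored under $filename to a local path.
function retrieveFileFromBucket($bucket, string $filename, string $outputPath): void
{
    $destination = fopen($outputPath, 'wb');
    if ($destination === false) {
        throw new RuntimeException("Cannot open {$outputPath} for writing.");
    }

    try {
        $bucket->downloadToStreamByName($filename, $destination);
    } finally {
        if (is_resource($destination)) {
            fclose($destination);
        }
    }
}

// Usage (assumes `composer require mongodb/mongodb` and a reachable server):
// require __DIR__ . '/vendor/autoload.php';
// $bucket = (new MongoDB\Client('mongodb://user:pass@host'))
//     ->selectDatabase('my_database')
//     ->selectGridFSBucket();
// $id = storeFileInBucket($bucket, '/path/to/file.mp4', ['owner' => 'demo']);
// retrieveFileFromBucket($bucket, 'file.mp4', __DIR__ . '/copy_file.mp4');
```

Working with streams means neither helper has to hold the whole file in memory, which matters for the multi-gigabyte files this tutorial targets.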

Use the storeUpload() method instead of the storeFile() method if you are developing a PHP website rather than a CLI tool. You can use relative or full paths to store a file; the filename stored in MongoDB is the exact string that was passed in, so if you pass a full path, such as /path/to/file.mp4, the stored filename is that full path.

Remember, calling the script multiple times on the same file will not fail; however, doing so wastes valuable storage by storing the same file multiple times. The PHP code example above shows you how to use MongoDB as file storage for your PHP website or projects.
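One way to avoid storing duplicates is to record a content hash in the file's metadata and look it up before uploading. The sketch below assumes a bucket object that offers `findOne()` and `uploadFromStream()`, as `MongoDB\GridFS\Bucket` from the `mongodb/mongodb` library does. The function name `storeFileOnce` and the `metadata.sha256` field are choices made for this example; GridFS does not provide them.

```php
<?php
// Sketch only: $bucket is expected to be a MongoDB\GridFS\Bucket or a compatible object.
function storeFileOnce($bucket, string $path)
{
    $hash = hash_file('sha256', $path);
    if ($hash === false) {
        throw new RuntimeException("Cannot read {$path}.");
    }

    // Reuse the existing file if one with the same content hash is already stored.
    $existing = $bucket->findOne(['metadata.sha256' => $hash]);
    if ($existing !== null) {
        return is_array($existing) ? $existing['_id'] : $existing->_id;
    }

    $source = fopen($path, 'rb');
    if ($source === false) {
        throw new RuntimeException("Cannot open {$path} for reading.");
    }

    try {
        // Keep the hash in the metadata so later calls can find this upload.
        return $bucket->uploadFromStream($path, $source, ['metadata' => ['sha256' => $hash]]);
    } finally {
        if (is_resource($source)) {
            fclose($source);
        }
    }
}
```

Hashing a very large file takes time, and two processes uploading the same file at the same moment could still both store it, so treat this as a starting point rather than a guarantee.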

Hadoop and its HDFS are a great alternative to MongoDB for file storage, but they are considerably more complicated to set up; in return, Hadoop is built around Map/Reduce jobs. Most importantly, GridFS remains a strong option because its implementation is client-side within the driver itself (with no special loading or understanding of the file's context).

Because GridFS is driver-implemented, the details can vary between drivers, but in general a driver lets you query documents from the files collection and later serve the file itself from the chunks collection with a single query. This makes it easy to load the files collection and the subsequent chunks collection into your working set.
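The two-step read described here can be modeled with plain arrays. This is an in-memory sketch, not driver code: the function name `readFileFromCollections` and the sample documents are invented for illustration, and a real driver would run the equivalent lookups against the files and chunks collections.

```php
<?php
// Hypothetical in-memory model of reading a file from the two collections.
function readFileFromCollections(array $files, array $chunks, string $filename): ?string
{
    // Step 1: look up the metadata document in the files collection.
    $fileDocument = null;
    foreach ($files as $candidate) {
        if ($candidate['filename'] === $filename) {
            $fileDocument = $candidate;
            break;
        }
    }
    if ($fileDocument === null) {
        return null;
    }

    // Step 2: a single filter on files_id selects every chunk of that file.
    $matching = array_filter($chunks, function (array $chunk) use ($fileDocument) {
        return $chunk['files_id'] === $fileDocument['_id'];
    });

    // Chunks are concatenated in ascending order of n.
    usort($matching, function (array $a, array $b) {
        return $a['n'] <=> $b['n'];
    });

    return implode('', array_column($matching, 'data'));
}

$files = [['_id' => 'f1', 'filename' => 'greeting.txt', 'length' => 11]];
$chunks = [
    ['files_id' => 'f1', 'n' => 1, 'data' => ' world'],
    ['files_id' => 'f1', 'n' => 0, 'data' => 'hello'],
];
echo readFileFromCollections($files, $chunks, 'greeting.txt'), PHP_EOL; // prints "hello world"
```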

Syed Hassan Sabeeh Kazmi

Hassan is a Software Engineer with a well-developed set of programming skills. He uses his knowledge and writing capabilities to produce interesting-to-read technical articles.
