Serialization Libraries in C++

Abdul Mateen Dec 11, 2023
  1. Overview of Serialization
  2. Serialization Libraries in C++
  3. Comparison Between All Serialization Libraries
Serialization Libraries in C++

In this tutorial, you’ll learn about the different C++ serialization libraries.

First, we will learn about serialization and its purpose in C++. Next, we will discuss serialization libraries in C++ and how to use them in our program.

Overview of Serialization

Programmers routinely work with data objects in memory. And sometimes, the objects must be sent over a network or written to persistent storage (typically a file) to save some parts of the program’s state.

Serialization is the process/technique of transforming the data or object state into a binary format. The binary form is the stream of bytes for storing/preserving the object state or transmitting it to memory, database, file, or disk over a computer network.

The reverse of the serialization process is known as deserialization. Serialization is ideally suitable when you want to maintain the state of the structured data (i.e., C++ classes or structs) during or after the program’s execution.

You can read about serialize() and deserialize() with an example on this website.

Serialization provides a stable byte representation of the value of software objects. Then, these bytes can be sent over a network that will continue to work correctly even in future implementations using different hardware and software.

Serialization in Various Languages

Before moving forward, the serialization process is implemented differently in different languages. Let’s explore them below.

  1. In Java, the serializing method is writeObject, which is implemented in ObjectOutputStream.
  2. In Python, the serializing method is a pickle.dumps().
  3. In Ruby, the serializing method is marshal.dump().
  4. In C++, the method is boost::archive::text_oarchive a (filename); a << data; invoke the serializing method
  5. In MFC (Microsoft Foundation Class Library), the serializing method is: driving your class from CObject.

Our primary focus is C++ serialization, so let’s see how it works.

C++ Boost.Serialization uses text archive objects. Serialization writes into an output archive object operating as an output data stream.

When invoked for class data types, the >> output operator calls the class to serialize the function. Each serialize function uses the & operator, or via >>, recursively serializes nested objects to save or load its data members.

To better understand serialization in C++, you can look at this website.

Serialization Libraries in C++

C++ provides many libraries for serialization except for Boost serialization. All libraries help achieve high serialization performance, so let’s explore a few.

Protobuf

Protocol Buffers (Protobufs), a cross-platform used to serialize the data. It is helpful for communication between the programs over a network.

The Protobuf serialization mechanism is given through the protocol application. This compiler will parse the .proto file and generate, as output, source files according to the configured language by its arguments, in this case, C++.

You can read the tutorial on this link to understand Protobufs better.

FlatBuffers

FlatBuffers is an open-source cross-platform used to achieve maximum memory efficiency. You can directly access serialized data without parsing it with forward/backward compatibility.

Using FlatBuffers, you first generate your C++ schema with the --CPP option. Then you can include Flatbuffer in the file to read or write this generated code.

You can find serialization implementation using FlatBuffers at this link.

Cereal

Cereal is the header-only C++ 11 serialization library. Cereal takes arbitrary data types and reversibly turns them into different representations, such as compact binary encodings, XML, or JSON.

Cereal is extensible, fast, unit-tested, and offers a familiar syntax like Boost. To download the complete cereal library, click here.

Next, we have an example that shows the use of the cereal library:

#include <cereal/archives/binary.hpp>
#include <cereal/types/memory.hpp>
#include <cereal/types/unordered_map.hpp>
#include <fstream>

struct MyRecord {
  uint8_t x, y;
  float z;
  template <class Archive>
  void serialize(Archive& ar) {
    ar(x, y, z);
  }
};
struct SomeData {
  int32_t id;
  std::shared_ptr<std::unordered_map<uint32_t, MyRecord>> data;
  template <class Archive>
  void save(Archive& ar) const {
    ar(data);
  }
  template <class Archive>
  void load(Archive& ar) {
    static int32_t idGen = 0;
    id = idGen++;
    ar(data);
  }
};
int main() {
  std::ofstream out("out.cereal", std::ios::binary);
  cereal::BinaryOutputArchive archive(out);
  SomeData myData;
  archive(myData);
  return 0;
}

In the primary function, we create a binary file name os.cereal in binary mode.

In the following line, we create an object of the class BinaryOutputArchive from binary.hpp defined in the archives folder of the cereal library. We are passing our file object to the constructor of BinaryOutputArchive.

Next, we create our data object to be serialized, the object of class some data. Finally, in the fourth line of the primary function, we pass our data object to the archive, indirectly calling the save function to save data in the output file to save a serialized object in the output file out.cereal.

The binary file, out.cereal has our serialized data. The output of this code will be data in file out.cereal.

This data is in binary form, and we may check it using different utilities/commands available in different operating systems.

HPS

HPS is a high-performance simulation tool and is the header-only library for data serialization in C++11. HPS encodes your data into a compressed format to easily pass over the network.

HPS is 150% faster than the normal boost serialization in C++, as it just requires one line of code for Standard Template Libraries and primitive data types.

You can also use the write to_stream and from_stream functions to read and write data in the files. For a better understanding of HPS, you can also visit this website.

GitHub Msgpack

MessagePack is an open-source-specific serialization format. You can easily exchange data in different formats like JSON and XML.

The integer value requires only one byte to encode data, while the string value requires an extra byte.

For further implementation of msgpack, you can visit the link.

Boost.Serialization

The Boost Serialization library in C++ makes it possible to convert objects into bytes for saving and loading to restore them. Different formats generate the sequences of bytes.

All of these formats are supported by Boost. Serialization is only intended for use with this library.

In C++, each object you are going to serialize requires you to implement the serialize method. It should take an archive as an argument; an archive is similar to an input/output data stream.

Instead of using the operators << or >>, you can use the general operator and handle bot saving and loading operations. You can read about boost serialization on this website.

Apache Avro

Apache Avro is a data serialization system. Avro C++ is a C++ library that implements the Avro specifications, and the library is aimed to be used in the streaming pipeline.

For example, Apache Kafka performs data serialization and deserialization with centrally managed schemas.

Avro does not require code generation; you can use the code generation tool. The code generator reads a schema and outputs a C++ object to represent the data for the schema in .schema.

It also creates the code to serialize this object and deserialize it. Here, all the heavy coding is done for you.

Even if you wish to write custom serializers or parsers using the core C++ libraries, the generated code can be an example of how to use these libraries.

This style can work, but you can make struct or class serializable for the perfect solution.

Cap’n Proto

Cap’n Proto can easily interchange the data format as its capability depends on RPC (Remote Procedure Call) system. There is no concept of encoding/decoding in Cap'n Proto.

The data interchange format acts as encoding and represents memory. You can easily write bytes straight from the disk by structuring your data.

You can find the best example on Cap’n Proto page.

Thrift

Apache Thrift is a serialization framework that focuses on language issues. You can define abstract data types in IDL (Interface Definition Language), which will further compile into source code for any supported language.

Then, complete serialization will be provided by these generated codes for user-defined types. Apache Thrift ensures that any data type can be readable or writable with it.

You can find the best example of Apache Thrift on this page.

YAS

YAS is created as a replacement for Boost.serialization because of its low serialization speed; it is also a header-only file and doesn’t require any third-party libraries. It supports the binary, test, and JSON formats and requires C++11.

You can read about YAS on this page.

Comparison Between All Serialization Libraries

All these libraries provide serialization and have theirs on different periods. We are just showing you the results of these libraries, whereas you can find the code and further details on this page (the same line is already shared in the previous line).

Here in this article, we are showing results taken from an article.

This code has run on a typical desktop computer with an Intel Core i7 processor running Ubuntu 16.04, and their average time has been calculated.

Serializer Object’s size Average total time
thrift-binary 17017 1190.22
thrift-compact 13378 3474.32
protobuf 16116 2312.78
boost 17470 1195.04
msgpack 13402 2560.6
cereal 17416 1052.46
avro 16384 4488.18
yas 17416 302.7
yas-compact 13321 2063.34

Cap’n Proto and Flatbuffers store data in a serialized form, where serialization means getting the pointer to the internal storage. Therefore in Cap’n Proto, we measure the entire build/serialize/deserialize cycle.

In the case of other libraries, we can also serialize/deserialize the cycle of the already-built data structure.

serializer object’s size avg. total time
capnproto 17768 400.98
flatbuffers 17632 491.5

If you see the above data representation according to size, YAS takes more object size than other libraries. Still, YAS takes low time if you consider the serialization time.

Thus, YAS shows the fastest serialization among all libraries.

Note: The size is measured in bytes, and time is measured in milliseconds.

We hope you now understand serialization and its uses. Now, you know what libraries can do serialization in C++ or which library you can achieve the fastest.