Python Rsync

Abid Ullah Oct 10, 2023
Even as technologies keep changing, we still regularly need to transfer or exchange files. Rsync is a command-line tool, most commonly used on Linux, that lets us control the details of such transfers.

This article will explore rsync and how we can use it from a Python script.

Python Rsync

As mentioned above, rsync lets us specify the details of a transfer. For example, we can decide which files to exclude from a transfer and which remote shell should be used for the connection.

Rsync is typically used for complex transfers or for transferring files in bulk. Backups made with rsync can also be automated with the help of cron.

The rsync Command in Linux

This is what a generic rsync command format looks like.

rsync [option] [origin] [destination]

This command is straightforward for anyone familiar with Linux, but we will break it down anyway. Every command starts with the keyword rsync.

It is followed by one or more options, of which there is a wide range to choose from. Each option controls some aspect of how rsync performs the transfer.

The origin is where the files are copied from, and the destination is where they are copied to. Either one can be on the local machine or on a remote machine. We need to be careful about what we sync and in which direction, because rsync can overwrite files at the destination without much warning.

Here is a list of basic and common options for rsync.

  1. -a - This option (archive mode) copies directories recursively and preserves file attributes such as permissions, ownership, and timestamps.
  2. --dry-run - This option runs a trial of the command so we can observe the changes that would be made if the command were executed. It does not make any actual changes.
  3. --delete - This option deletes extraneous files from the destination machine/directory, that is, files that no longer exist at the origin.
  4. -e - This option tells rsync which remote shell to use, for example ssh.
  5. --exclude="*.filetype" - This option excludes all files of a specific type from a transfer. We replace filetype with the actual file type. For example, --exclude="*.docx".
  6. --help - This option shows the help text for rsync. Running rsync -h on its own also prints help, but when -h is combined with other options, it means human-readable output.
  7. --progress - This option shows the progress of the transfer as the command runs.
  8. -q - This option runs the command quietly by suppressing non-error messages.
  9. -v - This option makes the output verbose so the user can read what is being transferred.
  10. -z - This option compresses file data during the transfer.
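The options above can be combined into a single command. The following is a minimal sketch of how such a command might be assembled in Python as an argument list. The helper name build_rsync_command and the paths docs/ and backup/docs/ are hypothetical and exist only for illustration. The sketch only builds and prints the command; it does not run rsync.

```python
import shlex


def build_rsync_command(origin, destination, *, dry_run=False, delete=False,
                        excludes=(), compress=False, verbose=False):
    """Return an rsync argument list (hypothetical helper for illustration)."""
    command = ["rsync", "-a"]            # archive mode: recursive, keeps metadata
    if verbose:
        command.append("-v")             # verbose output
    if compress:
        command.append("-z")             # compress data during the transfer
    if dry_run:
        command.append("--dry-run")      # report changes without applying them
    if delete:
        command.append("--delete")       # remove extraneous destination files
    for pattern in excludes:
        command.append(f"--exclude={pattern}")
    command += [origin, destination]
    return command


# Hypothetical paths; a dry run is used so nothing would be changed.
command = build_rsync_command("docs/", "backup/docs/", dry_run=True,
                              delete=True, excludes=["*.docx"])
print(shlex.join(command))               # shell-quoted form of the command
```

Starting with --dry-run is a reasonable habit whenever --delete is involved, since it shows what would be removed before anything is actually changed.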

Use Rsync From a Python Script

There are two ways to use rsync from Python.

  1. Make a call to subprocess and specify the rsync command.

    import subprocess
    
    subprocess.call(["rsync", "[option]", "[origin]", "[destination]"])
    
  2. Use the pyrsync library

    pyrsync is a third-party Python library for rsync. It is not a wrapper around the rsync command; instead, it implements rsync functionality itself in Python.

    We can install this library via pip.

    pip install pyrsync
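Returning to the first approach, the subprocess call can be made a little more robust. The following is a minimal sketch, assuming rsync is installed and on the PATH. The function name run_rsync and the example paths are hypothetical. It uses subprocess.run with check=True so that a failed transfer raises an exception instead of passing silently.

```python
import shutil
import subprocess


def run_rsync(options, origin, destination, executable="rsync"):
    """Run rsync and return its standard output; raise if it fails."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} is not installed or not on PATH")
    result = subprocess.run(
        [executable, *options, origin, destination],
        capture_output=True,   # collect stdout and stderr instead of printing
        text=True,             # decode output as str rather than bytes
        check=True,            # raise CalledProcessError on a non-zero exit
    )
    return result.stdout


if __name__ == "__main__":
    try:
        # Hypothetical paths; --dry-run means no files are changed.
        print(run_rsync(["-a", "--dry-run", "-v"], "docs/", "backup/docs/"))
    except (FileNotFoundError, subprocess.CalledProcessError) as error:
        print(f"rsync did not complete: {error}")
```

Passing the command as a list, rather than as one string with shell=True, avoids shell quoting problems with file names that contain spaces or wildcards.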
    

Rsync has traditionally relied on the MD5 hash for its checksums, which developers often consider outdated compared with SHA256. The more modern pyrsync uses SHA256, which meets the standard security requirements for verification processes.
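To make the difference between the two hashes concrete, here is a small sketch using Python's standard hashlib module. It does not use rsync or pyrsync; the sample data is made up, and the sketch only compares the size of the two digests for the same block of bytes.

```python
import hashlib

# Made-up sample data standing in for one block of a file.
block = b"example block of file data"

md5_digest = hashlib.md5(block).hexdigest()
sha256_digest = hashlib.sha256(block).hexdigest()

# Each hexadecimal character encodes 4 bits of the digest.
print("MD5 bits:", len(md5_digest) * 4)
print("SHA256 bits:", len(sha256_digest) * 4)
```

The longer SHA256 digest makes accidental or deliberate collisions far less likely, at the cost of a little more data to compute and exchange per block.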

Although pyrsync has had no major releases since its launch, it still has potential for development work, and at the time of writing the library is not known to have any reported bugs or vulnerabilities.

If the library is not available through pip, it must be built and installed from its source code, which is available.

Pyrsync has the potential to save us hours and hours of development time and resources by not having to build the functionality it provides from scratch.

Its easy-to-read code and PyPI's straightforward installation instructions make it very easy to incorporate into our scripts.

To install from source, we run this command if the system already has setuptools installed.

$ sudo python setup.py install

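Even if the system does not have setuptools, the setup.py script detects its absence and falls back to Python's built-in distutils instead. Note that distutils was removed from the standard library in Python 3.12, so this fallback only applies to older Python versions.

An example sequence of commands for this module is shown below. The example imports the module as pyrsync2; adjust the import to match the name of the package you installed.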

# In the system with the file that needs patching
>>> import pyrsync2
>>> unpatched = open("unpatched.file", "rb")
>>> hashes = pyrsync2.blockchecksums(unpatched)
# In the remote machine receiving hashes
>>> import pyrsync2
>>> patchedfile = open("patched.file", "rb")
>>> delta = pyrsync2.rsyncdelta(patchedfile, hashes)
# In the origin machine with the unpatched file after receiving delta
>>> unpatched.seek(0)
>>> save_to = open("locally-patched.file", "wb")
>>> pyrsync2.patchstream(unpatched, save_to, delta)
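The three steps above (checksums, delta, patch) can be illustrated without any third-party library. The following is a simplified, self-contained sketch of the idea, not the pyrsync2 implementation. All function names are hypothetical, the block size is deliberately tiny, and it recomputes a SHA256 hash at every byte offset instead of using the rolling checksum that rsync relies on for speed.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size so the example is easy to follow


def block_hashes(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block of the old (unpatched) data."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]


def make_delta(new_data, old_hashes, block_size=BLOCK_SIZE):
    """Describe new_data as old block indexes (int) and literal bytes."""
    index_of = {digest: index for index, digest in enumerate(old_hashes)}
    delta, literal, position = [], bytearray(), 0
    while position < len(new_data):
        window = new_data[position:position + block_size]
        match = index_of.get(hashlib.sha256(window).digest())
        if match is not None:
            if literal:                      # flush pending literal bytes
                delta.append(bytes(literal))
                literal = bytearray()
            delta.append(match)              # reuse a block the receiver has
            position += len(window)
        else:
            literal.append(new_data[position])
            position += 1                    # slide the window by one byte
    if literal:
        delta.append(bytes(literal))
    return delta


def apply_delta(old_data, delta, block_size=BLOCK_SIZE):
    """Rebuild the new data from the old data plus the delta."""
    out = bytearray()
    for item in delta:
        if isinstance(item, int):
            out += old_data[item * block_size:(item + 1) * block_size]
        else:
            out += item
    return bytes(out)


old = b"the quick brown fox"
new = b"the quick red brown fox"
delta = make_delta(new, block_hashes(old))
print(delta)
print(apply_delta(old, delta) == new)
```

Only the literal bytes in the delta need to travel over the network; the integer entries refer to blocks the machine with the old file already has, which is why this approach saves bandwidth on large files with small changes.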

An important point to note here is that this library currently supports only Python 3.

We hope you find this article helpful in understanding how to use rsync in Python.
