- Importance of Dynamic Languages in Monkey Patching
- Implement a Monkey Patch in Python
- Use Monkey Patch for Unit Testing in Python
Code is usually written to achieve a specific outcome, such as sending data from a user to a database. During testing, however, its behavior often needs to be adjusted, for example to check that it runs correctly or to isolate a bug.
Monkey patching is the practice of replacing a piece of code at runtime with a stub or a similar substitute so that its default behavior changes. This article covers different ways of monkey patching in Python.
Importance of Dynamic Languages in Monkey Patching
Monkey patching is most practical in dynamic languages, of which Python is an excellent example, because classes, modules, and objects remain modifiable at runtime. In static languages, where definitions are fixed at compile time, it is generally not possible in the same way.
In practice, monkey patching means adding or replacing attributes (whether methods or variables) at runtime rather than altering the object's definition in the source. It is frequently used with modules whose source code is unavailable, which makes it difficult to update the object definitions directly.
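As a minimal sketch of adding attributes at runtime, the following uses a hypothetical Greeter class standing in for code whose source cannot be edited. The names Greeter, shout, and punctuation are assumptions made for illustration and do not come from any library.

```python
class Greeter:
    """Hypothetical class standing in for code we cannot edit."""

    def __init__(self, name):
        self.name = name


def shout(self):
    """Plain function that will be attached to Greeter as a method."""
    return f"HELLO, {self.name.upper()}{self.punctuation}"


g = Greeter("ada")  # instance created before the patch

# Monkey patch: add a class-level variable and a method at runtime.
Greeter.punctuation = "!"
Greeter.shout = shout

greeting = g.shout()  # the existing instance sees the new members
print(greeting)
```

Because attribute lookup falls back to the class, instances created before the patch pick up the new members as well.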
An alternative to altering an existing object or class in place is to build a new version of it with the patched-in members, for example inside a decorator.
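One way to read the decorator idea is sketched here, under the assumption that "a new version" means a subclass carrying the extra members. The decorator with_members and the Counter class are hypothetical names invented for this illustration.

```python
def with_members(**members):
    """Hypothetical decorator factory: return a subclass with extra members."""

    def decorate(cls):
        # type() builds a new class; the original cls is left unchanged.
        return type(cls.__name__, (cls,), dict(members))

    return decorate


class Counter:
    """Hypothetical class that we prefer not to modify in place."""

    def __init__(self):
        self.count = 0


def increment(self):
    self.count += 1
    return self.count


# Apply the decorator by hand so that both classes stay available.
PatchedCounter = with_members(increment=increment)(Counter)

c = PatchedCounter()
first = c.increment()
second = c.increment()
print(first, second, hasattr(Counter, "increment"))
```

The original Counter keeps its behavior, so code that relies on the unpatched class is unaffected.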
Implement a Monkey Patch in Python
The following program demonstrates monkey patching in Python. A plain function is attached to an existing class as a new method at runtime.
import pandas as pd


def word_counter(self):
    """This method will return all the words inside the column that has the word 'tom'"""
    return [i for i in self.columns if "tom" in i]


pd.DataFrame.word_counter_patch = word_counter  # monkey-patch the DataFrame class
df = pd.DataFrame([list(range(4))], columns=["Arm", "tomorrow", "phantom", "tommy"])
print(df.word_counter_patch())

Output:
['tomorrow', 'phantom', 'tommy']
Let’s break down the code to understand monkey patching in Python.
The first line of code imports the pandas library, which is used to create data frames in the program.
import pandas as pd
Next, a plain function is defined outside the scope of any class definition. Python 3 has no separate unbound method type, so a module-level function whose first parameter is self can later serve as a method:
def word_counter(self):
    """This method will return all the words inside the column that has the word 'tom'"""
    return [i for i in self.columns if "tom" in i]
A new attribute, word_counter_patch, is then created on the pd.DataFrame class, and the word_counter function is assigned to it. This monkey patches the DataFrame class so that every data frame gains the new method.
pd.DataFrame.word_counter_patch = word_counter # monkey-patch the DataFrame class
Once the method is attached to the class, a new data frame is created whose column names are the words to search. This data frame is assigned to the variable df.
df = pd.DataFrame([list(range(4))], columns=["Arm", "tomorrow", "phantom", "tommy"])
Lastly, the patched method is called on the data frame df, and its result is printed. When df.word_counter_patch() is invoked, Python passes the data frame df to word_counter as the self argument.
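The binding step can be seen without pandas in a small sketch. Box and count_items are hypothetical names used only to show that calling the patched method is equivalent to calling the function with the instance as its first argument.

```python
class Box:
    """Hypothetical container class used only for this illustration."""

    def __init__(self, items):
        self.items = items


def count_items(self):
    return len(self.items)


Box.count = count_items  # attach the function to the class

b = Box([1, 2, 3])
via_method = b.count()          # Python supplies b as self
via_function = count_items(b)   # the same call written explicitly

# A bound method remembers both the instance and the underlying function.
bound_to_instance = b.count.__self__ is b
wraps_function = b.count.__func__ is count_items
print(via_method, via_function, bound_to_instance, wraps_function)
```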
Because classes and methods are ordinary objects in dynamic programming languages, they can be assigned like variables, so the same technique can be applied to the methods of other classes.
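Since methods are ordinary objects, they can be saved, replaced, and restored like any other value. The sketch below assumes a hypothetical Report class and shows a class-level patch, its restoration, and a patch limited to a single instance using types.MethodType from the standard library.

```python
import types


class Report:
    """Hypothetical class used only for this illustration."""

    def title(self):
        return "original"


def patched_title(self):
    return "patched"


original_title = Report.title   # keep a reference to the original method
Report.title = patched_title    # class-level patch: affects every instance
class_level = Report().title()

Report.title = original_title   # restore the original behavior
restored = Report().title()

# Instance-level patch: bind the function to one object only.
one = Report()
one.title = types.MethodType(patched_title, one)
instance_level = (one.title(), Report().title())
print(class_level, restored, instance_level)
```

Keeping a reference to the original method makes it possible to undo the patch, which matters when the patched class is shared by other code.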
Use Monkey Patch for Unit Testing in Python
So far, we’ve seen how monkey patching in Python adds a method to a class. This section examines how to monkey patch a global variable in a unit test.
Pipelines will be used to demonstrate this example. For readers new to pipelines, a pipeline is a process used to train and test machine learning models.
A pipeline has two modules: a training module that collects data, such as text or images, and a testing module.
In this program, the pipeline searches for files in the data directory. In the test.py file, the program creates a temporary directory with a single file and counts the files found in that directory.
Train the Pipeline for Unit Testing
The program creates a pipeline that collects data from two plain text files stored inside a directory named data. To recreate this process, we must create the pipeline.py and test.py Python files in the parent directory where the data folder is stored.
from pathlib import Path

DATA_DIR = Path(__file__).parent / "data"


def collect_files(pattern):
    return list(DATA_DIR.glob(pattern))
Let’s break down the code:
Path is imported from pathlib, as it will be used inside the code.
from pathlib import Path
DATA_DIR is a global variable that stores the location of the data files. Path(__file__).parent is the directory that contains the current file, and the / operator appends the data folder to it.
DATA_DIR = Path(__file__).parent / "data"
A function, collect_files, is created that takes one parameter: the string pattern to search for.
The DATA_DIR.glob method searches for the pattern inside the data directory and yields the matching paths lazily. The function wraps the result in list(), so it returns a list.
def collect_files(pattern):
    return list(DATA_DIR.glob(pattern))
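To illustrate why list() is applied to the result of glob, here is a small sketch that runs against a throwaway directory created with tempfile. The file names a.txt, b.txt, and notes.md are assumptions chosen for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    for name in ("a.txt", "b.txt", "notes.md"):
        (data_dir / name).touch()

    matches = data_dir.glob("*.txt")         # lazy iterator, not a list
    is_list = isinstance(matches, list)
    names = sorted(p.name for p in matches)  # consume it while the files exist

print(is_list, names)
```

Converting the iterator to a list makes it possible to call len() on the result, which the test relies on.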
How can collect_files be tested correctly when it depends on a global variable?
A new file, test.py, needs to be created to store the code for testing the pipeline module.
import pipeline


def test_collect_files(tmp_path):
    # given
    temp_data_directory = tmp_path / "data"
    temp_data_directory.mkdir(parents=True)
    temp_file = temp_data_directory / "file1.txt"
    temp_file.touch()
    expected_length = 1

    # when
    files = pipeline.collect_files("*.txt")
    actual_length = len(files)

    # then
    assert expected_length == actual_length
The first line of code imports the pipeline module under test. Next, a test function called test_collect_files is created. This function has a parameter, tmp_path, a built-in pytest fixture that provides a temporary directory.
The test is divided into three sections: Given, When, and Then.
Inside the Given section, a new variable named temp_data_directory is created. It is a path to a data directory inside the temporary directory, which is possible because the tmp_path fixture returns a path object.
Next, the data directory needs to be created. This is done with the mkdir method, and parents=True ensures that any missing parent directories in this path are created as well.
Next, a path to a single text file named file1.txt is built inside this directory, and the file is created using the touch method.
A new variable, expected_length, holds the number of files expected inside the data directory. It is given a value of 1, as only one file is expected there.
temp_data_directory = tmp_path / "data"
temp_data_directory.mkdir(parents=True)
temp_file = temp_data_directory / "file1.txt"
temp_file.touch()
expected_length = 1
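The effect of parents=True and touch can be checked outside pytest with a short sketch. It uses tempfile in place of the tmp_path fixture, and the missing_parent folder name is an assumption added to force the failure case.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "missing_parent" / "data"

    try:
        nested.mkdir()  # fails: missing_parent does not exist yet
        failed_without_parents = False
    except FileNotFoundError:
        failed_without_parents = True

    nested.mkdir(parents=True)  # creates missing_parent and data together
    new_file = nested / "file1.txt"
    new_file.touch()            # creates an empty file
    created = new_file.is_file()
    size = new_file.stat().st_size

print(failed_without_parents, created, size)
```

In the test itself, tmp_path already exists, so parents=True acts as a safeguard rather than a requirement.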
Now the program enters the When section.
When the pipeline.collect_files function is invoked, it returns a list of files matching the pattern *.txt, where * is a wildcard that matches any file name. The list is assigned to the variable files.
The number of files is obtained with len(files), which returns the length of the list and is stored in the variable actual_length.
files = pipeline.collect_files("*.txt")
actual_length = len(files)
In the Then section, an assert statement states that expected_length must be equal to actual_length. assert is used to check whether a given condition is true; if it is not, an AssertionError is raised and the test fails.
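The behavior of assert can be observed in isolation with a small sketch. The helper compare_lengths is hypothetical and exists only to capture the AssertionError that a test runner would otherwise report as a failure.

```python
def compare_lengths(expected_length, actual_length):
    """Hypothetical helper: report how an assert on the two values behaves."""
    try:
        assert expected_length == actual_length, "lengths differ"
    except AssertionError as error:
        return f"failed: {error}"
    return "passed"


matching = compare_lengths(1, 1)
mismatch = compare_lengths(1, 0)
print(matching, mismatch)
```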
Now the pipeline is ready for testing. Head over to the terminal and run the test.py file with pytest.
When the test is run, it fails.
    assert expected_length == actual_length
E   assert 1 == 0

test.py:23: AssertionError
=============================== short test summary info ============================================
FAILED test.py::test_collect_files - assert 1 == 0
It happens because the expected length is 1, but the actual length differs: the run above reports 0, and with the two text files described earlier present in the real data directory, it would be 2. At this point, the program is not using the temp directory; instead, it uses the real data directory that DATA_DIR points to in the pipeline module.
The test.py code creates a single file inside the temporary directory and expects collect_files to look there, but the function still searches the original data directory.
That is why, although expected_length is given a value of 1, the comparison with actual_length fails.
We can patch the global variable to solve this issue using a monkey patch.
First, a parameter named monkeypatch needs to be added to the test function test_collect_files like this:
def test_collect_files(tmp_path, monkeypatch):
Next, the global variable is patched using the monkeypatch fixture:
def test_collect_files(tmp_path, monkeypatch):
    # given
    temp_data_directory = tmp_path / "data"
    temp_data_directory.mkdir(parents=True)
    temp_file = temp_data_directory / "file1.txt"
    temp_file.touch()
    monkeypatch.setattr(pipeline, "DATA_DIR", temp_data_directory)  # Monkey Patch
    expected_length = 1

    # when
    files = pipeline.collect_files("*.txt")
    actual_length = len(files)

    # then
    assert expected_length == actual_length
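The monkeypatch fixture provided by pytest has a method, setattr, which assigns a new value to the DATA_DIR variable inside the pipeline module. Here, the new value for DATA_DIR is temp_data_directory. The fixture restores the original value when the test finishes.
If the test is executed again, it passes because the global variable is patched, and collect_files uses temp_data_directory instead of the real data directory.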
platform win32 -- Python 3.10.5, pytest-7.1.2, pluggy-1.0.0
rootdir: C:\Users\Win 10\PycharmProjects\Monkey_Patch
collected 1 item

test.py .                                                      [100%]

================================== 1 passed in 0.02s ====================================
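The same effect can be reproduced without pytest. The sketch below swaps in unittest.mock.patch.object from the standard library in place of the monkeypatch fixture, and it builds an in-memory stand-in for the pipeline module so that it runs on its own. PIPELINE_SOURCE, the two temporary directories, and the file names a.txt and b.txt are assumptions made for illustration.

```python
import tempfile
import types
from pathlib import Path
from unittest import mock

# Source of a stand-in pipeline module (an assumption mirroring the article).
PIPELINE_SOURCE = '''
from pathlib import Path

DATA_DIR = Path("data")

def collect_files(pattern):
    return list(DATA_DIR.glob(pattern))
'''

# Build the module in memory so the sketch does not need a pipeline.py file.
pipeline = types.ModuleType("pipeline")
exec(PIPELINE_SOURCE, pipeline.__dict__)

with tempfile.TemporaryDirectory() as real, tempfile.TemporaryDirectory() as tmp:
    real_dir, tmp_dir = Path(real), Path(tmp)
    for name in ("a.txt", "b.txt"):      # the "real" data: two files
        (real_dir / name).touch()
    (tmp_dir / "file1.txt").touch()      # the test data: one file

    pipeline.DATA_DIR = real_dir         # point the module at the "real" data
    before = len(pipeline.collect_files("*.txt"))

    # patch.object swaps the attribute and restores it when the block exits.
    with mock.patch.object(pipeline, "DATA_DIR", tmp_dir):
        during = len(pipeline.collect_files("*.txt"))

    after = len(pipeline.collect_files("*.txt"))

print(before, during, after)
```

The function body never changes; only the module attribute it reads at call time is replaced, and the original value comes back once the patch ends.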
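This article covered monkey patching in Python through two practical uses: attaching a new method to an existing class and patching a module-level global variable in a unit test. With these examples, the reader should be able to implement monkey patching in their own code.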