How to Convert Docx to PDF in Python

Muhammad Maisam Abbas Feb 02, 2024
  1. Convert Docx to PDF With the pywin32 Package in Python
  2. Convert Docx to PDF With the docx2pdf Package in Python
How to Convert Docx to PDF in Python

This tutorial will discuss the methods to convert a docx file to a pdf file in Python.

Convert Docx to PDF With the pywin32 Package in Python

The pywin32 package is generally used for creating and initializing COM objects and using windows services in Python. As it is an external package, we have to install pywin32 before using it. The command to install pywin32 is given below.

pip install pywin32

We can use the Microsoft Word application with this package to open the docx file and save it as a pdf file. The following code example shows us how to convert a docx file to a pdf file with the pywin32 package.

import os
import win32com.client

wdFormatPDF = 17

inputFile = os.path.abspath("document.docx")
outputFile = os.path.abspath("document.pdf")
word = win32com.client.Dispatch("Word.Application")
doc = word.Documents.Open(inputFile)
doc.SaveAs(outputFile, FileFormat=wdFormatPDF)
doc.Close()
word.Quit()

We converted the document.docx to document.pdf with the win32com.client library in the above code. We opened the docx file with doc = word.Documents.Open(inputFile) and saved it as a pdf file with doc.SaveAs(outputFile, FileFormat=wdFormatPDF). In the end, we closed the opened document with doc.Close() function and exited Microsoft Word with word.Quit() function. Notice that the output file must already be created for this code to work properly. This means that we have to manually create a file named document.pdf before executing the above code. This process can also be automated with the help of file handling in Python. The following code snippet shows how we can further automate this whole process.

import os
import win32com.client

wdFormatPDF = 17

inputFile = os.path.abspath("document.docx")
outputFile = os.path.abspath("document.pdf")
file = open(outputFile, "w")
file.close()
word = win32com.client.Dispatch("Word.Application")
doc = word.Documents.Open(inputFile)
doc.SaveAs(outputFile, FileFormat=wdFormatPDF)
doc.Close()
word.Quit()

In the above code, we create the output file with file = open(outputFile, "w") before opening Microsoft Word with the win32com.client library.

Convert Docx to PDF With the docx2pdf Package in Python

The pywin32 method works just fine and gives us a lot of control over the nitty-gritty details. The only drawback is that we have to write a lot of code for it. If we need to quickly convert a docx file to a pdf file without worrying too much about any low-level details, we can use the docx2pdf package in Python. The docx2pdf package provides us simple functions that take the file names and take care of all the low-level conversion stuff discussed in the previous section. The docx2pdf is also an external package. The command to install docx2pdf package is given below.

pip install docx2pdf

The following code example shows us how to convert a docx file to a pdf file with the docx2pdf package.

from docx2pdf import convert

inputFile = "document.docx"
outputFile = "document2.pdf"

convert(inputFile, outputFile)

We converted document.docx to document.pdf with the convert() function of docx2pdf package in the above code. The only drawback of this code is that we still need to create the output file before executing this code. We can automate this process as we did in the previous section using file handling.

from docx2pdf import convert

inputFile = "document.docx"
outputFile = "document2.pdf"
file = open(outputFile, "w")
file.close()

convert(inputFile, outputFile)

In the above code, we create the output file with file = open(outputFile, "w") before calling the convert() function.

Muhammad Maisam Abbas avatar Muhammad Maisam Abbas avatar

Maisam is a highly skilled and motivated Data Scientist. He has over 4 years of experience with Python programming language. He loves solving complex problems and sharing his results on the internet.

LinkedIn