This is a one-off post, unrelated to my blog’s main focus.
Yes, this article is not about the Finite Element Method (FEM) or any of the concepts associated with it.
Merging multiple PDFs into a single document is something most of us have to do, whether daily, weekly or monthly. There are of course many websites that offer this as a service. The ones that let you merge PDFs for free often impose limits, either on the number of files or on the time between merging operations.
What if you could write a Python script that does this for you?
Sounds great, right? Keep reading.
In this article, I present two different methods for merging many PDF files into a single document using the Python toolkit PyPDF2.
Before we go further, I want to emphasize that there is no “one-method-fits-all” approach, and I do not claim that these methods are the best. They are two methods that have worked fine for me so far, so I thought I would share them on this platform.
Prerequisites before you try either of these methods:
Make sure that
- You have installed a recent version of Python (that’s obvious, duh!)
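- You have installed the PyPDF2 toolkit. The code in this post uses the PdfFileReader, PdfFileWriter and PdfFileMerger names, which PyPDF2 3.0 removed in favor of PdfReader, PdfWriter and PdfMerger, so an older release is needed to run it as written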
- You have saved the PDF files that you want to merge in Python’s working directory. Of course, you can change the directory using Python code. To keep the code simple, I place the PDF files in the working directory for both methods presented here.
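On the remark that the directory can be changed from Python code, here is a minimal sketch of two options. The folder name `my_pdfs` is a placeholder and `paths_in` is a helper name made up for this example. `os.chdir` switches the working directory for the whole script, while `paths_in` builds full paths so the working directory can stay where it is.

```python
import os
from pathlib import Path


def paths_in(folder, file_names):
    """Return the full path of each file name inside `folder` (hypothetical helper)."""
    base = Path(folder)
    return [str(base / name) for name in file_names]


# Option 1: change the working directory once, then use bare file names.
# os.chdir("my_pdfs")  # "my_pdfs" is a placeholder folder name

# Option 2: leave the working directory alone and pass full paths to open().
inputs = paths_in("my_pdfs", ["FirstInputFile.pdf", "SecondInputFile.pdf"])
print(inputs)
```

Option 2 is usually the less surprising choice in longer scripts, because nothing else in the program is affected by a changed working directory.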
Method 1 is taken directly from Chapter 13 of the book “Automate the Boring Stuff with Python” by Al Sweigart.
When is method 1 suitable?
- When you have only a few files
- When the files to be merged do not share a common filename pattern
How does this method work?
In the following sequence:
1. Import the PyPDF2 toolkit, which has the tools that we need for working with PDFs
2. Open each file by entering its file name
3. Read each file that was opened in Step 2 using PdfFileReader
4. Create a blank PDF using PdfFileWriter, where the merged output will be stored
5. Loop through every page of every file that was read in Step 3 using a for loop and add each page to the blank PDF
6. Give a name to the output file and write into it all the pages collected in Step 5
7. Close all the files
If you find the above sequence difficult to follow, have a look at the code below. Python is very reader-friendly, so I hope you will get the idea.
    import PyPDF2

    # Open the files that have to be merged one by one
    pdf1File = open('FirstInputFile.pdf', 'rb')
    pdf2File = open('SecondInputFile.pdf', 'rb')

    # Read the files that you have opened
    pdf1Reader = PyPDF2.PdfFileReader(pdf1File)
    pdf2Reader = PyPDF2.PdfFileReader(pdf2File)

    # Create a new PdfFileWriter object which represents a blank PDF document
    pdfWriter = PyPDF2.PdfFileWriter()

    # Loop through all the page numbers of the first document
    for pageNum in range(pdf1Reader.numPages):
        pageObj = pdf1Reader.getPage(pageNum)
        pdfWriter.addPage(pageObj)

    # Loop through all the page numbers of the second document
    for pageNum in range(pdf2Reader.numPages):
        pageObj = pdf2Reader.getPage(pageNum)
        pdfWriter.addPage(pageObj)

    # Now that you have copied all the pages of both documents, write them into a new document
    pdfOutputFile = open('MergedFiles.pdf', 'wb')
    pdfWriter.write(pdfOutputFile)

    # Close all the files, created as well as opened
    pdfOutputFile.close()
    pdf1File.close()
    pdf2File.close()
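The same sequence can be wrapped in a function that accepts any number of file names. This is only a sketch and not part of the book’s code: `merge_pdfs` is a name chosen for this example, and it assumes a PyPDF2 release that provides `PdfFileReader` and `PdfFileWriter`, as used above. It also checks that every input file exists before anything is written.

```python
from contextlib import ExitStack
from pathlib import Path


def merge_pdfs(input_names, output_name):
    """Copy every page of each input PDF, in order, into one output PDF."""
    # Fail early, before anything is written, if an input file is missing.
    missing = [name for name in input_names if not Path(name).is_file()]
    if missing:
        raise FileNotFoundError(f"Input files not found: {missing}")

    # Imported here so the check above works even where PyPDF2 is absent.
    import PyPDF2

    pdf_writer = PyPDF2.PdfFileWriter()
    # ExitStack closes every opened input file when the block ends,
    # even if an error occurs part-way through.
    with ExitStack() as stack:
        for name in input_names:
            pdf_file = stack.enter_context(open(name, "rb"))
            pdf_reader = PyPDF2.PdfFileReader(pdf_file)
            for page_num in range(pdf_reader.numPages):
                pdf_writer.addPage(pdf_reader.getPage(page_num))
        # Write while the inputs are still open, because the writer
        # may still need to read page data from them.
        with open(output_name, "wb") as output_file:
            pdf_writer.write(output_file)


# Example call with the file names used above:
# merge_pdfs(["FirstInputFile.pdf", "SecondInputFile.pdf"], "MergedFiles.pdf")
```

With this shape, adding a third or fourth file means extending the list rather than copying another loop.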
Method 2 is more elegant and has just 5 lines of code. It’s my favorite, and it uses the PdfFileMerger class.
When is method 2 suitable?
- When you have a lot of PDF files (I mean a lot: hundreds of PDF files or even more)
- When all the PDF files that you want to merge follow a naming convention for their file names
How does this method work?
In the following sequence:
- Import the PdfFileMerger and PdfFileReader tools
- Loop through all the files that have to be merged and append them
- Write the appended files into an output document and specify a name for it
That’s it. It’s simple but powerful.
Before we get to the code, here is how my input files are named: 6_yuddhakanda_1.pdf, 6_yuddhakanda_2.pdf, and so on up to 6_yuddhakanda_116.pdf. Remember that these files are placed in Python’s working directory.
The file names follow a pattern, which makes it very easy to loop through them.
Okay. Without further ado, let’s have a look at the code.
    from PyPDF2 import PdfFileMerger, PdfFileReader

    # Create the PdfFileMerger object that collects the pages
    mergedObject = PdfFileMerger()

    # I had 116 files in the folder that had to be merged into a single document
    # Loop through all of them and append their pages
    for fileNumber in range(1, 117):
        # PdfFileReader opens the file itself when given a file name
        mergedObject.append(PdfFileReader('6_yuddhakanda_' + str(fileNumber) + '.pdf'))

    # Write all the appended pages into the output file named below
    mergedObject.write("mergedfilesoutput.pdf")
Method 2 might look very efficient to you, but it has its catch: the file names must follow a pattern. If you have a method or a script to take care of that part, then method 2 is obviously very efficient.
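One possible way to take care of the file-name part is to pick up the files by their pattern and sort them by number, instead of hard-coding the range. The sketch below uses only the standard library. `numbered_pdfs` is a hypothetical helper name, and it assumes names of the form prefix, number, `.pdf`, like the ones used above.

```python
import re
from pathlib import Path


def numbered_pdfs(folder, prefix):
    """Return paths of '<prefix><number>.pdf' files in `folder`, sorted by number."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)\.pdf")
    found = []
    for path in Path(folder).iterdir():
        match = pattern.fullmatch(path.name)
        if match:
            found.append((int(match.group(1)), str(path)))
    # Sorting on the integer puts file 2 before file 10,
    # which plain alphabetical sorting would not.
    return [name for _, name in sorted(found)]


# Possible use with method 2 (PyPDF2 names as in the code above):
# for name in numbered_pdfs(".", "6_yuddhakanda_"):
#     mergedObject.append(PdfFileReader(name))
```

This also copes with gaps in the numbering, since only the files that actually exist are returned.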
So this brings us to the end of this one-off, off-topic post about merging multiple PDF files into a single file. I hope it was useful.
Have a great day.
Prost! ~ Renga
Some links to ponder:
- PyPDF2 documentation – to explore further options – https://pythonhosted.org/PyPDF2/
- Automate the Boring Stuff with Python – A great book – https://automatetheboringstuff.com/#toc