Merging multiple PDFs into a single PDF using a Python script

This is a one-off post, unrelated to my blog’s main focus.

Yes, this article is not about the Finite Element Method (FEM) or any of the concepts associated with it.

Merging multiple PDFs into a single document is something most of us have to do, whether daily, weekly, or monthly. There are, of course, many websites that offer this as a service. The ones that let you merge PDFs for free often impose limits, either on the number of files or on the time between merging operations.

What if you could write a Python script that does this for you?

Sounds great, right? Keep reading.

In this article, I present two different methods for merging many PDF files into a single document using the Python toolkit PyPDF2.

Before we go further, I want to emphasize that there is no “one-method-fits-all” approach, and I do not claim that these methods are the best. They are two methods that have worked fine for me so far, so I thought I would share them on this platform.

Prerequisites before you try either of these methods:

Make sure that

  1. You have installed the latest version of Python (that’s obvious, duh!)
  2. You have installed the PyPDF2 toolkit. The code here uses the class names that PyPDF2 provided before version 3.0 (PdfFileReader, PdfFileWriter, PdfFileMerger); version 3.0 removed them in favour of PdfReader, PdfWriter, and PdfMerger
  3. You have saved the PDF files that you want to merge in Python’s working directory. Of course, you can change the directory using Python code. To keep the code simple, I place the PDF files in the working directory for both methods presented here.
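
If your PDFs live somewhere else, a short sketch like the following can switch the working directory and confirm which PDF files Python sees there. The function name use_pdf_folder and the folder passed to it are placeholders of mine; only os.chdir, os.getcwd, and os.listdir come from the standard library.

```python
import os


def use_pdf_folder(directory):
    """Make `directory` the working directory and list the PDF files in it."""
    os.chdir(directory)  # relative names like 'FirstInputFile.pdf' now resolve here
    return sorted(name for name in os.listdir() if name.lower().endswith(".pdf"))


if __name__ == "__main__":
    # "." keeps the current directory; replace it with your own (hypothetical) folder
    print(os.getcwd())
    print(use_pdf_folder("."))
```

If the printed list is empty, the files are not where Python is looking, and both methods would fail with a file-not-found error.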

Method 1:

This method is directly taken from Chapter 13 of the book “Automate the Boring Stuff with Python” by Al Sweigart.

When is method 1 suitable?

  1. When you have only a few files
  2. When the files to be merged do not share a common filename pattern

How does this method work?

It follows this sequence:

  1. Import the PyPDF2 tool kit which has the tools that we need for playing with PDFs
  2. Open each and every file by entering the file name
  3. Read each and every file which was opened in Step 2 using PdfFileReader
  4. Create a blank PDF file using PdfFileWriter where you can store the merged output
  5. Loop through every page of every file read in Step 3 using a for loop and copy each page
  6. Give the output file a name and write all the pages copied in Step 5 into it
  7. Close all the files

If you find the above sequence difficult to understand, have a look at the code below. Python is very reader-friendly, so I hope you get the idea.


import PyPDF2 

# Open the files that have to be merged one by one
pdf1File = open('FirstInputFile.pdf', 'rb')
pdf2File = open('SecondInputFile.pdf', 'rb')

# Read the files that you have opened
pdf1Reader = PyPDF2.PdfFileReader(pdf1File)
pdf2Reader = PyPDF2.PdfFileReader(pdf2File)

# Create a new PdfFileWriter object which represents a blank PDF document
pdfWriter = PyPDF2.PdfFileWriter()

# Loop through all the pagenumbers for the first document
for pageNum in range(pdf1Reader.numPages):
    pageObj = pdf1Reader.getPage(pageNum)
    pdfWriter.addPage(pageObj)

# Loop through all the pagenumbers for the second document
for pageNum in range(pdf2Reader.numPages):
    pageObj = pdf2Reader.getPage(pageNum)
    pdfWriter.addPage(pageObj)

# Now that you have copied all the pages of both documents, write them into a new document
pdfOutputFile = open('MergedFiles.pdf', 'wb')
pdfWriter.write(pdfOutputFile)

# Close all the files - Created as well as opened
pdfOutputFile.close()
pdf1File.close()
pdf2File.close()
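
The same idea can be written more compactly with a loop over a list of file names. What follows is a sketch rather than the book’s code: the function name merge_pdfs is my own, and it assumes PyPDF2 2.0 or later, where the classes are called PdfReader and PdfWriter and pages are added with add_page. The input files are kept open until the output has been written, because the page data may only be read from them at write time.

```python
from contextlib import ExitStack


def merge_pdfs(input_paths, output_path):
    """Copy every page of every input PDF, in order, into output_path."""
    # Imported here so that a missing package is reported only when a merge runs.
    # Assumption: PyPDF2 2.0 or later (PdfReader / PdfWriter / add_page).
    from PyPDF2 import PdfReader, PdfWriter

    writer = PdfWriter()
    with ExitStack() as stack:
        for path in input_paths:
            # The stack keeps each input open until the output is written
            handle = stack.enter_context(open(path, "rb"))
            for page in PdfReader(handle).pages:
                writer.add_page(page)
        with open(output_path, "wb") as output_file:
            writer.write(output_file)
    # Leaving the with-block closes every input file, even if an error occurred
```

Calling merge_pdfs(['FirstInputFile.pdf', 'SecondInputFile.pdf'], 'MergedFiles.pdf') would be the equivalent of the script above, and adding a third file only means adding one more name to the list.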

Method 2:

This method is more elegant and needs just 5 lines of code. It’s my favorite, and it uses the PdfFileMerger class.

When is method 2 suitable?

  1. When you have a lot of PDF files (I mean a loooot, for example hundreds of PDF files or even more)
  2. When all the PDF files that you want to merge follow a naming convention for their file names

How does this method work?

It follows this sequence:

  1. Import PdfFileMerger and PdfFileReader tools
  2. Loop through all the files that have to be merged and append them
  3. Write the appended files into an output document and specify a name for it.

That’s it. It’s simple but powerful.

So let’s look into the code now. Before we go there, I will show how my input files are named. And remember that these files are placed in Python’s working directory.

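[Screenshot: files_in_folder, the input PDFs in the folder, named 6_yuddhakanda_1.pdf, 6_yuddhakanda_2.pdf, and so on]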

As you can see, the file names follow a pattern, which makes it very easy to loop through them.

Okay. Without further ado, let’s have a look at the code.


from PyPDF2 import PdfFileMerger, PdfFileReader

# Create a PdfFileMerger object that will collect the pages
mergedObject = PdfFileMerger()

# I had 116 files in the folder that had to be merged into a single document
# Loop through all of them and append their pages
for fileNumber in range(1, 117):
    # PdfFileReader opens the file itself when given a file name, so no 'rb' mode is needed
    mergedObject.append(PdfFileReader('6_yuddhakanda_' + str(fileNumber) + '.pdf'))

# Write all the files into a file which is named as shown below
mergedObject.write("mergedfilesoutput.pdf")

Method 2 might look very efficient to you, but it has its catch: the file names must follow a pattern. If you have a method or a script to take care of that part, then Method 2 is obviously very efficient.
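
One way to take care of the file-name part is to let Python discover the files instead of building each name by hand. The following is a minimal sketch and not part of my original script: the helper names natural_key and find_pdfs are made up for illustration, and it assumes all the files sit in a single folder. It uses only the standard library and sorts numbered names in numeric order, so that a file ending in _2 comes before one ending in _10.

```python
import re
from pathlib import Path


def natural_key(name):
    """Split a name into text and integer chunks so that '_2' sorts before '_10'."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


def find_pdfs(folder=".", pattern="*.pdf"):
    """Return the paths in `folder` that match `pattern`, in natural order."""
    return sorted(Path(folder).glob(pattern),
                  key=lambda path: natural_key(path.name))


if __name__ == "__main__":
    # Lists whatever PDFs are in the current working directory
    for path in find_pdfs("."):
        print(path.name)
```

Looping over find_pdfs('.', '6_yuddhakanda_*.pdf') and passing str(path) to PdfFileReader would then stand in for the range(1, 117) loop, without having to know the number of files in advance.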

So this brings us to the end of this one-off, off-topic post about merging multiple PDF files into a single file. I hope it was useful.

Have a great day.

Prost! ~ Renga

Some links to ponder:

  1. PyPDF2 documentation – to explore further options – https://pythonhosted.org/PyPDF2/
  2. Automate the Boring Stuff with Python – A great book – https://automatetheboringstuff.com/#toc

 

8 thoughts on “Merging multiple PDFs into a single PDF using a Python script”

  1. Thanks! I modified your code to merge all PDFs in a given directory. My files are in ‘orders/’. Cheers!

    from PyPDF2 import PdfFileMerger, PdfFileReader
    import os

    merged_object = PdfFileMerger()
    str_output_name = 'output.pdf'

    # Collect the names of all PDF files in the 'orders/' directory
    lst_pdfs = []
    for obj in os.listdir('orders/'):
        if obj.endswith('.pdf'):
            lst_pdfs.append(obj)
    print(lst_pdfs)

    # Append every file, then write the merged result once, after the loop
    for file_name in lst_pdfs:
        merged_object.append(PdfFileReader(f'orders/{file_name}'))
    merged_object.write(f'orders/{str_output_name}')

    • Thanks for the code!

      I replaced the strange quotes,
      pip installed pypdf2 and confirmed it’s working.

      The very last line is outside the loop.

  2. from PyPDF2 import PdfFileMerger, PdfFileReader
    import os
    mergedObject = PdfFileMerger()

    dir = r"C:\Users\SHARATH KUMAR H K\Desktop\Projects\DC\event2"

    # Loop through all of the pdf in directory and append pages of each pdf
    for pdf in os.listdir(dir):
        PdfName = os.path.join(dir, pdf)  # replaces the invalid '\' string literal
        mergedObject.append(PdfFileReader(PdfName))

    # Write all the files into a file which is named as shown below
    OutputName = input('Enter Output PDF Name : ')
    if not OutputName.endswith('.pdf'):  # No extension found
        OutputName = OutputName + '.pdf'
    mergedObject.write(OutputName)

    print(f'Saved! PDF at {os.getcwd()}')
