
Python: Deleting and Merging PDF Pages with the PyPDF2 Library

July 3, 2020 | 移动技术网IT编程

Contents

Overview
Installation
I. The PdfFileReader Class

1. getNumPages()
2. getPage(pageNumber)

II. The PdfFileWriter Class

1. addPage(page)
2. write(stream)

III. The PdfFileMerger Class

Method 1: append()
Method 2: merge()
Method 3: write()

Example 1: Deleting pages
Example 2: Merging files

Overview

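PyPDF2 is a third-party Python library for working with PDF files. It supports operations such as deleting, merging, cropping, and transforming pages.
Its four main classes are: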
The PdfFileReader Class
The PdfFileMerger Class
The PageObject Class
The PdfFileWriter Class

Installation

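Open a command line and run the command below. Note that this article uses the PyPDF2 1.x API (PdfFileReader, getNumPages() and so on). PyPDF2 3.0 removed these names in favor of classes such as PdfReader and PdfWriter, so to run the examples as written, install an older release with pip install "PyPDF2<3".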

pip install PyPDF2

I. The PdfFileReader Class

class PyPDF2.PdfFileReader(stream, strict=True, warndest=None, overwriteWarnings=True)

Parameters:
stream – A File object or an object that supports the standard read and seek methods similar to a File object. Could also be a string representing a path to a PDF file.

strict (bool) – Determines whether user should be warned of all problems and also causes some correctable problems to be fatal. Defaults to True.

warndest – Destination for logging warnings (defaults to sys.stderr).

overwriteWarnings (bool) – Determines whether to override Python’s warnings.py module with a custom implementation (defaults to True).

1. getNumPages()

Calculates the number of pages in this PDF file.

Returns: number of pages
Return type: int
Raises: PdfReadError – if the file is encrypted and restrictions prevent this action.

2. getPage(pageNumber)

Retrieves a page by number from this PDF file.

Parameters: pageNumber (int) – The page number to retrieve (pages begin at zero).
Returns: a PageObject instance.
Return type: PageObject

II. The PdfFileWriter Class

class PyPDF2.PdfFileWriter
This class supports writing PDF files out, given pages produced by another class (typically PdfFileReader).

1. addPage(page)

Adds a page to this PDF file. The page is usually acquired from a PdfFileReader instance.

Parameters: page (PageObject) – The page to add to the document. Should be an instance of PageObject

2. write(stream)

Writes the collection of pages added to this object out as a PDF file.

Parameters: stream – An object to write the file to. The object must support the write method and the tell method, similar to a file object.

III. The PdfFileMerger Class

Initializes a PdfFileMerger object. PdfFileMerger merges multiple PDFs into a single PDF. It can concatenate, slice, insert, or any combination of the above.

See the functions merge() (or append()) and write() for usage information.

Parameters: strict (bool) – Determines whether user should be warned of all problems and also causes some correctable problems to be fatal. Defaults to True.

Method 1: append()

append(fileobj, bookmark=None, pages=None, import_bookmarks=True)

Identical to the merge() method, but assumes you want to concatenate all pages onto the end of the file instead of specifying a position.

Parameters:
fileobj – A File Object (for example, one returned by Python's open()) or an object that supports the standard read and seek methods similar to a File Object. Could also be a string representing a path to a PDF file.
bookmark (str) – Optionally, you may specify a bookmark to be applied at the beginning of the included file by supplying the text of the bookmark.
pages – can be a Page Range or a (start, stop[, step]) tuple to merge only the specified range of pages from the source document into the output document.

import_bookmarks (bool) – You may prevent the source document’s bookmarks from being imported by specifying this as False.
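The pages tuple reads like the arguments of Python's range(). As I read the PyPDF2 1.26 source, merge() (which append() calls) expands a tuple with range(*pages) over 0-based source page indices, and omitting pages selects the whole file. A minimal sketch of that selection rule, with selected_indices as a hypothetical helper that does no PDF I/O:

```python
from typing import List, Optional, Tuple


def selected_indices(num_pages: int, pages: Optional[Tuple[int, ...]] = None) -> List[int]:
    """Return the 0-based source page indices picked by a pages argument.

    pages=None means every page; otherwise pages is (start, stop) or
    (start, stop, step) and is expanded exactly like range(*pages).
    """
    if pages is None:
        pages = (0, num_pages)
    if not isinstance(pages, tuple) or len(pages) not in (2, 3):
        raise TypeError('"pages" must be a tuple of (start, stop[, step])')
    return list(range(*pages))


if __name__ == "__main__":
    print(selected_indices(5))             # [0, 1, 2, 3, 4]
    print(selected_indices(5, (1, 3)))     # [1, 2], i.e. the 2nd and 3rd pages
    print(selected_indices(6, (0, 6, 2)))  # [0, 2, 4], every other page
```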


Method 2: merge()

merge(position, fileobj, bookmark=None, pages=None, import_bookmarks=True)

Merges the pages from the given file into the output file at the specified page number.

Parameters:
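position (int) – The page number to insert this file. File will be inserted after the given number. The number counts the gaps between pages, as in this three-page layout:
0 [page] 1 [page] 2 [page] 3
so position=0 inserts before the first page.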
fileobj – A File Object or an object that supports the standard read and seek methods similar to a File Object. Could also be a string representing a path to a PDF file.
bookmark (str) – Optionally, you may specify a bookmark to be applied at the beginning of the included file by supplying the text of the bookmark.
pages – can be a Page Range or a (start, stop[, step]) tuple to merge only the specified range of pages from the source document into the output document.

import_bookmarks (bool) – You may prevent the source document’s bookmarks from being imported by specifying this as False.

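Note: position and pages both count the gaps between pages (0 before the first page, 1 between the first and second pages, and so on). A pages range covers the pages lying between its two gap numbers, so pages=(1, 3) selects the second and third pages of the source file.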
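To make the gap numbering concrete, here is a small list-based model of where merge() places pages; it does no PDF I/O, and simulate_merge is a hypothetical name. It models position as a gap index into the output (0 before the first page) and pages as range(*pages) over source indices; append() corresponds to position=len(output).

```python
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def simulate_merge(output: List[T], source: Sequence[T], position: int,
                   pages: Optional[Tuple[int, ...]] = None) -> List[T]:
    """Return a new page list with source pages inserted at gap `position`.

    Gap 0 is before the first output page and gap len(output) is after the
    last one, so position=len(output) behaves like append().
    """
    if pages is None:
        pages = (0, len(source))  # no pages argument: take the whole source
    chosen = [source[i] for i in range(*pages)]
    return output[:position] + chosen + output[position:]


if __name__ == "__main__":
    a = ["A1", "A2", "A3"]
    b = ["B1", "B2", "B3"]
    print(simulate_merge(a, b, 0, (1, 3)))  # ['B2', 'B3', 'A1', 'A2', 'A3']
    print(simulate_merge(a, b, 1))          # ['A1', 'B1', 'B2', 'B3', 'A2', 'A3']
```

Returning a new list keeps the model side-effect free, which makes it easy to check a planned merge before touching any files.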

Method 3: write()

write(fileobj)
Writes all data that has been merged to the given output file.

Parameters: fileobj – Output file. Can be a filename or any kind of file-like object.

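Example 1: Deleting pages

PDF_delete(index) copies every page except those whose 1-based page numbers appear in index into a new file; PDF_delete([2]) removes the second page.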

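# PDF_delete.py
from PyPDF2 import PdfFileWriter, PdfFileReader


def PDF_delete(index):
    """Remove the pages whose 1-based numbers are in `index` and save the result."""
    output = PdfFileWriter()  # writer that collects the pages to keep
    # Keep the input file open until output.write() has finished: the copied
    # pages are still read from this stream when the output is written.
    with open("C:/Users/Yuanzheng/Desktop/Test1.pdf", "rb") as input_file:
        input1 = PdfFileReader(input_file)  # read the local PDF file
        pages = input1.getNumPages()  # number of pages in the document
        for i in range(pages):
            if i + 1 in index:
                continue  # page i + 1 is marked for deletion
            output.addPage(input1.getPage(i))  # copy page i (0-based) to the writer
        with open("C:/Users/Yuanzheng/Desktop/Test-Output1.pdf", "wb") as outputStream:
            output.write(outputStream)  # save the edited document


PDF_delete([2])  # delete page 2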
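The loop in PDF_delete mixes two numbering schemes: index holds 1-based page numbers while getPage() takes 0-based indices, hence the i + 1. A minimal sketch that isolates this step as a pure function so it can be checked without any PDF; pages_to_keep is a hypothetical helper, and converting the list to a set is only a small efficiency choice, not something the original does.

```python
from typing import Iterable, List


def pages_to_keep(num_pages: int, pages_to_delete: Iterable[int]) -> List[int]:
    """Return the 0-based indices left after deleting 1-based page numbers.

    Mirrors the PDF_delete loop: index i survives unless i + 1 is listed.
    Page numbers outside 1..num_pages are simply ignored, as in the original.
    """
    to_delete = set(pages_to_delete)  # set lookup instead of scanning a list
    return [i for i in range(num_pages) if i + 1 not in to_delete]


if __name__ == "__main__":
    print(pages_to_keep(4, [2]))     # [0, 2, 3]: page 2 (index 1) is dropped
    print(pages_to_keep(3, [1, 3]))  # [1]
```

Each returned index can then be passed to getPage() and addPage() exactly as the loop above does.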

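Example 2: Merging files

The script below appends all of Test1.pdf and then inserts the second and third pages of Test2.pdf (pages=(1, 3)) at position 0, so the output begins with those two pages followed by the whole of Test1.pdf.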

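# PDF_merger.py
from PyPDF2 import PdfFileMerger

merger = PdfFileMerger()

# Open the inputs and the output with "with" so every file is closed afterwards;
# write() runs inside the block while the inputs are still open.
with open("C:/Users/Yuanzheng/Desktop/Test1.pdf", "rb") as input1, \
        open("C:/Users/Yuanzheng/Desktop/Test2.pdf", "rb") as input2:
    merger.append(fileobj=input1)  # all pages of Test1.pdf
    # Insert pages 2-3 of Test2.pdf (0-based indices 1 and 2) at position 0, before everything else
    merger.merge(position=0, fileobj=input2, pages=(1, 3))
    with open("C:/Users/Yuanzheng/Desktop/PyPDF-Output2.pdf", "wb") as output:
        merger.write(output)

merger.close()  # release any resources still held by the merger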

Reference: https://pythonhosted.org/PyPDF2/PageObject.html

Original article: https://blog.csdn.net/weixin_41998772/article/details/107082060
