Python 3 Standard Library: zlib, GNU zlib Compression

March 28, 2020 | 移动技术网 IT编程

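1. zlib: GNU zlib Compression

The zlib module provides a low-level interface to many of the functions in the zlib compression library.

1.1 Working with Data in Memory

The simplest way to use zlib requires holding all of the data to be compressed or decompressed in memory.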

import zlib
import binascii

original_data = b'this is the original text.'
print('original     :', len(original_data), original_data)

compressed = zlib.compress(original_data)
print('compressed   :', len(compressed),
      binascii.hexlify(compressed))

decompressed = zlib.decompress(compressed)
print('decompressed :', len(decompressed), decompressed)

The compress() and decompress() functions each take a byte sequence as an argument and return a byte sequence.
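Because both functions work on bytes, text has to be encoded before compression and decoded after decompression. The following is a minimal sketch of that round trip; the sample string and the choice of UTF-8 are assumptions made for illustration only.

```python
import zlib

text = 'zlib works on bytes, not str.'

# Passing a str directly is rejected with a TypeError.
try:
    zlib.compress(text)
except TypeError as err:
    print('TypeError:', err)

# Encode to bytes first, then decode after decompressing.
encoded = text.encode('utf-8')
compressed = zlib.compress(encoded)
restored = zlib.decompress(compressed).decode('utf-8')

print('round trip ok:', restored == text)
```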

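As the first example shows, the compressed version of a small amount of data can be larger than the uncompressed version. The exact results depend on the input data, but it is interesting to observe the compression overhead for small data sets.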

import zlib

original_data = b'this is the original text.'

template = '{:>15}  {:>15}'
print(template.format('len(data)', 'len(compressed)'))
print(template.format('-' * 15, '-' * 15))

for i in range(5):
    data = original_data * i
    compressed = zlib.compress(data)
    highlight = '*' if len(data) < len(compressed) else ''
    print(template.format(len(data), len(compressed)), highlight)

The * in the output marks the lines where the compressed data takes up more memory than the uncompressed version.
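Part of that overhead comes from the framing of the zlib stream itself. As a rough sketch, assuming the stream layout described in RFC 1950 (a two-byte header, then the deflate data, then a four-byte big-endian Adler-32 of the uncompressed data), the framing can be inspected like this:

```python
import struct
import zlib

data = b'this is the original text.'
compressed = zlib.compress(data)

# The low four bits of the first header byte name the compression
# method; 8 means deflate.
method = compressed[0] & 0x0F

# The last four bytes hold the Adler-32 checksum of the
# uncompressed data, stored big-endian.
(trailer,) = struct.unpack('>I', compressed[-4:])

print('method                 :', method)
print('trailer matches adler32:', trailer == zlib.adler32(data))
print('empty input compresses to', len(zlib.compress(b'')), 'bytes')
```

Even an empty input therefore produces a few bytes of output, which is why very short inputs grow when compressed.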

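zlib supports several compression levels, allowing a trade-off between computational cost and the amount of space saved. The default level, zlib.Z_DEFAULT_COMPRESSION, is -1, which corresponds to a hard-coded value that represents a compromise between performance and compression ratio. This currently corresponds to level 6.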

import zlib

input_data = b'some repeated text.\n' * 1024
template = '{:>5}  {:>5}'

print(template.format('level', 'size'))
print(template.format('-----', '----'))

for i in range(0, 10):
    data = zlib.compress(input_data, i)
    print(template.format(i, len(data)))

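A level of 0 means no compression at all. A level of 9 requires the most computation and typically produces the smallest output. As this example shows, several compression levels can yield the same size reduction for a given input.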
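The computational side of the trade-off can be observed by timing a few levels. This is only a sketch: the labels in the `levels` dictionary are made up for this example, and the timings depend entirely on the machine and the input, so no particular numbers should be expected.

```python
import time
import zlib

input_data = b'some repeated text.\n' * 1024

# Hypothetical labels mapped to levels defined by the zlib module.
levels = {
    'none': 0,
    'best speed': zlib.Z_BEST_SPEED,
    'default': zlib.Z_DEFAULT_COMPRESSION,
    'best compression': zlib.Z_BEST_COMPRESSION,
}

sizes = {}
for name, level in levels.items():
    start = time.perf_counter()
    compressed = zlib.compress(input_data, level)
    elapsed = time.perf_counter() - start
    sizes[name] = len(compressed)
    print('{:>16} (level {:>2}): {:>6} bytes in {:.6f}s'.format(
        name, level, len(compressed), elapsed))
```

With level 0 the output is slightly larger than the input, because the data is stored uncompressed inside the zlib framing.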

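1.2 Incremental Compression and Decompression

The in-memory approach has drawbacks, chiefly that the system needs enough memory to hold both the uncompressed and compressed versions at the same time, which makes it impractical for many real-world use cases. The alternative is to use Compress and Decompress objects to process the data incrementally, so that the entire data set does not have to fit in memory.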

import zlib
import binascii

compressor = zlib.compressobj(1)

with open('lorem.txt', 'rb') as input_file:
    while True:
        # Read the file in small blocks to show the buffering
        block = input_file.read(64)
        if not block:
            break
        compressed = compressor.compress(block)
        if compressed:
            print('compressed: {}'.format(
                binascii.hexlify(compressed)))
        else:
            print('buffering...')
    # Force out whatever the compressor is still holding
    remaining = compressor.flush()
    print('flushed: {}'.format(binascii.hexlify(remaining)))

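This example reads small blocks of data from a plain-text file and passes each one to compress(). The compressor maintains an internal buffer of compressed data. Because the compression algorithm depends on checksums and minimum block sizes, the compressor may not be ready to return data each time it receives more input. If it does not have a complete compressed block ready, it returns an empty byte string. Once all of the data has been fed in, flush() forces the compressor to close the final block and return the rest of the compressed data.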
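The decompression side works the same way. The sketch below is self-contained: instead of lorem.txt it uses an in-memory io.BytesIO source, and the 64-byte and 16-byte block sizes are arbitrary choices for illustration.

```python
import io
import zlib

# An in-memory stand-in for a file on disk.
source = io.BytesIO(b'some repeated text.\n' * 512)

# Compress in 64-byte blocks.
compressor = zlib.compressobj(1)
pieces = []
while True:
    block = source.read(64)
    if not block:
        break
    # compress() may return b'' while it is still buffering.
    pieces.append(compressor.compress(block))
pieces.append(compressor.flush())
stream = b''.join(pieces)

# Decompress the stream again in 16-byte slices.
decompressor = zlib.decompressobj()
restored = bytearray()
for start in range(0, len(stream), 16):
    restored += decompressor.decompress(stream[start:start + 16])
restored += decompressor.flush()

print('stream size  :', len(stream))
print('restored size:', len(restored))
print('matches      :', bytes(restored) == source.getvalue())
```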

1.3 Mixed Content Streams

The Decompress class returned by decompressobj() can also be used when compressed and uncompressed data are mixed together.

import zlib

lorem = open('lorem.txt', 'rb').read()
compressed = zlib.compress(lorem)
combined = compressed + lorem

decompressor = zlib.decompressobj()
decompressed = decompressor.decompress(combined)

decompressed_matches = decompressed == lorem
print('decompressed matches lorem:', decompressed_matches)

unused_matches = decompressor.unused_data == lorem
print('unused data matches lorem:', unused_matches)

After all of the compressed data has been decompressed, the unused_data attribute contains any data that was not used.
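The same behavior can be reproduced without a file on disk. In this sketch the `payload` and `trailer` byte strings are made-up sample data; the eof attribute of the Decompress object reports whether the end of the compressed stream was reached.

```python
import zlib

payload = b'compressed part of the stream'
trailer = b'plain bytes that follow the compressed stream'
combined = zlib.compress(payload) + trailer

decompressor = zlib.decompressobj()
decompressed = decompressor.decompress(combined)

print('payload recovered :', decompressed == payload)
print('end of stream seen:', decompressor.eof)
print('trailer left over :', decompressor.unused_data == trailer)
```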

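1.4 Checksums

In addition to the compression and decompression functions, zlib includes two functions for computing checksums of data: adler32() and crc32(). Neither checksum should be considered cryptographically secure; they are intended only for verifying data integrity.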

import zlib

data = open('lorem.txt', 'rb').read()

cksum = zlib.adler32(data)
print('adler32: {:12d}'.format(cksum))
print('       : {:12d}'.format(zlib.adler32(data, cksum)))

cksum = zlib.crc32(data)
print('crc-32: {:12d}'.format(cksum))
print('       : {:12d}'.format(zlib.crc32(data, cksum)))

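Both functions take the same arguments: a byte string containing the data and an optional value to use as the starting point for the checksum. In Python 3 they return a 32-bit unsigned integer, which can be passed back as the starting value on subsequent calls to produce a running checksum.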
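A running checksum is useful when the data arrives in pieces. The following sketch splits a sample byte string into 100-byte chunks (an arbitrary size) and checks that feeding the chunks one at a time gives the same result as checksumming everything at once. The starting values 0 for crc32() and 1 for adler32() are the documented defaults.

```python
import zlib

data = b'some repeated text.\n' * 1024
chunks = [data[i:i + 100] for i in range(0, len(data), 100)]

crc = 0    # default starting value for crc32()
adler = 1  # default starting value for adler32()
for chunk in chunks:
    # Feed the previous result back in as the starting point.
    crc = zlib.crc32(chunk, crc)
    adler = zlib.adler32(chunk, adler)

print('crc32 matches  :', crc == zlib.crc32(data))
print('adler32 matches:', adler == zlib.adler32(data))
```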

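1.5 Compressing Network Data

The server in the next listing uses the stream compressor to respond to requests consisting of file names by writing a compressed version of the file to the socket used to communicate with the client.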

import zlib
import logging
import socketserver
import binascii

BLOCK_SIZE = 64


class ZlibRequestHandler(socketserver.BaseRequestHandler):

    logger = logging.getLogger('server')

    def handle(self):
        compressor = zlib.compressobj(1)

        # Find out what file the client wants
        filename = self.request.recv(1024).decode('utf-8')
        self.logger.debug('client asked for: %r', filename)

        # Send chunks of the file as they are compressed
        with open(filename, 'rb') as input_file:
            while True:
                block = input_file.read(BLOCK_SIZE)
                if not block:
                    break
                self.logger.debug('raw %r', block)
                compressed = compressor.compress(block)
                if compressed:
                    self.logger.debug(
                        'sending %r',
                        binascii.hexlify(compressed))
                    self.request.sendall(compressed)
                else:
                    self.logger.debug('buffering')

        # Send any data being buffered by the compressor
        remaining = compressor.flush()
        while remaining:
            to_send = remaining[:BLOCK_SIZE]
            remaining = remaining[BLOCK_SIZE:]
            self.logger.debug('flushing %r',
                              binascii.hexlify(to_send))
            self.request.sendall(to_send)
        return


if __name__ == '__main__':
    import socket
    import threading
    from io import BytesIO

    logging.basicConfig(
        level=logging.DEBUG,
        format='%(name)s: %(message)s',
    )
    logger = logging.getLogger('client')

    # Set up a server, running in a separate thread
    address = ('localhost', 0)  # let the kernel assign a port
    server = socketserver.TCPServer(address, ZlibRequestHandler)
    ip, port = server.server_address  # what port was assigned?

    t = threading.Thread(target=server.serve_forever)
    t.daemon = True
    t.start()

    # Connect to the server as a client
    logger.info('contacting server on %s:%s', ip, port)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((ip, port))

    # Ask for a file
    requested_file = 'lorem.txt'
    logger.debug('sending filename: %r', requested_file)
    s.sendall(requested_file.encode('utf-8'))

    # Receive a response
    buffer = BytesIO()
    decompressor = zlib.decompressobj()
    while True:
        response = s.recv(BLOCK_SIZE)
        if not response:
            break
        logger.debug('read %r', binascii.hexlify(response))

        # Include any unconsumed data when
        # feeding the decompressor.
        to_decompress = decompressor.unconsumed_tail + response
        while to_decompress:
            decompressed = decompressor.decompress(to_decompress)
            if decompressed:
                logger.debug('decompressed %r', decompressed)
                buffer.write(decompressed)
                # Look for unconsumed data due to buffer overflow
                to_decompress = decompressor.unconsumed_tail
            else:
                logger.debug('buffering')
                to_decompress = None

    # Deal with data remaining inside the decompressor buffer
    remainder = decompressor.flush()
    if remainder:
        logger.debug('flushed %r', remainder)
        buffer.write(remainder)

    full_response = buffer.getvalue()
    lorem = open('lorem.txt', 'rb').read()
    logger.debug('response matches file contents: %s',
                 full_response == lorem)

    # Clean up
    s.close()
    server.shutdown()
    server.server_close()

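The listing is artificially split into small chunks to illustrate the buffering that occurs when data passed to compress() or decompress() does not result in a complete block of compressed or uncompressed output.

The client connects to the socket and requests a file. It then loops, receiving blocks of compressed data. Because a block may not contain enough information to be decompressed completely, any remainder left over from the data received earlier is combined with the new data and passed to the decompressor. As the data is decompressed, it is appended to a buffer, which is compared against the file contents at the end of the processing loop.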
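The unconsumed_tail attribute is only populated when decompress() is called with a max_length argument that limits how much output a single call may return. The sketch below shows that pattern in isolation, without a network connection; the 256-byte limit and the sample data are assumptions chosen for illustration.

```python
import zlib

original = b'some repeated text.\n' * 1024
compressed = zlib.compress(original)

decompressor = zlib.decompressobj()
output = bytearray()
largest_chunk = 0

to_decompress = compressed
while to_decompress:
    # Return at most 256 bytes of output per call.
    chunk = decompressor.decompress(to_decompress, 256)
    largest_chunk = max(largest_chunk, len(chunk))
    output += chunk
    # Input that could not be processed yet is kept here and
    # must be passed back in on the next call.
    to_decompress = decompressor.unconsumed_tail

# Collect anything still held inside the decompressor.
output += decompressor.flush()

print('largest chunk:', largest_chunk)
print('matches      :', bytes(output) == original)
```

Capping the output per call keeps memory use bounded even when a small compressed input expands to a very large result.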
