Python生成Pdf报告

生成报告这个功能应该也有很多办法。因为我不会前端相关的开发，所以只能尝试用python来生成pdf报告。在实际使用的过程中发现现有的操作pdf的库体验都不是很好。所以改变策略尝试两步来实现pdf生成：

1.通过jinja2库操作doc文档根据模板生成相关的word文档

2.通过openoffice或者其他的命令行工具生成pdf，这是常规做法。还有另外一个办法就是通过oss的pdf转换功能生成pdf，这么做的好处是生成完了直接可以顺便生成一个下载链接，可以直接使用。

下面就是具体实现了：

Jinja2 是一个现代的，设计者友好的，仿照 Django 模板的 Python 模板语言。它速度快，被广泛使用，并且提供了可选的沙箱模板执行环境保证安全

以上是jinja2的介绍，官网地址：https://docs.jinkan.org/docs/jinja2/

具体的详细用法可以参考：https://www.cnblogs.com/feifeifeisir/p/14701066.html

1.首先准备一个模板文档，内容如下：

语法为模板语言，具体文本内容（code为印章，需要提前准备一个印章图片）：

服务问题限期整改通知书

编号：202{{year}}-{{first_no}}-{{second_no}}

{{company_name}}(企业)：
经抽查，你公司服务的{{project_name}}项目，存在下列问题：
{{check_list}}

请你公司在{{days}}日内对上述抽查检查提出的问题举一反三、全面进行整改，并将整改情况及前后对比照片报管理服务中心。如果抽查复查时发现仍未整改或整改不力的情况，将对你公司进行扣信用分、直至黑名单等信用等级考核管理。



管理服务中心
{{code}} 



抽查人（签字）：{{checker_sign}}
接收人（签字）：{{confirm_sign}}

2.根据模板生成doc文档：

def generate_doc(report_number, company_name, project_name, check_list, days, checker_sign, confirm_sign):
    tpl = DocxTemplate('template.docx')
    doc_context = {'report_number':report_number,
                   'company_name': company_name,
                   'project_name': project_name,
                   'check_list': check_list,
                   'days': days,
                   'checker_sign': InlineImage(tpl, download_image(checker_sign), width=Mm(60), height=Mm(20)),
                   'confirm_sign': InlineImage(tpl, download_image(confirm_sign), width=Mm(60), height=Mm(20)),
                   'code': InlineImage(tpl, 'code.png', width=Mm(40), height=Mm(40)),
                   }
    jinja_env = jinja2.Environment(autoescape=True)
    tpl.render(doc_context, jinja_env)
    file_name = random_file_name('docx')
    if not os.path.exists('docx'):
        os.mkdir('docx')
    tpl.save(os.path.join('docx/') + file_name)
    return os.path.join('docx/') + file_name

3.上传oss以及进行格式转换：

def upload_file_to_oss(file_name, upload_path):
    with open(file_name, 'rb') as fileobj:
        # Seek方法用于指定从第1000个字节位置开始读写。上传时会从您指定的第1000个字节位置开始上传，直到文件结束。
        # fileobj.seek(1000, os.SEEK_SET)
        # # Tell方法用于返回当前位置。
        # current = fileobj.tell()
        # # 填写Object完整路径。Object完整路径中不能包含Bucket名称。
        dest_full_path = upload_path + '/' + file_name
        bucket.put_object(dest_full_path, fileobj)
    return dest_full_path


def convert_file_format(file_path, target_format='pdf'):
    createReq = CreateOfficeConversionTaskRequest()
    srcUri = "oss://ls-d/"+ file_path
    tgtUri = "oss://ls-d/reports/pdf/"
    tgtType = target_format
    createReq.set_Project("ossdocdefault")
    createReq.set_SrcUri(srcUri)
    createReq.set_TgtUri(tgtUri)
    createReq.set_TgtType(tgtType)
    pdf_file_prefix = file_path.split('/')[-1].split('.')[0]
    createReq.set_TgtFilePrefix(pdf_file_prefix)
    createReq.set_TgtFileSuffix('.'+target_format)
    # dest_pdf_path =
    response = client.do_action_with_exception(createReq)
    print(response)
    res = json.loads(response)
    taskId = res["TaskId"]
    getReq = GetOfficeConversionTaskRequest()
    getReq.set_Project("ossdocdefault")
    getReq.set_TaskId(taskId)

    period = 1
    timeout = 30
    start = time.time()
    pdf_file_url = ''
    while True:
        time.sleep(period)
        response = client.do_action_with_exception(getReq)
        print(response)
        status = json.loads(response)["Status"]
        if status == "Finished":  # 任务完成。
            print("Task finished.")
            pdf_file_url = 'http://ls-d.oss-cn-hangzhou.aliyuncs.com/reports/pdf/' + pdf_file_prefix+ '.'+target_format
            break
        if status == "Failed":  # 任务失败。
            print("Task failed.")
            break
        if time.time() - start > timeout:  # 任务超时。
            print("Task timeout.")
            break
    return pdf_file_url

4.最终效果如下：

完整代码：

from http.server import HTTPServer, BaseHTTPRequestHandler
import json
from urllib.parse import urlparse
import hashlib
import oss2
import os
import requests
import jinja2
from docxtpl import DocxTemplate
from docxtpl import InlineImage
from docx.shared import Mm, Inches, Pt
import uuid
from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.acs_exception.exceptions import ClientException
from aliyunsdkcore.acs_exception.exceptions import ServerException
from aliyunsdkimm.request.v20170906.CreateOfficeConversionTaskRequest import CreateOfficeConversionTaskRequest
from aliyunsdkimm.request.v20170906.GetOfficeConversionTaskRequest import GetOfficeConversionTaskRequest
import time

data = {'result': 'this is a test'}
host = ('0.0.0.0', 10010)


def get_md5(str):
    hl = hashlib.md5()
    hl.update(str.encode(encoding='utf-8'))
    return hl.hexdigest()


from aliyunsdkcore.client import AcsClient

# 阿里云账号AccessKey拥有所有API的访问权限，风险很高。强烈建议您创建并使用RAM用户进行API访问或日常运维，请登录RAM控制台创建RAM用户。
auth = oss2.Auth('cccc', 'dddd')
# yourEndpoint填写Bucket所在地域对应的Endpoint。以华东1（杭州）为例，Endpoint填写为https://oss-cn-hangzhou.aliyuncs.com。
endpoint = 'oss-cn-hangzhou.aliyuncs.com'
# 填写Bucket名称。
bucket = oss2.Bucket(auth, endpoint, 'ls-d')
client = AcsClient('aaaa', 'bbb', 'cn-hangzhou')


def upload_file_to_oss(file_name, upload_path):
    with open(file_name, 'rb') as fileobj:
        # Seek方法用于指定从第1000个字节位置开始读写。上传时会从您指定的第1000个字节位置开始上传，直到文件结束。
        # fileobj.seek(1000, os.SEEK_SET)
        # # Tell方法用于返回当前位置。
        # current = fileobj.tell()
        # # 填写Object完整路径。Object完整路径中不能包含Bucket名称。
        dest_full_path = upload_path + '/' + file_name
        bucket.put_object(dest_full_path, fileobj)
    return dest_full_path


def download_image(image_url):
    file_path = 'user_sign'
    if not os.path.exists(file_path):
        os.mkdir(file_path)
    file_name = os.path.join(file_path, random_file_name('jpg'))
    r = requests.get(image_url)
    print(file_name)
    with open(file_name, 'wb') as f:
        f.write(r.content)
    return file_name


def random_file_name(file_type):
    file_name = str(uuid.uuid4()) + '.' + file_type
    return file_name

def convert_file_format(file_path, target_format='pdf'):
    createReq = CreateOfficeConversionTaskRequest()
    srcUri = "oss://ls-d/"+ file_path
    tgtUri = "oss://ls-d/reports/pdf/"
    tgtType = target_format
    createReq.set_Project("ossdocdefault")
    createReq.set_SrcUri(srcUri)
    createReq.set_TgtUri(tgtUri)
    createReq.set_TgtType(tgtType)
    pdf_file_prefix = file_path.split('/')[-1].split('.')[0]
    createReq.set_TgtFilePrefix(pdf_file_prefix)
    createReq.set_TgtFileSuffix('.'+target_format)
    # dest_pdf_path =
    response = client.do_action_with_exception(createReq)
    print(response)
    res = json.loads(response)
    taskId = res["TaskId"]
    getReq = GetOfficeConversionTaskRequest()
    getReq.set_Project("ossdocdefault")
    getReq.set_TaskId(taskId)

    period = 1
    timeout = 30
    start = time.time()
    pdf_file_url = ''
    while True:
        time.sleep(period)
        response = client.do_action_with_exception(getReq)
        print(response)
        status = json.loads(response)["Status"]
        if status == "Finished":  # 任务完成。
            print("Task finished.")
            pdf_file_url = 'http://ls-d.oss-cn-hangzhou.aliyuncs.com/reports/pdf/' + pdf_file_prefix+ '.'+target_format
            break
        if status == "Failed":  # 任务失败。
            print("Task failed.")
            break
        if time.time() - start > timeout:  # 任务超时。
            print("Task timeout.")
            break
    return pdf_file_url


def generate_doc(report_number, company_name, project_name, check_list, days, checker_sign, confirm_sign):
    tpl = DocxTemplate('template.docx')
    doc_context = {'report_number':report_number,
                   'company_name': company_name,
                   'project_name': project_name,
                   'check_list': check_list,
                   'days': days,
                   'checker_sign': InlineImage(tpl, download_image(checker_sign), width=Mm(60), height=Mm(20)),
                   'confirm_sign': InlineImage(tpl, download_image(confirm_sign), width=Mm(60), height=Mm(20)),
                   'code': InlineImage(tpl, 'code.png', width=Mm(40), height=Mm(40)),
                   }
    jinja_env = jinja2.Environment(autoescape=True)
    tpl.render(doc_context, jinja_env)
    file_name = random_file_name('docx')
    if not os.path.exists('docx'):
        os.mkdir('docx')
    tpl.save(os.path.join('docx/') + file_name)
    return os.path.join('docx/') + file_name


class Resquest(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-type', 'application/json')
        self.end_headers()
        self.wfile.write(json.dumps(data).encode())

    def do_POST(self):
        path = self.path
        length = int(self.headers.get('content-length'))  # 获取除头部后的请求参数的长度
        print(length)
        datas = json.loads(self.rfile.read(length))  # 获取请求参数数据，请求数据为json字符串
        print(datas)
        if path == "/create_report/":
            models = datas

            report_number = models['report_number']
            company_name = models['company_name']
            project_name = models['project_name']
            check_list = models['check_list']
            days = models['days']
            checker_sign = models['checker_sign']
            confirm_sign = models['confirm_sign']
            try:
                auth_key = models['key']
                print(auth_key)
                if auth_key != 'l404h4n_r3p0rt':
                    
                    self.send_error(403, 'invalid data')
                else:
                    doc_path = generate_doc(report_number, company_name, project_name, check_list, days,
                                            checker_sign,
                                            confirm_sign)
                    print(doc_path)
                    doc_oss_path = upload_file_to_oss(doc_path, 'reports')

                    pdf_file_url = convert_file_format(doc_oss_path)

                    self.send_response(200)
                    self.send_header("Content-type", "text/html;charset=utf-8")
                    self.end_headers()
                    buf = {"status": 0,
                           "data": {
                               "pdf": pdf_file_url,
                               "image": ''}}  # 返回文件下载地址（点击下载地址是个get请求）
                    self.wfile.write(json.dumps(buf).encode())
            except Exception as e:
                print(e)
                self.send_error(403, 'invalid data key not found')

        else:
            self.send_response(200)
            self.send_header('Content-type', 'application/json')
            self.end_headers()
            self.wfile.write(json.dumps(data).encode())


if __name__ == '__main__':
    server = HTTPServer(host, Resquest)
    print("Starting server, listen at: %s:%s" % host)
    server.serve_forever()

提供web服务如果基于django可能会出现模板冲突，简单的办法就是直接基于httpserver起一个服务。

☆版权☆

* 网站名称：obaby@mars
* 网址：https://h4ck.org.cn/
* 个性：https://oba.by/
* 本文标题：《Python生成Pdf报告》
* 本文链接：https://h4ck.org.cn/2023/03/11512
* 短链接：https://oba.by/?p=11512
* 转载文章请标明文章来源，原文标题以及原文链接。请遵从《署名-非商业性使用-相同方式共享 2.5 中国大陆 (CC BY-NC-SA 2.5 CN) 》许可协议。

4 comments

TeacherDu说道：

2023年3月16日 14:31

Microsoft Edge 110 Windows 10 中国中国移动
这个好这个好，收藏一下！

回复
叶小姐说道：

2023年3月17日 13:20

Firefox 110 Windows 10 美国
诶诶诶!!! 图片效果真么”逼真”吗?! 完全不像生成的诶!!!!

姐姐, 说出来你可能不信. 这三天我看入门视频, 还停留在 ” IDE编辑器 ” 上面, 嘻嘻.

‘PyCharm Community ‘ 和 ‘VScode’ 来回纠结好多次! 还是习惯 ‘VScode’ , 就python环境等等我就研究了一天. 报错太多啦.

以至于 , 现在还没开始学代码, 还在 “python介绍 – > pyton未来规划 – > python官网下载 – > python安装 – > pthon编辑器使用”

不过昨天看到了 ‘ import ‘ 倒入作用! 其他就不知道了, 嘻嘻. 等以后在来看看吧. 现在什么也卡不懂.

回复
1. obaby说道：
  
  2023年3月17日 15:12
  
  Google Chrome 102 Mac OS X 10.15 中国–山东–青岛联通
  嗯嗯。其实呢，如果想快速搭建环境还是使用比较成熟的集成环境，可以减少自己配置的部分。
  python的话pycharm姐姐觉得还是最简单的环境，不用配置很多的东西，基本安装完就可以用啦。
  目前姐姐用的pycharm。
  配置环境出现问题，很容易出现从入门到放弃的情绪。
  加油~~
  
  回复
Pingback：再谈 Python自动生成 pdf 文件 – obaby@mars