Pacharapol Withayasakpunt
Mon, September 28, 2020

A reliable way to create PDFs from HTML/Markdown, with PDF-specific features

The approach rests on a few observations:

  • Don't simply convert one HTML file to one PDF file. Otherwise, you can never fully control page breaks.
  • HTML rendering is web-browser dependent. (For that reason, I am not sure about Pandoc.)
  • CSS is powerful, but are there exceptions it cannot handle?

Therefore, I suggest combining a web driver with a PDF library that can both read and modify PDFs.

Currently, the best choice of web driver is either Puppeteer or the Chrome DevTools Protocol used directly.
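As a rough sketch of the web-driver half, the function below renders a list of URLs to one PDF each. To keep it self-contained it does not import Puppeteer; instead it takes any object matching a small `PrintablePage` interface, which is a hypothetical name introduced here. The interface is meant to mirror the `goto` and `pdf` methods of a Puppeteer page, and the options shown (`waitUntil`, `format`, `printBackground`, `preferCSSPageSize`) are the ones I would expect to need, not a tested recipe.

```typescript
/**
 * The subset of a browser page that this sketch relies on.
 * Hypothetical interface; it is intended to be satisfied by a Puppeteer page.
 */
export interface PrintablePage {
  goto(url: string, options: { waitUntil: "networkidle0" }): Promise<unknown>;
  pdf(options: {
    format: "A4";
    printBackground: boolean;
    preferCSSPageSize: boolean;
  }): Promise<Uint8Array>;
}

/**
 * Render each URL to its own PDF, in order.
 * Rendering one document per source file means every file starts on a new
 * page once the parts are merged later.
 */
export async function renderToPdf(
  page: PrintablePage,
  urls: string[],
): Promise<Uint8Array[]> {
  const parts: Uint8Array[] = [];
  for (const url of urls) {
    // Wait until the network is idle so fonts and images have loaded.
    await page.goto(url, { waitUntil: "networkidle0" });
    parts.push(
      await page.pdf({
        format: "A4",
        printBackground: true,
        // Let an @page rule in the document's CSS override the paper format.
        preferCSSPageSize: true,
      }),
    );
  }
  return parts;
}
```

With Puppeteer installed, the page would come from `puppeteer.launch()` followed by `browser.newPage()`, and the browser should be closed once all parts are rendered.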

Additionally, it might be possible to distribute the PDF generator as a desktop app via Electron + puppeteer-in-electron.

https://stackoverflow.com/questions/58213258/how-to-use-puppeteer-core-with-electron

For the PDF library that can read and merge PDFs, the traditional choices are, I think, PDFtk (a binary) or PDFBox (Java); but I recently found pdf-lib, a JavaScript library:

https://github.com/Hopding/pdf-lib
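The read-and-merge step could look roughly like this. Again, nothing is imported: `PdfDoc` and `PdfFactory` are hypothetical interfaces introduced for this sketch, modeled on the pdf-lib methods `PDFDocument.create`, `PDFDocument.load`, `getPageIndices`, `copyPages`, `addPage`, and `save`. Whether pdf-lib's `PDFDocument` type-checks against them unchanged is an assumption I have not verified.

```typescript
/** Minimal document shape, modeled on a pdf-lib PDFDocument instance. */
export interface PdfDoc {
  getPageIndices(): number[];
  copyPages(source: PdfDoc, indices: number[]): Promise<unknown[]>;
  addPage(page: unknown): unknown;
  save(): Promise<Uint8Array>;
}

/** Minimal factory shape, modeled on the static side of PDFDocument. */
export interface PdfFactory {
  create(): Promise<PdfDoc>;
  load(bytes: Uint8Array): Promise<PdfDoc>;
}

/**
 * Concatenate several PDFs into one, keeping the order of `parts`.
 * Pages are copied into the target document before being appended,
 * because a page belongs to the document it was loaded from.
 */
export async function mergePdfs(
  factory: PdfFactory,
  parts: Uint8Array[],
): Promise<Uint8Array> {
  const merged = await factory.create();
  for (const bytes of parts) {
    const source = await factory.load(bytes);
    const pages = await merged.copyPages(source, source.getPageIndices());
    for (const page of pages) {
      merged.addPage(page);
    }
  }
  return merged.save();
}
```

With pdf-lib installed, the call would presumably be `mergePdfs(PDFDocument, parts)`. The same place is where PDF-specific features, such as metadata or extra pages, could be added before saving.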

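About CSS: yes, CSS can also take the page margins into account. For example, this centers the content within the printed page.

  body {
    /* Make the body fill the viewport (the page, when printing). */
    position: fixed;
    width: 100vw;
    height: 100vh;
    /* Center the content horizontally and vertically. */
    display: flex;
    align-items: center;
    justify-content: center;
  }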

This is my attempt so far.

https://github.com/patarapolw/make-pdf

So, the answer to the question is: no, do not convert a single HTML or Markdown file to a single PDF file; instead, render the files within a folder and combine them. Also,

  • Running a web server might be better than using the file:// protocol with relative paths.
  • The choice of web browser might affect the result.
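To illustrate the web-server point, here is a minimal sketch of a static file server built only on Node's standard library, so that pages and their relative assets can be loaded over http:// instead of file://. The names `resolveInRoot` and `serveFolder` and the short MIME table are made up for this example; a real setup would likely use an existing static-server package.

```typescript
import { createServer } from "node:http";
import { readFile } from "node:fs/promises";
import { extname, resolve, sep } from "node:path";

// A deliberately small table; unknown extensions fall back to a binary type.
const MIME_TYPES: Record<string, string> = {
  ".html": "text/html; charset=utf-8",
  ".css": "text/css; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".png": "image/png",
  ".svg": "image/svg+xml",
};

/** Map a request URL to a file inside `root`, or null if it would escape. */
export function resolveInRoot(root: string, requestUrl: string): string | null {
  const rootAbs = resolve(root);
  let pathname: string;
  try {
    // Drops the query string; throws on malformed URLs or percent-encoding.
    pathname = decodeURIComponent(new URL(requestUrl, "http://localhost").pathname);
  } catch {
    return null;
  }
  const target = resolve(rootAbs, "." + pathname);
  const inside = target === rootAbs || target.startsWith(rootAbs + sep);
  return inside ? target : null;
}

/** Create (but do not start) a server that serves files from `root`. */
export function serveFolder(root: string) {
  return createServer(async (req, res) => {
    const file = resolveInRoot(root, req.url ?? "/");
    if (file === null) {
      res.writeHead(403).end();
      return;
    }
    try {
      const body = await readFile(file);
      const type = MIME_TYPES[extname(file)] ?? "application/octet-stream";
      res.writeHead(200, { "Content-Type": type });
      res.end(body);
    } catch {
      // Missing files and directories both end up here.
      res.writeHead(404).end();
    }
  });
}
```

Calling `serveFolder("./docs").listen(0)` would start it on a free port; the web driver can then open `http://localhost:<port>/some-page.html`, and relative links to stylesheets and images resolve as they would on a real site.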

Finally, consider alternatives to PDF that allow easy editing, perhaps ODT or DOCX.