Generate PDF from HTML in NodeJS
I don’t do much JavaScript, to be honest. When all I’m doing is web development I usually go with Python on the backend and Elm on the client side, but this time I needed to generate a PDF in NodeJS without any transpiler involved, so JavaScript it was. I found out about Puppeteer, a library that provides a high-level API to control headless Chrome; in essence, it lets you drive a web browser from code. In this post we use Nunjucks to render a web page in HTML and Puppeteer to save that page as a PDF.
First let’s add the needed packages to our dependencies:
$ npm install puppeteer --save
That will download a recent version of Chromium that is guaranteed to work with the library, so expect around 170MB ~ 280MB of disk space to be used. You can also point Puppeteer to an existing Chrome installation using environment variables. The first step is to generate our desired page in HTML. For that I used Nunjucks, a template engine very similar to Python’s Jinja2, which can be installed like this:
$ npm install nunjucks --save
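As an aside on the earlier note about reusing an existing Chrome installation: besides environment variables, puppeteer.launch() also accepts an executablePath option. Here is a minimal sketch of building that options object. The helper name launchOptionsFromEnv and the variable name CHROME_PATH are made up for illustration; they are not part of Puppeteer.

```javascript
// Hypothetical helper: builds the options object for puppeteer.launch().
// CHROME_PATH is an illustrative variable name chosen for this sketch.
function launchOptionsFromEnv(env = process.env) {
  const options = {};
  if (env.CHROME_PATH) {
    // executablePath tells Puppeteer which browser binary to start
    options.executablePath = env.CHROME_PATH;
  }
  return options;
}

// Intended usage (inside an async function):
//   const browser = await puppeteer.launch(launchOptionsFromEnv());
```

When the variable is not set the helper returns an empty object, so Puppeteer falls back to the Chromium it downloaded.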
Let’s create the HTML page. For that we need a template; be sure to read the docs to know what syntax is allowed in templates. In my experience it’s very similar to Jinja2. Let’s create a simple template:
Hello {{ user.name }}, how's life in {{ user['country'] }}?
If we save this in templates/template.html, we can render it from a NodeJS script like this:
const nunjucks = require('nunjucks');
let htmlContent = nunjucks.render(
  'templates/template.html',
  { user: { name: 'John Doe', country: 'Brazil' } }
);
Notice that render takes as its second parameter an object with the data passed to the template. Now let’s use Puppeteer to turn this HTML into a PDF. The steps are: launch a browser, open a new page, set the contents of the page to the HTML we want to convert and finally save this page as a PDF. The browser needs to be closed at the end of this operation:
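const puppeteer = require('puppeteer');
// `await` only works inside an async function in a script like this one;
// the complete example further down shows the wrapper.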
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setContent(htmlContent);
const pdfContents = await page.pdf({ format: 'A4' });
await browser.close();
At this point pdfContents is the binary representation of the PDF and can be saved to disk using the NodeJS file system API:
const fs = require('fs');
fs.writeFile('pdfs/example.pdf', pdfContents, (err) => {
  if (err) {
    // Couldn't save PDF to disk
    console.error(err);
  }
});
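One thing to watch out for: fs.writeFile fails if the target directory (pdfs here) does not exist yet. A minimal sketch of a promise-based alternative that creates the directory first; savePdf is a hypothetical helper name, not something provided by NodeJS or Puppeteer.

```javascript
const fsPromises = require('fs').promises;
const path = require('path');

// Hypothetical helper: writes `contents` to `filePath`,
// creating any missing parent directories first.
async function savePdf(filePath, contents) {
  await fsPromises.mkdir(path.dirname(filePath), { recursive: true });
  await fsPromises.writeFile(filePath, contents);
  return filePath;
}

// Intended usage (inside an async function):
//   await savePdf('pdfs/example.pdf', pdfContents);
```

Because the helper returns a promise, errors can be handled with try/catch around the await instead of an error-first callback.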
Notice that the PDF generation code is asynchronous, so you need to place it inside an async function. If you don’t use async/await, Puppeteer hands you promises and you’ll need to handle them with .then() callbacks. I find it cleaner and clearer to use async/await; Python 3 users should also be comfortable with this style of programming.
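To make the difference concrete, here is a small sketch of the same logic written in both styles. fakePdf is a made-up stand-in for an asynchronous call such as page.pdf(), so the snippet runs without a browser.

```javascript
// Hypothetical stand-in for an asynchronous call such as page.pdf():
// it returns a promise that resolves a little later.
function fakePdf(label) {
  return new Promise((resolve) => {
    setTimeout(() => resolve(`pdf for ${label}`), 10);
  });
}

// Promise style: chain .then() to receive the result.
function withThen(label) {
  return fakePdf(label).then((result) => result.toUpperCase());
}

// async/await style: same behaviour, written top to bottom.
async function withAwait(label) {
  const result = await fakePdf(label);
  return result.toUpperCase();
}
```

Both functions return a promise for the same value; the second one just reads like ordinary sequential code.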
Lastly, when generating the PDF you can pass the path option so Puppeteer saves the PDF to disk itself. This is handy if you don’t need to manipulate the binary contents of the PDF:
const nunjucks = require('nunjucks');
const puppeteer = require('puppeteer');
async function writePDF(contents) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setContent(contents);
  await page.pdf({ path: 'pdfs/example.pdf', format: 'A4' });
  await browser.close();
}

(async () => {
  let htmlContent = nunjucks.render(
    'templates/template.html',
    { user: { name: 'John Doe', country: 'Brazil' } }
  );
  await writePDF(htmlContent);
  console.log('Done!');
})();
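If anything throws between launching the browser and closing it, the browser process is left running. A possible variation, as a sketch only, wraps the work in try/finally. The helper name htmlToPdf and the idea of passing the Puppeteer module in as a launcher argument are my own choices for illustration; passing it in also lets the function be exercised with a stub.

```javascript
// Hypothetical helper: `launcher` is anything with a Puppeteer-style
// launch() method, for example the object returned by require('puppeteer').
async function htmlToPdf(launcher, html, pdfOptions) {
  const browser = await launcher.launch();
  try {
    const page = await browser.newPage();
    await page.setContent(html);
    // Resolves with the PDF bytes; also writes to disk if `path` is set
    return await page.pdf(pdfOptions);
  } finally {
    // Runs on success and on failure, so the browser never lingers
    await browser.close();
  }
}

// Intended usage (inside an async function):
//   await htmlToPdf(require('puppeteer'), htmlContent,
//                   { path: 'pdfs/example.pdf', format: 'A4' });
```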
Conclusion
In this post we’ve seen how to use Nunjucks to generate an HTML page from a template with Jinja2-like syntax, then we used Puppeteer, a high-level API to control headless Chromium, to turn the generated HTML into a PDF. Keep in mind that PDF generation is asynchronous, so the calls that return promises need to be awaited inside an async function.