December 5, 2018

Generate PDF from HTML in NodeJS

To be honest, I don’t write much JavaScript. For web development I usually go with Python on the backend and Elm on the client side, but this time I needed to generate a PDF in NodeJS without any transpiler involved, so JavaScript it was. I found out about Puppeteer, a library that provides a high-level API to control headless Chrome; in essence, you can drive a web browser from code. In this post we use Nunjucks to render a web page in HTML and Puppeteer to save that page as a PDF.

First, let’s add Puppeteer to our dependencies:

$ npm install puppeteer --save

That will download a recent version of Chromium that is guaranteed to work with the library, so expect around 170MB ~ 280MB of disk space to be used. You can also point Puppeteer to an existing Chrome installation using environment variables.

The first step is to generate our desired page in HTML. For that I used Nunjucks, a template engine very similar to Python’s Jinja2, which can be installed like this:

$ npm install nunjucks --save
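
Going back to the option of pointing Puppeteer at an existing Chrome installation: a minimal sketch, assuming the browser path is supplied through the PUPPETEER_EXECUTABLE_PATH environment variable and forwarded through the executablePath launch option. The helper name buildLaunchOptions is made up for this example, and the exact set of supported variables depends on your Puppeteer version, so check its docs.

```javascript
// Hypothetical helper: build the options object for puppeteer.launch().
// If PUPPETEER_EXECUTABLE_PATH is set, that browser is used instead of
// the Chromium build downloaded during npm install.
function buildLaunchOptions(env) {
  const options = { headless: true };
  if (env.PUPPETEER_EXECUTABLE_PATH) {
    options.executablePath = env.PUPPETEER_EXECUTABLE_PATH;
  }
  return options;
}

// Usage (inside an async function, with the puppeteer package installed):
//   const browser = await require('puppeteer').launch(buildLaunchOptions(process.env));
```

Keeping the option building in a plain function makes it easy to check without actually starting a browser.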

Next we need a template for the HTML page. Be sure to read the Nunjucks docs to learn what syntax is allowed in templates; in my experience it’s very similar to Jinja2. Let’s create a simple one:

Hello {{ user.name }}, how's life in {{ user['country'] }}?

If we save this in templates/template.html, we can render it from a NodeJS script like this:

const nunjucks = require('nunjucks');

let htmlContent = nunjucks.render(
  'templates/template.html',
  { user: { name: 'John Doe', country: 'Brazil' } }
);

Notice that render takes as its second parameter an object with the data passed to the template. Now let’s use Puppeteer to turn this HTML into a PDF. The steps are: launch a browser, open a new page, set the contents of the page to the HTML we want to convert, and finally save the page as a PDF. The browser needs to be closed at the end of this operation:

const puppeteer = require('puppeteer');

// Note: await only works inside an async function (see the full script at the end)
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.setContent(htmlContent);

const pdfContents = await page.pdf({ format: 'A4' });
await browser.close();

At this point pdfContents is the binary representation of the PDF and can be saved to disk using the NodeJS file system API:

const fs = require('fs');

fs.writeFile('pdfs/example.pdf', pdfContents, (err) => {
  if (err) {
    // Couldn't save PDF to disk
    console.log(err);
  }
});
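
One caveat: fs.writeFile does not create missing directories, so the call above fails if pdfs/ does not exist yet. A possible variation, sketched here with the promise-based fs API, creates the parent directory first. The function name savePDF is hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: write the PDF bytes to disk, creating the parent
// directory first so the write does not fail with ENOENT.
async function savePDF(filePath, pdfContents) {
  // recursive: true also makes this a no-op when the directory already exists
  await fs.promises.mkdir(path.dirname(filePath), { recursive: true });
  await fs.promises.writeFile(filePath, pdfContents);
  return filePath;
}

// Usage (inside an async function):
//   await savePDF('pdfs/example.pdf', pdfContents);
```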

Notice that the PDF generation code is asynchronous, so you need to place it in an async function. If you don’t use async/await, Puppeteer’s methods return promises that you’ll need to handle with .then() callbacks. I find async/await cleaner and clearer, and Python 3 users should also be comfortable with this style of programming.
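
For comparison, this is roughly what the same steps look like without async/await, chaining the returned promises with .then(). It is only a sketch: the name pdfWithPromises is made up, and the puppeteer module is passed in as an argument purely to keep the snippet self-contained.

```javascript
// Hypothetical helper: launch, render and close using promise chaining.
// `puppeteer` is the module returned by require('puppeteer').
function pdfWithPromises(puppeteer, html) {
  return puppeteer.launch().then((browser) =>
    browser
      .newPage()
      .then((page) =>
        // Set the HTML first, then ask the page for its PDF contents
        page.setContent(html).then(() => page.pdf({ format: 'A4' }))
      )
      // Close the browser, then hand the PDF contents to the caller
      .then((pdfContents) => browser.close().then(() => pdfContents))
  );
}

// Usage:
//   pdfWithPromises(require('puppeteer'), htmlContent).then((pdfContents) => { /* ... */ });
```

The nesting is what async/await flattens: each .then() here corresponds to one await in the earlier snippet.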

Lastly, when generating the PDF you can pass the path option so Puppeteer saves the PDF to disk directly. This is handy if you don’t need to manipulate the binary contents of the PDF:

const nunjucks = require('nunjucks');
const puppeteer = require('puppeteer');

async function writePDF(contents) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.setContent(contents);

  await page.pdf({path: 'pdfs/example.pdf', format: 'A4'});

  await browser.close();
}

(async () => {
  let htmlContent = nunjucks.render(
    'templates/template.html',
    { user: { name: 'John Doe', country: 'Brazil' } }
  );

  await writePDF(htmlContent);
  console.log('Done!');
})();
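
One thing the script above does not handle is failure: if setContent or pdf throws, browser.close() is never reached and the Chromium process can be left running. A possible variation, sketched below, closes the browser in a finally block. The name writePDFSafely is hypothetical, and passing the puppeteer module and the output path as parameters is an assumption made to keep the snippet self-contained.

```javascript
// Hypothetical variation of writePDF that always closes the browser.
// `puppeteer` is the module returned by require('puppeteer').
async function writePDFSafely(puppeteer, contents, outputPath) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setContent(contents);
    await page.pdf({ path: outputPath, format: 'A4' });
  } finally {
    // Runs on success and on error; the original error is still rethrown
    await browser.close();
  }
}

// Usage (inside an async function):
//   await writePDFSafely(require('puppeteer'), htmlContent, 'pdfs/example.pdf');
```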

Conclusion

In this post we’ve seen how to use Nunjucks to generate an HTML page from a template with Jinja2-like syntax, and then how to use Puppeteer, a high-level API to control headless Chromium, to turn the generated HTML into a PDF. Keep in mind that PDF generation is asynchronous: Puppeteer’s methods return promises, so the calls need to be awaited inside an async function.