Generate PDFs and ePub with wkhtmltopdf and Calibre

In a previous post, I wrote about how I use GNU make to manage dependencies and generate HTML files from markdown source. In this post, I'll build on that and use the HTML to generate PDF and ePub files.

MultiMarkdown can generate PDF files using LaTeX. For some reason, I never got that to work. I tried on multiple Macs with clean installs of MultiMarkdown and a variety of LaTeX apps like mmd2tex, the LaTeX support files for MultiMarkdown, etc. I failed each time. Plus, I don't really want to re-learn LaTeX. And I hate typing in LaTeX.

I know HTML. I know CSS. So I sought out tools to help me leverage those skills.

Leveraging HTML

Obviously, I didn't want to load the HTML into the browser and use the OS to Save As PDF. That would be lame. I wanted to generate this stuff at the command line, in scripts, and automate things. I also had an intuition that WebKit would be exposed in more ways than simply being embedded in Chrome and Safari. So, I searched for "webkit pdf." What's the first link I found?

http://code.google.com/p/wkhtmltopdf/

Jackpot! This is an awesome set of commands that lets you feed in an HTML file and get a crisp PDF out. Click on the HTML and the resulting PDF:

http://primordia.com/upload/lorem_ipsum.html
http://primordia.com/upload/lorem_ipsum.pdf

The command-line to generate this is as follows:

[code wraplines="true"]
wkhtmltopdf --page-width 5.5in --page-height 8.5in --margin-top 0.25in \
  --margin-bottom 0.25in --margin-left 0.25in --margin-right 0.25in \
  --load-error-handling ignore lorem_ipsum.html lorem_ipsum.pdf
[/code]

QED.
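
To run the same conversion from a script over several files, the command can be wrapped in a small shell function. This is only a sketch: `html2pdf` is a name I made up for illustration, and it assumes `wkhtmltopdf` is on your path and that each input ends in `.html`. The flags are exactly the ones from the command above.

```shell
#!/bin/sh
# html2pdf (hypothetical helper): write a PDF next to each HTML file given.
# Assumes wkhtmltopdf is on the PATH and inputs are named *.html.
html2pdf() {
    for html in "$@"; do
        # lorem_ipsum.html -> lorem_ipsum.pdf
        pdf="${html%.html}.pdf"
        wkhtmltopdf --page-width 5.5in --page-height 8.5in \
            --margin-top 0.25in --margin-bottom 0.25in \
            --margin-left 0.25in --margin-right 0.25in \
            --load-error-handling ignore "$html" "$pdf" || return 1
    done
}
```

Calling `html2pdf lorem_ipsum.html` would then produce `lorem_ipsum.pdf` in the same directory, and the function stops at the first file that fails to convert.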

ePub

Generating ePub was also a goal. I googled all over and found a bunch of tools. The one that looked nicest was Calibre. Calibre is a complete e-book management app. It has crazy features:

  • Understands a gazillion e-book formats
  • Can import PDF books, HTML books, etc.
  • Can edit e-book metadata
  • Can add cover graphics
  • and lots more

But I just wanted an ePub file. I first tried to see if I could generate an ePub myself. The format is a zip file with HTML inside and a bunch of crazy metadata to create the table of contents, chapters, etc. I didn't want to do all of that work, so I'm glad I found Calibre. But as with PDFs, I didn't want to load the GUI and manually generate ePub files. So I inspected the .app package and inside I found SOLID GOLD. There are a bunch of command-line apps that the GUI uses to do all of the work. Now that's a programmer who knows what he or she is doing! I was able to add the ebook-convert program to my path and invoke it as follows:

[code wraplines="true"]

ebook-convert --no-default-epub-cover --base-font-size 12 --keep-ligatures \
  --margin-top 10.0 --margin-bottom 10 tmp.html tmp.epub

[/code]

Now I have lorem_ipsum.epub!
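
Adding the Calibre command-line tools to the path can be sketched as below. The directory is an assumption on my part: it is where current Calibre releases keep the tools when the app lives in /Applications, and older releases may have used a different folder inside the bundle, so check your own .app package first.

```shell
# Assumed location of Calibre's command-line tools inside the .app bundle;
# adjust if your copy of Calibre is installed elsewhere or laid out differently.
CALIBRE_BIN="/Applications/calibre.app/Contents/MacOS"

# Append the directory to PATH only if it is not already there.
case ":$PATH:" in
    *":$CALIBRE_BIN:"*) ;;                          # already present
    *) PATH="$PATH:$CALIBRE_BIN"; export PATH ;;
esac
```

Put this in your shell startup file and `ebook-convert` should resolve from any directory.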

I like to use the Adobe Digital Editions e-reader on the Mac because I don't need to add an ePub file to any kind of library. I just open an ePub file and a viewer displays it. There is no "import" process, which would be stupid since my ePub files change so often as I write. You can of course use Calibre, Kindle, and a zillion other apps to do the same thing (albeit with the extra import step).

Width Problems

But there is a very bad problem. The formatting is horrendous! Check this out:

[Screenshot: ePub text cut off at the right edge]

The whole thing is cut off at the right. Why? Because my HTML specifically set a width so it would be small like a real book. This was not really necessary, but it makes the HTML easy to read when your browser is maximized. If I hadn't done this, text would wrap to 100% of the width of the browser and lines would be too long to read comfortably. But e-readers don't want you to specify widths. Users have lots of different devices. Users play with font sizes and orientations, and nothing can be easily predicted. So, you want an HTML file that doesn't specify any width, like this:

http://primordia.com/upload/tmp.html

The offending code was in the CSS:

[code wraplines="true"]

body {
  width: 6in;
  margin-left: auto;
  margin-right: auto;
}

[/code]

So I just removed the width setting. Unfortunately, either Calibre or wkhtmltopdf doesn't respect multiple STYLE tags, so I could not simply override the width with a second stylesheet when generating ePub (the way CSS was designed!). I guess I should file a bug report. Anyway, I punted: I cloned the CSS and use the ePub version when I want to generate ePub. This is lame, but cpress handles it. At some point, I'll create a facility in cpress to merge CSS streams so I can have one master CSS and an ePub version that simply gets rid of the width. For now, two CSS files. Here is an image of the resulting ePub as viewed on my iPhone in iBooks Reader:

[Screenshot: the resulting ePub in iBooks Reader on the iPhone]

How fucking beautiful is that?!
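
Until cpress can merge CSS streams, one stopgap would be to derive the ePub stylesheet from the master one instead of maintaining a clone by hand. The sketch below is not what cpress does: `strip_width` is a hypothetical helper, and it assumes the CSS has one declaration per line, as in the snippet above.

```shell
# strip_width (hypothetical helper): read CSS on stdin and drop every line
# that is a bare "width:" declaration. Lines such as "max-width:" are kept
# because the pattern only matches when "width" starts the declaration.
strip_width() {
    sed '/^[[:space:]]*width[[:space:]]*:/d'
}
```

Something like `strip_width < book.css > book-epub.css` (both file names are made up) would regenerate the ePub version whenever the master changes. Note that it removes width declarations from every rule, not just `body`, so it only fits a stylesheet where that is what you want.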

I think that's all I'll say for now. I have lots more to share with you. In my next post, I'll talk about how I aggregate multiple markdown files into a single markdown file, how a table of contents gets auto-generated, and how I script the generation of my HTML, PDF, and ePub artifacts with crontab and sync everything with Dropbox.