I use this blog as a soap box to preach (ahem... to talk :-) about subjects that interest me.

Wednesday, July 30, 2014

The struggle of making an EPUB on the Mac

EPUB (Electronic PUBlication) is an e-book standard by the International Digital Publishing Forum (IDPF).  Most significantly, Apple and an increasing number of vendors have adopted it for their e-readers.

The latest version of the standard is 3.0.1, but be warned: it is not easy to understand.

In essence, an EPUB consists of web pages plus some files that tell the e-reader how the pages are organised:
  • The file named mimetype contains the string application/epub+zip.
  • The folder named META-INF contains the XML file container.xml, which tells the e-reader where to find the opf file described below.
  • XHTML (i.e., HTML conforming to the XML standard) documents contain all the text, with links to images and media objects.
  • A file with extension opf (which stands for Open Packaging Format) defines how the various documents fit together.
  • An XHTML document defines how the user can navigate through the e-book.
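
To make the list above concrete, here is a minimal sketch of the fixed parts of that layout, built in a scratch folder.  The folder name demo and the location EPUB/package.opf are only choices made for illustration; the container.xml shown is the usual minimal form, which does little more than point the e-reader at the opf file.

```shell
# Create the skeleton of an EPUB in a scratch folder (the name "demo" is arbitrary).
mkdir -p demo/META-INF demo/EPUB

# mimetype must hold exactly this string; printf avoids a trailing newline.
printf 'application/epub+zip' > demo/mimetype

# container.xml tells the e-reader where the package (opf) file lives.
# The path EPUB/package.opf is an assumption; use whatever name you gave your opf file.
cat > demo/META-INF/container.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="EPUB/package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
EOF
```

The XHTML documents, the opf file and the navigation document would then go into demo/EPUB.
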
Once you have all your files in place, you zip them together, change the extension from zip to epub, and read the result as an e-book on your iPad.  The IDPF provides a validator that lets you check your document.  If you have done everything right, you are rewarded with the following message:



My test.epub was a trivial e-book, but it literally took me hours before I could work out how to put it together on the Mac.

Zipping a folder on the Mac is easy: all you need to do is select the folder and then click on the "Compress" entry of the "File" menu.  But if you do so, the folder itself will be zipped and you don't want that.  You want a single zip file with the content of the folder, without the folder itself.  In my case, I had a folder called test containing the file mimetype, the folder META-INF, and the folder EPUB with the rest (you can name most of the files as you like).

The Mac OS is Unix-based.  As such, it includes the almost universally present zip command.  But it took me a while to make a test.zip (then renamed test.epub) that would pass IDPF's validator.  After changing to the test folder where all the e-book files were, I typed the following commands:

Giulios-Mac:test giulio$ zip test -X -0 mimetype

  adding: mimetype (stored 0%)
This first command created the file test.zip containing mimetype and nothing else.  The -X option ensures that no extra file attributes are stored with the entry, and -0 that the file remains uncompressed.  In this way, you satisfy the EPUB requirement that mimetype be the first file in the package, naked, and uncompressed.  If you zip everything at the same time or without the options, the validation will fail.

Giulios-Mac:test giulio$ zip -r test * -u -n zip
  adding: EPUB/ (stored 0%)
  adding: EPUB/.DS_Store (deflated 95%)
  adding: EPUB/main.html (deflated 79%)
  adding: EPUB/nav.html (deflated 39%)
  adding: EPUB/package.opf (deflated 51%)
  adding: EPUB/util/ (stored 0%)
  adding: EPUB/util/ebook.css (deflated 71%)
  adding: META-INF/ (stored 0%)
  adding: META-INF/container.xml (deflated 34%)
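This second command adds to test.zip the rest of the e-book (identified by the asterisk).  The -u option specifies that it is an update, so zip only adds files that are new or have changed and leaves the mimetype entry already in the archive alone.  The -n zip option tells zip to store, rather than compress, any file whose name ends in zip.  I added it because test.zip is inside test/, although as far as I can tell zip never adds an archive to itself, so the option is probably unnecessary.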

As you can see, my e-book only included one XHTML document (main.html) and a style sheet (ebook.css), with no images.  I have named my XHTML files with the extension .html because I found it easier to work with, and the extension doesn't seem to matter.  Also notice that the folder EPUB/ contains a file named .DS_Store.  Mac OS freely sprinkles these files all over the place to store folder properties.  They are a hidden nuisance that causes problems whenever you access Mac folders outside the Mac universe.  But you can remove them from the archive with the following command:

Giulios-Mac:test giulio$ find . -name ".DS_Store" -depth -exec zip test -d {} \;
deleting: EPUB/.DS_Store
It searches the current folder and all subfolders for files named .DS_Store.  Whenever it finds one, it passes its location on to the zip command that removes it from test.zip.
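
The -d option of zip also accepts wildcard patterns, so the same clean-up can probably be done without find.  The following is a minimal sketch of that idea; the function name strip_ds_store is made up for illustration, and the pattern is quoted so that zip, not the shell, expands it.

```shell
# strip_ds_store ARCHIVE
# Removes every entry whose name ends in .DS_Store from ARCHIVE.
strip_ds_store() {
  status=0
  # The quoted pattern is matched by zip against the entry names, in any folder.
  zip -q -d "$1" '*.DS_Store' || status=$?
  # zip reports exit status 12 ("nothing to do") when no entry matches;
  # for a clean-up step that is treated here as success.
  [ "$status" -eq 0 ] || [ "$status" -eq 12 ]
}
```

With the archive from this post, the call would be strip_ds_store test.zip.
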

Finally, the following command showed that all .DS_Store files had been removed:

Giulios-Mac:test giulio$ zipinfo test.zip
Archive:  test.zip   3176 bytes   9 files
-rwxr-xr-x  3.0 unx       20 b- stor 30-Jul-14 11:29 mimetype
drwxr-xr-x  3.0 unx        0 bx stor 30-Jul-14 16:47 EPUB/
-rwxr-xr-x  3.0 unx     1694 tx defN 30-Jul-14 15:03 EPUB/main.html
-rwxr-xr-x  3.0 unx      461 tx defN 30-Jul-14 15:05 EPUB/nav.html
-rwxr-xr-x  3.0 unx      836 tx defN 30-Jul-14 16:03 EPUB/package.opf
drwxr-xr-x  3.0 unx        0 bx stor 30-Jul-14 14:02 EPUB/util/
-rwxr-xr-x  3.0 unx     1996 tx defN 30-Jul-14 15:57 EPUB/util/ebook.css
drwxr-xr-x  3.0 unx        0 bx stor 30-Jul-14 11:29 META-INF/
-rwxr-xr-x  3.0 unx      259 tx defN 30-Jul-14 14:54 META-INF/container.xml
9 files, 5266 bytes uncompressed, 1822 bytes compressed:  65.4%
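
A quick way to double-check the requirement on mimetype without the validator is to look at the raw bytes of the archive.  In a zip file, each entry starts with a 30-byte header, followed by the entry name and, when there are no extra attributes and no compression, the content itself.  The sketch below relies on that layout; the function name check_mimetype_first is made up for illustration.

```shell
# check_mimetype_first ARCHIVE
# Succeeds if the first entry is named mimetype and is stored "naked":
# bytes 30 to 57 must then be the entry name directly followed by its content.
check_mimetype_first() {
  # Take the first 58 bytes, keep the last 28 of them (8 for the name, 20 for the content).
  [ "$(head -c 58 "$1" | tail -c 28)" = "mimetypeapplication/epub+zip" ]
}
```

For example, check_mimetype_first test.zip && echo "mimetype looks fine" prints the message only if the first entry is laid out as expected.
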

I tried to remove the bloody .DS_Store files before zipping, but without success.  I resorted to removing them from the zip file out of desperation, but it works just fine.
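
For what it is worth, zip can also be told to skip the .DS_Store files in the first place with its -x (exclude) option, which should make the clean-up step unnecessary.  The following is only a sketch of how the whole recipe could be wrapped in a shell function.  The name build_epub and its two arguments are made up for illustration, and it assumes that the output file is written outside the source folder.

```shell
# build_epub SRC_DIR OUT_FILE
# Sketch only: packs SRC_DIR (holding mimetype, META-INF/ and the content folder)
# into OUT_FILE, following the two-step recipe from this post.
# The body runs in a subshell, so the cd does not affect the caller.
build_epub() (
  src=$1
  out=$2
  # Make the output path absolute before changing folder.
  case $out in /*) ;; *) out=$PWD/$out ;; esac
  # Start from a fresh archive (this deletes any existing OUT_FILE).
  rm -f "$out"
  cd "$src" || exit 1
  # Step 1: mimetype first, stored (-0) and without extra attributes (-X).
  zip -q -X -0 "$out" mimetype || exit 1
  # Step 2: everything else, recursively, skipping mimetype and all .DS_Store files.
  zip -q -r -X "$out" . -x mimetype '*.DS_Store'
)
```

Called as build_epub test test.epub from the folder that contains test/, it should leave test.epub next to the folder.  No renaming is needed, because zip only appends .zip to archive names that have no extension.
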
