I have a bunch of ebooks, mostly tech, purchased for example from Humble Bundle or directly from authors who distribute their work DRM-free.
The epub format is basically just a ZIP file with structured content like XHTML, CSS and image files. The images are frequently stored as PNG.
Some publishers forget or don’t know how to optimize the PNG images losslessly. If the images aren’t optimized, the ZIP compression of the final epub file usually shrinks the data significantly, which is good for keeping the target file small.
But this still means that the files within the epub container are not optimally compressed themselves.
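To get a feeling for which images are affected, here is a minimal Python sketch using only the standard zipfile module. It relies on a heuristic that is my assumption, not something calibre does: a PNG that the ZIP layer can still shrink a lot was probably not optimized. The function name poorly_optimized_pngs and the min_ratio threshold are hypothetical.

```python
import zipfile


def poorly_optimized_pngs(epub_path, min_ratio=0.5):
    """List PNG entries that the ZIP layer shrank by at least min_ratio.

    Heuristic (an assumption): a well-optimized PNG is already deflated
    internally, so ZIP compression should gain very little on it.
    Returns tuples of (name, uncompressed_bytes, compressed_bytes, ratio),
    largest uncompressed size first.
    """
    suspects = []
    with zipfile.ZipFile(epub_path) as zf:
        for info in zf.infolist():
            if not info.filename.lower().endswith(".png"):
                continue
            if info.file_size == 0:
                continue  # avoid dividing by zero on empty entries
            ratio = 1 - info.compress_size / info.file_size
            if ratio >= min_ratio:
                suspects.append(
                    (info.filename, info.file_size, info.compress_size, ratio)
                )
    return sorted(suspects, key=lambda entry: entry[1], reverse=True)
```

Calling poorly_optimized_pngs("book.epub") on one of your own files (the file name is a placeholder) returns the images worth a closer look.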
Why is the compression for files in the epub container relevant?
The epub reader software must somehow extract all the contents in order to display the ebook. The implementation details differ, but I think most commonly they just extract the contents to a temporary location.
Let’s take this book I got in a Humble Bundle as an example. It’s one of the more extreme ones, just to show what I mean:
The file with original_epub at the end is the file I received as a download.
If I check this epub with zipinfo I can see that the compression ratio is 94.4%:
1364804253 bytes uncompressed (1301MB)
76907502 bytes compressed (73MB), which is roughly the size of the epub file
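If you don’t have zipinfo at hand, the same totals can be computed with Python’s standard zipfile module. This is only a sketch of the arithmetic behind the numbers above, not how zipinfo itself works, and epub_size_summary is a name I made up.

```python
import zipfile


def epub_size_summary(epub_path):
    """Return (uncompressed_bytes, compressed_bytes, ratio) for a ZIP/epub.

    ratio is the fraction saved by the ZIP compression, i.e.
    1 - compressed / uncompressed.
    """
    with zipfile.ZipFile(epub_path) as zf:
        entries = zf.infolist()
        uncompressed = sum(entry.file_size for entry in entries)
        compressed = sum(entry.compress_size for entry in entries)
    # Guard against an empty archive
    ratio = 1 - compressed / uncompressed if uncompressed else 0.0
    return uncompressed, compressed, ratio
```

With the figures from my example, 1 - 76907502 / 1364804253 comes out at roughly 0.944, which matches the 94.4% reported by zipinfo.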
This means that the ebook reader must store that data somewhere when opening the epub file for the first time. In my case it’s especially relevant because I self-host calibre with the web version of the reader enabled, so I can access my ebooks from any browser. For this ebook I would have to download 1301MB just to open the book.
On some mobile devices with storage constraints, an epub in this format might take up a large portion of the device storage. Some epub apps might also have bad cache management.
Calibre ebook-polish to the rescue
CLI method
If you don’t use calibre for managing ebooks but still want to optimize some of your ebooks, you can install calibre just for the command line tool ebook-polish that comes with it. To optimize the PNG images, use ebook-polish -i <filename>. The output will be stored as a new file with _polished appended to the name.
ebook-polish help output:
-i, --compress-images
Losslessly compress images in the book, to reduce the
filesize, without affecting image quality
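For a folder of loose epub files that are not part of a calibre library, the command can be wrapped in a small Python script. This is a sketch under a few assumptions: ebook-polish is on your PATH, the output name follows the _polished pattern described above, and the helper names (polish_command, expected_output, polish_folder) are made up for this example.

```python
import subprocess
from pathlib import Path


def polish_command(epub_path):
    """Build the ebook-polish call that losslessly compresses images."""
    return ["ebook-polish", "-i", str(epub_path)]


def expected_output(epub_path):
    """Assumed output name: the input name with _polished appended."""
    path = Path(epub_path)
    return path.with_name(f"{path.stem}_polished{path.suffix}")


def polish_folder(folder):
    """Run ebook-polish -i on every epub in folder that has no polished copy yet."""
    processed = []
    for epub in sorted(Path(folder).glob("*.epub")):
        # Skip results of earlier runs and books that were already polished
        if epub.stem.endswith("_polished") or expected_output(epub).exists():
            continue
        subprocess.run(polish_command(epub), check=True)
        processed.append(epub)
    return processed
```

Calling polish_folder("ebooks") (a placeholder path) processes the books one after another and leaves the original files untouched.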
It is not recommended to use this tool via the CLI on an existing calibre library, because calibre itself should perform all operations on the library’s files and database. For an existing library you can use the next method:
calibre GUI method
This method lets you optimize the PNG images of multiple ebooks in an existing calibre library.
Go to Preferences - Toolbars & menus.
Select The main toolbar and add Polish books to the current actions.
You can then select books, hit the polish button in the menu bar and activate Losslessly compress images.
This process is very CPU intensive. For this example it ran on a smaller server with 4 cores, so it will probably be faster on your device.
calibre uses optipng for this; here you can read more about it. I like that it’s lossless, because I don’t want to permanently degrade the image quality the editor or author chose for the ebook.
As you might have seen in the first screenshot:
The resulting epub file went from 74MB to 62MB. Only a 12MB difference, but now we know that our ebook reader doesn’t have to expand that file to 1301MB anymore.