/dev/schnouki

Optimizing JPEG pictures

I recently realized that during our vacation in London, my girlfriend and I took about 4 GB of pictures. Since I currently have 30 GB of storage space on rsync.net to do my backups, 4 GB is quite a lot. Fortunately, there are several solutions to reduce their size.

The first one would be to resize them or to increase their compression ratio / decrease their quality. But I don't want such a lossy method: I want to keep my pictures at the best quality available so I can print them in high resolution if I want to.

The other solution is to "optimize" them. Once again, several methods: removal of unnecessary data (EXIF markers and other metadata), conversion to progressive JPEG, or Huffman table optimization. Since I don't want to lose metadata (mostly because I add many tags to my pictures in Shotwell and they are stored in these metadata), I only use the other two methods.

Most of my photos are taken with my camera (Panasonic Lumix FZ100) or with my girlfriend's (Nikon Coolpix S8000).

I first tried to use jpegoptim to do this task. It only optimizes Huffman tables, and it does it well. However, this tool only supports EXIF and IPTC metadata, and on pictures taken with my camera, Shotwell stores its tags in the XMP "Subject" marker. And jpegoptim erases XMP markers when processing them, resulting in many lost tags...

So I tried to use jpegtran to do the same. It also supports progressive JPEG, and is apparently much better at not destroying metadata when not asked to do so :) Here is the command I use to optimize my pictures with it:

parallel -u 'echo {}; jpegtran -optimize -progressive -perfect -copy all -outfile {}.tran {} && mv {}.tran {}' ::: *.JPG

parallel is GNU Parallel, a tool which is very useful to speed things up by using the 16 cores of my work PC to do the job :)
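If GNU Parallel is not at hand, the same loop can be sketched in Python. This is only an illustration and not the command I actually ran: the function names are made up for the example, and it assumes jpegtran is installed and on the PATH. The jpegtran flags are exactly the ones from the one-liner above.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def jpegtran_command(src, tmp):
    """Build the same lossless jpegtran invocation as the one-liner above."""
    return [
        "jpegtran", "-optimize", "-progressive", "-perfect",
        "-copy", "all", "-outfile", str(tmp), str(src),
    ]


def optimize(src):
    """Optimize one picture in place and return the number of bytes saved."""
    tmp = src.with_name(src.name + ".tran")
    before = src.stat().st_size
    try:
        subprocess.run(jpegtran_command(src, tmp), check=True)
    except subprocess.CalledProcessError:
        # jpegtran refused (for example because of -perfect): keep the original.
        tmp.unlink(missing_ok=True)
        raise
    os.replace(tmp, src)  # same effect as "mv {}.tran {}"
    return before - src.stat().st_size


def optimize_folder(folder, pattern="*.JPG"):
    """Optimize every matching picture, one jpegtran process per CPU core."""
    files = sorted(Path(folder).glob(pattern))
    # Threads are enough here: the real work happens in the jpegtran processes.
    with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
        return sum(pool.map(optimize, files))
```

Calling optimize_folder("London") would then return the total number of bytes saved for that folder.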

Using jpegtran this way, I reduced the size of my "London" folder from 4.0 GB to 3.5 GB, i.e. a 12.5% reduction with absolutely no quality loss. Not bad!

Now, some funny things I noticed while doing this:

  • the Lumix FZ100 does not do any optimization to its JPEG files: jpegtran always reduced them by at least 13%, sometimes more. It also creates some EXIF and XMP markers in its files, but no IPTC tag.

  • the Coolpix S8000 does a much better job at optimizing its files: jpegtran could only reduce their size by 0.6 to 0.8%, 1.2% at best. It creates EXIF, XMP and IPTC markers.

  • when Shotwell stores tags directly into pictures, it will use the IPTC "Keywords" marker only if there is already IPTC data in the file. This is why jpegoptim lost tags on pictures taken with my camera: the FZ100 only added XMP markers, which were then wiped out by jpegoptim. For pictures taken with the S8000, tags were stored both in XMP and IPTC markers, so when the XMP ones were removed, Shotwell still took the IPTC version into account.

    Not sure if it's a bug or a feature...

iPhone tracking

There has recently been a lot of noise about a tool made by Alasdair Allan and Pete Warden, which shows that every iPhone is tracking its owner's movements all the time. For the record, the existence of this database on each iPhone with iOS 4.x has been documented for several months already. And it's not really surprising... Remember Eben Moglen's talk at FOSDEM 2011?

Now, time for a confession: I have an iPhone too. It's a nice, mostly useless device, but it becomes quite fun to use once you jailbreak it. And since I jailbroke mine, I can have fun with it. Now, let's have fun with this geolocation database.

Accessing the geolocation database...

First, ssh into your jailbroken iPhone as root (or mount it with ifuse: ifuse --root /path/to/mountpoint). The DB is stored in the /var/root/Library/Caches/locationd folder and is named consolidated.db, just as explained on the iPhone Tracker page. On my phone, it's a 5.4 MB file. You can copy it to your computer (using scp, rsync, or just cp if you're using ifuse).

If you're curious, you can then investigate the content of this file using sqlite3 or a GUI such as sqliteman. Here are a few interesting tables: celllocation, celllocationlocal, and wifilocation.

The first one is the one used by Alasdair Allan and Pete Warden in their "iPhone Tracker" tool. On my phone, there are 2,624 records in this table (timestamp, latitude, longitude, altitude, plus some other columns), the oldest ones are 2.5 months old (February 5th -- FOSDEM!). It would seem that these records indicate the positions of cell towers rather than your own, but this can only be guessed since you can't have a look at the iOS source code...

The second table has a similar structure, but apparently a different content. I did not investigate further (yet).

The last one is a little different: wifilocation. It stores the position of a lot of MAC addresses (with, of course, an associated timestamp). I don't know if these are the MAC addresses of some wireless access points or the MAC addresses of wireless clients, but given that on my phone there are 35,770 records since February 6th, I doubt these are just access points.
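To get the same kind of numbers (record count, oldest and newest entry) without opening a GUI, a few lines of Python are enough. This is a rough sketch under two assumptions that you should check against your own file: that each table has a column named Timestamp, and that its values are seconds counted from January 1st, 2001 (UTC) rather than from the Unix epoch.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# The three tables discussed above (SQLite table names are case-insensitive).
TABLES = ("celllocation", "celllocationlocal", "wifilocation")

# Assumption: timestamps count seconds from 2001-01-01 UTC.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def to_datetime(timestamp):
    """Convert a stored timestamp to an aware UTC datetime."""
    return APPLE_EPOCH + timedelta(seconds=timestamp)


def summarize(conn):
    """Return {table: (row count, oldest record, newest record)}."""
    summary = {}
    for table in TABLES:
        # The column name "Timestamp" is an assumption, see above.
        count, oldest, newest = conn.execute(
            f"SELECT COUNT(*), MIN(Timestamp), MAX(Timestamp) FROM {table}"
        ).fetchone()
        summary[table] = (
            count,
            to_datetime(oldest) if oldest is not None else None,
            to_datetime(newest) if newest is not None else None,
        )
    return summary
```

Usage would look like summarize(sqlite3.connect("consolidated.db")), run on the copy you made on your computer.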

...for fun and profit

The iPhone Tracker seems to be a very nice program, but it's for Macs only. So I hacked a little Python script that can read such a database and produce a KML file that can then be viewed using Google Earth.

The script is available here: iphone-tracker.py. No dependencies except for Python 3. Very simple to use:

./iphone-tracker.py path/to/consolidated.db > output.kml

The result can then be opened in Google Earth. The positions are grouped by day to avoid having 2500+ points overlapping on a map.
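The grouping idea is simple enough to show in a few lines. The following is not iphone-tracker.py itself, just a minimal sketch of the same approach: rows are assumed to be (timestamp, latitude, longitude) tuples, timestamps are assumed to count seconds from January 1st, 2001 (UTC), and the function names are invented for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from xml.sax.saxutils import escape

# Assumption: timestamps count seconds from 2001-01-01 UTC.
APPLE_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def group_by_day(rows):
    """Group (timestamp, latitude, longitude) rows by calendar day (UTC)."""
    days = defaultdict(list)
    for timestamp, latitude, longitude in rows:
        day = (APPLE_EPOCH + timedelta(seconds=timestamp)).date().isoformat()
        days[day].append((latitude, longitude))
    return dict(sorted(days.items()))


def to_kml(rows):
    """Render the rows as a KML document with one folder per day."""
    parts = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<kml xmlns="http://www.opengis.net/kml/2.2">',
        "<Document>",
    ]
    for day, points in group_by_day(rows).items():
        parts.append(f"<Folder><name>{escape(day)}</name>")
        for latitude, longitude in points:
            # KML coordinates are written longitude first, then latitude.
            parts.append(
                f"<Placemark><Point><coordinates>{longitude},{latitude}"
                "</coordinates></Point></Placemark>"
            )
        parts.append("</Folder>")
    parts += ["</Document>", "</kml>"]
    return "\n".join(parts)
```

With one folder per day, Google Earth lets you show or hide each day separately instead of drawing every point at once.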

It can be seen, as described by the researchers who first found out about this, that the stored positions are far from being precise. The recorded timestamps are very approximate too. But the simple fact that so much data is stored about one's location is really concerning.

Several months ago, someone also made a web viewer where you can upload your database file and see the result in Google Maps (in French).

What now?

As far as I know, Apple has not made a public statement about this little controversy yet. But I'm really eager to see what they will say about it -- if they care to say anything about it.

I'm also deeply concerned about the wifilocation table of this database, which, in some aspects, is much worse than the celllocation table (no need for your phone to store that: your network operator already has the data, and it's probably far easier for your government to ask them than to get access to your phone).

If it contains geolocation data of wireless access points, this could cause problems similar to what Google encountered in Germany, when Google Cars were gathering data about wireless networks in addition to the Google Street View pictures.

But if the wifilocation table actually contains the last seen location of wireless clients, it could mean that your phone can be used to prove that you were close to a specific person (identified by their phone's wireless MAC address) at a specific moment. And, for some people, in some countries, this is a serious reason to worry.

If you wish to disable this database on your (jailbroken) iPhone, you may use this workaround.

Secure remote backup for my mail folder

It is said that there are two kinds of people in the world: those who have lost a hard drive, and those who are going to lose a hard drive. Several weeks ago, I lost a huge hard drive and a lot of data. I was able to retrieve most, but not all, of it.

Remote backups?

Now I regularly do complete incremental backups of my computers on external hard drives. But in case of big trouble (fire, theft...), this is just useless. The solution is to do additional backups on a remote server.

Last week, I decided to use rsync.net for that. I must say I am very pleased with this service: it's not that cheap (but affordable, especially when you are entitled to the student/teacher/Open Source developer discount), but it seems to be very solid, and the support is very competent and quick to answer any question. So now I am backing up 10+ GB of pictures using unison [1], and the next step is to store more sensitive data: all my e-mails.

Here is what I want for my e-mails:

  • store them all. Currently there are about 7 years of e-mails from 3 different accounts, fetched on my laptop using offlineimap.
  • store them safely. I consider this to be sensitive data, so I want them to be encrypted with GnuPG.
  • sync them efficiently. I don't want to have to upload 1.5 GB just to store 5 more MB of mail, I want something that works a bit like rsync.

I considered different approaches:

  • rsync + sshfs + EncFS or eCryptFS: too complicated, probably too slow.
  • rsyncrypto: too complex (I don't want to have an extra certificate just for that), does not seem to be maintained, has annoying dependencies (needs to patch gzip).
  • duplicity: nice, but I don't need incremental backups.

So in the end I just wrote my own script to do just what I want =]

Introducing seb: Simple Encrypted Backup

How does it work?

seb is a simple Python 3 script that performs backups quite efficiently. It produces a bunch of packs, which are encrypted tarballs, which can then be uploaded to the remote host.

A pack contains a fixed number of files (250 in my case, which is reasonable given that most of my mails are only a few kilobytes). When running seb, it finds which files were added, modified or deleted since its last run. Packs are then updated to reflect these local changes. seb tries to be a little smart here: instead of creating new packs, it will first try to add new files to packs that have less than 250 files.

All the needed internal data are saved in a single file, a dictionary that is serialized using Python's pickle module. File modifications are detected using their modification times only; this can be an issue, but computing file hashes seems overkill here: mails are not supposed to change once they are sent and received...
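The bookkeeping described above can be sketched as follows. This is not the actual seb source, only an illustration of the same ideas (change detection by modification time, filling incomplete packs first, state kept in a pickled dictionary); every name in it is made up for the example, and the encryption and tarball steps are left out.

```python
import os
import pickle

PACK_SIZE = 250  # files per pack, the value quoted above


def scan(root):
    """Map each file path (relative to root) to its modification time."""
    mtimes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtimes[os.path.relpath(path, root)] = os.stat(path).st_mtime
    return mtimes


def diff(old, new):
    """Compare two scans and return (added, modified, deleted) paths."""
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    modified = sorted(p for p in new.keys() & old.keys() if new[p] != old[p])
    return added, modified, deleted


def assign(packs, added, pack_size=PACK_SIZE):
    """Put new files into packs that still have room, then open new packs.

    packs maps a pack number to its list of files; the set of pack numbers
    that must be rebuilt and uploaded again is returned.
    """
    pending = list(added)
    dirty = set()
    for pack_id in sorted(packs):
        while pending and len(packs[pack_id]) < pack_size:
            packs[pack_id].append(pending.pop(0))
            dirty.add(pack_id)
    next_id = max(packs, default=-1) + 1
    while pending:
        packs[next_id] = pending[:pack_size]
        del pending[:pack_size]
        dirty.add(next_id)
        next_id += 1
    return dirty


def save_state(path, state):
    """Serialize the whole state dictionary with pickle."""
    with open(path, "wb") as handle:
        pickle.dump(state, handle)


def load_state(path):
    """Load the previous state, or start empty on the very first run."""
    try:
        with open(path, "rb") as handle:
            return pickle.load(handle)
    except FileNotFoundError:
        return {"mtimes": {}, "packs": {}}
```

Packs that contain a modified or deleted file would have to be marked dirty in the same way, so that only the affected packs are re-encrypted and uploaded.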

Performance

seb is quite quick. The longest part of the backup is by far uploading data to the remote server.

Some statistics:

                    Local mail    Remote backup
Number of files     87,107        349
Minimum size        218 B         54.89 kB
Maximum size        16.06 MB      47.82 MB
Average size        13.75 kB      1.65 MB
Total size          1169.32 MB    575.70 MB

How to use?

  1. Download seb: https://gist.github.com/821667
  2. Read the doc: ./seb --help
  3. Use: ./seb ~/mail /path/to/backup_fs

Here's a tip: on the first run, use a local directory as the destination, and then upload its content to your remote backup site (using scp or rsync for example). On the subsequent runs (where there will be much less data to transfer), you can mount your remote backup site using sshfs and use this mount point directly as your destination folder. It works fine for me.

How to restore a backup?

  1. Grab all the packs.
  2. Decrypt each pack with gpg and extract it with tar.
  3. That's all.
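If there are hundreds of packs, doing this by hand gets old quickly, so here is a hedged sketch of the loop in Python. The "*.gpg" file pattern is an assumption (use whatever names your packs really have), the function names are invented, and gpg is expected to ask for the passphrase through its usual agent or prompt. Each pack is briefly written to disk as a plain tarball, so only restore onto a disk you trust.

```python
import subprocess
import tarfile
from pathlib import Path


def gpg_decrypt_command(pack, tar_path):
    """Build the gpg call that decrypts one pack into a plain tarball."""
    return ["gpg", "--output", str(tar_path), "--decrypt", str(pack)]


def restore(pack_dir, dest, pattern="*.gpg"):
    """Decrypt every pack found in pack_dir and extract it into dest."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    for pack in sorted(Path(pack_dir).glob(pattern)):
        tar_path = dest / (pack.name + ".tar")
        subprocess.run(gpg_decrypt_command(pack, tar_path), check=True)
        # "r:*" lets tarfile detect by itself whether the tarball is compressed.
        with tarfile.open(tar_path, mode="r:*") as archive:
            archive.extractall(dest)
        tar_path.unlink()  # do not leave the decrypted tarball lying around
```

A call such as restore("/path/to/packs", "/path/to/restored-mail") would then rebuild the mail folder.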

  1. At the moment, the servers of rsync.net are running unison 2.27.157. Arch Linux has 2.32.52 (which is the current stable release), and 2.40.* is due in a few days (next stable release). Problem: 2.2* is not compatible with 2.3*, 2.3* is not compatible with 2.4*, etc. So I updated the unison-old package on the AUR, which works really well for me.

    According to the rsync.net support, they will be upgrading Unison on their servers this month.  

OpenPGP smartcard setup on Arch Linux

After I joined the FSFE Fellowship a few months ago, I received a nice OpenPGP smartcard. Now I'm using it for real, and I like it!

I've decided to buy two OpenPGP card readers from Kernel concepts:

  • Gemalto PC Express card for my laptop
  • SCM SCR-335 for my workstation

Both are very easy to get working on Arch Linux: just install ccid and pcsclite from the AUR, restart udev, start pcscd (/etc/rc.d/pcscd start), plug your reader in, and you're good to go.

The next step is to create a key to be used with the card. There is a good tutorial on this topic on the FSFE Wiki. Only one step can be greatly enhanced: step 12, "Removing the master key from the keyring". Here is a much easier version:

  1. Back up your public key: gpg --armor --export 559C215F > publickey.asc
  2. Remove your private and public key from your keyring: gpg --delete-secret-and-public-key 559C215F
  3. Import your public key: gpg --import publickey.asc
  4. Edit your key and set its trust level to Ultimate: gpg --edit-key 559C215F, trust, 5, save, quit
  5. Make GPG check your smartcard and recreate the secret key stubs by itself: gpg --card-status

That's it! Now you can return to the tutorial and test your card.

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

And don't forget to have fun!
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)

iQEcBAEBAgAGBQJL8+C0AAoJEMPdciX+bh5InokH/17+dG0bYU05dTqHVOIDUKch
dGJ75jnO3cci9UcZeqghyH0Odi1uPpidRLWKjd1EogHNo24fb6/CwyL+6yUgW/RF
No0YOKG2r6dJGqpD91v5afd70JSkwMo66CRBpsou5TM6b6bG2p6dHVg3r2pJOKwJ
WoMbrsgHAAX7pGpAjhjREMLTIADwh5+5d1aQJx3qTjWNh908PVm+KN1iT9eryBWE
UJb98O6Zj02I4OTX3VsBmC29FyjfISBJ7LIElZQFTV7I3BIE+FDK9H9Hnb/3psF+
G/VOgHPILzd+BxuUzo4PGVne2GMPHv6vmm+yQlgvuz5Bnn/duU8gWVc+erDC2xQ=
=K7tA
-----END PGP SIGNATURE-----

Many thanks to the people involved in this thread on the GnuPG mailing list for the tip!

Jónsi — Go

Go, the first solo album by Sigur Rós's singer Jón Þór Birgisson (aka Jónsi), was released on April 6th.

Of course, I had pre-ordered it. But I'm attending a conference in Lille this week, so no CD for me (it hasn't arrived yet anyway). But thanks to Spotify, I've been able to listen to it anyway... and, well, I could say it is great, wonderful, magical, or even insanely awesome, but this would be an understatement.

"Go" cover

A little earlier, while walking in the street, I decided that I should definitely dent/tweet something about this album. Then I thought that listening to Jónsi is a little bit like flying without the fear of falling. And that's the moment I realized that writing reviews for albums is complicated (especially when you're not using your mother tongue), and that I should probably not even try to do that myself.

So here is an excellent review of Go. Now go listen to it, you won't regret it.

HOWTO Backup your GnuPG secret key on paper

Paper is a safe way to back up a secret key: you can't hack into it remotely, you can hide it very easily, and you will still be able to use it in 50+ years. No USB stick can do that...

If you want to store your GnuPG secret key on a paper sheet, it is quite simple to do. You can use PaperKey, a small tool that strips all the useless data from a secret key and formats it into a printable result. This is great, but the result can be quite long: printing my 2048-bit secret key would take 3 pages.

But there is a nice way to store more data on a small surface: 2D barcodes, for example in the DataMatrix format, using the great libdmtx library. For small keys, this is really easy:

gpg --export-secret-key KEY_ID | paperkey --output-type raw | dmtxwrite -e 8 -f PDF > secret-key.pdf

If your key is bigger (like my 2048-bit key), you will need to split it into several parts, because the result of the paperkey command will be too big to be encoded in a single DataMatrix. Here is a simple method:

# Generates key-aa, key-ab, ...
gpg --export-secret-key KEY_ID | paperkey --output-type raw | split -b 1500 - key-

# Convert each of them to a PNG image
for K in key-*; do
    dmtxwrite -e 8 "$K" > "$K.png"
done

You now have several PNG images that you can print together on a single page.
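The split step is nothing more than cutting the raw key into 1500-byte slices, which is also why a plain concatenation in the right order gives the key back. Here is a small Python sketch of that round trip, mimicking the key-aa, key-ab, ... names that split produces by default; the function names are made up, and it does not call paperkey or dmtxwrite at all.

```python
from itertools import product
from string import ascii_lowercase


def suffixes():
    """Yield aa, ab, ..., zz: the default two-letter suffixes used by split."""
    for first, second in product(ascii_lowercase, repeat=2):
        yield first + second


def split_key(data, chunk_size=1500, prefix="key-"):
    """Cut raw key bytes into named chunks small enough for one barcode each."""
    names = suffixes()
    chunks = {}
    for offset in range(0, len(data), chunk_size):
        # Like split, this runs out of names after 26 * 26 chunks.
        chunks[prefix + next(names)] = data[offset:offset + chunk_size]
    return chunks


def join_key(chunks):
    """Concatenate the chunks in name order to rebuild the original bytes."""
    return b"".join(chunks[name] for name in sorted(chunks))
```

The important detail when restoring is the order: the decoded parts must be concatenated as aa, ab, ac, and so on, otherwise paperkey will be fed a scrambled key.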


To restore your key, it's just as simple: scan each DataMatrix into a separate image, decode them with dmtxread, concatenate all the resulting files (cat...), and use paperkey:

cat my-scanned-keys | paperkey --pubring ~/.gnupg/pubring.gpg > secret-key.gpg

Source: TPK Archival (by David Shaw, creator of PaperKey).

"Piled Higher and Deeper" in France

This morning I read the latest Piled Higher & Deeper about égalité des chances.

Contes de la route: Equal Opportunity

I am one of the few people who do a PhD after a grande école. The part about "Good job €€€€" vs "Crappy job" is quite true -- except that, from what I experienced, these "good jobs" are often boring, non-technical ones: contract managers, directors, etc. The kind of job that takes 50+ hours a week, plus many weekends. Not my cup of tea. And that is just why I decided to do a PhD: it is very interesting and rewarding, and I have enough free time to do what I like to do besides my work.

New blog

After more than two months of silence, I'm back! And, once again, with a brand new blog...

So, why did I change again? Many reasons: Tumblr was annoying (non-free, no control over your data, simplistic template system). Before that, Dotclear required constant attention to make sure it was up to date because of security issues (and it would have been even worse with WordPress...). I realized that what I really want is something that generates a static website: plain (X)HTML files are much safer than any dynamic website! However I do not want to do everything by hand, so I need something that generates these static files from some human-readable markup (preferably Markdown). I also need to track everything I do on my data, to back it up easily, and to be able to quickly revert to an older version: I want to use Git on my blog. And since I now use Emacs all day long, I definitely want something that integrates well in Emacs.

Several static blog generators are available:

  • BlazeBlogger: quite nice! Written in bash, Markdown syntax... However, nothing for Emacs, and it uses a kind-of-version-control system that I do not like very much (it adds a lot of files to my git repository and just logs "this post was edited" without being able to revert to a previous version, so what's the point?).
  • nanoblogger: written in bash too. Seems too complex for what I want. Plus, it describes itself as "slow"...
  • Jekyll: close to perfect. It uses the Markdown syntax, has a nice template engine, integrates very well with Git (it's hosted on GitHub, which I like very much, and is even used for GitHub Pages). An Emacs mode is available. It has some very good ideas, like its YAML front matter. But Jekyll is written in Ruby, which is far from being my favorite language, and it lacks some features I like (tags...).

I finally decided to do something much more fun: write my own blog engine in Python. It took me a few days, but now it's done: Golbarg is born!

This brand new engine is written 100% in Python. It uses the Jinja 2 template engine, python-markdown for turning Markdown into HTML, and PyYAML for managing post headers and metadata. golbarg.el is bundled, so every Emacs user can enjoy golbarg-mode. And all of this is of course available under the terms of the GNU GPLv3 license.

Golbarg is hosted on GitHub, as well as this blog. I also made Golbarg available on the Python package index (yes, you can install it with a simple pip install Golbarg!). Except for the comments in the source code, there is very little documentation available... So if you want to give Golbarg a try, look at the source of this blog, it's probably the best way to dive in.

Last few words: the old RSS feed will be available for a few weeks. Be sure to switch to the new feed as soon as possible!