Sleep soundly with JungleDisk and Amazon S3

Some time ago, a friend's house was broken into, and some bits and pieces were stolen. One of those pieces was the family's laptop, with six or seven years of digital photographs stored on it. Next to the laptop lay a stack of DVDs onto which my friend's wife wanted to back up the photos and some other personal data. Unfortunately, it was to be the very first backup of the laptop. With the thief went all the photographs of the kids' lives. It is very unfortunate (and, for an IT guy, exceedingly stupid!) that he had never made a backup.

As many of you know, I'm almost paranoid about backing up my data, and I care a great deal about where I keep the data I back up. It is useless to keep the backup next to the computer(s) you are backing up: theft and fire can take care of both very quickly. Keeping current backups off-site is easy if you remember to swap the data carriers once in a while, but it is a bit of a pain.

Amazon (yes, the book people) have a service called S3. You apply for an S3 account with your regular Amazon credentials (e-mail address and password), and they offer as much storage as you want, for as long as you need it. What makes S3 unique is that you don't pay a flat fee: you pay for what you use. Every GB of data you store costs USD 0.15 (or 0.18 for Europeans) per month. Every GB you upload or download costs USD 0.10. Knowing that, you can easily calculate what backing up your data will cost you per year (use the Amazon calculator as a guide).
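As a rough illustration of that calculation, here is a minimal Python sketch. It uses only the prices quoted above; the function name and the example figures (20 GB stored, 2 GB transferred per month) are made up for illustration, and it assumes the stored amount stays constant over the year.

```python
# Prices quoted in the text above (USD); treat them as a snapshot.
STORAGE_PER_GB_MONTH = 0.15  # storage price per GB per month (0.18 in Europe)
TRANSFER_PER_GB = 0.10       # price per GB uploaded or downloaded


def yearly_cost(stored_gb, transferred_gb_per_month,
                storage_price=STORAGE_PER_GB_MONTH,
                transfer_price=TRANSFER_PER_GB):
    """Estimate one year of S3 charges for a steady backup set."""
    monthly = (stored_gb * storage_price
               + transferred_gb_per_month * transfer_price)
    return round(12 * monthly, 2)


if __name__ == "__main__":
    # Hypothetical example: 20 GB stored, 2 GB of changes moved per month.
    print(yearly_cost(20, 2))
    # The same data set at the European storage price.
    print(yearly_cost(20, 2, storage_price=0.18))
```

The initial upload of the full data set is a one-off transfer charge on top of this; add it separately if it matters for your numbers.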

With Amazon S3 alone, an end user can't do very much: you need tools to deposit and access data, and that is where JungleDisk comes in.

JungleDisk is a marvellous little program, which costs a one-time fee of USD 20.00. It is easy to set up, and JungleDisk assists you with the process of signing up for Amazon S3 by pointing you to the correct URLs at the Amazon web sites.

If you use a single S3 key with it, you can use JungleDisk from your Windows, Mac OS X and Linux computers at the same time if you like. You give JungleDisk your S3 key and it handles the data management on your S3 account for you. So, for example, you can use JungleDisk to back up your files at home and retrieve them on your laptop at the office. JungleDisk maps your S3 storage onto a drive (on Windows) or a volume (on Mac). On Linux you use the command-line program (jungledisk) to mount your S3 storage using FUSE. With this, you can, for example, rsync your backups into JungleDisk, which then transparently uploads them to Amazon's S3 storage. Additionally, JungleDisk provides a WebDAV-compatible service that you access via TCP port 2667 on the loopback interface.

You can access the WebDAV interface from Windows, from the Mac OS X Finder, or from Linux. Even with cadaver if you want to:

$ cadaver http://localhost:2667/
...

The JungleDisk GUI (available on all platforms) offers automatic backups.

The buckets stored on the S3 servers are encrypted by Amazon's service. Theoretically it is possible for Amazon employees to access that data, but it isn't very likely they'd do that. JungleDisk supports additional AES encryption of the files you submit to it. It works like this: you configure JungleDisk with an encryption key, effectively a passphrase you set (don't use a comma in it), with which JungleDisk encrypts the files transparently between itself and S3. When you access a file (via JungleDisk), it transparently decrypts it with your key, giving you the original data. You don't have to encrypt the files manually (with GPG or whatever): JungleDisk handles the encryption for you.

Amazon S3 allows you to divide your online storage space into multiple "buckets". Each bucket has its own set of files and directories that are completely separate from all other buckets. You can only access a single bucket at a time within Jungle Disk, and you cannot copy files and folders between buckets. Most users should be able to use just a single bucket for all their files, even when using Jungle Disk on multiple machines. However, Jungle Disk does allow you to create alternate buckets under your account if desired.

JungleDisk has good documentation and some answers, is actively supported, and has an active support forum. You can get started immediately by downloading JungleDisk and trying it for a month, but you will of course require an Amazon S3 account.

There are alternative services to S3, some of them even free of charge, but remember: you get what you pay for.

Memorize this

Print this out!


A tale of two upgrades

Two upgrades performed since yesterday: one on a ReadyNAS NV, and the other on SafeBoot.

First the good: I upgraded my ReadyNAS NV+ to RAIDiator 4.00c1-p2 directly from the FrontView web interface. The upgrade was completely painless and works as advertised.

[Image: NV upgrade]

One reboot later, the system was up and running. Beautiful.

Then the bad and the ugly: I wanted to upgrade my SafeBoot installation from version 4.2 to version 5 because the speed of hibernation has increased thirty-fold in the newer version. No problem, thought we; install the new version, reboot, and Bob's your uncle.

The reboot looked promising and the hibernation really is very much faster. What the program doesn't tell you, though, is that it totally fucks up the partition table. Now that is one bit of miserable software (I mean the partition table. Oh, and I also mean any software that screws with it). Not that it deletes partitions, but SafeBoot sort of moved some sectors around. Not much, mind you:

the situation "before":

/dev/sda4 : start=128744910, size=105691635, Id= f
/dev/sda5 : start=128744973, size=105691572, Id=8e

the situation "after":

/dev/sda4 : start=128680650, size=105755895, Id=8e

You'll see some bits missing (duh!) and a little bit of "movement" in the sizes and starting sectors of the partition.
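To put numbers on that "movement", here is a small Python sketch that parses the two dumps above and compares start and end sectors. The regular expression assumes the lines look exactly like the ones shown (they resemble sfdisk output, but that is a guess), and the function name is made up for illustration.

```python
import re

BEFORE = """\
/dev/sda4 : start=128744910, size=105691635, Id= f
/dev/sda5 : start=128744973, size=105691572, Id=8e
"""
AFTER = """\
/dev/sda4 : start=128680650, size=105755895, Id=8e
"""

# Matches lines of the form shown above; anything else is ignored.
LINE = re.compile(
    r"^(\S+)\s*:\s*start=\s*(\d+),\s*size=\s*(\d+),\s*Id=\s*(\w+)")


def parse(dump):
    """Return {device: (start, size, end, type_id)}.

    `end` is the first sector after the partition (start + size).
    """
    table = {}
    for line in dump.splitlines():
        m = LINE.match(line)
        if m:
            start, size = int(m.group(2)), int(m.group(3))
            table[m.group(1)] = (start, size, start + size, m.group(4))
    return table


if __name__ == "__main__":
    before, after = parse(BEFORE), parse(AFTER)
    for name, table in (("before", before), ("after", after)):
        for dev, (start, size, end, type_id) in sorted(table.items()):
            print(name, dev, "start", start, "end", end, "Id", type_id)
    shift = before["/dev/sda4"][0] - after["/dev/sda4"][0]
    print("sda4 start moved down by", shift, "sectors")
```

All three entries end at the same sector, 234436545, so the end of the disk area is unchanged. The start of sda4 moved down by 64260 sectors, which happens to be 4 × 255 × 63; if the disk reports the customary 255 heads and 63 sectors per track (an assumption), that is exactly four cylinders. The extended partition (Id f) with the logical sda5 inside it has been replaced by a single sda4 of type 8e.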

Now, might that be a reason why my CentOS doesn't want to start up any more? :-(

About seventy-two reboots later, after having uninstalled SafeBoot (meaning two hours for decryption), the system was back to normal. Oh yes, we called support; after all, that is why you have a corporate support contract, isn't it? Their answer: "it appears that the partitions have been changed. Reinstall the system". Good thing, a support contract; we wouldn't have known otherwise… Damn them!

I'm back to SafeBoot 4.2. Thanks for fifteen billable hours wasted!

Consumer Appliances for "Grown-Ups"

There is quite a respectable number of consumer appliances that have an Open Source operating system under their hoods, even though the fact isn't publicized (nor is it usually kept secret, either). I have at least two of them, a WRT54GL and an NSLU2, both by Linksys.

Often the manufacturers publish the Open Source–based code and some clever people take that and add features or capabilities to the devices that make them much more versatile.

Case in point is a LinkStation PRO which comes with a nice looking (if terribly slow) Web interface. The device as such is useful enough (in a predominantly Windows environment), but it isn't quite as capable as it ought to be. Why doesn't it offer NFS, for example?

A few clicks, and I find the LinkStation Wiki.

I follow some simple instructions, produce a bit of adrenalin and perspire a bit (not caused by the warmth in my office), and after five minutes my device is not bricked and welcomes me with

Now, this wouldn't be acceptable to the general public of course, but why don't these devices have a "simple" and a "complex" interface? The manufacturer could refuse support if a modification to the system has been performed (e.g. as soon as an SSH login was conducted).

The code is there, just waiting to be used. Give it to us.

RSS to IMAP: Newspipe

Most RSS feed readers will keep a cache of downloaded articles for a user-defined amount of time. In NetNewsWire, which I use, I can set article expiry in the program's preferences. In spite of having that functionality, there are some feeds that I want to archive for ever, and I'd like to have access to the articles from wherever I am located. Additionally, I'd love to be able to access this archive with a simple mail client.

To this end, I want certain feeds stored on an IMAP server. If I search for rss to imap or any similar query, a mass of software and solutions turns up, but the best program I have found to date is Newspipe, an Open Source tool which is very easy to use and extremely flexible at that.

Newspipe is fed an OPML file with a twist, and it then grabs the feeds on a regular basis, or when I want it to, and produces SMTP messages which are then submitted to my mail server.

The twist on the OPML file is that Newspipe recognizes special attributes in the OPML outline entries which cause it to handle a feed in a special way. By the way, many feed readers can export an OPML file of my feeds on request, and I can use that as a starting point for Newspipe.
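Before adding such attributes, it can help to see what an exported OPML file actually contains. The following Python sketch lists every feed in an OPML document together with any attributes beyond the common ones. The sample document and the attribute x-example-option are invented for illustration; they are not Newspipe's real attribute names, which are described in its documentation.

```python
import xml.etree.ElementTree as ET

# Attributes commonly found on OPML feed entries; anything else is "extra".
COMMON = {"text", "title", "type", "xmlUrl", "htmlUrl", "description"}

# Invented sample; x-example-option is a placeholder, not a Newspipe attribute.
SAMPLE_OPML = """\
<opml version="1.1">
  <body>
    <outline text="Example feed" type="rss"
             xmlUrl="http://example.org/feed.xml"
             x-example-option="1"/>
    <outline text="Folder">
      <outline text="Nested feed" type="rss"
               xmlUrl="http://example.org/nested.xml"/>
    </outline>
  </body>
</opml>
"""


def list_feeds(opml_text):
    """Return a list of (title, feed_url, extra_attributes) tuples."""
    root = ET.fromstring(opml_text)
    feeds = []
    for node in root.iter("outline"):
        url = node.get("xmlUrl")
        if url is None:
            continue  # a folder, not a feed
        extras = {k: v for k, v in node.attrib.items() if k not in COMMON}
        feeds.append((node.get("text"), url, extras))
    return feeds


if __name__ == "__main__":
    for title, url, extras in list_feeds(SAMPLE_OPML):
        print(title, url, extras)
```

Running this over a feed reader's export shows which entries already carry non-standard attributes and which are plain.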

For example, certain feeds (such as the one on this site) have a short text-only representation of the post. If so desired, I can tell Newspipe to go and get the actual page, and it will download the page and all images to create a full MIME message representing the page, sending that off via SMTP to my account. This feature should be used with care, as the messages can become quite large. This is a sample of a textual message as rendered by Thunderbird:

[Image: Newspipe text]

and here is the same RSS feed entry as viewed when the referring page is downloaded by Newspipe:

[Image: Newspipe MIME]

(Please note that these screen shots are not taken from Thunderbird's built-in RSS feed reader, but rather as seen by Thunderbird's IMAP client.)

Newspipe is controlled by a small .ini file and by attributes in the OPML entries, as described above. It runs on any platform that has Python and it is well documented.

By default, Newspipe runs for ever when started; I changed this behaviour in the configuration, because I execute the program every few hours via cron, so I set sleep_time=0.

Draft

I've just printed the pre-pre-final draft of an upcoming article I'm writing for iX magazine, and I'm going to relax with a cold beer and read through it carefully.

My head is buzzing.

The Tao of Backup

Tao of Backup

Encrypting Bacula Backups

Bacula supports encryption of backup volumes in order to protect the tapes or disks it writes to from prying eyes.

Data encryption in Bacula works as advertised; after creating a master key pair and a key pair for each file daemon, the configuration files on each fd that is to be protected (and I would recommend all of them) must be adjusted to use the newly created keys.

The documentation doesn't point out, though, that certificates created as shown are only valid for thirty (30) days! In order to create certificates with a longer validity, the -days option must be used. Here I create a certificate valid for 10 years:

openssl genrsa -out master.key 2048
openssl req -new -key master.key -x509 -out master.cert -days 3650

which should be similarly applied to the certificate pairs created for each file daemon.

In order to prove that encryption is actually being used in my environment, I'm going to back up a single file; the file's content is a nursery rhyme (which I can later easily recognize).

I first perform a backup without encryption using bconsole. When that has completed, I see

  FD Bytes Written:       467 (467 B)
  SD Bytes Written:       556 (556 B)
  Rate:                   0.1 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Volume name(s):         Vol001

in the messages, indicating that no encryption was used; correct. Performing a hex dump on the volume file shows the clear text of the nursery rhyme:

[Image: Bacula w/o encryption]

After enabling data encryption on the file daemon by adding my keys and certificates to the bacula-fd.conf file

FileDaemon {
   Name = dad-fd
   FDport = 9102
   WorkingDirectory = /var/bacula/working
   Pid Directory = /var/run
   Maximum Concurrent Jobs = 4

   PKI Signatures = Yes            # Enable Data Signing
   PKI Encryption = Yes            # Enable Data Encryption
   PKI Keypair = "/etc/bacula/dad-fd.pem"    # Public and Private Keys
   PKI Master Key = "/etc/bacula/master.cert"    # ONLY the Public Key
}

I reset the whole system and start anew.

A backup of the same file differs:

  FD Bytes Written:       1,120 (1.120 KB)
  SD Bytes Written:       1,527 (1.527 KB)
  Rate:                   0.1 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             yes
  Volume name(s):         Vol001

Note how a larger amount of data has been written compared to the original size above, which is natural: data encryption causes overhead. And note how the file daemon informs us that encryption was performed.

Looking at a hex dump of the output file Vol001 clearly indicates that the plain text is no longer visible:

[Image: Bacula encrypted]

What remains visible, though, is the file's metadata (i.e. the path names); Bacula doesn't encrypt them, but this is clearly documented.
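Instead of eyeballing a hex dump, the same check can be scripted. The Python sketch below searches a volume file for a known byte string, reading in chunks so that large volumes don't have to fit in memory. The function names are made up, and the volume path and the search phrase are whatever you pass on the command line; neither is taken from the setup above. Note that the test is only meaningful with software compression off, as it is in the job reports shown.

```python
import io
import sys


def stream_contains(stream: io.BufferedIOBase, needle: bytes,
                    chunk_size: int = 1 << 20) -> bool:
    """Return True if `needle` occurs anywhere in the binary stream."""
    if not needle:
        raise ValueError("needle must not be empty")
    overlap = len(needle) - 1  # bytes to carry over between chunks
    tail = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return False
        window = tail + chunk
        if needle in window:
            return True
        # Keep the end of the window so a match spanning two chunks is found.
        tail = window[-overlap:] if overlap else b""


def volume_contains(path: str, needle: bytes) -> bool:
    """Open the file at `path` and search it for `needle`."""
    with open(path, "rb") as fh:
        return stream_contains(fh, needle)


if __name__ == "__main__":
    # Usage: python check_volume.py <path-to-volume-file> <phrase>
    found = volume_contains(sys.argv[1], sys.argv[2].encode("utf-8"))
    print("plain text found" if found else "plain text not found")
```

Run against the unencrypted volume with a phrase from the backed-up file, it should report a match; against the encrypted volume it should not, while a path name from the backup would still be found.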

I'm sure the little bit of overhead in configuration and stored data will make many a systems administrator sleep more soundly. I certainly know that I will. ;-)