<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Nikolaus Rath's Website</title><link href="http://www.rath.org/" rel="alternate"></link><link href="http://www.rath.org/atom.xml" rel="self"></link><id>http://www.rath.org/</id><updated>2022-09-29T00:00:00-07:00</updated><entry><title>ZFS-on-NBD - My Verdict</title><link href="http://www.rath.org/zfs-on-nbd-my-verdict.html" rel="alternate"></link><published>2022-09-29T00:00:00-07:00</published><updated>2022-09-29T00:00:00-07:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2022-09-29:/zfs-on-nbd-my-verdict.html</id><summary type="html">&lt;p&gt;I've been evaluating a &lt;a class="reference external" href="http://www.rath.org/s3ql-vs-zfs-on-nbd.html"&gt;ZFS-on-NBD&lt;/a&gt; based backup system for
several weeks. I've now decided whether to adopt this new setup or stick with
S3QL.&lt;/p&gt;
&lt;p&gt;To make it short, I will drop my experiments with ZFS and continue using S3QL.&lt;/p&gt;
&lt;p&gt;The main reasons for this are that the advantages of using ZFS are not as big as I had
hoped, and that I discovered additional problems with the setup. But let me start with the
positive things …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I've been evaluating a &lt;a class="reference external" href="http://www.rath.org/s3ql-vs-zfs-on-nbd.html"&gt;ZFS-on-NBD&lt;/a&gt; based backup system for
several weeks. I've now decided whether to adopt this new setup or stick with
S3QL.&lt;/p&gt;
&lt;p&gt;To make it short, I will drop my experiments with ZFS and continue using S3QL.&lt;/p&gt;
&lt;p&gt;The main reasons for this are that the advantages of using ZFS are not as big as I had
hoped, and that I discovered additional problems with the setup. But let me start with the
positive things.&lt;/p&gt;
&lt;p&gt;To recap: my final setup consisted of a ZFS pool spanning two bcache devices (one
regular vdev, one special vdev). Each bcache device was backed by a loopback-mounted
cache file and an NBD device. The NBD devices were backed by S3 buckets accessed through
s3backer running as an nbdkit plugin, one with a large object size and one with a small
object size (for more details, refer to my &lt;a class="reference external" href="http://www.rath.org/s3ql-vs-zfs-on-nbd.html"&gt;previous post&lt;/a&gt;).&lt;/p&gt;
&lt;p&gt;Having tested this for a few weeks, I found that:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Backups, overall, ran fine and without issues.&lt;/li&gt;
&lt;li&gt;Hibernation/suspend and resume worked flawlessly even when the backup was running. By
limiting the bandwidth of the rsync process, I was also able to prevent the bcache
from filling, and thus able to suspend even when backing up more data than would fit
into the cache.&lt;/li&gt;
&lt;li&gt;Performance of file system traversal was good enough for me not to notice it (in
particular since it did not block suspend).&lt;/li&gt;
&lt;li&gt;Trim performance was not a concern either (though I used s3backer's &lt;em&gt;--listBlocks&lt;/em&gt;
flag).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, several things did not pan out as I had hoped:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;The use of the special vdev for smaller writes did not work as expected: large amounts
of data were still written to this device with large request sizes, resulting in needless
write splitting (see below for details).&lt;/li&gt;
&lt;li&gt;Similarly, the regular vdev (backed by the bucket with larger object size) received a
large number of small (4k) requests, resulting in needless write amplification.&lt;/li&gt;
&lt;li&gt;I realized that there is no way to invalidate the local bcache (short of manually
deleting it). This means that when the bucket is mounted from different systems, data
corruption will eventually result (because the contents of the cache no longer reflect
what's actually stored in the bucket).&lt;/li&gt;
&lt;li&gt;At times, the setup script would fail with a &lt;em&gt;No Such Device&lt;/em&gt; error when writing to
&lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;/sys/block/{nbd_dev}/bcache/stop&lt;/span&gt;&lt;/tt&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I have no doubts that at least the last two issues could be resolved in some way while
keeping the overall setup mostly the same. However, I think this setup is already well
past the point where it is more complex and fragile than using S3QL + FUSE - even if S3QL
is out-of-tree and has a much smaller userbase. Combined with the increased traffic and
latency due to the frequent mismatches between write request size and object size, I have
therefore decided that this is not a route I want to pursue further.&lt;/p&gt;
&lt;p&gt;Still, I feel like I've learned a lot along the way, and I certainly feel more confident
in my use of S3QL again! I might even make another effort to finally find a solution for
incremental uploads/downloads of the metadata database :-).&lt;/p&gt;
&lt;div class="section" id="appendix-request-size-distribution"&gt;
&lt;h2&gt;Appendix: Request Size Distribution&lt;/h2&gt;
&lt;p&gt;In the small object size bucket (4k), the vast majority of write requests had the expected
4k size:&lt;/p&gt;
&lt;img alt="Distribution of write request sizes in 4k bucket by number" src="http://www.rath.org/images/zfs-4k-requests.png" /&gt;
&lt;p&gt;But when looking at the total amount of data that was written, less than 5% of it was
written with 4k requests:&lt;/p&gt;
&lt;img alt="Distribution of write request sizes in 4k bucket by bytes written" src="http://www.rath.org/images/zfs-4k-bytes.png" /&gt;
&lt;p&gt;In the large object size bucket (256k), the percentage of large write requests was bigger
than in the 4k bucket, but 4k writes were still the most common write request size:&lt;/p&gt;
&lt;img alt="Distribution of write request sizes in 256k bucket by number" src="http://www.rath.org/images/zfs-256k-requests.png" /&gt;
&lt;p&gt;But at least by volume, most of the data was written in larger requests:&lt;/p&gt;
&lt;img alt="Distribution of write request sizes in 4k bucket by bytes written" src="http://www.rath.org/images/zfs-256k-bytes.png" /&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="Programming"></category><category term="Linux"></category></entry><entry><title>S3QL vs ZFS-on-NBD</title><link href="http://www.rath.org/s3ql-vs-zfs-on-nbd.html" rel="alternate"></link><published>2022-09-12T00:00:00-07:00</published><updated>2022-09-12T00:00:00-07:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2022-09-12:/s3ql-vs-zfs-on-nbd.html</id><summary type="html">&lt;p&gt;When I created &lt;a class="reference external" href="https://github.com/s3ql/s3ql/"&gt;S3QL&lt;/a&gt; in 2008, the Cloud Storage and Linux filesystem landscape looked
rather different than today: there was no filesystem that supported compression,
encryption, or de-duplication of data. The only relevant cloud storage system was Amazon
S3, and it only offered eventual consistency.&lt;/p&gt;
&lt;p&gt;S3QL was designed to fill these gaps: I wanted to be able to store compressed, encrypted,
and de-duplicated backups in the cloud, without being tied to a particular backup
software. I think it has served …&lt;/p&gt;</summary><content type="html">&lt;p&gt;When I created &lt;a class="reference external" href="https://github.com/s3ql/s3ql/"&gt;S3QL&lt;/a&gt; in 2008, the Cloud Storage and Linux filesystem landscape looked
rather different than today: there was no filesystem that supported compression,
encryption, or de-duplication of data. The only relevant cloud storage system was Amazon
S3, and it only offered eventual consistency.&lt;/p&gt;
&lt;p&gt;S3QL was designed to fill these gaps: I wanted to be able to store compressed, encrypted,
and de-duplicated backups in the cloud, without being tied to a particular backup
software. I think it has served this use case very well, and I am still using it today.&lt;/p&gt;
&lt;p&gt;However, nowadays there are filesystems like ZFS which offer compression, encryption and
de-duplication, and all the major cloud storage systems offer immediate consistency for
all operations. In other words, large parts of S3QL now replicate functionality that is also
available in in-kernel filesystems, while other parts provide features that are no longer
required (consistency handling).&lt;/p&gt;
&lt;p&gt;This article documents my experience with setting up a cloud-based backup system that
offers features very similar to S3QL, but is based on ZFS and network block devices
(NBDs).&lt;/p&gt;
&lt;div class="contents topic" id="contents"&gt;
&lt;p class="topic-title"&gt;Contents&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference internal" href="#filesystems-vs-block-devices" id="toc-entry-1"&gt;Filesystems vs Block Devices&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#s3backer" id="toc-entry-2"&gt;s3backer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#nbdkit" id="toc-entry-3"&gt;nbdkit&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#object-size-considerations" id="toc-entry-4"&gt;Object Size Considerations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#initial-success" id="toc-entry-5"&gt;Initial Success&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#alignment-issues" id="toc-entry-6"&gt;Alignment Issues&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#suspend-hibernation" id="toc-entry-7"&gt;Suspend/Hibernation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#metadata-performance" id="toc-entry-8"&gt;Metadata Performance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#adding-caching" id="toc-entry-9"&gt;Adding caching&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#l2-arc" id="toc-entry-10"&gt;L2 ARC&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#s3backer-over-nbd" id="toc-entry-11"&gt;S3backer over NBD&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#bcache" id="toc-entry-12"&gt;bcache&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#back-to-s3backer" id="toc-entry-13"&gt;Back to s3backer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference internal" href="#summary" id="toc-entry-14"&gt;Summary&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="filesystems-vs-block-devices"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-1"&gt;Filesystems vs Block Devices&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The thing that has always bugged me most about S3QL is the handling of filesystem
metadata. On the plus side, metadata operations are very fast. On the minus side, a full
metadata copy needs to be available locally and needs to be downloaded/uploaded completely
on every mount/umount. I have spent a lot of time thinking about ways to change this
(e.g. by uploading a transaction log rather than the complete metadata, and/or splitting
the data across multiple storage objects) but never came up with something that seemed
worth the additional complexity. Therefore, I wanted to try an entirely different
approach: instead of implementing a file system, implement a network-backed block storage
device and run ZFS on top of that.&lt;/p&gt;
&lt;p&gt;The appeal of this solution is that a block storage device needs to implement just a small
number of very simple operations (read, write, discard), while file system semantics (and
in particular the handling of metadata) are provided by ZFS.&lt;/p&gt;
&lt;p&gt;This idea is not new. Even though a driver for userspace-backed block devices was only
&lt;a class="reference external" href="https://lwn.net/Articles/903855/"&gt;just added to io_uring&lt;/a&gt;, the &lt;a class="reference external" href="https://docs.kernel.org/admin-guide/blockdev/nbd.html"&gt;NBD&lt;/a&gt; driver has been in
the kernel since at least 2004 and &lt;a class="reference external" href="https://github.com/archiecobbs/s3backer"&gt;s3backer&lt;/a&gt; (which provides an S3-backed block device
through FUSE) has been around for as long as S3QL. However, in the past none of these
solutions appealed to me because there was no suitable (local) filesystem to combine them
with. Handling encryption, compression, and snapshots in the block layer does not feel to me
like a simplification over what S3QL is doing.&lt;/p&gt;
&lt;p&gt;However, the rise of ZFS and BTRFS, and the availability of storage services with
immediate consistency has changed this. Therefore, a few months ago I decided to try
setting up a backup system using ZFS on a cloud-backed block device.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="s3backer"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-2"&gt;s3backer&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;As mentioned above, &lt;a class="reference external" href="https://github.com/archiecobbs/s3backer"&gt;S3Backer&lt;/a&gt; provides a FUSE filesystem just like &lt;a class="reference external" href="https://github.com/s3ql/s3ql/"&gt;S3QL&lt;/a&gt;. However, the
filesystem contains only one large file that is intended to be loopback-mounted to provide
a block device, which can in turn be used with any (local) filesystem one desires.&lt;/p&gt;
&lt;p&gt;S3Backer has a complex codebase that offers many features: compression, encryption,
de-duplication, caching, and support for eventually consistent storage systems. Therefore,
these features are available even when using a local filesystem that does not support them
on its own.&lt;/p&gt;
&lt;p&gt;To me, the architecture of s3backer has always felt somewhat clunky - why go through the
trouble of implementing a filesystem if all we want is a block device? This is not just a
cosmetic concern, but comes with a number of practical disadvantages:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;The kernel serializes FUSE write and read requests on a per-file basis - which means
that the entire block device will have its write and read requests serialized with no
concurrency at the kernel - userspace interface.&lt;/li&gt;
&lt;li&gt;Read and write request sizes cannot exceed 128 kB.&lt;/li&gt;
&lt;li&gt;All data is present in the page cache twice (once for the &amp;quot;upper&amp;quot; filesystem, once for
the FUSE filesystem).&lt;/li&gt;
&lt;li&gt;All requests pass through the VFS twice.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In the beginning, I therefore looked for a different option (however, as we'll see I later
came back to s3backer).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="nbdkit"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-3"&gt;nbdkit&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="https://gitlab.com/nbdkit/nbdkit"&gt;nbdkit&lt;/a&gt; is a framework for writing &lt;a class="reference external" href="https://docs.kernel.org/admin-guide/blockdev/nbd.html"&gt;NBD&lt;/a&gt; servers, which can then be used with the kernel's
NBD client to provide a userspace-backed block device. nbdkit supports &amp;quot;plugins&amp;quot; (that
implement data store/retrieval) as well as &amp;quot;filters&amp;quot; (which can be stacked to transform
requests), both of which can be written in a multitude of programming languages.&lt;/p&gt;
&lt;p&gt;nbdkit already came with an example plugin that provided a read-only view of a single
storage object hosted on S3. It is written in Python and seemed straightforward to
extend. Furthermore, nbdkit has an extensive test suite and merge requests are carefully
reviewed, so I felt confident that I'd be able to extend this plugin quickly to do what I
would need, while still having a high confidence in the correctness of the code.&lt;/p&gt;
&lt;p&gt;nbdkit's plugin interface also meant that I could arbitrarily switch out the S3 plugin for
&amp;quot;memory&amp;quot; and &amp;quot;file&amp;quot; plugins that stored all data in memory (or a local file) instead of
going over the network, and use the &lt;em&gt;rate&lt;/em&gt; and &lt;em&gt;delay&lt;/em&gt; filters to simulate the effects
of network latency and bandwidth.&lt;/p&gt;
&lt;p&gt;For these reasons, I decided to use nbdkit for my experiments. I started by contributing a
number of features to the S3 plugin, adding write support, trim support, and the ability
to split the data among multiple storage objects. I also extended the &amp;quot;stats&amp;quot; filter to
keep track of request size and alignment distributions.&lt;/p&gt;
&lt;p&gt;Creating the block device is then (relatively) simple:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
$ nbdkit --unix my_socket S3 size=50G bucket=nikratio-test \
  key=my_data object-size=4K &amp;amp;  # start NBD server

$ nbd-client -unix my_socket /dev/nbd0  # connect /dev/nbd0
&lt;/pre&gt;
&lt;/div&gt;
&lt;div class="section" id="object-size-considerations"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-4"&gt;Object Size Considerations&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The first design decision that I had to make was what object size to use on the S3 side,
i.e. into how many different objects my filesystem would be split.&lt;/p&gt;
&lt;p&gt;Ideally, the size of storage objects matches the size of NBD read/write requests received
from the kernel:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;If an NBD read/write is larger than the object size then it has to be spread across
multiple storage objects. This means there is higher transfer overhead (since we need to
transfer HTTP response/request headers for every object), higher costs (since every HTTP
request is charged), and lower effective bandwidth (due to accumulation of round-trip
latency, though this could be worked around by sending HTTP requests concurrently).&lt;/li&gt;
&lt;li&gt;If an NBD write is smaller than the object size, then we first have to read the entire
storage object, apply the update locally, and then write the entire object back
again. This means we incur transfer overhead, higher transfer costs, higher per-request
cost, and lower effective bandwidth.&lt;/li&gt;
&lt;li&gt;If an NBD read is smaller than the object size, there are no negative consequences
(since we can read just the required part of the storage object).&lt;/li&gt;
&lt;/ul&gt;
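&lt;p&gt;The read-modify-write cycle for sub-object writes can be sketched with a toy object
store (illustrative Python only - not the actual nbdkit plugin code - with simplified
request counting):&lt;/p&gt;
&lt;pre class="literal-block"&gt;
class ToyObjectStore:
    """S3-like object store exposed as a block device, counting HTTP requests."""

    def __init__(self, object_size=4096):
        self.object_size = object_size
        self.objects = {}  # object index to object contents (bytes)
        self.gets = 0
        self.puts = 0

    def write(self, offset, data):
        osize = self.object_size
        first = offset // osize
        last = (offset + len(data) - 1) // osize
        for i in range(first, last + 1):
            base = i * osize
            lo = max(offset, base)
            hi = min(offset + len(data), base + osize)
            if hi - lo == osize:
                # Write covers the whole object, a single PUT suffices.
                buf = bytearray(osize)
            else:
                # Partial overwrite: fetch the object and patch it locally.
                buf = bytearray(self.objects.get(i, bytes(osize)))
                self.gets += 1
            buf[lo - base:hi - base] = data[lo - offset:hi - offset]
            self.objects[i] = bytes(buf)
            self.puts += 1

store = ToyObjectStore(object_size=4096)
store.write(0, bytes(4096))     # aligned, full object: one PUT, no GET
store.write(2048, bytes(4096))  # straddles two objects: two GETs, two PUTs
&lt;/pre&gt;
&lt;p&gt;A 4 kB write that straddles an object boundary thus costs four HTTP requests instead
of one.&lt;/p&gt;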
&lt;p&gt;Unfortunately, the size of NBD read/write requests is not fixed, so choosing the right
object size is not trivial (which is why I did a lot of experiments using the &lt;em&gt;stats&lt;/em&gt;
filter to determine what request sizes are encountered in practice).&lt;/p&gt;
&lt;p&gt;The smallest reasonable storage object size is the block size of the filesystem (since we
know that every modification will at least result in one changed block). The largest
reasonable storage object size is, in turn, the maximum size of an NBD request, which can be up
to 32 MB.&lt;/p&gt;
&lt;p&gt;For ZFS, the block size is dynamic (&lt;a class="reference external" href="https://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSFileRecordsizeGrowth"&gt;details&lt;/a&gt;) and varies
between &lt;em&gt;2^ashift&lt;/em&gt; and &lt;em&gt;recordsize&lt;/em&gt; (both of which are ZFS configuration parameters). We
want the gap between these values to be reasonably large (otherwise e.g. compression will
not be effective, because it is done in &lt;em&gt;recordsize&lt;/em&gt; chunks and the compressed data still
takes at least &lt;em&gt;2^ashift&lt;/em&gt; bytes).&lt;/p&gt;
&lt;p&gt;Reasonable &lt;em&gt;ashift&lt;/em&gt; values (according to internet wisdom) are between 9 and 12,
corresponding to minimum block sizes of 512 to 4096 bytes. Requests of this minimum size
are also likely to be encountered frequently in practice as a result of e.g. metadata
changes.&lt;/p&gt;
&lt;p&gt;On the other hand, when storing multi-MB files in the filesystem, most of the data will be
stored in full &lt;em&gt;recordsize&lt;/em&gt; blocks. If such data blocks happen to be placed adjacently
(which is common for e.g. ext4, but less likely for ZFS), even larger NBD requests can
result.&lt;/p&gt;
&lt;p&gt;So whichever object size we choose, we will most likely incur significant penalties -
either for large or for small NBD write requests. To put this into numbers (assuming
Amazon S3 pricings from today, an upload bandwidth of 3 MB/s, 30 ms ping times to
S3 servers, and 512 bytes of HTTP metadata):&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Uploading 1 GB of data using an object size of 4 kB will result in $ 1.31 of extra
per-request charges, roughly 10% request overhead, and roughly 20x bandwidth reduction
due to latency. (In theory we could avoid the bandwidth reduction by artificially
limiting NBD requests to 4 kB, so that the splitting happens in the kernel and upload
happens concurrently, but as we'll see below this is currently not an option).&lt;/li&gt;
&lt;li&gt;Uploading 4096 bytes of data using an object size of 512 kB will result in 255-fold
write amplification/bandwidth reduction (512 kB of extra read plus 508 kB of additional
write).&lt;/li&gt;
&lt;/ul&gt;
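&lt;p&gt;Just to convince myself, here is the same arithmetic in Python (a back-of-the-envelope
sketch; $0.005 per 1000 PUT requests is the request price I assumed, and the header
overhead comes from the 512 bytes of HTTP metadata):&lt;/p&gt;
&lt;pre class="literal-block"&gt;
# Scenario 1: uploading 1 GiB in 4 kB objects.
GiB = 2 ** 30
object_size = 4 * 1024
n_requests = GiB // object_size              # 262144 PUT requests
extra_cost = n_requests * 0.005 / 1000       # about $1.31 in request charges
header_overhead = 512 / object_size          # 12.5% extra transfer

# Bandwidth reduction from sequential round trips:
upload_bw = 3e6                              # 3 MB/s
rtt = 0.030                                  # 30 ms
t_transfer = object_size / upload_bw         # about 1.4 ms per object
slowdown = (t_transfer + rtt) / t_transfer   # about 23x

# Scenario 2: a 4 kB write into a 512 kB object.
obj = 512 * 1024
extra_bytes = obj + (obj - 4096)             # full extra read plus extra write
amplification = extra_bytes / 4096           # 255-fold

print(round(extra_cost, 2), round(slowdown), round(amplification))
&lt;/pre&gt;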
&lt;p&gt;Luckily, when using ZFS we can ameliorate this problem: ZFS can split blocks between two
kinds of vdevs (backing devices) depending on their size. So if we assemble a zpool from a
regular vdev (backed by one storage bucket) and a &lt;em&gt;special&lt;/em&gt; vdev (backed by a different
storage bucket), then the regular vdev will only see writes that are larger than
&lt;em&gt;special_small_blocks&lt;/em&gt; (another ZFS configuration parameter). We can therefore safely set
the object size for the bucket behind the regular vdev to match &lt;em&gt;recordsize&lt;/em&gt;, while
choosing something smaller for the bucket behind the &lt;em&gt;special&lt;/em&gt; vdev.&lt;/p&gt;
&lt;p&gt;For my experiments, I picked &lt;em&gt;ashift=12&lt;/em&gt;, &lt;em&gt;recordsize=512k&lt;/em&gt; and
&lt;em&gt;special_small_blocks=128k&lt;/em&gt; (I really wanted to use 256 kB, but ran into &lt;a class="reference external" href="https://github.com/openzfs/zfs/issues/13815"&gt;a bug&lt;/a&gt;). For the storage object sizes, I chose 4
kB and 128 kB.&lt;/p&gt;
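&lt;p&gt;The resulting routing rule, as I understand it, boils down to the following (a sketch
only - the real ZFS allocator has more special cases, and e.g. metadata also goes to the
special vdev):&lt;/p&gt;
&lt;pre class="literal-block"&gt;
SPECIAL_SMALL_BLOCKS = 128 * 1024   # my special_small_blocks setting

def target_vdev(block_size):
    """Pick the vdev that a data block of the given size is written to."""
    # min(a, b) == a  is just  "a not larger than b", so blocks up to
    # special_small_blocks land on the special vdev.
    if min(block_size, SPECIAL_SMALL_BLOCKS) == block_size:
        return 'special'   # backed by the small-object-size bucket
    return 'regular'       # backed by the large-object-size bucket

for size in (4096, 131072, 524288):
    print(size, target_vdev(size))
&lt;/pre&gt;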
&lt;/div&gt;
&lt;div class="section" id="initial-success"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-5"&gt;Initial Success&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;First, the good news. The setup fundamentally worked: I was able to set up ZFS across
multiple NBD devices, and all ZFS operations worked just as for a local block device.&lt;/p&gt;
&lt;p&gt;Bringing this setup up and down correctly is a bit involved, so I created orchestration
scripts. The commands that were ultimately executed are:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
$ nbdkit --unix /tmp/tmplxtw074f/nbd_socket_sb --foreground --filter=exitlast \
  --filter=stats --threads 16 --filter=retry S3 size=50G bucket=nikratio-test \
  key=s3_plain_sb endpoint-url=http://s3.eu-west-2.amazonaws.com \
  statsfile=s3_plain_sb_stats.txt statsappend=true statsthreshold=100 retries=100 \
  retry-readonly=false retry-delay=30 retry-exponential=no object-size=4K

$ nbdkit --unix /tmp/tmplxtw074f/nbd_socket_lb --foreground --filter=exitlast \
  --filter=stats --threads 16 --filter=retry S3 size=50G bucket=nikratio-test \
  key=s3_plain_lb endpoint-url=http://s3.eu-west-2.amazonaws.com \
  statsfile=s3_plain_lb_stats.txt statsappend=true statsthreshold=100 retries=100 \
  retry-readonly=false retry-delay=30 retry-exponential=no object-size=128K

$ nbd-client -unix /tmp/tmplxtw074f/nbd_socket_sb --timeout 604800 /dev/nbd1
$ echo 32768 &amp;gt; /sys/block/nbd1/queue/max_sectors_kb
$ nbd-client -unix /tmp/tmplxtw074f/nbd_socket_lb --timeout 604800 /dev/nbd2
$ echo 32768 &amp;gt; /sys/block/nbd2/queue/max_sectors_kb
$ zpool create -f -R /zpools -o ashift=12 -o autotrim=on -o failmode=continue \
  -O acltype=posixacl -O relatime=on -O xattr=sa -O compression=zstd-19 -O checksum=sha256 \
  -O sync=disabled -O special_small_blocks=131072 -O redundant_metadata=most \
  -O recordsize=524288 -O encryption=on -O keyformat=passphrase \
  -O keylocation=file:///&amp;lt;path&amp;gt; s3_plain /dev/nbd2 special /dev/nbd1
&lt;/pre&gt;
&lt;p&gt;Creating the zpool, writing a 771 MB test file, and exporting the zpool again resulted
in the following NBD requests:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
4k bucket:
read: 162 ops, 0.000155 s, 4.41 MiB, 27.76 GiB/s op, 82.71 KiB/s total
write: 631 ops, 0.003885 s, 7.88 MiB, 1.98 GiB/s op, 147.90 KiB/s total

128k bucket:
read: 114 ops, 0.000117 s, 3.39 MiB, 28.27 GiB/s op, 64.75 KiB/s total
write: 1521 ops, 0.548140 s, 716.91 MiB, 1.28 GiB/s op, 13.39 MiB/s total
&lt;/pre&gt;
&lt;p&gt;(the total written data is less than 771 MB due to the ZFS compression).&lt;/p&gt;
&lt;p&gt;The distribution of NBD write request sizes was as follows:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
4 kB bucket:
   4096 bytes: 67.8% of requests (428)
   8192 bytes: 11.4% of requests (72)
  12288 bytes:  4.9% of requests (31)
  16384 bytes:  4.6% of requests (29)
  20480 bytes:  1.9% of requests (12)
 114688 bytes:  1.7% of requests (11)
  32768 bytes:  0.8% of requests (5)
  53248 bytes:  0.6% of requests (4)
 253952 bytes:  0.5% of requests (3)
  45056 bytes:  0.3% of requests (2)
  77824 bytes:  0.2% of requests (1)

128 kB bucket:
 524288 bytes: 81.1% of requests (1233)
   4096 bytes:  3.0% of requests (45)
1048576 bytes:  1.0% of requests (15)
 204800 bytes:  0.6% of requests (9)
 454656 bytes:  0.5% of requests (8)
 155648 bytes:  0.5% of requests (7)
 208896 bytes:  0.4% of requests (6)
 450560 bytes:  0.3% of requests (5)
 417792 bytes:  0.3% of requests (4)
 278528 bytes:  0.2% of requests (3)
 339968 bytes:  0.1% of requests (2)
 352256 bytes:  0.1% of requests (1)
&lt;/pre&gt;
&lt;p&gt;(Yes, there is a slight discrepancy between the histogram and the total count - this still
&lt;a class="reference external" href="https://gitlab.com/nbdkit/nbdkit/-/issues/7"&gt;needs investigating&lt;/a&gt;).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="alignment-issues"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-6"&gt;Alignment Issues&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Initially, I limited the size of NBD requests to the larger object size (128 kB) to enable
concurrent processing of larger requests (by virtue of the kernel splitting them into
smaller requests and sending them concurrently to userspace, and userspace processing them
with multiple threads).&lt;/p&gt;
&lt;p&gt;However, I soon found out that the kernel's NBD client does not align its requests to the
preferred block size of the NBD server. For example, the alignment of the 128 kB writes
was:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
131072 bytes: 95.7% of requests (5632)
      12 bit aligned: 100.0% (5632)
      13 bit aligned:  77.9% (4389)
      14 bit aligned:  59.8% (3369)
      15 bit aligned:  17.0% (959)
      16 bit aligned:  15.0% (843)
      17 bit aligned:  12.0% (677)
      18 bit aligned:   6.0% (336)
      19 bit aligned:   3.0% (168)
      20 bit aligned:   1.5% (84)
      21 bit aligned:   0.8% (43)
      22 bit aligned:   0.4% (22)
      23 bit aligned:   0.2% (11)
      24 bit aligned:   0.1% (5)
      25 bit aligned:   0.1% (3)
      26 bit aligned:   0.0% (2)
      27 bit aligned:   0.0% (1)
&lt;/pre&gt;
&lt;p&gt;Ideally, every single request should have been aligned to a 128 kB boundary (i.e., be
17-bit aligned). The fact that 88% of requests had no such alignment meant that they were
actually overlapping two storage objects, and thus required four HTTP requests (2
read-modify-write cycles). In other words, non-aligned writes are just as bad as writes
smaller than the object size.&lt;/p&gt;
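&lt;p&gt;To put the effect of misalignment into numbers, here is a rough way to count the HTTP
requests that a single NBD write causes (assuming one PUT per fully covered object and
one GET plus one PUT per partially covered object):&lt;/p&gt;
&lt;pre class="literal-block"&gt;
def http_requests(offset, length, object_size=128 * 1024):
    """Rough HTTP request count for one NBD write hitting the object store."""
    first = offset // object_size
    last = (offset + length - 1) // object_size
    total = 0
    for i in range(first, last + 1):
        base = i * object_size
        covered = min(offset + length, base + object_size) - max(offset, base)
        if covered == object_size:
            total += 1   # full object: a single PUT
        else:
            total += 2   # partial object: GET plus PUT (read-modify-write)
    return total

print(http_requests(0, 131072))     # aligned 128 kB write: 1 request
print(http_requests(4096, 131072))  # same write, misaligned by 4 kB: 4 requests
&lt;/pre&gt;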
&lt;p&gt;Changing this required &lt;a class="reference external" href="https://listman.redhat.com/archives/libguestfs/2022-June/029210.html"&gt;non-trivial changes to the kernel&lt;/a&gt; that were beyond
my current capabilities.&lt;/p&gt;
&lt;p&gt;To mitigate the impact of this, I instead set the NBD request size to the maximum value
(32 MB) rather than the object size. This means that large requests spanning multiple
objects are split by the NBD server rather than the kernel. At least with the current
nbdkit architecture, this means that the &amp;quot;sub-requests&amp;quot; are processed sequentially - but
on the plus side, the requests are split in the right positions to minimize write
amplification.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="suspend-hibernation"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-7"&gt;Suspend/Hibernation&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The second disappointing finding was that any activity on the NBD-backed filesystem made
it impossible to suspend (or hibernate) the system. Attempts to suspend generally ended
with the kernel giving up as follows:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
kernel: Freezing user space processes ...
kernel: Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
kernel: task:rsync           state:D stack:    0 pid:348105 ppid:348104 flags:0x00004004
kernel: Call Trace:
kernel:  &amp;lt;TASK&amp;gt;
kernel:  __schedule+0x308/0x9e0
kernel:  schedule+0x4e/0xb0
kernel:  schedule_timeout+0x88/0x150
kernel:  ? __bpf_trace_tick_stop+0x10/0x10
kernel:  io_schedule_timeout+0x4c/0x80
kernel:  __cv_timedwait_common+0x129/0x160 [spl]
kernel:  ? dequeue_task_stop+0x70/0x70
kernel:  __cv_timedwait_io+0x15/0x20 [spl]
kernel:  zio_wait+0x129/0x2b0 [zfs]
kernel:  dmu_buf_hold+0x5b/0x90 [zfs]
kernel:  zap_lockdir+0x4e/0xb0 [zfs]
kernel:  zap_cursor_retrieve+0x1ae/0x320 [zfs]
kernel:  ? dbuf_prefetch+0xf/0x20 [zfs]
kernel:  ? dmu_prefetch+0xc8/0x200 [zfs]
kernel:  zfs_readdir+0x12a/0x440 [zfs]
kernel:  ? preempt_count_add+0x68/0xa0
kernel:  ? preempt_count_add+0x68/0xa0
kernel:  ? aa_file_perm+0x120/0x4c0
kernel:  ? rrw_exit+0x65/0x150 [zfs]
kernel:  ? _copy_to_user+0x21/0x30
kernel:  ? cp_new_stat+0x150/0x180
kernel:  zpl_iterate+0x4c/0x70 [zfs]
kernel:  iterate_dir+0x171/0x1c0
kernel:  __x64_sys_getdents64+0x78/0x110
kernel:  ? __ia32_sys_getdents64+0x110/0x110
kernel:  do_syscall_64+0x38/0xc0
kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
kernel: RIP: 0033:0x7f03c897a9c7
kernel: RSP: 002b:00007ffd41e3c518 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
kernel: RAX: ffffffffffffffda RBX: 0000561eff64dd40 RCX: 00007f03c897a9c7
kernel: RDX: 0000000000008000 RSI: 0000561eff64dd70 RDI: 0000000000000000
kernel: RBP: 0000561eff64dd70 R08: 0000000000000030 R09: 00007f03c8a72be0
kernel: R10: 0000000000020000 R11: 0000000000000293 R12: ffffffffffffff80
kernel: R13: 0000561eff64dd44 R14: 0000000000000000 R15: 0000000000000001
kernel:  &amp;lt;/TASK&amp;gt;
&lt;/pre&gt;
&lt;p&gt;As far as I can tell, the problem is that while an NBD request is pending, the process
that waits for the result (in this case &lt;em&gt;rsync&lt;/em&gt;) is refusing to freeze. This happens no
matter how long the timeout is set, so I suspect that the root cause is that the NBD
server task (in this case &lt;em&gt;nbdkit&lt;/em&gt;) has already been frozen, so the client process is
unable to make progress to a state where it can be frozen.&lt;/p&gt;
&lt;p&gt;Interestingly enough, the same should apparently happen with FUSE (see &lt;a class="reference external" href="https://lore.kernel.org/linux-fsdevel/87k06qb5to.fsf&amp;#64;vostro.rath.org/T/#u"&gt;kernel list
discussion&lt;/a&gt;)
but - at least for me - almost never happens in practice. So either I was exceptionally
lucky, or something else is going on (I suspect that maybe a task waiting for FUSE I/O
enters interruptible sleep, while a task waiting for ZFS I/O enters uninterruptible
sleep).&lt;/p&gt;
&lt;p&gt;As an experiment, I tried renaming the NBD server to &lt;tt class="docutils literal"&gt;zzz_nbdkit&lt;/tt&gt; (hoping that freezing
goes in alphabetical order), but it did not help.&lt;/p&gt;
&lt;p&gt;I also discovered that attempting to suspend while &lt;tt class="docutils literal"&gt;zpool export&lt;/tt&gt; is running is a very
bad idea. In contrast to &amp;quot;regular&amp;quot; client processes, the kernel here hangs &lt;em&gt;before&lt;/em&gt;
attempting to freeze userspace:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
kernel: PM: suspend entry (deep)
kernel: Filesystems sync: 661.109 seconds
[...]
kernel: Freezing user space processes ...
&lt;/pre&gt;
&lt;p&gt;In between the first two messages, the system is in a weird, semi-suspended state. This
means that e.g. WiFi is no longer available, which may well prevent the sync from ever
completing if it requires data to go over the network. The same probably applies to some
other devices too. Furthermore, the system will refuse to restart, suspend, or power off.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="metadata-performance"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-8"&gt;Metadata Performance&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The third issue is something that, in hindsight, I probably should have expected: with the
metadata being downloaded piece by piece, rather than all at once, metadata operations
with ZFS-on-NBD are a lot slower than with S3QL.&lt;/p&gt;
&lt;p&gt;I did not, however, expect them to be that slow. Running a simple &lt;tt class="docutils literal"&gt;find &lt;span class="pre"&gt;-type&lt;/span&gt; d &lt;span class="pre"&gt;-exec&lt;/span&gt;
ls &lt;span class="pre"&gt;-l&lt;/span&gt; {} \;&lt;/tt&gt; on a directory tree with 687 files required 28 seconds - i.e. the pace is
roughly 25 files per second. (The total amount of data downloaded for this was 9.6 MB, but
this was most likely dominated by the transfer from mounting and unmounting.)&lt;/p&gt;
&lt;p&gt;This means that even a backup where no files have been changed requires a long time to
run. I could probably still accept this if I could suspend the system during that time,
but taken together with the inability to suspend this becomes a dealbreaker for
me. Turning on your laptop briefly to check something, and then being unable to turn it
off again because a backup has kicked in is very frustrating.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="adding-caching"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-9"&gt;Adding caching&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;At this point I accepted that I would probably have to add some sort of persistent cache to
my setup. At least in theory, a cache should be able to solve all of the above problems. I
identified four options for this:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;Use ZFS's L2ARC&lt;/li&gt;
&lt;li&gt;Use s3backer and its caching feature&lt;/li&gt;
&lt;li&gt;Use bcache&lt;/li&gt;
&lt;li&gt;Add persistent caching support to nbdkit (I did not look into this in more detail).&lt;/li&gt;
&lt;/ol&gt;
&lt;/div&gt;
&lt;div class="section" id="l2-arc"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-10"&gt;L2 ARC&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;The L2ARC seemed like the most natural solution for this since it is part of ZFS. My hope
was that:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Since it's tightly integrated into ZFS, it would not cache the same data twice (in
ARC/page-cache and L2ARC)&lt;/li&gt;
&lt;li&gt;Since it's in-kernel, writes to the cache should not be affected by the freezing of
userspace, enabling suspend during I/O.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Unfortunately, I discovered that the way the L2ARC is designed makes it pretty useless for
my use-case.&lt;/p&gt;
&lt;p&gt;First, the L2 ARC is not a writeback cache, so suspend continued to be impossible while
there was active I/O.&lt;/p&gt;
&lt;p&gt;Secondly, I observed absolutely no performance increase. I tried adding an L2 ARC backed
by a local, on-disk file and, to maximize cache filling, set &lt;em&gt;l2arc_noprefetch&lt;/em&gt; to zero,
&lt;em&gt;l2arc_headroom&lt;/em&gt; to 1000, and &lt;em&gt;l2arc_write_max&lt;/em&gt; to 100 MB. I then mounted and accessed the
system a few times. However, performance for metadata operations remained exactly the same
as without the cache.&lt;/p&gt;
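For reference, these tunables are OpenZFS module parameters that can be changed at runtime. A sketch of the values I used (with 100 MB expressed in bytes):

```shell
# L2ARC tuning used in the experiment above (run as root; these are
# OpenZFS module parameters under /sys/module/zfs/parameters/).
echo 0         | sudo tee /sys/module/zfs/parameters/l2arc_noprefetch
echo 1000      | sudo tee /sys/module/zfs/parameters/l2arc_headroom
echo 104857600 | sudo tee /sys/module/zfs/parameters/l2arc_write_max
```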
&lt;p&gt;It seems that for whatever reason, the data that's needed for simple directory walking
does not make it into the L2 ARC. I spent some time studying the &lt;a class="reference external" href="https://github.com/openzfs/zfs/blob/zfs-2.1.6-staging/module/zfs/arc.c#L131"&gt;L2 ARC Feeding Mechanism&lt;/a&gt; but still
could not figure out why. So I gave up on this approach.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="s3backer-over-nbd"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-11"&gt;S3backer over NBD&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Since I couldn't get the L2 ARC to work, I decided to instead give s3backer a
try after all (since it looked like it provided the kind of caching that I needed).&lt;/p&gt;
&lt;p&gt;s3backer being in userspace, I did not expect it to make a difference for the ability to
suspend during I/O. However, I was hoping that I'd see much improved metadata performance
and better write throughput (since the cache completely decouples NBD requests from HTTP
requests to cloud storage).&lt;/p&gt;
&lt;p&gt;My first step was to contribute NBD support for s3backer, i.e. I made it possible to run
s3backer as an NBD server (in the form of another nbdkit plugin) rather than a FUSE server. I
also disabled as much functionality as possible (no compression, no encryption, no MD5
verification). In this setup, we still have duplication between ARC and page cache, but I
contributed patches to at least advise the kernel to drop page cache data quickly.&lt;/p&gt;
&lt;p&gt;Unfortunately working on the S3Backer codebase was quite frustrating for me for several
reasons:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;The code is written in C and uses structs of function pointers to provide flexible
layering of components. This means that given a function call in the code, there is no
easy way to jump to the code implementing the function (at least I didn't manage to do
so with both Emacs and VS Code). This makes it hard to navigate and understand the
codebase.&lt;/li&gt;
&lt;li&gt;There are no unit or integration tests. This means that when making
changes there was no easy way to check that I didn't break something.&lt;/li&gt;
&lt;li&gt;When testing my own changes, I repeatedly encountered bugs in the master branch - which
initially I kept attributing to my own changes (examples: &lt;a class="reference external" href="https://github.com/archiecobbs/s3backer/issues/191"&gt;#191&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/archiecobbs/s3backer/issues/184#issuecomment-1133519037"&gt;#184&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Almost every time I would try to run s3backer (e.g. after pulling a new version, or
wanting to try a different configuration) it would not work (examples: &lt;a class="reference external" href="https://github.com/archiecobbs/s3backer/issues/188"&gt;#188&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/archiecobbs/s3backer/issues/179"&gt;#179&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/archiecobbs/s3backer/issues/175"&gt;#175&lt;/a&gt;, &lt;a class="reference external" href="https://github.com/archiecobbs/s3backer/issues/174"&gt;#174&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The last point probably deserves some additional explanation. As far as I know, s3backer
is used in production by many users, so it cannot truly be as unreliable as it felt to
me. I therefore suspect that the reason for my unfortunate experience is the combination
of s3backer's extreme configurability (there is a huge number of configuration parameters)
with the absence of tests.&lt;/p&gt;
&lt;p&gt;My theory is that the active S3backer users have been using it with the same, unchanging
configuration for a long time (so bugs with that configuration have long been found and
fixed). In contrast, I probably used a configuration that - while theoretically
supported - has never been used by someone in practice and was therefore entirely
untested. To s3backer's credit, the bugs that I encountered were generally fixed quickly
(or a workaround was provided), but it still made for a frustrating experience.&lt;/p&gt;
&lt;p&gt;Nevertheless, with the help of s3backer's main developer I was able to eventually get
things running as I wanted. Once again, at first sight things seemed well:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;I was able to switch between s3backer and nbdkit's S3 plugin at will, accessing the same
data.&lt;/li&gt;
&lt;li&gt;Caching worked as expected, making metadata access much faster&lt;/li&gt;
&lt;li&gt;Alignment effects were (presumably) reduced (I did not test this systematically).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;However, I also ran into new issues:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;By default, s3backer's trim/discard operation is very inefficient. Given a range to
discard, s3backer unconditionally tries to delete every object in this range, even if
none of them actually exist (the S3 nbdkit plugin, in contrast, issues a scoped LIST
request first and only deletes the objects that actually exist).&lt;/li&gt;
&lt;li&gt;The alternative is to configure s3backer to download the full list of objects on startup
and keep it in memory. This makes discard requests faster, but it means that the time
and memory consumption to mount the filesystem becomes proportional to the size of the
filesystem, i.e. the very thing that I disliked about S3QL and wanted to eliminate.&lt;/li&gt;
&lt;li&gt;s3backer's cache is LRU rather than LFU. So when writing a lot of data, this data
pushes the metadata out of the cache. What I really needed, it seems, was an LFU or
writearound cache.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="bcache"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-12"&gt;bcache&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;My third attempt to add caching to my setup featured &lt;a class="reference external" href="https://bcache.evilpiepirate.org/"&gt;bcache&lt;/a&gt;, a kernel-side caching layer
for block devices.&lt;/p&gt;
&lt;p&gt;I had used bcache in the past as a local, SSD-based cache for HDDs. Based on this, I
already had some reservations:&lt;/p&gt;
&lt;p&gt;Firstly, the interface is clumsy. Removing/unregistering bcache devices is hard. It
requires writing commands into multiple files in &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;/sys/{fs,block}&lt;/span&gt;&lt;/tt&gt; in the right order,
and then making sure that nothing triggers re-registration through udev rules.&lt;/p&gt;
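The teardown sequence I ended up with looked roughly like this. This is a sketch: the device name &lt;tt class="docutils literal"&gt;bcache0&lt;/tt&gt; and the cache-set UUID are placeholders for whatever your system shows.

```shell
# Sketch of cleanly tearing down a bcache device (run as root).
# bcache0 and the cache-set UUID are placeholders for your setup.

# 1. Detach the cache set from the backing device (flushes dirty data).
echo 1 | sudo tee /sys/block/bcache0/bcache/detach

# 2. Stop (unregister) the cache set itself; the UUID is listed
#    under /sys/fs/bcache/.
echo 1 | sudo tee /sys/fs/bcache/CACHE-SET-UUID/stop

# 3. Stop the backing device. Only after this completes is it safe
#    to disconnect the underlying NBD device.
echo 1 | sudo tee /sys/block/bcache0/bcache/stop
```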
&lt;p&gt;Secondly, bcache seems to have been effectively abandoned (I suspect at the point when it
was sufficiently stable to serve as a proof of concept for bcachefs). This is reflected in
there being some &lt;a class="reference external" href="https://bugzilla.kernel.org/show_bug.cgi?id=197377"&gt;major unresolved bugs&lt;/a&gt; and the documentation being sparse,
partially out of date, and split across too many locations (e.g. &lt;em&gt;make-bcache(3)&lt;/em&gt; does not
agree with &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;make-bcache&lt;/span&gt; &lt;span class="pre"&gt;--help&lt;/span&gt;&lt;/tt&gt;, the kernel docs refer to non-existing control files).&lt;/p&gt;
&lt;p&gt;Still, bcache seemed like a reasonable solution for the problem at hand so I gave it a
shot. I turned the two NBD devices into bcache backing devices and added two cache devices
(by loopback-mounting two local image files). I then created a zpool from the resulting
two bcache devices.&lt;/p&gt;
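In outline, the commands for this setup looked roughly as follows. This is a sketch from my notes: device paths, image file locations, the pool name, and the cache-set UUIDs are all illustrative placeholders.

```shell
# Turn the two NBD devices into bcache backing devices.
make-bcache -B /dev/nbd0    # will hold the regular vdev
make-bcache -B /dev/nbd1    # will hold the special vdev

# Loopback-mount two local image files to serve as cache devices.
losetup /dev/loop0 /var/cache/bcache-regular.img
losetup /dev/loop1 /var/cache/bcache-special.img
make-bcache -C /dev/loop0
make-bcache -C /dev/loop1

# Attach each cache set to its backing device (the UUIDs appear
# under /sys/fs/bcache/ after registration).
echo CACHE-SET-UUID-0 | sudo tee /sys/block/bcache0/bcache/attach
echo CACHE-SET-UUID-1 | sudo tee /sys/block/bcache1/bcache/attach

# Finally, build the pool from the two resulting bcache devices.
zpool create tank /dev/bcache0 special /dev/bcache1
```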
&lt;p&gt;Once again, fundamentally this setup worked. Both ZFS and bcache performed as expected,
oblivious to the fact that backing devices were stored in the cloud. I also found that
&lt;em&gt;make-bcache's&lt;/em&gt; bucket and block size settings seemed to affect only requests to the
caching device, not the backing device. This was a relief, because the bcache
documentation did not give me much guidance on how to choose these values.&lt;/p&gt;
&lt;p&gt;I also got great improvements in performance. When emulating 20 ms network latency and 1
MB/s bandwidth, I previously got:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
# time find /zpools/test/ -type d -exec ls -l {} \; | wc -l
140466

________________________________________________________
Executed in  409.17 secs   fish           external
   usr time    3.93 secs  600.00 micros    3.93 secs
   sys time    3.62 secs  186.00 micros    3.62 secs
&lt;/pre&gt;
&lt;p&gt;With bcache in the stack (and a single cache-priming run), this improved to:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
# time find /zpools/test_bcache/ -type d -exec ls -l {} \; | wc -l
140466

________________________________________________________
Executed in    3.70 secs   fish           external
   usr time    2.51 secs  608.00 micros    2.51 secs
   sys time    1.30 secs  186.00 micros    1.30 secs
&lt;/pre&gt;
&lt;p&gt;bcache supports writeback, writethrough, and writearound caching. In both writethrough and
writearound mode, the system still cannot suspend while there is active I/O. However,
writearound has the advantage that writing data to the filesystem does not push out cached
metadata - a big advantage when running backups.&lt;/p&gt;
&lt;p&gt;In writeback mode, write requests can be completed without having to write to the
backing (NBD) device. This means that as long as there is space available on the caching
device, the system can suspend even while under I/O - Eureka! However, this comes with the
disadvantage that writing new files can now push out cached metadata.&lt;/p&gt;
&lt;p&gt;Luckily, both these caveats can be worked around by pacing writes to the zpool. I found
that by rate-limiting the rsync process that I use to create backups to my uplink
bandwidth, I could ensure that the cache never fully fills - thus ensuring that suspend is
always possible, and that metadata is not pushed out.&lt;/p&gt;
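Concretely, the pacing amounts to something like the following; the bandwidth figure and both paths are placeholders (rsync's --bwlimit takes KiB/s by default):

```shell
# Pace the backup so that data enters the bcache cache no faster than
# the uplink can drain it to cloud storage. 1000 KiB/s and both paths
# are placeholders for the real values.
rsync -a --bwlimit=1000 /home/nikolaus/ /zpools/backup/home/
```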
&lt;p&gt;As the reader possibly expects at this point, there were still other drawbacks:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Some ZFS operations seem to be able to bypass the cache even in writeback mode. Setting
&lt;cite&gt;/sys/module/zfs/parameters/{zfs,zil}_nocacheflush&lt;/cite&gt; to &lt;cite&gt;1&lt;/cite&gt; prevented this behavior for
&lt;cite&gt;zpool sync&lt;/cite&gt; and &lt;cite&gt;zpool export&lt;/cite&gt;, but I did not manage to carry out e.g. &lt;cite&gt;zfs snapshot&lt;/cite&gt;
without activity on the backing device. This is important because while such commands
run, suspend does not work correctly. I worked around this by wrapping such commands in
&lt;cite&gt;systemd-inhibit&lt;/cite&gt;.&lt;/li&gt;
&lt;li&gt;There does not seem to be a good way to tell when the NBD device can be
disconnected. When shutting down a bcache device (by writing into the &lt;cite&gt;stop&lt;/cite&gt; files in
&lt;cite&gt;/sys/block/&amp;lt;dev&amp;gt;/bcache/cache/stop&lt;/cite&gt; and &lt;cite&gt;/sys/block/&amp;lt;dev&amp;gt;/bcache/stop&lt;/cite&gt;), the
corresponding control files disappear, but bcache continues to write to the backing
(NBD) device, resulting in data loss when terminating nbdkit at this point. The only
reliable method that I found was to follow the kernel's log messages and wait for a
&lt;cite&gt;cache: bcache_device_free() bcache&amp;lt;N&amp;gt; stopped&lt;/cite&gt; message - which feels very clunky.&lt;/li&gt;
&lt;li&gt;The documented behavior to flush the bcache cache (writing &lt;cite&gt;none&lt;/cite&gt; to the &lt;cite&gt;cache_mode&lt;/cite&gt;
control file) works, but also blocks suspend while this command is running. I worked
around this by setting &lt;em&gt;writeback_percent&lt;/em&gt; and &lt;em&gt;writeback_delay&lt;/em&gt; to zero and polling
&lt;cite&gt;/sys/block/&amp;lt;dev&amp;gt;/bcache/dirty_data&lt;/cite&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="back-to-s3backer"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-13"&gt;Back to s3backer&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;Just when I thought that I finally had a reasonably well working configuration and started
running reasonably sized backups, I discovered another problem: the performance of the
nbdkit S3 plugin is abysmal for small NBD requests. For example, I would see 112 kB writes
taking 0.45 seconds (248 kB/s) and 4 kB writes taking 0.36 seconds (11 kB/s).&lt;/p&gt;
&lt;p&gt;I have not yet root-caused this, but suspect that either the code does not re-use
established HTTP connections or that some of the hash calculations are done in pure Python
(reducing performance and preventing effective multithreading). A casual inspection of the
&lt;em&gt;boto&lt;/em&gt; Python module (which is used for interfacing with S3) made me quickly drop the idea
of ever touching this code. In short, boto is very &amp;quot;enterprisy&amp;quot; code that provides a
generic interface to every possible AWS service (not just S3), and attempting to navigate
(let alone understand) the S3-specific code made the s3backer codebase look like a piece
of cake.&lt;/p&gt;
&lt;p&gt;Therefore, I found myself coming back to s3backer a third time. This time, I also disabled
its caching functionality (keeping bcache in the stack for caching instead). With this
setup, write performance was much better (though, as mentioned before, at the cost of
reduced discard performance).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;&lt;a class="toc-backref" href="#toc-entry-14"&gt;Summary&lt;/a&gt;&lt;/h2&gt;
&lt;p&gt;So, what have I learned from all of this?&lt;/p&gt;
&lt;p&gt;First of all, I was able to set up a ZFS-on-NBD stack that I think would most likely
satisfy my backup requirements.&lt;/p&gt;
&lt;p&gt;But... is it better than my current, S3QL-based setup? I am not sure.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Performance- and feature-wise, both solutions perform equally well.&lt;/li&gt;
&lt;li&gt;In terms of robustness and complexity, I prefer having encryption and compression
implemented in ZFS rather than in S3QL. I believe the ZFS implementation is both more
efficient and better tested.&lt;/li&gt;
&lt;li&gt;However, the ZFS-on-NBD setup involves stacking together a large number of components
(ZFS, nbdkit, bcache, s3backer) that probably aren't used together in this form by many
people, and each of which introduces new points of failure. Just setting up the
mountpoint properly requires a &lt;a class="reference external" href="http://www.rath.org/cloud_backup.py"&gt;script with hundreds of lines of code&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Furthermore, some of these components are abandoned, many of them are at least as complex
as S3QL on its own, and most of them I'm not comfortable debugging or modifying.&lt;/li&gt;
&lt;li&gt;S3QL, on the other hand, is an all-in-one solution with a code-base that I'm very
familiar with.&lt;/li&gt;
&lt;li&gt;Lastly, I currently seem to have the choice between either abysmal performance for small
writes (when using the S3 plugin), abysmal performance for discard requests (s3backer
without &amp;quot;preloading&amp;quot; the object list), and having to download (and keep in memory) a
full object list at mount time (the very kind of operation that I dislike most about
S3QL).&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For now, I think I will run both solutions in parallel and wait for comments on this
write-up!&lt;/p&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="Programming"></category><category term="Linux"></category></entry><entry><title>MATLAB is a terrible programming language</title><link href="http://www.rath.org/matlab-is-a-terrible-programming-language.html" rel="alternate"></link><published>2017-09-17T00:00:00-07:00</published><updated>2017-09-17T00:00:00-07:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2017-09-17:/matlab-is-a-terrible-programming-language.html</id><summary type="html">&lt;p&gt;I consider it fairly uncontroversial that, as a programming language,
MATLAB is a terrible choice. However, I found out that to some people
this isn't actually obvious at all - especially when their first
exposure to programming was through MATLAB. Explaining why the MATLAB
language is so bad isn't easy to do in a quick hallway conversation,
so I wrote this blog post as a resource I can refer people to.&lt;/p&gt;
&lt;p&gt;This post is inspired by Eevee's excellent &lt;a class="reference external" href="https://eev.ee/blog/2012/04/09/php-a-fractal-of-bad-design/"&gt;PHP: A fractal …&lt;/a&gt;&lt;/p&gt;</summary><content type="html">&lt;p&gt;I consider it fairly uncontroversial that, as a programming language,
MATLAB is a terrible choice. However, I found out that to some people
this isn't actually obvious at all - especially when their first
exposure to programming was through MATLAB. Explaining why the MATLAB
language is so bad isn't easy to do in a quick hallway conversation,
so I wrote this blog post as a resource I can refer people to.&lt;/p&gt;
&lt;p&gt;This post is inspired by Eevee's excellent &lt;a class="reference external" href="https://eev.ee/blog/2012/04/09/php-a-fractal-of-bad-design/"&gt;PHP: A fractal of bad
design&lt;/a&gt; blog post. While I wouldn't say that MATLAB is quite as bad
as PHP, there are some interesting similarities. The MATLAB language
was originally designed for numerical computation (like PHP was
designed to insert small dynamic elements into mostly static HTML
pages), but then kept gaining features that turn it into something
closer to a general purpose programming language. And as for PHP, it
is difficult to point at one specific thing and say that this is what
makes it a bad language - it's more that there's a ton of small things
that are all slightly wrong. Individually, none of them make the
language bad, but taken together, they make writing and maintaining
non-trivial MATLAB code a rather painful exercise.&lt;/p&gt;
&lt;p&gt;Here comes the list:&lt;/p&gt;
&lt;div class="section" id="lack-of-documentation"&gt;
&lt;h2&gt;Lack of Documentation&lt;/h2&gt;
&lt;p&gt;The first problem with the MATLAB language is that it is not
documented. I am not talking about documentation of the &lt;em&gt;functions&lt;/em&gt;
that the language provides - there is plenty of that. I am talking
about the syntax and semantics of the language itself. Every
worthwhile general purpose programming language that I know has a
description of the &lt;em&gt;language&lt;/em&gt;, and a description of the &lt;em&gt;standard
library&lt;/em&gt; (though it may not be called that). MATLAB only has the
latter. If you still don't know what I'm talking about maybe an
example will help: the Python &lt;em&gt;language&lt;/em&gt; is documented in the &lt;a class="reference external" href="https://docs.python.org/3/reference/index.html"&gt;Python
Language Reference&lt;/a&gt;, the Python standard library is documented in the
&lt;a class="reference external" href="https://docs.python.org/3/library/index.html"&gt;Python Library Reference&lt;/a&gt;. Take a look at both, and you will
(hopefully) note how the latter describes the functions that Python
provides, while the former describes how to write Python code. In the
MATLAB documentation, this part is missing. The only way to find out
what is valid MATLAB code and what isn't is to look at other code or
via trial &amp;amp; error. A language reference would answer questions like
&amp;quot;where exactly can I write &lt;tt class="docutils literal"&gt;end&lt;/tt&gt;&amp;quot;? From a first glance at MATLAB
code, you can tell that &lt;tt class="docutils literal"&gt;end&lt;/tt&gt; is used to signal the end of e.g. an
&lt;tt class="docutils literal"&gt;if&lt;/tt&gt; block (&lt;tt class="docutils literal"&gt;if foo; &lt;span class="pre"&gt;do_something();&lt;/span&gt; end&lt;/tt&gt;). On a second glance,
you may notice that you can also use it to index into an array
(&lt;tt class="docutils literal"&gt;my_numbers(end)&lt;/tt&gt;, and even &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;my_numbers(end-1)&lt;/span&gt;&lt;/tt&gt;). You may then
conclude that you can also pass &lt;tt class="docutils literal"&gt;end&lt;/tt&gt; as a function parameter
(&lt;tt class="docutils literal"&gt;print_number(my_numbers, end)&lt;/tt&gt;) or save it in a variable (&lt;tt class="docutils literal"&gt;idx =
end&lt;/tt&gt;). Will this work? There is no way to find out from the
documentation.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="ambigious-syntax"&gt;
&lt;h2&gt;Ambiguous Syntax&lt;/h2&gt;
&lt;p&gt;What do you think happens in the following code: &lt;tt class="docutils literal"&gt;multiply(2)&lt;/tt&gt;? You
may be forgiven if you think it calls a function called &lt;em&gt;multiply&lt;/em&gt;
with a parameter of 2. You may also be forgiven if you think it
returns the second element of the &lt;tt class="docutils literal"&gt;multiply&lt;/tt&gt; vector. This is because
you can't tell. To understand what this code does, you first have to find out
how &lt;em&gt;multiply&lt;/em&gt; is actually defined. On the other hand, MATLAB has a
datatype called &lt;em&gt;cell array&lt;/em&gt; (that is similar to e.g. a Python &lt;em&gt;list&lt;/em&gt;)
which you have to index with curly braces (&lt;tt class="docutils literal"&gt;bru{3}&lt;/tt&gt;) and that gives
an error when indexed with &lt;tt class="docutils literal"&gt;()&lt;/tt&gt;. So there are two kinds of braces
supported by MATLAB, and instead of using one kind for function calls
and the other kind for indexing, one kind of brace is overloaded to
mean both, and the second kind is (seemingly randomly) assigned to
work with just some datatypes.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="counterintuitively-limited-syntax"&gt;
&lt;h2&gt;Counterintuitively limited syntax&lt;/h2&gt;
&lt;p&gt;Suppose I told you that &lt;tt class="docutils literal"&gt;get_numbers()&lt;/tt&gt; calls a function that
returns an array of numbers, and that &lt;tt class="docutils literal"&gt;bar(4)&lt;/tt&gt; accesses the 4th
element of the array &lt;em&gt;bar&lt;/em&gt;, what do you think &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;get_numbers()(4)&lt;/span&gt;&lt;/tt&gt;
will do? If you think it gives you the 4th element of the vector
returned by the &lt;em&gt;get_numbers()&lt;/em&gt; function, you think like me and you
are wrong. It will actually give you an error. Instead, you have to
first assign the result into a variable, and then index into it (&lt;tt class="docutils literal"&gt;tmp
= &lt;span class="pre"&gt;get_numbers();&lt;/span&gt; tmp(4)&lt;/tt&gt;). The same applies to a variety of other
operations - they don't work on arbitrary expressions, but only on
some specific ones (e.g. just on plain variable names).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="function-semantics-are-needlessly-overloaded"&gt;
&lt;h2&gt;Function semantics are needlessly overloaded&lt;/h2&gt;
&lt;p&gt;Several MATLAB functions fulfill multiple, but completely unrelated
purposes. For example, the &lt;em&gt;exist&lt;/em&gt; function checks if its argument is
either a variable declared in the current workspace, a file or
directory in the current directory, a file with an extension known to
MATLAB somewhere in the MATLAB search path, or a Java class - unless
MATLAB is started with the &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;--nojvm&lt;/span&gt;&lt;/tt&gt; argument. If you want to check
specifically if a file with a given name exists, you have to pass an
extra parameter to &lt;em&gt;exist&lt;/em&gt; telling it to look only for files and
directories - but even then you have to take extra care because
calling &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;exist('myfile',&lt;/span&gt; 'file')&lt;/tt&gt; still returns a true value if
there is no &lt;tt class="docutils literal"&gt;myfile&lt;/tt&gt;, but &lt;tt class="docutils literal"&gt;myfile.m&lt;/tt&gt; exists somewhere in the search
path. This is ridiculous. Why would you ever want to check if
something is either a file or a builtin MATLAB function? This is just
inviting trouble. There should be separate functions
(e.g. &lt;em&gt;file_exist&lt;/em&gt;, &lt;em&gt;class_exist&lt;/em&gt;, &lt;em&gt;var_exist&lt;/em&gt;) for separate purposes.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="everything-is-in-the-same-namespace"&gt;
&lt;h2&gt;Everything is in the same namespace&lt;/h2&gt;
&lt;p&gt;MATLAB does not have namespaces, &lt;em&gt;everything&lt;/em&gt; sits in the same global
namespace. This means that nobody is able to remember what names are
already in use by MATLAB, and which ones aren't. Predefined names are
also all over the place, from short and obscure abbreviations like
&lt;tt class="docutils literal"&gt;lqrd&lt;/tt&gt; over common verbs like &lt;tt class="docutils literal"&gt;find&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;who&lt;/tt&gt; to long
descriptions like &lt;tt class="docutils literal"&gt;SimulinkRealTime&lt;/tt&gt;. There is no convention you
could follow to avoid accidentally redefining (or using) a name
that MATLAB already uses for something else.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="parameter-names-are-treated-as-strings"&gt;
&lt;h2&gt;Parameter names are treated as strings&lt;/h2&gt;
&lt;p&gt;Many MATLAB functions accept parameters in the form &lt;tt class="docutils literal"&gt;function(arg1,
arg2, 'NameOfArg3', arg3, 'NameOfArg4', arg4)&lt;/tt&gt;. In other words,
parameter names are passed as parameters themselves. This is fundamentally
bad because parameter names are not strings and should not be treated
as such.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="no-1-d-arrays"&gt;
&lt;h2&gt;No 1-D arrays&lt;/h2&gt;
&lt;p&gt;MATLAB does not have support for one-dimensional arrays (or lists,
cell arrays, etc). This means that if you need to represent an ordered
set of elements, you have to make an awkward decision between
representing it as a &lt;tt class="docutils literal"&gt;1xN&lt;/tt&gt; or an &lt;tt class="docutils literal"&gt;Nx1&lt;/tt&gt; data structure. Even
worse, the two variants are treated the same way in some situations,
but differently in others. For example, you can use a single index
(&lt;tt class="docutils literal"&gt;foo(3)&lt;/tt&gt;) to index both a 1xN and an Nx1 array, but if you attempt
to loop over it (&lt;tt class="docutils literal"&gt;for el=array&lt;/tt&gt;), it will work with only 1xN
arrays. To add insult to injury, some functions (like
&lt;em&gt;Simulink.sdi.getAllRunIDs&lt;/em&gt;) return 1xN arrays when they have
something to return, but 0x1 arrays when the list is empty. To handle
this correctly, you have to use incantations like &lt;tt class="docutils literal"&gt;for
el=reshape(array, 1, [])&lt;/tt&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="cell-array-iteration-is-awkward"&gt;
&lt;h2&gt;Cell Array Iteration is awkward&lt;/h2&gt;
&lt;p&gt;Another iteration problem comes about when iterating over cell arrays:
instead of assigning the loop variable to each element of the array,
the loop variable is assigned to a one-element cell array. So the
following won't work:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
data = { 'foo', 'bar', 'com' };
for el=data
    fprintf('Processing %s...\n', el);
end
&lt;/pre&gt;
&lt;p&gt;Instead you have to index into the loop variable first:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
data = { 'foo', 'bar', 'com' };
for el=data
    el = el{1};
    fprintf('Processing %s...\n', el);
end
&lt;/pre&gt;
&lt;/div&gt;
&lt;div class="section" id="semicolon-changes-semantics"&gt;
&lt;h2&gt;Semicolon Changes Semantics&lt;/h2&gt;
&lt;p&gt;In MATLAB, the semicolon acts both as a statement terminator and to
suppress printing of the evaluated expression. This is reasonable, but
it turns out that sometimes whether an expression is printed
determines how the expression behaves. Calling &lt;tt class="docutils literal"&gt;tg = &lt;span class="pre"&gt;slrt('barf');&lt;/span&gt;&lt;/tt&gt;
will give you an object to communicate with the &lt;em&gt;barf&lt;/em&gt; system. Calling
&lt;tt class="docutils literal"&gt;tg = &lt;span class="pre"&gt;slrt('barf')&lt;/span&gt;&lt;/tt&gt;, however, will also attempt to connect to the
system - i.e. it may block or return a whole new class of errors.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="functions-are-too-clever"&gt;
&lt;h2&gt;Functions are too clever&lt;/h2&gt;
&lt;p&gt;Many MATLAB functions try to be particularly clever in anticipating
the user's needs. Unfortunately, that cleverness cannot be turned off
when it is not wanted.  For example, the &lt;em&gt;delete&lt;/em&gt; function can be used
to delete files from disk. (It may also be used to unset a variable or
release memory, which is another example of pointless overloading of
function semantics, but this is not what I complain about here). The
problem is that if &lt;em&gt;delete&lt;/em&gt; finds that the given filename contains an
asterisk (&lt;tt class="docutils literal"&gt;*&lt;/tt&gt;), it magically expands the asterisk and deletes all
matching files. I have no doubt that this can be useful, but it is
terrible when it cannot be switched off. If I have code that is
intended to delete &lt;em&gt;one&lt;/em&gt; file and I pass it a filename containing &lt;cite&gt;*&lt;/cite&gt;,
I expect it to delete a file with exactly that name. If filenames with
&lt;cite&gt;*&lt;/cite&gt; characters are not supported (Hello, Microsoft Windows) I expect
the function to return an error, and not to silently delete other
files.&lt;/p&gt;
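&lt;p&gt;The only workaround I know of is a defensive wrapper. This is a sketch
(&lt;em&gt;delete_one&lt;/em&gt; is my own name, not a built-in):&lt;/p&gt;
&lt;pre class="literal-block"&gt;
function delete_one(fname)
    % Refuse to hand wildcards to delete(), which would expand them.
    if any(fname == '*')
        error('delete_one:wildcard', 'refusing wildcard in %s', fname);
    end
    delete(fname);
end
&lt;/pre&gt;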
&lt;/div&gt;
&lt;div class="section" id="no-way-to-store-static-data"&gt;
&lt;h2&gt;No way to store static data&lt;/h2&gt;
&lt;p&gt;If you have some static data that you'd like to use in multiple
MATLAB files (e.g. a mapping from error codes to textual descriptions),
there is no direct way to do that. You can create a &lt;tt class="docutils literal"&gt;.m&lt;/tt&gt; file that
defines a variable, but you cannot get access to this variable from
another file without terrible contortions (aka loading the file as a
string and passing it to &lt;em&gt;eval()&lt;/em&gt;). The only feasible workaround is to
create a class that encapsulates your single variable, and then work
with that class, or to define a function that re-creates the data on
every call and then returns it.&lt;/p&gt;
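&lt;p&gt;The function-based workaround might look like this (a sketch; using
&lt;tt class="docutils literal"&gt;persistent&lt;/tt&gt; avoids re-creating the data on every call):&lt;/p&gt;
&lt;pre class="literal-block"&gt;
function map = error_descriptions()
    persistent cached;
    if isempty(cached)
        cached = containers.Map({1, 2}, ...
            {'file not found', 'permission denied'});
    end
    map = cached;
end
&lt;/pre&gt;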
&lt;/div&gt;
&lt;div class="section" id="programmatic-error-handling-is-near-impossible"&gt;
&lt;h2&gt;Programmatic error handling is near impossible&lt;/h2&gt;
&lt;p&gt;This is one of my biggest gripes: handling errors programmatically in
MATLAB in a reliable way is nearly impossible. The problems are:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;The documentation does not contain any information about how a given
MATLAB function may fail. The only way to find out is to somehow
trigger all the error cases you can think of, and observe in each case
whether the function throws an exception (and which), returns a
special value, or silently does nothing.&lt;/li&gt;
&lt;li&gt;There is no consistency in the failure modes. Some functions may
throw an exception if the relevant file isn't found (e.g. &lt;em&gt;renamefile()&lt;/em&gt;),
others may return a special value (e.g. &lt;em&gt;fopen()&lt;/em&gt;). In other words, there
is no way to generalize the knowledge you have empirically gained.&lt;/li&gt;
&lt;li&gt;When functions raise exceptions, the exception identifiers are way
too broad to be useful. For example, any kind of problem with files
(if it leads to an exception in the first place) gives you a generic
I/O error. To distinguish between e.g. &amp;quot;file not found&amp;quot; or
&amp;quot;permission denied&amp;quot; you have to parse the string representation of
the error message.&lt;/li&gt;
&lt;li&gt;Inexplicably, MATLAB's exception handling construct doesn't allow
you to restrict the exceptions that you want to catch -- it's all or
nothing. This means that every piece of exception-handling code first starts
with an &lt;em&gt;if&lt;/em&gt; statement to determine if this is actually an exception
that should be handled, and has to re-throw the exception if not.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All in all, this means that &amp;quot;error-aware&amp;quot; code in MATLAB typically
looks like this:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
try
    res = do_something();
    if res == special_value1
       % Handle problem
    elseif res == special_value2
       % Handle problem
    end
catch exc
    if strcmp(exc.identifier, 'IOError') &amp;amp;&amp;amp; \
       strfind(exc.msg, 'File not found')
       % Handle problem
    else
       exc.rethrow()
end
&lt;/pre&gt;
&lt;p&gt;The amount of boilerplate here is incredible. Why can't this be
written as, e.g.:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
try
    res = do_something();
catch 'IOError:FileNotFound' as exc
    % Handle problem
catch 'OtherError1' as exc
    % Handle other problem
end % pass through all other problems
&lt;/pre&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="Programming"></category></entry><entry><title>My swapalease.com experience</title><link href="http://www.rath.org/my-swapaleasecom-experience.html" rel="alternate"></link><published>2017-04-05T00:00:00-07:00</published><updated>2017-04-05T00:00:00-07:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2017-04-05:/my-swapaleasecom-experience.html</id><summary type="html">&lt;p&gt;I recently found myself in the situation of wanting to get out of a
lease before its contractual maturity date. The most cost-effective
way to do so is to transfer the lease to someone else. swapalease.com
is one of the companies that promises to bring together people for
such transactions.&lt;/p&gt;
&lt;p&gt;I will say straightaway that I was very skeptical. Swapalease charges
not just a provision when the lease is actually transferred, but you
also have to pay to list …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I recently found myself in the situation of wanting to get out of a
lease before its contractual maturity date. The most cost-effective
way to do so is to transfer the lease to someone else. swapalease.com
is one of the companies that promises to bring together people for
such transactions.&lt;/p&gt;
&lt;p&gt;I will say straightaway that I was very skeptical. Swapalease charges
not just a provision when the lease is actually transferred, but you
also have to pay to list your vehicle (if you want to get out of your
lease), and you have to pay to contact sellers (if you are looking for
a lease). The charge for listing a vehicle is about $100, the price
for contacting sellers is $59 (you have to join a "Buyer's Club"). To
me, this seems like quite a lot. On the other hand, it is relatively little
compared to the other options for getting out of a lease.&lt;/p&gt;
&lt;p&gt;Since I also couldn't find any reliable reports of how big the chances
of finding a buyer via Swapalease are, I decided to risk the
$100.&lt;/p&gt;
&lt;p&gt;My car has now been listed for about a month, so I think I am ready to
comment on whether this was worth the expense. The answer is a clear
No, and I hope this report will prevent other people from making the
same mistake.&lt;/p&gt;
&lt;p&gt;For comparison purposes, I posted the same ad on Craigslist (for
free). In about a month, I have had about 10 people contact me about
the Craigslist ad and not a single contact from Swapalease. I
received one email from Swapalease telling me that a buyer was
"interested" and that I should get in touch with him. I did so, but
never heard back. I am not sure what criterion Swapalease uses to
trigger these emails, but they do not seem to be the result of direct
action by the buyer.&lt;/p&gt;
&lt;p&gt;What I did receive was phone calls and emails from Swapalease sales
agents. They helpfully pointed out that with my chosen listing, it
would take an average of 180 days to find a buyer (something they did
not mention before for obvious reasons), but that they would be happy
to upgrade me to a (more expensive) "Platinum package" to speed things
up.&lt;/p&gt;
&lt;p&gt;In summary, I don't think Swapalease is worthwhile. As long as
potential buyers have to pay a fee, the audience is most likely very
small, and even if buyers weren't required to pay a fee, I consider
the listing fee to be disproportionate. Stay away!&lt;/p&gt;</content><category term="reviews"></category><category term="Review"></category></entry><entry><title>BTRFS Reliability - a datapoint</title><link href="http://www.rath.org/btrfs-reliability-a-datapoint.html" rel="alternate"></link><published>2016-09-17T00:00:00-07:00</published><updated>2016-09-17T00:00:00-07:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2016-09-17:/btrfs-reliability-a-datapoint.html</id><summary type="html">&lt;p&gt;A little while ago I blogged about &lt;a class="reference external" href="http://www.rath.org/ssd-caching-under-linux.html"&gt;SSD caching under Linux&lt;/a&gt; and
promised to report back should I encounter any problems with the
(rather complex) stack of btrfs on dm-crypt on lvm on bcache. I have
now run this setup for several months and indeed encountered a few
issues.&lt;/p&gt;
&lt;p&gt;The first issue is that attempting to read from a freshly created file
sometimes results in I/O errors that persist until the system is
rebooted. With hindsight, I expect that …&lt;/p&gt;</summary><content type="html">&lt;p&gt;A little while ago I blogged about &lt;a class="reference external" href="http://www.rath.org/ssd-caching-under-linux.html"&gt;SSD caching under Linux&lt;/a&gt; and
promised to report back should I encounter any problems with the
(rather complex) stack of btrfs on dm-crypt on lvm on bcache. I have
now run this setup for several months and indeed encountered a few
issues.&lt;/p&gt;
&lt;p&gt;The first issue is that attempting to read from a freshly created file
sometimes results in I/O errors that persist until the system is
rebooted. With hindsight, I expect that re-mounting would also help
but that didn't occur to me at the time so I did not try that. The
same problem has been reported by several other people using a variety
of configurations (&lt;a class="reference external" href="http://www.spinics.net/lists/linux-btrfs/msg54962.html"&gt;thread on linux-btrfs&lt;/a&gt;), so I believe this is not
specific to my storage stack but happens even when running btrfs
directly on a disk (i.e., with no lvm, dm-crypt and bcache
in-between).&lt;/p&gt;
&lt;p&gt;The second problem I have encountered only twice in about 6 months. In
this case, the kernel crashes with a BUG message from deep within the
btrfs code. I am not sufficiently familiar with the Linux kernel to
say with certainty what component is at fault nor can I narrow down
which circumstances trigger the problem. However, based on just the
file and function names in the kernel stack trace, I do not think that
the problem is caused by any of the lower layers here either.&lt;/p&gt;
&lt;p&gt;I am running the latest kernel from &lt;a class="reference external" href="https://backports.debian.org/"&gt;jessie-backports&lt;/a&gt; (which
typically trails the most recent mainline version by at most a few
weeks) so I'm pretty sure that (as of today) neither of these bugs has
been fixed.&lt;/p&gt;
&lt;p&gt;I have concluded that btrfs is not yet stable enough for my purposes,
and have therefore migrated my system to use ext4 instead of btrfs for
most mountpoints. The only exceptions are sbuild chroots (which can
easily be re-generated). My backup disks are still formatted with
btrfs, but I intend to migrate them to ZFS shortly. The reason for
choosing ZFS in one case and ext4 in the other is that ZFS offers
snapshots, de-duplication, checksumming and compression (which are
especially nice to have for backups), but that it is also an
out-of-tree kernel module that I've never used before (so I don't want
my production system to rely on it).&lt;/p&gt;
</content><category term="misc"></category><category term="Linux"></category></entry><entry><title>Why you should give Mercurial a shot</title><link href="http://www.rath.org/why-you-should-give-mercurial-a-shot.html" rel="alternate"></link><published>2016-05-26T00:00:00-07:00</published><updated>2016-05-26T00:00:00-07:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2016-05-26:/why-you-should-give-mercurial-a-shot.html</id><summary type="html">&lt;p class="first last"&gt;&lt;a class="reference external" href="http://www.git-scm.com/"&gt;Git&lt;/a&gt; has arguably become the most widely used version
control system in the open-source and free software
communities. However, if you can live without GitHub and don't need
to manage a code base as large as e.g. the Linux kernel, you are
much better off using &lt;a class="reference external" href="http://mercurial.selenic.com/"&gt;Mercurial&lt;/a&gt;. Here is why.&lt;/p&gt;
</summary><content type="html">&lt;p&gt;&lt;a class="reference external" href="http://www.git-scm.com/"&gt;Git&lt;/a&gt; has arguably become the most widely used version control system
in the open-source and free software communities. However, I believe
this is mostly thanks to the amazing features of &lt;a class="reference external" href="http://github.com"&gt;GitHub&lt;/a&gt; rather than
the virtues of Git itself. As a matter of fact, I believe that if you
can live without GitHub and don't need to manage a code base as large
as e.g. the Linux kernel, you are much better off using &lt;a class="reference external" href="http://mercurial.selenic.com/"&gt;Mercurial&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Before you continue with this post, please take a moment to read my
previous article on the &lt;a class="reference external" href="http://www.rath.org/mercurial-for-git-users-and-vice-versa.html"&gt;differences between Git and Mercurial&lt;/a&gt; to ensure that you're up to
date on the differences and similarities. With that settled, let's reflect on why
you really should be using Mercurial for most of your projects.&lt;/p&gt;
&lt;div class="section" id="mercurial-requires-fewer-concepts-to-grasp"&gt;
&lt;h2&gt;Mercurial requires fewer concepts to grasp&lt;/h2&gt;
&lt;p&gt;There are only a few things you need to understand to use Mercurial
efficiently:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Mercurial tracks changes to files in form of &lt;em&gt;commits&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Commits form a directed graph (each commit has one or two parents
and zero or more children)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Tags&lt;/em&gt; are named commits, and &lt;em&gt;heads&lt;/em&gt; are commits without children
(i.e., the ones holding your most recent changes)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Bookmarks&lt;/em&gt; are pointers to commits that typically move to the
youngest commit in a lineage.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Pulling&lt;/em&gt; from a remote repository adds additional commits into your
local graph, and pushing to a repository puts your commits into the
remote graph.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Merging&lt;/em&gt; means to create a new commit that combines the changes
made in both of its parents.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Most of these concepts have close analogues in Git - but Git also
introduces several additional ones, and then ties them all together so
that you cannot just ignore the extra complexity. Many of these
additional features go under the headline of &amp;quot;plumbing layer&amp;quot;. In
theory there are &amp;quot;high-level&amp;quot; Git commands that don't require you to
know about the lower level details, and &amp;quot;plumbing-layer&amp;quot; commands that
you should need only if you want to do unusual things. In practice,
this separation unfortunately does not work out. While you rarely need
the plumbing layer &lt;em&gt;commands&lt;/em&gt;, using Git effectively still requires
you to understand and be aware of the lower-level details. So let's
take a look at what this means in practice...&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="using-git-takes-up-more-mental-capacity"&gt;
&lt;h2&gt;Using Git takes up more mental capacity&lt;/h2&gt;
&lt;p&gt;The &amp;quot;minimum working knowledge&amp;quot; that you have to keep in your mind
when using Git amounts to something like this:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Git stores (de-duplicated) &amp;quot;objects&amp;quot; that are identified by their
hash values. Each object has a type and a value.&lt;/li&gt;
&lt;li&gt;Git can store snapshots of a directory tree in these objects. The
data is stored in &amp;quot;blob&amp;quot; objects, and the list of blobs that
constitute a snapshot is stored in a &amp;quot;tree&amp;quot; object.&lt;/li&gt;
&lt;li&gt;Git can order the snapshots into a directed graph that tracks
changes to files over time. This is done by creating &amp;quot;commit&amp;quot;
objects.&lt;/li&gt;
&lt;li&gt;There is one special tree object called the &amp;quot;index&amp;quot;. Commits can be
created from the index, or directly from the file system.&lt;/li&gt;
&lt;li&gt;Tags are pointers to commits, HEAD is a pointer to the most-recently
checked out commit.&lt;/li&gt;
&lt;li&gt;Branches are pointers to commits that typically move to the youngest
commit in a lineage&lt;/li&gt;
&lt;li&gt;A commit that has no descendants and no branch or tag pointing at it
is in a &amp;quot;limbo&amp;quot; state: it does not exist in the commit graph and it will be
deleted when the next &amp;quot;garbage collection&amp;quot; runs. But it is still a
regular &amp;quot;commit&amp;quot; object and (as long as it
has not been garbage collected) you can still add a reference to it
to make it part of the commit graph.&lt;/li&gt;
&lt;li&gt;If you pull from a remote repository, the remote snapshots are added
to your local repository. However, this would leave them in limbo
(because nothing is pointing at them) so Git needs to assign a name
to them. This is done by creating a &amp;quot;remote tracking branch&amp;quot;. If you
repeatedly pull from a repository, the branch labels need to be
moved from the old leaves to the new leaves -- which is called a
&amp;quot;fast-forward merge&amp;quot; (even though nothing is merged).&lt;/li&gt;
&lt;li&gt;Similarly, if you push to a remote repository, Git needs to tell the
remote repository what branch name to attach to the new leaves. If
this leaves other leaves without a label, the operation is forbidden and
you first need to create a new commit that descends from the leaf
that you actually want to push, and the leaf that's going to be
orphaned. This is called a merge (an actual one, not a fast-forward
one). Alternatively, you can force the push, which will put the
orphaned leaf into the aforementioned limbo - it may be
garbage-collected or it may be revived by someone else pushing a
new, named descendant.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At this point I just stopped, since this post isn't meant to be a Git
tutorial. But this pretty much proves the point I'm trying to make
here. Compare this list to the one for Mercurial, and consider that this is
the bare minimum that you have to keep in mind just to properly use
Git's commit, reset, pull, and checkout commands. When you program, do
you want as much of your memory and intellect to be available for the
problem at hand, or do you want to reserve large parts of it for using
your version control software?&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="mercurial-commands-are-more-intuitive"&gt;
&lt;h2&gt;Mercurial commands are more intuitive&lt;/h2&gt;
&lt;p&gt;The &lt;a class="reference external" href="http://stevelosh.com/blog/2013/04/git-koans/"&gt;Git Koans&lt;/a&gt; make for some pretty entertaining reading, but they
are depressingly true. Figuring out the Git command to perform a
certain action requires either rote memorization or intimate knowledge
of Git's internals (or, in many cases, both). Mercurial's commands, on
the other hand, are typically easily deduced from what one wants to
do.&lt;/p&gt;
&lt;p&gt;Do you want to revert changes to a file that you have not yet
committed? Use &lt;tt class="docutils literal"&gt;hg revert&lt;/tt&gt; or &lt;tt class="docutils literal"&gt;git reset &lt;span class="pre"&gt;--hard&lt;/span&gt;&lt;/tt&gt;. Do you want to
create a Git branch / Mercurial bookmark? Use &lt;tt class="docutils literal"&gt;hg bookmark&lt;/tt&gt; or &lt;tt class="docutils literal"&gt;git
checkout&lt;/tt&gt;. Do you want to edit your commit history (reorder commits,
squash commits, remove commits)? Use &lt;tt class="docutils literal"&gt;hg histedit&lt;/tt&gt; or &lt;tt class="docutils literal"&gt;git rebase
&lt;span class="pre"&gt;-i&lt;/span&gt;&lt;/tt&gt;. And the list goes on.&lt;/p&gt;
&lt;p&gt;Git further increases the cognitive load by often using very
unfortunate terminology. Why is the &amp;quot;staging area&amp;quot; also called the
&amp;quot;index&amp;quot;? And why is the option to make commands work on the staging
area called &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;--cached&lt;/span&gt;&lt;/tt&gt;? Why is a &amp;quot;fast-forward merge&amp;quot; called a &amp;quot;merge&amp;quot;, if
nothing is actually being merged?&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="specifying-mercurial-revisions-is-more-powerful-and-intuitive"&gt;
&lt;h2&gt;Specifying Mercurial revisions is more powerful and intuitive&lt;/h2&gt;
&lt;p&gt;In Git, the only way to specify revisions is to use their hashes and
combine them with a few single-symbol operators (e.g. &lt;tt class="docutils literal"&gt;&amp;#64;&lt;/tt&gt; and
&lt;tt class="docutils literal"&gt;^&lt;/tt&gt;). The hashes are hard to remember and type even if the hash
printed by your last &lt;tt class="docutils literal"&gt;git log&lt;/tt&gt; command is still on the screen and
you want to run &lt;tt class="docutils literal"&gt;git diff&lt;/tt&gt; for it next. The restriction to
single-symbol operators makes even simple queries difficult to
understand, and hard ones impossible.&lt;/p&gt;
&lt;p&gt;In Mercurial, every commit has an additional short, numeric id that you can
use to identify it in such situations. These numeric ids change when
the history is modified, but they make it much easier to refer to
specific changesets when you execute a number of (history-preserving)
commands in sequence (e.g. during bisection).&lt;/p&gt;
&lt;p&gt;(Incidentally, I think this is also one of the reasons why Mercurial
bookmarks aren't used as often as Git branches even though they are
conceptually the same thing: in Mercurial, it's much easier to refer
to a commit even if it doesn't have a user-defined bookmark pointing
at it).&lt;/p&gt;
&lt;p&gt;Mercurial allows you to specify sets of revisions by combining named
functions, which is both easier to understand and more
powerful. Quickly, which revisions are selected by
&lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;last(ancestors(ef4b8),3)&lt;/span&gt;&lt;/tt&gt; or &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;modifies(&amp;quot;Changes.txt&amp;quot;)&lt;/span&gt; and
&lt;span class="pre"&gt;date(&amp;quot;&amp;gt;2016-05-01&amp;quot;)&lt;/span&gt;&lt;/tt&gt;? You can probably figure that out without
looking at &lt;tt class="docutils literal"&gt;hg help revsets&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;hg help dates&lt;/tt&gt;. Now, what is the
meaning of &lt;tt class="docutils literal"&gt;&amp;#64;{5}&lt;/tt&gt; or &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;ef4b8^{/bug&lt;/span&gt; 284}&lt;/tt&gt;? And how does the former
differ from &lt;tt class="docutils literal"&gt;&lt;span class="pre"&gt;&amp;#64;{-5}&lt;/span&gt;&lt;/tt&gt;? If you are a frequent Git user, you may be able
to answer those right away - but I dare you to argue that this is in
any way intuitive or obvious.&lt;/p&gt;
&lt;p&gt;Furthermore, what if you want to find the last three commits preceding
the first release after a given bug was fixed? With Git, you will have
to write a little program to walk through the output of &lt;tt class="docutils literal"&gt;git
&lt;span class="pre"&gt;rev-list&lt;/span&gt;&lt;/tt&gt;. With Mercurial, you can run something like &lt;tt class="docutils literal"&gt;hg log &lt;span class="pre"&gt;-r&lt;/span&gt;
&lt;span class="pre"&gt;&amp;quot;last(ancestors(tag(&amp;quot;re:release-.*&amp;quot;)),3)&lt;/span&gt; and descendants(3843)&amp;quot;&lt;/tt&gt;
(assuming that the bug was fixed in commit 3843 and that releases are
tagged with names starting with &lt;em&gt;release-&lt;/em&gt;).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="mercurial-prevents-accidental-history-rewrite"&gt;
&lt;h2&gt;Mercurial prevents accidental history rewrite&lt;/h2&gt;
&lt;p&gt;Git makes it very easy to lose the history of your project, and to
mess up even remote repositories. Git will happily let you rewrite
history after you have pushed it, or accidentally reset a branch to an
earlier commit. While it is generally possible to recover from such
accidents, the necessary skills are generally only found in those
people who don't make the mistake in the first place. In other words,
Git makes it easy to make mistakes while at the same time making it
hard to fix them.&lt;/p&gt;
&lt;p&gt;Mercurial, on the other hand, makes it hard to make mistakes. Unless
you specifically force it to, it will refuse to rebase or modify
history that has been pushed. Since there is no need to name branches,
a whole class of mistakes (like resetting a branch or committing in
detached head state) cannot even happen in the first place.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="mercurial-messages-are-easier-to-understand"&gt;
&lt;h2&gt;Mercurial messages are easier to understand&lt;/h2&gt;
&lt;p&gt;In recent years, there has been a heroic effort to make Git easier to
use for beginners. Among other things, this has led to the
introduction of (luckily optional) &amp;quot;advice&amp;quot; messages that are
presumably intended to refresh the user's memory and help him
understand what Git has just done. Let's look at an example:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
$ git checkout 979d
Note: checking out '979d'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:

git checkout -b new_branch_name

HEAD is now at 979d... Merge remote-tracking branch 'origin/master'
&lt;/pre&gt;
&lt;p&gt;Now, I totally understand the intention here, but this completely
fails in practice. If you need this message to understand what has
happened, then you won't be able to understand it without going back
to the documentation. And if you understand it, then you don't need
it. Git is a perfect example of why it's generally not possible to fix
a usability problem with additional documentation. The information
that's provided here is too dense for people who need it, and at the
same time also way too long for everyone (it gets in the way of
experienced users, and further scares off people who don't understand
it).&lt;/p&gt;
&lt;p&gt;I would have liked to compare this with a message from Mercurial in
the same situation, but there's a problem: in Mercurial, &lt;em&gt;situations
requiring such messages don't occur in the first place&lt;/em&gt;. The &amp;quot;detached
head&amp;quot; state in Mercurial doesn't have a special name, it's a state
like any other. There is no need to tell the user that he can &amp;quot;look
around, make changes, and commit them&amp;quot; because that is &lt;em&gt;always&lt;/em&gt;
possible. Even more importantly, commits are never silently discarded
(or impact other branches) so there is no need to warn about that. To
delete an experimental commit, the user has to explicitly instruct
Mercurial to do so. Therefore, the message that Mercurial prints when
entering its version of a &amp;quot;detached head&amp;quot; state is:&lt;/p&gt;
&lt;pre class="literal-block"&gt;
$ hg update -r 2766
resolving manifests
removing tests/pytest_checklogs.py
getting tests/common.py
getting tests/conftest.py
getting tests/t4_fuse.py
getting tests/t5_failsafe.py
4 files updated, 0 files merged, 1 files removed, 0 files unresolved
&lt;/pre&gt;
&lt;/div&gt;
&lt;div class="section" id="in-a-nutshell"&gt;
&lt;h2&gt;In a nutshell&lt;/h2&gt;
&lt;p&gt;Mercurial is &lt;em&gt;simple&lt;/em&gt;. It does the job without getting in your way,
and without requiring you to worry more about your version control
system than about the actual code you're working with. It implements
everything that is required of a distributed version control system in
the vast majority of cases.&lt;/p&gt;
&lt;p&gt;Git, on the other hand, implements a generic content tracker with a
distributed version control system built on-top. This makes it
amazingly flexible, but in the vast majority of cases the additional
flexibility is not needed and just adds unneeded complexity that cannot
be avoided even when using only the high-level commands. The
unfortunate and often counterintuitive terminology further adds to the
cognitive load.&lt;/p&gt;
&lt;p&gt;Having said all this, there are situations when Git is the better
choice. If you have to work with very large repositories (the size of
the Linux kernel), Git is both faster and more space efficient. If you
have a complicated development structure with many teams working on
different features, Git's remote tracking branches make life much
easier. And finally, if you are looking for a service like GitHub,
then there is simply nothing comparable for Mercurial.&lt;/p&gt;
&lt;p&gt;Let me thus close with a final appeal: the next time you need a
version control system (and even more importantly, the next time
someone new to DVCS asks you for a recommendation) do not immediately
reach for Git. Consider not just the advantages of GitHub, but also
the drawbacks of Git, and maybe give Mercurial a try.&lt;/p&gt;
&lt;p&gt;Convinced? Then head over to &lt;a class="reference external" href="http://hginit.com/"&gt;HG Init&lt;/a&gt; or just start using
Mercurial. Most likely you will find using it a lot easier than when
you started using Git - you just got used to the pain.&lt;/p&gt;
&lt;/div&gt;
</content><category term="hg"></category><category term="Mercurial"></category><category term="Git"></category><category term="Programming"></category></entry><entry><title>Resolving (apparent) Windows Update freeze when "Checking for Updates"</title><link href="http://www.rath.org/resolving-apparent-windows-update-freeze-when-checking-for-updates.html" rel="alternate"></link><published>2016-05-08T00:00:00-07:00</published><updated>2016-05-08T00:00:00-07:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2016-05-08:/resolving-apparent-windows-update-freeze-when-checking-for-updates.html</id><summary type="html">&lt;p&gt;I recently set up a fresh Windows 7 system. While the installation
completed without hassles, I was worried that an attempt to run
Windows Update seemed to freeze in the &amp;quot;Checking for Updates&amp;quot; state.&lt;/p&gt;
&lt;div class="section" id="the-problem-with-windows-problems"&gt;
&lt;h2&gt;The problem with Windows problems&lt;/h2&gt;
&lt;p&gt;In my opinion, there are two major issues that make fixing any
problems with a Windows system a major headache. The first one is the
complete absence of error messages, and the second one is the
preponderance of bad advice on …&lt;/p&gt;&lt;/div&gt;</summary><content type="html">&lt;p&gt;I recently set up a fresh Windows 7 system. While the installation
completed without hassles, I was worried that an attempt to run
Windows Update seemed to freeze in the &amp;quot;Checking for Updates&amp;quot; state.&lt;/p&gt;
&lt;div class="section" id="the-problem-with-windows-problems"&gt;
&lt;h2&gt;The problem with Windows problems&lt;/h2&gt;
&lt;p&gt;In my opinion, there are two major issues that make fixing any
problems with a Windows system a major headache. The first one is the
complete absence of error messages, and the second one is the
preponderance of bad advice on the internet. As expected, in this case
both issues hit me full-on. The amount of results that a Google search
for &amp;quot;windows 7 update hang&amp;quot; yields is surpassed only by their
uselessness. About 30% of the matches are descriptions of the problem
with lots of &amp;quot;me too&amp;quot; postings but no solutions. Another 30% consists
of advice that ranges from &amp;quot;reinstall windows&amp;quot; and &amp;quot;disable
antivirus/firewall&amp;quot; (which still seem to be the first answer to any
windows problem), over clearly dubious downloads of &amp;quot;registry fixers&amp;quot;
and &amp;quot;cleanup tools&amp;quot; to almost convincing pointers to specific &amp;quot;Windows
Update Agent Updates&amp;quot;. The problem is that the mass of clearly bogus
information makes even otherwise reasonable-sounding suggestions
appear questionable. The waters are further muddied by the
fact that even obviously irrelevant suggestions are often
accompanied by reports from other people that this &amp;quot;fixed their
problem when nothing else helped&amp;quot;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="distilling-an-explanation"&gt;
&lt;h2&gt;Distilling an explanation&lt;/h2&gt;
&lt;p&gt;In the end, the most trustworthy advice that I was able to find was
from &lt;a class="reference external" href="http://wu.krelay.de/en/"&gt;http://wu.krelay.de/en/&lt;/a&gt;. In short, the page advises to manually
install a few specific updates before relying on the automatic
process. This made sense because I do not expect that Microsoft would
actually require users to perform any registry manipulations or
installation of third-party components in order to get a fresh Windows
to an up-to-date state. Furthermore, this was also quite consistent
with several reports from users that they had to wait from anywhere
between 24 to 48 hours, but after that the update procedure actually
continued (which would also explain the various success stories after
performing other, more spurious steps: people simply spent the time
trying various procedures, but in the end it was the waiting that
helped).&lt;/p&gt;
&lt;p&gt;In my case, even after manually installing the recommended
updates, the automatic update did not make any progress for more than
15 minutes. I would have waited a while longer before giving up, but
at that point I discovered another potentially helpful tool:
&lt;a class="reference external" href="http://wsusoffline.net/"&gt;http://wsusoffline.net/&lt;/a&gt; - a collection of scripts that download all
updates from Microsoft so that they can be applied offline, i.e. even
on a system that's not connected to the internet. I judged this to be
reasonable, again mostly because it just downloaded and installed
official Microsoft patches, without any dubious registry hackery or
other file system manipulation. I figured having the updates available
offline could certainly not hurt, in case I needed to do a
reinstallation or setup another system in the near future.&lt;/p&gt;
&lt;p&gt;The offline updater runs in a terminal window and gives various status
messages which already gave me a lot more confidence. At some point,
it was saying something along the lines of &lt;em&gt;Identifying required
updates. Please wait, this step may take quite some time&lt;/em&gt; - and
indeed, this step took about 3 hours. In this case the patience paid
off, though: after three hours it started installing updates and
finished in less than an hour. And after this, the automatic update
procedure now also finishes checking for updates in less than a
minute.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="summary"&gt;
&lt;h2&gt;Summary&lt;/h2&gt;
&lt;p&gt;So in hindsight it seems there are two solutions to the problem of
Windows Update getting stuck in the &amp;quot;Checking for Updates&amp;quot; stage:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Don't do anything; after about 1-2 days of keeping the system
running, the updates will start coming (I'm not sure if restarting
the system resets the procedure).&lt;/li&gt;
&lt;li&gt;Do a manual install of &lt;a class="reference external" href="https://support.microsoft.com/en-us/kb/3138612"&gt;KB3138612&lt;/a&gt; (as advised by
&lt;a class="reference external" href="http://wu.krelay.de/en/"&gt;http://wu.krelay.de/en/&lt;/a&gt;). I expect this would shortened the
wait time to about 3 hours.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Note, however, that I did not try either of these solutions (though
these are the steps that I would try should I encounter this
problem again). They're just my conclusions from the things that I
have tried and read. If you want to follow the procedure that worked
for me, do step 2 followed by an offline update installation.&lt;/p&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="Windows"></category></entry><entry><title>What's wrong with Gnus</title><link href="http://www.rath.org/whats-wrong-with-gnus.html" rel="alternate"></link><published>2016-02-12T00:00:00-08:00</published><updated>2016-02-12T00:00:00-08:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2016-02-12:/whats-wrong-with-gnus.html</id><summary type="html">&lt;p&gt;&lt;a class="reference external" href="http://gnus.org/"&gt;Gnus&lt;/a&gt; is a mail user agent (MUA) and newsreader written in &lt;a class="reference external" href="http://www.gnu.org/software/emacs/manual/html_node/elisp/index.html"&gt;Emacs
Lisp&lt;/a&gt;. It is famous for being very configurable, and for being one of
the few MUAs that come with a decent editor to compose your messages
(by virtue of actually running inside &lt;a class="reference external" href="http://www.gnu.org/software/emacs/"&gt;Emacs&lt;/a&gt;). I have switched between
Gnus and Thunderbird as my primary MUA several times over the years
and recently switched to Gnus again. This time, however, I made an
additional resolution: should I encounter any problems …&lt;/p&gt;</summary><content type="html">&lt;p&gt;&lt;a class="reference external" href="http://gnus.org/"&gt;Gnus&lt;/a&gt; is a mail user agent (MUA) and newsreader written in &lt;a class="reference external" href="http://www.gnu.org/software/emacs/manual/html_node/elisp/index.html"&gt;Emacs
Lisp&lt;/a&gt;. It is famous for being very configurable, and for being one of
the few MUAs that come with a decent editor to compose your messages
(by virtue of actually running inside &lt;a class="reference external" href="http://www.gnu.org/software/emacs/"&gt;Emacs&lt;/a&gt;). I have switched between
Gnus and Thunderbird as my primary MUA several times over the years
and recently switched to Gnus again. This time, however, I made an
additional resolution: should I encounter any problems or rough edges
I would attempt to debug and fix them (instead of grudgingly living
with them and eventually switching MUAs again).&lt;/p&gt;
&lt;p&gt;I already had some superficial experience with Emacs Lisp, and this
seemed like a great opportunity to do something more complicated with
it. Not surprisingly, I quickly discovered things I disliked. To my
delight, I also managed to fix &lt;a class="reference external" href="http://debbugs.gnu.org/cgi/pkgreport.cgi?archive=both;submitter=Nikolaus%40rath.org;include=tags%3Apatch"&gt;several issues&lt;/a&gt; and got the changes
incorporated upstream.&lt;/p&gt;
&lt;p&gt;However, the predominant feeling that I had when hacking Gnus was one
of disappointment. I always wondered how you could actually write
something as complicated as a MUA in a language designed to facilitate
text editing. After looking at the source, my answer to that question
is &amp;quot;badly&amp;quot;. There are no clever techniques or powerful language
features I wasn't aware of, instead everything looks more hacky and
fragile than I ever thought possible for a project with the size,
history and reputation of Gnus.&lt;/p&gt;
&lt;div class="section" id="turning-an-editor-into-a-mua"&gt;
&lt;h2&gt;Turning an editor into a MUA&lt;/h2&gt;
&lt;p&gt;I think I was always assuming that Gnus was essentially implemented
like any other MUA - only using Emacs buffers to render the user
interface. This turns out not to be the case. Instead, Gnus attempts
to cast most of the tasks that a MUA has to perform into something
that looks like text editing, and then uses Emacs text editing
functions to accomplish the task. This may not seem like such a
terrible idea at first, but here are some examples of what this means
in practice:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;p class="first"&gt;There are no complex data structures to pass information from one
function to another. Instead, information is passed in a dedicated
&lt;em&gt;Emacs buffer&lt;/em&gt;. Sometimes the buffer is allocated dynamically, but
sometimes even with a fixed name (giving the term &amp;quot;global variable&amp;quot;
a whole new meaning). I would expect that e.g. a function that
fetches headers from an IMAP server would return some sort of array
of structs, but instead it actually writes them into a buffer in RFC822
format. The caller then reads this buffer and parses the contents.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;Basically any sort of parsing is done in an ad-hoc fashion using
lots of regular expressions (potentially inside a while loop). This
includes both the parsing of the buffers that are used for
communication between Gnus functions, and the parsing of
standardized protocols that are used with remote servers. Take, for
example, the parsing of the IMAP FETCH response. The format is
specified in &lt;a class="reference external" href="https://tools.ietf.org/html/rfc3501#section-7.4.2"&gt;RFC 3501, section 7.4.2&lt;/a&gt;. Instead of
writing a proper parser that would e.g. ensure that parentheses are
properly nested and would return some structured object, the way
Gnus determines the properties of a message is to first move to the
beginning of the line, do a regular expression search for the name
of the property (e.g. &lt;tt class="docutils literal"&gt;RFC822.SIZE&lt;/tt&gt; followed by some numbers), and
then take the first match. After this, the cursor (remember,
everything happens in a buffer) is moved to the beginning of the line
and the procedure starts again for the next property. There is
absolutely zero awareness of nesting levels or a distinction between
keys and values.&lt;/p&gt;
&lt;p&gt;This sort of parsing appears extremely fragile. I'm pretty sure that
something like an email with content type &lt;tt class="docutils literal"&gt;text/RFC822.SIZE 47&lt;/tt&gt;
would confuse the hell out of Gnus. It doesn't even take a malicious
sender: one of the bugs that I fixed was the implicit assumption
that in the FETCH response of an IMAP server the UID always comes
before the size - because the &amp;quot;parser&amp;quot; forgot to go back to the
beginning of the line after it was looking for the UID.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;There are no good means to signal errors. Basically, if there is any
sort of problem (e.g. the SMTP server refuses to accept a message)
there are only two choices: write a message into the minibuffer
(that the user may or may not see, and that's overwritten as soon as
any other Emacs components wants to say something), or signal a lisp
error. Unfortunately Lisp errors are (or are used this way in Gnus)
completely unstructured, so they are often caught and suppressed
indiscriminately by higher-level code.&lt;/p&gt;
&lt;p&gt;There are various examples for this: did you specify an invalid
expiry target? Messages will just silently not expire. Did you
specify the wrong credentials for your IMAP server?  You will only
get a brief message in the minibuffer. Is the server refusing to
delete or store an IMAP message? You may or may not get a brief message
in the minibuffer, but Gnus will otherwise appear to have
successfully executed the command.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;There is very little error detection, and lots of chances for errors
to occur and propagate. Since data is passed in the form of buffers,
functions can receive and return completely unstructured, arbitrary
values. In some cases, the expected format of the buffer is
documented in the comments. In other cases, it is completely
undocumented and has to be deduced by looking at all the
participating functions. But only very rarely is the structure of
the data actually reflected in the datatype of a function's
arguments.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;Even worse, most functions also do not make any attempt to validate
the buffers they work with. If they happen to contain garbage, the
function will happily work with it. And since parsing of the buffer
is done with some ad-hoc regular expression, chances are that the
function will even be able &amp;quot;process&amp;quot; the buffer and obtain a result
that it passes on (but which is just garbage).&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p class="first"&gt;The Gnus code is full of arcane workarounds. For example, &amp;quot;select
methods&amp;quot; (which is Gnus' name for backends that implement different
protocols) can share common code. Normally, this would be
implemented by making backends into classes that can inherit from
each other. Gnus, however, was written before Emacs Lisp had support
for object-oriented programming and thus comes with its own runtime
emulation of classes. In other words, there is code that dynamically
renames your variables and functions to make it appear they're part
of multiple backends (look at &lt;a class="reference external" href="http://git.savannah.gnu.org/cgit/emacs.git/tree/lisp/gnus/nnoo.el?h=emacs-24"&gt;nnoo.el&lt;/a&gt; for some code that will
blow your mind).&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;UPDATE&lt;/strong&gt;: there has just been a decision to move Gnus development
into the GNU Emacs Git repository and scrap support for anything but
the most-recent GNU Emacs version. Nice!&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
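&lt;p&gt;To make the parsing problem concrete, here is a small shell sketch (not actual Gnus code, and the sample FETCH lines are made up) that mimics the regex-scanning approach described above:&lt;/p&gt;

```shell
# Scan for a property by regex, the way the article describes -
# no awareness of parentheses, nesting, or quoted strings.
line='* 12 FETCH (UID 100 RFC822.SIZE 4711 FLAGS (\Seen))'
size=$(printf '%s\n' "$line" | grep -oE 'RFC822\.SIZE [0-9]+' | head -n1 | awk '{print $2}')
echo "$size"    # 4711 - works on well-behaved input

# But the same scan is fooled as soon as the token shows up inside
# a quoted string, because there is no structural parsing at all:
bad='* 13 FETCH (BODY ("TEXT" "RFC822.SIZE 47") UID 101 RFC822.SIZE 9000)'
printf '%s\n' "$bad" | grep -oE 'RFC822\.SIZE [0-9]+' | head -n1
# prints RFC822.SIZE 47 - the value from inside the quoted string
```

&lt;p&gt;A real parser per RFC 3501 would first tokenize the parenthesized list and only then look up keys - which is exactly the step that the regex approach skips.&lt;/p&gt;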
&lt;/div&gt;
&lt;div class="section" id="concluding-thoughts"&gt;
&lt;h2&gt;Concluding Thoughts&lt;/h2&gt;
&lt;p&gt;Having seen this, my optimism about making Gnus my one and only tool
for mail handling is rather diminished. For me, one of Gnus' most
appealing features is that it is written in a high-level, interpreted
language and can therefore very easily be modified and
extended. However, now that I have actually worked with the code, I
don't think it's all that easy and pleasant in practice. Emacs Lisp as
a language is nice to work with. However, the contortions that have
been done to turn Emacs into a MUA seem to make everything so fragile
and complex that even small changes become rather unpleasant to
implement (at least for me).&lt;/p&gt;
&lt;p&gt;Unfortunately, at this point Gnus nevertheless seems to be the least
worst option when it comes to mail handling. In my opinion, web-based
interfaces are a non-starter for anything but casual use and
Thunderbird has a terrible editor (just try to rearrange quotes when
composing in HTML, or to copy &amp;amp; paste a fragment of source code
without getting automatic line breaks when composing in text
mode). There is Evolution, Geary, and Claws, but the last time I
looked at them none of them did a better job at Email handling than
Thunderbird. I guess what I really want is something like Gnus but
written in a language that's more suitable to this task than Emacs
Lisp - maybe Python. Does anyone have any suggestions?&lt;/p&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="Emacs"></category><category term="Lisp"></category><category term="Programming"></category><category term="Gnus"></category></entry><entry><title>SSD Caching under Linux</title><link href="http://www.rath.org/ssd-caching-under-linux.html" rel="alternate"></link><published>2016-02-10T00:00:00-08:00</published><updated>2016-02-10T00:00:00-08:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2016-02-10:/ssd-caching-under-linux.html</id><summary type="html">&lt;p&gt;I recently found myself with a spare 128 GB SSD disk and decided to
try my hand at setting up SSD caching under Linux. My personal desktop
system so far stored all data on traditional spinning disks. However,
a little while ago I got a new computer at work that comes exclusively
with SSD storage, and since then I've become increasingly annoyed with
the disk performance of my personal system. Using SSD caching seemed
like an appealing option to increase …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I recently found myself with a spare 128 GB SSD disk and decided to
try my hand at setting up SSD caching under Linux. My personal desktop
system so far stored all data on traditional spinning disks. However,
a little while ago I got a new computer at work that comes exclusively
with SSD storage, and since then I've become increasingly annoyed with
the disk performance of my personal system. Using SSD caching seemed
like an appealing option to increase performance without having to
replace any disks.&lt;/p&gt;
&lt;p&gt;The two operations where the disk performance is most noticeable are
when booting up the system, and when using &lt;a class="reference external" href="http://www.cis.upenn.edu/~bcpierce/unison/"&gt;Unison&lt;/a&gt; to synchronize my
home directory between different systems. Booting the system typically
takes about 1 minute and 30 seconds from loading the initrd to X11
coming up, and about 2 minutes until my desktop is fully loaded
(that's &lt;a class="reference external" href="http://i3wm.org/"&gt;i3&lt;/a&gt;, Dropbox, Emacs, Firefox, Network Manager, XFCE
Terminal, and Pulse Audio). Scanning my home directory with Unison
typically takes about 1 minute and 19 seconds (that's just detecting
any changes that need to be synchronized, not actually transferring
the changed data).&lt;/p&gt;
&lt;p&gt;To better estimate how much improvement I could possibly get from the
SSD, I first transferred my entire root file system from spinning
disks to the SSD. This increased my boot time to X11 from 1:30 to 22
seconds (I unfortunately didn't write down the time when the desktop
was fully loaded).&lt;/p&gt;
&lt;p&gt;For SSD caching under Linux, there are currently three options:
&lt;a class="reference external" href="https://www.kernel.org/doc/Documentation/bcache.txt"&gt;bcache&lt;/a&gt;, &lt;a class="reference external" href="http://man7.org/linux/man-pages/man7/lvmcache.7.html"&gt;lvmcache&lt;/a&gt;, and &lt;a class="reference external" href="https://github.com/stec-inc/EnhanceIO"&gt;EnhanceIO&lt;/a&gt; (A nice overview of the differences
between bcache and lvmcache can be found on the &lt;a class="reference external" href="http://blog-vpodzime.rhcloud.com/?p=45"&gt;Life Reflections Blog&lt;/a&gt;).
EnhanceIO I ruled out immediately because it isn't included
in the mainline kernel. bcache has the drawback of requiring you to
reformat your partitions and there are various rumours about data
corruption with more complex storage stacks. Therefore, I tried
&lt;a class="reference external" href="http://man7.org/linux/man-pages/man7/lvmcache.7.html"&gt;lvmcache&lt;/a&gt; first.&lt;/p&gt;
&lt;div class="section" id="lvmcache"&gt;
&lt;h2&gt;lvmcache&lt;/h2&gt;
&lt;p&gt;Initial setup of lvmcache on the block devices was straightforward,
but getting the system to setup the stack correctly on boot required
some manual work for my Debian Jessie system. It turns out that there
is a missing dependency on the &lt;em&gt;thin-provisioning-tools&lt;/em&gt; package that
contains the &lt;tt class="docutils literal"&gt;cache_check&lt;/tt&gt; binary. Furthermore, in order to be able
to cache the root file system, you need to manually configure
&lt;em&gt;initramfs-tools&lt;/em&gt; to include this binary (and the C++ library that it
requires) in the initrd. That out of the way, things worked smoothly.&lt;/p&gt;
&lt;p&gt;Unfortunately, even after several boots I was unable to measure any
performance improvement. I tried to encourage promotion of blocks to
the cache by setting the &lt;em&gt;sequential_threshold&lt;/em&gt;,
&lt;em&gt;read_promote_adjustment&lt;/em&gt; and &lt;em&gt;write_promote_adjustment&lt;/em&gt; variables to
zero (using &lt;tt class="docutils literal"&gt;dmsetup message &amp;lt;device&amp;gt; 0 &amp;lt;variable&amp;gt; 0&lt;/tt&gt;), but to no
avail. Maybe using it for a longer time would have eventually improved
performance, but I got the impression that lvmcache was not the right
tool for my use case. As I understand it, lvmcache is not actually a
cache but more of a tiered storage system: it tries to determine which
blocks are accessed most frequently, and &amp;quot;promotes&amp;quot; them to storage on
the cache device. A traditional cache, in contrast, would put almost
every block that is read or written into the cache, evicting the
least-frequently accessed block from the cache to make room if
necessary.&lt;/p&gt;
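&lt;p&gt;For reference, the tuning mentioned above looks like this on the command line (the device name is hypothetical; use the dm name of your cached LV as shown by &lt;tt class="docutils literal"&gt;dmsetup ls&lt;/tt&gt;):&lt;/p&gt;

```shell
# Hypothetical dm device name - substitute your own cached LV.
DEV=vg0-root

# Promote blocks into the cache as aggressively as possible:
dmsetup message "$DEV" 0 sequential_threshold 0
dmsetup message "$DEV" 0 read_promote_adjustment 0
dmsetup message "$DEV" 0 write_promote_adjustment 0
```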
&lt;/div&gt;
&lt;div class="section" id="bcache"&gt;
&lt;h2&gt;bcache&lt;/h2&gt;
&lt;p&gt;bcache works more like a traditional cache. The only exception is that
it tries to detect sequential access (e.g. watching a movie, creating
an ISO image) and bypasses the cache for such requests (because
typically spinning disks are quite performant for them). I was a bit
worried about the interaction between bcache, LVM, dm-crypt and
btrfs, but also unable to find any concrete reports of problems
other than a bug in the btrfs+bcache interaction that was fixed in
kernel 3.19. Also, there were several people on the bcache and btrfs
mailing lists who reported successfully using it even in complex
stacks.&lt;/p&gt;
&lt;p&gt;Therefore, I decided to bite the bullet and give bcache a try. After
moving around a lot of data in order to be able to format the bcache
devices and enabling the most-recent kernel from the
&lt;em&gt;jessie-backports&lt;/em&gt; repository (4.3 at the time of this post) to avoid
the above-mentioned bcache+btrfs bug, I ended up with the following
configuration:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;em&gt;/dev/sda3&lt;/em&gt; (a 512 GB partition on a spinning disk) is the backing
device for &lt;em&gt;/dev/bcache0&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;/dev/sdb3&lt;/em&gt; (a 256 GB partition on a spinning disk) is the backing
device for &lt;em&gt;/dev/bcache1&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;/dev/sdc2&lt;/em&gt; (a 58 GB partition on the SSD) is initialized as a
cache device, and connected to both bcache0 and bcache1.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;/dev/bcache0&lt;/em&gt; and &lt;em&gt;/dev/bcache1&lt;/em&gt; are used as LVM physical volumes
(PVs), forming volume group &lt;em&gt;vg0&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;&lt;em&gt;/dev/mapper/vg0-root&lt;/em&gt; is the btrfs formatted root file system
(linearly mapped onto the PVs)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;/dev/mapper/vg0-home&lt;/em&gt; is a LUKS (dm-crypt) encrypted device
(linearly mapped onto the PVs)&lt;/li&gt;
&lt;li&gt;&lt;em&gt;/dev/mapper/vg0-home_luks&lt;/em&gt; is the btrfs-formatted home file system
(backed by &lt;em&gt;vg0-home&lt;/em&gt;).&lt;/li&gt;
&lt;/ul&gt;
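&lt;p&gt;For reference, creating such a stack with &lt;em&gt;bcache-tools&lt;/em&gt; looks roughly as follows. This is a sketch rather than a transcript of my session (the LV sizes in particular are made up), so double-check against the bcache documentation:&lt;/p&gt;

```shell
# Format the backing and cache devices (bcache-tools):
make-bcache -B /dev/sda3    # appears as /dev/bcache0
make-bcache -B /dev/sdb3    # appears as /dev/bcache1
make-bcache -C /dev/sdc2    # cache device

# Attach the cache set to both backing devices via sysfs:
CSET=$(bcache-super-show /dev/sdc2 | awk '/cset.uuid/ {print $2}')
echo "$CSET" > /sys/block/bcache0/bcache/attach
echo "$CSET" > /sys/block/bcache1/bcache/attach

# Build the LVM stack on top of the bcache devices:
pvcreate /dev/bcache0 /dev/bcache1
vgcreate vg0 /dev/bcache0 /dev/bcache1
lvcreate -n root -L 100G vg0    # sizes are hypothetical
lvcreate -n home -L 300G vg0
```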
&lt;p&gt;This time, there was no need for any manual fiddling with the initrd,
things worked perfectly out of the box. I am also extremely pleased
with the performance. While the first reboot proceeded at regular
speed, subsequent boots reduced the time to X11 from 1:30 minutes to
about 9 seconds, and the time until the fully loaded desktop from 2:00
minutes to about 15 seconds (times are not exact because I used a
stopwatch). Note that this is even faster than in my SSD-only
experiment - presumably because now both root file system and home
directory are utilizing the SSD. The time required to scan my home
directory for synchronization changed from 1:19 (with spinning disks)
to 4 seconds (with SSD cache). Wow!&lt;/p&gt;
&lt;p&gt;I am a little bit worried about encountering bugs in this rather
complex stack, but I hope that they will at least not go unnoticed
because btrfs checksums all its data. If I encounter any problems, I
will update this blog post.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Update&lt;/strong&gt;: I've now replaced btrfs with ext4, but otherwise kept the
stack unchanged. See &lt;a class="reference external" href="http://www.rath.org/btrfs-reliability-a-datapoint.html"&gt;BTRFS Reliability - a datapoint&lt;/a&gt; for details.&lt;/p&gt;
&lt;/div&gt;
</content><category term="misc"></category><category term="Linux"></category></entry><entry><title>Mercurial for Git Users (and vice versa)</title><link href="http://www.rath.org/mercurial-for-git-users-and-vice-versa.html" rel="alternate"></link><published>2016-01-15T00:00:00-08:00</published><updated>2016-01-15T00:00:00-08:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2016-01-15:/mercurial-for-git-users-and-vice-versa.html</id><summary type="html">&lt;p class="first last"&gt;As a long-time Mercurial user, I recently learned to use
Git. This article describes the differences that I found most
interesting. It is primarily targeted at people who know only one
of the systems and are interested in the other.&lt;/p&gt;
</summary><content type="html">&lt;p&gt;Until a few weeks ago, the only version control system that I used
regularly (and was reasonably proficient in) was &lt;a class="reference external" href="http://mercurial.selenic.com/"&gt;Mercurial&lt;/a&gt;. This
changed when I took over maintainership of &lt;a class="reference external" href="http://github.com/libfuse/libfuse"&gt;libfuse&lt;/a&gt; and &lt;a class="reference external" href="http://github.com/libfuse/sshfs"&gt;sshfs&lt;/a&gt; at the
end of 2014. Both are maintained in &lt;a class="reference external" href="http://www.git-scm.com/"&gt;Git&lt;/a&gt; and have plenty of forks, so
converting the repositories to Mercurial would have been silly. For a
while, I tried to use the &lt;a class="reference external" href="http://hg-git.github.io/"&gt;Hg-Git&lt;/a&gt; extension that lets you access a
Git repository with Mercurial. However, I found that not to be working
very well for more complex use-cases. So I finally bit the bullet and
learned to use Git. Here is what I learned.&lt;/p&gt;
&lt;p&gt;(To make search engines happy, the title of this article should really
have been &lt;em&gt;Mercurial for Git users and Git for Mercurial
users&lt;/em&gt;. Unfortunately that does not sound particularly witty, so I am
mentioning &lt;em&gt;Git for Mercurial users&lt;/em&gt; a few times in this paragraph
instead. Hopefully, this will make this post show up in searches for
both phrases).&lt;/p&gt;
&lt;div class="section" id="fundamental-representation"&gt;
&lt;h2&gt;Fundamental representation&lt;/h2&gt;
&lt;p&gt;Both Git and Mercurial maintain a directed, acyclic graph of commits
(the &amp;quot;commit DAG&amp;quot;). For Mercurial, this is the fundamental
representation of the repository. For Git, there is a lower-level
layer (called the &amp;quot;plumbing layer&amp;quot;, more on that below). However, on
the layer of the DAG the two systems are quite similar. A commit
consists of a description of changes made to files in the repository
and some associated metadata (like author, date, or a message
describing the changes). Every node in the commit DAG represents a
commit and has a unique id that depends on both the changes made in
the commit and the previous state of the repository (the
&amp;quot;hash&amp;quot;). Pulling and pushing to other repositories effectively means
exchanging DAG nodes.&lt;/p&gt;
&lt;p&gt;The first area where Git and Mercurial differ (and where a lot of
confusion can arise) is the terminology for elements of the DAG. Both
systems allow assigning user-defined names to specific nodes, which are
called &lt;em&gt;tags&lt;/em&gt; - but this is where the similarities end.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="mercurial-commits-have-ephemeral-handles"&gt;
&lt;h2&gt;Mercurial commits have ephemeral handles&lt;/h2&gt;
&lt;p&gt;In Mercurial, every commit has, in addition to its hash, a unique
numerical id that is valid until the next non-trivial (i.e., anything
but a simple commit with one ancestor) change. This number is obtained
by simply enumerating the ids starting from the root on a &amp;quot;best
effort&amp;quot; basis. Typically, successive commits have successive numerical
ids. This is especially handy in interactive use (e.g. &lt;tt class="docutils literal"&gt;hg log&lt;/tt&gt;
followed by &lt;tt class="docutils literal"&gt;hg diff&lt;/tt&gt;), but comes with the risk of accidentally
re-using a numeric id after it is no longer valid (or, worse, refers
to a different commit). In Git, the only identifier that every commit
has is its alphanumeric hash - but luckily, the hash can be
abbreviated as long as it remains unique within the repository.&lt;/p&gt;
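&lt;p&gt;In day-to-day use the difference looks like this (the revision number and hashes are of course made up):&lt;/p&gt;

```shell
# Mercurial: an ephemeral numeric id works alongside the hash
hg log -r 42          # revision number 42 (may be reassigned later)
hg log -r 1a2b3c4d    # the permanent hash

# Git: only hashes, but a unique prefix is enough
git show 1a2b3c
```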
&lt;/div&gt;
&lt;div class="section" id="git-branches-are-mercurial-bookmarks"&gt;
&lt;h2&gt;Git branches are Mercurial bookmarks&lt;/h2&gt;
&lt;p&gt;In Git, a &lt;em&gt;branch&lt;/em&gt; is a pointer to a node of the DAG with a
user-defined name. In Mercurial, the same thing is called a
&lt;em&gt;bookmark&lt;/em&gt;. Technically, a bookmark and a branch are both the same
thing as a tag (though living in different namespaces), but the
expected usage is different: while a tag is typically assigned to one
commit and stays there, bookmarks/branches generally move to different
commits over time. A bookmark/branch identifies the &amp;quot;most recent&amp;quot;
commit in a particular line of development. Both Git and Mercurial
allow the user to declare one branch/bookmark as &lt;em&gt;active&lt;/em&gt;, which means
that every time the user commits, the bookmark/branch will be moved
from the parent commit to the freshly-created descendant.&lt;/p&gt;
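&lt;p&gt;The corresponding commands, as a minimal sketch (the branch name is arbitrary):&lt;/p&gt;

```shell
# Git: create a branch at the current commit and make it active
git checkout -b feature

# Mercurial: the equivalent is a bookmark
hg bookmark feature

# In both systems, subsequent commits move 'feature' forward.
```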
&lt;p&gt;Mercurial does not have the concept of branches in the Git sense -
every mention of branches in the context of Mercurial actually refers
to &lt;em&gt;named branches&lt;/em&gt;.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="git-does-not-have-named-branches"&gt;
&lt;h2&gt;Git does not have named branches&lt;/h2&gt;
&lt;p&gt;The concept of named branches exists only in Mercurial, and does not
have an equivalent in Git. Much time has been wasted by people trying
to use Mercurial's named branches like Git's branches.&lt;/p&gt;
&lt;p&gt;Every DAG node in Mercurial belongs to exactly one named branch (which
may be the &lt;em&gt;default&lt;/em&gt; branch, in which case it is often not displayed
explicitly). The branch name is part of the commit's metadata. A named
branch thus refers not to a specific commit, but to a specific set of
commits that may grow (but not shrink) over time. Branch names are
often used to distinguish different lines of development, for example
the Python Mercurial repository has named branches for each major
Python release (2.7, 3.4, 3.5 etc).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="git-dag-leaves-must-be-referenced-to-survive"&gt;
&lt;h2&gt;Git DAG leaves must be referenced to survive&lt;/h2&gt;
&lt;p&gt;In Mercurial, leaves of the commit DAG (i.e., nodes without
descendants) do not have any special status other than their
name. Mercurial calls them &lt;em&gt;heads&lt;/em&gt;, and has a &lt;tt class="docutils literal"&gt;hg heads&lt;/tt&gt; command to
list them (so that it is easy to switch to and pick up development
from them). Mercurial allows &amp;quot;closing&amp;quot; a head, but technically this
is implemented as an additional commit (which becomes the new head)
that has a flag in its metadata which causes it to be ignored by
commands like &lt;tt class="docutils literal"&gt;hg heads&lt;/tt&gt;.&lt;/p&gt;
&lt;p&gt;In Git, leaves of the commit DAG are only kept alive by references. A
reference can be a tag, a branch, or the automatically managed &amp;quot;HEAD&amp;quot;
pointer that references the commit that is checked out in the working
directory. A leaf commit that is not referenced by anything will
eventually be garbage-collected and disappear. For that reason, Git
repositories typically have a number of branches which ensure that
&amp;quot;important&amp;quot; DAG leaves do not disappear. For the same reason, Git also
makes it difficult to create DAG leaves without at the same time
starting a new branch. This is often confusing to Mercurial users who
are used to easily making new commits starting from any commit in the
DAG. Correspondingly, Git users should keep in mind that they do not
need to create a bookmark (or, even worse, named branch) every time
they want to start a new development line: Mercurial allows you to start
new heads anywhere, anytime.&lt;/p&gt;
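&lt;p&gt;As an illustrative sketch of this difference (repository and branch names here are made up): a commit created on a detached HEAD survives only once some reference points at it.&lt;/p&gt;

```shell
# Sketch: in Git, an unreferenced leaf commit is subject to garbage collection.
set -e
cd "$(mktemp -d)"
git init -q repo; cd repo
git config user.email a@example.com; git config user.name a
git commit -q --allow-empty -m "root"
git checkout -q --detach               # leave the branch behind
git commit -q --allow-empty -m "leaf"  # new DAG leaf, referenced only by HEAD
leaf=$(git rev-parse HEAD)
git checkout -q -                      # back on the branch: the leaf is now loose
git branch keep "$leaf"                # a branch reference protects it from gc
git rev-parse -q --verify keep
```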
&lt;/div&gt;
&lt;div class="section" id="mercurial-does-not-have-annotated-tags"&gt;
&lt;h2&gt;Mercurial does not have annotated tags&lt;/h2&gt;
&lt;p&gt;In both Git and Mercurial, tags can in principle be moved from one
commit to another (called &amp;quot;re-tagging&amp;quot;). Git, however, has the
additional concept of an &amp;quot;annotated tag&amp;quot;. An annotated tag is not just
a name assigned to a specific commit, but an object that lives in
Git's plumbing layer. This object contains not just the hash of the
commit and the name of the tag, but can carry additional information
like a GPG signature, which makes it possible to sign a tag (and the associated state
of the repository). There is no such thing in Mercurial.&lt;/p&gt;
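&lt;p&gt;A minimal sketch of the difference (tag names are made up): a lightweight tag is just a name for a commit, while an annotated tag is a separate object in the plumbing layer.&lt;/p&gt;

```shell
# Sketch: lightweight vs annotated tags in Git.
set -e
cd "$(mktemp -d)"
git init -q repo; cd repo
git config user.email a@example.com; git config user.name a
git commit -q --allow-empty -m "root"
git tag light                             # lightweight: a ref to the commit
git tag -a heavy -m "release notes here"  # annotated: a tag object with metadata
git cat-file -t light                     # prints: commit
git cat-file -t heavy                     # prints: tag
```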
&lt;/div&gt;
&lt;div class="section" id="git-supports-changing-history-on-remotes"&gt;
&lt;h2&gt;Git supports changing history on remotes&lt;/h2&gt;
&lt;p&gt;Once you push a commit to a remote Mercurial repository, there is no
way back. Unless you have additional channels (like SSH access or a
web interface), you cannot make a commit disappear from the remote
server. This can be a source of a lot of frustration, but it also
ensures that once you have cloned a repository you can be sure that
every commit you have locally will also persist on the server - i.e.,
there is no chance that someone else will rebase some of the commits
and leave you with two divergent heads on the next pull.&lt;/p&gt;
&lt;p&gt;In Git, this is not the case. This is a consequence of the requirement
for DAG leaves to be referenced by branches. If you have accidentally pushed a
commit to a remote server, all you have to do is re-assign the branch
names (which can be pushed as well) so that the commit is no longer
reachable from any branch and it will eventually be garbage
collected. You can also delete a branch completely by pointing it at a
special &amp;quot;this does not exist&amp;quot; node. Either operation can be very
useful and very dangerous for the reasons given above.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="git-tracks-upstream-status-in-the-revision-graph"&gt;
&lt;h2&gt;Git tracks upstream status in the revision graph&lt;/h2&gt;
&lt;p&gt;As mentioned before, Git requires every commit without descendants to
have a branch name pointing at it. This means that when you have done
(and committed) some work in your local repository, and then pull new
commits from a remote repository, Git needs to ensure that both your
most recent commit and the most recently pulled commit have associated
branch names (the latter are called &amp;quot;remote tracking branches&amp;quot;). On
the plus side, this means you can always tell which commits had not
been pushed to a particular remote repository the last time you pulled
from it. On the minus side, this means that you have to juggle a
lot of branches (keep in mind that you may be interacting with
multiple remote repositories). To help prevent name clashes, Git
allows the local branch names to be different from the branch names of
the remote server, i.e. you can instruct Git to always assign the
branch name &amp;quot;foo&amp;quot; to the most-recent commit in the &amp;quot;bar&amp;quot; branch on the
remote server (in practice, one would obviously not use &amp;quot;foo&amp;quot; but
something more informative like &amp;quot;origin/bar&amp;quot; or
&amp;quot;remotename/bar&amp;quot;). Similarly, you can tell Git to assign a different
branch name on the remote server when pushing commits.&lt;/p&gt;
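&lt;p&gt;The name mapping can be sketched with refspecs (a local bare repository stands in for the remote; all names are made up):&lt;/p&gt;

```shell
# Sketch: local branch "foo" corresponding to remote branch "bar".
set -e
cd "$(mktemp -d)"
git init -q --bare remote.git
git clone -q remote.git work; cd work
git config user.email a@example.com; git config user.name a
git commit -q --allow-empty -m "root"
git push -q origin HEAD:refs/heads/bar  # create branch "bar" on the remote
git fetch -q origin                     # updates remote-tracking origin/bar
git branch foo origin/bar               # local name "foo" for remote "bar"
git push -q origin foo:bar              # push local "foo" to remote "bar"
```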
&lt;p&gt;Since Mercurial heads (commits without descendants) are not required
to have bookmarks pointing at them, the situation is simpler. Pulling
and pushing commits simply adds the commits to the respective
repository without the need to create or move any bookmarks. This
means that the Mercurial DAG does not tell you what commits are
available in which remote repository. Instead, you can use the &lt;tt class="docutils literal"&gt;hg
incoming&lt;/tt&gt; and &lt;tt class="docutils literal"&gt;hg outgoing&lt;/tt&gt; commands to connect to a specific
repository and determine what would be pulled or pushed. The advantage
of this is that in contrast to Git's remote branches, the information
is always up-to-date. The drawback is that it requires a network
connection to the remote.&lt;/p&gt;
&lt;p&gt;Mercurial bookmarks can also be exchanged with remote repositories,
but there is no way to set up a mapping between remote and local
bookmarks: local and remote bookmarks always have the same name.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="git-does-not-have-phases"&gt;
&lt;h2&gt;Git does not have phases&lt;/h2&gt;
&lt;p&gt;One advantage of Git's remote branches is that they can be used to
determine if a given commit has already been pushed to a remote
repository. However, the generality of remote branches makes this a
little cumbersome: one has to examine the ancestors of every remote
branch to determine if any of them contain the commit one is
interested in.&lt;/p&gt;
&lt;p&gt;While Mercurial does not have remote branches, it does have a
different feature that makes answering this specific question very
simple: phases. In Mercurial, every commit has an associated &amp;quot;phase&amp;quot;
that's local to the repository. A commit can be either in &amp;quot;draft&amp;quot;,
&amp;quot;public&amp;quot; or &amp;quot;secret&amp;quot; phase. Draft commits have not yet been pushed to
any remote, but will be included in the next push. Public commits have
been pushed to at least one remote repository. Secret commits have not
yet been pushed, and will not be included in any push until they have
explicitly been marked as drafts.&lt;/p&gt;
&lt;p&gt;Therefore, to determine if a Mercurial commit has been pushed anywhere
it's sufficient to look at its phase. The primary use of phases in
Mercurial is to prevent accidental modification of history: if a
commit is public, Mercurial will complain very loudly if you attempt
to change its history.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="git-has-a-plumbing-layer"&gt;
&lt;h2&gt;Git has a plumbing layer&lt;/h2&gt;
&lt;p&gt;In Mercurial, the only exposed representation of the repository is the
DAG of commits. In Git, on the other hand, the DAG is actually
constructed on top of a different data structure that is also exposed
to the user - the so called &amp;quot;plumbing layer&amp;quot;.&lt;/p&gt;
&lt;p&gt;The plumbing layer is essentially just an object storage system that
(internally) uses some techniques to efficiently store similar (or
somewhat similar) objects. Every commit in the DAG is stored in a
plumbing layer object. The plumbing layer does not know anything about
how the different objects relate to each other.&lt;/p&gt;
&lt;p&gt;All of the &amp;quot;version-control&amp;quot; layer commands (like &lt;tt class="docutils literal"&gt;commit&lt;/tt&gt;) can be
expressed as a series of plumbing-layer commands (and, in some cases,
are even implemented as such). By working directly with the plumbing
layer one can therefore do interesting things. For example,
&lt;a class="reference external" href="http://git-annex.branchable.com/"&gt;git-annex&lt;/a&gt; is a tool to store large files outside of the Git
repository. To the version-control layer, all the files still appear
to reside in the local repository - but on the plumbing layer, one can
see that there is only a reference to the actual location of the data
(which may be on an external hard disk, or on a remote server) that is
resolved on-demand to retrieve the actual data. Alternatively, one can
add arbitrary additional data in the object storage that is entirely
ignored by the upper &amp;quot;version-control&amp;quot; layer.&lt;/p&gt;
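&lt;p&gt;A minimal sketch of working with the plumbing layer directly (the stored content is made up):&lt;/p&gt;

```shell
# Sketch: storing and reading back a raw object with Git plumbing commands.
set -e
cd "$(mktemp -d)"
git init -q repo; cd repo
# Store arbitrary data as a blob, bypassing the version-control layer:
sha=$(echo "not referenced by any commit" | git hash-object -w --stdin)
git cat-file -p "$sha"   # prints the stored content back
git cat-file -t "$sha"   # prints: blob
```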
&lt;p&gt;Obviously, working at the plumbing layer also enables one to do all
sorts of things to the DAG that the version control layer would never
allow to happen - resulting in a Git repository that's valid as far as
the plumbing layer is concerned, but has all sorts of inconsistencies
or peculiarities when interpreted by the version control layer.&lt;/p&gt;
&lt;p&gt;In Mercurial, none of the above is possible. The way the DAG is stored
is an implementation detail, and there are only version-control layer
commands to access it.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="git-has-a-staging-area"&gt;
&lt;h2&gt;Git has a staging area&lt;/h2&gt;
&lt;p&gt;Git has something called the &amp;quot;staging area&amp;quot;. Changes to a Git
repository are not directly committed, but first moved to this staging
area and then committed from there. The contents of the staging area
are stored as objects in the plumbing layer, but are not yet part of
the commit DAG. Many Git commands accept an option that tells them to
act on the working directory, but that just means that they
automatically first move all changes from the working directory to the
staging area.&lt;/p&gt;
&lt;p&gt;For newcomers to Git it sometimes feels appealing to entirely ignore
the concept of the staging area and just use these options. That is
generally not a good idea, because understanding the effect of
commands (like &lt;tt class="docutils literal"&gt;git reset&lt;/tt&gt;) requires understanding the staging area.&lt;/p&gt;
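&lt;p&gt;A small sketch of how the staging area separates the working directory from the commit DAG (file names are made up):&lt;/p&gt;

```shell
# Sketch: staged content is what gets committed, not the working directory.
set -e
cd "$(mktemp -d)"
git init -q repo; cd repo
git config user.email a@example.com; git config user.name a
git commit -q --allow-empty -m "root"
echo one > f
git add f           # the "one" version now lives in the staging area
echo two > f        # later edits stay in the working directory only
git commit -q -m "commits one, not two"
git show HEAD:f     # prints: one
cat f               # prints: two
```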
&lt;p&gt;For Mercurial users, it can be helpful to think of the staging area as
a mandatory head commit in &amp;quot;secret&amp;quot; phase. Adding and removing changes
to the staging area corresponds to amending the secret head, and what
Git calls &amp;quot;commit&amp;quot; corresponds to changing the phase of the head to
draft (so now it can be pulled and pushed), and creating a new, secret
head (which initially has no contents).&lt;/p&gt;
&lt;p&gt;Git users coming to Mercurial may be used to the staging area as a way
to commit only selected chunks of a file. In Mercurial, the way to do
such &amp;quot;partial commits&amp;quot; is to use the &lt;em&gt;--interactive&lt;/em&gt; option (which
queries for each chunk whether it should be included in the
commit). Alternatively, the &lt;a class="reference external" href="http://tortoisehg.bitbucket.org/"&gt;TortoiseHG&lt;/a&gt; GUI (see below) offers an
excellent GUI for both partial commits and selective
stashing/unstashing of chunks.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="git-does-not-have-patch-management"&gt;
&lt;h2&gt;Git does not have patch management&lt;/h2&gt;
&lt;p&gt;Mercurial comes with a built-in patch management extension called
&lt;em&gt;Mercurial Queues&lt;/em&gt; (&amp;quot;MQ&amp;quot;). MQ manages an arbitrary number of &amp;quot;patch
queues&amp;quot;, which each contain an ordered set of patches. A patch can
either be &amp;quot;applied&amp;quot; (including its predecessors) or &amp;quot;unapplied&amp;quot;. If it
is applied, then it appears as a specially marked commit in the DAG
that can be neither pushed nor pulled. When a patch is unapplied, it
lives in a different area that is only visible to the MQ commands. MQ
provides commands to &amp;quot;pop&amp;quot; (unapply) and &amp;quot;push&amp;quot; (apply) patches, to
convert patches to regular commits, to refresh patches, and to reorder
them. Strictly speaking, MQ provides no functionality that could not
be implemented by rebasing and history editing (rebasing and
interactive rebasing in Git lingo) together with dedicated branches
for unapplied patch queues. However, MQ automates the necessary
bookkeeping (which commit is a patch commit and must not be
pulled/pushed, to which patch queue does an applied patch belong, etc)
and provides dedicated commands that don't require re-expressing the
desired operation in terms of rebases and history edits.&lt;/p&gt;
&lt;p&gt;Git does not have dedicated patch management functionality. However,
as explained above, patch management can still be done by treating
patches as regular commits in dedicated branches and doing manual
bookkeeping and rebasing.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="mercurial-is-less-complex"&gt;
&lt;h2&gt;Mercurial is less complex&lt;/h2&gt;
&lt;p&gt;This is probably the most subjective point in this article, but I
believe that it nevertheless reflects consensus. Generally, Mercurial
is less complex than Git and thus faster to learn and less likely to
put non-expert users into perplexing situations. This is the result of
several factors:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Mercurial has fewer concepts that the user needs to grasp to use it
effectively. There is no staging area, no mapping between remote and
local branches, no plumbing layer, no &amp;quot;fast-forward&amp;quot; merge, and no
remote tracking branches.&lt;/li&gt;
&lt;li&gt;Mercurial commands are more consistent. This is probably because
most of them were designed together, while Git's user interface has
grown and changed over time.&lt;/li&gt;
&lt;li&gt;Mercurial documentation is easier to understand. This is partially a
consequence of the last two points (there is less complexity that
needs to be documented, and often the meaning of a command or option
is intuitive), but also the result of a very different writing
style. Compare, for example, the help for &lt;tt class="docutils literal"&gt;hg revert&lt;/tt&gt; with the
help for &lt;tt class="docutils literal"&gt;git reset&lt;/tt&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Obviously, less complexity comes at a price: some functions that Git
provides simply have no equivalent in Mercurial. Typically, the
absence of these functions gets more noticeable when projects get
bigger. For example, if there are many development lines (production,
qa, bugfix, development) as well as many different interacting
repositories, having distinct namespaces for the branch names in each
repository becomes very handy.&lt;/p&gt;
&lt;p&gt;Mercurial users coming to Git should expect a steep learning curve -
it will take you a while to memorize the commands, and for a while you
will occasionally encounter situations that require you to go back to
the documentation to figure out what just happened.&lt;/p&gt;
&lt;p&gt;Git users coming to Mercurial are in for a treat: you will be able to
work productively pretty quickly and with few surprises. However, over
time you may notice (and have to work around) the absence of some
concepts that you used to rely on.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="mercurial-has-a-nice-gui"&gt;
&lt;h2&gt;Mercurial has a nice GUI&lt;/h2&gt;
&lt;p&gt;If you are used to something like &lt;a class="reference external" href="http://tortoisehg.bitbucket.org/"&gt;TortoiseHG&lt;/a&gt; under Linux, the GUIs
that are available for Git will be sorely disappointing. The best that
I was able to find are Emacs' &lt;a class="reference external" href="http://magit.github.com/magit/index.html"&gt;Magit&lt;/a&gt; and Gitk (shipped with Git), but
even together they are very far from what TortoiseHG provides.&lt;/p&gt;
&lt;p&gt;Under Windows, the situation is somewhat better because there is
GitHub Desktop and Sourcetree, but both are proprietary tools. When
coming from Git to Mercurial, make sure to take a look at TortoiseHG
(available for Linux, macOS, and Windows) - you may be pleasantly
surprised.&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="bitbucket-sucks-github-rules"&gt;
&lt;h2&gt;BitBucket sucks, GitHub rules&lt;/h2&gt;
&lt;p&gt;Comparing BitBucket and GitHub probably provides enough material for
another article, but I'd like to at least briefly mention them
here. Essentially, the situation here is the opposite of what I said
about the available GUIs: even though they superficially provide the
same service, GitHub is far ahead of BitBucket in terms of usability
as well as features. For example, merging pull-requests in BitBucket
always creates a named branch that one has to manually close after
merging, and that sticks around forever (note that there is no need to
do that as far as Mercurial is concerned; the deficiency is in
BitBucket).&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="non-destructive-history-editing"&gt;
&lt;h2&gt;Non-destructive history editing&lt;/h2&gt;
&lt;p&gt;In Mercurial, the &lt;a class="reference external" href="http://hg.netv6.net/evolve-main/"&gt;evolve extension&lt;/a&gt; implements non-destructive
history editing. This means that a later commit can &amp;quot;obsolete&amp;quot; one or
more earlier commits. The obsolete commit is not removed from the DAG,
but will be ignored by most Mercurial commands by default. With
changeset evolution enabled, there are effectively two layers of
history: the history of the managed content, and the history of the
DAG itself. This means that just like you can use Mercurial to
determine the contents of a file several commits ago, you can query
for the status of the commit DAG prior to e.g. a rebase. The big
advantage of this is that it allows history-mutating operations (like
a rebase) to be pulled and pushed between repositories.&lt;/p&gt;
&lt;p&gt;While it sounds exciting, the evolve extension is not yet sufficiently
stable to be included in the Mercurial core so it needs to be
downloaded and installed separately. It is possible that a similar
out-of-tree extension exists for Git - if you know about it, please
leave a comment below!&lt;/p&gt;
&lt;/div&gt;
&lt;div class="section" id="things-that-are-not-different"&gt;
&lt;h2&gt;Things that are not different&lt;/h2&gt;
&lt;p&gt;In the past, Git and Mercurial had some additional significant
differences. While these are no longer present in current versions,
the internet hasn't quite caught up with that yet, so it's worth
listing these non-differences here explicitly.&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;Mercurial repositories are not orders of magnitude bigger than Git
repositories. This used to be the case, but the overhead has shrunk to
about 30% in more recent Mercurial releases. The GNU Emacs
repository takes 239 MB in a fresh Git clone and 336 MB when stored
in Mercurial. Git repositories, however, need occasional &amp;quot;repack&amp;quot;
operations to minimize space usage, so when the Git repository is
actively used the above number is more of a lower bound for the
actual size.&lt;/li&gt;
&lt;li&gt;Mercurial lets you change history. This has always been the case,
but the feature has to be enabled in Mercurial's configuration file
first and is labeled as an &amp;quot;extension&amp;quot; (even though it is fully
supported and shipped with Mercurial).&lt;/li&gt;
&lt;li&gt;Git is well documented. The documentation of early Git versions was
atrocious, but these days you can actually learn Git by reading the
manpages.&lt;/li&gt;
&lt;li&gt;Git works well under Windows. This used to be different, but has
also been rectified quite some time ago.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="see-also"&gt;
&lt;h2&gt;See also&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;a class="reference external" href="http://blogs.atlassian.com/2012/02/mercurial-vs-git-why-mercurial/"&gt;http://blogs.atlassian.com/2012/02/mercurial-vs-git-why-mercurial/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://blogs.atlassian.com/2012/03/git-vs-mercurial-why-git/"&gt;http://blogs.atlassian.com/2012/03/git-vs-mercurial-why-git/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://stevelosh.com/blog/2010/01/the-real-difference-between-mercurial-and-git/"&gt;http://stevelosh.com/blog/2010/01/the-real-difference-between-mercurial-and-git/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="http://code.google.com/p/support/wiki/DVCSAnalysis"&gt;http://code.google.com/p/support/wiki/DVCSAnalysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class="reference external" href="https://importantshock.wordpress.com/2008/08/07/git-vs-mercurial/"&gt;https://importantshock.wordpress.com/2008/08/07/git-vs-mercurial/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
</content><category term="hg"></category><category term="Mercurial"></category><category term="Git"></category><category term="Programming"></category></entry><entry><title>Review: KDLinks X1 Dashcam</title><link href="http://www.rath.org/review-kdlinks-x1-dashcam.html" rel="alternate"></link><published>2016-01-08T00:00:00-08:00</published><updated>2016-01-08T00:00:00-08:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2016-01-08:/review-kdlinks-x1-dashcam.html</id><summary type="html">&lt;p class="first last"&gt;Overall, the X1 is well designed. The picture quality is
very good and the support is very responsive. The installation is
relatively straightforward. However, in my opinion the camera fails
where it matters most: in reliably saving footage of critical
situations.&lt;/p&gt;
</summary><content type="html">&lt;p&gt;This is a review for the &lt;a class="reference external" href="http://www.amazon.com/KDLINKS-X1-Dashboard-Camcorder-G-Sensor/dp/B00N8BM0BU/"&gt;KDLinks X1&lt;/a&gt; dashcam, which I have used for a
couple of months now.&lt;/p&gt;
&lt;p&gt;Overall, the X1 is well designed. The picture quality is very good
(though not good enough to read license plates at night, but I
couldn't find any dashcam that can do that), and the support is very
responsive. The installation is relatively straightforward if the area
behind your rear-view mirror is sufficiently smooth so that you can
use the suction cup. If the area is textured, you'll have to first
glue some intermediate, smooth material on the windshield and attach
the suction cup to that.&lt;/p&gt;
&lt;p&gt;All these qualities have been discussed at length throughout the
internet, so I won't dwell on them much. Instead, I'd like to focus on
something that most of the reviews I've read seem to have missed
completely: how well the camera is actually capturing and saving
footage in critical situations.&lt;/p&gt;
&lt;p&gt;The X1 offers two ways to do this: automatically via an acceleration
sensor (which is supposed to trigger on collisions), or manually by
pressing a &amp;quot;lock&amp;quot; button. Both methods are supposed to ensure that the
last &lt;em&gt;n&lt;/em&gt; minutes (configurable to 1, 2 or 3 minutes) are not
overwritten by later recordings.&lt;/p&gt;
&lt;p&gt;Lacking a spare car to crash, I was not able to test the acceleration
sensor, so it's probably prudent to always use the manual button as
well when you want to save footage (just pressing the button is much
easier than checking if the sensor has triggered).&lt;/p&gt;
&lt;p&gt;The manual (which KDLinks deliberately does not put online for obscure
reasons) claims that pressing the &amp;quot;lock&amp;quot; button guarantees that the
last &lt;em&gt;n&lt;/em&gt; minutes of video are saved and will not be overwritten by
later footage. In my opinion, this must reliably work in any and all
situations, and is really the most important function of any
dashcam. Saving footage must be extremely simple and reliable, because
it is needed most when the user is likely to be pretty agitated and
not able to recall any complicated operating instructions.&lt;/p&gt;
&lt;p&gt;Unfortunately, the handling of the lock button is more complicated
than that. And even worse, none of what I'm describing here is
actually mentioned in the manual. In other words, if you rely on your
X1 working without having tested it several times, you might be in for
an unpleasant surprise when you actually need the footage. There are
basically three separate issues:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;Pressing the lock button only protects the footage if the
camera screen is active. So if you have turned on the screensaver
function (so that there isn't always a moving picture in the corner
of your vision), you actually need to press the lock button once,
wait until the screen turns on (about a second), and then press it
again to lock the footage (pressing it twice in quick succession
does not work).&lt;/li&gt;
&lt;li&gt;The camera saves footage in segments of a fixed length (1, 2, or 3
minutes). When there is no more space on the SD card, the oldest
non-protected segment is deleted. Unfortunately, this means that
when you press the lock button, the camera does not actually lock
the last 3 minutes of footage, but it locks the current segment. So
if the camera has just started a new segment, pressing the lock
button may only capture a few seconds of footage instead of the
last three minutes. Put differently, unless you press the lock
button at exactly 2:59, 5:59, 8:59 etc. minutes after turning on the
camera, you will always end up with less than 3 minutes of
protected video.&lt;/li&gt;
&lt;li&gt;The segment is not actually protected immediately when you press the
lock button, but only when the camera starts recording the next
segment. So if you press the lock button, but then turn off the
camera before the camera has started a new 3 minute interval, the
footage you're interested in will be overwritten the next time you
start the camera. I think this is especially bad, because it fails
in the typical situation where there's an accident, you press the
lock button, turn off the car to work things out, and eventually
turn on the car again to continue driving.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The support recommends that, instead of using the lock button, one
should turn off the camera and remove the SD card. This certainly
avoids all the above problems, but it raises the question of why there
is an acceleration sensor and manual trigger in the first place.&lt;/p&gt;
</content><category term="misc"></category><category term="Product Review"></category></entry><entry><title>On the Beauty of Python's ExitStack</title><link href="http://www.rath.org/on-the-beauty-of-pythons-exitstack.html" rel="alternate"></link><published>2015-12-28T00:00:00-08:00</published><updated>2015-12-28T00:00:00-08:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2015-12-28:/on-the-beauty-of-pythons-exitstack.html</id><summary type="html">&lt;p&gt;I believe Python's &lt;a class="reference external" href="http://docs.python.org/3/library/contextlib.html#contextlib.ExitStack"&gt;ExitStack&lt;/a&gt; feature does not get the recognition
it deserves. I think part of the reason for this is that its
documentation is somewhere deep down in the (already obscure)
&lt;a class="reference external" href="http://docs.python.org/3/library/contextlib.html"&gt;contextlib&lt;/a&gt; module because formally ExitStack is just one of many
available context managers for Python's &lt;a class="reference external" href="http://docs.python.org/3/reference/compound_stmts.html#the-with-statement"&gt;with statement&lt;/a&gt;. But
ExitStack deserves far more prominent notice than that. This post will
hopefully help with that.&lt;/p&gt;
&lt;p&gt;So what makes ExitStack so important? In short, it's the best way to
handle allocation …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I believe Python's &lt;a class="reference external" href="http://docs.python.org/3/library/contextlib.html#contextlib.ExitStack"&gt;ExitStack&lt;/a&gt; feature does not get the recognition
it deserves. I think part of the reason for this is that its
documentation is somewhere deep down in the (already obscure)
&lt;a class="reference external" href="http://docs.python.org/3/library/contextlib.html"&gt;contextlib&lt;/a&gt; module because formally ExitStack is just one of many
available context managers for Python's &lt;a class="reference external" href="http://docs.python.org/3/reference/compound_stmts.html#the-with-statement"&gt;with statement&lt;/a&gt;. But
ExitStack deserves far more prominent notice than that. This post will
hopefully help with that.&lt;/p&gt;
&lt;p&gt;So what makes ExitStack so important? In short, it's the best way to
handle allocation and release of external resources in Python.&lt;/p&gt;
&lt;div class="section" id="the-problem"&gt;
&lt;h2&gt;The Problem&lt;/h2&gt;
&lt;p&gt;The main challenge with external resources is that you have to release
them when you don't need them anymore -- and in particular you must
not forget to do so in all the alternate execution paths that may be
entered in case of error conditions.&lt;/p&gt;
&lt;p&gt;Most languages implement error conditions as &amp;quot;exceptions&amp;quot; that can be
&amp;quot;caught&amp;quot; and handled (Python, Java, C++), or as special return values
that you need to check to determine if an error occurred (C, Rust,
Go). Typically, code that needs to acquire and release external
resources then looks like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;res1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;acquire_resource_one&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# do stuff with res1&lt;/span&gt;
    &lt;span class="n"&gt;res2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;acquire_resource_two&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# do stuff with res1 and res2&lt;/span&gt;
    &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;release_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
   &lt;span class="n"&gt;release_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;or, if the language doesn't have exceptions:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="n"&gt;res1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;acquire_resource_one&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;retval&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;error_out1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="c1"&gt;// do stuff with res1&lt;/span&gt;
&lt;span class="n"&gt;res2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;acquire_resource_two&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;retval&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="k"&gt;goto&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;error_out2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="c1"&gt;// do stuff with res1 and res2&lt;/span&gt;
&lt;span class="n"&gt;retval&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c1"&gt;// ok&lt;/span&gt;

&lt;span class="nl"&gt;error_out2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;release_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nl"&gt;error_out1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="n"&gt;release_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;retval&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This approach has three big problems:&lt;/p&gt;
&lt;ol class="arabic simple"&gt;
&lt;li&gt;The cleanup code is far away from the allocation code.&lt;/li&gt;
&lt;li&gt;When the number of resources increases, indentation levels (or jump
labels) accumulate, making things hard to read.&lt;/li&gt;
&lt;li&gt;Managing a dynamic number of resources this way is impossible.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;In Python, some of these issues can be alleviated by using the
&lt;tt class="docutils literal"&gt;with&lt;/tt&gt; statement:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt; &lt;span class="nd"&gt;@contextlib&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;contextmanager&lt;/span&gt;
 &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;my_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id_&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
     &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;acquire_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;id_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
     &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
         &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;
     &lt;span class="k"&gt;finally&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
         &lt;span class="n"&gt;release_source&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;my_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RES_ONE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;res1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; \
   &lt;span class="n"&gt;my_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;RES_TWO&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;res2&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# do stuff with res1&lt;/span&gt;
    &lt;span class="c1"&gt;# do stuff with res1 and res2&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;However, this solution is far from optimal: you need to implement
resource-specific context managers (note that in the above example we
silently assumed that both resources can be acquired by the same
function), you can get rid of the extra indentation only if you acquire
all the resources at the same time and live with an ugly continuation
line (parentheses are not allowed in this context), and you still need
to know the number of required resources ahead of time.&lt;/p&gt;
&lt;p&gt;Over in the world of exception-less programming languages (no pun
intended), &lt;a class="reference external" href="http://www.golang.org/"&gt;Go&lt;/a&gt; has developed a different remedy: the &lt;a class="reference external" href="http://golang.org/ref/spec#Defer_statement"&gt;defer statement&lt;/a&gt;
defers execution of an expression until the enclosing
function returns. Using &lt;tt class="docutils literal"&gt;defer&lt;/tt&gt;, the above example can be written
as:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="nx"&gt;res1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;acquire_resource_one&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;defer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;release_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// do stuff with res1&lt;/span&gt;
&lt;span class="nx"&gt;res2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;acquire_resource_two&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;NULL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="k"&gt;defer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;release_resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;res2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;// do stuff with res1 and res2&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This is pretty nice: allocation and cleanup are kept close together,
no extra indentation or jump labels are required, and converting this
to a loop that dynamically acquires multiple resources would be
straightforward. But there are still some drawbacks:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;To control exactly when a group of resources is released, you have
to factor the code that accesses those resources out into separate
functions.&lt;/li&gt;
&lt;li&gt;You cannot &amp;quot;cancel&amp;quot; a deferred expression, so there is no way to
e.g. return a resource to the caller if no error occurred.&lt;/li&gt;
&lt;li&gt;There is no way to handle errors from the cleanup functions.&lt;/li&gt;
&lt;li&gt;&lt;tt class="docutils literal"&gt;defer&lt;/tt&gt; is available in Go, but not in Python.&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
&lt;div class="section" id="exitstack-to-the-rescue"&gt;
&lt;h2&gt;ExitStack to the rescue&lt;/h2&gt;
&lt;p&gt;&lt;a class="reference external" href="http://docs.python.org/3/library/contextlib.html#contextlib.ExitStack"&gt;ExitStack&lt;/a&gt; fixes all of the above issues, and adds some benefits on
top. An ExitStack is (as the name suggests) a stack of clean-up
functions. Adding a callback to the stack is the equivalent of
Go's &lt;tt class="docutils literal"&gt;defer&lt;/tt&gt; statement. However, clean-up functions are not executed
when the function returns, but when execution leaves the &lt;tt class="docutils literal"&gt;with&lt;/tt&gt;
block - and until then, the stack can also be emptied again.&lt;/p&gt;
&lt;p&gt;Finally, clean-up functions may themselves raise exceptions without
affecting execution of the other clean-up functions. Even if multiple
clean-ups raise exceptions, you will still get a usable stack trace.&lt;/p&gt;
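&lt;p&gt;A minimal sketch of this behavior (the callback names are made up
for illustration): even when every clean-up raises, all of them still
run, and the exceptions are chained so nothing is lost:&lt;/p&gt;

```python
import contextlib

log = []

def cleanup(name):
    # record that we ran, then fail
    log.append(name)
    raise RuntimeError(name)

try:
    with contextlib.ExitStack() as cm:
        cm.callback(cleanup, 'first')   # registered first, runs last
        cm.callback(cleanup, 'second')  # registered last, runs first
except RuntimeError as exc:
    # the outermost exception comes from 'first'; the one raised by
    # 'second' is preserved as its __context__
    assert exc.args == ('first',)
    assert exc.__context__.args == ('second',)

assert log == ['second', 'first']  # both clean-ups ran
```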
&lt;p&gt;Here's how to acquire multiple resources:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;ExitStack&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;res1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;acquire_resource_one&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;release_resource&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;res1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# do stuff with res1&lt;/span&gt;
    &lt;span class="n"&gt;res2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;acquire_resource_two&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;cm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;release_resource&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;res2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# do stuff with res1 and res2&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Note that&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;acquisition and release are close to each other,&lt;/li&gt;
&lt;li&gt;there's no extra indentation,&lt;/li&gt;
&lt;li&gt;the pattern easily scales up to many resources (including a
dynamic number that's acquired in a loop).&lt;/li&gt;
&lt;/ul&gt;
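&lt;p&gt;As a sketch of the last point (temporary files stand in for the
resources here), a loop can register one clean-up per acquisition, and
all of them run, in reverse order, when the with block exits:&lt;/p&gt;

```python
import contextlib
import os
import tempfile

paths = []
with contextlib.ExitStack() as cm:
    # acquire as many resources as we like, registering a
    # clean-up for each one right after acquiring it
    for _ in range(3):
        fd, path = tempfile.mkstemp()
        os.close(fd)
        cm.callback(os.unlink, path)
        paths.append(path)
    # all three files exist inside the with block
    assert all(os.path.exists(p) for p in paths)

# leaving the with block released everything
assert not any(os.path.exists(p) for p in paths)
```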
&lt;p&gt;If there already is a context manager for your resource, there's also
a shortcut method, &lt;tt class="docutils literal"&gt;enter_context&lt;/tt&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;ExitStack&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;res1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;enter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;first_file&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;r&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="c1"&gt;# do stuff with res1&lt;/span&gt;
    &lt;span class="n"&gt;res2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;cm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;enter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;second_file&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;r&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="c1"&gt;# do stuff with res1 and res2&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;To open a bunch of files and return them to the caller (without
leaking already opened files if a subsequent open fails):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;open_files&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;filelist&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;fhs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;ExitStack&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;cm&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;filelist&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;fhs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;enter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s1"&gt;&amp;#39;r&amp;#39;&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
        &lt;span class="n"&gt;cm&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pop_all&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;fhs&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
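&lt;p&gt;Here is a self-contained sketch of that pattern, with an
&lt;tt class="docutils literal"&gt;opener&lt;/tt&gt; parameter added purely so the
demonstration can observe which handles were opened: if a later open
fails, the ExitStack closes the earlier files before the exception
propagates; on success, &lt;tt class="docutils literal"&gt;pop_all()&lt;/tt&gt;
hands ownership to the caller and nothing is closed:&lt;/p&gt;

```python
import contextlib
import os
import tempfile

def open_files(filelist, opener=open):
    # opener is a hypothetical hook for the demo below
    fhs = []
    with contextlib.ExitStack() as cm:
        for name in filelist:
            fhs.append(cm.enter_context(opener(name, 'r')))
        cm.pop_all()  # success: don't close anything on exit
        return fhs

fd, good = tempfile.mkstemp()
os.close(fd)

opened = []
def tracking_open(name, mode):
    fh = open(name, mode)
    opened.append(fh)
    return fh

# second open fails, so the already-opened first file gets closed
try:
    open_files([good, good + '.missing'], opener=tracking_open)
except FileNotFoundError:
    pass
assert len(opened) == 1 and opened[0].closed  # no leak

# all opens succeed: files come back open, and the caller owns them
fhs = open_files([good])
assert not fhs[0].closed
fhs[0].close()
os.unlink(good)
```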
&lt;p&gt;Disclaimer: the &lt;a class="reference external" href="https://bugs.python.org/issue13585"&gt;original idea for ExitStack&lt;/a&gt; came from me.&lt;/p&gt;
&lt;/div&gt;
</content><category term="python"></category><category term="Python"></category><category term="Programming"></category></entry><entry><title>No vigor regeneration bug in Witcher 2 with FCR</title><link href="http://www.rath.org/no-vigor-regeneration-bug-in-witcher-2-with-fcr.html" rel="alternate"></link><published>2015-10-13T00:00:00-07:00</published><updated>2015-10-13T00:00:00-07:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2015-10-13:/no-vigor-regeneration-bug-in-witcher-2-with-fcr.html</id><summary type="html">&lt;p&gt;I recently stumbled over a nasty bug when playing The Witcher 2 with
FCR (the Full Combat Rebalance mod). The symptoms are that all of a
sudden your vigor is stuck at zero and does not re-generate
anymore. If you look at your character attributes, you will find that
vigor regeneration has indeed changed to zero both in and outside of
combat.&lt;/p&gt;
&lt;p&gt;There are other reports of this problem on the internet,
but surprisingly enough no one …&lt;/p&gt;</summary><content type="html">&lt;p&gt;I recently stumbled over a nasty bug when playing The Witcher 2 with
FCR (the Full Combat Rebalance mod). The symptoms are that all of a
sudden your vigor is stuck at zero and does not re-generate
anymore. If you look at your character attributes, you will find that
vigor regeneration has indeed changed to zero both in and outside of
combat.&lt;/p&gt;
&lt;p&gt;There are other reports of this problem on the internet,
but surprisingly enough no one seems to have found a solution other
than going back to the most-recent saved game that does not have the
problem.&lt;/p&gt;
&lt;p&gt;After some investigation, I believe I found out what is happening and
how to work around it.&lt;/p&gt;
&lt;p&gt;FCR changes the definition of the "block" action (triggered with &lt;em&gt;E&lt;/em&gt;)
to also invoke the Quen sign. While Quen is held, vigor re-generation
stops (otherwise you'd be able to stay in Quen forever and be
effectively invulnerable). Now, I think what happens is that sometimes
there are issues with this "double action" of the &lt;em&gt;E&lt;/em&gt; key that result
in the effects not being properly reset when &lt;em&gt;E&lt;/em&gt; is released. This
seems especially likely to happen if you press &lt;em&gt;E&lt;/em&gt; only briefly (as
opposed to holding it).&lt;/p&gt;
&lt;p&gt;So how can this be fixed? Well, theoretically it is enough to simply
enter blocking stance / cast Quen sign again by holding &lt;em&gt;E&lt;/em&gt;. When you
release it, your vigor regeneration should be back to normal.&lt;/p&gt;
&lt;p&gt;There is just one problem with this solution: typically, when you
notice the problem, you don't have any vigor, so you can't cast Quen
in the first place.&lt;/p&gt;
&lt;h2 id="solution-1"&gt;Solution 1&lt;/h2&gt;
&lt;p&gt;The first workaround that I found is to temporarily cheat to get your vigor
up again. The
&lt;a href="http://deviatted.com/threads/the-witcher-2-assassins-of-kings-trainer.6/"&gt;Deviatted Trainer&lt;/a&gt;
(unfortunately the site requires registration to download) works
nicely with the most-recent game version (at least as of today). Start
the trainer, start the game, press &lt;em&gt;F2&lt;/em&gt; to activate the trainer and
then &lt;em&gt;3&lt;/em&gt; on the numeric keypad to recover your vigor. Now you should
be able to cast Quen again by pressing &lt;em&gt;E&lt;/em&gt;, and afterwards your vigor
re-generation should be fixed. At this point you don't need the
trainer anymore.&lt;/p&gt;
&lt;h2 id="solution-2"&gt;Solution 2&lt;/h2&gt;
&lt;p&gt;If you're hesitant to run cheat programs from dubious internet sources
(as I would normally be, but I'm playing games on a throw-away Windows
installation), I believe there is a second solution. I did not test
this one myself, but I'm pretty sure it will work. As a bonus, this
solution should have the advantage of also making sure that the
problem will not occur again.&lt;/p&gt;
&lt;p&gt;The trick is to modify some game files such that vigor re-generation
while casting Quen is not completely turned off, but just very
slow. This will still not help you in combat, but if you now run into
the bug, you can simply wait until your vigor has regenerated and
enter blocking stance to reset the re-generation.&lt;/p&gt;
&lt;p&gt;To modify the game files, download the
&lt;a href="http://www.nexusmods.com/witcher2/mods/768/?"&gt;RED tools&lt;/a&gt; and use them
to extract the contents of your &lt;code&gt;CookedPC\z_fcr2_data.dzip&lt;/code&gt; file (in
the game installation directory) into a temporary directory. Now you
need to edit the &lt;code&gt;abilities/gerald_basic.xml&lt;/code&gt; file and look for the
definition of the &lt;code&gt;QuenBuff&lt;/code&gt; ability. It should look like this:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;ability&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;QuenBuff&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cm"&gt;&amp;lt;!-- Some lines removed --&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;endurance_combat_regen&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;mult=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;always_random=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;false&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;min=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;0&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;max=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;0&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;endurance_noncombat_regen&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;mult=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;always_random=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;false&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;min=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;0&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;max=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;0&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/ability&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;What this tells you is that while Quen is active, endurance (aka
vigor) re-generation is multiplied by 0. So if you change this to e.g.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nt"&gt;&amp;lt;ability&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;name=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;QuenBuff&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="cm"&gt;&amp;lt;!-- Some lines removed --&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;endurance_combat_regen&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;mult=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;always_random=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;false&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;min=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;0.1&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;max=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;0.1&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;lt;endurance_noncombat_regen&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;mult=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;true&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;always_random=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;false&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;min=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;0.1&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="na"&gt;max=&lt;/span&gt;&lt;span class="s"&gt;&amp;quot;0.1&amp;quot;&lt;/span&gt;&lt;span class="nt"&gt;/&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;/ability&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Vigor re-generation will merely be reduced to 10%. Once the change is
done, use the RED Tools again to pack all the files back into a
&lt;code&gt;.dzip&lt;/code&gt; file, and replace the &lt;code&gt;z_fcr2_data.dzip&lt;/code&gt; file with
it. This should fix the issue!&lt;/p&gt;</content><category term="games"></category><category term="Games"></category></entry><entry><title>Book Review: Zoo</title><link href="http://www.rath.org/book-review-zoo.html" rel="alternate"></link><published>2015-10-06T00:00:00-07:00</published><updated>2015-10-06T00:00:00-07:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2015-10-06:/book-review-zoo.html</id><summary type="html">&lt;p&gt;This is a review for &lt;em&gt;Zoo&lt;/em&gt;, book one of the &lt;em&gt;Enclosure Chronicles&lt;/em&gt;
written by Tara Elizabeth.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Zoo&lt;/em&gt; is written in the first person. The protagonist, Emma, tells the
story of how she dies and then wakes up in an "Anthropologic Center"
in the future. We quickly learn that at some point in the future, time
travel was invented but quickly banned because it caused all sorts of
problems. However, there is one exception: anthropologic centers are
allowed to travel back …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This is a review for &lt;em&gt;Zoo&lt;/em&gt;, book one of the &lt;em&gt;Enclosure Chronicles&lt;/em&gt;
written by Tara Elizabeth.&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Zoo&lt;/em&gt; is written in the first person. The protagonist, Emma, tells the
story of how she dies and then wakes up in an "Anthropologic Center"
in the future. We quickly learn that at some point in the future, time
travel was invented but quickly banned because it caused all sorts of
problems. However, there is one exception: anthropologic centers are
allowed to travel back in time and snatch people away at the instant
they're dying (because in this case the repercussions of the change
are minimal), transport them to the future and heal them using their
superior technology. As a side effect, this also strips them of any
rights they may have otherwise had as human beings. Emma is one of the
people who have been saved this way, and she now lives with another girl
in a glass cage that illustrates how people lived in some
indeterminate pre-industrial time. Interactions with the watching
visitors are not permitted. After a little while, they are joined by
two additional captives and instructed to procreate. However, neither
Emma nor her assigned partner are interested in that, so after
repeated warnings they get reassigned to a different center...&lt;/p&gt;
&lt;p&gt;Based on this description, I would have expected a novel that is
somewhere between disturbing, terrifying and depressing. However, the
book turns out entirely different.&lt;/p&gt;
&lt;p&gt;Despite the solemn theme, the perpetual helplessness of the protagonists,
and plenty of one-sided violence, the book reads more like a
description of your last stroll through the neighbourhood. Somehow the
struggle of the characters never appears truly serious and the deaths
barely register. I would say that this book is the best example of
"light reading" that I've seen.&lt;/p&gt;
&lt;p&gt;This is not to say that the story is bad or wholly unbelievable. The
book does deliver an entertaining read, but don't expect to be
captivated by it. I mostly kept reading out of modest curiosity about what
might happen next, not because I really cared about any of the
characters. For the same reason, I found myself not caring all that
much about some blatant plot holes (which normally bother me a lot).&lt;/p&gt;
&lt;p&gt;Verdict: read it when you're otherwise bored. Or don't. Whatever.&lt;/p&gt;</content><category term="reviews"></category><category term="SciFi"></category><category term="Books"></category><category term="Book Review"></category></entry><entry><title>Book Review: Exodus: Empires at War</title><link href="http://www.rath.org/book-review-exodus-empires-at-war.html" rel="alternate"></link><published>2015-08-08T00:00:00-07:00</published><updated>2015-08-08T00:00:00-07:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2015-08-08:/book-review-exodus-empires-at-war.html</id><summary type="html">&lt;p&gt;This is a review for the first book of the &lt;em&gt;Exodus&lt;/em&gt; series written by
Doug Dandridge.&lt;/p&gt;
&lt;p&gt;Empires at War comes across as an ambitious project. There is a huge
number of characters, locations, and plot lines. Normally this is
something that I like (for example, I think the &lt;em&gt;Night's Dawn&lt;/em&gt; trilogy
by Peter F. Hamilton is fantastic), but in this case I have the feeling
that too much has been compressed into a single book (of just 384
pages). Essentially, this …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This is a review for the first book of the &lt;em&gt;Exodus&lt;/em&gt; series written by
Doug Dandridge.&lt;/p&gt;
&lt;p&gt;Empires at War comes across as an ambitious project. There is a huge
number of characters, locations, and plot lines. Normally this is
something that I like (for example, I think the &lt;em&gt;Night's Dawn&lt;/em&gt; trilogy
by Peter F. Hamilton is fantastic), but in this case I have the feeling
that too much has been compressed into a single book (of just 384
pages). Essentially, this book feels like an introduction to the
universe in which the story is going to unfold in later books (which I
haven't yet read), but on its own this book has little to offer. There
are simply too many characters and locations to keep track of, and the
author switches between them too quickly (sometimes from one paragraph
to the next). Since the locations are light-years apart the book
requires you to keep in mind what information has made it to a given
location. I'm not yet sure if I like this as a general approach, but
in this specific case it did not work well: I simply wasn't
able to keep all this information in my head. Often, I had to deduce
from the behavior of the characters how far the plot had progressed
("ok, these guys don't feel nervous about an unknown ship approaching,
so they must be at a point where it is not yet known that humanity is
under attack").&lt;/p&gt;
&lt;p&gt;On the plus side, the individual characters are done well (to the
extent one can tell from the amount of space that they get), and the
plot promises to be interesting. But on the other hand, none of the
characters get a chance to really affect the plot and the plot does
not evolve at all. This book does not have an end, it simply stops at
a point where you'd normally expect a new chapter (or even a new
paragraph) to start.&lt;/p&gt;
&lt;p&gt;I believe most of the points above could have been easily avoided by
merging this book with the next one (or maybe two) in the series, and
delaying the introduction of some of the characters and locations to a
later point. However, this is just speculation: I haven't read the
other books yet.&lt;/p&gt;
&lt;p&gt;On a different note, the plot also contains some elements that I'd
normally not expect in a science fiction setting. For example, the
emperor's bloodline has the ability to see visions of the future in
their dreams. While there is nothing wrong with the idea per se, I
found it discordant in this setting.&lt;/p&gt;
&lt;p&gt;Another thing that really nagged me (though I don't hold it against
the book) is the author's frequently repeated assertion that "humanity
used to be technologically 2000 years behind [their main enemy], but they
caught up in just 500 years". This is so wrong that I cannot resist
pointing it out here. The very definition of being x years behind is
that it will take you x years to catch up -- if you do it faster, then
quite obviously you weren't behind that much to begin with.&lt;/p&gt;
&lt;p&gt;Verdict: as part of a series this book might be a good read if you
don't mind the mystical aspects. However, on its own (and it is sold
as such), I don't recommend it.&lt;/p&gt;</content><category term="reviews"></category><category term="SciFi"></category><category term="Books"></category><category term="Book Review"></category></entry><entry><title>How to find oldest tagged descendant of a given changeset</title><link href="http://www.rath.org/how-to-find-oldest-tagged-descendant-of-a-given-changeset.html" rel="alternate"></link><published>2015-07-30T00:00:00-07:00</published><updated>2015-07-30T00:00:00-07:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2015-07-30:/how-to-find-oldest-tagged-descendant-of-a-given-changeset.html</id><summary type="html">&lt;p&gt;I occasionally want to know at which version a specific change was first released.
Finding the commit that introduced the change is typically easy using &lt;code&gt;hg blame&lt;/code&gt;.
However, it always takes me several minutes to recall the correct syntax
for the command that finds the oldest tagged descendant of this changeset,
i.e. (assuming you are tagging releases) the first release that contains the
change. Here it is:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;hg log -r &amp;#39;first(sort(descendants(&amp;lt;commit-id&amp;gt;) and tag …&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</summary><content type="html">&lt;p&gt;I occasionally want to know at which version a specific change was first released.
Finding the commit that introduced the change is typically easy using &lt;code&gt;hg blame&lt;/code&gt;.
However, it always takes me several minutes to recall the correct syntax
for the command that finds the oldest tagged descendant of this changeset,
i.e. (assuming you are tagging releases) the first release that contains the
change. Here it is:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;hg log -r &amp;#39;first(sort(descendants(&amp;lt;commit-id&amp;gt;) and tag(&amp;quot;re:^release-&amp;quot;), &amp;quot;date&amp;quot;))&amp;#39;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;</content><category term="hg"></category><category term="Mercurial"></category></entry><entry><title>Book Review: The long way to a small, angry planet</title><link href="http://www.rath.org/book-review-the-long-way-to-a-small-angry-planet.html" rel="alternate"></link><published>2015-06-17T00:00:00-07:00</published><updated>2015-06-17T00:00:00-07:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2015-06-17:/book-review-the-long-way-to-a-small-angry-planet.html</id><summary type="html">&lt;p&gt;This is a review for &lt;em&gt;The long way to a small, angry planet&lt;/em&gt;, written
by Becky Chambers.&lt;/p&gt;
&lt;p&gt;This is a very interesting book. The overarching plot is not
particularly novel or intriguing, but the author does an amazing job
describing the different characters and the world they live
in. Normally, the lack of an intriguing story arc is something that I
very much dislike (e.g. I think that Kim Stanley Robinson's "2312" is just
boring), but I absolutely loved this …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This is a review for &lt;em&gt;The long way to a small, angry planet&lt;/em&gt;, written
by Becky Chambers.&lt;/p&gt;
&lt;p&gt;This is a very interesting book. The overarching plot is not
particularly novel or intriguing, but the author does an amazing job
describing the different characters and the world they live
in. Normally, the lack of an intriguing story arc is something that I
very much dislike (e.g. I think that Kim Stanley Robinson's "2312" is just
boring), but I absolutely loved this book. It is hard to pin down what
makes it special. There are no great visions or fascinating
technologies, no threat to the universe/galaxy/humanity, nor tense
battles. But there are a number of completely different characters
(both alien and human), with captivating back stories and unique views
that make them wonderfully distinct. One of the most memorable parts
is when Sissix (a reptilian alien) describes her species' way of having
and raising children and wonders why humans would ever consider the
loss of a child to be more tragic than the loss of an adult (I'm
deliberately not saying more here, go read the book!).&lt;/p&gt;
&lt;p&gt;Verdict: read it, right now.&lt;/p&gt;</content><category term="reviews"></category><category term="SciFi"></category><category term="Books"></category><category term="Book Review"></category></entry><entry><title>Book Review: Half-Life</title><link href="http://www.rath.org/book-review-half-life.html" rel="alternate"></link><published>2015-05-13T00:00:00-07:00</published><updated>2015-05-13T00:00:00-07:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2015-05-13:/book-review-half-life.html</id><summary type="html">&lt;p&gt;This is a review for the second book in the &lt;em&gt;Russell's Attic&lt;/em&gt; series
by SL Huang.&lt;/p&gt;
&lt;p&gt;I really liked the first book in this series (Zero-sum game), but
unfortunately the second one does not reach the quality of the
first. At the beginning, the story is very slow-paced and tedious, but
it picks up in the second half. However, at that point the overall
story arc also becomes rather predictable (in short, Cas builds
herself a little team), and the …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This is a review for the second book in the &lt;em&gt;Russell's Attic&lt;/em&gt; series
by SL Huang.&lt;/p&gt;
&lt;p&gt;I really liked the first book in this series (Zero-sum game), but
unfortunately the second one does not reach the quality of the
first. At the beginning, the story is very slow-paced and tedious, but
it picks up in the second half. However, at that point the overall
story arc also becomes rather predictable (in short, Cas builds
herself a little team), and the details of the plot do not make much
sense on their own (you often wonder why the protagonists do what they
do, and the only explanation seems to be that this is necessary to
support the overall story arc). Also, there is one resolution of a
sub-story (the Mafia stuff) that just does not make any sense at all.&lt;/p&gt;
&lt;p&gt;While I liked Cas' special abilities in the first book, here they've
become just a tad too surreal. For example, Cas is supposedly able to
plan a certain mission down to the second, days in advance (and
then worries when she's a few seconds behind the plan), when the very
same plan involves human guards that are patrolling the area (which
are certainly not going to conform to schedule with
second-accuracy). Often, it feels as if the author was desperately
trying to one-up Cas' abilities from the first book (which I felt was
really not necessary). Cas' sudden reluctance to kill just because
she's got some friends who don't like that (even though she's not
quite sure what friends actually are) is not plausible either.&lt;/p&gt;
&lt;p&gt;Verdict: meh.&lt;/p&gt;</content><category term="reviews"></category><category term="SciFi"></category><category term="Books"></category><category term="Book Review"></category></entry><entry><title>Book Review: Mastering C-Make</title><link href="http://www.rath.org/book-review-mastering-c-make.html" rel="alternate"></link><published>2015-04-29T00:00:00-07:00</published><updated>2015-04-29T00:00:00-07:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2015-04-29:/book-review-mastering-c-make.html</id><summary type="html">&lt;p&gt;This is a review for &lt;em&gt;Mastering C-Make&lt;/em&gt;, written by Ken Martin, and
published by Kitware.&lt;/p&gt;
&lt;p&gt;I bought this book in the hope of getting a structured introduction to
CMake. While the CMake online documentation appears to be
comprehensive, it seems mostly intended as a reference and is thus
rather difficult to use as a starting point, so I was hoping this book
would provide more guidance. Unfortunately this is not what the book
delivers.&lt;/p&gt;
&lt;p&gt;While it comes in with 685 …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This is a review for &lt;em&gt;Mastering C-Make&lt;/em&gt;, written by Ken Martin, and
published by Kitware.&lt;/p&gt;
&lt;p&gt;I bought this book in the hope of getting a structured introduction to
CMake. While the CMake online documentation appears to be
comprehensive, it seems mostly intended as a reference and is thus
rather difficult to use as a starting point, so I was hoping this book
would provide more guidance. Unfortunately this is not what the book
delivers.&lt;/p&gt;
&lt;p&gt;While it comes in with 685 pages, the appendix starts at page 235 and
from there on all you get is a printed version of the CMake manpages.&lt;/p&gt;
&lt;p&gt;Furthermore, even the first 235 pages are of low quality in almost every
aspect. The illustrations are badly prepared, mostly pixelated, and
too big in comparison to the text. The text has not been proofread
properly ("CMake is no longer case insensitive, so where you see
COMMAND you could use command"). The contents are not structured nor
particularly suited for learning CMake (why would I care about the
class hierarchy of the CMake implementation?). Not even the syntax of
the CMakeLists file is explained completely, e.g. the difference
between whitespace and semicolons in variable definitions is unclear,
and there seems to be a rather fuzzy distinction between "expanding"
and "replacing" variables.&lt;/p&gt;
&lt;p&gt;If you're looking for a printed copy of the online documentation, this
book may be what you're looking for. Otherwise skip it.&lt;/p&gt;</content><category term="reviews"></category><category term="Textbooks"></category><category term="Book Review"></category></entry><entry><title>Book Review: Invincible</title><link href="http://www.rath.org/book-review-invincible.html" rel="alternate"></link><published>2015-03-29T00:00:00-07:00</published><updated>2015-03-29T00:00:00-07:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2015-03-29:/book-review-invincible.html</id><summary type="html">&lt;p&gt;This is a review for the second book in the &lt;em&gt;The Lost Fleet: Beyond
the Frontier&lt;/em&gt; series by Jack Campbell.&lt;/p&gt;
&lt;p&gt;Here is a best-of of the first ~20% of the book. I don't think there
are any significant spoilers.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The main protagonist has the rank of admiral, his wife is captain on
  his flagship. They have to be careful not to stand too close to each
  other when looking at a screen (it would be inappropriate) and they
  must not …&lt;/p&gt;&lt;/li&gt;&lt;/ul&gt;</summary><content type="html">&lt;p&gt;This is a review for the second book in the &lt;em&gt;The Lost Fleet: Beyond
the Frontier&lt;/em&gt; series by Jack Campbell.&lt;/p&gt;
&lt;p&gt;Here is a best-of of the first ~20% of the book. I don't think there
are any significant spoilers.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;The main protagonist has the rank of admiral, his wife is captain on
  his flagship. They have to be careful not to stand too close to each
  other when looking at a screen (it would be inappropriate) and they
  must not visit each other in their private quarters (yes, neither
  on- nor off-duty). Discussing this is, for some reason, a recurring
  theme.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;The protagonist (supposedly a strategic genius) implements his
  enemy-crushing strategies by giving commands like "Immediate
  execute, all units come up one nine zeros degrees". Because ships
  generally move at ~10% of the speed of light, engagements only last for
  fractions of a second.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;He is also amazed by the ability of his fleet to make a 90 degree
  turn without breaking formation. Yes, that is not a joke. Yes, they
  have computers.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Another repeating theme is that the protagonist loudly tells
  something to his wife when they're on the ship's bridge so that
  everyone can overhear it. This is his strategy to disseminate
  information that (for some reason) he cannot just tell the crew
  directly.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;After discovering that a new alien species looks sort of like a
  mixture of bears and cows, they are baptised as
  "bear-cows". Furthermore, from the fact that the bear-cows don't
  have incisors, the main characters immediately deduce that they must
  be herbivores (leaving aside for the moment that an alien biology may
  not even have a herbivore/carnivore distinction), and thus that
  they must be out to crush humanity because they think humans (as
  omnivores) will otherwise eat them.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;At some point, the book talks about arranging ships in a Fourier
  series as an example of a particularly beautiful formation. Yes,
  they really mean the mathematical thing, it comes right after
  discussing a "Mandelbrot fractal formation".&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Another science facepalm: because the ships move with appreciable
  fractions of the speed of light, the ships' targeting systems supposedly
  have to cope with "the differences between how the ship saw the
  universe and how that universe actually was" (yes, literal
  quote). As a result of that, ships cannot fight when their relative
  velocity is larger than 0.2c (or some number like that).&lt;/p&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you aren't bothered by stuff like this, it might be an enjoyable
read. For me it was not.&lt;/p&gt;
&lt;p&gt;Verdict: skip it.&lt;/p&gt;</content><category term="reviews"></category><category term="SciFi"></category><category term="Books"></category><category term="Book Review"></category></entry><entry><title>Book Review: The Mistborn Trilogy</title><link href="http://www.rath.org/book-review-the-mistborn-triology.html" rel="alternate"></link><published>2014-03-15T00:00:00-07:00</published><updated>2014-03-15T00:00:00-07:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2014-03-15:/book-review-the-mistborn-triology.html</id><summary type="html">&lt;p&gt;This is a review for the Mistborn series from Brandon Sanderson,
covering &lt;em&gt;Mistborn&lt;/em&gt;, &lt;em&gt;The Well of Ascension&lt;/em&gt;, and &lt;em&gt;The Hero of
Ages&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;I'm the kind of person who gets annoyed quickly by even small
inconsistencies in plot, world setup, and character behavior. Yet I
found this book thoroughly enjoyable. There is a fantastic big story
arc spanning all three books, yet every single book also stands
well on its own. The way magic works in the Mistborn trilogy …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This is a review for the Mistborn series from Brandon Sanderson,
covering &lt;em&gt;Mistborn&lt;/em&gt;, &lt;em&gt;The Well of Ascension&lt;/em&gt;, and &lt;em&gt;The Hero of
Ages&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;I'm the kind of person who gets annoyed quickly by even small
inconsistencies in plot, world setup, and character behavior. Yet I
found this book thoroughly enjoyable. There is a fantastic big story
arc spanning all three books, yet every single book also stands
well on its own. The way magic works in the Mistborn trilogy is
clearly the best approach I have seen so far. There are only two
points of criticism I can make. Firstly, in the middle part of the third
book everything consistently looks so desperate and depressing that at
times it actually becomes a bit of a chore to keep reading - however,
the fantastic ending makes it well worth it. Secondly, there is one
small inconsistency in the story that I noticed. It's a bit difficult
to describe it spoiler free, but I'll try: at the very end of the
trilogy, a character taps some power that was previously unavailable
to him/her. However, there seem to have been a few occasions much
earlier where some other character could (as far as I can tell) easily
have facilitated this - thereby avoiding a lot of suffering and
death. Why didn't that other guy do that? It seemed to be in his/her
best interest as well as within his/her capabilities.&lt;/p&gt;
&lt;p&gt;Verdict: read it. now.&lt;/p&gt;</content><category term="reviews"></category><category term="Fantasy"></category><category term="Books"></category><category term="Book Review"></category></entry><entry><title>The RackSpace CloudFiles support is terrible and incompetent</title><link href="http://www.rath.org/the-rackspace-cloudfiles-support-is-terrible-and-incompetent.html" rel="alternate"></link><published>2014-02-02T00:00:00-08:00</published><updated>2014-02-02T00:00:00-08:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2014-02-02:/the-rackspace-cloudfiles-support-is-terrible-and-incompetent.html</id><summary type="html">&lt;p&gt;Rackspace is rightfully praised for its use and development of open
source technologies like OpenStack. However, I have had some rather
shocking experiences when contacting their technical support. What I
have seen was in fact sufficiently bad for me to feel the need to
document it here as a warning for others. The bottom line is, all
support agents I was in contact with had minimal expertise (which is
disappointing, but not that unusual), but (and this is the truly …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Rackspace is rightfully praised for its use and development of open
source technologies like OpenStack. However, I have had some rather
shocking experiences when contacting their technical support. What I
have seen was in fact sufficiently bad for me to feel the need to
document it here as a warning for others. The bottom line is, all
support agents I was in contact with had minimal expertise (which is
disappointing, but not that unusual), but (and this is the truly
shocking aspect of the matter), did not hesitate to blindly make up
stuff to conceal their ignorance. In other words, my advice to every
Rackspace customer is to never take anything from the Rackspace
support at face value -- if you cannot independently confirm it, it's
probably made up.&lt;/p&gt;
&lt;p&gt;That said, here's the best of my interactions.&lt;/p&gt;
&lt;h2 id="chunked-uploads"&gt;Chunked Uploads&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;Please wait while we find an agent to assist you...&lt;br/&gt;
Welcome to The Rackspace Cloud. My name is John M., may I have your name and email address in case we get disconnected?&lt;br/&gt;
Customer:  Hi, I'm Nikolaus &lt;a href="&amp;#109;&amp;#97;&amp;#105;&amp;#108;&amp;#116;&amp;#111;&amp;#58;&amp;#78;&amp;#105;&amp;#107;&amp;#111;&amp;#108;&amp;#97;&amp;#117;&amp;#115;&amp;#64;&amp;#114;&amp;#97;&amp;#116;&amp;#104;&amp;#46;&amp;#111;&amp;#114;&amp;#103;"&gt;&amp;#78;&amp;#105;&amp;#107;&amp;#111;&amp;#108;&amp;#97;&amp;#117;&amp;#115;&amp;#64;&amp;#114;&amp;#97;&amp;#116;&amp;#104;&amp;#46;&amp;#111;&amp;#114;&amp;#103;&lt;/a&gt;&lt;br/&gt;
Customer:  Does your cloud hosting service support chunked object uploads?&lt;br/&gt;
John M.:  Hi Nikolaus!&lt;br/&gt;
John M.:  Yes&lt;br/&gt;
Customer:  And what kind of consistency do you provide?&lt;br/&gt;
John M.:  WHy are you looking to upload chunked objects?&lt;br/&gt;
Customer:  To stream data without local caching&lt;br/&gt;
John M.:  ok great!&lt;br/&gt;
John M.:  Cloud files is a storage offering, primarily used for content delivery. we have several customers that use it for the delivery of media files (just an example) since it has the ability to sync with Akamai's CDN. the advantage to you is that files delivered via the CDN will be quick, regardless of your end user's geographic location. this is all because the CDN has nodes scattered throughout the world.&lt;br/&gt;
Customer:  That looks suspiciously like a rather inappropriate standard text block.&lt;br/&gt;
Thank you for contacting the Rackspace Cloud.&lt;br/&gt;
Your session has ended. You may now close this window.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;Yes, this chat session was not closed by me but by John M.&lt;/p&gt;
&lt;h2 id="data-durability"&gt;Data Durability&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;Welcome to Rackspace. How can I assist you?&lt;br/&gt;
You have been connected to Monica.&lt;br/&gt;
Monica:  Welcome to Rackspace, may I help you with a hosting solution today?&lt;br/&gt;
Customer:  Hi&lt;br/&gt;
Customer:  I have a question about Cloud Files&lt;br/&gt;
Customer:  What happens if one of my storage object gets lost by the server?&lt;br/&gt;
Monica:  Sure :)&lt;br/&gt;
Customer:  Will I get notified? Will the object name still be returned by list requests?&lt;br/&gt;
Customer:  What will be the error code when I try to retrieve the object?&lt;br/&gt;
Customer:  I find the Cloud Files API Developer guide a bit lacking in detail...&lt;br/&gt;
Monica:  We offer persistent storage which is actually stored on a physical hard drive.. so you will not lose your information if it happens to go down&lt;br/&gt;
Customer:  Yeah, but what if the hard drive crashes?&lt;br/&gt;
Customer:  S3 offers two classes of durability guarantees, do you have something similar?&lt;br/&gt;
Monica:  It is all RAID 10&lt;br/&gt;
Monica:  so it is replicated 3 times&lt;br/&gt;
Customer:  So what average durability do you guarantee?&lt;br/&gt;
Customer:  And what happens when an object is lost?&lt;br/&gt;
The agent is sending you to http://www.rackspace.com/cloud/legal/sla/.&lt;br/&gt;
Monica:  This is our Cloud files SLA&lt;br/&gt;
Customer:  Yes, but it does not say anything about durability, just availability&lt;br/&gt;
Monica:  Unfortunately we do not have anything on our site about the durability&lt;br/&gt;
Customer:  So you do not make any guarantees?&lt;br/&gt;
Monica:  Because we offer persistent storage on RAID 10 servers, objects should not be lost&lt;br/&gt;
Customer:  Can you nevertheless tell me how the system will handle requests for a lost or damaged object?&lt;br/&gt;
Customer:  [ I'm pretty sure that Amazon knows about RAID as well, yet they guarantee either 99.99% or 99.999999999% durability ]&lt;br/&gt;
Monica:  The only way you were to lose an object is if you were to delete it&lt;br/&gt;
Monica:  Our Cloud files is our most secure data storage medium&lt;br/&gt;
Customer:  There is no such thing as absolute security.&lt;br/&gt;
Monica:  We also guarantee 99.999999% on the durability&lt;br/&gt;
Customer:  Could you point me to the paragraph where that is specified?&lt;br/&gt;
Monica:  What is it that you are looking to store in our Cloud files?&lt;br/&gt;
Customer:  I don't store data myself, I'm providing an online storage file system, and I'm interested in adding a CloudFiles backend.&lt;br/&gt;
The agent is sending you to http://www.rackspace.com/cloud/cloud_hosting_products/files/support/.&lt;br/&gt;
The agent is sending you to http://www.rackspace.com/cloud/cloud_hosting_products/files/compare/.&lt;br/&gt;
Customer:  I don't think this conversation is going anywhere. Nevertheless, I appreciate your efforts. I know you have to work with the information you've got.&lt;br/&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;What bothers me here is not that Rackspace doesn't offer durability
guarantees, or that Monica does not know how their system is going to
react to a missing object. The problem is that she completely ignores
a question, maintains that data is absolutely secure because it's stored
on a RAID-10 array, and points me to several unrelated webpages,
before finally pulling a number out of thin air that cannot be found
anywhere else.&lt;/p&gt;
&lt;p&gt;Either Rackspace is dangerously overconfident in the reliability of
its systems, and surprisingly unfamiliar with the behaviour of its own
software, or they are deliberately refusing to give all the
information needed to build reliable software on top of their storage
service and misleading their customers about the security and
durability of their data.&lt;/p&gt;
&lt;p&gt;In either case, it's not a pretty picture. Their customer support
obviously goes as far as giving wrong pieces of information without
even flinching. If they now come up with some information on how the
system would respond to a missing object after all, how do you know
this is accurate and not just made up as well? You can't test it.&lt;/p&gt;
&lt;h2 id="http-headers"&gt;HTTP Headers&lt;/h2&gt;
&lt;blockquote&gt;
&lt;p&gt;11:19:58 PM [Tim R] Hello! Thank you for contacting Rackspace Chat support! My name is Tim R,  How can I help you today?&lt;br/&gt;
11:20:03 PM [Nikolaus] Hi Tim&lt;br/&gt;
11:20:12 PM [Nikolaus] I'm using the REST API to access CloudFiles&lt;br/&gt;
11:20:38 PM [Nikolaus] I was wondering if it's possible that CloudFiles is changing the case of metadata keys&lt;br/&gt;
11:20:57 PM [Nikolaus] I have stored an object with a metadata key of "foo"&lt;br/&gt;
11:21:17 PM [Nikolaus] but when I retrieve it, the HTTP header says "X-Object-Meta-Foo: value"&lt;br/&gt;
11:21:19 PM [Nikolaus] Note the capital F&lt;br/&gt;
11:22:11 PM [Tim R] Give me a moment&lt;br/&gt;
11:24:21 PM [Tim R] What is the call that you are running?&lt;br/&gt;
11:24:33 PM [Nikolaus] What do you mean with "call"?&lt;br/&gt;
11:24:39 PM [Nikolaus] I'm sending a HTTP GET&lt;br/&gt;
11:24:46 PM [Tim R] The full command&lt;br/&gt;
11:25:07 PM [Nikolaus] one moment&lt;br/&gt;
11:28:44 PM [Nikolaus] GET /v1/MossoCloudFS_e0f1f364-b0c0-4062-ad38-ca87466ee99f/nikratio-test/?marker=&amp;amp;limit=5000&amp;amp;format=json&amp;amp;prefix=s3ql_test%2F-s3ql_test_1370143692 ({'content-length': '0', 'connection': 'keep-alive', 'X-Auth-Token': '93a4bdd7d83f4404862308bf407ca959'})&lt;br/&gt;
11:29:04 PM [Nikolaus] wops, wrong line&lt;br/&gt;
11:29:10 PM [Nikolaus] GET /v1/MossoCloudFS_e0f1f364-b0c0-4062-ad38-ca87466ee99f/nikratio-test/s3ql_test/-s3ql_test_1370143692s3ql_%3D/&lt;em&gt;1 ({'content-length': '0', 'connection': 'keep-alive', 'X-Auth-Token': '93a4bdd7d83f4404862308bf407ca959'})&lt;br/&gt;
11:29:36 PM [Nikolaus] the stuff in {...} are the request headers&lt;br/&gt;
11:30:30 PM [Tim R] What is the query you are running to set the metadata keys?&lt;br/&gt;
11:30:46 PM [Nikolaus] PUT /v1/MossoCloudFS_e0f1f364-b0c0-4062-ad38-ca87466ee99f/nikratio-test/s3ql_test/-s3ql_test_1370143692s3ql&lt;/em&gt;%3D/_1 ({'Content-Type': 'application/octet-stream', 'Content-Length': 9, 'connection': 'keep-alive', 'X-Object-Meta-jimmy': 'jups@42', 'X-Auth-Token': '93a4bdd7d83f4404862308bf407ca959'})&lt;br/&gt;
11:31:00 PM [Nikolaus] in this example, jimmy becomes Jimmy&lt;br/&gt;
11:33:08 PM [Tim R] the keys all have to have Caps when separating naming schemes by a delimiter (-)&lt;br/&gt;
11:33:48 PM [Nikolaus] Could you point me to a spec with details?&lt;br/&gt;
11:34:02 PM [Nikolaus] I don't think that's from RFC 2616&lt;br/&gt;
11:35:59 PM [Tim R] Give me a few moments&lt;br/&gt;
11:36:15 PM [Nikolaus] http://docs.rackspace.com/files/api/v1/cf-devguide/content/Create_Update_Object-d1e1965.html just says "The object can be created with custom metadata through HTTP headers identified with the X-Object-Meta- prefix."&lt;br/&gt;
11:41:29 PM [Tim R] Basically, think of the "X-Object-Meta-Jimmy" as the key, and the value is "jups@42" The data is maintained the same way as arrays, but all you need to know: the thing to the left of the ":" is the key&lt;br/&gt;
11:41:48 PM [Nikolaus] aeh, yes, that much is pretty obvious :-)&lt;br/&gt;
11:42:05 PM [Tim R] I'm trying to find some documentation on this.&lt;br/&gt;
11:42:06 PM [Nikolaus] My question is: why is RackSpace changing the case of the keys?&lt;br/&gt;
11:42:12 PM [Nikolaus] and in what way?&lt;br/&gt;
11:42:54 PM [Nikolaus] This silent mangling that appears to be done is making me a bit nervous&lt;br/&gt;
11:46:00 PM [Tim R] HTTP 1/1 header standars say that headers that have naming schemes separated by a delimiter will have each segment capatalized&lt;br/&gt;
11:47:06 PM [Nikolaus] Could you be more precise? I can't find that in http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2&lt;br/&gt;
11:49:12 PM [Tim R] Give me a few minutes, I'm trying to find the specific documentation that states this&lt;br/&gt;
11:52:11 PM [Tim R] http://tools.ietf.org/html/rfc822&lt;br/&gt;
11:58:29 PM [Nikolaus] I don't see any specific prescriptions for the field name in there either...&lt;br/&gt;
12:01:16 AM [Tim R] So RFC822 is about messages, but it sets the standards for headers, only 2 headers are non-caps after the delimination, Reply-cc and Reply-bcc to be exact. and they have since made "X-" headers obsolete in draft http://tools.ietf.org/html/draft-saintandre-xdash-00, the draft was submitted in 2011, and has not been converted into RFC yet, though&lt;br/&gt;
12:02:42 AM [Nikolaus] Could you be more precise? I don't see anything about delimination of field names in RFC822&lt;br/&gt;
12:03:08 AM [Nikolaus] http://tools.ietf.org/html/rfc822#section-3.2&lt;br/&gt;
12:03:40 AM [Nikolaus] All it says is that a field name is 1*&lt;any CHAR, excluding CTLs, SPACE, and ":"&gt;&lt;br/&gt;
12:05:31 AM [Tim R] I'm deeply sorry, but the RFC is less than clear, but it did set the standard, and that only cc and bcc are allowed to be non-caps after a delimination&lt;br/&gt;
12:05:32 AM [Nikolaus] RFC 2616 adds that "Field names are case-insensitive", but there is nothing about delimiters, or specific capitalization rules&lt;br/&gt;
12:06:05 AM [Nikolaus] According to the RFCs, there is no such thing as a delimination of field headers.&lt;br/&gt;
12:06:10 AM [Nikolaus] Where did you get this concept from?&lt;br/&gt;
12:06:20 AM [Nikolaus] This isn't unclear, it simply does not exist.&lt;br/&gt;
12:13:27 AM [Nikolaus] Tim?&lt;br/&gt;
12:14:01 AM [Tim R] Give me a moment&lt;br/&gt;
12:20:55 AM [Tim R] "X-Special-action:  This is a sample of user-defined field-names.  There could also be a field-name "Special-action", but its name might later be preempted" basically, whenever you have a lower-case after the "-" it can be preempted, due to RFC822 A.3.3.&lt;br/&gt;
12:22:32 AM [Nikolaus] Any X- header may be preempted, this has nothing to do with the existance of a dash.&lt;br/&gt;
12:23:26 AM [Nikolaus] And preemption of a header is unrelated to its case anyway&lt;br/&gt;
12:23:35 AM [Nikolaus] I don't think we're getting anywhere here&lt;br/&gt;
12:23:50 AM [Tim R] Alright, I would recommend submitting a ticket about this&lt;br/&gt;
12:23:54 AM [Nikolaus] Could you maybe just escalate this issue?&lt;br/&gt;
12:24:29 AM [Nikolaus] Ok, thank you for your efforts.&lt;br/&gt;
12:24:32 AM [Tim R] I was actually getting this information from my escalation admin, so he said it is best to submit a ticket for this&lt;br/&gt;
12:24:34 AM [Tim R] You're welcome&lt;br/&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;(I am aware that RFC2616 says that HTTP headers are
case-insensitive. However, I believe it is nevertheless reasonable to
ask why Rackspace is not &lt;em&gt;preserving&lt;/em&gt; the case, especially since we
are talking about user-defined metadata keys).
&lt;p&gt;Obviously Tim has no idea what he is talking about, but instead of
just admitting that, he quotes random standards and adds a couple of
extra "facts" to make them seem relevant to the question.&lt;/p&gt;</content><category term="rants"></category><category term="Rant"></category></entry><entry><title>Book Review: The Mark of Koban</title><link href="http://www.rath.org/book-review-the-mark-of-coban.html" rel="alternate"></link><published>2013-07-09T00:00:00-07:00</published><updated>2013-07-09T00:00:00-07:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2013-07-09:/book-review-the-mark-of-coban.html</id><summary type="html">&lt;p&gt;This is a review for the second book in the &lt;em&gt;Koban&lt;/em&gt; series by Stephen
W. Bennett.&lt;/p&gt;
&lt;p&gt;Apart from a few incongruities I liked the first book of this series
("Koban") a lot and was looking forward to the sequel. Unfortunately
"The Mark of Koban" fell short of my expectations. Instead of
improving upon the few things that didn't feel quite right in the
first book, the sequel feels a lot less thought-through.&lt;/p&gt;
&lt;p&gt;The story can easily be summarized as follows …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This is a review for the second book in the &lt;em&gt;Koban&lt;/em&gt; series by Stephen
W. Bennett.&lt;/p&gt;
&lt;p&gt;Apart from a few incongruities I liked the first book of this series
("Koban") a lot and was looking forward to the sequel. Unfortunately
"The Mark of Koban" fell short of my expectations. Instead of
improving upon the few things that didn't feel quite right in the
first book, the sequel feels a lot less thought-through.&lt;/p&gt;
&lt;p&gt;The story can easily be summarized as follows (mild spoiler alert for
this paragraph): the Krall are attacking humanity, and humanity
responds by reintroducing ground armies, but they don't do very
well. The Koban humans improve themselves by genetic engineering to
such an extent that they really aren't in danger from anything. The
can single handedly beat Krall warriors, have rippers as pets, and
telepathy (at first just on contact, but later they discover that it
also works over astronomic distances through tachyon space). In the
end they capture three Krall ships and go after the Krall's supply
lines. This is where the book ends.&lt;/p&gt;
&lt;p&gt;As you may have already concluded from the above summary, the book is
far too long for how little actually happens. One reason
for this is the lengthy description of the Krall raids on different
human planets. Unfortunately, both the planets and the characters
attacking and defending them are only introduced for this sole
purpose. There doesn't seem to be any connection to the main story
line, and as a result I quickly stopped caring about both characters
and outcomes. If the goal was to give some indication of the Krall's
progress, this could have easily been achieved by cutting all the
planetary assault scenes to just a few pages.&lt;/p&gt;
&lt;p&gt;Another problem with the book is that many scenes seem completely
artificial. There are several parts where characters begin a
conversation by "reviewing" the events of the last x years. In another
scene, two couples simultaneously discover that they can read each
other's minds when they both touch a ripper's frill, and the first
thing they say to each other is that they accept the marriage proposal
and are willing to do some nasty act they were thinking of. And
that's it. After that, telepathy is pretty much an established thing
for the rest of the book. In a third scene, the geneticist giving
someone an injection starts by "reviewing" the terminology that has
been established for people with different levels of genetic
enhancements: SGs, SG1s, TGs, TG1s, and so on. That terminology is then
actually used throughout the rest of the book by people in
conversations ("My son is a TG, but he wants to become a TG1"). To me
this seemed as if the author was just using some placeholders with the
intention of revisiting them later.&lt;/p&gt;
&lt;p&gt;So, to summarize: very little is actually happening in Mark of Koban,
and the book fails to build up any tension (either because the reader
doesn't care about the characters, or because the characters just
don't have anything to fear). Many scenes are implausible and pull you
out of the story. Overall, my impression is that this book was
published too early. It could have been a lot better if the author
had spent some more time on it (as he demonstrated with the prequel).&lt;/p&gt;
&lt;p&gt;Verdict: skip it.&lt;/p&gt;</content><category term="reviews"></category><category term="SciFi"></category><category term="Books"></category><category term="Book Review"></category></entry><entry><title>Book Review: Koban</title><link href="http://www.rath.org/book-review-koban.html" rel="alternate"></link><published>2013-06-30T00:00:00-07:00</published><updated>2013-06-30T00:00:00-07:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2013-06-30:/book-review-koban.html</id><summary type="html">&lt;p&gt;This is a review for the first book in the &lt;em&gt;Koban&lt;/em&gt; series by
Stephen Bennett.&lt;/p&gt;
&lt;p&gt;This review may sound rather critical, so let me begin by stating that
overall, I enjoyed the book a lot.  However, there were three issues
that disappointed/annoyed me disproportionately. So as you read
on, keep in mind that these are really the only
negative points that I noticed, i.e. in every other aspect the book is
great.&lt;/p&gt;
&lt;p&gt;The first issue that really …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This is a review for the first book in the &lt;em&gt;Koban&lt;/em&gt; series by
Stephen Bennett.&lt;/p&gt;
&lt;p&gt;This review may sound rather critical, so let me begin by stating that
overall, I enjoyed the book a lot.  However, there were three issues
that disappointed/annoyed me disproportionately. So as you read
on, keep in mind that these are really the only
negative points that I noticed, i.e. in every other aspect the book is
great.&lt;/p&gt;
&lt;p&gt;The first issue that really bugged me concerns the grand story arc. The
entire story is based on the idea that the Krall are doing "combat
tests" with some captured humans on an isolated world to determine if
it's worth fighting a good man-to-Krall fight with all of humanity,
or if they should just eradicate all of humanity from space using
their superior technology. Unfortunately, there doesn't seem to be any
need for this separate combat testing in the first place. Why don't
the Krall simply attack humanity? If the fight becomes too boring,
they can always fall back on their technological advantage
anyway. There truly seems to be no reason at all to first test a few
humans, if the eventual goal is war on humanity in any case.&lt;/p&gt;
&lt;p&gt;Another thing I just couldn't wrap my mind around is the intelligence
(or lack thereof) of the humans' AI. All in all, this is definitely a
pretty clever AI. It can understand and respond to pretty much any
sentence (when addressed directly), is able to learn new languages on
its own initiative, and is clearly also able to adapt to unforeseen
situations and make independent inferences. However, the very same AI
is described as having serious trouble figuring out when it'd be
appropriate to answer a question or provide information. This leads
the main character to come up with some complicated scheme for asking
questions without the Krall noticing that just doesn't make any sense
at all. Given the other abilities of the AI, it just doesn't seem
reasonable that it wouldn't be able to figure out when it is expected
to respond.&lt;/p&gt;
&lt;p&gt;The third point I'd like to make may seem a bit unfair, but I think
that in parts the author was actually being too ambitious for his own
good. For example, one of the main ideas is that the dynamics between
men and women have changed fundamentally after some event in the book's
past. This idea certainly has a lot of potential, and I would have
loved to see how it affects the story and the interaction of the
characters. This is also well done in the first chapter or so, but
after that (when the humans arrive on Koban) one almost gets the
impression that the author got tired of having to think about this,
because suddenly all that's left of an entirely different social
context is a different salutation and different surnames. Given the
potential of the idea, this is rather disappointing, and I think the
book would actually have been better if it hadn't introduced the idea at
all rather than introducing it for a little while and then discarding
it entirely. In another example, it seems that the author could not
quite decide if the Koban animals are sentient or not. I believe the
goal was probably to tell the reader that they are (a little bit?)
sentient, while at the same time making it clear that the characters
are not aware of that. The result, however, is a slightly irritating
dissonance when the book jumps from one viewpoint to the other.&lt;/p&gt;
&lt;p&gt;While not related to the quality of the book, I also want to briefly
comment on the book's own description (as it's used on the back
cover or the product page of your favorite ebook store): this
description is completely misleading. While there is some talk about
genetic modification, it's mostly about two guys slightly enhancing
their strength and endurance towards the end. The danger of losing
their humanity by carrying this further is expressed in a few
sentences in a conversation at the very end of the book, and the
phrase about the bio-scientist producing "better, smarter fighters" is
just plain wrong.  The rippers (I suppose this is what the big cat is
supposed to be) also feature in just one scene that leaves the reader
with a lot of open questions. My best guess is that the cover is meant
to describe the trilogy rather than this book (I haven't read the
other books yet though).&lt;/p&gt;
&lt;p&gt;Verdict: try it.&lt;/p&gt;</content><category term="reviews"></category><category term="SciFi"></category><category term="Books"></category><category term="Book Review"></category></entry><entry><title>Book Review: The Second Ship</title><link href="http://www.rath.org/book-review-the-second-ship.html" rel="alternate"></link><published>2013-06-17T00:00:00-07:00</published><updated>2013-06-17T00:00:00-07:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2013-06-17:/book-review-the-second-ship.html</id><summary type="html">&lt;p&gt;This is a review for the first book in the &lt;em&gt;Rho Agenda&lt;/em&gt; series by
Richard Phillips.&lt;/p&gt;
&lt;p&gt;I've stopped reading this after the first ~20% of the book, so the
following may contain spoilers from up to then. That said, in my
opinion most of the story is so predictable that reading this is really
not going to ruin anything for you.&lt;/p&gt;
&lt;p&gt;So, what happens? Well, the book starts with a just-graduated postdoc
setting up an experiment on a crashed UFO. Apparently …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This is a review for the first book in the &lt;em&gt;Rho Agenda&lt;/em&gt; series by
Richard Phillips.&lt;/p&gt;
&lt;p&gt;I've stopped reading this after the first ~20% of the book, so the
following may contain spoilers from up to then. That said, in my
opinion most of the story is so predictable that reading this is really
not going to ruin anything for you.&lt;/p&gt;
&lt;p&gt;So, what happens? Well, the book starts with a just-graduated postdoc
setting up an experiment on a crashed UFO. Apparently a bunch of
really clever scientists have been working on the ship for a few years,
but they never managed to even scratch its surface, let alone get
it open. So now it's sitting all alone in a hangar, and a single
scientist can mess with it as he pleases.&lt;/p&gt;
&lt;p&gt;Now, the freshly minted Dr. has an incredibly clever idea. Apparently
back then eyewitnesses noticed a blue glow around the crashing ship,
which could only be caused by Cherenkov radiation (we'll just buy that
for now), but our Dr. can see no possible source of such
radiation. Obviously, he concludes, Cherenkov radiation is going to
have some effect on the ship. So he sets up a little particle
accelerator ring and directs a small beam of such radiation at the
ship. Lo and behold, the ship opens(!). Up to this point the story
isn't actually quite as bad as it sounds. You could probably overlook
the fact that a single postdoc working alone on an alien ship is rather
unlikely and discard the missing connection between an observed blue
glow during flight and an opening mechanism. However, from this point
on the book actually started to aggravate me.&lt;/p&gt;
&lt;p&gt;What happens next is that our ingenious scientist, being alone in the
hangar, doesn't tell anyone anything, but just walks into the
ship. The fact that it might, e.g., close just as randomly as it opened
doesn't even occur to him. Of course, something bad happens and turns
the Dr. into a really clever bad guy who goes on to win 3 Nobel prizes
before he is 30.&lt;/p&gt;
&lt;p&gt;With this backstory, the book jumps to the present, where three
teenagers and a drug addict (separately) find a second alien ship
somewhere in a cave in New Mexico. They're teenagers and a drug addict,
so them entering the ship through a hole (apparently shot into it by
the first ship) without telling anyone is halfway reasonable. It
becomes less plausible when they start putting on some weird alien
headsets, which cause them a lot of pain and give them something
like a memory dump of the second ship chasing the first ship and both
ships crashing into Earth. The drug addict concludes that the first
ship is obviously an evil ship from the devil, the second ship sent by
God, and the headsets meant for the four apocalyptic riders (of which he
is the first). (I don't know if that is just a random interlude or if
this guy or his ideas are going to become important again later on.)&lt;/p&gt;
&lt;p&gt;The teenagers, on the other hand, conclude from this obviously
trustworthy source of knowledge (who wouldn't take everything that an
alien device dumps into your memory at face value?) that they should
not tell anyone about the ship (because "it would be bad if the same
people who work on the first ship would investigate this one as
well"). Instead, they decide to keep exploring it on their own (though
what they want to achieve with that doesn't become clear). It then
turns out the alien headsets gave them photographic memory and really
good motor skills, but that unfortunately gets them into trouble
when they all quote the same passage from a book in a school
exam. This delays their further exploration of the ship.&lt;/p&gt;
&lt;p&gt;Meanwhile, the clever scientist from the beginning has made it to
laboratory head and is up to something nasty. When an auditor reviews
the files on his computer (of course, alone and after work hours), he
shows up and tricks her into coming with him into a secret room in the
UFO -- the trick being that he tells her that he'll show her even more
of the bad things he's been up to. Having just realized how obviously
scheming this guy actually is, the auditor does not hesitate at all to
enter a secret room alone with the person for whose guilt she just
found evidence. Dr. Evil then gives her a good punch and an injection,
which apparently turns her into his puppet.&lt;/p&gt;
&lt;p&gt;At this point I stopped reading.&lt;/p&gt;</content><category term="reviews"></category><category term="SciFi"></category><category term="Books"></category><category term="Book Review"></category></entry><entry><title>Book Review: Strange Attractors</title><link href="http://www.rath.org/book-review-strange-attractors.html" rel="alternate"></link><published>2011-01-07T00:00:00-08:00</published><updated>2011-01-07T00:00:00-08:00</updated><author><name>Nikolaus Rath</name></author><id>tag:www.rath.org,2011-01-07:/book-review-strange-attractors.html</id><summary type="html">&lt;p&gt;This is a review for the second book in the &lt;em&gt;Chaos Chronicles&lt;/em&gt; by
Jeffrey A. Carver. It is preceded by &lt;em&gt;Neptune Crossing&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;I enjoyed Neptune Crossing very much and couldn't wait to read this
book. However, I have to say that it was quite a disappointment. The
characters all remain very flat and unconvincing. The whole book
consists of Bandicut and a couple of fellow aliens chasing and being
chased by some computational "disturbance", whose nature is never
resolved. The …&lt;/p&gt;</summary><content type="html">&lt;p&gt;This is a review for the second book in the &lt;em&gt;Chaos Chronicles&lt;/em&gt; by
Jeffrey A. Carver. It is preceded by &lt;em&gt;Neptune Crossing&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;I enjoyed Neptune Crossing very much and couldn't wait to read this
book. However, I have to say that it was quite a disappointment. The
characters all remain very flat and unconvincing. The whole book
consists of Bandicut and a couple of fellow aliens chasing and being
chased by some computational "disturbance", whose nature is never
resolved. The characters' actions also do not make any sense to the
reader; one just keeps wondering why the hell they are doing what
they're doing.&lt;/p&gt;
&lt;p&gt;Compared to Neptune Crossing, the author unleashes a wealth of alien
creatures and technologies on the reader, but in contrast to the
captivating descriptions in the previous novel, all the alien scenery
remains a flat, unimpressive background that not even the novel's hero
truly interacts with.&lt;/p&gt;
&lt;p&gt;What especially annoyed me was the repeated dying of the
quarx. Presented in the first book as a being with many memories of
working with several races, it just doesn't jibe with it dying and
being resurrected about four times within the same host.&lt;/p&gt;
&lt;p&gt;Verdict: read Neptune Crossing (it's great!) and stop with that.&lt;/p&gt;</content><category term="reviews"></category><category term="SciFi"></category><category term="Books"></category><category term="Book Review"></category></entry></feed>