Advanced S3QL Features

Snapshotting and Copy-on-Write

The command s3qlcp can be used to duplicate a directory tree without physically copying the file contents. This is made possible by the data de-duplication feature of S3QL.

The syntax of s3qlcp is:

s3qlcp [options] <src> <target>

This will replicate the contents of the directory <src> in the directory <target>. <src> has to be an existing directory and <target> must not exist. Moreover, both directories have to be within the same S3QL file system.

The replication will not take any additional space. Only if one of directories is modified later on, the modified data will take additional storage space.

s3qlcp can only be called by the user that mounted the file system and (if the file system was mounted with --allow-other or --allow-root) the root user.

Note that:

  • After the replication, both source and target directory will still be completely ordinary directories. You can regard <src> as a snapshot of <target> or vice versa. However, the most common usage of s3qlcp is to regularly duplicate the same source directory, say documents, to different target directories. For a e.g. monthly replication, the target directories would typically be named something like documents_January for the replication in January, documents_February for the replication in February etc. In this case it is clear that the target directories should be regarded as snapshots of the source directory.
  • Exactly the same effect could be achieved by an ordinary copy program like cp -a. However, this procedure would be orders of magnitude slower, because cp would have to read every file completely (so that S3QL had to fetch all the data over the network from the backend) before writing them into the destination folder.

Snapshotting vs Hardlinking

Snapshot support in S3QL is inspired by the hardlinking feature that is offered by programs like rsync or storeBackup. These programs can create a hardlink instead of copying a file if an identical file already exists in the backup. However, using hardlinks has two large disadvantages:

  • backups and restores always have to be made with a special program that takes care of the hardlinking. The backup must not be touched by any other programs (they may make changes that inadvertently affect other hardlinked files)
  • special care needs to be taken to handle files which are already hardlinked (the restore program needs to know that the hardlink was not just introduced by the backup program to safe space)

S3QL snapshots do not have these problems, and they can be used with any backup program.

Getting Statistics

You can get more information about a mounted S3QL file system with the s3qlstat command. It has the following syntax:

s3qlstat [options] <mountpoint>

This will print out something like this

Directory entries:    1488068
Inodes:               1482991
Data blocks:          87948
Total data size:      400 GiB
After de-duplication: 51 GiB (12.98% of total)
After compression:    43 GiB (10.85% of total, 83.60% of de-duplicated)
Database size:        172 MiB (uncompressed)
(some values do not take into account not-yet-uploaded dirty blocks in cache)

Probably the most interesting numbers are the total size of your data, the total size after duplication, and the final size after de-duplication and compression.

s3qlstat can only be called by the user that mounted the file system and (if the file system was mounted with --allow-other or --allow-root) the root user.

For a full list of available options, run s3qlstat --help.

Immutable Trees

The command s3qllock can be used to make a directory tree immutable. Immutable trees can no longer be changed in any way whatsoever. You can not add new files or directories and you can not change or delete existing files and directories. The only way to get rid of an immutable tree is to use the s3qlrm command (see below).

For example, to make the directory tree beneath the directory 2010-04-21 immutable, execute

s3qllock 2010-04-21

Immutability is a feature designed for backups. Traditionally, backups have been made on external tape drives. Once a backup was made, the tape drive was removed and locked somewhere in a shelf. This has the great advantage that the contents of the backup are now permanently fixed. Nothing (short of physical destruction) can change or delete files in the backup.

In contrast, when backing up into an online storage system like S3QL, all backups are available every time the file system is mounted. Nothing prevents a file in an old backup from being changed again later on. In the worst case, this may make your entire backup system worthless. Imagine that your system gets infected by a nasty virus that simply deletes all files it can find – if the virus is active while the backup file system is mounted, the virus will destroy all your old backups as well!

Even if the possibility of a malicious virus or trojan horse is excluded, being able to change a backup after it has been made is generally not a good idea. A common S3QL use case is to keep the file system mounted at all times and periodically create backups with rsync -a. This allows every user to recover her files from a backup without having to call the system administrator. However, this also allows every user to accidentally change or delete files in one of the old backups.

Making a backup immutable protects you against all these problems. Unless you happen to run into a virus that was specifically programmed to attack S3QL file systems, backups can be neither deleted nor changed after they have been made immutable.

Fast Recursive Removal

The s3qlrm command can be used to recursively delete files and directories on an S3QL file system. Although s3qlrm is faster than using e.g. rm -r, the main reason for its existence is that it allows you to delete immutable trees as well. The syntax is rather simple:

s3qlrm <directory>

Be warned that there is no additional confirmation. The directory will be removed entirely and immediately.

Runtime Configuration

The s3qlctrl can be used to control a mounted S3QL file system. Its syntax is

s3qlctrl [options] <action> <mountpoint> ...

<mountpoint> must be the location of a mounted S3QL file system. For a list of valid options, run s3qlctrl --help. <action> may be either of:

flushcache:Flush file system cache. The command blocks until the cache has been flushed.
dropcache:Flush, and then drop file system cache. The command blocks until the cache has been flushed and dropped.
log:Change log level.
cachesize:Change file system cache size.
upload-meta:Trigger a metadata upload.