Advanced S3QL Features

Snapshotting and Copy-on-Write

The command s3qlcp can be used to duplicate a directory tree without physically copying the file contents. This is made possible by the data de-duplication feature of S3QL.

The syntax of s3qlcp is:

s3qlcp [options] <src> <target>

This will replicate the contents of the directory <src> in the directory <target>. <src> has to be an existing directory and <target> must not exist. Moreover, both directories have to be within the same S3QL file system.

The replication will not take any additional storage space for the file contents. Metadata usage will increase in proportion to the number of directory entries and inodes. Only if one of the directories is modified later on will the modified data take up additional storage space.

s3qlcp can only be called by the user that mounted the file system and (if the file system was mounted with --allow-other or --allow-root) the root user.

After the replication, both source and target directory will still be completely ordinary directories. You can regard <src> as a snapshot of <target> or vice versa. However, the most common usage of s3qlcp is to regularly duplicate the same source directory, say documents, to different target directories. For a monthly replication, for example, the target directories would typically be named something like documents_January for the replication in January, documents_February for the replication in February, and so on. In this case it is clear that the target directories should be regarded as snapshots of the source directory.
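
For such a documents directory, the January snapshot could thus be created with:

s3qlcp documents documents_January

or, picking up the current month name automatically with the standard date utility:

s3qlcp documents "documents_$(date +%B)"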

Exactly the same effect could be achieved with an ordinary copy program like cp -a. However, this procedure would be orders of magnitude slower, because cp would have to read every file completely (so that S3QL would have to fetch all the data over the network from the backend) before writing it into the destination folder (at which point S3QL would de-duplicate the data).

Snapshotting vs Hardlinking

Snapshot support in S3QL is inspired by the hardlinking feature that is offered by programs like rsync or storeBackup. These programs can create a hardlink instead of copying a file if an identical file already exists in the backup. However, using hardlinks has disadvantages:

  • backups and restores always have to be made with a special program that takes care of the hardlinking. The backups must not be touched by any other programs (they may make changes that inadvertently affect other hardlinked files)

  • special care needs to be taken to handle files which are already hardlinked (the restore program needs to know that the hardlink was not introduced by the backup program)

S3QL snapshots do not have these problems, and they can be used with any backup program.

Getting Statistics

You can get more information about a mounted S3QL file system with the s3qlstat command. It has the following syntax:

s3qlstat [options] <mountpoint>

This will print out something like this:

Directory entries:    1488068
Inodes:               1482991
Data blocks:          87948
Total data size:      400 GiB
After de-duplication: 51 GiB (12.98% of total)
After compression:    43 GiB (10.85% of total, 83.60% of de-duplicated)
Database size:        172 MiB (uncompressed)
(some values do not take into account not-yet-uploaded dirty blocks in cache)

Probably the most interesting numbers are the total size of your data, the total size after de-duplication, and the final size after de-duplication and compression.
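
Note that the reported percentages are computed from the exact byte counts, while the sizes are displayed rounded to whole GiB; this is presumably why, in the example above, 43 GiB divided by 51 GiB gives roughly 84% rather than the reported 83.60% of the de-duplicated size.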

s3qlstat can only be called by the user that mounted the file system and (if the file system was mounted with --allow-other or --allow-root) the root user.

For a full list of available options, run s3qlstat --help.

Immutable Trees

The command s3qllock can be used to make a directory tree immutable. Immutable trees can no longer be changed in any way whatsoever. You cannot add new files or directories, and you cannot change or delete existing files and directories. The only way to get rid of an immutable tree is to use the s3qlrm command (see below).

For example, to make the directory tree beneath the directory 2010-04-21 immutable, execute

s3qllock 2010-04-21

Immutability is a feature designed for backups. Traditionally, backups have been made on external tape drives. Once a backup was made, the tape drive was removed and locked away somewhere. This means that the contents of the backup are permanently fixed. Nothing (short of physical destruction) can change or delete files in the backup.

In contrast, when backing up into an online storage system like S3QL, all backups are available every time the file system is mounted. Nothing prevents a file in an old backup from being changed again later on. In the worst case, this may make your entire backup system worthless. Imagine that your system gets infected by a virus that simply deletes all files it can find – if the virus is active while the backup file system is mounted, the virus will destroy all backups together with the originals.

Even in the absence of malware, being able to change a backup after it has been made is generally not a good idea. A common S3QL use case is to keep the file system mounted at all times and periodically create backups with rsync -a. This allows every user to recover her files from a backup without having to call the system administrator. However, it also allows every user to accidentally change or delete files in one of the old backups.

Making a backup immutable protects you against all these problems. Unless you happen to run into a virus that was specifically programmed to attack S3QL file systems, backups can be neither deleted nor changed after they have been made immutable.
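
Combined with s3qlcp, this allows a simple snapshot-style backup scheme: duplicate the previous backup, bring the duplicate up to date, and then freeze it. The following is only a rough sketch, assuming that the backup file system is mounted at the hypothetical mount point /mnt/s3ql_backups, that last month's backup lives in documents_January, and that the data to back up resides in ~/documents:

cd /mnt/s3ql_backups
s3qlcp documents_January documents_February     # duplicate last month's backup without copying any data
rsync -a --delete ~/documents/ documents_February/
s3qllock documents_February                     # freeze the finished backup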

Fast Recursive Removal

The s3qlrm command can be used to recursively delete files and directories on an S3QL file system. Although s3qlrm is faster than using e.g. rm -r, the main reason for its existence is that it allows you to delete immutable trees as well. The syntax is rather simple:

s3qlrm <directory>

Be warned that there is no additional confirmation. The directory will be removed entirely and immediately.
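
For instance, the tree that was made immutable in the s3qllock example above could be deleted again with:

s3qlrm 2010-04-21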

Runtime Configuration

The s3qlctrl command performs various actions on an S3QL file system mounted at mountpoint. Its syntax is:

s3qlctrl [options] <action> <mountpoint> …

s3qlctrl can only be called by the user that mounted the file system and (if the file system was mounted with --allow-other or --allow-root) the root user.

The following actions may be specified:

flushcache:

Write all modified blocks to the backend. The command blocks until the cache is clean.

dropcache:

Flush the file system cache and then drop its contents (i.e., make the cache empty).

log:

Change the amount of information that is logged. The complete syntax is:

s3qlctrl [options] log <mountpoint> <level> [<module> ...]

Here, level is the desired new log level and may be one of debug, info, or warn. One or more modules may be specified only together with the debug level; they restrict the debug output to just the listed modules.
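
For example, on a file system mounted at the hypothetical mount point /mnt/s3ql, debug output could be enabled for a single module and later turned off again like this (the module name s3ql.block_cache is only an illustration; the available module names depend on the S3QL version):

s3qlctrl log /mnt/s3ql debug s3ql.block_cache
s3qlctrl log /mnt/s3ql info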

cachesize:

Change the cache size of the file system. This action requires an additional argument that specifies the new cache size in KiB, so the complete command line is:

s3qlctrl [options] cachesize <mountpoint> <new-cache-size>
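
For example, to set the cache size to 1 GiB (1 GiB = 1048576 KiB) on a file system mounted at the hypothetical mount point /mnt/s3ql:

s3qlctrl cachesize /mnt/s3ql 1048576
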
backup-metadata:

Trigger a metadata backup.
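
The actions that take no further arguments are invoked in the same way; for example, with a file system mounted at the hypothetical mount point /mnt/s3ql, the cache can be flushed and a metadata backup triggered like this:

s3qlctrl flushcache /mnt/s3ql
s3qlctrl backup-metadata /mnt/s3ql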