Storage Backends

S3QL supports different backends for storing data at different service providers and over different protocols. A storage URL specifies a backend together with some backend-specific information and uniquely identifies an S3QL file system. The syntax of the storage URL depends on the backend and is described for every backend below.

Furthermore, every S3QL command that accepts a storage URL also accepts a --backend-options parameter that can be used to pass backend-specific options to the backend module. The available options are documented with the respective backends below.

All storage backends respect the http_proxy (for plain HTTP connections) and https_proxy (for SSL connections) environment variables.
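For example, to route all S3QL traffic through a local proxy, the variables can be set in the shell before invoking any S3QL command (the proxy address below is a placeholder):

```shell
# Route S3QL's plain-HTTP and SSL traffic through a (hypothetical) local proxy
export http_proxy="http://127.0.0.1:3128"
export https_proxy="http://127.0.0.1:3128"
```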

Note

Storage backends are not necessarily compatible with each other. For example, do not expect to copy the data stored by the local backend into Amazon S3 using some non-S3QL tool and then access it with S3QL's S3 backend.

Google Storage

Google Storage is an online storage service offered by Google. In order to use it with S3QL, make sure that you enable the JSON API in the GCP Console API Library.

The Google Storage backend uses OAuth2 authentication or Application Default Credentials (ADC).

To use OAuth2 authentication, specify oauth2 as the backend login and a valid OAuth2 refresh token as the backend password. To obtain a refresh token, you can use the s3ql_oauth_client program. It will instruct you to open a specific URL in your browser, enter a code and authenticate with your Google account. Once this procedure is complete, s3ql_oauth_client will print out the refresh token. Note that you need to perform this procedure only once; the refresh token remains valid until you explicitly revoke it.
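For OAuth2 access, a corresponding entry in the authentication file (see Storing Backend Information and Credentials) might then look as follows; the bucket name is a placeholder, and the token is whatever s3ql_oauth_client printed:

```ini
[gs]
storage-url: gs://my-bucket/s3ql/
backend-login: oauth2
backend-password: <refresh token printed by s3ql_oauth_client>
```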

To use ADC, specify adc as the backend login and use an arbitrary value for the backend password.

To create a Google Storage bucket, you can use e.g. the Google Storage Manager. The storage URL for accessing the bucket in S3QL is then

gs://<bucketname>/<prefix>

Here bucketname is the name of the bucket, and prefix can be an arbitrary prefix that will be prepended to all object names used by S3QL. This allows you to store several S3QL file systems in the same Google Storage bucket.

The Google Storage backend accepts the following backend options:

ssl-ca-path=<path>

Instead of using the system’s default certificate store, validate the server certificate against the specified CA certificates. <path> may be either a file containing multiple certificates, or a directory containing one certificate per file.

tcp-timeout

Specifies the timeout used for TCP connections. If no data can be exchanged with the remote server for longer than this period, the TCP connection is closed and re-established (default: 20 seconds).
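As an illustration, these options can be combined into a single comma-separated --backend-options list; the bucket name, CA bundle path and mountpoint below are placeholders:

```shell
# Mount with a custom CA bundle and a longer TCP timeout
mount.s3ql --backend-options ssl-ca-path=/etc/ssl/my-ca.pem,tcp-timeout=60 \
    gs://my-bucket/s3ql/ /mnt/s3ql
```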

Amazon S3

Amazon S3 is the online storage service offered by Amazon Web Services (AWS). Buckets need to be created with the AWS Management Console. For best performance, it is recommended to create the bucket in the geographically closest storage region.

The storage URL for accessing S3 buckets in S3QL has the form

s3://<region>/<bucket>/<prefix>

prefix can be an arbitrary prefix that will be prepended to all object names used by S3QL. This allows you to store several S3QL file systems in the same S3 bucket. For example, the storage URL

s3://ap-south-1/foomart.net/data/s3ql_backup/

refers to the foomart.net bucket in the ap-south-1 region. All storage objects that S3QL stores in this bucket will be prefixed with data/s3ql_backup/.

The backend login and password for accessing S3 are not the user ID and password that you use to log into the Amazon website, but the AWS access key ID and AWS secret access key shown under My Account/Access Identifiers.
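A corresponding entry in the authentication file (see Storing Backend Information and Credentials) could look like this, reusing the example URL above; the key values are placeholders:

```ini
[s3]
storage-url: s3://ap-south-1/foomart.net/data/s3ql_backup/
backend-login: <AWS access key ID>
backend-password: <AWS secret access key>
```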

The Amazon S3 backend accepts the following backend options:

no-ssl

Disable encrypted (https) connections and use plain HTTP instead.

ssl-ca-path=<path>

Instead of using the system’s default certificate store, validate the server certificate against the specified CA certificates. <path> may be either a file containing multiple certificates, or a directory containing one certificate per file.

tcp-timeout

Specifies the timeout used for TCP connections. If no data can be exchanged with the remote server for longer than this period, the TCP connection is closed and re-established (default: 20 seconds).

sse

Enable server-side encryption. Both the costs and benefits of S3 server-side encryption are probably rather small, and this option does not affect any client-side encryption performed by S3QL itself.

it

Use INTELLIGENT_TIERING storage class for new objects. See AWS S3 Storage classes

ia

Use STANDARD_IA (infrequent access) storage class for new objects. See AWS S3 Storage classes

oia

Use ONEZONE_IA (infrequent access) storage class for new objects. See AWS S3 Storage classes

rrs

Enable reduced redundancy storage for newly created objects (overrides the ia option).

When enabling this option, it is strongly recommended to periodically run s3ql_verify, because objects that are lost by the storage backend may cause further data loss later on due to S3QL's data de-duplication feature (see Data Durability for details).
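For example, a file system using infrequent-access storage with server-side encryption could be created and then checked periodically as follows (a sketch, reusing the example URL from above; options are comma-separated):

```shell
# Create the file system with STANDARD_IA objects and server-side encryption
mkfs.s3ql --backend-options sse,ia s3://ap-south-1/foomart.net/data/s3ql_backup/

# Periodically verify that all stored objects are intact and retrievable
s3ql_verify s3://ap-south-1/foomart.net/data/s3ql_backup/
```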

OpenStack/Swift

OpenStack is an open-source cloud server application suite. Swift is the cloud storage module of OpenStack. Swift/OpenStack storage is offered by many different companies.

There are two different storage URL formats for the OpenStack backend, corresponding to different authentication APIs. For legacy (v1) authentication, the storage URL is

swift://<hostname>[:<port>]/<container>[/<prefix>]

For Keystone (v2 and v3) authentication, the storage URL is

swiftks://<hostname>[:<port>]/<region>:<container>[/<prefix>]

When using Keystone v3 authentication, the domain backend option (see below) must be specified too.

In both cases, hostname should be the name of the authentication server. The storage container must already exist (most OpenStack providers offer either a web frontend or a command line tool for creating containers). prefix can be an arbitrary prefix that will be prepended to all object names used by S3QL, which can be used to store multiple S3QL file systems in the same container.

When using legacy authentication, the backend login and password correspond to the OpenStack username and API Access Key. When using Keystone authentication, the backend password is your regular OpenStack password and the backend login combines your OpenStack username and tenant/project in the form <tenant>:<user>. If no tenant is required, the OpenStack username alone may be used as the backend login. For Keystone v2, <tenant> needs to be the tenant name (OS_TENANT_NAME in the OpenStack RC file). For Keystone v3, <tenant> needs to be the project ID (OS_TENANT_ID in the OpenStack RC file).
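For Keystone authentication, an authentication file entry might therefore look like this (hostname, region, container, tenant and user are all placeholders):

```ini
[swift]
storage-url: swiftks://auth.example.com/RegionOne:my-container/s3ql/
backend-login: my-tenant:my-user
backend-password: <OpenStack password>
```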

The OpenStack backend accepts the following backend options:

no-ssl

Use plain HTTP to connect to the authentication server. This option does not directly affect the connection to the storage server. Whether HTTPS or plain HTTP is used to connect to the storage server is determined by the authentication server.

ssl-ca-path=<path>

Instead of using the system’s default certificate store, validate the server certificate against the specified CA certificates. <path> may be either a file containing multiple certificates, or a directory containing one certificate per file.

tcp-timeout

Specifies the timeout used for TCP connections. If no data can be exchanged with the remote server for longer than this period, the TCP connection is closed and re-established (default: 20 seconds).

disable-expect100

If this option is specified, S3QL does not use the Expect: 100-continue header (cf. RFC 2616, section 8.2.3) when uploading data to the server. This can be used to work around broken storage servers that don't fully support HTTP 1.1, but may decrease performance as object data will be transmitted to the server more than once in some circumstances.

no-feature-detection

If this option is specified, S3QL does not try to dynamically detect advanced features of the Swift backend. In this case S3QL can only use the least common denominator of supported Swift versions and configurations.

domain

If this option is specified, S3QL will use the Keystone v3 API. The default domain ID for OpenStack installations is default. If this option is specified without setting the project-domain option, this will be used for both the project and the user domain. You need to provide the domain ID not the domain name to this option. If your provider did not give you a domain ID, then it is most likely default.

domain-is-name

If your provider only supplies you with the name of your domain and not its UUID, you need to set the domain-is-name option; the value of the domain option is then interpreted as the domain name rather than the domain ID.

project-domain

In simple cases, the project domain will be the same as the auth domain. If the project-domain option is not specified, it will be assumed to be the same as the user domain. You need to provide the domain ID not the domain name to this option. If your provider did not give you a domain ID, then it is most likely default.

project-domain-is-name

If your provider only supplies you with the name of your project domain and not its UUID, you need to set the project-domain-is-name option; the value of the project-domain option is then interpreted as the name of the project domain rather than its ID. If project-domain-is-name is not set, it is assumed to be the same as domain-is-name.

tenant-is-name

Some providers use the tenant name to specify the storage location, and others use the tenant ID. If your provider uses the tenant name and not the ID, you need to set the tenant-is-name option. If tenant-is-name is provided, the <tenant> component of the login is used as the tenant name, not the tenant ID.

identity-url

If your provider does not use hostname:port/v3/auth/tokens but instead has another identity URL, you can use this option. It allows you to replace /v3/auth/tokens with another path, for example /identity/v3/auth/tokens.
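Putting several of these options together, a Keystone v3 mount might be invoked as follows (a sketch; hostname, port, region, container and mountpoint are placeholders):

```shell
# Keystone v3 with the default domain and a non-standard identity path
mount.s3ql --backend-options domain=default,identity-url=/identity/v3/auth/tokens \
    swiftks://keystone.example.com:5000/RegionOne:my-container/s3ql/ /mnt/s3ql
```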

Rackspace CloudFiles

Rackspace CloudFiles uses OpenStack internally, so it is possible to just use the OpenStack/Swift backend (see above) with auth.api.rackspacecloud.com as the host name. For convenience, there is also a special rackspace backend that uses a storage URL of the form

rackspace://<region>/<container>[/<prefix>]

The storage container must already exist in the selected region. prefix can be an arbitrary prefix that will be prepended to all object names used by S3QL and can be used to store several S3QL file systems in the same container.

You can create a storage container for S3QL using the Cloud Control Panel (click on Files in the topmost menu bar).

The Rackspace backend accepts the same backend options as the OpenStack backend.

S3 compatible

The S3 compatible backend allows S3QL to access any storage service that uses the same protocol as Amazon S3. The storage URL has the form

s3c://<hostname>:<port>/<bucketname>/<prefix>

Here bucketname is the name of an (existing) bucket, and prefix can be an arbitrary prefix that will be prepended to all object names used by S3QL. This allows you to store several S3QL file systems in the same bucket.

The S3 compatible backend accepts the following backend options:

no-ssl

Disable encrypted (https) connections and use plain HTTP instead.

ssl-ca-path=<path>

Instead of using the system’s default certificate store, validate the server certificate against the specified CA certificates. <path> may be either a file containing multiple certificates, or a directory containing one certificate per file.

tcp-timeout

Specifies the timeout used for TCP connections. If no data can be exchanged with the remote server for longer than this period, the TCP connection is closed and re-established (default: 20 seconds).

disable-expect100

If this option is specified, S3QL does not use the Expect: 100-continue header (cf. RFC 2616, section 8.2.3) when uploading data to the server. This can be used to work around broken storage servers that don't fully support HTTP 1.1, but may decrease performance as object data will be transmitted to the server more than once in some circumstances.

dumb-copy

If this option is specified, S3QL assumes that a COPY request to the storage server has succeeded as soon as the server returns a 200 OK status. The S3 COPY API specifies that the storage server may still return an error in the request body (see the copy proposal for the rationale), so this option should only be used if you are certain that your storage server only returns 200 OK when the copy operation has been completely and successfully carried out. Using this option may be necessary if your storage server does not return a valid response body for a successful copy operation.
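For a self-hosted S3-compatible server that needs several of these workarounds, the mount invocation might look as follows (a sketch; hostname, port, bucket and mountpoint are placeholders):

```shell
# Plain HTTP, no Expect: 100-continue, and trust a plain 200 OK for COPY
mount.s3ql --backend-options no-ssl,disable-expect100,dumb-copy \
    s3c://storage.example.com:8080/my-bucket/s3ql/ /mnt/s3ql
```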

Backblaze B2

Backblaze B2 is a cloud storage service with its own API.

Warning

S3QL developers do not have access to a Backblaze instance, so this backend is not tested before release and may break at any time. This backend depends on Backblaze users to test and maintain it (i.e., to submit pull requests when it does not work).

The storage URL for Backblaze B2 storage is

b2://<bucket-name>[/<prefix>]

bucket-name is the name of an (existing) bucket that must be accessible with the provided account key. The prefix will be prepended to all object names used by S3QL and can be used to hold separate S3QL file systems in the same bucket.

It is also possible to use an application key. The required key capabilities are the following:

  • listBuckets

  • listFiles

  • readFiles

  • writeFiles

  • deleteFiles
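Assuming the application key ID is used as the backend login and the key itself as the backend password, an authentication file entry might look like this (the bucket name is a placeholder):

```ini
[b2]
storage-url: b2://my-bucket/s3ql/
backend-login: <application key ID>
backend-password: <application key>
```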

disable-versions

If versioning is not enabled for the bucket, this option can be set. When deleting objects, the bucket will then not be scanned for all file versions, since it is assumed that only one (the most recent) version of each file exists. This uses only one class B transaction instead of possibly multiple class C transactions.

retry-on-cap-exceeded

If data/transaction caps are set for the Backblaze account, this option controls whether operations are retried when a cap is hit, since cap counters are reset every day. Without this option, the resulting exception aborts the program.

test_mode=<value>

This option puts the Backblaze B2 server into test mode by adding a special header to requests. Use this option only to test the failure resiliency of the backend implementation, as it causes unnecessary traffic, delays and transactions.

Valid values are documented in https://www.backblaze.com/docs/en/cloud-storage-integration-checklist and include:

  • fail_some_uploads to randomly fail some uploads.

  • expire_some_account_authorization_tokens to let the server fail some authorization tokens.

  • force_cap_exceeded to let the server behave as if the data/transaction caps were exceeded.
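For example, to exercise the backend's retry logic against simulated upload failures (for testing only; bucket and mountpoint are placeholders):

```shell
# Simulate random upload failures on the server side
mount.s3ql --backend-options test_mode=fail_some_uploads \
    b2://my-bucket/s3ql/ /mnt/s3ql
```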

tcp-timeout

Specifies the timeout used for TCP connections. If no data can be exchanged with the remote server for longer than this period, the TCP connection is closed and re-established (default: 20 seconds).

Local

S3QL is also able to store its data on the local file system. This can be used to backup data on external media, or to access external services that S3QL can not talk to directly (e.g., it is possible to store data over SSH by first mounting the remote system using sshfs and then using the local backend to store the data in the sshfs mountpoint).

The storage URL for local storage is

local://<path>

Note that you have to write three consecutive slashes to specify an absolute path, e.g. local:///var/archive. Also, relative paths will automatically be converted to absolute paths before the authentication file (see Storing Backend Information and Credentials) is read, i.e. if you are in the /home/john directory and try to mount local://s3ql, the corresponding section in the authentication file must match the storage URL local:///home/john/s3ql.
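The conversion can be illustrated with a short Python sketch; normalize_local_url is a hypothetical helper for illustration and not part of S3QL:

```python
import posixpath

def normalize_local_url(url: str, cwd: str) -> str:
    """Resolve a relative local:// URL against the current directory,
    mimicking the conversion described above (hypothetical helper)."""
    path = url[len("local://"):]
    if not path.startswith("/"):
        path = posixpath.join(cwd, path)
    return "local://" + path

# A relative path is resolved against the working directory ...
print(normalize_local_url("local://s3ql", "/home/john"))          # local:///home/john/s3ql
# ... while an absolute path (three slashes) stays unchanged.
print(normalize_local_url("local:///var/archive", "/home/john"))  # local:///var/archive
```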

The local backend does not accept any backend options.