How to Port a Storage System to Blockstack

Hey everyone! I just pushed a skeleton storage driver here. If you’re a developer and want to use your preferred storage system to host your Blockstack data, feel free to take a look and send us a PR with the driver implementation :)

Background

Blockstack storage drivers are responsible for implementing a get/put/delete interface for two logical types of I/O: mutable data, and immutable data.

Mutable data is data that does NOT touch the underlying blockchain. Instead, mutable data is signed by a private key derived from the keypair listed in the user’s zone file. Most user data (profiles, application data stores) follows the mutable data I/O model, since mutable I/O can happen as fast as the storage service allows.

Immutable data is data that touches the underlying blockchain. Each ‘put’ and ‘delete’ corresponds to an on-chain transaction (specifically, a NAME_UPDATE transaction that modifies the user’s zone file). Similarly, each ‘get’ corresponds to a previously-sent transaction. Immutable data is appropriate for storing data that will only be written once, where freshness, integrity, and consistency are more important than I/O performance (examples include storing PGP keys, software releases, and certificates).

In practice, most storage drivers can implement the mutable I/O path and immutable I/O path the same way; the only difference between the two will be the interfaces. For example, the disk driver simply stores everything to disk, immutable or mutable.
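To make the "same code path, two interfaces" point concrete, here is a minimal sketch of a disk-backed driver. The class name, method names, and directory layout are all hypothetical and chosen only for illustration; check the skeleton driver for the real entry points before writing anything against it.

```python
import os
import tempfile


class DiskDriverSketch:
    """Hypothetical driver that stores mutable and immutable data the same way."""

    def __init__(self, root):
        self.root = root
        for kind in ("immutable", "mutable"):
            os.makedirs(os.path.join(root, kind), exist_ok=True)

    def _path(self, kind, key):
        # Replace path separators so each key maps to a single file name.
        return os.path.join(self.root, kind, key.replace("/", "_"))

    def _put(self, kind, key, data):
        with open(self._path(kind, key), "wb") as f:
            f.write(data)
        return True

    def _get(self, kind, key):
        try:
            with open(self._path(kind, key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def _delete(self, kind, key):
        try:
            os.remove(self._path(kind, key))
        except FileNotFoundError:
            pass  # deleting a missing key is treated as success
        return True

    # Immutable interface (assumed here to be keyed by the data's hash).
    def put_immutable(self, data_hash, data):
        return self._put("immutable", data_hash, data)

    def get_immutable(self, data_hash):
        return self._get("immutable", data_hash)

    def delete_immutable(self, data_hash):
        return self._delete("immutable", data_hash)

    # Mutable interface (assumed here to be keyed by a data ID).
    def put_mutable(self, data_id, data):
        return self._put("mutable", data_id, data)

    def get_mutable(self, data_id):
        return self._get("mutable", data_id)

    def delete_mutable(self, data_id):
        return self._delete("mutable", data_id)


# Usage: both I/O paths share the same private helpers.
driver = DiskDriverSketch(tempfile.mkdtemp())
driver.put_mutable("alice.id/profile", b"profile bytes")
```

The six public methods differ only in which subdirectory they touch, which is about all the distinction a simple backend needs.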

Replication Strategy

Replication in Blockstack is best-effort. On a given put, some data may be successfully replicated to some storage providers, and some data may not. Blockstack automatically masks any inconsistencies that get introduced (see Responsibilities below). Blockstack uses three configuration fields in its config file to determine how to replicate data.

  • blockstack-client.storage_drivers. This is the list of storage drivers to use to both read and write data. All of these drivers will be attempted on any get or put. A get or put is attempted on each driver in the order they are listed (but this may change in the future).

  • blockstack-client.storage_drivers_required_write. This is the list of storage drivers that must successfully put data in order for a write to succeed. If even one of them fails, the entire write fails.

  • blockstack-client.storage_drivers_local. This is the list of drivers that keep their data invisible to other clients. For example, the disk driver is listed here by default since writes to disk are invisible to other clients.
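As a rough illustration of how the first two fields interact on a write, here is a sketch of a best-effort put. The function, the driver classes, and the `put(key, data)` signature are made up for the example; this is not Blockstack's actual replication code.

```python
class MemoryDriver:
    """Hypothetical driver that always succeeds."""

    def __init__(self):
        self.data = {}

    def put(self, key, data):
        self.data[key] = data
        return True


class FailingDriver:
    """Hypothetical driver whose provider is unreachable."""

    def put(self, key, data):
        raise IOError("provider unreachable")


def replicated_put(drivers, storage_drivers, required_write, key, data):
    """Try every listed driver in order; succeed only if all required ones did.

    drivers: dict mapping driver name -> driver object
    Returns (write_succeeded, names_of_drivers_that_stored_the_data).
    """
    succeeded = []
    for name in storage_drivers:
        try:
            ok = drivers[name].put(key, data)
        except Exception:
            ok = False  # best-effort: one failing driver does not stop the rest
        if ok:
            succeeded.append(name)
    missing = [name for name in required_write if name not in succeeded]
    return (len(missing) == 0, succeeded)


drivers = {"disk": MemoryDriver(), "cloud": FailingDriver()}
# "cloud" is required but fails, so the write as a whole fails,
# even though "disk" still holds a copy of the data.
result = replicated_put(drivers, ["disk", "cloud"], ["cloud"], "k", b"v")
```

Note that a failed write can still leave data behind on some providers, which is exactly the kind of inconsistency Blockstack has to mask.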

In order for put to work on mutable data, there must be at least one driver listed in blockstack-client.storage_drivers_required_write that is NOT listed in blockstack-client.storage_drivers_local.
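The rule above is easy to check mechanically. The sketch below assumes the config file is INI-style with a `[blockstack-client]` section and comma-separated driver lists; that layout and the `s3` driver name are assumptions made for the example, so adjust them to match your actual config.

```python
import configparser

# Example config text; the layout and the "s3" driver name are assumptions.
EXAMPLE_CONFIG = """
[blockstack-client]
storage_drivers = disk,s3
storage_drivers_required_write = disk,s3
storage_drivers_local = disk
"""


def parse_list(value):
    """Split a comma-separated field into a list of driver names."""
    return [item.strip() for item in value.split(",") if item.strip()]


def can_put_mutable(config_text):
    """True if some required-write driver is not a local-only driver."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["blockstack-client"]
    required = parse_list(section.get("storage_drivers_required_write", fallback=""))
    local = set(parse_list(section.get("storage_drivers_local", fallback="")))
    return any(name not in local for name in required)


config_ok = can_put_mutable(EXAMPLE_CONFIG)
```

With the example config, `s3` is required and not local, so the check passes. If `disk` were the only required driver, the check would fail because no other client could ever see the write.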

There are no long-term plans for creating more sophisticated replication strategies. This is because more sophisticated strategies can be implemented as “meta drivers” that load existing drivers as modules, and forward get and put requests to them according to the desired strategy. For example, a “meta driver” could be written to turn data on put into erasure codes, replicate the erasure codes to multiple separate providers, and reconstruct the data on get even if some providers later go offline.
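To give a feel for what such a "meta driver" might look like, here is a toy sketch that uses the simplest erasure code I know of: split the data into two halves plus one XOR parity shard, and send each shard to a different provider. Any two of the three shards are enough to rebuild the data. All class names and the `put`/`get` signatures are hypothetical, and the providers are in-memory stand-ins for real drivers.

```python
import struct


class MemoryProvider:
    """Hypothetical stand-in for an underlying storage driver."""

    def __init__(self):
        self.blobs = {}
        self.online = True

    def put(self, key, data):
        if not self.online:
            return False
        self.blobs[key] = data
        return True

    def get(self, key):
        if not self.online:
            return None
        return self.blobs.get(key)


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


class XorParityMetaDriver:
    """Toy meta driver: two data shards plus one parity shard on three providers."""

    def __init__(self, providers):
        if len(providers) != 3:
            raise ValueError("this sketch needs exactly three providers")
        self.providers = providers

    def put(self, key, data):
        half = (len(data) + 1) // 2
        first = data[:half]
        second = data[half:].ljust(half, b"\0")  # pad so both halves match in length
        header = struct.pack(">Q", len(data))    # original length, to strip padding on get
        shards = [first, second, xor_bytes(first, second)]
        results = [p.put(key, header + s) for p, s in zip(self.providers, shards)]
        return all(results)

    def get(self, key):
        shards = [p.get(key) for p in self.providers]
        present = [s for s in shards if s is not None]
        if len(present) < 2:
            return None  # too many providers are missing their shard
        length = struct.unpack(">Q", present[0][:8])[0]
        first, second, parity = [s[8:] if s is not None else None for s in shards]
        if first is None:
            first = xor_bytes(second, parity)
        elif second is None:
            second = xor_bytes(first, parity)
        return (first + second)[:length]


providers = [MemoryProvider(), MemoryProvider(), MemoryProvider()]
meta = XorParityMetaDriver(providers)
meta.put("alice.id/photo", b"hello world")
providers[0].online = False               # one provider goes away
recovered = meta.get("alice.id/photo")    # rebuilt from the other two shards
```

A real meta driver would load existing drivers as modules rather than using in-memory stubs, and would probably want a stronger code than single parity, but the forwarding structure would look much the same.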

Responsibilities

Blockstack handles a lot of higher-level storage responsibilities on its own, so the driver implementer can focus on interfacing with the storage provider and/or creating the desired replication strategy. The responsibilities are divided as follows:

  • Consistency. Blockstack takes care of writing immutable data hashes to the zone file, and takes care of maintaining consistency info for mutable data. Specifically:

    • Blockstack guarantees per-key monotonic read consistency for mutable data (i.e. a get on a key returns the same or newer data as the previous get on the same key). It does not guarantee that a get returns the data written by the most recent put on that key.

    • A correct driver must guarantee per-key read-your-writes consistency (i.e. a put followed by a get on the same key should return the last-put data to the local client).

    • It is acceptable to rely on the storage system to enforce consistency. For example, most cloud storage providers claim to offer per-key sequential consistency already (i.e. a put followed by a get on the same key returns the data stored by the put to all clients). However, the driver must mask weak consistency by the storage provider if the provider cannot offer per-key read-your-writes consistency.

  • Authenticity. Blockstack signs all data before giving it to the driver. The driver does not need to implement separate authenticity checks.

  • Integrity. Similarly, Blockstack ensures that the data hasn’t been tampered with. No action is required by the driver.

  • Data Confidentiality. Blockstack encrypts data before giving it to the driver, and decrypts it after it loads it. However, Blockstack does not guarantee that all the data it writes will be encrypted (i.e. the user or application may specify that it is “public” data). If this is unacceptable, then the driver may take its own additional steps to ensure data confidentiality.

  • Behavioral Confidentiality. Blockstack does NOT take any action to hide network-visible access patterns. Without assistance from the driver, someone watching the network can do timing analysis on the packets Blockstack sends and receives, and deduce things like the user’s network location and the application being used. If behavioral confidentiality is required, then the driver must take additional steps to implement it.
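For the read-your-writes requirement under Consistency, one possible way to mask a weakly consistent provider is to remember what the local client last wrote and serve that back on get. The sketch below assumes a single writer per key and keeps the cache in memory only; both are simplifications, and every name in it is hypothetical.

```python
class StaleProvider:
    """Simulated eventually consistent store: writes become visible only after sync()."""

    def __init__(self):
        self.visible = {}
        self.pending = {}

    def put(self, key, data):
        self.pending[key] = data
        return True

    def get(self, key):
        return self.visible.get(key)

    def sync(self):
        self.visible.update(self.pending)
        self.pending.clear()


class ReadYourWritesDriver:
    """Wraps a provider and serves the local client's own latest write on get."""

    def __init__(self, provider):
        self.provider = provider
        self.last_written = {}  # key -> data most recently put by this client

    def put(self, key, data):
        ok = self.provider.put(key, data)
        if ok:
            self.last_written[key] = data
        return ok

    def get(self, key):
        # Assumes this client is the only writer for the key, so its own
        # last write is always at least as new as what the provider returns.
        if key in self.last_written:
            return self.last_written[key]
        return self.provider.get(key)


provider = StaleProvider()
driver = ReadYourWritesDriver(provider)
driver.put("alice.id/profile", b"v1")
stale = provider.get("alice.id/profile")   # provider has not caught up yet
fresh = driver.get("alice.id/profile")     # driver still returns the local write
```

If several clients can write the same key, the cache would need some notion of versions to avoid hiding newer remote data, and it would need to survive restarts to keep the guarantee across sessions.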
