This is a follow-up from the Developer Engineering meeting last week.
Background
Users need to be able to store different apps’ data in different Gaia hubs. Reasons include:
- Better fault tolerance: If one Gaia node goes down, others can still serve replicas
- Better cost/performance trade-offs: Depending on the workload, different apps can make more efficient use of different storage providers. For example, Amazon Glacier could be used to cheaply store large amounts of infrequently-accessed data, but would not be used for small data or data that is accessed frequently.
- More flexible security model: By choosing where a particular app’s data lives, users can add custom authentication requirements on both the read and write paths without affecting other apps.
Proposal: Per-App Gaia Hubs
Right now, the user has one Gaia hub, and it takes a Bitcoin transaction to change it. This could be extended to allow the user to specify multiple Gaia hubs, but this is still too rigid—in the limit, the user may need to change their set of Gaia hubs each time they sign into a new application.
Instead of trying to put multiple Gaia hub URLs into a Blockstack ID’s zone file, the user would add one or more Gaia hub URLs to each application’s entry in their profile. These would be visible in the user’s apps listing.
Reading and writing to the user’s Gaia hubs would depend on a “replication strategy” set in the getFile() and putFile() methods. I recommend having a few built-in strategies, but importantly the API must allow the developer to specify a callback to implement a bespoke replication strategy. This would look like the following:
On the read path:
- getFile(path, {replicationStrategy: "any"}): This returns data if any of the user’s app-specific Gaia hubs return data. The hubs are tried in order until one succeeds. This would be the default strategy.
- getFile(path, {replicationStrategy: "primary"}): This returns data if the first of the user’s app-specific Gaia hubs returns data. No other hubs are tried. This is meant to be used in conjunction with putFile with its "primary" strategy, in order to accommodate reads on data that may undergo write-bursts.
- getFile(path, {replicationStrategy: "all"}): This only returns data if all of the user’s app-specific Gaia hubs return the same data.
- getFile(path, {replicationStrategy: function (hubURLs: Array<string>) : Promise<Buffer>}): This uses a custom strategy, implemented as a callback that takes a list of the user’s Gaia hubs as input and returns a Promise to a Buffer that contains the data. The callback would throw an exception if it could not fetch data.
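To make the read-path semantics concrete, here is a minimal sketch of how the strategies might be dispatched. It is not the blockstack.js implementation: readWithStrategy and the injected fetchFromHub helper are hypothetical names, and file contents are modeled as a string rather than a Buffer to keep the example free of Node-specific types.

```typescript
// Hypothetical helper: reads `path` from one hub and resolves to its contents.
// A real implementation would issue an HTTP GET; it is injected here so the
// sketch stays self-contained.
type HubFetcher = (hubURL: string, path: string) => Promise<string>;

type ReadStrategy =
  | "any"
  | "primary"
  | "all"
  | ((hubURLs: string[]) => Promise<string>);

async function readWithStrategy(
  hubURLs: string[],
  path: string,
  strategy: ReadStrategy,
  fetchFromHub: HubFetcher
): Promise<string> {
  if (hubURLs.length === 0) {
    throw new Error("no Gaia hubs configured for this app");
  }
  if (typeof strategy === "function") {
    // Custom strategy: the callback decides which hubs to contact and how.
    return strategy(hubURLs);
  }
  if (strategy === "primary") {
    // Only the first hub is consulted; no fallback.
    return fetchFromHub(hubURLs[0], path);
  }
  if (strategy === "any") {
    // Try hubs in order and return the first successful read.
    let lastError: unknown = new Error("no hub returned data");
    for (const hub of hubURLs) {
      try {
        return await fetchFromHub(hub, path);
      } catch (err) {
        lastError = err;
      }
    }
    throw lastError;
  }
  // "all": every hub must respond, and every response must match.
  const replicas = await Promise.all(
    hubURLs.map((hub) => fetchFromHub(hub, path))
  );
  if (replicas.some((data) => data !== replicas[0])) {
    throw new Error("replicas disagree");
  }
  return replicas[0];
}
```

Note that "all" in this sketch compares replicas by simple equality; a real implementation working on Buffers would need a byte-wise comparison.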
On the write path:
- putFile(path, data, {replicationStrategy: "all"}): This only returns successfully if all of the user’s app-specific Gaia hubs acknowledge receipt of the data. This would be the default strategy.
- putFile(path, data, {replicationStrategy: "primary"}): This only returns successfully if the first of the user’s app-specific Gaia hubs acknowledges receipt of the data. All Gaia hubs will be written to, but secondaries may silently fail. This strategy would be good for “burst” writes, where speed matters more than consistency or durability. It should be used in conjunction with getFile(path, {replicationStrategy: "primary"}). You would follow a burst of putFile calls with this strategy by a call to putFile with the "all" strategy.
- putFile(path, data, {replicationStrategy: function (hubURLs: Array<string>) : Promise<Array<string>>}): This uses a custom strategy, implemented as a callback that takes a list of the user’s Gaia hubs as input and returns a Promise of a list of strings that each represent a URL to a data replica. The callback would throw an exception if it failed to replicate data somehow (implementation-defined).
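The write path could be sketched in the same way. Again, this is only an illustration under assumptions: writeWithStrategy and the injected uploadToHub helper are hypothetical names, and the data is modeled as a string. For "primary", this sketch returns only the primary replica’s URL, since the secondaries may not have finished (or may have failed) by the time the call returns.

```typescript
// Hypothetical helper: stores `data` at `path` on one hub and resolves to the
// URL of the stored replica.
type HubUploader = (
  hubURL: string,
  path: string,
  data: string
) => Promise<string>;

type WriteStrategy =
  | "all"
  | "primary"
  | ((hubURLs: string[]) => Promise<string[]>);

async function writeWithStrategy(
  hubURLs: string[],
  path: string,
  data: string,
  strategy: WriteStrategy,
  uploadToHub: HubUploader
): Promise<string[]> {
  if (hubURLs.length === 0) {
    throw new Error("no Gaia hubs configured for this app");
  }
  if (typeof strategy === "function") {
    // Custom strategy: the callback owns replica placement and error handling.
    return strategy(hubURLs);
  }
  // Both built-in strategies start a write to every hub.
  const uploads = hubURLs.map((hub) => uploadToHub(hub, path, data));
  if (strategy === "all") {
    // Rejects if any hub fails to acknowledge the write.
    return Promise.all(uploads);
  }
  // "primary": only the first hub's acknowledgement matters. Failures on the
  // secondaries are deliberately swallowed so they fail silently.
  const [primary, ...secondaries] = uploads;
  secondaries.forEach((upload) => upload.catch(() => undefined));
  return [await primary];
}
```

A burst of "primary" writes followed by one "all" write, as described above, would then leave every hub holding the final version of the file.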
Design Philosophy
The reason for making blockstack.js (and by extension, the application) responsible for replica placement and replica consistency is that it makes it straightforward to deal with partial failures in application-specific ways. For example, if I try to save a short-lived photo to three Gaia hubs in a Snapchat-like dapp, but only two writes succeed, the application may still consider the write “successful.” As another example, saving that same photo as my profile photo in a Facebook-like dapp would require successful acknowledgement from all three Gaia hubs in order to consider the write successful. The Gaia hub has no insight into the application’s needs; therefore the application must be responsible for driving replica placement and consistency on its own (i.e. via the callback interface).
I point this out because part of the previous discussions on how to let the user add multiple Gaia hubs revolved around making the Gaia hub “smarter” by being able to handle partial writes/reads on its own, and mask failures (or propagate them in some meaningful way). This line of thought was ultimately scrapped, because it led to really complex implementations that are hard to reason about but get us no closer to solving the problem than the strategy outlined in this proposal.
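As an illustration of handling partial failures in the application, the Snapchat-like case (two of three acknowledged writes are good enough) could be written as a custom write callback. This is a sketch only: makeQuorumStrategy and the uploadOne helper are hypothetical names, not part of blockstack.js.

```typescript
// Hypothetical helper: writes the app's data to one hub and resolves to the
// replica URL. The path and data are assumed to be captured by the closure.
type UploadOne = (hubURL: string) => Promise<string>;

// Builds a write callback that succeeds once at least `minAcks` hubs have
// acknowledged the write, and throws otherwise.
function makeQuorumStrategy(
  minAcks: number,
  uploadOne: UploadOne
): (hubURLs: string[]) => Promise<string[]> {
  return async (hubURLs: string[]): Promise<string[]> => {
    // Attempt every hub; map each failure to null instead of rejecting.
    const attempts = hubURLs.map((hub) =>
      uploadOne(hub).then(
        (url): string | null => url,
        (): string | null => null
      )
    );
    const settled = await Promise.all(attempts);
    const replicaURLs = settled.filter(
      (url): url is string => url !== null
    );
    if (replicaURLs.length < minAcks) {
      throw new Error(
        "only " + replicaURLs.length + " of " + hubURLs.length +
          " hubs acknowledged the write; needed " + minAcks
      );
    }
    return replicaURLs;
  };
}
```

The Snapchat-like dapp would pass makeQuorumStrategy(2, uploadOne) as its replicationStrategy, while the profile-photo case could use the built-in "all" strategy or a quorum equal to the number of hubs.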
Implementation
User Profiles
The current profile structure “approximately” allows per-app Gaia hubs today:
$ blockstack-cli lookup judecnelson.id | jq '.profile.apps'
{
  "https://app.graphitedocs.com": "https://gaia.blockstack.org/hub/16YzkXKsYWZKypRcXk6vn4ETu1GBzoiZLw/",
  "https://www.chat.hihermes.co": "https://gaia.blockstack.org/hub/16wcVWogB3U3GAaHMVRXgWX68mGdN25Xkp/",
  "https://www.stealthy.im": "https://gaia.blockstack.org/hub/1ERc9KRMnpG7x4v8mN8e2WW7viEVbZnpvr/",
  "http://publik.ykliao.com": "https://gaia.blockstack.org/hub/1GzmHhQuUP4aKmnwXLCEEyJ2Won4gZkpJP/"
}
With a few modifications, a user could instead have something more like this:
$ blockstack-cli lookup judecnelson.id | jq '.profile.apps'
{
  "https://app.graphitedocs.com": [
    "https://gaia.blockstack.org/hub/16YzkXKsYWZKypRcXk6vn4ETu1GBzoiZLw/",
    "https://gaia.cs.princeton.edu/hub/16YzkXKsYWZKypRcXk6vn4ETu1GBzoiZLw/"
  ],
  "https://www.chat.hihermes.co": [
    "https://gaia.blockstack.org/hub/16wcVWogB3U3GAaHMVRXgWX68mGdN25Xkp/",
    "https://www.private-gaia-hubs.eu/hub/16wcVWogB3U3GAaHMVRXgWX68mGdN25Xkp/",
    "https://www.my-local-server.com/hub/16wcVWogB3U3GAaHMVRXgWX68mGdN25Xkp/"
  ],
  "https://www.stealthy.im": "https://gaia.blockstack.org/hub/1ERc9KRMnpG7x4v8mN8e2WW7viEVbZnpvr/",
  "http://publik.ykliao.com": "https://gaia.blockstack.org/hub/1GzmHhQuUP4aKmnwXLCEEyJ2Won4gZkpJP/"
}
The default strategies listed above would continue to work in applications that just have one Gaia hub, without requiring any application-level code changes.
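One way to get that backwards compatibility is to normalize each apps entry to a list before applying a strategy, so a legacy single-URL entry becomes a one-element list. The sketch below assumes the mixed string-or-array shape shown above; hubsForApp is a hypothetical helper name.

```typescript
// An app's entry is either a single hub URL (today's format) or a list of
// hub URLs (the proposed format).
type AppHubEntry = string | string[];

// Returns the app's Gaia hubs as a list, primary hub first.
// An app with no entry yields an empty list.
function hubsForApp(
  apps: Record<string, AppHubEntry>,
  appOrigin: string
): string[] {
  if (!Object.prototype.hasOwnProperty.call(apps, appOrigin)) {
    return [];
  }
  const entry = apps[appOrigin];
  // Copy the array so callers cannot mutate the profile by accident.
  return typeof entry === "string" ? [entry] : entry.slice();
}
```

With a one-element list, the "any", "primary", and "all" strategies all reduce to reading from or writing to that single hub, which is why existing applications would not need code changes.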
Sign-in
The user would need a way to specify which Gaia hub(s) would be used to load and store data when they sign into the application.
Profile Editing
The user needs a way to add/remove Gaia hubs from their profile, independent of applications.
Miscellaneous
This problem is related to being able to explore a Gaia hub’s data and enumerate files. This will require extending the Gaia hub driver model to include a list() API call for enumerating previously-written files. This feature should be made available at around the same time as this proposal is implemented, because we’ll need a way to migrate data from one Gaia hub to another once the user can add/remove them at will.
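To show why list() matters for migration, here is a rough sketch of copying everything from one hub to another. The HubDriver interface is an assumption made up for this example (the proposal only names the list() call); the read and write method names, and the use of strings for file contents, are hypothetical.

```typescript
// Hypothetical driver interface. Only list() comes from the proposal; read()
// and write() stand in for whatever the real driver model exposes.
interface HubDriver {
  list(): Promise<string[]>;
  read(path: string): Promise<string>;
  write(path: string, data: string): Promise<void>;
}

// Copies every file enumerated by the source hub to the destination hub and
// returns the paths that were copied. Stops at the first failure, so the
// caller can retry or inspect what is left.
async function migrateHub(
  source: HubDriver,
  destination: HubDriver
): Promise<string[]> {
  const copied: string[] = [];
  const paths = await source.list();
  for (const path of paths) {
    const data = await source.read(path);
    await destination.write(path, data);
    copied.push(path);
  }
  return copied;
}
```

Without list(), the migration tool would have no way to discover which paths exist on the source hub.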
CC @aaron @larry @jehunter5811 @yukan for your thoughts
EDIT 1: Add “primary” strategy to read and write paths, and remove “any” from the write path.