In the Blockstack dev tools roadmap posted on this forum a few months ago, collections was identified as one of the most important upgrades to the platform. So the team at Blockstack PBC spent the last month working on a design proposal. We wanted to make sure that the result meets the requirements of end-users, developers, and the ecosystem. We would love to get more input from the community.
Overview
Collections is a way to store common user data in a known location with a known structure. This allows different apps on Blockstack to read and write the same collection of data, so users can use the same data in different apps. An example is a single store of photos owned by a user that could be read and shared by many different apps with permission.
Goals & Design Considerations
The goal is to realize true data portability on Blockstack. In the existing implementation, app data is stored in separate app-specific buckets on Gaia and structured differently. It is difficult to take your own data and use it in another app.
For end-users, we want:
- True data portability without cumbersome UX.
- Ability to easily manage app permissions and level of access to collection data.
- Reduced damage that faulty/malicious apps can cause to the user’s data.
For developers:
- Great developer experience to incentivize usage of collections over proprietary data formats.
- Make it easy to utilize user data generated from other apps.
- Have a voice in the governance of collection data schemas.
- Ability to extend the vanilla collection data schemas without affecting other apps.
- We don’t want to stifle developer creativity with rigid data schemas.
- We don’t want developers to fork away from common data formats.
For the ecosystem:
- Have some form of governance to add and improve collection data types.
- A way to incentivize and reward developers that use collections.
Summary of Design
Blockstack will build a library that provides defined classes for commonly used data schemas. Developers will work with these classes and objects instead of creating new data schemas. These objects will automatically convert to the defined data schemas when stored to Gaia and vice versa.
This library of Blockstack collection classes will be open source and we will put in place a governance process to allow addition of new classes and modification of existing ones. Any community member can propose upgrades to the library via a process similar to the SIP process for the Stacks blockchain.
In this design we made a decision to not validate and enforce the schema of the data written to collections. The rationale is that it’s easier to incentivize usage of collections than to enforce it on an open platform. We also provide users with the ability to roll back data in case apps make undesirable changes that break compatibility with collections.
To provide users with roll-back capability, we’ve designed the collections data store conceptually as an event log. In version 1.0 of collections, every data write apps make will be stored as a separate file. This ensures that data is never lost and users can return files back to any previous state. Potential storage scalability issues will be addressed via compression and limiting history.
We will provide the users with full control over their collections data through the Blockstack Browser. Apps must request access to specific collections during authentication. Users can manage app permissions for collections and explore their raw collections data through the Browser. A file manager in the Browser is needed so the user can explore files and roll back if necessary.
Specification
Collections API in blockstack.js
The collections API will include a set of additional storage functions made available to developers. Under the hood, these new collection functions will use the existing storage functions in blockstack.js.
Storage
Instead of dealing directly with JSON data like the existing storage functions, collection storage will use data model objects. We will create a set of classes that represent the collection data types we want (e.g. Contacts, Photos, Documents). These classes will internally convert between the objects and JSON for Gaia. Each object will map to individual files in storage. Developers can extend these classes with additional properties, with the requirement that they namespace their additions. We should also build in a versioning system for the schemas to help with compatibility as the schemas evolve.
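To make the object-to-JSON idea more concrete, here is a minimal sketch of what such a class could look like. Everything in it is an assumption for illustration: the `CollectionItem` base class, the `serialize`/`deserialize` names, the `schema`/`version` tags, and the `com.example.crm:` namespacing convention are hypothetical and not part of blockstack.js or the proposed library.

```javascript
// Hypothetical base class: converts between in-memory objects and stored JSON.
class CollectionItem {
  static get schemaName() { return 'item' }
  static get schemaVersion() { return '1.0' }

  constructor(attrs = {}) {
    this.attrs = { ...attrs }
  }

  // Produce the JSON string that would be written to Gaia,
  // tagged with the schema name and version.
  serialize() {
    return JSON.stringify({
      schema: this.constructor.schemaName,
      version: this.constructor.schemaVersion,
      attrs: this.attrs
    })
  }

  // Rebuild an object of the calling class from stored JSON.
  static deserialize(json) {
    const parsed = JSON.parse(json)
    return new this(parsed.attrs)
  }
}

class Contact extends CollectionItem {
  static get schemaName() { return 'contact' }
}

// An app-specific extension stores its extra fields under a namespaced key,
// so apps that only know the vanilla schema can ignore them.
class CrmContact extends Contact {
  static get namespace() { return 'com.example.crm' }

  setExtra(key, value) {
    this.attrs[`${this.constructor.namespace}:${key}`] = value
  }

  getExtra(key) {
    return this.attrs[`${this.constructor.namespace}:${key}`]
  }
}

const extended = new CrmContact({ firstName: 'Blocky' })
extended.setExtra('leadScore', 7)
const stored = extended.serialize()

// Another app reads the same file with the vanilla class.
const plain = Contact.deserialize(stored)
console.log(plain.attrs.firstName) // Blocky
```

Because the extension only adds namespaced keys, the vanilla `Contact` class can still read the file, and the extra field survives a round trip through an app that does not know about it.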
We need to create a governance process to allow for the community to request changes to schemas and propose new ones. This should be something similar to the SIP process. We will add new schemas to collections if there is enough support for it in the community.
We won’t explicitly validate schemas. Instead, the objects themselves will handle the object-to-JSON schema conversions under the hood, which prevents developers from accidentally breaking the schema. It won’t completely prevent them from changing the data schema, but there isn’t really anything we can do to stop them 100%. We can instead offer a way for users to roll back changes through the Browser.
Usage example for the proposed API:
import { Contact } from 'blockstack-collections'
// Saving a collection item
var contact = new Contact({
  firstName: 'Blocky',
  lastName: 'Stackerson',
  blockstackID: 'blockstacker.id',
  email: '[email protected]',
})
contact.save().then((contactID) => {
  // Contact saved successfully
  // Returns a unique contact ID for use in retrieval later
  // contactID = abc12345
})
// Retrieving a single item using the item ID
var contactID = 'xyz12345'
Contact.get(contactID)
// List items in the collection; returns a count of items in the collection
Contact.list(callback)
// The callback is called once for each file
function callback(contactID) {
  // Fetch the actual contact object
  Contact.get(contactID).then((contact) => {
    // Do something with the returned contact object
  })
}
// Delete a collection item
contact.delete()
// This should just rename the latest file to a historical file and
// update the index to reflect this
Scope Request
For app developers, collections permissions can be requested via the authentication scope system. The collection schema libraries should provide collection scope identifier constants.
import { Contact, Document } from 'blockstack-collections'
const collectionScopes = [
Contact.scope,
Document.scope
]
const appConfig = new AppConfig(
[...DEFAULT_SCOPE.slice(), ...collectionScopes] // scopes
)
const userSession = new UserSession(appConfig)
userSession.redirectToSignIn()
Requesting scope after authentication
We should optionally provide a function to request additional scopes after the user is already authenticated. For existing apps that want to add collections, the alternative would be to force the user to re-authenticate.
userSession.requestCollectionScope(Contact.scope.write)
Browser-side changes
Storage Key Generation
Collection data would be stored in separate Gaia buckets not related to any apps.
Currently app data is stored in Gaia buckets:
"http://myapp.com": "https://gaia.blockstack.org/hub/143tnkzivRBSSvmyo1bXghoap2gRVpyvzz/"
Collections data would be stored in similar buckets:
"collections.contacts": "https://gaia.blockstack.org/hub/143tnkzivFVgPqerPKUoKLdyvgyYNPjM9/"
The app data bucket address 143tnkzivRBSSvmyo1bXghoap2gRVpyvzz is generated by deriving from the appsNodeKey in each identity address using a hash of the app domain as the index. We can similarly generate collections data bucket addresses using a collectionsNodeKey and the collection name as the index.
// Key derivation for app buckets
var appDomain = 'https://www.myBlockstackApp.com'
var hashAppIndex = sha256(appDomain + salt)
var appNode = this.hdNode.deriveHardened(hashAppIndex)
// Key derivation for collections bucket
var collectionsPrefix = 'collections.'
var collectionName = collectionsPrefix + 'contacts' // 'collections.contacts'
var hashCollectionIndex = sha256(collectionName + salt)
var collectionNode = this.hdNode.deriveHardened(hashCollectionIndex)
We prefix the collection name with collections before hashing to avoid collisions between app and collection indices.
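As a rough illustration of how a hashed label could become a derivation index, the sketch below uses Node's built-in crypto module. The reduction to a 31-bit integer (first four bytes of the digest, top bit cleared) is our assumption for this example only, as are the `derivationIndex` helper and the placeholder salt. The actual HD derivation step is left out.

```javascript
const { createHash } = require('crypto')

// Largest index usable for hardened derivation (2^31 - 1).
const MAX_HARDENED_INDEX = 0x7fffffff

// Hash a label with the salt and reduce it to a 31-bit derivation index.
// The reduction (first 4 bytes, top bit cleared) is an assumption made for
// this sketch, not something specified in the proposal.
function derivationIndex(label, salt) {
  const digest = createHash('sha256').update(label + salt).digest()
  return digest.readUInt32BE(0) & MAX_HARDENED_INDEX
}

const salt = 'example-salt' // placeholder value
const appIndex = derivationIndex('https://www.myBlockstackApp.com', salt)
const contactsIndex = derivationIndex('collections.contacts', salt)
console.log(appIndex, contactsIndex)
```

Since app labels are URLs and collection labels start with the collections prefix, the two kinds of label are never the same string, so they hash to different digests in practice.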
Encryption Key Generation
Encryption keys will be generated deterministically and stored in the user’s identity settings file. This file is persisted to Gaia. Keys will be written to the app’s storage bucket when permission to the collection is granted by the user during auth. In contrast to the previously proposed approach, this new approach does not produce a new encryption key when granting collections access to new apps. The encryption index is meant to be incremented when the encryption key needs to be changed, i.e. when revoking collections permission from an app.
// Encryption key derivation for collections bucket
var collectionName = 'collections.contacts'
var encryptionIndex = collectionIdentitySettings[collectionName].encryptionIndex
var hashCollectionIndex = sha256(collectionName + encryptionIndex + salt)
var appNode = this.hdNode.deriveHardened(hashCollectionIndex)
Encryption Key Storage
We can store the encryption keys for collections in the app’s own storage bucket and encrypt the key with the app private key. When the app needs to decrypt data from a collection, it should fetch and decrypt the key from its own storage bucket. This way when the encryption keys change, no action is required from the app. All collection permissions granted to the app will be stored in this single key file.
Proposed naming convention for encryption keys on app data buckets:
.collection.keys
Example collection key file:
// .collection.keys
{
  contacts: {
    gaiaHubConfig: { ... },
    encryptionKey: '3b14ad869364c3db175d48ee56a7cec24cdb69a65a945b80401'
  }
}
Encryption Key Revocation
To revoke an app’s ability to encrypt and decrypt a specific collection’s data, we need to change the encryption key and re-encrypt the existing data.
We can change the encryption key by incrementing the encryption index for the specific collection and regenerating the key using the resulting new derivation index.
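The rotation step can be sketched as follows. This is a simplified stand-in under stated assumptions: it hashes the inputs straight to a key with Node's crypto module instead of doing the hardened HD derivation, and the `deriveCollectionKey` and `rotateCollectionKey` helpers, the settings shape, and the salt value are hypothetical.

```javascript
const { createHash } = require('crypto')

// Stand-in for the hardened HD derivation in the proposal: hash the same
// inputs straight to a 32-byte key, returned as hex.
function deriveCollectionKey(collectionName, encryptionIndex, salt) {
  return createHash('sha256')
    .update(`${collectionName}${encryptionIndex}${salt}`)
    .digest('hex')
}

// Return updated identity settings with the collection's encryption index
// incremented, plus the key derived from the new index.
// The input settings object is not mutated.
function rotateCollectionKey(settings, collectionName, salt) {
  const current = settings[collectionName] || { encryptionIndex: 0 }
  const next = { ...current, encryptionIndex: current.encryptionIndex + 1 }
  return {
    settings: { ...settings, [collectionName]: next },
    encryptionKey: deriveCollectionKey(collectionName, next.encryptionIndex, salt)
  }
}

const before = { 'collections.contacts': { encryptionIndex: 0 } }
const rotated = rotateCollectionKey(before, 'collections.contacts', 'example-salt')
console.log(rotated.settings['collections.contacts'].encryptionIndex) // 1
```

Granting access to another app reuses the current index and therefore the same key; only a revocation bumps the index and yields a new key.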
The user will be given a choice between two options. The first is to decrypt and re-encrypt all files in the collection, including historical files, using the new key. The second is to encrypt only new file writes using the newly generated key, in which case the current and historical files will not be re-encrypted. A necessary next step would be to update the stored encryption key file in each authorized app’s bucket.
This action would need to be performed from the user’s browser/authenticator since it’s the only agent that can write to every app’s storage bucket as well as the collection.
If the apps cache collection encryption keys locally, they need to know when the encryption key changes. Each encrypted collection write operation should send the encryption key ID to the Gaia hub. The hub will check the ID against the stored key file in the bucket and return an error in case of mismatch. The client-side logic would be to automatically fetch the new key, re-encrypt and perform the write again.
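A minimal sketch of that client-side retry logic is shown below. The `MockHub` class, the `KEY_MISMATCH` error code, the `writeWithKeyRefresh` helper, and the placeholder "encryption" function are all assumptions for illustration; a real client would talk to the Gaia hub over HTTP and use actual encryption.

```javascript
// Hypothetical hub stand-in: rejects writes made with a stale key ID.
class MockHub {
  constructor(currentKeyId) {
    this.currentKeyId = currentKeyId
    this.files = new Map()
  }

  write(name, payload, keyId) {
    if (keyId !== this.currentKeyId) return { ok: false, error: 'KEY_MISMATCH' }
    this.files.set(name, payload)
    return { ok: true }
  }
}

// Try the write with the cached key; on a mismatch, refresh the key from the
// app bucket, re-encrypt, and retry once. Returns the number of attempts.
function writeWithKeyRefresh(hub, name, plaintext, cache, fetchKey, encrypt) {
  for (let attempt = 1; attempt <= 2; attempt++) {
    const { keyId, key } = cache.entry
    const result = hub.write(name, encrypt(plaintext, key), keyId)
    if (result.ok) return attempt
    if (result.error !== 'KEY_MISMATCH') throw new Error(result.error)
    cache.entry = fetchKey() // stands in for re-reading .collection.keys
  }
  throw new Error('Key still rejected after refresh')
}

// Placeholder "encryption" that only tags the payload with the key used.
const tagWithKey = (text, key) => `${key}:${text}`

const hub = new MockHub('key-2')
const cache = { entry: { keyId: 'key-1', key: 'old-key' } }
const fetchKey = () => ({ keyId: 'key-2', key: 'new-key' })
const attempts = writeWithKeyRefresh(hub, 'contact.json', 'hello', cache, fetchKey, tagWithKey)
console.log(attempts) // 2
```

Limiting the loop to a single retry keeps a misconfigured client from hammering the hub if the refreshed key is also rejected.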
Gaia Hub Changes
The Gaia hub should allow a new type of authentication token that only supports a special write operation that retains change history. This provides the user the ability to roll back files to a previous state. In version 1.0, we will just keep every file that was written to a collections storage bucket. We store the latest version of the file with the canonical name so that file reads don’t need to query an index or log.
Example:
myphoto1.jpg <----- Always the latest version
.history.1566581249949.myphoto1.jpg <----- Previous version
.history.1566581000000.myphoto1.jpg
.history.1566580000000.myphoto1.jpg
Naming scheme for historical files is .history.<timestamp>.<filename>
On file writes, the Gaia hub would simply rename the last version of the file to the historical file naming scheme. The naming scheme includes an incrementing number so we can order the files later. The index file provides the current max number for each file. The Gaia hub will also need to deny writes to files using the historical file naming scheme so that apps cannot overwrite historical files. When the user wants to roll back a file, we can construct the full history of each file using the historical files and the number in the filename.
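The write path described above could look roughly like the following in-memory sketch. It is not Gaia hub code: the `HistoryBucket` class and its methods are hypothetical, the bucket is a plain Map, and we assume the `.history.<timestamp>.<filename>` naming with the timestamp taken at the moment the old version is replaced.

```javascript
// In-memory stand-in for a collections bucket. A real Gaia hub would do the
// equivalent rename through its storage driver.
const HISTORY_PREFIX = '.history.'

class HistoryBucket {
  constructor() {
    this.files = new Map()
  }

  // Write `content` under `name`; the version being replaced is kept under
  // .history.<timestamp>.<name>. Two writes in the same millisecond would
  // collide, which a real hub would have to handle.
  write(name, content, timestamp = Date.now()) {
    if (name.startsWith(HISTORY_PREFIX)) {
      throw new Error('Apps may not write to historical file names')
    }
    if (this.files.has(name)) {
      this.files.set(`${HISTORY_PREFIX}${timestamp}.${name}`, this.files.get(name))
    }
    this.files.set(name, content)
  }

  // List the historical versions of `name`, newest first.
  history(name) {
    const suffix = `.${name}`
    const versions = []
    for (const key of this.files.keys()) {
      if (!key.startsWith(HISTORY_PREFIX) || !key.endsWith(suffix)) continue
      const stamp = key.slice(HISTORY_PREFIX.length, key.length - suffix.length)
      if (/^\d+$/.test(stamp)) versions.push({ key, timestamp: Number(stamp) })
    }
    return versions.sort((a, b) => b.timestamp - a.timestamp)
  }

  // Roll back by writing an old version as the newest one, so the rollback
  // itself is recorded in the history.
  rollback(name, timestamp, now = Date.now()) {
    const key = `${HISTORY_PREFIX}${timestamp}.${name}`
    if (!this.files.has(key)) throw new Error(`No version ${timestamp} of ${name}`)
    this.write(name, this.files.get(key), now)
  }
}

const bucket = new HistoryBucket()
bucket.write('myphoto1.jpg', 'v1', 1000)
bucket.write('myphoto1.jpg', 'v2', 2000)
bucket.write('myphoto1.jpg', 'v3', 3000)
console.log(bucket.history('myphoto1.jpg').map((v) => v.key))
```

Treating a rollback as just another write means nothing is ever deleted, which matches the event-log idea behind the design.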
Gaia Hub write permission
Currently Gaia hub write permissions are granted via an auth token generated by signing a challenge message from the hub. Since the app does not have access to the collection private keys, the Gaia auth token will be generated by the browser. The token will be written to the app’s storage bucket in the .collection.keys file which also contains the collection encryption keys.
File manager
The collections implementation should include a file manager that allows users to browse their collections data and potentially regular app data. The only place this can be implemented is the Browser since it can generate storage and encryption keys for all collections/app buckets.
App Permission Manager
A simple interface is required to manage app permissions for collections. The user should be able to view the list of apps that have access to each collection type. It should also be possible to revoke an app’s access to collections from here.
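The data behind such an interface could be as simple as a map from collection name to the set of apps with access. The sketch below is only an assumption about how the Browser might track this; the `CollectionPermissions` class and the app origins are hypothetical.

```javascript
// Hypothetical permission store for the Browser's app permission manager.
class CollectionPermissions {
  constructor() {
    this.grants = new Map() // collection name -> Set of app origins
  }

  grant(collection, appOrigin) {
    if (!this.grants.has(collection)) this.grants.set(collection, new Set())
    this.grants.get(collection).add(appOrigin)
  }

  // Returns true if access was actually removed, in which case the caller
  // would go on to change the collection's encryption key.
  revoke(collection, appOrigin) {
    const apps = this.grants.get(collection)
    return apps ? apps.delete(appOrigin) : false
  }

  // Apps with access to the collection, sorted for display.
  appsFor(collection) {
    return [...(this.grants.get(collection) || [])].sort()
  }
}

const permissions = new CollectionPermissions()
permissions.grant('contacts', 'https://mail.example')
permissions.grant('contacts', 'https://crm.example')
const removed = permissions.revoke('contacts', 'https://mail.example')
console.log(removed, permissions.appsFor('contacts'))
```

Returning a boolean from `revoke` gives the caller a clear signal for when the key change described under Encryption Key Revocation is needed.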