Feedback wanted: Collections Design

yukan · March 28, 2019, 7:40pm

In the Blockstack dev tools roadmap posted on this forum a few months ago, collections was identified as one of the most important upgrades to the platform. So the team at Blockstack PBC spent the last month working on a design proposal. We wanted to make sure that the result meets the requirements of end-users, developers as well as the ecosystem. We would love to get more input from the community.

Overview

Collections is a way to store common user data in a known location with a known structure. Different apps on Blockstack can then read and write the same collection of data, so users can use the same data across apps. An example is a single store of photos owned by a user that could be read and shared by many different apps with permission.

Goals & Design Considerations

The goal is to realize true data portability on Blockstack. In the existing implementation, app data is stored in separate app-specific buckets on Gaia and structured differently. It is difficult to take your own data and use it in another app.

For end-users, we want:

True data portability without cumbersome UX.
Ability to easily manage app permissions and level of access to collection data.
Reduced damage that faulty/malicious apps can cause to the user’s data.

For developers:

Great developer experience to incentivize usage of collections over proprietary data formats.
Make it easy to utilize user data generated from other apps.
Have a voice in the governance of collection data schemas.
Ability to extend the vanilla collection data schemas without affecting other apps.
- We don’t want to stifle developer creativity with rigid data schemas.
- We don’t want developers to fork away from common data formats.

For the ecosystem:

Have some form of governance to add and improve collection data types.
A way to incentivize and reward developers that use collections.

Summary of Design

Blockstack will build a library that provides defined classes for commonly used data schemas. Developers will work with these classes and objects instead of creating new data schemas. These objects will automatically convert to the defined data schemas when stored to Gaia and vice versa.

This library of Blockstack collection classes will be open source and we will put in place a governance process to allow addition of new classes and modification of existing ones. Any community member can propose upgrades to the library via a process similar to the SIP process for the Stacks blockchain.

In this design we made a decision to not validate and enforce the schema of the data written to collections. The rationale is that it’s easier to incentivize usage of collections than to enforce it on an open platform. We also provide users with the ability to roll-back data in case apps make undesirable changes that break compatibility with collections.

To provide users with roll-back capability, we’ve designed the collections data store conceptually as an event log. In version 1.0 of collections, every data write apps make will be stored as a separate file. This ensures that data is never lost and users can return files back to any previous state. Potential storage scalability issues will be addressed via compression and limiting history.

We will provide the users with full control over their collections data through the Blockstack Browser. Apps must request access to specific collections during authentication. Users can manage app permissions for collections and explore their raw collections data through the Browser. A file manager in the Browser is needed so the user can explore files and roll back if necessary.

Specification

Collections API in blockstack.js

The collections API will include a set of additional storage functions made available to developers. Under the hood, these new collection functions will use the existing storage functions in blockstack.js.

Storage

Instead of dealing directly with JSON data like the existing storage functions, collection storage will use data model objects. We will create a set of classes that represent the collection data types we want (e.g. Contacts, Photos, Documents). These classes will internally convert between the objects and JSON for Gaia. Each object will map to an individual file in storage. Developers can extend these classes with additional properties, with the requirement that they namespace their additions. We should also build in a versioning system for the schemas to help with compatibility as the schemas evolve.
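To make the object-to-JSON idea concrete, here is a minimal sketch of how such classes might look. Everything in it is hypothetical: the names CollectionItem, CrmContact, serialize, deserialize and schemaVersion, as well as the 'myapp.com:' namespace prefix, are assumptions for illustration and not the proposed library API.

```javascript
// Hypothetical sketch: class, method and property names are illustrative only,
// not the blockstack-collections API.
class CollectionItem {
  // Version tag stored with every item so schemas can evolve.
  static get schemaVersion() { return '1.0' }

  constructor(attrs = {}) {
    this.attrs = { ...attrs }
  }

  // Convert the object to the JSON string that would be written to Gaia.
  serialize() {
    return JSON.stringify({
      schemaVersion: this.constructor.schemaVersion,
      attrs: this.attrs
    })
  }

  // Rebuild an object of the calling class from stored JSON.
  static deserialize(json) {
    const parsed = JSON.parse(json)
    return new this(parsed.attrs)
  }
}

class Contact extends CollectionItem {}

// An app-specific extension that namespaces its extra property.
class CrmContact extends Contact {
  get leadScore() { return this.attrs['myapp.com:leadScore'] }
}

const extended = new CrmContact({
  firstName: 'Blocky',
  lastName: 'Stackerson',
  'myapp.com:leadScore': 42
})
const stored = extended.serialize()

// Another app that only knows the base class still reads the shared fields.
const plain = Contact.deserialize(stored)
console.log(plain.attrs.firstName) // prints "Blocky"
```

Because the extra property is namespaced, an app using only the base Contact class can ignore it without the shared fields being affected.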

We need to create a governance process to allow for the community to request changes to schemas and propose new ones. This should be something similar to the SIP process. We will add new schemas to collections if there is enough support for it in the community.

We won’t explicitly validate schemas; the objects themselves will handle the object-to-JSON schema conversions under the hood. This prevents developers from accidentally breaking the schema. It won’t completely prevent them from changing the data schema, but there isn’t really anything we can do to stop them 100%. We can instead offer a way for users to roll back changes through the Browser.

Usage example for the proposed API:

import { Contact } from 'blockstack-collections'

// Saving a collection item
var contact = new Contact({
  firstName: 'Blocky', 
  lastName: 'Stackerson',
  blockstackID: 'blockstacker.id',
  email: '[email protected]',
})

contact.save().then((contactID) => {
  // contact saved successfully   
  // Returns a unique contact ID for use in retrieval later
  // contactID = abc12345
})

// Retrieving a single item using the item ID
var contactID = 'xyz12345'
Contact.get(contactID)

// List items in the collection; returns a count of items in the collection
Contact.list(callback)

// The callback is called once for each file
function callback(contactID) {
  // Fetch the actual contact object
  Contact.get(contactID).then((contact) => {
    // Do something with the returned contact object
  })
}

// Delete a collection item
contact.delete()
// This should just rename the latest file to a historical file and 
// update the index to reflect this

Scope Request

For app developers, collections permissions can be requested via the authentication scope system. The collection schema libraries should provide collection scope identifier constants.


import { Contact, Document } from 'blockstack-collections'

const collectionScopes = [
  Contact.scope,
  Document.scope
]

const appConfig = new AppConfig(
  [...DEFAULT_SCOPE.slice(), ...collectionScopes] // scopes
)
                      
const userSession = new UserSession(appConfig)
userSession.redirectToSignIn()

Requesting scope after authentication

We should optionally provide a function to request additional scopes after the user is already authenticated. For existing apps that want to add collections, the alternative would be to force the user to re-authenticate.

userSession.requestCollectionScope(Contact.scope.write)

Browser-side changes

Storage Key Generation

Collection data would be stored in separate Gaia buckets not related to any apps.

Currently app data is stored in Gaia buckets:

"http://myapp.com": "https://gaia.blockstack.org/hub/143tnkzivRBSSvmyo1bXghoap2gRVpyvzz/"

Collections data would be stored in similar buckets:

"collections.contacts": "https://gaia.blockstack.org/hub/143tnkzivFVgPqerPKUoKLdyvgyYNPjM9/"

The app data bucket address 143tnkzivRBSSvmyo1bXghoap2gRVpyvzz is generated by deriving from the appsNodeKey in each identity address using a hash of the app domain as the index. We can similarly generate collections data bucket addresses using a collectionsNodeKey and the collection name as the index.

// Key derivation for app buckets
var appDomain = 'https://www.myBlockstackApp.com'
var hashAppIndex = sha256(appDomain + salt)
var appNode = this.hdNode.deriveHardened(hashAppIndex)

// Key derivation for collections bucket
var collectionsPrefix = 'collections'
var collectionName = collectionsPrefix + '.' + 'contacts' // 'collections.contacts'
var hashCollectionIndex = sha256(collectionName + salt)
var collectionNode = this.hdNode.deriveHardened(hashCollectionIndex)

We prefix the collection name with 'collections' before hashing, to avoid collisions between app and collection indices.
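As a rough illustration of the index derivation, the sketch below hashes a name plus salt with Node's crypto module and reduces the digest to a 31-bit number. The deriveIndex helper, the 31-bit reduction and the salt value are assumptions of this sketch; the actual mapping from hash to derivation index is not specified in this proposal.

```javascript
const crypto = require('crypto')

// Hypothetical helper: map a name plus salt to a child index.
// Taking the first 4 bytes and masking to 31 bits is an assumption of this sketch.
function deriveIndex(name, salt) {
  const digest = crypto.createHash('sha256').update(name + salt).digest()
  return digest.readUInt32BE(0) & 0x7fffffff
}

const salt = 'example-salt' // placeholder value, not a real identity salt

// App bucket index: hash of the app domain.
const appIndex = deriveIndex('https://www.myBlockstackApp.com', salt)

// Collection bucket index: hash of the prefixed collection name.
const collectionIndex = deriveIndex('collections.contacts', salt)

// An app whose identifier happened to be just "contacts" hashes a different
// input string than the prefixed collection name.
const unprefixedIndex = deriveIndex('contacts', salt)

console.log(appIndex, collectionIndex, unprefixedIndex)
```

The prefix only changes the hashed input. It makes an accidental clash between an app identifier and a collection name very unlikely, though a hash reduced to 31 bits cannot rule one out entirely.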

Encryption Key Generation

Encryption keys will be generated deterministically and stored in the user’s identity settings file. This file is persisted to Gaia. Keys will be written to the app’s storage bucket when permission to the collection is granted by the user during auth. In contrast to the previously proposed approach, this new approach does not produce a new encryption key when granting collections access to new apps. The encryption index is meant to be incremented when the encryption key needs to be changed, i.e. when revoking collections permission from an app.

// Encryption key derivation for collections bucket
var collectionName = 'collections.contacts'
var encryptionIndex = collectionIdentitySettings[collectionName].encryptionIndex
var hashCollectionIndex = sha256(collectionName + encryptionIndex + salt)
var encryptionNode = this.hdNode.deriveHardened(hashCollectionIndex)
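The effect of the encryption index can be sketched as follows. HMAC-SHA256 keyed by a placeholder secret stands in for the hardened HD derivation here; that substitution, the deriveCollectionKey helper and all values are assumptions for illustration only.

```javascript
const crypto = require('crypto')

// Stand-in for hardened HD derivation (an assumption of this sketch):
// HMAC-SHA256 keyed by a master secret over name + index + salt.
function deriveCollectionKey(masterSecret, collectionName, encryptionIndex, salt) {
  return crypto
    .createHmac('sha256', masterSecret)
    .update(collectionName + encryptionIndex + salt)
    .digest('hex')
}

const masterSecret = 'placeholder master secret'
const salt = 'example-salt'
const name = 'collections.contacts'

// Mirrors the per-collection entry in the identity settings file.
const collectionIdentitySettings = { [name]: { encryptionIndex: 0 } }

const keyBefore = deriveCollectionKey(
  masterSecret, name, collectionIdentitySettings[name].encryptionIndex, salt)

// Granting access to a second app reuses the same index, so the key is unchanged.
const keyForSecondApp = deriveCollectionKey(
  masterSecret, name, collectionIdentitySettings[name].encryptionIndex, salt)

// Revoking an app increments the index, which yields a different key.
collectionIdentitySettings[name].encryptionIndex += 1
const keyAfter = deriveCollectionKey(
  masterSecret, name, collectionIdentitySettings[name].encryptionIndex, salt)

console.log(keyBefore === keyForSecondApp, keyBefore === keyAfter)
```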

Encryption Key Storage

We can store the encryption keys for collections in the app’s own storage bucket and encrypt the key with the app private key. When the app needs to decrypt data from a collection, it should fetch and decrypt the key from its own storage bucket. This way when the encryption keys change, no action is required from the app. All collection permissions granted to the app will be stored in the single key file.

Proposed naming convention for encryption keys on app data buckets:

.collection.keys

Example collection key file:

// .collection.keys
{
  contacts: {
    gaiaHubConfig: { ... },
    encryptionKey: '3b14ad869364c3db175d48ee56a7cec24cdb69a65a945b80401'
  }
}

Encryption Key Revocation

To revoke an app’s ability to encrypt and decrypt a specific collection’s data, we need to change the encryption key and re-encrypt the existing data.

We can change the encryption key by incrementing the encryption index for the specific collection and regenerating the key using the resulting new derivation index.

The user will be given a choice: either decrypt and re-encrypt all files in the collection, including historical files, using the new key; or only encrypt new file writes using the newly generated key, in which case the current and historical files will not be re-encrypted. A necessary next step would be to update the stored encryption key file in each authorized app’s bucket.

This action would need to be performed from the user’s browser/authenticator since it’s the only agent that can write to every app’s storage bucket as well as the collection.
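The revocation steps could look roughly like the sketch below, which uses an in-memory state object and AES-256-GCM from Node's crypto module. The revokeApp function, the state layout, the app URLs and the choice of cipher are all assumptions for illustration; the proposal does not fix any of them, and in the design the new key would come from the incremented derivation index rather than from random bytes.

```javascript
const crypto = require('crypto')

// AES-256-GCM helpers (the cipher choice is an assumption of this sketch).
function encrypt(keyHex, plaintext) {
  const iv = crypto.randomBytes(12)
  const cipher = crypto.createCipheriv('aes-256-gcm', Buffer.from(keyHex, 'hex'), iv)
  const data = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()])
  return { iv, data, tag: cipher.getAuthTag() }
}

function decrypt(keyHex, box) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', Buffer.from(keyHex, 'hex'), box.iv)
  decipher.setAuthTag(box.tag)
  return Buffer.concat([decipher.update(box.data), decipher.final()]).toString('utf8')
}

// Hypothetical revocation flow, run by the browser/authenticator.
function revokeApp(state, revokedApp, newKeyHex, reencryptExisting) {
  const oldKeyHex = state.currentKey
  state.encryptionIndex += 1          // 1. bump the encryption index
  state.currentKey = newKeyHex        // 2. switch to the regenerated key
  if (reencryptExisting) {            // 3. optionally re-encrypt every stored file
    for (const fileName of Object.keys(state.files)) {
      const plaintext = decrypt(oldKeyHex, state.files[fileName])
      state.files[fileName] = encrypt(newKeyHex, plaintext)
    }
  }
  delete state.appKeyFiles[revokedApp] // 4. the revoked app gets no new key
  for (const app of Object.keys(state.appKeyFiles)) {
    state.appKeyFiles[app].contacts.encryptionKey = newKeyHex // 5. update remaining apps
  }
}

// Random keys stand in for keys derived from the old and new index.
const oldKey = crypto.randomBytes(32).toString('hex')
const newKey = crypto.randomBytes(32).toString('hex')
const state = {
  encryptionIndex: 0,
  currentKey: oldKey,
  files: { 'contact1.json': encrypt(oldKey, '{"firstName":"Blocky"}') },
  appKeyFiles: {
    'https://goodapp.example': { contacts: { encryptionKey: oldKey } },
    'https://sketchyapp.example': { contacts: { encryptionKey: oldKey } }
  }
}

revokeApp(state, 'https://sketchyapp.example', newKey, true)
console.log(decrypt(newKey, state.files['contact1.json']))
```

If the user picks the second option (no re-encryption), existing files stay readable with the old key, which the revoked app may still hold.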

If the apps cache collection encryption keys locally, they need to know when the encryption key changes. Each encrypted collection write operation should send the encryption key ID to the Gaia hub. The hub will check the ID against the stored key file in the bucket and return an error in case of mismatch. The client-side logic would be to automatically fetch the new key, re-encrypt and perform the write again.
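A minimal sketch of that client-side retry logic, using an in-memory stand-in for the hub. MockHub, writeWithRetry, the KEY_MISMATCH error code and the toy "encryption" function are all hypothetical names made up for this example; the real hub API and error format are not defined here.

```javascript
// In-memory stand-in for a Gaia hub that knows the current key ID of a collection.
class MockHub {
  constructor(currentKeyId) {
    this.currentKeyId = currentKeyId
    this.files = {}
  }

  // Reject writes that were encrypted with an outdated key.
  write(path, ciphertext, keyId) {
    if (keyId !== this.currentKeyId) {
      const err = new Error('encryption key mismatch')
      err.code = 'KEY_MISMATCH' // hypothetical error code
      throw err
    }
    this.files[path] = { ciphertext, keyId }
  }
}

// Try the write with the cached key; on a mismatch, refresh the key from the
// app's key file, re-encrypt, and write once more.
function writeWithRetry(hub, cache, fetchKeyFile, path, plaintext, encryptFn) {
  try {
    hub.write(path, encryptFn(cache.key, plaintext), cache.keyId)
  } catch (err) {
    if (err.code !== 'KEY_MISMATCH') throw err
    Object.assign(cache, fetchKeyFile())
    hub.write(path, encryptFn(cache.key, plaintext), cache.keyId)
  }
}

// Placeholder "encryption" so the flow is visible; not real cryptography.
const toyEncrypt = (key, text) => `${key}:${text}`

const hub = new MockHub('key-2')                          // hub already expects the new key
const cache = { keyId: 'key-1', key: 'old' }              // app still caches the old key
const fetchKeyFile = () => ({ keyId: 'key-2', key: 'new' })

writeWithRetry(hub, cache, fetchKeyFile, 'contact1.json', 'hello', toyEncrypt)
console.log(hub.files['contact1.json'])
```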

Gaia Hub Changes

The Gaia hub should allow a new type of authentication token that only supports a special write operation that retains change history. This provides the user the ability to roll back files to a previous state. In version 1.0, we will just keep every file that was written to a collections storage bucket. We store the latest version of the file with the canonical name so that file reads don’t need to query an index or log.

Example:

myphoto1.jpg <----- Always the latest version
.history.1566581249949.myphoto1.jpg <----- Previous version
.history.1566581000000.myphoto1.jpg
.history.1566580000000.myphoto1.jpg

Naming scheme for historical files is .history.<timestamp>.<filename>

On file writes, the Gaia hub would simply rename the last version of the file to the historical file naming scheme. The naming scheme includes an incrementing number so we can order the files later. The index file provides the current max number for each file. The Gaia hub will also need to deny writes to files using the historical file naming scheme so that apps cannot overwrite historical files. When the user wants to roll back a file, we can construct the full history of each file using the historical files and the number in the filename.
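The write, deny and roll-back behaviour can be sketched with an in-memory bucket. HistoryBucket and its methods are hypothetical, and the sketch assumes the ordering number is the timestamp of the write that archived the version, as in the example file names above. Rolling back is modelled as a new write, so the version being replaced is itself kept in the history.

```javascript
const HISTORY_PREFIX = '.history.'

// In-memory stand-in for a collections bucket with history-retaining writes.
class HistoryBucket {
  constructor() {
    this.files = new Map()
  }

  write(name, contents, timestamp = Date.now()) {
    // Apps may never write directly to historical file names.
    if (name.startsWith(HISTORY_PREFIX)) {
      throw new Error('writes to historical files are not allowed')
    }
    // Archive the current version before replacing it.
    if (this.files.has(name)) {
      this.files.set(`${HISTORY_PREFIX}${timestamp}.${name}`, this.files.get(name))
    }
    this.files.set(name, contents)
  }

  // List archived versions of a file, oldest first.
  history(name) {
    const suffix = `.${name}`
    return [...this.files.keys()]
      .filter((key) => key.startsWith(HISTORY_PREFIX) && key.endsWith(suffix))
      .map((key) => ({
        key,
        timestamp: Number(key.slice(HISTORY_PREFIX.length, -suffix.length))
      }))
      .filter((entry) => Number.isInteger(entry.timestamp))
      .sort((a, b) => a.timestamp - b.timestamp)
  }

  // Restore an archived version by writing it back as the latest version.
  rollback(name, timestamp, now = Date.now()) {
    const key = `${HISTORY_PREFIX}${timestamp}.${name}`
    if (!this.files.has(key)) throw new Error('no such version')
    this.write(name, this.files.get(key), now)
  }
}

const bucket = new HistoryBucket()
bucket.write('myphoto1.jpg', 'v1', 1000)
bucket.write('myphoto1.jpg', 'v2', 2000) // archives v1 as .history.2000.myphoto1.jpg
bucket.write('myphoto1.jpg', 'v3', 3000) // archives v2 as .history.3000.myphoto1.jpg
bucket.rollback('myphoto1.jpg', 2000, 4000) // restores v1, archives v3
console.log(bucket.files.get('myphoto1.jpg')) // prints "v1"
```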

Gaia Hub write permission

Currently Gaia hub write permissions are granted via an auth token generated by signing a challenge message from the hub. Since the app does not have access to the collection private keys, the Gaia auth token will be generated by the browser. The token will be written to the app’s storage bucket in the .collection.keys file which also contains the collection encryption keys.

File manager

The collections implementation should include a file manager that allows users to browse their collections data and potentially regular app data. The only place this can be implemented is the Browser since it can generate storage and encryption keys for all collections/app buckets.

App Permission Manager

A simple interface is required to manage app permissions for collections. The user should be able to view the list of apps that have access to each collection type. It should also be possible to revoke an app’s access to collections from here.

MidnightLightning · March 29, 2019, 4:19pm

I’m glad to see this sort of concept is being worked on! I was musing on the same sort of idea, and surprised to see this is a current topic!

I like the idea overall, but one modification I would propose is to not bake the binding between the collection data type and the name of the Collection into the protocol itself (e.g. as currently proposed, all Contact records go in a collections.contacts bucket). I think it’s a good idea to have a standard set of schemas, but I think it would be better to give the user control over creating the Collection buckets themselves, and choosing what types of things go in them. For example, I may not want thatsketchyapp.xyz to have access to all my Photos, only a few that I’m testing with (since I don’t trust the developers of that app to not maliciously delete my Photos out from under me).

The key concern is this: if apps that are granted access to a bucket can delete files out of it, then encouraging users to keep all their data “in one basket” (your One Main Contact list) means a buggy or malicious app could delete your one copy of your data. If they only have edit access, a malicious app could rewrite all my Contact records to have the name “John Smith”, but with the History naming scheme presented here, there’d be the possibility of undoing such an action, though it might be very tedious to roll back hundreds of edits by one app. It might be useful to record with the history changes which app made the change. That would allow automated actions like “reverse all changes made by sketchyapp.xyz in the last 3 hours”.

For Schemas, having a master list of “root” schemas that then developers can extend with custom (“namespaced”) properties sounds a lot like the Resource Description Framework (RDF)/Semantic Web/Linked Data idea. Perhaps since the serialized format for these objects is JSON, the JSON-LD structure can be used, and existing schemas/contexts (like a Person, instead of a Contact) could be used?

friedger · April 1, 2019, 9:05am

The description of the Gaia storage sounds like Collection permissions in 1.0 are always read/write, correct? Yet the scopes are read or write in the code snippet. It would be nice to have read-only permission in 2.0 maybe.

On the browser-side changes, there also needs to be an update to the permission dialog to show/explain the requested collection permissions. It is not clear how custom collections could be created. Is it just to request a new scope? Can new collections only exist with a browser and blockstack.js update? That would prevent innovation, in particular if the long-term goal is that all data is stored in collections. There should be a fallback if the collection is not known to the browser.

jehunter5811 · April 1, 2019, 1:36pm

I like this proposal a lot. I think the sooner this can be rolled out, the better. 3Box started behind Blockstack and is now ahead of Blockstack in terms of using data across multiple apps, so it’s important to push this forward quickly IMO.

yukan · April 1, 2019, 2:38pm

The issue I see with letting people create any custom collection they want is that apps won’t know about it. If my documents app doesn’t know I have a custom “super sensitive documents” collection, then they won’t be able to ask for access permissions. However I think it would make sense for people to be able to create multiple versions of a single collection type. So that when apps ask for “documents” collection permissions, the user can choose one of their several documents collections to use in the app.

Yes, I think we want to simplify as much as possible for version 1. This feature sounds like a great upgrade for a later version.

The serialized formats should be JSON and we’re most likely going to use existing standard schemas such as Person.

yukan · April 1, 2019, 2:44pm

Read-only permissions are going to be in version 1.

New collection types should be added to the Blockstack collection schema library via pull request. The Browser should be able to gracefully handle unknown collection types. Since all the Browser will do to enable collections is generate the keys, this should be very doable.

MidnightLightning · April 1, 2019, 4:36pm

I think this could work similarly to how sites like Google Drive handle permission requests. When I am logged into several Google accounts, and on a third-party site, click a “attach document from Google Drive” link, my browser first redirects to Google and I get a Google-created “which account do you want to give Drive access to?” prompt. After that I get a filepicker and rights are granted to the third-party app.

So, with that idea, a Blockstack app doesn’t need to know all of a user’s collections. It would instead show the user an “Import Collection” button (or similar), which when clicked would open a Blockstack browser dialog with all the user’s collections (similar to how the login prompt, if you have multiple IDs registered, gives you a prompt for which ID to pass along to the app), for the user to tick off which one(s) they want to give the app access to.

MichaelFedora · April 1, 2019, 5:46pm

What would be interesting is if Apps / the User could self-define collections using a “manifest.json” of sorts.

For instance, a collection titled mycollection could be created by myapp1 and could be used as a cross-app bucket for communicating with myapp2, with whatever spec myapp1 defines. Though this would make more sense for a later version of collections (along with the changefeed subscriptions!).

Excited to see where this goes, and I’m glad to see that people can at least make different collections of the same type – it’ll be interesting to see if you can sync files between collections as well (partial/full-sync buckets I suppose).

dant · April 1, 2019, 7:34pm

Forgive me if this is a naive question, but I assume ‘the browser’ won’t be the only potential data manager? Is a ‘manage collections’ permission an option?

Couldn’t we solve this by just making collection types an array?

collections: {
  "documents": [
    {
      "location": "https://gaia.blockstack.org/hub/1Lsdf83isMHFsfse223hrbEynNR63vn2A/",
      "authorizedApps":
      // Encrypted section
      [
        "https://MyApp.com",
        "https://OtherApp.com"
      ]
      // End encrypted section
    },
    {
      "location": "https://gaia.blockstack.org/hub/completelydifferentgaiaurl/",
      "authorizedApps":
      // Encrypted section
      [
        "https://MyApp.com",
        "https://OtherApp.com"
      ]
      // End encrypted section
    }
  ]
}

Other than that question, multiple +1s. Looking forward to this.

yukan · April 1, 2019, 10:05pm

The Blockstack Browser is the only collections manager for now. The reason is that in order to manage collections you need the master private key to generate and revoke encryption keys for apps. However a future third-party implementation of the Browser/authenticator can also perform this.

We have to also consider that it’s not just knowing about the collections but also the schema of the data.

avthars · April 2, 2019, 5:08am

Sounds solid! Looking forward to playing around with this!

larry · September 20, 2019, 10:26am

Following up on this based on a chat that @patrick and I had. This is a bit long-winded - apologies in advance.

To update the community, the good folks over at Blockstack PBC have been working on moving this forward and making collections a reality.

In the meantime, as part of the work I’m doing on a new browser, I’ve built a simple blog app around a collection protocol that I’m using myself. It is currently pretty barebones and consists of a CLI app to upload posts from some common Mac editors to a collection and then another separate app that renders this collection of posts to the legacy web. You can see the deployed app at my new blog and, for the devs among you, poke around the network requests as you navigate from one page to the next to see posts getting loaded from Gaia.

Why am I doing this? Aren’t there a dozen other blog apps on blockstack?

Three reasons:

1. I’m a strong believer that you need to build something you use and use it as you build it out. If you don’t dogfood your own things and try to build in a sterile vacuum you miss a critical part of the feedback process that informs your design. Having my own Blockstack helps inform decisions about the new browser.
2. I’m looking at changing some assumptions of the legacy web and I needed to be able to look at these changes from the perspective of an app developer so that I have skin in the game. I can’t expect app developers to make any changes to their apps that I wouldn’t make myself.
3. I think current Blockstack apps miss one of the HUGE differentiators between the new internet we’re building and the legacy cloud-based web. Where users have their own data you want to build app ecosystems around protocols, not siloed apps. Think building gmail or superhuman around the email protocol instead of building whatsapp.

Points 1 & 3 are relevant to this thread.

While building my own collection, I’ve learned that it is an iterative process - the assumptions I made at the beginning turned out to not all be right once I started building and deploying. Much like the building of a new software library, this process is best done initially by one or two developers. It isn’t conducive to a governance process with many cooks. Developers need to be able to make their own collections to bootstrap their app ecosystem.

One of the concerns is that we’ll end up with a dozen different collection types for a given type of app. This may be true in the beginning, but as tends to happen in software development, developers vote with their code, and 1 or 2 collection types will get the majority of support and everyone will converge around them.

Think about JavaScript libraries - jQuery has dozens (hundreds?) of contributors but the first version was written by one person at a hackathon: John Resig. The same with React - hugely popular with many contributors today, but the first version was written by one person: Jordan Walke. Imagine if before these libraries existed, someone said, “we are only going to allow a couple of JavaScript libraries and then create a governance process to decide which new libraries can be created.” Would jQuery have been created at a hackathon if there was a governance process? Would one guy at Facebook have created React?

How would the governance process have decided that either of these was worth allowing people to use in apps before their massive success?

I propose that collections need to be permissionless - any dev should be able to make a collection and ship an app that uses that collection. We have the perfect tool for publishing information about which collections exist in a way that an authenticator and apps can discover them: the Blockstack Name System.

If we’re concerned about a lack of overlap in collection compatibility, we can use the app mining program to provide some sort of incentive to apps that use collections that are also used by other apps.

I think this concern is overblown though - if one collection gets traction (existing users with data), app developers will naturally want to support it. If there’s an existing tool that does what we want AND it also brings users to our app, we app developers will jump on it. For example, in building my blog collection protocol, I chose to base the posts object on the TextBundle protocol because it was less work for me and gives me half a dozen high-quality text editors that work out of the box with my protocol.

To summarize:

I think it’s great that the Blockstack PBC team plans to ship a few of their own collection schemas (protocols). Anyone else should be able to do the same thing, without permission (i.e. no governance process).
Hopefully these collections will come from apps that they’ve built that use the protocols and will be informed by that experience.
We might consider using App Mining incentives if we’re concerned that it will take too long for communities to form around emerging collection protocols.

Thanks for reading! Love to hear your thoughts!

friedger · September 20, 2019, 1:15pm

Coming from the Android world, I had similar discussions about Content Providers (=collections).

Initially, every developer could create content providers and there were a few predefined by the Android system (text message, call logs). Then a few more were added to the system (calendars, contacts), and developers were warned that they shouldn’t rely on the existence of a content provider definition, resulting in everybody defining their own definitions. Nowadays, they are rarely used apart from the ones defined by the system.

In Android, these definitions are created by companies, usually not keen to share the data or not keen to get into competition with other apps. However, for Blockstack, I hope these definitions are driven by (power) users as they are the data owners and much more in control than on Android.

I am in favor of a permission-free solution for Blockstack collections!

nicktee · September 20, 2019, 4:23pm

Great insightful post Larry! I love the idea of using App Rewards mining to incentivise developers to use Collections over a siloed data storage approach!

yukan · September 20, 2019, 5:25pm

Thanks for the feedback @larry. We’re just about to launch the developer preview of collections. So we’ll have more details for the community soon.

One thing that I want to make clear now is that the intention of the yet-to-be determined governance process for collections is not to prevent app developers from creating and using their own collection schemas. It is more of a process to standardize commonly used schemas to improve interoperability. In fact it’s really easy to create a new collection type by subclassing the Collection class from the blockstack-collections repo. We provided an example in the form of a contacts collection. Any custom collection type does not have to be in the blockstack-collections repo for it to work. It is fully permission-less! Moreover, if other developers like the collection type you’ve created, they can import that into their app without going through the blockstack-collections package provided you’ve published the code. What’s potentially problematic is when 2 different collection schemas share the same name. I think this is where BNS can come in.

Once again stay tuned for more details.

Walterion · September 21, 2019, 12:38pm

I am waiting for collections, and after reading all the comments, I should say I agree with @larry that if we want to let people make new ways, removing governance is the best way. But having a basic, open and documented standard gives a good structure to begin with.
@yukan, I’d like to know more about the “governance process” you are talking about and how far we can go with it, as I’ve got some ideas and would like to know how they can be done once I have more detail.
It is an interesting update I’d like to see soon, so please keep up the excellent work.

jeffd · September 23, 2019, 6:17pm

When we say “use App Mining” I assume we mean find a way to incentivize app developers to use a valuable and relevant collection vs. keep data siloed. And then, some way to incentivize consolidation or competition amongst similar collections until one wins, they merge, etc.

That sounds pretty complicated to me. Anyone have a specific mechanism in mind?

Per Friedger’s story, I don’t trust that “app developers will naturally want to support it” mojo will actually get us to a desirable result.