Support for Collections with Simple Gaia Authentication

I did some brainstorming with @larry about supporting collections with the simple Gaia permissioning model we have today.

Collections Background

The idea is that a collection is a user-owned bucket of data which is writable and readable from different applications, each of which will need to ask permission for the bucket. Users will be able to revoke that permission from their browser.

Collections-specific Private Key

The first part of the design is to instantiate a derived private key unique for the collection (say skCollection).

Now, to write to that collection, we need to use an authentication token:

token = SIGN(challenge-text, skCollection)

We want to deliver that to applications — so when an application requests permission for a collection, the Blockstack Browser can sign the challenge text and give the token to the application.

Unfortunately, this makes revocation impossible — the user would simply have to wait until the token expires (because the challenge text has changed).
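
The signing step above can be sketched roughly as follows. This is only an illustration using Node's built-in `crypto` module: the key is generated at random rather than derived from the user's root key, the challenge string is made up, and the token is a bare signature rather than whatever token format a gaia hub actually expects.

```javascript
const crypto = require("crypto");

// Stand-in for the derived collection key (skCollection). In the proposal this
// key is derived from the user's keychain; here it is simply random.
const { privateKey: skCollection, publicKey: pkCollection } =
  crypto.generateKeyPairSync("ec", { namedCurve: "secp256k1" });

// token = SIGN(challenge-text, skCollection)
function signChallenge(challengeText, privateKey) {
  return crypto
    .sign("sha256", Buffer.from(challengeText), privateKey)
    .toString("base64");
}

// The hub accepts a token only if it is a valid signature over the challenge
// text the hub currently expects.
function verifyToken(challengeText, token, publicKey) {
  return crypto.verify(
    "sha256",
    Buffer.from(challengeText),
    publicKey,
    Buffer.from(token, "base64")
  );
}

const challenge = "example-hub-challenge"; // hypothetical challenge text
const token = signChallenge(challenge, skCollection);
```

Once the application holds `token`, nothing in this scheme lets the user take it back, which is the revocation problem described above.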

Revoking Collection Authentication Token

Authentication tokens can be forced to expire. Basically, if the challenge text contains an integer, then the user can revoke all outstanding tokens by bumping that integer. This does require some metadata on the gaia hub to track the current integer, but I do think that there’s a relatively elegant way to implement metadata.

So, the challenge text generated by the gaia hub would be something like:

{ CUSTOM_GAIA_HUB_TEXT, EXPIRE_DATE, GENERATION_INTEGER }

The integer can be bumped by the Blockstack Browser every time the user wants to revoke collection access for a given application. But bumping it invalidates all outstanding tokens, not just that application’s. Does the user need to re-authenticate to all of these apps now? Can we avoid that? That depends on how the collection authentication token is passed to the application.
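
To make the effect of bumping the integer concrete, here is a minimal sketch. The hub state object, the JSON encoding of the challenge text, and the function names are all assumptions for illustration; expiry checking is left out.

```javascript
const crypto = require("crypto");

// Hypothetical per-bucket state kept by the hub.
const hub = { hubText: "gaia-hub.example", expireDate: "2030-01-01", generation: 0 };

// { CUSTOM_GAIA_HUB_TEXT, EXPIRE_DATE, GENERATION_INTEGER }
// The JSON encoding is an assumption; any unambiguous encoding would do.
function currentChallenge(state) {
  return JSON.stringify([state.hubText, state.expireDate, state.generation]);
}

const { privateKey, publicKey } =
  crypto.generateKeyPairSync("ec", { namedCurve: "secp256k1" });

// Signed by the browser, which holds the collection's private key.
function issueToken(state, key) {
  return crypto.sign("sha256", Buffer.from(currentChallenge(state)), key);
}

// Checked by the hub against the challenge for the *current* generation.
function isTokenValid(state, token, pub) {
  return crypto.verify("sha256", Buffer.from(currentChallenge(state)), pub, token);
}

const oldToken = issueToken(hub, privateKey);
const validBefore = isTokenValid(hub, oldToken, publicKey);

hub.generation += 1; // the user revokes: bump the generation integer

const validAfter = isTokenValid(hub, oldToken, publicKey);
const newToken = issueToken(hub, privateKey); // re-signed for apps that keep access
```

Because every token is a signature over the same challenge text, the bump cannot target a single application, which is why the remaining apps need a fresh token.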

Passing the Collection Authentication Token

If, instead of passing the collection auth token to apps during the authentication process, we pass it by storing it at a well-known location in the application’s data store, the blockstack browser can proactively update the auth token for all the different applications.

This means that the collection authentication token for an application X would be stored in a file:

${gaia_hub_url}/${application_address}/.collections.json

Then, when the blockstack-browser invalidates all of the credentials, it can re-sign the challenge token and write it to that location for each application that should still have access. The blockstack-browser can write to all these locations because it possesses the user’s root-key, so it can always write to application-addresses.
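
A rough sketch of that re-issue step is below. The in-memory `storage` map stands in for the gaia hub, the shape of `.collections.json` (a map from collection name to token) is an assumption, and the addresses and the signer are placeholders.

```javascript
// In-memory stand-in for the hub's storage: URL -> file contents.
const storage = new Map();

// ${gaia_hub_url}/${application_address}/.collections.json
function collectionsFileUrl(gaiaHubUrl, applicationAddress) {
  return `${gaiaHubUrl}/${applicationAddress}/.collections.json`;
}

// Re-sign the current challenge and write the token into the well-known file
// of every application that should still have access.
function reissueTokens(gaiaHubUrl, collectionName, authorizedAppAddresses, signChallenge) {
  const token = signChallenge();
  for (const address of authorizedAppAddresses) {
    const url = collectionsFileUrl(gaiaHubUrl, address);
    const existing = JSON.parse(storage.get(url) || "{}");
    existing[collectionName] = token;
    storage.set(url, JSON.stringify(existing));
  }
}

// Placeholder signer: a real one would sign the hub's current challenge text.
let generation = 0;
const sign = () => `token-gen-${generation}`;
const hubUrl = "https://hub.example"; // hypothetical hub URL

// Both apps are granted access initially.
reissueTokens(hubUrl, "Contacts", ["addrA", "addrB"], sign);

// The user revokes app B: bump the generation, then re-issue to app A only.
generation += 1;
reissueTokens(hubUrl, "Contacts", ["addrA"], sign);
```

App B's file still contains the generation-0 token, but the hub no longer accepts it, so there is no need to delete it.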

Metadata for Gaia Buckets

A user needs to be able to update that generation integer for a gaia-bucket, and it needs to be persisted. This can be stored in a metadata location within the gaia bucket. However, we want to make sure that the metadata location isn’t writable by arbitrary applications with the collection authentication token, but only by a requester who really owns the private key for the bucket.

This could be implemented by requiring a stronger challenge text for writes to files with a prefix '.' — and then the gaia hub would read the generation integer for a gaia-bucket from .auth_generation if it exists. This would need to then be part of ‘in-spec’ behavior of the gaia hub.
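
As a sketch of what that rule might look like on the hub side: the credential kinds below are hypothetical names, "prefix" is read as a file-name prefix, and defaulting to generation 0 when `.auth_generation` is missing is an assumption.

```javascript
// Credential kinds in this sketch (both names are hypothetical):
//   "collection-token": the revocable token handed to applications
//   "owner": a stronger proof that the requester holds the bucket's private key
function canWrite(path, credentialKind) {
  const fileName = path.split("/").pop();
  if (fileName.startsWith(".")) {
    // Metadata files such as .auth_generation require the stronger challenge.
    return credentialKind === "owner";
  }
  return credentialKind === "owner" || credentialKind === "collection-token";
}

// The hub reads the generation integer from .auth_generation if it exists.
function readGeneration(bucketFiles) {
  const raw = bucketFiles.get(".auth_generation");
  return raw === undefined ? 0 : Number.parseInt(raw, 10);
}
```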


@aaron Is there any timeline for when this is planned?

I was thinking about how this could be implemented (sharing data between applications) today and one of the things I thought of was to pass the collection private key in a manner similar to appPrivateKey. While authenticating there would just be another parameter in redirectToSignIn named collections. A developer would pass in an array of collections that the application would want access to (e.g. ['Contacts', 'Images']).

(Side note: generation of these private keys in the Blockstack browser would probably be really similar to how appPrivateKeys are generated – const appPrivateKey = appsNode.getAppNode(appDomain).getAppPrivateKey(). Instead of passing in an appDomain, pass in a collection name like 'Contacts')

From there the developer would have access to the requested collection’s information (private keys) in profile.collections. With those private keys they would be able to authenticate with those Collection’s Gaia buckets and encrypt data specifically for a Collection.
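
To illustrate the idea of one deterministic key per collection name, here is a small sketch. It is not the Blockstack derivation: the HMAC below is a stand-in for the keychain-based derivation mentioned in the side note, and `collections` as an object keyed by name is just my guess at what `profile.collections` could look like.

```javascript
const crypto = require("crypto");

// NOT the Blockstack derivation: an HMAC stand-in that shows only the property
// that matters here, namely that the same root secret and collection name
// always yield the same key, and different names yield different keys.
function deriveCollectionSecret(rootSecret, collectionName) {
  return crypto
    .createHmac("sha256", rootSecret)
    .update(`collection:${collectionName}`)
    .digest("hex");
}

const rootSecret = crypto.randomBytes(32); // stand-in for the user's root key
const requested = ["Contacts", "Images"];  // what the app asked for at sign-in

// Hypothetical shape of what the app would receive.
const collections = Object.fromEntries(
  requested.map((name) => [name, deriveCollectionSecret(rootSecret, name)])
);
```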

I like the idea of going a step further and making it possible to revoke access to a Collection whenever the user wants; that wasn’t something that I was considering at all! So I am happy I found this thread, and I have a few questions on your idea for the implementation.

  1. In addition to storing a collections auth token in .collections.json, will that collection’s private key live in that file as well? That way data can be encrypted/decrypted for a specific collection.
  2. After revoking access to a collection for an application a new private key will need to be made (since you are bumping the generation integer). A side effect of this will be that the Gaia address will need to change as well, which brings up some questions:
    a. How will the migration of data to the new Gaia bucket work?
    b. Assuming all data needs to be unencrypted, migrated, and encrypted again, isn’t this a very expensive operation?
  3. In general for Collections, is there anything stopping a malicious application developer from asking for access to a Collection and deleting all data inside of it? Or from reformatting the data in a way that it isn’t able to be read by other applications using it?

Sorry for all of the questions, but this is a topic I am very interested in! Thanks!


The questions are great! Very happy to have them!

It’s pretty late here, so I’m only going to take a stab at answering one of the 3 questions. Hopefully @aaron will chime in on others.

If you give an app permission to write to your collection, then it is able to overwrite or delete items in the collection. This is the same as what happens when you give an app access to your photos on a phone. An evil app can delete all the photos if it wants. One way to mitigate the effects of this is to have either versioning or a “trash can”-like function on gaia hubs, so that deleted files are recoverable for a period of time. There may be other approaches as well.
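
A minimal sketch of the “trash can” idea, assuming an in-memory store and a fixed retention period (the class and method names are made up, and this is not how any gaia hub works today):

```javascript
class TrashCanStore {
  constructor(retentionMs) {
    this.retentionMs = retentionMs;
    this.files = new Map(); // path -> contents
    this.trash = new Map(); // path -> { contents, deletedAt }
  }

  put(path, contents) {
    this.files.set(path, contents);
  }

  get(path) {
    return this.files.get(path);
  }

  // Deleting moves the file to the trash instead of destroying it.
  delete(path, now = Date.now()) {
    if (!this.files.has(path)) return false;
    this.trash.set(path, { contents: this.files.get(path), deletedAt: now });
    this.files.delete(path);
    return true;
  }

  // Restoring only works while the retention period has not elapsed.
  restore(path, now = Date.now()) {
    const entry = this.trash.get(path);
    if (!entry || now - entry.deletedAt > this.retentionMs) return false;
    this.files.set(path, entry.contents);
    this.trash.delete(path);
    return true;
  }
}
```

This only covers deletes. Protecting against overwrites would need the versioning variant, where `put` keeps the previous contents as well.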

At some point gaia hubs or some other tool in the stack might enforce schemas for certain collections. One idea I currently like is to have collection-type-specific tools, built on top of the gaia storage primitives, that make it easy for app developers who want to use a certain type of collection in their apps to use the same schema. This of course wouldn’t stop a malicious app developer, but might encourage interoperability simply because it’s the path of least resistance.
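
For a sense of how small such a shared tool could start out, here is a sketch of a per-collection schema check. The schema format (field name mapped to an expected `typeof` result) and the “Contacts” fields are invented for the example.

```javascript
// Hypothetical schema for a "Contacts" collection: field name -> expected typeof.
const contactSchema = { name: "string", email: "string" };

// Returns a list of problems; an empty list means the item matches the schema.
function validate(schema, item) {
  const errors = [];
  for (const [field, type] of Object.entries(schema)) {
    if (typeof item[field] !== type) {
      errors.push(`${field} must be a ${type}`);
    }
  }
  return errors;
}
```

An app-side library could run this before every write, and a hub could run the same check on its side if schemas are ever enforced there.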


I really like the idea of having a “trash can” system in place, I think I’ve actually brought it up on the Slack before! It wouldn’t just help in regard to having data deleted in Collections, but app-specific data as well.

Users don’t necessarily have control over what data is stored or when it is deleted/overwritten, they own derived private keys per application that assist in connecting to Gaia and encrypting/decrypting data stored on their Gaia hub. Developers are the ones that are given a user’s private keys to control their data for them. Giving users another way to access previous versions of data, deleted data, or even all of their data would be highly beneficial I think! (This could be a tool contained inside of the browser perhaps? Since only the browser has the “master key”?)

With this idea, are you saying (I might be stretching your words and putting some of my ideas in here…) that there might be some sort of repository of collections that developers can request access to in their apps? In addition to a schema given per collection, could there be some “server side validation” to ensure that data matches a schema? I know @hologram is working on some client side tools to facilitate the first step of this process :)

Hey @brandonparee – these are great questions.

The reason we want to store an auth token there, rather than a private key, is that it is much harder to revoke access once the application has the private key (if the application is a little malicious).

This does complicate encryption/decryption, as you say. However, that could be solved with a different feature which we also want to enable: multi-party encryption. Basically, let’s say you have a group of public keys and you want to share encrypted data between them. A solution to that problem could also be applied here. (There’s a related forum post, which you’ve already found :) “Multiplayer access -- Sharing with the public vs. just trusted users?”)

  2. After revoking access to a collection for an application a new private key will need to be made (since you are bumping the generation integer). A side effect of this will be that the Gaia address will need to change as well, which brings up some questions:

That’s the great thing about using the auth tokens rather than distributing the private key. Auth tokens can be invalidated by “bumping” a nonce on the gaia hub, making all current auth tokens invalid for a given address. The browser, which has access to the real private key, can then regenerate the auth tokens and write them to the .collections files for the applications still authorized.


I knew these two problems would have some overlap eventually! Definitely two of the things I am most interested in learning the solution to.

Dang, that’s really elegant and makes a ton of sense. :)

So the benefits of bumping the nonce for the auth token are:

  • The private key does not need to be distributed
  • Since the private key never has to change, the Gaia hub address will always stay the same
    • This means my scenario of needing to migrate data never needs to happen

The downsides are:

  • Encryption is difficult, even for a single user scenario
  • Implementation of collections might need to wait on multi-party encryption

I do have a follow-up question as well:
In order to avoid problems with encryption for a single user, it seems the easiest way is to move the nonce up into the creation of the Collection’s private key. Is there any way to avoid the data migration scenario in Gaia if that approach was used?

Reviving an old topic here with a somewhat alternative proposal: as a first step, we should encourage/implement library support for something along the lines of group encryption.

Basically, when I encrypt a file with putFile, I should be able to add a “group-name”:

putFile('foo.txt', 'data', { encrypt: { type: 'group', name: 'besties' } })

getFile could automatically discover whether or not a given file is group encrypted, and attempt to locate the correct decryption key if the current user is an authorized participant in the group.

A user’s app can control group membership:

addGroupMember('besties', {publicKey: '02c0251...'})
addGroupMember('besties', {username: 'a.blockstack'})
removeGroupMember('besties', {username: 'judecn.id'}) // friendship over!

For username-based adding/removing, we’d need a public key entry in the user’s .apps entry in the profile (see https://github.com/blockstack/blockstack.js/issues/458). For revocation, we’d need to reencrypt files with the new group key. That could be done lazily (i.e., reencrypt on new writes only), I suppose, but if we wanted to greedily reencrypt all known files, that’d require some library-side tracking of group-encrypted files.
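
To show how the pieces described above might fit together, here is a sketch of group encryption with lazy re-encryption on revocation. Everything here is an assumption rather than blockstack.js API: the `Group` class, the wrapping scheme (ephemeral ECDH on secp256k1 plus AES-256-GCM from Node's `crypto`), and the member ids are all invented for the example.

```javascript
const crypto = require("crypto");

const sha256 = (buf) => crypto.createHash("sha256").update(buf).digest();

function aesEncrypt(key, plaintext) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { iv, body, tag: cipher.getAuthTag() };
}

function aesDecrypt(key, { iv, body, tag }) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(body), decipher.final()]);
}

// Wrap the group key for one member: ephemeral ECDH, then AES-GCM.
function wrapKey(groupKey, memberPublicKey) {
  const ephemeral = crypto.createECDH("secp256k1");
  const ephemeralPublicKey = ephemeral.generateKeys();
  const shared = sha256(ephemeral.computeSecret(memberPublicKey));
  return { ephemeralPublicKey, box: aesEncrypt(shared, groupKey) };
}

// A member recovers the group key with their own ECDH private key.
function unwrapKey(wrapped, memberEcdh) {
  const shared = sha256(memberEcdh.computeSecret(wrapped.ephemeralPublicKey));
  return aesDecrypt(shared, wrapped.box);
}

class Group {
  constructor() {
    this.members = new Map(); // member id -> public key
    this.rotate();
  }

  // Pick a fresh group key and wrap it for every current member.
  rotate() {
    this.groupKey = crypto.randomBytes(32);
    this.wrappedKeys = new Map();
    for (const [id, publicKey] of this.members) {
      this.wrappedKeys.set(id, wrapKey(this.groupKey, publicKey));
    }
  }

  addMember(id, publicKey) {
    this.members.set(id, publicKey);
    this.wrappedKeys.set(id, wrapKey(this.groupKey, publicKey));
  }

  // Lazy revocation: only files written after this call use the new key.
  removeMember(id) {
    this.members.delete(id);
    this.rotate();
  }

  encrypt(data) {
    return aesEncrypt(this.groupKey, data);
  }
}
```

Files written before `removeMember` stay readable by the removed member, since they were encrypted under the old key. Greedy re-encryption would additionally need the library-side tracking of group-encrypted files mentioned above.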

The above interface is useful for applications even if they aren’t interested in collections and cross-app sharing. However that interface could also be used for cross-app sharing. A user would just need to add the app’s user-app-publickey to a specific group. If we wanted something like this to be a seamless permissions request, we’d need auth-protocol support for it, but that would certainly be achievable at the authenticator level.
