Radiks Decentralization Proposal

Subtitle: Enter the radiverse

I’m proposing a model for how Radiks can be more decentralized.

Current model: Each app runs one Radiks server, maintained by the app developer. If a user wants to use this app, they must use this Radiks server. Their data is still stored in Gaia, but they can only query data from the single Radiks server.

Example: Banter runs a single Radiks server. Users who wish to communicate with other users on Banter must save and query data with this one server.

New model: Each app is part of a connected ‘radiverse’ (yes) of app nodes. Each app node keeps a full index of all app-related data. Anyone can run their own Radiks server and stay in sync with all app data, even when end-users are not connecting to this server. There are separate radiverses for each app type - messaging, social, project management, etc.

Example: Multiple servers can host the ‘Banter’ app. User A can compose a message on banter.pub, and User B can see that message on altbanter.net, and vice versa.

Inspired by ActivityPub and federated protocols

I am taking a lot of ideas from ActivityPub, a protocol for federated social networks. Its model is very similar to what this proposal is trying to accomplish. The Matrix protocol is somewhat similar as well.

However, these protocols are each tied to one domain: ActivityPub is for social networks, Matrix is for chat, and their schemas are heavily coupled to those application types. This proposal attempts to generalize their federation concepts into a design that supports any type of app.

Network Design

This proposal works under the assumption that there will be a small number of “app nodes” in each radiverse. Since Radiks requires each node to keep track of all data in its radiverse, running an app node will be relatively expensive, and nodes will likely only be run by other app developers and some power users. Because the number of nodes in the network is assumed to be relatively small, all nodes are connected to each other.

Why not a blockchain?

I believe that using a blockchain to broadcast data is overkill for this use case.

  • Being real-time and cost-efficient is extremely important
  • Data is not interdependent
    • Guaranteed ordering of operations is not required
  • Trustless finality is not an absolute requirement
  • A node can make tradeoffs, like whitelisting or blacklisting other nodes, in order to maintain higher speed and mitigate Sybils

That said, a blockchain could optionally be used for auditability and eventual finality. This could even be a ‘selling point’ for choosing one Radiks server node over another. See the addendum on blockchain use cases.

How it works

When a new Radiks app is created, its developer deploys a Radiks server instance, just as they do today. If a power user or another app developer wants to run their own instance of the app, perhaps with a different UI, they can subscribe to the original server. When a third Radiks server joins the mix, it subscribes to the original server and all of its peers.

Walkthrough

Bootstrapping

Server A runs a Radiks server at https://banter.pub. Server B (running at https://mybanter.com) wants to become a new server in the Banter radiverse. Server B is booted up with a bootstrap_url=https://banter.pub configuration. When this new server starts for the first time, it subscribes to the bootstrap_url server.
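As a sketch, Server B’s startup configuration might look something like this (bootstrap_url comes from the walkthrough above; the other option names are illustrative, not an existing radiks-server API):

// Sketch of Server B's startup configuration.
const config = {
  bootstrap_url: "https://banter.pub", // well-known node to subscribe to on first boot
  origin: "https://mybanter.com",      // this server's own public URL
  radiverses: ["social"],              // radiverses this node participates in
};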

Subscribing

To subscribe to a server, Server B fetches Server A’s public key at https://banter.pub/radiks/node_info, which returns a response like:

{ public_key: "asdf", radiverses: ["social"] }

Next, it sends a POST request to https://banter.pub/radiks/subscribe. It includes a payload with a signature, where the message being signed is Server A’s public key, signed with Server B’s private key.

{
  signature: "asdf",
  origin: "https://mybanter.com",
  radiverses: ["social"]
}

Server A then fetches Server B’s public key (from https://mybanter.com/radiks/node_info) and validates the signature of this request. If the signature is valid, it returns a 200 status code. Once done, Server A initiates its own subscription to Server B.
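A minimal sketch of both sides of this handshake, assuming Node.js 18+ (global fetch) and the built-in crypto module. Key parsing and encoding are glossed over here; a real implementation would presumably use the same secp256k1 keys as the rest of the Blockstack stack:

const crypto = require("crypto");

// Server B's side: sign Server A's public key and send the subscribe request.
async function subscribeTo(bootstrapUrl, myPrivateKey, myOrigin) {
  const res = await fetch(`${bootstrapUrl}/radiks/node_info`);
  const { public_key: theirPublicKey } = await res.json();

  const signature = crypto
    .sign("sha256", Buffer.from(theirPublicKey), myPrivateKey)
    .toString("hex");

  await fetch(`${bootstrapUrl}/radiks/subscribe`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ signature, origin: myOrigin, radiverses: ["social"] }),
  });
}

// Server A's side: fetch the subscriber's public key and verify that the
// signature covers our own public key.
async function validateSubscription({ signature, origin }, ourPublicKey) {
  const res = await fetch(`${origin}/radiks/node_info`);
  const { public_key: theirPublicKey } = await res.json();

  return crypto.verify(
    "sha256",
    Buffer.from(ourPublicKey),
    theirPublicKey,
    Buffer.from(signature, "hex")
  );
}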

Sidenote: Server Keys

Each server publishes its public key. This is used to authorize server-to-server requests, like subscribing and sharing data. One use case is preventing an attacker from maliciously “subscribing” on behalf of lots of different servers that they don’t even own. Also, see the addendum on malicious servers.

Subscribing to peers

Later on, Server C comes into the mix. It also bootstraps with Server A and subscribes, as described above. Once this handshake is completed, Server C fetches all of Server A’s peers at https://banter.pub/radiks/peers, which returns the nodes it’s connected to in each radiverse:

{
  social: [
    "https://mybanter.com"
  ]
}

Server C goes through all of these peers and subscribes to them. Note that this ‘peer discovery’ phase happens on both sides after every subscription - Server A will also subscribe to all of Server C’s peers, if it hasn’t already.
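A sketch of that peer-discovery loop, where subscribeTo is the handshake sketched earlier and the known set prevents duplicate subscriptions:

// Run after each successful subscription: fetch the new peer's peer list
// and subscribe to any nodes we haven't seen yet.
async function discoverPeers(peerUrl, known, config) {
  const res = await fetch(`${peerUrl}/radiks/peers`);
  const peersByRadiverse = await res.json();

  for (const peers of Object.values(peersByRadiverse)) {
    for (const peer of peers) {
      if (!known.has(peer)) {
        known.add(peer);
        await subscribeTo(peer, config.privateKey, config.origin);
      }
    }
  }
}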

Broadcasting new data

Now, Server A is running their app, and a user saves some data. Server A now, in a worker process, broadcasts this new data to each of its subscribers (B and C). It sends a POST request to {serverURL}/radiks/broadcast, with a payload like so:

{
  data: {
    someKey: "someValue"
  },
  signingKeyId: "user-signing-key-id",
  signature: "asdf",
  serverSignature: "qwerty",
  radiverse: "social"
}

The receivers of this broadcast validate two signatures - the user’s signature and the serverSignature. If both are valid, the data is saved in the server’s database.
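A rough sketch of the broadcast worker, assuming the HTTP transport. Here signWithServerKey is a hypothetical helper that signs the payload with the server’s own key, and the model shape is illustrative:

// Broadcast a newly saved model to every subscriber in this radiverse.
async function broadcast(model, subscribers, radiverse) {
  const payload = {
    data: model.attrs,                          // the user's saved data
    signingKeyId: model.signingKeyId,           // which user key signed it
    signature: model.signature,                 // the user's signature
    serverSignature: signWithServerKey(model),  // hypothetical server-side signing helper
    radiverse,
  };

  await Promise.all(
    subscribers.map((serverURL) =>
      fetch(`${serverURL}/radiks/broadcast`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(payload),
      })
    )
  );
}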

Although HTTP could be one transport option, a socket-based pubsub model between servers would be more efficient. Radiks could use off-the-shelf libraries like libp2p for establishing connections and publishing data to peers. While this transport might add complexity, it would greatly improve latency and throughput.

Addendums

Malicious servers

Server operators could be required to associate their server with a Blockstack ID. Other servers could validate the server’s Blockstack ID, and blacklist servers that turn out to be invalid. A server could be configured to only peer with “paid” Blockstack IDs, since there is more cost associated with them.

Malicious users

This is not really related to the decentralization of Radiks servers, but is important for the future of Radiks.

The current design of radiks-server does not validate that each write is associated with a particular Blockstack ID. This decision was made to increase user privacy, but it may prove insufficient in preventing malicious users, since it is currently quite easy to spam a Radiks server anonymously.

After seeing how Radiks is used in the wild, I am second-guessing this decision, for a few reasons:

  • The primary use case of Radiks, currently, is around public data. Although private use cases are still perfectly valid (and used), the tradeoff may not be worth it.
  • By making it easy to discover the ‘global’ world of Blockstack users for a given app, it would be easier for a Radiks server to re-establish the entire state of the app’s data by simply crawling Gaia hubs.
  • Many Radiks apps have the need to associate user data with a Blockstack ID, and validate that the write came from a particular user. For example, social networks need to validate that a post coming from “hankstoever.id” actually came from that user. In this case, associating all writes with an ID may become a de-facto pattern.
  • Knowing that an ID is a user of a particular app may be OK if you can’t see their data. For example, in a messaging app, you might be OK with knowing that hankstoever.id uses the app, but not know who they are messaging with.
    • UserGroups could be structured such that all writes associated with a group are validated with just a single user, the “owner”, even if the write comes from a different user. This way, you still get the sybil protection, but protect the privacy around who is in the group.

Blockchain / smart contract uses

Although I believe that using a smart contract for all data is not a good idea, there are certain use cases where it may be quite handy for this model:

  • Archival / finality / auditability: Radiks servers could eventually commit some hash of, or reference to, the data they’ve processed.
  • Peer discovery: Provide a way for peers to discover each other, even if a well-known bootstrapping node is down. It could also provide a way for end-users to choose which server to connect with.
  • Protection against malicious servers: Some cost could be associated with becoming a node in the radiverse. You could also imagine some kind of DAO or TCR for becoming a new peer.

Supernodes

There are use cases where it might be beneficial for a single Radiks server to be a part of multiple radiverses. For example, an app that connects the messaging and social radiverses in a single app. Or, a power user that runs one server for all of their apps. The implementation of Radiks should support this out of the box. This requires an adjustment to how Radiks stores data and the APIs used to query it.

Usage of collections

Right now, the way Blockstack auth works is that you get a unique private key for each domain you log into. So, it’s not possible to get the same private key from 2+ domains. This is a blocker for this proposal, because you:

  1. Can’t sign data that will be accepted by the indexer across multiple domains
  2. Can’t decrypt the same data from 2 different domains.

Because of this, the current intention is to use collections as a way of sharing keys across multiple domains. Each ‘radiverse’ might have a single ‘collection’ that it requests; for social apps, you might request the ‘Social’ collection. Then, radiks.js would be updated to allow developers to specify which collection to use for different models.
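As a hypothetical sketch of the developer-facing API, a model definition could grow a collection field (no such field exists in radiks.js today):

// Hypothetical: tie a model's keys to the shared Social collection.
class Message extends Model {
  static className = "Message";
  static collection = "Social"; // keys come from the Social collection, not the app domain

  static schema = {
    content: String,
  };
}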

Model validation

One missing component of Radiks, currently, is validation of models. In the current model, since there is a 1:1 mapping from app domain to Radiks server, this is less of an issue, because app developers manage the code that writes to their own server.

When we enter a world of many apps, each with their own code, writing to a shared model, this becomes a bigger issue. Adding explicit support for model validation would not only be a convenience for app developers, it would reduce the possibility of malicious (or even well-meaning) apps writing data to a model that uses a different schema.

I’m proposing a way to enforce schemas that works on both the server and the client. Validation will use the well-supported JSON Schema standard. When defining a model, you can introduce a validation schema:

class Todo extends Model {
  static schema = {
    // existing code
  };

  static validation = {
    schema: {
      $id: "https://my-todo-app.com/todo.schema.json",
      required: ["title"],
      properties: {
        title: {
          type: "string",
        },
        completedAt: {
          type: "string",
          format: "date-time",
        },
      },
    },
    hash: "hash-of-schema",
  };
}

When this model is saved, a few things happen:

  1. The model is validated against this schema on the client.
  2. A hash of the schema is generated and passed to the server, along with the schema itself and the normal app data (a sketch of this hashing follows the list).
  3. The server verifies the hash and validates the model against the schema. It stores the schema hash along with the model.
  4. The server broadcasts this write to its subscribers. The subscribers validate the schema as well, before saving everything.
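The schema hash in step 2 could be as simple as a SHA-256 over a serialization of the schema. A sketch (note that JSON.stringify is not canonical, so a real implementation would need a deterministic encoding, e.g. sorted keys, before hashing):

const crypto = require("crypto");

// Hash a JSON schema so servers and clients can compare schemas by reference.
function hashSchema(schema) {
  return crypto
    .createHash("sha256")
    .update(JSON.stringify(schema))
    .digest("hex");
}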

Then, on the client, you can query data and pass in one or more values as a schemaHash property. The client then only pulls data whose schemas it supports, so it can be sure it’s only receiving the formats it expects, while still supporting evolution of schemas over time.
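As a hypothetical example of what that query could look like (schemaHash is not a radiks.js parameter today; hashSchema is the sketch above):

// Only fetch Todos written under schemas this client understands.
const todos = await Todo.fetchList({
  schemaHash: [hashSchema(Todo.validation.schema)],
});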

There might have to be some extensions to vanilla JSON Schema to support more complicated validation needs. For example, models may require uniqueness based on other fields in the document. It would be best to provide schema extensions that can do this with only JSON annotations - that way, we don’t have to worry about hashing JavaScript code, which could get messy.

Note that I’m not 100% sure about the exact APIs for defining schemas on the client and server. I want some way to easily share schemas. This might mean simply defining a schema by a URL, and the client can cache and hash these schemas automatically.

Open Questions

How can a user/server have some ‘guarantee’ that the data they’re being served is in sync with the rest of the network?

I worry that requiring global consistency will lead into the territory of consensus and blockchains. I wonder if each server could keep and publish its own “chain” of changes, so that each new write is hashed using the hash of the previous write (à la Git). Then, each server can publish its own “consensus” hash. This would be beneficial even just for the use case of making sure that servers are in sync with one another.
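A sketch of such a per-server chain, where each accepted write is folded into the previous head (purely illustrative):

const crypto = require("crypto");

// The new head commits to the entire history of writes, Git-style.
function nextChainHash(previousHash, write) {
  return crypto
    .createHash("sha256")
    .update(previousHash)
    .update(JSON.stringify(write))
    .digest("hex");
}

// Usage: let head = GENESIS_HASH;
//        head = nextChainHash(head, write); // on every accepted write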

How can a user know that the server they’re connecting with is not hiding some data?

Following the suggestion in the previous question, the only way to know for sure is to run your own node and sync with that chain. That way, you can validate that server’s consensus hash and know the server isn’t keeping some data from you.

This could be done efficiently by each Radiks node keeping its own Merkle tree of all stored models. A consumer could request a Merkle proof for any individual model, and efficiently validate that the model is contained in the Merkle root, without having to compute and validate the entire history of the node.
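A sketch of what verifying such a Merkle proof could look like on the consumer side (the proof format here is made up for illustration: an array of sibling hashes from leaf to root):

const crypto = require("crypto");

const sha256 = (buf) => crypto.createHash("sha256").update(buf).digest();

// Hash the model, then fold in each sibling hash up to the published root.
function verifyProof(modelJson, proof, expectedRoot) {
  let hash = sha256(Buffer.from(modelJson));
  for (const { sibling, left } of proof) {
    const sib = Buffer.from(sibling, "hex");
    hash = left
      ? sha256(Buffer.concat([sib, hash]))
      : sha256(Buffer.concat([hash, sib]));
  }
  return hash.toString("hex") === expectedRoot;
}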


Love this idea, as Radiks serves as a database of sorts, and I’ve always wanted a decentralized database for apps, so this is rather perfect.

I do agree with the concern in question one (re: race conditions, state validation, etc); I know blockchain does this by proof of work, and Matrix tries to auto-join states in a fancy way, but how will Radik(sverse) do this well without corrupting an app’s state?

One other question was, if I have an app A, and super-parent-node A gets shut down, will the app still be able to continue to work? I suppose the frontend could have an option to bootstrap to any node, and the nodes themselves could alter their trees as well as they see fit. As long as this stuff stays dynamically configurable it should be fine, though autoresolution would be neat as well (i.e. if supernode A goes offline, then the other nodes either serve as a shared oligarchy or compete for new supernode power until A comes back online and syncs).

edit: I know some programs like Yggdrasil have an interesting view of node hierarchy, but that’s all outside of my league and still doesn’t help state resolution. Something to look into though, maybe.

Can’t wait for this to be made =)


Soooo Rad! The radiverse! I am so excited for this and I really like the concept of using JSON schemas.

Partitioning

The one question I have is on scalability and partitioning of data. If an app became super popular, it might end up having petabytes of indexes. How would partitioning work? Should we extend the Radiks model to have a partitionKey field? For example, on Twitter, when somebody famous tweets, they might get millions of responses. I am assuming Twitter partitions the data based on a partition key. They might even put a famous person’s partition of data on its own server cluster.

I have been playing around a lot with graph databases and have read some interesting documents on partitioning here: https://docs.microsoft.com/en-us/azure/cosmos-db/partitioning-overview


I didn’t mention it in the doc, but I think during this upgrade it would be best to put each model into its own MongoDB collection (instead of one collection for everything). This would allow you to easily shard based on collections.

Ultimately I think “devops” type tasks (sharding, indexing, etc.) are more the responsibility of the node maintainer, and shouldn’t necessarily be built into Radiks. Each app has different needs for these types of things, and it doesn’t make sense to manage database infrastructure from the framework.

I think that would be similar to other web frameworks, like Rails. Rails doesn’t automatically partition or add indexes for you, because every app is different.


I know blockchain does this by proof of work, and Matrix tries to auto-join states in a fancy way, but how will Radik(sverse) do this well without corrupting an app’s state?

My proposal is that there is no “global” guaranteed ordering or single state. Each node would have its own Merkle tree and root. By not requiring this, the design gains a lot more flexibility and doesn’t necessitate global trustless consensus, which gets close to needing a blockchain.

One other question was, if I have an app A, and super-parent-node A gets shut down, will the app still be able to continue to work?

There are two things here:

  1. The frontend should allow the client to connect to whatever Radiks node the user wants. It should be a “setting” type of thing that each user can specify. All this would really entail is changing the apiServer parameter in Radiks’ configure function.
  2. In this model, each Radiks server node will likely be hosting their own app as well. There is no single “super node”, where if it goes down then the rest can’t function.

I guess I should specify my question further – if the primary node goes down, what happens to the peer tree? Is it alright because all nodes are connected and therefore nothing happens, and only new nodes will have to find a new “bootstrap” peer?

Yep, you’ve got it.

Autoresolution is a good idea. This could be done via a smart contract where nodes publish their latest Merkle root. If the client fails to connect to a node, it could query the smart contract and get alternative nodes to connect to. The same could be done for new nodes to find a “bootstrap” peer.


@hank Jack wants the Radiverse! https://twitter.com/jack/status/1204766078468911106

Meanwhile, there is a collaborative notepad app showing that Matrix is not only for chat.