General architecture of Blockstack -- initial version

Maybe we should name all the sections as well? For example, the one on the right could be labeled "Networks".

@ryan, @muneeb, and I had a good conversation this afternoon on the structure of the storage layer. I’ll attempt to summarize it here:

There are two types of data in the storage layer: the data record itself, and a route to that data. The data itself is just an opaque blob. The route, however, is a JSON document that contains enough metadata for a client to find the data. Reading, then, is the act of fetching the route, and then using it to fetch the data. Writing is the act of not only uploading new data, but also announcing a new route.
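
To make the read/write split concrete, here is a minimal in-memory sketch. It assumes a made-up route schema (a JSON object with a `urls` list) and uses Python dicts as stand-ins for remote storage. The names `blob_store`, `route_store`, `write` and `read` are hypothetical and not part of any Blockstack API.

```python
import hashlib
import json

# Hypothetical in-memory stand-ins for off-chain storage.
# Keys play the role of URIs; values are opaque blobs.
blob_store: dict[str, bytes] = {}
route_store: dict[str, str] = {}  # record name -> route JSON document


def write(name: str, data: bytes) -> None:
    """Upload the opaque data blob, then announce a new route to it."""
    uri = "mem://" + hashlib.sha256(data).hexdigest()
    blob_store[uri] = data
    route = {"name": name, "urls": [uri]}  # hypothetical route schema
    route_store[name] = json.dumps(route, sort_keys=True)


def read(name: str) -> bytes:
    """Fetch the route first, then use it to fetch the data."""
    route = json.loads(route_store[name])
    for uri in route["urls"]:
        if uri in blob_store:
            return blob_store[uri]
    raise KeyError(f"no reachable replica for {name!r}")
```

For example, `write("alice/profile", b"hello")` followed by `read("alice/profile")` returns `b"hello"`. The data itself stays opaque; only the route is structured.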

This construction gives us a way to talk about signed and snapshot storage. Snapshot storage is when the writer puts the signature on the data into the blockchain, whereas signed storage is when the writer puts the signature on the route instead. With snapshot storage, once the reader gets the data's signature, it can determine whether or not the data an application serves it is the data it asked for. This means the route can be changed arbitrarily at no cost: once snapshotted, data can be mirrored as many times as you'd like, and the route can be updated to reflect all of its locations. The downside, of course, is that writes to the data are much slower and much more expensive, since a new signature must be written to the blockchain each time.
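
A sketch of the snapshot-storage read path, assuming the blockchain anchors a SHA-256 hash of the data (later in this thread "signature" is clarified to mean a signed hash; signature checking itself is not modeled here). `fetch_snapshot` and its `fetch` callback are hypothetical names. Because the route is untrusted, the reader simply tries every listed location and keeps the first blob whose hash matches.

```python
import hashlib


def sha256_hex(b: bytes) -> str:
    return hashlib.sha256(b).hexdigest()


def fetch_snapshot(route_urls, fetch, onchain_hash: str) -> bytes:
    """Try every location listed in the (untrusted) route and accept the
    first blob whose hash matches the one anchored in the blockchain."""
    for url in route_urls:
        try:
            blob = fetch(url)
        except (KeyError, OSError):
            continue  # mirror unavailable; try the next one
        if sha256_hex(blob) == onchain_hash:
            return blob
        # Hash mismatch: this mirror served tampered or wrong data; skip it.
    raise ValueError("no mirror served data matching the on-chain hash")
```

This is why the route can change freely: adding or removing mirrors never weakens the check, since only the on-chain hash is trusted.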

With signed storage, once the reader gets the route's signature, it uses the route as authoritative hints from the writer on how to go about fetching and verifying the data. It's important that the writer provide enough information to the reader in the route, but exactly what that information is, is application-specific. For example, if you can use TLS, you might publish a route that includes HTTP URIs to the data, along with HTTPS URIs to signature files covering the data's latest version. You'd be able to update both the data and signature files as many times as you like, but the reader will always discover the same set of URIs (i.e., assuming TLS works for you, the reader couldn't be unknowingly redirected to malicious data). While you would not be able to change your route without writing a new hash to the blockchain, you would be able to include as many different URIs (with as many different schemes) as you like for the data and signatures. You'd include enough to ensure that at least one of them will remain usable to the reader in the future. I would argue that this is likely if data sources fail independently, since the probability that they are all unavailable diminishes exponentially as you add more sources.
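
A sketch of the two checks this paragraph relies on. It assumes the route is a JSON object with hypothetical `data` and `sigs` URI lists and that the blockchain stores a SHA-256 hash of a canonical (sorted-key) JSON encoding; the encoding choice is an assumption, since any deterministic one would do. The last function is the independence argument: if each source is down with probability p, all n are down with probability pⁿ.

```python
import hashlib
import json


def route_hash(route: dict) -> str:
    # Canonical encoding so writer and reader hash identical bytes
    # (assumed convention: sorted keys, no extra whitespace).
    canonical = json.dumps(route, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def verify_route(route: dict, onchain_hash: str) -> dict:
    """Accept the route only if it matches the hash written to the blockchain."""
    if route_hash(route) != onchain_hash:
        raise ValueError("route does not match the on-chain hash")
    return route


def p_all_unavailable(p_each: float, n_sources: int) -> float:
    """Probability that n independently failing sources are all down."""
    return p_each ** n_sources


# Hypothetical route: several data URIs plus an HTTPS URI for a signature file.
route = {
    "data": ["http://mirror1.example/alice/profile",
             "https://mirror2.example/alice/profile"],
    "sigs": ["https://mirror2.example/alice/profile.sig"],
}
```

With p = 0.1 per source, three independent sources are all unavailable with probability 0.1³ = 0.001. Adding URIs to the route buys availability, but changing the route itself still requires a new on-chain hash.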

There are some nice features of this construction. First, users are the authoritative sources of their data. All writes come from users; users choose where their data is stored; users choose how to go about fetching the data. The application is no longer the authoritative source of user data.

Second, anyone can host data. For snapshot storage, anyone can mirror data and routes (or even generate new routes on-the-fly), and it won’t matter to readers, since the blockchain’s integrity implies that a host can’t tamper with the data without getting caught. This is great for spreading your public key far and wide, for example. For signed storage, anyone can mirror data and routes, and provided that the means to do so is established through the application, readers will not only get authentic data, but also be able to tell whether or not it is the latest version. The means of doing so are intentionally not specified, because notions of data consistency are application-specific (however, I would be happy to work out examples of how to achieve particular consistency models by varying what information gets included in signed storage routes).
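
As one example of the application-specific consistency models mentioned above, here is a sketch of monotonic reads on top of signed storage. It assumes a hypothetical signature file carrying a version counter and the hash of that version's data, and it assumes the file's signature was already verified against the writer's key (not modeled here). `SignedManifest` and `FreshnessChecker` are illustrative names only.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SignedManifest:
    """Hypothetical signature-file contents; the signature over these fields
    is assumed to have been checked already."""
    version: int
    data_hash: str


class FreshnessChecker:
    """Rejects tampered or stale copies served by arbitrary mirrors."""

    def __init__(self) -> None:
        self.last_seen: dict[str, int] = {}

    def accept(self, name: str, manifest: SignedManifest, blob: bytes) -> bool:
        if hashlib.sha256(blob).hexdigest() != manifest.data_hash:
            return False  # mirror altered the data
        if manifest.version < self.last_seen.get(name, 0):
            return False  # mirror is serving an older version (rollback)
        self.last_seen[name] = manifest.version
        return True
```

Stronger models (e.g. "always the latest version") would need more in the route, such as a designated HTTPS location that always holds the newest manifest.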

Third, applications are incentivized to replicate their users’ data. Because applications are downstream replicas, it’s in their best interest to ensure that their users’ routes and data are highly available (otherwise, it hurts their user experiences). They would do so by caching it or replicating it themselves, thereby increasing its availability. Importantly, they are not required to do so like they are in traditional Web application architectures–they can still access users’ data remotely like any other reader–but it provides a very real, very easy-to-understand incentive mechanism that neither requires nor precludes the use of decentralized storage-cryptocurrency or a DHT.

Finally, we can minimize writes to one transaction per app per block, using a variation of @ryan’s hash-packing scheme [1]. Individual users do not need to snapshot each write or each route; instead, using the application as a coordinating service, a set of users can pack their writes together into a single transaction, and split the costs evenly. An application might even pay for this on their users’ behalf. Users would simply verify that their signature, as well as everyone else’s signatures, got pushed through as part of a packed write.
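
The details of the hash-packing scheme are in [1] and are not reproduced here. As a stand-in, this sketch uses a plain Merkle tree, a different but well-known way to pack many hashes under one on-chain value: the application commits only the root, and each user checks an inclusion proof for their own entry. `split_fee` illustrates splitting one transaction fee evenly; the fee units and all function names are hypothetical.

```python
import hashlib


def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()


def merkle_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """All tree levels, leaf hashes first; odd levels duplicate their last node."""
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        level = [h(padded[i] + padded[i + 1]) for i in range(0, len(padded), 2)]
        levels.append(level)
    return levels


def merkle_root(leaves: list[bytes]) -> bytes:
    return merkle_levels(leaves)[-1][0]  # the single value written on-chain


def inclusion_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True if the sibling is on the right."""
    proof = []
    for level in merkle_levels(leaves)[:-1]:
        sibling = index ^ 1
        sib_hash = level[sibling] if sibling < len(level) else level[index]
        proof.append((sib_hash, index % 2 == 0))
        index //= 2
    return proof


def verify_inclusion(leaf: bytes, proof, root: bytes) -> bool:
    node = h(leaf)
    for sib, sib_on_right in proof:
        node = h(node + sib) if sib_on_right else h(sib + node)
    return node == root


def split_fee(fee: int, n_users: int) -> list[int]:
    """Split an integer fee evenly; the first `rem` users pay one unit more."""
    base, rem = divmod(fee, n_users)
    return [base + 1 if i < rem else base for i in range(n_users)]
```

A user who also wants to check everyone else's writes can recompute `merkle_root` from the full list of packed entries (if the application publishes it) and compare it with the on-chain value.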

Thoughts?

[1] https://github.com/blockstack/blockstore/issues/81

With signature you mean hash, correct? For snapshot storage hash(data) is in the blockchain and for signed storage hash(route) is in the blockchain.

And this implies that we need a minimum amount of space for the route entry. So routes are probably not written directly to the blockchain but hash(route) is. I’m pointing this out again, because in our discussion initially I thought route record goes directly to the blockchain and I was concerned about that.

But the application can act on behalf of the user, I think. We might need to discuss this more factoring in practical concerns and UX etc.

Yeah this is great! Mirroring service that you’re working on is going to be critical for performance and this ties nicely with incentives for apps/services to mirror the data of their users. Awesome.

This is going to be very important for scalability in the long run. @jessewalden and @denisnazarov already need this.

With signature you mean hash, correct? For snapshot storage hash(data) is in the blockchain and for signed storage hash(route) is in the blockchain.

Yes. The point is to get the blockchain to attest to both the integrity and authenticity of the data or route. A signed hash of the data or route is sufficient.

And this implies that we need a minimum amount of space for the route entry. So routes are probably not written directly to the blockchain but hash(route) is. I’m pointing this out again, because in our discussion initially I thought route record goes directly to the blockchain and I was concerned about that.

Routes, like data, are stored off of the blockchain. Moreover, the application is incentivized to replicate them along with their users’ data. Only hash(route) gets written to the blockchain.

But the application can act on behalf of the user, I think. We might need to discuss this more factoring in practical concerns and UX etc.

The application's web page does, but its servers do not. By design, the user has the final say on what writes get accepted. But the user would need to authorize the application to write to a subset of data on the user's behalf. For example, the authorization could be part of the account registration process, and the user's data would be partitioned such that an application could only write to its own partition.
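
A sketch of the partitioning rule, assuming each application's partition is simply a key prefix named after a hypothetical `app_id`. `UserStore` and its methods are illustrative, not an existing client library; in practice the check would be enforced by verifying signatures from keys the user delegated, which this sketch does not model.

```python
from dataclasses import dataclass, field


@dataclass
class UserStore:
    """Hypothetical model of one user's data, partitioned per application."""
    owner: str
    authorized_apps: set[str] = field(default_factory=set)
    records: dict[str, bytes] = field(default_factory=dict)

    def authorize(self, app_id: str) -> None:
        # Would happen at account registration, with the user's consent.
        self.authorized_apps.add(app_id)

    def write(self, writer: str, key: str, value: bytes) -> None:
        if writer == self.owner:
            self.records[key] = value  # the user can write anywhere
            return
        if writer not in self.authorized_apps:
            raise PermissionError(f"{writer} is not authorized by {self.owner}")
        if not key.startswith(f"{writer}/"):
            raise PermissionError(f"{writer} may only write under '{writer}/'")
        self.records[key] = value
```

The user keeps the final say: an application can only act inside its own prefix, and only after the user has authorized it.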

I agree that the client-side code to do this is still lacking, but I think this is surmountable, and could flow well.

Guys, let's keep this discussion going. A couple of things: please feel free to make any diagram and post it here. There is no standard and no official version; we're just having discussions. The stack is a moving target, and the more versions/models we have, the better. Also, if you haven't already, join http://chat.blockstack.org because some of these discussions are happening there. Here is another version from my side. Looking forward to other versions/models and issues / problems / feedback on this diagram.

For context, for our system the “consensus” was coming from Namecoin, and the “storage” was also coming from Namecoin (we believe the two need to be separated for scalability – we ran into these challenges), and at the “processing layer” our resolvers were processing all the data and making the namespace available to other people so they can use it however they want.

OK I have another proposal for a partial architecture diagram that came out of our conversations in #architecture on http://chat.blockstack.org. Note that this is not a one-for-one comparison or a replacement for the diagram Muneeb showed. It’s only a partial representation to start a conversation about the complexity of the consensus layer and propose what I believe is a more accurate way to think about how state is built up and how applications actually interact with it.

Basically, the idea is to flatten the system out, simplify it, and clarify that the consensus on the state is actually coming from a single layer, but that the layer can actually have subcomponents in it that help it derive the state. I’m using naming and identity here, but this could just as easily apply to fungible tokens/assets like what we see in OpenAssets.
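
One way to read "a single layer with subcomponents that derive the state" is a state engine that replays operations in blockchain order. The sketch below assumes a made-up operation format (`register`, `update`, `transfer`) and simple validity rules; it is not Blockstack's actual naming protocol. The key property is determinism: every node replaying the same ordered log derives the same state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical name operation as parsed out of a transaction."""
    kind: str        # "register", "update", or "transfer"
    name: str
    sender: str
    value: str = ""  # data hash for "update", new owner for "transfer"


def derive_state(ops):
    """Replay operations in blockchain order. Invalid operations are
    skipped rather than rejected, so all nodes agree on the result."""
    owners: dict[str, str] = {}
    values: dict[str, str] = {}
    for op in ops:
        if op.kind == "register" and op.name not in owners:
            owners[op.name] = op.sender
        elif op.kind == "update" and owners.get(op.name) == op.sender:
            values[op.name] = op.value
        elif op.kind == "transfer" and owners.get(op.name) == op.sender:
            owners[op.name] = op.value
    return owners, values
```

The same pattern would fit fungible tokens/assets: only the operation types and validity rules change.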

Let me know what you think.

Here it is:

I think this is heading in the right direction, and keeping the architecture as linear as possible is a smart route to go, since it makes the system easier to explain. A beginner-level Blockstack developer could understand something like this.
In my latest discussions, I've been including "messaging" as a necessary layer in the stack. Basically, it looks like this:

That said, not all apps need real-time or even asynchronous messaging. Do we want to add messaging to the stack and settle on a good protocol for apps to agree on, or should Blockstack be independent of whatever messaging protocol is used? The reason I ask is that I can't think of a decentralized messaging protocol that meets the needs of both real-time and asynchronous messaging while remaining mobile-friendly (though some attempts are being made to develop a solution here).

HT @jeremie_miller

Everyone, the other day @jude proposed another way of looking at the stack. See attached diagram. Jude’s proposal is derived from his experience with building storage (the work he is currently doing on “mutable” and “immutable” storage) and how it’s hard to put naming on top of storage and vice versa. I’ve changed the diagram a bit and instead of saying “naming” I’m saying “state engine” (a term I first heard about from @williamcotton) to keep it generic i.e., naming is one type of state engine and assets can be another type of state engine.


What I like about this diagram is that instead of taking a networking approach, it takes a systems approach. Think of an OS kernel that handles lots of complexity and security for you and exposes a syscall interface. Life above syscalls is simple and app developers don’t have to worry about low-level things. Life below syscalls is more complicated, but once common problems are solved everyone in the application space benefits. This is a very interesting approach to looking at the infrastructure that we’re building. Yes, it’s distributed but there can still be a division between application space and what I’m calling “distributed core” (for lack of a better name).

This then leads us to thinking hard about the API calls that are supported e.g., http://github.com/blockstack/resolver is basically implementing a couple of these API calls and making life simple for any app developer who is talking directly to the resolver.
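
To make the "syscall" analogy concrete, here is a sketch of what such an interface boundary could look like. Every method name below is hypothetical and illustrative; it is not the resolver's API or any existing Blockstack interface. The in-memory class stands in for a real distributed core, which would hide replication, blockchain access and verification behind the same calls.

```python
import hashlib
from abc import ABC, abstractmethod


class DistributedCore(ABC):
    """Hypothetical 'syscall' surface between application space and the core."""

    @abstractmethod
    def lookup(self, name: str) -> str: ...  # state engine: name -> route hash

    @abstractmethod
    def put_immutable(self, data: bytes) -> str: ...

    @abstractmethod
    def get_immutable(self, key: str) -> bytes: ...

    @abstractmethod
    def put_mutable(self, name: str, data: bytes) -> int: ...

    @abstractmethod
    def get_mutable(self, name: str) -> tuple[int, bytes]: ...


class InMemoryCore(DistributedCore):
    """Toy single-process stand-in for the distributed core."""

    def __init__(self, names: dict[str, str]) -> None:
        self._names = names
        self._immutable: dict[str, bytes] = {}
        self._mutable: dict[str, tuple[int, bytes]] = {}

    def lookup(self, name):
        return self._names[name]

    def put_immutable(self, data):
        key = hashlib.sha256(data).hexdigest()  # content-addressed key
        self._immutable[key] = data
        return key

    def get_immutable(self, key):
        data = self._immutable[key]
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("stored data does not match its content hash")
        return data

    def put_mutable(self, name, data):
        version = self._mutable.get(name, (0, b""))[0] + 1  # bump version
        self._mutable[name] = (version, data)
        return version

    def get_mutable(self, name):
        return self._mutable[name]
```

Above this boundary, an app developer only sees a handful of calls; below it, the core is free to change how consensus and storage are actually implemented.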

Ah yes, I love that we’re all on board with the terms “state engine”, “mutable storage” and “immutable storage”. I’m really digging this.

Blockstack should be independent of whatever messaging protocol is used - unless (until?) a good-works-for-most-use-cases messaging protocol emerges.

I can't either. The link you shared is a great list of all the problems with current messaging solutions. Thanks!

Forgive my ignorance, as most of this is way over my head, but what about Whisper as a decentralized messaging network? The eth-dev video of Gavin Wood seems to imply that it is both asynchronous and real-time, with a whole lot of other cool features. Otherwise, this is a really great thread, thanks!