General architecture of Blockstack -- initial version

muneeb · June 15, 2015, 7:12pm

Hi everyone,

We talked about a common “Blockchain Stack” earlier. Below is an initial stab at a unified stack:

Some points to keep in mind:

– Our goal is to separate “common infrastructure” as much as possible. These are things that anyone who is building decentralized services/applications will need to (re-)implement. The more components we can identity as part of the common infrastructure the better it is.

– The separation of different modules in the stack is logical/functional and is not done from a networking perspective. When people think stack they immediately think Internet stack, and then get confused when Bitcoin/Blockchain itself has a P2P network and DHT has its own P2P network and so on.

– The module divisions and how they’re currently arranged is just an initial stab and we’re actively looking for feedback. Expect things to change. All feedback is welcome.

– Our goal is that it should be possible to replace generic components in the stack with more specific systems/protocols/specifications that have open-source code available e.g., IPFS or Blockstore can go (somewhere) in the storage layer or Passcard can be used for identity and so on.

– We’re currently putting identity / reputation / trade etc all in the “application layer”. The motivation is that just like in the Internet stack HTTP or FTP are considered application level and use the underlying layers, we can build decentralized identity, decentralized marketplaces, decentralized attribution etc at the “application layer” by using the underlying common core.

Looking forward to getting more feedback. Thanks @ryan and @jude for feedback on an earlier version of this diagram.

ryan · June 16, 2015, 2:06pm

OK looking very good, I’ll dig into this again soon and write something up.

jude · June 16, 2015, 3:22pm

We’re still coming to an understanding of how the storage layer and consensus layer interact, but we’re getting close. Our current understanding is that the consensus layer provides us data attestation, whereas the storage layer provides us data verification. Both are needed to build decentralized storage systems without a single point of trust.

What do we mean by data attestation? By constructing the blockchain, mutually distrustful peers already come to consensus on which principal committed which transaction, and in which block interval it will have been incorporated. We extend this to have a principal P testify that it wrote datum D by writing a signed transaction to the blockchain that contains the cryptographic hash of D. By accepting the transaction into block T in the blockchain, the peers in the consensus layer witness P’s testament at block T. As more and more peers witness P’s testament, it becomes harder and harder for an adversary to alter the testament after-the-fact–the adversary would need increasingly large amounts of computing power and control over network links to stop a well-formed testament from getting witnessed and accepted onto the longest chain. By constructing transactions this way, and by treating T as a coarse-grained timestamp, the consensus layer gives the layers above it stronger and stronger guarantees of content addresses, content authenticity, and bounds on content write ordering as time goes on.

Attesting to data is not enough, though–the written data itself must be consistent with the type of record that was written. For example, a write to a certificate must actually contain a certificate. Another example, an update to a Passcard’s social media fields must contain URLs to information in that principal’s social media accounts. To address this, applications need a way to verify that a testament is truthful (i.e consistent with an application-given specification). Once a write has been testified by the writer, witnessed by the consensus layer, and verified by the application, it is considered successful.

I’m still struggling with where the verify operation fits in this architecture. Specifically, it’s not clear that verification belongs in the storage layer. The application’s necessary involvement suggests that the “verified storage” layer actually sits directly beneath both resolvers and applications. This would imply that resolvers are really a type of application (albeit a “well-known” application) that gets leveraged by other applications. Alternatively, it would imply that there is a protocol by which the application can program the storage layer to perform its verifications. This is the approach taken by Syndicate, for example–the application brings its own storage capacity to the storage layer, and implements the common storage API to interface with the naming layer and P2P layer services but applies application-specific verification logic to writes on its records.

Thoughts?

light · June 16, 2015, 9:05pm

@muneeb I like this description. A couple of days ago, Blockstack.org had a home page with a nice looking “stack” diagram; now blockstack.org just redirects to the forum. What happened to that home page? Is it just not ready yet? I really liked the diagram I saw there.

muneeb · June 16, 2015, 9:06pm

Yep, it in the works and we’d love to share the website after getting feedback from everyone and agreeing on a common stack etc

light · June 16, 2015, 9:27pm

I’ve invited my friend Harlan to the forum, so he might be able to speak to this more, but he has a project called Nodesphere which could be good for providing interoperability between the various apps on the application layer.

The stack brings to mind the ADEPT project, which uses Ethereum, Bittorrent, and Telehash (essentially blockchain + DHT).

I would like to see elaboration on the Discovery and Notifications modules in the P2P layer. What existing software would be used for these purposes today?

In Passcard, verification of e.g. a Twitter account seems to occur at the application layer. You publish a tweet which is formatted a certain way, and the application uses the Twitter link you added to your profile to verify that the tweet was sent from the right account, and then verifies this. Is this what you mean when you refer to verification?

muneeb · June 17, 2015, 6:20pm

Thanks for inviting Harlan!

Yeah, @ryan @jude and I had a discussion about this over lunch. There is a confusion with the use of the term “verified”. As @light points out that Passcard verification happens at the application layer.

For data storage, after our discussion, we’re proposing to change the name of the two types of storage to:

Snapshotted Storage: Or “Snapshot Storage” for simplicity. This is the data where the hash(data) is in the blockchain and we have a guarantee that we’re getting a particular snapshot of the data at a particular point in time (i.e., at a certain blockchain block height).
Signed Storage: The hash of this data is not in the blockchain, but the data is signed by the same owner and a) hash of the block, b) block number is included with the data to provide ordering of data. We don’t have the guarantee that we’re getting the latest version in this type of storage.

I’ve updated the architecture diagram and the new version is below. I’m currently using “Snapshot” instead of “Snapshotted” to save space. Feedback welcome.

jude · June 17, 2015, 6:20pm

@light Yes, that’s what I mean by verification. I also agree that verifying a Passcard is an application-level concern.

We had a good talk about this at lunch today. Let me see if I can recap. First, the current understanding is that the notion of “verified” storage should live entirely in the application–my description of the testify-write-verify procedure is from the vantage point of the application, not the storage system. Second, “verified storage” and “extended storage” are really bad names–what we really want from the storage layer is to differentiate between “signed storage” and “snapshotted storage”.

The consensus layer gives writes a coarse-grained notion of order, because each datum written will get incorporated into a particular block in the blockchain. We would say that datum D is snapshotted when there is global consensus of its state at a particular time (at the interval of blocks). The diagram should say “snapshotted storage” instead of “verified storage”–we were unintentionally using two different definitions of “verify” (i.e. semantic application-level verification of a record’s structure, versus verification through consensus that a particular record existed at a particular point in time).

With snapshotted storage, readers are able to not only verify the authenticity and integrity of a record, but can also verify that data they receive is the “latest” such instance of the record. For example, I might update my Passcard from time to time. How do you verify that the Passcard record you receive is the latest version? By snapshotting the Passcard, you guarantee that if you’re receiving blocks and coming to consensus on the same blockchain as I am, you will receive its latest hash and signature. If I did not snapshot my Passcard, then it would be possible to censor the new version without you knowing it.

However, there are downsides to snapshotted storage: it makes writes both slow and costly. Fortunately, there are other ways to solve this problem, if you are willing to change your trust model. For example:

My data might be write-once–there are no subsequent writes to worry about.
I write my data to a storage node under your direct control, which you can trust to always send you the latest data.
I snapshot my data to a private sidechain that you and I work together to build.
We establish an SSL connection using our public keys from our Passcards, through which I’ll periodically send you snapshots out-of-band.

For these kinds of situations, we have “signed storage”. You figure out how to get the latest version of the data out-of-band (if at all), and I simply sign each record I write. This is much faster since it doesn’t require a blockchain transaction, and it still ensures global data availability, but it don’t give you the benefit of global consensus on write ordering (but as you can see, you may not need it anyway).

EDIT: Looks like @muneeb posted at the same time as me.

larry · June 17, 2015, 6:43pm

Was discussing original diagram with @liongrass a few hours before the new one came online. Agree the original names are confusing.

I assumed “extended” storage meant data that was not only not “snapshotted” on the blockchain, but not even signed - perhaps it would exist simply as a URL pointer to an external resource like profile pics in the v0.2 Passcard profile schema.

Would it make more sense to call “Applications” “Services” as in “Blockstack Services”? Identity, auth, reputation, etc are all services that other applications can integrate into their end user products. For example, say I’m building a new game, I can get identity and authentication services from Passcard, a reputation system from , a market place for my players to buy and sell virtual goods from OpenBazaar, etc.

ryan · June 17, 2015, 7:16pm

Yes, I really like snapshotted storage.

The names would shift from “verified storage” and “extended storage” to “snapshotted storage” and “signed storage”.

@muneeb What do you think of sticking to calling it “snapshotted storage”? Each storage term has a proceeding modifier that indicates a procedure that has been performed on it in the past tense (verified, extended, signed, etc.). Therefore “snapshotted” might be more apt than “snapshot” for continuity and clarity.

I think this makes sense.

ryan · June 17, 2015, 7:17pm

Separate issue - “Resolvers (for domains, userames, certificates)” is really long. Can we cut this down?

What do you think of referring to it as naming instead? That clearly includes domains and usernames but if you want to indicate it includes certificates you can say something like “secure naming”.

Thoughts?

muneeb · June 17, 2015, 7:35pm

I switched to “Snapshotted Storage”. I think the word looks ugly and people will probably shorten it overtime, but for consistency we can say Snapshotted in our version.

Agree with @larry on Services vs. Applications. Changed that.

For naming, I don’t think it makes sense to call it “naming” or “secure naming” because I’m looking at it as the layer where DNS resolvers and DNS caches operate. The layer above this has access to name resolution. So probably makes more sense to keep “resolver” in there and remove any details for simplicity. Also, having “Naming” as the component in the “Naming Layer” is slightly redundant. We can even link to the “BNS Resolver” repo when we have a version that makes thing specific i.e., we’re using BNS in the naming layer and BNS resolver lives here in the stack.

ryan · June 17, 2015, 7:46pm

Yeah you’re right, noticing now it does look a bit ugly since it’s longer. Hm, well I guess either can work.

Gotcha, I see what you mean. Hm what about “name resolvers”? Might be a bit more clear about what the resolvers are doing.

larry · June 17, 2015, 8:03pm

I understand snapshotted storage in secured by the blockchain and signed storage isn’t - and I think that’s clear from the diagram.

Is it intended that similar conclusions be drawn from the relative positions of items above the storage layer?

muneeb · June 17, 2015, 8:19pm

Nope that’s not the intention. That’s a good point though. We’ll need to brainstorm how to best convey any dependency on the blockchain.

guylepage3 · June 18, 2015, 6:02pm

I feel “Snapshotted Storage” not only looks bad, but it sounds bad as well. What about just “Snap Storage” or “SsS”?

ryan · June 18, 2015, 6:14pm

The name should be descriptive. It should say what it does. Snap is just adding branding where there doesn’t need to be any.

light · June 21, 2015, 2:31am

I think “Snapshot Storage” sounds good and is descriptive enough.

muneeb · June 23, 2015, 4:06pm

Sounds good about “Snapshot Storage” – looks like most people prefer that.

Here is another revision. Two main changes are:

Using “Snapshot Storage”.
Took out the P2P layer as there was some confusion about if it and displaying it on a side. At the blockchain level, you do broadcast (announce transactions to everyone) but at the upper layers usually you’re talking to only peers and collaborating to provide a layer like storage.

Comments are welcome! Reminder, this is just brainstorming and with more feedback we’ll keep polishing the architecture

light · June 23, 2015, 8:46pm

Is the Blockstack messaging protocol agnostic, and/or should it have a messaging layer included (or is this what P2P is in the stack)?