About the trust zone of an open-membership Gaia hub

nextonr · January 16, 2019, 6:44am

Here is the scenario:
Suppose I want to set up a topic in a Twitter-like app, under which there are many tweets about the same subject. People own their own data, and that’s totally fine with Blockstack, but what if a public bulletin board is needed, like a hot topic aggregated from many tweets?
As I see it, I would have to set up an open-membership Gaia hub to store the tweets from the ‘decentralized Twitter’ users. But where should the Gaia hub run? If it runs on a remote server, then it’s untrusted; if it runs locally, it’s useless, because a malicious person could simply bypass the permission check. Additionally, for an open-membership Gaia hub, who should provide the cloud storage? Is the provider trusted?

jude · January 16, 2019, 2:55pm

Recall that the Gaia hub is only on the write path. An open-membership Gaia hub is a Gaia hub that allows anyone to send writes through it to its back-end storage provider. The storage provider (whatever it is) handles reads. You do not need to use someone else’s Gaia hub or storage provider; you can run your own.
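As a rough illustration of the write path versus the read path, here is a toy in-memory model. The `StorageProvider` and `HubModel` classes and their methods are hypothetical and are not the real Gaia API; in particular, the model skips authentication entirely and only shows that writes pass through the hub while reads go straight to storage.

```python
class StorageProvider:
    """Stand-in for the back-end store (for example, a cloud bucket)."""

    def __init__(self):
        self._blobs = {}

    def put(self, path, data):
        self._blobs[path] = data

    def get(self, path):
        # Reads are served here directly; the hub is not involved.
        return self._blobs.get(path)


class HubModel:
    """Toy gatekeeper (hypothetical) that sits only on the write path."""

    def __init__(self, storage, allowed_writers=None):
        self._storage = storage
        # None models open membership: any writer is accepted.
        self._allowed = allowed_writers

    def write(self, writer, path, data):
        if self._allowed is not None and writer not in self._allowed:
            raise PermissionError(f"{writer} may not write through this hub")
        # In this toy model each writer gets its own namespace.
        self._storage.put(f"{writer}/{path}", data)


storage = StorageProvider()
open_hub = HubModel(storage)                               # open membership
closed_hub = HubModel(storage, allowed_writers={"alice"})  # restricted

open_hub.write("bob", "tweet-1", b"hello from bob")
closed_hub.write("alice", "tweet-1", b"hello from alice")
```

Whichever hub accepted the write, a reader only ever calls `storage.get(...)`.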

Recall that the storage provider is trusted only with data availability, not application correctness. The app client would sign all public data before uploading, and perform a signature check on read. Then, if the data is missing or tampered with, the client will detect it and raise an error. Note that in such a scenario, the client cannot tell whether the storage provider or the ISP is to blame for the missing or corrupt data.
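The sign-on-write, verify-on-read flow could look roughly like the sketch below. One loud assumption: HMAC-SHA256 with a per-user secret stands in for a real public-key signature, purely so the example runs on the Python standard library. In a real app the reader would verify with the author's public key and would never hold a signing secret. The function names and the record layout are made up for illustration.

```python
import hashlib
import hmac
import json

# ASSUMPTION: HMAC is a stand-in for a public-key signature scheme.
def sign(key: bytes, payload: bytes) -> str:
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def make_record(key: bytes, author: str, text: str) -> dict:
    """Build the envelope the client uploads: payload plus signature."""
    payload = json.dumps({"author": author, "text": text}, sort_keys=True)
    return {"payload": payload, "sig": sign(key, payload.encode())}


def read_record(key: bytes, record) -> dict:
    """Verify on read; raise if the data is missing or was altered."""
    if record is None:
        raise LookupError("data missing")
    expected = sign(key, record["payload"].encode())
    if not hmac.compare_digest(expected, record["sig"]):
        raise ValueError("signature check failed")
    return json.loads(record["payload"])


# A plain dict plays the role of the untrusted storage provider.
untrusted_storage = {}
key = b"alice-demo-key"  # hypothetical demo secret
untrusted_storage["alice/tweet-1"] = make_record(key, "alice", "hello world")
tweet = read_record(key, untrusted_storage["alice/tweet-1"])
```

The reader learns only that something went wrong (`LookupError` or `ValueError`), not which party along the path caused it.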

Standing up a Gaia hub for hosting a “hot topic” is not required. Instead, you would stand up an indexer that crawls the set of users posting about the hot topic, and aggregates their (signed) data into one place so clients can easily fetch it. Note that this is an optimization on the read path, and does not affect the trust model or protocol described above. Anyone can run an indexer, so if you don’t like the default indexer, you could run your own or point your client to one you trust.
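A minimal sketch of such an indexer, assuming each user's records can be fetched through some callable. `TopicIndexer`, `fetch_user_records`, and the record fields are hypothetical names, and the `sig` values are placeholders rather than real signatures. The point is that the indexer passes records through untouched, so clients can still verify them.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical record shape: {"author": ..., "text": ..., "sig": ...}
Record = Dict[str, str]


class TopicIndexer:
    """Read-path cache: crawls users and groups their signed records by topic."""

    def __init__(self, fetch_user_records: Callable[[str], Iterable[Record]]):
        self._fetch = fetch_user_records
        self._by_topic: Dict[str, List[Record]] = {}

    def crawl(self, users: Iterable[str], topic: str) -> None:
        hits = self._by_topic.setdefault(topic, [])
        for user in users:
            for record in self._fetch(user):
                # Keep the record exactly as fetched so its signature stays valid.
                if topic in record["text"] and record not in hits:
                    hits.append(record)

    def query(self, topic: str) -> List[Record]:
        return list(self._by_topic.get(topic, []))


# In-memory stand-in for each user's own storage.
user_storage = {
    "alice": [{"author": "alice", "text": "#gaia is neat", "sig": "sig-a1"}],
    "bob": [
        {"author": "bob", "text": "lunch", "sig": "sig-b1"},
        {"author": "bob", "text": "trying #gaia", "sig": "sig-b2"},
    ],
}
indexer = TopicIndexer(lambda user: user_storage.get(user, []))
indexer.crawl(["alice", "bob"], "#gaia")
hot = indexer.query("#gaia")
```

A client that distrusts this indexer can point the same `query` call at a different instance, or crawl the users' storage itself.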

nextonr · January 16, 2019, 3:35pm

Thank you, jude, you’re really nice.
But I have three concerns about using a crawler:

1. Efficiency. As on Twitter, anyone can post a tweet at any time. If many tweets arrive in a burst, will a crawler be quick enough to fetch the decentralized tweets and serve them to people? I can’t wait a whole day; if that’s the case, the hot topic will be a cold topic.
2. The indexer’s storage. Who provides the storage for the indexer? It could be a huge amount of storage, which may be unaffordable for personal use.
3. Is the indexer trustworthy? Setting up my own indexer may be time-consuming, but the default indexer may be outside an individual user’s local trust zone.

P.S. I’ve heard that Peepeth, a decentralized Twitter, uses IPFS for storage. I don’t know if this could be the solution.

jude · January 16, 2019, 4:53pm

The app client would simply replicate the user’s tweets to both the Gaia hub and the indexer itself. If there are multiple indexers for the app, then the indexers would forward new tweets to each other as they arrive.
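The replication and forwarding described here might be sketched as follows. The `Indexer` class, the `publish` helper, and the record IDs are all hypothetical, and a dict stands in for the user's Gaia-backed storage. Each indexer ignores records it has already seen, which keeps forwarding between peers from looping.

```python
class Indexer:
    """Toy indexer that forwards newly seen records to its peers."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.records = {}  # record id -> record

    def receive(self, record_id, record):
        if record_id in self.records:
            return  # already seen; this stops forwarding loops
        self.records[record_id] = record
        for peer in self.peers:
            peer.receive(record_id, record)


def publish(record_id, record, user_storage, indexer):
    """Client side: write to the user's own storage, then notify an indexer."""
    user_storage[record_id] = record    # the user's own copy
    indexer.receive(record_id, record)  # pushed immediately, no crawl delay


# Two indexers that forward to each other.
a, b = Indexer("a"), Indexer("b")
a.peers.append(b)
b.peers.append(a)

alice_storage = {}
publish("alice/tweet-1", {"text": "hello", "sig": "placeholder"}, alice_storage, a)
```

Because the client pushes the tweet itself, the indexer does not have to wait for its next crawl to learn about it.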

At the very least, the app developer would run an indexer. Depending on the structure of the app, different user communities could run their own indexers. For example, if you were to build Reddit on Blockstack, each subreddit could maintain its own indexer.

The indexer is no more or less trusted than the storage provider itself. It is a downstream replica of other people’s data: it cannot undetectably alter signed data or view encrypted data, but it can corrupt or censor data (which is also true of a storage provider or an ISP).

IPFS is far less reliable than Gaia – the minute your data in IPFS is no longer pinned by anyone, it gets removed. Also, IPFS is trivially easy to censor – you simply pollute its DHT’s routing tables. Unlike Gaia, you don’t need to be an ISP to attack IPFS’s DHT – you can carry out an attack from your laptop.

nextonr · January 17, 2019, 2:48am

Thank you, jude, I get it.
So Gaia is just for hosting personal data; it isn’t suited to dealing with someone else’s data, and that part of the logic belongs outside of Blockstack’s design.