The basic idea here is that many applications will have a difficult time implementing sharing and other functionality directly on top of the decentralized data storage interface provided by Gaia. The solution is to provide an indexing service which is open-source, configurable, and easy to deploy. The exact specifications for this, however, still need to be ironed out.
Easily realizable version
The most directly realizable version is something like a simple search indexer, which uses multi-player reads of Gaia-stored data. The indexer would consume a data schema and be provided with a filename to regularly index (this is a pretty direct generalization of the profile indexer that powers our currently deployed search service; see https://github.com/kantai/blockstack-search-indexer/).
This could emit data in a number of different formats: JSON by default, but importantly it should also be able to POST to an Elasticsearch endpoint. The repository should also have a pretty easy-to-deploy setup which will initialize both the indexer and a search endpoint.
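To make the idea concrete, here is a minimal sketch of that indexing loop. It is not taken from the existing profile indexer: the schema format (field name mapped to a Python type) and the `fetch` callable, which stands in for a multi-player Gaia read, are assumptions made for illustration.

```python
import json
from typing import Any, Callable, Dict, Iterable, List, Optional

# Hypothetical schema format: field name -> required Python type.
Schema = Dict[str, type]


def extract(record: Dict[str, Any], schema: Schema) -> Optional[Dict[str, Any]]:
    """Return only the schema fields, or None if the record does not conform."""
    doc = {}
    for field, expected in schema.items():
        value = record.get(field)
        if not isinstance(value, expected):
            return None
        doc[field] = value
    return doc


def index_users(users: Iterable[str], filename: str, schema: Schema,
                fetch: Callable[[str, str], Optional[str]]) -> List[Dict[str, Any]]:
    """Read `filename` from each user's storage and collect conforming documents.

    `fetch(user, filename)` is an assumed stand-in for a multi-player Gaia
    read; it returns the file's text, or None if the file is missing.
    """
    docs = []
    for user in users:
        raw = fetch(user, filename)
        if raw is None:
            continue
        try:
            record = json.loads(raw)
        except ValueError:
            continue  # skip files that are not valid JSON
        if not isinstance(record, dict):
            continue
        doc = extract(record, schema)
        if doc is not None:
            docs.append({"user": user, **doc})
    return docs
```

The result is a plain list of JSON-serializable documents, so it could be dumped with `json.dumps` or sent on to a search endpoint by whatever client the deployment uses.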
Open questions
Support for notifications?
Can this service be used to enable notifications (i.e., pushed from the indexer to a client)?
If so, is there a standard that we'd like to implement? How should this interact with the notion of a "gaia inbox"?
I think this is going to be an app-specific thing, regardless of whether or not it touches the indexer design and/or implementation.
To a first approximation, the indexer could implement a set of GET and POST endpoints to receive re-indexing hints. Apps would use this endpoint to tell the indexer that the data in the user's Gaia hub has changed, and that the indexer should go re-crawl it. The endpoint would have the following properties and exhibit the following behaviors:
The GET endpoint serves a description of how to POST hints. It includes:
A version number
A challenge text
Which app(s) it indexes
The maximum payload size the POST hint will accept
The POST endpoint authenticates the POST as being sent by a particular Blockstack user. This can be achieved much the same way it is with Gaia: the indexer gives the user a challenge text, and the user replies with a signature over the challenge text, a random salt, and the posted payload.
The POST endpoint accepts a posted payload that encodes the following information:
Version number
The user's app address
A list of files that the user has changed (and that need to be re-crawled)
OPTIONAL: the new data, if it's small enough
The POST endpoint requires that the posted payload's user app address must correspond to the public key used to authenticate the POST header, i.e. the user can only signal that they have modified their own files for a specific application.
The indexer would need to have access to each user's profile so it can identify and authenticate the POSTed hint. I think this is fine: getting the set of profiles is going to be necessary anyway, since the indexer will need to know where the set of Gaia hubs are for a particular app.
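The checks above could be sketched roughly as follows. This is an illustration under stated assumptions, not a spec: the payload field names (`version`, `app_address`, `files`), the constants, and the length-prefixed SHA-256 digest are all my own choices, and the signature scheme is deliberately left abstract behind the hypothetical `verify` and `address_of` callables that a real deployment would back with its own key handling.

```python
import hashlib
import json
from typing import Callable, Optional

HINT_VERSION = 1           # assumed protocol version
MAX_PAYLOAD_BYTES = 4096   # assumed limit advertised by the GET endpoint


def signing_digest(challenge: str, salt: str, payload: bytes) -> bytes:
    """Hash challenge, salt, and payload, length-prefixed so they cannot blur together."""
    h = hashlib.sha256()
    for part in (challenge.encode(), salt.encode(), payload):
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.digest()


def accept_hint(payload: bytes, salt: str, signature: bytes, public_key: str,
                challenge: str,
                verify: Callable[[str, bytes, bytes], bool],
                address_of: Callable[[str], str]) -> Optional[dict]:
    """Return the parsed hint if it passes every check, else None.

    `verify(public_key, digest, signature)` and `address_of(public_key)` are
    hypothetical hooks for the real signature check and address derivation.
    """
    if len(payload) > MAX_PAYLOAD_BYTES:
        return None
    if not verify(public_key, signing_digest(challenge, salt, payload), signature):
        return None
    try:
        hint = json.loads(payload)
    except ValueError:
        return None
    if not isinstance(hint, dict) or hint.get("version") != HINT_VERSION:
        return None
    # A user may only signal changes to their own app-specific files.
    if hint.get("app_address") != address_of(public_key):
        return None
    files = hint.get("files")
    if not isinstance(files, list) or not all(isinstance(f, str) for f in files):
        return None
    return hint
```

Checking the size and the signature before parsing means an unauthenticated caller cannot make the indexer do more than one hash over a bounded payload.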
The indexer implementation can decide what to do when a user POSTs to it. It can do things like:
Rate-limit or throttle user requests
Queue files for reindexing at a particular date, or at a particular rate
Synchronously update its index with new data
Ignore the user
etc.
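As one possible combination of those options, here is a small sketch that throttles hints per user and queues accepted files for later re-crawling. The class name, the default interval, and the batch interface are all hypothetical; an implementation could just as well update its index synchronously or drop hints entirely.

```python
import time
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class HintPolicy:
    """Throttle hints per user and queue accepted files for re-crawling."""

    def __init__(self, min_interval: float = 60.0):
        self.min_interval = min_interval  # assumed minimum seconds between hints per user
        self.last_seen: Dict[str, float] = {}
        self.queue: Deque[Tuple[str, str]] = deque()

    def submit(self, user: str, files: List[str], now: Optional[float] = None) -> bool:
        """Accept a hint unless the user posted too recently; return whether it was accepted."""
        now = time.monotonic() if now is None else now
        last = self.last_seen.get(user)
        if last is not None and now - last < self.min_interval:
            return False  # throttled: the hint is ignored
        self.last_seen[user] = now
        for name in files:
            if (user, name) not in self.queue:  # avoid queueing the same file twice
                self.queue.append((user, name))
        return True

    def next_batch(self, limit: int) -> List[Tuple[str, str]]:
        """Pop up to `limit` (user, filename) pairs for the crawler to refetch."""
        batch = []
        while self.queue and len(batch) < limit:
            batch.append(self.queue.popleft())
        return batch
```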
I'm not sure this has any interaction with the Gaia inbox proposal. I think the Gaia inbox proposal is less about maintaining an index over an application's data, and more about bootstrapping a social graph between users. Thoughts?
Here's a mockup of a social network bootstrap:
Alice adds user Bob as a desired contact
Bob searches for all users which indicate Bob as a desired contact
Bob adds Alice back
This could be done more simply with the gaia hub inboxes, and the design of the gaia hub inboxes was to prevent the above use-case exactly, but I want to explore why the above use case is a bad one, because this is a significantly simpler architecture (and doesn't involve users writing to each other's gaia hubs).
Totally agree that Gaia inboxes make this particular problem simpler!
My protocol sketch above was more for the case of:
Alice and Bob are already connected
Bob signs in, and in doing so, subscribes to push notifications from the indexer for Bob-specific events.
Alice writes a new file, or updates an existing one (like a status or profile picture). This pushes a notification to the indexer that the relevant file(s) have changed.
The indexer sends Bob an immediate notification that Alice has written new data, and his client refreshes it.
The indexer marks its cached files for Alice as stale, until it can fetch them from her Gaia hub and re-process them.
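The steps above can be sketched as a tiny in-memory model. Everything here is hypothetical naming on my part: subscriptions are plain callbacks rather than real push channels, and `fetch` stands in for reading a file from the author's Gaia hub. The sketch marks the cache stale before notifying, so a client that queries the indexer in response never sees stale data presented as fresh.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple

Key = Tuple[str, str]  # (author, filename)


class NotifyingIndexer:
    """Toy model: subscribers get pushed hints, cached files go stale until refetched."""

    def __init__(self) -> None:
        self.subscribers: Dict[str, List[Callable[[str, List[str]], None]]] = defaultdict(list)
        self.cache: Dict[Key, str] = {}
        self.stale: Set[Key] = set()

    def subscribe(self, author: str, callback: Callable[[str, List[str]], None]) -> None:
        """Register a client callback for changes to `author`'s files."""
        self.subscribers[author].append(callback)

    def on_hint(self, author: str, files: List[str]) -> None:
        """Handle a 'files changed' hint: mark cache entries stale, then notify."""
        for name in files:
            self.stale.add((author, name))
        for callback in self.subscribers[author]:
            callback(author, files)

    def refresh(self, fetch: Callable[[str, str], str]) -> None:
        """Refetch every stale file; `fetch(author, filename)` is an assumed Gaia read."""
        for author, name in sorted(self.stale):
            self.cache[(author, name)] = fetch(author, name)
        self.stale.clear()
```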
Right, I think this is what I was trying to get an idea of. The above use case actually doesn't sound like an indexer, but rather a notification service, with the difference being how important the specific service is to the normal functioning of the application. A search index should (in theory) be a completely replaceable part: Alice could run an "Alice Index" and Bob could run a "Bob Index" and have the exact same data. This is not true for the proposed notification system. With such a system, it's very important which notification service Bob pushes his updates to, because Alice should subscribe for updates for Bob's events from that service. In that case, I think this is something that would need to be user-specified and published, i.e., Bob's profile contains an entry that says "For notifications about Bob's files, ask server X". This, at least to me, seems like a separate thing from a search index.
The use-case I had in mind was something like a decentralized Facebook, which needs to implement both an indexer and a notification service. The indexer aggregates your friends' status updates into a Wall, which can be fetched with a single HTTP request. The notification service informs you when one of your friends posts something, so your client sees the update without having to poll the indexer all the time. Both the indexer and notification service are app-specific.
How separable are indexers and notification services? I think the deployments are separable: Alice and Bob could run their own indexers and share a notification service. However, the code probably isn't separable, since handling notifications in a way that achieves the above effect sounds like a cross-cutting concern to me.
I'm not convinced that Alice and Bob need to share a notification service per se, although a simple implementation of a notification service could be a logically centralized one. But, if the indexer and notification service turn out to be logically inseparable, we'll need to think hard about the design of the notification service to allow for multiple cooperating deployments.
One idea I had for this problem is to have a namespace for the decentralized Facebook whereby people who run indexers and notification services can list their services' DNS or IP addresses. Then, users subscribe to one or more such services. While indexers don't need to communicate, the notification services can ensure all-to-all notification transmission by enumerating the set of notification services via the namespace, and forwarding notifications along to other notification servers (kind of like how Matrix works today).
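A rough sketch of that forwarding scheme, with heavy simplifications: the "namespace" is just a shared dictionary, servers are in-process objects rather than network peers, and the notification format with an `id` field is an assumption. The point it illustrates is that the origin server forwards once to every listed peer, and deduplication by notification id keeps a repeated publish from being delivered twice.

```python
from typing import Dict, List, Set


class NotificationServer:
    """Toy notification server that forwards to every peer listed in a shared registry."""

    def __init__(self, name: str, registry: Dict[str, "NotificationServer"]) -> None:
        self.name = name
        self.registry = registry      # stands in for enumerating the namespace
        self.seen: Set[str] = set()   # notification ids already handled
        self.delivered: List[dict] = []
        registry[name] = self         # "list" this service in the namespace

    def publish(self, note: dict) -> None:
        """Entry point for a client: deliver locally and forward to all peers."""
        self.receive(note, forward=True)

    def receive(self, note: dict, forward: bool = False) -> None:
        """Deliver a notification once; only the origin server forwards it."""
        if note["id"] in self.seen:
            return  # duplicate, already delivered
        self.seen.add(note["id"])
        self.delivered.append(note)
        if forward:
            for peer_name, peer in self.registry.items():
                if peer_name != self.name:
                    peer.receive(note)
```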
I think this is an example where the assembly of the wall could easily be done client-side, where the indexer is just used to aggregate; it's just a search endpoint. So in the example of a wall, you'd have a search like "find all posts with wall identifier = X", and the client would be responsible for assembling.
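For illustration, client-side assembly might look something like the sketch below. The `search` callable, the query shape, and the `wall` and `timestamp` field names are assumptions; the real query format would depend on whatever search endpoint the indexer exposes.

```python
from typing import Callable, Dict, List


def assemble_wall(search: Callable[[Dict[str, str]], List[dict]],
                  wall_id: str, limit: int = 20) -> List[dict]:
    """Query the indexer for posts on one wall and order them newest first.

    `search` is an assumed stand-in for the indexer's search endpoint.
    """
    hits = search({"wall": wall_id})
    # Re-check the wall id client-side so the indexer does not need to be trusted for it.
    posts = [hit for hit in hits if hit.get("wall") == wall_id]
    posts.sort(key=lambda post: post.get("timestamp", 0), reverse=True)
    return posts[:limit]
```

Because the client does the filtering and ordering itself, any indexer that returns the matching posts can be swapped in without changing the wall.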
They wouldn't need to share the same notification service, but in order to receive notifications from Bob, Alice would need to subscribe for "Bob Events" from a service that Bob's client is communicating with (say Service 1). Now, Alice could designate some other service (Service 2) as a notification service for "Alice Events", and then subscribers for notifications on Alice events could use that service. Obviously there are other schemes that are possible, but those require either (1) way more infrastructure to be actively crawling or (2) significant latency degradation.
I don't understand why that use case is a bad one as long as nothing requires step 1 of a user (also, you might want the ability to undo step 1, i.e. remove Bob as a desired contact before Bob gets a chance to search for users indicating him as a desired contact). Is there more background information that I missed?
I'm fairly convinced that this use case for an indexer is fine. The only downside I can really think of is that it makes the indexer a required component for the application's normal functionality, though I would argue that is okay as long as the indexer is user-selectable.
This is actually similar to how Stealthy's offline messaging service works today: the final product it assembles is not a wall, but there is no reason why it couldn't be. With some planned modifications to our protocol, it scales reasonably well, but probably works best with something like the Gaia inbox proposal.
This discussion about indexing is an interesting way of potentially increasing the efficiency of the protocol in a way similar to the inbox proposal (reducing the number of sources that need to be consulted to check for updates). I wonder if it would be sufficiently fast for offline messaging notifications, though I believe the Gaia inbox is still preferable.
Hi everyone. First of all, great discussion here and I agree with most of the topics raised.
Without getting into the technical implementation for now, I would just like to raise two points:
From my personal experience with Travelstack, and after talking with Justin from Graphite and George from Souq, there is a first indexing component that is common to all our apps: the ability to know which users are already authenticated. This could be integrated directly into core.blockstack (like the endpoint to look up users) or offered as a separate service, but I believe that for new developers it would be extremely useful to have that information readily available when they start.
This would also allow for cross-app prompts such as querying if a user uses both Graphite and Travelstack, for example, and prompting to add an image to a doc from Travelstack.
Regarding the other features, such as the example given of adding a contact, I believe the two services (indexer and notifications) serve different purposes. The indexer would always work as the aggregator that one could query to reconstruct the overall graph, and the notification service as an ephemeral service to push and receive changes from the users.
There is an example up from our Feb. 2018 Stealthy website on our github. Look at indexedIO.js
Some things to note:
hierarchy not supported (i.e. one index per directory)
indexedIO.js stores deleted files under "inactive"; this is inefficient for systems that create/delete a lot of files
a sharedIndex is also created and encrypted with a separate public key (this is to allow others to understand what files are present)
if you see mention of firebase or firebaseIO, this was a switchable back end that allows for rapid development and debugging
Here's the github link:
We've since designed a number of changes and might open source this eventually with a lot of new features, though we're looking at Jude's list files work with Gaia beforehand, and that might actually meet your current requirements. Be sure to look there first.
I'm a bit late in replying, but I'm wondering if "the ability to know which users are already authenticated" is met by the apps list in a user's profile or if you mean something different?
Thanks for the reply
I mean indexing that information for every user, in order to know which users have already authenticated on your application without having to query all 22,000 users (number of users on blockstack right now: https://core.blockstack.org/v1/names?page=219) every time you need to know who's already using your app. Does that make sense?
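As a sketch of what such an index could look like: crawl the profiles once, then answer "who uses app X" from a reverse map instead of re-reading every profile. The input shape below (username mapped to a profile dict with an `apps` entry keyed by app origin) and the example origins are assumptions for illustration, and the crawl itself is left out.

```python
from collections import defaultdict
from typing import Dict, Set


def build_app_index(profiles: Dict[str, dict]) -> Dict[str, Set[str]]:
    """Map each app origin to the set of usernames whose profile lists it.

    Assumes each profile dict has an `apps` entry keyed by app origin.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for username, profile in profiles.items():
        for origin in (profile.get("apps") or {}):
            index[origin].add(username)
    return dict(index)


def users_of_both(index: Dict[str, Set[str]], app_a: str, app_b: str) -> Set[str]:
    """Usernames that appear under both apps, e.g. for a cross-app prompt."""
    return index.get(app_a, set()) & index.get(app_b, set())
```

Rebuilding the map on a schedule, or updating it from re-indexing hints, would keep lookups cheap without querying every user each time.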
I don't think I can suggest any alternatives you haven't already considered (e.g. a centralized store like Firebase, etc.). We've talked about a decentralized analytics/db service, but that's a ways off for us to reconsider.
An analytics platform might also be useful for this, depending on its ability to let you incorporate user data and export it manually or, better, automatically. Also, inevitably someone will ask you about DAU/MAU stats.