RFC: public blockstack-core indexing

jwiley · January 3, 2019, 9:27pm

With more indexing requests coming in, we’ve been discussing how to better provide index data from our public blockstack-core nodes.
A few ideas:

1. Export the data as a JSON file into a public bucket in S3 and/or GCS
2. Provide a dedicated API endpoint for the JSON

However, before we do anything, we’d like to get some community feedback about what data you’d like to be available and, ideally, how you’d like to access it.

In the end, the goal would be to provide a better public blockstack-core experience by offloading any indexing operations, while at the same time making it easier for everyone to get the data they need.

aaron · January 4, 2019, 3:50pm

There are a couple of app developers who I know do some indexing on blockstack-core (and have done so via core.blockstack.org in the past):

@jehunter5811
@alvesjtiago
@valrepsys

Tagging them to see if they have any comments.

jehunter5811 · January 4, 2019, 5:46pm

Thanks for the tag, Aaron!

I run a search indexer to track total users of Graphite. It would be great to be able to just query that data from a hosted location rather than having to run an indexer service once a week (or however often) and then query ad hoc when I need the data.

jwiley · January 4, 2019, 10:47pm

That’s the plan!

Do you have a specific preference for how to get that data, though?
I’ve listed a couple of our ideas above, but we’re open to suggestions.

jehunter5811 · January 6, 2019, 6:17pm

I think the two ideas you proposed have the same end result, don’t they? They provide a place for a developer (or anyone) to query. If it’s a dedicated endpoint, I would query that endpoint. If it was a public S3 bucket (or similar), I would fetch and query.

alexc.id · January 8, 2019, 10:00pm

Both suggestions seem fine. If resources are tight, maybe go with option 1 and then see how things evolve based on usage (assuming option 1 takes less time)?

mikecohen.id · January 9, 2019, 10:08am

I index Blockstack data to find users whose ‘apps’ field shows they have visited my app, so I can then create an app-specific search index. Of the two options, the REST API is more dynamic and makes it easier to keep up with new registrations. Exporting data into JSON implies the exported data will lag behind the real-time data. It may also be harder to maintain data exports if changes down the line break the JSON data structures. Wouldn’t that sort of problem be easier to avoid with a REST API?
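For illustration, the ‘apps’ lookup described here could be sketched against an exported JSON file roughly as follows. The export shape (a mapping from name to profile), the names, and the URLs are all hypothetical, since no export format was settled in this thread.

```python
import json

# Hypothetical export: Blockstack name -> profile.
# The real export format was not decided in this thread.
EXPORT_JSON = """
{
  "alice.id": {"apps": {"https://app.example": "https://gaia.example/hub/a/"}},
  "bob.id": {"apps": {"https://other.example": "https://gaia.example/hub/b/"}},
  "carol.id": {}
}
"""


def users_of_app(export: dict, app_origin: str) -> list:
    """Return the sorted names whose profile has app_origin as a key under 'apps'."""
    return sorted(
        name
        for name, profile in export.items()
        if app_origin in profile.get("apps", {})
    )


export = json.loads(EXPORT_JSON)
app_users = users_of_app(export, "https://app.example")
print(app_users)
```

With a static export like this, the result is only as fresh as the last dump, which is the lag concern raised above.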

markmhendrickson · January 10, 2019, 1:22pm

My only immediate need for indexing so far has been querying for all the names that have ever authenticated my app, which appears to be a very common use case.

I looked to the profile search endpoint to do this (https://core.blockstack.org/#resolver-endpoints-profile-search) but found that it returns results only for value matches, not key matches (and apps are represented by keys).

I wonder if a simple improvement could be to return results based on key matches as well? For example, I could run /v1/search?query=humans and results with app key https://humans.name would get returned (such as shown in my profile https://gaia.blockstack.org/hub/1789gBX7w1XFPeG5SFKkbfsUbrHvnTvYRC/profile.json).
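A minimal sketch of the difference between value-only and key-inclusive matching, assuming a simplified profile shape. The function names and the sample profile are hypothetical and are not how the search endpoint is actually implemented.

```python
from typing import Any, Iterator


def iter_strings(node: Any, include_keys: bool) -> Iterator[str]:
    """Yield every string value in a nested profile, plus dict keys if requested."""
    if isinstance(node, dict):
        for key, value in node.items():
            if include_keys:
                yield str(key)
            yield from iter_strings(value, include_keys)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item, include_keys)
    elif isinstance(node, str):
        yield node


def profile_matches(profile: dict, query: str, include_keys: bool = True) -> bool:
    """Case-insensitive substring match over values, and optionally over keys."""
    needle = query.lower()
    return any(needle in text.lower() for text in iter_strings(profile, include_keys))


# Hypothetical, simplified profile; real profiles carry more fields.
sample_profile = {
    "name": "Example User",
    "apps": {"https://humans.name": "https://gaia.example/hub/abc/"},
}

value_only = profile_matches(sample_profile, "humans", include_keys=False)
with_keys = profile_matches(sample_profile, "humans", include_keys=True)
print(value_only, with_keys)
```

In this sketch, the value-only search misses the profile because the app origin appears only as a key, while the key-inclusive search finds it.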

My impression is that community indexer needs are virtually all centered on querying user data like this, so perhaps we can expand profile search not only for this case but others? That way we don’t have to spin out new endpoints (and I agree with @mikecohen.id here that an endpoint is preferable to a data dump and much easier to integrate).

markmhendrickson · January 10, 2019, 3:42pm

I’d also just note this previous forum thread about an indexer since it has a lot of good conversation around this topic: Request for Comments: Gaia Indexing Service

jwiley · January 14, 2019, 4:13pm