With more indexing requests coming in, we’ve been discussing how to better provide index data from our public blockstack-core nodes.
A few ideas:
1. Export the data as a JSON file into a public bucket in S3 and/or GCS
2. Provide a dedicated API endpoint for the JSON
However, before we do anything, we’d like to get some community feedback about what data you’d like to be available and, ideally, how you’d like to access it.
In the end, the goal would be to provide a better public blockstack-core experience by offloading any indexing operations, while at the same time making it easier for everyone to get the data they need.
I run a search indexer to track total users of Graphite. It would be great to be able to just query that data from a hosted location rather than having to run an indexer service once a week (or however often) and then query ad hoc when I need the data.
I think the two ideas you proposed have the same end result, don’t they? They provide a place for a developer (or anyone) to query. If it’s a dedicated endpoint, I would query that endpoint. If it was a public S3 bucket (or similar), I would fetch and query.
Both suggestions seem fine. If resources are tight, maybe go with option 1 and then see how things evolve based on usage (assuming option 1 takes less time to build)?
I index Blockstack data to find users whose ‘apps’ field shows they have visited my app, so I can then create an app-specific search index. Of the two options, the REST API is more dynamic and makes it easier to keep up with new registrations. Exporting data to JSON implies the exported data will lag behind the real-time data. It may also be harder to maintain data exports if changes down the line break the JSON data structures. Wouldn’t that sort of problem be easier to avoid with a REST API?
My only immediate need for indexing so far has been querying for all the names that have ever authenticated my app, which appears to be a very common use case.
My impression is that community indexer needs are virtually all centered on querying user data like this, so perhaps we can expand profile search not only for this case but others? That way we don’t have to spin out new endpoints (and I agree with @mikecohen.id here that an endpoint is preferable to a data dump and much easier to integrate).
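For anyone weighing the two options, here is a rough sketch of the "which names have used my app" query that several posts above describe, run against a JSON export. The record layout (a list of objects with a "name" and a "profile" whose "apps" field maps app origins to storage URLs) is an assumption made for illustration, not a confirmed export format, and the helper names and sample data are hypothetical.

```python
import json
from urllib.parse import urlsplit


def origin_of(url):
    """Reduce a URL to scheme://host so trailing slashes and paths don't matter."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}".lower()


def names_using_app(records, app_origin):
    """Return the sorted names whose profile lists app_origin in its 'apps' field."""
    target = origin_of(app_origin)
    names = []
    for record in records:
        # Assumed layout: record["profile"]["apps"] maps app origin -> storage URL.
        profile = record.get("profile") or {}
        apps = profile.get("apps") or {}
        if any(origin_of(key) == target for key in apps):
            names.append(record["name"])
    return sorted(names)


# Hypothetical export contents; a real dump would be fetched from the bucket or endpoint.
sample_export = json.loads("""
[
  {"name": "carol.id", "profile": {"apps": {"https://app.example.com/": "https://hub.example/carol/"}}},
  {"name": "bob.id",   "profile": {"apps": {"https://other.example": "https://hub.example/bob/"}}},
  {"name": "alice.id", "profile": {"apps": {"https://app.example.com": "https://hub.example/alice/"}}},
  {"name": "dave.id",  "profile": null}
]
""")

matching = names_using_app(sample_export, "https://app.example.com")
print(len(matching), matching)
```

With a data dump, every developer would run something like this locally after downloading the whole file. With an endpoint, the same filter could run on the server and return only the matching names.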