There are a number of reasons we’d want a list-files (ls) command in gaia hubs. The biggest is data migration and portability. For a user to truly be able to migrate all their data (or see all their data), they’d need to be able to see all the files that the gaia hub is storing on their behalf (because an application may not show them that information). This has also been requested a number of times (example: Feature Request: Gaia `ls` functionality).
Overview
Supporting this requires 2 spec changes:
Gaia hub driver model
Gaia hub API
Gaia Hub Driver Model Changes
This proposal would extend the gaia hub driver model to support listing files with a given prefix:
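To make the driver change concrete, here is a rough sketch of what a prefix-listing method could look like. Everything in it is an assumption for illustration: the names `DriverModel`, `ListFilesResult`, `listFiles`, and `InMemoryDriver`, the shape of the return value, and the sample paths are hypothetical and not taken from the actual spec. It is also kept synchronous for brevity; a real driver would most likely be asynchronous.

```typescript
// Hypothetical result shape: one page of matches plus an opaque continuation value.
interface ListFilesResult {
  entries: string[];   // file paths that start with the requested prefix
  page: string | null; // driver-specific cursor, or null when the listing is complete
}

// Hypothetical driver interface (an assumption, not the actual Gaia spec).
interface DriverModel {
  listFiles(prefix: string, page?: string | null): ListFilesResult;
}

// In-memory stand-in for a storage backend, used only to exercise the interface.
class InMemoryDriver implements DriverModel {
  private files: string[];
  private pageSize: number;

  constructor(files: string[], pageSize: number = 2) {
    this.files = files;
    this.pageSize = pageSize;
  }

  listFiles(prefix: string, page: string | null = null): ListFilesResult {
    const matches = this.files.filter((f) => f.startsWith(prefix)).sort();
    // Here the cursor is just an offset; a real backend would define its own format.
    const start = page === null ? 0 : parseInt(page, 10);
    const entries = matches.slice(start, start + this.pageSize);
    const next = start + this.pageSize;
    return { entries, page: next < matches.length ? String(next) : null };
  }
}

const driver = new InMemoryDriver([
  "1abc/a.json",
  "1abc/b.json",
  "1abc/c.json",
  "1xyz/a.json",
]);
const first = driver.listFiles("1abc/");
const second = driver.listFiles("1abc/", first.page);
```

The caller treats `page` as opaque and simply hands it back on the next call, so each driver can put whatever cursor its backend needs in there.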
If we want to limit the return size of these requests, we’ll need to support pagination, but many backend drivers will have different methods of doing their own pagination, and we wouldn’t want to act as a buffer here.
The GET /list-files/<hub-address> endpoint is implemented by the Gaia hub, not the storage endpoint. It must only be accessed via HTTPS. This is crucial because user data shouldn’t be enumerable by default by just anyone, and (see below) we’ll need to pass some sensitive data to the Gaia hub to do pagination efficiently.
Not all storage systems support pagination natively, since they have no notion of pages. Instead, they expect clients to pass a cursor-like object in the relevant listFiles()-like API endpoint.
Regarding pagination, one idea is to allow the Gaia hub to pass back to the client a cookie that contains any driver-specific state, like pagination cursors. Then, successive calls to GET /list-files/<hub-address>?page=XXX would preserve the pagination cursor, thereby allowing file scans to operate efficiently (the alternative would be to force each page query to scan “up to” the page requested, which would have O(n^2) time complexity for n files).
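A small sketch of why preserving the cursor matters. This is a toy model under stated assumptions: `CursorBackend`, `pageByRescan`, and `pageByToken` are hypothetical names, the backend is an in-memory array, and the token is plain JSON rather than whatever a real hub would hand back. It counts how many items the backend touches when every page is fetched by rescanning from the start, versus when the client returns the cursor it was given.

```typescript
// Hypothetical backend that only supports "up to `limit` items starting at this cursor".
class CursorBackend {
  scanned = 0; // how many items the backend has touched so far
  private files: string[];

  constructor(files: string[]) {
    this.files = files;
  }

  list(cursor: number, limit: number): { items: string[]; next: number | null } {
    const items = this.files.slice(cursor, cursor + limit);
    this.scanned += items.length;
    const next = cursor + limit;
    return { items, next: next < this.files.length ? next : null };
  }
}

// Strategy A: the client sends only a page number, so the hub walks forward from the start.
// (Assumes pageNo is within range.)
function pageByRescan(b: CursorBackend, pageNo: number, size: number): string[] {
  let cursor: number | null = 0;
  let items: string[] = [];
  for (let i = 0; i <= pageNo && cursor !== null; i++) {
    const r = b.list(cursor, size);
    items = r.items;
    cursor = r.next;
  }
  return items;
}

type TokenPage = { items: string[]; token: string | null };

// Strategy B: the hub returns an opaque token carrying the driver cursor,
// and the client passes it back on the next request.
function pageByToken(b: CursorBackend, token: string | null, size: number): TokenPage {
  const cursor = token === null ? 0 : Number(JSON.parse(token).cursor);
  const r = b.list(cursor, size);
  return { items: r.items, token: r.next === null ? null : JSON.stringify({ cursor: r.next }) };
}

const files = Array.from({ length: 100 }, (_, i) => `file-${i}`);
const size = 10;

// Fetch all 10 pages by page number: each request rescans everything before it.
const rescan = new CursorBackend(files);
for (let p = 0; p < 10; p++) pageByRescan(rescan, p, size);

// Fetch all pages by handing the token back each time.
const tokened = new CursorBackend(files);
let token: string | null = null;
let seen = 0;
do {
  const r: TokenPage = pageByToken(tokened, token, size);
  seen += r.items.length;
  token = r.token;
} while (token !== null);
```

With 100 files and pages of 10, the rescanning backend touches 550 items in total while the token-based one touches 100, which is the quadratic versus linear gap described above.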
Also, regarding pagination, getting prefix matches to work at the API level is going to be a lot of work since not all drivers support it. Do we have a real case where match-by-prefix will be necessary? If so, should we just pass a prefix as a driver-specific hint in the query string, so we don’t have to commit to supporting it in all future drivers?
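One way the "prefix as a hint" idea might look, sketched under assumptions: `HintedDriver`, `supportsPrefix`, and `listWithPrefixHint` are hypothetical names, and the hub-side fallback filter is an illustrative choice rather than something proposed in this thread. Drivers that understand the hint filter natively; the rest ignore it.

```typescript
// Hypothetical: a driver may or may not understand the `prefix` hint.
interface HintedDriver {
  supportsPrefix: boolean;
  list(hints: { prefix?: string }): string[];
}

function listWithPrefixHint(driver: HintedDriver, prefix: string): string[] {
  const raw = driver.list({ prefix });
  // Drivers that honor the hint already return a filtered list;
  // for the rest, this sketch filters on the hub side (an assumption).
  return driver.supportsPrefix ? raw : raw.filter((name) => name.startsWith(prefix));
}

const all = ["photos/a.png", "photos/b.png", "notes/todo.txt"];

// A driver that ignores hints and always returns everything.
const naive: HintedDriver = { supportsPrefix: false, list: () => all };

// A driver that applies the prefix itself.
const native: HintedDriver = {
  supportsPrefix: true,
  list: (hints) => all.filter((n) => n.startsWith(hints.prefix ?? "")),
};

const fromNaive = listWithPrefixHint(naive, "photos/");
const fromNative = listWithPrefixHint(native, "photos/");
```

Note that filtering on the hub side interacts badly with pagination (pages can come back short or empty), so this only shows where the hint would travel, not a complete answer to the question.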
Definitely excited about this feature @aaron. I don’t have anything meaningful to add regarding pagination or prefix matches that @jude mentions.
Stealthy currently writes each offline message to a file that is indexed by our own js module. There are some performance concerns when writing many files, and because deletion of a file is not possible, we’re currently storing deleted file handles (essentially empty files) in an index for the day when deletion is possible. I’m curious if true file deletion is on the roadmap?
As an FYI–and probably specific to our use case, Stealthy’s indexing optionally writes two index files–one encrypted for the user and another encrypted for the recipient. This way the intended recipient is able to see what new files (messages) are available for processing.
The blockstack-cli tool now supports gaia_getfile, gaia_putfile, gaia_listfiles, and get_app_keys directives, making this somewhat straightforward to test. Will deploy this code to the testnet as well.