[v1.1] Hestia - Pseudo-Decentralized Storage Middleware

Hestia

Pseudo-Decentralized Storage Middleware, or in other words, a Multi-Backend Gaia Hub

ko-fi
Source Code

Feature Overview

  • Gaia Compatibility
    • To use as your Gaia Hub, simply use {your hestia node url}/gaia!
    • Uses Gaia Authentication token format for simplicity and ease-of-use
  • Easily configurable with many whitelisting options
  • Uses a Database for storing metadata, indexing the files, and storing user preferences
    • This improves performance when looking up your files instead of just assuming they all exist.
  • Advanced Drivers
    • Multi-Instance: Run more than one driver of a particular type with different configuration options
    • Multi-User: Run one driver that supports individual users (for remote cloud storage like Dropbox)
    • “Root Only”: Only write the root folder (profile.json and avatar), which limits storage use and encourages users to bring their own remote backends
  • Plugins
    • From making backups-on-request to providing a dashboard, it’s all possible through the Plugin API Interface

Note: This is a trimmed version of the readme; to view the full version, go here.

About / Goals

Background

I am simply trying to finish what Blockstack started in regards to Gaia – or rather, to try and fulfill the original goal in a different way. I wanted Users to be able to use their own Dropbox without having to spin up their own node and all of the complexity that doing so brings – so why not have a pseudo-centralized service handle it all for them? And beyond that, why not have the ability to hook up multiple backends (as advertised in the whitepaper) that can replicate or be given to a particular app at the user’s choosing?

If you want to get into blockstack easily, use this; if you are concerned about centralization but still want the ease-of-use this brings, run your own node; if you want to go as deep as you can, run multiple of your own gaia hubs and use a browser that supports that (if any exist currently).

  • Michael Fedora, from here.

Explanation

The goal of Hestia is to serve as a more complex Gaia Hub. While the original software works well, it works simply and cannot solve certain problems, such as those posed by using personal cloud storage providers as backends while also allowing any end-user to do so.

Hestia was made so that any Blockstack user could use their personal cloud storage (e.g. Dropbox, Google Drive, etc.) as their own storage backend, i.e. to have full control over both ends (writing the files and accessing the backend). While this node, which serves as middleware, is still controlled by a third party, it can easily be run by any user as well, whether for themselves, for their family, or for their organization.

End User -> Gaia -> Amazon S3 (Node Owned Backend)
End User -> Hestia -> End User's Dropbox (User Owned Backend)

In addition, because of its inherent complexity, more features have been added to allow extension by third-party plugins and other drivers, whether locally created or imported through npm. There are also more configuration options on the default drivers; for example, the Disk driver can limit how much storage it uses overall as well as how much each user is allowed to store. Hestia can also whitelist drivers, and it can restrict a driver to identity-folder use – i.e. only a user’s profile.json and avatar will be stored on the driver backend, and all other apps that attempt to use the driver will fail.
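
As a rough illustration of the two Disk driver limits mentioned above, here is a minimal sketch of a quota check. The type and field names (`DiskLimits`, `maxTotalBytes`, `maxUserBytes`, `checkQuota`) are made up for this example and are not Hestia's actual configuration schema, and the check ignores the case where a write replaces an existing file.

```typescript
// Hypothetical shapes: these names are illustrative, not Hestia's config schema.
interface DiskLimits {
  maxTotalBytes: number; // cap on everything the driver stores
  maxUserBytes: number;  // cap on what a single user may store
}

interface DiskUsage {
  totalBytes: number;              // bytes currently stored by the driver
  perUser: Record<string, number>; // bytes currently stored per user address
}

type QuotaResult = { ok: true } | { ok: false; reason: string };

// Decide whether a write of `size` bytes by `user` fits under both limits.
// Simplification: an overwrite is counted as if it were a brand new file.
function checkQuota(limits: DiskLimits, usage: DiskUsage, user: string, size: number): QuotaResult {
  const userBytes = usage.perUser[user] || 0;
  if (userBytes + size > limits.maxUserBytes) {
    return { ok: false, reason: "user limit exceeded" };
  }
  if (usage.totalBytes + size > limits.maxTotalBytes) {
    return { ok: false, reason: "driver limit exceeded" };
  }
  return { ok: true };
}
```

The per-user check runs first so that a user who is over their own limit gets that reason even when the driver as a whole is also full.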

Hestia also provides a single Gaia interface for all backends – this means the user manages which drivers handle which buckets. For instance, an end-user could have their Dropbox handle Stealthy data, while their Google Drive handles their Travelstack data, and their OneDrive gets everything. All drivers get your identity folder, however, and some drivers allow you to use them more than once; for instance, you could attach two Dropbox instances to your account if you so desired – but you can only have one disk driver per driver instance.

User Connections:
- Disk (identity only, 5mb limit)
- Dropbox 1 (personal, 2gb limit, default (store everything))
- Dropbox 2 (team, 2gb limit, stealthy.im only)

Of course, with all of this complexity, there will be some downsides:

  • It is required to use an association token for every request, as the Hestia Hub needs to know the end-user’s address to be able to read the connection information
  • Connection information (such as Dropbox tokens) is stored unencrypted within the Hestia Hub
    • These can obviously still be revoked, and this should not be any less secure than any other app requesting to use your Dropbox.
  • File metadata and file paths are stored in the local database, unencrypted
    • While it could be concerning because of how easy it is to get it, reading a Gaia Hub’s logs (or the box’s HTTP logs) would get you the same information.

Installation / Setup

  • Clone the source
  • npm i
  • npm run build-prod
  • Copy config.sample.json, rename to config.json and configure (see:
    Configuration in the readme)
  • npm start

Known Issues & Dev Comments

  • I can’t find an app that has the latest version of blockstack.js and accepts association tokens correctly… the hestia frontend logs in all well and fine but graphite, xor drive, and one of the kanban apps just don’t want to use it. Quite unfortunate.

  • I can probably upload a gif of it working eventually, but I have a node running in production and it works pretty smoothly… as much as a storage backend can when no apps will run on it, but I was at least able to test syncing, driver adding/removing, user whitelisting, etc. so…

  • There’s quite a number of things I want to do still, but this is the MVP to release. You can see my wishlist here.

License

Released under Mozilla Public License 2.0, with graphics under CC BY-SA 4.0.


Mercurius 2.0 ?

Better – it’s a complete replacement to Gaia, with plugins, multiple drivers, syncing, and a built in explorer that loads “instantly” instead of having to grind through possible buckets again.

Mercurius can still be used to browse normal Gaia Hubs, as well as create a “migration index” to import into the Hestia Hub, but it was only made as a prototype so I could get the explorer UI figured out (as well as make a way where I could migrate without waiting on Blockstack to figure something out).


Wow this is really cool @MichaelFedora! Just wanted to address a couple things on your wishlist:

direct links to save bandwidth? (V1.1)

I’m curious to know your thoughts on your approach to getting direct links from Dropbox to work with multi-reader storage? As I’m sure you’re well aware, what makes this hard with Dropbox is that the reader only knows your app-specific URL prefix and the file path, and yet has to somehow come up with the obfuscated Dropbox URL that resolves to the actual file data.

One approach that could save on bandwidth/space is to have the Hestia Hub’s DB store a mapping between the tuple (url_prefix, file_path) and the obfuscated Dropbox URL on write, and on GET $url_prefix/file_path, it would respond with an HTTP 302 to the Dropbox URL. Not sure if this is something you’d necessarily want though, since all your reads would still be hitting your hub.
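
The mapping-plus-redirect idea can be sketched roughly as follows. This is only an illustration under stated assumptions: an in-memory `Map` stands in for the hub's database, and `recordLink` and `handleRead` are hypothetical names rather than anything in Hestia or Gaia.

```typescript
// Hypothetical in-memory stand-in for the hub's database table.
const directLinks = new Map<string, string>();

// Build a single lookup key from the (url_prefix, file_path) tuple.
function linkKey(urlPrefix: string, filePath: string): string {
  return JSON.stringify([urlPrefix, filePath]);
}

// On write: remember where the backend actually put the file.
function recordLink(urlPrefix: string, filePath: string, backendUrl: string): void {
  directLinks.set(linkKey(urlPrefix, filePath), backendUrl);
}

interface ReadResponse {
  status: 302 | 404;
  headers: Record<string, string>;
}

// On GET: answer with a redirect if the mapping is known, otherwise 404.
function handleRead(urlPrefix: string, filePath: string): ReadResponse {
  const target = directLinks.get(linkKey(urlPrefix, filePath));
  if (target === undefined) {
    return { status: 404, headers: {} };
  }
  return { status: 302, headers: { Location: target } };
}
```

Every read still reaches the hub for the lookup, but the hub only sends a small redirect response instead of the file body.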

Another approach we actually took in the distant past, but dropped support for since it led to very bad read performance, was to build up an index in the user’s Dropbox folder so that if you knew the “root URL” to the index file, you could dynamically fetch pages of the index to translate an app URL and file path back into the Dropbox URL. While this meant that the hub didn’t have to handle reads, it meant that all reads took at least two sequential HTTP requests (one to fetch the index page, and one to fetch the content).

overarching admin feature (/api/v1/admin/ + frontend work) (v1.1)

Not sure if you’ve seen this yet, but there’s a Gaia admin service that, if deployed alongside a Gaia hub, will do a lot of this back-end administration already. It’s in the admin/ directory in the source tree.
Please feel free to borrow parts of it if you’d like :). Its API is stable.

hestia/gaia driver (v1.1)

Nice! If you go this route, just please be aware that size limits on the public hub may cause a large file write to partially fail (i.e. they’ll succeed on Hestia, and they’ll succeed on a private Gaia hub, but they’ll fail on the public Gaia hub). It’s an open question to me at least as to whether or not this means the entire write ought to fail (since your replicas are now in an inconsistent state) or succeed (since one replica – the Hestia one – succeeded).


I’m not totally sure but I think I can get a “public share url” (via create_shared_link_with_settings) and then redirect the /gaia/read to that url, but making it direct (using dl.dropbox.com instead of www.dropbox.com). I’ll have to do some testing.
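
The host swap itself would be a one-liner, something like the sketch below. Whether a dl.dropbox.com link really serves the raw file for every shared link is exactly the untested assumption from this post, and `toDirectLink` is just a name picked for the example.

```typescript
// Assumption (untested, as stated in the post): swapping the host of a shared
// link from www.dropbox.com to dl.dropbox.com yields a direct-download URL.
function toDirectLink(sharedLink: string): string {
  // Only the scheme and host are rewritten; the path and query stay untouched.
  return sharedLink.replace(/^https:\/\/www\.dropbox\.com\//, "https://dl.dropbox.com/");
}
```

Links that do not start with the www.dropbox.com host are returned unchanged, so the function is safe to call on links that were already converted.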

If I had to do the index way, I’d just store the links in the database for quicker access.


Please feel free to borrow parts of it if you’d like :). Its API is stable.

Thanks for the info! I might take a look at it for reference but because most of my user handling and the rest of my API is homebrew I will probably end up keeping it in line with that, haha.


Nice! If you go this route, just please be aware that size limits on the public hub may cause a large file write to partially fail

No, it should fail if the “driver” fails (which is good). I don’t return from the /store route until the write to the driver is confirmed to have finished. I think I’ll either end up adding a warning that “you should make sure the gaia/hestia hub you are connecting to has a greater or equal bandwidth limit to yours, otherwise large files will fail to write” or have it split into parts… but will probably not do the latter, to save time. (Hestia also has a bandwidth limit you can set, but I set it to 7.5mb by default instead of 5.)
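
A minimal sketch of that behaviour, assuming a made-up `StorageDriver` shape (Hestia's real driver interface may look different): the store call only resolves once every driver has confirmed, and rejects if any of them fails. The up-front size check is an extra added for this example, not something the post describes.

```typescript
// Hypothetical driver shape; Hestia's real driver interface may differ.
interface StorageDriver {
  name: string;
  maxBytes: number; // per-write size limit of this backend
  write(path: string, data: Uint8Array): Promise<void>;
}

// Resolve only once every driver has confirmed the write; reject if any fails.
async function storeToAll(drivers: StorageDriver[], path: string, data: Uint8Array): Promise<void> {
  // Assumption for this sketch: fail before sending anything if a backend
  // is already known to be too small for the file.
  const tooSmall = drivers.filter(d => data.length > d.maxBytes);
  if (tooSmall.length > 0) {
    throw new Error("file exceeds limit of: " + tooSmall.map(d => d.name).join(", "));
  }
  // Promise.all rejects as soon as one write rejects.
  await Promise.all(drivers.map(d => d.write(path, data)));
}
```

Note that `Promise.all` does not undo writes that already succeeded on other drivers, so a failed store can still leave replicas in an inconsistent state, which is the open question raised in the quoted comment.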


Thanks for your support and comments!


@MichaelFedora, you are a rockstar! Great stuff.


Release v1.1

Breaking Changes

  • Reorganized DB interface layout

    • this includes the plugin api, so those will break if you have been developing one
  • Modified Driver Type to separate autoRegister calls from register calls

Changes

Frontend

  • When switching buckets, delete old connections only after full sync
    • before, it would remove the connection first, thus deleting all the files before
      they synced

Backend

  • Added experimental metadata switch in gaia/list-files

  • Switched from 7zip to yazl

    • removed a binary (yay!)
  • Added DB driver functionality

    • You can now run off of any DB you want (as long as they have a driver)
  • Added RethinkDB DB Driver

  • Added Sqlite3 DB Driver

    • the default one if none is selected, so you can run hestia without installing any db
    • also adds a binary :frowning:
  • Added the ability for drivers to use (re)direct links to save bandwidth

  • Updated Dropbox Driver

    • No longer caches list-files calls, because all of that is done via the metadata
      database anyways.
    • Now uses direct (shared) links, which are created on register and on file-read if
      they do not exist.
    • No longer uploads metadata files (as it is stored in the database)
    • Files less than 8mb in size now attempt to upload via the normal endpoint
      instead of being sent to the batch files upload endpoint (but will be sent there
      to try again if they fail)
      • this speeds up the uploading speed significantly
  • Updated Disk Driver

    • No longer creates metadata files (as it is stored in the database)
  • Added Gaia storage driver

    • Takes a token and uses that bucket as its own storage endpoint
    • Uses direct links for reads
    • Can be used as a hub-driver (like the disk driver) when supplied a token
      • Hub-driver is auto-registerable
    • Can be used as a user-driver (like the dropbox driver) when not supplied a token
      • Registration requires this to be added: ?token={gaia token}
    • Technically it can be used as both as well
  • Made /api/v1/connections/{id}/list-files take a {bucket} instead of a {path}

  • Added ?metadata={boolean} option to /gaia/read/{...} to view (non-private) file metadata (contentType, size, hash)

  • Removed a bunch of excess dependencies

  • Handle closing the application better
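
For the Dropbox driver's upload change listed above, the "normal endpoint first, batch endpoint as retry" flow could be sketched like this. The two uploader functions are placeholders passed in by the caller (no real Dropbox API calls are shown), `uploadWithFallback` is a hypothetical name, and reading "8mb" as 8 * 1024 * 1024 bytes is an assumption.

```typescript
// Assumption: "8mb" is read here as 8 * 1024 * 1024 bytes.
const SMALL_FILE_LIMIT = 8 * 1024 * 1024;

// Hypothetical uploader signature; the real Dropbox calls are not shown here.
type Uploader = (path: string, data: Uint8Array) => Promise<void>;

// Try the normal endpoint for small files, falling back to the batch endpoint
// if that attempt fails. Large files go straight to the batch endpoint.
async function uploadWithFallback(
  path: string,
  data: Uint8Array,
  normal: Uploader,
  batch: Uploader
): Promise<"normal" | "batch"> {
  if (data.length < SMALL_FILE_LIMIT) {
    try {
      await normal(path, data);
      return "normal";
    } catch (e) {
      // Fall through and retry via the batch endpoint.
    }
  }
  await batch(path, data);
  return "batch";
}
```

The return value only reports which path was taken, which can be handy for logging how often the fallback is hit.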
