Gaia is a decentralized storage system that’s been in operation for several years, and has been used by a wide variety of Stacks (and Blockstack) applications. As mentioned elsewhere, Gaia is currently run by Hiro, and it’s very expensive to run as deployed.
Gaia was designed with the intention that users would run their own Gaia hubs, and connect them to storage back-ends of their own choosing. The BNS client libraries ensure that applications automatically discover other users’ Gaia hubs when loading data: given the user’s BNS name, an application name, and the name of the file to load, the client library is able to find the user’s Gaia hub and fetch the corresponding file.
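As a rough illustration of that lookup, here is a minimal sketch that turns a BNS name, an application name, and a file name into a file URL. The in-memory `ZONE_FILES` table, the `resolve_file_url` helper, and the hub/app/file URL layout are all assumptions made up for this example; the real client libraries read the hub URL from the user's BNS zone file and may lay out paths differently.

```python
from typing import Dict

# Hypothetical stand-in for BNS zone files: BNS name -> Gaia hub base URL.
ZONE_FILES: Dict[str, str] = {
    "alice.id": "https://hub.example.com/alice",
}


def resolve_file_url(zone_files: Dict[str, str], bns_name: str,
                     app_name: str, filename: str) -> str:
    """Find the user's hub from their name, then build the file's URL."""
    hub_url = zone_files.get(bns_name)
    if hub_url is None:
        raise KeyError(f"no zone file entry for {bns_name}")
    # Assumed layout: <hub base URL>/<application name>/<file name>
    return "/".join([hub_url.rstrip("/"), app_name, filename.lstrip("/")])
```

The point is only that the caller never needs to know where the hub lives; the name is enough to find it.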
This of course did not come to pass. In a well-meaning effort to bootstrap usage, Blockstack PBC (now Hiro) ran a default Gaia hub, which of course everyone used in place of running their own. To make matters worse, because the user’s Gaia hub URL is written to a BNS zone file, the act of changing over to a new hub is the act of sending a Stacks transaction. So realistically, users are not going to migrate their Gaia data to their own Gaia hubs anytime soon (if forced, they’ll more likely abandon the data).
Gaia Rail
This has put Hiro in a bind. They don’t want to delete the default Gaia hub because it would break other applications that use it. But they don’t want to be paying to host an ever-growing amount of data either (who does?). Asking users to download a copy of their data might be tenable, but asking them to spin up their own hubs and send Stacks transactions is probably a no-go.
This gives me an idea. Instead of having Hiro both host the hub and pay for the back-end, what if Hiro just ran a “request rail” for Gaia? It would serve to give a public, consistent URL for users’ Gaia hubs running at home behind NATs. This service would be stateless, and host no data on its own. Users would host their data at home (or even on their on-the-go laptops), and would run Gaia hubs that would establish a persistent TCP connection to the Gaia rail in order to traverse whatever NATs separate their hubs from the public Internet.
The request flow would look like this:
                        NAT
                         |
request --> Gaia rail -- | -> Gaia hub --> local storage
                         |
When a request for a user’s data arrives at the Gaia rail, the Gaia rail simply forwards the request through the persistent TCP connection to the user’s Gaia hub, which in turn loads the data and sends it back to the Gaia rail. As the data arrives at the Gaia rail, the Gaia rail simply pushes it back to the requester.
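To make the flow concrete, here is a minimal in-memory sketch of the rail's forwarding logic. The `GaiaRail` class, its method names, and the status codes are hypothetical, and the persistent TCP connection is modeled as a plain callable so the example stays runnable; a real rail would forward over the socket the hub opened from behind its NAT.

```python
from typing import Callable, Dict, Optional, Tuple

# Stand-in for the hub's persistent TCP connection: path -> file bytes, or None.
HubConnection = Callable[[str], Optional[bytes]]


class GaiaRail:
    """Keeps only a table of live hub connections; it stores no user data."""

    def __init__(self) -> None:
        self._hubs: Dict[str, HubConnection] = {}

    def register_hub(self, bns_name: str, conn: HubConnection) -> None:
        # Called once a hub has connected to the rail.
        self._hubs[bns_name] = conn

    def unregister_hub(self, bns_name: str) -> None:
        # Called when the hub's connection drops.
        self._hubs.pop(bns_name, None)

    def handle_request(self, bns_name: str, path: str) -> Tuple[int, bytes]:
        conn = self._hubs.get(bns_name)
        if conn is None:
            return 503, b"hub offline"   # no live connection for this name
        data = conn(path)                # forward the request to the hub
        if data is None:
            return 404, b"not found"
        return 200, data                 # push the hub's reply to the requester
```

For example, a hub whose local storage is a dict can register `storage.get` as its connection, and requests for that name are then answered from the dict without the rail ever holding a copy.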
All the Gaia rail operator does here is supply network bandwidth and a public IP/DNS name for clients to access, perhaps along with some rate limiting and caching (possibly by means of a third-party CDN) to prevent the service from being overwhelmed. The Gaia rail would implement an authentication protocol for users’ Gaia hubs, such that the Gaia hub must prove that it operates on behalf of a particular BNS name owner (meaning, only users with BNS names can use the Gaia rail).
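The authentication protocol isn't specified here, but one plausible shape is a challenge-response handshake, sketched below. Everything in it is an assumption: the function names are made up, and HMAC over a shared key is only a standard-library stand-in. A real rail would more likely verify a signature against the public key of the BNS name's owner.

```python
import hashlib
import hmac
import secrets
from typing import Dict


def new_challenge() -> bytes:
    """Rail side: a fresh random nonce per connection, so replies can't be replayed."""
    return secrets.token_bytes(32)


def hub_response(key: bytes, bns_name: str, challenge: bytes) -> bytes:
    """Hub side: bind the claimed BNS name to the rail's challenge."""
    message = bns_name.encode("utf-8") + b"\x00" + challenge
    return hmac.new(key, message, hashlib.sha256).digest()


def rail_verify(keys: Dict[str, bytes], bns_name: str,
                challenge: bytes, response: bytes) -> bool:
    """Rail side: accept only if the response matches the key on record."""
    key = keys.get(bns_name)  # assumption: the rail already knows a key per name
    if key is None:
        return False
    expected = hub_response(key, bns_name, challenge)
    return hmac.compare_digest(expected, response)  # constant-time comparison
```

Only after `rail_verify` succeeds would the rail start routing requests for that name to the connection.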
The Gaia hub itself would be largely unmodified, save for two things:
- This authentication protocol, whereby it proves to the Gaia rail that it’s the Gaia hub for a particular BNS name
- A keep-alive protocol, so that when the user’s computer changes its IP address, it can reconnect itself to the Gaia rail. This would permit users to run their Gaia hubs on mobile devices such as laptops, which can go offline or rejoin the system from different IP addresses as the user moves through the world.
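The reconnect half of the keep-alive protocol could look something like the sketch below: when the connection drops (say, because the laptop changed networks), the hub retries with exponential backoff. The backoff policy, the `reconnect` helper, and its parameters are my own assumptions rather than part of any existing Gaia hub.

```python
from typing import Callable, Iterator


def backoff_delays(base: float = 1.0, cap: float = 60.0) -> Iterator[float]:
    """Yield base, 2*base, 4*base, ... seconds, never exceeding `cap`."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(cap, delay * 2)


def reconnect(connect: Callable[[], bool],
              sleep: Callable[[float], None],
              max_attempts: int = 10) -> int:
    """Call `connect` until it succeeds; return the number of attempts used.

    `connect` is a hypothetical callable that dials the rail and
    re-authenticates, returning True on success.
    """
    delays = backoff_delays()
    for attempt in range(1, max_attempts + 1):
        if connect():
            return attempt
        sleep(next(delays))  # wait longer after each failure
    raise ConnectionError(f"gave up after {max_attempts} attempts")
```

A real hub would pass `time.sleep` as `sleep`; taking it as a parameter just keeps the loop easy to test.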
Decentralizing the Rail
Running the above rail would not be free of course – it would cost money in terms of the bandwidth required to run at scale. In the future, we could have Stacks nodes implement the rail through a variation of the up-and-coming StackerDB system. Briefly, a StackerDB is a store-and-forward chunk store for storing soft state in the Stacks network on behalf of a smart contract. The smart contract grants a whitelist of users a storage quota (i.e. a fixed number of fixed-sized chunks they can write), and employs a best-effort store-and-forward protocol to ensure that a user’s written chunks get replicated to all Stacks nodes that replicate that particular StackerDB. It’s currently being built for sBTC signers to leverage the Stacks p2p network to exchange FROST DKG information, but it’s otherwise general purpose.
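The quota model described above (a whitelist of users, each with a fixed number of fixed-size chunks) can be sketched as a small data structure. This is not the actual StackerDB implementation; the `ChunkStore` class and its methods are hypothetical, and replication between nodes is left out entirely.

```python
from typing import Dict, List, Optional


class ChunkStore:
    """Each whitelisted user gets a fixed number of fixed-size chunk slots."""

    def __init__(self, quotas: Dict[str, int], chunk_size: int) -> None:
        # `quotas` plays the role of the smart contract's whitelist:
        # user -> number of chunk slots that user may write.
        self.chunk_size = chunk_size
        self._slots: Dict[str, List[Optional[bytes]]] = {
            user: [None] * count for user, count in quotas.items()
        }

    def write(self, user: str, slot: int, data: bytes) -> bool:
        slots = self._slots.get(user)
        if slots is None:
            return False                      # user is not whitelisted
        if not 0 <= slot < len(slots):
            return False                      # slot is outside the user's quota
        if len(data) > self.chunk_size:
            return False                      # chunk is too large
        slots[slot] = data                    # overwrite in place (soft state)
        return True

    def read(self, user: str, slot: int) -> Optional[bytes]:
        slots = self._slots.get(user)
        if slots is None or not 0 <= slot < len(slots):
            return None
        return slots[slot]
```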
Already, the StackerDB system would enable a set of Stacks nodes to replicate an authorized user’s Gaia writes amongst themselves, as long as the data was small enough to fit into the user’s quota. From there, we could extend the StackerDB system to permit authorized users to open persistent TCP connections to Stacks nodes that replicate that DB, and in doing so, subscribe to new chunk writes and service chunk reads. This would enable users to write to their Gaia hubs at home by means of a Stacks node – the user writes data as one or more StackerDB chunks, which in turn get pushed to the user’s at-home Gaia hub (which is a subscriber).
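Here is a minimal sketch of the subscription idea: a node stores each written chunk and pushes it to whoever has subscribed to that user's chunks, such as the user's at-home hub. The `ReplicaNode` class is hypothetical, the persistent TCP connection is modeled as a callback, and authorization is omitted for brevity.

```python
from typing import Callable, Dict, List, Tuple

ChunkId = Tuple[str, int]                       # (user, slot)
Subscriber = Callable[[ChunkId, bytes], None]   # stand-in for a TCP connection


class ReplicaNode:
    """A node that replicates a DB and notifies subscribers of new writes."""

    def __init__(self) -> None:
        self.chunks: Dict[ChunkId, bytes] = {}
        self._subscribers: Dict[str, List[Subscriber]] = {}

    def subscribe(self, user: str, callback: Subscriber) -> None:
        # The user's hub registers interest in that user's chunk writes.
        self._subscribers.setdefault(user, []).append(callback)

    def write_chunk(self, user: str, slot: int, data: bytes) -> None:
        chunk_id = (user, slot)
        self.chunks[chunk_id] = data
        # Push the new chunk down to every subscriber (e.g. the at-home hub).
        for callback in self._subscribers.get(user, []):
            callback(chunk_id, data)
```

A hub could subscribe with `node.subscribe("alice", hub_storage.__setitem__)`, where `hub_storage` is a dict standing in for its local disk, so every write through the node lands at home.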
In addition, we would extend the StackerDB system to allow downstream clients to register themselves as origins for chunk data. Then, the Gaia hub would open a persistent TCP connection to one or more public Stacks nodes that replicate the hub’s StackerDB, and register itself as the origin for the user’s chunks. Then, when a user asks a Stacks node for a particular file, that node would simply forward the request to the hub and ferry back the data it returns (just as the rail would do).
If we implement the Gaia rail this way, then we open up a way to fund Gaia operation by Stacks node operators: for a fee paid in STX, they would agree to run a StackerDB replica for a user’s Gaia hub. The node operator does not store data persistently; it just provides transit for the user’s reads and writes to and from their (NAT’ed) Gaia hub. The StackerDB chunk store acts as both a write-back cache for a user’s Gaia writes, and a read cache for other users’ Gaia reads.
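The two caching roles can be sketched together. In this toy model the node answers reads from its cache when it can, falls back to the hub (the registered origin) on a miss, and buffers writes until they are flushed to the hub. The `TransitNode` class, its callbacks, and the explicit `flush` step are assumptions for illustration; nothing here is persisted, which matches the idea that the operator only provides transit.

```python
from typing import Callable, Dict, Optional


class TransitNode:
    """In-memory read cache plus write-back buffer in front of a user's hub."""

    def __init__(self,
                 origin_read: Callable[[str], Optional[bytes]],
                 origin_write: Callable[[str, bytes], None]) -> None:
        self._origin_read = origin_read      # forwards a read to the hub
        self._origin_write = origin_write    # pushes a write to the hub
        self.read_cache: Dict[str, bytes] = {}
        self.pending_writes: Dict[str, bytes] = {}

    def read(self, key: str) -> Optional[bytes]:
        if key in self.read_cache:
            return self.read_cache[key]      # cache hit: the hub is not contacted
        data = self._origin_read(key)        # cache miss: ask the origin
        if data is not None:
            self.read_cache[key] = data
        return data

    def write(self, key: str, data: bytes) -> None:
        self.pending_writes[key] = data      # buffered until the next flush
        self.read_cache[key] = data          # readers see the new value at once

    def flush(self) -> int:
        """Push buffered writes to the hub; return how many were sent."""
        sent = 0
        for key in list(self.pending_writes):
            self._origin_write(key, self.pending_writes.pop(key))
            sent += 1
        return sent
```

If the hub is offline, writes simply stay in `pending_writes` until `flush` is called after it reconnects, which is the write-back behavior described above.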