Help Wanted Makers: Gaia Administration Application

jude · November 27, 2018, 3:35pm

Gaia Administration Panel

Project Brief

Looking to help grow the Blockstack ecosystem? We need a Web front-end to help users set up their Gaia hubs. The app would serve as a front-end to Gaia’s new administrative API, and let the user do things like configure their Gaia hub’s backend storage settings and access controls. This is an intermediate difficulty Blockstack project.

Background

Gaia hubs are configured by means of a configuration file, as well as some environment variables. These settings control aspects of the hub’s behavior, such as which driver it uses (including which credentials), which public keys are allowed to write to it, and how many social proofs are required of the writer. These fields need to be configurable at runtime.

Configuring a Gaia hub at runtime today requires ssh-ing into the hub’s host, editing the config file and environment variables, and restarting the Gaia hub. This is well and good for experienced developers and power users, but is not an option for everyone else. To alleviate this, Blockstack PBC created a Gaia administrative side-car: a RESTful API endpoint for reading and writing the configuration file, and for instructing a co-located Gaia hub to reload itself. This paves the way for client-side tools and Web apps for administrating a Gaia hub, which makes doing so much more approachable for general audiences.

Project Requirements

The Web interface for administrating a Gaia hub could be implemented as a stand-alone Blockstack app. The challenge in doing so is that the user must be able to use this app without a functioning Gaia hub – the user may use this app to bootstrap a Gaia hub for other Blockstack apps to use.

On first page-load, the app would need to prompt the user for:

The URL for accessing the administrative side-car
The API key used to interact with its administrative side-car
A password for encrypting the above information. This password should be used for encrypting all Gaia hub information stored in local storage (see below).

This information should be stored encrypted in local storage, so the user can access it in the event they cannot reach their Gaia hub.
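As a rough illustration of the password step, here is a minimal Python sketch of deriving an encryption key from the user's password. It only covers key derivation; the encryption itself would be done with a vetted authenticated cipher (for example AES-GCM from the browser's Web Crypto API), which is not shown. The function names, the iteration count, and the salt size are assumptions for illustration, not part of the brief.

```python
import hashlib
import os

# Assumed work factor; tune it for the devices the app targets.
PBKDF2_ITERATIONS = 200_000


def new_salt() -> bytes:
    """Return a fresh random salt. It is not secret and is stored next to the ciphertext."""
    return os.urandom(16)


def derive_key(password: str, salt: bytes, iterations: int = PBKDF2_ITERATIONS) -> bytes:
    """Stretch the password into a 32-byte key using PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
```

The same password and salt always give the same key, so the app only needs to persist the salt and iteration count alongside the encrypted URL and API key, never the password or the key.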

Once the user loads in their side-car URL and API key (or provides the password to load and decrypt a saved URL and API key from local storage), the app would need to load the Gaia hub’s configuration from the administrative side-car. If the URL or API key is invalid, then the Web app should show a suitable error page indicating this.
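A sketch of that load step in Python, for illustration only. The `/v1/admin/config` path and the `bearer` authorization scheme are assumptions here and should be checked against the side-car's README; the status-code mapping and the function names are likewise hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Optional


def build_config_request(sidecar_url: str, api_key: str) -> urllib.request.Request:
    """Build the GET request for the hub configuration (path and auth scheme are assumed)."""
    url = sidecar_url.rstrip("/") + "/v1/admin/config"
    return urllib.request.Request(url, headers={"Authorization": "bearer " + api_key})


def describe_failure(status: Optional[int]) -> str:
    """Turn an HTTP status (or None for a network failure) into a user-facing message."""
    if status is None:
        return "Could not reach the side-car. Check the URL."
    if status in (401, 403):
        return "The side-car rejected the API key."
    return "The side-car returned HTTP %d." % status


def load_config(sidecar_url: str, api_key: str, timeout: float = 10.0) -> dict:
    """Fetch and parse the configuration, raising RuntimeError with a readable message."""
    request = build_config_request(sidecar_url, api_key)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:  # must come before URLError, its parent class
        raise RuntimeError(describe_failure(err.code)) from err
    except urllib.error.URLError as err:
        raise RuntimeError(describe_failure(None)) from err
```

Separating the bad-URL case from the bad-key case lets the error page tell the user which of the two inputs to fix.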

The configuration for a Gaia hub is a JSON blob. At a minimum, the app needs to provide editable fields for each supported Gaia hub configuration field. The list of fields can be found in the Gaia hub GitHub repository here.

The app needs to give the user the ability to save the configuration fields and reload the Gaia hub. The app may present these as separate actions – if so, then the app should by default reload the Gaia hub whenever the user saves any changes. When the user reloads the Gaia hub, the app must test that the Gaia hub is working correctly with the new settings by attempting to store a randomly-generated file to it. If the app fails to do so, it must report to the user that the configuration changes did not take effect.
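The save, reload, and verify sequence could be sketched like this in Python. The three callables (`save`, `reload`, `write_file`) are hypothetical stand-ins for whatever the app uses to talk to the side-car and the hub; only the ordering and the failure reporting are the point.

```python
import os
from typing import Callable, Tuple


def apply_config(new_config: dict,
                 save: Callable[[dict], None],
                 reload: Callable[[], None],
                 write_file: Callable[[str, bytes], None]) -> Tuple[bool, str]:
    """Save the config, reload the hub, then prove the hub works with a test write."""
    try:
        save(new_config)
        reload()
    except Exception as err:
        return False, "could not save or reload: %s" % err

    # Random name and body, so the probe cannot collide with real user data.
    probe_name = "probe-" + os.urandom(8).hex()
    probe_body = os.urandom(32)
    try:
        write_file(probe_name, probe_body)
    except Exception as err:
        return False, "configuration changes did not take effect: %s" % err
    return True, "hub accepted a test write"
```

Returning a flag plus a message, rather than raising, makes it straightforward for the UI to show the failure and offer to restore an earlier configuration.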

To avoid rendering the Gaia hub inaccessible if the user accidentally supplies bad configuration information, the app must store backups of the Gaia hub configuration in local storage. The app must store at least the last known-good configuration, but may store additional backups (such as the last 10 known-good configurations). These backups must be encrypted, since they contain sensitive information like cloud storage API credentials. For the same reason, the app must provide a way for the user to clear them out.
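One way to picture the backup requirement is a small bounded history, sketched here in Python. The class and method names are hypothetical, and encrypting the entries before they are written to local storage is deliberately left out of this sketch.

```python
import copy
from collections import deque
from typing import Optional


class ConfigBackups:
    """Keeps the most recent known-good configurations, oldest dropped first."""

    def __init__(self, limit: int = 10):
        self._items = deque(maxlen=limit)

    def record_known_good(self, config: dict) -> None:
        # Deep copy so later edits in the UI cannot alter the stored backup.
        self._items.append(copy.deepcopy(config))

    def last_known_good(self) -> Optional[dict]:
        return copy.deepcopy(self._items[-1]) if self._items else None

    def clear(self) -> None:
        """User-triggered wipe of every stored backup."""
        self._items.clear()

    def __len__(self) -> int:
        return len(self._items)
```

A configuration would only be recorded after the hub has passed its test write, so that `last_known_good()` is always safe to restore.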

Project Execution

Blockstack PBC has already implemented a back-end administrative side-car and test suite in the develop branch of Gaia (code here). The side-car daemon can be used today to remotely administrate a Gaia hub via a standardized RESTful API. Blockstack PBC is also in the process of producing sample scripts, Docker files, and VM images for deploying a Gaia hub and its administrative side-car together.

What is missing is a Blockstack-powered Web app for sending commands to a Gaia hub’s administrative side-car. Since Blockstack users own their data, they should also control the means of storing it – this means giving users the ability to run and administrate their own Gaia hubs. I think building out the user interfaces for this ought to be a community-driven effort, so users have the most say in how this gets implemented.

moxiegirl · November 27, 2018, 3:51pm

moxiegirl · November 27, 2018, 3:53pm

patrick · December 1, 2018, 11:13pm

Cool this could be an app for app mining

mikecohen.id · December 2, 2018, 3:41pm

Hi @Jude I’m well up for this challenge - in fact I’ve been building a Java based implementation of gaia hub over the past week or so as I am interested in the open-membership hub model for the art auction platform I’m working on. The plan here is a gaia hub backed by memcached that is configurable from the dapp which invalidates cached objects on write/store. I will take a closer look at the spec now.

retired_user · December 3, 2018, 5:21pm

Hello.

An HTTP server is a Gaia hub as long as it implements the same API and honors the same auth protocols. I checked with @jude on this.

We also welcome alternatives to gaia. gaia does not require blockstack and blockstack apps do not require gaia. If you have design implementations where the minimum implementation might expand beyond the gaia specification, or ideas that deviate from gaia altogether, this is also fine.

Curious, why Java? Did you mean javascript? Either way, I am curious what advantages you are looking to obtain from one language implementation over another. Always a good conversation to have for any design implementation, and I learn a lot that way.

Can you clarify what the use of memcache is for? Presumably to increase performance, but can you address exactly what performance issues this would solve for gaia use as it stands?

I can see how memcache might expedite reads. Is there a limitation or latency you are looking to overcome? Is there a method for memcached to identify when a write has changed data it has cached for reads, so that memcached can stay up to date?

memcache has had many vulnerabilities in the past, and having a clear technical specification for how memcached would be used and properly sandboxed might be useful.

Adding as few layers as possible unless there is a clear need/solution to be had by implementing layers could be useful. For now, there are options like utilizing load balancers in tandem with an ssl endpoint on a cloud host provider who will already be implementing optimized read/writes.

For individual hosters, it will be interesting to see what needs arise as the ecosystem for local gaia hub hosters evolves and how we can support that.

Addressing what qualifies as a gaia hub, and also thinking about performance optimization are really awesome points to bring up, so thank you.

I am not sure I understand how your ideas are related to the Gaia Administration panel, can you elaborate?

mikecohen.id · December 3, 2018, 6:37pm

Hi - thanks for the detailed response.

I was implementing a gaia hub partly out of curiosity and also in response to reading the forums generally about gaia for example amongst other similar comments. I also have an early blockstack identity that is linked to the dropbox storage driver via an early version of blockstack js and it’s been an itch to figure out the underlying problems with dropbox as backend user storage vs say S3.

I did mean java - but only for implementing a cloud hosted implementation of gaia hub. The admin console ui would be a vue js application.

Yes as you say I was thinking about speeding up reads but was thinking off the shelf cloud caching would not necessarily respect invalidation on ‘write’ very effectively which is how I came to memcache - however I wasn’t aware of the potential vulnerabilities so maybe redis is a better choice there?

My app development domain is online auctions where cache invalidation is a critical concern for a bidding service. My thinking about this is therefore biased toward looking for high performance reads but with transactional guarantees against serving stale data, as this can kill a bidding platform. I take your point about not reinventing wheels, but the invalidation aspect seems worth it, especially when using docker to deploy microservices, where use of say docker swarm or kubernetes means infrastructure as a service can still be leveraged?
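To make the invalidate-on-write idea concrete, here is a minimal single-process Python sketch. It uses plain dictionaries in place of memcached or redis and a storage driver, and the class name is hypothetical; it ignores concurrency and multi-node consistency, which are the hard parts in practice.

```python
class InvalidatingCache:
    """Read-through cache that drops an entry whenever its path is written."""

    def __init__(self, backend: dict):
        self._backend = backend      # stand-in for the real storage driver
        self._cache = {}
        self.backend_reads = 0       # counts how often reads miss the cache

    def read(self, path: str) -> bytes:
        if path not in self._cache:
            self.backend_reads += 1
            self._cache[path] = self._backend[path]
        return self._cache[path]

    def write(self, path: str, data: bytes) -> None:
        self._backend[path] = data
        # Invalidate rather than update, so the next read refetches the stored value.
        self._cache.pop(path, None)
```

After a write, the next read of that path goes back to storage, so a stale bid is never served from the cache in this simplified setting.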

Yes, I saw Jude’s post just when I was thinking about how to configure the gaia hub I was implementing to make it flexible enough to support multiple storage drivers and multiple users. It strikes me as a difficult problem but one that’s crucial for realising the use case of users bringing their own storage to d-apps, and an area where I can maybe contribute to the overall effort. Let me know if you think this is off kilter as it will save me some time?

retired_user · December 3, 2018, 6:59pm

We are not trying to hardcode a solution per cloud provider. As I understand it, most cloud providers have hypervisors that provision virtual machines, which support/run various operating systems. Whether that operating system runs as host on your local physical machine or on a cloud somewhere, we have designed the gaia hub configuration to be the same.

The implementation details could be in the ssl terminus endpoint, which could coincide with an industrial grade load balancer if, for example, the app developer is hosting gaia hubs and needs to service heavy load, but perhaps an individual user hosting their own gaia hub on their local machine will just use a simple nginx.conf with Let’s Encrypt with their FQDN for ssl.

Ideally, different users for different gaia hubs will have their own needs for read/write performance, so adding default layers of memcaching might not make sense to internalize within the scope of gaia.

Also ideally, we want to modularize gaia hub services as much as possible so they are plug and play as much as possible for any operating system running on any vm for any cloud host provider. I also answered in more detail about this here:

Internally we call this our “cloud host agnostic” approach to all solutions we provide, and it is especially valid in the case of gaia hubs.

Dropbox
I’m curious what your issues were for using the Dropbox driver config, can you post github issues/bugs in the gaia github so we can take a look? When it comes to your uses for memcaching, Dropbox already implements their own internal solution for servicing requests optimally, and I’m curious how your implementation of memcache would overlay on top of this (see here: https://blogs.dropbox.com/tech/2018/03/meet-bandaid-the-dropbox-service-proxy/), but we will want to see the issues you ran into configuring your dropbox driver. Don’t hesitate to post github issues about these, we want to see them.

Admin Panel
In regards to the gaia admin panel, we are actually looking for a UI that supports servicing the admin side car in the gaia/develop branch (there is a readme for the sidecar). The goal is to give individual users a more intuitive way to configure their gaia hubs during or after onboarding (we are working on the UI flow for prompting this admin panel), so an average user isn’t having to log into a vm and use the sidecar with a command line.

Let me know if you have any more questions!

retired_user · December 3, 2018, 7:30pm

High performance reads are worth looking into, but technically out of the scope of the gaia panel. It would ultimately be best to not rely on any cloud host infrastructure, but the first goal of gaia is to provide choice for users, whether they are on local machines or on cloud host providers.

Optimizing the performance of reads being serviced from a docker compose that runs on any vm on any cloud is definitely worth looking into. For now, there are many options offered per cloud host providers, and developers needing to service high load/offer high performance reads are probably going to choose a cloud host provider that can provide the configurations they need.

It would be very cool to offer plug and play performance optimizations, but I would say even considering this for our team’s roadmap is closer to Q2 2019. First we are focusing on giving users choice for gaia hubs and exploring migrations. You are welcome to try a modular feature for memcached to accelerate reads and we would love to have participation and see pull requests for this.

I am still not sure why Java would be required for implementing a cloud host configuration based on my response, nor do I know of any Vuejs or memcached dependencies on Java.

mikecohen.id · December 4, 2018, 10:33am

Hi @retired_user, many thanks for the detailed response and sharing your thoughts.

The dropbox issue was way back: this time last year I managed to link my zone file for mikecohen.id to dropbox storage before support for this was removed from the browser. Larry Salibra told me how to update my zone file at the March Berlin hackathon to remove the dependency, but I decided to leave it till I had time to understand what was going on. I think things have moved on too much in the meantime for a bug report to be useful.

I’ll report back once I’ve digested the information above!

mikecohen.id · December 5, 2018, 11:32am

Hi @retired_user a quick question to double check my understanding about read_url_prefix. For hub.blockstack.org the read_url_prefix;

read_url_prefix: https://gaia.blockstack.org/hub/

goes to backend storage via the gaia hub. My experience with this so far is that the gaia hub is able to read from e.g. an S3 bucket since it has the api access creds, but this does not guarantee the data can be read directly from the S3 bucket url, as reads from S3 are by default not public, and so this depends on additional aws configuration of the user’s S3 bucket.

I think reads must get proxied via the gaia hub, but I just want to be clear that this is in line with how you guys see this evolving?

The example on read urls on github says for example

While the prefix of the read-from URL may change between the two

implies the user’s backend storage creds are stored in two different places, but the example continues with the same prefix myservice.org for both reads and writes:

https://myservice.org/read/1DHvWDj834zPAkwMhpXdYbCYh4PomwQfzz/0/profile.json

do you see ‘myservice.org’ as always pointing to a gaia hub to proxy - and in general that it would be the same service endpoint as the ‘store’?

moxiegirl · December 5, 2018, 11:00pm

mikecohen.id · December 7, 2018, 12:38pm

Just answering my question above, as I noticed the "The readURL parameter" section of the readme here is more explicit:

drivers will return read URLs which point directly at the written content

in saying that the default behaviour of the read url is to point directly to the storage provider (or a cdn version of it), which means my questions about caching (on the gaia hub) aren’t really relevant, as the reads aren’t expected to go via the hub. I guess it can work either way (under the user’s control) but it does seem to imply that the user also needs to be aware that settings on the storage provider need to guarantee the read url will work?
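For reference, my understanding of how a read url is put together, as a small Python sketch. The rule that it is the prefix, then the address, then the filename is an assumption based on the examples in this thread, and the function name is made up.

```python
from urllib.parse import quote


def build_read_url(read_url_prefix: str, address: str, filename: str) -> str:
    """Join prefix, address and filename; '/' inside the filename is kept as a path separator."""
    return read_url_prefix.rstrip("/") + "/" + address + "/" + quote(filename, safe="/")
```

With the prefix `https://myservice.org/read/`, the address `1DHvWDj834zPAkwMhpXdYbCYh4PomwQfzz` and the filename `0/profile.json`, this reproduces the example url from the readme. Whether that url is actually readable then depends on where the prefix points and how the storage provider is configured.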

retired_user · December 7, 2018, 6:46pm

If you are using a third party storage provider, they do need to have their own services and apis working if you are relying on their apis. In the case of utilizing an amazon s3 bucket, they would need to have the read url to the bucket working.

mikecohen.id · December 12, 2018, 11:43am

Hi @retired_user and @jude, just to let you know I’ve a prototype for this application up. Just need some feedback to check it’s in line with what you want before further dev and test.

The project is in github and the readme hopefully gives enough info to review.

The prototype is running here

markmhendrickson · December 21, 2018, 10:39am

Hi Mike! My apologies for leaving you hanging here. The team has been hustling to close out work for the year on a number of fronts, and I think we simply lost track of this thread.

Much thanks for providing this prototype as well! I have it on my list to try out soon with my own Gaia hub, and I’ll report back with what I find in terms of feedback, etc.

In the meantime, please let us know if you have any particular questions or thoughts. We’re looking to keep pushing forward with best Gaia hub support in general as the new year starts up.

retired_user · December 22, 2018, 3:50pm

Thank you for this. We did not lose track of this thread, but this request was put out in parallel with other projects that need to be finished before testing for this can be completed. You are free to have other people hosting their own gaia hubs from develop test out the panel or try it with different configurations. In general, the admin panel should work by enabling all of the api calls mentioned in the admin sidecar in the gaia/develop branch, so the testing would exercise all of the api calls in the develop branch at this point.

Thank you for being dedicated to this. We did not forget about it.