High-Impact Bug in Default Gaia Hub and Remedies

This past weekend, a lead developer on the Misthos project discovered that newly registered users could not log in to applications. This affected 124 users during an important announcement for that project.

This was the result of a misconfiguration in the default Gaia hub. I’d like to explain exactly what happened, what we’ve done to fix it, and how we’re ensuring this doesn’t happen again.

Root Cause of the Issue

A misconfiguration in the default Gaia hub caused the hub to respond to POSTs with non-CDN URLs. Usually, a Gaia reader uses the readUrlPrefix value from /hub_info to establish the URL to read from. However, our profile writing and zone file generation code in the Blockstack Browser uses the URL returned by the POST instead. As a result, zone files generated during this period contained a URL that differed from the one hub_info returned. This difference persisted from UTC 2018-06-21 14:00 to 2018-06-23 20:00 (54 hours) and affected the 124 names registered following Misthos’ testnet announcement. Our testing of the configuration change unfortunately did not catch this, because our tests only checked that the POSTed data could be read back through the hub_info read URL. They did not check the URL returned by the POST itself.
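
To make the mismatch concrete, here is a minimal Python sketch of the comparison involved. The hostnames, the address-like path segment, and the function name are all hypothetical and chosen only for illustration. The one idea taken from the incident is that a URL returned by a write should sit under the read prefix that hub_info advertises.

```python
def under_read_prefix(url: str, read_url_prefix: str) -> bool:
    """Return True if `url` sits under the read prefix advertised by hub_info."""
    # Normalize so "https://host/hub" and "https://host/hub/" behave alike.
    prefix = read_url_prefix.rstrip("/") + "/"
    return url.startswith(prefix)


# Hypothetical values: a CDN-backed read prefix and two URLs a hub might
# return for the same stored profile.
READ_URL_PREFIX = "https://cdn.example.com/hub/"
cdn_url = "https://cdn.example.com/hub/1ExampleAddress/profile.json"
non_cdn_url = "https://storage.example.com/hub/1ExampleAddress/profile.json"

print(under_read_prefix(cdn_url, READ_URL_PREFIX))      # True
print(under_read_prefix(non_cdn_url, READ_URL_PREFIX))  # False
```

In this sketch, a zone file built from the second URL would point at the same data but would no longer agree with what a reader derives from hub_info.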

Resolving the Issue for Users

We addressed this issue for the 124 affected users by shipping a hotfix of the Blockstack Browser (patch here, release here). This hotfix will attempt to write to a user’s configured Gaia hub in the event that the zone file points at an unknown URL. This fail-over behavior ensures that users are able to write their profile to their configured location, and it also allows their current zone file to resolve correctly (because the zone file points to the same underlying data, just not behind a CDN). Once users update to this patched version, they can use their existing names to log in to applications. We tested this by issuing a subdomain name with the “wrong” URL (aarontest2.id.blockstack) and confirming that we were able to log in to applications as normal.
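
The fail-over decision described above can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the actual patch: the function name, the lookup table of known prefixes, and every URL are hypothetical.

```python
from typing import Dict


def choose_write_hub(zone_file_url: str, configured_hub: str,
                     known_prefixes: Dict[str, str]) -> str:
    """Pick the hub to write a profile to.

    `known_prefixes` maps a read URL prefix to the write endpoint of the hub
    that serves it (a hypothetical lookup table for this sketch).
    """
    for prefix, hub in known_prefixes.items():
        if zone_file_url.startswith(prefix):
            return hub  # the zone file points at a hub we recognize
    # Unknown URL: fall back to the hub the user has configured.
    return configured_hub


# Hypothetical data for illustration.
KNOWN = {"https://cdn.example.com/hub/": "https://hub.example.com"}
CONFIGURED = "https://hub.example.com"

recognized = choose_write_hub(
    "https://cdn.example.com/hub/1ExampleAddress/profile.json", CONFIGURED, KNOWN)
unrecognized = choose_write_hub(
    "https://storage.example.com/hub/1ExampleAddress/profile.json", CONFIGURED, KNOWN)
print(recognized)    # https://hub.example.com
print(unrecognized)  # https://hub.example.com (via the fallback)
```

Both calls end at the same hub here, which mirrors the point made above: the "wrong" URL refers to the same underlying data, so writing to the configured hub keeps the existing zone file valid.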

In the long term, this highlights the need for our software to have a mechanism for fixing corrupted zone files (or simply updating them to point to a different Gaia hub). This will require adding support to the Blockstack Browser for issuing zone file updates (and allowing the subdomain registrar to broadcast those zone file updates on behalf of subdomain users). Work on this is ongoing.

Ensuring That This Doesn’t Happen Again

Looking ahead, there are two major steps we’ll take to make sure any issues like this are diagnosed and corrected:

  1. More unit tests for the Gaia deployment

  2. Regular end-to-end tests of new-user onboarding: creating an account and signing in to it

Both of the above can (and should) be automated. Unit tests have already been added to the Gaia codebase (see here). An automated test for #2 is ongoing work and should be completed within the month. Until the automated test is in place, we’ll commit to daily manual tests of this flow (keep your eyes peeled for regular new users called aaron_tests_dayN.id.blockstack).
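
As an illustration of the kind of unit test that closes the gap described in the root cause, here is a small Python sketch. It is not the test code added to the Gaia codebase. FakeHub, its fields, and the URLs are hypothetical stand-ins, and the hub is simulated in memory so the example runs without a network.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FakeHub:
    """In-memory stand-in for a hub (hypothetical; not the Gaia API)."""
    read_url_prefix: str          # what hub_info advertises
    write_response_prefix: str    # prefix of the URLs returned by writes
    files: Dict[str, bytes] = field(default_factory=dict)

    def store(self, path: str, data: bytes) -> str:
        self.files[path] = data
        return self.write_response_prefix + path

    def read(self, url: str) -> bytes:
        # Both prefixes resolve to the same underlying data.
        for prefix in (self.read_url_prefix, self.write_response_prefix):
            if url.startswith(prefix):
                return self.files[url[len(prefix):]]
        raise KeyError(url)


def readable_via_hub_info(hub: FakeHub) -> bool:
    """The earlier style of check: data is readable under the hub_info prefix."""
    hub.store("addr/profile.json", b"{}")
    return hub.read(hub.read_url_prefix + "addr/profile.json") == b"{}"


def write_url_matches_hub_info(hub: FakeHub) -> bool:
    """The additional check: the URL returned by the write uses that prefix too."""
    returned = hub.store("addr/profile.json", b"{}")
    return returned.startswith(hub.read_url_prefix)


good = FakeHub("https://cdn.example.com/hub/", "https://cdn.example.com/hub/")
bad = FakeHub("https://cdn.example.com/hub/", "https://storage.example.com/hub/")

print(readable_via_hub_info(bad))        # True: the earlier check passes anyway
print(write_url_matches_hub_info(bad))   # False: the added check flags the hub
print(write_url_matches_hub_info(good))  # True
```

The simulated misconfigured hub passes the read-back check and fails only the comparison of the write response against hub_info, which is why both assertions are worth keeping in a suite like this.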