How Blockstack could address Amazon S3 outages

jude · February 28, 2017, 8:17pm

Today, you may have noticed that some of your favorite sites experienced an outage. I noticed it when I went to download an ISO image from a GitHub repository, for example.

The reason is that Amazon S3 experienced an outage. Amazon S3 storage serves a large part of the Web’s content. One study from 2012 estimates that about 1% of all Internet traffic comes from S3 servers. I’ll bet it has climbed since then.

I was able to get my ISO image, but only because the developer was responsive and put copies on Dropbox and MegaUpload.

The lesson here is that when you host content, you should put it in multiple services so that an outage in one service does not affect availability from others. Blockstack addresses this by automatically mirroring all your data to the storage providers listed in your zone file, so that if you can access at least one service you can still get at your data.
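
To make the mirroring idea concrete, here is a minimal sketch of writing one blob to several providers and reading it back from whichever one still answers. The `InMemoryBackend` class and the `mirror_put` and `first_available_get` helpers are hypothetical names made up for illustration; they are not Blockstack's actual driver API.

```python
from typing import Dict, List


class InMemoryBackend:
    """Hypothetical stand-in for a storage provider (S3, Dropbox, local disk, ...)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.online = True
        self.blobs: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        if not self.online:
            raise ConnectionError(self.name + " is unavailable")
        self.blobs[key] = value

    def get(self, key: str) -> bytes:
        if not self.online:
            raise ConnectionError(self.name + " is unavailable")
        return self.blobs[key]


def mirror_put(backends: List[InMemoryBackend], key: str, value: bytes) -> List[str]:
    """Write to every backend; succeed if at least one accepted the write."""
    stored = []
    for backend in backends:
        try:
            backend.put(key, value)
            stored.append(backend.name)
        except ConnectionError:
            continue  # this provider is down, keep mirroring to the rest
    if not stored:
        raise RuntimeError("no storage backend accepted the write")
    return stored


def first_available_get(backends: List[InMemoryBackend], key: str) -> bytes:
    """Read from the first backend that is reachable and has the key."""
    for backend in backends:
        try:
            return backend.get(key)
        except (ConnectionError, KeyError):
            continue
    raise LookupError("no storage backend could serve " + key)


s3 = InMemoryBackend("s3")
dropbox = InMemoryBackend("dropbox")
disk = InMemoryBackend("disk")
backends = [s3, dropbox, disk]

stored_on = mirror_put(backends, "image.iso", b"iso bytes")
s3.online = False  # simulate the outage
recovered = first_available_get(backends, "image.iso")
```

The read path never needs to know which provider failed; it only needs one of them to answer.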

muneeb · February 28, 2017, 8:35pm

Yep, centralized services are not only bad for security, but they’re also bad for reliability.

I noticed the S3 outage when I couldn’t upload a screenshot to Slack, then tried uploading the same screenshot as a GitHub image and that didn’t work either. I was like “wait, how is image uploading broken on both Slack and GitHub today?”

It turns out they both use S3.

Failures are the norm and not the exception:
We should assume that individual service providers will go down and design systems to survive these failures. That’s the design principle that Blockstack follows. Individual storage backends are just dumb drives and, by default, we replicate to multiple storage providers. If one provider goes down, you can try the next one.

Need to separate data delivery from data authenticity:
Also, we separate data storage from checking whether it’s the correct data. Users can fetch data from anywhere and independently verify that they received the correct data, i.e., that the data was signed by the correct owner or that the hash of the data matches what they were expecting. This design choice has implications for the current outage. If S3 went down and our apps were not tied to S3 but could fetch images from some other source and verify that they downloaded the correct files, then the outage would go unnoticed.
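
As a rough sketch of separating delivery from authenticity, the snippet below tries a list of untrusted sources in order and accepts the first payload whose SHA-256 hash matches the expected value. The function name `fetch_verified` and the three fake sources are assumptions for illustration only; the actual Blockstack client may check signatures or hashes differently.

```python
import hashlib
from typing import Callable, Iterable, Optional


def fetch_verified(sources: Iterable[Callable[[], bytes]],
                   expected_sha256: str) -> Optional[bytes]:
    """Return the first payload whose SHA-256 hex digest matches, else None."""
    for fetch in sources:
        try:
            data = fetch()
        except Exception:
            continue  # source is down or unreachable, try the next one
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
        # Hash mismatch: corrupt or tampered data, keep looking.
    return None


# Hypothetical sources standing in for real storage providers.
def s3_down() -> bytes:
    raise ConnectionError("S3 is unavailable")


def tampered_mirror() -> bytes:
    return b"not the image you wanted"


def good_mirror() -> bytes:
    return b"screenshot bytes"


expected = hashlib.sha256(b"screenshot bytes").hexdigest()
result = fetch_verified([s3_down, tampered_mirror, good_mirror], expected)
```

Because the check happens on the client, it does not matter which mirror served the bytes, only that they hash to the expected value.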

Practical limitations:
There are obviously practical constraints like (a) how many replicas users will realistically have and (b) what happens if all replicas go down. First, disk space is the cheapest resource and adding storage replicas doesn’t cost much. Second, if S3 goes down and Google goes down and Dropbox goes down and BitTorrent goes down, then something catastrophic happened and you can take a break until at least one of your replicas comes back online!
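
A quick back-of-the-envelope calculation shows why a total outage is so unlikely. The sketch below assumes provider failures are independent (a real simplification, since providers can share infrastructure), and the availability figures are made-up illustrative numbers, not measurements of any real service.

```python
from typing import Iterable


def all_down_probability(availabilities: Iterable[float]) -> float:
    """Probability that every replica is down at once, assuming independence."""
    probability = 1.0
    for availability in availabilities:
        probability *= 1.0 - availability
    return probability


# Illustrative numbers only: one replica vs. two replicas at 99% availability each.
one_replica = all_down_probability([0.99])
two_replicas = all_down_probability([0.99, 0.99])
```

Under these assumptions each extra replica multiplies the chance of total unavailability by that replica's own downtime fraction.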

A real example:
Here is how this works on Blockstack.

$ blockstack lookup judecn.id

Gives his zone file:

$ORIGIN judecn.id
$TTL 3600
pubkey TXT "pubkey:data:04cabba0b5b9a871dbaa11c044066e281c5feb57243c7d2a452f06a0d708613a46ced59f9f806e601b3353931d1e4a98d7040127f31016311050bedc0d4f1f62ff"
_file URI 10 1 "file:///home/jude/.blockstack/storage-disk/mutable/judecn.id"
_https._tcp URI 10 1 "https://blockstack.s3.amazonaws.com/judecn.id"
_http._tcp URI 10 1 "http://node.blockstack.org:6264/RPC2#judecn.id"
_dht._udp URI 10 1 "dht+udp://fc4d9c1481a6349fe99f0e3dd7261d67b23dadc5"

You can see that S3 is just one storage backend out of many and if Blockstack can’t find his data on the S3 URI, it’ll simply try the next one.
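
To illustrate how a client could turn a zone file like this into an ordered list of places to look, here is a minimal sketch that pulls out the URI records and sorts them by priority (lower value first, as with DNS URI records). It uses a subset of the records above; the helper names `uri_records` and `replica_urls` are hypothetical, and this is not how Blockstack itself parses zone files.

```python
import shlex
from typing import List, Tuple

ZONE_FILE = '''$ORIGIN judecn.id
$TTL 3600
_file URI 10 1 "file:///home/jude/.blockstack/storage-disk/mutable/judecn.id"
_https._tcp URI 10 1 "https://blockstack.s3.amazonaws.com/judecn.id"
_dht._udp URI 10 1 "dht+udp://fc4d9c1481a6349fe99f0e3dd7261d67b23dadc5"
'''


def uri_records(zone_text: str) -> List[Tuple[int, int, str]]:
    """Return (priority, weight, target) for every URI record in the text."""
    records = []
    for line in zone_text.splitlines():
        fields = shlex.split(line)  # handles the quoted target
        if len(fields) == 5 and fields[1] == "URI":
            records.append((int(fields[2]), int(fields[3]), fields[4]))
    return records


def replica_urls(zone_text: str) -> List[str]:
    """Targets ordered by priority; ties keep their zone-file order."""
    ordered = sorted(uri_records(zone_text), key=lambda record: record[0])
    return [target for _, _, target in ordered]


urls = replica_urls(ZONE_FILE)
```

A client would then walk `urls` from the top and stop at the first one that returns verifiable data.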

And yes, Jude’s profile is still resolving while S3 is down.

Taek · February 28, 2017, 10:28pm

@muneeb have you given more consideration to adding Sia to your list of providers? It is a much more cost-effective way of achieving even higher levels of reliability.

muneeb · February 28, 2017, 10:48pm

@Taek great to see you here! We’re happy to add drivers for storage backends. Our stance is that if users want to use a particular storage backend, they should be able to.

With limited engineering resources, we’re roughly ordering them by user demand. Please start a GitHub issue on Blockstack Core filed under discussion, and we’d be happy to chat more.

larry · March 1, 2017, 10:58am

Centralized services aren’t only bad for security and reliability; they’re also bad for business. They introduce counterparty risk: your ability to deliver on your promises to customers is constrained by the risk characteristics of your third-party suppliers.

Wow. I was wondering why I had trouble uploading photos from the venue before yesterday’s meetup started. I assumed I was having some sort of MITM wifi problem. It would have been a great point to bring up during the talk.

jude · March 1, 2017, 4:56pm

@Taek I just pushed a skeleton storage driver and accompanying documentation here.

patrick · March 1, 2017, 5:28pm

Awesome @jude! @Taek looks like you’re ready to get started! Let us know if you need anything else after you’ve checked out this -> https://github.com/blockstack/blockstack-core/blob/rc-0.14.1b/blockstack_client/backend/drivers/_skel.py