Decentralized Storage - Why I'm Skeptical

Great discussion yesterday in the [Blockstack Slack](https://blockstack.slack.com/archives/storage/p1446056224000190) about incentives and decentralized storage. I’ve replicated it here so that everyone can learn from it and join in the discussion.


Bedeho of Joystream writes in the Blockstack Slack:

Happy to be refuted on this one, want someone to change my mind:

@jude writes:

the elephant in the room is that decentralized storage (in particular, across the wide-area) is actually very hard to do compared to “centralized” storage (e.g. cloud storage, NFS, etc.), since there are way more failure modes to deal with. However, there have been decades of research into such systems from the best minds in computer science, and there are scalable decentralized storage systems that do work well. The reason you don’t see them in the wild is because they’re hard to commercialize, hard to deploy and manage at scale, and hard to run with good performance compared to cloud storage.

bedeho writes:

interesting. Not sure why you say that it’s hard to compare? Consumers care about cost, primarily, so if we keep other variables fixed (reliability, privacy, etc.), then are these research systems actually better?

@jude writes:

“reliability” and “privacy” are not fixed, nor are they really comparable between cloud storage and fully decentralized storage. so, it’s hard to say in an absolute sense which approach is “better.” It’s really a question of whether or not it’s the right tool for the job

bedeho writes:

ok, let’s say consumer dropbox is the job, and privacy is done through client side encryption with all keys with host, what is your feeling in that specific case?

@jude writes:

regardless of the storage medium, client-side encryption ensures that realistically the only person who can read the data is the client. However, the network traffic generated by uploading the data reveals information–i.e. an attacker will know that I sent something to dropbox. This problem is even worse with a decentralized system, where I might connect to many peers and send them chunks of data, and those peers would in turn replicate those chunks. This network traffic also reveals to someone watching the network who the storage peers are, and which of them store pieces of my file.

from a reliability standpoint, dropbox is an all-or-nothing proposition: either my upload works, or it doesn’t

it’s different in a decentralized system–sometimes, my upload might be a lot slower since some of the peers are slow to respond. Other times, the decentralized system might be even more available than dropbox, since dropbox might be throttled or blocked but the peers might not be.

also sometimes I can get a partial/corrupt copy of my file if some peers don’t serve back the chunks I sent them, or if they serve back the wrong chunks

so you see, reliability and privacy have totally different meanings in these two contexts
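
The partial or corrupt copy failure mode described above is usually caught by having the client record a hash of every chunk before handing it to peers. A minimal Python sketch, assuming SHA-256 digests and hypothetical helper names (`chunk_digests`, `verify_chunks`) that do not come from the discussion:

```python
import hashlib

def chunk_digests(chunks):
    """Record a SHA-256 digest for each chunk before handing it to peers."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]

def verify_chunks(expected, returned):
    """Return the indices of chunks that are missing (None) or do not match."""
    bad = []
    for i, digest in enumerate(expected):
        chunk = returned[i] if i < len(returned) else None
        if chunk is None or hashlib.sha256(chunk).hexdigest() != digest:
            bad.append(i)
    return bad

original = [b"chunk-0", b"chunk-1", b"chunk-2"]
digests = chunk_digests(original)

# Hypothetical scenario: one peer withholds its chunk, another serves wrong bytes.
served = [b"chunk-0", None, b"tampered"]
bad_indices = verify_chunks(digests, served)
print(bad_indices)
```

This only detects the problem. Recovering the file still needs replicas or erasure coding on other peers, which the sketch does not cover.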

bedeho writes:

ok, you seem even more bearish than me, I was actually willing to grant that all these issues were solved perfectly in a decentralized solution, to consider the best case scenario, seems like you are saying that the basic value prop is dubious, and on top of that, there are tons of extra problems.

feel like some proponents need to chime in on this and save the day

@jude writes:

I’m not bearish on this–there are cases where decentralized systems will beat the pants off of a centralized solution by orders of magnitude (e.g. a CDN or a bittorrent swarm can serve data much faster and to many more clients than a single server)

bedeho writes:

yes, this is specific to storage though, obviously bittorrent works very well, which is why it has existed for 15 yrs

@jude writes:

the point I’m trying to make is that decentralized storage systems are fundamentally different animals from centralized storage systems
I think the market indicates that a lot of money-making applications today simply do better with a centralized storage system

we don’t really have decentralized applications, for example–they might actually do better with a decentralized storage system, depending on how they use it

bedeho writes:

you could still totally make money from the system I described, it combines a server to coordinate and pay, and a distributed system for actual storage… lots of rents to collect there for the entrepreneur

@jude writes:

how do you prove that your remote peers are actually serving the data?

bedeho writes:

tit for tat

@jude writes:

i.e. how do you know you’re not paying them to just sit there and not serve it

bedeho writes:

A serves B with data, and they report back to my server when a piece has been sent, at which point I do a little payment in my database from B to A

@jude writes:

how do you know B isn’t lying on behalf of A?

bedeho writes:

no benefit, B is paying

@jude writes:

sorry–the point I’m trying to get at is that it looks very easy to game your proposal. i.e. to not serve data, but still get paid

bedeho writes:

I get your point, but it is no harder to solve than in the decentralized system. tit for tat works

@jude writes:

no, it does not. not even with bittorrent. there are still leechers

bedeho writes:

hehe, I am aware of that, making this -> joystream.co, but the point is that the underlying protocol for this would work with tit for tat where payment for service happens on each step, and lack of cooperation means no payment, regardless of who cheats.

this is really a side issue, I think I get the gist of what you are saying about distributed storage… if anything I feel more sceptical now :neutral_face:

@jude writes:

it’s hard, but not impossible

so with joystream.co, a reader only pays the seeder once they get the content?

bedeho writes:

leecher you mean?

if A downloads from B, then A pays B after each torrent piece using paychan

@jude writes:

how do you make sure A actually pays?

bedeho writes:

this is a good question, if you do one payment, it’s a prisoner’s dilemma, it will not work out, if you do iterated payments, you get an iterated prisoner’s dilemma, which generally works out, in particular if the horizon is uncertain

bedeho writes:

typical torrent file has 500-1000 pieces, so you verify integrity and pay for each one to get next

same arrangement would work for transmitting data which someone has stored on your behalf
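
The verify-then-pay loop described above can be sketched as a small simulation. This is a rough illustration, not Joystream's actual protocol: `run_exchange`, the price, and the piece contents are hypothetical, and payment is reduced to a counter rather than a real payment channel. SHA-1 is used because that is how BitTorrent hashes pieces.

```python
import hashlib

def run_exchange(pieces, piece_hashes, price, leecher_pays):
    """Simulate B serving pieces to A with one payment per verified piece.

    B sends a piece first. A checks it against the known hash and then pays.
    B refuses to send the next piece until the previous one is paid for.
    Returns (pieces A accepted, total paid to B).
    """
    received, paid = 0, 0
    for piece, expected in zip(pieces, piece_hashes):
        # A verifies integrity before paying anything.
        if hashlib.sha1(piece).hexdigest() != expected:
            break  # B cheated: A stops paying and leaves.
        received += 1
        if not leecher_pays:
            break  # A cheated: B sends nothing further.
        paid += price
    return received, paid

# Hypothetical 500-piece torrent with made-up piece contents.
pieces = [str(i).encode() for i in range(500)]
hashes = [hashlib.sha1(p).hexdigest() for p in pieces]

honest = run_exchange(pieces, hashes, price=1, leecher_pays=True)
cheater = run_exchange(pieces, hashes, price=1, leecher_pays=False)
print(honest, cheater)
```

In this model a non-paying A walks away with exactly one free piece, and a B that serves bad data loses all future payments, which is the tit-for-tat property being claimed.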

@jude writes:

I’m not seeing how B is guaranteed to get money for serving data

bedeho writes:

paying for actual storage, rather than just transmission, requires another set of protocols which I am not intimately familiar with - but it is supposed to work somehow

B is not guaranteed to get money, first piece is sent in good faith, but no more pieces are sent before payment is made https://en.wikipedia.org/wiki/Prisoner's_dilemma#The_iterated_prisoners.27_dilemma

so tit for tat

a few bytes of free service is not a big sacrifice

@jude writes:

it’s not, but if A is patient, then A will never have to pay, right?

bedeho writes:

how so

@jude writes:

A downloads from B, and does not pay B. B can try to block A, but then A can just pretend to be C, and do the same thing

bedeho writes:

sure, or more likely connect to D and try the same game

even leaving aside the insane latency issues that would cause, B should not transmit first piece until it’s clear that A has paid sunk cost of tx fee to setup paychan, as long as its greater than the price per piece, A has no incentive to switch to setup new paychan with D
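
The sunk-cost argument above comes down to a simple comparison. A minimal sketch, assuming A gets at most one free piece per channel; the function names and the satoshi amounts are hypothetical and do not come from the discussion:

```python
def cost_if_honest(num_pieces, setup_fee, price_per_piece):
    """A opens one channel and pays for every piece."""
    return setup_fee + num_pieces * price_per_piece

def cost_if_hopping(num_pieces, setup_fee):
    """A opens a new channel for every piece, takes it, and never pays."""
    return num_pieces * setup_fee

# Hypothetical amounts in satoshis, chosen only for illustration.
NUM_PIECES = 1000
PRICE_PER_PIECE = 100

expensive_setup = 10_000  # setup fee well above the price per piece
cheap_setup = 50          # setup fee below the price per piece

honest_expensive = cost_if_honest(NUM_PIECES, expensive_setup, PRICE_PER_PIECE)
hopping_expensive = cost_if_hopping(NUM_PIECES, expensive_setup)
honest_cheap = cost_if_honest(NUM_PIECES, cheap_setup, PRICE_PER_PIECE)
hopping_cheap = cost_if_hopping(NUM_PIECES, cheap_setup)

print(honest_expensive, hopping_expensive)
print(honest_cheap, hopping_cheap)
```

Under these assumptions, hopping between seeders only pays off when the setup fee drops below the price per piece, which matches the condition stated above.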

@jude writes:

then, what stops B from taking the money and not serving to A?

bedeho writes:

that is not possible by assumption, right? B gets payment after each unit of service

B goes first on each round of the game, so to speak

@jude writes:

either B gets paid before, or after, A gets the data. If B gets paid before, then B can simply not transmit. If B gets paid after, then A can simply not pay B. What you need is some sort of escrow service, it seems

bedeho writes:

B gets paid after giving service

@jude writes:

then a clever A never needs to pay

bedeho writes:

well that is what I described above

“even leaving aside the insane latency issues that would cause, B should not transmit first piece until it’s clear that A has paid sunk cost of tx fee to setup paychan, as long as its greater than the price per piece, A has no incentive to switch to setup new paychan with D”

@jude writes:

maybe I’m misinterpreting, but it sounds like the following:

  • A spends money to set up a pay channel for X btc
  • B sends the data to A
  • A sends value Y < X btc to B

bedeho writes:

yes

@jude writes:

then, it’s cheaper for A to skip step 3, no?

bedeho writes:

and then setup new paychan with someone else?

fee on doing that is worse than price per piece

by a large margin

@jude writes:

could B trick A into sinking money into the paychan, and then not serve data?

i.e. B stops at step 2?

@jude writes:

yes, excellent point

bedeho writes:

this is why you need multiway paychan :smile:

@jude writes:

also, A simply doesn’t pay for the last chunk

better than not paying at all, but A can still be dishonest (but probably not a big deal if there are lots and lots of chunks)
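
To put a number on "probably not a big deal": if A pays for every piece except the last, the unpaid share is one piece out of the total. A small sketch using the 500 to 1000 piece range mentioned earlier in the conversation; the helper name `max_unpaid_fraction` is hypothetical.

```python
from fractions import Fraction

def max_unpaid_fraction(num_pieces):
    """Share of the transfer A gets for free by skipping only the last payment."""
    return Fraction(1, num_pieces)

for n in (500, 1000):
    print(n, max_unpaid_fraction(n))
```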

bedeho writes:

classic paychan is 1to1, 1toN is best way to protect against this

unless the attacker can occupy all slots in paychan, A is fine

which is a serious amount of work for almost no benefit

@jude writes:

the fact that it’s possible is worrisome, though. The benefit could be nothing more than “for the lulz”

bedeho writes:

yes, defection on last round is also possible, and can lead to death spiral if players start anticipating this, but that is very unlikely to be an issue

@jude writes:

also, how do you guarantee it’s costly to set up 1-to-N? Can’t a single seeder pretend to be many different seeders?

and, doesn’t A pay for the entire paychan setup?

bedeho writes:

A pays entire cost, true

yes single seeder can try to occupy all spots (1-10) in multiway channel, but doing this with many peers for any long period of time without much payoff is unlikely to be very attractive

bottom line is however that it is an imperfect system, but still way better than no incentives or tit for tat barter

@Taek writes:

Wish I had seen this last night, have a lot to say. Obviously I think decentralized storage is fully viable. I’m busy most of today and tomorrow but hopefully soon I can jump in with some of my own points. Sia should easily be able to compete with centralized systems, the biggest challenge is managing the incentives and failure modes. I think we are well equipped to do so.

More specific technical points to come later.

Look forward to hearing more from @Taek and others.


@larry I’m glad you saved this conversation for posterity, as unfortunately our interesting conversations in the Slack group will get lost after a while.

We should keep doing this.


It seems like the core argument of the post is ‘Why hasn’t anyone done this in a centralized way?’. I do think that this is a space which hasn’t been explored heavily. ‘Airbnb for X’ is a new idea, and there have really only been about 3 huge successes (airbnb, uber, kickstarter) - the formula for doing it correctly is relatively poorly understood. The idea space is too new for larger companies to be doing more than getting their toes wet, especially for something like storage, which seems a little more ‘out there’ at first pitch to most people (from my experience).

Even more, cloud storage itself is super new. Larger corporations are just starting to really integrate with the cloud, let alone integrate with a crowd-sourced cloud. It shouldn’t be that surprising that collaborative technology and cloud technology haven’t been heavily investigated yet.

There are two technologies that explored this space before the MaidSAFE/Sia/Storj/Filecoin group - Synform and Spacemonkey. Neither group had monetary incentive. With Spacemonkey, you got cloud storage in return for running a server at home (in the shape of an external hard drive) that stored other people’s cloud data. With Synform, you got cloud data in return for hosting data, but that didn’t address the fact that people typically have vastly different supply and demand needs - Synform was useless to both people with lots of supply and little demand, and useless to people with lots of demand but little supply. Generally, there just weren’t that many people in the middle.

I also think you underestimate how useful decentralized money has been when it comes to facilitating crowd-based cloud storage. The barrier-to-entry for getting paid in money for something like uber or airbnb is very high. It makes sense when the incomes are hundreds or thousands of dollars, as is the case with uber drivers and airbnb hosts. With storage, incomes are substantially lower. This makes the barrier to entry for getting paid proportionally much higher, because there is less reward for going through the effort. Furthermore, payments are likely to be very low, which means the overhead incurred by payment processors is high.

Finally, a lot of the technology that allows you to trust foreign computers with unknown uptime and overall reliability is really only accessible when you are thinking about decentralization. Tools like ‘Proof-of-Capacity’ or concerns such as ‘Sybil-attack-resistance’ are much more present-focused when you are thinking in terms of decentralization, yet are important to centralized systems as well. It’s likely that the people who would be most interested in creating a crowd service for cloud storage were already thinking about decentralization.


Don’t forget Mojonation and Cryptosphere:

https://en.wikipedia.org/wiki/Mnet_(peer-to-peer_network)#MojoNation.


Yes, this is the crux of my skepticism, and I see it as a big problem. I guess it all hinges on how long you believe entrepreneurs will fail to take advantage of an obvious cost saving. I don’t think 10-15 years is a plausible period in highly competitive markets like cloud storage.

This is why I mentioned the PokerStars example. Micropayments are trivial so long as you allow them to happen in a silo. The only fees you incur are when users refill their accounts, which would be for a significant amount anyway ($10+), or when people want to cash out to fiat, which again is really only worth the effort when it’s a sizable amount. Is this perfect? No, but the delta between this and the ideal (which, by the way, cryptocurrency is not, since there are still fees to get in and out) hardly seems like what has held this back if it otherwise has significant cost savings.
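
A rough sketch of what such a silo could look like: internal transfers are plain database updates, and processor fees apply only on deposit and cash-out. The `SiloLedger` class and the flat 30-cent edge fee are hypothetical, chosen only to illustrate the point.

```python
class SiloLedger:
    """Internal balances in integer cents; processor fees apply only at the edges."""

    def __init__(self, edge_fee_cents):
        self.edge_fee_cents = edge_fee_cents
        self.balances = {}
        self.fees_paid = 0

    def deposit(self, user, cents):
        """Refill an account; the processor fee is taken out of the deposit."""
        self.fees_paid += self.edge_fee_cents
        self.balances[user] = self.balances.get(user, 0) + cents - self.edge_fee_cents

    def transfer(self, payer, payee, cents):
        """Micropayment inside the silo: a database update with no processor fee."""
        if self.balances.get(payer, 0) < cents:
            raise ValueError("insufficient balance")
        self.balances[payer] -= cents
        self.balances[payee] = self.balances.get(payee, 0) + cents

    def cash_out(self, user):
        """Withdraw the full balance to fiat, minus one processor fee."""
        balance = self.balances.get(user, 0)
        if balance <= self.edge_fee_cents:
            raise ValueError("balance too small to be worth cashing out")
        self.balances[user] = 0
        self.fees_paid += self.edge_fee_cents
        return balance - self.edge_fee_cents

ledger = SiloLedger(edge_fee_cents=30)  # hypothetical flat fee per deposit or cash-out
ledger.deposit("renter", 1000)          # a $10 refill
for _ in range(500):                    # 500 one-cent micropayments
    ledger.transfer("renter", "host", 1)
payout = ledger.cash_out("host")
print(ledger.balances["renter"], payout, ledger.fees_paid)
```

In this toy run the 500 micropayments cost two processor fees in total, one on the way in and one on the way out.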

I have a hard time thinking that engineers would fail to solve these issues due to a ‘state of mind’ associated with building centralized control systems. Client-server protocols where the counterparty cannot be trusted are ubiquitous; decentralized consensus is not the only domain that has these problems.
