I tend to look at decentralization as a solution to a three-pronged problem (with not all prongs being equal):
-Privacy
-Security
-Censorship-resistance
The privacy prong is the focus of this post. I do not want to use any third-party analytics. Currently, Graphite uses Matomo, an open-source and anonymous analytics provider. But I’d like to provide even more comfort and anonymity. This requires storing some data, of course. So, with that in mind, I’d like to get feedback from the community on a potential custom-built solution. The goal here is to store as little data as possible, with little to no identifying information.
Some background on WHY I am wanting to do this:
Currently, it’s possible to query all registered IDs with a username and check if they’ve ever logged into an app using multi-player storage. But, Graphite launched before it was free and easy to register a username with your Blockstack ID. So, there is a non-zero number of Graphite users that do not have a username and thus would not be queryable through the method mentioned above. To get accurate usage statistics, which are going to be necessary in proving out the viability of decentralized apps, I want to make sure I’m capturing ALL Blockstack users and not just those with a username associated with their ID, and I’d like to make sure I can accurately query active user counts.
You’ll see here the only thing that can be tied back to a user is the public key, and even then, doing so seems like it would require a significant amount of guessing/work. The public key would act as the unique identifier to make sure I am not double-tracking data points. And since Blockstack IDs are all publicly queryable now, I’m not sure using the public key is even necessary (but it does feel like an abstraction layer).
I’d love to get people’s thoughts and comfort level on this.
We do something very similar and store the data on Firebase. It would be awesome if we could create an account with multi-write-only permissions, if that makes sense.
For example, an account such as analytics.stealthy.id or analytics.graphite could have open write privileges. This would let us rely completely on Gaia to store the analytics data, encrypted with the analytics.* account’s public key.
This is definitely something that I’ve mulled over often before too, since stat tracking can be a real gateway drug to hoarding user data. Thanks for opening the discussion.
If the only purpose here is disambiguation, is there any reason you can’t just sign some arbitrary data with the user’s public key and use that for the unique key?
Personally I have no issue with storing anonymous usage statistics in private storage, so long as you have a good privacy policy that outlines what data you keep and you adhere to it.
I’ll agree with this one.
The public key itself can be used to track particular users and the applications they use, which is not good for a privacy-focused ecosystem.
Nope, no reason I can’t do that. In fact, that’s a much better anonymity solution. I flat-out don’t care WHO is using the app. I just care that people are using it.
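The disambiguation idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: Python's standard library has no ECDSA, so HMAC-SHA256 stands in for a real signature, `user_secret` stands in for key material that only the user's client holds, and `APP_LABEL`, `pseudonymous_id`, and `record_visit` are hypothetical names that are not part of Blockstack or Graphite.

```python
import hashlib
import hmac

# Hypothetical fixed label, one per app, so IDs differ between apps.
APP_LABEL = b"graphite-analytics-v1"


def pseudonymous_id(user_secret, app_label=APP_LABEL):
    """Derive a stable identifier from key material held by the user's client.

    HMAC is a stand-in for signing the label with the user's key: the same
    user always yields the same ID, but the ID does not reveal the key.
    """
    return hmac.new(user_secret, app_label, hashlib.sha256).hexdigest()


def record_visit(seen_ids, user_secret):
    """Add the user's pseudonymous ID to seen_ids; return True if it is new."""
    uid = pseudonymous_id(user_secret)
    is_new = uid not in seen_ids
    seen_ids.add(uid)
    return is_new


# Usage: the analytics store only ever sees the derived IDs.
seen = set()
record_visit(seen, b"alice-app-key")  # first visit, counted
record_visit(seen, b"alice-app-key")  # repeat visit, not double-counted
```

Because the label is app-specific, the same user would get unrelated identifiers in different apps, which speaks to the cross-app tracking concern raised above.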
I agree that we, as a community, should try and find the right answer to this problem that every app runs into. My main question goes along with this point:
Personally I have no issue with storing anonymous usage statistics in private storage, so long as you have a good privacy policy that outlines what data you keep and you adhere to it.
What are the reasons to store analytics in a decentralized manner, rather than privately and anonymized? The only thing I can think of is that anyone could access that data and run interesting statistics. Then the community could come out with interesting data visualizations. And Blockstack could use this to calculate and share global usage numbers.
We could still achieve that by making our statistics public. The only reason to make it decentralized is to ensure that nobody’s doctoring the numbers. But I think for most products, that’s an unnecessary concern.
This approach sounds like an idea I had been tinkering with: building a decentralized Mixpanel where users own all of their analytics data in their Gaia hubs. The way it would work is that the application client would write any analytics state it needed to gather to a well-known path like .analytics.json, and the developer would run a crawler that periodically crawled user Gaia buckets to find their app’s .analytics.json file in order to build up a global view of app-specific activity measurements (e.g. time spent logged in, number of documents shared, etc.).
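A minimal sketch of that write-then-crawl flow, under loud assumptions: a plain dictionary stands in for the users' Gaia buckets (no real Gaia API is called), and the bucket URLs, field names, and the `write_analytics` and `crawl` helpers are all hypothetical.

```python
import json

ANALYTICS_PATH = ".analytics.json"  # the well-known path from the post


def write_analytics(buckets, bucket_url, stats):
    """Client side: store the app's analytics state in the user's own bucket."""
    buckets.setdefault(bucket_url, {})[ANALYTICS_PATH] = json.dumps(stats)


def crawl(buckets, bucket_urls):
    """Developer side: visit each known bucket and build a global view."""
    totals = {"users": 0, "minutes_logged_in": 0, "documents_shared": 0}
    for url in bucket_urls:
        raw = buckets.get(url, {}).get(ANALYTICS_PATH)
        if raw is None:
            continue  # file deleted, never written, or reads denied
        stats = json.loads(raw)
        totals["users"] += 1
        totals["minutes_logged_in"] += stats.get("minutes_logged_in", 0)
        totals["documents_shared"] += stats.get("documents_shared", 0)
    return totals


# Usage with an in-memory stand-in for two users' buckets.
demo = {}
write_analytics(demo, "hub.example/alice", {"minutes_logged_in": 30, "documents_shared": 4})
write_analytics(demo, "hub.example/bob", {"minutes_logged_in": 12})
summary = crawl(demo, ["hub.example/alice", "hub.example/bob", "hub.example/carol"])
```

The crawler simply skips buckets where the file is missing, so a user who removes it drops out of the totals without breaking the crawl.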
As others have pointed out, there are a few “interesting” properties that this approach could provide that don’t exist in centralized analytics systems today:
-If the user runs their own Gaia hub, they can delete the .analytics.json file, they can have their hub deny writes to it, and they can have their hub deny reads to it. Users have unilateral discretion over what gets shared.
-Since the data lives in the user’s Gaia hub, the user can also view their own activity in the app (in order to know exactly what is being collected), as well as other users’ activities. There is no longer a single central panopticon, since for better or worse anyone can be a panopticon.
-The user can monitor their own behavior across all the apps they use that employ this scheme.
As for preserving privacy, the app could encrypt and sign the .analytics.json file with the app developer’s private key. This would prevent any app-specific analytics from being exposed. However, the presence of .analytics.json in the user’s Gaia hub (and whether or not the contents change over time) would indicate that the user is an app user and that the user has used the app at particular points in time. This would, as @wbobeirne points out, make it hard to fake user numbers and make it easy to get accurate user engagement counts without exposing app-specific sensitive data (something that can help with app rewards mining, as described in the whitepaper).
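To illustrate how engagement could be counted without reading the contents, here is a rough sketch in which an observer treats each .analytics.json as an opaque blob and only checks whether it appeared or changed between two crawls. The `snapshot` and `active_users` helpers and the bucket names are hypothetical, and the byte strings stand in for ciphertext produced by whatever encryption the app uses.

```python
import hashlib


def fingerprint(blob):
    """Hash an opaque (encrypted) blob so only the digest needs to be kept."""
    return hashlib.sha256(blob).hexdigest()


def snapshot(blobs):
    """Map each bucket URL to the fingerprint of its current analytics blob."""
    return {url: fingerprint(blob) for url, blob in blobs.items()}


def active_users(previous_snapshot, current_blobs):
    """Return bucket URLs whose blob is new or has changed since the last crawl."""
    active = set()
    for url, blob in current_blobs.items():
        if previous_snapshot.get(url) != fingerprint(blob):
            active.add(url)
    return active


# Usage: bob's file changed and carol's is new, so both count as active.
last_crawl = snapshot({"alice": b"ciphertext-1", "bob": b"ciphertext-2"})
this_crawl = {"alice": b"ciphertext-1", "bob": b"ciphertext-3", "carol": b"ciphertext-4"}
active = active_users(last_crawl, this_crawl)
```

Nothing here decrypts anything, which is the point: the activity signal comes from the file's existence and churn alone.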
Regarding the privacy concern as to exposing which users use which apps, one option we’re thinking about is to let the user encrypt parts of the apps field in their profile, so that only privileged users can discover which apps they’re using without breaking multi-reader storage. For example, Stealthy users Alice and Bob can expose their multi-reader apps field for stealthy to each other, but no one else. This would have the side-effect of making it intractable to link a Gaia app address to a user, and thus make it intractable to track the user using the above approach (with the downside that the user has to manage which other users can see their apps).
This is a really great idea! The only concern I have is one that I think comes from outside the decentralized bubble we are operating in. I want, very much, to live in a world where people just use apps and if they want to expose their analytics data to an app they can. But if users have the ability to delete the file necessary to track basic usage, apps will be missing one of the key components that will ultimately convince the rest of the world to use the app:
Social Proof
I assume it would be infrequent for a user to A) run their own Gaia hub and B) delete the analytics file. But if that happened, you would no longer have an accurate user count. But maybe it’s close enough?
Just thinking through the challenges of convincing the rest of the world (not those already in this forum) that decentralized apps are viable. As any developer on this forum can tell you, the first question investors, journalists, and potential customers ask is how many people are using the app.