I encountered a couple of difficulties working with avatar and cover photos while putting together Nametiles.
Performance
When users register with Onename, their profile & cover images are uploaded to Amazon S3 (appears to be the US East coast data center). This data center is really slow in my part of the world (Hong Kong).
To get more consistent performance worldwide, I proxy certain avatar & cover photo URLs so that they can be cached at our CDN’s edge servers.
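As a minimal sketch of that kind of proxying, a resolver can rewrite image URLs on known slow origins to point at an edge-cached proxy endpoint. The host list and CDN prefix below are hypothetical placeholders, not anything Nametiles actually uses:

```python
from urllib.parse import urlparse, quote

# Hosts we know perform poorly from some regions -- illustrative assumption
PROXIED_HOSTS = {"s3.amazonaws.com", "onename.s3.amazonaws.com"}
# Hypothetical edge-cache endpoint
CDN_PREFIX = "https://cdn.example.com/proxy/"

def proxy_url(url: str) -> str:
    """Rewrite image URLs on known slow origins to go through the CDN
    proxy; leave all other URLs untouched."""
    host = urlparse(url).netloc.lower()
    if host in PROXIED_HOSTS:
        # Percent-encode the whole origin URL so it survives as a path segment
        return CDN_PREFIX + quote(url, safe="")
    return url
```

The edge servers then cache responses keyed on the encoded origin URL, so repeat lookups never leave the CDN.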
At some point people are going to link not only to resources that perform poorly in certain parts of the world, but to resources that might not be accessible at all.
What’s the best way to approach this?
Format
Most avatar photos are square, but some aren’t. Some have HUGE file sizes. Services consuming passcard profile data are going to have to do a fair amount of normalization and processing to generate usable assets. Do there need to be more constraints on file format? Would a 1000000x1 photo be a valid avatar? What about a 200 megabyte bmp or tiff?
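To make the question concrete, here is a sketch of the kind of consumer-side check a service might run before accepting an avatar. The size and dimension caps are made-up illustrative numbers, not a proposed spec; the PNG dimensions are read straight from the IHDR chunk:

```python
import struct

MAX_BYTES = 1 * 1024 * 1024   # 1 MB cap -- illustrative, not a spec
MAX_DIMENSION = 4096          # rejects 1000000x1-style degenerate images

def validate_avatar(data: bytes) -> bool:
    """Reject avatars that aren't PNG/JPEG, exceed the byte cap, or
    (for PNG) have absurd dimensions."""
    if len(data) > MAX_BYTES:
        return False
    if data[:3] == b"\xff\xd8\xff":         # JPEG magic bytes
        return True                          # dimension check omitted for brevity
    if data[:8] == b"\x89PNG\r\n\x1a\n":    # PNG signature
        # IHDR follows the signature: 4-byte length, b"IHDR", then
        # big-endian width and height at offsets 16 and 20
        width, height = struct.unpack(">II", data[16:24])
        return width <= MAX_DIMENSION and height <= MAX_DIMENSION
    return False                             # bmp, tiff, etc. rejected
```

A 200-megabyte tiff fails the size check before any parsing happens, and the 1000000x1 PNG fails the dimension check.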
Serving images over CDNs is something we can certainly do. In fact, @jude can help with this. Through Jude / Larry Peterson we have access to a couple of large CDN providers and can get this set up.
That’s great - however I think it’s better if we come up with either a decentralized solution or a set of best practices and format constraints that let people building on passcard reliably cache/proxy all of these assets locally.
Why? Some examples:
Onename may host assets on a CDN, but Acme Registrar may not… users are going to blame a service for being slow and aren’t going to make the connection that it’s because they picked Acme Registrar instead of Onename.
Your CDN may work great in some countries but have horrible performance elsewhere, be totally inaccessible in China, etc.
People building businesses on Passcard aren’t going to want their users making a ton of requests to 3rd party servers on each page load - both for performance (i.e. end-user experience) reasons and to avoid leaking competitive data.
Should we add things to the schema specification like:
cover->url will point to an image of type png or jpg that fits within X file-size constraints, Y resolution constraints, etc.?
The idea would be that if profiles and the assets they link to don’t conform to the spec, resolvers and services consuming profile data could ignore them.
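A schema fragment along those lines might look something like the following - the field names and limits here are hypothetical, purely to illustrate the shape such constraints could take:

```json
{
  "cover": {
    "url": {
      "formats": ["image/png", "image/jpeg"],
      "maxBytes": 2097152,
      "maxWidth": 2048,
      "maxHeight": 2048
    }
  }
}
```

A resolver could then drop any profile whose linked assets fall outside the declared bounds, rather than trying to normalize them.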
The idea about handling it at the specification layer is interesting. We can probably do thumbnails like this. I’d separate out the CDN concern from this specific problem. We’ll get into using a CDN sooner or later anyway, especially for images. It’s great that you’re in Hong Kong and can experience these issues and point them out!
So there are two questions here:
a) How to use CDNs when there could be different providers. I think CDNs can be application-specific and not provider-specific, e.g., Mine is interested in the performance of their application and can try to provide a CDN for their data, regardless of the provider a user used to sign up. Does that make sense?
b) Should we include some specific requirements in the schema to help make image lookups faster? I think it can work if we enforce it strictly. But do we want to enforce it strictly? One of the design principles of HTML/the WWW was being generous in forgiving badly formatted pages. Would love to see more discussion on this.
My questions about format aren’t directly connected to image performance.
The schema should answer questions like:
What’s the maximum number of characters in a bio? Can it be a 100k-character autobiography? Can it include HTML? Can it include markup?
Can an avatar be an 80-megapixel panoramic picture? Can it be an SVG?
If we don’t do this, different users and providers of passcard services will (out of necessity) end up making their own decisions about this.
Arguably the design lesson from HTML/the WWW is to make the specification clear - 2000-2010 saw lots of money spent trying to make websites that worked on various versions of IE and other browsers that behaved totally differently.
Building products toward the end of that period, the discussion was always “it will cost X time/money for your web app and 2 * X if you want to also support IE6”.
A better parallel is JPEG (or GIF or PNG): your image is either JPEG and it works, or it isn’t JPEG and a program that expects only JPEG will say “sorry, this isn’t JPEG, I can’t open it”.
I understand the benefit of keeping a specification vague since we don’t know exactly how it’s going to be used, but the downside is that resources spent developing products will go toward making the data usable. Web browsers are incredibly complex (which is why we only have a couple of them) in part because the web is full of loosely structured data.
A specific example of this can be seen in avatars: 99% of passcard avatars are square. One could either a) assume all passcard avatars are square when using them (this is what the old openname.org did), and when a non-square avatar appears (+arianna’s comes to mind), the page layout breaks, or b) spend time writing code that handles all the possible combinations. Without schema guidance, many people will pick a) because it’s easy (you and @ryan did!) and others will pick b).
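For what it’s worth, option (b) doesn’t have to be heavy. One way to handle a non-square avatar is a centered square crop; the helper below just computes the crop rectangle (it returns a (left, top, right, bottom) box of the form Pillow’s `Image.crop` accepts, though nothing here depends on Pillow):

```python
def square_crop_box(width: int, height: int) -> tuple:
    """Return the (left, top, right, bottom) box for a centered square
    crop of an image with the given dimensions."""
    side = min(width, height)          # largest square that fits
    left = (width - side) // 2         # center horizontally
    top = (height - side) // 2         # center vertically
    return (left, top, left + side, top + side)
```

A square input crops to itself, so code written this way also handles the 99% case with no special-casing.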
What I’m trying to say (I feel not very successfully) is that without more guidance about the expectations for profile data types (beyond just String or URL), different implementations will make arbitrary assumptions about what kinds of data to expect in profiles.
Are you saying that applications built on Passcard should locally cache profile data and deliver it through their own CDN (this is sort of what we do with Nametiles)?