IGF: Dreaming the cloud

This year’s Internet Governance Forum in Vilnius, Lithuania, was a huge event. There were about a hundred sessions, some with several topics crammed into them.

In the session on “Data in the cloud: Where do Open Standards fit in?”, I shared a panel with Vint Cerf and the W3C’s Daniel Dardailler, among others. It turned out to be an extremely interesting discussion. A transcript of the session will hopefully become available in due course.

(This is the first of two articles from the IGF. In the second one, a couple of days down the line, I’ll be dealing with the state of the discussion on Open Standards.)

Vint Cerf described the situation succinctly: cloud computing today is in a similar position to the Internet in 1973, dominated by a few large nodes, just as mainframes were the dominant model back then. He jokingly called cloud computing “just time sharing on steroids”.

He duly added that this was oversimplified. For the operator of a cloud, the nice thing is that it can run applications independently of the physical machines in the datacenter. Any process can be shifted between different machines.

He made another very important point: On the Internet, all end devices are equal. The network doesn’t care whether you connect to it with a mobile phone or with a supercomputer. Clouds should work the same way.

With the audience thus prepared, I could spin some ideas that I’ve been discussing with many different people recently. They’re appropriately cloudy, and the only way to make them more concrete is to share them, have them criticised, identify the part that’s nonsense, discard it, and get to work on the rest.

When you get down to it, the cloud is a question of power: if we put all our data and computation into centralised systems, the owner of the system will have power over us. So the idea that’s been floating around (e.g. Eben Moglen here) is that we could use small, cheap servers to build our own cloud. So why not build a Free Software stack for these servers that’s easy to install? Fill in a few fields, and it sets you up with servers for web, mail, Jabber, microblogging, social networking, and whatever else you’d like, all pre-configured. Instantly, you’d become part of those respective networks, with your own little node. Small, but fully capable. A distributed search engine like YaCy, which we’re using on the FSFE website, adds another key element.
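To make the “fill in a few fields” idea a little more tangible, here is a rough Python sketch of what such an installer might look like from the owner’s side. Everything in it is invented for illustration: the NodeConfig fields, the provision() function and the package choices are assumptions, not a description of any existing installer.

```python
# Hypothetical sketch: turn a short declarative description of a personal
# server into a list of concrete setup steps. Not a real installer.

from dataclasses import dataclass


@dataclass
class NodeConfig:
    domain: str        # the one field the owner really has to fill in
    admin_email: str
    services: tuple    # which pre-configured services to enable


# Example mapping of service name -> Free Software package that could
# provide it (the package choices here are illustrative, not a recommendation).
CATALOGUE = {
    "web": "apache2",
    "mail": "postfix + dovecot",
    "jabber": "prosody",
    "microblogging": "statusnet",
    "social": "diaspora",
    "search": "yacy",
}


def provision(cfg: NodeConfig) -> list[str]:
    """Return the setup steps needed for the requested services."""
    steps = [f"register DNS records for {cfg.domain}"]
    for name in cfg.services:
        package = CATALOGUE[name]
        steps.append(
            f"install and pre-configure {package} for {cfg.domain} "
            f"(admin contact: {cfg.admin_email})"
        )
    return steps


if __name__ == "__main__":
    cfg = NodeConfig(
        domain="example.org",
        admin_email="owner@example.org",
        services=("web", "mail", "jabber", "search"),
    )
    for step in provision(cfg):
        print("-", step)
```

The point is simply that the owner’s input could be this small; all the hard work of sensible defaults would live in the pre-configured packages behind the catalogue.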

That’s still fairly conventional. So why not add a layer for distributed data storage? We could have a program that encrypts your data, breaks it up into hundreds or thousands of small chunks, and stores those chunks across many machines on the Internet. You are the only person who has the key, and the only person who can pull all this data back together and decrypt it. The more storage capacity you provide, the more you receive on the machines of others.
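Here is a minimal sketch of that encrypt-then-chunk-then-scatter idea, assuming the third-party “cryptography” Python package for the encryption. The in-memory “nodes” stand in for remote machines, and the function names, chunk size and manifest format are all invented for illustration; a real system would also need replication, integrity checks and an incentive scheme for trading storage.

```python
# Sketch: encrypt data locally, split the ciphertext into chunks, and
# spread the chunks across several (here: simulated) storage nodes.
# Requires: pip install cryptography

from cryptography.fernet import Fernet

CHUNK_SIZE = 64 * 1024  # 64 KiB per chunk, an arbitrary choice


def scatter(data: bytes, nodes: list[dict]):
    """Encrypt locally, chunk the ciphertext, and distribute the chunks
    round-robin. Returns the key (which never leaves the owner) and a
    manifest recording where each chunk went."""
    key = Fernet.generate_key()
    ciphertext = Fernet(key).encrypt(data)
    manifest = []
    for i in range(0, len(ciphertext), CHUNK_SIZE):
        chunk = ciphertext[i:i + CHUNK_SIZE]
        node_index = (i // CHUNK_SIZE) % len(nodes)
        nodes[node_index][i] = chunk          # "upload" the chunk
        manifest.append((i, node_index))
    return key, manifest


def gather(key: bytes, manifest: list, nodes: list[dict]) -> bytes:
    """Pull the chunks back together and decrypt; only the key holder can."""
    ciphertext = b"".join(nodes[n][offset] for offset, n in manifest)
    return Fernet(key).decrypt(ciphertext)


if __name__ == "__main__":
    nodes = [dict() for _ in range(5)]        # five pretend storage peers
    key, manifest = scatter(b"my private data " * 1000, nodes)
    assert gather(key, manifest, nodes) == b"my private data " * 1000
    print("round trip ok, chunks stored on", len(nodes), "nodes")
```

Because the data is encrypted before it ever leaves your machine, the peers holding the chunks learn nothing about the contents; all they see is opaque ciphertext fragments.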

That’s a bit farther off. Projects like GNUnet or the Angel Application are still in the early stages of development. But I have no doubt that if developers focus their efforts here, they’ll make progress very soon.

So with that done, how about tying all those machines together so that applications can move seamlessly between them? There are all sorts of challenges here, such as:

  • How can you keep your data private if it’s processed by an application on a remote machine, and needs to be decrypted for that purpose? Good question. A social trust layer could be part of the solution.
  • How do you manage the overhead that comes with so much encrypted and distributed communication? The fact that the Internet hasn’t collapsed under the weight of online video as predicted a few years ago gives some grounds for hope. So does my new 25 Mbit/s VDSL line.
  • And how do you achieve anything like reasonable speeds on such a system? I’d say that depends on your value of “reasonable”, for the time being.

The reactions I got were quite interesting. Some people gave me rather vague looks, which was hardly surprising. Others were enthusiastic. It certainly made for a good debate.

Why shouldn’t we build distributed systems that are owned by all of us? I wish that today’s big nodes – Amazon, Google, Microsoft, and so forth – would take up this topic together with the Free Software community, and add their considerable engineering resources to the effort of developing both the technology that will let us run our computers in freedom, and the business models around it.

A lot is currently happening in this field. At this year’s FSCONS in Gothenburg, Sweden, FSFE is organising a track on distributed systems under the heading “Divide and Reconquer”. I’m very much looking forward to seeing the discussion take another step forward there.