The answer: About 57% of the mails in his inbox had been delivered by Google. That’s still a conservative calculation, and it’s pretty depressing for someone who goes to the lengths he does to keep his data private.
Mako’s work inspired me to do the same. It was late and I was tired, so instead of futzing around with Python and R, I decided to simply use the tool I had available anyway, and rely on Mutt’s limit patterns. The archives I analysed go back to September 2009 – not quite as comprehensive as Mako’s own, but still significant.
I used a pretty simple limit pattern:
Limit to messages matching: ~h google.com
which translates to “show me all messages that have the string ‘google.com’ somewhere in the header”.
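For anyone who would rather script the count than do it inside Mutt, here is a minimal Python sketch of the same idea. It is an approximation under stated assumptions, not a reimplementation of Mutt's matching: it does a plain case-insensitive substring search over the headers only, whereas Mutt treats the pattern as a regular expression. The function names and the Maildir layout are assumptions made for illustration.

```python
import mailbox
from email.message import Message


def headers_mention(msg: Message, needle: str = "google.com") -> bool:
    """Return True if any header line of `msg` contains `needle`.

    Only headers are inspected, never the body. The comparison is a
    case-insensitive substring test (an approximation of Mutt's ~h).
    """
    needle = needle.lower()
    return any(
        needle in f"{name}: {value}".lower()
        for name, value in msg.items()
    )


def count_matches(messages, needle: str = "google.com"):
    """Return (matched, total) for an iterable of messages."""
    matched = total = 0
    for msg in messages:
        total += 1
        if headers_mention(msg, needle):
            matched += 1
    return matched, total


def count_in_maildir(path: str, needle: str = "google.com"):
    """Count matches in a Maildir at `path` (hypothetical location).

    Assumes the mail is stored as a Maildir; create=False avoids
    creating an empty mailbox if the path is wrong.
    """
    box = mailbox.Maildir(path, factory=None, create=False)
    return count_matches(box, needle)
```

Calling `count_in_maildir` on a mail folder returns the number of matching messages and the folder's total, which is all the percentage needs. For mbox archives, `mailbox.mbox` could be passed to `count_matches` in the same way.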
Out of 140,819 messages, 15,746 matched the pattern. That’s 11.18% – much lower than Mako’s share. Why is this?
Besides the fact that I run my own mail server, the reason is probably that most of my email concerns my work as FSFE’s president. I exchange a lot of mail with FSFE’s staff and volunteers, most of whom use @fsfe.org addresses. These addresses are just a redirect that people can point anywhere they like (hey, if you want one, you can become a Fellow, and support FSFE’s work!).
A few people use them to point to Gmail, but most apparently don’t. A lot of the people I routinely exchange mail with run their own mail server, or host their mail with a small provider. (I assure you that there aren’t a lot of Hotmail users in FSFE.)
The figures above don’t include most public mailing lists that I subscribe to. So I took a look at those, too. Here, I was expecting the share of mail that passed through Google’s servers to be higher. It turns out that the opposite is true: From January 2012 to today, I received 46,163 messages in my mailing list folder. Of these, 2,547 have the string “google.com” somewhere in their headers – that’s just 5.52%.
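The two percentages quoted in this post are easy to reproduce from the raw counts. The snippet below is just that arithmetic; the helper name `share` is made up for illustration.

```python
def share(matched: int, total: int) -> float:
    """Percentage of messages whose headers matched the pattern."""
    return 100 * matched / total


# Main archive since September 2009: 15,746 of 140,819 messages
print(f"{share(15_746, 140_819):.2f}%")  # 11.18%

# Mailing list folder since January 2012: 2,547 of 46,163 messages
print(f"{share(2_547, 46_163):.2f}%")    # 5.52%
```

Note that the two folders cover different time spans, so the two shares are not directly comparable, and adding the counts together would not give a meaningful overall figure.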
I’m happy to admit that I’m not entirely sure about the methodology. Feel free to criticise and suggest improvements in the comments!
The upshot is that yes, hosting your own server – or keeping your mail somewhere other than with the big web service companies – is an important component in reducing your data exhaust. But the size of that reduction depends on the providers used by the people you usually talk to. Privacy, as Eben Moglen highlights, is an ecological issue.
What share of your mail goes through Google’s servers? Post your figures in the comments.