We’re all Gmail users now – Pt. 2

The other day I wrote about how even if you don’t use Gmail, Google still ends up with access to a lot of your personal conversations. My own analysis was a pretty poor imitation of the interesting work done by Benjamin Mako Hill. Where he used Python and R, I just fumbled around with Mutt’s limit patterns. Because our methods differed, our figures weren’t really comparable.

Now I’ve taken the time to actually run Mako’s scripts. It turned out to be easier than I thought. The archives I analysed contain data starting in the first half of 2009, but anything before 2010 is patchy. I changed my mail setup at the start of 2010, and most of the mail from before then isn’t included in this analysis.

[Figure: Number of mails, overall and from Google (email since 2010, all mail vs. mail handled by Google)]

This shows us the absolute number of mails I’ve received and replied to. It tells me that my mail volume is fairly constant over the long term, but that my mail load can oscillate wildly on a weekly basis. And it tells you when I was on vacation these past years.

The share of mail that goes through Google’s servers is pretty low. But how low?

[Figure: Share of mail going through Google's servers]

Between 10% and 15%, that’s how low. As Mako (and I) would expect, Google is somewhat more involved in conversations that I carry on actively (Email with Replies) than in the overall set of email I’ve received. This is because there’s lots of spam and auto-generated mail in the “All Mail” category, and most of it doesn’t go through Google’s servers.
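If you want to get a rough feel for this kind of number on your own archive, here is a minimal sketch in Python. To be clear, this is not Mako’s script and not what produced the graphs above. It just assumes that a mail "went through Google" if any of its Received headers mentions a Google-looking host name. The marker list, the function names and the mbox path are all my own placeholders, and the substring match is crude enough to produce the odd false positive.

```python
import mailbox
from email.message import Message

# Assumption: these substrings identify Google mail hosts in Received headers.
# This list is illustrative, not exhaustive.
GOOGLE_MARKERS = ("google.com", "gmail.com", "googlemail.com")


def touched_google(msg: Message) -> bool:
    """Return True if any Received header mentions one of the markers."""
    hops = msg.get_all("Received", [])
    return any(
        marker in str(hop).lower()
        for hop in hops
        for marker in GOOGLE_MARKERS
    )


def google_share(messages) -> float:
    """Fraction of the given messages that passed through a Google host."""
    msgs = list(messages)
    if not msgs:
        return 0.0
    hits = sum(1 for m in msgs if touched_google(m))
    return hits / len(msgs)


def share_for_mbox(path: str) -> float:
    """Convenience wrapper: compute the share for an mbox file on disk."""
    return google_share(mailbox.mbox(path))
```

Calling `share_for_mbox("archive.mbox")` (a hypothetical file name) gives the overall share as a number between 0 and 1. It does not separate out the mails I’ve replied to; that would need extra matching on Message-ID and In-Reply-To headers.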

I have no idea what causes the slight uptick in Google’s share among mails I’ve replied to after mid-2013.

Hugo Roy has run Mako’s scripts, and his Google share moves between 25% and 50%. The results that I obtained from Mutt’s limit patterns match the output of Mako’s scripts pretty closely, by the way.


What can we learn from this? A large share of my contacts don’t rely on Google for email service. That’s good news.

On the other hand, Edward Snowden telling us that the NSA and its buddies are after our mail apparently hasn’t dissuaded people from using a provider they know is being tapped. Or at least, it hasn’t really increased the number of my contacts who avoid using Google for mail.

Finally, the figures from Mako and Hugo alongside mine suggest that your privacy against the large web companies (and against the spies who hoover up the data they store) largely depends on your environment. If you work in a place where lots of people rely on Google for mail service, your data will end up on the company’s servers. If, on the other hand, your employer and your friends rely on their own servers, or on smaller providers, you have a much better shot at protecting your privacy.

Taking control of your own systems is the easy bit. Persuading everyone around you to do the same is harder, but has a bigger impact.