And now for something completely different…
Yesterday and today, along with Jonathan Riddell, I have been lucky and honoured to be a judge at the inaugural NHS Hack Scotland. This is an event which puts programmers and designers in the same room as NHS staff who have real problems to be solved with technology. It follows the model of similar events that have recently been held in London and Liverpool.
There are roughly 40 folk taking part in the event, split into about 8 projects of various sizes. I will report more on the actual projects later: they range from a mobile app that allows people with anxiety disorders to record their SUDS levels to an application that will assign the most appropriate ambulance to emergency cases (can you believe such a thing does not already exist?). There is obviously only so much that can be achieved in two days, but the focus is very much on producing software that does something to show off the potential of the technology and this form of collaborative development.
In many respects this reminds me of the CommunesPlone project that I was somewhat involved with when I was working for Zope Europe Association.
The event is being held at the Techcube startup incubator in Edinburgh and will run until 5pm today. So, if you are in the area, you should drop by and take a look at what is going on… Maybe even take on a new pet project?
Jonathan and I are spending the day getting to know the teams and their projects before judging begins in earnest this afternoon. Of course, we are interested in how much of an impact the project will have on patient care or on making staff members’ lives easier. Moreover, we are looking into longevity… Have these teams, from day 1, built into their project the kind of Free Software ethos that will help ensure sustainability of the project beyond the weekend? Because it is all very well producing something sexy for the NHS this weekend, but that is no good if the project is dead tomorrow.
So yesterday I blogged about my updated version of the green blobs. Since then I have made a little progress… More than intended, due to a bit of input from Kevin Ottens and a healthy dose of luck.
The problem with the images I demoed to you yesterday was that the committer names (and the date upon which they first contributed) were all over on the left-hand side of the picture. One small tweak later, the text correctly appears next to the first steel blue blob. Except there was a problem… Occasionally the text was being truncated. Either the text was not being drawn, or the first blue blob on that row was being drawn over it. In order to determine which, Kevin suggested that I make the blobs semi-transparent. Simple. Doing so quickly revealed that the blobs were being drawn over the text. Further work fixed this problem and we end up with the following result…
Click to enlarge
Now here’s where it gets a little interesting… My script is fairly brute-force in places. Most importantly, if a contributor is making multiple commits in the same week, the blob is simply being drawn over and over. Well, by applying a little transparency we get a nice heatmap result. Take a look at the image linked above… Notice the variations in the colour? The darker the blue, the more active the contributor was that week.
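A quick way to see why overdrawing produces a heatmap: a minimal Python sketch, assuming ordinary source-over blending, where each pass covers a fraction `alpha` of whatever background is still showing. The function name and the 0.25 alpha are made up for illustration; this is not the code behind the images.

```python
def stacked_opacity(alpha: float, draws: int) -> float:
    """Opacity of a pixel after painting the same translucent blob `draws` times.

    Assumes source-over blending: every pass hides a fraction `alpha`
    of the background that was still visible before it.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if draws < 0:
        raise ValueError("draws must not be negative")
    return 1.0 - (1.0 - alpha) ** draws


if __name__ == "__main__":
    # Hypothetical alpha of 0.25: more commits in a week means a darker blob.
    for commits in (1, 2, 5, 10):
        print(commits, round(stacked_opacity(0.25, commits), 3))
```

The opacity climbs quickly and then saturates, so very busy weeks all end up looking similarly dark.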
Total accident. Kinda neat. And actually makes these pictures ever-so-slightly more useful than they were before.
So I’m at the KDE PIM development sprint in Berlin. This gives me a bit of time and space to do some project analysis… You remember that stuff I do, right? First, however, I had to finish off my new log parser…
For years I have been sitting on a log parser and scripts written in Python. These have served me well, but they are just too slow and more than a little buggy after years of being maintained poorly. So I have gone about writing a new parser (with some help from Mirko Boehm and Aaron Seigo) and rewritten all my metrics in C++/Qt. Everything has been working very well for many months now… Apart from one script: the green blobs. So, while I have been here in Berlin, I have finally gotten around to rewriting this monster of a visualisation. I had left this for last because of the need to dig into QFont, QImage, QPainter, QColor etc; all my other scripts were more C++ than Qt.
Of course, a new script requires a new colour… Green blobs are dead! Long live lightsteelblue blobs!
Light Steel Blue Blobs for KDE PIM: Click To Enlarge
This version is still a little buggy… Some of the information is not displayed, some of it is badly positioned. But the important nuts-n-bolts of it are there.
Watch this space. Code will be published soon!
Last week I was surprised to receive an email from Shane Coughlan inviting me to become a Fellow of the OpenForum Academy. The current Fellowship has some of my personal heroes in there and so it was exciting and humbling to receive such an offer.
From their website:
OpenForum Academy is a think tank with a broad aim to examine the paradigm shift towards openness in computing that is currently underway, and to explore how this trend is changing the role of computing in society.
OpenForum Academy is an independent programme established by OpenForum Europe. It has created a link with academia in order to provide new input and insight into the key issues which impact the openness of the IT market. Central to the operation of OpenForum Academy are the Fellows, each selected as individual contributors to the work of OFA. A number of academic organisations have agreed to work with OFA, working both with the Fellows and within a network of contributors in support of developing research initiatives.
During Akademy last week I was very impressed by the keynote presentation given by Will Schroeder of Kitware. At the core of his talk was a concern that research should be open by default, but it isn’t: peer review is a black box and publishers charge a fortune for access to the final paper. Over dinner we spoke at length on this matter.
I am lucky in that my career is no longer tied to my publication record. This allows me the freedom to do my research when and where I want. Typically this means I treat all my research as “work in progress”. As and when I make little steps forward, I publish what I have done in my blog and gather feedback.
I have now taken the decision that whenever I have a complete piece of work I will publish:
- in an open-access journal;
- my data and the method I used to gather it;
- any tools I used to process that data.
I have always conducted my research as a benefit to the Free Software community. I’m extremely grateful to the OpenForum Academy for recognising this work and I take my new Fellowship as an opportunity to renew my commitment to making my research as open as I can.
GIMP does not drive me to the airport if my flight is before 10am.
So, it’s that time of year again, the annual meeting of all things KDE… Akademy! This year it is coming to you from Tallinn, Estonia, and it will be my 6th outing to the event.
Of course, KDE is very dear to me and to Kolab and so, in addition to me, a few other members of the Kolab community will be at this year’s Akademy. Key contributors Christian Mollekopf and Jeroen van Meeuwen will be present and available to discuss Kolab-related issues. Jeroen will also give a talk about release engineering processes using KDE as an example. His experience from the Fedora Project, Cyrus IMAP, Cyrus SASL and from his role as a Systems Architect at Kolab Systems gives him ample insight into how release engineering and quality assurance within the fast-paced KDE project could be improved further.
The Kolaborators will also be taking part in a Task Management sprint featuring Zanshin and Kolab developers. If you are interested in task management in KDE, you are invited to join. The sprint will focus on continuing work on bringing a Zanshin-like experience to Kolab on the desktop and web. This meeting will take place during the workshop week after the main conference; no date or time has been set yet, but if you track down me, Christian or Kevin Ottens we’ll work it out.
Akademy is one of my favourite conferences of the year and I’m really excited to be catching up with my KDE buddies. If you want to talk about Kolab (or anything else) just come track me down… I’ll be around until Wednesday.
So recently Jeroen van Meeuwen asked me to take a look at Cyrus IMAP. He had been involved in their switch from CVS to GIT and was curious to see what the results looked like. Let’s start with the usual green blobs:
Cyrus IMAP: Full History in Green Blobs (Click to Enlarge)
So, since I do not know precisely when the switch from CVS to GIT was made, I’m using Jeroen’s start date in the project (2010-09) as a rough guideline. Looking at the green blobs it is pretty clear that something happened after he joined the project. But let’s start by looking at what was going on before he joined.
Between 1993-07 and 2010-09 there were 25 accounts in CVS. Note: accounts and not contributors; clearly some of these accounts belong to the same person. For much of these first 17 years the project is also displaying what I refer to as “token-based development”; that is, in many weeks there is only one contributor (as if you had to hold a shared token to commit). I first noticed this phenomenon when studying Evince during my PhD and I have seen it only a couple of times since. I wish I could explain it.
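One way to put a number on “token-based development” is to ask what share of active weeks saw exactly one account commit. This is a rough Python sketch under my own assumptions: the input is a list of hypothetical `(date, account)` pairs and weeks are ISO weeks. It is not the measure used for the plots here.

```python
from collections import defaultdict
from datetime import date


def single_committer_share(commits):
    """Share of active weeks in which exactly one account committed.

    `commits` is an iterable of (datetime.date, account) pairs.
    Weeks with no commits at all are ignored.
    """
    accounts_by_week = defaultdict(set)
    for day, account in commits:
        iso_year, iso_week, _ = day.isocalendar()
        accounts_by_week[(iso_year, iso_week)].add(account)
    if not accounts_by_week:
        return 0.0
    solo_weeks = sum(1 for accounts in accounts_by_week.values() if len(accounts) == 1)
    return solo_weeks / len(accounts_by_week)


if __name__ == "__main__":
    # Made-up history: one solo week, then one week with two accounts.
    sample = [
        (date(2000, 1, 3), "alice"),
        (date(2000, 1, 4), "alice"),
        (date(2000, 1, 10), "alice"),
        (date(2000, 1, 11), "bob"),
    ]
    print(single_committer_share(sample))
```

A value close to 1 over a long period would match the pattern described above.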
Now, since 2010-09 we can see that 27 new accounts have contributed to the project; most are only around for one week (if we look deeper, I bet for only one or two commits) never to be seen again. Perhaps one of the effects of switching to GIT is that it is simply a lot easier for people to contribute? No brainer.
But I think there is slightly more at play here. How did Cyrus IMAP manage to get to its 17th anniversary and then basically double the size of its developer community just because of a switch to GIT? A project of such importance surely must have been attracting more folk before the switch to GIT… It is not like activity has increased significantly since the switch, right? Right?
Let’s take a look at some simple measures… Commit and Committers per month:
In the days after 2010-09 we can clearly see that commit rate (commits/day) has increased and so has the effort density (there is more activity on more days). But we are not seeing any significant increase in the number of contributors per day. So commit rate is up (not surprising, GIT encourages this and it is almost always exposed except where pushes are squashed) and committer rate is about the same.
Hang on! What about all those new folk we saw appear since the switch to GIT? Well, a typical week still only sees one or two contributors and these new contributors are mostly only hanging around for a few pushes (so they easily account for the days where the committer rate increases to 2 or 3).
Here is what I think is going on… The switch from CVS to GIT has almost certainly made it easier for people to join and take part in the Cyrus IMAP community (even if only briefly). This accounts for some of the increased participation of new contributors. However, I suspect Cyrus IMAP in its first 17 years had way more contributors than the 25 we can see. My guess is that plenty of people were submitting patches and that one of the 25 was applying them, thus losing the identity of the patch submitter in the CVS log.
So whilst switching to GIT has probably caused some growth, I think it is not as much growth as it may seem because the old CVS data hides the true scale of the community.
So what does the Cyrus IMAP community look like these days? This graph shows which contributors worked together (i.e. committed in at least one file together) during October 2010, the month after the switch. Everyone say “Hi” to the Cyrus IMAP community:
As the primary data store for Kolab data, Cyrus IMAP is a project close to my heart (but one that I really play no direct role in). If you would like to learn more and take part in the community check them out at their website or join in the discussion at #cyrus on Freenode.
So, in my mission to see how we can automatically detect “core” teams, I need a measure for how closely people work together. Those of you with strong memories will remember I once coined the term “cohesion” for this measure. I introduced it in a paper at the International Conference on Software Maintenance three years ago and blogged about it around that time.
This measurement is based on some basic graph theory that I have been over before. But for the sake of completeness here is a quick recap. Let’s start by taking a look at a graph which represents one month of KDELIBS development, in this case, April 2009 (click to enlarge):
Each node here represents someone who has committed to KDELIBS in the month. The edges represent resource sharing: two nodes are connected if the committers both commit to the same file in the month. These edges have a weight (not shown) which is the number of shared files between the nodes.
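The graph described above can be sketched in a few lines of Python. This assumes the month of commits has already been reduced to `(committer, path)` pairs; the function and variable names are mine and the real tooling may well differ.

```python
from collections import defaultdict
from itertools import combinations


def shared_file_graph(commits):
    """Build the resource-sharing graph for one month of commits.

    `commits` is an iterable of (committer, path) pairs. The result maps
    each connected pair (a, b), with a < b, to the number of files that
    both committers touched in the month.
    """
    files_by_committer = defaultdict(set)
    for committer, path in commits:
        files_by_committer[committer].add(path)

    edges = {}
    for a, b in combinations(sorted(files_by_committer), 2):
        shared = len(files_by_committer[a] & files_by_committer[b])
        if shared:  # no shared files means no edge
            edges[(a, b)] = shared
    return edges


if __name__ == "__main__":
    # Made-up month: ann and bob share two files, cat works alone.
    month = [
        ("ann", "x.cpp"), ("bob", "x.cpp"),
        ("ann", "y.cpp"), ("bob", "y.cpp"),
        ("cat", "z.cpp"),
    ]
    print(shared_file_graph(month))
```

Committers who share no files with anyone (like `cat` in the sample) are still nodes in the graph; they simply have no edges.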
Using the Floyd-Warshall algorithm it is possible to find the shortest paths between all pairs of nodes in the graph. This, in turn, allows us to find the mean shortest path length and this is what I call “community cohesion” (which should not be confused with graph structural cohesion). Now, this number is not really comparable between communities; their differing working practices really disallow this. However, within a community, we can certainly trend this metric and see how it varies over time. Perhaps, for example, certain events (such as release deadlines) cause the metric to increase? An increase in this metric shows the community is working together more tightly (higher edge weights, contributors sharing more resources).
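For the curious, here is a minimal Python sketch of the mean shortest path calculation using Floyd-Warshall. Two things are my assumptions rather than anything stated above: the edge lengths are passed in as given (how shared-file weights translate into path lengths is not covered in this post), and pairs of nodes with no path between them are left out of the mean.

```python
from itertools import product


def mean_shortest_path(nodes, edge_lengths):
    """Mean shortest path length over all connected pairs of distinct nodes.

    `nodes` is a list of node names; `edge_lengths` maps (a, b) pairs to the
    length of the undirected edge between them. Unreachable pairs are skipped.
    """
    inf = float("inf")
    dist = {(a, b): (0.0 if a == b else inf) for a, b in product(nodes, repeat=2)}
    for (a, b), length in edge_lengths.items():
        dist[(a, b)] = min(dist[(a, b)], length)
        dist[(b, a)] = min(dist[(b, a)], length)

    # Floyd-Warshall: allow each node k in turn as an intermediate stop.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                through_k = dist[(i, k)] + dist[(k, j)]
                if through_k < dist[(i, j)]:
                    dist[(i, j)] = through_k

    finite = [d for (a, b), d in dist.items() if a != b and d != inf]
    return sum(finite) / len(finite) if finite else 0.0


if __name__ == "__main__":
    # Three committers in a chain: a-b and b-c, each edge of length 1.
    print(mean_shortest_path(["a", "b", "c"], {("a", "b"): 1, ("b", "c"): 1}))
```

The triple loop makes this cubic in the number of committers, which is fine for a single month of one project.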
The next step, of course, is then to actually measure this and see how the trend looks for different projects. So, I have picked KDEPIM and KDELIBS to look at; below are their cohesion trends for the 120 months from 2001 to 2010 (click to enlarge):
Now, I admit these projects are both part of the greater KDE community and so share a few contributors and a release cycle. Other than that, however, they are distinct projects. So I was surprised to see the two trends above. Why? You do not have to look too closely to see that there is a certain degree of correlation between them.
If we use Pearson’s method, we find the correlation is 0.33. To jog your memory: a score of -1 shows perfect negative correlation, 1 perfect correlation and 0 shows no correlation whatsoever. So our score of 0.33 is hardly strong correlation, but it is enough to show that either the release cycle or contributor sharing has some impact.
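For reference, Pearson’s coefficient for two monthly series can be computed as below. This is a plain Python sketch with made-up numbers in the example; it does not reproduce the 0.33 figure, which depends on the real cohesion data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of the same length, at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical cohesion values for two projects over four months.
    project_a = [1.8, 2.1, 1.9, 2.4]
    project_b = [2.0, 2.2, 2.3, 2.5]
    print(round(pearson(project_a, project_b), 2))
```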
At a later time I will rerun this with more randomly selected KDE projects to see if similar results are found.
So this is about the time I usually do my annual review of activity in KDE SVN. Of course I have now stopped my analysis of KDE SVN and moved on to git. Instead of analysing every repo in KDE git, I will focus on what happened in KDEPIM in 2011 (KDEPIM exclusively, no PIMLIBS or PIMRUNTIME).
OK, to kick off, the green blobs (click to enlarge):
The first thing I noticed here is that there is no account which has committed in every week of 2011. Notice, also, Laurent; he is not the most regular contributor to this repo (he committed in 67% of the weeks) and yet he is one of the most regular contributors. If we look at commits per committer, we get the following top 10 for last year:
58 Bjoern Ricks
89 Christophe Giboudeauw
109 Torgny Nyblom
142 Volker Krause
162 Script Kiddy
195 Tobias Koenig
196 Allen Winter
273 David Jarvie
273 Sergio Martins
1198 Montel Laurent
The second thing I noticed about the green blobs is how “white” that image is towards the bottom; that is, developers whose first commit for KDEPIM in 2011 was after the first week tended not to stay around too long. This for me feels like the people towards the top are most-likely part of an existing “core” team.
My “Oracle of Ervin” tool reveals Laurent to be the most highly-connected developer in this repo; this comes as no surprise. If we visualise the community we can see him along with others in the “core” of the community (click to enlarge):
Remember, this visualisation is not concerned so much with how much work an individual does, so much as how much they work with others (a better measure of a “team” I believe). So, Laurent appears in the “core” of the KDEPIM team along with many others from the top of the green blobs image. Based upon team structure it appears that Allen “The Keyser Söze of KDE” Winter is at the centre of the community. You can expect Sergio, Till and Volker to appear in a line-up with him any day now.
How much did the KDEPIM community evolve over 2011? Let’s start by looking at the trend of all-time contributors:
At the start of the year there had been 536 contributors to KDEPIM through its history and this had grown to 568 by the end of the year. However, we can see from the green blobs that many new developers are only hanging around for one week and the project continues to be dominated by a “core” that is not evolving all that much (this is neither surprising nor worrying). If we show the daily commits split by length of contribution, we get the following:
As you can see, this chart is dominated by the purple of committers with more than 2 years of contribution to KDEPIM. Whilst a mature community is an excellent thing to have, I’d definitely still like to see some young blood get retained in KDEPIM in 2012.
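A split like this boils down to comparing each commit date with the date of that committer’s first ever commit. Here is a rough Python sketch, assuming hypothetical `(date, author)` pairs covering the whole project history and a single “more than 2 years” bucket; the chart itself uses more buckets than this.

```python
from datetime import date


def veteran_commit_share(commits, years=2):
    """Share of commits made by authors first seen more than `years` years before.

    `commits` is a list of (datetime.date, author) pairs spanning the full
    history, so that each author's first commit is actually in the data.
    A year is approximated as 365 days.
    """
    if not commits:
        return 0.0
    first_seen = {}
    for day, author in sorted(commits):
        first_seen.setdefault(author, day)  # earliest commit per author
    threshold_days = 365 * years
    veteran = sum(
        1 for day, author in commits
        if (day - first_seen[author]).days > threshold_days
    )
    return veteran / len(commits)


if __name__ == "__main__":
    # Made-up history: one long-standing author, one newcomer.
    history = [
        (date(2008, 1, 1), "old hand"),
        (date(2011, 1, 1), "old hand"),
        (date(2011, 1, 2), "newcomer"),
        (date(2011, 6, 1), "newcomer"),
    ]
    print(veteran_commit_share(history))
```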
So there you have it; KDEPIM 2011 in pretty pictures. I have improved the green blobs so that: 1. the font used is more legible 2. it supports UTF-8; thanks go to Chusslove Illich (Часлав Илић) for contributing to KDEPIM and exposing this. UTF-8 support was also added to the community graphing tool. The “downside” to this is that I am now using real names in these visualisations (I also toyed with the idea of using email addresses, but these create other issues). Ever since I switched from analysis of SVN to git I have struggled to find a suitable alternative to SVN account names. Using %an or %ae in git log really are the best replacements.
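As an illustration of the %an approach, here is a small Python sketch that counts commits per author name from the text that `git log --pretty=format:%an` prints (one name per line). The helper names are mine, and the sample log is invented; it is not the parser used for the numbers above.

```python
from collections import Counter


def commits_per_author(log_text):
    """Count commits per author from `git log --pretty=format:%an` output."""
    names = (line.strip() for line in log_text.splitlines())
    return Counter(name for name in names if name)


def top_committers(log_text, n=10):
    """Return the `n` busiest authors as (name, commits) pairs, busiest first."""
    return commits_per_author(log_text).most_common(n)


if __name__ == "__main__":
    # Invented log output; real names arrive as UTF-8 and need no special care.
    sample_log = "Montel Laurent\nЧаслав Илић\nMontel Laurent\nAllen Winter\n"
    for name, count in top_committers(sample_log, n=3):
        print(count, name)
```

To limit the count to one year, the log can be narrowed with git’s `--since` and `--until` options before it is fed in. Note that the same person committing under two spellings of their name will be counted twice, which is one of the trade-offs of dropping account names.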
[This is slightly off topic from my usual Free Software analysis.]
So the Collatz Conjecture came to mind. I took a look at the Wikipedia article and was struck by a couple of things: I liked the stopping time (the number of steps you have to take to get from the given starting number to 1) plot and the graph showing the paths from certain starting numbers to 1.
Both also disappointed me for not showing enough data; this had clearly been done for clarity. Fair enough, but sometimes if you throw enough data in a visualisation it just “looks” right. Right? (OK, this is far from true). So, since it had been a while since I had last dusted off my Python and Graphviz skills, I thought I would try to replicate these visualisations, just with more data.
So let’s start with the stopping time plot (click to enlarge):
Nice pattern. Hardly exciting.
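The stopping time itself is only a few lines of Python. A minimal sketch (the function name is mine); it relies on every starting number tried actually reaching 1, which is exactly what the conjecture claims but nobody has proved.

```python
def stopping_time(n):
    """Number of Collatz steps needed to get from `n` down to 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    steps = 0
    while n != 1:
        # Halve even numbers; triple odd numbers and add one.
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


if __name__ == "__main__":
    # One (start, steps) pair per line, ready to feed to a plotting tool.
    for start in range(1, 11):
        print(start, stopping_time(start))
```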
What is a little more fun is the graph showing the paths from given starting numbers back to 1 (click to see the full image, 36 MB):
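A graph like this can be described to Graphviz as a list of “number -> next number” edges. Below is a minimal Python sketch that emits DOT text for every starting number up to a limit; the function name, the graph name and the limit are my own choices, not the script that produced the image.

```python
def collatz_dot(limit):
    """Return Graphviz DOT text for the Collatz paths of 2..limit back to 1."""
    edges = set()
    for start in range(2, limit + 1):
        n = start
        while n != 1:
            nxt = n // 2 if n % 2 == 0 else 3 * n + 1
            if (n, nxt) in edges:
                break  # the rest of this path was recorded by an earlier start
            edges.add((n, nxt))
            n = nxt

    lines = ["digraph collatz {"]
    lines += [f"    {a} -> {b};" for a, b in sorted(edges)]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(collatz_dot(10))
```

Saving the output to a file and running `dot -Tpng collatz.dot -o collatz.png` renders it. Because paths merge, each edge is stored once, but the image still grows quickly as the limit goes up.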