So, in my mission to see how we can automatically detect “core” teams, I need a measure of how closely people work together. Those of you with strong memories will remember that I once coined the term “cohesion” for this measure: I introduced it in a paper at the International Conference on Software Maintenance three years ago and blogged about it around that time.
This measurement is based on some basic graph theory that I have covered before, but for the sake of completeness, here is a quick recap. Let’s start by taking a look at a graph representing one month of KDELIBS development, in this case April 2009 (click to enlarge):
Each node here represents someone who committed to KDELIBS during the month. The edges represent resource sharing: two nodes are connected if both committers touched at least one common file that month. Each edge has a weight (not shown) equal to the number of files the two committers share.
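To make this concrete, here is a minimal sketch of building such a weighted graph from (committer, file) records. The commit data and the file paths are invented for illustration; the real graphs come from the KDELIBS history:

```python
from itertools import combinations

# Hypothetical commit log for one month: (committer, file) pairs.
commits = [
    ("alice", "kio/job.cpp"), ("alice", "kio/slave.cpp"),
    ("bob",   "kio/job.cpp"), ("bob",   "kdecore/kurl.cpp"),
    ("carol", "kdecore/kurl.cpp"), ("carol", "kio/slave.cpp"),
]

# Collect the set of files each committer touched.
touched = {}
for who, path in commits:
    touched.setdefault(who, set()).add(path)

# Weighted edges: weight = number of files both committers touched.
edges = {}
for a, b in combinations(sorted(touched), 2):
    shared = len(touched[a] & touched[b])
    if shared:
        edges[(a, b)] = shared

print(edges)
# → {('alice', 'bob'): 1, ('alice', 'carol'): 1, ('bob', 'carol'): 1}
```

Here every pair of committers happens to share exactly one file, so all three edges get weight 1; in a real month the weights vary widely.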
Using the Floyd-Warshall algorithm it is possible to find the shortest paths between all pairs of nodes in the graph. This, in turn, allows us to find the mean shortest path length, which is what I call “community cohesion” (not to be confused with graph structural cohesion). Now, this number is not directly comparable between communities; their differing working practices rule that out. Within a single community, however, we can certainly trend this metric and see how it varies over time. Perhaps, for example, certain events (such as release deadlines) cause the metric to increase? An increase in this metric shows the community is working together more tightly: edge weights are higher, meaning contributors are sharing more resources.
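The computation itself is short. Below is a sketch of Floyd-Warshall followed by the mean shortest path length, on a small invented graph; I am assuming here that edge weights are used directly as path lengths, which matches the reading that heavier edges push the metric up:

```python
import math

# Hypothetical weighted graph: four committers, symmetric edge weights.
nodes = ["alice", "bob", "carol", "dave"]
weights = {("alice", "bob"): 1, ("bob", "carol"): 2, ("carol", "dave"): 1}

# Initialise the distance matrix: 0 on the diagonal, infinity elsewhere,
# then fill in the direct edges (weight treated as distance).
idx = {name: i for i, name in enumerate(nodes)}
n = len(nodes)
dist = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
for (a, b), w in weights.items():
    dist[idx[a]][idx[b]] = dist[idx[b]][idx[a]] = w

# Floyd-Warshall: relax every pair through every intermediate node.
for k in range(n):
    for i in range(n):
        for j in range(n):
            if dist[i][k] + dist[k][j] < dist[i][j]:
                dist[i][j] = dist[i][k] + dist[k][j]

# Mean shortest path length over reachable pairs: the cohesion value.
lengths = [dist[i][j] for i in range(n) for j in range(i + 1, n)
           if dist[i][j] < math.inf]
cohesion = sum(lengths) / len(lengths)
print(cohesion)  # → 2.333... (pair distances 1, 3, 4, 2, 3, 1)
```

Floyd-Warshall is O(n³) in the number of nodes, which is perfectly fine at the scale of one month of committers.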
The next step, of course, is to actually measure this and see how the trend looks for different projects. So, I have picked KDEPIM and KDELIBS to look at; below are their cohesion trends for the 120 months from 2001 to 2010 (click to enlarge):
Now, I admit these projects are both part of the greater KDE community and so share a few contributors and a release cycle. Other than that, however, they are distinct projects. So I was surprised to see the two trends above. Why? You do not have to look too closely to see that there is a certain degree of correlation between them.
If we use Pearson’s method, we find the correlation is 0.33. To jog your memory: a coefficient of -1 indicates perfect negative correlation, 1 perfect positive correlation, and 0 no correlation whatsoever. So our score of 0.33 is hardly a strong correlation, but it is enough to suggest that either the shared release cycle or the shared contributors have some impact.
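For completeness, Pearson’s coefficient is just the covariance of the two series divided by the product of their standard deviations. A sketch, using invented monthly cohesion values rather than the real KDE data:

```python
import math

def pearson(xs, ys):
    # Pearson product-moment correlation coefficient of two equal-length series.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented monthly cohesion series for illustration, not the real trends.
kdelibs = [2.1, 2.4, 1.9, 2.8, 2.2, 2.6]
kdepim  = [1.5, 1.9, 1.4, 2.0, 1.8, 1.7]
print(pearson(kdelibs, kdepim))
```

In practice something like `scipy.stats.pearsonr` does the same job and also gives a p-value, which would help judge whether a coefficient like 0.33 over 120 months is more than noise.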
At a later time I will rerun this with more randomly selected KDE projects to see if similar results are found.