
September 2019

More On Identifying Troubled Projects

The other day I blogged about how we might identify a Free Software project that is in trouble because it relies too heavily on a single individual. One thing I did not like about that particular visualization is that it said nothing about the artifacts in the repository; instead, it focused squarely on the contributors. What if, for example, an individual was responsible not just for a disproportionately large number of commits, but also for a disproportionately large number of artifacts? So the revised visualization goes a little like this:

  • Square nodes are contributors;
  • Round nodes are artifacts;
  • Square nodes connect only to round nodes, never to each other, and each contributor is connected to one or more artifacts;
  • Edge weights are the number of times the contributor has committed to that artifact.
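As a rough illustration, the graph described in the list can be assembled from a flat list of (contributor, artifact) pairs. The input format and the function name `build_graph` are assumptions for this sketch, not how the graph below was actually produced; extracting the pairs from the repository's history is not shown.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def build_graph(
    touches: Iterable[Tuple[str, str]],
) -> Tuple[Set[str], Set[str], Dict[Tuple[str, str], int]]:
    """Build the contributor/artifact graph (hypothetical helper).

    `touches` is an assumed input format: one (contributor, artifact) pair
    for every artifact touched by every commit.
    Returns the square nodes (contributors), the round nodes (artifacts)
    and the edge weights (commits by that contributor to that artifact).
    """
    # Counting identical pairs gives the edge weight directly.
    weights = Counter(touches)
    contributors = {c for c, _ in weights}
    artifacts = {a for _, a in weights}
    # Every edge joins a contributor to an artifact; by construction there
    # are no contributor-contributor or artifact-artifact edges.
    return contributors, artifacts, dict(weights)
```

For example, two commits by `alice` to `kmail/a.cpp` produce a single edge of weight 2. Laying the result out with Kamada-Kawai would be left to a graph library and is not shown here.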

Again, the graph is laid out using the Kamada-Kawai algorithm to get a feel for the “shape” of the community. I have also kept the same colouring scheme, so it is easy to spot the contributors with a high number of commits. Finally, I have removed the labels from the nodes; this makes the graph easier to interpret and, more importantly, smaller.

Applying this new approach to the same KMail data set as last time gives the following:

Neat, huh? The first thing to pay attention to is the disconnected sub-graph on the right-hand side. Here we can see a KMail contributor who is the only person to touch 27 particular files and who is not connected to any other developer. Before you ask, it is not “scripty”. If you were a KMail contributor, you may well want to know who this is, right? Then again, in a team this size, I guess most KMail developers already know. The question in my mind is not “who is that?”, but rather, “what are they up to?” Summer of Code? The other interesting thing to note is the modularization of the community: we see many developers who are responsible for distinct parts of the codebase. This raises a good point and a bad point in my mind:

  • The bad: If one of those contributors is run over by a bus, will their part of the codebase suffer? This is almost inevitable, at least at first.
  • The good: Conway’s Law suggests that if the community is modular, the codebase is probably modular, too. This is, of course, a good thing: modular codebases are easier to maintain and make it easier to integrate new contributors. However, guesstimating the modularity of the codebase based upon the modularity of the community is not exactly a precise science.
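Both observations, the isolated sub-graph and the bus question, could be checked from the commit data rather than by eye. The following is a hypothetical sketch, not the method behind the graph above. `isolated_contributors` finds connected components that contain exactly one contributor (like the sub-graph on the right), and `sole_owned_artifacts` lists, for each contributor, the artifacts nobody else has committed to, which serves as a crude stand-in for the “bus” risk. Both functions, and the (contributor, artifact) input format, are assumptions made for illustration; the example names are made up.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (contributor, artifact): hypothetical input format


def isolated_contributors(pairs: Iterable[Pair]) -> List[Tuple[str, List[str]]]:
    """Return (contributor, artifacts) for each connected component of the
    contributor/artifact graph that contains exactly one contributor."""
    adj: Dict[Tuple[str, str], set] = defaultdict(set)
    for contributor, artifact in pairs:
        # Tag nodes by kind so a person and a file with the same name never merge.
        c, a = ("contributor", contributor), ("artifact", artifact)
        adj[c].add(a)
        adj[a].add(c)

    seen, result = set(), []
    for start in list(adj):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        people = [name for kind, name in component if kind == "contributor"]
        if len(people) == 1:
            files = sorted(name for kind, name in component if kind == "artifact")
            result.append((people[0], files))
    return result


def sole_owned_artifacts(pairs: Iterable[Pair]) -> Dict[str, List[str]]:
    """Map each contributor to the artifacts nobody else has committed to."""
    touched_by: Dict[str, set] = defaultdict(set)
    for contributor, artifact in pairs:
        touched_by[artifact].add(contributor)
    owned: Dict[str, List[str]] = defaultdict(list)
    for artifact, people in touched_by.items():
        if len(people) == 1:
            (only,) = people
            owned[only].append(artifact)
    return {person: sorted(files) for person, files in owned.items()}


if __name__ == "__main__":
    # Made-up example data, not the KMail data set.
    pairs = [
        ("alice", "kmail/a.cpp"), ("bob", "kmail/a.cpp"),
        ("alice", "kmail/c.cpp"),
        ("dave", "kmail/x.cpp"), ("dave", "kmail/y.cpp"),
    ]
    print(isolated_contributors(pairs))  # [('dave', ['kmail/x.cpp', 'kmail/y.cpp'])]
    print(sole_owned_artifacts(pairs))   # {'alice': ['kmail/c.cpp'], 'dave': ['kmail/x.cpp', 'kmail/y.cpp']}
```

Note that `alice` sole-owns a file while still being connected to `bob`, so the second check catches risk that the disconnected sub-graph alone would miss. Counting raw touches ignores commit size and file history, so a long list here is a prompt for a conversation rather than proof of trouble.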

The various visualisations I have produced are open to interpretation. Personally, whilst I think they are interesting, I still don’t think they tell us enough to serve as a tool for detecting troubled projects. However, in the hands of (in this case) the KMail community, they may well be enlightening.

