Who Wants To Be A Millionaire? Apache, KDE

Recently the Apache Software Foundation’s SVN repository hit the 1,000,000 commit milestone (landmark?). Congratulations to all who participate in the Apache community. You should be proud of what you have achieved. There are very few projects that last so long and survive being so large and so active.

In fact, I can think of only one other: KDE, of course!

To get from commit 1 to commit 1,000,000 took the ASF roughly 14-and-a-half years and the effort of 2506 contributors in the VCS. For KDE it was roughly 10-and-a-half years and the effort of 2154 contributors in the VCS.
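As a rough sanity check on those figures, the averages can be worked out directly. The sketch below assumes a 365.25-day year and uses the rounded durations quoted above, so the results are ballpark values only. The `averages` helper is hypothetical and only illustrates the arithmetic.

```python
# Back-of-the-envelope averages from the figures quoted in the post.
# Assumption: one year is 365.25 days; durations are the rounded ones above.
DAYS_PER_YEAR = 365.25
COMMITS = 1_000_000

projects = {
    "ASF": {"years": 14.5, "contributors": 2506},
    "KDE": {"years": 10.5, "contributors": 2154},
}


def averages(years, contributors, commits=COMMITS):
    """Return (mean commits per day, mean commits per contributor)."""
    days = years * DAYS_PER_YEAR
    return commits / days, commits / contributors


for name, p in projects.items():
    per_day, per_contributor = averages(p["years"], p["contributors"])
    print(f"{name}: {per_day:.0f} commits/day, "
          f"{per_contributor:.0f} commits/contributor")
```

On these assumptions KDE averaged somewhere around 260 commits a day against roughly 190 for the ASF, which says nothing about how the rate changed over time; that is what the plots are for.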

I will write a couple of short posts comparing how Apache and KDE got from there to here, and will start by looking at the two most basic counts measurable: the number of commits and committers per day.

Below you will find a plot showing, for each of ASF and KDE, the number of commits made each day:

Figure: KDE, ASF Daily Commits

What strikes me as very interesting is just how similar these plots are. KDE got off to a quicker start, but otherwise both communities grew to a similar commit rate and then plateaued at similar levels.

The questions in my mind are:

  • If these communities continue to grow (assumption), why would the commit rate plateau like this?
  • Is it just coincidence that this plateau happens at a similar level in both projects?
  • Why is it that, in the ASF, about half of the days since (roughly) 2006 have seen only around half as many commits as the other days? This is what creates the “fork” effect in the ASF plot.

I am not going to offer any thoughts on these questions just yet. Instead, let’s take a look at the rate of committers per day in each of the communities.

Figure: KDE, ASF Daily Committer Rate

Those of you who have seen me do these kinds of plots before will not be surprised to see that the committers plot follows a very similar form to the commits plot. This is because, for various social reasons (personal process, community influence, etc.), the commit rate and the committer rate usually correlate.
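For readers who want to reproduce this kind of comparison, here is a minimal sketch of how the two daily series and their correlation could be computed. It assumes the commit log has already been reduced to `(date, author)` pairs, one per commit; that input format, the function names, and the sample data are all made up for illustration and are not the tooling used for the plots here.

```python
from collections import defaultdict
from datetime import date
from math import sqrt


def daily_counts(log):
    """log: iterable of (date, author) pairs, one per commit (assumed format)."""
    commits = defaultdict(int)
    authors = defaultdict(set)
    for day, author in log:
        commits[day] += 1          # every commit counts
        authors[day].add(author)   # each author counts once per day
    days = sorted(commits)
    return days, [commits[d] for d in days], [len(authors[d]) for d in days]


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented sample data: three days, three authors.
sample_log = [
    (date(2010, 1, 4), "alice"), (date(2010, 1, 4), "bob"),
    (date(2010, 1, 4), "alice"),
    (date(2010, 1, 5), "alice"),
    (date(2010, 1, 6), "alice"), (date(2010, 1, 6), "bob"),
    (date(2010, 1, 6), "carol"), (date(2010, 1, 6), "carol"),
]

days, commits_per_day, committers_per_day = daily_counts(sample_log)
r = pearson(commits_per_day, committers_per_day)
print(commits_per_day, committers_per_day, round(r, 2))
```

A coefficient close to 1 on real data would be consistent with the two plots sharing the same shape, though it would not by itself say why.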

It’s interesting to see that the forking in the commit rate is caused by a similar pattern in the committer rate. The question I now have in my mind is: why? Why do only half the ASF committers turn up on every other day? If anyone has an idea I would love to hear it. I think this is fascinating.
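One simple check on the fork, sketched here under assumptions, is whether the lower prong lines up with particular days of the week. The snippet takes a mapping from date to number of distinct committers (an assumed input format) and compares the weekday and weekend averages. The function name and the numbers are invented for illustration.

```python
from datetime import date
from statistics import mean


def weekday_weekend_means(committers_per_day):
    """committers_per_day: dict of date -> distinct committers (assumed format)."""
    groups = {"weekday": [], "weekend": []}
    for day, count in committers_per_day.items():
        # date.weekday() is 0 for Monday through 6 for Sunday.
        key = "weekend" if day.weekday() >= 5 else "weekday"
        groups[key].append(count)
    return {k: mean(v) for k, v in groups.items() if v}


# Invented sample: Monday 4 January 2010 through Sunday 10 January 2010.
sample = {
    date(2010, 1, 4): 100,
    date(2010, 1, 5): 110,
    date(2010, 1, 6): 90,
    date(2010, 1, 7): 105,
    date(2010, 1, 8): 95,
    date(2010, 1, 9): 48,
    date(2010, 1, 10): 52,
}

result = weekday_weekend_means(sample)
print(result, result["weekend"] / result["weekday"])
```

If the ratio on the real ASF data came out near one half and the KDE ratio did not, that would point at a day-of-week effect; if not, the explanation lies elsewhere.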

It is also interesting to see that the committer rates in both KDE and ASF plateau at similar levels. There are not many projects that get as large as KDE and ASF do. We know from both Brooks’s Law and Conway’s Law that a certain level of restructuring is required as teams grow in order to maintain productivity. Perhaps there is a point around 100 daily committers where communities (that are lucky enough to get that large) have to really rethink their structure.

More on this next time.

16 comments to Who Wants To Be A Millionaire? Apache, KDE

  • My immediate thought about why Apache committers commit every other day was that perhaps they are more likely to work on Apache only at work rather than in their spare time. In Europe, holiday days + weekends is roughly half the year.

    Thoughts?

    • moltonel

      Mike McQuaid: sorry for the off-topic, but thinking that “In Europe, holiday days + weekends is roughly half the year” sounds like propaganda.

      The weekends account for nearly 29% of the year (nearly all the world’s IT works on a 5-day week, right ?). To reach 50% you need about 80 extra days. Bank holidays are just a blip on the radar (with a few of them happening during weekends). Then I get 20 days a year for a 45h week. That may be more than you are used to in the USA, but it’s still far off the 50% you were thinking about.

      • leo

        For the record, Mike is not American, nor is he employed in the US. I however am, and I get ~10-15 days of vacation a year. Some of my colleagues in Europe have 25-30 days of vacation. No one is trying to push propaganda, just discuss this issue.

      • moltonel: “Roughly half” sounds reasonable to me. In the UK, you get 8 public holidays (none on weekends) and average about 28 days holiday. Add 2 days a weekend for 52 weeks a year and you get 140 days a year or 38.3%. That’s closer to a third than a half, I’ll give you but it’s not that far off. If someone is working full-time on Apache, it’s not inconceivable that another 10% could be days when they don’t commit as large features (particularly to SVN) are perhaps more likely to have more sporadic committing.

        You’re perhaps a bit paranoid if you think that is “propaganda”. As you could have seen if you clicked on my name, I’m Scottish and work in the UK. Most jobs here, as I said above, give you about 36 days a year for a 37.5 hour work week (excluding lunchbreaks), in case you were interested :)

        • moltonel

          Sorry for my previous, hurried comment. I guess I’m working more than I’d like, and also am tired of some people bashing other countries without knowing (which I made myself guilty of right there…). As it happens I’m working for a French company; France has 7 effective bank holidays in 2010 (it varies). Although I am working a lot, I love my job :) 35-38% is still not “roughly half the year” in my books, though.

          That said, back on topic, it’s very likely that the weekdays is the cause of the fork in the graph. It’d be interesting to see what the weekend/weekdays ratio is.

  • Nicky

    Hi!

    Just a quick thought – could the lower half of the fork just be the weekends? I can’t make out the RHS of the KDE plot, but if it doesn’t split then maybe this is because more people are paid to work on Apache, while KDE has more volunteers?

  • [...] This post was mentioned on Twitter by Mark Taylor and Kevin Ottens, Paul Adams. Paul Adams said: New blog post: Who Wants To Be A Millionaire? Apache, KDE http://blogs.fsfe.org/padams/?p=140 #apache #kde [...]

  • As the lower prong seems to be sparser, I would lean towards “weekend” as well. Should be bordering on trivial to verify and Paul already has the input data, so… :)

  • It is worth noting for readers that the commit number scale is logarithmic and that this is the main reason for the visual /stagnation/ at the end, or as you call it the ‘plateau’.

    There is one shorter stagnation period in the middle as well.

    Essentially, since the plots approximate the log function, I’d guess that the growth is rather close to being linear.

    Cheerio

  • Paul Adams

    Thanks for all your thoughts… I, too, leaned towards thinking this was weekends. However… If that /is/ the case there is an interesting cultural element to ASF in that I have never seen such a distinct drop in contribution that a “forking” effect is displayed.

    I will pick up more on this in my next post in this comparison.

  • Paul Adams

    Ivan, yes, growth is largely linear. By seeing a plateau on the log scale what we are seeing is that the growth is not accelerating anymore.

    • One of the possible explanations when KDE is concerned is that more than a few projects have been switching to git-based development – some of them completely, and some of them remain in svn only for periodical check-ins.

      I don’t remember when exactly Amarok moved to gitorious, but I guess it was somewhere around 2008.

  • Ofir

    Nice post.

    In my opinion, as the community grows, it does not necessarily mean that the number of commits per day grows at the same rate too. The reason is that many new community members work on side projects, like artwork or promotion, which don’t have an obvious relation to the number of commits per day.

  • Matt

    This might be way off, but what proportion of the ASF developers live near the international date line?

    A developer could try to commit code at approximately the same time each day, but the commits could fall either side of the date cut-off when converted to GMT. For example, say they commit at 23:59 GMT on one occasion and then roughly 24 hours later at 00:01 GMT: even though only a day has elapsed and their commits are regular, it would look like they are missing out days.

    I’m not sure in which countries this might happen – it might be that it is people offset some distance from the date line if they prefer to commit at the end of each day.

    I doubt this would fully account for the pattern you have identified though.

  • adridg

    Something we haven’t looked at at all is constancy. You know, sticking with a project year in, year out, in sickness and in health, for better or for worse. How many developers keep coming back week after week? Or day after day? For example: show the percentage of committers each day that also commit on the next day. Something like that — it might help get a feeling for where the “fork” comes from.

  • You should take a look at what I’m doing with Enzyme (http://enzyme-project.org/), which is powering the return of the KDE Commit-Digest.

    Whilst I’m currently busy getting it up to the level of the old Commit-Digest in terms of functionality, my longer term plans include setting up a full queryable database of every commit, which I can then plug into dynamic, real-time visualisations – combining it with my existing database of extended KDE contributor stats – using something like ProtoVis (http://vis.stanford.edu/protovis/ex/).

    This is where you could come in, as I’m no expert on data visualisation (I wouldn’t know which of a set of specialised graphs is more appropriate in certain situations, etc :) )

    Danny