Brian Gough’s Notes

occasional GNU-related news

GNU Hackers Meeting in Paris (25 – 28 August 2011)

July 26th, 2011

It’s one month till the next GNU Hackers Meeting in Paris (Thursday 25 August – Sunday 28 August). There are about 45 GNU maintainers and contributors registered so far (details of how to register are on the meeting web page).

Speakers include Jim Meyering on “The perils of relying on output streams in C”, Stefano Zacchiroli (Debian Project Leader) on “Distributions, upstreams and downstreams” and Jim Blandy (Mozilla) on “Extensibility in Firefox” among others.

The meeting is hosted by IRILL, the Initiative de Recherche et Innovation sur le Logiciel Libre at the INRIA building near the Place d’Italie. Looks like a great location:

[Photo: the venue on avenue d’Italie]

Some pre-release checks for GNU packages

April 4th, 2011

Here’s a list of pre-release checks I’ve found useful over the years. Most of them should apply to any package, but they’ve been used for a C library (“libfoo” in the examples below, with exported header files foo/*.h and source files src/*.c).

  • Make sure all internal functions are static, and that all exported functions and variables are prefixed with foo_ or FOO_. Here’s a command for checking this:
    nm -A -g -P *.a  | perl -a -n -e 'print if $F[1] !~ /foo_/ && $F[2] ne "U"'
  • Make sure config.h is used consistently – it should be included in every source file, but not in any exported header.
    grep config.h foo/*.h      # shouldn't match anything
    grep -L config.h src/*.c   # lists source files not using config.h
  • Make a master header file of all the exported headers and try compiling it, to test for conflicts between different header files.
    cat foo/*.h > tmp.c
    gcc -c tmp.c   # compile only; there is no main() to link
  • Check that the library passes “make check” when built with various compiler options (see the example after this list for one way to pass them). Some good ones are:
    -O3
    -mfpmath=sse -msse2
    -funsigned-char (plain char is unsigned by default on PowerPC/ARM)
    -std=c89 -ansi -pedantic (to catch accidental use of GNU extensions or newer C features)
  • Test with valgrind. You can run all the tests under valgrind using the TESTS_ENVIRONMENT variable:
     $ TESTS_ENVIRONMENT="valgrind --tool=memcheck" make check

    This only works if configure was run with --disable-shared, because otherwise it is the libtool wrapper scripts for the shared library, rather than the test programs themselves, that are run under valgrind.

    Regarding memory leaks, the GNU coding standards say “Memory leak detectors such as valgrind can be useful, but don’t complicate a program merely to avoid their false alarms. For example, if memory is used until just before a process exits, don’t free it simply to silence a leak detector.”

    For libraries I’ve found it very helpful (essential, actually) to have a test suite that is leak-free. Then, if any test program exits with memory still allocated, it indicates a problem – this is how I’ve caught memory leaks during development. (A sketch of one way to make leaks fail the test run follows this list.)

  • Run make distcheck with the local copy of install-sh provided by Automake, in addition to /usr/bin/install. Depending on the end user’s environment, either of these could end up being used.
    make distcheck # will use /usr/bin/install
    make distcheck INSTALL=$HOME/foo/install-sh  # absolute location of install-sh

    When you run “make install” yourself it will probably use /usr/bin/install, since that is present on GNU/Linux systems. If so, you would never notice a problem with the install-sh shipped in your package – for example, if it had become corrupted somehow. In my case, after a repository conversion my working directory somehow ended up containing the very first version of install-sh ever checked in, dating from 1994; it didn’t work properly on modern systems, but I never noticed because I was always using /usr/bin/install.

  • Try running the build and make check under a memory limit, to check that excessive memory is not required for the compilation.
    ulimit -v 65536  # a reasonable limit – units are kilobytes, so this is 64 MB
    make
    make check
  • Check that the Texinfo manual builds for PostScript/PDF
    make dvi  # or make pdf

    It’s easy to miss a diagram or other TeX input file that is needed by texi2dvi but not by makeinfo.
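
To pass the compiler options mentioned above through to the build, the simplest way with an Automake-based package is usually to set CFLAGS at configure time. A minimal sketch, repeated for each combination of flags you want to test (the flags are just the ones from the list above):

    ./configure CFLAGS="-O3 -mfpmath=sse -msse2"
    make check
    make distclean
    ./configure CFLAGS="-std=c89 -ansi -pedantic"
    make check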
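
To make leaks from the test suite show up as failures under “make check”, one possible approach (a sketch using standard valgrind options, not something from any particular package) is to have valgrind exit non-zero when it reports errors or definite leaks:

    TESTS_ENVIRONMENT="valgrind --tool=memcheck --leak-check=full --error-exitcode=1" make check

The caveat about --disable-shared above still applies, since otherwise it is the libtool wrapper scripts that valgrind sees.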

Autocache

March 9th, 2011

At FOSDEM, Alfred (ams@gnu.org) came up with the idea of “Autocache” – a system-wide precomputed cache of autoconf tests that makes every configure script run instantaneously. A crazy idea that cannot work completely in practice, of course, but there might be something to it.

He sent me an implementation which works like this: you build a configure script from all the autoconf tests in Gnulib (excluding some obvious ones that should not be cached), the idea being to pick tests whose results are unlikely to change on a given system. You run this huge configure script once and keep a copy of the resulting config.cache file, then use it as the initial config.cache for any package you’re compiling.
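
A rough sketch of that workflow (the cache location here is just for illustration):

    # once: run the giant configure script built from the Gnulib tests
    ./configure -C                      # leaves its results in config.cache
    cp config.cache ~/autocache.cache

    # then, for each package you want to build:
    cd ~/src/somepackage
    cp ~/autocache.cache config.cache   # seed the cache
    ./configure -C                      # cached tests are answered immediately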

I tested it on GSRC and it worked surprisingly well – out of around 100 packages there were only a handful of failures where the cache somehow had bad results. Here are the timing results for the ten longest-running GNU configure scripts, with and without Autocache:

[Chart: configure times with and without Autocache]

Decentralised systems and the tendency to centralisation

February 12th, 2011

In his presentation at FOSDEM on “Power, Freedom, Software. Why we need to divide and re-conquer our systems”, Karsten Gerloff talked about the difference between decentralised systems, which are networks of client-server systems (such as email), and distributed systems, in which all nodes are equal (such as peer-to-peer filesharing networks like GNUnet).

If the client and the server of a decentralised system are both free software then anyone can start their own server and be free from control. In practice, decentralised systems have a tendency to become centralised, and in some cases highly centralised for most users.

The tendency to centralisation is a result of two factors: asymmetric costs and economies of scale.

  1. Asymmetric costs: in client-server systems, the server generally requires more administration than the client. Users have an incentive to transfer that administration to a third party, either for a fee or, in the case of advertising-supported services, by trading away privacy.
  2. Economies of scale: server administration is subject to significant economies of scale. Larger providers drive out smaller providers over time, leading to a high degree of centralisation.

These two factors form a positive feedback loop, as larger providers can offer better services at lower cost, thereby attracting more users.

As a specific example, consider email. In the early days, each group of users had their own mail server at their local site. Over time, email has been outsourced to ISPs, ISPs have merged and been taken over, and finally been overtaken by a small number of global mail providers with userbases in the hundreds of millions.

Although the factors driving this process are universal, the degree of centralisation is not uniform: it depends on the relative cost of administration and the value of privacy for different users. For example, email has become effectively centralised for individual non-technical users, where the relative cost of running a mail server is high and the perceived privacy problem is low. For large companies the relative cost of running a mail server is lower, confidentiality of corporate information is valued, and most maintain their own mail systems in-house.

Autoconf caching – an option I didn’t know about

February 10th, 2011

In his talk about Autotools at the FOSDEM GNU Dev Room, Ralf Wildenhues mentioned an autoconf option that I wasn’t aware of. It’s the “-C” option, which enables caching and speeds up repeated runs of a ./configure script enormously.

$ ./configure --help
     --cache-file=FILE   cache test results in FILE [disabled]
 -C, --config-cache      alias for `--cache-file=config.cache'
$ ./configure -C
configure: loading cache config.cache
...

Caching used to be the default until about 10 years ago, but was turned off because the cache can get out of sync with the system when new packages are installed. It also really only benefits developers, as end users typically need to run ./configure only once.
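
If the cache does get out of sync (after installing a new library, say), deleting it and re-running configure rebuilds it from scratch:

    rm -f config.cache
    ./configure -C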

In the 1990s I got used to caching being enabled by default, and when it was turned off I never noticed the change – except that somehow running ./configure while developing has seemed a lot slower over the past ten years!

Thanks to Ralf for pointing out this option – I’m now using -C all the time.

GNU Devroom at FOSDEM 2011 (Brussels)

February 8th, 2011

The GNU dev room at FOSDEM 2011 was great: the room had 100 seats and every talk was packed, with people standing at the sides.

The GNU talks were:

Org-Mode: your life in plain text by Bastien Guerry, on the amazing personal organizer mode for Emacs.

Dynamic hacking with Guile by Andy Wingo, on the new release of Guile 2.0 and hacking up a Twitter clone (“Peeple”) live in 15 minutes using Guile’s new web framework – amazing.

GNU Autotools by Ralf Wildenhues with lots of useful tips for autoconf and automake.

GNU Network Security Labyrinth by Simon Josefsson, a nice tutorial on how different security protocols have developed over time, in order to solve the problems with earlier attempts.

GNU recutils: your data in plain text by José E. Marchesi, on tools for working with your data on the command line.

There were also two talks from FSF Europe:

Power, Freedom, Software. Why we need to divide and re-conquer our systems by Karsten Gerloff.

Non-free software advertisement – presented by your government by Matthias Kirschner, about the successful PDFreaders.org campaign.

Thanks to José and Andy for organising the day – it was a great set of topics.

[Photo: Karsten Gerloff talking about freedom and the cloud]

[Photo: Simon Josefsson finishing his talk]

[Photo: the room]

Apache reference manual

January 31st, 2011

This is the latest printed free software manual I’ve been working on – The Apache Reference Manual (for Apache version 2.2.17) (ISBN 9781906966034, RRP £19.95). It’s the biggest single volume I’ve published – over 850 pages. I’m donating $1 to the Apache Software Foundation for each copy sold.

[Photo: cover of the Apache Reference Manual]

GNU Parallel – a map operator for the command line

January 13th, 2011

GNU Parallel is a nice recent addition to my Unix toolkit. It can do a lot of things; my favourite is using it as a “map” operator for the command line, automatically parallelising a command over its arguments.

The command

parallel command {} ::: arg1 arg2 arg3 ...

is equivalent to

 command arg1 &
 command arg2 &
 command arg3 &
 ...

run in parallel. The {} is replaced by each argument in turn.

As part of the regular updates to GSRC I need to run “make check-update” in several hundred directories, one for each GNU package. With a for loop that takes a long time, as “check-update” examines remote URLs to see whether a new version of each package has been released. With parallel I can use

parallel make -C {} check-update ::: gnu/*/

to get the same effect, but in parallel (by default 9 processes run simultaneously – to change the number use -j N, as in the example below).
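
For example, to limit it to four simultaneous jobs (just a variation on the command above):

    parallel -j 4 make -C {} check-update ::: gnu/*/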

The author of GNU Parallel, Ole Tange, is giving a talk at FOSDEM (Brussels) in the GNU Dev room on Saturday 5 Feb 2011, see http://www.gnu.org/ghm/2011/fosdem/ for details. Hope to see you there!

Org Mode manual published

December 17th, 2010

I’ve published a paperback edition of the Org Mode 7 Reference Manual (subtitled “Organize your life in GNU Emacs”). For each copy sold $1 is donated to the Org project.

PostgreSQL 9 manuals

December 16th, 2010

I’ve just published the PostgreSQL 9 Reference Manual (in 4 volumes) through my company Network Theory Ltd.

It’s over 1,600 pages, so proofreading it was quite a task! One of the side-benefits of publishing free software manuals is that I do get to know all the features of the software.

The main new PostgreSQL feature that I’ll be using is full text indexing, for a Django site I’m working on. I was previously running PostgreSQL 8.2 on my server and updated it to PostgreSQL 9 yesterday to get support for this. Thanks to the new pg_upgrade migration tool in PostgreSQL 9, that should also be the last time I’ll need to do a full pg_dump/pg_restore upgrade (always a bit of a hassle on a large database).
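
For reference, the basic shape of a pg_upgrade run looks something like this (the directories are placeholders – check the PostgreSQL documentation for the exact steps, and note that both the old and new servers must be stopped first):

    pg_upgrade -b /path/to/old/bin -B /path/to/new/bin \
               -d /path/to/old/data -D /path/to/new/data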

Incidentally, for every volume of the manual sold I’m donating $1 to the PostgreSQL project. I was able to give over $2,000 from the sales of the previous edition for PostgreSQL version 8.