Long-standing Konsole bug in OSOL squashed
Wednesday, November 10th, 2010
Consider the following snippet of C code:
write(1, "foo", 3);
write(1, "bar", 3);
write(1, "", 0); write(1, "", 0);
write(1, "moose", 4); /* only the first 4 bytes: "moos" */
You expect to get 10 bytes on stdout (“foobarmoos”); the 0-byte writes are no-ops. In OpenSolaris, though, we’ve had a long-standing bug in Konsole that makes it seem to hang sometimes. It doesn’t hang per se; it just ignores all subsequent output from the shell. It took some time before we discovered that it was only ignoring output, and that you could still interact with the shell (e.g. type “exit”) to recover somewhat.
Various fixes have been attempted. There are two hacks in the kdelibs kpty library right now that try to address the problem. One accounts for FIONREAD reporting 0 bytes available to be read, and the other pushes two STREAMS modules onto the terminal. The former doesn’t catch all the cases that block further output, and the latter breaks terminal size setting, which is terribly annoying: most things assume 80×24 when TIOCGWINSZ fails.
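To illustrate the 80×24 fallback mentioned above, here is a minimal sketch of how a program might query the terminal size. The names `term_size` and `query_term_size` are made up for this example and are not part of kpty; only the `TIOCGWINSZ` ioctl and `struct winsize` are real interfaces. Header locations differ between systems, so the includes are an assumption that may need adjusting.

```c
#include <assert.h>
#include <sys/ioctl.h>   /* TIOCGWINSZ and struct winsize on Linux */
#include <termios.h>     /* some systems declare them via termios headers */
#include <unistd.h>

/* Hypothetical result type for this sketch. */
struct term_size {
    unsigned short cols;
    unsigned short rows;
};

/*
 * Hypothetical helper: ask the terminal on fd for its size and fall
 * back to the traditional 80x24 when the ioctl fails or reports zeros.
 */
struct term_size query_term_size(int fd)
{
    struct term_size ts = { 80, 24 };   /* the usual default */
    struct winsize ws;

    assert(fd >= 0);

    if (ioctl(fd, TIOCGWINSZ, &ws) == 0 && ws.ws_col > 0 && ws.ws_row > 0) {
        ts.cols = ws.ws_col;
        ts.rows = ws.ws_row;
    }
    return ts;
}
```

Called on a descriptor that is not a terminal (a pipe, say), the ioctl fails and the helper reports 80 columns by 24 rows, which is exactly the behaviour that makes a broken TIOCGWINSZ so visible.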
The underlying assumption in kpty is that when FIONREAD indicates there is data available from the terminal, read() will always return more than 0 bytes, and that a 0-byte read means end-of-file. Testing (and it’s annoying testing, driven from an xterm-only failsafe login with twm) indicates that 0-byte writes drop 0-byte chunks into the stream that kpty sees. A typical sequence is this: FIONREAD returns 10; read() returns 6 bytes (“foobar”); read() returns 0; read() returns 0; read() returns 4 bytes (“moos”). Those two 0-byte reads correspond to the 0-byte writes in the stream, and kpty takes them as a signal to close the stream, which in turn stops output from reaching Konsole.
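One way to armour a reader against such empty chunks is sketched below. This is not the actual kpty change; `drain_pending` and `MAX_EMPTY_READS` are hypothetical names, and the strategy (keep reading until the byte count promised by FIONREAD has arrived, skipping 0-byte reads along the way) is an assumption about what a fix could look like. It also cannot tell an empty chunk from a real end-of-file when FIONREAD itself reports 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>   /* FIONREAD on Linux; Solaris has it in <sys/filio.h> */
#include <sys/types.h>
#include <unistd.h>

/* Assumption: arbitrary cap on consecutive empty reads, to avoid spinning. */
#define MAX_EMPTY_READS 16

/*
 * Hypothetical helper, not the kpty API: read the bytes that FIONREAD
 * reports as pending on fd into buf (at most cap bytes).
 * Returns the number of bytes stored, or -1 on error.
 */
ssize_t drain_pending(int fd, char *buf, size_t cap)
{
    int pending = 0;
    int empty_reads = 0;
    size_t want, got = 0;

    assert(buf != NULL);

    if (ioctl(fd, FIONREAD, &pending) < 0)
        return -1;
    if (pending <= 0)
        return 0;   /* nothing promised; caller decides whether this is EOF */

    want = (size_t)pending < cap ? (size_t)pending : cap;
    while (got < want) {
        ssize_t n = read(fd, buf + got, want - got);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0) {
            /* Empty chunk while data is still owed: skip it, within reason. */
            if (++empty_reads > MAX_EMPTY_READS)
                break;
            continue;
        }
        empty_reads = 0;
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

With the sequence described above, such a loop would collect “foobar”, step over the two empty reads, and then pick up “moos”, instead of treating the first 0-byte read as end-of-file.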
I suppose the Linux and FreeBSD terminal subsystems suppress these 0-byte writes; I really don’t know why Solaris amalgamates non-zero writes (e.g. “foo” followed by “bar” arrives as “foobar”) but keeps 0-byte writes on their own. Maybe it’s a matter of “don’t do 0-byte writes”, but experience shows that 0-byte writes happen all the time: in error messages, when hitting ^C in the shell, and in other situations. So on Solaris at least, kpty needs to be armoured against them.
I’m building an updated kdelibs package now; it will hit the package server repo on bionicmutton later today. I have also pushed the changes, along with fixes for some other Solaris corner cases, into KDE SVN, so they will show up in KDE 4.6 later. It feels good to sit down and finally ferret out a fix.