Bobulate


How new software ends up on KDE4-Solaris

November 17th, 2010

I really like it when new software packages are mentioned in blogs or on the dot. Those — based on a combination of personal interest and seeing how the community reacts — drive my selection of packages for OpenIndiana and OpenSolaris (such as it was). Something like the dot story on Sentinella says to me “this is interesting software.”

So I’ve started to package Sentinella, and you can find progress in the -460 repo. What progress? You ask. Yeah, it’s not trivial and the code shows some gcc-isms, for which I have filed bug reports. So as of this writing, there’s no Sentinella spec file in there. It’ll come.

I spotted a new poppler release (0.15.2) on its way towards a 0.16 version; I think I saw it in a Freshmeat RSS feed somewhere. I packaged most of that up, although it too has some gcc-isms. I balked at registering for the FreeDesktop bugzilla, so here are the patches: a gcc-ism for variable-length arrays, introducing a Solarisism for RPATH, using the <ios> header to get std::hex and std::dec, and a gcc-ism around unicode strings. Simple.

Speaking of unicode strings, it seems that gcc will take "\u0161" as a unicode string and encode it in UTF-8 as two bytes. The Sun compiler doesn’t like that, as it interprets \u0161 as a UTF-16 character, which doesn’t fit in a char. On the other hand, U"\u0161", a unicode character string, is not a char * but a short *. I ended up using gcc to determine that "\u0161" is the same as "\305\241" (octal!) and patched poppler to use that.
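The octal bytes can be checked without gcc. What follows is a minimal sketch of a UTF-8 encoder, not anything from the poppler patch; the function name utf8_encode is made up for illustration, and the sketch only handles code points up to U+FFFF and does not reject surrogates.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-8 bytes.
// Hypothetical helper: covers only U+0000..U+FFFF, no surrogate checks.
std::string utf8_encode(unsigned long cp)
{
    assert(cp <= 0xFFFF);  // larger code points need a 4-byte form, omitted here
    std::string out;
    if (cp < 0x80) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For U+0161 the two-byte branch applies: 0x161 >> 6 is 5, giving 0xC5 (octal 305), and 0x161 & 0x3F is 0x21, giving 0xA1 (octal 241).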

But while searching for some information on unicode strings, I found Lukas Lalinsky (see, no unicode for me; sorry Lukáš) blogging about taglib releases, so I bumped that one as well.

I guess the summary answer to “how does software end up in the KDE4-Solaris specfile repository and package server?” is “it’s a crap-shoot depending on what I notice.” That’s not entirely good, but it also means that we’re largely driven by KDE needs.

In closing, a shout-out to Jarosław and the KOffice folk for providing some really useful information for packagers. That deserves a whole separate post (e.g. when KOffice is indeed packaged for OpenSolaris and OpenIndiana).

Rain and Reason

November 15th, 2010

As Sebas has already noted, there was a KDE e.V. board meeting over the weekend. I had Claudia, our business manager, over on Wednesday and we had a good Thursday at the NLUUG Fall Conference on Security (where Claudia ran the booth and I was acting as part of the NLUUG board). I think there’s a real advantage to getting together before a board meeting and spending some time chatting and whatnot: it takes those topics out of the meeting time and allows us all to synchronize a little on what the current issues are.

You might claim that the main issue was horrific weather, with storm, rain, more rain and cold going on in the city of Nijmegen. I might claim that the main issue was how to eat all the food.

Another issue we ran into was the visible lack of visibility (!?) of the documentation for Sprints and developer events. Sjors is the catalyst here. I think we spend about half of the total KDE e.V. budget on sprints (the rest is Akademy and office and personnel costs), and you can find plenty of mention of events in the quarterly report (PDF) or on the Dot once they’ve happened. So the board thought that the sprint organization mechanism was pretty obvious, and it turns out it’s not.

But thanks to that realization, we now have Sjors being all enthusiastic about a KMess event, and I’m starting to plan a KDE4-OpenIndiana event and there’s something I need to cook up further with Celeste, too. So expect more entries in the upcoming-sprints department soon. One of the tasks I’ve taken away from the board meeting is to improve the sprint HOWTO with perhaps some more fine-grained instructions or checklists. But I’ll give my own interpretation of what a sprint is and what it’s for first:

A sprint is a single event, highly focused on technical results, in one location with a short time frame (a week or less) and a single topic; the topic is most often development of an application or library. A sprint is organized by an existing development community, and has a small core group of attendees (six to ten people). A sprint is 80 percent sweat (i.e. getting work done, and of that, let’s say 60/40 for doing stuff and planning future doing) and 20 percent social.

You’ll note that some things we call sprints aren’t, by this personal definition. Those events are swept up by the more general term “developer events.” It’s not like we hold up event plans to this simple descriptive yardstick; feel free to come up with something else.

KDE e.V. supports the organization of these events — financially, sometimes administratively, and rarely (simply because it’s almost never needed) organizationally. However, KDE e.V. doesn’t come up with its own sprint ideas, nor will it (generally) approach people saying “you should do a sprint.” It waits for (sub)communities within KDE to come up with something and to show off a plan — or even a sketch of a plan — before starting to act.

So, to paraphrase the A-Team: if you (collective, addressing a developer community) have a problem, and no one else can help, maybe you should get together to solve it — and then you can ask the K-team (KDE e.V.) to support your efforts by covering the costs of getting together to solve that problem. ( — Ed: that’s not a very good paraphrase at all.)

Some notes on OpenSolaris packaging

November 10th, 2010

Thanks to the fine folks at Belenix and OpenIndiana, the package serving mechanism for the KDE4 OpenSolaris packages has changed. Changed for the better, because we now use Apache HTTPD to serve up the files themselves. This removes one of the big issues with our earlier package-serving setups, which was that connections and downloads were unreliable and it could take many many attempts to get a large file.

The trick is to use Apache with a reverse proxy (ProxyPass directive) to pass on some requests to an internal pkg.depotd and to use a rewrite rule to modify other requests to match the on-disk layout of the repository. Sriram N, Shawn W and Alastair helped out in finally pushing this out the door.

As a side effect, the correct publisher to use for KDE 4.5.3 packages — including the Plasma Desktop, KDE Platform and KDE applications — has changed. We will no longer be futzing with port numbers in public, but instead have a human-readable URL. To set up your system to use the latest KDE packages, use

pkg set-publisher -O http://solaris.bionicmutton.org/pkg/4.5.3/ kdeips-dev

That particular URL will only serve 4.5.3 packages and such updates and additional applications as slip in. Eventually, we will get a /4.5.4/ package repository as well. We’re still debating a little on how to do a “4.5” or “latest everything” repository. Some measure of deduplication would be nice if we’re going to be serving up multiple repositories.

SysV packaging: On a vaguely related note, I’ve regularly had to battle SysV packaging on OpenSolaris. The legacy packaging system (pkgadd, pkginfo) and the new package system (IPS, with pkg as the main tool) are usually well-integrated, but there are edge cases that break stuff — usually the legacy bits. However, the build system (pkgbuild, which is pretty much rpmbuild for Solaris as I understand it) uses the legacy package system for information even while it builds modern packages. So that means I sometimes have to fool pkginfo(1) into thinking that a particular package is installed (or not).

To fool pkginfo(1), you need to manipulate the directories in /var/sadm/pkg. There is a directory there for every package on the system that is known to the legacy packaging system. To hide a package, just throw away the directory (probably tar it up first). Each directory holds a pkginfo file, which is a straightforward key=value file; to tell pkginfo that a package exists, just create a directory of the appropriate name and copy an existing pkginfo file in there, then adjust the contents so it vaguely makes sense. The important settings seem to be PKG and PKGINST. The rest is important only if you’re dealing with officially supported software.
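To check that a hand-edited pkginfo file still carries the two settings that seem to matter, something like the following sketch would do. It assumes nothing more than the key=value layout described above; the function names parse_pkginfo and has_required_settings are invented for illustration, skipping lines that start with '#' is an assumption about comments, and quoted values are not unwrapped.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

typedef std::map<std::string, std::string> Settings;

// Read key=value lines into a map. Lines without '=' are ignored;
// treating '#' lines as comments is an assumption of this sketch.
Settings parse_pkginfo(std::istream &in)
{
    Settings settings;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#')
            continue;
        std::string::size_type eq = line.find('=');
        if (eq == std::string::npos)
            continue;
        settings[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return settings;
}

// True when both PKG and PKGINST are present and non-empty.
bool has_required_settings(const Settings &settings)
{
    Settings::const_iterator pkg = settings.find("PKG");
    Settings::const_iterator inst = settings.find("PKGINST");
    return pkg != settings.end() && !pkg->second.empty()
        && inst != settings.end() && !inst->second.empty();
}
```

Feeding it a std::ifstream opened on the pkginfo file inside the package's directory would be the obvious use; a std::istringstream works for trying it out.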

Future of Solaris10 packaging: The specfile repository that we maintain has a lot of material that is there specifically to support Solaris 10. The people who are active in the repository don’t use S10, and I think the complexity imposed by supporting OSOL and S10 is starting to hurt. If we’ve got a complexity budget, it would be much better spent in supporting OSOL and OI (e.g. the future) instead of the past. No concrete plans yet, but I can imagine us tagging the repo at some point with eol-s10 and then ditching all the Solaris 10 support. Thanks to repository history, it’s not really gone, but won’t burden us in the future.

Long-standing Konsole bug in OSOL squashed

November 10th, 2010

Consider the following snippet of C code:

write(1, "foo", 3);
write(1, "bar", 3);
write(1, "", 0); write(1, "", 0);
write(1, "moose", 4);

You expect to get 10 bytes on stdout like this (“foobarmoos”). The 0-byte writes are no-ops. In OpenSolaris, though, we’ve had a long-standing bug in Konsole that makes it hang sometimes. Well, not hang per se, it just ignores all subsequent output from the shell. It took some time before we discovered that it wasn’t hanging, just ignoring output and that you could still interact with the shell (e.g. type “exit”) to recover somewhat.

Various attempts at a fix have been tried. There are two hacks in kdelibs kpty library right now that attempt to address the problem. One accounts for FIONREAD returning 0 bytes available to be read and the other pushes two streams modules on to the terminal. The former doesn’t catch all the cases that block further output, and the latter breaks terminal size setting, which is terribly annoying. Most things assume 80×24 when TIOCGWINSZ fails.

But the basic assumption is that when FIONREAD indicates that there is data available from the terminal, read() will always return more than 0 bytes, and that 0 bytes indicates an end-of-file situation. Testing (and it’s annoying testing, driven from an xterm-only failsafe login with twm) indicates that 0-byte writes drop 0-byte chunks into the stream that kpty sees. A typical sequence is this: FIONREAD returns 10; read() returns 6 bytes (“foobar”); read() returns 0; read() returns 0; read() returns 4. Those two 0-byte reads correspond to the 0-byte writes in the stream, and they tell kpty to close the stream, which in turn stops output from reaching konsole.

I suppose the Linux and FreeBSD terminal subsystems suppress these 0-byte writes; I really don’t know why Solaris does amalgamate non-zero writes (e.g. “foo” followed by “bar” arrives as “foobar”) but keeps 0-byte writes on their own. Maybe it’s a matter of “don’t do 0-byte writes”, but experience shows that 0-byte writes happen all the time, in error messages, when hitting ^C in the shell, and other situations. So on Solaris at least, kpty needs to be armoured against them.
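The armouring idea can be sketched as follows. This is not the kpty patch itself, only an illustration of the approach under stated assumptions: drain, Reader and scripted_reader are made-up names, the reader callback stands in for read(2) on the pty, and the cap on consecutive empty reads (max_empty_reads) is an arbitrary safety valve so a genuinely closed stream cannot spin forever.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Stand-in for read(2) without the fd: fill buf with at most len bytes,
// return the byte count, 0 for an empty chunk, negative on error.
typedef std::function<long(char *, std::size_t)> Reader;

// Collect the 'available' bytes that FIONREAD promised. A 0-byte read is
// skipped rather than treated as end-of-file, up to max_empty_reads times.
std::string drain(std::size_t available, const Reader &reader,
                  int max_empty_reads = 8)
{
    std::string out;
    char buf[256];
    int empty_reads = 0;
    while (out.size() < available) {
        std::size_t want = available - out.size();
        if (want > sizeof(buf))
            want = sizeof(buf);
        long got = reader(buf, want);
        if (got < 0)
            break;                      // real error: give up
        if (got == 0) {
            if (++empty_reads > max_empty_reads)
                break;                  // too many empty chunks: assume EOF
            continue;                   // otherwise ignore the empty chunk
        }
        out.append(buf, static_cast<std::size_t>(got));
    }
    return out;
}

// Test helper: replays a fixed list of chunks, one per call. A chunk
// longer than len is truncated, which is good enough for a demo.
Reader scripted_reader(std::vector<std::string> chunks)
{
    std::shared_ptr<std::size_t> next = std::make_shared<std::size_t>(0);
    return [chunks, next](char *buf, std::size_t len) -> long {
        if (*next >= chunks.size())
            return 0;
        return static_cast<long>(chunks[(*next)++].copy(buf, len));
    };
}
```

Replaying the sequence from the test above (6 bytes, two empty reads, 4 bytes) through drain(10, ...) yields all ten bytes instead of stopping at the first empty read.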

I’m building an updated kdelibs package now, it will hit the package server repo on bionicmutton later today, and I pushed the changes — also with some other Solaris corner cases — into KDE SVN for KDE 4.6 later. It feels good to sit down and finally ferret out a fix.

Murphy’s Day

November 10th, 2010

If something weird is happening with a server, never think “It’ll just be an hour or two.” Never think “If I’m going to be in the server room anyway, I might as well do foo as well to another box.” Since I thought both of these foolish things, it shows off that there’s definitely areas of Linux system administration that I’m no good at and that are needlessly complicated, and that I’m an inveterate optimist when it comes to these things.

The CodeYard server — a five year old IBM x306 with hard drives showing over 30000 hours of continuous operation and which has had uptimes over 500 days — slowed to a crawl, then rebooted yesterday. Sjors pinged me by phone, so I biked to the University to take a look with him. While en-route, the box did another kernel panic while running fsck(8). Ugh.

Now, working on a server that has two partially-mirrored 250GB SATA-150 hard drives and only 1GB of RAM (seriously, when we got this machine it was a sensible box for supporting medium sized workgroups, now my phone has more oomph) just takes forever. It never takes just an hour or two to wait for GEOM mirror to complete and then the fsck(8) to wind up and then .. bam, another kernel panic. By the end of the day we hadn’t really pinned down what was causing the problem, but memcheck seems in order.

All the data on the machine (students’ SVN and git repositories) seems safe, but we’ve pretty much turned off all the services offered by the box by various service jails until we get things sorted out.

So one failure doesn’t a Murphy’s day make. The second is that my laptop — which worked in the morning and didn’t when I got to the server room — has suddenly forgotten that it has a display panel attached to it, so I don’t see a thing. Not even BIOS POST messages. It still seems to boot into Fedora OK and I can even log in to my wonderful pink desktop (now there’s a blessing in disguise). Can’t see a thing. This particularly puts a crimp in the plan to use the laptop as a KDE demonstration machine during the NLUUG fall conference. I might end up lugging a desktop machine along instead.

In parallel with all this I did some upgrades on the EBN machine, which was foolish of me. That server had been running off of a spare laptop drive for some time now — a situation that was bound to come crashing down at some point. So the plan was simple: add a 500GB data disk, put back the Sun 10kRPM SAS disk that came out of the machine some time ago, copy boot stuff to SAS disk, reboot, done.

Yeah, right.

Three things I’d forgotten: dump + restore no longer works, making disks bootable is non-trivial and initrd is some brain-dead invention intended to prevent you from moving things around effectively. Give me FreeBSD, which at least will boot (quickly) and then complain and you can type in the root directory for single-user mode in a human-friendly fashion.

In the end I dd’ed the old disk onto the new disk, then did a chroot and mkinitrd. It just doesn’t seem right. Maybe I’ve missed a really obvious manpage somewhere explaining how the boot process works nowadays and how to migrate an installation to a different disk (lazyweb!). Tracking down the remaining references to the old disk took a bit longer, but the machine is up-and-running again. Now my next challenge is to convince the disk subsystem that I hot-attached a new drive (which would be /dev/sdf) which is physically identical to /dev/sde, and then dd everything over again so there’s a spare boot disk.

Plenty of things to go wrong. In retrospect, the old Nethack adage serves best (e.g. when going down stairs while burdened with a cockatrice corpse) “just don’t do that.”

Boux!

November 1st, 2010

So hallowe’en has shuffled off, zombie-like, for another year. Darkness falls faster now, with the end of daylight saving. About every three months, folks in this street in Lent grasp an opportunity to hold a street party. This time I actually made flyers stating a time, date (not everyone is hip to hallowe’en in the Netherlands) and place and the instructions “bring food and/or drink and we’ll see what happens.” Those instructions apply to most of our street-fests. There’s almost 20 kids in the immediate neighbourhood, so they rush around until tired and the adults produce some very nice bits of cookery. Ms. I. had pumpkin flaps (like apple turnovers, but with pumpkin) and Ms. B. had a fine pumpkin-chicken pie. There was beer, wine, more beer, jenever and special Texels bitter and the fires burned down around midnight; that’s what you get when the next day is a working day for most.

The photo is of the pumpkin I made with Mira and Amiel; they grew it in their garden plot and it grew to about 10kg. They did the design and I wielded the butcher’s knife — although I did make Mira scoop out the pumpkin gloop with her bare hands, for the scary slimy hallowe’en feeling.

Sun Ray restored

October 29th, 2010

It’s been ages since I wrote (or complained) about Sun Ray Server Software (SRSS) on OpenSolaris in combination with KDE4. That was because I wasn’t using it and my Sun Ray terminal sat idle. But a bit of house-rearranging has made the device useful again, and I spent an hour futzing with it today to get things working again.

SRSS is a bit complicated — it’s officially delivered only for Solaris 10 and it exists separately from the client devices, so just because you have a Sun Ray doesn’t mean you can do anything with it. Certainly there’s issues on OpenSolaris (or OpenIndiana). I seem to have 4.2 running — it’s a bit unclear whether there’s a version 5 available, and last time I blogged about SRSS it was enthusing about the new addition of Porter-Duff compositing, which made the Plasma Desktop beautiful on the thin client again.

A good official(-ish?) source of information on SRSS is the sun-rays.org wiki which has a section on getting this stuff to work on OSOL as well. Plus notes for build 134 or later, which is what KDE4 requires. Two blog entries I found very helpful were Chris Gerhard about build 130 and later and the Grey Blog on 2009.06. Both point out some tweaking that’s needed to get gdm to like the thin client — otherwise it starts on the thin client and stops about 60 milliseconds later.

After adding the required symlinks and font paths, gdm starts up normally.

From there, I can now display KDE4 on my FullHD TV downstairs while the server hums gently in the attic. It’s nice to be able to move around in the house and have the desktop available everywhere. Session resume is also supported, so if I switch off the Sun Ray and switch on again later, my desktop is there.

Another benefit of having a desktop available in the living room is that it enables pair programming when random developers stop over, but that’s a topic for another time.

KDEPIM 4.4.7 available in OpenSolaris

October 25th, 2010

Since Allen pointed out that the KDE PIM team had released another 4.4 version, we’ve bumped the KDEPIM version available for OpenSolaris through our KDE 4.5 series specfile repository. PIM hasn’t done a 4.5 release as such, concentrating on 4.6 and a full Akonadi-based release, so it’s good to see updates and bugfixes going on nonetheless.

You can build the software using the specfiles — we have not uploaded newer packages. That will have to wait until we’ve done a little more research into how to best serve up packages (as more people are using them now, we’re hitting various performance issues in the way we do things).

Lazyweb, Ethiopian

October 24th, 2010

Dear Lazyweb,

My brother lives in Addis Ababa. This means I sometimes get cool stuff (like coffee straight from the highlands) and sometimes rather incomprehensible parcels in the mail. In this case, I have a sachet of "Raw Tikur Azmude" and a bag of very interestingly colored beans. See photo. Since my Amharic just isn’t what it needs to be — and I still have Hausa on the list of things to do above learning Amharic — I’m at a loss as to what to do with this. So anyone with some Ethiopian cooking hints, please drop me a note.

Love in International cooking,

[ade]

A little bit Elfish

October 22nd, 2010

Q: how many different STL implementations are there available on OpenSolaris?
A: at least five by my count: Cstd, stlport4, stlport5, stdcxx4 (either as package from Oracle or from our KDE4-Solaris packages) and g++.
Q: can you mix and match them?
A: only if you like interesting segfaults. Take ‘Hello World’ and compile as follows: CC -lstdcxx4 hello.cc. Ignore the (dire and in this case serious) linker warnings. See program. Run program. Run, program, run! See program dump core right on the new carpet. Bad program. The crash happens inside the static constructors of one of the two libraries as it calls methods in the other one which has a very different idea of how things work. LD_DEBUG=libs might help you spot this. Compare also CC -library=no%Cstd -lstdcxx4 -lCstd versus CC -library=no%Cstd -lCstd -lstdcxx4 (the former crashes, the latter doesn’t, for some reason).
Q: so why is this a problem?
A: plugins make this a problem. If you load a plugin (i.e. dlopen() a shared object) which uses a different C++ STL library from the one you’re already linked to, then you hit the problem described above.
Q: so how can that happen?
A: different parts of an OpenSolaris system are built with different C++ STLs. This is a historical problem; at some point there were plans to slowly migrate all the desktop-relevant stuff to stdcxx4 (that’s the Apache STL, by the way), but it’s a big effort and takes time. One component that is still built against Cstd is Firefox. So all the Firefox plugins that are written in C++, or that use system components in C++, link to Cstd. Like, for instance, libtotem plugins.
Q: so just don’t load them, right?
A: how do you identify “them” then? In Qt 4.7.0 the demo browser (WebKit based) goes and loads all the Firefox plugins, pretty much on startup. And that crashes in a slow and strange way — it dlopen()s a libtotem plugin, loads in libCstd, runs the initializers on libCstd and boom. It’s probably related to the order in which the libraries are loaded. In any case, we don’t know beforehand what libraries a given plugin uses nor can we say what plugins there are.

If you like, you can pretend this was a conversation between Legolas and Haldir and not a run-in to looking at library files in ELF format.

So, having established that plugin loading can kill a Qt application in a gory fashion and that it happens in practice, the KDE4-Solaris team (a kind of Fellowship of the Ring, committed to binding themselves in the darkness that is OSOL) went looking for a solution. With /usr/bin/dump -Lv you can get a listing of dynamic items in an ELF object, including the libraries it’s going to load (e.g. grep for NEEDED). We decided that a likely place to insert a check was in QLibrary::load_sys(), which has some code to search for libraries and also already has some checks in place for strange platforms. So we had in mind something like bool is_library_acceptable(const char *filename) which would do the necessary check. If the library that’s about to be loaded uses Cstd, then don’t load it at all and return failure. That way, we avoid the mysterious crashes caused by STL incompatibility.

When mucking about with ELF files, the obvious resource to use is libelf. However, the libelf available on OpenSolaris is incompatible with largefile support. Why? Because it uses off_t in some structures instead of a 64-bit offset which could accommodate both 32- (non-largefile) and 64-bit (largefile) offsets. It does have the decency to #error out.

I suppose gelf would have been a solution. But thinking about the problem of finding which libraries are NEEDED by a loadable object, I came up with the following algorithm (which applies regardless of which ELF reading approach I’m using):

  • Read the ELF header
  • Find the ELF .dynamic section.
  • Scan the .dynamic section for NEEDED tags, look them up in the .dynstr section.
  • Bail out if libCstd.so is mentioned.

The precise details (sizes of headers, layouts of structures) depend on whether it’s a 32-bit or a 64-bit ELF file, but the algorithm remains the same.
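The first step, deciding between the 32-bit and 64-bit layouts, only needs the first five bytes of the file. A minimal sketch, assuming the standard ELF identification layout (magic bytes 0x7f 'E' 'L' 'F', then the class byte at index 4 with 1 for 32-bit and 2 for 64-bit); the function name elf_bits is made up here and is not from the Qt patch.

```cpp
#include <cassert>
#include <cstddef>

// Inspect the start of an ELF header (the e_ident bytes).
// Returns 32 or 64 for a recognised ELF class, 0 for anything else.
int elf_bits(const unsigned char *ident, std::size_t len)
{
    if (len < 5)
        return 0;                       // too short to hold magic + class
    if (ident[0] != 0x7f || ident[1] != 'E' ||
        ident[2] != 'L' || ident[3] != 'F')
        return 0;                       // not an ELF file at all
    switch (ident[4]) {                 // EI_CLASS
    case 1:  return 32;                 // ELFCLASS32
    case 2:  return 64;                 // ELFCLASS64
    default: return 0;                  // ELFCLASSNONE or garbage
    }
}
```

A caller would read the first few bytes of the candidate plugin into a buffer and treat a 0 result as "don't touch this file".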

I initially coded this in C, with macros. (Bear with me here — I realize now that I tend to iterate over my own code as I understand the problem and solution space better, so I often start with something horrible that gets improved later). Then the code ended up looking like this:

#define ALGORITHM_STEP(d0,d1) \
  /* a step using values from d0 and d1 */
if (is32) { 32bit_t d0,d1; ALGORITHM_STEP(d0,d1); }
else { 64bit_t d0,d1; ALGORITHM_STEP(d0,d1); }

Ugh. From there I went to C++ and template functions. Each macro became a template, but I could also put the types in there, albeit a little awkwardly:

template <typename t0, typename t1> void ALGORITHM_STEP()
{ t0 d0; t1 d1; /* a step using values from d0 and d1 */ }
if (is32) { ALGORITHM_STEP<32bit_t0,32bit_t1>(); }
else { ALGORITHM_STEP<64bit_t0, 64bit_t1>(); }

Feel free to go “Ugh” again, especially if you’re an experienced C++ template hacker. Oddly enough, I haven’t written much in the way of neat template code, just usually debugged what others have done. So by now my program was a bunch of template functions and then a main function with a sequence of if (is32) {} statements to string together those template functions. I could actually collapse it down to just one if() statement, but then the template would have 8 or 9 parameters and it’d be even more ugly than it already is.

From here I remembered non-type template parameters and specializations. It came down to this: define a template class with an integer parameter that can be specialized to hold the 32- or 64-bit types that I need. This looked like so:

template <int> struct elftype;
template<> struct elftype<32> {
  typedef 32bit_t0 t0; typedef 32bit_t1 t1; } ;
template<> struct elftype<64> {
  typedef 64bit_t0 t0; typedef 64bit_t1 t1; } ;

Not much change in the program itself, except that instead of writing 32bit_t0 I’d write elftype<32>::t0. But this specialization approach applies to the functions themselves, too, and they became:

template<int n> void ALGORITHM_STEP() {
  typename elftype<n>::t0 d0; typename elftype<n>::t1 d1;
  /* actual algorithm step */ }

Now it makes sense to collapse the whole thing into a single function parameterized by 32- or 64-bits, and then the main function I was looking for becomes something like this:

/* read ELF header */
if (is32) { return is_library_acceptable<32>(); }
else { return is_library_acceptable<64>(); }

The algorithm needs to be written only once, it’s specialized for the two cases, the types are neatly squirreled away in their respective class templates. Although functionally it’s no different from the original C version, I find it much more satisfying — for one thing, it’s extensible in a way that wouldn’t be easy with macros (say if there’s a 48-bit ELF format).
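Filled in with real types, the pattern might look like the sketch below. It is an illustration under assumptions, not the code that went into the Qt package: dyn_entry, check_plugin and the tag_t/val_t typedef names are invented, the entry layout mirrors the standard .dynamic entry (a signed tag followed by an unsigned value of the same width, with DT_NULL = 0 ending the table and DT_NEEDED = 1 naming a library by offset into .dynstr), and the caller is assumed to have located both sections already.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// The per-width types, squirreled away in specializations.
template <int bits> struct elftype;
template <> struct elftype<32> { typedef std::int32_t tag_t; typedef std::uint32_t val_t; };
template <> struct elftype<64> { typedef std::int64_t tag_t; typedef std::uint64_t val_t; };

// One .dynamic entry: tag plus value, both of the width given by 'bits'.
template <int bits> struct dyn_entry {
    typename elftype<bits>::tag_t tag;
    typename elftype<bits>::val_t val;
};

const int TAG_NULL = 0;    // DT_NULL: end of the .dynamic table
const int TAG_NEEDED = 1;  // DT_NEEDED: val is an offset into .dynstr

// The algorithm, written once. Returns false if libCstd.so is NEEDED
// or if an offset points outside the string table.
template <int bits>
bool is_library_acceptable(const dyn_entry<bits> *dynamic,
                           const char *dynstr, std::size_t dynstr_size)
{
    for (; dynamic->tag != TAG_NULL; ++dynamic) {
        if (dynamic->tag != TAG_NEEDED)
            continue;
        if (dynamic->val >= dynstr_size)
            return false;               // malformed: refuse to load
        // Prefix match, so "libCstd.so.1" is caught as well.
        if (std::strncmp(dynstr + dynamic->val, "libCstd.so", 10) == 0)
            return false;
    }
    return true;
}

// The single if() that picks the instantiation.
bool check_plugin(bool is32, const void *dynamic,
                  const char *dynstr, std::size_t dynstr_size)
{
    if (is32)
        return is_library_acceptable<32>(
            static_cast<const dyn_entry<32> *>(dynamic), dynstr, dynstr_size);
    return is_library_acceptable<64>(
        static_cast<const dyn_entry<64> *>(dynamic), dynstr, dynstr_size);
}
```

The string table is assumed to be NUL-terminated, as .dynstr is in a well-formed file; a hardened version would bound the comparison by dynstr_size as well.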

And the original purpose, of checking whether a plugin is usable before loading it? We’ve got that patched in for the next package release of Qt 4.7.0 on OpenSolaris, and the demo browser won’t crash anymore (but it won’t play whatever libtotem supports, either).