Paul Boddie's Free Software-related blog


Archive for December, 2024

Recent Progress

Monday, December 2nd, 2024

The last few months have not always been entirely conducive to making significant progress with various projects, particularly my ongoing investigations and experiments with L4Re, but I did manage to reacquaint myself with my previous efforts sufficiently to finally make some headway in November. This article tries to retrieve some of the more significant accomplishments, modest as they might be, to give an impression of how such work is undertaken.

Previously, I had managed to get my software to do somewhat useful things on MIPS-based single-board computer hardware, showing graphical content on a small screen. Various problems had arisen with regard to one revision of a single-board computer for which the screen was originally intended, causing me to shift my focus to more general system functionality within L4Re. With the arrival of the next revision of the board, I leveraged this general functionality, combining it with support for memory cards, to get my minimalist system to operate on the board itself. I rather surprised myself getting this working, it must be said.

Returning to the activity at the start of November, there were still some matters to be resolved. In parallel to my efforts with L4Re, I had been trying to troubleshoot the board’s operation under Linux. Linux is, in general, a topic upon which I do not wish to waste my words. However, with the newer board revision, I had also acquired another, larger, screen and had been investigating its operation, and there were performance-related issues experienced under Linux that needed to be verified under other conditions. This is where a separate software environment can be very useful.

Plugging a Leak

Before turning my attention to the larger screen, I had been running a form of stress test with the smaller screen, updating it intensively while also performing read operations from the memory card. What this demonstrated was that there were no obvious bandwidth issues with regard to data transfers occurring concurrently. Translating this discovery back to Linux remains an ongoing exercise, unfortunately. But another problem arose within my own software environment: after a while, the filesystem server would run out of memory. I felt that this problem now needed to be confronted.

Since I tend to make such problems for myself, I suspected a memory leak in some of my code, despite trying to be methodical in the way that allocated objects are handled. I considered various tools that might localise this particular leak, with AddressSanitizer and LeakSanitizer being potentially useful, merely requiring recompilation and being available for a wide selection of architectures as part of GCC. I also sought to demonstrate the problem in a virtual environment, this simply involving appropriate test programs running under QEMU. Unfortunately, the sanitizer functionality could not be linked into my binaries, at least with the Debian toolchains that I am using.

Eventually, I resolved to use simpler techniques. Wondering if the memory allocator might be fragmenting memory, I introduced a call to malloc_stats, just to get an impression of the state of the heap. After failing to gain much insight into the problem, I rolled up my sleeves and decided to just look through my code for anything I might have done with regard to allocating memory, just to see if I had overlooked anything as I sought to assemble a working system from its numerous pieces.

Sure enough, I had introduced an allocation for “convenience” in one kind of object, making a pool of memory available to that object if no specific pool had been presented to it. The memory pool itself would release its own memory upon disposal, but in focusing on getting everything working, I had neglected to introduce the corresponding top-level disposal operation. With this remedied, my stress test was now able to run seemingly indefinitely.

Separating Displays and Devices

I would return to my generic system support later, but the need to exercise the larger screen led me to consider the way I had previously introduced support for screens and displays. The smaller screen employs SPI as the communications mechanism between the SoC and the display controller, as does the larger screen, and I had implemented support for the smaller screen as a library combining the necessary initialisation and pixel data transfer code with code that would directly access the SPI peripheral using a SoC-specific library.

Clearly, this functionality needed to be separated into two distinct parts: the code retaining the details of initialising and operating the display via its controller, and the code performing the SPI communication for a specific SoC. Not doing this could require us to needlessly build multiple variants of the display driver for different SoCs or platforms, when in principle we should only need one display driver with knowledge of the controller and its peculiarities, this then being combined using interprocess communication with a single, SoC-specific driver for the communications.

A few years ago now, I had in fact implemented a “server” in L4Re to perform short SPI transfers on the Ben NanoNote, this to control the display backlight. It became appropriate to enhance this functionality to allow programs to make longer transfers using data held in shared memory, all of this occurring without those programs having privileged access to the underlying SPI peripheral in the SoC. Alongside the SPI server appropriate for the Ben NanoNote’s SoC, servers would be built for other SoCs, and only the appropriate one would be started on a given hardware device. This would then mediate access to the SPI peripheral, accepting requests from client programs within the established L4Re software architecture.

One important element in the enhanced SPI server functionality is the provision of shared memory that can be used for DMA transfers. Fortunately, this is mostly a matter of using the appropriate settings when requesting memory within L4Re, even though the mechanism has been made somewhat more complicated in recent times. It was also fortunate that I previously needed to consider such matters when implementing memory card support, saving me time in considering them now. The result is that a client program should be able to write into a memory region and the SPI server should be able to send the written data directly to the display controller without any need for additional copying.

Complementing the enhanced SPI servers are framebuffer components that use these servers to configure each kind of display, each providing an interface to their own client programs which, in turn, access the display and provide visual content. The smaller screen uses an ST7789 controller and is therefore supported by one kind of framebuffer component, whereas the larger screen uses an ILI9486 controller and has its own kind of component. In principle, the display controller support could be organised so that common code is reused and that support for additional controllers would only need specialisations to that generic code. Both of these controllers seem to implement the MIPI DBI specifications.

The particular display board housing the larger screen presented some additional difficulties, being very peculiarly designed to present what would seem to be an SPI interface to the hardware interfacing to the board, but where the ILI9486 controller’s parallel interface is apparently used on the board itself, with some shift registers and logic faking the serial interface to the outside world. This complicates the communications, requiring 16-bit values to be sent where 8-bit values would be used in genuine SPI command traffic.

The motivation for this weird design is presumably that of squeezing a little extra performance out of the controller that is only available when transferring pixel data via the parallel interface, especially desired by those making low-cost retrogaming systems with the Raspberry Pi. Various additional tweaks were needed to make the ILI9486 happy, such as an explicit reset pulse, with this being incorporated into my simplistic display component framework. Much more work is required in this area, and I hope to contemplate such matters in the not-too-distant future.

Discoveries and Remedies

Further testing brought some other issues to the fore. With one of the single-board computers, I had been using a microSD card with a capacity of about half a gigabyte, which would make it a traditional SD or SDSC (standard capacity) card, at least according to the broader SD card specifications. With another board, I had been using a card with a sixteen gigabyte capacity or thereabouts, aligning it with the SDHC (high capacity) format.

Starting to exercise my code a bit more on this larger card exposed memory mapping issues when accessing the card as a single region: on the 32-bit MIPS architecture used by the SoC, a pointer simply cannot address this entire region, and thus some pointer arithmetic occurred that had undesirable consequences. Constraining the size of mapped regions seemed like the easiest way of fixing this problem, at least for now.

More sustained testing revealed a couple of concurrency issues. One involved a path of invocation via a method testing for access to filesystem objects where I had overlooked that the method, deliberately omitting usage of a mutex, could be called from another component and thus circumvent the concurrency measures already in place. I may well have refactored components at some point, forgetting about this particular possibility.

Another issue was an oversight in the way an object providing access to file content releases its memory pages for other objects to use before terminating, part of the demand paging framework that has been developed. I had managed to overlook a window between two operations where an object seeking to acquire a page from the terminating object might obtain exclusive access to a page, but upon attempting to notify the terminating object, find that it has since been deallocated. This caused memory access errors.

Strangely, I had previously noticed one side of this potential situation in the terminating object, even writing up some commentary in the code, but I had failed to consider the other side of it lurking between those two operations. Building in the missing support involved getting the terminating object to wait for its counterparts, so that they may notify it about pages they were in the process of removing from its control. Hopefully, this resolves the problem, but perhaps the lesson is that if something anomalous is occurring, exhibiting certain unexpected effects, the cause should not be ignored or assumed to be harmless.

All of this proves to be quite demanding work, having to consider many aspects of a system at a variety of levels and across a breadth of components. Nevertheless, modest progress continues to be made, even if it is entirely on my own initiative. Hopefully, it remains of interest to a few of my readers, too.