Adding Multicore Support for a MIPS Board to the Fiasco Microkernel
Monday, December 1st, 2025
I thought that before I forget, I should try and write down some of the different things I have tackled recently with regard to L4Re and the different frameworks and libraries I have been developing. There have been many pauses in this work this year, and it just hasn’t been a priority to record various achievements, particularly since the effort is generally ongoing.
Multicore MIPS
As established members of my readership may recall, my entry point into L4Re was initially focused on establishing support for MIPS-based single-board computers and devices. Despite developing support for many of the peripherals in the Ingenic JZ4780 used by the MIPS Creator CI20, one thing I never did was to enable dual-core processing in L4Re and the Fiasco microkernel in particular. I may have thought that such support was already present, but, well, that was an unreasonable belief.
This year, I was indulged with a board based on a later SoC that also has two processing cores. It supposedly has two threads per core as well, but I don’t actually believe it based on the way the unit in the chip responsible for core management behaves. In the almost illicit documentation I have, there is also no real mention of scheduling threads in hardware, so then I am left to guess whether these threads behave like cores. And whatever hardware interface is provided does not seem to be the MIPS MT implementation, either, since it is reported as not being present.
These days, one has to rely on archived copies of MIPS documentation after MIPS Technologies threw all of that overboard to bet the farm on RISC-V, which is admittedly a reasonable bet. Former owner, Imagination Technologies, disowned the architecture and purged any documentation they might have had from their Web site, although I think that any links to toolchains might still work, but that is perhaps prudent on the basis of upholding any Free Software licence obligations that remain.
(Weirdly, MIPS Technologies recently got themselves acquired by GlobalFoundries, meaning that they are now owned by the part of AMD that was sold off when AMD decided it was too costly to fabricate their own chips, becoming “fabless” instead, just as MIPS had always been.)
I also wonder what the benefit of simultaneous multithreading (SMT) is on MIPS over plain old symmetric multiprocessing (SMP) using multiple cores. Conceptually, SMT is meant to use fewer resources than SMP by sharing common resources between execution units, eliminating much of the costly functionality needed by dedicated cores. But MIPS is different from other architectures in that it does not, for example, maintain page tables in hardware for processes/tasks, which are the sort of costly things that one might want to see eliminated.
Instead, MIPS employs a translation lookaside buffer (TLB) to handle virtual memory mappings, and each “virtual processing element” (VPE) is apparently provided with an independent TLB in the MIPS MT implementation. It seems we can think about a VPE more or less as a conventional processor or core given the general description of it. However, each “thread context” (TC) within a VPE may share a TLB with other TCs, although they will have their own address space identifier (ASID), meaning that their own memory mappings may differ from those of other threads in the same process or task. Given that the ASID would typically be used to define independent address spaces at a process or task level, this seems like an odd decision. One can be forgiven for being confused!
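To make the ASID point concrete, here is a toy Python model of a TLB shared by two thread contexts. It is only a sketch: the class names, page numbers and frame numbers are invented for illustration, and it models nothing beyond the ASID tag and the global flag that MIPS TLB entries carry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TlbEntry:
    vpn: int                 # virtual page number
    asid: int                # address space identifier tagging the entry
    pfn: int                 # physical frame number
    is_global: bool = False  # global entries match regardless of ASID


class SharedTlb:
    """Toy TLB shared by several thread contexts (hypothetical model)."""

    def __init__(self) -> None:
        self.entries = []

    def insert(self, entry: TlbEntry) -> None:
        self.entries.append(entry)

    def lookup(self, vpn: int, asid: int) -> Optional[int]:
        # An entry matches when the page matches and the entry is either
        # global or tagged with the ASID of the thread context asking.
        for entry in self.entries:
            if entry.vpn == vpn and (entry.is_global or entry.asid == asid):
                return entry.pfn
        return None  # real hardware would raise a TLB refill exception


# Two thread contexts, same virtual page, different ASIDs (made-up values).
tlb = SharedTlb()
tlb.insert(TlbEntry(vpn=0x400, asid=1, pfn=0x1000))
tlb.insert(TlbEntry(vpn=0x400, asid=2, pfn=0x2000))
```

With these entries, a lookup of page 0x400 resolves to a different frame depending on the ASID presented, which is how two TCs sharing one TLB could still see different mappings.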
In any case, I needed to familiarise myself with the documentation and with work previously done in the Linux kernel. That kernel work, it turned out, was a work of bravery seemingly based on the incomplete knowledge that we still rely on today. Unfortunately, it seems that certain details are incorrect, these pertaining to the interrupt request management and the arrangement of the appropriate hardware registers. The Linux support appeared to use a bank of registers that, instead of applying to interrupts directed at the second core in particular, seems to be some kind of global control for interrupts on both cores. Sadly, the contributor of this code is no longer communicative, and I just hope that he is well and finding satisfaction in whatever he does now.
Into Fiasco
The very first thing I had to do in Fiasco, however, was to get it working on this SoC. This was rather frustrating, and in the end, the problem was in some cache initialisation code. Because of course it was. Then, my efforts within Fiasco to support multiple cores were informed by existing support in the microkernel for initialising additional processors. Reacquainting myself with the kernel bootstrap process, I found that the architecture-specific entry point is the bootstrap_arch method, provided by the kernel thread implementation. This invokes the boot_all_secondary_cpus method, provided by the platform control abstraction.
Although there is support for the MIPS Concurrency Manager (CM) present in Fiasco, it is not useful for this particular SoC, so a certain amount of dedicated support code was going to be required. Indeed, the existing mechanisms were defined to use the MIPS CM to initialise these “secondary CPUs” in the generic platform control implementation. Fortunately, the C++ framework employed by Fiasco permits the overriding of operations, and I was able to fairly cleanly provide my own board/product-specific implementation that would use the appropriate functionality to be written by myself.
That additional functionality was support for the SoC’s own Core Control Unit (CCU), which is something that appears to be thankfully much simpler than the MIPS CM. The CCU provides facilities to start cores, to monitor and control interrupts, and to support inter-core communication using mailboxes. Of particular interest was the ability to start cores, to permit the cores to communicate via the mailboxes, and for such communication to generate mailbox interrupts. For the most part, the abstraction supporting the CCU is fairly simple, however.
Interrupt Handling
Perhaps the most challenging part of the exercise was that of describing the interrupt handling configuration. Although I was familiar with the code that connects the processor-level interrupts to the different peripheral interrupt handlers, it was not clear to me how these handlers would be allocated for, and assigned to, each processor or core. Indeed, it was not even clear how the interrupts behaved in the hardware.
I suppose I could have a long rant about the hardware documentation. Having already needed to dig up an address for the CCU, I noticed that the addresses for the interrupt controller in the manual for the chip were simply fictitious and very possibly originating in a copy-paste operation, given that the register banks conflicted with the clock and power management unit. More digging eventually revealed the actual location of these banks. One helpful aspect of the manual, however, was the information implicitly given about the spacing of these register banks, even though I think the number of banks is also a fiction, bound up with the issue of how many cores and/or threads the chip actually has.
The way that the chip appears to work is that each core can enable and mask (ignore) individual interrupts. It does so using its own coprocessor registers at the MIPS architecture level, but to identify and control the individual interrupt sources, it uses registers provided in the appropriate bank by the interrupt controller. In a single-core processor, there is only one set of registers, and the single core can switch them on and off for its own benefit. But with multiple cores, each core can apparently choose to receive or ignore interrupts, leaving the others to decide for themselves. And if we ignore the top-level control, we might even allow one core to set the preferences for itself and other cores, since it can access all of the register banks for the different cores in the interrupt controller.
Now, the fundamental interrupt handling for this family of chips has been consistent throughout my exposure to L4Re and Fiasco, with a specialisation of Irq_chip_gen providing access to the interrupt controller registers. Since this abstraction for the interrupt controller unit works and is still applicable, instead of trying to make it juggle multiple register banks, I decided to wrap it in another abstraction that would allow interrupts to be associated with a specific core or with all cores, replicating the same fundamental interface, and that would redirect operations to the individual core-specific units according to the association made for a given interrupt.
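The wrapping arrangement can be pictured with a small Python model. This is not Fiasco code and does not reproduce the Irq_chip_gen interface: the class names, the set_cpu method, and the choice of core 0 as the default target are all assumptions made for illustration.

```python
ALL_CORES = None  # hypothetical marker meaning "associate with every core"


class CoreIrqUnit:
    """Stand-in for the existing abstraction over one core's register bank."""

    def __init__(self) -> None:
        self.unmasked = set()

    def mask(self, irq: int) -> None:
        self.unmasked.discard(irq)

    def unmask(self, irq: int) -> None:
        self.unmasked.add(irq)


class MulticoreIrqChip:
    """Offers the same mask/unmask interface, redirecting per interrupt."""

    def __init__(self, num_cores: int) -> None:
        self.units = [CoreIrqUnit() for _ in range(num_cores)]
        self.target = {}  # irq -> core index, or ALL_CORES

    def set_cpu(self, irq: int, core) -> None:
        # Mask everywhere first so that a retargeted interrupt does not
        # remain enabled on its previous core; the caller unmasks again.
        for unit in self.units:
            unit.mask(irq)
        self.target[irq] = core

    def _units_for(self, irq: int):
        core = self.target.get(irq, 0)  # assumption: default to core 0
        return self.units if core is ALL_CORES else [self.units[core]]

    def mask(self, irq: int) -> None:
        for unit in self._units_for(irq):
            unit.mask(irq)

    def unmask(self, irq: int) -> None:
        for unit in self._units_for(irq):
            unit.unmask(irq)
```

The point of the design is that the per-core unit stays unchanged and unaware of other cores, while the wrapper alone holds the association between an interrupt and its target.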
IPIs
In my exploration of the interrupt handling code, I repeatedly encountered the acronym IPI, which turns out to mean inter-processor interrupt, where one core may raise an interrupt in another core. Although it was apparent that the CCU’s mailbox facilities would be the vehicle to support such interrupts, it was not immediately obvious how these interrupts might be used within Fiasco. In fact, it was while trying to operate the kernel debugger, JDB, that I discovered one of their applications: the debugger needs to halt secondary cores or processors to be able to safely inspect the system.
Thus, I attempted to provide an IPI implementation by following the general pattern of implementations for other platforms, such as the RISC-V support within Fiasco, relying on that architecture’s closer heritage to MIPS than that of, say, ARM or x86(-64), and generally hoping that the supporting code would provide more helpful clues. Sadly, there is not much formal guidance on such matters in the form of documentation or other explanatory materials for Fiasco, at least as far as I have discovered.
One aspect of my implementation that I suspect could be improved involves the representation of the different IPI conditions, along with usage of the atomic_reset operation in the generic MIPS IPI support. This employs an array to hold the different interrupt conditions occurring for a core, rather like a big status register occupying numerous words instead of bits, with atomic_reset obtaining the appropriate interrupt status and clearing any signalled condition.
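As a rough Python model of that status array: each core has one word per IPI condition, and a reset operation returns the old value while clearing it. The condition names and the class are hypothetical, and a lock stands in for whatever atomic exchange the real atomic_reset uses.

```python
import threading
from enum import IntEnum


class Ipi(IntEnum):
    # Illustrative condition names, not taken from the Fiasco sources.
    REQUEST = 0
    GLOBAL_REQUEST = 1
    DEBUG = 2


class IpiStatus:
    """One word per condition per core, like a very wide status register."""

    def __init__(self, num_cores: int) -> None:
        self._lock = threading.Lock()  # stands in for an atomic exchange
        self._pending = [[0] * len(Ipi) for _ in range(num_cores)]

    def send(self, core: int, condition: Ipi) -> None:
        with self._lock:
            self._pending[core][condition] = 1
        # Real code would now raise the interrupt on the target core.

    def atomic_reset(self, core: int, condition: Ipi) -> int:
        # Obtain the status and clear it in one indivisible step.
        with self._lock:
            old = self._pending[core][condition]
            self._pending[core][condition] = 0
        return old
```

A handler on the interrupted core would call atomic_reset for each condition in turn and act on those that return a non-zero value.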
Given that the CCU is able to maintain core-specific state in the mailbox registers, one might envisage a core setting bits in such registers to signal IPI conditions, with the interrupted core clearing these individually, doing so safely by using the locking mechanism provided by the CCU. However, since merely augmenting the existing IPI status management with the interrupt delivery and signalling mechanisms of the CCU seemed to result in a functioning system, I did not feel particularly motivated to continue.
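The mailbox-based alternative might look something like the following Python sketch, with one bit per condition in a single per-core word. Everything here is hypothetical, including the class and method names, and a Python lock merely stands in for the CCU’s locking mechanism.

```python
import threading


class Mailbox:
    """Toy per-core mailbox word holding one bit per IPI condition."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # stand-in for the CCU's lock
        self.value = 0

    def signal(self, bit: int) -> None:
        # Sender side: set the bit for the condition being signalled.
        with self._lock:
            self.value |= 1 << bit

    def take(self, bit: int) -> bool:
        # Receiver side: test and clear one condition, leaving others intact.
        with self._lock:
            was_set = bool(self.value & (1 << bit))
            self.value &= ~(1 << bit)
        return was_set
```

Compared with the array of words, this keeps all conditions for a core in one register-sized value, at the cost of needing the lock around each read-modify-write.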
Bad Bits
It should be mentioned that throughout all of this effort, I encountered rather severe problems with the UART on the development board concerned. These would manifest themselves as garbled output from the board along with a lack of responsiveness or faulty interpretation of input. Characters in the usual ASCII range would be punctuated and eventually overrun by “special” characters, some non-printable, and efforts to continue the session would typically need to be abandoned. This obviously did not help troubleshooting of either my boot payloads or the kernel debugger.
Some analysis of the problem was required, and in an attempt to understand why certain wholly normal characters would be transformed to abnormal ones, I wrote out some character values in their transmitted forms, also incorporating extra elements of the transmission:
0x0d is 00001101
- repeated:         0000110100001101
- stop bit exposed: 000011011000011011000011011000011011
0xc3 is 11000011           -------- -------- --------
In trying to reproduce the observed character values, I looked for ways in which the bitstream would be misinterpreted and yield these erroneous characters. This exercise would repeatedly succeed, suggesting some kind of slippage in the acquisition of characters. It turned out that this was an anticipated problem, and the omission of appropriate level shifters for the UART pins meant that the signalling was effectively unreliable. A fix was introduced on subsequent board revisions, and in fact, the board was more generally refined and completed, since I had been using what was effectively a prototype.
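The search for misaligned characters can be sketched in a few lines of Python. This follows the notation of the listing above rather than the exact signalling on the wire: bits are written as in the listing, each character is followed by a single stop bit, and start bits are left out. The function names are invented for this sketch.

```python
def frame_bits(value: int) -> str:
    """Eight data bits as written in the listing, then one stop bit."""
    return format(value, "08b") + "1"


def misaligned_offsets(sent: int, seen: int, repeats: int = 4):
    """Offsets in a stream of repeated `sent` frames where `seen` appears."""
    stream = frame_bits(sent) * repeats
    target = format(seen, "08b")
    # Slide an eight-bit window along the stream, one bit at a time.
    return [i for i in range(len(stream) - 7) if stream[i:i + 8] == target]


print(misaligned_offsets(0x0D, 0xC3))
```

For four repeated 0x0d characters, the pattern for 0xc3 appears at bit offsets 7, 16 and 25, which are the three positions underlined in the listing.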
General Remarks
A degree of familiarisation was required with the mechanisms in Fiasco for some common activities. Certain constructs exist such as per-CPU allocators that need to be used where resources are to be allocated for, and assigned to, individual CPUs. These allocator constructs provide a convenience operation allowing the “current” CPU, being the one running the code, to obtain its own resource. Although the details now elude me, there were some frustrations in deploying these constructs under certain circumstances, but I seemed to figure something out that got me to where I wanted to be in the end.
I also wanted to avoid changing existing Fiasco code, instead augmenting it with the necessary specialisations for this hardware. Apart from the existing cache initialisation routines, this was largely successful. In principle, I now have some largely clean patches that I could potentially submit upstream, especially since they have now stopped insisting on contributor licence agreements. I do wonder if they are still interested in this family of SoCs, however.
I might also stick my neck out at this point and note that if anyone is interested in development boards with MIPS-based processors and a degree of compatibility with Raspberry Pi HATs, they might get in contact with me, so that their interest can be recorded and the likelihood increased of such boards being produced for purchase.