Paul Boddie's Free Software-related blog


Archive for the ‘NanoNote’ Category

Porting L4Re and Fiasco.OC to the Ben NanoNote (Part 4)

Saturday, March 24th, 2018

As described previously, having hopefully done enough to modify the kernel – Fiasco.OC – for the Ben NanoNote, it then became necessary to investigate the bootstrap package that is responsible for setting up the hardware and starting the kernel.  This package resides in the L4Re distribution, which is technically a separate thing, even though both L4Re and Fiasco.OC reside in the same published repository structure.

Before continuing into the details, it is worth noting which things need to be retrieved from the L4Re section of the repository in order to avoid frustration later on with package dependencies. I had previously discovered that the following package installation operation would be required (from inside the l4 directory):

svn update pkg/acpica pkg/bootstrap pkg/cxx_thread pkg/drivers pkg/drivers-frst pkg/examples \
           pkg/fb-drv pkg/hello pkg/input pkg/io pkg/l4re-core pkg/libedid pkg/libevent \
           pkg/libgomp pkg/libirq pkg/libvcpu pkg/loader pkg/log pkg/mag pkg/mag-gfx pkg/x86emu

With the listed packages available, it should be possible to build the examples that will eventually interest us. Some of these appear superfluous – x86emu, for instance – but some of the more obviously-essential packages have dependencies on these other packages, and so we cannot rely on our intuition alone!

Also needed when building a payload is some path definitions in the l4/conf/Makeconf.boot file. Here is what I used:

MODULE_SEARCH_PATH += $(L4DIR_ABS)/../kernel/fiasco/mybuild
MODULE_SEARCH_PATH += $(L4DIR_ABS)/conf/examples
MODULE_SEARCH_PATH += $(L4DIR_ABS)/pkg/io/io/config
BOOTSTRAP_SEARCH_PATH  = $(L4DIR_ABS)/conf/examples
BOOTSTRAP_SEARCH_PATH += $(L4DIR_ABS)/../kernel/fiasco/mybuild
BOOTSTRAP_SEARCH_PATH += $(L4DIR_ABS)/pkg/io/io/config
BOOTSTRAP_MODULES_LIST = $(L4DIR_ABS)/conf/modules.list

This assumes that the build directory used when building the kernel is called mybuild. The Makefile will try and copy the kernel into the final image to be deployed and so needs to know where to find it.

Describing the Ben (Again)

Just as we saw with the kernel, there is a need to describe the Ben and to audit the code to make sure that it stands a chance of working on the Ben. This is done slightly differently in L4Re but the general form of the activity is similar, defining the following:

  • An architecture version (MIPS32r1) for the JZ4720 (in l4/mk/arch/Kconfig.mips.inc)
  • A platform configuration for the Ben (in l4/mk/platforms)
  • Some platform details in the bootstrap package (in l4/pkg/bootstrap/server/src)
  • Some hardware details related to memory and interrupts (in l4/pkg/io/io/config/plat-qi_lb60)

For the first of these, I introduced a configuration setting (CPU_MIPS_32R1) to allow us to distinguish between the Ben’s SoC (JZ4720) and other processors, just as I did in the kernel code. With this done, the familiar task of hunting down problematic assembly language instructions can begin, and these can be divided into the following categories:

  • Those that can be rewritten using other instructions that are available to us
  • Those that must be “trapped” and handled by the kernel

Candidates for the former category include all unprivileged instructions that the JZ4720 doesn’t support, such as ext and ins. Where privileged instructions or ones that “bridge” privileges in some way are used, we can still rewrite them if they appear in the bootstrap code, since this code is also running in privileged mode. Here is an example of such privileged instruction rewriting (from l4/pkg/bootstrap/server/src/ARCH-mips/crt0.S):

#if defined(CONFIG_CPU_MIPS_32R1)
       cache   0x01, 0($a0)     # Index_Writeback_Inv_D
       nop
       cache   0x08, 0($a0)     # Index_Store_Tag_I
#else
       synci   0($a0)
#endif

Candidates for the latter category include all awkward privileged or privilege-escalating instructions outside the bootstrap package. Fortunately, though, we don’t need to worry about them very much at all. Since the kernel will be obliged to trap them, we can just keep them where they are and concede that there is nothing else we can do with them.

However, there is one pitfall: these preserved-but-unsupported instructions will upset the compiler! Consider the use of the now overly-familiar rdhwr instruction. If it is mentioned in an assembly language statement, the compiler will notice that amongst its clean MIPS32r1-compliant output, something is inserting an unrecognised instruction, yielding that error we saw earlier:

Error: opcode not supported on this processor: mips32 (mips32)

But we do know what we’re doing! So how can we persuade the compiler? The solution is to override what the compiler (or assembler) thinks it should be producing by introducing a suitable directive as in the following example (from l4/pkg/l4re-core/l4sys/include/ARCH-mips/cache.h):

  asm volatile (
    ".set push\n"
    ".set mips32r2\n"
    "rdhwr %0, $1\n"
    ".set pop"
    : "=r"(step));

Here, with the .set directives, we switch this little region of code to MIPS32r2 compliance and emit our forbidden instruction into the output. Since the kernel will take care of it in the end, the compiler shouldn’t be made to feel that it has to defend us against it.

In L4Re, there are also issues experienced with the CI20 that will also affect the Ben, such as an awkward and seemingly compiler-related issue affecting the way programs are started. In this regard, I just kept my existing patches for these things applied.

My other platform-related adjustments for the Ben have mostly borrowed from support for the CI20 where it existed. For instance, the bootstrap package’s definition for the Ben (in l4/pkg/bootstrap/server/src/platform/qi_lb60.cc) just takes the CI20 equivalent, eliminates superfluous features, modifies details that are different between the SoCs, and changes identifiers. The general definition for the Ben (in l4/mk/platforms/qi_lb60.conf) merely acknowledges differences in some basic platform details.

The CI20 was not supported with a hardware definition describing memory regions and interrupts used by the io package. Taking other devices as inspiration, I consulted the device documentation and wrote a definition when experimenting with the CI20. For the Ben, the form of this definition (in l4/pkg/io/io/config/plat-qi_lb60/hw_devices.io) remains similar but is obviously adjusted for the SoC differences.

Device Drivers and Output

One topic that I have not really mentioned at all is that pertaining to device drivers. I would not have even started this work if I didn’t feel there was a chance of seeing some signs of success from the Ben. Although the Ben, like the CI20, has the capability of exposing a serial console to the outside world, meaning that it can emit messages via a cable to another computer and receive input from that computer, unlike the CI20, its serial console pins are not particularly convenient to use: they really require wires to be soldered to some tiny pads that are found in the battery compartment of the device.

Now, my soldering skills are not very good, and I also want to be able to put the battery back into the device in future. I did try and experiment by holding wires against the pads, this working once or twice by showing output when booting the Ben into its more typical Linux-based environment. But such experiments proved to be unsustainable and rather uncomfortable, needing some kind of “guitar grip” while juggling cables and holding down buttons. So I quickly knew that I would need to get output from the Ben in other ways.

Having deployed low-level payloads to the Ben before, I knew something about the framebuffer, so I had some confidence about initialising it and getting something on the screen that might tell me what has or hasn’t happened. And I adapted my code from this previous effort, itself being derived from driver code written by the people responsible for the Ben, wrapping it up for L4Re. I tried to keep this code minimally different from its previous incarnation, meaning that I could eliminate certain kinds of mistakes in case the code didn’t manage to do its job. With this in place, I felt that I could now consider trying out my efforts and seeing what, if anything, might happen.

Attempting to Bootstrap

Being in the now-familiar position of believing that enough has been done to make the software run, I now considered an attempt at actually bootstrapping the kernel. It may sound naive, but I almost expected to be able to compile everything – the kernel, L4Re, my drivers – and for them all to work together in harmony and produce at least something on the display. But instead, after “Starting kernel …”, nothing happened.

The Ben NanoNote trying to boot a payload from the memory card

The Ben NanoNote trying to boot a payload from the memory card

It should be said that in these kinds of exercises, just one source of failure need present itself and the outcome is, of course, failure. And I can confirm that there were many sources of failure at this point. The challenges, then, are to identify all of these and then to eliminate them all. But how can you even know what all of these sources of failure actually are? It seemed disheartening, but then there are two kinds of strategy that can be employed: to investigate areas likely to be causing problems, and to take every opportunity to persuade the device to tell us what is happening. And with this, the debugging would begin.

Porting L4Re and Fiasco.OC to the Ben NanoNote (Part 3)

Friday, March 23rd, 2018

So far, in this exercise of porting L4Re and Fiasco.OC to the Ben NanoNote, we have toured certain parts of the kernel, made adjustments for the compiler to generate suitable code, and added some descriptions of the device itself. But, as we saw, the Ben needs some additional changes to be made to the software in places where certain instructions are used that it doesn’t support. Attempting to compile the kernel will most likely end with an error if we ignore such matters, because although the C and C++ code will produce acceptable instructions, upon encountering an assembly language statement containing an unacceptable instruction, the compiler will probably report something like this:

Error: opcode not supported on this processor: mips32 (mips32)

So, we find ourselves in a situation where the compiler is doing the right thing for the code it is generating, but it also notices when the programmer has chosen to do what is now the wrong thing. We must therefore track down these instructions and offer a supported alternative. Previously, we introduced a special configuration setting that might be used to indicate to the compiler when to choose these alternative sequences of instructions: CPU_MIPS32_R1. This gets expanded to CONFIG_CPU_MIPS32_R1 by the build system and it is this identifier that gets used in the program code.

Those Unsupported Instructions

I have put off giving out the details so far, but now is as good a time as any to provide some about the instructions that the JZ4720 (the SoC in the Ben NanoNote) doesn’t seem to support. Some of them are just conveniences, offering a single instruction where many would otherwise be needed. Others offer functionality that is not always trivially replicated.

Instructions Description Privileges
di, ei Disable, enable interrupts Privileged
ext Extract bits from register Unprivileged
ins Insert bits into register Unprivileged
rdhwr Read hardware register Unprivileged, accesses privileged information
synci Synchronise instruction cache Unprivileged, performs privileged operations

We have already mentioned rdhwr, and this is precisely the kind of instruction that can pose problems, these mostly being concerned with it offering access to some (supposedly) privileged information from an unprivileged processor mode. However, since the kernel runs in a privileged mode, typically referred to as “kernel mode”, we won’t see rdhwr when doing our modifications to the kernel. And since the need to provide rdhwr also applied to the JZ4780 (the SoC in the MIPS Creator CI20), it turned out that I didn’t need to do much in addition to what others had already done in supporting it.

Another instruction that requires a bridging of privilege levels is synci. If we discover synci being used in the kernel, it is possible to rewrite it in terms of the equivalent cache instructions. However, outside the kernel in unprivileged mode, those cache instructions cannot be used and we would not wish to support them either, because “user mode” programs are not meant to be playing around with such aspects of the hardware. The solution for such situations is to “trap” synci when it gets used in unprivileged code and to handle it using the same mechanism as that employed to handle rdhwr: to treat it as a “reserved instruction”.

Thus, some extra code is added in the kernel to support this “trap” mechanism, but where we can just replace the instructions, we do so as in this example (from kernel/fiasco/src/kern/mips/alternatives.cpp):

#ifdef CONFIG_CPU_MIPS32_R1
    asm volatile ("cache 0x01, %0\n"
                  "nop\n"
                  "cache 0x08, %0"
                  : : "R"(orig_insn[i]));
#else
    asm volatile ("synci %0" : : "R"(orig_insn[i]));
#endif

We could choose not to bother doing this even in the kernel, instead just trapping all usage of synci. But this would have a performance impact, and L4 is ostensibly very much about performance, and so the opportunity is taken to maximise it by going round and fixing up the code in all these places instead. (Note that I’ve used the nop instruction above, but maybe I should use ehb. It’s probably something to take another look at, perhaps more generally with regard to which instruction I use in these situations.)

The other unsupported instructions don’t create as many problems. The di (disable interrupts) and ei (enable interrupts) instructions are really shorthand for modifications to the processor’s status register, albeit performing those modifications “atomically”. In principle, in cases where I have written out the equivalent sequence of instructions but not done anything to “guard” these instructions from untimely interruptions or exceptions, something bad could happen that wouldn’t have happened with the di or ei instructions themselves.

Maybe I will revisit this, too, and see what the risks might actually be, but for the purposes of getting the kernel working – which is where these instructions appear – the minimal solution seemed reasonably adequate. Here is an extract from a statement employing the ei instruction (from kernel/fiasco/src/drivers/mips/processor-mips.cpp):

#ifdef CONFIG_CPU_MIPS32_R1
    ASM_MFC0 " $t0, $12\n"
    "ehb\n"
    "or $t0, $t0, %[ie]\n"
    ASM_MTC0 " $t0, $12\n"
#else
    "ei\n"
#endif

Meanwhile, the ext (extract) and ins (insert) instructions have similar properties in that they too access parts of registers, replacing sequences of instructions that do the work piece by piece. One challenge that they pose is that they appear in potentially many different places, some with minimal register use, and the equivalent instruction sequence may end up needing an extra register to get the same work done. Fortunately, though, those equivalent instructions are all perfectly usable at whichever privilege level happens to be involved. Here is an extract from a statement employing the ins instruction (from kernel/fiasco/src/kern/mips/thread-mips.cpp):

#ifdef CONFIG_CPU_MIPS32_R1
       "  andi  $t0, %[status], 0xff  \n"
       "  li    $t1, 0xffffff00       \n"
       "  and   $t2, $t2, $t1         \n"
       "  or    $t2, $t2, $t0         \n"
#else
       "  ins   $t2, %[status], 0, 8  \n"
#endif

Note how temporary registers are employed to isolate the bits from the status register and to erase bits in the $t2 register before these two things are combined and stored in $t2.

Bridging the Privilege Gap

The rdhwr instruction has been mentioned quite a few times already. In the kernel, it is handled in the kernel/fiasco/src/kern/mips/exception.S file, specifically in the routine called “reserved_insn”. When the processor encounters an instruction it doesn’t understand, the kernel should have been configured to send it here. I will admit that I knew little to nothing about what to do to handle such situations, but the people who did the MIPS port of the kernel had laid the foundations by supporting one rdhwr variant, and I adapted their work to handle another.

In essence, what happens is that the processor “shows up” in the reserved_insn routine with the location of the bad instruction in its “exception program counter” register. By loading the value stored at that location, we obtain the instruction – or its value, at least – and can then inspect this value to see if we recognise it and can do anything with it. Here is the general representation of rdhwr with an example of its use:

SPECIAL3 _____ t s _____ RDHWR
011111 00000 01000 00001 00000 111011

The first and last portions of the above representation identify the instruction in general, with the bits for the second and next-to-last portions being set to zero presumably because they are either not needed to encode an instruction in this category, or they encode two parameters that are not needed by this particular instruction. To be honest, I haven’t checked which explanation applies, but I suspect it is the latter.

This leaves the remaining portions to indicate specific registers: the target (t) and source (s). With t=8, the result is written to register $8, which is normally known as $t0 (or just t0) in MIPS assembly language. Meanwhile, with s=1, the source register has been given as $1, which is the SYNCI_Step hardware register. So, the above is equivalent to the following:

rdhwr $t0, $1

To reproduce this same realisation in code, we must isolate the parts of the value that identify the instruction. For rdhwr accessing the SYNCI_Step hardware register, this means using a mask that preserves the SPECIAL3, RDHWR, s and blank regions, ignoring the target register value t because it will change according to specific circumstances. Applying this mask to the instruction value and comparing it to an expected value is done rather like this:

li $k0, 0x7c00083b # $k0 = SPECIAL3, blank, s=1, blank, RDHWR
li $at, 0xffe0ffff # $at = define a mask to mask out t
and $at, $at, $k1  # $at = the mask applied to the instruction value

Now, if $at is equal to $k0, the instruction value is identified as encoding rdhwr accessing SYNCI_Step, with the target register being masked out so as not to confuse things. Later on, the target register is itself selected and some trickery is employed to get the appropriate data into that register before returning from this routine.

For the above case and for the synci instruction, the work that needs doing once such an instruction has been identified is equivalent to what would have happened had it been possible to just insert into the code the alternative sequence of instructions that achieves the same thing. So, for synci, the equivalent cache instructions are executed before control is returned to the instruction after synci in the program where it appeared. Thus, upon encountering an unsupported instruction, control is handed over from an unprivileged program to the kernel, the instruction is identified and handled using the necessary privileged instructions, and then control is handed back to the unprivileged program again.

In fact, most of my efforts in exception.S were not really directed towards these two awkward instructions. Instead I had to deal with the use of quite a number of ext and ins instructions. Although it seems tempting to just trap those as well and to provide handlers for them, that would add considerable overhead, and so I added some macros to provide the same functionality when building the kernel for the Ben.

Prepare for Launch

Looking at my patches for the kernel now, I can see that there isn’t much else to cover. One or two details are rather important in the context of the Ben and how it manages to boot, however, and the process of figuring out those details was, like much else in this exercise, time-consuming, slightly frustrating, and left surprisingly little trace once the solution was found. At this stage, not everything was perfectly transcribed or expressed, leaving a degree of debugging activity that would also need to be performed in the future.

So, with a kernel that might be runnable, I considered what it would take to actually launch that kernel. This led me into the L4 Runtime Environment (L4Re) code and specifically to the bootstrap package. It turns out that the kernel distribution delegates such concerns to other software, and the bootstrap package sits uneasily alongside other packages, it being perhaps the only one amongst them that can exercise as much privilege as the kernel because its code actually runs at boot time before the kernel is started up.

Porting L4Re and Fiasco.OC to the Ben NanoNote (Part 2)

Thursday, March 22nd, 2018

Having undertaken some initial investigations into running L4Re and Fiasco.OC on the MIPS Creator CI20, I envisaged attempting to get this software running on the Ben NanoNote, too. For a while, I put this off, feeling confident that when I finally got round to it, it would probably be a matter of just choosing the right compiler options and then merely fixing all the mistakes I had made in my own driver code. Little did I know that even the most trivial activities would prove more complicated than anticipated.

As you may recall, I had noted that a potentially viable approach to porting the software would merely involve setting the appropriate compiler switches for “soft-float” code, thus avoiding the generation of floating point instructions that the JZ4720 – the SoC on the Ben NanoNote – would not be able to execute. A quick check of the GCC documentation indicated the availability of the -msoft-float switch. And since I have a working cross-compiler for MIPS as provided by Debian, there didn’t seem to be much more to it than that. Until I discovered that the compiler doesn’t seem to support soft-float output at all.

I had hoped to avoid building my own cross-compiler, and apart from enthusiastic (and occasionally successful) attempts to build the Debian ones before they became more generally available, the last time I really had anything to do with this was when I first developed software for the Ben. As part of the general support for the device an OpenWrt distribution had been made available. Part of that was the recipe for building the cross-compiler and other tools, needed for building a kernel and all the software one would deploy on a device. I am sure that this would still be a good place to look for a solution, but I had heard things about Buildroot and so set off to investigate that instead.

So although Buildroot, like OpenWrt, is promoted as a way of building an entire system, it too offers help in building just the toolchain if that is all you need. Getting it to build the appropriately-configured cross-compiler is a matter of the familiar “make menuconfig” seen from the Linux kernel source distribution, choosing things in a menu – for us, asking for a soft-float toolchain, also enabling C++ support – and then running “make toolchain”. As a result, I got a range of tools in the output/host/bin directory prefixed with mipsel-buildroot-linux-uclibc.

Some Assembly Required

Changing the compiler settings for Fiasco.OC (in kernel/fiasco/src/Makeconf.mips) and L4Re (in l4/mk/arch/Makeconf.mips), and making sure not to enable any floating point support in Fiasco.OC, and recompiling the code to produce soft-float output was straightforward enough. However, despite the portability of this software, it isn’t completely C and C++ code: lurking in various places (typically in mips or ARCH-mips directories) are assembly language source files with the .S prefix, and in some C and C++ files one can also find “asm” statements which embed assembly language instructions within higher-level code.

With the assumption that by specifying the right compiler switches, no floating point instructions will be produced from C or C++ source code, all that remains is to determine whether any of these other code sections mention such forbidden instructions. It was asserted that Fiasco.OC doesn’t use any floating point instructions at all. Meanwhile, I couldn’t find any floating point instructions in the generated code: “mipsel-linux-gnu-objdump -D some-output-file” (or, indeed, “mipsel-buildroot-linux-uclibc-objdump -D some-output-file”) now started to become a familiar acquaintance if not exactly a friend!

In fact, the assembly language files and statements would provide other challenges in the form of instructions unsupported by the JZ4720. Again, I had the choice of either trying to support MIPS32r2 instructions, like rdhwr, by providing “reserved instruction” handlers, or to rewrite these instructions in forms suitable for the JZ4720. At least within Fiasco.OC – the “kernel” – where the environment for executing instructions is generally privileged, it is possible to reformulate MIPS32r2 instructions in terms of others. I will return to the details of these instructions later on.

Where to Find Things

Having spent all this time looking around in the L4Re and Fiasco.OC code, it is perhaps worth briefly mentioning where certain things can be found. The heart of the action in the kernel is found in these places:

Directory Significance
kernel/fiasco/src The top-level directory of the kernel sources, having some MIPS-specific files
kernel/fiasco/src/drivers/mips Various hardware abstractions related to MIPS
kernel/fiasco/src/jdb/mips MIPS-specific support code for the kernel debugger (which I don’t use)
kernel/fiasco/src/kern/mips MIPS-specific support code for the kernel itself
kernel/fiasco/src/templates Device configuration details

As noted above, I don’t use the kernel debugger, but I still made some edits that might make it possible to use it later on. For the most part, the bulk of my time and effort was spent in the src/kern/mips hierarchy, occasionally discovering things in src/drivers/mips that also needed some attention.

Describing the Ben

So it started to make sense to consider how the Ben might be described in terms of a kernel configuration, and whether we might want to indicate a less sophisticated revision of the architecture so that we could test for it in the code and offer alternative sequences of instructions where possible. There are a few different places where hardware platforms are described within Fiasco.OC, and I ended up defining the following:

  • An architecture version (MIPS32r1) for the JZ4720 (in kernel/fiasco/src/kern/mips/Kconfig)
  • A definition for the Ben itself (in kernel/fiasco/src/templates/globalconfig.out.mips-qi_lb60)
  • A board entry for the Ben (in kernel/fiasco/src/kern/mips/bsp/qi_lb60/Kconfig) as part of a board-specific collection of functionality

This is not by any means enough, even disregarding any code required to do things specific to the Ben. But with the additional configuration setting for the JZ4720, which I called CPU_MIPS32_R1, it becomes possible to go around inside the kernel code and start to mark up places which need different instruction sequences for the Ben, using CONFIG_CPU_MIPS32_R1 as the symbol corresponding to this setting in the code itself. There are places where this new setting will also change the compiler’s behaviour: in kernel/fiasco/src/Makeconf.mips, the -march=mips32 compiler switch is activated by the setting, preventing the compiler from generating instructions we do not want.

For the board-specific functionality (found in kernel/fiasco/src/kern/mips/bsp/qi_lb60), I took the CI20’s collection of files as a starting point. Fortunately for me, the Ben’s JZ4720 and the CI20’s JZ4780 are so similar that I could, with reference to Linux kernel code and other sources of documentation, make a first effort at support for the Ben by transcribing and editing these files. Some things I didn’t understand straight away, and I only later discovered what some parameters to certain methods really mean.

But generally, this work was simply a matter of seeing what peripheral registers were mentioned in the CI20 version, figuring out whether those registers were present in the earlier SoC, and determining whether their locations were the same or whether they had been moved around from one product to the next. Let us take a brief look at the registers associated with the timer/counter unit (TCU) in the JZ4720 and JZ4780 (with apologies for WordPress converting “x” into a multiplication symbol in some places):

JZ4720 (Ben NanoNote) JZ4780 (MIPS Creator CI20)
Registers Offsets Size Registers Offsets Size
TER, TESR, TECR (timer enable, set, clear) 0x10, 0x14, 0x18 8-bit TER, TESR, TECR (timer enable, set, clear) 0x10, 0x14, 0x18 16-bit
TFR, TFSR, TFCR (timer flag, set, clear) 0x20, 0x24, 0x28 32-bit TFR, TFSR, TFCR (timer flags, set, clear) 0x20, 0x24, 0x28 32-bit
TMR, TMSR, TMCR (timer mask, set, clear) 0x30, 0x34, 0x38 32-bit TMR, TMSR, TMCR (timer mask, set, clear) 0x30, 0x34, 0x38 32-bit
TDFR0, TDHR0, TCNT0, TCSR0 (timer data full match, half match, counter, control) 0x40, 0x44, 0x48, 0x4c 16-bit TDFR0, TDHR0, TCNT0, TCSR0 (timer data full match, half match, counter, control) 0x40, 0x44, 0x48, 0x4c 16-bit
TSR, TSSR, TSCR (timer stop, set, clear) 0x1c, 0x2c, 0x3c 8-bit TSR, TSSR, TSCR (timer stop, set, clear) 0x1c, 0x2c, 0x3c 32-bit

We can see how the later product (JZ4780) has evolved from the earlier one (JZ4720), with some registers supporting more bits, exposing control over an increased number of timers. A lot of the details are the same, which was fortunate for me! Even the oddly-located timer stop registers, separated by intervals of 16 bytes (0x10) instead of 4 bytes, have been preserved between the products.

One interesting difference is the absence of the “operating system timer” in the JZ4720. This is a 64-bit counter provided by the JZ4780, but for the Ben it seems that we have to make do with the standard 16-bit timers provided by both products. Otherwise, for this part of the hardware, it is a matter of making sure the fundamental operations look reasonable – whether the registers are initialised sensibly – and then seeing how this functionality is used elsewhere. A file called tcu_jz4740.cpp in the board-specific directory for the Ben preserves this information. (Note that the JZ4720 is largely the same as the JZ4740 which can be considered as a broader product category that includes the JZ4720 as a variant with slightly reduced functionality.)

In the same directory, there is a file covering timer functionality from the perspective of the kernel: timer-jz4740.cpp. Here, the above registers are manipulated to realise certain operations – enabling and disabling timers, reading them, indicating which interrupt they may cause – and the essence of this work again involves checking documentation sources, register layouts, and making sure that the intent of the code is preserved. It may be mundane work, but any little detail that is not correct may prevent the kernel from working.

Covering the Ground

At this point, the essential hardware has mostly been described, building on all the work done by others to port the kernel to the MIPS architecture and to the CI20, merely adding a description of the differences presented by the Ben. When I made these changes, I was slowly immersing myself in the code, writing things that I felt I mostly understood from having previously seen code accessing certain hardware features of the Ben. But I knew that there will still some way to go before being able to expect anything to actually work.

From this point, I would now need to confront the unimplemented instructions, deal with the memory layout, and figure out how the kernel actually gets launched in the first place. This would also mean that I could no longer keep just adding and changing code and feeling like progress was being made: I would actually have to try and get the Ben to run something. And as those of us who write software know very well, there can be nothing more punishing than being confronted with the behaviour of a program that is incorrect, with the computer caring not about intentions or aspirations but only about executing the logic whether it is correct or not.

Porting L4Re and Fiasco.OC to the Ben NanoNote (Part 1)

Wednesday, March 21st, 2018

For quite some time, I have been interested in alternative operating system technologies, particularly kernels beyond the likes of Linux. Things like the Hurd and technologies associated with it, such as Mach, seem like worthy initiatives, and contrary to largely ignorant and conveniently propagated myths, they are available and usable today for anyone bothered to take a look. Indeed, Mach has had quite an active life despite being denigrated for being an older-generation microkernel with questionable performance credentials.

But one technological branch that has intrigued me for a while has been the L4 family of microkernels. Starting out with the motivation to improve microkernel performance, particularly with regard to interprocess communication, different “flavours” of L4 have seen widespread use and, like Mach, have been ported to different hardware architectures. One of these L4 implementations, Fiasco.OC, appeared particularly interesting in this latter regard, in addition to various other features it offers over earlier L4 implementations.

Meanwhile, I have had some success with software and hardware experiments with the Ben NanoNote. As you may know or remember, the Ben NanoNote is a “palmtop” computer based on an existing design (apparently for a pocket dictionary product) that was intended to offer a portable computing experience supported entirely by Free Software, not needing any proprietary drivers or firmware whatsoever. Had the Free Software Foundation been certifying devices at the time of its introduction, I imagine that it would have received the “Respects Your Freedom” certification. So, it seems to me that it is a worthy candidate for a Free Software porting exercise.

The Starting Point

Now, it so happened that Fiasco.OC received some attention with regards to being able to run on the MIPS architecture. The Ben NanoNote employs a system-on-a-chip (SoC) whose own architecture closely (and deliberately) resembles the MIPS architecture, but all information about the JZ4720 SoC specifies “XBurst” as the architecture name. In fact, one can regard XBurst as a clone of a particular version of the MIPS architecture with some additional instructions.

Indeed, the vendor, Ingenic, subsequently licensed the MIPS architecture, produced some SoCs that are officially MIPS-labelled, culminating in the production of the MIPS Creator CI20 product: a development board commissioned by the then-owners of the MIPS portfolio, Imagination Technologies, utilising the Ingenic JZ4780 SoC to presumably showcase the suitability of the MIPS architecture for various applications. It was apparently for this product that an effort was made to port Fiasco.OC to MIPS, and it was this effort that managed to attract my attention.

The MIPS Creator CI20 single-board computer

The MIPS Creator CI20 single-board computer

It was just as well others had done this hard work. Although I have been gradually immersing myself in the details of how MIPS-based CPUs function, having written some code that can boot the Ben, run a few things concurrently, map memory for different processes, read the keyboard and show things on the screen, I doubt that my knowledge is anywhere near comprehensive enough to tackle porting an existing operating system kernel. But knowing that not only had others done this work, but they had also targeted a rather similar system, gave me some confidence that I might be able to perform the relatively minor porting exercise to target the Ben.

But first I felt that I had to gain experience with Fiasco.OC on MIPS in a more convenient fashion. Although I had muddled through the development of code on the Ben, reusing existing framebuffer driver code and hacking away until I managed to get some output on the display, I felt that if I were to continue my experiments, a more efficient way of debugging my code would be required. With this in mind, I purchased a MIPS Creator CI20 and, after doing things with the pre-installed Debian image plus installing a newer version of Debian, I set out to try Fiasco.OC on the hardware.

The Missing Pieces

According to the Fiasco.OC features page, the “Ci20” is supported. Unfortunately, this assertion of support is not entirely true, as we will come to see. Previously, I mentioned that the JZ4720 in the Ben NanoNote largely implements the instructions of a certain version of the MIPS architecture. Although the JZ4780 in the CI20 introduces some new features over the JZ4720, such as a floating point arithmetic unit, it still lacks various instructions that are present in commonly-used MIPS versions that might be taken as the “baseline” for software support: MIPS32 Release 2 (MIPS32r2), for instance.

Upon trying to get Fiasco.OC to start up, I soon encountered one of these instructions, or at least a particular variant of it: rdhwr (read hardware register) accessing SYNCI_Step (the instruction cache line size). This sounds quite fearsome, but I had been somewhat exposed to cache management operations when conjuring up my own code to run on the Ben. In fact, all this instruction variant does is to ask how big the step size has to be in a loop that invalidates the instruction cache, instead of stuffing such a value into the program when compiling it and thus making an executable that will then be specific to a particular processor.

Fortunately, those hardworking people who had already ported the code to MIPS had previously encountered another rdhwr variant and had written code to “trap” it in the “reserved instruction” handler. That provided some essential familiarisation with the kernel code, saving me the effort of having to identify the right place to modify, as well as providing a template for how such handlers should operate. I feel fairly competent writing MIPS assembly language, although I would manage to make an easy mistake in this code that would impede progress much later on.

There were one or two other things that also needed fixing up, mentioned briefly in my review of the year article, generally involving position-independent code that was not called correctly and may have been related to me using a generic version of GCC instead of some vendor-modified version. But as I described in that article, I finally managed to boot Fiasco.OC and run a program on top of it, writing the output via the serial connection to my personal computer.

The End of the Very Beginning

I realised that compiling such code for the Ben would either require the complete avoidance of floating point instructions, due to the lack of that floating point unit in the JZ4720, or that I would need to provide implementations of those instructions in software. Fortunately, GCC provides a mode to compile “soft-float” versions of C and C++ programs, and so this looked like the next step. And so, apart from polishing support for features of the Ben like the framebuffer, input/output pins, the clock circuitry, it didn’t really seem that there would be so much to do.

As it so often turns out with technology, optimism can lead to unrealistic estimates of how much time and effort remains in a project. I now know that a description of all this effort would be just too much for a single article. So, I will wrap this article up with a promise that the next one will descend into the details of compilers, assembly language, the SoC, and before too long, we will get to see the inconvenience of debugging low-level software with nothing more than a framebuffer.

The Ben NanoNote: An Overlooked Hardware Experimentation Platform

Wednesday, October 30th, 2013

The Ben NanoNote is a pocket computer with a 3-inch screen and organiser-style keyboard announced in 2010 as the first in a line of copyleft hardware products under the Qi-Hardware umbrella: an initiative to collaboratively develop open hardware with full support for Free Software. With origins as an existing electronic dictionary product, the Ben NanoNote was customised for use as a general-purpose computing platform and produced in a limited quantity, with plans for successors that sadly did not reach full production.

The Ben NanoNote with illustrative beverage (not endorsed by anyone involved with this message, all trademarks acknowledged, call off the lawyers!)

The Ben NanoNote with illustrative beverage (not endorsed by anyone involved with this message, all trademarks acknowledged, call off the lawyers!)

When the Ben (as it is sometimes referred to in short form) first became known to a wider audience, many people focused on those specifications common to most portable devices sold today: the memory and screen size, what kind of networking it has (or doesn’t have). Some people wondered what the attraction was of a device that wasn’t wireless-capable when supposedly cheaper wireless communicator devices could be obtained. Even the wiki page for the Ben has only really prominently promoted the Free Software side of the device, mentioning its potential for making customised end-user experiences, and as an appliance for open content or for music and video playback.

Certainly, the community around the Ben has a lot to be proud of with regard to Free Software support. A perusal of the Qi-Hardware news page reveals the efforts to make sure that the Ben was (and still is) completely supported by Free Software drivers within the upstream Linux kernel distribution. With a Free Software bootloader, the Ben is probably one of the few devices that could conceivably get some kind of endorsement for the complete absence of proprietary software, including firmware blobs, from organisations like the FSF who naturally care about such things. (Indeed, a project recommended by the FSF whose output appears to be closely related to the Ben’s default software distribution publishes a short guide to installing their software on the Ben.)

But not everybody focused only on the software upon learning about the device: some articles covered the full range of ambitions and applications anticipated for the Ben and for subsequent devices in the NanoNote series. And work got underway rather quickly to demonstrate how the Ben might complement the Arduino range of electronics prototyping and experimentation boards. Although there were concerns that the interfacing potential of the Ben might be a bit limited, with only USB peripheral support available via the built-in USB port (thus ruling out the huge range of devices accessible to USB hosts), the alternatives offered by the device’s microSD port appear to offer a degree of compensation. (The possibility of using SDIO devices had been mentioned at the very beginning, but SDIO is not as popular as some might have wished, and the Ben’s microSD support seems to go only as far as providing MMC capabilities in hardware, leaving out desirable features such as hardware SPI support that would make programming slightly easier and performance substantially better. Meanwhile, some people even took the NanoNote platform to a different level by reworking the Ben, freeing up connections for interfacing and adding an FPGA, but the resulting SIE device apparently didn’t make it beyond the academic environments for which it was designed.)

Thus, the Universal Breakout Board (UBB) was conceived: a way of “breaking out” or exposing the connections of the microSD port to external devices whilst communicating with those devices in a different way than one would with SD-based cards. Indeed, the initial impetus for the UBB was to investigate whether the Ben could be interfaced to an Ethernet board and thus provide dependency-free networking (as opposed to using Ethernet-over-USB to a networked host computer or suitably configured router). Sadly, some of those missing SD-related features have an impact on performance and practicality, but that doesn’t stop the UBB from being an interesting avenue of experimentation. To distinguish between SD-related usage and to avoid trademark issues, the microSD port is usually referred to as the 8:10 port in the context of the UBB.

The Universal Breakout Board that plugs into the 8:10 (microSD) slot

The Universal Breakout Board that plugs into the 8:10 (microSD) slot

(The UBB image originates from Qi-Hardware and is CC-BY-SA 3.0 licensed.)

Interfacing in Comfort

A lot of experimentation with computer-controlled electronics is done using microcontroller solutions like those designed and produced by Arduino, whose range of products starts with the modestly specified Arduino Uno with an ATmega328 CPU providing 32K (kilobytes) of flash memory and 2K of “conventional” static RAM (SRAM). Such specifications sound incredibly limiting, and when one considers that many microcomputers in 1983 – thirty years ago – had at least 32K of “conventional” readable and writable memory, albeit not all of it being always available for use on some machines, devices such as the Uno do not seem to represent much of an advance. However, such constraints can also be liberating: programs written to run in such limited space on the “bare metal” (there being no operating system) can be conceptually simple and focus on very specific interfacing tasks. Nevertheless, the platform requires some adjustment, too: data that will not be updated while a program runs on the device must be packed away in the flash memory where it obviously cannot be changed, and data that the device manipulates or collects must be kept within the limits of the precious SRAM, bearing in mind that the program stack may also be taking up space there, too.

As a consequence, the Arduino platform benefits from a vibrant market in add-ons that extend the basic boards such as the Uno with useful capabilities that in some way make up for those boards’ deficiencies. For example, there are several “shield” add-on products that provide access to SD cards for “data logging“: essential given that the on-board SRAM is not likely to be able to log much data (and is volatile), and given that the on-board flash memory cannot be rewritten during operation. Other add-ons requiring considerable amounts of data also include such additional storage, so that display shields will incorporate storage for bitmaps that might be shown on the display: the Arduino TFT LCD Screen does precisely this by offering a microSD slot. On the one hand, the basic boards offer only what people really need as the foundational component of their projects, but this causes add-on designers to try and remedy the lack of useful core functionality at every turn, putting microSD or SD storage on every shield or extension board just because the user might not have such capabilities already.

Having said all this, the Arduino platform generally only makes you pay for what you need, and for many people this is interesting enough that it explains Arduino’s continuing success. Nevertheless, for some activities, the Arduino platform is perhaps too low-level and to build and combine the capabilities one might need for a project – to combine an Arduino board with numerous shields and other extensions – would be troublesome and possibly unsatisfactory. At some point, one might see the need to discard the “form factor” of the Arduino and to use the technological building blocks that comprise the average Arduino board – the microcontroller and other components – in order to make a more integrated, more compact device with the additional capabilities of choice. For instance, if one wanted to make a portable music player with Arduino, one could certainly acquire shields each providing a screen and controls (and microSD slot), headphone socket and audio playback (and microSD slot), and combine them with the basic board, hoping that they are compatible or working round any incompatibilities by adding yet more hardware. And then one would want to consider issues of power, whether using a simple battery-to-power-jack solution would be good enough or whether there should be a rechargeable battery with the necessary power circuit. But the result of combining an Arduino with shields would not be as practical as a more optimised device.

The Arduino Duemilanove attached to a three-axis accelerometer breakout board

The Arduino Duemilanove attached to a three-axis accelerometer breakout board

The Opportunity

In contrast to the “bare metal” approach, people have been trying and advocating other approaches in recent times. First of all, it has been realised that many people would prefer the comfort of their normal computing environment, or at least one with many of the capabilities they have come to expect, to a very basic environment that has to be told to run a single, simple program that needs to function correctly or be made to offer some rudimentary support for debugging when deployed. Those who promote solutions like the Raspberry Pi note that it runs a desktop operating system, typically GNU/Linux, and thus offers plenty of facilities for writing programs and running them in pleasant ways. Thus, interfacing with other hardware becomes more interactive and easier to troubleshoot, and it might even permit writing interfacing programs in high-level languages like Python as opposed to low-level languages like C or assembly language. In effect, the more capable platform with its more generous resources, faster processor and a genuine operating system provide opportunities that a microcontroller-based solution just cannot provide.

If this was all that were important then we would surely have reached the end of the matter, but there is more to consider. The Raspberry Pi is really a desktop computer replacement: you plug in USB peripherals and a screen and you potentially have a replacement for your existing x86-based personal computer. But it is in many ways only as portable and as mobile as the Arduino, and absolutely less so in its primary configuration. Certainly, people have done some interesting experiments adding miniature keyboards and small screens to the Raspberry Pi, but it starts to look like the situation with the Arduino when trying to build up some capability or other from too low a starting point. Such things are undoubtedly achievements in themselves, and like climbing a mountain or going into space, showing that it could be done is worthy of our approval, but just like those great achievements it would be a shame to go to all that effort without also doing a bit of science in the process, would it not?

An e-paper display connected to the Ben NanoNote via a cable and the Sparkfun "microSD Sniffer" board

An e-paper display connected to the Ben NanoNote via a cable and the Sparkfun "microSD Sniffer" board

This is where the Ben enters the picture. Because it is a pocket computer with a built-in screen and battery (and keyboard), you can perform various mobile experiments without having to gear up to be mobile in the first place. Perhaps most of the time, it may well be sitting on your desk next to a desktop computer and being remotely accessed using a secure shell connection via Ethernet-over-USB, acting as a mere accessory for experimentation. An example of this is a little project where I connected an e-paper screen to the Ben to see how hard it would be to make that screen useful. Of course, I could also take this solution “on the road” if I wanted, and it would be largely independent of any other computing infrastructure, although I will admit to not having run native compilers on the Ben myself – everything I have compiled has actually been cross-compiled using the OpenWrt toolchain targeting the Ben – but it should be possible to develop on the road, too.

But for truly mobile experimentation, where having an untethered device is part of the experiment, the Ben offers something that the Raspberry Pi and various single-board computer solutions do not: visualisation on the move. One interesting project that pioneered this is the UBB-LA (UBB Logic Analyzer) which accepts signals via the Ben’s 8:10 port and displays a time-limited capture of signal data on the screen. One area that has interested me for a while has been that of orientation and motion sensor data, with the use of gyroscopes and accelerometers to determine the orientation and motion of devices. Since there are many “breakout boards” (small boards providing convenient access to components) offering these kinds of sensors, and since the communication with these sensors is within the constraints of the 8:10 port bandwidth, it became attractive to use the Ben to prototype some software for applications which might use such data, and the Ben’s screen provides a useful way of visualising the interpretation of the data by the software. Thus, another project was conceived that hopefully provides the basis for more sophisticated experiments into navigation, interaction and perhaps even things like measurement.

The Pololu MinIMU-9 board connected to the Ben NanoNote in a horizontal position, showing the orientation information from the board

The Pololu MinIMU-9 board connected to the Ben NanoNote in a horizontal position, showing the orientation information from the board

The Differentiator

Of course, sensor applications are becoming commonplace due to the inclusion of gyroscopes, accelerometers, magnetometers and barometers into smartphones. Indeed, this was realised within the open hardware community several years ago with the production of the Openmoko Freerunner Navigation Board that featured such sensors and additional interfacing components. Augmented reality applications and fancy compass visualisations are becoming standard features on smartphones, complementing navigation data from GPS and comparable satellite navigation systems, and the major smartphone software vendors provide APIs to access such components. Indeed, many of the components used in smartphones feature on the breakout boards mentioned above, unsurprisingly, because their cost has been driven down and made them available and affordable for a wider range of applications.

So what makes the Ben NanoNote interesting when you could just buy a smartphone and code against, say, the Android API? Well, as many people familiar with software and hardware freedom will already know, being able to use an API is perhaps only enough if you never intend to improve the software providing that API, or share that software and any improvements you make to it with others, and if you never want to know what the code is really doing, anyway. Furthermore, you may not be able to change or upgrade that software or to deploy your own software on a smartphone you have bought, despite vigorous efforts to make this your right.

Even if you think that the software providers have done a good job interpreting the sensor data and translating it into something usable that a smartphone application might use, and this is not a trivial achievement by any means, you may also want to try and understand how the device interacts with the sensors at a lower level. It is rather likely that an Android smartphone, for example, will communicate with an accelerometer using a kernel module that resides in the upstream Linux kernel source code, and you could certainly take a look at that code, but as thirty years of campaigning for software freedom has shown, taking a look is not as good as improving, sharing and deploying that code yourself. Certainly, you could “root” your smartphone and install an alternative operating system that gives you the ability to develop and deploy kernel modules and even user-space code – normal programs – that access the different sensors, but you take your chances doing so.

Meanwhile, the Ben encourages experimentation by letting you re-flash the bootloader and operating system image, and you can build your own kernel and root filesystem populated with programs of your choice, all of it being Free Software and doing so using only Free Software. Things that still surprise people with modified smartphone images like being able to log in and get a secure shell session and to run “normal Linux programs” are the very essence of the Ben. It may not have wireless or cellular networking as standard – a much discussed topic that can be solved in different ways – but that can only be good news for the battery life.

The Successor?

“What would I use it for?” That might have been my first reaction to the Ben when I first heard about it. I don’t buy many gadgets – my mobile telephone is almost ten years old – and I always try to justify gadget purchases, perhaps because growing up in an age where things like microcomputers were appearing but were hardly impulse purchases, one learns how long-lived technology products can be when they are made to last and made to a high quality: their usefulness does not cease merely because a new product has become available. Having used the Ben, it is clear that although its perceived origins as some kind of dictionary, personal organiser or music player are worthy in themselves, it just happens to offer some fun as a platform for hardware experimentation.

I regard the Ben as something of a classic. It might not have a keyboard that would appeal to those who bought Psion organiser products at the zenith of that company’s influence, and its 320×240 screen seems low resolution in the age of large screen laptops and tablets with “retina” displays, but it represents something that transcends specifications and yet manages to distract people when they see one. Sadly, it seems likely that remaining stocks will eventually be depleted and that opportunities to acquire a new one will therefore cease.

Plans for direct successors of the Ben never really worked out, although somewhat related products do exist: the GCW-Zero uses an Ingenic SoC just like the Ben does, and software development for both devices engages a common community in many respects, but the form factor is obviously different. The Pandora has a similar form factor and has a higher specification but also has a higher price, but it is apparently not open hardware. The Neo900, if it comes to pass (and it hopefully will), may offer a good combination of Free Software and open hardware, but it will understandably not come cheap.

One day, well-funded organisations may recognise and reward the efforts of the open hardware pioneers. Imitating aspirational product demonstrations and trying to get in on the existing action is all very well, not to mention going your own way, but getting involved in the open hardware communities and helping them to build new things would benefit everybody. I can only hope that such organisations come to their senses soon so that more people can have the opportunity to play with sensors, robotics and all the other areas of hardware experimentation, and that a healthy diversity of platforms may be sustained to encourage such experimentation long into the future.

The Ben NanoNote and regular-sized desktop computer accessories

The Ben NanoNote and regular-sized desktop computer accessories