Paul Boddie's Free Software-related blog


Archive for the ‘hardware’ Category

Shared-Mode Executables in L4Re for MIPS-Based Devices

Sunday, July 8th, 2018

I have been meaning to write about my device driver experiments with L4Re, following on from my porting exercises, but that exercise took me along various routes and I haven’t yet got back to documenting all of them. Meanwhile, one thing that did start to bother me was how much space the software was taking up when compiled, linked and ready to deploy.

Since each of my device drivers is a separate program, and since each one may be linked to various libraries, they each started to contribute substantially to the size of the resulting file – the payload – needing to be transferred to the device. At one point, I had to resize the boot partition on the memory card used by the Letux 400 notebook computer to make the payload fit in the available space.

The work done to port L4Re to the MIPS Creator CI20 had already laid the foundations for functioning payloads, and once the final touches were put in place to support the peculiarities of the Ingenic JZ4780 system-on-a-chip, it was possible to run both the conventional “hello” example which is statically linked to its libraries, as well as a “shared-hello” example which is dynamically linked to its libraries. The latter configuration of the program results in a smaller executable program and thus a smaller payload.

So it seemed clear that I might be able to run my own programs on the Letux 400 or Ben NanoNote with similar reductions in payload size. Unfortunately, nothing ever seems to be as straightforward as it ought to be.

Exceptional Obstructions and Observations

Initially, I set about trying one of my own graphical examples with the MODE variable set to “shared” in its Makefile. This, upon powering up, merely indicated that it had not managed to start up properly. Instead of a blank screen, the viewports set up by the graphical multiplexer, Mag, were still active and showing their usual blankness. But these regions did not then change in any way when I pressed keys on the keyboard (which is functionality that I will hopefully get round to describing in another article).

I sought some general advice from the l4-hackers mailing list, but quickly realised that to make any real progress, I would need a decent way of accessing the debugging output produced by the dynamic linker. This took me on a diversion that led to my debugging capabilities being strengthened with the availability of a textual output console on the screen of my devices. I still don’t like the idea of performing hardware modifications to get access to the serial console, so this is a useful and welcome alternative.

Having switched out the “hello” program with the “shared-hello” program in the system configuration and module list demonstrating the framebuffer terminal, I deployed the payload and powered up, but I did not get the satisfying output of the program operating normally. Instead, the framebuffer terminal appeared and rewarded me with the following message:

L4Re: rom/ex_hello_shared: Unhandled exception: PC=0x800000 PFA=8d7a LdrFlgs=0

This isn’t really the kind of thing you want to see. Having not had to debug L4Re or Fiasco.OC in any serious fashion for a couple of months, I was out of practice in considering the next step, but fortunately some encouragement arrived in a private e-mail from Jean Wolter. This brought the suggestion of triggering the kernel debugger, but since this requires serial console access, it wasn’t a viable approach. But another idea that I could use involved writing out a bit more information in the routine that was producing this output.

The message in question originates in the pkg/l4re-core/l4re_kernel/server/src/region.cc file, within the Region_map::op_exception method. The details it produces are rather minimal and generic: the program counter (PC) tells us where the exception occurred; the loader flags (LdrFlags) presumably tell us about the activity of the library loader; the mysterious “PFA” is supposedly the page fault address but it actually seemed to be the stack pointer address on these MIPS-based systems.

On their own, these details are not particularly informative, but I suppose that more useful information could quickly become fairly specific to a particular architecture. Jean suggested looking at the structure describing the exception state, l4_exc_regs_t (defined with MIPS-specific members in pkg/l4re-core/l4sys/include/ARCH-mips/utcb.h), to see what else I might dig up. This I did, generating the following:

pc=0x800000
gp=0x82dd30
sp=0x8d7a
ra=0x802f6c
cause=0x1000002c

A few things interested me, thus motivating my choice of registers to dump. The global pointer (gp) register tells us about symbols in the problematic code, and I felt that having once made changes to the L4Re sources – way back in the era of getting the CI20 to run GCC-generated code – so that another register (t9) would be initialised correctly, this so that the gp register would be set up correctly within programs, it was entirely possible that I had rather too enthusiastically changed something that was now causing a problem.

The stack pointer (sp) is useful to check, just to see if it located in a sensible region of memory, and here I discovered that this seemed to be the same as the “PFA” number. Oddly, the “PFA” seems to occupy the same place in the exception structure as any “bad virtual address” featuring in an address exception, and so I started to suspect that maybe the stack pointer was referencing the wrong part of memory. But this was partially ruled out by examining the value of the stack pointer in the “hello” example, which appeared to reference broadly the same part of memory. And, of course, the “hello” example works just fine.

In fact, the cause register indicated another kind of exception entirely, and it was one I was not really expecting: a “coprocessor unusable” exception indicating that coprocessor 1, typically a floating point arithmetic unit, was being illegally requested by an instruction. Here is how I interpreted that register’s value:

hex value   binary value
1000002c == 00010000000000000000000000101100
              --                     -----
              CE                     ExcCode

=> CE == 1; ExcCode == 11 (coprocessor unusable)
=> coprocessor 1 unusable

Now, as I may have mentioned before, the hardware involved in this exercise does not support floating point instructions itself, and this is why I have configured compilers to use “soft-float” (software-based floating point arithmetic) support. It meant that I had to find places that might have wanted to use floating point instructions and eliminate those instructions somehow. Fortunately, only code generated by the compiler was likely to contain such instructions. But now I wondered if there weren’t some instructions of this nature lurking in places I hadn’t checked.

I had also thought to check the return address (ra) register. This tells us where the processor will jump to when it has finished executing the current routine, and since this is usually a matter of “returning” somewhere, it tells us something about the code that was being executed before the problematic routine was called. I figured that the work being done before the exception was probably going to be more important than the exception itself.

Floating Point Magic

Another debugging suggestion that now became unavoidable was to inspect the erroneous instruction. I noted above that this instruction was causing the processor to signal an illegal attempt to use an unusable – actually completely unavailable – coprocessor. Writing a numeric representation of the instruction to the display provided me with the following hexadecimal (base 16) value:

464c457f

This can be interpreted as follows in binary, with groups of bits defined for interpretation according to the MIPS instruction set architecture, and with tentative interpretations of these groups provided beneath:

010001 10010  01100 01000 10101 111111
COP1   rs/fmt rt/ft rd/fs       C.ABS.NGT

The first group of bits is the opcode field which is interpreted as a coprocessor 1 (COP1) opcode. Should we then wish to consider what the other groups mean, we might then examine the final group which could indicate a comparison instruction. However, this becomes rather hypothetical since the processor will most likely interpret the opcode field and then decide that it cannot handle the instruction.

So, I started to look for places where the instruction might have been written, but no obvious locations were forthcoming. One peculiar aspect of all this is that the location of the instruction is at a rather “clean” location – 0×800000 – and some investigations indicated that this is where the library containing the problematic code gets loaded. I actually don’t remember precisely how I figured this out, but I think it was as follows.

I had looked at linker scripts that might give some details of the location of program objects, and one of them (pkg/l4re-core/ldscripts/ARCH-mips/main_dyn.ld) seemed to be related. It gave an address for the code of 0×400000. This made me think that some misconfiguration or erroneous operation was putting the observed code somewhere it shouldn’t be. But changing this address in the linker script just gave another exception at 0×400000, meaning that I had disrupted something that was intentional and probably working fine.

Meanwhile, emitting the t9 register’s value from the exception state yielded 0×800000, indicating that the calling routine had most likely jumped straight to that address, not to another address with execution having then proceeded normally until reaching the exception location. I decided to look at the instructions around the return address, these most likely being the ones that had set up the call to the exception location. Writing these locations out gave me some idea about the instructions involved. Below, I provide the stored values and their interpretations as machine instructions:

8f998250 # lw $t9, -32176($gp)
24a55fa8 # addiu $a1, $a1, 0x5fa8
0320f809 # jalr $t9
24844ee4 # addiu $a0, $a0, 0x4ee4
8fbc0010 # lw $gp, 16($sp)

One objective of doing this, apart from confirming that a jump instruction (jalr) was involved, with the t9 register being employed as is the convention with MIPS code, was to use the fragment to identify the library that was causing the error. A brute-force approach was employed here, generating “object dumps” from the library files and writing them out as files in a new directory:

mkdir tmpdir
for FILENAME in mybuild/lib/mips_32/l4f/* ; do
    mipsel-linux-gnu-objdump -d "$FILENAME" > tmpdir/`basename "$FILENAME"`
done

The textual dump files were then searched for the instruction values using grep, narrowing down the places where these instructions were found in consecutive locations. This yielded the following code, found in the libld-l4.so library:

    2f5c:       8f998250        lw      t9,-32176(gp)
    2f60:       24a55fa8        addiu   a1,a1,24488
    2f64:       0320f809        jalr    t9
    2f68:       24844ee4        addiu   a0,a0,20196
    2f6c:       8fbc0010        lw      gp,16(sp)

The integer operands for the addiu instructions are the same, of course, just being shown as decimal rather than hexadecimal values. Now, we previously saw that the return address (ra) register had the value 0x802f6c. When a MIPS processor executes a jump instruction, it will also fetch the following instruction and execute it, this being a consequence of the way the processor architecture is designed.

So, the instruction after the jump, residing in what is known as the “branch delay slot” is not the instruction that will be visited upon returning from the called routine. Instead, it is the instruction after that. Here, we see that the return address from the jump at location 0x2f64 would be two locations later at 0x2f6c. This provides a kind of realisation that the program object – the libld-l4.so library – is positioned in memory at 0×800000: 0x2f6c added to 0×800000 gives the value of ra, 0x802f6c.

And this means that the location of the problematic instruction – the cause of our exception – is the first location within this object. Anyone with any experience of this kind of software will have realised by now that this doesn’t sound like a healthy situation: the first location within a library is not actually going to be code because these kinds of objects are packaged up in a way that permits their manipulation by other programs.

So what is the first location of a library used for? Since such objects employ the Executable and Linkable Format (ELF), we can take a look at some documentation. And we see that the first location is used to identify the kind of object, employing a “magic number” for the purpose. And that magic number would be…

464c457f

In the little-endian arrangement employed by this processor, the stored bytes are as follows:

7f
45 ('E')
4c ('L')
46 ('F')

The value was not a floating point instruction at all, but the magic number at the start of the library object! It was something of a coincidence that such a value would be interpreted as a floating point instruction, an accidentally convenient way of signalling something going badly wrong.

Missing Entries

The investigation now started to focus on how the code trying to jump to the start of the library had managed to get this incorrect address and what it was trying to do by jumping to it. I started to wonder if the global pointer (gp), whose job it is to reference the list of locations of program routines and other global data, might have been miscalculated such that attempts to load the addresses of routines would then be failing with data being fetched from the wrong places.

But looking around at code fragments where the gp register was being calculated, they seemed to look set to calculate the correct values based on assumptions about other registers. For example, from the object dump for libld-l4.so:

00002780 <_ftext>:
    2780:       3c1c0003        lui     gp,0x3
    2784:       279cb5b0        addiu   gp,gp,-19024
    2788:       0399e021        addu    gp,gp,t9

Assuming that the processor has t9 set to 0×2780 and then jumps to the value of t9, as is the convention, the following calculation is then performed:

gp = 0x30000 (since lui loads the "upper" half-word)
gp = gp - 19024 = 0x30000 - 19024 = 0x2b5b0
gp = gp + t9 = 0x2b5b0 + 0x2780 = 0x2dd30

Using the nm tool, which tells us about symbols in program objects, it was possible to check this value:

mipsel-linux-gnu-nm -n mybuild/lib/mips_32/l4f/libld-l4.so

This shows the following at the end of the output:

0002dd30 d _gp

Also appearing somewhat earlier in the output is this, telling us where the table of symbols starts (as well as the next thing in the file):

00025d40 a _GLOBAL_OFFSET_TABLE_
00025f90 g __dso_handle

Some digging around in the L4Re source code gave a kind of confirmation that the difference between _gp and _GLOBAL_OFFSET_TABLE_ was to be expected. Here is what I found in the pkg/l4re-core/uclibc/lib/contrib/uclibc/ldso/ldso/mips/elfinterp.c file:

#define OFFSET_GP_GOT 0x7ff0

If gp, when recalculated in other places, ended up getting the same value, there didn’t seem to be anything wrong with it. Some quick inspections of neighbouring calculations indicated that this wasn’t likely to be the problem. But what about the values used in conjunction with gp? Might they be having an effect? In the case of the erroneous jump, the following calculation is involved:

lw t9,-32176(gp) => load word into t9 from the location at gp - 32176
                 => ...               from 0x2dd30 - 32176
                 => ...               from 0x25f80

The calculated address, 0x25f80, is after the start of _GLOBAL_OFFSET_TABLE_ providing entries for program routines and other things, which is a good sign, but what is perhaps more troubling is how far after the start of the table such a value is. In the above output, another symbol (__dso_handle) indicates something that is located at the end of the table. Now, although its address is still greater than the one computed above, meaning that the computation does not cause us to stray off the end of the table, the computed address is suspiciously close to the end.

There was nothing else to do than to have a look at the table contents itself, and here it was rather useful to have a way of displaying a number of values on the screen. At this point, we have to note that the addresses in use in the running system are adjusted according to the start of the loaded object, so that the table is positioned at 0x25d40 in the object dump, but in the running system we would see 0×800000 + 0x25d40 and thus 0x825d40 instead.

What I saw was that the table contained entries that varied in the expected way right up until 0x825f60 (corresponding to 0x25f60 in the object dump) being only 0×30 (or 48 bytes, or 12 entries) before the end of the table, but then all remaining entries starting at 0x825f64 (corresponding to 0x25f64) yielded a value of 0×800000, apart from 0x825f90 (corresponding to 0x25f90, right at the end of the table) which yielded itself.

Since the calculated address above (0x25f80, adjusted to 0x825f80 in the running system) lies in this final region, we now know the origin of this annoying 0×800000: it comes from entries at the end of the table that do not seem to hold meaningful values. Indeed, the object dump for the library seemed to skip over this region of the table entirely, presumably because it was left uninitialised. And using the readelf tool with the –relocs option to show “relocations”, which applies to this table, it appeared that the last entries rather confirmed my observations:

00025d34  00000003 R_MIPS_REL32
00025f90  00000003 R_MIPS_REL32

Clearly, something is missing from this table. But since something has to adjust the contents of the table to add the “base address”, 0×800000, to the entries in order to provide valid addresses within the running program, what started to intrigue me was whether the code that performed this adjustment had any idea about these missing entries, and how this code might be related to the code causing the exception situation.

Routines and Responsibilities

While considering the nature of the code causing the exception, I had been using the objdump utility with the -d (disassemble) and -D (disassemble all) options. These provide details of program sections, code routines and the machine instructions themselves. But Jean pointed out that if I really wanted to find out which part of the source code was responsible for producing certain regions of the program, I might use a combination of options: -d, -l (line numbers) and -S (source code). This was almost a revelation!

However, the code responsible for the jump to the start of the library resisted such measures. A large region of code appeared to have no corresponding source, suggesting that it might be generated. Here is how it starts:

_ftext():
    2dac:       00000000        nop
    2db0:       3c1c0003        lui     gp,0x3
    2db4:       279caf80        addiu   gp,gp,-20608
    2db8:       0399e021        addu    gp,gp,t9
    2dbc:       8f84801c        lw      a0,-32740(gp)
    2dc0:       8f828018        lw      v0,-32744(gp)

There is no function defined in the source code with the name _ftext. However, _ftext is defined in the linker script (in pkg/l4re-core/ldscripts/ARCH-mips/main_rel.ld) as follows:

  .text           :
  {
    _ftext = . ;
    *(.text.unlikely .text.*_unlikely .text.unlikely.*)
    *(.text.exit .text.exit.*)
    *(.text.startup .text.startup.*)
    *(.text.hot .text.hot.*)
    *(.text .stub .text.* .gnu.linkonce.t.*)
    /* .gnu.warning sections are handled specially by elf32.em.  */
    *(.gnu.warning)
    *(.mips16.fn.*) *(.mips16.call.*)
  }

If you haven’t encountered linker scripts before, then you probably don’t want to spend too much time looking at this, linker scripts being frustratingly terse and arcane, but the essence of the above is that a bunch of code is stuffed into the .text section, with _ftext being assigned the address of the start of all this code. Now, _ftext in the linker script corresponds to a particular label in the object dump (which we saw earlier was positioned at 0×2780) whereas the _ftext function in the code occurs later (at 0x2dac, above). After the label but before the function is code whose source is found by objdump.

So I took the approach of removing things from the linker script, ultimately removing everything from the .text section apart from the assignment to _ftext. This removed the annotated regions of the code and left me with only the _ftext function. It really did appear that this was something the compiler might be responsible for. But where would I find the code responsible?

One hint that was present in the _ftext function code was the use of another identified function, __cxa_finalize. Searching the GCC sources for code that might use it led me to the libgcc sources and to code that invokes destructor functions upon program exit. This wasn’t really what I was looking for, but the file containing it (libgcc/crtstuff.c) would prove informative.

Back to the Table

Jean had indicated that there might be a difference in output between compilers, and that certain symbols might be produced by some but not by others. I investigated further by using the readelf tool with the -a option to show almost everything about the library file. Here, the focus was on the global offset table (GOT) and information about the entries. In particular, I wanted to know more about the entry providing the erroneous 0×800000 value, located at (gp – 32176). In my output I saw the following interesting thing:

 Global entries:
   Address     Access  Initial Sym.Val. Type    Ndx Name
  00025f80 -32176(gp) 00000000 00000000 FUNC    UND __register_frame_info

This seems to tell us what the program expects to find at the location in question, and it indicates that the named symbol is presumably undefined. There were some other undefined symbols, too:

_ITM_deregisterTMCloneTable
_ITM_registerTMCloneTable
__deregister_frame_info

Meanwhile, Jean was seeing symbols with other names:

__register_frame_info_base
__deregister_frame_info_base

During my perusal of the libgcc sources, I had noticed some of these symbols being tested to see if they were non-zero. For example:

  if (__register_frame_info)
    __register_frame_info (__EH_FRAME_BEGIN__, &object);

These fragments of code appear to be located in functions related to program initialisation. And it is also interesting to note that back in the library code, after the offending table entry has been accessed, there are tests against zero:

    2f34:       3c1c0003        lui     gp,0x3
    2f38:       279cadfc        addiu   gp,gp,-20996
    2f3c:       0399e021        addu    gp,gp,t9
    2f40:       27bdffe0        addiu   sp,sp,-32
    2f44:       8f828250        lw      v0,-32176(gp)
    2f48:       afbc0010        sw      gp,16(sp)
    2f4c:       afbf001c        sw      ra,28(sp)
    2f50:       10400007        beqz    v0,2f70 <_ftext+0x7f0>

Here, gp gets set up correctly, v0 is set to the value of the table entry, which we now believe refers to __register_frame_info, and the beqz instruction tests this value against zero, skipping ahead if it is zero. Does that not sound a bit like the code shown above? One might think that the libgcc code might handle an uninitialised table entry, and maybe it is intended to do so, but the table entry ends up getting adjusted to 0×800000, presumably as part of the library loading process.

I think that the most relevant function here for the adjustment of these entries is _dl_perform_mips_global_got_relocations which can be found in the pkg/l4re-core/uclibc/lib/contrib/uclibc/ldso/ldso/ldso.c file as part of the L4Re C library code. It may well have changed the entry from zero to this erroneous non-zero value, merely because the entry lies within the table and is assumed to be valid.

So, as a consequence, the libgcc code acts as if it has a genuine __register_frame_info function to call, and doing so causes the jump to the start of the library object and the exception. Maybe the code is supposed to be designed to handle missing symbols, those symbols potentially being deliberately omitted, but it doesn’t function correctly under these particular circumstances.

Symbol Restoration

However, despite identifying this unfortunate interaction between C library and libgcc, the matter of a remedy remained unaddressed. What was I to do about these missing symbols? Were they supposed to be there? Was there a way to tell libgcc not to expect them to be there at all?

In attempting to learn a bit more about the linking process, I had probably been through the different L4Re packages several times, but Jean then pointed me to a file I had seen before, perhaps before I had needed to think about these symbols at all. It contained “empty” definitions for some of the symbols but not for others. Maybe the workaround or even the solution was to just add more definitions corresponding to the symbols the program was expecting? Jean thought so.

So, I added a few things to the file (pkg/l4re-core/ldso/ldso/fixup.c):

void __deregister_frame_info(void);
void __register_frame_info(void);
void _ITM_deregisterTMCloneTable(void);
void _ITM_registerTMCloneTable(void);

void __deregister_frame_info(void) {}
void __register_frame_info(void) {}
void _ITM_deregisterTMCloneTable(void) {}
void _ITM_registerTMCloneTable(void) {}

I wasn’t confident that this would fix the problem. After all the investigation, adding a few lines of trivial code to one file seemed like too easy a way to fix what seemed like a serious problem. But I checked the object dump of the library, and suddenly things looked a bit more reasonable. Instead of referencing an uninitialised table entry, objdump was able to identify the jump target as __register_frame_info:

    2e14:       8f828040        lw      v0,-32704(gp)
    2e18:       afbc0010        sw      gp,16(sp)
    2e1c:       afbf001c        sw      ra,28(sp)
    2e20:       10400007        beqz    v0,2e40 <_ftext+0x7f0>
    2e24:       8f85801c        lw      a1,-32740(gp)
    2e28:       8f84803c        lw      a0,-32708(gp)
    2e2c:       8f998040        lw      t9,-32704(gp)
    2e30:       24a55fa8        addiu   a1,a1,24488
    2e34:       04111c39        bal     9f1c <__register_frame_info>

Of course, the code had been reorganised and so things were no longer in quite the same places, but in the above, (gp – 32704) is actually a reference to __register_frame_info, and although this value gets tested against zero as before, we can see that enough is now known about the previously-missing symbol that a branch directly to the location of the function has been incorporated, rather than a jump to the address stored in the table.

And sure enough, powering up the Letux 400 produced the framebuffer terminal showing the expected output:

Hi World! (shared)

It had been a long journey for such a modest reward, but thanks to Jean’s help and a bit of perseverance, I got there in the end.

2017 in Review

Thursday, December 7th, 2017

On Planet Debian there seems to be quite a few regularly-posted articles summarising the work done by various people in Free Software over the month that has most recently passed. I thought it might be useful, personally at least, to review the different things I have been doing over the past year. The difference between this article and many of those others is that the work I describe is not commissioned or generally requested by others, instead relying mainly on my own motivation for it to happen. The rate of progress can vary somewhat as a result.

Learning KiCad

Over the years, I have been playing around with Arduino boards, sensors, displays and things of a similar nature. Although I try to avoid buying more things to play with, sometimes I manage to acquire interesting items regardless, and these aren’t always ready to use with the hardware I have. Last December, I decided to buy a selection of electronics-related items for interfacing and experimentation. Some of these items have yet to be deployed, but others were bought with the firm intention of putting different “spare” pieces of hardware to use, or at least to make them usable in future.

One thing that sits in this category of spare, potentially-usable hardware is a display circuit board that was once part of a desk telephone, featuring a two-line, bitmapped character display, driven by the Hitachi HD44780 LCD controller. It turns out that this hardware is so common and mundane that the Arduino libraries already support it, but the problem for me was being able to interface it to the Arduino. The display board uses a cable with a connector that needs a special kind of socket, and so some research is needed to discover the kind of socket needed and how this might be mounted on something else to break the connections out for use with the Arduino.

Fortunately, someone else had done all this research quite some time ago. They had even designed a breakout board to hold such a socket, making it available via the OSH Park board fabricating service. So, to make good on my plan, I ordered the mandatory minimum of three boards, also ordering some connectors from Mouser. When all of these different things arrived, I soldered the socket to the board along with some headers, wired up a circuit, wrote a program to use the LiquidCrystal Arduino library, and to my surprise it more or less worked straight away.

Breakout board for the Molex 52030 connector

Breakout board for the Molex 52030 connector

Hitachi HD44780 LCD display boards driven by an Arduino

Hitachi HD44780 LCD display boards driven by an Arduino

This satisfying experience led me to consider other boards that I might design and get made. Previously, I had only made a board for the Arduino using Fritzing and the Fritzing Fab service, and I had held off looking at other board design solutions, but this experience now encouraged me to look again. After some evaluation of the gEDA tools, I decided that I might as well give KiCad a try, given that it seems to be popular in certain “open source hardware” circles. And after a fair amount of effort familiarising myself with it, with a degree of frustration finding out how to do certain things (and also finding up-to-date documentation), I managed to design my own rather simple board: a breakout board for the Acorn Electron cartridge connector.

Acorn Electron cartridge breakout board (in 3D-printed case section)

Acorn Electron cartridge breakout board (in 3D-printed case section)

In the back of my mind, I have vague plans to do other boards in future, but doing this kind of work can soak up a lot of time and be rather frustrating: you almost have to get into some modified mental state to work efficiently in KiCad. And it isn’t as if I don’t have other things to do. But at least I now know something about what this kind of work involves.

Retro and Embedded Hardware

With the above breakout board in hand, a series of experiments were conducted to see if I could interface various circuits to the Acorn Electron microcomputer. These mostly involved 7400-series logic chips (ICs, integrated circuits) and featured various logic gates and counters. Previously, I had re-purposed an existing ROM cartridge design to break out signals from the computer and make it access a single flash memory chip instead of two ROM chips.

With a dedicated prototyping solution, I was able to explore the implementation of that existing board, determine various aspects of the signal timings that remained rather unclear (despite being successfully handled by the existing board’s logic), and make it possible to consider a dedicated board for a flash memory cartridge. In fact, my brother, David, also wanting to get into board design, later adapted the prototyping cartridge to make such a board.

But this experimentation also encouraged me to tackle some other items in the electronics shipment: the PIC32 microcontrollers that I had acquired because they were MIPS-based chips, with somewhat more built-in RAM than the Atmel AVR-based chips used by the average Arduino, that could also be used on a breadboard. I hoped that my familiarity with the SoC (system-on-a-chip) in the Ben NanoNote – the Ingenic JZ4720 – might confer some benefits when writing low-level code for the PIC32.

PIC32 on breadboard with Arduino programming circuit

PIC32 on breadboard with Arduino programming circuit (and some LEDs for diagnostic purposes)

I do not need to reproduce an account of my activities here, given that I wrote about the effort involved in getting started with the PIC32 earlier in the year, and subsequently described an unusual application of such a microcontroller that seemed to complement my retrocomputing interests. I have since tried to make that particular piece of work more robust, but deducing the actual behaviour of the hardware has been frustrating, the documentation can be vague when it needs to be accurate, and much of the community discussion is focused on proprietary products and specific software tools rather than techniques. Maybe this will finally push me towards investigating programmable logic solutions in the future.

Compiling a Python-like Language

As things actually happened, the above hardware activities were actually distractions from something I have been working on for a long time. But at this point in the article, this can be a diversion from all the things that seem to involve hardware or low-level software development. Many years ago, I started writing software in Python. Over the years since, alternative implementations of the Python language (the main implementation being CPython) have emerged and seen some use, some continuing to be developed to this day. But around fifteen years ago, it became a bit more common for people to consider whether Python could be compiled to something that runs more efficiently (and more quickly).

I followed some of these projects enthusiastically for a while. Starkiller promised compilation to C++ but never delivered any code for public consumption, although the associated academic thesis might have prompted the development of Shed Skin which does compile a particular style of Python program to C++ and is available as Free Software. Meanwhile, PyPy elevated to prominence the notion of writing a language and runtime library implementation in the language itself, previously seen with language technologies like Slang, used to implement Squeak/Smalltalk.

Although other projects have also emerged and evolved to attempt the compilation of Python to lower-level languages (Pyrex, Cython, Nuitka, and so on), my interests have largely focused on the analysis of programs so that we may learn about their structure and behaviour before we attempt to run them, this alongside any benefits that might be had in compiling them to something potentially faster to execute. But my interests have also broadened to consider the evolution of the Python language since the point fifteen years ago when I first started to think about the analysis and compilation of Python. The near-mythical Python 3000 became a real thing in the form of the Python 3 development branch, introducing incompatibilities with Python 2 and fragmenting the community writing software in Python.

With the risk of perfectly usable software becoming neglected, its use actively (and destructively) discouraged, it becomes relevant to consider how one might take control of one’s software tools for long-term stability, where tools might be good for decades of use instead of constantly changing their behaviour and obliging their users to constantly change their software. I expressed some of my thoughts about this earlier in the year having finally reached a point where I might be able to reflect on the matter.

So, the result of a great deal of work, informed by experiences and conversations over the years related to previous projects of my own and those of others, is a language and toolchain called Lichen. This language resembles Python in many ways but does not try to be a Python implementation. The toolchain compiles programs to C which can then be compiled and executed like “normal” binaries. Programs can be trivially cross-compiled by any available C cross-compilers, too, which is something that always seems to be a struggle elsewhere in the software world. Unlike other Python compilers or implementations, it does not use CPython’s libraries, nor does it generate in “longhand” the work done by the CPython virtual machine.

One might wonder why anyone should bother developing such a toolchain given its incompatibility with Python and a potential lack of any other compelling reason for people to switch. Given that I had to accept some necessary reductions in the original scope of the project and to limit my level of ambition just to feel remotely capable of making something work, one does need to ask whether the result is too compromised to be attractive to others. At one point, programs manipulating integers were slower when compiled than when they were run by CPython, and this was incredibly disheartening to see, but upon further investigation I noticed that CPython effectively special-cases integer operations. The design of my implementation permitted me to represent integers as tagged references – a classic trick of various language implementations – and this overturned the disadvantage.

For me, just having the possibility of exploring alternative design decisions is interesting. Python’s design is largely done by consensus, with pronouncements made to settle disagreements and to move the process forward. Although this may have served the language well, depending on one’s perspective, it has also meant that certain paths of exploration have not been followed. Certain things have been improved gradually but not radically due to backwards compatibility considerations, this despite the break in compatibility between the Python 2 and 3 branches where an opportunity was undoubtedly lost to do greater things. Lichen is an attempt to explore those other paths without having to constantly justify it to a group of people who may regard such exploration as hostile to their own interests.

Lichen is not really complete: it needs floating point number and other useful types; its library is minimal; it could be made more robust; it could be made more powerful. But I find myself surprised that it works at all. Maybe I should have more confidence in myself, especially given all the preparation I did in trying to understand the good and bad aspects of my previous efforts before getting started on this one.

Developing for MIPS-based Platforms

A couple of years ago I found myself wondering if I couldn’t write some low-level software for the Ben NanoNote. One source of inspiration for doing this was “The CI20 bare-metal project“: a series of blog articles discussing the challenges of booting the MIPS Creator CI20 single-board computer. The Ben and the CI20 use CPUs (or SoCs) from the same family: the Ingenic JZ4720 and JZ4780 respectively.

For the Ben, I looked at the different boot payloads, principally those written to support booting from a USB host, but also the version of U-Boot deployed on the Ben. I combined elements of these things with the framebuffer driver code from the Linux kernel supporting the Ben, and to my surprise I was able to get the device to boot up and show a pattern on the screen. Progress has not always been steady, though.

For a while, I struggled to make the CPU leave its initial exception state without hanging, and with the screen as my only debugging tool, it was hard to see what might have been going wrong. Some careful study of the code revealed the problem: the code I was using to write to the framebuffer was using the wrong address region, meaning that as soon as an attempt was made to update the contents of the screen, the CPU would detect a bad memory access and an exception would occur. Such exceptions will not be delivered in the initial exception state, but with that state cleared, the CPU will happily trigger a new exception when the program accesses memory it shouldn’t be touching.

Debugging low-level code on the Ben NanoNote (the hard way)

Debugging low-level code on the Ben NanoNote (the hard way)

I have since plodded along introducing user mode functionality, some page table initialisation, trying to read keypresses, eventually succeeding after retracing my steps and discovering my errors along the way. Maybe this will become a genuinely useful piece of software one day.

But one useful purpose this exercise has served is that of familiarising myself with the way these SoCs are organised, the facilities they provide, how these may be accessed, and so on. My brother has the Letux 400 notebook containing yet another SoC in the same family, the JZ4730, which seems to be almost entirely undocumented. This notebook has proven useful under certain circumstances. For instance, it has been used as a kind of appliance for document scanning, driving a multifunction scanner/printer over USB using the enduring SANE project’s software.

However, the Letux 400 is already an old machine, with products based on this hardware platform being almost ten years old, and when originally shipped it used a 2.4 series Linux kernel instead of a more recent 2.6 series kernel. Like many products whose software is shipped as “finished”, this makes the adoption of newer software very difficult, especially if the kernel code is not “upstreamed” or incorporated into the official Linux releases.

As software distributions such as Debian evolve, they depend on newer kernel features, but if a device is stuck on an older kernel (because the special functionality that makes it work on that device is specific to that kernel) then the device, unable to run the newer kernels, gradually becomes unable to run newer versions of the distribution as well. Thus, Debian Etch was the newest distribution version that would work on the 2.4 kernel used by the Letux 400 as shipped.

Fortunately, work had been done to make a 2.6 series kernel work on the Letux 400, and this made Debian Lenny functional. But time passes and even this is now considered ancient. Although David was running some software successfully, there was other software that really needed a newer distribution to be able to run, and this meant considering what it might take to support Debian Squeeze on the hardware. So he set to work adding patches to the 2.6.24 kernel to try and take it within the realm of Squeeze support, making it beyond the bare minimum of 2.6.29 and into the “release candidate” territory of 2.6.30. And this was indeed enough to run Squeeze on the notebook, at least supporting the devices needed to make the exercise worthwhile.

Now, at a much earlier stage in my own experiments with the Ben NanoNote, I had tried without success to reproduce my results on the Letux 400. And I had also made a rather tentative effort at modifying Ben NanoNote kernel drivers to potentially work with the Letux 400 from some 3.x kernel version. David’s success in updating the kernel version led me to look again at the tasks of familiarising myself with kernel drivers, machine details and of supporting the Letux 400 in even newer kernels.

The outcome of this is uncertain at present. Most of the work on updating the drivers and board support has been done, but actual testing of my work still needs to be done, something that I cannot really do myself. That might seem strange: why start something I cannot finish by myself? But how I got started in this effort is also rather related to the topic of the next section.

The MIPS Creator CI20 and L4/Fiasco.OC

Low-level programming on the Ben NanoNote is frustrating unless you modify the device and solder the UART connections to the exposed pads in the battery compartment, thereby enabling a serial connection and allowing debugging information to be sent to a remote display for perusal. My soldering skills are not that great, and I don’t want to damage my device. So debugging was a frustrating exercise. Since I felt that I needed a bit more experience with the MIPS architecture and the Ingenic SoCs, it occurred to me that getting a CI20 might be the way to go.

I am not really a supporter of Imagination Technologies, producer of the CI20, due to the company’s rather hostile attitude towards Free Software around their PowerVR technologies, meaning that of the different graphics acceleration chipsets, PowerVR has been increasingly isolated as a technology that is consistently unsupportable by Free Software drivers. However, the CI20 is well-documented and has been properly supported with Free Software, apart from the PowerVR parts of the hardware, of course. Ingenic were seemingly persuaded to make the programming manual for the JZ4780 used by the CI20 publicly available, unlike the manuals for other SoCs in that family. And the PowerVR hardware is not actually needed to be able to use the CI20.

The MIPS Creator CI20 single-board computer

The MIPS Creator CI20 single-board computer

I had hoped that the EOMA68 campaign would have offered a JZ4775 computer card, and that the campaign might have delivered such a card by now, but with both of these things not having happened I took the plunge and bought a CI20. There were a few other reasons for doing so: I wanted to see how a single-board computer with a decent amount of RAM (1GB) might perform as a working desktop machine; having another computer to offload certain development and testing tasks, rather than run virtual machines, would be useful; I also wanted to experiment with and even attempt to port other operating systems, loosening my dependence on the Linux monoculture.

One of these other operating systems involves two components: the Fiasco.OC microkernel and the L4 Runtime Environment (L4Re). Over the years, microkernels in the L4 family have seen widespread use, and at one point people considered porting GNU Hurd to one of the L4 family microkernels from the Mach microkernel it then used (and still uses). It seems to me like something worth looking at more closely, and fortunately it also seemed that this software combination had been ported to the CI20. However, it turned out that my expectations of building an image, testing the result, and then moving on to developing interesting software were a little premature.

The first real problem was that GCC produced position-independent code that was not called correctly. This meant that upon trying to get the addresses of functions, the program would end up loading garbage addresses and trying to call any code that might be there at those addresses. So some fixes were required. Then, it appeared that the JZ4780 doesn’t support a particular MIPS instruction, meaning that the CPU would encounter this instruction and cause an exception. So, with some guidance, I wrote a handler to decode the instruction and generate the rather trivial result that the instruction should produce. There were also some more generic problems with the microkernel code that had previously been patched but which had not appeared in the upstream repository. But in the end, I got the “hello” program to run.

With a working foundation I tried to explore the hardware just as I had done with the Ben NanoNote, attempting to understand things like the clock and power management hardware, general purpose input/output (GPIO) peripherals, and also the Inter-Integrated Circuit (I2C) peripherals. Some assistance was available in the form of Linux kernel driver code, although the style of code can vary significantly, and it also takes time to “decode” various mechanisms in the Linux code and to unpick the useful bits related to the hardware. I had hoped to get further, but in trying to use the I2C peripherals to talk to my monitor using the DDC protocol, I found that the data being returned was not entirely reliable. This was arguably a distraction from the more interesting task of enabling the display, given that I know what resolutions my monitor supports.

However, all this hardware-related research and detective work at least gave me an insight into mechanisms – software and hardware – that would inform the effort to “decode” the vendor-written code for the Letux 400, making certain things seem a lot more familiar and increasing my confidence that I might be understanding the things I was seeing. For example, the JZ4720 in the Ben NanoNote arranges its hardware registers for GPIO configuration and access in a particular way, but the code written by the vendor for the JZ4730 in the Letux 400 accesses GPIO registers in a different way.

Initially, I might have thought that I was missing some important detail: are the two products really so different, and if not, then why is the code so different? But then, looking at the JZ4780, I encountered another scheme for GPIO register organisation that is different again, but which does have similarities to the JZ4730. With the JZ4780 being publicly documented, the code for the Letux 400 no longer seemed quite so bizarre or unfathomable. With more experience, it is possible to have a little more confidence in one’s understanding of the mechanisms at work.

I would like to spend a bit more time looking at microkernels and alternatives to Linux. While many people presumably think that Linux is running on everything and has “won”, it is increasingly likely that the Linux one sees on devices does not completely control the hardware and is, in fact, virtualised or confined by software systems like L4/Fiasco.OC. I also have reservations about the way Linux is developed and how well it is able to handle the demands of its proliferation onto every kind of device, many of them hooked up to the Internet and being left to fend for themselves.

Developing imip-agent

Alongside Lichen, a project that has been under development for the last couple of years has been imip-agent, allowing calendar-based scheduling activities to be integrated with mail transport agents. I haven’t been able to spend quite as much time on imip-agent this year as I might have liked, although I will also admit that I haven’t always been motivated to spend much time on it, either. Still, there have been brief periods of activity tidying up, fixing, or improving the code. And some interest in packaging the software led me to reconsider some of the techniques used to deploy the software, in particular the way scheduling extensions are discovered, and the way the system configuration is processed (since Debian does not want “executable scripts” in places like /etc, even if those scripts just contain some simple configuration setting definitions).

It is perhaps fairly typical that a project that tries to assess the feasibility of a concept accumulates the necessary functionality in order to demonstrate that it could do a particular task. After such an initial demonstration, the effort of making the code easier to work with, more reliable, more extensible, must occur if further progress is to be made. One intervention that kept imip-agent viable as a project was the introduction of a test suite to ensure that the basic functionality did indeed work. There were other architectural details that I felt needed remedying or improving for the code to remain manageable.

Recently, I have been refining the parts of the code that support editing of calendar objects and the exchange of updates caused by changes to calendar events. Such work is intended to make the Web client easier to understand and to expose such functionality to proper testing. One side-effect of this may be the introduction of a text-based client for people using e-mail programs like Mutt, as well as a potentially usable library for other mail clients. Such tidying up and fixing does not show off fancy new features or argue the case for developing such software in the first place, but I suppose it makes me feel better about the software I have written.

Whither Moin?

There are probably plenty of other little projects of my own that I have started or at least contemplated this year. And there are also projects that are not mine but which I use and which have had contributions from me over the years. One of these is the MoinMoin wiki software that powers a number of Free Software and other Web sites where collaborative editing is made available to the communities involved. I use MoinMoin – or Moin for short – to publish content on the Web myself, and I have encouraged others to use it in the past. However, it worries me now that the level of maintenance it is receiving has fallen to a level where updates for faults in the software are not likely to be forthcoming and where it is no longer clear where such updates should be coming from.

Earlier in the year, having previously read queries about the static export output from Moin, which can be rather basic and not necessarily resemble the appearance of the wiki such output has come from, I spent some time considering my own use of Moin for documentation publishing. For some of my projects, I don’t take advantage of the “through the Web” editing of the solution when publishing the public documentation. Instead, I use Moin locally, store the pages in a separate repository, and then make page packages that get installed on a public instance of Moin. This means that I do not have to worry about Web-based authentication and can just have a wiki as a read-only resource.

Obviously, the parts of Moin that I really need here are just the things that parse the wiki formatting (which I regard as more usable than other document markup formats in various respects) and that format the content as HTML. If I could format it as static content with some pages, some stylesheets, some images, with some Web server magic to make the URLs look nice, then that would probably be sufficient. For some things like the automatic generation of SVG from Graphviz-format files, I would also need to have the relevant parsers available, too. Having a complete Web framework, which is what Moin really is, is rather unnecessary with these diminished requirements.

But I do use Moin as a full wiki solution as well, and so it made me wonder whether I shouldn’t try and bring it up to date. Of course, there is already the MoinMoin 2.0 effort that was intended to modernise and tidy up the software, but since this effort made a clean break from Moin 1.x, it was never an attractive choice for those people already using Moin in anything more than a basic sense. Since there wasn’t an established API for extensions, it was not readily usable for many existing sites that rely on such extensions. In a way, Moin 2 has suffered from something that Python 3 only avoided by having a lot more people working on it, including people being paid to work on it, together with a policy of openly shaming those people who had made Python 2 viable – by developing software for it – into spending time migrating their code to Python 3.

I don’t have an obvious plan of action here. Moin perhaps illustrates the fundamental problem facing many Free Software projects, this being a theme that I have discussed regularly this year: how they may remain viable by having people able to dedicate their time to writing and maintaining Free Software without this work being squeezed in around the edges of people’s “actual work” and thus burdening them with yet another obligation in their lives, particularly one that is not rewarded by a proper appreciation of the sacrifice being made.

Plenty of individuals and organisations benefit from Moin, but we live in an age of “comparison shopping” where people will gladly drop one thing if someone offers them something newer and shinier. This is, after all, how everyone ends up using “free” services where the actual costs are hidden. To their credit, when Moin needed to improve its password management, the Python Software Foundation stepped up and funded this work rather than dropping Moin, which is what I had expected given certain Python community attitudes. Maybe other, more well-known organisations that use Moin also support its development, but I don’t really see much evidence of it.

Maybe they should consider doing so. The notion that something else will always come along, developed by some enthusiastic developer “scratching their itch”, is misguided and exploitative. And a failure to sustain Free Software development can only undermine Free Software as a resource, as an activity or a cause, and as the basis of many of those organisations’ continued existence. Many of us like developing Free Software, as I hope this article has shown, but motivation alone does not keep that software coming forever.

VGA Signal Generation with the PIC32

Monday, May 22nd, 2017

It all started after I had designed – and received from fabrication – a circuit board for prototyping cartridges for the Acorn Electron microcomputer. Although some prototyping had already taken place with an existing cartridge, with pins intended for ROM components being routed to drive other things, this board effectively “breaks out” all connections available to a cartridge that has been inserted into the computer’s Plus 1 expansion unit.

Acorn Electron cartridge breakout board

The Acorn Electron cartridge breakout board being used to drive an external circuit

One thing led to another, and soon my brother, David, was interfacing a microcontroller to the Electron in order to act as a peripheral being driven directly by the system’s bus signals. His approach involved having a program that would run and continuously scan the signals for read and write conditions and then interpret the address signals, sending and receiving data on the bus when appropriate.

Having acquired some PIC32 devices out of curiosity, with the idea of potentially interfacing them with the Electron, I finally took the trouble of looking at the datasheet to see whether some of the hard work done by David’s program might be handled by the peripheral hardware in the PIC32. The presence of something called “Parallel Master Port” was particularly interesting.

Operating this function in the somewhat insensitively-named “slave” mode, the device would be able to act like a memory device, with the signalling required by read and write operations mostly being dealt with by the hardware. Software running on the PIC32 would be able to read and write data through this port and be able to get notifications about new data while getting on with doing other things.

So began my journey into PIC32 experimentation, but this article isn’t about any of that, mostly because I put that particular investigation to one side after a degree of experience gave me perhaps a bit too much confidence, and I ended up being distracted by something far more glamorous: generating a video signal using the PIC32!

The Precedents’ Hall of Fame

There are plenty of people who have written up their experiments generating VGA and other video signals with microcontrollers. Here are some interesting examples:

And there are presumably many more pages on the Web with details of people sending pixel data over a cable to a display of some sort, often trying to squeeze every last cycle out of their microcontroller’s instruction processing unit. But, given an awareness of how microcontrollers should be able to take the burden off the programs running on them, employing peripheral hardware to do the grunt work of switching pins on and off at certain frequencies, maybe it would be useful to find examples of projects where such advantages of microcontrollers had been brought to bear on the problem.

In fact, I was already aware of the Maximite “single chip computer” partly through having seen the cloned version of the original being sold by Olimex – something rather resented by the developer of the Maximite for reasons largely rooted in an unfortunate misunderstanding of Free Software licensing on his part – and I was aware that this computer could generate a VGA signal. Indeed, the method used to achieve this had apparently been written up in a textbook for the PIC32 platform, albeit generating a composite video signal using one of the on-chip SPI peripherals. The Colour Maximite uses three SPI channels to generate one red, one green, and one blue channel of colour information, thus supporting eight-colour graphical output.

But I had been made aware of the Parallel Master Port (PMP) and its “master” mode, used to drive LCD panels with eight bits of colour information per pixel (or, using devices with many more pins than those I had acquired, with sixteen bits of colour information per pixel). Would it surely not be possible to generate 256-colour graphical output at the very least?

Information from people trying to use PMP for this purpose was thin on the ground. Indeed, reading again one article that mentioned an abandoned attempt to get PMP working in this way, using the peripheral to emit pixel data for display on a screen instead of a panel, I now see that it actually mentions an essential component of the solution that I finally arrived at. But the author had unfortunately moved away from that successful component in an attempt to get the data to the display at a rate regarded as satisfactory.

Direct Savings

It is one thing to have the means to output data to be sent over a cable to a display. It is another to actually send the data efficiently from the microcontroller. Having contemplated such issues in the past, it was not a surprise that the Maximite and other video-generating solutions use direct memory access (DMA) to get the hardware, as opposed to programs, to read through memory and to write its contents to a destination, which in most cases seemed to be the memory address holding output data to be emitted via a data pin using the SPI mechanism.

I had also envisaged using DMA and was still fixated on using PMP to emit the different data bits to the output circuit producing the analogue signals for the display. Indeed, Microchip promotes the PMP and DMA combination as a way of doing “low-cost controllerless graphics solutions” involving LCD panels, so I felt that there surely couldn’t be much difference between that and getting an image on my monitor via a few resistors on the breadboard.

And so, a tour of different PIC32 features began, trying to understand the DMA documentation, the PMP documentation, all the while trying to get a grasp of what the VGA signal actually looks like, the timing constraints of the various synchronisation pulses, and battle various aspects of the MIPS architecture and the PIC32 implementation of it, constantly refining my own perceptions and understanding and learning perhaps too often that there may have been things I didn’t know quite enough about before trying them out!

Using VGA to Build a Picture

Before we really start to look at a VGA signal, let us first look at how a picture is generated by the signal on a screen:

VGA Picture Structure

The structure of a display image or picture produced by a VGA signal

The most important detail at this point is the central area of the diagram, filled with horizontal lines representing the colour information that builds up a picture on the display, with the actual limits of the screen being represented here by the bold rectangle outline. But it is also important to recognise that even though there are a number of visible “display lines” within which the colour information appears, the entire “frame” sent to the display actually contains yet more lines, even though they will not be used to produce an image.

Above and below – really before and after – the visible display lines are the vertical back and front porches whose lines are blank because they do not appear on the screen or are used to provide a border at the top and bottom of the screen. Such extra lines contribute to the total frame period and to the total number of lines dividing up the total frame period.

Figuring out how many lines a display will have seems to involve messing around with something called the “generalised timing formula”, and if you have an X server like Xorg installed on your system, you may even have a tool called “gtf” that will attempt to calculate numbers of lines and pixels based on desired screen resolutions and frame rates. Alternatively, you can look up some common sets of figures on sites providing such information.

What a VGA Signal Looks Like

Some sources show diagrams attempting to describe the VGA signal, but many of these diagrams are open to interpretation (in some cases, very much so). They perhaps show the signal for horizontal (display) lines, then other signals for the entire image, but they either do not attempt to combine them, or they instead combine these details ambiguously.

For instance, should the horizontal sync (synchronisation) pulse be produced when the vertical sync pulse is active or during the “blanking” period when no pixel information is being transmitted? This could be deduced from some diagrams but only if you share their authors’ unstated assumptions and do not consider other assertions about the signal structure. Other diagrams do explicitly show the horizontal sync active during vertical sync pulses, but this contradicts statements elsewhere such as “during the vertical sync period the horizontal sync must also be held low”, for instance.

After a lot of experimentation, I found that the following signal structure was compatible with the monitor I use with my computer:

VGA Signal Structure

The basic structure of a VGA signal, or at least a signal that my monitor can recognise

There are three principal components to the signal:

  • Colour information for the pixel or line data forms the image on the display and it is transferred within display lines during what I call the visible display period in every frame
  • The horizontal sync pulse tells the display when each horizontal display line ends, or at least the frequency of the lines being sent
  • The vertical sync pulse tells the display when each frame (or picture) ends, or at least the refresh rate of the picture

The voltage levels appear to be as follows:

  • Colour information should be at 0.7V (although some people seem to think that 1V is acceptable as “the specified peak voltage for a VGA signal”)
  • Sync pulses are supposed to be at “TTL” levels, which apparently can be from 0V to 0.5V for the low state and from 2.7V to 5V for the high state

Meanwhile, the polarity of the sync pulses is also worth noting. In the above diagram, they have negative polarity, meaning that an active pulse is at the low logic level. Some people claim that “modern VGA monitors don’t care about sync polarity”, but since it isn’t clear to me what determines the polarity, and since most descriptions and demonstrations of VGA signal generation seem to use negative polarity, I chose to go with the flow. As far as I can tell, the gtf tool always outputs the same polarity details, whereas certain resources provide signal characteristics with differing polarities.

It is possible, and arguably advisable, to start out trying to generate sync pulses and just grounding the colour outputs until your monitor (or other VGA-capable display) can be persuaded that it is receiving a picture at a certain refresh rate and resolution. Such confirmation can be obtained on a modern display by seeing a blank picture without any “no signal” or “input not supported” messages and by being able to activate the on-screen menu built into the device, in which an option is likely to exist to show the picture details.

How the sync and colour signals are actually produced will be explained later on. This section was merely intended to provide some background and gather some fairly useful details into one place.

Counting Lines and Generating Vertical Sync Pulses

The horizontal and vertical sync pulses are each driven at their own frequency. However, given that there are a fixed number of lines in every frame, it becomes apparent that the frequency of vertical sync pulse occurrences is related to the frequency of horizontal sync pulses, the latter occurring once per line, of course.

With, say, 622 lines forming a frame, the vertical sync will occur once for every 622 horizontal sync pulses, or at a rate that is 1/622 of the horizontal sync frequency or “line rate”. So, if we can find a way of generating the line rate, we can not only generate horizontal sync pulses, but we can also count cycles at this frequency, and every 622 cycles we can produce a vertical sync pulse.

But how do we calculate the line rate in the first place? First, we decide what our refresh rate should be. The “classic” rate for VGA output is 60Hz. Then, we decide how many lines there are in the display including those extra non-visible lines. We multiply the refresh rate by the number of lines to get the line rate:

60Hz * 622 = 37320Hz = 37.320kHz

On a microcontroller, the obvious way to obtain periodic events is to use a timer. Given a particular frequency at which the timer is updated, a quick calculation can be performed to discover how many times a timer needs to be incremented before we need to generate an event. So, let us say that we have a clock frequency of 24MHz, and a line rate of 37.320kHz, we calculate the number of timer increments required to produce the latter from the former:

24MHz / 37.320kHz = 24000000Hz / 37320Hz = 643

So, if we set up a timer that counts up to 642 and then upon incrementing again to 643 actually starts again at zero, with the timer sending a signal when this “wraparound” occurs, we can have a mechanism providing a suitable frequency and then make things happen at that frequency. And this includes counting cycles at this particular frequency, meaning that we can increment our own counter by 1 to keep track of display lines. Every 622 display lines, we can initiate a vertical sync pulse.

One aspect of vertical sync pulses that has not yet been mentioned is their duration. Various sources suggest that they should last for only two display lines, although the “gtf” tool specifies three lines instead. Our line-counting logic therefore needs to know that it should enable the vertical sync pulse by bringing it low at a particular starting line and then disable it by bringing it high again after two whole lines.

Generating Horizontal Sync Pulses

Horizontal sync pulses take place within each display line, have a specific duration, and they must start at the same time relative to the start of each line. Some video output demonstrations seem to use lots of precisely-timed instructions to achieve such things, but we want to use the peripherals of the microcontroller as much as possible to avoid wasting CPU time. Having considered various tricks involving specially formulated data that might be transferred from memory to act as a pulse, I was looking for examples of DMA usage when I found a mention of something called the Output Compare unit on the PIC32.

What the Output Compare (OC) units do is to take a timer as input and produce an output signal dependent on the current value of the timer relative to certain parameters. In clearer terms, you can indicate a timer value at which the OC unit will cause the output to go high, and you can indicate another timer value at which the OC unit will cause the output to go low. It doesn’t take much imagination to realise that this sounds almost perfect for generating the horizontal sync pulse:

  1. We take the timer previously set up which counts up to 643 and thus divides the display line period into units of 1/643.
  2. We identify where the pulse should be brought low and present that as the parameter for taking the output low.
  3. We identify where the pulse should be brought high and present that as the parameter for taking the output high.

Upon combining the timer and the OC unit, then configuring the output pin appropriately, we end up with a low pulse occurring at the line rate, but at a suitable offset from the start of each line.

VGA Display Line Structure

The structure of each visible display line in the VGA signal

In fact, the OC unit also proves useful in actually generating the vertical sync pulses, too. Although we have a timer that can tell us when it has wrapped around, we really need a mechanism to act upon this signal promptly, at least if we are to generate a clean signal. Unfortunately, handling an interrupt will introduce a delay between the timer wrapping around and the CPU being able to do something about it, and it is not inconceivable that this delay may vary depending on what the CPU has been doing.

So, what seems to be a reasonable solution to this problem is to count the lines and upon seeing that the vertical sync pulse should be initiated at the start of the next line, we can enable another OC unit configured to act as soon as the timer value is zero. Thus, upon wraparound, the OC unit will spring into action and bring the vertical sync output low immediately. Similarly, upon realising that the next line will see the sync pulse brought high again, we can reconfigure the OC unit to do so as soon as the timer value again wraps around to zero.

Inserting the Colour Information

At this point, we can test the basic structure of the signal and see if our monitor likes it. But none of this is very interesting without being able to generate a picture, and so we need a way of getting pixel information from the microcontroller’s memory to its outputs. We previously concluded that Direct Memory Access (DMA) was the way to go in reading the pixel data from what is usually known as a framebuffer, sending it to another place for output.

As previously noted, I thought that the Parallel Master Port (PMP) might be the right peripheral to use. It provides an output register, confusingly called the PMDIN (parallel master data in) register, that lives at a particular address and whose value is exposed on output pins. On the PIC32MX270, only the least significant eight bits of this register are employed in emitting data to the outside world, and so a DMA destination having a one-byte size, located at the address of PMDIN, is chosen.

The source data is the framebuffer, of course. For various retrocomputing reasons hinted at above, I had decided to generate a picture 160 pixels in width, 256 lines in height, and with each byte providing eight bits of colour depth (specifying how many distinct colours are encoded for each pixel). This requires 40 kilobytes and can therefore reside in the 64 kilobytes of RAM provided by the PIC32MX270. It was at this point that I learned a few things about the DMA mechanisms of the PIC32 that didn’t seem completely clear from the documentation.

Now, the documentation talks of “transactions”, “cells” and “blocks”, but I don’t think it describes them as clearly as it could do. Each “transaction” is just a transfer of a four-byte word. Each “cell transfer” is a collection of transactions that the DMA mechanism performs in a kind of batch, proceeding with these as quickly as it can until it either finishes the batch or is told to stop the transfer. Each “block transfer” is a collection of cell transfers. But what really matters is that if you want to transfer a certain amount of data and not have to keep telling the DMA mechanism to keep going, you need to choose a cell size that defines this amount. (When describing this, it is hard not to use the term “block” rather than “cell”, and I do wonder why they assigned these terms in this way because it seems counter-intuitive.)

You can perhaps use the following template to phrase your intentions:

I want to transfer <cell size> bytes at a time from a total of <block size> bytes, reading data starting from <source address>, having <source size>, and writing data starting at <destination address>, having <destination size>.

The total number of bytes to be transferred – the block size – is calculated from the source and destination sizes, with the larger chosen to be the block size. If we choose a destination size less than the source size, the transfers will not go beyond the area of memory defined by the specified destination address and the destination size. What actually happens to the “destination pointer” is not immediately obvious from the documentation, but for our purposes, where we will use a destination size of one byte, the DMA mechanism will just keep writing source bytes to the same destination address over and over again. (One might imagine the pointer starting again at the initial start address, or perhaps stopping at the end address instead.)

So, for our purposes, we define a “cell” as 160 bytes, being the amount of data in a single display line, and we only transfer one cell in a block. Thus, the DMA source is 160 bytes long, and even though the destination size is only a single byte, the DMA mechanism will transfer each of the source bytes into the destination. There is a rather unhelpful diagram in the documentation that perhaps tries to communicate too much at once, leading one to believe that the cell size is a factor in how the destination gets populated by source data, but the purpose of the cell size seems only to be to define how much data is transferred at once when a transfer is requested.

DMA Transfer Mechanism

The transfer of framebuffer data to PORTB using DMA cell transfers (noting that this hints at the eventual approach which uses PORTB and not PMDIN)

In the matter of requesting a transfer, we have already described the mechanism that will allow us to make this happen: when the timer signals the start of a new line, we can use the wraparound event to initiate a DMA transfer. It would appear that the transfer will happen as fast as both the source and the destination will allow, at least as far as I can tell, and so it is probably unlikely that the data will be sent to the destination too quickly. Once the transfer of a line’s pixel data is complete, we can do some things to set up the transfer for the next line, like changing the source data address to point to the next 160 bytes representing the next display line.

(We could actually set the block size to the length of the entire framebuffer – by setting the source size – and have the DMA mechanism automatically transfer each line in turn, updating its own address for the current line. However, I intend to support hardware scrolling, where the address of the first line of the screen can be adjusted so that the display starts part way through the framebuffer, reaches the end of the framebuffer part way down the screen, and then starts again at the beginning of the framebuffer in order to finish displaying the data at the bottom of the screen. The DMA mechanism doesn’t seem to support the necessary address wraparound required to manage this all by itself.)

Output Complications

Having assumed that the PMP peripheral would be an appropriate choice, I soon discovered some problems with the generated output. Although the data that I had stored in the RAM seemed to be emitted as pixels in appropriate colours, there were gaps between the pixels on the screen. Yet the documentation seemed to vaguely indicate that the PMDIN register was more or less continuously updated. That meant that the actual output signals were being driven low between each pixel, causing black-level gaps and ruining the result.

I wondered if anything could be done about this issue. PMP is really intended as some kind of memory interface, and it isn’t unreasonable for it to only maintain valid data for certain periods of time, modifying control signals to define this valid data period. That PMP can be used to drive LCD panels is merely a result of those panels themselves upholding this kind of interface. For those of you familiar with microcontrollers, the solution to my problem was probably obvious several paragraphs ago, but it needed me to reconsider my assumptions and requirements before I realised what I should have been doing all along.

Unlike SPI, which concerns itself with the bit-by-bit serial output of data, PMP concerns itself with the multiple-bits-at-once parallel output of data, and all I wanted to do was to present multiple bits to a memory location and have them translated to a collection of separate signals. But, of course, this is exactly how normal I/O (input/output) pins are provided on microcontrollers! They all seem to provide “PORT” registers whose bits correspond to output pins, and if you write a value to those registers, all the pins can be changed simultaneously. (This feature is obscured by platforms like Arduino where functions are offered to manipulate only a single pin at once.)

And so, I changed the DMA destination to be the PORTB register, which on the PIC32MX270 is the only PORT register with enough bits corresponding to I/O pins to be useful enough for this application. Even then, PORTB does not have a complete mapping from bits to pins: some pins that are available in other devices have been dedicated to specific functions on the PIC32MX270F256B and cannot be used for I/O. So, it turns out that we can only employ at most seven bits of our pixel data in generating signal data:

PORTB Pin Availability on the PIC32MX270F256B
Pins
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
RPB15 RPB14 RPB13 RPB11 RPB10 RPB9 RPB8 RPB7 RPB5 RPB4 RPB3 RPB2 RPB1 RPB0

We could target the first byte of PORTB (bits 0 to 7) or the second byte (bits 8 to 15), but either way we will encounter an unmapped bit. So, instead of choosing a colour representation making use of eight bits, we have to make do with only seven.

Initially, not noticing that RPB6 was not available, I was using a “RRRGGGBB” or “332″ representation. But persuaded by others in a similar predicament, I decided to choose a representation where each colour channel gets two bits, and then a separate intensity bit is used to adjust the final intensity of the basic colour result. This also means that greyscale output is possible because it is possible to balance the channels.

The 2-bit-per-channel plus intensity colours

The colours employing two bits per channel plus one intensity bit, perhaps not shown completely accurately due to circuit inadequacies and the usual white balance issues when taking photographs

It is worth noting at this point that since we have left the 8-bit limitations of the PMP peripheral far behind us now, we could choose to populate two bytes of PORTB at once, aiming for sixteen bits per pixel but actually getting fourteen bits per pixel once the unmapped bits have been taken into account. However, this would double our framebuffer memory requirements for the same resolution, and we don’t have that much memory. There may be devices with more than sixteen bits mapped in the 32-bit PORTB register (or in one of the other PORT registers), but they had better have more memory to be useful for greater colour depths.

Back in Black

One other matter presented itself as a problem. It is all very well generating a colour signal for the pixels in the framebuffer, but what happens at the end of each DMA transfer once a line of pixels has been transmitted? For the portions of the display not providing any colour information, the channel signals should be held at zero, yet it is likely that the last pixel on any given line is not at the lowest possible (black) level. And so the DMA transfer will have left a stray value in PORTB that could then confuse the monitor, producing streaks of colour in the border areas of the display, making the monitor unsure about the black level in the signal, and also potentially confusing some monitors about the validity of the picture, too.

As with the horizontal sync pulses, we need a prompt way of switching off colour information as soon as the pixel data has been transferred. We cannot really use an Output Compare unit because that only affects the value of a single output pin, and although we could wire up some kind of blanking in our external circuit, it is simpler to look for a quick solution within the capabilities of the microcontroller. Fortunately, such a quick solution exists: we can “chain” another DMA channel to the one providing the pixel data, thereby having this new channel perform a transfer as soon as the pixel data has been sent in its entirety. This new channel has one simple purpose: to transfer a single byte of black pixel data. By doing this, the monitor will see black in any borders and beyond the visible regions of the display.

Wiring Up

Of course, the microcontroller still has to be connected to the monitor somehow. First of all, we need a way of accessing the pins of a VGA socket or cable. One reasonable approach is to obtain something that acts as a socket and that breaks out the different signals from a cable, connecting the microcontroller to these broken-out signals.

Wanting to get something for this task quickly and relatively conveniently, I found a product at a local retailer that provides a “male” VGA connector and screw-adjustable terminals to break out the different pins. But since the VGA cable also has a male connector, I also needed to get a “gender changer” for VGA that acts as a “female” connector in both directions, thus accommodating the VGA cable and the male breakout board connector.

Wiring up to the broken-out VGA connector pins is mostly a matter of following diagrams and the pin numbering scheme, illustrated well enough in various resources (albeit with colour signal transposition errors in some resources). Pins 1, 2 and 3 need some special consideration for the red, green and blue signals, and we will look at them in a moment. However, pins 13 and 14 are the horizontal and vertical sync pins, respectively, and these can be connected directly to the PIC32 output pins in this case, since the 3.3V output from the microcontroller is supposedly compatible with the “TTL” levels. Pins 5 through 10 can be connected to ground.

We have seen mentions of colour signals with magnitudes of up to 0.7V, but no explicit mention of how they are formed has been presented in this article. Fortunately, everyone is willing to show how they converted their digital signals to an analogue output, with most of them electing to use a resistor network to combine each output pin within a channel to produce a hopefully suitable output voltage.

Here, with two bits per channel, I take the most significant bit for a channel and send it through a 470ohm resistor. Meanwhile, the least significant bit for the channel is sent through a 1000ohm resistor. Thus, the former contributes more to the magnitude of the signal than the latter. If we were only dealing with channel information, this would be as much as we need to do, but here we also employ an intensity bit whose job it is to boost the channels by a small amount, making sure not to allow the channels to pollute each other via this intensity sub-circuit. Here, I feed the intensity output through a 2200ohm resistor and then to each of the channel outputs via signal diodes.

VGA Output Circuit

The circuit showing connections relevant to VGA output (generic connections are not shown)

The Final Picture

I could probably go on and cover other aspects of the solution, but the fundamental aspects are probably dealt with sufficiently above to help others reproduce this experiment themselves. Populating memory with usable image data, at least in this solution, involves copying data to RAM, and I did experience problems with accessing RAM that are probably related to CPU initialisation (as covered in my previous article) and to synchronising the memory contents with what the CPU has written via its cache.

As for the actual picture data, the RGB-plus-intensity representation is not likely to be the format of most images these days. So, to prepare data for output, some image processing is needed. A while ago, I made a program to perform palette optimisation and dithering on images for the Acorn Electron, and I felt that it was going to be easier to adapt the dithering code than it was to figure out the necessary techniques required for software like ImageMagick or the Python Imaging Library. The pixel data is then converted to assembly language data definition statements and incorporated into my PIC32 program.

VGA output from a PIC32 microcontroller

VGA output from a PIC32 microcontroller, featuring a picture showing some Oslo architecture, with the PIC32MX270 being powered (and programmed) by the Arduino Duemilanove, and with the breadboards holding the necessary resistors and diodes to supply the VGA breakout and, beyond that, the cable to the monitor

To demonstrate control over the visible region, I deliberately adjusted the display frequencies so that the monitor would consider the signal to be carrying an image 800 pixels by 600 pixels at a refresh rate of 60Hz. Since my framebuffer is only 256 lines high, I double the lines to produce 512 lines for the display. It would seem that choosing a line rate to try and produce 512 lines has the monitor trying to show something compatible with the traditional 640×480 resolution and thus lines are lost off the screen. I suppose I could settle for 480 lines or aim for 300 lines instead, but I actually don’t mind having a border around the picture.

The on-screen menu showing the monitor's interpretation of the signal

The on-screen menu showing the monitor's interpretation of the signal

It is worth noting that I haven’t really mentioned a “pixel clock” or “dot clock” so far. As far as the display receiving the VGA signal is concerned, there is no pixel clock in that signal. And as far as we are concerned, the pixel clock is only important when deciding how quickly we can get our data into the signal, not in actually generating the signal. We can generate new colour values as slowly (or as quickly) as we might like, and the result will be wider (or narrower) pixels, but it shouldn’t make the actual signal invalid in any way.

Of course, it is important to consider how quickly we can generate pixels. Previously, I mentioned a 24MHz clock being used within the PIC32, and it is this clock that is used to drive peripherals and this clock’s frequency that will limit the transfer speed. As noted elsewhere, a pixel clock frequency of 25MHz is used to support the traditional VGA resolution of 640×480 at 60Hz. With the possibilities of running the “peripheral clock” in the PIC32MX270 considerably faster than this, it becomes a matter of experimentation as to how many pixels can be supported horizontally.

Some Oslo street art being displayed by the PIC32

Some Oslo street art being displayed by the PIC32

For my own purposes, I have at least reached the initial goal of generating a stable and usable video signal. Further work is likely to involve attempting to write routines to modify the framebuffer, maybe support things like scrolling and sprites, and even consider interfacing with other devices.

Naturally, this project is available as Free Software from its own repository. Maybe it will inspire or encourage you to pursue something similar, knowing that you absolutely do not need to be any kind of “expert” to stubbornly persist and to eventually get results!

Evaluating PIC32 for Hardware Experiments

Friday, May 19th, 2017

Some time ago I became aware of the PIC32 microcontroller platform, perhaps while following various hardware projects, pursuing hardware-related interests, and looking for pertinent documentation. Although I had heard of PIC microcontrollers before, my impression of them was that they were mostly an alternative to other low-end computing products like the Atmel AVR series, but with a development ecosystem arguably more reliant on its vendor than the Atmel products for which tools like avr-gcc and avrdude exist as Free Software and have gone on to see extensive use, perhaps most famously in connection with the Arduino ecosystem.

What made PIC32 stand out when I first encountered it, however, was that it uses the MIPS32 instruction set architecture instead of its own specialised architecture. Consequently, instead of being reliant on the silicon vendor and random third-party tool providers for proprietary tools, the possibility of using widely-used Free Software tools exists. Moreover, with a degree of familiarity with MIPS assembly language, thanks to the Ben NanoNote, I felt that there might be an opportunity to apply some of my elementary skills to another MIPS-based system and gain some useful experience.

Some basic investigation occurred before I made any attempt to acquire hardware. As anyone having to pay attention to the details of hardware can surely attest, it isn’t much fun to obtain something only to find that some necessary tool required for the successful use of that hardware happens to be proprietary, only works on proprietary platforms, or is generally a nuisance in some way that makes you regret the purchase. Looking around at various resources such as the Pinguino Web site gave me some confidence that there were people out there using PIC32 devices with Free Software. (Of course, the eventual development scenario proved to be different from that envisaged in these initial investigations, but previous experience has taught me to expect that things may not go as originally foreseen.)

Some Discoveries

So that was perhaps more than enough of an introduction, given that I really want to focus on some of my discoveries in using my acquired PIC32 devices, hoping that writing them up will help others to use Free Software with this platform. So, below, I will present a few discoveries that may well, for all I know, be “obvious” to people embedded in the PIC universe since it began, or that might be “superfluous” to those happy that Microchip’s development tools can obscure the operation of the hardware to offer a simpler “experience”.

I should mention at this point that I chose to acquire PDIP-profile products for use with a solderless breadboard. This is the level of sophistication at which the Arduino products mostly operate and it allows convenient prototyping involving discrete components and other electronic devices. The evidence from the chipKIT and Pinguino sites suggested that it would be possible to set up a device on a breadboard, wire it up to a power supply with a few supporting components, and then be able to program it. (It turned out that programming involved another approach than indicated by that latter reference, however.)

The 28-pin product I elected to buy was the PIC32MX270F256B-50/SP. I also bought some capacitors that were recommended for wiring up the device. In the picture below, you can just about see the capacitor connecting two of the pins, and there is also a resistor pulling up one of the pins. I recommend obtaining a selection of resistors of different values so as to be able to wire up various circuits as you encounter them. Note that the picture does not show a definitive wiring guide: please refer to the product documentation or to appropriate project documentation for such things.

PIC32 device on a mini-breadboard

PIC32 device on a mini-breadboard connected to an Arduino Duemilanove for power and programming

Programming the Device

Despite the apparent suitability of a program called pic32prog, recommended by the “cheap DIY programmer” guide, I initially found success elsewhere. I suppose it didn’t help that the circuit diagram was rather hard to follow for me, as someone who isn’t really familiar with certain electrical constructs that have been mixed in, arguably without enough explanation.

Initial Recommendation: ArduPIC32

Instead, I looked for a solution that used an Arduino product (not something procured from ephemeral Chinese “auction site” vendors) and found ArduPIC32 living a quiet retirement in the Google Code Archive. Bypassing tricky voltage level conversion and connecting an Arduino Duemilanove with the PIC32 device on its 5V-tolerant JTAG-capable pins, ArduPIC32 mostly seemed to work, although I did alter it slightly to work the way I wanted to and to alleviate the usual oddness with Arduino serial-over-USB connections.

However, I didn’t continue to use ArduPIC. One reason is that programming using the JTAG interface is slow, but a much more significant reason is that the use of JTAG means that the pins on the PIC32 associated with JTAG cannot be used for other things. This is either something that “everybody” knows or just something that Microchip doesn’t feel is important enough to mention in convenient places in the product documentation. More on this topic below!

Final Recommendation: Pickle (and Nanu Nanu)

So, I did try and use pic32prog and the suggested circuit, but had no success. Then, searching around, I happened to find some very useful resources indeed: Pickle is a GPL-licensed tool for uploading data to different PIC devices including PIC32 devices; Nanu Nanu is a GPL-licensed program that runs on AVR devices and programs PIC32 devices using the ICSP interface. Compiling the latter for the Arduino and uploading it in the usual way (actually done by the Makefile), it can then run on the Arduino and be controlled by the Pickle tool.

Admittedly, I did have some problems with the programming circuit, most likely self-inflicted, but the developer of these tools was very responsive and, as I know myself from being in his position in other situations, provided the necessary encouragement that was perhaps most sorely lacking to get the PIC32 device talking. I used the “sink” or “open drain” circuit so that the Arduino would be driving the PIC32 device using a suitable voltage and not the Arduino’s native 5V. Conveniently, this is the default configuration for Nanu Nanu.

PIC32 on breadboard with Arduino programming circuit

PIC32 on breadboard with Arduino programming circuit (and some LEDs for diagnostic purposes)

I should point out that the Pinguino initiative promotes USB programming similar to that employed by the Arduino series of boards. Even though that should make programming very easy, it is still necessary to program the USB bootloader on the PIC32 device using another method in the first place. And for my experiments involving an integrated circuit on a breadboard, setting up a USB-based programming circuit is a distraction that would have complicated the familiarisation process and would have mostly duplicated the functionality that the Arduino can already provide, even though this two-stage programming circuit may seem a little contrived.

Compiling Code for the Device

This is perhaps the easiest problem to solve, strongly motivating my choice of PIC32 in the first place…

Recommendation: GNU Toolchain

PIC32 uses the MIPS32 instruction set. Since MIPS has been around for a very long time, and since the architecture was prominent in workstations, servers and even games consoles in the late 1980s and 1990s, remaining in widespread use in more constrained products such as routers as this century has progressed, the GNU toolchain (GCC, binutils) has had a long time to comfortably support MIPS. Although the computer you are using is not particularly likely to be MIPS-based, cross-compiling versions of these tools can be built to run on, say, x86 or x86-64 while generating MIPS32 executable programs.

And fortunately, Debian GNU/Linux provides the mipsel-linux-gnu variant of the toolchain packages (at least in the unstable version of Debian) that makes the task of building software simply a matter of changing the definitions for the compiler, linker and other tools in one’s Makefile to use these variants instead of the unprefixed “host” gcc, ld, and so on. You can therefore keep using the high-quality Free Software tools you already know. The binutils-mipsel-linux-gnu package provides an assembler, if you just want to practise your MIPS assembly language, as well as a linker and tools for creating binaries. Meanwhile, the gcc-mipsel-linux-gnu package provides a C compiler.

Perhaps the only drawback of using the GNU tools is that people using the proprietary tools supplied by Microchip and partners will post example code that uses special notation interpreted in a certain way by those products. Fortunately, with some awareness that this is going on, we can still support the necessary functionality in our own way, as described below.

Configuring the Device

With the PIC32, and presumably with the other PIC products, there is a distinct activity of configuring the device when programming it with program code and data. This isn’t so obvious until one reads the datasheets, tries to find a way of changing some behaviour of the device, and then stumbles across the DEVCFG registers. These registers cannot be set in a running program: instead, they are “programmed” before the code is run.

You might wonder what the distinction is between “programming” the device to take the code you have written and “programming” the configuration registers, and there isn’t much difference conceptually. All that matters is that you have your program written into one part of the device’s memory and you also ask for certain data values to be written to the configuration registers. How this is done in the Microchip universe is with something like this:

#pragma config SOMETHING = SOMEVALUE;

With a basic GNU tool configuration, we need to find the equivalent operation and express it in a different way (at least as far as I know, unless someone has implemented an option to support this kind of notation in the GNU tools). The mechanism for achieving this is related to the linker script and is described in a section of this article presented below. For now, we will concentrate on what configuration settings we need to change.

Recommendation: Disable JTAG

As briefly mentioned above, one thing that “everybody” knows, at least if they are using Microchip’s own tools and are busy copying code from Microchip’s examples, is that the JTAG functionality takes over various pins and won’t let you use them, even if you switch on a “peripheral” in the microcontroller that needs to use those pins. Maybe I am naive about how intrusive JTAG should or should not be, but the lesson from this matter is just to configure the device to not have JTAG features enabled and to use the ICSP programming method recommended above instead.

Recommendation: Disable Watchdog Timer

Again, although I am aware of the general concept of a watchdog timer – something that resets a device if it thinks that the device has hung, crashed, or experienced something particularly bad – I did not expect something like this to necessarily be configured by default. In case it is, and one does see lots of code assuming so, then it should be disabled as well. Otherwise, I can imagine that you might experience spontaneous resets for no obvious reason.

Recommendation: Configure the Oscillator and Clocks

If you need to change the oscillator frequency or the origin of the oscillator used by the PIC32, it is perhaps best to do this in the configuration registers rather than try and mess around with this while the device is running. Indeed, some configuration is probably unavoidable even if there is a need to, say, switch between oscillators at run-time. Meanwhile, the relationship between the system clock (used by the processor to execute instructions) and the peripheral clock (used to interact with other devices and to run things like timers) is defined in the configuration registers.

Linker Scripts

So, to undertake the matter of configuration, a way is needed to express the setting of configuration register values in a general way. For this, we need to take a closer look at linker scripts. If you routinely compile and link software, you will be using linker scripts without realising it, because such scripts are telling the tools things about where parts of the program should be stored, what kinds of addresses they should be using, and so on. Here, we need to take more of an interest in such matters.

Recommendation: Expand a Simple Script

Writing linker scripts does not seem like much fun. The syntax is awkward to read and to produce as a human being, and knowledge about tool output is assumed. However, it is possible to start with a simple script that works for someone else in a similar situation and to modify it conservatively in order to achieve what you need. I started out with one that just defined the memory regions and a few sections. To avoid reproducing all the details here, I will just show what the memory regions for a configuration register look like:

  config2                    : ORIGIN = 0xBFC00BF4, LENGTH = 0x4
  physical_config2           : ORIGIN = 0x3FC00BF4, LENGTH = 0x4

This will be written in the MEMORY construct in the script. What they tell us is that config2 is a region of four bytes starting at virtual address 0xBFC00BF4, which is the location of the DEVCFG2 register as specified in the PIC32MX270 documentation. However, this register actually has a physical address of 0x3FC00BF4. The difference between virtual addresses and physical addresses is perhaps easiest to summarise by saying that CPU instructions would use the virtual address when referencing the register, whereas the actual memory of the device employs the physical address to refer to this four-byte region.

Meanwhile, in the SECTIONS construct, there needs to be something like this:

  .devcfg2 : {
        *(.devcfg2)
        } > config2 AT > physical_config2

Now you might understand my remark about the syntax! Nevertheless, what these things do is to tell the tools to put things from a section called .devcfg2 in the physical_config2 memory region and, if there were to be any address references in the data (which there isn’t in this case), then they would use the addresses in the config2 region.

Recommendation: Define Configuration Sections and Values in the Code

Since I have been using assembly language, here is what I do in my actual program source file. Having looked at the documentation and figured out which configuration register I need to change, I introduce a section in the code that defines the register value. For DEVCFG2, it looks like this:

.section .devcfg2, "a"
.word 0xfff9fffb        /* DEVCFG2<18:16> = FPLLODIV<2:0> = 001;
                        DEVCFG2<6:4> = FPLLMUL<2:0> = 111;
                        DEVCFG2<2:0> = FPLLIDIV<2:0> = 011 */

Here, I fully acknowledge that this might not be the most optimal approach, but when you’re learning about all these things at once, you make progress as best you can. In any case, what this does is to tell the assembler to include the .devcfg2 section and to populate it with the specified “word”, which is four bytes on the 32-bit PIC32 platform. This word contains the value of the register which has been expressed in hexadecimal with the most significant digit first.

Returning to our checklist of configuration tasks, what we now need to do is to formulate each one in terms of configuration register values, introduce the necessary sections and values, and adjust the values to contain the appropriate bit patterns. The above example shows how the DEVCFG2 bits are adjusted and set in the value of the register. Here is a short amplification of the operation:

DEVCFG2 Bits
31…28 27…24 23…20 19…16 15…12 11…8 7…4 3…0
1111 1111 1111 1001
FPLLODIV<2:0>
1111 1111 1111
FPLLMUL<2:0>
1011
FPLLIDIV<2:0>
f f f 9 f f f b

Here, the underlined bits are those of interest and have been changed to the desired values. It turns out that we can set the other bits as 1 for the functionality we want (or don’t want) in this case.

By the way, the JTAG functionality is disabled in the DEVCFG0 register (JTAGEN, bit 2, on this device). The watchdog timer is disabled in DEVCFG1 (FWDTEN, bit 23, on this device).

Recommendation: Define Regions for Exceptions and Interrupts

The MIPS architecture has the processor jump to certain memory locations when bad things happen (exceptions) or when other things happen (interrupts). We will cover the details of this below, but while we are setting up the places in memory where things will reside, we might as well define where the code to handle exceptions and interrupts will be living:

  .flash : { *(.flash*) } > kseg0_program_mem AT > physical_program_mem

This will be written in the SECTIONS construct. It relies on a memory region being defined, which would appear in the MEMORY construct as follows:

  kseg0_program_mem    (rx)  : ORIGIN = 0x9D000000, LENGTH = 0x40000
  physical_program_mem (rx)  : ORIGIN = 0x1D000000, LENGTH = 0x40000

These definitions allow the .flash section to be placed at 0x9D00000 but actually be written to memory at 0x1D00000.

Initialising the Device

On the various systems I have used in the past, even when working in assembly language I never had to deal with the earliest stages of the CPU’s activity. However, on the MIPS systems I have used in more recent times, I have been confronted with the matter of installing code to handle system initialisation, and this does require some knowledge of what MIPS processors would like to do when something goes wrong or if some interrupt arrives and needs to be processed.

The convention with the PIC32 seems to be that programs are installed within the MIPS KSEG0 region (one of the four principal regions) of memory, specifically at address 0x9FC00000, and so in the linker script we have MEMORY definitions like this:

  kseg0_boot_mem       (rx)  : ORIGIN = 0x9FC00000, LENGTH = 0xBF0
  physical_boot_mem    (rx)  : ORIGIN = 0x1FC00000, LENGTH = 0xBF0

As you can see, this region is far shorter than the 512MB of the KSEG0 region in its entirety. Indeed, 0xBF0 is only 3056 bytes! So, we need to put more substantial amounts of code elsewhere, some of which will be tasked with handling things when they go wrong.

Recommendation: Define Exception and Interrupt Handlers

As we have seen, the matter of defining routines to handle errors and interrupt conditions falls on the unlucky Free Software developer in this case. When something goes wrong, like the CPU not liking an instruction it has just seen, it will jump to a predefined location and try and execute code to recover. By default, with the PIC32, this location will be at address 0×80000000 which is the start of RAM, but if the RAM has not been configured then the CPU will not achieve very much trying to load instructions from that address.

Now, it can be tempting to set the “exception base”, as it is known, to be the place where our “boot” code is installed (at 0x9FC00000). So if something bad does happen, our code will start running again “from the top”, a bit like what used to happen if you ever wrote BASIC code that said…

10 ON ERROR GOTO 10

Clearly, this isn’t particularly desirable because it can mask problems. Indeed, I found that because I had not observed a step in the little dance required to change interrupt locations, my program would be happily restarting itself over and over again upon receiving interrupts. This is where the work done in the linker script starts to pay off: we can move the exception handler, this being the code residing at the exception base, to another region of memory and tell the CPU to jump to that address instead. We should therefore never have unscheduled restarts occurring once this is done.

Again, I have been working in assembly language, so I position my exception handling code using a directive like this:

.section .flash, "a"

exception_handler:
    /* Exception handling code goes here. */

Note that .flash is what we mentioned in the linker script. After this directive, the exception handler is defined so that the CPU has something to jump to. But exceptions are just one kind of condition that may occur, and we also need to handle interrupts. Although we might be able to handle both things together, you can instead position an interrupt handler after the exception handler at a well-defined offset, and the CPU can be told to use that when it receives an interrupt. Here is what I do:

.org 0x200

interrupt_handler:
    /* Interrupt handling code goes here. *

The .org directive tells the assembler that the interrupt handler will reside at an offset of 0×200 from the start of the .flash section. This number isn’t something I have made up: it is defined by the MIPS architecture and will be observed by the CPU when suitably configured.

So that leaves the configuration. Although one sees a lot of advocacy for the “multi-vector” interrupt handling support of the PIC32, possibly because the Microchip example code seems to use it, I wanted to stick with the more widely available “single-vector” support which is what I have effectively described above: you have one place the CPU jumps to and it is then left to the code to identify the interrupt condition – what it is that happened, exactly – and then handle the condition appropriately. (Multi-vector handling has the CPU identify the kind of condition and then choose a condition-specific location to jump to all by itself.)

The following steps are required for this to work properly:

  1. Make sure that the CP0 (co-processor 0) STATUS register has the BEV bit set. Otherwise things will fail seemingly inexplicably.
  2. Set the CP0 EBASE (exception base) register to use the exception handler address. This changes the target of the jump that occurs when an exception or interrupt occurs.
  3. Set the CP0 INTCTL (interrupt control) register to use a non-zero vector spacing, even though this setting probably has no relevance to the single-vector mode.
  4. Make sure that the CP0 CAUSE (exception cause) register has the IV bit set. This tells the CPU that the interrupt handler is at that magic 0×200 offset from the exception handler, meaning that interrupts should be dispatched there instead of to the exception handler.
  5. Now make sure that the CP0 STATUS register has the BEV bit cleared, so that the CPU will now use the new handlers.

To enable exceptions and interrupts, the IE bit in the CP0 STATUS register must be set, but there are also other things that must be done for them to actually be delivered.

Recommendation: Handle the Initial Error Condition

As I found out with the Ben NanoNote, a MIPS device will happily run in its initial state, but it turns out that this state is some kind of “error” state that prevents exceptions and interrupts from being delivered, even though interrupts may have been enabled. This does make some kind of sense: if a program is in the process of setting up interrupt handlers, it doesn’t really want an interrupt to occur before that work is done.

So, the MIPS architecture defines some flags in the CP0 STATUS register to override normal behaviour as some kind of “fail-safe” controls. The ERL (error level) bit is set when the CPU starts up, preventing errors (or probably most errors, at least) as well as interrupts from interrupting the execution of the installed code. The EXL (exception level) bit may also be set, preventing exceptions and interrupts from occurring even when ERL is clear. Thus, both of these bits must be cleared for interrupts to be enabled and delivered, and what happens next rather depends on how successful we were in setting up those handlers!

Recommendation: Set up the Global Offset Table

Something that I first noticed when looking at code for the Ben NanoNote was the initialisation of a register to refer to something called the global offset table (or GOT). For anyone already familiar with MIPS assembly language or the way code is compiled for the architecture, programs follow a convention where they refer to objects via a table containing the addresses of those objects. For example:

Global Offset Table
Offset from Start Member
+0 0
+4 address of _start
+8 address of my_routine
+12 address of other_routine

But in order to refer to the table, something has to have the address of the table. This is where the $gp register comes in: it holds that address and lets the code access members of the table relative to the address.

What seems to work for setting up the $gp register is this:

        lui $gp, %hi(_GLOBAL_OFFSET_TABLE_)
        ori $gp, $gp, %lo(_GLOBAL_OFFSET_TABLE_)

Here, the upper 16 bits of the $gp register are set with the “high” 16 bits of the value assigned to the special _GLOBAL_OFFSET_TABLE_ symbol, which provides the address of the special .got section defined in the linker script. This value is then combined using a logical “or” operation with the “low” 16 bits of the symbol’s value.

(I’m sure that anyone reading this far will already know that MIPS instructions are fixed length and are the same length as the address being loaded here, so it isn’t possible to fit the address into the load instruction. So each half of the address has to be loaded separately by different instructions. If you look at the assembled output of an assembly language program employing the “li” instruction, you will see that the assembler breaks this instruction down into “lui” and “ori” if it needs to. Note well that you cannot rely on the “la” instruction to load the special symbol into $gp precisely because it relies on $gp having already been set up.)

Rounding Off

I could probably go on about MIPS initialisation rituals, setting up a stack for function calls, and so on, but that isn’t really what this article is about. My intention here is to leave enough clues and reminders for other people in a similar position and perhaps even my future self.

Even though I have some familiarity with the MIPS architecture, I suppose you might be wondering why I am not evaluating more open hardware platforms. I admit that it is partly laziness: I could get into doing FPGA-related stuff and deploy open microcontroller cores, maybe even combining them with different peripheral circuits, but that would be a longer project with a lot of familiarisation involved, plus I would have to choose carefully to get a product supported by Free Software. It is also cheapness: I could have ordered a HiFive1 board and started experimenting with the RISC-V architecture, but that board seems expensive for what it offers, at least once you disregard the pioneering aspects of the product and focus on the applications of interest.

So, for now, I intend to move slowly forwards and gain experiences with an existing platform. In my next article on this topic, I hope to present some of the things I have managed to achieve with the PIC32 and a selection of different components and technologies.

EOMA68: The Campaign (and some remarks about recurring criticisms)

Thursday, August 18th, 2016

I have previously written about the EOMA68 initiative and its objective of making small, modular computing cards that conform to a well-defined standard which can be plugged into certain kinds of device – a laptop or desktop computer, or maybe even a tablet or smartphone – providing a way of supplying such devices with the computing power they all need. This would also offer a convenient way of taking your computing environment with you, using it in the kind of device that makes most sense at the time you need to use it, since the computer card is the actual computer and all you are doing is putting it in a different box: switch off, unplug the card, plug it into something else, switch that on, and your “computer” has effectively taken on a different form.

(This “take your desktop with you” by actually taking your computer with you is fundamentally different to various dubious “cloud synchronisation” services that would claim to offer something similar: “now you can synchronise your tablet with your PC!”, or whatever. Such services tend to operate rather imperfectly – storing your files on some remote site – and, of course, exposing you to surveillance and convenience issues.)

Well, a crowd-funding campaign has since been launched to fund a number of EOMA68-related products, with an opportunity for those interested to acquire the first round of computer cards and compatible devices, those devices being a “micro-desktop” that offers a simple “mini PC” solution, together with a somewhat radically designed and produced laptop (or netbook, perhaps) that emphasises accessible construction methods (home 3D printing) and alternative material usage (“eco-friendly plywood”). In the interests of transparency, I will admit that I have pledged for a card and the micro-desktop, albeit via my brother for various personal reasons that also delayed me from actually writing about this here before now.

An EOMA68 computer card in a wallet

An EOMA68 computer card in a wallet (courtesy Rhombus Tech/Crowd Supply)

Of course, EOMA68 is about more than just conveniently taking your computer with you because it is now small enough to fit in a wallet. Even if you do not intend to regularly move your computer card from device to device, it emphasises various sustainability issues such as power consumption (deliberately kept low), long-term support and matters of freedom (the selection of CPUs that completely support Free Software and do not introduce surveillance backdoors), and device longevity (that when the user wants to upgrade, they may easily use the card in something else that might benefit from it).

This is not modularity to prove some irrelevant hypothesis. It is modularity that delivers concrete benefits to users (that they aren’t forced to keep replacing products engineered for obsolescence), to designers and manufacturers (that they can rely on the standard to provide computing functionality and just focus on their own speciality to differentiate their product in more interesting ways), and to society and the environment (by reducing needless consumption and waste caused by the upgrade treadmill promoted by the technology industries over the last few decades).

One might think that such benefits might be received with enthusiasm. Sadly, it says a lot about today’s “needy consumer” culture that instead of welcoming another choice, some would rather spend their time criticising it, often to the point that one might wonder about their motivations for doing so. Below, I present some common criticisms and some of my own remarks.

(If you don’t want to read about “first world” objections – typically about “new” and “fast” – and are already satisfied by the decisions made regarding more understandable concerns – typically involving corporate behaviour and licensing – just skip to the last section.)

“The A20 is so old and slow! What’s the point?”

The Allwinner A20 has been around for a while. Indeed, its predecessor – the A10 – was the basis of initial iterations of the computer card several years ago. Now, the amount of engineering needed to upgrade the prototypes that were previously made to use the A10 instead of the A20 is minimal, at least in comparison to adopting another CPU (that would probably require a redesign of the circuit board for the card). And hardware prototyping is expensive, especially when unnecessary design changes have to be made, when they don’t always work out as expected, and when extra rounds of prototypes are then required to get the job done. For an initiative with a limited budget, the A20 makes a lot of sense because it means changing as little as possible, benefiting from the functionality upgrade and keeping the risks low.

Obviously, there are faster processors available now, but as the processor selection criteria illustrate, if you cannot support them properly with Free Software and must potentially rely on binary blobs which potentially violate the GPL, it would be better to stick to a more sustainable choice (because that is what adherence to Free Software is largely about) even if that means accepting reduced performance. In any case, at some point, other cards with different processors will come along and offer faster performance. Alternatively, someone will make a dual-slot product that takes two cards (or even a multi-slot product that provides a kind of mini-cluster), and then with software that is hopefully better-equipped for concurrency, there will be alternative ways of improving the performance to that of finding faster processors and hoping that they meet all the practical and ethical criteria.

“The RasPi 3…”

Lots of people love the Raspberry Pi, it would seem. The original models delivered a cheap, adequate desktop computer for a sum that was competitive even with some microcontroller-based single-board computers that are aimed at electronics projects and not desktop computing, although people probably overlook rivals like the BeagleBoard and variants that would probably have occupied a similar price point even if the Raspberry Pi had never existed. Indeed, the BeagleBone Black resides in the same pricing territory now, as do many other products. It is interesting that both product families are backed by certain semiconductor manufacturers, and the Raspberry Pi appears to benefit from privileged access to Broadcom products and employees that is denied to others seeking to make solutions using the same SoC (system on a chip).

Now, the first Raspberry Pi models were not entirely at the performance level of contemporary desktop solutions, especially by having only 256MB or 512MB RAM, meaning that any desktop experience had to be optimised for the device. Furthermore, they employed an ARM architecture variant that was not fully supported by mainstream GNU/Linux distributions, in particular the one favoured by the initiative: Debian. So a variant of Debian has been concocted to support the devices – Raspbian – and despite the Raspberry Pi 2 being the first device in the series to employ an architecture variant that is fully supported by Debian, Raspbian is still recommended for it and its successor.

Anyway, the Raspberry Pi 3 having 1GB RAM and being several times faster than the earliest models might be more competitive with today’s desktop solutions, at least for modestly-priced products, and perhaps it is faster than products using the A20. But just like the fascination with MHz and GHz until Intel found that it couldn’t rely on routinely turning up the clock speed on its CPUs, or everybody emphasising the number of megapixels their digital camera had until they discovered image noise, such number games ignore other factors: the closed source hardware of the Raspberry Pi boards, the opaque architecture of the Broadcom SoCs with a closed source operating system running on the GPU (graphics processing unit) that has control over the ARM CPU running the user’s programs, the impracticality of repurposing the device for things like laptops (despite people attempting to repurpose it for such things, anyway), and the organisation behind the device seemingly being happy to promote a variety of unethical proprietary software from a variety of unethical vendors who clearly want a piece of the action.

And finally, with all the fuss about how much faster the opaque Broadcom product is than the A20, the Raspberry Pi 3 has half the RAM of the EOMA68-A20 computer card. For certain applications, more RAM is going to be much more helpful than more cores or “64-bit!”, which makes us wonder why the Raspberry Pi 3 doesn’t support 4GB RAM or more. (Indeed, the current trend of 64-bit ARM products offering memory quantities addressable by 32-bit CPUs seems to have missed the motivation for x86 finally going 64-bit back in the early 21st century, which was largely about efficiently supporting the increasingly necessary amounts of RAM required for certain computing tasks, with Intel’s name for x86-64 actually being at one time “Extended Memory 64 Technology“. Even the DEC Alpha, back in the 1990s, which could be regarded as heralding the 64-bit age in mainstream computing, and which arguably relied on the increased performance provided by a 64-bit architecture for its success, still supported 64-bit quantities of memory in delivered products when memory was obviously a lot more expensive than it is now.)

“But the RasPi Zero!”

Sure, who can argue with a $5 (or £4, or whatever) computer with 512MB RAM and a 1GHz CPU that might even be a usable size and shape for some level of repurposing for the kinds of things that EOMA68 aims at: putting a general purpose computer into a wide range of devices? Except that the Raspberry Pi Zero has had persistent availability issues, even ignoring the free give-away with a magazine that had people scuffling in newsagents to buy up all the available copies so they could resell them online at several times the retail price. And it could be perceived as yet another inventory-dumping exercise by Broadcom, given that it uses the same SoC as the original Raspberry Pi.

Arguably, the Raspberry Pi Zero is a more ambiguous follow-on from the Raspberry Pi Compute Module that obviously was (and maybe still is) intended for building into other products. Some people may wonder why the Compute Module wasn’t the same success as the earlier products in the Raspberry Pi line-up. Maybe its lack of success was because any organisation thinking of putting the Compute Module (or, these days, the Pi Zero) in a product to sell to other people is relying on a single vendor. And with that vendor itself relying on a single vendor with whom it currently has a special relationship, a chain of single vendor reliance is formed.

Any organisation wanting to build one of these boards into their product now has to have rather a lot of confidence that the chain will never weaken or break and that at no point will either of those vendors decide that they would rather like to compete in that particular market themselves and exploit their obvious dominance in doing so. And they have to be sure that the Raspberry Pi Foundation doesn’t suddenly decide to get out of the hardware business altogether and pursue those educational objectives that they once emphasised so much instead, or that the Foundation and its manufacturing partners don’t decide for some reason to cease doing business, perhaps selectively, with people building products around their boards.

“Allwinner are GPL violators and will never get my money!”

Sadly, Allwinner have repeatedly delivered GPL-licensed software without providing the corresponding source code, and this practice may even persist to this day. One response to this has referred to the internal politics and organisation of Allwinner and that some factions try to do the right thing while others act in an unenlightened, licence-violating fashion.

Let it be known that I am no fan of the argument that there are lots of departments in companies and that just because some do some bad things doesn’t mean that you should punish the whole company. To this day, Sony does not get my business because of the unsatisfactorily-resolved rootkit scandal and I am hardly alone in taking this position. (It gets brought up regularly on a photography site I tend to visit where tensions often run high between Sony fanatics and those who use cameras from other manufacturers, but to be fair, Sony also has other ways of irritating its customers.) And while people like to claim that Microsoft has changed and is nice to Free Software, even to the point where people refusing to accept this assertion get criticised, it is pretty difficult to accept claims of change and improvement when the company pulls in significant sums from shaking down device manufacturers using dubious patent claims on Android and Linux: systems it contributed nothing to. And no, nobody will have been reading any patents to figure out how to implement parts of Android or Linux, let alone any belonging to some company or other that Microsoft may have “vacuumed up” in an acquisition spree.

So, should the argument be discarded here as well? Even though I am not too happy about Allwinner’s behaviour, there is the consideration that as the saying goes, “beggars cannot be choosers”. When very few CPUs exist that meet the criteria desirable for the initiative, some kind of nasty compromise may have to be made. Personally, I would have preferred to have had the option of the Ingenic jz4775 card that was close to being offered in the campaign, although I have seen signs of Ingenic doing binary-only code drops on certain code-sharing sites, and so they do not necessarily have clean hands, either. But they are actually making the source code for such binaries available elsewhere, however, if you know where to look. Thus it is most likely that they do not really understand the precise obligations of the software licences concerned, as opposed to deliberately withholding the source code.

But it may well be that unlike certain European, American and Japanese companies for whom the familiar regime of corporate accountability allows us to judge a company on any wrongdoing, because any executives unaware of such wrongdoing have been negligent or ineffective at building the proper processes of supervision and thus permit an unethical corporate culture, and any executives aware of such wrongdoing have arguably cultivated an unethical corporate culture themselves, it could be the case that Chinese companies do not necessarily operate (or are regulated) on similar principles. That does not excuse unethical behaviour, but it might at least entertain the idea that by supporting an ethical faction within a company, the unethical factions may be weakened or even eliminated. If that really is how the game is played, of course, and is not just an excuse for finger-pointing where nobody is held to account for anything.

But companies elsewhere should certainly not be looking for a weakening of their accountability structures so as to maintain a similarly convenient situation of corporate hypocrisy: if Sony BMG does something unethical, Sony Imaging should take the bad with the good when they share and exploit the Sony brand; people cannot have things both ways. And Chinese companies should comply with international governance principles, if only to reassure their investors that nasty surprises (and liabilities) do not lie in wait because parts of such businesses were poorly supervised and not held accountable for any unethical activities taking place.

It is up to everyone to make their own decision about this. The policy of the campaign is that the A20 can be supported by Free Software without needing any proprietary software, does not rely on any Allwinner-engineered, licence-violating software (which might be perceived as a good thing), and is merely the first step into a wider endeavour that could be conveniently undertaken with the limited resources available at the time. Later computer cards may ignore Allwinner entirely, especially if the company does not clean up its act, but such cards may never get made if the campaign fails and the wider endeavour never even begins in earnest.

(And I sincerely hope that those who are apparently so outraged by GPL violations actually support organisations seeking to educate and correct companies who commit such violations.)

“You could buy a top-end laptop for that price!”

Sure you could. But this isn’t about a crowd-funding campaign trying to magically compete with an optimised production process that turns out millions of units every year backed by a multi-billion-dollar corporation. It is about highlighting the possibilities of more scalable (down to the economically-viable manufacture of a single unit), more sustainable device design and construction. And by the way, that laptop you were talking about won’t be upgradeable, so when you tire of its performance or if the battery loses its capacity, I suppose you will be disposing of it (hopefully responsibly) and presumably buying something similarly new and shiny by today’s measures.

Meanwhile, with EOMA68, the computing part of the supposedly overpriced laptop will be upgradeable, and with sensible device design the battery (and maybe other things) will be replaceable, too. Over time, EOMA68 solutions should be competitive on price, anyway, because larger numbers of them will be produced, but unlike traditional products, the increased usable lifespans of EOMA68 solutions will also offer longer-term savings to their purchasers, too.

“You could just buy a used laptop instead!”

Sure you could. At some point you will need to be buying a very old laptop just to have a CPU without a surveillance engine and offering some level of upgrade potential, although the specification might be disappointing to you. Even worse, things don’t last forever, particularly batteries and certain kinds of electronic components. Replacing those things may well be a challenge, and although it is worthwhile to make sure things get reused rather than immediately discarded, you can’t rely on picking up a particular product in the second-hand market forever. And relying on sourcing second-hand items is very much for limited edition products, whereas the EOMA68 initiative is meant to be concerned with reliably producing widely-available products.

“Why pay more for ideological purity?”

Firstly, words like “ideology”, “religion”, “church”, and so on, might be useful terms for trolls to poison and polarise any discussion, but does anyone not see that expecting suspiciously cheap, increasingly capable products to be delivered in an almost conveyor belt fashion is itself subscribing to an ideology? One that mandates that resources should be procured at the minimum cost and processed and assembled at the minimum cost, preferably without knowing too much about the human rights abuses at each step. Where everybody involved is threatened that at any time their role may be taken over by someone offering the same thing for less. And where a culture of exploitation towards those doing the work grows, perpetuating increasing wealth inequality because those offering the services in question will just lean harder on the workers to meet their cost target (while they skim off “their share” for having facilitated the deal). Meanwhile, no-one buying the product wants to know “how the sausage is made”. That sounds like an ideology to me: one of neoliberalism combined with feigned ignorance of the damage it does.

Anyway, people pay for more sustainable, more ethical products all the time. While the wilfully ignorant may jeer that they could just buy what they regard as the same thing for less (usually being unaware of factors like quality, never mind how these things get made), more sensible people see that the extra they pay provides the basis for a fairer, better society and higher-quality goods.

“There’s no point to such modularity!”

People argue this alongside the assertion that systems are easy to upgrade and that they can independently upgrade the RAM and CPU in their desktop tower system or whatever, although they usually start off by talking about laptops, but clearly not the kind of “welded shut” laptops that they or maybe others would apparently prefer to buy (see above). But systems are getting harder to upgrade, particularly portable systems like laptops, tablets, smartphones (with Fairphone 2 being a rare exception of being something that might be upgradeable), and even upgradeable systems are not typically upgraded by most end-users: they may only manage to do so by enlisting the help of more knowledgeable relatives and friends.

I use a 32-bit system that is over 11 years old. It could have more RAM, and I could do the job of upgrading it, but guess how much I would be upgrading it to: 2GB, which is as much as is supported by the two prototyped 32-bit architecture EOMA68 computer card designs (A20 and jz4775). Only certain 32-bit systems actually support more RAM, mostly because it requires the use of relatively exotic architectural features that a lot of software doesn’t support. As for the CPU, there is no sensible upgrade path even if I were sure that I could remove the CPU without causing damage to it or the board. Now, 64-bit systems might offer more options, and in upgradeable desktop systems more RAM might be added, but it still relies on what the chipset was designed to support. Some chipsets may limit upgrades based on either manufacturer pessimism (no-one will be able to afford larger amounts in the near future) or manufacturer cynicism (no-one will upgrade to our next product if they can keep adding more RAM).

EOMA68 makes a trade-off in order to support the upgrading of devices in a way that should be accessible to people who are not experts: no-one should be dealing with circuit boards and memory modules. People who think hardware engineering has nothing to do with compromises should get out of their armchair, join one of the big corporations already doing hardware, and show them how it is done, because I am sure those companies would appreciate such market-dominating insight.

An EOMA68 computer card with the micro-desktop device

An EOMA68 computer card with the micro-desktop device (courtesy Rhombus Tech/Crowd Supply)

Back to the Campaign

But really, the criticisms are not the things to focus on here. Maybe EOMA68 was interesting to you and then you read one of these criticisms somewhere and started to wonder about whether it is a good idea to support the initiative after all. Now, at least you have another perspective on them, albeit from someone who actually believes that EOMA68 provides an interesting and credible way forward for sustainable technology products.

Certainly, this campaign is not for everyone. Above all else it is crowd-funding: you are pledging for rewards, not buying things, even though the aim is to actually manufacture and ship real products to those who have pledged for them. Some crowd-funding exercises never deliver anything because they underestimate the difficulties of doing so, leaving a horde of angry backers with nothing to show for their money. I cannot make any guarantees here, but given that prototypes have been made over the last few years, that videos have been produced with a charming informality that would surely leave no-one seriously believing that “the whole thing was just rendered” (which tends to happen a lot with other campaigns), and given the initiative founder’s stubbornness not to give up, I have a lot of confidence in him to make good on his plans.

(A lot of campaigns underestimate the logistics and, having managed to deliver a complicated technological product, fail to manage the apparently simple matter of “postage”, infuriating their backers by being unable to get packages sent to all the different countries involved. My impression is that logistics expertise is what Crowd Supply brings to the table, and it really surprises me that established freight and logistics companies aren’t dipping their toes in the crowd-funding market themselves, either by running their own services or taking ownership stakes and integrating their services into such businesses.)

Personally, I think that $65 for a computer card that actually has more RAM than most single-board computers is actually a reasonable price, but I can understand that some of the other rewards seem a bit more expensive than one might have hoped. But these are effectively “limited edition” prices, and the aim of the exercise is not to merely make some things, get them for the clique of backers, and then never do anything like this ever again. Rather, the aim is to demonstrate that such products can be delivered, develop a market for them where the quantities involved will be greater, and thus be able to increase the competitiveness of the pricing, iterating on this hopefully successful formula. People are backing a standard and a concept, with the benefit of actually getting some hardware in return.

Interestingly, one priority of the campaign has been to seek the FSF’s “Respects Your Freedom” (RYF) endorsement. There is already plenty of hardware that employs proprietary software at some level, leaving the user to merely wonder what some “binary blob” actually does. Here, with one of the software distributions for the computer card, all of the software used on the card and the policies of the GNU/Linux distribution concerned – a surprisingly awkward obstacle – will seek to meet the FSF’s criteria. Thus, the “Libre Tea” card will hopefully be one of the first general purpose computing solutions to actually be designed for RYF certification and to obtain it, too.

The campaign runs until August 26th and has over a thousand pledges. If nothing else, go and take a look at the details and the updates, with the latter providing lots of background including video evidence of how the software offerings have evolved over the course of the campaign. And even if it’s not for you, maybe people you know might appreciate hearing about it, even if only to follow the action and to see how crowd-funding campaigns are done.

Other people’s thoughts on “Freedom and security issues on x86 platforms”

Saturday, July 2nd, 2016

A couple of months ago, we had a brief discussion on the FSFE discussion mailing list about the topic of “Uncorrectable freedom and security issues on x86 platforms“, but it just came to my attention that a bunch of other people were discussing our discussion, too. Hacker News is, of course, so very “meta”, but fortunately they got onto discussing the actual topic as well.

The initial message in the original discussion advocated adopting the Power computing architecture as a primary hardware platform for Free Software. Now, the Hacker News participants were surprised that nobody mentioned SPARC and yet I was sure that SPARC did get mentioned in our discussion. A brief search doesn’t find any mention of it, however, and I’m embarrassed to admit that I do know about things like LEON and even used SPARC-based hardware for many years. (The Sun 4 workstations at my university had SPARC CPUs, for instance.)

I suppose the disconnect here involves price, availability and performance of readily-available products. Certainly, a free hardware SPARC implementation can be synthesised for an FPGA, but the previous discussion covered things like RISC-V in a similar fashion: it’s nice to have the ability to deploy a “soft processor” in an FPGA, but customers of computing products usually expect “hard” CPU performance. And you can at least buy ARM and MIPS CPUs, even if they aren’t free hardware implementations, having decent-enough performance which support Free Software from the very bottom of the software stack.

The participants in the meta-discussion wondered why MIPS became so popular given that there are licensing fees involved, whereas Sun made certain SPARC designs available under the GPL, and given that the SPARC architecture is supposedly royalty-free. For some manufacturers, this is asking the wrong question: they did not seek to license the patent-encumbered versions of the MIPS architecture; like the OpenRISC initiative, they merely implemented the unencumbered versions instead.

It would be nice to have a high-performance, inexpensive, readily-available free hardware CPU for use in free hardware designs. And of course those designs would support Free Software completely. But until that comes to pass, we have to work with what we can get. And indeed, for whichever architecture seems to be favoured for such a role, we also need to have usable and accessible hardware that is compatible with our favoured architecture so that we may prepare ourselves for the day it finally gets rolled out.

There might be a reason why SPARC isn’t so well supported by things like GNU/Linux distributions. Sadly, unlike various competitors, inexpensive SPARC products seem to be thin on the ground, and without those the efforts to maintain ports of large Free Software collections inevitably grind to a halt, but I would be only too happy for someone to point me to a source of products that I may have overlooked. There is no inherent reason why SPARC couldn’t be a viable platform for Free Software, regardless of what people may have to say about register windows!

Testing Times for Free Software and Open Hardware

Tuesday, January 12th, 2016

The last few months haven’t been too kind on Free Software and open hardware initiatives in a number of ways. Here, in a shorter form than one might usually expect from me, are some problematic developments on topics that I may have covered in the past year.

Software Freedom Undervalued

About a couple of months ago, the Software Freedom Conservancy started a fund-raising campaign after it became apparent that companies could not be relied upon to support the organisation’s activities. Since the start of the campaign, many individuals have stepped up and pledged financial support of their own, which is very generous of them, as is the support of enlightened organisations that have offered to match individual contributions.

Sadly, such generosity seems not to be shared by many of the largest companies making money from Free Software and from Linux in particular, and thus from the non-financial contributions that make projects like Linux viable in the first place, with many of those even coming from those same generous individuals who have supported the Conservancy financially. And let us consider for a moment why one prominent umbrella organisation’s members might not want to enforce the GPL, especially given that some of them have been successfully prosecuted for violating that licence, in relation to various Free Software projects, in the past.

The Proprietary Instincts of the BBC

The BBC Micro Bit was a topic covered in the last year, when I indicated a degree of caution about the mistakes of the past being repeated needlessly. And indeed, for some time, everything was being done behind the curtain of a non-disclosure agreement (NDA), meaning that very little information was being made available about the device and accompanying materials, and thus very little could be done by the average member of the public to prepare for the availability of the device, let alone develop their own materials, software, accessories or anything else for it.

Since then, a degree of secrecy has been eliminated, and efforts have been made to get the embedded variant of Python known as Micropython working on the board. However, certain parts of that work still appear to be encumbered by NDA, arguably making the effort of developing Python-related materials something of a social networking exercise. Meanwhile, notorious industry monopolist, Microsoft, somehow managed to muscle in on the initiative and take control of the principally-supported method of developing software with the device. I guess people at the BBC and their friends in politics and business don’t always learn from the mistakes of the past, particularly as they spend other people’s money.

The Walled Garden Party’s Hangover for Free Software Development

Just over twelve months ago, I made some observations about the Python core development group’s attraction to GitHub. It seems that the infatuation with the enthusiastic masses and their inevitable unleashing on Python assets, with the expectation of stimulating an exponential upturn in development activity, will now be gratified through a migration of various Python infrastructure components to the proprietary and centralised service that GitHub offers. (I have my doubts as to whether CPython contribution barriers are really the cause of Python’s current malaise, despite the usual clamour for Git and the associated “network effects” amongst a community of self-proclaimed version control wizards whose powers somehow don’t extend to mastering simple workflows with other tools.)

Anatoly Techtonik makes some interesting points, which will presumably go unheard because those involved have all decided not to listen to him any more. One of the more disturbing ones is that the “comparison shopping” mentality, where Free Software developers abandon their colleagues writing various tools and systems in favour of random corporations offering proprietary stuff at no cost, may well result in the Free Software solutions in such areas becoming seen as uncompetitive and unattractive. What those making such foolish decisions fail to realise is that their own projects can easily get the same treatment, if nobody bothers to see beyond the end of their own nose.

The result of all this is less funding and fewer resources for Free Software projects, with potentially fewer contributions, too, as the attraction of supporting “losing” solutions starts to fade. Community-oriented Free Software is arguably grossly underfunded as it is: we don’t really need other Free Software developers abandoning or undermining their colleagues while ridiculing those colleagues’ “ideological purity“. And, of course, volunteer effort will undoubtedly be burned up in the needless migration to the proprietary solution, setting everyone up for another costly transition down the road, which experience indicates is always more work than anyone anticipated (if they even bothered to think ahead at all).

PayPal: Doesn’t Pay, Not Your Pal

It has been a long time since I wrote about the Neo900 project. Things were looking promising: necessary components had been secured, and everyone was just waiting for Nikolaus to finish his work with the Pyra handheld console. And then we learned that PayPal had decided to hold a significant amount of money as a form of “security”, thus cutting off a vital source of funds for actually doing the work. Apparently, PayPal have a habit of doing this kind of thing, on one reported occasion even taking the opportunity to then offer loans to those people they deliberately put in such a difficult position.

If you supported the Neo900 project and pledged funds via PayPal, you need to tell PayPal to actually pay the project. You know: like the verb in their company name. Otherwise, in the worst case, you may not only not get a Neo900 and not see it developed to completion, but you will also have loaned your money to a large corporation for a substantial period and earned no interest on that involuntary loan, perhaps even incurring fees for the privilege. (So, please see the “How to fix it” section of the relevant article.)

Maybe in 2016, people will become a lot clearer about who their real friends are. Let us hope so!

Random Questions about Fairphone Source Code Availability

Saturday, September 26th, 2015

I was interested to read the recent announcement about source code availability for the first Fairphone device. I’ve written before about the threat to that device’s continued viability and Fairphone’s vague position on delivering a device that properly supports Free Software. It is nice to see that the initiative takes such matters seriously and does not seem to feel that letting its partners serve up what they have lying around is sufficient. However, a few questions arise, starting with the following quote from the announcement:

We can happily say that we have recently obtained a software license from all our major partners and license holders that allows us to modify the Fairphone 1 software and release new versions to our users. Getting that license also required us to obtain rights to use and distribute Mentor Graphics’s RTOS used on the phone. (We want to thank Mentor Graphics in making it possible for us to acquire the distribution license for their RTOS, as well as other partners for helping us achieve this.)

I noted before that various portions of the software are already subject to copyleft licensing, but if we ignore those (and trust that the sources were already being made available), it is interesting to consider the following questions:

  • What is “the Fairphone 1 software” exactly?
  • Fairphone may modify the software but what about its customers?
  • What role does the Mentor Graphics RTOS have? Can it be replaced by customers with something else?
  • Do the rights to use and distribute the RTOS extend to customers?
  • Do those rights extend to the source code of the RTOS, and do those rights uphold the four freedoms?

On further inspection, some contradictions emerge, perhaps most efficiently encapsulated by the following quote:

Now that Fairphone has control over the Fairphone 1 source code, what’s next? First of all, we can say that we have no plans to stop supporting the Fairphone hardware. We will continue to apply security fixes as long as it is feasible for the years to come. We will also keep exploring ways to increase the longevity of the Fairphone 1. Possibilities include upgrading to a more recent Android version, although we would like to manage expectations here as this is still very much a longshot dependent on cooperation from license holders and our own resources.

If Fairphone has control over the source code, why is upgrading to a more recent Android version dependent on cooperation with licence holders? If Fairphone “has control” then the licence holders should already have provided the necessary permissions for Fairphone to actually take control, just as one would experience with the four freedoms. One wonders which permissions have been withheld and whether these are being illegitimately withheld for software distributed under copyleft licences.

With a new device in the pipeline, I respect the persistence of Fairphone in improving the situation, but perhaps the following quote summarises the state of the industry and the struggle for sustainable licensing and licence compliance:

It is rather unusual for a small company like Fairphone to get such a license (usually ODMs get these and handle most of the work for their clients) and it is uncommon that a company attempts and manages to obtain such a license towards the end of the economic life cycle of the product.

Sadly, original design manufacturers (ODMs) have a poor reputation: often being known for throwing binaries over the wall whilst being unable or unwilling to supply the corresponding sources, with downstream manufacturers and retailers claiming that they have no leverage to rectify such licence violations. Although the injustices and hardships of those working to supply the raw materials for products like the Fairphone, along with those of the people working to construct the devices themselves, make other injustices seem slight – thinking especially of those experienced by software developers whose copyright is infringed by dubious industry practices – dealing with unethical and untidy practices wherever they may be found should be part of the initiative’s objectives.

From what I’ve seen and heard, Fairphone 2 should have a better story for chipset support and Free Software, but again, an inspection of the message raises some awkward questions. For example:

In the coming months we are going to launch several programs that address different aspects of creating fairer software. For now, one of the best tools for us to reach these goals is to embrace open source principles. With this in mind and without further ado, we’re excited to announce that we are going to release the complete build environment for Fairphone OS on Fairphone 2, which contains the full open source code, all the tools and the binary blobs that will allow users to build their own Fairphone OS.

To be fair, binary blobs are often difficult to avoid: desktop computers often use them for various devices, and even devices like the Neo900 that emphasise a completely Free Software stack will end up using them for certain functions (mitigating this by employing other technical measures). Making the build environment available is a good thing: frequently, this aspect is overlooked and anyone requesting the source code can be left guessing about build configuration details in an exercise that is effectively a matter of doing the vendor’s licence compliance work for them. But here, we are left wondering where the open source code ends, where binary blobs will be padding out the distribution, and what those blobs are actually for.

We need to keep asking difficult questions about such matters even if what Fairphone is doing is worthy in its own right. Not only does it safeguard the interests of the customers involved, but it also helps Fairphone to focus on doing the right thing. It does feel unkind to criticise what seems like a noble initiative for not doing more when they obviously try rather hard to do the right thing in so many respects. But by doing the right thing in terms of the software as well, Fairphone can uphold its own reputation and credibility: something that all businesses need to remember, as certain very large companies have very recently discovered.

Hardware Experiments with Fritzing

Friday, August 28th, 2015

One of my other interests, if you can even regard it as truly separate to my interests in Free Software and open hardware, involves the microcomputer systems of the 1980s that first introduced me to computing and probably launched me in the direction of my current career. There are many aspects of such systems that invite re-evaluation of their capabilities and limitations, leading to the consideration of improvements that could have been made at the time, as well as more radical enhancements that unashamedly employ technology that has only become available or affordable in recent years. Such “what if?” thought experiments and their hypothetical consequences are useful if we are to learn from the strategic mistakes once made by systems vendors, to have an informed perspective on current initiatives, and to properly appreciate computing history.

At the same time, people still enjoy actually using such systems today, writing new software and providing hardware that makes such continuing usage practical and sustainable. These computers and their peripherals are certainly “getting on”, and acquiring or rediscovering such old systems does not necessarily mean that you can plug them in and they still work as if they were new. Indeed, the lifetime of magnetic media and the devices that can read it, together with issues of physical decay in some components, mean that alternative mechanisms for loading and storing software have become attractive for some users, having been developed to complement or replace the cassette tape and floppy disk methods that those of us old enough to remember would have used “back in the day”.

My microcomputer of choice in the 1980s was the Acorn Electron – a cut-down, less expensive version of the BBC Microcomputer hardware platform – which supported only cassette storage in its unexpanded form. However, some expansion units added the disk interfaces present on the BBC Micro, while others added the ability to use ROM-based software. On the BBC Micro, one would plug ROM chips directly into sockets, and some expansion units for the Electron supported this method, too. The official Plus 1 expansion chose instead to support the more friendly expansion cartridge approach familiar to users of other computing and console systems, with ROM cartridges being the delivery method for games, applications and utilities in this form, providing nothing much more than a ROM chip and some logic inside a convenient-to-use cartridge.

The Motivation

A while ago, my brother, David, became interested in delivering software on cartridge for the Electron, and a certain amount of discussion led him to investigate various flash memory integrated circuits (ICs, chips), notably the AMD Am29F010 series. As technological progress continues, such devices provide a lot of storage in comparison to the ROM chips originally used with the Electron: the latter having only 16 kilobytes of capacity, whereas the Am29F010 variant chosen here has a capacity of 128 kilobytes. Meanwhile, others chose to look at EEPROM chips, notably the AT28C256 from Atmel.

Despite the manufacturing differences, both device types behave in a very similar way: a good idea for the manufacturers who could then sell products that would be compatible straight away with existing products and the mechanisms they use. In short, some kind of de-facto standard seems to apply to programming these devices, and so it should be possible to get something working with one and then switch to the other, especially if one kind becomes too difficult to obtain.

Now, some people realised that they could plug such devices into their microcomputers and program them “in place” using a clever hack where writes to the addresses that correspond to the memory provided by the EEPROM (or, indeed, flash memory device) in the computer’s normal memory map can be trivially translated into addresses that have significance to the EEPROM itself. But not routinely using such microcomputers myself, and wanting more flexibility in the programming of such devices, not to mention also avoiding the issue of getting software onto such computers so that it can be written to such non-volatile memory, it seemed like a natural course of action to try to do the programming with the help of some more modern conveniences.

And so I considered the idea of getting a microcontroller solution like the Arduino to do the programming work. Since an Arduino can be accessed over USB, a ROM image could be conveniently transferred from a modern computer and, with a suitable circuit wired up, programmed into the memory chip. ROM images can thus be obtained in the usual modern way – say, from the Internet – and then written straight to the memory chip via the Arduino, rather than having to be written first to some other medium and transferred through a more convoluted sequence of steps.

Breadboarding

Being somewhat familiar with Arduino experimentation, the first exercise was to make the circuit that can be used to program the memory device. Here, the first challenge presented itself: the chip employs 17 address lines, 8 data lines, and 3 control lines. Meanwhile, the Arduino Duemilanove only provides 14 digital pins and 6 analogue pins, with 2 of the digital pins (0 and 1) being unusable if the Arduino is communicating with a host, and another (13) being connected to the LED and being seemingly untrustworthy. Even with the analogue pins in service as digital output pins, only 17 pins would be available for interfacing.

The pin requirements
Arduino Duemilanove Am29F010
11 digital pins (2-12) 17 address pins (A0-A16)
6 analogue pins (0-6) 8 data pins (DQ0-DQ7)
3 control pins (CE#, OE#, WE#)
17 total 28 total

So, a way of multiplexing the Arduino pins was required, where at one point in time the Arduino would be issuing signals for one purpose, these signals would then be “stored” somewhere, and then at another point in time the Arduino would be issuing signals for another purpose. Ultimately, these signals would be combined and presented to the memory device in a hopefully coherent fashion. We cannot really do this kind of multiplexing with the control signals because they typically need to be coordinated to act in a timing-sensitive fashion, so we would be concentrating on the other signals instead.

So which signals would be stored and issued later? Well, with as many address lines needing signals as there are available pins on the Arduino, it would make sense to “break up” this block of signals into two. So, when issuing an address to the memory device, we would ideally be issuing 17 bits of information all at the same time, but instead we take approximately half of the them (8 bits) and issue the necessary signals for storage somewhere. Then, we would issue the other half or so (8 bits) for storage. At this point, we need only a maximum of 8 signal lines to communicate information through this mechanism. (Don’t worry, I haven’t forgotten the other address bit! More on that in a moment!)

How would we store these signals? Fortunately, I had considered such matters before and had ordered some 74-series logic chips for general interfacing, including 74HC273 flip-flop ICs. These can be given 8 bits of information and will then, upon command, hold that information while other signals may be present on its input pins. If we take two of these chips and attach their input pins to those 8 Arduino pins we wish to use for communication, we can “program” each 74HC273 in turn – one with 8 bits of an address, the other with another 8 bits – and then the output pins will be presenting 16 bits of the address to the memory chip. At this point, those 8 Arduino pins could even be doing something else because the 74HC273 chips will be holding the signal values from an earlier point in time and won’t be affected by signals presented to their input pins.

Of all the non-control signals, with 16 signals out of the way, that leaves only 8 signals for the memory chip’s data lines and that other address signal to deal with. But since the Arduino pins used to send address signals are free once the addresses are sent, we can re-use those 8 pins for the data signals. So, with our signal storage mechanism, we get away with only using 8 Arduino pins to send 24 pieces of information! We can live with allocating that remaining address signal to a spare Arduino pin.

Address and data pins
Arduino Duemilanove 74HC273 Am29F010
8 input/output pins 8 output pins 8 address pins (A0-A7)
8 output pins 8 address pins (A8-A15)
8 data pins (DQ0-DQ7)
1 output pin 1 address pin (A16)
9 total 25 total

That now leaves us with the task of managing the 3 control signals for the memory chip – to make it “listen” to the things we are sending to it – but at the same time, we also need to consider the control lines for those flip-flop ICs. Since it turns out that we need 1 control signal for each of the 74HC273 chips, we therefore need to allocate 5 additional interfacing pins on the Arduino for sending control signals to the different chips.

The final sums
Arduino Duemilanove 74HC273 Am29F010
8 input/output pins 8 output pins 8 address pins (A0-A7)
8 output pins 8 address pins (A8-A15)
8 data pins (DQ0-DQ7)
1 output pin 1 address pin (A16)
3 output pins 3 control pins (CE#, OE#, WE#)
2 output pins 2 control pins (CP for both ICs)
14 total 28 total

In the end, we don’t even need all the available pins on the Arduino, but the three going spare wouldn’t be enough to save us from having to use the flip-flop ICs.

With this many pins in use, and the need to connect them together, there are going to be a lot of wires in use:

The breadboard circuit with the Arduino and ICs

The breadboard circuit with the Arduino and ICs

The result is somewhat overwhelming! Presented in a more transparent fashion, and with some jumper wires replaced with breadboard wires, it is slightly easier to follow:

An overview of the breadboard circuit

An overview of the breadboard circuit

The orange wires between the two chips on the right-hand breadboard indicate how the 8 Arduino pins are connected beyond the two flip-flop chips and directly to the flash memory chip, which would sit on the left-hand breadboard between the headers inserted into that breadboard (which weren’t used in the previous arrangement).

Making a Circuit Board

It should be pretty clear that while breadboarding can help a lot with prototyping, things can get messy very quickly with even moderately complicated circuits. And while I was prototyping this, I was running out of jumper wires that I needed for other things! Although this circuit is useful, I don’t want to have to commit my collection of components to keeping it available “just in case”, but at the same time I don’t want to have to wire it up when I do need it. The solution to this dilemma was obvious: I should make a “proper” printed circuit board (PCB) and free up all my jumper wires!

It is easy to be quickly overwhelmed when thinking about making circuit boards. Various people recommend various different tools for designing them, ranging from proprietary software that might be free-of-charge in certain forms but which imposes arbitrary limitations on designs (as well as curtailing your software freedoms) through to Free Software that people struggle to recommend because they have experienced stability or functionality deficiencies with it. And beyond the activity of designing boards, the act of getting them made is confused by the range of services in various different places with differing levels of service and quality, not to mention those people who advocate making boards at home using chemicals that are, shall we say, not always kind to the skin.

Fortunately, I had heard of an initiative called Fritzing some time ago, initially in connection with various interesting products being sold in an online store, but whose store then appeared to be offering a service – Fritzing Fab – to fabricate individual circuit boards. What isn’t clear, or wasn’t really clear to me straight away, was that Fritzing is also some Free Software that can be used to design circuit boards. Conveniently, it is also available as a Debian package.

The Fritzing software aims to make certain tasks easy that would perhaps otherwise require a degree of familiarity with the practice of making circuit boards. For instance, having decided that I wanted to interface my circuit to an Arduino as a shield which sits on top and connects directly to the connectors on the Arduino board, I can choose an Arduino shield PCB template in the Fritzing software and be sure that if I then choose to get the board made, the dimensions and placement of the various connections will all be correct. So for my purposes and with my level of experience, Fritzing seems like a reasonable choice for a first board design.

Replicating the Circuit

Fritzing probably gets a certain degree of disdain from experienced practitioners of electronic design because it seems to emphasise the breadboard paradigm, rather than insisting that a proper circuit diagram (or schematic) acts as the starting point. Here is what my circuit looks like in Fritzing:

The breadboard view of my circuit in Fritzing

The breadboard view of my circuit in Fritzing

You will undoubtedly observe that it isn’t much tidier than my real-life breadboard layout! Having dragged a component like the Arduino Uno (mostly compatible with the Duemilanove) onto the canvas along with various breadboards, and then having dragged various other components onto those breadboards, all that remains is that we wire them up like we managed to do in reality. Here, Fritzing helps out by highlighting connections between things, so that breadboard columns appear green as wires are connected to them, indicating that an electrical connection is made and applies to all points in that column on that half of the breadboard (the upper or lower half as seen in the above image). It even highlights things that are connected together according to the properties of the device, so that any attempt to modify to a connection that leads to one of the ground pins on the Arduino also highlights the other ground pins as the modification is being done.

I can certainly understand criticism of this visual paradigm. Before wiring up the real-life circuit, I typically write down which things will be connected to each other in a simple table like this:

Example connections
Arduino 74HC273 #1 74HC273 #2 Am29F010
A5 CE#
A4 OE#
A3 WE#
2 CP
3 CP
4 D3 D3 DQ3

If I were not concerned with prototyping with breadboards, I would aim to use such information directly and not try and figure out which size breadboard I might need (or how many!) and how to arrange the wires so that signals get where they need to be. When one runs out of points in a breadboard column and has to introduce “staging” breadboards (as shown above by the breadboard hosting only incoming and outgoing wires), it distracts from the essential simplicity of a circuit.

Anyway, once the circuit is defined, and here it really does help that upon clicking on a terminal/pin, the connected terminals or pins are highlighted, we can move on to the schematic view and try and produce something that makes a degree of sense. Here is what that should look like in Fritzing:

The schematic for the circuit in Fritzing

The schematic for the circuit in Fritzing

Now, the observant amongst you will notice that this doesn’t look very tidy at all. First of all, there are wires going directly between terminals without any respect for tidiness whatsoever. The more observant will notice that some of the wires end in the middle of nowhere, although on closer inspection they appear to be aimed at a pin of an IC but are shifted to the right on the diagram. I don’t know what causes this phenomenon, but it would seem that as far as the software is concerned, they are connected to the component. (I will come back to how components are defined and the pitfalls involved later on.)

Anyway, one might be tempted to skip over this view and try and start designing a PCB layout directly, but I found that it helped to try and tidy this up a bit. First of all, the effects of the breadboard paradigm tend to manifest themselves with connections that do not really reflect the logical relationships between components, so that an Arduino pin that feeds an input pin on both flip-flop ICs as well as a data pin on the flash memory IC may have its connectors represented by a wire first going from the Arduino to one of the flip-flop ICs, then to the other flip-flop IC, and finally to the flash memory IC in some kind of sequential wiring. Although electrically this is not incorrect, with a thought to the later track routing on a PCB, it may not be the best representation to help us think about such subsequent problems.

So, for my own sanity, I rearranged the connections to “fan out” from the Arduino as much as possible. This was at times a frustrating exercise, as those of you with experience with drawing applications might recognise: trying to persuade the software that you really did select a particular thing and not something else, and so on. Again, selecting the end of a connection causes some highlighting to occur, and the desired result is that selecting a terminal highlights the appropriate terminals on the various components and not the unrelated ones.

Sometimes that highlighting behaviour provides surprising and counter-intuitive results. Checking the breadboard layout tends to be useful because Fritzing occasionally thinks that a new connection between certain pins has been established, and it helpfully creates a “rats nest” connection on the breadboard layout without apparently saying anything. Such “rats nest” connections are logical connections that have not been “made real” by the use of a wire, and they feature heavily in the PCB view.

PCB Layout

For those of us with no experience of PCB layout who just admire the PCBs in everybody else’s products, the task of laying out the tracks so that they make electrical sense is a daunting one. Fritzing will provide a canvas containing a board and the chosen components, but it is up to you to combine them in a sensible way. Here, the circuit board actually corresponds to the Arduino in the breadboard and schematic views.

But slightly confusing as the depiction of the Arduino is in the breadboard view, the pertinent aspects of it are merely the connectors on that device, not the functionality of the device itself which we obviously aren’t intending to replicate. So, instead of the details of an actual Arduino or its functional equivalent, we instead merely see the connection points required by the Arduino. And by choosing a board template for an Arduino shield, those connection points should appear in the appropriate places, as well as the board itself having the appropriate size and shape to be an Arduino shield.

Here’s how the completed board looks:

The upper surface of the PCB design in Fritzing

The upper surface of the PCB design in Fritzing

Of course, I have spared you a lot of work by just showing the image above. In practice, the components whose outlines and connectors feature above need to be positioned in sensible places. Then, tracks need to be defined connecting the different connection points, with dotted “rats nest” lines directly joining logically-connected points needing to be replaced with physical wiring in the form of those tracks. And of course, tracks do not enjoy the same luxury as the wires in the other views, of being able to cross over each other indiscriminately: they must be explicitly routed to the other side of the board, either using the existing connectors or by employing vias.

The lower surface of the PCB design in Fritzing

The lower surface of the PCB design in Fritzing

Hopefully, you will get to the point where there are no more dotted lines and where, upon selecting a connection point, all the appropriate points light up, just as we saw when probing the details of the other layouts. To reassure myself that I probably had connected everything up correctly, I went through my table and inspected the pin-outs of the components and did a kind of virtual electrical test, just to make sure that I wasn’t completely fooling myself.

With all this done, there isn’t much more to do before building up enough courage to actually get a board made, but one important step that remains is to run the “design checks” via the menu to see if there is anything that would prevent the board from working correctly or from otherwise being made. It can be the case that tracks do cross – the maze of yellow and orange can be distracting – or that they are too close and might cause signals to go astray. Fortunately, the hours of planning paid off here and only minor adjustments needed to be done.

It should be noted that the exercise of routing the tracks is certainly not to be underestimated when there are as many connections as there are above. Although an auto-routing function is provided, it failed to suggest tracks for most of the required connections and produced some bizarre routing as well. But clinging onto the memory of a working circuit in real three-dimensional space, along with the hope that two sides of a circuit board are enough and that there is enough space on the board, can keep the dream of a working design alive!

The Components

I skipped over the matter of components earlier on, and I don’t really want to dwell on the matter too much now, either. But one challenge that surprised me given the selection of fancy components that can be dragged onto the canvas was the lack of a simple template for a 32-pin DIP (dual in-line package) socket for the Am29F010 chip. There were socket definitions of different sizes, but it wasn’t possible to adjust the number of pins.

Now, there is a parts editor in Fritzing, but I tend to run away from graphical interfaces where I suspect that the matter could be resolved in more efficient ways, and it seems like other people feel the same way. Alongside the logical definition of the component’s connectors, one also has to consider the physical characteristics such as where the connectors are and what special markings will be reproduced on the PCB’s silk-screen for the component.

After copying an existing component, ransacking the Fritzing settings files, editing various files including those telling Fritzing about my new parts, I achieved my modest goals. But I would regard this as perhaps the weakest part of the software. I didn’t resort to doing things the behind-the-scenes way immediately, but the copy-and-edit paradigm was incredibly frustrating and doesn’t seem to be readily documented in a way I could usefully follow. There is a Sparkfun tutorial which describes things at length, but one cannot help feeling that a lot of this should be easier, especially for very simple component changes like the one I needed.

The Result

With some confidence and only modest expectations of success, I elected to place an order with the Fritzing Fab service and to see what the result would end up like. This was straightforward for the most part: upload the file created by Fritzing, fill out some details (albeit not via a secure connection), and then proceed to payment. Unfortunately, the easy payment method involves PayPal, and unfortunately PayPal wants random people like myself to create an account with them before they will consider letting me make a credit card payment, which is something that didn’t happen before. Fortunately, the Fritzing people are most accommodating and do support wire transfers as an alternative payment method, and they were very responsive to my queries, so I managed to get an order submitted even more quickly than I thought might happen (considering that fabrication happens only once a week).

Just over a week after placing my order, the board was shipped from Germany, arriving a couple of days later here in Norway. Here is what it looked like:

The finished PCB from Fritzing

The finished PCB from Fritzing

Now, all I had to do was to populate the board and to test the circuit again with the Arduino. First, I tested the connections using the Arduino’s 5V and GND pins with an LED in series with a resistor in an “old school” approach to the problem, and everything seemed to be as I had defined it in the Fritzing software.

Given that I don’t really like soldering things, the act of populating the board went about as well as expected, even though I could still clean up the residue from the solder a bit (which would lead me onto a story about buying the recommended chemicals that I won’t bother you with). Here is the result of that activity:

The populated board together with the Arduino

The populated board together with the Arduino

And, all that remained was the task of getting my software running and testing the circuit in its new form. Originally, I was only using 16 address pins, holding the seventeenth low, and had to change the software to handle these extended addresses. In addition, the issuing of commands to the flash memory device probably needed a bit of refinement as well. Consequently, this testing went on for a bit longer than I would have wished, but eventually I managed to successfully replicate the programming of a ROM image that had been done some time ago with the breadboard circuit.

The outcome did rely on a certain degree of good fortune: the template for the Arduino Uno is not quite compatible with the Duemilanove, but this was rectified by clipping two superfluous pins from one of the headers I soldered onto the board; two of the connections belonging to the socket holding the flash memory chip touch the outside of the plastic “power jack” socket, but not enough to cause a real problem. But I would like to think that a lot of preparation dealt with problems that otherwise might have occurred.

Apart from liberating my breadboards and wires, this exercise has provided useful experience with PCB design. And of course, you can find the sources for all of this in my repository, as well as a project page for the board on the Fritzing projects site. I hope that this account of my experiences will encourage others to consider trying it out, too. It isn’t as scary as it would first appear, after all, although I won’t deny that it was quite a bit of work!

New Fairphone, New Features, Same Old Software Story?

Saturday, August 15th, 2015

I must admit that I haven’t been following Fairphone of late, so it was a surprise to see that vague details of the second Fairphone device have been published on the Fairphone Web site. One aspect that seems to be a substantial improvement is that of hardware modularity. Since the popularisation of the notion that such a device could be built by combining functional units as if they were simple building blocks, with a lot of concepts, renderings and position statements coming from a couple of advocacy initiatives, not much else has actually happened in terms of getting devices out for people to use and develop further. And there are people with experience of designing such end-user products who are sceptical about the robustness and economics of such open-ended modular solutions. To see illustrations of a solution that will presumably be manufactured takes the idea some way along the road to validation.

If it is possible to, say, switch out the general-purpose computing unit of the Fairphone with another one, then it can be said that even if the Fairphone initiative fails once again to deliver a software solution that is entirely Free Software, perhaps because the choice of hardware obliges the initiative to deliver opaque “binary-only” payloads, then the opportunity might be there for others to deliver a bottom-to-top free-and-open solution as a replacement component. But one might hope that it should not be necessary to “opt in” to getting a system whose sources can be obtained, rebuilt and redeployed: that the second Fairphone device might have such desirable characteristics out of the box.

Now, it does seem that Fairphone acknowledges the existence and the merits of Free Software, at least in very broad terms. Reading the support site provides us with an insight into the current situation with regard to software freedom and Fairphone:

Our goal is to take a more open source approach to be able to offer owners more choice and control over their phone’s OS. For example, we want to make the source code available to the developer community and we are also in discussions with other OS vendors to look at the possibility of offering alternative operating systems for the Fairphone 2. However, at the moment there are parts of the software that are owned or licensed by third parties, so we are still investigating the technical and legal requirements to accomplish our goals of open software.

First of all, ignoring vague terms like “open software” that are susceptible to “openwashing” (putting the label “open” on something that really isn’t), it should be noted that various parts of the deployed software will, through their licensing, oblige the Fairphone initiative to make the corresponding source code available. This is not a matter that can be waved away with excuses about people’s hands being tied, that it is difficult to coordinate, or whatever else the average GPL-violating vendor might make. If copyleft-licensed code ships, the sources must follow.

Now there may also be proprietary software on the device (or permissively-licensed software bearing no obligation for anyone to release the corresponding source, which virtually amounts to the same thing) and that would clearly be against software freedom and should be something Fairphone should strongly consider avoiding, because neither end-users nor anyone who may wish to help those users would have any control over such software, and they would be completely dependent on the vendor, who in turn would be completely dependent on their suppliers, who in turn might suddenly not care about the viability of that software or the devices on which it is deployed. So much for sustainability under such circumstances!

As I noted before, having control over the software is not a perk for those who wish to “geek out” over the internals of a product: it is a prerequisite for product viability, longevity and sustainability. Let us hope that Fairphone can not only learn and apply the lessons from their first device, which may indeed have occurred with the choice of a potentially supportable chipset this time around, but that the initiative can also understand and embrace their obligations to those who produced the bulk of their software (as well as to their customers) in a coherent and concrete fashion. It would be a shame if, once again, an unwillingness to focus on software led to another missed opportunity, and the need for another version of the device to be brought to market to remedy deficiencies in what is otherwise a well-considered enterprise.

Now, if only Fairphone could organise their Web site in a more coherent fashion, putting useful summaries of essential information in obvious places instead of being buried in some random forum post