Paul Boddie's Free Software-related blog


Archive for the ‘Free Software’ Category

Making a Board for the PIC32 VGA Signal Generation Project

Tuesday, February 22nd, 2022

Although I have tried to maintain an interest in computing hardware over the last two years or so, both contemporary and somewhat older technologies, I haven’t really had much time or energy to devote to electronics projects. However, I was becoming increasingly bothered by my existing VGA signal generation project taking up space on a couple of solderless breadboards, demanding lots of jumper wires, and generally not helping me organise and consolidate my collection of electronics-related acquisitions, components, and so on. I was also feeling a bit bad for not moving the VGA project to the next step, which is what I had virtually promised to do. Having acquired a new computer in 2020 but not having really looked at KiCad since migrating to the new machine, I finally picked up the courage to set out translating my existing notes into a proper circuit diagram, reacquainting myself with KiCad’s peculiarities, and then translating this into an actual board layout.

My priorities for a circuit board were to support signal generation and programming, gathering together all the different passive components associated with signal generation – the resistors for combining colour intensity levels – plus all the components associated with programming the microcontroller – more resistors – and putting them on the board with a socket for the microcontroller. Unlike boards trying to replicate the Arduino experience, this board would not seek to offer programming facilities itself: that would remain the job of another board, with the Arduino Duemilanove already doing this job adequately under the control of the Nanu Nanu software. Headers would therefore expose the programming pins for connection to the Arduino or other device.

Nor would the board offer its own VGA connector for the VGA signals: I already have a VGA connector terminal block, and mounting a DE-15 connector on a board raises issues of selecting an appropriate component, defining a usable component footprint, making room on the board for it, and ensuring that the connector is securely attached without risking stress on the board or solder joints. Besides, the idea was to make use of my existing parts, and by retaining use of the terminal block, I could just use a header for the VGA signal outputs which would also require a modest amount of space. I envisaged that any future solution with a proper socket for a VGA cable might well involve a separate board securely mounted to a case, with the case taking the strain of cable insertions and removals, and with cables connecting this separate board to the board being designed here.

One issue when making a board that I could easily imagine stalling the process is that of the size and shape of any given board. With something resembling a miniature computing system, one might get tempted by selecting a broadly-adopted circuit board profile like PC/104 or one of the ITX family, or perhaps leveraging the deluge of Raspberry Pi cases available for one of those products (despite their variations and incompatibilities). However, my choice was practically made for me: I already had one Arduino case that I hadn’t used for anything, and it seemed to me that the traditional Arduino board profile was appropriately sized and would not be particularly inconvenient to adopt, although the actual physical characteristics were not adequately defined, leaving it to others to do the work of documentation that should have been done by the Arduino initiative right from the start. (There are subtle differences between Duemilanove models that have actually made some cases incompatible with some versions of that board, so this was a real problem.)

Anyway, with the physical and electrical constraints mostly figured out, I laid out the board, put headers in places I considered sensible enough, not aiming for Arduino shield compatibility since I needed more flexibility than that would demand. I also wanted to add support for various other useful features including some that I had already tested, such as UART communication, together with some I had not been able to adequately investigate, such as the driving of peripherals (or the microcontroller itself) over the parallel bus and USB connectivity. The latter involved messing around with the differential pair routing features of KiCad, but it is entirely possible that such features are largely superfluous at the transmission frequencies this board would end up using. After much checking, rechecking, agonising, and so on, I uploaded the board design to OSHPark and placed an order.

Several weeks later, I received the boards in the post, thankfully without any customs fees or other charges. In the meantime, I had also ordered components that I needed from a supplier in the UK, Technobots, who happen to sell logic chips and other things that suppliers targeting the “maker” community tend not to bother selling. Thankfully also in this case, the components arrived without incurring fees and charges, although I had kept the value of my order fairly low. In Norway, the industrial lobby hate to see people importing things, often taking the tone that people should buy locally produced goods, but since nobody makes these things here, anyone “local” that sells them has to import them, and the result is often just the cheapest stuff at ridiculously high prices and some middlemen making all the money, rather than anyone earning a decent living out of actually producing anything. But anyway.

The boards themselves were well made, as I have seen before, although I was disappointed with the “tabs” around the edge. When boards get made, people will tell you that they are all put on a big panel together and that they therefore need to be attached to each other somehow. The manufacturer will then typically spell out that apart from separating the boards by snapping them apart, they don’t do any further finishing work on the edges because we, the customers, aren’t paying for that level of service. However, I don’t remember the remnants of these inter-board connections – the tabs – being so awkward on previous boards from OSHPark. If I had to do anything before, I’m sure it just involved some gentle trimming, but these needed the application of glasspaper to grind down the tabs to be level with the actual board edge. Given the sub-millimetre tolerances involved in board fabrication, I find it perverse that such finishing would fall to someone – me or someone in the factory – to do this by hand.

The fabricated boards

The fabricated boards, with the annoying “tabs” visible around the edges.

Having tested the continuity between different pads using a simple power plus LED plus resistor arrangement, all that remained was to get the soldering equipment out and to build myself up to the usually arduous task of fitting all the headers, resistors, capacitors and the socket. This was something of a trial in itself, but I eventually remembered the various tricks needed to coerce my soldering iron to apply the solder in a barely decent way.

The finished board

The finished board connected to the VGA terminal block and Arduino Duemilanove.

One thing I realised very quickly was that I had chosen the wrong component footprint for the resistors. In navigating the rather unreadable list provided by the inflexible component assignment dialogue within KiCad, I had chosen footprints with the pads not being sufficiently separated for the resistors I happen to have (and ones that are likely to be sold for these purposes). So, I had to raise the resistors up from the board and tuck the leads under them slightly. Despite the additional height occupied by the resistors, the capacitors and headers are already rather tall and so this workaround causes no other problems. I also discovered that placing the resistors in the compact arrangement shown, while very neat and tidy, made all the unsoldered leads rather crowded on the underside of the board and rather increased the risk of accidental solder bridging between components, at least with my soldering technique. The capacitors had appropriate footprints, however, and were less of a challenge.

What followed was another trial, mostly caused by a simple wiring up error, and lots of troubleshooting involving the programming software and circuits both on this board and on the breadboard, chasing down a non-existent programming issue. The programming circuit was actually fine, and the programming was actually being performed as normal, but in maintaining a level of doubt about the outcome of the exercise, I had run the verification operation of the programming software and had seen that it was consistently complaining about a programming error. Worryingly, this error now seemed to occur both on the manufactured board and on the breadboard, for both a microcontroller that I had used before and a completely unused one. I started to test and investigate different versions of the programming software to see if any regressions had occurred.

Eventually, I realised that the error was related to the last thing the software would do when programming the microcontroller and was related to setting a configuration register. Ignoring this and reminding myself of the UART configuration, I sought to test the example program demonstrating UART communication and found, eventually, that it worked on the breadboard. It also worked on the manufactured board, too, raising my confidence slightly. And then, while perusing my previous blog post, I realised that I had connected the VGA sync signal pins the wrong way round! Programming the VGA example program and fixing the wiring finally got me to where I should have been to begin with.

The board generating a VGA signal.

The board generating a VGA signal via a terminal block and cable, with the Arduino supplying power and acting as programmer.

The arrangement of boards required here is a bit cumbersome, although the Arduino could be replaced by a simple 3.3V power source if the appropriate software has already been programmed into the PIC32 microcontroller. As noted above, an auxiliary board could provide a VGA socket for a cable and thus eliminate the bulky terminal connector.

The three-board arrangement.

The three-board arrangement of VGA terminal block, VGA signal board, and Arduino Duemilanove.

As far as using an existing Arduino case is concerned, there are some obvious limitations and possibilities for improvement and refinement. The size and shape of the manufactured board is compatible with the case I happen to have, but where an Arduino shield would be stacked on top of the Duemilanove directly, this board necessitates that the connections be routed using wires or cables between the Arduino’s analogue header and the programming header on this board. Having mounted female headers on this board, this involves routing cables around the edge of the board, and the lack of clearance between the Arduino’s headers and the underside of this board means that there is not enough space to use jumper wires. Fortunately, breadboard jumper cables can do this job, if not entirely elegantly.

The board and the Duemilanove in a case.

The board and the Duemilanove in a case, with the cables routing programming and power signals between the boards.

Introducing a degree of shield compatibility might be a solution to such problems, but perhaps I should not be too hard on myself. The new computer I bought in 2020 has its own compartment for cables, and this compartment is rammed full of seemingly needless, bulky cabling, presumably demonstrating the designers’ aptitude for cable management as they neglected other basic functionality one would expect from a computer case in the third decade of the twenty-first century.

There are other modifications that I might make in a second version of this board. Challenged by the layout issues, I neglected to realise that if the VGA signal resistors are fitted, some of the parallel bus pins will have their own signals mixed with others: I should have realised this and specified diodes for the various signal lines concerned. Still, with spare boards and another microcontroller to play with, I could possibly just make up a board without those resistors if I wanted to experiment with parallel mode. Since this would most likely involve driving a screen, it wouldn’t make sense to try and support that and VGA output on the same board, although I could imagine some kind of switching solution disabling one and enabling the other as appropriate.

In reflection, the outcome is satisfactory. I did spend rather too long designing, building and – especially – troubleshooting the board, and there are definitely things that are obviously wrong with it, but it has now allowed me to dismantle the original breadboard circuit, free up some jumper wires and components, and to put all of those things away properly. It has also helped me become familiar with KiCad once again, encouraged me to document more schematics, and opened the door to doing more circuit design in the future. In any case, I hope that this was a somewhat interesting view into the realisation of a long overdue project.

Some thoughts about technological sustainability

Saturday, February 12th, 2022

It was interesting to see an apparently recent article “On the Sustainability of Free Software” published by the FSFE in the context of the Upcycling Android campaign. I have been advocating for sustainable Free Software for some time. When I wasn’t posting articles about my own Python-like language, electronics projects or microkernel-based system development, it seems that I was posting quite a few about sustainable software, hardware and technology like these:

So, I hardly feel it necessary to go back through much of the same material again. Frustratingly, very little has improved over the years, it would seem: some new initiatives emerge, and such things always manage to excite some people, but the same old underlying causes of a general lack of sustainability remain, these including access to affordable, long-lasting and supportable hardware, and the properly funded development of Free Software and the hardware that would run it.

Of course, I wouldn’t even be bothered to write this if I didn’t feel that there might be some positive insights to share, and recent events have prompted me to do so. Hopefully, I can formulate them concisely and constructively in the following paragraphs.

The Open Hardware Crisis

Alright, so that was a provocative heading – hardly positive or constructive – and with so much hardware hacking (of the good kind) going on these days, it might be tempting to ask “what crisis?” Well, evidently, some people think there has been a crisis around the certification of hardware by the Free Software Foundation: that the Respects Your Freedom criteria don’t really help get hardware designed and made that would support (or be supported by) Free Software; that the criteria fail to recognise practical limitations with some elements of hardware, imposing rigid and wasteful measures that fail to enhance the rights of users and that might even impair the longevity and repairability of devices.

A lot of the hardware we rely on nowadays depends on features that cannot easily be supported by Free Software. The system I use has integrated graphics that require proprietary firmware from AMD to work in any half-decent way, as do many processors and interfacing chips, some not even working in any real way at all without it. Although FPGA technology has become more accessible and has invigorated the field of open hardware, there are still considerable challenges around the availability of Free Software toolchains for those kinds of devices. It is also not completely clear how programmable logic devices intersect with the realm of Free Software, either, as far as I can tell. Should people expect the corresponding source code and the means to generate a “bitstream” for the FPGAs in a system? I would think so, given appropriate licensing, but I am not familiar enough with the legal and regulatory constraints to allow me to expect it to be so.

The discussion around this may sound like a storm in a teacup, especially if you do not follow the appropriate organisations, figures, and their mailing lists (and I recommend saving your time and not bothering to, either), but it also sounds rather like tactics have prevailed over strategy. The fact is that without hardware being made to run Free Software, there isn’t really going to be much of a Free Software movement. So, instead of recommending increasingly ancient phones as “ethical gifts” or hoping that the latest crop of “Linux phones” will deliver a package that will not only run Linux but also prove to be usable as a phone, maybe a move away from consumerism is advised. Consumerism, of course, being the tendency to solve every problem by choosing the “best deal” the market happens to be offering today.

What the likes of the FSF need to do is to invest in hardware platforms that are amenable to the deployment of Free Software. This does not necessarily mean totally rejecting hardware if it has unfortunate characteristics such as proprietary firmware, particularly if there is no acceptable alternative, but the initiative has to start somewhere, however imperfect that somewhere might be. Much as we would all like to spend thousands of dollars on hardware that meets some kind of liberty threshold, most of us don’t have that kind of money and would accept some kind of compromise (just as we have to do most of the time, anyway), especially if we felt there was a chance to make up for any deficiencies later on. Trying to start from an impossible position means that there is no “here and now”, never mind “later on”.

Sadly, several attempts at open hardware platforms have struggled and could not be sustained. Some of these were criticised for having some supposed flaw or other that apparently made them unacceptable to the broader Free Software community, and yet they could have led to products that might have remedied such supposed flaws. Meanwhile, consumerist instincts had all the money chasing the latest projects, and yet here we are in 2022, barely any better off than we were in 2012. Had the FSF and company actually supported hardware projects that sought to support their own vision, as opposed to just casually endorsing projects and hoping they came good, we might be in better place by now.

The Ethical Software Crisis

One depressingly recurring theme is the lack of support given to Free Software developers and to Free Software development, even as billion-to-trillion-dollar corporations bank substantial profits on the back of Free Software. As soon as some random developer deletes his JavaScript package from some repository or other, or even switches it out for something that breaks the hyperactive “continuous integration” of hundreds or thousands of projects, everyone laments the fragility of “the system” and embeds that XKCD cartoon with the precarious structure that you’ve all presumably seen. Business-as-usual, however, is soon restored.

Many of the remedies for overworked, underpaid, burned-out developers have the same, familiar consumerist or neoliberal traits. Trait number one is, of course, to let a million projects bloom, carefully selecting the winners and discarding any that fail to keep up with the constant technological churn that also plagues our societies. Beyond that, things like bounties and donations are proposed, and funding platforms helpfully materialise to facilitate the transaction, themselves mostly funded not by bounties and donations (other than the cut of other people’s bounties and donations, of course) but by venture capital money. Because who would actually want to be going from one “gig” to the next when they could actually get a salary?

And it is revealing that organisations engaging in Free Software development tend to have an enthusiasm not for hiring actual developers but for positions like “community manager”, frequently with the responsibility for encouraging contributions from that desirable stream of eager volunteers. Alongside this, funding is sought from a variety of sources, some of whom being public institutions or progressive organisations perhaps sensing a growing crisis and feeling that something should be done. Other sources are perhaps more about doing “philanthropic work” on behalf of wealthy patrons, although I think I would think twice about taking money from people whose wealth has been built on the back of facilitating psychological warfare on entire populations, undermining public health policies and climate change mitigation, enabling inter-ethnic violence and hate generally, and providing a broadcasting platform for extremists. But as they say, beggars can’t be choosers, right?

It may, of course, be argued that big companies are big employers of Free Software developers. Certainly, lots of people seem to work on Free Software projects in companies like, ahem, Blue Hat. And some of that corporate development does deliver usable software, or at least it helps to mitigate the usability issues of the software being produced elsewhere, maybe in another part of the very same corporation. Large, stable organisations may well be the key to providing developers with secure incomes and the space to focus on producing high-quality, well-designed, long-lasting software. Then again, such organisations sometimes exhibit ethical deficiencies in their own collective activities by seeking to aggressively protect revenue streams by limiting interoperability, reduce costs through offshoring, assert patents against others, and impose needless technological change on their customers and the broader market simply to achieve a temporary competitive advantage.

Free Software organisations should be advocating for quality, stable employment for software developers. For too long, Free Software has been perceived as something for nothing where “everybody else” pays, even as organisations and individuals happily pay substantial sums for hardware and for proprietary software. Deferring to “the market” does nobody any good in the end: “the market” will only pay for what it absolutely has to, and businesses doing nicely selling solutions (who might claim that “the market” works for them and should be good enough for everyone) all too frequently rely on practically invisible infrastructure projects that they get for free. It arguably doesn’t matter if it would be public institutions, as opposed to businesses, ending up hiring people as long as they get decent contracts and aren’t at risk of all being laid off because some right-wing government wants to slash taxes for rich people, as tends to happen every few years.

And Free Software organisations should be advocating for ethical software development. Although the public mood in general may lag rather too far behind that of more informed commentary, the awareness many of us have of the substantial ethical concerns around various applications of computing – artificial intelligence, social media/manipulation platforms, surveillance, “cryptocurrencies”, and so on – requires us to uphold our principles, to recognise where our own principles fall short, and to embrace other causes that seek to safeguard the rights of individuals, the health of our societies, and the viability of our planet as our home.

The Accessible Infrastructure Crisis

Some of that ethical software development would also recognise the longevity we hope our societies may ultimately have. And yet we have every reason to worry about our societies becoming less equitable, less inclusive, and less accessible. The unquestioning adoption of technology-driven, consumerist solutions has led to many of the interactions us individuals have with institutions and providers of infrastructure being mediated by random companies who have inserted themselves into every kind of transaction they have perceived as highly profitable. Meanwhile, technologists have often embraced change through newer and newer technology for its own sake, not for the sake of actual progress or for making life easier.

While devices like smartphones have been liberating for many, providing capabilities that one could only have dreamed of only a few decades ago, they also introduce the risk of imposing relationships and restrictions on individuals to the point where those unable to acquire or use technological devices may find themselves excluded from public facilities, commercial transactions, and even voting in elections or other processes of participatory democracy. Such conditions may be the result of political ideology, the outsourcing and offshoring of supposedly non-essential activities, and the trimming back of the public sector, with any consequences, conflicts of interest, and even corrupt dealings being ignored or deliberately overlooked, dismissed as “nothing that would happen here”.

The risk to Free Software and to our societies is that we as individuals no longer collectively control our infrastructure through our representatives, nor do we control the means of interacting with it, the adoption of technology, or the pace such technology is introduced and obsoleted. When the suggestion to problems using supposedly public infrastructure is to “get a new phone” or “upgrade your computer”, we are actually being exploited by corporate interests and their facilitators. Anyone participating in such a cynical deployment of technology must, I suppose, reconcile their sense of a job well done with the sight of their fellow citizens being obstructed, impoverished or even denied their rights.

Although Free Software organisations have tried to popularise unencumbered sources of mobile software and to promote techniques and technologies to lengthen the lifespan of mobile devices, more fundamental measures are required to reverse the harmful course taken by many of our societies. Some of these measures are political or social, and some are technological. All of them are necessary.

We must reject the notion that progress is dependent on technological consumption. While computers and computing devices have managed to keep getting faster, despite warnings that such trends would meet their demise one way or another, improvements in their operational effectiveness in many regards have been limited. We may be able to view higher quality video today than we could ten years ago, and user interfaces may be pushing around many more pixels, but the results of our interactions are not necessarily more substantial. Yet, the ever-increasing demands of things like Web browsers means that systems become obsolete and are replaced with newer, faster systems to do exactly the same things in any qualitative sense. This wastefulness, burdening individuals with needless expenditure and burdening the environment with even more consumption, must stop.

We must demand interoperability and openness with regard to public infrastructure and even commercial platforms. It should be forbidden to insist that specific products be used to interact with public services and amenities or with commercial operators. The definition and adoption of genuinely open standards would be central to any such demands, and we would need to insist that such standards must encompass every aspect of such interactions and activities, without permitting companies to extend them in incompatible, proprietary ways that would deliberately undermine such initiatives.

We must insist that individuals never be under any obligation to commercial interests when interacting with public infrastructure: that their obligations are only to the public bodies concerned. And, similarly, we should insist that when dealing with companies, they may not also require us to enter into ongoing commercial relationships with other companies, purely as a condition of any transaction. It is unacceptable, for example, that individuals require such a relationship with a foreign technology company purely to gain access to essential services or to conduct purchases or similar transactions.

Ideas for Remedies

In conclusion, we need to care about technological freedoms – our choices of hardware and software, things like online privacy, and all that – but we also need to recognise and care about the social, economic and political conditions that threaten such freedoms. We can’t expect to set up a nice Free Software computing system and to use it forever when forces in society compel us to upgrade every few years. Nor can we expect people to make the hardware for such systems, let alone at an affordable price, when technological indulgence drives the sophistication of hardware to levels where investment in open hardware production is prohibitively costly. And we can’t expect to use our Free Software systems if consumerist and/or corrupt choices sacrifice interoperability and pander to entrenched commercial interests.

The vision here, if we can even call it that, is that we might embrace the “essential” nature of our computing needs and thus embrace hardware with adequate levels of sophistication that could, if everyone were honest with themselves, get the job done just fine. Years ago, people used to say how “Linux is great because it works on older systems”, but these days you apparently cannot even install some distributions with less than 1GB of RAM, and Blue Hat is apparently going to put all the bloat of a “modern” Web browser into its installer. And once we have installed our system, do we really need a video playing in the background of a Web page as we navigate a simple list of train times? Do we even need something as sophisticated as a Web browser for that at all?

Embracing mature, proven, reliable, well-understood hardware would help hardware designers to get their efforts right, and if hardware standards and modularity were adopted, there would still be the chance to introduce improvements and enhancements. Such hardware characteristics would also help with the software support: instead of rushing the long, difficult journey of introducing support for poorly documented components from unhelpful manufacturers eager to retire those components and to start making future products, the aim would be to support components with long commercial lifespans whose software support is well established and hopefully facilitated by the manufacturer. And with suitably standardised or modular hardware, creativity and refinement could be directed towards other aspects of the hardware which are often neglected or underestimated such as ergonomics and the other aspects of traditional product design.

With stable hardware, there might be more software options, too. Although many would propose “just putting Linux on it” for any given device, one need only consider the realm of smartphones to realise that such convenient answers are not necessarily the most obviously correct ones, particularly for certain definitions of “Linux”. Instead of choosing Linux because it probably supports the hardware, only to find that as much time is spent fixing that support and swimming against the currents of upstream development as would have been spent implementing that support elsewhere, other systems with more desirable properties could be considered and deployed. We might even encourage different systems to share functionality, instead of wrapping it up in a specific framework that resists portability. Such systems would also aspire to avoid the churn throughout the GNU-plus-Linux-plus-graphical-stack-of-the-day familiar to many of us, potentially allowing us to use familiar software over much longer periods than we have generally been allowed to before, retrocomputing platforms aside.

But to let all of this happen and to offer a viable foundation, we must also ensure that such systems can be used in the wider world. Otherwise, this would merely be an exercise in retrocomputing. Now, there is an argument that there are plenty of existing standards that might facilitate this vision, and perhaps going along with the famous saying about standards (the good thing being that there are so many to choose from), we might wish to avoid the topic of yet another widely-referenced XKCD cartoon by actually adopting some of them instead of creating more of them. That is not to say that we would necessarily want to go along with the full breadth of some standards, however. XML deliberately narrowed down SGML to be a more usable technology, despite its own reputation for complexity. Since some standards were probably “front-run” by companies wishing to elevate their own products, within which various proposed features were already implemented, thus forcing their competitors to play catch up, it is entirely possible that various features are superfluous or frivolous.

There have already been attempts to simplify the Web or to make a simpler Web-like platform, Gemini being one of them, and there are persuasive arguments that such technologies should be considered as separate from the traditional Web. After all, the best of intentions in delivering a simple, respectful experience can easily be undermined by enthusiasm for the latest frameworks and fashions, or by the insistence that less than respectful techniques and technologies – user surveillance, to take one example – be introduced to “help understand” or “better serve” users. A distinct technology might offer easy ways of resisting such temptations by simply failing to support them conveniently, but the greater risk is that it might not even get adopted significantly at all.

New standards might well be necessary, but revising and reforming existing ones might well be more productive, and there is merit in focusing standards on the essentials. After all, people used the Web for real work twenty years ago, too. And some would argue that today’s Web is just reimplementing the client-server paradigm but with JavaScript on the front end, grinding your CPU, with application-specific communications conducted between the browser and the server. Such communications will, for the most part, not be specified and be prone to changing and breaking, interoperability being the last thing on everybody’s mind. Formalising such communications, and adopting technologies more appropriate to each device and to each user, might actually be beneficial: instead of megabytes of JavaScript passing across the network and through the browser, the user would get to choose how they access such services, which programs they might use, rather than having “an experience” foisted upon them.

Such an approach would actually return us to something close to the original vision of the Web. But standards surely have to be seen as the basis of the Free Software we might hope to use, and as the primary vehicle for the persuasion of others. Public institutions and businesses care about reaching the biggest possible audience, and this has brought us to a rather familiar sight: the anointment of two viable players in a particular market and no others. Back in the 1990s, the two chosen ones in desktop computing were Microsoft and Apple, the latter kept afloat by the former so as to avoid being perceived as a de-facto monopoly and thereby avoiding being subject to proper regulation. Today, Apple and Google are the gatekeepers in mobile computing, with even Microsoft being an unwelcome complication.

Such organisations want to offer solutions that supposedly reach “the most users”, will happily commission “apps” for the big two players (and Microsoft, sometimes, because habits and favouritism die hard), and will probably shy away from suggesting other solutions, labelling them as confusing or unreliable, mostly because they just don’t want to care: their job is done, the boxes ticked, more effort gives no more reward. But standards offer the possibility of reaching every user, of meeting legal accessibility requirements, and potentially allowing such organisations to delegate the provision of solutions to their favourite entity: “the market”. Naturally, some kind of validation of standards compliance would probably be required, but this need not be overly restrictive nor the business of every last government department or company.

So, I suppose a combination of genuinely open standards facilitating Free Software and accessible public and private services, with users able to adopt and retain open and long-lasting hardware, might be a glimpse of some kind of vision. How people might make good enough money to be able to live decently is another question entirely, but then again, perhaps cultivating simpler, durable, sustainable infrastructure might create opportunities in the development of products that use it, allowing people to focus on improving those products, that infrastructure and the services they collectively deliver as opposed to chasing every last fad and fashion, running faster and faster and yet having the constant feeling of falling behind. As many people seem to experience in many other aspects of their lives.

Well, I hope the positivity was in there somewhere!

Dataspaces and Paging in L4Re

Tuesday, March 19th, 2019

The experiments covered by my recent articles about filesystems and L4Re managed to lead me along another path in the past few weeks. I had defined a mechanism for providing access to files in a filesystem via a programming interface employing interprocess communication within the L4Re system. In doing so, I had defined calls or operations that would read from and write to a file, observing that some kind of “memory-mapped” file support might also be possible. At the time, I had no clear idea of how this would actually be made to work, however.

As can often be the case, once some kind of intellectual challenge emerges, it can become almost impossible to resist the urge to consider it and to formulate some kind of solution. Consequently, I started digging deeper into a number of things: dataspaces, pagers, page faults, and the communication that happens within L4Re via the kernel to support all of these things.

Dataspaces and Memory

Because the L4Re developers have put a lot of effort into making a system where one can compile a fairly portable program and probably expect it to work, matters like the allocation of memory within programs, the use of functions like malloc, and other things we take for granted need no special consideration in the context of describing general development for an L4Re-based system. In principle, if our program wants more memory for its own use, then the use of things like malloc will probably suffice. It is where we have other requirements that some of the L4Re abstractions become interesting.

In my previous efforts to support MIPS-based systems, these other requirements have included the need to access memory with a fixed and known location so that the hardware can be told about it, thus supporting things like framebuffers that retain stored data for presentation on a display device. But perhaps most commonly in a system like L4Re, it is the need to share memory between processes or tasks that causes us to look beyond traditional memory allocation techniques at what L4Re has to offer.

Indeed, the filesystem work so far employs what are known as dataspaces to allow filesystem servers and client applications to exchange larger quantities of information conveniently via shared buffers. First, the client requests a dataspace representing a region of memory. It then associates it with an address so that it may access the memory. Then, the client shares this with the server by sending it a reference to the dataspace (known as a capability) in a message.

Opening a file using shared memory containing file information

Opening a file using shared memory containing file information

The kernel, in propagating the message and the capability, makes the dataspace available to the server so that both the client and the server may access the memory associated with the dataspace and that these accesses will just work without any further effort. At this level of sophistication we can get away with thinking of dataspaces as being blocks of memory that can be plugged into tasks. Upon obtaining access to such a block, reads and writes (or loads and stores) to addresses in the block will ultimately touch real memory locations.

Even in this simple scheme, there will be some address translation going on because each task has its own way of arranging its view of memory: its virtual address space. The virtual memory addresses used by a task may very well be different from the physical memory addresses indicating the actual memory locations involved in accesses.

An illustration of virtual memory corresponding to physical memory

An illustration of virtual memory corresponding to physical memory

Such address translation is at the heart of operating systems like those supported by the L4 family of microkernels. But the system will make sure that when a task tries to access a virtual address available to it, the access will be translated to a physical address and supported by some memory location.

Mapping and Paging

With some knowledge of the underlying hardware architecture, we can say that each task will need support from the kernel and the hardware to be able to treat its virtual address space as a way of accessing real memory locations. In my experiments with simple payloads to run on MIPS-based hardware, it was sufficient to define very simple tables that recorded correspondences between virtual and physical addresses. Processes or tasks would access memory addresses, and where the need arose to look up such a virtual address, the table would be consulted and the hardware configured to map the virtual address to a physical address.

Naturally, proper operating systems go much further than this, and systems built on L4 technologies go as far as to expose the mechanisms for normal programs to interact with. Instead of all decisions about how memory is mapped for each task being taken in the kernel, with the kernel being equipped with all the necessary policy and information, such decisions are delegated to entities known as pagers.

When a task needs an address translated, the kernel pushes the translation activity over to the designated pager for a decision to be made. And the event that demands an address translation is known as a page fault since it occurs when a task accesses a memory page that is not yet supported by a mapping to physical memory. Pagers are therefore present to receive page fault notifications and to respond in a way that causes the kernel to perform the necessary privileged actions to configure the hardware, this being one of the few responsibilities of the kernel.

The role of a pager in managing access to the contents of a dataspace

The role of a pager in managing access to the contents of a dataspace

Treating a dataspace as an abstraction for memory accessed by a task or application, the designated pager for the dataspace acts as a dataspace manager, ensuring that memory accesses within the dataspace can be satisfied. If an access causes a page fault, the pager must act to provide a mapping for the accessed page, leaving the application mostly oblivious to the work going on to present the dataspace and its memory as a continuously present resource.

An Aside

It is rather interesting to consider the act of delegation in the context of processor architecture. It would seem to be fairly common that the memory management units provided by various architectures feature built-in support for consulting various forms of data structures describing the virtual memory layout of a process or task. So, when a memory access fails, the information about the actual memory address involved can be retrieved from such a predefined structure.

However, the MIPS architecture largely delegates such matters to software: a processor exception is raised when a “bad” virtual address is used, and the job of doing something about it falls immediately to a software routine. So, there seems to be some kind of parallel between processor architecture and operating system architecture, L4 taking a MIPS-like approach of eager delegation to a software component for increased flexibility and functionality.

Messages and Flexpages

So, the high-level view so far is as follows:

  • Dataspaces represent regions of virtual memory
  • Virtual memory is mapped to physical memory where the data actually resides
  • When a virtual memory access cannot be satisfied, a page fault occurs
  • Page faults are delivered to pagers (acting as dataspace managers) for resolution
  • Pagers make data available and indicate the necessary mapping to satisfy the failing access

To get to the level of actual implementation, some familiarisation with other concepts is needed. Previously, my efforts have exposed me to the interprocess communications (IPC) central in L4Re as a microkernel-based system. I had even managed to gain some level of understanding around sending references or capabilities between processes or tasks. And it was apparent that this mechanism would be used to support paging.

Unfortunately, the main L4Re documentation does not seem to emphasise the actual message details or protocols involved in these fundamental activities. Instead, the library code is described in reference documentation with some additional explanation. However, some investigation of the code yielded some insights as to the kind of interfaces the existing dataspace implementations must support, and I also tracked down some message sending activities in various components.

When a page fault occurs, the first thing to know about it is the kernel because the fault occurs at the fundamental level of instruction execution, and it is the kernel’s job to deal with such low-level events in the first instance. Notification of the fault is then sent out of the kernel to the page fault handler for the affected task. The page fault handler then contacts the task’s pager to request a resolution to the problem.

Page fault handling in detail

Page fault handling in detail

In L4Re, this page fault handler is likely to be something called a region mapper (or perhaps a region manager), and so it is not completely surprising that the details of invoking the pager was located in some region mapper code. Putting together both halves of the interaction yielded the following details of the message:

  • map: offset, hot spot, flags → flexpage

Here, the offset is the position of the failing memory access relative to the start of the dataspace; the flags describe the nature of the memory to be accessed. The “hot spot” and “flexpage” need slightly more explanation, the latter being an established term in L4 circles, the former being almost arbitrarily chosen and not particularly descriptive.

The term “flexpage” may have its public origins in the “Flexible-Sized Page-Objects” paper whose title describes the term. For our purposes, the significance of the term is that it allows for the consideration of memory pages in a range of sizes instead of merely considering a single system-wide page size. These sizes start at the smallest page size supported by the system (but not necessarily the absolute smallest supported by the hardware, but anyway…) and each successively larger size is double the size preceding it. For example:

  • 4096 (212) bytes
  • 8192 (213) bytes
  • 16384 (214) bytes
  • 32768 (215) bytes
  • 65536 (216) bytes

When a page fault occurs, the handler identifies a region of memory where the failing access is occurring. Although it could merely request that memory be made available for a single page (of the smallest size) in which the access is situated, there is the possibility that a larger amount of memory be made available that encompasses this access page. The flexpage involved in a map request represents such a region of memory, having a size not necessarily decided in advance, being made available to the affected task.

This brings us to the significance of the “hot spot” and some investigation into how the page fault handler and pager interact. I must admit that I find various educational materials to be a bit vague on this matter, at least with regard to explicitly describing the appropriate behaviour. Here, the flexpage paper was helpful in providing slightly different explanations, albeit employing the term “fraction” instead of “hot spot”.

Since the map request needs to indicate the constraints applying to the region in which the failing access occurs, without demanding a particular size of region and yet still providing enough useful information to the pager for the resulting flexpage to be useful, an efficient way is needed of describing the memory landscape in the affected task. This is apparently where the “hot spot” comes in. Consider a failing access in page #3 of a memory region in a task, with the memory available in the pager to satisfy the request being limited to two pages:

Mapping available memory to pages in a task experiencing a page fault

Mapping available memory to pages in a task experiencing a page fault

Here, the “hot spot” would reference page #3, and this information would be received by the pager. The significance of the “hot spot” appears to be the location of the failing access within a flexpage, and if the pager could provide it then a flexpage of four system pages would map precisely to the largest flexpage expected by the handler for the task.

However, with only two system pages to spare, the pager can only send a flexpage consisting of those two pages, the “hot spot” being localised in page #1 of the flexpage to be sent, and the base of this flexpage being the base of page #0. Fortunately, the handler is smart enough to fit this smaller flexpage onto the “receive window” by using the original “hot spot” information, mapping page #2 in the receive window to page #0 of the received flexpage and thus mapping the access page #3 to page #1 of that flexpage.

So, the following seems to be considered and thus defined by the page fault handler:

  • The largest flexpage that could be used to satisfy the failing access.
  • The base of this flexpage.
  • The page within this flexpage where the access occurs: the “hot spot”.
  • The offset within the broader dataspace of the failing access, it indicating the data that would be expected in this page.

(Given this phrasing of the criteria, it becomes apparent that “flexpage offset” might be a better term than “hot spot”.)

With these things transferred to the pager using a map request, the pager’s considerations are as follows:

  • How flexpages of different sizes may fit within the memory available to satisfy the request.
  • The base of the most appropriate flexpage, where this might be the largest that fits within the available memory.
  • The population of the available memory with data from the dataspace.

To respond to the request, the pager sends a special flexpage item in its response message. Consequently, this flexpage is mapped into the task’s address space, and the execution of the task may resume with the missing data now available.

Practicalities and Pitfalls

If the dataspace being provided by a pager were merely a contiguous region of memory containing the data, there would probably be little else to say on the matter, but in the above I hint at some other applications. In my example, the pager only uses a certain amount of memory with which it responds to map requests. Evidently, in providing a dataspace representing a larger region, the data would have to be brought in from elsewhere, which raises some other issues.

Firstly, if data is to be copied into the limited region of memory available for satisfying map requests, then the appropriate portion of the data needs to be selected. This is mostly a matter of identifying how the available memory pages correspond to the data, then copying the data into the pages so that the accessed location ultimately provides the expected data. It may also be the case that the amount of data available does not fill the available memory pages; this should cause the rest of those pages to be filled with zeros so that data cannot leak between map requests.

Secondly, if the available memory pages are to be used to satisfy the current map request, then what happens when we re-use them in each new map request? It turns out that the mappings made for previous requests remain active! So if a task traverses a sequence of pages, and if each successive page encountered in that traversal causes a page fault, then it will seem that new data is being made available in each of those pages. But if that task inspects the earlier pages, it will find that the newest data is exposed through those pages, too, banishing the data that we might have expected.

Of course, what is happening is that all of the mapped pages in the task’s dataspace now refer to the same collection of pages in the pager, these being dedicated to satisfying the latest map request. And so, they will all reflect the contents of those available memory pages as they currently are after this latest map request.

The effect of mapping the same page repeatedly

The effect of mapping the same page repeatedly

One solution to this problem is to try and make the task forget the mappings for pages it has visited previously. I wondered if this could be done automatically, by sending a flexpage from the pager with a flag set to tell the kernel to invalidate prior mappings to the pager’s memory. After a time looking at the code, I ended up asking on the l4-hackers mailing list and getting a very helpful response that was exactly what I had been looking for!

There is, in fact, a special way of telling the kernel to “unmap” memory used by other tasks (l4_task_unmap), and it is this operation that I ended up using to invalidate the mappings previously sent to the task. Thus the task, upon backtracking to earlier pages, finds that the mappings from virtual addresses to the physical memory holding the latest data are absent once again, and page faults are needed to restore the data in those pages. The result is a form of multiplexing access to a resource via a limited region of memory.

Applications of Flexible Paging

Given the context of my investigations, it goes almost without saying that the origin of data in such a dataspace could be a file in a filesystem, but it could equally be anything that exposes data in some kind of backing store. And with this backing store not necessarily being an area of random access memory (RAM), we enter the realm of a more restrictive definition of paging where processes running in a system can themselves be partially resident in RAM and partially resident in some other kind of storage, with the latter portions being converted to the former by being fetched from wherever they reside, depending on the demands made on the system at any given point in time.

One observation worth making is that a dataspace does not need to be a dedicated component in the system in that it is not a separate and special kind of entity. Anything that is able to respond to the messages understood by dataspaces – the paging “protocol” – can provide dataspaces. A filesystem object can therefore act as a dataspace, exposing itself in a region of memory and responding to map requests that involve populating that region from the filesystem storage.

It is also worth mentioning that dataspaces and flexpages exist at different levels of abstraction. Dataspaces can be considered as control mechanisms for accessing regions of virtual memory, and the Fiasco.OC kernel does not appear to employ the term at all. Meanwhile, flexpages are abstractions for memory pages existing within or even independently of dataspaces. (If you wish, think of the frame of Banksy’s work “Love is in the Bin” as a dataspace, with the shredded pieces being flexpages that are mapped in and out.)

One can envisage more exotic forms of dataspace. Consider an image whose pixels need to be computed, like a ray-traced image, for instance. If it exposed those pixels as a dataspace, then a task reading from pages associated with that dataspace might cause computations to be initiated for an area of the image, with the task being suspended until those computations are performed and then being resumed with the pixel data ready to read, with all of this happening largely transparently.

I started this exercise out of somewhat idle curiosity, but it now makes me wonder whether I might introduce memory-mapped access to filesystem objects and then re-implement operations like reading and writing using this particular mechanism. Not being familiar with how systems like GNU/Linux provide these operations, I can only speculate as to whether similar decisions have been taken elsewhere.

But certainly, this exercise has been informative, even if certain aspects of it were frustrating. I hope that this account of my investigations proves useful to anyone else wondering about microkernel-based systems and L4Re in particular, especially if they too wish there were more discussion, reflection and collaboration on the design and implementation of software for these kinds of systems.

Integrating libext2fs with a Filesystem Framework

Wednesday, February 20th, 2019

Given the content covered by my previous articles, there probably doesn’t seem to be too much that needs saying about the topic covered by this article. Previously, I described the work involved in building libext2fs for L4Re and testing the library, and I described a framework for separating filesystem providers from programs that want to use files. But, as always, there are plenty of little details, detours and learning experiences that help to make the tale longer than it otherwise might have been.

Although this file access framework sounds intimidating, it is always worth remembering that the only exotic thing about the software being written is that it needs to request system resources and to communicate with other programs. That can be tricky in itself in many programming environments, and I have certainly spent enough time trying to figure out how to use the types and functions provided by the many L4Re libraries so that these operations may actually work.

But in the end, these are programs that are run just like any other. We aren’t building things into the kernel and having to conform to a particularly restricted environment. And although it can still be tiresome to have to debug things, particularly interprocess communication (IPC) problems, many familiar techniques for debugging and inspecting program behaviour remain available to us.

A Quick Translation

The test program I had written for libext2fs simply opened a file located in the “rom” filesystem, exposed it to libext2fs, and performed operations to extract content. In my framework, I had directed my attention towards opening and reading files, so it made sense to concentrate on providing this functionality in a filesystem server or “provider”.

Accessing a filesystem server employing a "rom" file for the data

Accessing a filesystem server employing a "rom" file for the data

The user of the framework (shielded from the details by a client library) would request the opening of a file (thus obtaining a file descriptor able to communicate with a dedicated resource object) and then read from the file (causing communication with the resource object and some transfers of data). These operations, previously done in a single program employing libext2fs directly, would now require collaboration by two separate programs.

So, I would need to insert the appropriate code in the right places in my filesystem server and its objects to open a filesystem, search for a file of the given name, and to provide the file data. For the first of these, the test program was doing something like this in the main function:

retval = ext2fs_open(devname, EXT2_FLAG_RW, 0, 0, unix_io_manager, &fs);

In the main function of the filesystem server program, something similar needs to be done. A reference to the filesystem (fs) is then passed to the server object for it to use:

Fs_server server_obj(fs, devname);

When a request is made to open a file, the filesystem server needs to locate the file just as the test program needed to. The code to achieve this is tedious, employing the ext2fs_lookup function and traversing the directory hierarchy. Ultimately, something like this needs to be done to obtain a structure for accessing the file contents:

retval = ext2fs_file_open(_fs, ino_file, ext2flags, &file);

Here, the _fs variable is our reference in the server object to the filesystem structure, the ino_file variable refers to the place in the filesystem where the file is found (the inode), some flags indicate things like whether we are reading and/or writing, and a supplied file variable is set upon the successful opening of the file. In the filesystem server, we want to create a specific object to conduct access to the file:

Fs_object *obj = new Fs_object(file, EXT2_I_SIZE(&inode_file), fsobj, irq);

Here, this resource object is initialised with the file access structure, an indication of the file size, something encapsulating the state of the communication between client and server, and the IRQ object needed for cleaning up (as described in the last article). Meanwhile, in the resource object, the read operation is supported by a pair of libext2fs functions:

ext2fs_file_lseek(_file, _obj.position, EXT2_SEEK_SET, 0);
ext2fs_file_read(_file, _obj.buffer, to_transfer, &read);

These don’t appear next to each other in the actual code, but the first call is used to seek to the indicated position in the file, this having been specified by the client. The second call appears in a loop to read into a buffer an indicated amount of data, returning the amount that was actually read.

In summary, the work done by a collection of function calls appearing together in a single function is now spread out over three places in the filesystem server program:

  • The initialisation is done in the main function as the server starts up
  • The locating and opening of a file in the filesystem is done in the general filesystem server object
  • Reading and writing is done in the file-specific resource object

After initialisation, the performance of each part of the work only occurs upon receiving a distinct kind of message from a client program, of which more details are given below.

The Client Library

Although we cannot yet use the familiar C library functions for accessing files (fopen, fread, fwrite, fclose, and so on), we can employ functions that try to be as friendly. Thus, the following form of program may be used:

char buffer[80];
file_descriptor_t *desc = client_open("test.txt", O_RDONLY);

available = client_read(desc, buffer, 80);
if (available)
    fwrite((void *) buffer, sizeof(char), available, stdout); /* using existing fwrite function */
client_close(desc);

As noted above, the existing fwrite function in L4Re may be used to write file data out to the console. Ultimately, we would want our modified version of the function to be doing this job.

These client library functions resemble lower-level C library functions such as open, read, write, close, and so on. By targeting this particular level of functionality, it is hoped that much of the logic in functions like fopen can be preserved, this logic having to deal with things like mode strings (“r”, “r+”, “w”, and so on) which have little to do with the actual job of transmitting file content around the system.

In order to do their work, the client library functions need to send and receive IPC messages, or at least need to get other functions to deal with this particular work. My approach has been to write a layer of functions that only deals with messaging and that hides the L4-specific details from the rest of the code.

This lower-level layer of functions allows us to treat interprocess interactions like normal function calls, and in this framework those calls would have the following signatures, with the inputs arriving at the server and the outputs arriving back at the client:

  • fs_open: flags, buffer → file size, resource object
  • fs_flush: (no parameters) → (no return values)
  • fs_read: position → available
  • fs_write: position, available → written, file size
Here, the aim is to keep the interprocess interactions as simple and as infrequent as possible, with data buffered in the indicated buffer dataspace, and with reading and writing only occurring when the buffer is read or has been filled by writing. The more friendly semantics therefore need to be supported in the client library functions resting on top of these even-lower-level IPC messaging functions.

The responsibilities of the client library functions can be summarised as follows:

  • client_open: allocate memory for the buffer, obtain a server reference (“capability”) from the program’s environment
  • client_close: deallocate the allocated resources
  • client_flush: invoke fs_flush with any available data, resetting the buffer status
  • client_read: provide data to the caller from its buffer, invoking fs_read whenever the buffer is empty
  • client_write: commit data from the caller into the buffer, invoking fs_write whenever the buffer is full, also flushing the buffer when appropriate

The lack of a fs_close function might seem surprising, but as described in the previous article, the server process is designed to receive a notification when the client process discards a reference to the resource object dedicated to a particular file. So in client_close, we should be able to merely throw away the things acquired by client_open, and the system together with the server will hopefully handle the consequences.

Switching the Backend

Using a conventional file as the repository for file content is convenient, but since the aim is to replace the existing filesystem mechanisms, it would seem necessary to try and get libext2fs to use other ways of accessing the underlying storage. Previously, my considerations had led me to provide a “block” storage layer underneath the filesystem layer. So it made sense to investigate how libext2fs might communicate with a “block server” or “block device” in order to read and write raw filesystem data.

Employing a separate server to provide filesystem data

Employing a separate server to provide filesystem data

Changing the way libext2fs accesses its storage sounds like an ominous task, but fortunately some thought has evidently gone into accommodating different storage types and platforms. Indeed, the library code includes support for things like DOS and Windows, with this functionality evidently being used by various applications on those platforms (or, these days, the latter one, at least) to provide some kind of file browser support for ext2-family filesystems.

The kind of component involved in providing this variety of support is known as an “I/O manager”, and the one that we have been using is known as the “Unix” I/O manager, this employing POSIX or standard C library calls to access files and devices. Now, this may have been adequate until now, but with the requirement that we use the replacement IPC mechanisms to access a block server, we need to consider how a different kind of I/O manager might be implemented to use the client library functions instead of the C library functions.

This exercise turned out to be relatively straightforward and perhaps a little less work than envisaged once the requirements of initialising an io_channel object had been understood, this involving the allocation of memory and the population of a structure to indicate things like the block size, error status, and so on. Beyond this, the principal operations needing support are as follows:

  • open: initialises the io_channel and calls client_open
  • close: calls client_close
  • set block size: sets the block size for transfers, something that gets done at various points in the opening of a filesystem
  • read block: calls client_seek and client_read to obtain data from the block server
  • write block: calls client_seek and client_write to commit data to the block server

It should be noted that the block server largely acts like a single-file filesystem, so the same interface supported by the filesystem server is also supported by the block server. This is how we get away with using the client libraries.

Meanwhile, in the filesystem server code, the only changes required are to declare the new I/O manager, implemented in a separate library package, and to use it instead of the previous one:

retval = ext2fs_open(devname, ext2flags, 0, 0, blockserver_io_manager, &fs);

The Final Trick

By pushing use of the “rom” filesystem further down in the system, use of the new file access mechanisms can be introduced and tested, with the only “unauthentic” aspect of the arrangement being that a parallel set of file access functions is being used instead of the conventional ones. The only thing left to do would be to change the C library to incorporate the new style of file access, probably by incorporating the client library internally, thus switching the C library away from its previous method of accessing files.

With the conventional file abstractions reimplemented, access to files would go via the virtual filesystem and hopefully end up encountering block devices that are able to serve up the needed data directly. And ultimately, we could end up switching back to using the Unix I/O manager with libext2fs.

Introducing the new IPC mechanisms at the C library level

Introducing the new IPC mechanisms at the C library level

Changing things so drastically would also force us to think about maintaining access to the “rom” filesystem through the revised architecture, at least at first, because it happens to provide a very convenient way of getting access to data for use as storage. We could try and implement storage hardware support in order to get round this problem, but that probably isn’t convenient – or would be a distraction – when running L4Re on Fiasco.OC-UX as a kind of hosted version of the software.

Indeed, tackling the C library is probably too much of a challenge at this early stage. Fortunately, there are plenty of other issues to be considered first, with the use of non-standard file access functions being only a minor inconvenience in the broader scheme of things. For instance, how are permissions and user identities to be managed? What about concurrent access to the filesystem? And what mechanisms would need to be provided for grafting filesystems onto a larger virtual filesystem hierarchy? I hope to try and discuss some of these things in future articles.

Some Attention to Detail

Thursday, February 14th, 2019

I spent some time recently looking at my Python-like language, Lichen, and its toolchain. Although my focus was on improving support for floating point numbers and arithmetic, of which more may need to be written in a future article, I ended up noticing a few things that needed correcting and had escaped my attention. One of these probably goes a long way to solving a mystery raised in a previous article.

The investigation into floating point support necessitated some scrutiny of the way floating point numbers are allocated when compiled Lichen programs are run. CPython – the C language implementation of a virtual machine for the Python language – has various strategies for reserving memory for floating point numbers, this not being particularly surprising given what it does for integers, as we previously saw. What bothered me was how much time was being spent allocating space for numbers needed to store computation results.

I spent quite a bit of time looking at the run-time support code for compiled programs, trying different strategies to “preallocate” number instances and other things, but it was when I was considering various other optimisation strategies and running generated programs in the GNU debugger (gdb) that I happened to notice something about the type definitions that are generated for instances. For example, here is what a tuple instance type looks like:

typedef struct {
    const __table * table;
    __pos pos;
    __attr attrs[__csize___builtins___tuple_tuple];
} __obj___builtins___tuple_tuple;

And here is what it should look like:

typedef struct {
    const __table * table;
    __pos pos;
    __attr attrs[__isize___builtins___tuple_tuple];
} __obj___builtins___tuple_tuple;

Naturally, I will excuse you for not necessarily noticing the crucial difference, but it is the size of the attrs array, this defining the attributes that are available via each instance of the tuple type. And here, I had used a constant prefixed with “__csize” meaning class size, as opposed to “__isize” meaning instance size. With so many things to think about when finishing off my toolchain, I had accidentally presented the wrong kind of value to the code generating these type definitions.

So, what was going to happen was that instances were going to be given the wrong number of attributes: a potentially catastrophic fault! But it is in the case of types like the tuple where things get more interesting than that. Such types tend to have lots of methods associated with them, and these methods are, of course, stored as class attributes.

Meanwhile, tuple instances are likely to have far fewer attributes, and even when the tuple data is considered, since tuples frequently have few elements, such instances are also likely to be far smaller than the size of the tuple class’s structure. Indeed, the following definitions are more or less indicative of the sizes of the tuple class and of tuple instances:

__csize___builtins___tuple_tuple = 36
__isize___builtins___tuple_tuple = 2

And I had noticed this because, for some reason unknown to me at the time but obviously known to me now, floating point numbers were being allocated using far more space than I thought appropriate. Here are some definitions of interest:

__csize___builtins___float_float = 43
__isize___builtins___float_float = 1

Evidently, something was very wrong until I noticed my simple mistake: that in the code generating the definitions for program types, I had accidentally used the wrong constant for instance attribute arrays. Fixing this meant that the memory allocator probably only needed to find 16 bytes or so, as opposed to maybe 186 bytes, for each number!

Returning to tuples, though, it becomes interesting to see what effect this fix has on the performance of the benchmark previously discussed. We had previously seen that a program using tuples was inexplicably far slower than one employing objects to represent the same data. But with this unnecessary allocation occurring, it seems possible that this might have been making some extra work for the allocator and garbage collector.

Here is a table of measurements from running the benchmark before and after the fix had been applied:

Program Version Time Maximum Memory Usage
Tuples 24s 122M
Objects 15s 54M
Tuples (fixed) 17s 30M
Objects (fixed) 13s 30M

Although there is still a benefit to using objects to model data in Lichen as opposed to keeping such data in tuples, the benefit is not as pronounced as before, with the memory usage now clearly comparable as we would expect. With this fix applied, both versions of the benchmark are even faster than they were before, but it is especially gratifying that the object-based version is now ten times faster when compiled with the Lichen toolchain than the same program run by the CPython virtual machine.

Filesystem Abstractions for L4Re

Tuesday, February 12th, 2019

In my previous posts, I discussed the possibility of using “real world” filesystems in L4Re, initially considering the nature of code to access an ext2-based filesystem using the library known as libext2fs, then getting some of that code working within L4Re itself. Having previously investigated the nature of abstractions for providing filesystems and file objects to applications, it was inevitable that I would now want to incorporate libext2fs into those abstractions and to try and access files residing in an ext2 filesystem using those abstractions.

It should be remembered that L4Re already provides a framework for filesystem access, known as Vfs or “virtual file system”. This appears to be the way the standard file access functions are supported, with the “rom” part of the filesystem hierarchy being supported by a “namespace filesystem” library that understands the way that the deployed payload modules are made available as files. To support other kinds of filesystem, other libraries must apparently be registered and made available to programs.

Although I am sure that the developers of the existing Vfs framework are highly competent, I found the mechanisms difficult to follow and quite unlike what I expected, which was to see a clear separation between programs accessing files and other programs providing those files. Indeed, this is what one sees when looking at other systems such as Minix 3 and its virtual filesystem. I must also admit that I became tired of having to dig into the code to understand the abstractions in order to supplement the reference documentation for the Vfs framework in L4Re.

An Alternative Framework

It might be too soon to label what I have done as a framework, but at the very least I needed to decide upon a few mechanisms to implement an alternative approach to providing file-like abstractions to programs within L4Re. There were a few essential characteristics to be incorporated:

  • A way of requesting access to a named file
  • The provision of objects maintaining the state of access to an opened file
  • The transmission of file content to file readers and from file writers
  • A way of cleaning up when programs are no longer accessing files

One characteristic that I did want to uphold in any solution was to make programs largely oblivious to the nature of the accessed filesystems. They would navigate a virtual filesystem hierarchy, just as one does in Unix-like systems, with certain directories acting as gateways to devices exposing potentially different filesystems with superficially similar semantics.

Requesting File Access

In a system like L4Re, with notions of clients and servers already prevalent, it seems natural to support a mechanism for requesting access to files that sees a client – a program wanting to access a file – delegating the task of locating that file to a server. How the server performs this task may be as simple or as complicated as we wish, depending on what kind of architecture we choose to support. In an operating system with a “monolithic” kernel, like GNU/Linux, we also see such delegation occurring, with the kernel being the entity having to support the necessary wiring up of filesystems contributing to the virtual filesystem.

So, it makes sense to support an “open” system call just like in other operating systems. The difference here, however, is that since L4Re is a microkernel-based environment, both the caller and the target of the call are mere programs, with the kernel only getting involved to route the call or message between the programs concerned. We would first need to make sure that the program accessing files has a reference (known as a “capability”) to another program that provides a filesystem and can respond to this “open” message. This wiring up of programs is a task for the system’s configuration file, as featured in some of my previous articles.

We may now consider what the filesystem-providing program or filesystem “server” needs to do when receiving an “open” message. Let us consider the failure to find the requested file: the filesystem server would, in such an event, probably just return a response indicating failure without any real need to do anything else. It is in the case of a successful lookup that the response to the caller or client needs some more consideration: the server could indicate success, but what is the client going to do with such information? And how should the server then facilitate further access to the file itself?

Providing Objects for File Access

It becomes gradually clearer that the filesystem server will need to allocate some resources for the client to conduct its activities, to hold information read from the filesystem itself and to hold data sent for writing back to the opened file. The server could manage this within a single abstraction and support a range of different operations, accommodating not only requests to open files but also operations on the opened files themselves. However, this might make the abstraction complicated and also raise issues around things like concurrency.

What if this server object ends up being so busy that while waiting for reading or writing operations to complete on a file for one program, it leaves other programs queuing up to ask about or gain access to other files? It all starts to sound like another kind of abstraction would be beneficial for access to specific files for specific clients. Consequently, we end up with an arrangement like this:

Accessing a filesystem and a resource

Accessing a filesystem and a resource

When a filesystem server receives an “open” message and locates a file, it allocates a separate object to act as a contact point for subsequent access to that file. This leaves the filesystem object free to service other requests, with these separate resource objects dealing with the needs of the programs wanting to read from and write to each individual file.

The underlying mechanisms by which a separate resource object is created and exposed are as follows:

  1. The instantiation of an object in the filesystem server program holding the details of the accessed file.
  2. The creation of a new thread of execution in which the object will run. This permits it to handle incoming messages concurrently with the filesystem object.
  3. The creation of an “IPC gate” for the thread. This effectively exposes the object to the wider environment as what often appears to be known as a “kernel object” (rather confusingly, but it simply means that the kernel is aware of it and has to do some housekeeping for it).

Once activated, the thread created for the resource is dedicated to listening for incoming messages and handling them, invoking methods on the resource object as a proxy for the client sending those messages to achieve the same effect.

Transmitting File Content

Although we have looked at how files manifest themselves and may be referenced, the matter of obtaining their contents has not been examined too closely so far. A program might be able to obtain a reference to a resource object and to send it messages and receive responses, but this is not likely to be sufficient for transferring content to and from the file. The reason for this is that the messages sent between programs – or processes, since this is how we usually call programs that are running – are deliberately limited in size.

Thus, another way of exchanging this data is needed. In a situation where we are reading from a file, what we would most likely want to see is a read operation populate some memory for us in our process. Indeed, in a system like GNU/Linux, I imagine that the Linux kernel shuttles the file data from the filesystem module responsible to an area of memory that it has reserved and exposed to the process. In a microkernel-based system, things have to be done more “collaboratively”.

The answer, it would seem to me, is to have dedicated memory that is shared between processes. Fortunately, and arguably unsurprisingly, L4Re provides an abstraction known as a “dataspace” that provides the foundation for such sharing. My approach, then, involves requesting a dataspace to act as a conduit for data, making the dataspace available to the file-accessing client and the file-providing server object, and then having a protocol to notify each other about data being sent in each direction.

I considered whether it would be most appropriate for the client to request the memory or whether the server should do so, eventually concluding that the client could decide how much space it would want as a buffer, handing this over to the server to use to whatever extent it could. A benefit of doing things this way is that the client may communicate initialisation details when it contacts the server, and so it becomes possible to transfer a filesystem path – the location of a file from the root of the filesystem hierarchy – without it being limited to the size of an interprocess message.

Opening a file using a path written to shared memory

Opening a file using a path written to shared memory

So, the “open” message references the newly-created dataspace, and the filesystem server reads the path written to the dataspace’s memory so that it may use it to locate the requested file. The dataspace is not retained by the filesystem object but is instead passed to the resource object which will then share the memory with the application or client. As described above, a reference to the resource object is returned in the response to the “open” message.

It is worthwhile to consider the act of reading from a file exposed in this way. Although both client (the application in the above diagram) and server (resource object) should be able to access the shared “buffer”, it would not be a good idea to let them do so freely. Instead, their roles should be defined by the protocol employed for communication with one another. For a simple synchronous approach it would look like this:

Reading data from a resource via a shared buffer

Reading data from a resource via a shared buffer

Here, upon the application or client invoking the “read” operation (in other words, sending the “read” message) on the resource object, the resource is able to take control of the buffer, obtaining data from the file and writing it to the buffer memory. When it is done, its reply or response needs to indicate the updated state of the buffer so that the client will know how much data there is available, potentially amongst other things of interest.

Cleaning Up

Many of us will be familiar with the workflow of opening, reading and writing, and closing files. This final activity is essential not only for our own programs but also for the system, so that it does not tie up resources for activities that are no longer taking place. In the filesystem server, for the resource object, a “close” operation can be provided that causes the allocated memory to be freed and the resource object to be discarded.

However, merely providing a “close” operation does not guarantee that a program would use it, and we would not want a situation where a program exits or crashes without having invoked this operation, leaving the server holding resources that it cannot safely discard. We therefore need a way of cleaning up after a program regardless of whether it sees the need to do so itself.

In my earliest experiments with L4Re on the MIPS Creator CI20, I had previously encountered the use of interrupt request (IRQ) objects, in that case signalling hardware-initiated events such as the pressing of physical switches. But I also knew that the IRQ abstraction is employed more widely in L4Re to allow programs to participate in activities that would normally be the responsibility of the kernel in a monolithic architecture. It made me wonder whether there might be interrupts communicating the termination of a process that could then be used to clean up afterwards.

One area of interest is that concerning the “IPC gate” mentioned above. This provides the channel through which messages are delivered to a particular running program, and up to this point, we have considered how a resource object has its own IPC gate for the delivery of messages intended for it. But it also turns out that we can also enable notifications with regard to the status of the IPC gate via the same mechanism.

By creating an IRQ object and associating it with a thread as the “deletion IRQ”, when the kernel decides that the IPC gate is no longer needed, this IRQ will be delivered. And the kernel will make this decision when nothing in the system needs to use the IPC gate any more. Since the IPC gate was only created to service messages from a single client, it is precisely when that client terminates that the kernel will realise that the IPC gate has no other users.

Resource deletion upon the termination of a client

Resource deletion upon the termination of a client

To enable this to actually work, however, a little trick is required: the server must indicate that it is ready to dispose of the IPC gate whenever necessary, doing so by decreasing the “reference count” which tracks how many things in the system are using the IPC gate. So this is what happens:

  1. The IPC gate is created for the resource thread, and its details are passed to the client, exposing the resource object.
  2. An IRQ object is bound to the thread and associated with the IPC gate deletion event.
  3. The server decreases its reference count, relinquishing the IPC gate and allowing its eventual destruction.
  4. The client and server communicate as desired.
  5. Upon the client terminating, the kernel disassociates the client from the IPC gate, decreasing the reference count.
  6. The kernel notices that the reference count is zero and sends an IRQ telling the server about the impending IPC gate deletion.
  7. The resource thread in the server deallocates the resource object and terminates.
  8. The IPC gate is deleted.

Using the “gate label”, the thread handling communications for the resource object is able to distinguish between the interrupt condition and normal messages from the client. Consequently, it is able to invoke the appropriate cleaning up routine, to discard the resource object, and to terminate the thread. Hopefully, with this approach, resource objects will no longer be hanging around long after their clients have disappeared.

Other Approaches

Another approach to providing file content did also occur to me, and I wondered whether this might have been a component of the “namespace filesystem” in L4Re. One technique for accessing files involves mapping the entire file into memory using a “mmap” function. This could be supported by requesting a dataspace of a suitable size, but only choosing to populate a region of it initially.

The file-accessing program would attempt to access the memory associated with the file, and upon straying outside the populated region, some kind of “fault” would occur. A filesystem server would have the job of handling this fault, fetching more data, allocating more memory pages, mapping them into the file’s memory area, and disposing of unwanted pages, potentially writing modified pages to the appropriate parts of the file.

In effect, the filesystem server would act as a pager, as far as I can tell, and I believe it to be the case that Moe – the root task – acts in such a way to provide the “rom” files from the deployed payload modules. Currently, I don’t find it particularly obvious from the documentation how I might implement a pager, and I imagine that if I choose to support such things, I will end up having to trawl the code for hints on how it might be done.

Client Library Functions

To present a relatively convenient interface to programs wanting to use files, some client library functions need to be provided. The intention with these is to support the traditional C library paradigms and for these functions to behave like those that C programmers are generally familiar with. This means performing interprocess communications using the “open”, “read”, “write”, “close” and other messages when necessary, hiding the act of sending such messages from the library user.

The details of such a client library are probably best left to another article. With some kind of mechanism in place for accessing files, it becomes a matter of experimentation to see what the demands of the different operations are, and how they may be tuned to reduce the need for interactions with server objects, hopefully allowing file-accessing programs to operate efficiently.

The next article on this topic is likely to consider the integration of libext2fs with this effort, along with the library functionality required to exercise and test it. And it will hopefully be able to report some real experiences of accessing ext2-resident files in relatively understandable programs.

Using ext2 Filesystems with L4Re

Tuesday, February 5th, 2019

Previously, I described my initial investigations into libext2fs and the development of programs to access and populate ext2/3/4 filesystems. With a program written and now successfully using libext2fs in my normal GNU/Linux environment, the next step appeared to be the task of getting this library to work within the L4Re system. The following steps were envisaged:

  1. Figuring out the code that would be needed, this hopefully being supportable within L4Re.
  2. Introducing the software as a package within L4Re.
  3. Discovering the configuration required to build the code for L4Re.
  4. Actually generating a library file.
  5. Testing the library with a program.

This process is not properly completed in that I do not yet have a good way of integrating with the L4Re configuration and using its details to configure the libext2fs code. I felt somewhat lazy with regard to reconciling the use of autotools with the rather different approach taken to build L4Re, which is somewhat reminiscent of things like Buildroot and OpenWrt in certain respects.

So, instead, I built the Debian package from source in my normal environment, grabbed the config.h file that was produced, and proceeded to use it with a vastly simplified Makefile arrangement, also in my normal environment, until I was comfortable with building the library. Indeed, this exercise of simplified building also let me consider which portions of the libext2fs distribution would really be needed for my purposes. I did not really fancy having to struggle to build files that would ultimately be superfluous.

Still, as I noted, this work isn’t finished. However, it is useful to document what I have done so far so that I can subsequently describe other, more definitive, work.

Making a Package

With a library that seemed to work with the archiving program, written to populate filesystems for eventual deployment, I then set about formulating this simplified library distribution as a package within L4Re. This involves a few things:

  • Structuring the files so that the build system may process them.
  • Persuading the build system to install things in places for other packages to find.
  • Formulating the appropriate definitions to build the source files (and thus producing the right compiler and linker invocations).
Here are some notes about the results.

The Package Structure

Currently, I have the following arrangement inside the pkg/libext2fs directory:

include
include/libblkid
include/libe2p
include/libet
include/libext2fs
include/libsupport
include/libuuid
lib
lib/libblkid
lib/libe2p
lib/libet
lib/libext2fs
lib/libsupport
lib/libuuid

To follow L4Re conventions, public header files have been moved into the include hierarchy. This breaks assumptions in the code, with header files being referenced without a prefix (like “ext2fs”, “et”, “e2p”, and so on) in some places, but being referenced with such a prefix in others. The original build system for the code gets away with this by using the “ext2fs” and other prefixes as the directory names containing the code for the different libraries. It then indicates the parent “lib” directory of these directories as the place to start looking for headers.

But I thought it worthwhile to try and map out the header usage and distinguish between public and private headers. At the very least, it helps me to establish the relationships between the different components involved. And I may end up splitting the different components into their own packages, requiring some formalisation of their interactions.

Meanwhile, I defined a Control file to indicate what the package provides:

provides: libblkid libe2p libet libext2fs libsupport libuuid

This appears to be used in dependency resolution, causing the package to be built if another package requires one of the named entities in its own Control file.

Header File Locations

In each include subdirectory (such as include/libext2fs) is a Makefile indicating a couple of things, the following being used for libext2fs:

PKGNAME = libext2fs
CONTRIB_HEADERS = 1

The effect of this is to install the headers into a include/contrib/libext2fs directory in the build output.

In the corresponding lib subdirectory (which is lib/libext2fs), the following seems to be needed:

CONTRIB_INCDIR = libext2fs

Hopefully, with this, other packages can depend on libext2fs and have the headers made available to it by an include statement like this:

#include <ext2fs/ext2fs.h>

(The ext2fs prefix is provided by a directory inside include/libext2fs.)

Otherwise, headers may end up being put in a special “l4” hierarchy, and then code would need changing to look something like this:

#include <l4/ext2fs/ext2fs.h>

So, avoiding this and having the original naming seems to be the benefit of the “contrib” settings, as far as I can tell.

Defining Build Files

The Makefile in each specific lib subdirectory employs the usual L4Re build system definitions:

TARGET          = libext2fs.a libext2fs.so
PC_FILENAME     = libext2fs

The latter of these is used to identify the build products so that the appropriate compiler and linker options can be retrieved by the build system when this library is required by another. Here, PC is short for “package config” but the notion of “package” is different from that otherwise used in this article: it just refers to the specific library being built in this case.

An important aspect related to “package config” involves the requirements or dependencies of this library. These are specified as follows for libext2fs:

REQUIRES_LIBS   = libet libe2p

We saw these things in the Control file. By indicating these other libraries, the compiler and linker options to find and use these other libraries will be brought in when something else requires libext2fs. This should help to prevent build failures caused by missing headers or libraries, and it should also permit more concise declarations of requirements by allowing those declarations to omit libet and libe2p in this case.

Meanwhile, the actual source files are listed using a SRC_C definition, and the PRIVATE_INCDIR definition lists the different paths to be used to search for header files within this package. Moving the header files around complicates this latter definition substantially.

There are other complications with libext2fs, notably the building of a tool that generates a file to be used when building the library itself. I will try and return to this matter at some point and figure out a way of doing this within the build system. Such generation of binaries for use in build processes can be problematic, particularly if there is some kind of assumption that the build system is the same as the target system, but such assumptions are probably not being made here.

Building the Library

Fortunately, the build system mostly takes care of everything else, and a command like this should see the package being built and libraries produced:

make O=mybuild S=pkg/libext2fs

The “S” option is a real time saver, and I wish I had made more use of it before. Use of the “V” option can be helpful in debugging command options, since the normal output is abridged:

make O=mybuild S=pkg/libext2fs V=1

I will admit that since certain header files are not provided by L4Re, a degree of editing of the config.h file was required. Things like HAVE_LINUX_FD_H, indicating the availability of Linux-specific headers, needed to be removed.

Testing the Library

An appropriate program for testing the library is really not much different from one used in a GNU/Linux environment. Indeed, I just took some code from my existing program that lists a directory inside a filesystem image. Since L4Re should provide enough of a POSIX-like environment to support such unambitious programs, practically no changes were needed and no special header files were included.

A suitable Makefile is needed, of course, but the examples package in L4Re provides plenty of guidance. The most important part is this, however:

REQUIRES_LIBS   = libext2fs

A Control file requiring libext2fs is actually not necessary for an example in the examples hierarchy, it would seem, but such a file would otherwise be advisible. The above library requirements pull in the necessary compiler and linker flags from the “package config” universe. (It also means that the libext2fs headers are augmented by the libe2p and libet headers, as defined in the required libraries for libext2fs itself.)

As always, deploying requires a suitable configuration description and a list of modules to be deployed. The former looks like this:

local L4 = require("L4");

local l = L4.default_loader;

l:startv({
    log = { "ext2fstest", "g" },
  },
  "rom/ex_ext2fstest", "rom/ext2fstest.fs", "/");

The interesting part is right at the end: a program called ex_ext2fstest is run with two arguments: the name of a file containing a filesystem image, and the directory inside that image that we want the program to show us. Here, we will be using the built-in “rom” filesystem in L4Re to serve up the data that we will be decoding with libext2fs in the program. In effect, we use one filesystem to bootstrap access to another!

Since the “rom” filesystem is merely a way of exposing modules as files, the filesystem image therefore needs to be made available as a module in the module list provided in the conf/modules.list file, the appropriate section starting off like this:

entry ext2fstest
roottask moe rom/ext2fstest.cfg
module ext2fstest.cfg
module ext2fstest.fs
module l4re
module ned
module ex_ext2fstest
# plus lots of library modules

All these experiments are being conducted with L4Re running on the UX configuration of Fiasco.OC, meaning that the system runs on top of GNU/Linux: a sort of “user mode L4”. Running the set of modules for the above test is a matter of running something like this:

make O=mybuild ux E=ext2fstest

This produces a lot of output and then some “logged” output for the test program:

ext2fste| Opened rom/ext2fstest.fs.
ext2fste| /
ext2fste| drwxr-xr-x-       0     0        1024 .
ext2fste| drwxr-xr-x-       0     0        1024 ..
ext2fste| drwx-------       0     0       12288 lost+found
ext2fste| -rw-r--r---    1000  1000       11449 e2access.c
ext2fste| -rw-r--r---    1000  1000        1768 file.c
ext2fste| -rw-r--r---    1000  1000        1221 format.c
ext2fste| -rw-r--r---    1000  1000        6504 image.c
ext2fste| -rw-r--r---    1000  1000        1510 path.c

It really isn’t much to look at, but this indicates that we have managed to access an ext2 filesystem within L4Re using a program that calls the libext2fs library functions. If nothing else, the possibility of porting a library to L4Re and using it has been demonstrated.

But we want to do more than that, of course. The next step is to provide access to an ext2 filesystem via a general interface that hides the specific nature of the filesystem, one that separates the work into a different program from those wanting to access files. To do so involves integrating this effort into my existing filesystem framework, then attempting to re-use a generic file-accessing program to obtain its data from ext2-resident files. Such activities will probably form the basis of the next article on this topic.

Some Ideas for 2019

Wednesday, January 23rd, 2019

Well, after my last article moaning about having wishes and goals while ignoring the preconditions for, and contributing factors in, the realisation of such wishes and goals, I thought I might as well be constructive and post some ideas I could imagine working on this year. It would be a bonus to get paid to work on such things, but I don’t hold out too much hope in that regard.

In a way, this is to make up for not writing an article summarising what I managed to look at in 2018. But then again, it can be a bit wearing to have to read through people’s catalogues of work even if I do try and make my own more approachable and not just list tons of work items, which is what one tends to see on a monthly basis in other channels.

In any case, 2018 saw a fair amount of personal focus on the L4Re ecosystem, as one can tell from looking at my article history. Having dabbled with L4Re and Fiasco.OC a bit in 2017 with the MIPS Creator CI20, I finally confronted certain aspects of the software and got it working on various devices, which had been something of an ambition for at least a couple of years. I also got back into looking at PIC32 hardware and software experiments, tidying up and building on earlier work, and I keep nudging along my Python-like language and toolchain, Lichen.

Anyway, here are a few ideas I have been having for supporting a general strategy of building flexible, sustainable and secure computing environments that respect the end-user. Such respect not being limited to software freedom, but also extending to things like privacy, affordability and longevity that are often disregarded in the narrow focus on only one set of end-user rights.

Building a General-Purpose System with L4Re

Apart from writing unfinished articles about supporting hardware devices on the Ben NanoNote and Letux 400, I also spent some time last year considering the mechanisms supporting filesystems in L4Re. For an outsider like myself, the picture isn’t particularly clear, but the mechanisms don’t really seem particularly well documented, and I am not convinced that the filesystem support is what people might expect from a microkernel-based system.

Like L4Re’s device support, the way filesystems are made available to tasks appears to use libraries extensively, whereas I would expect more use of individual programs, with interprocess messages and shared memory being employed to move the data around. To evaluate my expectations, I have been writing programs that operate in such a way, employing a “toy” filesystem to test the viability of such an approach. The plan is to make use of libext2fs to expose ext2/3/4 filesystems to L4Re components, then to try and replace the existing file access mechanisms with ones that access these file-serving components.

It is unfortunate that systems like these no longer seem to get much attention from the wider Free Software community. There was once a project to port GNU Hurd to L4-family microkernels, but with the state of the art having seemingly been regarded as insufficient or inappropriate, the focus drifted off onto other things, and now there doesn’t seem to be much appetite to resume such work. Even the existing Hurd implementation doesn’t get that much interest these days, either. And yet there are plenty of technical, social and practical reasons for not just settling for using the Linux kernel and systems based on it for every last application and device.

Extending Hardware Support within L4Re

It is all very well developing filesystem support, but there also has to be support for the things that provide the storage on which those filesystems reside. This is something I didn’t bother to look at when getting L4Re working on various devices because the priority was to have something to show, meaning that the display had to work, along with testing and demonstrating other well-understood peripherals, with the keyboard or keypad being something that could be supported with relative ease and then used to demonstrate other aspects of the solution.

It seems perverse that one must implement support for SD or microSD card storage all over again when the software being run is already being loaded from such storage, but this is rather like the way that “live CD” versions of GNU/Linux would load an environment directly from a CD, yet an installed version of such an environment might not have the capability to access the CD drive. Still, this is an unavoidable step along the path to make a practical system.

It might also be nice to make the CI20 support a bit better. Although a device notionally supported by L4Re, various missing pieces of hardware support needed to be added, and the HDMI output capability remains unavailable. Here, the mystery hardware left undocumented by the datasheet happens to be used in other chipsets and has been supported in the Linux kernel for many of them for quite some time. Hopefully, the exercise will not be too frustrating.

Another device that might be a good candidate for L4Re is the Efika MX Smartbook. Although having a modest specification by today’s bloated-Web and pointless-eye-candy standards, it has a nice keyboard (with a more sensible layout than the irritating HP Elitebook, as I recently discovered) and is several times more powerful than the Letux 400. My brother, David, has already investigated certain aspects of the hardware, and it might make the basis of a nice portable system. And since support in Linux and other more commonly-used technologies has been left to rot, why not look into developing a more lasting alternative?

Reviving Mail-Based Communication

It is tiresome to see the continuing neglect of the health of e-mail, despite it being used as the bedrock of the Internet’s activities, while proprietary and predatory social media platforms enjoy undeserved attention and promotion in mass media and in society at large. Governmental and corporate negligence mean that the average person is subjected to an array of undesirable, disturbing and generally unwanted communications from chancers and criminals through their e-mail accounts which, if it had ever happened to the same degree with postal mail, would have seen people routinely arrested and convicted for their activities.

It is tempting to think that “they” do not want “us” to have a more reliable form of mail-based communication because that would involve things like encryption and digital signatures. Even when these things are deemed necessary, they always seem to be rolled out as part of a special service that hosts “our” encryption and signing keys for us, through which we must go to access our messages. It is, I suppose, yet another example of the abandonment of common infrastructure: that when something better is needed, effort and attention is siphoned off from the “commons” and ploughed into something separate that might make someone a bit of money.

There are certainly challenges involved in making e-mail better, with any discussion of managing identities, vouching for and recognising correspondents, and the management of keys most likely to lead to dispute about the “best” way of doing things. But in the end, we probably find ourselves pursuing perfect solutions that do everything whilst proprietary competitors just focus on doing a handful of things effectively. So I envisage turning this around and evaluating whether a more limited form of mail-based communication can be done in the way that most people would need.

I did look fairly recently at a selection of different projects seeking to advise and support people on providing their own e-mail infrastructure. That is perhaps worth an article in its own right. And on the subject of mail-based communication, I hope to look a bit more at imip-agent again after neglecting it for so long.

Sustaining a Python Alternative

One motivation for developing my Python-like language and toolchain, Lichen, was to explore ways in which Python might have been developed to better suit my own needs and preferences. Although I still use Python extensively, I remain mindful of the need to write conservative, uncomplicated code without the kind of needless tricks and flourishes that Python’s expanding feature set can tempt the developer with, and thus I almost always consider the possibility of being able to use the Lichen toolchain with my projects one day.

Lichen may still be a proof of concept, but there has been work done on gradually pushing it towards being genuinely usable. I spent some time considering the way floating point numbers might be represented, and this raised some interesting issues around how they might be stored within instances. Like the tuple performance optimisations, I hope to introduce floating point support into the established feature set of Lichen and hopefully offer decent-enough performance, with the latter aspect being yet another path of investigation this year.

Documenting and Publishing

A continuing worry I have is whether I should still be running MoinMoin sites or even sites derived from MoinMoin “export dumps” for published information that is merely static. I have spent some time developing a parsing and formatting tool to generate static content from Moin content, thus avoiding running Moin altogether and also avoiding having to run a script acting as a URL-preserving front-end to exported Moin content (which is unfortunately required because of how the “export dump” seems to work).

Currently, this tool supports HTML and Moin output, the latter to test the parsing activity, with Graphviz content rendered as inline SVG or in other supported formats (although inline SVG is really what you want). Some macros are supported, but I need to add support for others, like the search macros which are useful for generating page listings. Themes are also supported, but I need to make sure that things like tables of contents – already supported with a macro – can be styled appropriately.

Already, I can generate the bulk of my existing project documentation, and the aim here is to be able to focus on improving that documentation, particularly for things like Lichen that really need explanations to be written before I need to start reviewing the code from scratch as if I were a total newcomer to the work. I have also considered using this tool as the basis for a decentralised wiki solution, but that can probably wait for a while given how many other things I have said I want to do!

Anything More?

There are probably other things that are worth looking at, but these are perhaps the ones I feel are most pressing. It could be said that pursuing all these at once would spread me and my efforts very thin, but I tend to cycle through projects periodically and hope that I can catch up with what I was previously doing, hence the mention above of documenting my work.

I wonder how much other people think about the year ahead, whether it is a productive and ultimately rewarding exercise to state aspirations and goals in this kind of way. New Year’s resolutions are a familiar practice, of course, but here I make no promises!

Nevertheless, a belated Happy New Year to anyone still reading! I hope we can all pursue our objectives enthusiastically over the year ahead and make a real and positive difference to computing, its users and to our societies.

An Absence of Strategy?

Wednesday, January 16th, 2019

I keep starting articles but not finishing them. However, after responding to some correspondence recently, where I got into a minor rant about a particular topic, I thought about starting this article and more or less airing the rant for a wider audience. I don’t intend to be negative here, so even if this sounds like me having a moan about how things are, I really do want to see positive and constructive things happen to remedy what I see as deficiencies in the way people go about promoting and supporting Free Software.

The original topic of the correspondence was my brother’s article about submitting “apps” to F-Droid, the Free Software application repository for Android, which somehow got misattributed to me in the FSFE newsletter. As anyone who knows both of us can imagine, it is not particularly unusual that people mix us up, but it does still surprise me how people can be fluid about other people’s names and assume that two people with the same family name are the same person.

Eventually, the correction was made, for which I am grateful, and it must be said that I do also appreciate the effort that goes into writing the newsletter. Having previously had the task of doing some of the Fellowship interviews, I know that such things require more work than people might think, largely go either unnoticed or unremarked, and as a participant in the process it can be easy to wonder afterwards if it was worth the bother. I do actually follow the FSFE Planet and the discussion mailing list, so I’d like to think that I keep up with what other people do, but the newsletter must have some value to those who don’t want to follow a range of channels.

A Rant about Free Software on Mobile

Well, it wasn’t as much of a rant as it was a moan about how there doesn’t seem to be a coherent strategy about Free Software on mobile devices. The FSFE has had some kind of campaign about Android for quite some time. What it amounts to is promoting Free Software applications and Free Software distributions on phones.

This probably isn’t significantly different from the activities promoted by the FSF, whose Defective By Design campaign features a gift guide promoting phones that run Free Software. The FSF also promotes and funds the Replicant project, more of which below.

For all I know, the situation about getting Free Software applications onto a phone probably isn’t all that dire, assuming that Google and phone vendors don’t try and prevent users from installing software that isn’t delivered via Google Play or other officially-sanctioned channels. Or as the Android documentation puts it:

Of course, this is rather reminiscent of the “bad old days” where some people could copy things on and off their phone using Bluetooth (or for those with particularly long memories, infrared communication) whereas others had those features disabled by their provider. So, while some people get to enjoy the benefits of Free Software, others are denied them: another case of divide-and-rule in action, I suppose.

But it is the situation about Free Software distributions, more specifically having a Free Software operating system with Free Software device drivers and Free Software firmware, that is most worrying. The FSFE campaign points people to the two enduring initiatives for putting a different kind of Android distribution on phones: Replicant and LineageOS (previously CyanogenMod).

While LineageOS seems to try and support as many devices as possible, it relies on non-free software to support device hardware. Meanwhile, Replicant employs only Free Software and is therefore limited in which devices it may support and to what extent those device’s features will function.

Although I can’t really muster much enthusiasm for Android and its derivatives, I don’t think there is anything wrong with trying to provide a completely Free Software distribution of that software. Certainly, there will always be challenges with the evolution of the upstream code, being steered this way and that by its corporate parent for maximum corporate benefit, but this isn’t really much different to clinging on to the pace of change (and churn) in an openly-developed project like the Linux kernel.

But ultimately, these initiatives will always be reacting to what other people, specifically those working for large companies, have decided to do. It will always be about chasing the latest release of the upstream software and making it acceptable for a Free Software audience. And it will always be about seeing whether the latest devices can be supported or not and then trying to figure out how. And this is where most people start to wonder why things always have to be like this, spurring my rant.

For the Long Term

To be fair to the FSFE’s Android-related campaign, the advice given does give people some concrete activities to consider: it isn’t simply the “go out and write code” battle cry that sometimes drifts through the air after an acrimonious episode where nobody can agree with each other. Helping F-Droid get more applications published, writing more Free Software applications, helping the operating system projects with their efforts: these are all worthwhile things for people to do.

But we only need look at the FSF’s ethical gift guide to see the severe challenges being faced over the longer term. For yet another year, the only offerings are older, refurbished Samsung smartphones, the most recent being the Galaxy S3 and Galaxy Note 2 from 2012. Now, there is nothing wrong with older hardware or refurbished devices. After all, I have written about older and much less powerful hardware than that which I believe still should have a place in modern life.

Nor should people regard the price of such refurbished units as particularly expensive. Of course you can buy a new phone with better specifications for the same or even less, but that new phone won’t be running only Free Software. Yes, there is always the Fairphone whose creators’ grip on Free Software, software longevity and other matters that weren’t confronted in the beginning is supposedly now rather better, although the hardware drivers remain non-free, so it isn’t really comparable, either.

Here, it is illustrative to consider community-originating efforts to develop smartphones, particularly since there is a perception that such efforts eventually end up pitching “expensive” and “underpowered” devices that “aren’t competitive”. There are obviously a collection of such initiatives ongoing at any given point, but ignoring random crowdfunding campaigns and corporate publicity stunts, we might usefully focus on some more familiar projects: the GTA04 successor to the Openmoko FreeRunner and the Neo900 successor to the Nokia N900.

Both of these projects are in a not-easily-explained state. The GTA04 device was made in a number of incrementally refined versions and sold primarily to people who already had a FreeRunner into which they could install the new hardware. However, difficulties with the last hardware revision meant that it was no longer viable as a product, with the cost of overcoming production problems being rather likely to exceed any money otherwise made on the units.

Meanwhile, the Neo900 project is effectively stalled having experienced several setbacks, notably the freezing of funds by PayPal for no really good reason, and difficulties in finding and retaining qualified people to do things like board layout. Although there are aspirations to get to completion, perhaps with some modification of the original goals, the path to completion remains obscure and uncertain. It is certainly hard to sustain my initial enthusiasm for the project, even if I do sympathise deeply with the struggles and difficulties of those trying to deliver something that they want to see succeed perhaps more than anyone else.

The future is not necessarily entirely bleak for these projects, though. Experience from the GTA04 effort may have been beneficial to the development of the Pyra handheld computer, whose own genesis has been troubled at times and yet forthrightly communicated in an honourably transparent fashion by its initiator, and the CPU board for that device may end up as the basis for a new product known as the GTA15. Given the common architectural heritage of the GTA04, N900 and Neo900, it would not be completely inconceivable that if some kind of way forward could be found for the Neo900, GTA15 might be some kind of contributing element to that.

What these projects should illustrate, however, is that the foundations of a Free Software mobile device are difficult to prepare, subject to sudden and potentially catastrophic setbacks or outright failure, and they require persistence and plenty of resources, not least of the financial kind but also the dedication of people with the right competence and experience. Sadly, these projects never get the attention, the recognition, or the generosity they deserve.

Seeing the Bigger Picture

If we care about being able to support all the different elements of our phones with Free Software, instead of crossing out items in the specification list, sacrificing functionality because nobody knows how it works without signing a non-disclosure agreement and then only being allowed to release a binary blob for loading into specific Linux kernel versions, then we need to be there at the start, when the phone gets designed, and play a role in making sure that everything can be supported by Free Software. Spending time and effort on figuring out someone else’s hardware is not time and effort well spent.

Indeed, from the moment a proprietary product gets into the hands of developers, the clock is ticking. Already, given the pace of product development and shipment, the product is heading for obsolescence and its manufacturer will be tooling up to make its successor. Even if downstream developers work quickly and get as much of the hardware supported as they can, there will be only be a certain period before the product becomes difficult to obtain. And then the process of catch-up starts all over again with the next product.

Of course, product variations always used to happen with desktop and laptop computers. One day you’d get a laptop with one chipset and the next day the “same” laptop would contain something else. The only thing that eased the pain involved was broad hardware support for these kinds of devices, and even then there would be chipsets notorious for their lack of support in things like the Linux kernel.

Such pitfalls cultivated demands for products that could run Free Software and be supplied with such software instead of the usual proprietary products bundled as a consequence of Microsoft’s anticompetitive and coercive business practices. It was no longer enough to accept that we might buy a computer with bundled software and “install over it”, that this might emphasise our Free Software credentials. Credible advocates of Free Software have sought to identify vendors offering systems that are either already supplied with Free Software or that come without any preinstalled software at all, in both cases being fully capable of supporting a Free Software distribution.

But we now find ourselves in the absurd situation with mobile devices where remedial measures comparable to “install over it” are almost the only things people can suggest, that there really aren’t any mobile device vendors who can offer a bare, supportable device or one that is, say, already running Replicant and offering access to all of the device’s features. And although refurbished devices are sold that may run Replicant well enough, we lack another essential guarantee that may not have been so important in the past, one that community-originating hardware projects might be able to help with.

In being involved with the design of these devices, we can seek to dictate how long they remain viable. Instead of having a large corporation decide that now is the time for you to buy their next device and that the product you bought and liked is now deliberately unavailable, we can seek to keep making devices as long as they have a role and have people wanting to keep using them.

If something runs a Free Software distribution well enough, and if that device can still be made, it becomes a safe choice and something we can recommend to others. At last, we get some kind of certainty in a world whose stream of continual change is often fad-driven, exploitative and needless.

The Strategic Vacuum

So it seems obvious to me that if people want Free Software on phones, they need to cultivate the efforts to make that Free Software viable, which means cultivating sustainable hardware design and actually promoting and supporting the projects pursuing it. Otherwise, it is like trying to plant an orchard without paying attention to the soil, cursing that the trees will not grow whilst being oblivious to the fact that the ground is concrete.

And this goes beyond this particular domain. Free Software advocacy is all well and good, but there needs to be practical action that goes beyond pitching in and nudging things towards success. It is wonderful that collective effort and collaboration can take small endeavours and grow them into solutions that are useful for many, but it can be too much to expect everything to just coalesce as if by magic, that people and projects will just discover each other, work together, strengthen each other and multiply the benefits.

There needs to be a strategy, for people to identify real-world problems that are not being addressed by Free Software, to identify the projects that might be applied to those problems, and to propose actual solutions. In the absence of Free Software, proprietary and exploitative solutions are able to stake out their territory, entrench themselves and to thrive.

Here, I struggle to see which Free Software organisations have both the breadth of scope and the depth of focus to make a difference. Developer-driven organisations like Debian and KDE have a lot going on, and they deliver non-trivial software systems, and yet sustaining something like a viable mobile platform (encompassing hardware and software) is seemingly beyond their reach. Neither have a coherent answer to other significant challenges of our time, such as the dominance of proprietary (and destructive) communications platforms.

Meanwhile, advocacy-driven organisations like the FSFE merely give advice to people about how they might help Free Software. The FSF arguably combines this with actual development through the GNU project, and there are those lists of urgent activities, but one cannot help but have the impression that the urgency is reactive and not the product of strategic vision (beyond the goal of the proliferation of Free Software, of course). I would like to think that the Software Freedom Conservancy might combine things more effectively, but it perhaps remains more of an umbrella organisation with a similar emphasis on broad Free Software adoption.

I would like to think that the step of getting involved in Free Software is but the first step towards fairer, kinder, more transparent, more productive and more sustainable societies. Traces of such visions can be seen in the communications of various organisations and yet they largely hold back from suggesting what the next steps might be. And yet, I think, we will in future really need to take those steps in order to protect ourselves, our societies, and the things we care about.

In general, we notice the absence of strategy; in specific cases, we notice it keenly. Which organisations are willing and able to fill this strategic void coherently and decisively, to offer concrete solutions as opposed to suggestions, to have something definite to work towards, and to direct effort and resources into actually realising such goals? Surely, in this age of hoping for the best, those organisations will be the ones that deserve our support the most.

Another Look at VGA Signal Generation with a PIC32 Microcontroller

Thursday, November 8th, 2018

Maybe some people like to see others attempting unusual challenges, things that wouldn’t normally be seen as productive, sensible or a good way to spend time and energy, things that shouldn’t be possible or viable but were nevertheless made to work somehow. And maybe they like the idea of indulging their own curiosity in such things, knowing that for potential future experiments of their own there is a route already mapped out to some kind of success. That might explain why my project to produce a VGA-compatible analogue video signal from a PIC32 microcontroller seems to attract more feedback than some of my other, arguably more useful or deserving, projects.

Nevertheless, I was recently contacted by different people inquiring about my earlier experiments. One was admittedly only interested in using Free Software tools to port his own software to the MIPS-based PIC32 products, and I tried to give some advice about navigating the documentation and to describe some of the issues I had encountered. Another was more concerned with the actual signal generation aspect of the earlier project and the usability of the end result. Previously, I had also had a conversation with someone looking to use such techniques for his project, and although he ended up choosing a different approach, the cumulative effect of my discussions with him and these more recent correspondents persuaded me to take another look at my earlier work and to see if I couldn’t answer some of the questions I had received more definitively.

Picking Over the Pieces

I was already rather aware of some of the demonstrated and potential limitations of my earlier work, these being concerned with generating a decent picture, and although I had attempted to improve the work on previous occasions, I think I just ran out of energy and patience to properly investigate other techniques. The following things were bothersome or a source of concern:

  • The unevenly sized pixels
  • The process of experimentation with the existing code
  • Whether the microcontroller could really do other things while displaying the picture

Although one of my correspondents was very complimentary about the form of my assembly language code, I rather felt that it was holding me back, making me focus on details that should be abstracted away. It should be said that MIPS assembly language is fairly pleasant to write, at least in comparison to certain other architectures.

(I was brought up on 6502 assembly language, where there is an “accumulator” that is the only thing even approaching a general-purpose register in function, and where instructions need to combine this accumulator with other, more limited, registers to do things like accessing “zero page”: an area of memory that supports certain kinds of operations by providing the contents of locations as inputs. Everything needs to be meticulously planned, and despite odd claims that “zero page” is really one big register file and that 6502 is therefore “RISC-like”, the existence of virtual machines such as SWEET16 say rather a lot about how RISC-like the 6502 actually is. Later, I learned ARM assembly language and found it rather liberating with its general-purpose registers and uncomplicated, rather easier to use, memory access instructions. Certain things are even simpler in MIPS assembly language, whereas other conveniences from ARM are absent and demand a bit more effort from the programmer.)

Anyway, I had previously introduced functionality in C in my earlier work, mostly because I didn’t want the challenge of writing graphics routines in assembly language. So with the need to more easily experiment with different peripheral configurations, I decided to migrate the remaining functionality to C, leaving only the lowest-level routines concerned with booting and exception/interrupt handling in assembly language. This effort took me to some frustrating places, making me deal with things like linker scripts and the kind of memory initialisation that one’s compiler usually does for you but which is absent when targeting a “bare metal” environment. I shall spare you those details in this article.

I therefore spent a certain amount of effort in developing some C library functionality for dealing with the hardware. It could be said that I might have used existing libraries instead, but ignoring Microchip’s libraries that will either be proprietary or the subject of numerous complaints by those unwilling to leave that company’s “ecosystem”, I rather felt that the exercise in library design would be useful in getting reacquainted and providing me with something I would actually want to use. An important goal was minimalism, whereas my impression of libraries such as those provided by the Pinguino effort are that they try and bridge the different PIC hardware platforms and consequently accumulate features and details that do not really interest me.

The Wide Pixel Problem

One thing that had bothered me when demonstrating a VGA signal was that the displayed images featured “wide” pixels. These are not always noticeable: one of my correspondents told me that he couldn’t see them in one of my example pictures, but they are almost certainly present because they are a feature of the mechanism used to generate the signal. Here is a crop from the example in question:

Picture detail from VGAPIC32 output

Picture detail from VGAPIC32 output

And here is the same crop with the wide pixels highlighted:

Picture detail with wide pixels highlighted

Picture detail with wide pixels highlighted

I have left the identification of all wide pixel columns to the reader! Nevertheless, it can be stated that these pixels occur in every fourth column and are especially noticeable with things like text, where at such low resolutions, the doubling of pixel widths becomes rather obvious and annoying.

Quite why this increase in pixel width was occurring became a matter I wanted to investigate. As you may recall, the technique I used to output pixels involved getting the direct memory access (DMA) controller in the PIC32 chip to “copy” the contents of memory to a hardware register corresponding to output pins. The signals from these pins were sent along the cable to the monitor. And the DMA controller was transferring data as fast as it could and thus producing pixel colours as fast as it could.

Pixel Output Using DMA Transfer

An overview of the architecture for pixel output using DMA transfer

One of my correspondents looked into the matter and confirmed that we were not imagining this problem, even employing an oscilloscope to check what was happening with the signals from the output pins. The DMA controller would, after starting each fourth pixel, somehow not be able to produce the next pixel in a timely fashion, leaving the current pixel colour unchanged as the monitor traced the picture across the screen. This would cause these pixels to “stretch” until the first pixel from the next group could be emitted.

Initially, I had thought that interrupts were occurring and the CPU, in responding to interrupt conditions and needing to read instructions, was gaining priority over the DMA controller and forcing pixel transfers to wait. Although I had specified a “cell size” of 160 bytes, corresponding to 160 pixels, I was aware that the architecture of the system would be dividing data up into four-byte “words”, and it would be natural at certain points for accesses to memory to be broken up and scheduled in terms of such units. I had therefore wanted to accommodate both the CPU and DMA using an approach where the DMA would not try and transfer everything at once, but without the energy to get this to work, I had left the matter to rest.

A Steady Rhythm

The documentation for these microcontrollers distinguishes between block and cell transfers when describing DMA. Previously, I had noted that these terms could be better described, and I think there are people who are under the impression that cells must be very limited in size and that you need to drive repeated cell transfers using various interrupt conditions to transfer larger amounts. We have seen that this is not the case: a single, large cell transfer is entirely possible, even though the characteristics of the transfer have been less than desirable. (Nevertheless, the documentation focuses on things like copying from one UART peripheral to another, arguably failing to explore the range of possible applications for DMA and to thereby elucidate the mechanisms involved.)

However, given the wide pixel effect, it becomes more interesting to introduce a steady rhythm by using smaller cell sizes and having an external event coordinate each cell’s transfer. With a single, large transfer, only one initiation event needs to occur: that produced by the timer whose period corresponds to that of a single horizontal “scanline”. The DMA channel producing pixels then runs to completion and triggers another channel to turn off the pixel output. In this scheme, the initiating condition for the cell transfer is the timer.

VGA Display Line Structure

The structure of each visible display line in the VGA signal

When using multiple cells to transfer the pixel data, however, it is no longer possible to use the timer in this way. Doing so would cause the initiation of the first cell, but then subsequent cells would only be transferred on subsequent timer events. And since these events only occur once per scanline, this would see a single line’s pixel data being transferred over many scanlines instead (or, since the DMA channel would be updated regularly, we would see only the first pixel on each line being emitted, stretched across the entire line). Since the DMA mechanism apparently does not permit one kind of interrupt condition to enable a channel and another to initiate each cell transfer, we must be slightly more creative.

Fortunately, the solution is to chain two channels, just as we already do with the pixel-producing channel and the one that resets the output. A channel is dedicated to waiting for the line timer event, and it transfers a single black pixel to the screen before handing over to the pixel-producing channel. This channel, now enabled, has its cell transfers regulated by another interrupt condition and proceeds as fast as such a condition may occur. Finally, the reset channel takes over and turns off the output as before.

Pixel Output Using Timed DMA Transfers

An overview of the architecture for pixel output using timed DMA transfers

The nature of the cell transfer interrupt can take various forms, but it is arguably most intuitive to use another timer for this purpose. We may set the limit of such a timer to 1, indicating that it may “wrap around” and thus produce an event almost continuously. And by configuring it to run as quickly as possible, at the frequency of the peripheral clock, it may drive cell transfers at a rate that is quick enough to hopefully produce enough pixels whilst also allowing other activities to occur between each transfer.

VGA Pixel Output (Using Transfer Timer)

Using a timer to initiate pixel transfers for the VGA signal

One thing is worth mentioning here just to be explicit about the mechanisms involved. When configuring interrupts that are used for DMA transfers, it is the actual condition that matters, not the interrupt nor the delivery of the interrupt request to the CPU. So, when using timer events for transfers, it appears to be sufficient to merely configure the timer; it will produce the interrupt condition upon its counter “wrapping around” regardless of whether the interrupt itself is enabled.

With a cell size of a single byte, and with a peripheral clock running at half the speed of the system clock, this approach is sufficient all by itself to yield pixels with consistent widths, with the only significant disadvantage being how few of them can be produced per line: I could only manage in the neighbourhood of 80 pixels! Making the peripheral clock run as fast as the system clock doesn’t help in theory: we actually want the CPU running faster than the transfer rate just to have a chance of doing other things. Nor does it help in practice: picture stability rather suffers.

A picture of the display output from timed DMA transfers

A picture of the display output from timed DMA transfers

Using larger cell sizes, we encounter the wide pixel problem, meaning that the end of a four-byte group is encountered and the transfer hangs on for longer than it should. However, larger cell sizes also introduce byte transfers at a different rate from cell transfers (at the system clock rate) and therefore risk making the last pixel produced by a cell longer than the others, anyway.

Uncovering DMA Transfers

I rather suspect that interruptions are not really responsible for the wide pixels at all, and that it is the DMA controller that causes them. Some discussion with another correspondent explored how the controller might be operating, with transfers perhaps looking something like this:

DMA read from memory
DMA write to display (byte #1)
DMA write to display (byte #2)
DMA write to display (byte #3)
DMA write to display (byte #4)
DMA read from memory
...

This would, by itself, cause a transfer pattern like this:

R____R____R____R____R____R ...
_WWWW_WWWW_WWWW_WWWW_WWWW_ ...

And thus pixel output as follows:

41234412344123441234412344 ...
=***==***==***==***==***== ... (narrow pixels as * and wide pixel components as =)

Even without any extra operations or interruptions, we would get a gap between the write operations that would cause a wider pixel. This would only get worse if the DMA controller had to update the address of the pixel data after every four-byte read and write, not being able to do so concurrently with those operations. And if the CPU were really able to interrupt longer transfers, even to obtain a single instruction to execute, it might then compete with the DMA controller in accessing memory, making the read operations even later every time.

Assuming, then, that wide pixels are the fault of the way the DMA controller works, we might consider how we might want it to work instead:

                     | ...
DMA read from memory | DMA write to display (byte #4)
                 \-> | DMA write to display (byte #1)
                     | DMA write to display (byte #2)
                     | DMA write to display (byte #3)
DMA read from memory | DMA write to display (byte #4)
                 \-> | ...

If only pixel data could be read from memory and written to the output register (and thus the display) concurrently, we might have a continuous stream of evenly-sized pixels. Such things do not seem possible with the microcontroller I happen to be using. Either concurrent reading from memory and writing to a peripheral is not possible or the DMA controller is not able to take advantage of this concurrency. But these observations did give me another idea.

Dual Channel Transfers

If the DMA controller cannot get a single channel to read ahead and get the next four bytes, could it be persuaded to do so using two channels? The intention would be something like the following:

Channel #1:                     Channel #2:
                                ...
DMA read from memory            DMA write to display (byte #4)
DMA write to display (byte #1)
DMA write to display (byte #2)
DMA write to display (byte #3)
DMA write to display (byte #4)  DMA read from memory
                                DMA write to display (byte #1)
...                             ...

This is really nothing different from the above, functionally, but the required operations end up being assigned to different channels explicitly. We would then want these channels to cooperate, interleaving their data so that the result is the combined sequence of pixels for a line:

Channel #1: 1234    1234     ...
Channel #2:     5678    5678 ...
  Combined: 1234567812345678 ...

It would seem that channels might even cooperate within cell transfers, meaning that we can apparently schedule two long transfer cells and have the DMA controller switch between the two channels after every four bytes. Here, I wrote a short test program employing text strings and the UART peripheral to see if the microcontroller would “zip up” the strings, with the following being used for single-byte cells:

Channel #1: "Adoc gi,hlo\r"
Channel #2: "n neaan el!\n"
  Combined: "And once again, hello\r\n"

Then, seeing that it did, I decided that this had a chance of also working with pixel data. Here, every other pixel on a line needs to be presented to each channel, with the first channel being responsible for the pixels in odd-numbered positions, and the second channel being responsible for the even-numbered pixels. Since the DMA controller is unable to step through the data at address increments other than one (which may be a feature of other DMA implementations), this causes us to rearrange each line of pixel data as follows:

 Displayed pixels: 123456......7890
Rearranged pixels: 135...79246...80
                   *       *

Here, the asterisks mark the start of each channel’s data, with each channel only needing to transfer half the usual amount.

Pixel Output Using Timed Dual-Channel Transfers

The architecture involved in employing two pixel data channels with timed transfers

The documentation does, in fact, mention that where multiple channels are active with the same priority, each one is given control in turn with the controller cycling through them repeatedly. The matter of which order they are chosen, which is important for us, seems to be dependent on various factors, only some of which I can claim to understand. For instance, I suspect that if the second channel refers to data that appears before the first channel’s data in memory, it gets scheduled first when both channels are activated. Although this is not a significant concern when just trying to produce a stable picture, it does limit more advanced operations such as horizontal scrolling.

A picture of the display output from timed, dual-channel DMA transfers

A picture of the display output from timed, dual-channel DMA transfers

As you can see, trying this technique out with timed transfers actually made a difference. Instead of only managing something approaching 80 pixels across the screen, more than 90 can be accommodated. Meanwhile, experiments with transfers going as fast as possible seemed to make no real difference, and the fourth pixel in each group was still wider than the others. Still, making the timed transfer mode more usable is a small victory worth having, I suppose.

Parallel Mode Revisited

At the start of my interest in this project, I had it in my mind that I would couple DMA transfers with the parallel mode (or Parallel Master Port) functionality in order to generate a VGA signal. Certain aspects of this, particularly gaps between pixels, made me less than enthusiastic about the approach. However, in considering what might be done to the output signal in other situations, I had contemplated the use of a flip-flop to hold output stable according to a regular tempo, rather like what I managed to achieve, almost inadvertently, when introducing a transfer timer. Until recently, I had failed to apply this idea to where it made most sense: in regulating the parallel mode signal.

Since parallel mode is really intended for driving memory devices and display controllers, various control signals are exposed via pins that can tell these external devices that data is available for their consumption. For our purposes, a flip-flop is just like a memory device: it retains the input values sampled by its input pins, and then exposes these values on its output pins when the inputs are “clocked” into memory using a “clock pulse” signal. The parallel mode peripheral in the microcontroller offers various different signals for such clock and selection pulse purposes.

VGA Output Circuit (Parallel Mode)

The parallel mode circuit showing connections relevant to VGA output (generic connections are not shown)

Employing the PMWR (parallel mode write) signal as the clock pulse, directing the display signals to the flip-flop’s inputs, and routing the flip-flop’s outputs to the VGA circuit solved the pixel gap problem at a stroke. Unfortunately, it merely reminded us that the wide pixel problem also affects parallel mode output, too. Although the clock pulse is able to tell an external component about the availability of a new pixel value, it is up to the external component to regulate the appearance of each pixel. A memory device does not care about the timing of incoming data as long as it knows when such data has arrived, and so such regulation is beyond the capabilities of a flip-flop.

It was observed, however, that since each group of pixels is generated at a regular frequency, the PMWR signalling frequency might be reduced by being scaled by a constant factor. This might allow some pixel data to linger slightly longer in the flip-flop and be slightly stretched. By the time the fourth pixel in a group arrives, the time allocated to that pixel would be the same as those preceding it, thus producing consistently-sized pixels. I imagine that a factor of 8/9 might do the trick, but I haven’t considered what modification to the circuit might be needed or whether it would all be too complicated.

Recognising the Obvious

When people normally experiment with video signals from microcontrollers, one tends to see people writing code to run as efficiently as is absolutely possible – using assembly language if necessary – to generate the video signal. It may only be those of us using microcontrollers with DMA peripherals who want to try and get the DMA hardware to do the heavy lifting. Indeed, those of us with exposure to display peripherals provided by system-on-a-chip solutions feel almost obliged to do things this way.

But recent discussions with one of my correspondents made me reconsider whether an adequate solution might be achieved by just getting the CPU to do the work of transferring pixel data to the display. Previously, another correspondent had indicated that it this was potentially tricky, and that getting the timings right was more difficult than letting the hardware synchronise the different mechanisms – timer and DMA – all by itself. By involving the CPU and making it run code, the display line timer would need to generate an interrupt that would be handled, causing the CPU to start running a loop to copy data from the framebuffer to the output port.

Pixel Output Using CPU-Driven Transfers

An overview of the architecture with the CPU driving transfers of pixel data

This approach puts us at the mercy of how the CPU handles and dispatches interrupts. Being somewhat conservative about the mechanisms more generally available on various MIPS-based products, I tend to choose a single interrupt vector and then test for the different conditions. Since we need as little variation as possible in the latency between a timer event occurring and the pixel data being generated, I test for that particular event before even considering anything else. Then, a loop of the following form is performed:

    for (current = line_data; current < end; current++)
        *output_port = *current;

Here, the line data is copied byte by byte to the output port. Some adornments are necessary to persuade the compiler to generate code that writes the data efficiently and in order, but there is nothing particularly exotic required and GCC does a decent job of doing what we want. After the loop, a black/reset pixel is generated to set the appropriate output level.

One concern that one might have about using the CPU for such long transfers in an interrupt handler is that it ties up the CPU, preventing it from doing other things, and it also prevents other interrupt requests from being serviced. In a system performing a limited range of activities, this can be acceptable: there may be little else going on apart from updating the display and running programs that access the display; even when other activities are incorporated, they may accommodate being relegated to a secondary status, or they may instead take priority over the display in a way that may produce picture distortion but only very occasionally.

Many of these considerations applied to systems of an earlier era. Thinking back to computers like the Acorn Electron – a 6502-based system that formed the basis of my first sustained experiences with computing – it employs a display controller that demands access to the computer’s RAM for a certain amount of the time dedicated to each video frame. The CPU is often slowed down or even paused during periods of this display controller’s activity, making programs slower than they otherwise would be, and making some kinds of input and output slightly less reliable under certain circumstances. Nevertheless, with certain kinds of additional hardware, the possibility is present for such hardware to interrupt the CPU and to override the display controller that would then produce “snow” or noise on the screen as a consquence of this high-priority interruption.

Such issues cause us to consider the role of the DMA controller in our modern experiment. We might well worry about loading the CPU with lots of work, preventing it from doing other things, but what if the DMA controller dominates the system in such a way that it effectively prevents the CPU from doing anything productive anyway? This would be rather similar to what happens with the Electron and its display controller.

So, evaluating a CPU-driven solution seems to be worthwhile just to see whether it produces an acceptable picture and whether it causes unacceptable performance degradation. My recent correspondence also brought up the assertion that the RAM and flash memory provided by PIC32 microcontrollers can be accessed concurrently. This would actually mitigate contention between DMA and any programs running from flash memory, at least until the point that accesses to RAM needed to be performed by those programs, meaning that we might expect some loss of program performance by shifting the transfer burden to the CPU.

(Again, I am reminded of the Electron whose ROM could be accessed at full speed but whose RAM could only be accessed at half speed by the CPU but at full speed by the display controller. This might have been exploited by software running from ROM, or by a special kind of RAM installed and made available at the right place in memory, but the 6502 favours those zero-page instructions mentioned earlier, forcing RAM access and thus contention with the display controller. There were upgrades to mitigate this by providing some dedicated memory for zero page, but all of this is really another story for another time.)

Ultimately, having accepted that the compiler would produce good-enough code and that I didn’t need to try more exotic things with assembly language, I managed to produce a stable picture.

A picture of the display output from CPU-driven pixel data transfers

A picture of the display output from CPU-driven pixel data transfers

Maybe I should have taken this obvious path from the very beginning. However, the presence of DMA support would have eventually caused me to investigate its viability for this application, anyway. And it should be said that the performance differences between the CPU-based approach and the DMA-based approaches might be significant enough to argue in favour of the use of DMA for some purposes.

Observations and Conclusions

What started out as a quick review of my earlier work turned out to be a more thorough study of different techniques and approaches. I managed to get timed transfers functioning, revisited parallel mode and made it work in a fairly acceptable way, and I discovered some optimisations that help to make certain combinations of techniques more usable. But what ultimately matters is which approaches can actually be used to produce a picture on a screen while programs are being run at the same time.

To give the CPU more to do, I decided to implement some graphical operations, mostly copying data to a framebuffer for its eventual transfer as pixels to the display. The idea was to engage the CPU in actual work whilst also exercising access to RAM. If significant contention between the CPU and DMA controller were to occur, the effects would presumably be visible on the screen, potentially making the chosen configuration unusable.

Although some approaches seem promising on paper, and they may even produce promising results when the CPU is doing little more than looping and decrementing a register to introduce a delay, these approaches may produce less than promising results under load. The picture may start to ripple and stretch, and under “real world” load, the picture may seem noisy and like a badly-tuned television (for those who remember the old days of analogue broadcast signals).

Two approaches seem to remain robust, however: the use of timed DMA transfers, and the use of the CPU to do all the transfer work. The former is limited in terms of resolution and introduces complexity around the construction of images in the framebuffer, at least if optimised as much as possible, but it seems to allow more work to occur alongside the update of the display, and the reduction in resolution also frees up RAM for other purposes for those applications that need it. Meanwhile, the latter provides the resolution we originally sought and offers a straightforward framebuffer arrangement, but it demands more effort from the CPU, slowing things down to the extent that animation practically demands double buffering and thus the allocation of even more RAM for display purposes.

But both of these seemingly viable approaches produce consistent pixel widths, which is something of a happy outcome given all the effort to try and fix that particular problem. One can envisage accommodating them both within a system given that various fundamental system properties (how fast the system and peripheral clocks are running, for example) are shared between the two approaches. Again, this is reminiscent of microcomputers where a range of display modes allowed developers and users to choose the trade-off appropriate for them.

A demonstration of text plotting at a resolution of 160x128

A demonstration of text plotting at a resolution of 160x128

Having investigated techniques like hardware scrolling and sprite plotting, it is tempting to keep developing software to demonstrate the techniques described in this article. I am even tempted to design a printed circuit board to tidy up my rather cumbersome breadboard arrangement. And perhaps the library code I have written can be used as the basis for other projects.

It is remarkable that a home-made microcontroller-based solution can be versatile enough to demonstrate aspects of simple computer systems, possibly even making it relevant for those wishing to teach or to learn about such things, particularly since all the components can be connected together relatively easily, with only some magic happening in the microcontroller itself. And with such potential, maybe this seemingly pointless project might have some meaning and value after all!

Update

Although I can’t embed video files of any size here, I have made a “standard definition” video available to demonstrate scrolling and sprites. I hope it is entertaining and also somewhat illustrative of the kind of thing these techniques can achieve.