Paul Boddie's Free Software-related blog


Archive for the ‘retrocomputing’ Category

Considering Unexplored Products of the Past: Formulating a Product

Friday, February 10th, 2023

Previously, I described exploring the matter of developing emulation of a serial port, along with the necessary circuitry, for Elkulator, an emulator for the Acorn Electron microcomputer, motivated by a need to provide a way of transferring files into and out of the emulated computer. During this exploration, I had discovered some existing software that had been developed to provide some level of serial “filing system” support on the BBC Microcomputer – the higher-specification sibling of the Electron – with the development of this software having been motivated by an unforeseen need to transfer software to a computer without any attached storage devices.

This existing serial filing system software was a good indication that serial communications could provide the basis of a storage medium. But instead of starting from a predicament involving computers without usable storage facilities, where an unforeseen need motivates the development of a clever workaround, I wanted to consider what such a system might have been like if there had been a deliberate plan from the very beginning to deploy computers that would rely on a serial connection for all their storage needs. Instead of having an implementation of the filing system in RAM, one could have the luxury of putting it into a ROM chip that would be fitted in the computer or in an expansion, and a richer set of features might then be contemplated.

A Smarter Terminal

Once again, my interest in the historical aspects of the technology provided some guidance and some inspiration. When microcomputers started to become popular and businesses and institutions had to decide whether these new products had any relevance to their operations, there was some uncertainty about whether such products were capable enough to be useful or whether they were a distraction from the facilities already available in such organisations. It seems like a lifetime ago now, but having a computer on every desk was not necessarily seen as a guarantee of enhanced productivity, particularly if those computers did not link up to existing facilities or did not coordinate the work of a number of individuals.

At the start of the 1980s, equipping an office with a computer on every desk and equipping every computer with a storage solution was an expensive exercise. Even disk drives offering only a hundred kilobytes of storage on each removable floppy disk were expensive, and hard disk drives were an especially expensive and precious luxury that were best shared between many users. Some microcomputers were marketed as multi-user systems, encouraging purchasers to connect terminals to them and to share those precious resources: precisely the kind of thing that had been done with minicomputers and mainframes. Such trends continued into the mid-1980s, manifested by products promoted by companies with mainframe origins, such companies perpetuating entrenched tendencies to frame computing solutions in certain ways.

Terminals themselves were really just microcomputers designed for the sole purpose of interacting with a “host” computer, and institutions already operating mainframes and minicomputers would have experienced the need to purchase several of them. Until competition intensified in the terminal industry, such products were not particularly cheap, with the DEC VT220, introduced in 1983, costing $1295. Meanwhile, interest in microcomputers and the possibility of distributing some kinds of computing activity to these new products led to experimentation in some organisations. Some terminal manufacturers responded by offering terminals that also ran microcomputer software.

Much of the popular history of microcomputing, familiar to anyone who follows such topics online, particularly through YouTube videos, focuses on adoption of such technology in the home, with an inevitable near-obsession with gaming. The popular history of institutional adoption often focuses on the upgrade parade from one generation of computer to the next. But there is a lesser-told history involving the experimentation that took place at the intersection of microcomputing and minicomputing or mainframe computing. In universities, computers like the BBC Micro were apparently informally introduced as terminals for other systems, with terminal ROMs being developed and shared between institutions. However, there seems to have been relatively little mainstream interest in such software as fully promoted commercial products, although Acornsoft – Acorn’s software outlet – did adopt such a ROM to sell as their Termulator product.

The Acorn Electron, introduced at £199, had a “proper” keyboard and the ability to display 80 columns of text, unlike various other popular microcomputers. Indeed, it may have been the lowest-priced computer to be able to display 80 columns of relatively high definition text as standard, such capabilities requiring extra cards for machines like the Apple II and the Commodore 64. Considering the much lower price of such a computer, the ongoing experimentation underway at the time with its sibling machine on alternative terminal solutions, and the generally favourable capabilities of both these machines, it seems slightly baffling that more was not done to pursue opportunities to introduce a form of “intelligent terminal” or “hybrid terminal” product to certain markets.

VIEW in 80 columns on the Acorn Electron.

None of this is to say that institutional users would have been especially enthusiastic. In some institutions, budgets were evidently generous enough that considerable sums of money would be spent acquiring workstations that were sometimes of questionable value. But in others, the opportunity to make savings, to explore other ways of working, and perhaps also to explicitly introduce microcomputing topics such as software development for lower-specification hardware would have been worthy of some consideration. An Electron with a decent monochrome monitor, like the one provided with the M2105, plus some serial hardware, could have comprised a product sold for perhaps as little as £300.

The Hybrid Terminal

How would a “hybrid terminal” solution work, how might it have been adopted, and what might it have been used for? Through emulation and by taking advantage of the technological continuity in multi-user systems from the 1980s to the present day, we can attempt to answer such questions. Starting with communications technologies familiar in the world of the terminal, we might speculate that a serial connection would be the most appropriate and least disruptive way of interfacing a microcomputer to a multi-user system.

Although multi-user systems, like those produced by Digital Equipment Corporation (DEC), might have offered network connectivity, it is likely that such connectivity was proprietary, expensive in terms of the hardware required, and possibly beyond the interfacing capabilities of most microcomputers. Meanwhile, Acorn’s own low-cost networking solution, Econet, would not have been directly compatible with these much higher-end machines. Acorn’s involvement in network technologies is also more complicated than often portrayed, but as far as Econet is concerned, only much later machines would more conveniently bridge the different realms of Econet and standards-based higher-performance networks.

Moreover, it remains unlikely that operators and suppliers of various multi-user systems would have been enthusiastic about fitting dedicated hardware and installing dedicated software for the purpose of having such systems communicate with third-party computers using a third-party network technology. I did find it interesting that someone had also adapted Acorn’s network filing system that usually runs over Econet to work instead over a serial connection, which presumably serves files out of a particular user account. Another discovery I made was a serial filing system approach by someone who had worked at Acorn who wanted to transfer files between a BBC Micro system and a Unix machine, confirming that such functionality was worth pursuing. (And there is also a rather more complicated approach involving more exotic Acorn technology.)

Indeed, to be successful, a hybrid terminal approach would have to accommodate existing practices and conventions as far as might be feasible in order to not burden or disturb the operators of these existing systems. One motivation from an individual user’s perspective might be to justify introducing a computer on their desk, to be able to have it take advantage of the existing facilities, and to augment those facilities where it might be felt that they are not flexible or agile enough. Such users might request help from the operators, but the aim would be to avoid introducing more support hassles, which would easily arise if introducing a new kind of network to the mix. Those operators would want to be able to deploy something and have it perform a role without too much extra thought.

I considered how a serial link solution might achieve this. An existing terminal would be connected to, say, a Unix machine and be expected to behave like a normal client, allowing the user to log into their account. The microcomputer would send some characters down the serial line to the Unix “host”, causing it to present the usual login prompt, and the user would then log in as normal. They would then have the option of conducting an interactive session, making their computer like a conventional terminal, but there would also be the option of having the Unix system sit in the background, providing other facilities on request.

Logging into a remote service via a serial connection.

The principal candidates for these other facilities would be file storage and printing. Both of these things were centrally managed in institutions, often available via the main computing service, and the extensible operating system of the Electron and related microcomputers invites the development of software to integrate the core support for these facilities with such existing infrastructure. Files would be loaded from the user’s account on the multi-user system and saved back there again. Printing would spool the printed data to files somewhere in the user’s home directory for queuing to centralised printing services.

Attempting an Implementation

I wanted to see how such a “serial computing environment” would work in practice, how it would behave, what kinds of applications might benefit, and what kind of annoyances it might have. After all, it might be an interesting idea or a fun idea, but it need not be a particularly good one. The first obstacle was that of understanding how the software elements would work, primarily on the Electron itself, from the tasks that I would want the software to perform down to the way the functionality would be implemented. On the host or remote system, I was rather more convinced that something could be implemented since it would mostly be yet another server program communicating over a stream, with plenty of modern Unix conveniences to assist me along the way.

As it turned out, my investigations began with a trip away from home and the use of a different, and much more constrained, development environment involving an ARM-based netbook. Fortunately, Elkulator and the different compilers and tools worked well enough on that development hardware to make the exercise approachable. Another unusual element was that I was going to mostly rely on the original documentation in the form of the actual paper version of the Acorn Electron Advanced User Guide for information on how to write the software for the Electron. It was enlightening coming back to this book after a few decades for assistance on a specific exercise, even though I have perused the book many times in its revised forms online, because returning to it with a focus on a particular task led me to find that the documentation in the book was often vague or incomplete.

Although the authors were working in a different era and presumably under a degree of time pressure, I feel that the book in some ways exhibits various traits familiar to those of us working in the software industry, these indicating a lack of rigour and of sufficient investment in systems documentation. For this, I mostly blame the company who commissioned the work and then presumably handed over some notes and told the authors to fill in the gaps. As if to strengthen such perceptions of hurriedness and lack of review, it also does not help that “system” is mis-spelled “sysem” in a number of places in the book!

Nevertheless, certain aspects of the book were helpful. The examples, although focusing on one particular use-case, did provide helpful detail in deducing the correct way of using certain mechanisms, even if they elected to avoid the correct way of performing other tasks. Acorn’s documentation had a habit of being “preachy” about proper practices, only to see its closest developers ignore those practices, anyway. Eventually, on returning from my time away, I was able to fill in some of the gaps, although by this time I had a working prototype that was able to do basic things like initiate a session on the host system and to perform some file-related operations.

There were, and still are, a lot of things that needed, and still need, improvement in my implementation. The way that the operating system needs to be extended to provide extra filing system functionality involves plenty of programming interfaces, plenty of things to support, and also plenty of opportunities for things to go wrong. The VIEW word processor makes use of interfaces both for whole-file loading and saving and for random-access file operations. Missing out support for one or the other will probably not yield the desired level of functionality.

There are also intricacies with regard to switching printing on and off – this typically being done using control characters sent through the output stream – and with regard to “spool” files which capture character output. And filing system ROMs need to be initialised through a series of “service calls”, these being documented for the most part, but with the overall mechanism left largely undescribed in the documentation. It is difficult enough deciphering the behaviour of the Electron’s operating system today, with all the online guidance available in many forms, so I cannot imagine how difficult it would have been as a third party to effectively develop applications back in the day.

Levels of Simulation

To support the activities of the ROM software in the emulated Electron, I had to develop a server program running on my host computer. As noted above, this was not onerous, especially since I had already written a program to exercise the serial communications and to interact with the emulated serial port. I developed this program further to respond to commands issued by my ROM, performing host operations and returning results. For example, the CAT command produces a “catalogue” of files in a host directory, and so my server program performs a directory listing operation, collects the names of the files, and then sends them over the virtual serial link to the ROM for it to display to the user.
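
As a rough illustration of the host side, the following minimal Python sketch shows how such a command loop might answer a catalogue request. The socket path, the newline-terminated command framing and the terminator line are all assumptions made for illustration; the actual protocol used by my server program is not reproduced here.

    import os, socket

    SOCKET_PATH = "/tmp/elkulator-serial"  # hypothetical socket name

    def serve():
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.connect(SOCKET_PATH)
        f = s.makefile("rwb", buffering=0)
        cwd = os.path.expanduser("~")
        for line in f:
            command = line.strip().decode("ascii", "replace")
            if command == "CAT":
                # Send each filename, ending with a terminator line that
                # the ROM can detect (an assumed convention).
                for name in sorted(os.listdir(cwd)):
                    f.write((name + "\r\n").encode("ascii", "replace"))
                f.write(b".\r\n")

    if __name__ == "__main__":
        serve()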

To make the experience somewhat authentic and to approximate to an actual deployment environment, I included a simulation of the login prompt so that the user of the emulated Electron would have to log in first, with the software also having to deal with a logged-out (or not yet logged in) condition in a fairly graceful way. A user selects the Serial Computing Environment using the *SCE command, this explicitly selecting the serial filing system, and the login dialogue is then presented if they have not yet logged into the remote host. Once logged in, the ROM software should be able to test for the presence of the command processor that responds to issued commands, only issuing commands if the command processor has signalled its presence.

Although this models a likely deployment environment, I wanted to go a bit further in terms of authenticity, and so I decided to make the command processor a separate program that would be installed in a user account on a Unix machine. The user’s profile script would be set up to run the command processor, so that when they logged in, this program would automatically run and be ready for commands. I was first introduced to such practices in my first workplace where a menu-driven, curses-based program I had written was deployed so that people doing first-line technical support could query the database of an administrative system without needing to be comfortable with the Unix shell environment.

For complete authenticity I would actually want to have the emulated Electron contact a Unix-based system over a physical serial connection, but for now I have settled for an arrangement whereby a pseudoterminal is created to run the login program, with the terminal output presented to the emulator. Instead of seeing a simulated login dialogue, the user now interacts with the host system’s login program, allowing them to log into a real account. At that point, the command processor is invoked by the shell and the user gets back control.
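
In outline, and purely as a sketch under assumptions (a hypothetical socket path for the emulated serial port, /bin/login as the login program, and the root privileges that running login normally demands), the pseudoterminal arrangement might look like this in Python:

    import os, pty, select, socket

    SOCKET_PATH = "/tmp/elkulator-serial"  # hypothetical

    def bridge():
        pid, master = pty.fork()
        if pid == 0:
            # Child: become the login program on the pseudoterminal.
            os.execv("/bin/login", ["login"])
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.connect(SOCKET_PATH)
        while True:
            ready, _, _ = select.select([master, s], [], [])
            if master in ready:
                s.sendall(os.read(master, 1024))    # host -> emulator
            if s in ready:
                data = s.recv(1024)                 # emulator -> host
                if not data:
                    break
                os.write(master, data)

    if __name__ == "__main__":
        bridge()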

Obtaining a genuine login dialogue from a Unix system.

To prevent problems with certain characters, the command processor configures the terminal to operate in raw mode. Apart from that, it operates mostly as it did when run together with the login simulation which did not have to concern itself with such things as terminals and login programs.
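
Expressed in Python, the adjustment amounts to something like the following sketch, which saves the terminal settings, switches to raw mode, and restores the settings on exit:

    import sys, termios, tty

    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)
    try:
        tty.setraw(fd)
        # ... the command processing loop would run here ...
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)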

Some Applications

This effort was motivated by the need or desire to be able to access files from within Elkulator, particularly from applications such as VIEW. Naturally, VIEW is really just one example from the many applications available for the Electron, but since it interacts with a range of functionality that this serial computing environment provides, it serves to showcase such functionality fairly well. Indeed, some of the screenshots featured in this and the previous article show VIEW operating on text that was saved and loaded over the serial connection.

Accessing files involves some existing operating system commands, such as *CAT (often abbreviated to *.) to list the catalogue of a storage medium. Since a Unix host supports hierarchical storage, whereas the Electron’s built-in command set only really addresses the needs of a flat storage medium (as provided by various floppy disk filing systems for Electron and BBC Micro), the *DIR command has been introduced from Acorn’s hierarchical filing systems (such as ADFS) to navigate between directories. This is perhaps confusing to anyone familiar with other operating systems, such as the different variants of DOS and their successors, where DIR lists a directory’s contents rather than selecting a directory.

Using catalogue and directory traversal commands.

VIEW allows documents to be loaded and saved in a number of ways, but as a word processor it also needs to be able to print these documents. This might be done using a printer connected to a parallel port, but it makes a bit more sense to instead allow the serial printer to be selected and for printing to occur over the serial connection. However, it is not sufficient to merely allow the operating system to take over the serial link and to send the printed document, if only because the other side of this link is not a printer! Indeed, the command processor is likely to be waiting for commands and to see the incoming data as ill-formed input.

The chosen solution was to intercept attempts to send characters to a serial printer, buffering them and then sending the buffered data in special commands to the command processor. This in turn would write the printed characters to a “spool” file for each printing session. From there, these files could be sent to an appropriate printer. This would give the user rather more control over printing, allowing them to process the printout with Unix tools, or to select one particular physical printer out of the many potentially available in an organisation. In the VIEW environment, and in the MOS environment generally, there is no built-in list of printers or printer selection dialogue.
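
On the host side, the spooling could be as simple as the following sketch, with the spool location and the per-session naming scheme being assumptions made for illustration:

    import os, time

    SPOOL_DIR = os.path.expanduser("~/spool")   # hypothetical location

    def start_print_session():
        # One spool file per printing session, named by timestamp.
        os.makedirs(SPOOL_DIR, exist_ok=True)
        name = time.strftime("print-%Y%m%d-%H%M%S.dat")
        return open(os.path.join(SPOOL_DIR, name), "ab")

    def handle_print_data(spool, payload):
        # Append a chunk of buffered printer output from the Electron.
        spool.write(payload)
        spool.flush()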

Since the kinds of printers anticipated for use with VIEW might well have been rather different from the kinds connected to multi-user systems, it is likely that some processing would be desirable where different text styles and fonts have been employed. Today, projects like PrinterToPDF exist to work with old-style printouts, but it is conceivable that either the “printer driver generator” in the View suite or some postprocessing tool might have been used to produce directly printable output. With unstyled text, however, the printouts are generally readable and usable, as the following excerpt illustrates.

               A  brief report on the experience
               of using VIEW as a word processor
               four decades on.

Using VIEW on the Acorn  Electron  is  an  interesting  experience  and  a
glimpse  into  the  way  word  processing  was  once done. Although I am a
dedicated user of Vim, I am under no  illusions  of  that  program's  word
processing  capabilities: it is deliberately a screen editor based on line
editor  heritage,  and  much  of  its  operations  are  line-oriented.  In
contrast, VIEW is intended to provide printed output: it presents the user
with a  ruler  showing  the  page margins and tab stops, and it even saves
additional   rulers   into  the  stored  document   in   their   on-screen
representations. Together with its default typewriter-style  behaviour  of
allowing  the  cursor  to  be moved into empty space and of overwriting or
replacing text, there is a quaint feel to it.

Since VIEW is purely text-based, I can easily imagine converting its formatting codes to work with troff. That would then broaden the output options. Interestingly, the Advanced User Guide was written in VIEW and then sent to a company for typesetting, so perhaps a workflow like this would have been useful for the authors back then.
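
Purely as a thought experiment, such a converter might look like the sketch below. The HIGHLIGHT_1 and HIGHLIGHT_2 values are hypothetical placeholders rather than VIEW’s actual stored codes, and a real converter would also have to deal with embedded rulers and stored commands.

    # HIGHLIGHT_1 and HIGHLIGHT_2 are hypothetical placeholder values,
    # not VIEW's actual highlight codes.
    HIGHLIGHT_1 = 0x80
    HIGHLIGHT_2 = 0x81

    def to_troff(data: bytes) -> str:
        out, h1, h2 = [], False, False
        for b in data:
            if b == HIGHLIGHT_1:
                h1 = not h1
                out.append(r"\fI" if h1 else r"\fP")  # italic as a stand-in
            elif b == HIGHLIGHT_2:
                h2 = not h2
                out.append(r"\fB" if h2 else r"\fP")  # bold as a stand-in
            else:
                out.append(chr(b))
        return "".join(out)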

A major selling point of the Electron was its provision of BBC BASIC as the built-in language. As the BBC Micro had started to become relatively widely adopted in schools across the United Kingdom, a less expensive computer offering this particular dialect of BASIC was attractive to purchasers looking for compatibility with school computers at home. Obviously, there is a need to be able to load and save BASIC programs, and this can be done using the serial connection.

Loading a BASIC program from the Unix host.

Beyond straightforward operations like these, BASIC also provides random-access file operations through various keywords and constructs, utilising the underlying operating system interfaces that invoke filing system operations to perform such work. VIEW also appears to use these operations, so it seems sensible not to ignore them, even if many programmers might have preferred to use bulk transfer operations – the standard load and save – to get data in and out of memory quickly.
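
The interfaces concerned are those that BASIC’s BGET#, BPUT# and PTR# constructs ultimately reach. On the host, these byte-level operations map quite directly onto ordinary file handles, as this sketch suggests, with the function names being merely illustrative:

    handles = {}
    next_handle = 1

    def op_open(path, mode="r+b"):       # cf. OSFIND, BASIC's OPENUP
        global next_handle
        handles[next_handle] = open(path, mode)
        next_handle += 1
        return next_handle - 1

    def op_bget(h):                      # cf. OSBGET, BASIC's BGET#
        b = handles[h].read(1)
        return b[0] if b else -1         # -1 signals end of file here

    def op_bput(h, value):               # cf. OSBPUT, BASIC's BPUT#
        handles[h].write(bytes([value]))

    def op_ptr(h, position=None):        # cf. OSARGS, BASIC's PTR#
        if position is not None:
            handles[h].seek(position)
        return handles[h].tell()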

A BASIC program reading and showing a file.

Interactions between printing, the operating system’s own spooling support, outputting characters and reading and writing data are tricky. A degree of experimentation was required to make these things work together. In principle, it should be possible to print and spool at the same time, even with output generated by the remote host that has been sent over the serial line for display on the Electron!

Of course, as a hybrid terminal, the exercise would not be complete without terminal functionality. Here, I wanted to avoid going down another rabbit hole and implementing a full terminal emulator, but I still wanted to demonstrate the invocation of a shell on the Unix host and the ability to run commands. To show just another shell session transcript would be rather dull, so here I present the perusal of a Python program to generate control codes that change the text colour on the Electron, along with the program’s effects:
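
A minimal sketch along the lines of that program, assuming the standard text palette (1 being red, 2 green, 3 yellow, 4 blue and 7 white) and an output path that passes control characters through unmodified, might be:

    import sys

    def colour(n):
        # VDU 17,n selects text colour n on the Electron.
        return bytes([17, n])

    out = sys.stdout.buffer
    for n, word in enumerate(["red", "green", "yellow", "blue"], start=1):
        out.write(colour(n) + word.encode("ascii") + b"\r\n")
    out.write(colour(7))  # back to white text
    out.flush()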

Interaction with the shell featuring multiple text colours.

As a bitmapped terminal, the Electron is capable of much more than this. Although limited to moderate resolutions by the standards of the fanciest graphics terminals even of that era, there are interesting possibilities for Unix programs and scripts to generate graphical output.

A chart generated by a Python program showing workstation performance results.

Sending arbitrary character codes requires a bit of terminal configuration magic so that line feeds do not get translated into other things (the termios manual page is helpful, here, suggesting the ONLCR flag as the culprit), but the challenge, as always, is to discover the piece of the stack of technologies that is working against you. Similar things can be said on the Electron as well, with its own awkward confluence of character codes for output and output control, requiring the character output state to be tracked so that certain values do not get misinterpreted in the wrong context.
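
In Python terms, the specific adjustment might be sketched as follows, clearing ONLCR in the output modes while leaving everything else alone:

    import sys, termios

    fd = sys.stdout.fileno()
    attrs = termios.tcgetattr(fd)
    attrs[1] &= ~termios.ONLCR   # index 1 holds the output mode flags
    termios.tcsetattr(fd, termios.TCSADRAIN, attrs)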

Others have investigated terminal connectivity on Acorn’s 8-bit microcomputers and demonstrated other interesting ways of producing graphical output from Unix programs. Acornsoft’s Termulator could even emulate a Tektronix 4010 graphical terminal. Curiously, Termulator also supported file transfer between a BBC Micro and the host machine, although only as a dedicated mode and limited to ASCII-only text files, leaving the hybrid terminal concept unexplored.

Reflections and Remarks

I embarked on this exercise with some cautiousness, knowing that plenty of uncertainties lay ahead in implementing a functional piece of software, and there were plenty of frustrating moments as some of the different elements of the rather underdocumented software stack conspired to produce undesirable behaviour. In addition, the behaviour of my serial emulation code had a confounding influence, requiring some low-level debugging (tracing execution within the emulator instruction by instruction, noting the state of the emulated CPU), some slowly dawning realisations, and some adjustments to hopefully make it work in a more cooperative fashion.

There are several areas of potential improvement. I first programmed in 6502 assembly language maybe thirty-five years ago, and although I managed to get some sprite and scrolling routines working, I never wrote any large programs, nor had to interact with the operating system frameworks. I personally find the 6502 primitive, rigid, and not particularly conducive to higher-level programming techniques, and I found myself writing some macros to take away the tedium of shuffling values between registers and the stack, constantly aware of various pitfalls with regard to corrupting registers.

My routines extending the operating system framework possibly do not do things the right way or may misunderstand some details. That, I will blame on the vague documentation as well as any mistakes made micromanaging the registers. Particularly frustrating was the way that my ROM code would be called with interrupts disabled in certain cases. This made implementation challenging when my routines needed to communicate over the serial connection, since such communication itself requires interrupts to be enabled. Quite what the intention of the MOS designers was in such circumstances remains something of a mystery. While writing this article, I realised that I could have implemented the printing functionality in a different way, and this might have simplified things, right up to the point where I saw, thanks to the debugger provided by Elkulator, that the routines involved are called – surprise! – with interrupts disabled.

Performance could be a lot better, with this partly due to my own code undoubtedly requiring optimisation. The existing software stack is probably optimised to a reasonable extent, but there are various persistent background activities that probably steal CPU cycles unnecessarily. One unfortunate contributor to performance limitations is the hardware architecture of the Electron. Indeed, I discovered while testing in one of the 80-column display modes that serial transfers were not reliable at the default transfer rate of 9600 baud, instead needing to be slowed down to only 2400 baud. Some diagnosis confirmed that the software was not reading the data from the serial chip quickly enough, causing an overflow condition and data being lost.

Motivated by cost reduction and product positioning considerations – the desire to avoid introducing a product that might negatively affect BBC Micro sales – the Electron was deliberately designed to use a narrow data bus connected to fewer RAM chips than would otherwise have been used, with a seemingly clever technique being employed to allow the video circuitry to get the data at the desired rate to produce a high-resolution or high-bandwidth display. Unfortunately, the adoption of the narrow data bus, facilitated by this particular technique, meant that the CPU could only ever access RAM at half its rated speed. And with the narrow data bus, the video circuitry effectively halts the CPU altogether for a substantial portion of its time in high-bandwidth display modes. Since serial communications handling relies on the delivery and handling of interrupts, if the CPU is effectively blocked from responding quickly enough, it can quickly fall behind if the data is arriving and the interrupts are occurring too often.
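
Some rough arithmetic, assuming the usual asynchronous framing of one start bit, eight data bits and one stop bit, illustrates the margins involved:

    10 bits per transmitted byte (1 start + 8 data + 1 stop)
    9600 baud: 9600 / 10 = 960 bytes/second, or one byte every ~1.04 ms
    2400 baud: 2400 / 10 = 240 bytes/second, or one byte every ~4.17 ms

With only shallow buffering in the serial chip’s receiver, the interrupt handler has around a millisecond to collect each byte at 9600 baud, which is not much when the CPU is being halted for much of the time in a high-bandwidth display mode and accessing RAM at half speed the rest of the time.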

That does raise the issue of reliability and of error correction techniques. Admittedly, this work relies on a reliable connection between the emulated Electron and the host. Some measures are taken to improve the robustness of the communication when messages are interrupted so that the host in particular is not left trying to send or receive large volumes of data that are no longer welcome or available, and other measures are taken to prevent misinterpretation of stray data received in a different and thus inappropriate context. I imagine that I may have reinvented the wheel badly here, but these frustrations did provide a level of appreciation of the challenges involved.

Some Broader Thoughts

It is possible that Acorn, having engineered the Electron too aggressively for cost, made the machine less than ideal for the broader range of applications for which it was envisaged. That said, it should have been possible to revise the design and produce a more performant machine. Experiments suggest that a wider data path to RAM would have helped with the general performance of the Electron, but to avoid most of the interrupt handling problems experienced with the kind of application being demonstrated here, the video system would have needed to employ its existing “clever” memory access technique in conjunction with that wider data path so as to be able to share the bandwidth more readily with the CPU.

Contingency plans should have been made to change or upgrade the machine, if that had eventually been deemed necessary, starting at the point in time when the original design compromises were introduced. Such flexibility and forethought would also have made a product with a longer appeal to potential purchasers, as opposed to a product that risked being commercially viable for only a limited period of time. However, it seems that the lessons accompanying such reflections on strategy and product design were rarely learned by Acorn. If lessons were learned, they appear to have reinforced a particular mindset and design culture.

Virtue is often made of the Acorn design philosophy and the sometimes rudely expressed and dismissive views of competing technologies that led the company to develop the ARM processor. This approach enabled comparatively fast and low-cost systems to be delivered by introducing a powerful CPU to do everything in a system from running applications to servicing interrupts for data transfers, striving for maximal utilisation of the available memory bandwidth by keeping the CPU busy. That formula worked well enough at the low end of the market, but when the company tried to move upmarket once again, its products were unable to compete with those of other companies. Ultimately, this sealed the company’s fate, even if more fortuitous developments occurred to keep ARM in the running.

(In the chart shown earlier demonstrating graphical terminal output and illustrating workstation performance, circa 1990, Acorn’s R260 workstation is depicted as almost looking competitive until one learns that the other workstations depicted arrived a year earlier and that the red bar showing floating-point performance only applies to Acorn’s machine three years after its launch. It would not be flattering to show the competitors at that point in history, nor would it necessarily be flattering to compare whole-system performance, either, if any publication sufficiently interested in such figures had bothered to do so. There is probably an interesting story to be told about these topics, particularly how Acorn’s floating-point hardware arrived so late, but I doubt that there is the same willingness to tell it as there is to re-tell the usual celebratory story of ARM for the nth time.)

Acorn went on to make the Communicator as a computer that would operate in a kind of network computing environment, relying on network file servers to provide persistent storage. It reused some of the technology in the Electron and the BT Merlin M2105, particularly the same display generator and its narrow data bus to RAM, but ostensibly confining that aspect of the Electron’s architecture to a specialised role, and providing other facilities for applications and, as in the M2105, for interaction with peripherals. Sadly, the group responsible in Acorn had already been marginalised and eventually departed, apparently looking to pursue the concept elsewhere.

As for this particular application of an old computer and a product that was largely left uncontemplated, I think there probably was some mileage in deploying microcomputers in this way. This would have been true even outside companies like Acorn, where such computers were being developed and used, and software development companies with their own sophisticated needs, in both of which minicomputers like the DEC VAX would have been available for certain corporate or technical functions. Public (or semi-public) access terminals were fairly common in universities, and later microcomputers were also adopted in academia due to their low cost and apparently sufficient capabilities.

Although such adoption appears to have focused on terminal applications, it cannot have been beyond the wit of those involved to consider closer integration between the microcomputing and multi-user environments. In further and higher education, students will have had microcomputing experience and would have been able to leverage their existing skills whilst learning new ones. They might have brought their microcomputers along with them, giving them the opportunity to transfer or migrate their existing content – their notes, essays, programs – to the bright and emerging new world of Unix, as well as updating their expertise.

As for updating my own expertise, it has been an enlightening experience in some ways, and I may well continue to augment the implemented functionality, fix and improve things, and investigate the possibilities this work brings. I hope that this rather lengthy presentation of the effort has provided insights into experiences of the past that was and the past that might have been.

Considering Unexplored Products of the Past: Emulating an Expansion

Wednesday, February 8th, 2023

In the last couple of years, possibly in common with quite a few other people, certainly people of my vintage, and undoubtedly those also interested in retrocomputing, I have found myself revisiting certain aspects of my technological past. Fortunately, sites like the Internet Archive make this very easy indeed, allowing us to dive into publications from earlier eras and to dredge up familiar and not so familiar magazine titles and other documentation. And having pursued my retrocomputing interest for a while, participating in forums, watching online videos, even contributing to new software and hardware developments, I have found myself wanting to review some of the beliefs and perceptions that I and other people have had of the companies and products we grew up with.

One of the products of personal interest to me is the computer that got me and my brother started with writing programs (as well as playing games): the Acorn Electron, a product of Acorn Computers of Cambridge in the United Kingdom. Much can be said about the perceived chronology of this product’s development and introduction, the actual chronology, and its impact on its originator and on wider society, but that surely deserves a separate treatment. What I can say is that reviewing the archives and other knowledge available to us now can give a deeper understanding of the processes involved in the development of the Electron, the technological compromises made, and the corporate strategy that led to its creation and eventually its discontinuation.

The Acorn Electron
(Picture attribution: Bilby, own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=10957142)

It has been popular to tell simplistic narratives about Acorn Computers, to reduce its history to a few choice moments as the originator of the BBC Microcomputer and the ARM processor, but to do so is to neglect a richer and far more interesting story, even if the fallibility of some of the heroic and generally successful characters involved may be exposed by telling some of that story. And for those who wonder how differently some aspects of computing history might have turned out, exploring that story and the products involved can be an adventure in itself, filling in the gaps of our prior experiences with new insights, realisations and maybe even glimpses into opportunities missed and what might have been if things had played out differently.

At the Rabbit Hole

Reading about computing history is one thing, but this tale is about actually doing things with old software, emulation, and writing new software. It started off with a discussion about the keyboard shortcuts for a word processor and the differences between the keyboards on the Acorn Electron and its higher-specification predecessor, the BBC Microcomputer. Having acquainted myself with the circuitry of the Electron, how its keyboard is wired up, and how the software accesses it, I was obviously intrigued by these apparent differences, but I was also intrigued by the operation of the word processor in question, Acornsoft’s VIEW.

Back in the day, as people like to refer to the time when these products were first made available, such office or productivity applications were just beyond my experience. Although it was slightly fascinating to read about them, most of my productive time was spent writing programs, mostly trying to write games. I had actually seen an office suite written by Psion on the ACT Sirius 1 in the early 1980s, but word processors were the kind of thing used in offices or, at the very least, by people who had a printer so that they could print the inevitable letters that everyone would be needing to write.

Firing up an Acorn Electron emulator, specifically Elkulator, I discovered that one of the participants in the discussion was describing keyboard shortcuts that didn’t match those described in a magazine article from the era, the latter appearing correct as I tried them out for myself. It turned out that the discussion participant in question was using the BBC Micro version of VIEW on the Electron and was working around the mismatch in keyboard layouts. Although all of this was much ado about virtually nothing, it did two things. Firstly, it made me finally go in and fix Elkulator’s keyboard configuration dialogue, and secondly, it made me wonder how convenient it would be to explore old software in a productive way in an emulator.

Reconciling Keyboards

Having moved to Norway many years ago now, I use a Norwegian keyboard layout, and this has previously been slightly problematic when using emulators for older machines. Many years ago, I used and even contributed some minor things to another emulator, ElectrEm, which had a nice keyboard configuration dialogue. The Electron’s keyboard corresponds to certain modern keyboards pretty well, at least as far as the alphanumeric keys are concerned. More challenging are the symbols and control-related keys, in particular the Electron’s special Caps Lock/Function key which sits where many people now have their Tab key.

Obviously, there is a need to be able to tell an emulator which keys on a modern keyboard are going to correspond to the keys on the emulated machine. Being derived from an emulator for the BBC Micro, however, Elkulator’s keyboard configuration dialogue merely presented a BBC Micro keyboard on the screen and required the user to guess which “Beeb” key might correspond to an Electron one. Having put up with this situation for some time, I finally decided to fix this once and for all. The process of doing so is not particularly interesting, so I will spare you the details of doing things with the Allegro toolkit and the Elkulator source code, but I was mildly pleased with the result:

The revised keyboard configuration dialogue in Elkulator.

By adding support for redefining the Break key in a sensible way, I was also finally able to choose a key that desktop environments don’t want to interfere with: F12 might work for Break, but Ctrl-F12 makes KDE/Plasma do something I don’t want, and yet Ctrl-Break is quite an important key combination when using an Electron or BBC Micro. Why Break isn’t a normal key on these machines is another story in itself, but here is an example of redefining it and even allowing multiple keys on a modern keyboard to act as Break on the emulated computer:

Redefining the Break key in Elkulator.

Being able to confidently choose and use keys made it possible to try out VIEW in a more natural way. But this then led to another issue: how might I experiment with such software productively? It would be good to write documents and to be able to extract them from the emulator, rather than see them disappear when the emulator is closed.

Real and Virtual Machines

One way to get text out of a system, whether it is a virtual system like the emulated Electron or a real machine, is to print it. I vaguely remembered some support for printing from Elkulator and was reminded by my brother that he had implemented such support himself a while ago as a quick way of getting data out of the emulated system. But I also wanted to be able to get data into the emulated system as well, and the parallel interface typically used by the printer is not bidirectional on the Electron. So, I would need to look further for a solution.

It is actually the case that Elkulator supports reading from and writing to disk (or disc) images. The unexpanded Electron supports read/write access to cassettes (or tapes), but Elkulator does not support writing to tapes, probably because the usability considerations are rather complicated: one would need to allow the user to control the current position on a tape, and all this would do is remind everyone how inconvenient tapes are. Meanwhile, writing to disk images would be fairly convenient within the emulator, but then one would need to use tools to access the files within the images outside the emulator.

Some emulators for various systems also support the notion of a host filesystem (or filing system) where some special support has been added to make the emulated machine see another peripheral and to communicate with it, this peripheral really being a program on the host machine (the machine that is running the emulator). I could have just written such support, although it would also have needed some software support written for the emulated machine as well, but this approach would have led me down a path of doing something specific to emulation. And I have a principle of sorts which is that if I am going to change the way an emulated machine behaves, it has to be rooted in some kind of reality and not just enhance the emulated machine in a way that the original, “real” machine could not have been.

Building on Old Foundations

As noted earlier, I have an interest in the way that old products were conceived and the roles for which those products were intended by their originators. The Electron was largely sold as an unexpanded product, offering only power, display and cassette ports, with a general-purpose expansion connector being the gateway to anything else that might have been added to the system later. This was perceived somewhat negatively when the machine was launched because it was anticipated that buyers would probably, at the very least, want to plug joysticks into the Electron to play games. Instead, Acorn offered an expansion unit, the Plus 1, costing another £60, which provided joystick, printer and cartridge connectors.

But this flexibility in expanding the machine meant that it could have been used as the basis for a fairly diverse range of specialised products. In fact, one of the Acorn founders, Chris Curry, enthused about the Electron as a platform for such products, and one such product did actually make it to market, in a way: the BT Merlin M2105 messaging terminal. This terminal combined the Electron with an expansion unit containing circuitry for communicating over a telephone line, a generic serial communications port, a printer port, as well as speech synthesis circuitry and a substantial amount of read-only memory (ROM) for communications software.

Back in the mid-1980s, telecommunications (or “telecoms”) was the next big thing, and enthusiasm for getting a modem and dialling up some “online” service or other (like Prestel) was prevalent in the computing press. For businesses and institutions, there were some good arguments for adopting such technologies, but for individuals the supposed benefits were rather dulled by the considerable costs of acquiring the hardware, buying subscriptions, and the notoriously high telephone call rates of the era. Only the relatively wealthy or the dedicated few pursued this side of data communications.

The M2105 reportedly did some service in the healthcare sector before being repositioned for commercial applications. Along with its successor product, the Acorn Communicator, it enjoyed a somewhat longer lifespan in certain enterprises. For the standard Electron and its accompanying expansions, support for basic communications capabilities was evidently considered important enough to be incorporated into the software of the Plus 1 expansion unit, even though the Plus 1 did not provide any of the specific hardware capabilities for communication over a serial link or a telephone line.

It was this apparently superfluous software capability that I revisited when I started to think about getting files in and out of the emulator. When emulating an Electron with Plus 1, this serial-capable software is run by the emulator, just as it is by a real Electron. On a real system of this kind, a cartridge could be added that provides a serial port and the necessary accompanying circuitry, and the system would be able to drive that hardware. Indeed, such cartridges were produced decades ago. So, if I could replicate the functionality of a cartridge within the emulator, making some code that pretends to be a serial communications chip (or UART) that has been interfaced to the Electron, then I would in principle be able to set up a virtual serial connection between the emulated Electron and my modern host computer.

Emulated Expansions

Modifying Elkulator to add support for serial communications hardware was fairly straightforward, with only a few complications. Expansion hardware on the Electron is generally accessible via a range of memory addresses that actually signal peripherals as opposed to reading and writing memory. The software provided by the Plus 1 expansion unit is written to expect the serial chip to be accessible via a range of memory locations, with the serial chip accepting values sent to those locations and producing values from those locations on request. The “memory map” through which the chip is exposed in the Electron corresponds directly to the locations or registers in the serial chip – the SCN2681 dual asynchronous receiver/transmitter (DUART) – as described by its datasheet.

In principle, all that is needed is to replicate the functionality described by the datasheet. With this done, the software will drive the chip, the emulated chip will do what is needed, and the illusion will be complete. In practice, a certain level of experimentation is needed to fill in the gaps left by the datasheet and any lack of understanding on the part of the implementer. It did help that the Plus 1 software has been disassembled – some kind of source code regenerated from the binary – so that the details of its operation and its expectations of the serial chip’s operation can be established.
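
Elkulator itself is written in C, but the shape of the illusion can be sketched briefly in Python: a register file indexed by address offset, with reads and writes dispatching to behaviour. The offsets and status bits below follow my reading of the SCN2681 datasheet for channel A, with most registers left unimplemented:

    class DUART:
        RXRDY, TXRDY = 0x01, 0x04       # status register bits

        def __init__(self):
            self.rx_fifo = []           # bytes waiting for the Electron
            self.tx_out = []            # bytes sent by the Electron

        def read(self, offset):
            if offset == 1:             # SRA: report readiness
                status = self.TXRDY     # transmitter always ready here
                if self.rx_fifo:
                    status |= self.RXRDY
                return status
            if offset == 3:             # RHRA: take a received byte
                return self.rx_fifo.pop(0) if self.rx_fifo else 0
            return 0                    # other registers unimplemented

        def write(self, offset, value):
            if offset == 3:             # THRA: transmit a byte
                self.tx_out.append(value)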

Moreover, it is possible to save a bit of effort by seeing which features of the chip have been left unused. However, some unused features can be provided with barely any extra effort: the software only drives one serial port, but the chip supports two in largely the same way, so we can keep support for two just in case there is a need in future for such capabilities. Maybe someone might make a real serial cartridge with two ports and want to adapt the existing software, and they could at least test that software under emulation before moving to real hardware.

It has to be mentioned that the Electron’s operating system, known as the Machine Operating System or MOS, is effectively extended by the software provided in the Plus 1 unit. Even the unexpanded machine provides the foundations for adding serial communications and printing capabilities in different ways, and the Plus 1 software merely plugs into that framework. A different kind of serial chip would be driven by different software but it would plug into the same framework. At no point does anyone have to replace the MOS with a patched version, which seems to be the kind of thing that happens with some microcomputers from the same era.

Ultimately, what all of this means is that having implemented the emulated serial hardware, useful things can already be done with it within the bare computing environment provided by the MOS. One can set the output stream to use the serial port and have all the text produced by the system and programs sent over the serial connection. One can select the serial port for the input stream and send text to the computer instead of using the keyboard. And printing over the serial connection is also possible by selecting the appropriate printer type using a built-in system command.

In Elkulator, I chose to expose the serial port via a socket connection, with the emulator binding to a Unix domain socket on start-up. I then wrote a simple Python program to monitor the socket, to show any data being sent from the emulator and to send any input from the terminal to the emulator. This permitted the emulated machine to be operated from a kind of remote console and for the emulated machine to be able to print to this console. At last, remote logins are possible on the Electron! Of course, such connectivity was contemplated and incorporated from the earliest days of these products.
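
The monitoring program is not reproduced here, but a minimal sketch of the kind of thing involved, with a hypothetical socket path, might be:

    import select, socket, sys

    SOCKET_PATH = "/tmp/elkulator-serial"   # hypothetical

    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(SOCKET_PATH)
    while True:
        ready, _, _ = select.select([s, sys.stdin], [], [])
        if s in ready:
            data = s.recv(1024)             # output from the emulator
            if not data:
                break
            sys.stdout.buffer.write(data)
            sys.stdout.flush()
        if sys.stdin in ready:
            line = sys.stdin.readline()     # input typed at the console
            if not line:
                break
            s.sendall(line.encode("ascii", "replace"))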

Filing Options

If the goal of all of this had been to facilitate transfers to and from the emulated machine, this might have been enough, but a simple serial connection is not especially convenient to use. Although a method of squirting a file into the serial link at the Electron could be made convenient for the host computer, at the other end one has to have a program to do something with that file. And once the data has arrived, would it not be most convenient to be able to save that data as a file? We just end up right back where we started: having some data inside the Electron and nowhere to put it! Of course, we could enable disk emulation and store a file on a virtual disk, but then it might just have been easier to make disk image handling outside the emulator more convenient instead.

It seemed to me that the most elegant solution would be to make the serial link act as the means through which the Electron accesses files: instead of doing ad-hoc transfers of data, such data would be transferred as part of operations that deliberately access files. Such ambitions are not unrealistic, and here I could draw on my experience with the platform, having acquired the Acorn Electron Advanced User Guide many, many years ago, in which there are details of implementing filing system ROMs. Again, the operating system had been designed to be extended in order to cover future needs, and this was one of them.

In fact, I had not been the only one to consider a serial filing system, and I had been somewhat aware of another project to make software available via a serial link to the BBC Micro. That project had been motivated by the desire to be able to get software onto that computer where no storage devices were otherwise available, even performing some ingenious tricks to transfer the filing system software to the machine and to have that software operate from RAM. It might have been tempting merely to use this existing software with my emulated serial port, to get it working, and then to get back to trying out applications, loading and saving, and to consider my work done. But I had other ideas in mind…

Pessimistic perspectives on technological sustainability

Tuesday, August 16th, 2022

I was recently perusing the Retro Computing Forum when I stumbled across a mention of Collapse OS. If your anxiety levels have not already been maxed out during the last few years of climate breakdown, psychological warfare, pandemic, and actual warmongering, accompanied by supply chain breakdowns, initially in technology and exacerbated by overconsumption and spivcoin, now also in commodities and exacerbated by many of those other factors (particularly the warmongering), then perhaps focusing on societal and civilisational collapse isn’t going to improve your mood or your outlook. Unusually, then, after my last, rather negative post on such topics, may I be the one to introduce some constructive input and perhaps even some slight optimism?

If I understand the motivations behind Collapse OS correctly, it is meant to provide a modest computing environment that can work on well-understood, commonplace, easily repaired and readily sourced hardware, with the software providing the environment itself being maintainable on the target hardware, as opposed to being cross-built on more powerful hardware and then deployed to simpler, less capable hardware. The envisaged scenario for its adoption is a world where powerful new hardware is no longer produced or readily available and where people must scavenge and “make do” with the hardware already produced. Although civilisation may have brought about its own collapse, the consolation is that so much hardware will have been strewn across the planet for a variety of purposes that even after semiconductor fabrication and sophisticated manufacturing have ceased, there will remain a bounty of hardware usable for people’s computational needs (whatever they may be).

I am not one to try and predict the future, and I don’t really want to imagine it as being along the same lines as the plot for one of Kevin Costner’s less successful movies, either, but I feel that Collapse OS and its peers, in considering various dystopian scenarios and strategies to mitigate their impacts, may actually offer more than just a hopefully sufficient kind of preparedness for a depressing future. In that future, without super-fast Internet, dopamine-fired social media, lifelike gaming, and streaming video services with huge catalogues of content available on demand, everyone has to accept that far less technology will be available to them: they get no choice in the matter. Investigating how they might manage is at the very least an interesting thought experiment. But we would be foolish to consider such matters as purely part of a possible future and not instructive in other ways.

An Overlap of Interests

As readers of my previous articles will be aware, I have something of an interest in older computers, open source hardware, and sustainable computing. Older computers lend themselves to analysis and enhancement even by individuals with modest capabilities and tools because they employ technologies that may have been regarded as “miniaturised” when they were new but that remain amenable to manual assembly and repair. Similarly, open source hardware has grown into a broad phenomenon because the means to make computing systems and accessories has now become more accessible to individuals, as opposed to being the preserve of large and well-resourced businesses. Where these activities experience challenges, it is typically in the areas that have not yet become quite as democratised, such as semiconductor fabrication at the large-scale integration level, along with the development and manufacture of more advanced technology, such as components and devices that would be competitive with off-the-shelf commercial products.

Some of the angst around open source hardware concerns the lack of investment it receives from those who would benefit from it, but much of that investment would largely be concerned with establishing an ability to maintain some kind of parity with modern, proprietary hardware. Ignoring such performance-led requirements and focusing on simpler hardware projects, as many people already do, brings us a lot closer to retrocomputing and a lot closer to the constrained hardware scenario envisaged by Collapse OS. My own experiments with PIC32-based microcontrollers are not too far removed from this, and it would not be inconceivable to run a simple environment in the 64K of RAM and 256K of flash memory of the PIC32MX270, this being much more generous than many microcomputers and games consoles of the 1980s.

Although I relied on cross-compilation to build the programs that would run on the minimal hardware of the PIC32 microcontroller, Collapse OS emphasises self-hosting: that it is possible to build the software within the running software itself. After all, how sustainable would a frugal computing environment be if it needed a much more powerful development system to fix and improve it? For Collapse OS, such self-hosting is enabled by the use of the Forth programming language, as explained by the rationale for switching to Forth from a system implemented in assembly language. Such use of Forth is not particularly unusual: its frugal demands were prized in the microcomputer era and earlier, with its creator Charles Moore describing the characteristics of a computer designed to run Forth as needing around 8K of RAM and 8K of ROM, this providing a complete interactive system.
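
To give a flavour of why Forth can be so frugal, here is a toy sketch – in Python rather than in Forth itself, and in no way representative of a real Forth implementation – of the structure at the heart of such systems: a stack, a dictionary of “words”, and a simple interpreter loop.

    stack = []

    # A dictionary of "words": real Forth systems elaborate on precisely
    # this arrangement, with new words being defined in terms of old ones,
    # which is why a complete interactive system can fit in a few kilobytes.
    words = {
        "+": lambda: stack.append(stack.pop() + stack.pop()),
        "*": lambda: stack.append(stack.pop() * stack.pop()),
        ".": lambda: print(stack.pop()),
    }

    def interpret(line):
        for token in line.split():
            if token in words:
                words[token]()            # execute a known word
            else:
                stack.append(int(token))  # otherwise, push a number

    interpret("2 3 4 * + .")   # prints 14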

(If you are interested in self-hosting and bootstrapping, one place to start might be the bootstrapping wiki.)

For a short while, Forth was perhaps even thought to be the hot new thing in some circles within computing. One fairly famous example was the Jupiter Ace microcomputer, developed by former Sinclair Research designers, offering a machine that followed on fairly closely from Sinclair’s rudimentary ZX81. But in the high-minded way one might have expected from the Sinclair stable and the Cambridge scene, it offered Forth as its built-in language in response to all the other microcomputers offering “unstructured” BASIC dialects. Worthy as such goals might have been, the introduction of a machine with outdated hardware specifications condemned it in its target market as a home computer: it offered primitive black-and-white display output against competitors with multi-colour graphics, and it shipped with limited amounts of memory as competitors launched with far more fitted as standard. Interestingly, the Z80 processor at the heart of the Ace was the primary target of Collapse OS, and one might wonder whether the latter might actually be portable to the former, which would be an interesting project if any hardware collector wants to give it a try!

Other Forth-based computers were delivered, such as the Canon Cat: an unusual “information appliance” that might have formed the basis of Apple’s Macintosh had that project not been diverted towards following up on the Apple Lisa. Dedicated Forth processors were even delivered, as Moore had already anticipated back in 1980, reminiscent of the Lisp machine era. However, one hardware-related legacy of Forth is that of the Open Firmware standard, where a Forth environment provides an interactive command-line interface to a system’s bootloader. Collapse OS fits in pretty well with that kind of application of Forth. Curiously, someone did contact me when I first wrote about my PIC32 experiments, this person maintaining their own microcontroller Forth implementation, and in the context of this article I have re-established contact because I never managed to properly follow up on the matter.

Changing the Context

According to a broad interpretation of the Collapse OS hardware criteria, the PIC32MX270 would actually not be a bad choice. Like the AVR microcontrollers and the microprocessors of the 1980s, PIC32MX microcontrollers are available in convenient dual in-line packages, but unlike those older microprocessors they also offer the 32-bit MIPS architecture that is nicer to program than the awkward instruction sets of the likes of the Z80 and 6502, no matter how much nostalgia colours people’s preferences. However, instead of focusing on hardware suitability in a resource-constrained future, I want to consider the messages of simplicity and sustainability that underpin the Collapse OS initiative and might be relevant to the way we practise computing today.

When getting a PIC32 microcontroller to produce a video signal, part of the motivation was just to see how straightforward it might be to make a simple “single chip” microcomputer. As with many microcomputers back in the 1980s, it became tempting to consider how it might be used to deliver graphical demonstrations and games, but I also wondered what kind of role such a system might have in today’s world. Similar projects, including the first versions of the Maximite, have emphasised such things as well, along with interfacing and educational applications (such as learning BASIC). Indeed, many low-end microcontroller-based computers attempt to recreate and to emphasise the sparse interfaces of 1980s microcomputers as a distraction-free experience for learning and teaching.

Eliminating distractions is a worthy goal, whether those distractions are things that we can conveniently seek out when our attention wanders, such as all our favourite, readily accessible Internet content, or whether they come in the form of the notifications that plague “modern” user interfaces. Another worthy goal is simply reducing the level of consumption involved in our computational activities: civilisational collapse would certainly impose severe limits on that kind of consumption, but it would seem foolish to acknowledge that and then continue on the same path of ever-increasing consumption that also increasingly fails to deliver significant improvements in the user experience. When desktop applications, mobile “apps”, and Web sites frequently offer sluggish and yet overly-simplistic interfaces that are more infuriating than anything else, it might be wise to audit our progress and reconsider how we do certain things.

Human nature has us constantly exploring the boundaries of what is possible with technology, but some things which captivate people at any given point on the journey of technological progress may turn out to be distracting diversions from the route ultimately taken. In my trawl of microcomputing history over the last couple of years, I was reminded of an absurd but illustrative example of how certain technological exercises seem to become the all-consuming focus of several developers, almost being developed for the sake of it, before the fad in question flames out and everybody moves on. That example concerned “morphing” software, inspired by visual effects from movies such as Terminator 2, but operating on a simpler, less convincing level.

Suddenly, such effects were all over the television and for a few months in late 1993, everyone was supposedly interested in making the likeness of one famous person slowly change into the likeness of another, never mind that it really required a good library of images, this being somewhat before widespread digital imaging and widespread image availability. Fast-forward a few years, and it all seemed like a crazy mass delusion best never spoken of again. We might want to review our own time’s obsessions with performative animations and effects, along with the peculiarities of touch-based interfaces, the assumption of pervasive and fast connectivity, and how these drive hardware consumption and obsolescence.

Once again, some of this comes back to asking how people managed to do things in earlier times and why things sometimes seem so complicated now. Thinking back to the 1980s era of microcomputing, my favourite 8-bit computer of those times was the Acorn Electron, this being the one I had back then, and it was certainly possible to equip it to do word processing to a certain level. Acorn even pitched an expanded version as a messaging terminal for British Telecom, although I personally think that they could have made more of such opportunities, especially given that the machine offered 80-column text capabilities at such a low price. The user experience would not exactly be appealing by today’s standards, but then nor would that of Collapse OS, either.

When I got my PIC32 experiment working reasonably, I asked myself if it would be sufficient for tasks like simple messaging and writing articles like this. The answer, assuming that I would enhance that effort to use a USB keyboard and external storage, is probably the same as whether anyone might use a Maximite for such applications: it might not be as comfortable as on a modern system but it would be possible in some way. Given the tricks I used, certain things would actually be regressions from the Electron, such as the display resolution. Conversely, the performance of a 48MHz MIPS-based processor is obviously going to be superior to a 2MHz 6502, even when having to generate the video signal, thus allowing for some potential in other areas.

Reversing Technological Escalation

Using low-specification hardware for various applications today, considering even the PIC32 as low-spec and ignoring the microcomputers of the past, would also require us to pare back the demands that such applications have managed to accumulate over the years. As more powerful, higher-performance hardware has become available, software, specifications and standards have opportunistically grown to take advantage of that extra power, leaving many people bewildered by the result: their new computer being just as slow as their old one, for example.

Standards can be particularly vulnerable where entrenched interests drive hardware consumption whilst seeking to minimise the level of adaptation their own organisations will need to undertake in order to deliver solutions based on such standards. A severely constrained computing device may not have the capacity or performance to handle all the quirks of a “full fat” standard, but it might handle an essential core of that standard, ignoring all the edge cases and special treatment for certain companies’ products. Just as important, the developers of an implementation handling a standard also may not have the capacity or tenacity for a “full fat” standard, but they may do a reasonable job handling one that cuts out all the corporate cruft.

And beyond the technology needed to perform some kind of transaction as part of an activity, we might reconsider what is necessary to actually perform that activity. Here, we may consider the more blatant case of the average “modern” Web site or endpoint, where an activity may end up escalating and involving the performance of a number of transactions, many of which are superfluous and, in the case of the pervasive cult of analytics, exploitative. What once may have been a simple Web form is often now an “experience” where the browser connects to dozens of sites, where all the scripts poll the client computer into oblivion, and where the functionality somehow doesn’t manage to work, anyway (as I recently experienced on one airline’s Web site).

Technologists and their employers may drive consumption, but so do their customers. Public institutions, utilities and other companies may lazily rely on easily procured products and services, these insisting (for “security” or “the best experience”) that only the latest devices or devices from named vendors may be used to gain access. Here, the opposite of standardisation occurs, where adherence to brand names dictates the provision of service, compounded by the upgrade treadmill familiar from desktop computing, bringing back memories of Microsoft and Intel ostensibly colluding to get people to replace their computer as often as possible.

A Broader Brush

We don’t need to go back to retrocomputing levels of technology to benefit from re-evaluating the prevalent technological habits of our era. I have previously discussed single-board computers like the MIPS Creator CI20 which, in comparison to contemporary boards from the Raspberry Pi series, was fairly competitive in terms of specification and performance (having twice the RAM of the Raspberry Pi Models A+, B and B+). Although hardly offering conventional desktop performance upon its introduction, the CI20 would have made a reasonable workstation in certain respects in earlier times: its 1GHz CPU and 1GB of RAM should certainly be plenty for many applications even now.

Sadly, starting up and using the main two desktop environments on the CI20 is an exercise in patience, and I recommend trying something like the MATE desktop environment just for something responsive. Using a Web browser like Firefox is a trial, and extensive site blocking is needed just to prevent the browser wanting to download things from all over the place, as it tries to do its bit in shoring up Google’s business model. My father was asking me the other day why a ten-year-old computer might be slow on a “modern” Web site but still perfectly adequate for watching video. I would love to hear the Firefox and Chrome developers, along with the “architects of the modern Web”, give any explanation for this that doesn’t sound like they are members of some kind of self-realisation cult.

If we can envisage a microcomputer, either a vintage one or a modern microcontroller-based one, performing useful computing activities, then we can most certainly envisage machines of ten or so years ago, even ones behind the performance curve, doing so as well. And by realising that, we might understand that we have the power to slow down the engineered obsolescence of computing hardware and to bring usable hardware back into use. Since not everyone on the planet can afford the latest and greatest, we might even put usable hardware into the hands of more people who would benefit from it.

Naturally, this perspective is rather broader than one that only considers a future of hardship and scarcity, but hardship and scarcity are part of the present, just as they have always been part of the past. Applying many of the same concerns and countermeasures to today’s situation, albeit in less extreme forms, means that we have the power to mitigate today’s situation and, if we are optimistic, perhaps steer it away from becoming the extreme situation that the Collapse OS initiative seeks to prepare for.

Concrete Steps

I have, in the past, been accused of complaining about injustices too generally for my complaints to be taken seriously, never mind such injustices being blatant and increasingly obvious in our modern societies and expressed through the many crises of our times. So how might we seek to mitigate widespread hardware obsolescence and technology-driven overconsumption? Some suggestions in a concise list for those looking for actionable things:

  • Develop, popularise and mandate lightweight formats, protocols and standards
  • Encourage interoperability and tolerance for multiple user interfaces, clients and devices
  • Insist on an unlimited “right to repair” for computing devices including the software
  • Encourage long-term thinking in software and systems development

And now for some elucidation…

Mandatory Accessible Standards

This suggestion has already been described above, but it would gain its power from the idea of obliging public institutions and businesses to support lightweight formats, protocols and standards, not simply as an implementation detail for their chosen “app”, like a REST endpoint might be, but actually as a formal mechanism providing service to those who would interact with those institutions. This would make the use of a broad range of different devices viable, and in the case of special-purpose devices for certain kinds of users, particularly those who would otherwise be handed a smartphone and told to “get with it”, it would offer a humane way of accessing services that is currently denied to them.

For simple dialogue-based interactions, existing formats such as JSON might even be sufficient as they are. I am reminded of a paper I read when putting together my degree thesis back in the 1990s, where the idea was that people would be able to run programs safely in their mail reader, with one example being that of submitting forms.

T-shirt ordering dialogues shown by Safe-Tcl running in a mail program, offering the recipient the chance to order some merchandise that might not be as popular now.

In that paper, most of the emphasis was on the safety of the execution environment as opposed to the way in which the transaction was to be encoded, but it is not implausible that one might have encoded the details of the transaction – the T-shirt size (with the recipient’s physical address presumably already being known to the sender) – in a serialised form of the programming language concerned (Safe-Tcl) as opposed to just dumping some unstructured text in the body of a mail. I would need to dig out my own thesis to see what ideas I had for serialised information. Certainly, such transactions, even embellished with other details and choices and with explanatory information, prompts and questions, do not require megabytes of HTML, CSS, JavaScript, images, videos and so on.
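
As a purely hypothetical sketch of how compact such a serialised transaction could be – the field names below are my own invention and correspond to no actual standard, Safe-Tcl-related or otherwise – consider the following:

    import json

    # A complete T-shirt order expressed as a handful of structured fields.
    # The field names are invented for illustration only.
    order = {
        "transaction": "order-tshirt",
        "size": "L",
        "quantity": 1,
    }

    message = json.dumps(order)
    print(message)       # {"transaction": "order-tshirt", "size": "L", "quantity": 1}
    print(len(message))  # a few tens of bytes, not megabytes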

Interoperability and Device Choice

One thing that the Web was supposed to liberate us from was the insistence that to perform a particular task, we needed a particular application, and that particular application was only available on a particular platform. In the early days, HTML was deliberately simplistic in its display capabilities, and people had to put up with Web pages that looked very plain until things like font tags allowed people to go wild. With different factions stretching HTML in all sorts of directions, CSS was introduced to let people apply presentation attributes to documents, supposedly without polluting or corrupting the original HTML that would remain semantically pure. We all know how this turned out, particularly once the Web 2.0 crowd got going.

Back in the 1990s, I worked on an in-house application at my employer that used a document model inspired by SGML (as HTML had been), and the graphical user interface to the documents being exchanged initially presented collections of data items using the paradigm embraced by the Macintosh’s Finder when showing directory hierarchies: what we would now call a tree view. Unfortunately, users seemed to find expanding and hiding things by clicking on small triangles to be annoying, and so alternative presentation approaches were explored. Interestingly, the original paradigm would be familiar even now to those using generic XML editor software, but many people would accept that while such low-level editing capabilities are nice to have, higher-level representations of the data are usually much preferable.

Such user preferences could quite easily be catered to through the availability of client software that works in the way they expect, rather than the providers of functionality or the operators of services trying to gauge what the latest fashions in user interfaces might be, as we have all seen when familiar Web sites change to mimic something one would expect to see on a smartphone, even with a large monitor on a desk with plenty of pixels to spare. With well-defined standards, if a client device or program were to see that it needed to allow a user to peruse a large collection of items or to choose a calendar date, it would defer to the conventions of that device or platform, giving the user the familiarity they expect.
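
To sketch what I mean – with all of the names below being hypothetical rather than drawn from any actual standard – a client might receive an abstract description of a field and map it to whatever its own platform provides:

    # A hypothetical field description received from a service. A desktop
    # client might present a calendar pop-up, a phone a date spinner, and a
    # text-only client a plain prompt, all from this same description.
    field = {"name": "departure", "type": "date", "label": "Departure date"}

    def request_value(field):
        # This text-only "client" merely prompts; richer clients would defer
        # to their own native widgets for the "date" type.
        if field["type"] == "date":
            return input(f"{field['label']} (YYYY-MM-DD): ")
        return input(f"{field['label']}: ")

    value = request_value(field)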

This would also allow clients and devices with a wide range of capabilities to be used. The Web tried to deliver a reasonable text-only experience for a while, but most sites can hardly be considered usable in a textual browser these days. And although there is an “accessibility story” for the Web, it largely appears to involve retrofitting sites with semantic annotations to help users muddle through the verbose morass encoded in each page. Certainly, the Web of today does do one thing reasonably by mixing up structure and presentation: it can provide a means of specifying and navigating new kinds of data that might be unknown to the client, showing them something more than a text box. A decent way of extending the range of supported data types would be needed in any alternative, but it would need to spare everyone suddenly having scripts running all over the place.

Rights to Repair

The right to repair movement has traditionally been focused on physical repairs to commercial products, making sure that even if the manufacturer has abandoned a product and really wants you to buy something new from them, you can still choose to have the product repaired so that it can keep serving you well for some time to come. But if hardware remains capable enough to keep doing its job, and if we are able to slow down or stop the forces of enforced obsolescence, we also need to make sure that the software running on the hardware may also be repaired, maintained and updated. A right to repair very much applies to software.

Devotees of the cult of the smartphone, those who think that there is an “app” for everything, should really fall silent with shame. Not just for shoehorning every activity they can think of onto a device that is far from suitable for everyone, and not just for mandating commercial relationships with large multinational corporations for everyone, but also for the way that happily functioning smartphones have to be discarded because they run software that is too old and cannot be fixed or upgraded. Demanding the right to exercise the four freedoms of Free Software for our devices means that we get to decide when those devices are “too old” for what we want to use them for. If a device happens to be no longer usable for its original activity even after some Free Software repairs, we can repurpose it for something else, instead of having the vendor use those familiar security scare stories and pretend that they are acting in our best interests.

Long-Term Perspectives

If we are looking to preserve the viability of our computing devices by demanding interoperability to give them a chance to participate in the modern world and by demanding that they may be repaired, we also need to think about how the software we develop may itself remain viable, both in terms of the ability to run it on available devices and in terms of the ability to keep maintaining, improving and repairing it. That potentially entails embracing unfashionable practices, because “modern” practices do not exactly seem conducive to the kind of sustainable activities we have in mind.

I recently had the opportunity to contemplate the deployment of software in “virtual environments” containing entire software stacks, each running its own Web server program that would receive its traffic from another Web server program running in the same virtual machine, all of this running in some cloud infrastructure. It was either that or using containers containing whole software distributions, these being deployed inside virtual machines containing their own software distributions. All because people like to use the latest and greatest stuff for everything, this stuff being constantly churned by fashionable development methodologies and downloaded needlessly over and over again from centralised Internet services run by monopolists.

Naturally, managing gigabytes of largely duplicated software is worlds, if not galaxies, away from the modest computing demands of things like Collapse OS, but it would have been distasteful to anyone even a decade ago and shocking to anyone a couple of decades ago. Unfashionable as it may seem now, software engineering courses once emphasised things like modularity and the need for formal interfaces between modules in systems. And a crucial benefit of separating out functionality into modules is to allow those modules to mature, do the job they were designed for, and recede into the background, becoming something that can be relied upon without needing continual, intensive maintenance. There is almost nothing better than writing a library that one may use constantly but never need to touch again.

Thus, the idea that a precarious stack of precisely versioned software is required to deliver a solution is absurd, but it drives the attitude that established software distributions only deliver “old” software, and it drives the demand for wasteful container or virtual environment solutions whose advocates readily criticise traditional distributions whilst pilfering packages from them. Or as Docker users might all too easily say, “FROM debian:sid”. Part of the problem is that it is easy to rely on methods of mass consumption to solve problems with software – if something is broken, just update and see if it fixes it – but such attitudes permeate the entire development process, leading to continual instability and a software stack constantly in flux.

Dealing with a multitude of software requirements is certainly a challenging problem that established operating systems struggle to resolve convincingly, despite all the shoehorning of features into the Linux technology stack. Nevertheless, the topic of operating system design is rather outside the scope of this article. Closer to relevance is the matter of how people seem reluctant to pick a technology and stick with it, particularly in the realm of programming languages. Then again, I covered much of this before and fairly recently, too. Ultimately, we want to be establishing software stacks that people can readily acquaint themselves with decades down the line, without the modern-day caveats that “feature X changed in version Y” and that if you were not there at the time, you have quite the job to do to catch up with that and everything else that went on, including migrations to a variety of source management tools and venues, maybe even completely new programming languages.

A Different Mindset

If anything, Collapse OS makes us consider a future beyond tomorrow, next week, next year, or a few years’ time. Even if the wheels do not start falling off the vehicle of human civilisation, there are still plenty of other things that can go away without much notice. Corporations like Apple and Google might stick around, whether that is good news or not, but it does not stop them from pulling the plug on products and services. Projects and organisations do not always keep going forever, not least because they are often led by people who cannot keep going forever, either.

There are ways we can mitigate these threats to sustainability and longevity, however. We can share our software through accessible channels without insisting that others use those monopolist-run centralised hosting services. We can document our software so that others have a chance of understanding what we were thinking when we wrote it. We can try and keep the requirements for our software modest and give people a chance to deploy it on modest hardware. And we might think about what kind of world we are leaving behind and whether it is better than the world we were born into.

On forms of apparent progress

Sunday, May 8th, 2022

Over the years, I have had a few things to say about technological change, churn, and the appearance of progress, a few of them touching on the evolution and development of the Python programming language. Some of my articles have probably seemed a bit outspoken, perhaps even unfair. It was somewhat reassuring, then, to encounter the reflections of a longstanding author of Python books and his use of rather stronger language than I think I ever used. It was also particularly reassuring because I apparently complain about things in far too general a way, not giving specific examples of phenomena for anything actionable to be done about them. So let us see whether we can emerge from the other end of this article in better shape than we are at this point in it.

Now, the longstanding author in question is none other than Mark Lutz whose books “Programming Python” and “Learning Python” must surely have been bestsellers for their publisher over the years. As someone who has, for many years, been teaching Python to a broad audience of newcomers to the language and to programming in general, his views overlap with mine about how Python has become increasingly incoherent and overly complicated, as its creators or stewards pursue some kind of agenda of supposed improvement without properly taking into account the needs of the broadest reaches of its user community. Instead, as with numerous Free Software projects, an inscrutable “vision” is used to impose change based on aesthetics and contemporary fashions, unrooted in functional need, by self-appointed authorities who often lack an awareness or understanding of historical precedent or genuine user need.

Such assertions are perhaps less kind to Python’s own developers than they should be. Those choosing to shoehorn new features into Python arguably have more sense of precedent than, say, the average desktop environment developer imitating Apple in what could uncharitably be described as an ongoing veiled audition for a job in Cupertino. Nevertheless, I feel that language developers would be rather more conservative if they only considered what teaching their language to newcomers entails or what effect their changes have on the people who have written code in their language. Am I being unfair? Let us read what Mr Lutz has to say on the matter:

The real problem with Python, of course, is that its evolution is driven by narcissism, not user feedback. That inevitably makes your programs beholden to ever-shifting whims and ever-hungry egos. Such dependencies might have been laughable in the past. In the age of Facebook, unfortunately, this paradigm permeates Python, open source, and the computer field. In fact, it extends well beyond all three; narcissism is a sign of our times.

You won’t find a shortage of similar sentiments in his running commentary on Python releases. Let us, then, take a look at some experiences and try to review such assertions. Maybe I am not being so unreasonable (or impractical) in my criticism after all!

Out in the Field

In a recent job, of which more might be written another time, Python was introduced to people more familiar with languages such as R (which comes across as a terrible language, but again, another time perhaps). It didn’t help that as part of that introduction, they were exposed to things like this:

    def method(self, arg: Dict[Something, SomethingElse]):
        return arg.items()

When newcomers are already having to digest new syntax, new concepts (classes and objects!), and why there is a “self” parameter, unnecessary ornamentation such as the type annotations included above only increases the cognitive burden. It also doesn’t help to then say, “Oh, the type declarations are optional and Python doesn’t really check them, anyway!” What is the student supposed to do with that information? Many years ago now, Java was mocked for confronting its newcomers with boilerplate like this classic:

    public static void main(String[] args)

But exposing things that the student is then directed to ignore is simply doing precisely the same thing for which Java was criticised. Of course, in Python, the above method could simply have been written as follows:

    def method(self, arg):
        return arg.items()

Indeed, for the above method to be valid in the broadest sense, the only constraint on the nature of the “arg” parameter is that it offer an attribute called “items” that can be called with no arguments. By prescriptively imposing a limitation on “arg” as was done above, insisting that it be a dictionary, the method becomes less general and less usable. Moreover, the nature of Python itself is neglected or mischaracterised: the student might believe that only a certain type would be acceptable, just as the author of that code perhaps fails to see that a range of different, conformant kinds of objects could be used with the method. Such practices discourage or conceal polymorphism and generic functionality at a point when the beginner’s mind should be opened to them.
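
To illustrate the point with my own example, any object providing a suitable “items” attribute will satisfy the unannotated method, not merely a dictionary:

    class Inventory:
        # Not a dictionary, but it conforms to what the method actually
        # needs: an "items" attribute that can be called with no arguments.
        def __init__(self, pairs):
            self.pairs = list(pairs)

        def items(self):
            return self.pairs

    class Example:
        def method(self, arg):
            return arg.items()

    e = Example()
    print(list(e.method({"a": 1})))         # a real dictionary works...
    print(e.method(Inventory([("b", 2)])))  # ...as does any conformant object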

As Mr Lutz puts such things in the context of a different feature introduced in Python 3.5:

To put that another way: unless you’re willing to try explaining a new feature to people learning the language, you just shouldn’t do it.

The tragedy is that Python in its essential form is a fairly intuitive and readable language. But as he also says in the specific context of type annotations:

Thrashing (and rudeness) aside, the larger problem with this proposal’s extensions is that many programmers will code them—either in imitation of other languages they already know, or in a misguided effort to prove that they are more clever than others. It happens. Over time, type declarations will start appearing commonly in examples on the web, and in Python’s own standard library. This has the effect of making a supposedly optional feature a mandatory topic for every Python programmer.

And I can certainly say from observation that in various professional cultures, including academia, where my own recent observations were made, there is a persistent phenomenon whereby people demonstrate “best practice” to show that they, as software development practitioners (or, indeed, practitioners of anything else related to the career in question), are aware of the latest developments, are able to communicate them to their less well-informed colleagues, and are presumably the ones who should be foremost in anyone’s consideration for any future hiring round or promotion. Unfortunately, this enthusiasm is not always tempered by considered reflection, either on the nature of the supposed innovation itself, or on the consequences its proliferation will have.

Perversely, such enthusiasm, provoked by the continual hustle for funding, positions, publications and reputation, risks causing a trail of broken programs. And yet, at the same time, much is made of the need for software development in academia to be done “properly”, so that people do research that is reproducible and whose computational elements are repeatable. It doesn’t help that those ambitions must also be squared with other apparent needs, such as offering tools and services to others. And the need to offer such things in a robust and secure fashion sometimes has to coexist with the need to offer them in a convenient form, where appropriate. Taking all of these things into consideration is quite the headache.

A Positive Legacy

Amusingly, some have come to realise that Python’s best hope for reproducible research is precisely the thing that Python’s core developers have abandoned – Python 2.7 – and precisely because they have abandoned it. In an article about reproducing old, published results, albeit of a rather less than scientific nature, Nicholas Rougier sought to bring an old program back to life, aiming to find a way of obtaining or recovering the program’s sources, constructing an executable form of the program, and deploying and running that program on a suitable system. Running his old program, written for the Apple IIe microcomputer in Applesoft BASIC, required the use of emulators and, for complete authenticity, modern hardware expansions to transfer the software to floppy disks to run on an original Apple IIe machine.

And yet, the ability to revive and deploy a program developed 32 years earlier was possible thanks to the Apple machine’s status as a mature, well-understood platform with an enthusiastic community developing new projects and products. These initiatives were only able to offer such extensive support for a range of different “retrocomputing” activities because the platform has for a long time effectively been “frozen”. Contrasting such a static target with rapidly evolving modern programming languages and environments, Rougier concluded that “an advanced programming language that is guaranteed not to evolve anymore” would actually be a benefit for reproducible science, that few people use many of the new features of Python 3, and that Python 2.7 could equally be such a “highly fertile ground for development” that the proprietary Applesoft BASIC had proven to be for a whole community of developers and users.

Naturally, no language designer ever wants to be told that their work is finished. Lutz asserts that “a bloated system that is in a perpetual state of change will eventually be of more interest to its changers than its prospective users”, which is provocative but also rings true. CPython (the implementation of Python in the C programming language) has always had various technical deficiencies – the lack of proper multithreading, for instance – but its developers, who also happen to be the language designers, seem to prefer tweaking the language instead. Other languages have gained in popularity at Python’s expense by seeking to address such deficiencies and to meet the frustrated expectations of Python developers. Or as Lutz notes:

While Python developers were busy playing in their sandbox, web browsers mandated JavaScript, Android mandated Java, and iOS became so proprietary and closed that it holds almost no interest to generalist developers.

In parts of academia familiar with Python, languages like Rust and Julia are now name-dropped, although I doubt that many of those doing the name-dropping realise what they are in for if they decide to write everything in Rust. Meanwhile, Python 2 code is still used, against a backdrop of insistent but often ignored requests from systems administrators for people to migrate code to Python 3 so that newer operating system distributions can be deployed. In other sectors, such migration is meant to be factored into the cost of doing business, but in places like academia where software maintenance generally doesn’t get funding, no amount of shaming or passive-aggressive coercion is magically going to get many programs updated at all.

Of course, we could advocate that everybody simply run their old software in virtual machines or containers, just as was possible with that Applesoft BASIC program from over thirty years ago. Indeed, containerisation is the hot thing in places like academia just as it undoubtedly is elsewhere. But unlike the Apple II community who will hopefully stick with what they know, I have my doubts that all those technological lubricants marketed under the buzzword “containers!” will still be delivering the desired performance decades from now. As people jump from yesterday’s hot solution to today’s and on to tomorrow’s (Docker, with or without root, to Singularity/Apptainer, and on to whatever else we have somehow deserved), just the confusion around the tooling will be enough to make the whole exercise something of an ordeal.

A Detour to the Past

Over the last couple of years, I have been increasingly interested in looking back over the course of the last few decades, back to the time when I was first introduced to microcomputers, and even back beyond that to the age of mainframes when IBM reigned supreme and the likes of ICL sought to defend their niche and to remain competitive, or even relevant, as the industry shifted beneath them. Obviously, I was not in a position to fully digest the state of the industry as a schoolchild fascinated with the idea that a computer could seemingly take over a television set and show text and graphics on the screen, and I was certainly not “taking” all the necessary computing publications to build up a sophisticated overview, either.

But these days, many publications from decades past – magazines, newspapers, academic and corporate journals – are available from sites like the Internet Archive, and it becomes possible to sample the sentiments and mood of the times, frustrations about the state of then-current, affordable technology, expectations of products to come, and so on. Those of us who grew up in the microcomputing era saw an obvious progression in computing technologies: faster processors, more memory, better graphics, more and faster storage, more sophisticated user interfaces, increased reliability, better development tools, and so on. Technologies such as Unix were “the future”, labelled as impending to the point of often being ridiculed as too expensive, too demanding or too complicated, perhaps never to see the limelight after all. People were just impatient: we got there in the end.

While all of that was going on, other trends were afoot at the lowest levels of computing. Computer instruction set architectures had become more complicated as the capabilities they offered had expanded. Although such complexity, broadly categorised using labels such as CISC, had been seen as necessary or at least desirable to be able to offer system implementers a set of convenient tools to more readily accomplish their work, the burden of delivering such complexity risked making products unreliable, costly and late. For example, the National Semiconductor 32016 processor, seeking to muscle in on the territory of Digital Equipment Corporation and its VAX line of computers, suffered delays in getting to market and performance deficiencies that impaired its competitiveness.

Although capable and in some respects elegant, these kinds of processing architectures turned out not to be delivering what was actually important, either in terms of raw performance for end-users or in terms of convenience for developers. It came to be realised that some of the complexity was superfluous, that programmers did not use certain instructions often or at all, and that a flawed understanding of programmers’ needs had led to the retention of functionality that did not need to be inscribed in silicon, with all the associated costs and risks that this would entail. Instead, simpler, more orthogonal architectures could be delivered, offering instructions that programmers or, crucially, their compilers would actually use. The idea of RISC was thereby born.

As the concept of RISC took off, pursued by the likes of IBM, UCB and Sun, Stanford University and MIPS, Acorn (and subsequently ARM), HP, and even Digital, Intel and Motorola, amongst others, the concept of the workstation became more fully realised. It may have been claimed by some commentator or other that “the personal computer killed the workstation” or words to that effect, but in fact, the personal computer effectively became the workstation during the course of the 1990s and early years of the twenty-first century, albeit somewhat delayed by Microsoft’s sluggish delivery of appropriately sophisticated operating systems to its largely captive audience.

For a few people in the 1980s, the workstation vision was the dream: the realisation of their expectations for what a computer should do. Although expectations should always be updated to take new circumstances and developments into account, it is increasingly difficult to see the same rate of progress in this century’s decades that we saw in the final decades of the last century, at least in terms of general usability, stability and the emergence of new and useful computational capabilities. Some might well argue that graphics and video processing or networked computing have progressed immeasurably, these certainly having delivered benefits for visualisation, gaming, communications and the provision of online infrastructure, but in other regards, we seem stuck with something very similar to that of twenty years ago, but with increasingly disillusioned developers and disempowered users.

What we might take away from this historical diversion is that sometimes a focus on the essentials, on simplicity, and on the features that genuinely matter makes more of a difference than just pressing ahead with increasingly esoteric and baroque functionality that benefits few and yet brings its own set of risks and costs. And we should recognise that progress is largely acknowledged only when it delivers perceptible benefits. In terms of delivering a computer language and environment, this may necessarily entail emphasising the stability and simplicity of the language, focusing instead on remedying the deficiencies of the underlying language technology to give users the kind of progress they might actually welcome.

A Dark Currency

Mark Lutz had intended to stop commentating on newer versions of Python, reflecting on the forces at work that make Python what it now is:

In the end, the convolution of Python was not a technical story. It was a sociological story, and a human story. If you create a work with qualities positive enough to make it popular, but also encourage it to be changed for a reward paid entirely in the dark currency of ego points, time will inevitably erode the very qualities which made the work popular originally. There’s no known name for this law, but it happens nonetheless, and not just in Python. In fact, it’s the very definition of open source, whose paradigm of reckless change now permeates the computing world.

I also don’t know of a name for such a law of human behaviour, and yet I have surely mentioned such behavioural phenomena previously myself: the need to hustle, demonstrate expertise, audition for some potential job offer, demonstrate generosity through volunteering. In some respects, the cultivation of “open source” as a pragmatic way of writing software collaboratively, marginalising Free Software principles and encouraging some kind of individualistic gift culture coupled to permissive licensing, is responsible for certain traits of what Python has become. But although a work that is intrinsically Free Software in nature may facilitate chaotic, haphazard, antisocial, selfish, and many other negative characteristics in the evolution of that work, it is the social and economic environment around the work that actually promotes those characteristics.

When reflecting on the past, particularly during periods when capabilities were being built up, we can start to appreciate the values that might have been more appreciated at that time than they are now. Python originated at a time when computers in widespread use were becoming capable enough to offer such a higher-level language, one that could offer increased convenience over various systems programming languages whilst building on top of the foundations established by those languages. With considerable effort having been invested in such foundations, a mindset seemed to persist, at least in places, that such foundations might be enduring and be good for a long time.

An interesting example of such attitudes arose at a lower level with the development of the Alpha instruction set architecture. Digital, having responded ineffectively to its competitive threats, embraced the RISC philosophy and eventually delivered a processor range that could be used to support its existing product line-up, emphasising performance and longevity through a “15- to 25-year design horizon” that attempted to foresee the requirements of future systems. Sadly, Digital made some poor strategic decisions, some arguably due to Microsoft’s increasing influence over the company’s strategy, and after a parade of acquisitions, Alpha fell under the control of HP who sacrificed it, along with its own RISC architecture, to commit to Intel’s dead-end Itanium architecture. I suppose this illustrates that the chaos of “open source” is not the only hazard threatening stability and design for longevity.

Such long or distant horizons demand that newer developments remain respectful to the endeavours that have made them possible. Such existing and ongoing endeavours may have their flaws, but recognising and improving those flaws is more constructive and arguably more productive than tearing everything down and demanding that everything be redone to accommodate an apparently new way of thinking. Sadly, we see a lot of the latter these days, but it goes beyond a lack of respect for precedent and achievement, reflecting broader tendencies in our increasingly stressed societies. One such tendency is that of destructive competition, the elimination of competitors, and the pursuit of monopoly. We might be used to seeing such things in the corporate sphere – the likes of Microsoft wanting to be the only ones who provide the software for your computer, no matter where you buy it – but people have a habit of imitating what they see, especially when the economic model for our societies increasingly promotes the hustle for work and the need to carve out a lucrative niche.

So, we now see pervasive attitudes such as the pursuit of the zero-sum game. Where the deficiencies of a technology lead its users to pursue alternatives, defensiveness in the form of utterances such as “no need to invent another language” arises. Never mind that the custodians of the deficient technology – in this case, Python, of course – happily and regularly offer promotional consideration to a company who openly tout their own language for mobile development. Somehow, the primacy of the Python language is a matter for its users to bear, whereas another rule applies amongst its custodians. That is another familiar characteristic of human behaviour, particularly where power and influence accumulates.

And so, we now see hostility towards anything being perceived as competition, even if it is merely an independent endeavour undertaken by someone wishing to satisfy their own needs. We see intolerance for other solutions, but we also see a number of other toxic behaviours on display: alpha-dogging, personality worship and the cultivation of celebrity. We see chest-puffing displays of butchness about Important Matters like “security”. And, of course, the attitude to what went before is the kind of approach that involves boiling the oceans so that they may be populated by precisely the right kind of fish. None of this builds on or complements what is already there, nor does it deliver a better experience for the end-user. No wonder people say that they are jealous of colleagues who are retiring.

All these things make it unappealing to share software or even ideas with others. Fortunately, if one does not care about making a splash, one can just get on with things that are personally interesting and ignore all the “negativity from ignorant, opinionated blowhards”. In today’s hustle culture, though, this also means forgoing the attention that might prompt anyone to discover your efforts and pay you to do such work. On the actual topic that has furnished us with so many links to toxic behaviour, and on the matter of the venue where such behaviour is routine, I doubt that I would want my own language-related efforts announced in such a venue.

Then again, I seem to recall that I stopped participating in that particular venue after one discussion had a participant distorting public health observations by the likes of Hans Rosling to despicably indulge in poverty denial. Once again, broader social, economic and political influences weigh heavily on our industry and communities, with people exporting their own parochial or ignorant views globally, and in the process corrupting and undermining other people’s societies, oblivious to the misery it has already caused in their own. Against this backdrop, simple narcissism is perhaps something of a lesser concern.

At the End of the Tunnel

I suppose I promised some actionable observations at the start of the article, so what might they be?

Respect Users and Investments

First of all, software developers should be respectful towards the users of their software. Such users lend validation to that software, encourage others to use it, and they potentially make it possible for the developers to work on it for a living. Their use involves an investment that, if written off by the developers, is costly for everyone concerned.

And no, the users’ demands for that investment to be protected cannot be disregarded as “entitlement”, even if they paid nothing to acquire the software, at least if the developers are happy to enjoy all the other benefits of the software’s proliferation. As is often said, power and influence bring responsibility. Just as democratically elected politicians have a responsibility towards everyone they represent, regardless of whether those people voted for them or not, software developers have a duty of care towards all of their users, even if it is merely to step out of the way and to let the users take the software in their own direction without seeking to frustrate them, as we saw when Python 2 was cast aside.

Respond to User Needs Constructively

Developers should also be responsive to genuine user needs. If you believe all the folklore about the “open source” way, it should have been precisely people’s own genuine needs that persuaded them to initiate their own projects in the first place. It is entirely possible that a project may start with one kind of emphasis and demand one kind of skills only to evolve towards another emphasis or to require other skills. With Python, much of the groundwork was laid in the 1990s, building an interpreter and formulating a capable language. But beyond that initial groundwork, the more pressing challenges lay outside the language design domain and went beyond the implementation of a simple interpreter.

Improved performance and concurrency, both increasingly expected by users, required the application of other skills that might not have been present in the project. And yet, the elaboration of the language continued, with the developers susceptible to persuasion by outsiders engaging in “alpha-dogging” or even insiders with an inferiority complex, being made to feel that the language was not complete or even adequate since it lacked features from the pet languages of those outsiders or of the popular language of the day. Development communities should welcome initiatives to improve their projects in ways that actually benefit the users, and they should resist the urge to muscle in on such initiatives by seeking to demonstrate that they have the necessary solutions when their track record would indicate otherwise. (Or worse still, by reframing user needs in terms of their own narrow agenda as if to say, “Here is what you are really asking for.” Another familiar trait of the “visionary” desktop developer.)

Respect Other Solutions

Developers and commentators more generally should accept and respect the existence of other technologies and solutions. Just because they have their own favourite solution does not de-legitimise something they have just been made aware of. Maybe it is simply not meant for them. After all, not everything that happens in this reality is part of a performance exclusively for any one person’s benefit, despite what some people appear to think. And the existence of other projects doing much the same thing is not necessarily “wasted effort”: another concept introduced from some cult of economics or other.

It is entirely possible to provide similar functionality in different ways, and the underlying implementations may lend those different projects different characteristics – portability, adaptability, and so on – even if the user sees largely the same result on their screen. Maybe we do want to encourage different efforts even for fundamental technologies or infrastructure, not because anyone likes to “waste effort”, but because it gives the systems we build a level of redundancy and resilience. And maybe some people just work better with certain other people. We should let them, as opposed to forcing them to fit in with tiresome, exploitative and time-wasting development cultures, to suffer rudeness and general abuse, simply to go along with an exercise that props up some form of corporate programme of minimal investment in the chosen solution of industry and various pundits.

Develop for the Long Term and for Stability

Developers should make things that are durable so that they may be usable for many years to come. Or they should at least expect that people may want to use them years or even decades from now. Just because something is old does not mean it is bad. Much of what we use today is based on technology that is old, with much of that technology effectively coming of age decades ago. We should be able to enjoy the increased performance of our computers, not have it consumed by inefficient software that drives the hardware and other software into obsolescence. Technological fads come and go (and come back again): people in the 1990s probably thought that virtual reality would be pervasive by now, but experience should permit us to reflect and to recognise that some things were (and maybe always will be) bad ideas and that we shouldn’t throw everything overboard to pander to them, only to regret doing so later.

We live in a world where rapid and uncomfortable change has been normalised, but where consumerism has been promoted as the remedy. Perhaps some old way of doing something mundane doesn’t work any more – buying something, interacting with public agencies, fulfilling obligations, even casting votes in some kinds of elections – perhaps because someone has decided that money can be saved (and, of course, soon wasted elsewhere) if it can be done “digitally” from now on. To keep up, you just need a smartphone, or a newer smartphone, with an “app”, or the new “app”, and a subscription to a service, and another one. And so on. All of that “works” for people as long as they have the necessary interest, skills, time, and money to spend.

But as the last few years have shown, it doesn’t take much to disrupt these unsatisfactory and fragile arrangements. Nobody advocating fancy “digital” solutions evidently considered that people would not already have everything they need to access their amazing creations. And when, as they say, neither love nor money can get you the gadgets you need, it doesn’t even matter how well-off you are: suddenly you get a downgrade in experience to a level that, as a happy consumer, you probably didn’t even know still existed, even if it is still the reality for whole sections of our societies. We have all seen how narrow the margins are between everything apparently being “just fine” and there being an all-consuming crisis, both on a global level and, for many, on a personal level, too.

Recognise Responsibilities to Others

Change can be a positive thing if it carries everyone along and delivers actual progress. Meanwhile, there are those who embrace disruption as a form of change, claiming it to be a form of progress, too, but that form of change is destructive, harmful and exclusionary. It should not be a surprise that prominent figures in a certain political movement advocate such disruptive change: for them, it doesn’t matter how many people suffer from the ruinous change they have inflicted on everyone as long as they are the ones to benefit; everyone else can wait fifty years or so to see some kind of consolation for the things taken from them, apparently.

As we deliver technology to others, we should not be the ones deepening any misery already experienced by imposing needless and costly change. We should be letting people catch up with the state of technology and allowing them to be comfortable with it. We should invest in long-term solutions that address people’s needs, and we should refuse to be shamed into playing the games of opportunists and profiteers who ridicule anything old or familiar in favour of what they happen to be promoting today. We should demand that people’s investments in hardware and software be protected, that they are not effectively coerced into constantly buying new things and seeing their living standards diminished in other ways, with such consumption burdening our planet’s ecosystem and resources.

Just as we all experience that others have power over us, so we might recognise the power we have over other people. And just as we might expect others to consider our interests, so we might consider the interests of those who have to put up with our decisions. Maybe, in the end, all I am doing is asking for people to show some consideration for the experiences of other people, that their lives not be made any harder than they might already be. Is that really too much to ask? Is that so hard to understand?

Another Look at VGA Signal Generation with a PIC32 Microcontroller

Thursday, November 8th, 2018

Maybe some people like to see others attempting unusual challenges, things that wouldn’t normally be seen as productive, sensible or a good way to spend time and energy, things that shouldn’t be possible or viable but were nevertheless made to work somehow. And maybe they like the idea of indulging their own curiosity in such things, knowing that for potential future experiments of their own there is a route already mapped out to some kind of success. That might explain why my project to produce a VGA-compatible analogue video signal from a PIC32 microcontroller seems to attract more feedback than some of my other, arguably more useful or deserving, projects.

Nevertheless, I was recently contacted by different people inquiring about my earlier experiments. One was admittedly only interested in using Free Software tools to port his own software to the MIPS-based PIC32 products, and I tried to give some advice about navigating the documentation and to describe some of the issues I had encountered. Another was more concerned with the actual signal generation aspect of the earlier project and the usability of the end result. Previously, I had also had a conversation with someone looking to use such techniques for his project, and although he ended up choosing a different approach, the cumulative effect of my discussions with him and these more recent correspondents persuaded me to take another look at my earlier work and to see if I couldn’t answer some of the questions I had received more definitively.

Picking Over the Pieces

I was already rather aware of some of the demonstrated and potential limitations of my earlier work, these being concerned with generating a decent picture, and although I had attempted to improve the work on previous occasions, I think I just ran out of energy and patience to properly investigate other techniques. The following things were bothersome or a source of concern:

  • The unevenly sized pixels
  • The process of experimentation with the existing code
  • Whether the microcontroller could really do other things while displaying the picture

Although one of my correspondents was very complimentary about the form of my assembly language code, I rather felt that it was holding me back, making me focus on details that should be abstracted away. It should be said that MIPS assembly language is fairly pleasant to write, at least in comparison to certain other architectures.

(I was brought up on 6502 assembly language, where there is an “accumulator” that is the only thing even approaching a general-purpose register in function, and where instructions need to combine this accumulator with other, more limited, registers to do things like accessing “zero page”: an area of memory that supports certain kinds of operations by providing the contents of locations as inputs. Everything needs to be meticulously planned, and despite odd claims that “zero page” is really one big register file and that the 6502 is therefore “RISC-like”, the existence of virtual machines such as SWEET16 says rather a lot about how RISC-like the 6502 actually is. Later, I learned ARM assembly language and found it rather liberating with its general-purpose registers and uncomplicated, rather easier to use, memory access instructions. Certain things are even simpler in MIPS assembly language, whereas other conveniences from ARM are absent and demand a bit more effort from the programmer.)

Anyway, I had previously introduced functionality in C in my earlier work, mostly because I didn’t want the challenge of writing graphics routines in assembly language. So with the need to more easily experiment with different peripheral configurations, I decided to migrate the remaining functionality to C, leaving only the lowest-level routines concerned with booting and exception/interrupt handling in assembly language. This effort took me to some frustrating places, making me deal with things like linker scripts and the kind of memory initialisation that one’s compiler usually does for you but which is absent when targeting a “bare metal” environment. I shall spare you those details in this article.

I therefore spent a certain amount of effort in developing some C library functionality for dealing with the hardware. It could be said that I might have used existing libraries instead, but ignoring Microchip’s libraries, which are either proprietary or the subject of numerous complaints by those unwilling to leave that company’s “ecosystem”, I rather felt that the exercise in library design would be useful in getting reacquainted with the hardware and in providing me with something I would actually want to use. An important goal was minimalism, whereas my impression of libraries such as those provided by the Pinguino effort is that they try to bridge the different PIC hardware platforms and consequently accumulate features and details that do not really interest me.

The Wide Pixel Problem

One thing that had bothered me when demonstrating a VGA signal was that the displayed images featured “wide” pixels. These are not always noticeable: one of my correspondents told me that he couldn’t see them in one of my example pictures, but they are almost certainly present because they are a feature of the mechanism used to generate the signal. Here is a crop from the example in question:

Picture detail from VGAPIC32 output

And here is the same crop with the wide pixels highlighted:

Picture detail with wide pixels highlighted

I have left the identification of all wide pixel columns to the reader! Nevertheless, it can be stated that these pixels occur in every fourth column and are especially noticeable with things like text, where at such low resolutions, the doubling of pixel widths becomes rather obvious and annoying.

Quite why this increase in pixel width was occurring became a matter I wanted to investigate. As you may recall, the technique I used to output pixels involved getting the direct memory access (DMA) controller in the PIC32 chip to “copy” the contents of memory to a hardware register corresponding to output pins. The signals from these pins were sent along the cable to the monitor. And the DMA controller was transferring data as fast as it could and thus producing pixel colours as fast as it could.

Pixel Output Using DMA Transfer

An overview of the architecture for pixel output using DMA transfer

One of my correspondents looked into the matter and confirmed that we were not imagining this problem, even employing an oscilloscope to check what was happening with the signals from the output pins. The DMA controller would, after starting each fourth pixel, somehow not be able to produce the next pixel in a timely fashion, leaving the current pixel colour unchanged as the monitor traced the picture across the screen. This would cause these pixels to “stretch” until the first pixel from the next group could be emitted.

Initially, I had thought that interrupts were occurring and the CPU, in responding to interrupt conditions and needing to read instructions, was gaining priority over the DMA controller and forcing pixel transfers to wait. Although I had specified a “cell size” of 160 bytes, corresponding to 160 pixels, I was aware that the architecture of the system would be dividing data up into four-byte “words”, and it would be natural at certain points for accesses to memory to be broken up and scheduled in terms of such units. I had therefore wanted to accommodate both the CPU and DMA using an approach where the DMA would not try to transfer everything at once, but without the energy to get this to work, I had let the matter rest.

A Steady Rhythm

The documentation for these microcontrollers distinguishes between block and cell transfers when describing DMA. Previously, I had noted that these terms could be better described, and I think there are people who are under the impression that cells must be very limited in size and that you need to drive repeated cell transfers using various interrupt conditions to transfer larger amounts. We have seen that this is not the case: a single, large cell transfer is entirely possible, even though the characteristics of the transfer have been less than desirable. (Nevertheless, the documentation focuses on things like copying from one UART peripheral to another, arguably failing to explore the range of possible applications for DMA and to thereby elucidate the mechanisms involved.)

However, given the wide pixel effect, it becomes more interesting to introduce a steady rhythm by using smaller cell sizes and having an external event coordinate each cell’s transfer. With a single, large transfer, only one initiation event needs to occur: that produced by the timer whose period corresponds to that of a single horizontal “scanline”. The DMA channel producing pixels then runs to completion and triggers another channel to turn off the pixel output. In this scheme, the initiating condition for the cell transfer is the timer.

VGA Display Line Structure

The structure of each visible display line in the VGA signal
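
To make this more concrete, here is a rough sketch of how the two channels in the single-transfer scheme might be configured. The register names follow the conventions of the PIC32MX documentation and are assumed to be provided by a device header, while LINE_TIMER_IRQ and OUTPUT_PORT are placeholders of mine; the bit values are illustrative rather than taken from my actual code.

    /* Convert a virtual address to the physical address the DMA
       controller requires. */
    #define PHYSICAL(addr) ((unsigned long) (addr) & 0x1fffffff)

    static unsigned char line_data[160];        /* one line of pixel values */
    static const unsigned char black = 0;       /* the reset pixel */

    void configure_line_transfer(void)
    {
        DMACON = 0x8000;                        /* ON: enable the controller */

        /* Channel 0: when the line timer raises its interrupt condition,
           transfer the whole line as a single 160-byte cell. */
        DCH0ECON = (LINE_TIMER_IRQ << 8) | 0x10; /* CHSIRQ=timer, SIRQEN */
        DCH0SSA = PHYSICAL(line_data);          /* source: pixel data */
        DCH0DSA = PHYSICAL(&OUTPUT_PORT);       /* destination: output pins */
        DCH0SSIZ = 160; DCH0DSIZ = 1; DCH0CSIZ = 160;
        DCH0CON = 0x93;                         /* CHEN, CHAEN, priority 3 */

        /* Channel 1: chained to channel 0, turning off the output
           afterwards by writing a single black pixel. */
        DCH1SSA = PHYSICAL(&black);
        DCH1DSA = PHYSICAL(&OUTPUT_PORT);
        DCH1SSIZ = 1; DCH1DSIZ = 1; DCH1CSIZ = 1;
        DCH1CON = 0x33;                         /* CHCHN, CHAEN, priority 3 */
    }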

When using multiple cells to transfer the pixel data, however, it is no longer possible to use the timer in this way. Doing so would cause the initiation of the first cell, but then subsequent cells would only be transferred on subsequent timer events. And since these events only occur once per scanline, this would see a single line’s pixel data being transferred over many scanlines instead (or, since the DMA channel would be updated regularly, we would see only the first pixel on each line being emitted, stretched across the entire line). Since the DMA mechanism apparently does not permit one kind of interrupt condition to enable a channel and another to initiate each cell transfer, we must be slightly more creative.

Fortunately, the solution is to chain two channels, just as we already do with the pixel-producing channel and the one that resets the output. A channel is dedicated to waiting for the line timer event, and it transfers a single black pixel to the screen before handing over to the pixel-producing channel. This channel, now enabled, has its cell transfers regulated by another interrupt condition and proceeds as fast as such a condition may occur. Finally, the reset channel takes over and turns off the output as before.

Pixel Output Using Timed DMA Transfers

An overview of the architecture for pixel output using timed DMA transfers

The nature of the cell transfer interrupt can take various forms, but it is arguably most intuitive to use another timer for this purpose. We may set the limit of such a timer to 1, indicating that it may “wrap around” and thus produce an event almost continuously. And by configuring it to run as quickly as possible, at the frequency of the peripheral clock, it may drive cell transfers at a rate that is quick enough to hopefully produce enough pixels whilst also allowing other activities to occur between each transfer.

VGA Pixel Output (Using Transfer Timer)

Using a timer to initiate pixel transfers for the VGA signal

One thing is worth mentioning here just to be explicit about the mechanisms involved. When configuring interrupts that are used for DMA transfers, it is the actual condition that matters, not the interrupt nor the delivery of the interrupt request to the CPU. So, when using timer events for transfers, it appears to be sufficient to merely configure the timer; it will produce the interrupt condition upon its counter “wrapping around” regardless of whether the interrupt itself is enabled.
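
Sketching this arrangement, under the same caveats as before (documentation-style register names, placeholder IRQ numbers, illustrative bit values), the pacing timer and the reduced cell size are the only real novelties:

    void configure_timed_cells(void)
    {
        /* The pacing timer wraps around as often as possible, raising its
           interrupt condition on each wrap. Note that the interrupt itself
           need not be enabled: the condition alone initiates each cell. */
        PR3 = 1;                                /* the limit, as described */
        TMR3 = 0;
        T3CON = 0x8000;                         /* ON, 1:1 prescaler */

        /* The pixel-producing channel now moves one byte per cell, one
           cell per timer event, instead of a whole line in one go. */
        DCH1ECON = (TRANSFER_TIMER_IRQ << 8) | 0x10; /* CHSIRQ, SIRQEN */
        DCH1CSIZ = 1;                           /* single-byte cells */
    }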

With a cell size of a single byte, and with a peripheral clock running at half the speed of the system clock, this approach is sufficient all by itself to yield pixels with consistent widths, with the only significant disadvantage being how few of them can be produced per line: I could only manage in the neighbourhood of 80 pixels! Making the peripheral clock run as fast as the system clock doesn’t help in theory: we actually want the CPU running faster than the transfer rate just to have a chance of doing other things. Nor does it help in practice: picture stability rather suffers.

A picture of the display output from timed DMA transfers

Using larger cell sizes, we encounter the wide pixel problem, meaning that the end of a four-byte group is encountered and the transfer hangs on for longer than it should. However, larger cell sizes also introduce byte transfers at a different rate from cell transfers (at the system clock rate) and therefore risk making the last pixel produced by a cell longer than the others, anyway.

Uncovering DMA Transfers

I rather suspect that interruptions are not really responsible for the wide pixels at all, and that it is the DMA controller that causes them. Some discussion with another correspondent explored how the controller might be operating, with transfers perhaps looking something like this:

DMA read from memory
DMA write to display (byte #1)
DMA write to display (byte #2)
DMA write to display (byte #3)
DMA write to display (byte #4)
DMA read from memory
...

This would, by itself, cause a transfer pattern like this:

R____R____R____R____R____R ...
_WWWW_WWWW_WWWW_WWWW_WWWW_ ...

And thus pixel output as follows:

41234412344123441234412344 ...
=***==***==***==***==***== ... (narrow pixels as * and wide pixel components as =)

Even without any extra operations or interruptions, we would get a gap between the write operations that would cause a wider pixel. This would only get worse if the DMA controller had to update the address of the pixel data after every four-byte read and write, not being able to do so concurrently with those operations. And if the CPU were really able to interrupt longer transfers, even to obtain a single instruction to execute, it might then compete with the DMA controller in accessing memory, making the read operations even later every time.

Assuming, then, that wide pixels are the fault of the way the DMA controller works, we might consider how we might want it to work instead:

                     | ...
DMA read from memory | DMA write to display (byte #4)
                 \-> | DMA write to display (byte #1)
                     | DMA write to display (byte #2)
                     | DMA write to display (byte #3)
DMA read from memory | DMA write to display (byte #4)
                 \-> | ...

If only pixel data could be read from memory and written to the output register (and thus the display) concurrently, we might have a continuous stream of evenly-sized pixels. Such things do not seem possible with the microcontroller I happen to be using. Either concurrent reading from memory and writing to a peripheral is not possible or the DMA controller is not able to take advantage of this concurrency. But these observations did give me another idea.

Dual Channel Transfers

If the DMA controller cannot get a single channel to read ahead and get the next four bytes, could it be persuaded to do so using two channels? The intention would be something like the following:

Channel #1:                     Channel #2:
                                ...
DMA read from memory            DMA write to display (byte #4)
DMA write to display (byte #1)
DMA write to display (byte #2)
DMA write to display (byte #3)
DMA write to display (byte #4)  DMA read from memory
                                DMA write to display (byte #1)
...                             ...

This is really nothing different from the above, functionally, but the required operations end up being assigned to different channels explicitly. We would then want these channels to cooperate, interleaving their data so that the result is the combined sequence of pixels for a line:

Channel #1: 1234    1234     ...
Channel #2:     5678    5678 ...
  Combined: 1234567812345678 ...

It would seem that channels might even cooperate within cell transfers, meaning that we can apparently schedule two long transfer cells and have the DMA controller switch between the two channels after every four bytes. Here, I wrote a short test program employing text strings and the UART peripheral to see if the microcontroller would “zip up” the strings, with the following being used for single-byte cells:

Channel #1: "Adoc gi,hlo\r"
Channel #2: "n neaan el!\n"
  Combined: "And once again, hello!\r\n"
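
Just to demonstrate the principle of the interleaving, rather than the microcontroller test itself, the effect can be rehearsed with a trivial program:

    #include <stdio.h>

    /* Emit bytes alternately from two strings, mimicking the DMA
       controller switching between the two channels' data. */
    int main(void)
    {
        const char *c1 = "Adoc gi,hlo\r";
        const char *c2 = "n neaan el!\n";

        while (*c1 && *c2)
        {
            putchar(*c1++);
            putchar(*c2++);
        }
        return 0;
    }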

Then, seeing that it did, I decided that this had a chance of also working with pixel data. Here, every other pixel on a line needs to be presented to each channel, with the first channel being responsible for the pixels in odd-numbered positions, and the second channel being responsible for the even-numbered pixels. Since the DMA controller is unable to step through the data at address increments other than one (which may be a feature of other DMA implementations), this causes us to rearrange each line of pixel data as follows:

 Displayed pixels: 123456......7890
Rearranged pixels: 135...79246...80
                   *       *

Here, the asterisks mark the start of each channel’s data, with each channel only needing to transfer half the usual amount.
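
Such rearrangement can be performed when preparing each line. A sketch of the kind of routine involved, in my own formulation and assuming an even number of pixels per line:

    /* Rearrange a line so that pixels in odd-numbered positions occupy
       the first half and pixels in even-numbered positions the second
       half, letting each DMA channel step through contiguous data. */
    void rearrange_line(const unsigned char *pixels,
                        unsigned char *rearranged, int width)
    {
        int i;

        for (i = 0; i < width / 2; i++)
        {
            rearranged[i] = pixels[2 * i];                  /* channel #1 */
            rearranged[width / 2 + i] = pixels[2 * i + 1];  /* channel #2 */
        }
    }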

Pixel Output Using Timed Dual-Channel Transfers

The architecture involved in employing two pixel data channels with timed transfers

The documentation does, in fact, mention that where multiple channels are active with the same priority, each one is given control in turn, with the controller cycling through them repeatedly. The order in which they are chosen, which is important for us, seems to be dependent on various factors, only some of which I can claim to understand. For instance, I suspect that if the second channel refers to data that appears before the first channel’s data in memory, it gets scheduled first when both channels are activated. Although this is not a significant concern when just trying to produce a stable picture, it does limit more advanced operations such as horizontal scrolling.

A picture of the display output from timed, dual-channel DMA transfers

As you can see, trying this technique out with timed transfers actually made a difference. Instead of only managing something approaching 80 pixels across the screen, more than 90 can be accommodated. Meanwhile, experiments with transfers going as fast as possible seemed to make no real difference, and the fourth pixel in each group was still wider than the others. Still, making the timed transfer mode more usable is a small victory worth having, I suppose.

Parallel Mode Revisited

At the start of my interest in this project, I had it in my mind that I would couple DMA transfers with the parallel mode (or Parallel Master Port) functionality in order to generate a VGA signal. Certain aspects of this, particularly gaps between pixels, made me less than enthusiastic about the approach. However, in considering what might be done to the output signal in other situations, I had contemplated the use of a flip-flop to hold output stable according to a regular tempo, rather like what I managed to achieve, almost inadvertently, when introducing a transfer timer. Until recently, I had failed to apply this idea to where it made most sense: in regulating the parallel mode signal.

Since parallel mode is really intended for driving memory devices and display controllers, various control signals are exposed via pins that can tell these external devices that data is available for their consumption. For our purposes, a flip-flop is just like a memory device: it retains the input values sampled by its input pins, and then exposes these values on its output pins when the inputs are “clocked” into memory using a “clock pulse” signal. The parallel mode peripheral in the microcontroller offers various different signals for such clock and selection pulse purposes.

VGA Output Circuit (Parallel Mode)

The parallel mode circuit showing connections relevant to VGA output (generic connections are not shown)
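
For orientation, initialising the peripheral for this kind of output might look something like the following sketch, with register names from the PIC32MX documentation; the particular mode and strobe settings are my assumptions rather than a recipe:

    /* A sketch of Parallel Master Port initialisation for emitting pixel
       values with an accompanying PMWR strobe; illustrative values, not
       those of the actual project. */
    void configure_parallel_mode(void)
    {
        PMCON = 0;          /* start from a clean configuration */
        PMMODE = 2 << 8;    /* master mode with separate read/write strobes */
        PMAEN = 0;          /* no address lines needed for this purpose */
        PMCON = 0x8300;     /* ON, with read and write strobes enabled */

        /* Each write to PMDIN then drives the data pins and pulses PMWR,
           clocking the value into the external flip-flop. */
    }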

Employing the PMWR (parallel mode write) signal as the clock pulse, directing the display signals to the flip-flop’s inputs, and routing the flip-flop’s outputs to the VGA circuit solved the pixel gap problem at a stroke. Unfortunately, it merely reminded us that the wide pixel problem affects parallel mode output, too. Although the clock pulse is able to tell an external component about the availability of a new pixel value, it is up to the external component to regulate the appearance of each pixel. A memory device does not care about the timing of incoming data as long as it knows when such data has arrived, and so such regulation is beyond the capabilities of a flip-flop.

It was observed, however, that since each group of pixels is generated at a regular frequency, the PMWR signalling frequency might be reduced by being scaled by a constant factor. This might allow some pixel data to linger slightly longer in the flip-flop and be slightly stretched. By the time the fourth pixel in a group arrives, the time allocated to that pixel would be the same as those preceding it, thus producing consistently-sized pixels. I imagine that a factor of 8/9 might do the trick, but I haven’t considered what modification to the circuit might be needed or whether it would all be too complicated.

Recognising the Obvious

When people experiment with video signals from microcontrollers, one tends to see code written to run as efficiently as absolutely possible – using assembly language if necessary – to generate the video signal. It may only be those of us using microcontrollers with DMA peripherals who want to try and get the DMA hardware to do the heavy lifting. Indeed, those of us with exposure to display peripherals provided by system-on-a-chip solutions feel almost obliged to do things this way.

But recent discussions with one of my correspondents made me reconsider whether an adequate solution might be achieved by just getting the CPU to do the work of transferring pixel data to the display. Previously, another correspondent had indicated that this was potentially tricky, and that getting the timings right was more difficult than letting the hardware synchronise the different mechanisms – timer and DMA – all by itself. By involving the CPU and making it run code, the display line timer would need to generate an interrupt that would be handled, causing the CPU to start running a loop to copy data from the framebuffer to the output port.

Pixel Output Using CPU-Driven Transfers

An overview of the architecture with the CPU driving transfers of pixel data

This approach puts us at the mercy of how the CPU handles and dispatches interrupts. Being somewhat conservative about the mechanisms more generally available on various MIPS-based products, I tend to choose a single interrupt vector and then test for the different conditions. Since we need as little variation as possible in the latency between a timer event occurring and the pixel data being generated, I test for that particular event before even considering anything else. Then, a loop of the following form is performed:

    for (current = line_data; current < end; current++)
        *output_port = *current;

Here, the line data is copied byte by byte to the output port. Some adornments are necessary to persuade the compiler to generate code that writes the data efficiently and in order, but there is nothing particularly exotic required and GCC does a decent job of doing what we want. After the loop, a black/reset pixel is generated to set the appropriate output level.
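
To give an impression of what those adornments amount to, the following sketch shows the general shape of the handler; the volatile qualifier is the essential ingredient, while OUTPUT_PORT_ADDRESS, LINE_WIDTH and the helper functions are placeholders of mine rather than the actual code:

    #define OUTPUT_PORT_ADDRESS 0xbf886430UL    /* hypothetical address */
    #define LINE_WIDTH 160

    static unsigned char line_data[LINE_WIDTH];

    /* The volatile qualifier obliges the compiler to perform every write,
       in order, instead of optimising the loop away. */
    static volatile unsigned char * const output_port =
        (volatile unsigned char *) OUTPUT_PORT_ADDRESS;

    /* Placeholder declarations: the actual tests would consult the
       relevant interrupt flag registers. */
    int line_timer_event(void);
    void clear_line_timer_event(void);

    void irq(void)
    {
        const unsigned char *current;

        /* Test for the display line timer condition before anything else,
           keeping the latency of pixel generation as consistent as
           possible. */
        if (line_timer_event())
        {
            for (current = line_data; current < line_data + LINE_WIDTH;
                 current++)
                *output_port = *current;        /* one pixel per write */

            *output_port = 0;                   /* the black/reset pixel */
            clear_line_timer_event();           /* acknowledge the event */
        }

        /* Other, less timing-critical conditions can be tested here. */
    }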

One concern that one might have about using the CPU for such long transfers in an interrupt handler is that it ties up the CPU, preventing it from doing other things, and it also prevents other interrupt requests from being serviced. In a system performing a limited range of activities, this can be acceptable: there may be little else going on apart from updating the display and running programs that access the display; even when other activities are incorporated, they may accommodate being relegated to a secondary status, or they may instead take priority over the display in a way that may produce picture distortion but only very occasionally.

Many of these considerations applied to systems of an earlier era. Thinking back to computers like the Acorn Electron – a 6502-based system that formed the basis of my first sustained experiences with computing – it employs a display controller that demands access to the computer’s RAM for a certain amount of the time dedicated to each video frame. The CPU is often slowed down or even paused during periods of this display controller’s activity, making programs slower than they otherwise would be, and making some kinds of input and output slightly less reliable under certain circumstances. Nevertheless, with certain kinds of additional hardware, the possibility is present for such hardware to interrupt the CPU and to override the display controller, which would then produce “snow” or noise on the screen as a consequence of this high-priority interruption.

Such issues cause us to consider the role of the DMA controller in our modern experiment. We might well worry about loading the CPU with lots of work, preventing it from doing other things, but what if the DMA controller dominates the system in such a way that it effectively prevents the CPU from doing anything productive anyway? This would be rather similar to what happens with the Electron and its display controller.

So, evaluating a CPU-driven solution seems to be worthwhile just to see whether it produces an acceptable picture and whether it causes unacceptable performance degradation. My recent correspondence also brought up the assertion that the RAM and flash memory provided by PIC32 microcontrollers can be accessed concurrently. This would actually mitigate contention between DMA and any programs running from flash memory, at least until the point that accesses to RAM needed to be performed by those programs, meaning that we might expect some loss of program performance by shifting the transfer burden to the CPU.

(Again, I am reminded of the Electron whose ROM could be accessed at full speed but whose RAM could only be accessed at half speed by the CPU but at full speed by the display controller. This might have been exploited by software running from ROM, or by a special kind of RAM installed and made available at the right place in memory, but the 6502 favours those zero-page instructions mentioned earlier, forcing RAM access and thus contention with the display controller. There were upgrades to mitigate this by providing some dedicated memory for zero page, but all of this is really another story for another time.)

Ultimately, having accepted that the compiler would produce good-enough code and that I didn’t need to try more exotic things with assembly language, I managed to produce a stable picture.

A picture of the display output from CPU-driven pixel data transfers

Maybe I should have taken this obvious path from the very beginning. However, the presence of DMA support would have eventually caused me to investigate its viability for this application, anyway. And it should be said that the performance differences between the CPU-based approach and the DMA-based approaches might be significant enough to argue in favour of the use of DMA for some purposes.

Observations and Conclusions

What started out as a quick review of my earlier work turned out to be a more thorough study of different techniques and approaches. I managed to get timed transfers functioning, revisited parallel mode and made it work in a fairly acceptable way, and I discovered some optimisations that help to make certain combinations of techniques more usable. But what ultimately matters is which approaches can actually be used to produce a picture on a screen while programs are being run at the same time.

To give the CPU more to do, I decided to implement some graphical operations, mostly copying data to a framebuffer for its eventual transfer as pixels to the display. The idea was to engage the CPU in actual work whilst also exercising access to RAM. If significant contention between the CPU and DMA controller were to occur, the effects would presumably be visible on the screen, potentially making the chosen configuration unusable.

Although some approaches seem promising on paper, and they may even produce promising results when the CPU is doing little more than looping and decrementing a register to introduce a delay, these approaches may produce less than promising results under load. The picture may start to ripple and stretch, and under “real world” load, the picture may seem noisy and like a badly-tuned television (for those who remember the old days of analogue broadcast signals).

Two approaches seem to remain robust, however: the use of timed DMA transfers, and the use of the CPU to do all the transfer work. The former is limited in terms of resolution and introduces complexity around the construction of images in the framebuffer, at least if optimised as much as possible, but it seems to allow more work to occur alongside the update of the display, and the reduction in resolution also frees up RAM for other purposes for those applications that need it. Meanwhile, the latter provides the resolution we originally sought and offers a straightforward framebuffer arrangement, but it demands more effort from the CPU, slowing things down to the extent that animation practically demands double buffering and thus the allocation of even more RAM for display purposes.

But both of these seemingly viable approaches produce consistent pixel widths, which is something of a happy outcome given all the effort to try and fix that particular problem. One can envisage accommodating them both within a system given that various fundamental system properties (how fast the system and peripheral clocks are running, for example) are shared between the two approaches. Again, this is reminiscent of microcomputers where a range of display modes allowed developers and users to choose the trade-off appropriate for them.

A demonstration of text plotting at a resolution of 160x128

Having investigated techniques like hardware scrolling and sprite plotting, it is tempting to keep developing software to demonstrate the techniques described in this article. I am even tempted to design a printed circuit board to tidy up my rather cumbersome breadboard arrangement. And perhaps the library code I have written can be used as the basis for other projects.

It is remarkable that a home-made microcontroller-based solution can be versatile enough to demonstrate aspects of simple computer systems, possibly even making it relevant for those wishing to teach or to learn about such things, particularly since all the components can be connected together relatively easily, with only some magic happening in the microcontroller itself. And with such potential, maybe this seemingly pointless project might have some meaning and value after all!

Update

Although I can’t embed video files of any size here, I have made a “standard definition” video available to demonstrate scrolling and sprites. I hope it is entertaining and also somewhat illustrative of the kind of thing these techniques can achieve.

2017 in Review

Thursday, December 7th, 2017

On Planet Debian there seem to be quite a few regularly-posted articles summarising the work done by various people in Free Software over the month that has most recently passed. I thought it might be useful, personally at least, to review the different things I have been doing over the past year. The difference between this article and many of those others is that the work I describe is not commissioned or generally requested by others, instead relying mainly on my own motivation for it to happen. The rate of progress can vary somewhat as a result.

Learning KiCad

Over the years, I have been playing around with Arduino boards, sensors, displays and things of a similar nature. Although I try to avoid buying more things to play with, sometimes I manage to acquire interesting items regardless, and these aren’t always ready to use with the hardware I have. Last December, I decided to buy a selection of electronics-related items for interfacing and experimentation. Some of these items have yet to be deployed, but others were bought with the firm intention of putting different “spare” pieces of hardware to use, or at least to make them usable in future.

One thing that sits in this category of spare, potentially-usable hardware is a display circuit board that was once part of a desk telephone, featuring a two-line, bitmapped character display, driven by the Hitachi HD44780 LCD controller. It turns out that this hardware is so common and mundane that the Arduino libraries already support it, but the problem for me was being able to interface it to the Arduino. The display board uses a cable with a connector that needs a special kind of socket, and so some research is needed to discover the kind of socket needed and how this might be mounted on something else to break the connections out for use with the Arduino.

Fortunately, someone else had done all this research quite some time ago. They had even designed a breakout board to hold such a socket, making it available via the OSH Park board fabricating service. So, to make good on my plan, I ordered the mandatory minimum of three boards, also ordering some connectors from Mouser. When all of these different things arrived, I soldered the socket to the board along with some headers, wired up a circuit, wrote a program to use the LiquidCrystal Arduino library, and to my surprise it more or less worked straight away.

Breakout board for the Molex 52030 connector

Hitachi HD44780 LCD display boards driven by an Arduino

This satisfying experience led me to consider other boards that I might design and get made. Previously, I had only made a board for the Arduino using Fritzing and the Fritzing Fab service, and I had held off looking at other board design solutions, but this experience now encouraged me to look again. After some evaluation of the gEDA tools, I decided that I might as well give KiCad a try, given that it seems to be popular in certain “open source hardware” circles. And after a fair amount of effort familiarising myself with it, with a degree of frustration finding out how to do certain things (and also finding up-to-date documentation), I managed to design my own rather simple board: a breakout board for the Acorn Electron cartridge connector.

Acorn Electron cartridge breakout board (in 3D-printed case section)

In the back of my mind, I have vague plans to do other boards in future, but doing this kind of work can soak up a lot of time and be rather frustrating: you almost have to get into some modified mental state to work efficiently in KiCad. And it isn’t as if I don’t have other things to do. But at least I now know something about what this kind of work involves.

Retro and Embedded Hardware

With the above breakout board in hand, a series of experiments were conducted to see if I could interface various circuits to the Acorn Electron microcomputer. These mostly involved 7400-series logic chips (ICs, integrated circuits) and featured various logic gates and counters. Previously, I had re-purposed an existing ROM cartridge design to break out signals from the computer and make it access a single flash memory chip instead of two ROM chips.

With a dedicated prototyping solution, I was able to explore the implementation of that existing board, determine various aspects of the signal timings that remained rather unclear (despite being successfully handled by the existing board’s logic), and make it possible to consider a dedicated board for a flash memory cartridge. In fact, my brother, David, also wanting to get into board design, later adapted the prototyping cartridge to make such a board.

But this experimentation also encouraged me to tackle some other items in the electronics shipment: the PIC32 microcontrollers that I had acquired because they were MIPS-based chips with somewhat more built-in RAM than the Atmel AVR-based chips used by the average Arduino, and because they could also be used on a breadboard. I hoped that my familiarity with the SoC (system-on-a-chip) in the Ben NanoNote – the Ingenic JZ4720 – might confer some benefits when writing low-level code for the PIC32.

PIC32 on breadboard with Arduino programming circuit (and some LEDs for diagnostic purposes)

I do not need to reproduce an account of my activities here, given that I wrote about the effort involved in getting started with the PIC32 earlier in the year, and subsequently described an unusual application of such a microcontroller that seemed to complement my retrocomputing interests. I have since tried to make that particular piece of work more robust, but deducing the actual behaviour of the hardware has been frustrating, the documentation can be vague when it needs to be accurate, and much of the community discussion is focused on proprietary products and specific software tools rather than techniques. Maybe this will finally push me towards investigating programmable logic solutions in the future.

Compiling a Python-like Language

As things actually happened, the above hardware activities were distractions from something I have been working on for a long time. But at this point in the article, this can be a diversion from all the things that seem to involve hardware or low-level software development. Many years ago, I started writing software in Python. Over the years since, alternative implementations of the Python language (the main implementation being CPython) have emerged and seen some use, some continuing to be developed to this day. But around fifteen years ago, it became a bit more common for people to consider whether Python could be compiled to something that runs more efficiently (and more quickly).

I followed some of these projects enthusiastically for a while. Starkiller promised compilation to C++ but never delivered any code for public consumption, although the associated academic thesis might have prompted the development of Shed Skin which does compile a particular style of Python program to C++ and is available as Free Software. Meanwhile, PyPy elevated to prominence the notion of writing a language and runtime library implementation in the language itself, previously seen with language technologies like Slang, used to implement Squeak/Smalltalk.

Although other projects have also emerged and evolved to attempt the compilation of Python to lower-level languages (Pyrex, Cython, Nuitka, and so on), my interests have largely focused on the analysis of programs so that we may learn about their structure and behaviour before we attempt to run them, this alongside any benefits that might be had in compiling them to something potentially faster to execute. But my interests have also broadened to consider the evolution of the Python language since the point fifteen years ago when I first started to think about the analysis and compilation of Python. The near-mythical Python 3000 became a real thing in the form of the Python 3 development branch, introducing incompatibilities with Python 2 and fragmenting the community writing software in Python.

With the risk of perfectly usable software becoming neglected, its use actively (and destructively) discouraged, it becomes relevant to consider how one might take control of one’s software tools for long-term stability, where tools might be good for decades of use instead of constantly changing their behaviour and obliging their users to constantly change their software. I expressed some of my thoughts about this earlier in the year having finally reached a point where I might be able to reflect on the matter.

So, the result of a great deal of work, informed by experiences and conversations over the years related to previous projects of my own and those of others, is a language and toolchain called Lichen. This language resembles Python in many ways but does not try to be a Python implementation. The toolchain compiles programs to C which can then be compiled and executed like “normal” binaries. Programs can be trivially cross-compiled by any available C cross-compilers, too, which is something that always seems to be a struggle elsewhere in the software world. Unlike other Python compilers or implementations, it does not use CPython’s libraries, nor does it generate in “longhand” the work done by the CPython virtual machine.

One might wonder why anyone should bother developing such a toolchain given its incompatibility with Python and a potential lack of any other compelling reason for people to switch. Given that I had to accept some necessary reductions in the original scope of the project and to limit my level of ambition just to feel remotely capable of making something work, one does need to ask whether the result is too compromised to be attractive to others. At one point, programs manipulating integers were slower when compiled than when they were run by CPython, and this was incredibly disheartening to see, but upon further investigation I noticed that CPython effectively special-cases integer operations. The design of my implementation permitted me to represent integers as tagged references – a classic trick of various language implementations – and this overturned the disadvantage.
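
For illustration, the classic trick works something like the following generic sketch, which is not Lichen’s actual representation:

    #include <stdio.h>

    /* References to objects are aligned, so the lowest bit is free to
       distinguish a genuine reference (0) from an integer stored
       directly in the word (1). */
    typedef unsigned long ref_t;

    #define is_tagged_int(r)  ((r) & 1)
    #define tag_int(i)        ((ref_t) (((i) << 1) | 1))
    #define untag_int(r)      ((long) (r) >> 1)

    /* Add two integers without allocating an object or following a
       pointer. */
    static ref_t add(ref_t a, ref_t b)
    {
        if (is_tagged_int(a) && is_tagged_int(b))
            return tag_int(untag_int(a) + untag_int(b));

        /* Otherwise, fall back to the general object protocol... */
        return 0;
    }

    int main(void)
    {
        printf("%ld\n", untag_int(add(tag_int(20), tag_int(22)))); /* 42 */
        return 0;
    }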

For me, just having the possibility of exploring alternative design decisions is interesting. Python’s design is largely done by consensus, with pronouncements made to settle disagreements and to move the process forward. Although this may have served the language well, depending on one’s perspective, it has also meant that certain paths of exploration have not been followed. Certain things have been improved gradually but not radically due to backwards compatibility considerations, this despite the break in compatibility between the Python 2 and 3 branches where an opportunity was undoubtedly lost to do greater things. Lichen is an attempt to explore those other paths without having to constantly justify it to a group of people who may regard such exploration as hostile to their own interests.

Lichen is not really complete: it needs floating point number and other useful types; its library is minimal; it could be made more robust; it could be made more powerful. But I find myself surprised that it works at all. Maybe I should have more confidence in myself, especially given all the preparation I did in trying to understand the good and bad aspects of my previous efforts before getting started on this one.

Developing for MIPS-based Platforms

A couple of years ago I found myself wondering if I couldn’t write some low-level software for the Ben NanoNote. One source of inspiration for doing this was “The CI20 bare-metal project”: a series of blog articles discussing the challenges of booting the MIPS Creator CI20 single-board computer. The Ben and the CI20 use CPUs (or SoCs) from the same family: the Ingenic JZ4720 and JZ4780 respectively.

For the Ben, I looked at the different boot payloads, principally those written to support booting from a USB host, but also the version of U-Boot deployed on the Ben. I combined elements of these things with the framebuffer driver code from the Linux kernel supporting the Ben, and to my surprise I was able to get the device to boot up and show a pattern on the screen. Progress has not always been steady, though.

For a while, I struggled to make the CPU leave its initial exception state without hanging, and with the screen as my only debugging tool, it was hard to see what might have been going wrong. Some careful study of the code revealed the problem: the code I was using to write to the framebuffer was using the wrong address region, meaning that as soon as an attempt was made to update the contents of the screen, the CPU would detect a bad memory access and an exception would occur. Such exceptions will not be delivered in the initial exception state, but with that state cleared, the CPU will happily trigger a new exception when the program accesses memory it shouldn’t be touching.
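
For those unfamiliar with the MIPS memory map, the fixed kernel segments are the usual way of reaching physical memory in this kind of code, and mixing them up is an easy mistake to make. A generic illustration follows, with a made-up framebuffer address rather than the Ben’s actual one:

    /* MIPS32 fixed mappings: the same physical address is reachable
       cached via KSEG0 or uncached via KSEG1, whereas addresses outside
       the mapped segments provoke exceptions when no TLB entries cover
       them. */
    #define KSEG0(paddr) ((paddr) | 0x80000000UL)   /* cached access */
    #define KSEG1(paddr) ((paddr) | 0xa0000000UL)   /* uncached access */

    #define FB_PHYSICAL 0x02000000UL    /* hypothetical physical address */

    /* Uncached access is the simple choice for early experiments,
       avoiding cache maintenance concerns. */
    static volatile unsigned int *framebuffer =
        (volatile unsigned int *) KSEG1(FB_PHYSICAL);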

Debugging low-level code on the Ben NanoNote (the hard way)

I have since plodded along introducing user mode functionality, some page table initialisation, trying to read keypresses, eventually succeeding after retracing my steps and discovering my errors along the way. Maybe this will become a genuinely useful piece of software one day.

But one useful purpose this exercise has served is that of familiarising myself with the way these SoCs are organised, the facilities they provide, how these may be accessed, and so on. My brother has the Letux 400 notebook containing yet another SoC in the same family, the JZ4730, which seems to be almost entirely undocumented. This notebook has proven useful under certain circumstances. For instance, it has been used as a kind of appliance for document scanning, driving a multifunction scanner/printer over USB using the enduring SANE project’s software.

However, the Letux 400 is already an old machine, with products based on this hardware platform being almost ten years old, and when originally shipped it used a 2.4 series Linux kernel instead of a more recent 2.6 series kernel. Like many products whose software is shipped as “finished”, this makes the adoption of newer software very difficult, especially if the kernel code is not “upstreamed” or incorporated into the official Linux releases.

As software distributions such as Debian evolve, they depend on newer kernel features, but if a device is stuck on an older kernel (because the special functionality that makes it work on that device is specific to that kernel) then the device, unable to run the newer kernels, gradually becomes unable to run newer versions of the distribution as well. Thus, Debian Etch was the newest distribution version that would work on the 2.4 kernel used by the Letux 400 as shipped.

Fortunately, work had been done to make a 2.6 series kernel work on the Letux 400, and this made Debian Lenny functional. But time passes and even this is now considered ancient. Although David was running some software successfully, there was other software that really needed a newer distribution to be able to run, and this meant considering what it might take to support Debian Squeeze on the hardware. So he set to work adding patches to the 2.6.24 kernel to try and take it within the realm of Squeeze support, taking it beyond the bare minimum of 2.6.29 and into the “release candidate” territory of 2.6.30. And this was indeed enough to run Squeeze on the notebook, at least supporting the devices needed to make the exercise worthwhile.

Now, at a much earlier stage in my own experiments with the Ben NanoNote, I had tried without success to reproduce my results on the Letux 400. And I had also made a rather tentative effort at modifying Ben NanoNote kernel drivers to potentially work with the Letux 400 from some 3.x kernel version. David’s success in updating the kernel version led me to look again at the tasks of familiarising myself with kernel drivers, machine details and of supporting the Letux 400 in even newer kernels.

The outcome of this is uncertain at present. Most of the work on updating the drivers and board support has been done, but actual testing of my work still needs to be done, something that I cannot really do myself. That might seem strange: why start something I cannot finish by myself? But how I got started in this effort is also rather related to the topic of the next section.

The MIPS Creator CI20 and L4/Fiasco.OC

Low-level programming on the Ben NanoNote is frustrating unless you modify the device and solder the UART connections to the exposed pads in the battery compartment, thereby enabling a serial connection and allowing debugging information to be sent to a remote display for perusal. My soldering skills are not that great, and I don’t want to damage my device. So debugging was a frustrating exercise. Since I felt that I needed a bit more experience with the MIPS architecture and the Ingenic SoCs, it occurred to me that getting a CI20 might be the way to go.

I am not really a supporter of Imagination Technologies, producer of the CI20, due to the company’s rather hostile attitude towards Free Software around their PowerVR technologies, meaning that of the different graphics acceleration chipsets, PowerVR has been increasingly isolated as a technology that is consistently unsupportable by Free Software drivers. However, the CI20 is well-documented and has been properly supported with Free Software, apart from the PowerVR parts of the hardware, of course. Ingenic were seemingly persuaded to make the programming manual for the JZ4780 used by the CI20 publicly available, unlike the manuals for other SoCs in that family. And the PowerVR hardware is not actually needed to be able to use the CI20.

The MIPS Creator CI20 single-board computer

I had hoped that the EOMA68 campaign would have offered a JZ4775 computer card, and that the campaign might have delivered such a card by now, but with both of these things not having happened I took the plunge and bought a CI20. There were a few other reasons for doing so: I wanted to see how a single-board computer with a decent amount of RAM (1GB) might perform as a working desktop machine; having another computer to offload certain development and testing tasks, rather than run virtual machines, would be useful; I also wanted to experiment with and even attempt to port other operating systems, loosening my dependence on the Linux monoculture.

One of these other operating systems involves two components: the Fiasco.OC microkernel and the L4 Runtime Environment (L4Re). Over the years, microkernels in the L4 family have seen widespread use, and at one point people considered porting GNU Hurd to one of the L4 family microkernels from the Mach microkernel it then used (and still uses). It seems to me like something worth looking at more closely, and fortunately it also seemed that this software combination had been ported to the CI20. However, it turned out that my expectations of building an image, testing the result, and then moving on to developing interesting software were a little premature.

The first real problem was that GCC produced position-independent code that was not called correctly. This meant that upon trying to get the addresses of functions, the program would end up loading garbage addresses and trying to call any code that might be there at those addresses. So some fixes were required. Then, it appeared that the JZ4780 doesn’t support a particular MIPS instruction, meaning that the CPU would encounter this instruction and cause an exception. So, with some guidance, I wrote a handler to decode the instruction and generate the rather trivial result that the instruction should produce. There were also some more generic problems with the microkernel code that had previously been patched but which had not appeared in the upstream repository. But in the end, I got the “hello” program to run.

With a working foundation I tried to explore the hardware just as I had done with the Ben NanoNote, attempting to understand things like the clock and power management hardware, general purpose input/output (GPIO) peripherals, and also the Inter-Integrated Circuit (I2C) peripherals. Some assistance was available in the form of Linux kernel driver code, although the style of code can vary significantly, and it also takes time to “decode” various mechanisms in the Linux code and to unpick the useful bits related to the hardware. I had hoped to get further, but in trying to use the I2C peripherals to talk to my monitor using the DDC protocol, I found that the data being returned was not entirely reliable. This was arguably a distraction from the more interesting task of enabling the display, given that I know what resolutions my monitor supports.

However, all this hardware-related research and detective work at least gave me an insight into mechanisms – software and hardware – that would inform the effort to “decode” the vendor-written code for the Letux 400, making certain things seem a lot more familiar and increasing my confidence that I might be understanding the things I was seeing. For example, the JZ4720 in the Ben NanoNote arranges its hardware registers for GPIO configuration and access in a particular way, but the code written by the vendor for the JZ4730 in the Letux 400 accesses GPIO registers in a different way.

Initially, I might have thought that I was missing some important detail: are the two products really so different, and if not, then why is the code so different? But then, looking at the JZ4780, I encountered another scheme for GPIO register organisation that is different again, but which does have similarities to the JZ4730. With the JZ4780 being publicly documented, the code for the Letux 400 no longer seemed quite so bizarre or unfathomable. With more experience, it is possible to have a little more confidence in one’s understanding of the mechanisms at work.

I would like to spend a bit more time looking at microkernels and alternatives to Linux. While many people presumably think that Linux is running on everything and has “won”, it is increasingly likely that the Linux one sees on devices does not completely control the hardware and is, in fact, virtualised or confined by software systems like L4/Fiasco.OC. I also have reservations about the way Linux is developed and how well it is able to handle the demands of its proliferation onto every kind of device, many of them hooked up to the Internet and being left to fend for themselves.

Developing imip-agent

Alongside Lichen, a project that has been under development for the last couple of years has been imip-agent, allowing calendar-based scheduling activities to be integrated with mail transport agents. I haven’t been able to spend quite as much time on imip-agent this year as I might have liked, although I will also admit that I haven’t always been motivated to spend much time on it, either. Still, there have been brief periods of activity tidying up, fixing, or improving the code. And some interest in packaging the software led me to reconsider some of the techniques used to deploy the software, in particular the way scheduling extensions are discovered, and the way the system configuration is processed (since Debian does not want “executable scripts” in places like /etc, even if those scripts just contain some simple configuration setting definitions).

It is perhaps fairly typical that a project that tries to assess the feasibility of a concept accumulates just enough functionality to demonstrate that it could do a particular task. After such an initial demonstration, the effort of making the code easier to work with, more reliable and more extensible must follow if further progress is to be made. One intervention that kept imip-agent viable as a project was the introduction of a test suite to ensure that the basic functionality did indeed work. There were other architectural details that I felt needed remedying or improving for the code to remain manageable.

Recently, I have been refining the parts of the code that support editing of calendar objects and the exchange of updates caused by changes to calendar events. Such work is intended to make the Web client easier to understand and to expose such functionality to proper testing. One side-effect of this may be the introduction of a text-based client for people using e-mail programs like Mutt, as well as a potentially usable library for other mail clients. Such tidying up and fixing does not show off fancy new features or argue the case for developing such software in the first place, but I suppose it makes me feel better about the software I have written.

Whither Moin?

There are probably plenty of other little projects of my own that I have started or at least contemplated this year. And there are also projects that are not mine but which I use and which have had contributions from me over the years. One of these is the MoinMoin wiki software that powers a number of Free Software and other Web sites where collaborative editing is made available to the communities involved. I use MoinMoin – or Moin for short – to publish content on the Web myself, and I have encouraged others to use it in the past. However, it worries me now that the level of maintenance it is receiving has fallen to a level where updates for faults in the software are not likely to be forthcoming and where it is no longer clear where such updates should be coming from.

Earlier in the year, having previously read queries about the static export output from Moin, which can be rather basic and does not necessarily resemble the appearance of the wiki from which it came, I spent some time considering my own use of Moin for documentation publishing. For some of my projects, I don’t take advantage of the “through the Web” editing of the solution when publishing the public documentation. Instead, I use Moin locally, store the pages in a separate repository, and then make page packages that get installed on a public instance of Moin. This means that I do not have to worry about Web-based authentication and can just have a wiki as a read-only resource.

Obviously, the parts of Moin that I really need here are just the things that parse the wiki formatting (which I regard as more usable than other document markup formats in various respects) and that format the content as HTML. If I could format it as static content with some pages, some stylesheets, some images, with some Web server magic to make the URLs look nice, then that would probably be sufficient. For some things like the automatic generation of SVG from Graphviz-format files, I would also need to have the relevant parsers available, too. Having a complete Web framework, which is what Moin really is, is rather unnecessary with these diminished requirements.

But I do use Moin as a full wiki solution as well, and so it made me wonder whether I shouldn’t try and bring it up to date. Of course, there is already the MoinMoin 2.0 effort that was intended to modernise and tidy up the software, but since this effort made a clean break from Moin 1.x, it was never an attractive choice for those people already using Moin in anything more than a basic sense. Since there wasn’t an established API for extensions, it was not readily usable for many existing sites that rely on such extensions. In a way, Moin 2 has suffered from something that Python 3 only avoided by having a lot more people working on it, including people being paid to work on it, together with a policy of openly shaming those people who had made Python 2 viable – by developing software for it – into spending time migrating their code to Python 3.

I don’t have an obvious plan of action here. Moin perhaps illustrates the fundamental problem facing many Free Software projects, this being a theme that I have discussed regularly this year: how they may remain viable by having people able to dedicate their time to writing and maintaining Free Software without this work being squeezed in around the edges of people’s “actual work” and thus burdening them with yet another obligation in their lives, particularly one that is not rewarded by a proper appreciation of the sacrifice being made.

Plenty of individuals and organisations benefit from Moin, but we live in an age of “comparison shopping” where people will gladly drop one thing if someone offers them something newer and shinier. This is, after all, how everyone ends up using “free” services where the actual costs are hidden. To their credit, when Moin needed to improve its password management, the Python Software Foundation stepped up and funded this work rather than dropping Moin, which is what I had expected given certain Python community attitudes. Maybe other, more well-known organisations that use Moin also support its development, but I don’t really see much evidence of it.

Maybe they should consider doing so. The notion that something else will always come along, developed by some enthusiastic developer “scratching their itch”, is misguided and exploitative. And a failure to sustain Free Software development can only undermine Free Software as a resource, as an activity or a cause, and as the basis of many of those organisations’ continued existence. Many of us like developing Free Software, as I hope this article has shown, but motivation alone does not keep that software coming forever.

VGA Signal Generation with the PIC32

Monday, May 22nd, 2017

It all started after I had designed – and received from fabrication – a circuit board for prototyping cartridges for the Acorn Electron microcomputer. Although some prototyping had already taken place with an existing cartridge, with pins intended for ROM components being routed to drive other things, this board effectively “breaks out” all connections available to a cartridge that has been inserted into the computer’s Plus 1 expansion unit.

Acorn Electron cartridge breakout board

The Acorn Electron cartridge breakout board being used to drive an external circuit

One thing led to another, and soon my brother, David, was interfacing a microcontroller to the Electron in order to act as a peripheral being driven directly by the system’s bus signals. His approach involved having a program that would run and continuously scan the signals for read and write conditions and then interpret the address signals, sending and receiving data on the bus when appropriate.

Having acquired some PIC32 devices out of curiosity, with the idea of potentially interfacing them with the Electron, I finally took the trouble of looking at the datasheet to see whether some of the hard work done by David’s program might be handled by the peripheral hardware in the PIC32. The presence of something called “Parallel Master Port” was particularly interesting.

Operating this function in the somewhat insensitively-named “slave” mode, the device would be able to act like a memory device, with the signalling required by read and write operations mostly being dealt with by the hardware. Software running on the PIC32 would be able to read and write data through this port and be able to get notifications about new data while getting on with doing other things.

So began my journey into PIC32 experimentation, but this article isn’t about any of that, mostly because I put that particular investigation to one side after a degree of experience gave me perhaps a bit too much confidence, and I ended up being distracted by something far more glamorous: generating a video signal using the PIC32!

The Precedents’ Hall of Fame

There are plenty of people who have written up their experiments generating VGA and other video signals with microcontrollers. Here are some interesting examples:

And there are presumably many more pages on the Web with details of people sending pixel data over a cable to a display of some sort, often trying to squeeze every last cycle out of their microcontroller’s instruction processing unit. But, given an awareness of how microcontrollers should be able to take the burden off the programs running on them, employing peripheral hardware to do the grunt work of switching pins on and off at certain frequencies, maybe it would be useful to find examples of projects where such advantages of microcontrollers had been brought to bear on the problem.

In fact, I was already aware of the Maximite “single chip computer” partly through having seen the cloned version of the original being sold by Olimex – something rather resented by the developer of the Maximite for reasons largely rooted in an unfortunate misunderstanding of Free Software licensing on his part – and I was aware that this computer could generate a VGA signal. Indeed, the method used to achieve this had apparently been written up in a textbook for the PIC32 platform, albeit generating a composite video signal using one of the on-chip SPI peripherals. The Colour Maximite uses three SPI channels to generate one red, one green, and one blue channel of colour information, thus supporting eight-colour graphical output.

But I had been made aware of the Parallel Master Port (PMP) and its “master” mode, used to drive LCD panels with eight bits of colour information per pixel (or, using devices with many more pins than those I had acquired, with sixteen bits of colour information per pixel). Surely it would be possible to generate 256-colour graphical output, at the very least?

Information from people trying to use PMP for this purpose was thin on the ground. Indeed, reading again one article that mentioned an abandoned attempt to get PMP working in this way, using the peripheral to emit pixel data for display on a screen instead of a panel, I now see that it actually mentions an essential component of the solution that I finally arrived at. But the author had unfortunately moved away from that successful component in an attempt to get the data to the display at a rate regarded as satisfactory.

Direct Savings

It is one thing to have the means to output data to be sent over a cable to a display. It is another to actually send the data efficiently from the microcontroller. Having contemplated such issues in the past, it was not a surprise that the Maximite and other video-generating solutions use direct memory access (DMA) to get the hardware, as opposed to programs, to read through memory and to write its contents to a destination, which in most cases seemed to be the memory address holding output data to be emitted via a data pin using the SPI mechanism.

I had also envisaged using DMA and was still fixated on using PMP to emit the different data bits to the output circuit producing the analogue signals for the display. Indeed, Microchip promotes the PMP and DMA combination as a way of doing “low-cost controllerless graphics solutions” involving LCD panels, so I felt that there surely couldn’t be much difference between that and getting an image on my monitor via a few resistors on the breadboard.

And so, a tour of different PIC32 features began: trying to understand the DMA documentation and the PMP documentation, all the while trying to get a grasp of what the VGA signal actually looks like and the timing constraints of the various synchronisation pulses, and battling various aspects of the MIPS architecture and the PIC32 implementation of it, constantly refining my own perceptions and understanding, and learning perhaps too often that there were things I didn’t know quite enough about before trying them out!

Using VGA to Build a Picture

Before we really start to look at a VGA signal, let us first look at how a picture is generated by the signal on a screen:

VGA Picture Structure

The structure of a display image or picture produced by a VGA signal

The most important detail at this point is the central area of the diagram, filled with horizontal lines representing the colour information that builds up a picture on the display, with the actual limits of the screen being represented here by the bold rectangle outline. But it is also important to recognise that even though there are a number of visible “display lines” within which the colour information appears, the entire “frame” sent to the display actually contains yet more lines, even though they will not be used to produce an image.

Above and below – really before and after – the visible display lines are the vertical back and front porches, whose lines are blank: they either do not appear on the screen at all or are used to provide a border at the top and bottom of the screen. Such extra lines contribute to the total frame period and to the total number of lines dividing up the total frame period.

Figuring out how many lines a display will have seems to involve messing around with something called the “generalised timing formula”, and if you have an X server like Xorg installed on your system, you may even have a tool called “gtf” that will attempt to calculate numbers of lines and pixels based on desired screen resolutions and frame rates. Alternatively, you can look up some common sets of figures on sites providing such information.

What a VGA Signal Looks Like

Some sources show diagrams attempting to describe the VGA signal, but many of these diagrams are open to interpretation (in some cases, very much so). They perhaps show the signal for horizontal (display) lines, then other signals for the entire image, but they either do not attempt to combine them, or they instead combine these details ambiguously.

For instance, should the horizontal sync (synchronisation) pulse be produced when the vertical sync pulse is active or during the “blanking” period when no pixel information is being transmitted? This could be deduced from some diagrams but only if you share their authors’ unstated assumptions and do not consider other assertions about the signal structure. Other diagrams do explicitly show the horizontal sync active during vertical sync pulses, but this contradicts statements elsewhere such as “during the vertical sync period the horizontal sync must also be held low”, for instance.

After a lot of experimentation, I found that the following signal structure was compatible with the monitor I use with my computer:

VGA Signal Structure

The basic structure of a VGA signal, or at least a signal that my monitor can recognise

There are three principal components to the signal:

  • Colour information for the pixel or line data forms the image on the display and it is transferred within display lines during what I call the visible display period in every frame
  • The horizontal sync pulse tells the display when each horizontal display line ends, or at least the frequency of the lines being sent
  • The vertical sync pulse tells the display when each frame (or picture) ends, or at least the refresh rate of the picture

The voltage levels appear to be as follows:

  • Colour information should be at 0.7V (although some people seem to think that 1V is acceptable as “the specified peak voltage for a VGA signal”)
  • Sync pulses are supposed to be at “TTL” levels, which apparently can be from 0V to 0.5V for the low state and from 2.7V to 5V for the high state

Meanwhile, the polarity of the sync pulses is also worth noting. In the above diagram, they have negative polarity, meaning that an active pulse is at the low logic level. Some people claim that “modern VGA monitors don’t care about sync polarity”, but since it isn’t clear to me what determines the polarity, and since most descriptions and demonstrations of VGA signal generation seem to use negative polarity, I chose to go with the flow. As far as I can tell, the gtf tool always outputs the same polarity details, whereas certain resources provide signal characteristics with differing polarities.

It is possible, and arguably advisable, to start out trying to generate sync pulses and just grounding the colour outputs until your monitor (or other VGA-capable display) can be persuaded that it is receiving a picture at a certain refresh rate and resolution. Such confirmation can be obtained on a modern display by seeing a blank picture without any “no signal” or “input not supported” messages and by being able to activate the on-screen menu built into the device, in which an option is likely to exist to show the picture details.

How the sync and colour signals are actually produced will be explained later on. This section was merely intended to provide some background and gather some fairly useful details into one place.

Counting Lines and Generating Vertical Sync Pulses

The horizontal and vertical sync pulses are each driven at their own frequency. However, given that there are a fixed number of lines in every frame, it becomes apparent that the frequency of vertical sync pulse occurrences is related to the frequency of horizontal sync pulses, the latter occurring once per line, of course.

With, say, 622 lines forming a frame, the vertical sync will occur once for every 622 horizontal sync pulses, or at a rate that is 1/622 of the horizontal sync frequency or “line rate”. So, if we can find a way of generating the line rate, we can not only generate horizontal sync pulses, but we can also count cycles at this frequency, and every 622 cycles we can produce a vertical sync pulse.

But how do we calculate the line rate in the first place? First, we decide what our refresh rate should be. The “classic” rate for VGA output is 60Hz. Then, we decide how many lines there are in the display including those extra non-visible lines. We multiply the refresh rate by the number of lines to get the line rate:

60Hz * 622 = 37320Hz = 37.320kHz

On a microcontroller, the obvious way to obtain periodic events is to use a timer. Given a particular frequency at which the timer is updated, a quick calculation can be performed to discover how many times a timer needs to be incremented before we need to generate an event. So, let us say that we have a clock frequency of 24MHz and a line rate of 37.320kHz; we calculate the number of timer increments required to produce the latter from the former:

24MHz / 37.320kHz = 24000000Hz / 37320Hz ≈ 643

So, if we set up a timer that counts from zero up to 642 and then, instead of incrementing to 643, wraps around to zero again, with the timer sending a signal when this “wraparound” occurs, we have a mechanism providing a suitable frequency at which we can make things happen. And this includes counting cycles at this particular frequency, meaning that we can increment our own counter by 1 on each wraparound to keep track of display lines. Every 622 display lines, we can initiate a vertical sync pulse.
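
As a rough illustration, a minimal sketch of such a timer set-up for the PIC32MX270, assuming XC32-style register names and the 24MHz peripheral clock mentioned above (this is illustrative rather than the project’s actual code), might look like this:

#include <xc.h>

/* Configure Timer2 to wrap around at approximately the 37.32kHz line
   rate, raising an interrupt on each wraparound. */
void timer_init(void)
{
    T2CON = 0;              /* stop the timer and select a 1:1 prescale */
    TMR2 = 0;               /* count from zero */
    PR2 = 642;              /* wrap around after 643 increments */
    IPC2bits.T2IP = 7;      /* high priority: line events must be prompt */
    IFS0bits.T2IF = 0;      /* clear any pending interrupt */
    IEC0bits.T2IE = 1;      /* enable the wraparound interrupt */
    T2CONbits.ON = 1;       /* start the timer */
}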

One aspect of vertical sync pulses that has not yet been mentioned is their duration. Various sources suggest that they should last for only two display lines, although the “gtf” tool specifies three lines instead. Our line-counting logic therefore needs to know that it should enable the vertical sync pulse by bringing it low at a particular starting line and then disable it by bringing it high again after two whole lines.

Generating Horizontal Sync Pulses

Horizontal sync pulses take place within each display line, have a specific duration, and they must start at the same time relative to the start of each line. Some video output demonstrations seem to use lots of precisely-timed instructions to achieve such things, but we want to use the peripherals of the microcontroller as much as possible to avoid wasting CPU time. Having considered various tricks involving specially formulated data that might be transferred from memory to act as a pulse, I was looking for examples of DMA usage when I found a mention of something called the Output Compare unit on the PIC32.

What the Output Compare (OC) units do is to take a timer as input and produce an output signal dependent on the current value of the timer relative to certain parameters. In clearer terms, you can indicate a timer value at which the OC unit will cause the output to go high, and you can indicate another timer value at which the OC unit will cause the output to go low. It doesn’t take much imagination to realise that this sounds almost perfect for generating the horizontal sync pulse:

  1. We take the timer previously set up, which wraps around every 643 ticks and thus divides the display line period into units of 1/643.
  2. We identify where the pulse should be brought low and present that as the parameter for taking the output low.
  3. We identify where the pulse should be brought high and present that as the parameter for taking the output high.

Upon combining the timer and the OC unit, then configuring the output pin appropriately, we end up with a low pulse occurring at the line rate, but at a suitable offset from the start of each line.
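
For illustration, a sketch of such a configuration follows, again assuming XC32-style names, with HSYNC_START and HSYNC_END being hypothetical pulse boundaries expressed in timer ticks (the exact assignment of OC1R and OC1RS to the rising and falling edges should be checked against the Output Compare section of the reference manual):

/* Configure OC1 to produce the horizontal sync pulse from Timer2. */
void hsync_init(void)
{
    OC1CON = 0;
    OC1CONbits.OCTSEL = 0;      /* use Timer2 as the time base */
    OC1CONbits.OCM = 5;         /* dual compare, continuous output pulses */
    OC1RS = HSYNC_START;        /* tick at which the output is brought low */
    OC1R = HSYNC_END;           /* tick at which the output is brought high */
    OC1CONbits.ON = 1;          /* enable the unit */
}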

VGA Display Line Structure

The structure of each visible display line in the VGA signal

In fact, the OC unit also proves useful in actually generating the vertical sync pulses, too. Although we have a timer that can tell us when it has wrapped around, we really need a mechanism to act upon this signal promptly, at least if we are to generate a clean signal. Unfortunately, handling an interrupt will introduce a delay between the timer wrapping around and the CPU being able to do something about it, and it is not inconceivable that this delay may vary depending on what the CPU has been doing.

So, what seems to be a reasonable solution to this problem is to count the lines and upon seeing that the vertical sync pulse should be initiated at the start of the next line, we can enable another OC unit configured to act as soon as the timer value is zero. Thus, upon wraparound, the OC unit will spring into action and bring the vertical sync output low immediately. Similarly, upon realising that the next line will see the sync pulse brought high again, we can reconfigure the OC unit to do so as soon as the timer value again wraps around to zero.
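
A sketch of this line-counting logic, as it might appear in the timer’s interrupt handler, with VSYNC_START being a hypothetical starting line and the re-arming of the OC unit simplified:

/* On each wraparound, count lines and arm OC2 to change the vertical
   sync output at the exact moment the timer next returns to zero. */
static unsigned int line = 0;

void line_interrupt(void)
{
    line = (line + 1) % 622;

    if (line == VSYNC_START - 1) {
        OC2CON = 0;
        OC2R = 0;               /* match as soon as the timer restarts */
        OC2CONbits.OCM = 2;     /* force the output low on match */
        OC2CONbits.ON = 1;
    } else if (line == VSYNC_START + 1) {
        OC2CON = 0;
        OC2R = 0;
        OC2CONbits.OCM = 1;     /* force the output high on match */
        OC2CONbits.ON = 1;
    }
}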

Inserting the Colour Information

At this point, we can test the basic structure of the signal and see if our monitor likes it. But none of this is very interesting without being able to generate a picture, and so we need a way of getting pixel information from the microcontroller’s memory to its outputs. We previously concluded that Direct Memory Access (DMA) was the way to go in reading the pixel data from what is usually known as a framebuffer, sending it to another place for output.

As previously noted, I thought that the Parallel Master Port (PMP) might be the right peripheral to use. It provides an output register, confusingly called the PMDIN (parallel master data in) register, that lives at a particular address and whose value is exposed on output pins. On the PIC32MX270, only the least significant eight bits of this register are employed in emitting data to the outside world, and so a DMA destination having a one-byte size, located at the address of PMDIN, is chosen.

The source data is the framebuffer, of course. For various retrocomputing reasons hinted at above, I had decided to generate a picture 160 pixels in width, 256 lines in height, with each byte providing eight bits of colour depth (determining how many distinct colours can be encoded for each pixel). This requires 40 kilobytes and can therefore reside in the 64 kilobytes of RAM provided by the PIC32MX270. It was at this point that I learned a few things about the DMA mechanisms of the PIC32 that didn’t seem completely clear from the documentation.
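
For completeness, the memory requirement works out as follows:

160 pixels * 256 lines * 1 byte per pixel = 40960 bytes = 40 kilobytes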

Now, the documentation talks of “transactions”, “cells” and “blocks”, but I don’t think it describes them as clearly as it could do. Each “transaction” is just a transfer of a four-byte word. Each “cell transfer” is a collection of transactions that the DMA mechanism performs in a kind of batch, proceeding with these as quickly as it can until it either finishes the batch or is told to stop the transfer. Each “block transfer” is a collection of cell transfers. But what really matters is that if you want to transfer a certain amount of data and not have to keep telling the DMA mechanism to keep going, you need to choose a cell size that defines this amount. (When describing this, it is hard not to use the term “block” rather than “cell”, and I do wonder why they assigned these terms in this way because it seems counter-intuitive.)

You can perhaps use the following template to phrase your intentions:

I want to transfer <cell size> bytes at a time from a total of <block size> bytes, reading data starting from <source address>, having <source size>, and writing data starting at <destination address>, having <destination size>.

The total number of bytes to be transferred – the block size – is calculated from the source and destination sizes, with the larger chosen to be the block size. If we choose a destination size less than the source size, the transfers will not go beyond the area of memory defined by the specified destination address and the destination size. What actually happens to the “destination pointer” is not immediately obvious from the documentation, but for our purposes, where we will use a destination size of one byte, the DMA mechanism will just keep writing source bytes to the same destination address over and over again. (One might imagine the pointer starting again at the initial start address, or perhaps stopping at the end address instead.)

So, for our purposes, we define a “cell” as 160 bytes, being the amount of data in a single display line, and we only transfer one cell in a block. Thus, the DMA source is 160 bytes long, and even though the destination size is only a single byte, the DMA mechanism will transfer each of the source bytes into the destination. There is a rather unhelpful diagram in the documentation that perhaps tries to communicate too much at once, leading one to believe that the cell size is a factor in how the destination gets populated by source data, but the purpose of the cell size seems only to be to define how much data is transferred at once when a transfer is requested.
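
Expressed as a sketch using XC32-style register names, with KVA_TO_PA from sys/kmem.h converting virtual addresses to the physical addresses that the DMA controller requires, and with framebuffer being a hypothetical array, the configuration described above might look like this:

#include <sys/kmem.h>

/* Configure DMA channel 0 to transfer one 160-byte display line from the
   framebuffer to PORTB whenever the line timer wraps around. */
void dma_init(void)
{
    DMACONbits.ON = 1;                  /* enable the DMA controller */
    DCH0CON = 0;
    DCH0ECONbits.CHSIRQ = _TIMER_2_IRQ; /* start a cell transfer on the timer event */
    DCH0ECONbits.SIRQEN = 1;
    DCH0SSA = KVA_TO_PA(framebuffer);   /* source address: one line of pixel data */
    DCH0SSIZ = 160;                     /* source size: one display line */
    DCH0DSA = KVA_TO_PA(&PORTB);        /* destination address: the output pins */
    DCH0DSIZ = 1;                       /* destination size: one byte */
    DCH0CSIZ = 160;                     /* cell size: one whole line per trigger */
    DCH0CONbits.CHEN = 1;               /* enable the channel */
}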

DMA Transfer Mechanism

The transfer of framebuffer data to PORTB using DMA cell transfers (noting that this hints at the eventual approach which uses PORTB and not PMDIN)

In the matter of requesting a transfer, we have already described the mechanism that will allow us to make this happen: when the timer signals the start of a new line, we can use the wraparound event to initiate a DMA transfer. It would appear that the transfer will happen as fast as both the source and the destination will allow, at least as far as I can tell, and so it is probably unlikely that the data will be sent to the destination too quickly. Once the transfer of a line’s pixel data is complete, we can do some things to set up the transfer for the next line, like changing the source data address to point to the next 160 bytes representing the next display line.

(We could actually set the block size to the length of the entire framebuffer – by setting the source size – and have the DMA mechanism automatically transfer each line in turn, updating its own address for the current line. However, I intend to support hardware scrolling, where the address of the first line of the screen can be adjusted so that the display starts part way through the framebuffer, reaches the end of the framebuffer part way down the screen, and then starts again at the beginning of the framebuffer in order to finish displaying the data at the bottom of the screen. The DMA mechanism doesn’t seem to support the necessary address wraparound required to manage this all by itself.)
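
A sketch of the per-line address update, as it might appear in the channel’s transfer-completion handler, with framebuffer and line_source again being hypothetical names:

/* Advance the source address by one line, wrapping at the end of the
   framebuffer so that the display can start part way through it. */
line_source += 160;
if (line_source >= framebuffer + sizeof(framebuffer))
    line_source = framebuffer;
DCH0SSA = KVA_TO_PA(line_source);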

Output Complications

Having assumed that the PMP peripheral would be an appropriate choice, I soon discovered some problems with the generated output. Although the data that I had stored in the RAM seemed to be emitted as pixels in appropriate colours, there were gaps between the pixels on the screen. Although the documentation seemed to vaguely indicate that the PMDIN register was more or less continuously updated, the actual output signals were only held valid for part of each write cycle, being driven low between each pixel, causing black-level gaps and ruining the result.

I wondered if anything could be done about this issue. PMP is really intended as some kind of memory interface, and it isn’t unreasonable for it to only maintain valid data for certain periods of time, modifying control signals to define this valid data period. That PMP can be used to drive LCD panels is merely a result of those panels themselves upholding this kind of interface. For those of you familiar with microcontrollers, the solution to my problem was probably obvious several paragraphs ago, but it needed me to reconsider my assumptions and requirements before I realised what I should have been doing all along.

Unlike SPI, which concerns itself with the bit-by-bit serial output of data, PMP concerns itself with the multiple-bits-at-once parallel output of data, and all I wanted to do was to present multiple bits to a memory location and have them translated to a collection of separate signals. But, of course, this is exactly how normal I/O (input/output) pins are provided on microcontrollers! They all seem to provide “PORT” registers whose bits correspond to output pins, and if you write a value to those registers, all the pins can be changed simultaneously. (This feature is obscured by platforms like Arduino where functions are offered to manipulate only a single pin at once.)

And so, I changed the DMA destination to be the PORTB register, which on the PIC32MX270 is the only PORT register with enough bits corresponding to I/O pins to be useful for this application. Even then, PORTB does not have a complete mapping from bits to pins: some pins that are available in other devices have been dedicated to specific functions on the PIC32MX270F256B and cannot be used for I/O. So, it turns out that we can only employ at most seven bits of our pixel data in generating signal data:

PORTB Pin Availability on the PIC32MX270F256B

Bit:  15     14     13     12  11     10     9     8     7     6  5     4     3     2     1     0
Pin:  RPB15  RPB14  RPB13  –   RPB11  RPB10  RPB9  RPB8  RPB7  –  RPB5  RPB4  RPB3  RPB2  RPB1  RPB0

We could target the first byte of PORTB (bits 0 to 7) or the second byte (bits 8 to 15), but either way we will encounter an unmapped bit. So, instead of choosing a colour representation making use of eight bits, we have to make do with only seven.

Initially, not noticing that RPB6 was not available, I was using a “RRRGGGBB” or “332” representation. But persuaded by others in a similar predicament, I decided to choose a representation where each colour channel gets two bits, and then a separate intensity bit is used to adjust the final intensity of the basic colour result. This also means that greyscale output is possible because it is possible to balance the channels.
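
As an illustration of one hypothetical encoding (the actual bit assignment depends on how the pins are wired to the output circuit), the three channels could occupy bits 0 to 5 with the intensity bit in bit 7, skipping the unusable bit 6:

/* Pack an intensity bit and three 2-bit channels into one pixel byte,
   avoiding bit 6, which has no corresponding RPB6 pin on this device. */
#define PIXEL(i, r, g, b) (((i) << 7) | ((r) << 4) | ((g) << 2) | (b))

unsigned char bright_red = PIXEL(1, 3, 0, 0);   /* full red plus intensity */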

The 2-bit-per-channel plus intensity colours

The colours employing two bits per channel plus one intensity bit, perhaps not shown completely accurately due to circuit inadequacies and the usual white balance issues when taking photographs

It is worth noting at this point that since we have left the 8-bit limitations of the PMP peripheral far behind us now, we could choose to populate two bytes of PORTB at once, aiming for sixteen bits per pixel but actually getting fourteen bits per pixel once the unmapped bits have been taken into account. However, this would double our framebuffer memory requirements for the same resolution, and we don’t have that much memory. There may be devices with more than sixteen bits mapped in the 32-bit PORTB register (or in one of the other PORT registers), but they had better have more memory to be useful for greater colour depths.

Back in Black

One other matter presented itself as a problem. It is all very well generating a colour signal for the pixels in the framebuffer, but what happens at the end of each DMA transfer once a line of pixels has been transmitted? For the portions of the display not providing any colour information, the channel signals should be held at zero, yet it is likely that the last pixel on any given line is not at the lowest possible (black) level. And so the DMA transfer will have left a stray value in PORTB that could then confuse the monitor, producing streaks of colour in the border areas of the display, making the monitor unsure about the black level in the signal, and also potentially confusing some monitors about the validity of the picture, too.

As with the horizontal sync pulses, we need a prompt way of switching off colour information as soon as the pixel data has been transferred. We cannot really use an Output Compare unit because that only affects the value of a single output pin, and although we could wire up some kind of blanking in our external circuit, it is simpler to look for a quick solution within the capabilities of the microcontroller. Fortunately, such a quick solution exists: we can “chain” another DMA channel to the one providing the pixel data, thereby having this new channel perform a transfer as soon as the pixel data has been sent in its entirety. This new channel has one simple purpose: to transfer a single byte of black pixel data. By doing this, the monitor will see black in any borders and beyond the visible regions of the display.
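
A sketch of such a chained channel follows, with the chain-selection and enabling details simplified and best checked against the DMA section of the reference manual:

/* Configure DMA channel 1 to send a single byte of black to PORTB as
   soon as channel 0 finishes transferring a line of pixel data. */
static const unsigned char black = 0;

void blanking_init(void)
{
    DCH1CON = 0;
    DCH1CONbits.CHCHN = 1;          /* chain this channel to another */
    DCH1CONbits.CHCHNS = 0;         /* activate when channel 0 completes */
    DCH1SSA = KVA_TO_PA(&black);    /* source: one byte of black */
    DCH1SSIZ = 1;
    DCH1DSA = KVA_TO_PA(&PORTB);    /* destination: the same output pins */
    DCH1DSIZ = 1;
    DCH1CSIZ = 1;
    DCH1CONbits.CHEN = 1;           /* enable the channel */
}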

Wiring Up

Of course, the microcontroller still has to be connected to the monitor somehow. First of all, we need a way of accessing the pins of a VGA socket or cable. One reasonable approach is to obtain something that acts as a socket and that breaks out the different signals from a cable, connecting the microcontroller to these broken-out signals.

Wanting to get something for this task quickly and relatively conveniently, I found a product at a local retailer that provides a “male” VGA connector and screw-adjustable terminals to break out the different pins. But since the VGA cable also has a male connector, I also needed to get a “gender changer” for VGA that acts as a “female” connector in both directions, thus accommodating the VGA cable and the male breakout board connector.

Wiring up to the broken-out VGA connector pins is mostly a matter of following diagrams and the pin numbering scheme, illustrated well enough in various resources (albeit with colour signal transposition errors in some resources). Pins 1, 2 and 3 need some special consideration for the red, green and blue signals, and we will look at them in a moment. However, pins 13 and 14 are the horizontal and vertical sync pins, respectively, and these can be connected directly to the PIC32 output pins in this case, since the 3.3V output from the microcontroller is supposedly compatible with the “TTL” levels. Pins 5 through 10 can be connected to ground.

We have seen mentions of colour signals with magnitudes of up to 0.7V, but no explicit mention of how they are formed has been presented in this article. Fortunately, everyone is willing to show how they converted their digital signals to an analogue output, with most of them electing to use a resistor network to combine each output pin within a channel to produce a hopefully suitable output voltage.

Here, with two bits per channel, I take the most significant bit for a channel and send it through a 470ohm resistor. Meanwhile, the least significant bit for the channel is sent through a 1000ohm resistor. Thus, the former contributes more to the magnitude of the signal than the latter. If we were only dealing with channel information, this would be as much as we need to do, but here we also employ an intensity bit whose job it is to boost the channels by a small amount, making sure not to allow the channels to pollute each other via this intensity sub-circuit. Here, I feed the intensity output through a 2200ohm resistor and then to each of the channel outputs via signal diodes.
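
To give an idea of the resulting levels, assuming that the monitor presents the standard 75 ohm termination on each colour input, the 3.3V outputs would produce roughly the following voltages (ignoring the intensity contribution, which is complicated by the diode drops):

3.3V * 75 / (470 + 75) ≈ 0.45V (most significant bit high)
3.3V * 75 / (1000 + 75) ≈ 0.23V (least significant bit high)
3.3V * 75 / (320 + 75) ≈ 0.63V (both bits high, with 470ohm and 1000ohm in parallel giving about 320ohm)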

VGA Output Circuit

The circuit showing connections relevant to VGA output (generic connections are not shown)

The Final Picture

I could probably go on and cover other aspects of the solution, but the fundamental aspects are probably dealt with sufficiently above to help others reproduce this experiment themselves. Populating memory with usable image data, at least in this solution, involves copying data to RAM, and I did experience problems with accessing RAM that are probably related to CPU initialisation (as covered in my previous article) and to synchronising the memory contents with what the CPU has written via its cache.

As for the actual picture data, the RGB-plus-intensity representation is not likely to be the format of most images these days. So, to prepare data for output, some image processing is needed. A while ago, I made a program to perform palette optimisation and dithering on images for the Acorn Electron, and I felt that it was going to be easier to adapt the dithering code than it was to figure out the necessary techniques required for software like ImageMagick or the Python Imaging Library. The pixel data is then converted to assembly language data definition statements and incorporated into my PIC32 program.

VGA output from a PIC32 microcontroller

VGA output from a PIC32 microcontroller, featuring a picture showing some Oslo architecture, with the PIC32MX270 being powered (and programmed) by the Arduino Duemilanove, and with the breadboards holding the necessary resistors and diodes to supply the VGA breakout and, beyond that, the cable to the monitor

To demonstrate control over the visible region, I deliberately adjusted the display frequencies so that the monitor would consider the signal to be carrying an image 800 pixels by 600 pixels at a refresh rate of 60Hz. Since my framebuffer is only 256 lines high, I double the lines to produce 512 lines for the display. It would seem that choosing a line rate to try and produce 512 lines has the monitor trying to show something compatible with the traditional 640×480 resolution and thus lines are lost off the screen. I suppose I could settle for 480 lines or aim for 300 lines instead, but I actually don’t mind having a border around the picture.
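
One way of achieving such line doubling, sketched here with the hypothetical names used earlier and assuming that the per-line source update runs in the transfer-completion handler, is to advance the source address only on every other display line:

/* Send each framebuffer line twice so that 256 lines fill 512 display lines. */
if (line & 1)
{
    line_source += 160;
    if (line_source >= framebuffer + sizeof(framebuffer))
        line_source = framebuffer;
}
DCH0SSA = KVA_TO_PA(line_source);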

The on-screen menu showing the monitor's interpretation of the signal

It is worth noting that I haven’t really mentioned a “pixel clock” or “dot clock” so far. As far as the display receiving the VGA signal is concerned, there is no pixel clock in that signal. And as far as we are concerned, the pixel clock is only important when deciding how quickly we can get our data into the signal, not in actually generating the signal. We can generate new colour values as slowly (or as quickly) as we might like, and the result will be wider (or narrower) pixels, but it shouldn’t make the actual signal invalid in any way.

Of course, it is important to consider how quickly we can generate pixels. Previously, I mentioned a 24MHz clock being used within the PIC32, and it is this clock that is used to drive peripherals and this clock’s frequency that will limit the transfer speed. As noted elsewhere, a pixel clock frequency of 25MHz is used to support the traditional VGA resolution of 640×480 at 60Hz. With the possibilities of running the “peripheral clock” in the PIC32MX270 considerably faster than this, it becomes a matter of experimentation as to how many pixels can be supported horizontally.

Some Oslo street art being displayed by the PIC32

For my own purposes, I have at least reached the initial goal of generating a stable and usable video signal. Further work is likely to involve attempting to write routines to modify the framebuffer, maybe support things like scrolling and sprites, and even consider interfacing with other devices.

Naturally, this project is available as Free Software from its own repository. Maybe it will inspire or encourage you to pursue something similar, knowing that you absolutely do not need to be any kind of “expert” to stubbornly persist and to eventually get results!

Making Python Programs Faster with Shedskin

Tuesday, February 2nd, 2016

A few months ago, I had the opportunity to combine two of my interests: retrocomputing and Python programming. The latter needs little additional explanation, but the former perhaps requires a few more words. Retrocomputing is the study and use of computing equipment from an earlier era in computing, where such equipment is typically no longer in use, or is no longer in widespread use. My own experiences with microcomputers began in the 1980s and are largely centred upon those manufactured by Acorn Computers, such as the Acorn Electron and BBC Microcomputer.

Some History

One of my earlier initiatives was to attempt to document and understand the functioning of the Acorn Electron’s ULA integrated circuit: the Uncommitted Logic Array employed in the computer to generate video and perform input/output tasks. When Acorn decided to make the Electron as a variant of the BBC Microcomputer, the engineers merged many of the functions performed by separate chips into a single one, for several good and not-so-good reasons:

  • To reduce system complexity: having to connect several components and make sure that they all work correctly and in time with each other can be challenging, and it is arguably best to reduce the number of things that can go wrong by just reducing the number of things involved in the first place.
  • To reduce system cost: production becomes less complicated and savings can potentially be realised by combining discrete components.
  • To deepen the organisation’s experience with integrated circuit design: this being the company that developed the ARM architecture and eventually brought the first ARM chipset (CPU, audio/video controller, memory management unit, and input/output controller) to market.
  • To make a proprietary component that others could not readily clone: this being something of an obsession in the early 1980s marketplace.

Now, the ULA has the job of reading from memory and translating what it reads into a sequence of colour values, thus generating a picture on the screen. However, it has the annoying limitation of locking the CPU out of the memory (in fact, only the RAM which resides in a certain region) while it generates each line of the displayed image. Depending on which “screen mode” is selected (determining the resolution and colour depth), it may still let the CPU access the RAM at a lower speed, or it may effectively suspend the CPU for the entire time taken to generate a single horizontal display line.

Consequently, the Acorn Electron is considerably slower than the BBC Microcomputer: the BBC Micro employs faster memory and lets the CPU and its own video circuitry take turns with the memory while they both run at full speed, whereas the Electron uses cheaper memory as another cost-saving measure. Reviews of the Electron, while generally positive about getting BBC Micro features for less, tended to note this performance degradation with some disappointment.

The Acorn Electron ULA

The Acorn Electron ULA (with socket cover/heatsink removed)

Some Background

Normally, the matter of the CPU stalling for much of its time would be considered a disadvantage, but my brother and I were having a discussion about software running from ROM (or, in fact, in the area of memory not affected by the ULA), with the pitfalls of such software needing to access the RAM and potentially becoming stalled as the CPU finds itself waiting for the ULA to do its work. Somehow the notion arose that the ULA effectively provides a kind of synchronisation mechanism for software needing to run in the period between display lines.

So, a program can read instructions from ROM and then, when it needs to know that the display is not being updated, it can attempt to access RAM. Whether or not the program stalls remains unknown to the program itself, but when it gets the result of accessing the RAM – perhaps immediately, perhaps after a few microseconds – it can be certain that the ULA is not accessing the RAM because it just did so itself.

A screen image showing the update region

An image showing the display update region (the "test card" picture) when the ULA accesses memory, with the black border indicating the region (or time period) during which the CPU may access the lower region of the Electron's memory.

(See the BBC Test Cards for the origin of the above picture.)

One might wonder what kind of program would benefit from synchronising itself to the display line periods. Well, one thing that tended to happen back in the microcomputing era was the trick of reprogramming the display palette – the selection of colours shown on screen – so that a greater number of colours can be displayed simultaneously on the screen than would normally be the case for that particular display configuration. For example, a screen mode normally offering only four colours could instead offer the full range – eight “proper” colours on an Electron – if it switched the palette during screen updates. Even a screen mode offering only two colours could offer eight by employing palette switching.

And with a certain amount of experimentation, a working solution was eventually delivered (with only limited input needed from my side). By running software in a ROM (or from RAM mapped into the same area of memory as a ROM), it became possible to reliably change the palette on a line-by-line basis, bridging the gap between a medium-resolution four-colour mode and a hypothetical eight-colour version that would only become available on Acorn’s ARM-based microcomputers. Although only four colours can be used per display line, each of the 256 display lines can employ four-colour combinations of the full eight colours, not quite making an eight-colour mode – which would, of course, allow eight colours on every display line – but still permitting graphical output much closer to a true eight-colour mode than can be contemplated with a restrictive four-colour mode.

Rainbow lorikeets on a lawn

Rainbow lorikeets on a lawn (left: 4 colours from 8 per display line; right: 8 colours per display line)

Palette Optimisation

With the trick in place to switch the palette on demand, all that remained was to make a program that could take an input image and optimise the colours so that…

  1. Only the eight permitted colours (black, white, primary and secondary colours) are used in the image.
  2. No pixel row (display line) employs more than four different colours.

Expectations were rather low at first. First of all, in this era of bountiful quantities of fresh imagery, most pictures are photographs with plenty of colours, and so even the most promising candidate images first need to be reduced in both resolution and colour depth. Once this is done, the job of optimising the colours to meet the second of the above criteria remains. So, initially, very simple techniques were employed to do both of these things.

Later, after some discussions, it appeared that integrating the two processing activities and, crucially, applying basic dithering and error propagation techniques, made it possible to represent complicated “deep-colour” images within the limited display representation.

Magnified details of the two representations

Closer examination of the above images reveals how the algorithm attempts to spread the responsibility of representing colours across rows, being restricted to the choice of four colours (in the left image) to produce the appropriate tones that are more readily encoded using eight colours (in the right image).

Tuning the Implementation

To perform the optimisation process, I wrote a program which took an input image, rotated and scaled it to the target resolution, and processed the colours by inspecting the pixel data on each horizontal line (or row) of the image, calculating the appropriate four-colour combination for a line and generating suitable pixel data appropriate for this restricted palette. Since I am probably most comfortable with Python, and since Python also has various convenient image-processing libraries, I found myself developing a short Python program to do the work.

Now, Python doesn’t usually exhibit the highest performance, particularly for tasks such as this, and as the experimentation with different approaches started to lessen, with the most rewarding approaches making themselves evident, and with the temptation to convert lots of images just to see the results, my attention turned to speeding up the program. Since I had been involved with packaging the Shedskin Python-to-C++ compiler for Debian, meaning that the package was right there for me to use, it made sense to give it a try on this code.

At this point, the program was taking around 40 seconds to convert a 16 megapixel photographic image into the appropriate output representation. Despite using the Python Imaging Library to do the rotation and scaling, visiting each pixel in a Python program and doing some simple statistical and arithmetic operations was taking up rather a lot of time. In the past, I have used various “numeric” Python extensions – usually the ones supported by pygame – but things have moved on somewhat incoherently since then, and I also wanted to keep my code straightforward and readable, which is often something that is lost when making code “numeric”.

Time for Shedskin

I have used Shedskin before, and the first thing to think about when using it is how it will interact with non-Python code. Shedskin takes Python code and generates C++ which must then be compiled. If a whole program is translated to C++, the resulting executable can be run with no further work necessary. However, the program of interest here uses libraries that are actually implemented in C and are delivered as shared libraries that are loaded by CPython (the Python virtual machine implementation written in the C programming language). Shedskin cannot generally translate such a program in its entirety.

From the earliest days of Python, it was (and has remained) a common practice to first write library code in Python, desire better performance, and to then rewrite much of that code in the C programming language against the Python/C API (or CPython API) as an “extension module”, which is what these problematic shared libraries are. Such libraries act as part of the CPython virtual machine, more or less, and with suitable implementation choices made for performance, everything using such libraries will run much more quickly. Unfortunately, but understandably, Shedskin doesn’t seek to interact closely with the CPython implementation: it produces C++ code that works with its own runtime libraries.

So, instead of working with a single program file, I split my program up into a main program which deals with CPython extensions such as the Python Imaging Library, whose JPEG and PNG manipulation facilities are too convenient to abandon, along with a library module that does all the computation for this particular application. The library would provide its own image abstraction for pixel-level accesses, but be given the pixel data obtained by the main program, returning the processed data to the main program for saving to a file. By only compiling the library, Shedskin can produce a standalone library file containing hopefully high-performance implementations of the original Python code.

But how can such a library constructed by Shedskin be used? Would we now need to somehow translate the main program into C or C++ in order to be able to use it? Fortunately, but slightly confusingly, Shedskin can also generate the necessary CPython API wrapper around the translated code, making it possible for CPython to load this newly-created library after all.

(One might wonder how this is possible, but if one considers that arbitrary C or C++ code can be wrapped using the CPython API, with the values being sent in and out of a library only being converted at the interface, Shedskin has the luxury of generating something that lives by its own rules internally, with the wrapper doing the necessary “marshalling” of the values as they go in and come out. Such a wrapped library might be frowned upon in the Python world, not being a sophisticated extension module that takes full advantage of the CPython API, but such libraries were, and probably still are, the “bread and butter” of Python’s access to a wide array of tools and technologies.)

Knowing that this is a reasonable approach, I experienced a moment of excessive ambition. I tried to take the newly-broken-out library and compile it using the appropriate option:

shedskin -e optimiser.py

This took quite some time, and things did not go well…

*** SHED SKIN Python-to-C++ Compiler 0.9.2 ***
Copyright 2005-2011 Mark Dufour; License GNU GPL version 3 (See LICENSE)

[analyzing types..]
****************************88%
*WARNING* reached maximum number of iterations
********************************100%
[generating c++ code..]
*WARNING* 'get_combinations' function not exported (cannot convert argument 'c')
*WARNING* 'get_colours' function not exported (cannot convert return value)
*WARNING* 'balance' function not exported (cannot convert argument 'd')
*WARNING* optimiser.py: expression has dynamic (sub)type: {None, int, tuple}
*WARNING* optimiser.py: expression has dynamic (sub)type: {float, int, tuple2}
*WARNING* optimiser.py: expression has dynamic (sub)type: {float, int, tuple}
*WARNING* optimiser.py: expression has dynamic (sub)type: {int, tuple}
*WARNING* optimiser.py: variable 'data' has dynamic (sub)type: {None, int, tuple}
*WARNING* optimiser.py: variable (class SimpleImage, 'data') has dynamic (sub)type: {None, int, tuple}

And then there were many more lines of a more specific nature:

*WARNING* optimiser.py:41: function distance not called!
*WARNING* optimiser.py:50: expression has dynamic (sub)type: {None, int, tuple}
*WARNING* optimiser.py:53: expression has dynamic (sub)type: {None, int, tuple}
*WARNING* optimiser.py:57: expression has dynamic (sub)type: {float, int, tuple2}
*WARNING* optimiser.py:57: expression has dynamic (sub)type: {int, tuple}

And so on. Now, if you are thinking that this will probably not end well, you would be right. Running make gives plenty of errors as scary-looking as these…

optimiser.cpp: In function '__shedskin__::list<__shedskin__::tuple2<__shedskin__::tuple2<int, int>*, double>*>* __optimiser__::list_comp_0(__shedskin__::pyiter<double>*)':
optimiser.cpp:76:17: error: base operand of '->' is not a pointer
optimiser.cpp:77:21: error: base operand of '->' is not a pointer
optimiser.cpp:78:68: error: invalid conversion from '__shedskin__::tuple2<int, int>*' to 'int' [-fpermissive]
In file included from /usr/share/shedskin/lib/builtin.hpp:1204:0,
                 from optimiser.cpp:1:
/usr/share/shedskin/lib/builtin/tuple.hpp:211:28: error:   initializing argument 2 of '__shedskin__::tuple2<A, B>::tuple2(int, A, B) [with A = int; B = double]' [-fpermissive]
optimiser.cpp:78:70: error: no matching function for call to '__shedskin__::list<__shedskin__::tuple2<__shedskin__::tuple2<int, int>*, double>*>::append(__shedskin__::tuple2<int, double>*)'
optimiser.cpp:78:70: note: candidate is:
In file included from /usr/share/shedskin/lib/builtin.hpp:1203:0,
                 from optimiser.cpp:1:
/usr/share/shedskin/lib/builtin/list.hpp:96:25: note: void* __shedskin__::list<T>::append(T) [with T = __shedskin__::tuple2<__shedskin__::tuple2<int, int>*, double>*]
/usr/share/shedskin/lib/builtin/list.hpp:96:25: note:   no known conversion for argument 1 from '__shedskin__::tuple2<int, double>*' to '__shedskin__::tuple2<__shedskin__::tuple2<int, int>*, double>*'

Of course, it would be unfair to expect this to have worked: Shedskin did indeed give us plenty of warnings! But we should at least try to understand such warnings if we are to make progress. After a few more iterations of this bold strategy, I realised that I had started out in the wrong fashion, presenting Shedskin with code that is too dynamic (and ambiguous) for it to reasonably infer sensible types and produce a compilable C++ representation.

First Things First

I went back and reintroduced the necessary changes gradually. First of all, I made a branch of my code, committing to it the special image abstraction to be used in any future Shedskin-compiled version. Meanwhile, I looked at some things that might generally be performance-degrading in Python, and whose removal might also help Shedskin deduce the program’s types more readily.

One thing that I had done in the initial implementation was to freely use sequences to represent colour triplets: the red, green and blue values. I was using code like this:

def invert(srgb):
    return tuple(map(lambda x: 1.0 - x, srgb))

This might be elegant – there’s no separate treatment of each element in the triplet – but there is likely to be more overhead in treating the triplet as a sequence, iterating over it, and so on. Moreover, Shedskin is likely to see objects appearing from a collection without any idea of what was put into that collection to start with. So, more mundane code was introduced in its place:

def invert(srgb):
    r, g, b = srgb
    return 1.0 - r, 1.0 - g, 1.0 - b

Such changes reduced the processing time from around 35 seconds to around 25 seconds. After various other performance modifications, such as avoiding the repeated computation of common values, this became around 18 to 19 seconds. Interestingly, merging such modifications into the branch with the special image abstraction reduced the running time to around 16 to 17 seconds. But at this point, obvious optimisation opportunities were more or less exhausted.
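To give a flavour of those “common value” modifications, consider a small contrived example using the invert function above together with an assumed distance function, resembling but not reproducing the one in the program:

def distance(p, q):
    # An assumed squared-distance measure between colour triplets.
    (pr, pg, pb), (qr, qg, qb) = p, q
    return (pr - qr) ** 2 + (pg - qg) ** 2 + (pb - qb) ** 2

colour = (0.2, 0.4, 0.6)
pixels = [(0.1, 0.5, 0.9), (0.3, 0.3, 0.3)]

# Before: invert(colour) is recomputed for every pixel, despite
# always producing the same result.
results = [distance(p, invert(colour)) for p in pixels]

# After: the invariant value is computed once, outside the loop.
inverted = invert(colour)
results = [distance(p, inverted) for p in pixels]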

It then became time for another attempt to present this to Shedskin. This time, instead of calling the main program main.py and the library optimiser.py, which is not particularly intuitive, I retained the name of the main program as optimiser.py and used the traditional optimiserlib.py naming for the library:

shedskin -e optimiserlib.py

This completed without any warnings, and running make produced a library file, optimiserlib.so, that Python recognises as an extension module. Running the program produced a palette-optimised image in just under 9 seconds!
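From CPython’s point of view, the compiled module is then used like any other extension module, with the generated wrapper converting values at the boundary. Here, get_colours and its argument form are assumptions based only on the warnings seen earlier:

import optimiserlib
print(optimiserlib.__file__)   # reports the compiled optimiserlib.so

# The wrapper converts the list of tuples into Shedskin's internal
# representation, runs the compiled code, and converts the result back.
colours = optimiserlib.get_colours([(255, 0, 0), (0, 255, 0), (0, 0, 255)])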

Making the Difference

So, what were the differences that made Shedskin accept the library code? Since splitting the code into two files makes direct comparisons between versions awkward, we can first compare the program as it stood before the preparatory work with the version that fed into the Shedskin-compiled one. There are four areas of changes:

  • The introduction of that image abstraction class – SimpleImage – to be used to manipulate images instead of using Python Imaging Library objects. (In fact, this is only used initially as a container for the raw pixel data, with such data being passed to library functions, as described below.)
  • Modifications to access the different components of colour triplets explicitly.
  • Some “common sense” performance modifications, avoiding the repeated computation of things that always produce the same result anyway.
  • Some local name adjustments.

This final point is something that the Shedskin documentation mentions prominently, but it can be easy to forget. Consider the following code (in the balance function of the library module):

dd = dict([(value, f) for f, value in d])

Originally, this looked like this:

d = dict([(value, f) for f, value in d])

This earlier version just reverses a dictionary mapping and assigns the result to the name of the original dictionary. Although this seems harmless enough, Shedskin’s analysis would be complicated substantially by having to deal with a name being reassigned and potentially being associated with completely different kinds of objects, even if in this case we are only dealing with dictionaries. (Even if the same type were only ever involved, as we see here, it would also prevent any analysis being done on the nature of the keys and values in the dictionaries.)

We can see the effect of this by trying to compile the functioning version with the earlier naming scheme reintroduced. First, we see a warning like this:

*WARNING* 'balance' function not exported (cannot convert argument 'd')

Since Shedskin propagates type information around the program, this leads to other problems:

*WARNING* optimiserlib.py: expression has dynamic (sub)type: {float, int, tuple2}
*WARNING* optimiserlib.py: expression has dynamic (sub)type: {float, int, tuple}
*WARNING* optimiserlib.py: expression has dynamic (sub)type: {int, tuple}
*WARNING* optimiserlib.py: variable (function (function balance, 'list_comp_0'), 'value') has dynamic (sub)type: {int, tuple}
*WARNING* optimiserlib.py: variable (function (function balance, 'list_comp_1'), 'f') has dynamic (sub)type: {float, int, tuple2}
*WARNING* optimiserlib.py: variable (function (function balance, 'list_comp_1'), 'value') has dynamic (sub)type: {int, tuple}

And later in the messages, the more specific errors indicate the problem and its consequences:

*WARNING* optimiserlib.py:100: expression has dynamic (sub)type: {float, int, tuple2}
*WARNING* optimiserlib.py:100: expression has dynamic (sub)type: {float, int, tuple}
*WARNING* optimiserlib.py:100: expression has dynamic (sub)type: {int, tuple}
*WARNING* optimiserlib.py:102: expression has dynamic (sub)type: {float, int, tuple2}
*WARNING* optimiserlib.py:102: expression has dynamic (sub)type: {float, int, tuple}
*WARNING* optimiserlib.py:103: expression has dynamic (sub)type: {float, int, tuple2}
*WARNING* optimiserlib.py:103: expression has dynamic (sub)type: {float, int, tuple}
*WARNING* optimiserlib.py:104: expression has dynamic (sub)type: {float, int, tuple2}
*WARNING* optimiserlib.py:104: expression has dynamic (sub)type: {float, int, tuple}
*WARNING* optimiserlib.py:105: class 'list' has no method 'items'
*WARNING* optimiserlib.py:105: expression has dynamic (sub)type: {float, int, tuple2}
*WARNING* optimiserlib.py:105: expression has dynamic (sub)type: {float, int, tuple}
*WARNING* optimiserlib.py:105: expression has dynamic (sub)type: {int, tuple}

And so on. Such warnings, which will cause errors if one tries to compile the result, can seem very intimidating, but within all these details the cause should be identifiable.

But another important aspect of the successful translation of the library module should not be forgotten: choosing the right initial functionality to give to Shedskin. Instead of trying to compile everything, it makes sense to concentrate on functions which are fairly self-contained, which do not call potentially vast regions of other program functionality, and which are invoked often, especially in loops that take a long time. Migrating a small piece of functionality at a time helps to keep the troubleshooting activity manageable.

By using the SimpleImage class as a convenient staging post for pixel data, simple library functions could be migrated first, and the library could be kept ignorant of the abstractions being maintained in the main program. Later on, as we shall see below, such abstractions and functions that require them could then be migrated themselves.
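In sketch form, such a class need only be a plain container with pixel-level accessors, along the following lines (the actual SimpleImage class will differ in its details):

class SimpleImage:

    "A plain container for pixel data, amenable to Shedskin's type inference."

    def __init__(self, data, size):
        self.data = data                  # a flat list of (r, g, b) tuples
        self.width, self.height = size

    def get_pixel(self, x, y):
        return self.data[y * self.width + x]

    def put_pixel(self, x, y, value):
        self.data[y * self.width + x] = value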

Broadening the Scope

With some core functionality migrated to a compilable extension module, it becomes possible to give other things the Shedskin treatment. At the start of the second attempt to get Shedskin to compile the code, quite a few functions were still being run by the CPython virtual machine, notably various image operations. With the functions called by these operations already migrated, each of these operations can be moved over into the extension module in turn. The expectation is that further performance improvements will follow, since these functions can now call each other without the overhead of the CPython API, and without any code running in the virtual machine at all.

And since Shedskin’s own runtime library supports certain Python standard library modules in accelerated form, it becomes interesting to migrate operations that employ such module functionality. For example, the get_combinations function can be moved to take advantage of Shedskin’s itertools implementation. And in addition to just moving functions, classes can also be compiled by Shedskin and potentially accessed more quickly, while remaining accessible to normal Python code that hasn’t been compiled.
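For instance, get_combinations might be little more than a veneer over the standard library – the following is a guess at its form, not the actual code – in which case compiling it means that its calls stay entirely within Shedskin’s accelerated itertools:

from itertools import combinations

def get_combinations(colours, n):
    # When compiled, this uses Shedskin's own itertools implementation
    # rather than the CPython one.
    return list(combinations(colours, n))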

Not all of these optimisations give the boost in performance that might be hoped for, and some may even bring performance penalties when first introduced. But it is important to look beyond any individual small optimisation to the future optimisations that the accumulation of such changes will permit. Moving the SimpleImage class into the extension module permits the migration of other code, with the consequence that the program then takes around 6 seconds to run instead of 9 seconds: the apparent optimisation barrier is overcome, leading to yet more gains!

Some Conclusions

It is possible to make very useful performance gains just by revisiting code and writing it in a way that suits the language implementation. Starting out with an execution time of around 40 seconds, it was possible to make the program run in closer to 15 seconds, and that might have been enough for occasional conversion of images. But by adopting Shedskin and applying it to a subset of the program’s functionality, an execution time of 9 seconds was initially obtained. This might not be quite as impressive as the earlier two-and-a-half fold reduction in time (or, of course, a two-and-a-half fold increase in throughput), but it was obtained with little additional effort, although admittedly some planning for it had been incorporated into the earlier optimisation work.

Ultimately, reaching around 6 seconds of execution time means that Shedskin was indeed able to more or less match earlier performance gains, but again with little actual optimisation effort. And one can argue that merely using Shedskin helped to inform those earlier gains, anyway. Regardless of what should take the credit, the program ended up running almost seven times faster than when we started out on this journey.

Tools like Shedskin have a slightly controversial place in the Python world. People like to point out that Shedskin really only compiles a “restricted subset” of Python: one that Shedskin is able to analyse. And as the above exercise demonstrates, modifications to programs are likely to be required to take advantage of it. But the resulting programs are still Python programs, and the CPython virtual machine will still run them, just slower – three times slower in this case – than if they were compiled.
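Indeed, a program can prefer the compiled module where it has been built and otherwise fall back to the interpreted code, using the familiar import idiom sketched below (optimiserlib_pure being a hypothetical name for the uncompiled version):

try:
    # Prefer the Shedskin-compiled extension module if it was built.
    import optimiserlib
except ImportError:
    # Otherwise fall back to a pure Python implementation of the same API.
    import optimiserlib_pure as optimiserlib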

Authors of tools like Shedskin and of alternative implementations of the Python language have a tough decision to make. They must either keep up with the continuously changing “full-fat” version of the language, supporting new features as they are added, at the risk of never catching up and never being considered a “proper” implementation. Or they must impose limitations on the form of the language and the features they support, knowing that even if their software accepts programs written in Python, it will be marginalised and regarded as a distraction from “proper” Python programming. And yet, Python as delivered by CPython offers little in the way of the things that Shedskin and similar tools seek to offer, so it should be no wonder that such tools have come to exist.

Instead, one might wonder why it is that the language must continue to evolve in ways that frustrate static analysis and enhancements to program performance and scalability. Indeed, my own interests in the analysis of Python code have been somewhat rekindled by this exercise, but unlike the valiant efforts of other developers (such as the author of Nuitka and his continuing quest to support the compilation of “full-fat” Python), I have no intention of using Python as my gold standard. In essence, Python is and has been a productive language to use – that is why I have used it here and many times before – but the very nature of the language should be open to question and review.

If, by discarding features, something like Python can be more predictable and better-performing, why would exploring this avenue of inquiry be a bad thing? Shedskin shows us that there are benefits in doing so, and I hope that this article has also shown some practical techniques for using and understanding the tool. And maybe this will encourage me and others to make some more progress with our own Python program analysis efforts as well, if only to offer alternatives that seek to deliver on some of the unrealised promise and potential that Python seemed to offer back when we first discovered it, so many years ago now.

[Image: Oslo architecture (4 from 8 colours with scaled original)]

An architectural picture from Oslo with the left version using 4 colours from 8 on every display line, the right version being a scaled version of the original.