Paul Boddie's Free Software-related blog


Archive for the ‘filesystems’ Category

Some More Slow Progress

Sunday, April 7th, 2024

A couple of months have elapsed since my last, brief progress report on L4Re development, so I suppose a few words are required to summarise what I have done since. Distractions, travel, and other commitments notwithstanding, I did manage to push my software framework along a little, encountering frustrations and the occasional sensation of satisfaction along the way.

Supporting Real Hardware

Previously, I had managed to create a simple shell-like environment running within L4Re that could inspect an ext2-compatible filesystem, launch programs, and have those programs communicate with the shell – or each other – using pipes. Since I had also been updating my hardware support framework for L4Re on MIPS-based devices, I thought that it was time to face up to implementing support for memory cards – specifically, SD and microSD cards – so that I could access filesystems residing on such cards.

Although I had designed my software framework with things like disks and memory devices in mind, I had been apprehensive about actually tackling driver development for such devices, as well as about whether my conceptual model would prove too simple, necessitating more framework development just to achieve the apparently simple task of reading files. It turned out that the act of reading data, even when almost magical mechanisms like direct memory access (DMA) are used, is as straightforward as one could reasonably expect. I haven’t tested writing data yet, mostly because I am not that brave, but it should be essentially as straightforward as reading.

What was annoying and rather overcomplicated, however, was the way that memory cards have to be coaxed into cooperation, with the SD-related standards featuring layer upon layer of commands added every time they enhanced the technologies. Plenty of time was spent (or wasted) trying to get these commands to behave and to allow me to gradually approach the step where data would actually be transferred. In contrast, setting up DMA transactions was comparatively easy, particularly using my interactive hardware experimentation environment.

There were some memorable traps encountered in the exercise. One involved making sure that the interrupts signalling completed DMA transactions were delivered to the right thread. In L4Re, hardware interrupts are delivered via IRQ (interrupt request) objects to specific threads, and it is obviously important to make sure that a thread waiting for notifications (including interrupts) expects these notifications. Otherwise, they may cause a degree of confusion, which is what happened when a thread serving “blocks” of data to the filesystem components was presented with DMA interrupt occurrences. Obviously, the solution was to be more careful and to “bind” the interrupts to the thread interacting with the hardware.

Another trap involved the follow-on task of running programs that had been read from the memory card. In principle, this should have yielded few surprises: my testing environment involves QEMU and raw filesystem data being accessed in memory, and program execution was already working fine there. However, various odd exceptions were occurring when programs were starting up, forcing me to exercise the useful kernel debugging tool provided with the Fiasco.OC (or L4Re) microkernel.

Of course, the completely foreseeable problem involved caching: data loaded from the memory card was not yet available in the processor’s instruction cache, and so the processor was running code (or potentially something that might not have been code) that had been present in the cache. The problem tended to arise after a jump or branch in the code, executing instructions that did undesirable things to the values of the registers until something severe enough caused an exception. The solution, of course, was to make sure that the instruction cache was synchronised with the data cache containing the newly read data using the l4_cache_coherent function.

Replacing the C Library

With that, I could replicate my shell environment on “real hardware” which was fairly gratifying. But this only led to the next challenge: that of integrating my filesystem framework into programs in a more natural way. Until now, accessing files involved a special “filesystem client” library that largely mimics the normal C library functions for such activities, but the intention has always been to wrap these with the actual C library functions so that portable programs can be run. Ideally, there would be a way of getting the L4Re C library – an adapted version of uClibc – to use these client library functions.

A remarkable five years have passed since I last considered such matters. Back then, my investigations indicated that getting the L4Re library to interface to the filesystem framework might be an involved and cumbersome exercise due to the way the “backend” functionality is implemented. It seemed that the L4Re mechanism for using different kinds of filesystems involved programs dynamically linking to libraries that would perform the access operations on the filesystem, but I could not find much documentation for this framework, and I had the feeling that the framework was somewhat underdeveloped, anyway.

My previous investigations had led me to consider deploying an alternative C library within L4Re, with programs linking to this library instead of uClibc. C libraries generally come across as rather messy and incoherent things, accumulating lots of historical baggage as files are incorporated from different sources to support long-forgotten systems and architectures. The challenge was to find a library that could be conveniently adapted to accommodate a non-Unix-like system, with the emphasis on convenience precluding having to make changes to hundreds of files. Eventually, I chose Newlib because the breadth of its dependencies on the underlying system is rather narrow: a relatively small number of fundamental calls. In contrast, other C libraries assume a Unix-like system with countless, specialised system calls that would need to be reinterpreted and reframed in terms of my own library’s operations.

My previous effort had rather superficially demonstrated a proof of concept: linking programs to Newlib and performing fairly undemanding operations. This time round, I knew that my own framework had become more complicated, employed C++ in various places, and would create a lot of work if I were to decouple it from various L4Re packages, as I had done in my earlier proof of concept. I briefly considered and then rejected undertaking such extra work, instead deciding that I would simply dust off my modified Newlib sources, build my old test programs, and see which symbols were missing. I would then seek to reintroduce these symbols and hope that the existing L4Re code would be happy with my substitutions.

Supporting Threads

For the very simplest of programs, I was able to “stub” a few functions and get them to run. However, part of the sophistication of my framework in its current state is its use of threading to support various activities. For example, monitoring data streams from pipes and files involves a notification mechanism employing threads, and thus a dependency on the pthread library is introduced. Unfortunately, although Newlib does provide a similar pthread library to that featured in L4Re, it is not really done in a coherent fashion, and there is other pthread support present in Newlib that just adds to the confusion.

Initially, then, I decided to create “stub” implementations for the different functions used by various libraries in L4Re, like the standard C++ library whose concurrency facilities I use in my own code. I made a simple implementation of pthread_create, along with some support for mutexes. Running programs did exercise these functions and produce broadly expected results. Continuing along this path seemed like it might entail a lot of work, however, and in studying the existing pthread library in L4Re, I had noticed that although it resides within the “uclibc” package, it is somewhat decoupled from the C library itself.

Favouring laziness, I decided to see if I couldn’t make a somewhat independent package that might then be interfaced to Newlib. For the most part, this exercise involved introducing missing functions and lots of debugging, watching the initialisation of programs fail due to things like conflicts with capability allocation, perhaps due to something I am doing wrong, or perhaps exposing conditions that are fortuitously avoided in L4Re’s existing uClibc arrangement. Ultimately, I managed to get a program relying on threading to start, leaving me with the exercise of making sure that it was producing the expected output. This involved some double-checking of previous measures to allow programs using different C libraries to communicate certain kinds of structures without them misinterpreting the contents of those structures.

Further Work

There is plenty still to do in this effort. First of all, I need to rewrite the remaining test programs to use C library functions instead of client library functions, having done this for only a couple of them. Then, it would be nice to expand C library coverage to deal with other operations, particularly process creation since I spent quite some time getting that to work.

I need to review the way Newlib handles concurrency and determine what else I need to do to make everything work as it should in that regard. I am still using code from an older version of Newlib, so an update to a newer version might be sensible. In this latest round of C library evaluation, I briefly considered Picolibc which is derived from Newlib and other sources, but I didn’t fancy having to deal with its build system or to repackage the sources to work with the L4Re build system. I did much of the same with Newlib previously and, having worked through such annoyances, was largely able to focus on the actual code as opposed to the tooling.

Currently, I have been statically linking programs to Newlib, but I eventually want to dynamically link them. This does exercise different paths in the C and pthread libraries, but I also want to explore dynamic linking more broadly in my own environment, having already postponed such investigations from my work on getting programs to run. Introducing dynamic linking and shared libraries helps to reduce memory usage and increase the performance of a system when multiple programs need the same libraries.

There are also some reasonable arguments for making the existing L4Re pthread implementation more adaptable, consolidating my own changes to the code, and also for considering making or adopting other pthread implementations. Convenient support for multiple C library implementations, and for combining these with other libraries, would be desirable, too.

Much of the above has been a distraction from what I have been wanting to focus on, however. Had it been more apparent how to usefully extend uClibc, I might not have bothered looking at Newlib or other C libraries, and then I probably wouldn’t have looked into threading support. Although I have accumulated some knowledge in the process, and although some of that knowledge will eventually have proven useful, I cannot help feeling that L4Re, being a fairly mature product at this point and a considerable achievement, could be more readily extensible and accessible than it currently is.

Continuing Explorations into Filesystems and Paging with L4Re

Saturday, April 8th, 2023

Towards the end of last year, I spent a fair amount of time trying to tidy up and document the work I had been doing on integrating a conventional filesystem into the L4 Runtime Environment (or L4Re Operating System Framework, as it now seems to be called). Some of that effort was purely administrative, such as giving the work a more meaningful name and changing references to the naming in various places, whereas other aspects were concerned with documenting mundane things like how the software might be obtained, built and used. My focus had shifted somewhat towards sharing the work and making it slightly more accessible to anyone who might be interested (even if this is probably a very small audience).

Previously, in seeking to demonstrate various mechanisms such as the way programs might be loaded and run, with their payloads paged into memory on demand, I had deferred other work that I felt was needed to make the software framework more usable. For example, I was not entirely happy with the way that my “client” library for filesystem access hid the underlying errors, making troubleshooting less convenient than it could be. Instead of perpetuating the classic Unix “errno” practice, I decided to give file data structures their own error member to retain any underlying error, meaning that a global variable would not be involved in any error reporting.

Other matters needed attending to, as well. Since acquiring a new computer in 2020 based on the x86-64 architecture, the primary testing environment for this effort has been a KVM/QEMU instance invoked by the L4Re build process. When employing the same x86-64 architecture for the instance as the host system, the instance should in theory be very efficient, but for some reason the startup time of such x86-64 instances is currently rather long. This was not the case at some point in the past, but having adopted the Git-based L4Re distribution, this performance regression made an appearance. Maybe at some stage in the future I will discover why it sits there for half a minute spinning up at the “Booting ROM” stage, but for now a reasonable workaround is to favour QEMU instances for other architectures when testing my development efforts.

Preserving Portability

Having long been aware of the necessity of software portability, I have therefore been testing the software in QEMU instances emulating the classic 32-bit x86 architecture as well as MIPS32, in which I have had a personal interest for several years. Surprisingly, testing on x86 revealed a few failures that were not easily explained, but I eventually tracked them down to interoperability problems with the L4Re IPC library, where that library was leaving parts of IPC message values uninitialised and causing my own IPC library to misinterpret the values being sent. This investigation also led me to discover that the x86 Application Binary Interface is rather different in character to the ABI for other architectures. On those other architectures, the alignment of members in structures (and of parameters in parameter lists) needs to be done more carefully due to the way values in memory are accessed. On x86, meanwhile, it seems that values of different sizes can be more readily packed together.

In any case, I came to believe that the L4Re IPC library is not following the x86 ABI specification in the way IPC messages are prepared. I did wonder whether this was deliberate, but I think that it is actually inadvertent. One of my helpful correspondents confirmed that there was indeed a discrepancy between the L4Re code and the ABI, but nothing came of any enquiries into the matter, so I imagine that in any L4Re systems deployed on x86 (although I doubt that there can be many), the use of the L4Re code on both sides of any given IPC transaction manages to conceal this apparent deficiency. The consequence for me was that I had to introduce a workaround in the cases where my code needs to interact with various existing L4Re components.

Several other portability changes were made to resolve a degree of ambiguity around the sizes of various types. This is where the C language family and various related standards and technologies can be infuriating, with care required when choosing data types and then using these in conjunction with libraries that might have their own ideas about which types should be used. Although there probably are good reasons for some types to be equivalent to a “machine word” in size, such types sit uncomfortably with types of other, machine-independent sizes. I am sure I will have to revisit these choices over and over again in future.

Enhancing Component Interface Descriptions

One thing I had been meaning to return to was the matter of my interface description language (IDL) tool and its lack of support for composing interfaces. For example, a component providing file content might expose several different interfaces for file operations, dataspace operations, and so on. These compound interfaces had been defined by specifying arguments for each invocation of the IDL tool that indicate all the interfaces involved, and thus the knowledge of each compound interface ended up being encoded as definitions within Makefiles like this:

mapped_file_object_INTERFACES = dataspace file flush mapped_file notification

A more natural approach involved defining these interfaces in the interface description language itself, but this was going to require putting in the effort to extend the tool, which would not be particularly pleasant, being written in C using Flex and Bison.

Eventually, I decided to just get on with remedying the situation, adding the necessary tool support, and thus tidying up and simplifying the Makefiles in my L4Re build system package. This did raise the complexity level in the special Makefiles provided to support the IDL tool – nothing in the realm of Makefiles is ever truly easy – but it hopefully confines such complexity out of sight and keeps the main project Makefiles as concise as can reasonably be expected. For reference, here is how a file component interface looks with this new tool support added:

interface MappedFileObject composes Dataspace, File, Flush, MappedFile, Notification;

And for reference, here is what one of the constituent interfaces looks like:

interface Flush
{
  /* Flush data and update the size, if appropriate. */

  [opcode(5)] void flush(in offset_t populated_size, out offset_t size);
};

I decided to diverge from previous languages of this kind and to use “composes” instead of language like “inherits”. These compound interface descriptions deliberately do not seek to combine interfaces in a way that entirely resembles inheritance as supported by various commonly used programming languages, and an interface composing other interfaces cannot also add operations of its own: it can merely combine other interfaces. The main reason for such limitations is the deliberate simplicity or lack of capability of the tool: it only really transcribes the input descriptions to equivalent forms in C or C++ and neglects to impose many restrictions of its own. One day, maybe I will revisit this and at least formalise these limitations instead of allowing them to emerge from the current state of the implementation.

A New Year

I had hoped to deliver something for broader perusal late last year, but the end of the year arrived and with it some intriguing but increasingly time-consuming distractions. Having written up the effective conclusion of those efforts, I was able to turn my attention to this work again. To start with, that involved reminding myself where I had got to with it, which underscores the need for some level of documentation, because documentation not only communicates the nature of a work to others but it also communicates it to one’s future self. So, I had to spend some time rediscovering the finer detail and reminding myself what the next steps were meant to be.

My previous efforts had demonstrated the ability to launch new programs from my own programs, reproducing some of what L4Re already provides but in a form more amenable to integrating with my own framework. If the existing L4Re code had been more obviously adaptable in a number of different ways throughout my long process of investigation and development for it, I might have been able to take some significant shortcuts and save myself a lot of effort. I suppose, however, that I am somewhat wiser about the technologies and techniques involved, which might be beneficial in its own way. The next step, then, was to figure out how to detect and handle the termination of programs that I had managed to launch.

In the existing L4Re framework, a component called Ned is capable of launching programs, although not being able to see quite how I might use it for my own purposes – that being to provide a capable enough shell environment for testing – had led me along my current path of development. It so happens that Ned supports an interface for “parent” tasks that is used by created or “child” tasks, and when a program terminates, the general support code for the program that is brought along by the C library includes the invocation of an operation on this parent interface before the program goes into a “wait forever” state. Handling this operation and providing this interface seemed to be the most suitable approach for replicating this functionality in my own code.

Consolidation and Modularisation

Before going any further, I wanted to consolidate my existing work which had demonstrated program launching in a program written specifically for that purpose, bringing along some accompanying abstractions that were more general in nature. First of all, I decided to try and make a library from the logic of the demonstration program I had written, so that the work involved in setting up the environment and resources for a new program could be packaged up and re-used. I also wanted the functionality to be available through a separate server component, so that programs wanting to start other programs would not need to incorporate this functionality but could instead make a request to this separate “process server” component to do the work, obtaining a reference to the new program in response.

One might wonder why one might bother introducing a separate component to start programs on another program’s behalf. As always when considering the division of functionality between components in a microkernel-based system, it is important to remember that components can have different configurations that afford them different levels of privilege within a system. We might want to start programs with one level of privilege from other programs with a different level of privilege. Another benefit of localising program launching in one particular component is that it might provide an overview of such activities across a number of programs, thus facilitating support for things like job and process control.

Naturally, an operating system does not need to consolidate all knowledge about running programs or processes in one place, and in a modular microkernel-based system, there need not even be a single process server. In fact, it seems likely that if we preserve the notion of a user of the system, each user might have their own process server, and maybe even more than one of them. Such a server would be configured to launch new programs in a particular way, having access only to resources available to a particular user. One interesting possibility is that of being able to run programs provided by one filesystem that then operate on data provided by another filesystem. A program would not be able to see the filesystem from which it came, but it would be able to see the contents of a separate, designated filesystem.

Region Mapper Deficiencies

A few things conspired to make the path of progress rather less direct than it might have been. Having demonstrated the launching of trivial programs, I had decided to take a welcome break from the effort. Returning to the effort, I decided to test access to files served up by my filesystem infrastructure, and this caused programs to fail. In order to support notification events when accessing files, I employ a notification thread to receive such events from other components, but the initialisation of threading in the C library was failing. This turned out to be due to the use of a region mapper operation that I had not yet supported, so I had to undertake a detour to implement an appropriate data structure in the region mapper, which in C++ is not a particularly pleasant experience.

Later on, the region mapper caused me some other problems. I had neglected to implement the detach operation, which I rely on quite heavily for my file access library. Attempting to remedy these problems involved reacquainting myself with the region mapper interface description which is buried in one of the L4Re packages, not to be confused with plenty of other region mapper abstractions which do not describe the actual interface employed by the IPC mechanism. The way that L4Re has abandoned properly documented interface descriptions is very annoying, requiring developers to sift through pages of barely commented code and to be fully aware of the role of that code. I implemented something that seemed to work, quite sure that I still did not have all the details correct in my implementation, and this suspicion would prove correct later on.

Local and Non-Local Capabilities

Another thing that I had not fully understood, when trying to put together a library handling IPC that I could tolerate working with, was the way that capabilities may be transferred in IPC messages within tasks. Capabilities are references to components in the system, and when transferred between tasks, the receiving task is meant to allocate a “slot” for each received capability. By choosing a slot denoted by an index, the task (or the program running in it) can tell the kernel where to record the capability in its own registry for the task, and by employing this index in its own registry, the program will be able to maintain a record of available capabilities consistent with that of the kernel.

The practice of allocating capability slots for received capabilities is necessary for transfers between tasks, but when the transfer occurs within a task, there is no need to allocate a new slot: the received capability is already recorded within the task, and so the item describing the capability in the message will actually encode the capability index known to the task. Previously, I was not generally sending capabilities in messages within tasks, and so I had not knowingly encountered any issues with my simplistic “general case” support for capability transfers, but having implemented a region mapper that resides in the same task as a program being run, it became necessary to handle the capabilities presented to the region mapper from within the same task.

One counterintuitive consequence of the capability management scheme arises from the general, inter-task transfer case. When a task receives a capability from another task, it will assign a new index to the capability ahead of time, since the kernel needs to process this transfer as it propagates the message. This leaves the task with a new capability without any apparent notion of whether it has seen that capability before. Maybe there is a way of asking the kernel if two capabilities refer to the same object, but it might be worthwhile just not relying on such facilities and designing frameworks around such restrictions instead.

Starting and Stopping

So, back to the exercise of stopping programs that I had been able to start! It turned out that receiving the notification that a program had finished was only the start; what then needed to happen was something of a mystery. Intuitively, I knew that the task hosting the program’s threads would need to be discarded, but I envisaged that the threads themselves probably needed to be discarded first, since they are assigned to the task and probably cannot have that task removed from under them, even if they are suspended in some sense.

But what about everything else referenced by the task? After all, the task will have capabilities for things like dataspaces that provide access to regions of files and to the program stack, for things like the filesystem for opening other files, for semaphore and IRQ objects, and so on. I cannot honestly say that I have the definitive solution, and I could not easily find much in the way of existing guidance, so I decided in the end to just try and tidy all the resources up as best I could, hopefully doing enough to make it possible to release the task and have the kernel dispose of it. This entailed a fairly long endeavour that also encouraged me to evolve the way that the monitoring of the process termination is performed.

When the started program eventually reaches the end and sends a message to its “parent” component, that component needs to record any termination state communicated in the message so that it may be passed on to the program’s creator or initiator, and then it also needs to commence the work of wrapping up the program. Here, I decided on a distinct component separate from one responsible for any paging activities to act as the contact point for the creating or initiating program. When receiving a termination message or signal, this component disconnects the terminating program from its internal pager by freeing the capability, and this then causes the internal pager to terminate, itself sending a signal to its own parent.

One important aspect of starting and terminating processes is that of notifying the party that sought to start a process in the first place. For filesystem operations, I had already implemented support for certain notification events related to opening, modifying and closing files and pipes, with these being particularly important for pipes. I wanted to extend this support to processes so that it might be possible to monitor files, pipes and processes together using a kind of select or poll operation. This led to a substantial detour where I became dissatisfied with the existing support, modified it, had to debug it, and remain somewhat concerned that it might need more work in the future.

Testing on the different architectures under QEMU also revealed that I would need to handle the possibility that a program might be started and run to completion before its initiator had even received a reference to the program for notification purposes. Fortunately, a similar kind of vanishing resource problem arose when I was developing the file paging system, and so I had a technique available to communicate the reference to the process monitor component to the initiator of the program, ensuring that the process monitor becomes established in the kernel’s own records, before the program itself gets started, runs and completes, avoiding the process monitor being tidied up before its existence becomes known to the wider system.

Wrapping Up Again

A few concerns remain with the state of the work so far. I experienced problems with filesystem access that I traced to the activity of repeatedly attaching and detaching dataspaces, which is something my filesystem access library does deliberately, but the error suggested that the L4Re region mapper had somehow failed to attach the appropriate region. This may well be caused by issues within my own code, and my initial investigation did indeed uncover a problem in my own code where the size of the attached region of a file would gradually increase over time. With this mistake fixed, the situation was improved, but the underlying problem was not completely eliminated, judging from occasional errors. A workaround has been adopted for now.

Various other problems arose and were hopefully resolved. I would say that some of them were due to oversights when getting things done takes precedence over a more complete consideration of all the issues, particularly when working in a language like C++ where lower-level chores like manual memory management enter the picture. The differing performance when emulating various architectures under QEMU also revealed a deficiency with my region mapper implementation. It turned out that detach operations were not returning successfully, leading the L4Re library function to return without invalidating memory pages, and so my file access operations were returning pages of incorrect content instead of the expected file content for the first few accesses until the correct pages had been paged in and were almost continuously resident.

Here, yet more digging around in the L4Re code revealed an apparent misunderstanding about the return value associated with one of the parameters to the detach operation, that of the detached dataspace. I had concluded that a genuine capability was meant to be returned, but it seems that a simple index value is returned in a message word instead of a message item, and so there is no actual capability transferred to the caller, not even a local one. The L4Re IPC framework does not really make the typing semantics very clear, or at least not to me, and the code involved is quite unfathomable. Again, a formal interface specification written in a clearly expressed language would have helped substantially.

Next Steps

I suppose progress of sorts has been made in the last month or so, for which I can be thankful. Although tidying up the detritus of my efforts will remain an ongoing task, I can now initiate programs and wait for them to finish, meaning that I can start building up test suites within the environment, combining programs with differing functionality in a Unix-like fashion to hopefully validate the behaviour of the underlying frameworks and mechanisms.

Now, I might have tried much of this with L4Re’s Lua-based scripting, but it is not as straightforward as a more familiar shell environment, appearing rather more low-level in some ways, and it is employed in a way that seems to favour parallel execution instead of the sequential execution that I might desire when composing tests: I want tests to involve programs whose results feed into subsequent programs, as opposed to just running a load of programs at once. Also, without more extensive documentation, the Lua-based scripting support remains a less attractive choice than just building something where I get to control the semantics. Besides, I also need to introduce things like interprocess pipes, standard input and output, and such things familiar from traditional software platforms. Doing that for a simple shell-like environment would be generally beneficial, anyway.

Should I continue to make progress, I would like to explore some of the possibilities hinted at above. The modular architecture of a microkernel-based system should allow a more flexible approach in partitioning the activities of different users, along with the configuration of their programs. These days, so much effort is spent in “orchestration” and the management of containers, with a veritable telephone directory of different technologies and solutions competing for the time and attention of developers who are now also obliged to do the work of deployment specialists and systems administrators. Arguably, much of that involves working around the constraints of traditional systems instead of adapting to those systems, with those systems themselves slowly adapting in not entirely convincing or satisfactory ways.

I also think back to my bachelor’s degree dissertation about mobile software agents where the idea was that untrusted code might be transmitted between systems to carry out work in a safe and harmless fashion. Reducing the execution environment of such agent programs to a minimum and providing decent support for monitoring and interacting with them would be something that might be more approachable using the techniques explored in this endeavour. Pervasive, high-speed, inexpensively-accessed networks undermined the envisaged use-cases for mobile agents in general, although the practice of issuing SQL queries to database servers or having your browser run JavaScript programs deployed in Web pages demonstrates that the general paradigm is far from obsolete.

In any case, my “to do” list for this project will undoubtedly remain worryingly long for the foreseeable future, but I will hopefully be able to remedy shortcomings, expand the scope and ambition of the effort, and continue to communicate my progress. Thank you to those who have made it to the end of this rather dry article!

Considering Unexplored Products of the Past: Formulating a Product

Friday, February 10th, 2023

Previously, I described exploring the matter of developing emulation of a serial port, along with the necessary circuitry, for Elkulator, an emulator for the Acorn Electron microcomputer, motivated by a need to provide a way of transferring files into and out of the emulated computer. During this exploration, I had discovered some existing software that had been developed to provide some level of serial “filing system” support on the BBC Microcomputer – the higher-specification sibling of the Electron – with the development of this software having been motivated by an unforeseen need to transfer software to a computer without any attached storage devices.

This existing serial filing system software was a good indication that serial communications could provide the basis of a storage medium. But instead of starting from a predicament involving computers without usable storage facilities, where an unforeseen need motivates the development of a clever workaround, I wanted to consider what such a system might have been like if there had been a deliberate plan from the very beginning to deploy computers that would rely on a serial connection for all their storage needs. Instead of having an implementation of the filing system in RAM, one could have the luxury of putting it into a ROM chip that would be fitted in the computer or in an expansion, and a richer set of features might then be contemplated.

A Smarter Terminal

Once again, my interest in the historical aspects of the technology provided some guidance and some inspiration. When microcomputers started to become popular and businesses and institutions had to decide whether these new products had any relevance to their operations, there was some uncertainty about whether such products were capable enough to be useful or whether they were a distraction from the facilities already available in such organisations. It seems like a lifetime ago now, but having a computer on every desk was not necessarily seen as a guarantee of enhanced productivity, particularly if they did not link up to existing facilities or did not coordinate the work of a number of individuals.

At the start of the 1980s, equipping an office with a computer on every desk and equipping every computer with a storage solution was an expensive exercise. Even disk drives offering only a hundred kilobytes of storage on each removable floppy disk were expensive, and hard disk drives were an especially expensive and precious luxury that were best shared between many users. Some microcomputers were marketed as multi-user systems, encouraging purchasers to connect terminals to them and to share those precious resources: precisely the kind of thing that had been done with minicomputers and mainframes. Such trends continued into the mid-1980s, manifested by products promoted by companies with mainframe origins, such companies perpetuating entrenched tendencies to frame computing solutions in certain ways.

Terminals themselves were really just microcomputers designed for the sole purpose of interacting with a “host” computer, and institutions already operating mainframes and minicomputers would have experienced the need to purchase several of them. Until competition intensified in the terminal industry, such products were not particularly cheap, with the DEC VT220 introduced in 1983 costing $1295 at its introduction. Meanwhile, interest in microcomputers and the possibility of distributing some kinds of computing activity to these new products, led to experimentation in some organisations. Some terminal manufacturers responded by offering terminals that also ran microcomputer software.

Much of the popular history of microcomputing, familiar to anyone who follows such topics online, particularly through YouTube videos, focuses on adoption of such technology in the home, with an inevitable near-obsession with gaming. The popular history of institutional adoption often focuses on the upgrade parade from one generation of computer to the next. But there is a lesser told history involving the experimentation that took place at the intersection of microcomputing and minicomputing or mainframe computing. In universities, computers like the BBC Micro were apparently informally introduced as terminals for other systems, terminal ROMs were developed and shared between institutions. However, there seems to have been relatively little mainstream interest in such software as fully promoted commercial products, although Acornsoft – Acorn’s software outlet – did adopt such a ROM to sell as their Termulator product.

The Acorn Electron, introduced at £199, had a “proper” keyboard and the ability to display 80 columns of text, unlike various other popular microcomputers. Indeed, it may have been the lowest-priced computer to be able to display 80 columns of relatively high definition text as standard, such capabilities requiring extra cards for machines like the Apple II and the Commodore 64. Considering the much lower price of such a computer, the ongoing experimentation underway at the time with its sibling machine on alternative terminal solutions, and the generally favourable capabilities of both these machines, it seems slightly baffling that more was not done to pursue opportunities to introduce a form of “intelligent terminal” or “hybrid terminal” product to certain markets.

VIEW in 80 columns on the Acorn Electron.

VIEW in 80 columns on the Acorn Electron.

None of this is to say that institutional users would have been especially enthusiastic. In some institutions, budgets were evidently generous enough that considerable sums of money would be spent acquiring workstations that were sometimes of questionable value. But in others, the opportunity to make savings, to explore other ways of working, and perhaps also to explicitly introduce microcomputing topics such as software development for lower-specification hardware would have been worthy of some consideration. An Electron with a decent monochrome monitor, like the one provided with the M2105, plus some serial hardware, could have comprised a product sold for perhaps as little as £300.

The Hybrid Terminal

How would a “hybrid terminal” solution work, how might it have been adopted, and what might it have been used for? Through emulation and by taking advantage of the technological continuity in multi-user systems from the 1980s to the present day, we can attempt to answer such questions. Starting with communications technologies familiar in the world of the terminal, we might speculate that a serial connection would be the most appropriate and least disruptive way of interfacing a microcomputer to a multi-user system.

Although multi-user systems, like those produced by Digital Equipment Corporation (DEC), might have offered network connectivity, it is likely that such connectivity was proprietary, expensive in terms of the hardware required, and possibly beyond the interfacing capabilities of most microcomputers. Meanwhile, Acorn’s own low-cost networking solution, Econet, would not have been directly compatible with these much higher-end machines. Acorn’s involvement in network technologies is also more complicated than often portrayed, but as far as Econet is concerned, only much later machines would more conveniently bridge the different realms of Econet and standards-based higher-performance networks.

Moreover, it remains unlikely that operators and suppliers of various multi-user systems would have been enthusiastic about fitting dedicated hardware and installing dedicated software for the purpose of having such systems communicate with third-party computers using a third-party network technology. I did find it interesting that someone had also adapted Acorn’s network filing system that usually runs over Econet to work instead over a serial connection, which presumably serves files out of a particular user account. Another discovery I made was a serial filing system approach by someone who had worked at Acorn who wanted to transfer files between a BBC Micro system and a Unix machine, confirming that such functionality was worth pursuing. (And there is also a rather more complicated approach involving more exotic Acorn technology.)

Indeed, to be successful, a hybrid terminal approach would have to accommodate existing practices and conventions as far as might be feasible in order to not burden or disturb the operators of these existing systems. One motivation from an individual user’s perspective might be to justify introducing a computer on their desk, to be able to have it take advantage of the existing facilities, and to augment those facilities where it might be felt that they are not flexible or agile enough. Such users might request help from the operators, but the aim would be to avoid introducing more support hassles, which would easily arise if introducing a new kind of network to the mix. Those operators would want to be able to deploy something and have it perform a role without too much extra thought.

I considered how a serial link solution might achieve this. An existing terminal would be connected to, say, a Unix machine and be expected to behave like a normal client, allowing the user to log into their account. The microcomputer would send some characters down the serial line to the Unix “host”, causing it to present the usual login prompt, and the user would then log in as normal. They would then have the option of conducting an interactive session, making their computer like a conventional terminal, but there would also be the option of having the Unix system sit in the background, providing other facilities on request.

Logging into a remote service via a serial connection.

Logging into a remote service via a serial connection.

The principal candidates for these other facilities would be file storage and printing. Both of these things were centrally managed in institutions, often available via the main computing service, and the extensible operating system of the Electron and related microcomputers invites the development of software to integrate the core support for these facilities with such existing infrastructure. Files would be loaded from the user’s account on the multi-user system and saved back there again. Printing would spool the printed data to files somewhere in the user’s home directory for queuing to centralised printing services.

Attempting an Implementation

I wanted to see how such a “serial computing environment” would work in practice, how it would behave, what kinds of applications might benefit, and what kind of annoyances it might have. After all, it might be an interesting idea or a fun idea, but it need not be a particularly good one. The first obstacle was that of understanding how the software elements would work, primarily on the Electron itself, from the tasks that I would want the software to perform down to the way the functionality would be implemented. On the host or remote system, I was rather more convinced that something could be implemented since it would mostly be yet another server program communicating over a stream, with plenty of modern Unix conveniences to assist me along the way.

As it turned out, my investigations began with a trip away from home and the use of a different, and much more constrained, development environment involving an ARM-based netbook. Fortunately, Elkulator and the different compilers and tools worked well enough on that development hardware to make the exercise approachable. Another unusual element was that I was going to mostly rely on the original documentation in the form of the actual paper version of the Acorn Electron Advanced User Guide for information on how to write the software for the Electron. It was enlightening coming back to this book after a few decades for assistance on a specific exercise, even though I have perused the book many times in its revised forms online, because returning to it with a focus on a particular task led me to find that the documentation in the book was often vague or incomplete.

Although the authors were working in a different era and presumably under a degree of time pressure, I feel that the book in some ways exhibits various traits familiar to those of us working in the software industry, these indicating a lack of rigour and of sufficient investment in systems documentation. For this, I mostly blame the company who commissioned the work and then presumably handed over some notes and told the authors to fill in the gaps. As if to strengthen such perceptions of hurriedness and lack of review, it also does not help that “system” is mis-spelled “sysem” in a number of places in the book!

Nevertheless, certain aspects of the book were helpful. The examples, although focusing on one particular use-case, did provide helpful detail in deducing the correct way of using certain mechanisms, even if they elected to avoid the correct way of performing other tasks. Acorn’s documentation had a habit of being “preachy” about proper practices, only to see its closest developers ignore those practices, anyway. Eventually, on returning from my time away, I was able to fill in some of the gaps, although by this time I had a working prototype that was able to do basic things like initiate a session on the host system and to perform some file-related operations.

There were, and still are, a lot of things that needed, and still need, improvement with my implementation. The way that the operating system needs to be extended to provide extra filing system functionality involves plenty of programming interfaces, plenty of things to support, and also plenty of opportunities for things to go wrong. The VIEW word processor makes use of interfaces for both whole-file loading and saving as well as random-access file operations. Missing out support for one or the other will probably not yield the desired level of functionality.

There are also intricacies with regard to switching printing on and off – this typically being done using control characters sent through the output stream – and of “spool” files which capture character output. And filing system ROMs need to be initialised through a series of “service calls”, these being largely documented, but the overall mechanism is left largely undescribed in the documentation. It is difficult enough deciphering the behaviour of the Electron’s operating system today, with all the online guidance available in many forms, so I cannot imagine how difficult it would have been as a third party to effectively develop applications back in the day.

Levels of Simulation

To support the activities of the ROM software in the emulated Electron, I had to develop a server program running on my host computer. As noted above, this was not onerous, especially since I had already written a program to exercise the serial communications and to interact with the emulated serial port. I developed this program further to respond to commands issued by my ROM, performing host operations and returning results. For example, the CAT command produces a “catalogue” of files in a host directory, and so my server program performs a directory listing operation, collects the names of the files, and then sends them over the virtual serial link to the ROM for it to display to the user.

To make the experience somewhat authentic and to approximate to an actual deployment environment, I included a simulation of the login prompt so that the user of the emulated Electron would have to log in first, with the software also having to deal with a logged out (or not yet logged in) condition in a fairly graceful way. To ensure that they are logged in, a user selects the Serial Computing Environment using the *SCE command, this explicitly selecting the serial filing system, and the login dialogue is then presented if the user has not yet logged into the remote host. Once logged in, the ROM software should be able to test for the presence of the command processor that responds to issued commands, only issuing commands if the command processor has signalled its presence.

Although this models a likely deployment environment, I wanted to go a bit further in terms of authenticity, and so I decided to make the command processor a separate program that would be installed in a user account on a Unix machine. The user’s profile script would be set up to run the command processor, so that when they logged in, this program would automatically run and be ready for commands. I was first introduced to such practices in my first workplace where a menu-driven, curses-based program I had written was deployed so that people doing first-line technical support could query the database of an administrative system without needing to be comfortable with the Unix shell environment.

For complete authenticity I would actually want to have the emulated Electron contact a Unix-based system over a physical serial connection, but for now I have settled for an arrangement whereby a pseudoterminal is created to run the login program, with the terminal output presented to the emulator. Instead of seeing a simulated login dialogue, the user now interacts with the host system’s login program, allowing them to log into a real account. At that point, the command processor is invoked by the shell and the user gets back control.

Obtaining a genuine login dialogue from a Unix system.

Obtaining a genuine login dialogue from a Unix system.

To prevent problems with certain characters, the command processor configures the terminal to operate in raw mode. Apart from that, it operates mostly as it did when run together with the login simulation which did not have to concern itself with such things as terminals and login programs.

Some Applications

This effort was motivated by the need or desire to be able to access files from within Elkulator, particularly from applications such as VIEW. Naturally, VIEW is really just one example from the many applications available for the Electron, but since it interacts with a range of functionality that this serial computing environment provides, it serves to showcase such functionality fairly well. Indeed, some of the screenshots featured in this and the previous article show VIEW operating on text that was saved and loaded over the serial connection.

Accessing files involves some existing operating system commands, such as *CAT (often abbreviated to *.) to list the catalogue of a storage medium. Since a Unix host supports hierarchical storage, whereas the Electron’s built-in command set only really addresses the needs of a flat storage medium (as provided by various floppy disk filing systems for Electron and BBC Micro), the *DIR command has been introduced from Acorn’s hierarchical filing systems (such as ADFS) to navigate between directories, which is perhaps confusing to anyone familiar with other operating systems, such as the different variants of DOS and their successors.

Using catalogue and directory traversal commands.

Using catalogue and directory traversal commands.

VIEW allows documents to be loaded and saved in a number of ways, but as a word processor it also needs to be able to print these documents. This might be done using a printer connected to a parallel port, but it makes a bit more sense to instead allow the serial printer to be selected and for printing to occur over the serial connection. However, it is not sufficient to merely allow the operating system to take over the serial link and to send the printed document, if only because the other side of this link is not a printer! Indeed, the command processor is likely to be waiting for commands and to see the incoming data as ill-formed input.

The chosen solution was to intercept attempts to send characters to a serial printer, buffering them and then sending the buffered data in special commands to the command processor. This in turn would write the printed characters to a “spool” file for each printing session. From there, these files could be sent to an appropriate printer. This would give the user rather more control over printing, allowing them to process the printout with Unix tools, or to select one particular physical printer out of the many potentially available in an organisation. In the VIEW environment, and in the MOS environment generally, there is no built-in list of printers or printer selection dialogue.

Since the kinds of printers anticipated for use with VIEW might well have been rather different from the kinds connected to multi-user systems, it is likely that some processing would be desirable where different text styles and fonts have been employed. Today, projects like PrinterToPDF exist to work with old-style printouts, but it is conceivable that either the “printer driver generator” in the View suite or some postprocessing tool might have been used to produce directly printable output. With unstyled text, however, the printouts are generally readable and usable, as the following excerpt illustrates.

               A  brief report on the experience
               of using VIEW as a word processor
               four decades on.

Using VIEW on the Acorn  Electron  is  an  interesting  experience  and  a
glimpse  into  the  way  word  processing  was  once done. Although I am a
dedicated user of Vim, I am under no  illusions  of  that  program's  word
processing  capabilities: it is deliberately a screen editor based on line
editor  heritage,  and  much  of  its  operations  are  line-oriented.  In
contrast, VIEW is intended to provide printed output: it presents the user
with a  ruler  showing  the  page margins and tab stops, and it even saves
additional   rulers   into  the  stored  document   in   their   on-screen
representations. Together with its default typewriter-style  behaviour  of
allowing  the  cursor  to  be moved into empty space and of overwriting or
replacing text, there is a quaint feel to it.

Since VIEW is purely text-based, I can easily imagine converting its formatting codes to work with troff. That would then broaden the output options. Interestingly, the Advanced User Guide was written in VIEW and then sent to a company for typesetting, so perhaps a workflow like this would have been useful for the authors back then.

A major selling point of the Electron was its provision of BBC BASIC as the built-in language. As the BBC Micro had started to become relatively widely adopted in schools across the United Kingdom, a less expensive computer offering this particular dialect of BASIC was attractive to purchasers looking for compatibility with school computers at home. Obviously, there is a need to be able to load and save BASIC programs, and this can be done using the serial connection.

Loading a BASIC program from the Unix host.

Loading a BASIC program from the Unix host.

Beyond straightforward operations like these, BASIC also provides random-access file operations through various keywords and constructs, utilising the underlying operating system interfaces that invoke filing system operations to perform such work. VIEW also appears to use these operations, so it seems sensible not to ignore them, even if many programmers might have preferred to use bulk transfer operations – the standard load and save – to get data in and out of memory quickly.

A BASIC program reading and showing a file.

A BASIC program reading and showing a file.

Interactions between printing, the operating system’s own spooling support, outputting characters and reading and writing data are tricky. A degree of experimentation was required to make these things work together. In principle, it should be possible to print and spool at the same time, even with output generated by the remote host that has been sent over the serial line for display on the Electron!

Of course, as a hybrid terminal, the exercise would not be complete without terminal functionality. Here, I wanted to avoid going down another rabbit hole and implementing a full terminal emulator, but I still wanted to demonstrate the invocation of a shell on the Unix host and the ability to run commands. To show just another shell session transcript would be rather dull, so here I present the perusal of a Python program to generate control codes that change the text colour on the Electron, along with the program’s effects:

Interaction with the shell featuring multiple text colours.

Interaction with the shell featuring multiple text colours.

As a bitmapped terminal, the Electron is capable of much more than this. Although limited to moderate resolutions by the standards of the fanciest graphics terminals even of that era, there are interesting possibilities for Unix programs and scripts to generate graphical output.

A chart generated by a Python program showing workstation performance results.

A chart generated by a Python program showing workstation performance results.

Sending arbitrary character codes requires a bit of terminal configuration magic so that line feeds do not get translated into other things (the termios manual page is helpful, here, suggesting the ONLCR flag as the culprit), but the challenge, as always, is to discover the piece of the stack of technologies that is working against you. Similar things can be said on the Electron as well, with its own awkward confluence of character codes for output and output control, requiring the character output state to be tracked so that certain values do not get misinterpreted in the wrong context.

Others have investigated terminal connectivity on Acorn’s 8-bit microcomputers and demonstrated other interesting ways of producing graphical output from Unix programs. Acornsoft’s Termulator could even emulate a Tektronix 4010 graphical terminal. Curiously, Termulator also supported file transfer between a BBC Micro and the host machine, although only as a dedicated mode and limited to ASCII-only text files, leaving the hybrid terminal concept unexplored.

Reflections and Remarks

I embarked on this exercise with some cautiousness, knowing that plenty of uncertainties lay ahead in implementing a functional piece of software, and there were plenty of frustrating moments as some of the different elements of the rather underdocumented software stack conspired to produce undesirable behaviour. In addition, the behaviour of my serial emulation code had a confounding influence, requiring some low-level debugging (tracing execution within the emulator instruction by instruction, noting the state of the emulated CPU), some slowly dawning realisations, and some adjustments to hopefully make it work in a more cooperative fashion.

There are several areas of potential improvement. I first programmed in 6502 assembly language maybe thirty-five years ago, and although I managed to get some sprite and scrolling routines working, I never wrote any large programs, nor had to interact with the operating system frameworks. I personally find the 6502 primitive, rigid, and not particularly conducive to higher-level programming techniques, and I found myself writing some macros to take away the tedium of shuffling values between registers and the stack, constantly aware of various pitfalls with regard to corrupting registers.

My routines extending the operating system framework possibly do not do things the right way or misunderstand some details. That, I will blame on the vague documentation as well as any mistakes made micromanaging the registers. Particularly frustrating was the way that my ROM code would be called with interrupts disabled in certain cases. This made implementation challenging when my routines needed to communicate over the serial connection, when such communication itself requires interrupts to be enabled. Quite what the intention of the MOS designers was in such circumstances remains something of a mystery. While writing this article, I realised that I could have implemented the printing functionality in a different way, and this might have simplified things, right up to the point where I saw, thanks to the debugger provided by Elkulator, that the routines involved are called – surprise! – with interrupts disabled.

Performance could be a lot better, with this partly due to my own code undoubtedly requiring optimisation. The existing software stack is probably optimised to a reasonable extent, but there are various persistent background activities that probably steal CPU cycles unnecessarily. One unfortunate contributor to performance limitations is the hardware architecture of the Electron. Indeed, I discovered while testing in one of the 80-column display modes that serial transfers were not reliable at the default transfer rate of 9600 baud, instead needing to be slowed down to only 2400 baud. Some diagnosis confirmed that the software was not reading the data from the serial chip quickly enough, causing an overflow condition and data being lost.

Motivated by cost reduction and product positioning considerations – the desire to avoid introducing a product that might negatively affect BBC Micro sales – the Electron was deliberately designed to use a narrow data bus to fewer RAM chips than otherwise would have been used, with a seemingly clever technique being employed to allow the video circuitry to get the data at the desired rate to produce a high-resolution or high-bandwidth display. Unfortunately, the adoption of the narrow data bus, facilitated by the adoption of this particular technique, meant that the CPU could only ever access RAM at half its rated speed. And with the narrow data bus, the video circuitry effectively halts the CPU altogether for a substantial portion of its time in high-bandwidth display modes. Since serial communications handling relies on the delivery and handling of interrupts, if the CPU is effectively blocked from responding quickly enough, it can quickly fall behind if the data is arriving and the interrupts are occurring too often.

That does raise the issue of reliability and of error correction techniques. Admittedly, this work relies on a reliable connection between the emulated Electron and the host. Some measures are taken to improve the robustness of the communication when messages are interrupted so that the host in particular is not left trying to send or receive large volumes of data that are no longer welcome or available, and other measures are taken to prevent misinterpretation of stray data received in a different and thus inappropriate context. I imagine that I may have reinvented the wheel badly here, but these frustrations did provide a level of appreciation of the challenges involved.

Some Broader Thoughts

It is possible that Acorn, having engineered the Electron too aggressively for cost, made the machine less than ideal for the broader range of applications for which it was envisaged. That said, it should have been possible to revise the design and produce a more performant machine. Experiments suggest that a wider data path to RAM would have helped with the general performance of the Electron, but to avoid most of the interrupt handling problems experienced with the kind of application being demonstrated here, the video system would have needed to employ its existing “clever” memory access technique in conjunction with that wider data path so as to be able to share the bandwidth more readily with the CPU.

Contingency plans should have been made to change or upgrade the machine, if that had eventually been deemed necessary, starting at the point in time when the original design compromises were introduced. Such flexibility and forethought would also have made a product with a longer appeal to potential purchasers, as opposed to a product that risked being commercially viable for only a limited period of time. However, it seems that the lessons accompanying such reflections on strategy and product design were rarely learned by Acorn. If lessons were learned, they appear to have reinforced a particular mindset and design culture.

Virtue is often made of the Acorn design philosophy and the sometimes rudely expressed and dismissive views of competing technologies that led the company to develop the ARM processor. This approach enabled comparatively fast and low-cost systems to be delivered by introducing a powerful CPU to do everything in a system from running applications to servicing interrupts for data transfers, striving for maximal utilisation of the available memory bandwidth by keeping the CPU busy. That formula worked well enough at the low end of the market, but when the company tried to move upmarket once again, its products were unable to compete with those of other companies. Ultimately, this sealed the company’s fate, even if more fortuitous developments occurred to keep ARM in the running.

(In the chart shown earlier demonstating graphical terminal output and illustrating workstation performance, circa 1990, Acorn’s R260 workstation is depicted as almost looking competitive until one learns that the other workstations depicted arrived a year earlier and that the red bar showing floating-point performance only applies to Acorn’s machine three years after its launch. It would not be flattering to show the competitors at that point in history, nor would it necessarily be flattering to compare whole-system performance, either, if any publication sufficiently interested in such figures had bothered to do so. There is probably an interesting story to be told about these topics, particularly how Acorn’s floating-point hardware arrived so late, but I doubt that there is the same willingness to tell it as there is to re-tell the usual celebratory story of ARM for the nth time.)

Acorn went on to make the Communicator as a computer that would operate in a kind of network computing environment, relying on network file servers to provide persistent storage. It reused some of the technology in the Electron and the BT Merlin M2105, particularly the same display generator and its narrow data bus to RAM, but ostensibly confining that aspect of the Electron’s architecture to a specialised role, and providing other facilities for applications and, as in the M2105, for interaction with peripherals. Sadly, the group responsible in Acorn had already been marginalised and eventually departed, apparently looking to pursue the concept elsewhere.

As for this particular application of an old computer and a product that was largely left uncontemplated, I think there probably was some mileage in deploying microcomputers in this way, even outside companies like Acorn where such computers were being developed and used, together with software development companies with their own sophisticated needs, where minicomputers like the DEC VAX would have been available for certain corporate or technical functions. Public (or semi-public) access terminals were fairly common in universities, and later microcomputers were also adopted in academia due to their low cost and apparently sufficient capabilities.

Although such adoption appears to have focused on terminal applications, it cannot have been beyond the wit of those involved to consider closer integration between the microcomputing and multi-user environments. In further and higher education, students will have had microcomputing experience and would have been able to leverage their existing skills whilst learning new ones. They might have brought their microcomputers along with them, giving them the opportunity to transfer or migrate their existing content – their notes, essays, programs – to the bright and emerging new world of Unix, as well as updating their expertise.

As for updating my own expertise, it has been an enlightening experience in some ways, and I may well continue to augment the implemented functionality, fix and improve things, and investigate the possibilities this work brings. I hope that this rather lengthy presentation of the effort has provided insights into experiences of the past that was and the past that might have been.

Considering Unexplored Products of the Past: Emulating an Expansion

Wednesday, February 8th, 2023

In the last couple of years, possibly in common with quite a few other people, certainly people of my vintage, and undoubtedly those also interested in retrocomputing, I have found myself revisiting certain aspects of my technological past. Fortunately, sites like the Internet Archive make this very easy indeed, allowing us to dive into publications from earlier eras and to dredge up familiar and not so familiar magazine titles and other documentation. And having pursued my retrocomputing interest for a while, participating in forums, watching online videos, even contributing to new software and hardware developments, I have found myself wanting to review some of the beliefs and perceptions that I and other people have had of the companies and products we grew up with.

One of the products of personal interest to me is the computer that got me and my brother started with writing programs (as well as playing games): the Acorn Electron, a product of Acorn Computers of Cambridge in the United Kingdom. Much can be said about the perceived chronology of this product’s development and introduction, the actual chronology, and its impact on its originator and on wider society, but that surely deserves a separate treatment. What I can say is that reviewing the archives and other knowledge available to us now can give a deeper understanding of the processes involved in the development of the Electron, the technological compromises made, and the corporate strategy that led to its creation and eventually its discontinuation.

By Bilby - Own work, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=10957142

The Acorn Electron
(Picture attribution: By BilbyOwn work, CC BY 3.0, Link)

It has been popular to tell simplistic narratives about Acorn Computers, to reduce its history to a few choice moments as the originator of the BBC Microcomputer and the ARM processor, but to do so is to neglect a richer and far more interesting story, even if the fallibility of some of the heroic and generally successful characters involved may be exposed by telling some of that story. And for those who wonder how differently some aspects of computing history might have turned out, exploring that story and the products involved can be an adventure in itself, filling in the gaps of our prior experiences with new insights, realisations and maybe even glimpses into opportunities missed and what might have been if things had played out differently.

At the Rabbit Hole

Reading about computing history is one thing, but this tale is about actually doing things with old software, emulation, and writing new software. It started off with a discussion about the keyboard shortcuts for a word processor and the differences between the keyboards on the Acorn Electron and its higher-specification predecessor, the BBC Microcomputer. Having acquainted myself with the circuitry of the Electron, how its keyboard is wired up, and how the software accesses it, I was obviously intrigued by these apparent differences, but I was also intrigued by the operation of the word processor in question, Acornsoft’s VIEW.

Back in the day, as people like to refer to the time when these products were first made available, such office or productivity applications were just beyond my experience. Although it was slightly fascinating to read about them, most of my productive time was spent writing programs, mostly trying to write games. I had actually seen an office suite written by Psion on the ACT Sirius 1 in the early 1980s, but word processors were the kind of thing that people used in offices or, at the very least, by people who had a printer so that they could print the inevitable letters that everyone would be needing to write.

Firing up an Acorn Electron emulator, specifically Elkulator, I discovered that one of the participants in the discussion was describing keyboard shortcuts that didn’t match up to those that were described in a magazine article from the era, these appearing correct as I tried them out for myself. It turned out that the discussion participant in question was using the BBC Micro version of VIEW on the Electron and was working around the mismatch in keyboard layouts. Although all of this was much ado about virtually nothing, it did two things. Firstly, it made me finally go in and fix Elkulator’s keyboard configuration dialogue, and secondly, it made me wonder how convenient it would be to explore old software in a productive way in an emulator.

Reconciling Keyboards

Having moved to Norway many years ago now, I use a Norwegian keyboard layout, and this has previously been slightly problematic when using emulators for older machines. Many years ago, I used and even contributed some minor things to another emulator, ElectrEm, which had a nice keyboard configuration dialogue. The Electron’s keyboard corresponds to certain modern keyboards pretty well, at least as far as the alphanumeric keys are concerned. More challenging are the symbols and control-related keys, in particular the Electron’s special Caps Lock/Function key which sits where many people now have their Tab key.

Obviously, there is a need to be able to tell an emulator which keys on a modern keyboard are going to correspond to the keys on the emulated machine. Being derived from an emulator for the BBC Micro, however, Elkulator’s keyboard configuration dialogue merely presented a BBC Micro keyboard on the screen and required the user to guess which “Beeb” key might correspond to an Electron one. Having put up with this situation for some time, I finally decided to fix this once and for all. The process of doing so is not particularly interesting, so I will spare you the details of doing things with the Allegro toolkit and the Elkulator source code, but I was mildly pleased with the result:

The revised keyboard configuration dialogue in Elkulator.

The revised keyboard configuration dialogue in Elkulator.

By also adding support for redefining the Break key in a sensible way, I was also finally able to choose a key that desktop environments don’t want to interfere with: F12 might work for Break, but Ctrl-F12 makes KDE/Plasma do something I don’t want, and yet Ctrl-Break is quite an important key combination when using an Electron or BBC Micro. Why Break isn’t a normal key on these machines is another story in itself, but here is an example of redefining it and even allowing multiple keys on a modern keyboard to act as Break on the emulated computer:

Redefining the Break key in Elkulator.

Redefining the Break key in Elkulator.

Being able to confidently choose and use keys made it possible to try out VIEW in a more natural way. But this then led to another issue: how might I experiment with such software productively? It would be good to write documents and to be able to extract them from the emulator, rather than see them disappear when the emulator is closed.

Real and Virtual Machines

One way to get text out of a system, whether it is a virtual system like the emulated Electron or a real machine, is to print it. I vaguely remembered some support for printing from Elkulator and was reminded by my brother that he had implemented such support himself a while ago as a quick way of getting data out of the emulated system. But I also wanted to be able to get data into the emulated system as well, and the parallel interface typically used by the printer is not bidirectional on the Electron. So, I would need to look further for a solution.

It is actually the case that Elkulator supports reading from and writing to disk (or disc) images. The unexpanded Electron supports read/write access to cassettes (or tapes), but Elkulator does not support writing to tapes, probably because the usability considerations are rather complicated: one would need to allow the user to control the current position on a tape, and all this would do is to remind everyone how inconvenient tapes are. Meanwhile, writing to disk images would be fairly convenient within the emulator, but then one would need to use tools to access the files within the images outside the emulator.

Some emulators for various systems also support the notion of a host filesystem (or filing system) where some special support has been added to make the emulated machine see another peripheral and to communicate with it, this peripheral really being a program on the host machine (the machine that is running the emulator). I could have just written such support, although it would also have needed some software support written for the emulated machine as well, but this approach would have led me down a path of doing something specific to emulation. And I have a principle of sorts which is that if I am going to change the way an emulated machine behaves, it has to be rooted in some kind of reality and not just enhance the emulated machine in a way that the original, “real” machine could not have been.

Building on Old Foundations

As noted earlier, I have an interest in the way that old products were conceived and the roles for which those products were intended by their originators. The Electron was largely sold as an unexpanded product, offering only power, display and cassette ports, with a general-purpose expansion connector being the gateway to anything else that might have been added to the system later. This was perceived somewhat negatively when the machine was launched because it was anticipated that buyers would probably, at the very least, want to plug joysticks into the Electron to play games. Instead, Acorn offered an expansion unit, the Plus 1, that cost another £60 which provided joystick, printer and cartridge connectors.

But this flexibility in expanding the machine meant that it could have been used as the basis for a fairly diverse range of specialised products. In fact, one of the Acorn founders, Chris Curry, enthused about the Electron as a platform for such products, and one such product did actually make it to market, in a way: the BT Merlin M2105 messaging terminal. This terminal combined the Electron with an expansion unit containing circuitry for communicating over a telephone line, a generic serial communications port, a printer port, as well as speech synthesis circuitry and a substantial amount of read-only memory (ROM) for communications software.

Back in the mid-1980s, telecommunications (or “telecoms”) was the next big thing, and enthusiasm for getting a modem and dialling up some “online” service or other (like Prestel) was prevalent in the computing press. For businesses and institutions, there were some good arguments for adopting such technologies, but for individuals the supposed benefits were rather dulled by the considerable costs of acquiring the hardware, buying subscriptions, and the notoriously high telephone call rates of the era. Only the relatively wealthy or the dedicated few pursued this side of data communications.

The M2105 reportedly did some service in the healthcare sector before being repositioned for commercial applications. Along with its successor product, the Acorn Communicator, it enjoyed a somewhat longer lifespan in certain enterprises. For the standard Electron and its accompanying expansions, support for basic communications capabilities was evidently considered important enough to be incorporated into the software of the Plus 1 expansion unit, even though the Plus 1 did not provide any of the specific hardware capabilities for communication over a serial link or a telephone line.

It was this apparently superfluous software capability that I revisited when I started to think about getting files in and out of the emulator. When emulating an Electron with Plus 1, this serial-capable software is run by the emulator, just as it is by a real Electron. On a real system of this kind, a cartridge could be added that provides a serial port and the necessary accompanying circuitry, and the system would be able to drive that hardware. Indeed, such cartridges were produced decades ago. So, if I could replicate the functionality of a cartridge within the emulator, making some code that pretends to be a serial communications chip (or UART) that has been interfaced to the Electron, then I would in principle be able to set up a virtual serial connection between the emulated Electron and my modern host computer.

Emulated Expansions

Modifying Elkulator to add support for serial communications hardware was fairly straightforward, with only a few complications. Expansion hardware on the Electron is generally accessible via a range of memory addresses that actually signal peripherals as opposed to reading and writing memory. The software provided by the Plus 1 expansion unit is written to expect the serial chip to be accessible via a range of memory locations, with the serial chip accepting values sent to those locations and producing values from those locations on request. The “memory map” through which the chip is exposed in the Electron corresponds directly to the locations or registers in the serial chip – the SCN2681 dual asynchronous receiver/transmitter (DUART) – as described by its datasheet.

In principle, all that is needed is to replicate the functionality described by the datasheet. With this done, the software will drive the chip, the emulated chip will do what is needed, and the illusion will be complete. In practice, a certain level of experimentation is needed to fill in the gaps left by the datasheet and any lack of understanding on the part of the implementer. It did help that the Plus 1 software has been disassembled – some kind of source code regenerated from the binary – so that the details of its operation and its expectations of the serial chip’s operation can be established.

Moreover, it is possible to save a bit of effort by seeing which features of the chip have been left unused. However, some unused features can be provided with barely any extra effort: the software only drives one serial port, but the chip supports two in largely the same way, so we can keep support for two just in case there is a need in future for such capabilities. Maybe someone might make a real serial cartridge with two ports and want to adapt the existing software, and they could at least test that software under emulation before moving to real hardware.

It has to be mentioned that the Electron’s operating system, known as the Machine Operating System or MOS, is effectively extended by the software provided in the Plus 1 unit. Even the unexpanded machine provides the foundations for adding serial communications and printing capabilities in different ways, and the Plus 1 software merely plugs into that framework. A different kind of serial chip would be driven by different software but it would plug into the same framework. At no point does anyone have to replace the MOS with a patched version, which seems to be the kind of thing that happens with some microcomputers from the same era.

Ultimately, what all of this means is that having implemented the emulated serial hardware, useful things can already be done with it within the bare computing environment provided by the MOS. One can set the output stream to use the serial port and have all the text produced by the system and programs sent over the serial connection. One can select the serial port for the input stream and send text to the computer instead of using the keyboard. And printing over the serial connection is also possible by selecting the appropriate printer type using a built-in system command.

In Elkulator, I chose to expose the serial port via a socket connection, with the emulator binding to a Unix domain socket on start-up. I then wrote a simple Python program to monitor the socket, to show any data being sent from the emulator and to send any input from the terminal to the emulator. This permitted the emulated machine to be operated from a kind of remote console and for the emulated machine to be able to print to this console. At last, remote logins are possible on the Electron! Of course, such connectivity was contemplated and incorporated from the earliest days of these products.

Filing Options

If the goal of all of this had been to facilitate transfers to and from the emulated machine, this might have been enough, but a simple serial connection is not especially convenient to use. Although a method of squirting a file into the serial link at the Electron could be made convenient for the host computer, at the other end one has to have a program to do something with that file. And once the data has arrived, would it not be most convenient to be able to save that data as a file? We just end up right back where we started: having some data inside the Electron and nowhere to put it! Of course, we could enable disk emulation and store a file on a virtual disk, but then it might just have been easier to make disk image handling outside the emulator more convenient instead.

It seemed to me that the most elegant solution would be to make the serial link act as the means through which the Electron accesses files. That instead of doing ad-hoc transfers of data, such data would be transferred as part of operations that are deliberately accessing files. Such ambitions are not unrealistic, and here I could draw on my experience with the platform, having acquired the Acorn Electron Advanced User Guide many, many years ago, in which there are details of implementing filing system ROMs. Again, the operating system had been designed to be extended in order to cover future needs, and this was one of them.

In fact, I had not been the only one to consider a serial filing system, and I had been somewhat aware of another project to make software available via a serial link to the BBC Micro. That project had been motivated by the desire to be able to get software onto that computer where no storage devices were otherwise available, even performing some ingenious tricks to transfer the filing system software to the machine and to have that software operate from RAM. It might have been tempting merely to use this existing software with my emulated serial port, to get it working, and then to get back to trying out applications, loading and saving, and to consider my work done. But I had other ideas in mind…

Gradual Explorations of Filesystems, Paging and L4Re

Thursday, June 30th, 2022

A surprising three years have passed since my last article about my efforts to make a general-purpose filesystem accessible to programs running in the L4 (or L4Re) Runtime Environment. Some of that delay was due to a lack of enthusiasm about blogging for various reasons, much more was due to having much of my time occupied by full-time employment involving other technologies (Python and Django mostly, since you ask) that limited the amount of time and energy that could be spent focusing on finding my way around the intricacies of L4Re.

In fact, various other things I looked into in 2019 (or maybe 2018) also went somewhat unreported. I looked into trying to port the “user mode” (UX) variant of the Fiasco.OC microkernel to the MIPS architecture used by the MIPS Creator CI20. This would have allowed me to conveniently develop and test L4Re programs in the GNU/Linux environment on that hardware. I did gain some familiarity with the internals of that software, together with the Linux ptrace mechanism, making some progress but not actually getting to a usable conclusion. Recommendations to use QEMU instead led me to investigate the situation with KVM on MIPS, simply to try and get half-way reasonable performance: emulation is otherwise rather slow.

You wouldn’t think that running KVM on anything other than Intel/AMD or ARM architectures were possible if you only read the summary on the KVM project page or the Debian Wiki’s KVM page. In fact, KVM is supported on multiple architectures including MIPS, but the latest (and by now very old 3.18) “official” kernel for the CI20 turned out to be too old to support what I needed. Or at least, I tried to get it to work but even with all the necessary configuration to support “trap and emulate” on a CPU without virtualisation support, it seemed to encounter instructions it did not emulate. As the hot summer of 2019 (just like 2018) wound down, I switched back to using my main machine at the time: an ancient Pentium 4 system that I didn’t want heating the apartment; one that could run QEMU rather slowly, albeit faster than the CI20, but which gave me access to Fiasco.OC-UX once again.

Since then, the hard yards of upstreaming Linux kernel device support for the CI20 has largely been pursued by the ever-patient Nikolaus Schaller, vendor of the Letux 400 mini-notebook and hardware designer of the Pyra, and a newer kernel capable of running KVM satisfactorily might now be within reach. That is something to be investigated somewhere in the future.

Back to the Topic

In my last article on the topic of this article, I had noted that to take advantage of various features that L4Re offers, I would need to move on from providing a simple mechanism to access files through read and write operations, instead embracing the memory mapping paradigm that is pervasive in L4Re, adopting such techniques to expose file content to programs. This took us through a tour of dataspaces, mapping, pages, flexpages, virtual memory and so on. Ultimately, I had a few simple programs working that could still read and write to files, but they would be doing so via a region of memory where pages of this memory would be dynamically “mapped” – made available – and populated with file content. I even integrated the filesystem “client” library with the Newlib C library implementation, but that is another story.

Nothing is ever simple, though. As I stressed the test programs, introducing concurrent access to files, crashes would occur in the handling of the pages issued to the clients. Since I had ambitiously decided that programs accessing the same files would be able to share memory regions assigned to those files, with two or more programs being issued with access to the same memory pages if they happened to be accessing the same areas of the underlying file, I had set myself up for the accompanying punishment: concurrency errors! Despite the heroic help of l4-hackers mailing list regulars (Frank and Jean), I had to concede that a retreat, some additional planning, and then a new approach would be required. (If nothing else, I hope this article persuades some l4-hackers readers that their efforts in helping me are not entirely going to waste!)

Prototyping an Architecture

In some spare time a couple of years ago, I started sketching out what I needed to consider when implementing such an architecture. Perhaps bizarrely, given the nature of the problem, my instinct was to prototype such an architecture in Python, running as a normal program on my GNU/Linux system. Now, Python is not exactly celebrated for its concurrency support, given the attention its principal implementation, CPython, has often had for a lack of scalability. However, whether or not the Python implementation supports running code in separate threads simultaneously, or whether it merely allows code in threads to take turns running sequentially, the most important thing was that I could have code happily running along being interrupted at potentially inconvenient moments by some other code that could conceivably ruin everything.

Fortunately, Python has libraries for threading and provides abstractions like semaphores. Such facilities would be all that was needed to introduce concurrency control in the different program components, allowing the simulation of the mechanisms involved in acquiring memory pages, populating them, issuing them to clients, and revoking them. It may sound strange to even consider simulating memory pages in Python, which operates at another level entirely, and the issuing of pages via a simulated interprocess communication (IPC) mechanism might seem unnecessary and subject to inaccuracy, but I found it to be generally helpful in refining my approach and even deepening my understanding of concepts such as flexpages, which I had applied in a limited way previously, making me suspect that I had not adequately tested the limits of my understanding.

Naturally, L4Re development is probably never done in Python, so I then had the task of reworking my prototype in C++. Fortunately, this gave me the opportunity to acquaint myself with the more modern support in the C++ standard libraries for threading and concurrency, allowing me to adopt constructs such as mutexes, condition variables and lock guards. Some of this exercise was frustrating: C++ is, after all, a lower-level language that demands more attention to various mundane details than Python does. It did suggest potential improvements to Python’s standard library, however, although I don’t really pay any attention to Python core development any more, so unless someone else has sought to address such issues, I imagine that Python will gain even more in the way of vanity features while such genuine deficiencies and omissions remain unrecognised.

Transplanting the Prototype

Upon reintroducing this prototype functionality into L4Re, I decided to retain the existing separation of functionality into various libraries within the L4Re build system – ones for filesystem clients, servers, IPC – also making a more general memory abstractions library, but I ultimately put all these libraries within a single package. At some point, it is this package that I will be making available, and I think that it will be easier to evaluate with all the functionality in a single bundle. The highest priority was then to test the mechanisms employed by the prototype using the same concurrency stress test program, this originally being written in Python, then ported to C++, having been used in my GNU/Linux environment to loosely simulate the conditions under L4Re.

This stress testing exercise eventually ended up working well enough, but I did experience issues with resource limits within L4Re as well as some concurrency issues with capability management that I should probably investigate further. My test program opens a number of files in a number of threads and attempts to read various regions of these files over and over again. I found that I would run out of capability slots, these tracking the references to other components available to a task in L4Re, and since each open file descriptor or session would require a slot, as would each thread, I had to be careful not to exceed the default budget of such slots. Once again, with help from another l4-hackers participant (Philipp), I realised that I wasn’t releasing some of the slots in my own code, but I also learned that above a certain number of threads, open files, and so on, I would need to request more resources from the kernel. The concurrency issue with allocating individual capability slots remains unexplored, but since I already wrap the existing L4Re functionality in my own library, I just decided to guard the allocation functionality with semaphores.

With some confidence in the test program, which only accesses simulated files with computed file content, I then sought to restore functionality accessing genuine files, these being the read-only files already exposed within L4Re along with ext2-resident files previously supported by my efforts. The former kind of file was already simulated in the prototype in the form of “host” files, although L4Re unhelpfully gives an arbitary (counter) value for the inode identifiers for each file, so some adjustments were required. Meanwhile, introducing support for the latter kind of file led me to update the bundled version of libext2fs I am using, refine various techniques for adapting the upstream code, introduce more functionality to help use libext2fs from my own code (since libext2fs can be rather low-level), and to consider the broader filesystem support architecture.

Here is the general outline of the paging mechanism supporting access to filesystem content:

Paging data structures

The data structures employed to provide filesystem content to programs.

It is rather simplistic, and I have practically ignored complicated page replacement algorithms. In practice, pages are obtained for use when a page fault occurs in a program requesting a particular region of file content, and fulfilment of this request will move a page to the end of a page queue. Any independent requests for pages providing a particular file region will also reset the page’s position in the queue. However, since successful accesses to pages will not cause programs to repeatedly request those pages, eventually those pages will move to the front of the queue and be reclaimed.

Without any insight into how much programs are accessing a page successfully, relying purely on the frequency of page faults, I imagine that various approaches can be adopted to try and assess the frequency of accesses, extrapolating from the page fault frequency and seeking to “bias” or “weight” pages with a high frequency of requests so that they move through the queue more slowly or, indeed, move through a queue that provides pages less often. But all of this is largely a distraction from getting a basic mechanism working, so I haven’t directed any more time to it than I have just now writing this paragraph!

Files and File Sessions

While I am quite sure that I ended up arriving at a rather less than optimal approach for the paging architecture, I found that the broader filesystem architecture also needed to be refined further as I restored the functionality that I had previously attempted to provide. When trying to support shared access to file content, it is appropriate to implement some kind of registry of open files, these retaining references to objects that are providing access to each of the open files. Previously, this had been done in a fairly simple fashion, merely providing a thread-safe map or dictionary yielding the appropriate file-related objects when present, otherwise permitting new objects to be registered.

Again, concurrency issues needed closer consideration. When one program requests access to a file, it is highly undesirable for another program to interfere during the process of finding the file, if it exists already, or creating the file, if it does not. Therefore, there must be some kind of “gatekeeper” for the file, enforcing sequential access to filesystem operations involving it and ensuring that any preparatory activities being undertaken to make a file available, or to remove a file, are not interrupted or interfered with. I came up with an architecture looking like this, with a resource registry being the gatekeeper, resources supporting file sessions, providers representing open files, and accessors transferring data to and from files:

Filesystem access data structures

The data structures employed to provide access to the underlying filesystem objects.

I became particularly concerned with the behaviour of the system around file deletion. On Unix systems, it is fairly well understood that one can “unlink” an existing file and keep accessing it, as long as a file descriptor has been retained to access that file. Opening a file with the same name as the unlinked file under such circumstances will create a new file, provided that the appropriate options are indicated, or otherwise raise a non-existent file error, and yet the old file will still exist somewhere. Any new file with the same name can be unlinked and retained similarly, and so on, building up a catalogue of old files that ultimately will be removed when the active file descriptors are closed.

I thought I might have to introduce general mechanisms to preserve these Unix semantics, but the way the ext2 filesystem works largely encodes them to some extent in its own abstractions. In fact, various questions that I had about Unix filesystem semantics and how libext2fs might behave were answered through the development of various test programs, some being normal programs accessing files in my GNU/Linux environment, others being programs that would exercise libext2fs in that environment. Having some confidence that libext2fs would do the expected thing leads me to believe that I can rely on it at least for some of the desired semantics of the eventual system.

The only thing I really needed to consider was how the request to remove a file when that file was still open would affect the “provider” abstraction permitting continued access to the file contents. Here, I decided to support a kind of deferred removal: if a program requested the removal of a file, the provider and the file itself would be marked for removal upon the final closure of the file, but the provider for the file would no longer be available for new usage, and the file would be unlinked; programs already accessing the file would continue to operate, but programs opening a file of the same name would obtain a new file and a new provider.

The key to this working satisfactorily is that libext2fs will assign a new inode identifier when opening a new file, whereas an unlinked file retains its inode identifier. Since providers are indexed by inode identifier, and since libext2fs translates the path of a file to the inode identifier associated with the file in its directory entry, attempts to access a recreated file will always yield the new inode identifier and thus the new file provider.

Pipes, Listings and Notifications

In the previous implementation of this filesystem functionality, I had explored some other aspects of accessing a filesystem. One of these was the ability to obtain directory listings, usually exposed in Unix operating systems by the opendir and readdir functions. The previous implementation sought to expose such listings as files, this in an attempt to leverage the paging mechanisms already built, but the way that libext2fs provides such listing information is not particularly compatible with the random-access file model: instead, it provides something more like an iterator that involves the repeated invocation of a callback function, successively supplying each directory entry for the callback function to process.

For this new implementation, I decided to expose directory listings via pipes, with a server thread accessing the filesystem and, in that callback function, writing directory entries to one end of a pipe, and with a client thread reading from the other end of the pipe. Of course, this meant that I needed to have an implementation of pipes! In my previous efforts, I did implement pipes as a kind of investigation, and one can certainly make this very complicated indeed, but I deliberately kept this very simple in this current round of development, merely having a couple of memory regions, one being used by the reader and one being used by the writer, with each party transferring the regions to the other (and blocking) if they find themselves respectively running out of content or running out of space.

One necessary element in making pipes work is that of coordinating the reading and writing parties involved. If we restrict ourselves to a pipe that will not expand (or even not expand indefinitely) to accommodate more data, at some point a writer may fill the pipe and may then need to block, waiting for more space to become available again. Meanwhile, a reader may find itself encountering an empty pipe, perhaps after having read all available data, and it may need to block and wait for more content to become available again. Readers and writers both need a way of waiting efficiently and requesting a notification for when they might start to interact with the pipe again.

To support such efficient blocking, I introduced a notifier abstraction for use in programs that could be instantiated and a reference to such an instance (in the form of a capability) presented in a subscription request to the pipe endpoint. Upon invoking the wait operation on a notifier, the notifier will cause the program (or a thread within a program) to wait for the delivery of a notification from the pipe, this being efficient because the thread will then sleep, only to awaken if a message is sent to it. Here is how pipes make use of notifiers to support blocking reads and writes:

Communication via pipes employing notifications

The use of notifications when programs communicate via a pipe.

A certain amount of plumbing is required behind the scenes to support notifications. Since programs accessing files will have their own sessions, there needs to be a coordinating object representing each file itself, this being able to propagate notification events to the users of the file concerned. Fortunately, I introduced the notion of a “provider” object in my architecture that can act in such a capacity. When an event occurs, the provider will send a notification to each of the relevant notifier endpoints, also providing some indication of the kind of event occurring. Previously, I had employed L4Re’s IRQ (interrupt request) objects as a means of delivering notifications to programs, but these appear to be very limited and do not allow additional information to be conveyed, as far as I can tell.

One objective I had with a client-side notifier was to support waiting for events from multiple files or streams collectively, instead of requiring a program to have threads that wait for events from each file individually, thus attempting to support the functionality provided by Unix functions like select and poll. Such functionality relies on additional information indicating the kind of event that has occurred. The need to wait for events from numerous sources also inverts the roles of client and server, with a notifier effectively acting like a server but residing in a client program, waiting for messages from its clients, these typically residing in the filesystem server framework.

Testing and Layering

Previously, I found that it was all very well developing functionality, but only through a commitment to testing it would I discover its flaws. When having to develop functionality at a number of levels in a system at the same time, testing generally starts off in a fairly limited fashion. Initially, I reintroduced a “block” server that merely provides access to a contiguous block of data, this effectively simulating storage device access that will hopefully be written at some point, and although genuine filesystem support utilises this block server, it is reassuring to be able to know whether it is behaving correctly. Meanwhile, for programs to access servers, they must send requests to those servers, assisted by a client library that provides support for such interprocess communication at a fairly low level. Thus, initial testing focused on using this low-level support to access the block server and verify that it provides access to the expected data.

On top of the lowest-level library functionality is a more usable level of “client” functions that automates the housekeeping needing to be done so that programs may expect an experience familiar to that provided by traditional C library functionality. Again, testing of file operations at that level helped to assess whether library and server functionality was behaving in line with expectations. With some confidence, the previously-developed ext2 filesystem functionality was reintroduced and updated. By layering the ext2 filesystem server on top of the block server, the testing activity is actually elevated to another level: libext2fs absolutely depends on properly functioning access to the block device; otherwise, it will not be able to perform even the simplest operations on files.

When acquainting myself with libext2fs, I developed a convenience library called libe2access that encapsulates some of the higher-level operations, and I made a tool called e2access that is able to populate a filesystem image from a normal program. This tool, somewhat reminiscent of the mtools suite that was popular at one time to allow normal users to access floppy disks on a system, is actually a fairly useful thing to have, and I remain surprised that there isn’t anything like it in common use. In any case, e2access allows me to populate images for use in L4Re, but I then thought that an equivalent to it would also be useful in L4Re for testing purposes. Consequently, a tool called fsaccess was created, but unlike e2access it does not use libe2access or libext2fs directly: instead, it uses the “client” filesystem library, exercising filesystem access via the IPC system and filesystem server architecture.

Ultimately, testing will be done completely normally using C library functions, these wrapping the “client” library. At that point, there will be no distinction between programs running within L4Re and within Unix. To an extent, L4Re already supports normal Unix-like programs using C library functions, this being particularly helpful when developing all this functionality, but of course it doesn’t support “proper” filesystems or Unix-like functionality in a particularly broad way, with various common C library or POSIX functions being stubs that do nothing. Of course, all this effort started out precisely to remedy these shortcomings.

Paging, Loading and Running Programs

Beyond explicitly performed file access, the next level of mutually-reinforcing testing and development came about through the simple desire to have a more predictable testing environment. In wanting to be able to perform tests sequentially, I needed control over the initiation of programs and to be able to rely on their completion before initiating successive programs. This may well be possible within L4Re’s Lua-based scripting environment, but I generally find the details to be rather thin on the ground. Besides, the problem provided some motivation to explore and understand the way that programs are launched in the environment.

There is some summary-level information about how programs (or tasks) are started in L4Re – for example, pages 41 onwards of “Memory, IPC, and L4Re” – but not much in the way of substantial documentation otherwise. Digging into the L4Re libraries yielded a confusing array of classes and apparent interactions which presumably make sense to anyone who is already very familiar with the specific approach being taken, as well as the general techniques being applied, but it seems difficult for outsiders to distinguish between the specifics and the generalities.

Nevertheless, some ideas were gained from looking at the code for various L4Re components including Moe (the root task), Ned (the init program), the loader and utilities libraries, and the oddly-named l4re_kernel component, this actually providing the l4re program which itself hosts actual programs by providing the memory management functionality necessary for those programs to work. In fact, we will eventually be looking at a solution that replicates that l4re program.

A substantial amount of investigation and testing took place to explore the topic. There were a number of steps required to initialise a new program:

  1. Open the program executable file and obtain details of the different program segments and the program’s start address, this requiring some knowledge of ELF binaries.
  2. Initialise a stack for the program containing the arguments to be presented to it, plus details of the program’s environment. The environment is of particular concern.
  3. Create a task for the program together with a thread to begin execution at the start address, setting the stack pointer to the appropriate place in where the stack should be made available.
  4. Initialise a control block for the thread.
  5. Start the thread. This should immediately generate a page fault because the memory at the start address is not yet available within the task.
  6. Service page faults for the program, providing pages for the program code – thus resolving that initial page fault – as well as for the stack and other regions of memory.

Naturally, each of these steps entails a lot more work than is readily apparent. Particularly the last step is something of an understatement in terms of what is required: the mechanism by which demand paging of the program is to be achieved.

L4Re provides some support for inspecting ELF binaries in its utilities library, but I found the ELF specification to be very useful in determining the exact purposes of various program header fields. For more practical guidance, the OSDev wiki page about ELF provides an explanation of the program loading process, along with how the different program segments are to be applied in the initialisation of a new program or process. With this information to hand, together with similar descriptions in the context of L4Re, it became possible to envisage how the address space of a new program might be set up, determining which various parts of the program file might be installed and where they might be found. I wrote some test programs, making some use of the structures in the utilities library, but wrote my own functions to extract the segment details from an ELF binary.

I found a couple of helpful resources describing the initialisation of the program stack: “Linux x86 Program Start Up” and “How statically linked programs run on Linux”. These mainly demystify the code that is run when a program starts up, setting up a program before the user’s main function is called, giving a degree of guidance about the work required to set up the stack so that such code may perform as expected. I was, of course, also able to study what the various existing L4Re components were doing in this respect, although I found the stack abstractions used to be idiomatic C/C++ bordering on esoteric. Nevertheless, the exercise involves obtaining some memory that can eventually be handed over to the new program, populating that memory, and then making it available to the new program, either immediately or on request.

Although I had already accumulated plenty of experience passing object capabilities around in L4Re, as well as having managed to map memory between tasks by sending the appropriate message items, the exact methods of setting up another task with memory and capabilities had remained mysterious to me, and so began another round of experimentation. What I wanted to do was to take a fairly easy route to begin with: create a task, populate some memory regions containing the program code and stack, transfer these things to the new task (using the l4_task_map function), and then start the thread to run the program, just to see what happened. Transferring capabilities was fairly easily achieved, and the L4Re libraries and frameworks do employ the equivalent of l4_task_map in places like the Remote_app_model class found in libloader, albeit obfuscated by the use of the corresponding C++ abstractions.

Frustratingly, this simple approach did not seem to work for the memory, and I could find only very few cases of anyone trying to use l4_task_map (or its equivalent C++ incantations) to transfer memory. Despite the memory apparently being transferred to the new task, the thread would immediately cause a page fault. Eventually, a page fault is what we want, but that would only occur because no memory would be made available initially, precisely because we would be implementing a demand paging solution. In the case of using l4_task_map to set up program memory, there should be no new “demand” for pages of such memory, this demand having been satisfied in advance. Nevertheless, I decided to try and get a page fault handler to supply flexpages to resolve these faults, also without success.

Having checked and double-checked my efforts, an enquiry on the l4-hackers list yielded the observation that the memory I had reserved and populated had not been configured as “executable”, for use by code in running programs. And indeed, since I had relied on the plain posix_memalign function to allocate that memory, it wasn’t set up for such usage. So, I changed my memory allocation strategy to permit the allocation of appropriately executable memory, and fortunately the problem was solved. Further small steps were then taken. I sought to introduce a region mapper that would attempt to satisfy requests for memory regions occurring in the new program, these occurring because a program starting up in L4Re will perform some setting up activities of its own. These new memory regions would be recognised by the page fault handler, with flexpages supplied to resolve page faults involving those regions. Eventually, it became possible to run a simple, statically-linked program in its own task.

Supporting program loading with an external page fault handler

When loading and running a new program, an external page fault handler makes sure that accesses to memory are supported by memory regions that may be populated with file content.

Up to this point, the page fault handler had been external to the new task and had been supplying memory pages from its own memory regions. Requests for data from the program file were being satisfied by accessing the appropriate region of the file, this bringing in the data using the file’s paging mechanism, and then supplying a flexpage for that part of memory to the program running in the new task. This particular approach compels the task containing the page fault handler to have a memory region dedicated to the file. However, the more elegant solution involves having a page fault handler communicating directly with the file’s pager component which will itself supply flexpages to map the requested memory pages into the new task. And to be done most elegantly, the page fault handler needs to be resident in the same task as the actual program.

Putting the page fault handler and the actual program in the same task demanded some improvements in the way I was setting up tasks and threads, providing capabilities to them, and so on. Separate stacks need to be provided for the handler and the program, and these will run in different threads. Moving the page fault handler into the new task is all very well, but we still need to be able to handle page faults that the “internal” handler might cause, so this requires us to retain an “external” handler. So, the configuration of the handler and program are slightly different.

Another tricky aspect of this arrangement is how the program is configured to send its page faults to the handler running alongside it – the internal handler – instead of the one servicing the handler itself. This requires an IPC gate to be created for the internal handler, presented to it via its configuration, and then the handler will bind to this IPC gate when it starts up. The program may then start up using a reference to this IPC gate capability as its “pager” or page fault handler. You would be forgiven for thinking that all of this can be quite difficult to orchestrate correctly!

Configuring the communication between program and page fault handler

An IPC gate must be created and presented to the page fault handler for it to bind to before it is presented to the program as its “pager”.

Although I had previously been sending flexpages in messages to satisfy map requests, the other side of such transactions had not been investigated. Senders of map requests will specify a “receive window” to localise the placement of flexpages returned from such requests, this being an intrinsic part of the flexpage concept. Here, some aspects of the IPC system became more prominent and I needed to adjust the code generated by my interface description language tool which had mostly ignored the use of message buffer registers, employing them only to control the reception of object capabilities.

More testing was required to ensure that I was successfully able to request the mapping of memory in a particular region and that the supplied memory did indeed get mapped into the appropriate place. With that established, I was then able to modify the handler deployed to the task. Since the flexpage returned by the dataspace (or resource) providing access to file content effectively maps the memory into the receiving task, the page fault handler does not need to explicitly return a valid flexpage: the mapping has already been done. The semantics here were not readily apparent, but this approach appears to work correctly.

The use of an internal page fault handler with a new program

An internal page fault handler satisfies accesses to memory from the program running in the same task, providing it with access to memory regions that may be populated with file content.

One other detail that proved to be important was that of mapping file content to memory regions so that they would not overlap somehow and prevent the correct region from being used to satisfy page faults. Consider the following regions of the test executable file described by the readelf utility (with the -l option):

  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000001000000 0x0000000001000000
                 0x00000000000281a6 0x00000000000281a6  R E    0x1000
  LOAD           0x0000000000028360 0x0000000001029360 0x0000000001029360
                 0x0000000000002058 0x0000000000008068  RW     0x1000

Here, we need to put the first region providing the program code at a virtual address of 0x1000000, having a size of at least 0x281a6, populated with exactly that amount of content from the file. Meanwhile, we need to put the second region at address 0x1029360, having a size of 0x8068, but only filled with 0x2058 bytes of data. Both regions need to be aligned to addresses that are multiples of 0x1000, but their contents must be available at the stated locations. Such considerations brought up two apparently necessary enhancements to the provision of file content: the masking of content so that undefined areas of each region are populated with zero bytes, this being important in the case of the partially filled data region; the ability to support writes to a region without those writes being propagated to the original file.

The alignment details help to avoid the overlapping of regions, and the matter of populating the regions can be managed in a variety of ways. I found that since file content was already being padded at the extent of a file, I could introduce a variation of the page mapper already used to manage the population of memory pages that would perform such padding at the limits of regions defined within files. For read-only file regions, such a “masked” page mapper would issue a single read-only page containing only zero bytes for any part of a file completely beyond the limits of such regions, thus avoiding the allocation of lots of identical pages. For writable regions that are not to be committed to the actual files, a “copied” page mapper abstraction was introduced, this providing copy-on-write functionality where write accesses cause new memory pages to be allocated and used to retain the modified data.

Some packaging up of the functionality into library routines and abstractions was undertaken, although as things stand more of that still needs to be done. I haven’t even looked into support for dynamic library loading, nor am I handling any need to extend the program stack when that is necessary, amongst other things, and I also need to make the process of creating tasks as simple as a function call and probably also expose the process via IPC in order to have a kind of process server. I still need to get back to addressing the lack of convenient support for the sequential testing of functionality.

But I hope that much of the hard work has now already been done. Then again, I often find myself climbing one particular part of this mountain, thinking that the next part of the ascent will be easier, only to find myself confronted with another long and demanding stretch that brings me only marginally closer to the top! This article is part of a broader consolidation process, along with writing some documentation, and this process will continue with the packaging of this work for the historical record if nothing else.

Conclusions and Reflections

All of this has very much been a learning exercise covering everything from the nuts and bolts of L4Re, with its functions and abstractions, through the design of a component architecture to support familiar, intuitive but hard-to-define filesystem functionality, this requiring a deeper understanding of Unix filesystem behaviour, all the while considering issues of concurrency and resource management that are not necessarily trivial. With so much going on at so many levels, progress can be slow and frustrating. I see that similar topics and exercises are pursued in some university courses, and I am sure that these courses produce highly educated people who are well equipped to go out into the broader world, developing systems like these using far less effort than I seem to be applying.

That leads to the usual question of whether such systems are worth developing when one can just “use Linux” or adopt something already under development and aimed at a particular audience. As I note above, maybe people are routinely developing such systems for proprietary use and don’t see any merit in doing the same thing openly. The problem with such attitudes is that experience with the development of such systems is then not broadly cultivated, the associated expertise and the corresponding benefits of developing and deploying such systems are not proliferated, and the average user of technology only gets benefits from such systems in a limited sense, if they even encounter them at all, and then only for a limited period of time, most likely, before the products incorporating such technologies wear out or become obsolete.

In other words, it is all very well developing proprietary systems and celebrating achievements made decades ago, but having reviewed decades of computing history, it is evident to me that achievements that are not shared will need to be replicated over and over again. That such replication is not cutting-edge development or, to use the odious term prevalent in academia, “novel” is not an indictment of those seeking to replicate past glories: it is an indictment of the priorities of those who commercialised them on every prior occasion. As mundane as the efforts described in this article may be, I would hope that by describing them and the often frustrating journey involved in pursuing them, people may be motivated to explore the development of such systems and that techniques that would otherwise be kept as commercial secrets or solutions to assessment exercises might hopefully be brought to a broader audience.

Integrating libext2fs with a Filesystem Framework

Wednesday, February 20th, 2019

Given the content covered by my previous articles, there probably doesn’t seem to be too much that needs saying about the topic covered by this article. Previously, I described the work involved in building libext2fs for L4Re and testing the library, and I described a framework for separating filesystem providers from programs that want to use files. But, as always, there are plenty of little details, detours and learning experiences that help to make the tale longer than it otherwise might have been.

Although this file access framework sounds intimidating, it is always worth remembering that the only exotic thing about the software being written is that it needs to request system resources and to communicate with other programs. That can be tricky in itself in many programming environments, and I have certainly spent enough time trying to figure out how to use the types and functions provided by the many L4Re libraries so that these operations may actually work.

But in the end, these are programs that are run just like any other. We aren’t building things into the kernel and having to conform to a particularly restricted environment. And although it can still be tiresome to have to debug things, particularly interprocess communication (IPC) problems, many familiar techniques for debugging and inspecting program behaviour remain available to us.

A Quick Translation

The test program I had written for libext2fs simply opened a file located in the “rom” filesystem, exposed it to libext2fs, and performed operations to extract content. In my framework, I had directed my attention towards opening and reading files, so it made sense to concentrate on providing this functionality in a filesystem server or “provider”.

Accessing a filesystem server employing a "rom" file for the data

Accessing a filesystem server employing a "rom" file for the data

The user of the framework (shielded from the details by a client library) would request the opening of a file (thus obtaining a file descriptor able to communicate with a dedicated resource object) and then read from the file (causing communication with the resource object and some transfers of data). These operations, previously done in a single program employing libext2fs directly, would now require collaboration by two separate programs.

So, I would need to insert the appropriate code in the right places in my filesystem server and its objects to open a filesystem, search for a file of the given name, and to provide the file data. For the first of these, the test program was doing something like this in the main function:

retval = ext2fs_open(devname, EXT2_FLAG_RW, 0, 0, unix_io_manager, &fs);

In the main function of the filesystem server program, something similar needs to be done. A reference to the filesystem (fs) is then passed to the server object for it to use:

Fs_server server_obj(fs, devname);

When a request is made to open a file, the filesystem server needs to locate the file just as the test program needed to. The code to achieve this is tedious, employing the ext2fs_lookup function and traversing the directory hierarchy. Ultimately, something like this needs to be done to obtain a structure for accessing the file contents:

retval = ext2fs_file_open(_fs, ino_file, ext2flags, &file);

Here, the _fs variable is our reference in the server object to the filesystem structure, the ino_file variable refers to the place in the filesystem where the file is found (the inode), some flags indicate things like whether we are reading and/or writing, and a supplied file variable is set upon the successful opening of the file. In the filesystem server, we want to create a specific object to conduct access to the file:

Fs_object *obj = new Fs_object(file, EXT2_I_SIZE(&inode_file), fsobj, irq);

Here, this resource object is initialised with the file access structure, an indication of the file size, something encapsulating the state of the communication between client and server, and the IRQ object needed for cleaning up (as described in the last article). Meanwhile, in the resource object, the read operation is supported by a pair of libext2fs functions:

ext2fs_file_lseek(_file, _obj.position, EXT2_SEEK_SET, 0);
ext2fs_file_read(_file, _obj.buffer, to_transfer, &read);

These don’t appear next to each other in the actual code, but the first call is used to seek to the indicated position in the file, this having been specified by the client. The second call appears in a loop to read into a buffer an indicated amount of data, returning the amount that was actually read.

In summary, the work done by a collection of function calls appearing together in a single function is now spread out over three places in the filesystem server program:

  • The initialisation is done in the main function as the server starts up
  • The locating and opening of a file in the filesystem is done in the general filesystem server object
  • Reading and writing is done in the file-specific resource object

After initialisation, the performance of each part of the work only occurs upon receiving a distinct kind of message from a client program, of which more details are given below.

The Client Library

Although we cannot yet use the familiar C library functions for accessing files (fopen, fread, fwrite, fclose, and so on), we can employ functions that try to be as friendly. Thus, the following form of program may be used:

char buffer[80];
file_descriptor_t *desc = client_open("test.txt", O_RDONLY);

available = client_read(desc, buffer, 80);
if (available)
    fwrite((void *) buffer, sizeof(char), available, stdout); /* using existing fwrite function */
client_close(desc);

As noted above, the existing fwrite function in L4Re may be used to write file data out to the console. Ultimately, we would want our modified version of the function to be doing this job.

These client library functions resemble lower-level C library functions such as open, read, write, close, and so on. By targeting this particular level of functionality, it is hoped that much of the logic in functions like fopen can be preserved, this logic having to deal with things like mode strings (“r”, “r+”, “w”, and so on) which have little to do with the actual job of transmitting file content around the system.

In order to do their work, the client library functions need to send and receive IPC messages, or at least need to get other functions to deal with this particular work. My approach has been to write a layer of functions that only deals with messaging and that hides the L4-specific details from the rest of the code.

This lower-level layer of functions allows us to treat interprocess interactions like normal function calls, and in this framework those calls would have the following signatures, with the inputs arriving at the server and the outputs arriving back at the client:

  • fs_open: flags, buffer → file size, resource object
  • fs_flush: (no parameters) → (no return values)
  • fs_read: position → available
  • fs_write: position, available → written, file size
Here, the aim is to keep the interprocess interactions as simple and as infrequent as possible, with data buffered in the indicated buffer dataspace, and with reading and writing only occurring when the buffer is read or has been filled by writing. The more friendly semantics therefore need to be supported in the client library functions resting on top of these even-lower-level IPC messaging functions.

The responsibilities of the client library functions can be summarised as follows:

  • client_open: allocate memory for the buffer, obtain a server reference (“capability”) from the program’s environment
  • client_close: deallocate the allocated resources
  • client_flush: invoke fs_flush with any available data, resetting the buffer status
  • client_read: provide data to the caller from its buffer, invoking fs_read whenever the buffer is empty
  • client_write: commit data from the caller into the buffer, invoking fs_write whenever the buffer is full, also flushing the buffer when appropriate

The lack of a fs_close function might seem surprising, but as described in the previous article, the server process is designed to receive a notification when the client process discards a reference to the resource object dedicated to a particular file. So in client_close, we should be able to merely throw away the things acquired by client_open, and the system together with the server will hopefully handle the consequences.

Switching the Backend

Using a conventional file as the repository for file content is convenient, but since the aim is to replace the existing filesystem mechanisms, it would seem necessary to try and get libext2fs to use other ways of accessing the underlying storage. Previously, my considerations had led me to provide a “block” storage layer underneath the filesystem layer. So it made sense to investigate how libext2fs might communicate with a “block server” or “block device” in order to read and write raw filesystem data.

Employing a separate server to provide filesystem data

Employing a separate server to provide filesystem data

Changing the way libext2fs accesses its storage sounds like an ominous task, but fortunately some thought has evidently gone into accommodating different storage types and platforms. Indeed, the library code includes support for things like DOS and Windows, with this functionality evidently being used by various applications on those platforms (or, these days, the latter one, at least) to provide some kind of file browser support for ext2-family filesystems.

The kind of component involved in providing this variety of support is known as an “I/O manager”, and the one that we have been using is known as the “Unix” I/O manager, this employing POSIX or standard C library calls to access files and devices. Now, this may have been adequate until now, but with the requirement that we use the replacement IPC mechanisms to access a block server, we need to consider how a different kind of I/O manager might be implemented to use the client library functions instead of the C library functions.

This exercise turned out to be relatively straightforward and perhaps a little less work than envisaged once the requirements of initialising an io_channel object had been understood, this involving the allocation of memory and the population of a structure to indicate things like the block size, error status, and so on. Beyond this, the principal operations needing support are as follows:

  • open: initialises the io_channel and calls client_open
  • close: calls client_close
  • set block size: sets the block size for transfers, something that gets done at various points in the opening of a filesystem
  • read block: calls client_seek and client_read to obtain data from the block server
  • write block: calls client_seek and client_write to commit data to the block server

It should be noted that the block server largely acts like a single-file filesystem, so the same interface supported by the filesystem server is also supported by the block server. This is how we get away with using the client libraries.

Meanwhile, in the filesystem server code, the only changes required are to declare the new I/O manager, implemented in a separate library package, and to use it instead of the previous one:

retval = ext2fs_open(devname, ext2flags, 0, 0, blockserver_io_manager, &fs);

The Final Trick

By pushing use of the “rom” filesystem further down in the system, use of the new file access mechanisms can be introduced and tested, with the only “unauthentic” aspect of the arrangement being that a parallel set of file access functions is being used instead of the conventional ones. The only thing left to do would be to change the C library to incorporate the new style of file access, probably by incorporating the client library internally, thus switching the C library away from its previous method of accessing files.

With the conventional file abstractions reimplemented, access to files would go via the virtual filesystem and hopefully end up encountering block devices that are able to serve up the needed data directly. And ultimately, we could end up switching back to using the Unix I/O manager with libext2fs.

Introducing the new IPC mechanisms at the C library level

Introducing the new IPC mechanisms at the C library level

Changing things so drastically would also force us to think about maintaining access to the “rom” filesystem through the revised architecture, at least at first, because it happens to provide a very convenient way of getting access to data for use as storage. We could try and implement storage hardware support in order to get round this problem, but that probably isn’t convenient – or would be a distraction – when running L4Re on Fiasco.OC-UX as a kind of hosted version of the software.

Indeed, tackling the C library is probably too much of a challenge at this early stage. Fortunately, there are plenty of other issues to be considered first, with the use of non-standard file access functions being only a minor inconvenience in the broader scheme of things. For instance, how are permissions and user identities to be managed? What about concurrent access to the filesystem? And what mechanisms would need to be provided for grafting filesystems onto a larger virtual filesystem hierarchy? I hope to try and discuss some of these things in future articles.

Filesystem Abstractions for L4Re

Tuesday, February 12th, 2019

In my previous posts, I discussed the possibility of using “real world” filesystems in L4Re, initially considering the nature of code to access an ext2-based filesystem using the library known as libext2fs, then getting some of that code working within L4Re itself. Having previously investigated the nature of abstractions for providing filesystems and file objects to applications, it was inevitable that I would now want to incorporate libext2fs into those abstractions and to try and access files residing in an ext2 filesystem using those abstractions.

It should be remembered that L4Re already provides a framework for filesystem access, known as Vfs or “virtual file system”. This appears to be the way the standard file access functions are supported, with the “rom” part of the filesystem hierarchy being supported by a “namespace filesystem” library that understands the way that the deployed payload modules are made available as files. To support other kinds of filesystem, other libraries must apparently be registered and made available to programs.

Although I am sure that the developers of the existing Vfs framework are highly competent, I found the mechanisms difficult to follow and quite unlike what I expected, which was to see a clear separation between programs accessing files and other programs providing those files. Indeed, this is what one sees when looking at other systems such as Minix 3 and its virtual filesystem. I must also admit that I became tired of having to dig into the code to understand the abstractions in order to supplement the reference documentation for the Vfs framework in L4Re.

An Alternative Framework

It might be too soon to label what I have done as a framework, but at the very least I needed to decide upon a few mechanisms to implement an alternative approach to providing file-like abstractions to programs within L4Re. There were a few essential characteristics to be incorporated:

  • A way of requesting access to a named file
  • The provision of objects maintaining the state of access to an opened file
  • The transmission of file content to file readers and from file writers
  • A way of cleaning up when programs are no longer accessing files

One characteristic that I did want to uphold in any solution was to make programs largely oblivious to the nature of the accessed filesystems. They would navigate a virtual filesystem hierarchy, just as one does in Unix-like systems, with certain directories acting as gateways to devices exposing potentially different filesystems with superficially similar semantics.

Requesting File Access

In a system like L4Re, with notions of clients and servers already prevalent, it seems natural to support a mechanism for requesting access to files that sees a client – a program wanting to access a file – delegating the task of locating that file to a server. How the server performs this task may be as simple or as complicated as we wish, depending on what kind of architecture we choose to support. In an operating system with a “monolithic” kernel, like GNU/Linux, we also see such delegation occurring, with the kernel being the entity having to support the necessary wiring up of filesystems contributing to the virtual filesystem.

So, it makes sense to support an “open” system call just like in other operating systems. The difference here, however, is that since L4Re is a microkernel-based environment, both the caller and the target of the call are mere programs, with the kernel only getting involved to route the call or message between the programs concerned. We would first need to make sure that the program accessing files has a reference (known as a “capability”) to another program that provides a filesystem and can respond to this “open” message. This wiring up of programs is a task for the system’s configuration file, as featured in some of my previous articles.

We may now consider what the filesystem-providing program or filesystem “server” needs to do when receiving an “open” message. Let us consider the failure to find the requested file: the filesystem server would, in such an event, probably just return a response indicating failure without any real need to do anything else. It is in the case of a successful lookup that the response to the caller or client needs some more consideration: the server could indicate success, but what is the client going to do with such information? And how should the server then facilitate further access to the file itself?

Providing Objects for File Access

It becomes gradually clearer that the filesystem server will need to allocate some resources for the client to conduct its activities, to hold information read from the filesystem itself and to hold data sent for writing back to the opened file. The server could manage this within a single abstraction and support a range of different operations, accommodating not only requests to open files but also operations on the opened files themselves. However, this might make the abstraction complicated and also raise issues around things like concurrency.

What if this server object ends up being so busy that while waiting for reading or writing operations to complete on a file for one program, it leaves other programs queuing up to ask about or gain access to other files? It all starts to sound like another kind of abstraction would be beneficial for access to specific files for specific clients. Consequently, we end up with an arrangement like this:

Accessing a filesystem and a resource

Accessing a filesystem and a resource

When a filesystem server receives an “open” message and locates a file, it allocates a separate object to act as a contact point for subsequent access to that file. This leaves the filesystem object free to service other requests, with these separate resource objects dealing with the needs of the programs wanting to read from and write to each individual file.

The underlying mechanisms by which a separate resource object is created and exposed are as follows:

  1. The instantiation of an object in the filesystem server program holding the details of the accessed file.
  2. The creation of a new thread of execution in which the object will run. This permits it to handle incoming messages concurrently with the filesystem object.
  3. The creation of an “IPC gate” for the thread. This effectively exposes the object to the wider environment as what often appears to be known as a “kernel object” (rather confusingly, but it simply means that the kernel is aware of it and has to do some housekeeping for it).

Once activated, the thread created for the resource is dedicated to listening for incoming messages and handling them, invoking methods on the resource object as a proxy for the client sending those messages to achieve the same effect.

Transmitting File Content

Although we have looked at how files manifest themselves and may be referenced, the matter of obtaining their contents has not been examined too closely so far. A program might be able to obtain a reference to a resource object and to send it messages and receive responses, but this is not likely to be sufficient for transferring content to and from the file. The reason for this is that the messages sent between programs – or processes, since this is how we usually call programs that are running – are deliberately limited in size.

Thus, another way of exchanging this data is needed. In a situation where we are reading from a file, what we would most likely want to see is a read operation populate some memory for us in our process. Indeed, in a system like GNU/Linux, I imagine that the Linux kernel shuttles the file data from the filesystem module responsible to an area of memory that it has reserved and exposed to the process. In a microkernel-based system, things have to be done more “collaboratively”.

The answer, it would seem to me, is to have dedicated memory that is shared between processes. Fortunately, and arguably unsurprisingly, L4Re provides an abstraction known as a “dataspace” that provides the foundation for such sharing. My approach, then, involves requesting a dataspace to act as a conduit for data, making the dataspace available to the file-accessing client and the file-providing server object, and then having a protocol to notify each other about data being sent in each direction.

I considered whether it would be most appropriate for the client to request the memory or whether the server should do so, eventually concluding that the client could decide how much space it would want as a buffer, handing this over to the server to use to whatever extent it could. A benefit of doing things this way is that the client may communicate initialisation details when it contacts the server, and so it becomes possible to transfer a filesystem path – the location of a file from the root of the filesystem hierarchy – without it being limited to the size of an interprocess message.

Opening a file using a path written to shared memory

Opening a file using a path written to shared memory

So, the “open” message references the newly-created dataspace, and the filesystem server reads the path written to the dataspace’s memory so that it may use it to locate the requested file. The dataspace is not retained by the filesystem object but is instead passed to the resource object which will then share the memory with the application or client. As described above, a reference to the resource object is returned in the response to the “open” message.

It is worthwhile to consider the act of reading from a file exposed in this way. Although both client (the application in the above diagram) and server (resource object) should be able to access the shared “buffer”, it would not be a good idea to let them do so freely. Instead, their roles should be defined by the protocol employed for communication with one another. For a simple synchronous approach it would look like this:

Reading data from a resource via a shared buffer

Reading data from a resource via a shared buffer

Here, upon the application or client invoking the “read” operation (in other words, sending the “read” message) on the resource object, the resource is able to take control of the buffer, obtaining data from the file and writing it to the buffer memory. When it is done, its reply or response needs to indicate the updated state of the buffer so that the client will know how much data there is available, potentially amongst other things of interest.

Cleaning Up

Many of us will be familiar with the workflow of opening, reading and writing, and closing files. This final activity is essential not only for our own programs but also for the system, so that it does not tie up resources for activities that are no longer taking place. In the filesystem server, for the resource object, a “close” operation can be provided that causes the allocated memory to be freed and the resource object to be discarded.

However, merely providing a “close” operation does not guarantee that a program would use it, and we would not want a situation where a program exits or crashes without having invoked this operation, leaving the server holding resources that it cannot safely discard. We therefore need a way of cleaning up after a program regardless of whether it sees the need to do so itself.

In my earliest experiments with L4Re on the MIPS Creator CI20, I had previously encountered the use of interrupt request (IRQ) objects, in that case signalling hardware-initiated events such as the pressing of physical switches. But I also knew that the IRQ abstraction is employed more widely in L4Re to allow programs to participate in activities that would normally be the responsibility of the kernel in a monolithic architecture. It made me wonder whether there might be interrupts communicating the termination of a process that could then be used to clean up afterwards.

One area of interest is that concerning the “IPC gate” mentioned above. This provides the channel through which messages are delivered to a particular running program, and up to this point, we have considered how a resource object has its own IPC gate for the delivery of messages intended for it. But it also turns out that we can also enable notifications with regard to the status of the IPC gate via the same mechanism.

By creating an IRQ object and associating it with a thread as the “deletion IRQ”, when the kernel decides that the IPC gate is no longer needed, this IRQ will be delivered. And the kernel will make this decision when nothing in the system needs to use the IPC gate any more. Since the IPC gate was only created to service messages from a single client, it is precisely when that client terminates that the kernel will realise that the IPC gate has no other users.

Resource deletion upon the termination of a client

Resource deletion upon the termination of a client

To enable this to actually work, however, a little trick is required: the server must indicate that it is ready to dispose of the IPC gate whenever necessary, doing so by decreasing the “reference count” which tracks how many things in the system are using the IPC gate. So this is what happens:

  1. The IPC gate is created for the resource thread, and its details are passed to the client, exposing the resource object.
  2. An IRQ object is bound to the thread and associated with the IPC gate deletion event.
  3. The server decreases its reference count, relinquishing the IPC gate and allowing its eventual destruction.
  4. The client and server communicate as desired.
  5. Upon the client terminating, the kernel disassociates the client from the IPC gate, decreasing the reference count.
  6. The kernel notices that the reference count is zero and sends an IRQ telling the server about the impending IPC gate deletion.
  7. The resource thread in the server deallocates the resource object and terminates.
  8. The IPC gate is deleted.

Using the “gate label”, the thread handling communications for the resource object is able to distinguish between the interrupt condition and normal messages from the client. Consequently, it is able to invoke the appropriate cleaning up routine, to discard the resource object, and to terminate the thread. Hopefully, with this approach, resource objects will no longer be hanging around long after their clients have disappeared.

Other Approaches

Another approach to providing file content did also occur to me, and I wondered whether this might have been a component of the “namespace filesystem” in L4Re. One technique for accessing files involves mapping the entire file into memory using a “mmap” function. This could be supported by requesting a dataspace of a suitable size, but only choosing to populate a region of it initially.

The file-accessing program would attempt to access the memory associated with the file, and upon straying outside the populated region, some kind of “fault” would occur. A filesystem server would have the job of handling this fault, fetching more data, allocating more memory pages, mapping them into the file’s memory area, and disposing of unwanted pages, potentially writing modified pages to the appropriate parts of the file.

In effect, the filesystem server would act as a pager, as far as I can tell, and I believe it to be the case that Moe – the root task – acts in such a way to provide the “rom” files from the deployed payload modules. Currently, I don’t find it particularly obvious from the documentation how I might implement a pager, and I imagine that if I choose to support such things, I will end up having to trawl the code for hints on how it might be done.

Client Library Functions

To present a relatively convenient interface to programs wanting to use files, some client library functions need to be provided. The intention with these is to support the traditional C library paradigms and for these functions to behave like those that C programmers are generally familiar with. This means performing interprocess communications using the “open”, “read”, “write”, “close” and other messages when necessary, hiding the act of sending such messages from the library user.

The details of such a client library are probably best left to another article. With some kind of mechanism in place for accessing files, it becomes a matter of experimentation to see what the demands of the different operations are, and how they may be tuned to reduce the need for interactions with server objects, hopefully allowing file-accessing programs to operate efficiently.

The next article on this topic is likely to consider the integration of libext2fs with this effort, along with the library functionality required to exercise and test it. And it will hopefully be able to report some real experiences of accessing ext2-resident files in relatively understandable programs.

Using ext2 Filesystems with L4Re

Tuesday, February 5th, 2019

Previously, I described my initial investigations into libext2fs and the development of programs to access and populate ext2/3/4 filesystems. With a program written and now successfully using libext2fs in my normal GNU/Linux environment, the next step appeared to be the task of getting this library to work within the L4Re system. The following steps were envisaged:

  1. Figuring out the code that would be needed, this hopefully being supportable within L4Re.
  2. Introducing the software as a package within L4Re.
  3. Discovering the configuration required to build the code for L4Re.
  4. Actually generating a library file.
  5. Testing the library with a program.

This process is not properly completed in that I do not yet have a good way of integrating with the L4Re configuration and using its details to configure the libext2fs code. I felt somewhat lazy with regard to reconciling the use of autotools with the rather different approach taken to build L4Re, which is somewhat reminiscent of things like Buildroot and OpenWrt in certain respects.

So, instead, I built the Debian package from source in my normal environment, grabbed the config.h file that was produced, and proceeded to use it with a vastly simplified Makefile arrangement, also in my normal environment, until I was comfortable with building the library. Indeed, this exercise of simplified building also let me consider which portions of the libext2fs distribution would really be needed for my purposes. I did not really fancy having to struggle to build files that would ultimately be superfluous.

Still, as I noted, this work isn’t finished. However, it is useful to document what I have done so far so that I can subsequently describe other, more definitive, work.

Making a Package

With a library that seemed to work with the archiving program, written to populate filesystems for eventual deployment, I then set about formulating this simplified library distribution as a package within L4Re. This involves a few things:

  • Structuring the files so that the build system may process them.
  • Persuading the build system to install things in places for other packages to find.
  • Formulating the appropriate definitions to build the source files (and thus producing the right compiler and linker invocations).
Here are some notes about the results.

The Package Structure

Currently, I have the following arrangement inside the pkg/libext2fs directory:

include
include/libblkid
include/libe2p
include/libet
include/libext2fs
include/libsupport
include/libuuid
lib
lib/libblkid
lib/libe2p
lib/libet
lib/libext2fs
lib/libsupport
lib/libuuid

To follow L4Re conventions, public header files have been moved into the include hierarchy. This breaks assumptions in the code, with header files being referenced without a prefix (like “ext2fs”, “et”, “e2p”, and so on) in some places, but being referenced with such a prefix in others. The original build system for the code gets away with this by using the “ext2fs” and other prefixes as the directory names containing the code for the different libraries. It then indicates the parent “lib” directory of these directories as the place to start looking for headers.

But I thought it worthwhile to try and map out the header usage and distinguish between public and private headers. At the very least, it helps me to establish the relationships between the different components involved. And I may end up splitting the different components into their own packages, requiring some formalisation of their interactions.

Meanwhile, I defined a Control file to indicate what the package provides:

provides: libblkid libe2p libet libext2fs libsupport libuuid

This appears to be used in dependency resolution, causing the package to be built if another package requires one of the named entities in its own Control file.

Header File Locations

In each include subdirectory (such as include/libext2fs) is a Makefile indicating a couple of things, the following being used for libext2fs:

PKGNAME = libext2fs
CONTRIB_HEADERS = 1

The effect of this is to install the headers into a include/contrib/libext2fs directory in the build output.

In the corresponding lib subdirectory (which is lib/libext2fs), the following seems to be needed:

CONTRIB_INCDIR = libext2fs

Hopefully, with this, other packages can depend on libext2fs and have the headers made available to it by an include statement like this:

#include <ext2fs/ext2fs.h>

(The ext2fs prefix is provided by a directory inside include/libext2fs.)

Otherwise, headers may end up being put in a special “l4” hierarchy, and then code would need changing to look something like this:

#include <l4/ext2fs/ext2fs.h>

So, avoiding this and having the original naming seems to be the benefit of the “contrib” settings, as far as I can tell.

Defining Build Files

The Makefile in each specific lib subdirectory employs the usual L4Re build system definitions:

TARGET          = libext2fs.a libext2fs.so
PC_FILENAME     = libext2fs

The latter of these is used to identify the build products so that the appropriate compiler and linker options can be retrieved by the build system when this library is required by another. Here, PC is short for “package config” but the notion of “package” is different from that otherwise used in this article: it just refers to the specific library being built in this case.

An important aspect related to “package config” involves the requirements or dependencies of this library. These are specified as follows for libext2fs:

REQUIRES_LIBS   = libet libe2p

We saw these things in the Control file. By indicating these other libraries, the compiler and linker options to find and use these other libraries will be brought in when something else requires libext2fs. This should help to prevent build failures caused by missing headers or libraries, and it should also permit more concise declarations of requirements by allowing those declarations to omit libet and libe2p in this case.

Meanwhile, the actual source files are listed using a SRC_C definition, and the PRIVATE_INCDIR definition lists the different paths to be used to search for header files within this package. Moving the header files around complicates this latter definition substantially.

There are other complications with libext2fs, notably the building of a tool that generates a file to be used when building the library itself. I will try and return to this matter at some point and figure out a way of doing this within the build system. Such generation of binaries for use in build processes can be problematic, particularly if there is some kind of assumption that the build system is the same as the target system, but such assumptions are probably not being made here.

Building the Library

Fortunately, the build system mostly takes care of everything else, and a command like this should see the package being built and libraries produced:

make O=mybuild S=pkg/libext2fs

The “S” option is a real time saver, and I wish I had made more use of it before. Use of the “V” option can be helpful in debugging command options, since the normal output is abridged:

make O=mybuild S=pkg/libext2fs V=1

I will admit that since certain header files are not provided by L4Re, a degree of editing of the config.h file was required. Things like HAVE_LINUX_FD_H, indicating the availability of Linux-specific headers, needed to be removed.

Testing the Library

An appropriate program for testing the library is really not much different from one used in a GNU/Linux environment. Indeed, I just took some code from my existing program that lists a directory inside a filesystem image. Since L4Re should provide enough of a POSIX-like environment to support such unambitious programs, practically no changes were needed and no special header files were included.

A suitable Makefile is needed, of course, but the examples package in L4Re provides plenty of guidance. The most important part is this, however:

REQUIRES_LIBS   = libext2fs

A Control file requiring libext2fs is actually not necessary for an example in the examples hierarchy, it would seem, but such a file would otherwise be advisible. The above library requirements pull in the necessary compiler and linker flags from the “package config” universe. (It also means that the libext2fs headers are augmented by the libe2p and libet headers, as defined in the required libraries for libext2fs itself.)

As always, deploying requires a suitable configuration description and a list of modules to be deployed. The former looks like this:

local L4 = require("L4");

local l = L4.default_loader;

l:startv({
    log = { "ext2fstest", "g" },
  },
  "rom/ex_ext2fstest", "rom/ext2fstest.fs", "/");

The interesting part is right at the end: a program called ex_ext2fstest is run with two arguments: the name of a file containing a filesystem image, and the directory inside that image that we want the program to show us. Here, we will be using the built-in “rom” filesystem in L4Re to serve up the data that we will be decoding with libext2fs in the program. In effect, we use one filesystem to bootstrap access to another!

Since the “rom” filesystem is merely a way of exposing modules as files, the filesystem image therefore needs to be made available as a module in the module list provided in the conf/modules.list file, the appropriate section starting off like this:

entry ext2fstest
roottask moe rom/ext2fstest.cfg
module ext2fstest.cfg
module ext2fstest.fs
module l4re
module ned
module ex_ext2fstest
# plus lots of library modules

All these experiments are being conducted with L4Re running on the UX configuration of Fiasco.OC, meaning that the system runs on top of GNU/Linux: a sort of “user mode L4”. Running the set of modules for the above test is a matter of running something like this:

make O=mybuild ux E=ext2fstest

This produces a lot of output and then some “logged” output for the test program:

ext2fste| Opened rom/ext2fstest.fs.
ext2fste| /
ext2fste| drwxr-xr-x-       0     0        1024 .
ext2fste| drwxr-xr-x-       0     0        1024 ..
ext2fste| drwx-------       0     0       12288 lost+found
ext2fste| -rw-r--r---    1000  1000       11449 e2access.c
ext2fste| -rw-r--r---    1000  1000        1768 file.c
ext2fste| -rw-r--r---    1000  1000        1221 format.c
ext2fste| -rw-r--r---    1000  1000        6504 image.c
ext2fste| -rw-r--r---    1000  1000        1510 path.c

It really isn’t much to look at, but this indicates that we have managed to access an ext2 filesystem within L4Re using a program that calls the libext2fs library functions. If nothing else, the possibility of porting a library to L4Re and using it has been demonstrated.

But we want to do more than that, of course. The next step is to provide access to an ext2 filesystem via a general interface that hides the specific nature of the filesystem, one that separates the work into a different program from those wanting to access files. To do so involves integrating this effort into my existing filesystem framework, then attempting to re-use a generic file-accessing program to obtain its data from ext2-resident files. Such activities will probably form the basis of the next article on this topic.

Filesystem Familiarisation

Tuesday, January 29th, 2019

I previously noted that accessing filesystems would be a component in my work with microkernel-based systems, and towards the end of last year I began an exercise in developing a simple “toy” filesystem that could hold file-like entities. Combining this with some L4Re-based components that implement seemingly reasonable mechanisms for providing access to files, I was able to write simple test programs that open and access these files.

The starting point for all this was the observation that a normal system file – that is, something stored in the filesystem in my GNU/Linux environment – can be treated like an archive containing multiple files and therefore be regarded as providing a filesystem itself. Such a file can then be embedded in a payload providing a L4Re system by specifying it as a “module” in conf/modules.list for a particular payload entry:

module image_root.fs

Since L4Re provides a rudimentary “rom” filesystem that exposes the modules embedded in the payload, I could open this “toy” filesystem module as a file within L4Re using the normal file access functions.

fp = fopen("rom/image_root.fs", "r");

And with that, I could then use my own functions to access the files stored within. Some additional effort went into exposing file access via interprocess communication, which forms the basis of those mechanisms mentioned above, those mechanisms being needed if such filesystems are to be generally usable in the broader environment rather than by just a single program.

Preparing Filesystems

The first step in any such work is surely to devise how a filesystem is to be represented. Then, code must be written to access the filesystem, firstly to write files and directories to it, and then to be able to perform the necessary task of reading that file and directory information back out. At some point, an actual filesystem image needs to be prepared, and here it helps a lot if a convenient tool can be developed to speed up testing and further development.

I won’t dwell on the “toy” representation I used, mostly because it was merely chosen to let me explore the mechanisms and interfaces to be provided as L4Re components. The intention was always to switch to a “real world” filesystem and to use that instead. But in order to avoid being overwhelmed with learning about existing filesystems alongside learning about L4Re and developing file access mechanisms, I chose some very simple representations that I thought might resemble “real world” filesystems sufficiently enough to make the exercise realistic.

With the basic proof of concept somewhat validated, my attentions have now turned to “real world” filesystems, and here some interesting observations can be made about tools and libraries. If you were to ask someone about how they might prepare a filesystem, particularly a GNU/Linux user, it would be unsurprising to me if they suggested preparing a file…

dd if=/dev/zero of=image_root.fs bs=1024 count=1 seek=$SIZE_IN_KB

…then a filesystem in the file…

/sbin/mkfs.ext2 image_root.fs

…and then mounting it as follows:

sudo mount image_root.fs $MOUNTPOINT

Here, an ext2 filesystem is prepared in a normal system file, and then the operating system is asked to mount the filesystem and to expose it via a mountpoint, this being a directory in the general hierarchy of files and filesystems. But this last step requires special privileges and for the kernel to get involved, and yet all we are doing is accessing a file with the data inside it stored in a particular way. So why is there not a more straightforward, unprivileged way of writing data to that file in the required format?

Indeed, other projects of mine have needed to initialise filesystems, and such mounting operations have been a necessary aspect of those, given the apparent shortage of other methods. It really seemed that filesystems and kernel mechanisms were bound to each other, requiring us to always get the kernel involved. But it turns out that there are other solutions.

A History Lesson

I am reminded of the mtools suite of programs for accessing floppy disks. Once upon a time, when I was in my first year of university studies, practically all of our class’s programming was performed on a collection of DECstations. Although networked, each of these also provided a floppy drive capable of supporting 2.88MB disks: an uncommon sight, for me at least, with the availability of media and compatibility concerns dictating the use of 720KB and 1.44MB disks instead.

Presumably, within the Ultrix environment we were using, normal users were granted access to the floppy drive when logged in. With a disk inserted, mtools could then be used to access the disk as one big file, interpreting the contents and presenting the user with a view onto files and directories. Of course, mtools exposes a DOS-like interface to the disk, with DOS-like commands providing DOS-like output, and it does not attempt to integrate the contents of a disk within the general Unix filesystem hierarchy.

Indeed, the mechanisms of integrating such foreign data into the general filesystem hierarchy are denied to mere programs, this being a motivation for pursuing alternative operating system architectures like GNU Hurd which support such integration. But the point here is that filesystems – in this example, DOS-based filesystems on floppy disks – can readily be interpreted with the appropriate tools and without “operator” privileges.

Decoding Filesystem Data

Since filesystems are really just data structures encoded in storage, there should really be no magic involved in decoding and accessing them. After all, the code in the Linux kernel and in other operating system kernels has to do just that, and these things are just programs that happen to run under certain special conditions. So it would make sense if some of the knowledge encoded in these kernels had been extracted and made available as library code for other purposes. After all, it might come in useful elsewhere.

Fortunately, it is likely that such library code is already installed on your system, at least if you are using the ext2 family of filesystems. A search for some common utilities can be informative in this respect. Here is a query being issued for the appropriate filesystem checking utility on a Debian system:

$ dpkg -S e2fsck
e2fsprogs: /usr/share/man/man5/e2fsck.conf.5.gz
e2fsprogs: /sbin/e2fsck
e2fsprogs: /usr/share/man/man8/e2fsck.8.gz

And for the filesystem initialisation utility mentioned above:

$ dpkg -S mkfs.ext2
e2fsprogs: /sbin/mkfs.ext2
e2fsprogs: /usr/share/man/man8/mkfs.ext2.8.gz

The e2fsprogs package itself depends on a package called libext2fs2 – or e2fslibs on earlier distribution versions – and ultimately one discovers that these tools and their libraries are provided by a software distribution, e2fsprogs, whose aim is to provide programs and libraries for general access to the ext2/3/4 filesystem format. So it turns out to be possible and indeed feasible to write programs accessing filesystems without needing to make use of code residing in some kernel or other.

Tooling Up

Had I bothered to investigate further, I might have discovered another useful package. Running one or both of the following commands on a Debian system lets us see which other packages make use of the library functionality of e2fsprogs:

apt-cache rdepends e2fslibs
apt-cache rdepends libext2fs2

Amongst those listed is e2tools which offers a suite of commands resembling those provided by mtools, albeit with a Unix flavour instead of a DOS flavour. Investigating this, I discovered that these tools inherit somewhat from the utilities provided by e2fsprogs, particularly the debugfs utility.

However, investigating e2fsprogs by myself gave me a chance to become familiar with the details of libext2fs and how the different utilities managed to use it. Since it is not always obvious to me how the library should be used, and I find myself missing some good documentation for it, the more program code I can find to demonstrate its use, the better.

For my purposes, accessing individual files and directories is not particularly interesting: I really just want to treat an ext2 filesystem like an archive when preparing my L4Re payload; it is only within L4Re that I actually want to access individual things. Outside L4Re, having an equivalent to the tar command, but with the output being a filesystem image instead of a tar file, would be most useful for me. For example:

e2archive --create image_root.fs $ROOTFS

Currently, this can be made to populate a filesystem for eventual deployment, although the breadth of support for the filesystem features is rather limited. It is possible that I might adopt e2tools as the basis of this archiving program, given that it is merely a shell script that calls another program. Then again, it might be useful to gain direct experience with libext2fs for my other activities.

Future Directions

And so, in the GNU/Linux environment, the creation of such archives has been the focus of my experiments. Meanwhile, I need to develop library functions to support filesystem operations within L4Re, which means writing code to support things like file descriptor abstractions and the appropriate functions for accessing and manipulating files and directories. The basics of some of this is already done for the “toy” filesystem, but it will be a matter of figuring out which libext2fs functions and abstractions need to be used to achieve the same thing for ext2 and its derivatives.

Hopefully, once I can demonstrate file access via the same interprocess communications mechanisms, I can then make a start in replacing the existing conventional file access functions with versions that use my mechanisms instead of those provided in L4Re. This will most likely involve work on the C library support in L4Re, which is a daunting prospect, but some familiarity with that is probably beneficial if a more ambitious project to replace the C library is to be undertaken.

But if I can just manage to get the dynamic linker to be able to read shared libraries from an ext2 filesystem, then a rather satisfying milestone will have been reached. And this will then motivate work to support storage devices on various hardware platforms of interest, permitting the hosting of filesystems and giving those systems some potential as L4Re-based general-purpose computing devices, too.