Paul Boddie's Free Software-related blog


Archive for November, 2015

imip-agent: Integrating Calendaring with E-Mail

Tuesday, November 17th, 2015

Longer ago than I had realised until now, I wrote an article about my ongoing exploration of groupware and, specifically, calendaring. As I noted in that article, I felt that a broader range of options might be needed for those wishing to expand their use of communications technologies beyond plain e-mail and into the structured exchange of other kinds of information, whilst retaining and building upon that e-mail infrastructure.

And I noted that people wanting to increase their ambitions in this regard are often confronted with the prospect of abandoning what they already use successfully, instead being obliged to adopt a complete package of technologies, some of which they may not even need. While proprietary software and service vendors might pursue such strategies of persuasion – getting the big sale or the big contract – it is baffling that Free Software projects might also put potential users on the spot in the same way. After all, Free Software is very much about choice and control.

So, as I spelled out in that previous article, there may be some mileage in trying to offer extensions to existing infrastructure so that people can increase their communications capabilities whilst retaining the technologies they already know. And in some depth (and at some length), I described what a mail-centred calendaring solution might need to provide in order to address most people’s needs. Finally, I promised to make my own efforts available in this area so that anyone remotely interested in the topic might get some benefit from it.

Last month, I started a very brief exchange on a Debian- and groupware-related mailing list about such matters, just to see what people interested in groupware projects might think, also attempting to find out what they use for calendaring themselves. (Unfortunately, there don’t seem to be many non-product-specific, public and open places to discuss matters such as this one. Search mail software lists for calendaring discussions and you may even get to see hostility towards anyone mentioning groupware.) Ultimately, to keep the discussion concrete, I decided to announce informally what I have been working on.

Introducing imip-agent

Calendaring and distributed scheduling can be achieved over e-mail using the iMIP standard. My work relies on this standard to function, providing programs that integrate with mail transfer agents (MTAs) to act as calendaring agents. Thus, I decided to call the project imip-agent.
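
For those unfamiliar with it, iMIP (RFC 6047) conveys iTIP scheduling messages (RFC 5546) in ordinary e-mail, using the text/calendar content type. A minimal invitation might look something like this, with all addresses and identifiers invented purely for illustration:

    From: organiser@example.com
    To: resource@example.com
    Subject: Project meeting
    MIME-Version: 1.0
    Content-Type: text/calendar; method=REQUEST; charset=UTF-8

    BEGIN:VCALENDAR
    VERSION:2.0
    PRODID:-//Example//Example Client//EN
    METHOD:REQUEST
    BEGIN:VEVENT
    UID:event-1234@example.com
    DTSTAMP:20151117T100000Z
    DTSTART:20151120T130000Z
    DTEND:20151120T140000Z
    SUMMARY:Project meeting
    ORGANIZER:mailto:organiser@example.com
    ATTENDEE;PARTSTAT=NEEDS-ACTION;RSVP=TRUE:mailto:resource@example.com
    END:VEVENT
    END:VCALENDAR

An agent receiving such a message can act upon it and answer by e-mail with a corresponding message employing the REPLY method, indicating whether the invitation was accepted or declined.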

Initially, and as noted previously, my interest in such matters started with the mail handling functionality of Kolab and the component called Wallace that is responsible for responding to requests sent to certain e-mail addresses. Meanwhile, Kolab provided (and maybe still provides) a rather inelegant way of preparing “free/busy” information describing the availability of calendar system participants: a daemon program would run periodically, scanning mailboxes for events stored in special folders and generating completely new manifests of each user’s schedule. (This may have changed since I last looked at Kolab in any serious manner.)

It occurred to me that the exchange of messages between participants in a scheduling transaction should be sufficient to maintain a live record of each participant’s availability, and that some experimentation would demonstrate the feasibility or infeasibility of such an approach. I had already looked into how existing architectures prepare and consume free/busy information, and felt that I had enumerated the relevant essentials for a viable calendaring architecture based on e-mail exchanges alone.
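
To give a flavour of the approach (a simplified sketch with invented names, not imip-agent’s actual code): each scheduling message passing through the system can update a user’s free/busy record incrementally, rather than having the whole record regenerated by rescanning stored events:

    # Hypothetical sketch: maintain each user's free/busy periods
    # incrementally as scheduling messages arrive. Message parsing is
    # elided; assume the event details have already been extracted.

    free_busy = {}  # user -> list of (start, end, uid) periods

    def handle_scheduling_message(user, method, uid, start, end):
        periods = free_busy.setdefault(user, [])
        # Remove any period previously recorded for this event.
        periods[:] = [p for p in periods if p[2] != uid]
        if method in ("REQUEST", "ADD"):  # invitation or update
            periods.append((start, end, uid))
        # For method == "CANCEL", the removal above is all that is needed.
        periods.sort()

    handle_scheduling_message("paul", "REQUEST", "event-1234",
                              "20151120T130000Z", "20151120T140000Z")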

And so I set about learning about mail handling programs and expanding my existing knowledge of calendar-related standards. Fortunately, my work trying to get Kolab configured in a nice way didn’t go entirely to waste after all, although I also wanted to support different MTAs and not use convoluted Postfix-specific integration mechanisms, and so had to read up on more convenient and approachable mechanisms that other systems use to integrate with mail pipelines without trying hard to be all “high performance” about it. And I wanted to make it possible for people to adopt a solution that didn’t force them to roll out LDAP in a scary “cross your fingers and run this script” fashion, even if many organisations already rely on LDAP and are comfortable with it.
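
By way of illustration, the kind of approachable mechanism I mean is the classic alias-to-pipe arrangement supported in some form by most MTAs, where mail for a given address is handed to a program on its standard input (the script path here is purely hypothetical):

    # /etc/aliases (hypothetical entry): deliver mail for the "resources"
    # address to an agent program, which reads each message on standard input.
    resources: "|/usr/local/bin/imip_resource_agent"

After editing the aliases file, one would typically run newaliases to rebuild the alias database.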

The resulting description of this work is now available on the Web, and an attempt has been made to document the many different aspects of development, deployment and integration. Naturally, it is a work in progress and not a finished product: one step on the road to hopefully becoming a dependable solution involves packaging for Free Software distributions, which would minimise the effort currently required of the person setting the software up. But at the same time, the mechanisms for integration with other systems (such as mail, mailboxes and Web servers) still need to be documented so that such work may have a chance to proceed.

Why Bother?

For various reasons unrelated to the work itself, it has taken a bit longer to get to this point than previously anticipated. But the act of making it available is, for me, a very necessary part of what I regard as a contribution to a kind of conversation about what kinds of software and solutions might work for certain groups of people, touching upon topics like how such solutions might be developed and realised. For instance, the handling of calendar data, although already supported by various Python libraries, hasn’t really led to similar Python-based solutions being developed as far as I can tell. Perhaps my contribution can act as an encouragement there.

There are, of course, various Python-based CalDAV servers, but I regard the projects around them as somewhat opaque, and I perceive a common tendency amongst them to provide something resembling a product that covers some specific needs but then leaves those people deploying that product with numerous open-ended questions about how they might address related needs. I also wonder whether there should be more library sharing between these projects beyond basic data interpretation, but I know that this is quite difficult to achieve in practice, even if these projects should be largely functionally identical.

With such things forming the background of Free Software groupware, I can understand why some organisations are pitching complete solutions that aim to do many things. But here, in certain regards, I perceive a lack of opportunity for that conversation I mentioned above: there’s either a monologue with the insinuation that some parties know better than others (or worse, that they have the magic formula to total market domination) or there’s a dialogue with one side not really extending the courtesy of taking the other side’s views or contributions seriously.

And it is clear that those wanting to use such solutions should also be part of a conversation about what, in the end, should work best for them. Now, it is possible that organisations might see the benefit in the incremental approach to improving their systems and services that imip-agent offers. But it is also possible that there are organisations who will contrast imip-agent with a selection of all-in-one solutions, possibly being dangled in front of them on special terms by vendors who just want to “close the deal”, and in the comparison shopping exercise that ensues, they will buy into the sales pitch of one of those vendors.

Without a concerted education exercise, that latter group of potential users are never likely to be serious participants in our conversations (although I would hope that they might ultimately see sense), but the former group of potential users should be most welcome to participate in our conversations and thus enrich the wealth of choices and options that we should be offering. They would, I hope, realise that it is not about what they can get out of other people for nothing (or next to nothing), but instead what expertise and guidance they can contribute so that they and others can benefit from a sustainable and durable solution that, above all else, serves them and their needs and interests.

What Next?

Some people might point out that calendaring is only a small portion of what groupware is, if the latter term can even be somewhat accurately defined. This is indeed true. I would like to think that Free Software projects in other domains might enter the picture here to offer a compelling, broader groupware alternative. For instance, despite the apparent focus on chat and real-time communications, one doesn’t hear too much about one of the most popular groupware technologies on the Web today: the wiki. When used effectively, and when the dated rhetoric about wikis being equivalent to anarchy has been silenced by demonstrating effective collaborative editing and content management techniques, a wiki can be a potent tool for collaboration and collective information management.

It also turns out that Free Software calendar clients could do with some improvement. Their deficiencies may be a product of an unfortunate but fashionable fascination with proprietary mail, scheduling and social networking services amongst the community of people who use and develop Free Software. Once again, even though imip-agent seeks to provide only basic calendar client functionality, I hope that it may inform or, at the very least, inspire developers to improve existing programs and bring them up to the expected level.

Alongside this work, I have other things I want (and need) to be looking at, but I will happily entertain enquiries about how it might be developed further or deployed. It is, after all, Free Software, and given sufficient interest, it should be developed and improved in a collaborative fashion. There are some future plans for it that I take rather seriously, but with the privileges or freedoms granted in the licence, there is nothing stopping it from having a life of its own from now on.

So, if you are interested in this kind of solution and want to know more about it, take a look at the imip-agent site. If nothing else, I hope that it reminds you of the importance of independently-developed solutions for communication and the value in retaining control of the software and systems you rely on for that communication.

An actual user reports on his use of my parallel processing library

Monday, November 16th, 2015

A long time ago, when lots of people complained all the time about how it was seemingly “impossible” to write Python programs that use more than one CPU or CPU core, and given that Unix has always (at least in accessible, recorded history) allowed processes to “fork” and thus create new ones that run the same code, I decided to write a library that makes such mechanisms more accessible within Python programs. I wasn’t the only one thinking about this: another rather similar library called processing emerged, and the Python core development community adopted that one and dropped it into the Python standard library, renaming it multiprocessing.

Now, there are a few differences between multiprocessing and my own library, pprocess, largely because my objectives may have been slightly different. First of all, I had been inspired by renewed interest in the communicating sequential processes paradigm of parallel processing, which came to prominence around the Transputer system and Occam programming language, becoming visible once more with languages like Erlang, and so I aimed to support channels between processes as one of my priorities. Secondly, and following from the previous objective, I wasn’t trying to make multithreading or the kind of “shared everything at your own risk” model easier or even possible. Thirdly, I didn’t care about proprietary operating systems whose support for process-forking was at that time deficient, and presumably still is.

(The way that the deficiencies of Microsoft Windows, either inherently or in the way it is commonly deployed, dictates development priorities in a Free Software project is yet another maddening thing about Python core development that has increased my distance from the whole endeavour over the years, but that’s a rant for another time.)
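
Returning to the first of those objectives, the channel style can be sketched briefly. This follows the start-and-receive idiom from the pprocess documentation as I present it here, but treat the exact signatures as my assumptions rather than a definitive reference:

    import pprocess

    # The parallel callable receives a channel as its first argument
    # and sends its result back to the creating process over it.
    def calculate(channel, i):
        channel.send(i * i)

    # start() forks a new process running calculate() and returns the
    # parent's end of the channel connecting the two processes.
    channel = pprocess.start(calculate, 10)
    print(channel.receive())  # -> 100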

User Stories

Despite pprocess making it into Debian all by itself – or rather, with the efforts of people who must have liked it enough to package it – I’ve only occasionally had correspondence about it, much of it regarding the package falling out of Debian and being supported in a specialised Debian variant instead. For all the projects I have produced, I just assume now that the very few people using them are either able to fix any problems themselves or are happy enough with the behaviour of the code that there isn’t any reason for them to switch to something else.

Recently, however, I heard from Kai Staats who is using Python for some genetic programming work, and he had found the pprocess and multiprocessing libraries and was trying to decide which one would work best for him. Perhaps as a matter of chance, multiprocessing would produce pickle errors that he found somewhat frustrating to understand, whereas pprocess would not: this may have been a consequence of pprocess not really trying very hard to provide some of the features that multiprocessing does. Certainly, multiprocessing attempts to provide some fairly nice features, but maybe they fail under certain overly-demanding circumstances. Kai also noted some issues with random number generators that have recently come to prominence elsewhere, interestingly enough.
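
For what it’s worth, one classic way of provoking such pickle errors (my own illustration, not necessarily what happened in Kai’s programs) is to hand multiprocessing something it cannot serialise, such as a lambda:

    from multiprocessing import Pool

    pool = Pool(2)
    # multiprocessing pickles the callable and its arguments in order to
    # send them to the worker processes; lambdas cannot be pickled, so
    # this fails with PicklingError (or a similar serialisation error,
    # depending on the Python version).
    pool.map(lambda x: x * x, range(10))

Since pprocess relies on fork, the child processes inherit the callable directly, and no such serialisation of the function itself is needed.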

Some correspondence between us then ensued, and we both gained a better understanding of how pprocess could be applied to his problem, along with some insights into how the documentation for pprocess might be improved. Eventually, success was achieved, and this article serves as a kind of brief response to the conclusions he drew from our discussions. He notes that the multi-process model inhibits the sharing of global variables, almost as a kind of protection against “bad things”, and that the processes must explicitly communicate their results to each other. I must admit, being so close to my own work that the peculiarities of its use were easily assumed and overlooked, that pprocess really does turn a few things about Python programs on their heads.

Update: Kai suggested that I add the following, perhaps in place of the remarks about the sharing of global variables…

Kai notes that pprocess allows passing more than one variable through the multi-core portal at once, identical to a standard Python method. This enables direct translation of a single CPU for-loop into a multiple-CPU model, with nearly the same number of lines of code.
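
A minimal sketch of that translation, using the Map and MakeParallel pattern as I understand it from the pprocess documentation (the calculation itself is invented, and the details should be treated as assumptions):

    import pprocess

    def calculate(i, j):
        # An invented, independent calculation for each pair of values.
        return i * j

    # Serial version:
    results = []
    for i in range(10):
        for j in range(10):
            results.append(calculate(i, j))

    # Parallel version: the loop body becomes a call through a managed
    # wrapper, with arguments passed just as in a normal function call.
    results = pprocess.Map(limit=4)
    calc = results.manage(pprocess.MakeParallel(calculate))
    for i in range(10):
        for j in range(10):
            calc(i, j)
    for result in results:  # results arrive in the order submitted
        print(result)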

Two Tales of Concurrency

If you choose to use threads for concurrency in Python, you’ll get the “shared everything at your own risk” model, and you can have your threads modifying global variables as they see fit. Since CPython employs a “global interpreter lock”, the modifications will succeed when considered in isolation – you shouldn’t see things getting corrupted at the lowest level (pointers or references, say) like you might if doing such things unsafely in a systems programming language – but without further measures, such modifications may end up causing inconsistencies in the global data as threads change things in an uncoordinated way.
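
A tiny demonstration of such uncoordinated modification, using only the standard library:

    import threading

    counter = 0

    def work():
        global counter
        for _ in range(100000):
            # Not atomic: the read, the addition and the write are separate
            # operations, so threads can interleave and lose updates even
            # though the GIL serialises individual interpreter operations.
            counter += 1

    threads = [threading.Thread(target=work) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)  # may well be less than the expected 400000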

Meanwhile, pprocess also lets you change global variables at will. However, other processes just don’t see those changes and continue happily on their way with the old values of those globals.
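
The same experiment with processes shows this isolation directly; here is a bare sketch using the underlying fork mechanism rather than pprocess itself:

    import os

    value = "before"

    pid = os.fork()
    if pid == 0:
        # Child process: this changes only the child's copy of the global.
        value = "after"
        os._exit(0)
    else:
        os.waitpid(pid, 0)
        print(value)  # -> "before": the parent never sees the child's change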

Mutability is a key concept in Python. For anyone learning Python, it is introduced fairly early on in the form of the distinction between lists and tuples, perhaps as a curiosity at first. But then, when dictionaries are introduced to the newcomer, the notion of mutability becomes somewhat more important because dictionary keys must be immutable (or, at least, the computed hash value of keys must remain the same if sanity is to prevail): try and use a list or even a dictionary as a key and Python will complain. But for many Python programmers, it is the convenience of passing objects into functions and seeing them mutated after the function has completed, whether the object is something as transparent as a list or whether it is an instance of some exotic class, that serves as a reminder of the utility of mutability.
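
For instance:

    # Mutable objects cannot serve as dictionary keys...
    d = {}
    try:
        d[[1, 2]] = "value"
    except TypeError as exc:
        print(exc)            # lists are unhashable

    # ...whereas mutating an object inside a function is visible
    # to the caller afterwards.
    def append_item(items):
        items.append("extra")

    things = []
    append_item(things)
    print(things)             # -> ['extra']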

But again, pprocess diverges from this behaviour: pass an otherwise mutable object into a created process as an argument to a parallel function and, while the object will indeed get mutated inside the created “child” process, the “parent” process will see no change to that object upon seeing the function apparently complete. The solution is to explicitly return – or send, depending on the mechanisms chosen – changes to the “parent” if those changes are to be recorded outside the “child”.
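
In code (again treating the pprocess details as my assumptions), the remedy is simply to return the changed object from the parallel function:

    import pprocess

    def add_item(items):
        items.append("extra")
        return items          # changes must be sent back explicitly

    things = []
    results = pprocess.pmap(add_item, [things])
    print(things)             # -> []: the parent's object is untouched
    print(list(results))      # -> [['extra']]: the child's version comes back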

Awkward Analogies

Kai mentioned to me the notion of a portal or gateway through which data must be transferred. For a deeper thought experiment, I would extend this analogy by suggesting that the portal is situated within a mirror, and that the mirror portrays an alternative reality that happens to be the same as our own. Now, as far as the person looking into the mirror is concerned, everything on the “other side” in the mirror image merely reflects the state of reality on their own side, and initially, when the portal through the mirror is created, this is indeed the case.

But as things start to occur independently on the “other side”, with things changing, moving around, and so on, the original observer remains oblivious to those changes and keeps seeing the state on their own side in the mirror image, believing that nothing has actually changed over there. Meanwhile, an observer on the other side of the portal sees their changes in their own mirror image. They believe that their view reflects reality not only for themselves but for the initial observer as well. It is only when data is exchanged via the portal (or, in the case of pprocess, returned from a parallel function or sent via a channel) that one of our observers gets the surprise of previously-unseen data arriving.

Expectations and Opportunities

It may seem very wrong to contradict Python’s semantics in this way, and for all I know the multiprocessing library may try and do clever things to support the normal semantics behind the scenes (although it would be quite tricky to achieve), but anyone familiar with the “fork” system call in other languages would recognise and probably accept the semantic “discontinuity”. One thing I’ve learned is that it isn’t always possible to assume that people who are motivated to use such software will happen to share the same motivations as I and other library developers may have in writing that software.

However, it has also occurred to me that the behavioural differences caused in programs by pprocess could offer some opportunities. For example, programs can implement transactional behaviour by creating new processes which may or may not return data depending on whether the transactions performed by the new processes succeed or fail. Of course, it is possible to implement such transactional behaviour in Python already with some discipline, but the underlying copy-on-write semantics that allow “fork” and pprocess to function make it much easier and arguably more reliable.
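
As a rough sketch of that idea (using fork directly, with invented details, and kept deliberately simple): the child attempts the work and reports back only on success; if it fails, the parent never records a result, and the child’s copied state is discarded along with it:

    import os
    import pickle

    def attempt(transaction):
        # Run transaction() in a child process, returning its result on
        # success or None if the child failed and was discarded.
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:
            os.close(r)
            try:
                result = transaction()  # operates on copy-on-write state
                os.write(w, pickle.dumps(result))
                os._exit(0)
            except Exception:
                os._exit(1)             # failure: send nothing back
        os.close(w)
        data = os.read(r, 65536)        # a single read, kept simple here
        os.close(r)
        os.waitpid(pid, 0)
        return pickle.loads(data) if data else None

    print(attempt(lambda: 2 + 2))   # -> 4
    print(attempt(lambda: 1 / 0))   # -> None: the failure leaves no trace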

With CPython’s scalability constantly being questioned, despite attempts to provide improved concurrency features (in Python 3, at least), and with a certain amount of enthusiasm in some circles for functional programming and the ability to eliminate side effects in parts of Python programs, maybe such broken expectations point the way to evolved forms of Python that possibly work better for certain kinds of applications or systems.

So, it turns out that user feedback doesn’t have to be about bug reports and support questions and nothing else: it can really get you thinking about larger issues, questioning long-held assumptions, and even be a motivation to look at certain topics once again. But for now, I hope that Kai’s programs keep producing correct data as they scale across the numerous cores of the cluster he is using. We can learn new things about my software another time!