Tonnerre Lombard

FFII’s coordinator for Switzerland

Blind trust in valgrind – the Debian OpenSSL vulnerability

The big run on valgrind way back in 2005 to 2006 has already claimed its first prominent victim: the OpenSSL implementation shipped with Debian.

Way back in May 2006, one of the Debian developers ran valgrind on OpenSSL in an attempt to make it more secure. Among valgrind's findings was an uninitialized buffer named buf in the ssleay_rand_add function in openssl/crypto/rand/md_rand.c. To fix the presumed flaw, the programmer simply commented out the MD_Update call which added the random data to the pool.

This blind patch was not exactly the correct thing to do. The data contained in buf was exactly the random pool initialization data, which was now no longer being added.

The OpenSSL team apparently also played its part in this, though. The Debian developer sent the patch upstream, and the OpenSSL team approved it for debugging purposes. The Debian developer apparently misunderstood the scope of that approval and committed the now-defunct MD-based PRNG into the Debian codebase.

According to the audit trail of the corresponding Debian bug, the Debian SSL team approved the patch and released a “fixed” package in May 2006.

The impact

As soon as the new OpenSSL release was deployed, Debian users were creating keys using a message digest (MD) as pseudo-random number generator with hardly any changes being made to the random pool. As a short explanation for non-cryptographers: it was not really random.
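To make "not really random" a bit more tangible, here is a toy model in Python. It is not OpenSSL's code. It assumes, purely for illustration, that the only varying input left to a hash-based generator is a small integer such as a process ID; the names toy_key and MAX_PID and the bound of 32768 are made up for this sketch.

```python
import hashlib

# Assumed upper bound on process IDs, chosen for illustration only.
MAX_PID = 32768


def toy_key(pid: int, entropy: bytes = b"") -> bytes:
    """Derive a toy 'key' by hashing the process ID plus any extra entropy."""
    return hashlib.sha1(pid.to_bytes(4, "big") + entropy).digest()


def all_possible_keys() -> set:
    """Enumerate every key the crippled generator (no extra entropy) can produce."""
    return {toy_key(pid) for pid in range(1, MAX_PID + 1)}


if __name__ == "__main__":
    keys = all_possible_keys()
    print("distinct keys an attacker would have to try:", len(keys))
```

In this model, once the entropy argument is left empty, the whole key space can be enumerated in a fraction of a second, however good the hash function is.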

The Debian Security team then discovered certain patterns which would emerge magically in most of their SSH and SSL keys, as well as in keys from all other products based on OpenSSL. After several days, if not weeks, of analysis, the culprit was tracked down to precisely that valgrind-triggered change.

The effect of this could be observed over the past couple of days by close followers of the Debian community. All of a sudden, the web certificates changed, all authorized_keys files were removed from the project servers, and some SSH host keys changed, even though none of them had expired. This confused the Debian community very much, and was perceived as “A large security incident immediately ahead”.

With the release of the Debian Security Advisory today, this expectation was finally fulfilled, and the incident was indeed a major one: users were asked to regenerate all cryptographic keys generated with OpenSSL since May 2006. A script was released to detect and warn about common patterns(!) in the various key files.
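The released script is not reproduced here. As a rough sketch of how such a check could work, assuming the set of keys the broken generator can produce is small enough to enumerate in advance, one can fingerprint each key and look it up in a precomputed list. The function names, the choice of SHA-256 and the sample key blobs below are all hypothetical.

```python
import hashlib


def fingerprint(public_key_blob: bytes) -> str:
    """Hex digest identifying a public key (SHA-256 is an arbitrary choice here)."""
    return hashlib.sha256(public_key_blob).hexdigest()


def build_blacklist(weak_key_blobs) -> frozenset:
    """Precompute the fingerprints of every key known to be weak."""
    return frozenset(fingerprint(blob) for blob in weak_key_blobs)


def is_known_weak(public_key_blob: bytes, blacklist: frozenset) -> bool:
    """Return True if the key's fingerprint appears in the blacklist."""
    return fingerprint(public_key_blob) in blacklist


if __name__ == "__main__":
    # Placeholder blobs standing in for real key material.
    blacklist = build_blacklist([b"weak-key-1", b"weak-key-2"])
    for blob in (b"weak-key-1", b"freshly-generated-key"):
        status = "WEAK" if is_known_weak(blob, blacklist) else "not listed"
        print(blob.decode(), "->", status)
```

A key that is not on such a list is not thereby proven strong; the lookup can only confirm that a key is one of the known bad ones.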

Lessons learned

There are certainly various lessons to be learned from this, on the cryptographic, the programming and the practical side.

  1. Don’t blindly trust valgrind’s output.
    This has been repeated over and over again. If valgrind finds a presumed flaw in your code, it does not necessarily mean it is really a flaw. It must be investigated very thoroughly by the programmer, and not patched away lightly just because it’s there.
  2. Cryptography may be counter-intuitive to a programmer.
    I personally can’t stop repeating this. What might appear as a runtime optimization to a programmer can indeed be a timing based information disclosure on the cryptographic level, and what might look like an uninitialized variable might actually not want to be zeroed out.
    This is also an argument against GnuTLS I keep repeating. Cryptography is not something which can be handled just like that by any good programmer. One needs at least a diploma in maths and programming, and must also be a very focused computer geek and a close follower of the cryptographic community, to even be able to touch cryptographic products successfully. This is the reason why I have major concerns with the GNU community rewriting an SSL implementation from scratch just because they do not like the OpenSSL license.
  3. A diversification of infrastructures may be useful at times.
    This might be a bit counter-intuitive to those who followed the argument from the last paragraph, but the sole reason why the chain of trust did not break for the Debian team was that besides their working OpenSSL PKI, they also had a working, trusted and distributed GnuPG PKI. Thus, even though all OpenSSL keys were compromised, the GnuPG keys could still be used to verify the origin of various security credentials and to verify that the new key material et cetera did indeed originate from the Debian project.

That said, I would like to proudly add that neither the NetBSD base nor the pkgsrc version of OpenSSL is affected by this bug.
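Coming back to the second lesson for a moment: the point about runtime optimizations turning into timing-based information disclosure can be illustrated with a comparison routine. The sketch below is not taken from OpenSSL or GnuTLS, and leaky_equal and constant_time_equal are names made up for this example; hmac.compare_digest is the comparison helper from Python's standard library.

```python
import hmac


def leaky_equal(expected: bytes, supplied: bytes) -> bool:
    """'Optimized' comparison: stops at the first mismatch.

    The running time grows with the number of leading bytes that match,
    which is exactly the kind of information an attacker can measure.
    """
    if len(expected) != len(supplied):
        return False
    for a, b in zip(expected, supplied):
        if a != b:
            return False
    return True


def constant_time_equal(expected: bytes, supplied: bytes) -> bool:
    """Comparison whose timing does not depend on where the inputs differ."""
    return hmac.compare_digest(expected, supplied)


if __name__ == "__main__":
    secret = b"correct-mac-value"
    for guess in (b"correct-mac-value", b"correct-mac-valuX", b"Xorrect-mac-value"):
        print(guess, leaky_equal(secret, guess), constant_time_equal(secret, guess))
```

Both functions return the same answers; the difference lies only in how long they take to do so, which is why the first one looks perfectly reasonable in a code review that is not looking for cryptographic problems.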

Audit trail

  • 22:20: Added more precise information on what keys and certificates changed
  • 23:25: Added reference to what exactly happened to get the patch approved

(Original source)

4 Responses to “Blind trust in valgrind – the Debian OpenSSL vulnerability”

  1. Michael Kallas Says:

    GnuTLS has one pro:

    GnuTLS has one pro: No version of it featured this bug.
    Also, even cryptographic code should be _understandable_ by experienced programmers, something that the OpenSSL codes seems to fail.

  2. ciaran Says:

    Any previous GnuTLS threads

    Have the criticisms of GnuTLS been raised on one of their mailing lists? If you remember some words from the subject of the thread, I’d be interested in reading more about it.

    And thanks for this blog entry. Very interesting.

  3. tonnerre Says:

    Re: GnuTLS has one pro:

    > GnuTLS has one pro: No version of it featured this bug.

    This is not exactly a feature but rather a coincidence in my opinion. I think that GnuTLS is going to suffer from a large number of problems which OpenSSL has already gone through in the past, such as, and especially, timing attacks and information leaks.

    For a programmer, it is very intuitive to optimize a program for fast execution, and to return errors in as concise a manner as possible. In cryptographic applications, however, this is a deadly mistake. If execution is aborted too early, for example before executing a hash function because the chosen hash function was not defined, but after verifying the key, then the execution time of the routine in question discloses information about the key to a possible attacker.

    There was an article on thedailywtf once where an application returned to the user: “Wrong key 0xabcd, expected 0xdeff” – what is obvious to a normal programmer, is deadly in cryptography.

    > Also, even cryptographic code should be _understandable_ by
    > experienced programmers, something that the OpenSSL codes
    > seems to fail.

    The vast majority of security problems with OpenSSL were not due to the code being misunderstood (and by the way, I don’t count this incident as misunderstood code either, because it simply isn’t), but due to mistakes made by people who lacked insight into the other side of the fence: cryptographers making programming errors, and programmers making design mistakes in cryptographic systems.

    I have spent quite some time so far looking through OpenSSL code, and I cannot observe the lack of clarity you mention. It is very structured, in my opinion, in a way which is obvious to a cryptographer with a sufficient amount of programming experience. And for anyone who does not fall into this category, crypto implementations are a big no-go anyway; it is way too easy to make major mistakes.

    This is by the way something OpenSSL tries to optimize away by use of its layer design around a well-known socket API. And this is the only sane approach: leave the programmer of the implementation with as few choices as possible, so he cannot make a lot of mistakes. (Still, people do, sure.)

    Tonnerre

  4. gnosis Says:

    The true origin of this problem was obfuscated, “clever” code, which reminds me of “The Story of Mel, a Real Programmer”

    http://www.cs.utah.edu/~elb/folklore/mel.html

    Such “clever” code should be thoroughly commented to explain *why* the “clever” code does what it does, and how it works.

    In this particular case, a simple comment indicating that the use of the uninitialized variable was deliberate, and that it was done to get some randomness, would have sufficed.

    This does *not* require a math or programming PhD to understand! All it requires is a simple comment.

    The fact that such obfuscated, uncommented code was allowed into the codebase in the first place speaks of the lack of quality control in the OpenSSL project.

    Further evidence of a lack of quality control in the project is that there was no automated test suite which checked the quality of the keys generated by this code. Such an automated test suite would have also prevented this fiasco — and many future ones as well.
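As a rough illustration of the kind of automated check suggested in the last comment, the Python sketch below tests whether a key generator ever hands out the same key twice when a process ID is reused. Both generators are toy stand-ins, not OpenSSL code, and every name in the block is hypothetical.

```python
import hashlib
import os


def crippled_generate(pid: int) -> bytes:
    """Toy generator whose only input is the process ID."""
    return hashlib.sha256(pid.to_bytes(4, "big")).digest()


def healthy_generate(pid: int) -> bytes:
    """Toy generator that also mixes in bytes from the operating system."""
    return hashlib.sha256(pid.to_bytes(4, "big") + os.urandom(32)).digest()


def has_repeats(generate, pids) -> bool:
    """Return True if the generator yields the same key twice for the given PIDs."""
    seen = set()
    for pid in pids:
        key = generate(pid)
        if key in seen:
            return True
        seen.add(key)
    return False


if __name__ == "__main__":
    reused_pids = [100, 200, 100]  # the third run reuses the first run's PID
    print("crippled generator repeats:", has_repeats(crippled_generate, reused_pids))
    print("healthy generator repeats: ", has_repeats(healthy_generate, reused_pids))
```

A check like this only catches gross failures such as identical keys; passing it says nothing about the finer statistical quality of the generator.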