How to verify that apps from the Play Store match their source code

After the recent release of the “Corona Warn App” by the German RKI (Robert-Koch-Institut), several users complained that it is not possible to ensure that the app from the Play Store was actually built from the published source code, as the app is not built reproducibly.

However, there is still a way to gain confidence that the app and the source code actually match, and I’m going to describe it in this blog post. As an example I’ll use the Corona Warn App (de.rki.coronawarnapp) version 1.0.0, but the method can be applied to any other app as well.

Step 1: Get the .apk from Play Store

The first step is to get your hands on the .apk file as it was uploaded to the Play Store. If you already installed the app from the Play Store (either using the original Play Store app or using apps like Aurora Store), you can fetch it from the device over adb using the following two commands:

$ adb shell pm path de.rki.coronawarnapp
package:/data/app/de.rki.coronawarnapp-Dftmp0SK11gWd1HZ-i0UYg==/base.apk
$ adb pull /data/app/de.rki.coronawarnapp-Dftmp0SK11gWd1HZ-i0UYg==/base.apk de.rki.coronawarnapp.apk
/data/app/de.rki.coronawarnapp-Dftmp0SK11gWd1HZ-i0UYg==/base.apk: 1 file pulled, [...]

This way, the apk is stored as de.rki.coronawarnapp.apk in the current working directory.
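
Some apps are delivered as several split APKs, in which case pm path prints one package: line per file. The following is a minimal sketch of a helper that turns that output into plain paths that can be handed to adb pull. The function name strip_package_prefix is my own invention and not part of adb.

```shell
# strip_package_prefix is a hypothetical helper, not an adb feature.
# It reads the output of "pm path" on stdin and prints one file path per line.
strip_package_prefix() {
    # Older adb versions emit CRLF line endings, so drop carriage returns first,
    # then keep only lines that start with "package:" and remove that prefix.
    tr -d '\r' | sed -n 's/^package://p'
}

# Possible usage (requires a connected device):
#   adb shell pm path de.rki.coronawarnapp | strip_package_prefix
```

Each printed path can then be pulled with adb pull, exactly as shown above for base.apk.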

Step 2: Get and build the .apk from source code

This step varies a lot between apps. I’ll stick with the most popular way source code is distributed and built: git and gradle. Once you have found the source code repository, clone it and make sure to check out the tag or branch of the version you want to verify.

$ git clone https://github.com/corona-warn-app/cwa-app-android
Cloning into 'cwa-app-android'...
$ cd cwa-app-android
$ git checkout 1.0.0
HEAD is now at [...]
$ ./gradlew build

This requires that you have correctly installed and configured the Android SDK. The build process usually gives meaningful error messages. One important thing to note is that the gradle Android build process requires a full JDK including all modules. The easiest option is to use OpenJDK 8, if it is available for your system. You can tell gradle to use a specific JDK by starting it like this:

$ env JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 ./gradlew build

The resulting apk will end up in the build folder of the main artifact of the source code. If you are unsure, you can list all candidates using the command

$ find . -type f -name "*-release-unsigned.apk"
./Corona-Warn-App/build/outputs/apk/device/release/Corona-Warn-App-device-release-unsigned.apk
./Corona-Warn-App/build/outputs/apk/deviceForTesters/release/Corona-Warn-App-deviceForTesters-release-unsigned.apk

In this example, there are two candidates, but the name makes it obvious that the one we are looking for is not the “deviceForTesters” version.

Step 3: Unpack and disassemble the .apk-files

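We don’t want to compare the .apk files bitwise, as this is very likely to produce a mismatch (due to differences in the build environments, if no care was taken to create a reproducible build) and the differences are hard or even impossible to understand. Instead we disassemble the .apk files and then work on their individual contents and the smali assembly code. The easiest tool to do this is “apktool”. The commands below assume that the .apk from Step 1 has been renamed to de.rki.coronawarnapp-from-play.apk and that the one built in Step 2 has been copied to de.rki.coronawarnapp-from-source.apk.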

$ apktool d de.rki.coronawarnapp-from-source.apk
[...]
$ apktool d de.rki.coronawarnapp-from-play.apk
[...]

This creates two new folders, de.rki.coronawarnapp-from-source and de.rki.coronawarnapp-from-play, which contain all the contents of the corresponding .apk.

Check the output carefully for errors or warnings. Most warnings can be safely ignored, but some may cause the disassembly result to be incomplete. Also, apktool tends to be incompatible with the latest Android versions every now and then, so if you analyze new apps, make sure you have the latest version of apktool.

Step 4: Visualize differences

One easy way to visualize the differences between two folders is meld, a graphical diff viewer.

$ meld de.rki.coronawarnapp-from-source de.rki.coronawarnapp-from-play

This may need a few seconds to create the diff. Afterwards it presents you with a full directory listing in which all files with differences are highlighted. Folders that are exactly the same are not expanded. You can further restrict the displayed files by unchecking “View” -> “File status” -> “Same”. Also make sure that no filter in “View” -> “File filters” is checked.
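
If no graphical environment is at hand, roughly the same overview can be produced on the command line. This is a minimal sketch using the standard diff tool rather than meld. The function name list_differing_files is hypothetical.

```shell
# list_differing_files is a hypothetical helper name.
# It prints one line per file that differs or that exists on only one side.
list_differing_files() {
    # diff exits with status 1 when differences were found, which is the
    # expected case here; only status 2 (a real error) is passed on as failure.
    diff -rq "$1" "$2" || [ $? -eq 1 ]
}

# Possible usage:
#   list_differing_files de.rki.coronawarnapp-from-source de.rki.coronawarnapp-from-play
```

Unlike meld, this only names the files; to inspect the differences inside a single file you would still open the two versions side by side.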

Now you are very likely to see a bunch of differences:

  • original/META-INF/MANIFEST.MF – this file contains hashes of every part of the .apk file. It thus is very likely to differ if there are any differences at all.
  • original/META-INF/*.SF and original/META-INF/*.RSA – these files represent the cryptographic signature by the app author. The app you built can’t have the same signature as you don’t own the author’s private key.
  • res/*/*.png – These files are normally the same. In some cases, however, they are generated from vector graphics or optimized (“crunched”) during the build, and then they can differ slightly depending on the build system. If you want, you can visually compare these files to ensure they are the same. You should also verify that there are no large differences in file size, as that could be an indicator of information hidden in the file (e.g. in its metadata).
  • apktool.yml – This file includes information from apktool and does not include any actual content of the .apk files. As this file also includes the original .apk file name, it’s very likely to differ.
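
The file size check for the .png files can be automated. The sketch below walks the res folder of the first tree and reports every .png whose counterpart in the second tree is missing or differs in size by more than a given percentage. The function name png_size_report and the default threshold of 10 percent are my own assumptions; choose whatever threshold you consider suspicious.

```shell
# png_size_report is a hypothetical helper; the 10 percent default is arbitrary.
# usage: png_size_report DIR_A DIR_B [MAX_PERCENT]
png_size_report() {
    dir_a=$1
    dir_b=$2
    max_percent=${3:-10}
    find "$dir_a/res" -type f -name '*.png' | while IFS= read -r file_a; do
        # Same relative path, but below the second directory.
        file_b="$dir_b${file_a#"$dir_a"}"
        if [ ! -f "$file_b" ]; then
            echo "missing: $file_b"
            continue
        fi
        # The arithmetic expansion strips padding that some wc versions print.
        size_a=$(($(wc -c < "$file_a")))
        size_b=$(($(wc -c < "$file_b")))
        if [ "$size_a" -ge "$size_b" ]; then
            delta=$((size_a - size_b))
        else
            delta=$((size_b - size_a))
        fi
        # Report when the difference exceeds max_percent of the first file's size.
        if [ $((delta * 100)) -gt $((size_a * max_percent)) ]; then
            echo "size differs: ${file_a#"$dir_a"/} ($size_a vs $size_b bytes)"
        fi
    done
}

# Possible usage:
#   png_size_report de.rki.coronawarnapp-from-source de.rki.coronawarnapp-from-play
```

An empty output only means that no file size stands out. It does not replace a visual comparison of the images.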

Now it can happen that .smali files remain. Smali is an assembly language for the bytecode of the Dalvik virtual machine used on Android. To understand the differences in these files you should know some basic programming. Double-clicking a file name displays and highlights the differences directly in meld. In the above case we have two files, each containing just a single difference, and the difference looks very much the same in both:

The last file in our example has some more changes.

Looking closely at the code, one can see that the methods a() and b() are simply swapped, which matches the difference in the two invocations. Such differences can happen if code is automatically generated at build time and the generator does not enforce a strict ordering.

Closing remarks

One thing to mention: if the open-source code includes a routine that loads additional code, for example from image assets or from parts of the .apk file that apktool does not handle correctly, it can be used to deploy differences that have an impact but do not show up in this analysis. However, such a loader would have to be present in the code that is visible to apktool, and thus in the open-source project itself. It is therefore necessary to verify that the source code does not include such routines.

However, it would be much better if builds were reproducible. This is possible; a popular example of an app that does this is Signal.
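
As a starting point for that review, one can search the source tree for the usual ways of loading code at runtime. The sketch below is only a rough first pass under my own assumptions: the function name find_dynamic_loading is hypothetical, the pattern list is far from exhaustive, and a hit is not automatically malicious, just as an empty result is no proof of absence.

```shell
# find_dynamic_loading is a hypothetical helper; the pattern list is a
# starting point and certainly incomplete.
find_dynamic_loading() {
    # DexClassLoader also matches InMemoryDexClassLoader; loadLibrary covers
    # native code; Runtime.getRuntime catches attempts to start processes.
    # grep exits with 1 when nothing matches, which is not an error here.
    grep -rnE 'DexClassLoader|PathClassLoader|loadLibrary|Runtime\.getRuntime' "$1" || true
}

# Possible usage, from the directory that contains the clone:
#   find_dynamic_loading cwa-app-android
```

Every reported line still has to be read in context to decide whether the loaded code comes from a place that this comparison covers.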