Minding your Handlers and Queues
Flaky tests are the absolute worst; getting rid of them is my favorite.
Jesse Wilson
It’s true! No one likes a flaky test. They slow down development effort and reduce confidence in your test suite and testing tools. Given that we’re the primary maintainers of Paparazzi, when one of our Paparazzi test shards became flaky on CI, it was time to dig in and figure out why.
Background
Layoutlib is a library bundled with Android Studio which powers the XML layout and compose preview panes.
Paparazzi uses Layoutlib to emulate the Android runtime environment, thus eliminating the need for an emulator or real device when generating screenshots.
Generating screenshots without an emulator or device is great! The tests are fast and you get to avoid much of the Android toolchain in the build process, but as we’ll see with our flaky test, using Layoutlib outside of the context of Android Studio can lead to some interesting problems.
The stack trace
This is the stack trace our failed CI shard would generate intermittently:
java.util.ConcurrentModificationException
at java.base/java.util.WeakHashMap.forEach(WeakHashMap.java:1031)
at com.android.layoutlib.bridge.util.HandlerMessageQueue.extractFirst(HandlerMessageQueue.java:73)
at android.os.Handler_Delegate.executeCallbacks(Handler_Delegate.java:95)
at app.cash.paparazzi.Paparazzi.withTime(Paparazzi.kt:337)
at app.cash.paparazzi.Paparazzi.takeSnapshots(Paparazzi.kt:298)
at app.cash.paparazzi.Paparazzi.snapshot(Paparazzi.kt:214)
at app.cash.paparazzi.Paparazzi.snapshot$default(Paparazzi.kt:213)
From the ConcurrentModificationException
JavaDoc:
This exception may be thrown by methods that have detected concurrent modification of an object when such modification is not permissible.
For example, it is not generally permissible for one thread to modify a Collection while another thread is iterating over it.
Looking at the stack trace, it’s easy enough to see where the first condition for this exception occurs, but where is the second? Let’s take a closer look at HandlerMessageQueue:
public class HandlerMessageQueue {
private final WeakHashMap<...> runnablesMap = new WeakHashMap<>();
public void add(Handler h, long uptimeMillis, Runnable r) {
LinkedList<...> runnables = runnablesMap.computeIfAbsent(h, k -> new LinkedList<>());
...
}
public Runnable extractFirst(long uptimeMillis) {
...
runnablesMap.forEach((h, l) -> {
...
});
...
}
public void clear() {
runnablesMap.clear();
}
We can see that runnablesMap
is responsible for the exception and can now point to two places where our map might be getting altered:
HandlerMessageQueue.clear
and HandlerMessageQueue.add
.
Pursuing our leads
HandlerMessageQueue.clear()
If we follow the call hierarchy up from this method we see that, eventually, Paparazzi does make a call that will result in this method being executed, but it’s unlikely that this is the culprit; JUnit tests are single-threaded by default and this method is only called at the end of a given test run.
HandlerMessageQueue.add(Handler, long, Runnable)
This method is a little more complicated but right off the bat we have an interesting line to look at:
public void add(Handler h, long uptimeMillis, Runnable r) {
LinkedList<...> runnables = runnablesMap.computeIfAbsent(h, k -> new LinkedList<>());
...
}
runnablesMap.computeIfAbsent
will search for the Handler
and if it’s not present in the map, it will be associated with a new, empty LinkedList
.
At this point, I should mention that Layoutlib’s APIs are generally called from a single thread since it’s dealing with rendering views which typically occur on Android’s main thread. However, apps are rarely single-threaded.
Handler
s can be created from any thread and posted to any thread. Layoutlib will intercept calls to Handler.post
and call HandlerMessageQueue.add
instead, meaning it can run from any thread, too. If that occurs while HandlerMessageQueue.extractFirst
is running, it could result in a ConcurrentModificationException
.
To test this out we can use the following test case:
@Test
fun test() {
// 1
thread {
while (true) {
Handler(Looper.getMainLooper()).post{}
}
}
// 2
for (i in 0..10) {
Handler(Looper.getMainLooper()).post {}
}
// 3
paparazzi.snapshot {}
}
- Start a background thread which creates new
Handler
s. This will result in calls toHandlerMessageQueue.add
. - Give the main thread some scheduled work in order to increase the likelihood of
HandlerMessageQueue.add
andHandlerMessageQueue.extractFirst
running at the same time. - Call
paparazzi.snapshot
in order to causeHandlerMessageQueue.executeFirst
to run and clear the main thread work.
After running this test we will see the exact same ConcurrentModificationException
get thrown. This means somewhere in our code we are doing something that has a similar effect.
Zeroing in on the cause
It turns out that one of our views transitively uses LottieAnimationView
. A quick search of the Lottie GitHub repository for Handler
yields a hit on a class called LottieTask
. Could we be doing something that leads to the creation of this LottieTask
? Let’s see!
One of the first methods we call on LottieAnimationView
is setAnimation
which calls fromRawRes
in turn:
// LottieAnimationView
public void setAnimation(final int rawRes) {
...
setCompositionTask(fromRawRes(rawRes));
}
private LottieTask<...> fromRawRes(final int rawRes) {
...
return cacheComposition ? LottieCompositionFactory.fromRawRes(getContext(), rawRes) : LottieCompositionFactory.fromRawRes(getContext(), rawRes, null);
...
}
LottieCompositionFactory.fromRawRes()
is the method that actually creates our LottieTask
and here’s where things start to get interesting.
public class LottieTask<T> {
public static Executor EXECUTOR = Executors.newCachedThreadPool();
private final Handler handler = new Handler(Looper.getMainLooper());
LottieTask(Callable<LottieResult<T>> runnable, boolean runNow) {
if (runNow) {
try {
setResult(runnable.call());
} catch (Throwable e) {
setResult(new LottieResult<>(e));
}
} else {
EXECUTOR.execute(new LottieFutureTask(runnable));
}
}
private void notifyListeners() {
// Listeners should be called on the main thread.
handler.post(() -> {
...
});
}
LottieCompositionFactory.fromRawRes()
will eventually call the constructor above with runNow
set to false
and the Runnable
will be executed on a background thread via the EXECUTOR
.
When the task is done, it will notifyListeners()
which will cause the new Handler
to be added to the HandlerMessageQueue
.
With a probable culprit identified, it’s time to apply a fix.
Fixing the flake with a workaround
Lottie made a conscious decision to make LottieTask.EXECUTOR
public and static for the benefit of its consumers and their tests, so the fastest way to fix our flaky behavior is to overwrite the value of LottieTask.EXECUTOR
in your Paparazzi test, like so:
@Before
fun setup() {
LottieTask.EXECUTOR = Executor(Runnable::run)
}
Fixing the underlying problem
Layoutlib needs to account for views which create Handler
s from non-main threads. This map and the lists it contains need to be safe for concurrent access to correctly emulate what the real Android framework supports. We’ve filed an issue with the Android Studio team to address this problem.
One solution would be to have each method synchronize on HandlerMessageQueue.runnablesMap
. Doing so will protect both the map and its lists from concurrent access.
Here’s an example:
public void add(Handler h, long uptimeMillis, Runnable r) {
synchronized (runnablesMap) {
LinkedList<Pair<Long, Runnable>> runnables = runnablesMap.computeIfAbsent(h, k -> new LinkedList<>());
int idx = 0;
while (idx < runnables.size()) {
if (runnables.get(idx).first <= uptimeMillis) {
idx++;
} else {
break;
}
}
runnables.add(idx, Pair.create(uptimeMillis, r));
}
}
No more flakes
In future versions of Paparazzi, when Layoutlib is updated to a version with the proper fix, workarounds like the one above won’t be necessary. Until then, give this one a shot!
If you’d like to try out Paparazzi, get it on GitHub.