Ordinateur
Are You Testing Too Much In Your Tests?
April 18, 2025

When you write an integration test that depends on an asynchronous process, how long do you wait before validating? In my experience, correctness and performance are two different factors that should be measured separately.
A service I’ve worked on used the Command Query Responsibility Segregation (CQRS) pattern, where the read path is eventually consistent with the write path. Our service-level objective (SLO) was for data to be visible on reads no more than 10 seconds after writes. We thought it would be convenient to use our continuous integration tests to verify this SLO. It was not.
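Here is a minimal sketch, in Python, of what such a test looked like. The `write_item` and `read_item` functions (and the `my_service_client` module) are hypothetical stand-ins for the service's write-path and read-path clients, not real APIs.

```python
import time
import uuid

# Hypothetical clients for the write path and the read path;
# substitute your own service calls.
from my_service_client import write_item, read_item


def test_read_after_write_meets_slo():
    """Naive approach: bake the 10-second SLO into the test assertion."""
    key, value = str(uuid.uuid4()), "hello"
    write_item(key, value)

    # Poll the read path, but give up at exactly the SLO boundary.
    deadline = time.monotonic() + 10
    while time.monotonic() < deadline:
        if read_item(key) == value:
            return  # data propagated within the SLO
        time.sleep(0.5)
    raise AssertionError("data not visible on the read path within 10s")
```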
In distributed systems, completing an identical unit of work multiple times yields varying latency, and the tail is the small fraction of requests that take far longer than average. That tail made our integration test flaky: instead of verifying correctness alone, we had added the confounding factor of latency. A better approach would have been to separate the concerns by creating an independent alarm on the propagation-latency metric. The integration test could then have used a very conservative wait time before checking that the data was visible in the read path.
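As a sketch of that separated approach (same hypothetical clients as above): the test waits far longer than the SLO before asserting, so it fails only on genuine correctness bugs, while propagation latency is measured and alarmed on independently.

```python
import time
import uuid

from my_service_client import write_item, read_item  # hypothetical module


def test_read_path_reflects_writes():
    """Correctness only: use a deadline far beyond the 10-second SLO."""
    key, value = str(uuid.uuid4()), "hello"
    write_item(key, value)

    # 120 seconds is deliberately conservative: ordinary tail latency
    # cannot fail the test, so a failure signals a real propagation bug.
    deadline = time.monotonic() + 120
    while time.monotonic() < deadline:
        if read_item(key) == value:
            return
        time.sleep(1)
    raise AssertionError("read path never converged with the write path")
```

With this split, the CI test answers only "does the data ever show up?", while a monitoring alarm on the propagation-latency metric answers "does it show up fast enough?".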