Test splitting

Speed up the execution of test suites

Introduction

The majority of CircleCI users are developers who run tests on every code commit. When all of the tests pass, the code is safe to merge into production. When tests fail, developers want to find out what failed and fix it as soon as possible.

Test & debug is a delivery team inside CircleCI committed to enabling people to test faster, discover failures faster, and fix failures faster.

CircleCI offers a set of features to speed up test suites:

  • Using larger resource classes (e.g., a stronger CPU and more RAM)
  • Splitting tests across multiple nodes (parallelism)
  • Optimizing parallelism by splitting tests across different nodes by their historical duration instead of by test name

Problem statement

CircleCI parallelism is an extremely valuable feature for organizations with large test suites that want their tests to execute fast. However, few organizations use it. Our task was to:

  • make the benefits of parallelism obvious
  • increase the number of projects which use parallelism

The target audience for this work is the developer experience persona.

Research

I relied heavily on internal sources of knowledge, such as documentation and customer-facing teams, to understand the factors that affect test speed and how the testing system works.

Test suites run inside a CircleCI job, which is a virtual machine inside CircleCI infrastructure. CircleCI jobs have steps, which are in most cases Linux commands. Some steps prepare the testing environment, for example by spinning up the environment and installing dependencies. Only one step runs tests.

Would the test suite benefit from using parallelism?

Parallelism splits tests across multiple nodes based on the parallelism factor. Only the step that runs tests can be parallelized. This means that test suites with a very long setup time benefit little, if at all, from parallelism.
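
To make this concrete, here is a minimal sketch of the arithmetic in Python. It assumes a simplified model that is not taken from CircleCI itself: the setup steps take the same time regardless of the parallelism factor, and the tests divide evenly across nodes. The function name and the example durations are hypothetical.

```python
def estimated_wall_time(setup_seconds: float, test_seconds: float, nodes: int) -> float:
    """Estimate job duration when only the test step is split across nodes.

    Simplifying assumptions: setup time does not shrink with more nodes,
    and the tests divide evenly across all nodes.
    """
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return setup_seconds + test_seconds / nodes


# Hypothetical suite A: 1 minute of setup, 20 minutes of tests.
short_setup = [estimated_wall_time(60, 1200, n) for n in (1, 4)]

# Hypothetical suite B: 20 minutes of setup, 1 minute of tests.
long_setup = [estimated_wall_time(1200, 60, n) for n in (1, 4)]

print(short_setup)  # [1260.0, 360.0]
print(long_setup)   # [1260.0, 1215.0]
```

Under these assumptions, four nodes cut suite A from 21 minutes to 6, while suite B saves only 45 seconds.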

This realization helped me identify a problem that we had not intended to solve, but that looked like a prerequisite for moving on.

I talked with people across different teams to learn what actions we can recommend to users who have a long setup time.

When to recommend parallelism?

A test suite benefits from parallelism when its setup time is short relative to its run time.

Test splitting optimization

Test suites that are already parallelized across different nodes can be optimized further to run even faster. Ideally, all nodes run for roughly the same time. If one node runs much longer than the others, some tests from that node should be redistributed to other nodes:

  • By setting the test suite to split tests by historical timing data instead of by test name
  • By manually redistributing tests
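
The difference between the two splitting strategies can be illustrated with a short Python sketch. This is not CircleCI's implementation: the timing-based split below uses a common greedy heuristic (longest test first, assigned to the currently lightest node) as a stand-in, and the test names and durations are made up.

```python
import heapq


def split_by_name(tests: dict[str, float], nodes: int) -> list[list[str]]:
    """Round-robin over alphabetically sorted names, ignoring durations."""
    buckets: list[list[str]] = [[] for _ in range(nodes)]
    for index, name in enumerate(sorted(tests)):
        buckets[index % nodes].append(name)
    return buckets


def split_by_timing(tests: dict[str, float], nodes: int) -> list[list[str]]:
    """Greedy heuristic: place the longest remaining test on the lightest node."""
    heap = [(0.0, node) for node in range(nodes)]  # (accumulated seconds, node index)
    buckets: list[list[str]] = [[] for _ in range(nodes)]
    for name, seconds in sorted(tests.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        buckets[node].append(name)
        heapq.heappush(heap, (load + seconds, node))
    return buckets


def node_durations(tests: dict[str, float], buckets: list[list[str]]) -> list[float]:
    """Total run time of each node for a given split."""
    return [sum(tests[name] for name in bucket) for bucket in buckets]


# Hypothetical historical timings, in seconds.
timings = {"test_a": 90, "test_b": 10, "test_c": 80, "test_d": 20}

by_name = node_durations(timings, split_by_name(timings, 2))
by_timing = node_durations(timings, split_by_timing(timings, 2))

print(by_name)    # [170, 30]
print(by_timing)  # [100, 100]
```

With these made-up timings, splitting by name leaves one node running for 170 seconds while the other finishes after 30. Splitting by timing balances both nodes at 100 seconds.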

We worked on this as a team and came up with idle time as an indicator of how well a test suite is optimized. Idle time is the difference in run time between the longest-running and the shortest-running node.
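
As a minimal sketch, idle time can be computed from per-node run times like this. The function name and the example durations are illustrative, not taken from the product.

```python
def idle_time(node_seconds: list[float]) -> float:
    """Difference between the longest-running and shortest-running node."""
    if not node_seconds:
        raise ValueError("at least one node duration is required")
    return max(node_seconds) - min(node_seconds)


# Hypothetical per-node run times, in seconds.
unbalanced = idle_time([170, 30])
balanced = idle_time([100, 100])

print(unbalanced)  # 140
print(balanced)    # 0
```

An idle time close to zero means that the nodes finish at roughly the same time, so little can be gained by redistributing tests.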

Ideation

Recommending test splitting

Recommend test splitting when we know that the user is going to benefit from it. Demonstrate the value visually.

Happy path

Demonstrate the use of parallelism and test splitting.

Recommend decreasing setup time

When the ratio of setup time to run time is high, it makes sense to decrease setup time before splitting tests.

Recommend test splitting optimization

When the gap between the finishing times of the longest and shortest nodes is large, the test suite can run faster with test splitting optimization.
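
The recommendations in this section can be read together as one decision rule. The Python sketch below is only an illustration of that reading: the function name, the returned labels, and both thresholds are hypothetical and were not part of the designs.

```python
def recommend(
    setup_seconds: float,
    test_seconds: float,
    node_seconds: list[float],
    max_setup_ratio: float = 1.0,  # hypothetical threshold
    max_idle_share: float = 0.25,  # hypothetical threshold
) -> str:
    """Pick a recommendation from setup time, test run time, and per-node run times.

    node_seconds holds the run time of each node; a single entry means the
    suite is not parallelized yet.
    """
    if test_seconds <= 0:
        raise ValueError("test_seconds must be positive")
    # A high setup-to-run ratio: splitting tests would not help much yet.
    if setup_seconds / test_seconds > max_setup_ratio:
        return "decrease setup time"
    if len(node_seconds) <= 1:
        return "split tests"
    longest = max(node_seconds)
    idle = longest - min(node_seconds)
    # A large gap between the longest and shortest node: rebalance the split.
    if longest > 0 and idle / longest > max_idle_share:
        return "optimize test splitting"
    return "no recommendation"


print(recommend(1200, 60, [60]))        # decrease setup time
print(recommend(60, 1200, [1200]))      # split tests
print(recommend(60, 200, [170, 30]))    # optimize test splitting
print(recommend(60, 200, [100, 100]))   # no recommendation
```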

Shape up

To break this work into smaller chunks that could be executed efficiently, the product manager, the lead engineer, and I used the Shape Up methodology from Basecamp to identify complexity, decide what was in scope and out of scope, and identify potential rabbit holes.

A technical limitation of our product is that we cannot precisely identify which steps in a job prepare the testing environment and which step actually runs tests. Engineering proposed that, instead of dividing a job into setup time and test run time, we simply show the three longest steps in a job, on the assumption that the longest step would be the one that runs tests.
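
The heuristic can be sketched in a few lines of Python. The `Step` type, the function name, and the step names and durations are hypothetical; the sketch only shows the idea of ranking steps by duration rather than classifying them.

```python
from typing import NamedTuple


class Step(NamedTuple):
    name: str
    seconds: float


def longest_steps(steps: list[Step], count: int = 3) -> list[Step]:
    """Return the `count` longest steps of a job, longest first."""
    return sorted(steps, key=lambda step: step.seconds, reverse=True)[:count]


# Hypothetical job with made-up step names and durations.
job = [
    Step("Spin up environment", 12),
    Step("Checkout code", 5),
    Step("Install dependencies", 95),
    Step("Run tests", 310),
    Step("Store test results", 3),
]

top = longest_steps(job)
print([step.name for step in top])
# ['Run tests', 'Install dependencies', 'Spin up environment']
```

In this example the longest step is the one that runs tests, which matches the assumption. A job whose dependency installation takes longer than its tests would break that assumption, but the list would still expose the long setup time.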

This decision would shrink the engineering sprint from extra large to small.

To me, this was not the ideal way to solve the customer problem, but it was a compromise that I was more than willing to make: it was a step in the right direction, and the solution still exposed the issue of a long setup time.

Validation

I created a research plan for validating the designs with customers and participated in all customer calls, which were led by a UX researcher.

Some of the feedback we received:

  • For developers who just want to identify and fix failures, these views were not useful
  • The designs were very useful for developer experience people who want to optimize the run times of test suites
  • One developer experience person commented that we focused on problems and solutions without explaining the problems in enough detail

Takeaways

  • This project increased awareness among CircleCI leadership about how the testing system works in CircleCI
  • By doing this work, I identified a problem that we had not intended to solve but that was in many ways a prerequisite for meeting customer needs
  • This is a good example of collaboration between design, product, and engineering where we all needed to demonstrate flexibility to win
  • The number of projects that use parallelism increased by 10%
  • Instead of just mindlessly upselling parallelism, we managed to demonstrate its value to customers and give them a better idea of how they would benefit from it
  • Before this project, visibility into how test splitting works was reserved for power users who had spent time familiarizing themselves with CircleCI documentation
  • This project started a conversation in the design team about whether we should aim to solve multiple problems at the same time