Test Automation for Continuous Deployment
Ease the transition to continuous deployment with a modern approach to test automation.
To adopt continuous deployment, you face one big challenge: you need to build enough automated test coverage to make deploying every change safe.
If you automate an existing manual regression test suite, you’ll end up with a preponderance of end-to-end tests that are expensive to maintain and slow to run. At the same time, this approach doesn’t put enough emphasis on non-functional concerns, like security, accessibility, performance, and coding style. None of this is conducive to successful continuous deployment, where you need to deploy quickly and frequently, with a high degree of confidence in your overall product quality.
Modern deployment practices call for a modern approach to test automation. To do continuous deployment well, you need a test suite with:
Broad test coverage…
…but not too deep
Zero tolerance for test failures
Let’s find out how these guidelines ease the transition to continuous deployment.
Broad Coverage
In software engineering organizations that ship monthly or quarterly releases, engineers tend to focus on fast feedback tests they can run locally. This includes unit tests and light integration tests. Maybe they also integrate their changes regularly, with automated functional testing on a continuous integration server.
Thorough regression testing happens at release time, along with security testing, performance testing, accessibility testing, and any other tests that are costly to set up or run. Very often, a separate team executes these tests, in a dedicated environment it owns and governs. Product quality becomes a focus at the end of each release cycle, and teams may allocate time for “hardening”, to focus exclusively on testing and fixing bugs.
With continuous deployment, this model is no longer possible. Engineers deploy to production many times a day, and every deployment must meet the quality bar for production code. This means all forms of testing must be combined into an automated test suite, and run continuously, on every deployment. And the test suite must be broad enough to cover all functional and non-functional validation.
A robust test suite for continuous deployment will include unit testing, integration testing, and end-to-end testing. These should all be integrated directly into the deployment pipeline, and run automatically on every code change. But on top of that, any other tests you run occasionally today must now be integrated into the pipeline. This may include static code analysis, configuration checks, linters to maintain your house coding style, API contract tests, fuzz tests, and a battery of static and dynamic security tests. If you perform regular tests for accessibility and performance, you can look for tools to automate these as well.
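To make this concrete, here’s a minimal sketch of a pipeline runner that treats all of these checks as first-class stages. It assumes a Python stack with pytest, flake8, and bandit; the stage names, test paths, and markers are hypothetical placeholders for whatever tools your own stack uses.

```python
import subprocess
import sys

# Hypothetical pipeline stages: every check runs on every change.
# Substitute the commands for your own stack and tooling.
STAGES = [
    ("unit tests",        ["pytest", "tests/unit", "-q"]),
    ("integration tests", ["pytest", "tests/integration", "-q"]),
    ("lint / style",      ["flake8", "src"]),
    ("static security",   ["bandit", "-r", "src", "-q"]),
    ("contract tests",    ["pytest", "tests/contracts", "-q"]),
    ("end-to-end smoke",  ["pytest", "tests/e2e", "-q", "-m", "critical"]),
]

def run_pipeline() -> None:
    for name, cmd in STAGES:
        print(f"--> {name}")
        if subprocess.run(cmd).returncode != 0:
            # Any failure stops the deployment; there is no override path.
            sys.exit(f"Stage '{name}' failed; aborting deployment.")
    print("All stages passed; safe to deploy.")

if __name__ == "__main__":
    run_pipeline()
```

Running the stages sequentially keeps the sketch simple; in practice you’d run the independent stages in parallel to protect your lead time.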
Building out your automated test suite with these additional forms of coverage means you can enjoy production-grade release safety for every individual code change. With broad and continuous coverage, you’ll also reduce the churn that comes from fixing large numbers of non-functional test failures at the same time, during infrequent integration windows.
But Not Too Deep
Building up this level of test coverage seems daunting, but here’s the thing… You need broad coverage, but it doesn’t have to be deep. If the purpose of your test suite is to let you safely deploy code to production, then the ideal amount of coverage is literally as little as possible, while still achieving that goal.
With continuous deployment, that’s not a very high bar — because every change you deploy is small, and hence inherently safe. You’re no longer integrating a lot of features at the same time, and looking for regression issues across the entire product. Instead, you’re making individual, highly targeted changes in one product area at a time. These changes are easy to test locally, easy to reason about, and easy to peer review. They don’t need the same level of regression coverage you’d apply for a big bang release.
What constitutes enough coverage depends on the type of test. Third-party tools for linting and static security testing come with hundreds of built-in rules that execute in seconds. You may as well run all of them, because they’ll help maintain product quality for virtually no effort on your part. Unit tests are cheap to build and run, and you can go to town on contract tests, configuration checks, and other inexpensive tests.
Focus your tuning effort instead on expensive tests, such as end-to-end tests and automated performance tests. The goal here is not to achieve comprehensive test coverage, but to exercise your most important workflows. These tests act as a sanity check, to catch anything that’s fundamentally broken in the product. Engineers will still test the specific areas they worked on before merging a code change. And you can use cheaper tests — such as snapshot tests — to catch any functional regressions in other product areas.
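As an illustration, here’s what that thin layer of end-to-end coverage might look like with pytest and requests. The base URL, endpoints, and critical marker are hypothetical; the point is that one happy-path test per key workflow, plus a cheap golden-file snapshot, goes a long way.

```python
import json
import pathlib

import pytest
import requests

BASE_URL = "https://staging.example.com"  # hypothetical test environment

@pytest.mark.critical  # register this marker in pytest.ini
def test_checkout_happy_path():
    # Smoke test for one business-critical workflow: it verifies the
    # happy path is not fundamentally broken, not every edge case.
    cart = requests.post(f"{BASE_URL}/api/cart",
                         json={"sku": "SKU-1", "qty": 1}, timeout=10)
    assert cart.status_code == 201
    order = requests.post(f"{BASE_URL}/api/checkout",
                          json={"cart_id": cart.json()["id"]}, timeout=10)
    assert order.status_code == 200
    assert order.json()["status"] == "confirmed"

def test_product_listing_snapshot():
    # Cheap snapshot test: compare the live response against a stored
    # golden file to catch unintended regressions in other areas.
    golden = json.loads(pathlib.Path("tests/snapshots/products.json").read_text())
    live = requests.get(f"{BASE_URL}/api/products", timeout=10).json()
    assert live == golden
```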
How do you know you’ve made the right tradeoff between test coverage, maintenance cost, and deployment speed? With some basic operational metrics, you can answer this question analytically. For example, if adding or removing tests doesn’t materially affect your rate of defects in production, you may have more tests of that type than you need. You can also set some high-level goals around deployment speed and change failure rate. If you target a 15-minute lead time to production, and a change failure rate of less than 1%, your teams can tune their test suites to achieve those targets.
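For example, here’s a rough sketch of how you might check those two targets against your own deployment records. The Deployment record shape is an assumption, standing in for whatever data your CI system exports.

```python
import statistics
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Deployment:
    merged_at: datetime    # change merged to the main branch
    deployed_at: datetime  # change live in production
    caused_failure: bool   # rolled back, hotfixed, or caused an incident

def change_failure_rate(deploys: list[Deployment]) -> float:
    return sum(d.caused_failure for d in deploys) / len(deploys)

def median_lead_time(deploys: list[Deployment]) -> timedelta:
    seconds = [(d.deployed_at - d.merged_at).total_seconds() for d in deploys]
    return timedelta(seconds=statistics.median(seconds))

def within_targets(deploys: list[Deployment]) -> bool:
    # Targets from the text: 15-minute lead time, <1% change failure rate.
    return (median_lead_time(deploys) <= timedelta(minutes=15)
            and change_failure_rate(deploys) < 0.01)
```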
A less scientific approach is to measure your deployment confidence with the “Friday afternoon test”. If your engineers are comfortable deploying to production on Friday afternoon, that’s a good sign. They know the tests are fast enough and stable enough that they won’t waste their Friday evening getting the deployment into production. And they clearly feel the test coverage is good enough that they won’t get paged for regression issues over the weekend.
The concept of reducing test coverage seems alien at first. But to practice continuous deployment safely and sustainably, you need a good balance of broad test coverage, low maintenance costs, and fast execution times. That means you have to get comfortable trimming out tests that are not pulling their weight.
Zero Tolerance for Failure
Now that you’ve tuned your test coverage, there’s still the matter of making sure tests pass. With infrequent releases, much of the integration and testing work happens close to release time. This flurry of activity makes it inevitable that some tests fail. Humans need to review these test results, and make a judgement call on whether to fix the code, fix the test, or release regardless.
With continuous deployment, this is not practical. Even a small engineering team can deploy code to production a dozen times a day or more. When you deploy this often, there’s no room for human decision making. The tests have to pass every time — and if they don’t pass, you need to know it’s due to a legitimate defect. If you slow down to investigate a test failure manually, you’re not only holding up your own deployment; your teammates are also piling up change sets behind you. There should be no manual decision points in a continuous deployment pipeline, which means you cannot accept test failures.
This is even more important if your organization is new to continuous deployment. You can expect some initial pushback around testing as people get used to the idea of fully automated deployments. Customers and compliance teams will feel more comfortable if you can guarantee that every deployment passed every test case.
It’s also easier culturally to enforce a 100% pass rate. Paraphrasing former IKEA Chief Sustainability Officer Steve Howard: if your target is 95%, everyone will find a reason why they should be in the 5%. Engineers will become desensitized to flaky tests, and will just restart failed builds, or wait for others to investigate. But if you set the target at 100%, you create total clarity for the team, and engineers just get on with fixing every failure.
Besides, if a test fails and you still feel it’s OK to deploy to production, do you really need that test? In the spirit of reducing the depth of coverage, sometimes the correct response to a broken or flaky test is simply to delete it. Adopting a zero-tolerance policy for test failures helps keep your test suite small, fast, and dependable.
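One way to find those deletion candidates is to mine your build history for tests that neither reliably pass nor reliably fail. Here’s a rough sketch, assuming you can export per-test results from your CI system; the thresholds are arbitrary starting points to tune for your own suite.

```python
from collections import defaultdict

def find_flaky_tests(history: list[tuple[str, bool]],
                     min_runs: int = 20,
                     flaky_band: tuple[float, float] = (0.02, 0.98)) -> list[str]:
    """Flag tests whose pass rate sits between 'always passes' and
    'always fails', the classic signature of flakiness."""
    stats = defaultdict(lambda: [0, 0])  # test name -> [passes, runs]
    for name, passed in history:
        stats[name][0] += passed
        stats[name][1] += 1
    flaky = []
    for name, (passes, runs) in stats.items():
        if runs >= min_runs and flaky_band[0] < passes / runs < flaky_band[1]:
            # A candidate to fix, quarantine, or simply delete.
            flaky.append(name)
    return flaky
```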
Get Started
With all this in mind, what should you do next?
It’s always a good idea to start with an assessment of your current test suite. If you haven’t adapted your test strategy for continuous deployment, your test coverage and test maintenance effort are probably heavily skewed towards functional regression tests.
The end goal is a broad test suite that’s fast, dependable, and easy to maintain. Trim expensive end-to-end tests down to the most important workflows — or those tests that most often find legitimate defects. Eliminate slow and flaky tests. Then add broader coverage as needed, to cover configuration, coding style, security, accessibility, and other non-functional concerns. And implement a zero-tolerance policy for test failures, to improve consistency and deployment confidence — for yourself and for your customers.
If you haven’t yet adopted continuous deployment because you don’t have enough test coverage … are you sure? Continuous deployment is inherently safer, because each change is small and self-contained. That means you can probably get started with what you have. You’ll quickly discover where the gaps are, and you can tune the coverage incrementally over time.