What are Software Regressions and How do We Avoid them?
from Paul Gerrard
https://youtu.be/a61wKNUbDhY?si=cdv1HJhGk7gGuNub
Video Abstract
Does your working software sometimes stop working when changes are introduced?
Are your developers unable to analyze the impact of changes, so that unwanted side effects get released?
What measures are you taking to reduce the number of software regression errors?
Now, even though it's expensive and doesn't prevent regressions, most companies use system-level regression testing.
You could say this is a last resort and in some ways the least effective anti-regression measure we can take.
Let's look more closely at what regressions are, why they occur, and which anti-regression measures are available to us.
Overview
I want to talk about software regressions and why regressions occur. If software regressions are the enemy, we want to prevent them as well as find them. Now, there are several options and we should consider all of them depending on our circumstances.
We need to know how regressions occur and why and take measures to prevent them as much as possible. So let's explore what a regression is.
What is a software regression?
One definition would be:
“an unintended side effect or bug created or activated when changes are made to … something”
... and we’ll look at what that something is.
There are several causes of software regressions and there are some variations of these too.
Causes of Regressions
Obviously, code changes are a big concern. The most common cause of regressions is when developers modify existing code. Code changes often unintentionally affect the behavior of other parts of the system.
But there are also environment changes that can cause problems.
Environment Changes
For example, upgrades to hardware, operating systems and other software can cause previously stable software to fail.
Updates or changes to third party libraries, APIs or services that your software relies on can introduce regressions.
These third parties could be partnering organizations or divisions in your own company.
Lack of Technical Understanding
When teams do not share adequate knowledge or understanding of the system's overall architecture, or if there is a lack of communication between different development teams, regressions are more likely to occur.
Older code bases usually lack clear documentation. All the experts may be long gone.
The knowledge and understanding of the original design choices are poor, so architects, designers and developers make mistakes.
Code maintenance becomes very risky because no one has time to really analyze code to understand it.
And without that understanding, it becomes difficult to predict the impacts of change.
Developer Testing
Developer testing that is not thorough can result in missed regressions.
Developers are adopting better testing practices, including test first and TDD, but it's a slow process.
The big question is how can we avoid software regressions? There are several well established approaches.
More Effective Impact Analysis
The obvious one is to perform more effective impact analyses. But impact analysis is difficult and it's never going to be 100% reliable.
Requirements Impact Analysis
At the requirements level, we need to track requirements changes to understand the impact on other requirements.
Code Level Impact Analysis
At the code level, we have to trace code changes to understand the impact on other code.
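One way to picture code-level tracing is as a walk over a dependency map. The following is a minimal sketch, assuming a hand-written map of hypothetical module names (`billing`, `pricing` and so on are invented for illustration); real impact analysis would need a map extracted from the actual code base and would still miss indirect couplings such as shared data.

```python
from collections import deque

# Hypothetical dependency map: each module lists the modules it imports.
DEPENDS_ON = {
    "billing": ["pricing", "customers"],
    "pricing": ["tax"],
    "reports": ["billing"],
    "customers": [],
    "tax": [],
}


def impacted_by(changed, depends_on):
    """Return every module that directly or indirectly depends on `changed`."""
    # Invert the map: for each module, record which modules depend on it.
    dependents = {}
    for module, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(module)

    # Breadth-first walk outwards from the changed module.
    impacted = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted


print(sorted(impacted_by("tax", DEPENDS_ON)))  # ['billing', 'pricing', 'reports']
```

In this sketch a change to `tax` flags three other modules for review or retest, while a change to `reports` flags none.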
Environment Impact Analysis
We should also evaluate the impact of environment changes on our systems.
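A first step towards evaluating an environment change is simply knowing what changed. Here is a small sketch, assuming the environment can be summarised as a snapshot of component names and versions; the component names and version numbers below are hypothetical.

```python
def diff_environment(before, after):
    """Compare two {component: version} snapshots and report what changed."""
    changes = {}
    for name in sorted(set(before) | set(after)):
        old, new = before.get(name), after.get(name)
        if old != new:
            # None on either side means the component was added or removed.
            changes[name] = (old, new)
    return changes


# Hypothetical snapshots taken before and after an upgrade.
before = {"os": "22.04", "python": "3.11.4", "payments-api": "2.3"}
after = {"os": "22.04", "python": "3.12.1", "payments-api": "3.0", "cache": "7.2"}

for name, (old, new) in diff_environment(before, after).items():
    print(f"{name}: {old} -> {new}")
```

Each reported difference is a candidate source of regressions and a prompt to ask which parts of the system rely on that component.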
Now, all these measures sound great in principle. The problem is, they can be extremely difficult to apply in practice.
But there are other practices that help.
Anti-Regression Measures
Test-First Approaches
The first is test-first development.
Now, test-first implies the whole team think about testing before both new development and changes, whether due to requirements or bug reports.
Test-driven development, or TDD, means developers write tests before writing code and, when done properly, means software changes incrementally in an always-tested state.
In continuous delivery environments, TDD is easier to apply and very effective at reducing regressions in later test stages.
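To make the TDD cycle concrete, here is a minimal sketch. The function `apply_discount` and its rules are invented for illustration. The tests are written in the plain-assert style that a runner such as pytest would discover; in a real cycle they would be written first and fail until the function existed.

```python
# Step 2 (GREEN): the simplest code that makes the tests below pass.
def apply_discount(price, percent):
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


# Step 1 (RED): in TDD these tests come first and fail until the code exists.
def test_ten_percent_off():
    assert apply_discount(200.0, 10) == 180.0


def test_zero_percent_leaves_price_unchanged():
    assert apply_discount(59.99, 0) == 59.99


def test_rejects_percent_over_100():
    try:
        apply_discount(100.0, 150)
    except ValueError:
        return
    raise AssertionError("expected ValueError for percent > 100")
```

Once these tests exist, they are rerun on every later change, so a modification that breaks discount handling is caught at the point it is made.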
We should not forget that test-first includes testing requirements and testing requirements is a powerful approach.
For example, if you write Gherkin stories, creating feature scenarios not only helps testing, it can help the whole team to recognise and understand impacts.
Tracing language and feature changes across requirements gives some insight into impact too.
The use and reuse of data across requirements can point you in the right direction to find other impacts.
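The idea of following data reuse across requirements can be sketched as a simple lookup. This assumes each requirement has been tagged with the data items it uses; the requirement names and data items below are hypothetical.

```python
# Hypothetical requirements, each tagged with the data items it reads or writes.
REQUIREMENTS = {
    "REQ-1 Register customer": {"customer_email", "customer_name"},
    "REQ-2 Send invoice": {"customer_email", "invoice_total"},
    "REQ-3 Apply discount": {"invoice_total", "discount_code"},
}


def requirements_sharing_data(changed_req, requirements):
    """List other requirements that use any data item used by `changed_req`."""
    changed_items = requirements[changed_req]
    return {
        name: sorted(items & changed_items)  # the shared data items
        for name, items in requirements.items()
        if name != changed_req and items & changed_items
    }


for name, shared in requirements_sharing_data("REQ-2 Send invoice", REQUIREMENTS).items():
    print(f"{name} shares {shared}")
```

A shared data item does not prove there is an impact, but it tells the team where to look first.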
CI/CD Disciplines
Continuous Integration and Continuous Deployment (CI/CD) pipelines allow automated tests to run every time new code is pushed to the code base.
Continuous approaches extend the test-first concept. Test first becomes test ALWAYS.
This is why continuous delivery helps identify issues early to keep the software in a deployable state.
Code Review
Regular code reviews can help catch potential problems and prevent regressions because code changes are critically examined from new perspectives.
Now, tools can scan code in isolation, but developers and architects – humans – can look at code in the context of interfacing components too.
In this way, interfaces and collaborating components are examined more closely to find inconsistencies and impacts.
Refactoring
Regular refactoring improves code readability, maintainability and developer understanding and this reduces regressions too.
Refactoring is an essential stage of test-driven development.
The TDD mantra is RED, GREEN, REFACTOR in all TDD cycles.
Refactoring should not be an afterthought, but it is too often neglected when time is tight.
Version Control Discipline
Good version control practices eliminate some types of regressions.
Developers are well used to version control tools such as Git. But version control in continuous and DevOps environments requires discipline by BOTH developers and software managers.
Good version control practices not only reduce regressions; version control tools are also an invaluable aid to tracing the troublesome code that causes a regression failure.
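Tracing the change that introduced a regression is often done by bisecting the history, which is the kind of search that `git bisect` automates. The sketch below only illustrates the idea and is not Git's implementation; the commit names and the `is_good` check are hypothetical stand-ins for real commits and a real test run.

```python
def first_bad_commit(commits, is_good):
    """Binary-search an ordered commit list for the first failing commit.

    Assumes commits[0] is good, commits[-1] is bad, and that once a commit
    is bad every later commit is bad too.
    """
    low, high = 0, len(commits) - 1  # low: known good, high: known bad
    while high - low > 1:
        mid = (low + high) // 2
        if is_good(commits[mid]):
            low = mid
        else:
            high = mid
    return commits[high]


# Hypothetical history: the regression was introduced in commit "e5".
history = ["a1", "b2", "c3", "d4", "e5", "f6", "g7"]
broken_from = history.index("e5")
print(first_bad_commit(history, lambda c: history.index(c) < broken_from))  # e5
```

The search only works well when commits are small and each one leaves the software in a testable state, which is part of the discipline described above.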
Feature Flags
Feature flags allow you to enable or disable specific features in code dynamically.
In test or production, new or changed features can be released to a specific environment or selected users.
If there are problems, the features can be withdrawn or turned off.
This doesn’t reduce regression, but it can reduce the impact of regression failures.
So, with care, we can extend some testing into production.
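A minimal sketch of a feature flag follows, assuming an in-memory store and hypothetical flag and user names. Real systems usually load flags from configuration or a flag service, so treat the class below as an illustration of the idea rather than a recommended design.

```python
import hashlib


class FeatureFlags:
    """In-memory flag store used only for illustration."""

    def __init__(self):
        self._flags = {}

    def set_flag(self, name, enabled=True, users=None, percent=100):
        self._flags[name] = {
            "enabled": enabled,
            "users": set(users or ()),
            "percent": percent,
        }

    def is_enabled(self, name, user_id):
        flag = self._flags.get(name)
        if flag is None or not flag["enabled"]:
            return False  # unknown or switched-off features stay hidden
        if user_id in flag["users"]:
            return True  # explicitly selected users always see the feature
        # Stable bucket in 0..99 so a user gets the same answer on every call.
        digest = hashlib.sha256(f"{name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100 < flag["percent"]


flags = FeatureFlags()
# Release the hypothetical "new_checkout" feature to one selected user only.
flags.set_flag("new_checkout", users={"alice"}, percent=0)
print(flags.is_enabled("new_checkout", "alice"))  # True
print(flags.is_enabled("new_checkout", "bob"))    # False

# If problems appear, withdraw the feature without redeploying code.
flags.set_flag("new_checkout", enabled=False)
print(flags.is_enabled("new_checkout", "alice"))  # False
```

The old code path has to remain in place and tested while the flag exists, which is where the care mentioned above comes in.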
Test Coverage Throughout the Life-cycle
Finally, comprehensive test coverage helps. If your entire system is covered by tests at all stages, if these tests are repeatable, and repeated, this has to help the battle against regressions.
But this is an unachievable ideal in many situations.
Continuous delivery and test approaches are probably the least painful way of making progress.
System and User Regression Testing
System end-to-end tests and user tests are a different matter. The problem is, they are often too slow and too late to keep pace with continuous development cycles.
Summary
In summary, by adopting a mix of these measures, you could significantly reduce the chances of regressions.
They definitely help to ensure new updates and changes improve the software without breaking existing functionality.
But there are no prizes for guessing the elephant in the room.
Automated Regression Testing
What is the role of automated system and user regression testing?
Automated regression testing is a very effective way to catch regressions after they occur, but it is not a prevention measure unless the tests are part of a TDD cycle.
System and user regression tests are, at best, a partial safety net that may help when other measures fail.
The problem is that prevention, and impact analysis in particular, is often difficult and can be expensive. So regression prevention is not considered economic; it's not looked at closely enough or acted upon.
For many years, late regression testing has been the only approach used to prevent regressions reaching end-users.
As a consequence, the test automation tool market has grown dramatically.
The problem is these tools support a costly, less effective approach to regression prevention.
Late regression testing is a last resort, but test execution tools are the go-to solution to regression problems.
We'll talk more about automated regression testing in the next video.