Tenets of Testing - A comparison with Test Axioms

May 5, 2023

First published 13/10/2020

Nicholas Snogren posted on LinkedIn a reference to an “Axioms of Testing” presentation from 2009 and asked me to comment on his “Tenets of Software Testing”. I think there are some similarities and some parallels, though not many, but his question prompted me to give a longer response than I guess was expected. I said...

“Hi, thanks for asking for my opinion. Your tenets look interesting – and although I don't think they map directly to what I've written, they raise points in my mind that need a little airing – my mind grows cobwebby over time, and it's good to brush off old ideas. A bit like exercising muscles that haven't been used for a while haha.”

I give my response as a comparison with my Tester's Pocketbook, and Test Axioms website and presentations. (I notice that some of these posts are around 12 years old and some links no longer work. Some are out of my control; others I'll have to track down and correct – let me know if you want that.)

Tenets v Axioms

Firstly, let's get our definitions right.

According to dictionary.com, a Tenet is “any opinion, principle, doctrine, dogma, etc., especially one held as true by members of a profession, group, or movement.” Tenets might be regarded as beliefs that don't require proof and don't provide a strong foundation.

From the same source, an Axiom is, “i) a self-evident truth that requires no proof. ii) a universally accepted principle or rule. iii) Logic, Mathematics. a proposition that is assumed without proof for the sake of studying the consequences that follow from it”

I favoured the use of Axioms as a foundation for thinking in the testing domain. Axioms, if they are defensible, would provide a stronger foundational set of ideas. When I proposed a set of Testing Axioms, there was some resistance – here's a Prezi talk that introduces the idea.

James Bach in particular challenged the idea of Axioms and suggested I was creating a school of testing (when schools of testing were getting some attention) here.

By and large, by defining the Axioms in terms that are context-neutral, challenges have tended to be easy to acknowledge, disarm and set aside. Critics, almost entirely from the context-driven school, jumped the gun so to speak – they clearly hadn't read what I had written at the time before critiquing. Only one or two people responded to James' call to arms to criticise the Axioms and challenged them.

The Axioms are fully described in The Tester's Pocketbook – http://testers-pocketbook.com/.

The Axioms of Testing website – https://testaxioms.com/ – sets out the Axioms with some explanation and provides around 50% of the pocketbook content for free.

Axioms caught the attention (and criticism) of people because I pitched them as universal principles or laws of testing. Tenets, being less strident in their definition might not attract attention (or criticism) in the same way.

Immediate Comments on the Tenets

The Tenets are numbered and italicised. My comments are in plain text.

1. A software product’s behavior is exhibited by interactions.
2. There is potentially an infinite number of possible behaviors in software.

These are properties of software. I'm not sure what 1 says other than that behavior is triggered by interactions and presumably observed through interactions. However, a lot of software behaves autonomously: it might respond to internal events such as the passing of time and might not exhibit any behaviour through interactions, e.g. a change of internal state. I'm not sure 1 says much.

Tenet 2 is reasonable for reasonably-sized software artefacts.

In the Tester's Pocketbook, I hardly use the term software. I prefer that we test systems. Software is usually an important part of every system. Humans do not interact with software (except by reading or writing it). Software exists in the context of program compilation, hosted on operating systems, running on devices which have peripherals and other interconnected systems which may or may not have user interfaces.

Basing Axioms on Systems means that the Axioms are open to interpretation as Axioms of testing ANY system (i.e. anything; I don't press that idea, but it's an attractive one). Another 'benefit' is that all of the Systems Thinking principles can also be brought to bear on our arguments. Outside its context, Software is not a System.

3. Some of those behaviors are potentially negative, that is, would detract from the objectives of the software company or users.

I use the term Stakeholders to refer to parties interested in the valuable, reliable behavior of systems and the outcome and value of testing those systems.

4. The potentiality for that negative behavior is risk.

OK, but it could be better worded. I would simply say 'potential modes of failure' rather than negative behaviour.

5. It’s impossible to guarantee a lack of risk as it’s impossible to experience an infinite number of behaviors.

Not really. You can guarantee a no-risk situation if no one cares or no one cares enough to voice their concerns before testing (or after testing). There is always the potential for failure because systems are complex and we are not skilled enough to create perfect systems.

6. Therefore a subset of behaviors must be sampled to represent the risk.

Rather than represent, I would say trigger the failure(s) of concern to explore the risk and better inform a risk-assessment.

7. The ability to take an accurate sample, representative of the true risk, is a testing skill.

Not sure what you mean by sample – tests or test cases, I presume? Representative is a subjective notion, surely; 'true' I don't understand; and a testing skill would need more definition than this, wouldn't it?

8. A code change to an existing product may also affect the product in an infinite number of ways.

I'd use 'ANY' change, to a 'SYSTEM'. Why 'also'? What would you say fits into a 'not only.... but also...' clause? But I'm not sure I agree with this assertion anyway. A code change changes some software artefact. The infinite effects (faulty behaviors?) derive from infinite tests (or uses in production) – which you say in 5 is impossible to achieve. I'm not sure what you're trying to say here.

9. It is possible to infer that some behaviors are more likely to be affected by that change than others.

You can infer anything you like by calling upon the great Unicorn in the sky. How will you do this? Either you use tools which are limited in capability or you might use change and defect history or you might guess based on partial knowledge and experience.

10. The risk -of that change- is higher within the set of behaviors that are more likely to be affected by that change.

Do you mean probability of failure or the consequence of failure? I assume probability. At any rate, this is redundant. You have already asserted this in 9. But it's also more complicated than this – a cosmetic defect on an app can be catastrophic and a system failure negligible at times.
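
To make the distinction between probability and consequence concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the FailureMode type, the two failure modes, and the numbers are invented, and multiplying probability by consequence to get an 'exposure' is one common convention rather than anything the Tenets or the Axioms prescribe. Such numbers are subjective, but the sketch shows how ranking by probability alone can differ from a ranking that also weighs consequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    """A hypothetical failure mode with subjective, invented estimates."""
    name: str
    probability: float  # assumed likelihood that the change triggers it (0..1)
    consequence: float  # assumed cost if it happens, in arbitrary units

    @property
    def exposure(self) -> float:
        # One common convention: exposure = probability x consequence.
        return self.probability * self.consequence


modes = [
    FailureMode("cosmetic defect on a public landing page", 0.10, 900.0),
    FailureMode("crash in a rarely used admin report", 0.60, 10.0),
]

# Rank once by probability alone, and once by exposure.
by_probability = sorted(modes, key=lambda m: m.probability, reverse=True)
by_exposure = sorted(modes, key=lambda m: m.exposure, reverse=True)

print("Most likely to fail:", by_probability[0].name)
print("Highest exposure:   ", by_exposure[0].name)
```

With these invented figures the crash is the more likely failure, yet the cosmetic defect carries the higher exposure, so the two rankings disagree.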

11. The ability to accurately estimate a scope of affected behavior is another testing skill.

I would call this the skills of impact analysis rather than testing. Developers are relatively poor at this, even having a far deeper technical knowledge (either they aren't able or lack the time to impact-analyse to any reliable degree). So we rely on testing to catch regressions which is less than ideal. Testers depend on their experience rather than system internals knowledge. But, since buggy systems behave in essentially unpredictable ways, we must admit our experience is limited and fallible. It's not a 'skill' that I would dwell on.

12. The scope and sampling ideas alone are meaningless without empirical evidence.

The scope and sampling ideas have meaning regardless of whether you implement them. I suppose you might say they are useless ideas if you don't gather evidence.

13. Empirical evidence is gathered through interactions with the product, observation of resultant behavior, and assessment of those observations.

The word empirical is redundant. I would use the word 'some' here. We also get evidence from operation in production, for example. (Unless you include that already?)

14. The accuracy and speed of scope estimation, behavior sampling, and gathering of evidence are key performance indicators for the tester.

If you are implying 13 are tester skills, I suppose you could make this assertion. But you haven't said what the value of evidence is yet. Is the purpose of testing only to evaluate the performance of testers? Hope not ;O)

15. Heuristics for the gathering of such evidence, the estimation of scope, and the sampling of behavior are defined in the Heuristic Test Strategy Model.

Heuristics are available in a wide range of sources including software, systems and engineering standards. Why choose such a limited source?

Inspiration for the Tenets

These tenets were inspired by James Bach’s “Risk Gap” and Doug Hubbard’s book “How to Measure Anything.” Both Bach and Hubbard discuss a very similar idea from different spaces. Hubbard suggests that by defining our uncertainty, we can communicate the value of reducing the uncertainty. Bach describes the “knowledge we need to know” as the “Risk Gap.” This Risk Gap is our uncertainty, and in defining it, we can compute the value of closing it. In testing, I realized we have three primary areas of uncertainty: 1) what is the “risk gap,” or knowledge we need to find out, 2) how can we know when we’ve acquired enough of that unknown knowledge, and 3) how can we design interactions with the program to efficiently reveal this knowledge.

There are several interesting anomalies to unpick here:

I recall James telling a story about Tom Gilb asserting anything could be measured. James suggested Love and Tom obliged. I don't think James was impressed.
'Defining uncertainty' – how do you do that reliably? Numerically? Objectively? We can put any numbers we like against probability and consequence. Being certain, with or without evidence, is always subjective. People can say they are more certain, but based on ... what? How do we correlate data with a human emotion and use that to make engineering decisions? People can be easily deceived – by themselves, not just by others. Consider this, for example, and this.
Risk Gap – how is a quantity of knowledge measured? What units? With what certainty? These are aspects that James has argued against since the early 1990s.
Your three challenges 1), 2) and 3) are reasonable as goals. How do Bach and Hubbard argue you achieve them, if not by calling on the subjective opinions of other people?

Some More General Comments

You seem to be trying to 'make a case' for testing as a tool to address the risk of failure in systems. I (like and) use that same approach in a rounder sense in my conference talks and writings, when practicable. My observations on this are:

The logic doesn't flow as it should because of flaws in the individual statements
You have no pre-definition of test, testing or its purpose at the outset, so it's not clear what your destination is
There's no defined goal of testing, other than to gather evidence to (my words) reassess risks and thereby reduce uncertainty (but you don't say why that's a 'good thing')
Testing enables a reassessment of risk, but that reassessment may increase risk if, for example, you find a bug in something that was previously deemed reliable. (There's a bigger conversation to be had, but risk is not a BAD thing; it's the barrier(s) you need to navigate or break through to gain your REWARD.)
Extant, significant risks are a BARRIER to accepting or releasing systems. As such, the goal of testing is to provide evidence that, to the people who make the decision, those risks are acceptable or negligible (reducing uncertainty, sure, but never eliminating it). But the ultimate goal of testing is to show that the system WORKS. Encountering failures is a detour from that goal. The testing goal is broader than exploring risks.
You don't mention stakeholders at all. Why do we test? To provide evidence to testing stakeholders – our customers – so they can make better-informed decisions.

Summary

I don't want to give the impression that I'm criticising Nicholas or am arguing against the concept of Tenets or Principles or Axioms of testing. All I have tried to do is offer reasonable criticism of the Tenets to show that a) it is extremely difficult to postulate bullet-proof Tenets, Principles or Axioms and b) it is extremely easy to criticise such efforts by:

Exposing flaws in the language used and the logic in an argument that C follows B follows A etc.
Identifying implicit assumptions of meaning, scope, dependency and authority and
Offering examples of context that contradict, or expose flaws in, the statements made.

I do this because I have been there many times since 2008 and occasionally have to defend the Test Axioms from such criticisms. I have to say, Critical Thinking is STILL a rare skill – I wish criticism were more often proffered as a result of it.

References

The Tester's Pocketbook – http://testers-pocketbook.com/
Axioms of Testing website – https://testaxioms.com/

Tags: #ALF #TestAxioms #Tenets #CriticalThinking

Paul Gerrard