Paul Gerrard

My experiences in the Test Engineering business; opinions, definitions and occasional polemics. Many have been rewritten to restore their original content.

First published 11/10/2011

Anne-Marie Charrett wrote a blog post that I commented on extensively. I've reproduced the comment here:

“Some to agree with here, and plenty to disagree with too...

  1. Regression testing isn't about finding bugs the same way as one might test new software to detect bugs (testing actually does not detect bugs, it exposes failure. Whatever.) It is about detecting unwanted changes in functionality caused by a change to software or its environment. Good regression tests are not necessarily 'good functional tests'. They are tests that will flag up changes in behaviour – some changes will be acceptable, some won't. A set of tests that purely achieve 80% branch coverage will probably be adequate to demonstrate functional equivalence of two versions of software with a high level of confidence – economically. They might be lousy functional tests “to detect bugs”. But that's OK – 'bug detection' is a different objective.

  2. Regression Testing is one of four anti-regression approaches. Impact analysis from a technical and business point of view are the two preventative approaches. Static code analysis is a rarely used regression detection approach. Fourthly... and finally... regression testing is what most organisations attempt to do. It seems to be the 'easiest option' and 'least disruptive to the developers'. (Except that it isn't easy and regression bugs are an embarrassing pain for developers). The point is one can't consider regression testing in isolation. It is one of four weapons in our armoury (although the technical approaches require tools). It is also over-relied on and done badly (see 1 above and 3 below).

  3. If regression testing is about demonstrating functional equivalence (or not), then who should do it? The answer is clear. Developers introduce the changes. They understand, or should understand, the potential impact of planned changes on the code base before they proceed. Demonstrating functional equivalence is a purely technical activity that should be done by technicians. Call it checking if you must. Tools can do it very effectively and efficiently if the tests are well directed (80% branch coverage is a rule of thumb) – a sketch of such a check follows below.
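
To make point 3 concrete, here is a minimal, hypothetical equivalence check in Python. The discount functions and the chosen inputs are invented for illustration; the point is only that comparing the behaviour of the old and new versions of changed code over a well-directed set of inputs is a mechanical, technical activity.

```python
# A hypothetical functional-equivalence check: run the old and new versions of
# a changed function over the same inputs and flag any behavioural difference.
# The functions and inputs are invented; in practice the inputs would be chosen
# to give high branch coverage of the changed code.

def discount_v1(order_total, loyalty_years):
    """Baseline (pre-change) implementation."""
    if order_total > 100 and loyalty_years >= 2:
        return round(order_total * 0.10, 2)
    return 0.0

def discount_v2(order_total, loyalty_years):
    """Changed implementation, intended to be functionally equivalent."""
    rate = 0.10 if (order_total > 100 and loyalty_years >= 2) else 0.0
    return round(order_total * rate, 2)

def find_regressions(inputs):
    """Return the inputs for which the two versions disagree."""
    return [
        (total, years, discount_v1(total, years), discount_v2(total, years))
        for total, years in inputs
        if discount_v1(total, years) != discount_v2(total, years)
    ]

if __name__ == "__main__":
    # Inputs chosen to exercise both branches of the discount logic.
    sample_inputs = [(50, 1), (150, 1), (150, 3), (100, 2), (101, 2)]
    diffs = find_regressions(sample_inputs)
    print("No behavioural change detected" if not diffs else f"Changes: {diffs}")
```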

Of course, what happens mostly is that developers are unable to perform accurate technical impact analyses and they don't unit test well, so they have no tests and certainly nothing automated. They may not be interested in and/or paid to do testing. So the poor old system or acceptance testers, working purely from the user interface, are obliged to give it their best shot. Of course, they try to re-use their documented tests or their exploratory nous to create good ones. And fail badly. Not only are tests driven from the UI point of view unlikely to cover the software that might be affected, the testers are generally uninformed of the potential impact of software changes, so have no steer to choose good tests in the first place. By and large, they aren't technical and aren't privy to the musings of the developers before they perform the code changes, so they are pretty much in the dark.

So UI-driven manual or automated regression testing is usually of low value (but high expense) when intended to demonstrate functional equivalence. That is not to say that UI-driven testing has no value. Far from it. It is central to assessing the business impact of changes. Unwanted side effects may not be bugs in code. Unwanted side effects are a natural outcome of the software changes requested by users. A common example of an unwanted effect is a configuration change in an ERP system. The users may not get what they wanted from the 'simple change'. Ill-judged configuration changes in ERP systems designed to perform straight-through processing can have catastrophic effects. I know of one example that caused 75 man-years of manual data clean-up effort. The software worked perfectly – there was no bug. The business using the software did not understand the impact of configuration changes.

Last year I wrote four short papers on Anti-Regression Approaches (including regression testing) and I expand on the points above. You can see them here: http://gerrardconsulting.com/index.php?q=node/479

Tags: #regressiontesting #anti-regression


First published 20/09/2010

In the first essay in this series, I set out the challenges of system-level testing in environments where requirements documents define the business need and pre-scripted tests drive demonstrations that business needs are met. These challenges are not being met in most systems development projects.

In this second essay, I’d like to set out a vision for how organizations could increase confidence in requirements and the solutions they describe, and come to regard them as artifacts worth keeping and maintaining. Creating examples that supplement requirements will provide a better definition of the proposed solution for system developers and a source of knowledge for testing that aligns with the business need.

I need to provide some justification. The response of some to the challenge of capturing trusted requirements and managing change through a systems development project is to abandon the concept of pre-stated requirements entirely. The Agile approach focuses on the dynamics of development and the delivered system is ‘merely’ an outcome. This is a sensible approach in some projects. The customer is continuously informed by witnessing demonstrations or having hands-on access to the evolving system to experience its behaviour in use. By this means, they can steer the project towards an emergent solution. The customer is left with experience but no business definition of the solution – only the solution itself. That’s the deal.

But many projects that must work with (internally or externally) contracted requirements treat those requirements as a point of departure, to be left behind and to fade into corporate memory, rather than as a continuously available, dynamic vision of the destination. In effect, projects simply give up on having a vision at all and are driven by the bureaucratic need to follow a process and the commercial imperative of delivering ‘on time’. It’s no surprise that so many projects fail.

In these projects, the customer is obliged to regard their customer test results as sufficient evidence that the system should be paid for and adopted. But the content of these tests is too often influenced by the solution itself. The content of these tests – at least at a business level – could be defined much earlier. In fact, they could be derived from the requirements and the ways the users intend to do business using the proposed system (i.e. their new or existing business processes). The essential content of examples is re-usable as tests of the business requirements and business processes from which they are derived. Demonstration by example IS testing. (One could call them logical tests, as compared with the physical tests of the delivered system).

The potential benefits of such an approach are huge. The requirements and processes to be used are tested by example. Customer confidence and trust in these requirements is increased. Tested, trusted requirements with a consistent and covering set of examples provide a far better specification to systems developers: concrete examples provide clarity, improve their understanding and increase their chances of success. Examples provide a trusted foundation for later system and acceptance testing so reusing the examples saves time. The level of late system failures can be expected to be lower. The focus of acceptance tests is more precise and stakeholders can have more confidence in their acceptance decisions. All in all, a much improved state of affairs.

Achieving this enlightened state requires an adjustment of attitudes and focus by customers and systems development teams. I am using the Test Axioms (http://testaxioms.com) to steer this vision and here are its main tenets:

  1. Statements of Requirements, however captured, cannot be trusted if they are fixed and unchanging.
  2. Requirements are an ambiguous, incomplete definition of business needs. They must be supported by examples of the system in use.
  3. Requirements must be tested: examples are derived from the requirements and guided by the business process; they are used to challenge and confirm the thinking behind the requirements and processes.
  4. Requirements, processes and examples together provide a consistent definition of the business need to be addressed by the system supplier.
  5. The business-oriented approach is guided by the Stakeholder and Design Axioms.
  6. Examples are tests: like all tests, they have associated models, coverage, baselines, prioritisations and oracles.
  7. Business impact analyses during initial development and subsequent enhancement projects are informed by requirements and examples. Changes in need are reflected by changes in requirements and associated examples.
  8. Tests intended to demonstrate that business needs are met are derived from the examples that tested the requirements.
  9. Requirements and examples are maintained for the lifetime of the systems they define. The term ‘Live Specs’ has been used for this discipline.

If this is the vision, then some interesting questions (and challenges) arise:

  • Who creates examples to test requirements? Testers or business analysts?
  • Does this approach require a bureaucratic process? Is it limited to large structured projects?
  • What do examples look like? How formal are they?
  • What automated support is required for test management?
  • How does this approach fit with automated test execution?
  • What is the model for testing requirements? How do we measure coverage?
  • How do changing requirements and examples fit with contractual arrangements?
  • What is the requirements test process?
  • How do we make the change happen?
I’ll be discussing these and other questions in subsequent essays.

Tags: #Essaysontestdesign #examples


First published 21/02/2008

Teacher: Paul, make a sentence starting with the letter I.

Paul: I is...

Teacher: No, no, no, don't say "I is", you say "I am".

Paul: OK, I am the ninth letter of the alphabet.


This blog is my response to James Bach's comments on his blog to my postings on testing axioms: "Does a set of irrefutable test axioms exist?" and "The 12 Axioms of Testing". There are a lot of comments – all interesting – but many need a separate response. So, read the following as if it were a conversation – it might make more sense.

PG = Paul
Text in the standard font = James (not highlighted).



Here we go... James writes...

Paul Gerrard believes there are irrefutable testing axioms.

PG: I'm not sure I do or I don't. My previous blog asks could there be such axioms. This is just an interesting thought experiment. Interesting for me anyway. ;–)
This is not surprising, since all axioms are by definition irrefutable.

PG: Agreed – "irrefutable axioms" is tautological. I changed my blog title quickly – you probably got the first version, I didn't amend the other blog posting. Irrefutable is the main word in that title so I'll leave it as it is.
To call something an axiom is to say you will cover your ears and hum whenever someone calls that principle into question.

PG: It's an experiment, James. I'm listening and not humming.
An axiom is a fundamental assumption on which the rest of your reasoning will be based.

PG: Not all the time. If we encounter an 'exception' in daily life, and in our business we see exceptions all the damn time, we must challenge all such axioms. The axiom must explain the phenomena or be changed or abandoned. Over time, proposals gain credibility and evolve into axioms or are abandoned.
They are not universal axioms for our field.

PG: (Assume you mean "there are no") Now, that is the question I'm posing! I'm open to the possibility. I sense there's a good one.
Instead they are articles of Paul’s philosophy.

PG: Nope – I'm undecided. My philosophy, if I have one, is, "everything is up for grabs".
As such, I’m glad to see them. I wish more testing authors would put their cards on the table that way.

PG: Well thanks (thinks... damned with faint praise ;–) ).
I think what Paul means is not that his axioms are irrefutable, but that they are necessary and sufficient as a basis for understanding what he considers to be good testing.

PG: Hmm, I hadn't quite thought of it like that but keep going. These aren't MY axioms any more than Newton's laws belonged to him – they were 'discovered'. It took me an hour to sketch them out – I've never used them in this format but I do suspect they have been in some implicit way, my guide. I hope they have been yours too. If not...
In other words, they define his school of software testing.

PG: WHAT! Pause while I get up off the floor haha. Deep breath, Paul. This is news to me, James!
They are the result of many choices Paul has made that he could have made differently. For instance, he could have treated testing as an activity rather than speaking of tests as artifacts. He went with the artifact option, which is why one of his axioms speaks of test sequencing. I don’t think in terms of test artifacts, primarily, so I don’t speak of sequencing tests, usually. Usually, I speak of chartering test sessions and focusing test attention.

PG: I didn't use the word artifact anywhere. I regard testing as an activity that produces Project Intelligence – information, knowledge, evidence, data – whatever you like – that has some value to the tester but more to the stakeholders of testing. We should think of our stakeholders before we commit to a test approach and not be dogmatic. (The stakeholder axiom). How can you not agree with that one? The sequencing axiom suggests you put most valuable/interesting/useful tests up front as you might not have time to do every test – you might be stopped at any time in fact. Test Charters and Sessions are right in line with at least half of the axioms. I do read stuff occasionally :–) Next question please!

No, these aren't the result. They are thoughts – instincts, even – that I've had for many years and have tried to articulate. I'm posing a question. Do all testers share some testing instincts? I won't be convinced that my proposed axioms are anywhere close until they've been tested and perfected through experience. I took some care to consider the 'school'.
Sometimes people complain that declaring a school of testing fragments the craft. But I think the craft is already fragmented, and we should explore and understand the various philosophies that are out there. Paul’s proposed axioms seem a pretty fair representation of what I sometimes call the Chapel Hill School, since the Chapel Hill Symposium in 1972 was the organizing moment for many of those ideas, perhaps all of them. The book Program Test Methods, by Bill Hetzel, was the first book dedicated to testing. It came out of that symposium.

PG: Hmm. This worries me a lot. I am not a 'school', thank you very much. Too many schools push dogma, demand obedience to school rules and mark people for life. They put up barriers to entry and exit and require members to sing the same school song. No thanks. I'm not a school.

It reminds me of Groucho Marx. "I wouldn't want to join any club that would have me as a member."
The Chapel Hill School is usually called “traditional testing”, but it’s important to understand that this tradition was not well established before 1972. Jerry Weinberg’s writings on testing, in his authoritative 1961 textbook on programming, presented a more flexible view. I think the Chapel Hill school has not achieved its vision; it was largely out of dissatisfaction with it that the Context-Driven school was created.

PG: In my questioning post, I used 'old school' and 'new school' just to label one obvious choice – pre-meditated v contemporaneous design and execution to illustrate that axioms should support or allow both – as both are appropriate in different contexts. I could have used school v no-school or structured v ad-hoc or ... well anything you like. This is a distraction.

But I am confused. You call the CH symposium a school and label that "traditional". What did the symposium of 1972 call themselves? Traditional? A school? I'm sure they didn't wake up the day after thinking "we are a school" and "we are traditional". How do those labels help the discussion? In this context, I can't figure out whether 'school' is a good thing or bad. I only know one group who call themselves a school. I think 'brand' is a better label.
One of his axioms is “5. The Coverage Axiom: You must have a mechanism to define a target for the quantity of testing, measure progress towards that goal and assess the thoroughness in a quantifiable way.” This is not an axiom for me. I rarely quantify coverage. I think quantification that is not grounded in measurement theory is no better than using numerology or star signs to run your projects. I generally use narrative and qualitative assessment, instead.

PG: Good point. The words quantity and quantifiable imply numeric measurement – that wasn't my intention. Do you have a form of words I should use that would encompass quantitative and qualitative assessment? I think I could suggest "You must have a means of evaluating narratively, qualitatively or quantitatively the testing you plan to do or have done". When someone asks how much testing we plan to do, have done or have left to do, I think we should be able to provide answers. "I don't know" is not a good answer – if you want to stay hired.
For you context-driven hounds out there

PG: Sir, Yes Sir! ;–)
practice your art by picking one of his axioms and showing how it is possible to have good testing, in some context, while rejecting that principle. Post your analysis as a comment to this blog, if you want.

PG: Yes please!
In any social activity (as opposed to a mathematical or physical system), any attempt to say “this is what it must be” boils down to a question of values or definitions. The Context-Driven community declared our values with our seven principles. But we don’t call our principles irrefutable. We simply say here is one school of thought, and we like it better than any other, for the moment.

PG: I don't think I'm saying "this is what it must be" at all. What is "it", what is "must be"? I'm asking testers to consider the proposal and ask whether they agree if it has some value as a guide to choosing their actions. I'm not particularly religious but I think "murder is wrong". The fact that I don't use the ten commandments from day to day does not mean that I don't see value in them as a set of guiding principles for Christians. Every religion has their own set of principles, but I don't think many would argue murder is acceptable. So even religions are able to find some common ground. In this analogy, school=religion. Why can't we find common ground between schools of thought?

I'm extremely happy to amend, remove or add to the axioms as folk comment. Either all my suggestions will be completely shot down or some might be left standing. I'm up for trying. I firmly believe that there are some things all testers could agree on no matter how abstract. Are they axioms? Are they motherhood and apple pie? Let's find out. These abstractions could have some value other than just as debating points. But let's have that debate.

By the way – my only condition in all this is you use the blog the proposed axioms appear on. If you want to defend the proposed axioms – be my guest.

Thanks for giving this some thought – I appreciate it.


Tags: #School'sOut!


First published 18/09/2010

This is the first in a series of short essays in which I will set out an approach to test design, preparation and execution that involves testers earlier, increases their influence in projects, improves baseline documents and stability, reduces rework and increases the quality of system and acceptance testing. The approach needs automated support and the architecture for the next generation of test management tools will be proposed. I hope that doesn’t sound too good to be true and that you’ll bear with me.

Some scene-setting needs to be done...

In this series, I’m focusing on contexts (in system or acceptance testing) where scripted tests are a required deliverable and will provide the instructions in the form of scripts, procedures (or program code) to execute tests. In this opening essay, I’d like to explore why the usual approach to building test scripts (promoted in most textbooks and certification schemes) wastes time, undermines their effectiveness and limits the influence of testers in projects. These problems are well-known.

There are two common approaches to building scripted tests:

  1. Create test scripts (manual or automated) directly from a baseline (requirements or other specification documents). The scripts provide all the information required to execute a test in isolation.
  2. Create tabulated test cases (combinations of preconditions, data inputs, outputs, expected results) from the baseline and an associated procedure to be used to execute each test case in turn.
By and large, the first approach is very wasteful and inflexible, and the tests themselves might not be viable anyway. The second approach is much better and is used to create so-called ‘data-driven’ manual (and automated) test regimes; a minimal sketch of this form appears after the list below. (Separating procedure from data in software and tests is generally a good thing!) But both of these approaches make two critical assumptions:
  • The baseline document(s) provide all the information required to extract a set of executable instructions for the conduct of a test.
  • The baseline is stable: changing requirements and designs make for a very painful test development and maintenance experience; most test script development takes place late in the development cycle.
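
Here is that sketch of the second, data-driven approach: a single test procedure driven by a table of cases. It assumes Python and pytest; the withdraw function and the tabulated cases are hypothetical, purely to show the separation of procedure from data.

```python
# Data-driven testing: the procedure is written once and the tabulated test
# cases (inputs and expected results) are held as data. The function under
# test and the cases themselves are hypothetical.
import pytest

def withdraw(balance, amount):
    """Hypothetical function under test."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    if amount > balance:
        raise ValueError("insufficient funds")
    return balance - amount

# Tabulated cases: (starting balance, amount requested, expected outcome)
CASES = [
    (100, 30, 70),            # normal withdrawal
    (100, 100, 0),            # withdraw everything
    (100, 101, ValueError),   # more than the balance
    (100, 0, ValueError),     # zero amount rejected
]

@pytest.mark.parametrize("balance,amount,expected", CASES)
def test_withdraw(balance, amount, expected):
    # One procedure executes every tabulated case in turn.
    if isinstance(expected, type) and issubclass(expected, Exception):
        with pytest.raises(expected):
            withdraw(balance, amount)
    else:
        assert withdraw(balance, amount) == expected
```
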
In theory, a long-term, document-intensive project with formal reviews, stages and sign-offs could deliver stable, accurate baselines providing all the information that system-level testers require. But few such projects deliver what their stakeholders want because stakeholder needs change over time and bureaucratic projects and processes cannot respond to change fast enough (or at all). So, in practice, neither assumption is safe. The full information required to construct an executable test script is not usually available until the system is actually delivered and testers can see how things really work. The baseline is rarely stable anyway: stakeholders learn more about the problem to be solved and the solution design evolves over time, so ‘stability’, if ever achieved, is very late in arriving. The usual response is to bring the testers onto the project team at a very late stage.

What are the consequences?

  • The baselines are a ‘done deal’. Requirements are fixed and cannot be changed. They are not testable because no one has tried to use them to create tests. The most significant early deliverables of a project may not themselves have been tested.
  • Testers have little or no involvement in the requirements process. The defects that testers find in documents are ignored (“we’ve moved on – we’re not using that document anymore”).
  • There is insufficient detail in baselines to construct tests, so testers have to get the information they need from stakeholders, users and developers any which way they can. (Needless to say, there is insufficient detail to build the software at all! But developers at least get a head start on testers in this respect.) The knowledge obtained from these sources may conflict, causing even more problems for the tester.
  • The scripts fail in their stated objective: to provide sufficient information to delegate execution to an independent tester, outsourced organization or to an automated tool. These scripts need intelligence and varying degrees of system and business domain knowledge to be usable.
  • The baselines do not match the delivered system. Typically, the system design and implementation has evolved away from the fixed requirements. The requirements have not been maintained as users and developers focus on delivery. Developers rely on meetings, conversations and email messages for their knowledge.
  • When the time comes for test execution:
    1. The testers who created the scripts have to support the people running them (eliminating the supposed cost-savings of delegation or outsourcing).
    2. The testers run the tests themselves (but they don’t need the scripts, so how much of the effort to create those scripts was wasted?).
    3. The scripts are inaccurate, so paper copies are marked up and corrected retrospectively to cover the backs of management.
    4. Automated tests won’t run at all without adjustment. In fixing the scripts, are some legitimate test failures eliminated and lost? No one knows.
When testers arrive on a project late they are under-informed and misinformed. They are isolated in their own projects. Their sources of knowledge are unreliable: the baseline documents are not trustworthy. Sources of knowledge may be uncooperative: “the team is too busy to talk to you – go away!”

Does this sound familiar to you?

That’s the scene set. In the next essay, I’ll set out a different vision.

Tags: #Essaysontestdesign


First published 06/11/2009

Low product quality may be associated with poor development activities, but most organisations identify lack of testing or low testing effectiveness as the culprit. The choice is clear: hire more testers, improve the way we do our testing or get someone else to do it.

Hiring more testers might increase the amount of testing, but unless the new testers are particularly capable, the productivity of test teams may be unchanged. On the same product, 'more testing' will only improve test effectiveness if a more systematic approach is adopted and techniques are used to reach the software nooks and crannies that other tests don't reach. Just like search teams, testers must 'spread out', but aimless wandering is not effective. Testers must be organised to avoid duplicating effort and leaving gaps.

If testing is currently chaotic, adding testers to a chaotic process may keep unemployed testers off the street, but doesn't improve test effectiveness much. To make significant gains in effectiveness, both testing skills and infrastructure need to be enhanced. What about tools? Can't they be used to increase the testing? For a chaotic process, tools rarely add value (they often waste testers' time). All too often, tools are only used to run the tests that are easy to automate – the very tests that didn't find errors!

What about outsourcing? (Tester body-shopping should not, strictly, be regarded as outsourced testing – that is the 'more testers' route). What is the outsourced testing service? The service definition should detail the responsibilities of the client as well as the outsourcer. Outsourcer responsibilities might include, for example: documentation of master test plans, test specifications, creation of test scripts and data, test execution, test management, test automation and so on. Client responsibilities might include: direction on test scope, business risks, business processes, technical or application consultancy, assessment of incident severity, analysis of test results and sign-off.

If the client organisation has a poor track record of directing in-house test teams, however, the outsourcing arrangement is unlikely to benefit the client. Testing may not be faster, as testers may be held up through lack of direction from business or system experts. The testing may lack focus, as testers 'guess' where they should spend their time to best effect. Tests may not address the biggest business risks; cosmetic errors may be found at the expense of leaving serious errors undetected.

Simply giving up on testing and handing it over to a third party will cost more because you have a management overhead, as well as more expensive test teams. The quality of the testing is unlikely to improve – good testers are better at producing test plans and tests, but the content is based on the quality and amount of knowledge transferred from business and technical experts. If your experts are not used to being heavily involved in the test process, tests may be produced faster, but may still be of poor quality.

This does not mean that outsourced testers can never be as effective as internal resources. The point is that unless your organisation is used to using an internal testing service, it is unlikely to get the most out of an outsourced testing service. The inevitable conclusion is that most organisations should improve their test practices before outsourcing. But what are the improvements?

We'd like to introduce the good testing customer (GTC). It sounds like this means making the job of the outsourcer easier, but does it really mean... doing the job for them?

Describing a GTC is easy. They know what they want and can articulate their need to their supplier; they understand the customer/supplier relationship and how to manage it; they know how to discern a good supplier from a bad one. One could define a good customer of any service this way.

The GTC understands the role and importance of testing in the software development and maintenance process. They recognise the purpose and different emphases of development, system and acceptance testing. Their expectations of the test process are realistic and stable. The relationship between business, technical and schedule risks and testing is visible. When development slips, the testing budget is defended; the consequences of squeezing testing are acknowledged.

These are the main issues that need to be addressed by the client organisation if the benefits of good testing are to be realised. How does an organisation become a good testing customer? In the same way any organisation improves its practices – through management and practitioner commitment, clarity of purpose and a willingness to change the way things are done.

© 1998, Paul Gerrard

Tags: #outsourcing #improvement


First published 05/11/2009

We're stood in a boat, ankle-deep in water. The cannibals are coming to kill and eat us. The testers are looking for the holes in the boat, saying – we can't push off yet, the river is full of hungry crocodiles.

The testers are saying – if we push off now, we'll all die.

The skipper is saying – if we don't push off soon, we'll all die.

It's the same with software.

Tags: #ALF


First published 30/06/2011

Scale, extended timescales, logistics, geographic resource distribution, requirements/architectural/commercial complexity, and the demand for documented plans and evidence are the gestalt of larger systems development. “Large systems projects can be broken up into a number of more manageable smaller projects requiring less bureaucracy and paperwork” sounds good, but few have succeeded. Iterative approaches are the obvious way to go, but not many corporates have the vision, the skills or the patience to operate that way. Even so, session-based/exploratory testing is a component of almost all test approaches.

The disadvantages of documentation are plain to see. But there are three aspects that concern us.

  1. Projects, like life, never stand still. Documentation is never up to date or accurate, and it's a pain to maintain – so it usually isn't.
  2. Processes can be put in place to keep the requirements and all dependent documentation in perfect synchronisation. The delays caused by the required human interventions and translation processes undermine our best efforts.
  3. At the heart of projects are people. They can rely on processes and paper to save them and stop thinking. Or they can use their brains.

Number 3 is the killer of course. With the best will and processes and discipline in the world, all our sources of knowledge are fallible. It is our human ability and flexibility and, dare I say it, agility that allows us to build and test some pretty big stuff that seems to work.

Societal and corporate stupor (aka culture) conspire to make us less interested in tracking down the flaws in requirements, designs, code, builds and thinking. It is our exploratory instincts that rescue us.

Tags: #ALF


First published 26/01/2012

Peter Farrell-Vinay posted the question “Does exploratory testing mean we've stopped caring about test coverage?”on LinkedIn here: http://www.linkedin.com/groupItem?view=&gid=690977&type=member&item=88040261&qid=75dd65c0-9736-4ac5-9338-eb38766e4c46&trk=group_most_recent_rich-0-b-ttl&goback=.gde_690977_member_88040261.gmr_690977

I've replied on that forum, but I wanted to restructure some of the various thoughts expressed there to make a different case.

Do exploratory testers care about coverage? If they don't think and care about coverage, they absolutely should.

All test design is based on models

I’ve said this before: http://testaxioms.com/?q=node/11 Testing is a process in which we create mental models of the environment, the system, human nature, and the tests themselves. Test design is the process by which we select, from the infinite number possible, the tests that we believe will be most valuable to us and our stakeholders. Our test model helps us to select tests in a systematic way. Test models are fundamental to testing - however performed. A test model might be a checklist or set of criteria; it could be a diagram derived from a design document or an analysis of narrative text. Many test models are never committed to paper – they can be mental models constructed specifically to guide the tester whilst they explore the system under test. From the tester’s point of view, a model helps us to recognise particular aspects of the system that could be the object of a test. The model focuses attention on areas of the system that are of interest. But, models almost always over-simplify the situation.

All models are wrong, some models are useful

This maxim is attributed to the statistician George Box. But it absolutely applies in our situation. Here’s the rub with all models – an example will help. A state diagram is a model. Useful, but flawed and incomplete. It is incomplete because a real system has billions of states, not the three defined in a design document. (And the design might have a lot or little in common with the delivered system itself, by the way). So the model in the document is idealised, partial and incomplete - it is not reality. So, the formality of models does not equate to test accuracy or completeness in any way. All coverage is measured with respect to the model used to derive testable items (in this case it could be state transitions). Coverage of the test items derived from the model doesn’t usually (hardly ever?) indicate coverage of the system or technology. The skill of testing isn't mechanically following the model to derive testable items. The skill of testing is in the choice of the considered mix of various models. The choice of models ultimately determines the quality of the testing. The rest is clerical work and (most important) observation. I’ve argued elsewhere that not enough attention is paid to the selection of test models. http://gerrardconsulting.com/index.php?q=node/495

Testing needs a test coverage model or models

I’ve said this before too: http://testaxioms.com/?q=node/14 Test models allow us to identify coverage items. A coverage item is something we want to exercise in our tests. When we have planned or executed tests that cover items identified by our model we can quantify the coverage achieved as a proportion of all items on the model - as a percentage. Numeric test coverage targets are sometimes defined in standards and plans and to be compliant these targets must be met. Identifiable aspects of our test model, such as paths through workflows, transitions in state models or branches in software code can be used as the coverage items. Coverage measurement can help to make testing more 'manageable'. If we don’t have a notion of coverage, we may not be able to answer questions like, ‘what has been tested?’, ‘what has not been tested?’, ‘have we finished yet?’, ‘how many tests remain?’ This is particularly awkward for a test manager. Test models and coverage measures can be used to define quantitative or qualitative targets for test design and execution. To varying degrees, we can use such targets to plan and estimate. We can also measure progress and infer the thoroughness or completeness of the testing we have planned or executed. But we need to be very careful with any quantitative coverage measures or percentages we use.
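
As an illustration of coverage quantified against a model, here is a minimal sketch. The state transitions are hypothetical; the coverage figure is simply the proportion of the model's transitions that executed tests have exercised – it says nothing about the system beyond the model.

```python
# A sketch of coverage quantified against a formal model: the coverage items
# are the transitions of a (hypothetical) state model, and coverage is the
# proportion of those items exercised by executed tests. It measures the
# model, not the system.

# Transitions derived from the model: (from_state, event, to_state)
MODEL_TRANSITIONS = {
    ("Draft", "submit", "Review"),
    ("Review", "approve", "Published"),
    ("Review", "reject", "Draft"),
    ("Published", "withdraw", "Draft"),
}

# Transitions actually exercised, e.g. as logged by the test harness.
EXERCISED = {
    ("Draft", "submit", "Review"),
    ("Review", "approve", "Published"),
}

covered = MODEL_TRANSITIONS & EXERCISED
print(f"Transition coverage: {100 * len(covered) / len(MODEL_TRANSITIONS):.0f}%")
print("Not yet covered:", MODEL_TRANSITIONS - EXERCISED)
```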

Formal and Informal Models

Models and coverage items need not necessarily be defined by industry standards. Any model that allows coverage items to be identified can be used.

My definition is this: a Formal Model allows coverage items to be reliably identified on the model. A quantitative coverage measure can therefore be defined and used as a measurable target (if you wish).

Informal Models tend to be checklists or criteria used to brainstorm a list of coverage items or to trigger ideas for testing. These lists or criteria might be pre-defined or prepared as part of a test plan or adopted in an exploratory test session.

Informal models are different from formal models in that the derivation of the model itself is dependent on the experience, intuition and imagination of the practitioner using them so coverage using these models can never be quantified meaningfully. We can never know what ‘complete coverage’ means with respect to these models.

Needless to say, tests derived from an informal model are just as valid as tests derived from a formal model if they increase our knowledge of the behaviour or capability of our system.

Risk-based testing is an informal model approach – there is no way to limit the number of risks that can be identified. Is that bad? Of course not. It’s just that we can't define a numeric coverage target (other than ‘do some tests associated with every serious risk’). Risk identification, assessments etc. are subjective. Different people would come up with different risks, described differently, with different probabilities and consequences. Different risks would be included/omitted; some risks would be split into micro-risks or not. It's subjective. All risks aren't the same so %coverage is meaningless etc. The formality associated with risk-based approaches relates mostly to the level of ceremony and documentation and not the actual technique of identifying and assessing risks. It’s still an informal technique.

In contrast, two testers given the same state transition diagram or state table asked to derive, say, state transitions to be covered by tests, would come up with the same list of transitions. Assuming a standard presentation for state diagrams can be agreed, you have an objective model (albeit flawed, as already suggested).
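
A small sketch of why such a model is objective: given a (hypothetical) state table, the list of single transitions to cover follows mechanically from the table, so any two testers derive exactly the same coverage items.

```python
# A sketch of why a state table is a formal (objective) model: the single
# transitions to cover follow mechanically from the table. The table itself
# is hypothetical.

STATE_TABLE = {
    # current state: {event: next state}
    "Logged out": {"login": "Logged in"},
    "Logged in":  {"logout": "Logged out", "lock": "Locked"},
    "Locked":     {"unlock": "Logged in"},
}

transitions = sorted(
    (state, event, target)
    for state, events in STATE_TABLE.items()
    for event, target in events.items()
)
for t in transitions:
    print(t)   # the same four transitions, whoever derives them
```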

Coverage does not equal quality

A coverage measure (based on a formal model) may be calculated objectively, but there is no formula or law that says X coverage means Y quality or Z confidence. All coverage measures give only indirect, qualitative, subjective insights into the thoroughness or completeness of our testing. There is no meaningful relationship between coverage and the quality of systems.

So, to return to Peter's original question “Does exploratory testing mean we've stopped caring about test coverage?” Certainly not, if the tester is competent.

Is the value of testing less because informal test/coverage models are used rather than formal ones? No one can say – there is no data to support that assertion.

One 'test' of whether ANY tester is competent is to ask about their models and coverage. Most testing is performed by people who do not understand the concept of models because they were never made aware of them.

The formal/informal aspects of test models and coverage are not criteria for deciding whether planned/documented or exploratory testing is best, because planned testing can use informal models and ET can use formal models.

Ad-Hoc Test Models

Some models can be ad-hoc – here and now, for a specific purpose – invented by the tester just before or even during testing. If, while testing, a tester sees an opportunity to explore a particular aspect of a system, he might use his experience to think up some interesting situations on-the-fly. Nothing may be written down at the time, but the tester is using a mental model to generate tests and speculate how the system should behave.

When a tester sees a new screen for the first time, they might look at the fields on screen (model: test all the data fields), they might focus on the validation of numeric fields (model: boundary values), they might look at the interactions between checkboxes and their impact on other fields' visibility or outcomes (model: decision table?) or look at ways the screen could fail, e.g. extreme values, unusual combinations etc. (model: failure mode or risk-based). Whatever. There are hundreds of potential models that can be imagined for every feature of a system.
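
As a tiny illustration of one of those mental models – boundary values – here is a sketch for a hypothetical quantity field accepting whole numbers from 1 to 99; the model immediately suggests a handful of interesting values either side of each boundary.

```python
# The 'boundary values' mental model applied to a hypothetical numeric field
# that accepts 1 to 99: the model yields a short list of interesting values.

def boundary_values(lower, upper):
    """Values at and either side of each boundary of a numeric range."""
    return [lower - 1, lower, lower + 1, upper - 1, upper, upper + 1]

print(boundary_values(1, 99))   # [0, 1, 2, 98, 99, 100]
```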

The very limited number of test models associated with textual requirements are just that – limited – to the common ones taught in certification courses. Are they the best models? Who knows? There is very little evidence to say they are. Are they formal? Yes, in so far as objective definitions of the models (often called test techniques) exist. Is formal better than informal/ad-hoc? That is a cultural or value-based decision – there's little or no evidence other than anecdotal to justify the choice.

ET exists partly to allow testers to do much more testing than that limited by the common models. ET might be the only testing used in some contexts or it might be the 'extra testing on the top' of more formally planned, documented testing. That's a choice made by the project.

Certification Promotes Testing as a Clerical Activity

This ‘clerical’ view of testing is what we have become accustomed to (partly because of certification). The handed-down or ‘received wisdom’ of off-the-shelf models are useful in that they are accessible, easy to teach and mostly formal (in my definition). There were, when I last looked, 60+ different code coverage models possible in plain vanilla program languages. My guess is there are dozens associated with narrative text analysis, dozens associated with usage patterns, dozens associated with integration and messaging strategies. And for every formal design model in say, UML, there are probably 3-5 associated test models – for EACH. Certified courses give us five or six models. Most testers actually use one or two (or zero).

Are the stock techniques efficient/effective? Compared to what? They are taught mostly as a way of preparing documentation to be used as test scripts. They aren't taught as test models having more or less effectiveness or value for money, to be selected and managed. They are taught as clerical procedures. The problem with real requirements is you need half a dozen different models on each page, on each paragraph even. Few people are trained/skilled enough to prepare well-designed, documented tests. When people talk about requirements coverage it's as sophisticated as saying we have a test that someone thinks relates to something mentioned in that requirement. Hey – that's subjective again – subjective, not very effective and also very expensive.

With Freedom of Model Choice Comes Responsibility

A key aspect of exploratory testing is that you should not be constrained but should be allowed and encouraged to choose models that align with the task in hand so that they are more direct, appropriate and relevant. But the ‘freedom of model choice’ applies to all testing, not just exploratory, because at one level, all testing is exploratory (http://gerrardconsulting.com/index.php?q=node/588). In future, testers need to be granted the freedom of choice of test models but for this to work, testers must hone their modelling skills. With freedom comes responsibility. Given freedom to choose, testers need to make informed choices of model that are relevant to the goals of their testing stakeholders. It seems to me that the testers who will come through the turbulent times ahead are those who step up to that responsibility.

Sections of the text in this post are lifted from the pocketbook http://testers-pocketbook.com

Tags: #model #testmodel #exploratorytesting #ET #coverage


First published 05/11/2009

This year, I was asked to present two talks on 'Past, Present and Future of Testing' at IBC Euroforum in Stockholm and 'Future of Testing' at SQC London. I thought it would be a good idea to write some notes on the 'predictions' as I'm joining some esteemed colleagues at a retreat this weekend, and we talk about futures much of the time. These notes are notes. This isn't a formal paper or article. Please regard them as PowerPoint speaker notes, not more. They don't read particularly well and don't tell a story, but they do summarise the ideas presented at the two talks.

Registered users can download the paper from the link below. If you aren't registered, you can register here.

Tags: #futures


First published 08/01/2010

Sarang Kulkarni posted an interesting question on the LinkedIn “Senior Testing Professionals” discussion forum.

It's a question that has been posed endlessly by people funding testing, and every tester has worried about the answer. I can't tell you how many discussions I've been involved in have revolved around this question. It's a fair question and it's a hard one to answer. OK – why is it so hard to answer? Received wisdom has nothing to say except that the quantity of testing is good (in some way) and that thoroughness (by some mysterious measure) is more likely to improve quality. Unfortunately, testers do not usually write or change software – only developers have an influence over quality. All in all, the quality of testing has the most indirect relationship to quality. Measure performance? Forget it.

My response is based on a different view of what testing is for. Testing isn't about finding bugs so others can fix them. That's like saying literary criticism is about finding typos, or battlefield medicine is about finding bullet holes in people, or banking is about counting money. Not quite.

Testing exists to collect information about a system's behaviour (on the analyst's drawing board, as components, as a usable system or as an integrated whole), to calibrate that in some (usually subjective) way against someone else's expectations and to communicate it to stakeholders. It's as simple and as complicated as that.

Simple because testing exists to collect and communicate information for others to make a decision. More complicated because virtually everything in software, systems, organisation and culture blocks this most basic objective. But hey, that's what makes a tester's life interesting.

If our role as testers is to collect and disseminate information for others to make decisions, then it must be those decision makers who judge the completeness and quality of our work – i.e. our performance. Who else can make that judgement – and judgement it must be, because there are no metrics that can reasonably be used to evaluate our performance.

The problem is, our 'performance' is influenced by the quality (good or bad) of the systems we test, the ease with which we can obtain behavioural information, the subjective view of the depth of the testing we do, the criticality of the systems we test and the pressures on, mentality of and even frame of mind of the people we test on behalf of.

What meaning could be assigned to any measure one cares to use? Silly.

Performance shamformance. What's the difference?

The best we can do is ask our stakeholders – the people we test on behalf of – what do they think we are doing and how well are we doing it? Subjective, yes. Qualitative, yes. Helpful – yes. If...

The challenge to testers is to get stakeholders to articulate what exactly they want from us before we test and then to give us their best assessment of how we meet those objectives. Anything else is mere fluff.

Tags: #ALF
