Paul Gerrard

My experiences and opinions from the Test Engineering business. I am republishing/rewriting old blogs from time to time.

First published 06/12/2009

Are you ever asked as a tester, “is the system good enough to ship?” Given our normal experience, where we are never given enough time to do the testing, the system cannot be as good as it should be. When the time comes to make the release decision, how could you answer that question? James Bach introduced the idea of ‘Good Enough’ in 1997 (Bach, 1997). It is helpful in understanding the risk-based test approach, as it seems to hold water as a framework for release decision-making, at least in projects where risks are being taken. So, what is “Good Enough”, and how does it help with the release decision?

Many consultants advocate ‘best practices’ in books and conferences. Usually, they preach perfection and they ask leading questions like, “would you like to improve your processes?”, “do you want zero defects?” Could anyone possibly say “no” to these questions? Of course not. Many consultants promote their services using this method of preaching perfection and pushing mantras that sound good. It’s almost impossible to reject them.

Good enough is a reaction to this compulsive formalism, as it is called. It’s not reasonable to aim at zero defects in software, and your users and customers never expect perfection, so why pretend that you are aiming at it? The zero-defect attitude just doesn’t help. Compromise is inevitable and you always know it’s coming. The challenge ahead is to make a release decision for an imperfect system based on imperfect information.

The definition of “Good Enough” in the context of a system to be released is:

  1. X has sufficient benefits.
  2. X has no critical problems.
  3. The benefits of X sufficiently outweigh the problems.
  4. In the present situation, and all things considered, improving X would cause more harm than good.
  5. All the above must apply.
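
Expressed as a minimal sketch (Python, purely illustrative – the field names and example values are mine, not part of the original definition), the four conditions reduce to a single judgement in which every one must hold:

from dataclasses import dataclass

@dataclass
class GoodEnoughAssessment:
    """Illustrative record of the four 'Good Enough' conditions for a release candidate X."""
    sufficient_benefits: bool       # enough of the system works to deliver value in production
    no_critical_problems: bool      # no known faults that make it unusable or unacceptable
    benefits_outweigh_problems: bool
    improving_now_costs_more: bool  # further polishing would cost more than shipping with known problems

    def is_good_enough(self) -> bool:
        # All four conditions must apply; any single failure means X is not good enough.
        return (self.sufficient_benefits
                and self.no_critical_problems
                and self.benefits_outweigh_problems
                and self.improving_now_costs_more)

# Example: a candidate with an outstanding critical problem is not good enough.
candidate = GoodEnoughAssessment(True, False, True, True)
print(candidate.is_good_enough())  # False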

To expand on this rather terse definition: ‘X has sufficient benefits’ means that enough of the system (whatever X is) is deemed to work for us to take it into production, use it, get value and realise the benefit. ‘No critical problems’ means there are no severe faults that make it unusable or unacceptable. And at this moment in time, all things considered, the time spent trying to perfect X would probably cost us more than shipping early with the known problems. This framework allows us to release an imperfect system early because the benefits may be worth it. How does testing fit into this good enough idea?

Firstly, have sufficient benefits been delivered? The tests that we execute must at least demonstrate that the features providing the benefits are delivered completely, so that we have evidence of this. Secondly, are there any critical problems? Our incident reports give us the evidence of the critical problems and many others too. There should be no critical problems for it to be good enough. Thirdly, is our testing good enough to support this decision? Have we provided sufficient evidence to say these risks are addressed and those benefits are available for release?

It is not for a tester to decide whether the system is good enough. An analogy that might help here is to view the tester as an expert witness in a court of law. The main players in this courtroom scene are:

  • The accused (the system under test).
  • The judge (project manager).
  • The jury (the stakeholders).
  • Expert witness (the tester).

In our simple analogy, we will disregard the lawyers’ role. (In principle, they act only to extract evidence from witnesses). Expert witnesses are brought into a court of law to find evidence and present that evidence in a form for laymen (the jury) to understand. When asked to present evidence, the expert is objective and detached. If asked whether the evidence points to guilt or innocence, the expert explains what inferences could be made based on the evidence, but refuses to judge innocence or guilt. In the same way, the software tester might simply state that based on evidence “these features work, these features do not work, these risks have been addressed, these risks remain”. It is for others to judge whether this makes a system acceptable.

The tester simply provides information for the stakeholders to make a decision. Adopting this position in a project seems a reasonable one to take. After all, testers do not create software or software faults; testers do not take the risks of accepting a system into production. Testers should advocate this independent point of view to their management and peers. When asked to judge whether a system is good enough, the tester might say that, on the evidence we have obtained, these benefits are available and these risks still exist. The release decision is someone else’s to make.

However, you know that the big question is coming your way, so when you are asked, “is it ready?”, what should you do? You must help the stakeholders make the decision, but not make it for them. The risks identified months ago – those problems which, in your opinion, would make the system unacceptable – might still exist. On the stakeholders’ own criteria, the system cannot now be acceptable unless they relax their perception of the risk. The judgement on outstanding risks must be as follows:

  • There is enough test evidence now to judge that certain risks have been addressed.
  • There is evidence that some features do not work (the feared risk has materialised).
  • Some risks remain (tests have not been run, or no tests are planned).

This might seem like an ideal independent position for testers to take, and you might think it unrealistic to expect anyone to behave this way. However, we believe this stance is unassailable, since the alternative, effectively, is for the tester to take over the decision-making in a project. You may still be forced to give an opinion on the readiness of a system, but we believe taking this principled position (at least at first) might raise your profile and credibility with management. They might also come to recognise your role in future projects – as an honest broker.

References: Bach, J. (1997), ‘Good Enough Quality: Beyond the Buzzword’, IEEE Computer, August 1997, pp. 96–98.

Paul Gerrard, July 2001



Tags: #risk-basedtesting #good-enough


First published 06/11/2009

If we believe the computer press, the E-Business revolution is here; the whole world is getting connected; that many of the small start-ups of today will become the market leaders of tomorrow; that the whole world will benefit from E-anyWordULike. The web offers a fabulous opportunity for entrepreneurs and venture capitalists to stake a claim in the new territory – E-Business. Images of the Wild West, wagons rolling, gold digging and ferocious competition over territory give the right impression of a gold rush.

Pressure to deliver quickly, using new technology, inexperienced staff, into an untested marketplace and facing uncertain risks is overwhelming. Where does all this leave the tester? In fast-moving environments, if the tester carps about the lack of requirements, software stability or integration plans, they will probably be trampled to death by the stampeding project team. In high integrity environments (where the Internet has made little impact, thankfully), testers have earned the grudging respect of their peers because the risk of failure is unacceptable and testing helps to reduce or eliminate risk. In most commercial IT environments, however, testers are still second-class citizens on the team. Is this perhaps because testers, too often, become anti-risk zealots? Could it be that testers don’t acclimatise to risky projects because we all preach ‘best practices’?

In all software projects, risks are taken. In one way, testing in high-integrity environments is easy. Every textbook process, method and technique must be used to achieve an explicit aim: to minimise risk. It’s a no-brainer. In fast-moving E-Business projects, risk taking is inevitable. Balancing testing against risk is essential because we never have the time to test everything. It’s tough to get it ‘right’. If we don’t talk to the risk-takers in their language we’ll never get the testing budget approved.

So, testers must become expert in risk. They must identify failure modes and translate these into consequences for the sponsors of the project: 'If xxx fails (and it is likely, if we don’t test), then the consequence to you, as sponsor, is...' In this way, testers, management and sponsors can reconcile the risks being taken with the testing time and effort.
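
By way of illustration only – the failure modes, likelihoods and impact figures below are invented – a simple risk register of this kind can also be used to rank candidate tests by exposure:

# Illustrative risk register: each entry maps a failure mode to its consequence for the
# sponsor, with rough likelihood (0-1) and impact (in currency) estimates. All figures
# are invented; in practice they come from discussion with the risk-takers.
risks = [
    {"failure_mode": "payment page rejects valid cards", "consequence": "lost orders",
     "likelihood": 0.3, "impact": 250_000},
    {"failure_mode": "search results slow under load", "consequence": "abandoned sessions",
     "likelihood": 0.6, "impact": 40_000},
    {"failure_mode": "report totals mis-rounded", "consequence": "manual reconciliation effort",
     "likelihood": 0.2, "impact": 5_000},
]

# Exposure = likelihood x impact gives a crude but discussable ranking; tests addressing
# the highest-exposure risks are planned (and run) first.
for risk in sorted(risks, key=lambda r: r["likelihood"] * r["impact"], reverse=True):
    exposure = risk["likelihood"] * risk["impact"]
    print(f"{risk['failure_mode']:40s} exposure approx £{exposure:,.0f} ({risk['consequence']})")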

How does this help the tester? Firstly, the decision to do more or less testing is arrived at by consensus (no longer will the tester lie awake at night thinking: 'am I doing enough testing?'). Second, the decision is made consciously by those taking the risk. Third, it makes explicit the tests that will not be done – the case for doing more testing was self-evident, but was consciously overruled by management. Fourth, it makes the risks being taken by the project visible to all.

Using risk to prioritise tests means that testers can concentrate on designing effective tests to find faults and not worry about doing ‘too little’ testing.

What happens at the end of the test phase, when time has run out and there are outstanding incidents? If every test case and incident can be traced back to a risk, the tester can say, 'at this moment, here are the risks of release'. The decision to release needn’t be an uninformed guess. It can be based on an objective assessment of the residual risk.

Adopting a risk-based approach changes the definition of ‘good’ testing. Our testing is good if it provides evidence of the benefits delivered and of the current risk of release, at an acceptable cost, in an acceptable timeframe. Our testing is good if, at any time during the test phase, we know the status of benefits, and the risk of release. No longer need we wait a year after release before we know whether our testing is perfect (or not). Who cares, one year later anyway?

In a recent E-Business project, we identified 82 product risks of concern. Fewer than 10 had anything to do with functionality. In all E-Business projects, non-functional problems such as usability, browser configuration, performance, reliability and security seem to dominate people’s concerns. We used to think of software product risks in one dimension (functionality) and concentrate on that. The number and variety of the risks in E-Business projects forces us to take a new approach.

It could be said that in the early 1990s, the tester community began to emerge and gain a voice in the computer industry. Using that voice, the language of risk will make testers effective in the ambitious projects coming in the next millennium.

Paul Gerrard, February 2000

Tags: #risk #e-businesstesting #language


First published 21/02/2008

Teacher: Paul, make a sentence starting with the letter I.

Paul: I is...

Teacher: No, no, no, don't say "I is", you say "I am".

Paul: OK, I am the ninth letter of the alphabet.


This blog is my response to James Bach's comments on his blog to my postings on testing axioms, "Does a set of irrefutable test axioms exist?" and "The 12 Axioms of Testing". There are a lot of comments – all interesting – but many need a separate response. So, read the following as if it were a conversation – it might make more sense.

PG: = Paul (my responses).
Text in the standard font (not highlighted) = James.



Here we go... James writes...

Paul Gerrard believes there are irrefutable testing axioms.

PG: I'm not sure I do or I don't. My previous blog asks whether there could be such axioms. This is just an interesting thought experiment. Interesting for me anyway. ;–)
This is not surprising, since all axioms are by definition irrefutable.

PG: Agreed – "irrefutable axioms" is tautological. I changed my blog title quickly – you probably got the first version, I didn't amend the other blog posting. Irrefutable is the main word in that title so I'll leave it as it is.
To call something an axiom is to say you will cover your ears and hum whenever someone calls that principle into question.

PG: It's an experiment, James. I'm listening and not humming.
An axiom is a fundamental assumption on which the rest of your reasoning will be based.

PG: Not all the time. If we encounter an 'exception' in daily life, and in our business we see exceptions all the damn time, we must challenge all such axioms. The axiom must explain the phenomena or be changed or abandoned. Over time, proposals gain credibility and evolve into axioms or are abandoned.
They are not universal axioms for our field.

PG: (Assume you mean "there are no") Now, that is the question I'm posing! I'm open to the possibility. I sense there's a good one.
Instead they are articles of Paul’s philosophy.

PG: Nope – I'm undecided. My philosophy, if I have one, is, "everything is up for grabs".
As such, I’m glad to see them. I wish more testing authors would put their cards on the table that way.

PG: Well thanks (thinks... damned with faint praise ;–) ).
I think what Paul means is not that his axioms are irrefutable, but that they are necessary and sufficient as a basis for understanding what he considers to be good testing.

PG: Hmm, I hadn't quite thought of it like that but keep going. These aren't MY axioms any more than Newton's laws belonged to him – they were 'discovered'. It took me an hour to sketch them out – I've never used them in this format but I do suspect they have been in some implicit way, my guide. I hope they have been yours too. If not...
In other words, they define his school of software testing.

PG: WHAT! Pause while I get up off the floor haha. Deep breath, Paul. This is news to me, James!
They are the result of many choices Paul has made that he could have made differently. For instance, he could have treated testing as an activity rather than speaking of tests as artifacts. He went with the artifact option, which is why one of his axioms speaks of test sequencing. I don’t think in terms of test artifacts, primarily, so I don’t speak of sequencing tests, usually. Usually, I speak of chartering test sessions and focusing test attention.

PG: I didn't use the word artifact anywhere. I regard testing as an activity that produces Project Intelligence – information, knowledge, evidence, data – whatever you like – that has some value to the tester but more to the stakeholders of testing. We should think of our stakeholders before we commit to a test approach and not be dogmatic. (The stakeholder axiom). How can you not agree with that one? The sequencing axiom suggests you put most valuable/interesting/useful tests up front as you might not have time to do every test – you might be stopped at any time in fact. Test Charters and Sessions are right in line with at least half of the axioms. I do read stuff occasionally :–) Next question please!

No, these aren't the result. They are thoughts – instincts, even – that I've had for many years and have tried to articulate. I'm posing a question. Do all testers share some testing instincts? I won't be convinced that my proposed axioms are anywhere close until they've been tested and perfected through experience. I took some care to consider the 'school'.
Sometimes people complain that declaring a school of testing fragments the craft. But I think the craft is already fragmented, and we should explore and understand the various philosophies that are out there. Paul’s proposed axioms seem a pretty fair representation of what I sometimes call the Chapel Hill School, since the Chapel Hill Symposium in 1972 was the organizing moment for many of those ideas, perhaps all of them. The book Program Test Methods, by Bill Hetzel, was the first book dedicated to testing. It came out of that symposium.

PG: Hmm. This worries me a lot. I am not a 'school' thank-you very much. Too many schools push dogma, demand obedience to school rules and mark people for life. They put up barriers to entry and exit and require members to sing the same school song. No thanks. I'm not a school.

It reminds me of Groucho Marx. "I wouldn't want to join any club that would have me as a member."
The Chapel Hill School is usually called “traditional testing”, but it’s important to understand that this tradition was not well established before 1972. Jerry Weinberg’s writings on testing, in his authoritative 1961 textbook on programming, presented a more flexible view. I think the Chapel Hill school has not achieved its vision; it was largely in dissatisfaction with it that the Context-Driven school was created.

PG: In my questioning post, I used 'old school' and 'new school' just to label one obvious choice – pre-meditated v contemporaneous design and execution to illustrate that axioms should support or allow both – as both are appropriate in different contexts. I could have used school v no-school or structured v ad-hoc or ... well anything you like. This is a distraction.

But I am confused. You call the CH symposium a school and label that "traditional". What did the symposium of 1972 call themselves? Traditional? A school? I'm sure they didn't wake up the day after thinking "we are a school" and "we are traditional". How do those labels help the discussion? In this context, I can't figure out whether 'school' is a good thing or bad. I only know one group who call themselves a school. I think 'brand' is a better label.
One of his axioms is “5. The Coverage Axiom: You must have a mechanism to define a target for the quantity of testing, measure progress towards that goal and assess the thoroughness in a quantifiable way.” This is not an axiom for me. I rarely quantify coverage. I think quantification that is not grounded in measurement theory is no better than using numerology or star signs to run your projects. I generally use narrative and qualitative assessment, instead.

PG: Good point. The words 'quantity' and 'quantifiable' imply numeric measurement – that wasn't my intention. Do you have a form of words I should use that would encompass quantitative and qualitative assessment? I think I could suggest "You must have a means of evaluating, narratively, qualitatively or quantitatively, the testing you plan to do or have done". When someone asks how much testing you plan to do, have done or have left to do, I think we should be able to provide answers. "I don't know" is not a good answer – if you want to stay hired.
For you context-driven hounds out there

PG: Sir, Yes Sir! ;–)
practice your art by picking one of his axioms and showing how it is possible to have good testing, in some context, while rejecting that principle. Post your analysis as a comment to this blog, if you want.

PG: Yes please!
In any social activity (as opposed to a mathematical or physical system), any attempt to say “this is what it must be” boils down to a question of values or definitions. The Context-Driven community declared our values with our seven principles. But we don’t call our principles irrefutable. We simply say here is one school of thought, and we like it better than any other, for the moment.

PG: I don't think I'm saying "this is what it must be" at all. What is "it", what is "must be"? I'm asking testers to consider the proposal and ask whether they agree if it has some value as a guide to choosing their actions. I'm not particularly religious but I think "murder is wrong". The fact that I don't use the ten commandments from day to day does not mean that I don't see value in them as a set of guiding principles for Christians. Every religion has their own set of principles, but I don't think many would argue murder is acceptable. So even religions are able to find some common ground. In this analogy, school=religion. Why can't we find common ground between schools of thought?

I'm extremely happy to amend, remove or add to the axioms as folk comment. Either all my suggestions will be completely shot down or some might be left standing. I'm up for trying. I firmly believe that there are some things all testers could agree on no matter how abstract. Are they axioms? Are they motherhood and apple pie? Let's find out. These abstractions could have some value other than just as debating points. But let's have that debate.

By the way – my only condition in all this is that you use the blog the proposed axioms appear on. If you want to defend the proposed axioms – be my guest.

Thanks for giving this some thought – I appreciate it.


Tags: #School'sOut!


First published 03/12/2009

This paper gives general guidance about selecting and evaluating commercial CAST tools. It is intended to provide a starting point for tool assessment. Although it is not as detailed or specific as a tailored report prepared by a consultant, it should enable you to plan the basics of your own tool selection and evaluation process.

It is easy to make the mistake of considering only tool function, and not the other success factors related to the organisation where the tool will be used; this is one reason that expensive tools end up being unused only a few months after purchase. Following the advice given in this paper should help you to avoid some of these problems.

There are a surprising number of steps in the tool selection process. The following diagram is an overview of the whole process.


Overview of the selection process

Where to start

You are probably reading this report because you want to make your testing process more efficient through the use of a software testing tool. However, there is a wide variety of tools available; which one(s) should you buy?

There are a number of different types of testing tools on the market, and they serve different purposes. Buying a capture/replay tool will not help you measure test coverage; a static analysis tool will not help in repeating regression tests.

You also need to consider the environment where the testing tool will be used: a mainframe tool is no use if you only have PCs.

The skills of the people using the tool also need to be taken into account; if a test execution tool requires programming skills to write test scripts, it would not be appropriate for use by end-users only.

These considerations are critical to the successful use of the tool; if the wrong tool is selected, whether it is the wrong tool for the job, the environment or the users, the benefits will not be achieved.

The tool selection and evaluation team

Someone should be given the responsibility for managing the selection and evaluation process. Generally a single individual would be authorised to investigate what tools are available and prepare a shortlist, although there could be several people involved. The researchers need to have a reasonable idea of what type of tool is needed, who within the organisation would be interested in using it, and what the most important factors are for a tool to qualify for the shortlist.

After the shortlist is prepared, however, it is wise to involve a number of people in the evaluation process. The evaluation team should include a representative from each group planning to use the tool, and someone of each type of job function who would be using it. For example, if non-technical end-users will be using a test execution tool to run user acceptance tests, then a tool that needs programming skills would be excluded and an end-user should be on the evaluation team. The usability of a tool has a significant effect on whether the tool becomes accepted as part of the testing process.

If the evaluation team becomes involved in a trial scheme, the intended user must make the recommendation as to the tool’s usability. However, the team may need to have access to a systems support resource. The role of the support person is to assist in overcoming technical problems which appear important to the user or developer but are actually quite easily overcome by a technician.

The selection and evaluation team may also go on to become the implementation team, but not necessarily.

What are the problems to be solved?

The starting point for tool selection is identifying what problem needs to be solved, where a tool might provide a solution. Some examples are:

  • tests which are currently run manually are labour-intensive, boring and lengthy
  • tests which are run manually are subject to inconsistencies, owing to human error in inputting tests
  • we need to know the completeness or thoroughness of our tests; the amount of software exercised by a test suite is difficult or impossible to measure manually
  • paper records of tests are cumbersome to maintain, leading to tests being repeated or omitted
  • when small changes are made to the software, extensive regression tests must be repeated and there is not time to do it manually
  • setting up test data or test cases is repetitive and ‘mechanical’: the testers find it uninteresting and make too many ‘simple’ errors
  • comparison of test results is tedious and error-prone
  • errors are found during testing which could have been detected before any tests were run by examining the code carefully enough
  • users find too many errors that could have been found by testing.

The current testing process must be sufficiently well-defined that it is easy to see the areas where automated improvement would actually help.

Is a tool the right solution?

Software testing tools are one way to approach these types of problems, but are not the only way. For example, code inspection could be used to address the problem of detecting errors before test execution. Better organisation of test documentation and better test management procedures would address the problem of omitting or repeating tests. Considering whether all the repetitive ‘mechanical’ test cases are really necessary may be more important for test effectiveness and efficiency than blindly automating them. The use of a testing tool will not help in finding more errors in testing unless the test design process is improved, which is done by training, not by automation.

An automated solution often ‘looks better’ and may be easier to authorise expenditure for than addressing the more fundamental problems of the testing process itself. It is important to realise that the tool will not correct a poor process without additional attention being paid to it. It is possible to improve testing practices alongside implementing the tool, but it does require conscious effort.

However, we will assume that you have decided, upon rational consideration of your own current situation (possibly with some tool readiness assessment advice from an outside organisation), that a testing tool is the solution you will be going for.

How much help should the tool be?

Once you have identified the area of testing you want the tool to help you with, how will you be able to tell whether any tool you buy has actually helped? You could just buy one and see if everyone feels good about it, but this is not the most rational approach. A better way is to define measurable criteria for success for the tool. For example, if the length of time taken to run tests manually is the problem, how much quicker should the tests be run using a tool?

Setting measurable criteria is not difficult to do, at least to obtain a broad general idea of costs. A general idea is all that is necessary to know whether the tool will be cost-justified. A realistic measurable criterion for a test execution tool might be set out as follows:

Manual execution of tests currently takes 4 man-weeks. In the first 3 months of using the tool, 50–60 per cent of these tests should be automated, with the whole test suite run in 2–2½ man-weeks. Next year at this time we aim to have 80 per cent of the tests automated, with the equivalent test suite being run in 4 man-days.

An approach to measuring the potential savings of a test coverage tool might be:

We currently believe that our test cases ‘completely test’ our programs, but have no way of measuring coverage. As an experiment on a previously tested (but unreleased) program, rerun the dynamic tests using a coverage measurement tool. We will almost certainly find that our tests reached less than 100 per cent coverage. Based on the tool’s report of the unexecuted code we can devise and run additional test cases. If these additional test cases discover errors – serious ones deemed likely to have appeared sometime in the future in live running – then the tool would make a saving by detecting those errors during testing, which is less costly than in live running. The potential saving is the difference between the cost of an error found in live running (say £5,000 for a modest error that must be corrected) and the cost of an error found during testing (say £500).

A similar approach could be applied to determining whether a static analysis tool is worth using:

Run the static analysis tool on a group of programs that have been through dynamic testing but are not yet released. Evaluate (in the opinion of a senior analyst) the cost of the genuine errors detected. For those errors which were also found in the original dynamic testing, the static analyser might not save the cost of running the dynamic test (because you do not design your dynamic tests assuming that errors have been found previously), but it might save the cost of all the diagnosis and rerunning that dynamic testing entails. More interesting is the cost of those errors which were only detected by static test but which would have caused a problem in live running. The cost of detecting the error through static analysis is likely to be one-hundredth the cost of finding it in live running.

When looking at the measurable benefits it is best to be fairly conservative about what could be accomplished. When a tool is used for the first time it always takes much longer than when people are experienced in using it, so the learning curve must be taken into account. It is important to set realistic goals, and not to expect miracles. It is also important that the tool is used correctly, otherwise the benefits may not be obtained.

If you find that people are prepared to argue about the specific numbers which you have put down, ask them to supply you with more accurate figures which will give a better-quality evaluation. Do not spend a great deal of time ‘polishing’ your estimates: the tool evaluation process should be only as long as is needed to come to a decision, and no longer. Your estimates should reflect this granularity.

How much is this help worth?

The measurable criteria that you have identified as achievable will have a value to your organisation; it is important to quantify this value in order to compare the cost of the tool with the cost saved by the benefits. One of the simplest ways to quantify the benefits is to measure the saving of time and multiply that by approximate staff costs.

For example, if regression tests which normally take 4 man-weeks manually can be done in 2 man-weeks we will save 2 man-weeks of effort whenever those tests are run. If they are run once a quarter, we will save 8 man-weeks a year. If they are run once a month, we will save 24 man-weeks a year. (If they are only run once a year we will only save 2 man-weeks in that year.) If a man-week is costed at say, £2,000, we will save respectively £16,000, £48,000 or £4,000.
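
The arithmetic is trivial, but worth writing down so the assumptions are visible. A minimal sketch, using the figures from the example above (purely illustrative):

def annual_saving(manual_weeks: float, automated_weeks: float,
                  runs_per_year: int, cost_per_week: float = 2_000) -> float:
    """Man-week saving per run, multiplied by run frequency and weekly cost."""
    return (manual_weeks - automated_weeks) * runs_per_year * cost_per_week

# The example from the text: 4 man-weeks manual, 2 man-weeks with the tool, £2,000/man-week.
for runs, label in [(4, "quarterly"), (12, "monthly"), (1, "annually")]:
    print(f"Run {label:9s}: £{annual_saving(4, 2, runs):,.0f} saved per year")
# quarterly -> £16,000, monthly -> £48,000, annually -> £4,000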

These savings can be realised by redeploying people onto more productive work: development, enhancements or better test design.

There will also be other benefits, which may be very difficult if not impossible to quantify but which should also be mentioned. The risk of an embarrassing public release may be reduced, for example, but it may not be possible to put a monetary value on this. Morale is likely to improve, which is likely to result in an increase in productivity, but it may not be possible or desirable to separate this from the productivity increase from using the tool. There may be some things that are not even possible to do manually, which will not be discovered until the tool has been in use; these unanticipated benefits cannot be quantified because no one realised them at the time.

Of course this is a very simplistic start to building a proper business case for the tool, but it is essential that some first attempt is made to quantify the benefits, otherwise you will not be able to learn from your tool evaluation experience for next time.

Tool Requirements

What tool features are needed to meet requirements?

The next step is to begin to familiarise yourself with the general capabilities of tools of the type you want.

Which of the features listed are the most important ones to meet the needs and objectives for the tool in your current situation? For example, if you want to improve the accuracy of test results comparison, a capture/replay tool without a comparator would not help.

Make a list of the features, classified as ‘essential’, ‘desirable’ and ‘don’t matter’. The essential features list would rule out any tool which did not provide all of the things on that list, and the desirable features would be used to discriminate among those tools that provide all the essential features.
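
As a sketch of how such a list might then be applied (the feature names and tool names here are hypothetical), the essential list rules tools out and the desirable list ranks whatever survives:

# Hypothetical feature lists; 'essential' rules tools out, 'desirable' discriminates.
essential = {"capture/replay", "results comparator", "runs on our OS"}
desirable = {"data-driven scripts", "unattended overnight runs", "test log export"}

tools = {
    "Tool A": {"capture/replay", "results comparator", "runs on our OS", "test log export"},
    "Tool B": {"capture/replay", "runs on our OS", "data-driven scripts"},  # no comparator
    "Tool C": {"capture/replay", "results comparator", "runs on our OS",
               "data-driven scripts", "unattended overnight runs"},
}

# Keep only tools offering every essential feature, then rank by desirable-feature count.
shortlist = {name: feats for name, feats in tools.items() if essential <= feats}
for name, feats in sorted(shortlist.items(), key=lambda kv: len(kv[1] & desirable), reverse=True):
    print(f"{name}: {len(feats & desirable)} desirable feature(s)")
# Tool B is excluded (missing the comparator); Tool C ranks above Tool A.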

Note that your feature list will change as you progress in evaluating tools. You will almost certainly discover new features that you think are desirable as you go through the evaluation process. The tool vendors are sure to point out features they can supply but which you did not specifically request. Other tool users may recommend a feature as essential because of their experience, which you may not have thought was so important. For example, you may not consider the importance of being able to update your test scripts whenever the software changes because you are concentrating on the use of the execution tool for capturing tests the first time. However, this may be a significant ‘running cost’ for using the testing tool in the future. It is also possible that a feature that you thought was desirable is not required owing to the way other features are implemented.

As well as the functions that the tool performs, it is important to include some grading of usability as a feature for evaluation. Tools that have sound technical features, but are difficult to use, frequently become shelfware.

What are the constraints?

Environmental constraints

Testing tools are software packages and therefore may be specific to particular hardware, software or operating systems. You would not want a tool that runs only on a VAX VMS system if you have an IBM MVS system and no possibility of acquiring or using anything else.

Most people look for a tool which will run on the environment in which they are developing or maintaining software, but that is not the only possibility. A number of tools can run on a PC, for example, and can execute tests running on a different computer. Even debug and coverage measurement tools can work in a ‘host–target’ or client–server environment.

Having to acquire additional hardware is sometimes more of a psychological barrier than a technical or economic one. In your tool selection process, especially if there are not many tools available for your ‘home’ environment, it is worth considering tools based on a separate environment.

However, you may need to acquire extra hardware even for a tool that runs on your own current environment, for example extra disk space to store test scripts.

Make sure that you find out exactly what the tool requires in terms of hardware and software versions. For example, you would not want to discover at installation time that you needed to have an operating system upgrade or additional memory before the tool can work. Have you considered security aspects? Do you need a separate language compiler for the test scripts?

Commercial supplier constraints

The company that you buy the tool from will be an important factor for your future testing practices. If you have problems with the tool, you will want them sorted out quickly and competently. If you want to get the best from the tool, you will want to take advantage of their expertise. You may want to influence the future development of the tool to provide for those needs which are not currently met by it.

There are a number of factors that you should take into consideration in evaluating the tool vendor’s organisation:

  • Is the supplier a bona fide company?
  • How mature are the company and the product? If the company is well established this gives confidence, but if the product has not changed significantly in recent years it may be getting rather out of date. Some organisations will feel that they need to buy products from the product vendor who sets the trend in the marketplace. Some organisations will be wary of new product companies, but there may be instances when a brand new organisation or product may be an unknown quantity but may be just what you need at just the right time. A new vendor may be much more eager to please their first customers;
  • Is there adequate technical support? What would their response be to major or minor problems? Does the vendor run a help desk? What hours is help available? (If your vendor is in California and you are in Europe, there will be no overlap of their working day with yours!) What training courses are provided? How responsive are they to requests for information?
  • How many other people have purchased or use this tool? You may or may not want to be the very first commercial user of a new tool. Can you talk to any other users? Is there a user group, and when does it meet and who controls it? Will they provide a reference site for you to talk to?
  • What is the tool’s history? Was it developed to support good internal testing practices, to meet a specific client need, or as a speculative product? How many releases have there been to date, and how often is the tool updated? How many open faults are there currently reported?

Your relationship with the tool vendor starts during the selection and evaluation phase. If there are problems with the vendor now (when they want your money), there are likely to be even more problems later.

Cost constraints

Cost is often the most stringent and most visible constraint on tool selection. The purchase price may only be a small factor in the total cost to the organisation in fully implementing the tool. Cost factors include:

  • purchase or lease price
  • cost basis (per seat, per computer etc.)
  • cost of training in the use of the tool
  • any additional hardware needed (e.g. a PC, additional disk space or memory)
  • support costs
  • any additional costs, e.g. consultancy to ensure the tool is used in the best way.

Other constraints

Tool quality factors may include:

  • How many people can use the tool at the same time? Can test scripts be shared?
  • What skill level is needed to use the tool? How long does it take to become proficient? Are programming skills needed to write test scripts?
  • What documentation is supplied? How thorough is it? How usable is it? Are there ‘quick reference guides’, for example?

There may well be other constraints which override all of the others, for example ‘political’ factors, such as having to buy the same tool that the parent company uses (e.g. an American parent enforces an American tool as its standard), or a restriction against buying anything other than a locally supported tool, perhaps limiting the choice to a European or British, French or German tool. It is frustrating to tool selectors to discover these factors late in the selection process.

Constructing the shortlist

Use the cross-references in this report to find the tools that meet your environmental requirements and provide the features that are essential for you. Read the descriptions in the tools pages. This should give you enough information to know which tools listed in this report can go on your shortlist for further evaluation.

If there are more than six or seven tools that are suitable for you, you may want to do some initial filtering using your list of desirable features so that you will be looking at only three or four tools in your selection process.

If no suitable tools, or not enough of them, are found in this report, the search could be widened to other countries (e.g. the USA).

Other sources of information include pre-commercial tools (if you can find out about them). It is worth asking your current hardware or system software supplier if they know of any tools that meet your needs. If you are already using a CASE tool, it would be worth asking your vendor about support for testing, either through future development of their tool or by linking to an existing CAST tool. Conferences and exhibitions are where new vendors often go to announce a new tool. In particular the EuroSTAR conference is the prime showcase for testing tools.

The possibility of in-house development of a special-purpose tool should also be assessed. Do not forget to consider any existing in-house written tools within your own organisation that may be suitable for further development to meet your needs. The true cost of in-house development, including the level of testing and support needed to provide a tool of adequate quality, will be significant. It is generally much more than the cost of a commercial tool, but an in-house written tool will be more directly suitable to your own needs. For example, it can help to compensate for a lack of testability in the software under test. A purchased tool may need additional tailoring in order to meet real needs, and this can be expensive.

Another possibility is to use a ‘meta-tool’ to develop a new tailored tool in a short time. A meta-tool provides software for building software tools quickly, using the existing foundation of a standardised but highly tailorable user interface, graphical editor and text editor. It could enable a new testing tool, tailored to a specific organisation, to be built within a few months.

Summary of where to look for tools
  • Existing environment (this report)
  • CASE tool vendor
  • PC-based (this report)
  • Meta-tool vendor
  • In-house prototype for development
  • World market
  • Future environment of likely vendor
  • Conferences and exhibitions
  • Hardware/software vendor
 

Tool Evaluation

Evaluating the shortlisted candidate tools

Research and compare tool features

Contact the vendors of the shortlisted tools and arrange to have information sent (if you have not done this already). Study the information and compare features. Request further information from the vendors if the literature sent does not explain the tool function clearly enough.

This is the time to consult one or more of the publications which have evaluated testing tools, if the ones you are interested in are covered in such a report. The cost of such reports should be compared to the cost of someone’s time in performing similar evaluations, and the cost of choosing the wrong tool because you did not know about something which was covered in published material. (Do not forget to allow time to read the report.)

Ask the shortlisted vendors to give you the names of some of their existing customers as reference. Contact the reference sites from each shortlisted vendor and ask them a number of questions about the tool. For example, why they bought this tool, how extensively it is now used, whether they are happy with it, what problems they have had, their impression of the vendor’s after-sales support service, how the tool affected their work, what benefits the tool gave them, and what they would do differently next time they were buying a tool. Remember that reference sites are usually the vendor’s best customers, and so will be likely to be very happy with the tool. Their environment is different from yours, so the benefits or problems which they have had may well not be the same as the ones which are important to you. However, the experience of someone else who bought a tool for similar reasons to yours is invaluable and well worth pursuing.

Many vendors are aware that a tool does not always add up to a total solution and are keen to present it as part of a more comprehensive offering, often including consultancy and training beyond just their product. They usually understand the issues covered in this paper because bad selection and bad implementation of their tools gives them a bad reputation. Because the vendors have good experience in getting the best out of their tools, their solutions may enhance the tools significantly and are worth serious examination. Nevertheless, it is always worth bearing in mind that the tool supplier is ultimately trying to persuade you to buy his product.

At any point in the selection and tool evaluation process it may become clear which tool will be the best choice. When this happens, any further activities will not influence the choice of tool but may still be useful in assessing in more detail how well the chosen tool will work in practice. It will either detect a catastrophic mismatch between the selected tool and your own environment, or will give you more confidence that you have selected a workable tool.

Tool demonstrations: preparation

Before contacting the vendor to arrange for a tool demonstration, some preparatory work will help to make your assessment of the competing tools more efficient and unbiased. Prepare two test case suites for tool demonstration:

  • one of a normal ‘mainstream’ test case
  • one of a worst-case ‘nightmare’ scenario.

Rehearse both tests manually, in order to discover any bugs in the test scenarios themselves. Prepare evaluation forms or checklists:

  • general vendor relationship (responsiveness, flexibility, technical knowledge)
  • tool performance on your test cases. Set measurable objectives, such as time to run a test on your own (first solo flight), time to run a reasonable set of tests, time to find an answer to a question in the documentation.
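
For illustration, the evaluation form need be no more than a simple structured record, completed identically for every demonstration so that results can be compared afterwards (all names and numbers below are hypothetical):

# Hypothetical evaluation record for one vendor demonstration; identical forms are
# completed for every shortlisted tool so results can be compared afterwards.
demo_record = {
    "tool": "Tool A",
    "vendor_relationship": {          # qualitative notes, scored 1 (poor) to 5 (good)
        "responsiveness": 4,
        "flexibility": 3,
        "technical_knowledge": 5,
    },
    "measured_objectives": {          # minutes taken during the hands-on session
        "first_solo_test_run": 35,
        "run_prepared_test_suite": 90,
        "find_answer_in_documentation": 12,
    },
    "notes": "Nightmare scenario needed vendor help to script the date-driven loop.",
}

overall = sum(demo_record["vendor_relationship"].values())
print(f'{demo_record["tool"]}: relationship score {overall}/15, '
      f'first solo run in {demo_record["measured_objectives"]["first_solo_test_run"]} min')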

It is important that the tools be set up and used on your premises, using your configurations, and we recommend this, if at all possible, for the demonstration. We have had clients report to us that they found this single step to be extremely valuable, when they discovered that their prime candidate tool simply would not run in their environment! Of course, the vendor may be able to put it right but this takes time, and it is better to know about it before you sign on the dotted line, not after.

Invite the vendors of all shortlisted tools to give demonstrations within a short time-frame, for example on Monday, Wednesday and Friday of the same week. This will make sure that your memory of a previous tool is still fresh when you see a different one.

Give vendors both of your test cases in advance, to be used in their demo. If they cannot cope with your two cases in their demo, there probably is not much hope of their tool being suitable. However, be prepared to be flexible about your prepared tests. The tool may be able to solve your underlying problem in a different way than you had pictured. If your test cases are too rigid, you may eliminate a tool which would actually be very suitable for you.

Find out what facilities the vendors require and make sure they are available. Prepare a list of questions (technical and commercial) to ask on the demo day, and prepare one more test case suite to give them on the day. Allow time to write up your reactions to each of the tools, say at the end of each day.

Tool demonstrations from each vendor

Provide facilities for the vendor’s presentation and their demonstration. Listen to the presentation and ask the questions you had prepared.

Observe their running of your prepared test case suites. Try your own ‘hands-on’ demonstration of your prepared test case suites and the new one for the day. Have a slightly changed version of the software being tested, so that the test suite needs to be modified to test the other version. Have the vendors edit the scripts if they insist, but it is better to edit them yourself with their assistance, so that you can see how much work will be involved in maintaining scripts.

Ask (and note) any more questions which occur to you. Note any additional features or functions which you had not realised this tool provided. Note any features or functions which you thought it did provide but does not, or not in the form you had thought.

Try to keep all demonstrations the same as far as possible. It is easy for the last one to incorporate improvements learned during the other demonstrations, but this is not fair to the first one. Save new ideas for use in the competitive trial.

Thank and dismiss the vendor. Write up your observations and reactions to this tool.

Post-demonstration analysis

Ask the vendors you saw first any questions that occurred to you while watching a later vendor’s presentation or demonstration. This will give the fairest comparison between the tools.

Assess tool performance against measurable criteria defined earlier, taking any special circumstances into account. Compare features and functions offered by competing tools. Compare non-functional attributes, such as usability. Compare the commercial attributes of vendor companies.

If a clear winner is now obvious, select the winning tool. Otherwise select two tools for final competitive trial. Write to the non-selected vendors giving the reason for their elimination.

Competitive trial

If it is not clear which tool is the most appropriate for you at this point, an in-house trial or evaluation will give a better idea of how you would use the tool for your systems.

Most tool vendors will allow short-term use of the tool under an evaluation licence, particularly for tools which are complex and represent a major investment. Such licences will be for a limited period of time, and the evaluating unit must plan and prepare for that evaluation accordingly.

It is all too easy to acquire the tool under an evaluation licence only to find that those who really ought to be evaluating it are tied up in some higher-priority activity during that time. If they are not able or willing to make the time available during the period of the licence to give the tool more than the most cursory attention, then the evaluation licence will be wasted.

The preparation for the trial period includes the drafting of a test plan and test suites to be used by all tools in the trial. Measurable success criteria for the evaluated tools should be planned in advance, for example length of time to record a test or to replay a test, and the number of discrepancies found in comparison (real, extraneous and any missed). Attending a training course for each tool will help to ensure that they will be used in the right way during the evaluation period.

When the competi

Tags: #cast


First published 14/12/2011

When the testing versus checking debate started with Michael’s blog here http://www.developsense.com/blog/2009/08/testing-vs-checking/ I read the posts and decided it wasn’t worth getting into. It seemed to be a debate amongst the followers of the blog and the school rather than a more widespread unsettling of the status quo.

I fully recognise the difference between testing and checking (as suggested in the blogs). Renaming what most people call testing today as checking, and redefining testing in the way Michael suggests, upset some folk and cheered others. Most if not all developer testing, and all testing through an API using tools, becomes checking – by definition. I guess developers might sniff at that. Pretty much what exploratory testers do becomes the standard for what the new testing is, so they are happy. Most testers tend not to follow blogs, so they are still blissfully unaware of the debate.

Brian Marick suggested in a Tweet that the blogs were a ‘power play’ and pointed to an interesting online conversation here http://tech.groups.yahoo.com/group/agile-testing/message/18116. The suggested redefinitions appear to underplay checking and promote the virtue of testing. Michael clarified his position here http://www.developsense.com/blog/2009/11/merely-checking-or-merely-testing/, saying it wasn’t that kind of power play:

“The distinction between testing and checking is a power play, but it’s not a power play between (say) testers and programmers. It’s a power play between the glorification of mechanizable assertions over human intelligence. It’s a power play between sapient and non-sapient actions.”

In the last year or so, I’ve had a few run-ins with people and presenters at conferences when I asked what they meant by checking when they used the word. They tended to forget the distinction and focus on the glorification bit. They told me testing was good (“that’s what I get paid for”) and checking was bad, useless or for drones. I’m not unduly worried by that – but it’s kind of irritating.

The problem I have is that if the idea (distinguishing test v check) is to gain traction, and I believe it should, then changing the definition of testing is hardly going to help. It will confuse more than clarify. I hold that the scope of testing is much broader than testing software. In our business we test systems (a system could be a web page, it could be a hospital). The word and the activity are in widespread use in almost every business, scientific and engineering discipline you can imagine. People may or may not be checking, but to ask them to change the name and description of what they do seems a bit ambitious. All the textbooks, papers and blogs written by people in our business would have to be reinterpreted and possibly changed. Oh, and how many dictionaries around the world need a correction? My guess is it won’t happen.

It’s much easier to say that a component of testing is checking. Know exactly what that is and you are a wiser tester. Sapient even.

The test v check debate is significant in the common exploratory contexts of individuals making decisions on what they do right now in an exploratory session perhaps. But it isn’t significant in the context of larger projects and teams. The sapience required in an exploratory session is concentrated in the moment to moment decision making of the tester. The sapience in other projects is found elsewhere.

In a large business project, say an SAP implementation, there might be ten to twenty legacy and SAP module system test teams plus multiple integration test teams as well as one or several customer test teams all working at a legacy system, SAP module or integrated system level. SAP projects vary from maybe fifty to several thousand man-years of effort of which a large percentage (tens of percent) is testing of one form or another. Although there will be some exploration in there – most of the test execution will be scripted and it’s definitely checking as we know it.

But the checking activity probably accounts for a tiny percentage of the overall effort, and much of it is automated. The sapient effort goes into the logistics of managing quite large teams of people who must test in this context. Ten to twenty legacy systems must be significantly updated, system tested, then integrated with other legacy systems and kept in step with SAP modules that are being configured with perhaps ten thousand parameter changes. All this takes place in between ten and thirty test environments over the course of one to three years. And in all this time, business-as-usual changes on the legacy systems and on the systems to be migrated and/or retired must be accommodated.

As the business and projects learn what it is about, requirements evolve and all the usual instability disturbs things. But change is an inevitable consequence of learning and large projects need very careful change management to make sure the learning is communicated. It’s an exploratory process on a very large scale. Testing includes data migration, integration with customers, suppliers, banks, counterparties; it covers regulatory requirements, cutover and rollback plans, workarounds, support and maintenance processes as well as all the common non-functional areas.

Testing in these projects has some parallels with a military campaign. It’s all about logistics. Test checking activity compares with ‘pulling the trigger’.

Soldiering isn’t just about pulling triggers. In the same way, testing isn’t just about checking. Almost all the sapient activity goes into putting the testers into exactly the right place at the right time, fully equipped with meaningful and reliable environments, systems under test, integral data and clear instructions, and with dedicated development, integration, technical, data and domain-expert support teams. Checking may be manual or automated, but it’s a small part of the whole.

Exploration in environments like these can’t be done ‘interactively’. It really could take months and tens/hundreds of thousands of pounds/dollars/euros to construct the environment and data to run a speculative test. Remarkably, exploratory tests are part of all these projects. They just need to be wisely chosen and carefully prepared, just like other planned tests, because you have a limited time window and might not get a second chance. These systems are huge production lines for data so they need to be checked endlessly end to end. It’s a factory process so maybe testing is factory-like. It’s just a different context.

The machines on the line (the modules/screens) are extremely reliable commercial products. They do exactly what you have configured them to do, with Teutonic reliability. The exploration is really one of requirements, configuration options and the known behaviour of modules used in a unique combination. Test execution is confirmation, but it seems it can be done no other way.

It rarely goes smoothly of course. That’s logistics for you. And testing doing what it always does.

Tags: #testingvchecking #checking #factory


First published 06/11/2009

This talk was presented at SQSTEST in 2006. It sets out an alternative way of thinking about 'process improvement'. My argument is that we should focus on results, then define the changes we need to make. It draws on Results-Chain theory and the change management approach of John Kotter.

Registered users can download the paper from the link below. If you aren't registered, you can register here.

Tags: #softwaresuccessimprovement


First published 16/12/2012

A couple of weeks ago I gave a talk that included a couple of slides that focused on the idea of Specification by Example and how it cannot be relied upon to fully define the functionality of a software solution. I thought I'd summarise it here while the thought was fresh in my mind and also because Martin Fowler recently re-posted a blog originally published some time ago.

Martin provides a broader perspective and, significantly, he says 'Specification By Example only works in the context of a working relationship where both sides are collaborating and not fighting'. Quite. He quotes a Cedric Beust post that critiques TDD (and Agile projects in general) for promoting the use of tests as specifications.

Clearly, SBE can work nicely in an Agile environment where the scenarios are there to capture some key examples of the feature in use. The more general business rules to be implemented are (presumably) discussed and captured elsewhere – specifically in the code, and exemplified in tests. The examples and automated tests based on the conversations are retained to provide evidence that the software 'works' and stays working after changes elsewhere. One obvious, valuable outcome of SBE, Behaviour-Driven or Test-Driven approaches is a set of automated tests that make quite an effective anti-regression measure in projects that practise continuous delivery. But what about non-Agile? Can SBE work in all contexts?

The question is, “can examples alone be trusted to fully describe some system behaviour?” The answer is occasionally yes, but usually no. Here's an example of why not.

The table below shows some scenarios associated with a feature. Call it SBE, BDD or just a shorthand for some TDD tests. Whatever.

given a, b, c are real numbers
when a=<a>
  and b=<b>
  and c=<c>
then r1=<r1> and r2=<r2>

| a  | b   | c  | r1  | r2    |
| 1  | -2  | 1  | 1   | 1     |
| 1  | 3   | 2  | -1  | -2    |
| 12 | -28 | 15 | 1.5 | 0.833 |

It doesn't give much away, does it? “Do you know what it is yet?” (as Rolf Harris might ask).

Now, I could keep giving you new examples that are correct from the point of view of the requirement (which I'm not yet sharing with you). Maybe you'd spot the pattern of the inputs and outputs and guess that a, b, c are the coefficients of a quadratic and r1, r2 are its roots. Aha. The programmer could easily implement the code as follows:

r1 = (-b + sqrt(b*b - 4*a*c)) / (2*a)
r2 = (-b - sqrt(b*b - 4*a*c)) / (2*a)
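As an aside, here is a minimal runnable sketch of that implementation in Python. It's my illustration, not code from the original post, and the solve() name is an assumption; it simply checks the formula against the example rows above.

import math

def solve(a, b, c):
    # naive version of the formula above; assumes the discriminant is non-negative
    d = math.sqrt(b * b - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

# the table's examples are reproduced (to three decimal places for the last row)
assert solve(1, -2, 1) == (1.0, 1.0)
assert solve(1, 3, 2) == (-1.0, -2.0)
r1, r2 = solve(12, -28, 15)
assert (round(r1, 3), round(r2, 3)) == (1.5, 0.833)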
Sorted. But is it...?

Suppose I then gave you an example that could NOT be processed by the quadratic formula. The example below would probably cause an exception in the code:

| a |  b | c |  r1 | r2  |
| 4 |  3 | 2 | ... | ... |
You can't take square roots of negative numbers. So you could argue that there's a validation rule (not yet specified) that rejects inputs that cause this exception, and change the code accordingly. But in fact, one CAN take the square root of a negative number. The results are just called 'complex numbers', that's all. (Mathematicians can be a bit slippery.) Have we got it right yet? We'd have to look at the expected outcomes in the examples provided, generate them in code and hope for the best. Whatever. That's enough maths for one day.
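For illustration, here is a sketch of the 'complex numbers' interpretation (again mine, not the author's specification). Python's cmath module takes the square root of a negative discriminant without complaint, so the same formula simply returns complex roots for the awkward example:

import cmath

def solve(a, b, c):
    # cmath.sqrt returns a complex result for a negative discriminant instead of raising an error
    d = cmath.sqrt(b * b - 4 * a * c)
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

print(solve(4, 3, 2))   # roughly (-0.375+0.599j) and (-0.375-0.599j)

Whether those complex values are the expected outcomes is exactly the point: without the underlying requirement, the examples alone cannot tell us.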

The principle must be that examples on their own do not provide enough information to formulate a general solution. It is always possible to code a solution that will satisfy the examples provided. But that is not a solution – it is mimicry. A coded table of pre-defined answers can mimic the general solution. But the very next example, when used to test our solution, will fail if it is not in our coded table. Our model of the solution is incomplete – it's wrong. In fact, to be certain we have the perfect solution we would effectively need an infinite number of examples that, when tested, generate the required outcomes. Specification by Example ALONE cannot provide a complete specification.
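To illustrate the mimicry trap, here is a deliberately silly sketch (the names and the extra example are mine): a lookup table that satisfies every example given so far, but fails on the very next one.

# every example provided so far, keyed by its inputs
KNOWN_ANSWERS = {
    (1, -2, 1): (1.0, 1.0),
    (1, 3, 2): (-1.0, -2.0),
    (12, -28, 15): (1.5, 0.833),
}

def solve(a, b, c):
    return KNOWN_ANSWERS[(a, b, c)]   # passes every example we were given...

solve(1, 3, 2)     # fine
solve(2, -3, 1)    # ...but an unseen (perfectly valid) example raises KeyError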

Where does this leave us? Specifications are models of systems. All models are wrong (or at least incomplete), but some are useful. And having a specification is a necessary (but probably not sufficient) condition for building a solution.

Perhaps Specification by Example is mis-named. It should be called Specification AND Example.

The question remains, “How much software is out there that just mimics a solution?”

Tags: #specificationbyexample #SBE


First published 05/05/2011

We can help you meet the challenges in the selection and management of your current and prospective external and internal suppliers and partners. We can support your supplier management by:

  • Evaluating supplier strengths and weaknesses
  • Identifying any major risks associated with your supplier services
  • Developing contract schedules including specific acceptance criteria
  • Refining and improving the commercial relationship with your suppliers
  • Developing plans and techniques for the performance tracking and management of your suppliers

If you’d like to know more, please contact us directly.



Tags: #SupplierSelection #SupplierSelectionManagement


First published 27/05/2011

Susan Windsor's introduction to Test Assurance, presented at the Unicom Next Generation conference on 19 May. To date, Test Assurance has been a rather specialised discipline that we've been involved in for the past 10-12 years or so, but it now seems to be becoming a hot topic. This is a broad overview of what it's all about. Download the presentation



Tags: #testassurance


First published 05/11/2009

Is it possible to define a set of axioms that provide a framework for software testing, one that all the variations of test approach currently being advocated align with or obey? In this respect, an axiom would be an uncontested principle: something so self-evidently and obviously true that it requires no proof. What would such test axioms look like?

This paper summarises some preliminary work on defining a set of Test Axioms. Some potentially useful applications of the axioms are suggested for future development. It is also suggested that the work of practitioners and researchers is on very shaky ground unless we refine and agree on these axioms. This is a work in progress.

Registered users can download the paper from the link below. If you aren't registered, you can register here.

Tags: #testaxioms #thinkingtools
