Paul Gerrard

My experiences in the Test Engineering business; opinions, definitions and occasional polemics. Many have been rewritten to restore their original content.

First published 12/03/2010

At Thursday's SIGIST meeting, it was great to have such a positive reaction to my workshop and closing talk.

The Fallibility axiom (p41) tells us our sources of knowledge are undependable. The tester is a human being and prone to error. The system is being tested because we are uncertain of its behaviour or reliability. As a consequence, the plan for any test worth running cannot be relied upon to be accurate before we follow it. Predictions of test status (e.g. coverage achieved or test pass-rate) at any future date or time are notional. The planning quandary is conveniently expressed in the testing uncertainty principle:

  • One can predict test status, but not when it will be achieved;
  • One can predict when a test will end, but not its status.
Consequently, if a plan defines completion of testing using test exit criteria to be met at a specified date (expressed in terms of tests run and the status of those tests), it is wise to regard those criteria as planning assumptions rather than hard targets.
  • If exit criteria are met on time or earlier, our planning assumptions are sound: We are where we want to be.
  • If exit criteria are not met or not met on time, our plan was optimistic: Our plan needs adjustment, or we must relax the criteria.
Whichever outcome arises, we still need to think very carefully about what the criteria actually mean in our project.
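To make the "planning assumption" view concrete, here is a minimal sketch in Python. The dates, test counts and pass-rate threshold are invented purely for illustration; the point is only that the criteria are checked against reality at a checkpoint, rather than treated as guaranteed targets.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class ExitCriteria:
    planned_date: date      # date by which the criteria should be met
    min_tests_run: int      # planned number of tests executed
    min_pass_rate: float    # e.g. 0.95 means 95% of executed tests pass

def assumptions_hold(criteria: ExitCriteria, today: date,
                     tests_run: int, tests_passed: int) -> str:
    """Treat the exit criteria as planning assumptions, not hard targets."""
    pass_rate = tests_passed / tests_run if tests_run else 0.0
    met = tests_run >= criteria.min_tests_run and pass_rate >= criteria.min_pass_rate
    if met and today <= criteria.planned_date:
        return "Assumptions sound: we are where we want to be."
    if not met and today >= criteria.planned_date:
        return "Plan was optimistic: adjust the plan or relax the criteria."
    return "Criteria not yet met: status remains a prediction, not a fact."

# Hypothetical checkpoint: 180 of a planned 200 tests run, 170 passed
print(assumptions_hold(ExitCriteria(date(2010, 3, 31), 200, 0.95),
                       date(2010, 3, 31), 180, 170))
```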

Tags: #testaxioms #uncertainty


First published 01/12/2009

The Knowledge Base has moved HERE!

This is a new website currently hosting a directory of tools in the DevOps, SDET and Testing support domains, and it also provides a searchable index of tools. There are over 2,208 tools registered, although 693 are actually programming languages.

The site also monitors the blog pages of 277 bloggers. These again are indexed and searchable.

Numbers correct as of 25/8/2015.



Tags: #tkb #ToolsKnowledgeBase


First published 05/11/2009

This talk sets out some thoughts on what's happening in the testing marketplace. It covers Benefits-Based Testing, Testing Frameworks, Software Success Improvement and Tester Skills, and provides some recommendations for building your career.

Registered users can download the paper from the link below. If you aren't registered, you can register here.

Tags: #testingtrends


First published 03/12/2009

You may or may not find this response useful. :–)

“It depends”.

The “it depends” response is an old joke. I think I was advised by David Gelperin in the early 90s that if someone says “it depends” your response should be “ahh, you must be a consultant!”

But it does depend. It always has and always will. The context-driven guys provide a little more information – “it depends on context”. But this doesn't answer the question of course – we still get asked by people who really do need an answer – i.e. project managers who need to plan and to resource teams.

As an aside, there’s an interesting discussion of “stupid questions” here. This question isn't stupid, but the blog post is interesting.

In what follows – let me assume you’ve been asked the question by a project manager.

The 'best' dev/tester ratio is possibly the most context-specific question in testing. What are the influences on the answer?

  • What is the capability/competence of the developers and testers respectively and absolutely?
  • What do dev and test WANT to do versus what you (as a manager) want them to do?
  • To what degree are the testers involved in early testing? (Do they just system test, or are they involved from concept through to acceptance?)
  • What is the risk-profile of the project?
  • Do stakeholders care if the system works or not?
  • What is the scale of the development?
  • What is the ratio of new/custom code versus reused (and trusted) code/infrastructure?
  • How trustworthy is the to-be-reused code anyway?
  • How testable will the delivered system be?
  • Do resources come in whole numbers or fractions?
  • And so on, and so on…
Even if you had the answers to these questions to six significant digits, you still wouldn't be much wiser, because some other pieces of information are missing. These are possibly known to the project manager who is asking the question:
  • How much budget is available? (knowing this – he has an answer already)
  • Does the project manager trust your estimates and recommendations or does he want references to industry ‘standards’? i.e. he wants a crutch, not an answer.
  • Is the project manager competent and honest?
So we’re left with this awkward situation. Are you being asked the question to make the project manager feel better; to give him reassurance he has the right answer already? Does he know his budget is low and needs to articulate a case for justifying more? Does he think the budget is too high and wants a case for spending less?

Does he regard you as competent and trust what you say anyway? This final point could depend on his competence as much as yours! References to ‘higher authorities’ satisfy some people (if all they want is back-covering), but other folk want personal, direct, relevant experience and data.

I think a bit of von Neumann game theory may be required to analyse the situation!

Here’s a suggestion. Suppose the PM says he has 4 developers and needs to know how many testers are required. I’d suggest he has a choice:

  • 4 dev – 1 tester: the onus is on the devs to do good testing; the tester will advise, cherry-pick areas to test and focus on high-impact problems. The PM needs to micro-manage the devs, and the tester is a free agent.
  • 4 dev – 2 testers: testers partner with dev to ‘keep them honest’. Testers pair up to help with dev testing (whether TDD or not). Testers keep track of the coverage and focus on covering gaps and doing system-level testing. PM manages dev based on tester output.
  • 4 dev – 3 testers: testers accountable for testing. Testers shadow developers in all dev test activities. System testing is thorough. Testers set targets for achievement and provide evidence of it to PM. PM manages on the basis of test reports.
  • 4 dev – 4 testers: testers take ownership of all testing. But is this still Agile??? ;–)
Perhaps it’s worth asking the PM for dev and tester job specs and working out what proportion of their activities are actually dev and test? Don’t hire testers at all – just hire good developers (i.e. those who can test). If he has poor developers (who can’t/won’t test) then the ratio of testers goes up because someone has to do their job for them.

Tags: #estimation #testerdeveloperratio


First published 03/12/2009

The V-model promotes the idea that the dynamic test stages (on the right hand side of the model) use the documentation identified on the left hand side as baselines for testing. The V-Model further promotes the notion of early test preparation.


Figure 4.1 The V-Model of testing.

Early test preparation finds faults in baselines and is an effective way of detecting faults early. This approach is fine in principle, and early test preparation is effective in practice. However, there are two problems with the V-Model as normally presented.


The V-Model with early test preparation.

Firstly, in our experience, there is rarely a perfect, one-to-one relationship between the documents on the left hand side and the test activities on the right. For example, functional specifications don’t usually provide enough information for a system test: system tests must often take account of aspects of the business requirements as well as physical design issues. To be planned thoroughly, system testing usually draws on several sources of requirements information.

Secondly, and more importantly, the V-Model has little to say about static testing at all. The V-Model treats testing as a “back-door” activity on the right hand side of the model. There is no mention of the potentially greater value and effectiveness of static tests such as reviews, inspections, static code analysis and so on. This is a major omission, and the V-Model does not support the broader view of testing as a constantly prominent activity throughout the development lifecycle.


The W-Model of testing.

Paul Herzlich introduced the W-Model approach in 1993. The W-Model attempts to address shortcomings in the V-Model. Rather than focus on specific dynamic test stages, as the V-Model does, the W-Model focuses on the development products themselves. Essentially, every development activity that produces a work product is “shadowed” by a test activity, whose purpose is to determine whether the objectives of the development activity have been met and the deliverable meets its requirements. In its most generic form, the W-Model presents a standard development lifecycle with every development stage mirrored by a test activity. On the left hand side, typically, the deliverable of a development activity (for example, “write requirements”) is accompanied by a test activity (“test the requirements”) and so on. If your organisation has a different set of development stages, then the W-Model is easily adjusted to your situation. The important thing is this: the W-Model of testing focuses specifically on the product risks of concern at the point where testing can be most effective.


The W-Model and static test techniques.

If we focus on the static test techniques, you can see that there is a wide range of techniques available for evaluating the products of the left hand side. Inspections, reviews, walkthroughs, static analysis, requirements animation as well as early test case preparation can all be used.


The W-Model and dynamic test techniques.

If we consider the dynamic test techniques you can see that there is also a wide range of techniques available for evaluating executable software and systems. The traditional unit, integration, system and acceptance tests can make use of the functional test design and measurement techniques as well as the non-functional test techniques that are all available for use to address specific test objectives.

The W-Model removes the rather artificial constraint of having the same number of dynamic test stages as development stages. If there are five development stages concerned with the definition, design and construction of code in your project, it might be sensible to have only three stages of dynamic testing. Component, system and acceptance testing might fit your normal way of working, and the test objectives for the whole project would be distributed across three stages, not five. There may be practical reasons for doing this and the decision is based on an evaluation of product risks and how best to address them. The W-Model does not enforce a project “symmetry” that does not (or cannot) exist in reality. Nor does it impose any rule that later dynamic tests must be based on documents created in specific stages (although earlier documentation products are nearly always used as baselines for dynamic testing).

More recently, the Unified Modeling Language (UML) described in Booch, Rumbaugh and Jacobson’s book [5], and the methodologies based on it, namely the Unified Software Process and the Rational Unified Process™ (described in [6-7]), have grown in importance. In projects using these methods, requirements and designs might be documented in multiple models, so system testing might be based on several of these models (spread over several documents).

We use the W-Model in test strategy as follows. Having identified the specific risks of concern, we specify the products that need to be tested; we then select test techniques (static reviews or dynamic test stages) to be used on those products to address the risks; finally, we schedule test activities as close as practicable to the development activity that generated the products to be tested.
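As an illustration of that strategy step, the sketch below (Python, with entirely invented risk, product and technique names) records which work product each risk is addressed against and which static or dynamic technique is assigned to it – essentially the bookkeeping the W-Model asks for.

```python
# A minimal sketch of the W-Model strategy step: map identified risks to the
# products that need testing, and choose a technique for each pairing.
# All names below are invented for illustration only.

risks = {
    "ambiguous requirements": {"product": "requirements spec", "technique": "inspection"},
    "incorrect calculations":  {"product": "component code",    "technique": "component test"},
    "interface mismatches":    {"product": "integrated build",  "technique": "integration test"},
    "unacceptable to users":   {"product": "working system",    "technique": "acceptance test"},
}

def test_schedule(risks):
    """Pair each test activity with the product it examines, scheduled as
    close as practicable to the activity that produced that product."""
    for risk, plan in risks.items():
        yield f"{plan['technique']:>16} on {plan['product']:<18} addresses: {risk}"

for line in test_schedule(risks):
    print(line)
```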
 



Tags: #w-model


First published 14/12/2011

When the testing versus checking debate started with Michael’s blog here http://www.developsense.com/blog/2009/08/testing-vs-checking/ I read the posts and decided it wasn’t worth getting into. It seemed to be a debate amongst the followers of the blog and the school rather than a more widespread unsettling of the status quo.

I fully recognise the difference between testing and checking (as suggested in the blogs). Renaming what most people call testing today to 'checking', and redefining testing in the way Michael suggests, upset some folk and cheered others. Most if not all developer testing, and all testing through an API using tools, becomes checking – by definition. I guess developers might sniff at that. Pretty much what exploratory testers do becomes the standard for what the new testing is, so they are happy. Most testers tend not to follow blogs, so they are still blissfully unaware of the debate.

In a Tweet, Brian Marick suggested the blogs were a ‘power play’ and pointed to an interesting online conversation here http://tech.groups.yahoo.com/group/agile-testing/message/18116. The suggested redefinitions appear to underplay checking and promote the virtue of testing. Michael clarified his position, saying that it wasn't that kind of power play, here http://www.developsense.com/blog/2009/11/merely-checking-or-merely-testing/:

“The distinction between testing and checking is a power play, but it’s not a power play between (say) testers and programmers. It’s a power play between the glorification of mechanizable assertions over human intelligence. It’s a power play between sapient and non-sapient actions.”

In the last year or so, I’ve had a few run-ins with people and presenters at conferences when I asked what they meant by checking when they used the word. They tended to forget the distinction and focus on the glorification bit. They told me testing was good (“that’s what I get paid for”) and checking was bad, useless or for drones. I’m not unduly worried by that – but it’s kind of irritating.

The problem I have is that if the idea (distinguishing test v check) is to gain traction – and I believe it should – then changing the definition of testing is hardly going to help. It will confuse more than clarify. I hold that the scope of testing is much broader than testing software. In our business we test systems (a system could be a web page, it could be a hospital). The word and the activity are in widespread use in almost every business, scientific and engineering discipline you can imagine. People may or may not be checking, but to ask them to change the name and description of what they do seems a bit ambitious. All the textbooks, papers and blogs written by people in our business would have to be reinterpreted and possibly changed. Oh, and how many dictionaries around the world would need a correction? My guess is it won't happen.

It’s much easier to say that a component of testing is checking. Know exactly what that is and you are a wiser tester. Sapient even.

The test v check debate is significant in the common exploratory context of an individual deciding what to do right now, in an exploratory session perhaps. But it isn't significant in the context of larger projects and teams. The sapience required in an exploratory session is concentrated in the moment-to-moment decision making of the tester. The sapience in other projects is found elsewhere.

In a large business project, say an SAP implementation, there might be ten to twenty legacy and SAP module system test teams plus multiple integration test teams as well as one or several customer test teams all working at a legacy system, SAP module or integrated system level. SAP projects vary from maybe fifty to several thousand man-years of effort of which a large percentage (tens of percent) is testing of one form or another. Although there will be some exploration in there – most of the test execution will be scripted and it’s definitely checking as we know it.

But the checking activity probably accounts for a tiny percentage of the overall effort, and much of it is automated. The sapient effort goes into the logistics of managing quite large teams of people who must test in this context. Ten to twenty legacy systems must be significantly updated, system tested, then integrated with other legacy systems and kept in step with SAP modules that are being configured with perhaps ten thousand parameter changes. All this takes place in between ten and thirty test environments over the course of one to three years. And in all this time, business-as-usual changes to the legacy systems, and to the systems to be migrated and/or retired, must be accommodated.

As the business and projects learn what it is about, requirements evolve and all the usual instability disturbs things. But change is an inevitable consequence of learning and large projects need very careful change management to make sure the learning is communicated. It’s an exploratory process on a very large scale. Testing includes data migration, integration with customers, suppliers, banks, counterparties; it covers regulatory requirements, cutover and rollback plans, workarounds, support and maintenance processes as well as all the common non-functional areas.

Testing in these projects has some parallels with a military campaign. It's all about logistics. The checking activity compares with 'pulling the trigger'.

Soldiering isn’t just about pulling triggers. In the same way, testing isn’t just about checking. Almost all the sapient activity goes into putting the testers into exactly the right place at the right time, fully equipped with meaningful and reliable environments, systems under test, integral data and clear instructions, with dedicated development, integration, technical, data, domain expert support teams. Checking may be manual or automated, but it’s a small part of the whole.

Exploration in environments like these can’t be done ‘interactively’. It really could take months and tens/hundreds of thousands of pounds/dollars/euros to construct the environment and data to run a speculative test. Remarkably, exploratory tests are part of all these projects. They just need to be wisely chosen and carefully prepared, just like other planned tests, because you have a limited time window and might not get a second chance. These systems are huge production lines for data so they need to be checked endlessly end to end. It’s a factory process so maybe testing is factory-like. It’s just a different context.

The machines on the line (the modules/screens) are extremely reliable commercial products. They do exactly what you have configured them to do with a teutonic reliability. The exploration is really one of requirements, configuration options and the known behaviour of modules used in a unique combination. Test execution is confirmation but it seems that it can be done no other way.

It rarely goes smoothly of course. That’s logistics for you. And testing doing what it always does.

Tags: #testingvchecking #checking #factory


First published 01/12/2009

The extraordinary growth in the Internet is sweeping through industry. Small companies can now compete for attention in the global shopping mall – the World Wide Web. Large corporations see ‘The Web’ not only as an inexpensive way to make company information on products and services available to anyone with a PC and browser, but increasingly as a means of doing on-line business with worldwide markets. Companies are using the new paradigm in four ways:

  • Web sites - to publicise services, products, culture and achievements.
  • Internet products - on-line services and information to a global market on the Web.
  • Intranet products - on-line services and information for internal employees.
  • Extranet products - on-line services and information enabling geographically distributed organisations to collaborate.

Web-based systems can be considered to be a particular type of client/server architecture. However, the way that these systems are assembled and used means some specialist tools are required, and since such tools are becoming available, we will give them some consideration here.

The risks that are particular to Web-based applications are especially severe where the system may be accessed by thousands or tens of thousands of customers. The very high visibility of some systems being built means that failure in such systems could be catastrophic. Web pages usually comprise text-based files in Hypertext Markup Language (HTML) and now contain executable content, so the traditional separation of ‘code and data’ is no longer possible or appropriate. Browsers, plug-ins, active objects and Java are also new concepts which are still immature.

There are four main categories of tools which support the testing of Web applications:

Application test running

Test running tools now exist that can capture tests of user interactions with Web applications and replay them. These tools are either enhancements to existing GUI test running tools or are new tools created specifically to drive applications accessed through Web browsers. The requirements for these tools are very similar to those for normal GUI test running tools, but there are some important considerations.

Firstly, the Web testing tool will be designed to integrate with specific browsers. Ask your vendor whether the tool supports the browsers your Web application is designed for, but also check whether older versions of these browsers are supported. To test simple text-oriented HTML pages, the Web testing tool must be able to execute the normal web page navigation commands, recognise HTML objects such as tables, forms, frames and links to other pages, and handle content consisting of text, images, video clips etc. HTML text pages are often supplemented by server-side programs, typically in the form of Common Gateway Interface (CGI) scripts, which perform more substantial processing. These should be transparent to the test tool.

Increasingly, web applications will consist of simple text-based web pages, as before, but these will be complemented with ‘active content’ components. These components are likely to be Java applets, Netscape ‘plug-ins’ or ActiveX controls. Tools are only just emerging which are capable of dealing with these components. Given the portable nature of the Java development language, tools written in Java may actually be completely capable of dealing with any legitimate Java object in your Web application, so may be an obvious choice. However, if other non-Java components are present in your application, a ‘pure-Java’ tool may prove inadequate. Another consideration is how tools cope with dynamically generated HTML pages – some tools cannot.
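A present-day equivalent of this kind of test running tool is a browser-automation library such as Selenium WebDriver (not mentioned in the original text; shown here only to make the idea concrete). The sketch below drives a browser through a hypothetical login page – the URL and element names are invented for illustration.

```python
# A minimal sketch of browser-driven test running, assuming Selenium WebDriver
# and a Firefox driver are installed. Page URL and element names are invented.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()                        # integrates with a specific browser
try:
    driver.get("https://example.com/login")         # normal page navigation
    driver.find_element(By.NAME, "username").send_keys("testuser")
    driver.find_element(By.NAME, "password").send_keys("secret")
    driver.find_element(By.ID, "submit").click()
    # Recognise page content and check the expected result appears
    assert "Welcome" in driver.page_source
finally:
    driver.quit()
```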

HTML source, link, file and image checkers

Tools have existed for some time which perform ‘static tests’ of Web pages and content. These tools open a Web page (a site Home page, for example) to verify the syntax of the HTML source and check that all the content, such as images, sounds and video clips, can be accessed and played/displayed. Links to other pages on the site can be traversed, one by one. For each linked page, the content is verified until the tool runs out of unvisited links. These tools are usually configured to stop the search once they encounter a link to an off-server page or another site, but they can effectively verify that every page, every home-site-based link and every piece of content is present. Most of these tools provide graphical reports on the structure of Web sites which highlight the individual pages, internal and external links, missing links and other missing content.
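The core of such a link checker is small. Here is a minimal sketch using only the Python standard library: it fetches a page, extracts the links, verifies each one can be retrieved, and stops at links that leave the home site. Real tools add image/media checks, HTML syntax validation and site-structure reports. The start URL is illustrative only.

```python
# A minimal sketch of a static link checker (Python standard library only).
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collect href targets from anchor tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def check_site(start_url, max_pages=50):
    home = urlparse(start_url).netloc
    to_visit, seen = [start_url], set()
    while to_visit and len(seen) < max_pages:
        url = to_visit.pop()
        if url in seen:
            continue
        seen.add(url)
        try:
            page = urlopen(url, timeout=10)
        except (HTTPError, URLError) as err:
            print(f"BROKEN  {url}  ({err})")
            continue
        print(f"OK      {url}")
        parser = LinkExtractor()
        parser.feed(page.read().decode("utf-8", errors="ignore"))
        for link in parser.links:
            absolute = urljoin(url, link)
            if urlparse(absolute).netloc == home:   # stop at off-site links
                to_visit.append(absolute)

check_site("https://example.com/")
```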

Component test-drivers

Advanced Web applications are likely to utilise active components which are not directly accessible using a browser-oriented Web testing tool. Currently, developers have to write customised component drivers, for example using the main() method in Java classes to exercise the methods in the class without having to use other methods in other classes.

As web applications become more sophisticated, the demand for specialised component drivers to test low-level functionality and integration of components will increase. Such tools may be delivered as part of a development toolkit but, unfortunately, development tool vendors are more often interested in providing the ‘coolest’ development tools rather than testing tools.

Internet performance testing

Web applications are most easily viewed as a particular implementation of client/server. The web performance testing tools that are available are all enhanced versions of established client/server-based tools. We can consider the requirements for load generation and client application response time measurement separately.

Load generation tools rely on a master or control program, running on a server, which either drives physical workstations (using a test running tool to drive the application) or drives test drivers which submit Web traffic across the network to the Web servers. In the first case, all that is new is that it is the Web-oriented test running tool which drives the client application through a browser. For larger tests, test drivers capable of generating Web traffic across the network are used. Here, the test scripts dispatch their calls across the network to the web servers: rather than a SQL-oriented database protocol such as ODBC, the test scripts contain HTTP calls. All that has changed is the test driver programs.
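To illustrate the second approach, here is a minimal sketch of HTTP-level load generation: each virtual user submits Web traffic directly to the server (no browser involved) and records the response time of each call. The URL, user count and request count are invented for illustration; commercial tools add ramp-up profiles, think times, correlation and much richer reporting.

```python
# A minimal sketch of HTTP-level load generation with response-time capture.
import statistics
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URL = "https://example.com/"    # hypothetical system under test
VIRTUAL_USERS = 20
REQUESTS_PER_USER = 10

def virtual_user(_):
    """One simulated user: issue HTTP calls and time each response."""
    timings = []
    for _ in range(REQUESTS_PER_USER):
        start = time.perf_counter()
        with urlopen(URL, timeout=30) as response:
            response.read()
        timings.append(time.perf_counter() - start)
    return timings

with ThreadPoolExecutor(max_workers=VIRTUAL_USERS) as pool:
    all_timings = [t for user in pool.map(virtual_user, range(VIRTUAL_USERS))
                   for t in user]

print(f"requests:           {len(all_timings)}")
print(f"mean response time: {statistics.mean(all_timings):.3f}s")
print(f"95th percentile:    {statistics.quantiles(all_timings, n=20)[-1]:.3f}s")
```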

Client application response time measurement is done using the Web test running tool. This may be a standalone tool running on a client workstation or the client application driver, controlled by the load generation tool.



Tags: #tools #automation #web #internet


First published 06/11/2009

This talk will be presented at EuroSTAR 2007 in Stockholm in December 2007. I presented the talk at the EuroSTAR-Testnet mini-event at Nieuwegein, Holland on the same night as Liverpool played AC Milan in the Champions League Final, hence the picture on slide 2. (It's a shame they lost 2-1.) The focus of the talk is that using lessons learned can help you formulate a better test strategy, or as I am calling them nowadays, 'Project Intelligence Strategies'.

Registered users can download the paper from the link below. If you aren't registered, you can register here.

Tags: #sap #erp #lessonslearned


First published 06/11/2009

A discussion of the main improvement types possible and how the TOM maturity model can be used to get a process improvement initiative started.

Registered users can download the paper from the link below. If you aren't registered, you can register here.

Tags: #softwaresuccessimprovement


First published 03/12/2009

This paper gives general guidance about selecting and evaluating commercial CAST tools. It is intended to provide a starting point for tool assessment. Although it is not as detailed or specific as a tailored report prepared by a consultant, it should enable you to plan the basics of your own tool selection and evaluation process.

It is easy to make the mistake of considering only tool function, and not the other success factors related to the organisation where the tool will be used; this is one reason that expensive tools end up being unused only a few months after purchase. Following the advice given in this paper should help you to avoid some of these problems.

There are a surprising number of steps in the tool selection process. The following diagram is an overview of the whole process.


Overview of the selection process

Where to start

You are probably reading this report because you want to make your testing process more efficient through the use of a software testing tool. However, there is a wide variety of tools available; which one(s) should you buy?

There are a number of different types of testing tools on the market, and they serve different purposes. Buying a capture/replay tool will not help you measure test coverage; a static analysis tool will not help in repeating regression tests.

You also need to consider the environment where the testing tool will be used: a mainframe tool is no use if you only have PCs.

The skills of the people using the tool also need to be taken into account; if a test execution tool requires programming skills to write test scripts, it would not be appropriate for use by non-technical end-users.

These considerations are critical to the successful use of the tool; if the wrong tool is selected, whether it is the wrong tool for the job, the environment or the users, the benefits will not be achieved.

The tool selection and evaluation team

Someone should be given the responsibility for managing the selection and evaluation process. Generally a single individual would be authorised to investigate what tools are available and prepare a shortlist, although there could be several people involved. The researchers need to have a reasonable idea of what type of tool is needed, who within the organisation would be interested in using it, and what the most important factors are for a tool to qualify for the shortlist.

After the shortlist is prepared, however, it is wise to involve a number of people in the evaluation process. The evaluation team should include a representative from each group planning to use the tool, and someone of each type of job function who would be using it. For example, if non-technical end-users will be using a test execution tool to run user acceptance tests, then a tool that needs programming skills would be excluded and an end-user should be on the evaluation team. The usability of a tool has a significant effect on whether the tool becomes accepted as part of the testing process.

If the evaluation team becomes involved in a trial scheme, the intended user must make the recommendation as to the tool’s usability. However, the team may need to have access to a systems support resource. The role of the support person is to assist in overcoming technical problems which appear important to the user or developer but are actually quite easily overcome by a technician.

The selection and evaluation team may also go on to become the implementation team, but not necessarily.

What are the problems to be solved?

The starting point for tool selection is identifying what problem needs to be solved, where a tool might provide a solution. Some examples are:

  • tests which are currently run manually are labour-intensive, boring and lengthy
  • tests which are run manually are subject to inconsistencies, owing to human error in inputting tests
  • we need knowledge of the completeness or thoroughness of our tests; the amount of software exercised by a test suite is difficult or impossible to measure manually
  • paper records of tests are cumbersome to maintain, leading to tests being repeated or omitted
  • when small changes are made to the software, extensive regression tests must be repeated and there is not time to do it manually
  • setting up test data or test cases is repetitive and ‘mechanical’: the testers find it uninteresting and make too many ‘simple’ errors
  • comparison of test results is tedious and error-prone
  • errors are found during testing which could have been detected before any tests were run by examining the code carefully enough
  • users find too many errors that could have been found by testing.

The current testing process must be sufficiently well-defined that it is easy to see the areas where automated improvement would actually help.

Is a tool the right solution?

Software testing tools are one way to approach these types of problems, but are not the only way. For example, code inspection could be used to address the problem of detecting errors before test execution. Better organisation of test documentation and better test management procedures would address the problem of omitting or repeating tests. Considering whether all the repetitive ‘mechanical’ test cases are really necessary may be more important for test effectiveness and efficiency than blindly automating them. The use of a testing tool will not help in finding more errors in testing unless the test design process is improved, which is done by training, not by automation.

An automated solution often ‘looks better’ and may be easier to authorise expenditure for than addressing the more fundamental problems of the testing process itself. It is important to realise that the tool will not correct a poor process without additional attention being paid to it. It is possible to improve testing practices alongside implementing the tool, but it does require conscious effort.

However, we will assume that you have decided, upon rational consideration of your own current situation (possibly with some tool readiness assessment advice from an outside organisation), that a testing tool is the solution you will be going for.

How much help should the tool be?

Once you have identified the area of testing you want the tool to help you with, how will you be able to tell whether any tool you buy has actually helped? You could just buy one and see if everyone feels good about it, but this is not the most rational approach. A better way is to define measurable criteria for success for the tool. For example, if the length of time taken to run tests manually is the problem, how much quicker should the tests be run using a tool?

Setting measurable criteria is not difficult to do, at least to obtain a broad general idea of costs. A general idea is all that is necessary to know whether the tool will be cost-justified. A realistic measurable criterion for a test execution tool might be set out as follows:

Manual execution of tests currently takes 4 man-weeks. In the first 3 months of using the tool, 50–60 per cent of these tests should be automated, with the whole test suite run in 2–2½ man-weeks. Next year at this time we aim to have 80 per cent of the tests automated, with the equivalent test suite being run in 4 man-days.

An approach to measuring the potential savings of a test coverage tool might be:

We currently believe that our test cases ‘completely test’ our programs, but have no way of measuring coverage. As an experiment on a previously tested (but unreleased) program, rerun the dynamic tests using a coverage measurement tool. We will almost certainly find that our tests reached less than 100 per cent coverage. Based on the tool’s report of the unexecuted code we can devise and run additional test cases. If these additional test cases discover errors – serious ones that are deemed likely to have appeared sometime in the future in live running – then the tool would make a saving by detecting those errors during testing, which is less costly than in live running. The potential savings are the difference between the cost of an error found in live running: say £5000 for a modest error that must be corrected, and an error found during testing, say £500.

A similar approach could be applied to determining whether a static analysis tool is worth using:

Run the static analysis tool on a group of programs that have been through dynamic testing but are not yet released. Evaluate (in the opinion of a senior analyst) the cost of the genuine errors detected. For those errors which were also found in the original dynamic testing, the static analyser might not save the cost of running the dynamic test (because you do not design your dynamic tests assuming that errors have been found previously), but it might save the cost of all the diagnosis and rerunning that dynamic testing entails. More interesting is the cost of those errors which were only detected by static test but which would have caused a problem in live running. The cost of detecting the error through static analysis is likely to be one-hundredth the cost of finding it in live running.

When looking at the measurable benefits it is best to be fairly conservative about what could be accomplished. When a tool is used for the first time it always takes much longer than when people are experienced in using it, so the learning curve must be taken into account. It is important to set realistic goals, and not to expect miracles. It is also important that the tool is used correctly, otherwise the benefits may not be obtained.

If you find that people are prepared to argue about the specific numbers which you have put down, ask them to supply you with more accurate figures which will give a better-quality evaluation. Do not spend a great deal of time ‘polishing’ your estimates: the tool evaluation process should be only as long as is needed to come to a decision, and no longer. Your estimates should reflect this granularity.

How much is this help worth?

The measurable criteria that you have identified as achievable will have a value to your organisation; it is important to quantify this value in order to compare the cost of the tool with the cost saved by the benefits. One of the simplest ways to quantify the benefits is to measure the saving of time and multiply that by approximate staff costs.

For example, if regression tests which normally take 4 man-weeks manually can be done in 2 man-weeks we will save 2 man-weeks of effort whenever those tests are run. If they are run once a quarter, we will save 8 man-weeks a year. If they are run once a month, we will save 24 man-weeks a year. (If they are only run once a year we will only save 2 man-weeks in that year.) If a man-week is costed at say, £2,000, we will save respectively £16,000, £48,000 or £4,000.
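That arithmetic, written out as a tiny sketch (the weekly staff cost, effort figures and run frequencies are the illustrative numbers from the paragraph above, not real data):

```python
# A minimal sketch of the annual-saving calculation for a test execution tool.
MAN_WEEK_COST = 2000      # £ per man-week (illustrative)
MANUAL_EFFORT = 4         # man-weeks per regression run, done manually
AUTOMATED_EFFORT = 2      # man-weeks per regression run, with the tool

def annual_saving(runs_per_year):
    return (MANUAL_EFFORT - AUTOMATED_EFFORT) * runs_per_year * MAN_WEEK_COST

for runs, label in [(4, "quarterly"), (12, "monthly"), (1, "annually")]:
    print(f"{label:>10}: £{annual_saving(runs):,}")
# quarterly: £16,000   monthly: £48,000   annually: £4,000
```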

The savings that can be achieved can be taken advantage of by putting the people into more productive work, on development, enhancements or better test design.

There will also be other benefits, which may be very difficult if not impossible to quantify but which should also be mentioned. The risk of an embarrassing public release may be reduced, for example, but it may not be possible to put a monetary value on this. Morale is likely to improve, which is likely to result in an increase in productivity, but it may not be possible or desirable to separate this from the productivity increase from using the tool. There may be some things that are not even possible to do manually, which will not be discovered until the tool has been in use; these unanticipated benefits cannot be quantified because no one realised them at the time.

Of course this is a very simplistic start to building a proper business case for the tool, but it is essential that some first attempt is made to quantify the benefits, otherwise you will not be able to learn from your tool evaluation experience for next time.

Tool Requirements

What tool features are needed to meet requirements?

The next step is to begin to familiarise yourself with the general capabilities of tools of the type you want.

Which of the features listed are the most important ones to meet the needs and objectives for the tool in your current situation? For example, if you want to improve the accuracy of test results comparison, a capture/replay tool without a comparator would not help.

Make a list of the features, classified as ‘essential’, ‘desirable’ and ‘don’t matter’. The essential features list would rule out any tool which did not provide all of the things on that list, and the desirable features would be used to discriminate among those tools that provide all the essential features.
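One simple way to operate such a list, sketched below in Python with invented feature and tool names, is to rule out any candidate missing an essential feature and then rank the survivors by how many desirable features they provide (a weighted score works the same way).

```python
# A minimal sketch of shortlisting against a classified feature list.
# Feature and tool names are invented; 'essential' eliminates, 'desirable' ranks.
ESSENTIAL = {"capture/replay", "results comparator"}
DESIRABLE = {"data-driven scripts", "unattended runs", "coverage reports"}

candidate_tools = {
    "Tool A": {"capture/replay", "results comparator", "unattended runs"},
    "Tool B": {"capture/replay", "coverage reports"},        # no comparator: ruled out
    "Tool C": {"capture/replay", "results comparator",
               "data-driven scripts", "coverage reports"},
}

shortlist = {
    name: len(features & DESIRABLE)
    for name, features in candidate_tools.items()
    if ESSENTIAL <= features          # all essential features present
}

for name, score in sorted(shortlist.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score} desirable feature(s)")
```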

Note that your feature list will change as you progress in evaluating tools. You will almost certainly discover new features that you think are desirable as you go through the evaluation process. The tool vendors are sure to point out features they can supply but which you did not specifically request. Other tool users may recommend a feature as essential because of their experience, which you may not have thought was so important. For example, you may not consider the importance of being able to update your test scripts whenever the software changes because you are concentrating on the use of the execution tool for capturing tests the first time. However, this may be a significant ‘running cost’ for using the testing tool in the future. It is also possible that a feature that you thought was desirable is not required owing to the way other features are implemented.

As well as the functions that the tool performs, it is important to include some grading of usability as a feature for evaluation. Tools that have sound technical features, but are difficult to use, frequently become shelfware.

What are the constraints?

Environmental constraints

Testing tools are software packages and therefore may be specific to particular hardware, software or operating systems. You would not want a tool that runs only on a VAX VMS system if you have an IBM MVS system and no possibility of acquiring or using anything else.

Most people look for a tool which will run on the environment in which they are developing or maintaining software, but that is not the only possibility. A number of tools can run on a PC, for example, and can execute tests running on a different computer. Even debug and coverage measurement tools can work in a ‘host–target’ or client–server environment.

Having to acquire additional hardware is sometimes more of a psychological barrier than a technical or economic one. In your tool selection process, especially if there are not many tools for your ‘home’ environment, it is worth considering tools based on a separate environment.

However, you may need to acquire extra hardware even for a tool that runs on your own current environment, for example extra disk space to store test scripts.

Make sure that you find out exactly what the tool requires in terms of hardware and software versions. For example, you would not want to discover at installation time that you needed to have an operating system upgrade or additional memory before the tool can work. Have you considered security aspects? Do you need a separate language compiler for the test scripts?

Commercial supplier constraints

The company that you buy the tool from will be an important factor for your future testing practices. If you have problems with the tool, you will want them sorted out quickly and competently. If you want to get the best from the tool, you will want to take advantage of their expertise. You may want to influence the future development of the tool to provide for those needs which are not currently met by it.

There are a number of factors that you should take into consideration in evaluating the tool vendor’s organisation:

  • Is the supplier a bona fide company?
  • How mature are the company and the product? If the company is well established this gives confidence, but if the product has not changed significantly in recent years it may be getting rather out of date. Some organisations will feel that they need to buy products from the product vendor who sets the trend in the marketplace. Some organisations will be wary of new product companies, but there may be instances when a brand new organisation or product may be an unknown quantity but may be just what you need at just the right time. A new vendor may be much more eager to please their first customers;
  • Is there adequate technical support? What would their response be to major or minor problems? Does the vendor run a help desk? What hours is help available? (If your vendor is in California and you are in Europe, there will be no overlap of their working day with yours!) What training courses are provided? How responsive are they to requests for information?
  • How many other people have purchased or use this tool? You may or may not want to be the very first commercial user of a new tool. Can you talk to any other users? Is there a user group, and when does it meet and who controls it? Will they provide a reference site for you to talk to?
  • What is the tool’s history? Was it developed to support good internal testing practices, to meet a specific client need, or as a speculative product? How many releases have there been to date, and how often is the tool updated? How many open faults are there currently reported?

Your relationship with the tool vendor starts during the selection and evaluation phase. If there are problems with the vendor now (when they want your money), there are likely to be even more problems later.

Cost constraints

Cost is often the most stringent and most visible constraint on tool selection. The purchase price may only be a small factor in the total cost to the organisation in fully implementing the tool. Cost factors include:

  • purchase or lease price
  • cost basis (per seat, per computer etc.)
  • cost of training in the use of the tool
  • any additional hardware needed (e.g. a PC, additional disk space or memory)
  • support costs
  • any additional costs, e.g. consultancy to ensure the tool is used in the best way.

Other constraints

Tool quality factors may include:

  • How many people can use the tool at the same time? Can test scripts be shared?
  • What skill level is needed to use the tool? How long does it take to become proficient? Are programming skills needed to write test scripts?
  • What documentation is supplied? How thorough is it? How usable is it? Are there ‘quick reference guides’, for example?

There may well be other constraints which override all of the others, for example ‘political’ factors, such as having to buy the same tool that the parent company uses (e.g. an American parent enforces an American tool as its standard), or a restriction against buying anything other than a locally supported tool, perhaps limiting the choice to a European, British, French or German tool. It is frustrating to tool selectors to discover these factors late on in the selection process.

Constructing the shortlist

Use the cross-references in this report to find the tools that meet your environmental requirements and provide the features that are essential for you. Read the descriptions in the tools pages. This should give you enough information to know which tools listed in this report can go on your shortlist for further evaluation.

If there are more than six or seven tools that are suitable for you, you may want to do some initial filtering using your list of desirable features so that you will be looking at only three or four tools in your selection process.

If no suitable tools, or not enough of them, are found in this report, the search could be widened to other countries (e.g. the USA).

Other sources of information include pre-commercial tools (if you can find out about them). It is worth asking your current hardware or system software supplier if they know of any tools that meet your needs. If you are already using a CASE tool, it would be worth asking your vendor about support for testing, either through future development of their tool or by linking to an existing CAST tool. Conferences and exhibitions are where new vendors often go to announce a new tool. In particular the EuroSTAR conference is the prime showcase for testing tools.

The possibility of in-house development of a special-purpose tool should also be assessed. Do not forget to consider any existing in-house written tools within your own organisation that may be suitable for further development to meet your needs. The true cost of in-house development, including the level of testing and support needed to provide a tool of adequate quality, will be significant. It is generally much more than the cost of a commercial tool, but an in-house written tool will be more directly suitable to your own needs. For example, it can help to compensate for a lack of testability in the software under test. A purchased tool may need additional tailoring in order to meet real needs, and this can be expensive.

Another possibility is to use a ‘meta-tool’ to develop a new tailored tool in a short time. A meta-tool provides software for building software tools quickly, using the existing foundation of a standardised but highly tailorable user interface, graphical editor and text editor. It could enable a new testing tool, tailored to a specific organisation, to be built within a few months.

Summary of where to look for tools
  • Existing environment (this report)
  • CASE tool vendor
  • PC-based (this report)
  • Meta-tool vendor
  • In-house prototype for development
  • World market
  • Future environment of likely vendor
  • Conferences and exhibitions
  • Hardware/software vendor
 

Tool Evaluation

Evaluating the shortlisted candidate tools

Research and compare tool features

Contact the vendors of the shortlisted tools and arrange to have information sent (if you have not done this already). Study the information and compare features. Request further information from the vendors if the literature sent does not explain the tool function clearly enough.

This is the time to consult one or more of the publications which have evaluated testing tools, if the ones you are interested in are covered in such a report. The cost of such reports should be compared to the cost of someone’s time in performing similar evaluations, and the cost of choosing the wrong tool because you did not know about something which was covered in published material. (Do not forget to allow time to read the report.)

Ask the shortlisted vendors to give you the names of some of their existing customers as reference. Contact the reference sites from each shortlisted vendor and ask them a number of questions about the tool. For example, why they bought this tool, how extensively it is now used, whether they are happy with it, what problems they have had, their impression of the vendor’s after-sales support service, how the tool affected their work, what benefits the tool gave them, and what they would do differently next time they were buying a tool. Remember that reference sites are usually the vendor’s best customers, and so will be likely to be very happy with the tool. Their environment is different from yours, so the benefits or problems which they have had may well not be the same as the ones which are important to you. However, the experience of someone else who bought a tool for similar reasons to yours is invaluable and well worth pursuing.

Many vendors are aware that a tool does not always add up to a total solution and are keen to present it as part of a more comprehensive offering, often including consultancy and training beyond just their product. They usually understand the issues covered in this paper because bad selection and bad implementation of their tools gives them a bad reputation. Because the vendors have good experience in getting the best out of their tools, their solutions may enhance the tools significantly and are worth serious examination. Nevertheless, it is always worth bearing in mind that the tool supplier is ultimately trying to persuade you to buy his product.

At any point in the selection and tool evaluation process it may become clear which tool will be the best choice. When this happens, any further activities will not influence the choice of tool but may still be useful in assessing in more detail how well the chosen tool will work in practice. It will either detect a catastrophic mismatch between the selected tool and your own environment, or will give you more confidence that you have selected a workable tool.

Tool demonstrations: preparation

Before contacting the vendor to arrange for a tool demonstration, some preparatory work will help to make your assessment of the competing tools more efficient and unbiased. Prepare two test case suites for tool demonstration:

  • one of a normal ‘mainstream’ test case
  • one of a worst-case ‘nightmare’ scenario.

Rehearse both tests manually, in order to discover any bugs in the test scenarios themselves. Prepare evaluation forms or checklists:

  • general vendor relationship (responsiveness, flexibility, technical knowledge)
  • tool performance on your test cases. Set measurable objectives, such as time to run a test on your own (first solo flight), time to run a reasonable set of tests, time to find an answer to a question in the documentation.

It is important that the tools be set up and used on your premises, using your configurations, and we recommend this, if at all possible, for the demonstration. We have had clients report to us that they found this single step to be extremely valuable, when they discovered that their prime candidate tool simply would not run in their environment! Of course, the vendor may be able to put it right but this takes time, and it is better to know about it before you sign on the dotted line, not after.

Invite the vendors of all shortlisted tools to give demonstrations within a short time-frame, for example on Monday, Wednesday and Friday of the same week. This will make sure that your memory of a previous tool is still fresh when you see a different one.

Give vendors both of your test cases in advance, to be used in their demo. If they cannot cope with your two cases in their demo, there probably is not much hope of their tool being suitable. However, be prepared to be flexible about your prepared tests. The tool may be able to solve your underlying problem in a different way than you had pictured. If your test cases are too rigid, you may eliminate a tool which would actually be very suitable for you.

Find out what facilities the vendors require and make sure they are available. Prepare a list of questions (technical and commercial) to ask on the demo day, and prepare one more test case suite to give them on the day. Allow time to write up your reactions to each of the tools, say at the end of each day.

Tool demonstrations from each vendor

Provide facilities for the vendor’s presentation and their demonstration. Listen to the presentation and ask the questions you had prepared.

Observe their running of your prepared test case suites. Try your own ‘hands-on’ demonstration of your prepared test case suites and the new one for the day. Have a slightly changed version of the software being tested, so that the test suite needs to be modified to test the other version. Have the vendors edit the scripts if they insist, but it is better to edit them yourself with their assistance, so that you can see how much work will be involved in maintaining scripts.

Ask (and note) any more questions which occur to you. Note any additional features or functions which you had not realised this tool provided. Note any features or functions which you thought it did provide but does not, or not in the form you had thought.

Try to keep all demonstrations the same as far as possible. It is easy for the last one to incorporate improvements learned during the other demonstrations, but this is not fair to the first one. Save new ideas for use in the competitive trial.

Thank and dismiss the vendor. Write up your observations and reactions to this tool.

Post-demonstration analysis

Ask the vendors you saw first any questions which occurred to you while watching a later vendor's presentation or demonstration. This will give the fairest comparison between the tools.

Assess tool performance against measurable criteria defined earlier, taking any special circumstances into account. Compare features and functions offered by competing tools. Compare non-functional attributes, such as usability. Compare the commercial attributes of vendor companies.

If a clear winner is now obvious, select the winning tool. Otherwise select two tools for final competitive trial. Write to the non-selected vendors giving the reason for their elimination.

Competitive trial

If it is not clear which tool is the most appropriate for you at this point, an in-house trial or evaluation will give a better idea of how you would use the tool for your systems.

Most tool vendors will allow short-term use of the tool under an evaluation licence, particularly for tools which are complex and represent a major investment. Such licences will be for a limited period of time, and the evaluating unit must plan and prepare for that evaluation accordingly.

It is all too easy to acquire the tool under an evaluation licence only to find that those who really ought to be evaluating it are tied up in some higher-priority activity during that time. If they are not able or willing to make the time available during the period of the licence to give the tool more than the most cursory attention, then the evaluation licence will be wasted.

The preparation for the trial period includes the drafting of a test plan and test suites to be used by all tools in the trial. Measurable success criteria for the evaluated tools should be planned in advance, for example length of time to record a test or to replay a test, and the number of discrepancies found in comparison (real, extraneous and any missed). Attending a training course for each tool will help to ensure that they will be used in the right way during the evaluation period.

When the competi

Tags: #cast
