Thinking Big: Introducing Test Analytics

First published 23/10/2013

This article appeared in the October 2013 Edition of Professional Tester magazine. To fit the magazine format, the article was edited somewhat. The original/full article can be downloaded here.

Big Data

Right now, Big Data is trending. Data is being captured at an astonishing rate. Any device that has a power supply has some software driving it, and if the device is connected to a network or the internet, it is probably posting activity logs somewhere. The volumes being captured across organisations are huge: databases holding petabytes (millions of gigabytes) of data are springing up in large and not so large organisations. Traditional relational technology simply cannot cope.

Mayer-Schonberger and Cukier argue in their book, “Big Data” [1], that it’s not that the data is huge in absolute terms; it’s that, in every business domain, it seems to be much bigger than what we collected before. Big Data can be huge, but its more interesting aspect is its lack of structure. The change in philosophy that Big Data brings is reflected in three principles:
  1. Traditionally, we have dealt with samples (because the full data set is large), and as a consequence we have tended to focus on relationships that reflected cause and effect. Looking at the entire data set allows us to see details that we never could before.
  2. Using the full data set releases us from the need to be exact. If we are dealing with data points in the tens or hundreds, we focus on precision. If we deal with thousands or millions of data points, we aren’t so obsessed with minor inaccuracies like losing a few records here and there.
  3. We must not be obsessed with causality. If the data tells us there is a correlation between two things we measure, then so be it. We don’t need to understand why the relationship exists to make use of it. It might be good enough just to know that the number of cups of coffee bought by product owners in the cafeteria correlates inversely with the number of severity 1 incidents in production. (Obviously, I made that correlation up, but you see what I mean.) Maybe we should give the POs tea instead?
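To make the third principle a little more concrete, here is a minimal sketch of how such a correlation might be measured. The weekly figures are invented purely for illustration (just like the coffee correlation itself), and the names `pearson`, `coffees_per_week` and `sev1_incidents_per_week` are hypothetical. The calculation is the standard Pearson correlation coefficient; it says nothing about cause and effect.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented weekly figures, for illustration only (not real measurements).
coffees_per_week = [42, 35, 50, 28, 45, 31]
sev1_incidents_per_week = [2, 4, 1, 6, 2, 5]

r = pearson(coffees_per_week, sev1_incidents_per_week)
# A negative r indicates an inverse correlation between the two measures.
print(f"r = {r:.2f}")
```

A value of r close to -1 would indicate a strong inverse relationship in this invented data; whether to act on it without understanding the cause is exactly the judgement the third principle asks us to make.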
The interest in Big Data as a means of supporting decision-making is rapidly growing. Larger organisations are creating teams of so-called data scientists to orchestrate the capture of data and analyse it to obtain insights. The phrase ‘from insight to action’ is increasingly used to summarise the need to improve and accelerate business decision-making.

‘From Insight to Action’

Activity logs tend to be captured as plain text files with fields delimited by spaces, tabs or commas, or as JSON or XML formatted data. This data does not arrive validated, structured and integral as it would be in a relational table; it needs filtering, cleaning and enriching as well as storing. New tools designed to deal with such data are becoming available, and a new set of data management and analysis disciplines is also emerging.

What opportunities are out there for testing? Can the Big Data tools and disciplines be applied to traditional test practices? Will these test practices have to change to make use of Big Data? This article explores how data captured throughout a test and assurance process could be merged and integrated with definition data (requirements and design information) and production monitoring data, and then analysed in interesting and useful ways.
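As a rough illustration of the filtering and cleaning that raw activity logs need, the sketch below reads a mix of comma-delimited, tab-delimited and JSON log lines and turns them into uniform records. It uses only the Python standard library. The three-field schema in `FIELDS`, the sample lines and the function names are all assumptions made for the example, not something taken from a particular tool or log format.

```python
import csv
import json
from io import StringIO

# Hypothetical schema: every usable record must have these three fields.
FIELDS = ["timestamp", "device", "event"]


def parse_line(line):
    """Return a cleaned dict for one raw log line, or None if it is unusable."""
    line = line.strip()
    if not line:
        return None  # filter out blank lines
    if line.startswith("{"):
        # Treat the line as a JSON object.
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            return None
    else:
        # Treat the line as delimited text: tabs if present, otherwise commas.
        delimiter = "\t" if "\t" in line else ","
        values = next(csv.reader(StringIO(line), delimiter=delimiter))
        if len(values) != len(FIELDS):
            return None  # wrong number of fields
        record = dict(zip(FIELDS, values))
    # Clean: keep only the known fields and trim surrounding whitespace.
    cleaned = {}
    for name in FIELDS:
        value = record.get(name)
        if value is None or str(value).strip() == "":
            return None  # a required field is missing or empty
        cleaned[name] = str(value).strip()
    # Normalise the event name so "START" and "start" compare equal.
    cleaned["event"] = cleaned["event"].lower()
    return cleaned


def load_records(lines):
    """Parse every line and drop the ones that could not be cleaned."""
    parsed = (parse_line(line) for line in lines)
    return [record for record in parsed if record is not None]


# Invented sample lines, for illustration only.
raw_lines = [
    "2013-10-23T09:00:00,pump-1,START",
    "2013-10-23T09:00:05\tpump-2\tError",
    '{"timestamp": "2013-10-23T09:00:07", "device": "pump-1", "event": "stop"}',
    "corrupted line without enough fields",
    "",
]

records = load_records(raw_lines)
for record in records:
    print(record)
```

Silently dropping unusable lines is a design choice that fits the second principle above (we tolerate losing a few records here and there); a real pipeline might instead count or quarantine rejected lines so that the loss can be monitored.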


Tags: #continuousdelivery #BigData #TestAnalytics

Paul Gerrard
My LinkedIn profile is here.