
Reflections on ‘Big Data’ and the implications for Quality management


Martin Andrew is Quality Manager for URS Australia, part of AECOM, the world’s leading engineering, environmental and construction services company. Martin is also National President of the Australian Organisation for Quality Inc.

This paper was developed from a Café Quality discussion that Martin led in Adelaide on 17th June 2015.

The Café Quality discussion drew heavily on a ’60 MOC’ [Minutes of Coffee] session presented by Marina Pullin of Mantra Australia in Adelaide on 6th March 2015.

Martin is not an authority on Big Data but knows some of the questions to ask. He has been involved with computer-assisted data analysis since undertaking a Fortran IV course in 1971 and throughout his subsequent career as a University and CSIRO research ecologist, natural resources consultant and Quality leader.

Introduction

Data, information and knowledge are closely related concepts, but each has its own role in relation to the others. Data are collected and analysed to create information suitable for making decisions, while knowledge is derived from extensive experience of dealing with information on a subject (Wikipedia). Wisdom is knowing the appropriate time and place to apply knowledge.

So what is ‘Big Data’? The term was coined at NASA in 1997, so it is a relatively new concept. Gartner defines Big Data as “high volume, high velocity and high variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision-making”.

To provide some context for what “high volume, high velocity and high variety information assets” means: every minute, globally, there are about 100,000 tweets, 700,000 status updates, 11 million instant messages and 700,000 Google searches. And that is just some of the data being created; other sources include GPS devices, satellite imagery, mobile phone pings to towers, machinery sensors and so on.

And these data are growing: one prediction is that 44 times as much data will be created in 2020 as in 2009.

What is significant about the Gartner definition is that ‘Big Data’ is not just the data – it is about how the data are processed to provide information that informs decisions; not just the ‘what’ but the ‘how’ and, importantly, the ‘why’.

Furthermore, its ‘big’ness is relative to our ability to process it.

Arguably, Sir Ronald Fisher would not have developed parametric statistical analysis had he had access to computers, which allow bootstrapping and jack-knifing of datasets to generate the empirical distribution of a statistic of interest, from which an exact probability can be calculated.
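To make the resampling idea concrete, here is a minimal bootstrap sketch in Python; the sample values are invented purely for illustration. The statistic of interest (here the mean) is recomputed on many resamples drawn with replacement, and a confidence interval is read straight off the resulting empirical distribution, with no parametric assumptions required.

```python
import random
import statistics

# Invented sample data for illustration (e.g. plot yields from a small trial).
observed = [4.1, 5.3, 3.8, 6.0, 4.9, 5.5, 4.4, 5.1, 6.2, 4.7]

def bootstrap_ci(data, stat=statistics.mean, n_resamples=10_000, alpha=0.05):
    """Percentile confidence interval from the bootstrap's empirical distribution."""
    stats = sorted(
        stat(random.choices(data, k=len(data)))  # resample with replacement
        for _ in range(n_resamples)
    )
    lo = stats[int(alpha / 2 * n_resamples)]
    hi = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

lo, hi = bootstrap_ci(observed)
print(f"mean = {statistics.mean(observed):.2f}, 95% bootstrap CI = ({lo:.2f}, {hi:.2f})")
```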

What was a big dataset in 1980 – one I had to analyse on CSIRO’s Cyber 76 mainframe computer in Canberra – is now easily handled on my PC using Excel.

Moore’s Law observes that computing power per unit price has doubled roughly every 18 months for the past 50 years, which is convenient given the exponential growth of the data.
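As a back-of-the-envelope check of those two rates (assuming smooth exponential growth), the 44-fold prediction above implies that data volumes double roughly every two years, slightly slower than the 18-month Moore’s Law cadence for computing power:

```python
import math

factor, years = 44, 2020 - 2009  # the prediction: 44x more data over 11 years

annual_rate = factor ** (1 / years)                          # ~1.41, i.e. ~41% per year
doubling_months = 12 * math.log(2) / math.log(annual_rate)   # ~24 months

print(f"data grows ~{(annual_rate - 1) * 100:.0f}% per year, "
      f"doubling roughly every {doubling_months:.0f} months "
      f"(Moore's Law: every ~18 months)")
```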

So what’s new about ‘Big Data’?

It seems to me that what is new includes the sheer volume of the data, the fact that most of it is now produced digitally and hence is amenable to analysis, and the fact that so much of it is generated in real time.

And what is even more remarkable is the ability to make interconnections between these data. The ‘Internet of Things’ is expected to comprise 50 billion objects by 2020; the ‘things’ include phones, fridges, vehicles, luggage, pets, agricultural sensors, building sensors and various instrumentation. This enormously increases the information that can be obtained.

What’s not new about Big Data?

In terms of the fundamentals, I think there is much that is not new. The same pitfalls that have always been associated with data analysis still apply, probably even more so.

Many a scientist has eagerly launched into gathering data, only to be advised by a statistician afterwards that the data are not easily analysable because they were not collected in a designed, thoughtful way. The sequence ‘Ready, Aim, Fire’ – not ‘Ready, Fire, Aim’ – applies to Big Data also.

Effective data analysis needs to ‘start with why’ (as per Simon Sinek’s book of the same name), and ‘begin with the end in mind’ (Stephen Covey). In other words, what is the problem being addressed or the decision that needs to be informed? And what is the information required?

Even the term ‘data mining’ implies a clear idea of what is being sought and why.

Then there is the issue of the scientific method and the idea of a ‘control’ or ‘counterfactual’. We all know that correlation does not necessarily mean causality, yet spurious correlations become more prevalent as datasets grow in size and in the number of variables they contain. The GIGO principle (Garbage In, Garbage Out) still applies, and data mining tools are unlikely to replace the thinking of data scientists any time soon.
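A toy simulation in Python makes the spurious-correlation point: every variable below is pure random noise, yet the strongest pairwise correlation found by exhaustively searching all pairs grows steadily with the number of variables trawled (statistics.correlation requires Python 3.10 or later):

```python
import itertools
import random
import statistics

random.seed(1)  # reproducible illustration

def max_abs_correlation(n_vars, n_obs=30):
    """Strongest pairwise |r| among n_vars columns of pure noise."""
    cols = [[random.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    return max(
        abs(statistics.correlation(x, y))  # Pearson's r
        for x, y in itertools.combinations(cols, 2)
    )

# The more variables we trawl through, the stronger the best chance correlation.
for n_vars in (5, 20, 100):
    print(f"{n_vars:>3} noise variables: strongest |r| = {max_abs_correlation(n_vars):.2f}")
```

None of these correlations means anything; with enough variables, impressive-looking ones are guaranteed.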

What’s also true is that our human brains are not developing in synchrony with the mass and complexities of Big Data. “Humans haven’t had a software update in 200,000 years” (Peter Diamandis).

Big Data Pitfalls

Gartner lists the top challenges for Big Data as being:

  1. not understanding the benefits to the business
  2. inadequate analytics skills
  3. deciding what data are relevant.

Gartner states the common causes of Big Data failure as:

  1. management resistance
  2. selecting the wrong uses
  3. asking the wrong questions
  4. lacking the right skills
  5. unanticipated problems beyond big data technology
  6. disagreement on enterprise strategy
  7. big data silos
  8. problem avoidance.

Note that challenges 1 and 3, and failures 1, 2, 3, 6 and 8, all relate to confusion about the ‘Why’.

These challenges and failures underscore that it is people who rule the data process. The capture, use and analysis of data are not objective but subjective – witness the recent dismissal by the Federal Transport Minister of the idea that road taxes be levied on actual individual road usage as determined by GPS ‘Big Data’.

In a recent article, Ross et al. (2013) argue that organisations need to make better use of the ‘little data’ they already have:

“The biggest reason that investments in big data fail to pay off, though, is that most companies don’t do a good job with the information they already have. They don’t know how to manage it, analyze it in ways that enhance their understanding, and then make changes in response to new insights. Companies don’t magically develop those competencies just because they’ve invested in high-end analytics tools. They first need to learn how to use the data already embedded in their core operating systems, much the way people must master arithmetic before they tackle algebra. Until a company learns how to use data and analysis to support its operating decisions, it will not be in a position to benefit from big data.”

There can be a Cargo Cult mentality to Big Data – and especially so when the ‘Why’ has not been clarified.

Some benefits / success cases of Big Data

Einstein said “Imagination is more important than knowledge”, and this is true for using Big Data.

Imaginative uses abound:

  • vehicle automation and ultimately driverless vehicles
  • Google search results conditioned by your personal search history
  • marketing messages tailored for customers’ personal behaviours and preferences
  • laser scanning, 3D printing and Building Information Modelling
  • predicting trends from analysing discourse on social media, such as mapping outbreaks of colds and ‘flu to predict pharmacy demand for treatments, and detecting fashion trends that suppliers can respond to quickly
  • improved agricultural production from better land, plant and animal management.

Other examples quoted in Ross et al. (2013) include:

  • the UPS courier service used real-time tracking of its vehicles and other data to redesign routes to minimise the number of turns against oncoming traffic (i.e. right-hand turns in Australia), thus reducing time, fuel use and greenhouse gas emissions – 11,000 tonnes of CO2 globally in 2011!
  • website designers test modifications in real time using A/B testing, in which consumers are randomly shown either the tweaked or the current website and their responses are compared and statistically analysed to enable objective decision making (a sketch of the statistics follows this list).
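As a minimal sketch of the statistics behind such a comparison (the visitor and conversion counts below are invented), a two-proportion z-test is one common way to decide whether the tweaked page genuinely converts better than the current one:

```python
import math

# Invented counts: visitors randomly shown the current (A) or tweaked (B) page.
visitors_a, conversions_a = 5000, 400   # 8.0% conversion
visitors_b, conversions_b = 5000, 460   # 9.2% conversion

p_a = conversions_a / visitors_a
p_b = conversions_b / visitors_b

# Two-proportion z-test under the null hypothesis that both pages convert equally.
p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
z = (p_b - p_a) / se

# Two-sided p-value from the standard normal distribution.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"A: {p_a:.1%}  B: {p_b:.1%}  z = {z:.2f}  p = {p_value:.4f}")
```

With these illustrative numbers, z ≈ 2.14 and p ≈ 0.03, so the tweak would be judged a genuine improvement at the conventional 5% level.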

Whilst Big Data can be used for good, it can be used for ill also – by sophisticated criminal operations and pariah governments. Big Data is essentially value-neutral. Whether the Australian Tax Office’s ability to connect the dots between billions of financial transactions is for good or ill perhaps depends on your point of view!

Ideally, more data means more information, which means more insights, which means increased productivity. But in the absence of clear thinking this is not likely to be the case.

What are the implications of Big Data for Quality management?

Quality is defined as meeting the needs of the customer. Big Data enables this to be done much more effectively. Examples include:

  • Customer ratings of suppliers on Amazon, eBay, Yelp! and TripAdvisor; social media and Big Data empower customers!
  • Suppliers can measure their customers’ actual behaviour (e.g. the A/B testing mentioned above) to get an accurate Voice of the Customer, rather than merely what customers think they want as expressed via a survey.
  • Manufacturers can measure the actual performance of their products via telemetry and other data capture methods.
  • More fundamentally, variation is inherent in everything. Big Data can enable us to understand this variation more incisively and then act to reduce unwanted variation (a control-chart sketch follows this list). This was explored at a conference on ‘Big Data Analytics and Lean Six Sigma’ at Monash University in June 2015.
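One classical tool for separating routine (common-cause) variation from the unwanted (special-cause) kind is Shewhart’s control chart. A minimal individuals-chart sketch in Python, with invented measurements, shows the arithmetic:

```python
import statistics

# Invented process measurements for illustration (e.g. a machined dimension in mm).
measurements = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 10.4, 9.7, 10.0, 10.1,
                9.9, 10.2, 10.0, 9.8, 10.3]

# Individuals (I) chart: centre line, with limits at +/- 2.66 x average moving range
# (2.66 = 3 / d2, where d2 = 1.128 for moving ranges of two consecutive points).
centre = statistics.mean(measurements)
moving_ranges = [abs(b - a) for a, b in zip(measurements, measurements[1:])]
mr_bar = statistics.mean(moving_ranges)
ucl, lcl = centre + 2.66 * mr_bar, centre - 2.66 * mr_bar

print(f"centre = {centre:.2f}, UCL = {ucl:.2f}, LCL = {lcl:.2f}")
for i, x in enumerate(measurements, 1):
    flag = "  <-- special-cause signal, investigate" if not lcl <= x <= ucl else ""
    print(f"point {i:2d}: {x:5.2f}{flag}")
```

Points inside the limits reflect common-cause variation that is best addressed by improving the process itself; points outside signal something worth investigating.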

All in all, Big Data enables more effective decision making based on objective and abundant evidence. And because more effective decision making contributes to organisational excellence, Big Data has a powerful role to play in facilitating organisational excellence also.

Acknowledgements

Marina Pullin, Mantra Australia – ‘60 MOC’ session on ‘Big Data’, Adelaide, 6th March 2015.

Bruce Riley, who actively chaired the AOQ’s Café Quality on Big Data in Adelaide, 17th June 2015 and who reviewed this manuscript.

Reference

Jeanne W. Ross, Cynthia M. Beath and Anne Quaadgras (2013) ‘You May Not Need Big Data After All’. Harvard Business Review (online).

For further information about this article, feel free to contact Martin by emailing martin.andrew@urs.com.
