What are ‘Data Worlds’?
The phrase ‘Data World’ refers to a type of numerical data. All numerical data can be grouped into three types, described as the Continuous, Count and Attribute Data Worlds.
- The Continuous Data World refers to numerical data that results from measuring things. So, any measurement of time, distance, temperature, weight, pressure (or many other characteristics) is ‘Continuous’ data. These measurements are not restricted to whole numbers. Continuous data can follow a number of different statistical models (the best known of these is the Normal distribution). Many different statistics are used to summarise Continuous data; the best known are the average (for central position) and the standard deviation (for variation).
- The Count Data World refers to numerical data that results from counting things. The data is always whole numbers because if you are counting, then it is not possible to have half a unit (otherwise you would be measuring). A range of statistics can be used to summarise Count data, but most often, the average (count) is used.
- The Attribute Data World refers to numerical data that results from classifying things. So, if something is classified as either good/bad, ok/not ok, on-time/not on-time, within specification/out of specification, the resulting data is Attribute data. The most common statistic used to summarise Attribute data is the percentage.
An example: Imagine a project that is looking at reducing the number of scratches on mobile phone screens (hopefully to zero!). The project team are wondering what data to capture – here are a few of their options:
- They could assess if a screen is scratched or not, and classify it as either good or bad – this would be Attribute data. The percentage of scratched screens could be calculated.
- They could count the number of scratches on a screen – this would be Count data. The average number of scratches per screen could be calculated.
- They could measure the length (or depth or width) of the scratches on a screen – this would result in Continuous data. The average length of the scratches could be calculated, along with the standard deviation.
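The three options above can be sketched in a few lines of Python. This is only an illustration: the inspection records, the variable names and the scratch lengths in millimetres are all made up for the example, and the standard deviation shown is the sample standard deviation.

```python
from statistics import mean, stdev

# Hypothetical inspection records for five phone screens (illustrative values only).
# Each inner list holds the measured length, in mm, of every scratch on one screen.
scratch_lengths_mm = [
    [],               # screen 1: no scratches
    [2.0, 4.0],       # screen 2: two scratches
    [3.0],            # screen 3: one scratch
    [],               # screen 4: no scratches
    [1.0, 5.0, 3.0],  # screen 5: three scratches
]

# Attribute data: classify each screen as scratched (bad) or not scratched (good).
is_scratched = [len(screen) > 0 for screen in scratch_lengths_mm]
percent_scratched = 100 * sum(is_scratched) / len(is_scratched)

# Count data: count the scratches on each screen (always whole numbers).
scratches_per_screen = [len(screen) for screen in scratch_lengths_mm]
average_scratches = mean(scratches_per_screen)

# Continuous data: measure the length of every individual scratch.
all_lengths = [length for screen in scratch_lengths_mm for length in screen]
average_length = mean(all_lengths)
length_std_dev = stdev(all_lengths)  # sample standard deviation

print(f"Attribute:  {percent_scratched:.0f}% of screens scratched")
print(f"Count:      {average_scratches:.1f} scratches per screen on average")
print(f"Continuous: average length {average_length:.1f} mm, "
      f"standard deviation {length_std_dev:.2f} mm")
```

The same five screens give three different data sets, and each one is summarised with the statistic that suits its Data World.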
Different terminology: The terms Continuous, Count and Attribute were proposed by OPEX Resources in our publication ‘Lean Six Sigma and Minitab’, in order to standardise the terminology. However, several alternative sets of terminology are also in use, as follows:
- The Continuous Data World is sometimes called Variable.
- The Count Data World is sometimes called Defects, and can also be referred to as Discrete or Attribute data.
- The Attribute Data World is sometimes called Defective and, like Count data, can also be referred to as Discrete or Attribute data.
Why are Data Worlds so important? Data Worlds are a forgotten principle: many Lean Six Sigma training programs gloss over them and, as a result, delegates become confused about how to analyse different types of data later on. For example, to select the right Hypothesis test or SPC Chart, you first need to know what type of data you have. Recognising and understanding the different Data Worlds is essential in order to know:
- how a process might ‘behave’ over time (what’s expected, what isn’t)
- what statistics to use to summarise the data
- what statistical model the data may follow
- what graphical techniques will show the data most clearly
- what statistical techniques can be used to analyse the data
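One way to picture the list above is as a lookup keyed on how the data was collected. The sketch below is a minimal illustration, not part of the ‘Lean Six Sigma and Minitab’ material: the class and function names are hypothetical, the summary statistics come from the descriptions earlier in this article, and the Poisson and Binomial models and the SPC chart names are assumptions based on common textbook pairings rather than anything stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataWorld:
    name: str
    arises_from: str          # how the data is collected
    usual_summary: str        # summary statistic named earlier in this article
    common_model: str         # assumption: common textbook pairing
    common_spc_charts: tuple  # assumption: common textbook pairing


# Keyed on the activity that produces the data.
DATA_WORLDS = {
    "measuring": DataWorld(
        "Continuous", "measuring", "average and standard deviation",
        "Normal", ("I-MR chart", "Xbar-R chart"),
    ),
    "counting": DataWorld(
        "Count", "counting", "average count",
        "Poisson", ("C chart", "U chart"),
    ),
    "classifying": DataWorld(
        "Attribute", "classifying", "percentage",
        "Binomial", ("P chart", "NP chart"),
    ),
}


def data_world_for(collection_method: str) -> DataWorld:
    """Return the Data World for 'measuring', 'counting' or 'classifying'."""
    key = collection_method.strip().lower()
    try:
        return DATA_WORLDS[key]
    except KeyError:
        raise ValueError(
            f"Unknown collection method {collection_method!r}; "
            f"expected one of {sorted(DATA_WORLDS)}"
        ) from None


world = data_world_for("counting")
print(world.name, "|", world.usual_summary, "|", ", ".join(world.common_spc_charts))
```

The point of the lookup is the order of the questions: decide whether the data was measured, counted or classified first, and the choice of summary statistic, model and chart follows from that.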
Summary: The diagram below is taken from page 50 of the ‘Lean Six Sigma and Minitab’ book, which also provides more information and example data files for each of the Data Worlds.