**What are ‘Data Worlds’?**

The phrase ‘Data World’ refers to types of numerical data. All numerical data can be grouped into three types, described as the **Continuous**, **Count** and **Attribute** Data Worlds.

- The **Continuous** Data World refers to numerical data that results from *measuring* things. So, any measurement of time, distance, temperature, weight, pressure (or many other characteristics) is ‘Continuous’ data. These measurements are not restricted to whole numbers. Continuous data can follow a number of different statistical models (the most well known of these is the Normal distribution). There are many different statistics that are used to summarise Continuous data, the best known of which are the average (for central position) and standard deviation (for variation).
- The **Count** Data World refers to numerical data that results from *counting* things. The data is always whole numbers because if you are counting, then it is not possible to have half a unit (otherwise you would be measuring). A range of statistics can be used to summarise Count data, but most often, the average (count) is used.
- The **Attribute** Data World refers to numerical data that results from *classifying* things. So, if something is classified as either good/bad, ok/not ok, on-time/not on-time, within specification/out of specification, the resulting data is Attribute data. The most common statistic used to summarise Attribute data is the percentage.

**An example:** Imagine a project that is looking at reducing the number of scratches on mobile phone screens (hopefully to zero!). The project team are wondering what data to capture – here are a few of their options:

- They could assess if a screen is scratched or not, and *classify* it as either good or bad – this would be Attribute data. The percentage of scratched screens could be calculated.
- They could *count* the number of scratches on a screen – this would be Count data. The average number of scratches per screen could be calculated.
- They could *measure* the length (or depth or width) of the scratches on a screen – this would result in Continuous data. The average length of the scratches could be calculated, along with the standard deviation.
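The three options above can be sketched in a few lines of Python. This is only an illustration: the inspection results and the variable names are invented for the example, and the same made-up set of five screens is summarised three ways to show how each Data World leads to a different statistic.

```python
from statistics import mean, stdev

# Hypothetical inspection results (illustrative data, not from a real project).
# Each inner list holds the measured scratch lengths (in mm) found on one screen.
scratch_lengths_mm = [
    [],               # screen 1: no scratches
    [2.0, 4.0],       # screen 2: two scratches
    [3.0],            # screen 3: one scratch
    [],               # screen 4: no scratches
    [1.0, 2.0, 6.0],  # screen 5: three scratches
]

# Attribute data: classify each screen as scratched (True) or not (False),
# then summarise with a percentage.
is_scratched = [len(lengths) > 0 for lengths in scratch_lengths_mm]
percent_scratched = 100 * sum(is_scratched) / len(is_scratched)

# Count data: count the scratches on each screen,
# then summarise with the average count per screen.
scratch_counts = [len(lengths) for lengths in scratch_lengths_mm]
average_scratches = mean(scratch_counts)

# Continuous data: pool the measured lengths of every scratch,
# then summarise with the average and the (sample) standard deviation.
all_lengths = [x for lengths in scratch_lengths_mm for x in lengths]
average_length = mean(all_lengths)
length_sd = stdev(all_lengths)

print(f"Attribute:  {percent_scratched:.0f}% of screens scratched")
print(f"Count:      {average_scratches:.1f} scratches per screen on average")
print(f"Continuous: average length {average_length:.1f} mm, SD {length_sd:.2f} mm")
```

With this made-up data, 60% of screens are scratched, there are 1.2 scratches per screen on average, and the average scratch length is 3.0 mm. Note that each step down the list keeps more information: the classification can be derived from the counts, and the counts from the measurements, but not the other way round.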

**Different terminology:** The terminology *Continuous*, *Count* and *Attribute* was proposed by OPEX Resources in our publication ‘Lean Six Sigma and Minitab’, in order to standardise usage. However, several alternative sets of terminology are also in use, as follows:

- The *Continuous* data world is sometimes also called *Variable*.
- The *Count* data world is sometimes also called *Defects*, and can be referred to as *Discrete* or *Attribute* data.
- The *Attribute* data world is sometimes also called *Defective*, and like Count data, can also be referred to as *Discrete* or *Attribute* data.

**Why are data worlds so important?** Data Worlds are a forgotten principle – many Lean Six Sigma training programs gloss over them, and as a result, delegates become confused about how to analyse different types of data later on. For example, if you’re going to select the right Hypothesis test, or SPC Chart, then you need to know what type of data you’ve got first. Recognising and understanding the different data worlds is essential in order to know:

- how a process might ‘behave’ over time (what’s expected, what isn’t)
- what statistics to use to summarise the data
- what statistical model the data may follow
- what graphical techniques will show the data most clearly
- what statistical techniques can be used to analyse the data
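One way to picture the idea that the data world comes first and the choice of statistics follows is a simple lookup table. The sketch below is illustrative only: the `DataWorld` class and `describe` function are hypothetical names, the summary statistics are those named earlier in this article, and the Poisson and Binomial entries are models commonly associated with count and classification data rather than something stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataWorld:
    name: str
    arises_from: str             # how the data is produced (from the article)
    summary_statistics: tuple    # statistics named in the article
    commonly_assumed_model: str  # Normal is from the article; the others are assumptions


# Lookup table keyed by data world name.
DATA_WORLDS = {
    "Continuous": DataWorld(
        "Continuous", "measuring", ("average", "standard deviation"), "Normal"
    ),
    "Count": DataWorld("Count", "counting", ("average count",), "Poisson"),
    "Attribute": DataWorld("Attribute", "classifying", ("percentage",), "Binomial"),
}


def describe(world_name: str) -> str:
    """Return a one-line reminder of how to summarise a given data world."""
    world = DATA_WORLDS[world_name]
    stats = " and ".join(world.summary_statistics)
    return (
        f"{world.name} data comes from {world.arises_from}; "
        f"summarise it with the {stats}."
    )


for name in DATA_WORLDS:
    print(describe(name))
```

The same table could be extended with the graphical and statistical techniques appropriate to each data world, which is the point of identifying the data world before starting any analysis.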

**Summary:** The diagram below is taken from page 50 of the ‘Lean Six Sigma and Minitab’ book, which also provides more information and example data files for each of the data worlds.