VALCRI Statistical Process Control

Chris Rooney and Roger Beecham

Version: . Last updated: 28th March 2017

Table of Contents

  1. A simple SPC chart and efficient representations of geography
  2. Representing signals geographically
  3. Summarising processes geographically
  4. Combining representations
  5. Comparing multiple maps

This page describes the design process behind extending the work on statistical process control (SPC). SPC is used to identify signals (both positive and negative) in data by comparing individual data points against the mean and variance of the whole dataset (or of an individual process). It is commonly used in policing, and a standard SPC chart looks like this:

A standard SPC chart. This chart shows all crimes committed in the West Midlands, aggregated by month. Select individual data points to treat them as outliers and remove them from the analysis. Create process markers by clicking on the graph. Click to toggle automatic process detection.
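The following is a minimal sketch of the comparison described above, together with the chart's outlier and process-marker interactions. It rests on our own assumptions, not on the chart's code. Limits sit at the mean ± 3 population standard deviations, which is a common convention but not necessarily the one the chart uses. Each process runs from one marker to the next. Excluded points are left out of both the statistics and the flagging. The function `spc_signals` is hypothetical, and automatic process detection is not shown.

```python
from statistics import mean, pstdev


def spc_signals(counts, markers=(), excluded=(), k=3.0):
    """Return {index: +1 or -1} for points above or below their control limits.

    Assumptions (ours, not necessarily the chart's):
    - each process runs from one marker index up to the next marker
      (the whole series if there are no markers);
    - a process's centre line is the mean of its non-excluded points and its
      limits are centre +/- k population standard deviations;
    - excluded points (outliers) are left out of the statistics and are not
      flagged themselves.
    Each process must keep at least one non-excluded point.
    """
    excluded = set(excluded)
    bounds = [0, *sorted(markers), len(counts)]
    signals = {}
    for start, end in zip(bounds, bounds[1:]):
        kept = [counts[i] for i in range(start, end) if i not in excluded]
        centre, sigma = mean(kept), pstdev(kept)
        upper, lower = centre + k * sigma, centre - k * sigma
        for i in range(start, end):
            if i in excluded:
                continue
            if counts[i] > upper:
                signals[i] = 1   # positive signal
            elif counts[i] < lower:
                signals[i] = -1  # negative signal
    return signals


# Hypothetical monthly counts: a spike at index 20, then a step change.
monthly = [100] * 20 + [130] + [300] * 20
print(spc_signals(monthly))
print(spc_signals(monthly, markers=[21]))
```

Without a marker, the two levels are pooled into one process. This inflates the standard deviation enough to hide the spike at index 20. With a marker at index 21, the spike is flagged as a positive signal within the first process.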

One way we might extend SPC is to think in terms of geography. Here is a standard map of the West Midlands, in which each region represents a neighbourhood.

A map of the West Midlands. Hover over a neighbourhood to see additional information.

Rather than simply shade the regions in a single colour, we could, for example, use a discrete colour scale to colour code each neighbourhood based on its neighbourhood policing unit (NPU). This is known as a choropleth map.

A choropleth map of the West Midlands. Individual neighbourhoods are coloured based on their NPU. Hover over a neighbourhood to see additional information.
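Colouring by NPU, as in the choropleth above, amounts to a lookup from category to colour. The sketch below assumes a fixed qualitative palette that cycles if there are more NPUs than colours. The palette, the NPU labels and the neighbourhood names are placeholders, not the map's actual values.

```python
def discrete_scale(categories, palette):
    """Give each distinct category a palette colour, in first-seen order.

    Colours are reused (cycled) if there are more categories than colours.
    """
    mapping = {}
    for category in categories:
        if category not in mapping:
            mapping[category] = palette[len(mapping) % len(palette)]
    return mapping


# Hypothetical neighbourhood -> NPU assignments and palette, for illustration.
npu_of = {"Nbhd A": "NPU 1", "Nbhd B": "NPU 2", "Nbhd C": "NPU 1"}
npu_colour = discrete_scale(npu_of.values(), ["#1b9e77", "#d95f02", "#7570b3"])
fill = {nbhd: npu_colour[npu] for nbhd, npu in npu_of.items()}
print(fill)
```

Cycling keeps the sketch from failing, but two NPUs would then share a colour. In practice, the palette therefore needs at least as many distinguishable colours as there are NPUs.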

Another option is to colour each neighbourhood by a continuous value, such as population. This highlights that small, densely populated regions lack visual salience (i.e., they occupy very little screen space). Bordesley Green, for example, has a much higher population than Meriden but takes up much less screen space.

A choropleth map of the West Midlands. Individual neighbourhoods are coloured based on their population.
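For a continuous value such as population, each neighbourhood's colour can be interpolated between two end colours. The sketch below assumes a simple linear ramp in RGB between white and a dark blue. Real choropleths often use perceptually designed palettes instead. The end colours and populations are placeholders.

```python
def sequential_colour(value, lo, hi, start=(255, 255, 255), end=(8, 48, 107)):
    """Map value in [lo, hi] linearly onto an RGB hex colour from start to end."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(max(t, 0.0), 1.0)  # clamp values that fall outside [lo, hi]
    r, g, b = (round(s + (e - s) * t) for s, e in zip(start, end))
    return f"#{r:02x}{g:02x}{b:02x}"


# Hypothetical populations; the least populous neighbourhood is lightest.
populations = {"Nbhd A": 8000, "Nbhd B": 20000, "Nbhd C": 32000}
lo, hi = min(populations.values()), max(populations.values())
fill = {name: sequential_colour(p, lo, hi) for name, p in populations.items()}
print(fill)
```

A colour ramp does nothing about the salience problem described above, because a small region stays small whatever its colour.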

Another way of representing geography is a space-filling tree map. The aim of this visualisation is to make maximal use of the available space to show as much data as possible. As a result, the original geography is converted into an uneven grid. Each cell still represents a neighbourhood, but its size now represents its population. The algorithm behind the visualisation tries to keep each neighbourhood close to its original geographic position, although, as the visualisation below shows, this is never perfect. Since we are using size to represent population, we can again use colour to show NPU.

A space-filling tree map. The size of each neighbourhood represents its population.
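The page does not say which tree map algorithm is used. As an illustration only, one simple way to build a space-filling layout that roughly preserves position is recursive bisection. The neighbourhoods are sorted by centroid along the longer side of the current rectangle and the list is split in two. The rectangle is then divided in proportion to each half's population. The function name `spatial_treemap` and its centroid input are assumptions.

```python
def spatial_treemap(items, x, y, w, h):
    """Recursive-bisection tree map that roughly preserves geographic order.

    items: list of (name, cx, cy, weight), where (cx, cy) is a centroid in
    screen coordinates (y grows downward) and weight > 0 (e.g. population).
    Returns {name: (x, y, width, height)}; each area is proportional to weight.
    """
    if len(items) == 1:
        return {items[0][0]: (x, y, w, h)}
    split_x = w >= h  # cut across the longer side of the rectangle
    ordered = sorted(items, key=lambda it: it[1] if split_x else it[2])
    half = len(ordered) // 2
    first, second = ordered[:half], ordered[half:]
    share = sum(it[3] for it in first) / sum(it[3] for it in ordered)
    if split_x:  # western half on the left, eastern half on the right
        a = spatial_treemap(first, x, y, w * share, h)
        b = spatial_treemap(second, x + w * share, y, w * (1 - share), h)
    else:  # northern half on top, southern half below
        a = spatial_treemap(first, x, y, w, h * share)
        b = spatial_treemap(second, x, y + h * share, w, h * (1 - share))
    return {**a, **b}


# Hypothetical neighbourhoods: (name, centroid x, centroid y, population).
layout = spatial_treemap([("a", 0, 0, 1), ("b", 1, 0, 3)], 0, 0, 4, 1)
print(layout)
```

Each split only considers the order of centroids along one axis, so neighbourhoods can still drift from their true positions. Passing equal weights produces equally sized cells, although their aspect ratios can still differ.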

If we want to place summary visualisations inside each neighbourhood, it may be better to size them equally. However, this once again changes the layout of the neighbourhoods. Try to find Winson Green and Selly Oak: the highlighting shows that they occupy different locations in the maps above and below.

A space-filling tree map, coloured by NPU and sized equally.

An issue with the algorithm used above is that the neighbourhoods have different aspect ratios. This could still present a problem if we try to place summary visualisations inside each neighbourhood in a consistent manner. One option is to switch to an algorithm that gives each neighbourhood an equal size. The algorithm used below does not fill the space (notice the gap in the top-right corner), but it does a better job of keeping the NPUs together.

Small multiples with gaps representing the West Midland neighbourhoods, coloured by NPU.
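The page does not name the algorithm that assigns neighbourhoods to equally sized cells. As a stand-in, the sketch below uses a greedy approach. Every (neighbourhood, cell) pair is ranked by the distance from the neighbourhood's normalised centroid to the cell centre. Pairs are then accepted in that order whenever both the neighbourhood and the cell are still free, and any cells left over become gaps. The function name `grid_assign` is hypothetical. An optimal assignment would instead minimise the total distance, for example with the Hungarian algorithm.

```python
import math


def grid_assign(centroids, cols, rows):
    """Greedily assign each point to a distinct cell of a cols x rows grid.

    centroids: {name: (x, y)} with x and y already normalised to [0, 1].
    Cell (c, r) has its centre at ((c + 0.5) / cols, (r + 0.5) / rows).
    Pairs are taken in order of increasing distance, so each point gets the
    nearest cell that is still free. Unused cells are left as gaps.
    Returns {name: (c, r)}.
    """
    if cols * rows < len(centroids):
        raise ValueError("grid has fewer cells than points")
    pairs = sorted(
        (math.dist(p, ((c + 0.5) / cols, (r + 0.5) / rows)), name, (c, r))
        for name, p in centroids.items()
        for c in range(cols)
        for r in range(rows)
    )
    placed, used = {}, set()
    for _, name, cell in pairs:
        if name not in placed and cell not in used:
            placed[name] = cell
            used.add(cell)
    return placed


# Hypothetical normalised centroids; the south-east cell is left as a gap.
print(grid_assign({"nw": (0.1, 0.1), "ne": (0.9, 0.1), "sw": (0.1, 0.9)}, 2, 2))
```

A grid with exactly as many cells as neighbourhoods gives the most compact layout. Adding spare cells gives the assignment more freedom to keep neighbourhoods near their true positions.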

The main disadvantage of this approach is that it can be difficult to relate the grid layout to the original geography. One way to address this is not to fill all of the space, but instead to leave gaps so that the grid arrangement more closely resembles the original geography. Below is the gapped approach, with colour representing NPU. As you can see, the neighbourhoods are well aligned and the overall layout more closely represents the original geography. The algorithm can be configured to further increase the accuracy of the neighbourhood positions, but this comes at the cost of size (i.e., more gaps and smaller neighbourhoods).
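To make the cost of the gaps concrete, the sketch below assumes a square grid whose cell count is a multiple `gap_factor` of the number of neighbourhoods. Both `gap_factor` and the function name are hypothetical, because the page does not say how its algorithm is configured.

```python
import math


def gapped_grid(n, gap_factor, width=1.0):
    """Size a square grid with at least n * gap_factor cells for n items.

    gap_factor = 1 gives the tightest grid; larger values leave more empty
    cells, so items can sit closer to their true positions, but every cell
    shrinks. Returns (cells_per_side, cell_side_length).
    """
    side = math.ceil(math.sqrt(n * gap_factor))
    return side, width / side


# For a hypothetical 100 neighbourhoods on a map of unit width:
for factor in (1, 2, 4):
    print(factor, gapped_grid(100, factor))
```

In this example, doubling the number of cells shrinks each cell's side from 1/10 to 1/15 of the map width. Each cell therefore keeps only 100/225 (about 44%) of its original area.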

On a side note, this type of colouring might be useful as the background to another visualisation (such as icons representing a signal), so we have implemented a slider to control its opacity. How faint can you make the visualisation while still being able to determine the NPUs?

Small multiples with gaps representing the West Midland neighbourhoods, coloured by NPU.
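The opacity slider can be understood through standard alpha compositing. A colour drawn with opacity α over an opaque background shows, per channel, α·colour + (1 − α)·background. The sketch below assumes a white background, because the page does not state the background colour. The function name `blend_over` is hypothetical.

```python
def blend_over(colour, alpha, background=(255, 255, 255)):
    """Composite an RGB colour with opacity alpha over an opaque background.

    Per channel: out = alpha * colour + (1 - alpha) * background.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return tuple(round(alpha * c + (1 - alpha) * b) for c, b in zip(colour, background))


# A hypothetical NPU colour at full and at quarter opacity.
print(blend_over((200, 100, 0), 1.0))
print(blend_over((200, 100, 0), 0.25))
```

The difference between two blended NPU colours is α times their original difference. As α falls, all NPU colours therefore converge on the background, which puts a lower limit on how faint the layer can be while the NPUs remain distinguishable.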

Placing the three visualisations side by side allows a direct comparison. At this size, the regions in the two left-most views are larger and easier to see. The original geography is shown above for comparison.

The three types of geographic representations. Hover over a neighbourhood to see where it resides in the other two.

If we colour by NPU again, we would argue that the gapped approach is the best in terms of both clarity (how easily each neighbourhood can be seen) and geographic familiarity (the distance of each neighbourhood from its actual position, and the overall layout).

The three types of geographic representations, coloured by NPU. Hover over a neighbourhood to see where it resides in the other two.
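One way to quantify the "distance from its actual position" criterion is to measure the mean displacement between each neighbourhood's geographic centroid and its position in a layout, after rescaling both to a unit square. This is our own sketch, not a measure used on this page. It assumes that both sets of coordinates share the same orientation (for example, y growing downward in each). The function names are hypothetical.

```python
import math


def normalise(points):
    """Rescale {name: (x, y)} so the bounding box spans [0, 1] on each axis."""
    xs = [x for x, _ in points.values()]
    ys = [y for _, y in points.values()]
    x0, y0 = min(xs), min(ys)
    sx = (max(xs) - x0) or 1.0  # avoid dividing by zero for degenerate input
    sy = (max(ys) - y0) or 1.0
    return {k: ((x - x0) / sx, (y - y0) / sy) for k, (x, y) in points.items()}


def mean_displacement(geographic, layout):
    """Mean distance between each item's normalised geographic and layout position."""
    g, l = normalise(geographic), normalise(layout)
    return sum(math.dist(g[k], l[k]) for k in g) / len(g)


# Hypothetical centroids and two grid layouts (the second swaps a and b).
geo = {"a": (0, 0), "b": (10, 0), "c": (0, 10), "d": (10, 10)}
print(mean_displacement(geo, {"a": (0, 0), "b": (1, 0), "c": (0, 1), "d": (1, 1)}))
print(mean_displacement(geo, {"a": (1, 0), "b": (0, 0), "c": (0, 1), "d": (1, 1)}))
```

Normalising each layout to its own bounding box makes layouts of different sizes comparable. The measure captures only per-neighbourhood distance and says nothing about clarity.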

Finally, we can think about evenly sized geography with a focus on NPUs. Below, the NPUs are arranged to resemble the original geography, and the neighbourhoods inside each NPU are then arranged using the same algorithm as above. As a result, each NPU more closely resembles its original geography, but Birmingham as a whole does not. The disadvantage is that this reinforces the 'soft' NPU boundaries between neighbourhoods. Although reporting may focus on a particular NPU, spatial correlation can still exist between adjacent neighbourhoods that are assigned to different NPUs. We therefore believe that the continuous geography of the West Midlands is more beneficial than the NPU-divided geography below.

NOTE: The NPU arrangement is ad hoc and can be modified. Rearrange the NPUs below by dragging them around. Would you arrange them differently? If so, could you please capture and send us a screenshot of your arrangement?

The West Midlands represented as small multiples, but divided by NPU. Rearrange the NPU geography by dragging each NPU.
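The page does not describe how the NPU blocks and their contents are combined. The sketch below illustrates the two-level arrangement described above under two assumptions. First, the top-left cell of each NPU block is given, for example by a manual drag-and-drop arrangement. Second, neighbourhoods inside a block fill a near-square sub-grid row by row, a simplification in place of whichever grid algorithm the page uses. The function name `npu_layout` is hypothetical.

```python
import math


def npu_layout(npu_origin, members):
    """Two-level grid layout: NPU blocks first, then neighbourhoods inside each.

    npu_origin: {npu: (col, row)} giving the top-left cell of each NPU block
    (blocks are assumed not to overlap).
    members: {npu: {name: (x, y)}} neighbourhood centroids in screen
    coordinates (y grows downward).
    Inside a block, neighbourhoods are sorted north to south into rows of a
    near-square sub-grid, then west to east within each row.
    Returns {name: (col, row)}.
    """
    placed = {}
    for npu, (col0, row0) in npu_origin.items():
        centroids = members[npu]
        side = math.ceil(math.sqrt(len(centroids)))
        north_to_south = sorted(centroids, key=lambda n: centroids[n][1])
        for start in range(0, len(north_to_south), side):
            row = sorted(north_to_south[start:start + side],
                         key=lambda n: centroids[n][0])
            for offset, name in enumerate(row):
                placed[name] = (col0 + offset, row0 + start // side)
    return placed


# Hypothetical example: two NPUs placed side by side.
origins = {"NPU 1": (0, 0), "NPU 2": (2, 0)}
members = {
    "NPU 1": {"a": (0, 0), "b": (1, 0), "c": (0, 1)},
    "NPU 2": {"d": (5, 5)},
}
print(npu_layout(origins, members))
```

Each block is laid out independently, so adjacent neighbourhoods in different NPUs can end up far apart. This is the drawback of the NPU-divided geography discussed above.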

In summary, we believe that unusual geographic layouts can be beneficial for representing summaries of neighbourhood data. We are interested to hear your feedback. Does this break from traditional representations work for you? Of the two (space-filling and gapped), which are you more comfortable with? Do you prefer the continuous geography or the NPU-focused approach (in terms of position rather than colour)?

Click here to learn more about how we think these layouts (the gapped one in particular) can be used in conjunction with SPC.