DExD for Statistical Process Control

Roger Beecham, Jason Dykes, Chris Rooney & William Wong

Version: 08/2019. Last updated: 9th August 2019

A Standard SPC chart and some alternative ways of mapping geography

This Design Exposition Document is intended to explain some options for visualizing the crime data that we are focusing on and to give you a chance to try our ideas in your own workplace at your own pace and at a time that suits you. We'd like you to try the designs out, interact with them and provide us with some feedback on their potential.

We begin by describing the design process for furthering the work on statistical process control (SPC).

SPC is used to identify signals (both positive and negative) in data by comparing individual data points against the mean and variance of the whole data (or an individual process). It is commonly used in policing and represented through an SPC chart such as this:

A standard SPC chart. This chart shows all crimes committed in the West Midlands, aggregated by month.
Select individual data points to treat them as outliers and remove them from the analysis.
Create process markers by clicking on the graph.
Click to toggle automatic process detection.

Play a little with the graph. Try removing some points. Try clicking positions in the graph to split processes. Consider the effects of these actions.

Has this helped you think of the sensitivities of SPC analysis to the time periods selected and extreme events?
Record your thoughts.
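
To make this sensitivity concrete, here is a minimal Python sketch of the kind of calculation that sits behind an SPC chart. It assumes control limits at the mean plus or minus three standard deviations, a common convention that this document does not itself specify. The monthly counts, the function names and the `exclude` and `breaks` parameters are invented for illustration; they are not the West Midlands data or the code behind the chart above.

```python
from statistics import mean, pstdev

def control_limits(values, k=3.0):
    """Return (lower, centre, upper): mean +/- k standard deviations (assumed rule)."""
    centre = mean(values)
    spread = pstdev(values)
    return centre - k * spread, centre, centre + k * spread

def signals(values, exclude=(), breaks=(), k=3.0):
    """Indices of points that fall outside the limits of their own process.

    exclude: indices treated as outliers and left out of the limit calculation.
    breaks:  indices at which a new process starts (process markers).
    """
    edges = [0, *sorted(breaks), len(values)]
    flagged = []
    for start, end in zip(edges, edges[1:]):
        # Limits are computed per process, ignoring any excluded points
        basis = [values[i] for i in range(start, end) if i not in exclude]
        lower, _, upper = control_limits(basis, k)
        flagged += [i for i in range(start, end)
                    if not lower <= values[i] <= upper]
    return flagged

# Hypothetical monthly counts: one extreme month, then a shift to a higher level
counts = [100, 104, 98, 101, 170, 99, 102, 97, 130, 133, 129, 131, 134, 128]

print(signals(counts))                           # limits from the whole series
print(signals(counts, exclude={4}))              # extreme month left out of the limits
print(signals(counts, exclude={4}, breaks=[8]))  # plus a process marker at month 8
```

With these invented counts, the extreme month is only flagged once it is excluded from the limit calculation, because it inflates the very standard deviation it is compared against. Adding the process marker narrows the limits for each period, which is the effect the interactions above let you explore.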

One way in which we can potentially expand SPC is by thinking about these processes and any signals that we detect in terms of their geography:
Where do they occur?
Do they cluster?
Do we have different types of signal and process in different places?

Here is a standard map of the West Midlands that might form the basis for this analysis. Each region represents a neighbourhood for which monthly crime statistics are available.

A map of the West Midlands. Hover over a neighbourhood to see additional information.

Rather than simply shade the regions in a single colour, we could, for example, use a discrete colour scale to differentiate neighbourhoods based on the larger neighbourhood policing units (NPU) into which they fall. This is known as a choropleth map.

A choropleth map of the West Midlands.
Individual neighbourhoods are coloured based on their NPU.
Hover over a neighbourhood to see additional information.

Another option is to colour each neighbourhood by a continuous value, such as population. This emphasises the fact that small, densely populated regions lack visual salience (i.e., they occupy very little screen space). Take, for example, Bordesley Green, which has a much higher population than Meriden, but takes up much less screen space.

Take time to find Bordesley Green and Meriden.
What are their respective populations?

A choropleth map of the West Midlands.
Individual neighbourhoods are coloured based on their population.

To address our geographic questions we may want to know what SPC analysis and charts look like in each of our neighbourhoods or NPUs. We may want to compare the SPC analysis for these areas in their geographic context by adding information from the SPC analysis onto the map. However, the areas vary greatly in size, and are tiny in the densely populated parts of the region. It would be hard to fit the information onto the map without overlap that would make it very difficult to read and interpret.

Another method of representing geography that aims to address this problem is the space-filling tree map. The aim of this visualisation is to maximise the use of the space available on the screen or page to show as much data as possible. In doing so, the precise geographic positions are relaxed a little: places on our maps move to different positions on the screen so that we can see them more clearly in their approximate geographic positions. As a result, the original geography is converted into an uneven grid. Each cell still represents a neighbourhood, but now its size represents some desired characteristic, such as its population. The algorithm behind the visualisation attempts to maintain the position of each neighbourhood according to its original geography, although this is never perfect, as you can see from the visualisation below. Since we are using size to represent population, we can again use colour to show NPU.

A space-filling tree map. The size of each neighbourhood represents its population.
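
The idea of trading exact position for area can be sketched in a few lines of Python. This is not the algorithm used to produce the figure; it is a simple stand-in that assumes each neighbourhood has a centroid and a positive weight, and that repeatedly splits the available rectangle along its longer side, ordering neighbourhoods by the matching coordinate. The place names, coordinates and weights are invented, and `y` is measured northwards rather than down the screen.

```python
def spatial_treemap(items, x, y, w, h):
    """Lay out items as rectangles whose areas are proportional to their weights.

    items: list of (name, east, north, weight) tuples with positive weights.
    Returns {name: (x, y, width, height)} inside the rectangle (x, y, w, h).
    """
    if len(items) == 1:
        return {items[0][0]: (x, y, w, h)}
    # Split along the longer side, ordering items by the matching coordinate
    axis = 1 if w >= h else 2
    ordered = sorted(items, key=lambda item: item[axis])
    half = len(ordered) // 2
    first, second = ordered[:half], ordered[half:]
    # Each half receives a share of the rectangle equal to its share of weight
    share = sum(i[3] for i in first) / sum(i[3] for i in ordered)
    if w >= h:
        layout = spatial_treemap(first, x, y, w * share, h)
        layout.update(spatial_treemap(second, x + w * share, y, w * (1 - share), h))
    else:
        layout = spatial_treemap(first, x, y, w, h * share)
        layout.update(spatial_treemap(second, x, y + h * share, w, h * (1 - share)))
    return layout

# Hypothetical neighbourhoods: (name, east, north, population-like weight)
places = [("A", 0, 0, 10), ("B", 1, 0, 30), ("C", 0, 1, 20), ("D", 1, 1, 40)]
layout = spatial_treemap(places, 0, 0, 100, 100)
for name, rect in sorted(layout.items()):
    print(name, rect)
```

Every rectangle's area is proportional to its weight, so giving all neighbourhoods the same weight yields equally sized cells, although their shapes will still differ.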

If we want to place summary visualisations inside each neighbourhood, then we might be best to size them equally. However, this once again changes the layout of the neighbourhoods.

Try to find Winson Green and Selly Oak.
You can see from the highlighting that they have different locations in the maps below and above.

A space-filling tree map, coloured by NPU and sized equally.

An issue with the algorithm used above is that the neighbourhoods have different aspect ratios - rectangular shapes and orientations. This could still present a problem if we attempt to place summary visualisations inside each neighbourhood in a consistent manner. One option here is to switch to an algorithm that gives each neighbourhood an equal size and shape. The algorithm used below does not fill the space (notice the gap in the top right corner), but it does create a tile of the same shape, size and aspect ratio for each reporting area. We can call this a Tile Map. It also does a better job of keeping the NPUs together.

Tile Map representing the West Midlands neighbourhoods, coloured by NPU.

As mentioned, the main disadvantage of this approach is that it can be difficult to identify the relationship between the location of the reporting areas on the grid layout and the original geography. One way to address this is to not fill all the space, but rather leave gaps such that the grid arrangement more closely resembles the original geography. The problem with this solution is that the tiles themselves will be smaller, giving us less room to show information about the reporting areas. Below is the gapped approach with colour representing NPU. As you can see, the neighbourhoods are well aligned, and with the gaps added the overall layout represents the original geography more closely. It is possible to configure the algorithm to further increase the accuracy of the neighbourhood positions, but this comes at the cost of size (i.e. more gaps and smaller, less salient, neighbourhoods).
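
The trade-off between gaps and positional accuracy can be illustrated with a small Python sketch. It is not the algorithm behind the tile maps in this document: it assumes a simple greedy rule that places each area in the nearest free cell of a grid, and it assumes centroids already scaled to the range 0 to 1 and a grid with at least two columns and two rows. The function names and the four centroids are invented for illustration.

```python
from math import hypot

def tile_layout(centroids, cols, rows):
    """Greedily place each area in the nearest free cell of a cols x rows grid.

    centroids: {name: (x, y)} with x and y already scaled to the range 0..1.
    Returns {name: (col, row)}.
    """
    if len(centroids) > cols * rows:
        raise ValueError("grid has fewer cells than areas")
    free = {(c, r) for c in range(cols) for r in range(rows)}
    layout = {}
    for name, (x, y) in sorted(centroids.items()):
        # Ideal (fractional) grid position of this centroid
        gx, gy = x * (cols - 1), y * (rows - 1)
        # Nearest free cell; the cell itself breaks ties deterministically
        cell = min(free, key=lambda c: (hypot(c[0] - gx, c[1] - gy), c))
        free.remove(cell)
        layout[name] = cell
    return layout

def mean_displacement(centroids, layout, cols, rows):
    """Average distance between true and gridded positions, in 0..1 units."""
    total = 0.0
    for name, (x, y) in centroids.items():
        col, row = layout[name]
        total += hypot(col / (cols - 1) - x, row / (rows - 1) - y)
    return total / len(centroids)

# Hypothetical centroids: three areas crowded in one corner, one far away
centroids = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (0.0, 0.1), "d": (1.0, 1.0)}

compact = tile_layout(centroids, 2, 2)   # no gaps: four large tiles
gapped = tile_layout(centroids, 4, 4)    # twelve empty cells: smaller tiles
print(mean_displacement(centroids, compact, 2, 2))
print(mean_displacement(centroids, gapped, 4, 4))
```

For these invented centroids the gapped grid moves the crowded areas less far from their true positions than the compact grid does, but each tile covers a sixteenth of the space rather than a quarter.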

As an aside, this type of colouring might be useful in the background of some other visualisation (such as icons representing a signal), so we've implemented a slider bar to control the opacity.

How faint can you make the visualisation such that you can still determine the NPUs?

Tile Map representing the West Midlands neighbourhoods, coloured by NPU.

Opacity: 1

Placing the three visualisations side-by-side allows us to make a direct comparison between the trade-offs we are making between geography, salience and consistency of shape. At this size, you can see the regions in the two left-most views are larger and easier to see than the tile map. We've placed the original geography above for comparison.

The three types of geographic representations. Hover over a neighbourhood in any of the maps to see where it resides in the other two.

If we colour by NPU again, we would argue that the tile map is the best in terms of clarity (seeing each neighbourhood easily), geographic familiarity (distance of the neighbourhood from its geographic position and its neighbours) and overall layout.

The three types of geographic representations, coloured by NPU. Hover over a neighbourhood to see where it resides in the other two.

Finally, we can think about evenly-sized geography, but with a focus on NPUs. Below is an arrangement of NPUs such that they are positioned in relation to the original geography. Inside each NPU, we then arrange the neighbourhoods in neighbourhood tile maps using the same algorithm as above. This means that each NPU more closely resembles its original geography, but the West Midlands as a whole does not. The disadvantage of this is that it reinforces the 'soft' boundaries between neighbourhoods. While one might report on a particular NPU, this does not mean that a spatial association does not exist in nearby neighbourhoods that are assigned to different NPUs. Therefore, the continuous geographic representations of the West Midlands, such as the tile map, are likely to be preferable to the NPU-divided geography presented below, even though they contain irregularities and discontinuities as we have seen.

The NPU arrangement is ad-hoc and can be modified.

Rearrange the NPUs below by dragging them around.
Can you find a different configuration that works well?
If so, please can you capture and send a screenshot of your arrangement?

The West Midlands represented as Tile Maps, one for each NPU.
Rearrange the NPU geography by clicking and dragging each to appropriate locations.

In summary, there is a case for partial geographic layouts as a means of presenting geographic summaries of neighbourhood data, even though they may be unfamiliar and lack geographic precision. But we are interested to hear your views on this:

Does this step away from traditional representations work for you?
Of the two (space-filling tree map and tile map), which would you say you were more comfortable with?
Do you prefer the continuous geography used in the tile map, or the NPU-focused approach in which you rearranged the positions of the NPUs?

That's all on our introduction to SPC Charts and Alternative Geographies.

Next we'd like to think about how these maps can be used in conjunction with SPC for Representing Signals Geographically.