Geographic analysis applies analytics techniques to data with a locational or geographic aspect in order to create new knowledge and insights. It can range from a simple visualisation, through more complex geographic analyses, to enriching the data with additional geographic information that is freely available as open data.
As with any analytics project, rapid prototyping and a ‘think big, act small’ approach are the key to success. Typically, we work incrementally and iteratively, ensuring that we have as many opportunities as possible to get feedback from the business. Most importantly, we work in a manner that is specifically tailored to the business. This approach and an agile development process are often necessary to convince important stakeholders of the opportunities they will gain by working further with the data. An analytics project involves multiple iterations of its key stages.
Salespoint_ID | Street | Zipcode | City | … | Sales | Latitude | Longitude |
---|---|---|---|---|---|---|---|
10006 | … | 8640 | OOSTVLETEREN | … | €501 000 | 50.9358 | 2.7368 |
10008 | … | 8800 | ROESELARE | … | €658 000 | 50.9476 | 3.1228 |
10009 | … | 1932 | SINT-STEVENS-WOLUWE | … | €314 000 | 50.87 | 4.4422 |
10012 | … | 4020 | LIEGE | … | €498 000 | 50.6542 | 5.6293 |
10014 | … | 8550 | ZWEVEGEM | … | €510 000 | 50.8209 | 3.3613 |
10023 | … | 2860 | SINT-KATELIJNE-WAVER | … | €232 000 | 51.078 | 4.5363 |
10025 | … | 8500 | KORTRIJK | … | €418 000 | 50.8312 | 3.2616 |
10028 | … | 3800 | SINT-TRUIDEN | … | €345 000 | 50.8208 | 5.191 |
10029 | … | 2500 | LIER | … | €484 000 | 51.1359 | 4.5741 |
10033 | … | 1410 | WATERLOO | … | €169 000 | 50.7067 | 4.4063 |
10037 | … | 9000 | GENT | … | €345 000 | 51.0543 | 3.7307 |
10038 | … | 7500 | TOURNAI | … | €194 000 | 50.6118 | 3.3891 |
10044 | … | 8400 | OOSTENDE | … | €322 000 | 51.2346 | 2.9256 |
10047 | … | 1000 | BRUSSEL | … | €117 000 | 50.8535 | 4.3678 |
10048 | … | 1060 | BRUXELLES | … | €349 000 | 50.8382 | 4.3614 |
Company data are often stored in a classic table format for performance reasons and, perhaps too often, also reported in that format, which makes these business data difficult to interpret. The first step in geographic analysis is to create a simple geographic visualisation, which makes it easier and quicker to spot insights and trends.
Starting from a traditional table format, the example here displays (fictitious) point-of-sale data, which we have first extended with coordinates obtained via open API services. This allows us to visualise the data on a geographic map.
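The geocoding step can be sketched as follows. The text does not name the API service that was used, so this minimal Python example assumes the public OpenStreetMap Nominatim search endpoint; the function names and the `User-Agent` value are illustrative only. URL building and response parsing are kept separate so that they can be checked without a network call.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the public Nominatim search endpoint stands in for the
# unnamed "open API services" mentioned in the text.
NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"


def build_geocode_url(street, zipcode, city, country="Belgium"):
    """Build a free-text search URL for one address (empty parts are skipped)."""
    parts = (street, f"{zipcode} {city}", country)
    query = ", ".join(part for part in parts if part)
    return NOMINATIM_URL + "?" + urlencode({"q": query, "format": "json", "limit": 1})


def parse_geocode_response(body):
    """Return (latitude, longitude) from a JSON response body, or None if no match."""
    results = json.loads(body)
    if not results:
        return None
    # Nominatim returns coordinates as strings, so convert them to floats.
    return float(results[0]["lat"]), float(results[0]["lon"])


def geocode(street, zipcode, city):
    """Look up one address over the network."""
    request = Request(
        build_geocode_url(street, zipcode, city),
        headers={"User-Agent": "geo-analysis-sketch"},  # hypothetical client name
    )
    with urlopen(request, timeout=10) as response:
        return parse_geocode_response(response.read().decode("utf-8"))
```

Public geocoding services usually impose rate limits and usage policies, so a real pipeline would throttle its requests and cache the results next to the point-of-sale records.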
Once the geocoded data are visualised on a geographic map, we can focus very explicitly on the ‘where’ of the data and gain numerous new insights: Where are my points of sale located? Can I identify certain clusters around cities or municipalities? Which areas are more or less covered by my points of sale?
These geographic visualisations can contain multiple layers of data, so that the information can be visualised in different ways. In this example, there is a layer with the individual points of sale and a heat map layer that shows the density of the points of sale.
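The idea behind a density layer can be illustrated with a minimal sketch that counts points of sale per cell of a regular latitude/longitude grid, using the coordinates from the table. The cell size of 0.5 degrees is an arbitrary assumption, and a real heat map layer would normally apply kernel smoothing in a mapping library rather than a raw count.

```python
import math
from collections import Counter

# (latitude, longitude) pairs copied from the point-of-sale table.
POINTS = [
    (50.9358, 2.7368), (50.9476, 3.1228), (50.87, 4.4422),
    (50.6542, 5.6293), (50.8209, 3.3613), (51.078, 4.5363),
    (50.8312, 3.2616), (50.8208, 5.191), (51.1359, 4.5741),
    (50.7067, 4.4063), (51.0543, 3.7307), (50.6118, 3.3891),
    (51.2346, 2.9256), (50.8535, 4.3678), (50.8382, 4.3614),
]


def density_grid(points, cell_deg=0.5):
    """Count points per grid cell; cells are keyed by integer (row, column) indices."""
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


grid = density_grid(POINTS)
# The densest cells come first, e.g. the cell covering the Brussels area.
for cell, count in grid.most_common(3):
    print(cell, count)
```

With this cell size, the two Brussels points, Sint-Stevens-Woluwe and Waterloo fall into the same cell, which is the kind of cluster a heat map makes visible at a glance.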
Furthermore, with open data sources such as Google or OpenStreetMap, you can add extra location-specific context and create insights that were not possible before. In the example here, we examine which points of sale are located in a trendy neighbourhood. Identifying these “Hipster Hotspots” supports the launch of a new product aimed specifically at the “hipster” target group.
To locate Hipster Hotspots, we used open data from Google and OpenStreetMap to enrich the point-of-sale data and used R to perform the analysis. Via the public Google APIs or OpenStreetMap APIs, you can retrieve an enormous amount of geographic context.
The analysis performed in this use case is simple and efficient: how many coffee bars, barber shops or restaurants are there within a radius of 250 metres? Drafting this definition, which determines exactly what counts as a hip hotspot, is a very important part of the process. We make it concrete together with the business and translate it into data.
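The radius count can be sketched as below. The original analysis was done in R and its exact scoring is not given, so this Python version is only an illustration under stated assumptions: the category labels, the equal weighting of categories and the sample points of interest are all made up, and the great-circle distance uses the standard haversine formula.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical category labels; real labels depend on the data source.
HIP_CATEGORIES = ("coffee_bar", "barber_shop", "restaurant")


def hotspot_score(shop, pois, radius_m=250, categories=HIP_CATEGORIES):
    """Count points of interest of the given categories within radius_m of the shop."""
    lat, lon = shop
    return sum(
        1
        for category, poi_lat, poi_lon in pois
        if category in categories and haversine_m(lat, lon, poi_lat, poi_lon) <= radius_m
    )


# Made-up points of interest around the Gent point of sale, for illustration only.
GENT = (51.0543, 3.7307)
SAMPLE_POIS = [
    ("coffee_bar", 51.0553, 3.7307),   # roughly 110 m north: counted
    ("restaurant", 51.0543, 3.7327),   # roughly 140 m east: counted
    ("barber_shop", 51.0573, 3.7307),  # roughly 330 m north: outside the radius
    ("bakery", 51.0544, 3.7307),       # close by, but not a scored category
]
print(hotspot_score(GENT, SAMPLE_POIS))
```

Keeping the radius and the category list as parameters makes it easy to adjust the hotspot definition together with the business without touching the distance logic.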
Here too, a visualisation can make the definition of hip hotspots easier to understand. In our visualisation, we show two points of sale, each in a different colour; the red circle is assigned the higher score because there are more barber shops, restaurants and coffee bars in its area.
In the example here, it is very clear that we should only target the points of sale in the yellow area, which satisfy our selection criteria based on the location context.
The results of this analysis enable the customer to conduct an efficient, data-driven marketing campaign, rather than relying on intuition or on sales figures alone.
In a utilities project, we collected location-specific characteristics of the utilities company’s customers from open data published by the Belgian government in order to develop cross-selling and up-selling models. As a result, products, services and promotions can be better tailored to the profile, and therefore to the needs, of each customer.
Based on location data, we can add extra demographic and socio-economic statistics per neighbourhood (statistical sector). The (fictional) example here shows a few characteristics from census data that are available as open data, such as the percentage of working inhabitants, higher-educated inhabitants, rented houses or the median income per neighbourhood. Here too, a visual makes it easier and quicker to understand how neighbourhoods differ on one particular statistic.
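The enrichment itself amounts to a join on the neighbourhood key. The minimal sketch below assumes that each customer record already carries a statistical-sector code (in practice this would come from matching the customer's coordinates to sector boundaries); the sector codes, field names and figures are hypothetical.

```python
# Hypothetical neighbourhood statistics keyed by statistical-sector code.
SECTOR_STATS = {
    "SECTOR_A": {"pct_working": 62.0, "median_income": 24_500},
    "SECTOR_B": {"pct_working": 48.5, "median_income": 19_800},
}


def enrich(customers, sector_stats):
    """Left-join neighbourhood statistics onto customer records.

    Customers whose sector has no statistics are kept unchanged.
    """
    enriched = []
    for customer in customers:
        stats = sector_stats.get(customer["sector"], {})
        enriched.append({**customer, **stats})
    return enriched


# Hypothetical customer records, for illustration only.
CUSTOMERS = [
    {"customer_id": 1, "sector": "SECTOR_A"},
    {"customer_id": 2, "sector": "SECTOR_X"},  # no statistics available
]
for row in enrich(CUSTOMERS, SECTOR_STATS):
    print(row)
```

A left join is used so that no customer is dropped when statistics are missing; how to treat those gaps is a modelling decision for the later phase.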
Finally, in a subsequent phase, we used these new customer profiles in machine learning models to implement effective and accurate cross-selling and up-selling algorithms. This enables our customer to offer an assortment of (new) products and services targeted at similar profiles.
The Belgium Open Data Initiative has made more than 6,500 datasets available to the public for non-commercial and commercial use.