Using parallel geocoding to analyse the spatial characteristics of road traffic injury occurrences across Lagos, Nigeria
  1. Avirut Mehta¹
  2. Dohyeong Kim¹
  3. Nicholas Allo²
  4. Aina Olufemi Odusola³
  5. Chenchita Malolan⁴
  6. Fiemu E Nwariaku⁵

  ¹ School of Economic, Political and Policy Sciences, University of Texas at Dallas, Richardson, Texas, USA
  ² Visual Earth Group, London, UK
  ³ Lagos Health Service Commission, Lagos, Nigeria
  ⁴ University of Texas Southwestern Medical Center, Dallas, Texas, USA
  ⁵ Spencer Fox Eccles School of Medicine, University of Utah, Salt Lake City, Utah, USA

  Correspondence to Dr Dohyeong Kim; dohyeong.kim{at}utdallas.edu

Abstract

While efforts to understand and mitigate road traffic injury (RTI) occurrence have long been underway in high-income countries, similar projects in low/middle-income countries (LMICs) are frequently hindered by institutional and informational obstacles. Technological advances in geospatial analysis provide a pathway to overcome a subset of these barriers, and in doing so enable researchers to create actionable insights in the pursuit of mitigating RTI-associated negative health outcomes. This analysis develops a parallel geocoding workflow to improve investigation of low-fidelity datasets common in LMICs. Subsequently, this workflow is applied to and evaluated on an RTI dataset from Lagos State, Nigeria, minimising positional error in geocoding by incorporating outputs from four commercially available geocoders. The concordance between outputs from these geocoders is evaluated, and spatial visualisations are generated to provide insight into the distribution of RTI occurrence within the analysis region. This study highlights the implications of geospatial data analysis in LMICs facilitated by modern technologies on health resource allocation, and ultimately, patient outcomes.

  • geographic information systems
  • health policy
  • injury
  • public health
  • epidemiology

Data availability statement

Data are available upon request.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Summary box

  • Geospatial analysis of road traffic injury (RTI) occurrence enables optimisation of emergency response resource allocation and improvement of health outcomes.

  • Existing RTI datasets in low/middle-income countries are often limited by low-fidelity addressing systems and institutional hurdles.

  • A parallel geocoding pipeline, which concurrently employs multiple commercially available toolkits, serves as a means for overcoming common addressing challenges.

  • Although commercial geocoders often lack concordance in low-fidelity datasets, insights can nevertheless be ascertained through cross-comparison and spatial visualisation.

  • Application of parallel geocoding techniques to RTI datasets from Lagos State, Nigeria yields insights into optimal ambulance allocation patterns.

Introduction

Road traffic injuries (RTIs) create significant economic and health burdens, particularly in low/middle-income countries (LMICs) such as Nigeria, where RTI death rates reached 21.4 deaths per 100 000 population in 2016.1 Although this compares favourably with the average rate in the African region (26.6 deaths per 100 000), it remains significantly elevated in comparison with the world average (18.2 deaths per 100 000) and more than double the European average (9.3 deaths per 100 000).1 This comparatively elevated burden has been partially attributed to the rapid rate at which urbanisation and motorisation efforts have increased in LMICs.1 This is further exacerbated in conjunction with the relatively slow progression of institutional efforts to implement road safety interventions, conduct safety analysis and legislate corrective measures.2 3

Historically, the effects of injury across sub-Saharan Africa have been measured by methods such as community surveillance or representative sampling. However, efforts to mitigate injury impacts in the region can now incorporate data-mining techniques that enable the collection of large RTI datasets, allowing for further analysis to help alleviate the increased LMIC injury burden.4–6 Recent studies in nations such as Ethiopia and Zambia have systematically aggregated hospital data or leveraged large census datasets to ascertain injury-related metrics such as mortality rates or prevalence of post-traumatic stress disorder in survivors, demonstrating the value of more comprehensive and inclusive investigation techniques.7 8 As such, geospatial analysis of RTI data collected by ambulance services may prove useful in the evaluation of ambulance and healthcare facility allocation patterns to minimise response times to incidents and improve patient outcomes.9 Similar studies aimed at improving healthcare resource allocation through analysis of geospatial data have previously been performed in Karachi, Pakistan and Hanoi, Vietnam; these studies have generated concrete and actionable conclusions for improvement of deployment patterns.10 11

Serving a population of more than 20 million with approximately 25 ambulances, the Lagos State Ambulance Service (LASAMBUS) encounters several barriers to timely prehospital care including a lack of operational ambulances, poor road infrastructure and traffic congestion.12 13 Further exacerbating these challenges is the issue of inefficient deployment of ambulance base stations across Lagos State, resulting in undesirably high travel times to a subset of accident hotspots.14 Decreased time to treatment has been consistently associated with improved odds of survival, highlighting the need for efforts to address factors currently preventing timely access to care within the LASAMBUS system.15–18 As such, the optimal allocation of ambulance teams is imperative for a reduction of negative RTI impacts and an improvement of outcomes. Previous studies outline efforts made to more efficiently situate ambulance base stations and offer better accident coverage, which yielded up to 40% reductions in travel time to incidents in some cases.19 Heuristic algorithms that aim to efficiently identify resource allocation patterns and improve the efficiency of coverage are capable of optimising ambulance base station locations, as observed in China and Saudi Arabia.20–23 However, the efficacy of this approach depends on an accurate and comprehensive prior understanding of existing RTI locations and outcomes.

A major barrier to the collection of RTI data in many LMICs, as well as in Lagos State, is the minimal amount of resources allocated to public health services. Adding to the burden of RTI data collection and analysis, sub-Saharan Africa generally suffers from a lack of concrete, unambiguous spatial references, with informal schemes taking the place of an official addressing structure.24 However, the conduct of analysis to optimise resource placement generally requires specific point data, and whatever location data are available must therefore be geocoded (converted from textual address description to latitude and longitude) before being used. Commonly used and widely available tools for this aim include geocoding APIs (application programming interfaces) from several positioning service providers such as ArcGIS, Google, Bing, Mapbox and HERE Maps. These primarily commercial geocoders are designed to function best when provided with a clear and specific address string, a rarity in many LMICs where naming conventions frequently derive location references from relative position to local landmarks or crossroads.25 Given this, there is an evident need to analyse the effectiveness of existing geocoding tools when using the often incomplete health and traffic datasets present in most LMICs.

Geocoding has been used since the 1960s, when large-scale surveys such as the US Census required a means to transform abstract location information into numerically coded geographical zones.26 The use of geocoding for traffic collision data dates back to at least the early 2000s, and past analyses have frequently combined commercially available data with custom geocoding methodologies. For example, customised geocoding pipelines have been used to take advantage of highway mile markers in regions where such information is commonly encoded alongside RTI records.27 The utility of analysing historical RTI data to identify public health measures that may mitigate the economic and health burdens of traffic accidents has not gone unnoticed. Organisations such as the Florida and Wisconsin Departments of Transportation have previously employed geocoding-based methods for the geospatial quantification of RTIs.28 29

Geocoder workflow

LASAMBUS functions as a part of the Lagos State Ministry of Health, in collaboration with whom this study was conducted (see author reflexivity statement). As part of their standard procedure, ambulance personnel complete intervention forms when attending to an emergency call. LASAMBUS shared their RTI intervention forms during a year-long study (April 2017–May 2018). In total, 5606 records were collected, of which 3588 (64%) represented traffic incidents. Valid RTIs (those containing entries for all key fields necessary for viable geocoding) were then filtered to usable records, namely those accompanied by a textual description of the location (generally in the form of crossroads or landmarks), establishing a final usable dataset of 2920 incidents (81% of all traffic incidents). Within this set, 574 RTIs (20%) encoded the local government area (LGA) in which they originated. An address string was constructed for each RTI by concatenating all available information about the incident location: the textual description, the LGA name where available, and a state and country descriptor. For example, a record which separately encoded the textual description ‘Alimosho Roundabout Inward Egbeda’ and LGA ‘Alimosho’ would yield the address string ‘Alimosho Roundabout Inward Egbeda, Alimosho, Lagos, Nigeria’. Data analyses and visualisations were conducted using Python.30–33
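The address construction described above can be sketched in a few lines of Python. This is an illustration rather than the study's code: the function name `build_address` and its parameter names are hypothetical, and the only behaviour taken from the text is the comma-separated concatenation of description, optional LGA, state and country.

```python
from typing import Optional


def build_address(description: str, lga: Optional[str] = None,
                  state: str = "Lagos", country: str = "Nigeria") -> str:
    """Join the available location fields into one comma-separated string."""
    parts = [description, lga, state, country]
    # Skip missing or blank fields; most records in the dataset carry no LGA.
    cleaned = [p.strip() for p in parts if p and p.strip()]
    return ", ".join(cleaned)


# The worked example from the text, followed by a record without an LGA.
print(build_address("Alimosho Roundabout Inward Egbeda", "Alimosho"))
print(build_address("Alimosho Roundabout Inward Egbeda"))
```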

Generated addresses were then passed through the ArcGIS, Bing Maps, Google Maps, Mapbox and HERE Maps geocoders using the Python library geocoder. Although a common choice in the research community, the OpenStreetMap geocoding service Nominatim was excluded due to prohibitive rate limits on the publicly available API, as well as significant time and resource expenditures necessary for hosting a private instance. In contrast, all of the aforementioned commercial geocoders were either entirely free or provided a generous free tier sufficient for the aims of this study. Implementation at a greater scale, however, may benefit from the initial investment required to use geocoders such as Nominatim. Outputted coordinate pairs as well as confidence and quality metrics from each geocoder were aggregated into the RTI dataset to be further analysed for concordance (figure 1).
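One possible shape for this step is sketched below. It is a sketch under stated assumptions, not the pipeline used in the study: each geocoder is represented by a hypothetical callable that maps an address to a (latitude, longitude) pair or `None`, and the calls are issued from a thread pool, which is our assumption about how requests might be run side by side. The stub providers stand in for real services so that the example runs without API keys; with the geocoder library, a provider could plausibly be written as `lambda a: geocoder.arcgis(a).latlng`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional, Tuple

LatLng = Tuple[float, float]
# A provider takes an address string and returns (lat, lng), or None on failure.
Provider = Callable[[str], Optional[LatLng]]


def geocode_with_all(address: str,
                     providers: Dict[str, Provider]) -> Dict[str, Optional[LatLng]]:
    """Send one address to every provider concurrently and collect the results."""
    def safe_call(fn: Provider) -> Optional[LatLng]:
        try:
            return fn(address)
        except Exception:
            # A failure in one service should not discard the other outputs.
            return None

    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        futures = {name: pool.submit(safe_call, fn) for name, fn in providers.items()}
        return {name: future.result() for name, future in futures.items()}


# Hypothetical stand-ins for real geocoding services.
def stub_ok(address: str) -> Optional[LatLng]:
    return (6.6, 3.3)


def stub_failing(address: str) -> Optional[LatLng]:
    raise RuntimeError("rate limited")


results = geocode_with_all(
    "Alimosho Roundabout Inward Egbeda, Alimosho, Lagos, Nigeria",
    {"stub_ok": stub_ok, "stub_failing": stub_failing},
)
print(results)
```

Keeping one entry per geocoder, including the failures, preserves the information needed later to count how many services agreed on each record.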

Figure 1

Inputs and outputs from five freely available geocoders. RTIs, road traffic injuries.

While the gold standard for determining the accuracy of geocoder outputs would be a comparison against a manually labelled truth set, this tends to be expensive at scale and could be incompatible with the realities of geospatial analysis in LMICs. Many LMICs lack, or make only limited use of, the census infrastructure that high-income countries use to assess household access to healthcare and make provisions where gaps are identified.34 However, the parallel use of multiple complementary geocoders provides a potential means of automated error detection. As a first step in analysing the degree of concordance between outputs from multiple geocoders, we consider the input addresses for which each tool generated a valid latitude/longitude pair.

From the 2920 input addresses, each geocoder was able to generate between 500 and 900 valid geocodes. Validity was primarily assessed by secondary outputs from each geocoder described as ‘quality’, ‘accuracy’ or ‘confidence’. Consistency across geocoders was established by selecting quality and accuracy labels which were indicative of single-building or neighbourhood-level precision. In total, 1932 RTIs were assigned a valid coordinate pair by one or more of the geocoders. Out of these, 6 RTIs were acceptably geocoded by all five services, 46 were acceptably geocoded by four out of the five services, 282 were acceptably geocoded by three out of the five services, 619 were acceptably geocoded by two out of the five services, and 979 were acceptably geocoded by only one service. The Bing/Mapbox geocoder pair provided valid geocodes for the most inputs out of any pair, while the ArcGIS/Bing pair had the fewest shared valid geocodes (figure 2). Nevertheless, the majority of input addresses had zero or one valid geocoded output, suggesting low concordance in perceived precision of outputs between geocoders, likely due to the generally ambiguous character of addresses within the dataset.
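The tallies reported above (RTIs geocoded by five, four, three, two or one services, and the number of valid outputs shared by each geocoder pair) can be computed along the following lines. The input structure and the toy records are hypothetical: we assume that the quality filter has already been applied and that, for each RTI, the set of geocoders with an acceptable output is known.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: for each RTI, the geocoders whose output passed the quality filter.
valid_by_rti = {
    "rti-1": {"arcgis", "google", "here"},
    "rti-2": {"bing", "mapbox"},
    "rti-3": {"mapbox"},
    "rti-4": set(),
}


def agreement_histogram(valid):
    """Count RTIs by the number of geocoders that returned a valid geocode."""
    return Counter(len(names) for names in valid.values())


def shared_pair_counts(valid):
    """Count, for each geocoder pair, the RTIs that both geocoded validly."""
    counts = Counter()
    for names in valid.values():
        # Sorting gives each pair one canonical (alphabetical) key.
        for pair in combinations(sorted(names), 2):
            counts[pair] += 1
    return counts


histogram = agreement_histogram(valid_by_rti)
pairs = shared_pair_counts(valid_by_rti)
print(dict(histogram))
print(dict(pairs))
```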

Figure 2

Inputs with valid outputs shared between geocoders. RTIs, road traffic injuries.

Concordance analysis

Once geocoding was complete for all geocoders, the pairwise distances between shared outputs were computed to quantify the spatial concordance for points which were assessed by the geocoders to have high precision. Shared outputs were identified as individual RTIs which were successfully coded by more than one geocoder, allowing for calculation of the average distance between each of the geocoder outputs. Subsequently, these outputs were shuffled for RTIs across geocoders to provide a point of comparison for the evaluation of geocoder consistency. Each latitude/longitude pair was shuffled to a random RTI for each geocoder; while all geocoders kept the same set of points overall, they were arbitrarily assigned to different RTIs, allowing for comparison of inter-geocoder distance in the shuffled set versus the non-shuffled set. Finally, heatmaps were generated using the Python library folium to provide a visual representation of RTI hotspots as outputted by each geocoder.33
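A minimal sketch of the pairwise distance and shuffling comparison follows. The text does not state how distances were computed, so the haversine great-circle formula is an assumption here, and all function names and sample coordinates are hypothetical. The shuffle keeps each geocoder's set of points intact and only reassigns them to different RTIs, as described above.

```python
import math
import random
from statistics import median

EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lng) pairs in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def median_pair_distance(points_a, points_b):
    """Median distance over RTIs present in both dicts (RTI id -> (lat, lng))."""
    shared = sorted(points_a.keys() & points_b.keys())
    return median(haversine_km(points_a[k], points_b[k]) for k in shared)


def shuffled(points, rng):
    """Reassign the same set of points to randomly permuted RTI ids."""
    ids = list(points)
    coords = [points[i] for i in ids]
    rng.shuffle(coords)
    return dict(zip(ids, coords))


# Hypothetical outputs from two geocoders for overlapping RTIs.
geocoder_a = {"rti-1": (6.450, 3.400), "rti-2": (6.600, 3.350), "rti-3": (6.520, 3.370)}
geocoder_b = {"rti-1": (6.451, 3.400), "rti-2": (6.601, 3.350),
              "rti-3": (6.521, 3.370), "rti-4": (6.700, 3.300)}

observed = median_pair_distance(geocoder_a, geocoder_b)
baseline = median_pair_distance(geocoder_a, shuffled(geocoder_b, random.Random(0)))
print(f"observed median: {observed:.2f} km, shuffled median: {baseline:.2f} km")
```

In practice the shuffle would be repeated many times to obtain a distribution of baseline medians rather than a single draw.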

All combinations of shared geocoders resulted in a median inter-geocoder distance of greater than a kilometre, indicative of relatively low concordance attributable to the limited consistency and precision of addresses in the LASAMBUS dataset. However, output values may remain useful for medium-resolution geospatial analysis, given that more than 90% of RTIs are located within a 60-kilometre by 30-kilometre area, such that kilometre precision can potentially reveal important trends within the data. On the measure of concordance with each other, all geocoders perform significantly better than a shuffled set of geocoded points, indicating that disagreement between geocoders is limited overall (figure 3).

Figure 3

Range of distances for pair-geocoded points from five geocoders.

RTI and travel time analysis

As the LASAMBUS dataset encodes ambulance response time information for a subset of RTIs, it was also possible to evaluate geocoder accuracy by comparing computed travel times from ambulance base positions to RTIs against the actual travel times within the dataset. Of the 2920 valid RTIs in the dataset, 2350 records (80%) included textual information regarding the ambulance starting point, of which 831 (35%) were identifiable and could be converted to a latitude/longitude pair. The overlap between these RTIs, those with a known travel time, and those with valid geocoding ranged between 100 and 250 incidents for each geocoder. Using the Mapbox Directions API with a traffic-inclusive profile, routes were generated for each RTI from the responding ambulance base station to the geocoded position of the RTI. Durations and distances were then collected from these routes, such that travel times reported by the Mapbox Directions API could be compared against those logged by the LASAMBUS ambulance teams.
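The routing step might look roughly like the sketch below, which builds a Mapbox Directions API (v5) request and reads the duration and distance from a response body. The text specifies only a traffic-inclusive profile; the choice of `mapbox/driving-traffic`, the helper names, the coordinates and the placeholder token are assumptions. No network request is made here: the sample response is a hand-written dictionary in the shape the API returns, with duration in seconds and distance in metres.

```python
from urllib.parse import urlencode


def directions_url(origin, destination, token, profile="mapbox/driving-traffic"):
    """Build a Directions API v5 request URL from two (lat, lng) pairs."""
    # Mapbox expects coordinates in longitude,latitude order, separated by ';'.
    coords = f"{origin[1]},{origin[0]};{destination[1]},{destination[0]}"
    query = urlencode({"access_token": token, "overview": "false"})
    return f"https://api.mapbox.com/directions/v5/{profile}/{coords}?{query}"


def route_summary(response):
    """Return (minutes, kilometres) for the first route, or None if none was found."""
    routes = response.get("routes") or []
    if not routes:
        return None
    first = routes[0]
    return first["duration"] / 60.0, first["distance"] / 1000.0


# Hypothetical base station and geocoded RTI position, with a placeholder token.
url = directions_url((6.5, 3.3), (6.6, 3.4), token="TOKEN")
sample_response = {"routes": [{"duration": 1200.0, "distance": 8500.0}]}
print(url)
print(route_summary(sample_response))
```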

Figure 4 shows the hotspots of RTIs per geocoder. The geocoders all showed similar hotspot areas except for Bing and Mapbox, which did not show as strong a concentration of RTI incidence across the city.

Figure 4

Heatmaps of RTIs as determined by five geocoders. RTIs, road traffic injuries.

Ambulance to RTI travel times

Of the 2920 RTIs within the dataset, 2634 include reasonably interpretable values for travel time from the ambulance base station to the scene of the RTI. The median travel time from the ambulance base to an RTI location is 15 min, with an IQR of 9–25 min.
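Summary statistics of this kind can be reproduced with the Python standard library, as in the sketch below. The travel times listed are invented for illustration, and the quartile convention (`method="inclusive"`) is an assumption, since the text does not say which definition was used.

```python
from statistics import quantiles


def median_iqr(values):
    """Return the median and the (lower quartile, upper quartile) pair."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q2, (q1, q3)


# Hypothetical travel times in minutes; not values from the LASAMBUS dataset.
travel_minutes = [5, 9, 12, 15, 20, 25, 40]
med, (lower, upper) = median_iqr(travel_minutes)
print(f"median {med} min, IQR {lower}-{upper} min")
```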

Figure 5 shows the distribution of ambulance travel times after generating the routes for each RTI and comparing them against known travel times, providing a means of comparison both across geocoders and against known values.

Figure 5

Distribution of ambulance to RTI travel times across geocoders. RTIs, road traffic injuries.

Routed travel times are generally greater than those reported by LASAMBUS, with a median travel time of 18.7 min (IQR 9.9–36.2 min) across 913 records from all geocoders. The Mapbox (21.4 (14.4–46.6) min), HERE Maps (20.3 (9.1–42.4) min) and ArcGIS (20.3 (9.1–42.4) min) geocoded values exceeded the truth set by the widest margin, while the Bing Maps (14.4 (8.2–32.8) min) and Google Maps (13.3 (8.1–22.5) min) geocoders produced values closer to those of the truth set.

Conclusion

We used a multipronged geocoding approach to perform an analysis of the spatial characteristics of RTI incidence across Lagos, Nigeria. To do so, we confronted the unique challenges of geospatial investigations in LMICs, such as the unclear encoding of locations in source datasets and limited geocoding and routing tools with high accuracy; Nigeria is one among many nations which lack a high-fidelity location reference scheme.35 Extending previous research using a dataset collected by the LASAMBUS, we found RTIs in Lagos State to be primarily concentrated in urban areas, with an adequate median travel time from RTI to hospital.12

Geocoding and concordance

Using the LASAMBUS dataset, we compared five commonly used and freely available geocoders to one another, judging each on the basis of concordance between its output and those of the other geocoders. We further examined the accuracy of geocoded values by comparing calculated travel times from ambulance base stations to RTIs against known travel times, finding that points geocoded and subsequently routed with the Mapbox Directions API are calculated as temporally farther from ambulance base stations than travel times encoded within the dataset would suggest. These findings reflect an opportunity to gain valuable operational insights from a rudimentary analysis of large geospatial datasets in LMICs, despite the lack of structures and systems for positional data.

Next, we examined the median distance between RTIs as geocoded by various geocoders versus randomly shuffled across geocoders, finding that the median inter-geocoder distance is significantly lower for non-shuffled points. These observations allow for two general conclusions: first, that even low-quality inputs from LMICs can be geocoded with precision significantly better than random; second, that no single geocoder is significantly more consistent than others in matching the outputs of its peers, and therefore no single source of truth exists to viably geocode the LASAMBUS dataset (figure 3).

Geospatial incidence of RTIs

Spatial plotting of RTI incidence serves as a further means of analysing the differences between the output of each geocoder and demonstrates that RTIs are primarily concentrated, but somewhat spread, across the centre of Lagos (figure 4). This assessment falls in line with previous analyses of RTI distribution in sub-Saharan Africa.9 Spatial visualisations of RTIs as geocoded by the ArcGIS, Google and HERE Maps tools are remarkably similar, suggesting that while the pairing might not allow for the most concordance on individual points, aggregation allows for an agreement in trends to visually emerge. Bing Maps and the Mapbox geocoder, on the other hand, are more likely to geocode points in other parts of the state, farther away from the city centre.

Disagreements between these two groups of geocoders are largely empirically unresolvable without significant further investment into manual verification. However, the similarities—high densities of RTIs in certain areas of the city along major roadways—allow for the determination of points where the need for LASAMBUS resources is most urgent. As such, this analysis demonstrates that public health recommendations can be made even in situations where accurate and reliable geospatial data are unavailable, as is often the case in LMICs. Furthermore, it provides a practical basis for the development of heuristic algorithms to optimise ambulance base station location, bridging the gap between the complexities of the real world and the theoretical utility of such methods when applied to man-made models.21

Ambulance to RTI travel

Our calculated travel time from the ambulance base to RTI of 15 min (figure 5) is roughly in line with the median ambulance response time in Africa, which is notably greater than in any other continent.36 This result once again highlights the need for efforts to reduce ambulance travel times, a measure which has been demonstrated to significantly improve patient survival outcomes.15–18

Finally, a comparison of reported and routed travel times, from ambulance base stations to RTI locations, reveals known truth values to be consistently lower than calculated values from the Mapbox Directions API. This difference may be attributable to the traffic privileges afforded to ambulances, or could simply indicate that geocoders are locating RTI points farther from ambulance base stations than they are in reality. However, the role of the Mapbox Directions API in routing must not be ignored as a factor which may differentially affect the provided coordinate pairs from different geocoders. Additionally, the Mapbox Directions API does not take into account factors which would apply specifically to ambulances, such as potentially reduced effects of traffic resulting from cooperation and coordination by other drivers.

Contextualisation and next steps

At their root, biases within geocoders originate from each geocoder’s sourcing of data and historical age. Volunteer-based geocoders such as OSM’s Nominatim rely largely on crowdsourced data points, whereas commercial geocoders may combine crowdsourced data with that collected by local contractors or employees. Trade-offs between geocoding rigour and rural/urban bias have been well examined in developed nations, and are likely to be even more severe in LMICs.37 Given the general prevalence of low-quality address records which frequently omit street names or concrete identifiers in favour of landmarks, positional errors which deviate geocoded locations from the ground truth must be expected. Using stricter mechanisms to minimise error, for example, by filtering low-quality addresses and geocodes, is prone to bias analyses against rural areas which have been shown to have poorer match rates than their urban counterparts in developed countries.37 In essence, positional error resulting from geographical confounding is largely inevitable, particularly in LMICs. The use of multiple geocoders as described in this paper is intended to mitigate this positional error by combining multiple points of information, providing an easily accessible reduction of the impact of this limitation. However, future analyses may even further limit the impact of such errors with advanced address-matching and confidence-weighting techniques.38
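As one illustration of what a confidence-weighting technique could look like, the sketch below combines candidate coordinates from several geocoders into a single weighted centroid. This is not the method used in this study or in the cited work: the function name, the weights and the sample coordinates are hypothetical, and averaging latitude and longitude directly is only reasonable over a city-scale area such as the one analysed here.

```python
from typing import List, Optional, Tuple

LatLng = Tuple[float, float]


def weighted_centroid(candidates: List[Tuple[LatLng, float]]) -> Optional[LatLng]:
    """Average candidate (lat, lng) points, weighting each by a confidence score."""
    total = sum(weight for _, weight in candidates)
    if total <= 0:
        # No usable candidate, or every confidence score is zero.
        return None
    lat = sum(point[0] * weight for point, weight in candidates) / total
    lng = sum(point[1] * weight for point, weight in candidates) / total
    return lat, lng


# Two hypothetical geocoder outputs, the second trusted three times as much.
consensus = weighted_centroid([((6.0, 3.0), 1.0), ((7.0, 4.0), 3.0)])
print(consensus)
```

A robust alternative, such as a coordinate-wise median, would reduce the pull of a single badly misplaced geocode at the cost of ignoring confidence scores.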

The limited accuracy of geocoder outputs is not the only limitation of our study. The LASAMBUS-reported data may fail to capture all notable RTIs, and some duplicates may remain despite our best attempts at cleaning the dataset. Furthermore, the Mapbox Directions API may fail to perform consistently in a potentially poorly mapped region such as Lagos, resulting in the introduction of biases due to suboptimal routing or improper inclusion of traffic. In the future, the impact of these biases may be mitigated through the use of other routing services (such as those built on OpenStreetMap data) or, at the very least, quantified through experimentation oriented towards ascertaining the magnitude of difference between computationally routed values and true travel times. However, our results remain reasonable and certain metrics, such as computed travel times, are in line with prior research.

Footnotes

  • Handling editor Seye Abimbola

  • Twitter @nicholasallo

  • Contributors DK, FEN, CM and AM conceptualised and designed the study. AM searched the literature and collected the data and prepared the draft manuscript. All authors participated in the interpretation of the data and writing of the manuscript. All authors confirm that they had full access to all the data used in the study and accept responsibility to submit for publication. The final submitted version of the manuscript was reviewed and approved by all authors. DK acts as the guarantor responsible for the overall content of the published work.

  • Funding This study was supported by award no. 1R21TW010991-01A1 (Reducing the Burden of Road Traffic-Associated Mortality using Mobile Technology) from the National Institutes of Health.

  • Disclaimer The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

  • Map disclaimer The inclusion of any map (including the depiction of any boundaries therein), or of any geographic or locational reference, does not imply the expression of any opinion whatsoever on the part of BMJ concerning the legal status of any country, territory, jurisdiction or area or of its authorities. Any such expression remains solely that of the relevant source and is not endorsed by BMJ. Maps are provided without any warranty of any kind, either express or implied.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Author note The reflexivity statement for this paper is linked as an online supplemental file 1.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.