
A Nationwide Drive Time Matrix Between U.S. ZIP Code Areas: Conclusion, Acknowledgement & References

by Zip Drive, August 10th, 2024


Authors:

(1) Yujie Hu, Department of Geography, University of Florida, Gainesville, FL 32611 and UF Informatics Institute, University of Florida, Gainesville, FL 32611;

(2) Changzhen Wang, Department of Geography & Anthropology, Louisiana State University, Baton Rouge, LA 70803;

(3) Ruiyang Li, Children’s Environmental Health Initiative, Rice University, Houston, TX 77005;

(4) Fahui Wang, Department of Geography & Anthropology, Louisiana State University, Baton Rouge, LA 70803.

Abstract and 1 Introduction.

Methodology

Results

Concluding comments, Acknowledgement and References

Concluding comments

This paper illustrates the estimation of a nationwide ZIP-to-ZIP drive time matrix in the U.S. The drive times derived by the Google Maps API on randomly sampled OD pairs serve two purposes: facilitating empirical models that further improve the preliminary estimates based on road networks or simply geodesic distances, and validating our design of methods of varying computational complexity and differential sampling intensity. As trip lengths increase, the approach requires less data preparation and less computational power without greatly compromising the quality of results.


Our own motivation for undertaking this endeavor is to facilitate a study that examines the national health care market structure. We hope that the derived matrix becomes an important resource for researchers who need it for spatial analysis of national scope or of a large region. For instance, a recent study on measuring and improving accessibility to public libraries in the U.S. (Donnelly, 2015) could benefit from our more accurate measure of drive time. In addition, the estimated coefficients and other parameters from the regression models in both Algorithm 2 and Algorithm 3 can be used as a reference in other studies when such information at the national scale is not available. For studies performed at other geographic scales, such as census tracts, or in other geographic areas, the derived parameters can also be referenced as a baseline. The proposed research method (or framework) can also be replicated in a different country or region of a similar scale. The method has been wrapped into a convenient ArcGIS tool with a user interface, where researchers can easily select input data and change key parameters, such as the constant travel speed, the three predefined hierarchical levels, and the number of requests sent to the Google Maps API, to make the tool work for their own data. We will provide both the tool and the matrix for free download.


Several limitations of this study merit discussion. First, this research considers driving as the only transportation mode. The omission of other modes could be problematic, especially for studies focusing on other trip purposes or on areas where public transit service coverage is high. Future work could, for example, integrate General Transit Feed Specification (GTFS) data into the road network to calculate travel time by transit. In addition, potential users of the derived matrices are advised to proceed with caution when using travel times of some medium- or long-range trips, such as from Alaska to the contiguous U.S., if they need estimates accurate down to minutes. Some of these trips are likely to be made by other modes such as air or train, which are not accounted for by the proposed approach. A related issue is that the use of ferries is permitted in Algorithm 2 by default, which yields much shorter travel times between areas separated by water, e.g., between Michigan and Wisconsin, or between areas with island barriers in coastal regions, than would otherwise be the case. If it is desirable to avoid the use of ferries, one can simply specify one parameter (avoid=ferries) in Algorithm 2, according to the Google Maps API. In any case, the estimated drive times are a good proxy for travel impedance.
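The ferry setting mentioned above can be illustrated with a minimal sketch of how a request might be assembled. It assumes the Google Maps Distance Matrix web service endpoint; the paper only says "Google Maps API", so the endpoint, the function name build_request_url, the coordinates and the placeholder key are illustrative assumptions, not the authors' code. The sketch only builds the URL and sends nothing.

```python
from urllib.parse import urlencode

# Assumed endpoint: the Google Maps Distance Matrix web service.
# The paper does not state which Google Maps endpoint Algorithm 2 calls.
BASE_URL = "https://maps.googleapis.com/maps/api/distancematrix/json"


def build_request_url(origin, destination, api_key, avoid_ferries=True):
    """Build a driving-time request URL for one OD pair (hypothetical helper).

    origin and destination are "lat,lng" strings, e.g. ZIP code centroids.
    """
    params = {
        "origins": origin,
        "destinations": destination,
        "mode": "driving",
        "key": api_key,
    }
    if avoid_ferries:
        # Ask the routing service to avoid ferries where possible.
        params["avoid"] = "ferries"
    return f"{BASE_URL}?{urlencode(params)}"


# Illustrative OD pair on opposite shores of Lake Michigan (made-up centroids).
url_no_ferry = build_request_url("44.25,-86.32", "44.09,-87.66", "YOUR_API_KEY")
url_default = build_request_url(
    "44.25,-86.32", "44.09,-87.66", "YOUR_API_KEY", avoid_ferries=False
)
```

Comparing the drive times returned for the two URLs on a water-separated pair would show how much the default ferry allowance shortens the estimate.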


Second, more work is needed to improve the baseline estimation based on the current division into three hierarchical levels in Algorithm 1. Instead of relying on people’s perceptions, one may design a simulation procedure that examines the national road network and identifies the distances at which it would be most appropriate to simplify the road network. Other components of travel time, such as stops for bathroom breaks, refueling, or sleep, warrant consideration for more reasonable estimates, especially for long-range trips in Level 3. Another issue may arise from the current selection of travel speeds, such as 50 mph in Algorithm 1 and 25 mph in Algorithm 3, in estimating drive times. These values may overestimate drive times between distant ZIP code pairs in Algorithm 1 or within large ZIP code zones in Algorithm 3, where highways and interstates with higher speed limits are more likely to be used. Similarly, the selection of an appropriate distance threshold for snapping locations onto a road network may depend heavily on the geography being studied. More experiments are needed to determine the most appropriate values. In addition, as discussed in Shi (2007), the Monte Carlo randomization in Algorithm 3 would benefit from a process that considers population distribution, such as block-level population data, rather than the ZIP code zone itself. Such a finer geographic resolution would demand additional computation, however. Another potential improvement to Algorithm 3 is to consider the number of road segments or the total length of road segments within ZIP code areas, in addition to perimeter and area.
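One way to picture the population-aware randomization suggested above is the following sketch. It is not the authors' Algorithm 3: the block records, the planar coordinates in miles, the straight-line distances, and the function names are all hypothetical. The 25 mph constant is the only value taken from the text. The sketch contrasts uniform sampling of locations with sampling proportional to block population.

```python
import math
import random


def sample_blocks(blocks, n, rng, weighted=True):
    """Draw n blocks, with probability proportional to population if weighted."""
    weights = [b["pop"] for b in blocks] if weighted else None
    return rng.choices(blocks, weights=weights, k=n)


def mean_intrazonal_minutes(blocks, n_pairs, speed_mph, rng, weighted=True):
    """Average constant-speed travel time over random pairs of block centroids.

    Distances are straight-line in a planar coordinate system (miles), a
    simplification standing in for network distance.
    """
    origins = sample_blocks(blocks, n_pairs, rng, weighted)
    dests = sample_blocks(blocks, n_pairs, rng, weighted)
    total = 0.0
    for o, d in zip(origins, dests):
        dist_miles = math.hypot(o["x"] - d["x"], o["y"] - d["y"])
        total += dist_miles / speed_mph * 60.0  # hours to minutes
    return total / n_pairs


# Hypothetical zone: everyone lives in one corner, the other block is empty.
zone = [
    {"x": 0.0, "y": 0.0, "pop": 100},
    {"x": 10.0, "y": 0.0, "pop": 0},
]
rng = random.Random(42)
t_weighted = mean_intrazonal_minutes(zone, 1000, 25.0, rng, weighted=True)
t_uniform = mean_intrazonal_minutes(zone, 1000, 25.0, rng, weighted=False)
```

In this toy zone the population-weighted estimate collapses to zero because no one lives in the far block, while the uniform estimate does not. This illustrates why sampling by zone geometry alone may misstate typical intrazonal travel.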


Three sources of uncertainty are relevant in this study: (1) the three defined hierarchical levels, (2) the centroid-based representation of a ZIP code zone, and (3) the random sampling of ZIP code pairs in Algorithm 2. For example, a possible solution to the random sampling issue might be to consider different geographies and population sizes (Delmelle et al., 2019). Furthermore, other road network data sources such as OpenStreetMap (OSM) could be employed, especially for regions or countries that do not have access to high-quality road network data. Finally, it is worthwhile to make the proposed method available in a non-ArcGIS environment since ArcGIS is not free to the public, especially for researchers in other countries.

Acknowledgement

Financial support from the National Cancer Institute (NCI), National Institutes of Health, under Grant R21CA212687, is gratefully acknowledged. Points of view or opinions in this article are those of the authors and do not necessarily represent the official position or policies of NCI. Hu would also like to acknowledge the support of the Ralph E. Powe Junior Faculty Enhancement Awards from ORAU (Oak Ridge Associated Universities). Comments from three anonymous reviewers helped us prepare a much improved final version of the paper.

References

Balomenos, G. P., Hu, Y., Padgett, J. E., & Shelton, K. (2019). Impact of Coastal Hazards on Residents’ Spatial Accessibility to Health Services. Journal of Infrastructure Systems, 25(4), 04019028.


Berke, E. M., & Shi, X. (2009). Computing drive time when the exact address is unknown: a comparison of point and polygon ZIP code approximation methods. International Journal of Health Geographics, 8(1), 23.


Bhaskar, A., & Chung, E. (2013). Fundamental understanding on the use of Bluetooth scanner as a complementary transport data. Transportation Research Part C: Emerging Technologies, 37, 42-72.


Bhatta, B. P., & Larsen, O. I. (2011). Are intrazonal trips ignorable? Transport Policy, 18(1), 13-22.


Boscoe, F. P., Henry, K. A., & Zdeb, M. S. (2012). A nationwide comparison of driving distance versus straight-line distance to hospitals. The Professional Geographer, 64(2), 188-196.


Coifman, B. (2002). Estimating drive times and vehicle trajectories on freeways using dual loop detectors. Transportation Research Part A: Policy and Practice, 36(4), 351-364.


De Fabritiis, C., Ragona, R., & Valenti, G. (2008, October). Traffic estimation and prediction based on real time floating car data. In 2008 11th International IEEE Conference on Intelligent Transportation Systems (pp. 197-203). IEEE.


Delmelle, E. M., Marsh, D. M., Dony, C., & Delamater, P. L. (2019). Travel impedance agreement among online road network data providers. International Journal of Geographical Information Science, 33(6), 1251-1269.


Donnelly, F. P. (2015). Regional variations in average distance to public libraries in the United States. Library & Information Science Research, 37(4), 280-289.


Dony, C. C., Delmelle, E. M., & Delmelle, E. C. (2015). Re-conceptualizing accessibility to parks in multi-modal cities: A Variable-width Floating Catchment Area (VFCA) method. Landscape and Urban Planning, 143, 90-99.


El Faouzi, N. E., Klein, L. A., & De Mouzon, O. (2009). Improving drive time estimates from inductive loop and toll collection data with Dempster–Shafer data fusion. Transportation Research Record, 2129(1), 73-80.


Frost, M., Linneker, B., & Spence, N. (1998). Excess or wasteful commuting in a selection of British cities. Transportation Research Part A: Policy and Practice, 32(7), 529-538.


Griffith, D. A., Vojnovic, I., & Messina, J. (2012). Distances in residential space: Implications from estimated metric functions for minimum path distances. GIScience & Remote Sensing, 49(1), 1-30.


Hewko, J., Smoyer-Tomic, K. E., & Hodgson, M. J. (2002). Measuring neighbourhood spatial accessibility to urban amenities: does aggregation error matter? Environment and Planning A, 34(7), 1185-1206.


van Hinsbergen, C. I., van Lint, J. W. C., & van Zuylen, H. J. (2009). Bayesian committee of neural networks to predict drive times with confidence intervals. Transportation Research Part C: Emerging Technologies, 17(5), 498-509.


Horner, M. W., & Murray, A. T. (2002). Excess commuting and the modifiable areal unit problem. Urban Studies, 39(1), 131-139.


Hu, Y., & Downs, J. (2019). Measuring and visualizing place-based space-time job accessibility. Journal of Transport Geography, 74, 278-288.


Hu, Y., & Wang, F. (2015). Decomposing excess commuting: A Monte Carlo simulation approach. Journal of Transport Geography, 44, 43-52.


Hu, Y., & Wang, F. (2016). Temporal trends of intraurban commuting in Baton Rouge, 1990–2010. Annals of the American Association of Geographers, 106(2), 470-479.


Hu, Y., & Wang, F. (2019). GIS-based Simulation and Analysis of Intra-urban Commuting. CRC Press.


Hu, Y., Wang, F., & Wilmot, C. G. (2017). Commuting variability by wage groups in Baton Rouge, 1990–2010. Papers in Applied Geography, 3(1), 14-29.


Hu, Y., Zhang, Y., Lamb, D., Zhang, M., & Jia, P. (2019). Examining and optimizing the BCycle bike-sharing system–A pilot study in Colorado, US. Applied Energy, 247, 1-12.


Ikram, S. Z., Hu, Y., & Wang, F. (2015). Disparities in spatial accessibility of pharmacies in Baton Rouge, Louisiana. Geographical Review, 105(4), 492-510.


Khan, T. S., Kabir, A., Pfoser, D., & Züfle, A. (2019, November). CrowdZIP: A System to Improve Reverse ZIP Code Geocoding using Spatial and Crowdsourced Data (Demo Paper). In Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (pp. 588-591).


Kordi, M., Kaiser, C., & Fotheringham, A. S. (2012). A possible solution for the centroid-to-centroid and intra-zonal trip length problems. In International Conference on Geographic Information Science, Avignon.


Kwon, J., Coifman, B., & Bickel, P. (2000). Day-to-day travel-time trends and travel-time prediction from loop-detector data. Transportation Research Record, 1717(1), 120-129.


Luo, W., & Wang, F. (2003). Measures of spatial accessibility to health care in a GIS environment: synthesis and a case study in the Chicago region. Environment and Planning B: Planning and Design, 30(6), 865-884.


McFadden, D. (1974). The measurement of urban travel demand. Journal of Public Economics, 3(4), 303-328.


Onega, T., Alford‐Teaster, J., & Wang, F. (2017). Population‐based geographic access to parent and satellite National Cancer Institute Cancer Center Facilities. Cancer, 123(17), 3305-3311.


Onega, T., Duell, E. J., Shi, X., Wang, D., Demidenko, E., & Goodman, D. (2008). Geographic access to cancer care in the US. Cancer, 112(4), 909-918.


ReVelle, C. S., & Swain, R. (1970). Central facilities location. Geographical Analysis, 2, 30–34.


Saxon, J., & Snow, D. (2019). A Rational Agent Model for the Spatial Accessibility of Primary Health Care. Working paper available at https://saxon.harris.uchicago.edu/~jsaxon/raam.pdf (last accessed on 5-7-2019).


Semanjski, I. (2015). Potential of big data in forecasting drive times. Promet-Traffic & Transportation, 27(6), 515-528.


Shi, X. (2007). Evaluating the uncertainty caused by Post Office Box addresses in environmental health studies: A restricted Monte Carlo approach. International Journal of Geographical Information Science, 21(3), 325-340.


Shi, X., J. Alford-Teaster, T. Onega, and D. Wang. (2012). Spatial access and local demand for major cancer care facilities in the United States. Annals of the Association of American Geographers 102, 1125–1134.


Simini, F., González, M. C., Maritan, A., & Barabási, A. L. (2012). A universal model for mobility and migration patterns. Nature, 484(7392), 96.


Toole, J. L., Colak, S., Sturt, B., Alexander, L. P., Evsukoff, A., & González, M. C. (2015). The path most traveled: Travel demand estimation using big data resources. Transportation Research Part C: Emerging Technologies, 58, 162-177.


U.S. Census Bureau. (2019). TIGER/Line with Selected Demographic and Economic Data. Available at https://www.census.gov/geographies/mapping-files/time-series/geo/tigerdata.html (last accessed on 9-9-2019).


Wang, F. (2003). Job proximity and accessibility for workers of various wage groups. Urban Geography, 24, 253-271.


Wang, F. (2015). Quantitative Methods and Socio-Economic Applications in GIS. CRC Press.


Wang, F., C. Wang, Y. Hu, J. Weiss, J. Alford-Teaster, T. Onega. (2020). Automated delineation of cancer service areas in northeast region of the United States: a network optimization approach. Spatial and Spatio-temporal Epidemiology 33:100338.


Wang, F., & Xu, Y. (2011). Estimating O–D drive time matrix by Google Maps API: implementation, advantages, and implications. Annals of GIS, 17(4), 199-209.


Woodard, D., Nogin, G., Koch, P., Racz, D., Goldszmidt, M., & Horvitz, E. (2017). Predicting drive time reliability using mobile phone GPS data. Transportation Research Part C: Emerging Technologies, 75, 30-44.


Zhu, Y. J., Hu, Y., & Collins, J. M. (2020). Estimating road network accessibility during a hurricane evacuation: A case study of hurricane Irma in Florida. Transportation Research Part D: Transport and Environment, 83, 102334.


This paper is available on arXiv under CC BY 4.0 DEED license.