Module 1.3 - Data Quality - Assessment

In this module I learned how to assess the quality of a road network based on its completeness, and how to summarize my findings in textual, visual, and numerical terms. Given a set of geographic data, I carried out a road network quality assessment in ArcGIS Pro.

The study area for the exercise was Jackson County, OR. I was given the county boundary as a polygon shapefile, a grid of 25x25 km cells covering the county, and two road network polyline shapefiles, one derived from street centerline data and one from TIGER data. After adding all the files to a new map in ArcGIS Pro as feature layers, my first step was to calculate the total length of all the lines making up each road network layer. To do so accurately, I created a new field in each network's attribute table and used Calculate Geometry to populate it with lengths in kilometers. Using the Statistics option for that field in each attribute table, I recorded the sum of the lengths of all features in the layer. Assuming that a greater total length indicates a more complete network, I was able to determine which network was more complete overall.
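The total-length comparison above can be sketched outside ArcGIS Pro as well. The snippet below is only an illustration: the function names and the sample coordinates are hypothetical, and it assumes each polyline is a list of (x, y) vertices in a projected coordinate system measured in metres. It is not how Calculate Geometry works internally.

```python
import math


def polyline_length_km(vertices):
    """Planar length of one polyline, given (x, y) vertices in metres."""
    metres = sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))
    return metres / 1000.0


def total_length_km(polylines):
    """Sum of the lengths of all polylines in a layer, in kilometres."""
    return sum(polyline_length_km(vertices) for vertices in polylines)


# Hypothetical sample layers (made-up coordinates, not the Jackson County data).
centerlines = [[(0, 0), (3000, 4000)], [(0, 0), (1000, 0), (1000, 2000)]]
tiger = [[(0, 0), (3000, 4000)]]

centerline_total = total_length_km(centerlines)
tiger_total = total_length_km(tiger)

# Working assumption from the exercise: the longer network is the more complete one.
more_complete = "street centerlines" if centerline_total > tiger_total else "TIGER"
print(centerline_total, tiger_total, more_complete)
```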

Next, to achieve a more meaningful analysis, I needed to determine which road network was more complete on a cell-by-cell basis. This analysis follows Haklay's methodology in "How good is volunteered geographic information? A comparative study of OpenStreetMap and Ordnance Survey Datasets" (2010). To begin, I ran the Pairwise Intersect analysis tool on both the street centerline and the TIGER road shapefiles. Intersecting each of them with the grid cell polygon shapefile kept only the polylines that fell within the grid and excluded any outside it. It also split the polylines at the grid cell boundaries. Next, I ran the Summarize Within tool on each of the Pairwise Intersect outputs, with the grid cell shapefile as the Summary Polygons parameter, kilometers as the shape units, and the length fields I had calculated for each of the original road files as the Standard Summary Field parameter. The results were two feature layers, each containing a field with the sum of the road lengths in each grid cell. With these attribute tables, I could then use Excel to perform the necessary numerical analyses. Using an IF function in Excel, I coded each grid cell by comparing its street centerline and TIGER road values. A “1” meant the street centerline data was more complete for that cell, and a “0” meant the TIGER data was more complete. Using Conditional Formatting in Excel, I then found the total number of grid cells where each data source was more complete. My findings are represented by the following table:


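The cell-by-cell coding done with the Excel IF function can be sketched as follows. This is a minimal illustration under stated assumptions: the cell IDs and kilometre totals are invented, the function name is hypothetical, and a tie is coded 0, which is what a formula of the form IF(centerline > TIGER, 1, 0) would produce (the exact formula used is not shown in this post).

```python
def flag_cells(centerline_km, tiger_km):
    """Return {cell_id: 1 or 0}; 1 means the street centerline total is longer.

    Assumption: ties and cells missing from the TIGER summary are coded 0 or
    compared against 0.0 km respectively.
    """
    return {
        cell_id: 1 if centerline_km[cell_id] > tiger_km.get(cell_id, 0.0) else 0
        for cell_id in centerline_km
    }


# Hypothetical per-cell road length sums in kilometres (not the real results).
centerline_km = {1: 120.5, 2: 80.0, 3: 45.2, 4: 10.0}
tiger_km = {1: 110.0, 2: 95.3, 3: 45.2, 4: 8.7}

flags = flag_cells(centerline_km, tiger_km)
cells_coded_1 = sum(flags.values())        # street centerlines more complete
cells_coded_0 = len(flags) - cells_coded_1  # TIGER more complete, or a tie
print(flags, cells_coded_1, cells_coded_0)
```

Counting the flags this way gives the same two totals that were read off the spreadsheet, one per data source.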
Finally, I needed to represent my findings visually using an ArcGIS Pro Layout. I began by calculating the differences between the total street centerline length and the total TIGER length in each grid cell using a function in Excel. I exported the sheet as a .csv file and opened it as a Standalone Table in ArcGIS Pro. I needed to join this data to the grid cell feature layer, but this proved difficult at first. I realized that, for the join to work, the data types of the join fields in the .csv table and the grid cell feature layer needed to match. Once I cleaned this up, I was able to attach the percent differences to the grid cell layer using a Join. This allowed me to represent the data as a choropleth with a diverging color scheme in Symbology. The final result of the analysis was as follows:




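The join problem described above, where the key fields had different data types, can be reproduced in a few lines. The sketch below is only an analogy for what happened in ArcGIS Pro: the field names cell_id and diff_km, the sample values, and the direction of the mismatch (text keys in the .csv, integer keys in the grid) are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical .csv export: csv.DictReader reads every value as text.
csv_text = "cell_id,diff_km\n1,10.5\n2,-15.3\n3,0.0\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Hypothetical grid cell records whose key field is an integer.
grid_cells = [{"cell_id": 1}, {"cell_id": 2}, {"cell_id": 3}, {"cell_id": 4}]

# Naive join: the text key "1" never equals the integer key 1, so nothing matches.
naive_lookup = {row["cell_id"]: row["diff_km"] for row in rows}
unmatched = [cell for cell in grid_cells if cell["cell_id"] not in naive_lookup]

# Fixed join: cast the key (and the value) to the types used by the grid layer.
lookup = {int(row["cell_id"]): float(row["diff_km"]) for row in rows}
joined = [{**cell, "diff_km": lookup.get(cell["cell_id"])} for cell in grid_cells]

print(len(unmatched))  # every grid cell fails to match before the cast
print(joined)          # cell 4 has no row in the table, so its value is None
```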