Tuesday, May 10, 2011

Comparing two point datasets for error checking

Let's say you have been given a set of points that represent the addresses of snake farms [insert object here]. You have also been given the raw address data, and you want to check the quality of the geocoded point file by geocoding the addresses yourself using the best road data you can find. You would then want to know the distance between each pair of "identical" points (the given point and your own point for the same address) to check the data for errors.

As an exercise I generated 2 random sets of points (100 points each) and arbitrarily joined them based on an ID field of 0-99. In this case NONE of the points will actually match, because both were supposed to be random point sets, but the method is the same. One way to figure out the distance between points that should be the same in each file is to generate an X and a Y field for each point in each dataset (x1,y1 and x2,y2). Then join the two datasets on the ID field, create a new field called perhaps "distance", and use the field calculator in ArcGIS to fill it with sqr((x1-x2)^2 + (y1-y2)^2), where sqr is the square root function. Your results will be in whatever unit your coordinate system measures in. I used NAD83 State Plane Texas North Central for the example.
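If you would rather do the same calculation outside the field calculator, here is a minimal sketch in Python. It assumes each dataset has already been exported to a dictionary of ID -> (x, y) in the same projected coordinate system; the function name and the sample coordinates are made up for illustration and are not from the exercise above.

```python
import math


def paired_distances(points_a, points_b):
    """Return {id: distance} for every ID found in both datasets.

    Each argument is a dict of id -> (x, y) in the same projected
    coordinate system, so distances come out in that system's unit.
    """
    distances = {}
    for point_id, (x1, y1) in points_a.items():
        if point_id in points_b:
            x2, y2 = points_b[point_id]
            # Same formula as the field calculator expression:
            # square root of (x1-x2)^2 + (y1-y2)^2
            distances[point_id] = math.hypot(x1 - x2, y1 - y2)
    return distances


# Hypothetical coordinates, purely for illustration
given = {0: (2400000.0, 7000000.0), 1: (2401000.0, 7002000.0)}
regeocoded = {0: (2400003.0, 7000004.0), 1: (2401000.0, 7002000.0)}

result = paired_distances(given, regeocoded)
for point_id in sorted(result):
    print(point_id, result[point_id])
```

IDs that appear in only one dataset are skipped, which is roughly what an inner join would give you.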


Another method involves using the free ET GeoWizards tool "point to polyline". To do this you would copy both sets of points into a 3rd shapefile and run the tool on that combined file. This method is nice because you get a line connecting each pair of points with identical addresses, and you could color ramp the lines by length to look for the pairs with large differences in distance.
2 point datasets with lines connecting related points 
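The idea behind the connecting lines can also be sketched in plain Python. This is not how ET GeoWizards does it; it is only an illustration that assumes the same hypothetical ID -> (x, y) dictionaries, builds one segment per matched ID, and sorts the segments longest first so the suspicious pairs come out on top.

```python
import math


def connecting_lines(points_a, points_b):
    """Build (id, start, end, length) records for IDs in both datasets,
    sorted with the longest connecting line first."""
    lines = []
    for point_id in points_a.keys() & points_b.keys():
        start = points_a[point_id]
        end = points_b[point_id]
        length = math.hypot(end[0] - start[0], end[1] - start[1])
        lines.append((point_id, start, end, length))
    # Longest lines first: these are the pairs worth checking by hand
    lines.sort(key=lambda record: record[3], reverse=True)
    return lines


# Hypothetical coordinates, purely for illustration
set_one = {0: (0.0, 0.0), 1: (10.0, 10.0), 2: (50.0, 50.0)}
set_two = {0: (0.0, 1.0), 1: (40.0, 50.0), 2: (50.0, 50.0)}

ranked = connecting_lines(set_one, set_two)
for point_id, start, end, length in ranked:
    print(point_id, start, end, length)
```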
