Wednesday, November 16, 2011

Solar Power System

I spent the last two weeks near Bayfield CO with my parents building a new solar power system.
Specs:
6x 230-watt Canadian Solar panels configured as 2 series strings of 3 at 90VDC
4kW Schneider Electric charger/inverter with 8kW 10-second surge capacity
60-amp Schneider MPPT charge controller
12x 6-volt 370AH deep-cycle batteries wired for 24 volts at 1110AH

The build went smoothly and everything worked on the first flip of the switches. When I left at 10am, roughly three hours before solar noon, the output was 1180 watts: almost 30 volts at almost 40 amps. I'm not sure yet how high the output got by 1pm, assuming the sky stayed clear.

Wednesday, October 19, 2011

Setting up a spatial PostgreSQL database with PostGIS

My machine is running Ubuntu 10.04 x64. This assumes you already have PostgreSQL and PostGIS installed.
First create the PostgreSQL database:
createdb yourdatabase

Then add plpgsql support to that database:
createlang plpgsql yourdatabase

Then you need to import these two SQL files into that database to set up the PostGIS functions:
psql -d yourdatabase -f /usr/share/postgresql/8.4/contrib/postgis-1.5/postgis.sql
psql -d yourdatabase -f /usr/share/postgresql/8.4/contrib/postgis-1.5/spatial_ref_sys.sql

That's it!
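If you want to confirm the spatial functions actually loaded, PostGIS's own version function makes a quick check:
psql -d yourdatabase -c 'SELECT postgis_full_version();'
It should print the PostGIS, GEOS, and PROJ versions the database is using.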
QGIS has a nice plugin for pushing shapefiles into PostGIS-enabled PostgreSQL databases called SPIT (Shapefile to PostgreSQL Import Tool).

Friday, September 16, 2011

Change in unemployment rate county by county, coupled with total economic recovery spending

Courtesy of Development Seed

SEXTANTE

"SEXTANTE is a spatial data analysis library written in Java. The main aim of SEXTANTE is to provide a platform for the easy implementation, deployment and usage of rich geoprocessing functionality. It currently contains more than three hundred algorithms for both raster and vector data processing, as well as tabular data analysis tools. SEXTANTE integrates seamlessly with many open source Java GIS (such as gvSIG, uDig or OpenJUMP) and non-GIS tools (such as the 52N WPS server or the spatial ETL Talend)."

Friday, September 9, 2011

Preparing 2010 census data

Preparing 2010 census data found in the SF1 files. The census instructions are helpful, but they describe extremely slow processes for preparing the data for import into MS Access.

On page 5 of the instructions they tell you: "All files with an .sf1 extension must be changed to .txt files. Right click on the first file with a .sf1 extension. Choose “Rename” and change the .sf1 portion of the name to .txt and hit Enter. Repeat for each file with a .sf1 extension."
This is incredibly slow. Instead, open a CMD window and type:
ren *.sf1 *.txt
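If you're working on the Linux side instead (as with the sed step below), a plain bash loop does the same job:
for f in *.sf1; do mv "$f" "${f%.sf1}.txt"; done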


Next, on page 7, they tell you to use WordPad to find and replace text in several huge text files; that method also turns out to be incredibly slow. Instead, do it with the Linux sed command:

cat tx000062010.txt | sed -e 's/SF1ST,TX,000,06,//' > tx000062010mod.txt
This command finds the pattern between the first two forward slashes and replaces it with the pattern between the second and third forward slashes (in this case, nothing). It took about 4 seconds to process a 565MB file on a quad-core AMD machine with 8GB of memory; it was going to take hours using WordPad's find-and-replace tool. Turns out you don't need to cat the file and pipe it to sed. A good friend with way more experience using Unix tools and programming than I have also says that awk is easier to use, and I have to agree.
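For reference, the same substitution without the cat, passing the file name straight to sed:
sed -e 's/SF1ST,TX,000,06,//' tx000062010.txt > tx000062010mod.txt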


Count the number of fields in a comma-delimited txt file with awk.

gawk -F"," '{ print NF ":" $0}' textfile.csv
sample output: 260:SF1ST,TX,000,45,0000438,0,0,0,0,0,0,0,0,1,0,1,0,1.00,1.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,1.00,1.00,0.00,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,1
Or just:
awk -F"," '{print NF}' tx000452010.txt
sample output: 260 260 260 ....
This text file has 260 fields per line, of which I want to extract the first 239:
cut -d ',' -f1-239 tx000452010.txt > tx000452010part1.txt

Excluding a field range and writing to a new file
gawk -F"," -v f=6 -v t=239 '{ for (i=1; i<=260;i++) if( i>=f && i<=t) continue; else printf("%s%s", $i,(i!=260) ? OFS : ORS) }' tx000452010.txt > tx000452010part2.txt
Replace the default separator (space) with a comma:
awk '{gsub(/ /,",");print}' tx000452010part2.txt > tx000452010part_2.txt

This could be done in a single command if I knew how.
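Here's one way it could probably be collapsed into a single gawk pass (a sketch, not tested against the real files): set the output separator to a comma and only append the fields outside the excluded range.
gawk -F"," -v OFS="," -v f=6 -v t=239 '{ out=""; for (i=1; i<=NF; i++) if (i<f || i>t) out = (out=="" ? $i : out OFS $i); print out }' tx000452010.txt > tx000452010part_2.txt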
Import the text files into a PostgreSQL database, because MS Access has a 2GB file limit and holding all of this data in one database easily puts you over 10GB.
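One way to do the import, assuming you've already created a table with one column per field (the table name "SF1_00001" and database name "census2010" below are just placeholders), is psql's \copy:
psql -d census2010 -c '\copy "SF1_00001" FROM tx000452010part1.txt WITH CSV'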
Add a new field for building the geoid:
ALTER TABLE "SF1_Access2003_mdb"."SF1_00001" ADD COLUMN geoid text;
Concatenate the fields to build the geoid for the block summary level. Hint: if you take the left 12 characters of this result, you get the geoid for the block group level, and so on.
UPDATE "SF1_00002" SET geoid = "Txgeo2010"."STATE" || "Txgeo2010"."COUNTY" || "Txgeo2010"."TRACT" || "Txgeo2010"."BLOCK" FROM "Txgeo2010" WHERE ("SF1_00002"."LOGRECNO" = "Txgeo2010"."LOGRECNO");

...to be continued

Wednesday, June 1, 2011

Generalizing parcel data

Raw data -> Google Refine for clustering and filtering -> join refined data to spatial data -> dissolve parcels using the refined subdivision attribute -> buffer to fill in ROWs -> copy buffer to the original dissolve layer -> dissolve again on the subdivision attribute -> set up a "must not have gaps" topology rule with a tolerance of ~1 foot -> validate the topology (removes donut-hole slivers) -> buffer by a negative value equal to the original buffer to remove overlap of adjacent subdivision polygons -> minor edits to clean up the data.

Wednesday, May 18, 2011

Compiling gdal with ECW and MrSID support

Well, I wanted to generate an MBTiles database of some imagery using raster2mb, a Python script based on gdal2tiles, but the imagery was in ECW format. I wanted to use the result as a baselayer in the Mapbox iPad app.

I had GDAL (Geospatial Data Abstraction Library) installed, but the binary available from the repositories does not include this support built in. So, svn checkout the latest stable trunk of GDAL and find libecwj2_3.3-1_i386.deb (that is not trivial, as ERMapper doesn't support Linux in their new SDK builds [4.2] and they don't host version 3.3 anymore; alas, I found someone hosting it on MediaFire. Remember, Google is your friend.)

Tips and hints for building GDAL from source are found here. Find yourself the aforementioned ECW SDK library (read-only; the format is proprietary and you have to pay for a license to write it, but reading it and then converting to a friendlier format can be done with the r/o SDK).

While you're at it, you might as well download the MrSID SDK (free registration required) and add it to your ./configure arguments, because lots of data is freely available in this format and it'd be nice to manipulate/convert it with GDAL.
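The relevant configure switches are --with-ecw and --with-mrsid; the paths below are only examples and should point at wherever you unpacked each SDK:
./configure --with-ecw=/usr/local --with-mrsid=/opt/Geo_DSDK
make
sudo make install
Afterwards, gdalinfo --formats should list ECW and MrSID among the supported drivers.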

good luck. :)

gdalwarp -t_srs EPSG:900913 -of GTiff dallas.ecw dallas.tif
Creating output file that is 43163P x 50580L.
Processing input file dallas.ecw

Tuesday, May 10, 2011

Comparing two point datasets for error checking

Let's say you have a set of points given to you that represent addresses of snake farms [insert object here]. You've also been given the raw address data, and you want to check the quality of the geocoded point file by geocoding the points yourself using the best road data you can find. Then you might want to know the distances between all the "identical" points to check your data for errors.

As an exercise, I generated two random sets of points (100 points each) and arbitrarily joined them based on an ID field of 0-99. One way to figure out the distance between points that should be the same in each file (although in this case none will be the same, because both were supposed to be random point sets): generate an X and Y field for each point in each dataset (x1,y1 and x2,y2), join the two datasets, create a new field called perhaps "distance", and using the field calculator in ArcGIS compute sqr((x1-x2)^2 + (y1-y2)^2). Your results will be in whatever unit your coordinate system measures in. I used NAD83 State Plane Texas North Central for the example.


Another method involves using the free ET GeoWizards "point to polyline" tool. To do this, you would copy both sets of points into a third shapefile. This method is nice because you get a line connecting points with identical addresses, and you could color ramp them to look for ones with large differences in distance.
2 point datasets with lines connecting related points 

Thursday, April 21, 2011

Mapbox, TileMill and TileStream

TileMill is pretty awesome. You can quickly prepare data as shapefiles, zip them up, and then theme them using TileMill (you need to install it on OS X or Linux; I'm using Ubuntu 10.10). Once you have it installed you can very quickly theme a map using CSS-like style code, then export your map to the MBTiles format and start serving it up using TileStream. I built both from source after mirroring the projects from their git repositories.
This screenshot is of the first map I am working on. It uses some of the data the Mapbox team has stored on their Amazon S3 account and a shapefile of the latest TEA district boundaries. I'm working on making it more data rich. There are ways to use your custom MBTiles data with the Google Maps API and with OpenStreetMap data.
Simple "MSS" style code generates most of the above map; two other stylesheets are also being used, one for labeling (with various definitions for different zoom levels) and one to color all the school districts contrasting colors. The colors can be defined by attribute data read from the shapefile DBF.
Map {
  background-color: #fff;
}
 #states {
  line-color: #002bff;
  line-width: 1;
}
#lakes {
  polygon-fill:@water;
  line-color:#002bff;
  line-width:0.4;
  line-opacity:0.4;
}
#world {
  ::outline {
    line-color: #000;
    line-width: 4;
  }
  line-color: #fff;
  line-width: 2;
}
#districts {
  line-color: rgba(0,0,0,0.75);
  line-width: 1;
}

Development Seed has designed and released a free iPad app, Mapbox. Once you've got map stylesheets down using Carto in TileMill, you generate the MBTiles file, copy it over to your iPad, and voila: an instant custom map that works very fast even with HUGE datasets. I tested it with a shapefile of over 300,000 polygons and it worked very well and was stable too. This is because the data is converted to images and sliced into tilesets, so it doesn't have to render the millions of points needed to make all those polygons.

Monday, April 11, 2011

Code!

I've used a few snippets of code from http://gis.utah.gov/code
The last one I used was the spatial join of points and polygons without creating an extra (trash) dataset.

Saturday, February 19, 2011

Fuzzy String Matching

*Update* Check out Google Refine for cleaning dirty data.

I wanted to generate numbers that represented a sort of "sameness" or "matchiness" of two strings; that's how I was thinking about it. This led me to learning a little about fuzzy string matching using a method called Levenshtein distance. I didn't need to understand how to write the algorithms; I just wanted to use the tools.
ArcGIS has adopted Python as its scripting language. I downloaded, compiled, and installed a module called pylevenshtein, which will compute the Levenshtein edit distance as well as other measures for comparing two strings (I had to install Microsoft Visual Studio 2008 Express, which is free). Once this is set up, it's rather easy to use the various string comparison algorithms in the field calculator.
I'm now figuring out which function, or combination of functions, to use and whether it's any better than my current method. The Jaro distance is looking good.
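For a quick sanity check outside ArcGIS, something like this should print a Jaro similarity between 0 and 1 (assuming the package installs as the Levenshtein module; the owner names are made up, and this is Python 2 syntax as was current at the time):
python -c "import Levenshtein; print Levenshtein.jaro('ACME PROPERTIES LLC', 'ACME PROPERTYS LLC')"
Values close to 1 mean the strings are nearly identical, which is the kind of "matchiness" number I was after.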

update:
I've decided that the method I was already using was best for me, perhaps I can write a script to automate some of the steps involved.
Step 1: select where old name <> new name
Step 2: export this selection
Step 3: delete all unnecessary fields (to reduce dataset size and ease the manual review steps)
Step 4: create two new text fields 4 characters in length
Step 5: populate the new text fields with left(old name, 4) and left(new name, 4)
Step 6: select where left4_old = left4_new
Step 7: scan selected parcels > X acres (depending on size of dataset)
Step 8: delete the manually corrected selection of left4_old = left4_new

On spot checking, the results of this method are very good. I found it difficult to translate the various edit distance algorithms from the pylevenshtein package into something meaningful that would improve the accuracy and speed of developing change data. The exercise was to try to find a better/faster way, and I ended up going back to my original method. Maybe that's because my method is easier to comprehend.

One goal for generating the data is to highlight ownership changes over time on a map. This could also be useful when looking at older subdivisions and gauging the regeneration rate.

Friday, February 11, 2011

NIR to fake NC

You can fake natural color imagery from near-infrared imagery by playing with the 3 (or more) bands.
Here I took some 2010 NAIP imagery, clipped it to the area of interest, and set the RGB bands to:
Red=Band 2
Green=Band 3
Blue=Band 3
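If you'd rather bake that same band mapping into a new file instead of just changing the display settings, GDAL's gdal_translate can do it with repeated -b flags (the file names here are placeholders):
gdal_translate -b 2 -b 3 -b 3 naip2010_cir.tif naip2010_fake_nc.tif
The output bands are written in the order the -b flags are given, so band 2 becomes red and band 3 is used for both green and blue.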

Remote Sensing

Here's a link to a tutorial with a lot of history published by someone at NASA.

Friday, January 14, 2011

Supervised Image Classification

Related to the previous post about vacant land inventory, this is an exercise in delineating land use types using image analysis. Previously I used parcel vector data and the linked attribute data, looking for parcels over a certain size with a low or zero improvement value, to locate vacant land.

First, I've got a 2010 3-band CIR image to work with. I've started with 5 basic classes (water, forested, pasture, bare earth, pavement) and manually defined several training regions for each class using a polygon lasso tool. Waiting for processing results....

Initial results are not great. I seem to have classified the wooded areas very well, but the road/water/bare earth classes probably need more training regions placed with better precision. Next: try this on a smaller dataset so it doesn't take days to process, and see how good or bad the results are.