My internship with the Social Decisions and Analytics Lab at Virginia Tech gave me a unique opportunity to work in an emerging discipline in which social scientists, statisticians, and software engineers come together to merge data from multiple sources to inform local policy. My role on the team was to develop visualizations that supported “telling the story” in a way that was easily understood.
Within administrative records, multiple types of data are often combined for expediency. By multiple types we mean different sets of data fields, each set representing a different type of observational unit (e.g., property information and listing agent information in the same record). In the restructuring phase, the observational unit types that the project needs must be separated into individual observations or individual data-sets. An example from a housing case study is given here. The data-set provided consisted of single records with 128 fields. Each original record was identified by a unique “List Number.” However, a parcel that was listed twice would have two different “List Numbers.” As a result, changes in a property or parcel over time could not be tracked from these records, because the structure identified only the list number, not the parcel number. Restructuring the data to include the “Parcel ID” made the required historical tracking of changes possible.
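The restructuring idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lab's actual pipeline: the field names (`list_number`, `parcel_id`, `list_date`, `list_price`), the sample values, and the helper `group_by_parcel` are all hypothetical, and the sketch assumes each listing record already carries a parcel identifier.

```python
from collections import defaultdict

# Hypothetical listing records; field names and values are illustrative only.
listings = [
    {"list_number": "L-1001", "parcel_id": "P-17", "list_date": "2012-03-01", "list_price": 250000},
    {"list_number": "L-1002", "parcel_id": "P-42", "list_date": "2013-06-15", "list_price": 310000},
    {"list_number": "L-1003", "parcel_id": "P-17", "list_date": "2015-09-20", "list_price": 289000},
]


def group_by_parcel(records):
    """Group listing records by parcel and order each group by listing date."""
    history = defaultdict(list)
    for record in records:
        history[record["parcel_id"]].append(record)
    for parcel_records in history.values():
        # ISO-formatted dates (YYYY-MM-DD) sort correctly as plain strings.
        parcel_records.sort(key=lambda r: r["list_date"])
    return dict(history)


parcel_history = group_by_parcel(listings)

# Each parcel now has a chronological list of its listings.
for parcel_id, records in parcel_history.items():
    print(parcel_id, [r["list_number"] for r in records])
```

Keying the records on the parcel rather than the listing is what makes a property's history visible: the two listings of the hypothetical parcel `P-17` end up side by side instead of appearing as unrelated records.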
Human Error: Inconsistent Gender Classifications and Data Quality
The animation above illustrates how frequently human error occurs when people complete voter registration and other administrative records. Such errors surface when people complete the same documents multiple times: the wrong box is checked, a question is misunderstood, and so on. For example, the animation shows the same individuals mistakenly submitting documents with different gender classifications. Data accuracy depends on subjects listing their information correctly and consistently, so that changes and patterns can be analyzed, noted, and responded to.
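One simple way to surface this kind of inconsistency is to collect every value reported for the same person and flag anyone with more than one. The sketch below is a minimal illustration, not the method used for the animation: the field names (`person_id`, `gender`), the sample submissions, and the helper `find_inconsistent` are hypothetical, and it assumes records for the same individual have already been linked by a shared identifier.

```python
from collections import defaultdict

# Hypothetical submissions; identifiers and values are illustrative only.
submissions = [
    {"person_id": "A", "gender": "F"},
    {"person_id": "A", "gender": "M"},
    {"person_id": "B", "gender": "M"},
    {"person_id": "B", "gender": "M"},
]


def find_inconsistent(records, key="person_id", field="gender"):
    """Return the people whose records disagree on the given field."""
    values = defaultdict(set)
    for record in records:
        values[record[key]].add(record[field])
    # Keep only the identifiers that were reported with more than one distinct value.
    return {k: sorted(v) for k, v in values.items() if len(v) > 1}


flagged = find_inconsistent(submissions)
print(flagged)
```

A check like this only flags disagreement; deciding which value is correct still requires another source or a manual review.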
This animated table was designed to support a presentation comparing the education levels of citizens aged 18 to 24 in three adjacent Northern Virginia counties.