OpenNews/hackdays/storyandalgorithm/departmentofdefensedatadig
From MozillaWiki
- Project name: Department of Defense data dig
- One-line description of project: Checking defense contracts against campaign donations.
Dept. of Defense Data Dig
- Your team: Please list project team members
Nicola Hughes
Peter Richardson
Al Shaw
- Project URL(s), if applicable:
- Hashtag, if #relevant:
- What are you building: What will the thing you are creating do, enable or solve, in 1 human-readable paragraph
Command-line tools to compare giant CSV files containing US Department of Defense contracts against data from open government APIs showing political contributions.
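The comparison described above could look roughly like the following. This is a minimal sketch, not the team's actual code: the column names `vendor_name` and `amount`, the suffix list in `normalize`, and the shape of the `contributions` dict (donor name mapped to total donated, as it might be assembled from an API response) are all assumptions made for illustration.

```python
import csv
import re
from collections import defaultdict


def normalize(name):
    """Crude name cleanup so 'Acme Corp.' and 'ACME, Inc' compare equal."""
    name = re.sub(r"[^A-Z0-9 ]", "", name.upper())
    # Hypothetical suffix list; a real one would need to be much longer.
    name = re.sub(r"\b(INC|CORP|CORPORATION|CO|LLC|LTD)\b", "", name)
    return " ".join(name.split())


def total_contracts_by_vendor(csv_file, name_col="vendor_name", amount_col="amount"):
    """Stream the contracts CSV row by row and sum amounts per vendor."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[normalize(row[name_col])] += float(row[amount_col] or 0)
    return dict(totals)


def match(contract_totals, contributions):
    """Return vendors that also appear as donors: name -> (contracts, donations)."""
    donors = defaultdict(float)
    for name, amount in contributions.items():
        donors[normalize(name)] += amount
    return {
        vendor: (contract_totals[vendor], donors[vendor])
        for vendor in contract_totals
        if vendor in donors
    }
```

Because the CSV is read one row at a time, only the per-vendor totals are held in memory, not the whole file. Exact matching on normalized names will miss subsidiaries and misspellings, so any hit is a lead to check by hand rather than a finding.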
- Who is it for: Describe your target user audience, e.g. "Everyone who uses the internet", "people in high risk environments", "People who still read 'Peanuts'"
Muck-raking trouble-makers who are likely to wind up on a list somewhere
- Your goal for this weekend: where are you trying to get to by 2:45pm Sunday in terms of features, functionality or other creations
A working pipeline: data in -> data out
- Your starting point: Are you writing from scratch? Designing a new concept? Modeling a data space? Extending a codebase or library? Adding a feature to a platform? Combining tools?
We have a large dataset from http://usaspending.gov/data and enough knowledge of Python and shell scripting to get into trouble.
- Anything else we should know: At most 1 non-long paragraph of useful additional context
It's unlikely we'll find anything suspicious; I've always found politicians and corporate types to be an honorable bunch of super-nice people.
- How is this project useful?
The project helps show whether defense contracts and campaign donations are connected. It is the first step in a larger project looking at lobbying influence between politicians and corporations.
- Where is this project going and what lessons/concepts can be applied to other projects?
Nicola will be following up on the project. It could lead to a bigger-picture project, expanded to lobbying in other areas for example, or to a narrower project following a single contract. The project also involved figuring out techniques for manipulating large datasets, using APIs with large datasets, and putting together command-line tools and scripts to get a handle on large amounts of data. Lessons can be shared on what size of dataset causes things to crash.
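One common way to keep a large file from crashing a script is to process it in fixed-size batches rather than loading it all at once. The sketch below shows that general technique only; the write-up does not say which approach the team used, and the helper names `batches` and `batch_sizes` and the default batch size are hypothetical.

```python
import csv
from itertools import islice


def batches(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def batch_sizes(csv_file, size=10000):
    """Read a CSV in batches and report how many rows each batch held."""
    reader = csv.DictReader(csv_file)
    return [len(batch) for batch in batches(reader, size)]
```

Each batch could be handed to an API lookup or written out before the next one is read, so memory use is bounded by the batch size rather than by the size of the file.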