Knight-Mozilla-MIT "Story and Algorithm" Hack Day
The Knight-Mozilla OpenNews project is sponsoring a 24-hour hack day as a lead-in to the 2012 MIT-Knight Civic Media Conference. While the conference is invite-only, the hack day is open to talented developers who want to spend their weekend working with others to build amazing things.
Following the conference theme of "the Story and the Algorithm," this hack day will be focused on new ways that data lets us tell compelling stories.
If you're tweeting about this hackday, please use the #storyhack hashtag
- Where: MIT Media Lab 5th Floor 75 Amherst Street Cambridge, MA 02139
- When: 3pm sharp Saturday June 16 to 4pm Sunday June 17
- Will there be food? Yes, there will be food. We will be providing dinner and late-night snacks on Saturday night and breakfast and lunch on Sunday.
- What to bring: You will need to bring your own laptop and power supply.
- We'll supply the WiFi, the plugs, and collaboration and brainstorming materials like post-its, sharpies, etc.
- 3pm Opening Circle
- 6pm Dinner
- 9pm Late-night Snack
- 10pm Building closes
- 9:00am Building opens
- 9:30am Breakfast
- 12:30pm Lunch
- 2:45pm Show and Tell
- 3:45pm Closing Circle
Here are the teams working on projects this weekend:
- Condition of anonymity
- Department of Defense data dig
- Dim sum news
- jAngels: Visualizing journalism angel funding
- News your own adventure
- Progressive Voter Index
To pre-seed ideas for the hack day, we invited some of the attendees to outline some basic concepts of things to build/problems to solve. If you like these, please add a +1 next to the item. If you have your own ideas, please feel free to add them below.
- Tool-building around Senate race data, by working with the Brown/Warren race in Massachusetts. It's got a very nice local hook, and also massive national repercussions. Could be lots of ways to slice and dice it.
- Distributary: http://newschallenge.tumblr.com/post/19486363659/distributary -- The idea is that you submit a topic / location / theme and the system generates a Twitter list for you of people to follow who are experts in that topic / location / theme. Metaphor can be found here: http://en.wikipedia.org/wiki/Distributary
- Meta Meta: The Meta Meta Project is a tool which provides a simple service: take in any piece of media, spit out all the meta possible. Project Wiki.
- ATTN-SPAN: Take government access footage, process it, generate personalized video clip shows containing personally relevant primary source info. http://slifty.com/2011/08/learning-lab-final-project-attn-span/.
- News Your Own Adventure: An automated choose your own adventure news mashup. Take multiple articles about the same event, process them all to create a network of related paragraphs, associate key topics to each chunk through keyword extraction, then present them in a way that lets the reader decide what they want to learn more about. As you read it you can make choices to read different parts of each article that have been stitched together.
- Reporting dashboard - An admin and dashboard for data mining, analysis and aggregation of data sources around a specific beat or special project. For example, say you are reporting on an area of a city that has an anomaly of high crime. The editor has a gut feeling that there is something there but is now on a fishing mission. So they get tons of data sets from different civic silos, and now cross-analyze them. Using Refine, clean and normalize the sets … the system is plugged into a deeper database that houses all of the newsroom's datasets and allows the user to query from project to project to find stories. We are adding in the Globe's API hooks to help find archive stories … And are pulling in geolocated social media (tweets, instagram, foursquare, etc) to help the reporters find characters within that community. Like I said, we are in the early stages of this, happy to hear any thoughts or ideas… I think this is a system that could be open-sourced (maybe built on top of PANDA).
- DocumentCloud for datasets - Create notes, or a system for annotating datasets. Or possibly create notes on an intersection of multiple datasets. Also, give the public data a home and cataloging system. Maybe there is a part of this that could be worked on?
- Taking inspiration from the just-launched ProPublica election campaign e-mail tracker, extend it to physical mailings. A way for people to shoot their mailers with their phone and/or scan and e-mail in to a system that could OCR and add metadata about candidate, topic etc, create a searchable database of them.
- Seamless sharing: (in response to the circulating IA article on sharing tools, http://informationarchitects.net/blog/sweep-the-sleaze/ ) A system to detect on page load if the user’s browser/device made sharing seamless for the user… and if so, it drops the call to load sharing buttons at all. I wonder if there is some potential for a POC here.
- DataCouch: GitHub for data. As a simple, quick conversion tool, DataCouch can take any dataset and make it into a programmatically accessible API. This tool was started as a Code for America project; pick it up and hack on it more. http://codeforamerica.org/?cfa_project=data-couch
- supercool.io: The local storage capabilities in browsers and mobile devices are, IMHO, underutilized in news apps and I think we all might benefit from a project that made them easier to use and standardized some of the data types so we could more readily share code across organizations and apps. UPDATE: Thinking about it today and looking back at the Meta Meta project I'm thinking a doable portion of this would be to make a JSON serialization of select parts of rNews and model them for local storage using backbone.js and its local storage plugin or another local storage abstraction library -- @mterenzio
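To make the News Your Own Adventure idea above a little more concrete, here is a minimal Python sketch of the pipeline it describes: split several articles about the same event into paragraphs, attach keywords to each chunk, and link chunks from different articles that share a keyword. This is only one possible starting point, not the team's implementation. The function names, the tiny stopword list, and the frequency-count approach to keyword extraction are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stopword list (an assumption; a real project would use a fuller one).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "on", "for", "is",
             "was", "that", "with", "by", "at", "as", "it", "from", "said"}


def split_paragraphs(article):
    """Split an article into non-empty paragraphs on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", article) if p.strip()]


def extract_keywords(paragraph, top_n=3):
    """Naive keyword extraction: the most frequent non-stopword tokens."""
    words = re.findall(r"[a-z']+", paragraph.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return {word for word, _ in counts.most_common(top_n)}


def build_network(articles, top_n=3):
    """Return (chunks, links).

    chunks: one dict per paragraph with its source article index and keywords.
    links: maps a chunk index to (other chunk index, shared keywords) pairs,
    connecting only paragraphs that come from different articles.
    """
    chunks = []
    for article_id, article in enumerate(articles):
        for paragraph in split_paragraphs(article):
            chunks.append({
                "article": article_id,
                "text": paragraph,
                "keywords": extract_keywords(paragraph, top_n),
            })

    links = {i: [] for i in range(len(chunks))}
    for i, j in combinations(range(len(chunks)), 2):
        if chunks[i]["article"] == chunks[j]["article"]:
            continue  # only stitch together paragraphs from different articles
        shared = chunks[i]["keywords"] & chunks[j]["keywords"]
        if shared:
            links[i].append((j, sorted(shared)))
            links[j].append((i, sorted(shared)))
    return chunks, links
```

A reader-facing front end could show one chunk at a time and offer each entry in `links` as a choice ("read more about ..."), which is where the choose-your-own-adventure part would come in.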
Tools & APIs
Have tools or APIs that would be helpful for this hack weekend? List them here.
- Boston Globe API - Chris Marstall has an early version of this that we made available for the Boston Innovation Challenge a few weeks back. I am looking into getting the blessing to have this available for OpenNews … I am pretty sure that it will be OK. It's a rough, hacked-together version of something like the nytimes API. It offers HTTP/JSON-based querying of Globe articles from the past 6 months based on date, keyword, article type, lat/long, etc. Chris has offered to do a quick walk-through using Google Refine to navigate it in lieu of a presentation or demo.
- Commuter rail data - The Boston Globe has been storing a little over six months of Boston-area commuter rail data. There are about 23.8m entries stored (and growing). It includes the names of the trains, date/time it left, how late it was, station it was at, etc …
- Mass Elections Data - Boundary data for political divisions from the state’s GIS office. Much of the data, in the form of shapefiles, is available on their web site. Here’s a good collection: http://www.mass.gov/mgis/laylist.htm. There are separate files for congressional, legislative, county, and governor’s council districts. That’s what ethepeople uses to map specific addresses to the correct congressional, state, and county districts. There are a few gotchas at the county level, including, as I recall, the Franklin County DA also covering three towns in an adjoining county, but almost all political divisions can be figured out from these files. There's also a town clerks association that might be helpful, since they run elections in most of the 351 cities and towns. http://www.newenglandclerks.org/content/121/212/default.aspx
- Teleportd API - Teleportd is a realtime photo search engine aggregating more than 5m public mobile pictures every day. Teleportd's search engine is accessible through their REST API: http://teleportd.com/api. Using teleportd's API is a great way to add up-to-date visual content to your apps. Example applications using the API: http://www.teleportd.com/detectd (teleportd algorithmic event detector), http://smilesfilm.com/ (Yoko Ono's artistic film project), http://snapquest.me/ (Local image discovery)
- n0tice API - Crowdmapping can be useful for collective investigations. Here is a list of some high impact and large scale research projects where simple task-based participation fed some sort of larger context: http://www.mattmcalister.com/blog/2012/06/14/1833/the-power-of-collective-research-task-based-investigations-and-swarm-intelligence/. n0tice.com and the n0tice API can be a powerful combination for conducting large scale crowdmapping efforts. API Documentation: http://n0tice.org/developers/api-documentation/. How To Build a Crowdmap: http://n0tice.org/how-to-crowdmap-using-n0tice/
If you're blogging about the event, please link to it here.
- Toward a Generic Context Engine for Civic Data (Daniel X O'Neil)
- Countdown to the Knight-Mozilla-MIT Hack Day (Dan Sinker)
- Yak Shaving, Magical Incantations, and Data Journalism (Lisa Williams)
Places to Eat and Drink
The hotel bar and restaurant in the Marriott is lackluster; I'd avoid it if you can.
- Za Homemade pickles, excellent wine list. Rest of menu is pizza and salad and THAT'S IT. Across the street from the Marriott.
- Mary Chung's Classic MIT hangout. Chinese food. Eat the Suan La Chow Show. In Central Square. Lip-dragging distance from Le Meridien; an eight-minute walk from the Marriott.
Do You Want To Eat A Vegetable?
Lisa Williams' Guide to Things To Do, See and Eat in Boston From a native.