Festival2012/Submit/Make The Robots Do The Hard Work
- Title of session: Make the robots do the hard work! Writing screen scrapers for news gathering with PANDA
- Your name and affiliation: Brian Boyer, NPR. Chris Groskopf, NPR. Joe Germuska, Chicago Tribune. Ryan Pitts, The Spokesman-Review.
- Session format: Learning Lab
What will your session or activity allow people to make, learn or do?
Imagine that you're a bike enthusiast, or a reporter writing about cycling issues, and you'd like to know whenever a bike theft has been reported. You could call the police department every morning. But wouldn't you rather just get an email about the police report?
In this session, people will learn the basics of writing screen scrapers -- little bits of software that help you turn web pages into data you can use. Then we'll send the data into PANDA, which lets you set up saved searches for your data.
That way, when new data arrives, the PANDA will email you!
People will leave able to apply these skills to any beat where data is available -- campaign finance, corporate filings, product recalls. The sky is the limit.
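To give a feel for what a screen scraper looks like, here is a minimal sketch in Python that uses only the standard library. The sample HTML, its column names, and the helper names (`TableScraper`, `scrape_reports`) are made up for illustration; a real police blotter page will have its own markup, and the scraper would fetch that page rather than read a string.

```python
from html.parser import HTMLParser

# Hypothetical sample page. A real police report page would have its own markup.
SAMPLE_HTML = """
<table id="reports">
  <tr><th>Case</th><th>Date</th><th>Offense</th><th>Location</th></tr>
  <tr><td>12-001</td><td>2012-08-01</td><td>Bike theft</td><td>100 Main St</td></tr>
  <tr><td>12-002</td><td>2012-08-01</td><td>Vandalism</td><td>200 Oak Ave</td></tr>
  <tr><td>12-003</td><td>2012-08-02</td><td>Bike theft</td><td>300 Elm St</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, one list per table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_reports(html):
    """Turn the HTML table into a list of dicts keyed by the header row."""
    parser = TableScraper()
    parser.feed(html)
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


reports = scrape_reports(SAMPLE_HTML)
# Keep only the rows a cycling reporter would care about.
bike_thefts = [r for r in reports if "bike" in r["Offense"].lower()]
for report in bike_thefts:
    print(report["Case"], report["Date"], report["Location"])
```

The web page becomes a list of plain records, and from there filtering for bike thefts is a single line.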
How do you see that working?
We'll have example scrapers prepared for the group, and depending on everyone's experience, we'll talk about screen scrapers for a bit and then walk people through writing their own. Then we'll dive into how you'd make your scraper work with PANDA. We'll have a special PANDA instance set up for use during the festival.
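As a rough idea of what "making your scraper work with PANDA" could involve, the sketch below packages scraped rows as a JSON request. It assumes PANDA accepts rows over a REST API as a JSON `objects` list sent with PUT to a dataset's data endpoint, authenticated with an email and API key. The URL path, field names, server address, dataset slug, and credentials here are all assumptions for illustration and should be checked against the PANDA API documentation. The helper `build_panda_request` is hypothetical, and the request is only built, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical scraped rows, in the shape a scraper might produce.
SCRAPED = [
    {"Case": "12-001", "Date": "2012-08-01", "Offense": "Bike theft", "Location": "100 Main St"},
    {"Case": "12-003", "Date": "2012-08-02", "Offense": "Bike theft", "Location": "300 Elm St"},
]
COLUMNS = ["Case", "Date", "Offense", "Location"]


def build_panda_request(base_url, dataset_slug, email, api_key, reports, columns):
    """Build (but do not send) a request that would load rows into a dataset.

    Assumption: the endpoint path, the "objects"/"data"/"external_id" field
    names, and the email/api_key query parameters match the PANDA API.
    """
    rows = [
        # Using the case number as an ID so re-running the scraper
        # could update a row instead of duplicating it.
        {"external_id": r["Case"], "data": [r[c] for c in columns]}
        for r in reports
    ]
    query = urlencode({"email": email, "api_key": api_key})
    url = f"{base_url}/api/1.0/dataset/{dataset_slug}/data/?{query}"
    body = json.dumps({"objects": rows}).encode("utf-8")
    return Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_panda_request(
    "http://panda.example.com",  # placeholder server address
    "bike-thefts",               # placeholder dataset slug
    "reporter@example.com",      # placeholder credentials
    "not-a-real-key",
    SCRAPED,
    COLUMNS,
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Once rows like these land in a dataset, a saved search on that dataset is what would trigger the email notification.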
How will you deal with 5, 15, 50 participants?
Varying skill sets in a large group will be a challenge, but we'll use tools like ScraperWiki to at least level out the technical requirements. If we've got a large number of participants, we'll put people into small groups to work on scrapers together so that there is enough technical expertise to go around.
How long within your session before someone else can teach this?
Writing a scraper is often remarkably easy if you've got even a little programming knowledge. We hope that within 30 minutes to an hour, enough participants will have picked up the skills to teach them to others.
What do you see as outcomes after the festival?
The goal of the PANDA Project is to get more journalists working with data on a daily basis, by making data visible and by increasing newsroom efficiency via automation.
We hope that participants at news organizations (and everywhere else!) will take these skills back to work, and be more effective at finding stories in data.