OpenNews/hackdays/storyandalgorithm/conditionofanonymity: Difference between revisions

Latest revision as of 12:34, 20 June 2012

Project name: Condition of Anonymity

One-line description of project: A fun way to explore the reasons the New York Times has given for granting anonymity to a source.

Your team: Daniel X. O'Neil, Daniel McLaughlin, and Gabriel Florit

Project URL(s), if applicable: Info will be published to a Heroku project and streamed to our Twitter account: @conditionof

Hashtag, if #relevant: #conditionof

What are you building: Basically, this is a fun way to explore the reasons the New York Times has given for granting anonymity. We've got a method for consuming all articles published in the New York Times that contain statements from anonymous sources, a website to display the reason each source was given anonymity (the "because clause"), and a Twitter account, @conditionof, to stream new articles over time. The corpus is all NYT articles since January 1, 2000 that contain the phrase "condition of anonymity" or "anonymity because". Users can click on any word in any because clause and see a chronological list of every because clause that contains that word.
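As a rough illustration of the two ideas above (pulling out the because clause and listing clauses by word in chronological order), here is a minimal sketch. It is not the team's actual code: the regular expression, the function names `extract_because_clause` and `build_word_index`, and the sample snippets are all assumptions made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical pattern: capture the text after "anonymity because" up to the
# next comma, semicolon, or period.
BECAUSE_RE = re.compile(r"anonymity because\s+([^.,;]+)", re.IGNORECASE)


def extract_because_clause(text):
    """Return the reason given for anonymity, or None if no clause is found."""
    match = BECAUSE_RE.search(text)
    return match.group(1).strip() if match else None


def build_word_index(records):
    """Map each lowercase word to the (date, clause) pairs containing it, oldest first."""
    index = defaultdict(list)
    for date, clause in sorted(records):  # ISO dates sort chronologically
        for word in set(re.findall(r"[a-z']+", clause.lower())):
            index[word].append((date, clause))
    return index


# Made-up sample snippets for illustration; not taken from real articles.
snippets = [
    ("2011-03-02", "An official spoke on condition of anonymity because he was not authorized to speak publicly."),
    ("2009-07-15", "She requested anonymity because the meeting was private, people said."),
]

records = []
for date, snippet in snippets:
    clause = extract_because_clause(snippet)
    if clause is not None:  # skip snippets with no because clause
        records.append((date, clause))

index = build_word_index(records)

# "Clicking" on the word "was" lists every clause containing it, oldest first.
for date, clause in index["was"]:
    print(date, clause)
```

A pattern this simple stops at the first period, so a clause containing an abbreviation such as "U.S." would be cut short; a real pipeline would likely need sentence-aware splitting.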

Who is it for: This site is for New York Times aficionados, click-happy types who like Internet rabbit holes, and people who dig getting data from unstructured text.

Your goal for this weekend: Pull the relevant articles (done), analyze text (done), and publish the processed text in a chron list (done). Later: add an about page, stream the because clauses on Twitter, and publish the Web site.

Your starting point: Using Natural Language Toolkit in Python and the New York Times Article Search API.
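For readers who want to try a similar query, the sketch below only builds a search URL and makes no network request. It assumes the current public v2 Article Search endpoint and its `q`, `begin_date`, `sort`, `page`, and `api-key` parameters, which may differ from the API version available in 2012. The helper name `build_search_url` and the placeholder key are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint: the current public v2 Article Search URL, which may differ
# from the API version the team used in 2012.
BASE_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(phrase, api_key, begin_date="20000101", page=0):
    """Build a query URL for articles containing an exact phrase."""
    params = {
        "q": f'"{phrase}"',        # quote the phrase to ask for an exact match
        "begin_date": begin_date,  # YYYYMMDD; the corpus starts January 1, 2000
        "sort": "oldest",          # oldest first, matching a chronological list
        "page": page,              # results are paginated
        "api-key": api_key,        # placeholder; a real key is required
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("condition of anonymity", api_key="YOUR_KEY")
print(url)
```

Running the same helper with "anonymity because" would cover the second phrase in the corpus definition.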

Anything else we should know: We need people who can help review the "because clauses" and mark interesting ones for display. Here's a document we're using to plan our work.

How is this project useful? It serves to highlight and draw attention to the reasoning given for using anonymous sources, and it does so in an accessible, entertaining way.

Where is this project going and what lessons/concepts can be applied to other projects? The goal of the project is to have a web application and Twitter bot that tweets out reasons given for anonymity. More generically, this is a lightweight way to find structure and data in unstructured text. Many industries use consistent phrasing (e.g. style books for journalists, standard operating procedures in police departments), which makes the relevant passages easy to find with a minimum of programming effort. Organizations sometimes use common phrases as a defense mechanism (e.g. "no comment"), but this approach turns that on its head: the consistent language is exactly what makes repeated references so easy to find, gather, and analyze.
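The same idea carries over to other stock phrases. The sketch below is an assumption-laden illustration rather than part of the project: the function name `count_stock_phrases` and the sample sentences are invented here. It tallies how many texts contain each fixed phrase.

```python
import re
from collections import Counter


def count_stock_phrases(texts, phrases):
    """Count how many texts contain each stock phrase (case-insensitive)."""
    # Word boundaries keep "no comment" from matching inside longer words.
    patterns = {
        phrase: re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        for phrase in phrases
    }
    counts = Counter()
    for text in texts:
        for phrase, pattern in patterns.items():
            if pattern.search(text):
                counts[phrase] += 1
    return counts


# Made-up sentences for illustration only.
texts = [
    "The spokesman said he had no comment.",
    "A lawyer for the company declined to comment.",
    "Asked again, the agency offered no comment on the report.",
]

counts = count_stock_phrases(texts, ["no comment", "declined to comment"])
print(counts.most_common())
```

Because the phrases are fixed strings, no parsing or training is needed; the consistency of the wording does the work.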

@@ Line 1: / Line 1: @@
 <ul><li><b>Project name:</b> Condition of Anonymity
 </li></ul>
-*<b>One-line description of project:</b> A fun way to explore the reasons the New York Times has given for granting anonymity.
+*<b>One-line description of project:</b> A fun way to explore the reasons the New York Times has given for granting anonymity to a source.
 <ul><li><b>Your team:</b> [https://twitter.com/#!/juggernautco Daniel X. O'Neil], [https://twitter.com/#!/mclaughlin Daniel McLaughlin], and [https://twitter.com/#!/gabrielflorit Gabriel Florit]
 </li></ul>
@@ Line 8: / Line 8: @@
 <ul><li><b>Hashtag, if #relevant:</b> [https://twitter.com/#!/search/%22condition%20of%22 #conditionof]
 </li></ul>
-<ul><li><b>What are you building:</b> Basically, this is a fun way to explore the reasons the New York Times has given for granting anonymity. We're building a method for consuming all articles published in the New York Times that contain statements from anonymous sources, a website to display the reasons the source was given anonymity ("because clause"), the snippet in which that clause appears, the description of the source, and the information provided by source. The corpus is all NYT articles since January 1, 2000 that contain the phrase, "condition of anonymity" or "anonymity because". We're also streaming all new articles containing those phrases and streaming the clauses to Twitter [https://twitter.com/#!/conditionof @conditionof] , along with links to the full snippet on our site.
+<ul><li><b>What are you building:</b> Basically, this is a fun way to explore the reasons the New York Times has given for granting anonymity. We've got a method for consuming all articles published in the New York Times that contain statements from anonymous sources, a website to display the reason each source was given anonymity (the "because clause"), and a Twitter account, [https://twitter.com/#!/conditionof @conditionof], to stream new articles over time. The corpus is all NYT articles since January 1, 2000 that contain the phrase "condition of anonymity" or "anonymity because". Users can click on any word in any because clause and see a chronological list of every because clause that contains that word.
 </li></ul>
-<ul><li><b>Who is it for:</b> This site is for New York Times aficionados, people who like blind items, and people who dig getting data from unstructured text.
+<ul><li><b>Who is it for:</b> This site is for New York Times aficionados, click-happy types who like Internet rabbit holes, and people who dig getting data from unstructured text.
 </li></ul>
-<ul><li><b>Your goal for this weekend:</b> Pull the relevant articles (done), analyze text (nearly done), publish the processed text (with snippet, description of source, anonymity reason, and information provided by source) in some fashion. Later: organize this data into an interface that allows users to provide guesses on the source and stream the because clauses on Twitter.
+<ul><li><b>Your goal for this weekend:</b> Pull the relevant articles (done), analyze text (done), and publish the processed text in a chron list (done). Later: add an about page, stream the because clauses on Twitter, and publish the Web site.
 </li></ul>
 <ul><li><b>Your starting point:</b> Using [http://nltk.org/ Natural Language Toolkit] in Python and the [http://developer.nytimes.com/docs/article_search_api New York Times Article Search API].
 </li></ul>
 <ul><li><b>Anything else we should know:</b> We need people who can help review the "because clauses" and mark interesting ones for display. Here's [https://docs.google.com/document/d/1c7ohf_JKmvaqvUJYJq9vhjIhhsgVl5KY-t-rknfO0QI/edit a document we're using to plan our work].
 </li></ul>
+<ul><li>
+<b>How is this project useful?</b> It serves to highlight and draw attention to the reasoning given for using anonymous sources, and it does so in an accessible, entertaining way.
+</li></ul>
+<ul><li><b>Where is this project going and what lessons/concepts can be applied to other projects?</b> The goal of the project is to have a web application and Twitter bot that tweets out reasons given for anonymity. More generically, this is a lightweight way to find structure and data in unstructured text. Many industries use consistent phrasing (e.g. style books for journalists, standard operating procedures in police departments), which makes the relevant passages easy to find with a minimum of programming effort. Organizations sometimes use common phrases as a defense mechanism (e.g. "no comment"), but this approach turns that on its head: the consistent language is exactly what makes repeated references so easy to find, gather, and analyze.
