Drumbeat/MoJo/hackfest/berlin/projects/followthis: Difference between revisions

Revision as of 08:45, 29 September 2011

Project Name: FollowThis

   Project Lead(s): Matt Terenzio

Big Goal for MoJo Hackfest:

Ship some usable code. (achieved)
Learn how to manage an Open Source project.
Work with others on related projects. (achieved)
Drink heavily. (achieved)

Key steps toward goal:

1. Need to be able to extract RDFa and Microformats from pages. (working)
2. Need to be able to use NLP to extract entities if semantic metadata is not present. (adopt and contribute to the metameta project for this)
3. Need to be able to store and query the metadata. (currently able to query the RDF triplestore, but the queries need honing)
4. Need a solid UI for users to be able to interact with the service. (getting there)
5. A crawler for the news sources would be nice. (deferred to version .2)
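
As a rough illustration of step 1, here is a minimal sketch of pulling RDFa `property` values out of a page using only the Python standard library. It is not the extractor FollowThis uses (that is the W3C RDFa distiller mentioned below); the class name and the sample markup are made up for the example, and it deliberately ignores nested elements, `about`/`typeof` scoping and prefix resolution.

```python
from html.parser import HTMLParser


class RDFaPropertyParser(HTMLParser):
    """Collects (property, value) pairs from RDFa `property` attributes.

    Simplification: the value is either the `content` attribute or the
    text up to the first closing tag after the element opens.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []       # extracted (property, value) tuples
        self._pending = None  # property waiting for its text content
        self._buffer = []     # text collected for the pending property

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # Value given explicitly, e.g. <meta property=".." content="..">
            self.pairs.append((prop, attrs["content"]))
        else:
            # Otherwise fall back to the element's text content.
            self._pending = prop
            self._buffer = []

    def handle_data(self, data):
        if self._pending is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._pending is not None:
            self.pairs.append((self._pending, "".join(self._buffer).strip()))
            self._pending = None


# Hypothetical sample markup, only for demonstration.
SAMPLE = """
<div typeof="rnews:Article">
  <h1 property="rnews:headline">Sample headline</h1>
  <meta property="rnews:datePublished" content="2011-09-29">
</div>
"""

parser = RDFaPropertyParser()
parser.feed(SAMPLE)
print(parser.pairs)
```

A real RDFa processor also has to resolve prefixes and subjects into full triples, which is why the project relies on an existing distiller rather than something like this.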

Pending needs:

Important: Need to make a button that is an embeddable widget, for ease of deployment.
I have a working bookmarklet, but it needs work. jQuery help wanted. (still need a session with a jQuery expert)
Totally clueless on entity extraction from pages that don't have semantic metadata. (solved somewhat)
Also need to figure out SPARQL and the best persistent data store for RDF. (Laurian gave me some good starting points)

Link for more info:

rNews has been brought up. I've installed the RDFa distiller from W3C and you can use it to distill RDFa from pages.

   Example which distills a page with rNews in it: 
   To use it, just call:
http://followth.is/cgi-bin/RDFa.py?uri=uri-of-web-page-you-want-to-distill
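
A small sketch of building that call from Python. The endpoint and the `uri` parameter come from the line above; the helper name `distiller_url` is made up here, and percent-encoding the page address is an assumption about what the CGI script accepts (it is the safe choice when the page URL has its own query string).

```python
from urllib.parse import urlencode

# Endpoint as given on this page.
DISTILLER = "http://followth.is/cgi-bin/RDFa.py"


def distiller_url(page_url):
    """Return the distiller URL for the page to distill (hypothetical helper)."""
    # urlencode percent-encodes the page address so its own "?" and "&"
    # cannot be mistaken for parameters of the distiller itself.
    return DISTILLER + "?" + urlencode({"uri": page_url})


print(distiller_url("http://example.com/story?id=1"))
```

The resulting URL can be opened in a browser or fetched with `urllib.request.urlopen`.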

To extract keywords from text, I set up a CGI script that returns the keywords for any text you feed it. (Update: Matt has a better entity extractor than this using Stanford NLP; will use that instead.)

example

It should accept POST requests to that URL as well as GET requests.

First pass at a readability-like way to extract the article text and headline from a web page:

http://followth.is/read/article/http%3A%2F%2Fwww.thehour.com%2Fstory%2F511535%2Ffrank-fay-way-we-were/
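
Judging from the example above, the article address is passed as a single percent-encoded path segment followed by a slash. A minimal sketch of producing such a URL; the helper name `article_url` is invented for illustration.

```python
from urllib.parse import quote

# Path prefix taken from the example URL on this page.
READ_ENDPOINT = "http://followth.is/read/article/"


def article_url(page_url):
    """Return the extraction URL for a page (hypothetical helper)."""
    # safe="" also encodes "/" and ":", so the whole address stays
    # inside one path segment.
    return READ_ENDPOINT + quote(page_url, safe="") + "/"


print(article_url("http://www.thehour.com/story/511535/frank-fay-way-we-were"))
```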

Another endpoint that distills RDFa from a web page (this one in PHP):

http://followth.is/transform/?type=rdfa&url=http://www.thehour.com/story/511535/frank-fay-way-we-were

A SPARQL endpoint for the triplestore of rNews data:

http://followth.is/transform/sparql/
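
A sketch of how a query could be sent to this endpoint. It assumes the endpoint follows the standard SPARQL protocol (query passed in a `query` parameter on a GET request) and that the stored triples use the rNews 1.0 namespace with an `rnews:headline` property; neither is confirmed on this page, and the helper name is made up.

```python
from urllib.parse import urlencode

# Endpoint as given on this page.
SPARQL_ENDPOINT = "http://followth.is/transform/sparql/"

# Assumed vocabulary: rNews 1.0 namespace and its headline property.
QUERY = """
PREFIX rnews: <http://iptc.org/std/rNews/2011-10-07#>
SELECT ?article ?headline
WHERE { ?article rnews:headline ?headline . }
LIMIT 10
"""


def sparql_url(query):
    """Return a GET URL carrying the query (hypothetical helper)."""
    # The SPARQL protocol passes the query text in a "query" parameter.
    return SPARQL_ENDPOINT + "?" + urlencode({"query": query.strip()})


print(sparql_url(QUERY))
```

Honing the queries (step 3 above) would then mostly be a matter of editing the `WHERE` pattern.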

Link for demo:

FollowThis demo

Link to source code:

FollowThis on GitHub

Where from here:

Though the code is in working form, a few parts need to be cleaned up and organized for better maintainability and extension
Continue to work on open alternatives to some of the portions that use third party APIs
Documentation for both developers and users
Promote rNews adoption
Deploy to at least one news site by end of 2011

Project Status

Currently working features include...
The project is currently capable of doing...
The project currently functions in these contexts...

Collaborators

The following folks helped with this project:

Laurian: How to model data for RDF storage and how to query that data using SPARQL
Raynor: TF-IDF (term frequency–inverse document frequency)
Laurian and Raynor: Cosine similarity concepts for comparing documents
Jordan: What constitutes a valuable difference between documents from a user or journalist perspective
Chris: CMS perspectives from a journalist's standpoint
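
For readers unfamiliar with the two document-comparison ideas credited above, here is a minimal sketch of TF-IDF weighting combined with cosine similarity. It is a textbook formulation with naive whitespace tokenization, not the FollowThis implementation; the function names and sample sentences are made up.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Relative term frequency times log-scaled inverse document frequency.
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (0.0 if either is all zero)."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "mayor opens new bridge",
    "mayor opens new school",
    "storm closes harbor",
]
vecs = tf_idf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # related stories share weighted terms
print(cosine_similarity(vecs[0], vecs[2]))  # no terms in common
```

Terms that appear in every document get a weight of zero, so only the distinguishing words contribute to the similarity score.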

Next steps

From here I would like to:

NEXT IMPLEMENTATION STEP 1
NEXT IMPLEMENTATION STEP 2
NEXT IMPLEMENTATION STEP 3

Places where this project might be tested include:

TEST CONTEXT 1
TEST CONTEXT 2
TEST CONTEXT 3

