Drumbeat/MoJo/hackfest/berlin/projects/followthis

Project Name: FollowThis

   Project Lead(s): Matt Terenzio

1. Need to be able to extract RDFa and Microformats from pages.
2. Need to be able to use NLP to extract entities when semantic metadata is not present.
3. Need to be able to store and query the metadata.
4. Need a solid UI for users to be able to interact with the service.
5. A crawler for the news sources would be nice.

I have a working bookmarklet, but it needs work. I could use jQuery help.
I'm totally clueless about entity extraction from pages that don't have semantic metadata.
I also need to figure out SPARQL and the best persistent data store for RDF.
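
As a possible starting point for the entity extraction problem, here is a rough sketch of a naive baseline: treat runs of capitalized words as candidate entities and count them. This is only a heuristic, not real NLP, and not something the project currently uses. The names `candidate_entities`, `CAPITALIZED_RUN` and `STOPWORDS` are made up for illustration.

```python
import re
from collections import Counter

# One or more capitalized words separated by whitespace, e.g. "Mozilla Foundation".
CAPITALIZED_RUN = re.compile(r"\b[A-Z][a-zA-Z]+(?:\s+[A-Z][a-zA-Z]+)*\b")

# Common words that are capitalized only because they start a sentence.
# This short list is an assumption; a real list would be much longer.
STOPWORDS = {"The", "A", "An", "In", "On", "At", "It", "This", "That", "But", "And"}


def candidate_entities(text):
    """Count runs of capitalized words in text as candidate entities."""
    counts = Counter()
    for match in CAPITALIZED_RUN.finditer(text):
        words = match.group(0).split()
        # Strip leading stopwords such as a sentence-initial "The".
        while words and words[0] in STOPWORDS:
            words.pop(0)
        if words:
            counts[" ".join(words)] += 1
    return counts


if __name__ == "__main__":
    sample = ("The Mozilla Foundation met in Berlin. "
              "Matt Terenzio demoed FollowThis in Berlin.")
    for entity, count in candidate_entities(sample).most_common():
        print(count, entity)
```

A baseline like this will miss lowercase entities and will pick up noise such as capitalized headline words, so it is best seen as something to compare a proper entity extractor against.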

rNews has been brought up. I've installed the RDFa distiller from W3C and you can use it to distill RDFa from pages.

   Example which distills a page with rNews in it: 
   To use it, just call:
http://followth.is/cgi-bin/RDFa.py?uri=uri-of-web-page-you-want-to-distill
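
A minimal sketch of calling the distiller from Python, assuming only what is stated above: the endpoint takes the page address in a `uri` query parameter. The function names `distiller_url` and `distill` are made up for illustration, and the format of the response is not assumed here, so the output is returned as plain text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint as given on this page.
DISTILLER_ENDPOINT = "http://followth.is/cgi-bin/RDFa.py"


def distiller_url(page_url):
    """Build the distiller URL, percent-encoding the target page address."""
    return DISTILLER_ENDPOINT + "?" + urlencode({"uri": page_url})


def distill(page_url, timeout=30):
    """Fetch whatever the distiller returns for page_url, decoded as text."""
    with urlopen(distiller_url(page_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Only prints the URL; call distill() to actually hit the service.
    print(distiller_url("http://example.com/story?id=1"))
```

Encoding the target address matters when the page URL has its own query string, since an unencoded `&` or `?` would otherwise be read as part of the distiller's parameters.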

To extract keywords from text, I set up a CGI script that returns keywords for whatever text you feed it.

It should accept POST requests to that URL as well as GET requests.
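
A sketch of how a client might build both kinds of request for the keyword script. The script's address and parameter name are not given on this page, so `KEYWORDS_ENDPOINT` and `TEXT_FIELD` below are placeholders that need to be replaced with the real values, and `keyword_request` is a made-up helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder values: the real URL and field name are not stated on this page.
KEYWORDS_ENDPOINT = "http://example.invalid/keyword-script"
TEXT_FIELD = "text"


def keyword_request(text, method="POST"):
    """Build (but do not send) a GET or POST request carrying the text."""
    payload = urlencode({TEXT_FIELD: text})
    if method == "GET":
        # GET puts the text in the query string; fine for short snippets only.
        return Request(KEYWORDS_ENDPOINT + "?" + payload, method="GET")
    if method == "POST":
        # POST sends the text in the request body, which suits longer articles.
        return Request(
            KEYWORDS_ENDPOINT,
            data=payload.encode("ascii"),
            headers={"Content-Type": "application/x-www-form-urlencoded"},
            method="POST",
        )
    raise ValueError("method must be 'GET' or 'POST'")


if __name__ == "__main__":
    request = keyword_request("Berlin hackfest", method="GET")
    print(request.get_method(), request.full_url)
```

To send a request, pass the result to `urllib.request.urlopen`. POST is probably the safer default for full article text, since long query strings can be rejected by servers.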