Drumbeat/MoJo/hackfest/berlin/projects/followthis

Project Name: FollowThis

   Project Lead(s): Matt Terenzio
   

Big Goal for MoJo Hackfest:

  • Ship some usable code.
  • Learn how to manage an Open Source project.
  • Work with others on related projects.
  • Drink heavily.

Key steps toward goal:

  • 1. Need to be able to extract RDFa and Microformats from pages.
  • 2. Need to be able to use NLP to extract entities if semantic metadata is not present.
  • 3. Need to be able to store and query the metadata.
  • 4. Need a solid UI for users to be able to interact with the service.
  • 5. A crawler for the news sources would be nice.
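
As a rough illustration of step 1, here is a minimal sketch that pulls RDFa-style `property` attributes out of a page using only the Python standard library. It is not a full RDFa processor (it ignores prefixes, `about`, `typeof` and nesting, and it stops reading text at the first closing tag). The class name, the function name and the sample property names are assumptions made up for the example.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs from elements with a `property` attribute."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._pending = None  # property name still waiting for its text
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property")
        if prop is None:
            return
        if "content" in attrs:
            # e.g. <meta property="..." content="...">
            self.pairs.append((prop, attrs["content"]))
        else:
            # The value is the element's text, gathered in handle_data.
            self._pending = prop
            self._text = []

    def handle_data(self, data):
        if self._pending is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Simplification: the first closing tag ends the pending property.
        if self._pending is not None:
            self.pairs.append((self._pending, "".join(self._text).strip()))
            self._pending = None


def extract_properties(html):
    """Hypothetical helper: return all (property, value) pairs found in html."""
    collector = PropertyCollector()
    collector.feed(html)
    collector.close()
    return collector.pairs
```

For real pages a proper RDFa parser, such as the W3C distiller, is the better choice; this only shows the shape of the data the service needs to pull out.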

Pending needs:

  • I have a working bookmarklet, but it needs work. jQuery help wanted.
  • Totally clueless on entity extraction from pages that don't have semantic metadata.
  • Also need to figure out SPARQL and the best persistent data store for RDF.
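
Until the SPARQL and RDF store question is settled, one possible stopgap is a plain table of subject/predicate/object triples in SQLite. To be clear, this sketch is not SPARQL and not a real RDF store; it swaps in ordinary SQL with wildcard pattern matching, and every name in it (`open_store`, `add_triple`, `match`) is hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite file holding one table of triples."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS triples ("
        "subject TEXT NOT NULL, predicate TEXT NOT NULL, object TEXT NOT NULL, "
        "UNIQUE (subject, predicate, object))"
    )
    return conn


def add_triple(conn, subject, predicate, obj):
    """Store one triple; exact duplicates are silently ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO triples VALUES (?, ?, ?)",
        (subject, predicate, obj),
    )
    conn.commit()


def match(conn, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    clauses, params = [], []
    for column, value in (
        ("subject", subject),
        ("predicate", predicate),
        ("object", obj),
    ):
        if value is not None:
            clauses.append(column + " = ?")
            params.append(value)
    sql = "SELECT subject, predicate, object FROM triples"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY subject, predicate, object"
    return conn.execute(sql, params).fetchall()
```

Passing a file path instead of `":memory:"` makes the data persistent. A pattern such as `match(conn, predicate="rnews:headline")` plays roughly the role a single SPARQL triple pattern would.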

Link for more info:

rNews has been brought up. I've installed the RDFa distiller from W3C and you can use it to distill RDFa from pages.

   Example which distills a page with rNews in it: 
   To use it, just call:
http://followth.is/cgi-bin/RDFa.py?uri=uri-of-web-page-you-want-to-distill
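
When calling the distiller from code, the page address should be percent-encoded so that its own query string does not get mixed up with the `uri` parameter. A minimal sketch in Python (the function name `build_distill_url` is made up for this example; only the endpoint and the `uri` parameter come from the call above):

```python
from urllib.parse import urlencode

# Endpoint as given on this page.
DISTILLER_URL = "http://followth.is/cgi-bin/RDFa.py"


def build_distill_url(page_url):
    """Return the distiller URL for page_url, with the address percent-encoded."""
    return DISTILLER_URL + "?" + urlencode({"uri": page_url})
```

The resulting string can be opened in a browser or fetched with `urllib.request.urlopen`.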

To extract keywords, I set up a CGI script that pulls them out of any text you feed it.

example

It should accept POST requests to that URL as well as GET requests.
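
A sketch of how a client might prepare either kind of request for the keyword script. The script's address is not written out on this page, so the URL below is a placeholder, and the form field name `text` is a guess; both need to be replaced with the real values. The function only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder only: substitute the real address of the keyword script.
KEYWORD_SCRIPT_URL = "http://example.invalid/cgi-bin/keywords.py"


def keyword_request(text, method="POST", url=KEYWORD_SCRIPT_URL):
    """Build (but do not send) a GET or POST request carrying the text."""
    # Assumption: the script reads its input from a field called "text".
    query = urlencode({"text": text})
    if method == "GET":
        return Request(url + "?" + query, method="GET")
    return Request(
        url,
        data=query.encode("ascii"),
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

To actually send it, pass the result to `urllib.request.urlopen`. POST is probably the safer default for longer texts, since long query strings can be cut off along the way.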

Link for demo:

Link to source code if applicable: