Drumbeat/MoJo/hackfest/berlin/projects/MetaProject
The Meta Project is a tool that provides a simple service: take in any piece of media and return all the metadata that can be extracted from it.
Meta Standards Resources
(Add links and summaries to documents discussing metadata)
- rNews is a proposed standard for using RDFa to annotate news-specific metadata in HTML documents.
- Metafragments is a proposed metadata markup for audio and video. (Julien Dorra)
Known APIs and Tools
(Add links and summaries of toolkits and APIs which can help generate data!)
Desired Functionality
TEXT
Valid Inputs: URL, Plain Text, HTML
Optional Inputs: Known Metadata
Returned Metadata:
- Primary Themes (Document-wide)
- Primary Themes (Per-paragraph)
- Suggested Tags
- Entities (Names, Locations, Dates, Organizations) and their locations in text
- Author
- Publishing organization (if any)
- Date initially published and date last updated
- Names of people who are quoted
- Quotes
- Other texts cited and/or linked (books, articles, urls)
- All other numbers (that aren't dates) and their units (i.e. data points cited)
- Corrections
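A few of the fields above (quotes, linked URLs, numbers with units) can be approximated with plain pattern matching. The following is only a rough sketch under stated assumptions: `TextMetadata` and `extract_text_metadata` are hypothetical names, not part of any existing tool, and the regular expressions are deliberately naive. They do not tell dates apart from other numbers, they treat any word after a number as its unit, and they only recognise straight double quotes.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TextMetadata:
    """Hypothetical container for a small subset of the fields listed above."""
    quotes: list = field(default_factory=list)   # (quote text, character offset) pairs
    links: list = field(default_factory=list)    # URLs cited in the text
    numbers: list = field(default_factory=list)  # (value, unit) pairs; unit may be ""


QUOTE_RE = re.compile(r'"([^"]+)"')
URL_RE = re.compile(r'https?://[^\s<>")]+')
# A number followed by an optional unit: "%" or a single word (naive assumption).
NUMBER_RE = re.compile(r'(\d+(?:\.\d+)?)\s*(%|[A-Za-z]+)?')


def extract_text_metadata(text: str) -> TextMetadata:
    meta = TextMetadata()
    # Quotes, with the offset where the quoted text starts.
    for match in QUOTE_RE.finditer(text):
        meta.quotes.append((match.group(1), match.start(1)))
    # Linked URLs.
    meta.links = URL_RE.findall(text)
    # Blank out URLs first so that digits inside links are not counted as data points.
    without_urls = URL_RE.sub(" ", text)
    for match in NUMBER_RE.finditer(without_urls):
        meta.numbers.append((float(match.group(1)), match.group(2) or ""))
    return meta
```

Themes, entities, authorship and corrections need more than regular expressions (for example an entity-extraction service from the APIs list), so they are left out of this sketch.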
VIDEO
Valid Inputs: URL, Video (format? .mov and .mp4 are the dominant ones)
Optional Inputs: Transcript, Faces, Known Metadata
Returned Metadata:
- Transcript
- Moments of audio transition (new speaker)
- Moments of video transition (new scene)
- OCR data (any text that appears in the image) and its timestamps
- Entities (Names, Locations) and their timestamps
- Suggested Tags
- Face identifications and their timestamp ranges [only done if faces are provided]
For photos we have:
- caption
- author and job title
- headline
- keywords
- location
- date
- copyright
- news org name
AUDIO
Valid Inputs: URL, Audio (mp3, wav)
Optional Inputs: Transcript, Voice Samples, Known Metadata
Returned Metadata:
- Transcript
- Moments of audio transition (new speaker)
- Entities (Names, Locations) and their timestamps
- Suggested Tags
- Voice identifications and their timestamp ranges [only done if voice samples are provided]
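Once a transcript is split into timed segments with a speaker label, the "moments of audio transition" and the per-speaker timestamp ranges follow directly from it. The sketch below assumes such a labelled transcript already exists (producing it is the hard part and is not shown). `Segment`, `speaker_transitions` and `speaker_ranges` are hypothetical names used for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One timed piece of transcript (hypothetical shape)."""
    start: float   # seconds from the beginning of the recording
    end: float
    speaker: str   # label from voice identification, or an anonymous id
    text: str


def speaker_transitions(segments):
    """Return the start times at which the speaker differs from the previous segment."""
    transitions = []
    previous = None
    for seg in sorted(segments, key=lambda s: s.start):
        if previous is not None and seg.speaker != previous:
            transitions.append(seg.start)
        previous = seg.speaker
    return transitions


def speaker_ranges(segments):
    """Merge consecutive segments by the same speaker into (speaker, start, end) ranges."""
    ranges = []
    for seg in sorted(segments, key=lambda s: s.start):
        if ranges and ranges[-1][0] == seg.speaker:
            # Same speaker keeps talking: extend the current range.
            ranges[-1] = (seg.speaker, ranges[-1][1], seg.end)
        else:
            ranges.append((seg.speaker, seg.start, seg.end))
    return ranges
```

The same segment shape could carry video annotations as well (scene changes, on-screen text), since those are also values attached to a time range.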
IMAGE
Valid Inputs: URL, Image (jpg, gif, bmp, png, tif)
Optional Inputs: Faces, Known Metadata
Returned Metadata:
- OCR data and its coordinate location
- Object identification
- Face identification [only done if faces are provided]
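All three kinds of image metadata are a label attached to an area of the picture, so one possible return format is a flat list of labelled regions. This is a sketch of that idea, not an agreed format: `Box`, `Region`, `ImageMetadata` and the pixel-based coordinates are all assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Box:
    """Bounding box in pixels, measured from the top-left corner (assumed convention)."""
    x: int
    y: int
    width: int
    height: int


@dataclass
class Region:
    kind: str    # "ocr", "object" or "face"
    label: str   # recognised text, object name or person name
    box: Box


@dataclass
class ImageMetadata:
    source: str                                   # URL or file name of the image
    regions: list = field(default_factory=list)   # list of Region

    def of_kind(self, kind):
        """Return only the regions of one kind, e.g. all OCR results."""
        return [r for r in self.regions if r.kind == kind]

    def to_json(self):
        """Serialise the whole result, including nested boxes, as a JSON string."""
        return json.dumps(asdict(self), sort_keys=True)
```

A caller would fill `regions` from whichever OCR, object-detection or face-matching tool is chosen and return `to_json()` as the service response.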