Drumbeat/MoJo/hackfest/berlin/projects/MetaProject
Revision as of 08:24, 28 September 2011
The Meta Project is a tool which provides a simple service: take in any piece of media, spit out all the meta possible.
Meta Standards Resources
(Add links and summaries to documents discussing metadata)
- rNews is a proposed standard for using RDFa to annotate news-specific metadata in HTML documents.
- Metafragments is a proposed metadata markup for audio and video. (Julien Dorra)
Known APIs and Tools
(Add links and summaries of toolkits and APIs which can help generate data!)
- http://m.vid.ly/user/ - won't generate metadata but can help with format conversions
Desired Functionality
TEXT
Valid Inputs: URL, Plain Text, HTML
Optional Inputs: Known Metadata
Returned Metadata:
- Primary Themes (Document-wide)
- Primary Themes (Per-paragraph)
- Suggested Tags
- Entities (Names, Locations, Dates, Organizations) and their locations in text
- Author
- Publishing organization (if any)
- Date initially published and date last updated
- Names of people who are quoted
- Quotes
- Other texts cited and/or linked (books, articles, URLs)
- All other numbers (that aren't dates) and their units (i.e. data points cited)
- Corrections
VIDEO
Valid Inputs: URL, Video (.mov .mp4)
Optional Inputs: Transcript, Faces, Known Metadata
Returned Metadata:
- Transcript
- Moments of audio transition (new speaker)
- Moments of video transition (new scene)
- OCR data (any text that appears on image) and their timestamps
- Entities (Names, Locations) and their timestamps
- Suggested Tags
- Face identification and their timestamp ranges [only done if faces are provided]
- Caption/summary
- Author and job title
- Headline
- Keywords
- Location
- Date
- Copyright
- News org name
- URL to related word story
AUDIO
Valid Inputs: URL, Audio (mp3, wav)
Optional Inputs: Transcript, Voice Samples, Known Metadata
Returned Metadata:
- Transcript
- Moments of audio transition (new speaker)
- Entities (Names, Locations) and their timestamps
- Suggested Tags
- Voice identification and their timestamp ranges [only done if voice samples are provided]
IMAGE
Valid Inputs: URL, Image (jpg, gif, bmp, png, tif)
Optional Inputs: Faces, Known Metadata
Returned Metadata:
- OCR data and its coordinate location
- Object identification
- Face identification [only done if faces are provided]
- Location identification
In a photo we have:
- Caption
- Author and job title
- Headline
- Keywords
- Location
- Date
- Copyright
- News org name
API
The API should be as RESTful as possible. The current thinking is that POST will be used to upload the media item (if needed) and will return a Media Item ID (MIID), while GET will be used to perform the actual analysis (taking either an external URL or the MIID returned from a POST).
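The two-step flow can be sketched as a pair of request builders. This is only a sketch under the spec below: the endpoint paths and parameter names come from this document, but no server exists yet, so the transport is omitted and only the requests are constructed. (For the text endpoint the spec also allows a raw `text` parameter on GET; it is left out here for brevity.)

```python
# Hypothetical client sketch for the MetaProject API: POST stores a media
# item and yields an MIID, GET runs analysis tasks against an MIID or URL.

def build_store_request(media_type, ttl=180, **source):
    """Build the POST that stores a media item; the server returns an MIID."""
    if not source:
        raise ValueError("a source (e.g. url, text, text_file) is required")
    payload = dict(source, ttl=ttl)  # ttl defaults to 180 seconds per the spec
    return "POST", "/api/" + media_type, payload

def build_analyze_request(media_type, tasks, miid=None, url=None):
    """Build the GET that runs analysis tasks on an MIID or external URL."""
    if (miid is None) == (url is None):
        raise ValueError("provide exactly one of miid or url")
    params = {"tasks": tasks}
    params.update({"miid": miid} if miid is not None else {"url": url})
    return "GET", "/api/" + media_type, params
```

For example, `build_store_request("text", text="…")` followed by `build_analyze_request("text", {"identify_keywords": {}}, miid=42)` would mirror the upload-then-analyze sequence described above.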
Entity Types
* text
* image
* video
* audio
* visual interactive
Text
URL: /api/text
POST
Inputs
- text_file:file // text file to store on the server
- url:str // url containing the text to store on the server
- text:str // text to store on the server
- ttl:int {D:180} // number of seconds until the file will be removed from the system (0 means indefinitely)
Note: Either text_file, url, or text must be provided
Outputs
- miid:int // unique media item id assigned to this item
GET
Inputs
- miid:int // server-provided media item id to be analyzed
- url:str // url containing the text to be analyzed
- text:str // text to be analyzed
- tasks:dictionary // list of tasks to perform
- results:dictionary {D: null} // list of results from past tasks
Note: Either miid, url, or text must be provided
Outputs
- results:dictionary // list of task results (one result object per task).
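A hypothetical response shape, assuming JSON with one result object keyed per requested task (task names taken from the spec below); the sample data is illustrative, not real API output:

```python
# Illustrative GET response: one result object per task that was run.
sample_response = {
    "results": {
        "identify_keywords": {"document_keywords": ["metadata", "news"]},
        "identify_entities": {"entities": [[0, "Berlin", "location"]]},
    }
}

def result_for(response, task):
    """Return the result object for one task, or None if it wasn't run."""
    return response["results"].get(task)
```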
Tasks
identify_entities
Identify entities (e.g. people, organizations, and locations) found in the text, either document wide, per paragraph, or both
Powered by [???]
Inputs
None
Outputs
- entities:array // array of [position, entity, type] tuples in the document
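A small consumer sketch over the `[position, entity, type]` tuples described above; the entity data and type labels here are illustrative assumptions, not real output.

```python
# Example entities array in the [position, entity, type] shape above.
entities = [
    [12, "Berlin", "location"],
    [58, "Mozilla", "organization"],
    [130, "Julien Dorra", "person"],
]

def entities_of_type(entities, entity_type):
    """Filter entity tuples by their type field, returning the names."""
    return [e[1] for e in entities if e[2] == entity_type]
```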
identify_keywords
Identify main keywords found in the text, either document wide, per paragraph, or both
Powered by Luminoso
Inputs
- type:enum('document','paragraph', 'both') {D: 'document'} // The scope of keywords to be extracted
Outputs
- document_keywords:array // list of keywords for the entire document
- paragraph_keywords:array // list of keywords for each paragraph
Video
URL: /api/video
POST
Inputs
- video_file:file // video file to store on the server
- url:str // url containing the video to store on the server
- ttl:int {D:180} // number of seconds until the file will be removed from the system (0 means indefinitely)
Note: Either video_file or url must be provided
Outputs
- miid:int // unique media item id assigned to this item
GET
Inputs
- miid:int // server-provided media item id to be analyzed
- url:str // url containing the video to be analyzed
- tasks:dictionary // list of tasks to perform
- results:dictionary {D: null} // list of results from past tasks
Note: Either miid or url must be provided
Outputs
- results:dictionary // list of task results (one result object per task)
Tasks
identify_audio_transitions
Identify moments of distinct changes in audio content (e.g. speaker changes).
Powered by [???]
Inputs
None
Outputs
- audio_transitions:array // list of [HH:MM:SS, sound_id] tuples
identify_entities
Identify entities (e.g. people, organizations, and locations) found in the video transcript
Powered by [???]
Inputs
None
Outputs
- entities:array // array of [HH:MM:SS, entity, type] tuples in the document
identify_faces
Identify faces that appear in the video
Powered by [???]
Inputs
- sample_rate:int {D: 1} // number of frames per second to sample for analysis
Outputs
- faces:array // list of [start HH:MM:SS, end HH:MM:SS, [x, y], miid] tuples
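The face tuples above pair HH:MM:SS range strings with a screen position and the MIID of the matched face sample. A minimal sketch, assuming zero-padded HH:MM:SS timestamps, of totalling on-screen time per face:

```python
# Sum on-screen seconds per face MIID from [start, end, [x, y], miid] tuples.

def to_seconds(hhmmss):
    """Convert an HH:MM:SS string to seconds."""
    h, m, s = (int(part) for part in hhmmss.split(":"))
    return h * 3600 + m * 60 + s

def screen_time(faces):
    """Accumulate total on-screen seconds for each face MIID."""
    totals = {}
    for start, end, _pos, miid in faces:
        totals[miid] = totals.get(miid, 0) + to_seconds(end) - to_seconds(start)
    return totals
```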
identify_keywords
Identify main keywords found in the video, either video wide or per time segment
Powered by Luminoso
Inputs
- block_size:int {D: 0} // size of the time blocks in seconds (0 means entire video)
Outputs
- video_keywords:array // list of [start HH:MM:SS, [keywords]] tuples for each time block
identify_video_transitions
Identify moments of distinct changes in video content (e.g. scene changes).
Powered by [???]
Inputs
None
Outputs
- video_transitions:array // list of [HH:MM:SS, scene_id] tuples
ocr
Attempt to extract any digital characters found in the video.
Powered by [???]
Inputs
- focus_blocks:array {D: null} // list of [x, y, h, w] boxes that contain specific segments of OCR
- sample_rate:int {D: 1} // number of frames per second to sample for analysis
Outputs
- ocr_results:array // list of [start HH:MM:SS, end HH:MM:SS, [x, y], string] tuples
transcribe
Attempt to create a timestamped transcript for the video. The transcript will either be ripped from CC data or estimated using speech-to-text algorithms.
Powered by [???]
Inputs
None
Outputs
- transcript:array // list of [HH:MM:SS, transcript] tuples
- transcription_method:enum('cc','stt') // method used to generate the transcript
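One way a client might consume the `[HH:MM:SS, transcript]` tuples above is to look up which line covers a given moment in the video. A sketch, assuming zero-padded timestamps and tuples sorted by time; the sample transcript is illustrative:

```python
# Find the transcript line in effect at a given timestamp.

def to_seconds(hhmmss):
    """Convert an HH:MM:SS string to seconds."""
    h, m, s = (int(p) for p in hhmmss.split(":"))
    return h * 3600 + m * 60 + s

def line_at(transcript, hhmmss):
    """Return the latest transcript line starting at or before the time."""
    t = to_seconds(hhmmss)
    current = None
    for stamp, text in transcript:
        if to_seconds(stamp) <= t:
            current = text
    return current
```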
Audio
URL: /api/audio
Image
URL: /api/image