Drumbeat/MoJo/hackfest/berlin/projects/MetaProject
Revision as of 13:09, 27 September 2011
The Meta Project is a tool that provides a simple service: take in any piece of media and return all the metadata possible.
Meta Standards Resources
(Add links and summaries to documents discussing metadata)
- rNews is a proposed standard for using RDFa to annotate news-specific metadata in HTML documents.
- Metafragments: proposed metadata markup for audio and video (Julien Dorra)
Known APIs and Tools
(Add links and summaries of toolkits and APIs which can help generate data!)
- http://m.vid.ly/user/ - won't generate metadata but can help with format conversions
Desired Functionality
TEXT
Valid Inputs: URL, Plain Text, HTML
Optional Inputs: Known Metadata
Returned Metadata:
- Primary Themes (Document-wide)
- Primary Themes (Per-paragraph)
- Suggested Tags
- Entities (Names, Locations, Dates, Organizations) and their locations in text
- Author
- Publishing organization (if any)
- Date initially published and date last updated
- Names of people who are quoted
- Quotes
- Other texts cited and/or linked (books, articles, URLs)
- All other numbers (that aren't dates) and their units (i.e. data points cited)
- Corrections
VIDEO
Valid Inputs: URL, Video (formats? .mov and .mp4 are the dominant ones)
Optional Inputs: Transcript, Faces, Known Metadata
Returned Metadata:
- Transcript
- Moments of audio transition (new speaker)
- Moments of video transition (new scene)
- OCR data (any text that appears in the image) and their timestamps
- Entities (Names, Locations) and their timestamps
- Suggested Tags
- Face identification and their timestamp ranges [only done if faces are provided]
AUDIO
Valid Inputs: URL, Audio (mp3, wav)
Optional Inputs: Transcript, Voice Samples, Known Metadata
Returned Metadata:
- Transcript
- Moments of audio transition (new speaker)
- Entities (Names, Locations) and their timestamps
- Suggested Tags
- Voice identification and their timestamp ranges [only done if voice samples are provided]
IMAGE
Valid Inputs: URL, Image (jpg, gif, bmp, png, tif)
Optional Inputs: Faces, Known Metadata
Returned Metadata:
- OCR data and its coordinate location
- Object identification
- Face identification [only done if faces are provided]
In photos we have the following fields:
- caption
- author and job title
- headline
- keywords
- location
- date
- copyright
- news org name
API
The API should be as RESTful as possible. The current thinking is that POST will be used to upload the media item (if needed), returning a Media Item ID (MIID), and GET will be used to perform the actual analysis, taking either an external URL or the MIID returned by a POST.
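A minimal sketch of this two-step flow, using only the Python standard library. The base URL, the form encoding of the POST body, and sending `tasks` as a JSON string in the query are assumptions, since the page does not specify a host or wire format; the requests are only constructed here, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # hypothetical host; not given on this page


def build_upload_request(entity_type: str, source_url: str, ttl: int = 0) -> Request:
    """POST step: ask the server to store a media item; the response would carry a MIID."""
    body = urlencode({"url": source_url, "ttl": ttl}).encode("utf-8")
    return Request(f"{BASE_URL}/api/{entity_type}", data=body, method="POST")


def build_analysis_request(entity_type: str, miid: int, tasks: dict) -> Request:
    """GET step: run analysis tasks on a previously stored item, referenced by its MIID."""
    # Assumption: the tasks dictionary is sent as a JSON string in the query string.
    query = urlencode({"miid": miid, "tasks": json.dumps(tasks)})
    return Request(f"{BASE_URL}/api/{entity_type}?{query}", method="GET")


upload = build_upload_request("video", "http://example.com/clip.mp4", ttl=3600)
analyze = build_analysis_request("video", 42, {"transcribe": {}})
```

Keeping upload (POST) separate from analysis (GET) means a large file only has to be transferred once; later analysis calls can refer to it by its MIID.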
Entity Types
- text
- image
- video
- audio
Text
URL: /api/text
POST
Inputs
- text_file:file // text file to store on the server
- url:str // url containing the text to store on the server
- text:str // text to store on the server
- ttl:int {D:0} // number of seconds until the file will be removed from the system (0 means indefinitely)
Note: Either text_file, url, or text must be provided
Outputs
- miid:int // unique media item id assigned to this item
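The input rules for this POST can be checked on the client before uploading. This is a sketch: the function name is hypothetical, `text_file` is modelled as a local path, the negative-`ttl` check is an added assumption, and the "Either ... must be provided" note is read as "exactly one", which the page does not state explicitly.

```python
from typing import Optional


def build_text_post_fields(
    text_file: Optional[str] = None,
    url: Optional[str] = None,
    text: Optional[str] = None,
    ttl: int = 0,
) -> dict:
    """Collect the form fields for POST /api/text.

    Assumption: "Either text_file, url, or text must be provided" is read as
    "exactly one of them"; text_file is represented here by a local path.
    """
    sources = {"text_file": text_file, "url": url, "text": text}
    provided = {name: value for name, value in sources.items() if value is not None}
    if len(provided) != 1:
        raise ValueError("provide exactly one of text_file, url, or text")
    if ttl < 0:
        raise ValueError("ttl must be >= 0 (0 keeps the item indefinitely)")
    return {**provided, "ttl": ttl}


fields = build_text_post_fields(text="Hackfest notes from Berlin", ttl=86400)
```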
GET
Inputs
- miid:int // server-provided media item id to be analyzed
- url:str // url containing the text to be analyzed
- text:str // text to be analyzed
- tasks:dictionary // list of tasks to perform
- results:dictionary {D: null} // list of results from past tasks
Note: Either miid, url, or text must be provided
Outputs
- results:dictionary // list of task results (one result object per task).
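A sketch of how a client might encode this GET request. The page types `tasks` as a dictionary but describes it as a list; here it is assumed to map each task name to that task's inputs, and both `tasks` and `results` are assumed to travel as JSON strings in the query. `build_text_get_query` is a hypothetical helper.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_text_get_query(tasks, miid=None, url=None, text=None, results=None):
    """Build the query string for GET /api/text (wire format assumed, not specified)."""
    sources = {"miid": miid, "url": url, "text": text}
    provided = {name: value for name, value in sources.items() if value is not None}
    if len(provided) != 1:
        raise ValueError("provide exactly one of miid, url, or text")
    params = dict(provided)
    # Assumption: tasks maps task name -> inputs for that task.
    params["tasks"] = json.dumps(tasks)
    if results is not None:
        params["results"] = json.dumps(results)
    return urlencode(params)


query = build_text_get_query(
    {"identify_keywords": {"type": "both"}, "identify_entities": {}},
    miid=7,
)
decoded = parse_qs(query)
```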
Tasks
identify_entities
Identify entities (e.g. people, organizations, and locations) found in the text, either document wide, per paragraph, or both
Powered by [???]
Inputs
???
Outputs
- entities:array // array of [position, entity, type] tuples in the document
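A client will often want the entities grouped by type. The sketch below does that for the `[position, entity, type]` tuples; the sample data is invented, and `position` is assumed to be a character offset, which the page leaves unspecified.

```python
from collections import defaultdict


def group_entities_by_type(entities):
    """Group [position, entity, type] tuples into {type: [(position, entity), ...]}."""
    grouped = defaultdict(list)
    for position, entity, entity_type in entities:
        grouped[entity_type].append((position, entity))
    return dict(grouped)


# Invented example output; positions assumed to be character offsets.
sample_entities = [
    [0, "Mozilla", "organization"],
    [27, "Berlin", "location"],
    [52, "Jane Doe", "person"],
    [80, "Berlin", "location"],
]
by_type = group_entities_by_type(sample_entities)
```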
identify_keywords
Identify main keywords found in the text, either document wide, per paragraph, or both
Powered by Luminoso
Inputs
- type:enum('document','paragraph', 'both') {D: 'document'} // The scope of keywords to be extracted
Outputs
- document_keywords:array // list of keywords for the entire document
- paragraph_keywords:array // list of keywords for each paragraph
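The `type` input decides which of the two outputs a caller should expect. This sketch assumes that `'both'` returns both lists and that the other two values return only the matching one; the page implies but does not state this, and the function name is hypothetical.

```python
KEYWORD_SCOPES = ("document", "paragraph", "both")


def expected_keyword_outputs(scope: str = "document") -> list:
    """Return the output keys identify_keywords is assumed to produce for a given type."""
    if scope not in KEYWORD_SCOPES:
        raise ValueError(f"type must be one of {KEYWORD_SCOPES}, got {scope!r}")
    keys = []
    if scope in ("document", "both"):
        keys.append("document_keywords")
    if scope in ("paragraph", "both"):
        keys.append("paragraph_keywords")
    return keys
```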
Video
URL: /api/video
POST
Inputs
- video_file:file // video file to store on the server
- url:str // url containing the video to store on the server
- ttl:int {D:0} // number of seconds until the file will be removed from the system (0 means indefinitely)
Note: Either video_file or url must be provided
Outputs
- miid:int // unique media item id assigned to this item
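The `ttl` semantics (0 means keep indefinitely) fit in a small helper. This is a sketch of how a server might compute when an item should be removed; the function name and the rejection of negative values are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def removal_time(stored_at: datetime, ttl: int = 0) -> Optional[datetime]:
    """Return when a stored item expires, or None when ttl == 0 (kept indefinitely)."""
    if ttl < 0:
        raise ValueError("ttl must be >= 0")
    if ttl == 0:
        return None
    return stored_at + timedelta(seconds=ttl)


stored = datetime(2011, 9, 27, 12, 0, tzinfo=timezone.utc)
```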
GET
Inputs
- miid:int // server-provided media item id to be analyzed
- url:str // url containing the video to be analyzed
- tasks:dictionary // list of tasks to perform.
- results:dictionary {D: null} // list of results from past tasks.
Note: Either miid or url must be provided
Outputs
- results:dictionary // list of task results (one result object per task)
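The `results` input suggests that output from an earlier call can be fed back in. A sketch of such a follow-up request is below; it assumes that passing an existing transcript lets a later task such as `identify_entities` reuse it rather than transcribe again, and that `tasks` and `results` are sent as JSON strings. Neither point is specified on this page, and `build_followup_query` is hypothetical.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_followup_query(miid: int, tasks: dict, previous_results: dict) -> str:
    """Query string for a GET /api/video call that reuses earlier task results."""
    return urlencode(
        {
            "miid": miid,
            "tasks": json.dumps(tasks),
            # Assumed: results keyed by task name, as returned by the earlier call.
            "results": json.dumps(previous_results),
        }
    )


first_call_results = {
    "transcribe": {
        "transcript": [["00:00:00", "Welcome to the hackfest."]],
        "transcription_method": "cc",
    }
}
query = build_followup_query(42, {"identify_entities": {}}, first_call_results)
decoded = parse_qs(query)
```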
Tasks
transcribe
Attempt to create a timestamped transcript for the video. The transcript will either be taken from closed-caption (CC) data or estimated using speech-to-text (STT) algorithms.
Powered by [???]
Inputs
None
Outputs
- transcript:array // list of [HH:MM:SS, transcript] tuples
- transcription_method:enum('cc','stt') // method used to generate the transcript
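Transcript entries are keyed by `HH:MM:SS` strings. The sketch below converts them to seconds and finds the line being spoken at a given moment; it assumes each entry starts at its timestamp, runs until the next one, and that entries arrive sorted by time. The helper names and sample transcript are invented.

```python
def hms_to_seconds(stamp: str) -> int:
    """Convert an 'HH:MM:SS' string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def line_at(transcript, second: int):
    """Return the transcript text active at `second`, or None before the first entry."""
    current = None
    for stamp, text in transcript:
        if hms_to_seconds(stamp) > second:
            break
        current = text
    return current


transcript = [
    ["00:00:00", "Good evening."],
    ["00:00:04", "Tonight we are in Berlin."],
    ["00:01:10", "Back to the studio."],
]
```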
identify_audio_transitions
Identify moments of distinct changes in audio content (e.g. speaker changes).
Powered by [???]
Inputs
None
Outputs
- audio_transitions:array // list of [HH:MM:SS, sound_id] tuples
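Each transition marks where a new sound (for example a new speaker) begins. Assuming that reading, and that the caller knows the total duration in seconds, the change points can be turned into segments and summed per `sound_id`. The helper names and sample data are hypothetical.

```python
from collections import Counter


def hms_to_seconds(stamp: str) -> int:
    """Convert an 'HH:MM:SS' string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def transitions_to_segments(transitions, duration: int):
    """Turn [HH:MM:SS, sound_id] change points into (start_s, end_s, sound_id) segments."""
    points = sorted(
        ((hms_to_seconds(stamp), sound_id) for stamp, sound_id in transitions),
        key=lambda point: point[0],
    )
    segments = []
    for i, (start, sound_id) in enumerate(points):
        # A segment ends where the next one starts; the last one runs to the end of the media.
        end = points[i + 1][0] if i + 1 < len(points) else duration
        segments.append((start, end, sound_id))
    return segments


def seconds_per_sound(segments) -> dict:
    """Total seconds attributed to each sound_id."""
    totals = Counter()
    for start, end, sound_id in segments:
        totals[sound_id] += end - start
    return dict(totals)


segments = transitions_to_segments(
    [["00:00:00", 1], ["00:00:30", 2], ["00:01:15", 1]], duration=120
)
```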
identify_entities
Identify entities (e.g. people, organizations, and locations) found in the video transcript
Powered by [???]
Inputs
None
Outputs
- entities:array // array of [HH:MM:SS, entity, type] tuples in the document
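For video, each entity carries a timestamp, so a useful client-side view is a timeline of mentions per entity. A sketch with invented sample data and a hypothetical helper name, optionally filtering on the entity type:

```python
from collections import defaultdict
from typing import Optional


def entity_timeline(entities, entity_type: Optional[str] = None) -> dict:
    """Map each entity to the timestamps at which it is mentioned, in input order."""
    timeline = defaultdict(list)
    for stamp, entity, kind in entities:
        if entity_type is None or kind == entity_type:
            timeline[entity].append(stamp)
    return dict(timeline)


entities = [
    ["00:00:05", "Berlin", "location"],
    ["00:00:40", "Mozilla", "organization"],
    ["00:02:10", "Berlin", "location"],
]
```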
identify_faces
Identify faces that appear in the video
Powered by [???]
Inputs
- sample_rate:int {D: 1} // number of frames per second to sample for analysis
Outputs
- faces:array // list of [start HH:MM:SS, end HH:MM:SS, [x, y], miid] tuples
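Reading each face result as `[start, end, [x, y], miid]`, where `miid` is assumed to identify the reference face that was supplied as input, the total on-screen time per face can be summed. The interpretation of `miid` and `[x, y]`, the helper names, and the sample data are all assumptions.

```python
from collections import Counter


def hms_to_seconds(stamp: str) -> int:
    """Convert an 'HH:MM:SS' string to whole seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def screen_time(faces) -> dict:
    """Sum seconds on screen per face miid from [start, end, [x, y], miid] entries."""
    totals = Counter()
    for start, end, _position, face_miid in faces:
        totals[face_miid] += hms_to_seconds(end) - hms_to_seconds(start)
    return dict(totals)


faces = [
    ["00:00:10", "00:00:25", [120, 80], 101],
    ["00:01:00", "00:01:05", [300, 90], 102],
    ["00:02:00", "00:02:30", [118, 84], 101],
]
```

With the default `sample_rate` of 1, start and end times can presumably only be resolved to about one second, so totals like these are approximate.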
identify_keywords
Identify main keywords found in the video, either video wide or per time segment
Powered by Luminoso
Inputs
- block_size:int {D: 0} // size of the time blocks in seconds (0 means entire video)
Outputs
- video_keywords:array // list of [start HH:MM:SS, [keywords]] tuples for each time block
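The `block_size` input splits the video into fixed-length windows. A sketch of the block start times the output would be keyed on, assuming blocks start at 0 and the last block may be shorter than `block_size`; both helper names are hypothetical.

```python
def seconds_to_hms(total: int) -> str:
    """Format whole seconds as 'HH:MM:SS'."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def block_starts(duration: int, block_size: int = 0) -> list:
    """Start timestamps of keyword blocks; block_size 0 means one block for the whole video."""
    if block_size < 0:
        raise ValueError("block_size must be >= 0")
    if block_size == 0:
        return [seconds_to_hms(0)]
    return [seconds_to_hms(start) for start in range(0, duration, block_size)]
```

For a 150-second video, `block_starts(150, 60)` yields three blocks, the last covering only 30 seconds.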
identify_video_transitions
Identify moments of distinct changes in video content (e.g. scene changes).
Powered by [???]
Inputs
None
Outputs
- video_transitions:array // list of [HH:MM:SS, scene_id] tuples
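The page does not say what will power this task. As one illustration only, a very simple technique flags a scene change when the colour histograms of consecutive sampled frames differ by more than a threshold. The sketch below works on precomputed, normalized histograms (decoding frames is out of scope); the threshold, the timestamp calculation from `sample_rate`, and numbering scenes from 0 are assumptions, not the project's method.

```python
def seconds_to_hms(total: int) -> str:
    """Format whole seconds as 'HH:MM:SS'."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def histogram_distance(a, b) -> float:
    """L1 distance between two normalized histograms (0 = identical, 2 = no overlap)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def detect_transitions(histograms, sample_rate: int = 1, threshold: float = 0.5):
    """Return [HH:MM:SS, scene_id] pairs; the first entry marks the start of scene 0."""
    if not histograms:
        return []
    transitions = [[seconds_to_hms(0), 0]]
    scene_id = 0
    for i in range(1, len(histograms)):
        if histogram_distance(histograms[i - 1], histograms[i]) > threshold:
            scene_id += 1
            # Sampled frame i sits at i / sample_rate seconds into the video.
            transitions.append([seconds_to_hms(int(i / sample_rate)), scene_id])
    return transitions


# Invented two-bin histograms for five frames sampled at 1 frame per second.
frames = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.12, 0.88], [0.5, 0.5]]
scenes = detect_transitions(frames)
```

A fixed threshold like this is sensitive to gradual fades and lighting changes; it is meant only to show the shape of the `video_transitions` output.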
ocr
Attempt to extract any text that appears in the video frames.
Powered by [???]
Inputs
- focus_blocks:array {D: null} // list of [x, y, h, w] boxes restricting OCR to specific regions of the frame
- sample_rate:int {D: 1} // number of frames per second to sample for analysis
Outputs
- ocr_results:array // list of [start HH:MM:SS, end HH:MM:SS, [x, y], string] tuples
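One way to read `focus_blocks` is as regions of interest, such as a lower-third caption bar. The sketch applies that reading on the client by keeping only OCR results whose `[x, y]` point falls inside a box. It assumes `[x, y]` is the text's top-left corner in pixel coordinates with y increasing downward, follows the page's `[x, y, h, w]` order (height before width), and uses hypothetical helper names and invented data.

```python
def in_box(point, box) -> bool:
    """True if point [x, y] lies inside box [x, y, h, w] (height listed before width)."""
    x, y = point
    box_x, box_y, box_h, box_w = box
    return box_x <= x <= box_x + box_w and box_y <= y <= box_y + box_h


def filter_by_focus_blocks(ocr_results, focus_blocks=None):
    """Keep results whose [x, y] falls in any focus block; None (the default) keeps all."""
    if focus_blocks is None:
        return list(ocr_results)
    return [
        result
        for result in ocr_results
        if any(in_box(result[2], box) for box in focus_blocks)
    ]


ocr_results = [
    ["00:00:03", "00:00:08", [40, 400], "BREAKING NEWS"],
    ["00:00:03", "00:00:08", [600, 20], "LIVE"],
]
lower_third = [0, 380, 100, 640]  # x=0, y=380, h=100, w=640 (invented frame layout)
```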
Audio
URL: /api/audio
Image
URL: /api/image