| The Meta Project is a tool that provides a simple service: take in any piece of media and return all the metadata that can be extracted from it.
| | This page [[Drumbeat/MoJo/hackfest/berlin/projects/MetaMetaProject|has been moved.]] |
| | |
| ==Meta Standards Resources==
| |
| (Add links and summaries to documents discussing metadata)
| |
| | |
| * [http://dev.iptc.org/rNews rNews] is a proposed standard for using RDFa to annotate news-specific metadata in HTML documents.
| |
| | |
| * [https://docs.google.com/document/pub?id=1gHBKJxdLEQVBOax5uis7_3b3gOQIUQHfJKZY-FWQ_2c Metafragments] is a proposed metadata markup for audio and video. - (Julien Dorra)
| |
| | |
| ==Known APIs and Tools==
| |
| (Add links and summaries of toolkits and APIs which can help generate data!)
| |
| * http://m.vid.ly/user/ - won't generate metadata but can help with format conversions
| |
| | |
| ==Desired Functionality==
| |
| === TEXT ===
| |
| | |
| '''Valid Inputs:''' URL, Plain Text, HTML
| |
| | |
| '''Optional Inputs:''' Known Metadata
| |
| | |
| '''Returned Metadata:'''
| |
| | |
| - Primary Themes (Document-wide)
| |
| - Primary Themes (Per-paragraph)
| |
| - Suggested Tags
| |
| - Entities (Names, Locations, Dates, Organizations) and their locations in text
| |
| - Author
| |
| - Publishing organization (if any)
| |
| - Date initially published and date last updated
| |
| - Names of people who are quoted
| |
| - Quotes
| |
| - Other texts cited and/or linked (books, articles, urls)
| |
| - All other numbers (that aren't dates) and their units (i.e. data points cited)
| |
| - Corrections
| |
| | |
| ===VIDEO===
| |
| '''Valid Inputs:''' URL, Video (format? .mov and .mp4 are the dominant ones)
| |
| | |
| '''Optional Inputs:''' Transcript, Faces, Known Metadata
| |
| | |
| '''Returned Metadata:'''
| |
| - Transcript
| |
| - Moments of audio transition (new speaker)
| |
| - Moments of video transition (new scene)
| |
| - OCR data (any text that appears on screen) and its timestamps
| |
| - Entities (Names, Locations) and their timestamps
| |
| - Suggested Tags
| |
| - Face identification and their timestamp ranges [only done if faces are provided]
| |
| | |
| ===AUDIO===
| |
| '''Valid Inputs:''' URL, Audio (mp3, wav)
| |
| | |
| '''Optional Inputs:''' Transcript, Voice Samples, Known Metadata
| |
| | |
| '''Returned Metadata:'''
| |
| - Transcript
| |
| - Moments of audio transition (new speaker)
| |
| - Entities (Names, Locations) and their timestamps
| |
| - Suggested Tags
| |
| - Voice identification and their timestamp ranges [only done if voice samples are provided]
| |
| | |
| ===IMAGE===
| |
| '''Valid Inputs:''' URL, Image (jpg, gif, bmp, png, tif)
| |
| | |
| '''Optional Inputs:''' Faces, Known Metadata
| |
| | |
| '''Returned Metadata:'''
| |
| - OCR data and its coordinate location
| |
| - Object identification
| |
| - Face identification [only done if faces are provided]
| |
| | |
| '''In photo we have:'''
| |
| - caption
| |
| - author and job title
| |
| - headline
| |
| - keywords
| |
| - location
| |
| - date
| |
| - copyright
| |
| - news org name
| |
| | |
| ==API==
| |
| The API should be as RESTful as possible. The current thinking is that POST will be used to upload the media item (if needed) and will return a ''Media Item ID (MIID)''. GET will then be used to perform the actual analysis, taking either an external URL or the MIID returned from a POST.
| |
| | |
| '''Entity Types'''
| |
| * text
| |
| * image
| |
| * video
| |
| * audio
| |
| | |
| ===Text===
| |
| '''URL:''' /api/text
| |
| | |
| ====POST====
| |
| <u>'''Inputs'''</u>
| |
| | |
| - '''text_file''':''file'' // The text file to store on the server
| |
| - '''url''':''str'' // The url containing the text to store on the server
| |
| - '''text''':''str'' // The text to store on the server
| |
| - '''ttl''':''int'' {D:0} // The number of seconds until the file will be removed from the system (0 means indefinitely)
| |
| | |
| ''Note:'' Either text_file, url, or text must be provided
| |
| | |
| <u>'''Outputs'''</u>
| |
| | |
| - '''miid''':''int'' // The unique media item id assigned to this item
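The POST inputs above could be handled roughly as follows. This is only a sketch: the in-memory store, the function names <code>post_text</code> and <code>is_expired</code>, and the <code>now</code> parameter are hypothetical and not part of the proposal. It also assumes that "either ... must be provided" means at least one source is required.

```python
import itertools
import time

# Hypothetical in-memory stand-in for the server-side store (not part of the spec).
_store = {}
_next_miid = itertools.count(1)


def post_text(text_file=None, url=None, text=None, ttl=0, now=None):
    """Validate the POST /api/text inputs and return {'miid': int}."""
    # Assumption: at least one of the three sources has to be supplied.
    if text_file is None and url is None and text is None:
        raise ValueError("one of text_file, url, or text must be provided")
    if ttl < 0:
        raise ValueError("ttl must be >= 0")
    now = time.time() if now is None else now
    miid = next(_next_miid)
    _store[miid] = {
        "source": {"text_file": text_file, "url": url, "text": text},
        # ttl == 0 means the item is kept indefinitely, so no expiry is recorded.
        "expires_at": None if ttl == 0 else now + ttl,
    }
    return {"miid": miid}


def is_expired(miid, now=None):
    """Return True once the ttl of a stored item has elapsed."""
    now = time.time() if now is None else now
    expires_at = _store[miid]["expires_at"]
    return expires_at is not None and now >= expires_at
```

Passing <code>now</code> explicitly keeps the expiry logic testable without waiting for real time to pass.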
| |
| | |
| ====GET====
| |
| <u>'''Inputs'''</u>
| |
| | |
| - '''miid''':''int'' // server-provided media item id to be analyzed
| |
| - '''url''':''str'' // url containing the text to be analyzed
| |
| - '''text''':''str'' // text to be analyzed
| |
| - '''tasks''':''dictionary'' // list of tasks to perform
| |
| - '''results''':''dictionary'' {D: null} // list of results from past tasks
| |
| | |
| ''Note:'' Either miid, url, or text must be provided
| |
| | |
| <u>'''Outputs'''</u>
| |
| | |
| - '''results''':''dictionary'' // list of task results (one result object per task).
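A client might assemble the GET request like this. The proposal does not say how the '''tasks''' and '''results''' dictionaries are encoded in a GET, so this sketch assumes they are JSON-encoded query parameters and that '''tasks''' maps each task name to that task's inputs. The function name <code>build_text_get</code> and the base address are hypothetical.

```python
import json
from urllib.parse import urlencode


def build_text_get(base, tasks, miid=None, url=None, text=None, results=None):
    """Build the URL for a GET /api/text analysis request."""
    if miid is None and url is None and text is None:
        raise ValueError("one of miid, url, or text must be provided")
    params = {}
    if miid is not None:
        params["miid"] = str(miid)
    if url is not None:
        params["url"] = url
    if text is not None:
        params["text"] = text
    # Assumption: dictionaries travel as JSON strings in the query string.
    params["tasks"] = json.dumps(tasks, sort_keys=True)
    if results is not None:
        params["results"] = json.dumps(results, sort_keys=True)
    return f"{base}/api/text?{urlencode(params)}"


# Example: ask for both document-wide and per-paragraph keywords for item 42.
request_url = build_text_get(
    "http://localhost:8000",  # hypothetical server address
    {"identify_keywords": {"type": "both"}},
    miid=42,
)
```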
| |
| | |
| ====Tasks====
| |
| =====identify_keywords=====
| |
| Identify the main keywords found in the text, either document-wide, per paragraph, or both
| |
| | |
| Powered by [http://csc.media.mit.edu/luminoso Luminoso]
| |
| | |
| <u>'''Inputs'''</u>
| |
| | |
| - '''type''':''enum('document','paragraph', 'both')'' {D: 'document'} // The scope of keywords to be extracted
| |
| | |
| <u>'''Outputs'''</u>
| |
| | |
| - '''document_keywords''':''array'' // list of keywords for the entire document
| |
| - '''paragraph_keywords''':''array'' // list of keywords for each paragraph
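To illustrate the input and output shapes only, here is a sketch that substitutes a plain word-frequency count for Luminoso. The stopword list, the <code>limit</code> parameter, and the blank-line paragraph split are assumptions. The argument <code>scope</code> stands in for the '''type''' input to avoid shadowing the Python builtin.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real service would need a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on", "for"}


def top_keywords(text, limit=3):
    """Return the most frequent non-stopword tokens (frequency stand-in, not Luminoso)."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    return [word for word, _ in Counter(words).most_common(limit)]


def identify_keywords(text, scope="document", limit=3):
    """Return document_keywords and/or paragraph_keywords depending on scope."""
    if scope not in ("document", "paragraph", "both"):
        raise ValueError("scope must be 'document', 'paragraph', or 'both'")
    result = {}
    if scope in ("document", "both"):
        result["document_keywords"] = top_keywords(text, limit)
    if scope in ("paragraph", "both"):
        # Assumption: paragraphs are separated by one or more blank lines.
        paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
        result["paragraph_keywords"] = [top_keywords(p, limit) for p in paragraphs]
    return result
```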
| |
| | |
| =====identify_entities=====
| |
| Identify entities (e.g. people, organizations, and locations) found in the text, either document-wide, per paragraph, or both
| |
| | |
| Powered by [???]
| |
| | |
| <u>'''Inputs'''</u>
| |
| | |
| ???
| |
| | |
| <u>'''Outputs'''</u>
| |
| | |
| - '''entities''':''array'' // array of [position, entity, type] tuples in the document.
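Since the engine for this task is still undecided, the following sketch only shows what the [position, entity, type] output could look like. It uses a hypothetical fixed lookup table (a gazetteer) in place of real entity recognition, and assumes that "position" means the character offset of the match.

```python
import re

# Hypothetical lookup table standing in for a real entity recognizer.
GAZETTEER = {
    "Berlin": "location",
    "Mozilla": "organization",
    "Ada Lovelace": "person",
}


def identify_entities(text, gazetteer=GAZETTEER):
    """Return {'entities': [[position, entity, type], ...]} sorted by position."""
    entities = []
    for name, kind in gazetteer.items():
        # Whole-word matches only; position is the character offset (an assumption).
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            entities.append([match.start(), name, kind])
    entities.sort(key=lambda entry: entry[0])
    return {"entities": entities}
```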
| |
| | |
| ===Video===
| |
| '''URL:''' /api/video
| |
| | |
| ====POST====
| |
| <u>'''Inputs'''</u>
| |
| | |
| - '''video_file''':''file'' // The video file to store on the server
| |
| - '''url''':''str'' // The url containing the video to store on the server
| |
| - '''ttl''':''int'' {D:0} // The number of seconds until the file will be removed from the system (0 means indefinitely)
| |
| | |
| ''Note:'' Either video_file or url must be provided
| |
| | |
| <u>'''Outputs'''</u>
| |
| | |
| - '''miid''':''int'' // The unique media item id assigned to this item
| |
| | |
| ====GET====
| |
| <u>'''Inputs'''</u>
| |
| | |
| - '''miid''':''int'' // server-provided media item id to be analyzed
| |
| - '''url''':''str'' // url containing the video to be analyzed
| |
| - '''transcript''':''array'' {D: null} // list of [HH:MM:SS, transcript] tuples
| |
| - '''tasks''':''dictionary'' {D: null} // list of tasks to perform.
| |
| - '''results''':''dictionary'' {D: null} // list of results from past tasks.
| |
| | |
| ''Note:'' Either miid or url must be provided
| |
| | |
| <u>'''Outputs'''</u>
| |
| | |
| - '''results''':''dictionary'' // list of task results (one result object per task).
| |
| | |
| ====Tasks====
| |
| =====transcribe=====
| |
| Attempt to create a timestamped transcript for the video. The transcript will either be ripped from CC data or estimated using speech to text algorithms.
| |
| | |
| Powered by [???]
| |
| | |
| <u>'''Inputs'''</u>
| |
| | |
| None
| |
| | |
| <u>'''Outputs'''</u>
| |
| | |
| - '''transcript''':''array'' // list of [HH:MM:SS, transcript] tuples
| |
| - '''transcription_method''':''enum('cc','stt')'' // the method used to generate the transcript
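The output shape above could be produced along these lines. The sketch assumes the transcription step yields (start second, text) segments. The helper names <code>to_hhmmss</code> and <code>build_transcript_result</code> are hypothetical.

```python
def to_hhmmss(seconds):
    """Format a non-negative offset in seconds as HH:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def build_transcript_result(segments, method):
    """Turn (start_seconds, text) segments into the transcribe task output."""
    # 'cc' = ripped from closed-caption data, 'stt' = speech-to-text estimate.
    if method not in ("cc", "stt"):
        raise ValueError("method must be 'cc' or 'stt'")
    return {
        "transcript": [[to_hhmmss(start), text] for start, text in segments],
        "transcription_method": method,
    }
```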
| |
| | |
| =====identify_audio_transitions=====
| |
| Identify moments of distinct changes in audio content (e.g. speaker changes).
| |
| | |
| Powered by [???]
| |
| | |
| <u>'''Inputs'''</u>
| |
| | |
| None
| |
| | |
| <u>'''Outputs'''</u>
| |
| | |
| - '''audio_transitions''':''array'' // list of [HH:MM:SS, sound_id] tuples
| |
| | |
| =====identify_video_transitions=====
| |
| Identify moments of distinct changes in video content (e.g. scene changes).
| |
| | |
| Powered by [???]
| |
| | |
| <u>'''Inputs'''</u>
| |
| | |
| None
| |
| | |
| <u>'''Outputs'''</u>
| |
| | |
| - '''video_transitions''':''array'' // list of [HH:MM:SS, scene_id] tuples
| |
| | |
| =====ocr=====
| |
| Attempt to extract any on-screen text found in the video.
| |
| | |
| Powered by [???]
| |
| | |
| <u>'''Inputs'''</u>
| |
| | |
| - '''focus_blocks''':''array'' {D: null} // a list of [x, y, h, w] boxes that contain specific segments of OCR
| |
| - '''sample_rate''':''int'' {D: 1} // number of frames per second to sample for analysis
| |
| | |
| <u>'''Outputs'''</u>
| |
| | |
| - '''ocr_results''':''array'' // list of [start HH:MM:SS, end HH:MM:SS, [[string, position ],...]] tuples
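One possible way to get from per-frame OCR output to the time ranges above is to merge consecutive sampled frames that show identical text. In this sketch the per-frame detections are assumed to come from an unspecified OCR engine, and each "position" is assumed to be an [x, y, h, w] box like the ones in '''focus_blocks'''. The function names are hypothetical.

```python
from itertools import groupby


def to_hhmmss(seconds):
    """Format a non-negative offset in seconds as HH:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def merge_ocr_frames(frames, sample_rate=1):
    """Merge runs of identical per-frame detections into [start, end, detections]."""
    results = []
    index = 0  # index of the first sampled frame in the current run
    for detections, run in groupby(frames):
        length = len(list(run))
        if detections:  # runs with no detected text are skipped
            start = index // sample_rate
            end = (index + length - 1) // sample_rate
            results.append([to_hhmmss(start), to_hhmmss(end), detections])
        index += length
    return results
```

With <code>sample_rate</code> above 1, several sampled frames fall within the same second, so start and end are rounded down to whole seconds.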
| |
| | |
| =====identify_entities=====
| |
| Identify entities (e.g. people, organizations, and locations) found in the video transcript
| |
| | |
| Powered by [???]
| |
| | |
| <u>'''Inputs'''</u>
| |
| | |
| ???
| |
| | |
| <u>'''Outputs'''</u>
| |
| | |
| - '''entities''':''array'' // array of [HH:MM:SS, entity, type] tuples in the transcript.
| |
| | |
| =====identify_keywords=====
| |
| Identify the main keywords found in the video, either video-wide or per time segment
| |
| | |
| Powered by [http://csc.media.mit.edu/luminoso Luminoso]
| |
| | |
| <u>'''Inputs'''</u>
| |
| | |
| - '''block_size''':''int'' {D: 0} // The size of the time blocks in seconds (0 means entire video)
| |
| | |
| <u>'''Outputs'''</u>
| |
| | |
| - '''video_keywords''':''array'' // list of [start HH:MM:SS, [keywords]] tuples for each time block
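The effect of '''block_size''' can be sketched by bucketing a timestamped transcript into time blocks. As with the text task, a simple word-frequency count stands in for Luminoso here. Taking keywords from the transcript, the stopword list, and the <code>limit</code> parameter are all assumptions.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on", "for"}


def parse_hhmmss(stamp):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in stamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hhmmss(seconds):
    """Format a non-negative offset in seconds as HH:MM:SS."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def top_keywords(text, limit):
    """Most frequent non-stopword tokens (frequency stand-in, not Luminoso)."""
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]
    return [word for word, _ in Counter(words).most_common(limit)]


def video_keywords(transcript, block_size=0, limit=2):
    """Group [HH:MM:SS, text] tuples into time blocks and extract keywords per block."""
    blocks = {}
    for stamp, text in transcript:
        seconds = parse_hhmmss(stamp)
        # block_size == 0 means a single block covering the entire video.
        start = 0 if block_size == 0 else (seconds // block_size) * block_size
        blocks.setdefault(start, []).append(text)
    return {
        "video_keywords": [
            [to_hhmmss(start), top_keywords(" ".join(texts), limit)]
            for start, texts in sorted(blocks.items())
        ]
    }
```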
| |
| | |
| ===Audio===
| |
| '''URL:''' /api/audio
| |
| | |
| ===Image===
| |
| '''URL:''' /api/image
| |