==About The Project==

This page [[Drumbeat/MoJo/hackfest/berlin/projects/MetaMetaProject|has been moved]].

'''Name:''' Meta Meta Project

'''Code Repository:''' [https://github.com/slifty/MetaProject On GitHub]

The Meta Meta Project is a tool which provides a simple service: take in any piece of media, spit out all the metadata possible.

=== Project Status ===

* Much of the API is designed and documented.
* Much of the API is stubbed out in code, ready to have the "brains" inserted.
* Keyword extraction is implemented, along with a front-facing "test shell" which can easily be modified to show off new features as they are added.

=== Collaborators ===

Oh so many folks at Hacktoberfest helped in discussions, brainstorms, fleshing out the wish lists, and in some cases even in the code. Shout outs in particular go to:

* [https://wiki.mozilla.org/User:ShinNoNoir Raynor Vliegendhart], who helped design the Python server template and served as a spectacular Python resource.
* [https://wiki.mozilla.org/User:Tathagatadg Tathagata Dasgupta], who has been particularly enthusiastic about contributing his entity extraction work.
* [https://wiki.mozilla.org/index.php?title=User:Maboa&action=edit&redlink=1 Mark Boas], who is going to be a key player in incorporating the microformats transcription features.
* Laurian Gridinoc, whose comments and advice helped shape the API design.

=== Next steps ===

There are some clear next steps:

* Continue fleshing out the API, particularly for the Text and Audio formats.
* Continue to code the specific tasks in the API.
* Flesh out and possibly streamline the installation process.
* Encapsulate library includes so that, when setting up a server, it is possible to set up only specific portions (for instance, someone who doesn't need the identify_keywords task ideally wouldn't have to install nltk).
* Design a "test script" which will make it clear which tasks are functional and which tasks don't have their dependencies properly installed.
* Design a new media type, "Web Site", which will focus on component extraction (e.g. "identify_videos", "identify_content", etc.).

Places where this project might be tested include:

* This tool can be used (and contributed to) by anyone hacking together a project using media, whether in a newsroom, at a company, or as a hobby coder.

==Meta Standards Resources==

(Add links and summaries to documents discussing metadata)

* [http://dev.iptc.org/rNews rNews] is a proposed standard for using RDFa to annotate news-specific metadata in HTML documents.
* [https://docs.google.com/document/pub?id=1gHBKJxdLEQVBOax5uis7_3b3gOQIUQHfJKZY-FWQ_2c Metafragments] is a proposed metadata markup for audio and video. (Julien Dorra)

==Known APIs and Tools==

(Add links and summaries of toolkits and APIs which can help generate data!)

* http://m.vid.ly/user/ - won't generate metadata, but can help with format conversions

==Desired Functionality==

=== TEXT ===

'''Valid Inputs:''' URL, Plain Text, HTML

'''Optional Inputs:''' Known Metadata

'''Desired Metadata:'''

- Primary Themes (Document-wide)
- Primary Themes (Per-paragraph)
- Suggested Tags
- Entities (Names, Locations, Dates, Organizations) and their locations in the text
- Author
- Publishing organization (if any)
- Date initially published and date last updated
- Names of people who are quoted
- Quotes
- Other texts cited and/or linked (books, articles, URLs)
- All other numbers (that aren't dates) and their units (i.e. data points cited)
- Corrections

===VIDEO===

'''Valid Inputs:''' URL, Video (.mov, .mp4, VP8)

'''Optional Inputs:''' Transcript, Faces, Known Metadata

'''Desired Metadata:'''

- Transcript
- Moments of audio transition (new speaker)
- Moments of video transition (new scene)
- OCR data (any text that appears on screen) and its timestamps
- Entities (Names, Locations) and their timestamps
- Suggested Tags
- Face identification and their timestamp ranges [only done if faces are provided]
- Caption/summary
- Author and job title
- Headline
- Keywords
- Location
- Date
- Copyright
- News org name
- URL to related web story

===AUDIO===

'''Valid Inputs:''' URL, Audio (mp3, wav)

'''Optional Inputs:''' Transcript, Voice Samples, Known Metadata

'''Desired Metadata:'''

- Transcript
- Moments of audio transition (new speaker)
- Entities (Names, Locations) and their timestamps
- Suggested Tags
- Voice identification and their timestamp ranges [only done if voice samples are provided]

===IMAGE===

'''Valid Inputs:''' URL, Image (jpg, gif, bmp, png, tif)

'''Optional Inputs:''' Faces, Known Metadata

'''Desired Metadata:'''

- OCR data and its coordinate location
- Object identification
- Face identification [only done if faces are provided]
- Location identification

'''In-photo metadata:'''

- Caption
- Author and job title
- Headline
- Keywords
- Location
- Date
- Copyright
- News org name

===INTERACTIVE===

'''Valid Inputs:''' URL

'''Optional Inputs:''' None

'''Desired Metadata:'''

???

===WEB PAGE===

'''Valid Inputs:''' URL

'''Optional Inputs:''' None

'''Desired Metadata:'''

- Images
- Audio
- Videos
- Content
- Title
- Author
- Last update
- Meta tags

==API==

The API is to be as RESTful as possible. The current thought is that POST will be used to upload the media item (if needed) and will return a ''Media Item ID (MIID)''; GET will be used to perform the actual analysis, taking in either an external URL or the MIID returned from a POST.
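
A minimal sketch of that flow, assuming a server running at http://localhost:8000 and Python's requests library (the JSON encoding of the ''tasks'' parameter is also an assumption; parameter names follow the Text spec below):

<pre>
import requests

BASE = "http://localhost:8000"  # assumed local MetaProject server

# POST the media item; the server responds with a Media Item ID (MIID)
resp = requests.post(BASE + "/api/text", data={
    "text": "Mozilla and Knight launched MoJo in Berlin.",
    "ttl": 180,  # seconds before the item is purged from the server
})
miid = resp.json()["miid"]

# GET runs the actual analysis against the stored item (or an external URL)
resp = requests.get(BASE + "/api/text", params={
    "miid": miid,
    "tasks": '{"identify_keywords": {"type": "document"}}',
})
print(resp.json()["results"])
</pre>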

'''Entity Types'''

* text
* image
* video
* audio

===Text===

'''URL:''' /api/text

====POST====

<u>'''Inputs'''</u>

- '''text_file''':''file'' // text file to store on the server
- '''url''':''str'' // url containing the text to store on the server
- '''text''':''str'' // text to store on the server
- '''ttl''':''int'' {D: 180} // number of seconds until the file will be removed from the system (0 means indefinitely)

''Note:'' Either text_file, url, or text must be provided.

<u>'''Outputs'''</u>

- '''miid''':''int'' // unique media item id assigned to this item

====GET====

<u>'''Inputs'''</u>

- '''miid''':''int'' // server-provided media item id to be analyzed
- '''url''':''str'' // url containing the text to be analyzed
- '''text''':''str'' // text to be analyzed
- '''tasks''':''dictionary'' // list of tasks to perform
- '''results''':''dictionary'' {D: null} // list of results from past tasks

''Note:'' Either miid, url, or text must be provided.

<u>'''Outputs'''</u>

- '''results''':''dictionary'' // list of task results (one result object per task)

====Tasks====

=====identify_entities=====

Identify entities (e.g. people, organizations, and locations) found in the text, either document-wide, per paragraph, or both.

Powered by [???]

<u>'''Inputs'''</u>

None

<u>'''Outputs'''</u>

- '''entities''':''array'' // array of [position, entity, type] tuples in the document
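
The backend is still undecided; one plausible sketch uses nltk's named entity chunker (nltk is already a dependency for identify_keywords), with ''position'' taken as a word offset:

<pre>
import nltk  # needs the punkt, averaged_perceptron_tagger, maxent_ne_chunker, and words data

def identify_entities(text):
    """Return [position, entity, type] tuples, position = word offset."""
    entities = []
    tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(text)))
    position = 0
    for node in tree:
        if isinstance(node, nltk.Tree):  # a chunked named entity
            name = " ".join(word for word, tag in node.leaves())
            entities.append([position, name, node.label()])  # e.g. PERSON, GPE
            position += len(node.leaves())
        else:
            position += 1
    return entities

print(identify_entities("Tim Berners-Lee spoke in Berlin."))
</pre>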

=====identify_keywords=====

Identify the main keywords found in the text, either document-wide, per paragraph, or both.

Powered by [http://www.nltk.org/ nltk]

<u>'''Inputs'''</u>

- '''type''':''enum('document', 'paragraph', 'both')'' {D: 'document'} // the scope of keywords to be extracted

<u>'''Outputs'''</u>

- '''document_keywords''':''array'' // list of keywords for the entire document
- '''paragraph_keywords''':''array'' // list of keywords for each paragraph
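
nltk is confirmed as the engine, but the exact scoring in the repository may differ; a minimal frequency-based sketch:

<pre>
from nltk import FreqDist, word_tokenize
from nltk.corpus import stopwords  # needs the punkt and stopwords data

def identify_keywords(text, type="document", n=10):
    """Return the top-n stopword-filtered words, per the API's 'type' scope."""
    stop = set(stopwords.words("english"))

    def top_words(chunk):
        words = [w.lower() for w in word_tokenize(chunk)
                 if w.isalpha() and w.lower() not in stop]
        return [w for w, count in FreqDist(words).most_common(n)]

    results = {}
    if type in ("document", "both"):
        results["document_keywords"] = top_words(text)
    if type in ("paragraph", "both"):
        results["paragraph_keywords"] = [top_words(p)
                                         for p in text.split("\n\n") if p.strip()]
    return results
</pre>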

===Video===

'''URL:''' /api/video

====POST====

<u>'''Inputs'''</u>

- '''video_file''':''file'' // video file to store on the server
- '''url''':''str'' // url containing the video to store on the server
- '''ttl''':''int'' {D: 180} // number of seconds until the file will be removed from the system (0 means indefinitely)

''Note:'' Either video_file or url must be provided.

<u>'''Outputs'''</u>

- '''miid''':''int'' // unique media item id assigned to this item

====GET====

<u>'''Inputs'''</u>

- '''miid''':''int'' // server-provided media item id to be analyzed
- '''url''':''str'' // url containing the video to be analyzed
- '''tasks''':''dictionary'' // list of tasks to perform
- '''results''':''dictionary'' {D: null} // list of results from past tasks

''Note:'' Either miid or url must be provided.

<u>'''Outputs'''</u>

- '''results''':''dictionary'' // list of task results (one result object per task)

====Tasks====

=====identify_audio_transitions=====

Identify moments of distinct change in the audio content (e.g. speaker changes).

Powered by [???]

<u>'''Inputs'''</u>

None

<u>'''Outputs'''</u>

- '''audio_transitions''':''array'' // list of [HH:MM:SS, sound_id] tuples
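
The backend here is an open question; as a placeholder, a crude energy-based heuristic (assuming the audio track has already been extracted from the video as 16-bit mono WAV, and that a large jump in short-window RMS marks a transition):

<pre>
import wave
import numpy as np

def identify_audio_transitions(wav_path, window=0.5, jump=3.0):
    """Return [HH:MM:SS, sound_id] tuples where RMS energy changes sharply."""
    with wave.open(wav_path) as w:
        rate = w.getframerate()
        samples = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    step = int(rate * window)
    rms = [np.sqrt(np.mean(samples[i:i + step].astype(float) ** 2)) + 1e-9
           for i in range(0, len(samples) - step, step)]
    transitions, sound_id = [], 0
    for i in range(1, len(rms)):
        if rms[i] > rms[i - 1] * jump or rms[i] < rms[i - 1] / jump:
            sound_id += 1
            t = int(i * window)
            stamp = "%02d:%02d:%02d" % (t // 3600, t % 3600 // 60, t % 60)
            transitions.append([stamp, sound_id])
    return transitions
</pre>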

=====identify_entities=====

Identify entities (e.g. people, organizations, and locations) found in the video transcript.

Powered by [???]

<u>'''Inputs'''</u>

None

<u>'''Outputs'''</u>

- '''entities''':''array'' // array of [HH:MM:SS, entity, type] tuples in the transcript

=====identify_faces=====

Identify faces that appear in the video.

Powered by [???]

<u>'''Inputs'''</u>

- '''sample_rate''':''int'' {D: 1} // number of frames per second to sample for analysis

<u>'''Outputs'''</u>

- '''faces''':''array'' // list of [start HH:MM:SS, end HH:MM:SS, [x, y], miid] tuples
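
The backend is undecided; the sketch below covers only the detection half, using OpenCV's stock Haar cascade. Matching detected faces against the caller-provided face samples (to fill in the miid slot) would be layered on top:

<pre>
import cv2  # opencv-python; a plausible choice, not confirmed by the project

def detect_faces(video_path, sample_rate=1):
    """Return [HH:MM:SS, [x, y]] hits for faces in sampled frames."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30
    hits, frame_no = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_no % max(1, int(fps / sample_rate)) == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            t = int(frame_no / fps)
            stamp = "%02d:%02d:%02d" % (t // 3600, t % 3600 // 60, t % 60)
            for (x, y, w, h) in cascade.detectMultiScale(gray, 1.3, 5):
                hits.append([stamp, [int(x), int(y)]])
        frame_no += 1
    cap.release()
    return hits
</pre>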

=====identify_keywords=====

Identify the main keywords found in the video, either video-wide or per time segment.

Powered by [http://www.nltk.org/ nltk]

<u>'''Inputs'''</u>

- '''block_size''':''int'' {D: 0} // size of the time blocks in seconds (0 means the entire video)

<u>'''Outputs'''</u>

- '''video_keywords''':''array'' // list of [start HH:MM:SS, [keywords]] tuples for each time block

=====identify_video_transitions=====

Identify moments of distinct change in the video content (e.g. scene changes).

Powered by [???]

<u>'''Inputs'''</u>

None

<u>'''Outputs'''</u>

- '''video_transitions''':''array'' // list of [HH:MM:SS, scene_id] tuples
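
Again the backend is open; a classic shot-boundary heuristic compares grayscale histograms of consecutive frames (a sketch, assuming OpenCV):

<pre>
import cv2

def identify_video_transitions(video_path, threshold=0.5):
    """Return [HH:MM:SS, scene_id] tuples where histogram correlation drops."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30
    prev, transitions, scene_id, frame_no = None, [], 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev is not None and \
           1 - cv2.compareHist(prev, hist, cv2.HISTCMP_CORREL) > threshold:
            scene_id += 1
            t = int(frame_no / fps)
            stamp = "%02d:%02d:%02d" % (t // 3600, t % 3600 // 60, t % 60)
            transitions.append([stamp, scene_id])
        prev = hist
        frame_no += 1
    cap.release()
    return transitions
</pre>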

=====ocr=====

Attempt to extract any text characters that appear in the video.

Powered by [???]

<u>'''Inputs'''</u>

- '''focus_blocks''':''array'' {D: null} // list of [x, y, h, w] boxes that contain specific segments to OCR
- '''sample_rate''':''int'' {D: 1} // number of frames per second to sample for analysis

<u>'''Outputs'''</u>

- '''ocr_results''':''array'' // list of [start HH:MM:SS, end HH:MM:SS, [x, y], string] tuples
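
One plausible backend is Tesseract via pytesseract (an assumption; the Tesseract binary must be installed). The sketch samples frames and optionally crops to the caller's focus blocks; merging hits into [start, end] ranges is left out:

<pre>
import cv2
import pytesseract

def ocr(video_path, sample_rate=1, focus_blocks=None):
    """Return [HH:MM:SS, [x, y], string] hits from sampled frames."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30
    results, frame_no = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_no % max(1, int(fps / sample_rate)) == 0:
            blocks = focus_blocks or [[0, 0, frame.shape[0], frame.shape[1]]]
            t = int(frame_no / fps)
            stamp = "%02d:%02d:%02d" % (t // 3600, t % 3600 // 60, t % 60)
            for x, y, h, w in blocks:
                text = pytesseract.image_to_string(frame[y:y + h, x:x + w]).strip()
                if text:
                    results.append([stamp, [x, y], text])
        frame_no += 1
    cap.release()
    return results
</pre>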

=====transcribe=====

Attempt to create a timestamped transcript for the video. The transcript will either be ripped from closed caption (CC) data or estimated using speech-to-text algorithms.

Powered by [???]

<u>'''Inputs'''</u>

None

<u>'''Outputs'''</u>

- '''transcript''':''array'' // list of [HH:MM:SS, transcript] tuples
- '''transcription_method''':''enum('cc','stt')'' // method used to generate the transcript
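
For the speech-to-text branch (CC ripping not shown), one hedged sketch uses the speech_recognition package with the offline pocketsphinx engine, transcribing the extracted audio track in fixed-size chunks to approximate timestamps:

<pre>
import speech_recognition as sr  # plus pocketsphinx; both are assumptions

def transcribe(audio_path, block=10):
    """Return ([HH:MM:SS, text] tuples, 'stt') from block-second chunks."""
    recognizer = sr.Recognizer()
    transcript, offset = [], 0
    with sr.AudioFile(audio_path) as source:
        while True:
            chunk = recognizer.record(source, duration=block)
            if not chunk.frame_data:
                break  # end of file
            stamp = "%02d:%02d:%02d" % (offset // 3600, offset % 3600 // 60, offset % 60)
            try:
                transcript.append([stamp, recognizer.recognize_sphinx(chunk)])
            except sr.UnknownValueError:
                pass  # nothing intelligible in this chunk
            offset += block
    return transcript, "stt"
</pre>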

===Audio===

'''URL:''' /api/audio

===Image===

'''URL:''' /api/image