Drumbeat/MoJo/hackfest/berlin/projects/MetaProject

==About The Project==
'''Name:''' Meta Meta Project
 
'''Code Repository:''' [https://github.com/slifty/MetaProject On Github]
 
The Meta Meta Project is a tool that provides a simple service: take in any piece of media, spit out all the metadata possible.
 
=== Project Status  ===
 
* Much of the API is designed and documented.
* Much of the API is stubbed out in code, ready to have the "brains" inserted.
* Keyword extraction is implemented, in addition to a front-facing "test shell" which can easily be modified to show off the new features as they are added.
 
=== Collaborators  ===
 
Oh so many folks at Hacktoberfest helped in discussions, brainstorms, fleshing out the wish lists, and in some cases even in the code.  Shout-outs in particular go to:
* [[User:ShinNoNoir|Raynor Vliegendhart]], who helped design the Python server template and served as a spectacular Python resource.
* [[User:Tathagatadg|Tathagata Dasgupta]], who has been particularly enthusiastic about contributing his entity extraction work.
* [[User:Maboa|Mark Boas]], who is going to be a key player in the incorporation of microformat transcription features.
* Laurian Gridinoc, whose comments and advice helped shape the API design.
 
=== Next steps  ===
 
There are some clear next steps:
 
* Continue fleshing out the API, particularly for Text and Audio formats.
* Continue to code the specific tasks in the API.
* Flesh out and possibly streamline the installation process.
* Encapsulate library includes so that, when setting up a server, it is possible to set up only specific portions (for instance, someone who doesn't need the identify_keywords task shouldn't have to install nltk); see the sketch after this list.
* Design a "test script" which will make it clear which tasks are functional and which tasks don't have their dependencies properly installed.
* Design a new media type, "Web Site", which will focus on component extraction (e.g. "identify_videos", "identify_content", etc.).
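
A minimal sketch of what that encapsulation could look like in Python. This is an assumption about a possible approach, not the current code; the registry and function names here are hypothetical:

<pre>
# Register each task only when its dependencies are actually importable,
# so a server can run without every optional library installed.
AVAILABLE_TASKS = {}

try:
    import nltk  # only needed by identify_keywords
except ImportError:
    nltk = None

if nltk is not None:
    def identify_keywords(text, scope='document'):
        # real keyword extraction would go here
        return []
    AVAILABLE_TASKS['identify_keywords'] = identify_keywords

def run_task(name, *args, **kwargs):
    if name not in AVAILABLE_TASKS:
        raise ValueError("task '%s' unavailable: missing dependency?" % name)
    return AVAILABLE_TASKS[name](*args, **kwargs)
</pre>

A "test script" like the one described above could then simply report which names are present in AVAILABLE_TASKS.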
 
Places where this project might be tested include:
 
* Anywhere someone is hacking together a project using media: this tool can be used (and contributed to) by people in newsrooms, professionals at companies, and hobby coders alike.
 
==Meta Standards Resources==
(Add links and summaries to documents discussing metadata)
 
* [http://dev.iptc.org/rNews rNews] is a proposed standard for using RDFa to annotate news-specific metadata in HTML documents.
 
* [https://docs.google.com/document/pub?id=1gHBKJxdLEQVBOax5uis7_3b3gOQIUQHfJKZY-FWQ_2c Metafragments] is a proposed metadata markup for audio and video. (Julien Dorra)
 
==Known APIs and Tools==
(Add links and summaries of toolkits and APIs which can help generate data!)
* http://m.vid.ly/user/ - won't generate metadata but can help with format conversions
 
==Desired Functionality==
=== TEXT  ===
 
'''Valid Inputs:''' URL, Plain Text, HTML
 
'''Optional Inputs:''' Known Metadata
 
'''Desired Metadata:'''
 
* Primary Themes (Document-wide)
* Primary Themes (Per-paragraph)
* Suggested Tags
* Entities (Names, Locations, Dates, Organizations) and their locations in text
* Author
* Publishing organization (if any)
* Date initially published and date last updated
* Names of people who are quoted
* Quotes
* Other texts cited and/or linked (books, articles, URLs)
* All other numbers (that aren't dates) and their units (i.e. data points cited)
* Corrections
 
===VIDEO===
'''Valid Inputs:''' URL, Video (.mov, .mp4, VP8)
 
'''Optional Inputs:''' Transcript, Faces, Known Metadata
 
'''Desired Metadata:'''
* Transcript
* Moments of audio transition (new speaker)
* Moments of video transition (new scene)
* OCR data (any text that appears on screen) and its timestamps
* Entities (Names, Locations) and their timestamps
* Suggested Tags
* Face identification and timestamp ranges [only done if faces are provided]
* Caption/summary
* Author and job title
* Headline
* Keywords
* Location
* Date
* Copyright
* News org name
* URL to related news story
 
===AUDIO===
'''Valid Inputs:''' URL, Audio (mp3, wav)
 
'''Optional Inputs:''' Transcript, Voice Samples, Known Metadata
 
'''Desired Metadata:'''
* Transcript
* Moments of audio transition (new speaker)
* Entities (Names, Locations) and their timestamps
* Suggested Tags
* Voice identification and timestamp ranges [only done if voice samples are provided]
 
===IMAGE===
'''Valid Inputs:''' URL, Image (jpg, gif, bmp, png, tif)
 
'''Optional Inputs:''' Faces, Known Metadata
 
'''Desired Metadata:'''
* OCR data and its coordinate locations
* Object identification
* Face identification [only done if faces are provided]
* Location identification
 
'''Metadata embedded in the photo:'''
* Caption
* Author and job title
* Headline
* Keywords
* Location
* Date
* Copyright
* News org name
 
===INTERACTIVE===
 
'''Valid Inputs:''' URL
 
'''Optional Inputs:''' None
 
'''Desired Metadata:'''
???
 
 
===WEB PAGE===
'''Valid Inputs:''' URL
 
'''Optional Inputs:''' None
 
'''Desired Metadata:'''
* Images
* Audio
* Videos
* Content
* Title
* Author
* Last update
* Meta tags
 
==API==
The API is to be as RESTful as possible.  Current thinking is that POST will be used to upload the media item (if needed), returning a ''Media Item ID (MIID)'', and GET will be used to perform the actual analysis (taking in either an external URL or the MIID returned from a POST).
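
As a rough illustration of that flow, here is a hypothetical Python client. The endpoint path and parameter names come from the spec below; the host, the JSON responses, and the wire encoding of the tasks dictionary are assumptions:

<pre>
import requests

BASE = 'http://localhost:8080'  # assumed host; not specified here

# POST the raw text and receive a Media Item ID (MIID).
post_resp = requests.post(BASE + '/api/text', data={
    'text': 'Chancellor Angela Merkel spoke in Berlin on Monday.',
    'ttl': 600,  # keep the item on the server for ten minutes
})
miid = post_resp.json()['miid']

# GET the analysis, naming the tasks to perform.
get_resp = requests.get(BASE + '/api/text', params={
    'miid': miid,
    'tasks': '{"identify_keywords": {"type": "both"}}',
})
print(get_resp.json()['results'])
</pre>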
 
'''Media Types'''
* text
* image
* video
* audio
 
===Text===
'''URL:''' /api/text
 
====POST====
<u>'''Inputs'''</u>
 
* '''text_file''':''file'' // text file to store on the server
* '''url''':''str'' // URL containing the text to store on the server
* '''text''':''str'' // text to store on the server
* '''ttl''':''int'' {D: 180} // number of seconds until the file will be removed from the system (0 means keep indefinitely)

''Note:'' One of text_file, url, or text must be provided.
 
<u>'''Outputs'''</u>
 
* '''miid''':''int'' // unique media item ID assigned to this item
 
====GET====
<u>'''Inputs'''</u>
 
* '''miid''':''int'' // server-provided media item ID to be analyzed
* '''url''':''str'' // URL containing the text to be analyzed
* '''text''':''str'' // text to be analyzed
* '''tasks''':''dictionary'' // list of tasks to perform
* '''results''':''dictionary'' {D: null} // list of results from past tasks

''Note:'' One of miid, url, or text must be provided.
 
<u>'''Outputs'''</u>
 
* '''results''':''dictionary'' // list of task results (one result object per task).
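
The spec doesn't pin down the wire format of ''tasks'' and ''results''; one plausible JSON shape, keyed by task name, would be:

<pre>
{
  "tasks": {
    "identify_keywords": {"type": "both"},
    "identify_entities": {}
  },
  "results": {
    "identify_keywords": {
      "document_keywords": ["metadata", "hackfest"],
      "paragraph_keywords": [["metadata"], ["hackfest", "berlin"]]
    }
  }
}
</pre>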
 
====Tasks====
=====identify_entities=====
Identify entities (e.g. people, organizations, and locations) found in the text, either document-wide, per paragraph, or both.
 
Powered by [???]
 
<u>'''Inputs'''</u>
 
None
 
<u>'''Outputs'''</u>
 
* '''entities''':''array'' // array of [position, entity, type] tuples in the document
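
For example, for the sentence "Chancellor Angela Merkel spoke in Berlin on Monday." the result might look like this (assuming positions are character offsets, which the spec doesn't say):

<pre>
[
  [11, "Angela Merkel", "PERSON"],
  [34, "Berlin", "LOCATION"],
  [44, "Monday", "DATE"]
]
</pre>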
 
=====identify_keywords=====
Identify main keywords found in the text, either document-wide, per paragraph, or both.
 
Powered by [http://www.nltk.org/ nltk]
 
<u>'''Inputs'''</u>
 
- '''type''':''enum('document','paragraph', 'both')'' {D: 'document'} // The scope of keywords to be extracted
 
<u>'''Outputs'''</u>
 
* '''document_keywords''':''array'' // list of keywords for the entire document
* '''paragraph_keywords''':''array'' // list of keywords for each paragraph (only when type is 'paragraph' or 'both')
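
Since this task is powered by nltk, here is a minimal sketch of one plausible implementation, using simple stopword filtering and frequency counts (the actual approach in the repository may differ):

<pre>
from collections import Counter

from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# One-time setup: nltk.download('punkt'); nltk.download('stopwords')

def identify_keywords(text, top_n=10):
    """Return the top_n most frequent non-stopword tokens as keywords."""
    stop = set(stopwords.words('english'))
    words = [w.lower() for w in word_tokenize(text)
             if w.isalpha() and w.lower() not in stop]
    return [word for word, count in Counter(words).most_common(top_n)]
</pre>

A per-paragraph variant would just split the text on blank lines and run the same function on each paragraph.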
 
===Video===
'''URL:''' /api/video
 
====POST====
<u>'''Inputs'''</u>
 
* '''video_file''':''file'' // video file to store on the server
* '''url''':''str'' // URL containing the video to store on the server
* '''ttl''':''int'' {D: 180} // number of seconds until the file will be removed from the system (0 means keep indefinitely)

''Note:'' Either video_file or url must be provided.
 
<u>'''Outputs'''</u>
 
* '''miid''':''int'' // unique media item ID assigned to this item
 
====GET====
<u>'''Inputs'''</u>
 
* '''miid''':''int'' // server-provided media item ID to be analyzed
* '''url''':''str'' // URL containing the video to be analyzed
* '''tasks''':''dictionary'' // list of tasks to perform
* '''results''':''dictionary'' {D: null} // list of results from past tasks

''Note:'' Either miid or url must be provided.
 
<u>'''Outputs'''</u>
 
* '''results''':''dictionary'' // list of task results (one result object per task)
 
====Tasks====
=====identify_audio_transitions=====
Identify moments of distinct changes in audio content (e.g. speaker changes).
 
Powered by [???]
 
<u>'''Inputs'''</u>
 
None
 
<u>'''Outputs'''</u>
 
* '''audio_transitions''':''array'' // list of [HH:MM:SS, sound_id] tuples
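
A hypothetical result; sound_id is presumably an arbitrary label, so segments sharing an id share a speaker or sound source:

<pre>
[
  ["00:00:00", 1],
  ["00:01:12", 2],
  ["00:03:45", 1]
]
</pre>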
 
=====identify_entities=====
Identify entities (e.g. people, organizations, and locations) found in the video transcript.
 
Powered by [???]
 
<u>'''Inputs'''</u>
 
None
 
<u>'''Outputs'''</u>
 
* '''entities''':''array'' // array of [HH:MM:SS, entity, type] tuples found in the transcript
 
=====identify_faces=====
Identify faces that appear in the video.

Powered by [???]
 
<u>'''Inputs'''</u>
 
* '''sample_rate''':''int'' {D: 1} // number of frames per second to sample for analysis
 
<u>'''Outputs'''</u>
 
* '''faces''':''array'' // list of [start HH:MM:SS, end HH:MM:SS, [x, y], miid] tuples
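
A hypothetical result, assuming the trailing miid refers back to the provided face image that matched and [x, y] is the face's position in the frame:

<pre>
[
  ["00:00:05", "00:00:12", [320, 140], 7],
  ["00:01:03", "00:01:30", [88, 212], 7]
]
</pre>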
 
=====identify_keywords=====
Identify main keywords found in the video, either video-wide or per time segment.
 
Powered by [http://www.nltk.org/ nltk]
 
<u>'''Inputs'''</u>
 
* '''block_size''':''int'' {D: 0} // size of the time blocks in seconds (0 means the entire video)
 
<u>'''Outputs'''</u>
 
* '''video_keywords''':''array'' // list of [start HH:MM:SS, [keywords]] tuples, one per time block
 
=====identify_video_transitions=====
Identify moments of distinct changes in video content (e.g. scene changes).
 
Powered by [???]
 
<u>'''Inputs'''</u>
 
None
 
<u>'''Outputs'''</u>
 
* '''video_transitions''':''array'' // list of [HH:MM:SS, scene_id] tuples
 
=====ocr=====
Attempt to extract any on-screen text found in the video.
 
Powered by [???]
 
<u>'''Inputs'''</u>
 
* '''focus_blocks''':''array'' {D: null} // list of [x, y, h, w] boxes to restrict OCR to specific regions
* '''sample_rate''':''int'' {D: 1} // number of frames per second to sample for analysis
 
<u>'''Outputs'''</u>
 
* '''ocr_results''':''array'' // list of [start HH:MM:SS, end HH:MM:SS, [x, y], string] tuples
 
=====transcribe=====
Attempt to create a timestamped transcript for the video.  The transcript will either be ripped from closed-caption (CC) data or estimated using speech-to-text algorithms.
 
Powered by [???]
 
<u>'''Inputs'''</u>
 
None
 
<u>'''Outputs'''</u>
 
* '''transcript''':''array'' // list of [HH:MM:SS, transcript] tuples
* '''transcription_method''':''enum('cc','stt')'' // method used to generate the transcript
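
A hypothetical response for a video transcribed via speech-to-text:

<pre>
{
  "transcript": [
    ["00:00:02", "Welcome to the Berlin hackfest."],
    ["00:00:09", "Today we are talking about metadata."]
  ],
  "transcription_method": "stt"
}
</pre>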
 
===Audio===
'''URL:''' /api/audio
 
===Image===
'''URL:''' /api/image
