Drumbeat/MoJo/hackfest/berlin/projects/MetaProject: Difference between revisions

Revision as of 13:09, 27 September 2011

The Meta Project is a tool which provides a simple service: take in any piece of media, spit out all the metadata possible.

Meta Standards Resources

(Add links and summaries to documents discussing metadata)

rNews is a proposed standard for using RDFa to annotate news-specific metadata in HTML documents.

Metafragments is a proposed metadata markup for audio and video. (Julien Dorra)

Known APIs and Tools

(Add links and summaries of toolkits and APIs which can help generate data!)

http://m.vid.ly/user/ - won't generate metadata but can help with format conversions

Desired Functionality

TEXT

Valid Inputs: URL, Plain Text, HTML

Optional Inputs: Known Metadata

Returned Metadata:

- Primary Themes (Document-wide)
- Primary Themes (Per-paragraph)
- Suggested Tags
- Entities (Names, Locations, Dates, Organizations) and their locations in text
- Author
- Publishing organization (if any)
- Date initially published and date last updated
- Names of people who are quoted
- Quotes
- Other texts cited and/or linked (books, articles, urls)
- All other numbers (that aren't dates) and their units (i.e. data points cited)
- Corrections

VIDEO

Valid Inputs: URL, Video (format? .mov and .mp4 are the dominant ones)

Optional Inputs: Transcript, Faces, Known Metadata

Returned Metadata:

- Transcript
- Moments of audio transition (new speaker)
- Moments of video transition (new scene)
- OCR data (any text that appears on image) and their timestamps
- Entities (Names, Locations) and their timestamps
- Suggested Tags
- Face identification and their timestamp ranges [only done if faces are provided]

AUDIO

Valid Inputs: URL, Audio (mp3, wav)

Optional Inputs: Transcript, Voice Samples, Known Metadata

Returned Metadata:

- Transcript
- Moments of audio transition (new speaker)
- Entities (Names, Locations) and their timestamps
- Suggested Tags
- Voice identification and their timestamp ranges [only done if voice samples are provided]

IMAGE

Valid Inputs: URL, Image (jpg, gif, bmp, png, tif)

Optional Inputs: Faces, Known Metadata

Returned Metadata:

- OCR data and its coordinate location
- Object identification
- Face identification [only done if faces are provided]

In photos we already have:

- caption
- author and job title
- headline
- keywords 
- location
- date
- copyright
- news org name

API

The API should be as RESTful as possible. The current thinking is that POST will be used to upload the media item (if needed) and will return a Media Item ID (MIID), while GET will be used to perform the actual analysis, taking in either an external URL or the MIID returned from a POST.
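The upload-then-analyze flow can be sketched as follows. This is only an illustration under stated assumptions: the host meta.example.org and the helper names are hypothetical, and JSON-encoding the tasks dictionary into a query parameter is a guess, since the page does not say how a dictionary is passed on a GET. The requests are constructed but not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://meta.example.org"  # hypothetical host, not part of the spec


def build_upload(entity_type, fields):
    """Build the POST that stores a media item (the server would return a MIID)."""
    # Form encoding only covers url/text inputs; a file upload would need multipart.
    body = urlencode(fields).encode("utf-8")
    return Request(f"{BASE}/api/{entity_type}", data=body, method="POST")


def build_analysis(entity_type, tasks, miid=None, url=None):
    """Build the GET that runs the analysis on a MIID or an external URL."""
    if miid is None and url is None:
        raise ValueError("either miid or url must be provided")
    params = {"tasks": json.dumps(tasks)}  # assumed encoding of the dictionary
    if miid is not None:
        params["miid"] = miid
    else:
        params["url"] = url
    return Request(f"{BASE}/api/{entity_type}?{urlencode(params)}", method="GET")


upload = build_upload("text", {"text": "Hello Berlin", "ttl": 3600})
analysis = build_analysis("text", {"identify_keywords": {"type": "document"}}, miid=42)
```

Passing either request to urllib.request.urlopen would send it; the MIID 42 here stands in for whatever the POST would return.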

Entity Types

* text
* image
* video
* audio

Text

URL: /api/text

POST

Inputs

- text_file:file // text file to store on the server
- url:str // url containing the text to store on the server
- text:str // text to store on the server
- ttl:int {D:0} // number of seconds until the file will be removed from the system (0 means indefinitely)

Note: Either text_file, url, or text must be provided

Outputs

- miid:int // unique media item id assigned to this item
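The input rule for the Text POST can be checked on the client before anything is sent. The function below is a hypothetical helper, not part of the API. It reads "either ... must be provided" as "at least one"; whether more than one source may be sent together is not specified on this page.

```python
def validate_text_post(text_file=None, url=None, text=None, ttl=0):
    """Check the Text POST inputs and return the fields to submit."""
    sources = {"text_file": text_file, "url": url, "text": text}
    provided = {name: value for name, value in sources.items() if value is not None}
    if not provided:
        raise ValueError("one of text_file, url, or text must be provided")
    if not isinstance(ttl, int) or ttl < 0:
        raise ValueError("ttl must be a non-negative number of seconds")
    provided["ttl"] = ttl  # 0 means the item is kept indefinitely
    return provided


fields = validate_text_post(text="Hello Berlin", ttl=3600)
```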

GET

Inputs

- miid:int // server-provided media item id to be analyzed
- url:str // url containing the text to be analyzed
- text:str // text to be analyzed
- tasks:dictionary // list of tasks to perform
- results:dictionary {D: null} // list of results from past tasks

Note: Either miid, url, or text must be provided

Outputs

- results:dictionary // list of task results (one result object per task).

Tasks

identify_entities

Identify entities (e.g. people, organizations, and locations) found in the text, either document-wide, per paragraph, or both.

Powered by [???]

Inputs

???

Outputs

- entities:array // array of [position, entity, type] tuples in the document
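A consumer of the entities output will often want it grouped by type. The sketch below uses made-up sample data and assumes that position is a character offset and that the type labels are plain strings such as "person"; neither is defined on this page.

```python
from collections import defaultdict

# Hypothetical response fragment: [position, entity, type] tuples.
entities = [
    [0, "Angela Merkel", "person"],
    [25, "Berlin", "location"],
    [48, "Mozilla", "organization"],
    [70, "Berlin", "location"],
]


def group_by_type(entity_tuples):
    """Map each entity type to its entities and the positions where they occur."""
    grouped = defaultdict(dict)
    for position, entity, entity_type in entity_tuples:
        grouped[entity_type].setdefault(entity, []).append(position)
    return dict(grouped)


grouped = group_by_type(entities)
```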

identify_keywords

Identify main keywords found in the text, either document-wide, per paragraph, or both.

Powered by Luminoso

Inputs

- type:enum('document','paragraph', 'both') {D: 'document'} // The scope of keywords to be extracted

Outputs

- document_keywords:array // list of keywords for the entire document
- paragraph_keywords:array // list of keywords for each paragraph
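The type input maps naturally onto an entry in the tasks dictionary. The helpers below are hypothetical. In particular, the nesting of task name to input dictionary, and the idea that only the output field matching the requested scope is returned, are assumptions rather than documented behavior.

```python
VALID_SCOPES = ("document", "paragraph", "both")


def keywords_task(scope="document"):
    """Build the identify_keywords entry for the tasks dictionary."""
    if scope not in VALID_SCOPES:
        raise ValueError(f"type must be one of {VALID_SCOPES}")
    return {"identify_keywords": {"type": scope}}


def expected_outputs(scope):
    """Output fields one would expect back for a given scope (an assumption)."""
    fields = []
    if scope in ("document", "both"):
        fields.append("document_keywords")
    if scope in ("paragraph", "both"):
        fields.append("paragraph_keywords")
    return fields


tasks = keywords_task("both")
```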

Video

URL: /api/video

POST

Inputs

- video_file:file // video file to store on the server
- url:str // url containing the video to store on the server
- ttl:int {D:0} // number of seconds until the file will be removed from the system (0 means indefinitely)

Note: Either video_file or url must be provided

Outputs

- miid:int // unique media item id assigned to this item

GET

Inputs

- miid:int // server-provided media item id to be analyzed
- url:str // url containing the video to be analyzed
- tasks:dictionary // list of tasks to perform.
- results:dictionary {D: null} // list of results from past tasks.

Note: Either miid or url must be provided

Outputs

- results:dictionary // list of task results (one result object per task)

Tasks

transcribe

Attempt to create a timestamped transcript for the video. The transcript will either be ripped from CC data or estimated using speech to text algorithms.

Powered by [???]

Inputs

None

Outputs

- transcript:array // list of [HH:MM:SS, transcript] tuples
- transcription_method:enum('cc','stt') // method used to generate the transcript
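To show how the [HH:MM:SS, transcript] tuples might be used, the sketch below looks up what is being said at a given moment. The sample result is invented, and the code assumes the entries arrive sorted by time, which the page does not state.

```python
def to_seconds(timestamp):
    """Convert an HH:MM:SS string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in timestamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def line_at(transcript, timestamp):
    """Return the last transcript line that starts at or before timestamp."""
    target = to_seconds(timestamp)
    current = None
    for start, text in transcript:  # assumes entries are sorted by time
        if to_seconds(start) > target:
            break
        current = text
    return current


# Hypothetical transcribe output.
result = {
    "transcript": [
        ["00:00:00", "Good evening."],
        ["00:00:04", "Our top story tonight."],
        ["00:01:10", "Now to the weather."],
    ],
    "transcription_method": "cc",
}

spoken = line_at(result["transcript"], "00:00:30")
```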

identify_audio_transitions

Identify moments of distinct changes in audio content (e.g. speaker changes).

Powered by [???]

Inputs

None

Outputs

- audio_transitions:array // list of [HH:MM:SS, sound_id] tuples
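Transition points are easier to work with as segments. The sketch below pairs each transition with the next one, assuming that sound_id labels the sound that begins at that timestamp and that the same id recurs when a speaker returns. Both points are interpretations, and the sample data is invented.

```python
def to_segments(transitions, end_of_media):
    """Pair each transition with the next one to get [start, end, sound_id]."""
    segments = []
    for index, (start, sound_id) in enumerate(transitions):
        if index + 1 < len(transitions):
            end = transitions[index + 1][0]
        else:
            end = end_of_media  # the last segment runs to the end of the media
        segments.append([start, end, sound_id])
    return segments


# Hypothetical identify_audio_transitions output.
audio_transitions = [["00:00:00", 1], ["00:00:42", 2], ["00:01:15", 1]]
segments = to_segments(audio_transitions, "00:02:00")
```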

identify_entities

Identify entities (e.g. people, organizations, and locations) found in the video transcript

Powered by [???]

Inputs

None

Outputs

- entities:array // array of [HH:MM:SS, entity, type] tuples in the document

identify_faces

Identify faces that appear in the video

Powered by [???]

Inputs

- sample_rate:int {D: 1} // number of frames per second to sample for analysis

Outputs

- faces:array // list of [start HH:MM:SS, end HH:MM:SS, [x, y], miid] tuples

identify_keywords

Identify main keywords found in the video, either video-wide or per time segment.

Powered by Luminoso (http://csc.media.mit.edu/luminoso)

Inputs

- block_size:int {D: 0} // size of the time blocks in seconds (0 means entire video)

Outputs

- video_keywords:array // list of [start HH:MM:SS, [keywords]] tuples for each time block
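The effect of block_size can be illustrated by listing the block start times a client might expect in video_keywords. This is a sketch only: the function name is hypothetical, and the page does not say how the server aligns blocks or treats a final partial block.

```python
def block_starts(duration_seconds, block_size=0):
    """List the HH:MM:SS start of each time block; 0 means one block for the whole video."""
    if block_size < 0:
        raise ValueError("block_size must not be negative")
    total = max(duration_seconds, 1)
    step = block_size or total
    starts = []
    for offset in range(0, total, step):
        hours, rest = divmod(offset, 3600)
        minutes, seconds = divmod(rest, 60)
        starts.append(f"{hours:02d}:{minutes:02d}:{seconds:02d}")
    return starts


per_minute = block_starts(150, 60)
whole_video = block_starts(150)
```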

identify_video_transitions

Identify moments of distinct changes in video content (e.g. scene changes).

Powered by [???]

Inputs

None

Outputs

- video_transitions:array // list of [HH:MM:SS, scene_id] tuples

ocr

Attempt to extract any digital characters found in the video.

Powered by [???]

Inputs

- focus_blocks:array {D: null} // list of [x, y, h, w] boxes that contain specific segments of OCR
- sample_rate:int {D: 1} // number of frames per second to sample for analysis

Outputs

- ocr_results:array // list of [start HH:MM:SS, end HH:MM:SS, [x, y], string] tuples
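One way to picture focus_blocks is as a filter on where OCR hits may appear. The sketch below applies that filter on the client side to invented sample results. It assumes pixel coordinates with the origin at the top left and [x, y] marking the top-left corner of both the box and the hit; the page does not define the coordinate system.

```python
def in_block(point, block):
    """True if [x, y] lies inside the [x, y, h, w] box."""
    px, py = point
    bx, by, height, width = block
    return bx <= px < bx + width and by <= py < by + height


def filter_by_blocks(ocr_results, focus_blocks=None):
    """Keep OCR hits inside any focus block; None keeps everything."""
    if focus_blocks is None:
        return list(ocr_results)
    return [
        hit for hit in ocr_results
        if any(in_block(hit[2], block) for block in focus_blocks)
    ]


# Hypothetical ocr output: [start, end, [x, y], string] tuples.
ocr_results = [
    ["00:00:05", "00:00:09", [40, 650], "BREAKING NEWS"],
    ["00:00:05", "00:00:20", [1100, 30], "LIVE"],
]
lower_third = [0, 600, 120, 1280]  # x, y, h, w
captions = filter_by_blocks(ocr_results, [lower_third])
```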

Audio

URL: /api/audio

Image

URL: /api/image
