MetaProject
The Meta Project is a tool which provides a simple service: take in any piece of media, spit out all the metadata possible.

Meta Standards Resources

(Add links and summaries to documents discussing metadata)

  • rNews is a proposed standard for using RDFa to annotate news-specific metadata in HTML documents.
  • Metafragments is a proposed metadata markup for audio and video. (Julien Dorra)

Known APIs and Tools

(Add links and summaries of toolkits and APIs which can help generate data!)

Desired Functionality

TEXT

Valid Inputs: URL, Plain Text, HTML

Optional Inputs: Known Metadata

Returned Metadata:

- Primary Themes (Document-wide)
- Primary Themes (Per-paragraph)
- Suggested Tags
- Entities (Names, Locations, Dates, Organizations) and their locations in text
- Author
- Publishing organization (if any)
- Date initially published and date last updated
- Names of people who are quoted
- Quotes
- Other texts cited and/or linked (books, articles, urls)
- All other numbers (that aren't dates) and their units (i.e. data points cited)
- Corrections

VIDEO

Valid Inputs: URL, Video (.mov, .mp4)

Optional Inputs: Transcript, Faces, Known Metadata

Returned Metadata:

- Transcript
- Moments of audio transition (new speaker)
- Moments of video transition (new scene)
- OCR data (any text that appears on image) and their timestamps
- Entities (Names, Locations) and their timestamps
- Suggested Tags
- Identified faces and their timestamp ranges [only done if faces are provided]
- Caption/summary
- Author and job title
- Headline
- Keywords
- Location
- Date
- Copyright
- News org name
- URL to related word story

AUDIO

Valid Inputs: URL, Audio (mp3, wav)

Optional Inputs: Transcript, Voice Samples, Known Metadata

Returned Metadata:

- Transcript
- Moments of audio transition (new speaker)
- Entities (Names, Locations) and their timestamps
- Suggested Tags
- Identified voices and their timestamp ranges [only done if voice samples are provided]

IMAGE

Valid Inputs: URL, Image (jpg, gif, bmp, png, tif)

Optional Inputs: Faces, Known Metadata

Returned Metadata:

- OCR data and its coordinate locations
- Object identification
- Face identification [only done if faces are provided]
- Location identification

Metadata already embedded in the photo may include:

- Caption
- Author and job title
- Headline
- Keywords
- Location
- Date
- Copyright
- News org name

API

The API should be as RESTful as possible. The current thinking is that POST will be used to upload a media item (if needed), returning a Media Item ID (MIID), and GET will be used to perform the actual analysis, taking either an external URL or the MIID returned from a POST.
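
As a rough illustration, the flow for a locally stored file might look like the Python sketch below. The host name, the JSON responses, and the JSON-encoding of the tasks parameter are all assumptions for illustration, not part of the spec.

 # Hypothetical end-to-end flow: POST a media item, then GET the analysis.
 import json
 import requests

 BASE = "http://example.org/api"  # hypothetical host

 # 1. Upload the media item (only needed for local files); returns an MIID.
 with open("article.txt", "rb") as f:
     resp = requests.post(BASE + "/text", files={"text_file": f},
                          data={"ttl": 3600})
 miid = resp.json()["miid"]

 # 2. Run the analysis against the MIID (an external URL would also work).
 tasks = {"identify_keywords": {"type": "document"}}
 resp = requests.get(BASE + "/text",
                     params={"miid": miid, "tasks": json.dumps(tasks)})
 print(resp.json()["results"])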

Entity Types

* text
* image
* video
* audio
* visual interactive

Text

URL: /api/text

POST

Inputs

- text_file:file // text file to store on the server
- url:str // url containing the text to store on the server
- text:str // text to store on the server
- ttl:int {D:180} // number of seconds until the file will be removed from the system (0 means indefinitely)

Note: Either text_file, url, or text must be provided

Outputs

- miid:int // unique media item id assigned to this item
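
For illustration, the three alternative ways of creating a text item might be called like this (the host name and JSON responses are assumptions):

 import requests

 BASE = "http://example.org/api/text"  # hypothetical host

 # Exactly one of text_file, url, or text is provided per request.
 r_url  = requests.post(BASE, data={"url": "http://example.com/story.html"})
 r_text = requests.post(BASE, data={"text": "Plain text to analyze.", "ttl": 0})
 with open("story.txt", "rb") as f:
     r_file = requests.post(BASE, files={"text_file": f})

 # Each call returns a unique media item id.
 print(r_url.json()["miid"], r_text.json()["miid"], r_file.json()["miid"])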

GET

Inputs

- miid:int // server-provided media item id to be analyzed
- url:str // url containing the text to be analyzed
- text:str // text to be analyzed
- tasks:dictionary // list of tasks to perform
- results:dictionary {D: null} // list of results from past tasks

Note: Either miid, url, or text must be provided

Outputs

- results:dictionary // list of task results (one result object per task).
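
A sketch of how the tasks and results dictionaries might be shaped, keyed by task name; the exact wire format is not settled, so this is an assumption:

 # Hypothetical request/response shapes: one entry per task name in
 # tasks, mirrored by one result object per task name in results.
 tasks = {
     "identify_entities": {},                  # task with no inputs
     "identify_keywords": {"type": "both"},    # task with inputs
 }
 results = {
     "identify_entities": {"entities": []},    # filled in by the server
     "identify_keywords": {"document_keywords": [],
                           "paragraph_keywords": []},
 }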

Tasks

identify_entities

Identify entities (e.g. people, organizations, and locations) found in the text, either document-wide, per paragraph, or both

Powered by [???]

Inputs

None

Outputs

- entities:array // array of [position, entity, type] tuples in the document
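
An illustrative entities value (names and offsets invented; position is assumed to be a character offset):

 entities = [
     [0,   "Knight Foundation", "organization"],
     [54,  "Berlin",            "location"],
     [131, "Julien Dorra",      "person"],
 ]
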
identify_keywords

Identify main keywords found in the text, either document-wide, per paragraph, or both

Powered by Luminoso

Inputs

- type:enum('document','paragraph', 'both') {D: 'document'} // The scope of keywords to be extracted

Outputs

- document_keywords:array // list of keywords for the entire document
- paragraph_keywords:array // list of keywords for each paragraph
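
An illustrative result for type='both' (all values invented):

 document_keywords  = ["metadata", "news", "api"]
 paragraph_keywords = [
     ["metadata", "media"],   # paragraph 1
     ["news", "rdfa"],        # paragraph 2
 ]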

Video

URL: /api/video

POST

Inputs

- video_file:file // video file to store on the server
- url:str // url containing the video to store on the server
- ttl:int {D:180} // number of seconds until the file will be removed from the system (0 means indefinitely)

Note: Either video_file or url must be provided

Outputs

- miid:int // unique media item id assigned to this item

GET

Inputs

- miid:int // server-provided media item id to be analyzed
- url:str // url containing the video to be analyzed
- tasks:dictionary // list of tasks to perform
- results:dictionary {D: null} // list of results from past tasks

Note: Either miid or url must be provided

Outputs

- results:dictionary // list of task results (one result object per task)

Tasks

identify_audio_transitions

Identify moments of distinct changes in audio content (e.g. speaker changes).

Powered by [???]

Inputs

None

Outputs

- audio_transitions:array // list of [HH:MM:SS, sound_id] tuples
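
An illustrative audio_transitions value, assuming sound_id groups segments that share the same speaker or sound source:

 audio_transitions = [
     ["00:00:00", 1],
     ["00:01:05", 2],   # new speaker
     ["00:02:30", 1],   # speaker 1 returns
 ]
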
identify_entities

Identify entities (e.g. people, organizations, and locations) found in the video transcript

Powered by [???]

Inputs

None

Outputs

- entities:array // array of [HH:MM:SS, entity, type] tuples from the transcript

identify_faces

Identify faces that appear in the video

Powered by [???]

Inputs

- sample_rate:int {D: 1} // number of frames per second to sample for analysis

Outputs

- faces:array // list of [start HH:MM:SS, end HH:MM:SS, [x, y], miid] tuples
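
An illustrative faces value, assuming [x, y] is the face's position in the frame and miid refers to the matching face image supplied as input:

 faces = [
     ["00:00:10", "00:00:42", [320, 180], 7],
     ["00:01:03", "00:01:15", [102, 220], 7],
 ]
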
identify_keywords

Identify main keywords found in the video, either video-wide or per time segment

Powered by Luminoso

Inputs

- block_size:int {D: 0} // size of the time blocks in seconds (0 means entire video)

Outputs

- video_keywords:array // list of [start HH:MM:SS, [keywords]] tuples for each time block
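
An illustrative video_keywords value for block_size=60, i.e. one keyword list per 60-second block (values invented):

 video_keywords = [
     ["00:00:00", ["interview", "berlin"]],
     ["00:01:00", ["metadata", "api"]],
 ]
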
identify_video_transitions

Identify moments of distinct changes in video content (e.g. scene changes).

Powered by [???]

Inputs

None

Outputs

- video_transitions:array // list of [HH:MM:SS, scene_id] tuples

ocr

Attempt to extract any text that appears on screen in the video.

Powered by [???]

Inputs

- focus_blocks:array {D: null} // list of [x, y, h, w] boxes bounding specific regions to run OCR on
- sample_rate:int {D: 1} // number of frames per second to sample for analysis

Outputs

- ocr_results:array // list of [start HH:MM:SS, end HH:MM:SS, [x, y], string] tuples
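
An illustrative focus_blocks input restricting OCR to a lower-third region, with a matching ocr_results value (all coordinates and text invented):

 focus_blocks = [[0, 560, 100, 1280]]   # one [x, y, h, w] box
 ocr_results = [
     ["00:00:05", "00:00:12", [40, 580],
      "BREAKING: Hackfest opens in Berlin"],
 ]
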
transcribe

Attempt to create a timestamped transcript for the video. The transcript will either be extracted from closed-caption (CC) data or estimated using speech-to-text algorithms.

Powered by [???]

Inputs

None

Outputs

- transcript:array // list of [HH:MM:SS, transcript] tuples
- transcription_method:enum('cc','stt') // method used to generate the transcript
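
An illustrative transcript value (lines invented); transcription_method records whether it came from closed captions or speech-to-text:

 transcript = [
     ["00:00:01", "Welcome to the Berlin hackfest."],
     ["00:00:07", "Today we are talking about metadata."],
 ]
 transcription_method = "cc"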

Audio

URL: /api/audio

Image

URL: /api/image