Abhishek/metabrainz GSoC2016Proposal
Personal Details
- Name : Abhishek Kumar Singh
- IRC : kanha@freenode.net
- Location (Time Zone) : India, IST (UTC+5:30)
- Education : MS in Computer Science and Engineering, IIIT Hyderabad
- Email : abhishekkumarsingh.cse@gmail.com, abhishek.singh@research.iiit.ac.in
- Blog URL : http://abhisheksingh01.wordpress.com/
AcousticBrainz
Abstract
This project implements an audio search system for AcousticBrainz: an Audio Information Retrieval (AIR) system that provides client API(s) for audio search and information retrieval. Search will be based on the annotated text, descriptions, tags and labels associated with a recording, as well as on its low-level and high-level features. The focus is on an AIR system that supports both text-based and content-based search. Users will be able to submit queries through the provided client API(s) and get back a list of relevant recordings.
Benefit to AcousticBrainz community
“Search, and you will find”
- Provides client API(s) for text-based and content-based search. A query can be made by providing some text or by submitting a piece of audio.
- Supports advanced queries that combine content analysis features with other metadata (tags, etc.), including filters and group-by options.
- Supports duplicate detection, which helps in finding duplicate entries in the collection.
- Helps in finding relevant recordings quickly and efficiently.
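As a rough illustration of the combined queries mentioned in the list above, the sketch below builds an Elasticsearch Query DSL body as a plain Python dictionary. The field names (`tags`, `bpm`, `genre`) and the helper `build_query` are hypothetical and assumed only for illustration; the real index mapping would be decided during the project.

```python
import json


def build_query(text, min_bpm, max_bpm, group_by_field):
    """Build a query body: full-text match + numeric filter + group-by.

    All field names used here are hypothetical placeholders.
    """
    return {
        "query": {
            "bool": {
                # Text-based part: match the query text against the tags.
                "must": [{"match": {"tags": text}}],
                # Content-based part: restrict by a numeric audio feature.
                "filter": [
                    {"range": {"bpm": {"gte": min_bpm, "lte": max_bpm}}}
                ],
            }
        },
        # Group-by: count matching documents per value of the given field.
        "aggs": {"by_group": {"terms": {"field": group_by_field}}},
    }


if __name__ == "__main__":
    print(json.dumps(build_query("rock", 100, 140, "genre"), indent=2))
```

Only the dictionary is built here; sending it to a server through a client library is left out so that the sketch stays runnable without a running Elasticsearch instance.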
Project Details
INTRODUCTION
The idea is to implement search functionality for AcousticBrainz. This includes implementing an Audio Information Retrieval System (AIRS) that provides audio content search. It would also simplify the tasks of data exploration and of investigating content-based similarity.
Audio Information Retrieval(AIR) can be:
- Text based
- Content based
- Text based AIR
The query can be any text, say 'rock' or 'beethoven', and the system searches through the text (tags, artist name, description) associated with each recording and returns a list of recordings whose text matches. This is simple to implement but does not help much in audio retrieval, because most of the time a recording does not carry enough annotations.
- Content based AIR
It bridges the semantic gap when text annotations are nonexistent or incomplete. For example, suppose someone walks into a record store knowing only a tune from the song or album they want to buy, but not the title, album, composer or artist. That is a difficult problem for the customer to solve alone. A salesperson with a vast knowledge of music, on the other hand, can identify a tune the customer hums. Content-based audio information retrieval can stand in for that salesperson's experience.
Benefit :
- Given a piece of audio as a query, a content-based AIRS searches the indexed database and returns a list of recordings that are similar to the query.
- With a little tweaking, it can detect duplicates, which helps with copyright issues.
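For the text-based case described earlier in this section, a minimal sketch of the lookup is an inverted index from tokens to recording ids. The sample annotations and the helper names `build_inverted_index` and `search` are assumptions made for illustration; in the project itself this role would be played by the search backend.

```python
from collections import defaultdict


def build_inverted_index(recordings):
    """Map each lowercase token to the set of recording ids containing it."""
    index = defaultdict(set)
    for rec_id, text in recordings.items():
        for token in text.lower().split():
            index[token].add(rec_id)
    return index


def search(index, query):
    """Return the ids whose text contains every token of the query, sorted."""
    tokens = query.lower().split()
    if not tokens:
        return []
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


# Hypothetical annotations: id -> tags / artist / description text.
recordings = {
    "rec1": "rock guitar live",
    "rec2": "beethoven symphony classical",
    "rec3": "classical rock crossover",
}
index = build_inverted_index(recordings)
```

Calling `search(index, "classical rock")` returns only the recordings annotated with both words, which also shows the weakness noted above: a recording with no annotations can never be found this way.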
REQUIREMENTS DURING DEVELOPMENT
- Hardware Requirements
- A modern PC/Laptop
- Graphics Processing Unit (GPU)
- Software Requirements
- Elasticsearch (storage database)
- elasticsearch-py library
- Kibana (data visualization)
- Python 2.7.x/C
- Scikit, Keras
- Coverage
- Pep8
- Pyflakes
- Vim (IDE)
- Sphinx (documentation)
- Python Unit-testing framework (Unittest/Nosetest)
- AcousticBrainz tools
Implementation Details
An audio information retrieval framework consists of some basic processing steps:
- Feature extraction
- Segmentation
- Storing feature vectors(indexing)
- Similarity matching
- Returning similar results
Some proposed models for Content based audio retrieval
- Vector Space Model(VSM)
- Spectrum analysis(Fingerprint) model
- Deep Neural network model (Deep learning approach)
Vector Space Model(VSM)
This is a basic yet effective model. The idea is to represent every audio entity as a vector in the feature space.
Given a query audio, compute its vector representation and then measure the similarity between the query vector and each indexed audio vector. The greater the similarity, the more relevant the result. Cosine similarity can be used to calculate the relevance score. Features could include tempo (BPM), average volume, genre, mood, fingerprint, etc.
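The ranking step of the vector space model can be sketched as follows. The three-dimensional feature vectors are hypothetical and assumed to be numeric and already scaled to a comparable range; categorical features such as genre or mood would first need a numeric encoding.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec, indexed):
    """Return (id, score) pairs sorted by decreasing similarity."""
    scored = [(rec_id, cosine_similarity(query_vec, vec))
              for rec_id, vec in indexed.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical, already normalised features: [tempo, volume, mood score]
indexed = {
    "rec1": [0.9, 0.2, 0.8],
    "rec2": [0.1, 0.9, 0.1],
    "rec3": [0.8, 0.3, 0.7],
}
ranking = rank([0.9, 0.2, 0.8], indexed)
```

Here the query vector equals the vector of `rec1`, so `rec1` ranks first with a score of 1, followed by the nearby `rec3` and then `rec2`.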
Spectrum analysis(Fingerprint) model
This relies on fingerprinting music based on its spectrogram.
Basic steps of processing
- Preprocessing step: fingerprint a comprehensive collection of music and store the fingerprint data in a database (indexing)
- Extract the fingerprint of the music piece (m) used as the query
- Match the fingerprint of m against the indexed fingerprints
- Return the matched music in order of relevance
Advantage
It works on very obscure songs and does so even with extraneous background noise.
Implementation Details
The idea here is to think of a piece of music as a time-frequency graph, also called a spectrogram. This graph has three axes: time on the x-axis, frequency on the y-axis and intensity on the z-axis. A horizontal line represents a continuous pure tone, while a vertical line represents a sudden burst of white noise.
sample wave file: scuse me
"""Spectrogram image generator for a given audio wav sample.

A spectrogram is a visual representation of the spectral frequencies in a
sound: the horizontal (x) axis is time, the vertical (y) axis is frequency
and the color represents intensity.
"""
import wave

import numpy
import pylab


def graph_spectrogram(wav_file):
    """Plot the spectrogram of wav_file and save it as spectrogram.png."""
    sound_info, frame_rate = get_wav_info(wav_file)
    pylab.figure(num=None, figsize=(19, 12))
    pylab.subplot(111)
    pylab.title('spectrogram of %r' % wav_file)
    pylab.specgram(sound_info, Fs=frame_rate)
    pylab.savefig('spectrogram.png')


def get_wav_info(wav_file):
    """Return (samples, frame rate), assuming 16-bit mono PCM audio."""
    wav = wave.open(wav_file, 'rb')
    frames = wav.readframes(-1)
    # numpy.frombuffer replaces the deprecated fromstring(frames, 'Int16').
    sound_info = numpy.frombuffer(frames, dtype=numpy.int16)
    frame_rate = wav.getframerate()
    wav.close()
    return sound_info, frame_rate


if __name__ == '__main__':
    wav_file = 'scuse_me.wav'
    graph_spectrogram(wav_file)
The points of interest are the points of peak intensity. For each of these peak points, the algorithm keeps track of the frequency and of the amount of time elapsed since the beginning of the track. The fingerprint of the song is generated from this information.
Details of algorithm
Read the audio/music data as a normal input stream. This is time-domain data, but we need spectrum analysis rather than raw time-domain data, so we use the Discrete Fourier Transform (DFT) to convert it to the frequency domain. The problem is that in the frequency domain we lose track of time. To overcome this, we divide the data into small chunks (windows) and transform one chunk at a time. Since we know where each window starts, we know the time for each transformed chunk.
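The windowing idea above can be sketched with a naive DFT written in plain Python. This is only an illustration under assumptions: the window size, the non-overlapping chunks and the helper names are not from the proposal, and a real implementation would use an FFT routine instead of this O(n^2) loop.

```python
import cmath
import math


def dft_magnitudes(chunk):
    """Magnitudes of the first half of the DFT of a real-valued chunk."""
    n = len(chunk)
    mags = []
    for k in range(n // 2):
        total = sum(chunk[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(total))
    return mags


def windowed_spectrum(samples, window_size):
    """Split samples into non-overlapping chunks; return (start, mags) pairs.

    `start` is the index of the first sample of the chunk, so the time of
    each spectrum line is start / sample_rate.
    """
    spectrum = []
    for start in range(0, len(samples) - window_size + 1, window_size):
        chunk = samples[start:start + window_size]
        spectrum.append((start, dft_magnitudes(chunk)))
    return spectrum


# Synthetic test signal: a pure tone that completes 5 cycles per window.
WINDOW = 64
signal = [math.sin(2 * math.pi * 5 * t / WINDOW) for t in range(4 * WINDOW)]
spectrum = windowed_spectrum(signal, WINDOW)
```

Each entry of `spectrum` pairs a window's start position with its frequency magnitudes, so the time information is kept alongside the frequency data. For the synthetic tone, the strongest bin in every window is bin 5.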
Applying the Fast Fourier Transform (FFT) to every chunk gives us the frequency content of each window. Our goal is now to find the interest points, also called key music points, store them in a hash table and match them against the indexed database. This is efficient because the average lookup time for a hash table is O(1), i.e. constant. Each line of the spectrum represents the data for one window. We take certain frequency ranges, say 40-80, 80-120, 120-180 and 180-300, and for each window we pick the point with the highest magnitude in each range. These are the interest points, or key points.
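Picking the key points of one window can be sketched as below. The sketch assumes that the ranges from the text are indexes of frequency bins and that each range is half-open (the upper edge belongs to the next range); both are assumptions, as is the helper name `key_points`.

```python
# Band edges taken from the text, interpreted here as frequency bin indexes.
BANDS = [(40, 80), (80, 120), (120, 180), (180, 300)]


def key_points(magnitudes, bands=BANDS):
    """Return the bin with the highest magnitude in each band [low, high)."""
    points = []
    for low, high in bands:
        best_bin = max(range(low, high), key=lambda k: magnitudes[k])
        points.append(best_bin)
    return points


# Synthetic spectrum line for one window, with one obvious peak per band.
magnitudes = [0.0] * 300
for peak_bin in (45, 100, 151, 250):
    magnitudes[peak_bin] = 10.0
points = key_points(magnitudes)
```

For this synthetic window, `points` holds one bin per band, namely the four bins where the peaks were placed.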
The next step is to use these key points for hashing and to generate the index database. The key points obtained from each window are used to generate a hash key. In the dictionary, the hash key maps to a bucket list stored as its value. The bucket list is indexed by songID, and each entry holds the details stored in a KeyPoint object.
class KeyPoint:
    def __init__(self, time, songID):
        self.time = time
        self.songID = songID
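Building the index from such key points might look like the following sketch. It simplifies the bucket list to a flat list of entries per hash key, uses a namedtuple as a stand-in for the KeyPoint class, and takes the tuple of key points itself as the hash key; all three choices, and the sample data, are assumptions made for illustration.

```python
from collections import defaultdict, namedtuple

# Stand-in for the KeyPoint class: same two fields, but immutable.
KeyPoint = namedtuple("KeyPoint", ["time", "songID"])


def hash_key(points):
    """Turn the key points of one window into a hashable dictionary key."""
    return tuple(points)


def build_index(songs):
    """songs: {songID: [(time, points), ...]} -> {hash key: [KeyPoint, ...]}"""
    index = defaultdict(list)
    for song_id, windows in songs.items():
        for time, points in windows:
            index[hash_key(points)].append(KeyPoint(time, song_id))
    return index


# Hypothetical key points per window for two songs.
songs = {
    1: [(0, [45, 100, 151, 250]), (1, [50, 90, 130, 200])],
    2: [(0, [50, 90, 130, 200]), (1, [60, 85, 170, 290])],
}
index = build_index(songs)
```

A hash key that occurs in both songs, such as `(50, 90, 130, 200)` here, ends up with two entries in its bucket, one per song, each remembering the time at which the key was seen.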
Using the technique described above, we can create an index for the audio/music collection. Given a query audio, we compute its fingerprint (key points and time details) and match it against the indexed database to find similar audio. It is important to keep in mind that we match not just the key points but the timing too: the timing must line up. We subtract the current time in our recording (from the query fingerprint) from the time of the hash match (from the indexed KeyPoint objects) and store this difference together with the song ID. This offset tells us where in the song we could possibly be. Once we have gone through all the hashes from our recording, we are left with a lot of song IDs and offsets. The more hashes a song has with matching offsets, the more relevant it is.
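The offset-based matching can be sketched as follows. The index layout (hash key mapped to a list of `(songID, time)` pairs), the string hash keys and the scoring rule (the count of the most common offset per song) are simplifying assumptions for illustration.

```python
from collections import Counter, defaultdict


def match(index, query_windows):
    """Rank songs by the number of hash matches sharing one time offset.

    index: {hash key: [(songID, time), ...]}
    query_windows: [(time, hash key), ...] from the query recording.
    """
    offsets = defaultdict(Counter)
    for query_time, key in query_windows:
        for song_id, song_time in index.get(key, []):
            # The same offset for many hashes means the query lines up
            # with one position in the song.
            offsets[song_id][song_time - query_time] += 1
    scores = {song_id: max(counter.values())
              for song_id, counter in offsets.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical index: song 1 contains keys a, b, c at times 10, 11, 12.
index = {
    "a": [(1, 10), (2, 3)],
    "b": [(1, 11), (2, 40)],
    "c": [(1, 12)],
}
# The query heard a, b, c at times 0, 1, 2 (10 windows into song 1).
query = [(0, "a"), (1, "b"), (2, "c")]
ranking = match(index, query)
```

Song 1 matches all three hashes at the same offset of 10, while song 2 matches two hashes at unrelated offsets, so song 1 ranks first even though both songs share hashes with the query.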
Reference Paper: http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf
Deep Neural Network Model (Deep learning Approach)
This method learns the features of an audio recording automatically. It is an unsupervised approach to learning important features. In a deep neural network, each layer extracts some abstract representation of the audio, so with each computation layer the system learns a new representation. The initial set of inputs is passed through the input layer and computations are done in the hidden layers. After passing through some hidden layers we obtain a new feature representation of the initial input features. These learned feature vectors are the representations of the recordings. We can then run machine learning algorithms on these features for various classification and clustering tasks.
In the feature space, recordings that lie close to each other are similar in some sense.
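One simple instance of unsupervised feature learning is an autoencoder, sketched below in plain Python with a single hidden layer. The proposal does not name a specific architecture, so the autoencoder, the layer sizes, the learning rate and the toy input vectors are all assumptions for illustration; an actual model would be deeper and built with a library such as Keras.

```python
import math
import random


def encode(x, w1, b1):
    """Hidden representation h = tanh(W1 x + b1): the learned features."""
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(w1, b1)]


def decode(h, w2, b2):
    """Linear reconstruction y = W2 h + b2 of the original input."""
    return [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(w2, b2)]


def train_autoencoder(data, hidden=2, epochs=200, lr=0.05, seed=0):
    """Train with plain stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    dim = len(data[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(dim)]
    b1, b2 = [0.0] * hidden, [0.0] * dim
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x in data:
            h = encode(x, w1, b1)
            y = decode(h, w2, b2)
            dy = [yi - xi for yi, xi in zip(y, x)]
            total += 0.5 * sum(d * d for d in dy)
            # Back-propagate through the decoder, then the tanh encoder.
            dh = [sum(w2[i][j] * dy[i] for i in range(dim))
                  for j in range(hidden)]
            dz = [dh[j] * (1.0 - h[j] ** 2) for j in range(hidden)]
            for i in range(dim):
                for j in range(hidden):
                    w2[i][j] -= lr * dy[i] * h[j]
                b2[i] -= lr * dy[i]
            for j in range(hidden):
                for k in range(dim):
                    w1[j][k] -= lr * dz[j] * x[k]
                b1[j] -= lr * dz[j]
        losses.append(total / len(data))
    return (w1, b1, w2, b2), losses


# Hypothetical 4-dimensional input features for four recordings.
data = [[0.9, 0.8, 0.1, 0.2], [0.8, 0.9, 0.2, 0.1],
        [0.1, 0.2, 0.9, 0.8], [0.2, 0.1, 0.8, 0.9]]
params, losses = train_autoencoder(data)
```

After training, `encode(x, params[0], params[1])` gives the 2-dimensional learned representation of a recording, and these vectors can be compared or clustered in place of the original features.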
Advantage
- Saves the effort of engineering audio features by hand (metadata extraction and manual labelling)
- It can be used to find similar audio/music, which is useful in content-based search; combined with user information, it can also be used to recommend songs.
- It can be used to detect duplicate songs. It often happens that two songs end up with different IDs but are in fact the same song.
Time
I will be spending 45 hours per week on this project.
Summer Plans
I am a first-year master's student and my final exams will be over by the end of April 2016. After that we have summer vacation until the end of July. Since I am free until then, I will be spending much of my time on my project.
Motivation
I am the kind of person who always aspires to learn more and explore different areas. I am also an open source enthusiast who likes exploring new technologies. GSoC (Google Summer of Code) provides a very good platform for students like me to learn and showcase their talents by coming up with a cool application by the end of the summer.
Bio
I am a first-year MS student in Computer Science and Engineering at IIIT Hyderabad, India. I am passionate about machine learning, deep learning, information retrieval and text processing, and have a keen interest in open source software. I am also a research assistant at the Search and Information Extraction Lab (SIEL) at my university.
Experiences
My work includes information retrieval for text-based systems. During my course work I built a search engine over the whole Wikipedia corpus from scratch. I have also been using machine learning and have dived into deep learning concepts for representation learning. Currently, I am working on learning efficient representations of nodes in social network graphs.
I have sound knowledge of programming languages, namely Python, Cython, C and C++. I have a good understanding of Python's advanced concepts such as descriptors, decorators, metaclasses, generators and iterators, along with other OOP concepts. I have also contributed to open source organizations such as Mozilla, Fedora, Tor and Ubuntu. As part of my GSoC 2014 project I worked on the Mars exploration project of the Italian Mars Society. It was a virtual reality project in which I implemented a system for full-body and hand-gesture tracking of astronauts. It allows astronauts in the real world to control their avatars in the virtual world through body gestures. Details of this project can be found at:
concept idea : https://wiki.mozilla.org/Abhishek/IMS_Gsoc2014Proposal
IMS repo : https://bitbucket.org/italianmarssociety/
Details of some of my contributions can be found here
Github : https://github.com/AbhishekKumarSingh
Bitbucket : https://bitbucket.org/abhisheksingh
My CV : https://github.com/AbhishekKumarSingh/CV/blob/master/abhishek.pdf