Essentia is an open source C++ library for audio analysis, released under the Affero GPL v3 licence. The package includes a comprehensive set of algorithms for extracting information from audio: audio file input/output functionality, standard digital signal processing (DSP) building blocks, filters, generic algorithms for statistical characterisation, and spectral, temporal, tonal and high-level music descriptors.
Researchers make use of the Essentia environment for experimentation and rapid application development. For analysing a large database of audio tracks, the software enables compilation of optimised extractors to run efficiently on computing clusters. Users who are familiar with MATLAB/Python environments can make use of Python bindings in the package.
What is Essentia about
The software package is a collection of algorithms for feature extraction from audio, wrapped up as a C++ library. In audio analysis, the major design considerations are ease of use, performance, maintainability of the code, accuracy, optimality and availability of algorithms. Essentia has been developed to address all of these requirements.
The design and implementation of the music analysis flow are left entirely to the discretion of the user. The software simply takes care of the implementation details of the algorithms used.
The reason behind development of Essentia as a C++ library is to ensure high performance. Besides this, the object-oriented programming environment in C++ is well-suited for audio-signal processing. Python bindings provided in the package make it well-suited to be used in an interactive development environment. Writing feature extractors is further made easy by the introduction of streaming mode, along with the conventional standard mode.
Further, converting Python code into C++ in the streaming mode is fairly straightforward. Essentia also includes a Vamp plugin that can be used with Sonic Visualiser for visualisation purposes.
Where can Essentia be used
The software has been used extensively in a wide range of audio-signal processing applications. Essentia can be used for semantic autotagging, visualisation of and interaction with music, sound indexing, cover detection and beat detection.
Researchers in neuroimaging use Essentia for the acoustic analysis of stimuli. It can also be used to classify music, especially by mood, and to recommend similar music. The software can also recognise the instruments used in a piece of music.
There is a popular Web-based application called Dunya, which builds on audio processing with Essentia. This application allows the user to interact with a music collection through concepts specific to a music culture, for example, Raaga and Taala in the case of Carnatic music. The application contains a database that includes relevant information on the music collection. The analysis module in Essentia is used to extract features from each audio recording, in order to create meaningful relationships between the items in the database. These relationships are then used by the application to provide better suggestions and recommendations to the user.
Main algorithms bundled in Essentia
The Essentia package is not a software framework as such. Instead, it is a useful set of algorithms for feature extraction, along with some infrastructure for multi-threading and low memory usage. Some of the frequently used algorithms are:
Audio file input/output. The software comes with a variety of audio loaders and writers that read and write nearly all audio formats. An audio loader loads the audio file and returns a stream of stereo samples. A mono loader down-mixes the file to a mono signal and resamples it to the given sample rate. The software also contains algorithms such as AudioWriter and MonoWriter for writing audio files.
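The down-mixing step can be illustrated with a minimal sketch in plain Python. This is not Essentia's loader code: the function name downmix_to_mono is hypothetical, averaging the two channels is an assumed mixing rule, and decoding and resampling are left out entirely.

```python
def downmix_to_mono(stereo_samples):
    """Average each (left, right) pair into a single mono sample.

    Assumption: equal-weight averaging; a real loader may mix differently.
    """
    return [(left + right) / 2.0 for left, right in stereo_samples]


# Three stereo frames, each a (left, right) pair of samples.
stereo = [(0.5, 0.25), (-0.25, 0.25), (1.0, 0.0)]
mono = downmix_to_mono(stereo)
print(mono)  # [0.375, 0.0, 0.5]
```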
Standard signal processing algorithms. Essentia is bundled with the major digital signal processing (DSP) algorithms for basic processing of audio samples. The basic DSP operations of windowing, auto- and cross-correlation, computing the fast Fourier transform (FFT), resampling, removing the DC component, etc., can all be done using the software. We can also conveniently convert complex arrays between polar and Cartesian coordinate systems using this package.
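To make a few of these operations concrete, here is a small sketch using only the Python standard library. It does not call Essentia; all function names are hypothetical, DC removal is approximated by subtracting the mean (a simple stand-in, not necessarily how the library does it), and the window shown is a symmetric Hann window chosen for illustration.

```python
import cmath
import math


def remove_dc(signal):
    """Subtract the mean so the signal is centred on zero."""
    mean = sum(signal) / len(signal)
    return [x - mean for x in signal]


def hann_window(size):
    """Symmetric Hann window of the given length."""
    if size == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (size - 1))
            for n in range(size)]


def autocorrelation(signal):
    """Unnormalised autocorrelation for lags 0 .. len(signal) - 1."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(n)]


def cartesian_to_polar(values):
    """Convert complex numbers to (magnitude, phase) pairs."""
    return [cmath.polar(z) for z in values]


def polar_to_cartesian(pairs):
    """Convert (magnitude, phase) pairs back to complex numbers."""
    return [cmath.rect(r, phi) for r, phi in pairs]


frame = [1.0, 2.0, 3.0, 2.0]
centred = remove_dc(frame)
print(centred)  # [-1.0, 0.0, 1.0, 0.0]

# Apply the window sample by sample before any spectral analysis.
windowed = [x * w for x, w in zip(centred, hann_window(len(centred)))]
print(windowed)

print(autocorrelation(centred))
print(cartesian_to_polar([3 + 4j]))
```

In a typical analysis chain the DC component is removed first, each frame is windowed, and only then is the spectrum computed and converted to magnitude and phase.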
Filters. A good variety of audio-filtering operations can be carried out using Essentia. These include low-pass, high-pass, band-pass and band-reject filtering. We can also apply a filter that approximates an equal-loudness curve. Filtering out the DC component from the signal is also possible.
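As a rough illustration of what low-pass and high-pass filtering do to a signal, the sketch below implements a textbook one-pole low-pass filter and its complementary high-pass in plain Python. The function names and the coefficient formula are assumptions made for this example; they are not taken from Essentia, whose filters may be designed differently.

```python
import math


def one_pole_low_pass(signal, cutoff_hz, sample_rate):
    """First-order low-pass: each output moves a fraction alpha towards the input."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # RC time constant for the cutoff
    alpha = dt / (rc + dt)
    output = []
    previous = 0.0
    for x in signal:
        previous = previous + alpha * (x - previous)
        output.append(previous)
    return output


def complementary_high_pass(signal, cutoff_hz, sample_rate):
    """High-pass obtained by subtracting the low-pass output from the input."""
    low = one_pole_low_pass(signal, cutoff_hz, sample_rate)
    return [x - l for x, l in zip(signal, low)]


# A constant (pure DC) input: the low-pass output settles towards 1.0,
# while the high-pass output decays towards 0.0.
dc_signal = [1.0] * 200
low = one_pole_low_pass(dc_signal, cutoff_hz=1000.0, sample_rate=44100.0)
high = complementary_high_pass(dc_signal, cutoff_hz=1000.0, sample_rate=44100.0)
print(low[0], low[-1])
print(high[0], high[-1])
```

The constant input also shows why a high-pass filter can serve to remove the DC component: whatever does not change over time is attenuated.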
Statistics. When we are dealing with a huge amount of data, it becomes necessary to compute its statistics. It is possible to find the energy of an array of values, as well as its root-mean-square value. We can compute the mean, median, geometric mean and power mean of an array of values. To help characterise probability distributions, algorithms for variance, skewness and kurtosis are also present. We can also obtain a single Gaussian estimate of a given list of arrays.
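The quantities named above have simple definitions, sketched here with the Python standard library rather than with Essentia itself. The function names are hypothetical, energy is taken to be the sum of squares, the moments are population moments, and kurtosis is reported as excess kurtosis (minus 3); these conventions are assumptions and may differ from the library's.

```python
import math
import statistics


def energy(values):
    """Sum of squared values."""
    return sum(x * x for x in values)


def rms(values):
    """Root-mean-square value."""
    return math.sqrt(energy(values) / len(values))


def power_mean(values, power):
    """Generalised mean; power=1 is the arithmetic mean, power=2 the RMS."""
    return (sum(x ** power for x in values) / len(values)) ** (1.0 / power)


def central_moment(values, order):
    """Population central moment of the given order."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** order for x in values) / len(values)


def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


values = [1.0, 2.0, 4.0, 8.0]
print("energy:", energy(values))                    # 85.0
print("rms:", rms(values))
print("mean:", statistics.mean(values))             # 3.75
print("median:", statistics.median(values))         # 3.0
print("geometric mean:", statistics.geometric_mean(values))
print("power mean (p=2):", power_mean(values, 2))
print("variance:", central_moment(values, 2))       # 7.1875
print("skewness:", skewness(values))
print("excess kurtosis:", excess_kurtosis(values))
```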