Essentia.js: a JavaScript library for music and audio analysis on the web

Authors Albin Correya Dmitry Bogdanov Luis Joglar-Ongay Xavier Serra

License CC-BY-4.0

        ESSENTIA.JS: A JAVASCRIPT LIBRARY FOR MUSIC AND AUDIO
                          ANALYSIS ON THE WEB

      Albin Correya 1      Dmitry Bogdanov 1      Luis Joglar-Ongay 1,2      Xavier Serra 1
             1 Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain
                              2 SonoSuite, Barcelona, Spain
                     albin.correya@upf.edu, dmitry.bogdanov@upf.edu



                          ABSTRACT                                      applications arguably makes it one of the most used
                                                                        computer programming languages in recent years [2]. JS is
Open-source software libraries for audio/music analysis                 also widely used as an entry-level programming language
and feature extraction have a significant impact on the de-             by the thinkers from design, art, computer graphics, archi-
velopment of Audio Signal Processing and Music Infor-                   tecture, and spaces in between where audio processing and
mation Retrieval (MIR) systems. Despite the abundance                   analysis can be relevant.
of such tools on the native computing platforms, there is                  With the adoption of both HTML5 and the W3C Web
a lack of an extensive and easy-to-use reference library                Audio API specifications [14], modern web browsers are
for audio feature extraction on the Web. In this paper,                 capable of audio processing, synthesis, and analysis with-
we present Essentia.js, an open-source JavaScript (JS) li-              out any third-party dependencies on proprietary software.
brary for audio and music analysis on both web clients                  This allows developers to move most of the audio process-
and JS-based servers. Along with the Web Audio API, it                  ing code from the server to the client and can provide better
can be used for efficient and robust real-time audio fea-               scalability and deployment, considering that the web-client
ture extraction on the web browsers. Essentia.js is modu-               has computational resources for the required processing.
lar, lightweight, and easy-to-use, deploy, maintain and in-             Web Audio API provides a JS interface to various prede-
tegrate into the existing plethora of JS libraries and Web              fined nodes for audio processing, synthesis, and analysis.
technologies. It is powered by a WebAssembly back-end                   Even though the provided capabilities are limited, the API
of the Essentia C++ library, which facilitates a JS interface           includes the ScriptProcessorNode for developers to add
to a wide range of low-level and high-level audio features.             custom JS code for audio processing. 2 The design of this
It also provides a higher-level JS API and add-on MIR util-             node has been criticized by the audio developer commu-
ity modules along with extensive documentation, usage ex-               nity since it runs the JS audio processing code on the main
amples, and tutorials. We benchmark the proposed library                UI thread, which can result in unreliable performance and
on two popular web browsers, Node.js engine, and An-                    audio glitching [15]. As an alternative, AudioWorklet [10]
droid devices, comparing it to the native performance of                has been introduced to mitigate this design issue to a great
Essentia and the Meyda JS library.                                      extent by allowing custom audio processing code to run
                                                                        on a dedicated audio thread. It also allows bi-directional
                                                                        communication between the audio thread and the control
                    1. INTRODUCTION
                                                                        thread of Web Audio API asynchronously using the web
The Web has become one of the most used computing                       message ports.
platforms with billions of web pages served daily, and JS                  Despite all of these recent developments of Web Audio
being an essential part of building modern web applica-                 technologies, the Audio Signal Processing and MIR com-
tions. Using HTML, CSS, and JS, developers can make                     munities still lack reliable and modular software tools and
dynamic, interactive, and responsive web pages by imple-                libraries that could be easily used for building audio and
menting custom client-side scripts. At the same time, the               music analysis applications for web browsers and JS run-
developers can also use cross-platform run-time engines                 time engines. To the best of our knowledge, Meyda [11]
like Node.js 1 to write server-side code in JS. The flexi-              and JS-Xtract [18] are the few available JS libraries that
bility of using JS on both server and client ends of web                are ready-to-use and have implementations of a limited set
  1
                                                                        of MIR audio features. 3 The lack of more extensive al-
      https://nodejs.org
                                                                        ternatives is not surprising, given that writing a new JS
                                                                        audio analysis library from scratch would require a lot of
              © Albin Correya, Dmitry Bogdanov, Luis Joglar-Ongay,
                                                                        effort. Also, JS code for audio processing is prone to
Xavier Serra. Licensed under a Creative Commons Attribution 4.0         performance issues due to the just-in-time (JIT) compilation
International License (CC BY 4.0). Attribution: Albin Correya, Dmitry   and garbage collection overhead, which can be critical for
Bogdanov, Luis Joglar-Ongay, Xavier Serra. “Essentia.js: A JavaScript
Library for Music and Audio Analysis on the Web”, 21st International      2 https://www.w3.org/TR/webaudio/
Society for Music Information Retrieval Conference, Montréal, Canada,   #scriptprocessornode
2020.                                                                     3 As of May 2020, Meyda only has 20 MIR algorithms.
real-time audio and music analysis tasks. Due to these rea-
sons, researchers and developers often rely on server-side
audio processing solutions using the existing native MIR
tools for writing the required web applications.
   Over the last two decades, the existing software tools
for audio analysis have been mostly written in C/C++, Java
and Python and delivered as standalone applications, host
application plug-ins, or as software library packages. Soft-
ware libraries with a Python API, such as Essentia [7], Li-
brosa [23], Madmom [6], Yaafe [22] and Aubio [8], have
been especially popular within the MIR community due to
rapid prototyping and rich environment for scientific com-       Figure 1: Overview of the Essentia.js library in terms of
putations. Meanwhile, the libraries with a native C/C++          its abstraction levels.
back-end offered faster analysis [24] and were often re-
quired for writing industrial audio applications. Various
web applications for audio processing and analysis have          machine learning frameworks.
been developed using these tools. Spotify API 4 (formerly            In [24], the authors evaluated a wide range of MIR soft-
Echonest), Freesound API [13] and AcousticBrainz [25]            ware libraries in terms of coverage, effort, presentation,
are examples of web services providing precomputed au-           time-lag and found Essentia 7 [7] to be an overall best per-
dio features to the end users via a REST API. Besides, nu-       former with respect to these criteria. Essentia is an open-
merous web applications were built for aiding tasks such         source library for audio and music analysis released under
as crowd sourcing audio annotations [9, 12], audio listen-       the AGPLv3 license providing a wide range of optimized
ing tests [19, 26] and music education platforms [1, 21] to      algorithms (over 250 algorithms) that are successfully used
mention a few. All these services manage their audio             in various academic and industrial large-scale applications.
analysis on the server, which may require a significant effort   Essentia includes both low-level and high-level audio
and resources to scale to many users.                            features, along with some ready-to-use feature extractors.
   With the recent arrival of WebAssembly (WASM)                 It also provides an object-oriented interface to fine-tune
support on most modern web browsers [16], one can                each algorithm in detail. Given all these advantages and
safely port the existing C/C++ audio processing and anal-        that the code repository is consistently maintained by its
ysis code into the Web Audio ecosystem using com-                developers, it is a good potential choice for porting into
piler toolchains such as Emscripten. 5 WASM is a low-            WASM target for the web platform.
level assembly-like language with a compact binary format            In this paper, we present Essentia.js, an open-source JS
that runs with near-native performance on modern web             library for audio and music analysis on the web, released
browsers or any WebAssembly-based stacks without com-            under the AGPLv3 license. It allows building audio analy-
promising security, portability and load time. WASM code         sis and MIR applications for web browsers and JS engines
was found to be faster than JS code [17], generates no           such as Node.js. It provides straightforward integration
garbage, and can run within an AudioWorkletProcessor. 6          with the latest W3C Web Audio API specification,
This makes it an ideal solution to                               allowing efficient real-time audio feature extraction on the web
the problems that were previously hindering us from build-       browsers. Web applications written using the proposed li-
ing efficient and reliable JS MIR libraries for the web plat-    brary can also be cross-compiled to native targets such as
form [20]. However, taking full advantage of all these fea-      for PCs, smartphones, and IoT devices using the JS frame-
tures can be challenging because it requires understand-         works like Electron 8 and React Native. 9
ing concurrent programming wrapped with several JS APIs              The rest of the paper is organized as follows. Section 2
and dealing with cross-compilation and linking of the na-        outlines the design choices, software architecture and var-
tive code. Even for experienced developers, compiling na-        ious components of Essentia.js. An overview of potential
tive code to WASM targets, generating JS bindings, and in-       use-cases and usage examples are detailed in Section 3.
tegrating them in a regular JS processing code pipeline can      A detailed benchmarking of Essentia.js across and against
be cumbersome. Hence, it is essential that an ideal JS MIR       various platforms and alternative JS libraries can be found in
software library should encapsulate and abstract all these       Section 4. Finally, we conclude in Section 5.
steps and be packaged as a compact build which is easy-to-
use and extendable using a high-level JS API. Besides the                            2. ESSENTIA.JS
JS API, the potential users might also need utility tools that
are often required while building MIR-based projects, such       Essentia.js is much more than just JS bindings to the Es-
as plotting audio features on an HTML page, ready-to-use         sentia C++ library. It was developed with coherent design
feature extractors, and possible integration with web-based      and functional objectives that are necessary for building an
                                                                 efficient and easy-to-use MIR library for the Web. In this
  4 https://developer.spotify.com/documentation
  5                                                                7 https://essentia.upf.edu
    https://emscripten.org
  6 https://www.w3.org/TR/webaudio/                                8 https://www.electronjs.org
#audioworkletprocessor                                             9 https://reactnative.dev
section, we discuss our design choices, the architecture,       the majority of the algorithms in Essentia, 14 while the
and various components of Essentia.js. Figure 1 shows an        few excluded algorithms can be still integrated into the
overview of these components.                                   WASM backend by compiling and linking with the re-
                                                                quired third-party dependencies using our build tools
2.1 Design and Functionality                                    (Section 2.5). Besides, all the JS code in the library is
                                                                passed through a code compression process to achieve
We chose the following goals and design decisions for de-       lightweight builds for the web browsers. With all these
veloping the library:                                           efforts we were able to achieve builds of Essentia.js, in-
                                                                cluding the WASM backend and the JS API, as small as
• User-friendly API and utility tools. The Web is a ubiq-
                                                                approximately 2.5 MB. We also provide tools for custom
  uitous computing platform, and an ideal JS MIR library
                                                                lightweight builds of the library that only include a sub-
  should provide a simple, user-friendly API while being
                                                                set of the selected algorithms to further reduce the build
  highly configurable for advanced use cases. Essentia.js
                                                                size (Section 2.5).
  ships with a simple JS API and add-on utility modules
  with a fast learning curve for new users. The main JS       • Reproducibility. Using the WASM backend of Essen-
  API is composed of a singleton class with methods im-         tia ensures identical analysis results across various de-
  plementing most of the functionality (each method is an       vices and native platforms on which Essentia has been
  algorithm in Essentia). These methods are automatically       previously extensively used and tested. Remarkably, Es-
  generated from the upstream C++ code and documenta-           sentia.js allows reproducing a large amount of existing
  tion using code templates and scripting as described in       code and research based on Essentia as well as, to a cer-
  Sections 2.2 and 2.3. We also provide add-on modules          tain extent, other libraries. In particular, it is possible to
  with helper classes for feature extraction and visualisa-     use Essentia.js to reproduce common input audio rep-
  tion that can be used for rapid prototyping of web appli-     resentations for the existing machine learning models,
  cations. A quick preview of the JS API can be seen in         enabling their usage in web applications.
  Listing 2.
                                                              • Easy installation. Essentia.js is easy to install and
• Modularity and extensibility. Just like Essentia itself,      integrate with new or existing web projects. It is available
  the Essentia.js codebase is modular by design. It             both as a package on NPM 15 and as static builds on
  contains a large number of configurable algorithms that       our public GitHub repository. In addition, we provide
  can be arranged into the desired audio processing chains.     content delivery network (CDN) links through open
  The add-on utility modules shipped with the library           source web services.
  leverage this functionality to build custom feature ex-
  tractors.                                                   • Extensive documentation. We provide a complete API
                                                                reference together with the instructions to get started,
• Web standards compliance. Web browsers provide a              tutorials, and interactive web application examples. 16
  standard set of tools for loading and processing au-          The documentation is built automatically using JS-
  dio files using the HTML5 Audio element 10 and the            doc 17 and the algorithm reference is generated from
  Web Audio API. Essentia.js relies on these standard           the upstream Essentia C++ documentation using Python
  features for loading audio files or for streaming real-time   scripts.
  audio from various device sources. It also provides
  seamless integration with the latest tools in the Web       2.2 Essentia WASM backend
  Audio ecosystem such as AudioWorklets, Web Work-
  ers, 11 WASM and SharedArrayBuffer. In addition, JS         As already mentioned, the core of the library is powered
  conforms to the ECMAScript specification 12 and it is       by the Essentia WASM backend. It contains a lightweight
  evolving fast with new features added to the language       WASM build of Essentia C++ library along with custom
  every year. For backward and forward compatibility of       JS bindings for using it in JS. This backend is generated in
  our JS code, we wrote our JS API using Typescript (Sec-     multiple steps.
  tion 2.3).                                                     Firstly, the Essentia C++ library is compiled to LLVM
                                                              assembly 18 using Emscripten. Emscripten [28] is a LLVM
• Lightweight and few dependencies. It is important for       to JS compiler which provides a wide range of tools for
  a JS library to be lightweight since the load times of JS   compiling the C/C++ code-base or the intermediate LLVM
  code can directly impact the UI/UX and performance of       builds into various targets such as asm.js 19 and WASM.
  web applications. This includes having fewer                Secondly, we need a custom C++ interface in order to
  dependencies, which also makes the library much more        generate the corresponding JS bindings, which would allow us
  maintainable. Taking this into account, Essentia WASM backend   to access the algorithms in Essentia on the JS side. We used
  is built without any third-party software dependencies of
  the Essentia library except for Kiss FFT. 13 It includes     14 As of May 2020, over 200 algorithms are supported.
                                                               15 https://www.npmjs.com
 10 https://www.w3.org/html/wiki/Elements/audio                16 https://mtg.github.io/essentia.js
 11 https://w3c.github.io/workers/                             17 https://jsdoc.app
 12 http://ecma-international.org/ecma-262                     18 https://llvm.org
 13 https://github.com/mborgerding/kissfft                     19 http://asmjs.org
Embind [4] for generating this C++ interface that binds Es-
sentia native code to JS.
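
One practical consequence of Embind-generated bindings is that bound C++ objects live in the WASM heap and are not reclaimed by the JS garbage collector, so they have to be released explicitly with delete(), as Listing 1 does for the Essentia instance. The following is a minimal sketch of a helper that guarantees this release; `withHandle` and `FakeHandle` are hypothetical names, and `FakeHandle` only stands in for a bound object so that the example runs without the library.

```javascript
// Hypothetical stand-in for an Embind-bound C++ object: like a real
// binding, it must be released explicitly with delete().
class FakeHandle {
  constructor() {
    this.deleted = false;
  }
  compute(x) {
    if (this.deleted) throw new Error("handle already deleted");
    return x * 2;
  }
  delete() {
    this.deleted = true;
  }
}

// Runs `fn` with a freshly created handle and always deletes the handle
// afterwards, even if `fn` throws.
function withHandle(createHandle, fn) {
  const handle = createHandle();
  try {
    return fn(handle);
  } finally {
    handle.delete();
  }
}

const doubled = withHandle(() => new FakeHandle(), (h) => h.compute(21));
console.log(doubled); // 42
```

The try/finally form keeps the release next to the allocation, which is easy to forget when several bound objects are created inside one processing function.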
    Writing custom JS bindings for all Essentia algorithms
by hand would be tedious given how many there are. Hence, we
created Python scripts that automatically generate the
required C++ wrapper code from the upstream library's Python
bindings. These scripts make it possible to configure which
algorithms to include in the bindings by name and category.
It is therefore possible to create extremely lightweight
builds of the library with only a
few specific algorithms required for a particular applica-
tion. The Essentia WASM backend is built by compiling
the generated wrapper C++ code and linking with the pre-
compiled Essentia LLVM using Emscripten.
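
The selection of algorithms by name and category can be pictured as a simple filter over an algorithm catalogue. The sketch below is written in JavaScript for consistency with the other examples, whereas the actual build scripts are in Python; the catalogue layout, the category labels, the `selectAlgorithms` function and its option names are all illustrative assumptions, not the project's real configuration format.

```javascript
// Illustrative catalogue. The names are Essentia algorithms mentioned in
// this paper's context; the category labels and layout are assumptions.
const catalogue = [
  { name: "Windowing", category: "Standard" },
  { name: "Spectrum", category: "Standard" },
  { name: "MFCC", category: "Spectral" },
  { name: "PitchYinProbabilistic", category: "Pitch" },
  { name: "SuperFluxExtractor", category: "Rhythm" },
];

// Returns the names of the algorithms that bindings should be generated for.
function selectAlgorithms(all, { includeNames = [], includeCategories = [] } = {}) {
  const names = new Set(includeNames);
  const categories = new Set(includeCategories);
  // With no filters given, keep everything (a full build).
  if (names.size === 0 && categories.size === 0) {
    return all.map((algo) => algo.name);
  }
  return all
    .filter((algo) => names.has(algo.name) || categories.has(algo.category))
    .map((algo) => algo.name);
}

const selected = selectAlgorithms(catalogue, {
  includeNames: ["MFCC"],
  includeCategories: ["Pitch"],
});
console.log(selected); // [ 'MFCC', 'PitchYinProbabilistic' ]
```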
    The Essentia WASM backend provides compact WASM              Figure 2: Screenshot of an example web application that
binary files along with the JS bindings to over 200 Essentia     uses Essentia.js and its add-on modules.
algorithms. We provide these binaries and a JS glue code
for both asynchronous and synchronous import of Essen-           • essentia.js-plot provides helper functions for visualiza-
tia WASM backend, covering a wide range of use cases.              tion of MIR features, allowing both real-time and offline
The build for asynchronous import can be instantly loaded          plotting in a given HTML element. It uses the Plotly.js
into an HTML page. The synchronous-import build                    data visualization library, which has a design layout and
supports the new ES6 style class imports characteristic of         functionality much like those of Matplotlib, 20 and is
modern JS libraries. This build can also be seamlessly             easily configurable. Currently, we provide object-oriented
integrated with AudioWorklet and Web Workers for better            classes for plotting basic MIR features like melody/pitch
performance demanding web applications.                            contours, spectrograms, chroma, MFCC, etc. The mod-
2.3 High-level JS API                                              ule is functionally similar to the display module in Li-
                                                                   brosa [23].
Even though it is possible to use the Essentia WASM back-
end with its bindings directly, they have limitations due to        A full reference of the modules can be found in the doc-
the specifics of using Embind with Essentia: a user needs        umentation of the library. Both modules can be easily ex-
to manually specify all parameter values for the algorithms      tended with more functionalities as per the demands of the
because the default values are not supported. This is cum-       user community.
bersome and to solve this issue we developed a high-level
                                                                 2.5 Build and packaging tools
JS API written using Typescript [5]. Typescript is a typed
superset of JS that can be compiled to various ECMA targets      We provide tools for custom builds and packaging of
of JS. In addition, this gives us the benefit of having a        Essentia.js for developers and end users:
typed JS API, which can provide better exception handling.       • Command-line interface (CLI). We provide a CLI for
Again, we used custom Python scripts and code templates            building Essentia.js using customised shell scripts.
to automatically generate the Typescript wrapper in a sim-
ilar way to the C++ wrapper for the WASM backend. The            • Docker. We provide a Docker image with static builds
high-level JS API of Essentia.js provides a singleton class        of Essentia with Emscripten and other development
Essentia with all the algorithms and helper functions              dependencies required for building Essentia.js.
encapsulated as its methods. All the algorithm methods are       • Web application. We also host a website for building
configurable, similarly to Essentia’s C++/Python API               custom Essentia.js builds online. 21 It allows users to select
itself. Listing 1 shows an example of using the high-level         a specific set of algorithms and build settings.
API of Essentia.js.
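
The default-parameter limitation described above can be illustrated with a small wrapper. In this sketch `lowLevelPitch` is a hypothetical stand-in for a generated binding that requires every argument, and the default values are simply those passed in Listing 1, not necessarily the library's own defaults.

```javascript
// Hypothetical low-level binding: every parameter must be passed explicitly.
// It only counts the full frames that fit into the signal.
function lowLevelPitch(signal, frameSize, hopSize) {
  if (frameSize === undefined || hopSize === undefined) {
    throw new TypeError("all parameters must be specified");
  }
  const frames = Math.floor((signal.length - frameSize) / hopSize) + 1;
  return { frames: Math.max(0, frames) };
}

// High-level wrapper: default parameter values keep the common call short
// while still allowing each value to be overridden.
function pitch(signal, frameSize = 4096, hopSize = 256) {
  return lowLevelPitch(signal, frameSize, hopSize);
}

const signal = new Float32Array(8192);
console.log(pitch(signal).frames); // 17
console.log(pitch(signal, 2048, 512).frames); // 13
```

Generating such wrappers from templates, rather than writing them by hand, keeps the defaults in step with the upstream algorithm definitions.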
                                                                   The official Essentia.js builds are bundled using
2.4 Add-on utility modules                                       Rollup 22 and then packaged and hosted using NPM.
Essentia.js is shipped with a few add-on modules to facil-
itate common MIR tasks. These add-on modules are writ-                          3. GETTING STARTED
ten entirely in Typescript using the Essentia.js high-level      In this section, we outline several usage examples and ap-
JS API. Currently, we provide two add-on modules:                plication scenarios for getting started with Essentia.js.
                                                                    The library can be imported into a web application us-
• essentia.js-extractor contains predefined feature ex-
                                                                 ing the following methods:
  tractors for common MIR tasks, computing BPM, beat
  positions, pitch, predominant melody, key, chords,             • HTML <script> tag. The most simple way to use Es-
  chroma features, MFCC, etc. Each extractor implements            sentia.js is by using it with the HTML <script> tag.
  the entire processing chain starting from audio input and       20 https://matplotlib.org
  outputs the resulting track-level or frame-level features.      21 https://mtg.github.io/essentia.js-builder
  These extractors are configurable as well.                      22 https://rollupjs.org
• NPM. Essentia.js can also be installed from NPM using
  the command npm install essentia.js.

• ES6 class imports. Essentia.js can be imported using
  ES6 class style imports in JS. This makes it possible to
  integrate the library into any modern JS framework. Listing
  1 shows an example of using ES6 style imports for an
  offline feature extraction task.

• CDN. We also provide CDN links for instantly serving
  Essentia.js online using free third-party web services
  such as Jsdelivr 23 and Unpkg. 24

    Many potential web applications can be built with
Essentia.js. The library provides algorithms for typical sound
and music analysis tasks, including spectral, tonal, and
rhythmic characterization. In particular, it is suitable for
onset detection, beat tracking and tempo estimation, melody
extraction, key and chord estimation, sound and music
classification, cover song similarity, loudness metering, and
audio problem detection, among others. Figure 2 shows a
screenshot of an example web application that we include
with the library. Below we outline some of the common
application use cases of the library. We provide an extensive
collection of analysis examples on our website. 25

3.1 Offline feature extraction

    // Imports Essentia WASM backend
    import {EssentiaWASM} from 'essentia-wasm.module.js';
    // Imports Essentia.js core API
    import Essentia from 'essentia.js-core.es.js';

    // Creates Essentia.js instance
    const essentia = new Essentia(EssentiaWASM);

    // Instance of Web Audio API AudioContext
    const audioContext = new AudioContext();
    // URL of an audio file
    let audioURL = "https://freesound.org/data/previews/328/328857_230356-lq.mp3";
    // Decode audio file as Float32 typed array
    const audioData = await essentia.getAudioChannelDataFromURL(
        audioURL, audioContext, 0); // audioContext, channel number

    // Convert audioData array into vector
    const audioVector = essentia.arrayToVector(audioData);

    // Onset detection with SuperFluxExtractor algorithm
    let bt = essentia.SuperFluxExtractor(audioVector);
    console.log(bt.onsets);

    // Pitch estimation with PitchYinProbabilistic algorithm
    let pyYin = essentia.PitchYinProbabilistic(audioVector,
        4096, 256); // frameSize, hopSize
    console.log(pyYin.pitch);

    // Shutdown Essentia.js instance and free memory
    essentia.shutdown();
    essentia.delete();

Listing 1: A simple example of offline audio feature
extraction using Essentia.js via ES6 style imports.
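
Listing 1 passes the whole signal to each algorithm, with frameSize and hopSize given as parameters. As a rough illustration of what cutting a signal into overlapping frames means, here is a plain-JavaScript sketch; `cutFrames` is a hypothetical helper, not the library's FrameGenerator, and it simply drops a trailing partial frame instead of padding it.

```javascript
// Cuts `signal` into frames of `frameSize` samples, advancing by `hopSize`
// samples each time. Consecutive frames overlap when hopSize < frameSize.
function cutFrames(signal, frameSize, hopSize) {
  const frames = [];
  for (let start = 0; start + frameSize <= signal.length; start += hopSize) {
    // subarray returns a view on the same buffer, so no samples are copied.
    frames.push(signal.subarray(start, start + frameSize));
  }
  return frames;
}

// Ten samples with values 0..9, frames of 4 samples, hop of 2 samples.
const signal = Float32Array.from({ length: 10 }, (_, i) => i);
const frames = cutFrames(signal, 4, 2);
console.log(frames.length); // 4
console.log(Array.from(frames[1])); // [ 2, 3, 4, 5 ]
```

Each frame could then be handed to a frame-level algorithm in turn, which is the role the processing chain plays in the paragraph that follows.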

Many MIR use cases rely on an offline audio analysis and
feature extraction. Listing 1 shows a simple JS example
                                                                 using AudioWorkletNode. 26 Currently, the only
of using the library for offline analysis of pitch and
                                                                 limitation is that it is only supported in the latest Firefox and
onsets. For features computed on overlapping frames,
                                                                 Chromium-based web browsers.
Essentia.js provides the FrameGenerator method, similarly to
Essentia’s Python API. Frames generated by this method
can be used as an input to other algorithms in the process-    3.3 Machine learning applications
ing chain. The offline feature extraction can be run inside
                                                               In recent years, machine learning (ML) techniques,
a Web Worker to improve the efficiency in performance-
                                                               especially deep learning, have been widely used in a wide
demanding web applications.
                                                               range of MIR tasks. With the support of WebGL and
                                                               WASM, modern web browsers are also capable of running
3.2 Real-time feature extraction
                                                               ML applications. Essentia.js can be easily integrated with
Essentia.js can be used for efficient real-time audio/music    popular JS ML frameworks such as TensorFlow.js [27] and
analysis in web browsers along with the Web Audio API.         Onnx.js 27 for training and inference. Pre-trained audio
This can be done by using the ScriptProcessorNode or the       ML models using features computed with Essentia as an
newly introduced AudioWorklet in the Web Audio API:            input (e.g., mel-spectrograms, Constant-Q transform, or
                                                               chroma) can be easily ported and used for inference in
• ScriptProcessorNode allows users to provide custom           web browsers. In particular, Essentia now ships with a
  JS code for audio feature extraction in its onaudioprocess   collection of pre-trained TensorFlow models for music
  callback. Even though the ScriptProcessorNode is             audio tagging and classification [3]. These models can be
  deprecated according to the current W3C Web Audio API        run for inference using the Essentia.js and TensorFlow.js
  specification, it is still widely used by developers         libraries. Our essentia.js-extractor add-on module provides
  because of its cross-browser support.                        a mel-spectrogram extractor for computing the inputs to
• AudioWorklet design pattern [10] allows users to write       these models.
  high-performance real-time audio analysis on a dedi-
  cated audio thread. Users can write custom analysis                             4. BENCHMARK
  code by extending the AudioWorkletProcessor and fur-         We tested the performance of Essentia.js in terms of the
  ther abstract it as a node in the Web Audio API graph        analysis time for common MIR audio features on various
 23 https://www.jsdelivr.com                                    26 https://www.w3.org/TR/webaudio/
 24 https://unpkg.com                                          #audioworkletnode
 25 https://mtg.github.io/essentia.js/examples                  27 https://github.com/Microsoft/onnxjs
Figure 3: Average analysis times (in seconds) for common audio features on a 5-second music clip, measured with
(a) Essentia.js and (b) Meyda. "Python (Linux)" corresponds to the analysis baseline using native Essentia with
Python bindings.


platforms, and compared it to the native Essentia library. In addition, we
measured the analysis times for features available in Meyda and compared them
to their Essentia.js counterparts. To this end, we built a set of test suites
using the JS library benchmark.js and implemented the equivalent features
using both libraries. In our benchmark we measure the time it takes for the
entire processing chain to compute a feature given a 5-second audio segment
as an input. The code used by Essentia.js is equivalent to the one for
Essentia used in Python. The benchmarking of the Python implementation was
done using the library pytest with the benchmark extension. We provide the
code and website to reproduce these experiments online.²⁸

   The results are reported in Figure 3. They include tests on five different
environments:

• Linux with Chrome 84.0.4147.89 run with extensions disabled.
• Linux with Firefox 78.0.2 in private browsing mode.
• Android 9 (LineageOS 16) with Chrome 84.0.4147.89 in incognito mode.
• Android 9 (LineageOS 16) with Firefox Nightly 200727 06:00.
• Linux with Node.js v.13.13.0.

   The Linux computer used for all runs is a 2017 DELL XPS-15 with a
2.80GHz x 8 Intel Core i7-7700HQ processor, 16GB of RAM and Ubuntu 19.04 as
OS. The mobile phone is a Xiaomi Redmi Note 7 Pro with a Snapdragon Octa-core
1.7 GHz processor and 6GB RAM. All the tests were done with the same 5-second
audio file.
   Figure 3 shows that Essentia.js algorithms ran considerably slower on
Node.js and on Firefox and Chrome for Android than on Firefox and Chrome for
Linux. Interestingly, Node.js performed worse than Firefox and Chrome on
Android, which was not expected. This is probably because vendors implement
WASM support slightly differently on their platforms, or for other reasons
yet to be found. In addition, WASM is a relatively new technology in active
development.²⁹ Many proposals for improving its performance, such as SIMD
optimizations and multi-thread support, are under way. We also aim to do
detailed benchmarking of real-time use cases and of usage with the Web Audio
API AudioWorklets in our future work.

                          5. CONCLUSIONS

We have presented Essentia.js, an open-source JavaScript library for
music/audio analysis on the Web. It is based on the Essentia C++ library,
which is commonly used in MIR, is ported to JS via WASM, and is compatible
with the latest technologies in the Web Audio ecosystem. To the best of our
knowledge, this is the most comprehensive library for audio analysis and MIR
that can be run in web browsers as well as JS server applications. We hope
that the library will contribute to the creation of new online music
technology tools in educational, industrial, and creative contexts. The
source code of the library is publicly available in our GitHub repository.³⁰
Everyone is encouraged to contribute to the library.
   In our future work, we will focus on improving the performance of the
library along with expanding the add-on modules, particularly on providing
pre-trained audio ML models for audio analysis, classification, and synthesis
on the web client. We also aim to develop interesting web applications that
go beyond typical MIR tasks to attract and build a diverse user community.
Detailed information about the library is available at the official web
page.³¹ It contains the complete documentation, usage examples and tutorials
needed to get started.

 28 https://mtg.github.io/essentia.js-benchmarks
 29 https://webassembly.org/roadmap/
 30 https://github.com/MTG/essentia.js
 31 https://essentia.upf.edu/essentiajs
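As an illustration of the ScriptProcessorNode approach described in Section 3.2, the following is a minimal sketch rather than code taken from the library. It assumes a plain-JS RMS function as a stand-in for an Essentia.js algorithm call; the names computeRms, attachScriptProcessor, and onFeature are hypothetical. Only standard Web Audio API calls (createScriptProcessor, onaudioprocess, getChannelData) are used for the wiring.

```javascript
// Stand-in feature: root-mean-square energy of one audio block.
// In an Essentia.js application, an algorithm call would take its place.
function computeRms(frame) {
  let sumOfSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    sumOfSquares += frame[i] * frame[i];
  }
  return frame.length > 0 ? Math.sqrt(sumOfSquares / frame.length) : 0;
}

// Browser-side wiring: run the extractor on every block delivered by a
// (deprecated) ScriptProcessorNode. `onFeature` is a hypothetical callback
// that receives one feature value per block.
function attachScriptProcessor(audioContext, sourceNode, onFeature, bufferSize = 2048) {
  // Arguments: buffer size, number of input channels, number of output channels.
  const processor = audioContext.createScriptProcessor(bufferSize, 1, 1);
  processor.onaudioprocess = (event) => {
    const input = event.inputBuffer.getChannelData(0); // Float32Array, first channel
    onFeature(computeRms(input));
  };
  sourceNode.connect(processor);
  // Connecting the node to the destination keeps it processing in most browsers.
  processor.connect(audioContext.destination);
  return processor;
}
```

In a browser, attachScriptProcessor would typically be called with an AudioContext and a source node such as one created by createMediaStreamSource. With an AudioWorklet, the same per-block function would instead be called from the process() method of an AudioWorkletProcessor subclass.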
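The measurement described in Section 4 (timing an entire processing chain on a 5-second segment and averaging over runs) can be sketched as follows. This is not the authors' test suite and does not use benchmark.js: it is a simplified stand-in built on a plain timer. The processing chain here is a hypothetical frame-wise RMS written in plain JS, and the input is a synthetic sine wave rather than a music clip. The function names, frame size, hop size, and number of runs are all assumptions.

```javascript
// Hypothetical processing chain: frame-wise RMS over a whole signal.
function frameWiseRms(signal, frameSize = 2048, hopSize = 1024) {
  const values = [];
  for (let start = 0; start + frameSize <= signal.length; start += hopSize) {
    let sumOfSquares = 0;
    for (let i = start; i < start + frameSize; i++) {
      sumOfSquares += signal[i] * signal[i];
    }
    values.push(Math.sqrt(sumOfSquares / frameSize));
  }
  return values;
}

// Time `fn` over several runs and return the mean duration in seconds.
function meanAnalysisTime(fn, runs = 10) {
  // Use the high-resolution timer when available, otherwise fall back to Date.
  const now = typeof performance !== "undefined"
    ? () => performance.now()
    : () => Date.now();
  fn(); // warm-up run, excluded from the average
  let totalMs = 0;
  for (let r = 0; r < runs; r++) {
    const t0 = now();
    fn();
    totalMs += now() - t0;
  }
  return totalMs / runs / 1000;
}

// Synthetic 5-second input: a 440 Hz sine sampled at 44.1 kHz.
const sampleRate = 44100;
const signal = new Float32Array(5 * sampleRate);
for (let n = 0; n < signal.length; n++) {
  signal[n] = Math.sin((2 * Math.PI * 440 * n) / sampleRate);
}

const seconds = meanAnalysisTime(() => frameWiseRms(signal));
console.log(`mean analysis time: ${seconds.toFixed(6)} s`);
```

A dedicated library such as benchmark.js additionally handles sample-size selection and statistical reporting, which this sketch leaves out.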
                    6. REFERENCES

 [1] MusicCritic: An automatic assessment system for musical exercises.
     https://musiccritic.upf.edu. Accessed: 2020-09-04.

 [2] Stack Overflow Annual Developer Survey.
     https://insights.stackoverflow.com/survey. Accessed: 2020-15-04.

 [3] Pablo Alonso-Jiménez, Dmitry Bogdanov, Jordi Pons, and Xavier Serra.
     Tensorflow audio models in Essentia. In IEEE International Conference on
     Acoustics, Speech and Signal Processing (ICASSP 2020), pages 266–270,
     2020.

 [4] Chad Austin. CppCon 2014: Embind and Emscripten: Blending C++11,
     JavaScript, and the Web Browser.
     https://www.youtube.com/watch?v=Dsgws5zJiwk. Accessed: 2020-15-04.

 [5] Gavin Bierman, Martín Abadi, and Mads Torgersen. Understanding
     typescript. In European Conference on Object-Oriented Programming
     (ECOOP 2014), pages 257–281. Springer, 2014.

 [6] Sebastian Böck, Filip Korzeniowski, Jan Schlüter, Florian Krebs, and
     Gerhard Widmer. Madmom: A new python audio and music signal processing
     library. In ACM International Conference on Multimedia (MM 2016), pages
     1174–1178, 2016.

 [7] Dmitry Bogdanov, Nicolas Wack, Emilia Gómez, Sankalp Gulati, Perfecto
     Herrera, Oscar Mayor, Gerard Roma, Justin Salamon, José Zapata, and
     Xavier Serra. Essentia: An audio analysis library for music information
     retrieval. In International Society for Music Information Retrieval
     Conference (ISMIR 2013), pages 493–498, 2013.

 [8] Paul M Brossier. The aubio library at MIREX 2006. In Music Information
     Retrieval Evaluation Exchange (MIREX 2006), 2006.

 [9] Mark Cartwright, Ayanna Seals, Justin Salamon, Alex Williams, Stefanie
     Mikloska, Duncan MacConnell, Edith Law, Juan P Bello, and Oded Nov.
     Seeing sound: Investigating the effects of visualizations and complexity
     on crowdsourced audio annotations. Proceedings of the ACM on
     Human-Computer Interaction, 1(CSCW):1–21, 2017.

[10] Hongchan Choi. AudioWorklet: The future of web audio. Ann Arbor, MI:
     Michigan Publishing, University of Michigan Library, 2018.

[11] Jakub Fiala, Nevo Segal, and Hugh A. Rawlinson. Meyda: an audio feature
     extraction library for the Web Audio API. In Web Audio Conference
     (WAC 2015), pages 1–6, 2015.

[12] Eduardo Fonseca, Jordi Pons Puig, Xavier Favory, Frederic Font Corbera,
     Dmitry Bogdanov, Andres Ferraro, Sergio Oramas, Alastair Porter, and
     Xavier Serra. Freesound datasets: a platform for the creation of open
     audio datasets. In International Society for Music Information Retrieval
     Conference (ISMIR 2017), pages 486–493, 2017.

[13] Frederic Font, Gerard Roma, and Xavier Serra. Freesound technical demo.
     In ACM International Conference on Multimedia (MM 2013), pages 411–412,
     New York, NY, USA, 2013.

[14] W3C Audio Working Group. W3C Web Audio API specifications.
     https://www.w3.org/TR/webaudio. Accessed: 2020-15-04.

[15] W3C Technical Architecture Group. Web Audio API Design Review.
     https://github.com/w3ctag/design-reviews/blob/master/2013/07/WebAudio.md.
     Accessed: 2020-05-04.

[16] Andreas Haas, Andreas Rossberg, Derek L Schuff, Ben L Titzer, Michael
     Holman, Dan Gohman, Luke Wagner, Alon Zakai, and JF Bastien. Bringing
     the web up to speed with webassembly. In ACM SIGPLAN Conference on
     Programming Language Design and Implementation (PLDI 2017), pages
     185–200, 2017.

[17] David Herrera, Hanfeng Chen, Erick Lavoie, and Laurie Hendren. Numerical
     computing on the web: Benchmarking for the future. In ACM SIGPLAN
     International Symposium on Dynamic Languages (DLS 2018), pages 88–100,
     2018.

[18] Nicholas Jillings, Jamie Bullock, and Ryan Stables. JS-Xtract: A
     realtime audio feature extraction library for the Web. In International
     Society for Music Information Retrieval Conference (ISMIR 2016) Late
     Breaking Demo, 2016.

[19] Nicholas Jillings, David Moffat, Brecht De Man, and Joshua D. Reiss. Web
     Audio Evaluation Tool: A browser-based listening test environment. In
     Sound and Music Computing Conference (SMC 2015), 2015.

[20] Jari Kleimola and Oliver Larkin. Web audio modules. In Sound and Music
     Computing Conference (SMC 2015), 2015.

[21] Anand Mahadevan, Jason Freeman, Brian Magerko, and Juan Carlos Martinez.
     Earsketch: Teaching computational music remixing in an online web audio
     based learning environment. In Web Audio Conference (WAC 2015), 2015.

[22] Benoit Mathieu, Slim Essid, Thomas Fillon, Jacques Prado, and Gaël
     Richard. Yaafe, an easy to use and efficient audio feature extraction
     software. In International Society for Music Information Retrieval
     Conference (ISMIR 2010), pages 441–446, 2010.

[23] Brian McFee, Colin Raffel, Dawen Liang, Daniel PW Ellis, Matt McVicar,
     Eric Battenberg, and Oriol Nieto. librosa: Audio and music signal
     analysis in python. In Python in Science Conference (SciPy 2015), 2015.

[24] David Moffat, David Ronan, and Joshua D. Reiss. An
     evaluation of audio feature extraction toolboxes. In In-
     ternational Conference on Digital Audio Effects (DAFx
     2015), pages 1–7, 2015.

[25] Alastair Porter, Dmitry Bogdanov, Robert Kaye, Ro-
     man Tsukanov, and Xavier Serra. Acousticbrainz: a
     community platform for gathering music information
     obtained from audio. In International Society for Music
     Information Retrieval Conference (ISMIR 2015), 2015.

[26] Michael Schoeffler, Fabian-Robert Stöter, Bernd Edler,
     and Jürgen Herre. Towards the next generation of web-
     based experiments: A case study assessing basic au-
     dio quality following the ITU-R recommendation BS.
     1534 (MUSHRA). In Web Audio Conference (WAC
     2015), pages 1–6, 2015.

[27] Daniel Smilkov, Nikhil Thorat, Yannick Assogba, Ann
     Yuan, Nick Kreeger, Ping Yu, Kangyi Zhang, Shanqing
     Cai, Eric Nielsen, David Soergel, et al. Tensorflow.js:
     Machine learning for the web and beyond. In Confer-
     ence on Systems and Machine Learning (SysML 2019),
     2019.

[28] Alon Zakai. Emscripten: an llvm-to-javascript com-
     piler. In ACM SIGPLAN Conference on Object-
     Oriented Programming, Systems, Languages, and Ap-
     plications (OOPSLA 2011), pages 301–312, 2011.