Automatic detection of unwanted noise
A major task in any Data Science project is data cleaning. Without proper cleaning, data can be biased, polluted or even inconsistent. If a machine learning model is fitted using such data, the results obtained are unlikely to be reliable.
In this project, I used machine learning to enhance and speed up the cleaning of audio recordings. In these recordings, parasitic noises can occur, but they are not known beforehand. Therefore, developing an unsupervised method is mandatory.
Results
Input spectrogram (top), anomaly scores (middle), output spectrogram with dark frames for the parts classified as anomalies (bottom)
Pipeline
 Harmonic-percussive source separation
 Extraction of features from the signal
 Feature enrichment with statistical indicators
 Scoring using an Isolation Forest (unsupervised anomaly detection)
 Rolling window to smooth the results
Harmonic-percussive source separation
A spectrogram is a 3D representation of a signal. Time is usually represented along the x-axis, and frequency along the y-axis. The z-axis corresponds to the amplitude and is conveniently represented using a colormap. The signal is divided into frames, and a spectrum is computed for each frame (one column of pixels in the spectrogram).
This representation allows the human eye to "visualize" the sound.
Horizontal lines correspond to tonal noise (nearly constant frequency), whereas parasitic noises usually span vertically on a spectrogram. These can be shocks, clicks, voices and so on. With a properly sized window, harmonic-percussive source separation (HPSS) [1] makes it possible to separate and filter out the tonal and broadband noise, keeping only the percussive component (vertical lines).
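A minimal sketch of this step with librosa (the file path is a placeholder, and the default HPSS settings stand in for the window sizing mentioned above):

```python
import librosa

# Placeholder path; sr=None keeps the recording's native sample rate.
y, sr = librosa.load("recording.wav", sr=None)

# Split the STFT into harmonic (horizontal lines) and percussive (vertical lines) parts.
D = librosa.stft(y)
D_harmonic, D_percussive = librosa.decompose.hpss(D)

# Keep only the percussive component, where the parasitic noises show up.
y_percussive = librosa.istft(D_percussive)
```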
Below is an example taken from librosa's documentation showing the effect of applying HPSS to a sample recording.
Example from librosa documentation
Feature extraction
Several features are extracted from the signal, using the librosa Python library [2].
Mel-frequency cepstral coefficients (MFCCs)
The detection uses 15 MFCCs computed by librosa.feature.mfcc
MFCCs are commonly derived as follows [3]:
 Divide signal into frames.
 Take the Fourier transform.
 Convert to the mel scale.
 Take the logs of the powers.
 Take the discrete cosine transform.
 The MFCCs are the amplitudes of the resulting spectrum.
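A minimal sketch of this step, assuming a mono recording loaded with librosa (the file path is a placeholder):

```python
import librosa

y, sr = librosa.load("recording.wav", sr=None)  # placeholder path

# 15 MFCCs per frame, shape (15, n_frames); librosa performs the steps listed above internally.
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=15)
```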
Spectral contrast
The detection uses 6-band spectral contrast [4] computed by librosa.feature.spectral_contrast
Each frame of a spectrogram S is divided into subbands. For each subband, the energy contrast is estimated by comparing the mean energy in the top quantile (peak energy) to that of the bottom quantile (valley energy). High contrast values generally correspond to clear, narrowband signals, while low contrast values correspond to broadband noise.
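A corresponding sketch under the same assumptions (placeholder path):

```python
import librosa

y, sr = librosa.load("recording.wav", sr=None)  # placeholder path

# 6 sub-bands; librosa returns n_bands + 1 rows, one per sub-band plus the lowest band.
contrast = librosa.feature.spectral_contrast(y=y, sr=sr, n_bands=6)
```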
Zero-crossing rate
Computed by librosa.feature.zero_crossing_rate. The zero-crossing rate is defined as the rate of sign changes along a signal.
$\text{zero-crossing rate} = \frac{1}{T-1}\sum_{t=2}^{T}\mathbb{1}\{s_t s_{t-1} < 0\}$
Spectral rolloff
Computed by librosa.feature.spectral_rolloff
The roll-off frequency is defined as the frequency below which a given percentage of the energy of the spectrum is contained.
Onset strength
Computed by librosa.onset.onset_strength
Compute a spectral flux onset strength envelope [5].
RMS
Computed by librosa.feature.rms
Compute the root-mean-square (RMS) value for each frame.
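The remaining per-frame features each take a single librosa call; a combined sketch under the same assumptions (placeholder path, default frame settings):

```python
import librosa

y, sr = librosa.load("recording.wav", sr=None)  # placeholder path

zcr = librosa.feature.zero_crossing_rate(y)             # sign changes per frame
rolloff = librosa.feature.spectral_rolloff(y=y, sr=sr)  # roll-off frequency (85% of energy by default)
onset_env = librosa.onset.onset_strength(y=y, sr=sr)    # spectral flux onset strength envelope
rms = librosa.feature.rms(y=y)                          # per-frame RMS energy
```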
Feature enrichment
To enrich the dataset, we compute the following statistical indicators for each feature, using a sliding window:

Corrected sample standard deviation:
$s=\sqrt{\frac{1}{N-1}\sum_{i=1}^N(x_i - \bar{x})^2}$ where $\bar{x}$ is the sample mean.
The higher $s$ is, the higher the local dispersion of data.

Sample skewness:
$G_1 = \frac{k_3}{k_2^{3/2}} = \frac{\sqrt{n(n-1)}}{n-2}\frac{\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^3}{\left[\frac{1}{n-1}\sum_{i=1}^n(x_i-\bar{x})^2\right]^{3/2}}$ where $\bar{x}$ is the sample mean.
The higher the absolute value of $G_1$, the more asymmetric the distribution of data.

Sample kurtosis:
$g_2 = \frac{m_4}{m_2^2} - 3 = \frac{\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^4}{\left[\frac{1}{n}\sum_{i=1}^n(x_i-\bar{x})^2\right]^2} - 3$ where $\bar{x}$ is the sample mean.
The higher $g_2$ is, the fatter the tails of the distribution, hence, the higher the number of extreme values.
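A minimal sketch of this enrichment step with pandas, assuming the raw features have been stacked into a DataFrame with one row per frame (the window length is an arbitrary choice to tune):

```python
import pandas as pd

def enrich(features: pd.DataFrame, window: int = 50) -> pd.DataFrame:
    """Append rolling standard deviation, skewness and kurtosis columns to the raw features."""
    rolled = features.rolling(window, center=True, min_periods=1)
    enriched = pd.concat(
        [
            features,
            rolled.std().add_suffix("_std"),    # corrected sample standard deviation
            rolled.skew().add_suffix("_skew"),  # sample skewness
            rolled.kurt().add_suffix("_kurt"),  # sample excess kurtosis
        ],
        axis=1,
    )
    # Skewness and kurtosis are undefined on very small or constant windows; fill the gaps.
    return enriched.fillna(0.0)
```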
Isolation Forest
The main approaches to anomaly detection consist in profiling what a normal point looks like. Isolation Forest [6] uses a completely different method: instead of focusing on normal points, it isolates the abnormal ones.
When we look at a line plot (1D), we can easily imagine setting a range of acceptable values. However, this range may change along a second dimension: for example, what is considered normal at a temperature of 20°C may not be at 50°C. The Isolation Forest method is able to work with n-dimensional data.
An Isolation Forest is composed of multiple trees
The algorithm to build a tree is the following (see the sketch below):

1. Take a sample of the dataset.
2. Select a random attribute (dimension).
3. Select a random split point for this attribute.
4. Split the sample (using the split point) into two subsets.
5. Repeat steps 2 to 4 for each of the two subsets, until the maximum depth is reached.
By creating trees using random attributes, we ensure that all the trees in the forest will be different.
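A minimal, illustrative sketch of growing one such tree in plain Python (not the library implementation; `sample` is assumed to be a list of feature vectors):

```python
import random

def grow_tree(sample, depth=0, max_depth=10):
    """Recursively isolate points using random attributes and random split values."""
    # Stop when the node holds at most one point or the depth limit is reached.
    if depth >= max_depth or len(sample) <= 1:
        return {"size": len(sample)}
    dim = random.randrange(len(sample[0]))          # step 2: random attribute
    lo = min(x[dim] for x in sample)
    hi = max(x[dim] for x in sample)
    if lo == hi:                                    # attribute is constant here, cannot split
        return {"size": len(sample)}
    split = random.uniform(lo, hi)                  # step 3: random split point
    left = [x for x in sample if x[dim] < split]    # step 4: split into two subsets
    right = [x for x in sample if x[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": grow_tree(left, depth + 1, max_depth),   # step 5: recurse on both subsets
        "right": grow_tree(right, depth + 1, max_depth),
    }
```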
After generating a given number of trees, we can compute for each point $x$ its path length $h(x)$ from the root of each tree. An abnormal point is easier to isolate, and will therefore have a shorter average path length.
$x_i$ can be considered as normal, $x_0$ as an anomaly
A score $s$ is calculated for each point $x$ using:
 the average path length in all trees $E(h(x))$,
 the average path length of an unsuccessful search in a Binary Search Tree, $c(n)$, where $n$ is the sample size.
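Combining these two quantities, the anomaly score defined in [6] is:
$s(x, n) = 2^{-\frac{E(h(x))}{c(n)}}, \qquad c(n) = 2H(n-1) - \frac{2(n-1)}{n}$ where $H(i)$ is the harmonic number, which can be estimated by $\ln(i) + 0.5772156649$ (Euler's constant).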
The scores given to the points have the following meaning:
 When $s$ is close to 1, $x$ is very likely to be an anomaly.
 When $s$ is much smaller than 0.5, $x$ can be considered as normal.
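In practice, the scoring can be done with scikit-learn's IsolationForest; a sketch with a stand-in feature matrix (the hyperparameters are placeholders to tune):

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Stand-in feature matrix: in the real pipeline, X holds the enriched features, one row per frame.
X = np.random.default_rng(0).normal(size=(1000, 20))

forest = IsolationForest(n_estimators=100, contamination="auto", random_state=0)
scores_raw = forest.fit(X).score_samples(X)

# scikit-learn returns the opposite of the score defined in [6]; negate it to recover
# the convention above, where values close to 1 flag likely anomalies.
scores = -scores_raw
labels = forest.predict(X)   # +1 for normal frames, -1 for anomalies
```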
Finally, to reduce the possibility of false positives (points labeled abnormal instead of normal), a rolling window can be applied to smooth the results (n-th decile, standard deviation, etc.).
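For example, with pandas (reusing the `scores` array from the previous sketch; the window length and threshold are placeholders to tune):

```python
import pandas as pd

# Centred rolling 9th decile: an isolated high score no longer flags a single frame,
# while a cluster of high scores still stands out.
smoothed = pd.Series(scores).rolling(window=25, center=True, min_periods=1).quantile(0.9)
flagged = smoothed > 0.6   # hypothetical threshold on the smoothed score
```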