Continue With Machine Learning - Noise Detection (Classification)
Noise has patterns we can identify. If our model is good enough to classify specific kinds of noises such as a gun shot, a breaking mirror, a car horn, and so on, then we can put it to very practical use, such as detecting crime or abnormalities in a running machine just by listening to the noise.
Today we'll build a noise classifier based on the data from https://drive.google.com/drive/folders/0By0bAi7hOBAFUHVXd1JCN3MwTEU The data is a collection of noises from 10 classes. Our goal is to learn from the training data so we can identify which of those noises a new sound belongs to.
Note that we'll be using the librosa library for audio analysis. We need to convert the raw waveform into frequency-domain features. Here we turn each noise into MFCCs (Mel-Frequency Cepstral Coefficients), which is an array of numbers summarizing the clip.
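To make the shape of that array concrete: `librosa.feature.mfcc` returns a matrix of shape `(n_mfcc, n_frames)`, one column per short time frame, and later in this post we average over the time axis to get a single fixed-length vector per clip. Here is a minimal sketch of that reduction, using a random stand-in matrix instead of a real decoded wave:

```python
import numpy as np

# stand-in for librosa.feature.mfcc(y=X, sr=sr, n_mfcc=40):
# 40 coefficients x 130 time frames of fake values
mfcc_matrix = np.random.randn(40, 130)

# average each coefficient over the time frames,
# collapsing a variable-length clip into one 40-dim vector
feature = np.mean(mfcc_matrix.T, axis=0)
print(feature.shape)  # (40,)
```

Averaging this way throws away the temporal ordering, but it gives every clip the same feature size, which is what the dense network we build later expects.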
Also, we'll be using a deep learning algorithm to train on our data with the [keras](https://keras.io/) and [tensorflow](https://github.com/tensorflow/tensorflow) libraries.
We'll skip the installation process by going directly into the implementation.
Understanding the Data
The attributes of the data are as follows:

- ID – unique ID of the sound excerpt
- Class – type of sound
Each noise is a wave file with the .wav extension. Try listening to a few of them.
```python
import IPython.display as ipd
ipd.Audio('data/Train/2022.wav')
```

Output: an embedded audio player for the clip.
Now let's plot some waves to see their patterns.
```python
%pylab inline
import os
import pandas as pd
import librosa
import glob
import librosa.display

data, sampling_rate = librosa.load('data/Train/1045.wav')
plt.figure(figsize=(12, 4))
librosa.display.waveplot(data, sr=sampling_rate)
```

Output: a waveform plot of the clip.
Let's look at a few wave plots and observe their patterns. Pay attention to how waves from the same class look similar to each other.
```python
import time
import random

train = pd.read_csv('data/train.csv')
data_dir = os.getcwd() + '/data'

def load_wave():
    # pick a random training example, print its class, and plot its waveform
    i = random.choice(train.index)
    audio_name = train.ID[i]
    path = os.path.join(data_dir, 'Train', str(audio_name) + '.wav')
    print('Class: ', train.Class[i])
    x, sr = librosa.load(path)
    plt.figure(figsize=(12, 4))
    librosa.display.waveplot(x, sr=sr)
    plt.show()

for i in range(10):
    load_wave()
```

Output (each class name is followed by its waveform plot):
```
Class: jackhammer
Class: street_music
Class: gun_shot
Class: street_music
Class: siren
Class: engine_idling
Class: children_playing
Class: gun_shot
Class: children_playing
Class: children_playing
```
Let's check the number of samples for each class:
```python
train.Class.value_counts()
```

Output:

```
jackhammer          668
engine_idling       624
siren               607
children_playing    600
street_music        600
drilling            600
air_conditioner     600
dog_bark            600
car_horn            306
gun_shot            230
Name: Class, dtype: int64
```

Let's create a function to convert each wave file to MFCCs.
```python
def parser(row):
    # function to load a file and extract its features
    file_name = os.path.join(os.path.abspath(data_dir), 'Train', str(row.ID) + '.wav')
    # handle the exception in case a file is missing or corrupted
    try:
        # kaiser_fast is a resampling technique used for faster extraction
        X, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        # extract the MFCC feature from the data, averaged over time frames
        mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=40).T, axis=0)
    except Exception as e:
        print("Error encountered while parsing file: ", file_name)
        return None, None
    feature = mfccs
    label = row.Class
    return [feature, label]

temp = train.apply(parser, axis=1)
temp.columns = ['feature', 'label']
temp.head(2)
```

Output:
```
                                             feature         label
0  [-82.12358939071989, 139.5059159813099, -42.43...         siren
1  [-15.744005405358056, 124.1199599305049, -29.4...  street_music
2  [-123.39365145003913, 15.181946313102896, -50....      drilling
3  [-213.27878814908152, 89.32358896182456, -55.2...         siren
4  [-237.92647882472895, 135.90246127730546, 39.2...      dog_bark
```

(The full 40-element MFCC arrays that were printed after the table are omitted here.)

Let's convert the label to a dummy variable.
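Conceptually, converting to a dummy variable means two steps: map each class name to an integer, then turn that integer into a one-hot row. A minimal numpy sketch with toy labels (not the real dataset):

```python
import numpy as np

labels = np.array(['siren', 'street_music', 'dog_bark', 'siren'])

# step 1: integer-encode, using the sorted unique class names as the mapping
# (this is the same mapping LabelEncoder produces)
classes, y_int = np.unique(labels, return_inverse=True)
print(y_int)  # [1 2 0 1]

# step 2: one-hot encode by indexing rows of an identity matrix
y_onehot = np.eye(len(classes))[y_int]
print(y_onehot.shape)  # (4, 3)
```

The code below does the same thing on the real labels with sklearn's `LabelEncoder` and keras's `np_utils.to_categorical`.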
```python
from sklearn.preprocessing import LabelEncoder
from keras.utils import np_utils

X = np.array(temp.feature.tolist())
y = np.array(temp.label.tolist())

lb = LabelEncoder()
print(lb.fit_transform(y)[:5])
y = np_utils.to_categorical(lb.fit_transform(y))
print(y.shape)
print(y[:10])
```

Output:
```
[8 9 4 8 3]
(5435, 10)
[[0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
```

Now let's build the model:
```python
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

num_labels = y.shape[1]  # 10 classes

model = Sequential()
model.add(Dense(256, input_shape=(40,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(num_labels))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')
model.fit(X, y, batch_size=32, epochs=200, validation_split=0.1, shuffle=True)
```

Here are the last few lines of output:
```
Epoch 196/200
4891/4891 [==============================] - 0s 84us/step - loss: 0.3572 - acc: 0.8779 - val_loss: 0.3988 - val_acc: 0.8989
Epoch 197/200
4891/4891 [==============================] - 0s 94us/step - loss: 0.3450 - acc: 0.8867 - val_loss: 0.3363 - val_acc: 0.9044
Epoch 198/200
4891/4891 [==============================] - 0s 84us/step - loss: 0.3603 - acc: 0.8796 - val_loss: 0.3784 - val_acc: 0.8934
Epoch 199/200
4891/4891 [==============================] - 0s 84us/step - loss: 0.3349 - acc: 0.8902 - val_loss: 0.3603 - val_acc: 0.8952
Epoch 200/200
4891/4891 [==============================] - 0s 84us/step - loss: 0.3700 - acc: 0.8814 - val_loss: 0.3686 - val_acc: 0.8952
<keras.callbacks.History at 0x7f6fb6eb6b00>
```

We can see that the training accuracy is 0.8814 and the validation accuracy is 0.8952, with a loss of around 0.37.
Pretty good result, but I think we can tune it to do even better!
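One tuning idea, not tried in this post: the class counts above are imbalanced (230 gun_shot clips versus 668 for jackhammer), so passing a `class_weight` dict to `model.fit` might help the rare classes. A sketch of "balanced"-style weights (inverse to frequency, following sklearn's heuristic), using the counts from `value_counts()` in alphabetical order so the indices match LabelEncoder's integer labels:

```python
import numpy as np

# clips per class in alphabetical order (air_conditioner ... street_music),
# taken from the value_counts() output above
counts = np.array([600, 306, 600, 600, 600, 624, 230, 668, 607, 600])

# weight each class inversely to its frequency
# (the same formula as sklearn's class_weight='balanced')
weights = counts.sum() / (len(counts) * counts)
class_weight = dict(enumerate(weights))

# gun_shot (index 6) is rarest, so it gets the largest weight
print(max(class_weight, key=class_weight.get))  # 6
```

This dict could then be passed as `model.fit(..., class_weight=class_weight)`; whether it actually improves validation accuracy here is untested.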
Conclusion
We cannot work on the sound waves directly, so we need to convert them to a numerical representation first. Then we use deep learning to learn those patterns in order to predict the classes of other noises.
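To classify a new clip with the trained model, the pipeline runs the same way and then in reverse at the end: compute the 40-dim MFCC vector, call `model.predict` to get 10 probabilities, take the argmax, and map the integer back to a class name. That last step is sketched below with a made-up probability row standing in for a real `model.predict` output (the class list mirrors LabelEncoder's alphabetical ordering):

```python
import numpy as np

# class names in the order LabelEncoder assigns integers (alphabetical)
classes = np.array(['air_conditioner', 'car_horn', 'children_playing',
                    'dog_bark', 'drilling', 'engine_idling', 'gun_shot',
                    'jackhammer', 'siren', 'street_music'])

# made-up stand-in for model.predict(mfccs.reshape(1, 40)):
# one row of 10 class probabilities
probs = np.array([[0.01, 0.02, 0.05, 0.02, 0.03, 0.04,
                   0.70, 0.05, 0.04, 0.04]])

# predicted class = argmax over the 10 outputs, mapped back to its name
predicted = classes[np.argmax(probs, axis=1)][0]
print(predicted)  # gun_shot
```

With the real encoder from the training code, the same mapping is `lb.inverse_transform([np.argmax(probs)])`.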