Task 3

Urban Sounds Classification

Dataset

About the dataset

Link to uploaded dataset: https://drive.google.com/folderview?id=14u-bfL8kPRUDggPCSVio25Bw7LSuoChK

This dataset contains 8732 labelled sound excerpts (<= 4 s) of urban sounds from 10 classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren, and street_music.

According to the original paper, sound excerpts are taken from www.freesound.org and are already pre-sorted into ten folds for cross-validation.
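
Because the folds are predefined, cross-validation can iterate over them instead of reshuffling the data. The following is a minimal sketch, assuming each excerpt is represented as a dict with a `fold` key (matching the metadata column of the same name); `fold_splits` and the toy rows are hypothetical and only illustrate the idea.

```python
def fold_splits(rows, n_folds=10):
    """Yield (held_out_fold, train_rows, test_rows) for each predefined fold."""
    for held_out in range(1, n_folds + 1):
        train = [r for r in rows if int(r["fold"]) != held_out]
        test = [r for r in rows if int(r["fold"]) == held_out]
        yield held_out, train, test


# Toy rows standing in for metadata entries (file names are made up).
rows = [
    {"slice_file_name": f"clip_{i}.wav", "fold": str(i % 10 + 1)}
    for i in range(20)
]
splits = list(fold_splits(rows))
```

Each iteration holds out one fold for testing and trains on the other nine, so no excerpt appears in both the training and test sets of the same split.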

The metadata contains 8 columns:

slice_file_name: name of the audio file
fsID: Freesound ID of the recording from which the excerpt is taken
start: start time of the slice
end: end time of the slice
salience: salience rating of the sound. 1 = foreground, 2 = background
fold: The fold number (1–10) to which this file has been allocated
classID: 0 = air_conditioner, 1 = car_horn, 2 = children_playing, 3 = dog_bark, 4 = drilling, 5 = engine_idling, 6 = gun_shot, 7 = jackhammer, 8 = siren, 9 = street_music
class: class name

Two classes (car_horn and gun_shot) have somewhat fewer than half as many entries as the other 8 classes.
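
The class imbalance can be checked by counting rows per class in the metadata. The sketch below uses only the standard library and assumes the metadata is a CSV whose header matches the 8 columns listed above; `class_counts` is a hypothetical helper, and the three sample rows are illustrative, not real entries.

```python
import csv
import io
from collections import Counter


def class_counts(csv_file):
    """Count rows per class name in a metadata CSV opened as a text file."""
    reader = csv.DictReader(csv_file)
    return Counter(row["class"] for row in reader)


# Tiny in-memory stand-in for the metadata file (values are illustrative only).
sample = io.StringIO(
    "slice_file_name,fsID,start,end,salience,fold,classID,class\n"
    "a.wav,1,0.0,4.0,1,1,3,dog_bark\n"
    "b.wav,2,0.5,4.5,2,2,3,dog_bark\n"
    "c.wav,3,0.0,1.2,1,3,1,car_horn\n"
)
counts = class_counts(sample)
for name, n in counts.most_common():
    print(f"{name}: {n}")
```

With the real data, the same function could be called on the opened metadata file, for example `with open("UrbanSound8K.csv", newline="") as f: counts = class_counts(f)`, where the file name is an assumption about how the metadata is stored in the uploaded folder.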

Variables in the data

“The sampling rate, bit depth, and number of channels are the same as those of the original file uploaded to Freesound (and hence may vary from file to file).”
This means the data may contain many different sample rates, so two clips of the same duration can have different numbers of samples.
Different bit depths also mean that sample values can span different ranges.
Some files may be stereo, while others are mono.
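
A short worked example makes the effect of these variables concrete. It assumes signed integer PCM encoding; the sample rates and bit depths shown are common values chosen for illustration, not a list of what actually occurs in the dataset.

```python
def num_samples(duration_s, sample_rate_hz):
    """Number of samples per channel for a clip of the given duration."""
    return round(duration_s * sample_rate_hz)


def pcm_range(bit_depth):
    """(min, max) value of signed integer PCM at the given bit depth."""
    return -(2 ** (bit_depth - 1)), 2 ** (bit_depth - 1) - 1


# The same 4-second clip has a different length at each sample rate.
for sr in (22050, 44100, 48000):
    print(sr, num_samples(4.0, sr))

# The same loudness maps to very different integer values at each bit depth.
for bits in (16, 24):
    print(bits, pcm_range(bits))
```

A 4 s clip therefore has 88200 samples at 22.05 kHz but 176400 at 44.1 kHz, and a 16-bit file spans -32768 to 32767 while a 24-bit file spans -8388608 to 8388607.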

Pre-processing to be applied

Convert everything to mono audio.
Convert the sample rate to 22.05 kHz.
Equalise the bit depth.
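
The three steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the pipeline used here: the input is assumed to be signed integer PCM already loaded as an array, all function names are hypothetical, and the linear-interpolation resampler is a simple stand-in with no anti-aliasing filter. In practice a library call such as `librosa.load(path, sr=22050, mono=True)` is commonly used to get mono floating-point audio at 22.05 kHz in one step.

```python
import numpy as np


def to_mono(audio):
    """Average channels; expects shape (n_samples,) or (n_samples, n_channels)."""
    audio = np.asarray(audio)
    return audio if audio.ndim == 1 else audio.mean(axis=1)


def pcm_to_float(audio, bit_depth):
    """Scale signed integer PCM to floats in [-1.0, 1.0)."""
    return np.asarray(audio, dtype=np.float32) / float(2 ** (bit_depth - 1))


def resample_linear(audio, orig_sr, target_sr=22050):
    """Resample by linear interpolation (no anti-aliasing filter)."""
    n_out = int(round(len(audio) * target_sr / orig_sr))
    old_t = np.arange(len(audio)) / orig_sr
    new_t = np.arange(n_out) / target_sr
    return np.interp(new_t, old_t, audio).astype(np.float32)


def preprocess(audio, orig_sr, bit_depth, target_sr=22050):
    # Scale to floats first so averaging and interpolation do not overflow
    # or truncate integer values.
    mono = to_mono(pcm_to_float(audio, bit_depth))
    return resample_linear(mono, orig_sr, target_sr)


# One second of synthetic 16-bit stereo audio at 44.1 kHz.
stereo = np.zeros((44100, 2), dtype=np.int16)
stereo[:, 0] = 16384  # constant signal on the left channel only
out = preprocess(stereo, orig_sr=44100, bit_depth=16)
```

After this step every clip is a one-dimensional float array on the same scale and at the same sample rate, so clips of equal duration have equal length.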
