Audio data fundamentals – Exploring Audio Data-1
First, let us understand some basic terminology in audio data analysis:
- Amplitude: Sound travels as waves, and the height of a wave is its amplitude: the maximum extent of a vibration or oscillation, measured from the position of equilibrium. The bigger the amplitude, the louder the sound. Imagine a swinging pendulum: the distance it moves from its resting position (middle point) to one extreme is its amplitude. Similarly, the higher a person goes on a swing, the greater the amplitude of their motion.
- RMS calculation: To find loudness using RMS, we first square the amplitude values of the sound wave. Squaring makes every value positive (removing any negative values) and reflects the intensity of the sound, which is what loudness should measure.
- Average power: After squaring the amplitudes, we calculate the average (mean) of these squared values. It’s like finding the typical size of the sound waves.
- Square root: To get the final loudness measurement, we take the square root of that average power. This is the RMS, which tells us how intense the sound is on average.
- RMS energy: In practical terms, when you look at a loudness value given in decibels (dB), it’s often calculated from the RMS energy. A higher RMS value means a louder sound, while a lower RMS value means a quieter sound.
So, RMS energy is a way to take the raw amplitudes of an audio signal, square them to focus on their intensity, find the average of these squared values, and then take the square root to get a measure of how loud the sound is overall. It’s a useful tool for understanding and comparing the loudness of different audio signals.
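The square–mean–root steps above can be sketched in a few lines of NumPy. This is a minimal illustration using a synthetic sine wave; the sample rate and frequency are arbitrary example values:

```python
import numpy as np

# Synthetic signal: 1 second of a 440 Hz sine wave at a 16 kHz sample rate
sr = 16000
t = np.arange(sr) / sr
signal = 0.5 * np.sin(2 * np.pi * 440 * t)

# RMS: square the amplitudes, average the squares, then take the square root
rms = np.sqrt(np.mean(signal ** 2))
print(rms)  # ~0.3536, i.e. 0.5 / sqrt(2) for a pure sine wave
```

For a pure sine wave the RMS is always the peak amplitude divided by the square root of 2, which is a handy sanity check for this kind of calculation.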
- Frequency: Think of frequency as how fast something vibrates. In sound, it’s how quickly air moves back and forth to create a pitch. High frequency means a high-pitched sound, such as a whistle, and low frequency means a low-pitched sound, such as a bass drum. Think of ocean waves hitting the shore. The more waves that arrive in a given time frame, the higher the frequency.
- Spectrogram: A spectrogram is like a picture that shows how loud different frequencies are in sound. It’s often used for music or speech analysis. Imagine a graph where time is on the x axis, frequency (pitch) is on the y axis, and color represents how loud each frequency is at a certain moment. Consider a musical score with notes over time. The position of the notes on the score represents their frequency, and the intensity of the notes represents their amplitude.
- Mel spectrogram: A mel spectrogram is a special type of spectrogram that tries to show how humans hear sound. It’s like a picture of sound that’s been adjusted to match how we perceive pitch. It’s helpful for tasks such as music and speech recognition.
- Mel-frequency cepstral coefficients (MFCCs): MFCCs are a compact way to describe the features of a sound. They take the mel spectrogram and turn it into a small set of numbers that a computer can work with. They are often used in voice recognition and music analysis.
- Binary cross-entropy (BCE): BCE is a way to measure how well a computer is doing a “yes” or “no” task, such as telling whether a picture has a cat in it or not. It checks whether the computer’s answers match the real answers and gives a score.
- AMaxP (0.95 F1, 0.96 acc): AMaxP is a way to find the best answer among many choices, like picking the option with the highest score. The numbers 0.95 F1 and 0.96 acc are example scores that tell you how well a model did: F1 balances precision (being right when you say “yes”) against recall (not missing any real “yes” cases), while accuracy is simply the fraction of answers the model got right.
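To make the spectrogram idea concrete, here is a minimal NumPy sketch of a magnitude spectrogram built from a short-time FFT. This is an illustration only, not a replacement for dedicated functions such as `scipy.signal.spectrogram`; the frame length and hop size are assumed example values:

```python
import numpy as np

def spectrogram(signal, frame_len=512, hop=256):
    """Magnitude spectrogram: slice the signal into overlapping frames,
    window each frame, and take the FFT magnitude of each one."""
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    # Rows: frequency bins (y axis); columns: time frames (x axis)
    return np.abs(np.fft.rfft(np.array(frames), axis=1)).T

# 1 second of a 440 Hz tone at 16 kHz
sr = 16000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t)

S = spectrogram(sig)
print(S.shape)  # (257, 61): 257 frequency bins, 61 time frames
```

Plotting `S` with time on the x axis, frequency on the y axis, and magnitude as color gives exactly the picture described above, with a bright horizontal line at 440 Hz.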
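The “adjusted to match how we perceive pitch” part of the mel spectrogram comes from the mel scale, which compresses high frequencies. A common formula (the HTK variant; libraries such as librosa default to a slightly different “Slaney” formula) looks like this:

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel-scale formula: roughly linear below ~1 kHz,
    # logarithmic (compressed) above it
    return 2595.0 * np.log10(1.0 + f / 700.0)

print(hz_to_mel(1000.0))  # ~1000 mels: 1 kHz is the scale's anchor point
print(hz_to_mel(4000.0))  # ~2146 mels: 4x the frequency, but nowhere near 4x the mels
```

This compression is why a mel spectrogram devotes more resolution to the low and mid frequencies where human hearing is most sensitive.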
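The BCE score described above can be computed directly from the standard formula. Here is a small NumPy sketch with made-up predictions for the cat/no-cat example:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    # Average negative log-likelihood for yes/no predictions;
    # clipping avoids log(0) for overconfident predictions
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 1])          # real answers ("cat" = 1)
y_pred = np.array([0.9, 0.1, 0.8, 0.7])  # model's confidence that each image has a cat
loss = binary_cross_entropy(y_true, y_pred)
print(loss)  # ~0.198 (lower means the answers match the real labels better)
```

Note that BCE rewards confident correct answers and punishes confident wrong ones much more heavily than a plain right/wrong count would.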
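The F1 and accuracy scores mentioned above can also be computed by hand, which makes the difference between them clear. A minimal sketch with made-up predictions (real evaluations would typically use a library such as scikit-learn):

```python
import numpy as np

y_true = np.array([1, 1, 0, 1, 0])  # real answers
y_pred = np.array([1, 0, 0, 1, 1])  # model's yes/no answers

# Accuracy: fraction of answers that match
acc = np.mean(y_true == y_pred)

# F1: balance of precision (right when saying "yes") and recall (not missing real "yes" cases)
tp = np.sum((y_pred == 1) & (y_true == 1))
precision = tp / np.sum(y_pred == 1)
recall = tp / np.sum(y_true == 1)
f1 = 2 * precision * recall / (precision + recall)

print(acc, f1)  # 0.6 and ~0.667 on this tiny example
```

Accuracy treats every answer equally, while F1 focuses on how well the “yes” cases were handled, which matters when the classes are imbalanced.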
Now let us learn about the most used libraries for audio data analysis.