In this article, we explore how to use a Google Teachable Machine AI model, trained on spectrograms, to recognize sounds and trigger actions. When the model recognizes a specific sound, it sends a request to a web server hosted on the IndusBoard. The AI model runs on a phone or laptop, while the IndusBoard receives the trigger signal and executes tasks accordingly.
For example, when the AI recognizes a command like “hello, turn on the light,” it can activate an LED on pin 33, which can be connected to a relay module to control home lights.
This system can be used in various applications, such as detecting animal presence in forests, identifying different bird species, or diagnosing faults in engines and motors through sound analysis.
Understanding the Spectrogram
Before using the system, it’s essential to understand what a spectrogram is and how it aids in recognizing words, animals, and faults in motors.
What is a Spectrogram?
A spectrogram is a visual representation of the spectrum of frequencies in a signal as they change over time. It allows for the analysis of the amplitude (or power) of different frequency components of a sound signal over time. The x-axis of a spectrogram represents time, the y-axis represents frequency, and the color or intensity indicates the amplitude of the frequencies at each time point.
How Does a Spectrogram Recognize Sound?
1. Sound Wave Capture:
Sound waves are captured using a microphone, which converts the acoustic signal into an electrical signal.
2. Digitization:
The electrical signal is digitized by an analog-to-digital converter (ADC), creating a series of discrete digital samples of the sound wave.
3. Short-Time Fourier Transform (STFT):
The digitized signal is divided into small overlapping segments (windows). Each segment is transformed from the time domain to the frequency domain using the Short-Time Fourier Transform (STFT). This results in a series of frequency spectra, one for each segment of the signal.
4. Spectrogram Generation:
The frequency spectra are arranged in sequence to form the spectrogram. The intensity or color of each point in the spectrogram indicates the amplitude of a specific frequency at a specific time.
5. Feature Extraction:
Features are extracted from the spectrogram to capture relevant information about the sound. Common features include Mel-Frequency Cepstral Coefficients (MFCCs), which represent the short-term power spectrum of a sound. These features create a compact representation of the sound signal that is more manageable for analysis.
6. Pattern Recognition:
Machine learning algorithms or neural networks are trained on these features to recognize and classify different sounds. The training process involves using labeled datasets where the sound type (e.g., speech, music, noise) is known. Once trained, the model can analyze new spectrograms and recognize the corresponding sounds by comparing the extracted features to the patterns learned during training.
By following these steps, you can effectively use sound recognition to automate various tasks and enhance your projects with AI-driven sound analysis.
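The framing, windowing, and STFT steps above can be sketched in a few lines of code. The following is a minimal illustration, not production DSP: it uses a naive DFT (real libraries use the FFT), a Hann window, and a synthetic 1 kHz test tone; the frame size, hop size, and sample rate are arbitrary example values.

```javascript
// Minimal spectrogram sketch: split the signal into overlapping frames,
// apply a Hann window, and take the magnitude of a naive DFT per frame.
function spectrogram(signal, frameSize, hopSize) {
  const frames = [];
  for (let start = 0; start + frameSize <= signal.length; start += hopSize) {
    const spectrum = [];
    for (let k = 0; k < frameSize / 2; k++) {       // frequency bins (y-axis)
      let re = 0, im = 0;
      for (let n = 0; n < frameSize; n++) {
        const hann = 0.5 - 0.5 * Math.cos((2 * Math.PI * n) / (frameSize - 1));
        const x = signal[start + n] * hann;
        const angle = (-2 * Math.PI * k * n) / frameSize;
        re += x * Math.cos(angle);
        im += x * Math.sin(angle);
      }
      spectrum.push(Math.hypot(re, im));            // amplitude at bin k
    }
    frames.push(spectrum);                          // one time column (x-axis)
  }
  return frames;                                    // time x frequency grid
}

// Example: a 1 kHz tone sampled at 8 kHz should peak in
// bin k = 1000 * frameSize / 8000.
const sampleRate = 8000, freq = 1000;
const signal = Array.from({ length: 2048 }, (_, n) =>
  Math.sin((2 * Math.PI * freq * n) / sampleRate));
const spec = spectrogram(signal, 256, 128);
const peakBin = spec[0].indexOf(Math.max(...spec[0]));
console.log(peakBin); // peak near 1000 * 256 / 8000 = 32
```

Each inner array is one vertical slice of the spectrogram image; plotting amplitude as colour over the time-frequency grid gives the picture the ML model is trained on.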
Bill of Materials
Training AI Model
In this project, we use Google’s Teachable Machine to create an ML model from spectrograms. Start by visiting the Google Teachable Machine website and collecting a dataset of the different sounds you want the model to recognize.
Steps to Create Your ML Model:
- Open Google Teachable Machine:
- Go to the Google Teachable Machine website and create a new audio project.
- Collect Your Dataset:
- Record or upload sounds that you want the model to recognize. This could include various animal sounds like elephants and birds, or sounds related to machinery, such as a motor operating correctly or incorrectly.
- Label Your Sounds:
- Assign labels to each sound. For instance, label sounds of elephants as “Elephant” and bird sounds as “Birds.”
- If you’re working with machinery, label the sound of a motor running in good condition as “Motor – Normal” and when it’s malfunctioning as “Motor – Fault.”
- For keyword spotting, label sounds according to the specific words or phrases you want to detect.
- Train Your ML Model:
- Use the collected and labeled dataset to train your model in the Teachable Machine. This will involve processing the sounds into spectrograms that can be used for pattern recognition by the model.
- Deploy Your Model:
- Once trained, you can deploy your model for real-time sound recognition, allowing it to identify sounds and trigger specific actions based on the labels assigned.
By following these steps, you can create a custom ML model tailored to your needs, enabling you to recognize and respond to various sounds effectively.
After collecting the dataset, train the ML model and upload it to Google’s cloud, which gives you a shareable model link. Alternatively, you can download the model to run it locally on a PC.
Preparing the INDUSBOARD Server to Connect with the Main Program
Next, program the IndusBoard to run a small web server that is triggered when the ML model detects the target sound. On receiving the trigger, the IndusBoard alerts us by lighting the LED on pin 33 or, if a buzzer is connected to that pin, by sounding the buzzer.
Now configure the ML.js code generated after training and uploading the model on Google Teachable Machine so that it calls the IP address of the IndusBoard server whenever the target sound is detected.
Testing
Now run the ML.js code, allow access to the microphone, and play one of the sounds the model was trained to recognize. When the sound is recognized, the alert interface page opens automatically and the IndusBoard sounds the buzzer if one is connected; otherwise it lights up the LED on pin 33.