Friday, January 10, 2025

AI System Improves Sound Imitations


MIT researchers have developed an AI that can produce human-like vocal imitations of everyday sounds without prior training, using a model of the vocal tract to simulate how speech and other noises are produced.

A new model can take many sounds from the world and generate a human-like imitation of them, like a snake’s hiss and an approaching ambulance siren. The system can also be run in reverse to guess real-world sounds from human vocal imitations.
Credits: Image: Alex Shipps/MIT CSAIL, with visual elements from Pixabay

Imitating sounds with your voice, like copying a car engine or a cat meowing, can help explain something when words don’t work. It’s like drawing a quick picture to show what you saw, but instead of using a pencil, you use your voice to make a sound. We all do it without thinking—try copying the sound of an ambulance siren, a crow, or a bell.

Researchers at MIT have built an AI system that can imitate sounds the way humans do, without training data and without ever hearing a human imitation first. They created a model of the human vocal tract that simulates how the throat, tongue, and lips shape sound, then used an AI algorithm to control that model so it reproduces sounds the way a person would.
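The idea of driving a vocal-tract model with an algorithm can be sketched in a heavily simplified toy form. Everything below is illustrative, not MIT's actual system: the real work uses a detailed articulatory synthesizer, whereas this stand-in models the "vocal tract" as two formant-like sinusoids and searches over their control parameters to match a target sound:

```python
import numpy as np

SR = 8000          # sample rate (Hz)
DUR = 0.25         # clip length (s)
T = np.arange(int(SR * DUR)) / SR

def toy_vocal_tract(params):
    """Toy 'vocal tract': two formant-like sinusoids whose
    frequencies and amplitudes act as articulator controls."""
    f1, f2, a1, a2 = params
    return a1 * np.sin(2 * np.pi * f1 * T) + a2 * np.sin(2 * np.pi * f2 * T)

def spectral_distance(x, y):
    """Compare magnitude spectra rather than raw waveforms."""
    X, Y = np.abs(np.fft.rfft(x)), np.abs(np.fft.rfft(y))
    return float(np.mean((X - Y) ** 2))

def imitate(target, n_iters=2000, seed=0):
    """Random search over articulator parameters to match the target.
    (A stand-in for whatever optimizer the real system uses.)"""
    rng = np.random.default_rng(seed)
    best, best_loss = None, np.inf
    for _ in range(n_iters):
        params = rng.uniform([100, 500, 0.1, 0.1], [500, 2000, 1.0, 1.0])
        loss = spectral_distance(toy_vocal_tract(params), target)
        if loss < best_loss:
            best, best_loss = params, loss
    return best, best_loss

# A stand-in "real-world" sound the system should imitate.
target = toy_vocal_tract([300, 1200, 0.8, 0.5])
params, loss = imitate(target)
```

The point of the sketch is the loop structure: a fixed physical model of sound production, plus a search over its control parameters so that the model's output approximates a target sound.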


The model can take different sounds from the world and turn them into human-like imitations, such as leaves rustling, a snake's hiss, or an approaching ambulance siren. It can also work the other way around, guessing real-world sounds from human vocal imitations, much as some computer-vision systems retrieve matching images from sketches. For example, the model can distinguish a person imitating a cat's "meow" from one imitating its "hiss."

The team developed three versions of the model, each improving its ability to imitate sounds. The first aimed to produce imitations as acoustically close to real-world sounds as possible, but these didn't match human behaviour well. The second focused on what matters about a sound to the listener, which made it more effective than the first.

To improve it further, they added a layer of reasoning that accounts for effort: humans avoid imitations that are too rapid or extreme to produce comfortably. This made the imitations sound more human-like. In experiments, the AI model's imitations were preferred 25% of the time overall, rising to as much as 75% for specific sounds like a motorboat or a gunshot.
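One common way to encode such an effort constraint, sketched here as an assumption about how it might work rather than the paper's actual formulation, is to add a penalty on how fast the articulator parameters change over time, so abrupt or extreme movements cost more:

```python
import numpy as np

def effort_penalty(params_over_time, lam=0.1):
    """Penalize fast articulator movements: weighted sum of squared
    frame-to-frame parameter changes (a hypothetical effort term)."""
    deltas = np.diff(params_over_time, axis=0)
    return lam * float(np.sum(deltas ** 2))

def total_loss(sound_match_loss, params_over_time, lam=0.1):
    """Trade off acoustic match against effort."""
    return sound_match_loss + effort_penalty(params_over_time, lam)

# A gentle trajectory of (f1, f2) controls versus an abrupt jump.
smooth = np.linspace([300, 1200], [320, 1180], 10)
abrupt = np.vstack([[300, 1200]] * 5 + [[900, 400]] * 5)
```

Under this kind of loss, an optimizer naturally prefers smooth, physically plausible gestures, which is one plausible mechanism for the more human-like imitations the article describes.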

The team still has work to do. The model struggles with certain consonants, such as "z," so it cannot yet accurately imitate sounds like bees buzzing. It also cannot replicate how humans imitate speech, music, or sounds whose imitations vary across languages, like a heartbeat.

Nidhi Agarwal
Nidhi Agarwal is a Senior Technology Journalist at EFY with a deep interest in embedded systems, development boards and IoT cloud solutions.
