Over the past decade, Artificial Intelligence (AI) has made significant strides and is now part of automation pipelines across many applications, with ever more capable models being released constantly. Generative AI, the subfield of AI concerned with creating realistic content, has grown especially quickly in popularity. It can now produce content that is indistinguishable from authentic material, posing serious security and privacy threats.
Speech generation and voice cloning are two areas where these models raise the risk of scams and the spread of misinformation. To tackle this issue, researchers at Meta have introduced AudioSeal, an audio watermarking technique designed for localized detection of AI-generated speech.
Watermarking is a technique used to detect synthesized audio. It embeds a signal in the generated audio that is imperceptible to the human ear but can be recovered by specific algorithms. However, current watermarking methods are not designed for detection and are not localized: they treat the audio as a whole, which makes it difficult to identify which segments within a clip are AI-generated.
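To make the basic idea concrete, here is a minimal, purely illustrative sketch of additive watermarking (not AudioSeal's actual method): a low-amplitude pseudo-random carrier, known to the detector, is added to the audio and later recovered by correlation. The carrier, amplitudes, and threshold are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical carrier: a fixed pseudo-random sequence shared with the detector.
carrier = rng.standard_normal(16_000)            # 1 second at 16 kHz

def embed(audio: np.ndarray, alpha: float = 0.01) -> np.ndarray:
    """Add a low-amplitude watermark signal (imperceptible for small alpha)."""
    return audio + alpha * carrier[: len(audio)]

def detect(audio: np.ndarray, threshold: float = 0.005) -> bool:
    """Correlate with the known carrier; high correlation means watermarked."""
    score = np.dot(audio, carrier[: len(audio)]) / len(audio)
    return bool(score > threshold)

speech = rng.standard_normal(16_000) * 0.1       # stand-in for real speech
assert detect(embed(speech)) and not detect(speech)
```

Note the limitation the article points out: this whole-clip correlation score says nothing about *where* in the audio the watermark lies, which is exactly the gap localized detection addresses.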
AudioSeal jointly trains two models, a generator and a detector. The generator produces a watermark signal that is embedded into the input audio, and the detector returns the probability of the watermark's presence. The detector is trained with the watermark masked out of random audio sections, allowing it to precisely detect synthesized speech within longer clips. Moreover, it can also identify the position of the watermark in the audio.
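The following toy sketch illustrates what localized detection means in practice, using the same correlation idea as above rather than AudioSeal's learned detector: scoring short windows independently yields a per-window decision, so a clip that is only partially watermarked can be segmented. Window size, carrier, and threshold are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
win = 1600                                     # 100 ms windows at 16 kHz
carrier = rng.standard_normal(win)             # hypothetical per-window carrier

def localize(audio: np.ndarray) -> list[float]:
    """Per-window correlation scores: a crude stand-in for a detector that
    outputs watermark probability over time."""
    scores = []
    for start in range(0, len(audio) - win + 1, win):
        seg = audio[start:start + win]
        scores.append(float(np.dot(seg, carrier) / win))
    return scores

speech = rng.standard_normal(4 * win) * 0.1    # 4 windows of stand-in speech
marked = speech.copy()
marked[2 * win:] += 0.02 * np.tile(carrier, 2) # watermark the last two windows only

flags = [s > 0.01 for s in localize(marked)]
assert flags == [False, False, True, True]     # watermark localized to windows 3-4
```

AudioSeal's detector is far more robust than this (it operates at the sample level and survives audio edits), but the output shape is analogous: a presence score over time rather than one score per clip.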
The researchers also introduced a novel perceptual loss inspired by auditory masking, which improves the imperceptibility of the watermark signal. They further extended AudioSeal to multi-bit watermarking, allowing audio to be attributed to a specific model or version without impacting the detection signal.
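One simple way to picture multi-bit watermarking, again as an illustrative sketch rather than AudioSeal's scheme, is to encode each message bit as the sign of a separate pseudo-random carrier added to the audio; the reader recovers each bit from the sign of the corresponding correlation. The 8-bit message and all parameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 16_000, 8                               # 1 s of audio, 8-bit message
carriers = rng.standard_normal((k, n))         # hypothetical per-bit carriers

def embed_bits(audio: np.ndarray, bits, alpha: float = 0.01) -> np.ndarray:
    """Encode each bit as the sign of one carrier added to the audio."""
    signs = 2 * np.asarray(bits) - 1           # map 0/1 -> -1/+1
    return audio + alpha * (signs @ carriers)

def read_bits(audio: np.ndarray) -> list[int]:
    """Recover each bit from the sign of the correlation with its carrier."""
    return [int(c @ audio > 0) for c in carriers]

speech = rng.standard_normal(n) * 0.1
msg = [1, 0, 1, 1, 0, 0, 1, 0]                 # e.g. a model-version identifier
assert read_bits(embed_bits(speech, msg)) == msg
```

In AudioSeal's setting, such a message could identify which model (or model version) generated the audio, independently of the binary watermarked-or-not decision.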
The evaluation results show that AudioSeal significantly outperforms the WavMark model in computation speed, achieving two times faster detection. It also displayed state-of-the-art robustness against a wide range of real-life audio manipulations, precisely detecting even minor alterations to the audio. The researchers also studied deliberate attacks intended to overwhelm the detection system and observed that white-box attacks, in which the attacker has access to the detector, could increase the detection error by 80%. They concluded that, to limit the effect of such attacks, the weights of the detector model should remain confidential.
In conclusion, AudioSeal is a robust method for the proactive detection and localization of synthetic speech. It addresses problems that its predecessors did not fully tackle and achieves significantly better performance than WavMark in localization, attribution, and efficiency. Its ability to pinpoint patches of synthetic audio makes it a valuable tool for protecting the privacy and security of individuals.
The post Meta AI Introduces AudioSeal: The First Audio Watermarking Technique Designed Specifically for Localized Detection of AI-Generated Speech appeared first on MarkTechPost.