weya AI Launches “Hush” to Fix Voice Agents’ Audio Challenges

Noida, India, April 09: weya AI, a BFSI-focused AI company building omni-channel AI agents for customer onboarding, sales, and collections, today announced the release of Hush, an open-source speech enhancement model purpose-built for the realities of production voice AI. Trusted by Tier-1 institutions including Kotak Bank, weya AI is on a mission to make AI transformation accessible across the global BFSI landscape through an on-premises voice AI stack.

At just 8 MB in size and requiring no GPU, Hush processes each 10 ms audio frame in under 1 ms and has been trained on over 10,000 hours of mixed data. The model has 1.8 million parameters and is fully language-agnostic, operating consistently across all spoken languages. At launch, Hush ranked #5 on Hugging Face’s Audio-to-Audio leaderboard, making it one of the top-performing open-source models in its category.
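To put those figures in context, the short Python sketch below works through the arithmetic. It is an illustration only: the function names are hypothetical, and the 4-bytes-per-parameter figure assumes 32-bit float weights, which the announcement does not state.

```python
def real_time_factor(processing_ms: float, frame_ms: float) -> float:
    """Ratio of compute time to audio duration; below 1.0 keeps up with live audio."""
    return processing_ms / frame_ms


def weight_size_mb(num_params: int, bytes_per_param: int = 4) -> float:
    """Approximate weight storage in MB (10**6 bytes).

    bytes_per_param=4 is an assumption (32-bit floats), not a published detail.
    """
    return num_params * bytes_per_param / 1_000_000


if __name__ == "__main__":
    # Upper bound quoted in the announcement: under 1 ms per 10 ms frame.
    print("real-time factor (upper bound):", real_time_factor(1.0, 10.0))
    # 1.8 million parameters, assuming 32-bit floats.
    print("approx. weight size in MB:", weight_size_mb(1_800_000))
```

Under that assumption the weights alone come to roughly 7.2 MB, which is in line with the quoted 8 MB model size, and a real-time factor below 0.1 leaves most of each frame's time budget free for the rest of the voice pipeline.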

Voice AI systems such as phone agents, call centre bots, real-time transcription pipelines, and conversational assistants often fail in real-world environments because of poor audio input rather than limitations of language models. When multiple speakers are present, traditional noise suppression systems either let unwanted voices through or degrade the clarity of the primary speaker, leading to unreliable outputs. This multi-speaker problem is one of the primary reasons voice AI fails in production.

Hush addresses this challenge by isolating the primary speaker from live audio streams in real time while suppressing competing voices, background noise, whistles, hum, hiss, and other disturbances. Built on the DeepFilterNet3 architecture and enhanced with an Auxiliary Separation Head, the model was trained with competing human voices present in 60% of its dataset, at signal-to-interference ratios of 12–24 dB.
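For readers unfamiliar with signal-to-interference ratio (SIR), the sketch below shows one common way to mix a competing voice under a primary speaker at a chosen SIR. It is a generic illustration, not weya AI's training pipeline; the function names are hypothetical and the signals are plain lists of samples.

```python
import math


def mean_power(samples):
    """Average power (mean of squared samples) of a signal."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_sir(primary, interferer, sir_db):
    """Scale `interferer` so that primary power / interferer power equals sir_db, then mix.

    Returns (mixture, scaled_interferer). Signals are truncated to the shorter length.
    """
    n = min(len(primary), len(interferer))
    primary, interferer = primary[:n], interferer[:n]
    p_pow, i_pow = mean_power(primary), mean_power(interferer)
    if p_pow == 0 or i_pow == 0:
        raise ValueError("both signals must be non-silent")
    # Solve 10 * log10(p_pow / (gain**2 * i_pow)) == sir_db for gain.
    gain = math.sqrt(p_pow / (i_pow * 10 ** (sir_db / 10)))
    scaled = [gain * s for s in interferer]
    mixture = [a + b for a, b in zip(primary, scaled)]
    return mixture, scaled


def measure_sir_db(primary, interferer):
    """SIR in dB between a primary signal and an interfering signal."""
    return 10 * math.log10(mean_power(primary) / mean_power(interferer))


if __name__ == "__main__":
    rate = 16000  # illustrative sample rate
    voice = [math.sin(2 * math.pi * 220 * t / rate) for t in range(rate)]
    other = [0.5 * math.sin(2 * math.pi * 330 * t / rate) for t in range(rate)]
    mixture, scaled = mix_at_sir(voice, other, sir_db=18.0)
    print("measured SIR (dB):", round(measure_sir_db(voice, scaled), 3))
```

At 12–24 dB the competing voice carries roughly 1/16 to 1/250 of the primary speaker's power: clearly quieter than the main talker, but still loud enough to confuse a downstream transcription or dialogue system.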

Commenting on the development, Mr. Atul Singh, CTO, weya AI, said, “Hush solves one of the most overlooked failure points in production voice AI. We built this because we kept seeing high-quality language models fail in the field, not because of the model, but because of the audio it was receiving. This is the first of several models we are developing internally, all oriented toward a single vision: giving enterprises (banks, financial institutions, and regulated industries) the ability to deploy world-class AI entirely on-premises, with full control over their data and infrastructure.”

Designed for seamless integration, Hush runs entirely on CPU and can be deployed across Linux, macOS (Apple Silicon), and Windows using prebuilt ONNX binaries, eliminating the need for heavy production dependencies. The model weights and full source code are available on Hugging Face and GitHub under the Apache 2.0 licence.
