You have an audio file with background noise, echo, or sibilance, and you want to clean it up before uploading it to a podcast, sending it to a client, or transcribing it. This guide shows how to do that for free with FFmpeg, when a professional service is worth it, and the common mistakes that ruin an otherwise decent cleanup.
Identify the problem first
Not all "dirty audio" is the same. Identifying the problem type tells you which technique to use:
- Constant background noise (AC, street, computer): spectral denoise.
- Room echo: dereverberation (more complex, requires ML).
- P/B popping: pop killer + EQ low-cut.
- Sibilance (excessive Sssss): de-esser in 4-9 kHz band.
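- MP3 artifacts: decode to WAV before processing so you don't add another lossy generation, then denoise. Artifacts already baked into the file can be masked, not undone.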
- Uneven volume: dynamic normalization.
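As a rough illustration of the list above, the sketch below maps a problem label to a starting FFmpeg filter string. The function name `suggest_filter`, the labels, and the `highpass=f=80` value are assumptions made for this example; the other filter strings are the ones used in the recipes of this guide.

```shell
# Hypothetical helper: print a starting filter string for a given problem label.
# Labels and values are illustrative starting points, not FFmpeg defaults.
suggest_filter() {
    case "$1" in
        noise)     echo "afftdn=nf=-25" ;;                  # constant background noise
        plosives)  echo "highpass=f=80" ;;                  # low-cut for P/B pops (assumed cutoff)
        sibilance) echo "equalizer=f=6500:t=q:w=2:g=-6" ;;  # cut inside the 4-9 kHz band
        volume)    echo "dynaudnorm=p=0.71" ;;              # uneven volume
        reverb)    echo "room echo needs ML dereverberation" >&2; return 2 ;;
        *)         echo "unknown problem: $1" >&2; return 1 ;;
    esac
}

# Usage (requires ffmpeg on PATH):
#   ffmpeg -i dirty.mp3 -af "$(suggest_filter noise)" clean.mp3
```

Room echo deliberately returns an error here, since none of the recipes in this guide handle it.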
Free cleanup with FFmpeg (3 recipes)
1. Spectral denoise (background noise)
The afftdn filter performs FFT-based denoising. It works well for constant noise:
ffmpeg -i dirty.mp3 -af "afftdn=nf=-25" clean.mp3
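The nf parameter sets the noise floor in dB (FFmpeg accepts -80 to -20; the default is -50). Useful values here run from -30 (gentler) to -20 (more aggressive): the higher the noise floor, the more of the signal is treated as noise.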
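Because the right noise floor depends on the recording, one option is to render a short excerpt at several `nf` values and compare them by ear. The sketch below only prints the commands. The function name, the 30-second offset, the 10-second length, and the `preview_nf*.wav` file names are assumptions for illustration.

```shell
# Print one ffmpeg command per noise-floor value for a 10-second excerpt.
# Nothing is executed here; pipe the output to sh to actually render the previews.
nf_sweep_commands() {
    input="$1"
    for nf in -20 -25 -30; do
        # -ss 30 -t 10 placed before -i: seek to 0:30 and read 10 seconds of input.
        printf 'ffmpeg -ss 30 -t 10 -i "%s" -af "afftdn=nf=%s" preview_nf%s.wav\n' \
            "$input" "$nf" "$nf"
    done
}

# Usage (requires ffmpeg on PATH):
#   nf_sweep_commands dirty.mp3 | sh
```

Writing the previews as WAV avoids adding a second lossy encode while you are still comparing settings.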
2. De-essing (sibilance)
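To reduce excessive "S" sounds, cut a narrow band around 6.5 kHz. This is a static EQ cut rather than a dynamic de-esser, so it also dulls that band when no sibilance is present; keep the cut modest: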
ffmpeg -i input.mp3 -af "equalizer=f=6500:t=q:w=2:g=-6" output.mp3
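The parameters in that command can be read as: `f` is the center frequency in Hz, `t=q` means the width `w` is given as a Q factor, and `g` is the gain in dB. A small sketch, assuming a hypothetical helper named `deess_eq`, builds the same filter string while refusing center frequencies outside the 4-9 kHz sibilance band mentioned earlier:

```shell
# Hypothetical helper: build a static de-essing EQ cut.
# Defaults mirror the command above; arguments must be whole numbers.
deess_eq() {
    freq="${1:-6500}"   # center frequency in Hz
    gain="${2:--6}"     # gain in dB (negative means cut)
    # Reject centers outside the 4-9 kHz band used for sibilance in this guide.
    if [ "$freq" -lt 4000 ] || [ "$freq" -gt 9000 ]; then
        echo "frequency $freq Hz is outside 4000-9000 Hz" >&2
        return 1
    fi
    printf 'equalizer=f=%s:t=q:w=2:g=%s\n' "$freq" "$gain"
}

# Usage (requires ffmpeg on PATH):
#   ffmpeg -i input.mp3 -af "$(deess_eq 7000 -4)" output.mp3
```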
3. Full pipeline (denoise + normalize + de-ess)
ffmpeg -i dirty.mp3 -af "afftdn=nf=-25,dynaudnorm=p=0.71,equalizer=f=6500:t=q:w=2:g=-4" clean_pro.mp3
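FFmpeg applies the filters in a chain from left to right, separated by commas, so the command above denoises first, then normalizes, then de-esses. If you want to switch stages on and off while experimenting, a small helper can assemble the chain. This is a sketch; `join_filters` is a hypothetical name, not part of FFmpeg.

```shell
# Hypothetical helper: join filter stages with commas, skipping empty ones.
join_filters() {
    chain=""
    for stage in "$@"; do
        [ -n "$stage" ] || continue          # an empty argument disables that stage
        chain="${chain:+$chain,}$stage"      # add a comma only between stages
    done
    printf '%s\n' "$chain"
}

# Usage (requires ffmpeg on PATH):
#   ffmpeg -i dirty.mp3 \
#     -af "$(join_filters "afftdn=nf=-25" "dynaudnorm=p=0.71" "equalizer=f=6500:t=q:w=2:g=-4")" \
#     clean_pro.mp3
```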
When to use a professional service
FFmpeg covers 70% of cases. For the rest, professional services are worth it:
- Strong room echo: dereverberation requires ML, and FFmpeg alone falls short.
- Multiple simultaneous noises (street + AC + dog): RNNoise/DeepFilterNet on GPU works much better.
- Long batches: with 50+ files, you save time by processing in parallel.
- Broadcast quality: for a commercial podcast, a production house, or a client, that last 30% of detail matters.
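If you would rather handle a long batch locally, parallel processing can be approximated with `find` and `xargs`. The sketch below is one way to do it under stated assumptions: the script name `clean_batch.sh`, the `_clean` suffix, and the choice of four parallel jobs are all illustrative, and `-print0`, `-0`, and `-P` are GNU/BSD extensions rather than strict POSIX.

```shell
# clean_batch.sh (hypothetical name): clean every file passed as an argument.

# Derive the output path, e.g. "ep1.mp3" -> "ep1_clean.mp3" (assumed naming scheme).
clean_name() {
    printf '%s_clean.%s\n' "${1%.*}" "${1##*.}"
}

# Denoise one file with the afftdn recipe from this guide.
# -nostdin stops ffmpeg from reading standard input when run from a pipeline.
clean_one() {
    ffmpeg -nostdin -loglevel error -i "$1" -af "afftdn=nf=-25" "$(clean_name "$1")"
}

# Process each argument; with no arguments the loop does nothing.
for f in "$@"; do
    clean_one "$f"
done
```

A possible invocation that runs four jobs at a time and skips files that were already cleaned: `find . -maxdepth 1 -name '*.mp3' ! -name '*_clean.mp3' -print0 | xargs -0 -n 1 -P 4 sh clean_batch.sh`.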
FFmpeg vs Apps vs TranscribeNode
| Option | Cost | Processing time (per hour of audio) | Quality (approx.) | Best for |
|---|---|---|---|---|
| FFmpeg (local) | Free | 5 min | 70% | Technical users, developers, simple audio |
| Audacity + plugins | Free | 15 min | 75% | Editors with time |
| iZotope RX 11 | USD 399 | 20 min | 95% | Broadcast pros |
| Adobe Enhance Speech | Adobe subscription | 1 min | 85% | Adobe subscribers |
| TranscribeNode | Included with transcription | 30 s | 85% | Pre-processing before transcription |
Common mistakes when cleaning audio
- Running denoise twice: makes voice sound "robotic". One well-configured pass beats two badly configured ones.
- Aggressive low-cut: a high-pass at 200 Hz makes male voices lose body.
- Peak normalization instead of RMS/LUFS: peak doesn't guarantee consistent loudness. Use dynaudnorm or LUFS normalization.
- Cleaning before cutting: if you'll edit parts of the audio, clean AFTER cutting. It saves processing time.
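For the LUFS option in the list above, FFmpeg ships a `loudnorm` filter that implements EBU R128 loudness normalization. The sketch below builds a single-pass filter string. The helper name `loudnorm_filter` and the -16 LUFS default are assumptions (a commonly used podcast target, not a value from this guide); `TP` is the true-peak ceiling in dBTP and `LRA` the loudness range target.

```shell
# Hypothetical helper: build a single-pass loudnorm filter string.
loudnorm_filter() {
    target="${1:--16}"   # integrated loudness target in LUFS (assumed default)
    printf 'loudnorm=I=%s:TP=-1.5:LRA=11\n' "$target"
}

# Usage (requires ffmpeg on PATH); -ar pins the output sample rate explicitly:
#   ffmpeg -i dirty.mp3 -af "$(loudnorm_filter)" -ar 48000 clean.mp3
```

Unlike peak normalization, this targets perceived loudness, so episodes recorded at different levels should end up sounding similarly loud.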
Want to try clean + transcribed in one step? Create a free account with 50 minutes included.