Abstract
Automated detection of speech and of pauses or silences is an important task in clinical and pathological settings, where it supports diagnosis and informs treatment planning. This study evaluates three methods for automatic silence detection in clinical speech analysis: (1) a traditional energy-based method using zero-crossing detection, (2) a pretrained neural network model for voice activity detection (Silero-VAD), and (3) NVIDIA's speaker diarization and transcription tool. All three methods detected pauses and silences effectively, with comparable error rates, though Silero-VAD showed the highest precision and overall performance. Key metrics included a Dice coefficient of 0.917, an onset error of 500 ms, and an endpoint error of 370 ms. The results also highlight the importance of audio preprocessing.
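
The metrics named in the abstract (Dice coefficient, onset error, endpoint error) can be illustrated with a minimal sketch. This is not the study's evaluation code: the function names, the 10 ms frame grid, and the rule of pairing reference and detected intervals in order are all assumptions made for illustration.

```python
import numpy as np


def intervals_to_mask(intervals, duration_s, frame_s=0.01):
    """Rasterise (start, end) silence intervals, in seconds, onto a boolean frame grid.

    The 10 ms frame size is an illustrative assumption.
    """
    n_frames = int(round(duration_s / frame_s))
    mask = np.zeros(n_frames, dtype=bool)
    for start, end in intervals:
        lo = int(round(start / frame_s))
        hi = int(round(end / frame_s))
        mask[lo:hi] = True
    return mask


def dice_coefficient(ref_mask, hyp_mask):
    """Dice = 2|A and B| / (|A| + |B|); returns 1.0 when both masks are empty."""
    total = int(ref_mask.sum()) + int(hyp_mask.sum())
    if total == 0:
        return 1.0
    overlap = int(np.logical_and(ref_mask, hyp_mask).sum())
    return 2.0 * overlap / total


def boundary_errors_ms(ref_intervals, hyp_intervals):
    """Mean absolute onset and endpoint error in milliseconds.

    Assumes both lists hold the same number of intervals and that the
    i-th detected interval corresponds to the i-th reference interval.
    """
    if len(ref_intervals) != len(hyp_intervals):
        raise ValueError("interval lists must have the same length")
    ref = np.asarray(ref_intervals, dtype=float)
    hyp = np.asarray(hyp_intervals, dtype=float)
    onset_ms = float(np.mean(np.abs(hyp[:, 0] - ref[:, 0])) * 1000.0)
    endpoint_ms = float(np.mean(np.abs(hyp[:, 1] - ref[:, 1])) * 1000.0)
    return onset_ms, endpoint_ms


# Hypothetical annotations: one reference silence and one detected silence.
reference = [(1.0, 2.0)]
detected = [(1.5, 2.5)]

dice = dice_coefficient(
    intervals_to_mask(reference, duration_s=3.0),
    intervals_to_mask(detected, duration_s=3.0),
)
onset_err, endpoint_err = boundary_errors_ms(reference, detected)
print(f"Dice={dice:.3f}, onset error={onset_err:.0f} ms, endpoint error={endpoint_err:.0f} ms")
```

In this toy case the detected silence is shifted by half a second against a one-second reference, so half of each interval overlaps. A real evaluation would also need a rule for matching detected pauses to reference pauses when their counts differ.
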
| Translated title of the contribution | Detección automática de silencios mediante inteligencia artificial para análisis de contextos clínicos |
|---|---|
| Original language | English (US) |
| State | Published - Dec 2024 |
| Event | 3rd International Congress of Biomedical Engineering and Bioengineering, CIIBBI 2024 - Cali, Colombia. Duration: Nov 6 2024 → Nov 8 2024 |
Conference
| Conference | 3rd International Congress of Biomedical Engineering and Bioengineering, CIIBBI 2024 |
|---|---|
| Country/Territory | Colombia |
| City | Cali |
| Period | 11/6/24 → 11/8/24 |
All Science Journal Classification (ASJC) codes
- Orthopedics and Sports Medicine
- Bioengineering
- Biomedical Engineering
- Media Technology