SARS-CoV-2 Detection from Voice

Abstract
Automated voice-based detection of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) could facilitate screening for COVID-19. A dataset of cellular phone recordings from 88 subjects was recently collected. The dataset included vocal utterances, speech and coughs that were self-recorded by the subjects in either hospitals or isolation sites. All subjects underwent nasopharyngeal swabbing at the time of recording and were labelled as SARS-CoV-2-positive or as negative controls. The present study harnessed deep machine learning and speech processing to detect the SARS-CoV-2-positive subjects. A three-stage architecture was implemented. A self-supervised attention-based transformer generated embeddings from the audio inputs. Recurrent neural networks were used to produce specialized sub-models for the SARS-CoV-2 classification. Ensemble stacking fused the predictions of the sub-models. Pre-training, bootstrapping and regularization techniques were used to prevent overfitting. A recall of 78% and a probability of false alarm (PFA) of 41% were measured on a test set of 57 recording sessions. A leave-one-speaker-out cross-validation on 292 recording sessions yielded a recall of 78% and a PFA of 30%. These preliminary results suggest that COVID-19 screening using voice is feasible.
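The three-stage architecture described above (transformer embeddings, recurrent sub-models, ensemble stacking) can be sketched as follows. This is a minimal illustrative stand-in, not the authors' implementation: the random projection stands in for a pre-trained self-supervised transformer, the Elman-style recurrence for the trained RNN sub-models, and the equal-weight logistic fusion for the fitted stacking meta-model; all function names and parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def transformer_embed(audio, dim=16):
    """Stage 1 stand-in: map a raw audio vector to a sequence of
    fixed-size frame embeddings (a real system would use a pre-trained
    self-supervised transformer here)."""
    frames = audio.reshape(-1, 8)               # frame the signal
    proj = rng.standard_normal((8, dim)) * 0.1  # hypothetical projection
    return np.tanh(frames @ proj)               # shape: (n_frames, dim)

def rnn_score(embeddings, w_h, w_x, w_out):
    """Stage 2 stand-in: a minimal recurrent pass that reduces the
    embedding sequence to a single sigmoid score."""
    h = np.zeros(w_h.shape[0])
    for x in embeddings:
        h = np.tanh(w_h @ h + w_x @ x)
    return 1.0 / (1.0 + np.exp(-(w_out @ h)))

def ensemble_stack(scores, weights, bias):
    """Stage 3 stand-in: fuse sub-model scores with a logistic meta-model
    (here with placeholder equal weights, not fitted ones)."""
    z = np.dot(weights, scores) + bias
    return 1.0 / (1.0 + np.exp(-z))

# Toy "recording session": 64 audio samples.
audio = rng.standard_normal(64)
emb = transformer_embed(audio)

# Three specialized sub-models (e.g. for coughs, speech, vocal utterances).
scores = []
for _ in range(3):
    w_h = rng.standard_normal((4, 4)) * 0.1
    w_x = rng.standard_normal((4, emb.shape[1])) * 0.1
    w_out = rng.standard_normal(4)
    scores.append(rnn_score(emb, w_h, w_x, w_out))

# Fused probability that the session is SARS-CoV-2-positive.
p = ensemble_stack(np.array(scores), weights=np.ones(3) / 3, bias=0.0)
print(round(float(p), 3))
```

In the paper's setting each sub-model would be trained on one input type (cough, speech, utterance) and the stacking weights fitted on held-out data; the sketch only shows how the three stages compose.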
| Original language | English |
|---|---|
| Article number | 9205643 |
| Pages (from-to) | 268-274 |
| Number of pages | 7 |
| Journal | IEEE Open Journal of Engineering in Medicine and Biology |
| Volume | 1 |
| DOIs | |
| Publication status | Published - 2020 |
| Externally published | Yes |
Keywords
- audio embeddings
- COVID-19
- ensemble stacking
- recurrent neural network
- semi-supervised learning
- transformer