Harnessing music to enhance speech recognition

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

The performance of automatic speech recognition depends strongly on the speaker’s intelligibility and is affected by speech intensity and rate. The Lombard reflex is an auditory feedback mechanism in which speakers spontaneously raise their voice in a noisy environment. We studied the feasibility of employing the Lombard reflex to improve speech recognition without the speaker’s conscious awareness of the process. Whereas previous studies employed noise to produce this reflex, which may be unpleasant to the speakers, we studied the effects of a music-induced Lombard reflex. Twenty speakers were recorded while listening to two music types, rhythmic dance music and calm yoga music, as well as to white noise, a metronome sound, and silence, and the speakers’ speech rate and intensity were compared across these conditions. Several cohort trends were observed. Speech intensity was notably higher in the rhythmic dance music condition for most subjects. This change was not observed for the metronome sound, which had a similar rhythm. Speech rate decreased in the yoga music condition for female speakers only. An examination of the changes in these prosodic variables for individual speakers showed that most of them exhibited an increase in speech power and/or a decrease in speaking rate for at least one of the music types. This effect, when further explored, may be implemented in a personalized speech recognition engine to enhance the usability of voice commands, dictation, and other speech-based applications.

Original language: English
Title of host publication: Advances in Usability, User Experience and Assistive Technology - Proceedings of the AHFE 2018 International Conferences on Usability and User Experience and Human Factors and Assistive Technology, 2018
Editors: Christianne Falcao, Tareq Z. Ahram
Publisher: Springer Verlag
Pages: 390-396
Number of pages: 7
ISBN (Print): 9783319949468
Publication status: Published - 2019
Externally published: Yes
Event: AHFE International Conferences on Usability and User Experience and Human Factors and Assistive Technology, 2018 - Orlando, United States
Duration: 21 Jul 2018 to 25 Jul 2018

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 794
ISSN (Print): 2194-5357

Conference

Conference: AHFE International Conferences on Usability and User Experience and Human Factors and Assistive Technology, 2018
Country/Territory: United States
City: Orlando
Period: 21/07/18 to 25/07/18

Keywords

  • Automatic speech recognition (ASR)
  • Lombard reflex
  • Music effect on speech
