TY - GEN
T1 - Harnessing music to enhance speech recognition
AU - Aharonson, Vered
AU - Mualem, Shany
AU - Aharonson, Eran
N1 - Publisher Copyright:
© 2019, Springer International Publishing AG, part of Springer Nature.
PY - 2019
Y1 - 2019
N2 - The performance of automatic speech recognition depends strongly on the speaker’s intelligibility and is affected by speech intensity and rate. The Lombard reflex is an auditory feedback mechanism in which speakers spontaneously raise their voices in a noisy environment. We studied the feasibility of employing the Lombard reflex to improve speech recognition without the speaker’s conscious awareness of the process. Whereas previous studies employed noise to produce this reflex, which may be unpleasant to the speakers, we studied the effects of a music-induced Lombard reflex. Twenty speakers were recorded while listening to two music types, rhythmic dance music and calm yoga music, as well as to white noise, a metronome sound, and silence, and the differences in the speakers’ speech rate and intensity across the different sounds were compared. Several cohort trends were observed: speech intensity was notably higher in the rhythmic dance music condition for most subjects. This change was not observed for the metronome sound, which had a similar rhythm. Speech rate decreased in the yoga music condition for female speakers only. An examination of the changes in these prosodic variables for individual speakers showed that most of them exhibited an increase in speech power and/or a decrease in speaking rate for at least one of the music types. This effect, when further explored, may be implemented in a personalized speech recognition engine to enhance the usability of voice commands, dictation, and other speech-based applications.
AB - The performance of automatic speech recognition depends strongly on the speaker’s intelligibility and is affected by speech intensity and rate. The Lombard reflex is an auditory feedback mechanism in which speakers spontaneously raise their voices in a noisy environment. We studied the feasibility of employing the Lombard reflex to improve speech recognition without the speaker’s conscious awareness of the process. Whereas previous studies employed noise to produce this reflex, which may be unpleasant to the speakers, we studied the effects of a music-induced Lombard reflex. Twenty speakers were recorded while listening to two music types, rhythmic dance music and calm yoga music, as well as to white noise, a metronome sound, and silence, and the differences in the speakers’ speech rate and intensity across the different sounds were compared. Several cohort trends were observed: speech intensity was notably higher in the rhythmic dance music condition for most subjects. This change was not observed for the metronome sound, which had a similar rhythm. Speech rate decreased in the yoga music condition for female speakers only. An examination of the changes in these prosodic variables for individual speakers showed that most of them exhibited an increase in speech power and/or a decrease in speaking rate for at least one of the music types. This effect, when further explored, may be implemented in a personalized speech recognition engine to enhance the usability of voice commands, dictation, and other speech-based applications.
KW - Automatic speech recognition (ASR)
KW - Lombard reflex
KW - Music effect on speech
UR - https://www.scopus.com/pages/publications/85049535386
U2 - 10.1007/978-3-319-94947-5_39
DO - 10.1007/978-3-319-94947-5_39
M3 - Conference contribution
AN - SCOPUS:85049535386
SN - 9783319949468
T3 - Advances in Intelligent Systems and Computing
SP - 390
EP - 396
BT - Advances in Usability, User Experience and Assistive Technology - Proceedings of the AHFE 2018 International Conferences on Usability and User Experience and Human Factors and Assistive Technology, 2018
A2 - Falcao, Christianne
A2 - Ahram, Tareq Z.
PB - Springer Verlag
T2 - AHFE International Conferences on Usability and User Experience and Human Factors and Assistive Technology, 2018
Y2 - 21 July 2018 through 25 July 2018
ER -