TY - GEN
T1 - Voice quality enhancement for vocal tract rehabilitation
AU - Sutcliffe, Bianca
AU - Wiggins, Lindzi
AU - Rubin, David
AU - Aharonson, Vered
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/5/23
Y1 - 2018/5/23
AB - Vocal rehabilitation devices used by patients after laryngectomy produce unnatural-sounding speech. Our study aims to improve the quality of these synthetically generated voices by giving them human-like characteristics. A simplified source-filter model, linear predictive coding coefficients, and line spectral frequencies were used to model the vocal tract and manipulate the acoustic features of the resulting speech. Two mapping functions were employed to convert the features of a synthetically generated voice into those of a human voice: a Gaussian mixture model and a linear regression model. The models were trained on a set of 50 human and 50 synthetic voice utterances. Both mapping functions yielded significant changes in the transformed synthetic voices, whose spectra were similar to those of the human voices. The linear regression mapping produced slightly better results than the Gaussian mixture model mapping. Listening tests confirmed this result but indicated that voices re-synthesized from the transformed model coefficients improved on the synthetic voice yet still sounded unnatural. This may imply that the vocal tract model lacks information that produces the subjective perception of 'artificial speech'. Future work will investigate a more elaborate model that includes the excitation and radiation signals of speech production and the transformation of their features. Such models have the potential to improve the conversion of a synthetically generated electrolarynx voice into a human-sounding one.
KW - Gaussian mixture model
KW - line spectral frequencies
KW - linear predictive coding coefficients
KW - linear regression
KW - source-filter model
KW - voice conversion
UR - https://www.scopus.com/pages/publications/85048452301
U2 - 10.1109/SAIBMEC.2018.8363197
DO - 10.1109/SAIBMEC.2018.8363197
M3 - Conference contribution
AN - SCOPUS:85048452301
T3 - 2018 3rd Biennial South African Biomedical Engineering Conference, SAIBMEC 2018
SP - 1
EP - 4
BT - 2018 3rd Biennial South African Biomedical Engineering Conference, SAIBMEC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 3rd Biennial South African Biomedical Engineering Conference, SAIBMEC 2018
Y2 - 4 April 2018 through 6 April 2018
ER -