TY - GEN
T1 - Audio salient event detection and summarization using audio and text modalities
AU - Zlatintsi, Athanasia
AU - Iosif, Elias
AU - Maragos, Petros
AU - Potamianos, Alexandros
N1 - Publisher Copyright: © 2015 EURASIP.
PY - 2015/12/22
Y1 - 2015/12/22
AB - This paper investigates the problem of audio event detection and summarization, building on previous work [1,2] on the detection of perceptually important audio events based on saliency models. We take a synergistic approach to audio summarization in which the saliency computation of audio streams is also assisted by the text modality. Auditory saliency is assessed by auditory and perceptual cues such as Teager energy, loudness and roughness, all known to correlate with attention and human hearing. Text analysis incorporates part-of-speech tagging and affective modeling. A computational method for the automatic correction of the boundaries of the selected audio events is applied, creating summaries that consist of events that are not only salient but also meaningful and semantically coherent. A non-parametric classification technique is employed, and results are reported on the MovSum movie database using objective evaluations against ground truth designating the auditory and semantically salient events.
KW - affective text analysis
KW - audio summarization
KW - audio-text salient events
KW - monomodal auditory saliency
UR - http://www.scopus.com/inward/record.url?scp=84963985212&partnerID=8YFLogxK
U2 - 10.1109/EUSIPCO.2015.7362797
DO - 10.1109/EUSIPCO.2015.7362797
M3 - Conference contribution
AN - SCOPUS:84963985212
T3 - 2015 23rd European Signal Processing Conference, EUSIPCO 2015
SP - 2311
EP - 2315
BT - 2015 23rd European Signal Processing Conference, EUSIPCO 2015
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd European Signal Processing Conference, EUSIPCO 2015
Y2 - 31 August 2015 through 4 September 2015
ER -