Audio salient event detection and summarization using audio and text modalities

Athanasia Zlatintsi, Elias Iosif, Petros Maragos, Alexandros Potamianos

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

This paper investigates audio event detection and summarization, building on previous work [1,2] on detecting perceptually important audio events with saliency models. We take a synergistic approach to audio summarization in which the saliency computation for audio streams is assisted by the text modality. Auditory saliency is assessed using auditory and perceptual cues such as Teager energy, loudness, and roughness, all of which are known to correlate with attention and human hearing. Text analysis incorporates part-of-speech tagging and affective modeling. A computational method for automatically correcting the boundaries of the selected audio events is applied, creating summaries whose events are not only salient but also meaningful and semantically coherent. A non-parametric classification technique is employed, and results are reported on the MovSum movie database using objective evaluations against ground truth that designates the auditory and semantically salient events.
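As a minimal illustration of one cue named in the abstract, the sketch below computes the standard discrete Teager-Kaiser energy operator on a sampled signal. It is not the authors' implementation: the abstract does not describe how Teager energy is extracted or combined with the other cues, and the function name `teager_energy` and the test tone are assumptions made for this example only.

```python
import math


def teager_energy(x):
    """Discrete Teager-Kaiser energy operator (hypothetical helper).

    psi[n] = x[n]^2 - x[n-1] * x[n+1], defined for interior samples only,
    so the result has len(x) - 2 values.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


# Illustrative input (an assumption, not data from the paper): a pure tone.
# For x[n] = A * cos(omega * n) the operator returns the constant
# A^2 * sin(omega)^2, so it tracks both amplitude and frequency.
amplitude, omega = 2.0, 0.3
tone = [amplitude * math.cos(omega * n) for n in range(50)]
energy = teager_energy(tone)
expected = amplitude ** 2 * math.sin(omega) ** 2
```

Because the operator needs one neighbour on each side, the output is two samples shorter than the input; how a real system pads the edges or smooths the result is a design choice not covered here.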

Original language: English
Title of host publication: 2015 23rd European Signal Processing Conference, EUSIPCO 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2311-2315
Number of pages: 5
ISBN (Electronic): 9780992862633
Publication status: Published - 22 Dec 2015
Externally published: Yes
Event: 23rd European Signal Processing Conference, EUSIPCO 2015 - Nice, France
Duration: 31 Aug 2015 - 4 Sept 2015

Publication series

Name: 2015 23rd European Signal Processing Conference, EUSIPCO 2015

Conference

Conference: 23rd European Signal Processing Conference, EUSIPCO 2015
Country/Territory: France
City: Nice
Period: 31/08/15 - 4/09/15

Keywords

  • affective text analysis
  • audio summarization
  • audio-text salient events
  • monomodal auditory saliency
