Analysis and Understanding of Multimodal Communication Scenes

Prof. Hervé Bourlard
Director, Idiap Research Institute
Swiss Federal Institute of Technology at Lausanne (EPFL)

Research interest in recognizing and understanding interactions between people in settings such as meetings, lectures, seminars, and teleconferences has grown in recent years. Modeling and interpreting human-human communication scenes is a challenging scientific endeavor that requires advances in a broad range of areas, including signal processing, speech recognition, multimodal scene analysis, discourse analysis, and multimodal retrieval. The analysis and interpretation of multiparty meetings is of scientific interest because it provides a circumscribed arena for investigating communication scenes and underpins a number of potentially significant applications. In this talk, we give an overview of some of the relevant projects, in which we have developed the following: (1) an infrastructure for recording meetings using multiple microphones and cameras; (2) a 100-hour, manually annotated meeting corpus; (3) a number of techniques for indexing and summarizing meeting videos using automatic speech recognition and computer vision; and (4) an extensible framework for browsing and searching meeting videos.