The AMI Meeting Corpus is a multi-modal data set consisting of 100 hours of meeting recordings

AMIDA Corpus Overview

last modified 2008-11-20 18:49

A brief introduction to the AMIDA Meeting Corpus.

Welcome to the AMIDA Meeting Corpus: a multi-modal data set comprising 10 hours of recorded, transcribed, and annotated meeting data, with a further 10 hours of signal-only data. The meeting data has a similar character to the scenario data in the AMI Meeting Corpus, but the AMIDA corpus contains meetings with a remote participant.

AMIDA meetings have four participants, each assigned a role in the design of a new remote control. This scenario differs from the AMI corpus in that participants are asked to take over and finish the design project after another team has carried out the first two meetings. The participants can make use of all the material from those two previous meetings through a meeting browser as they prepare for and participate in meetings of their own. There are three four-person meetings, of which two have a remote participant, plus one two-person meeting in which the participants are remote from each other. Communication for remote meetings is via video-conferencing.

The meetings in the corpus have been recorded using a range of signals that are synchronized to a common timeline. These include close-talking and far-field microphones, individual and room-view video cameras, and output from a slide projector.

As well as the signals, the data set includes a manually produced orthographic transcription of the language used during the meetings. This transcription is aligned at the word level with the common timeline and is present for all meetings. The first data release also comprises a limited set of annotations that have passed the standard consortium six-month embargo. A second release will be made in January 2009, including all the annotations made to that point. Annotations present for at least some of the meetings include named entities, dialogue acts, and topic segmentation, with further annotations to come, including addressing, subjectivity, and head and hand gestures.
