The AMI Meeting Corpus is a multi-modal data set consisting of 100 hours of meeting recordings

Meeting Room

last modified 2008-11-20 19:29

Description of the meeting room for the AMIDA Meeting Corpus

Data for the AMIDA Meeting Corpus were collected in the instrumented meeting room constructed at the University of Edinburgh (U.K.).

Edinburgh Room
Audio setup
The room contains 24 microphones from which 24 mono audio channels are recorded directly to hard disk. Sixteen Sennheiser MK2E-P-C miniature omni-directional electret microphones are arranged in two circular arrays of eight, each with a 10 cm radius. The arrays are placed in the center of the meeting room table, one between the participants and one at the end of the table closest to the presentation screen and whiteboard. The MK2E-P-C was chosen for its 20 Hz-20 kHz linear frequency response and its ability to draw phantom power directly from the microphone pre-amplifier. Eight Sennheiser EW300 Series radio microphones are used for recording the four meeting participants. Each person wears an ME 3-N close-talking headset condenser mic and an MKE 2-EW omni-directional lapel mic, the wireless equivalent of those used in the microphone arrays. Using a radio-based system allows participants to move freely around the room without diminishing the quality of the audio recordings.
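
As a rough illustration of the array geometry, the sketch below computes capsule positions for one eight-element array with a 10 cm radius. It assumes the eight capsules are equally spaced around the circle, which the description above does not state; the function name `circular_array_positions`, the origin at the array center, and the starting angle are all illustrative choices rather than documented properties of the room.

```python
import math

def circular_array_positions(n_mics=8, radius_m=0.10, start_angle_rad=0.0):
    """Return (x, y) positions in metres for mics on a circle centred at the origin.

    Equal angular spacing is an assumption, not something the room
    description specifies.
    """
    step = 2.0 * math.pi / n_mics
    return [
        (radius_m * math.cos(start_angle_rad + k * step),
         radius_m * math.sin(start_angle_rad + k * step))
        for k in range(n_mics)
    ]

positions = circular_array_positions()

# Under the equal-spacing assumption, adjacent capsules are separated by
# the chord length 2 * r * sin(pi / n).
adjacent_spacing_m = math.dist(positions[0], positions[1])
```

Under these assumptions, adjacent capsules sit a little under 8 cm apart, which is the kind of figure needed when setting up delay-and-sum or other array-processing experiments on the corpus.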

Three Focusrite Octopre eight-channel microphone pre-amplifiers, whose analogue-to-digital converters support up to 24-bit, 96 kHz operation, are used to amplify and digitize the microphone outputs. Each channel has a separate class A amplifier with independent gain control. Each Octopre's digitized output is carried on a single ADAT Lightpipe fiber-optic cable that holds all eight of its channels. The A-to-D converters can sample at a variety of rates, clocked either by the Octopre's internal clock or by an external source via a word-clock input. Here the data are captured at 48 kHz with 16-bit resolution. The Octopres also provide phantom power for the MK2E-P-C microphones.
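
To give a feel for the volume of data this setup produces, the following sketch estimates the raw audio data rate for 24 channels captured at 48 kHz and 16 bits. It assumes uncompressed PCM and ignores any file-format or filesystem overhead; the helper name `pcm_bytes_per_second` is hypothetical and is not part of any recording software mentioned here.

```python
CHANNELS = 24            # 24 mono channels (three eight-channel Octopres)
SAMPLE_RATE_HZ = 48_000  # capture rate given in the text
BITS_PER_SAMPLE = 16     # capture resolution given in the text

def pcm_bytes_per_second(channels, sample_rate_hz, bits_per_sample):
    """Raw PCM payload rate in bytes/s, ignoring container overhead (assumption)."""
    return channels * sample_rate_hz * bits_per_sample // 8

rate = pcm_bytes_per_second(CHANNELS, SAMPLE_RATE_HZ, BITS_PER_SAMPLE)
per_hour = rate * 3600

print(f"{rate / 1e6:.3f} MB/s, {per_hour / 1e9:.2f} GB per hour")
```

With these figures the script prints `2.304 MB/s, 8.29 GB per hour`, so an hour-long meeting amounts to roughly 8.3 GB of raw multichannel audio before any export or compression.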

The Mark of the Unicorn (MOTU) 2408 MKIII is an audio interface for PC-based hard disk audio recording. It consists of a 19-inch rack-mounted I/O unit connected via a Firewire-like interface to a PCI card. The I/O unit supports 24 input/output channels in three banks of eight, with all 24 channels capable of operating simultaneously. Software installed on the PC allows configuration and acquisition of each of the channels via the PCI card. In the meeting room, the ADAT Lightpipe output from each Octopre A-to-D converter is connected to one bank of a single I/O unit, and the channels are subsequently acquired by the PC via the PCI card.

The audio capture computer is a 3GHz P4 with two 40MB SCSI hard drives configured as a RAID 0 array for streaming audio. The operating system used is Windows XP for compatibility with the MOTU driver software. Audio is captured and exported using Cakewalk Sonar recording software.

Video setup
Six cameras are used to record video of the proceedings. Four Sony XC555 subminiature cameras with 6 mm lenses, mounted under the central microphone array, provide close-up views of each of the meeting participants. Two Sony SSC-DC58AP CCTV cameras, each with a 3.6 mm semi-fisheye lens, provide wide-angle views of the room. One is mounted above the center of the table and gives an overhead view of the entire floor area of the room. The other is mounted in the corner of the room and provides a view of the whiteboard and presentation areas.

Six Sony GV-D1000E digital video recorders are used to record the output of the cameras directly to Mini-DV cassettes. Using Mini-DV provides reliable video capture with few errors or dropped frames. It also provides an immediate tape backup of the raw video data.

Special hardware is used to provide synchronization signals. A common timing reference allows the A-to-D converters in the Octopres to sample every channel at the same time, thereby avoiding time skew between audio channels. The cameras also acquire frames at the same time, avoiding lags between video channels.

The Horita BSG-50 PAL Blackburst Generator generates a composite video timing signal, which is used as the reference signal to which all other devices are locked. The signal is fed directly to each of the video cameras to ensure that they sample frames at exactly the same time. A further output is connected to a MOTU MIDI Timepiece AV (MTP-AV), which generates all other timing signals. The MTP-AV is capable of locking to and generating a number of different timing signals. In the meeting room, it locks to the Blackburst reference signal and generates a 48 kHz word clock for triggering the A-to-D converters in the three Octopres. This ensures that each audio channel is sampled at precisely the same instant.

The MTP-AV also creates a Longitudinal Time Code (LTC) for each video frame. The LTC is encoded as an 80-bit word (Hours:Minutes:Seconds:Frames) and output as a 2 kHz audio signal. In addition, the MTP-AV outputs a MIDI Time Code, which is the LTC in a format that can be read by MIDI devices. In the meeting room, it is read by the Sonar recording software and used to time-stamp the audio samples.
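
The relationship between these timing signals can be checked with a little arithmetic. The sketch below assumes the standard PAL frame rate of 25 frames per second (implied by the PAL blackburst generator but not stated explicitly above) and that one 80-bit LTC word is sent per video frame; the constant names are illustrative.

```python
from fractions import Fraction

FRAME_RATE_FPS = 25        # PAL frame rate (assumption: standard PAL timing)
LTC_BITS_PER_FRAME = 80    # one 80-bit LTC word per video frame
WORD_CLOCK_HZ = 48_000     # audio word clock generated by the MTP-AV

# LTC bit rate: 80 bits for each of 25 frames per second.
ltc_bit_rate = LTC_BITS_PER_FRAME * FRAME_RATE_FPS

# Number of audio samples spanned by one video frame; Fraction keeps the
# result exact and makes it obvious whether it is a whole number.
samples_per_frame = Fraction(WORD_CLOCK_HZ, FRAME_RATE_FPS)
```

Under these assumptions the LTC bit rate works out to 2000 bits per second, consistent with the 2 kHz signal described above, and each video frame spans a whole number of audio samples (1920), which is what lets a single reference keep audio and video aligned without drift.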

The Horita AVG-50 time-code inserters translate the 80-bit LTC audio signal into a 90-bit Vertical Interval Time Code. This 90-bit code is then inserted into the top two lines of each video frame as a series of black and white blocks, which may subsequently be read during video playback. Since this code corresponds directly to the MIDI Time Code used to time-stamp the audio recording, precise synchronization of the audio and video signals can be achieved.
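
One way to use the shared time code when working with the recordings is to map a video frame's time code onto an audio sample index. The sketch below is a minimal illustration, not the procedure used to produce the corpus: it assumes 25 frames per second (PAL), a 48 kHz audio stream, and that time code 00:00:00:00 coincides with audio sample 0. The function name `timecode_to_sample` is hypothetical.

```python
def timecode_to_sample(timecode, fps=25, sample_rate_hz=48_000):
    """Convert 'HH:MM:SS:FF' to the index of the first audio sample of that frame.

    Assumes time code zero lines up with sample zero (an illustrative
    assumption; real recordings may need an offset).
    """
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if not (0 <= minutes < 60 and 0 <= seconds < 60 and 0 <= frames < fps):
        raise ValueError(f"invalid timecode: {timecode!r}")
    total_frames = ((hours * 60 + minutes) * 60 + seconds) * fps + frames
    # At 48 kHz and 25 fps there are exactly 1920 samples per frame,
    # so this integer division loses nothing for the default arguments.
    return total_frames * sample_rate_hz // fps

one_second = timecode_to_sample("00:00:01:00")
one_frame = timecode_to_sample("00:00:00:01")
```

Here `one_second` is 48000 and `one_frame` is 1920. In practice, an offset between the start of the audio file and time code zero would need to be read from the recording's own time stamps rather than assumed.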

Auxiliary data
In addition to audio and video, any auxiliary data generated by the participants during a meeting are recorded.

Breakout room for remote participant
