Intermodal Timing Cues for Audio-Visual Speech Recognition

Accession number;04A0446534
Title;Intermodal Timing Cues for Audio-Visual Speech Recognition
Author; HASHIMOTO MASAHIRO (Univ. Occupational and Environmental Health, JPN) KUMASHIRO MASAHARU (Univ. Occupational and Environmental Health, Inst. Industrial Ecological Sci., JPN)
Journal Title;J UOEH Occup Environ Health
Journal Code;Z0840A
ISSN;0387-821X
VOL.26;NO.2;PAGE.215-225(2004)
Figure&Table&Reference;FIG.4, TBL.2, REF.18
Pub. Country;Japan
Language;Japanese
Abstract;This study investigated the limits of the lip-reading advantage for young Japanese adults by desynchronizing the visual and auditory information in speech. Audio-visual speech stimuli were presented under six test conditions: audio alone, and audio-visual with an audio delay of 0, 60, 120, 240, or 480 ms. The stimuli were video recordings of the face of a Japanese woman speaking long and short Japanese sentences. Intelligibility of the audio-visual stimuli was measured as a function of audio delay in sixteen untrained young subjects. Speech intelligibility under audio delays of less than 120 ms was significantly better than under the audio-alone condition. The 120 ms delay also corresponded to the mean mora duration measured for the audio stimuli. The results implied that audio delays of up to 120 ms would not disrupt the lip-reading advantage, because visual and auditory information in speech seemed to be integrated on a syllabic time scale. Potential applications of this research include noisy workplaces in which a worker must extract relevant speech from competing noise. (author abst.)