Jont B. Allen
AT&T Labs - Research, Florham Park, NJ

From Lord Rayleigh to Shannon:
How do humans decode speech?
 

In 1908 Lord Rayleigh reported on his speech perception studies using the "acousticon" (a commercial sound system produced in 1905), demonstrating that he was well aware of the importance of bandwidth and of blind testing in speech perception. It was the development of the telephone that both allowed and pushed mathematicians and physicists to develop the science of speech perception. Critical to this development was probability theory. One of their main tools was the confusion matrix, whose entries estimate the probability of hearing phoneme h_i when phoneme s_j is spoken.
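As a minimal sketch of how such a matrix is turned into probabilities, assume a table of raw counts in which counts[j][i] is the number of trials on which spoken phone s_j was reported as heard phone h_i. Normalizing each row by its total gives an estimate of P(h_i | s_j). The function name and the three-consonant counts below are hypothetical and for illustration only; they are not experimental data.

```python
from typing import List


def confusion_probabilities(counts: List[List[int]]) -> List[List[float]]:
    """Row-normalize a confusion matrix of counts.

    counts[j][i] is the number of trials on which spoken phone s_j was
    reported as heard phone h_i. Returns estimates of P(h_i | s_j).
    """
    probs = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("each spoken phone needs at least one trial")
        # Each row is a conditional distribution over heard phones.
        probs.append([c / total for c in row])
    return probs


if __name__ == "__main__":
    # Hypothetical counts for three consonants (illustrative, not measured).
    phones = ["p", "t", "k"]
    counts = [
        [80, 12, 8],   # spoken /p/
        [10, 85, 5],   # spoken /t/
        [6, 14, 80],   # spoken /k/
    ]
    P = confusion_probabilities(counts)
    for j, s in enumerate(phones):
        print(s, [round(p, 2) for p in P[j]])
```

The diagonal holds the estimated probability of correct recognition for each phone, and the off-diagonal entries form the error patterns.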

From 1910 to 1950 speech perception was extensively studied by telephone research departments throughout the world. However, it was the work of Harvey Fletcher, beginning in 1921, that produced the first major breakthroughs. The key was his quantification of the transmission of information, as characterized by phone error patterns. By 1930 millions of dollars were being spent every year on speech perception research at the newly created Bell Labs. Fletcher's full and final theory was not published until 1950, following his retirement from AT&T.

The next breakthroughs were provided by George Miller and his colleagues at the Harvard Acoustics Lab during and following WWII. Miller used concepts from information theory, developed at Bell Labs by Claude Shannon, to quantify speech entropy. While these studies provided key insights into speech perception, they did not take the final elusive step that would allow us to build robust automatic speech recognition (ASR) machines.

Regardless of what you read in the popular press, ASR is still an unsolved problem. I will attempt to pass along some of what I have learned over the years about human speech recognition (HSR). The hope is that by learning more about HSR we might make ASR robust to noise and filtering. Today ASR is based on language models, which have not given, and cannot give, ASR the basic robustness to noise and filtering found in HSR.

First, I summarize the speech perception results from 30 years of work by Fletcher and his colleagues, which produced the "articulation index," a widely recognized method for characterizing the information-bearing frequency regions of speech.

Next, I summarize the speech work of George Miller. Miller showed the importance of source entropy (randomness) in speech perception.
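Source entropy can be made concrete with Shannon's formula H = -sum_k p_k log2 p_k, measured in bits. The sketch below assumes a discrete source described by a list of item probabilities; the function name is hypothetical. For a set of N equally likely items, H = log2 N, so a larger or less predictable set of test items carries more information per item than a small or predictable one.

```python
import math
from typing import Sequence


def entropy_bits(probs: Sequence[float]) -> float:
    """Shannon entropy H = -sum p log2 p of a discrete source, in bits."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Terms with p == 0 contribute nothing (p log p -> 0 as p -> 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)


if __name__ == "__main__":
    # An equiprobable set of N items has H = log2(N) bits.
    for n in (2, 16, 256):
        print(n, entropy_bits([1.0 / n] * n))
    # A skewed (more predictable) four-item source has lower entropy
    # than a uniform four-item source (2 bits).
    print(entropy_bits([0.7, 0.1, 0.1, 0.1]))
```

The first loop prints 1, 4, and 8 bits for sets of 2, 16, and 256 equally likely items; the skewed source falls below the 2 bits of a uniform four-item set.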

Finally, I briefly describe work in progress in which I partially repeated Miller and Nicely's experiment, and then outline the problem of making existing ASR systems robust to filtering and noise.

 

Jont B. Allen received a BS in Electrical Engineering from the University of Illinois, Urbana-Champaign, in 1966, and an MS and PhD from the University of Pennsylvania in 1968 and 1970, respectively. After graduation he joined Bell Laboratories, where he was in the Acoustics Research Department in Murray Hill, NJ, from 1974 to 1996, as a Distinguished Member of Technical Staff. Since 1996 Dr. Allen has been a Technology Leader at AT&T Labs-Research (http://www.att.com/technology/attlabs/).

During his 30-year career Dr. Allen has specialized in auditory signal processing. In the last few years he has worked nearly full time on the problem of human speech recognition. His expertise includes signal processing, cochlear modeling, cochlear neurophysiology, auditory psychophysics, human speech recognition, room acoustics, and musical acoustics.

Dr. Allen is a Fellow (May 1981) of the Acoustical Society of America (ASA) and a Fellow (January 1985) of the Institute of Electrical and Electronics Engineers (IEEE). In 1986 he received the IEEE ASSP Meritorious Service Award, and in 2000 an IEEE Third Millennium Medal. He is a past member of the Executive Council of the ASA and of the Administrative Committee (ADCOM) of the IEEE Acoustics, Speech, and Signal Processing Society (ASSP). He has served as Editor of the ASSP Transactions, Chairman of the Publication Board of the ASSP Society, and General Chairman of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP-1988). He has also served on numerous committees of both the ASA and the ASSP, and presently serves on the ASA Pub. Policy Board.

In 1986-88 he participated in the development of the AT&T multi-band compression hearing aid, which was sold under the ReSound and Danavox names, and he is a member of the ReSound and SoundID Scientific Advisory Boards. In 1990 he was an Osher Fellow at the Exploratorium museum in San Francisco. In 1991-92 he served as an international Distinguished Lecturer for the Signal Processing Society. In 1993 he was on the Dean's Advisory Council for the University of Pennsylvania. In 1994 he spent 5 weeks as a visiting scientist and lecturer at the University of Calgary. Since 1987 he has been an Adjunct Associate Research Scientist in the Otolaryngology Department of Columbia University. Dr. Allen holds 17 patents and has more than 91 publications, mostly in peer-reviewed journals.

 
 
© 2002 CMS -||- Email: icassp2002web@securecms.com -||- Last Updated: 23 April, 2002