A collection of demos, software, research, history, reflections, and speech data
The Role of the Cochlea in Human Speech Recognition, Jont Allen (UIUC), CLSP Seminars, Center for Speech and Language Processing, Johns Hopkins University, 2007 August 7: Allen on phone recognition in hearing-impaired ears (Vimeo video)
Extra:
Summer workshop results: Modeling Mandarin with CASS (mp4)
Harvey Fletcher video from c. 1963
The research in the Human Speech Recognition group is directed at a fundamental understanding of speech perception in both normal-hearing (NH) and hearing-impaired (HI) ears. These are related problems; they form a continuum rather than two separate categories. Most people are born with normal hearing. Within a few years we learn, with no apparent effort, to understand human speech. How this happens is a mystery, but what happens is not. The research we have been doing over the past 10 years, documented in the section below, is a systematic study of how speech processing and communication fail under various conditions. Only by stressing the system until it fails can we hope to understand it. There are at least four levels of experimentation:
Examples of such processing are given later on this page.
We have found that speech perception is a discrete (binary), zero-error task (Singh and Allen, 2012). Working at the token level, we defined two groups: zero-error (ZE) and non-zero-error (NZE). ZE speech is speech that NH listeners never misidentify at SNRs of -2 dB and above. The NZE sounds are all the rest. Every consonant-vowel (CV) sound we have tested contains many ZE tokens: for most CV consonants, more than 80% of the utterances are ZE.
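The ZE rule can be sketched as a small check. This is a minimal illustration, not the group's analysis code: the data layout (SNR in dB mapped to pooled NH error and trial counts), the function names, and the example counts are all hypothetical.

```python
from collections.abc import Mapping

# SNR floor for the ZE criterion, taken from the definition in the text.
ZE_SNR_FLOOR_DB = -2.0


def is_zero_error(counts: Mapping[float, tuple[int, int]],
                  snr_floor_db: float = ZE_SNR_FLOOR_DB) -> bool:
    """Return True if NH listeners made no errors on this token at any
    tested SNR at or above snr_floor_db.

    counts maps SNR in dB -> (errors, trials), pooled over NH listeners.
    This layout is a hypothetical choice for illustration.
    """
    tested = [(errors, trials) for snr, (errors, trials) in counts.items()
              if snr >= snr_floor_db]
    if not tested:
        raise ValueError("token has no trials at or above the SNR floor")
    return all(errors == 0 for errors, _ in tested)


def label_token(counts: Mapping[float, tuple[int, int]]) -> str:
    """Label a token 'ZE' or 'NZE' using the rule above."""
    return "ZE" if is_zero_error(counts) else "NZE"


# Made-up counts: errors below -2 dB SNR do not affect the ZE label.
token_a = {-16.0: (7, 12), -10.0: (2, 12), -2.0: (0, 12), 12.0: (0, 12)}
token_b = {-2.0: (1, 12), 12.0: (0, 12)}
print(label_token(token_a), label_token(token_b))  # ZE NZE
```

Errors at very low SNRs are expected for every token, which is why the check only looks at SNRs at or above the floor.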
The remaining 20% or so of the CVs (the NZE tokens) may be broken down into medium-error (ME) tokens, with error rates between 0% and 10%, and high-error (HE) tokens, with error rates above 10%. ME tokens are typically utterances with varying degrees of mispronunciation. HE tokens are typically heard as a different sound, with high probability (>20%). Based on the entropy of the responses across NH listeners, we view such sounds as mislabeled. These errors can typically be traced to a specific flaw in the production of the sound, which is usually easy to identify.
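A similar sketch covers the ME/HE split and the entropy measure. It assumes error counts pooled over SNRs at or above -2 dB and, since the text leaves the boundary open, treats an error rate of exactly 10% as ME. The function names and the example responses are hypothetical. Low entropy together with a dominant label other than the intended one is one way to read "heard as a different sound": listeners agree with each other, just not with the label.

```python
import math
from collections import Counter
from collections.abc import Sequence


def error_group(errors: int, trials: int) -> str:
    """Group a token as ZE, ME, or HE from its pooled NH error count.

    Assumptions: counts are pooled over SNRs at or above -2 dB, and an
    error rate of exactly 10% (left open by the text) is called ME.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    rate = errors / trials
    if rate == 0:
        return "ZE"
    return "ME" if rate <= 0.10 else "HE"


def response_entropy(responses: Sequence[str]) -> float:
    """Shannon entropy, in bits, of the labels NH listeners gave one token."""
    counts = Counter(responses)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def dominant_response(responses: Sequence[str]) -> tuple[str, float]:
    """Most frequently reported label and the fraction of responses it got."""
    label, count = Counter(responses).most_common(1)[0]
    return label, count / len(responses)


# Made-up responses to a token recorded as /ta/ but mostly heard as /ka/.
heard = ["ka"] * 9 + ["ta"]
print(error_group(errors=9, trials=10))   # HE
print(dominant_response(heard))           # ('ka', 0.9)
print(round(response_entropy(heard), 3))  # 0.469
```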
A chronological history of HSR papers
Year | Experiment | Students | Details (\(N_s\) = number of subjects) | Publications | .mat |
---|---|---|---|---|---|
2004 | MN64 (MN04SWN) | Phatak & Lovitt | Miller-Nicely in SWN with 4 vowels: f/a/ther, b/a/t, b/i/t, b/ee/t (not b/e/t), i.e., LaTeX tipa `\textipa{@, \ae, E, i}`, LDCbet: [a, xq, i, xi] ([a, Q, i, I]), \(V_{ldc}\)=/a, @, i, I/; \(N_s=18\), with 4 "bad subjects" | Phatak & Allen (2007) [PA07] pdf | MN64 |
2005 | Study | Allen, J. B. | Consonant recognition and the AI | JASA 117(4), p. 2212-2223. (2005) pdf | |
2005 | MN16-R (MN05WN) | Phatak & Lovitt | Replicate MN04 (WN) | Phatak, Lovitt & Allen (2008) pdf | |
2005 | MN64R (MN05SWN) | Phatak & Lovitt | More MN64; 14 new subjects; SWN | Phatak, Lovitt & Allen (2008) pdf | MN64 |
2005 | HIMCL05 | Yoon & Phatak | CVs; 10 HI ears @MCL in WN | Phatak, Yoon, Gooler & Allen (2009) pdf | |
2006 | HINALR05 | Yoon | CVs; 10 HI ears; NALR@MCL in SWN | ||
2006 | Verification | Regnier | Modifications of /ta/ | Regnier & Allen (2008) pdf | |
2006 | CV06SWN | Phatak | \(C_{ldc}\)=/d, b, k, p, s, t, S, Z, z/, \(V_{ldc}\)=/o, E, u, R, Q, U, I, a/ | cv06swn | |
2006 | CV06WN | Regnier | 9C+8V WN /d, b, k, p, s, t, xs, xz, z/ | cv06wn | |
2007 | CV06 | Pan | Analysis of 9 Vowels of CV06 | 2 unpublished MSs | |
2007 | HL07 | Li | High- and low-pass filtering; repeat of Fletcher | Li & Allen (2009) JASA pdf | |
2008 | TR07 | Li | Time truncation, after Furui (1986) | Allen & Li (2009) ASSP Magazine pdf | |
2008 | TR08 | Li | Time truncation, after Furui (1986) | ? 3 vowels ? | |
2009 | 3DDS | Li | 3DDS (i.e., MN64, HL07, TR07-8) | Li & Allen (2010) JASA pdf; Li & Allen (2010) IEEE TLSP; Li, Trevino & Allen (2012) JASA | |
2009 | Verification | Menon | Remove Primary burst | ||
2009 | Verification | Abhinauv | Modify (\(\pm 6\) dB) + remove primary burst | Kapoor & Allen (2012), 131(1) pdf | |
2009 | Verification | Cvengros | Modify burst + devoiced + voiced transition | ||
2009 | MN64(+R) | Singh | Full analysis of \(N_s=25\) of MN64+MN64R | JASA, April 2012 pdf | |
2010 | HIMCL10-I/III | Woojae Han | CVs; \(N_s=46\) HI ears with \(N_t=2\)/token/SNR | ||
2010 | HI10NALR-II/IV | Woojae Han | CVs; \(N_s=17\) HI ears with \(N_t=10\)/token/SNR | ||
2011 | HL11 | Trevino | High-/low-pass filtered CVs of HI10 | ||
2013 | HI Exp2 Analysis | Trevino | Analysis of the individual variability of HI | Trevino & Allen pdf, pdf | |
2014 | MN64(+R) | Toscano & Allen | Extend Singh & Allen (2009) | | |
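Many of the experiments above present CVs in white noise (WN) or speech-weighted noise (SWN) at fixed SNRs, and some modify the primary burst by ±6 dB. A minimal, standard-library-only sketch of those two operations follows. It assumes the SNR is computed from RMS levels over the whole token, uses Gaussian white noise only (no speech weighting), and assumes the burst's sample range is already known; the function names are hypothetical, not taken from the HSR software.

```python
import math
import random
from collections.abc import Sequence
from typing import Optional


def rms(x: Sequence[float]) -> float:
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(s * s for s in x) / len(x))


def add_white_noise(speech: Sequence[float], snr_db: float,
                    rng: Optional[random.Random] = None
                    ) -> tuple[list[float], list[float]]:
    """Mix Gaussian white noise into speech at the requested wideband SNR.

    Assumption: SNR = 20*log10(rms(speech) / rms(noise)) over the whole
    token. Speech-weighted noise (SWN) would need spectral shaping, omitted.
    Returns (mixture, scaled_noise).
    """
    rng = rng if rng is not None else random.Random(0)
    noise = [rng.gauss(0.0, 1.0) for _ in speech]
    # Scale the noise so its RMS sits snr_db below the speech RMS.
    scale = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    scaled = [scale * n for n in noise]
    return [s + n for s, n in zip(speech, scaled)], scaled


def scale_segment(x: Sequence[float], start: int, stop: int,
                  gain_db: float) -> list[float]:
    """Apply gain_db (e.g. +6 or -6) to samples x[start:stop] only,
    such as a burst whose location is already known."""
    gain = 10 ** (gain_db / 20)
    return [s * gain if start <= i < stop else s for i, s in enumerate(x)]


# Usage with a synthetic 100 ms, 440 Hz tone standing in for a CV token.
fs = 16000
tone = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs // 10)]
mix, noise = add_white_noise(tone, snr_db=-2.0)
print(round(20 * math.log10(rms(tone) / rms(noise)), 6))  # -2.0
louder = scale_segment(tone, 0, 160, gain_db=6.0)  # first 10 ms, about 2x
```

A gain of +6 dB multiplies the amplitude by 10^(6/20), roughly 2; -6 dB roughly halves it.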