Report on a talk by Dr. Preeti Rao, Dept. of Electrical Engineering, IIT-Bombay, for the Audio Engineering Society – India Section on Saturday, 7th May 2005.
Dr. Rao’s presentation focused primarily on the signal processing and coding of speech. She began by explaining the fundamentals of the human speech production mechanism, introducing the various sounds humans produce along with the manners of articulation and voicing used to produce them. She went on to explain how the human vocal tract can be modeled as a series of concatenated tubes, each of a different cross-section, closed at one end (the glottis) and open at the other (the lips). Asked whether this was an accurate enough model given that the human vocal tract is bent, she replied that the effect of the bend is minor compared with that of the constantly changing cross-sections. She then explained how the articulators are configured differently to produce different vowel sounds, and how this information can be used to recognize or synthesize vowels. Next she explained the difference between voiced sounds (such as vowels) and unvoiced sounds (such as the consonants /s/ and /f/). This understanding of the human speech production mechanism led to the source-filter model of speech synthesis.

Dr. Rao’s explanation of the phonetic foundation of the Devanagari script, in which Hindi is written, was a revelation. Not only does each symbol in the script represent a single phone, but the ordering of the symbols is also phonetically consistent. The script used for English, in contrast, was described as phonetically inconsistent, and a universally recognized standard phonetic symbol set was developed to make up for this.

Two applications of speech processing were discussed in depth: automatic speech recognition (ASR) and speech coding. Dr. Rao explained the difference between waveform coding and speech-model-based coding in terms of speech quality and bitrate. Finally, Dr. Rao demonstrated a Query-by-Humming system for Hindi film music developed by the audio signal processing lab at IIT-B. An audio query of a singer humming or singing the melody or chorus of a song was used as input, and the system recognized and returned the correct song from a database of about 200 songs. Work is under way to enlarge the song database and make the system more robust.

The audience, made up primarily of audio industry professionals, was very interested in this introduction to speech signal processing and asked pertinent questions. All in all, members were given a glimpse of how much more there is to the world of audio than the recording studio.
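The concatenated-tube model of the vocal tract mentioned in the talk reduces, in its simplest case, to a single uniform tube closed at one end and open at the other. Such a tube resonates at odd multiples of a quarter wavelength, F_n = (2n − 1)c / (4L), and these resonances approximate the formants of a neutral vowel. The sketch below is illustrative only: the speed of sound (350 m/s) and tube length (17.5 cm) are assumed textbook values for an adult vocal tract, not figures given in the talk, and the function name is hypothetical.

```python
def uniform_tube_formants(length_m: float, num_formants: int = 3,
                          speed_of_sound: float = 350.0) -> list[float]:
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L).
    """
    if length_m <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * n - 1) * speed_of_sound / (4 * length_m)
            for n in range(1, num_formants + 1)]


if __name__ == "__main__":
    # Assumed values: c = 350 m/s, L = 0.175 m (a typical adult vocal tract length)
    for n, f in enumerate(uniform_tube_formants(0.175), start=1):
        print(f"F{n} = {f:.0f} Hz")
```

Run as a script, this prints F1 = 500 Hz, F2 = 1500 Hz and F3 = 2500 Hz. Real vowels shift these values because, as noted in the talk, the cross-section along the tract keeps changing; modeling that requires many short tube sections of different areas rather than one uniform tube.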
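The bitrate gap between waveform coding and speech-model-based coding comes from what each one transmits. A waveform coder quantizes every sample, so its bitrate is sample rate × bits per sample. A model-based coder sends one set of model parameters per frame, so its bitrate is bits per frame × frames per second. The sketch below uses two standard reference points that were not quoted in the talk: 8 kHz, 8-bit log-PCM (as in ITU-T G.711) and the LPC-10 vocoder, which sends 54 bits per 180-sample (22.5 ms) frame. The function names are hypothetical.

```python
def waveform_coder_bitrate(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Bitrate (bit/s) of a waveform coder that quantizes every sample."""
    return float(sample_rate_hz * bits_per_sample)


def parametric_coder_bitrate(bits_per_frame: int, samples_per_frame: int,
                             sample_rate_hz: int) -> float:
    """Bitrate (bit/s) of a model-based coder sending one parameter set per frame."""
    frames_per_second = sample_rate_hz / samples_per_frame
    return bits_per_frame * frames_per_second


if __name__ == "__main__":
    # Waveform coding example: 8 kHz sampling, 8 bits per sample (G.711-style PCM)
    pcm = waveform_coder_bitrate(8000, 8)
    # Model-based example: LPC-10, 54 bits per 180-sample frame at 8 kHz
    lpc = parametric_coder_bitrate(54, 180, 8000)
    print(f"PCM: {pcm:.0f} bit/s, LPC-10: {lpc:.0f} bit/s, ratio ~{pcm / lpc:.1f}:1")
```

This prints a ratio of roughly 26.7:1 (64 kbit/s against 2.4 kbit/s). The price of the lower rate is the quality trade-off mentioned in the talk: the model-based coder reproduces a synthetic approximation of the speech rather than the waveform itself.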
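The report does not describe how the IIT-B Query-by-Humming system matches a hummed query against its song database. As a purely illustrative assumption, one common approach proceeds in three steps. It converts each melody's pitch track to semitones. It subtracts the mean so that a query sung in a different key can still match. It then ranks songs by dynamic time warping (DTW) distance, which tolerates differences in tempo. The function names, song names and pitch values below are hypothetical.

```python
import math


def hz_to_semitones(pitches_hz: list[float]) -> list[float]:
    """Convert pitch values (Hz) to semitones relative to A4 = 440 Hz."""
    return [12.0 * math.log2(f / 440.0) for f in pitches_hz]


def normalize_key(contour: list[float]) -> list[float]:
    """Subtract the mean so melodies sung in different keys become comparable."""
    mean = sum(contour) / len(contour)
    return [p - mean for p in contour]


def dtw_distance(a: list[float], b: list[float]) -> float:
    """Classic dynamic time warping with an absolute-difference local cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in query only
                                 cost[i][j - 1],      # advance in reference only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[len(a)][len(b)]


def rank_songs(query_hz: list[float],
               database: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (song, distance) pairs sorted from best to worst match."""
    q = normalize_key(hz_to_semitones(query_hz))
    scores = [(name, dtw_distance(q, normalize_key(hz_to_semitones(ref))))
              for name, ref in database.items()]
    return sorted(scores, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical reference melodies (pitch in Hz); not real songs or data
    songs = {
        "song_a": [262, 294, 330, 349, 392, 392],
        "song_b": [392, 349, 330, 294, 262, 262],
    }
    # Hypothetical query: song_a hummed a fifth higher and with some notes held longer
    query = [f * 1.5 for f in [262, 262, 294, 330, 330, 349, 392, 392, 392]]
    print(rank_songs(query, songs))
```

Here song_a ranks first despite the key change and the uneven tempo. DTW costs O(N × M) per comparison, so a real system searching a growing database would likely need indexing or pruning on top of this.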