Automatic word speech recognition using almost only time-frequency warping
Introduction
Automatic speech recognition techniques are very complex today, so I want to try a simple approach.
Goal
The goal is word recognition. Given an unknown word utterance, my system estimates which word it is. The input word is assumed to be one of a prepared set of candidate words.
Preliminary
My approach is simple. The recognizer calculates the similarity between the input word and each candidate word by comparing their spectrograms, and the most similar candidate is chosen as the answer. The main problem is how to measure the similarity.
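The recognition step described above can be sketched as a nearest-candidate search. This is a minimal Python sketch, not the author's code. It assumes each spectrogram is already computed as a list of frames (each frame a list of magnitudes) and that the unknown and candidate spectrograms have the same size. The page does not say which similarity measure is used, so cosine similarity is an assumption here, and the names `cosine_similarity` and `recognize` are hypothetical.

```python
import math
from typing import Callable, Dict, List

Spectrogram = List[List[float]]  # frames x frequency bins (magnitudes)


def cosine_similarity(a: Spectrogram, b: Spectrogram) -> float:
    """Cosine similarity of two equally sized spectrograms, flattened to vectors."""
    xs = [v for frame in a for v in frame]
    ys = [v for frame in b for v in frame]
    if len(xs) != len(ys):
        raise ValueError("spectrograms must have the same number of values")
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = math.sqrt(sum(x * x for x in xs) * sum(y * y for y in ys))
    return dot / norm if norm > 0 else 0.0


def recognize(
    unknown: Spectrogram,
    candidates: Dict[str, Spectrogram],
    similarity: Callable[[Spectrogram, Spectrogram], float] = cosine_similarity,
) -> str:
    """Return the candidate word whose spectrogram is most similar to `unknown`."""
    return max(candidates, key=lambda word: similarity(unknown, candidates[word]))


# Toy usage with hypothetical 2x2 "spectrograms".
candidates = {
    "yes": [[1.0, 0.0], [0.0, 1.0]],
    "no": [[0.0, 1.0], [1.0, 0.0]],
}
print(recognize([[0.9, 0.1], [0.1, 0.8]], candidates))  # prints: yes
```

The `similarity` parameter is where a warping-aware measure would plug in; the plain cosine comparison on its own cannot cope with the time and frequency warping that real utterances exhibit.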
Main problem
Both time and frequency are warped. Some utterances are long and others short; some spectra are narrow and others wide. This is a difficult problem, and unfortunately this kind of warping is not linear.
Solution
If the ideal time and frequency warping were known, the recognizer could measure the appropriate similarity. My solution is to generate many random time and frequency warpings, calculate the similarity for each, and take the largest similarity as an approximation of the appropriate one. This is a Monte Carlo method.
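The Monte Carlo search can be sketched as follows. This is a hypothetical Python illustration, and only its overall idea (generate many random warpings, keep the largest similarity) comes from the text. The following details are assumptions: a random warp is modelled as a non-decreasing index map built from random step sizes (the range 0.2 to 1.8 is arbitrary); the unknown spectrogram is warped onto the candidate's shape along both time and frequency; and cosine similarity is the similarity measure. All function names are hypothetical.

```python
import math
import random
from itertools import accumulate
from typing import List, Optional

Spectrogram = List[List[float]]  # frames x frequency bins (magnitudes)


def cosine_similarity(a: Spectrogram, b: Spectrogram) -> float:
    """Cosine similarity of two equally sized spectrograms, flattened to vectors."""
    xs = [v for frame in a for v in frame]
    ys = [v for frame in b for v in frame]
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = math.sqrt(sum(x * x for x in xs) * sum(y * y for y in ys))
    return dot / norm if norm > 0 else 0.0


def random_monotone_map(src_len: int, dst_len: int, rng: random.Random) -> List[int]:
    """Random non-decreasing map from dst_len output positions to source indices.

    Random step sizes make the warp non-linear. For dst_len > 1 the map starts
    at source index 0 and ends at source index src_len - 1.
    """
    if dst_len == 1:
        return [0]
    steps = [rng.uniform(0.2, 1.8) for _ in range(dst_len - 1)]  # assumed range
    positions = [0.0] + list(accumulate(steps))
    total = positions[-1]
    return [min(src_len - 1, round(p / total * (src_len - 1))) for p in positions]


def warp(spec: Spectrogram, time_map: List[int], freq_map: List[int]) -> Spectrogram:
    """Resample spec through the given time (frame) and frequency (bin) index maps."""
    return [[spec[t][f] for f in freq_map] for t in time_map]


def monte_carlo_similarity(
    unknown: Spectrogram,
    candidate: Spectrogram,
    trials: int = 200,
    seed: Optional[int] = None,
) -> float:
    """Largest similarity over `trials` random time-frequency warpings of `unknown`."""
    rng = random.Random(seed)
    n_frames, n_bins = len(candidate), len(candidate[0])
    best = -math.inf
    for _ in range(trials):
        time_map = random_monotone_map(len(unknown), n_frames, rng)
        freq_map = random_monotone_map(len(unknown[0]), n_bins, rng)
        warped = warp(unknown, time_map, freq_map)  # now shaped like candidate
        best = max(best, cosine_similarity(warped, candidate))
    return best
```

With a fixed `seed`, increasing `trials` can only raise (never lower) the returned value, because the earlier trials are repeated exactly; a larger `trials` therefore trades computation time for a better approximation of the ideal warping.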
Performance
I performed an experiment using a corpus referred to as "Spoken Language" and the DSR Projects Speech Corpus (PASL-DSR). The data contains 6 male and 6 female speakers, each group numbered 1st to 6th. I used the odd-numbered speakers as test data and the even-numbered ones as candidate data, with 41 words per speaker. The evaluation is closed with respect to vocabulary (test and candidate speakers utter the same words) and open with respect to speakers (no test speaker appears in the candidate data). The result is shown in the following table.
Test          Candidate     Correctness (%)
male 1st      male 2nd      51
male 3rd      male 4th      78
male 5th      male 6th      88
female 1st    female 2nd    78
female 3rd    female 4th    83
female 5th    female 6th    76
I am satisfied with this result. Although I used less training data than other studies, it does not seem bad. However, if a sufficient amount of training data were used, conventional methods would reach 100% correctness.
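The evaluation protocol can be sketched as a per-pair correctness score. This is a hypothetical Python sketch: `recognize` stands for any recognizer that maps an unknown spectrogram and a dictionary of candidate spectrograms to a word label, and the test words of an odd-numbered speaker are scored against the candidate words of the paired even-numbered speaker. The reported values are copied from the table; their mean is plain arithmetic on those six numbers.

```python
from typing import Callable, Dict, List, Tuple

Spectrogram = List[List[float]]  # frames x frequency bins
Recognizer = Callable[[Spectrogram, Dict[str, Spectrogram]], str]


def correctness(
    test_words: Dict[str, Spectrogram],
    candidate_words: Dict[str, Spectrogram],
    recognize: Recognizer,
) -> float:
    """Percentage of test words whose recognized label equals the true label.

    Closed vocabulary: each test word label is also a key of candidate_words.
    Open speakers: test_words and candidate_words come from different speakers.
    """
    hits = sum(
        1 for word, spec in test_words.items()
        if recognize(spec, candidate_words) == word
    )
    return 100.0 * hits / len(test_words)


# (test speaker, candidate speaker) -> correctness (%) as reported in the table.
REPORTED: Dict[Tuple[str, str], int] = {
    ("male 1st", "male 2nd"): 51,
    ("male 3rd", "male 4th"): 78,
    ("male 5th", "male 6th"): 88,
    ("female 1st", "female 2nd"): 78,
    ("female 3rd", "female 4th"): 83,
    ("female 5th", "female 6th"): 76,
}

mean = sum(REPORTED.values()) / len(REPORTED)
print(f"mean correctness: {mean:.1f}%")  # prints: mean correctness: 75.7%
```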
Conclusion
I made it. Researchers around the world, please develop it further, write papers citing this web page, and submit them to journals. We will all be happy.
Reference
S. Itahashi, "Creating Speech Corpora for Speech Science and Technology," IEICE Trans., vol. E74, no. 7, pp. 1906-1910, 1991.
More information
More details and a contact e-mail address are available in Japanese.