Cutting-Edge Automated Speech Recognition
Say it again?
Is the [n] in handbag the same as the [n] in handgun? It can be quite different. In reality, no one pronounces a word in exactly the same way twice, even when repeating themselves.
Statistically speaking
As a result, most ASR systems, which are generally based on statistical modelling techniques that make “right and wrong” decisions, require extensive training on thousands of recorded speakers just to master the variation within one dialect. Oxford’s FlexSR system outperforms many existing ASR systems at individual word recognition by working not with individual sounds but with parts of sounds. Its lightweight nature makes it well suited to integration into existing technologies or to mobile deployment.
Key benefits of Oxford’s FlexSR
► High accuracy across non-ideal speech
► Faster, more robust and tolerant of background noise
► Ideal for multi-user environments
► Computationally lightweight
► Potential for mobile deployment
► Easily adaptable to any spoken language, including tonal languages (currently implemented for English and German)
► Less system training required
Linguistic model
Standard speech recognition software achieves high accuracy only with multi-layered, computationally intensive models that require either state-of-the-art hardware or, in the case of mobile applications, a network connection to offload the analysis. In addition, many systems need to be trained on a particular voice to attain accurate recognition (although some might suggest that it is the speaker who is trained how to speak, not the software how to recognise!).
FlexSR is different. Rather than rely on statistical analysis alone, linguists at the University of Oxford developed a “sparse” linguistic model of the human cognitive representation of words. This theory suggests that humans store a very basic phonological representation of each word, accepting wide variation in the sounds themselves and recognising words by their general pattern. Adopting this approach allows FlexSR to identify words across a wide range of speakers and dialects by extracting approximate sounds and matching these patterns with its internal word list or lexicon.
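As a rough illustration of what "extracting approximate sounds and matching these patterns with a lexicon" could look like, here is a toy Python sketch. It is not FlexSR's implementation: the coarse sound classes, the cost values, the simplified transcriptions and all function names are assumptions invented for this example. A heard sound that falls in the same broad class as the stored sound counts as a near match, so an assimilated pronunciation such as "hambag" can still land on "handbag".

```python
# Toy sketch only: every table and name here is hypothetical, not FlexSR's.

# Assumed coarse sound classes; sounds in the same class are near matches.
COARSE_CLASS = {
    "p": "STOP", "b": "STOP", "t": "STOP", "d": "STOP", "k": "STOP", "g": "STOP",
    "m": "NASAL", "n": "NASAL", "ŋ": "NASAL",
    "h": "FRIC", "s": "FRIC", "f": "FRIC",
    "a": "VOWEL", "e": "VOWEL", "i": "VOWEL", "o": "VOWEL", "ʌ": "VOWEL",
}

# Assumed miniature lexicon with simplified transcriptions.
LEXICON = {
    "handbag": ["h", "a", "n", "d", "b", "a", "g"],
    "handgun": ["h", "a", "n", "d", "g", "ʌ", "n"],
}


def substitution_cost(heard, stored):
    """Cost 0 for identical sounds, a small cost within a class, 1 otherwise."""
    if heard == stored:
        return 0.0
    heard_class = COARSE_CLASS.get(heard)
    if heard_class is not None and heard_class == COARSE_CLASS.get(stored):
        return 0.2
    return 1.0


def pattern_distance(heard, stored):
    """Weighted edit distance between a heard and a stored sound sequence."""
    prev = [float(j) for j in range(len(stored) + 1)]
    for i, h in enumerate(heard, start=1):
        curr = [float(i)]
        for j, s in enumerate(stored, start=1):
            curr.append(min(
                prev[j] + 1.0,                        # extra sound in the input
                curr[j - 1] + 1.0,                    # stored sound not heard
                prev[j - 1] + substitution_cost(h, s),
            ))
        prev = curr
    return prev[-1]


def recognise(heard):
    """Return the lexicon word whose stored pattern is closest to the input."""
    return min(LEXICON, key=lambda word: pattern_distance(heard, LEXICON[word]))


if __name__ == "__main__":
    print(recognise(["h", "a", "m", "b", "a", "g"]))   # casual "hambag"
    print(recognise(["h", "a", "ŋ", "g", "ʌ", "n"]))   # casual "hang-gun"
```

The tolerant substitution cost is the design choice that matters here: the nasal in the input does not need to match the stored [n] exactly, only to fall within the same general pattern.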
Easy integration
Given the potential impact of this new approach and the broad range of applications and ease of integration, Oxford University Innovation anticipates a high demand for potential product development.
The team behind FlexSR
The Humanities Division team behind this novel linguistic model is led by Professor Aditi Lahiri from the Faculty of Linguistics, Philology and Phonetics. The research resulted from a European Research Council advanced research grant (WORDS), followed by Proof of Concept funding. The team has now received follow-on funding from UCSF to build a prototype that could, for example, provide corrective feedback on pronunciation for language learners.
It is very satisfying to see the team’s hard work put into this research, maturing into something of impact to society. We have a project manager and software developers moving this forward to a demonstration model which can be our showcase to potential partners for further commercialisation. We academics can continue our passion of research in the field whilst leaving the commercial side to Oxford University Innovation.
– Professor Aditi Lahiri, Faculty of Linguistics, Philology and Phonetics