Flexible Speech Recognition System (FlexSR)
Say it again?
Is the ‘a’ in bath like bar or like bat? A small difference, but in reality every person pronounces every word differently, even when they repeat themselves.
As a result, most automated speech recognition (ASR) systems, which are generally based on statistical modelling techniques, require extensive training on recordings of thousands of speakers just to master the variation within one dialect. Oxford’s FlexSR system outperforms many existing ASR systems at individual word recognition, and its lightweight nature makes it ideally suited to integration into existing technologies or to mobile deployment.
Key benefits of Oxford’s FlexSR
- High accuracy regardless of dialect, accent or non-ideal speech
- Faster, more robust and tolerant of background noise
- Ideal for multi-user environments
- Computationally lightweight
- The potential for mobile deployment
- Easily adaptable to any spoken language, including tonal languages (currently implemented for English and German)
- No system training required
For standard speech recognition software, a high degree of accuracy is only achieved with multi-layered and computationally intensive models, requiring either state-of-the-art hardware or, in the case of mobile applications, a network connection to offload the analysis. In addition, many systems need to be trained against a particular voice to attain accurate recognition (although some might suggest that it is the speaker who is trained how to speak, not the software how to recognise!).
FlexSR is different. Rather than rely on statistical analysis alone, leading linguists at the University of Oxford developed a “sparse” linguistic model of the human cognitive representation of words. This theory suggests that humans store a very basic acoustic representation of each word, accepting a wide variation in the sounds themselves and recognising words by their general pattern. Adopting this approach allows FlexSR to identify words across a wide range of speakers and dialects by extracting approximate sounds and matching these patterns with its internal word list or lexicon.
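As a rough illustration of the general idea of matching approximate sounds against a lexicon, the toy sketch below stores each word once as a pattern of coarse sound classes, so that two different pronunciations of the same vowel map to the same stored entry. It is not FlexSR's actual method: the sound labels, the class names, the three-word lexicon and the functions `to_pattern` and `recognise` are all hypothetical and chosen only for this example.

```python
# Toy sketch only: the sound labels, class names and lexicon below are
# hypothetical and are NOT taken from FlexSR.

# Map each detailed sound label to a coarse class. Both pronunciations of
# the 'a' in "bath" fall into the same class.
COARSE_CLASS = {
    "b": "labial_stop",
    "t": "coronal_stop",
    "th": "coronal_fricative",
    "a": "low_vowel",    # short 'a', as in "bat"
    "ah": "low_vowel",   # long 'a', as in "bar"
    "i": "high_vowel",
}

# Each word is stored once, as a sparse pattern of coarse classes.
LEXICON = {
    "bath": ("labial_stop", "low_vowel", "coronal_fricative"),
    "bat": ("labial_stop", "low_vowel", "coronal_stop"),
    "bit": ("labial_stop", "high_vowel", "coronal_stop"),
}


def to_pattern(sounds):
    """Reduce a sequence of detailed sound labels to coarse classes."""
    return tuple(COARSE_CLASS[s] for s in sounds)


def recognise(sounds):
    """Return every lexicon word whose stored pattern matches the input."""
    pattern = to_pattern(sounds)
    return sorted(word for word, stored in LEXICON.items() if stored == pattern)


if __name__ == "__main__":
    # Two speakers pronounce the vowel in "bath" differently.
    print(recognise(["b", "a", "th"]))
    print(recognise(["b", "ah", "th"]))
```

In this sketch, both pronunciations resolve to the same lexicon entry because the stored pattern ignores the fine detail of the vowel. A real system would have to extract such classes from audio, which is outside the scope of this example.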
Given the potential impact of this new approach and the broad range of applications, Oxford University Innovation welcomes discussions with potential development or integration partners.
(Patent applied for: GB1322377.1)