SONIC: Large Vocabulary Continuous Speech Recognition System

What is SONIC? 

SONIC is a toolkit for research and development of new algorithms for continuous speech recognition. Since March 2001, SONIC has served as the test bed at the Center for Spoken Language Research for research activities that include speech recognition as a core component. SONIC is an end-to-end solution for designing, training, and testing systems on many state-of-the-art speech recognition tasks. The recognizer can run in batch mode (processing many audio files sequentially) or in live mode (speak into the microphone and see output in real time). Moreover, the client/server implementation with voice codec support allows the recognizer to run on a server machine while servicing requests from thin clients connected over low-bandwidth IP connections (e.g., wireless 802.11b or even dial-up modem connections).

What can I do with it?

SONIC provides the tools researchers need to conduct meaningful speech recognition experiments or even to prototype live voice-enabled applications. The software includes tools for retraining the speech recognizer on new data, for integrating new language models (both statistical n-gram models and grammars), and for porting the recognizer to a new language. Speaker and environment adaptation routines are also provided, along with example applications for running experiments in either batch or live mode.

How accurate is it?

We have benchmarked SONIC on a number of standard speech recognition tasks ranging from continuous digits to complete large vocabulary transcription of conversational telephone speech.   The table below summarizes our current performance on several standardized benchmark tests.  Many results are comparable with state-of-the-art values reported in the literature.


| Speech Recognition Task (Description) | Vocabulary Size | Word Error Rate (without adaptation) | Word Error Rate (with adaptation) |
|---|---|---|---|
| TI-DIGITS (continuous spoken digits) | 11 | 0.4% | 0.2% |
| DARPA Communicator (realtime spoken dialog system, telephone speech related to travel domain) | 2.1k | 10.9% | N/A |
| Wall Street Journal, Nov 1992 5k eval (dictation task, high-quality microphone speech) | 5k | 3.9% | 3.0% |
| Wall Street Journal, Nov 1992 20k eval (dictation task, high-quality microphone speech) | 20k | 10.0% | 8.6% |
| DARPA/NRL SPINE (spoken dialogs, noisy military environments, microphone speech) | 3k | 42.2% | 31.0% |
| Switchboard (conversational telephone speech; NIST 2000 eval data, SWB eval results only) | 40k | 41.9% | 31.0% |
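
For readers unfamiliar with the metric, word error rate is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The sketch below illustrates that textbook definition only. It is not SONIC's scoring tool, the function names are hypothetical, and real benchmark scoring typically also applies text normalization that is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Textbook WER: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Relative error reduction, e.g. from unadapted to adapted WER."""
    return (before - after) / before


if __name__ == "__main__":
    # One substitution ("two" -> "too") and one deletion ("four"): 2 / 4 words.
    print(word_error_rate("one two three four", "one too three"))
    # SPINE row of the table: 42.2% without adaptation, 31.0% with adaptation.
    print(round(relative_reduction(42.2, 31.0), 3))
```

Read this way, the SPINE row corresponds to a relative error reduction of roughly 26.5% from adaptation, while the absolute reduction is 11.2 percentage points.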

How fast is it?

We have demonstrations at CSLR of SONIC running in real time on modern PCs (Intel Pentium 4, 2.2 GHz) with vocabularies of up to approximately 40k words. Further performance gains can be obtained using speaker adaptation (which can provide improved hypothesis pruning). Our long-term goal is to provide the research community with real-time speech recognition for vocabularies of up to 64k words.
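
"Real time" is commonly quantified with the real-time factor: processing time divided by audio duration, where a value of 1.0 or below means the recognizer keeps up with the incoming speech. The snippet below is a minimal sketch of that common definition. The function names and the timing numbers are hypothetical and are not measurements of SONIC.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: time spent decoding divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def runs_in_real_time(processing_seconds: float, audio_seconds: float) -> bool:
    """True when decoding takes no longer than the audio itself."""
    return real_time_factor(processing_seconds, audio_seconds) <= 1.0


if __name__ == "__main__":
    # Hypothetical example: 45 s of decoding for a 60 s recording.
    print(real_time_factor(45.0, 60.0))
    print(runs_in_real_time(45.0, 60.0))
```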

What about Support?  How often is the system updated?

The current version is 2.0-beta5 (last updated May 31, 2005). SONIC is written in ANSI C and is provided in binary executable format for operating systems including Linux, MS Windows, Sun Solaris, and Mac OS X. Acoustic models, extracted speech features, and language models are portable across machines with differing byte orders (e.g., one can train acoustic models on a Sun Solaris platform and test on an Intel Linux machine).
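
One common way to make binary model or feature files portable across byte orders is to fix the on-disk byte order and convert on read and write. The sketch below illustrates that general technique with Python's `struct` module. SONIC's actual file formats are not described here, so the layout (a 32-bit count followed by 32-bit floats, all little-endian) and the function names are assumptions made purely for illustration.

```python
import struct
from typing import List, Sequence


def pack_floats(values: Sequence[float]) -> bytes:
    """Serialize a count followed by float32 values, always little-endian."""
    header = struct.pack("<I", len(values))            # "<" forces little-endian
    body = struct.pack(f"<{len(values)}f", *values)
    return header + body


def unpack_floats(data: bytes) -> List[float]:
    """Read the layout written by pack_floats on any host byte order."""
    (count,) = struct.unpack_from("<I", data, 0)
    return list(struct.unpack_from(f"<{count}f", data, 4))


if __name__ == "__main__":
    # Values chosen to be exactly representable as 32-bit floats.
    features = [0.5, -1.25, 3.0]
    blob = pack_floats(features)
    print(unpack_floats(blob) == features)
```

Because the format string names the byte order explicitly, the same bytes decode to the same values on both big-endian and little-endian hosts.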

At this time we cannot offer any guaranteed support for the software, but we are eager to get your feedback as a user/tester. We will do our best to establish a FAQ web site and to work on important problems that arise. SONIC is updated about once every two months, generally when there is a major bug fix or when new system enhancements or modules have matured to the point of being distributed within the core recognition engine. Bug reports and comments can be emailed to the developer (pellom@cslr.colorado.edu).

Note: CSLR is contemplating offering 2- or 3-day workshops on training SONIC and porting it to new applications and languages. If you would like to be contacted should such workshops be offered, register here; they will be held only if there is sufficient interest.

What about License Restrictions?

SONIC is currently provided in binary executable format under a non-commercial use license to academic and non-profit organizations.  The license allows you to conduct basic research using the software and to make internal demonstrations using it. 

Download: How do I obtain it, and what will I get?

The technology transfer group at the University of Colorado has developed a relatively easy process for obtaining a non-commercial use license for SONIC. If you are a member of an academic group or non-profit organization, click here to read and execute the license agreement and to obtain a login/password for downloading the software.

Users registered for system download will have access to SONIC along with documentation and tutorials. Pretrained acoustic models for several large vocabulary microphone and telephone speech tasks (e.g., Wall Street Journal Dictation) are provided. We will provide conversational telephone acoustic models and models trained in a number of non-English languages in the coming months.

  • I am not registered and need a login & password for system download (Please contact Kate Tallman for SONIC download issues)
  • I am registered, take me to the download page
