SONIC is a toolkit for enabling research and development of new
algorithms for continuous speech recognition. Since March 2001, SONIC
has served as our test bed for research activities at the Center for
Spoken Language Research that include speech recognition as a core
component. SONIC is an end-to-end solution that allows one to design,
train, and test systems for many state-of-the-art speech recognition
tasks. The recognizer can run in batch mode (processing many audio
files sequentially) or in live mode (speaking into a microphone and
seeing the output in real time). Moreover, the client/server
implementation with voice codec support allows the recognizer to run
on a server machine while servicing requests from thin clients
connected through remote low-bandwidth IP connections (e.g., wireless
802.11b or even dial-up telephone modem connections).
What can I do with it?
SONIC provides the tools researchers need to conduct meaningful speech
recognition experiments or even to prototype live voice-enabled
applications. The software includes tools for retraining the speech
recognizer on new data, for integrating new language models (both
statistical n-gram models and grammars), and even for porting the
recognizer to a new language. Speaker and environment adaptation
routines are also provided, along with example applications for
running experiments in either batch or live mode.
How accurate is it?
We have
benchmarked SONIC on a number of standard speech recognition tasks
ranging from continuous digits to complete large vocabulary
transcription of conversational telephone speech. The table
below summarizes our current performance on several standardized
benchmark tests. Many results are comparable with
state-of-the-art values reported in the literature.
Speech Recognition Task Description | Vocabulary Size | Word Error Rate (without adaptation) | Word Error Rate (with adaptation) |
TI-DIGITS (continuous spoken digits) | 11 | 0.4% | 0.2% |
DARPA Communicator (real-time spoken dialog system; telephone speech in the travel domain) | 2.1k | 10.9% | N/A |
Wall Street Journal, Nov 1992 5k eval (dictation task; high-quality microphone speech) | 5k | 3.9% | 3.0% |
Wall Street Journal, Nov 1992 20k eval (dictation task; high-quality microphone speech) | 20k | 10.0% | 8.6% |
DARPA/NRL SPINE (spoken dialogs in noisy military environments; microphone speech) | 3k | 42.2% | 31.0% |
Switchboard (conversational telephone speech; NIST 2000 eval data, SWB eval results only) | 40k | 41.9% | 31.0% |
How fast is it?
We have demonstrated SONIC at CSLR running in real time on modern PCs
(Intel Pentium 4, 2.2 GHz) with vocabularies of up to approximately
40k words. Further speed gains can be obtained with speaker adaptation,
which can improve hypothesis pruning. Our long-term goal is to provide
the research community with real-time speech recognition for
vocabularies of up to 64k words.
What about Support?
How often is the system updated?
The current version is 2.0-beta5 (last updated May 31, 2005). SONIC is
written in ANSI C and is provided in binary executable format for
operating systems including Linux, MS Windows, Sun Solaris, and
Mac OS X. Acoustic models, extracted speech features, and language
models are portable across machines with differing byte orders (e.g.,
one can train acoustic models on a Sun Solaris platform and test them
on an Intel Linux machine).
At this time we cannot offer guaranteed support for the software, but
we are eager to get your feedback as a user/tester. We will do our
best to establish a FAQ web site and to work on important problems
that arise. SONIC is updated about once every two months, generally
when there is a major bug fix or when new system enhancements or
modules have matured to the point of being distributed within the
core recognition engine. Bug reports and comments can be emailed to
the developer (pellom@cslr.colorado.edu).
Note: CSLR is contemplating offering 2- or 3-day workshops on training
SONIC and porting it to new applications and languages, if there is
sufficient interest. If you would like to be contacted when such
workshops are offered, register here.
What about License Restrictions?
SONIC is currently provided in
binary executable format under a
non-commercial
use license to academic and non-profit organizations. The
license allows you to conduct basic research using the software and to
make internal demonstrations using it.
Download: How do I obtain it, and what will I get?
The technology transfer group at the University of Colorado has
developed a relatively easy process for obtaining a non-commercial use
license for SONIC. If you are a member of an academic group or
non-profit organization, click here to read and execute the license
agreement and to obtain a login/password for downloading the software.
Registered users will have access to SONIC along with documentation
and tutorials. Pretrained acoustic models for several large-vocabulary
microphone and telephone speech tasks (e.g., Wall Street Journal
dictation) are provided. We will provide conversational telephone
acoustic models and models trained in a number of non-English
languages in the coming months.
I am not registered and need a login and password for system download.
(Please contact Kate Tallman for SONIC download issues.)
I am registered; take me to the download page.