Course: Text-to-Speech Synthesis
Instructor: Alan W. Black
Personal page: http://www.cs.cmu.edu/~awb/
Affiliation: Language Technologies Institute, Carnegie Mellon University,
Pittsburgh, PA 15213
Lab website: http://www.lti.cs.cmu.edu/
General Description
The course is designed to cover all aspects of speech synthesis from
both a theoretical and practical point of view. Students are given the
opportunity to learn about new research in the areas of text processing,
prosodic modeling, and waveform synthesis, as well as practical experience
in using existing synthesis technologies. The course consists of the following
sub-parts:
- History and general use of speech synthesis;
- Text analysis: text conditioning, markup languages, homograph disambiguation;
- Lexicons and letter-to-sound rules;
- Prosodic modeling: phrasing, duration and intonation;
- Waveform synthesis: diphones and unit selection;
- Building new voices in new languages; and,
- Limited domain synthesis for practical applications.
Course Objectives
The objectives of this course are:
- To understand the basic parts of speech synthesis
- To understand the relative complexity of implementing solutions to
the problems
- To become familiar with the Festival architecture and know what it
can and can't do
As the instructor is a firm believer in learning by doing, this course
tries to touch on every aspect of speech synthesis from a practical view.
Problems are discussed in general terms, with some presentation of
potential theoretical solutions. Where appropriate, substantial exercises
are given, which will hopefully lead to a greater understanding of the actual
problems.
Learning Activities
The course is based heavily around the Festival Speech Synthesis System.
As Festival offers both an environment for building new synthetic voices
and an end-user delivery vehicle for black-box text-to-speech, it
provides an ideal platform for teaching students what can be done with today's
speech output technologies. Each week, simple exercises are assigned involving
different aspects of the system so that students can learn from practical
experience how the technology works. The system is designed such that
no low-level C++ programming is required, thus opening the course to
a much wider audience. In all cases, existing simple rules and functions
used in Festival are presented to students for modification using the
Scheme scripting language; this enables students to learn without having
to delve too deeply into the complexities of the system.
In addition to synthesis techniques, the students are led into the field
of building new synthetic voices in new and currently supported languages
based on the released documentation and scripts that are part of the CMU
FestVox Project (http://festvox.org).
These scripts and tools sit on top of Festival (and the Edinburgh Speech
Tools) and offer a complete environment for developing new synthetic voices.
In addition to the weekly exercises, a larger project is set towards
the end of the course.
History and Background of the Course
This course, now completing its second year, is primarily designed for
entering graduate students at CMU majoring in language technologies, computer
science or robotics. Although some students will continue their research
in speech synthesis, most are in more general areas of speech and language
processing. The attendees in the second year also included two senior
undergraduates. Some of the projects completed at the end of the course
have led to publications. Such projects have included cross-language
limited domain synthesis (a talking clock in Chinese and Polish weather
reports), Thai letter-to-sound rules, a talking Eliza program, complete
new female US English diphone voices, horoscopes, singing synthesis, and a
Catalan diphone synthesizer.
Although the CSLU Speech Toolkit itself is not used, the Festival Speech
Synthesis System is an integral part of the toolkit, so the course can
be taught using the toolkit under Windows. Moreover, all voices, techniques,
models, etc. developed within this course can be used directly in toolkit
applications.
Development of the course was sponsored in part by an NSF CRCD (Combined
Research and Curriculum Development) grant awarded to the University of
Colorado at Boulder.
Links to the Course Materials
The complete course notes and slides have been made available at
http://festvox.org/festtut/
This site will also be updated with some of the example student projects
that were completed and model answers to the exercises (some are already
on the general section of the FestVox site). We are continuing to update
these notes, and new releases will make it easier both for individual
students to follow the course notes and for institutions to use them
to teach their own courses.
If you have questions regarding the course, please e-mail the course
instructor. The following link will invoke your e-mail program:
Alan W. Black, Ph.D.