Voice Recognition

The framework of speaker recognition technology was developed in the 1960s. Since then, numerous technical groups have pursued intensive research and development, culminating in key innovations that now make speaker recognition feasible for many applications [4]. Speech is the principal and most inherent form of communication among humans. Because of this, and because speech is a primary form of personal identification (PI), people generally have no problem accepting it as a biometric. Advantages of using speech as a biometric include: it is simple to use, it feels natural to the user, it provides eyes-free and hands-free operation, it can easily be implemented to support remote recognition (via telephone, internet, etc.), and implementation is typically inexpensive (often requiring software only). Typical problems include: channel mismatch (e.g., different microphones for enrollment and verification), background noise, and inconsistent acoustics (e.g., lab environment for enrollment, office environment for recognition) [3]. In this paper, speaker recognition is defined as identifying or verifying the identity of someone using their voice. Of all the biometrics reviewed in this paper, the human voice is the only one that possesses both physical and behavioral characteristics. In fact, many sources define the voice as a behavioral biometric [1] [2] [5].

A typical human voice is formed when airflow from the lungs is carried by the trachea (windpipe) through the vocal folds (vocal cords), and the resulting acoustic waves pass out through the vocal tract. Speech features that allow us to discriminate between speakers are due to both physiological and behavioral aspects of the speech production system. The main physiological component of speech production is the vocal tract. As acoustic waves pass through the vocal tract, their frequency content is altered by its resonances. The main discriminating behavioral characteristics include speaking rate and dialect. From a uniqueness standpoint, some biometric "experts" feel that the human voice is not as rich in discriminative features as fingerprints and iris patterns [1]. This makes speaker recognition a poor candidate for "identification mode" operation when there is a large database of enrollees. From a permanence standpoint, the human voice is not necessarily stable over one's lifespan. Long-term changes include aging and disease. Short-term changes include stress, colds, and allergies. A system that updates the user's speaker model with each successful identification/verification might compensate for the long-term changes.
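
The model-update idea in the last sentence above can be sketched as follows. This is a minimal illustration, not a method taken from the cited sources. It assumes each utterance has already been reduced to a fixed-length feature vector, scores it against the stored model with cosine similarity, and blends accepted samples into the model with an exponential moving average. The function names, the threshold of 0.9, and the update rate of 0.1 are all hypothetical choices made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def verify_and_update(model, sample, threshold=0.9, rate=0.1):
    """Verify a sample against a stored speaker model.

    Returns (accepted, model). The model is changed only when the
    sample is accepted, so rejected attempts cannot shift it.
    threshold and rate are illustrative assumptions, not tuned values.
    """
    score = cosine_similarity(model, sample)
    if score < threshold:
        return False, list(model)
    # Exponential moving average: keep most of the old model and
    # move a small step toward the newly accepted sample.
    updated = [(1 - rate) * m + rate * s for m, s in zip(model, sample)]
    return True, updated


if __name__ == "__main__":
    enrolled = [1.0, 0.0]          # hypothetical enrolled model
    todays_voice = [1.0, 0.1]      # slightly drifted sample
    accepted, enrolled = verify_and_update(enrolled, todays_voice)
    print(accepted, enrolled)
```

A small update rate lets the stored model follow slow changes such as aging, while a short-term change such as a cold shifts it only slightly if that sample is accepted.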

Speaker-recognition-based PI is one of the oldest and most widely accepted biometric technologies. It is primarily a "verification mode" technology and is easy to use, inexpensive, and non-invasive. As a biometric, speaker recognition is considered somewhat deficient in the characteristics of permanence and uniqueness. Of the biometrics reviewed in this paper, it is the only technology that offers remote PI with existing resources. However, if a PI system developer requires "identification mode" operation, then other technologies will probably need to be considered.
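
The difference between "verification mode" and "identification mode" can be made concrete with a short sketch. Verification compares a sample against one claimed identity, while identification must compare it against every enrollee. The code below is an illustration only: the feature vectors, the function names, and the 0.9 threshold are hypothetical. The last function assumes that comparisons are independent and share the same false-match rate, which is a simplification.

```python
import math


def similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(claimed_id, sample, database, threshold=0.9):
    """Verification mode: one comparison against the claimed identity."""
    model = database.get(claimed_id)
    return model is not None and similarity(model, sample) >= threshold


def identify(sample, database, threshold=0.9):
    """Identification mode: compare against every enrollee.

    Returns the best-matching identity, or None if no score
    reaches the threshold.
    """
    best_id, best_score = None, threshold
    for speaker_id, model in database.items():
        score = similarity(model, sample)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id


def chance_of_false_match(per_comparison_rate, enrollees):
    """Probability of at least one false match across all enrollees,
    assuming independent comparisons (a simplifying assumption)."""
    return 1.0 - (1.0 - per_comparison_rate) ** enrollees


if __name__ == "__main__":
    db = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}  # hypothetical enrollees
    sample = [0.9, 0.1]
    print(verify("alice", sample, db))
    print(identify(sample, db))
    print(chance_of_false_match(0.001, 1000))
```

Under these assumptions, a per-comparison false-match rate of 0.1% grows to a chance of about 63% of at least one false match when the sample is compared against 1,000 enrollees, which illustrates why a biometric with limited uniqueness is better suited to verification than to identification over a large database.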


[1] A. Jain, R. Bolle, S. Pankanti, editors, "BIOMETRICS Personal Identification in Networked Society," Kluwer Academic Press, Boston, 1999.

[2] D. Zhang, "AUTOMATED BIOMETRICS Technologies and Systems," Kluwer Academic Publishers, Boston, 2000.

[3] J. Campbell, "Speaker Recognition: A Tutorial," Proceedings of the IEEE, Vol. 85, No. 9, September 1997, pages 1437-1462.

[4] Purdue University, School of Technology, Industrial Technology, Online Resources, Automatic Identification and Data Capture, http://www.tech.purdue.edu/it/resources/aidc/BioWebPages. Last accessed: 9 June 2001.

[5] D. Polemi, "Biometric Techniques: Review and Evaluation of Biometric Techniques for Identification and Authentication, Including an Appraisal of the Areas Where They are Most Applicable," Final Report, April 1997, http://www.cordis.lu/infosec/src/stud5fr.htm. Last accessed: 31 July 2001.