An introduction to Voice Biometrics

Voice Biometrics Explained

Most of us can recognize friends and family by their voices; however, the automatic treatment of
voice patterns is a complex problem that has taken a number of years to solve in a satisfactory
manner. Improved and highly accurate forms of voice (or speaker) verification have recently
become available based upon Voice Biometric Technology.

The voice is a personal characteristic, meaning something you are rather than something you know,
such as a PIN, password or even address. Just like fingerprints, the characteristics of the voice can
be used to identify a person in a unique way. Voice Biometrics is the technology used to extract
personal voice patterns and verify the identity of a speaker using just voice.

Voice Biometric Technology ignores what you are actually saying and also ignores how you are
saying it, for example your language, your accent, or your speaking style. These are the aspects
that individuals can change most easily, even though they are the characteristics that we as
humans use to identify people.

The information that identifies a speaker is embedded in the waveforms of the voice and is very
difficult to change or disguise. It stays the same regardless of the language you are speaking or
what you are saying. Voice Biometrics technology captures this information and uses it to identify speakers.

The most useful information carried by your voice is influenced by the physical structures in your head. Each one of us possesses a different physiology that generates information in the sound wave that is unique and personal.

When a person speaks, the air in the lungs passes through the vocal cords and then through a part of the anatomy called the vocal tract. This includes the larynx, the oral cavity and the nasal cavity.

By modifying the physical structure of the vocal tract we can articulate various phonemes and thus we are able to communicate. These physical transformations can be tracked with precision if we
analyze the frequency components of the resulting sound wave.

Voice Biometric Technology identifies and measures these frequency components, then groups them
together to form a voice model (or voiceprint) that is unique to the speaker.
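
As a rough illustration of what "analyzing the frequency components" of a sound wave means, the short Python sketch below breaks one frame of audio into its frequency components and reports the strongest one. It is only a sketch under stated assumptions: the function names, the synthetic two-tone signal standing in for speech, and the naive transform are all chosen here for clarity and are not taken from any particular voice biometric product.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Return the magnitude of each frequency bin (0 .. N/2) of a real frame.

    Uses a naive O(N^2) discrete Fourier transform so the sketch needs only
    the standard library. Real systems would use a fast transform.
    """
    n = len(frame)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(
            sample * cmath.exp(-2j * math.pi * k * i / n)
            for i, sample in enumerate(frame)
        )
        magnitudes.append(abs(total))
    return magnitudes


def dominant_frequency(frame, sample_rate):
    """Return the frequency in Hz of the strongest non-DC component."""
    magnitudes = magnitude_spectrum(frame)
    peak_bin = max(range(1, len(magnitudes)), key=lambda k: magnitudes[k])
    return peak_bin * sample_rate / len(frame)


# Hypothetical input: a 200 Hz tone plus a weaker 600 Hz tone, sampled at
# 8000 Hz (a common telephone sampling rate). This is not real speech.
sample_rate = 8000
frame = [
    math.sin(2 * math.pi * 200 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 600 * t / sample_rate)
    for t in range(400)
]

print(dominant_frequency(frame, sample_rate))
```

With this synthetic frame the strongest component is the 200 Hz tone. A voiceprint would be built from many such measurements taken over many frames, not from a single peak.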


Two types of error rates are used to measure the effectiveness of a security system: false
acceptance and false rejection:

  • False acceptance refers to the percentage of times that the system erroneously admits an impostor.
  • False rejection refers to the percentage of times that the system erroneously rejects a legitimate, authorized user.

These two rates are interconnected: lowering one raises the other, and vice versa. The balance between them is set by the decision threshold.
The conditions surrounding the application and the method of decision employed can impact the
error rates in a significant way. Voices that are captured in an office environment using a land line
sound different from the same voices captured on a mobile phone where there is a lot of
background noise, such as in a car or in a public place. Therefore, statistics and error rates are
not meaningful without knowledge of how the voice was captured and how the system is used. Industry studies have shown that for real
world applications the threshold can be set to allow a false acceptance rate of below 0.5% with false
rejection rates of less than 2%.
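
The relationship between the threshold and the two error rates can be sketched in a few lines of Python. The scores below are made-up illustrative values, not measurements from any system, and the function name and the convention that a score at or above the threshold means "accept" are assumptions made for this example.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_acceptance_rate, false_rejection_rate) as fractions.

    A caller is accepted when their score is at or above the threshold.
    """
    false_accepts = sum(1 for score in impostor_scores if score >= threshold)
    false_rejects = sum(1 for score in genuine_scores if score < threshold)
    return (
        false_accepts / len(impostor_scores),
        false_rejects / len(genuine_scores),
    )


# Hypothetical match scores (illustrative only).
genuine = [0.91, 0.85, 0.78, 0.66, 0.95]   # true speaker vs. own voiceprint
impostor = [0.30, 0.42, 0.55, 0.70, 0.20]  # other speakers vs. that voiceprint

for threshold in (0.5, 0.6, 0.75):
    far, frr = error_rates(genuine, impostor, threshold)
    print(f"threshold={threshold}: false acceptance={far:.0%}, false rejection={frr:.0%}")
```

Raising the threshold in this toy data set drives false acceptance down while false rejection rises, which is the trade-off described above.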

Applications of Voice Biometric Technology

A key strength of voice biometric technology is that it does not rely on external elements such as
passwords or PINs that could be used by someone other than the authorized user. Voice Biometric
technology relies on something you are (a person with biometric characteristics) rather than
something you know (a password, PIN, etc.). As a result, the technology is considerably more secure
than other methods usually employed. Only biometric technology can truly verify that you are who you claim to be.

In most commercial applications of this technology a telephone caller speaks an utterance that is captured by the biometric system, then converted into a biometric string. This biometric string is
then compared to a previously stored voiceprint string. This comparison process produces a score or likelihood of how well the utterance matches the stored voiceprint.

Voice (or speaker) verification typically combines the biometric technology with business logic that involves a user-enrolment process to capture and store a reference voiceprint for the caller. A verification process can then capture an utterance the next time the individual calls and match it against the stored voiceprint.
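
The enrolment and verification flow described above might be sketched as follows. This is a minimal illustration, not a description of how any commercial system works: the feature vectors are invented numbers, the function names are hypothetical, and cosine similarity is simply one plausible way to produce a match score, chosen here as an assumption.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def enrol(feature_vectors):
    """Average several enrolment utterances into one reference voiceprint."""
    count = len(feature_vectors)
    return [sum(column) / count for column in zip(*feature_vectors)]


def verify(voiceprint, utterance_features, threshold=0.9):
    """Score an utterance against a voiceprint; return (score, accepted)."""
    score = cosine_similarity(voiceprint, utterance_features)
    return score, score >= threshold


# Hypothetical feature vectors (illustrative only, not real voice data).
voiceprint = enrol([[1.0, 2.0, 3.0], [1.2, 1.8, 3.1]])

same_caller_score, same_caller_ok = verify(voiceprint, [1.1, 2.0, 3.0])
other_caller_score, other_caller_ok = verify(voiceprint, [3.0, 0.5, 0.2])

print(same_caller_ok, other_caller_ok)
```

The threshold of 0.9 is arbitrary; in practice it would be tuned against the false acceptance and false rejection rates that the application can tolerate.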

In the course of police investigations, DNA and fingerprint evidence is not the only material obtained; voices are also recorded on answering machines (voice mail), during bomb threats and during intercepted telephone conversations. Great strides have been made in the last three years, using phonetic and linguistic techniques, to improve the technology so that speakers can be identified with greater accuracy and precision. There are now many instances of voice biometric evidence having been accepted in courts of law across the world.

For further information contact:
The Biometrics Institute is the independent not-for-profit association providing information, education, research and testing of biometrics. It is predominantly a user group representing government departments and private organisations that are using or looking at using biometrics,[1] but suppliers also form part of the membership.


© 2013 Vicorp Services Limited. Registered in UK No: 05038031 | Registered Office: 3 Shaftsbury Court, Chalvey Park, Slough, Berkshire, SL1 2ER, UK

UK VAT registered number 174 0054 35