The Secret to Better Face Recognition Accuracy: Thresholds

Published on

September 27, 2018

Ben Virdee-Chapman

You’ll be pleased to hear you are not alone. And we’re here for you. This article gives you a seriously thorough guide to getting the most out of Kairos face recognition, and explains how— with some simple tweaks of a concept called thresholds— you’ll be rubbing your eyes in amazement in no time.

Let’s begin.

What is face recognition software and how does it work?

Face recognition technology is the biometric identification of a person by comparing a live capture or digital image with a stored, or ‘enrolled’ image for that person. During the enrollment process, the subject establishes her identity in the system by presenting her biometric image for scanning.

Next, the enrolled biometric is processed by biometric and encryption algorithms. Matching algorithms can then predict, by comparison, how closely a newly presented face matches the enrolled face, which enables the software to make a confident identification.

What are thresholds in face recognition?

Definition from NTIA (National Telecommunications and Information Administration)

“[Threshold] A user setting for Facial Recognition Systems for authentication, verification or identification. The acceptance or rejection of a Facial Template match is dependent on the match score falling above or below the threshold. The threshold is adjustable within the Facial Recognition System.”

For example: When you match faces against all the enrolled faces in your gallery, Kairos returns a confidence score between 0 and 1. The higher the number, the more confident the system is about the match.

Depending on your needs, you can set a threshold between 0 and 1: any score at or above the threshold is considered a match, and any score below it is considered a non-match. For example, if you set your threshold at 0.82, any result with a score equal to or greater than 0.82 would be considered a match.
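To make the rule concrete, here is a minimal sketch of the comparison in Python. The function name `is_match` is ours for illustration and is not part of the Kairos API; the 0.82 default simply mirrors the example above.

```python
def is_match(score: float, threshold: float = 0.82) -> bool:
    """Return True when a confidence score meets or exceeds the threshold."""
    if not 0.0 <= score <= 1.0 or not 0.0 <= threshold <= 1.0:
        raise ValueError("score and threshold must both lie between 0 and 1")
    # A score exactly equal to the threshold counts as a match.
    return score >= threshold


print(is_match(0.82))        # score equal to the threshold
print(is_match(0.81))        # score just below the threshold
print(is_match(0.90, 0.95))  # a stricter threshold rejects the same score
```

The only design decision here is the boundary: a score that lands exactly on the threshold is treated as a match.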

Why are Thresholds important?

A threshold, as established by the operating parameters, affects the error rate and therefore the integrity of the matches. Depending on what you are looking to achieve, this can have an effect on the overall success of your application.

Thresholds allow you to tweak a face recognition system based on the degree of accuracy you desire for your individual use case. The likelihood of a match can be increased by implementing a low threshold, as in the case of identifying amusement park goers for their roller coaster photo souvenir— or decreased by implementing a high threshold, as in the case of banking transaction verification.

In other cases, like “Which celebrity do I most closely resemble” games, the threshold doesn’t really matter at all. Face rec software is capable of producing such varying degrees of accuracy because a ‘match’ is essentially a statistical score that expresses the degree of similarity between two face biometric templates, with the user deciding how much similarity is required.

Generally speaking, we can state:

Low Threshold = More results; suitable for business non-critical use cases.
High Threshold = Fewer results; suitable for business critical use cases.

Understanding results

For each face found there may be a series of possible ‘identifiers’ (in the Kairos API, we call these ‘subject IDs’), each with the confidence that it is the person depicted. For instance, it might return the following results for a particular face: “Bob .95, Fred .70, Mary .40”, indicating that it is probably a picture of Bob, because Bob has a higher confidence score than the other candidates.

Kairos' API has set a default threshold of 0.60. This tells the API to report a successful match for a face if the software determines that there is a match with a confidence of 0.60 or higher. If all confidence levels are lower than 0.60, the API reports that there is no match and returns no list of potential subjects (names).
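The behavior described above can be imitated locally with a few lines of Python. This is only a sketch of the filtering logic, not the Kairos API itself: the function name `candidates_above` and the dictionary layout are assumptions made for illustration, and the scores are the Bob, Fred and Mary figures from the example.

```python
DEFAULT_THRESHOLD = 0.60  # the default threshold stated in this article


def candidates_above(scores: dict[str, float],
                     threshold: float = DEFAULT_THRESHOLD) -> list[tuple[str, float]]:
    """Keep subjects scoring at or above the threshold, best match first."""
    kept = [(subject, score) for subject, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


scores = {"Bob": 0.95, "Fred": 0.70, "Mary": 0.40}

print(candidates_above(scores))        # default threshold keeps Bob and Fred
print(candidates_above(scores, 0.90))  # a stricter threshold keeps only Bob
print(candidates_above(scores, 0.99))  # nothing clears the bar, so the list is empty
```

An empty list corresponds to the "no match" case: every candidate scored below the threshold.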

Threshold is actually another optional parameter that can be entered into the API, if the use case of your app requires it set at a level other than 0.60.

Example: “If you are verifying a secure financial transaction, the threshold should be 0.9 and upwards.”

Illustration of face recognition threshold charts and false acceptance rates

Be aware, though, that if the threshold is set too high, perfectly good matches are likely to be rejected. These are known in the industry as false rejects. Conversely, setting the threshold too low may result in faces “matching” that aren't correct. These are known as false positives, false accepts, or false matches.

Kairos’ matching guidelines give you an idea of how confidence thresholds affect the likelihood of us returning a false match. At the low end, a .50 confidence threshold means we’d likely misidentify one face in every 10. At the high end, we’re more certain: a .999 confidence threshold means we’re likely to be wrong 1 time out of a million.
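To see what those two guideline figures mean at volume, you can multiply the false match rate by the number of impostor comparisons. The sketch below does only that arithmetic. The 50,000 comparison volume is a hypothetical workload chosen for illustration, and no rates are inferred for thresholds between the two quoted above.

```python
def expected_false_matches(false_match_rate: float, impostor_comparisons: int) -> float:
    """Expected number of false matches = rate x number of impostor comparisons."""
    return false_match_rate * impostor_comparisons


# The only two data points quoted in this article: threshold -> false match rate.
GUIDELINE_RATES = {0.50: 1 / 10, 0.999: 1 / 1_000_000}

COMPARISONS = 50_000  # hypothetical volume, purely illustrative

for threshold, rate in GUIDELINE_RATES.items():
    expected = expected_false_matches(rate, COMPARISONS)
    print(f"threshold {threshold}: about {expected:g} expected false matches "
          f"in {COMPARISONS:,} impostor comparisons")
```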

Authentication accuracy

There are currently three variables which are considered primary influencers in ensuring the accuracy of face recognition— pose, illumination, and expression, or 'PIE'.

Illustrations of faces symbolizing the effects of pose, illumination, and expressions on face recognition accuracy — We toyed with the idea of using a photo of an actual pie.

In addition to PIE, avoidance of facial occlusion is also important. This includes things like sunglasses and scarves that cover the face so that it is unrecognizable.

Currently, government identification documents like passports, visas, and driver's licenses require a flat background and prohibit smiling and face accessories like glasses. Yet increasingly sophisticated face recognition systems can recognize people wearing glasses or a hat, and can determine identity even if you’ve cut your hair or grown a beard. This advancement, like nearly all advancements in face recognition, is the work of 'neural' technology and the adaptive capabilities involved in machine learning.

Machine learning is a data centric process, requiring massive amounts of information to effectively remember, learn, and adapt. The data, in the case of face recognition, is the number of enrolled images. For example: the higher the number of enrolled images of Chinese women, the more accurately a system will recognize a Chinese woman. And the opposite is also true. If there aren’t enough enrolled images, the system cannot be relied upon for accurate identification.

This has become a point of contention both socially and politically, as face recognition is increasingly being called upon to perform identification in law enforcement environments.

Read more about Kairos' stance on this in our exclusive TechCrunch op-ed.

Let's take a moment to define 'Error Rates'...

False Accept/Match: System claims a pair of pictures are a match, when they are actually pictures of different individuals.
False Accept Rate (FAR): Frequency with which the system makes False Accepts. Example: with a FAR of 0.1%, the system will make 1 false accept for every 1,000 impostor attempts.
False Reject: System claims a pair of pictures are a mismatch, when they are actually pictures of the same individual.
False Reject Rate (FRR): Frequency that the system makes False Rejects.
ID Rate = 100% minus FRR. Example: with an FRR of 2% (an identification rate of 98%), the system will reject 2 genuine matches for every 100 authorized attempts.
Equal Error Rate (EER): Used to choose the threshold at which the false acceptance rate and the false rejection rate are balanced. When the two rates are equal, the common value is referred to as the equal error rate; at that threshold, the proportion of false acceptances equals the proportion of false rejections. The lower the equal error rate, the higher the accuracy of the face recognition system.
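The definitions above translate directly into code. The following sketch computes FAR and FRR at a given threshold from two lists of scores, then sweeps thresholds to approximate the EER. The score lists are invented toy data, not Kairos measurements, and the simple 0.01-step sweep is just one assumed way of locating the crossover point.

```python
def far_frr(genuine: list[float], impostor: list[float], threshold: float) -> tuple[float, float]:
    """Return (FAR, FRR) at one threshold; a score >= threshold is an accept."""
    false_accepts = sum(score >= threshold for score in impostor)
    false_rejects = sum(score < threshold for score in genuine)
    return false_accepts / len(impostor), false_rejects / len(genuine)


def approximate_eer(genuine: list[float], impostor: list[float], steps: int = 100) -> tuple[float, float]:
    """Sweep thresholds from 0 to 1 and return (threshold, EER estimate)
    at the first point where FAR and FRR are closest to each other."""
    best = None
    for i in range(steps + 1):
        threshold = i / steps
        far, frr = far_frr(genuine, impostor, threshold)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, threshold, far, frr)
    _, threshold, far, frr = best
    return threshold, (far + frr) / 2


# Toy scores, invented for illustration only.
genuine = [0.91, 0.88, 0.79, 0.95, 0.62, 0.84, 0.97, 0.73, 0.90, 0.58]   # same-person pairs
impostor = [0.12, 0.35, 0.48, 0.61, 0.22, 0.05, 0.41, 0.66, 0.30, 0.18]  # different-person pairs

print(far_frr(genuine, impostor, 0.60))   # looser threshold: some false accepts
print(far_frr(genuine, impostor, 0.82))   # stricter threshold: more false rejects
print(approximate_eer(genuine, impostor))
```

Raising the threshold trades false accepts for false rejects, which is why the two rates cross at some point in between.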

Face Identification vs Face Verification

^ This is a common misunderstanding when we're talking to new customers, or responding to media interviews— so, let's clear up the definitions once and for all:

Face Identification is the process we have been discussing thus far, whereby the system compares a person’s biometric image to all other enrolled images— yielding a match or a non match based on these measurements combined with the threshold set by the operator.

Face Verification differs from identification, because rather than a person’s biometric image being compared to other enrolled images, it is compared only with that person’s individual template.
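The difference is easiest to see side by side: verification is a single 1:1 comparison, while identification is a 1:N search over the gallery. The sketch below is a toy illustration under loud assumptions. Templates are plain lists of numbers, cosine similarity stands in for the real (unspecified) scoring function, and the names `verify`, `identify` and `gallery` are ours, not Kairos API calls.

```python
from math import sqrt


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Stand-in similarity score; real systems use their own matcher."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))


def verify(probe: list[float], claimed_template: list[float], threshold: float) -> bool:
    """1:1 - compare the probe only with the template of the claimed identity."""
    return cosine_similarity(probe, claimed_template) >= threshold


def identify(probe: list[float], gallery: dict[str, list[float]], threshold: float):
    """1:N - compare the probe with every enrolled template and return the
    best-scoring subject ID, or None when nobody reaches the threshold."""
    best_id, best_score = None, threshold
    for subject_id, template in gallery.items():
        score = cosine_similarity(probe, template)
        if score >= best_score:
            best_id, best_score = subject_id, score
    return best_id


# Toy templates, invented for illustration only.
gallery = {"bob": [1.0, 0.0, 0.2], "mary": [0.1, 1.0, 0.3]}
probe = [0.9, 0.1, 0.2]

print(identify(probe, gallery, 0.8))        # searches the whole gallery
print(verify(probe, gallery["bob"], 0.8))   # checks one claimed identity
print(verify(probe, gallery["mary"], 0.8))  # a wrong claim is rejected
```

In both cases the threshold plays the same role; what changes is how many enrolled templates the probe is compared against.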

Bonus definition: Face Detection is the process of searching an image for the location of the face. In Kairos’ case, our system searches an image from the top left to the bottom right step-by-step, pixel-by-pixel, to find all the faces in an image at different locations and at different scales (size) of the face.

From a technical point of view, face recognition is really 'face identification'— however, it's common for 'face recognition' to be used as an umbrella term for the above (some folks even group emotion and demographic analysis into face recognition, and we're totally okay with that).

Here's another common question we get:

"Is face recognition software more reliable than the human eye?"

In terms of the algorithm's ability to compare images and identify someone based on precise calculations, yes, it is. In this way, face rec software is far more reliable.

However, as humans, we have the benefit of context to add to our capabilities at facial recognition. For example, if you saw someone who looks like your friend Greg sitting in Greg’s living room, you have a much higher confidence that it is in fact Greg— and not Johnny Depp. Because no matter how strongly Greg resembles the actor, why would Johnny Depp be in Greg’s living room?!

A face recognition system doesn’t have the ability to use context in its identification process; rather, it compares one or more images of faces to each other in isolation. So in terms of context, humans possess superior recognition skills. For now.

Photo of actor Johnny Depp — Seems legit.

When comparing face recognition systems

FAR and FRR are the most common metrics used to evaluate the performance of face rec systems. The FAR (false acceptance rate) is the probability of cases for which a biometric system inaccurately returns a positive identification/verification. False acceptance occurs when the system recognizes the wrong person at the time of verification.

The FRR (false rejection rate) is the probability of cases for which a biometric system inaccurately returns a no-match identification/verification. This happens when the system fails to match the biometric input with a stored template.

The key to keeping FAR and FRR low is precise extraction of features, as performance does not depend solely on the biometric algorithm. Poorly scanned biometric features will impede the ability of the algorithm to accurately identify and calculate the available minutiae, which will affect the overall performance of a biometric system.

Performance rates of a face rec system can be expressed in various ways: in decimal format (0.05), as a percentage (1%), as a fraction (1/100), or as a power of ten (10^-2). When comparing systems, the more accurate one would show a lower FRR at the same level of FAR.
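As a quick illustration, the sketch below writes one rate in each of those notations and then applies the comparison rule. The helper names and the two FRR figures for `system_a` and `system_b` are hypothetical, made up for this example.

```python
from fractions import Fraction
from math import log10


def rate_notations(rate: float) -> dict[str, str]:
    """Express one error rate as a decimal, percentage, fraction and power of ten."""
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be greater than 0 and at most 1")
    fraction = Fraction(rate).limit_denominator(1_000_000)
    return {
        "decimal": f"{rate:g}",
        "percent": f"{rate * 100:g}%",
        "fraction": f"{fraction.numerator}/{fraction.denominator}",
        "power_of_ten": f"10^{log10(rate):g}",
    }


def more_accurate_system(frr_at_same_far: dict[str, float]) -> str:
    """Given each system's FRR measured at the same FAR, return the lowest-FRR system."""
    return min(frr_at_same_far, key=frr_at_same_far.get)


print(rate_notations(0.01))

# Hypothetical FRR values, both measured at the same FAR.
print(more_accurate_system({"system_a": 0.02, "system_b": 0.05}))
```

The comparison only makes sense when both FRR figures were measured at the same FAR; otherwise one system may simply be running at a looser threshold.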

More terminology defined relating to face rec systems:

Even if you don’t read it, the search engines will ;-)

FMR False match rate: FMR is an empirical estimate of the percentage of times at which the system incorrectly states that a biometric sample belongs to the claimed identity, when the sample actually belongs to a different subject. During the performance evaluation, pairs of FRR and FAR (or FNMR and FMR) are calculated.
FNMR False non-match rate: An empirical estimate of the probability at which the system incorrectly rejects a claimed identity when the sample actually belongs to the subject.
EER Equal error rate: The rate at which FMR is equal to FNMR.
TAR True acceptance rate: Defined as 1 – FRR. This measure represents the degree that the biometric system is able to positively match the biometric information from the same person.
WER Weighted error rate: Defined as the weighted sum between FNMR (FRR) and FMR (FAR).
Template capacity: Template capacity gives us the maximum number of sets of data that can be stored in the system. The capacity to store biometric templates differs amongst biometric systems.
Matching speed: The time it takes for a person to be authenticated or identified using the software.
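Two of the terms above are simple formulas, sketched here in Python. TAR follows the definition given (1 minus FRR). For WER, the article only says "weighted sum", so the form used below, a single weight w on FNMR with (1 - w) on FMR, is an assumption; the input rates are illustrative values.

```python
def true_acceptance_rate(frr: float) -> float:
    """TAR = 1 - FRR."""
    return 1.0 - frr


def weighted_error_rate(fnmr: float, fmr: float, fnmr_weight: float = 0.5) -> float:
    """WER as w * FNMR + (1 - w) * FMR; this weighting scheme is an assumption."""
    if not 0.0 <= fnmr_weight <= 1.0:
        raise ValueError("fnmr_weight must lie between 0 and 1")
    return fnmr_weight * fnmr + (1.0 - fnmr_weight) * fmr


print(true_acceptance_rate(0.02))             # FRR of 2% gives a TAR of 98%
print(weighted_error_rate(0.02, 0.001))       # equal weighting of both error types
print(weighted_error_rate(0.02, 0.001, 0.1))  # weighting false matches more heavily
```

A lower weight on FNMR suits security-sensitive uses where false matches are the costlier error; a higher weight suits convenience-focused uses.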

Got all that!?

Closing thoughts

Current Face Recognition systems rely heavily upon favorable lighting conditions, orientations, and scale. Yet, as digital culture migrates to ‘realtime’ biometric identity verification through mobile and IoT devices, reliability has to extend beyond these restrictions— so we are working rigorously to train these systems to be more effective in ‘real world’ conditions presently considered unfavorable for accuracy.

The development of this brand of truly lifestyle compatible Face Recognition is in direct alignment with cultural demand. And Kairos, along with other conscious providers, is actively pushing to provide systems with the enrollment data necessary for truly reliable, inclusive accuracy for all human faces.

Still, we see thresholds as a key component to 'tuning' your specific implementation of Kairos. Now that you (hopefully) have a better understanding of their power, we can't wait to see what you build.
