BehavioSec in a Real World E-Banking Environment

A case study of BehavioSec in a real world E-banking environment

Summary

This paper is the result of a pilot project performed with Danske Bank during 2012/2013. Danske Bank installed a timing collector into their online E-banking solution for a limited number of users at the end of 2012. Data collection ran for a few months so that a sufficient number of transactions could be seen for the majority of the users.

Once sufficient amounts of data were recorded we performed an analysis to calculate system accuracy.
When considering a session (a combination of 4 different input forms) BehavioSec can correctly verify the identity of the user 99.7% of the time while also detecting an imposter 99.7% of the time. This is a very high level of accuracy among all types of biometrics.

Introduction

Behaviometrics is a combination of “behavioral” (the way a person behaves) and “biometrics” (technologies and methods that measure and analyze biological characteristics of the human body for authentication purposes, e.g. fingerprints, eye retinas, and voice patterns). Behaviometrics, or behavioral biometrics, is a measurable dataset of behavior used to recognize or verify the digital identity of a person. Behaviometrics focuses on user behavior patterns rather than physical characteristics.

After a user is verified with traditional security techniques like passwords, behavioral biometrics enhances user protection through continuous monitoring across the length of their session, providing continuous authentication of the user based on their actions.

A biometric authentication system decides whether a user should be accepted into a system. A false accept occurs when an unauthorized user is incorrectly admitted; conversely, a false reject occurs when an authorized user is turned away. The False Accept Rate (FAR) is the proportion of impostor attempts that succeed in gaining access, while the False Reject Rate (FRR) is the proportion of attempts by legitimate users that are rejected.
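As a minimal sketch of these two rates, the Python function below computes FAR and FRR from raw attempt counts. The function name, parameter names, and the counts in the usage example are illustrative assumptions and are not taken from the pilot data.

```python
def error_rates(impostor_accepted, impostor_total, genuine_rejected, genuine_total):
    """Return (FAR, FRR) as fractions between 0 and 1.

    FAR = accepted impostor attempts / all impostor attempts
    FRR = rejected genuine attempts  / all genuine attempts
    """
    if impostor_total <= 0 or genuine_total <= 0:
        raise ValueError("both attempt totals must be positive")
    far = impostor_accepted / impostor_total
    frr = genuine_rejected / genuine_total
    return far, frr


# Made-up counts, for illustration only.
far, frr = error_rates(impostor_accepted=3, impostor_total=1000,
                       genuine_rejected=5, genuine_total=2000)
print(f"FAR = {far:.2%}, FRR = {frr:.2%}")  # FAR = 0.30%, FRR = 0.25%
```

Note that the two rates use different denominators: FAR is measured over impostor attempts only, and FRR over genuine attempts only.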

A system using continuous authentication through behavioral biometrics references a set of behavioral traits and calculates a “similarity ratio” between the user’s current behavior and their recorded behaviors. This ratio is compared against a calibrated threshold, so that if the similarity drops below the tolerance level, the user is detected as an impostor.

Calculating the Performance of a Static Biometric System

Biometric systems are defined by the US National Institute of Standards and Technology (NIST) as systems utilizing “automated methods of recognizing a person based on physiological or behavioral characteristics” [9].

A biometric system identifies and categorizes data collected from the user through a sensor. It captures a mathematical representation of the biometric samples to be used later in an authentication process. These systems have Identification and Verification modes. Identification compares a given sample to a dataset derived from a user group to identify a particular user. Verification confirms the claimed identity of a particular user by matching the collected sample against that user’s existing dataset. If the sample matches the data in that previously stored template, the verification is successful.

Effects of Calibrated System Thresholds

Behavioral biometrics systems separate impostors from authentic users by comparing the user’s score against the threshold: the higher the score, the more likely the user is authentic. This conditional approach is similar to the one used in articles [4, 7, 6, 3, 8, 1].

Consider figure 3.2 below. With the threshold set to 30, samples 1, 2 and 5 score above the threshold and are considered to come from the correct user, whereas samples 3 and 4 are determined to come from the wrong user.
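The threshold decision can be sketched in a few lines of Python. The scores below are hypothetical values invented only to reproduce the accept/reject pattern described for the figure; they are not the figure's actual data. Treating a score exactly equal to the threshold as accepted is also an assumption.

```python
THRESHOLD = 30  # threshold value used in the example above

# Hypothetical similarity scores (0-100) for samples 1..5.
# Invented for illustration; not read from the original figure.
scores = {1: 62, 2: 45, 3: 18, 4: 27, 5: 71}


def is_accepted(score, threshold):
    """Accept a sample when its score reaches the threshold (assumed rule)."""
    return score >= threshold


accepted = [sample for sample, score in scores.items() if is_accepted(score, THRESHOLD)]
rejected = [sample for sample, score in scores.items() if not is_accepted(score, THRESHOLD)]

print("accepted as correct user:", accepted)  # [1, 2, 5]
print("rejected as wrong user:", rejected)    # [3, 4]
```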

As highlighted in article [5], there are many other metrics that must be considered to test the accuracy of a behavioral biometrics system. Unfortunately, those metrics depend on the system’s structure (identification, verification, fixed threshold, number of enrolled users and number of templates per user), which must be taken into account when comparing the performance of competing systems [10].

In theory, a sample from the correct user will always score higher than a sample from an impostor, so a single threshold would be enough to reject every impostor from the system.

The matching algorithm for the system uses a tuned threshold to decide how close an input sample needs to be to the template dataset for it to be considered a match. When the threshold is reduced there are more false accepts. Correspondingly, a higher threshold increases the false reject rate.

No matter how the threshold is set, some classification errors will occur. In some cases, impostors can generate scores that are higher than those of patterns from an authentic user. Choosing the threshold value becomes a problem when the score distributions of the correct user and the impostor overlap.

Some behavioral biometrics systems only specify a FAR value. A single FAR is not sufficient since it is possible that the system with the lowest FAR also has a correspondingly high FRR.

In cases where the authentication threshold is adjustable, it can be difficult to determine from FAR and FRR values alone whether any particular setting performs more accurately. To get a measurement independent of the threshold, the Equal Error Rate (EER) can be used. The lower the EER, the more accurate the system is considered to be.
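One common way to estimate the EER from measured curves is to pick the threshold where FAR and FRR are closest and average the two rates at that point. The sketch below assumes this approach; the function name and the curve values are illustrative and are not results from this study.

```python
def equal_error_rate(curve):
    """Estimate the EER from (threshold, far, frr) points.

    Picks the point where |FAR - FRR| is smallest and returns
    (threshold, eer), with eer taken as the mean of FAR and FRR there.
    """
    threshold, far, frr = min(curve, key=lambda point: abs(point[1] - point[2]))
    return threshold, (far + frr) / 2


# Illustrative curve: FAR falls and FRR rises as the threshold increases.
curve = [
    (20, 0.200, 0.001),
    (40, 0.080, 0.010),
    (50, 0.030, 0.030),
    (60, 0.010, 0.070),
    (80, 0.001, 0.250),
]

threshold, eer = equal_error_rate(curve)
print(f"EER of {eer:.1%} at threshold {threshold}")  # EER of 3.0% at threshold 50
```

With a coarse threshold grid the two curves may not cross exactly on a measured point, which is why the sketch averages FAR and FRR rather than assuming they are equal.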

Method

BehavioSec set up a simulation of the BehavioSec system and fed it the user data captured by Danske Bank, to demonstrate how the system works in a real-time production environment.

Building user profiles

User profiles were built chronologically by inserting every transaction into the simulator, user by user. In this stage every transaction was considered to have been made by the correct user. After the training phase (the first 10 insertions), the scores from each insertion were used to calculate the FRR of the system.

Simulating attacks

To calculate the FAR it was necessary to simulate imposter login attempts. We simulated 5 imposter attacks against each profile using timing data from a set of unauthorized users. These scores were used to calculate the False Accept Rate of the system.

Finding the optimal detection threshold

We calculated the FAR and FRR curves by varying the threshold, starting from 0 and incrementing to 100. The FAR values were calculated by recording how many simulated attacks were accepted at any given threshold. FRR values were calculated by counting the number of authentic transactions that were rejected at that same threshold.
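The sweep described above can be sketched as follows. This is not the simulator used in the study: the function name, the score lists, and the rule that a score equal to the threshold counts as accepted are all assumptions made for illustration.

```python
def far_frr_curves(genuine_scores, impostor_scores, thresholds=range(0, 101)):
    """Return a list of (threshold, far, frr) for each threshold.

    A score is treated as accepted when score >= threshold (assumed rule).
    """
    curve = []
    for t in thresholds:
        # Impostor samples that would be let in at this threshold.
        false_accepts = sum(1 for s in impostor_scores if s >= t)
        # Genuine samples that would be turned away at this threshold.
        false_rejects = sum(1 for s in genuine_scores if s < t)
        curve.append((t,
                      false_accepts / len(impostor_scores),
                      false_rejects / len(genuine_scores)))
    return curve


# Invented similarity scores (0-100), for illustration only.
genuine = [55, 62, 70, 81, 90, 48, 77, 66]
impostor = [10, 22, 35, 41, 52, 18, 27, 30]

curve = far_frr_curves(genuine, impostor)
print(curve[0])    # (0, 1.0, 0.0): everything is accepted
print(curve[50])   # (50, 0.125, 0.125): FAR and FRR are equal here
print(curve[100])  # (100, 0.0, 1.0): everything is rejected
```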

Delimitations/Assumptions

  • The simulated attacks were built up by combining data from other known users, not necessarily reflecting a real attacker.
  • The results found in this report are based on a limit of 3 user attempts to perform a login before lockout.
  • When calculating the FRR of the system we have assumed that all transactions have been made by the correct user.

However, data from a live environment could include attacks or account sharing that can potentially skew the FRR and reduce accuracy, making the referenced FRR a model of a ’worst case’ FRR.

Results

The results are presented as the Equal Error Rate (EER) calculated from the Danske Bank data.

Input fields
The timing data was gathered from 4 different input forms with corresponding input fields:

  • Login form
    • Username (6-10 characters, static)
    • Password (4-8 characters, static, anonymous)
  • Verification form
    • One-Time Password (6 characters, free text, anonymous)
  • Fund Transfer form
    • 3-5 free text fields (various lengths)
  • Signature form
    • Password (same as login form)

Login Accuracy (Username & Password)

Using only the login form, we can correctly determine whether the person typing is the correct user or an impostor in 97.4% of cases.

Session Accuracy

By combining the results from each individual transaction over a whole session we can see that the accuracy significantly increased to 99.7%. To test this we defined a session as the following combined transactions:

  • Login (Username/Password)
  • One-Time Password
  • Fund transfer
  • Signing of transfer (Password)
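The paper does not state how the individual transaction scores are combined into a session score. Purely as an illustration, the sketch below assumes a plain average of the four step scores; the step names, function names, scores, and threshold are all hypothetical.

```python
from statistics import mean

# The four transactions that make up a session in this study.
SESSION_STEPS = ("login", "one_time_password", "fund_transfer", "signature")


def session_score(step_scores):
    """Combine per-step scores into one session score.

    ASSUMPTION: a simple mean is used here; the actual combination
    rule used by the system is not described in the paper.
    """
    missing = [step for step in SESSION_STEPS if step not in step_scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    return mean(step_scores[step] for step in SESSION_STEPS)


def session_accepted(step_scores, threshold):
    """Accept the session when the combined score reaches the threshold."""
    return session_score(step_scores) >= threshold


# Hypothetical genuine user who typed the one-time password unusually.
scores = {"login": 72, "one_time_password": 28, "fund_transfer": 80, "signature": 68}
print(session_score(scores))         # 62
print(session_accepted(scores, 50))  # True
```

Under this assumed rule, one atypical transaction (the one-time password above) does not cause a rejection by itself, which is one plausible reason a session-level decision can be more accurate than a single-form decision.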

Conclusions

Testing BehavioSec in a Danske Bank test environment showed great potential. Danske Bank has decided to conduct further field tests and is adding behavioral biometric technology to its E-Banking platform for testing with actual customers.

For single login attempt tests (username and password input) we observed an Equal Error Rate (EER) of 2.6% (see Figure 4). This indicates that the system correctly distinguishes the correct user from an impostor 97.4% of the time when a single login attempt is made.

Results can be further improved by combining the test results from multiple transactions into a session score (see Figure 5). For whole sessions we calculate an EER of 0.3%, indicating that the system correctly distinguishes the correct user from an impostor 99.7% of the time.

Adjusting the threshold allows the system to be fine-tuned to specific needs. Increasing security raises the risk of false rejections, and vice versa. If desired, a lower threshold can be selected so that correct users are always accepted, at the cost of more successful impostor logins. Even when the threshold is lowered to give near 0% false rejects, most impostor attempts will still be blocked.
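This trade-off can be sketched by searching for the highest threshold that keeps the FRR within a chosen limit and then reading off the FAR at that threshold. The function name and score lists are illustrative assumptions, and a score equal to the threshold is assumed to be accepted.

```python
def threshold_for_max_frr(genuine_scores, impostor_scores, max_frr):
    """Return (threshold, far, frr) for the highest threshold in 0..100
    whose FRR does not exceed max_frr.

    FRR can only grow as the threshold rises, so the last threshold
    that satisfies the limit is the strictest acceptable one.
    """
    best = None
    for t in range(0, 101):
        frr = sum(1 for s in genuine_scores if s < t) / len(genuine_scores)
        if frr <= max_frr:
            far = sum(1 for s in impostor_scores if s >= t) / len(impostor_scores)
            best = (t, far, frr)
    return best


# Invented similarity scores (0-100), for illustration only.
genuine = [55, 62, 70, 81, 90, 48, 77, 66]
impostor = [10, 22, 35, 41, 52, 18, 27, 30]

threshold, far, frr = threshold_for_max_frr(genuine, impostor, max_frr=0.0)
print(threshold, far, frr)  # 48 0.125 0.0
```

In this made-up example, demanding zero false rejects lowers the threshold to 48, yet 7 of the 8 impostor samples are still blocked.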

Definitions

Sample is the blob of behavioral data that is collected when typing in a text field.

Profile is a condensed numerical representation of the behavior which is unique for each individual user. The profile is built by collecting and analyzing samples.

Insertions is a count of how many times a profile has been updated with data from new samples.

Similarity Score between 0 and 100 is calculated when comparing a collected sample against a profile. The higher the score, the more probable it is that the sample comes from the correct person.

Threshold is used to separate the impostor from the correct user and has a direct link to the False Accept Ratio (FAR) and False Reject Ratio (FRR). If the score is above the threshold, the sample is considered to come from the correct user; if the score is below the threshold, it is considered to come from an impostor. The threshold can be set anywhere in the range 0 to 100.

False Accept Ratio (FAR) is the percentage of samples that incorrectly score above the threshold, i.e. the percentage of patterns known to belong to an incorrect user that are falsely accepted as coming from the correct user. A high threshold makes it less likely for incorrect samples to be accepted.

False Reject Ratio (FRR) is the percentage of samples that incorrectly score below the threshold, i.e. the percentage of patterns known to belong to the correct user that are falsely rejected as coming from an impostor. A low threshold makes it less likely for correct samples to be rejected.

Equal Error Rate (EER) is the error rate at the threshold where the curves for FAR and FRR intersect, i.e. the point at which FAR and FRR are equal. It is used to summarize the accuracy of a system.

For more information please contact sales at
BehavioSec, Västra Trädgårdsgatan 11
SE-111 53 Stockholm, Sweden
Phn. +46(0)920-75045 Fax. +46(0)920-75010
contact@behaviosec.com
www.behaviosec.com