Anomaly detection


This project examines implementing network security based on the assessment of anomalies as an added layer to address the threats we do not know we don’t know, attempting to filter the noise out of the bulk of feedback from data security systems.

Information systems move a lot of data, externally and internally: automatically generated, produced by users, processes, and systems; automated and manual; virtual and physical. This leads to a mountain of data, and in an environment where our priority is to establish and maintain the “integrity, confidentiality and availability” of information, this bulk feedback poses more of a problem than a solution.

Analysing data events as acceptable or unacceptable is subject to the requirements of the organizational unit. The objective of this project is to further reduce the data classification to threats that are known and unknown: threats not defined by signatures, unwanted behaviour, predefined rule sets, etc. We attempt to extract from the bulk of data/information the outlier that is flagged as neither acceptable nor a threat, and to assess its implications.

We have to keep in mind that “anomaly detection systems” are not a single solution to threat management; in fact, they are ineffective on their own. Outlier detection is most effective at finding unknown threats, while for known and zero-day threats signature- and behaviour-based systems come into play (see the table below):

When to use anomaly detection systems:

Threat status | Anomaly  | Signature | Behaviour
Known         | Not good | Best      | Good
Zero-day      | Good     | Not good  | Very good
Unknown       | Best     | Not good  | Best
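The table above can be encoded as a simple lookup. This is an illustrative sketch, not part of the project: the dictionary values are copied from the table, and the function name and tie-breaking by insertion order are assumptions.

```python
# Suitability ratings per threat status, copied from the table above.
SUITABILITY = {
    "known":    {"anomaly": "not good", "signature": "best",     "behaviour": "good"},
    "zero-day": {"anomaly": "good",     "signature": "not good", "behaviour": "very good"},
    "unknown":  {"anomaly": "best",     "signature": "not good", "behaviour": "best"},
}

# Ordering of ratings from worst to best.
RANK = ["not good", "good", "very good", "best"]

def best_system(threat_status: str) -> str:
    """Return the detection approach rated highest for a given threat status."""
    ratings = SUITABILITY[threat_status]
    return max(ratings, key=lambda system: RANK.index(ratings[system]))
```

For "unknown", anomaly and behaviour are tied at "best"; `max` keeps the first, which matches the point of the table that anomaly detection earns its place on unknown threats.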

Anomaly detection is also very subjective and demands proper analysis. The baseline used as the initial point of reference, the system at its safe/optimal operation, has to be established first. This is critical, as creating a false baseline will adversely affect the final result. In the event of an “advanced persistent threat”, a granular understanding of the environment is required to implement a model that will effectively identify the ongoing persistent variance rather than classify it as normal.

Methodology: We follow a basic methodology to establish a working, sustainable learning life-cycle model. It is constructed of six elements:

Classification: The primary interest of this project is not the standard set-up for an information security management system but to establish “system variants”. Our classification does not follow the standard approach to information systems; instead it establishes a scalable model that should apply to any given system. It classifies components of the system and the network data they generate, defining the baseline and then adding new entries as the environment requires, so that variations (anomalies) in data/traffic/information usage and access can be established. Classification for analysis would include data components such as IP addresses, user names, bytes, domains, reputation, deep packet inspection elements, etc.
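As a minimal sketch of this classification step, the function below reduces a raw log record to a few of the data components named above. The field names and the record shape are assumptions for illustration, not the project's actual schema.

```python
def classify_event(raw: dict) -> dict:
    """Reduce a raw log record to the classification components of interest.

    Only a few of the components listed above (IP address, user name,
    bytes, domain) are shown; names are illustrative.
    """
    return {
        "src_ip": raw.get("src_ip"),
        "user":   raw.get("user"),
        "bytes":  int(raw.get("bytes", 0)),  # normalise byte counts to int
        "domain": raw.get("domain"),
    }
```

Fields not in the classification (anything else in the raw record) are dropped, which is what reduces the “mountain of data” to the components worth analysing.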


Behaviour analysis: Having classified our interesting data, we need to assess usage and define behaviour. We need to establish how, why, when, what, and by whom or by what our classified data is used, giving us a bird’s-eye view of the heartbeat rhythm that defines normal in the given infrastructure.
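The who/when/what view described above can be sketched as a simple aggregation over classified events. The profile shape (per-user counts by hour and by domain) is an assumption chosen for illustration.

```python
from collections import defaultdict

def behaviour_profile(events):
    """Aggregate classified events into per-user activity counts by hour
    and by domain -- a rough who/when/what picture of 'normal'."""
    profile = defaultdict(lambda: {"hours": defaultdict(int),
                                   "domains": defaultdict(int)})
    for e in events:
        p = profile[e["user"]]
        p["hours"][e["hour"]] += 1      # when the user is active
        p["domains"][e["domain"]] += 1  # what the user talks to
    return profile
```

A user whose profile shows activity only at hour 9 against one domain gives a concrete meaning to “normal” that later stages can measure deviations against.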


Establish baseline: Threat modelling for anomaly detection requires that we establish a baseline: the point from which deviations are measured, built from the feedback of observing network traffic class behaviour under what we define as optimal working conditions, which sets our default markers. This baseline is fluid; with every new data definition the baseline shifts. Behaviour analysis and classification of all new data help maintain a valid, secure new baseline, which is redefined at every refresh cycle of the process.
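A fluid, per-refresh-cycle baseline can be sketched as a sliding window of observations whose statistics are recomputed as new data arrives. The metric (e.g. bytes transferred per interval) and the window size are assumptions.

```python
import statistics

class Baseline:
    """Fluid baseline: statistics are recomputed from the most recent
    window of observations, so the baseline shifts with every new data
    definition, as described above."""

    def __init__(self, window: int = 100):
        self.window = window
        self.values: list[float] = []

    def observe(self, value: float) -> None:
        """Fold a new observation in; old ones age out of the window."""
        self.values.append(value)
        self.values = self.values[-self.window:]

    @property
    def mean(self) -> float:
        return statistics.fmean(self.values)

    @property
    def stdev(self) -> float:
        return statistics.pstdev(self.values)
```

Because only the latest window is kept, a refresh cycle is just continued observation; there is no separate retraining step in this sketch.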


Anomaly Detection: The bulk of our research may be at this stage; that said, not getting the best out of the previous stages would derail its results. The baseline will encompass data from known threats, normal usage, behaviour, etc. Once it has been established, we are tasked with finding those odd occurrences that vary from it.
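The simplest form of “occurrences that vary from the baseline” is a z-score rule against the baseline's mean and standard deviation. This is a sketch of one possible detector, not the project's implementation; the threshold k=3 is an assumption.

```python
def is_anomaly(value: float, mean: float, stdev: float, k: float = 3.0) -> bool:
    """Flag values more than k standard deviations from the baseline mean.

    A simple z-score rule; k=3 is an illustrative threshold, not a
    project-defined one.
    """
    if stdev == 0:
        # Degenerate baseline: any deviation at all is anomalous.
        return value != mean
    return abs(value - mean) / stdev > k
```

The baseline's mean and standard deviation would come from the previous stage; anything this rule flags moves on to anomaly analysis.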


Anomaly Analysis: There will be many variations from the baseline, and not all of them will constitute interesting events. Before we can rule on functionality we need to filter out the false positives and false negatives. This will again lead to re-learning the systems and re-establishing classifications and behaviour. We intend to cross-check a variation across different platforms (open-source, development, and trial) with the objective of assessing which produces the best results.
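Cross-checking to filter false positives can be sketched by requiring independent detectors to agree before a variation is treated as interesting. The choice of z-score and IQR detectors, and the agreement rule, are assumptions for illustration.

```python
import statistics

def zscore_flag(value: float, sample: list[float], k: float = 3.0) -> bool:
    """Flag a value more than k standard deviations from the sample mean."""
    sd = statistics.pstdev(sample)
    return sd > 0 and abs(value - statistics.fmean(sample)) / sd > k

def iqr_flag(value: float, sample: list[float], k: float = 1.5) -> bool:
    """Flag a value outside k interquartile ranges of the sample."""
    q1, _, q3 = statistics.quantiles(sample, n=4)
    iqr = q3 - q1
    return value < q1 - k * iqr or value > q3 + k * iqr

def cross_check(value: float, sample: list[float]) -> bool:
    """Treat a variation as interesting only when both detectors agree,
    filtering variations that a single detector flags in isolation."""
    return zscore_flag(value, sample) and iqr_flag(value, sample)
```

Demanding agreement trades some sensitivity (more false negatives) for precision (fewer false positives), which is the judgement call this stage is assessing across platforms.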


Adaptation: What is done after the anomalies are detected is subject to their implications. The model will have to be adjusted to fit the new state of the data. At the current state of the project, each anomaly falls into one of two scenarios:

  • Adaptive anomalies: anomalies that are not threats and need to be fed back into the system to adjust the classification metrics, thus adapting to the new landscape.
  • Threat-driven anomalies: the primary objective of the project is a security system that identifies unknown threats, then mitigates and contains the incidents before they can cause any damage. For this to work effectively there has to be a means of adapting the threat management system to the mitigation.
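
The two scenarios above amount to a routing decision. The sketch below assumes an anomaly record with an `is_threat` flag and a simple list standing in for the threat management hand-off; both are illustrative, not the project's interfaces.

```python
def adapt(anomaly: dict, baseline_sample: list[float], threat_queue: list) -> None:
    """Route a detected anomaly per the two scenarios above.

    Adaptive (benign) anomalies are fed back into the baseline sample so
    the classification metrics adjust to the new landscape; threat-driven
    anomalies are handed off to the threat management system.
    """
    if anomaly["is_threat"]:
        threat_queue.append(anomaly)             # hand off for mitigation/containment
    else:
        baseline_sample.append(anomaly["value"])  # fold back into the baseline
```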

We will address the means of integrating results from the anomaly detection system into the threat management system with the least friction in “Adapting the threat management system to unknown threats”.