Unlike legacy signature-based detection systems, today’s generation of AI-powered security technologies is rarely suited to a plug-it-in-and-watch-it-light-up evaluation strategy.
They often combine supervised and unsupervised machine learning, automated threat hunting and trained classifiers, with a focus on reducing erroneous and unactionable alerts. As a result, evaluating their detection efficacy requires carefully planned testing before making a buying decision.
For products focused on detecting network-borne threats, it is always wise to account for the learning period required to baseline the network.
Most threats can be quickly detected and labelled using supervised learning and pretrained detection models (think in terms of n-dimensional signatures). But there are still many threats and classes of threats within the network – such as low-and-slow data exfiltration and lateral movement – that require an understanding of what normal traffic looks like before effective detections can be made.
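That distinction can be sketched with a toy unsupervised baseline: until the detector has observed enough normal traffic for a host, it cannot meaningfully flag a deviation. The class name, window sizes, thresholds and byte counts below are illustrative assumptions, not any vendor’s implementation.

```python
from collections import deque
from statistics import mean, stdev

class HostBaseline:
    """Toy per-host traffic baseline (hypothetical sketch, not a product API)."""
    def __init__(self, window=100, min_learning=30, threshold=3.0):
        self.samples = deque(maxlen=window)   # rolling window of recent traffic
        self.min_learning = min_learning      # samples needed before detections fire
        self.threshold = threshold            # z-score cut-off for an anomaly

    def observe(self, bytes_sent):
        """Flag a sample only after enough 'normal' traffic has been learned."""
        anomalous = False
        if len(self.samples) >= self.min_learning:
            mu, sigma = mean(self.samples), stdev(self.samples)
            if sigma > 0 and abs(bytes_sent - mu) / sigma > self.threshold:
                anomalous = True
        self.samples.append(bytes_sent)
        return anomalous

baseline = HostBaseline()
for i in range(50):
    baseline.observe(1_000 + (i % 10))   # ordinary traffic: silently learned
print(baseline.observe(1_000_000))       # sudden exfiltration-sized burst → True
```

Note that during the learning phase the detector stays silent by design, which is exactly why the baselining period matters in an evaluation.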
When testing traditional signature-based detection systems, it was often enough simply to launch a handful of common hacker tools or vulnerability scanners, or even just replay a series of packet captures (PCAPs), to trigger multiple threat alerts on the system under test.
Some signature-based systems were at least smart enough to differentiate between a vulnerability probe and an exploit attempt. But in general, the name of the game was to generate as many alerts as possible – thereby confirming the breadth of detection coverage the product offered.
The reality of this kind of testing is that it doesn’t account for intelligent advances designed to reduce the noise of the systems and filter out all the meaningless and unactionable alerts.
Learning the network is a critical phase of today’s AI-powered threat detection systems. That learning period tends to vary with different classes of threats and the protocols that they depend upon.
For instance, learning and baselining email traffic in a corporate network – such as SMTP, IMAP or POP3 – may take only a few days, while learning the Kerberos authentication traffic landscape and the interrelationships between all the hosts on the network may take a week or more.
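One way to picture those differing learning periods is as per-protocol baselining windows that gate when each class of detection becomes active. The protocol names and durations below are illustrative assumptions; real products tune these values internally.

```python
from datetime import datetime, timedelta

# Hypothetical per-protocol learning windows (illustrative values only).
LEARNING_WINDOWS = {
    "smtp": timedelta(days=3),
    "imap": timedelta(days=3),
    "pop3": timedelta(days=3),
    "kerberos": timedelta(days=10),   # host interrelationships take longer to learn
}

def detections_active(protocol, deployed_at, now):
    """A detection class fires only once its protocol's baseline window has elapsed."""
    return now - deployed_at >= LEARNING_WINDOWS[protocol]

deployed = datetime(2024, 1, 1)
print(detections_active("smtp", deployed, datetime(2024, 1, 5)))      # → True
print(detections_active("kerberos", deployed, datetime(2024, 1, 5))) # → False
```

An evaluation scheduled entirely inside the shortest of these windows would miss every detection class that is still baselining.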
When it comes to evaluating new AI-powered network threat detection systems, the baselining period is a critical consideration, especially where measurements of detection efficacy and breadth of coverage are concerned.
For example, many organisations consider using a penetration test (pen test) as the primary vehicle for a product evaluation.
While there is considerable breadth in what people believe a pen test should cover and the methodology that should be used, special attention must be given to the learning period of the devices being tested.
If a new device, such as the pen tester’s laptop, is added to the network and immediately starts launching probes and attacks, threat detection systems that rely on baselining and unsupervised machine learning will fire few alerts, if any.
In addition, if the newly introduced pen testing laptop stays on the network and continues to probe and attack for an extended period, those behaviours would likely be learned by the detection system as an acceptable baseline for the host and maybe even the network.
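This baseline-poisoning risk can be illustrated with a toy exponentially weighted baseline: the first burst of probing stands out sharply, but if the probing persists, the baseline drifts upward until the behaviour reads as normal. The learning rate, threshold and scan counts are invented for illustration.

```python
class DriftingBaseline:
    """Toy EWMA baseline: persistent anomalous behaviour is slowly 'learned' as normal."""
    def __init__(self, alpha=0.1, threshold=2.0):
        self.alpha = alpha          # how quickly the baseline adapts
        self.threshold = threshold  # multiple of baseline treated as anomalous
        self.level = None

    def observe(self, scans_per_hour):
        if self.level is None:
            self.level = scans_per_hour
            return False
        anomalous = scans_per_hour > self.level * self.threshold
        # The baseline adapts regardless, so sustained probing raises it over time.
        self.level += self.alpha * (scans_per_hour - self.level)
        return anomalous

b = DriftingBaseline()
b.observe(10)                          # quiet behaviour is learned first
print(b.observe(100))                  # first burst of probing → True (flagged)
results = [b.observe(100) for _ in range(40)]
print(results[-1])                     # sustained probing → False (absorbed as normal)
```

This is why leaving a pen-test laptop probing for weeks before measuring results can quietly invalidate the test.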
Careful thought should be given to which scenarios matter most when testing the latest generation of AI-based network threat detection systems.
For example, if a critical threat profile is an insider threat or recently compromised host, pen testing should be conducted from a device that has been on the network for several weeks and used by the type of employee you care most about.
This ensures that the baselining systems accurately learn the host’s normal traffic profile. Remember that a host can sit on the network for an extended period without being actively used, and therefore may have no application or internet traffic profile at all.
Alternatively, if your most critical threat profile involves detecting outside attackers who surreptitiously add a new device to the network from which to launch an attack, it is a good idea to include the pen tester’s laptop on the network and assume that no learning has taken place.
In this scenario, threat detection is focused on spotting these newly introduced devices, and the noisy attacks they launch, as quickly as possible.
Successful evaluation of new network-based threat detection systems requires a thorough understanding and staging of test scenarios. Advanced detection capabilities in these new products make extensive use of unsupervised learning modes and require variable periods to baseline the network, the devices on the network, and the users operating the devices on the network.
These new systems are smart enough to filter out contrived and non-malicious traffic, and to roll up multiple detection types and classes of threats until confidence levels are met. This reduces false positives and unactionable alerts.
Just because an alert hasn’t popped up in a GUI doesn’t mean the system isn’t aware of and tracking the threat. It might simply mean that there is not yet enough evidence to support an actionable response.
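The roll-up behaviour described above can be sketched as a per-host evidence score that is tracked silently until it crosses an actionable threshold. The detection names, weights and threshold here are hypothetical, invented purely to illustrate the pattern.

```python
# Hypothetical evidence weights for individually weak detections.
DETECTION_WEIGHTS = {
    "dns_tunnel_hint": 0.2,
    "unusual_login_time": 0.15,
    "lateral_smb_probe": 0.4,
}

class ThreatTracker:
    """Accumulates per-host evidence; surfaces an alert only above a confidence bar."""
    def __init__(self, alert_threshold=0.6):
        self.scores = {}
        self.alert_threshold = alert_threshold

    def record(self, host, detection):
        """Track evidence silently; return True only once it becomes actionable."""
        self.scores[host] = self.scores.get(host, 0.0) + DETECTION_WEIGHTS[detection]
        return self.scores[host] >= self.alert_threshold

t = ThreatTracker()
print(t.record("10.0.0.5", "dns_tunnel_hint"))     # → False: tracked, not surfaced
print(t.record("10.0.0.5", "unusual_login_time"))  # → False: still below threshold
print(t.record("10.0.0.5", "lateral_smb_probe"))   # → True: evidence now actionable
```

An evaluator counting only surfaced alerts would score the first two detections as misses, even though the system was tracking the threat the whole time.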