We may sound pedantic when we point out that we should be talking about Machine Learning, not AI, for security threat detection use cases. But there is a strong reason for it: to deflate the hype. Let me quickly describe a real-world situation where the indiscriminate use of those terms caused confusion and frustration:
One of our clients was complaining about the “real Machine Learning” capabilities of a UEBA solution. According to them, “it was just rule-based”. What do you mean by rule-based? Well, for them, having to tell the tool that it needs to detect behavior deviations in the authentication events of each individual user, based on the location (source IP) and the time of the event, is not really ML but rule-based detection. I would say it’s both.
Yes, it really is a rule, as you have to define, down to the data field (or ‘feature’) level, what type of anomaly it should be looking for. You need to know enough about the malicious activity you are hunting to specify the type of behavior anomaly it will produce.
But within this “rule”, how do you define what “an anomaly” is? That’s where the Machine Learning comes in. The tool has to automatically profile each individual user’s authentication behavior, focusing on the data fields specified from the authentication events. You just can’t do that with, let’s say, a “standard SIEM rule”. There is real Machine Learning being used there.
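To make that division of labor concrete, here is a minimal sketch of per-user profiling. Everything in it is illustrative: the event fields (user, src_ip, hour) and the data are made up, and a real UEBA product would fit statistical or probabilistic models per user rather than use simple set membership. But the structure is the same: the “rule” names the features, and the learning step defines what “normal” looks like from the data.

```python
from collections import defaultdict

# Hypothetical event format: {"user": ..., "src_ip": ..., "hour": 0-23}
profiles = defaultdict(lambda: {"ips": set(), "hours": set()})

def learn(event):
    """Build a per-user baseline from historical authentication events."""
    p = profiles[event["user"]]
    p["ips"].add(event["src_ip"])
    p["hours"].add(event["hour"])

def is_anomalous(event):
    """Flag logins from an unseen source IP or at an unseen hour."""
    p = profiles.get(event["user"])
    if p is None:
        return True  # no baseline yet; treat the first sighting as anomalous
    return event["src_ip"] not in p["ips"] or event["hour"] not in p["hours"]

# Train on history, then score a new event
history = [
    {"user": "alice", "src_ip": "10.0.0.5", "hour": 9},
    {"user": "alice", "src_ip": "10.0.0.5", "hour": 10},
]
for e in history:
    learn(e)

print(is_anomalous({"user": "alice", "src_ip": "198.51.100.7", "hour": 3}))  # True
```

Notice that the analyst still decided which features matter; the profiling only automated the definition of “normal” for each user.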
But what about AI, Artificial Intelligence? ML is a small subset of the field of knowledge known as AI, but AI encompasses much more than just ML. And that’s what the client was expecting when they complained about the “rules”. We still need people to figure out those rules and write the ML models that implement them. There’s no machine capable of doing that – yet.
There have been some attempts based on “deep learning” (another piece of the AI domain), but nothing concrete exists yet. You can always point ML systems at all the data collected from your environment so they can surface anomalies, but you’ll soon find out there are far more anomalies unrelated to security incidents than some pixie-dust vendors would lead you to believe. Broad network-based anomaly detection has been around for years, but it hasn’t been able to deliver efficient threat detection without a lot of human work to figure out which anomalies are worth investigating.
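Some quick back-of-the-envelope math shows why. The numbers below are made-up assumptions, not measurements from any real environment, but the base-rate effect they illustrate is very real:

```python
# Illustrative, made-up numbers: not measurements from any real environment.
events_per_day = 10_000_000
anomaly_rate = 0.01        # detector flags 1% of events as anomalous
malicious_rate = 0.0001    # 0.01% of events relate to real incidents
detection_rate = 0.9       # the detector catches 90% of malicious events

flagged = events_per_day * anomaly_rate                            # 100,000 alerts/day
true_positives = events_per_day * malicious_rate * detection_rate  # 900
precision = true_positives / flagged
print(f"{precision:.2%} of flagged anomalies are actual incidents")  # 0.90%
```

Even with a 90% detection rate, more than 99% of the flagged anomalies are noise that a human still has to triage.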
Some UEBA vendors have decent ML capabilities, but they are not good at defining the rules/models/use cases to apply them to. So you may end up with good ML technology but mediocre threat detection capabilities if you don’t have good people writing the detection content. For those going down the “build your own” path, this is even more challenging, as you need the magical combination of people who understand threats and the anomalies they would create, and people who understand ML well enough to write the content that finds them.
Isn’t that just like SIEM? Indeed, it is. People bought SIEM in the past expecting to avoid the IDS signature development problem. Now they are repeating the same mistake, buying UEBA to avoid the SIEM rule development problem. Do you think it’s going to work this time?
It's important to think of Machine Learning as part of AI (especially its search or question-based capabilities) and to think of Deep Learning as separate from both ML and AI. Deep Learning is more a part of High-Performance Computing. The link between the two is HPAI, or High-Performance Artificial Intelligence.
Not too many cybersecurity platforms create that HPAI link, especially not UEBA. Some NGSIEM / NextGen SIEM platforms are moving in the direction of HPAI, especially ones built on Hadoop distributions such as MapR with the addition of MapReduce (YARN) and Apache Spark. It doesn't have to be Hadoop; it could also be Cassandra (or a commercial Cassandra such as DataStax), as one example.
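As a rough illustration of what that kind of pipeline looks like (the input path and field names below are hypothetical, and this is a toy aggregation rather than any vendor's actual product), a Spark job building a baseline over authentication events might start like this:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("auth-baseline").getOrCreate()

# Hypothetical input: newline-delimited JSON auth events with user/src_ip fields
events = spark.read.json("hdfs:///security/auth_events/")

# Count distinct source IPs and total logins per user as crude baseline features
baseline = events.groupBy("user").agg(
    F.countDistinct("src_ip").alias("distinct_src_ips"),
    F.count("*").alias("logins"),
)
baseline.show()
```

The point isn't the specific stack; it's that the profiling workload is a distributed aggregation problem, which is why these platforms keep converging on Hadoop/Spark-style engines.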
Most UEBAs are not using machine learning but rather one small subset of statistical techniques: confidence intervals. I think that's also part of the rub, so some of the pundits may be more correct than you suggest.
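For illustration, here is roughly what that technique amounts to. Strictly speaking this is a z-score band around a mean rather than a textbook confidence interval, and the login-hour data is invented, but it shows how little "learning" is actually involved:

```python
import statistics

# Hypothetical baseline: one user's login hours over the last month
login_hours = [9, 9, 10, 8, 9, 11, 10, 9, 8, 10]

mean = statistics.mean(login_hours)
stdev = statistics.stdev(login_hours)

def outside_interval(hour, z=1.96):
    """Flag a login hour falling outside an approximate 95% band."""
    return abs(hour - mean) > z * stdev

print(outside_interval(10))  # False: within the band
print(outside_interval(3))   # True: flagged as an "anomaly"
```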
If you survey the past 20 years of SIEM and IDS capabilities, you'll find a few gems that do not get used along with modern network and system data (events, event streams, or otherwise). The first is frequency analysis, but there are also other data-product integrations such as stacking and baselining (see the sketch below). These aren't AI, but they often solve the problem and work with existing tools. Use them! Take advantage of techniques like that before jumping to magic (technology so advanced that you clearly aren't ready for it yet!). NetFlow/IPFIX and Sysmon need to be leveraged... and we're still jumping to solutions like SolarWinds and/or Splunk when we'd be better off with YAF and/or Neo23x0/Sigma. Build the generic signatures and use the generic tooling before going pro. Why can't we learn these basic lessons?
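As one small example of what stacking and frequency analysis look like in practice (the hosts and process names below are invented; in real life you'd feed in Sysmon event ID 1 process-creation logs or similar):

```python
from collections import Counter

# Hypothetical data: process names observed across a fleet of hosts
observations = [
    ("host1", "svchost.exe"), ("host2", "svchost.exe"),
    ("host3", "svchost.exe"), ("host4", "svchost.exe"),
    ("host2", "chrome.exe"), ("host3", "chrome.exe"),
    ("host4", "svch0st.exe"),  # suspicious lookalike seen on a single host
]

# "Stacking": count how many distinct hosts run each process name
hosts_per_process = Counter()
seen = set()
for host, proc in observations:
    if (host, proc) not in seen:
        seen.add((host, proc))
        hosts_per_process[proc] += 1

# Frequency analysis: the rarest items float to the top for triage
for proc, count in sorted(hosts_per_process.items(), key=lambda kv: kv[1]):
    print(f"{proc}: seen on {count} host(s)")
```

Sorting by rarity puts the single-host outlier at the top of the triage queue. No ML required.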