I attended a security panel recently where the panelists were asked what areas or approaches in security, if any, were over-hyped. One of the panelists, the CISO of a well-regarded Valley startup, said “Machine Learning. If a vendor comes in and starts talking about how they use Machine Learning, I start tuning out.”
Not long after that, I was briefing a well-known analyst about Agari’s new Enterprise Protect product. As I described our approach, which leverages Machine Learning, he said “It seems like you’re boiling the ocean. Why not just use a simple set of rules?”
I believe that both the CISO and the analyst were reacting to the overuse, and sometimes misuse, of the term “Machine Learning” in marketing security solutions. Many vendors invoke Machine Learning, or the broader field of Data Science, to lend their products an air of complexity and sophistication, sometimes using jargon to obscure the constraints or limitations of their approaches.
While the underlying techniques associated with Machine Learning can be complex, the primary goal is relatively simple – to use a set of known examples (e.g. identified attacks) to train a generalized model that can estimate a value or a verdict for previously unknown examples (e.g. new attacks). Earlier this year, I delivered a presentation to the eCrime Congress in London that focused on demystifying the application of Machine Learning (ML) in security solutions and helping security buyers make informed decisions.
The first part of the talk focused on the core components of any (Supervised) Machine Learning process:
The traditional Supervised Machine Learning process involves selecting features, collecting a labeled training set, choosing and tuning an algorithm to train a model, and measuring the resulting accuracy on a testing set. If the accuracy measures are acceptable, then you have a model that can be put into the wild. If not, you may have to select more or different features, collect better training examples, or tune or replace your algorithm, repeating the cycle until the accuracy measures meet your expectations.
The art of Machine Learning (and the source of its occasional reputation as “voodoo” science) lies in making the right selection and tuning choices throughout this iterative process, but the underlying approach is straightforward and consistent. The core components of Features, Training Set, Algorithm and Accuracy can be used to understand, analyze and evaluate any security solution based on Machine Learning.
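To make the four components concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the messages and labels are invented toy data, the word list and the two features are hypothetical, and the nearest-centroid algorithm is simply one of the easiest to read. It is not how Agari or any other vendor builds a model.

```python
from statistics import mean

# FEATURES (hypothetical): how many "suspicious" words a message contains,
# and what fraction of the message they make up.
SUSPICIOUS_WORDS = {"urgent", "wire", "password", "verify"}

def extract_features(text):
    words = text.lower().split()
    hits = sum(word in SUSPICIOUS_WORDS for word in words)
    return [hits, hits / len(words)]

# TRAINING SET (invented toy data): label 1 = attack, 0 = benign.
TRAINING_SET = [
    ("urgent wire transfer needed today", 1),
    ("please verify your password now", 1),
    ("urgent confirm account password", 1),
    ("lunch on friday", 0),
    ("meeting notes attached", 0),
    ("quarterly report draft for review", 0),
]

# Held-out TESTING SET: examples the model never saw during training.
TESTING_SET = [
    ("verify your wire details urgent", 1),
    ("see you at the team offsite", 0),
    ("project update for monday", 0),
    ("password policy update", 0),
]

# ALGORITHM: nearest centroid. Training averages the feature vectors of
# each label; prediction picks the label whose average is closest.
def train(examples):
    rows_by_label = {}
    for text, label in examples:
        rows_by_label.setdefault(label, []).append(extract_features(text))
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in rows_by_label.items()
    }

def predict(model, text):
    features = extract_features(text)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))

# ACCURACY: fraction of testing examples the model labels correctly.
def accuracy(model, examples):
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)

model = train(TRAINING_SET)
print(f"accuracy on testing set: {accuracy(model, TESTING_SET):.2f}")
```

On this toy data the model flags the benign “password policy update” message as an attack. That is exactly the kind of result that sends you back around the loop: choose more or different features, collect better training examples, or tune or replace the algorithm.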
Next week, I’ll post a follow-up blog on how to apply this framework for evaluating Machine Learning claims to make better, more informed decisions regarding security solutions.
If you’re interested in learning more, don’t miss our webinar on June 23rd: Machine Learning in Security: Detecting Signal in the Vendor Noise.