Machine Learning and Security: Hope or Hype?
There is a temptation to hail major advances in technology as cure-alls for the challenges facing organizations and society today. The fanfare usually ends in disappointment when the latest superhero technology fails to live up to expectations.
Not surprisingly, machine learning, a domain within the broader field of artificial intelligence, has been hailed as the current be-all and end-all answer in cybersecurity. It currently sits at the peak of inflated expectations in Gartner’s most recent Hype Cycle for Emerging Technologies.
What Machine Learning Is … and Isn’t
Arthur Samuel, a pioneering American computer scientist, defined machine learning in 1959 as “the field of study that gives computers the ability to learn without being explicitly programmed.” Put another way, machine learning teaches computers to do what people do: learn from experience and get better over time.
An important distinction: machine learning is one domain within artificial intelligence, not the whole of it. The two terms are often used interchangeably, but they are not synonymous.
Machine learning primarily consists of three high-level categories:
- Supervised learning: You know the question you want to ask, have examples of it being asked and answered correctly, and can feed those examples to a machine.
- Unsupervised learning: You do not have the answers and may not fully know the questions, so the machine looks for structure in the data on its own.
- Reinforcement learning: The machine learns by trial and error, guided by rewards and penalties, an approach that has proven effective in game scenarios.
How Supervised Machine Learning Works
The details and terms of machine learning can seem intimidating to non-data scientists, so let’s look at some key terms.
Supervised learning requires training data in the form of correct question-and-answer pairs, known as “ground truth.” This training data prepares the classifiers (the workhorses of machine learning that categorize observations) and the algorithms (the techniques that organize and orient classifiers) to analyze new data accurately in the real world.
A common example is facial recognition. Classifiers analyze specific data patterns they are trained to recognize—not actual noses or eyes—to accurately tag a particular face amongst millions of photos.
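To make the idea of learning from ground truth concrete, here is a minimal sketch of a supervised classifier in Python. It uses a nearest-centroid rule chosen purely for brevity, not because any particular product works this way. The two features (file entropy and number of imported functions), the labels, and the sample values are all hypothetical.

```python
from collections import defaultdict
from math import dist


def train_centroids(examples):
    """Compute one centroid (mean feature vector) per label from labeled examples."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in vector]
        for label, vector in sums.items()
    }


def classify(centroids, features):
    """Assign the label whose centroid is closest to the new observation."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical ground truth: (file entropy, imported function count) -> known answer.
ground_truth = [
    ((7.8, 5), "malicious"),
    ((7.5, 8), "malicious"),
    ((4.2, 120), "benign"),
    ((5.0, 90), "benign"),
]

centroids = train_centroids(ground_truth)
prediction = classify(centroids, (7.6, 10))  # a new, unlabeled observation
print(prediction)
```

The classifier never sees a rule such as "high entropy means malware." It only sees answered examples and generalizes from them, which is why the quality of the ground truth matters so much.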
Machine Learning in Cybersecurity
The cyber threat landscape today forces organizations to constantly track and correlate millions of external and internal data points across a number of endpoints. It simply is not feasible to manage this volume of information on an ongoing basis with a team of people.
Machine learning shines here because it can recognize patterns and predict threats in massive data sets, all at machine speed. By automating the analysis, cyber teams can rapidly detect threats and isolate situations that need deeper human analysis. Machine learning techniques can better protect organizations in a number of ways:
- Detecting surreptitious attackers on networks: Machine learning can detect behavioral anomalies to find attackers on the inside or logged in with stolen credentials.
- Predicting “bad neighborhoods” online: By learning from Internet activity patterns, machine learning can automatically identify attacker infrastructure being staged to launch the next threat.
- Detecting attacks through novelty and outliers: Machine learning finds attack patterns humans cannot readily detect, such as a new peer relationship between hosts that cannot or should not be communicating.
- Finding suspicious cloud user behavior: Analytical techniques uncover user behavior that suggests a cloud account has been compromised in order to extract data or perform malicious operations.
- Detecting modern malware: Machine learning is valuable in detecting polymorphic malware, breaking down threat attributes to better stop new and reengineered threats.
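As a rough illustration of the "new peer relationship" idea in the list above, the sketch below flags host pairs that communicate today but never did during a baseline window. It is a plain set comparison, a deliberately simple stand-in for the statistical novelty detection a real system would use. The host names and the flow format are hypothetical.

```python
def new_peer_relationships(baseline_flows, current_flows):
    """Return host pairs seen in current traffic that never appeared in the baseline.

    Each flow is a (host_a, host_b) tuple; direction is ignored.
    """
    known = {frozenset(pair) for pair in baseline_flows}
    novel = {frozenset(pair) for pair in current_flows} - known
    return sorted(tuple(sorted(pair)) for pair in novel)


# Hypothetical traffic observed during a quiet baseline period.
baseline = [
    ("hr-laptop", "file-server"),
    ("web-01", "db-01"),
    ("file-server", "hr-laptop"),
]

# Hypothetical traffic observed today.
today = [
    ("web-01", "db-01"),
    ("hr-laptop", "db-01"),
    ("printer-3", "web-01"),
]

alerts = new_peer_relationships(baseline, today)
for host_a, host_b in alerts:
    print(f"New peer relationship: {host_a} <-> {host_b}")
```

A production system would also weigh how unusual each new pair is, since plenty of new connections are legitimate.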
Beware the Pitfalls
While machine learning offers tremendous promise for cybersecurity, it has its share of shortcomings that must be acknowledged if it is to be used appropriately.
If an application using machine learning makes a poor movie recommendation, the user typically just ignores it. In security, the stakes are higher: if machine learning misses a threat or falsely convicts a good file, the error can interrupt business operations.
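The two kinds of error described here, a missed threat and a falsely convicted good file, can be measured separately. The following sketch computes a false negative rate and a false positive rate from a handful of verdicts. The labels and sample data are hypothetical and serve only to show the arithmetic.

```python
def error_rates(actual, predicted, positive="malicious"):
    """Return the false positive and false negative rates for a set of verdicts."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive and truth == positive:
            tp += 1  # threat correctly caught
        elif guess == positive:
            fp += 1  # good file falsely convicted
        elif truth == positive:
            fn += 1  # threat missed
        else:
            tn += 1  # good file correctly passed
    return {
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Hypothetical ground truth and model verdicts for six files.
actual = ["malicious", "benign", "benign", "malicious", "benign", "benign"]
predicted = ["malicious", "malicious", "benign", "benign", "benign", "benign"]

rates = error_rates(actual, predicted)
print(rates)
```

Which of the two rates matters more depends on the setting: a false conviction on a critical server may cost more than one on a test machine.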
Moreover, how can—or should—machine learning account for changes occurring in the world around it? For example, if it operates in an environment in which two countries are foes, how can it account for a peace treaty struck between the former adversaries? This makes periodic retraining vital so it remains accurate as the world evolves.
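One way to decide when retraining is due, sketched below, is to track how often recent verdicts agree with what analysts later confirm and to raise a flag when that agreement drops. The class name `DriftMonitor`, the window size, and the accuracy threshold are illustrative assumptions, not settings taken from any product.

```python
from collections import deque


class DriftMonitor:
    """Track accuracy over the most recent verdicts and flag when it degrades."""

    def __init__(self, window=50, min_accuracy=0.9):
        self.outcomes = deque(maxlen=window)  # True where the model was right
        self.min_accuracy = min_accuracy

    def record(self, predicted, confirmed):
        """Store whether a model verdict matched the analyst-confirmed answer."""
        self.outcomes.append(predicted == confirmed)

    def needs_retraining(self):
        """Return True once a full window of verdicts falls below the threshold."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False  # not enough evidence yet
        accuracy = sum(self.outcomes) / len(self.outcomes)
        return accuracy < self.min_accuracy


# Hypothetical usage with a tiny window so the effect is visible.
monitor = DriftMonitor(window=4, min_accuracy=0.75)
verdicts = [
    ("benign", "benign"),
    ("benign", "malicious"),
    ("malicious", "malicious"),
    ("benign", "malicious"),
]
for predicted, confirmed in verdicts:
    monitor.record(predicted, confirmed)

retrain = monitor.needs_retraining()
print(retrain)
```

This approach depends on humans continuing to confirm a sample of verdicts; without fresh ground truth, drift goes unnoticed.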
When machine learning detects something bad, it often explains itself with mathematical logic, instead of relevant security context. For example, say a machine learning system detected an infected device in the finance office. Prior to potentially yanking the CFO off the network, a security practitioner must confirm the relevant security event details of the infection—how the computer was infected, whether there is a vulnerable application on the laptop, or what file turned malicious—to better understand how to respond. Mathematical logic won’t help here. This “explainability” problem is a real challenge.
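A small sketch can show what "showing its work" might look like in the simplest case. It assumes a linear scoring model, where each observed feature contributes its weight to the score, and reports the features that contributed most. The feature names and weights are hypothetical, and many real models are far harder to explain than this.

```python
def explain_score(weights, features, top_n=2):
    """Return the total score and the top contributing features.

    Assumes a linear model: each contribution is weight * feature value.
    """
    contributions = {
        name: weights.get(name, 0.0) * value for name, value in features.items()
    }
    score = sum(contributions.values())
    top = sorted(
        contributions.items(), key=lambda item: abs(item[1]), reverse=True
    )[:top_n]
    return score, top


# Hypothetical model weights and observations for one device.
weights = {
    "macro_enabled_attachment": 2.0,
    "outbound_to_new_domain": 1.5,
    "signed_binary": -1.0,
}
observed = {
    "macro_enabled_attachment": 1,
    "outbound_to_new_domain": 1,
    "signed_binary": 0,
}

score, reasons = explain_score(weights, observed)
print(f"Score: {score}")
for name, contribution in reasons:
    print(f"  {name}: {contribution:+.1f}")
```

Output in this form gives a practitioner a starting point for investigation, such as which attachment or which domain to look at, instead of a bare number.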
Making Machine Learning Work for Your Organization
Machine learning is not a panacea for increasing cyber resilience. Instead, it is a helpful, additional security layer to augment other techniques in place.
Rather than being used in isolation, it needs to be combined with other cybersecurity techniques, from intrusion prevention rules and antivirus signatures to whitelists, sandboxing, and behavioral techniques. Within machine learning itself, no single technique or method will suffice; rather, we must call on a pipeline of hundreds of algorithms working together for successful outcomes. Moreover, no security approach is effective without a team of humans carrying out threat intelligence research, confirming that all is working as it should, and addressing changes in context.
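The layering described above can be sketched as a simple decision chain in which machine learning is only one step and uncertain cases go to a person. The function name, the thresholds, and the sample hashes are hypothetical, and a real deployment would involve many more layers.

```python
def layered_verdict(file_hash, ml_score, whitelist, signatures,
                    block_threshold=0.9, review_threshold=0.5):
    """Combine a whitelist, signatures, and an ML score into one verdict."""
    if file_hash in whitelist:
        return "allow"  # known-good file, no further checks
    if file_hash in signatures:
        return "block"  # known-bad file matched by signature
    if ml_score >= block_threshold:
        return "block"  # model is highly confident the file is malicious
    if ml_score >= review_threshold:
        return "human_review"  # uncertain, so escalate to an analyst
    return "allow"


# Hypothetical reference data and files.
whitelist = {"aaa111"}
signatures = {"bbb222"}

verdicts = {
    "known_good": layered_verdict("aaa111", 0.95, whitelist, signatures),
    "known_bad": layered_verdict("bbb222", 0.10, whitelist, signatures),
    "uncertain": layered_verdict("ccc333", 0.60, whitelist, signatures),
    "clean": layered_verdict("ddd444", 0.05, whitelist, signatures),
}
print(verdicts)
```

Note that the whitelist is checked first in this sketch, so a high machine learning score cannot convict a file already known to be good. That ordering is a design choice, not a rule.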
Machine learning has many technical measures of success, but not all are helpful for a security professional. For machine learning to be most successful and embraced wholeheartedly, it must generate understandable outputs and generally “show its work” with security context.