It’s no secret that organizations are facing an onslaught of attacks – the Yahoo breach is only the most recent in an escalating pattern. We’re way beyond viruses and script kiddies, and while we continue to have layer 8 problems when users do dumb things, it’s a different age. Our adversaries are very well organized, well funded, and above all else, patient. The average time an advanced persistent threat sits in an environment before detection is more than 200 days. We truly are in an era of industrialized threats. At the same time we’re faced with a massive amount of security information, creating a serious signal-to-noise problem. It’s not uncommon for a medium-to-large enterprise to generate hundreds of millions of events per day, and even with a 100,000-to-one reduction via rules and filters, that still leaves literally thousands of events to look at.
At the same time, there is an ever-increasing number of vulnerabilities – 50% of which are at the application layer, for which there is no patch: new secure code has to be written. On top of it all, we have a serious skills gap in InfoSec, particularly for technical folks. Estimates are that there are 200,000 unfilled security positions right now (as anyone who’s on LinkedIn knows), and more than a million projected by 2020.
Yet many organizations still structure their security program for the dark ages – you know, 2005. Back before mobile, cloud and IoT (Internet of threats) came along. We were focused on building a secure perimeter, like a moat around the corporate castle. That worked until those crazy users wanted us to put doors in the walls. Those doors both break and can be compromised, and over the past couple of years we’ve realized that the bad guys are already inside. There is no secure perimeter, and no way to keep them out. We have to build an immune system that detects and responds at a component level.
We’ve spent the last several years instrumenting our networks and ensuring that we have good security information flows – sensors and SIEMs. Yet even moving beyond a SIEM to a true Security Intelligence Platform leaves us with an incomplete picture of what’s happening. That’s because all of the structured, machine-generated telemetry that we’ve been working so hard to collect, filter, and analyze is only 8% of the total security information available to us.
The other 92% of security information is contained in things like threat commentaries, blog posts, social media, presentations, IRC chat logs, and forum posts – it’s human-generated, natural-language information, sometimes called unstructured data. And it’s that information that cognitive technology can help us leverage to improve the risk posture of the business.
Let’s look at an example of how cognitive technology can help. Bob is our Tier-1 SOC analyst, who’s grown into that role from the IT operations staff. He’s been with the company for three years, but only in this new role for about 10 months. We’ve moved beyond SIEM to a security intelligence platform (SIP) that does a great job of finding real offenses that need to have action taken. Bob gets the alert from the SIP that something’s happened, and starts his usual 10-hour investigation to fully understand what’s going on (it’s only a medium-complexity event).
Bob has some challenges – first, he’s dealing with silos of information. IT doesn’t talk to security, security doesn’t talk to the business, and the business doesn’t talk to IT. Nobody talks to audit. There are also silos externally; for all that ISACs share information, it tends to be what they see from the outside, rather than what’s inside. After all, no company wants to give competitors information on their weaknesses, let alone the plaintiff’s bar. So Bob turns to his favorite search engine, and starts searching and researching information on what the offense shows. That’s all well and good, but it’s hardly a curated set of information from trusted sources. He bounces what he finds off Alice, our Tier-2/3 operator.
Alice has been with the company for 10 years, and in the SOC for 5. She knows our environment cold – that old HP-UX server in the corner that fires off a cloud of steam once a month and fills the logs with cruft, who to call in audit to find out if they’re running an unannounced penetration test, and so forth. Alice has the tribal knowledge of our environment and can improvise better than James Moody – she’s incredibly effective at understanding what an offense really is, but even she has to start over with each new offense because she can’t possibly hold all the details in her head.
And she just quit for 50% more money. Unfortunately she’s not Vulcan and can’t mind-meld with Bob, so he’s out of luck. That means that Bob’s late to identify what’s going on, and the remediation and response is delayed until the damage is already done. If Bob had an experienced advisor that he could take the offense to and ask ‘what is it’, he wouldn’t be in such a bad state.
That’s where cognitive security comes in. Cognitive can bridge the gap between machine and human generated information to tell us what an offense is, and why that conclusion is true. That’ll reduce Bob’s analysis time from hours to minutes, allowing much faster incident response.
So my dad’s a big fan of cognitive security, because I finally get to use my undergraduate degree in cognitive science (Computer Science + Psychology – anthropology only helped with making stone tools to get ready for Y2K). As part of that program, it became crystal clear that the very best pattern-matching engine is the human brain – it evolved/was created specifically to take in unstructured information from our environment, match it against our previous knowledge, and return information and actions that we should pursue (See: snake, Action: jump back). But it’s not perfect (See: stick, Action: jump back).
This is why, at a college reunion, we recognize a face but can’t place a name – though we know we went to school with them, that’s all that comes out of the archives. We may get one piece of information out of our vast landscape of data, but finding the half-dozen pieces that are required to fully understand an event, and connecting them together to tell a story, is beyond what we can do. That’s where machine learning comes in. By leveraging cognitive technology, and teaching it how to read security information – training it on our buzz words (threat, vulnerability, actor, risk, CVE, etc.), as well as how our language is structured – it’s able to pull it all together.
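To make that concrete, here’s a deliberately tiny sketch in Python of the most basic version of “training it on our buzz words”: scanning a piece of unstructured text for known security vocabulary and CVE-style identifiers. The vocabulary, sample text, and CVE number are all invented for illustration – a real cognitive engine learns far richer language structure than a keyword list.

```python
import re

# Toy vocabulary of security "buzz words" -- illustrative only; a real system
# learns its vocabulary and grammar from a large, curated corpus.
SECURITY_TERMS = {"threat", "vulnerability", "actor", "risk", "exploit", "malware"}
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4,7}")

def extract_indicators(text):
    """Pull recognized security vocabulary and CVE identifiers out of free text."""
    words = {w.strip(".,;:()").lower() for w in text.split()}
    return {
        "terms": sorted(words & SECURITY_TERMS),
        "cves": CVE_PATTERN.findall(text),
    }

# Fabricated example text -- the CVE number is a placeholder, not a real advisory.
blog_post = ("A new actor is exploiting CVE-2016-0001 in the wild; the "
             "vulnerability poses a serious risk to unpatched servers.")
print(extract_indicators(blog_post))
# -> {'terms': ['actor', 'risk', 'vulnerability'], 'cves': ['CVE-2016-0001']}
```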
And that’s fundamentally different from how a search engine works. If we ask one to ‘Please don’t show me pictures of the 2016 presidential candidates’, it’s going to show us pictures of the candidates, because it does a keyword search. Cognitive would instead ask us ‘Ok, what pictures of other politicians would you like to see?’ because it understands the language and context of the question. It’s that context that’s critical, and something that only a curated corpus (cognitive library) of knowledge can provide.
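A crude way to see that difference: a keyword search just intersects words, so negation and intent fall on the floor. Here’s a toy sketch (the query and documents are made up) of exactly that failure mode:

```python
def keyword_match(query, documents):
    """Naive keyword search: return any document sharing a word with the query.
    It is blind to negation and intent -- the failure mode described above."""
    query_words = set(query.lower().split())
    return [doc for doc in documents if query_words & set(doc.lower().split())]

docs = [
    "pictures of the 2016 presidential candidates",
    "pictures of local school board candidates",
    "recipes for apple pie",
]
hits = keyword_match("please don't show me pictures of the 2016 presidential candidates", docs)
print(hits)  # both 'candidates' documents come back -- the "don't" was ignored
```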
Say that we have an offense that includes some indicators of compromise: bill, feathers, quack, waddle. We’d start searching on those indicators and come back with everything from restaurant charges, to pillows, to bad doctors, to Opus. Cognitive would put them all together, understand that it looks like a duck, sounds like a duck and walks like a duck, so it’s probably a duck. And then it would tell you exactly why it thinks that this particular offense is a duck, and show you the sources it used to reach that conclusion.
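Under the hood, you can think of that as evidence aggregation: each indicator lends weight to one or more hypotheses, and the engine surfaces the best-supported hypothesis along with the evidence behind it. The sketch below is purely illustrative – the hypothesis names and weights are invented, and it isn’t how any particular product scores offenses:

```python
# Toy evidence aggregation over the "duck" indicators from the example above.
# The weights and hypothesis names are invented for illustration.
INDICATOR_WEIGHTS = {
    "bill":     {"duck": 0.8, "restaurant charge": 0.6},
    "feathers": {"duck": 0.9, "pillow": 0.7},
    "quack":    {"duck": 0.9, "bad doctor": 0.8},
    "waddle":   {"duck": 0.9},
}

def diagnose(indicators):
    """Score every hypothesis and return the winner plus the evidence behind it."""
    scores, evidence = {}, {}
    for ind in indicators:
        for hypothesis, weight in INDICATOR_WEIGHTS.get(ind, {}).items():
            scores[hypothesis] = scores.get(hypothesis, 0.0) + weight
            evidence.setdefault(hypothesis, []).append(ind)
    best = max(scores, key=scores.get)
    return best, scores[best], evidence[best]

print(diagnose(["bill", "feathers", "quack", "waddle"]))
# -> ('duck', 3.5, ['bill', 'feathers', 'quack', 'waddle'])
```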
One last example. If you’ve taken the CISSP exam, you know it’s about as much fun as a root canal without novocaine. The material is obscure (fire extinguisher codes anyone?), and the questions are designed to be difficult (“choose the most correct answer”). You could point a search engine at the body of knowledge, but you can’t just drop the question into the search field and get anything like a good answer out of it.
But if you feed the BOK into a cognitive engine that has a basic security vocabulary, it’ll get roughly half the questions right. Think about that for a moment – the engine read the BOK, then read the question, read the answers, and understood enough to get half the questions right. That’s impressive, but not the end of the story. A key part of building a cognitive engine is to teach, train and calibrate the corpus of knowledge. After a few weeks of working with it, it’ll score 90% or better – if cognitive were five years old, it could get certified. By the way, those other questions are the ones we all think are bogus.
Is it magic? No. Cognitive isn’t Skynet, and it doesn’t replace people – it augments them. I’ve heard some folks say that cognitive security will replace a tier-one SOC operator. No it won’t. It’ll give those folks somewhere else to go ask questions (almost like augmenting tier-2). A human will always be in the loop, but the heavy lifting of analyzing an event can be reduced with cognitive. It’ll allow for faster diagnosis, which not only saves time and money, but also lets us start our remediation process faster, and return to service sooner.
With a modern security intelligence platform in a well instrumented environment, cognitive security, and a robust incident response workflow engine we have the potential to reduce the time from when an incident occurs to when it’s fully remediated from days or weeks, to minutes or hours. That’s a game-changing event. But that full life-cycle only works if we have access to all the security information – machine and human generated. We need to do it, because we have to improve the risk posture of the business.