Email Fraud: Refining Predictive Models to Stop the Next Email Attack

Siobhan McNamara August 22nd, 2018 Email Security

New research finds malicious email attacks hit your business an average of one every six minutes. Predictive models point to new ways to quash the threat—but how can they account for random, outside factors?

With apologies to Thomas Dolby, there’s new hope that email fraudsters everywhere could soon be screaming, “They blunted me with science!” Data science, that is.

As we discussed in parts one and two of this series, breakthrough research from data scientists at Agari has unearthed a surprising pattern hidden in business email compromise (BEC), whale- and spear-phishing and other email-based attacks.

After accounting for other variables, it’s now clear that businesses receive a new malicious email at an average rate of one every six minutes.

Yet as alarming as this attack frequency may be, that’s not the jaw-dropping part. As it turns out, this average attack rate is consistent across all organizations, of all sizes, across all industries.

Why does it matter? Because this level of consistency may soon lead to new enhancements in the Agari solutions organizations depend on to short-circuit a growing number of advanced email threats. As it stands now, the business world can use all the help it can get.

Not Your Grandfather’s Email Attack

Indeed, email has emerged as a cybercriminal’s best friend. While businesses everywhere have spent billions to keep hackers out of their systems, fraudsters have learned to circumvent those defenses by exploiting simple human nature.

Using advanced identity deception techniques and highly-personalized messages that easily slip past the email security solutions most businesses use today, these criminals leverage social engineering to fool recipients into making payments or exposing sensitive information.

The damage is mounting. Today, more than 95% of data breaches start with a malicious email, helping to fuel cybercriminal activities that are expected to contribute to more than $3 trillion in business losses worldwide this year.

Unlike traditional email security systems, Agari solutions leverage advanced machine learning to analyze people, relationships and behaviors to effectively block malicious emails.

What if a predictive, time-based data model could be applied to fine-tune the efficacy of these solutions even further?

Known-Knowns Meet Unknowns

In part one, we established an average six-minute interval between the arrival of one malicious email and the next, with 90% arriving within an average 16-minute window.

To be clear, this is just an average. Some malicious emails arrive in quick succession followed by longer time gaps. What’s more, each organization experiences a unique cadence to these attacks: no two organizations receive malicious emails at the same intervals. Yet they all share that same average six-minute time gap, no matter their overall volume of email.
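To make the two summary figures concrete, here is a minimal Python sketch of how an average gap and a 90% window could be computed from arrival timestamps. The timestamps are hypothetical, and the nearest-rank percentile is an assumption made for illustration; the series does not say which method was used.

```python
from datetime import datetime, timedelta
from statistics import mean


def interarrival_minutes(timestamps):
    """Return the gaps, in minutes, between consecutive arrival times."""
    ordered = sorted(timestamps)
    return [(b - a).total_seconds() / 60 for a, b in zip(ordered, ordered[1:])]


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = max(1, -(-pct * len(ordered) // 100))  # ceiling division
    return ordered[rank - 1]


# Hypothetical arrival times (minutes after 09:00); not Agari data.
start = datetime(2018, 8, 1, 9, 0)
offsets = [0, 1, 2, 3, 15, 16, 18, 40, 41, 42, 54]
arrivals = [start + timedelta(minutes=m) for m in offsets]

gaps = interarrival_minutes(arrivals)
print(f"mean gap: {mean(gaps):.1f} min")
print(f"90% of gaps are at most {percentile(gaps, 90):.0f} min")
```

Note how the toy data mixes one-minute gaps with much longer ones. The mean alone hides that burstiness, which is why the percentile is reported alongside it.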

In part two, we used this insight to develop a data model featuring a two-minute time lag to determine whether the prior few minutes of incoming email activity would accurately predict the next minute of activity. When plotted against one company’s real-world data on incoming email over a 45-day period, this model, with its two-minute lag, anticipated the arrival of malicious emails with a remarkable level of accuracy. That means every two minutes of data served as a good predictor of the next two minutes’ volume of malicious messages.
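As a rough illustration of what a two-minute lag means, the sketch below fits a simple least-squares model that predicts each minute’s malicious-email count from the two preceding minutes. This is an assumed stand-in, not Agari’s actual model: the function names, the no-intercept linear form and the sample counts are all hypothetical.

```python
def lag_rows(counts):
    """Pair each minute's count with the counts from the two preceding minutes."""
    return [(counts[t - 1], counts[t - 2], counts[t]) for t in range(2, len(counts))]


def fit_two_lag(counts):
    """Least-squares fit of count[t] ~ a * count[t-1] + b * count[t-2]."""
    rows = lag_rows(counts)
    s11 = sum(x1 * x1 for x1, _, _ in rows)
    s12 = sum(x1 * x2 for x1, x2, _ in rows)
    s22 = sum(x2 * x2 for _, x2, _ in rows)
    t1 = sum(x1 * y for x1, _, y in rows)
    t2 = sum(x2 * y for _, x2, y in rows)
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("lagged counts are collinear; cannot fit")
    # Solve the 2x2 normal equations by Cramer's rule.
    a = (t1 * s22 - t2 * s12) / det
    b = (t2 * s11 - t1 * s12) / det
    return a, b


def mean_abs_error(counts, a, b):
    """Average absolute gap between actual and predicted per-minute counts."""
    rows = lag_rows(counts)
    return sum(abs(y - (a * x1 + b * x2)) for x1, x2, y in rows) / len(rows)


# Hypothetical malicious-email counts per minute, with bursts; not Agari data.
counts = [0, 0, 3, 5, 4, 0, 0, 0, 2, 6, 3, 0, 0, 1, 0, 0, 4, 5, 2, 0]
a, b = fit_two_lag(counts)
print(f"a={a:.2f}, b={b:.2f}, MAE={mean_abs_error(counts, a, b):.2f}")
```

A model like this does well when emails arrive in bursts, because a busy minute tends to be followed by another busy minute. It has little to work with when arrivals are independent of the recent past.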

A key aspect of the attacks caught our eye: Even within the average six-minute interval, when malicious emails did arrive, they tended to do so in batches of pronounced spikes. This is why the model with the two-minute lag works so well. Next, we wanted to know what underlies this pattern. Are these coordinated campaigns? Are they in response to news involving the company?

Campaign-Centric Threat?

As it happens, we did find some of the answers in shared subject lines, “from” addresses and IP addresses. By removing these clear campaign indicators, we cut our model’s error rate in half. What’s more, the large spikes in malicious messages start conforming to typical business hours.

This may indicate campaigns sharing this handful of obvious attributes are less targeted and perhaps sent from different geographies without concern for email arrival time.
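The filtering step described above can be sketched in a few lines of Python. This is an assumption about how such a filter might look, not the method used in the research: a message is treated as part of an obvious campaign if it shares a subject line, “from” address or IP address with at least one other message. The field names and sample messages are hypothetical.

```python
from collections import Counter


def split_campaigns(messages, keys=("subject", "sender", "ip")):
    """Separate messages that share any key attribute with another message."""
    counts = {k: Counter(m[k] for m in messages) for k in keys}
    campaign, remaining = [], []
    for m in messages:
        shared = any(counts[k][m[k]] > 1 for k in keys)
        (campaign if shared else remaining).append(m)
    return campaign, remaining


# Hypothetical messages; addresses use reserved documentation ranges.
messages = [
    {"subject": "Invoice overdue", "sender": "a@example.com", "ip": "203.0.113.5"},
    {"subject": "Invoice overdue", "sender": "b@example.com", "ip": "203.0.113.6"},
    {"subject": "Quick favor", "sender": "c@example.com", "ip": "198.51.100.7"},
    {"subject": "Wire request", "sender": "d@example.com", "ip": "203.0.113.6"},
    {"subject": "Updated W-2", "sender": "e@example.com", "ip": "192.0.2.9"},
]

campaign, remaining = split_campaigns(messages)
print(len(campaign), "campaign messages;", len(remaining), "remaining")
```

The remaining messages are the ones with no obvious shared attribute, and they are the population the later analysis focuses on.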

[IMAGE]

To better understand the remaining attack pattern, we recalibrated our model to capture work hours only, hoping to see if there was an even more defined pattern during the block of time both legitimate and malicious email volumes are highest.
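Restricting the data to work hours is a simple timestamp filter. The sketch below assumes a Monday-to-Friday, 09:00 to 17:00 window in the recipient’s local time; the actual window used in the analysis is not stated, and the sample timestamps are hypothetical.

```python
from datetime import datetime


def in_work_hours(ts, start_hour=9, end_hour=17):
    """True for Monday-Friday timestamps with start_hour <= hour < end_hour."""
    return ts.weekday() < 5 and start_hour <= ts.hour < end_hour


# Hypothetical arrival times; not Agari data.
timestamps = [
    datetime(2018, 8, 22, 10, 15),  # Wednesday morning
    datetime(2018, 8, 22, 18, 40),  # Wednesday evening
    datetime(2018, 8, 25, 10, 15),  # Saturday morning
]

work_hour_arrivals = [ts for ts in timestamps if in_work_hours(ts)]
print(len(work_hour_arrivals), "of", len(timestamps), "arrived during work hours")
```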

[IMAGE]

In this instance, our two-minute model was less predictive of new malicious emails. Indeed, the malicious emails appeared to be more random, with unique subject lines, senders and IP addresses. This points to two possibilities:

There is a common incentive motivating a large number of independent fraudsters to send malicious emails to a given organization at one time

These emails are part of coordinated campaigns from the same criminals, and these campaigns are far more sophisticated than spam ever was

News-Driven Events?

That first hypothesis fit with our notion that news coverage about the company could drive malicious email traffic by getting word out to a large number of fraudsters about opportunities to defraud the organization. After all, when an organization announces it has completed a round of funding, for instance, it tends to receive a lot more phishing emails.

[IMAGE]

To test that, we analyzed the volume of media coverage. Ultimately, we could not identify any discernible patterns to support that notion, which leaves our second hypothesis. And while the prospect that these are all highly coordinated attacks is far more intriguing, it also represents a much larger threat to organizations.
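One simple way to check for a link between media coverage and attack volume is to correlate the two series day by day. The sketch below uses a Pearson correlation as an assumed stand-in; the post does not say which test was applied, and the daily figures are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


# Hypothetical daily figures for one week; not Agari data.
news_mentions = [2, 0, 1, 7, 3, 0, 1]
malicious_emails = [240, 251, 238, 244, 249, 236, 247]

r = pearson(news_mentions, malicious_emails)
print(f"correlation between coverage and attack volume: {r:.2f}")
```

A coefficient near zero would be consistent with the finding that coverage volume shows no discernible pattern against malicious email traffic. A same-day correlation would also miss any delayed effect, so a fuller check would repeat it with the email series shifted by a day or more.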

A Never-Ending Battle

So, where does that leave us? After plotting a number of models to test a variety of hypotheses, we could account for only some of the underlying reasons organizations large and small receive malicious emails with that same average six-minute interval.

This means that while we can’t yet identify whether a malicious email is part of a highly coordinated campaign, we’re getting closer. This is great validation of Agari’s approach of designing machine learning models to catch fraud. Despite the opaque nature of what underlies these batch attacks, Agari’s solutions are designed to catch them regardless. We catch anything that looks different from ‘good’ and ‘normal’.

The dataset generated by the two trillion or so emails we process annually is just one of the extraordinary weapons in our arsenal. Our industry-leading expertise and AI-powered solutions apply behavioral science to identify and infer relationships in order to successfully recognize and neutralize incoming email attacks.

And our data scientists are continuously refining our data models to enhance the way we help brands across the globe stay ahead of cybercriminals and defeat BEC, phishing and other forms of email fraud. This culture of innovation is at the heart of everything we do.

With that in mind, I hope this series has given you at least a small glimpse into the kind of analysis we do on a continuous basis. We are growing fast, and as we do so, our data becomes more diverse and our models more sophisticated in distinguishing between good and malicious behavior. With such a large global customer base now, I believe we will be able to defeat advanced email attacks, closing off the primary entry point for hackers into organizations.

That is my prediction – and I’m sticking to it!

