Part 1 of a 3-part series

The Goal of this Experiment

This post asks whether an organization's propensity to receive malicious emails is a product of time or of its total email volume. We want to know whether we can project (a) the time before an organization receives its next malicious email and (b) the number of good emails it receives before a malicious one. This is interesting for a number of reasons. If we can project the time before a malicious email, we have a potentially powerful indicator of criminal behavior. Alternatively, if there is a consistent pattern in the number of good emails between malicious ones, we could say that larger organizations, which receive more email, are more exposed. Conversely, if there is no relationship between email volume and malicious emails, the size of an organization would not matter: small and large organizations would be equally exposed.

The Backdrop

To project the time or the number of emails before the next malicious message arrives, we plotted the time gaps and email volume gaps between consecutive malicious emails and calculated their summary statistics. Malicious email volume did not correlate with total email volume. Malicious emails could arrive in quick succession, or thousands of good emails could pass before the next malicious one arrived.
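As a rough illustration of the two measurements described above, the sketch below computes both kinds of gap from a time-ordered list of emails. The function name, the `(timestamp, is_malicious)` record layout, and the sample data are assumptions made for this example, not the data or tooling used in the experiment.

```python
from datetime import datetime, timedelta


def gaps_between_malicious(emails):
    """Return (time_gaps_minutes, volume_gaps) for consecutive malicious emails.

    `emails` is assumed to be an iterable of (timestamp, is_malicious) pairs,
    already sorted by timestamp. This layout is hypothetical.
    """
    time_gaps = []      # minutes between one malicious email and the next
    volume_gaps = []    # good emails received between two malicious ones
    last_malicious_time = None
    good_since_last = 0

    for timestamp, is_malicious in emails:
        if is_malicious:
            if last_malicious_time is not None:
                elapsed = timestamp - last_malicious_time
                time_gaps.append(elapsed.total_seconds() / 60)
                volume_gaps.append(good_since_last)
            last_malicious_time = timestamp
            good_since_last = 0
        else:
            good_since_last += 1

    return time_gaps, volume_gaps


# Invented sample: three malicious emails with two good ones in between.
start = datetime(2024, 1, 1, 9, 0)
sample = [
    (start, True),
    (start + timedelta(minutes=1), False),
    (start + timedelta(minutes=2), False),
    (start + timedelta(minutes=5), True),
    (start + timedelta(minutes=6), True),
]

time_gaps, volume_gaps = gaps_between_malicious(sample)
print(time_gaps)    # [5.0, 1.0]
print(volume_gaps)  # [2, 0]
```

Good emails that arrive before the first malicious one are ignored here, since a gap needs a malicious email at both ends.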

In contrast, there were intriguing patterns in the time gaps. Time gaps between malicious emails were predominantly small: the average was 6 minutes, with 90% falling within 16 minutes. There were, however, some large time gaps, so this average was not representative of the underlying pattern. We cannot project with confidence any time or email volume gap between malicious emails. This is not surprising; it would be strange if malicious messages arrived at consistent intervals.
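To see why an average can misrepresent gaps like these, here is a minimal sketch that compares the mean with the median and a 90th percentile. The gap values are invented for illustration and are not the measurements reported above, and the nearest-rank percentile helper is just one simple definition among several.

```python
import math
from statistics import mean, median


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical time gaps in minutes: mostly short, with one very long gap.
gaps = [1, 1, 2, 2, 3, 3, 4, 5, 6, 240]

gap_mean = mean(gaps)
gap_median = median(gaps)
gap_p90 = percentile_nearest_rank(gaps, 90)

print(f"mean={gap_mean:.1f} median={gap_median} p90={gap_p90}")
```

In this made-up sample a single long gap pulls the mean well above what most of the gaps look like, which is the kind of distortion that makes a single average a poor basis for projection.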

What was different and interesting about the time gaps is that they were very consistent across organizations. Organizations had very similar average time gaps between malicious emails, and those gaps varied by a similar amount. This means that while the time gaps themselves are not uniform, there is a consistent relationship between time and malicious emails across organizations. It suggests that larger and smaller organizations receive malicious emails about as often.

Time Series

To better understand what this might mean, we looked further into how malicious emails are spread across time and compared this with ordinary email patterns. Total email volume follows a clear pattern, with peaks on weekdays during work hours and troughs over the weekend.

In contrast, these weekly cycles are less evident in the volume of malicious emails.

Malicious emails show spikes that recur regularly, and dotted amongst them are some larger spikes when volumes surge. Weekday and work hour patterns are not evident, though they may be masked by the larger spikes in visualizations like this. The blue line in the chart above shows the total volume of malicious emails received at one organization every minute over a 45 day period. The red line is the rolling average over 15 minutes and the black line at the bottom is the rolling variation. What we see here is that, while there is no obvious workday pattern in line with the total email volume, the mean and variation are fairly consistent over time. These are critical factors for performing a time series analysis. They are also crucial for our understanding of the phenomenon under investigation. On average, the volume of malicious emails stays the same over the 45 days: there are spikes, but they are evenly distributed and the general pattern remains similar.
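The rolling statistics described above can be sketched as follows. This is a minimal illustration, not the code behind the chart: it assumes per-minute counts of malicious emails, treats the "rolling variation" as a population standard deviation (the post does not say which measure was used), and runs on invented counts.

```python
from collections import deque
from statistics import mean, pstdev


def rolling_stats(counts, window=15):
    """Rolling mean and standard deviation over the last `window` values.

    The first window-1 results use however many values are available so far.
    """
    recent = deque(maxlen=window)  # oldest value drops out automatically
    means, spreads = [], []
    for count in counts:
        recent.append(count)
        window_values = list(recent)
        means.append(mean(window_values))
        spreads.append(pstdev(window_values))
    return means, spreads


# Hypothetical malicious-email counts per minute, with one spike.
per_minute = [0, 1, 0, 2, 0, 1, 0, 0, 12, 1, 0, 1, 0, 2, 0, 1, 0, 1, 0, 0]

means, spreads = rolling_stats(per_minute, window=15)
for minute, (avg, spread) in enumerate(zip(means, spreads)):
    print(f"minute {minute:2d}: mean={avg:.2f} spread={spread:.2f}")
```

If the rolling mean and spread hover around the same level across the whole period, as they do in the chart, that is an informal sign that the series is roughly stationary, which many time series models assume.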

In the next post, we’ll delve into predictive models for malicious emails and how these may shed light on when organizations are most at risk.