Big data is useful for more than just figuring out how much milk needs to be stocked in supermarkets. I’ve joined with two members of my team to research new methods for using the concept of big data to detect when cybercriminals are avoiding detection on enterprise networks. Bin Yu, Mark Threefoot and I have applied for a patent on our work, which involves detecting “fast flux” abuse of the Domain Name System (DNS) by cybercriminals.
Fast flux is a technique used to mask botnets, which are malicious Internet-connected programs colluding across multiple machines for automated tasks. When phishing or distributing malware, these botnets hide behind a rapidly shifting network of compromised hosts acting as proxies, employing a variety of Internet Protocol (IP) addresses associated with a legitimate domain name. Using the fast flux (also known as flux domain) technique, the botnets shift among these IP addresses. With fast flux masking both their identities and activities, cybercriminals delay or evade detection.
As we note in the paper outlining our research, fast flux “exploits the stability and resilience of the DNS to make it difficult to eliminate systems being used for criminal activities. It can frustrate both administrative remedies and technical remedies. While [fast flux] isn’t a threat to any component of the DNS infrastructure, it is a threat to Internet users facilitated by DNS.”
To differentiate malicious fast flux from the legitimate activity found in content delivery networks and Network Time Protocol (NTP) services, we created a model focused not only on the DNS time-to-live value (the number of seconds a resolver may cache a record before asking for it again) but also on the trustworthiness of DNS resource records. That is, DNS messages from a highly trusted source are excused from analysis in favor of those for which there is less certainty, enabling the system to focus on the most likely suspects, rather than all suspects equally.
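To make the idea concrete, here is a minimal Python sketch of that two-part filter. It is not the model from our paper: the thresholds, the trust score and every name in it (DnsAnswer, suspects, TRUST_CUTOFF and so on) are assumptions chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical thresholds, picked for illustration only.
TRUST_CUTOFF = 0.9      # sources at or above this score are skipped
LOW_TTL_SECONDS = 300   # TTLs at or below this count as "short"
MIN_DISTINCT_IPS = 5    # many addresses behind one name is a flux hint


@dataclass
class DnsAnswer:
    domain: str
    ttl: int             # time-to-live in seconds
    ips: tuple           # addresses returned for the name
    source_trust: float  # assumed score: 0.0 (unknown) to 1.0 (highly trusted)


def suspects(answers):
    """Return the domains whose answers deserve a closer look."""
    flagged = []
    for answer in answers:
        if answer.source_trust >= TRUST_CUTOFF:
            continue  # highly trusted sources are excused from analysis
        short_ttl = answer.ttl <= LOW_TTL_SECONDS
        many_ips = len(set(answer.ips)) >= MIN_DISTINCT_IPS
        if short_ttl and many_ips:
            flagged.append(answer.domain)
    return flagged


flux_like_ips = ("192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4", "192.0.2.5")
answers = [
    # Short TTL and many IPs, but trusted: looks like a CDN, so it is skipped.
    DnsAnswer("cdn.example.com", 60, flux_like_ips, 0.95),
    # Short TTL, many IPs, low trust: flagged.
    DnsAnswer("flux.example.net", 120, flux_like_ips, 0.2),
    # Low trust, but a long TTL and a single address: not flagged.
    DnsAnswer("static.example.org", 86400, ("198.51.100.7",), 0.3),
]
print(suspects(answers))  # ['flux.example.net']
```

The trust check runs first so that the bulk of traffic from well-known sources never reaches the more expensive checks, which is the point of focusing on the most likely suspects.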
Where does big data come in? We use advanced analytics technologies to combine information from two systems: an offline machine-learning system, which “trains” the model to get smarter, and an online system that supports throughput of real-time DNS streaming.
It sounds complicated – and it is – but it comes down to this: Determining the validity or invalidity of a DNS message requires the amalgamation of multiple types of information in near-real time, and our work on detection mechanisms brings us that much closer to identifying when a DNS server is being spoofed.
Testing 200 days’ worth of DNS data (collected between November 2012 and June 2013), we analyzed 10 million DNS messages for 906 unique domains classified as fast flux. Out of all those messages, we were able to identify almost 24,000 potentially suspect messages using fast flux technology. Needless to say, blocking 24,000 DNS messages rather than 10 million takes considerably less work – and less time.
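For a sense of scale, the arithmetic behind that comparison can be checked in a few lines of Python. Both figures are the rounded numbers quoted above, so the results are approximate.

```python
total_messages = 10_000_000  # DNS messages analyzed (rounded figure from the text)
suspect_messages = 24_000    # "almost 24,000", so treat this as an upper bound

share = suspect_messages / total_messages
reduction = total_messages / suspect_messages

print(f"{share:.2%} of messages flagged")                    # 0.24% of messages flagged
print(f"about {reduction:.0f}x fewer messages to act on")    # about 417x fewer messages to act on
```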
As we said in our research paper, “If such a type of threat can be detected and mitigated on a DNS transaction that is often the entry point for network connections, the network is that much safer.”
It’s too soon to say for sure how Infoblox will use this research, but we’re continuing to look for new ways to make DNS infrastructure more efficient and more secure.