Malicious actors are always finding new ways to bypass a company’s defenses and steal valuable data to make quick money. The more dynamic their approach, the more successful they are at evading security controls that rely on static methods, such as infrequently updated blacklists. In this blog, we explore an advanced technique that cybercriminals use to circumvent even sophisticated defenses, the Domain Generation Algorithm (DGA), and learn how to defend against such attacks.
What are DGAs?
Although DGA stands for Domain Generation Algorithm, the domains themselves would be more accurately called AGDs (algorithmically generated domain names), because that’s what they are. A DGA is code that programmatically produces a list of domain names that malware clients use to reach a sequence of command-and-control (C&C) servers. These domains act as rendezvous points where malware and attacker-controlled servers communicate stealthily over a backhaul network. Once IT security detects and blocks one of the generated domains, the malware client and C&C server switch to the next one on the list to bypass defenses.
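To make the rendezvous idea concrete, here is a minimal Python sketch of a hypothetical toy DGA. The seed string, the SHA-256-based construction, and the helpers `generate_domains` and `rendezvous` are illustrative assumptions, not the algorithm of Conficker or any other real malware family. DNS resolution is simulated with a set, so the example runs offline.

```python
import hashlib
from datetime import date
from typing import Optional

# Hypothetical toy DGA for illustration only; it does not reproduce any real malware family.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def generate_domains(seed: str, day: date, count: int = 10, length: int = 16) -> list[str]:
    """Derive `count` pseudo-random .com names from a static seed plus the date."""
    domains = []
    for i in range(count):
        # Static seed + dynamic seed (the date) + index -> deterministic SHA-256 digest.
        digest = hashlib.sha256(f"{seed}|{day.isoformat()}|{i}".encode()).digest()
        # Map the first `length` digest bytes (at most 32) onto lowercase letters.
        label = "".join(ALPHABET[b % len(ALPHABET)] for b in digest[:length])
        domains.append(label + ".com")
    return domains


def rendezvous(candidates: list[str], registered: set[str]) -> Optional[str]:
    """Client side: return the first candidate that 'resolves' (simulated with a set)."""
    for domain in candidates:
        if domain in registered:
            return domain
    return None


if __name__ == "__main__":
    day = date(2024, 1, 15)
    # Both sides share the seed, so they compute the identical list independently.
    client_list = generate_domains("shared-seed", day)
    server_list = generate_domains("shared-seed", day)
    assert client_list == server_list

    # The operator registers just one name (here the fourth); the client finds it.
    registered = {server_list[3]}
    print(rendezvous(client_list, registered) == server_list[3])  # True
```

Because only the seed and the date drive the output, neither side needs to transmit the list, and the operator has to register only one or a few of the names each day.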
DGA domains have the following characteristics:
- They have long, nonsensical names, often under .com, because such names are less likely to clash with an already-registered domain.
- They are usually encoded or encrypted using the same crypto algorithm that the malware client and C&C server share, making them hard to decode or decipher.
- Thousands of DGA domains may be generated per day, but only a few are active or resolvable, and only the malware client and C&C server know which ones.
- Even when active, they have a short life span (often only a few days), which makes them hard to blacklist.
Why are DGAs difficult to detect?
Domain Generation Algorithms present a constantly moving target for perimeter firewalls that detect and block traffic using domain-based blacklists. Most algorithms use different approaches to randomize the letters in the second-level domain, the label that precedes the top-level domain such as “.com”. These domains change constantly based on a combination of a static seed and a dynamic seed (such as the current date), which makes them very difficult to detect.
The DGA technique was first popularized by the Conficker worm in 2008, which initially generated 250 domains per day. A later strain, Conficker.C, generated 50,000 domains a day, turning daily tracking into a huge effort for law enforcement.
Even the IP addresses that these DGA domains resolve to can change rapidly, using a technique called fast flux, to bypass firewalls that rely on IP-based blacklists. To complicate matters further, most systems in the C&C network do not actually host malicious content. That task is reserved for a few machines that serve the malicious content; the rest act as redirectors that mask the real IP addresses of the systems controlled by cybercriminals.
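One commonly cited heuristic for spotting fast flux is that repeated lookups of the same domain return many distinct IP addresses with very short TTLs. The Python sketch below applies that heuristic to a list of observed answers. The `DnsAnswer` record, the `looks_fast_flux` helper, and both thresholds are assumptions for illustration; the sample addresses come from documentation-only ranges.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class DnsAnswer:
    """One observed A record for a domain (hypothetical record shape)."""
    ip: str
    ttl: int  # seconds


def looks_fast_flux(answers: list[DnsAnswer],
                    min_distinct_ips: int = 10,
                    max_median_ttl: int = 300) -> bool:
    """Heuristic: many distinct IPs combined with short TTLs suggests fast flux.

    Both thresholds are illustrative assumptions and would need tuning on real traffic.
    """
    if not answers:
        return False
    distinct_ips = {a.ip for a in answers}
    typical_ttl = median(a.ttl for a in answers)
    return len(distinct_ips) >= min_distinct_ips and typical_ttl <= max_median_ttl


if __name__ == "__main__":
    # Simulated lookups: a flux domain rotating through 12 addresses with 60 s TTLs...
    flux = [DnsAnswer(f"203.0.113.{i}", 60) for i in range(12)]
    # ...versus a stable domain returning one address with a 1-day TTL.
    stable = [DnsAnswer("198.51.100.7", 86400) for _ in range(12)]
    print(looks_fast_flux(flux), looks_fast_flux(stable))  # True False
```

Content delivery networks can also rotate addresses with low TTLs, so in practice a check like this would be one signal among several rather than a verdict on its own.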
What Are the Ways to Detect DGA?
There are two major methods of detecting DGA-based cyber-attacks.
- Reverse-engineering method: If you have access to the malware’s source code (and possibly the servers), you can calculate the upcoming list of DGA domains. Alternatively, if you have already observed a few DGA domains in sequence and can infer or estimate the algorithm, you can do the same. However, this method has critical weaknesses:
  - You need a copy of the malware’s source code, which may be hard to obtain.
  - Your inference could be wrong, because your sample set is limited.
  - You have to assume the attacker is still using that same code, which may not be true.
  - The number of domain names predicted by this method may be too large, exceeding the memory available for blacklists in traditional firewalls.
- Machine Learning (ML) method: You can train models on an existing sample of DGA-generated domain names and use them to predict previously unseen DGA domains that cybercriminals will use. This approach does not suffer from the drawbacks of the reverse-engineering technique listed above.
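The reverse-engineering method, and its blacklist-size problem, can be sketched in Python as follows, assuming an analyst has already recovered the algorithm and its seed. `toy_dga` is a hypothetical stand-in for such a recovered algorithm, and `precompute_blocklist` is an illustrative helper, not part of any product.

```python
import hashlib
from datetime import date, timedelta

# Hypothetical reimplementation of a DGA an analyst has recovered; illustrative only.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def toy_dga(seed: str, day: date, per_day: int) -> list[str]:
    """Return the domains the (hypothetical) malware would try on `day`."""
    domains = []
    for i in range(per_day):
        digest = hashlib.sha256(f"{seed}|{day.isoformat()}|{i}".encode()).digest()
        domains.append("".join(ALPHABET[b % 26] for b in digest[:12]) + ".com")
    return domains


def precompute_blocklist(seed: str, start: date, days: int, per_day: int) -> set[str]:
    """Predict every domain for `days` consecutive days from `start` for a blocklist."""
    blocklist = set()
    for offset in range(days):
        blocklist.update(toy_dga(seed, start + timedelta(days=offset), per_day))
    return blocklist


if __name__ == "__main__":
    week = precompute_blocklist("recovered-seed", date(2024, 1, 1), days=7, per_day=250)
    print(len(week))  # 1750 entries: 7 days x 250 domains/day
    # The list grows linearly with the window: at 50,000 domains/day (the rate
    # reported for Conficker.C), 30 days already means 1,500,000 entries.
```

The sketch only works if the recovered algorithm and seed are exactly right, which is why the weaknesses listed for this method matter.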
How Can You Defend Against DGAs?
Statistical models built with machine learning are the better way to detect DGA-based domain names. Consider the following factors when building these ML models:
- Entropy: How much randomness is there in the domain name?
- Lexical: Does it appear to be encoded or encrypted?
- N-gram: Does the domain name contain words in a language?
- Frequency: Are far too many requests sent to the same external domain?
- Size: Is the domain name unusually long?
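As a rough illustration, the five factors above might be turned into numeric features like this. The bigram list, the feature names, and `extract_features` are hypothetical; the digit ratio is just one simple stand-in for the lexical check, and taking the label before the last dot ignores multi-part suffixes such as .co.uk.

```python
import math
from collections import Counter

# Small hypothetical list of common English bigrams; a real model would learn
# n-gram statistics from a large corpus of legitimate domain names.
COMMON_BIGRAMS = {
    "th", "he", "in", "er", "an", "re", "on", "at", "en", "nd",
    "ti", "es", "or", "te", "of", "ed", "is", "it", "al", "ar",
    "st", "to", "nt", "ng", "se", "ha", "as", "ou", "io", "le",
}


def shannon_entropy(s: str) -> float:
    """Entropy factor: bits per character of the string's character distribution."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def second_level_label(domain: str) -> str:
    """Label just before the TLD, e.g. 'example' for 'www.example.com'.

    Simplification: ignores multi-label suffixes such as '.co.uk'.
    """
    parts = domain.lower().rstrip(".").split(".")
    return parts[-2] if len(parts) >= 2 else parts[0]


def extract_features(domain: str, query_counts: Counter) -> dict[str, float]:
    """Map one domain onto the five factors (hypothetical feature names)."""
    label = second_level_label(domain)
    size = max(len(label), 1)  # avoid division by zero on empty labels
    bigrams = [label[i:i + 2] for i in range(len(label) - 1)]
    return {
        "entropy": shannon_entropy(label),                        # randomness
        "digit_ratio": sum(ch.isdigit() for ch in label) / size,  # lexical stand-in
        "bigram_hit_rate": (sum(b in COMMON_BIGRAMS for b in bigrams) / len(bigrams)
                            if bigrams else 0.0),                 # n-gram
        "query_count": float(query_counts[domain]),               # frequency
        "length": float(len(label)),                              # size
    }


if __name__ == "__main__":
    # Hypothetical DNS query log; in practice this would come from resolver logs.
    log = ["www.example.com", "www.example.com", "xjwqzkvprtbnhm.com"]
    counts = Counter(log)
    for name in ("www.example.com", "xjwqzkvprtbnhm.com"):
        print(name, extract_features(name, counts))
```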
Each of the above statistical characteristics is assigned points, and these points are aggregated into a consolidated score. By checking whether the consolidated score exceeds a certain threshold (which can be tuned), a security analyst can determine whether a domain name is likely generated by a DGA.
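A point-based aggregation along these lines might look like the following Python sketch. It takes a dictionary of per-domain feature values, however your pipeline computes them. The rule table, cutoffs, weights, and default threshold are all illustrative assumptions, not values from Infoblox or any other product.

```python
# Hypothetical point rules: (feature, cutoff, points if the value exceeds the cutoff).
# All cutoffs, weights, and the threshold below are illustrative, not tuned values.
RULES = [
    ("entropy", 3.5, 3),       # high randomness
    ("digit_ratio", 0.3, 2),   # looks encoded
    ("length", 15, 2),         # unusually long
    ("query_count", 100, 1),   # many requests to one external domain
]
# The n-gram factor works the other way: FEW common bigrams is suspicious.
LOW_BIGRAM_CUTOFF, LOW_BIGRAM_POINTS = 0.1, 3


def dga_score(feats: dict[str, float]) -> int:
    """Aggregate per-factor points into one consolidated score."""
    score = sum(points for name, cutoff, points in RULES if feats.get(name, 0.0) > cutoff)
    if feats.get("bigram_hit_rate", 1.0) < LOW_BIGRAM_CUTOFF:
        score += LOW_BIGRAM_POINTS
    return score


def is_likely_dga(feats: dict[str, float], threshold: int = 5) -> bool:
    """Flag the domain when the consolidated score exceeds the tunable threshold."""
    return dga_score(feats) > threshold


if __name__ == "__main__":
    random_looking = {"entropy": 3.8, "digit_ratio": 0.0, "length": 18,
                      "query_count": 40, "bigram_hit_rate": 0.0}
    ordinary = {"entropy": 2.5, "digit_ratio": 0.0, "length": 7,
                "query_count": 2, "bigram_hit_rate": 0.17}
    print(dga_score(random_looking), is_likely_dga(random_looking))  # 8 True
    print(dga_score(ordinary), is_likely_dga(ordinary))              # 0 False
```

Lowering `threshold` catches more DGA domains at the cost of more false positives; raising it does the reverse. A trained ML model would typically learn these weights from labeled data instead of hand-setting them.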
Infoblox to the Rescue
Infoblox can shut down DNS-based data exfiltration, DGAs and other aggressive malware through automation, curated threat intelligence in BloxOne™ Threat Defense, and advanced analytics that combine signature-based detection with machine learning. The solution leverages the greater processing capabilities of the cloud to detect a wider range of threats, including data exfiltration, DGA, fast flux, fileless malware and dictionary DGA. It relies on a hybrid security model that protects users and data wherever they happen to be, across any infrastructure.
Learn more about DGAs in this Infoblox white paper: Using AI/ML to Detect Domain Generation Algorithms