Amazon discontinued production of its popular Internet domain ranking list, Alexa, on May 1st, 2022, and many users of the service are scrambling to find a replacement.1 Widely used for purposes ranging from search engine optimization to security applications, the website alexa[.]com began providing publicly available, free rankings of domains over twenty-five years ago. Infoblox has not used Alexa for some years, having found statistical issues with the lists that made them unreliable for our use cases.2 With users forced to find a new information source or devise their own, we want to share our insight into ranking Internet domains. Today we are releasing a new white paper that discusses the security use cases for domain rankings and the difficulties inherent in creating reliable ranking lists, provides a short technical assessment of alternative public ranking lists, and makes recommendations for replacing Alexa in your workflows.
Our paper analyzes the publicly available lists Alexa, Cisco Umbrella, and Majestic, as well as an aggregate list called Tranco. This analysis builds on what we previously published in our papers Whitelists that Work: Creating Dynamic Defensible Whitelists using Statistical Learning and InfoRanks: Statistical Inference for Defining Internet Ranks. In addition to the public lists, we include analysis of our own InfoRanks and of the top domains within a selection of our networks.
We demonstrate that ranking lists are highly network-specific and that combining them, as Tranco does, does not improve the quality or interpretability of the list. While two of Tranco's goals were to reduce the number of malicious domains in the list and to have a larger intersection with user traffic, our analysis showed that neither goal was achieved. Using a random subset of Infoblox active threat domains, we found that Tranco contained more malicious domains than its public counterparts on May 27th, 2022. These results are shown in Table 1 below.
|Top 1M List||Number of Infoblox Active High Threats|
Table 1. The number of active threats found in each public list on May 27th, 2022. The active threat domains used in this table are high threats, originating from Infoblox Threat Intelligence, available in the Threat Intelligence Data Exchange (TIDE), and are second level domains only. The total number of threats considered was approximately 1.6M.
We also show that the public lists have little overlap with our own networks. This is an inherent limitation of ranking lists and a demonstration of the unique nature of DNS within every network. Table 2 below shows the overlap between each public list and two network perspectives within Infoblox, our DNS forwarding proxies and our BloxOne Clients, each taken in aggregate. Our white paper shows more detailed analysis of this phenomenon.
|May 27, 2022||Tranco Overlap||Umbrella Overlap||Alexa Overlap||Majestic Overlap|
|Infoblox DNS Forwarding Proxies (DFP)||34%||19%||24%||26%|
|Infoblox BloxOne Clients (laptops, mobile devices)||45%||27%||35%||35%|
Table 2. Overlap percentage between the top 1M domains in the public lists and Infoblox products on May 27th, 2022.
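To run a similar comparison against your own traffic, the overlap can be computed with a simple set intersection. The sketch below is a minimal illustration, not the method used for Table 2: the function name `overlap_percentage` is hypothetical, and it assumes the percentage is taken relative to the network's own top-domain list and that both inputs are already reduced to comparable domain names.

```python
def normalize(domain):
    """Lowercase a domain and drop surrounding whitespace and a trailing dot."""
    return domain.strip().lower().rstrip(".")


def overlap_percentage(network_domains, public_list):
    """Percent of the network's domains that also appear in the public list.

    Assumption: the denominator is the network's own list, so the result
    answers "how much of what we see is covered by the public list?".
    """
    network = {normalize(d) for d in network_domains if d.strip()}
    public = {normalize(d) for d in public_list if d.strip()}
    if not network:
        return 0.0
    return 100.0 * len(network & public) / len(network)


# Hypothetical inputs: top domains observed locally vs. a public ranking list.
local_top = ["a.example", "b.example", "c.example", "d.example"]
public_top = ["A.example.", "d.example", "z.example"]
print(f"{overlap_percentage(local_top, public_top):.0f}% overlap")
```

Choosing the public list as the denominator instead would answer a different question, namely how much of the public list is relevant to your network, so it is worth stating which one you report.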
Infoblox customers have access to our patent-pending InfoRanks domain rankings via the customer services portal. While all ranking lists suffer from limitations based on the unique nature of every network, InfoRanks attempts to address another well-known issue with domain rankings: stability. As discussed in our earlier blog, there are a number of causes for the variance in rankings from day to day. Tranco attempts to address variance by averaging the rank over a 30-day window, a straightforward method that can lead to inaccurate results.
InfoRanks provides users with both the most likely rank over a 7-day period and the interval in which the true rank plausibly lies. This additional information provides context for decision support systems. Table 3 below shows that as the popularity of a domain within a network decreases, the uncertainty of its rank increases. In this example, there is a good amount of confidence that google[.]com is the 7th or 8th most popular domain. In contrast, the domain researchgate[.]net is most likely ranked 4143, but any rank between 3634 and 4531 is an acceptable possibility. The additional context allows the user to see the fluctuations across several days of DNS data at a glance and make stronger decisions about the importance of the domain.
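One way to see how plain averaging can mislead is to look at a domain with a single outlier day. The sketch below is a simplified illustration of averaging ranks over a window as described above; it is not Tranco's actual implementation, and both the function name `windowed_ranks` and the choice to score a missing day as one position past the end of the list are assumptions made for the example.

```python
from statistics import mean, median


def windowed_ranks(daily_ranks, list_size=1_000_000):
    """Return (mean, median) of a domain's daily ranks over a window.

    Assumption: a day on which the domain is absent from the list (None)
    is scored as list_size + 1, one position past the last listed rank.
    """
    filled = [r if r is not None else list_size + 1 for r in daily_ranks]
    return mean(filled), median(filled)


# Hypothetical domain: rank 100 on 29 days, missing from the list on 1 day.
ranks = [100] * 29 + [None]
avg, med = windowed_ranks(ranks)
print(f"mean rank: {avg:.0f}, median rank: {med:.0f}")
```

In this made-up case a single missing day moves the averaged rank from 100 to beyond 33,000, while the median stays at 100. A single averaged number also hides how much the rank moved during the window.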
|Domain||Most likely rank||Rank intervals||Rank Range|
Table 3. Calculated most likely rank and rank intervals using InfoRanks methods for a sample of 5 domains.
This same data is shown visually in Figure 1 below. It becomes readily apparent that as the popularity decreases, the potential error increases rapidly.
Figure 1. Calculated most likely rank and rank intervals using InfoRanks methods for a sample of 5 domains. As popularity decreases, the interval of plausible ranks widens and a single value represents the rank less well.
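To show how a rank interval can feed a decision, here is a minimal sketch that summarizes several days of ranks as a point value, an interval, and a range, and then only treats a domain as "top N" when the whole interval falls inside the cutoff. This is a crude empirical stand-in (median, minimum, and maximum), not the InfoRanks statistical inference, and the function names and daily ranks are hypothetical.

```python
from statistics import median


def summarize_ranks(daily_ranks):
    """Summarize daily ranks as a point value, an interval, and its width.

    Assumption: the median stands in for the most likely rank and the
    observed min/max stand in for the interval of plausible ranks.
    """
    lo, hi = min(daily_ranks), max(daily_ranks)
    return {"point": median(daily_ranks), "interval": (lo, hi), "range": hi - lo}


def confidently_within(summary, cutoff):
    """True only if even the worst plausible rank is inside the cutoff."""
    return summary["interval"][1] <= cutoff


# Hypothetical 7-day rank histories for a very popular and a less popular domain.
head = summarize_ranks([7, 8, 7, 7, 8, 7, 8])
tail = summarize_ranks([3900, 4100, 4500, 3700, 4200, 4050, 4300])

print(head)
print(tail)
print(confidently_within(tail, 4000), confidently_within(tail, 5000))
```

With these invented numbers the less popular domain has a point rank near 4100, yet it would not be accepted for a top-4000 cutoff because its interval extends to 4500. A decision based on the point value alone would flip from day to day.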
Before replacing Alexa in your workflows, we recommend analyzing your use cases. Most importantly, use data sources that are relevant to your environment and use cases. For most security use cases, the best list of top domains is one generated from your own network traffic, or one containing similar traffic to your own. If you choose to use one or several of the publicly available lists, let them inform, rather than dictate, decisions in your workflow. To learn more about the limitations of public ranking lists and the pitfalls of combining them, check out our white paper.