If you manage an IT organization, you might be concerned about the recent Internet routing table “crisis,” in which the number of Internet routes is reportedly exceeding the 512K capacity of many Internet routers. (Among the coverage are pieces in The Wall Street Journal on Aug. 13 and Aug. 19.) Will your organization have to buy new routers? Will the Internet become unstable? Before you summon your network engineering team for an emergency meeting, a few words of advice: This is nothing new, and the situation is manageable. It can also be a “teachable moment,” an opportunity to explore ways of making your network more flexible and more efficient.
First, as I said, this is nothing new. At my last job, before I joined Infoblox, I had to explicitly plan for routing table capacity issues. This was back when the global routing table was only 300K prefixes or so (and the IPv6 routing table was a measly 2.5K prefixes).
Working for a company that maintained a global content delivery network with its own IP backbone meant my team had to requisition a staggering number of 10 gigabit ports connected to transit providers and Internet peers. Each of these ports, in turn, needed full Internet visibility in order to optimize the outbound traffic that was critically important to delivering performance for our clients.
And each of the thousands of 10G ports in question supported a maximum of 512K prefixes due to limitations in Ternary Content-Addressable Memory (TCAM), the technical name for the specialized memory that holds the routing tables we’ve been reading so much about lately. The 300K IPv4 prefixes then in the default-free zone still left roughly 40 percent of headroom before hitting the 512K barrier.
The catch was that we were in the middle of adopting IPv6 (in a dual-stack configuration, along with IPv4). The TCAM profile available on these ports would give us 384K worth of IPv4 prefixes (with 64K reserved for IPv6 prefixes).
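To make that profile arithmetic concrete, here is a back-of-the-envelope sketch in Python. One assumption not stated above: on that class of hardware, a single IPv6 route commonly consumed two TCAM entries, which is how 64K reserved IPv6 prefixes plus 384K IPv4 prefixes would fill a 512K-entry table. The round numbers are the ones from the text.

```python
# Back-of-the-envelope check of the dual-stack TCAM profile described above.
# Assumption (not stated in the text): one IPv6 route consumes two TCAM
# entries on this class of hardware.

ENTRIES_PER_IPV6_ROUTE = 2        # assumed hardware behavior
tcam_total_k = 512                # total TCAM entries (in thousands)
ipv6_reserved_k = 64              # IPv6 prefixes reserved by the profile

# Entries left for IPv4 after the IPv6 reservation is carved out.
ipv4_capacity_k = tcam_total_k - ipv6_reserved_k * ENTRIES_PER_IPV6_ROUTE

ipv4_table_k = 300                # global IPv4 table size at the time
headroom_k = ipv4_capacity_k - ipv4_table_k
headroom_pct = 100 * headroom_k / ipv4_table_k

print(f"IPv4 capacity: {ipv4_capacity_k}K")              # 384K
print(f"Headroom: {headroom_k}K ({headroom_pct:.0f}%)")  # 84K (28%)
```

Under the dual-stack profile, the cushion over the then-current table shrinks to roughly 84K prefixes, which is why the ceiling loomed.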
Growth of the Internet routing table. Source: BGP Report
Based on the growth trajectory of the global routing table, it was fairly obvious back then that we would soon bump against the ceiling of a 384K TCAM profile. Our router vendor was happy to sell us line cards with enough TCAM to support 1M routes. But I probably don’t need to tell you that if one of your biggest cost centers is the price of 10G ports, TCAM is expensive — very expensive. Let’s just say I was soon tasked with figuring out which prefixes we could safely summarize or exclude so that 512K of TCAM would still suffice, thereby keeping the cost of 10G ports to a minimum, a project most network engineers would gladly trade for a trip to the dentist for one or more root canals.
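The summarization half of that exercise can be sketched with Python’s standard `ipaddress` module, whose `collapse_addresses` function aggregates contiguous prefixes into the smallest covering set. The prefixes below are illustrative only, not the contents of any real routing table.

```python
import ipaddress

# Four contiguous /24s announced separately (disaggregated)...
disaggregated = [
    ipaddress.ip_network(p)
    for p in ("10.10.0.0/24", "10.10.1.0/24", "10.10.2.0/24", "10.10.3.0/24")
]

# ...collapse into a single covering /22, freeing three TCAM entries.
summarized = list(ipaddress.collapse_addresses(disaggregated))

print(summarized)                                  # [IPv4Network('10.10.0.0/22')]
print(len(disaggregated), "->", len(summarized))   # 4 -> 1
```

The hard part in practice, of course, is not the arithmetic but deciding which prefixes can be summarized or dropped without hurting routing optimality.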
Of course, knowing a bit of the history of the Internet, I realized it didn’t have to be this way. The biggest part of the problem was (and is) the disaggregation (i.e., the breaking up of larger blocks) of IPv4 address space to provide organizations with sufficient host addressing. That disaggregation is often the direct consequence of organizations lacking the tools to know definitively, at any given point in time, how well utilized the IP addresses and IP address blocks they’re using, or have allocated, actually are.
As a result, it’s often easier for an organization simply to assume levels of utilization that justify requesting more IP blocks. When granted, these requests have necessarily been met with allocations that are both smaller and discontiguous, owing to the steadily shrinking supply of IPv4 space. This in turn has led to an ever larger global routing table. And so here we are, just north of 512K routes and suffering outages because of, among other things, a lack of deployed IP address management tools and mature operational practice.
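To see what disaggregation does to everyone’s routing table, here is the inverse of summarization, again using Python’s `ipaddress` module with an arbitrary, illustrative private block:

```python
import ipaddress

# One aggregate announcement...
block = ipaddress.ip_network("10.20.0.0/16")

# ...broken up and announced as /24s becomes 256 separate routes
# that every default-free router on the Internet must now carry.
pieces = list(block.subnets(new_prefix=24))

print(len(pieces))            # 256
print(pieces[0], pieces[-1])  # 10.20.0.0/24 10.20.255.0/24
```

Multiply that effect across thousands of organizations requesting ever smaller, discontiguous allocations and the path to a 512K-route table is easy to see.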
The truth is, for all the drama, the 512K problem is easily remedied for most organizations. Service providers have known about the impending issue for a long time and don’t have much of an excuse for not upgrading TCAM ahead of time. Meanwhile the great majority of enterprises don’t need the routing visibility and optimization afforded by storing the global routing table at their Internet edge. Further, with a little diligence and consensus, harmful disaggregation of IPv6 can be avoided altogether.
But the question remains: As an IT manager or CIO, if you asked your team to produce a report of all the IP address prefixes your organization owns, all the IP addresses it controls, all their associated DNS and DHCP data — even the security-policy violations the associated hosts and servers are potentially responsible for — what would that report look like? Could decisions ensuring business agility be made from such a report?
I’m talking about the kind of data that enhance network visibility, control, automation, and security. The shape and cost of the next equivalent to the “512K mess” are likely visible in those data today. There’s still time to put the right tools and practices into place to access that data, prevent the next major outage, and demonstrate that IT’s role in business agility isn’t forever capped (and trapped) at a proverbial 512K.