Author: Bob Rose and Nathan Snook
Users don’t care why they can’t reach a site or why their online session failed. They don’t care if a component or application went down, a transaction timed out, a system was attacked, a natural disaster occurred, or a system was taken offline for maintenance. They only know frustration and disappointment, and that they can’t get things done. We’ve all been there. That’s why we need the Power of Three.
Business Continuity
For commercial, enterprise and government entities, on-demand availability and performance are table stakes. Sites, applications, components and IT systems must operate at a high level, continuously and without failing. High-availability (HA) infrastructure is designed to deliver quality performance under varying loads with essentially zero downtime. This means HA services and IT systems are virtually always on, delivering five nines (99.999%) availability even during planned and unplanned outages.
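To make “five nines” concrete, the short sketch below converts an availability target into a yearly downtime budget. It is plain arithmetic, not an Infoblox tool; the function name and the 365-day year are assumptions made for illustration.

```python
# Yearly downtime budget implied by an availability target.
# Assumption: a 365-day (non-leap) year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes of downtime per year allowed by the target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    targets = [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]
    for label, target in targets:
        minutes = downtime_minutes_per_year(target)
        print(f"{label} ({target}%): {minutes:.2f} minutes of downtime per year")
```

Under these assumptions, five nines leaves a little over five minutes of total downtime per year, which is why manual recovery alone rarely meets the target.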
HA is a strategy for managing business continuity. It’s intended especially for failures in critical IT components and services that must be easy to restore. Should critical IT infrastructure fail, HA architecture provides an immediate backup component or system so users can continue to access services and work without disruption.
Related to HA for business continuity is IT Disaster Recovery (DR). Though it’s outside the scope of this blog, DR is worth mentioning. It differs from HA in terms of scope and scale, and includes the policies, tools and procedures IT uses to bring an entire infrastructure and its related services back online following a major catastrophic event (e.g., the destruction of a data center due to a hurricane or flood).
When considering HA, IT architects usually design systems based on three reliability principles: eliminate single points of failure; design dependable failover structures; and enable failure detection. These principles are supported by five common HA strategies:
- Redundancy: Adding a secondary component with a mirror-image copy allows the backup to immediately take over a workload should the primary component fail.
- Failover: Failover occurs by transferring a workload from a failed primary to a secondary system.
- Clustering: Clustering combines a group of components to function as a single system. Should one component in the cluster fail, software transfers the workload to the other components.
- Load Balancing: Monitoring component statistics, conducting health checks and transferring workloads from an overloaded to a less utilized component helps support availability and performance.
- Logging: Database logging provides forensic visibility into network data for triage, as well as primary and secondary database synchronization and recovery.
Configuring HA Pairs
In Infoblox NIOS devices, an HA pair can be a Grid Manager, a Grid Manager candidate, a Grid member, or an independent appliance. A physical appliance and a virtual appliance, two physical appliances, or two virtual appliances can comprise an HA pair (see Note 1). Two nodes form an HA pair. They are identified as Node 1 and Node 2 and are configured in an active/passive relationship. The active node receives, processes and responds to all service requests. The passive node constantly keeps its database synchronized with the active node so it can take over services if a failover occurs. During a failover, the previously active node becomes passive, and the previously passive node becomes active. HA pairs can be configured for IPv4, IPv6, or dual stack.
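The active/passive relationship described above can be sketched as a toy model. This is not the NIOS implementation or API: the `Node` and `HAPair` classes, the `healthy` flag, and the method names are all hypothetical, and database synchronization is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str  # "active" or "passive"
    healthy: bool = True


class HAPair:
    """Toy model of two nodes in an active/passive relationship."""

    def __init__(self) -> None:
        self.nodes = [Node("Node 1", "active"), Node("Node 2", "passive")]

    @property
    def active(self) -> Node:
        return next(n for n in self.nodes if n.role == "active")

    @property
    def passive(self) -> Node:
        return next(n for n in self.nodes if n.role == "passive")

    def check_and_failover(self) -> bool:
        """Swap roles if the active node is unhealthy and the passive one is not.

        Returns True when a failover happened.
        """
        active, passive = self.active, self.passive
        if not active.healthy and passive.healthy:
            active.role, passive.role = "passive", "active"
            return True
        return False

    def handle_request(self, query: str) -> str:
        """Only the active node answers service requests."""
        node = self.active
        if not node.healthy:
            raise RuntimeError("no healthy active node")
        return f"{node.name} answered {query}"


if __name__ == "__main__":
    pair = HAPair()
    print(pair.handle_request("example.com"))  # served by Node 1
    pair.nodes[0].healthy = False              # simulate a failure of Node 1
    pair.check_and_failover()
    print(pair.handle_request("example.com"))  # served by Node 2
```

After the role swap, the previously active node is passive, so it can be repaired and will not take traffic again unless a later failover promotes it.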
HA Cost and Maintenance
Without question, HA everywhere is great for maintaining business continuity. The problem is, HA infrastructure can be very expensive, especially for resource-constrained IT teams. Most organizations want HA everywhere and many implement policies to support it, especially for DNS, DHCP, and NTP critical path workloads. But this approach means acquiring resources and budget for the associated maintenance costs and overhead for redundant appliances in multiple locations (see Figure 1).
Figure 1
However, while HA may seem expensive, the cost of a single DNS/DHCP outage can be even higher, often exceeding the total cost of an organization’s HA infrastructure, including technology refreshes and maintenance over time.
- According to the Ponemon Institute, downtime costs an average of $9,000 per minute or over $500,000 per hour.
- A one-hour outage cost Amazon an estimated $34 million in sales in 2021.
- A 20-minute crash during 2021’s Singles’ Day cost Alibaba billions.
- Facebook’s 2021 outage cost Meta nearly $100 million in revenue.
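As a quick check on the per-minute figure cited above, the sketch below scales it to outages of different lengths. The function name and the sample durations are assumptions for illustration; the only number taken from the text is the $9,000-per-minute average.

```python
# Average downtime cost per minute, as cited above from the Ponemon Institute.
COST_PER_MINUTE_USD = 9_000


def outage_cost_usd(minutes: float, cost_per_minute: float = COST_PER_MINUTE_USD) -> float:
    """Estimate the cost of an outage of the given length at a flat per-minute rate."""
    return minutes * cost_per_minute


if __name__ == "__main__":
    for minutes in (5, 20, 60):  # illustrative durations only
        print(f"{minutes:>3}-minute outage: ${outage_cost_usd(minutes):,.0f}")
```

At this rate a 60-minute outage comes to $540,000, which matches the “over $500,000 per hour” figure. Real costs vary widely by organization, so treat the flat rate as a rough average.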
Network downtime is common and continues to rise:
- Uptime Institute’s 2022 Outage Analysis Report found that over 60% of outages cost more than $100,000, an increase from 39% in 2019. The Institute also found that 15% of outages cost more than $1 million, an increase from 11% in 2019.
- LogicMonitor’s survey of enterprise IT leaders found that over the past three years:
  - 97% of enterprises experienced an IT brownout
  - 94% of enterprises experienced an IT outage
  - Enterprises averaged 19 brownouts per year
  - Enterprises averaged 15 IT outages per year
Note 1: There are some limitations when combining physical and virtual appliances for an HA pair. For more information, see the Administrator Guide, About HA Pairs.
These statistics underscore the need for HA. Not providing for HA infrastructure in critical workload paths limits business resiliency. For most organizations, this is simply not acceptable. So, how do you achieve HA without breaking the bank?
Power of Three
Fortunately, there is a simple and inexpensive architectural solution: the Power of Three. It involves designing and deploying two Nodes: 1) Node 1 is configured as an HA appliance pair serving DNS/DHCP; 2) Node 2 is a single appliance with DNS and DHCP failover (DHCP FO). In most cases, it simply involves adding a single appliance to make an HA pair. As Noded (don’t unfriend me), it can work with physical, virtual, or a mix of appliance types. It’s also platform agnostic and applies to on-prem, virtual, hybrid, and public cloud deployments. However, it’s important to emphasize that the device being augmented with HA must be sized to handle the workload of the other two sites to properly realize the Power of Three.
Figure 2
Without the Power of Three
A common configuration includes two appliances serving DNS/DHCP/NTP. Without the Power of Three, if one fails, immediate human intervention (YOU) is required to fix it. If both fail, everything is down. And that usually happens at the most inconvenient time, with little or no opportunity for root cause analysis (see Figure 2).
Figure 3
With the Power of Three
With the Power of Three, two Nodes are deployed: one HA pair and a single appliance. If one appliance goes offline, the workload fails over to the supporting appliance, and infrastructure and services remain available. You can then manage the outage on your schedule, do a root cause analysis, submit a ticket, and install a replacement appliance as time allows while planning how to prevent the issue in the future (see Figure 3).
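The difference between the two designs can be sketched with a small model. It rests on loud assumptions that are not from Infoblox: appliances fail independently, any surviving appliance can carry the full workload, and each appliance is up 99% of the time (a hypothetical figure chosen only to make the arithmetic visible).

```python
def state_after_failures(total_appliances: int, failed: int) -> str:
    """Describe the service state after `failed` appliances go offline."""
    remaining = total_appliances - failed
    if remaining <= 0:
        return "down"
    if remaining == 1:
        return "up, no redundancy left (urgent fix)"
    return "up, still redundant (fix on your schedule)"


def service_availability(per_appliance: float, total_appliances: int) -> float:
    """Probability that at least one appliance is up, assuming independent failures."""
    return 1.0 - (1.0 - per_appliance) ** total_appliances


PER_APPLIANCE_AVAILABILITY = 0.99  # hypothetical value, for illustration only

if __name__ == "__main__":
    for total in (2, 3):
        print(f"{total} appliances:")
        for failed in range(1, total + 1):
            print(f"  {failed} failed -> {state_after_failures(total, failed)}")
        estimate = service_availability(PER_APPLIANCE_AVAILABILITY, total)
        print(f"  estimated availability: {estimate:.6f}")
```

With two appliances, the first failure already leaves no redundancy. With three, a single failure still leaves a redundant pair, which is what buys time for root cause analysis and a planned replacement.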
Benefits
The Power of Three provides a number of key benefits:
- High Value/Low Cost: Providing another layer of redundancy for the cost of a single appliance offers a very cost-effective, high-value investment compared with alternative HA-everywhere deployments. It also helps to safeguard against the astronomical cost of potential DNS/DHCP outages.
- Easy Deployment: Adding an appliance to make an HA pair is a relatively fast, easy, and uncomplicated process. While Professional Services can be involved, this process is one most IT teams can handle by themselves in a fairly short time. It also offers physical and virtual pairing capability and the flexibility to deploy on-prem, and in private, hybrid and public cloud environments.
- Best Practices: The Power of Three further supports HA best practices, while still being mindful of your budget.
- Peace of Mind: Ensuring service and infrastructure availability through low-cost redundancy solutions removes friction and improves user experience, peace of mind and work-life balance.
Downtime is not a question of if, but when. Is it worth the risk of being unprepared? Fortunately, you don’t have to be. You can improve business continuity with Infoblox through the Power of Three to provide affordable HA. To learn more, contact your account team, visit Infoblox.com, or request a follow-up at info@infoblox.com.