What is DNS failover and how long does it take?

DNS failover is the automatic change of your domain's DNS A record from a failed server's IP to a backup server's IP when a health check service detects that your primary server is not responding. The time from failure to traffic reaching the backup server depends on two factors: how quickly the health check detects the failure (typically 2 to 5 minutes with 60-second check intervals and 2 retries), and how quickly DNS propagates to your visitors (determined by your TTL setting, typically 300 seconds with Cloudflare). In practice, DNS failover completes in 5 to 15 minutes from the moment the primary server goes down. It is not suitable when you need recovery in under 5 minutes; use load balancer failover for a faster response.

Does Cloudflare Load Balancer work for single-server sites?

Yes. Cloudflare Load Balancer is often positioned as a multi-server tool, but it provides DNS failover functionality even if you have only one active server. You create two origin pools: one containing your primary server and one containing your backup server. Set the steering policy to Failover. Cloudflare health-checks the primary server every 60 seconds, and if it fails two consecutive checks, Cloudflare automatically routes traffic to the backup pool. This gives you automatic DNS failover for $5 per month without requiring any changes to your server configuration. The backup server just needs a current copy of your WordPress site.

What is the difference between passive and active health checks in Nginx?

Passive health checks monitor real production traffic. When Nginx forwards a request to an upstream server and the attempt fails (by default a connection error or timeout; the proxy_next_upstream directive controls which responses count), Nginx records a failure. Once the number of failed attempts within the fail_timeout window reaches max_fails, Nginx marks the server unavailable for the fail_timeout duration and stops sending traffic to it. This happens automatically with no additional configuration beyond the max_fails and fail_timeout parameters. Active health checks send synthetic probe requests to upstream servers independently of real traffic, so they detect failures even when no real traffic is flowing. Active health checks require Nginx Plus (the commercial version). For most WordPress deployments, passive health checks with appropriate max_fails and fail_timeout settings provide adequate failover capability without the Nginx Plus cost.

How do I promote a MySQL replica to primary when the primary fails?

Connect to the replica server and work through three steps in sequence. First, run STOP SLAVE, which stops the replication process. Second, run RESET SLAVE ALL, which removes the replication configuration so the server stops trying to replicate from a host that no longer exists. (On MySQL 8.0.22 and later the same statements are named STOP REPLICA and RESET REPLICA ALL; the SLAVE forms are deprecated.) Third, update your wp-config.php to change DB_HOST from the primary's IP to the replica's IP. The replica is now the primary and accepts both reads and writes. If the old primary comes back online, you configure it as a replica of the new primary to resync. For managed database services like DigitalOcean Managed MySQL, this entire process happens automatically in under 30 seconds with no manual commands required.

How often should I test my failover configuration?

DNS and load balancer failover should be tested quarterly. The test takes about 10 minutes: block traffic to your primary server, confirm Cloudflare routes to the backup, verify the site works correctly on the backup, then restore and verify traffic shifts back. Database failover for self-managed setups should be tested twice a year or after any change to your database infrastructure. For managed databases with a failover button in the control panel, triggering a manual failover test is low-risk and takes under 5 minutes. The cost of discovering a broken failover configuration during a test is a 10-minute planned disruption. The cost of discovering it during an actual outage is measured in hours.

What does my backup server need to have before failover works?

The backup server needs a current copy of your WordPress files and must be able to connect to a database that reflects your current data. There are two approaches. The sync approach: rsync your WordPress files to the backup server on a schedule (every 4 hours, matching your RPO), and point both servers at the same managed database. On failover, the backup server is already serving live data. The restore approach: the backup server has no site files until disaster strikes, at which point you restore from backup. The sync approach gives faster and cleaner failover. The restore approach costs less in ongoing compute. For DNS failover to work seamlessly, the sync approach is strongly preferred: restoring a backup during an active failover adds 15 to 30 minutes of additional downtime to an already stressful situation.

WordPress Failover Explained: DNS, Load Balancer, and Database Failover

By Mangesh Supe

Founder, ThatMy.com • Independent Hosting Benchmarks • ISP & Network Infrastructure Background

Updated May 10, 2026

Failover is the automatic switch from a failed system to a working backup, without manual intervention and without requiring you to be awake when it happens at 3 AM. Three types matter for WordPress site owners: DNS failover (traffic reroutes when your server is unreachable), load balancer failover (one server in a pool dies and others absorb its traffic), and database failover (primary database fails and a replica is promoted). This guide sets up all three using tools you either already have or can afford for under $20 per month.

Most failover documentation targets enterprise DevOps teams and assumes budget and infrastructure that most site owners do not have. Every setup in this guide is built around Cloudflare (which you may already use), Nginx (which most VPS deployments already run), and managed database services that cost $15 per month. The theory is brief. The configuration is exact.

3 failover types, each protecting a different failure scenario

$5/mo for Cloudflare Load Balancer: DNS and geographic failover

30s database failover time with DigitalOcean Managed MySQL

Never trust failover you have not tested

Which Failover Type Do You Need?

Failover is not one thing. The mechanism that protects you when your entire server goes offline is different from the mechanism that handles one server in a pool failing, which is different again from the mechanism that handles your database going down. Choose the wrong type for your situation and you either over-build or leave a gap.

WordPress failover architecture: DNS failover triggering A record change via health check, load balancer failover routing around failed upstream, and database failover promoting replica to primary

Scenario	Failover Type	Setup Complexity	Cost
Entire server goes down	DNS failover or LB failover	Low to Medium	$0 to $5/month
One of multiple servers fails	Load balancer failover	Medium	$5/month (Cloudflare LB)
Database server fails	Database failover	High	$15 to $50/month (managed DB)
Host's entire region goes down	Geographic failover	Medium	$5 to $10/month

Most WordPress sites start with DNS failover because it protects the most common failure scenario, your entire server going down, at the lowest cost. If you are already running multiple servers behind a load balancer, your load balancer's built-in health checking is the right failover mechanism and requires no additional tooling. Database failover is the most operationally complex and the most consequential: any data written after your last replication sync before the failure is at risk. The redundancy guide explains how these failover types map to the active-passive and active-active architecture patterns and which standby tier each type corresponds to.

Key insight: Failover protects against infrastructure failure. It does not protect against application failure. A bad plugin update that takes WordPress down is not a failover event: it will take your backup server down too, because both run the same broken WordPress install. Failover and application monitoring are different defenses against different threats.

DNS Failover: Automatic Traffic Rerouting When Your Server Goes Down

Your server is not responding. DNS failover means your visitors do not know. A health check service is pinging your server every 60 seconds. After two missed pings, it changes your DNS A record from your dead server's IP to your backup server's IP. With a 300-second TTL, visitors start reaching the backup within 5 to 10 minutes of your primary going down. No alert needed. No manual action at 3 AM.

How DNS Failover Works: Step by Step

1. A health check service pings your primary server at a configured path every 60 seconds.

2. Your primary server fails to respond to 2 consecutive checks, typically within 2 minutes of going offline.

3. The service automatically updates your DNS: your domain now points to the backup server's IP.

4. With a 300-second TTL, visitor DNS caches expire within 5 minutes and they start reaching the backup server.

5. When the primary recovers and passes health checks, traffic automatically routes back.

Watch out: DNS failover is not instant. The 5 to 15 minute recovery window is a function of TTL and health check frequency. If you need under-5-minute recovery, you need load balancer failover (handled at the network layer, with no DNS propagation delay). DNS failover is the right tool when your RTO allows a 10-minute window.
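To see where that window comes from, here is a rough model in Python. It is a sketch, not Cloudflare's algorithm: the function name and parameters are made up for illustration, and it ignores probe timeouts and resolvers that hold records past their TTL, which is why measured times tend to run longer than the figure it produces.

```python
from typing import Tuple


def dns_failover_window(check_interval_s: int,
                        failures_to_trip: int,
                        ttl_s: int) -> Tuple[int, int]:
    """Return (best_case, worst_case) seconds from outage to backup traffic."""
    # Detection: the check must fail `failures_to_trip` times in a row.
    # Best case the outage starts just before a probe fires,
    # worst case it starts just after one has passed.
    detect_best = (failures_to_trip - 1) * check_interval_s
    detect_worst = failures_to_trip * check_interval_s
    # Propagation: a resolver that cached the old record right before the
    # change keeps it for a full TTL; one with an expired cache waits 0 s.
    return detect_best, detect_worst + ttl_s


if __name__ == "__main__":
    best, worst = dns_failover_window(60, 2, 300)
    print(f"best case {best}s, worst case {worst}s")
```

With the settings used in this guide (60-second interval, 2 failures, 300-second TTL) the model returns 60 and 420 seconds. Halving the TTL shortens the worst case far more than tightening the check interval does.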

Setting Up DNS Failover with Cloudflare Load Balancer

Cloudflare's Load Balancer works as DNS failover even with a single active server. You pay $5 per month for the feature. Here is the complete configuration.

Cloudflare Load Balancer dashboard: two origin pools (primary and backup), health check on /wp-login.php, 60-second interval, 2 retries, and failover steering policy active

Cloudflare Dashboard:
Traffic → Load Balancing → Create Load Balancer

Hostname: yourdomain.com (or www.yourdomain.com)

Pool 1: Primary
  Name: primary-server
  Origin: your main server IP
  Weight: 1

Pool 2: Backup
  Name: backup-server
  Origin: your backup server IP
  Weight: 1

Health Check:
  Type: HTTP
  Path: /wp-login.php
  Interval: 60 seconds
  Retries: 2 (mark down after 2 consecutive failures)
  Expected response: 200

Steering Policy: Failover
  Pool order: primary-server first, backup-server second

Session Affinity: Cookie
  (Required for WooCommerce: keeps cart sessions
   on the same server during normal operation)

Why /wp-login.php as the health check path: This file always returns 200 on a functioning WordPress install. It is not cached by Cloudflare, WP Rocket, or any CDN, which means it reflects the true state of your server. Using your homepage risks a false positive from a cached edge response: the health check would pass even with your origin down.
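The Failover steering policy in the configuration above boils down to "use the first healthy pool in the configured order". A minimal Python sketch of that rule follows. The function is hypothetical and only illustrates the idea; what Cloudflare does when every pool is unhealthy is not modeled here.

```python
from typing import Dict, List, Optional


def pick_pool(pool_order: List[str], healthy: Dict[str, bool]) -> Optional[str]:
    """Return the first pool in priority order that is currently healthy."""
    for name in pool_order:
        # A pool with no recorded health state is treated as unhealthy.
        if healthy.get(name, False):
            return name
    return None  # no healthy pool; out of scope for this sketch


if __name__ == "__main__":
    order = ["primary-server", "backup-server"]
    print(pick_pool(order, {"primary-server": False, "backup-server": True}))
```

Because the order is fixed, traffic returns to primary-server as soon as it is healthy again, which matches the automatic route-back described earlier.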

Preparing the Backup Server

A health check that routes traffic to an empty server is not failover; it is redirecting your visitors to a broken page on a different IP. Your backup server needs a current copy of your WordPress site before you need it.

Option A: Rsync-based file sync (automated, every 4 hours):

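# Add to crontab on the primary server: crontab -e
# Syncs WordPress files to the backup server every 4 hours.
# A crontab entry must stay on one line; cron does not support
# backslash line continuation. Assumes key-based SSH login for
# the deploy user, since cron cannot answer a password prompt.
0 */4 * * * rsync -avz --delete /var/www/html/wordpress/ deploy@YOUR_BACKUP_IP:/var/www/html/wordpress/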

With this approach, both servers connect to the same managed database. Media files are served from a shared object storage bucket (S3 or Cloudflare R2). On failover, the backup server is already serving live data because it shares the same database and the same media files. File sync keeps the WordPress installation and plugins current.
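A scheduled sync is only useful if it is actually running. The Python sketch below checks whether the last sync falls inside the 4-hour window. It assumes the sync job records when it last finished, for example by touching a marker file, which the cron entry above does not do on its own; the function names and the marker path are hypothetical.

```python
import os
import time
from typing import Optional

MAX_AGE_HOURS = 4.0  # matches the 4-hour sync schedule


def sync_is_fresh(last_sync_epoch: float,
                  now: Optional[float] = None,
                  max_age_hours: float = MAX_AGE_HOURS) -> bool:
    """True if the last completed sync is no older than max_age_hours."""
    current = time.time() if now is None else now
    return current - last_sync_epoch <= max_age_hours * 3600


def marker_is_fresh(marker_path: str) -> bool:
    """Use a marker file's modification time as the last-sync timestamp."""
    return sync_is_fresh(os.path.getmtime(marker_path))
```

Running a check like this from your monitoring gives you an alert when the backup has gone stale, rather than a surprise during failover.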

Option B: Shared database and object storage only (cleanest architecture):

# Both servers use identical wp-config.php
# DB_HOST points to your managed database hostname
# (DigitalOcean Managed MySQL or similar)
# Media URLs point to R2 or S3, not local /uploads

define('DB_HOST', 'private-managed-db-endpoint.example.com');

// Offload media to S3 or R2 using WP Offload Media plugin
// Both servers read media from the same bucket
// No local file sync needed for media

Option B is the cleaner architecture. The backup server is identical to the primary from a database and media perspective. File sync via rsync handles the WordPress PHP files. This is the setup I recommend for any site that expects to use DNS failover in a real emergency rather than just a test.

Testing DNS Failover: Never Skip This

You cannot trust failover you have not tested. A failover that works in theory but has never been exercised is not a failover; it is a plan with unknown failure modes. Here is the exact test procedure:

1. Note your primary server's current health status in the Cloudflare Load Balancer dashboard. It should show green.

2. Add a temporary firewall rule on the primary server blocking ports 80 and 443 inbound. This simulates a server failure from Cloudflare's perspective without taking the server offline.

3. Watch the Cloudflare Load Balancer dashboard. The primary pool should turn red within 2 minutes (2 failed checks at 60-second intervals).

4. From a different device or network (not cached), visit your domain. Verify the site loads. Check the response headers for which server served the request.

5. Remove the firewall block. The primary pool should turn green within 2 minutes and traffic should shift back automatically.

6. Document the elapsed time from blocking traffic to the backup server serving requests. That is your real-world DNS failover time.

Run this test quarterly. If anything fails (the health check does not trigger, the backup server returns errors, traffic does not shift back), you now know before an actual outage. The DNS system itself is covered in the DNS records guide, including how TTL values affect propagation speed and why lowering TTL before planned maintenance is standard practice.

Load Balancer Failover: Nginx Upstream Configuration

If you are already running multiple application servers, your load balancer is doing health checks right now. The question is whether those health checks are configured to trigger actual failover, or whether a failed upstream just generates errors until you notice. The difference between the two is three Nginx directives.

Nginx upstream configuration: three backend servers with max_fails and fail_timeout directives, one server marked backup receiving traffic only when both primaries are unavailable

Nginx Passive Health Check Configuration

# /etc/nginx/nginx.conf or your server block
upstream wordpress_backend {
    least_conn;

    # Primary servers: receive traffic normally
    server 192.168.1.10:80 max_fails=3 fail_timeout=30s;
    server 192.168.1.11:80 max_fails=3 fail_timeout=30s;

    # Backup: only receives traffic when both above are down
    server 192.168.1.12:80 backup;
}

server {
    listen 80;
    server_name yourdomain.com;

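    location / {
        proxy_pass http://wordpress_backend;
        # Forward the original Host header and client IP to WordPress
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_connect_timeout 5s;
        proxy_read_timeout 60s;
    }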
}

What each directive does:

max_fails=3: Mark the server unavailable after 3 failed attempts within the fail_timeout window. Each failed real request counts as one failure, so no synthetic probes are needed.

fail_timeout=30s: The window in which failures are counted, and also how long a server marked down stays out of rotation. After 30 seconds, Nginx tries the server again with live traffic; if the request succeeds, the server stays in rotation.

backup: This server receives zero production traffic until all non-backup servers are simultaneously unavailable. Then it handles all traffic. It steps back automatically when any primary recovers.

least_conn: Load balancing algorithm that sends each new request to the server with the fewest active connections. Better than round-robin for WordPress because page generation times vary.

Passive vs Active Health Checks

The configuration above uses passive health checks: Nginx monitors real traffic failures. When a real request to a server fails, Nginx counts it. After max_fails failures within the fail_timeout window, that server is marked down. This is the free approach and handles the common case well.
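If the interaction between max_fails and fail_timeout is hard to picture, the Python sketch below models it. This is a simplified illustration, not Nginx's actual implementation: the class and method names are invented, and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from typing import List


class PassiveUpstream:
    """Toy model of one upstream server under passive health checking."""

    def __init__(self, max_fails: int = 3, fail_timeout: float = 30.0) -> None:
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self._failures: List[float] = []  # times of recent failed attempts
        self._down_until = 0.0

    def record_failure(self, now: float) -> None:
        # Only failures inside the fail_timeout window count.
        self._failures = [t for t in self._failures
                          if now - t < self.fail_timeout]
        self._failures.append(now)
        if len(self._failures) >= self.max_fails:
            # Threshold reached: take the server out for fail_timeout seconds.
            self._down_until = now + self.fail_timeout
            self._failures.clear()

    def is_available(self, now: float) -> bool:
        return now >= self._down_until


if __name__ == "__main__":
    server = PassiveUpstream(max_fails=3, fail_timeout=30.0)
    for t in (0.0, 1.0, 2.0):
        server.record_failure(t)
    print(server.is_available(10.0), server.is_available(32.0))
```

Three failures in quick succession take the server out for 30 seconds, while the same three failures spread over a minute never trip the threshold.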

Active health checks (Nginx Plus) send synthetic probes to upstream servers independently of real traffic. They detect server failures even during low-traffic periods when real requests are sparse. For most WordPress deployments, passive health checks are sufficient. The gap between passive and active detection is typically under 2 minutes during active traffic, which is acceptable for all but the highest-availability use cases.

Key insight: The proxy_connect_timeout 5s setting matters for failover speed. When an upstream host is unreachable, Nginx waits up to 5 seconds per connection attempt before counting it as failed. Three failures at 5 seconds each means roughly 15 seconds of slow or failed requests before a server is marked down (less when requests arrive concurrently or the connection is refused outright). Reduce this to 2 or 3 seconds if you need faster failover detection, at the cost of more aggressive timeouts for legitimately slow connections.

Cloudflare Load Balancer Health Check Behavior

If you are using Cloudflare Load Balancer in front of multiple origins, Cloudflare uses active health checks regardless of your Nginx configuration. The behavior:

Health check interval: Every 60 seconds, configurable down to 10 seconds on higher plans

Failure threshold: 2 consecutive failures to mark origin as unhealthy (120 seconds worst-case detection)

Recovery threshold: 2 consecutive successes to restore origin to the pool (120 seconds to recover)

Notification: Load Balancer dashboard sends email when origin health changes; configure in Notifications settings
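The failure and recovery thresholds above describe a small state machine: an origin changes state only after enough consecutive probes disagree with its current state. Here is a Python sketch of that logic under the thresholds listed. The class is hypothetical and is not Cloudflare's code.

```python
class OriginHealth:
    """Track one origin's state from a stream of probe results."""

    def __init__(self, fail_threshold: int = 2,
                 recover_threshold: int = 2) -> None:
        self.fail_threshold = fail_threshold
        self.recover_threshold = recover_threshold
        self.healthy = True
        self._streak = 0  # consecutive probes that contradict current state

    def observe(self, probe_ok: bool) -> bool:
        """Feed one probe result; return the resulting health state."""
        if probe_ok == self.healthy:
            self._streak = 0  # probe agrees with current state
            return self.healthy
        self._streak += 1
        needed = self.recover_threshold if probe_ok else self.fail_threshold
        if self._streak >= needed:
            self.healthy = probe_ok
            self._streak = 0
        return self.healthy


if __name__ == "__main__":
    origin = OriginHealth()
    for ok in (True, False, False, True, True):
        print(ok, "->", origin.observe(ok))
```

A single failed probe between two good ones changes nothing, which is what keeps one dropped packet from triggering a failover.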

Database Failover: The Most Consequential Failure You Can Prepare For

A server fails and your backup serves from a recent file sync. Inconvenient, but recoverable. A database fails without replication and the last backup was 6 hours ago. That means 6 hours of orders, registrations, and content changes gone permanently. Database failover is not optional for any site that processes real-time transactions.

MySQL primary-replica failover sequence: write traffic to primary, replication stream to replica, manual promotion via STOP SLAVE and RESET SLAVE ALL, new primary accepting reads and writes

How MySQL Replication Works

The primary database records every write operation in a binary log. The replica server reads that binary log continuously and applies the same operations to its own copy of the database. The replica typically runs a few seconds to a few minutes behind the primary, depending on write volume. That gap, called replication lag, represents the maximum data loss if the primary fails unexpectedly.

Primary DB                    Replica DB
-----------                   -----------
Write: INSERT order #5001 --> Binary log --> Apply: INSERT order #5001
Write: UPDATE stock qty    --> Binary log --> Apply: UPDATE stock qty
Write: INSERT user session --> Binary log --> Apply: INSERT user session

Monitoring lag:
SHOW SLAVE STATUS\G
-- Look for: Seconds_Behind_Master: 0
-- 0 = fully synced, real-time
-- Alert threshold: > 60 seconds = investigate

Option 1: Managed Database with Automatic Failover (Recommended)

Self-managed MySQL replication requires monitoring replication lag, executing a multi-step promotion process under pressure during an outage, and coordinating application reconnections. That is not a task most site owners should take on during an incident at 2 AM.

DigitalOcean Managed MySQL ($15 per month for 1 GB plan): Maintains a hot standby replica automatically. Failover is automatic in under 30 seconds. Your application connects to a stable hostname (private-managed-db.ondigitalocean.com:25060) that always points to the current primary. After failover, the hostname still resolves to the new primary. No application changes, no manual promotion, no wp-config.php edits.

Amazon RDS Multi-AZ ($30 to $80 per month): Standby replica in a different Availability Zone. Failover in 30 to 60 seconds, DNS-based: the endpoint hostname does not change and your application reconnects automatically. The right choice for applications already running in AWS where you want infrastructure in a single account.

I have deployed DigitalOcean Managed MySQL for six Cloudways-hosted WooCommerce stores in the past 18 months. In every case, the database failover tested cleanly under 30 seconds. The operational overhead is minimal compared to self-managed replication. For most site owners this is the correct answer: the difference in cost between self-managed and managed is less than one hour of engineer time per month.
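Managed failover only stays hands-off if wp-config.php points at the stable hostname rather than a hard-coded IP. The Python sketch below reads a wp-config.php and flags an IP literal in DB_HOST. The helper names are made up for illustration, and the regular expression only handles the simple define('DB_HOST', '...') form; it does not skip commented-out lines.

```python
import ipaddress
import re
from typing import Optional

_DB_HOST_RE = re.compile(
    r"""define\(\s*['"]DB_HOST['"]\s*,\s*['"]([^'"]+)['"]\s*\)"""
)


def db_host_from_wp_config(text: str) -> Optional[str]:
    """Extract the DB_HOST value from wp-config.php contents, if present."""
    match = _DB_HOST_RE.search(text)
    return match.group(1) if match else None


def is_ip_literal(host: str) -> bool:
    """True if host is an IP address, with or without a :port suffix."""
    # Exactly one colon means host:port; more than one suggests bare IPv6.
    bare = host.rsplit(":", 1)[0] if host.count(":") == 1 else host
    try:
        ipaddress.ip_address(bare)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    config = "define('DB_HOST', 'private-managed-db-endpoint.example.com');"
    host = db_host_from_wp_config(config)
    print(host, "uses an IP" if host and is_ip_literal(host) else "uses a hostname")
```

Run against both the primary and the backup server's config, this catches the case where one of them still carries an old IP from before the migration to a managed database.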

Option 2: Self-Managed MySQL Replication (VPS Setups)

For self-managed VPS environments where you need full control or cannot justify the managed database cost, here is the complete replication setup.

# Step 1: On the primary server, edit /etc/mysql/mysql.conf.d/mysqld.cnf
[mysqld]
server-id          = 1
log_bin            = /var/log/mysql/mysql-bin.log
binlog_do_db       = your_wordpress_db

# Restart MySQL after config change:
sudo systemctl restart mysql

# Step 2: On the primary, create the replication user
mysql -u root -p
CREATE USER 'repl_user'@'REPLICA_SERVER_IP'
  IDENTIFIED BY 'strong_password_here';
GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'REPLICA_SERVER_IP';
FLUSH PRIVILEGES;

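# Note the binary log position before setting up replica:
SHOW MASTER STATUS;
# Note the File and Position values; you need them next.
# The replica also needs a copy of the existing data (for example
# a mysqldump import taken at this position) before replication starts.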

# Step 3: On the replica server, edit /etc/mysql/mysql.conf.d/mysqld.cnf
[mysqld]
server-id   = 2
relay-log   = /var/log/mysql/mysql-relay-bin.log

# Restart MySQL, then connect and start replication:
CHANGE MASTER TO
  MASTER_HOST     = 'PRIMARY_SERVER_IP',
  MASTER_USER     = 'repl_user',
  MASTER_PASSWORD = 'strong_password_here',
  MASTER_LOG_FILE = 'mysql-bin.000001',  -- from SHOW MASTER STATUS
  MASTER_LOG_POS  =  154;                -- from SHOW MASTER STATUS
START SLAVE;

# Verify replication is running:
SHOW SLAVE STATUS\G
-- Slave_IO_Running: Yes
-- Slave_SQL_Running: Yes
-- Seconds_Behind_Master: 0

Manual Failover Procedure When Primary Fails

# Connect to the replica server
mysql -u root -p

# Step 1: Stop replication
# (on MySQL 8.0.22 and later: STOP REPLICA;)
STOP SLAVE;

# Step 2: Remove replication configuration
# (on MySQL 8.0.22 and later: RESET REPLICA ALL;)
RESET SLAVE ALL;

# The replica is now a standalone primary.
# Update wp-config.php on your application server:
# Change DB_HOST from primary IP to replica IP

Watch out: If your primary fails abruptly (hardware crash, kernel panic), any writes that were in the primary's binary log but had not yet been transmitted to the replica are lost. This is the replication lag problem. With a well-configured setup and low write volume, lag is typically under 5 seconds. Under high write load, lag can grow. Monitor Seconds_Behind_Master in your alerting stack and alert if it exceeds 60 seconds.
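One way to wire that alert up is to parse the output of SHOW SLAVE STATUS\G and flag anything unhealthy. The Python sketch below works on the text output, so it needs no database driver. The function name is hypothetical, and the 60-second default mirrors the threshold used in this guide.

```python
from typing import Dict, List


def replication_problems(status_text: str, max_lag_s: int = 60) -> List[str]:
    """Return a list of problems found in SHOW SLAVE STATUS\\G output."""
    fields: Dict[str, str] = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip separator rows that have no "key: value" shape
            fields[key.strip()] = value.strip()

    problems: List[str] = []
    for thread in ("Slave_IO_Running", "Slave_SQL_Running"):
        if fields.get(thread) != "Yes":
            problems.append(f"{thread} is {fields.get(thread, 'missing')}")

    lag = fields.get("Seconds_Behind_Master")
    if lag is None or lag == "NULL":
        # MySQL reports NULL here when replication is not running.
        problems.append("Seconds_Behind_Master is missing or NULL")
    elif int(lag) > max_lag_s:
        problems.append(f"lag {lag}s exceeds {max_lag_s}s")
    return problems
```

An empty list means both replication threads are running and lag is within the threshold. Note that a NULL lag is treated as a problem rather than as zero, because it usually means replication has stopped.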

Connecting WordPress to Your Replica for Read Offloading

While your replica is running, you can use it to handle WordPress read queries, reducing load on the primary. HyperDB is a common tool for this in WordPress: a drop-in replacement for the standard database class, developed by Automattic, that supports replication and read/write splitting. The redundancy guide includes the full HyperDB configuration for primary-replica read splitting.

Application-Level Failover: WordPress-Specific Strategies

Server failover protects against infrastructure failure. But WordPress itself can fail in ways that do not take the server down: a payment gateway goes offline, your transactional email provider stops delivering, or a WooCommerce plugin generates a fatal error on the checkout page while everything else loads fine. These are application failures, and they have their own failover strategies.

Payment Gateway Failover

WooCommerce does not automatically fall back to a secondary payment gateway when the primary is unavailable. But you can make it trivially easy for customers to use an alternative. Enable both Stripe and PayPal in WooCommerce → Settings → Payments. When Stripe has an outage (which happens a few times per year, typically for under 30 minutes), customers who reach checkout still have PayPal available. This is not automatic failover, but it reduces the revenue impact of a payment processor outage from "zero revenue during the outage" to "some customers choose the alternative."

Transactional Email Failover

Order confirmation emails, password resets, and account notifications go silent if your SMTP provider fails. WP Mail SMTP Pro supports multiple configured SMTP connections with automatic failover between them. If SendGrid is down, it falls back to Mailgun. If you use the free WP Mail SMTP, configure your primary SMTP and have a second provider's credentials ready to paste in; the switch takes under 2 minutes.

// wp-config.php: using wp_mail action to add a fallback
// (requires custom code or WP Mail SMTP Pro)
// Primary: SendGrid
// Fallback: Mailgun
// WP Mail SMTP Pro handles this automatically with
// its "backup connection" feature
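The idea behind a backup connection is simple: try providers in order and stop at the first one that accepts the message. The Python sketch below shows that pattern in a language-neutral way. It is not how WP Mail SMTP Pro is implemented, and the function and the sender callables are hypothetical stand-ins for real provider clients.

```python
from typing import Callable, List, Sequence, Tuple

Sender = Callable[[str], None]  # raises an exception when delivery fails


def send_with_fallback(message: str,
                       senders: Sequence[Tuple[str, Sender]]) -> str:
    """Try each provider in order; return the name of the one that worked."""
    errors: List[str] = []
    for name, send in senders:
        try:
            send(message)
            return name
        except Exception as exc:  # any provider error triggers the fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all mail providers failed: " + "; ".join(errors))
```

Returning the provider name makes it easy to log when the fallback was used, which is often the first sign that the primary provider has a problem.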

Cache as Passive Failover for Read Traffic

This one is underrated. With full-page caching configured in Cloudflare and a caching plugin like WP Rocket, a database failure does not immediately take your entire site down. Cached pages keep serving from Cloudflare's edge network. Visitors browsing product listings, blog posts, or static pages see nothing wrong. Only uncached requests fail: checkout, account login, search, and AJAX endpoints that bypass the cache.

This means a 15-minute database failover window (the time to promote a replica or provision a managed failover) translates to 15 minutes of checkout unavailability, not 15 minutes of complete site blackout. For a content site, cached content may serve for hours from the Cloudflare edge even if your origin is completely offline. That is passive redundancy that costs nothing beyond the Cloudflare free tier and your existing caching plugin.

Key insight: Cache TTL determines how long passive cache failover lasts. Cloudflare does not cache HTML pages by default, so this only applies once a cache rule makes your pages eligible, and the edge lifetime then comes from your origin's Cache-Control headers or the Edge Cache TTL you set. If your origin goes down at 2 PM and you have a 4-hour edge cache TTL, a page cached just before the outage keeps serving until about 6 PM without any action on your part; pages cached earlier expire sooner. Set the Edge Cache TTL in a Cloudflare cache rule, and use at least 4 hours for content pages.
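How long a given page survives depends on when it was cached, not when the outage began. The short Python sketch below works through that arithmetic. The function names are made up, and the model ignores cache eviction and any serve-stale features, so treat it as an upper bound.

```python
from datetime import datetime, timedelta


def served_until(cached_at: datetime, edge_ttl: timedelta) -> datetime:
    """A cached copy stays valid for edge_ttl after it was stored."""
    return cached_at + edge_ttl


def survives_outage(cached_at: datetime, edge_ttl: timedelta,
                    outage_start: datetime, outage_end: datetime) -> bool:
    """True if the copy was cached before the outage and outlasts it."""
    return (cached_at <= outage_start
            and served_until(cached_at, edge_ttl) >= outage_end)


if __name__ == "__main__":
    ttl = timedelta(hours=4)
    start = datetime(2026, 5, 10, 14, 0)  # origin goes down at 2 PM
    end = datetime(2026, 5, 10, 15, 0)    # origin back at 3 PM
    print(survives_outage(datetime(2026, 5, 10, 13, 30), ttl, start, end))
    print(survives_outage(datetime(2026, 5, 10, 10, 30), ttl, start, end))
```

A page cached at 1:30 PM rides out the one-hour outage, while one cached at 10:30 AM expires at 2:30 PM and starts failing halfway through.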

Failover Testing Runbook: The Tests That Reveal What Is Actually Broken

Every failover configuration has silent failure modes that only appear during testing. The Cloudflare health check might not trigger because /wp-login.php requires authentication on your server. The backup server might serve your site but break WooCommerce cart sessions. The database replica might be 4 minutes behind and lose your most recent orders. None of these are visible until you test. All of them are catastrophic during a real outage.

Test 1: DNS and Load Balancer Failover

Frequency: Quarterly
Estimated duration: 10 to 15 minutes
Impact: 5 to 15 minutes of traffic on backup server
       (transparent to users if backup is current)

Pre-test checks:
[ ] Verify backup server has files synced within last 4 hours
[ ] Confirm both servers connect to same managed database
[ ] Note primary server's current Cloudflare health status (green)

Test procedure:
1. Add firewall rule on primary: block inbound port 80, 443
   (-I inserts at the top so earlier ACCEPT rules cannot override it)
   (iptables -I INPUT -p tcp --dport 80 -j DROP)
   (iptables -I INPUT -p tcp --dport 443 -j DROP)

2. Watch Cloudflare LB dashboard for 2 minutes
   Expected: primary pool turns red

3. From mobile data (not cached on your home network):
   curl -I https://yourdomain.com
   Expected: 200 response from backup server

4. Load the site and test:
   [ ] Homepage loads
   [ ] A product or post loads
   [ ] WooCommerce cart adds item correctly
   [ ] Checkout page loads (does not need to complete)

5. Remove firewall rules:
   (iptables -D INPUT -p tcp --dport 80 -j DROP)
   (iptables -D INPUT -p tcp --dport 443 -j DROP)

6. Watch Cloudflare LB dashboard: primary should turn green
   within 2 minutes. Traffic should automatically route back.

Document:
- Time from firewall block to backup serving: _____ minutes
- Time from firewall removal to primary restored: _____ minutes
- Any errors encountered on backup: _____________________

Test 2: Database Failover

Frequency: Twice a year, or after any database infrastructure change
Estimated duration: 15 to 30 minutes
Impact: 30 to 120 seconds of database unavailability
       (site returns errors during promotion window)

For managed databases (DigitalOcean, RDS):
1. Log into your managed database control panel
2. Find the "Failover" or "Migrate primary" option
   (DigitalOcean: database cluster → Actions → Migrate primary)
3. Trigger the manual failover
4. Start timing from trigger click

Expected timeline (DigitalOcean Managed MySQL):
- 0 to 5 seconds: failover initiated
- 5 to 30 seconds: replica promoted to primary
- 30 seconds: your connection endpoint resolves to new primary
- 30 to 45 seconds: WordPress reconnects and serves normally

Verify after failover:
[ ] Site loads and serves pages
[ ] WooCommerce checkout processes (test order)
[ ] wp-config.php DB_HOST still correct (hostname, not IP)

For self-managed replication (manual procedure):
1. On replica: note current Seconds_Behind_Master value
2. Stop MySQL on primary: sudo systemctl stop mysql
3. Start timing
4. Execute failover procedure:
   STOP SLAVE;
   RESET SLAVE ALL;
5. Update wp-config.php DB_HOST to replica IP
6. Time from primary stop to site serving from replica: _____

Data integrity check after either test:
- Create a test WordPress post before failover
- Verify the post exists after failover completes
- This confirms replication was current at time of failover

Watch out: Never test database failover on a live production database without a current backup confirmed. Managed database failover tests are low-risk because the service maintains the standby for you. Self-managed failover tests carry more risk: if your replication has silently broken, the replica may be missing recent data and you will not know until you look at the post-failover data integrity check.

What to Do When a Test Fails

A failed test during a scheduled quarterly check is the best possible outcome. It means you found the gap before an actual outage did. Common test failures and their fixes:

Health check never triggers despite server being blocked: Check that /wp-login.php is not returning 403 to Cloudflare's health check IP range. Add Cloudflare's IP ranges to your server's allowlist, or change the health check path to one that allows Cloudflare through.

Backup server returns 500 or blank page: WordPress config on backup server has wrong database host, or file sync has not run recently. Verify wp-config.php on backup points to the managed database, and check rsync cron job ran recently.

WooCommerce cart empties on failover: Session affinity is not configured. Enable Cookie session affinity in Cloudflare Load Balancer settings. During normal operation, customers stay on the same server. On failover, cart loss for in-flight sessions is unavoidable but rare.

Database replication lag was high at time of failover test: Investigate what is causing the lag: high write volume, slow replica hardware, or network latency between primary and replica. Monitor Seconds_Behind_Master continuously and alert if it exceeds 60 seconds.

A complete failover strategy connects back to the RTO and RPO targets you defined in your disaster recovery plan. The test results tell you whether your current infrastructure meets those targets. If the DNS failover test shows a 12-minute recovery time and your RTO is 5 minutes, you either need load balancer failover (faster) or you adjust your RTO to reflect the actual capability of your infrastructure. Knowing the real number is the point of testing.
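If you record the measured times from each test, comparing them against your targets takes a few lines. The Python sketch below does that. The function and the dictionary keys are hypothetical, and the numbers in the example are the ones from the paragraph above.

```python
from typing import Dict


def failing_targets(measured_min: Dict[str, float],
                    rto_min: Dict[str, float]) -> Dict[str, float]:
    """Return minutes over target for each test that missed its RTO."""
    return {
        name: measured_min[name] - rto_min[name]
        for name in measured_min
        if name in rto_min and measured_min[name] > rto_min[name]
    }


if __name__ == "__main__":
    measured = {"dns_failover": 12, "database_failover": 0.5}
    targets = {"dns_failover": 5, "database_failover": 1}
    print(failing_targets(measured, targets))
```

A 12-minute DNS failover against a 5-minute RTO shows up as 7 minutes over target, which is the cue to either move to load balancer failover or revise the RTO.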

Frequently Asked Questions

What is DNS failover and how long does it take?

DNS failover is the automatic change of your domain's DNS A record from a failed server's IP to a backup server's IP when a health check service detects that your primary server is not responding. The time from failure to traffic reaching the backup server depends on two factors: how quickly the health check detects the failure (typically 2 to 5 minutes with 60-second check intervals and 2 retries), and how quickly DNS propagates to your visitors (determined by your TTL setting, typically 300 seconds with Cloudflare). In practice, DNS failover completes in 5 to 15 minutes from the moment the primary server goes down. It is not suitable when you need recovery in under 5 minutes; use load balancer failover for a faster response.

Does Cloudflare Load Balancer work for single-server sites?

Yes. Cloudflare Load Balancer is often positioned as a multi-server tool, but it provides DNS failover functionality even if you have only one active server. You create two origin pools: one containing your primary server and one containing your backup server. Set the steering policy to Failover. Cloudflare health-checks the primary server every 60 seconds, and if it fails two consecutive checks, Cloudflare automatically routes traffic to the backup pool. This gives you automatic DNS failover for $5 per month without requiring any changes to your server configuration. The backup server just needs a current copy of your WordPress site.

What is the difference between passive and active health checks in Nginx?

Passive health checks monitor real production traffic. When Nginx forwards a request to an upstream server and that server fails to respond or returns a response that Nginx is configured to treat as a failure, Nginx counts it as a failed attempt. After the configured number of failed attempts (max_fails) within the fail_timeout window, Nginx marks the server unavailable for the fail_timeout duration and stops sending traffic to it. This happens automatically with no additional configuration beyond the max_fails and fail_timeout directives. Active health checks send synthetic probe requests to upstream servers independently of real traffic, so they detect failures even when no real traffic is flowing. Active health checks require Nginx Plus (the commercial version). For most WordPress deployments, passive health checks with appropriate max_fails and fail_timeout settings provide adequate failover capability without the Nginx Plus cost.

How do I promote a MySQL replica to primary when the primary fails?

Connect to the replica server and work through three steps in sequence. First, run STOP SLAVE (STOP REPLICA on MySQL 8.0.22 and later), which stops the replication process. Second, run RESET SLAVE ALL (RESET REPLICA ALL on MySQL 8.0.22 and later), which removes the replication configuration so the server stops trying to replicate from a host that is no longer available. Third, update your wp-config.php to change DB_HOST from the primary's IP to the replica's IP. If the replica was running with read_only enabled, turn that off as well so it accepts writes. The replica is now the primary and accepts both reads and writes. If the old primary comes back online, configure it as a replica of the new primary to resync. For managed database services like DigitalOcean Managed MySQL, this entire process happens automatically in under 30 seconds with no manual commands required.

How often should I test my failover configuration?

DNS and load balancer failover should be tested quarterly. The test takes about 10 minutes: block traffic to your primary server, confirm Cloudflare routes to the backup, verify the site works correctly on the backup, then restore and verify traffic shifts back. Database failover for self-managed setups should be tested twice a year or after any change to your database infrastructure. For managed databases with a failover button in the control panel, triggering a manual failover test is low-risk and takes under 5 minutes. The cost of discovering a broken failover configuration during a test is a 10-minute planned disruption. The cost of discovering it during an actual outage is measured in hours.

What does my backup server need to have before failover works?

The backup server needs a current copy of your WordPress files and must be able to connect to a database that reflects your current data. There are two approaches. The sync approach: rsync your WordPress files to the backup server on a schedule (every 4 hours, matching your RPO), and point both servers at the same managed database. On failover, the backup server is already serving live data. The restore approach: the backup server has no site files until disaster strikes, at which point you restore from backup. The sync approach gives faster and cleaner failover. The restore approach costs less in ongoing compute. For DNS failover to work seamlessly, the sync approach is strongly preferred, because restoring a backup during an active failover adds 15 to 30 minutes of additional downtime to an already stressful situation.
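The wp-config.php step of the replica promotion described in the FAQ above is easy to get wrong by hand during an outage, so it can help to script it ahead of time. The following is a minimal sketch that rewrites the DB_HOST define in the file's text. The function name `set_db_host` and the IP addresses in the usage example are hypothetical, and the pattern assumes DB_HOST is set with a plain quoted string, as in a stock wp-config.php.

```python
import re

# Matches define( 'DB_HOST', 'value' ) with either quote style and flexible spacing.
DB_HOST_PATTERN = re.compile(
    r"""(define\(\s*['"]DB_HOST['"]\s*,\s*['"])([^'"]*)(['"]\s*\))"""
)


def set_db_host(wp_config_text: str, new_host: str) -> str:
    """Return wp-config.php contents with DB_HOST pointing at new_host."""
    updated, count = DB_HOST_PATTERN.subn(
        # Keep the text around the old value and swap in the new host.
        lambda match: match.group(1) + new_host + match.group(3),
        wp_config_text,
    )
    # Refuse to guess if the define is missing or appears more than once.
    if count != 1:
        raise ValueError(f"expected exactly one DB_HOST define, found {count}")
    return updated


# Usage with made-up addresses: 10.0.0.5 is the failed primary, 10.0.0.6 the replica.
sample = "<?php\ndefine( 'DB_NAME', 'wordpress' );\ndefine( 'DB_HOST', '10.0.0.5' );\n"
print(set_db_host(sample, "10.0.0.6"))
```

The function works on text rather than on the file itself so that you can review the result before writing it back, and it raises an error instead of silently leaving the old host in place.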

Where to Go Next

If you have DNS failover configured but no tested backup plan behind it, the disaster recovery guide covers what happens after failover completes: how to recover data, restore a corrupted database, and document the incident. For the database configuration side of failover, the redundancy guide covers the active-passive architecture patterns that DNS and database failover implement, including how to use HyperDB for read-replica splitting while your primary is healthy. And if you are deciding between Cloudflare Load Balancer and a self-managed Nginx upstream for your load balancing layer, the load balancing guide compares both approaches across traffic volume, cost, and operational complexity.
