It is amazing how many ways there are to configure a simple WAN Failover on a Mikrotik. This post, contains three simple lines of code, that will perform a hard failover: Use a primary connection 100% of the time, if the primary line becomes unable to ping an external IP, switch entirely over to the failover line. When the primary comes back up, return to primary.
A common weakness of the simple “check-gateway” ping on a route, is if you are not in bridged mode — because your ISP does not offer true PPPoE bridged mode, the failover will not occur in any event except physical-disconnection — your Mikrotik will always be able to ping the gateway is it directly plugged into. What we really care about, is loss of internet access — not just electrical connectivity. When testing failover, disconnect the input, not the cable to your router or the modem’s power.
A common internet setup:
ISP <> Coax Cable / Fiber <> Modem/ISP-Gateway <> Ethernet Cable <> Ether1 Mikrotik
If you unplug the power of the modem, or disconnect the ethernet cable going to your router, your failover usually will test just fine, because the route to your gateway has been lost — but it won’t work if the ISP has issues.
For a proper test – to simulate the ISP having internal routing problems, or your area’s network node goes down, disconnect at the Coax/Fiber side — one step behind the Modem.
One of the more common methods of failover with Mikrotik is using netwatch to monitor interfaces ability to ping external IPs, if a down state is detected, running a script that changes priority of the primary route, or disables an interface or route entirely. These scripts are normally run every 5-10 seconds. I feel uncomfortable with this method, because complexity increases substantially for failing back to the primary interface. Some code can get pretty intricate.
The below code is simple and reliable, but not immediate. If you want zero-loss in connectivity, you’ll probably need to use scripts for a quicker response.
With these route-based rules, failover times are about 15 seconds. From the time internet connectivity stops, to failing over, to workstations regaining internet access, is about 5-15 seconds. From testing, failing back to primary is a little quicker, maybe 5 seconds.
The way the code operates, is pinging a public DNS server, set to Scope 10. The primary route you use for internet access, is also in Scope 10. Route distance is 1.
Distance is a routing metric, that determines the priority that a route will be taken. The lower the distance — the higher priority it takes for flowing traffic. For example, a route at distance 5, will be taken over a route at distance 10.
When the Check-Gateway ping which is forced to flow through our primary ISP gateway of 220.127.116.11 fails, it’s distance is increased to 200 and it is marked as unreachable/inactive. Now the next lowest route is “2”, the failover route.
The ping will continue to run on the primary route, once the DNS server responds, the route will be re-enabled automatically, bringing it’s distance back to 1 — a priority higher than distance 2 — the failover route — traffic resumes flowing through the primary pipe.
#This hard-codes the Verizon Public DNS Server (18.104.22.168) to always go out the primary gateway's IP (usually the gateway IP provided by your ISP), *recursively*, you want to confirm connectivity. Only choose a reliable server (Google is 22.214.171.124, Comcast is 126.96.36.199, Verizon is 188.8.131.52-4, etc). #Note that this will make 184.108.40.206 ONLY go through your primary gateway, if your primary gateway stops working, and you're on the failover line, you won't be able to ping this server. #The scope of 10 is the default scope, which includes any routes you add that don't have a different scope. /ip route add dst-address=220.127.116.11/32 gateway=18.104.22.168 scope=10 comment="Validate Primary Gateway" # #This runs the ping that checks for connectivity. Distance is 1 -- the lower the number, the higher the priority. Traffic will flow through the lowest-numbered route that is reachable/working. /ip route add gateway=22.214.171.124 distance=1 check-gateway=ping comment="WAN Primary" # #This is the failover route, ISP Gateway for the failover line distance is 2, higher than the default route. /ip route add gateway=100.100.100.202 distance=2 comment="WAN Secondary"