OpenWrt, mwan3 and default route for IPsec tunnel

OpenWrt mwan3 and IPsec failover: Resolve default route issues for seamless internet & VPN redundancy. Learn how to configure mwan3.user for automatic metric adjustments and IPsec tunnel switching.
Recently, I faced an issue with mwan3, a package for OpenWrt routers that manages multiple internet connections, either for load balancing or for failover.

At work, we have two fibre connections: the main one carries all the traffic, and the second stays online but only takes over when the first goes down, a typical failover setup.

The first connection (let's call it fibre) has metric 10, whereas the second connection (fibre2) has metric 20.

For tech guys, first fibre is uncontended with static IP, whereas our backup fibre is contended over PPPoE.

My main policy in mwan3 is fibre_fibre2.

The fibre_fibre2 policy contains two members: fibre_m1_w3 (Metric 1, Weight 3) and fibre2_m2_w2 (Metric 2, Weight 2).

When fibre is down, fibre2 takes over and all traffic flows through nicely, almost.

I say almost because in some cases traffic is still sent through the wrong interface, even when that interface is down.

I also have WireGuard server set on my router, so when one connection goes down, WireGuard starts accepting connections on the second IP.

It is a known issue that, when mwan3 is used in a balanced setup, WireGuard cannot accept connections on both WAN connections at once while mwan3 is active.

The flaw in mwan3, which is not as easy to sort as it sounds, is that if the main connection is down and backup is activated, any connections initiated by things running on the router itself are still trying to connect through the default route.

If you run the command:

ip route show

You will get something like this:

default via 1.2.3.4 dev wan metric 10 
default via 4.5.6.7 dev wan2 metric 20 

Here, wan is my fibre and wan2 is my fibre2. 1.2.3.4 is a made-up gateway IP for the wan interface, and 4.5.6.7 is a made-up gateway IP for the wan2 interface.

When the main connection goes down, all LAN traffic is neatly redirected through the second connection. Local traffic and VoIP phones in business keep working.

However, locally run services may still want to try to send traffic through the default route.

The problem is that when the main connection is down, the default route with the lowest metric is still shown in the routing table; hence, locally run services still think that it is the default, through which traffic shall be sent and received.

On top of that, mwan3 cannot directly amend these default routes; hence, some traffic generated by the router itself will still be pushed through the route with the lowest metric, even if that connection is down.
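To check which default route the router itself will prefer, it can help to pick out the default route with the lowest metric. The helper below is a minimal sketch and not part of mwan3: the name lowest_metric_dev is made up for illustration, and it only parses text in the format shown above, so on the router it would be fed with the output of ip route show default.

```shell
#!/bin/sh
# Hypothetical helper: read `ip route show` output on stdin and print the
# device of the default route with the lowest metric. A default route
# without an explicit metric is treated as metric 0.
lowest_metric_dev() {
    awk '
        $1 == "default" {
            dev = ""; metric = 0
            for (i = 2; i < NF; i++) {
                if ($i == "dev")    dev = $(i + 1)
                if ($i == "metric") metric = $(i + 1) + 0
            }
            if (best == "" || metric < best_metric) {
                best = dev; best_metric = metric
            }
        }
        END { if (best != "") print best }
    '
}

# On the router: ip route show default | lowest_metric_dev
```

With the two routes listed earlier, this prints wan, which is exactly the interface that router-local services keep using even after mwan3 has marked it as down.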

This issue affects services like IPsec tunnels that I have implemented on the router to allow the whole LAN network to access specific services used in business through RemoteApp.

I have two IPsec tunnels in place: one for the main connection (left=1.2.3.5) and one for the backup (left=4.5.6.8).

1.2.3.5 is the theoretical IP of the WAN interface (fibre), and 4.5.6.8 is a theoretical IP for the WAN2 interface (fibre2).

These tunnels cannot work together as they share the same left subnet (leftsubnet=192.168.1.0/24).

The first IPsec tunnel is activated (ipsec up tunnel1) when the main fibre is up. The second tunnel (ipsec up tunnel2) is activated when the main fibre goes down and fibre2 takes over (after… ipsec down tunnel1) – in theory.

When my main fibre goes down and mwan3 redirects all traffic through fibre2, the ipsec connection is still sent through the default route even though the left IP states which IP to use.

The fault is the default route and the lowest metric of the interface.

Let’s recall our command:

ip route show

And look at that again:

default via 1.2.3.4 dev wan metric 10 
default via 4.5.6.7 dev wan2 metric 20 

Even though mwan3 is nicely redirecting all the traffic, and the second IPsec tunnel is connecting nicely through the second connection (left=4.5.6.8), the traffic that comes from the IPsec tunnel itself, like ping to IP on the other side of the IPsec tunnel, still utilises the default route with the lowest metric (going over the wrong interface).

You are doing ping 10.1.2.3… which is sent through the currently active fibre connection (fibre2/WAN2), but the ping response is directed through the default route with the lowest metric, which belongs to the fibre connection (fibre/WAN) that is currently down.

This was a bit of a head-scratcher for me one day. I tried to understand what’s going on, as IPsec tunnel2 was connected correctly, both ends see it, but LAN traffic through the tunnel was not flowing.

I decided to shut down the main interface, that is currently marked as down, with ifdown wan.

Once I executed that, local traffic through the IPsec tunnel started flowing, and ping started responding correctly.

When I looked at ip route show, I noticed that once the interface was taken down, the default route for WAN (metric 10) was removed, and only the default route for WAN2 (metric 20) remained.

Thanks to that, I understood the problem and the limitation of mwan3 when it comes to the default route.

I needed to get this sorted, and after a bit of reading, I did it using the mwan3 Alerts/notifications feature through the mwan3.user script located at /etc/mwan3.user, which can be edited in a terminal or through the web interface (Network > MultiWAN Manager > Notify tab).

Theoretical approach

The idea was that when fibre goes down (detected and reported by mwan3), the metric of its default route is raised, so that fibre2 (the second connection, which mwan3 switches to) ends up with the lowest metric.

This will allow traffic to flow through the connection with the lowest metric, which, in that instance, will be the connection that is currently active as a backup (fibre2).

Practical solution

To change the metric of the interface, you need to remove the default route first for the first connection.

Sadly, it is not possible to just amend the metric, which would be ideal in that case.

In that case, when the main connection (fibre) goes down and traffic is re-routed (by mwan3) to the backup connection, I need to change the metric from 10 to higher than 20.

I decided to simply append a zero to the current metric, so 10 becomes 100.

I am doing that by executing the following commands.

ip route del default via 1.2.3.4
ip route add default via 1.2.3.4 metric 100

This will also make replies from the WireGuard server go out through the connection with the lowest metric.
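The delete and re-add pair can be wrapped in a small function. This is only a sketch under my own assumptions: set_default_metric and the IP_CMD variable are names invented here, and IP_CMD exists only so the commands can be printed in a dry run instead of being executed.

```shell
#!/bin/sh
# IP_CMD is an assumption of this sketch: set IP_CMD="echo ip" for a dry run
# that prints the commands instead of touching the routing table.
IP_CMD="${IP_CMD:-ip}"

# Hypothetical helper: move the default route via gateway $1 to metric $2.
# The metric is part of the route's identity, so the route is deleted and
# re-added rather than modified in place.
set_default_metric() {
    gw="$1"
    metric="$2"
    $IP_CMD route del default via "$gw" 2>/dev/null
    $IP_CMD route add default via "$gw" metric "$metric"
}

# Example: demote the main fibre when it goes down.
# set_default_metric 1.2.3.4 100
```

Running it first with IP_CMD="echo ip" is a cheap way to check the gateway and metric values before letting it change the real routing table.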

IPsec failover

In my /etc/ipsec.conf file, I specified two tunnels, one for each IP of each connection.

These are conn tunnel1 with left=1.2.3.5 (main, fibre) and conn tunnel2 with left=4.5.6.8 (backup, fibre2).

Because both tunnels share the same subnet (leftsubnet=192.168.1.0/24), they cannot work at the same time. I need to set tunnel1 with auto=start and tunnel2 with auto=add.

In that scenario, when tunnel1 fails or disconnects for some reason, IPsec will try to reconnect it straight away. The second tunnel can only be brought up manually, by running ipsec down tunnel1 and then ipsec up tunnel2.

This is not a perfect solution because when my main connection goes down, I will need to go to the router and manually bring tunnel2 up.

Another problem is that it can take some time before the other end notices that tunnel1 is actually down. If I try to connect tunnel2 while the other end still sees tunnel1 as active, tunnel2 will fail with an authentication error (even though the two tunnels use different access passwords stored in… /etc/ipsec.secrets).

I decided to put both tunnels into separate files.

/etc/ipsec.conf.tunnel1
/etc/ipsec.conf.tunnel2

I set auto=start in both files. When the second tunnel is meant to be up but fails due to a premature connection attempt and an authentication error, it will keep retrying until it succeeds.

Of course, putting the tunnels in separate config files will not make them work on its own, as the IPsec service only ever reads from /etc/ipsec.conf.

So, just as I change the metric of the first connection when it goes down, I also need to select which IPsec tunnel is used by copying ipsec.conf.tunnel2 over the ipsec.conf file when fibre2 becomes the active connection. When the main fibre comes back online, I need to copy ipsec.conf.tunnel1 over ipsec.conf to make tunnel1 the default connection again.

Of course, each time after copying, I need to restart the IPsec service using the ipsec restart command.
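The copy and restart steps could be sketched as one function. The names activate_tunnel, CONF_DIR and IPSEC_CMD are my own inventions for illustration; the two variables exist only so the function can be tried against a scratch directory with a stub command before it touches /etc.

```shell
#!/bin/sh
# CONF_DIR and IPSEC_CMD are assumptions of this sketch so it can be tried
# outside the router; on the router the defaults apply.
CONF_DIR="${CONF_DIR:-/etc}"
IPSEC_CMD="${IPSEC_CMD:-ipsec}"

# Hypothetical helper: make tunnel1 or tunnel2 the active IPsec config.
activate_tunnel() {
    src="$CONF_DIR/ipsec.conf.$1"
    dst="$CONF_DIR/ipsec.conf"
    [ -f "$src" ] || { echo "missing $src" >&2; return 1; }
    # Skip the restart when the requested tunnel is already active.
    if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
        return 0
    fi
    # Restart only if the copy succeeded.
    cp "$src" "$dst" && $IPSEC_CMD restart
}

# Example: activate_tunnel tunnel2
```

Skipping the restart when the file is already in place is a design choice of this sketch rather than something the steps above require; it assumes cmp is available on the router and avoids dropping a working tunnel if the same event fires twice.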

Now I need to combine all of the above so that it happens automatically, using the mwan3 Alerts/notifications feature.

mwan3.user script

Let’s put the script below into the /etc/mwan3.user file, either by editing it in a terminal or by pasting the content through the web interface (Network > MultiWAN Manager > Notify tab).

All the IP addresses here are fake and need to be adjusted to whatever you are using. For your information, all of my connections use a static external IP address in the IPv4 range.

#!/bin/sh
if [ "${ACTION}" = "disconnected" ] && [ "${INTERFACE}" = "fibre" ] ; then
    # When FIBRE down - set FIBRE2 as primary
    ip route del default via 1.2.3.4 2>/dev/null
    ip route add default via 1.2.3.4 metric 100 2>/dev/null
    cp /etc/ipsec.conf.tunnel2 /etc/ipsec.conf; ipsec restart 2>/dev/null
fi

if [ "${ACTION}" = "connected" ] && [ "${INTERFACE}" = "fibre" ] ; then
    # When FIBRE is back up - restore its original metric (10) and switch back to tunnel1
    ip route del default via 1.2.3.4 2>/dev/null
    ip route add default via 1.2.3.4 metric 10 2>/dev/null
    cp /etc/ipsec.conf.tunnel1 /etc/ipsec.conf; ipsec restart 2>/dev/null
fi

if [ "${ACTION}" = "disconnected" ] && [ "${INTERFACE}" = "fibre2" ] ; then
    # When FIBRE2 down - raise its metric to 200 so FIBRE stays the preferred route
    ip route del default via 4.5.6.7 2>/dev/null
    ip route add default via 4.5.6.7 metric 200 2>/dev/null
fi

if [ "${ACTION}" = "connected" ] && [ "${INTERFACE}" = "fibre2" ] ; then
    # When FIBRE2 is back up - restore its original metric (20)
    ip route del default via 4.5.6.7 2>/dev/null
    ip route add default via 4.5.6.7 metric 20 2>/dev/null
fi

You can see that there are four rules here.

  1. When the main fibre goes down, the following things happen:
     a. Remove the default route for the main fibre.
     b. Re-add the default route with a high metric of 100.
     c. Copy ipsec.conf.tunnel2 into ipsec.conf (so the second tunnel, designed to work with the fibre2 connection, will be used) and restart the IPsec service.

  2. When the main fibre goes back online, the following things happen:
     a. Remove the default route for the main fibre with the high metric of 100.
     b. Re-add the default route with the original metric of 10.
     c. Copy ipsec.conf.tunnel1 into ipsec.conf and restart the IPsec service.

The third and fourth rules are optional and used to bump up the default metric from 20 to 200 when the connection is down, just in case, so even unintentional traffic will not be sent through the interface that is not operational.
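The four rules can also be read as a lookup table keyed on the event and the interface. The sketch below is just another way of writing down the same mapping; plan_for_event is a hypothetical name, and the gateways, metrics and tunnel names are the example values from this post.

```shell
#!/bin/sh
# Hypothetical helper: map an mwan3 event ($1 = action, $2 = interface) to
# "gateway metric tunnel". A "-" in the tunnel column means the event does
# not change the active IPsec tunnel. Unknown events return non-zero.
plan_for_event() {
    case "$1:$2" in
        disconnected:fibre)  echo "1.2.3.4 100 tunnel2" ;;
        connected:fibre)     echo "1.2.3.4 10 tunnel1" ;;
        disconnected:fibre2) echo "4.5.6.7 200 -" ;;
        connected:fibre2)    echo "4.5.6.7 20 -" ;;
        *)                   return 1 ;;
    esac
}

# Example: plan_for_event "$ACTION" "$INTERFACE"
```

Keeping the values in one place like this makes it harder to end up with a metric in one rule that no longer matches its counterpart in another.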

The default metric for each connection is set in Network > Interfaces, by editing the interface and entering the value in the Use gateway metric field under Advanced Settings.
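The same setting can probably be made from the command line as well. This is a sketch under assumptions I have not taken from the web interface itself: that the field corresponds to the metric option of the interface section in the network config, and that the sections are named fibre and fibre2. set_interface_metric and UCI_CMD are made-up names, with UCI_CMD there only to allow a dry run.

```shell
#!/bin/sh
# UCI_CMD is an assumption of this sketch: set UCI_CMD="echo uci" to print
# the commands instead of changing the configuration.
UCI_CMD="${UCI_CMD:-uci}"

# Hypothetical helper: set the gateway metric of interface section $1 to $2
# and commit the network config.
set_interface_metric() {
    $UCI_CMD set "network.$1.metric=$2" && $UCI_CMD commit network
}

# Example (the network has to be reloaded afterwards for this to apply):
# set_interface_metric fibre 10
# set_interface_metric fibre2 20
# /etc/init.d/network reload
```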

This way I have a functioning mwan3 setup that manages the internet connections and re-routes traffic accordingly, plus a failsafe for the IPsec tunnels: the right tunnel is used for the currently active connection, and router traffic goes out through the default route of the connection that is actually in use.

The issue with mwan3 and sending traffic through the wrong interface is not new, and sorting it is not as easy as it sounds, because it depends on many different scenarios. This is why, if we cannot have one solution to fit all, we need to build our own solution to fit what we need, thanks to mwan3.user.

Regards.
