Re: Route fallback issue

netdev.vger.kernel.org archive mirror

* Re: Route fallback issue
       [not found]   ` <20180620081916.GA30608@unix-ag.uni-kl.de>
@ 2018-06-20  8:26     ` Akshat Kakkar
  2018-06-20 13:48       ` David Ahern
  2018-06-20 19:00       ` Julian Anastasov
  0 siblings, 2 replies; 12+ messages in thread
From: Akshat Kakkar @ 2018-06-20  8:26 UTC
  To: netdev; +Cc: cronolog+lartc, lartc, Erik Auerswald

Hi netdev community,

I have 2 interfaces
eno1 : 192.168.1.10/24
eno2 : 192.168.2.10/24

I added routes as
172.16.0.0/12 via 192.168.1.254 metric 1
172.16.0.0/12 via 192.168.2.254 metric 2

My intention: all traffic to 172.16.0.0/12 should go through eno1. If
192.168.1.254 is not reachable (no arp entry or link down), then it
should fall back to eno2.

But this is not working. My box keeps on looking for 192.168.1.254
(i.e. sending arp requests) and never falls back.

I have posted this on lartc, but it looks like the solution, if any, has
to come from netdev.

What are your views on this?

Do we have some plan/roadmap to resolve this in linux kernel?



On Wed, Jun 20, 2018 at 1:49 PM, Erik Auerswald
<auerswal@unix-ag.uni-kl.de> wrote:
> Hi,
>
> I have usually used the "replace" keyword of iproute2 for similar
> purposes. I would suggest a script as well, run via cron unless 1 minute
> failover times are not acceptable. The logic could be as follows:
>
> if ping -c1 $PRIMARY_NH >/dev/null 2>&1; then
>   ip route replace $PREFIX via $PRIMARY_NH
> elif ping -c1 $SECONDARY_NH >/dev/null 2>&1; then
>   ip route replace $PREFIX via $SECONDARY_NH
> else
>   ip route del $PREFIX
> fi
>
> Alternatively, one could look into a routing daemon that supports static
> routing (Zebra/Quagga/FRRouting, BIRD, ...) and check if that supports
> some form of next-hop tracking or at least removes static routes with
> unreachable next-hops as one would expect from experience with dedicated
> networking devices.
>
> IMHO static route handling as done by the Linux kernel does not seem
> useful for networking devices. I have even had bad experiences with
> Arista switches and static routing because they relied too much on the
> Linux kernel (probably still do).
>
> Thanks,
> Erik
> --
> Bufferbloat just waits in hiding to get you when you try to use the network.
>                         -- Jim Gettys
>
> On Wed, Jun 20, 2018 at 04:20:11AM +0100, cronolog+lartc wrote:
>> Hi,
>>
>> I /think/ Linux continues sending ARP requests and doesn't fall back
>> to the other route because the route to the failed next hop still
>> exists in the routing table with the best (lowest) metric, so it
>> continues looking for this next hop.  I get the same behaviour as you
>> when labbing this up.  I could not see a straightforward option to
>> mark a route as invalid when reachability changes; I'd also like to
>> know if this feature is built in and exists.
>>
>>
>> However in the enterprise Cisco world, we can do what you are trying
>> to do very easily using "route tracking" and "IP SLA" features.
>> Basically we define tests e.g. reachability via ping with
>> appropriate frequency and thresholds, then attach these tests to
>> one or more preferred routes.  If the test fails, the associated
>> route is automatically uninstalled from the forwarding table, so any
>> existing lower metric routes get exposed and are used instead.  When
>> the test passes again, the preferred routes are reapplied.
>>
>> The underlying logic of this can certainly be scripted under Linux
>> to get very similar functionality, then put into a cron job or a
>> while loop or similar.  Something along the lines of (pseudocode):
>>    if [the test such as ping fails] ; then
>>       if [preferred route exists] ; then ip route delete ... ; fi
>>    else  ## ping is successful
>>       if [preferred route doesn't exist] ; then ip route add ... ; fi
>>    fi
>>
>>
>> Hope that helps.  I'm also interested in any other solutions to do
>> this under Linux.
>>
>>
>> On 2018-06-19 13:18, Akshat Kakkar wrote:
>> >I have 2 interfaces
>> >eno1 : 192.168.1.10/24
>> >eno2 : 192.168.2.10/24
>> >
>> >I added routes as
>> >172.16.0.0/12 via 192.168.1.254 metric 1
>> >172.16.0.0/12 via 192.168.2.254 metric 2
>> >
>> >My intention : All traffic to 172.16.0.0/12 should go thru eno1. If
>> >192.168.1.254 is not reachable (no arp entry or link down), then it
>> >should fall back to eno2.
>> >
>> >But this is not working. My box keeps on looking for 192.168.1.254
>> >(i.e. sending arp requests) and never falls back.
>> >
>> >Can anyone help?
>> >
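
The tracking logic quoted above (cronolog's pseudocode combined with Erik's "replace"-based script) can be written as a runnable shell sketch. The prefix and next hops default to the addresses from this thread; the helper names nh_alive and track_once, and the choice of one ping with a 1-second timeout as the health test, are assumptions for illustration. It also assumes the metric 1 / metric 2 static routes from the original post are not installed, since "ip route replace" without a metric manages a separate metric-0 route and would leave them in place.

```sh
# Minimal route-tracking sketch (run as root). Values default to the
# addresses used in this thread; override them via the environment.
PREFIX=${PREFIX:-172.16.0.0/12}
PRIMARY_NH=${PRIMARY_NH:-192.168.1.254}
SECONDARY_NH=${SECONDARY_NH:-192.168.2.254}

# Hypothetical health test: one echo request, 1-second timeout.
nh_alive() {
    ping -c 1 -W 1 "$1" >/dev/null 2>&1
}

# One pass of the tracking logic. Prints the next hop now in use,
# or "none" if neither answered and the route was removed.
track_once() {
    if nh_alive "$PRIMARY_NH"; then
        ip route replace "$PREFIX" via "$PRIMARY_NH" && echo "$PRIMARY_NH"
    elif nh_alive "$SECONDARY_NH"; then
        ip route replace "$PREFIX" via "$SECONDARY_NH" && echo "$SECONDARY_NH"
    else
        # Ignore the error if the route is already gone.
        ip route del "$PREFIX" 2>/dev/null || true
        echo none
    fi
}

# Usage: from cron, or in a loop for faster failover than 1 minute:
#   while :; do track_once; sleep 5; done
```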

* Re: Route fallback issue
  2018-06-20  8:26     ` Route fallback issue Akshat Kakkar
@ 2018-06-20 13:48       ` David Ahern
  2018-06-20 15:18         ` Grant Taylor
  2018-06-20 19:00       ` Julian Anastasov
  1 sibling, 1 reply; 12+ messages in thread
From: David Ahern @ 2018-06-20 13:48 UTC
  To: Akshat Kakkar, netdev; +Cc: cronolog+lartc, lartc, Erik Auerswald

On 6/20/18 2:26 AM, Akshat Kakkar wrote:
> Hi netdev community,
> 
> I have 2 interfaces
> eno1 : 192.168.1.10/24
> eno2 : 192.168.2.10/24
> 
> I added routes as
> 172.16.0.0/12 via 192.168.1.254 metric 1
> 172.16.0.0/12 via 192.168.2.254 metric 2
> 
> My intention : All traffic to 172.16.0.0/12 should go thru eno1. If
> 192.168.1.254 is not reachable (no arp entry or link down), then it
> should fall back to eno2.

See the ignore_routes_with_linkdown and fib_multipath_use_neigh sysctl
settings.
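
As a minimal sketch, the two settings David names could be enabled like this. ignore_routes_with_linkdown is a per-interface flag under net/ipv4/conf, so, like Grant's loop later in the thread, the sketch writes it for every entry present (including all and default); fib_multipath_use_neigh is a single flag that, per the kernel's ip-sysctl documentation, only influences multipath routes. SYSCTL_ROOT is a hypothetical override so the function can be tried against a scratch directory; it is not a kernel feature.

```sh
# Enable the two sysctls mentioned above (run as root).
# SYSCTL_ROOT is a hypothetical test override; normally /proc/sys.
enable_fallback_sysctls() {
    local root f
    root=${SYSCTL_ROOT:-/proc/sys}
    # Covers all/, default/ (interfaces created later) and each interface.
    for f in "$root"/net/ipv4/conf/*/ignore_routes_with_linkdown; do
        [ -e "$f" ] || continue      # glob matched nothing
        echo 1 > "$f"
    done
    # Only present on kernels built with CONFIG_IP_ROUTE_MULTIPATH.
    echo 1 > "$root/net/ipv4/fib_multipath_use_neigh"
}
```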


> On Wed, Jun 20, 2018 at 1:49 PM, Erik Auerswald
> <auerswal@unix-ag.uni-kl.de> wrote:
>> Hi,
>>
>> I have usually used the "replace" keyword of iproute2 for similar
>> purposes. I would suggest a script as well, run via cron unless 1 minute
>> failover times are not acceptable. The logic could be as follows:
>>
>> if ping -c1 $PRIMARY_NH >/dev/null 2>&1; then
>>   ip route replace $PREFIX via $PRIMARY_NH
>> elif ping -c1 $SECONDARY_NH >/dev/null 2>&1; then
>>   ip route replace $PREFIX via $SECONDARY_NH
>> else
>>   ip route del $PREFIX
>> fi
>>
>> Alternatively, one could look into a routing daemon that supports static
>> routing (Zebra/Quagga/FRRouting, BIRD, ...) and check if that supports
>> some form of next-hop tracking or at least removes static routes with
>> unreachable next-hops as one would expect from experience with dedicated
>> networking devices.

A feature is in the works to have fallback nexthops.


>>
>> IMHO static route handling as done by the Linux kernel does not seem
>> useful for networking devices. I have even had bad experiences with
>> Arista switches and static routing because they relied too much on the
>> Linux kernel (probably still do).

Useful how? What did not work as expected?

Do not confuse Arista's NOS with Linux's capabilities or any NOS truly
based on Linux and using a modern kernel. A lot of work has been put
into bringing Linux up to par with NOS features. If something is not
working, demonstrate the problem on the latest kernel and inquire if
someone is working on it.

* Re: Route fallback issue
  2018-06-20 13:48       ` David Ahern
@ 2018-06-20 15:18         ` Grant Taylor
  2018-06-20 15:38           ` Grant Taylor
  0 siblings, 1 reply; 12+ messages in thread
From: Grant Taylor @ 2018-06-20 15:18 UTC
  To: netdev; +Cc: lartc

On 06/20/2018 07:48 AM, David Ahern wrote:
> See the ignore_routes_with_linkdown and fib_multipath_use_neigh sysctl 
> settings.

Where can I find more information on ignore_routes_with_linkdown?  I 
don't see it listed in $Kernel/Documentation/networking/ip-sysctl.txt. 
(I do see fib_multipath_use_neigh documented therein.)

> A feature is in the works to have fallback nexthops.

O.o?



-- 
Grant. . . .
unix || die


* Re: Route fallback issue
  2018-06-20 15:18         ` Grant Taylor
@ 2018-06-20 15:38           ` Grant Taylor
  0 siblings, 0 replies; 12+ messages in thread
From: Grant Taylor @ 2018-06-20 15:38 UTC
  To: netdev; +Cc: lartc

On 06/20/2018 09:18 AM, Grant Taylor wrote:
> Where can I find more information on ignore_routes_with_linkdown?  I 
> don't see it listed in $Kernel/Documentation/networking/ip-sysctl.txt. 
> (I do see fib_multipath_use_neigh documented therein.)

I'm specifically interested in whether ignore_routes_with_linkdown and / or
fib_multipath_use_neigh will cause Linux to fall back to an alternate 
(higher metric) route if the link is still up but the neighbor is not 
accessible across it.

           +-------+
           | Linux |
           +---+---+
               |
+-----+   +---+----+   +-----+
| R 1 +---+ Switch +---+ R 2 |
+-----+   +--------+   +-----+

A typical scenario is Linux connected to a DSL or cable modem, where the
physical link stays up even if the neighbor R {1,2} goes offline.  It's
not possible to rely on the local link (MII) status to determine that a
neighbor is not reachable, e.g. R 1 going away as below:

           +-------+
           | Linux |
           +---+---+
               |
+-----+   +---+----+   +-----+
| R 1 X   X Switch +---+ R 2 |
+-----+   +--------+   +-----+

-- 
Grant. . . .
unix || die


* Re: Route fallback issue
  2018-06-20  8:26     ` Route fallback issue Akshat Kakkar
  2018-06-20 13:48       ` David Ahern
@ 2018-06-20 19:00       ` Julian Anastasov
  2018-06-21  5:13         ` Grant Taylor
  1 sibling, 1 reply; 12+ messages in thread
From: Julian Anastasov @ 2018-06-20 19:00 UTC
  To: Akshat Kakkar; +Cc: netdev, cronolog+lartc, lartc, Erik Auerswald


	Hello,

On Wed, 20 Jun 2018, Akshat Kakkar wrote:

> Hi netdev community,
> 
> I have 2 interfaces
> eno1 : 192.168.1.10/24
> eno2 : 192.168.2.10/24
> 
> I added routes as
> 172.16.0.0/12 via 192.168.1.254 metric 1
> 172.16.0.0/12 via 192.168.2.254 metric 2
> 
> My intention : All traffic to 172.16.0.0/12 should go thru eno1. If
> 192.168.1.254 is not reachable (no arp entry or link down), then it
> should fall back to eno2.

	You can also try alternative routes. But as the
kernel supports only default alternative routes, you can
put them in their own table:

# Alternative routes use same metric!!!
ip route append default via 192.168.1.254 dev eno1 table 100
ip route append default via 192.168.2.254 dev eno2 table 100
ip rule add prio 100 to 172.16.0.0/12 table 100

	Of course, you will get better results if a user space
tool puts only alive routes in service after doing health
checks of all near gateways.
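
Julian's recipe can be wrapped in a small helper. As he notes later in the thread, only the first route may be created with "add" and every further alternative must be added with "append" (with the same metric), so the helper switches verbs after the first gateway. The function name add_alternatives and its argument order are hypothetical.

```sh
# Install alternative default routes in one routing table (run as root).
# Usage: add_alternatives TABLE GW1 DEV1 [GW2 DEV2 ...]
add_alternatives() {
    local table verb
    table=$1
    verb=add            # the first route may be created with "add"...
    shift
    while [ "$#" -ge 2 ]; do
        ip route "$verb" default via "$1" dev "$2" table "$table"
        verb=append     # ...later ones must be appended, same metric
        shift 2
    done
}

# Example with the values from this message:
#   add_alternatives 100 192.168.1.254 eno1 192.168.2.254 eno2
#   ip rule add prio 100 to 172.16.0.0/12 table 100
```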

Regards

* Re: Route fallback issue
  2018-06-20 19:00       ` Julian Anastasov
@ 2018-06-21  5:13         ` Grant Taylor
  2018-06-21 19:57           ` Julian Anastasov
  0 siblings, 1 reply; 12+ messages in thread
From: Grant Taylor @ 2018-06-21  5:13 UTC
  To: Julian Anastasov, Akshat Kakkar
  Cc: netdev, cronolog+lartc, lartc, Erik Auerswald

On 06/20/2018 01:00 PM, Julian Anastasov wrote:
> You can also try alternative routes.

"Alternative routes"?  I can't say as I've heard that description as a 
specific technique / feature / capability before.

Is that its official name?

Where can I find out more about it?

> But as the kernel supports only default alternative routes, you can put 
> them in their own table:

I don't know that that is the case any more.

I was able to issue the following commands without a problem:

# ip route append 192.0.2.128/26 via 192.0.2.62
# ip route append 192.0.2.128/26 via 192.0.2.126

I created two network namespaces and had a pair of vEths between them
(192.0.2.0/26 and 192.0.2.64/26).  I added a dummy network to each NetNS 
(192.0.2.128/26 and 192.0.2.192/26).

I ran the following commands while a persistent ping was running from 
one NetNS to the IP on the other's dummy0 interface:

# ip link set ns2b up && ip route append 192.0.2.192/26 via 192.0.2.126 && ip link set ns2a down
(pause and watch things)
# ip link set ns2a up && ip route append 192.0.2.192/26 via 192.0.2.62 && ip link set ns2b down
(pause and watch things)

I could iterate between the two above commands and pings continued to work.

So, I think that it's now possible to use "alternate routes" (new to me) 
on specific prefixes in addition to the default.  Thus there is no 
longer any need for a separate table and the associated IP rule.

I'm running kernel version 4.9.76.

I did go ahead and set net.ipv4.conf.ns2b.ignore_routes_with_linkdown to 1.

for i in /proc/sys/net/ipv4/conf/*/ignore_routes_with_linkdown; do echo 1 > $i; done

Doing that reduced the number of dropped pings (sent at 1 / second) from
60 ~ 90 to 0 ~ 5.  (Rarely, maybe 1 out of 20 flips, it would take
upwards of 10 pings / seconds.)

> # Alternative routes use same metric!!!
> ip route append default via 192.168.1.254 dev eno1 table 100
> ip route append default via 192.168.2.254 dev eno2 table 100
> ip rule add prio 100 to 172.16.0.0/12 table 100

I did have to "append" the route.  I couldn't just "add" the route. 
When I tried to "add" the second route, I got an error about the route 
already existing.  Using "append" instead of "add" with everything else 
the same worked just fine.

Note:  I did go ahead and remove the single route that was added via 
"add" and used "append" for both.

> Of course, you will get better results if a user space tool puts only
> alive routes in service after doing health checks of all near gateways.

I've got to say, with how well this is working, I don't feel any need
for a user space monitoring daemon.  I admit that I felt the need for
such in the past, before I learned about "alternative routes".

I still want to learn more about "alternative routes".

Here's a diagram of the test network if someone wants to try to 
reproduce my findings:

+-------------+                +-------------+
| NS1         |                |         NS2 |
|        ns2a +-----vEth-A-----+ ns1a        |
|             |                |             |
+ dummy0      |                |      dummy0 +
|             |                |             |
|        ns2b +-----vEth-B-----+ ns1b        |
|             |                |             |
+-------------+                +-------------+

(vEths get the name of the NS that they face.)

NS1:ns2a     192.0.2.1     /26
NS1:ns2b     192.0.2.65    /26
NS1:dummy0   192.0.2.129   /26
NS2:ns1a     192.0.2.62    /26
NS2:ns1b     192.0.2.126   /26
NS2:dummy0   192.0.2.254   /26
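
For anyone reproducing the test, the topology above can be built with a sketch like the following. Interface names and addresses follow the diagram and table; the namespace names ns1/ns2, the route order, and setting ignore_routes_with_linkdown on each vEth are assumptions, not Grant's exact commands. Per Julian's follow-up in this thread, the kernel's alternative-route logic (fib_select_default()) is only applied to default routes, not to a specific prefix like the ones used here.

```sh
# Build the two-namespace lab (run as root). Namespace names ns1/ns2
# are assumptions; interfaces and addresses follow the table.
build_lab() {
    local dev
    ip netns add ns1
    ip netns add ns2

    # vEth-A: ns2a (in ns1) <-> ns1a (in ns2); vEth-B: ns2b <-> ns1b.
    # Each end is named after the namespace it faces.
    ip link add ns2a type veth peer name ns1a
    ip link add ns2b type veth peer name ns1b
    for dev in ns2a ns2b; do ip link set "$dev" netns ns1; done
    for dev in ns1a ns1b; do ip link set "$dev" netns ns2; done

    ip -n ns1 link add dummy0 type dummy
    ip -n ns2 link add dummy0 type dummy

    ip -n ns1 addr add 192.0.2.1/26 dev ns2a
    ip -n ns1 addr add 192.0.2.65/26 dev ns2b
    ip -n ns1 addr add 192.0.2.129/26 dev dummy0
    ip -n ns2 addr add 192.0.2.62/26 dev ns1a
    ip -n ns2 addr add 192.0.2.126/26 dev ns1b
    ip -n ns2 addr add 192.0.2.254/26 dev dummy0

    for dev in lo ns2a ns2b dummy0; do ip -n ns1 link set "$dev" up; done
    for dev in lo ns1a ns1b dummy0; do ip -n ns2 link set "$dev" up; done

    # Two routes per remote dummy prefix; the second needs "append".
    ip -n ns1 route add 192.0.2.192/26 via 192.0.2.62 dev ns2a
    ip -n ns1 route append 192.0.2.192/26 via 192.0.2.126 dev ns2b
    ip -n ns2 route add 192.0.2.128/26 via 192.0.2.1 dev ns1a
    ip -n ns2 route append 192.0.2.128/26 via 192.0.2.65 dev ns1b

    # Ignore routes whose device has lost carrier.
    for dev in ns2a ns2b; do
        ip netns exec ns1 sysctl -qw "net.ipv4.conf.$dev.ignore_routes_with_linkdown=1"
    done
    for dev in ns1a ns1b; do
        ip netns exec ns2 sysctl -qw "net.ipv4.conf.$dev.ignore_routes_with_linkdown=1"
    done
}

# Deleting a namespace also destroys the interfaces inside it.
teardown_lab() {
    ip netns del ns1
    ip netns del ns2
}

# Usage (as root):
#   build_lab
#   ip netns exec ns2 ping -I 192.0.2.254 192.0.2.129
#   ip -n ns1 link set ns2a down    # then watch traffic move to vEth-B
```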

-- 
Grant. . . .
unix || die

* Re: Route fallback issue
  2018-06-21  5:13         ` Grant Taylor
@ 2018-06-21 19:57           ` Julian Anastasov
  2018-06-21 21:08             ` Grant Taylor
  2018-06-24 13:45             ` Erik Auerswald
  0 siblings, 2 replies; 12+ messages in thread
From: Julian Anastasov @ 2018-06-21 19:57 UTC
  To: Grant Taylor; +Cc: Akshat Kakkar, netdev, cronolog+lartc, lartc, Erik Auerswald


	Hello,

On Wed, 20 Jun 2018, Grant Taylor wrote:

> On 06/20/2018 01:00 PM, Julian Anastasov wrote:
> > You can also try alternative routes.
> 
> "Alternative routes"?  I can't say as I've heard that description as a
> specific technique / feature / capability before.
> 
> Is that its official name?

	I think so

> Where can I find out more about it?

	You can search the net. I have some old docs on
these issues; they should still be current:

http://ja.ssi.bg/dgd-usage.txt

> > But as the kernel supports only default alternative routes, you can put them
> > in their own table:
> 
> I don't know that that is the case any more.
> 
> I was able to issue the following commands without a problem:
> 
> # ip route append 192.0.2.128/26 via 192.0.2.62
> # ip route append 192.0.2.128/26 via 192.0.2.126
> 
> I created two network namespaces and had a pair of vEths between them
> (192.0.2.0/26 and 192.0.2.64/26).  I added a dummy network to each NetNS
> (192.0.2.128/26 and 192.0.2.192/26).
> 
> I ran the following commands while a persistent ping was running from one
> NetNS to the IP on the other's dummy0 interface:
> 
> # ip link set ns2b up && ip route append 192.0.2.192/26 via 192.0.2.126 && ip link set ns2a down
> (pause and watch things)
> # ip link set ns2a up && ip route append 192.0.2.192/26 via 192.0.2.62 && ip link set ns2b down
> (pause and watch things)
> 
> I could iterate between the two above commands and pings continued to work.
> 
> So, I think that it's now possible to use "alternate routes" (new to me) on
> specific prefixes in addition to the default.  Thus there is no longer any
> need for a separate table and the associated IP rule.

	Not true. net/ipv4/fib_semantics.c:fib_select_path()
calls fib_select_default() only when prefixlen = 0 (default route).
Otherwise, only the first route will be considered.

	fib_select_default() is the function that decides which
nexthop is reachable and whether to contact it. It uses the ARP
state via fib_detect_death(). That is all the code behind this
feature called "alternative routes": the kernel selects one
based on the nexthop's ARP state. Routes with a different metric are
considered only when the routes with lower metric are removed.
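
The ARP state that fib_detect_death() consults can also be inspected from user space with "ip neigh". A minimal sketch that prints the state for one gateway; the helper name nh_state is hypothetical, and NONE (borrowed from the kernel's NUD_NONE) stands for "no entry", which ip itself does not print.

```sh
# Print the neighbour (ARP) state of gateway $1 on device $2, e.g.
# REACHABLE, STALE, DELAY, PROBE, FAILED or INCOMPLETE; NONE if absent.
nh_state() {
    ip -4 neigh show "$1" dev "$2" | awk '
        NF  { state = $NF }                    # state is the last field
        END { if (state == "") state = "NONE"; print state }'
}

# Usage: nh_state 192.168.1.254 eno1
```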

> I'm running kernel version 4.9.76.
> 
> I did go ahead and set net.ipv4.conf.ns2b.ignore_routes_with_linkdown to 1.
> 
> for i in /proc/sys/net/ipv4/conf/*/ignore_routes_with_linkdown; do echo 1 > $i; done

	IIRC, this flag invalidates nexthops depending on
the link state. If your link is always UP it does not help
much. If you rely on a user space tool, you can check the state
of the desired hops: device link state, your gateway to
ISP, one or more gateways in the ISP network which you
consider permanent part of the path via this ISP.
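
A sketch of the user-space check described here: first the device's link state, then each hop given (the gateway to the ISP, then hops inside the ISP network). The name path_ok and the SYSFS override are hypothetical; treating only an operstate of "up" as up, and one ping with a 1-second timeout per hop, are assumptions.

```sh
# Hypothetical path check: link state of DEV, then each hop in order.
# Returns 0 only if every check passes. SYSFS is a test override.
path_ok() {
    local dev hop state
    dev=$1
    shift
    state=$(cat "${SYSFS:-/sys}/class/net/$dev/operstate" 2>/dev/null)
    # Some drivers report "unknown"; this sketch treats only "up" as up.
    [ "$state" = up ] || return 1
    for hop in "$@"; do
        # Hops beyond the gateway are only tested through DEV if routing
        # allows it (e.g. a per-ISP table); that is not set up here.
        ping -c 1 -W 1 -I "$dev" "$hop" >/dev/null 2>&1 || return 1
    done
}

# Usage: path_ok eno1 192.168.1.254 && echo "path via eno1 looks alive"
```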

> Doing that reduced the number of dropped pings (sent at 1 / second) from
> 60 ~ 90 to 0 ~ 5.  (Rarely, maybe 1 out of 20 flips, it would take
> upwards of 10 pings / seconds.)
> 
> > # Alternative routes use same metric!!!
> > ip route append default via 192.168.1.254 dev eno1 table 100
> > ip route append default via 192.168.2.254 dev eno2 table 100
> > ip rule add prio 100 to 172.16.0.0/12 table 100
> 
> I did have to "append" the route.  I couldn't just "add" the route. When I
> tried to "add" the second route, I got an error about the route already
> existing.  Using "append" instead of "add" with everything else the same
> worked just fine.
> 
> Note:  I did go ahead and remove the single route that was added via "add" and
> used "append" for both.

	The first route can be created with "add", but all subsequent
alternative routes can only be added with "append". If you
successfully add them with "add", it means they are not
alternatives to the first one and are not considered at all.

Regards

* Re: Route fallback issue
  2018-06-21 19:57           ` Julian Anastasov
@ 2018-06-21 21:08             ` Grant Taylor
  2018-06-25 18:50               ` Julian Anastasov
  2018-06-24 13:45             ` Erik Auerswald
  1 sibling, 1 reply; 12+ messages in thread
From: Grant Taylor @ 2018-06-21 21:08 UTC
  To: Julian Anastasov
  Cc: Akshat Kakkar, netdev, cronolog+lartc, lartc, Erik Auerswald

On 06/21/2018 01:57 PM, Julian Anastasov wrote:
> Hello,

Hi.

> I think so

Okay.

I'll do some more digging.

> You can search the net. I have some old docs on these issues; they should
> still be current:
> 
> http://ja.ssi.bg/dgd-usage.txt

"DGD" or "Dead Gateway Detection" sounds very familiar.  I referenced it 
in an earlier reply.

I distinctly remember DGD not behaving satisfactorily years ago, where
"unsatisfactorily" meant something like 90 seconds (or more) to recover.
That actually matches what I was getting without the
ignore_routes_with_linkdown=1 setting that David A. mentioned.

With ignore_routes_with_linkdown=1 things behaved much better.

> Not true. net/ipv4/fib_semantics.c:fib_select_path() calls 
> fib_select_default() only when prefixlen = 0 (default route).

Okay....  My testing last night disagrees with you.  Specifically, I was 
able to add alternate routes to the same prefix, 192.0.2.128/26.
There was not any default gateway configured on any of the NetNSs.  So
everything was using routes for locally attached networks or the two added via
"ip route append".

What am I misinterpreting?  Or where are we otherwise talking past each 
other?

> Otherwise, only the first route will be considered.

"only the first route" almost sounds like something akin to Equal Cost 
Multi Path.

I was not expecting "alternative routes" to use more than one route at a 
time, equally or otherwise.  I was wanting for the kernel to fall back 
to an alternate route / gateway / path in the event that the one that 
was being used became unusable / unreachable.

So what should "Alternative Routes" do?  How does this compare / 
contrast to E.C.M.P. or D.G.D.?

> fib_select_default() is the function that decides which nexthop 
> is reachable and whether to contact it. It uses the ARP state via 
> fib_detect_death(). That is all the code behind this feature called
> "alternative routes": the kernel selects one based on the nexthop's ARP
> state.

Please confirm that you aren't entering / referring to E.C.M.P. 
territory when you say "nexthop".  I think that you are not, but I want 
to ask and be sure, particularly seeing as how things are very closely 
related.

It sounds like you're referring to literally the router that is the next 
hop in the path.  I.e. the device on the other end of the wire.

I'll have to find, read, and try to grok the code to have a better idea.
That being said, it looks like (based on the name)
fib_select_default() deals with the default route.  The testing I did
last night, and its positive results, indicate that the kernel did what I
wanted it to do.  (See above about D.G.D. vs E.C.M.P.)

So, it seems as if something about alternative routes worked using
non-default routes.  I have no way of knowing if it was the code that
we're talking about, or something else that produced the results.  The
way I did the test (specific, non-default prefixes, with routes appended
and no other routes) worked the way I would have expected a feature that
uses alternative routes (or historically D.G.D.) to work.

The following ping works just fine as I bounce interfaces on NS1.

ns2# ping -I 192.0.2.254 192.0.2.129

I can confirm that traffic is moving back and forth between the vEth 
links between the NetNSs.  Granted, the traffic sticks to one vEth 
interface until it goes away.

I can shut down ns2a on NS1 so that ns1a sees loss of link but stays
up on NS2, and traffic moves to vEth-B.

I can then open up ns2a on NS1 so that ns1a sees link on NS2, and 
re-append the route on NS1.

I can then shut down ns2b on NS1 so that ns1b sees loss of link but 
stays up on NS2, and traffic moves to vEth-A.

I can then open up ns2b on NS1 so that ns1b sees link on NS2, and 
re-append the route on NS1.

NS2 behaves exactly as I would hope.  Traffic will move from the down 
interface to the remaining up interface.  Back and forth, no problem.

I don't know where the disconnect is, but I feel like there is one.

> Routes with different metric are considered only when the routes with 
> lower metric are removed.

I agree with the statement.  What I question is where metric came into 
play here.  All of the routes had the same (default) metric.  None of 
the routes I tested had different metrics.

ns1# ip route show
192.0.2.0/26 dev ns2a proto kernel scope link src 192.0.2.1
192.0.2.64/26 dev ns2b proto kernel scope link src 192.0.2.65
192.0.2.128/26 dev dummy0 proto kernel scope link src 192.0.2.129
192.0.2.192/26 via 192.0.2.62 dev ns2a
192.0.2.192/26 via 192.0.2.126 dev ns2b

ns2# ip route show
192.0.2.0/26 dev ns1a proto kernel scope link src 192.0.2.62
192.0.2.64/26 dev ns1b proto kernel scope link src 192.0.2.126
192.0.2.128/26 via 192.0.2.65 dev ns1b
192.0.2.128/26 via 192.0.2.1 dev ns1a
192.0.2.192/26 dev dummy0 proto kernel scope link src 192.0.2.254
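
To see which of the two same-prefix routes the kernel actually uses for a destination, "ip route get" can be asked directly in each namespace. This sketch prints the gateway and device from the first line of its output; the helper name selected_path is hypothetical, and "direct" is printed for on-link destinations that have no "via".

```sh
# Print "GATEWAY DEVICE" for the route the kernel selects towards $1,
# optionally inside network namespace $2 (run as root for -n).
selected_path() {
    ip ${2:+-n "$2"} route get "$1" | awk 'NR == 1 {
        gw = "direct"                      # no "via": on-link destination
        for (i = 1; i < NF; i++) {
            if ($i == "via") gw = $(i + 1)
            if ($i == "dev") dev = $(i + 1)
        }
        print gw, dev
    }'
}

# Usage: selected_path 192.0.2.254 ns1
```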

> IIRC, this flag invalidates nexthops depending on the link state. If 
> your link is always UP it does not help much.

That's what I gathered.  So things like DSL & cable modems or other L2 
bridging devices might not drop the link when their circuit drops.

This is also why I asked the follow up questions to David's email.

I want to do some testing to see if fib_multipath_use_neigh alters this 
behavior at all.  I'm hoping that it will invalidate an alternate route 
if the MAC is not resolvable even if the physical link stays up.

Sure, the ARP cache may have a 30 ~ 120 second timeout before triggering 
this behavior.  But having that timeout and starting to use an 
alternative route is considerably better than not using an alternative 
route.
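
To test fib_multipath_use_neigh in the same lab, the two appended routes in the first namespace could be swapped for one multipath route and the flag enabled there. A sketch assuming the lab namespaces are named ns1 and ns2 (an assumption based on the ns1# / ns2# prompts above). Per Julian's reply, a multipath route uses all of its alive nexthops at once rather than falling back, and active probing may still be needed.

```sh
# Replace ns1's two appended routes with a single multipath route and
# let the kernel consult neighbour state when picking a nexthop.
lab_multipath() {
    ip -n ns1 route flush exact 192.0.2.192/26
    ip -n ns1 route add 192.0.2.192/26 \
        nexthop via 192.0.2.62 dev ns2a \
        nexthop via 192.0.2.126 dev ns2b
    # The flag is per network namespace.
    ip netns exec ns1 sysctl -qw net.ipv4.fib_multipath_use_neigh=1
}
```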

> If you rely on a user space tool, you can check the state of the desired
> hops: device link state, your gateway to ISP, one or more gateways in the 
> ISP network which you consider permanent part of the path via this ISP.

This is what I have thought about doing previously.

> The first route can be created with "add", but all subsequent alternative
> routes can only be added with "append". If you successfully add them with
> "add", it means they are not alternatives to the first one and are not
> considered at all.

ACK

-- 
Grant. . . .
unix || die


* Re: Route fallback issue
  2018-06-21 21:08             ` Grant Taylor
@ 2018-06-25 18:50               ` Julian Anastasov
  2018-06-25 20:07                 ` Grant Taylor
  0 siblings, 1 reply; 12+ messages in thread
From: Julian Anastasov @ 2018-06-25 18:50 UTC
  To: Grant Taylor; +Cc: Akshat Kakkar, netdev, cronolog+lartc, lartc, Erik Auerswald


	Hello,

On Thu, 21 Jun 2018, Grant Taylor wrote:

> On 06/21/2018 01:57 PM, Julian Anastasov wrote:
> > Hello,
> 
> > http://ja.ssi.bg/dgd-usage.txt
> 
> "DGD" or "Dead Gateway Detection" sounds very familiar.  I referenced it in an
> earlier reply.
> 
> I distinctly remember DGD not behaving satisfactorily years ago, where
> "unsatisfactorily" meant something like 90 seconds (or more) to recover.
> That actually matches what I was getting without the
> ignore_routes_with_linkdown=1 setting that David A. mentioned.

	Yes, the ARP state for unreachable GWs may be updated slowly;
there is timely feedback only for the reachable state.

> With ignore_routes_with_linkdown=1 things behaved much better.
> 
> > Not true. net/ipv4/fib_semantics.c:fib_select_path() calls
> > fib_select_default() only when prefixlen = 0 (default route).
> 
> Okay....  My testing last night disagrees with you.  Specifically, I was able
> to add alternate routes to the same prefix, 192.0.2.128/26. There was not
> any default gateway configured on any of the NetNSs.  So everything was using
> routes for locally attached networks or the two added via "ip route append".
> 
> What am I misinterpreting?  Or where are we otherwise talking past each other?

	You can create the two routes, of course. But only the
default routes are alternatives.

> 
> > Otherwise, only the first route will be considered.
> 
> "only the first route" almost sounds like something akin to Equal Cost Multi
> Path.
> 
> I was not expecting "alternative routes" to use more than one route at a time,
> equally or otherwise.  I was wanting for the kernel to fall back to an
> alternate route / gateway / path in the event that the one that was being used
> became unusable / unreachable.
> 
> So what should "Alternative Routes" do?  How does this compare / contrast to
> E.C.M.P. or D.G.D.?

	The alternative routes work in this way:

- on lookup, routes are walked in order, as listed in the table

- as long as a route contains a reachable gateway (ARP state), only this
route is used

- if some gateway becomes unreachable (ARP state), the next alternative
routes are tried

- if the ARP entry is expired (missing), this gateway can be probed if the
route is before the currently used route. This is what happens initially
when no ARP state is present for the GWs. It is bad luck if the probed
GW is actually unreachable.

- active probing by user space (pinging the GWs) can only help keep the
ARP state present for the gateways in use. This way, the ARP entry for a
GW is not missing, and the kernel does not risk selecting an unavailable route
with the goal to probe the GW.
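
The rules above can be modelled as a toy function to make the order of checks explicit. This is a deliberate simplification of the description above, not a transcription of fib_select_default() or fib_detect_death(): each argument is the ARP state of one alternative route's gateway in table order (reachable, missing or unreachable), and the first argument is the index of the route currently in use. The function name pick_route is hypothetical.

```sh
# Toy model of alternative-route selection (not kernel code).
# Usage: pick_route CURRENT STATE0 STATE1 ...
#   CURRENT: index of the route currently in use
#   STATEn:  reachable | missing | unreachable (gateway of route n)
# Prints the index of the route that would be used next.
pick_route() {
    local current i state
    current=$1
    shift
    i=0
    for state in "$@"; do
        case $state in
            reachable)
                echo "$i"; return ;;        # reachable gateway: use it
            missing)
                # No ARP entry: probe it only if it precedes the current route.
                if [ "$i" -lt "$current" ]; then echo "$i"; return; fi ;;
        esac
        # Unreachable, or missing at/after the current route: try the next.
        i=$((i + 1))
    done
    echo "$current"                         # nothing better: stay put
}

# pick_route 1 missing reachable   prints 0 (route 0 gets probed first)
```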

> > fib_select_default() is the function that decides which nexthop is reachable
> > and whether to contact it. It uses the ARP state via fib_detect_death().
> > That is all the code behind this feature called "alternative routes":
> > the kernel selects one based on the nexthop's ARP state.
> 
> Please confirm that you aren't entering / referring to E.C.M.P. territory when
> you say "nexthop".  I think that you are not, but I want to ask and be sure,
> particularly seeing as how things are very closely related.

	The nexthop is the GW in the route.

> It sounds like you're referring to literally the router that is the next hop
> in the path.  I.e. the device on the other end of the wire.

	Yes, the kernel avoids alternative routes with unreachable GWs.

> I want to do some testing to see if fib_multipath_use_neigh alters this
> behavior at all.  I'm hoping that it will invalidate an alternate route if the
> MAC is not resolvable even if the physical link stays up.

	A multipath route uses all of its alive nexthops at the same
time. But in the same way, you may need active probing by user space;
otherwise, an unavailable GW can be selected.

> Sure, the ARP cache may have a 30 ~ 120 second timeout before triggering this
> behavior.  But having that timeout and starting to use an alternative route is
> considerably better than not using an alternative route.

	Yes, if you prefer, you may run PING every second to avoid such 
delays...
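
A minimal sketch of that once-per-second probing. Each call sends one echo request per gateway so there is regular traffic towards it and its ARP entry stays fresh; the function name probe_gateways and the GW:DEV argument format are hypothetical.

```sh
# One probing pass over GW:DEV pairs (a format chosen for this sketch).
probe_gateways() {
    local pair gw dev
    for pair in "$@"; do
        gw=${pair%%:*}
        dev=${pair#*:}
        # The result is ignored; the point is the traffic, not the reply.
        ping -c 1 -W 1 -I "$dev" "$gw" >/dev/null 2>&1 || true
    done
}

# Usage (as root), once per second:
#   while :; do probe_gateways 192.168.1.254:eno1 192.168.2.254:eno2; sleep 1; done
```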

Regards

* Re: Route fallback issue
  2018-06-25 18:50               ` Julian Anastasov
@ 2018-06-25 20:07                 ` Grant Taylor
  0 siblings, 0 replies; 12+ messages in thread
From: Grant Taylor @ 2018-06-25 20:07 UTC
  To: Julian Anastasov
  Cc: Akshat Kakkar, netdev, cronolog+lartc, lartc, Erik Auerswald

On 06/25/2018 12:50 PM, Julian Anastasov wrote:
> Hello,

Hi Julian,

> Yes, the ARP state for unreachable GWs may be updated slowly; there is
> timely feedback only for the reachable state.

Fair.

Most of the installations where I needed D.G.D. to work would be okay 
with a < 5 minute timeout.  Obviously they would like faster, but 
automation is a LOT better than waiting on manual intervention.

IMHO < 30 seconds is great.  < 90 seconds is acceptable.  < 300 seconds 
leaves some room for improvement.

> You can create the two routes, of course. But only the default routes 
> are alternatives.

Are you saying that the functionality I'm describing only works for 
default gateways or that the term "alternative route" only applies to 
default gateways?

The testing that I did indicated that alternative routes worked for 
specific prefixes too.

I tested multiple NetNSs with only directly attached routes and appended 
routes to a destination prefix, no default gateway / route of last resort.

The behavior seemed to differ depending on whether ignore_routes_with_linkdown
was set or unset.  Specifically, setting ignore_routes_with_linkdown seemed to
help considerably.

That is why I question the requirement for the "default" route versus a
route to a specific prefix.

Can you explain why I saw the behavior difference with 
ignore_routes_with_linkdown if it only applies to the default route?
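A minimal sketch of a setup like the one described here, assuming the interface names and gateways from the original post and root privileges. `setup_alt_routes`, `run`, and the `DRY_RUN` switch are hypothetical. Both routes share the default metric, and `ignore_routes_with_linkdown` is set per interface:

```sh
#!/bin/sh
# Sketch: two same-prefix, same-metric routes plus ignore_routes_with_linkdown.
# Interfaces and gateways come from the original post; run as root.
# setup_alt_routes, run and DRY_RUN are hypothetical helpers.

# Execute a command, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: setup_alt_routes PREFIX
setup_alt_routes() {
    prefix=$1
    for dev in eno1 eno2; do
        # Skip routes whose nexthop device has lost carrier (linkdown).
        run sysctl -w "net.ipv4.conf.$dev.ignore_routes_with_linkdown=1"
    done
    # "append" adds a second route for the same prefix and metric
    # instead of rejecting it as a duplicate, as "add" would.
    run ip route append "$prefix" via 192.168.1.254 dev eno1
    run ip route append "$prefix" via 192.168.2.254 dev eno2
}
```

After sourcing the file, `DRY_RUN=1 setup_alt_routes 172.16.0.0/12` prints the commands without touching the routing table.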

> The alternative routes work in this way:
> 
> - on lookup, routes are walked in order - as listed in table
> 
> - as long as route contains reachable gateway (ARP state), only this 
> route is used
> 
> - if some gateway becomes unreachable (ARP state), next alternative 
> routes are tried
> 
> - if ARP entry is expired (missing), this gateway can be probed if the 
> route is before the currently used route. This is what happens initially 
> when no ARP state is present for the GWs. It is bad luck if the probed 
> GW is actually unreachable.
> 
> - active probing by user space (pinging the GWs) can only help to keep
> the ARP state present for the used gateways. This way, the kernel will
> not risk selecting an unavailable route just to probe a GW whose ARP
> entry is missing.

This all makes sense.
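The rules quoted above can be modelled in a few lines. This is a simplified sketch of the described behavior, not the kernel's implementation; `select_alt_route` and the state names `reachable`, `failed`, and `missing` are hypothetical:

```sh
#!/bin/sh
# Simplified model of the alternative-route rules quoted above; NOT kernel code.
#
# Usage: select_alt_route CURRENT STATE0 STATE1 ...
#   CURRENT  index of the route in use; pass the number of routes (or more)
#            when no route is in use yet
#   STATEn   ARP state of route n's gateway, in table order:
#            reachable | failed | missing
# Prints the index of the route to use, or returns 1 if none qualifies.
select_alt_route() {
    local current i state
    current=$1
    shift
    i=0
    for state in "$@"; do
        case $state in
            reachable)
                # A reachable gateway is used; later routes are not tried.
                echo "$i"
                return 0
                ;;
            missing)
                # No ARP entry: probe this gateway only if its route comes
                # before the route currently in use.
                if [ "$i" -lt "$current" ]; then
                    echo "$i"
                    return 0
                fi
                ;;
            failed)
                # Unreachable gateway: fall through to the next route.
                ;;
        esac
        i=$((i + 1))
    done
    return 1
}
```

For example, `select_alt_route 2 missing reachable` prints 0: with no route in use yet, the first gateway is probed even though the second is known to be reachable, which is the "bad luck" case if that first gateway is actually down.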

Please confirm whether "gateway" in this context means the "/default/
gateway" or not.  I ask because "gateway" can arguably be used to
describe the next hop for a route, or gateway, to a prefix.  Further,
the "/default/ (gateway,router)" is the gateway or route of last resort,
which to me means that "gateway" can be any route in this context.

> nexthop is the GW in the route

Thank you for confirming.

> Yes, the kernel avoids alternative routes with unreachable GWs

Fair enough.

> The multipath route uses all its alive nexthops at the same time... But 
> you may need in the same way active probing by user space, otherwise 
> unavailable GW can be selected.

I assume that the dead ECMP NEXTHOP is also subject to similar timeouts 
as alternative routes.  Correct?

> Yes, if you prefer, you may run PING every second to avoid such delays...

Agreed.

I'm trying to make sure I understand basic functionality before I do 
things to modify it.

-- 
Grant. . . .
unix || die

* Re: Route fallback issue
  2018-06-21 19:57           ` Julian Anastasov
  2018-06-21 21:08             ` Grant Taylor
@ 2018-06-24 13:45             ` Erik Auerswald
  2018-06-25 19:02               ` Julian Anastasov
  1 sibling, 1 reply; 12+ messages in thread
From: Erik Auerswald @ 2018-06-24 13:45 UTC (permalink / raw)
  To: Julian Anastasov
  Cc: Grant Taylor, Akshat Kakkar, netdev, cronolog+lartc, lartc

Hello Julian,

On Thu, Jun 21, 2018 at 10:57:14PM +0300, Julian Anastasov wrote:
> On Wed, 20 Jun 2018, Grant Taylor wrote:
> > On 06/20/2018 01:00 PM, Julian Anastasov wrote:
> > > You can also try alternative routes.
> > 
> > "Alternative routes"?  I can't say as I've heard that description as a
> > specific technique / feature / capability before.
> > 
> > Is that its official name?
> 
> 	I think so
> 
> > Where can I find out more about it?
> 
> 	You can search the net. I have some old docs on
> these issues; they should still be relevant:
> 
> http://ja.ssi.bg/dgd-usage.txt

Thanks for that info!

Can you tell us which parts of the above text are actually implemented
in the upstream Linux kernel, and starting with which version(s)
(approximately)? The text describes ideas and patches from nearly two
decades ago; is more recent documentation available somewhere?

Thanks,
Erik
-- 
In the beginning, there was static routing.
                        -- RFC 1118

* Re: Route fallback issue
  2018-06-24 13:45             ` Erik Auerswald
@ 2018-06-25 19:02               ` Julian Anastasov
  0 siblings, 0 replies; 12+ messages in thread
From: Julian Anastasov @ 2018-06-25 19:02 UTC (permalink / raw)
  To: Erik Auerswald; +Cc: Grant Taylor, Akshat Kakkar, netdev, cronolog+lartc, lartc


	Hello,

On Sun, 24 Jun 2018, Erik Auerswald wrote:

> Hello Julian,
> 
> > http://ja.ssi.bg/dgd-usage.txt
> 
> Thanks for that info!
> 
> Can you tell us which parts of the above text are actually implemented
> in the upstream Linux kernel, and starting with which version(s)
> (approximately)? The text describes ideas and patches from nearly two
> decades ago; is more recent documentation available somewhere?

	None of it is included in the kernel. The idea is that user space
has more control. It is best done with CONNMARK: stick each NATed
connection to one path (via an alive ISP), and use the route lookup only
to select an alive path for the first packet of the connection. So what
we balance are connections, not packets (balancing packets does not work
with different ISPs). Probe the GWs to keep only alive routes in the table.
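A minimal sketch of the CONNMARK approach, assuming iptables with the `conntrack` and `CONNMARK` extensions, NAT on both uplinks, and the interfaces and gateways from the original post. Table numbers 101/102, marks 1/2, and the helper names are hypothetical choices. Only the first packet of a connection is routed by the main table (kept pointing at an alive gateway, e.g. by a probe script); its outgoing interface is saved as the connection mark, and later original-direction packets are steered by `ip rule` on that mark:

```sh
#!/bin/sh
# Sketch: stick each NATed connection to the uplink its first packet used.
# Tables 101/102, marks 1/2, run and setup_sticky_paths are hypothetical.
# Run as root, or set DRY_RUN=1 to print the commands instead.

# Execute a command, or only print it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_sticky_paths() {
    # Per-uplink tables: packets carrying mark N leave via uplink N only.
    run ip route replace default via 192.168.1.254 dev eno1 table 101
    run ip route replace default via 192.168.2.254 dev eno2 table 102
    run ip rule add fwmark 1 table 101
    run ip rule add fwmark 2 table 102

    # Before routing, copy the connection mark onto original-direction
    # packets (forwarded and locally generated). Replies keep mark 0, so
    # the main table still routes them back to their destination.
    run iptables -t mangle -A PREROUTING -m conntrack --ctdir ORIGINAL -j CONNMARK --restore-mark
    run iptables -t mangle -A OUTPUT -j CONNMARK --restore-mark

    # For a new connection, the main-table lookup chose the outgoing
    # interface; record that choice in the connection mark.
    run iptables -t mangle -A POSTROUTING -o eno1 -m conntrack --ctstate NEW -j CONNMARK --set-mark 1
    run iptables -t mangle -A POSTROUTING -o eno2 -m conntrack --ctstate NEW -j CONNMARK --set-mark 2

    # NAT each connection to the address of the uplink it leaves by.
    run iptables -t nat -A POSTROUTING -o eno1 -j MASQUERADE
    run iptables -t nat -A POSTROUTING -o eno2 -j MASQUERADE
}
```

Balancing then happens per connection: when the probe script moves the main-table route to another gateway, only new connections follow it, while established ones keep their uplink and NAT address.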

Regards

end of thread, other threads:[~2018-06-25 20:09 UTC | newest]

Thread overview: 12+ messages
     [not found] <CAA5aLPhszyo1HBK8gOYZC35fuy_LW8Xw1CT_nob+CuZ0wf33zw@mail.gmail.com>
     [not found] ` <10767cab-7a21-faaf-e49e-4752c28d28b7@googlemail.com>
     [not found]   ` <20180620081916.GA30608@unix-ag.uni-kl.de>
2018-06-20  8:26     ` Route fallback issue Akshat Kakkar
2018-06-20 13:48       ` David Ahern
2018-06-20 15:18         ` Grant Taylor
2018-06-20 15:38           ` Grant Taylor
2018-06-20 19:00       ` Julian Anastasov
2018-06-21  5:13         ` Grant Taylor
2018-06-21 19:57           ` Julian Anastasov
2018-06-21 21:08             ` Grant Taylor
2018-06-25 18:50               ` Julian Anastasov
2018-06-25 20:07                 ` Grant Taylor
2018-06-24 13:45             ` Erik Auerswald
2018-06-25 19:02               ` Julian Anastasov
