* Re: Route fallback issue
From: Akshat Kakkar @ 2018-06-20  8:26 UTC
To: netdev; +Cc: cronolog+lartc, lartc, Erik Auerswald

Hi netdev community,

I have 2 interfaces
eno1 : 192.168.1.10/24
eno2 : 192.168.2.10/24

I added routes as
172.16.0.0/12 via 192.168.1.254 metric 1
172.16.0.0/12 via 192.168.2.254 metric 2

My intention : All traffic to 172.16.0.0/12 should go thru eno1. If
192.168.1.254 is not reachable (no arp entry or link down), then it
should fall back to eno2.

But this is not working. My box keeps on looking for 192.168.1.254
(i.e. sending arp requests) and never falls back.

I have posted this on lartc, but it looks like the solution, if any,
has to come from netdev.

What are your views on this? Is there a plan/roadmap to resolve this in
the Linux kernel?

On Wed, Jun 20, 2018 at 1:49 PM, Erik Auerswald
<auerswal@unix-ag.uni-kl.de> wrote:
> Hi,
>
> I have usually used the "replace" keyword of iproute2 for similar
> purposes. I would suggest a script as well, run via cron unless 1 minute
> failover times are not acceptable. The logic could be as follows:
>
>   if ping -c1 $PRIMARY_NH >/dev/null 2>&1; then
>     ip route replace $PREFIX via $PRIMARY_NH
>   elif ping -c1 $SECONDARY_NH >/dev/null 2>&1; then
>     ip route replace $PREFIX via $SECONDARY_NH
>   else
>     ip route del $PREFIX
>   fi
>
> Alternatively, one could look into a routing daemon that supports static
> routing (Zebra/Quagga/FRRouting, BIRD, ...) and check if that supports
> some form of next-hop tracking or at least removes static routes with
> unreachable next-hops as one would expect from experience with dedicated
> networking devices.
>
> IMHO static route handling as done by the Linux kernel does not seem
> useful for networking devices. I have even had bad experiences with
> Arista switches and static routing because they relied too much on the
> Linux kernel (probably still do).
>
> Thanks,
> Erik
> --
> Bufferbloat just waits in hiding to get you when you try to use the network.
>                         -- Jim Gettys
>
> On Wed, Jun 20, 2018 at 04:20:11AM +0100, cronolog+lartc wrote:
>> Hi,
>>
>> I /think/ Linux continues sending ARP requests and doesn't fall back
>> to the other route because the route to the failed next hop still
>> exists in the routing table with the best (lowest) metric, so it
>> continues looking for this next hop. I get the same behaviour as you
>> when labbing this up. I could not see a straightforward option to mark
>> a route as invalid under changes in reachability. I'd also like to
>> know if this feature is built in and exists.
>>
>>
>> However in the enterprise Cisco world, we can do what you are trying
>> to do very easily using "route tracking" and "IP SLA" features.
>> Basically we define tests e.g. reachability via ping with
>> appropriate frequency and thresholds, then attach these tests to
>> one or more preferred routes. If the test fails, the associated
>> route is automatically uninstalled from the forwarding table, so any
>> existing less preferred (higher metric) routes get exposed and are
>> used instead. When the test passes again, the preferred routes are
>> reapplied.
>>
>> The underlying logic of this can certainly be scripted under Linux
>> to get very similar functionality, then put into a cron job or a
>> while loop or similar. Something along the lines of (pseudocode):
>> if [the test such as ping fails] ; then
>>     if [preferred route exists] ; then ip route delete ... ; fi
>> else    ## ping is successful
>>     if [preferred route doesn't exist] ; then ip route add ... ; fi
>> fi
>>
>>
>> Hope that helps. I'm also interested in any other solutions to do
>> this under Linux.
>>
>>
>> On 2018-06-19 13:18, Akshat Kakkar wrote:
>> >I have 2 interfaces
>> >eno1 : 192.168.1.10/24
>> >eno2 : 192.168.2.10/24
>> >
>> >I added routes as
>> >172.16.0.0/12 via 192.168.1.254 metric 1
>> >172.16.0.0/12 via 192.168.2.254 metric 2
>> >
>> >My intention : All traffic to 172.16.0.0/12 should go thru eno1. If
>> >192.168.1.254 is not reachable (no arp entry or link down), then it
>> >should fall back to eno2.
>> >
>> >But this is not working. My box keeps on looking for 192.168.1.254
>> >(i.e. sending arp requests) and never falls back.
>> >
>> >Can anyone help?
>> >
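[Editorial sketch] The ping-and-replace logic quoted above can be written as a small POSIX shell function. This is only a sketch under stated assumptions: the prefix and next hops are the ones from the original post, and the PING and IP variables are hypothetical override hooks added here so the decision logic can be exercised without touching a real routing table. It also assumes the script owns the only route for the prefix; if the two metric 1 / metric 2 routes from the post are still installed, "ip route replace" without a metric would add a further route instead of changing them.

```shell
#!/bin/sh
# Sketch of a ping-driven failover check. PREFIX and the next hops come from
# the original post; PING and IP are hypothetical hooks for testing.
PREFIX=${PREFIX:-172.16.0.0/12}
PRIMARY_NH=${PRIMARY_NH:-192.168.1.254}
SECONDARY_NH=${SECONDARY_NH:-192.168.2.254}
PING=${PING:-ping}
IP=${IP:-ip}

# Succeeds if a single echo request to $1 is answered within one second.
nh_alive() {
    "$PING" -c1 -W1 "$1" >/dev/null 2>&1
}

# Point PREFIX at the first next hop that answers; withdraw it if none does.
update_route() {
    if nh_alive "$PRIMARY_NH"; then
        "$IP" route replace "$PREFIX" via "$PRIMARY_NH"
    elif nh_alive "$SECONDARY_NH"; then
        "$IP" route replace "$PREFIX" via "$SECONDARY_NH"
    else
        # The route may already be absent, so a failed delete is not an error.
        "$IP" route del "$PREFIX" 2>/dev/null || true
    fi
}
```

Calling update_route from cron gives the one-minute failover mentioned above; a "while sleep 5; do update_route; done" loop run as root would react faster, at the cost of a permanently running process.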
* Re: Route fallback issue
From: David Ahern @ 2018-06-20 13:48 UTC
To: Akshat Kakkar, netdev; +Cc: cronolog+lartc, lartc, Erik Auerswald

On 6/20/18 2:26 AM, Akshat Kakkar wrote:
> Hi netdev community,
>
> I have 2 interfaces
> eno1 : 192.168.1.10/24
> eno2 : 192.168.2.10/24
>
> I added routes as
> 172.16.0.0/12 via 192.168.1.254 metric 1
> 172.16.0.0/12 via 192.168.2.254 metric 2
>
> My intention : All traffic to 172.16.0.0/12 should go thru eno1. If
> 192.168.1.254 is not reachable (no arp entry or link down), then it
> should fall back to eno2.

See the ignore_routes_with_linkdown and fib_multipath_use_neigh sysctl
settings.

> On Wed, Jun 20, 2018 at 1:49 PM, Erik Auerswald
> <auerswal@unix-ag.uni-kl.de> wrote:
>> Hi,
>>
>> I have usually used the "replace" keyword of iproute2 for similar
>> purposes. I would suggest a script as well, run via cron unless 1 minute
>> failover times are not acceptable. The logic could be as follows:
>>
>>   if ping -c1 $PRIMARY_NH >/dev/null 2>&1; then
>>     ip route replace $PREFIX via $PRIMARY_NH
>>   elif ping -c1 $SECONDARY_NH >/dev/null 2>&1; then
>>     ip route replace $PREFIX via $SECONDARY_NH
>>   else
>>     ip route del $PREFIX
>>   fi
>>
>> Alternatively, one could look into a routing daemon that supports static
>> routing (Zebra/Quagga/FRRouting, BIRD, ...) and check if that supports
>> some form of next-hop tracking or at least removes static routes with
>> unreachable next-hops as one would expect from experience with dedicated
>> networking devices.

A feature is in the works to have fallback nexthops.

>>
>> IMHO static route handling as done by the Linux kernel does not seem
>> useful for networking devices. I have even had bad experiences with
>> Arista switches and static routing because they relied too much on the
>> Linux kernel (probably still do).

Useful how? What did not work as expected?

Do not confuse Arista's NOS with Linux's capabilities or any NOS truly
based on Linux and using a modern kernel. A lot of work has been put
into bringing Linux up to par with NOS features. If something is not
working, demonstrate the problem on the latest kernel and inquire if
someone is working on it.
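[Editorial sketch] For readers who want to try the two sysctls named above, the following prints them as sysctl.conf-style lines. The sysctl key names are the real ones; the function name and the file name in the usage note are hypothetical choices for this sketch. Note that fib_multipath_use_neigh concerns multipath routes (one route with several nexthops), so whether it helps with two separate routes of different metrics, as in the original post, is not established by this message.

```shell
#!/bin/sh
# Print the settings mentioned above in sysctl.conf syntax.
# "all" covers existing interfaces, "default" covers ones created later.
linkdown_sysctls() {
    for scope in all default; do
        printf 'net.ipv4.conf.%s.ignore_routes_with_linkdown = 1\n' "$scope"
    done
    printf 'net.ipv4.fib_multipath_use_neigh = 1\n'
}
```

As root, something like "linkdown_sysctls > /etc/sysctl.d/90-route-fallback.conf" followed by "sysctl --system" would apply them persistently; the same keys can be set once with "sysctl -w".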
* Re: Route fallback issue
From: Grant Taylor @ 2018-06-20 15:18 UTC
To: netdev; +Cc: lartc

On 06/20/2018 07:48 AM, David Ahern wrote:
> See the ignore_routes_with_linkdown and fib_multipath_use_neigh sysctl
> settings.

Where can I find more information on ignore_routes_with_linkdown? I
don't see it listed in $Kernel/Documentation/networking/ip-sysctl.txt.
(I do see fib_multipath_use_neigh documented therein.)

> A feature is in the works to have fallback nexthops.

O.o?

--
Grant. . . .
unix || die
* Re: Route fallback issue
From: Grant Taylor @ 2018-06-20 15:38 UTC
To: netdev; +Cc: lartc

On 06/20/2018 09:18 AM, Grant Taylor wrote:
> Where can I find more information on ignore_routes_with_linkdown? I
> don't see it listed in $Kernel/Documentation/networking/ip-sysctl.txt.
> (I do see fib_multipath_use_neigh documented there in.)

I'm specifically interested in whether ignore_routes_with_linkdown and / or
fib_multipath_use_neigh will cause Linux to fall back to an alternate
(higher metric) route if the link is still up but the neighbor is not
accessible across it.

          +-------+
          | Linux |
          +---+---+
              |
+-----+   +---+----+   +-----+
| R 1 +---+ Switch +---+ R 2 |
+-----+   +--------+   +-----+

A typical scenario is where Linux is connected to a DSL or Cable modem
where the physical link stays up even if the neighbor R {1,2} goes
offline. It's not possible to rely on the local link (MII) status to
determine that a neighbor is not reachable.

I.e. R 1 going away like below.

          +-------+
          | Linux |
          +---+---+
              |
+-----+   +---+----+   +-----+
| R 1 X   X Switch +---+ R 2 |
+-----+   +--------+   +-----+

--
Grant. . . .
unix || die
* Re: Route fallback issue
From: Julian Anastasov @ 2018-06-20 19:00 UTC
To: Akshat Kakkar; +Cc: netdev, cronolog+lartc, lartc, Erik Auerswald

	Hello,

On Wed, 20 Jun 2018, Akshat Kakkar wrote:

> Hi netdev community,
>
> I have 2 interfaces
> eno1 : 192.168.1.10/24
> eno2 : 192.168.2.10/24
>
> I added routes as
> 172.16.0.0/12 via 192.168.1.254 metric 1
> 172.16.0.0/12 via 192.168.2.254 metric 2
>
> My intention : All traffic to 172.16.0.0/12 should go thru eno1. If
> 192.168.1.254 is not reachable (no arp entry or link down), then it
> should fall back to eno2.

	You can also try alternative routes. But as the kernel
supports only default alternative routes, you can put them in
their own table:

	# Alternative routes use same metric!!!
	ip route append default via 192.168.1.254 dev eno1 table 100
	ip route append default via 192.168.2.254 dev eno2 table 100
	ip rule add prio 100 to 172.16.0.0/12 table 100

	Of course, you will get better results if a user space
tool puts only alive routes in service after doing health checks
of all near gateways.

Regards
* Re: Route fallback issue
From: Grant Taylor @ 2018-06-21  5:13 UTC
To: Julian Anastasov, Akshat Kakkar
Cc: netdev, cronolog+lartc, lartc, Erik Auerswald

On 06/20/2018 01:00 PM, Julian Anastasov wrote:
> You can also try alternative routes.

"Alternative routes"? I can't say as I've heard that description as a
specific technique / feature / capability before.

Is that its official name?

Where can I find out more about it?

> But as the kernel supports only default alternative routes, you can put
> them in their own table:

I don't know that that is the case any more.

I was able to issue the following commands without a problem:

# ip route append 192.0.2.128/26 via 192.0.2.62
# ip route append 192.0.2.128/26 via 192.0.2.126

I created two network namespaces and had a pair of vEths between them
(192.0.2.0/26 and 192.0.2.64/26). I added a dummy network to each NetNS
(192.0.2.128/26 and 192.0.2.192/26).

I ran the following commands while a persistent ping was running from
one NetNS to the IP on the other's dummy0 interface:

# ip link set ns2b up && ip route append 192.0.2.192/26 via 192.0.2.126 && ip link set ns2a down
(pause and watch things)
# ip link set ns2a up && ip route append 192.0.2.192/26 via 192.0.2.62 && ip link set ns2b down
(pause and watch things)

I could iterate between the two above commands and pings continued to work.

So, I think that it's now possible to use "alternate routes" (new to me)
on specific prefixes in addition to the default. Thus there is no
longer any need for a separate table and the associated IP rule.

I'm running kernel version 4.9.76.

I did go ahead and set net.ipv4.conf.ns2b.ignore_routes_with_linkdown to 1.

for i in /proc/sys/net/ipv4/conf/*/ignore_routes_with_linkdown; do echo 1 > "$i"; done

Doing that reduced the number of dropped pings from 60 ~ 90 (1 / second)
to 0 ~ 5 (1 / second). (Rarely, maybe 1 out of 20 flips, it would take
upwards of 10 pings / seconds.)

> # Alternative routes use same metric!!!
> ip route append default via 192.168.1.254 dev eno1 table 100
> ip route append default via 192.168.2.254 dev eno2 table 100
> ip rule add prio 100 to 172.16.0.0/12 table 100

I did have to "append" the route. I couldn't just "add" the route.
When I tried to "add" the second route, I got an error about the route
already existing. Using "append" instead of "add" with everything else
the same worked just fine.

Note: I did go ahead and remove the single route that was added via
"add" and used "append" for both.

> Of course, you will get better results if an user space tool puts only
> alive routes in service after doing health checks of all near gateways.

I've got to say, with as well as this is working, I don't feel any need
for a user space monitoring daemon. I agree that I've felt the need for
such in the past before I learned about "alternative routes".

I still want to learn more about "alternative routes".

Here's a diagram of the test network if someone wants to try to
reproduce my findings:

+-------------+                +-------------+
|     NS1     |                |     NS2     |
|        ns2a +-----vEth-A-----+ ns1a        |
|             |                |             |
+ dummy0      |                |      dummy0 +
|             |                |             |
|        ns2b +-----vEth-B-----+ ns1b        |
|             |                |             |
+-------------+                +-------------+

(vEths get the name of the NS that they face.)

NS1:ns2a    192.0.2.1   /26
NS1:ns2b    192.0.2.65  /26
NS1:dummy0  192.0.2.129 /26
NS2:ns1a    192.0.2.62  /26
NS2:ns1b    192.0.2.126 /26
NS2:dummy0  192.0.2.254 /26

--
Grant. . . .
unix || die
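[Editorial sketch] The test network described above could be rebuilt roughly as follows. Namespace names, interface names and addresses are taken from the message; the run() wrapper, the DRY_RUN switch and the helper names are additions of this sketch, and the return routes on NS2 (toward 192.0.2.128/26) are an assumption, since the message only spells out the routes appended on one side. With DRY_RUN=1, the default, the commands are only printed.

```shell
#!/bin/sh
# Sketch of the two-namespace lab. With DRY_RUN=1 commands are printed,
# with DRY_RUN=0 they are executed (needs root and iproute2 with "ip -n").
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# addr_up NS DEV CIDR: assign an address inside NS and bring the device up.
addr_up() {
    run ip -n "$1" addr add "$3" dev "$2"
    run ip -n "$1" link set "$2" up
}

build_lab() {
    run ip netns add ns1
    run ip netns add ns2

    # Two veth pairs; each end is named after the namespace it faces.
    run ip link add ns2a netns ns1 type veth peer name ns1a netns ns2
    run ip link add ns2b netns ns1 type veth peer name ns1b netns ns2
    run ip -n ns1 link add dummy0 type dummy
    run ip -n ns2 link add dummy0 type dummy

    addr_up ns1 ns2a   192.0.2.1/26
    addr_up ns1 ns2b   192.0.2.65/26
    addr_up ns1 dummy0 192.0.2.129/26
    addr_up ns2 ns1a   192.0.2.62/26
    addr_up ns2 ns1b   192.0.2.126/26
    addr_up ns2 dummy0 192.0.2.254/26

    # Same-metric routes to the far dummy network, one per veth link.
    run ip -n ns1 route append 192.0.2.192/26 via 192.0.2.62
    run ip -n ns1 route append 192.0.2.192/26 via 192.0.2.126
    # Assumed return routes on the other side.
    run ip -n ns2 route append 192.0.2.128/26 via 192.0.2.1
    run ip -n ns2 route append 192.0.2.128/26 via 192.0.2.65
}
```

Running build_lab as-is prints the command list for review; setting DRY_RUN=0 first (as root) executes it, after which a ping started with "ip netns exec ns1 ping 192.0.2.254" can be watched while links are flipped as described above.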
* Re: Route fallback issue
From: Julian Anastasov @ 2018-06-21 19:57 UTC
To: Grant Taylor; +Cc: Akshat Kakkar, netdev, cronolog+lartc, lartc, Erik Auerswald

	Hello,

On Wed, 20 Jun 2018, Grant Taylor wrote:

> On 06/20/2018 01:00 PM, Julian Anastasov wrote:
> > You can also try alternative routes.
>
> "Alternative routes"? I can't say as I've heard that description as a
> specific technique / feature / capability before.
>
> Is that its official name?

	I think so

> Where can I find out more about it?

	You can search on the net. I have some old docs on
these issues, they should still be current:

http://ja.ssi.bg/dgd-usage.txt

> > But as the kernel supports only default alternative routes, you can put them
> > in their own table:
>
> I don't know that that is the case any more.
>
> I was able to issue the following commands without a problem:
>
> # ip route append 192.0.2.128/26 via 192.0.2.62
> # ip route append 192.0.2.128/26 via 192.0.2.126
>
> I created two network namespaces and had a pair of vEths between them
> (192.0.2.0/26 and 192.0.2.64/26). I added a dummy network to each NetNS
> (192.0.2.128/26 and 192.0.2.192/26).
>
> I ran the following commands while a persistent ping was running from one
> NetNS to the IP on the other's dummy0 interface:
>
> # ip link set ns2b up && ip route append 192.0.2.192/26 via 192.0.2.126 && ip link set ns2a down
> (pause and watch things)
> # ip link set ns2a up && ip route append 192.0.2.192/26 via 192.0.2.62 && ip link set ns2b down
> (pause and watch things)
>
> I could iterate between the two above commands and pings continued to work.
>
> So, I think that it's now possible to use "alternate routes" (new to me) on
> specific prefixes in addition to the default. Thus there is no longer any
> need for a separate table and the associated IP rule.

	Not true. net/ipv4/fib_semantics.c:fib_select_path()
calls fib_select_default() only when prefixlen = 0 (default route).
Otherwise, only the first route will be considered.

	fib_select_default() is the function that decides which
nexthop is reachable and whether to contact it. It uses the ARP
state via fib_detect_death(). That is all the code behind this
feature called "alternative routes": the kernel selects one
based on the nexthop's ARP state.

	Routes with a different metric are considered only when
the routes with lower metric are removed.

> I'm running kernel version 4.9.76.
>
> I did go ahead and set net.ipv4.conf.ns2b.ignore_routes_with_linkdown to 1.
>
> for i in /proc/sys/net/ipv4/conf/*/ignore_routes_with_linkdown; do echo 1 > $i; done

	IIRC, this flag invalidates nexthops depending on the
link state. If your link is always UP it does not help much.
If you rely on a user space tool, you can check the state of
the desired hops: device link state, your gateway to the ISP, one or
more gateways in the ISP network which you consider a permanent
part of the path via this ISP.

> Doing that dropped the number of dropped pings from 60 ~ 90 (1 / second) to 0
> ~ 5 (1 / second). (Rarely, maybe 1 out of 20 flips, would it take upwards of
> 10 pings / seconds.)
>
> > # Alternative routes use same metric!!!
> > ip route append default via 192.168.1.254 dev eno1 table 100
> > ip route append default via 192.168.2.254 dev eno2 table 100
> > ip rule add prio 100 to 172.16.0.0/12 table 100
>
> I did have to "append" the route. I couldn't just "add" the route. When I
> tried to "add" the second route, I got an error about the route already
> existing. Using "append" instead of "add" with everything else the same
> worked just fine.
>
> Note: I did go ahead and remove the single route that was added via "add" and
> used "append" for both.

	The first route can be created with "add", but all further
alternative routes can be added only with "append". If you
successfully add them with "add" it means they are not alternatives
to the first one; they are not considered at all.

Regards
* Re: Route fallback issue
From: Grant Taylor @ 2018-06-21 21:08 UTC
To: Julian Anastasov
Cc: Akshat Kakkar, netdev, cronolog+lartc, lartc, Erik Auerswald

On 06/21/2018 01:57 PM, Julian Anastasov wrote:
> Hello,

Hi.

> I think so

Okay. I'll do some more digging.

> You can search on net. I have some old docs on these issues, they should
> be actual:
>
> http://ja.ssi.bg/dgd-usage.txt

"DGD" or "Dead Gateway Detection" sounds very familiar. I referenced it
in an earlier reply.

I distinctly remember DGD not behaving satisfactorily years ago, where
"unsatisfactorily" meant something like 90 seconds (or more) to recover.
That actually matches what I was getting without the
ignore_routes_with_linkdown=1 setting that David A. mentioned.

With ignore_routes_with_linkdown=1 things behaved much better.

> Not true. net/ipv4/fib_semantics.c:fib_select_path() calls
> fib_select_default() only when prefixlen = 0 (default route).

Okay.... My testing last night disagrees with you. Specifically, I was
able to add alternate routes to the same prefix, 192.0.2.128/26.
There was not any default gateway configured on any of the NetNSs. So
everything was using routes for locally attached networks or the two
routes added via "ip route append".

What am I misinterpreting? Or where are we otherwise talking past each
other?

> Otherwise, only the first route will be considered.

"only the first route" almost sounds like something akin to Equal Cost
Multi Path.

I was not expecting "alternative routes" to use more than one route at a
time, equally or otherwise. I was wanting the kernel to fall back to
an alternate route / gateway / path in the event that the one that was
being used became unusable / unreachable.

So what should "Alternative Routes" do? How does this compare /
contrast to E.C.M.P. or D.G.D.?

> fib_select_default() is the function that decides which nexthop
> is reachable and whether to contact it. It uses the ARP state via
> fib_detect_death(). That is all code that is behind this feature called
> "alternative routes": the kernel selects one based on nexthop's ARP
> state.

Please confirm that you aren't entering / referring to E.C.M.P.
territory when you say "nexthop". I think that you are not, but I want
to ask and be sure, particularly seeing as how things are very closely
related.

It sounds like you're referring to literally the router that is the next
hop in the path. I.e. the device on the other end of the wire.

I'll have to find, read, and try to grok the code to have a better idea.
That being said, it looks like (based on the name)
fib_select_default() deals with the default route.

The testing I did last night, and its positive results, indicate that the
kernel did what I wanted it to do. (See above about D.G.D. vs E.C.M.P.)
So, it seems as if something about alternative routes worked using
non-default routes.

I have no way of knowing if it was the code that we're talking about, or
something else that produced the results. The test I did (specific,
non-default prefixes, routes being appended with no other routes)
worked the way that I would have expected a feature that uses
alternative routes (or historically D.G.D.) to work.

The following ping works just fine as I bounce interfaces on NS1.

ns2# ping -I 192.0.2.254 192.0.2.129

I can confirm that traffic is moving back and forth between the vEth
links between the NetNSs. Granted, the traffic sticks to one vEth
interface until it goes away.

I can shut down ns2a on NS1 so that ns1a sees loss of link but
stays up on NS2, and traffic moves to vEth-B.

I can then open up ns2a on NS1 so that ns1a sees link on NS2, and
re-append the route on NS1.

I can then shut down ns2b on NS1 so that ns1b sees loss of link but
stays up on NS2, and traffic moves to vEth-A.

I can then open up ns2b on NS1 so that ns1b sees link on NS2, and
re-append the route on NS1.

NS2 behaves exactly as I would hope. Traffic will move from the down
interface to the remaining up interface. Back and forth, no problem.

I don't know where the disconnect is, but I feel like there is one.

> Routes with different metric are considered only when the routes with
> lower metric are removed.

I agree with the statement. What I question is where metric came into
play here. All of the routes had the same (default) metric. None of
the routes I tested had different metrics.

ns1# ip route show
192.0.2.0/26 dev ns2a proto kernel scope link src 192.0.2.1
192.0.2.64/26 dev ns2b proto kernel scope link src 192.0.2.65
192.0.2.128/26 dev dummy0 proto kernel scope link src 192.0.2.129
192.0.2.192/26 via 192.0.2.62 dev ns2a
192.0.2.192/26 via 192.0.2.126 dev ns2b

ns2# ip route show
192.0.2.0/26 dev ns1a proto kernel scope link src 192.0.2.62
192.0.2.64/26 dev ns1b proto kernel scope link src 192.0.2.126
192.0.2.128/26 via 192.0.2.65 dev ns1b
192.0.2.128/26 via 192.0.2.1 dev ns1a
192.0.2.192/26 dev dummy0 proto kernel scope link src 192.0.2.254

> IIRC, this flag invalidates nexthops depending on the link state. If
> your link is always UP it does not help much.

That's what I gathered. So things like DSL & cable modems or other L2
bridging devices might not drop the link when their circuit drops.

This is also why I asked the follow up questions to David's email.

I want to do some testing to see if fib_multipath_use_neigh alters this
behavior at all. I'm hoping that it will invalidate an alternate route
if the MAC is not resolvable even if the physical link stays up.

Sure, the ARP cache may have a 30 ~ 120 second timeout before triggering
this behavior. But having that timeout and starting to use an
alternative route is considerably better than not using an alternative
route.

> If you rely on user space tool, you can check the state of the desired
> hops: device link state, your gateway to ISP, one or more gateways in the
> ISP network which you consider permanent part of the path via this ISP.

This is what I have thought about doing previously.

> First route can be created with 'add' but all next alternative routes
> can be added only with "append". If you successfully add them with
> "add" it means they are not alternatives to the first one, they are not
> considered at all.

ACK

--
Grant. . . .
unix || die
* Re: Route fallback issue
From: Julian Anastasov @ 2018-06-25 18:50 UTC
To: Grant Taylor; +Cc: Akshat Kakkar, netdev, cronolog+lartc, lartc, Erik Auerswald

	Hello,

On Thu, 21 Jun 2018, Grant Taylor wrote:

> On 06/21/2018 01:57 PM, Julian Anastasov wrote:
> > Hello,
>
> > http://ja.ssi.bg/dgd-usage.txt
>
> "DGD" or "Dead Gateway Detection" sounds very familiar. I referenced it in an
> earlier reply.
>
> I distinctly remember DGD not behaving satisfactorily years ago. Where
> unsatisfactorily was something like 90 seconds (or more) to recover. Which
> actually matches what I was getting without the ignore_routes_with_linkdown=1
> setting that David A. mentioned.

	Yes, ARP state for unreachable GWs may be updated slowly;
there is timely feedback only for the reachable state.

> With ignore_routes_with_linkdown=1 things behaved much better.
>
> > Not true. net/ipv4/fib_semantics.c:fib_select_path() calls
> > fib_select_default() only when prefixlen = 0 (default route).
>
> Okay.... My testing last night disagrees with you. Specifically, I was able
> to add a alternate routes to the same prefix, 192.0.2.128/26. There was not
> any default gateway configured on any of the NetNSs. So everything was using
> routes for locally attached or the two added via "ip route append".
>
> What am I misinterpreting? Or where are we otherwise talking past each other?

	You can create the two routes, of course. But only the
default routes are alternative.

> > Otherwise, only the first route will be considered.
>
> "only the first route" almost sounds like something akin to Equal Cost Multi
> Path.
>
> I was not expecting "alternative routes" to use more than one route at a time,
> equally or otherwise. I was wanting for the kernel to fall back to an
> alternate route / gateway / path in the event that the one that was being used
> became unusable / unreachable.
>
> So what should "Alternative Routes" do? How does this compare / contrast to
> E.C.M.P. or D.G.D.

	The alternative routes work in this way:

- on lookup, routes are walked in order, as listed in the table

- as long as a route contains a reachable gateway (ARP state), only
this route is used

- if some gateway becomes unreachable (ARP state), the next
alternative routes are tried

- if the ARP entry is expired (missing), this gateway can be probed
if the route is before the currently used route. This is what
happens initially when no ARP state is present for the GWs.
It is bad luck if the probed GW is actually unreachable.

- active probing by user space (ping GWs) can only help
to keep the ARP state present for the used gateways.
This way, if the ARP entry for a GW is missing, the kernel
will not risk selecting an unavailable route with the goal
of probing the GW.

> > fib_select_default() is the function that decides which nexthop is reachable
> > and whether to contact it. It uses the ARP state via fib_detect_death().
> > That is all code that is behind this feature called "alternative routes":
> > the kernel selects one based on nexthop's ARP state.
>
> Please confirm that you aren't entering / referring to E.C.M.P. territory when
> you say "nexthop". I think that you are not, but I want to ask and be sure,
> particularly seeing as how things are very closely related.

	nexthop is the GW in the route

> It sounds like you're referring to literally the router that is the next hop
> in the path. I.e. the device on the other end of the wire.

	Yes, the kernel avoids alternative routes with
unreachable GWs

> I want to do some testing to see if fib_multipath_use_neigh alters this
> behavior at all. I'm hoping that it will invalidate an alternate route if the
> MAC is not resolvable even if the physical link stays up.

	The multipath route uses all its alive nexthops at the same
time... But in the same way you may need active probing by
user space, otherwise an unavailable GW can be selected.

> Sure, the ARP cache may have a 30 ~ 120 second timeout before triggering this
> behavior. But having that timeout and starting to use an alternative route is
> considerably better than not using an alternative route.

	Yes, if you prefer, you may run PING every second to
avoid such delays...

Regards
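[Editorial sketch] Since the behaviour described above hinges on the neighbour (ARP) state of each gateway, a user space helper might inspect that state from "ip neigh show" output. The sketch below is an illustration only: the function names are hypothetical, it relies on the state keyword being the last field of each "ip neigh show" line, and treating FAILED, INCOMPLETE or a missing entry as "not usable" is a simplification of this sketch, not the kernel's exact fib_detect_death() rule.

```shell
#!/bin/sh
# neigh_state GW: read `ip neigh show` output on stdin and print the
# neighbour state recorded for gateway GW, or NONE if there is no entry.
neigh_state() {
    awk -v gw="$1" '
        $1 == gw { state = $NF }
        END      { print (state ? state : "NONE") }
    '
}

# gw_usable GW: same input; succeeds unless the entry is missing or in a
# state that means resolution has not succeeded (simplified rule).
gw_usable() {
    case "$(neigh_state "$1")" in
        NONE|FAILED|INCOMPLETE) return 1 ;;
        *) return 0 ;;
    esac
}
```

A typical call would be "ip neigh show | gw_usable 192.168.1.254". Combined with a once-per-second ping of each gateway, as suggested above, this keeps the ARP state fresh and makes it visible to a monitoring script.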
* Re: Route fallback issue
From: Grant Taylor @ 2018-06-25 20:07 UTC
To: Julian Anastasov
Cc: Akshat Kakkar, netdev, cronolog+lartc, lartc, Erik Auerswald

On 06/25/2018 12:50 PM, Julian Anastasov wrote:
> Hello,

Hi Julian,

> Yes, ARP state for unreachable GWs may be updated slowly, there is
> in-time feedback only for reachable state.

Fair.

Most of the installations where I needed D.G.D. to work would be okay
with a < 5 minute timeout. Obviously they would like faster, but
automation is a LOT better than waiting on manual intervention.

IMHO < 30 seconds is great. < 90 seconds is acceptable. < 300 seconds
leaves some room for improvement.

> You can create the two routes, of course. But only the default routes
> are alternative.

Are you saying that the functionality I'm describing only works for
default gateways or that the term "alternative route" only applies to
default gateways?

The testing that I did indicated that alternative routes worked for
specific prefixes too.

I tested multiple NetNSs with only directly attached routes and appended
routes to a destination prefix, no default gateway / route of last resort.

The behavior seemed to be different when ignore_routes_with_linkdown was
set versus unset. Specifically, ignore_routes_with_linkdown seemed to
help considerably.

Hence why I question the requirement for the "default" route versus a
route to a specific prefix.

Can you explain why I saw the behavior difference with
ignore_routes_with_linkdown if it only applies to the default route?

> The alternative routes work in this way:
>
> - on lookup, routes are walked in order - as listed in table
>
> - as long as route contains reachable gateway (ARP state), only this
> route is used
>
> - if some gateway becomes unreachable (ARP state), next alternative
> routes are tried
>
> - if ARP entry is expired (missing), this gateway can be probed if the
> route is before the currently used route. This is what happens initially
> when no ARP state is present for the GWs. It is bad luck if the probed
> GW is actually unreachable.
>
> - active probing by user space (ping GWs) can only help to keep the ARP
> state present for the used gateways. By this way, if ARP entry for GW
> is missing, the kernel will not risk to select unavailable route with
> the goal to probe the GW.

This all makes sense.

Please confirm if "gateway" in this context is the "/default/ gateway"
or not. I ask because arguably "gateway" can be used as a term to
describe the next hop for a route, or gateway, to a prefix. Further,
the "/default/ (gateway,router)" is the gateway or route of last resort.
Which to me means that "gateway" can be any route in this context.

> nexthop is the GW in the route

Thank you for confirming.

> Yes, the kernel avoids alternative routes with unreachable GWs

Fair enough.

> The multipath route uses all its alive nexthops at the same time... But
> you may need in the same way active probing by user space, otherwise
> unavailable GW can be selected.

I assume that the dead ECMP NEXTHOP is also subject to similar timeouts
as alternative routes. Correct?

> Yes, if you prefer, you may run PING every second to avoid such delays...

Agreed. I'm trying to make sure I understand basic functionality before
I do things to modify it.

--
Grant. . . .
unix || die
* Re: Route fallback issue
  2018-06-21 19:57 ` Julian Anastasov
  2018-06-21 21:08   ` Grant Taylor
@ 2018-06-24 13:45   ` Erik Auerswald
  2018-06-25 19:02     ` Julian Anastasov
From: Erik Auerswald @ 2018-06-24 13:45 UTC
To: Julian Anastasov
Cc: Grant Taylor, Akshat Kakkar, netdev, cronolog+lartc, lartc

Hello Julian,

On Thu, Jun 21, 2018 at 10:57:14PM +0300, Julian Anastasov wrote:
> On Wed, 20 Jun 2018, Grant Taylor wrote:
> > On 06/20/2018 01:00 PM, Julian Anastasov wrote:
> > > You can also try alternative routes.
> >
> > "Alternative routes"? I can't say as I've heard that description as a
> > specific technique / feature / capability before.
> >
> > Is that it's official name?
>
> I think so
>
> > Where can I find out more about it?
>
> You can search on net. I have some old docs on
> these issues, they should be actual:
>
> http://ja.ssi.bg/dgd-usage.txt

Thanks for that info!

Can you tell us which parts of the above text are actually implemented
in the upstream Linux kernel, and starting with which version(s)
(approximately)? The text describes ideas and patches from nearly two
decades ago; is more recent documentation available somewhere?

Thanks,
Erik
-- 
In the beginning, there was static routing.
                        -- RFC 1118
* Re: Route fallback issue
  2018-06-24 13:45 ` Erik Auerswald
@ 2018-06-25 19:02   ` Julian Anastasov
From: Julian Anastasov @ 2018-06-25 19:02 UTC
To: Erik Auerswald
Cc: Grant Taylor, Akshat Kakkar, netdev, cronolog+lartc, lartc

Hello,

On Sun, 24 Jun 2018, Erik Auerswald wrote:

> Hello Julian,
>
> > http://ja.ssi.bg/dgd-usage.txt
>
> Thanks for that info!
>
> Can you tell us which parts of the above text are actually implemented
> in the upstream Linux kernel, and starting with which version(s)
> (approximately)? The text describes ideas and patches from nearly two
> decades ago; is more recent documentation available somewhere?

Nothing is included in the kernel. The idea is that user space has
more control. It is best done with CONNMARK: stick each NATed
connection to some path (via a live ISP), and use the route lookup
only to select a live path for the first packet of the connection.
So what we balance are connections, not packets (balancing packets
does not work with different ISPs). Probe the GWs to keep only live
routes in the table.

Regards
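
Julian's closing suggestion, probing the gateways so that only live routes
stay in the table, might be sketched in POSIX sh as follows. This is one
possible reading, not a script from the thread: the function name
sync_route and the PROBE and IP variables are hypothetical, the prefix,
gateways and metrics are the ones from the original post, and a single
ping with a one-second timeout is assumed to be an adequate liveness
test. Changing routes needs root.

```shell
#!/bin/sh
# Hypothetical knobs: override PROBE and IP to dry-run the logic,
# e.g. IP=echo prints the ip(8) arguments instead of changing routes.
: "${PROBE:=ping -c1 -W1}"
: "${IP:=ip}"

# sync_route PREFIX GW METRIC
# Install the route while the gateway answers, remove it otherwise.
sync_route() {
    prefix=$1 gw=$2 metric=$3
    if $PROBE "$gw" >/dev/null 2>&1; then
        # Gateway answers: make sure its route is present.
        $IP route replace "$prefix" via "$gw" metric "$metric"
    else
        # Gateway is silent: drop its route so the next metric is used.
        # The error when the route is already gone is ignored.
        $IP route del "$prefix" via "$gw" metric "$metric" 2>/dev/null || true
    fi
}

# Example, to be run periodically (cron or a sleep loop):
# sync_route 172.16.0.0/12 192.168.1.254 1
# sync_route 172.16.0.0/12 192.168.2.254 2
```

Because each gateway is handled independently, the route with metric 1
comes back on its own once 192.168.1.254 answers again, and the route
with metric 2 only carries traffic while the preferred one is absent.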
Thread overview: 12+ messages
[not found] <CAA5aLPhszyo1HBK8gOYZC35fuy_LW8Xw1CT_nob+CuZ0wf33zw@mail.gmail.com>
[not found] ` <10767cab-7a21-faaf-e49e-4752c28d28b7@googlemail.com>
[not found] ` <20180620081916.GA30608@unix-ag.uni-kl.de>
2018-06-20 8:26 ` Route fallback issue Akshat Kakkar
2018-06-20 13:48 ` David Ahern
2018-06-20 15:18 ` Grant Taylor
2018-06-20 15:38 ` Grant Taylor
2018-06-20 19:00 ` Julian Anastasov
2018-06-21 5:13 ` Grant Taylor
2018-06-21 19:57 ` Julian Anastasov
2018-06-21 21:08 ` Grant Taylor
2018-06-25 18:50 ` Julian Anastasov
2018-06-25 20:07 ` Grant Taylor
2018-06-24 13:45 ` Erik Auerswald
2018-06-25 19:02 ` Julian Anastasov