linux-api.vger.kernel.org archive mirror
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
To: Matt Muggeridge <Matt.Muggeridge@hpe.com>
Cc: David Ahern <dsahern@kernel.org>,
	"David S . Miller" <davem@davemloft.net>,
	linux-api@vger.kernel.org, stable@vger.kernel.org,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net 1/1] net/ipv6: Netlink flag for new IPv6 Default Routes
Date: Tue, 5 Nov 2024 15:37:32 +0100
Message-ID: <0a8d6565-fdc0-452f-b132-5d237a1b7dec@6wind.com>
In-Reply-To: <20241105031841.10730-2-Matt.Muggeridge@hpe.com>

On 05/11/2024 at 04:18, Matt Muggeridge wrote:
> Add a Netlink rtm_flag, RTM_F_RA_ROUTER for the RTM_NEWROUTE message.
> This allows an IPv6 Netlink client to indicate the default route came
> from an RA. This results in the kernel creating individual default
> routes, rather than coalescing multiple default routes into a single
> ECMP route.
> 
> Details:
> 
> For IPv6, a Netlink client is unable to create default routes in the
> same manner as the kernel. This leads to failures when there are
> multiple default routers, as they were being coalesced into a single
> ECMP route. When one of the ECMP default routers becomes UNREACHABLE, it
> was still being selected as the nexthop.
> 
> Meanwhile, when the kernel processes RAs from multiple default routers,
> it sets the fib6_flags: RTF_ADDRCONF | RTF_DEFAULT. The RTF_ADDRCONF
> flag is checked by rt6_qualify_for_ecmp(), which returns false when
> ADDRCONF is set. As such, the kernel creates separate default routes.
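
The check described above can be modelled in user space. This is only a minimal sketch, not the kernel's code: the struct, the MODEL_ constants and the helper name are hypothetical, the constant values are assumed to match include/uapi/linux/ipv6_route.h, and the real rt6_qualify_for_ecmp() looks at more state than is shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match RTF_DEFAULT / RTF_ADDRCONF in linux/ipv6_route.h;
 * redefined locally so the sketch does not depend on kernel headers. */
#define MODEL_RTF_DEFAULT  0x00010000u
#define MODEL_RTF_ADDRCONF 0x00040000u

/* Hypothetical, heavily reduced stand-in for a fib6 entry. */
struct model_route {
	uint32_t flags;       /* models fib6_flags */
	bool     has_gateway; /* route has a gateway nexthop */
};

/* A route learned from an RA (ADDRCONF set) never qualifies for ECMP,
 * so two such default routes stay separate instead of being merged. */
static bool model_qualifies_for_ecmp(const struct model_route *rt)
{
	return !(rt->flags & MODEL_RTF_ADDRCONF) && rt->has_gateway;
}
```

With this model, a route carrying MODEL_RTF_ADDRCONF | MODEL_RTF_DEFAULT is rejected for ECMP, while the same route installed without those flags (the Netlink client case) is accepted and can be coalesced.
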
> 
> E.g. compare the routing tables when RAs are processed by the kernel
> versus a Netlink client (systemd-networkd, in my case).
> 
> 1) RA Processed by kernel (accept_ra = 2)
> $ ip -6 route
> 2001:2:0:1000::/64 dev enp0s9 proto kernel metric 256 expires ...
> fe80::/64 dev enp0s9 proto kernel metric 256 pref medium
> default via fe80::200:10ff:fe10:1060 dev enp0s9 proto ra ...
> default via fe80::200:10ff:fe10:1061 dev enp0s9 proto ra ...
> 
> 2) RA Processed by Netlink client (accept_ra = 0)
> $ ip -6 route
> 2001:2:0:1000::/64 dev enp0s9 proto ra metric 1024 expires ...
> fe80::/64 dev enp0s3 proto kernel metric 256 pref medium
> fe80::/64 dev enp0s9 proto kernel metric 256 pref medium
> default proto ra metric 1024 expires 595sec pref medium
> 	nexthop via fe80::200:10ff:fe10:1060 dev enp0s9 weight 1
> 	nexthop via fe80::200:10ff:fe10:1061 dev enp0s9 weight 1
> 
> IPv6 Netlink clients need a mechanism to identify a route as coming from
> an RA. i.e. a Netlink client needs a method to set the kernel flags:
> 
>     RTF_ADDRCONF | RTF_DEFAULT
> 
> This is needed when there are multiple default routers that each send
> an RA. Setting the RTF_ADDRCONF flag ensures their fib entries do not
> qualify for ECMP routes, see rt6_qualify_for_ecmp().
> 
> To achieve this, introduce a user-level flag RTM_F_RA_ROUTER that a
> Netlink client can pass to the kernel.
> 
> A Netlink user-level network manager, such as systemd-networkd, may set
> the RTM_F_RA_ROUTER flag in the Netlink RTM_NEWROUTE rtmsg. When set,
> the kernel sets RTF_RA_ROUTER in the fib6_config fc_flags. This causes a
> default route to be created in the same way as if the kernel processed
> the RA, via rt6_add_dflt_router().
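
For illustration, a client might fill in the request header roughly as follows. This is a sketch under assumptions: RTM_F_RA_ROUTER exists only in this proposed patch (hence the #ifndef fallback using the value from the diff), struct route_req and init_ra_default_route() are hypothetical names, and the RTA_GATEWAY / RTA_OIF attributes and the actual sendmsg() call are left out.

```c
#include <string.h>
#include <sys/socket.h>       /* AF_INET6 */
#include <linux/netlink.h>    /* struct nlmsghdr, NLMSG_LENGTH, NLM_F_* */
#include <linux/rtnetlink.h>  /* struct rtmsg, RTM_NEWROUTE, RTPROT_RA */

/* Proposed by the patch under review; not present in mainline headers. */
#ifndef RTM_F_RA_ROUTER
#define RTM_F_RA_ROUTER 0x10000
#endif

/* Hypothetical request layout: Netlink header followed by the rtmsg. */
struct route_req {
	struct nlmsghdr nlh;
	struct rtmsg    rtm;
};

/* Hypothetical helper: prepare an RTM_NEWROUTE for ::/0 marked as
 * "default route learned from an RA". Attributes are not appended. */
static void init_ra_default_route(struct route_req *req)
{
	memset(req, 0, sizeof(*req));

	req->nlh.nlmsg_len   = NLMSG_LENGTH(sizeof(struct rtmsg));
	req->nlh.nlmsg_type  = RTM_NEWROUTE;
	req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_CREATE | NLM_F_ACK;

	req->rtm.rtm_family   = AF_INET6;
	req->rtm.rtm_dst_len  = 0;               /* default route, ::/0 */
	req->rtm.rtm_table    = RT_TABLE_MAIN;
	req->rtm.rtm_protocol = RTPROT_RA;
	req->rtm.rtm_scope    = RT_SCOPE_UNIVERSE;
	req->rtm.rtm_type     = RTN_UNICAST;
	req->rtm.rtm_flags    = RTM_F_RA_ROUTER; /* the new flag */
}
```

A kernel without the patch does not know this flag, so a real client would also need a way to handle that case.
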
> 
> This is needed by user-level network managers, like systemd-networkd,
> that prefer to do the RA processing themselves, i.e. they disable the
> kernel's RA processing by setting net.ipv6.conf.<intf>.accept_ra=0.
> 
> Without this flag, when there are multiple default routers, the kernel
> coalesces multiple default routes into an ECMP route. The ECMP route
> ignores per-route REACHABILITY information. If one of the default
> routers is unresponsive, with a Neighbor Cache entry of INCOMPLETE, then
> it can still be selected as the nexthop for outgoing packets. This
> results in an inability to communicate with remote hosts, even though
> one of the default routers remains REACHABLE. This violates RFC4861
> section 6.3.6, bullet 1.
> 
> Extract from RFC4861 6.3.6 bullet 1:
>      1) Routers that are reachable or probably reachable (i.e., in any
>         state other than INCOMPLETE) SHOULD be preferred over routers
>         whose reachability is unknown or suspect (i.e., in the
>         INCOMPLETE state, or for which no Neighbor Cache entry exists).
> 
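
The preference rule quoted above could be sketched like this. It is a simplified model, not kernel code: the enum, struct and function names are hypothetical, and the fallback simply returns the first router, whereas RFC 4861 asks for round-robin selection when no router is known to be reachable.

```c
#include <stddef.h>

/* Hypothetical model of Neighbor Cache states; M_NONE means that no
 * Neighbor Cache entry exists for the router. */
enum model_nud {
	M_NONE,
	M_INCOMPLETE,
	M_REACHABLE,
	M_STALE,
	M_DELAY,
	M_PROBE
};

struct model_router {
	const char    *addr;
	enum model_nud state;
};

/* Return the index of the first router that is reachable or probably
 * reachable (any state other than INCOMPLETE / no entry). If there is
 * none, fall back to index 0 (simplification), or -1 for an empty list. */
static int pick_default_router(const struct model_router *r, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (r[i].state != M_NONE && r[i].state != M_INCOMPLETE)
			return (int)i;
	}
	return n ? 0 : -1;
}
```

With one router in INCOMPLETE and a second in REACHABLE, this picks the second. That is the behaviour the patch description says is lost when both routers are merged into one ECMP route.
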
> This fixes the IPv6 Logo conformance test v6LC_2_2_11, and others that
> test with multiple default routers. Also see systemd issue #33470:
> https://github.com/systemd/systemd/issues/33470.
> 
> Signed-off-by: Matt Muggeridge <Matt.Muggeridge@hpe.com>
> Cc: David Ahern <dsahern@kernel.org>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: linux-api@vger.kernel.org
> Cc: stable@vger.kernel.org
> ---
>  include/uapi/linux/rtnetlink.h | 9 +++++----
>  net/ipv6/route.c               | 3 +++
>  2 files changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index 3b687d20c9ed..9f0259f6e4ed 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -202,7 +202,7 @@ enum {
>  #define RTM_NR_FAMILIES	(RTM_NR_MSGTYPES >> 2)
>  #define RTM_FAM(cmd)	(((cmd) - RTM_BASE) >> 2)
>  
> -/* 
> +/*
>     Generic structure for encapsulation of optional route information.
>     It is reminiscent of sockaddr, but with sa_family replaced
>     with attribute type.
> @@ -242,7 +242,7 @@ struct rtmsg {
>  
>  	unsigned char		rtm_table;	/* Routing table id */
>  	unsigned char		rtm_protocol;	/* Routing protocol; see below	*/
> -	unsigned char		rtm_scope;	/* See below */	
> +	unsigned char		rtm_scope;	/* See below */
>  	unsigned char		rtm_type;	/* See below	*/
>  
>  	unsigned		rtm_flags;
> @@ -336,6 +336,7 @@ enum rt_scope_t {
>  #define RTM_F_FIB_MATCH	        0x2000	/* return full fib lookup match */
>  #define RTM_F_OFFLOAD		0x4000	/* route is offloaded */
>  #define RTM_F_TRAP		0x8000	/* route is trapping packets */
> +#define RTM_F_RA_ROUTER		0x10000	/* route is a default route from RA */

Please don't mix whitespace changes with the changes related to the new flag.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst#n166


Regards,
Nicolas
