From: "Jiayuan Chen" <jiayuan.chen@linux.dev>
To: "Ido Schimmel" <idosch@nvidia.com>, "David Ahern" <dsahern@kernel.org>
Cc: netdev@vger.kernel.org, dsahern@kernel.org,
jiayuan.chen@shopee.com,
syzbot+334190e097a98a1b81bb@syzkaller.appspotmail.com,
"David S. Miller" <davem@davemloft.net>,
"Eric Dumazet" <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
"Paolo Abeni" <pabeni@redhat.com>,
"Simon Horman" <horms@kernel.org>,
"Shuah Khan" <shuah@kernel.org>,
linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH net v2 1/2] net: ipv6: fix panic when IPv4 route references loopback IPv6 nexthop
Date: Mon, 02 Mar 2026 09:07:34 +0000
Message-ID: <80cf6abc40af7f2d072bd9c55758849bb05bfa95@linux.dev>
In-Reply-To: <20260302082551.GA814377@shredder>
March 2, 2026 at 16:25, "Ido Schimmel" <idosch@nvidia.com> wrote:
>
> On Mon, Mar 02, 2026 at 01:11:28PM +0800, Jiayuan Chen wrote:
>
> >
> > From: Jiayuan Chen <jiayuan.chen@shopee.com>
> >
> > When a standalone IPv6 nexthop object is created with a loopback device
> > (e.g., "ip -6 nexthop add id 100 dev lo"), fib6_nh_init() misclassifies
> > it as a reject route. This is because nexthop objects have no destination
> > prefix (fc_dst=::), causing fib6_is_reject() to match any loopback
> > nexthop. The reject path skips fib_nh_common_init(), leaving
> > nhc_pcpu_rth_output unallocated. If an IPv4 route later references this
> > nexthop, __mkroute_output() dereferences NULL nhc_pcpu_rth_output and
> > panics.
> >
> > The reject classification was designed for regular IPv6 routes to prevent
> > kernel loopback loops, but nexthop objects should not be subject to this
> > check since they carry no destination information - loop prevention is
> > handled separately when the route is created.
> >
> > An alternative approach of unconditionally calling fib_nh_common_init()
> > for all reject routes was considered, but on large machines (e.g., 256
> > CPUs) with many routes, this wastes significant memory since
> > nhc_pcpu_rth_output allocates a per-CPU pointer for each route.
> >
> > Since fib6_nh_init() is shared by multiple callers (route creation,
> > nexthop object creation, IPv4 gateway validation), using fc_dst_len to
> > implicitly distinguish nexthop objects would be fragile. Add an explicit
> > fc_is_nh flag to fib6_config to clearly identify nexthop object creation
> > and skip the reject check for this path.
> >
> > Fixes: 7dd73168e273 ("ipv6: Always allocate pcpu memory in a fib6_nh")
> > Reported-by: syzbot+334190e097a98a1b81bb@syzkaller.appspotmail.com
> > Closes: https://lore.kernel.org/all/698f8482.a70a0220.2c38d7.00ca.GAE@google.com/T/
> > Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
> > ---
> > include/net/ip6_fib.h | 1 +
> > net/ipv4/nexthop.c | 1 +
> > net/ipv6/route.c | 8 +++++++-
> > 3 files changed, 9 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
> > index 88b0dd4d8e09..7710f247b8d9 100644
> > --- a/include/net/ip6_fib.h
> > +++ b/include/net/ip6_fib.h
> > @@ -62,6 +62,7 @@ struct fib6_config {
> >  	struct nlattr *fc_encap;
> >  	u16 fc_encap_type;
> >  	bool fc_is_fdb;
> > +	bool fc_is_nh;
> > };
> >
> > struct fib6_node {
> > diff --git a/net/ipv4/nexthop.c b/net/ipv4/nexthop.c
> > index 7b9d70f9b31c..efad2dd27636 100644
> > --- a/net/ipv4/nexthop.c
> > +++ b/net/ipv4/nexthop.c
> > @@ -2859,6 +2859,7 @@ static int nh_create_ipv6(struct net *net, struct nexthop *nh,
> >  	struct fib6_config fib6_cfg = {
> >  		.fc_table = l3mdev_fib_table(cfg->dev),
> >  		.fc_ifindex = cfg->nh_ifindex,
> > +		.fc_is_nh = true,
> >  		.fc_gateway = cfg->gw.ipv6,
> >  		.fc_flags = cfg->nh_flags,
> >  		.fc_nlinfo = cfg->nlinfo,
> > diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> > index c0350d97307e..347f464ce7fe 100644
> > --- a/net/ipv6/route.c
> > +++ b/net/ipv6/route.c
> > @@ -3628,7 +3628,13 @@ int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh,
> >  	 * they would result in kernel looping; promote them to reject routes
> >  	 */
> >  	addr_type = ipv6_addr_type(&cfg->fc_dst);
> > -	if (fib6_is_reject(cfg->fc_flags, dev, addr_type)) {
> > +	/*
> > +	 * Nexthop objects have no destination prefix, so fib6_is_reject()
> > +	 * will misclassify loopback nexthops as reject routes, causing
> > +	 * fib_nh_common_init() to be skipped along with its allocation
> > +	 * of nhc_pcpu_rth_output, which IPv4 routes require.
> > +	 */
> > +	if (!cfg->fc_is_nh && fib6_is_reject(cfg->fc_flags, dev, addr_type)) {
> >  		/* hold loopback dev/idev if we haven't done so. */
> >  		if (dev != net->loopback_dev) {
> >  			if (dev) {
> >
> The code basically resets the nexthop device to the loopback device in
> case of reject routes:
>
> # ip link add name dummy1 up type dummy
> # ip route add unreachable 2001:db8:1::/64 dev dummy1
> # ip -6 route show 2001:db8:1::/64
> unreachable 2001:db8:1::/64 dev lo metric 1024 pref medium
>
> Therefore, the check in fib6_is_reject() regarding the nexthop device
> being a loopback seems quite pointless. It's probably only needed when
> promoting routes that are using the loopback device to reject routes,
> which happens in ip6_route_info_create_nh() (the other caller of
> fib6_is_reject()).
>
> I suggest simplifying the check so that it only applies to reject routes
> [1]. It fixes the issue since RTF_REJECT is a route attribute and not a
> nexthop attribute, so it will never be set by the nexthop code.
>
> [1]
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 85df25c36409..035e3f668d49 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -3582,7 +3582,6 @@ int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh,
>  	netdevice_tracker *dev_tracker = &fib6_nh->fib_nh_dev_tracker;
>  	struct net_device *dev = NULL;
>  	struct inet6_dev *idev = NULL;
> -	int addr_type;
>  	int err;
>
>  	fib6_nh->fib_nh_family = AF_INET6;
> @@ -3624,11 +3623,10 @@ int fib6_nh_init(struct net *net, struct fib6_nh *fib6_nh,
>
>  	fib6_nh->fib_nh_weight = 1;
>
> -	/* We cannot add true routes via loopback here,
> -	 * they would result in kernel looping; promote them to reject routes
> +	/* Reset the nexthop device to the loopback device in case of reject
> +	 * routes.
>  	 */
> -	addr_type = ipv6_addr_type(&cfg->fc_dst);
> -	if (fib6_is_reject(cfg->fc_flags, dev, addr_type)) {
> +	if (cfg->fc_flags & RTF_REJECT) {
>  		/* hold loopback dev/idev if we haven't done so. */
>  		if (dev != net->loopback_dev) {
>  			if (dev) {
>
Thanks, this is indeed the simplest fix.

Let me walk through each case to confirm my understanding:

Case 1: Explicit reject route (RTF_REJECT set)

    ip -6 route add unreachable 2001:db8:1::/64

cfg->fc_flags has RTF_REJECT set before fib6_nh_init() is entered, so
the reject path is taken. fib_nh_common_init() is skipped and
nhc_pcpu_rth_output is not allocated. This is fine since reject routes
never need it.

Case 2: Implicit reject route via loopback (RTF_REJECT not set)

    ip -6 route add 2001:db8::/32 dev lo

cfg->fc_flags does not have RTF_REJECT set, so fib6_nh_init() takes the
normal path and fib_nh_common_init() allocates nhc_pcpu_rth_output.
Later, ip6_route_info_create_nh() calls fib6_is_reject() and marks the
route as RTF_REJECT. The allocated nhc_pcpu_rth_output is unused but
harmless.

Case 3: Standalone nexthop object (our bug scenario)

    ip -6 nexthop add id 100 dev lo
    ip route add 172.20.20.0/24 nhid 100

cfg->fc_flags does not have RTF_REJECT set (nexthop objects never carry
route attributes), so fib6_nh_init() takes the normal path and
fib_nh_common_init() allocates nhc_pcpu_rth_output. This fixes the crash
when an IPv4 route later references this nexthop.
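
To double check the three cases, here is a small userspace model of the
two checks. It is only a sketch and not kernel code: struct model_cfg,
the MODEL_RTF_* values and both helper names are made up for
illustration, and model_is_reject_old() reflects my reading of
fib6_is_reject() rather than a verbatim copy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel route flags (values are arbitrary). */
#define MODEL_RTF_REJECT	0x1u
#define MODEL_RTF_LOCAL		0x2u
#define MODEL_RTF_ANYCAST	0x4u

/* Hypothetical, reduced view of the inputs to the reject decision. */
struct model_cfg {
	unsigned int flags;	/* models cfg->fc_flags */
	bool dev_is_loopback;	/* models dev->flags & IFF_LOOPBACK */
	bool dst_is_loopback;	/* models addr_type & IPV6_ADDR_LOOPBACK */
};

/*
 * Old check (assumed shape of fib6_is_reject()): an explicit reject
 * route, or a non-local, non-anycast route whose device is loopback
 * and whose destination is not the loopback address.
 */
static bool model_is_reject_old(const struct model_cfg *cfg)
{
	if (cfg->flags & MODEL_RTF_REJECT)
		return true;

	return cfg->dev_is_loopback && !cfg->dst_is_loopback &&
	       !(cfg->flags & (MODEL_RTF_ANYCAST | MODEL_RTF_LOCAL));
}

/* Suggested check in fib6_nh_init(): only explicit reject routes. */
static bool model_is_reject_new(const struct model_cfg *cfg)
{
	return (cfg->flags & MODEL_RTF_REJECT) != 0;
}
```

When the reject path is taken, fib_nh_common_init() is skipped, so a
"true" result here corresponds to nhc_pcpu_rth_output staying
unallocated. In this model, Case 1 is a reject under both checks, while
Case 2 and Case 3 (no RTF_REJECT, loopback device, non-loopback or empty
destination) are rejects only under the old check.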