From: Kuniyuki Iwashima <kuniyu@google.com>
To: anthony.doeraene@uclouvain.be
Cc: davem@davemloft.net, kuniyu@google.com, netdev@vger.kernel.org
Subject: Re: [Regression] Broken MPLS routes with multiple nexthops
Date: Thu, 25 Jun 2026 15:51:54 +0000 [thread overview]
Message-ID: <20260625155313.847098-1-kuniyu@google.com> (raw)
In-Reply-To: <036a0c95-f5d4-46ab-88e7-1eab567d7a84@uclouvain.be>
From: Anthony Doeraene <anthony.doeraene@uclouvain.be>
Date: Thu, 25 Jun 2026 17:07:41 +0200
> Hello all,
>
> According to my experiments, it seems that ECMP with MPLS (i.e. an MPLS
> route with multiple
> nexthops) is broken on the master branch of the kernel.
>
> Indeed, whenever adding an MPLS route with multiple nexthops, ip route
> show the route as
> a dead route/link down, even if nexthops are reachable.
>
> Example to reproduce the error (tested with virtme-ng and the master
> branch of the kernel):
> ```
> modprobe mpls_iptunnel mpls_router
> sysctl net.mpls.platform_labels=100000
> ip link set dummy0 up
> ip addr add fc00:1::1/112 dev dummy0
> ip addr add fc00:2::1/112 dev dummy0
> ip -M route add 16000 \
> nexthop via inet6 fc00:1::2 as 16001 \
> nexthop via inet6 fc00:2::2 as 16002
>
> # Check the route
> ip -M route
> # Output:
> # 16000 dead linkdown
> #
> # Route is not present, even if accepted
> ```
>
> From a git blame, it seems that commit
> f0914b8436c589b7ab32c614d8d7868eb4ebd5bf
> broke the core logic for building nexthops.
Thanks for the report !
It was to balance refcount with netdev_put() in mpls_rt_alloc().
I'll post the patch below.
(Updating rt->rt_nhn is not strictlly needed for netdev_put()
because it has NULL check and rt is allocated with kzalloc(),
but it's a bit error prone, so I'll keep it)
---8<---
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index ca504d9626cf..4a81514e919a 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -922,8 +922,7 @@ static int mpls_nh_build_multi(struct mpls_route_config *cfg,
struct nlattr *nla_via, *nla_newdst;
int remaining = cfg->rc_mp_len;
int err = 0;
-
- rt->rt_nhn = 0;
+ u8 nhs = 0;
change_nexthops(rt) {
int attrlen;
@@ -959,12 +958,15 @@ static int mpls_nh_build_multi(struct mpls_route_config *cfg,
rt->rt_nhn_alive--;
rtnh = rtnh_next(rtnh, &remaining);
- rt->rt_nhn++;
+ nhs++;
} endfor_nexthops(rt);
+ rt->rt_nhn = nhs;
+
return 0;
errout:
+ rt->rt_nhn = nhs;
return err;
}
---8<---
>
> This commit modified function `mpls_nx_build_multi` by setting
> `rt->rt_nhn` to 0 at the
> start of the function. However, the loop `change_nexthops` just below
> depends on
> `rt->rt_nhn` to know the actual number of nexthops that it should build.
> As `rt->rt_nhn`
> is set to 0 just before, **no nexthop is ever built**, leading to a dead
> route. Even if we
> remove this modification, this commit incorrectly increments
> `rt->rt_nhn` at the end of
> the loop (`rt->rt_nhn++`), such that the loop always end with an error
> as it tries to
> constructs more nexthops that actually provided.
>
> Commenting these two lines fix the issue, and allows to create once
> again MPLS routes
> with multiple nexthops:
>
> ```
> modprobe mpls_iptunnel mpls_router
> sysctl net.mpls.platform_labels=100000
> ip link set dummy0 up
> ip addr add fc00:1::1/112 dev dummy0
> ip addr add fc00:2::1/112 dev dummy0
> ip -M route add 16000 \
> nexthop via inet6 fc00:1::2 as 16001 \
> nexthop via inet6 fc00:2::2 as 16002
>
> # Check the route
> ip -M route
> # Output:
> # 16000
> # nexthop as to 16001 via inet6 fc00:1::2 dev dummy0
> # nexthop as to 16002 via inet6 fc00:2::2 dev dummy0
> #
> # Route is accepted and present !
> ```
>
> Overall, I think it would be interesting to discuss what this patch was
> trying to achieve,
> and how we can conciliate both use-cases.
>
> Best regards and looking forward to hearing from you,
> Doeraene Anthony
prev parent reply other threads:[~2026-06-25 15:53 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-25 15:07 [Regression] Broken MPLS routes with multiple nexthops Anthony Doeraene
2026-06-25 15:51 ` Kuniyuki Iwashima [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260625155313.847098-1-kuniyu@google.com \
--to=kuniyu@google.com \
--cc=anthony.doeraene@uclouvain.be \
--cc=davem@davemloft.net \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox