Netdev List
 help / color / mirror / Atom feed
From: Kuniyuki Iwashima <kuniyu@google.com>
To: anthony.doeraene@uclouvain.be
Cc: davem@davemloft.net, kuniyu@google.com, netdev@vger.kernel.org
Subject: Re: [Regression] Broken MPLS routes with multiple nexthops
Date: Thu, 25 Jun 2026 15:51:54 +0000	[thread overview]
Message-ID: <20260625155313.847098-1-kuniyu@google.com> (raw)
In-Reply-To: <036a0c95-f5d4-46ab-88e7-1eab567d7a84@uclouvain.be>

From: Anthony Doeraene <anthony.doeraene@uclouvain.be>
Date: Thu, 25 Jun 2026 17:07:41 +0200
> Hello all,
> 
> According to my experiments, it seems that ECMP with MPLS (i.e. an MPLS 
> route with multiple
> nexthops) is broken on the master branch of the kernel.
> 
> Indeed, whenever adding an MPLS route with multiple nexthops, ip route 
> show the route as
> a dead route/link down, even if nexthops are reachable.
> 
> Example to reproduce the error (tested with virtme-ng and the master 
> branch of the kernel):
> ```
> modprobe mpls_iptunnel mpls_router
> sysctl net.mpls.platform_labels=100000
> ip link set dummy0 up
> ip addr add fc00:1::1/112 dev dummy0
> ip addr add fc00:2::1/112 dev dummy0
> ip -M route add 16000 \
>      nexthop via inet6 fc00:1::2 as 16001 \
>      nexthop via inet6 fc00:2::2 as 16002
> 
> # Check the route
> ip -M route
> # Output:
> #     16000 dead linkdown
> #
> # Route is not present, even if accepted
> ```
> 
>  From a git blame, it seems that commit 
> f0914b8436c589b7ab32c614d8d7868eb4ebd5bf
> broke the core logic for building nexthops.

Thanks for the report !

It was to balance refcount with netdev_put() in mpls_rt_alloc().
I'll post the patch below.

(Updating rt->rt_nhn is not strictlly needed for netdev_put()
 because it has NULL check and rt is allocated with kzalloc(),
 but it's a bit error prone, so I'll keep it)

---8<---
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index ca504d9626cf..4a81514e919a 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -922,8 +922,7 @@ static int mpls_nh_build_multi(struct mpls_route_config *cfg,
 	struct nlattr *nla_via, *nla_newdst;
 	int remaining = cfg->rc_mp_len;
 	int err = 0;
-
-	rt->rt_nhn = 0;
+	u8 nhs = 0;
 
 	change_nexthops(rt) {
 		int attrlen;
@@ -959,12 +958,15 @@ static int mpls_nh_build_multi(struct mpls_route_config *cfg,
 			rt->rt_nhn_alive--;
 
 		rtnh = rtnh_next(rtnh, &remaining);
-		rt->rt_nhn++;
+		nhs++;
 	} endfor_nexthops(rt);
 
+	rt->rt_nhn = nhs;
+
 	return 0;
 
 errout:
+	rt->rt_nhn = nhs;
 	return err;
 }
 
---8<---


> 
> This commit modified function `mpls_nx_build_multi` by setting 
> `rt->rt_nhn` to 0 at the
> start of the function. However, the loop `change_nexthops` just below 
> depends on
> `rt->rt_nhn` to know the actual number of nexthops that it should build. 
> As `rt->rt_nhn`
> is set to 0 just before, **no nexthop is ever built**, leading to a dead 
> route. Even if we
> remove this modification, this commit incorrectly increments 
> `rt->rt_nhn` at the end of
> the loop (`rt->rt_nhn++`),  such that the loop always end with an error 
> as it tries to
> constructs more nexthops that actually provided.
> 
> Commenting these two lines fix the issue, and allows to create once 
> again MPLS routes
> with multiple nexthops:
> 
> ```
> modprobe mpls_iptunnel mpls_router
> sysctl net.mpls.platform_labels=100000
> ip link set dummy0 up
> ip addr add fc00:1::1/112 dev dummy0
> ip addr add fc00:2::1/112 dev dummy0
> ip -M route add 16000 \
>      nexthop via inet6 fc00:1::2 as 16001 \
>      nexthop via inet6 fc00:2::2 as 16002
> 
> # Check the route
> ip -M route
> # Output:
> #     16000
> #            nexthop as to 16001 via inet6 fc00:1::2 dev dummy0
> #            nexthop as to 16002 via inet6 fc00:2::2 dev dummy0
> #
> # Route is accepted and present !
> ```
> 
> Overall, I think it would be interesting to discuss what this patch was 
> trying to achieve,
> and how we can conciliate both use-cases.
> 
> Best regards and looking forward to hearing from you,
> Doeraene Anthony

      reply	other threads:[~2026-06-25 15:53 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-25 15:07 [Regression] Broken MPLS routes with multiple nexthops Anthony Doeraene
2026-06-25 15:51 ` Kuniyuki Iwashima [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260625155313.847098-1-kuniyu@google.com \
    --to=kuniyu@google.com \
    --cc=anthony.doeraene@uclouvain.be \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox