* [Regression] Broken MPLS routes with multiple nexthops
@ 2026-06-25 15:07 Anthony Doeraene
2026-06-25 15:51 ` Kuniyuki Iwashima
0 siblings, 1 reply; 2+ messages in thread
From: Anthony Doeraene @ 2026-06-25 15:07 UTC (permalink / raw)
To: davem; +Cc: kuniyu, netdev
Hello all,
According to my experiments, it seems that ECMP with MPLS (i.e. an MPLS
route with multiple
nexthops) is broken on the master branch of the kernel.
Indeed, whenever adding an MPLS route with multiple nexthops, ip route
show the route as
a dead route/link down, even if nexthops are reachable.
Example to reproduce the error (tested with virtme-ng and the master
branch of the kernel):
```
modprobe mpls_iptunnel mpls_router
sysctl net.mpls.platform_labels=100000
ip link set dummy0 up
ip addr add fc00:1::1/112 dev dummy0
ip addr add fc00:2::1/112 dev dummy0
ip -M route add 16000 \
nexthop via inet6 fc00:1::2 as 16001 \
nexthop via inet6 fc00:2::2 as 16002
# Check the route
ip -M route
# Output:
# 16000 dead linkdown
#
# Route is not present, even if accepted
```
From a git blame, it seems that commit
f0914b8436c589b7ab32c614d8d7868eb4ebd5bf
broke the core logic for building nexthops.
This commit modified function `mpls_nx_build_multi` by setting
`rt->rt_nhn` to 0 at the
start of the function. However, the loop `change_nexthops` just below
depends on
`rt->rt_nhn` to know the actual number of nexthops that it should build.
As `rt->rt_nhn`
is set to 0 just before, **no nexthop is ever built**, leading to a dead
route. Even if we
remove this modification, this commit incorrectly increments
`rt->rt_nhn` at the end of
the loop (`rt->rt_nhn++`), such that the loop always end with an error
as it tries to
constructs more nexthops that actually provided.
Commenting these two lines fix the issue, and allows to create once
again MPLS routes
with multiple nexthops:
```
modprobe mpls_iptunnel mpls_router
sysctl net.mpls.platform_labels=100000
ip link set dummy0 up
ip addr add fc00:1::1/112 dev dummy0
ip addr add fc00:2::1/112 dev dummy0
ip -M route add 16000 \
nexthop via inet6 fc00:1::2 as 16001 \
nexthop via inet6 fc00:2::2 as 16002
# Check the route
ip -M route
# Output:
# 16000
# nexthop as to 16001 via inet6 fc00:1::2 dev dummy0
# nexthop as to 16002 via inet6 fc00:2::2 dev dummy0
#
# Route is accepted and present !
```
Overall, I think it would be interesting to discuss what this patch was
trying to achieve,
and how we can conciliate both use-cases.
Best regards and looking forward to hearing from you,
Doeraene Anthony
^ permalink raw reply [flat|nested] 2+ messages in thread
* Re: [Regression] Broken MPLS routes with multiple nexthops
2026-06-25 15:07 [Regression] Broken MPLS routes with multiple nexthops Anthony Doeraene
@ 2026-06-25 15:51 ` Kuniyuki Iwashima
0 siblings, 0 replies; 2+ messages in thread
From: Kuniyuki Iwashima @ 2026-06-25 15:51 UTC (permalink / raw)
To: anthony.doeraene; +Cc: davem, kuniyu, netdev
From: Anthony Doeraene <anthony.doeraene@uclouvain.be>
Date: Thu, 25 Jun 2026 17:07:41 +0200
> Hello all,
>
> According to my experiments, it seems that ECMP with MPLS (i.e. an MPLS
> route with multiple
> nexthops) is broken on the master branch of the kernel.
>
> Indeed, whenever adding an MPLS route with multiple nexthops, ip route
> show the route as
> a dead route/link down, even if nexthops are reachable.
>
> Example to reproduce the error (tested with virtme-ng and the master
> branch of the kernel):
> ```
> modprobe mpls_iptunnel mpls_router
> sysctl net.mpls.platform_labels=100000
> ip link set dummy0 up
> ip addr add fc00:1::1/112 dev dummy0
> ip addr add fc00:2::1/112 dev dummy0
> ip -M route add 16000 \
> nexthop via inet6 fc00:1::2 as 16001 \
> nexthop via inet6 fc00:2::2 as 16002
>
> # Check the route
> ip -M route
> # Output:
> # 16000 dead linkdown
> #
> # Route is not present, even if accepted
> ```
>
> From a git blame, it seems that commit
> f0914b8436c589b7ab32c614d8d7868eb4ebd5bf
> broke the core logic for building nexthops.
Thanks for the report !
It was to balance refcount with netdev_put() in mpls_rt_alloc().
I'll post the patch below.
(Updating rt->rt_nhn is not strictlly needed for netdev_put()
because it has NULL check and rt is allocated with kzalloc(),
but it's a bit error prone, so I'll keep it)
---8<---
diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
index ca504d9626cf..4a81514e919a 100644
--- a/net/mpls/af_mpls.c
+++ b/net/mpls/af_mpls.c
@@ -922,8 +922,7 @@ static int mpls_nh_build_multi(struct mpls_route_config *cfg,
struct nlattr *nla_via, *nla_newdst;
int remaining = cfg->rc_mp_len;
int err = 0;
-
- rt->rt_nhn = 0;
+ u8 nhs = 0;
change_nexthops(rt) {
int attrlen;
@@ -959,12 +958,15 @@ static int mpls_nh_build_multi(struct mpls_route_config *cfg,
rt->rt_nhn_alive--;
rtnh = rtnh_next(rtnh, &remaining);
- rt->rt_nhn++;
+ nhs++;
} endfor_nexthops(rt);
+ rt->rt_nhn = nhs;
+
return 0;
errout:
+ rt->rt_nhn = nhs;
return err;
}
---8<---
>
> This commit modified function `mpls_nx_build_multi` by setting
> `rt->rt_nhn` to 0 at the
> start of the function. However, the loop `change_nexthops` just below
> depends on
> `rt->rt_nhn` to know the actual number of nexthops that it should build.
> As `rt->rt_nhn`
> is set to 0 just before, **no nexthop is ever built**, leading to a dead
> route. Even if we
> remove this modification, this commit incorrectly increments
> `rt->rt_nhn` at the end of
> the loop (`rt->rt_nhn++`), such that the loop always end with an error
> as it tries to
> constructs more nexthops that actually provided.
>
> Commenting these two lines fix the issue, and allows to create once
> again MPLS routes
> with multiple nexthops:
>
> ```
> modprobe mpls_iptunnel mpls_router
> sysctl net.mpls.platform_labels=100000
> ip link set dummy0 up
> ip addr add fc00:1::1/112 dev dummy0
> ip addr add fc00:2::1/112 dev dummy0
> ip -M route add 16000 \
> nexthop via inet6 fc00:1::2 as 16001 \
> nexthop via inet6 fc00:2::2 as 16002
>
> # Check the route
> ip -M route
> # Output:
> # 16000
> # nexthop as to 16001 via inet6 fc00:1::2 dev dummy0
> # nexthop as to 16002 via inet6 fc00:2::2 dev dummy0
> #
> # Route is accepted and present !
> ```
>
> Overall, I think it would be interesting to discuss what this patch was
> trying to achieve,
> and how we can conciliate both use-cases.
>
> Best regards and looking forward to hearing from you,
> Doeraene Anthony
^ permalink raw reply related [flat|nested] 2+ messages in thread
end of thread, other threads:[~2026-06-25 15:53 UTC | newest]
Thread overview: 2+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-06-25 15:07 [Regression] Broken MPLS routes with multiple nexthops Anthony Doeraene
2026-06-25 15:51 ` Kuniyuki Iwashima
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox