* Re: [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels
2026-04-21 9:47 [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels Andrea Mayer
@ 2026-04-21 14:25 ` Simon Horman
2026-04-21 17:33 ` Justin Iurman
2026-04-23 8:00 ` Sebastian Andrzej Siewior
2 siblings, 0 replies; 4+ messages in thread
From: Simon Horman @ 2026-04-21 14:25 UTC
To: Andrea Mayer
Cc: davem, dsahern, edumazet, kuba, pabeni, bigeasy, clrkwllms,
rostedt, david.lebrun, alex.aring, stefano.salsano, netdev,
linux-rt-devel, linux-kernel, stable
On Tue, Apr 21, 2026 at 11:47:35AM +0200, Andrea Mayer wrote:
> seg6_input_core() and rpl_input() call ip6_route_input() which sets a
> NOREF dst on the skb, then pass it to dst_cache_set_ip6() invoking
> dst_hold() unconditionally.
> On PREEMPT_RT, ksoftirqd is preemptible and a higher-priority task can
> release the underlying pcpu_rt between the lookup and the caching
> through a concurrent FIB lookup on a shared nexthop.
> Simplified race sequence:
>
> ksoftirqd/X                          higher-prio task (same CPU X)
> -----------                          --------------------------------
> seg6_input_core(,skb)/rpl_input(skb)
>   dst_cache_get()
>     -> miss
>   ip6_route_input(skb)
>     -> ip6_pol_route(,skb,flags)
>        [RT6_LOOKUP_F_DST_NOREF in flags]
>       -> FIB lookup resolves fib6_nh
>          [nhid=N route]
>       -> rt6_make_pcpu_route()
>          [creates pcpu_rt, refcount=1]
>          pcpu_rt->sernum = fib6_sernum
>          [fib6_sernum=W]
>       -> cmpxchg(fib6_nh.rt6i_pcpu,
>                  NULL, pcpu_rt)
>          [slot was empty, store succeeds]
>     -> skb_dst_set_noref(skb, dst)
>        [dst is pcpu_rt, refcount still 1]
>
>                                        rt_genid_bump_ipv6()
>                                          -> bumps fib6_sernum
>                                             [fib6_sernum from W to Z]
>                                        ip6_route_output()
>                                          -> ip6_pol_route()
>                                            -> FIB lookup resolves fib6_nh
>                                               [nhid=N]
>                                            -> rt6_get_pcpu_route()
>                                               pcpu_rt->sernum != fib6_sernum
>                                               [W <> Z, stale]
>                                            -> prev = xchg(rt6i_pcpu, NULL)
>                                            -> dst_release(prev)
>                                               [prev is pcpu_rt,
>                                                refcount 1->0, dead]
>
>   dst = skb_dst(skb)
>   [dst is the dead pcpu_rt]
>   dst_cache_set_ip6(dst)
>     -> dst_hold() on dead dst
>     -> WARN / use-after-free
>
> For the race to occur, ksoftirqd must be preemptible (PREEMPT_RT without
> PREEMPT_RT_NEEDS_BH_LOCK) and a concurrent task must be able to release
> the pcpu_rt. Shared nexthop objects provide such a path, as two routes
> pointing to the same nhid share the same fib6_nh and its rt6i_pcpu
> entry.
>
> Fix seg6_input_core() and rpl_input() by calling skb_dst_force() after
> ip6_route_input() to force the NOREF dst into a refcounted one before
> caching.
> The output path is not affected as ip6_route_output() already returns a
> refcounted dst.
>
> Fixes: af4a2209b134 ("ipv6: sr: use dst_cache in seg6_input")
> Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
> Cc: stable@vger.kernel.org
> Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Reviewed-by: Simon Horman <horms@kernel.org>
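
The race sequence quoted above can be replayed in a small userspace model. This is a sketch, not kernel code: `model_dst`, `model_slot` and the `model_*` helpers are hypothetical stand-ins for the dst entry, the rt6i_pcpu slot, dst_hold() and dst_release(), and the two tasks run one after the other instead of being preempted. It only illustrates why an unconditional hold on a borrowed (NOREF) pointer is unsafe once the slot's single reference has been dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a dst entry (not struct dst_entry). */
struct model_dst {
	int refcnt;		/* 0 means the object is dead */
	unsigned int sernum;	/* generation it was created under */
};

/* Hypothetical stand-in for the rt6i_pcpu slot, which owns one reference. */
struct model_slot {
	struct model_dst *pcpu_rt;
};

/* Unconditional hold. Returns false if the object was already dead,
 * which is the point where the kernel would WARN.
 */
static bool model_hold(struct model_dst *dst)
{
	bool was_alive = dst->refcnt > 0;

	dst->refcnt++;
	return was_alive;
}

static void model_release(struct model_dst *dst)
{
	assert(dst->refcnt > 0);
	dst->refcnt--;
}

/* Second lookup on the shared slot: a stale entry is detached from the
 * slot and the slot's reference is dropped.
 */
static void model_stale_lookup(struct model_slot *slot, unsigned int cur_sernum)
{
	struct model_dst *prev = slot->pcpu_rt;

	if (prev && prev->sernum != cur_sernum) {
		slot->pcpu_rt = NULL;	/* models xchg(rt6i_pcpu, NULL) */
		model_release(prev);	/* refcount 1 -> 0 */
	}
}

/* Replays the sequence; returns true if the final hold hit a dead object. */
bool model_replay(bool bump_sernum)
{
	unsigned int fib_sernum = 1;				/* "W" */
	struct model_dst rt = { .refcnt = 1, .sernum = fib_sernum };
	struct model_slot slot = { .pcpu_rt = &rt };
	struct model_dst *noref = slot.pcpu_rt;	/* borrowed, no ref taken */

	if (bump_sernum)
		fib_sernum = 2;				/* "Z" */
	model_stale_lookup(&slot, fib_sernum);		/* the other task */

	return !model_hold(noref);			/* caching step */
}
```

Calling `model_replay(true)` follows the quoted diagram and reports a hold on a dead object; `model_replay(false)` leaves the slot's reference in place, so the hold is harmless. In the real kernel the dead object would additionally be freed after an RCU grace period, which this model does not attempt to show.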
* Re: [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels
From: Justin Iurman @ 2026-04-21 17:33 UTC
To: Andrea Mayer, davem, dsahern, edumazet, kuba, pabeni, horms
Cc: bigeasy, clrkwllms, rostedt, david.lebrun, alex.aring,
stefano.salsano, netdev, linux-rt-devel, linux-kernel, stable
On 4/21/26 11:47, Andrea Mayer wrote:
> seg6_input_core() and rpl_input() call ip6_route_input() which sets a
> NOREF dst on the skb, then pass it to dst_cache_set_ip6() invoking
> dst_hold() unconditionally.
> On PREEMPT_RT, ksoftirqd is preemptible and a higher-priority task can
> release the underlying pcpu_rt between the lookup and the caching
> through a concurrent FIB lookup on a shared nexthop.
> Simplified race sequence:
>
> ksoftirqd/X                          higher-prio task (same CPU X)
> -----------                          --------------------------------
> seg6_input_core(,skb)/rpl_input(skb)
>   dst_cache_get()
>     -> miss
>   ip6_route_input(skb)
>     -> ip6_pol_route(,skb,flags)
>        [RT6_LOOKUP_F_DST_NOREF in flags]
>       -> FIB lookup resolves fib6_nh
>          [nhid=N route]
>       -> rt6_make_pcpu_route()
>          [creates pcpu_rt, refcount=1]
>          pcpu_rt->sernum = fib6_sernum
>          [fib6_sernum=W]
>       -> cmpxchg(fib6_nh.rt6i_pcpu,
>                  NULL, pcpu_rt)
>          [slot was empty, store succeeds]
>     -> skb_dst_set_noref(skb, dst)
>        [dst is pcpu_rt, refcount still 1]
>
>                                        rt_genid_bump_ipv6()
>                                          -> bumps fib6_sernum
>                                             [fib6_sernum from W to Z]
>                                        ip6_route_output()
>                                          -> ip6_pol_route()
>                                            -> FIB lookup resolves fib6_nh
>                                               [nhid=N]
>                                            -> rt6_get_pcpu_route()
>                                               pcpu_rt->sernum != fib6_sernum
>                                               [W <> Z, stale]
>                                            -> prev = xchg(rt6i_pcpu, NULL)
>                                            -> dst_release(prev)
>                                               [prev is pcpu_rt,
>                                                refcount 1->0, dead]
>
>   dst = skb_dst(skb)
>   [dst is the dead pcpu_rt]
>   dst_cache_set_ip6(dst)
>     -> dst_hold() on dead dst
>     -> WARN / use-after-free
>
> For the race to occur, ksoftirqd must be preemptible (PREEMPT_RT without
> PREEMPT_RT_NEEDS_BH_LOCK) and a concurrent task must be able to release
> the pcpu_rt. Shared nexthop objects provide such a path, as two routes
> pointing to the same nhid share the same fib6_nh and its rt6i_pcpu
> entry.
>
> Fix seg6_input_core() and rpl_input() by calling skb_dst_force() after
> ip6_route_input() to force the NOREF dst into a refcounted one before
> caching.
> The output path is not affected as ip6_route_output() already returns a
> refcounted dst.
>
> Fixes: af4a2209b134 ("ipv6: sr: use dst_cache in seg6_input")
> Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
> Cc: stable@vger.kernel.org
> Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
> ---
> net/ipv6/rpl_iptunnel.c | 9 +++++++++
> net/ipv6/seg6_iptunnel.c | 9 +++++++++
> 2 files changed, 18 insertions(+)
>
> diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c
> index c7942cf65567..4e10adcd70e8 100644
> --- a/net/ipv6/rpl_iptunnel.c
> +++ b/net/ipv6/rpl_iptunnel.c
> @@ -287,7 +287,16 @@ static int rpl_input(struct sk_buff *skb)
>  
>  	if (!dst) {
>  		ip6_route_input(skb);
> +
> +		/* ip6_route_input() sets a NOREF dst; force a refcount on it
> +		 * before caching or further use.
> +		 */
> +		skb_dst_force(skb);
>  		dst = skb_dst(skb);
> +		if (unlikely(!dst)) {
> +			err = -ENETUNREACH;
> +			goto drop;
> +		}
>  
>  		/* cache only if we don't create a dst reference loop */
>  		if (!dst->error && lwtst != dst->lwtstate) {
> diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
> index 97b50d9b1365..94284b483be0 100644
> --- a/net/ipv6/seg6_iptunnel.c
> +++ b/net/ipv6/seg6_iptunnel.c
> @@ -515,7 +515,16 @@ static int seg6_input_core(struct net *net, struct sock *sk,
>  
>  	if (!dst) {
>  		ip6_route_input(skb);
> +
> +		/* ip6_route_input() sets a NOREF dst; force a refcount on it
> +		 * before caching or further use.
> +		 */
> +		skb_dst_force(skb);
>  		dst = skb_dst(skb);
> +		if (unlikely(!dst)) {
> +			err = -ENETUNREACH;
> +			goto drop;
> +		}
>  
>  		/* cache only if we don't create a dst reference loop */
>  		if (!dst->error && lwtst != dst->lwtstate) {
Thanks for taking care of this, Andrea! LGTM.
Reviewed-by: Justin Iurman <justin.iurman@gmail.com>
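
The shape of the fix in the diff above can also be sketched in userspace, under stated assumptions. `model_dst_force()` mimics what skb_dst_force() is understood to do: take a reference only while the refcount is still non-zero, and otherwise clear the skb's dst so the caller can drop the packet with -ENETUNREACH. All `model_*` names are hypothetical, and C11 atomics stand in for the kernel's refcount primitives; this is an illustration of the pattern, not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel structures. */
struct model_dst {
	atomic_int refcnt;	/* 0 means the object is dead */
};

struct model_skb {
	struct model_dst *dst;
	bool noref;		/* true: dst is borrowed, no reference held */
};

/* Take a reference only if the object is still alive
 * (an "increment unless zero" loop).
 */
static bool model_hold_safe(struct model_dst *dst)
{
	int old = atomic_load(&dst->refcnt);

	while (old > 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&dst->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Turn a borrowed dst into a refcounted one, or clear it if it died. */
static bool model_dst_force(struct model_skb *skb)
{
	if (skb->noref) {
		struct model_dst *dst = skb->dst;

		if (dst && !model_hold_safe(dst))
			dst = NULL;
		skb->dst = dst;
		skb->noref = false;
	}
	return skb->dst != NULL;
}

/* Mirrors the patched input path: force first, bail out if the dst is gone. */
static int model_input(struct model_skb *skb)
{
	if (!model_dst_force(skb))
		return -ENETUNREACH;
	/* From here on the dst is refcounted and safe to cache. */
	return 0;
}

/* Runs the patched step on a dst that starts at initial_refcnt.
 * Returns the resulting refcount on success, or a negative errno.
 */
int model_run(int initial_refcnt)
{
	struct model_dst dst;
	struct model_skb skb = { .dst = &dst, .noref = true };
	int err;

	assert(initial_refcnt >= 0);
	atomic_init(&dst.refcnt, initial_refcnt);
	err = model_input(&skb);
	return err ? err : atomic_load(&dst.refcnt);
}
```

With a live dst, `model_run(1)` ends with two references (the slot's and the skb's). With a dst that has already dropped to zero, `model_run(0)` returns `-ENETUNREACH` and leaves the count untouched, which corresponds to the new `goto drop` branch in both hunks.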
* Re: [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels
From: Sebastian Andrzej Siewior @ 2026-04-23 8:00 UTC
To: Andrea Mayer
Cc: davem, dsahern, edumazet, kuba, pabeni, horms, clrkwllms, rostedt,
david.lebrun, alex.aring, stefano.salsano, netdev, linux-rt-devel,
linux-kernel, stable
On 2026-04-21 11:47:35 [+0200], Andrea Mayer wrote:
> seg6_input_core() and rpl_input() call ip6_route_input() which sets a
> NOREF dst on the skb, then pass it to dst_cache_set_ip6() invoking
> dst_hold() unconditionally.
> On PREEMPT_RT, ksoftirqd is preemptible and a higher-priority task can
> release the underlying pcpu_rt between the lookup and the caching
> through a concurrent FIB lookup on a shared nexthop.
> Simplified race sequence:
>
> ksoftirqd/X                          higher-prio task (same CPU X)
> -----------                          --------------------------------
> seg6_input_core(,skb)/rpl_input(skb)
>   dst_cache_get()
>     -> miss
>   ip6_route_input(skb)
>     -> ip6_pol_route(,skb,flags)
>        [RT6_LOOKUP_F_DST_NOREF in flags]
>       -> FIB lookup resolves fib6_nh
>          [nhid=N route]
>       -> rt6_make_pcpu_route()
>          [creates pcpu_rt, refcount=1]
>          pcpu_rt->sernum = fib6_sernum
>          [fib6_sernum=W]
>       -> cmpxchg(fib6_nh.rt6i_pcpu,
>                  NULL, pcpu_rt)
>          [slot was empty, store succeeds]
>     -> skb_dst_set_noref(skb, dst)
>        [dst is pcpu_rt, refcount still 1]
>
>                                        rt_genid_bump_ipv6()
>                                          -> bumps fib6_sernum
>                                             [fib6_sernum from W to Z]
>                                        ip6_route_output()
>                                          -> ip6_pol_route()
>                                            -> FIB lookup resolves fib6_nh
>                                               [nhid=N]
>                                            -> rt6_get_pcpu_route()
>                                               pcpu_rt->sernum != fib6_sernum
>                                               [W <> Z, stale]
>                                            -> prev = xchg(rt6i_pcpu, NULL)
>                                            -> dst_release(prev)
>                                               [prev is pcpu_rt,
>                                                refcount 1->0, dead]
>
>   dst = skb_dst(skb)
>   [dst is the dead pcpu_rt]
>   dst_cache_set_ip6(dst)
>     -> dst_hold() on dead dst
>     -> WARN / use-after-free
So the dst passed to skb_dst_set_noref() has no reference count. The fix
is to use skb_dst_force() to increment the refcount on it. But this
requires that we are in the same RCU section. And I guess we are since
none of the warnings are visible.
Doesn't this make ip6_route_input() on RT fragile in general due to the
RT6_LOOKUP_F_DST_NOREF usage, or is there something special about the
two files that are patched?
Based on your explanation it all makes sense, I am just not sure if this
race is limited to those two or if there is more to it.
> For the race to occur, ksoftirqd must be preemptible (PREEMPT_RT without
> PREEMPT_RT_NEEDS_BH_LOCK) and a concurrent task must be able to release
> the pcpu_rt. Shared nexthop objects provide such a path, as two routes
> pointing to the same nhid share the same fib6_nh and its rt6i_pcpu
> entry.
>
> Fix seg6_input_core() and rpl_input() by calling skb_dst_force() after
> ip6_route_input() to force the NOREF dst into a refcounted one before
> caching.
> The output path is not affected as ip6_route_output() already returns a
> refcounted dst.
>
> Fixes: af4a2209b134 ("ipv6: sr: use dst_cache in seg6_input")
> Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
If having PREEMPT_RT_NEEDS_BH_LOCK unset is the requirement then the
right Fixes: tag would be
Fixes: 3253cb49cbad4 ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT")
as prior to this commit the race is not possible, right?
Does this mean that rpl_input() does a local_bh_disable() while
obtaining the dst but never runs outside of a bh-disabled section?
Because if it can run in preemptible context then the race would not be
limited to PREEMPT_RT, at which point the Fixes: tags from above would
make sense again.
> Cc: stable@vger.kernel.org
> Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
Sebastian