public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels
@ 2026-04-21  9:47 Andrea Mayer
  2026-04-21 14:25 ` Simon Horman
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Andrea Mayer @ 2026-04-21  9:47 UTC (permalink / raw)
  To: davem, dsahern, edumazet, kuba, pabeni, horms
  Cc: bigeasy, clrkwllms, rostedt, david.lebrun, alex.aring,
	stefano.salsano, andrea.mayer, netdev, linux-rt-devel,
	linux-kernel, stable

seg6_input_core() and rpl_input() call ip6_route_input() which sets a
NOREF dst on the skb, then pass it to dst_cache_set_ip6() invoking
dst_hold() unconditionally.
On PREEMPT_RT, ksoftirqd is preemptible and a higher-priority task can
release the underlying pcpu_rt between the lookup and the caching
through a concurrent FIB lookup on a shared nexthop.
Simplified race sequence:

  ksoftirqd/X                       higher-prio task (same CPU X)
  -----------                       --------------------------------
  seg6_input_core(,skb)/rpl_input(skb)
    dst_cache_get()
      -> miss
    ip6_route_input(skb)
      -> ip6_pol_route(,skb,flags)
         [RT6_LOOKUP_F_DST_NOREF in flags]
        -> FIB lookup resolves fib6_nh
           [nhid=N route]
        -> rt6_make_pcpu_route()
           [creates pcpu_rt, refcount=1]
             pcpu_rt->sernum = fib6_sernum
             [fib6_sernum=W]
           -> cmpxchg(fib6_nh.rt6i_pcpu,
                      NULL, pcpu_rt)
              [slot was empty, store succeeds]
      -> skb_dst_set_noref(skb, dst)
         [dst is pcpu_rt, refcount still 1]

                                    rt_genid_bump_ipv6()
                                      -> bumps fib6_sernum
                                         [fib6_sernum from W to Z]
                                    ip6_route_output()
                                      -> ip6_pol_route()
                                        -> FIB lookup resolves fib6_nh
                                           [nhid=N]
                                        -> rt6_get_pcpu_route()
                                             pcpu_rt->sernum != fib6_sernum
                                             [W <> Z, stale]
                                          -> prev = xchg(rt6i_pcpu, NULL)
                                          -> dst_release(prev)
                                             [prev is pcpu_rt,
                                              refcount 1->0, dead]

    dst = skb_dst(skb)
    [dst is the dead pcpu_rt]
    dst_cache_set_ip6(dst)
      -> dst_hold() on dead dst
      -> WARN / use-after-free

For the race to occur, ksoftirqd must be preemptible (PREEMPT_RT without
PREEMPT_RT_NEEDS_BH_LOCK) and a concurrent task must be able to release
the pcpu_rt. Shared nexthop objects provide such a path, as two routes
pointing to the same nhid share the same fib6_nh and its rt6i_pcpu
entry.

Fix seg6_input_core() and rpl_input() by calling skb_dst_force() after
ip6_route_input() to force the NOREF dst into a refcounted one before
caching.
The output path is not affected as ip6_route_output() already returns a
refcounted dst.

Fixes: af4a2209b134 ("ipv6: sr: use dst_cache in seg6_input")
Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
Cc: stable@vger.kernel.org
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
---
 net/ipv6/rpl_iptunnel.c  | 9 +++++++++
 net/ipv6/seg6_iptunnel.c | 9 +++++++++
 2 files changed, 18 insertions(+)

diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c
index c7942cf65567..4e10adcd70e8 100644
--- a/net/ipv6/rpl_iptunnel.c
+++ b/net/ipv6/rpl_iptunnel.c
@@ -287,7 +287,16 @@ static int rpl_input(struct sk_buff *skb)
 
 	if (!dst) {
 		ip6_route_input(skb);
+
+		/* ip6_route_input() sets a NOREF dst; force a refcount on it
+		 * before caching or further use.
+		 */
+		skb_dst_force(skb);
 		dst = skb_dst(skb);
+		if (unlikely(!dst)) {
+			err = -ENETUNREACH;
+			goto drop;
+		}
 
 		/* cache only if we don't create a dst reference loop */
 		if (!dst->error && lwtst != dst->lwtstate) {
diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index 97b50d9b1365..94284b483be0 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -515,7 +515,16 @@ static int seg6_input_core(struct net *net, struct sock *sk,
 
 	if (!dst) {
 		ip6_route_input(skb);
+
+		/* ip6_route_input() sets a NOREF dst; force a refcount on it
+		 * before caching or further use.
+		 */
+		skb_dst_force(skb);
 		dst = skb_dst(skb);
+		if (unlikely(!dst)) {
+			err = -ENETUNREACH;
+			goto drop;
+		}
 
 		/* cache only if we don't create a dst reference loop */
 		if (!dst->error && lwtst != dst->lwtstate) {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels
  2026-04-21  9:47 [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels Andrea Mayer
@ 2026-04-21 14:25 ` Simon Horman
  2026-04-21 17:33 ` Justin Iurman
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Simon Horman @ 2026-04-21 14:25 UTC (permalink / raw)
  To: Andrea Mayer
  Cc: davem, dsahern, edumazet, kuba, pabeni, bigeasy, clrkwllms,
	rostedt, david.lebrun, alex.aring, stefano.salsano, netdev,
	linux-rt-devel, linux-kernel, stable

On Tue, Apr 21, 2026 at 11:47:35AM +0200, Andrea Mayer wrote:
> seg6_input_core() and rpl_input() call ip6_route_input() which sets a
> NOREF dst on the skb, then pass it to dst_cache_set_ip6() invoking
> dst_hold() unconditionally.
> On PREEMPT_RT, ksoftirqd is preemptible and a higher-priority task can
> release the underlying pcpu_rt between the lookup and the caching
> through a concurrent FIB lookup on a shared nexthop.
> Simplified race sequence:
> 
>   ksoftirqd/X                       higher-prio task (same CPU X)
>   -----------                       --------------------------------
>   seg6_input_core(,skb)/rpl_input(skb)
>     dst_cache_get()
>       -> miss
>     ip6_route_input(skb)
>       -> ip6_pol_route(,skb,flags)
>          [RT6_LOOKUP_F_DST_NOREF in flags]
>         -> FIB lookup resolves fib6_nh
>            [nhid=N route]
>         -> rt6_make_pcpu_route()
>            [creates pcpu_rt, refcount=1]
>              pcpu_rt->sernum = fib6_sernum
>              [fib6_sernum=W]
>            -> cmpxchg(fib6_nh.rt6i_pcpu,
>                       NULL, pcpu_rt)
>               [slot was empty, store succeeds]
>       -> skb_dst_set_noref(skb, dst)
>          [dst is pcpu_rt, refcount still 1]
> 
>                                     rt_genid_bump_ipv6()
>                                       -> bumps fib6_sernum
>                                          [fib6_sernum from W to Z]
>                                     ip6_route_output()
>                                       -> ip6_pol_route()
>                                         -> FIB lookup resolves fib6_nh
>                                            [nhid=N]
>                                         -> rt6_get_pcpu_route()
>                                              pcpu_rt->sernum != fib6_sernum
>                                              [W <> Z, stale]
>                                           -> prev = xchg(rt6i_pcpu, NULL)
>                                           -> dst_release(prev)
>                                              [prev is pcpu_rt,
>                                               refcount 1->0, dead]
> 
>     dst = skb_dst(skb)
>     [dst is the dead pcpu_rt]
>     dst_cache_set_ip6(dst)
>       -> dst_hold() on dead dst
>       -> WARN / use-after-free
> 
> For the race to occur, ksoftirqd must be preemptible (PREEMPT_RT without
> PREEMPT_RT_NEEDS_BH_LOCK) and a concurrent task must be able to release
> the pcpu_rt. Shared nexthop objects provide such a path, as two routes
> pointing to the same nhid share the same fib6_nh and its rt6i_pcpu
> entry.
> 
> Fix seg6_input_core() and rpl_input() by calling skb_dst_force() after
> ip6_route_input() to force the NOREF dst into a refcounted one before
> caching.
> The output path is not affected as ip6_route_output() already returns a
> refcounted dst.
> 
> Fixes: af4a2209b134 ("ipv6: sr: use dst_cache in seg6_input")
> Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
> Cc: stable@vger.kernel.org
> Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>

Reviewed-by: Simon Horman <horms@kernel.org>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels
  2026-04-21  9:47 [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels Andrea Mayer
  2026-04-21 14:25 ` Simon Horman
@ 2026-04-21 17:33 ` Justin Iurman
  2026-04-23  8:00 ` Sebastian Andrzej Siewior
  2026-04-28  9:30 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 7+ messages in thread
From: Justin Iurman @ 2026-04-21 17:33 UTC (permalink / raw)
  To: Andrea Mayer, davem, dsahern, edumazet, kuba, pabeni, horms
  Cc: bigeasy, clrkwllms, rostedt, david.lebrun, alex.aring,
	stefano.salsano, netdev, linux-rt-devel, linux-kernel, stable

On 4/21/26 11:47, Andrea Mayer wrote:
> seg6_input_core() and rpl_input() call ip6_route_input() which sets a
> NOREF dst on the skb, then pass it to dst_cache_set_ip6() invoking
> dst_hold() unconditionally.
> On PREEMPT_RT, ksoftirqd is preemptible and a higher-priority task can
> release the underlying pcpu_rt between the lookup and the caching
> through a concurrent FIB lookup on a shared nexthop.
> Simplified race sequence:
> 
>    ksoftirqd/X                       higher-prio task (same CPU X)
>    -----------                       --------------------------------
>    seg6_input_core(,skb)/rpl_input(skb)
>      dst_cache_get()
>        -> miss
>      ip6_route_input(skb)
>        -> ip6_pol_route(,skb,flags)
>           [RT6_LOOKUP_F_DST_NOREF in flags]
>          -> FIB lookup resolves fib6_nh
>             [nhid=N route]
>          -> rt6_make_pcpu_route()
>             [creates pcpu_rt, refcount=1]
>               pcpu_rt->sernum = fib6_sernum
>               [fib6_sernum=W]
>             -> cmpxchg(fib6_nh.rt6i_pcpu,
>                        NULL, pcpu_rt)
>                [slot was empty, store succeeds]
>        -> skb_dst_set_noref(skb, dst)
>           [dst is pcpu_rt, refcount still 1]
> 
>                                      rt_genid_bump_ipv6()
>                                        -> bumps fib6_sernum
>                                           [fib6_sernum from W to Z]
>                                      ip6_route_output()
>                                        -> ip6_pol_route()
>                                          -> FIB lookup resolves fib6_nh
>                                             [nhid=N]
>                                          -> rt6_get_pcpu_route()
>                                               pcpu_rt->sernum != fib6_sernum
>                                               [W <> Z, stale]
>                                            -> prev = xchg(rt6i_pcpu, NULL)
>                                            -> dst_release(prev)
>                                               [prev is pcpu_rt,
>                                                refcount 1->0, dead]
> 
>      dst = skb_dst(skb)
>      [dst is the dead pcpu_rt]
>      dst_cache_set_ip6(dst)
>        -> dst_hold() on dead dst
>        -> WARN / use-after-free
> 
> For the race to occur, ksoftirqd must be preemptible (PREEMPT_RT without
> PREEMPT_RT_NEEDS_BH_LOCK) and a concurrent task must be able to release
> the pcpu_rt. Shared nexthop objects provide such a path, as two routes
> pointing to the same nhid share the same fib6_nh and its rt6i_pcpu
> entry.
> 
> Fix seg6_input_core() and rpl_input() by calling skb_dst_force() after
> ip6_route_input() to force the NOREF dst into a refcounted one before
> caching.
> The output path is not affected as ip6_route_output() already returns a
> refcounted dst.
> 
> Fixes: af4a2209b134 ("ipv6: sr: use dst_cache in seg6_input")
> Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
> Cc: stable@vger.kernel.org
> Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
> ---
>   net/ipv6/rpl_iptunnel.c  | 9 +++++++++
>   net/ipv6/seg6_iptunnel.c | 9 +++++++++
>   2 files changed, 18 insertions(+)
> 
> diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c
> index c7942cf65567..4e10adcd70e8 100644
> --- a/net/ipv6/rpl_iptunnel.c
> +++ b/net/ipv6/rpl_iptunnel.c
> @@ -287,7 +287,16 @@ static int rpl_input(struct sk_buff *skb)
>   
>   	if (!dst) {
>   		ip6_route_input(skb);
> +
> +		/* ip6_route_input() sets a NOREF dst; force a refcount on it
> +		 * before caching or further use.
> +		 */
> +		skb_dst_force(skb);
>   		dst = skb_dst(skb);
> +		if (unlikely(!dst)) {
> +			err = -ENETUNREACH;
> +			goto drop;
> +		}
>   
>   		/* cache only if we don't create a dst reference loop */
>   		if (!dst->error && lwtst != dst->lwtstate) {
> diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
> index 97b50d9b1365..94284b483be0 100644
> --- a/net/ipv6/seg6_iptunnel.c
> +++ b/net/ipv6/seg6_iptunnel.c
> @@ -515,7 +515,16 @@ static int seg6_input_core(struct net *net, struct sock *sk,
>   
>   	if (!dst) {
>   		ip6_route_input(skb);
> +
> +		/* ip6_route_input() sets a NOREF dst; force a refcount on it
> +		 * before caching or further use.
> +		 */
> +		skb_dst_force(skb);
>   		dst = skb_dst(skb);
> +		if (unlikely(!dst)) {
> +			err = -ENETUNREACH;
> +			goto drop;
> +		}
>   
>   		/* cache only if we don't create a dst reference loop */
>   		if (!dst->error && lwtst != dst->lwtstate) {

Thanks for taking care of this, Andrea! LGTM.

Reviewed-by: Justin Iurman <justin.iurman@gmail.com>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels
  2026-04-21  9:47 [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels Andrea Mayer
  2026-04-21 14:25 ` Simon Horman
  2026-04-21 17:33 ` Justin Iurman
@ 2026-04-23  8:00 ` Sebastian Andrzej Siewior
  2026-04-25 14:08   ` Andrea Mayer
  2026-04-28  9:30 ` patchwork-bot+netdevbpf
  3 siblings, 1 reply; 7+ messages in thread
From: Sebastian Andrzej Siewior @ 2026-04-23  8:00 UTC (permalink / raw)
  To: Andrea Mayer
  Cc: davem, dsahern, edumazet, kuba, pabeni, horms, clrkwllms, rostedt,
	david.lebrun, alex.aring, stefano.salsano, netdev, linux-rt-devel,
	linux-kernel, stable

On 2026-04-21 11:47:35 [+0200], Andrea Mayer wrote:
> seg6_input_core() and rpl_input() call ip6_route_input() which sets a
> NOREF dst on the skb, then pass it to dst_cache_set_ip6() invoking
> dst_hold() unconditionally.
> On PREEMPT_RT, ksoftirqd is preemptible and a higher-priority task can
> release the underlying pcpu_rt between the lookup and the caching
> through a concurrent FIB lookup on a shared nexthop.
> Simplified race sequence:
> 
>   ksoftirqd/X                       higher-prio task (same CPU X)
>   -----------                       --------------------------------
>   seg6_input_core(,skb)/rpl_input(skb)
>     dst_cache_get()
>       -> miss
>     ip6_route_input(skb)
>       -> ip6_pol_route(,skb,flags)
>          [RT6_LOOKUP_F_DST_NOREF in flags]
>         -> FIB lookup resolves fib6_nh
>            [nhid=N route]
>         -> rt6_make_pcpu_route()
>            [creates pcpu_rt, refcount=1]
>              pcpu_rt->sernum = fib6_sernum
>              [fib6_sernum=W]
>            -> cmpxchg(fib6_nh.rt6i_pcpu,
>                       NULL, pcpu_rt)
>               [slot was empty, store succeeds]
>       -> skb_dst_set_noref(skb, dst)
>          [dst is pcpu_rt, refcount still 1]
> 
>                                     rt_genid_bump_ipv6()
>                                       -> bumps fib6_sernum
>                                          [fib6_sernum from W to Z]
>                                     ip6_route_output()
>                                       -> ip6_pol_route()
>                                         -> FIB lookup resolves fib6_nh
>                                            [nhid=N]
>                                         -> rt6_get_pcpu_route()
>                                              pcpu_rt->sernum != fib6_sernum
>                                              [W <> Z, stale]
>                                           -> prev = xchg(rt6i_pcpu, NULL)
>                                           -> dst_release(prev)
>                                              [prev is pcpu_rt,
>                                               refcount 1->0, dead]
> 
>     dst = skb_dst(skb)
>     [dst is the dead pcpu_rt]
>     dst_cache_set_ip6(dst)
>       -> dst_hold() on dead dst
>       -> WARN / use-after-free

So the dst passed to skb_dst_set_noref() has no reference count. The fix
is to use skb_dst_force() to increment the refcount on it. But this
requires that we are in the same RCU section. And I guess we are since
none of the warnings are visible.

Doesn't this make ip6_route_input() on RT fragile in general due to the
RT6_LOOKUP_F_DST_NOREF usage or here something special about the two
files that are patched?
Based on your explanation it all makes sense, I am just not sure if this
race is limited to those two are if there is more to it.

> For the race to occur, ksoftirqd must be preemptible (PREEMPT_RT without
> PREEMPT_RT_NEEDS_BH_LOCK) and a concurrent task must be able to release
> the pcpu_rt. Shared nexthop objects provide such a path, as two routes
> pointing to the same nhid share the same fib6_nh and its rt6i_pcpu
> entry.
> 
> Fix seg6_input_core() and rpl_input() by calling skb_dst_force() after
> ip6_route_input() to force the NOREF dst into a refcounted one before
> caching.
> The output path is not affected as ip6_route_output() already returns a
> refcounted dst.
> 
> Fixes: af4a2209b134 ("ipv6: sr: use dst_cache in seg6_input")
> Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")

If having PREEMPT_RT_NEEDS_BH_LOCK unset is the requirement then the
right fixes: would be
Fixes: 3253cb49cbad4 ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT")

as prior this commit the race is not possible, right?

Does this mean that rpl_input() does a local_bh_disable() while
obtaining the dst but it never runs outside of bh-disabled section?
Because if it can run in preemptible context then it would not be to
PREEMPT_RT at which point the Fixes: tags from above would make sense
again.

> Cc: stable@vger.kernel.org
> Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>

Sebastian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels
  2026-04-23  8:00 ` Sebastian Andrzej Siewior
@ 2026-04-25 14:08   ` Andrea Mayer
  2026-04-28  9:14     ` Paolo Abeni
  0 siblings, 1 reply; 7+ messages in thread
From: Andrea Mayer @ 2026-04-25 14:08 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior
  Cc: davem, dsahern, edumazet, kuba, pabeni, horms, clrkwllms, rostedt,
	david.lebrun, alex.aring, Justin Iurman, stefano.salsano, netdev,
	linux-rt-devel, linux-kernel, stable, Andrea Mayer

On Thu, 23 Apr 2026 10:00:56 +0200
Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:

Hi Sebastian,

thanks for the review, and to Simon and Justin as well.


> On 2026-04-21 11:47:35 [+0200], Andrea Mayer wrote:
> >
> > [snip]
> 
> So the dst passed to skb_dst_set_noref() has no reference count. The fix
> is to use skb_dst_force() to increment the refcount on it. But this
> requires that we are in the same RCU section. And I guess we are since
> none of the warnings are visible.
 
Yes. lwtunnel_input() holds rcu_read_lock() around ops->input(), which is
where seg6_input_core()/rpl_input() execute. The skb_dst_force() is called
within that RCU section.
 

> Doesn't this make ip6_route_input() on RT fragile in general due to the
> RT6_LOOKUP_F_DST_NOREF usage or here something special about the two
> files that are patched?
> Based on your explanation it all makes sense, I am just not sure if this
> race is limited to those two are if there is more to it.

seg6_input_core() and rpl_input() cache the dst via dst_cache_set_ip6(), which
invokes dst_hold(). The dst_hold() calls rcuref_get(), failing on a zero
refcount and triggering a WARN, but the pointer is still stored in the cache.
After the RCU grace period completes the dst is freed, and a subsequent
dst_cache_get() returns a dangling pointer.
 
The other callers of ip6_route_input() (e.g., ipv6_srh_rcv, ipv6_rpl_srh_rcv,
ip6_rcv_finish_core) consume the NOREF dst without caching it. Even if the
pcpu_rt's refcount is concurrently dropped to zero, the dst memory remains
valid because dst_release() defers the actual free via call_rcu_hurry() and the
caller is still inside the RCU read-side critical section.


> > [snip]
> >
> > Fixes: af4a2209b134 ("ipv6: sr: use dst_cache in seg6_input")
> > Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
> 
> If having PREEMPT_RT_NEEDS_BH_LOCK unset is the requirement then the
> right fixes: would be
> Fixes: 3253cb49cbad4 ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT")
> 
> as prior this commit the race is not possible, right?

I built and tested kernels at 3253cb49cbad and its parent fd4e876f59b7 (both
CONFIG_PREEMPT_RT=y, without the fix): no issues at fd4e876f59b7.
At 3253cb49cbad, a pcpu_rt cmpxchg contention in rt6_make_pcpu_route() shows
up, which was addressed in 1adaea51c61b. I also tested at 1adaea51c61b, and at
that point the dst_hold() race described in this patch appears.
 
The seg6/rpl code obtains a NOREF dst from ip6_route_input(), does not promote
it via skb_dst_force(), and passes it to dst_cache_set_ip6() which calls
dst_hold(). This pattern has been present since af4a2209b134 and a7a29f9c361f,
and the current Fixes: tags point to the commits where it was introduced.
Does that seem reasonable?


> Does this mean that rpl_input() does a local_bh_disable() while
> obtaining the dst but it never runs outside of bh-disabled section?
> Because if it can run in preemptible context then it would not be to
> PREEMPT_RT at which point the Fixes: tags from above would make sense
> again.
> 

rpl_input() and seg6_input_core() run in softirq context via lwtunnel_input().
They do local_bh_disable() around dst_cache_get() and dst_cache_set_ip6(), but
not around ip6_route_input(). The race window is between ip6_route_input()
returning and dst_cache_set_ip6().
 
> Sebastian

Ciao,
Andrea

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels
  2026-04-25 14:08   ` Andrea Mayer
@ 2026-04-28  9:14     ` Paolo Abeni
  0 siblings, 0 replies; 7+ messages in thread
From: Paolo Abeni @ 2026-04-28  9:14 UTC (permalink / raw)
  To: Andrea Mayer, Sebastian Andrzej Siewior
  Cc: davem, dsahern, edumazet, kuba, horms, clrkwllms, rostedt,
	david.lebrun, alex.aring, Justin Iurman, stefano.salsano, netdev,
	linux-rt-devel, linux-kernel, stable

On 4/25/26 4:08 PM, Andrea Mayer wrote:
> On Thu, 23 Apr 2026 10:00:56 +0200
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
> 
> Hi Sebastian,
> 
> thanks for the review, and to Simon and Justin as well.
> 
> 
>> On 2026-04-21 11:47:35 [+0200], Andrea Mayer wrote:
>>>
>>> [snip]
>>
>> So the dst passed to skb_dst_set_noref() has no reference count. The fix
>> is to use skb_dst_force() to increment the refcount on it. But this
>> requires that we are in the same RCU section. And I guess we are since
>> none of the warnings are visible.
>  
> Yes. lwtunnel_input() holds rcu_read_lock() around ops->input(), which is
> where seg6_input_core()/rpl_input() execute. The skb_dst_force() is called
> within that RCU section.
>  
> 
>> Doesn't this make ip6_route_input() on RT fragile in general due to the
>> RT6_LOOKUP_F_DST_NOREF usage or here something special about the two
>> files that are patched?
>> Based on your explanation it all makes sense, I am just not sure if this
>> race is limited to those two are if there is more to it.
> 
> seg6_input_core() and rpl_input() cache the dst via dst_cache_set_ip6(), which
> invokes dst_hold(). The dst_hold() calls rcuref_get(), failing on a zero
> refcount and triggering a WARN, but the pointer is still stored in the cache.
> After the RCU grace period completes the dst is freed, and a subsequent
> dst_cache_get() returns a dangling pointer.
>  
> The other callers of ip6_route_input() (e.g., ipv6_srh_rcv, ipv6_rpl_srh_rcv,
> ip6_rcv_finish_core) consume the NOREF dst without caching it. Even if the
> pcpu_rt's refcount is concurrently dropped to zero, the dst memory remains
> valid because dst_release() defers the actual free via call_rcu_hurry() and the
> caller is still inside the RCU read-side critical section.
> 
> 
>>> [snip]
>>>
>>> Fixes: af4a2209b134 ("ipv6: sr: use dst_cache in seg6_input")
>>> Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
>>
>> If having PREEMPT_RT_NEEDS_BH_LOCK unset is the requirement then the
>> right fixes: would be
>> Fixes: 3253cb49cbad4 ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT")
>>
>> as prior this commit the race is not possible, right?
> 
> I built and tested kernels at 3253cb49cbad and its parent fd4e876f59b7 (both
> CONFIG_PREEMPT_RT=y, without the fix): no issues at fd4e876f59b7.
> At 3253cb49cbad, a pcpu_rt cmpxchg contention in rt6_make_pcpu_route() shows
> up, which was addressed in 1adaea51c61b. I also tested at 1adaea51c61b, and at
> that point the dst_hold() race described in this patch appears.
>  
> The seg6/rpl code obtains a NOREF dst from ip6_route_input(), does not promote
> it via skb_dst_force(), and passes it to dst_cache_set_ip6() which calls
> dst_hold(). This pattern has been present since af4a2209b134 and a7a29f9c361f,
> and the current Fixes: tags point to the commits where it was introduced.
> Does that seem reasonable?

I think the above is correct, but also pointing to 3253cb49cbad4 would
be correct, since the latter is required to exploit the problem. I also
think cases like this one it's better to avoid the repost (since the
constant ML flood) and to err on the conservative side (older hash). So
I'm applying the patch as-is.

Thanks,

Paolo


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels
  2026-04-21  9:47 [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels Andrea Mayer
                   ` (2 preceding siblings ...)
  2026-04-23  8:00 ` Sebastian Andrzej Siewior
@ 2026-04-28  9:30 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 7+ messages in thread
From: patchwork-bot+netdevbpf @ 2026-04-28  9:30 UTC (permalink / raw)
  To: Andrea Mayer
  Cc: davem, dsahern, edumazet, kuba, pabeni, horms, bigeasy, clrkwllms,
	rostedt, david.lebrun, alex.aring, stefano.salsano, netdev,
	linux-rt-devel, linux-kernel, stable

Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Tue, 21 Apr 2026 11:47:35 +0200 you wrote:
> seg6_input_core() and rpl_input() call ip6_route_input() which sets a
> NOREF dst on the skb, then pass it to dst_cache_set_ip6() invoking
> dst_hold() unconditionally.
> On PREEMPT_RT, ksoftirqd is preemptible and a higher-priority task can
> release the underlying pcpu_rt between the lookup and the caching
> through a concurrent FIB lookup on a shared nexthop.
> Simplified race sequence:
> 
> [...]

Here is the summary with links:
  - [net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels
    https://git.kernel.org/netdev/net/c/f9c52a6ba978

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2026-04-28  9:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-21  9:47 [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels Andrea Mayer
2026-04-21 14:25 ` Simon Horman
2026-04-21 17:33 ` Justin Iurman
2026-04-23  8:00 ` Sebastian Andrzej Siewior
2026-04-25 14:08   ` Andrea Mayer
2026-04-28  9:14     ` Paolo Abeni
2026-04-28  9:30 ` patchwork-bot+netdevbpf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox