* [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels
@ 2026-04-21  9:47 Andrea Mayer
  2026-04-21 14:25 ` Simon Horman
  2026-04-21 17:33 ` Justin Iurman
  0 siblings, 2 replies; 3+ messages in thread
From: Andrea Mayer @ 2026-04-21  9:47 UTC (permalink / raw)
  To: davem, dsahern, edumazet, kuba, pabeni, horms
  Cc: bigeasy, clrkwllms, rostedt, david.lebrun, alex.aring,
	stefano.salsano, andrea.mayer, netdev, linux-rt-devel,
	linux-kernel, stable

seg6_input_core() and rpl_input() call ip6_route_input(), which sets a
NOREF dst on the skb, and then pass that dst to dst_cache_set_ip6(),
which invokes dst_hold() unconditionally.
On PREEMPT_RT, ksoftirqd is preemptible, so a higher-priority task can
release the underlying pcpu_rt between the lookup and the caching
through a concurrent FIB lookup on a shared nexthop.
Simplified race sequence:

  ksoftirqd/X                       higher-prio task (same CPU X)
  -----------                       --------------------------------
  seg6_input_core(,skb)/rpl_input(skb)
    dst_cache_get()
      -> miss
    ip6_route_input(skb)
      -> ip6_pol_route(,skb,flags)
         [RT6_LOOKUP_F_DST_NOREF in flags]
        -> FIB lookup resolves fib6_nh
           [nhid=N route]
        -> rt6_make_pcpu_route()
           [creates pcpu_rt, refcount=1]
             pcpu_rt->sernum = fib6_sernum
             [fib6_sernum=W]
           -> cmpxchg(fib6_nh.rt6i_pcpu,
                      NULL, pcpu_rt)
              [slot was empty, store succeeds]
      -> skb_dst_set_noref(skb, dst)
         [dst is pcpu_rt, refcount still 1]

                                    rt_genid_bump_ipv6()
                                      -> bumps fib6_sernum
                                         [fib6_sernum from W to Z]
                                    ip6_route_output()
                                      -> ip6_pol_route()
                                        -> FIB lookup resolves fib6_nh
                                           [nhid=N]
                                        -> rt6_get_pcpu_route()
                                             pcpu_rt->sernum != fib6_sernum
                                             [W <> Z, stale]
                                          -> prev = xchg(rt6i_pcpu, NULL)
                                          -> dst_release(prev)
                                             [prev is pcpu_rt,
                                              refcount 1->0, dead]

    dst = skb_dst(skb)
    [dst is the dead pcpu_rt]
    dst_cache_set_ip6(dst)
      -> dst_hold() on dead dst
      -> WARN / use-after-free

For the race to occur, ksoftirqd must be preemptible (PREEMPT_RT without
PREEMPT_RT_NEEDS_BH_LOCK) and a concurrent task must be able to release
the pcpu_rt. Shared nexthop objects provide such a path, as two routes
pointing to the same nhid share the same fib6_nh and its rt6i_pcpu
entry.
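
The ordering in the trace can be replayed in a small userspace model.
This is only a sketch: model_dst, model_slot, model_release() and
model_race() are hypothetical names, not kernel code, and the object is
kept on the stack so that reading the counter after the last release
stays well defined (in the kernel the memory would be on its way to
being freed).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dst entry: only the refcount is modelled. */
struct model_dst {
	atomic_int refcnt;
};

/* Hypothetical stand-in for the per-CPU slot (rt6i_pcpu in the trace). */
struct model_slot {
	struct model_dst *cached;
};

/* Drop one reference; returns true when this was the last one. */
static bool model_release(struct model_dst *dst)
{
	return atomic_fetch_sub(&dst->refcnt, 1) == 1;
}

/*
 * Replay the trace in a single thread and return the refcount that the
 * NOREF borrower would see right before its unconditional hold.
 */
static int model_race(void)
{
	struct model_dst pcpu_rt;
	struct model_slot slot = { .cached = NULL };
	struct model_dst *borrowed;
	struct model_dst *prev;
	bool died;

	/* Lookup side: the slot owns the only reference. */
	atomic_init(&pcpu_rt.refcnt, 1);
	slot.cached = &pcpu_rt;

	/* NOREF: the borrower keeps a pointer but takes no reference. */
	borrowed = slot.cached;

	/* Preempting task: finds the entry stale, empties the slot, releases. */
	prev = slot.cached;
	slot.cached = NULL;
	died = model_release(prev);
	assert(died);

	/* Back in the borrower: the counter is already zero here. */
	return atomic_load(&borrowed->refcnt);
}
```

Calling model_race() from a test harness returns 0, which is the state
in which the unconditional hold in the trace operates on a dead object.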

Fix seg6_input_core() and rpl_input() by calling skb_dst_force() after
ip6_route_input() to force the NOREF dst into a refcounted one before
caching.
The output path is not affected as ip6_route_output() already returns a
refcounted dst.
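
The NULL check added after skb_dst_force() follows from how the helper
behaves: as far as I understand it, the reference is taken only if the
refcount is still non-zero, and the skb's dst is cleared otherwise. A
minimal userspace sketch of that "hold unless already dead" pattern is
shown here; model_dst, model_hold_safe() and model_force() are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dst entry: only the refcount is modelled. */
struct model_dst {
	atomic_int refcnt;
};

/* Take a reference only if the object is still alive (refcount != 0). */
static bool model_hold_safe(struct model_dst *dst)
{
	int old = atomic_load(&dst->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&dst->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Turn a borrowed (NOREF) pointer into an owned one, or return NULL if
 * the object already died. The caller must handle the NULL case.
 */
static struct model_dst *model_force(struct model_dst *borrowed)
{
	assert(borrowed != NULL);
	return model_hold_safe(borrowed) ? borrowed : NULL;
}
```

With a live object (refcount 1) model_force() returns the same pointer
and the count becomes 2; with a dead object (refcount 0) it returns NULL
and leaves the count untouched, which corresponds to the new drop path.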

Fixes: af4a2209b134 ("ipv6: sr: use dst_cache in seg6_input")
Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
Cc: stable@vger.kernel.org
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it>
---
 net/ipv6/rpl_iptunnel.c  | 9 +++++++++
 net/ipv6/seg6_iptunnel.c | 9 +++++++++
 2 files changed, 18 insertions(+)

diff --git a/net/ipv6/rpl_iptunnel.c b/net/ipv6/rpl_iptunnel.c
index c7942cf65567..4e10adcd70e8 100644
--- a/net/ipv6/rpl_iptunnel.c
+++ b/net/ipv6/rpl_iptunnel.c
@@ -287,7 +287,16 @@ static int rpl_input(struct sk_buff *skb)
 
 	if (!dst) {
 		ip6_route_input(skb);
+
+		/* ip6_route_input() sets a NOREF dst; force a refcount on it
+		 * before caching or further use.
+		 */
+		skb_dst_force(skb);
 		dst = skb_dst(skb);
+		if (unlikely(!dst)) {
+			err = -ENETUNREACH;
+			goto drop;
+		}
 
 		/* cache only if we don't create a dst reference loop */
 		if (!dst->error && lwtst != dst->lwtstate) {
diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c
index 97b50d9b1365..94284b483be0 100644
--- a/net/ipv6/seg6_iptunnel.c
+++ b/net/ipv6/seg6_iptunnel.c
@@ -515,7 +515,16 @@ static int seg6_input_core(struct net *net, struct sock *sk,
 
 	if (!dst) {
 		ip6_route_input(skb);
+
+		/* ip6_route_input() sets a NOREF dst; force a refcount on it
+		 * before caching or further use.
+		 */
+		skb_dst_force(skb);
 		dst = skb_dst(skb);
+		if (unlikely(!dst)) {
+			err = -ENETUNREACH;
+			goto drop;
+		}
 
 		/* cache only if we don't create a dst reference loop */
 		if (!dst->error && lwtst != dst->lwtstate) {
-- 
2.20.1


* Re: [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels
  2026-04-21  9:47 [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels Andrea Mayer
@ 2026-04-21 14:25 ` Simon Horman
  2026-04-21 17:33 ` Justin Iurman
  1 sibling, 0 replies; 3+ messages in thread
From: Simon Horman @ 2026-04-21 14:25 UTC (permalink / raw)
  To: Andrea Mayer
  Cc: davem, dsahern, edumazet, kuba, pabeni, bigeasy, clrkwllms,
	rostedt, david.lebrun, alex.aring, stefano.salsano, netdev,
	linux-rt-devel, linux-kernel, stable

On Tue, Apr 21, 2026 at 11:47:35AM +0200, Andrea Mayer wrote:
> [...]

Reviewed-by: Simon Horman <horms@kernel.org>


* Re: [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels
  2026-04-21  9:47 [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels Andrea Mayer
  2026-04-21 14:25 ` Simon Horman
@ 2026-04-21 17:33 ` Justin Iurman
  1 sibling, 0 replies; 3+ messages in thread
From: Justin Iurman @ 2026-04-21 17:33 UTC (permalink / raw)
  To: Andrea Mayer, davem, dsahern, edumazet, kuba, pabeni, horms
  Cc: bigeasy, clrkwllms, rostedt, david.lebrun, alex.aring,
	stefano.salsano, netdev, linux-rt-devel, linux-kernel, stable

On 4/21/26 11:47, Andrea Mayer wrote:
> [...]

Thanks for taking care of this, Andrea! LGTM.

Reviewed-by: Justin Iurman <justin.iurman@gmail.com>
