From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Andrea Mayer <andrea.mayer@uniroma2.it>
Cc: davem@davemloft.net, dsahern@kernel.org, edumazet@google.com,
	kuba@kernel.org, pabeni@redhat.com, horms@kernel.org,
	clrkwllms@kernel.org, rostedt@goodmis.org,
	david.lebrun@uclouvain.be, alex.aring@gmail.com,
	Justin Iurman <justin.iurman@gmail.com>,
	stefano.salsano@uniroma2.it, netdev@vger.kernel.org,
	linux-rt-devel@lists.linux.dev, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org
Subject: Re: [PATCH net] net: ipv6: fix NOREF dst use in seg6 and rpl lwtunnels
Date: Wed, 29 Apr 2026 13:03:41 +0200
Message-ID: <20260429110341.ipXGaamM@linutronix.de>
In-Reply-To: <20260425160856.8cebade5eae1dcaec7af8bfe@uniroma2.it>
On 2026-04-25 16:08:56 [+0200], Andrea Mayer wrote:
> On Thu, 23 Apr 2026 10:00:56 +0200
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote:
>
> Hi Sebastian,
Hi Andrea,
> > Doesn't this make ip6_route_input() on RT fragile in general due to the
> > RT6_LOOKUP_F_DST_NOREF usage, or is there something special about the two
> > files that are patched?
> > Based on your explanation it all makes sense, I am just not sure if this
> > race is limited to those two or if there is more to it.
>
> seg6_input_core() and rpl_input() cache the dst via dst_cache_set_ip6(), which
> invokes dst_hold(). The dst_hold() calls rcuref_get(), failing on a zero
> refcount and triggering a WARN, but the pointer is still stored in the cache.
> After the RCU grace period completes the dst is freed, and a subsequent
> dst_cache_get() returns a dangling pointer.
>
> The other callers of ip6_route_input() (e.g., ipv6_srh_rcv, ipv6_rpl_srh_rcv,
> ip6_rcv_finish_core) consume the NOREF dst without caching it. Even if the
> pcpu_rt's refcount is concurrently dropped to zero, the dst memory remains
> valid because dst_release() defers the actual free via call_rcu_hurry() and the
> caller is still inside the RCU read-side critical section.
Ah, okay. Thank you for clearing that up.
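To make the failure mode concrete, here is a minimal userspace sketch of what
Andrea describes above. It is not kernel code: struct model_dst, struct
model_cache and every model_*/cache_* helper are hypothetical stand-ins, and
the RCU grace period is reduced to a flag. The "checked" variant only
illustrates the invariant (never cache a dst on which no reference could be
taken); it is not meant to mirror what the merged patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dst_entry. */
struct model_dst {
	int refcnt;	/* 0: the last reference has already been dropped */
	bool freed;	/* set once the modelled grace period has elapsed */
};

/* Hypothetical stand-in for a one-slot dst cache. */
struct model_cache {
	struct model_dst *dst;
};

/* Models a hold that refuses to revive a dead object (kernel would WARN). */
bool model_hold(struct model_dst *dst)
{
	if (dst->refcnt == 0)
		return false;
	dst->refcnt++;
	return true;
}

/* Problematic pattern: the pointer is stored even if the hold failed. */
void cache_set_unchecked(struct model_cache *c, struct model_dst *dst)
{
	(void)model_hold(dst);
	c->dst = dst;
}

/* Defensive pattern: store the pointer only if a reference was obtained. */
bool cache_set_checked(struct model_cache *c, struct model_dst *dst)
{
	if (!model_hold(dst))
		return false;
	c->dst = dst;
	return true;
}

/* Models the end of the grace period: an unreferenced object is freed. */
void model_grace_period(struct model_dst *dst)
{
	assert(dst != NULL);
	if (dst->refcnt == 0)
		dst->freed = true;
}

/* True if a later cache lookup would hand out a freed object. */
bool cache_is_dangling(const struct model_cache *c)
{
	return c->dst != NULL && c->dst->freed;
}
```

Feeding a model_dst with refcnt == 0 to cache_set_unchecked() and then calling
model_grace_period() leaves the cache dangling; with cache_set_checked() the
slot stays empty. A caller that never caches the pointer is unaffected either
way, which matches the distinction drawn for the other ip6_route_input() users.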
> > > [snip]
> > >
> > > Fixes: af4a2209b134 ("ipv6: sr: use dst_cache in seg6_input")
> > > Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
> >
> > If having PREEMPT_RT_NEEDS_BH_LOCK unset is the requirement then the
> > right fixes: would be
> > Fixes: 3253cb49cbad4 ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT")
> >
> > as prior to this commit the race is not possible, right?
>
> I built and tested kernels at 3253cb49cbad and its parent fd4e876f59b7 (both
> CONFIG_PREEMPT_RT=y, without the fix): no issues at fd4e876f59b7.
> At 3253cb49cbad, a pcpu_rt cmpxchg contention in rt6_make_pcpu_route() shows
> up, which was addressed in 1adaea51c61b. I also tested at 1adaea51c61b, and at
> that point the dst_hold() race described in this patch appears.
>
> The seg6/rpl code obtains a NOREF dst from ip6_route_input(), does not promote
> it via skb_dst_force(), and passes it to dst_cache_set_ip6() which calls
> dst_hold(). This pattern has been present since af4a2209b134 and a7a29f9c361f,
> and the current Fixes: tags point to the commits where it was introduced.
> Does that seem reasonable?
Yes. So based on that, the regression was introduced in 3253cb49cbad.
Before that, the lock guarded everything. That also means that
rpl_input() and seg6_input_core() are invoked in a BH-disabled section,
which is what makes it work on !RT.
> > Does this mean that rpl_input() does a local_bh_disable() while
> > obtaining the dst but it never runs outside of bh-disabled section?
> > Because if it can run in preemptible context then it would not be limited
> > to PREEMPT_RT, at which point the Fixes: tags from above would make sense
> > again.
> >
>
> rpl_input() and seg6_input_core() run in softirq context via lwtunnel_input().
> They do local_bh_disable() around dst_cache_get() and dst_cache_set_ip6(), but
> not around ip6_route_input(). The race window is between ip6_route_input()
> returning and dst_cache_set_ip6().
My point was that the Fixes: tag could be updated to 3253cb49cbad
instead. Since everything runs in softirq context, the
local_bh_disable() within those functions is not needed. Otherwise, if
this were not invoked in softirq context, preemption would also be
possible on !RT.
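A rough userspace replay of the ordering in question, as a sketch under
simplifying assumptions: struct rt_model and the rt_model_* helpers are
hypothetical, and the concurrent release is played back deterministically
instead of by a second thread. The flag window_is_serialized stands for "no
other BH user can run on this CPU between the lookup and the cache store",
which I assume holds on !RT in softirq context and on RT while the per-CPU
BH lock is still taken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-CPU route returned by the lookup. */
struct rt_model {
	int refcnt;
};

/* Models another context dropping its reference on the route. */
void rt_model_release(struct rt_model *rt)
{
	if (rt->refcnt > 0)
		rt->refcnt--;
}

/* Models a hold that fails once the refcount has reached zero. */
bool rt_model_hold(struct rt_model *rt)
{
	if (rt->refcnt == 0)
		return false;
	rt->refcnt++;
	return true;
}

/*
 * Replays: NOREF lookup -> window -> hold for caching.
 * Returns true if the hold taken for the cache succeeded.
 */
bool rt_model_input_path(bool window_is_serialized)
{
	struct rt_model pcpu_rt = { .refcnt = 1 };
	struct rt_model *noref = &pcpu_rt;	/* step 1: no reference taken */

	/* step 2: only an unserialized window lets the release slip in */
	if (!window_is_serialized)
		rt_model_release(&pcpu_rt);

	/* step 3: the hold performed when storing into the cache */
	return rt_model_hold(noref);
}
```

rt_model_input_path(true) returns true and rt_model_input_path(false) returns
false, which is the whole argument in two calls: the pattern itself predates
3253cb49cbad, but it only becomes reachable once the window stops being
serialized.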
Anyway, now it has been merged.
>
> Ciao,
> Andrea
Sebastian