From: Hyunwoo Kim <imv4bel@gmail.com>
To: Sabrina Dubroca <sd@queasysnail.net>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, horms@kernel.org, Julia.Lawall@inria.fr,
	linux@treblig.org, nate.karstens@garmin.com,
	netdev@vger.kernel.org, imv4bel@gmail.com
Subject: Re: [PATCH net v2] strparser: Fix race condition in strp_done()
Date: Sat, 21 Mar 2026 04:07:47 +0900
Message-ID: <ab2bA-g2qnwhdWbq@v4bel>
In-Reply-To: <abDr-fSP9EtP7uEQ@v4bel>
On Wed, Mar 11, 2026 at 01:13:45PM +0900, Hyunwoo Kim wrote:
> On Fri, Mar 06, 2026 at 08:41:02PM +0900, Hyunwoo Kim wrote:
> > On Fri, Mar 06, 2026 at 11:13:19AM +0100, Sabrina Dubroca wrote:
> > > 2026-03-06, 09:11:04 +0900, Hyunwoo Kim wrote:
> > > > On Fri, Mar 06, 2026 at 12:35:48AM +0100, Sabrina Dubroca wrote:
> > > > > Sorry for the delay, I wanted to think about the race condition a bit
> > > > > more.
> > > > >
> > > > > 2026-03-03, 10:50:05 +0900, Hyunwoo Kim wrote:
> > > > > > On Tue, Mar 03, 2026 at 12:10:33AM +0100, Sabrina Dubroca wrote:
> > > > > > > 2026-02-27, 06:51:10 +0900, Hyunwoo Kim wrote:
> > > > > > > > On Mon, Feb 23, 2026 at 06:20:58PM +0100, Sabrina Dubroca wrote:
> > > > > > > > > 2026-02-20, 18:29:55 +0900, Hyunwoo Kim wrote:
> > > > > > > > > "strp stopped" is not really enough, I think we'd also need to reset
> > > > > > > > > the CBs, and then grab bh_lock_sock to make sure a previously-running
> > > > > > > > > ->sk_data_ready has completed. This is what kcm does, at least.
> > > > > > > >
> > > > > > > > It seems that this is not something that should be handled inside strp itself,
> > > > > > > > but rather something that each caller of strp_stop() is expected to take care
> > > > > > > > of individually. Would that be the right direction?
> > > > > > >
> > > > > > > Agree.
> > > > > > >
> > > > > > > > It also appears that ovpn and kcm handle this by implementing their own callback
> > > > > > > > restoration logic.
> > > > > > >
> > > > > > > Right. I tried to look at skmsg/psock (the other user of strp), but
> > > > > > > didn't get far enough to verify if it's handling this correctly.
> > > > > > >
> > > > > > > > > Without that, if strp_recv runs in parallel (not from strp->work) with
> > > > > > > > > strp_done, cleaning up skb_head in strp_done seems problematic.
> > > > > > > >
> > > > > > > > From the espintcp perspective, how about applying a patch along the following lines?
> > > > > > >
> > > > > > > This is what I was thinking about, yes.
> > > > > >
> > > > > > In my opinion, it might be cleaner to split the espintcp callback restoration work into
> > > > > > a separate patch, rather than merging it into the strparser v3 patch. What do you think?
> > > > >
> > > > > Sure. But once espintcp is fixed in that way, can the original race
> > > > > condition with strparser still occur? release_sock() will wait for any
> > > >
> > > > If the espintcp callback restoration patch is applied, the strparser
> > > > race should no longer occur in espintcp.
> > > >
> > > > > espintcp_data_ready()/strp_data_ready() that's already running, and a
> > > > > sk_data_ready that starts after we've changed the callbacks will not
> > > > > end up in strp_data_ready() at all so it won't restart the works that
> > > > > are being stopped by strp_done()?
> > > > >
> > > > > It's quite reasonable to use disable*_work_sync in strp_done, but I'm
> > > > > not sure there's a bug other than espintcp not terminating itself
> > > > > correctly on the socket.
> > > >
> > > > That said, the _cancel APIs in strparser still appear to carry some
> > > > structural risk, so it might still make sense to switch to the _disable
> > > > APIs for the benefit of other strp users or potential future callers.
> > >
> > > Not really. Every user of strp that is open to the strp_recv vs
> > > cancel_* race is also open to the strp_recv vs free race, so switching
> > > from cancel_* to disable_* is only a partial fix.
> > >
> > > But if we took and released the socket lock in strp_done, we would
> > > solve the issue for all users even without resetting the callbacks?
> >
> > Looks good to me. With this change, it seems the issue can be resolved
> > not only for espintcp but for all strp users.
> >
> > When strp_stop() runs first:
> > ```
> > cpu0                            cpu1
> >
> > espintcp_close()
> >   strp_stop()
> >     strp->stopped = 1;
> >                                 espintcp_data_ready()
> >                                   strp_data_ready()
> >                                     if (unlikely(strp->stopped)) return;
> >   strp_done()
> >     lock_sock();
> >     release_sock();
> >     cancel_delayed_work_sync(&strp->msg_timer_work);
> >     kfree_skb(strp->skb_head);
> > ```
> >
> > When strp_data_ready() runs first:
> > ```
> > cpu0                            cpu1
> >
> >                                 espintcp_data_ready()
> >                                   strp_data_ready()
> >                                     if (unlikely(strp->stopped)) return;
> > espintcp_close()
> >   strp_stop()
> >     strp->stopped = 1;
> >   strp_done()
> >     lock_sock();
> >                                     strp_read_sock()
> >                                       tcp_read_sock()
> >                                         __tcp_read_sock()
> >                                           strp_recv()
> >                                             __strp_recv()
> >                                               head = strp->skb_head;
> >                                               strp_start_timer()
> >                                                 mod_delayed_work(&strp->msg_timer_work);
> >                                     ...
> >     release_sock();
> >     cancel_delayed_work_sync(&strp->msg_timer_work);
> >     kfree_skb(strp->skb_head);
> > ```
> > In both cases, the race does not appear to cause any problem.
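
The second timeline can be modelled in userspace as a rough sketch. It is
only an analogy and not kernel code: a pthread mutex stands in for the socket
lock, and struct model_strp, model_data_ready() and model_teardown() are
hypothetical names made up for this illustration. The work items and their
cancellation are not modelled at all. The reader is forced past the stopped
check before the closer sets the flag, so the empty lock/unlock pair is the
only thing keeping the free from racing with the reader.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the strparser state used here. */
struct model_strp {
	pthread_mutex_t sk_lock;    /* models the socket lock */
	atomic_bool stopped;        /* models strp->stopped */
	atomic_bool reader_checked; /* reader got past the stopped check */
	int *skb_head;              /* models strp->skb_head */
};

/* Models strp_data_ready(): runs entirely under the socket lock. */
static void *model_data_ready(void *arg)
{
	struct model_strp *strp = arg;

	pthread_mutex_lock(&strp->sk_lock);
	if (!atomic_load(&strp->stopped)) {
		/* Force the second timeline: wait until the closer sets stopped. */
		atomic_store(&strp->reader_checked, true);
		while (!atomic_load(&strp->stopped))
			sched_yield();
		/* Models __strp_recv() still touching skb_head. */
		*strp->skb_head += 1;
	}
	pthread_mutex_unlock(&strp->sk_lock);
	return NULL;
}

/* Models espintcp_close(); returns the last value seen in skb_head. */
int model_teardown(void)
{
	struct model_strp strp;
	pthread_t reader;
	int rc, seen;

	pthread_mutex_init(&strp.sk_lock, NULL);
	atomic_init(&strp.stopped, false);
	atomic_init(&strp.reader_checked, false);
	strp.skb_head = calloc(1, sizeof(*strp.skb_head));
	assert(strp.skb_head);

	rc = pthread_create(&reader, NULL, model_data_ready, &strp);
	assert(rc == 0);
	(void)rc;

	while (!atomic_load(&strp.reader_checked))
		sched_yield();
	atomic_store(&strp.stopped, true);	/* strp_stop() */

	/* strp_done(): the empty lock/unlock pair waits for the reader. */
	pthread_mutex_lock(&strp.sk_lock);
	pthread_mutex_unlock(&strp.sk_lock);

	seen = *strp.skb_head;
	free(strp.skb_head);			/* kfree_skb(strp->skb_head) */

	pthread_join(reader, NULL);
	pthread_mutex_destroy(&strp.sk_lock);
	return seen;
}
```

In this model, model_teardown() returns 1: the reader's write to skb_head
always completes before the closer reads and frees it, because the closer
cannot take the mutex until the reader has dropped it.
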
> >
> > >
> > > @@ -503,6 +503,10 @@ void strp_done(struct strparser *strp)
> > >  {
> > >  	WARN_ON(!strp->stopped);
> > >
> > > +	lock_sock(strp->sk);
> > > +	/* sync with other code */
> > > +	release_sock(strp->sk);
> > > +
> > >  	cancel_delayed_work_sync(&strp->msg_timer_work);
> > >  	cancel_work_sync(&strp->work);
> > >
> > >
> > >
> > > - strp->stopped so any new call into strp_data_ready will not do anything
> > >
> > > - lock/release need to take bh_lock_sock so any existing call to
> > > strp_data_ready will have to complete before we move on to cancel*_work
> > >
> > >
> > >
> > > Or maybe the requirement should be that strp_stop has to be called
> >
> > From my perspective, adding lock_sock() inside strp_done(), as in the
> > patch above, looks cleaner.
> >
> > > under lock_sock() (or even just bh_lock_sock), but again I can't
> > > figure out if that's ok for sockmap.
> >
> > sockmap/psock has a more complex call stack compared to other strp
> > users, so I'm also not entirely certain about that part.
>
> I looked into the sockmap/psock side. sk_psock_strp_data_ready() is protected
> by read_lock_bh(&sk->sk_callback_lock), and during teardown sk_psock_drop()
> performs callback restoration and strp_stop() under
> write_lock_bh(&sk->sk_callback_lock), so sockmap/psock doesn't have this race
> to begin with.
>
> As for introducing this patch, in sockmap/psock strp_done() is only called
> from sk_psock_destroy(), which is scheduled via queue_rcu_work() and runs
> on system_percpu_wq after an RCU GP, so no locks including lock_sock are held
> at that point. And since lock_sock is released before cancel_work_sync,
> there's no circular dependency with do_strp_work/strp_msg_timeout either.
> So this patch shouldn't introduce any new issues for sockmap/psock.

Hi Sabrina,

Could you please provide an update on the status of this patch?

Best regards,
Hyunwoo Kim

>
> >
> > >
> > >
> > > > With that in mind, perhaps the direct fix for this race could be handled
> > > > in the espintcp callback restoration patch. For the strparser patch, I
> > > > could instead adjust the commit message to reflect that it removes a
> > > > potential hazard by replacing the _cancel APIs with the _disable
> > > > variants, and resubmit it in that form.
> > >
> > > I'm not going to nack a patch doing s/cancel_/disable_/ in strp_done,
> > > but it doesn't fully solve the race condition if the caller isn't
> > > doing the right thing, and it doesn't do anything if the strp user is
> > > handling the teardown correctly.
> >
> > I agree with your point there. Still, after the core patches addressing
> > this race are applied, I plan to resubmit the _disable patch with an
> > updated commit message. I think applying that change is still beneficial.
> >
> >
> > Best regards,
> > Hyunwoo Kim