From: Hyunwoo Kim <imv4bel@gmail.com>
To: Sabrina Dubroca <sd@queasysnail.net>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
    pabeni@redhat.com, horms@kernel.org, Julia.Lawall@inria.fr,
    linux@treblig.org, nate.karstens@garmin.com,
    netdev@vger.kernel.org, imv4bel@gmail.com
Subject: Re: [PATCH net v2] strparser: Fix race condition in strp_done()
Date: Fri, 6 Mar 2026 20:41:02 +0900
Message-ID: <aaq9TgFpsjn9XWAC@v4bel>
In-Reply-To: <aaqovyrmd5pJtZKq@krikkit>

On Fri, Mar 06, 2026 at 11:13:19AM +0100, Sabrina Dubroca wrote:
> 2026-03-06, 09:11:04 +0900, Hyunwoo Kim wrote:
> > On Fri, Mar 06, 2026 at 12:35:48AM +0100, Sabrina Dubroca wrote:
> > > Sorry for the delay, I wanted to think about the race condition a bit
> > > more.
> > >
> > > 2026-03-03, 10:50:05 +0900, Hyunwoo Kim wrote:
> > > > On Tue, Mar 03, 2026 at 12:10:33AM +0100, Sabrina Dubroca wrote:
> > > > > 2026-02-27, 06:51:10 +0900, Hyunwoo Kim wrote:
> > > > > > On Mon, Feb 23, 2026 at 06:20:58PM +0100, Sabrina Dubroca wrote:
> > > > > > > 2026-02-20, 18:29:55 +0900, Hyunwoo Kim wrote:
> > > > > > > "strp stopped" is not really enough, I think we'd also need to reset
> > > > > > > the CBs, and then grab bh_lock_sock to make sure a previously-running
> > > > > > > ->sk_data_ready has completed. This is what kcm does, at least.
> > > > > >
> > > > > > It seems that this is not something that should be handled inside strp itself,
> > > > > > but rather something that each caller of strp_stop() is expected to take care
> > > > > > of individually. Would that be the right direction?
> > > > >
> > > > > Agree.
> > > > >
> > > > > > It also appears that ovpn and kcm handle this by implementing their own callback
> > > > > > restoration logic.
> > > > >
> > > > > Right. I tried to look at skmsg/psock (the other user of strp), but
> > > > > didn't get far enough to verify if it's handling this correctly.
> > > > >
> > > > > > > Without that, if strp_recv runs in parallel (not from strp->work) with
> > > > > > > strp_done, cleaning up skb_head in strp_done seems problematic.
> > > > > >
> > > > > > From the espintcp perspective, how about applying a patch along the following lines?
> > > > >
> > > > > This is what I was thinking about, yes.
> > > >
> > > > In my opinion, it might be cleaner to split the espintcp callback restoration work into
> > > > a separate patch, rather than merging it into the strparser v3 patch. What do you think?
> > >
> > > Sure. But once espintcp is fixed in that way, can the original race
> > > condition with strparser still occur? release_sock() will wait for any
> >
> > If the espintcp callback restoration patch is applied, the strparser
> > race should no longer occur in espintcp.
> >
> > > espintcp_data_ready()/strp_data_ready() that's already running, and a
> > > sk_data_ready that starts after we've changed the callbacks will not
> > > end up in strp_data_ready() at all so it won't restart the works that
> > > are being stopped by strp_done()?
> > >
> > > It's quite reasonable to use disable*_work_sync in strp_done, but I'm
> > > not sure there's a bug other than espintcp not terminating itself
> > > correctly on the socket.
> >
> > That said, the _cancel APIs in strparser still appear to carry some
> > structural risk, so it might still make sense to switch to the _disable
> > APIs for the benefit of other strp users or potential future callers.
>
> Not really. Every user of strp that is open to the strp_recv vs
> cancel_* race is also open to the strp_recv vs free race, so switching
> from cancel_* to disable_* is only a partial fix.
>
> But if we took and released the socket lock in strp_done, we would
> solve the issue for all users even without resetting the callbacks?

Looks good to me. With this change, it seems the issue can be resolved
not only for espintcp but for all strp users.

When strp_stop() runs first:
```
cpu0                                cpu1

espintcp_close()
  strp_stop()
    strp->stopped = 1;
                                    espintcp_data_ready()
                                      strp_data_ready()
                                        if (unlikely(strp->stopped)) return;
  strp_done()
    lock_sock();
    release_sock();
    cancel_delayed_work_sync(&strp->msg_timer_work);
    kfree_skb(strp->skb_head);
```
When strp_data_ready() runs first:
```
cpu0                                cpu1

                                    espintcp_data_ready()
                                      strp_data_ready()
                                        if (unlikely(strp->stopped)) return;
espintcp_close()
  strp_stop()
    strp->stopped = 1;
  strp_done()
    lock_sock();
                                        strp_read_sock()
                                          tcp_read_sock()
                                            __tcp_read_sock()
                                              strp_recv()
                                                __strp_recv()
                                                  head = strp->skb_head;
                                                  strp_start_timer()
                                                    mod_delayed_work(&strp->msg_timer_work);
                                                  ...
    release_sock();
    cancel_delayed_work_sync(&strp->msg_timer_work);
    kfree_skb(strp->skb_head);
```
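In both cases, the race does not appear to cause any problem. In the
first, strp_data_ready() sees strp->stopped and returns before touching
any parser state. In the second, lock_sock() in strp_done() should not
return until the in-flight strp_read_sock() has finished, so the timer
work it armed is cancelled afterwards and skb_head is freed only once
__strp_recv() is no longer using it.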
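
To make the ordering argument concrete, here is a minimal userspace
sketch of the lock/release barrier. It is not kernel code and every name
in it is hypothetical: struct toy_strp, toy_data_ready(), toy_stop(),
toy_done() and toy_teardown_demo() are stand-ins made up for this
example. A pthread mutex plays the role of the socket lock, and the
reader threads take it themselves, whereas the real ->sk_data_ready is
assumed to be called with bh_lock_sock already held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct strparser. */
struct toy_strp {
	pthread_mutex_t sk_lock;	/* plays the role of the socket lock */
	atomic_bool stopped;		/* mirrors strp->stopped */
	bool timer_armed;		/* stands in for msg_timer_work being queued */
	int *skb_head;			/* stands in for strp->skb_head */
	int late_touches;		/* readers that found skb_head already freed */
};

/* Models strp_data_ready(): runs entirely under the "socket lock". */
static void toy_data_ready(struct toy_strp *s)
{
	pthread_mutex_lock(&s->sk_lock);
	if (!atomic_load(&s->stopped)) {
		if (!s->skb_head)
			s->late_touches++;	/* would be a use-after-free */
		else
			(*s->skb_head)++;	/* "parse" into the buffer */
		s->timer_armed = true;		/* like strp_start_timer() */
	}
	pthread_mutex_unlock(&s->sk_lock);
}

/* Models strp_stop(): only sets the flag, takes no lock. */
static void toy_stop(struct toy_strp *s)
{
	atomic_store(&s->stopped, true);
}

/* Models strp_done() with the proposed lock/release pair. */
static void toy_done(struct toy_strp *s)
{
	/* Barrier: wait for any reader that got past the stopped check. */
	pthread_mutex_lock(&s->sk_lock);
	pthread_mutex_unlock(&s->sk_lock);

	s->timer_armed = false;		/* like cancel_delayed_work_sync() */
	free(s->skb_head);		/* like kfree_skb(strp->skb_head) */
	s->skb_head = NULL;
}

static void *toy_reader(void *arg)
{
	for (int i = 0; i < 100000; i++)
		toy_data_ready(arg);
	return NULL;
}

/* Returns 0 if teardown was clean, nonzero otherwise. */
int toy_teardown_demo(void)
{
	struct toy_strp s = { .skb_head = calloc(1, sizeof(int)) };
	pthread_t t[4];
	int n = 0;

	if (!s.skb_head)
		return -1;
	pthread_mutex_init(&s.sk_lock, NULL);
	atomic_init(&s.stopped, false);

	for (int i = 0; i < 4; i++)
		if (pthread_create(&t[n], NULL, toy_reader, &s) == 0)
			n++;

	toy_stop(&s);
	toy_done(&s);

	for (int i = 0; i < n; i++)
		pthread_join(t[i], NULL);
	pthread_mutex_destroy(&s.sk_lock);

	assert(s.skb_head == NULL);
	return s.late_touches + (s.timer_armed ? 1 : 0);
}
```

toy_teardown_demo() returns 0 when no reader touched the buffer after it
was freed and nothing re-armed the timer after the cancel step. Build
with -pthread. Dropping the lock/unlock pair from toy_done() is what
would reopen the window shown in the second trace.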
>
> @@ -503,6 +503,10 @@ void strp_done(struct strparser *strp)
>  {
>  	WARN_ON(!strp->stopped);
>
> +	lock_sock(strp->sk);
> +	/* sync with other code */
> +	release_sock(strp->sk);
> +
>  	cancel_delayed_work_sync(&strp->msg_timer_work);
>  	cancel_work_sync(&strp->work);
>
>
>
> - strp->stopped so any new call into strp_data_ready will not do anything
>
> - lock/release need to take bh_lock_sock so any existing call to
> strp_data_ready will have to complete before we move on to cancel*_work
>
>
>
> Or maybe the requirement should be that strp_stop has to be called
> under lock_sock() (or even just bh_lock_sock), but again I can't
> figure out if that's ok for sockmap.

From my perspective, adding lock_sock() inside strp_done(), as in the
patch above, looks cleaner.

sockmap/psock has a more complex call stack than the other strp users,
so I'm also not entirely certain about that part.

>
>
> > With that in mind, perhaps the direct fix for this race could be handled
> > in the espintcp callback restoration patch. For the strparser patch, I
> > could instead adjust the commit message to reflect that it removes a
> > potential hazard by replacing the _cancel APIs with the _disable
> > variants, and resubmit it in that form.
>
> I'm not going to nack a patch doing s/cancel_/disable_/ in strp_done,
> but it doesn't fully solve the race condition if the caller isn't
> doing the right thing, and it doesn't do anything if the strp user is
> handling the teardown correctly.

I agree with your point there. Still, after the core patches addressing
this race are applied, I plan to resubmit the _disable patch with an
updated commit message. I think applying that change is still beneficial.
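
As a rough illustration of what the _disable variants would and would
not buy us, here is a toy single-threaded model. It is not the workqueue
implementation. It assumes only that a cancelled work item can be queued
again, while a disabled one refuses queueing until it is re-enabled.
struct toy_work and all toy_* helpers are hypothetical names invented
for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a work item's queueing state. */
struct toy_work {
	bool pending;		/* queued and waiting to run */
	int disable_depth;	/* > 0 means queueing is refused */
};

/* Models queue_work(): fails if already pending or disabled. */
static bool toy_queue_work(struct toy_work *w)
{
	if (w->disable_depth > 0 || w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Models cancel_work_sync(): drops the pending instance only. */
static void toy_cancel_work(struct toy_work *w)
{
	w->pending = false;
}

/* Models disable_work_sync(): drops it and blocks later queueing. */
static void toy_disable_work(struct toy_work *w)
{
	w->disable_depth++;
	w->pending = false;
}

/* A late strp_data_ready()-style caller re-queues after cancel. */
bool toy_requeue_after_cancel(void)
{
	struct toy_work w = { .pending = true };

	toy_cancel_work(&w);
	return toy_queue_work(&w);	/* succeeds: the work is live again */
}

/* The same late caller after disable. */
bool toy_requeue_after_disable(void)
{
	struct toy_work w = { .pending = true };

	toy_disable_work(&w);
	return toy_queue_work(&w);	/* refused while disabled */
}
```

In this model toy_requeue_after_cancel() returns true and
toy_requeue_after_disable() returns false. That only covers the
re-queueing side: neither helper stops a caller that is already inside
the receive path from touching skb_head, which is why the switch is a
hardening step rather than the fix for the race.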
Best regards,
Hyunwoo Kim