public inbox for netdev@vger.kernel.org
From: Sabrina Dubroca <sd@queasysnail.net>
To: Hyunwoo Kim <imv4bel@gmail.com>
Cc: davem@davemloft.net, edumazet@google.com, kuba@kernel.org,
	pabeni@redhat.com, horms@kernel.org, nate.karstens@garmin.com,
	linux@treblig.org, Julia.Lawall@inria.fr, netdev@vger.kernel.org
Subject: Re: [PATCH] strparser: Use worker disable API instead of cancellation in strp_done()
Date: Wed, 18 Feb 2026 19:11:47 +0100	[thread overview]
Message-ID: <aZYA4wrC_HtJdSUq@krikkit> (raw)
In-Reply-To: <aZTFXZcPiu8GBAY7@v4bel>

2026-02-18, 04:45:33 +0900, Hyunwoo Kim wrote:
> On Tue, Feb 17, 2026 at 07:21:57PM +0100, Sabrina Dubroca wrote:
> > 2026-02-16, 18:48:08 +0900, Hyunwoo Kim wrote:
> > > When strp_stop() and strp_done() are called without holding lock_sock(), 
> > > they can race with worker-scheduling paths such as the Delayed ACK handler 
> > > and ksoftirqd.
> > > Specifically, after cancel_delayed_work_sync() and cancel_work_sync() are 
> > > invoked from strp_done(), the workers may still be scheduled.
> > > As a result, the workers may dereference freed objects.
> > > 
> > > To prevent these races, the cancellation APIs are replaced with 
> > > worker-disabling APIs.
> > > 
> > > Fixes: 829385f08ae9 ("strparser: Use delayed work instead of timer for msg timeout")
> > 
> > That's the correct commit for msg_timer_work, but not for
> > strp->work. No race was possible when msg timeout was using a timer?
> 
> Of course, the race could also occur when the message timeout was 
> implemented using a timer.
> 
> > Your second scenario relies only on strp->work so I would think yes.
> 
> Using Fixes: bbb0302 ("strparser: Generalize strparser") should cover 
> both cases.

Ok.

> > > Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
> > > ---
> > >  net/strparser/strparser.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
> > > index fe0e76fdd1f1..15cd9cadbd1a 100644
> > > --- a/net/strparser/strparser.c
> > > +++ b/net/strparser/strparser.c
> > > @@ -503,8 +503,8 @@ void strp_done(struct strparser *strp)
> > >  {
> > >  	WARN_ON(!strp->stopped);
> > >  
> > > -	cancel_delayed_work_sync(&strp->msg_timer_work);
> > > -	cancel_work_sync(&strp->work);
> > > +	disable_delayed_work_sync(&strp->msg_timer_work);
> > > +	disable_work_sync(&strp->work);
> > 
> > The change itself looks reasonable.
> > 
> > >  	if (strp->skb_head) {
> > >  		kfree_skb(strp->skb_head);
> > > -- 
> > > 2.43.0
> > > 
> > > ---
> > > Dear,
> > > 
> > > The following is a simplified scenario illustrating how each race can occur. Since espintcp_close() does not hold lock_sock(), the race is possible.
> > > Although the race involving cancel_work_sync(&strp->work) does not appear to be easy to trigger in practice here, it still seems better to fix it as well.
> > 
> > What about the other users of strp? Only espintcp is racy?
> 
> Any subsystem that calls strp_stop() and strp_done() outside of 
> lock_sock() is racy.

strp_done() has to be called outside of lock_sock(), since both
strp_work() and strp_msg_timeout() need to take lock_sock.

> 
> > 
> > If strp_done can run concurrently with __strp_recv, it seems we could
> > also end up leaking strp->skb_head, if __strp_recv stores a new one
> > after we've cleared the old?
> 
> I am not fully sure about this part yet, so I think it should be 
> discussed separately in another thread.

Maybe not, if it's the same race condition (same code running
concurrently), just with different symptoms.


> > > ```
> > >                  cpu0                                cpu1
> > > 
> > > espintcp_close()
> > >                                          espintcp_data_ready()
> > >                                            if (unlikely(strp->stopped)) return;
> > >   strp_stop()
> > >     strp->stopped = 1;
> > >   strp_done()
> > >     cancel_delayed_work_sync(&strp->msg_timer_work);
> > >                                            strp_data_ready()
> > > ```
> > 
> > In this order, strp_data_ready will see strp->stopped and return
> > without doing anything.
> > 
> > (I'm confused by the "if (unlikely(strp->stopped))" above though,
> > maybe you meant espintcp_data_ready -> strp_data_ready -> if (...))
> 
> Sorry for the confusion. I accidentally mixed up the call order in the 
> strp_data_ready() path. 
> More precisely, the scenario is that espintcp_data_ready() → strp_data_ready() 
> runs first, passes the if (unlikely(strp->stopped)) check, and only after 
> that strp->stopped = 1 is set.

Ok, thanks.

> > > ```
> > >                  cpu0                                cpu1
> > > 
> > > espintcp_close()
> > >                                           sk->sk_data_ready()
> > >                                             espintcp_data_ready()
> > >                                               if (unlikely(strp->stopped)) return;
> > >   strp_stop()
> > >     strp->stopped = 1;
> > >   strp_done()
> > >     cancel_work_sync(&strp->work);
> > >                                               if (strp_read_sock(strp) == -ENOMEM)
> > >                                                       queue_work()
> > > ```
> > 
> > Here the problem would be if we enter do_strp_work after all the
> > socket data has already been freed? Otherwise again the test on
> > strp->stopped will make do_strp_work return early. (this would be
> > unexpected but should be safe)
> 
> If the worker is scheduled after the cancel call, then during 
> espintcp_close() → tcp_close(), the sk will be freed and the ctx 
> will be freed as well. 
> As a result, the kworker will access the freed ctx->strp. 
> This access to the freed ctx happens before the actual worker 
> handler is called, so the problem occurs regardless of the 
> condition checks in do_strp_work().

If we're lucky:

cancel_work_sync(&strp->work)
                                        queue_work(&strp->work)
...
                                        do_strp_work
                                        strp->stopped
...
free(sk)

[anyway, doesn't matter, seems it could indeed happen as you say]


I'm thinking that all those races are not very likely to happen in
real life, since syzbot has not seen them, and it's usually pretty
good at finding races. (which doesn't mean they're not worth fixing)

-- 
Sabrina


Thread overview: 4+ messages
2026-02-16  9:48 [PATCH] strparser: Use worker disable API instead of cancellation in strp_done() Hyunwoo Kim
2026-02-17 18:21 ` Sabrina Dubroca
2026-02-17 19:45   ` Hyunwoo Kim
2026-02-18 18:11     ` Sabrina Dubroca [this message]
