* [PATCH] strparser: Use worker disable API instead of cancellation in strp_done()
@ 2026-02-16 9:48 Hyunwoo Kim
2026-02-17 18:21 ` Sabrina Dubroca
0 siblings, 1 reply; 4+ messages in thread
From: Hyunwoo Kim @ 2026-02-16 9:48 UTC (permalink / raw)
To: davem, edumazet, kuba, pabeni, horms
Cc: sd, nate.karstens, linux, Julia.Lawall, netdev, imv4bel
When strp_stop() and strp_done() are called without holding lock_sock(),
they can race with worker-scheduling paths such as the Delayed ACK handler
and ksoftirqd.
Specifically, after cancel_delayed_work_sync() and cancel_work_sync() are
invoked from strp_done(), the workers may still be scheduled.
As a result, the workers may dereference freed objects.
To prevent these races, the cancellation APIs are replaced with
worker-disabling APIs.
Fixes: 829385f08ae9 ("strparser: Use delayed work instead of timer for msg timeout")
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
---
net/strparser/strparser.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
index fe0e76fdd1f1..15cd9cadbd1a 100644
--- a/net/strparser/strparser.c
+++ b/net/strparser/strparser.c
@@ -503,8 +503,8 @@ void strp_done(struct strparser *strp)
{
WARN_ON(!strp->stopped);
- cancel_delayed_work_sync(&strp->msg_timer_work);
- cancel_work_sync(&strp->work);
+ disable_delayed_work_sync(&strp->msg_timer_work);
+ disable_work_sync(&strp->work);
if (strp->skb_head) {
kfree_skb(strp->skb_head);
--
2.43.0
---
Dear,
The following is a simplified scenario illustrating how each race can occur. Since espintcp_close() does not hold lock_sock(), the race is possible.
Although cancel_work_sync(&strp->work) does not appear to be easy to trigger in practice here, it still seems better to fix it as well.
```
cpu0 cpu1
espintcp_close()
espintcp_data_ready()
if (unlikely(strp->stopped)) return;
strp_stop()
strp->stopped = 1;
strp_done()
cancel_delayed_work_sync(&strp->msg_timer_work);
strp_data_ready()
strp_read_sock()
tcp_read_sock()
__tcp_read_sock()
strp_recv()
__strp_recv()
strp_start_timer()
mod_delayed_work(strp_wq, &strp->msg_timer_work, timeo);
```
```
cpu0 cpu1
espintcp_close()
sk->sk_data_ready()
espintcp_data_ready()
if (unlikely(strp->stopped)) return;
strp_stop()
strp->stopped = 1;
strp_done()
cancel_work_sync(&strp->work);
if (strp_read_sock(strp) == -ENOMEM)
queue_work()
```
Best regards,
Hyunwoo Kim
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] strparser: Use worker disable API instead of cancellation in strp_done()
2026-02-16 9:48 [PATCH] strparser: Use worker disable API instead of cancellation in strp_done() Hyunwoo Kim
@ 2026-02-17 18:21 ` Sabrina Dubroca
2026-02-17 19:45 ` Hyunwoo Kim
0 siblings, 1 reply; 4+ messages in thread
From: Sabrina Dubroca @ 2026-02-17 18:21 UTC (permalink / raw)
To: Hyunwoo Kim
Cc: davem, edumazet, kuba, pabeni, horms, nate.karstens, linux,
Julia.Lawall, netdev
2026-02-16, 18:48:08 +0900, Hyunwoo Kim wrote:
> When strp_stop() and strp_done() are called without holding lock_sock(),
> they can race with worker-scheduling paths such as the Delayed ACK handler
> and ksoftirqd.
> Specifically, after cancel_delayed_work_sync() and cancel_work_sync() are
> invoked from strp_done(), the workers may still be scheduled.
> As a result, the workers may dereference freed objects.
>
> To prevent these races, the cancellation APIs are replaced with
> worker-disabling APIs.
>
> Fixes: 829385f08ae9 ("strparser: Use delayed work instead of timer for msg timeout")
That's the correct commit for msg_timer_work, but not for
strp->work. No race was possible when msg timeout was using a timer?
Your second scenario relies only on strp->work so I would think yes.
> Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
> ---
> net/strparser/strparser.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
> index fe0e76fdd1f1..15cd9cadbd1a 100644
> --- a/net/strparser/strparser.c
> +++ b/net/strparser/strparser.c
> @@ -503,8 +503,8 @@ void strp_done(struct strparser *strp)
> {
> WARN_ON(!strp->stopped);
>
> - cancel_delayed_work_sync(&strp->msg_timer_work);
> - cancel_work_sync(&strp->work);
> + disable_delayed_work_sync(&strp->msg_timer_work);
> + disable_work_sync(&strp->work);
The change itself looks reasonable.
> if (strp->skb_head) {
> kfree_skb(strp->skb_head);
> --
> 2.43.0
>
> ---
> Dear,
>
> The following is a simplified scenario illustrating how each race can occur. Since espintcp_close() does not hold lock_sock(), the race is possible.
> Although cancel_work_sync(&strp->work) does not appear to be easy to trigger in practice here, it still seems better to fix it as well.
What about the other users of strp? Only espintcp is racy?
If strp_done can run concurrently with __strp_recv, it seems we could
also end up leaking strp->skb_head, if __strp_recv stores a new one
after we've cleared the old?
> ```
> cpu0 cpu1
>
> espintcp_close()
> espintcp_data_ready()
> if (unlikely(strp->stopped)) return;
> strp_stop()
> strp->stopped = 1;
> strp_done()
> cancel_delayed_work_sync(&strp->msg_timer_work);
> strp_data_ready()
In this order, strp_data_ready will see strp->stopped and return
without doing anything.
(I'm confused by the "if (unlikely(strp->stopped))" above though,
maybe you meant espintcp_data_ready -> strp_data_ready -> if (...))
> strp_read_sock()
> tcp_read_sock()
> __tcp_read_sock()
> strp_recv()
> __strp_recv()
> strp_start_timer()
> mod_delayed_work(strp_wq, &strp->msg_timer_work, timeo);
> ```
> ```
> cpu0 cpu1
>
> espintcp_close()
> sk->sk_data_ready()
> espintcp_data_ready()
> if (unlikely(strp->stopped)) return;
> strp_stop()
> strp->stopped = 1;
> strp_done()
> cancel_work_sync(&strp->work);
> if (strp_read_sock(strp) == -ENOMEM)
> queue_work()
Here the problem would be if we enter do_strp_work after all the
socket data has already been freed? Otherwise again the test on
strp->stopped will make do_strp_work return early. (this would be
unexpected but should be safe)
> ```
>
> Best regards,
> Hyunwoo Kim
--
Sabrina
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] strparser: Use worker disable API instead of cancellation in strp_done()
2026-02-17 18:21 ` Sabrina Dubroca
@ 2026-02-17 19:45 ` Hyunwoo Kim
2026-02-18 18:11 ` Sabrina Dubroca
0 siblings, 1 reply; 4+ messages in thread
From: Hyunwoo Kim @ 2026-02-17 19:45 UTC (permalink / raw)
To: Sabrina Dubroca
Cc: davem, edumazet, kuba, pabeni, horms, nate.karstens, linux,
Julia.Lawall, netdev, imv4bel
On Tue, Feb 17, 2026 at 07:21:57PM +0100, Sabrina Dubroca wrote:
> 2026-02-16, 18:48:08 +0900, Hyunwoo Kim wrote:
> > When strp_stop() and strp_done() are called without holding lock_sock(),
> > they can race with worker-scheduling paths such as the Delayed ACK handler
> > and ksoftirqd.
> > Specifically, after cancel_delayed_work_sync() and cancel_work_sync() are
> > invoked from strp_done(), the workers may still be scheduled.
> > As a result, the workers may dereference freed objects.
> >
> > To prevent these races, the cancellation APIs are replaced with
> > worker-disabling APIs.
> >
> > Fixes: 829385f08ae9 ("strparser: Use delayed work instead of timer for msg timeout")
>
> That's the correct commit for msg_timer_work, but not for
> strp->work. No race was possible when msg timeout was using a timer?
Of course, the race could also occur when the message timeout was
implemented using a timer.
> Your second scenario relies only on strp->work so I would think yes.
Using Fixes: bbb0302 ("strparser: Generalize strparser") should cover
both cases.
>
> > Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
> > ---
> > net/strparser/strparser.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
> > index fe0e76fdd1f1..15cd9cadbd1a 100644
> > --- a/net/strparser/strparser.c
> > +++ b/net/strparser/strparser.c
> > @@ -503,8 +503,8 @@ void strp_done(struct strparser *strp)
> > {
> > WARN_ON(!strp->stopped);
> >
> > - cancel_delayed_work_sync(&strp->msg_timer_work);
> > - cancel_work_sync(&strp->work);
> > + disable_delayed_work_sync(&strp->msg_timer_work);
> > + disable_work_sync(&strp->work);
>
> The change itself looks reasonable.
>
> > if (strp->skb_head) {
> > kfree_skb(strp->skb_head);
> > --
> > 2.43.0
> >
> > ---
> > Dear,
> >
> > The following is a simplified scenario illustrating how each race can occur. Since espintcp_close() does not hold lock_sock(), the race is possible.
> > Although cancel_work_sync(&strp->work) does not appear to be easy to trigger in practice here, it still seems better to fix it as well.
>
> What about the other users of strp? Only espintcp is racy?
Any subsystem that calls strp_stop() and strp_done() outside of
lock_sock() is racy.
>
> If strp_done can run concurrently with __strp_recv, it seems we could
> also end up leaking strp->skb_head, if __strp_recv stores a new one
> after we've cleared the old?
I am not fully sure about this part yet, so I think it should be
discussed separately in another thread.
>
>
> > ```
> > cpu0 cpu1
> >
> > espintcp_close()
> > espintcp_data_ready()
> > if (unlikely(strp->stopped)) return;
> > strp_stop()
> > strp->stopped = 1;
> > strp_done()
> > cancel_delayed_work_sync(&strp->msg_timer_work);
> > strp_data_ready()
>
> In this order, strp_data_ready will see strp->stopped and return
> without doing anything.
>
> (I'm confused by the "if (unlikely(strp->stopped))" above though,
> maybe you meant espintcp_data_ready -> strp_data_ready -> if (...))
Sorry for the confusion. I accidentally mixed up the call order in the
strp_data_ready() path.
More precisely, the scenario is that espintcp_data_ready() → strp_data_ready()
runs first, passes the if (unlikely(strp->stopped)) check, and only after
that strp->stopped = 1 is set.
```
cpu0 cpu1
espintcp_close()
espintcp_data_ready()
strp_data_ready()
if (unlikely(strp->stopped)) return;
strp_stop()
strp->stopped = 1;
strp_done()
cancel_delayed_work_sync(&strp->msg_timer_work);
strp_read_sock()
tcp_read_sock()
__tcp_read_sock()
strp_recv()
__strp_recv()
strp_start_timer()
mod_delayed_work(strp_wq, &strp->msg_timer_work, timeo);
```
```
cpu0 cpu1
espintcp_close()
espintcp_data_ready()
strp_data_ready()
if (unlikely(strp->stopped)) return;
strp_stop()
strp->stopped = 1;
strp_done()
cancel_work_sync(&strp->work);
if (strp_read_sock(strp) == -ENOMEM)
queue_work()
```
>
> > strp_read_sock()
> > tcp_read_sock()
> > __tcp_read_sock()
> > strp_recv()
> > __strp_recv()
> > strp_start_timer()
> > mod_delayed_work(strp_wq, &strp->msg_timer_work, timeo);
> > ```
> > ```
> > cpu0 cpu1
> >
> > espintcp_close()
> > sk->sk_data_ready()
> > espintcp_data_ready()
> > if (unlikely(strp->stopped)) return;
> > strp_stop()
> > strp->stopped = 1;
> > strp_done()
> > cancel_work_sync(&strp->work);
> > if (strp_read_sock(strp) == -ENOMEM)
> > queue_work()
>
> Here the problem would be if we enter do_strp_work after all the
> socket data has already been freed? Otherwise again the test on
> strp->stopped will make do_strp_work return early. (this would be
> unexpected but should be safe)
If the worker is scheduled after the cancel call, then during
espintcp_close() → tcp_close(), the sk will be freed and the ctx
will be freed as well.
As a result, the kworker will access the freed ctx->strp.
This access to the freed ctx happens before the actual worker
handler is called, so the problem occurs regardless of the
condition checks in do_strp_work().
```
worker_thread()
assign_work(work, &worker->scheduled)
move_linked_works(work) // work: ctx->strp->msg_timer_work or ctx->strp->work
```
>
> > ```
> >
> > Best regards,
> > Hyunwoo Kim
>
> --
> Sabrina
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] strparser: Use worker disable API instead of cancellation in strp_done()
2026-02-17 19:45 ` Hyunwoo Kim
@ 2026-02-18 18:11 ` Sabrina Dubroca
0 siblings, 0 replies; 4+ messages in thread
From: Sabrina Dubroca @ 2026-02-18 18:11 UTC (permalink / raw)
To: Hyunwoo Kim
Cc: davem, edumazet, kuba, pabeni, horms, nate.karstens, linux,
Julia.Lawall, netdev
2026-02-18, 04:45:33 +0900, Hyunwoo Kim wrote:
> On Tue, Feb 17, 2026 at 07:21:57PM +0100, Sabrina Dubroca wrote:
> > 2026-02-16, 18:48:08 +0900, Hyunwoo Kim wrote:
> > > When strp_stop() and strp_done() are called without holding lock_sock(),
> > > they can race with worker-scheduling paths such as the Delayed ACK handler
> > > and ksoftirqd.
> > > Specifically, after cancel_delayed_work_sync() and cancel_work_sync() are
> > > invoked from strp_done(), the workers may still be scheduled.
> > > As a result, the workers may dereference freed objects.
> > >
> > > To prevent these races, the cancellation APIs are replaced with
> > > worker-disabling APIs.
> > >
> > > Fixes: 829385f08ae9 ("strparser: Use delayed work instead of timer for msg timeout")
> >
> > That's the correct commit for msg_timer_work, but not for
> > strp->work. No race was possible when msg timeout was using a timer?
>
> Of course, the race could also occur when the message timeout was
> implemented using a timer.
>
> > Your second scenario relies only on strp->work so I would think yes.
>
> Using Fixes: bbb0302 ("strparser: Generalize strparser") should cover
> both cases.
Ok.
> > > Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
> > > ---
> > > net/strparser/strparser.c | 4 ++--
> > > 1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
> > > index fe0e76fdd1f1..15cd9cadbd1a 100644
> > > --- a/net/strparser/strparser.c
> > > +++ b/net/strparser/strparser.c
> > > @@ -503,8 +503,8 @@ void strp_done(struct strparser *strp)
> > > {
> > > WARN_ON(!strp->stopped);
> > >
> > > - cancel_delayed_work_sync(&strp->msg_timer_work);
> > > - cancel_work_sync(&strp->work);
> > > + disable_delayed_work_sync(&strp->msg_timer_work);
> > > + disable_work_sync(&strp->work);
> >
> > The change itself looks reasonable.
> >
> > > if (strp->skb_head) {
> > > kfree_skb(strp->skb_head);
> > > --
> > > 2.43.0
> > >
> > > ---
> > > Dear,
> > >
> > > The following is a simplified scenario illustrating how each race can occur. Since espintcp_close() does not hold lock_sock(), the race is possible.
> > > Although cancel_work_sync(&strp->work) does not appear to be easy to trigger in practice here, it still seems better to fix it as well.
> >
> > What about the other users of strp? Only espintcp is racy?
>
> Any subsystem that calls strp_stop() and strp_done() outside of
> lock_sock() is racy.
strp_done() has to be called outside of lock_sock(), since both
strp_work() and strp_msg_timeout() need to take lock_sock.
>
> >
> > If strp_done can run concurrently with __strp_recv, it seems we could
> > also end up leaking strp->skb_head, if __strp_recv stores a new one
> > after we've cleared the old?
>
> I am not fully sure about this part yet, so I think it should be
> discussed separately in another thread.
Maybe not, if it's the same race condition (same code running
concurrently), just with different symptoms.
> > > ```
> > > cpu0 cpu1
> > >
> > > espintcp_close()
> > > espintcp_data_ready()
> > > if (unlikely(strp->stopped)) return;
> > > strp_stop()
> > > strp->stopped = 1;
> > > strp_done()
> > > cancel_delayed_work_sync(&strp->msg_timer_work);
> > > strp_data_ready()
> >
> > In this order, strp_data_ready will see strp->stopped and return
> > without doing anything.
> >
> > (I'm confused by the "if (unlikely(strp->stopped))" above though,
> > maybe you meant espintcp_data_ready -> strp_data_ready -> if (...))
>
> Sorry for the confusion. I accidentally mixed up the call order in the
> strp_data_ready() path.
> More precisely, the scenario is that espintcp_data_ready() → strp_data_ready()
> runs first, passes the if (unlikely(strp->stopped)) check, and only after
> that strp->stopped = 1 is set.
Ok, thanks.
> > > ```
> > > cpu0 cpu1
> > >
> > > espintcp_close()
> > > sk->sk_data_ready()
> > > espintcp_data_ready()
> > > if (unlikely(strp->stopped)) return;
> > > strp_stop()
> > > strp->stopped = 1;
> > > strp_done()
> > > cancel_work_sync(&strp->work);
> > > if (strp_read_sock(strp) == -ENOMEM)
> > > queue_work()
> >
> > Here the problem would be if we enter do_strp_work after all the
> > socket data has already been freed? Otherwise again the test on
> > strp->stopped will make do_strp_work return early. (this would be
> > unexpected but should be safe)
>
> If the worker is scheduled after the cancel call, then during
> espintcp_close() → tcp_close(), the sk will be freed and the ctx
> will be freed as well.
> As a result, the kworker will access the freed ctx->strp.
> This access to the freed ctx happens before the actual worker
> handler is called, so the problem occurs regardless of the
> condition checks in do_strp_work().
If we're lucky:
cancel_work_sync(strp->work)
queue_work(strp->work)
...
do_strp_work
strp->stopped
...
free(sk)
[anyway, doesn't matter, seems it could indeed happen as you say]
I'm thinking that all those races are not very likely to happen in
real life, since syzbot has not seen them, and it's usually pretty
good at finding races. (which doesn't mean they're not worth fixing)
--
Sabrina
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2026-02-18 18:11 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-02-16 9:48 [PATCH] strparser: Use worker disable API instead of cancellation in strp_done() Hyunwoo Kim
2026-02-17 18:21 ` Sabrina Dubroca
2026-02-17 19:45 ` Hyunwoo Kim
2026-02-18 18:11 ` Sabrina Dubroca
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox