public inbox for netdev@vger.kernel.org
* [PATCH] strparser: Use worker disable API instead of cancellation in strp_done()
@ 2026-02-16  9:48 Hyunwoo Kim
  2026-02-17 18:21 ` Sabrina Dubroca
  0 siblings, 1 reply; 4+ messages in thread
From: Hyunwoo Kim @ 2026-02-16  9:48 UTC
  To: davem, edumazet, kuba, pabeni, horms
  Cc: sd, nate.karstens, linux, Julia.Lawall, netdev, imv4bel

When strp_stop() and strp_done() are called without holding lock_sock(), 
they can race with worker-scheduling paths such as the Delayed ACK handler 
and ksoftirqd.
Specifically, after cancel_delayed_work_sync() and cancel_work_sync() are 
invoked from strp_done(), the workers may still be scheduled.
As a result, the workers may dereference freed objects.

To prevent these races, the cancellation APIs are replaced with 
worker-disabling APIs.

Fixes: 829385f08ae9 ("strparser: Use delayed work instead of timer for msg timeout")
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
---
 net/strparser/strparser.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
index fe0e76fdd1f1..15cd9cadbd1a 100644
--- a/net/strparser/strparser.c
+++ b/net/strparser/strparser.c
@@ -503,8 +503,8 @@ void strp_done(struct strparser *strp)
 {
 	WARN_ON(!strp->stopped);
 
-	cancel_delayed_work_sync(&strp->msg_timer_work);
-	cancel_work_sync(&strp->work);
+	disable_delayed_work_sync(&strp->msg_timer_work);
+	disable_work_sync(&strp->work);
 
 	if (strp->skb_head) {
 		kfree_skb(strp->skb_head);
-- 
2.43.0
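
Roughly, this is what the patch changes. cancel_work_sync() and cancel_delayed_work_sync() only remove an instance that is already queued (and wait for a running one), so a racing path can queue the work again right after they return. disable_work_sync() and disable_delayed_work_sync() also mark the item disabled, and later attempts to queue it are ignored until it is re-enabled with enable_work(). The single-threaded userspace model below sketches only those semantics. The model_* names are hypothetical, and the model leaves out the waiting, locking and concurrency of the real workqueue code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of a work item: only "queued?" and "disabled?"
 * are tracked. This is not struct work_struct; there is no real
 * concurrency here, so the "_sync" waiting is not modelled at all.
 */
struct model_work {
	bool pending;      /* queued, not yet executed */
	int disable_count; /* > 0 means queueing attempts are ignored */
};

/* Like queue_work(): false if the item is disabled or already pending. */
static bool model_queue_work(struct model_work *w)
{
	if (w->disable_count > 0 || w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Like cancel_work_sync(): drops a pending instance, nothing more. */
static bool model_cancel_work_sync(struct model_work *w)
{
	bool was_pending = w->pending;

	w->pending = false;
	return was_pending;
}

/* Like disable_work_sync(): cancels and also refuses future queueing. */
static bool model_disable_work_sync(struct model_work *w)
{
	bool was_pending = w->pending;

	w->pending = false;
	w->disable_count++;
	return was_pending;
}

/*
 * strp_done() tears the work down, then a racing path tries to queue
 * it again. Returns true if the work is still queued afterwards, i.e.
 * it could run after the owning object has been freed.
 */
static bool late_queue_survives(bool use_disable)
{
	struct model_work w = { .pending = false, .disable_count = 0 };

	model_queue_work(&w);                 /* queued before teardown */
	if (use_disable)
		model_disable_work_sync(&w);
	else
		model_cancel_work_sync(&w);
	assert(!w.pending);                   /* teardown removed it */

	model_queue_work(&w);                 /* racing re-queue */
	return w.pending;
}
```

late_queue_survives(false) returns true because the re-queue after a plain cancel sticks. late_queue_survives(true) returns false because the disabled item refuses it.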

---
Dear,

The following is a simplified scenario illustrating how each race can occur. Since espintcp_close() does not hold lock_sock(), the race is possible.
Although cancel_work_sync(&strp->work) does not appear to be easy to trigger in practice here, it still seems better to fix it as well.
```
                 cpu0                                cpu1

espintcp_close()
                                         espintcp_data_ready()
                                           if (unlikely(strp->stopped)) return;
  strp_stop()
    strp->stopped = 1;
  strp_done()
    cancel_delayed_work_sync(&strp->msg_timer_work);
                                           strp_data_ready()
                                             strp_read_sock()
                                               tcp_read_sock()
                                                 __tcp_read_sock()
                                                   strp_recv()
                                                     __strp_recv()
                                                       strp_start_timer()
                                                         mod_delayed_work(strp_wq, &strp->msg_timer_work, timeo);  
```
```
                 cpu0                                cpu1

espintcp_close()
                                          sk->sk_data_ready()
                                            espintcp_data_ready()
                                              if (unlikely(strp->stopped)) return;
  strp_stop()
    strp->stopped = 1;
  strp_done()
    cancel_work_sync(&strp->work);
                                              if (strp_read_sock(strp) == -ENOMEM)
                                              queue_work()
```

Best regards,
Hyunwoo Kim

* Re: [PATCH] strparser: Use worker disable API instead of cancellation in strp_done()
  2026-02-16  9:48 [PATCH] strparser: Use worker disable API instead of cancellation in strp_done() Hyunwoo Kim
@ 2026-02-17 18:21 ` Sabrina Dubroca
  2026-02-17 19:45   ` Hyunwoo Kim
  0 siblings, 1 reply; 4+ messages in thread
From: Sabrina Dubroca @ 2026-02-17 18:21 UTC
  To: Hyunwoo Kim
  Cc: davem, edumazet, kuba, pabeni, horms, nate.karstens, linux,
	Julia.Lawall, netdev

2026-02-16, 18:48:08 +0900, Hyunwoo Kim wrote:
> When strp_stop() and strp_done() are called without holding lock_sock(), 
> they can race with worker-scheduling paths such as the Delayed ACK handler 
> and ksoftirqd.
> Specifically, after cancel_delayed_work_sync() and cancel_work_sync() are 
> invoked from strp_done(), the workers may still be scheduled.
> As a result, the workers may dereference freed objects.
> 
> To prevent these races, the cancellation APIs are replaced with 
> worker-disabling APIs.
> 
> Fixes: 829385f08ae9 ("strparser: Use delayed work instead of timer for msg timeout")

That's the correct commit for msg_timer_work, but not for
strp->work. No race was possible when msg timeout was using a timer?
Your second scenario relies only on strp->work so I would think yes.

> Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
> ---
>  net/strparser/strparser.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
> index fe0e76fdd1f1..15cd9cadbd1a 100644
> --- a/net/strparser/strparser.c
> +++ b/net/strparser/strparser.c
> @@ -503,8 +503,8 @@ void strp_done(struct strparser *strp)
>  {
>  	WARN_ON(!strp->stopped);
>  
> -	cancel_delayed_work_sync(&strp->msg_timer_work);
> -	cancel_work_sync(&strp->work);
> +	disable_delayed_work_sync(&strp->msg_timer_work);
> +	disable_work_sync(&strp->work);

The change itself looks reasonable.

>  	if (strp->skb_head) {
>  		kfree_skb(strp->skb_head);
> -- 
> 2.43.0
> 
> ---
> Dear,
> 
> The following is a simplified scenario illustrating how each race can occur. Since espintcp_close() does not hold lock_sock(), the race is possible.
> Although cancel_work_sync(&strp->work) does not appear to be easy to trigger in practice here, it still seems better to fix it as well.

What about the other users of strp? Only espintcp is racy?

If strp_done can run concurrently with __strp_recv, it seems we could
also end up leaking strp->skb_head, if __strp_recv stores a new one
after we've cleared the old?
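
The leak described above can be replayed deterministically. In the sketch below, model_strp_done() copies only the skb_head cleanup from strp_done(). model_recv() is a hypothetical stand-in for __strp_recv() storing a new head when none is present, and allocations are counted rather than performed. The sketch compares the two orderings only. It says nothing about how likely the late one is, or whether disabling the work items would affect it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: counters replace the skb allocator. */
static int allocated, freed;
static char dummy_skb;               /* plays the role of an sk_buff */

static void *model_alloc_skb(void) { allocated++; return &dummy_skb; }
static void model_kfree_skb(void *skb) { if (skb) freed++; }

struct model_strp {
	void *skb_head;                  /* message being assembled */
};

/* The skb_head cleanup at the end of strp_done(). */
static void model_strp_done(struct model_strp *strp)
{
	if (strp->skb_head) {
		model_kfree_skb(strp->skb_head);
		strp->skb_head = NULL;
	}
}

/* __strp_recv() starting a new message when no head is stored. */
static void model_recv(struct model_strp *strp)
{
	if (!strp->skb_head)
		strp->skb_head = model_alloc_skb();
}

/* Returns how many heads are never freed for the given ordering. */
static int leaked(int recv_after_done)
{
	struct model_strp strp = { .skb_head = NULL };

	allocated = freed = 0;
	model_recv(&strp);               /* normal traffic before teardown */
	if (recv_after_done) {
		model_strp_done(&strp);
		model_recv(&strp);           /* racing receive stores a new head */
	} else {
		model_recv(&strp);
		model_strp_done(&strp);
	}
	assert(allocated >= freed);
	return allocated - freed;
}
```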


> ```
>                  cpu0                                cpu1
> 
> espintcp_close()
>                                          espintcp_data_ready()
>                                            if (unlikely(strp->stopped)) return;
>   strp_stop()
>     strp->stopped = 1;
>   strp_done()
>     cancel_delayed_work_sync(&strp->msg_timer_work);
>                                            strp_data_ready()

In this order, strp_data_ready will see strp->stopped and return
without doing anything.

(I'm confused by the "if (unlikely(strp->stopped))" above though,
maybe you meant espintcp_data_ready -> strp_data_ready -> if (...))

>                                              strp_read_sock()
>                                                tcp_read_sock()
>                                                  __tcp_read_sock()
>                                                    strp_recv()
>                                                      __strp_recv()
>                                                        strp_start_timer()
>                                                          mod_delayed_work(strp_wq, &strp->msg_timer_work, timeo);  
> ```
> ```
>                  cpu0                                cpu1
> 
> espintcp_close()
>                                           sk->sk_data_ready()
>                                             espintcp_data_ready()
>                                               if (unlikely(strp->stopped)) return;
>   strp_stop()
>     strp->stopped = 1;
>   strp_done()
>     cancel_work_sync(&strp->work);
>                                               if (strp_read_sock(strp) == -ENOMEM)
>                                               queue_work()

Here the problem would be if we enter do_strp_work after all the
socket data has already been freed? Otherwise again the test on
strp->stopped will make do_strp_work return early. (this would be
unexpected but should be safe)

> ```
> 
> Best regards,
> Hyunwoo Kim

-- 
Sabrina

* Re: [PATCH] strparser: Use worker disable API instead of cancellation in strp_done()
  2026-02-17 18:21 ` Sabrina Dubroca
@ 2026-02-17 19:45   ` Hyunwoo Kim
  2026-02-18 18:11     ` Sabrina Dubroca
  0 siblings, 1 reply; 4+ messages in thread
From: Hyunwoo Kim @ 2026-02-17 19:45 UTC
  To: Sabrina Dubroca
  Cc: davem, edumazet, kuba, pabeni, horms, nate.karstens, linux,
	Julia.Lawall, netdev, imv4bel

On Tue, Feb 17, 2026 at 07:21:57PM +0100, Sabrina Dubroca wrote:
> 2026-02-16, 18:48:08 +0900, Hyunwoo Kim wrote:
> > When strp_stop() and strp_done() are called without holding lock_sock(), 
> > they can race with worker-scheduling paths such as the Delayed ACK handler 
> > and ksoftirqd.
> > Specifically, after cancel_delayed_work_sync() and cancel_work_sync() are 
> > invoked from strp_done(), the workers may still be scheduled.
> > As a result, the workers may dereference freed objects.
> > 
> > To prevent these races, the cancellation APIs are replaced with 
> > worker-disabling APIs.
> > 
> > Fixes: 829385f08ae9 ("strparser: Use delayed work instead of timer for msg timeout")
> 
> That's the correct commit for msg_timer_work, but not for
> strp->work. No race was possible when msg timeout was using a timer?

Of course, the race could also occur when the message timeout was 
implemented using a timer.

> Your second scenario relies only on strp->work so I would think yes.

Using Fixes: bbb0302 ("strparser: Generalize strparser") should cover 
both cases.


> 
> > Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
> > ---
> >  net/strparser/strparser.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
> > index fe0e76fdd1f1..15cd9cadbd1a 100644
> > --- a/net/strparser/strparser.c
> > +++ b/net/strparser/strparser.c
> > @@ -503,8 +503,8 @@ void strp_done(struct strparser *strp)
> >  {
> >  	WARN_ON(!strp->stopped);
> >  
> > -	cancel_delayed_work_sync(&strp->msg_timer_work);
> > -	cancel_work_sync(&strp->work);
> > +	disable_delayed_work_sync(&strp->msg_timer_work);
> > +	disable_work_sync(&strp->work);
> 
> The change itself looks reasonable.
> 
> >  	if (strp->skb_head) {
> >  		kfree_skb(strp->skb_head);
> > -- 
> > 2.43.0
> > 
> > ---
> > Dear,
> > 
> > The following is a simplified scenario illustrating how each race can occur. Since espintcp_close() does not hold lock_sock(), the race is possible.
> > Although cancel_work_sync(&strp->work) does not appear to be easy to trigger in practice here, it still seems better to fix it as well.
> 
> What about the other users of strp? Only espintcp is racy?

Any subsystem that calls strp_stop() and strp_done() outside of 
lock_sock() is racy.

> 
> If strp_done can run concurrently with __strp_recv, it seems we could
> also end up leaking strp->skb_head, if __strp_recv stores a new one
> after we've cleared the old?

I am not fully sure about this part yet, so I think it should be 
discussed separately in another thread.

> 
> 
> > ```
> >                  cpu0                                cpu1
> > 
> > espintcp_close()
> >                                          espintcp_data_ready()
> >                                            if (unlikely(strp->stopped)) return;
> >   strp_stop()
> >     strp->stopped = 1;
> >   strp_done()
> >     cancel_delayed_work_sync(&strp->msg_timer_work);
> >                                            strp_data_ready()
> 
> In this order, strp_data_ready will see strp->stopped and return
> without doing anything.
> 
> (I'm confused by the "if (unlikely(strp->stopped))" above though,
> maybe you meant espintcp_data_ready -> strp_data_ready -> if (...))

Sorry for the confusion. I accidentally mixed up the call order in the 
strp_data_ready() path. 
More precisely, the scenario is that espintcp_data_ready() → strp_data_ready() 
runs first, passes the if (unlikely(strp->stopped)) check, and only after 
that strp->stopped = 1 is set.
```
                 cpu0                                cpu1

espintcp_close()
                                         espintcp_data_ready()
                                           strp_data_ready()
                                             if (unlikely(strp->stopped)) return;
  strp_stop()
    strp->stopped = 1;
  strp_done()
    cancel_delayed_work_sync(&strp->msg_timer_work);
                                             strp_read_sock()
                                               tcp_read_sock()
                                                 __tcp_read_sock()
                                                   strp_recv()
                                                     __strp_recv()
                                                       strp_start_timer()
                                                         mod_delayed_work(strp_wq, &strp->msg_timer_work, timeo);
```
```
                 cpu0                                cpu1

espintcp_close()
                                            espintcp_data_ready()
                                              strp_data_ready()
                                                if (unlikely(strp->stopped)) return;
  strp_stop()
    strp->stopped = 1;
  strp_done()
    cancel_work_sync(&strp->work);
                                                if (strp_read_sock(strp) == -ENOMEM)
                                                  queue_work()
```
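
The corrected ordering in both diagrams is a check-then-act race. cpu1 tests strp->stopped before cpu0 sets it, and it re-arms the timer (or queues strp->work on -ENOMEM) only after strp_done() has already cancelled it. Below is a deterministic, single-threaded replay of the first diagram. The struct and function names are hypothetical, and the "timer" is reduced to a pair of flags. The second diagram has the same shape, with strp->work in place of msg_timer_work.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical replay state; not struct strparser. */
struct model_strp {
	bool stopped;
	bool timer_pending;  /* msg_timer_work queued */
	bool timer_disabled; /* set only by the disable_*_sync() variant */
};

/* cpu1: the early-return test at the top of strp_data_ready(). */
static bool passes_stopped_check(const struct model_strp *s)
{
	return !s->stopped;
}

/* cpu1: __strp_recv() -> strp_start_timer() -> mod_delayed_work(). */
static void start_timer(struct model_strp *s)
{
	if (!s->timer_disabled)  /* a disabled item ignores re-arming */
		s->timer_pending = true;
}

/* cpu0: strp_stop() followed by the teardown in strp_done(). */
static void stop_and_done(struct model_strp *s, bool use_disable)
{
	s->stopped = true;
	s->timer_pending = false;         /* both variants drop a pending timer */
	if (use_disable)
		s->timer_disabled = true;
}

/*
 * check_first: cpu1 passes the stopped check before cpu0 runs (the
 * corrected diagram); otherwise cpu0 runs first. Returns true if the
 * timer is armed after strp_done() has returned.
 */
static bool armed_after_done(bool check_first, bool use_disable)
{
	struct model_strp s = { false, false, false };
	bool passed;

	if (check_first) {
		passed = passes_stopped_check(&s);
		stop_and_done(&s, use_disable);
	} else {
		stop_and_done(&s, use_disable);
		passed = passes_stopped_check(&s);
	}
	if (passed)
		start_timer(&s);

	assert(s.stopped);
	return s.timer_pending;
}
```

If cpu0 runs first, the check returns early and nothing is armed. If the check runs first, a plain cancel leaves the timer armed, and only the disable variant keeps it off.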


> 
> >                                              strp_read_sock()
> >                                                tcp_read_sock()
> >                                                  __tcp_read_sock()
> >                                                    strp_recv()
> >                                                      __strp_recv()
> >                                                        strp_start_timer()
> >                                                          mod_delayed_work(strp_wq, &strp->msg_timer_work, timeo);  
> > ```
> > ```
> >                  cpu0                                cpu1
> > 
> > espintcp_close()
> >                                           sk->sk_data_ready()
> >                                             espintcp_data_ready()
> >                                               if (unlikely(strp->stopped)) return;
> >   strp_stop()
> >     strp->stopped = 1;
> >   strp_done()
> >     cancel_work_sync(&strp->work);
> >                                               if (strp_read_sock(strp) == -ENOMEM)
> >                                               queue_work()
> 
> Here the problem would be if we enter do_strp_work after all the
> socket data has already been freed? Otherwise again the test on
> strp->stopped will make do_strp_work return early. (this would be
> unexpected but should be safe)

If the worker is scheduled after the cancel call, then during 
espintcp_close() → tcp_close(), the sk will be freed and the ctx 
will be freed as well. 
As a result, the kworker will access the freed ctx->strp. 
This access to the freed ctx happens before the actual worker 
handler is called, so the problem occurs regardless of the 
condition checks in do_strp_work().
```
worker_thread()
  assign_work(work, &worker->scheduled)
    move_linked_works(work)    // work: ctx->strp->msg_timer_work or ctx->strp->work
```
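
The stopped check in do_strp_work() cannot help here because the work_struct is embedded in the freed ctx, and the workqueue core reads it before the handler is ever called. The userspace sketch below imitates that ordering under stated assumptions. model_dispatch() is a hypothetical stand-in for worker_thread()/assign_work(), and the layouts are invented. "Freeing" is simulated by filling the object with 0x6b, the byte the kernel's slab debugging uses for freed objects, so the program itself never touches freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_POISON_FREE 0x6b   /* same byte value as the kernel's POISON_FREE */

/* Invented layouts, loosely mirroring "ctx embeds strp embeds work". */
struct model_work {
	unsigned long data;          /* bookkeeping the dispatcher reads first */
};

struct model_ctx {
	int stopped;                 /* strp->stopped */
	struct model_work work;      /* strp->work or msg_timer_work */
};

static bool handler_did_work;

/* Like do_strp_work(): bails out early once the parser is stopped. */
static void model_handler(struct model_ctx *ctx)
{
	if (ctx->stopped)
		return;
	handler_did_work = true;
}

/*
 * Hypothetical dispatcher: it inspects the work item itself before
 * calling the handler. Returns true if that first access already hit
 * "freed" (poisoned) memory, whatever the handler does afterwards.
 */
static bool model_dispatch(struct model_ctx *ctx)
{
	unsigned char first_byte;
	bool hit_freed;

	memcpy(&first_byte, &ctx->work.data, 1);
	hit_freed = (first_byte == MODEL_POISON_FREE);
	model_handler(ctx);
	return hit_freed;
}

/* A late-queued work item runs after strp_stop(), optionally after the free. */
static bool late_run_hits_freed(bool ctx_freed)
{
	struct model_ctx ctx = { .stopped = 1, .work = { .data = 0 } };

	if (ctx_freed)
		memset(&ctx, MODEL_POISON_FREE, sizeof(ctx)); /* tcp_close() freed ctx */
	return model_dispatch(&ctx);
}
```

In the poisoned case the handler also returns early, but only because the poison happens to make stopped non-zero. The dispatcher's read of the work item has already happened by then.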

> 
> > ```
> > 
> > Best regards,
> > Hyunwoo Kim
> 
> -- 
> Sabrina

* Re: [PATCH] strparser: Use worker disable API instead of cancellation in strp_done()
  2026-02-17 19:45   ` Hyunwoo Kim
@ 2026-02-18 18:11     ` Sabrina Dubroca
  0 siblings, 0 replies; 4+ messages in thread
From: Sabrina Dubroca @ 2026-02-18 18:11 UTC
  To: Hyunwoo Kim
  Cc: davem, edumazet, kuba, pabeni, horms, nate.karstens, linux,
	Julia.Lawall, netdev

2026-02-18, 04:45:33 +0900, Hyunwoo Kim wrote:
> On Tue, Feb 17, 2026 at 07:21:57PM +0100, Sabrina Dubroca wrote:
> > 2026-02-16, 18:48:08 +0900, Hyunwoo Kim wrote:
> > > When strp_stop() and strp_done() are called without holding lock_sock(), 
> > > they can race with worker-scheduling paths such as the Delayed ACK handler 
> > > and ksoftirqd.
> > > Specifically, after cancel_delayed_work_sync() and cancel_work_sync() are 
> > > invoked from strp_done(), the workers may still be scheduled.
> > > As a result, the workers may dereference freed objects.
> > > 
> > > To prevent these races, the cancellation APIs are replaced with 
> > > worker-disabling APIs.
> > > 
> > > Fixes: 829385f08ae9 ("strparser: Use delayed work instead of timer for msg timeout")
> > 
> > That's the correct commit for msg_timer_work, but not for
> > strp->work. No race was possible when msg timeout was using a timer?
> 
> Of course, the race could also occur when the message timeout was 
> implemented using a timer.
> 
> > Your second scenario relies only on strp->work so I would think yes.
> 
> Using Fixes: bbb0302 ("strparser: Generalize strparser") should cover 
> both cases.

Ok.

> > > Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
> > > ---
> > >  net/strparser/strparser.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
> > > index fe0e76fdd1f1..15cd9cadbd1a 100644
> > > --- a/net/strparser/strparser.c
> > > +++ b/net/strparser/strparser.c
> > > @@ -503,8 +503,8 @@ void strp_done(struct strparser *strp)
> > >  {
> > >  	WARN_ON(!strp->stopped);
> > >  
> > > -	cancel_delayed_work_sync(&strp->msg_timer_work);
> > > -	cancel_work_sync(&strp->work);
> > > +	disable_delayed_work_sync(&strp->msg_timer_work);
> > > +	disable_work_sync(&strp->work);
> > 
> > The change itself looks reasonable.
> > 
> > >  	if (strp->skb_head) {
> > >  		kfree_skb(strp->skb_head);
> > > -- 
> > > 2.43.0
> > > 
> > > ---
> > > Dear,
> > > 
> > > The following is a simplified scenario illustrating how each race can occur. Since espintcp_close() does not hold lock_sock(), the race is possible.
> > > Although cancel_work_sync(&strp->work) does not appear to be easy to trigger in practice here, it still seems better to fix it as well.
> > 
> > What about the other users of strp? Only espintcp is racy?
> 
> Any subsystem that calls strp_stop() and strp_done() outside of 
> lock_sock() is racy.

strp_done() has to be called outside of lock_sock(): it waits for
strp_work() and strp_msg_timeout() to finish, and both of them need
to take lock_sock().
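
This is a lock-ordering constraint. The synchronous cancel/disable in strp_done() waits for the workers, and the workers take the socket lock, so calling it under lock_sock() would make the caller wait for a worker that is itself waiting for the caller. The sketch below models this with POSIX threads, under two assumptions: a pthread mutex stands in for lock_sock(), and pthread_join() stands in for the synchronous wait. pthread_mutex_trylock() replaces a blocking lock so that the program reports the circular wait instead of hanging. Older toolchains may need -pthread to build it.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Assumed stand-in for lock_sock(); the socket lock is not a pthread mutex. */
static pthread_mutex_t model_sock_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for strp_work()/strp_msg_timeout(): both need the socket lock. */
static void *model_work_fn(void *arg)
{
	bool *would_block = arg;

	if (pthread_mutex_trylock(&model_sock_lock) == EBUSY) {
		*would_block = true;   /* a real lock would wait here forever */
		return NULL;
	}
	*would_block = false;
	pthread_mutex_unlock(&model_sock_lock);
	return NULL;
}

/*
 * Stand-in for strp_done(): waits for the work to finish, like the
 * *_sync() calls. Returns true if the work would have blocked on the
 * lock the waiter holds, i.e. a circular wait (deadlock).
 */
static bool done_would_deadlock(bool caller_holds_lock)
{
	pthread_t worker;
	bool would_block = false;

	if (caller_holds_lock)
		pthread_mutex_lock(&model_sock_lock);
	int rc = pthread_create(&worker, NULL, model_work_fn, &would_block);
	assert(rc == 0);
	pthread_join(worker, NULL);    /* the synchronous wait */
	if (caller_holds_lock)
		pthread_mutex_unlock(&model_sock_lock);
	return would_block;
}
```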

> 
> > 
> > If strp_done can run concurrently with __strp_recv, it seems we could
> > also end up leaking strp->skb_head, if __strp_recv stores a new one
> > after we've cleared the old?
> 
> I am not fully sure about this part yet, so I think it should be 
> discussed separately in another thread.

Maybe not, if it's the same race condition (same code running
concurrently), just with different symptoms.


> > > ```
> > >                  cpu0                                cpu1
> > > 
> > > espintcp_close()
> > >                                          espintcp_data_ready()
> > >                                            if (unlikely(strp->stopped)) return;
> > >   strp_stop()
> > >     strp->stopped = 1;
> > >   strp_done()
> > >     cancel_delayed_work_sync(&strp->msg_timer_work);
> > >                                            strp_data_ready()
> > 
> > In this order, strp_data_ready will see strp->stopped and return
> > without doing anything.
> > 
> > (I'm confused by the "if (unlikely(strp->stopped))" above though,
> > maybe you meant espintcp_data_ready -> strp_data_ready -> if (...))
> 
> Sorry for the confusion. I accidentally mixed up the call order in the 
> strp_data_ready() path. 
> More precisely, the scenario is that espintcp_data_ready() → strp_data_ready() 
> runs first, passes the if (unlikely(strp->stopped)) check, and only after 
> that strp->stopped = 1 is set.

Ok, thanks.

> > > ```
> > >                  cpu0                                cpu1
> > > 
> > > espintcp_close()
> > >                                           sk->sk_data_ready()
> > >                                             espintcp_data_ready()
> > >                                               if (unlikely(strp->stopped)) return;
> > >   strp_stop()
> > >     strp->stopped = 1;
> > >   strp_done()
> > >     cancel_work_sync(&strp->work);
> > >                                               if (strp_read_sock(strp) == -ENOMEM)
> > >                                               queue_work()
> > 
> > Here the problem would be if we enter do_strp_work after all the
> > socket data has already been freed? Otherwise again the test on
> > strp->stopped will make do_strp_work return early. (this would be
> > unexpected but should be safe)
> 
> If the worker is scheduled after the cancel call, then during 
> espintcp_close() → tcp_close(), the sk will be freed and the ctx 
> will be freed as well. 
> As a result, the kworker will access the freed ctx->strp. 
> This access to the freed ctx happens before the actual worker 
> handler is called, so the problem occurs regardless of the 
> condition checks in do_strp_work().

If we're lucky:

cancel_work_sync(strp->work)
                                        queue_work(strp->work)
...
                                        do_strp_work
                                        strp->stopped
...
free(sk)

[anyway, doesn't matter, seems it could indeed happen as you say]


I think all of these races are unlikely to happen in real life, since
syzbot has not seen them and it is usually pretty good at finding
races (which doesn't mean they're not worth fixing).

-- 
Sabrina
