* [PATCH net v1] tls: fix hung task in tx_work_handler by using non-blocking sends
@ 2026-02-27 6:32 Jiayuan Chen
From: Jiayuan Chen @ 2026-02-27 6:32 UTC (permalink / raw)
To: netdev
Cc: jiayuan.chen, Jiayuan Chen, syzbot+ca1345cca66556f3d79b,
John Fastabend, Jakub Kicinski, Sabrina Dubroca, David S. Miller,
Eric Dumazet, Paolo Abeni, Simon Horman, Vakul Garg, linux-kernel
From: Jiayuan Chen <jiayuan.chen@shopee.com>
tx_work_handler calls tls_tx_records with flags=-1, which preserves
each record's original tx_flags but results in tcp_sendmsg_locked
using an infinite send timeout. When the peer is unresponsive and the
send buffer is full, tcp_sendmsg_locked blocks indefinitely in
sk_stream_wait_memory. This causes tls_sk_proto_close to hang in
cancel_delayed_work_sync waiting for tx_work_handler to finish,
leading to a hung task:
INFO: task ...: blocked for more than ... seconds.
Call Trace:
cancel_delayed_work_sync
tls_sw_cancel_work_tx
tls_sk_proto_close
A workqueue handler should never block indefinitely. Fix this by
introducing __tls_tx_records() with an extra_flags parameter that
gets OR'd into each record's tx_flags. tx_work_handler uses this to
pass MSG_DONTWAIT so tcp_sendmsg_locked returns -EAGAIN immediately
when the send buffer is full, without overwriting the original
per-record flags (MSG_MORE, MSG_NOSIGNAL, etc.). On -EAGAIN, the
existing reschedule mechanism retries after a short delay.
Also consolidate the two identical reschedule paths (lock contention
and -EAGAIN) into one.
Reported-by: syzbot+ca1345cca66556f3d79b@syzkaller.appspotmail.com
Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
---
net/tls/tls_sw.c | 31 +++++++++++++++++++++----------
1 file changed, 21 insertions(+), 10 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 9937d4c810f2..c9d3d44581da 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -404,7 +404,7 @@ static void tls_free_open_rec(struct sock *sk)
 	}
 }
 
-int tls_tx_records(struct sock *sk, int flags)
+static int __tls_tx_records(struct sock *sk, int flags, int extra_flags)
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
 	struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
@@ -417,9 +417,9 @@ int tls_tx_records(struct sock *sk, int flags)
 				       struct tls_rec, list);
 
 		if (flags == -1)
-			tx_flags = rec->tx_flags;
+			tx_flags = rec->tx_flags | extra_flags;
 		else
-			tx_flags = flags;
+			tx_flags = flags | extra_flags;
 
 		rc = tls_push_partial_record(sk, tls_ctx, tx_flags);
 		if (rc)
@@ -463,6 +463,11 @@ int tls_tx_records(struct sock *sk, int flags)
 	return rc;
 }
 
+int tls_tx_records(struct sock *sk, int flags)
+{
+	return __tls_tx_records(sk, flags, 0);
+}
+
 static void tls_encrypt_done(void *data, int err)
 {
 	struct tls_sw_context_tx *ctx;
@@ -2629,6 +2634,7 @@ static void tx_work_handler(struct work_struct *work)
 	struct sock *sk = tx_work->sk;
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
 	struct tls_sw_context_tx *ctx;
+	int rc;
 
 	if (unlikely(!tls_ctx))
 		return;
@@ -2642,16 +2648,21 @@ static void tx_work_handler(struct work_struct *work)
 
 	if (mutex_trylock(&tls_ctx->tx_lock)) {
 		lock_sock(sk);
-		tls_tx_records(sk, -1);
+		rc = __tls_tx_records(sk, -1, MSG_DONTWAIT);
 		release_sock(sk);
 		mutex_unlock(&tls_ctx->tx_lock);
-	} else if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) {
-		/* Someone is holding the tx_lock, they will likely run Tx
-		 * and cancel the work on their way out of the lock section.
-		 * Schedule a long delay just in case.
-		 */
-		schedule_delayed_work(&ctx->tx_work.work, msecs_to_jiffies(10));
+		if (rc != -EAGAIN)
+			return;
 	}
+
+	/* Someone is holding the tx_lock, they will likely run Tx
+	 * and cancel the work on their way out of the lock section.
+	 * Schedule a long delay just in case.
+	 * Also reschedule on -EAGAIN when the send buffer is full
+	 * to avoid blocking the workqueue indefinitely.
+	 */
+	if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
+		schedule_delayed_work(&ctx->tx_work.work, msecs_to_jiffies(10));
 }
 
 static bool tls_is_tx_ready(struct tls_sw_context_tx *ctx)
static bool tls_is_tx_ready(struct tls_sw_context_tx *ctx)
--
2.43.0
* Re: [PATCH net v1] tls: fix hung task in tx_work_handler by using non-blocking sends
From: Jakub Kicinski @ 2026-02-28 17:15 UTC (permalink / raw)
To: Jiayuan Chen
Cc: netdev, Jiayuan Chen, syzbot+ca1345cca66556f3d79b, John Fastabend,
Sabrina Dubroca, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, Vakul Garg, linux-kernel
On Fri, 27 Feb 2026 14:32:31 +0800 Jiayuan Chen wrote:
> tx_work_handler calls tls_tx_records with flags=-1, which preserves
> each record's original tx_flags but results in tcp_sendmsg_locked
> using an infinite send timeout. When the peer is unresponsive and the
> send buffer is full, tcp_sendmsg_locked blocks indefinitely in
> sk_stream_wait_memory. This causes tls_sk_proto_close to hang in
> cancel_delayed_work_sync waiting for tx_work_handler to finish,
> leading to a hung task:
>
> INFO: task ...: blocked for more than ... seconds.
> Call Trace:
> cancel_delayed_work_sync
> tls_sw_cancel_work_tx
> tls_sk_proto_close
>
> A workqueue handler should never block indefinitely. Fix this by
> introducing __tls_tx_records() with an extra_flags parameter that
> gets OR'd into each record's tx_flags. tx_work_handler uses this to
> pass MSG_DONTWAIT so tcp_sendmsg_locked returns -EAGAIN immediately
> when the send buffer is full, without overwriting the original
> per-record flags (MSG_MORE, MSG_NOSIGNAL, etc.). On -EAGAIN, the
> existing reschedule mechanism retries after a short delay.
>
> Also consolidate the two identical reschedule paths (lock contention
> and -EAGAIN) into one.
It's not that simple. The default semantics for TCP sockets are that
queuing data and then calling close() is a legitimate thing to do,
and in that case the data should be sent cleanly, followed by a
normal FIN.
Maybe we should explore trying to make sure we have enough wmem before
we start creating records. Get rid of the entire workqueue mess?
Regarding your patch I think all callers passing -1 as flags are on
the close path, you could have just added | DONTWAIT if the flags
are -1.
--
pw-bot: cr
* Re: [PATCH net v1] tls: fix hung task in tx_work_handler by using non-blocking sends
From: Jiayuan Chen @ 2026-03-01 6:52 UTC (permalink / raw)
To: Jakub Kicinski
Cc: netdev, Jiayuan Chen, syzbot+ca1345cca66556f3d79b, John Fastabend,
Sabrina Dubroca, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, Vakul Garg, linux-kernel
March 1, 2026 at 01:15, "Jakub Kicinski" <kuba@kernel.org> wrote:
>
> On Fri, 27 Feb 2026 14:32:31 +0800 Jiayuan Chen wrote:
>
> >
> > tx_work_handler calls tls_tx_records with flags=-1, which preserves
> > each record's original tx_flags but results in tcp_sendmsg_locked
> > using an infinite send timeout. When the peer is unresponsive and the
> > send buffer is full, tcp_sendmsg_locked blocks indefinitely in
> > sk_stream_wait_memory. This causes tls_sk_proto_close to hang in
> > cancel_delayed_work_sync waiting for tx_work_handler to finish,
> > leading to a hung task:
> >
> > INFO: task ...: blocked for more than ... seconds.
> > Call Trace:
> > cancel_delayed_work_sync
> > tls_sw_cancel_work_tx
> > tls_sk_proto_close
> >
> > A workqueue handler should never block indefinitely. Fix this by
> > introducing __tls_tx_records() with an extra_flags parameter that
> > gets OR'd into each record's tx_flags. tx_work_handler uses this to
> > pass MSG_DONTWAIT so tcp_sendmsg_locked returns -EAGAIN immediately
> > when the send buffer is full, without overwriting the original
> > per-record flags (MSG_MORE, MSG_NOSIGNAL, etc.). On -EAGAIN, the
> > existing reschedule mechanism retries after a short delay.
> >
> > Also consolidate the two identical reschedule paths (lock contention
> > and -EAGAIN) into one.
> >
> It's not that simple. The default semantics for TCP sockets are that
> queuing data and then calling close() is a legitimate thing to do,
> and in that case the data should be sent cleanly, followed by a
> normal FIN.
>
> Maybe we should explore trying to make sure we have enough wmem before
> we start creating records. Get rid of the entire workqueue mess?
Regarding wmem pre-check: the async crypto path is not triggered by
wmem shortage — it's triggered when the crypto operation itself is
asynchronous (e.g. cryptd fallback when SIMD is unavailable). At the
time tls_do_encryption() returns -EINPROGRESS, wmem may be perfectly
fine. The problem occurs later when tls_encrypt_done() fires and
tx_work_handler tries to push the completed records — by that point
the send buffer may have filled up. Since these are two different
points in time, pre-checking wmem at record creation wouldn't help.
> Regarding your patch I think all callers passing -1 as flags are on
> the close path, you could have just added | DONTWAIT if the flags
> are -1.
Regarding adding MSG_DONTWAIT unconditionally when flags == -1:
tls_sw_release_resources_tx() also calls tls_tx_records(sk, -1).
That's in the close path where we actually want to block and flush
remaining records to honour the "close() should send data cleanly"
semantics you mentioned. Making that non-blocking would cause data
loss. So we do need to distinguish between the two callers, which
is why I introduced __tls_tx_records() with the extra_flags parameter.
Thanks,
> pw-bot: cr
>
* Re: [PATCH net v1] tls: fix hung task in tx_work_handler by using non-blocking sends
From: Jakub Kicinski @ 2026-03-03 0:38 UTC (permalink / raw)
To: Jiayuan Chen
Cc: netdev, Jiayuan Chen, syzbot+ca1345cca66556f3d79b, John Fastabend,
Sabrina Dubroca, David S. Miller, Eric Dumazet, Paolo Abeni,
Simon Horman, Vakul Garg, linux-kernel
On Sun, 01 Mar 2026 06:52:00 +0000 Jiayuan Chen wrote:
> > On Fri, 27 Feb 2026 14:32:31 +0800 Jiayuan Chen wrote:
> > > tx_work_handler calls tls_tx_records with flags=-1, which preserves
> > > each record's original tx_flags but results in tcp_sendmsg_locked
> > > using an infinite send timeout. When the peer is unresponsive and the
> > > send buffer is full, tcp_sendmsg_locked blocks indefinitely in
> > > sk_stream_wait_memory. This causes tls_sk_proto_close to hang in
> > > cancel_delayed_work_sync waiting for tx_work_handler to finish,
> > > leading to a hung task:
> > >
> > > INFO: task ...: blocked for more than ... seconds.
> > > Call Trace:
> > > cancel_delayed_work_sync
> > > tls_sw_cancel_work_tx
> > > tls_sk_proto_close
> > >
> > > A workqueue handler should never block indefinitely. Fix this by
> > > introducing __tls_tx_records() with an extra_flags parameter that
> > > gets OR'd into each record's tx_flags. tx_work_handler uses this to
> > > pass MSG_DONTWAIT so tcp_sendmsg_locked returns -EAGAIN immediately
> > > when the send buffer is full, without overwriting the original
> > > per-record flags (MSG_MORE, MSG_NOSIGNAL, etc.). On -EAGAIN, the
> > > existing reschedule mechanism retries after a short delay.
> > >
> > > Also consolidate the two identical reschedule paths (lock contention
> > > and -EAGAIN) into one.
> > >
> > It's not that simple. The default semantics for TCP sockets are that
> > queuing data and then calling close() is a legitimate thing to do,
> > and in that case the data should be sent cleanly, followed by a
> > normal FIN.
> >
> > Maybe we should explore trying to make sure we have enough wmem before
> > we start creating records. Get rid of the entire workqueue mess?
>
> Regarding wmem pre-check: the async crypto path is not triggered by
> wmem shortage — it's triggered when the crypto operation itself is
> asynchronous (e.g. cryptd fallback when SIMD is unavailable). At the
> time tls_do_encryption() returns -EINPROGRESS, wmem may be perfectly
> fine. The problem occurs later when tls_encrypt_done() fires and
> tx_work_handler tries to push the completed records — by that point
> the send buffer may have filled up. Since these are two different
> points in time, pre-checking wmem at record creation wouldn't help.
My recollection is that the work scheduling in the async encrypt path
is just a duct-tape fix for some old race. The sendmsg() paths should
normally wait for the async crypto to finish before returning to user
space.
> > Regarding your patch I think all callers passing -1 as flags are on
> > the close path, you could have just added | DONTWAIT if the flags
> > are -1.
>
> Regarding adding MSG_DONTWAIT unconditionally when flags == -1:
> tls_sw_release_resources_tx() also calls tls_tx_records(sk, -1).
> That's in the close path where we actually want to block and flush
> remaining records to honour the "close() should send data cleanly"
> semantics you mentioned. Making that non-blocking would cause data
> loss. So we do need to distinguish between the two callers, which
> is why I introduced __tls_tx_records() with the extra_flags parameter.
Possible, I didn't look very closely.
The extra_flags argument you're adding is extremely inelegant.