All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Jiayuan Chen" <jiayuan.chen@linux.dev>
To: "Jakub Kicinski" <kuba@kernel.org>
Cc: netdev@vger.kernel.org, "Jiayuan Chen" <jiayuan.chen@shopee.com>,
	syzbot+ca1345cca66556f3d79b@syzkaller.appspotmail.com,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Sabrina Dubroca" <sd@queasysnail.net>,
	"David S.  Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Paolo  Abeni" <pabeni@redhat.com>,
	"Simon Horman" <horms@kernel.org>,
	"Vakul Garg" <vakul.garg@nxp.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH net v1] tls: fix hung task in tx_work_handler by using non-blocking sends
Date: Sun, 01 Mar 2026 06:52:00 +0000	[thread overview]
Message-ID: <f85c8e268a8384fe14ba5b0033b9f9b931afc122@linux.dev> (raw)
In-Reply-To: <20260228091545.412a9a2d@kernel.org>

March 1, 2026 at 01:15, "Jakub Kicinski" <kuba@kernel.org mailto:kuba@kernel.org?to=%22Jakub%20Kicinski%22%20%3Ckuba%40kernel.org%3E > wrote:


> 
> On Fri, 27 Feb 2026 14:32:31 +0800 Jiayuan Chen wrote:
> 
> > 
> > tx_work_handler calls tls_tx_records with flags=-1, which preserves
> >  each record's original tx_flags but results in tcp_sendmsg_locked
> >  using an infinite send timeout. When the peer is unresponsive and the
> >  send buffer is full, tcp_sendmsg_locked blocks indefinitely in
> >  sk_stream_wait_memory. This causes tls_sk_proto_close to hang in
> >  cancel_delayed_work_sync waiting for tx_work_handler to finish,
> >  leading to a hung task:
> >  
> >  INFO: task ...: blocked for more than ... seconds.
> >  Call Trace:
> >  cancel_delayed_work_sync
> >  tls_sw_cancel_work_tx
> >  tls_sk_proto_close
> >  
> >  A workqueue handler should never block indefinitely. Fix this by
> >  introducing __tls_tx_records() with an extra_flags parameter that
> >  gets OR'd into each record's tx_flags. tx_work_handler uses this to
> >  pass MSG_DONTWAIT so tcp_sendmsg_locked returns -EAGAIN immediately
> >  when the send buffer is full, without overwriting the original
> >  per-record flags (MSG_MORE, MSG_NOSIGNAL, etc.). On -EAGAIN, the
> >  existing reschedule mechanism retries after a short delay.
> >  
> >  Also consolidate the two identical reschedule paths (lock contention
> >  and -EAGAIN) into one.
> > 
> It's not that simple. The default semantics for TCP sockets is that
> queuing data and then calling close() is a legitimate thing to do
> and the data should be sent cleanly, followed by a normal FIN in such
> case.
> 
> Maybe we should explore trying to make sure we have enough wmem before
> we start creating records. Get rid of the entire workqueue mess?



Regarding wmem pre-check: the async crypto path is not triggered by
wmem shortage — it's triggered when the crypto operation itself is
asynchronous (e.g. cryptd fallback when SIMD is unavailable). At the
time tls_do_encryption() returns -EINPROGRESS, wmem may be perfectly
fine. The problem occurs later when tls_encrypt_done() fires and
tx_work_handler tries to push the completed records — by that point
the send buffer may have filled up. Since these are two different
points in time, pre-checking wmem at record creation wouldn't help.

> Regarding your patch I think all callers passing -1 as flags are on 
> the close path, you could have just added | DONTWAIT if the flags 
> are -1.



Regarding adding MSG_DONTWAIT unconditionally when flags == -1:
tls_sw_release_resources_tx() also calls tls_tx_records(sk, -1).
That's in the close path where we actually want to block and flush
remaining records to honour the "close() should send data cleanly"
semantics you mentioned. Making that non-blocking would cause data
loss. So we do need to distinguish between the two callers, which
is why I introduced __tls_tx_records() with the extra_flags parameter.


Thanks,

> pw-bot: cr
>

  reply	other threads:[~2026-03-01  6:52 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-27  6:32 [PATCH net v1] tls: fix hung task in tx_work_handler by using non-blocking sends Jiayuan Chen
2026-02-28 17:15 ` Jakub Kicinski
2026-03-01  6:52   ` Jiayuan Chen [this message]
2026-03-03  0:38     ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f85c8e268a8384fe14ba5b0033b9f9b931afc122@linux.dev \
    --to=jiayuan.chen@linux.dev \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=jiayuan.chen@shopee.com \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sd@queasysnail.net \
    --cc=syzbot+ca1345cca66556f3d79b@syzkaller.appspotmail.com \
    --cc=vakul.garg@nxp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.