public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: "Jiayuan Chen" <jiayuan.chen@linux.dev>
To: "Jakub Kicinski" <kuba@kernel.org>
Cc: netdev@vger.kernel.org, "Jiayuan Chen" <jiayuan.chen@shopee.com>,
	syzbot+ca1345cca66556f3d79b@syzkaller.appspotmail.com,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Sabrina Dubroca" <sd@queasysnail.net>,
	"David S.  Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Paolo  Abeni" <pabeni@redhat.com>,
	"Simon Horman" <horms@kernel.org>,
	"Vakul Garg" <vakul.garg@nxp.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH net v1] tls: fix hung task in tx_work_handler by using non-blocking sends
Date: Sun, 01 Mar 2026 06:52:00 +0000	[thread overview]
Message-ID: <f85c8e268a8384fe14ba5b0033b9f9b931afc122@linux.dev> (raw)
In-Reply-To: <20260228091545.412a9a2d@kernel.org>

March 1, 2026 at 01:15, "Jakub Kicinski" <kuba@kernel.org mailto:kuba@kernel.org?to=%22Jakub%20Kicinski%22%20%3Ckuba%40kernel.org%3E > wrote:


> 
> On Fri, 27 Feb 2026 14:32:31 +0800 Jiayuan Chen wrote:
> 
> > 
> > tx_work_handler calls tls_tx_records with flags=-1, which preserves
> >  each record's original tx_flags but results in tcp_sendmsg_locked
> >  using an infinite send timeout. When the peer is unresponsive and the
> >  send buffer is full, tcp_sendmsg_locked blocks indefinitely in
> >  sk_stream_wait_memory. This causes tls_sk_proto_close to hang in
> >  cancel_delayed_work_sync waiting for tx_work_handler to finish,
> >  leading to a hung task:
> >  
> >  INFO: task ...: blocked for more than ... seconds.
> >  Call Trace:
> >  cancel_delayed_work_sync
> >  tls_sw_cancel_work_tx
> >  tls_sk_proto_close
> >  
> >  A workqueue handler should never block indefinitely. Fix this by
> >  introducing __tls_tx_records() with an extra_flags parameter that
> >  gets OR'd into each record's tx_flags. tx_work_handler uses this to
> >  pass MSG_DONTWAIT so tcp_sendmsg_locked returns -EAGAIN immediately
> >  when the send buffer is full, without overwriting the original
> >  per-record flags (MSG_MORE, MSG_NOSIGNAL, etc.). On -EAGAIN, the
> >  existing reschedule mechanism retries after a short delay.
> >  
> >  Also consolidate the two identical reschedule paths (lock contention
> >  and -EAGAIN) into one.
> > 
> It's not that simple. The default semantics for TCP sockets is that
> queuing data and then calling close() is a legitimate thing to do
> and the data should be sent cleanly, followed by a normal FIN in such
> case.
> 
> Maybe we should explore trying to make sure we have enough wmem before
> we start creating records. Get rid of the entire workqueue mess?



Regarding wmem pre-check: the async crypto path is not triggered by
wmem shortage — it's triggered when the crypto operation itself is
asynchronous (e.g. cryptd fallback when SIMD is unavailable). At the
time tls_do_encryption() returns -EINPROGRESS, wmem may be perfectly
fine. The problem occurs later when tls_encrypt_done() fires and
tx_work_handler tries to push the completed records — by that point
the send buffer may have filled up. Since these are two different
points in time, pre-checking wmem at record creation wouldn't help.

> Regarding your patch I think all callers passing -1 as flags are on 
> the close path, you could have just added | DONTWAIT if the flags 
> are -1.



Regarding adding MSG_DONTWAIT unconditionally when flags == -1:
tls_sw_release_resources_tx() also calls tls_tx_records(sk, -1).
That's in the close path where we actually want to block and flush
remaining records to honour the "close() should send data cleanly"
semantics you mentioned. Making that non-blocking would cause data
loss. So we do need to distinguish between the two callers, which
is why I introduced __tls_tx_records() with the extra_flags parameter.


Thanks,

> pw-bot: cr
>

  reply	other threads:[~2026-03-01  6:52 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-27  6:32 [PATCH net v1] tls: fix hung task in tx_work_handler by using non-blocking sends Jiayuan Chen
2026-02-28 17:15 ` Jakub Kicinski
2026-03-01  6:52   ` Jiayuan Chen [this message]
2026-03-03  0:38     ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f85c8e268a8384fe14ba5b0033b9f9b931afc122@linux.dev \
    --to=jiayuan.chen@linux.dev \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=jiayuan.chen@shopee.com \
    --cc=john.fastabend@gmail.com \
    --cc=kuba@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=sd@queasysnail.net \
    --cc=syzbot+ca1345cca66556f3d79b@syzkaller.appspotmail.com \
    --cc=vakul.garg@nxp.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox