From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: "WindowsForum.com" <admin@windowsforum.com>
Cc: netdev@vger.kernel.org, Boris Pismenny <borisp@nvidia.com>,
John Fastabend <john.fastabend@gmail.com>,
Jakub Kicinski <kuba@kernel.org>,
Sabrina Dubroca <sd@queasysnail.net>,
David Howells <dhowells@redhat.com>,
Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>,
linux-kernel@vger.kernel.org
Subject: Re: [RFC net] tls: TLS_SW sendfile() stalls at large MSS
Date: Thu, 4 Jun 2026 19:27:18 +0800
Message-ID: <66925275-1d07-4a74-996a-ec14456999f2@linux.dev>
In-Reply-To: <CAP_6uV+1zQqLtwH30SyuQmyHc03uv9ea+ZUr42TGSCozU4KcdA@mail.gmail.com>

On 6/4/26 2:53 PM, WindowsForum.com wrote:
> Thanks for testing. The non-reproduction may now be the key data
> point. My reproducer omitted a precondition my hosts happened to meet:
> a low net.ipv4.tcp_notsent_lowat. To reproduce, run this first:
>
> sysctl -w net.ipv4.tcp_notsent_lowat=16384

I see.

>
> Root cause
> ----------
> The stalling hosts have tcp_notsent_lowat=16384 (local web tuning);
> the stock default is effectively disabled. A TLS 1.3 record is 16406
> bytes (TLS_MAX_PAYLOAD_SIZE 16384 + 22), just above that watermark --
> so once tls_sw queues a single completed record, notsent (16406)
> exceeds the lowat, tcp_stream_memory_free() returns false, and tls_sw
> parks in sk_stream_wait_memory() holding exactly one corked record
> (the notsent:16406 + persist state from the original dump). With the
> default lowat, tls_sw keeps queuing, the MSG_MORE cork flushes at each
> sendfile() boundary, packets_out stays non-zero, and the persist timer
> never arms -- which is why stock kernels don't show it.
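
The watermark arithmetic described above can be modeled in user space. This is only a sketch: the helper names are hypothetical, the comparison is a simplified stand-in for the kernel's notsent check rather than its actual code, and the 22-byte overhead breakdown assumes TLS 1.3 with a 16-byte AEAD tag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in the thread. */
#define TLS_MAX_PAYLOAD_SIZE  16384u
/* Assumed breakdown: 5-byte record header + 1-byte inner content type
 * + 16-byte AEAD tag. */
#define TLS13_RECORD_OVERHEAD 22u

/* Hypothetical helper: wire size of one full TLS 1.3 record. */
static uint32_t tls13_full_record_len(void)
{
    return TLS_MAX_PAYLOAD_SIZE + TLS13_RECORD_OVERHEAD;
}

/* Simplified model of the watermark test: the stream counts as
 * writable only while the unsent byte count is below the lowat. */
static bool model_stream_memory_free(uint32_t notsent_bytes,
                                     uint32_t notsent_lowat)
{
    return notsent_bytes < notsent_lowat;
}
```

With one queued record (16406 bytes) and lowat=16384 the model reports "not free", matching the stall; with a lowat of UINT32_MAX, used here as a stand-in for "effectively disabled", it reports "free".
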
>
> Three conditions must coincide:
> (a) MSG_MORE forwarded on a completed record -> the sub-MSS record
> is corked [the bug];
> (b) tcp_notsent_lowat < one TLS record (16406) -> tls_sw blocks
> after that one record instead of streaming past it [the trigger I'd
> omitted];
> (c) large MSS -> the record is sub-MSS, so the cork engages [the
> amplifier].
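
Conditions (a) to (c) can be written as a single predicate. This is a hypothetical user-space sketch, not kernel code: the struct, its field names, and the function are invented for illustration, and the record length is the 16406-byte figure from the analysis above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Full TLS 1.3 record size quoted in the thread. */
#define TLS13_FULL_RECORD_LEN 16406u

/* Hypothetical description of one connection's state. */
struct stall_inputs {
    bool     msg_more_on_full_record; /* (a) MSG_MORE forwarded on a completed record */
    uint32_t notsent_lowat;           /* (b) effective tcp_notsent_lowat */
    uint32_t mss;                     /* (c) current MSS in bytes */
};

/* True only when all three conditions from the list coincide. */
static bool stall_conditions_met(const struct stall_inputs *in)
{
    bool corked_flag  = in->msg_more_on_full_record;           /* (a) */
    bool lowat_blocks = in->notsent_lowat < TLS13_FULL_RECORD_LEN; /* (b) */
    bool sub_mss      = TLS13_FULL_RECORD_LEN < in->mss;       /* (c) */

    return corked_flag && lowat_blocks && sub_mss;
}
```

Flipping any one input (clearing the flag, raising the lowat above one record, or using an MSS below the record size) makes the predicate false, which mirrors the one-knob flip tests.
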
>
> Confirmed by flipping only that knob: on a stalling host, restoring
> the default lowat -> 2.89 GiB/s; on a healthy host, setting
> lowat=16384 -> stalls (~0.0001 GiB/s). Everything that merely
> correlated (kernel build, congestion control/qdisc, wmem/rmem,
> tcp_mem, tcp_limit_output_bytes, CPU count, AES-GCM impl) was
> flip-tested and ruled out.
>
> This doesn't change the proposed fix: clearing MSG_MORE for a full
> record sends it immediately, so the deadlock can't form regardless of
> tcp_notsent_lowat.

IMO, force-clearing the MSG_MORE flag for every record is not a good
idea, since we want to be able to pack multiple "APPLICATION DATA"
records into one TCP segment.

>
> If you had not submitted your reply I don't think I would have kept
> testing it - hope this information is useful to the group.
>
>

Maybe we can skip the sk_stream_memory_free() check when MSG_MORE is
set. tcp_sendmsg_locked() in the lower layer will check it again anyway.
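
A rough user-space model of that idea, to make the control flow concrete. This is not a patch: the function name is hypothetical, and where such a check would actually sit in tls_sw is left open.

```c
#include <stdbool.h>
#include <sys/socket.h>

/* Fallback for non-Linux builds; 0x8000 is the Linux value. */
#ifndef MSG_MORE
#define MSG_MORE 0x8000
#endif

/* Hypothetical model: should the TLS layer park and wait for send
 * memory before queuing more data?
 *
 * msg_flags          - flags passed down with the send request
 * stream_memory_free - result of the stream-memory-free check
 */
static bool model_tls_should_wait(int msg_flags, bool stream_memory_free)
{
    /* With MSG_MORE set, skip the check here and rely on the TCP
     * layer's own check further down. */
    if (msg_flags & MSG_MORE)
        return false;

    return !stream_memory_free;
}
```

In this model a corked record no longer blocks at the TLS layer when the watermark is exceeded, while requests without MSG_MORE keep the existing behavior.
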