Netdev List
From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: "WindowsForum.com" <admin@windowsforum.com>
Cc: netdev@vger.kernel.org, Boris Pismenny <borisp@nvidia.com>,
	John Fastabend <john.fastabend@gmail.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Sabrina Dubroca <sd@queasysnail.net>,
	David Howells <dhowells@redhat.com>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC net] tls: TLS_SW sendfile() stalls at large MSS
Date: Thu, 4 Jun 2026 19:27:18 +0800
Message-ID: <66925275-1d07-4a74-996a-ec14456999f2@linux.dev>
In-Reply-To: <CAP_6uV+1zQqLtwH30SyuQmyHc03uv9ea+ZUr42TGSCozU4KcdA@mail.gmail.com>


On 6/4/26 2:53 PM, WindowsForum.com wrote:
> Thanks for testing. The non-reproduction is maybe now the key data 
> point. My reproducer omitted a precondition my hosts happened to meet: 
> a low net.ipv4.tcp_notsent_lowat. To reproduce, add before running:
>
> sysctl -w net.ipv4.tcp_notsent_lowat=16384


I see.

>
> Root cause
> ----------
> The stalling hosts have tcp_notsent_lowat=16384 (local web tuning); 
> the stock default is effectively disabled. A TLS 1.3 record is 16406 
> bytes (TLS_MAX_PAYLOAD_SIZE 16384 + 22), just above that watermark -- 
> so once tls_sw queues a single completed record, notsent (16406) 
> exceeds the lowat, tcp_stream_memory_free() returns false, and tls_sw 
> parks in sk_stream_wait_memory() holding exactly one corked record 
> (the notsent:16406 + persist state from the original dump). With the 
> default lowat, tls_sw keeps queuing, the MSG_MORE cork flushes at each 
> sendfile() boundary, packets_out stays non-zero, and the persist timer 
> never arms -- which is why stock kernels don't show it.
>
> Three conditions must coincide:
>   (a) MSG_MORE forwarded on a completed record -> the sub-MSS record 
> is corked [the bug];
>   (b) tcp_notsent_lowat < one TLS record (16406) -> tls_sw blocks 
> after that one record instead of streaming past it [the trigger I'd 
> omitted];
>   (c) large MSS -> the record is sub-MSS, so the cork engages [the 
> amplifier].
>
> Confirmed by flipping only that knob: on a stalling host, restoring 
> the default lowat -> 2.89 GiB/s; on a healthy host, setting 
> lowat=16384 -> stalls (~0.0001 GiB/s). Everything that merely 
> correlated (kernel build, congestion control/qdisc, wmem/rmem, 
> tcp_mem, tcp_limit_output_bytes, CPU count, AES-GCM impl) was 
> flip-tested and ruled out.
>
> This doesn't change the proposed fix: clearing MSG_MORE for a full 
> record sends it immediately, so the deadlock can't form regardless of 
> tcp_notsent_lowat.
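
To make the watermark arithmetic in the quoted root cause concrete, here is a
minimal userspace model (not kernel code). The model_* names are hypothetical,
and UINT32_MAX merely stands in for the "effectively disabled" default; the
16384 and 22 figures are the ones given in the report above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Figures from the quoted report: one TLS 1.3 record is 16384 + 22 bytes. */
#define MODEL_TLS_MAX_PAYLOAD_SIZE 16384u
#define MODEL_TLS13_OVERHEAD       22u

/* Assumption: stands in for a watermark that is "effectively disabled". */
#define MODEL_LOWAT_DISABLED UINT32_MAX

/* Size of one full record as described in the report. */
static uint32_t model_record_size(void)
{
    return MODEL_TLS_MAX_PAYLOAD_SIZE + MODEL_TLS13_OVERHEAD;
}

/*
 * Hypothetical stand-in for the not-sent watermark test: the stream counts
 * as writable only while the unsent byte count is below the watermark.
 */
static bool model_stream_memory_free(uint32_t notsent_bytes,
                                     uint32_t notsent_lowat)
{
    return notsent_bytes < notsent_lowat;
}

/* Sanity check of the report's numbers against the model. */
static void model_check_report_numbers(void)
{
    uint32_t rec = model_record_size();

    assert(rec == 16406u);
    /* lowat=16384: one queued record already exceeds the watermark. */
    assert(!model_stream_memory_free(rec, 16384u));
    /* Watermark disabled: the sender may keep queuing records. */
    assert(model_stream_memory_free(rec, MODEL_LOWAT_DISABLED));
}
```

With lowat=16384 a single queued record (16406 bytes) already fails the test,
which matches condition (b) in the quoted list.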


IMO, force-clearing the MSG_MORE flag for each record is not a good idea,
since we want multiple "APPLICATION DATA" records in one TCP payload.

>
> If you had not submitted your reply I don't think I would have kept 
> testing it - hope this information is useful to the group.
>
>
Maybe we can skip the sk_stream_memory_free() check when MSG_MORE is set;
the lower-level tcp_sendmsg_locked() will check it again anyway.
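
As a rough illustration of that idea only (a userspace sketch, not a kernel
patch), the decision could look like the following. MODEL_MSG_MORE and
model_must_wait() are hypothetical names, and the flag value is arbitrary;
the real check sits in the TLS_SW send path and may need more conditions
than this.

```c
#include <stdbool.h>

/* Hypothetical flag bit standing in for MSG_MORE in this model. */
#define MODEL_MSG_MORE 0x1

/*
 * Decide whether the TLS layer itself should block waiting for memory.
 * With MSG_MORE set, the check is skipped here on the assumption that
 * the lower TCP layer performs its own check when the data reaches it.
 */
static bool model_must_wait(bool stream_memory_free, int flags)
{
    if (flags & MODEL_MSG_MORE)
        return false;           /* defer the decision to the lower layer */

    return !stream_memory_free; /* no MSG_MORE: block when over the limit */
}
```

In this model a corked record with MSG_MORE set never parks the TLS layer,
while a send without MSG_MORE still waits when the stream is not writable.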



