From: "Chuck Lever" <cel@kernel.org>
To: "Sabrina Dubroca" <sd@queasysnail.net>
Cc: john.fastabend@gmail.com, "Jakub Kicinski" <kuba@kernel.org>,
	netdev@vger.kernel.org, kernel-tls-handshake@lists.linux.dev,
	"Chuck Lever" <chuck.lever@oracle.com>,
	"Alistair Francis" <alistair.francis@wdc.com>,
	"Hannes Reinecke" <hare@suse.de>
Subject: Re: [PATCH net-next v7 4/5] tls: Suppress spurious saved_data_ready on all receive paths
Date: Mon, 30 Mar 2026 22:06:01 -0400
Message-ID: <dba824ff-d3e5-4a7e-9e2f-5c0017473390@app.fastmail.com>
In-Reply-To: <acqMIKWFmx6B1lD8@krikkit>


On Mon, Mar 30, 2026, at 10:43 AM, Sabrina Dubroca wrote:
> 2026-03-28, 11:17:11 -0400, Chuck Lever wrote:
>> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
>> index 5fdd43a55f1e..8fb2f2a93846 100644
>> --- a/net/tls/tls_sw.c
>> +++ b/net/tls/tls_sw.c
>> @@ -1373,7 +1373,11 @@ tls_rx_rec_wait(struct sock *sk, struct sk_psock *psock, bool nonblock,
>>  			return ret;
>>  
>>  		if (!skb_queue_empty(&sk->sk_receive_queue)) {
>> -			tls_strp_check_rcv(&ctx->strp);
>> +			/* Defer notification to the exit point;
>> +			 * this thread will consume the record
>> +			 * directly.
>> +			 */
>
> That's a really nice improvement over the comment you had here
> before. Thanks.
>
>> +			tls_strp_check_rcv(&ctx->strp, false);
>>  			if (tls_strp_msg_ready(ctx))
>>  				break;
>>  		}
>
> [...]
>> @@ -2142,7 +2146,7 @@ int tls_sw_recvmsg(struct sock *sk,
>>  		err = tls_record_content_type(msg, tls_msg(darg.skb), &control);
>>  		if (err <= 0) {
>>  			DEBUG_NET_WARN_ON_ONCE(darg.zc);
>> -			tls_rx_rec_done(ctx);
>> +			tls_rx_rec_release(ctx);
>>  put_on_rx_list_err:
>>  			__skb_queue_tail(&ctx->rx_list, darg.skb);
>>  			goto recv_end;
>> @@ -2156,7 +2160,8 @@ int tls_sw_recvmsg(struct sock *sk,
>>  		/* TLS 1.3 may have updated the length by more than overhead */
>>  		rxm = strp_msg(darg.skb);
>>  		chunk = rxm->full_len;
>> -		tls_rx_rec_done(ctx);
>> +		tls_rx_rec_release(ctx);
>> +		tls_strp_check_rcv(&ctx->strp, false);
>
> Not strictly an objection against your patch, but after those changes,
> calling tls_strp_check_rcv() at this point in tls_sw_recvmsg() is
> starting to look like a leftover from the transition from generic strp
> to custom strp. We're not going to do anything with the next record
> until we call tls_rx_rec_wait(), so it seems it would fit better
> before the loop in tls_rx_rec_wait().
>
>
> [...]
>> @@ -2290,7 +2296,7 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
>>  		if (err < 0)
>>  			goto splice_read_end;
>>  
>> -		tls_rx_rec_done(ctx);
>> +		tls_rx_rec_release(ctx);
>>  		skb = darg.skb;
>>  	}
>>  
>> @@ -2317,6 +2323,7 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
>>  	consume_skb(skb);
>>  
>>  splice_read_end:
>> +	tls_strp_check_rcv(&ctx->strp, true);
>
> This is in effect adding a
>
> if (strp->msg_ready)
>     tls_rx_msg_ready(strp);
>
> [a bit more than that]
> in case we dequeue from the rx_list but don't use the record (or only
> part of it).
>
> I wonder if that should be seen as a problem (another spurious wakeup)
> or an improvement (wake up because there's data ready)? Or if we
> should wake up anyway if we exit with rx_list non-empty, regardless of
> the parser's state, since there is data to read (tls_sk_poll looks at
> both rx_list and msg_ready).
>
> [this applies to all 3 RX functions, but this one is the simplest,
> which makes it more visible]
>
>>  	tls_rx_reader_unlock(sk, ctx);
>>  	return copied ? : err;
>>  
>> @@ -2382,7 +2389,7 @@ int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
>>  			tlm = tls_msg(skb);
>>  			decrypted += rxm->full_len;
>>  
>> -			tls_rx_rec_done(ctx);
>> +			tls_rx_rec_release(ctx);
>
> With this, there's no tls_strp_check_rcv() call inside the
> tls_sw_read_sock() loop, so in the next iteration, tls_rx_rec_wait()
> will have to go for at least one round of its own loop. [this points
> back to the recvmsg comment above, and a bit to the "cold path"
> discussion we had earlier]

All three points are really the same architectural question:
where should the parse-next-record step live -- eagerly inline
after release, or lazily in tls_rx_rec_wait()? And when a record
is already parsed at function exit, should the wakeup be
considered spurious or useful?
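
To make the two placements concrete, here is a toy userspace model,
not kernel code. Everything in it (toy_strp, toy_check_rcv,
toy_release, toy_read) is a made-up stand-in, and it assumes that
parsing happens only when no record is ready and that the only
notification is the one at the exit point. It counts parse steps and
wakeups for an eager reader (parse right after release) and a lazy
reader (parse at the top of the wait step):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the TLS stream parser state; NOT the kernel struct. */
struct toy_strp {
	int queued;	/* complete records still sitting in the TCP queue */
	bool msg_ready;	/* one record is parsed and ready to consume */
	int parses;	/* parse steps that actually produced a record */
	int wakeups;	/* data-ready notifications delivered */
};

/* Parse the next record if none is ready; notify only when 'wake' is set. */
static void toy_check_rcv(struct toy_strp *s, bool wake)
{
	if (!s->msg_ready && s->queued > 0) {
		s->queued--;
		s->msg_ready = true;
		s->parses++;
	}
	if (s->msg_ready && wake)
		s->wakeups++;
}

/* Drop the current record without parsing the next one. */
static void toy_release(struct toy_strp *s)
{
	assert(s->msg_ready);
	s->msg_ready = false;
}

/*
 * Consume up to 'max' records and return how many were consumed.
 * eager == true:  parse the next record right after each release.
 * eager == false: parse only in the wait step at the top of the loop.
 */
static int toy_read(struct toy_strp *s, int max, bool eager)
{
	int consumed = 0;

	while (consumed < max) {
		if (!s->msg_ready)
			toy_check_rcv(s, false);	/* the "wait" step */
		if (!s->msg_ready)
			break;				/* nothing left to read */
		toy_release(s);
		consumed++;
		if (eager)
			toy_check_rcv(s, false);
	}
	toy_check_rcv(s, true);		/* single notification at the exit point */
	return consumed;
}
```

In this model, a reader that takes two of three queued records ends
up with three parse steps and one wakeup in both modes. Only the
point at which the third record gets parsed differs.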

I'd like to try to help answer those questions. I set up loopback
NFS-over-TLS on a 10-CPU test system running this series plus
patches that convert NFSD from recvmsg to read_sock, then profiled
the server-side receive path with perf under several fio write
workloads. The backing store is tmpfs, so disk I/O is not a factor.

At 1M block size with a single fio job (540 IOPS, 540 MiB/s),
AES-GCM decryption dominates at 49% of cycles. The TLS record
parsing functions (tls_strp_check_rcv, tls_rx_rec_wait,
tls_strp_msg_release) do not appear above the 0.5% threshold.

At 4k block size with 8 jobs over a single connection (29K
IOPS, 113 MiB/s), AES-GCM drops to 5% and mutex contention
on the socket lock rises to 6.4%. tls_strp_check_rcv and
tls_rx_rec_wait each account for 0.05%.

At 4k block size with 32 jobs over nconnect=8 (87K IOPS,
339 MiB/s), mutex contention drops to 0.75% as the load
spreads across eight TLS connections. tls_strp_check_rcv
rises to 0.14% and tls_rx_rec_wait to 0.12%. The profile
is broadly distributed across NFS compound processing, XDR
encode/decode, credential handling, TCP transmit, and memory
allocation.
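
As a sanity check on the figures above, throughput should equal IOPS
times block size. A minimal sketch, assuming the fio block sizes are
binary units (4k = 4 KiB, 1M = 1024 KiB); mib_per_sec is a
hypothetical helper, not part of any tool used here:

```c
#include <assert.h>

/* Throughput in MiB/s for a given IOPS rate and block size in KiB. */
static double mib_per_sec(double iops, double block_kib)
{
	assert(iops >= 0.0 && block_kib >= 0.0);
	return iops * block_kib / 1024.0;
}
```

With the rounded IOPS values, mib_per_sec(540, 1024) gives 540,
mib_per_sec(29000, 4) gives about 113.3, and mib_per_sec(87000, 4)
gives about 339.8, in line with the reported 540, 113, and
339 MiB/s.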

Across all three workloads the total cost of the functions
reorganized by patch 4/5 -- tls_strp_check_rcv,
tls_strp_msg_release, tls_rx_rec_wait, and tls_rx_msg_ready
-- stays below 0.5% of server CPU cycles. Moving the
tls_strp_check_rcv call site from its current position to
tls_rx_rec_wait, or adding an eager parse inside the
tls_sw_read_sock loop, would not produce a measurable
difference at these load levels.

From a performance standpoint, where the parse-next-record step
lives doesn't really matter at these load levels. I think we
have room to innovate when we get around to async support for
decryption (which I agree will be challenging).


-- 
Chuck Lever
