Netdev List
 help / color / mirror / Atom feed
From: Jakub Kicinski <kuba@kernel.org>
To: rjethwani@purestorage.com
Cc: Jakub Kicinski <kuba@kernel.org>,
	netdev@vger.kernel.org, saeedm@nvidia.com, tariqt@nvidia.com,
	mbloch@nvidia.com, borisp@nvidia.com, john.fastabend@gmail.com,
	sd@queasysnail.net, davem@davemloft.net, pabeni@redhat.com,
	edumazet@google.com, leon@kernel.org
Subject: Re: [PATCH v14 7/9] tls: device: add RX KeyUpdate support
Date: Mon, 25 May 2026 14:16:10 -0700	[thread overview]
Message-ID: <20260525211610.2508479-1-kuba@kernel.org> (raw)
In-Reply-To: <20260515212715.3151307-8-rjethwani@purestorage.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
tls: device: add RX KeyUpdate support

This adds RX-side TLS 1.3 KeyUpdate support to the device offload path,
keeping the old AEAD/IV/rec_seq around to handle in-flight records that
the NIC already decrypted with the retired key, and deferring tls_dev_add
of the new key until the old key has fully drained.

> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> index c435b3450872..1c58cbd55ffb 100644
> --- a/net/tls/tls_device.c
> +++ b/net/tls/tls_device.c

[ ... ]

> @@ -1552,13 +1722,30 @@ int tls_set_device_offload(struct sock *sk,
>  	return rc;
>  }
>  
> -int tls_set_device_offload_rx(struct sock *sk, struct tls_context *ctx)
> +int tls_set_device_offload_rx(struct sock *sk, struct tls_context *ctx,
> +			      struct tls_crypto_info *new_crypto_info)
>  {

[ ... ]

> +	if (new_crypto_info) {
> +		struct tls_sw_context_rx *sw_ctx = tls_sw_ctx_rx(ctx);
> +
> +		if (!test_bit(TLS_RX_DEV_CLOSED, &ctx->flags)) {
> +			set_bit(TLS_RX_DEV_CLOSED, &ctx->flags);
> +			synchronize_net();
> +			netdev->tlsdev_ops->tls_dev_del(netdev, ctx,
> +							TLS_OFFLOAD_CTX_DIR_RX);
> +		}
> +
> +		if (context->rekey.old_aead_recv &&
> +		    before(copied_seq, context->rekey.old_nic_boundary)) {

[ ... ]

> +			context->dev_add_pending = 1;
> +		} else {
> +			u32 rcv_nxt;
> +
> +			if (context->rekey.old_aead_recv) {
> +				crypto_free_aead(context->rekey.old_aead_recv);
> +				context->rekey.old_aead_recv = NULL;
> +			}
> +
> +			/* flush the backlog so rcv_nxt is accurate */
> +			__sk_flush_backlog(sk);
> +			rcv_nxt = tcp_sk(sk)->rcv_nxt;
> +
> +			if (before(copied_seq, rcv_nxt)) {
> +				context->rekey.old_aead_recv = sw_ctx->aead_recv;
> +				sw_ctx->aead_recv = NULL;
> +				memcpy(context->rekey.old_iv, ctx->rx.iv,
> +				       sizeof(context->rekey.old_iv));
> +				memcpy(context->rekey.old_rec_seq, ctx->rx.rec_seq,
> +				       sizeof(context->rekey.old_rec_seq));
> +				context->rekey.old_nic_boundary = rcv_nxt;
> +				context->dev_add_pending = 1;
> +			}
> +		}
> +	}
> +
> +	rc = tls_sw_ctx_init(sk, 0, new_crypto_info);
>  	if (rc)
>  		goto release_ctx;

[High]
In the rekey branch, sw_ctx->aead_recv has just been moved into
context->rekey.old_aead_recv (with sw_ctx->aead_recv set to NULL) and the
NIC offload has already been torn down via tls_dev_del with
TLS_RX_DEV_CLOSED set.

If tls_sw_ctx_init() then fails (for example crypto_alloc_aead, setkey,
setauthsize, or an ENOMEM in the strparser init), control jumps to
release_ctx, and for the new_crypto_info case that label is essentially a
no-op:

	release_ctx:
		if (!new_crypto_info) {
			kfree(ctx->priv_ctx_rx);
			ctx->priv_ctx_rx = NULL;
		}

So sw_ctx->aead_recv stays NULL, the rekey state stays populated,
TLS_RX_DEV_CLOSED stays set, and tls_sw_ctx_finalize() is never called.

The error returns to do_tls_setsockopt_conf() and, because
update && rx_conf == TLS_HW, it goes straight to err_crypto_info without
restoring anything, while rx_conf is still TLS_HW and the socket remains
open.

Can a subsequent recvmsg on this socket then NULL-deref in tls_decrypt_sg(),
which uses ctx->aead_recv unconditionally to compute
sizeof(*aead_req) + crypto_aead_reqsize(ctx->aead_recv)?

Should this path roll back the rekey state (restore sw_ctx->aead_recv from
rekey.old_aead_recv, clear rekey state, clear dev_add_pending) on
tls_sw_ctx_init() failure?

[High]
The error path renaming from tls_sw_free_resources_rx() to
tls_sw_release_resources_rx() in free_sw_resources, combined with a later
kfree(ctx->priv_ctx_rx), looks like it drops the strparser anchor cleanup:

> free_sw_resources:
>  	up_read(&device_offload_lock);
> -	tls_sw_free_resources_rx(sk);
> +	tls_sw_release_resources_rx(sk);
>  	down_read(&device_offload_lock);
>  release_ctx:
> -	ctx->priv_ctx_rx = NULL;
> +	if (!new_crypto_info) {
> +		kfree(ctx->priv_ctx_rx);
> +		ctx->priv_ctx_rx = NULL;
> +	}

tls_sw_free_resources_rx() previously did three things: release_resources_rx
(frees aead_recv, stops strparser), __tls_strp_done() ->
tls_strp_anchor_free() which consume_skb()s the anchor allocated in
tls_strp_init() during tls_sw_ctx_init(), and finally tls_sw_free_ctx_rx().

The new code only does steps one and three — calling kfree(priv_ctx_rx)
without first doing the anchor consume_skb().

Since tls_strp_init() runs inside tls_sw_ctx_init() during the
!new_crypto_info path, and free_sw_resources is reached only after
tls_device_dev_add_rx() fails (i.e. after init succeeded), is the
strp->anchor SKB leaked on every initial-setup failure here?

> +	if (!context->dev_add_pending) {
> +		rc = tls_device_dev_add_rx(sk, ctx, netdev, src_crypto_info,
> +					   copied_seq, !!new_crypto_info);
> +		if (!new_crypto_info) {
> +			if (rc)
> +				goto free_sw_resources;
> +			tls_device_attach(ctx, sk, netdev);
> +		}
> +	} else if (!was_dev_add_pending) {
> +		TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSRXREKEYINPROGRESS);
> +	} else {
> +		TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSRXREKEYOK);
> +	}

[Medium, Medium]
For the rekey path, context->dev_add_pending is set to 1 before
tls_sw_ctx_init() is called, but the matching
TLS_INC_STATS(LINUX_MIB_TLSRXREKEYINPROGRESS) only happens later in the
"else if (!was_dev_add_pending)" branch after init has succeeded.

If tls_sw_ctx_init() fails, the goto release_ctx is a no-op for
new_crypto_info, leaving dev_add_pending == 1 with no prior
REKEYINPROGRESS increment.

When the socket closes, tls_device_offload_cleanup_rx() then runs:

	if (rx_ctx && rx_ctx->dev_add_pending) {
		rx_ctx->dev_add_pending = 0;
		TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSRXREKEYOK);
		TLS_DEC_STATS(sock_net(sk), LINUX_MIB_TLSRXREKEYINPROGRESS);
	}

Does this decrement TLSRXREKEYINPROGRESS without a prior matching
increment, wrapping the per-CPU SNMP counter to a very large unsigned
value, and bump TLSRXREKEYOK for a rekey that never installed a new key
on the NIC?

Separately, in the same cleanup tail, dev_add_pending == 1 means the
deferred tls_dev_add never ran, so the new key was not installed on the
NIC. Should this path really increment LINUX_MIB_TLSRXREKEYOK, given that
tls_device_dev_add_rx() is the documented place that bumps REKEYOK after
the NIC accepts the new key?

[Medium]
In the same hunk, when was_dev_add_pending is 1 (a previous rekey is still
draining) and a new rekey arrives, the third arm runs:

	} else {
		TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSRXREKEYOK);
	}

dev_add_pending is still 1 here and the new key has not been installed on
the NIC yet. The eventual tls_device_deferred_dev_add_rx() will also call
tls_device_dev_add_rx() with is_rekey=true, which on success increments
LINUX_MIB_TLSRXREKEYOK again.

Is the same successful rekey getting counted twice in this case, and a
never-completed nested rekey getting counted as success?

  reply	other threads:[~2026-05-25 21:16 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-15 21:27 [PATCH net-next v14 0/9] tls: Add TLS 1.3 hardware offload support Rishikesh Jethwani
2026-05-15 21:27 ` [PATCH v14 1/9] net: tls: reject TLS 1.3 offload in chcr_ktls and nfp drivers Rishikesh Jethwani
2026-05-15 21:27 ` [PATCH v14 2/9] net/mlx5e: add TLS 1.3 hardware offload support Rishikesh Jethwani
2026-05-15 21:27 ` [PATCH v14 3/9] tls: " Rishikesh Jethwani
2026-05-15 21:27 ` [PATCH v14 4/9] tls: split tls_set_sw_offload into init and finalize stages Rishikesh Jethwani
2026-05-15 21:27 ` [PATCH v14 5/9] tls: prep helpers and refactors for HW offload KeyUpdate Rishikesh Jethwani
2026-05-15 21:27 ` [PATCH v14 6/9] tls: device: add TX KeyUpdate support Rishikesh Jethwani
2026-05-25 21:16   ` Jakub Kicinski
2026-05-15 21:27 ` [PATCH v14 7/9] tls: device: add RX " Rishikesh Jethwani
2026-05-25 21:16   ` Jakub Kicinski [this message]
2026-05-15 21:27 ` [PATCH v14 8/9] tls: device: add tracepoints for RX KeyUpdate path Rishikesh Jethwani
2026-05-15 21:27 ` [PATCH v14 9/9] selftests: net: add TLS hardware offload test Rishikesh Jethwani
2026-05-25 21:16   ` Jakub Kicinski
2026-05-17 22:21 ` [PATCH net-next v14 0/9] tls: Add TLS 1.3 hardware offload support Sabrina Dubroca

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260525211610.2508479-1-kuba@kernel.org \
    --to=kuba@kernel.org \
    --cc=borisp@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=john.fastabend@gmail.com \
    --cc=leon@kernel.org \
    --cc=mbloch@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=rjethwani@purestorage.com \
    --cc=saeedm@nvidia.com \
    --cc=sd@queasysnail.net \
    --cc=tariqt@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox