From: Sabrina Dubroca <sd@queasysnail.net>
To: Rishikesh Jethwani <rjethwani@purestorage.com>
Cc: netdev@vger.kernel.org, saeedm@nvidia.com, tariqt@nvidia.com,
mbloch@nvidia.com, borisp@nvidia.com, john.fastabend@gmail.com,
kuba@kernel.org, davem@davemloft.net, pabeni@redhat.com,
edumazet@google.com, leon@kernel.org
Subject: Re: [PATCH v13 5/6] tls: add hardware offload key update support
Date: Wed, 6 May 2026 13:30:28 +0200 [thread overview]
Message-ID: <afsmVLSn2tA7UP9W@krikkit> (raw)
In-Reply-To: <20260429181016.3164935-6-rjethwani@purestorage.com>
2026-04-29, 12:10:15 -0600, Rishikesh Jethwani wrote:
> On RX, the NIC may have already decrypted in-flight records with
> the old key before the peer's KeyUpdate is parsed, so the old
> AEAD, IV and rec_seq are retained on tls_offload_context_rx.
> tls_check_pending_rekey() invokes tls_device_rx_del_key() to drop
> the NIC key; otherwise post-KeyUpdate records (carrying new-key
> wire encryption) would be XOR'd with the retired key.
> tls_device_decrypted() classifies records by old_nic_boundary:
>
> - after the boundary: new-key record; drop the old key.
> - before, fully encrypted: advance old_rec_seq, let SW AEAD decrypt.
> - before, (partially) decrypted: reencrypt with the old key so SW
> AEAD can decrypt with the new key.
>
> For mixed records skb->decrypted flags can be wrong (NIC clears
> them on auth failure); on -EBADMSG, tls_rx_rekey_retry() toggles
> those flags, decrements old_rec_seq to reuse the nonce, and
> retries once (gated by old_key_reencrypted).
Not blaming you for NIC behavior, but... the NIC passes up as
"decrypted" records that have failed decryption (because it was using
the wrong (old) key), or passes as "encrypted" the incorrectly
decrypted data (that it has "decrypted" with the old key)?
Or this is only the first record(s) after the KeyUpdate message, if
they fall within the same packet, the whole packet was "decrypted"
with the old key but only the KeyUpdate itself (and maybe some more
records before it) decrypted correctly ; but subsequent packets get
passed as !decrypted and don't need this reencrypt dance?
(this is maybe more of a question for Tariq or the other @nvidia
folks)
I haven't reviewed the whole patch at this point, because of Paolo's
suggestion and this confusion with the RX rekey.
> diff --git a/include/net/tls.h b/include/net/tls.h
> index ebd2550280ae..6891aa6b484c 100644
> --- a/include/net/tls.h
> +++ b/include/net/tls.h
[...]
> @@ -165,6 +181,11 @@ struct tls_offload_context_tx {
> void (*sk_destruct)(struct sock *sk);
> struct work_struct destruct_work;
> struct tls_context *ctx;
> +
> + struct tls_sw_context_tx rekey_sw; /* SW context for new key */
> + struct cipher_context rekey_tx; /* IV, rec_seq for new key */
> + union tls_crypto_context rekey_crypto_send; /* Crypto for new key */
[...]
> @@ -253,6 +273,15 @@ struct tls_context {
> */
> unsigned long flags;
>
> + /* TCP sequence number boundary for pending rekey.
> + * Packets with seq < this use old key, >= use new key.
> + */
> + u32 rekey_boundary_seq;
> +
> + /* Pointers to rekey contexts for SW encryption with new key */
> + struct tls_sw_context_tx *rekey_sw_ctx;
> + struct cipher_context *rekey_cipher_ctx;
> +
[...]
> @@ -311,6 +340,14 @@ struct tls_offload_context_rx {
> u8 resync_nh_reset:1;
> /* CORE_NEXT_HINT-only member, but use the hole here */
> u8 resync_nh_do_now:1;
> + /* retry reencrypt of mixed record during rekey */
> + u8 old_key_reencrypted:1;
> + /* tls_dev_add deferred until old key is freed */
> + u8 dev_add_pending:1;
> + struct crypto_aead *old_aead_recv; /* old key AEAD cipher */
> + char old_iv[TLS_MAX_IV_SIZE + TLS_MAX_SALT_SIZE]; /* old key IV */
> + char old_rec_seq[TLS_MAX_REC_SEQ_SIZE]; /* old key TLS record seq */
> + u32 old_nic_boundary; /* TCP seq: NIC switched to next key */
I think it would be nice to group the new fields in those 3 structs
into embedded "untyped" structs:
struct tls_offload_context_rx {
/* existing fields */
struct {
struct crypto_aead *old_aead_recv; /* old key AEAD cipher */
char old_iv[TLS_MAX_IV_SIZE + TLS_MAX_SALT_SIZE]; /* old key IV */
char old_rec_seq[TLS_MAX_REC_SEQ_SIZE]; /* old key TLS record seq */
u32 old_nic_boundary; /* TCP seq: NIC switched to next key */
} rekey;
/* other fields */
};
and then remove the "rekey_" prefixes in
tls_offload_context_tx/tls_context.
[note to self: in the near future we should probably switch all those
comments to kdoc, and consider splitting the offload_contexts into
NIC-visible and internal-only chunks]
> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> index cd26873e9063..51f1cc783336 100644
> --- a/net/tls/tls_device.c
> +++ b/net/tls/tls_device.c
[...]
> @@ -159,6 +161,262 @@ static void delete_all_records(struct tls_offload_context_tx *offload_ctx)
> +static bool tls_has_unacked_records(struct tls_offload_context_tx *offload_ctx)
> +{
[...]
> +static int tls_device_init_rekey_sw(struct sock *sk,
> + struct tls_context *ctx,
> + struct tls_offload_context_tx *offload_ctx,
> + struct tls_crypto_info *new_crypto_info)
> +{
[...]
> +static int tls_device_start_rekey(struct sock *sk,
> + struct tls_context *ctx,
> + struct tls_offload_context_tx *offload_ctx,
> + struct tls_crypto_info *new_crypto_info)
> +{
[...]
> +static int tls_device_complete_rekey(struct sock *sk, struct tls_context *ctx)
> +{
For tls_device_complete_rekey(): refactor the bits that overlap with
tls_set_device_offload_initial() into a shared helper?
I think it'd be better to reshuffle all this so that
tls_has_unacked_records is just next to tls_tcp_clean_acked (because
they do something a bit similar, both deal with unacked records), and
move those 3 rekey functions next to tls_set_device_offload_rekey().
> @@ -980,13 +1269,144 @@ tls_device_reencrypt(struct sock *sk, struct tls_context *tls_ctx)
> return err;
> }
>
> +/*
> + * temporarily swap in the old key, run
> + * tls_device_reencrypt(), then restore the current key.
> + */
> +static int tls_old_key_reencrypt(struct sock *sk,
> + struct tls_offload_context_rx *ctx,
> + struct tls_sw_context_rx *sw_ctx,
> + struct tls_context *tls_ctx)
> +{
Why a separate helper (with a very confusing name), with then a
wrapper that does almost nothing else than call tls_old_key_reencrypt()?
[...]
> +static int tls_device_reencrypt_old_key(struct sock *sk,
> + struct tls_offload_context_rx *ctx,
> + struct tls_sw_context_rx *sw_ctx,
> + struct tls_context *tls_ctx)
> +{
[...]
> +void tls_device_rx_del_key(struct sock *sk, struct tls_context *ctx)
> +{
[...]
> +static int tls_device_dev_add(struct sock *sk, struct tls_context *tls_ctx,
> + struct net_device *netdev,
> + struct tls_crypto_info *crypto_info,
> + u32 cur_seq, bool is_rekey)
> +{
[...]
> +static void tls_device_deferred_dev_add(struct sock *sk,
> + struct tls_context *tls_ctx,
> + struct tls_offload_context_rx *ctx)
> +{
Same here, in terms of code organization, I'd move those 3
->tls_dev_{add,del} wrappers to the top of the file, so that the
reencrypt/decrypted/old_key function(s) stay grouped together.
> int tls_device_decrypted(struct sock *sk, struct tls_context *tls_ctx)
> {
[...]
> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> index fd04857fa0ab..ab701f166b57 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -371,6 +371,8 @@ static void tls_sk_proto_close(struct sock *sk, long timeout)
>
> if (ctx->tx_conf == TLS_SW)
> tls_sw_cancel_work_tx(ctx);
> + else if (ctx->tx_conf == TLS_HW && ctx->rekey_sw_ctx)
> + tls_sw_cancel_work_tx(ctx);
>
> lock_sock(sk);
> free_ctx = ctx->tx_conf != TLS_HW && ctx->rx_conf != TLS_HW;
> @@ -711,64 +713,68 @@ static int do_tls_setsockopt_conf(struct sock *sk, sockptr_t optval,
> }
>
> if (tx) {
> - if (update && ctx->tx_conf == TLS_HW) {
> - rc = -EOPNOTSUPP;
> - goto err_crypto_info;
> - }
> -
> - if (!update) {
> - rc = tls_set_device_offload(sk);
> - conf = TLS_HW;
> - if (!rc) {
> + rc = tls_set_device_offload(sk, update ? crypto_info : NULL);
> + conf = TLS_HW;
> + if (!rc) {
> + if (update) {
> + TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSTXREKEYOK);
tls_set_device_offload* could succeed without completing the rekey, so
we would increment this counter, and then the async rekey completion
could fail. In that case, we will end up with 2 stats increases for a
single rekey.
Probably better to add a pair of REKEYINPROGRESS counters (similar to
the CURR* counters) that get incremented if we start a rekey but can't
finish it immediately, and decremented when we complete the rekey (and
also increment the corresponding REKEYOK/REKEYFAIL counter at that
time). If we're able to complete the rekey before returning from
do_tls_setsockopt_conf, we would increment the REKEYOK/REKEYFAIL
immediately.
This would avoid the inconsistent number of stats increases, and allow
users to see that a rekey is in progress.
> + } else {
> TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSTXDEVICE);
> TLS_INC_STATS(sock_net(sk), LINUX_MIB_TLSCURRTXDEVICE);
> - goto out;
> }
> - }
> -
> - rc = tls_set_sw_offload(sk, 1, update ? crypto_info : NULL);
> - if (rc)
> + } else if (update && ctx->tx_conf == TLS_HW) {
> + /* HW rekey failed - return the actual error.
> + * Cannot fall back to SW for an existing HW connection.
> + */
> goto err_crypto_info;
In this case, the REKEYHWFAIL will not always be incremented (for
example when tls_device_start_rekey fails, or tls_sw_ctx_init on the
RX side).
[...]
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 1412b3dcce4c..fc60e8c0f24c 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
[...]
> @@ -1802,6 +1801,8 @@ static int tls_check_pending_rekey(struct sock *sk, struct tls_context *ctx,
> if (hs_type == TLS_HANDSHAKE_KEYUPDATE) {
> struct tls_sw_context_rx *rx_ctx = ctx->priv_ctx_rx;
>
> + /* Stop NIC from XOR-ing post-KU records with the retired key */
I think "tls_device_rx_del_key" tells us everything we need to know in
this context, and "XOR-ing post-KU records" is just noise here. Please
remove this comment.
> @@ -2714,11 +2758,7 @@ static struct tls_sw_context_tx *init_ctx_tx(struct tls_context *ctx, struct soc
> sw_ctx_tx = ctx->priv_ctx_tx;
> }
>
> - crypto_init_wait(&sw_ctx_tx->async_wait);
> - atomic_set(&sw_ctx_tx->encrypt_pending, 1);
> - INIT_LIST_HEAD(&sw_ctx_tx->tx_list);
> - INIT_DELAYED_WORK(&sw_ctx_tx->tx_work.work, tx_work_handler);
> - sw_ctx_tx->tx_work.sk = sk;
> + tls_sw_ctx_tx_init(sk, sw_ctx_tx);
>
> return sw_ctx_tx;
> }
> @@ -2861,11 +2901,9 @@ int tls_sw_ctx_init(struct sock *sk, int tx,
> goto free_aead;
> }
>
> - if (!new_crypto_info) {
> - rc = crypto_aead_setauthsize(*aead, prot->tag_size);
> - if (rc)
> - goto free_aead;
> - }
> + rc = crypto_aead_setauthsize(*aead, prot->tag_size);
> + if (rc)
> + goto free_aead;
If you're going to do this (I'm not sure why we want this), fix up the
comment just above crypto_aead_setkey (8 lines above this) that says
"setkey is the last operation that could fail during a rekey", or do
the change in a way that setkey is still the last op that can fail.
--
Sabrina
next prev parent reply other threads:[~2026-05-06 11:30 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-29 18:10 [PATCH net-next v13 0/6] tls: Add TLS 1.3 hardware offload support Rishikesh Jethwani
2026-04-29 18:10 ` [PATCH v13 1/6] net: tls: reject TLS 1.3 offload in chcr_ktls and nfp drivers Rishikesh Jethwani
2026-04-29 18:10 ` [PATCH v13 2/6] net/mlx5e: add TLS 1.3 hardware offload support Rishikesh Jethwani
2026-04-29 18:10 ` [PATCH v13 3/6] tls: " Rishikesh Jethwani
2026-04-29 18:10 ` [PATCH v13 4/6] tls: split tls_set_sw_offload into init and finalize stages Rishikesh Jethwani
2026-04-29 18:10 ` [PATCH v13 5/6] tls: add hardware offload key update support Rishikesh Jethwani
2026-05-05 8:40 ` Paolo Abeni
2026-05-05 8:41 ` Paolo Abeni
2026-05-06 10:37 ` Sabrina Dubroca
2026-05-06 11:30 ` Sabrina Dubroca [this message]
2026-04-29 18:10 ` [PATCH v13 6/6] selftests: net: add TLS hardware offload test Rishikesh Jethwani
2026-05-06 17:07 ` Sabrina Dubroca
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=afsmVLSn2tA7UP9W@krikkit \
--to=sd@queasysnail.net \
--cc=borisp@nvidia.com \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=john.fastabend@gmail.com \
--cc=kuba@kernel.org \
--cc=leon@kernel.org \
--cc=mbloch@nvidia.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=rjethwani@purestorage.com \
--cc=saeedm@nvidia.com \
--cc=tariqt@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox