From: Jakub Kicinski <kuba@kernel.org>
To: michael.chan@broadcom.com
Cc: Jakub Kicinski <kuba@kernel.org>,
	davem@davemloft.net, netdev@vger.kernel.org, edumazet@google.com,
	pabeni@redhat.com, andrew+netdev@lunn.ch,
	pavan.chebbi@broadcom.com, andrew.gospodarek@broadcom.com
Subject: Re: [PATCH net-next v2 12/15] bnxt_en: Support kTLS TX offload by implementing .tls_dev_add/del()
Date: Fri, 15 May 2026 18:10:22 -0700	[thread overview]
Message-ID: <20260516011022.1857741-1-kuba@kernel.org> (raw)
In-Reply-To: <20260512212105.3488258-13-michael.chan@broadcom.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
bnxt_en: Support kTLS TX offload by implementing .tls_dev_add/del()

This patch adds the basic infrastructure for kTLS TX offload in bnxt_en
by allocating and freeing kTLS context IDs (KIDs) via MPC commands, and
introduces bnxt_ktls_init() that wires bnxt_ktls_ops onto the netdev for
later use.

A few questions follow inline.

> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 74935b119569..e26980814fe7 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c

[ ... ]

> @@ -13491,6 +13493,7 @@ static int bnxt_close(struct net_device *dev)
>  {
>  	struct bnxt *bp = netdev_priv(dev);
>  
> +	bnxt_crypto_del_all(bp);
>  	bnxt_close_nic(bp, true, true);
>  	bnxt_hwrm_shutdown_link(bp);
>  	bnxt_hwrm_if_change(bp, false);

[High]
Is the placement of bnxt_crypto_del_all() before bnxt_close_nic() safe
against concurrent tls_dev_add/tls_dev_del?

At this point BNXT_STATE_OPEN is still set and the pending counter has not
been drained yet (that happens inside __bnxt_close_nic()). The net/tls
core only holds device_offload_lock for read on the add path and does not
synchronize with .ndo_stop, so bnxt_ktls_dev_add() can run in parallel
with bnxt_crypto_del_all() and call into bnxt_key_ctx_alloc_one(), which
modifies kctx->list with list_add_tail_rcu() under kctx->lock.

bnxt_crypto_del_all() walks the same list with plain list_for_each_entry()
(no rcu_read_lock, no kctx->lock), while the alloc/free helpers explicitly
use list_*_rcu primitives:

> +void bnxt_crypto_del_all(struct bnxt *bp)
> +{
> +	struct bnxt_crypto_info *crypto = bp->crypto_info;
> +	struct bnxt_kid_info *kid;
> +	struct bnxt_kctx *kctx;
> +	int i;
> +
> +	if (!crypto)
> +		return;
> +
> +	/* Shutting down, no need to protect the lists. */
> +	for (i = 0; i < BNXT_MAX_CRYPTO_KEY_TYPE; i++) {
> +		kctx = &crypto->kctx[i];
> +		list_for_each_entry(kid, &kctx->list, list)
> +			bnxt_crypto_del_all_kids(bp, kid);
> +		kctx->epoch++;
> +	}
> +}

The "Shutting down, no need to protect the lists" comment seems to assume
no concurrent mutation, but with this call site preceding the OPEN-flag
clear and pending-drain, can a concurrent tls_dev_add insert a new kid
mid-traversal? The same applies to the call from bnxt_fw_reset_close()
below.

A second concern is in bnxt_crypto_del_all_kids():

> +static void bnxt_crypto_del_all_kids(struct bnxt *bp, struct bnxt_kid_info *kid)
> +{
> +	int i, rc;
> +
> +	for (i = 0; i < kid->count; i++) {
> +		if (!test_bit(i, kid->ids)) {
> +			rc = bnxt_crypto_del(bp, kid->type, kid->kind,
> +					     kid->start_id + i);
> +			if (!rc)
> +				set_bit(i, kid->ids);
> +		}
> +	}
> +}

The sequence test_bit(i) -> bnxt_crypto_del() (which can sleep inside
bnxt_xmit_crypto_cmd()) -> set_bit(i) is not atomic. If a concurrent
bnxt_free_one_kctx() does set_bit(i) and a subsequent bnxt_alloc_one_kctx()
reissues the same KID via test_and_clear_bit(i) during that sleep, can the
trailing set_bit() in this loop then mark an in-use slot as free, so the
next allocation hands the same KID out to a second TLS socket?

> @@ -14567,6 +14570,7 @@ static void bnxt_fw_reset_close(struct bnxt *bp)
>  			bp->fw_reset_min_dsecs = 0;
>  		bnxt_fw_fatal_close(bp);
>  	}
> +	bnxt_crypto_del_all(bp);
>  	__bnxt_close_nic(bp, true, false);
>  	bnxt_vf_reps_free(bp);
>  	bnxt_clear_int_mode(bp);

[ ... ]

> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ktls.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ktls.c
> new file mode 100644

[ ... ]

> +static int bnxt_crypto_add(struct bnxt *bp, enum tls_offload_ctx_dir direction,
> +			   struct tls_crypto_info *crypto_info, u32 tcp_seq_no,
> +			   u32 kid)
> +{
> +	struct bnxt_tx_ring_info *txr;
> +	struct ce_add_cmd cmd = {0};
> +	u32 data;
> +
> +	if (direction == TLS_OFFLOAD_CTX_DIR_TX) {
> +		txr = bnxt_select_mpc_ring(bp, BNXT_MPC_TCE_TYPE);
> +		cmd.ctx_kind = CE_ADD_CMD_CTX_KIND_CK_TX;
> +	} else {
> +		return -EOPNOTSUPP;
> +	}
> +	if (!txr)
> +		return -ENODEV;
> +
> +	data = CE_ADD_CMD_OPCODE_ADD | (BNXT_KID_HW(kid) << CE_ADD_CMD_KID_SFT);
> +	switch (crypto_info->cipher_type) {
> +	case TLS_CIPHER_AES_GCM_128: {
> +		struct tls12_crypto_info_aes_gcm_128 *aes;
> +
> +		aes = (void *)crypto_info;
> +		data |= CE_ADD_CMD_ALGORITHM_AES_GCM_128;
> +		if (crypto_info->version == TLS_1_3_VERSION)
> +			data |= CE_ADD_CMD_VERSION_TLS1_3;
> +		memcpy(&cmd.session_key, aes->key, sizeof(aes->key));
> +		memcpy(&cmd.salt, aes->salt, sizeof(aes->salt));
> +		memcpy(&cmd.addl_iv, aes->iv, sizeof(aes->iv));
> +		bnxt_copy_tls_mp_data(cmd.record_seq_num, aes->rec_seq,
> +				      sizeof(aes->rec_seq));
> +		break;
> +	}
> +	case TLS_CIPHER_AES_GCM_256: {
> +		struct tls12_crypto_info_aes_gcm_256 *aes;
> +
> +		aes = (void *)crypto_info;
> +		data |= CE_ADD_CMD_ALGORITHM_AES_GCM_256;
> +		if (crypto_info->version == TLS_1_3_VERSION)
> +			data |= CE_ADD_CMD_VERSION_TLS1_3;
> +		memcpy(&cmd.session_key, aes->key, sizeof(aes->key));
> +		memcpy(&cmd.salt, aes->salt, sizeof(aes->salt));
> +		memcpy(&cmd.addl_iv, aes->iv, sizeof(aes->iv));
> +		bnxt_copy_tls_mp_data(cmd.record_seq_num, aes->rec_seq,
> +				      sizeof(aes->rec_seq));
> +		break;
> +	}
> +	default:
> +		return -EOPNOTSUPP;
> +	}
> +	cmd.ver_algo_kid_opcode = cpu_to_le32(data);
> +	cmd.pkt_tcp_seq_num = cpu_to_le32(tcp_seq_no);
> +	cmd.tls_header_tcp_seq_num = cmd.pkt_tcp_seq_num;
> +	return bnxt_xmit_crypto_cmd(bp, txr, &cmd, sizeof(cmd),
> +				    BNXT_MPC_TMO_MSECS);
> +}

[High]
The cmd buffer here is on the kernel stack and ends up holding a
cleartext copy of the AES-GCM session key in cmd.session_key. After
bnxt_xmit_crypto_cmd() returns, the function returns without zeroing the
buffer, so the key bytes remain in the freed stack frame until they are
overwritten.

Should this end with a memzero_explicit(&cmd, sizeof(cmd)) before
returning, similar to how mlx5e_ktls_add_tx() clears its key material?

[ ... ]

> +static int bnxt_ktls_dev_add(struct net_device *dev, struct sock *sk,
> +			     enum tls_offload_ctx_dir direction,
> +			     struct tls_crypto_info *crypto_info,
> +			     u32 start_offload_tcp_sn)
> +{
> +	struct bnxt_ktls_offload_ctx_tx *kctx_tx;
> +	struct bnxt *bp = netdev_priv(dev);
> +	struct bnxt_crypto_info *crypto;
> +	struct tls_context *tls_ctx;
> +	struct bnxt_tls_info *ktls;
> +	struct bnxt_kctx *kctx;
> +	u32 kid;
> +	int rc;
> +
> +	BUILD_BUG_ON(sizeof(struct bnxt_ktls_offload_ctx_tx) >
> +		     TLS_DRIVER_STATE_SIZE_TX);
> +
> +	ktls = bp->ktls_info;
> +	if (direction == TLS_OFFLOAD_CTX_DIR_RX)
> +		return -EOPNOTSUPP;
> +
> +	if (!bnxt_ktls_cipher_supported(bp, crypto_info)) {
> +		atomic64_inc(&ktls->counters[BNXT_KTLS_ERR_INVALID_CIPHER]);
> +		return -EOPNOTSUPP;
> +	}
> +
> +	atomic_inc(&ktls->pending);
> +	/* Make sure bnxt_close_nic() sees pending before we check the
> +	 * BNXT_STATE_OPEN flag.
> +	 */
> +	smp_mb__after_atomic();
> +	if (!test_bit(BNXT_STATE_OPEN, &bp->state)) {
> +		atomic64_inc(&ktls->counters[BNXT_KTLS_ERR_STATE_NOT_OPEN]);
> +		rc = -ENODEV;
> +		goto exit;
> +	}

[Medium, Low]
Two related questions about the ktls dereferences in this function.

bp->ktls_info is read once into ktls and then immediately dereferenced
without a NULL guard. bnxt_ktls_init() only assigns dev->tlsdev_ops
when bp->ktls_info is non-NULL, but does the driver ever clear
dev->tlsdev_ops on teardown? bnxt_free_ktls_info() sets
bp->ktls_info = NULL and is reachable from the FW-reset path via
__bnxt_hwrm_func_qcaps() -> bnxt_alloc_crypto_info() -> bnxt_free_*
without unregistering the netdev. Can a tls_dev_add arriving in that
window dereference NULL?

Separately, the pending counter plus the smp_mb__after_atomic()
barrier appear to be what gate ktls accesses against teardown, yet the
BNXT_KTLS_ERR_INVALID_CIPHER counter dereferences ktls before
atomic_inc(&ktls->pending). Should that counter update be moved past
the pending increment so it follows the same ordering as the other
ktls dereferences in this function?

[ ... ]

> +#define KTLS_RETRY_MAX	100
> +
> +static void bnxt_ktls_dev_del(struct net_device *dev,
> +			      struct tls_context *tls_ctx,
> +			      enum tls_offload_ctx_dir direction)
> +{
> +	struct bnxt_ktls_offload_ctx_tx *kctx_tx;
> +	struct bnxt *bp = netdev_priv(dev);
> +	struct bnxt_crypto_info *crypto;
> +	struct bnxt_tls_info *ktls;
> +	struct bnxt_kctx *kctx;
> +	int retry_cnt = 0;
> +	u8 kind;
> +	u32 kid;
> +
> +	ktls = bp->ktls_info;
> +retry:
> +	while (!test_bit(BNXT_STATE_OPEN, &bp->state)) {
> +		if (!netif_running(dev))
> +			return;
> +		/* Prevent infinite loop if device is down but netif_running()
> +		 * returns true.
> +		 */
> +		if (retry_cnt > KTLS_RETRY_MAX) {
> +			atomic64_inc(&ktls->counters[BNXT_KTLS_ERR_RETRY_EXCEEDED]);
> +			netdev_warn(dev, "%s retry max %d exceeded, state %lx\n",
> +				    __func__, retry_cnt, bp->state);
> +			return;
> +		}
> +		retry_cnt++;
> +		msleep(100);
> +	}

[High]
This loop can sleep up to 100 * 100 ms = ~10 seconds while holding
device_offload_lock from the net/tls caller. tls_device_down() takes
that lock for write and walks every offloaded TLS context on the netdev,
so a FW-reset that overlaps teardown of N TLS sockets could multiply the
delay across all of them and stall every other tls_set_device_offload()
on the system for that period.

Could this be coupled to the FW-reset completion path via wait_event()
or a completion instead of an open-coded msleep retry on a state bit?

The fixed retry budget also means that if the device stays out of
BNXT_STATE_OPEN longer than ~10 seconds, the function returns having
done nothing -- the HW key context stays programmed and the in-driver
KID slot is leaked until driver unload. Is that intended, and where is
the leaked slot reclaimed in that case?

[ ... ]

> +	crypto = bp->crypto_info;
> +	kctx_tx = __tls_driver_ctx(tls_ctx, TLS_OFFLOAD_CTX_DIR_TX);
> +	kid = kctx_tx->kid;
> +	kctx = &crypto->kctx[BNXT_TX_CRYPTO_KEY_TYPE];
> +	kind = BNXT_CTX_KIND_CK_TX;
> +	atomic64_inc(&ktls->counters[BNXT_KTLS_TX_DEL]);
> +	if (bnxt_kid_valid(kctx, kid) &&
> +	    !bnxt_crypto_del(bp, kctx->type, kind, kid))
> +		bnxt_free_one_kctx(kctx, kid);
> +
> +	atomic_dec(&ktls->pending);
> +}

[Medium]
bnxt_free_one_kctx() is only called when bnxt_crypto_del() returns 0.
bnxt_crypto_del() can return -ENODEV when bnxt_select_mpc_ring() returns
NULL, and any nonzero value from bnxt_xmit_crypto_cmd() (timeout, MPC
ring busy, completion error, allocation failure).

In all of those cases the bit corresponding to this KID in kid->ids is
never cleared back to "free", so the slot stays consumed. The close-time
fallback bnxt_crypto_del_all_kids() also only sets the bit on rc == 0,
so it does not reclaim slots that fail repeatedly either.

Can the KID pool slowly drain over runtime ifdown/ifup cycles with
intermittent MPC errors? Should the slot be released on the local side
even when the HW delete command fails?

[ ... ]
