From: Jakub Kicinski <kuba@kernel.org>
To: michael.chan@broadcom.com
Cc: Jakub Kicinski <kuba@kernel.org>,
davem@davemloft.net, netdev@vger.kernel.org, edumazet@google.com,
pabeni@redhat.com, andrew+netdev@lunn.ch,
pavan.chebbi@broadcom.com, andrew.gospodarek@broadcom.com
Subject: Re: [PATCH net-next v2 12/15] bnxt_en: Support kTLS TX offload by implementing .tls_dev_add/del()
Date: Fri, 15 May 2026 18:10:22 -0700
Message-ID: <20260516011022.1857741-1-kuba@kernel.org>
In-Reply-To: <20260512212105.3488258-13-michael.chan@broadcom.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
bnxt_en: Support kTLS TX offload by implementing .tls_dev_add/del()
This patch adds the basic infrastructure for kTLS TX offload in bnxt_en
by allocating and freeing kTLS context IDs (KIDs) via MPC commands, and
introduces bnxt_ktls_init() that wires bnxt_ktls_ops onto the netdev for
later use.
A few questions follow inline.
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 74935b119569..e26980814fe7 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
[ ... ]
> @@ -13491,6 +13493,7 @@ static int bnxt_close(struct net_device *dev)
> {
> struct bnxt *bp = netdev_priv(dev);
>
> + bnxt_crypto_del_all(bp);
> bnxt_close_nic(bp, true, true);
> bnxt_hwrm_shutdown_link(bp);
> bnxt_hwrm_if_change(bp, false);
[High]
Is the placement of bnxt_crypto_del_all() before bnxt_close_nic() safe
against concurrent tls_dev_add/tls_dev_del?
At this point BNXT_STATE_OPEN is still set and the pending counter has not
been drained yet (that happens inside __bnxt_close_nic()). The net/tls
core only holds device_offload_lock for read on the add path and does not
synchronize with .ndo_stop, so bnxt_ktls_dev_add() can run in parallel
with bnxt_crypto_del_all() and call into bnxt_key_ctx_alloc_one(), which
modifies kctx->list with list_add_tail_rcu() under kctx->lock.
bnxt_crypto_del_all() walks the same list with plain list_for_each_entry()
(no rcu_read_lock, no kctx->lock), while the alloc/free helpers explicitly
use list_*_rcu primitives:
> +void bnxt_crypto_del_all(struct bnxt *bp)
> +{
> + struct bnxt_crypto_info *crypto = bp->crypto_info;
> + struct bnxt_kid_info *kid;
> + struct bnxt_kctx *kctx;
> + int i;
> +
> + if (!crypto)
> + return;
> +
> + /* Shutting down, no need to protect the lists. */
> + for (i = 0; i < BNXT_MAX_CRYPTO_KEY_TYPE; i++) {
> + kctx = &crypto->kctx[i];
> + list_for_each_entry(kid, &kctx->list, list)
> + bnxt_crypto_del_all_kids(bp, kid);
> + kctx->epoch++;
> + }
> +}
The "Shutting down, no need to protect the lists" comment seems to assume
no concurrent mutation, but with this call site preceding the OPEN-flag
clear and pending-drain, can a concurrent tls_dev_add insert a new kid
mid-traversal? The same applies to the call from bnxt_fw_reset_close()
below.
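If that race is real, one (untested) way to keep the unsynchronized walk
would be to quiesce callers first, reusing the pending/OPEN gate that
bnxt_ktls_dev_add() itself checks. A hypothetical ordering inside
bnxt_close(), assuming the MPC rings must still be up when the deletes
are issued:

	/* sketch: clear OPEN and drain pending *before* the list walk,
	 * so tls_dev_add can no longer insert; the barrier pairs with
	 * smp_mb__after_atomic() in bnxt_ktls_dev_add()
	 */
	clear_bit(BNXT_STATE_OPEN, &bp->state);
	smp_mb__after_atomic();
	while (atomic_read(&bp->ktls_info->pending))
		msleep(1);		/* wait out in-flight add/del callbacks */
	bnxt_crypto_del_all(bp);	/* plain list_for_each_entry() is now safe */
	bnxt_close_nic(bp, true, true);

Whether clearing BNXT_STATE_OPEN this early is acceptable to the rest of
the close path is a question for you.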
A second concern is in bnxt_crypto_del_all_kids():
> +static void bnxt_crypto_del_all_kids(struct bnxt *bp, struct bnxt_kid_info *kid)
> +{
> + int i, rc;
> +
> + for (i = 0; i < kid->count; i++) {
> + if (!test_bit(i, kid->ids)) {
> + rc = bnxt_crypto_del(bp, kid->type, kid->kind,
> + kid->start_id + i);
> + if (!rc)
> + set_bit(i, kid->ids);
> + }
> + }
> +}
The sequence test_bit(i) -> bnxt_crypto_del() (which can sleep inside
bnxt_xmit_crypto_cmd()) -> set_bit(i) is not atomic. If a concurrent
bnxt_free_one_kctx() does set_bit(i) and a subsequent bnxt_alloc_one_kctx()
reissues the same KID via test_and_clear_bit(i) during that sleep, can the
trailing set_bit() in this loop then mark an in-use slot as free, so the
next allocation hands the same KID out to a second TLS socket?
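If so, an atomic claim of the slot would at least close the test/set
window. Untested sketch -- note it marks the slot free before the HW
delete completes, so it only helps if allocation is already quiesced by
the time this runs:

	for (i = 0; i < kid->count; i++) {
		/* atomically claim: returns true if the bit was already
		 * set, i.e. the slot is free and nothing needs deleting
		 */
		if (test_and_set_bit(i, kid->ids))
			continue;
		if (bnxt_crypto_del(bp, kid->type, kid->kind,
				    kid->start_id + i))
			clear_bit(i, kid->ids);	/* failed, keep slot busy */
	}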
> @@ -14567,6 +14570,7 @@ static void bnxt_fw_reset_close(struct bnxt *bp)
> bp->fw_reset_min_dsecs = 0;
> bnxt_fw_fatal_close(bp);
> }
> + bnxt_crypto_del_all(bp);
> __bnxt_close_nic(bp, true, false);
> bnxt_vf_reps_free(bp);
> bnxt_clear_int_mode(bp);
[ ... ]
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ktls.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ktls.c
> new file mode 100644
[ ... ]
> +static int bnxt_crypto_add(struct bnxt *bp, enum tls_offload_ctx_dir direction,
> + struct tls_crypto_info *crypto_info, u32 tcp_seq_no,
> + u32 kid)
> +{
> + struct bnxt_tx_ring_info *txr;
> + struct ce_add_cmd cmd = {0};
> + u32 data;
> +
> + if (direction == TLS_OFFLOAD_CTX_DIR_TX) {
> + txr = bnxt_select_mpc_ring(bp, BNXT_MPC_TCE_TYPE);
> + cmd.ctx_kind = CE_ADD_CMD_CTX_KIND_CK_TX;
> + } else {
> + return -EOPNOTSUPP;
> + }
> + if (!txr)
> + return -ENODEV;
> +
> + data = CE_ADD_CMD_OPCODE_ADD | (BNXT_KID_HW(kid) << CE_ADD_CMD_KID_SFT);
> + switch (crypto_info->cipher_type) {
> + case TLS_CIPHER_AES_GCM_128: {
> + struct tls12_crypto_info_aes_gcm_128 *aes;
> +
> + aes = (void *)crypto_info;
> + data |= CE_ADD_CMD_ALGORITHM_AES_GCM_128;
> + if (crypto_info->version == TLS_1_3_VERSION)
> + data |= CE_ADD_CMD_VERSION_TLS1_3;
> + memcpy(&cmd.session_key, aes->key, sizeof(aes->key));
> + memcpy(&cmd.salt, aes->salt, sizeof(aes->salt));
> + memcpy(&cmd.addl_iv, aes->iv, sizeof(aes->iv));
> + bnxt_copy_tls_mp_data(cmd.record_seq_num, aes->rec_seq,
> + sizeof(aes->rec_seq));
> + break;
> + }
> + case TLS_CIPHER_AES_GCM_256: {
> + struct tls12_crypto_info_aes_gcm_256 *aes;
> +
> + aes = (void *)crypto_info;
> + data |= CE_ADD_CMD_ALGORITHM_AES_GCM_256;
> + if (crypto_info->version == TLS_1_3_VERSION)
> + data |= CE_ADD_CMD_VERSION_TLS1_3;
> + memcpy(&cmd.session_key, aes->key, sizeof(aes->key));
> + memcpy(&cmd.salt, aes->salt, sizeof(aes->salt));
> + memcpy(&cmd.addl_iv, aes->iv, sizeof(aes->iv));
> + bnxt_copy_tls_mp_data(cmd.record_seq_num, aes->rec_seq,
> + sizeof(aes->rec_seq));
> + break;
> + }
> + default:
> + return -EOPNOTSUPP;
> + }
> + cmd.ver_algo_kid_opcode = cpu_to_le32(data);
> + cmd.pkt_tcp_seq_num = cpu_to_le32(tcp_seq_no);
> + cmd.tls_header_tcp_seq_num = cmd.pkt_tcp_seq_num;
> + return bnxt_xmit_crypto_cmd(bp, txr, &cmd, sizeof(cmd),
> + BNXT_MPC_TMO_MSECS);
> +}
[High]
The cmd buffer here is on the kernel stack and ends up holding a
cleartext copy of the AES-GCM session key in cmd.session_key. After
bnxt_xmit_crypto_cmd() returns, the function returns without zeroing the
buffer, so the key bytes remain in the freed stack frame until they are
overwritten.
Should this end with a memzero_explicit(&cmd, sizeof(cmd)) before
returning, similar to how mlx5e_ktls_add_tx() clears its key material?
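Concretely, something like this at the tail of bnxt_crypto_add()
(untested):

	rc = bnxt_xmit_crypto_cmd(bp, txr, &cmd, sizeof(cmd),
				  BNXT_MPC_TMO_MSECS);
	/* wipe the cleartext key material; a plain memset() on a dying
	 * stack frame may be elided by the compiler
	 */
	memzero_explicit(&cmd, sizeof(cmd));
	return rc;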
[ ... ]
> +static int bnxt_ktls_dev_add(struct net_device *dev, struct sock *sk,
> + enum tls_offload_ctx_dir direction,
> + struct tls_crypto_info *crypto_info,
> + u32 start_offload_tcp_sn)
> +{
> + struct bnxt_ktls_offload_ctx_tx *kctx_tx;
> + struct bnxt *bp = netdev_priv(dev);
> + struct bnxt_crypto_info *crypto;
> + struct tls_context *tls_ctx;
> + struct bnxt_tls_info *ktls;
> + struct bnxt_kctx *kctx;
> + u32 kid;
> + int rc;
> +
> + BUILD_BUG_ON(sizeof(struct bnxt_ktls_offload_ctx_tx) >
> + TLS_DRIVER_STATE_SIZE_TX);
> +
> + ktls = bp->ktls_info;
> + if (direction == TLS_OFFLOAD_CTX_DIR_RX)
> + return -EOPNOTSUPP;
> +
> + if (!bnxt_ktls_cipher_supported(bp, crypto_info)) {
> + atomic64_inc(&ktls->counters[BNXT_KTLS_ERR_INVALID_CIPHER]);
> + return -EOPNOTSUPP;
> + }
> +
> + atomic_inc(&ktls->pending);
> + /* Make sure bnxt_close_nic() sees pending before we check the
> + * BNXT_STATE_OPEN flag.
> + */
> + smp_mb__after_atomic();
> + if (!test_bit(BNXT_STATE_OPEN, &bp->state)) {
> + atomic64_inc(&ktls->counters[BNXT_KTLS_ERR_STATE_NOT_OPEN]);
> + rc = -ENODEV;
> + goto exit;
> + }
[Medium, Low]
Two related questions about the ktls dereferences in this function.
bp->ktls_info is read once into ktls and then immediately dereferenced
without a NULL guard. bnxt_ktls_init() (in a later patch in this series)
only assigns dev->tlsdev_ops when bp->ktls_info is non-NULL, but does the
driver ever clear dev->tlsdev_ops on teardown? bnxt_free_ktls_info()
sets bp->ktls_info = NULL and is reachable from the FW-reset path via
__bnxt_hwrm_func_qcaps() -> bnxt_alloc_crypto_info() -> bnxt_free_*
without unregistering the netdev. Can a tls_dev_add arriving in that
window dereference NULL?
Separately, the comment on smp_mb__after_atomic() treats the pending
counter as the gate that orders ktls accesses against bnxt_close_nic(),
yet the BNXT_KTLS_ERR_INVALID_CIPHER counter is incremented before
atomic_inc(&ktls->pending). Should that counter update be moved past
the pending barrier so it follows the same ordering as the other ktls
dereferences in this function?
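i.e. something along these lines, with the cipher check moved inside the
pending window (sketch):

	atomic_inc(&ktls->pending);
	/* Make sure bnxt_close_nic() sees pending before we check the
	 * BNXT_STATE_OPEN flag.
	 */
	smp_mb__after_atomic();
	if (!test_bit(BNXT_STATE_OPEN, &bp->state)) {
		atomic64_inc(&ktls->counters[BNXT_KTLS_ERR_STATE_NOT_OPEN]);
		rc = -ENODEV;
		goto exit;
	}
	if (!bnxt_ktls_cipher_supported(bp, crypto_info)) {
		atomic64_inc(&ktls->counters[BNXT_KTLS_ERR_INVALID_CIPHER]);
		rc = -EOPNOTSUPP;
		goto exit;
	}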
[ ... ]
> +#define KTLS_RETRY_MAX 100
> +
> +static void bnxt_ktls_dev_del(struct net_device *dev,
> + struct tls_context *tls_ctx,
> + enum tls_offload_ctx_dir direction)
> +{
> + struct bnxt_ktls_offload_ctx_tx *kctx_tx;
> + struct bnxt *bp = netdev_priv(dev);
> + struct bnxt_crypto_info *crypto;
> + struct bnxt_tls_info *ktls;
> + struct bnxt_kctx *kctx;
> + int retry_cnt = 0;
> + u8 kind;
> + u32 kid;
> +
> + ktls = bp->ktls_info;
> +retry:
> + while (!test_bit(BNXT_STATE_OPEN, &bp->state)) {
> + if (!netif_running(dev))
> + return;
> + /* Prevent infinite loop if device is down but netif_running()
> + * returns true.
> + */
> + if (retry_cnt > KTLS_RETRY_MAX) {
> + atomic64_inc(&ktls->counters[BNXT_KTLS_ERR_RETRY_EXCEEDED]);
> + netdev_warn(dev, "%s retry max %d exceeded, state %lx\n",
> + __func__, retry_cnt, bp->state);
> + return;
> + }
> + retry_cnt++;
> + msleep(100);
> + }
[High]
This loop can sleep up to 100 * 100 ms = ~10 seconds while holding
device_offload_lock from the net/tls caller. tls_device_down() takes
that lock for write and walks every offloaded TLS context on the netdev,
so a FW-reset that overlaps teardown of N TLS sockets could multiply the
delay across all of them and stall every other tls_set_device_offload()
on the system for that period.
Could this be coupled to the FW-reset completion path via wait_event()
or a completion instead of an open-coded msleep retry on a state bit?
The fixed retry budget also means that if the device stays out of
BNXT_STATE_OPEN longer than ~10 seconds, the function returns having
done nothing -- the HW key context stays programmed and the in-driver
KID slot is leaked until driver unload. Is that intended, and where is
the leaked slot reclaimed in that case?
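For instance, with a hypothetical wait queue -- bp->ktls_wq below is
invented; it would be woken from the FW-reset completion path and from
ndo_stop:

	/* same overall budget as the retry loop, but no polling */
	if (!wait_event_timeout(bp->ktls_wq,
				test_bit(BNXT_STATE_OPEN, &bp->state) ||
				!netif_running(dev),
				msecs_to_jiffies(KTLS_RETRY_MAX * 100)))
		return;	/* timed out; same give-up semantics as today */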
[ ... ]
> + crypto = bp->crypto_info;
> + kctx_tx = __tls_driver_ctx(tls_ctx, TLS_OFFLOAD_CTX_DIR_TX);
> + kid = kctx_tx->kid;
> + kctx = &crypto->kctx[BNXT_TX_CRYPTO_KEY_TYPE];
> + kind = BNXT_CTX_KIND_CK_TX;
> + atomic64_inc(&ktls->counters[BNXT_KTLS_TX_DEL]);
> + if (bnxt_kid_valid(kctx, kid) &&
> + !bnxt_crypto_del(bp, kctx->type, kind, kid))
> + bnxt_free_one_kctx(kctx, kid);
> +
> + atomic_dec(&ktls->pending);
> +}
[Medium]
bnxt_free_one_kctx() is only called when bnxt_crypto_del() returns 0.
bnxt_crypto_del() can return -ENODEV when bnxt_select_mpc_ring() returns
NULL, or propagate any nonzero return from bnxt_xmit_crypto_cmd()
(timeout, MPC ring busy, completion error, allocation failure).
In all of those cases the bit corresponding to this KID in kid->ids is
never cleared back to "free", so the slot stays consumed. The close-time
fallback bnxt_crypto_del_all_kids() also only sets the bit on rc == 0,
so it does not reclaim slots that fail repeatedly either.
Can the KID pool slowly drain over runtime ifdown/ifup cycles with
intermittent MPC errors? Should the slot be released on the local side
even when the HW delete command fails?
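One possible shape (sketch -- it assumes the FW can later overwrite or
re-add a KID whose stale HW context was never deleted, which needs
confirmation):

	if (bnxt_kid_valid(kctx, kid)) {
		rc = bnxt_crypto_del(bp, kctx->type, kind, kid);
		if (rc)
			netdev_warn(dev, "KID 0x%x HW delete failed: %d\n",
				    kid, rc);
		/* reclaim the local slot regardless, so the pool cannot
		 * drain across repeated MPC failures
		 */
		bnxt_free_one_kctx(kctx, kid);
	}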
[ ... ]