From: Jakub Kicinski <kuba@kernel.org>
To: michael.chan@broadcom.com
Cc: Jakub Kicinski <kuba@kernel.org>,
	davem@davemloft.net, netdev@vger.kernel.org, edumazet@google.com,
	pabeni@redhat.com, andrew+netdev@lunn.ch,
	pavan.chebbi@broadcom.com, andrew.gospodarek@broadcom.com
Subject: Re: [PATCH net-next v2 11/15] bnxt_en: Add crypto MPC transmit/completion infrastructure
Date: Fri, 15 May 2026 18:10:20 -0700	[thread overview]
Message-ID: <20260516011020.1857710-1-kuba@kernel.org> (raw)
In-Reply-To: <20260512212105.3488258-12-michael.chan@broadcom.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
bnxt_en: Add crypto MPC transmit/completion infrastructure

This patch adds infrastructure to send crypto commands over the midpath
channels (MPCs).  bnxt_xmit_crypto_cmd() submits a command and sleeps,
with a timeout, waiting for its completion; on timeout the MPC ring is
reset.

> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_crypto.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_crypto.c
> index 4ea3e67be9f5..2cacce28676d 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_crypto.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_crypto.c

[ ... ]

> @@ -113,8 +130,13 @@ void bnxt_clear_crypto(struct bnxt *bp)
>   */
>  void bnxt_free_crypto_info(struct bnxt *bp)
>  {
> +	struct bnxt_crypto_info *crypto = bp->crypto_info;
> +
> +	if (!crypto)
> +		return;
>  	bnxt_clear_crypto(bp);
> -	kfree(bp->crypto_info);
> +	kmem_cache_destroy(crypto->mpc_cache);
> +	kfree(crypto);
>  	bp->crypto_info = NULL;
>  }

[High]
Is there a guarantee that no bnxt_xmit_crypto_cmd() caller is still in
flight when bnxt_free_crypto_info() runs?

bnxt_clear_crypto() does not appear to wait for callers currently sleeping
in wait_for_completion_timeout() or in the 200 ms poll inside
bnxt_crypto_timeout(), and there is no refcount or wait_event tying ctx
lifetime to cache destruction.

If a caller is mid-flight during teardown, can kmem_cache_destroy() trigger
the "Slab cache still has objects" warning, and can the surviving caller
later call kmem_cache_free() on a destroyed cache?
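One way to close this, as a rough sketch (the field and helper names
below are hypothetical, not from the patch): count in-flight commands
and have the free path drain the count before destroying the cache.

```c
/* Hypothetical fields in struct bnxt_crypto_info */
atomic_t		mpc_pending;
wait_queue_head_t	mpc_pending_wq;

/* bnxt_xmit_crypto_cmd(): bracket the submit/wait window */
atomic_inc(&crypto->mpc_pending);
/* ... alloc ctx, submit, wait_for_completion_timeout(), free ctx ... */
if (atomic_dec_and_test(&crypto->mpc_pending))
	wake_up(&crypto->mpc_pending_wq);

/* bnxt_free_crypto_info(): drain before destroying the cache */
wait_event(crypto->mpc_pending_wq,
	   !atomic_read(&crypto->mpc_pending));
kmem_cache_destroy(crypto->mpc_cache);
```

A completion-based drain or SRCU would do equally well; the point is
only that kmem_cache_destroy() must be ordered after the last
kmem_cache_free().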

> @@ -333,6 +355,90 @@ int bnxt_key_ctx_alloc_one(struct bnxt *bp, struct bnxt_kctx *kctx, u8 kind,
>  	return -EAGAIN;
>  }
>  
> +#define BNXT_CMD_CTX_RETRY_MAX	10
> +
> +static void bnxt_crypto_timeout(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
> +				struct bnxt_crypto_cmd_ctx *ctx)
> +{
> +	int i;
> +
> +	bnxt_mpc_timeout(bp, txr);
> +	/* Wait for the ctx to complete before proceeding */
> +	for (i = 0; i < BNXT_CMD_CTX_RETRY_MAX &&
> +	     !(ctx->status & BNXT_CMD_CTX_COMPLETED); i++)
> +		msleep(20);
> +	if (!(ctx->status & BNXT_CMD_CTX_COMPLETED))
> +		netdev_warn(bp->dev,
> +			    "Timed out waiting for cmd_ctx to complete for MPC ring %d\n",
> +			    txr->txq_index);
> +}

[Critical]
After this 200 ms poll the caller proceeds to kmem_cache_free(ctx)
unconditionally.  Are there paths where bnxt_mpc_timeout() does not
actually clean up ctx->handle from tx_mpc_buf_ring[] within 200 ms, leaving
the freed pointer reachable?

bnxt_mpc_timeout() returns -EINVAL when txr->tx_ring_struct.queue_id is
not BNXT_MPC_QUEUE_ID, and this rc is not checked here:

	int bnxt_mpc_timeout(struct bnxt *bp, struct bnxt_tx_ring_info *txr)
	{
		if (txr->tx_ring_struct.queue_id == BNXT_MPC_QUEUE_ID)
			return bnxt_mpc_ring_reset(bp, txr->txq_index);
		return -EINVAL;
	}

bnxt_mpc_ring_reset() also returns 0 immediately when bnxt_disable_mpc_ring()
loses the race because dev_state was already set:

	if (!bnxt_disable_mpc_ring(mpc, mpc_ring))
		return 0;

In both cases, no bnxt_clear_one_mpc_entries() runs, the loser's poll
exits after 200 ms, and the freed ctx is still pointed to from
tx_mpc_buf_ring[i].handle.  When NAPI later observes the completion (or
the winner's bnxt_clear_one_mpc_entries() iterates), bnxt_crypto_mpc_cmp()
will write ctx->status, memcpy into ctx->ce_cmp, and complete(&ctx->cmp)
on freed slab memory.

Should the timeout path arrange that the handle is removed (or the ctx
guaranteed completed) before kmem_cache_free()?
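A rough sketch of one shape that could take, assuming
bnxt_crypto_timeout() is changed to return a status (this signature
change is hypothetical):

```c
/* Sketch: report whether ctx is provably quiesced.  If the ring reset
 * did not run (rc != 0, or it lost the dev_state race) and the ctx
 * never completed, the caller must not kmem_cache_free() it --
 * leaking it, or parking it on a reap list, is the lesser evil.
 */
static int bnxt_crypto_timeout(struct bnxt *bp,
			       struct bnxt_tx_ring_info *txr,
			       struct bnxt_crypto_cmd_ctx *ctx)
{
	int i, rc;

	rc = bnxt_mpc_timeout(bp, txr);
	for (i = 0; i < BNXT_CMD_CTX_RETRY_MAX &&
	     !(READ_ONCE(ctx->status) & BNXT_CMD_CTX_COMPLETED); i++)
		msleep(20);
	if (rc || !(READ_ONCE(ctx->status) & BNXT_CMD_CTX_COMPLETED))
		return -EBUSY;	/* ctx may still be reachable */
	return 0;
}
```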

> +#define BNXT_XMIT_CRYPTO_RETRY_MAX	10
> +#define BNXT_XMIT_CRYPTO_MIN_TMO	100
> +#define BNXT_XMIT_CRYPTO_MAX_TMO	150
> +
> +int bnxt_xmit_crypto_cmd(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
> +			 void *cmd, unsigned int len, unsigned int tmo)
> +{

[ ... ]

> +	tmo_left = wait_for_completion_timeout(&ctx->cmp, msecs_to_jiffies(tmo));
> +	if (!tmo_left) {
> +		netdev_warn(bp->dev, "crypto MP cmd %08x timed out\n",
> +			    *((u32 *)cmd));
> +		bnxt_crypto_timeout(bp, txr, ctx);
> +		rc = -ETIMEDOUT;
> +		goto xmit_done;
> +	}

[Critical]
Even outside the timeout path, is the handshake between bnxt_crypto_mpc_cmp()
and the waiter safe?  bnxt_crypto_mpc_cmp() does:

	ctx = (void *)handle;
	ctx->status = BNXT_CMD_CTX_COMPLETED;
	...
	memcpy(&ctx->ce_cmp, cmpl[0].cmpl, len);
cmp_done:
	complete(&ctx->cmp);

The timeout poll loop above exits as soon as it observes
(ctx->status & BNXT_CMD_CTX_COMPLETED).  If the waiter happened to time
out at exactly the same time as a real completion, can the poll observe
status set by mpc_cmp(), then proceed to kmem_cache_free(ctx) while the
mpc_cmp() side is still inside complete(&ctx->cmp) and is touching
ctx->cmp.wait on another CPU?
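If the polling handshake is kept, one way to close that window,
sketched under the assumption (true in the quoted code) that
complete() is always called after the status store:

```c
/* Sketch: seeing COMPLETED does not mean complete() has finished,
 * since the flag is stored before complete(&ctx->cmp).  Re-waiting on
 * the completion takes ctx->cmp.wait.lock and so cannot return until
 * the completer has dropped it, making the subsequent
 * kmem_cache_free(ctx) safe.
 */
if (READ_ONCE(ctx->status) & BNXT_CMD_CTX_COMPLETED)
	wait_for_completion(&ctx->cmp);
```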

[ ... ]

> +xmit_done:
> +	if (rc) {

[ ... ]

> +	if (ctx)
> +		kmem_cache_free(crypto->mpc_cache, ctx);
> +	return rc;
> +}

> +void bnxt_crypto_mpc_cmp(struct bnxt *bp, u32 client, unsigned long handle,
> +			 struct bnxt_cmpl_entry cmpl[], u32 entries)
> +{
> +	struct bnxt_crypto_cmd_ctx *ctx;
> +	struct ce_cmpl *cmp = NULL;
> +	u32 len, kid;
> +
> +	if (likely(cmpl))
> +		cmp = cmpl[0].cmpl;
> +	if (!handle || entries != 1) {
> +		if (entries != 1 && cmpl) {
> +			netdev_warn(bp->dev, "Invalid entries %d with handle %lx cmpl %08x in %s()\n",
> +				    entries, handle, *(u32 *)cmp, __func__);
> +		}
> +		if (!handle)
> +			return;
> +	}
> +	ctx = (void *)handle;
> +	ctx->status = BNXT_CMD_CTX_COMPLETED;

[Medium]
When entries != 1 with a valid cmpl, the function emits a warning and
falls through to set ctx->status = BNXT_CMD_CTX_COMPLETED without setting
BNXT_CMD_CTX_ERROR (BNXT_CMD_CTX_ERROR is only set on the kid/client
mismatch branch below).  The waiter checks:

	if (ctx->status == BNXT_CMD_CTX_COMPLETED &&
	    CE_CMPL_STATUS(&ctx->ce_cmp) == CE_CMPL_STATUS_OK)
		rc = 0;

Should the "invalid entries" branch also OR in BNXT_CMD_CTX_ERROR so that
the caller does not see a successful return for a completion the driver
itself flagged as malformed?
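A sketch of that fix, keeping the existing warning:

```c
/* Sketch: mark a malformed completion as an error so the waiter's
 * ctx->status == BNXT_CMD_CTX_COMPLETED check cannot yield rc = 0.
 */
if (entries != 1) {
	if (cmpl)
		netdev_warn(bp->dev, "Invalid entries %d with handle %lx in %s()\n",
			    entries, handle, __func__);
	if (!handle)
		return;
	ctx = (void *)handle;
	ctx->status = BNXT_CMD_CTX_COMPLETED | BNXT_CMD_CTX_ERROR;
	complete(&ctx->cmp);
	return;
}
```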

> +	if (unlikely(!cmpl)) {
> +		ctx->status |= BNXT_CMD_CTX_RESET;
> +		goto cmp_done;
> +	}

[ ... ]

> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_mpc.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_mpc.c
> index 67ba43b7f168..196805c2d972 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_mpc.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_mpc.c

[ ... ]

> +/* Returns true if the ring is successfully marked as closing. */
> +static bool bnxt_disable_mpc_ring(struct bnxt_mpc_info *mpc, int mpc_ring)
> +{
> +	struct bnxt_tx_ring_info *txr;
> +	bool disabled = false;
> +	int i;
> +
> +	for (i = 0; i < BNXT_MPC_TYPE_MAX; i++) {
> +		if (mpc_ring >= mpc->mpc_ring_count[i])
> +			continue;
> +		txr = &mpc->mpc_rings[i][mpc_ring];
> +		spin_lock_bh(&txr->tx_lock);
> +		if (!READ_ONCE(txr->dev_state)) {
> +			disabled = true;
> +			WRITE_ONCE(txr->dev_state, BNXT_DEV_STATE_CLOSING);
> +		}
> +		spin_unlock_bh(&txr->tx_lock);
> +		if (!disabled)
> +			break;
> +	}
> +	return disabled;
> +}

[ ... ]

> +static void bnxt_clear_one_mpc_entries(struct bnxt *bp,
> +				       struct bnxt_tx_ring_info *txr)
> +{
> +	struct bnxt_sw_mpc_tx_bd *tx_buf;
> +	unsigned long handle;
> +	int i, max_idx;
> +	u32 client;
> +
> +	max_idx = bp->tx_nr_pages * TX_DESC_CNT;
> +
> +	for (i = 0; i < max_idx; i++) {
> +		tx_buf = &txr->tx_mpc_buf_ring[i];
> +		handle = tx_buf->handle;
> +		if (handle) {
> +			client = txr->tx_ring_struct.mpc_chnl_type;
> +			bnxt_crypto_mpc_cmp(bp, client, handle, NULL, 0);
> +			tx_buf->handle = 0;
> +		}
> +	}
> +}

> +static void bnxt_mpc_ring_stop(struct bnxt *bp, struct bnxt_mpc_info *mpc,
> +			       int mpc_ring)
> +{
> +	struct bnxt_tx_ring_info *txr;
> +	struct bnxt_cp_ring_info *cpr;
> +	int i;
> +
> +	for (i = 0; i < BNXT_MPC_TYPE_MAX; i++) {
> +		if (mpc->mpc_ring_count[i] > mpc_ring) {
> +			txr = &mpc->mpc_rings[i][mpc_ring];
> +			bnxt_hwrm_tx_ring_free(bp, txr, true);
> +		}
> +	}
> +	/* CP rings must be freed at the end to guarantee that the HWRM_DONE
> +	 * responses for HWRM_RING_FREE can still be seen on the CP rings.
> +	 */
> +	for (i = 0; i < BNXT_MPC_TYPE_MAX; i++) {
> +		if (mpc->mpc_ring_count[i] > mpc_ring) {
> +			txr = &mpc->mpc_rings[i][mpc_ring];
> +			cpr = txr->tx_cpr;
> +			if (cpr) {
> +				bnxt_hwrm_cp_ring_free(bp, cpr);
> +				bnxt_clear_one_cp_ring(bp, cpr);
> +			}
> +			bnxt_clear_one_mpc_entries(bp, txr);
> +		}
> +	}
> +}

[Critical]
Is bnxt_disable_mpc_ring() sufficient to fence the NAPI poller before the
ring teardown begins?

bnxt_disable_mpc_ring() takes spin_lock_bh(&txr->tx_lock) and flips
dev_state to BNXT_DEV_STATE_CLOSING, but the NAPI consumer in
bnxt_mpc_cmp() reads dev_state with bare READ_ONCE() and never takes
tx_lock:

	mpc_buf = &txr->tx_mpc_buf_ring[RING_TX(bp, tx_cons)];
	if (!READ_ONCE(txr->dev_state))
		handle = mpc_buf->handle;
	mpc_buf->handle = 0;
	...
	bnxt_crypto_mpc_cmp(bp, client, handle, cmpl_entry_arr, cmpl_num);

There is no synchronize_net() / napi_synchronize() between the dev_state
flip and bnxt_mpc_ring_stop()'s teardown.

A NAPI poll already past the READ_ONCE(dev_state) check will have captured
handle = mpc_buf->handle and is about to call bnxt_crypto_mpc_cmp() while,
on another CPU, bnxt_clear_one_mpc_entries() walks tx_mpc_buf_ring,
reads the same still-non-zero tx_buf->handle, and also calls
bnxt_crypto_mpc_cmp() on it.  Two complete()s on the same ctx; if the
waiter freed ctx after the first one, the second is a use-after-free.

Separately, bnxt_clear_one_cp_ring() memset()s cp_desc_ring pages to zero
before NAPI is fenced, so can NAPI observe partially-zeroed completion
entries while iterating?

The L2 close path in bnxt.c uses the explicit pattern of WRITE_ONCE on
dev_state followed by synchronize_net() (with the comment "guarantee no
more DMA on that ring after seeing the completion") before tearing the
ring down.  Should the MPC reset path do the same?

> +static int bnxt_mpc_ring_reset(struct bnxt *bp, int mpc_ring)
> +{
> +	struct bnxt_mpc_info *mpc = bp->mpc_info;
> +	int rc;
> +
> +	if (!mpc)
> +		return 0;
> +	if (mpc_ring >= mpc->mpc_cp_rings)
> +		return -EINVAL;
> +
> +	if (!bnxt_disable_mpc_ring(mpc, mpc_ring))
> +		return 0;

[ ... ]

> +int bnxt_mpc_timeout(struct bnxt *bp, struct bnxt_tx_ring_info *txr)
> +{
> +	if (txr->tx_ring_struct.queue_id == BNXT_MPC_QUEUE_ID)
> +		return bnxt_mpc_ring_reset(bp, txr->txq_index);
> +	return -EINVAL;
> +}

[ ... ]

> @@ -529,14 +693,17 @@ int bnxt_mpc_cmp(struct bnxt *bp, struct bnxt_cp_ring_info *cpr, u32 *raw_cons)
>  		txr = bnapi->tx_mpc_ring[client];
>  		tx_cons = txr->tx_cons;
>  		if (TX_OPAQUE_RING(opaque) != txr->tx_napi_idx)
> -			netdev_warn(bp->dev, "Wrong opaque %x, expected ring %x, idx %x\n",
> +			netdev_warn(bp->dev, "Wrong opaque %x, expected ring %x, cons idx %x\n",
>  				    opaque, txr->tx_napi_idx, txr->tx_cons);
>  		mpc_buf = &txr->tx_mpc_buf_ring[RING_TX(bp, tx_cons)];
> +		if (!READ_ONCE(txr->dev_state))
> +			handle = mpc_buf->handle;
>  		mpc_buf->handle = 0;
>  		tx_cons += mpc_buf->inline_bds;
>  		txr->tx_cons = tx_cons;
>  		txr->tx_hw_cons = RING_TX(bp, tx_cons);
>  	}
> +	bnxt_crypto_mpc_cmp(bp, client, handle, cmpl_entry_arr, cmpl_num);
>  
>  cmp_done:
>  	*raw_cons = tmp_raw_cons;
