From: Jakub Kicinski <kuba@kernel.org>
To: michael.chan@broadcom.com
Cc: Jakub Kicinski <kuba@kernel.org>,
	davem@davemloft.net, netdev@vger.kernel.org, edumazet@google.com,
	pabeni@redhat.com, andrew+netdev@lunn.ch,
	pavan.chebbi@broadcom.com, andrew.gospodarek@broadcom.com
Subject: Re: [PATCH net-next v2 11/15] bnxt_en: Add crypto MPC transmit/completion infrastructure
Date: Fri, 15 May 2026 18:10:20 -0700	[thread overview]
Message-ID: <20260516011020.1857710-1-kuba@kernel.org> (raw)
In-Reply-To: <20260512212105.3488258-12-michael.chan@broadcom.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
bnxt_en: Add crypto MPC transmit/completion infrastructure

This patch adds infrastructure for sending crypto commands over the midpath
channels (MPCs).  bnxt_xmit_crypto_cmd() submits a command and sleeps, with a
timeout, waiting for its completion; on timeout the MPC ring is reset.

> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_crypto.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_crypto.c
> index 4ea3e67be9f5..2cacce28676d 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_crypto.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_crypto.c

[ ... ]

> @@ -113,8 +130,13 @@ void bnxt_clear_crypto(struct bnxt *bp)
>   */
>  void bnxt_free_crypto_info(struct bnxt *bp)
>  {
> +	struct bnxt_crypto_info *crypto = bp->crypto_info;
> +
> +	if (!crypto)
> +		return;
>  	bnxt_clear_crypto(bp);
> -	kfree(bp->crypto_info);
> +	kmem_cache_destroy(crypto->mpc_cache);
> +	kfree(crypto);
>  	bp->crypto_info = NULL;
>  }

[High]
Is there a guarantee that no bnxt_xmit_crypto_cmd() caller is still in
flight when bnxt_free_crypto_info() runs?

bnxt_clear_crypto() does not appear to wait for callers currently sleeping
in wait_for_completion_timeout() or in the 200 ms poll inside
bnxt_crypto_timeout(), and there is no refcount or wait_event tying ctx
lifetime to cache destruction.

If a caller is mid-flight during teardown, can kmem_cache_destroy() trigger
the "Slab cache still has objects" warning, and can the surviving caller
later call kmem_cache_free() on a destroyed cache?

> @@ -333,6 +355,90 @@ int bnxt_key_ctx_alloc_one(struct bnxt *bp, struct bnxt_kctx *kctx, u8 kind,
>  	return -EAGAIN;
>  }
>  
> +#define BNXT_CMD_CTX_RETRY_MAX	10
> +
> +static void bnxt_crypto_timeout(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
> +				struct bnxt_crypto_cmd_ctx *ctx)
> +{
> +	int i;
> +
> +	bnxt_mpc_timeout(bp, txr);
> +	/* Wait for the ctx to complete before proceeding */
> +	for (i = 0; i < BNXT_CMD_CTX_RETRY_MAX &&
> +	     !(ctx->status & BNXT_CMD_CTX_COMPLETED); i++)
> +		msleep(20);
> +	if (!(ctx->status & BNXT_CMD_CTX_COMPLETED))
> +		netdev_warn(bp->dev,
> +			    "Timed out waiting for cmd_ctx to complete for MPC ring %d\n",
> +			    txr->txq_index);
> +}

[Critical]
After this 200 ms poll the caller proceeds to kmem_cache_free(ctx)
unconditionally.  Are there paths where bnxt_mpc_timeout() does not
actually clean up ctx->handle from tx_mpc_buf_ring[] within 200 ms, leaving
the freed pointer reachable?

bnxt_mpc_timeout() returns -EINVAL when txr->tx_ring_struct.queue_id is
not BNXT_MPC_QUEUE_ID, and this rc is not checked here:

	int bnxt_mpc_timeout(struct bnxt *bp, struct bnxt_tx_ring_info *txr)
	{
		if (txr->tx_ring_struct.queue_id == BNXT_MPC_QUEUE_ID)
			return bnxt_mpc_ring_reset(bp, txr->txq_index);
		return -EINVAL;
	}

bnxt_mpc_ring_reset() also returns 0 immediately when bnxt_disable_mpc_ring()
loses the race because dev_state was already set:

	if (!bnxt_disable_mpc_ring(mpc, mpc_ring))
		return 0;

In both cases, no bnxt_clear_one_mpc_entries() runs, the loser's poll
exits after 200 ms, and the freed ctx is still pointed to from
tx_mpc_buf_ring[i].handle.  When NAPI later observes the completion (or
the winner's bnxt_clear_one_mpc_entries() iterates), bnxt_crypto_mpc_cmp()
will write ctx->status, memcpy into ctx->ce_cmp, and complete(&ctx->cmp)
on freed slab memory.

Should the timeout path arrange that the handle is removed (or the ctx
guaranteed completed) before kmem_cache_free()?
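
Stated as a predicate (flag value invented for illustration): the only state
that proves the handle is no longer reachable from tx_mpc_buf_ring[] is the
ctx having been observed completed.  The reset rc alone proves nothing,
since rc == 0 is also returned when the reset lost the disable race and
cleaned nothing.

```c
#include <assert.h>
#include <stdbool.h>

#define CTX_COMPLETED	0x1	/* illustrative flag value */

/* The ctx may go back to the slab only once it was completed, either
 * by NAPI or by bnxt_clear_one_mpc_entries().  reset_rc is deliberately
 * not trusted: 0 can mean "reset ran" or "reset was skipped". */
static bool ctx_safe_to_free(int reset_rc, unsigned int status)
{
	(void)reset_rc;
	return status & CTX_COMPLETED;
}
```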

> +#define BNXT_XMIT_CRYPTO_RETRY_MAX	10
> +#define BNXT_XMIT_CRYPTO_MIN_TMO	100
> +#define BNXT_XMIT_CRYPTO_MAX_TMO	150
> +
> +int bnxt_xmit_crypto_cmd(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
> +			 void *cmd, unsigned int len, unsigned int tmo)
> +{

[ ... ]

> +	tmo_left = wait_for_completion_timeout(&ctx->cmp, msecs_to_jiffies(tmo));
> +	if (!tmo_left) {
> +		netdev_warn(bp->dev, "crypto MP cmd %08x timed out\n",
> +			    *((u32 *)cmd));
> +		bnxt_crypto_timeout(bp, txr, ctx);
> +		rc = -ETIMEDOUT;
> +		goto xmit_done;
> +	}

[Critical]
Even outside the timeout path, is the handshake between bnxt_crypto_mpc_cmp()
and the waiter safe?  bnxt_crypto_mpc_cmp() does:

	ctx = (void *)handle;
	ctx->status = BNXT_CMD_CTX_COMPLETED;
	...
	memcpy(&ctx->ce_cmp, cmpl[0].cmpl, len);
cmp_done:
	complete(&ctx->cmp);

The timeout poll loop above exits as soon as it observes
(ctx->status & BNXT_CMD_CTX_COMPLETED).  If the waiter happened to time
out at exactly the same time as a real completion, can the poll observe
status set by mpc_cmp(), then proceed to kmem_cache_free(ctx) while the
mpc_cmp() side is still inside complete(&ctx->cmp) and is touching
ctx->cmp.wait on another CPU?
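
The shape of the hazard, modeled in userspace (pthreads and a semaphore
standing in for the kernel primitives; all names invented): the completer
sets the flag first and keeps touching the ctx afterwards, so a waiter that
frees on the bare flag races with that window, while a waiter that also
synchronizes on the completion itself does not.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>
#include <unistd.h>

struct cmd_ctx {
	atomic_int completed;	/* models BNXT_CMD_CTX_COMPLETED */
	atomic_int in_complete;	/* completer still touching the ctx */
	sem_t cmp;		/* models complete()/wait_for_completion() */
};

static void *completer(void *arg)
{
	struct cmd_ctx *ctx = arg;

	atomic_store(&ctx->completed, 1);	/* flag visible first... */
	usleep(10000);				/* ...ctx still in use here */
	atomic_store(&ctx->in_complete, 0);
	sem_post(&ctx->cmp);			/* last touch, like complete() */
	return NULL;
}

/* Safe waiter: after observing the flag, still synchronize on the
 * completion before treating the ctx as free-able.  Returns the
 * completer's in_complete state at that point (0 == safe). */
static int wait_then_free(struct cmd_ctx *ctx)
{
	while (!atomic_load(&ctx->completed))
		usleep(1000);
	sem_wait(&ctx->cmp);	/* without this, a use-after-free window */
	return atomic_load(&ctx->in_complete);
}
```

In kernel terms: completions are designed so the waiter may free the object
as soon as wait_for_completion() returns, which a bare status-flag poll does
not guarantee.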

[ ... ]

> +xmit_done:
> +	if (rc) {

[ ... ]

> +	if (ctx)
> +		kmem_cache_free(crypto->mpc_cache, ctx);
> +	return rc;
> +}

> +void bnxt_crypto_mpc_cmp(struct bnxt *bp, u32 client, unsigned long handle,
> +			 struct bnxt_cmpl_entry cmpl[], u32 entries)
> +{
> +	struct bnxt_crypto_cmd_ctx *ctx;
> +	struct ce_cmpl *cmp = NULL;
> +	u32 len, kid;
> +
> +	if (likely(cmpl))
> +		cmp = cmpl[0].cmpl;
> +	if (!handle || entries != 1) {
> +		if (entries != 1 && cmpl) {
> +			netdev_warn(bp->dev, "Invalid entries %d with handle %lx cmpl %08x in %s()\n",
> +				    entries, handle, *(u32 *)cmp, __func__);
> +		}
> +		if (!handle)
> +			return;
> +	}
> +	ctx = (void *)handle;
> +	ctx->status = BNXT_CMD_CTX_COMPLETED;

[Medium]
When entries != 1 with a valid cmpl, the function emits a warning and
falls through to set ctx->status = BNXT_CMD_CTX_COMPLETED without setting
BNXT_CMD_CTX_ERROR (BNXT_CMD_CTX_ERROR is only set on the kid/client
mismatch branch below).  The waiter checks:

	if (ctx->status == BNXT_CMD_CTX_COMPLETED &&
	    CE_CMPL_STATUS(&ctx->ce_cmp) == CE_CMPL_STATUS_OK)
		rc = 0;

Should the "invalid entries" branch also OR in BNXT_CMD_CTX_ERROR so that
the caller does not see a successful return for a completion the driver
itself flagged as malformed?
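
The asymmetry is easy to see if the waiter's check is modeled directly
(flag values invented for illustration): ORing in the error bit on the
malformed branch is exactly what keeps that path from reporting success.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values, not taken from the driver headers. */
#define CMD_CTX_COMPLETED	0x1
#define CMD_CTX_ERROR		0x2

/* Models the waiter: success requires status to be exactly COMPLETED
 * (no error bit) plus a good hardware completion status. */
static bool cmd_succeeded(unsigned int status, bool hw_status_ok)
{
	return status == CMD_CTX_COMPLETED && hw_status_ok;
}
```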

> +	if (unlikely(!cmpl)) {
> +		ctx->status |= BNXT_CMD_CTX_RESET;
> +		goto cmp_done;
> +	}

[ ... ]

> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_mpc.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_mpc.c
> index 67ba43b7f168..196805c2d972 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_mpc.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_mpc.c

[ ... ]

> +/* Returns true if the ring is successfully marked as closing. */
> +static bool bnxt_disable_mpc_ring(struct bnxt_mpc_info *mpc, int mpc_ring)
> +{
> +	struct bnxt_tx_ring_info *txr;
> +	bool disabled = false;
> +	int i;
> +
> +	for (i = 0; i < BNXT_MPC_TYPE_MAX; i++) {
> +		if (mpc_ring >= mpc->mpc_ring_count[i])
> +			continue;
> +		txr = &mpc->mpc_rings[i][mpc_ring];
> +		spin_lock_bh(&txr->tx_lock);
> +		if (!READ_ONCE(txr->dev_state)) {
> +			disabled = true;
> +			WRITE_ONCE(txr->dev_state, BNXT_DEV_STATE_CLOSING);
> +		}
> +		spin_unlock_bh(&txr->tx_lock);
> +		if (!disabled)
> +			break;
> +	}
> +	return disabled;
> +}

[ ... ]

> +static void bnxt_clear_one_mpc_entries(struct bnxt *bp,
> +				       struct bnxt_tx_ring_info *txr)
> +{
> +	struct bnxt_sw_mpc_tx_bd *tx_buf;
> +	unsigned long handle;
> +	int i, max_idx;
> +	u32 client;
> +
> +	max_idx = bp->tx_nr_pages * TX_DESC_CNT;
> +
> +	for (i = 0; i < max_idx; i++) {
> +		tx_buf = &txr->tx_mpc_buf_ring[i];
> +		handle = tx_buf->handle;
> +		if (handle) {
> +			client = txr->tx_ring_struct.mpc_chnl_type;
> +			bnxt_crypto_mpc_cmp(bp, client, handle, NULL, 0);
> +			tx_buf->handle = 0;
> +		}
> +	}
> +}

> +static void bnxt_mpc_ring_stop(struct bnxt *bp, struct bnxt_mpc_info *mpc,
> +			       int mpc_ring)
> +{
> +	struct bnxt_tx_ring_info *txr;
> +	struct bnxt_cp_ring_info *cpr;
> +	int i;
> +
> +	for (i = 0; i < BNXT_MPC_TYPE_MAX; i++) {
> +		if (mpc->mpc_ring_count[i] > mpc_ring) {
> +			txr = &mpc->mpc_rings[i][mpc_ring];
> +			bnxt_hwrm_tx_ring_free(bp, txr, true);
> +		}
> +	}
> +	/* CP rings must be freed at the end to guarantee that the HWRM_DONE
> +	 * responses for HWRM_RING_FREE can still be seen on the CP rings.
> +	 */
> +	for (i = 0; i < BNXT_MPC_TYPE_MAX; i++) {
> +		if (mpc->mpc_ring_count[i] > mpc_ring) {
> +			txr = &mpc->mpc_rings[i][mpc_ring];
> +			cpr = txr->tx_cpr;
> +			if (cpr) {
> +				bnxt_hwrm_cp_ring_free(bp, cpr);
> +				bnxt_clear_one_cp_ring(bp, cpr);
> +			}
> +			bnxt_clear_one_mpc_entries(bp, txr);
> +		}
> +	}
> +}

[Critical]
Is bnxt_disable_mpc_ring() sufficient to fence the NAPI poller before the
ring teardown begins?

bnxt_disable_mpc_ring() takes spin_lock_bh(&txr->tx_lock) and flips
dev_state to BNXT_DEV_STATE_CLOSING, but the NAPI consumer in
bnxt_mpc_cmp() reads dev_state with bare READ_ONCE() and never takes
tx_lock:

	mpc_buf = &txr->tx_mpc_buf_ring[RING_TX(bp, tx_cons)];
	if (!READ_ONCE(txr->dev_state))
		handle = mpc_buf->handle;
	mpc_buf->handle = 0;
	...
	bnxt_crypto_mpc_cmp(bp, client, handle, cmpl_entry_arr, cmpl_num);

There is no synchronize_net() / napi_synchronize() between the dev_state
flip and bnxt_mpc_ring_stop()'s teardown.

A NAPI poll already past the READ_ONCE(dev_state) check will have captured
handle = mpc_buf->handle and is about to call bnxt_crypto_mpc_cmp() while,
on another CPU, bnxt_clear_one_mpc_entries() walks tx_mpc_buf_ring,
reads the same still-non-zero tx_buf->handle, and also calls
bnxt_crypto_mpc_cmp() on it.  Two complete()s on the same ctx; if the
waiter freed ctx after the first one, the second is a use-after-free.

Separately, bnxt_clear_one_cp_ring() memset()s cp_desc_ring pages to zero
before NAPI is fenced, so can NAPI observe partially-zeroed completion
entries while iterating?

The L2 close path in bnxt.c uses the explicit pattern of WRITE_ONCE on
dev_state followed by synchronize_net() (with the comment "guarantee no
more DMA on that ring after seeing the completion") before tearing the
ring down.  Should the MPC reset path do the same?

> +static int bnxt_mpc_ring_reset(struct bnxt *bp, int mpc_ring)
> +{
> +	struct bnxt_mpc_info *mpc = bp->mpc_info;
> +	int rc;
> +
> +	if (!mpc)
> +		return 0;
> +	if (mpc_ring >= mpc->mpc_cp_rings)
> +		return -EINVAL;
> +
> +	if (!bnxt_disable_mpc_ring(mpc, mpc_ring))
> +		return 0;

[ ... ]

> +int bnxt_mpc_timeout(struct bnxt *bp, struct bnxt_tx_ring_info *txr)
> +{
> +	if (txr->tx_ring_struct.queue_id == BNXT_MPC_QUEUE_ID)
> +		return bnxt_mpc_ring_reset(bp, txr->txq_index);
> +	return -EINVAL;
> +}

[ ... ]

> @@ -529,14 +693,17 @@ int bnxt_mpc_cmp(struct bnxt *bp, struct bnxt_cp_ring_info *cpr, u32 *raw_cons)
>  		txr = bnapi->tx_mpc_ring[client];
>  		tx_cons = txr->tx_cons;
>  		if (TX_OPAQUE_RING(opaque) != txr->tx_napi_idx)
> -			netdev_warn(bp->dev, "Wrong opaque %x, expected ring %x, idx %x\n",
> +			netdev_warn(bp->dev, "Wrong opaque %x, expected ring %x, cons idx %x\n",
>  				    opaque, txr->tx_napi_idx, txr->tx_cons);
>  		mpc_buf = &txr->tx_mpc_buf_ring[RING_TX(bp, tx_cons)];
> +		if (!READ_ONCE(txr->dev_state))
> +			handle = mpc_buf->handle;
>  		mpc_buf->handle = 0;
>  		tx_cons += mpc_buf->inline_bds;
>  		txr->tx_cons = tx_cons;
>  		txr->tx_hw_cons = RING_TX(bp, tx_cons);
>  	}
> +	bnxt_crypto_mpc_cmp(bp, client, handle, cmpl_entry_arr, cmpl_num);
>  
>  cmp_done:
>  	*raw_cons = tmp_raw_cons;

