From: Jakub Kicinski <kuba@kernel.org>
To: michael.chan@broadcom.com
Cc: Jakub Kicinski <kuba@kernel.org>,
davem@davemloft.net, netdev@vger.kernel.org, edumazet@google.com,
pabeni@redhat.com, andrew+netdev@lunn.ch,
pavan.chebbi@broadcom.com, andrew.gospodarek@broadcom.com
Subject: Re: [PATCH net-next v2 11/15] bnxt_en: Add crypto MPC transmit/completion infrastructure
Date: Fri, 15 May 2026 18:10:20 -0700
Message-ID: <20260516011020.1857710-1-kuba@kernel.org>
In-Reply-To: <20260512212105.3488258-12-michael.chan@broadcom.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
bnxt_en: Add crypto MPC transmit/completion infrastructure
This patch adds infrastructure to send crypto commands over the midpath
channels (MPCs). bnxt_xmit_crypto_cmd() submits a command and sleeps,
with a timeout, waiting for the completion; on timeout the MPC ring is
reset.
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_crypto.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_crypto.c
> index 4ea3e67be9f5..2cacce28676d 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_crypto.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_crypto.c
[ ... ]
> @@ -113,8 +130,13 @@ void bnxt_clear_crypto(struct bnxt *bp)
> */
> void bnxt_free_crypto_info(struct bnxt *bp)
> {
> + struct bnxt_crypto_info *crypto = bp->crypto_info;
> +
> + if (!crypto)
> + return;
> bnxt_clear_crypto(bp);
> - kfree(bp->crypto_info);
> + kmem_cache_destroy(crypto->mpc_cache);
> + kfree(crypto);
> bp->crypto_info = NULL;
> }
[High]
Is there a guarantee that no bnxt_xmit_crypto_cmd() caller is still in
flight when bnxt_free_crypto_info() runs?
bnxt_clear_crypto() does not appear to wait for callers currently sleeping
in wait_for_completion_timeout() or in the 200 ms poll inside
bnxt_crypto_timeout(), and there is no refcount or wait_event tying ctx
lifetime to cache destruction.
If a caller is mid-flight during teardown, can kmem_cache_destroy() trigger
the "Slab cache still has objects" warning, and can the surviving caller
later call kmem_cache_free() on a destroyed cache?
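One way to close this, completely untested (the pending counter, the
wait queue, and their names are invented here, not in the patch), is to
count in-flight commands and make teardown drain them before destroying
the cache:

```c
/* hypothetical fields in struct bnxt_crypto_info */
	atomic_t		mpc_pending;
	wait_queue_head_t	mpc_drain_wq;

/* bnxt_xmit_crypto_cmd(): bracket the command lifetime */
	atomic_inc(&crypto->mpc_pending);
	...
	kmem_cache_free(crypto->mpc_cache, ctx);
	if (atomic_dec_and_test(&crypto->mpc_pending))
		wake_up(&crypto->mpc_drain_wq);

/* bnxt_free_crypto_info(): drain before destroying the cache */
	bnxt_clear_crypto(bp);
	wait_event(crypto->mpc_drain_wq,
		   !atomic_read(&crypto->mpc_pending));
	kmem_cache_destroy(crypto->mpc_cache);
```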
> @@ -333,6 +355,90 @@ int bnxt_key_ctx_alloc_one(struct bnxt *bp, struct bnxt_kctx *kctx, u8 kind,
> return -EAGAIN;
> }
>
> +#define BNXT_CMD_CTX_RETRY_MAX 10
> +
> +static void bnxt_crypto_timeout(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
> + struct bnxt_crypto_cmd_ctx *ctx)
> +{
> + int i;
> +
> + bnxt_mpc_timeout(bp, txr);
> + /* Wait for the ctx to complete before proceeding */
> + for (i = 0; i < BNXT_CMD_CTX_RETRY_MAX &&
> + !(ctx->status & BNXT_CMD_CTX_COMPLETED); i++)
> + msleep(20);
> + if (!(ctx->status & BNXT_CMD_CTX_COMPLETED))
> + netdev_warn(bp->dev,
> + "Timed out waiting for cmd_ctx to complete for MPC ring %d\n",
> + txr->txq_index);
> +}
[Critical]
After this 200 ms poll the caller proceeds to kmem_cache_free(ctx)
unconditionally. Are there paths where bnxt_mpc_timeout() does not
actually clean up ctx->handle from tx_mpc_buf_ring[] within 200 ms, leaving
the freed pointer reachable?
bnxt_mpc_timeout() returns -EINVAL when txr->tx_ring_struct.queue_id is
not BNXT_MPC_QUEUE_ID, and this rc is not checked here:
int bnxt_mpc_timeout(struct bnxt *bp, struct bnxt_tx_ring_info *txr)
{
if (txr->tx_ring_struct.queue_id == BNXT_MPC_QUEUE_ID)
return bnxt_mpc_ring_reset(bp, txr->txq_index);
return -EINVAL;
}
bnxt_mpc_ring_reset() also returns 0 immediately when bnxt_disable_mpc_ring()
loses the race because dev_state was already set:
if (!bnxt_disable_mpc_ring(mpc, mpc_ring))
return 0;
In both cases, no bnxt_clear_one_mpc_entries() runs, the loser's poll
exits after 200 ms, and the freed ctx is still pointed to from
tx_mpc_buf_ring[i].handle. When NAPI later observes the completion (or
the winner's bnxt_clear_one_mpc_entries() iterates), bnxt_crypto_mpc_cmp()
will write ctx->status, memcpy into ctx->ce_cmp, and complete(&ctx->cmp)
on freed slab memory.
Should the timeout path arrange that the handle is removed (or the ctx
guaranteed completed) before kmem_cache_free()?
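One conservative option, untested: free the ctx only when its
completion was actually observed, and deliberately leak it (with a
warning) otherwise, since a leaked object is recoverable while a
dangling handle in tx_mpc_buf_ring[] is not:

```c
	bnxt_crypto_timeout(bp, txr, ctx);
	rc = -ETIMEDOUT;
	if (!(ctx->status & BNXT_CMD_CTX_COMPLETED)) {
		/* The handle may still be reachable from
		 * tx_mpc_buf_ring[]; leak the ctx rather than free
		 * memory that NAPI or a later reset can still touch.
		 */
		netdev_warn(bp->dev,
			    "leaking cmd ctx for MPC ring %d\n",
			    txr->txq_index);
		ctx = NULL;
	}
	goto xmit_done;
```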
> +#define BNXT_XMIT_CRYPTO_RETRY_MAX 10
> +#define BNXT_XMIT_CRYPTO_MIN_TMO 100
> +#define BNXT_XMIT_CRYPTO_MAX_TMO 150
> +
> +int bnxt_xmit_crypto_cmd(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
> + void *cmd, unsigned int len, unsigned int tmo)
> +{
[ ... ]
> + tmo_left = wait_for_completion_timeout(&ctx->cmp, msecs_to_jiffies(tmo));
> + if (!tmo_left) {
> + netdev_warn(bp->dev, "crypto MP cmd %08x timed out\n",
> + *((u32 *)cmd));
> + bnxt_crypto_timeout(bp, txr, ctx);
> + rc = -ETIMEDOUT;
> + goto xmit_done;
> + }
[Critical]
Even outside the timeout path, is the handshake between bnxt_crypto_mpc_cmp()
and the waiter safe? bnxt_crypto_mpc_cmp() does:
ctx = (void *)handle;
ctx->status = BNXT_CMD_CTX_COMPLETED;
...
memcpy(&ctx->ce_cmp, cmpl[0].cmpl, len);
cmp_done:
complete(&ctx->cmp);
The timeout poll loop above exits as soon as it observes
(ctx->status & BNXT_CMD_CTX_COMPLETED). If the waiter happened to time
out at exactly the same time as a real completion, can the poll observe
status set by mpc_cmp(), then proceed to kmem_cache_free(ctx) while the
mpc_cmp() side is still inside complete(&ctx->cmp) and is touching
ctx->cmp.wait on another CPU?
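If the answer is yes, one untested way to close it: once the poll
observes BNXT_CMD_CTX_COMPLETED, bnxt_crypto_mpc_cmp() is guaranteed to
go on to call complete(&ctx->cmp), so the waiter can follow up with a
plain wait_for_completion(), which synchronizes on the completion's
internal lock and cannot return while complete() is still touching
ctx->cmp:

```c
	/* in bnxt_crypto_timeout(), after the poll loop */
	if (ctx->status & BNXT_CMD_CTX_COMPLETED) {
		/* The completer sets status before calling complete(),
		 * so a complete() is coming (or already done); wait
		 * for it so the caller cannot free ctx under
		 * complete()'s feet.
		 */
		wait_for_completion(&ctx->cmp);
	} else {
		netdev_warn(bp->dev,
			    "Timed out waiting for cmd_ctx to complete for MPC ring %d\n",
			    txr->txq_index);
	}
```

Documentation/scheduling/completion.rst documents this free-after-wait
pattern as safe for the completion API.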
[ ... ]
> +xmit_done:
> + if (rc) {
[ ... ]
> + if (ctx)
> + kmem_cache_free(crypto->mpc_cache, ctx);
> + return rc;
> +}
> +void bnxt_crypto_mpc_cmp(struct bnxt *bp, u32 client, unsigned long handle,
> + struct bnxt_cmpl_entry cmpl[], u32 entries)
> +{
> + struct bnxt_crypto_cmd_ctx *ctx;
> + struct ce_cmpl *cmp = NULL;
> + u32 len, kid;
> +
> + if (likely(cmpl))
> + cmp = cmpl[0].cmpl;
> + if (!handle || entries != 1) {
> + if (entries != 1 && cmpl) {
> + netdev_warn(bp->dev, "Invalid entries %d with handle %lx cmpl %08x in %s()\n",
> + entries, handle, *(u32 *)cmp, __func__);
> + }
> + if (!handle)
> + return;
> + }
> + ctx = (void *)handle;
> + ctx->status = BNXT_CMD_CTX_COMPLETED;
[Medium]
When entries != 1 with a valid cmpl, the function emits a warning and
falls through to set ctx->status = BNXT_CMD_CTX_COMPLETED without setting
BNXT_CMD_CTX_ERROR (BNXT_CMD_CTX_ERROR is only set on the kid/client
mismatch branch below). The waiter checks:
if (ctx->status == BNXT_CMD_CTX_COMPLETED &&
CE_CMPL_STATUS(&ctx->ce_cmp) == CE_CMPL_STATUS_OK)
rc = 0;
Should the "invalid entries" branch also OR in BNXT_CMD_CTX_ERROR so that
the caller does not see a successful return for a completion the driver
itself flagged as malformed?
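Something like this, untested:

```c
	ctx = (void *)handle;
	ctx->status = BNXT_CMD_CTX_COMPLETED;
	if (unlikely(entries != 1))
		/* malformed completion: fail the waiting command */
		ctx->status |= BNXT_CMD_CTX_ERROR;
```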
> + if (unlikely(!cmpl)) {
> + ctx->status |= BNXT_CMD_CTX_RESET;
> + goto cmp_done;
> + }
[ ... ]
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_mpc.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_mpc.c
> index 67ba43b7f168..196805c2d972 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_mpc.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_mpc.c
[ ... ]
> +/* Returns true if the ring is successfully marked as closing. */
> +static bool bnxt_disable_mpc_ring(struct bnxt_mpc_info *mpc, int mpc_ring)
> +{
> + struct bnxt_tx_ring_info *txr;
> + bool disabled = false;
> + int i;
> +
> + for (i = 0; i < BNXT_MPC_TYPE_MAX; i++) {
> + if (mpc_ring >= mpc->mpc_ring_count[i])
> + continue;
> + txr = &mpc->mpc_rings[i][mpc_ring];
> + spin_lock_bh(&txr->tx_lock);
> + if (!READ_ONCE(txr->dev_state)) {
> + disabled = true;
> + WRITE_ONCE(txr->dev_state, BNXT_DEV_STATE_CLOSING);
> + }
> + spin_unlock_bh(&txr->tx_lock);
> + if (!disabled)
> + break;
> + }
> + return disabled;
> +}
[ ... ]
> +static void bnxt_clear_one_mpc_entries(struct bnxt *bp,
> + struct bnxt_tx_ring_info *txr)
> +{
> + struct bnxt_sw_mpc_tx_bd *tx_buf;
> + unsigned long handle;
> + int i, max_idx;
> + u32 client;
> +
> + max_idx = bp->tx_nr_pages * TX_DESC_CNT;
> +
> + for (i = 0; i < max_idx; i++) {
> + tx_buf = &txr->tx_mpc_buf_ring[i];
> + handle = tx_buf->handle;
> + if (handle) {
> + client = txr->tx_ring_struct.mpc_chnl_type;
> + bnxt_crypto_mpc_cmp(bp, client, handle, NULL, 0);
> + tx_buf->handle = 0;
> + }
> + }
> +}
> +static void bnxt_mpc_ring_stop(struct bnxt *bp, struct bnxt_mpc_info *mpc,
> + int mpc_ring)
> +{
> + struct bnxt_tx_ring_info *txr;
> + struct bnxt_cp_ring_info *cpr;
> + int i;
> +
> + for (i = 0; i < BNXT_MPC_TYPE_MAX; i++) {
> + if (mpc->mpc_ring_count[i] > mpc_ring) {
> + txr = &mpc->mpc_rings[i][mpc_ring];
> + bnxt_hwrm_tx_ring_free(bp, txr, true);
> + }
> + }
> + /* CP rings must be freed at the end to guarantee that the HWRM_DONE
> + * responses for HWRM_RING_FREE can still be seen on the CP rings.
> + */
> + for (i = 0; i < BNXT_MPC_TYPE_MAX; i++) {
> + if (mpc->mpc_ring_count[i] > mpc_ring) {
> + txr = &mpc->mpc_rings[i][mpc_ring];
> + cpr = txr->tx_cpr;
> + if (cpr) {
> + bnxt_hwrm_cp_ring_free(bp, cpr);
> + bnxt_clear_one_cp_ring(bp, cpr);
> + }
> + bnxt_clear_one_mpc_entries(bp, txr);
> + }
> + }
> +}
[Critical]
Is bnxt_disable_mpc_ring() sufficient to fence the NAPI poller before the
ring teardown begins?
bnxt_disable_mpc_ring() takes spin_lock_bh(&txr->tx_lock) and flips
dev_state to BNXT_DEV_STATE_CLOSING, but the NAPI consumer in
bnxt_mpc_cmp() reads dev_state with bare READ_ONCE() and never takes
tx_lock:
mpc_buf = &txr->tx_mpc_buf_ring[RING_TX(bp, tx_cons)];
if (!READ_ONCE(txr->dev_state))
handle = mpc_buf->handle;
mpc_buf->handle = 0;
...
bnxt_crypto_mpc_cmp(bp, client, handle, cmpl_entry_arr, cmpl_num);
There is no synchronize_net() / napi_synchronize() between the dev_state
flip and bnxt_mpc_ring_stop()'s teardown.
A NAPI poll already past the READ_ONCE(dev_state) check will have captured
handle = mpc_buf->handle and is about to call bnxt_crypto_mpc_cmp() while,
on another CPU, bnxt_clear_one_mpc_entries() walks tx_mpc_buf_ring,
reads the same still-non-zero tx_buf->handle, and also calls
bnxt_crypto_mpc_cmp() on it. Two complete()s on the same ctx; if the
waiter freed ctx after the first one, the second is a use-after-free.
Separately, bnxt_clear_one_cp_ring() memset()s cp_desc_ring pages to zero
before NAPI is fenced, so can NAPI observe partially-zeroed completion
entries while iterating?
The L2 close path in bnxt.c uses the explicit pattern of WRITE_ONCE on
dev_state followed by synchronize_net() (with the comment "guarantee no
more DMA on that ring after seeing the completion") before tearing the
ring down. Should the MPC reset path do the same?
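A sketch of that pattern applied here, untested:

```c
	/* in bnxt_mpc_ring_reset(), after marking the ring closing */
	if (!bnxt_disable_mpc_ring(mpc, mpc_ring))
		return 0;
	/* Ensure every in-progress NAPI poll has either seen
	 * BNXT_DEV_STATE_CLOSING or finished with the ring before
	 * the TX/CP rings are freed and tx_mpc_buf_ring is walked.
	 */
	synchronize_net();
	bnxt_mpc_ring_stop(bp, mpc, mpc_ring);
```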
> +static int bnxt_mpc_ring_reset(struct bnxt *bp, int mpc_ring)
> +{
> + struct bnxt_mpc_info *mpc = bp->mpc_info;
> + int rc;
> +
> + if (!mpc)
> + return 0;
> + if (mpc_ring >= mpc->mpc_cp_rings)
> + return -EINVAL;
> +
> + if (!bnxt_disable_mpc_ring(mpc, mpc_ring))
> + return 0;
[ ... ]
> +int bnxt_mpc_timeout(struct bnxt *bp, struct bnxt_tx_ring_info *txr)
> +{
> + if (txr->tx_ring_struct.queue_id == BNXT_MPC_QUEUE_ID)
> + return bnxt_mpc_ring_reset(bp, txr->txq_index);
> + return -EINVAL;
> +}
[ ... ]
> @@ -529,14 +693,17 @@ int bnxt_mpc_cmp(struct bnxt *bp, struct bnxt_cp_ring_info *cpr, u32 *raw_cons)
> txr = bnapi->tx_mpc_ring[client];
> tx_cons = txr->tx_cons;
> if (TX_OPAQUE_RING(opaque) != txr->tx_napi_idx)
> - netdev_warn(bp->dev, "Wrong opaque %x, expected ring %x, idx %x\n",
> + netdev_warn(bp->dev, "Wrong opaque %x, expected ring %x, cons idx %x\n",
> opaque, txr->tx_napi_idx, txr->tx_cons);
> mpc_buf = &txr->tx_mpc_buf_ring[RING_TX(bp, tx_cons)];
> + if (!READ_ONCE(txr->dev_state))
> + handle = mpc_buf->handle;
> mpc_buf->handle = 0;
> tx_cons += mpc_buf->inline_bds;
> txr->tx_cons = tx_cons;
> txr->tx_hw_cons = RING_TX(bp, tx_cons);
> }
> + bnxt_crypto_mpc_cmp(bp, client, handle, cmpl_entry_arr, cmpl_num);
>
> cmp_done:
> *raw_cons = tmp_raw_cons;