From: Jakub Kicinski <kuba@kernel.org>
To: michael.chan@broadcom.com
Cc: Jakub Kicinski <kuba@kernel.org>,
davem@davemloft.net, netdev@vger.kernel.org, edumazet@google.com,
pabeni@redhat.com, andrew+netdev@lunn.ch,
pavan.chebbi@broadcom.com, andrew.gospodarek@broadcom.com
Subject: Re: [PATCH net-next v2 12/15] bnxt_en: Support kTLS TX offload by implementing .tls_dev_add/del()
Date: Fri, 15 May 2026 18:10:22 -0700 [thread overview]
Message-ID: <20260516011022.1857741-1-kuba@kernel.org> (raw)
In-Reply-To: <20260512212105.3488258-13-michael.chan@broadcom.com>
This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
bnxt_en: Support kTLS TX offload by implementing .tls_dev_add/del()
This patch adds the basic infrastructure for kTLS TX offload in bnxt_en
by allocating and freeing kTLS context IDs (KIDs) via MPC commands, and
introduces bnxt_ktls_init() that wires bnxt_ktls_ops onto the netdev for
later use.
A few questions follow inline.
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 74935b119569..e26980814fe7 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
[ ... ]
> @@ -13491,6 +13493,7 @@ static int bnxt_close(struct net_device *dev)
> {
> struct bnxt *bp = netdev_priv(dev);
>
> + bnxt_crypto_del_all(bp);
> bnxt_close_nic(bp, true, true);
> bnxt_hwrm_shutdown_link(bp);
> bnxt_hwrm_if_change(bp, false);
[High]
Is the placement of bnxt_crypto_del_all() before bnxt_close_nic() safe
against concurrent tls_dev_add/tls_dev_del?
At this point BNXT_STATE_OPEN is still set and the pending counter has not
been drained yet (that happens inside __bnxt_close_nic()). The net/tls
core only holds device_offload_lock for read on the add path and does not
synchronize with .ndo_stop, so bnxt_ktls_dev_add() can run in parallel
with bnxt_crypto_del_all() and call into bnxt_key_ctx_alloc_one(), which
modifies kctx->list with list_add_tail_rcu() under kctx->lock.
bnxt_crypto_del_all() walks the same list with plain list_for_each_entry()
(no rcu_read_lock, no kctx->lock), while the alloc/free helpers explicitly
use list_*_rcu primitives:
> +void bnxt_crypto_del_all(struct bnxt *bp)
> +{
> + struct bnxt_crypto_info *crypto = bp->crypto_info;
> + struct bnxt_kid_info *kid;
> + struct bnxt_kctx *kctx;
> + int i;
> +
> + if (!crypto)
> + return;
> +
> + /* Shutting down, no need to protect the lists. */
> + for (i = 0; i < BNXT_MAX_CRYPTO_KEY_TYPE; i++) {
> + kctx = &crypto->kctx[i];
> + list_for_each_entry(kid, &kctx->list, list)
> + bnxt_crypto_del_all_kids(bp, kid);
> + kctx->epoch++;
> + }
> +}
The "Shutting down, no need to protect the lists" comment seems to assume
no concurrent mutation, but with this call site preceding the OPEN-flag
clear and pending-drain, can a concurrent tls_dev_add insert a new kid
mid-traversal? The same applies to the call from bnxt_fw_reset_close()
below.
A second concern is in bnxt_crypto_del_all_kids():
> +static void bnxt_crypto_del_all_kids(struct bnxt *bp, struct bnxt_kid_info *kid)
> +{
> + int i, rc;
> +
> + for (i = 0; i < kid->count; i++) {
> + if (!test_bit(i, kid->ids)) {
> + rc = bnxt_crypto_del(bp, kid->type, kid->kind,
> + kid->start_id + i);
> + if (!rc)
> + set_bit(i, kid->ids);
> + }
> + }
> +}
The sequence test_bit(i) -> bnxt_crypto_del() (which can sleep inside
bnxt_xmit_crypto_cmd()) -> set_bit(i) is not atomic. If a concurrent
bnxt_free_one_kctx() does set_bit(i) and a subsequent bnxt_alloc_one_kctx()
reissues the same KID via test_and_clear_bit(i) during that sleep, can the
trailing set_bit() in this loop then mark an in-use slot as free, so the
next allocation hands the same KID out to a second TLS socket?
> @@ -14567,6 +14570,7 @@ static void bnxt_fw_reset_close(struct bnxt *bp)
> bp->fw_reset_min_dsecs = 0;
> bnxt_fw_fatal_close(bp);
> }
> + bnxt_crypto_del_all(bp);
> __bnxt_close_nic(bp, true, false);
> bnxt_vf_reps_free(bp);
> bnxt_clear_int_mode(bp);
[ ... ]
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ktls.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ktls.c
> new file mode 100644
[ ... ]
> +static int bnxt_crypto_add(struct bnxt *bp, enum tls_offload_ctx_dir direction,
> + struct tls_crypto_info *crypto_info, u32 tcp_seq_no,
> + u32 kid)
> +{
> + struct bnxt_tx_ring_info *txr;
> + struct ce_add_cmd cmd = {0};
> + u32 data;
> +
> + if (direction == TLS_OFFLOAD_CTX_DIR_TX) {
> + txr = bnxt_select_mpc_ring(bp, BNXT_MPC_TCE_TYPE);
> + cmd.ctx_kind = CE_ADD_CMD_CTX_KIND_CK_TX;
> + } else {
> + return -EOPNOTSUPP;
> + }
> + if (!txr)
> + return -ENODEV;
> +
> + data = CE_ADD_CMD_OPCODE_ADD | (BNXT_KID_HW(kid) << CE_ADD_CMD_KID_SFT);
> + switch (crypto_info->cipher_type) {
> + case TLS_CIPHER_AES_GCM_128: {
> + struct tls12_crypto_info_aes_gcm_128 *aes;
> +
> + aes = (void *)crypto_info;
> + data |= CE_ADD_CMD_ALGORITHM_AES_GCM_128;
> + if (crypto_info->version == TLS_1_3_VERSION)
> + data |= CE_ADD_CMD_VERSION_TLS1_3;
> + memcpy(&cmd.session_key, aes->key, sizeof(aes->key));
> + memcpy(&cmd.salt, aes->salt, sizeof(aes->salt));
> + memcpy(&cmd.addl_iv, aes->iv, sizeof(aes->iv));
> + bnxt_copy_tls_mp_data(cmd.record_seq_num, aes->rec_seq,
> + sizeof(aes->rec_seq));
> + break;
> + }
> + case TLS_CIPHER_AES_GCM_256: {
> + struct tls12_crypto_info_aes_gcm_256 *aes;
> +
> + aes = (void *)crypto_info;
> + data |= CE_ADD_CMD_ALGORITHM_AES_GCM_256;
> + if (crypto_info->version == TLS_1_3_VERSION)
> + data |= CE_ADD_CMD_VERSION_TLS1_3;
> + memcpy(&cmd.session_key, aes->key, sizeof(aes->key));
> + memcpy(&cmd.salt, aes->salt, sizeof(aes->salt));
> + memcpy(&cmd.addl_iv, aes->iv, sizeof(aes->iv));
> + bnxt_copy_tls_mp_data(cmd.record_seq_num, aes->rec_seq,
> + sizeof(aes->rec_seq));
> + break;
> + }
> + default:
> + return -EOPNOTSUPP;
> + }
> + cmd.ver_algo_kid_opcode = cpu_to_le32(data);
> + cmd.pkt_tcp_seq_num = cpu_to_le32(tcp_seq_no);
> + cmd.tls_header_tcp_seq_num = cmd.pkt_tcp_seq_num;
> + return bnxt_xmit_crypto_cmd(bp, txr, &cmd, sizeof(cmd),
> + BNXT_MPC_TMO_MSECS);
> +}
[High]
The cmd buffer lives on the kernel stack and holds a cleartext copy of
the AES-GCM session key in cmd.session_key. After bnxt_xmit_crypto_cmd()
completes, the function returns without zeroing the buffer, so the key
bytes linger in the dead stack frame until that memory happens to be
reused.
Should this end with a memzero_explicit(&cmd, sizeof(cmd)) before
returning, similar to how mlx5e_ktls_add_tx() clears its key material?
[ ... ]
> +static int bnxt_ktls_dev_add(struct net_device *dev, struct sock *sk,
> + enum tls_offload_ctx_dir direction,
> + struct tls_crypto_info *crypto_info,
> + u32 start_offload_tcp_sn)
> +{
> + struct bnxt_ktls_offload_ctx_tx *kctx_tx;
> + struct bnxt *bp = netdev_priv(dev);
> + struct bnxt_crypto_info *crypto;
> + struct tls_context *tls_ctx;
> + struct bnxt_tls_info *ktls;
> + struct bnxt_kctx *kctx;
> + u32 kid;
> + int rc;
> +
> + BUILD_BUG_ON(sizeof(struct bnxt_ktls_offload_ctx_tx) >
> + TLS_DRIVER_STATE_SIZE_TX);
> +
> + ktls = bp->ktls_info;
> + if (direction == TLS_OFFLOAD_CTX_DIR_RX)
> + return -EOPNOTSUPP;
> +
> + if (!bnxt_ktls_cipher_supported(bp, crypto_info)) {
> + atomic64_inc(&ktls->counters[BNXT_KTLS_ERR_INVALID_CIPHER]);
> + return -EOPNOTSUPP;
> + }
> +
> + atomic_inc(&ktls->pending);
> + /* Make sure bnxt_close_nic() sees pending before we check the
> + * BNXT_STATE_OPEN flag.
> + */
> + smp_mb__after_atomic();
> + if (!test_bit(BNXT_STATE_OPEN, &bp->state)) {
> + atomic64_inc(&ktls->counters[BNXT_KTLS_ERR_STATE_NOT_OPEN]);
> + rc = -ENODEV;
> + goto exit;
> + }
[Medium, Low]
Two related questions about the ktls dereferences in this function.
bp->ktls_info is read once into ktls and then immediately dereferenced
without a NULL guard. bnxt_ktls_init() (in a later patch in this series)
only assigns dev->tlsdev_ops when bp->ktls_info is non-NULL, but does the
driver ever clear dev->tlsdev_ops on teardown? bnxt_free_ktls_info()
sets bp->ktls_info = NULL and is reachable from the FW-reset path via
__bnxt_hwrm_func_qcaps() -> bnxt_alloc_crypto_info() -> bnxt_free_*
without unregistering the netdev. Can a tls_dev_add arriving in that
window dereference NULL?
Separately, the comment on smp_mb__after_atomic() implies that the
pending counter is what lets bnxt_close_nic() synchronize with ktls
users, but the BNXT_KTLS_ERR_INVALID_CIPHER counter is incremented
before atomic_inc(&ktls->pending). Should that counter update be moved
after the pending increment so it follows the same ordering as the
other ktls dereferences in this function?
[ ... ]
> +#define KTLS_RETRY_MAX 100
> +
> +static void bnxt_ktls_dev_del(struct net_device *dev,
> + struct tls_context *tls_ctx,
> + enum tls_offload_ctx_dir direction)
> +{
> + struct bnxt_ktls_offload_ctx_tx *kctx_tx;
> + struct bnxt *bp = netdev_priv(dev);
> + struct bnxt_crypto_info *crypto;
> + struct bnxt_tls_info *ktls;
> + struct bnxt_kctx *kctx;
> + int retry_cnt = 0;
> + u8 kind;
> + u32 kid;
> +
> + ktls = bp->ktls_info;
> +retry:
> + while (!test_bit(BNXT_STATE_OPEN, &bp->state)) {
> + if (!netif_running(dev))
> + return;
> + /* Prevent infinite loop if device is down but netif_running()
> + * returns true.
> + */
> + if (retry_cnt > KTLS_RETRY_MAX) {
> + atomic64_inc(&ktls->counters[BNXT_KTLS_ERR_RETRY_EXCEEDED]);
> + netdev_warn(dev, "%s retry max %d exceeded, state %lx\n",
> + __func__, retry_cnt, bp->state);
> + return;
> + }
> + retry_cnt++;
> + msleep(100);
> + }
[High]
This loop can sleep up to 100 * 100 ms = ~10 seconds while holding
device_offload_lock from the net/tls caller. tls_device_down() takes
that lock for write and walks every offloaded TLS context on the netdev,
so a FW-reset that overlaps teardown of N TLS sockets could multiply the
delay across all of them and stall every other tls_set_device_offload()
on the system for that period.
Could this be coupled to the FW-reset completion path via wait_event()
or a completion instead of an open-coded msleep retry on a state bit?
The fixed retry budget also means that if the device stays out of
BNXT_STATE_OPEN longer than ~10 seconds, the function returns having
done nothing -- the HW key context stays programmed and the in-driver
KID slot is leaked until driver unload. Is that intended, and where is
the leaked slot reclaimed in that case?
[ ... ]
> + crypto = bp->crypto_info;
> + kctx_tx = __tls_driver_ctx(tls_ctx, TLS_OFFLOAD_CTX_DIR_TX);
> + kid = kctx_tx->kid;
> + kctx = &crypto->kctx[BNXT_TX_CRYPTO_KEY_TYPE];
> + kind = BNXT_CTX_KIND_CK_TX;
> + atomic64_inc(&ktls->counters[BNXT_KTLS_TX_DEL]);
> + if (bnxt_kid_valid(kctx, kid) &&
> + !bnxt_crypto_del(bp, kctx->type, kind, kid))
> + bnxt_free_one_kctx(kctx, kid);
> +
> + atomic_dec(&ktls->pending);
> +}
[Medium]
bnxt_free_one_kctx() is only called when bnxt_crypto_del() returns 0.
bnxt_crypto_del() can return -ENODEV when bnxt_select_mpc_ring() returns
NULL, or propagate any nonzero value from bnxt_xmit_crypto_cmd()
(timeout, MPC ring busy, completion error, allocation failure).
In all of those cases the bit corresponding to this KID in kid->ids is
never cleared back to "free", so the slot stays consumed. The close-time
fallback bnxt_crypto_del_all_kids() also only sets the bit on rc == 0,
so it does not reclaim slots that fail repeatedly either.
Can the KID pool slowly drain over runtime ifdown/ifup cycles with
intermittent MPC errors? Should the slot be released on the local side
even when the HW delete command fails?
[ ... ]