Re: [PATCH net-next v2 15/15] bnxt_en: Add kTLS retransmission support

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jakub Kicinski <kuba@kernel.org>
To: michael.chan@broadcom.com
Cc: Jakub Kicinski <kuba@kernel.org>,
	davem@davemloft.net, netdev@vger.kernel.org, edumazet@google.com,
	pabeni@redhat.com, andrew+netdev@lunn.ch,
	pavan.chebbi@broadcom.com, andrew.gospodarek@broadcom.com
Subject: Re: [PATCH net-next v2 15/15] bnxt_en: Add kTLS retransmission support
Date: Fri, 15 May 2026 18:10:24 -0700	[thread overview]
Message-ID: <20260516011024.1857806-1-kuba@kernel.org> (raw)
In-Reply-To: <20260512212105.3488258-16-michael.chan@broadcom.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
bnxt_en: Add kTLS retransmission support

Adds a TX path for kTLS retransmissions that retrieves the TLS record
covering a retransmitted byte range and emits a presync TX BD carrying
the crypto prefix command so the NIC can re-encrypt the packet. The
patch also adds two new ethtool -S counters (tx_tls_ooo_packets and
tx_tls_drop_no_sync) and reorders the read of txr->tx_prod in
bnxt_start_xmit() so it happens after bnxt_ktls_xmit() may have
advanced the producer.

A few questions on the new code below.

> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 89c900db45ba..5697190dc541 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -499,7 +499,6 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  
>  	txq = netdev_get_tx_queue(dev, i);
>  	txr = &bp->tx_ring[bp->tx_ring_map[i]];
> -	prod = txr->tx_prod;
>  
>  #if (MAX_SKB_FRAGS > TX_MAX_FRAGS)
>  	if (skb_shinfo(skb)->nr_frags > TX_MAX_FRAGS) {
> @@ -532,6 +531,7 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
>  	if (unlikely(!skb))
>  		return NETDEV_TX_OK;
>  
> +	prod = txr->tx_prod;
>  	length = skb->len;
>  	len = skb_headlen(skb);
>  	last_frag = skb_shinfo(skb)->nr_frags;

[High]
What unwinds the presync BDs if a DMA mapping fails later in
bnxt_start_xmit()?

By the time bnxt_ktls_xmit() returns, bnxt_ktls_tx_ooo() has already
called bnxt_ktls_pre_xmit() which advances txr->tx_prod past the
CRYPTO_PRESYNC_BDS presync entries and (when fwd is true) updates
kctx_tx->tcp_seq_no = end_seq. After that, bnxt_start_xmit() still has
to dma_map_single(skb->data, ...) and skb_frag_dma_map() each fragment.

If either mapping fails, control jumps to tx_dma_error / tx_free /
tx_kick_pending, which clears only the head data BD slot at txr->tx_prod
and does not move tx_prod back past the presync BDs nor revert
kctx_tx->tcp_seq_no.

Once a subsequent successful TX advances tx_prod and rings the doorbell,
won't HW read the orphan presync BDs and apply the captured TLS crypto
state to whichever data BDs follow, even if those belong to an unrelated
packet on the ring?

> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
> index 66b323e94140..769058a6ec31 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
> @@ -359,6 +359,8 @@ static const char *const bnxt_ring_drv_stats_arr[] = {
>  static const char *const bnxt_ktls_data_stats[] = {
>  	[BNXT_KTLS_TX_PKTS]		= "tx_tls_encrypted_packets",
>  	[BNXT_KTLS_TX_BYTES]		= "tx_tls_encrypted_bytes",
> +	[BNXT_KTLS_TX_OOO_PKTS]		= "tx_tls_ooo_packets",
> +	[BNXT_KTLS_TX_DROP_NO_SYNC]	= "tx_tls_drop_no_sync",
>  };

[Low]
Is the "tx_tls_drop_no_sync" name accurate?  BNXT_KTLS_TX_DROP_NO_SYNC
is bumped in bnxt_ktls_tx_ooo() when tls_get_record() returns NULL or a
record with num_frags == 0, after which -EPROTO is returned and the
caller falls back to tls_encrypt_skb() rather than dropping the packet.
Would something like "tx_tls_resync_no_record" be more descriptive of
what the counter actually counts, given that ethtool -S strings are
hard to rename once shipped?

> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ktls.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ktls.c
> index 263b075af621..b94418ee5436 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ktls.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ktls.c
> @@ -289,6 +289,116 @@ int bnxt_ktls_init(struct bnxt *bp)

[ ... ]

> +static int bnxt_ktls_tx_ooo(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
> +			    struct sk_buff *skb, u32 payload_len, u32 seq,
> +			    struct tls_context *tls_ctx)
> +{
[ ... ]
> +	spin_lock_irqsave(&tx_tls_ctx->lock, flags);
> +	record = tls_get_record(tx_tls_ctx, seq, &rec_sn);
> +	if (!record || !record->num_frags) {
> +		rc = -EPROTO;
> +		sw_stats->tls.counters[BNXT_KTLS_TX_DROP_NO_SYNC]++;
> +		goto unlock_exit;
> +	}
> +	hdr_tcp_seq = tls_record_start_seq(record);
> +	hdr = skb_frag_address_safe(&record->frags[0]);
> +
[ ... ]
> +	if (tls_ctx->prot_info.version == TLS_1_2_VERSION) {
> +		u32 nonce_bytes = tls_ctx->prot_info.iv_size;
> +		u32 retrans_off = seq - hdr_tcp_seq;
> +
> +		if (retrans_off > 5 && retrans_off < 5 + nonce_bytes)
> +			nonce_bytes = retrans_off - 5;
> +		memcpy(pcmd.explicit_nonce, hdr + 5, nonce_bytes);
> +	}

[High]
Can hdr be NULL here?  skb_frag_address_safe() is documented to return
NULL when page_address(page) returns NULL, e.g. for highmem pages on
32-bit kernels with CONFIG_HIGHMEM that have not been kmap'd.

TLS records are allocated via sk_page_frag_refill() with the socket's
GFP flags, which can produce highmem pages, and the existing in-tree
caller in bnxt_start_xmit() does:

	fptr = skb_frag_address_safe(frag);
	if (!fptr)
		goto normal_tx;

For TLS 1.2, hdr + 5 is then dereferenced unconditionally in
memcpy(pcmd.explicit_nonce, hdr + 5, nonce_bytes).  Should there be a
NULL check here too, similar to the bnxt_start_xmit() pattern?

> +	memcpy(&pcmd.record_seq_num[0], &rec_sn, sizeof(rec_sn));

[High]
Does this copy the record sequence number with the byte order the NIC
expects on big-endian hosts?

rec_sn comes from tls_get_record() which returns offload_ctx->
unacked_record_sn, a host-byte-order u64 initialised via
be64_to_cpu(rcd_sn) - 1 in net/tls/tls_device.c and incremented as
plain C arithmetic.  The steady-state setup path bnxt_crypto_add()
populates the same record_seq_num field via bnxt_copy_tls_mp_data()
which is commented "Copy in reverse byte order" and reverses the
wire-format big-endian aes->rec_seq into little-endian.

On a little-endian host memcpy of rec_sn yields LE bytes, which matches
the HW format.  On a big-endian host memcpy yields BE bytes, so the NIC
sees the wrong record sequence number for the AEAD computation on every
retransmission.  Would something like

	__le64 le_rec_sn = cpu_to_le64(rec_sn);
	memcpy(&pcmd.record_seq_num[0], &le_rec_sn, sizeof(le_rec_sn));

be more correct?

> +	rc = 0;
> +	bnxt_ktls_pre_xmit(bp, txr, kctx_tx->kid, &pcmd);
> +
> +	if (fwd)
> +		kctx_tx->tcp_seq_no = end_seq;
> +
> +unlock_exit:
> +	spin_unlock_irqrestore(&tx_tls_ctx->lock, flags);
> +	return rc;
> +}
> +
>  struct sk_buff *bnxt_ktls_xmit(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
>  			       struct sk_buff *skb, __le32 *lflags, u32 *kid)
>  {
[ ... ]
> @@ -316,9 +427,18 @@ struct sk_buff *bnxt_ktls_xmit(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
>  		sw_stats->tls.counters[BNXT_KTLS_TX_PKTS]++;
>  		sw_stats->tls.counters[BNXT_KTLS_TX_BYTES] += payload_len;
>  	} else {
> -		skb = tls_encrypt_skb(skb);
> -		if (!skb)
> -			return NULL;
> +		sw_stats->tls.counters[BNXT_KTLS_TX_OOO_PKTS]++;
> +
> +		rc = bnxt_ktls_tx_ooo(bp, txr, skb, payload_len, seq, tls_ctx);
> +		if (rc)
> +			return tls_encrypt_skb(skb);
> +
> +		*kid = BNXT_KID_HW(kctx_tx->kid);
> +		*lflags |= cpu_to_le32(TX_BD_FLAGS_CRYPTO_EN |
> +				       BNXT_TX_KID_LO(*kid));
> +		sw_stats->tls.counters[BNXT_KTLS_TX_PKTS]++;
> +		sw_stats->tls.counters[BNXT_KTLS_TX_BYTES] += payload_len;
> +		return skb;
>  	}
>  	return skb;
>  }

[Low]
How is an operator expected to break the OOO counter down by outcome?

BNXT_KTLS_TX_OOO_PKTS is incremented unconditionally before calling
bnxt_ktls_tx_ooo(), so it counts every OOO attempt.  On success the
same packet then bumps BNXT_KTLS_TX_PKTS and BNXT_KTLS_TX_BYTES, while
on -EPROTO it also bumps BNXT_KTLS_TX_DROP_NO_SYNC.

The two SW-fallback cases (-ENOSPC for ring-full and -EOPNOTSUPP for
tag-bearing retransmissions) are not visible in any counter.  Given the
commit message defers tag-bearing retransmissions to future patches,
-EOPNOTSUPP is expected to be the principal SW-fallback cause for now.

Would it make sense to either move the OOO_PKTS increment to the success
branch or add per-failure-cause counters such as
tx_tls_ooo_fallback_no_sync, tx_tls_ooo_fallback_tag_retrans and
tx_tls_ooo_fallback_no_space, before the strings are picked up by
tooling?

     prev parent reply	other threads:[~2026-05-16  1:10 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-12 21:20 [PATCH net-next v2 00/15] bnxt_en: Add kTLS TX offload support Michael Chan
2026-05-12 21:20 ` [PATCH net-next v2 01/15] bnxt_en: Add Midpath channel information Michael Chan
2026-05-12 21:20 ` [PATCH net-next v2 02/15] bnxt_en: Account for the MPC TX and CP rings Michael Chan
2026-05-12 21:20 ` [PATCH net-next v2 03/15] bnxt_en: Set default MPC ring count Michael Chan
2026-05-16  1:10   ` Jakub Kicinski
2026-05-12 21:20 ` [PATCH net-next v2 04/15] bnxt_en: Rename xdp_tx_lock to tx_lock Michael Chan
2026-05-12 21:20 ` [PATCH net-next v2 05/15] bnxt_en: Allocate and free MPC software structures Michael Chan
2026-05-12 21:20 ` [PATCH net-next v2 06/15] bnxt_en: Allocate and free MPC channels from firmware Michael Chan
2026-05-12 21:20 ` [PATCH net-next v2 07/15] bnxt_en: Allocate crypto structure and backing store Michael Chan
2026-05-16  1:10   ` Jakub Kicinski
2026-05-12 21:20 ` [PATCH net-next v2 08/15] bnxt_en: Reserve crypto RX and TX key contexts on a PF Michael Chan
2026-05-16  1:10   ` Jakub Kicinski
2026-05-12 21:20 ` [PATCH net-next v2 09/15] bnxt_en: Add infrastructure for crypto key context IDs Michael Chan
2026-05-16  1:10   ` Jakub Kicinski
2026-05-12 21:21 ` [PATCH net-next v2 10/15] bnxt_en: Add MPC transmit and completion functions Michael Chan
2026-05-16  1:10   ` Jakub Kicinski
2026-05-12 21:21 ` [PATCH net-next v2 11/15] bnxt_en: Add crypto MPC transmit/completion infrastructure Michael Chan
2026-05-16  1:10   ` Jakub Kicinski
2026-05-12 21:21 ` [PATCH net-next v2 12/15] bnxt_en: Support kTLS TX offload by implementing .tls_dev_add/del() Michael Chan
2026-05-16  1:10   ` Jakub Kicinski
2026-05-12 21:21 ` [PATCH net-next v2 13/15] bnxt_en: Implement kTLS TX normal path Michael Chan
2026-05-16  1:10   ` Jakub Kicinski
2026-05-12 21:21 ` [PATCH net-next v2 14/15] bnxt_en: Add support for inline transmit BDs Michael Chan
2026-05-12 21:21 ` [PATCH net-next v2 15/15] bnxt_en: Add kTLS retransmission support Michael Chan
2026-05-16  1:10   ` Jakub Kicinski [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260516011024.1857806-1-kuba@kernel.org \
    --to=kuba@kernel.org \
    --cc=andrew+netdev@lunn.ch \
    --cc=andrew.gospodarek@broadcom.com \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=michael.chan@broadcom.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=pavan.chebbi@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.