* [net-next v5 01/12] net: tso: Introduce tso_dma_map
2026-03-23 18:38 [net-next v5 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
@ 2026-03-23 18:38 ` Joe Damato
2026-03-23 18:38 ` [net-next v5 02/12] net: tso: Add tso_dma_map helpers Joe Damato
` (10 subsequent siblings)
11 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2026-03-23 18:38 UTC (permalink / raw)
To: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman
Cc: andrew+netdev, michael.chan, pavan.chebbi, linux-kernel, leon,
Joe Damato
Add struct tso_dma_map to tso.h for tracking DMA addresses of mapped
GSO payload data.
The struct combines DMA mapping storage with iterator state, allowing
drivers to walk pre-mapped DMA regions linearly. It includes fields for
the DMA IOVA path (iova_state, iova_offset, total_len) and a fallback
per-region path (linear_dma, frags[], frag_idx, offset).
Helpers to initialize and operate on this struct will be added in the
next commit.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
---
v3:
- struct tso_dma_map extended to track IOVA state and
a fallback per-region path.
include/net/tso.h | 40 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 40 insertions(+)
diff --git a/include/net/tso.h b/include/net/tso.h
index e7e157ae0526..8f8d9d74e873 100644
--- a/include/net/tso.h
+++ b/include/net/tso.h
@@ -3,6 +3,7 @@
#define _TSO_H
#include <linux/skbuff.h>
+#include <linux/dma-mapping.h>
#include <net/ip.h>
#define TSO_HEADER_SIZE 256
@@ -28,4 +29,43 @@ void tso_build_hdr(const struct sk_buff *skb, char *hdr, struct tso_t *tso,
void tso_build_data(const struct sk_buff *skb, struct tso_t *tso, int size);
int tso_start(struct sk_buff *skb, struct tso_t *tso);
+/**
+ * struct tso_dma_map - DMA mapping state for GSO payload
+ * @dev: device used for DMA mapping
+ * @skb: the GSO skb being mapped
+ * @hdr_len: per-segment header length
+ * @iova_state: DMA IOVA state (when IOMMU available)
+ * @iova_offset: global byte offset into IOVA range (IOVA path only)
+ * @total_len: total payload length
+ * @frag_idx: current region (-1 = linear, 0..nr_frags-1 = frag)
+ * @offset: byte offset within current region
+ * @linear_dma: DMA address of the linear payload
+ * @linear_len: length of the linear payload
+ * @nr_frags: number of frags successfully DMA-mapped
+ * @frags: per-frag DMA address and length
+ *
+ * DMA-maps the payload regions of a GSO skb (linear data + frags).
+ * Prefers the DMA IOVA API for a single contiguous mapping with one
+ * IOTLB sync; falls back to per-region dma_map_phys() otherwise.
+ */
+struct tso_dma_map {
+ struct device *dev;
+ const struct sk_buff *skb;
+ unsigned int hdr_len;
+ /* IOVA path */
+ struct dma_iova_state iova_state;
+ size_t iova_offset;
+ size_t total_len;
+ /* Fallback path if IOVA path fails */
+ int frag_idx;
+ unsigned int offset;
+ dma_addr_t linear_dma;
+ unsigned int linear_len;
+ unsigned int nr_frags;
+ struct {
+ dma_addr_t dma;
+ unsigned int len;
+ } frags[MAX_SKB_FRAGS];
+};
+
#endif /* _TSO_H */
--
2.52.0
* [net-next v5 02/12] net: tso: Add tso_dma_map helpers
2026-03-23 18:38 [net-next v5 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
2026-03-23 18:38 ` [net-next v5 01/12] net: tso: Introduce tso_dma_map Joe Damato
@ 2026-03-23 18:38 ` Joe Damato
2026-03-23 18:38 ` [net-next v5 03/12] net: bnxt: Export bnxt_xmit_get_cfa_action Joe Damato
` (9 subsequent siblings)
11 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2026-03-23 18:38 UTC (permalink / raw)
To: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman
Cc: andrew+netdev, michael.chan, pavan.chebbi, linux-kernel, leon,
Joe Damato
Add skb_frag_phys() to skbuff.h, which returns the physical address
of a paged fragment's data. It is used by the tso_dma_map helpers
introduced in this commit, described below:
tso_dma_map_init(): DMA-maps the linear payload region and all frags
upfront. Prefers the DMA IOVA API for a single contiguous mapping with
one IOTLB sync; falls back to per-region dma_map_phys() otherwise.
Returns 0 on success, cleans up partial mappings on failure.
tso_dma_map_cleanup(): Handles both IOVA and fallback teardown paths.
tso_dma_map_count(): Counts how many descriptors the next N bytes of
payload will need. Returns 1 when the IOVA path is used, since the
mapping is contiguous.
tso_dma_map_next(): Yields the next (dma_addr, chunk_len) pair.
On the IOVA path, each segment is a single contiguous chunk. On the
fallback path, it indicates when a chunk starts a new DMA mapping so
the driver can set dma_unmap_len on that descriptor for
completion-time unmapping.
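As a rough illustration of the descriptor-counting walk described above,
the userspace sketch below models only the fallback-path arithmetic of
tso_dma_map_count(). The struct names and region sizes are illustrative
stand-ins, not the kernel types:

```c
#include <assert.h>	/* used by the usage checks below */

/* Userspace model of the fallback-path walk in tso_dma_map_count().
 * One linear region followed by frag regions; names are illustrative.
 */
struct region_model {
	unsigned int len;
};

struct map_model {
	int idx;		/* -1 = linear region, >= 0 = frag index */
	unsigned int offset;	/* byte offset within the current region */
	struct region_model linear;
	struct region_model frags[4];
};

/* Count how many contiguous region chunks the next @len payload bytes
 * span, starting from the iterator position, without advancing it.
 */
static unsigned int model_count(const struct map_model *m, unsigned int len)
{
	unsigned int offset = m->offset;
	unsigned int count = 0;
	int idx = m->idx;

	while (len > 0) {
		unsigned int region_len = (idx == -1) ? m->linear.len
						      : m->frags[idx].len;
		unsigned int avail = region_len - offset;
		unsigned int chunk = (len < avail) ? len : avail;

		len -= chunk;
		count++;
		offset = 0;	/* subsequent regions start at offset 0 */
		idx++;
	}
	return count;
}
```

With a 100-byte linear region followed by 300- and 50-byte frags, a
100-byte segment needs one descriptor, a 150-byte segment needs two
(it crosses into the first frag), and 450 bytes span all three regions.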
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
---
v4:
- Fix the kdoc for the TSO helpers. No functional changes.
v3:
- Added the skb_frag_phys() helper to include/linux/skbuff.h.
- Added tso_dma_map_use_iova() inline helper in tso.h.
- Updated the helpers to use the DMA IOVA API, falling back to
per-region mapping otherwise.
include/linux/skbuff.h | 11 ++
include/net/tso.h | 21 ++++
net/core/tso.c | 273 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 305 insertions(+)
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 9cc98f850f1d..d8630eb366c5 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3758,6 +3758,17 @@ static inline void *skb_frag_address_safe(const skb_frag_t *frag)
return ptr + skb_frag_off(frag);
}
+/**
+ * skb_frag_phys - gets the physical address of the data in a paged fragment
+ * @frag: the paged fragment buffer
+ *
+ * Returns: the physical address of the data within @frag.
+ */
+static inline phys_addr_t skb_frag_phys(const skb_frag_t *frag)
+{
+ return page_to_phys(skb_frag_page(frag)) + skb_frag_off(frag);
+}
+
/**
* skb_frag_page_copy() - sets the page in a fragment from another fragment
* @fragto: skb fragment where page is set
diff --git a/include/net/tso.h b/include/net/tso.h
index 8f8d9d74e873..f78a470a7277 100644
--- a/include/net/tso.h
+++ b/include/net/tso.h
@@ -68,4 +68,25 @@ struct tso_dma_map {
} frags[MAX_SKB_FRAGS];
};
+int tso_dma_map_init(struct tso_dma_map *map, struct device *dev,
+ const struct sk_buff *skb, unsigned int hdr_len);
+void tso_dma_map_cleanup(struct tso_dma_map *map);
+unsigned int tso_dma_map_count(struct tso_dma_map *map, unsigned int len);
+bool tso_dma_map_next(struct tso_dma_map *map, dma_addr_t *addr,
+ unsigned int *chunk_len, unsigned int *mapping_len,
+ unsigned int seg_remaining);
+
+/**
+ * tso_dma_map_use_iova - check if this map used the DMA IOVA path
+ * @map: the map to check
+ *
+ * Return: true if the IOVA API was used for this mapping. When true,
+ * the driver must call tso_dma_map_cleanup() at completion time instead
+ * of doing per-region DMA unmaps.
+ */
+static inline bool tso_dma_map_use_iova(struct tso_dma_map *map)
+{
+ return dma_use_iova(&map->iova_state);
+}
+
#endif /* _TSO_H */
diff --git a/net/core/tso.c b/net/core/tso.c
index 6df997b9076e..8d3cfbd52e84 100644
--- a/net/core/tso.c
+++ b/net/core/tso.c
@@ -3,6 +3,7 @@
#include <linux/if_vlan.h>
#include <net/ip.h>
#include <net/tso.h>
+#include <linux/dma-mapping.h>
#include <linux/unaligned.h>
void tso_build_hdr(const struct sk_buff *skb, char *hdr, struct tso_t *tso,
@@ -87,3 +88,275 @@ int tso_start(struct sk_buff *skb, struct tso_t *tso)
return hdr_len;
}
EXPORT_SYMBOL(tso_start);
+
+static int tso_dma_iova_try(struct device *dev, struct tso_dma_map *map,
+ phys_addr_t phys, size_t linear_len, size_t total_len,
+ size_t *offset)
+{
+ const struct sk_buff *skb;
+ unsigned int nr_frags;
+ int i;
+
+ if (!dma_iova_try_alloc(dev, &map->iova_state, phys, total_len))
+ return 1;
+
+ skb = map->skb;
+ nr_frags = skb_shinfo(skb)->nr_frags;
+
+ if (linear_len) {
+ if (dma_iova_link(dev, &map->iova_state,
+ phys, *offset, linear_len,
+ DMA_TO_DEVICE, 0))
+ goto iova_fail;
+ map->linear_len = linear_len;
+ *offset += linear_len;
+ }
+
+ for (i = 0; i < nr_frags; i++) {
+ skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+ unsigned int frag_len = skb_frag_size(frag);
+
+ if (dma_iova_link(dev, &map->iova_state,
+ skb_frag_phys(frag), *offset,
+ frag_len, DMA_TO_DEVICE, 0)) {
+ map->nr_frags = i;
+ goto iova_fail;
+ }
+ map->frags[i].len = frag_len;
+ *offset += frag_len;
+ map->nr_frags = i + 1;
+ }
+
+ if (dma_iova_sync(dev, &map->iova_state, 0, total_len))
+ goto iova_fail;
+
+ return 0;
+
+iova_fail:
+ dma_iova_destroy(dev, &map->iova_state, *offset,
+ DMA_TO_DEVICE, 0);
+ memset(&map->iova_state, 0, sizeof(map->iova_state));
+
+ /* reset map state */
+ map->frag_idx = -1;
+ map->offset = 0;
+ map->linear_len = 0;
+ map->nr_frags = 0;
+
+ return 1;
+}
+
+/**
+ * tso_dma_map_init - DMA-map GSO payload regions
+ * @map: map struct to initialize
+ * @dev: device for DMA mapping
+ * @skb: the GSO skb
+ * @hdr_len: per-segment header length in bytes
+ *
+ * DMA-maps the linear payload (after headers) and all frags.
+ * Prefers the DMA IOVA API (one contiguous mapping, one IOTLB sync);
+ * falls back to per-region dma_map_phys() when IOVA is not available.
+ * Positions the iterator at byte 0 of the payload.
+ *
+ * Return: 0 on success, -ENOMEM on DMA mapping failure (partial mappings
+ * are cleaned up internally).
+ */
+int tso_dma_map_init(struct tso_dma_map *map, struct device *dev,
+ const struct sk_buff *skb, unsigned int hdr_len)
+{
+ unsigned int linear_len = skb_headlen(skb) - hdr_len;
+ unsigned int nr_frags = skb_shinfo(skb)->nr_frags;
+ size_t total_len = skb->len - hdr_len;
+ size_t offset = 0;
+ phys_addr_t phys;
+ int i;
+
+ if (!total_len)
+ return 0;
+
+ map->dev = dev;
+ map->skb = skb;
+ map->hdr_len = hdr_len;
+ map->frag_idx = -1;
+ map->offset = 0;
+ map->iova_offset = 0;
+ map->total_len = total_len;
+ map->linear_len = 0;
+ map->nr_frags = 0;
+ memset(&map->iova_state, 0, sizeof(map->iova_state));
+
+ if (linear_len)
+ phys = virt_to_phys(skb->data + hdr_len);
+ else
+ phys = skb_frag_phys(&skb_shinfo(skb)->frags[0]);
+
+ if (tso_dma_iova_try(dev, map, phys, linear_len, total_len, &offset)) {
+ /* IOVA path failed, map state was reset. Fallback to
+ * per-region dma_map_phys()
+ */
+ if (linear_len) {
+ map->linear_dma = dma_map_phys(dev, phys, linear_len,
+ DMA_TO_DEVICE, 0);
+ if (dma_mapping_error(dev, map->linear_dma))
+ return -ENOMEM;
+ map->linear_len = linear_len;
+ }
+
+ for (i = 0; i < nr_frags; i++) {
+ skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+ unsigned int frag_len = skb_frag_size(frag);
+
+ map->frags[i].len = frag_len;
+ map->frags[i].dma = dma_map_phys(dev, skb_frag_phys(frag),
+ frag_len, DMA_TO_DEVICE, 0);
+ if (dma_mapping_error(dev, map->frags[i].dma)) {
+ tso_dma_map_cleanup(map);
+ return -ENOMEM;
+ }
+ map->nr_frags = i + 1;
+ }
+ }
+
+ if (linear_len == 0 && nr_frags > 0)
+ map->frag_idx = 0;
+
+ return 0;
+}
+EXPORT_SYMBOL(tso_dma_map_init);
+
+/**
+ * tso_dma_map_cleanup - unmap all DMA regions in a tso_dma_map
+ * @map: the map to clean up
+ *
+ * Handles both IOVA and fallback paths. For IOVA, calls
+ * dma_iova_destroy(). For fallback, unmaps each region individually.
+ */
+void tso_dma_map_cleanup(struct tso_dma_map *map)
+{
+ int i;
+
+ if (dma_use_iova(&map->iova_state)) {
+ dma_iova_destroy(map->dev, &map->iova_state, map->total_len,
+ DMA_TO_DEVICE, 0);
+ memset(&map->iova_state, 0, sizeof(map->iova_state));
+ map->linear_len = 0;
+ map->nr_frags = 0;
+ return;
+ }
+
+ if (map->linear_len)
+ dma_unmap_phys(map->dev, map->linear_dma, map->linear_len,
+ DMA_TO_DEVICE, 0);
+
+ for (i = 0; i < map->nr_frags; i++)
+ dma_unmap_phys(map->dev, map->frags[i].dma, map->frags[i].len,
+ DMA_TO_DEVICE, 0);
+
+ map->linear_len = 0;
+ map->nr_frags = 0;
+}
+EXPORT_SYMBOL(tso_dma_map_cleanup);
+
+/**
+ * tso_dma_map_count - count descriptors for a payload range
+ * @map: the payload map
+ * @len: number of payload bytes in this segment
+ *
+ * Counts how many contiguous DMA region chunks the next @len bytes
+ * will span, without advancing the iterator. On the IOVA path this
+ * is always 1 (contiguous). On the fallback path, uses region sizes
+ * from the current position.
+ *
+ * Return: the number of descriptors needed for @len bytes of payload.
+ */
+unsigned int tso_dma_map_count(struct tso_dma_map *map, unsigned int len)
+{
+ unsigned int offset = map->offset;
+ int idx = map->frag_idx;
+ unsigned int count = 0;
+
+ if (!len)
+ return 0;
+
+ if (dma_use_iova(&map->iova_state))
+ return 1;
+
+ while (len > 0) {
+ unsigned int region_len, chunk;
+
+ if (idx == -1)
+ region_len = map->linear_len;
+ else
+ region_len = map->frags[idx].len;
+
+ chunk = min(len, region_len - offset);
+ len -= chunk;
+ count++;
+ offset = 0;
+ idx++;
+ }
+
+ return count;
+}
+EXPORT_SYMBOL(tso_dma_map_count);
+
+/**
+ * tso_dma_map_next - yield the next DMA address range
+ * @map: the payload map
+ * @addr: output DMA address
+ * @chunk_len: output chunk length
+ * @mapping_len: full DMA mapping length when this chunk starts a new
+ * mapping region, or 0 when continuing a previous one.
+ * On the IOVA path this is always 0 (driver must not
+ * do per-region unmaps; use tso_dma_map_cleanup instead).
+ * @seg_remaining: bytes left in current segment
+ *
+ * Yields the next (dma_addr, chunk_len) pair and advances the iterator.
+ * On the IOVA path, the entire payload is contiguous so each segment
+ * is always a single chunk.
+ *
+ * Return: true if a chunk was yielded, false when @seg_remaining is 0.
+ */
+bool tso_dma_map_next(struct tso_dma_map *map, dma_addr_t *addr,
+ unsigned int *chunk_len, unsigned int *mapping_len,
+ unsigned int seg_remaining)
+{
+ unsigned int region_len, chunk;
+
+ if (!seg_remaining)
+ return false;
+
+ /* IOVA path: contiguous DMA range, no region boundaries */
+ if (dma_use_iova(&map->iova_state)) {
+ *addr = map->iova_state.addr + map->iova_offset;
+ *chunk_len = seg_remaining;
+ *mapping_len = 0;
+ map->iova_offset += seg_remaining;
+ return true;
+ }
+
+ /* Fallback path: per-region iteration */
+
+ if (map->frag_idx == -1) {
+ region_len = map->linear_len;
+ chunk = min(seg_remaining, region_len - map->offset);
+ *addr = map->linear_dma + map->offset;
+ *mapping_len = (map->offset == 0) ? region_len : 0;
+ } else {
+ region_len = map->frags[map->frag_idx].len;
+ chunk = min(seg_remaining, region_len - map->offset);
+ *addr = map->frags[map->frag_idx].dma + map->offset;
+ *mapping_len = (map->offset == 0) ? region_len : 0;
+ }
+
+ *chunk_len = chunk;
+ map->offset += chunk;
+
+ if (map->offset >= region_len) {
+ map->frag_idx++;
+ map->offset = 0;
+ }
+
+ return true;
+}
+EXPORT_SYMBOL(tso_dma_map_next);
--
2.52.0
* [net-next v5 03/12] net: bnxt: Export bnxt_xmit_get_cfa_action
2026-03-23 18:38 [net-next v5 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
2026-03-23 18:38 ` [net-next v5 01/12] net: tso: Introduce tso_dma_map Joe Damato
2026-03-23 18:38 ` [net-next v5 02/12] net: tso: Add tso_dma_map helpers Joe Damato
@ 2026-03-23 18:38 ` Joe Damato
2026-03-23 18:38 ` [net-next v5 04/12] net: bnxt: Add a helper for tx_bd_ext Joe Damato
` (8 subsequent siblings)
11 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2026-03-23 18:38 UTC (permalink / raw)
To: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: horms, linux-kernel, leon, Joe Damato
Export bnxt_xmit_get_cfa_action so that it can be used in future commits
which add software USO support to bnxt.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
---
v4:
- Added Pavan's Reviewed-by tag. No functional changes.
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 604966a398f5..7793ba59bcfc 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -447,7 +447,7 @@ const u16 bnxt_lhint_arr[] = {
TX_BD_FLAGS_LHINT_2048_AND_LARGER,
};
-static u16 bnxt_xmit_get_cfa_action(struct sk_buff *skb)
+u16 bnxt_xmit_get_cfa_action(struct sk_buff *skb)
{
struct metadata_dst *md_dst = skb_metadata_dst(skb);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index dd0f6743acf5..d82b0899b33d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -2950,6 +2950,7 @@ unsigned int bnxt_get_avail_cp_rings_for_en(struct bnxt *bp);
int bnxt_reserve_rings(struct bnxt *bp, bool irq_re_init);
void bnxt_tx_disable(struct bnxt *bp);
void bnxt_tx_enable(struct bnxt *bp);
+u16 bnxt_xmit_get_cfa_action(struct sk_buff *skb);
void bnxt_sched_reset_txr(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
u16 curr);
void bnxt_report_link(struct bnxt *bp);
--
2.52.0
* [net-next v5 04/12] net: bnxt: Add a helper for tx_bd_ext
2026-03-23 18:38 [net-next v5 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
` (2 preceding siblings ...)
2026-03-23 18:38 ` [net-next v5 03/12] net: bnxt: Export bnxt_xmit_get_cfa_action Joe Damato
@ 2026-03-23 18:38 ` Joe Damato
2026-03-23 18:38 ` [net-next v5 05/12] net: bnxt: Use dma_unmap_len for TX completion unmapping Joe Damato
` (7 subsequent siblings)
11 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2026-03-23 18:38 UTC (permalink / raw)
To: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: horms, linux-kernel, leon, Joe Damato
Factor out the code that sets up the tx_bd_ext descriptor into a helper
function. This helper will be used by the SW USO implementation in the
following commits.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
---
v4:
- Added Pavan's Reviewed-by tag. No functional changes.
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 ++-------
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 18 ++++++++++++++++++
2 files changed, 20 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 7793ba59bcfc..4d4e7643f7dd 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -663,10 +663,9 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
txbd->tx_bd_opaque = SET_TX_OPAQUE(bp, txr, prod, 2 + last_frag);
prod = NEXT_TX(prod);
- txbd1 = (struct tx_bd_ext *)
- &txr->tx_desc_ring[TX_RING(bp, prod)][TX_IDX(prod)];
+ txbd1 = bnxt_init_ext_bd(bp, txr, prod, lflags, vlan_tag_flags,
+ cfa_action);
- txbd1->tx_bd_hsize_lflags = lflags;
if (skb_is_gso(skb)) {
bool udp_gso = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4);
u32 hdr_len;
@@ -693,7 +692,6 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
txbd1->tx_bd_hsize_lflags |=
cpu_to_le32(TX_BD_FLAGS_TCP_UDP_CHKSUM);
- txbd1->tx_bd_mss = 0;
}
length >>= 9;
@@ -706,9 +704,6 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
flags |= bnxt_lhint_arr[length];
txbd->tx_bd_len_flags_type = cpu_to_le32(flags);
- txbd1->tx_bd_cfa_meta = cpu_to_le32(vlan_tag_flags);
- txbd1->tx_bd_cfa_action =
- cpu_to_le32(cfa_action << TX_BD_CFA_ACTION_SHIFT);
txbd0 = txbd;
for (i = 0; i < last_frag; i++) {
frag = &skb_shinfo(skb)->frags[i];
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index d82b0899b33d..a6b04652600e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -2834,6 +2834,24 @@ static inline u32 bnxt_tx_avail(struct bnxt *bp,
return bp->tx_ring_size - (used & bp->tx_ring_mask);
}
+static inline struct tx_bd_ext *
+bnxt_init_ext_bd(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
+ u16 prod, __le32 lflags, u32 vlan_tag_flags,
+ u32 cfa_action)
+{
+ struct tx_bd_ext *txbd1;
+
+ txbd1 = (struct tx_bd_ext *)
+ &txr->tx_desc_ring[TX_RING(bp, prod)][TX_IDX(prod)];
+ txbd1->tx_bd_hsize_lflags = lflags;
+ txbd1->tx_bd_mss = 0;
+ txbd1->tx_bd_cfa_meta = cpu_to_le32(vlan_tag_flags);
+ txbd1->tx_bd_cfa_action =
+ cpu_to_le32(cfa_action << TX_BD_CFA_ACTION_SHIFT);
+
+ return txbd1;
+}
+
static inline void bnxt_writeq(struct bnxt *bp, u64 val,
volatile void __iomem *addr)
{
--
2.52.0
* [net-next v5 05/12] net: bnxt: Use dma_unmap_len for TX completion unmapping
2026-03-23 18:38 [net-next v5 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
` (3 preceding siblings ...)
2026-03-23 18:38 ` [net-next v5 04/12] net: bnxt: Add a helper for tx_bd_ext Joe Damato
@ 2026-03-23 18:38 ` Joe Damato
2026-03-23 18:38 ` [net-next v5 06/12] net: bnxt: Add TX inline buffer infrastructure Joe Damato
` (6 subsequent siblings)
11 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2026-03-23 18:38 UTC (permalink / raw)
To: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: horms, linux-kernel, leon, Joe Damato
Store the DMA mapping length in each TX buffer descriptor via
dma_unmap_len_set at submit time, and use dma_unmap_len at completion
time.
This is a no-op for normal packets but prepares for software USO,
where header BDs set dma_unmap_len to 0 because the header buffer
is unmapped collectively rather than per-segment.
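The submit/completion contract described above can be sketched with a
small userspace model: the descriptor records its unmap length at submit
time, and completion unmaps only when that length is nonzero. The struct
and helper names here are illustrative, not the bnxt types:

```c
#include <assert.h>	/* used by the usage checks below */

/* Toy model of the dma_unmap_len contract; illustrative names only. */
struct tx_buf_model {
	unsigned long long mapping;	/* stands in for dma_addr_t */
	unsigned int unmap_len;		/* 0 = not unmapped per-buffer */
};

/* Submit time: record the mapping and the length to unmap later. */
static void model_submit(struct tx_buf_model *b, unsigned long long dma,
			 unsigned int len)
{
	b->mapping = dma;
	b->unmap_len = len;
}

/* Completion time: nonzero only when this buffer owns a mapping that
 * must be unmapped individually (mirrors the
 * "if (dma_unmap_len(tx_buf, len))" checks in the patch).
 */
static int model_needs_unmap(const struct tx_buf_model *b)
{
	return b->unmap_len != 0;
}
```

A normal payload BD records its real length and is unmapped at
completion; a SW USO header BD records 0 and is skipped, since the
shared header buffer is torn down collectively.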
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
---
v4:
- Added Pavan's Reviewed-by tag. No functional changes.
rfcv2:
- Use some local variables to shorten long lines. No functional change from
rfcv1.
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 63 ++++++++++++++---------
1 file changed, 40 insertions(+), 23 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 4d4e7643f7dd..fe15b32b12e7 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -656,6 +656,7 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
goto tx_free;
dma_unmap_addr_set(tx_buf, mapping, mapping);
+ dma_unmap_len_set(tx_buf, len, len);
flags = (len << TX_BD_LEN_SHIFT) | TX_BD_TYPE_LONG_TX_BD |
TX_BD_CNT(last_frag + 2);
@@ -720,6 +721,7 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
tx_buf = &txr->tx_buf_ring[RING_TX(bp, prod)];
netmem_dma_unmap_addr_set(skb_frag_netmem(frag), tx_buf,
mapping, mapping);
+ dma_unmap_len_set(tx_buf, len, len);
txbd->tx_bd_haddr = cpu_to_le64(mapping);
@@ -809,7 +811,8 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
u16 hw_cons = txr->tx_hw_cons;
unsigned int tx_bytes = 0;
u16 cons = txr->tx_cons;
- skb_frag_t *frag;
+ unsigned int dma_len;
+ dma_addr_t dma_addr;
int tx_pkts = 0;
bool rc = false;
@@ -844,19 +847,27 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
goto next_tx_int;
}
- dma_unmap_single(&pdev->dev, dma_unmap_addr(tx_buf, mapping),
- skb_headlen(skb), DMA_TO_DEVICE);
+ if (dma_unmap_len(tx_buf, len)) {
+ dma_addr = dma_unmap_addr(tx_buf, mapping);
+ dma_len = dma_unmap_len(tx_buf, len);
+
+ dma_unmap_single(&pdev->dev, dma_addr, dma_len,
+ DMA_TO_DEVICE);
+ }
+
last = tx_buf->nr_frags;
for (j = 0; j < last; j++) {
- frag = &skb_shinfo(skb)->frags[j];
cons = NEXT_TX(cons);
tx_buf = &txr->tx_buf_ring[RING_TX(bp, cons)];
- netmem_dma_unmap_page_attrs(&pdev->dev,
- dma_unmap_addr(tx_buf,
- mapping),
- skb_frag_size(frag),
- DMA_TO_DEVICE, 0);
+ if (dma_unmap_len(tx_buf, len)) {
+ dma_addr = dma_unmap_addr(tx_buf, mapping);
+ dma_len = dma_unmap_len(tx_buf, len);
+
+ netmem_dma_unmap_page_attrs(&pdev->dev,
+ dma_addr, dma_len,
+ DMA_TO_DEVICE, 0);
+ }
}
if (unlikely(is_ts_pkt)) {
if (BNXT_CHIP_P5(bp)) {
@@ -3402,6 +3413,8 @@ static void bnxt_free_one_tx_ring_skbs(struct bnxt *bp,
{
int i, max_idx;
struct pci_dev *pdev = bp->pdev;
+ unsigned int dma_len;
+ dma_addr_t dma_addr;
max_idx = bp->tx_nr_pages * TX_DESC_CNT;
@@ -3412,10 +3425,10 @@ static void bnxt_free_one_tx_ring_skbs(struct bnxt *bp,
if (idx < bp->tx_nr_rings_xdp &&
tx_buf->action == XDP_REDIRECT) {
- dma_unmap_single(&pdev->dev,
- dma_unmap_addr(tx_buf, mapping),
- dma_unmap_len(tx_buf, len),
- DMA_TO_DEVICE);
+ dma_addr = dma_unmap_addr(tx_buf, mapping);
+ dma_len = dma_unmap_len(tx_buf, len);
+
+ dma_unmap_single(&pdev->dev, dma_addr, dma_len, DMA_TO_DEVICE);
xdp_return_frame(tx_buf->xdpf);
tx_buf->action = 0;
tx_buf->xdpf = NULL;
@@ -3437,23 +3450,27 @@ static void bnxt_free_one_tx_ring_skbs(struct bnxt *bp,
continue;
}
- dma_unmap_single(&pdev->dev,
- dma_unmap_addr(tx_buf, mapping),
- skb_headlen(skb),
- DMA_TO_DEVICE);
+ if (dma_unmap_len(tx_buf, len)) {
+ dma_addr = dma_unmap_addr(tx_buf, mapping);
+ dma_len = dma_unmap_len(tx_buf, len);
+
+ dma_unmap_single(&pdev->dev, dma_addr, dma_len, DMA_TO_DEVICE);
+ }
last = tx_buf->nr_frags;
i += 2;
for (j = 0; j < last; j++, i++) {
int ring_idx = i & bp->tx_ring_mask;
- skb_frag_t *frag = &skb_shinfo(skb)->frags[j];
tx_buf = &txr->tx_buf_ring[ring_idx];
- netmem_dma_unmap_page_attrs(&pdev->dev,
- dma_unmap_addr(tx_buf,
- mapping),
- skb_frag_size(frag),
- DMA_TO_DEVICE, 0);
+ if (dma_unmap_len(tx_buf, len)) {
+ dma_addr = dma_unmap_addr(tx_buf, mapping);
+ dma_len = dma_unmap_len(tx_buf, len);
+
+ netmem_dma_unmap_page_attrs(&pdev->dev,
+ dma_addr, dma_len,
+ DMA_TO_DEVICE, 0);
+ }
}
dev_kfree_skb(skb);
}
--
2.52.0
* [net-next v5 06/12] net: bnxt: Add TX inline buffer infrastructure
2026-03-23 18:38 [net-next v5 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
` (4 preceding siblings ...)
2026-03-23 18:38 ` [net-next v5 05/12] net: bnxt: Use dma_unmap_len for TX completion unmapping Joe Damato
@ 2026-03-23 18:38 ` Joe Damato
2026-03-23 18:38 ` [net-next v5 07/12] net: bnxt: Add boilerplate GSO code Joe Damato
` (5 subsequent siblings)
11 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2026-03-23 18:38 UTC (permalink / raw)
To: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: horms, linux-kernel, leon, Joe Damato
Add per-ring pre-allocated inline buffer fields (tx_inline_buf,
tx_inline_dma, tx_inline_size) to bnxt_tx_ring_info and helpers to
allocate and free them. A producer and consumer (tx_inline_prod,
tx_inline_cons) are added to track which slot(s) of the inline buffer
are in-use.
The inline buffer will be used by the SW USO path for pre-allocated,
pre-DMA-mapped per-segment header copies. In the future, this
could be extended to support TX copybreak.
The allocation helper is marked __maybe_unused in this commit because it
will be wired up in a later commit.
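The producer/consumer pair can be pictured with the usual ring-index
arithmetic. The sketch below is a plausible userspace model only: the
slot count, the power-of-two mask, and the helper names are assumptions
for illustration and are not taken from the patch:

```c
#include <assert.h>	/* used by the usage checks below */
#include <stdint.h>

/* Hypothetical slot accounting for tx_inline_prod / tx_inline_cons;
 * the slot count and mask arithmetic are illustrative assumptions.
 */
#define MODEL_SLOTS 8	/* assumed power-of-two slot count */

struct inline_ring_model {
	uint16_t prod;	/* advanced at submit */
	uint16_t cons;	/* advanced at TX completion */
};

static unsigned int model_slots_used(const struct inline_ring_model *r)
{
	/* u16 subtraction handles counter wraparound */
	return (uint16_t)(r->prod - r->cons) & (MODEL_SLOTS - 1);
}

static int model_slot_avail(const struct inline_ring_model *r)
{
	/* keep one slot unused so prod == cons means "empty" */
	return model_slots_used(r) < MODEL_SLOTS - 1;
}
```

Free-running u16 counters make the empty/full distinction unambiguous
even after the counters wrap past 65535.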
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
---
v5:
- Added Pavan's Reviewed-by. No functional changes.
rfcv2:
- Added a producer and consumer to correctly track the in use header slots.
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 35 +++++++++++++++++++++++
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 6 ++++
2 files changed, 41 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index fe15b32b12e7..2759a4e2b148 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3985,6 +3985,39 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
return rc;
}
+static void bnxt_free_tx_inline_buf(struct bnxt_tx_ring_info *txr,
+ struct pci_dev *pdev)
+{
+ if (!txr->tx_inline_buf)
+ return;
+
+ dma_unmap_single(&pdev->dev, txr->tx_inline_dma,
+ txr->tx_inline_size, DMA_TO_DEVICE);
+ kfree(txr->tx_inline_buf);
+ txr->tx_inline_buf = NULL;
+ txr->tx_inline_size = 0;
+}
+
+static int __maybe_unused bnxt_alloc_tx_inline_buf(struct bnxt_tx_ring_info *txr,
+ struct pci_dev *pdev,
+ unsigned int size)
+{
+ txr->tx_inline_buf = kmalloc(size, GFP_KERNEL);
+ if (!txr->tx_inline_buf)
+ return -ENOMEM;
+
+ txr->tx_inline_dma = dma_map_single(&pdev->dev, txr->tx_inline_buf,
+ size, DMA_TO_DEVICE);
+ if (dma_mapping_error(&pdev->dev, txr->tx_inline_dma)) {
+ kfree(txr->tx_inline_buf);
+ txr->tx_inline_buf = NULL;
+ return -ENOMEM;
+ }
+ txr->tx_inline_size = size;
+
+ return 0;
+}
+
static void bnxt_free_tx_rings(struct bnxt *bp)
{
int i;
@@ -4003,6 +4036,8 @@ static void bnxt_free_tx_rings(struct bnxt *bp)
txr->tx_push = NULL;
}
+ bnxt_free_tx_inline_buf(txr, pdev);
+
ring = &txr->tx_ring_struct;
bnxt_free_ring(bp, &ring->ring_mem);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index a6b04652600e..751dbc055fdd 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -994,6 +994,12 @@ struct bnxt_tx_ring_info {
dma_addr_t tx_push_mapping;
__le64 data_mapping;
+ void *tx_inline_buf;
+ dma_addr_t tx_inline_dma;
+ unsigned int tx_inline_size;
+ u16 tx_inline_prod;
+ u16 tx_inline_cons;
+
#define BNXT_DEV_STATE_CLOSING 0x1
u32 dev_state;
--
2.52.0
* [net-next v5 07/12] net: bnxt: Add boilerplate GSO code
2026-03-23 18:38 [net-next v5 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
` (5 preceding siblings ...)
2026-03-23 18:38 ` [net-next v5 06/12] net: bnxt: Add TX inline buffer infrastructure Joe Damato
@ 2026-03-23 18:38 ` Joe Damato
2026-03-23 18:38 ` [net-next v5 08/12] net: bnxt: Implement software USO Joe Damato
` (4 subsequent siblings)
11 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2026-03-23 18:38 UTC (permalink / raw)
To: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Richard Cochran,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Stanislav Fomichev
Cc: horms, linux-kernel, leon, Joe Damato, bpf
Add bnxt_gso.c and bnxt_gso.h with a stub bnxt_sw_udp_gso_xmit()
function, SW USO constants (BNXT_SW_USO_MAX_SEGS,
BNXT_SW_USO_MAX_DESCS), and the is_sw_gso field in bnxt_sw_tx_bd
with BNXT_SW_GSO_MID/LAST markers.
The full SW USO implementation will be added in a future commit.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
---
v5:
- Added Pavan's Reviewed-by. No functional changes.
drivers/net/ethernet/broadcom/bnxt/Makefile | 2 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 4 +++
drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c | 30 ++++++++++++++++++
drivers/net/ethernet/broadcom/bnxt/bnxt_gso.h | 31 +++++++++++++++++++
4 files changed, 66 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c
create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_gso.h
diff --git a/drivers/net/ethernet/broadcom/bnxt/Makefile b/drivers/net/ethernet/broadcom/bnxt/Makefile
index ba6c239d52fa..debef78c8b6d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/Makefile
+++ b/drivers/net/ethernet/broadcom/bnxt/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-$(CONFIG_BNXT) += bnxt_en.o
-bnxt_en-y := bnxt.o bnxt_hwrm.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o bnxt_xdp.o bnxt_ptp.o bnxt_vfr.o bnxt_devlink.o bnxt_dim.o bnxt_coredump.o
+bnxt_en-y := bnxt.o bnxt_hwrm.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o bnxt_xdp.o bnxt_ptp.o bnxt_vfr.o bnxt_devlink.o bnxt_dim.o bnxt_coredump.o bnxt_gso.o
bnxt_en-$(CONFIG_BNXT_FLOWER_OFFLOAD) += bnxt_tc.o
bnxt_en-$(CONFIG_DEBUG_FS) += bnxt_debugfs.o
bnxt_en-$(CONFIG_BNXT_HWMON) += bnxt_hwmon.o
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 751dbc055fdd..18b08789b3a4 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -891,6 +891,7 @@ struct bnxt_sw_tx_bd {
u8 is_ts_pkt;
u8 is_push;
u8 action;
+ u8 is_sw_gso;
unsigned short nr_frags;
union {
u16 rx_prod;
@@ -898,6 +899,9 @@ struct bnxt_sw_tx_bd {
};
};
+#define BNXT_SW_GSO_MID 1
+#define BNXT_SW_GSO_LAST 2
+
struct bnxt_sw_rx_bd {
void *data;
u8 *data_ptr;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c
new file mode 100644
index 000000000000..b296769ee4fe
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Broadcom NetXtreme-C/E network driver.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/pci.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <net/netdev_queues.h>
+#include <net/ip.h>
+#include <net/ipv6.h>
+#include <net/udp.h>
+#include <net/tso.h>
+#include <linux/bnxt/hsi.h>
+
+#include "bnxt.h"
+#include "bnxt_gso.h"
+
+netdev_tx_t bnxt_sw_udp_gso_xmit(struct bnxt *bp,
+ struct bnxt_tx_ring_info *txr,
+ struct netdev_queue *txq,
+ struct sk_buff *skb)
+{
+ dev_kfree_skb_any(skb);
+ dev_core_stats_tx_dropped_inc(bp->dev);
+ return NETDEV_TX_OK;
+}
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.h
new file mode 100644
index 000000000000..f01e8102dcd7
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Broadcom NetXtreme-C/E network driver.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef BNXT_GSO_H
+#define BNXT_GSO_H
+
+/* Maximum segments the stack may send in a single SW USO skb.
+ * This caps gso_max_segs for NICs without HW USO support.
+ */
+#define BNXT_SW_USO_MAX_SEGS 64
+
+/* Worst-case TX descriptors consumed by one SW USO packet:
+ * Each segment: 1 long BD + 1 ext BD + payload BDs.
+ * Total payload BDs across all segs <= num_segs + nr_frags (each frag
+ * boundary crossing adds at most 1 extra BD).
+ * So: 3 * max_segs + MAX_SKB_FRAGS + 1 = 3 * 64 + 17 + 1 = 210.
+ */
+#define BNXT_SW_USO_MAX_DESCS (3 * BNXT_SW_USO_MAX_SEGS + MAX_SKB_FRAGS + 1)
+
+netdev_tx_t bnxt_sw_udp_gso_xmit(struct bnxt *bp,
+ struct bnxt_tx_ring_info *txr,
+ struct netdev_queue *txq,
+ struct sk_buff *skb);
+
+#endif
--
2.52.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [net-next v5 08/12] net: bnxt: Implement software USO
2026-03-23 18:38 [net-next v5 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
` (6 preceding siblings ...)
2026-03-23 18:38 ` [net-next v5 07/12] net: bnxt: Add boilerplate GSO code Joe Damato
@ 2026-03-23 18:38 ` Joe Damato
2026-03-23 18:38 ` [net-next v5 09/12] net: bnxt: Add SW GSO completion and teardown support Joe Damato
` (3 subsequent siblings)
11 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2026-03-23 18:38 UTC (permalink / raw)
To: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: horms, linux-kernel, leon, Joe Damato
Implement bnxt_sw_udp_gso_xmit() using the core tso_dma_map API and
the pre-allocated TX inline buffer for per-segment headers.
The xmit path:
1. Calls tso_start() to initialize TSO state.
2. Stack-allocates a tso_dma_map and calls tso_dma_map_init() to
DMA-map the linear payload and all frags upfront.
3. For each segment:
- Copies and patches headers via tso_build_hdr() into the
pre-allocated tx_inline_buf (DMA-synced per segment)
- Counts payload BDs via tso_dma_map_count()
- Emits long BD (header) + ext BD + payload BDs
- Payload BDs use tso_dma_map_next() which yields (dma_addr,
chunk_len, mapping_len) tuples.
Header BDs set dma_unmap_len=0 since the inline buffer is pre-allocated
and unmapped only at ring teardown.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
---
v5:
- Added __maybe_unused to last_unmap_len and last_unmap_addr to silence a
build warning when CONFIG_NEED_DMA_MAP_STATE is disabled. No functional
changes.
- Added Pavan's Reviewed-by.
v4:
- Fixed the early return issue Pavan pointed out when num_segs <= 1; use the
drop label instead of returning.
v3:
- Added iova_state and iova_total_len to struct bnxt_sw_tx_bd.
- Stores iova_state on the last segment's tx_buf during xmit.
rfcv2:
- set the unmap len on the last descriptor, so that when completions fire
only the last completion unmaps the region.
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 4 +
drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c | 206 ++++++++++++++++++
2 files changed, 210 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 18b08789b3a4..865546f3bfce 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -11,6 +11,8 @@
#ifndef BNXT_H
#define BNXT_H
+#include <linux/dma-mapping.h>
+
#define DRV_MODULE_NAME "bnxt_en"
/* DO NOT CHANGE DRV_VER_* defines
@@ -897,6 +899,8 @@ struct bnxt_sw_tx_bd {
u16 rx_prod;
u16 txts_prod;
};
+ struct dma_iova_state iova_state;
+ size_t iova_total_len;
};
#define BNXT_SW_GSO_MID 1
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c
index b296769ee4fe..9c30ee063ef5 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c
@@ -19,11 +19,217 @@
#include "bnxt.h"
#include "bnxt_gso.h"
+static u32 bnxt_sw_gso_lhint(unsigned int len)
+{
+ if (len <= 512)
+ return TX_BD_FLAGS_LHINT_512_AND_SMALLER;
+ else if (len <= 1023)
+ return TX_BD_FLAGS_LHINT_512_TO_1023;
+ else if (len <= 2047)
+ return TX_BD_FLAGS_LHINT_1024_TO_2047;
+ else
+ return TX_BD_FLAGS_LHINT_2048_AND_LARGER;
+}
+
netdev_tx_t bnxt_sw_udp_gso_xmit(struct bnxt *bp,
struct bnxt_tx_ring_info *txr,
struct netdev_queue *txq,
struct sk_buff *skb)
{
+ unsigned int last_unmap_len __maybe_unused = 0;
+ dma_addr_t last_unmap_addr __maybe_unused = 0;
+ struct bnxt_sw_tx_bd *last_unmap_buf = NULL;
+ unsigned int hdr_len, mss, num_segs;
+ struct pci_dev *pdev = bp->pdev;
+ unsigned int total_payload;
+ int i, bds_needed, slots;
+ struct tso_dma_map map;
+ u32 vlan_tag_flags = 0;
+ struct tso_t tso;
+ u16 cfa_action;
+ u16 prod;
+
+ hdr_len = tso_start(skb, &tso);
+ mss = skb_shinfo(skb)->gso_size;
+ total_payload = skb->len - hdr_len;
+ num_segs = DIV_ROUND_UP(total_payload, mss);
+
+ /* Zero the csum fields so tso_build_hdr will propagate zeroes into
+ * every segment header. HW csum offload will recompute from scratch.
+ */
+ udp_hdr(skb)->check = 0;
+ if (!tso.ipv6)
+ ip_hdr(skb)->check = 0;
+
+ if (unlikely(num_segs <= 1))
+ goto drop;
+
+ /* Upper bound on the number of descriptors needed.
+ *
+ * Each segment uses 1 long BD + 1 ext BD + payload BDs, which is
+ * at most num_segs + nr_frags (each frag boundary crossing adds at
+ * most 1 extra BD).
+ */
+ bds_needed = 3 * num_segs + skb_shinfo(skb)->nr_frags + 1;
+
+ if (unlikely(bnxt_tx_avail(bp, txr) < bds_needed)) {
+ netif_txq_try_stop(txq, bnxt_tx_avail(bp, txr),
+ bp->tx_wake_thresh);
+ return NETDEV_TX_BUSY;
+ }
+
+ slots = BNXT_SW_USO_MAX_SEGS - (txr->tx_inline_prod - txr->tx_inline_cons);
+
+ if (unlikely(slots < num_segs)) {
+ netif_txq_try_stop(txq, bnxt_tx_avail(bp, txr),
+ bp->tx_wake_thresh);
+ return NETDEV_TX_BUSY;
+ }
+
+ if (unlikely(tso_dma_map_init(&map, &pdev->dev, skb, hdr_len)))
+ goto drop;
+
+ cfa_action = bnxt_xmit_get_cfa_action(skb);
+ if (skb_vlan_tag_present(skb)) {
+ vlan_tag_flags = TX_BD_CFA_META_KEY_VLAN |
+ skb_vlan_tag_get(skb);
+ if (skb->vlan_proto == htons(ETH_P_8021Q))
+ vlan_tag_flags |= 1 << TX_BD_CFA_META_TPID_SHIFT;
+ }
+
+ prod = txr->tx_prod;
+
+ for (i = 0; i < num_segs; i++) {
+ unsigned int seg_payload = min_t(unsigned int, mss,
+ total_payload - i * mss);
+ u16 slot = (txr->tx_inline_prod + i) &
+ (BNXT_SW_USO_MAX_SEGS - 1);
+ struct bnxt_sw_tx_bd *tx_buf;
+ unsigned int mapping_len;
+ dma_addr_t this_hdr_dma;
+ unsigned int chunk_len;
+ unsigned int offset;
+ dma_addr_t dma_addr;
+ struct tx_bd *txbd;
+ void *this_hdr;
+ int bd_count;
+ __le32 csum;
+ bool last;
+ u32 flags;
+
+ last = (i == num_segs - 1);
+ offset = slot * TSO_HEADER_SIZE;
+ this_hdr = txr->tx_inline_buf + offset;
+ this_hdr_dma = txr->tx_inline_dma + offset;
+
+ tso_build_hdr(skb, this_hdr, &tso, seg_payload, last);
+
+ dma_sync_single_for_device(&pdev->dev, this_hdr_dma,
+ hdr_len, DMA_TO_DEVICE);
+
+ bd_count = tso_dma_map_count(&map, seg_payload);
+
+ tx_buf = &txr->tx_buf_ring[RING_TX(bp, prod)];
+ txbd = &txr->tx_desc_ring[TX_RING(bp, prod)][TX_IDX(prod)];
+
+ tx_buf->skb = skb;
+ tx_buf->nr_frags = bd_count;
+ tx_buf->is_push = 0;
+ tx_buf->is_ts_pkt = 0;
+
+ dma_unmap_addr_set(tx_buf, mapping, this_hdr_dma);
+ dma_unmap_len_set(tx_buf, len, 0);
+
+ tx_buf->is_sw_gso = last ? BNXT_SW_GSO_LAST : BNXT_SW_GSO_MID;
+
+ /* Store IOVA state on the last segment for completion */
+ if (last && tso_dma_map_use_iova(&map)) {
+ tx_buf->iova_state = map.iova_state;
+ tx_buf->iova_total_len = map.total_len;
+ }
+
+ flags = (hdr_len << TX_BD_LEN_SHIFT) |
+ TX_BD_TYPE_LONG_TX_BD |
+ TX_BD_CNT(2 + bd_count);
+
+ flags |= bnxt_sw_gso_lhint(hdr_len + seg_payload);
+
+ txbd->tx_bd_len_flags_type = cpu_to_le32(flags);
+ txbd->tx_bd_haddr = cpu_to_le64(this_hdr_dma);
+ txbd->tx_bd_opaque = SET_TX_OPAQUE(bp, txr, prod,
+ 2 + bd_count);
+
+ csum = cpu_to_le32(TX_BD_FLAGS_TCP_UDP_CHKSUM |
+ TX_BD_FLAGS_IP_CKSUM);
+
+ prod = NEXT_TX(prod);
+ bnxt_init_ext_bd(bp, txr, prod, csum,
+ vlan_tag_flags, cfa_action);
+
+ /* set dma_unmap_len on the LAST BD touching each
+ * region. Since completions are in-order, the last segment
+ * completes after all earlier ones, so the unmap is safe.
+ */
+ while (tso_dma_map_next(&map, &dma_addr, &chunk_len,
+ &mapping_len, seg_payload)) {
+ prod = NEXT_TX(prod);
+ txbd = &txr->tx_desc_ring[TX_RING(bp, prod)][TX_IDX(prod)];
+ tx_buf = &txr->tx_buf_ring[RING_TX(bp, prod)];
+
+ txbd->tx_bd_haddr = cpu_to_le64(dma_addr);
+ dma_unmap_addr_set(tx_buf, mapping, dma_addr);
+ dma_unmap_len_set(tx_buf, len, 0);
+ tx_buf->skb = NULL;
+ tx_buf->is_sw_gso = 0;
+
+ if (mapping_len) {
+ if (last_unmap_buf) {
+ dma_unmap_addr_set(last_unmap_buf,
+ mapping,
+ last_unmap_addr);
+ dma_unmap_len_set(last_unmap_buf,
+ len,
+ last_unmap_len);
+ }
+ last_unmap_addr = dma_addr;
+ last_unmap_len = mapping_len;
+ }
+ last_unmap_buf = tx_buf;
+
+ flags = chunk_len << TX_BD_LEN_SHIFT;
+ txbd->tx_bd_len_flags_type = cpu_to_le32(flags);
+ txbd->tx_bd_opaque = 0;
+
+ seg_payload -= chunk_len;
+ }
+
+ txbd->tx_bd_len_flags_type |=
+ cpu_to_le32(TX_BD_FLAGS_PACKET_END);
+
+ prod = NEXT_TX(prod);
+ }
+
+ if (last_unmap_buf) {
+ dma_unmap_addr_set(last_unmap_buf, mapping, last_unmap_addr);
+ dma_unmap_len_set(last_unmap_buf, len, last_unmap_len);
+ }
+
+ txr->tx_inline_prod += num_segs;
+
+ netdev_tx_sent_queue(txq, skb->len);
+
+ WRITE_ONCE(txr->tx_prod, prod);
+ /* Sync BDs before doorbell */
+ wmb();
+ bnxt_db_write(bp, &txr->tx_db, prod);
+
+ if (unlikely(bnxt_tx_avail(bp, txr) <= bp->tx_wake_thresh))
+ netif_txq_try_stop(txq, bnxt_tx_avail(bp, txr),
+ bp->tx_wake_thresh);
+
+ return NETDEV_TX_OK;
+
+drop:
dev_kfree_skb_any(skb);
dev_core_stats_tx_dropped_inc(bp->dev);
return NETDEV_TX_OK;
--
2.52.0
* [net-next v5 09/12] net: bnxt: Add SW GSO completion and teardown support
2026-03-23 18:38 [net-next v5 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
` (7 preceding siblings ...)
2026-03-23 18:38 ` [net-next v5 08/12] net: bnxt: Implement software USO Joe Damato
@ 2026-03-23 18:38 ` Joe Damato
2026-03-26 12:39 ` Paolo Abeni
2026-03-23 18:38 ` [net-next v5 10/12] net: bnxt: Dispatch to SW USO Joe Damato
` (2 subsequent siblings)
11 siblings, 1 reply; 14+ messages in thread
From: Joe Damato @ 2026-03-23 18:38 UTC (permalink / raw)
To: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: horms, linux-kernel, leon, Joe Damato
Update __bnxt_tx_int and bnxt_free_one_tx_ring_skbs to handle SW GSO
segments:
- MID segments: adjust tx_pkts/tx_bytes accounting and skip skb free
(the skb is shared across all segments and freed only once)
- LAST segments: if the DMA IOVA path was used, use dma_iova_destroy to
tear down the contiguous mapping. On the fallback path, payload DMA
unmapping is handled by the existing per-BD dma_unmap_len walk.
Both MID and LAST completions advance tx_inline_cons to release the
segment's inline header slot back to the ring.
is_sw_gso is initialized to zero, so the new code paths are not run.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
---
v5:
- Added Pavan's Reviewed-by. No functional changes.
v3:
- completion paths updated to use DMA IOVA APIs to teardown mappings.
rfcv2:
- Update the shared header buffer consumer on TX completion.
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 82 +++++++++++++++++--
.../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 19 ++++-
2 files changed, 91 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 2759a4e2b148..40a16f96feba 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -74,6 +74,8 @@
#include "bnxt_debugfs.h"
#include "bnxt_coredump.h"
#include "bnxt_hwmon.h"
+#include "bnxt_gso.h"
+#include <net/tso.h>
#define BNXT_TX_TIMEOUT (5 * HZ)
#define BNXT_DEF_MSG_ENABLE (NETIF_MSG_DRV | NETIF_MSG_HW | \
@@ -817,12 +819,13 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
bool rc = false;
while (RING_TX(bp, cons) != hw_cons) {
- struct bnxt_sw_tx_bd *tx_buf;
+ struct bnxt_sw_tx_bd *tx_buf, *head_buf;
struct sk_buff *skb;
bool is_ts_pkt;
int j, last;
tx_buf = &txr->tx_buf_ring[RING_TX(bp, cons)];
+ head_buf = tx_buf;
skb = tx_buf->skb;
if (unlikely(!skb)) {
@@ -869,6 +872,23 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
DMA_TO_DEVICE, 0);
}
}
+
+ if (unlikely(head_buf->is_sw_gso)) {
+ txr->tx_inline_cons++;
+ if (head_buf->is_sw_gso == BNXT_SW_GSO_LAST) {
+ if (dma_use_iova(&head_buf->iova_state))
+ dma_iova_destroy(&pdev->dev,
+ &head_buf->iova_state,
+ head_buf->iova_total_len,
+ DMA_TO_DEVICE, 0);
+ } else {
+ tx_pkts--;
+ tx_bytes -= skb->len;
+ skb = NULL;
+ }
+ head_buf->is_sw_gso = 0;
+ }
+
if (unlikely(is_ts_pkt)) {
if (BNXT_CHIP_P5(bp)) {
/* PTP worker takes ownership of the skb */
@@ -3420,6 +3440,7 @@ static void bnxt_free_one_tx_ring_skbs(struct bnxt *bp,
for (i = 0; i < max_idx;) {
struct bnxt_sw_tx_bd *tx_buf = &txr->tx_buf_ring[i];
+ struct bnxt_sw_tx_bd *head_buf = tx_buf;
struct sk_buff *skb;
int j, last;
@@ -3472,7 +3493,20 @@ static void bnxt_free_one_tx_ring_skbs(struct bnxt *bp,
DMA_TO_DEVICE, 0);
}
}
- dev_kfree_skb(skb);
+ if (head_buf->is_sw_gso) {
+ txr->tx_inline_cons++;
+ if (head_buf->is_sw_gso == BNXT_SW_GSO_LAST) {
+ if (dma_use_iova(&head_buf->iova_state))
+ dma_iova_destroy(&pdev->dev,
+ &head_buf->iova_state,
+ head_buf->iova_total_len,
+ DMA_TO_DEVICE, 0);
+ } else {
+ skb = NULL;
+ }
+ }
+ if (skb)
+ dev_kfree_skb(skb);
}
netdev_tx_reset_queue(netdev_get_tx_queue(bp->dev, idx));
}
@@ -3998,9 +4032,9 @@ static void bnxt_free_tx_inline_buf(struct bnxt_tx_ring_info *txr,
txr->tx_inline_size = 0;
}
-static int __maybe_unused bnxt_alloc_tx_inline_buf(struct bnxt_tx_ring_info *txr,
- struct pci_dev *pdev,
- unsigned int size)
+static int bnxt_alloc_tx_inline_buf(struct bnxt_tx_ring_info *txr,
+ struct pci_dev *pdev,
+ unsigned int size)
{
txr->tx_inline_buf = kmalloc(size, GFP_KERNEL);
if (!txr->tx_inline_buf)
@@ -4103,6 +4137,14 @@ static int bnxt_alloc_tx_rings(struct bnxt *bp)
sizeof(struct tx_push_bd);
txr->data_mapping = cpu_to_le64(mapping);
}
+ if (!(bp->flags & BNXT_FLAG_UDP_GSO_CAP) &&
+ (bp->dev->features & NETIF_F_GSO_UDP_L4)) {
+ rc = bnxt_alloc_tx_inline_buf(txr, pdev,
+ BNXT_SW_USO_MAX_SEGS *
+ TSO_HEADER_SIZE);
+ if (rc)
+ return rc;
+ }
qidx = bp->tc_to_qidx[j];
ring->queue_id = bp->q_info[qidx].queue_id;
spin_lock_init(&txr->xdp_tx_lock);
@@ -4645,6 +4687,10 @@ static int bnxt_init_tx_rings(struct bnxt *bp)
bp->tx_wake_thresh = max_t(int, bp->tx_ring_size / 2,
BNXT_MIN_TX_DESC_CNT);
+ if (!(bp->flags & BNXT_FLAG_UDP_GSO_CAP) &&
+ (bp->dev->features & NETIF_F_GSO_UDP_L4))
+ bp->tx_wake_thresh = max_t(int, bp->tx_wake_thresh,
+ BNXT_SW_USO_MAX_DESCS);
for (i = 0; i < bp->tx_nr_rings; i++) {
struct bnxt_tx_ring_info *txr = &bp->tx_ring[i];
@@ -13833,6 +13879,11 @@ static netdev_features_t bnxt_fix_features(struct net_device *dev,
if ((features & NETIF_F_NTUPLE) && !bnxt_rfs_capable(bp, false))
features &= ~NETIF_F_NTUPLE;
+ if ((features & NETIF_F_GSO_UDP_L4) &&
+ !(bp->flags & BNXT_FLAG_UDP_GSO_CAP) &&
+ bp->tx_ring_size < 2 * BNXT_SW_USO_MAX_DESCS)
+ features &= ~NETIF_F_GSO_UDP_L4;
+
if ((bp->flags & BNXT_FLAG_NO_AGG_RINGS) || bp->xdp_prog)
features &= ~(NETIF_F_LRO | NETIF_F_GRO_HW);
@@ -13878,6 +13929,15 @@ static int bnxt_set_features(struct net_device *dev, netdev_features_t features)
int rc = 0;
bool re_init = false;
+ if (!(bp->flags & BNXT_FLAG_UDP_GSO_CAP)) {
+ if (features & NETIF_F_GSO_UDP_L4)
+ bp->tx_wake_thresh = max_t(int, bp->tx_wake_thresh,
+ BNXT_SW_USO_MAX_DESCS);
+ else
+ bp->tx_wake_thresh = max_t(int, bp->tx_ring_size / 2,
+ BNXT_MIN_TX_DESC_CNT);
+ }
+
flags &= ~BNXT_FLAG_ALL_CONFIG_FEATS;
if (features & NETIF_F_GRO_HW)
flags |= BNXT_FLAG_GRO;
@@ -16881,8 +16941,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
NETIF_F_GSO_UDP_TUNNEL_CSUM | NETIF_F_GSO_GRE_CSUM |
NETIF_F_GSO_PARTIAL | NETIF_F_RXHASH |
NETIF_F_RXCSUM | NETIF_F_GRO;
- if (bp->flags & BNXT_FLAG_UDP_GSO_CAP)
- dev->hw_features |= NETIF_F_GSO_UDP_L4;
+ dev->hw_features |= NETIF_F_GSO_UDP_L4;
if (BNXT_SUPPORTS_TPA(bp))
dev->hw_features |= NETIF_F_LRO;
@@ -16915,8 +16974,15 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
dev->priv_flags |= IFF_UNICAST_FLT;
netif_set_tso_max_size(dev, GSO_MAX_SIZE);
- if (bp->tso_max_segs)
+ if (!(bp->flags & BNXT_FLAG_UDP_GSO_CAP)) {
+ u16 max_segs = BNXT_SW_USO_MAX_SEGS;
+
+ if (bp->tso_max_segs)
+ max_segs = min_t(u16, max_segs, bp->tso_max_segs);
+ netif_set_tso_max_segs(dev, max_segs);
+ } else if (bp->tso_max_segs) {
netif_set_tso_max_segs(dev, bp->tso_max_segs);
+ }
dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
NETDEV_XDP_ACT_RX_SG;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 48e8e3be70d3..44b3fd18fcbe 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -33,6 +33,7 @@
#include "bnxt_xdp.h"
#include "bnxt_ptp.h"
#include "bnxt_ethtool.h"
+#include "bnxt_gso.h"
#include "bnxt_nvm_defs.h" /* NVRAM content constant and structure defs */
#include "bnxt_fw_hdr.h" /* Firmware hdr constant and structure defs */
#include "bnxt_coredump.h"
@@ -852,12 +853,18 @@ static int bnxt_set_ringparam(struct net_device *dev,
u8 tcp_data_split = kernel_ering->tcp_data_split;
struct bnxt *bp = netdev_priv(dev);
u8 hds_config_mod;
+ int rc;
if ((ering->rx_pending > BNXT_MAX_RX_DESC_CNT) ||
(ering->tx_pending > BNXT_MAX_TX_DESC_CNT) ||
(ering->tx_pending < BNXT_MIN_TX_DESC_CNT))
return -EINVAL;
+ if ((dev->features & NETIF_F_GSO_UDP_L4) &&
+ !(bp->flags & BNXT_FLAG_UDP_GSO_CAP) &&
+ ering->tx_pending < 2 * BNXT_SW_USO_MAX_DESCS)
+ return -EINVAL;
+
hds_config_mod = tcp_data_split != dev->cfg->hds_config;
if (tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_DISABLED && hds_config_mod)
return -EINVAL;
@@ -882,9 +889,17 @@ static int bnxt_set_ringparam(struct net_device *dev,
bp->tx_ring_size = ering->tx_pending;
bnxt_set_ring_params(bp);
- if (netif_running(dev))
- return bnxt_open_nic(bp, false, false);
+ if (netif_running(dev)) {
+ rc = bnxt_open_nic(bp, false, false);
+ if (rc)
+ return rc;
+ }
+ /* ring size changes may affect features (SW USO requires a minimum
+ * ring size), so recalculate features to ensure the correct features
+ * are blocked/available.
+ */
+ netdev_update_features(dev);
return 0;
}
--
2.52.0
* Re: [net-next v5 09/12] net: bnxt: Add SW GSO completion and teardown support
2026-03-23 18:38 ` [net-next v5 09/12] net: bnxt: Add SW GSO completion and teardown support Joe Damato
@ 2026-03-26 12:39 ` Paolo Abeni
0 siblings, 0 replies; 14+ messages in thread
From: Paolo Abeni @ 2026-03-26 12:39 UTC (permalink / raw)
To: Joe Damato, netdev, Michael Chan, Pavan Chebbi, Andrew Lunn,
David S. Miller, Eric Dumazet, Jakub Kicinski
Cc: horms, linux-kernel, leon
On 3/23/26 7:38 PM, Joe Damato wrote:
> Update __bnxt_tx_int and bnxt_free_one_tx_ring_skbs to handle SW GSO
> segments:
>
> - MID segments: adjust tx_pkts/tx_bytes accounting and skip skb free
> (the skb is shared across all segments and freed only once)
>
> - LAST segments: if the DMA IOVA path was used, use dma_iova_destroy to
> tear down the contiguous mapping. On the fallback path, payload DMA
> unmapping is handled by the existing per-BD dma_unmap_len walk.
>
> Both MID and LAST completions advance tx_inline_cons to release the
> segment's inline header slot back to the ring.
>
> is_sw_gso is initialized to zero, so the new code paths are not run.
>
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
> Signed-off-by: Joe Damato <joe@dama.to>
> ---
> v5:
> - Added Pavan's Reviewed-by. No functional changes.
>
> v3:
> - completion paths updated to use DMA IOVA APIs to teardown mappings.
>
> rfcv2:
> - Update the shared header buffer consumer on TX completion.
>
> drivers/net/ethernet/broadcom/bnxt/bnxt.c | 82 +++++++++++++++++--
> .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 19 ++++-
> 2 files changed, 91 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 2759a4e2b148..40a16f96feba 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -74,6 +74,8 @@
> #include "bnxt_debugfs.h"
> #include "bnxt_coredump.h"
> #include "bnxt_hwmon.h"
> +#include "bnxt_gso.h"
> +#include <net/tso.h>
>
> #define BNXT_TX_TIMEOUT (5 * HZ)
> #define BNXT_DEF_MSG_ENABLE (NETIF_MSG_DRV | NETIF_MSG_HW | \
> @@ -817,12 +819,13 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
> bool rc = false;
>
> while (RING_TX(bp, cons) != hw_cons) {
> - struct bnxt_sw_tx_bd *tx_buf;
> + struct bnxt_sw_tx_bd *tx_buf, *head_buf;
> struct sk_buff *skb;
> bool is_ts_pkt;
> int j, last;
>
> tx_buf = &txr->tx_buf_ring[RING_TX(bp, cons)];
> + head_buf = tx_buf;
> skb = tx_buf->skb;
>
> if (unlikely(!skb)) {
> @@ -869,6 +872,23 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
> DMA_TO_DEVICE, 0);
> }
> }
> +
> + if (unlikely(head_buf->is_sw_gso)) {
> + txr->tx_inline_cons++;
> + if (head_buf->is_sw_gso == BNXT_SW_GSO_LAST) {
> + if (dma_use_iova(&head_buf->iova_state))
I'm likely lost, but AFAICS the previous patch/bnxt_sw_udp_gso_xmit()
initializes head_buf->iova_state only when
`dma_use_iova(&head_buf->iova_state) == true`, i.e. in the fallback
scenario the previous iova_state is left in place.
Additionally, AFAICS dma_iova_destroy() does not clear `head_buf->iova_state`.
It looks like if 2 consecutive skbs hitting the same slot use
different dma mapping strategies (fallback vs iova), bad things will
happen?!? Shouldn't the previous patch always initialize
head_buf->iova_state?
/P
* [net-next v5 10/12] net: bnxt: Dispatch to SW USO
2026-03-23 18:38 [net-next v5 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
` (8 preceding siblings ...)
2026-03-23 18:38 ` [net-next v5 09/12] net: bnxt: Add SW GSO completion and teardown support Joe Damato
@ 2026-03-23 18:38 ` Joe Damato
2026-03-23 18:38 ` [net-next v5 11/12] net: netdevsim: Add support for " Joe Damato
2026-03-23 18:38 ` [net-next v5 12/12] selftests: drv-net: Add USO test Joe Damato
11 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2026-03-23 18:38 UTC (permalink / raw)
To: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: horms, linux-kernel, leon, Joe Damato
Wire in the SW USO path added in preceding commits when hardware USO is
not possible.
When a GSO skb with SKB_GSO_UDP_L4 arrives and the NIC lacks HW USO
capability, redirect to bnxt_sw_udp_gso_xmit() which handles software
segmentation into individual UDP frames submitted directly to the TX
ring.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
---
v5:
- Added Pavan's Reviewed-by. No functional changes.
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 40a16f96feba..737b64f8b80d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -508,6 +508,11 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
}
}
#endif
+ if (skb_is_gso(skb) &&
+ (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) &&
+ !(bp->flags & BNXT_FLAG_UDP_GSO_CAP))
+ return bnxt_sw_udp_gso_xmit(bp, txr, txq, skb);
+
free_size = bnxt_tx_avail(bp, txr);
if (unlikely(free_size < skb_shinfo(skb)->nr_frags + 2)) {
/* We must have raced with NAPI cleanup */
--
2.52.0
* [net-next v5 11/12] net: netdevsim: Add support for SW USO
2026-03-23 18:38 [net-next v5 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
` (9 preceding siblings ...)
2026-03-23 18:38 ` [net-next v5 10/12] net: bnxt: Dispatch to SW USO Joe Damato
@ 2026-03-23 18:38 ` Joe Damato
2026-03-23 18:38 ` [net-next v5 12/12] selftests: drv-net: Add USO test Joe Damato
11 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2026-03-23 18:38 UTC (permalink / raw)
To: netdev, Jakub Kicinski, Andrew Lunn, David S. Miller,
Eric Dumazet, Paolo Abeni
Cc: horms, michael.chan, pavan.chebbi, linux-kernel, leon, Joe Damato
Add support for UDP Segmentation Offloading in software (SW USO). This
is helpful for testing when real hardware is not available. A test which
uses this codepath will be added in a following commit.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
---
v5:
- Added Pavan's Reviewed-by. No functional changes.
v4:
- Added parentheses around the gso_type check for clarity. No functional
change.
rfcv2:
- new in rfcv2
drivers/net/netdevsim/netdev.c | 100 ++++++++++++++++++++++++++++++++-
1 file changed, 99 insertions(+), 1 deletion(-)
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index c71b8d116f18..f228bcf3d190 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -30,6 +30,7 @@
#include <net/rtnetlink.h>
#include <net/udp_tunnel.h>
#include <net/busy_poll.h>
+#include <net/tso.h>
#include "netdevsim.h"
@@ -120,6 +121,98 @@ static int nsim_forward_skb(struct net_device *tx_dev,
return nsim_napi_rx(tx_dev, rx_dev, rq, skb);
}
+static netdev_tx_t nsim_uso_segment_xmit(struct net_device *dev,
+ struct sk_buff *skb)
+{
+ unsigned int hdr_len, mss, total_payload, num_segs;
+ struct netdevsim *ns = netdev_priv(dev);
+ struct net_device *peer_dev;
+ unsigned int total_len = 0;
+ struct netdevsim *peer_ns;
+ struct nsim_rq *rq;
+ struct tso_t tso;
+ int i, rxq;
+
+ hdr_len = tso_start(skb, &tso);
+ mss = skb_shinfo(skb)->gso_size;
+ total_payload = skb->len - hdr_len;
+ num_segs = DIV_ROUND_UP(total_payload, mss);
+
+ udp_hdr(skb)->check = 0;
+ if (!tso.ipv6)
+ ip_hdr(skb)->check = 0;
+
+ rcu_read_lock();
+ peer_ns = rcu_dereference(ns->peer);
+ if (!peer_ns)
+ goto out_drop_free;
+
+ peer_dev = peer_ns->netdev;
+ rxq = skb_get_queue_mapping(skb);
+ if (rxq >= peer_dev->num_rx_queues)
+ rxq = rxq % peer_dev->num_rx_queues;
+ rq = peer_ns->rq[rxq];
+
+ for (i = 0; i < num_segs; i++) {
+ unsigned int seg_payload = min_t(unsigned int, mss,
+ total_payload);
+ bool last = (i == num_segs - 1);
+ unsigned int seg_remaining;
+ struct sk_buff *seg;
+
+ seg = alloc_skb(hdr_len + seg_payload, GFP_ATOMIC);
+ if (!seg)
+ break;
+
+ seg->dev = dev;
+
+ tso_build_hdr(skb, skb_put(seg, hdr_len), &tso,
+ seg_payload, last);
+
+ if (!tso.ipv6) {
+ unsigned int nh_off = skb_network_offset(skb);
+ struct iphdr *iph;
+
+ iph = (struct iphdr *)(seg->data + nh_off);
+ iph->check = ip_fast_csum(iph, iph->ihl);
+ }
+
+ seg_remaining = seg_payload;
+ while (seg_remaining > 0) {
+ unsigned int chunk = min_t(unsigned int, tso.size,
+ seg_remaining);
+
+ memcpy(skb_put(seg, chunk), tso.data, chunk);
+ tso_build_data(skb, &tso, chunk);
+ seg_remaining -= chunk;
+ }
+
+ total_payload -= seg_payload;
+
+ seg->ip_summed = CHECKSUM_UNNECESSARY;
+
+ if (nsim_forward_skb(dev, peer_dev, seg, rq, NULL) == NET_RX_DROP)
+ continue;
+
+ total_len += hdr_len + seg_payload;
+ }
+
+ if (!hrtimer_active(&rq->napi_timer))
+ hrtimer_start(&rq->napi_timer, us_to_ktime(5),
+ HRTIMER_MODE_REL);
+
+ rcu_read_unlock();
+ dev_kfree_skb(skb);
+ dev_dstats_tx_add(dev, total_len);
+ return NETDEV_TX_OK;
+
+out_drop_free:
+ dev_kfree_skb(skb);
+ rcu_read_unlock();
+ dev_dstats_tx_dropped(dev);
+ return NETDEV_TX_OK;
+}
+
static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct netdevsim *ns = netdev_priv(dev);
@@ -132,6 +225,10 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
int rxq;
int dr;
+ if (skb_is_gso(skb) &&
+ skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4)
+ return nsim_uso_segment_xmit(dev, skb);
+
rcu_read_lock();
if (!nsim_ipsec_tx(ns, skb))
goto out_drop_any;
@@ -938,7 +1035,8 @@ static void nsim_setup(struct net_device *dev)
NETIF_F_HW_CSUM |
NETIF_F_LRO |
NETIF_F_TSO |
- NETIF_F_LOOPBACK;
+ NETIF_F_LOOPBACK |
+ NETIF_F_GSO_UDP_L4;
dev->pcpu_stat_type = NETDEV_PCPU_STAT_DSTATS;
dev->max_mtu = ETH_MAX_MTU;
dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_HW_OFFLOAD;
--
2.52.0
* [net-next v5 12/12] selftests: drv-net: Add USO test
2026-03-23 18:38 [net-next v5 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
` (10 preceding siblings ...)
2026-03-23 18:38 ` [net-next v5 11/12] net: netdevsim: Add support for " Joe Damato
@ 2026-03-23 18:38 ` Joe Damato
11 siblings, 0 replies; 14+ messages in thread
From: Joe Damato @ 2026-03-23 18:38 UTC (permalink / raw)
To: netdev, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Shuah Khan
Cc: horms, michael.chan, pavan.chebbi, linux-kernel, leon, Joe Damato,
linux-kselftest
Add a simple test for USO. It can be used with netdevsim or with real
hardware. Tests both IPv4 and IPv6 with several full segments and a
partial segment.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Joe Damato <joe@dama.to>
---
v5:
- Added Pavan's Reviewed-by. No functional changes.
v4:
- Fix python linter issues (unused imports, docstring, etc).
rfcv2:
- new in rfcv2
tools/testing/selftests/drivers/net/Makefile | 1 +
tools/testing/selftests/drivers/net/uso.py | 96 ++++++++++++++++++++
2 files changed, 97 insertions(+)
create mode 100755 tools/testing/selftests/drivers/net/uso.py
diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile
index 7c7fa75b80c2..335c2ce4b9ab 100644
--- a/tools/testing/selftests/drivers/net/Makefile
+++ b/tools/testing/selftests/drivers/net/Makefile
@@ -21,6 +21,7 @@ TEST_PROGS := \
ring_reconfig.py \
shaper.py \
stats.py \
+ uso.py \
xdp.py \
# end of TEST_PROGS
diff --git a/tools/testing/selftests/drivers/net/uso.py b/tools/testing/selftests/drivers/net/uso.py
new file mode 100755
index 000000000000..2ddeae99b4d6
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/uso.py
@@ -0,0 +1,96 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""Test USO
+
+Sends large UDP datagrams with UDP_SEGMENT and verifies that the peer
+receives the correct number of individual segments with correct sizes.
+"""
+import socket
+import time
+
+from lib.py import ksft_run, ksft_exit, KsftSkipEx
+from lib.py import ksft_ge
+from lib.py import NetDrvEpEnv
+from lib.py import defer, ethtool, ip, rand_port
+
+# python doesn't expose this constant, so we need to hardcode it to enable UDP
+# segmentation for large payloads
+UDP_SEGMENT = 103
+
+
+def _send_uso(cfg, ipver, mss, total_payload, port):
+ if ipver == "4":
+ sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+ dst = (cfg.remote_addr_v["4"], port)
+ else:
+ sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
+ dst = (cfg.remote_addr_v["6"], port)
+
+ sock.setsockopt(socket.IPPROTO_UDP, UDP_SEGMENT, mss)
+ payload = bytes(range(256)) * ((total_payload // 256) + 1)
+ payload = payload[:total_payload]
+ sock.sendto(payload, dst)
+ sock.close()
+ return payload
+
+
+def _get_rx_packets(cfg):
+ stats = ip(f"-s link show dev {cfg.remote_ifname}",
+ json=True, host=cfg.remote)[0]
+ return stats['stats64']['rx']['packets']
+
+
+def _test_uso(cfg, ipver, mss, total_payload):
+ cfg.require_ipver(ipver)
+
+ try:
+ ethtool(f"-K {cfg.ifname} tx-udp-segmentation on")
+ except Exception as exc:
+ raise KsftSkipEx(
+ "Device does not support tx-udp-segmentation") from exc
+ defer(ethtool, f"-K {cfg.ifname} tx-udp-segmentation off")
+
+ expected_segs = (total_payload + mss - 1) // mss
+
+ rx_before = _get_rx_packets(cfg)
+
+ port = rand_port(stype=socket.SOCK_DGRAM)
+ _send_uso(cfg, ipver, mss, total_payload, port)
+
+ time.sleep(0.5)
+
+ rx_after = _get_rx_packets(cfg)
+ rx_delta = rx_after - rx_before
+
+ ksft_ge(rx_delta, expected_segs,
+ comment=f"Expected >= {expected_segs} rx packets, got {rx_delta}")
+
+
+def test_uso_v4(cfg):
+ """USO IPv4: 11 segments (10 full + 1 partial)."""
+ _test_uso(cfg, "4", 1400, 1400 * 10 + 500)
+
+
+def test_uso_v6(cfg):
+ """USO IPv6: 11 segments (10 full + 1 partial)."""
+ _test_uso(cfg, "6", 1400, 1400 * 10 + 500)
+
+
+def test_uso_v4_exact(cfg):
+ """USO IPv4: exact multiple of MSS (5 full segments)."""
+ _test_uso(cfg, "4", 1400, 1400 * 5)
+
+
+def main() -> None:
+ """Run USO tests."""
+ with NetDrvEpEnv(__file__) as cfg:
+ ksft_run([test_uso_v4,
+ test_uso_v6,
+ test_uso_v4_exact],
+ args=(cfg, ))
+ ksft_exit()
+
+
+if __name__ == "__main__":
+ main()
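Outside the kselftest harness, the core arithmetic the test relies on is small: a `sendto()` of `total_payload` bytes on a socket with `UDP_SEGMENT` set to `mss` should arrive as ceil(total_payload / mss) datagrams, which `_test_uso()` computes with integer ceiling division. A minimal sketch of just that math (the helper name `expected_segments` is hypothetical; the UDP_SEGMENT value 103 is the one hardcoded in the test above):

```python
# The Python socket module does not expose this constant; the value
# comes from the kernel UAPI headers, as noted in the test above.
UDP_SEGMENT = 103

def expected_segments(total_payload, mss):
    """Ceiling division: full mss-sized segments plus one partial."""
    return (total_payload + mss - 1) // mss
```

So the 1400 * 10 + 500 byte send in test_uso_v4 expects 11 segments, and the exact-multiple 1400 * 5 send in test_uso_v4_exact expects 5; the test asserts `rx_delta >= expected_segs` rather than strict equality, since unrelated traffic may also land on the peer interface during the run.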
--
2.52.0