* [RFC net-next v2 01/12] net: tso: Introduce tso_dma_map
2026-03-12 22:34 [RFC net-next v2 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
@ 2026-03-12 22:34 ` Joe Damato
2026-03-12 22:34 ` [RFC net-next v2 02/12] net: tso: Add tso_dma_map helpers Joe Damato
` (11 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Joe Damato @ 2026-03-12 22:34 UTC (permalink / raw)
To: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman
Cc: michael.chan, pavan.chebbi, linux-kernel, Joe Damato
Add struct tso_dma_map to tso.h for tracking DMA addresses of mapped
GSO payload data.
The struct combines DMA mapping storage (linear_dma, frags[]) with
iterator state (frag_idx, offset), allowing drivers to walk pre-mapped
DMA regions linearly. Helpers to initialize and operate on this struct
will be added in the next commit.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
---
include/net/tso.h | 34 ++++++++++++++++++++++++++++++++++
1 file changed, 34 insertions(+)
diff --git a/include/net/tso.h b/include/net/tso.h
index e7e157ae0526..cd4b98dbea71 100644
--- a/include/net/tso.h
+++ b/include/net/tso.h
@@ -3,6 +3,7 @@
#define _TSO_H
#include <linux/skbuff.h>
+#include <linux/dma-mapping.h>
#include <net/ip.h>
#define TSO_HEADER_SIZE 256
@@ -28,4 +29,37 @@ void tso_build_hdr(const struct sk_buff *skb, char *hdr, struct tso_t *tso,
void tso_build_data(const struct sk_buff *skb, struct tso_t *tso, int size);
int tso_start(struct sk_buff *skb, struct tso_t *tso);
+/**
+ * struct tso_dma_map - DMA mapping state for GSO payload
+ * @dev: device used for DMA mapping
+ * @skb: the GSO skb being mapped
+ * @hdr_len: per-segment header length
+ * @frag_idx: current region (-1 = linear, 0..nr_frags-1 = frag)
+ * @offset: byte offset within current region
+ * @linear_dma: DMA address of the linear payload (after headers)
+ * @linear_len: length of the linear payload
+ * @nr_frags: number of frags successfully DMA-mapped
+ * @frags: per-frag DMA address and length
+ *
+ * Holds DMA mappings for the payload regions of a GSO skb
+ * (linear data + frags), mapped upfront, plus iterator state used
+ * to yield (dma_addr, chunk_len) pairs bounded by region boundaries.
+ */
+struct tso_dma_map {
+ struct device *dev;
+ const struct sk_buff *skb;
+ unsigned int hdr_len;
+ /* Iterator state */
+ int frag_idx;
+ unsigned int offset;
+ /* Pre-mapped regions */
+ dma_addr_t linear_dma;
+ unsigned int linear_len;
+ unsigned int nr_frags;
+ struct {
+ dma_addr_t dma;
+ unsigned int len;
+ } frags[MAX_SKB_FRAGS];
+};
+
#endif /* _TSO_H */
--
2.52.0
* [RFC net-next v2 02/12] net: tso: Add tso_dma_map helpers
2026-03-12 22:34 [RFC net-next v2 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
2026-03-12 22:34 ` [RFC net-next v2 01/12] net: tso: Introduce tso_dma_map Joe Damato
@ 2026-03-12 22:34 ` Joe Damato
2026-03-12 22:34 ` [RFC net-next v2 03/12] net: bnxt: Export bnxt_xmit_get_cfa_action Joe Damato
` (10 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Joe Damato @ 2026-03-12 22:34 UTC (permalink / raw)
To: netdev, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Simon Horman
Cc: michael.chan, pavan.chebbi, linux-kernel, Joe Damato
Add helpers to initialize, iterate, and clean up a tso_dma_map:
tso_dma_map_init(): DMA-maps the linear payload region and all frags
upfront into the tso_dma_map struct. Returns 0 on success, cleans up
partial mappings on failure.
tso_dma_map_cleanup(): unmaps all DMA regions. Used on error paths.
tso_dma_map_count(): counts how many descriptors the next N bytes of
payload will need, without advancing the iterator.
tso_dma_map_next(): yields the next (dma_addr, chunk_len) pair.
Indicates when a chunk starts a new DMA mapping so the driver can set
dma_unmap_len on that BD for completion-time unmapping.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
---
include/net/tso.h | 8 +++
net/core/tso.c | 165 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 173 insertions(+)
diff --git a/include/net/tso.h b/include/net/tso.h
index cd4b98dbea71..a1fa605f26b4 100644
--- a/include/net/tso.h
+++ b/include/net/tso.h
@@ -62,4 +62,12 @@ struct tso_dma_map {
} frags[MAX_SKB_FRAGS];
};
+int tso_dma_map_init(struct tso_dma_map *map, struct device *dev,
+ const struct sk_buff *skb, unsigned int hdr_len);
+void tso_dma_map_cleanup(struct tso_dma_map *map);
+unsigned int tso_dma_map_count(const struct tso_dma_map *map, unsigned int len);
+bool tso_dma_map_next(struct tso_dma_map *map, dma_addr_t *addr,
+ unsigned int *chunk_len, unsigned int *mapping_len,
+ unsigned int seg_remaining);
+
#endif /* _TSO_H */
diff --git a/net/core/tso.c b/net/core/tso.c
index 6df997b9076e..fdbef4ca840d 100644
--- a/net/core/tso.c
+++ b/net/core/tso.c
@@ -3,6 +3,7 @@
#include <linux/if_vlan.h>
#include <net/ip.h>
#include <net/tso.h>
+#include <linux/dma-mapping.h>
#include <linux/unaligned.h>
void tso_build_hdr(const struct sk_buff *skb, char *hdr, struct tso_t *tso,
@@ -87,3 +88,167 @@ int tso_start(struct sk_buff *skb, struct tso_t *tso)
return hdr_len;
}
EXPORT_SYMBOL(tso_start);
+
+/**
+ * tso_dma_map_init - DMA-map GSO payload regions
+ * @map: map struct to initialize
+ * @dev: device for DMA mapping
+ * @skb: the GSO skb
+ * @hdr_len: per-segment header length in bytes
+ *
+ * DMA-maps the linear payload (after headers) and all frags.
+ * Positions the iterator at byte 0 of the payload.
+ *
+ * Returns 0 on success, -ENOMEM on DMA mapping failure (partial mappings
+ * are cleaned up internally).
+ */
+int tso_dma_map_init(struct tso_dma_map *map, struct device *dev,
+ const struct sk_buff *skb, unsigned int hdr_len)
+{
+ unsigned int linear_len = skb_headlen(skb) - hdr_len;
+ unsigned int nr_frags = skb_shinfo(skb)->nr_frags;
+ int i;
+
+ map->dev = dev;
+ map->skb = skb;
+ map->hdr_len = hdr_len;
+ map->frag_idx = -1;
+ map->offset = 0;
+ map->linear_len = 0;
+ map->nr_frags = 0;
+
+ if (linear_len > 0) {
+ map->linear_dma = dma_map_single(dev, skb->data + hdr_len,
+ linear_len, DMA_TO_DEVICE);
+ if (dma_mapping_error(dev, map->linear_dma))
+ return -ENOMEM;
+ map->linear_len = linear_len;
+ }
+
+ for (i = 0; i < nr_frags; i++) {
+ skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+ map->frags[i].len = skb_frag_size(frag);
+ map->frags[i].dma = skb_frag_dma_map(dev, frag, 0,
+ map->frags[i].len,
+ DMA_TO_DEVICE);
+ if (dma_mapping_error(dev, map->frags[i].dma)) {
+ tso_dma_map_cleanup(map);
+ return -ENOMEM;
+ }
+ map->nr_frags = i + 1;
+ }
+
+ if (linear_len == 0 && nr_frags > 0)
+ map->frag_idx = 0;
+
+ return 0;
+}
+EXPORT_SYMBOL(tso_dma_map_init);
+
+/**
+ * tso_dma_map_cleanup - unmap all DMA regions in a tso_dma_map
+ * @map: the map to clean up
+ *
+ * Unmaps linear payload and all mapped frags. Used on error paths.
+ * Success paths use the driver's completion path to handle unmapping.
+ */
+void tso_dma_map_cleanup(struct tso_dma_map *map)
+{
+ int i;
+
+ if (map->linear_len)
+ dma_unmap_single(map->dev, map->linear_dma, map->linear_len,
+ DMA_TO_DEVICE);
+
+ for (i = 0; i < map->nr_frags; i++)
+ dma_unmap_page(map->dev, map->frags[i].dma, map->frags[i].len,
+ DMA_TO_DEVICE);
+
+ map->linear_len = 0;
+ map->nr_frags = 0;
+}
+EXPORT_SYMBOL(tso_dma_map_cleanup);
+
+/**
+ * tso_dma_map_count - count descriptors for a payload range
+ * @map: the payload map
+ * @len: number of payload bytes in this segment
+ *
+ * Counts how many contiguous DMA region chunks the next @len bytes
+ * will span, without advancing the iterator. Uses region sizes from
+ * the current position.
+ *
+ * Returns the number of descriptors needed for @len bytes of payload.
+ */
+unsigned int tso_dma_map_count(const struct tso_dma_map *map, unsigned int len)
+{
+ unsigned int offset = map->offset;
+ int idx = map->frag_idx;
+ unsigned int count = 0;
+
+ while (len > 0) {
+ unsigned int region_len, chunk;
+
+ if (idx == -1)
+ region_len = map->linear_len;
+ else
+ region_len = map->frags[idx].len;
+
+ chunk = min(len, region_len - offset);
+ len -= chunk;
+ count++;
+ offset = 0;
+ idx++;
+ }
+
+ return count;
+}
+EXPORT_SYMBOL(tso_dma_map_count);
+
+/**
+ * tso_dma_map_next - yield the next DMA address range
+ * @map: the payload map
+ * @addr: output DMA address
+ * @chunk_len: output chunk length
+ * @mapping_len: full DMA mapping length when this chunk starts a new
+ * mapping region, or 0 when continuing a previous one.
+ * Driver can assign this to the last descriptor.
+ * @seg_remaining: bytes left in current segment
+ *
+ * Yields the next (dma_addr, chunk_len) pair and advances the iterator.
+ *
+ * Returns true if a chunk was yielded, false when @seg_remaining is 0.
+ */
+bool tso_dma_map_next(struct tso_dma_map *map, dma_addr_t *addr,
+ unsigned int *chunk_len, unsigned int *mapping_len,
+ unsigned int seg_remaining)
+{
+ unsigned int region_len, chunk;
+
+ if (!seg_remaining)
+ return false;
+
+ if (map->frag_idx == -1) {
+ region_len = map->linear_len;
+ chunk = min(seg_remaining, region_len - map->offset);
+ *addr = map->linear_dma + map->offset;
+ *mapping_len = (map->offset == 0) ? region_len : 0;
+ } else {
+ region_len = map->frags[map->frag_idx].len;
+ chunk = min(seg_remaining, region_len - map->offset);
+ *addr = map->frags[map->frag_idx].dma + map->offset;
+ *mapping_len = (map->offset == 0) ? region_len : 0;
+ }
+
+ *chunk_len = chunk;
+ map->offset += chunk;
+
+ if (map->offset >= region_len) {
+ map->frag_idx++;
+ map->offset = 0;
+ }
+
+ return true;
+}
+EXPORT_SYMBOL(tso_dma_map_next);
--
2.52.0
* [RFC net-next v2 03/12] net: bnxt: Export bnxt_xmit_get_cfa_action
2026-03-12 22:34 [RFC net-next v2 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
2026-03-12 22:34 ` [RFC net-next v2 01/12] net: tso: Introduce tso_dma_map Joe Damato
2026-03-12 22:34 ` [RFC net-next v2 02/12] net: tso: Add tso_dma_map helpers Joe Damato
@ 2026-03-12 22:34 ` Joe Damato
2026-03-12 22:34 ` [RFC net-next v2 04/12] net: bnxt: Add a helper for tx_bd_ext Joe Damato
` (9 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Joe Damato @ 2026-03-12 22:34 UTC (permalink / raw)
To: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: linux-kernel, Joe Damato
Export bnxt_xmit_get_cfa_action so that it can be used in future commits
which add software USO support to bnxt.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index c982aac714d1..c9206977fd54 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -447,7 +447,7 @@ const u16 bnxt_lhint_arr[] = {
TX_BD_FLAGS_LHINT_2048_AND_LARGER,
};
-static u16 bnxt_xmit_get_cfa_action(struct sk_buff *skb)
+u16 bnxt_xmit_get_cfa_action(struct sk_buff *skb)
{
struct metadata_dst *md_dst = skb_metadata_dst(skb);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 90fa3e93c8d6..8147f31967b5 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -2950,6 +2950,7 @@ unsigned int bnxt_get_avail_cp_rings_for_en(struct bnxt *bp);
int bnxt_reserve_rings(struct bnxt *bp, bool irq_re_init);
void bnxt_tx_disable(struct bnxt *bp);
void bnxt_tx_enable(struct bnxt *bp);
+u16 bnxt_xmit_get_cfa_action(struct sk_buff *skb);
void bnxt_sched_reset_txr(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
u16 curr);
void bnxt_report_link(struct bnxt *bp);
--
2.52.0
* [RFC net-next v2 04/12] net: bnxt: Add a helper for tx_bd_ext
2026-03-12 22:34 [RFC net-next v2 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
` (2 preceding siblings ...)
2026-03-12 22:34 ` [RFC net-next v2 03/12] net: bnxt: Export bnxt_xmit_get_cfa_action Joe Damato
@ 2026-03-12 22:34 ` Joe Damato
2026-03-12 22:34 ` [RFC net-next v2 05/12] net: bnxt: Use dma_unmap_len for TX completion unmapping Joe Damato
` (8 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Joe Damato @ 2026-03-12 22:34 UTC (permalink / raw)
To: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: linux-kernel, Joe Damato
Factor out the code that sets up tx_bd_ext descriptors into a helper
function. This helper will be used by the SW USO implementation in the
following commits.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 9 ++-------
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 18 ++++++++++++++++++
2 files changed, 20 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index c9206977fd54..d12e4fcd5063 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -663,10 +663,9 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
txbd->tx_bd_opaque = SET_TX_OPAQUE(bp, txr, prod, 2 + last_frag);
prod = NEXT_TX(prod);
- txbd1 = (struct tx_bd_ext *)
- &txr->tx_desc_ring[TX_RING(bp, prod)][TX_IDX(prod)];
+ txbd1 = bnxt_init_ext_bd(bp, txr, prod, lflags, vlan_tag_flags,
+ cfa_action);
- txbd1->tx_bd_hsize_lflags = lflags;
if (skb_is_gso(skb)) {
bool udp_gso = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4);
u32 hdr_len;
@@ -693,7 +692,6 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
} else if (skb->ip_summed == CHECKSUM_PARTIAL) {
txbd1->tx_bd_hsize_lflags |=
cpu_to_le32(TX_BD_FLAGS_TCP_UDP_CHKSUM);
- txbd1->tx_bd_mss = 0;
}
length >>= 9;
@@ -706,9 +704,6 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
flags |= bnxt_lhint_arr[length];
txbd->tx_bd_len_flags_type = cpu_to_le32(flags);
- txbd1->tx_bd_cfa_meta = cpu_to_le32(vlan_tag_flags);
- txbd1->tx_bd_cfa_action =
- cpu_to_le32(cfa_action << TX_BD_CFA_ACTION_SHIFT);
txbd0 = txbd;
for (i = 0; i < last_frag; i++) {
frag = &skb_shinfo(skb)->frags[i];
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 8147f31967b5..a822bbb71146 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -2834,6 +2834,24 @@ static inline u32 bnxt_tx_avail(struct bnxt *bp,
return bp->tx_ring_size - (used & bp->tx_ring_mask);
}
+static inline struct tx_bd_ext *
+bnxt_init_ext_bd(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
+ u16 prod, __le32 lflags, u32 vlan_tag_flags,
+ u32 cfa_action)
+{
+ struct tx_bd_ext *txbd1;
+
+ txbd1 = (struct tx_bd_ext *)
+ &txr->tx_desc_ring[TX_RING(bp, prod)][TX_IDX(prod)];
+ txbd1->tx_bd_hsize_lflags = lflags;
+ txbd1->tx_bd_mss = 0;
+ txbd1->tx_bd_cfa_meta = cpu_to_le32(vlan_tag_flags);
+ txbd1->tx_bd_cfa_action =
+ cpu_to_le32(cfa_action << TX_BD_CFA_ACTION_SHIFT);
+
+ return txbd1;
+}
+
static inline void bnxt_writeq(struct bnxt *bp, u64 val,
volatile void __iomem *addr)
{
--
2.52.0
* [RFC net-next v2 05/12] net: bnxt: Use dma_unmap_len for TX completion unmapping
2026-03-12 22:34 [RFC net-next v2 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
` (3 preceding siblings ...)
2026-03-12 22:34 ` [RFC net-next v2 04/12] net: bnxt: Add a helper for tx_bd_ext Joe Damato
@ 2026-03-12 22:34 ` Joe Damato
2026-03-12 22:34 ` [RFC net-next v2 06/12] net: bnxt: Add TX inline buffer infrastructure Joe Damato
` (7 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Joe Damato @ 2026-03-12 22:34 UTC (permalink / raw)
To: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: linux-kernel, Joe Damato
Store the DMA mapping length in each TX buffer descriptor via
dma_unmap_len_set at submit time, and use dma_unmap_len at completion
time.
This is a no-op for normal packets but prepares for software USO,
where header BDs set dma_unmap_len to 0 because the header buffer
is unmapped collectively rather than per-segment.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
---
rfcv2:
- Use some local variables to shorten long lines. No functional change from
rfcv1.
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 63 ++++++++++++++---------
1 file changed, 40 insertions(+), 23 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index d12e4fcd5063..ea8081aeb5ae 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -656,6 +656,7 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
goto tx_free;
dma_unmap_addr_set(tx_buf, mapping, mapping);
+ dma_unmap_len_set(tx_buf, len, len);
flags = (len << TX_BD_LEN_SHIFT) | TX_BD_TYPE_LONG_TX_BD |
TX_BD_CNT(last_frag + 2);
@@ -720,6 +721,7 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
tx_buf = &txr->tx_buf_ring[RING_TX(bp, prod)];
netmem_dma_unmap_addr_set(skb_frag_netmem(frag), tx_buf,
mapping, mapping);
+ dma_unmap_len_set(tx_buf, len, len);
txbd->tx_bd_haddr = cpu_to_le64(mapping);
@@ -809,7 +811,8 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
u16 hw_cons = txr->tx_hw_cons;
unsigned int tx_bytes = 0;
u16 cons = txr->tx_cons;
- skb_frag_t *frag;
+ unsigned int dma_len;
+ dma_addr_t dma_addr;
int tx_pkts = 0;
bool rc = false;
@@ -844,19 +847,27 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
goto next_tx_int;
}
- dma_unmap_single(&pdev->dev, dma_unmap_addr(tx_buf, mapping),
- skb_headlen(skb), DMA_TO_DEVICE);
+ if (dma_unmap_len(tx_buf, len)) {
+ dma_addr = dma_unmap_addr(tx_buf, mapping);
+ dma_len = dma_unmap_len(tx_buf, len);
+
+ dma_unmap_single(&pdev->dev, dma_addr, dma_len,
+ DMA_TO_DEVICE);
+ }
+
last = tx_buf->nr_frags;
for (j = 0; j < last; j++) {
- frag = &skb_shinfo(skb)->frags[j];
cons = NEXT_TX(cons);
tx_buf = &txr->tx_buf_ring[RING_TX(bp, cons)];
- netmem_dma_unmap_page_attrs(&pdev->dev,
- dma_unmap_addr(tx_buf,
- mapping),
- skb_frag_size(frag),
- DMA_TO_DEVICE, 0);
+ if (dma_unmap_len(tx_buf, len)) {
+ dma_addr = dma_unmap_addr(tx_buf, mapping);
+ dma_len = dma_unmap_len(tx_buf, len);
+
+ netmem_dma_unmap_page_attrs(&pdev->dev,
+ dma_addr, dma_len,
+ DMA_TO_DEVICE, 0);
+ }
}
if (unlikely(is_ts_pkt)) {
if (BNXT_CHIP_P5(bp)) {
@@ -3400,6 +3411,8 @@ static void bnxt_free_one_tx_ring_skbs(struct bnxt *bp,
{
int i, max_idx;
struct pci_dev *pdev = bp->pdev;
+ unsigned int dma_len;
+ dma_addr_t dma_addr;
max_idx = bp->tx_nr_pages * TX_DESC_CNT;
@@ -3410,10 +3423,10 @@ static void bnxt_free_one_tx_ring_skbs(struct bnxt *bp,
if (idx < bp->tx_nr_rings_xdp &&
tx_buf->action == XDP_REDIRECT) {
- dma_unmap_single(&pdev->dev,
- dma_unmap_addr(tx_buf, mapping),
- dma_unmap_len(tx_buf, len),
- DMA_TO_DEVICE);
+ dma_addr = dma_unmap_addr(tx_buf, mapping);
+ dma_len = dma_unmap_len(tx_buf, len);
+
+ dma_unmap_single(&pdev->dev, dma_addr, dma_len, DMA_TO_DEVICE);
xdp_return_frame(tx_buf->xdpf);
tx_buf->action = 0;
tx_buf->xdpf = NULL;
@@ -3435,23 +3448,27 @@ static void bnxt_free_one_tx_ring_skbs(struct bnxt *bp,
continue;
}
- dma_unmap_single(&pdev->dev,
- dma_unmap_addr(tx_buf, mapping),
- skb_headlen(skb),
- DMA_TO_DEVICE);
+ if (dma_unmap_len(tx_buf, len)) {
+ dma_addr = dma_unmap_addr(tx_buf, mapping);
+ dma_len = dma_unmap_len(tx_buf, len);
+
+ dma_unmap_single(&pdev->dev, dma_addr, dma_len, DMA_TO_DEVICE);
+ }
last = tx_buf->nr_frags;
i += 2;
for (j = 0; j < last; j++, i++) {
int ring_idx = i & bp->tx_ring_mask;
- skb_frag_t *frag = &skb_shinfo(skb)->frags[j];
tx_buf = &txr->tx_buf_ring[ring_idx];
- netmem_dma_unmap_page_attrs(&pdev->dev,
- dma_unmap_addr(tx_buf,
- mapping),
- skb_frag_size(frag),
- DMA_TO_DEVICE, 0);
+ if (dma_unmap_len(tx_buf, len)) {
+ dma_addr = dma_unmap_addr(tx_buf, mapping);
+ dma_len = dma_unmap_len(tx_buf, len);
+
+ netmem_dma_unmap_page_attrs(&pdev->dev,
+ dma_addr, dma_len,
+ DMA_TO_DEVICE, 0);
+ }
}
dev_kfree_skb(skb);
}
--
2.52.0
* [RFC net-next v2 06/12] net: bnxt: Add TX inline buffer infrastructure
2026-03-12 22:34 [RFC net-next v2 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
` (4 preceding siblings ...)
2026-03-12 22:34 ` [RFC net-next v2 05/12] net: bnxt: Use dma_unmap_len for TX completion unmapping Joe Damato
@ 2026-03-12 22:34 ` Joe Damato
2026-03-12 22:34 ` [RFC net-next v2 07/12] net: bnxt: Add boilerplate GSO code Joe Damato
` (6 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Joe Damato @ 2026-03-12 22:34 UTC (permalink / raw)
To: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: linux-kernel, Joe Damato
Add per-ring pre-allocated inline buffer fields (tx_inline_buf,
tx_inline_dma, tx_inline_size) to bnxt_tx_ring_info and helpers to
allocate and free them. A producer and consumer (tx_inline_prod,
tx_inline_cons) are added to track which slot(s) of the inline buffer
are in-use.
The inline buffer will be used by the SW USO path for pre-allocated,
pre-DMA-mapped per-segment header copies. In the future, this
could be extended to support TX copybreak.
The allocation helper is marked __maybe_unused in this commit because it
will be wired up in a later commit.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
---
rfcv2:
- Added a producer and consumer to correctly track the in-use header slots.
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 35 +++++++++++++++++++++++
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 6 ++++
2 files changed, 41 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index ea8081aeb5ae..8929264a54b1 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3983,6 +3983,39 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
return rc;
}
+static void bnxt_free_tx_inline_buf(struct bnxt_tx_ring_info *txr,
+ struct pci_dev *pdev)
+{
+ if (!txr->tx_inline_buf)
+ return;
+
+ dma_unmap_single(&pdev->dev, txr->tx_inline_dma,
+ txr->tx_inline_size, DMA_TO_DEVICE);
+ kfree(txr->tx_inline_buf);
+ txr->tx_inline_buf = NULL;
+ txr->tx_inline_size = 0;
+}
+
+static int __maybe_unused bnxt_alloc_tx_inline_buf(struct bnxt_tx_ring_info *txr,
+ struct pci_dev *pdev,
+ unsigned int size)
+{
+ txr->tx_inline_buf = kmalloc(size, GFP_KERNEL);
+ if (!txr->tx_inline_buf)
+ return -ENOMEM;
+
+ txr->tx_inline_dma = dma_map_single(&pdev->dev, txr->tx_inline_buf,
+ size, DMA_TO_DEVICE);
+ if (dma_mapping_error(&pdev->dev, txr->tx_inline_dma)) {
+ kfree(txr->tx_inline_buf);
+ txr->tx_inline_buf = NULL;
+ return -ENOMEM;
+ }
+ txr->tx_inline_size = size;
+
+ return 0;
+}
+
static void bnxt_free_tx_rings(struct bnxt *bp)
{
int i;
@@ -4001,6 +4034,8 @@ static void bnxt_free_tx_rings(struct bnxt *bp)
txr->tx_push = NULL;
}
+ bnxt_free_tx_inline_buf(txr, pdev);
+
ring = &txr->tx_ring_struct;
bnxt_free_ring(bp, &ring->ring_mem);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index a822bbb71146..d9543d6048d8 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -994,6 +994,12 @@ struct bnxt_tx_ring_info {
dma_addr_t tx_push_mapping;
__le64 data_mapping;
+ void *tx_inline_buf;
+ dma_addr_t tx_inline_dma;
+ unsigned int tx_inline_size;
+ u16 tx_inline_prod;
+ u16 tx_inline_cons;
+
#define BNXT_DEV_STATE_CLOSING 0x1
u32 dev_state;
--
2.52.0
* [RFC net-next v2 07/12] net: bnxt: Add boilerplate GSO code
2026-03-12 22:34 [RFC net-next v2 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
` (5 preceding siblings ...)
2026-03-12 22:34 ` [RFC net-next v2 06/12] net: bnxt: Add TX inline buffer infrastructure Joe Damato
@ 2026-03-12 22:34 ` Joe Damato
2026-03-12 22:34 ` [RFC net-next v2 08/12] net: bnxt: Implement software USO Joe Damato
` (5 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Joe Damato @ 2026-03-12 22:34 UTC (permalink / raw)
To: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni, Richard Cochran,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, Stanislav Fomichev
Cc: linux-kernel, Joe Damato, bpf
Add bnxt_gso.c and bnxt_gso.h with a stub bnxt_sw_udp_gso_xmit()
function, SW USO constants (BNXT_SW_USO_MAX_SEGS,
BNXT_SW_USO_MAX_DESCS), and the is_sw_gso field in bnxt_sw_tx_bd
with BNXT_SW_GSO_MID/LAST markers.
The full SW USO implementation will be added in a future commit.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
---
drivers/net/ethernet/broadcom/bnxt/Makefile | 2 +-
drivers/net/ethernet/broadcom/bnxt/bnxt.h | 4 +++
drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c | 30 ++++++++++++++++++
drivers/net/ethernet/broadcom/bnxt/bnxt_gso.h | 31 +++++++++++++++++++
4 files changed, 66 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c
create mode 100644 drivers/net/ethernet/broadcom/bnxt/bnxt_gso.h
diff --git a/drivers/net/ethernet/broadcom/bnxt/Makefile b/drivers/net/ethernet/broadcom/bnxt/Makefile
index ba6c239d52fa..debef78c8b6d 100644
--- a/drivers/net/ethernet/broadcom/bnxt/Makefile
+++ b/drivers/net/ethernet/broadcom/bnxt/Makefile
@@ -1,7 +1,7 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-$(CONFIG_BNXT) += bnxt_en.o
-bnxt_en-y := bnxt.o bnxt_hwrm.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o bnxt_xdp.o bnxt_ptp.o bnxt_vfr.o bnxt_devlink.o bnxt_dim.o bnxt_coredump.o
+bnxt_en-y := bnxt.o bnxt_hwrm.o bnxt_sriov.o bnxt_ethtool.o bnxt_dcb.o bnxt_ulp.o bnxt_xdp.o bnxt_ptp.o bnxt_vfr.o bnxt_devlink.o bnxt_dim.o bnxt_coredump.o bnxt_gso.o
bnxt_en-$(CONFIG_BNXT_FLOWER_OFFLOAD) += bnxt_tc.o
bnxt_en-$(CONFIG_DEBUG_FS) += bnxt_debugfs.o
bnxt_en-$(CONFIG_BNXT_HWMON) += bnxt_hwmon.o
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index d9543d6048d8..593b78672be8 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -891,6 +891,7 @@ struct bnxt_sw_tx_bd {
u8 is_ts_pkt;
u8 is_push;
u8 action;
+ u8 is_sw_gso;
unsigned short nr_frags;
union {
u16 rx_prod;
@@ -898,6 +899,9 @@ struct bnxt_sw_tx_bd {
};
};
+#define BNXT_SW_GSO_MID 1
+#define BNXT_SW_GSO_LAST 2
+
struct bnxt_sw_rx_bd {
void *data;
u8 *data_ptr;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c
new file mode 100644
index 000000000000..b296769ee4fe
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c
@@ -0,0 +1,30 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Broadcom NetXtreme-C/E network driver.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ */
+
+#include <linux/pci.h>
+#include <linux/netdevice.h>
+#include <linux/skbuff.h>
+#include <net/netdev_queues.h>
+#include <net/ip.h>
+#include <net/ipv6.h>
+#include <net/udp.h>
+#include <net/tso.h>
+#include <linux/bnxt/hsi.h>
+
+#include "bnxt.h"
+#include "bnxt_gso.h"
+
+netdev_tx_t bnxt_sw_udp_gso_xmit(struct bnxt *bp,
+ struct bnxt_tx_ring_info *txr,
+ struct netdev_queue *txq,
+ struct sk_buff *skb)
+{
+ dev_kfree_skb_any(skb);
+ dev_core_stats_tx_dropped_inc(bp->dev);
+ return NETDEV_TX_OK;
+}
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.h
new file mode 100644
index 000000000000..f01e8102dcd7
--- /dev/null
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Broadcom NetXtreme-C/E network driver.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation.
+ */
+
+#ifndef BNXT_GSO_H
+#define BNXT_GSO_H
+
+/* Maximum segments the stack may send in a single SW USO skb.
+ * This caps gso_max_segs for NICs without HW USO support.
+ */
+#define BNXT_SW_USO_MAX_SEGS 64
+
+/* Worst-case TX descriptors consumed by one SW USO packet:
+ * Each segment: 1 long BD + 1 ext BD + payload BDs.
+ * Total payload BDs across all segs <= num_segs + nr_frags (each frag
+ * boundary crossing adds at most 1 extra BD).
+ * So: 3 * max_segs + MAX_SKB_FRAGS + 1 = 3 * 64 + 17 + 1 = 210.
+ */
+#define BNXT_SW_USO_MAX_DESCS (3 * BNXT_SW_USO_MAX_SEGS + MAX_SKB_FRAGS + 1)
+
+netdev_tx_t bnxt_sw_udp_gso_xmit(struct bnxt *bp,
+ struct bnxt_tx_ring_info *txr,
+ struct netdev_queue *txq,
+ struct sk_buff *skb);
+
+#endif
--
2.52.0
* [RFC net-next v2 08/12] net: bnxt: Implement software USO
2026-03-12 22:34 [RFC net-next v2 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
` (6 preceding siblings ...)
2026-03-12 22:34 ` [RFC net-next v2 07/12] net: bnxt: Add boilerplate GSO code Joe Damato
@ 2026-03-12 22:34 ` Joe Damato
2026-03-12 22:34 ` [RFC net-next v2 09/12] net: bnxt: Add SW GSO completion and teardown support Joe Damato
` (4 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Joe Damato @ 2026-03-12 22:34 UTC (permalink / raw)
To: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: linux-kernel, Joe Damato
Implement bnxt_sw_udp_gso_xmit() using the core tso_dma_map API and
the pre-allocated TX inline buffer for per-segment headers.
The xmit path:
1. Calls tso_start() to initialize TSO state
2. Stack-allocates a tso_dma_map and calls tso_dma_map_init() to
DMA-map the linear payload and all frags upfront.
3. For each segment:
- Copies and patches headers via tso_build_hdr() into the
pre-allocated tx_inline_buf (DMA-synced per segment)
- Counts payload BDs via tso_dma_map_count()
- Emits long BD (header) + ext BD + payload BDs
- Payload BDs use tso_dma_map_next() which yields (dma_addr,
chunk_len, mapping_len) tuples.
Header BDs set dma_unmap_len=0 since the inline buffer is pre-allocated
and unmapped only at ring teardown.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
---
rfcv2:
- Set the unmap len on the last descriptor so that, when completions fire,
only the last completion unmaps the region.
drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c | 200 ++++++++++++++++++
1 file changed, 200 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c
index b296769ee4fe..6e186d514a2b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_gso.c
@@ -19,11 +19,211 @@
#include "bnxt.h"
#include "bnxt_gso.h"
+static u32 bnxt_sw_gso_lhint(unsigned int len)
+{
+ if (len <= 512)
+ return TX_BD_FLAGS_LHINT_512_AND_SMALLER;
+ else if (len <= 1023)
+ return TX_BD_FLAGS_LHINT_512_TO_1023;
+ else if (len <= 2047)
+ return TX_BD_FLAGS_LHINT_1024_TO_2047;
+ else
+ return TX_BD_FLAGS_LHINT_2048_AND_LARGER;
+}
+
netdev_tx_t bnxt_sw_udp_gso_xmit(struct bnxt *bp,
struct bnxt_tx_ring_info *txr,
struct netdev_queue *txq,
struct sk_buff *skb)
{
+ struct bnxt_sw_tx_bd *last_unmap_buf = NULL;
+ unsigned int hdr_len, mss, num_segs;
+ unsigned int last_unmap_len = 0;
+ struct pci_dev *pdev = bp->pdev;
+ dma_addr_t last_unmap_addr = 0;
+ unsigned int total_payload;
+ int i, bds_needed, slots;
+ struct tso_dma_map map;
+ u32 vlan_tag_flags = 0;
+ struct tso_t tso;
+ u16 cfa_action;
+ u16 prod;
+
+ hdr_len = tso_start(skb, &tso);
+ mss = skb_shinfo(skb)->gso_size;
+ total_payload = skb->len - hdr_len;
+ num_segs = DIV_ROUND_UP(total_payload, mss);
+
+ /* Zero the csum fields so tso_build_hdr will propagate zeroes into
+ * every segment header. HW csum offload will recompute from scratch.
+ */
+ udp_hdr(skb)->check = 0;
+ if (!tso.ipv6)
+ ip_hdr(skb)->check = 0;
+
+ if (unlikely(num_segs <= 1))
+ goto drop;
+
+ /* Upper bound on the number of descriptors needed.
+ *
+ * Each segment uses 1 long BD + 1 ext BD + payload BDs, which is
+ * at most num_segs + nr_frags (each region boundary crossed adds at
+ * most one extra BD).
+ */
+ bds_needed = 3 * num_segs + skb_shinfo(skb)->nr_frags + 1;
+
+ if (unlikely(bnxt_tx_avail(bp, txr) < bds_needed)) {
+ netif_txq_try_stop(txq, bnxt_tx_avail(bp, txr),
+ bp->tx_wake_thresh);
+ return NETDEV_TX_BUSY;
+ }
+
+ slots = BNXT_SW_USO_MAX_SEGS - (txr->tx_inline_prod - txr->tx_inline_cons);
+
+ if (unlikely(slots < num_segs)) {
+ netif_txq_try_stop(txq, bnxt_tx_avail(bp, txr),
+ bp->tx_wake_thresh);
+ return NETDEV_TX_BUSY;
+ }
+
+ if (unlikely(tso_dma_map_init(&map, &pdev->dev, skb, hdr_len)))
+ goto drop;
+
+ cfa_action = bnxt_xmit_get_cfa_action(skb);
+ if (skb_vlan_tag_present(skb)) {
+ vlan_tag_flags = TX_BD_CFA_META_KEY_VLAN |
+ skb_vlan_tag_get(skb);
+ if (skb->vlan_proto == htons(ETH_P_8021Q))
+ vlan_tag_flags |= 1 << TX_BD_CFA_META_TPID_SHIFT;
+ }
+
+ prod = txr->tx_prod;
+
+ for (i = 0; i < num_segs; i++) {
+ unsigned int seg_payload = min_t(unsigned int, mss,
+ total_payload - i * mss);
+ u16 slot = (txr->tx_inline_prod + i) &
+ (BNXT_SW_USO_MAX_SEGS - 1);
+ struct bnxt_sw_tx_bd *tx_buf;
+ unsigned int mapping_len;
+ dma_addr_t this_hdr_dma;
+ unsigned int chunk_len;
+ unsigned int offset;
+ dma_addr_t dma_addr;
+ struct tx_bd *txbd;
+ void *this_hdr;
+ int bd_count;
+ __le32 csum;
+ bool last;
+ u32 flags;
+
+ last = (i == num_segs - 1);
+ offset = slot * TSO_HEADER_SIZE;
+ this_hdr = txr->tx_inline_buf + offset;
+ this_hdr_dma = txr->tx_inline_dma + offset;
+
+ tso_build_hdr(skb, this_hdr, &tso, seg_payload, last);
+
+ dma_sync_single_for_device(&pdev->dev, this_hdr_dma,
+ hdr_len, DMA_TO_DEVICE);
+
+ bd_count = tso_dma_map_count(&map, seg_payload);
+
+ tx_buf = &txr->tx_buf_ring[RING_TX(bp, prod)];
+ txbd = &txr->tx_desc_ring[TX_RING(bp, prod)][TX_IDX(prod)];
+
+ tx_buf->skb = skb;
+ tx_buf->nr_frags = bd_count;
+ tx_buf->is_push = 0;
+ tx_buf->is_ts_pkt = 0;
+
+ dma_unmap_addr_set(tx_buf, mapping, this_hdr_dma);
+ dma_unmap_len_set(tx_buf, len, 0);
+
+ tx_buf->is_sw_gso = last ? BNXT_SW_GSO_LAST : BNXT_SW_GSO_MID;
+
+ flags = (hdr_len << TX_BD_LEN_SHIFT) |
+ TX_BD_TYPE_LONG_TX_BD |
+ TX_BD_CNT(2 + bd_count);
+
+ flags |= bnxt_sw_gso_lhint(hdr_len + seg_payload);
+
+ txbd->tx_bd_len_flags_type = cpu_to_le32(flags);
+ txbd->tx_bd_haddr = cpu_to_le64(this_hdr_dma);
+ txbd->tx_bd_opaque = SET_TX_OPAQUE(bp, txr, prod,
+ 2 + bd_count);
+
+ csum = cpu_to_le32(TX_BD_FLAGS_TCP_UDP_CHKSUM |
+ TX_BD_FLAGS_IP_CKSUM);
+
+ prod = NEXT_TX(prod);
+ bnxt_init_ext_bd(bp, txr, prod, csum,
+ vlan_tag_flags, cfa_action);
+
+ /* set dma_unmap_len on the LAST BD touching each
+ * region. Since completions are in-order, the last segment
+ * completes after all earlier ones, so the unmap is safe.
+ */
+ while (tso_dma_map_next(&map, &dma_addr, &chunk_len,
+ &mapping_len, seg_payload)) {
+ prod = NEXT_TX(prod);
+ txbd = &txr->tx_desc_ring[TX_RING(bp, prod)][TX_IDX(prod)];
+ tx_buf = &txr->tx_buf_ring[RING_TX(bp, prod)];
+
+ txbd->tx_bd_haddr = cpu_to_le64(dma_addr);
+ dma_unmap_addr_set(tx_buf, mapping, dma_addr);
+ dma_unmap_len_set(tx_buf, len, 0);
+ tx_buf->skb = NULL;
+ tx_buf->is_sw_gso = 0;
+
+ if (mapping_len) {
+ if (last_unmap_buf) {
+ dma_unmap_addr_set(last_unmap_buf,
+ mapping,
+ last_unmap_addr);
+ dma_unmap_len_set(last_unmap_buf,
+ len,
+ last_unmap_len);
+ }
+ last_unmap_addr = dma_addr;
+ last_unmap_len = mapping_len;
+ }
+ last_unmap_buf = tx_buf;
+
+ flags = chunk_len << TX_BD_LEN_SHIFT;
+ txbd->tx_bd_len_flags_type = cpu_to_le32(flags);
+ txbd->tx_bd_opaque = 0;
+
+ seg_payload -= chunk_len;
+ }
+
+ txbd->tx_bd_len_flags_type |=
+ cpu_to_le32(TX_BD_FLAGS_PACKET_END);
+
+ prod = NEXT_TX(prod);
+ }
+
+ if (last_unmap_buf) {
+ dma_unmap_addr_set(last_unmap_buf, mapping, last_unmap_addr);
+ dma_unmap_len_set(last_unmap_buf, len, last_unmap_len);
+ }
+
+ txr->tx_inline_prod += num_segs;
+
+ netdev_tx_sent_queue(txq, skb->len);
+
+ WRITE_ONCE(txr->tx_prod, prod);
+ /* Sync BDs before doorbell */
+ wmb();
+ bnxt_db_write(bp, &txr->tx_db, prod);
+
+ if (unlikely(bnxt_tx_avail(bp, txr) <= bp->tx_wake_thresh))
+ netif_txq_try_stop(txq, bnxt_tx_avail(bp, txr),
+ bp->tx_wake_thresh);
+
+ return NETDEV_TX_OK;
+
+drop:
dev_kfree_skb_any(skb);
dev_core_stats_tx_dropped_inc(bp->dev);
return NETDEV_TX_OK;
--
2.52.0
^ permalink raw reply related [flat|nested] 15+ messages in thread

* [RFC net-next v2 09/12] net: bnxt: Add SW GSO completion and teardown support
2026-03-12 22:34 [RFC net-next v2 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
` (7 preceding siblings ...)
2026-03-12 22:34 ` [RFC net-next v2 08/12] net: bnxt: Implement software USO Joe Damato
@ 2026-03-12 22:34 ` Joe Damato
2026-03-12 22:34 ` [RFC net-next v2 10/12] net: bnxt: Dispatch to SW USO Joe Damato
` (3 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Joe Damato @ 2026-03-12 22:34 UTC (permalink / raw)
To: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: linux-kernel, Joe Damato
Update __bnxt_tx_int and bnxt_free_one_tx_ring_skbs to handle SW GSO
segments:
- MID segments: adjust tx_pkts/tx_bytes accounting and skip skb free
(the skb is shared across all segments and freed only once)
- LAST segments: no special cleanup needed -- payload DMA unmapping is
handled by the existing per-BD dma_unmap_len walk, and the header
inline buffer is pre-allocated per-ring (freed at ring teardown)
Both MID and LAST completions advance tx_inline_cons to release the
segment's inline header slot back to the ring.
is_sw_gso is initialized to zero, so the new code paths are not
exercised until a later commit starts dispatching to the SW USO path.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
---
rfcv2:
- Update the shared header buffer consumer on TX completion.
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 69 ++++++++++++++++---
.../net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 19 ++++-
2 files changed, 78 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 8929264a54b1..60daf813154e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -74,6 +74,8 @@
#include "bnxt_debugfs.h"
#include "bnxt_coredump.h"
#include "bnxt_hwmon.h"
+#include "bnxt_gso.h"
+#include <net/tso.h>
#define BNXT_TX_TIMEOUT (5 * HZ)
#define BNXT_DEF_MSG_ENABLE (NETIF_MSG_DRV | NETIF_MSG_HW | \
@@ -817,12 +819,13 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
bool rc = false;
while (RING_TX(bp, cons) != hw_cons) {
- struct bnxt_sw_tx_bd *tx_buf;
+ struct bnxt_sw_tx_bd *tx_buf, *head_buf;
struct sk_buff *skb;
bool is_ts_pkt;
int j, last;
tx_buf = &txr->tx_buf_ring[RING_TX(bp, cons)];
+ head_buf = tx_buf;
skb = tx_buf->skb;
if (unlikely(!skb)) {
@@ -869,6 +872,17 @@ static bool __bnxt_tx_int(struct bnxt *bp, struct bnxt_tx_ring_info *txr,
DMA_TO_DEVICE, 0);
}
}
+
+ if (unlikely(head_buf->is_sw_gso)) {
+ txr->tx_inline_cons++;
+ if (head_buf->is_sw_gso == BNXT_SW_GSO_MID) {
+ tx_pkts--;
+ tx_bytes -= skb->len;
+ skb = NULL;
+ }
+ head_buf->is_sw_gso = 0;
+ }
+
if (unlikely(is_ts_pkt)) {
if (BNXT_CHIP_P5(bp)) {
/* PTP worker takes ownership of the skb */
@@ -3418,6 +3432,7 @@ static void bnxt_free_one_tx_ring_skbs(struct bnxt *bp,
for (i = 0; i < max_idx;) {
struct bnxt_sw_tx_bd *tx_buf = &txr->tx_buf_ring[i];
+ struct bnxt_sw_tx_bd *head_buf = tx_buf;
struct sk_buff *skb;
int j, last;
@@ -3470,7 +3485,13 @@ static void bnxt_free_one_tx_ring_skbs(struct bnxt *bp,
DMA_TO_DEVICE, 0);
}
}
- dev_kfree_skb(skb);
+ if (head_buf->is_sw_gso) {
+ txr->tx_inline_cons++;
+ if (head_buf->is_sw_gso == BNXT_SW_GSO_MID)
+ skb = NULL;
+ }
+ if (skb)
+ dev_kfree_skb(skb);
}
netdev_tx_reset_queue(netdev_get_tx_queue(bp->dev, idx));
}
@@ -3996,9 +4017,9 @@ static void bnxt_free_tx_inline_buf(struct bnxt_tx_ring_info *txr,
txr->tx_inline_size = 0;
}
-static int __maybe_unused bnxt_alloc_tx_inline_buf(struct bnxt_tx_ring_info *txr,
- struct pci_dev *pdev,
- unsigned int size)
+static int bnxt_alloc_tx_inline_buf(struct bnxt_tx_ring_info *txr,
+ struct pci_dev *pdev,
+ unsigned int size)
{
txr->tx_inline_buf = kmalloc(size, GFP_KERNEL);
if (!txr->tx_inline_buf)
@@ -4101,6 +4122,14 @@ static int bnxt_alloc_tx_rings(struct bnxt *bp)
sizeof(struct tx_push_bd);
txr->data_mapping = cpu_to_le64(mapping);
}
+ if (!(bp->flags & BNXT_FLAG_UDP_GSO_CAP) &&
+ (bp->dev->features & NETIF_F_GSO_UDP_L4)) {
+ rc = bnxt_alloc_tx_inline_buf(txr, pdev,
+ BNXT_SW_USO_MAX_SEGS *
+ TSO_HEADER_SIZE);
+ if (rc)
+ return rc;
+ }
qidx = bp->tc_to_qidx[j];
ring->queue_id = bp->q_info[qidx].queue_id;
spin_lock_init(&txr->xdp_tx_lock);
@@ -4643,6 +4672,10 @@ static int bnxt_init_tx_rings(struct bnxt *bp)
bp->tx_wake_thresh = max_t(int, bp->tx_ring_size / 2,
BNXT_MIN_TX_DESC_CNT);
+ if (!(bp->flags & BNXT_FLAG_UDP_GSO_CAP) &&
+ (bp->dev->features & NETIF_F_GSO_UDP_L4))
+ bp->tx_wake_thresh = max_t(int, bp->tx_wake_thresh,
+ BNXT_SW_USO_MAX_DESCS);
for (i = 0; i < bp->tx_nr_rings; i++) {
struct bnxt_tx_ring_info *txr = &bp->tx_ring[i];
@@ -13831,6 +13864,11 @@ static netdev_features_t bnxt_fix_features(struct net_device *dev,
if ((features & NETIF_F_NTUPLE) && !bnxt_rfs_capable(bp, false))
features &= ~NETIF_F_NTUPLE;
+ if ((features & NETIF_F_GSO_UDP_L4) &&
+ !(bp->flags & BNXT_FLAG_UDP_GSO_CAP) &&
+ bp->tx_ring_size < 2 * BNXT_SW_USO_MAX_DESCS)
+ features &= ~NETIF_F_GSO_UDP_L4;
+
if ((bp->flags & BNXT_FLAG_NO_AGG_RINGS) || bp->xdp_prog)
features &= ~(NETIF_F_LRO | NETIF_F_GRO_HW);
@@ -13876,6 +13914,15 @@ static int bnxt_set_features(struct net_device *dev, netdev_features_t features)
int rc = 0;
bool re_init = false;
+ if (!(bp->flags & BNXT_FLAG_UDP_GSO_CAP)) {
+ if (features & NETIF_F_GSO_UDP_L4)
+ bp->tx_wake_thresh = max_t(int, bp->tx_wake_thresh,
+ BNXT_SW_USO_MAX_DESCS);
+ else
+ bp->tx_wake_thresh = max_t(int, bp->tx_ring_size / 2,
+ BNXT_MIN_TX_DESC_CNT);
+ }
+
flags &= ~BNXT_FLAG_ALL_CONFIG_FEATS;
if (features & NETIF_F_GRO_HW)
flags |= BNXT_FLAG_GRO;
@@ -16879,8 +16926,7 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
NETIF_F_GSO_UDP_TUNNEL_CSUM | NETIF_F_GSO_GRE_CSUM |
NETIF_F_GSO_PARTIAL | NETIF_F_RXHASH |
NETIF_F_RXCSUM | NETIF_F_GRO;
- if (bp->flags & BNXT_FLAG_UDP_GSO_CAP)
- dev->hw_features |= NETIF_F_GSO_UDP_L4;
+ dev->hw_features |= NETIF_F_GSO_UDP_L4;
if (BNXT_SUPPORTS_TPA(bp))
dev->hw_features |= NETIF_F_LRO;
@@ -16913,8 +16959,15 @@ static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
dev->priv_flags |= IFF_UNICAST_FLT;
netif_set_tso_max_size(dev, GSO_MAX_SIZE);
- if (bp->tso_max_segs)
+ if (!(bp->flags & BNXT_FLAG_UDP_GSO_CAP)) {
+ u16 max_segs = BNXT_SW_USO_MAX_SEGS;
+
+ if (bp->tso_max_segs)
+ max_segs = min_t(u16, max_segs, bp->tso_max_segs);
+ netif_set_tso_max_segs(dev, max_segs);
+ } else if (bp->tso_max_segs) {
netif_set_tso_max_segs(dev, bp->tso_max_segs);
+ }
dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_REDIRECT |
NETDEV_XDP_ACT_RX_SG;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 26fcd52c8a61..1a2c6920e9e1 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -33,6 +33,7 @@
#include "bnxt_xdp.h"
#include "bnxt_ptp.h"
#include "bnxt_ethtool.h"
+#include "bnxt_gso.h"
#include "bnxt_nvm_defs.h" /* NVRAM content constant and structure defs */
#include "bnxt_fw_hdr.h" /* Firmware hdr constant and structure defs */
#include "bnxt_coredump.h"
@@ -852,12 +853,18 @@ static int bnxt_set_ringparam(struct net_device *dev,
u8 tcp_data_split = kernel_ering->tcp_data_split;
struct bnxt *bp = netdev_priv(dev);
u8 hds_config_mod;
+ int rc;
if ((ering->rx_pending > BNXT_MAX_RX_DESC_CNT) ||
(ering->tx_pending > BNXT_MAX_TX_DESC_CNT) ||
(ering->tx_pending < BNXT_MIN_TX_DESC_CNT))
return -EINVAL;
+ if ((dev->features & NETIF_F_GSO_UDP_L4) &&
+ !(bp->flags & BNXT_FLAG_UDP_GSO_CAP) &&
+ ering->tx_pending < 2 * BNXT_SW_USO_MAX_DESCS)
+ return -EINVAL;
+
hds_config_mod = tcp_data_split != dev->cfg->hds_config;
if (tcp_data_split == ETHTOOL_TCP_DATA_SPLIT_DISABLED && hds_config_mod)
return -EINVAL;
@@ -882,9 +889,17 @@ static int bnxt_set_ringparam(struct net_device *dev,
bp->tx_ring_size = ering->tx_pending;
bnxt_set_ring_params(bp);
- if (netif_running(dev))
- return bnxt_open_nic(bp, false, false);
+ if (netif_running(dev)) {
+ rc = bnxt_open_nic(bp, false, false);
+ if (rc)
+ return rc;
+ }
+ /* ring size changes may affect features (SW USO requires a minimum
+ * ring size), so recalculate features to ensure the correct features
+ * are blocked/available.
+ */
+ netdev_update_features(dev);
return 0;
}
--
2.52.0
^ permalink raw reply related [flat|nested] 15+ messages in thread

* [RFC net-next v2 10/12] net: bnxt: Dispatch to SW USO
2026-03-12 22:34 [RFC net-next v2 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
` (8 preceding siblings ...)
2026-03-12 22:34 ` [RFC net-next v2 09/12] net: bnxt: Add SW GSO completion and teardown support Joe Damato
@ 2026-03-12 22:34 ` Joe Damato
2026-03-12 22:34 ` [RFC net-next v2 11/12] net: netdevsim: Add support for " Joe Damato
` (2 subsequent siblings)
12 siblings, 0 replies; 15+ messages in thread
From: Joe Damato @ 2026-03-12 22:34 UTC (permalink / raw)
To: netdev, Michael Chan, Pavan Chebbi, Andrew Lunn, David S. Miller,
Eric Dumazet, Jakub Kicinski, Paolo Abeni
Cc: linux-kernel, Joe Damato
Wire up the SW USO path added in the preceding commits, used when
hardware USO is not available.
When a GSO skb with SKB_GSO_UDP_L4 arrives and the NIC lacks HW USO
capability, redirect to bnxt_sw_udp_gso_xmit() which handles software
segmentation into individual UDP frames submitted directly to the TX
ring.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
---
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 60daf813154e..c09772aa2b32 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -508,6 +508,11 @@ static netdev_tx_t bnxt_start_xmit(struct sk_buff *skb, struct net_device *dev)
}
}
#endif
+ if (skb_is_gso(skb) &&
+ (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4) &&
+ !(bp->flags & BNXT_FLAG_UDP_GSO_CAP))
+ return bnxt_sw_udp_gso_xmit(bp, txr, txq, skb);
+
free_size = bnxt_tx_avail(bp, txr);
if (unlikely(free_size < skb_shinfo(skb)->nr_frags + 2)) {
/* We must have raced with NAPI cleanup */
--
2.52.0
^ permalink raw reply related [flat|nested] 15+ messages in thread

* [RFC net-next v2 11/12] net: netdevsim: Add support for SW USO
2026-03-12 22:34 [RFC net-next v2 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
` (9 preceding siblings ...)
2026-03-12 22:34 ` [RFC net-next v2 10/12] net: bnxt: Dispatch to SW USO Joe Damato
@ 2026-03-12 22:34 ` Joe Damato
2026-03-12 22:34 ` [RFC net-next v2 12/12] selftests: drv-net: Add USO test Joe Damato
2026-03-16 19:44 ` [RFC net-next v2 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Leon Romanovsky
12 siblings, 0 replies; 15+ messages in thread
From: Joe Damato @ 2026-03-12 22:34 UTC (permalink / raw)
To: netdev, Jakub Kicinski, Andrew Lunn, David S. Miller,
Eric Dumazet, Paolo Abeni
Cc: michael.chan, pavan.chebbi, linux-kernel, Joe Damato
Add support for UDP Segmentation Offloading in software (SW USO). This
is helpful for testing when real hardware is not available. A test which
uses this codepath will be added in a following commit.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
---
rfcv2:
- new in rfcv2
drivers/net/netdevsim/netdev.c | 100 ++++++++++++++++++++++++++++++++-
1 file changed, 99 insertions(+), 1 deletion(-)
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index 5ec028a00c62..f7dd7692a5d9 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -30,6 +30,7 @@
#include <net/rtnetlink.h>
#include <net/udp_tunnel.h>
#include <net/busy_poll.h>
+#include <net/tso.h>
#include "netdevsim.h"
@@ -117,6 +118,98 @@ static int nsim_forward_skb(struct net_device *tx_dev,
return nsim_napi_rx(tx_dev, rx_dev, rq, skb);
}
+static netdev_tx_t nsim_uso_segment_xmit(struct net_device *dev,
+ struct sk_buff *skb)
+{
+ unsigned int hdr_len, mss, total_payload, num_segs;
+ struct netdevsim *ns = netdev_priv(dev);
+ struct net_device *peer_dev;
+ unsigned int total_len = 0;
+ struct netdevsim *peer_ns;
+ struct nsim_rq *rq;
+ struct tso_t tso;
+ int i, rxq;
+
+ hdr_len = tso_start(skb, &tso);
+ mss = skb_shinfo(skb)->gso_size;
+ total_payload = skb->len - hdr_len;
+ num_segs = DIV_ROUND_UP(total_payload, mss);
+
+ udp_hdr(skb)->check = 0;
+ if (!tso.ipv6)
+ ip_hdr(skb)->check = 0;
+
+ rcu_read_lock();
+ peer_ns = rcu_dereference(ns->peer);
+ if (!peer_ns)
+ goto out_drop_free;
+
+ peer_dev = peer_ns->netdev;
+ rxq = skb_get_queue_mapping(skb);
+ if (rxq >= peer_dev->num_rx_queues)
+ rxq = rxq % peer_dev->num_rx_queues;
+ rq = peer_ns->rq[rxq];
+
+ for (i = 0; i < num_segs; i++) {
+ unsigned int seg_payload = min_t(unsigned int, mss,
+ total_payload);
+ bool last = (i == num_segs - 1);
+ unsigned int seg_remaining;
+ struct sk_buff *seg;
+
+ seg = alloc_skb(hdr_len + seg_payload, GFP_ATOMIC);
+ if (!seg)
+ break;
+
+ seg->dev = dev;
+
+ tso_build_hdr(skb, skb_put(seg, hdr_len), &tso,
+ seg_payload, last);
+
+ if (!tso.ipv6) {
+ unsigned int nh_off = skb_network_offset(skb);
+ struct iphdr *iph;
+
+ iph = (struct iphdr *)(seg->data + nh_off);
+ iph->check = ip_fast_csum(iph, iph->ihl);
+ }
+
+ seg_remaining = seg_payload;
+ while (seg_remaining > 0) {
+ unsigned int chunk = min_t(unsigned int, tso.size,
+ seg_remaining);
+
+ memcpy(skb_put(seg, chunk), tso.data, chunk);
+ tso_build_data(skb, &tso, chunk);
+ seg_remaining -= chunk;
+ }
+
+ total_payload -= seg_payload;
+
+ seg->ip_summed = CHECKSUM_UNNECESSARY;
+
+ if (nsim_forward_skb(dev, peer_dev, seg, rq, NULL) == NET_RX_DROP)
+ continue;
+
+ total_len += hdr_len + seg_payload;
+ }
+
+ if (!hrtimer_active(&rq->napi_timer))
+ hrtimer_start(&rq->napi_timer, us_to_ktime(5),
+ HRTIMER_MODE_REL);
+
+ rcu_read_unlock();
+ dev_kfree_skb(skb);
+ dev_dstats_tx_add(dev, total_len);
+ return NETDEV_TX_OK;
+
+out_drop_free:
+ dev_kfree_skb(skb);
+ rcu_read_unlock();
+ dev_dstats_tx_dropped(dev);
+ return NETDEV_TX_OK;
+}
+
static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
struct netdevsim *ns = netdev_priv(dev);
@@ -129,6 +222,10 @@ static netdev_tx_t nsim_start_xmit(struct sk_buff *skb, struct net_device *dev)
int rxq;
int dr;
+ if (skb_is_gso(skb) &&
+ skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4)
+ return nsim_uso_segment_xmit(dev, skb);
+
rcu_read_lock();
if (!nsim_ipsec_tx(ns, skb))
goto out_drop_any;
@@ -986,7 +1083,8 @@ static void nsim_setup(struct net_device *dev)
NETIF_F_HW_CSUM |
NETIF_F_LRO |
NETIF_F_TSO |
- NETIF_F_LOOPBACK;
+ NETIF_F_LOOPBACK |
+ NETIF_F_GSO_UDP_L4;
dev->pcpu_stat_type = NETDEV_PCPU_STAT_DSTATS;
dev->max_mtu = ETH_MAX_MTU;
dev->xdp_features = NETDEV_XDP_ACT_BASIC | NETDEV_XDP_ACT_HW_OFFLOAD;
--
2.52.0
^ permalink raw reply related [flat|nested] 15+ messages in thread

* [RFC net-next v2 12/12] selftests: drv-net: Add USO test
2026-03-12 22:34 [RFC net-next v2 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
` (10 preceding siblings ...)
2026-03-12 22:34 ` [RFC net-next v2 11/12] net: netdevsim: Add support for " Joe Damato
@ 2026-03-12 22:34 ` Joe Damato
2026-03-16 19:44 ` [RFC net-next v2 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Leon Romanovsky
12 siblings, 0 replies; 15+ messages in thread
From: Joe Damato @ 2026-03-12 22:34 UTC (permalink / raw)
To: netdev, Andrew Lunn, David S. Miller, Eric Dumazet,
Jakub Kicinski, Paolo Abeni, Shuah Khan
Cc: michael.chan, pavan.chebbi, linux-kernel, Joe Damato,
linux-kselftest
Add a simple test for USO. It can be used with netdevsim or real
hardware. Tests both IPv4 and IPv6 with several full segments and a
partial segment.
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Joe Damato <joe@dama.to>
---
rfcv2:
- new in rfcv2
tools/testing/selftests/drivers/net/Makefile | 1 +
tools/testing/selftests/drivers/net/uso.py | 87 ++++++++++++++++++++
2 files changed, 88 insertions(+)
create mode 100755 tools/testing/selftests/drivers/net/uso.py
diff --git a/tools/testing/selftests/drivers/net/Makefile b/tools/testing/selftests/drivers/net/Makefile
index 8154d6d429d3..800065fe443f 100644
--- a/tools/testing/selftests/drivers/net/Makefile
+++ b/tools/testing/selftests/drivers/net/Makefile
@@ -22,6 +22,7 @@ TEST_PROGS := \
ring_reconfig.py \
shaper.py \
stats.py \
+ uso.py \
xdp.py \
# end of TEST_PROGS
diff --git a/tools/testing/selftests/drivers/net/uso.py b/tools/testing/selftests/drivers/net/uso.py
new file mode 100755
index 000000000000..da7a68b15734
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/uso.py
@@ -0,0 +1,87 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: GPL-2.0
+
+"""Test USO
+
+Sends large UDP datagrams with UDP_SEGMENT and verifies that the peer
+receives at least the expected number of individual segments.
+"""
+import socket
+import struct
+import time
+
+from lib.py import ksft_pr, ksft_run, ksft_exit, KsftSkipEx
+from lib.py import ksft_eq, ksft_ge
+from lib.py import NetDrvEpEnv
+from lib.py import bkg, cmd, defer, ethtool, ip, rand_port, wait_port_listen
+
+# Python doesn't expose this constant, so we hardcode it to enable UDP
+# segmentation for large payloads.
+UDP_SEGMENT = 103
+
+def _send_uso(cfg, ipver, mss, total_payload, port):
+ if ipver == "4":
+ sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
+ dst = (cfg.remote_addr_v["4"], port)
+ else:
+ sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
+ dst = (cfg.remote_addr_v["6"], port)
+
+ sock.setsockopt(socket.IPPROTO_UDP, UDP_SEGMENT, mss)
+ payload = bytes(range(256)) * ((total_payload // 256) + 1)
+ payload = payload[:total_payload]
+ sock.sendto(payload, dst)
+ sock.close()
+ return payload
+
+def _get_rx_packets(cfg):
+ stats = ip(f"-s link show dev {cfg.remote_ifname}",
+ json=True, host=cfg.remote)[0]
+ return stats['stats64']['rx']['packets']
+
+def _test_uso(cfg, ipver, mss, total_payload):
+ cfg.require_ipver(ipver)
+
+ try:
+ ethtool(f"-K {cfg.ifname} tx-udp-segmentation on")
+ except Exception:
+ raise KsftSkipEx("Device does not support tx-udp-segmentation")
+ defer(ethtool, f"-K {cfg.ifname} tx-udp-segmentation off")
+
+ expected_segs = (total_payload + mss - 1) // mss
+
+ rx_before = _get_rx_packets(cfg)
+
+ port = rand_port(stype=socket.SOCK_DGRAM)
+ _send_uso(cfg, ipver, mss, total_payload, port)
+
+ time.sleep(0.5)
+
+ rx_after = _get_rx_packets(cfg)
+ rx_delta = rx_after - rx_before
+
+ ksft_ge(rx_delta, expected_segs,
+ comment=f"Expected >= {expected_segs} rx packets, got {rx_delta}")
+
+def test_uso_v4(cfg):
+ """USO IPv4: 11 segments (10 full + 1 partial)."""
+ _test_uso(cfg, "4", 1400, 1400 * 10 + 500)
+
+def test_uso_v6(cfg):
+ """USO IPv6: 11 segments (10 full + 1 partial)."""
+ _test_uso(cfg, "6", 1400, 1400 * 10 + 500)
+
+def test_uso_v4_exact(cfg):
+ """USO IPv4: exact multiple of MSS (5 full segments)."""
+ _test_uso(cfg, "4", 1400, 1400 * 5)
+
+def main() -> None:
+ with NetDrvEpEnv(__file__) as cfg:
+ ksft_run([test_uso_v4,
+ test_uso_v6,
+ test_uso_v4_exact],
+ args=(cfg, ))
+ ksft_exit()
+
+if __name__ == "__main__":
+ main()
--
2.52.0
^ permalink raw reply [flat|nested] 15+ messages in thread

* Re: [RFC net-next v2 00/12] Add TSO map-once DMA helpers and bnxt SW USO support
2026-03-12 22:34 [RFC net-next v2 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Joe Damato
` (11 preceding siblings ...)
2026-03-12 22:34 ` [RFC net-next v2 12/12] selftests: drv-net: Add USO test Joe Damato
@ 2026-03-16 19:44 ` Leon Romanovsky
2026-03-16 21:02 ` Joe Damato
12 siblings, 1 reply; 15+ messages in thread
From: Leon Romanovsky @ 2026-03-16 19:44 UTC (permalink / raw)
To: Joe Damato
Cc: netdev, michael.chan, pavan.chebbi, linux-kernel,
Marek Szyprowski
On Thu, Mar 12, 2026 at 03:34:37PM -0700, Joe Damato wrote:
> Greetings:
>
> This series extends net/tso to add a data structure and some helpers allowing
> drivers to DMA map headers and packet payloads a single time. The helpers can
> then be used to reference slices of shared mapping for each segment. This
> helps to avoid the cost of repeated DMA mappings, especially on systems which
> use an IOMMU.
In modern kernels this is done using the DMA IOVA API; see the NVMe
driver/block layer for the most comprehensive example.
The pseudo code is:
	if (with_iommu)
		use dma_iova_link/dma_iova_unlink
	else
		use dma_map_phys()
https://lore.kernel.org/all/cover.1746424934.git.leon@kernel.org/
https://lore.kernel.org/all/20250623141259.76767-1-hch@lst.de/
https://lwn.net/Articles/997563/
Thanks
^ permalink raw reply [flat|nested] 15+ messages in thread* Re: [RFC net-next v2 00/12] Add TSO map-once DMA helpers and bnxt SW USO support
2026-03-16 19:44 ` [RFC net-next v2 00/12] Add TSO map-once DMA helpers and bnxt SW USO support Leon Romanovsky
@ 2026-03-16 21:02 ` Joe Damato
0 siblings, 0 replies; 15+ messages in thread
From: Joe Damato @ 2026-03-16 21:02 UTC (permalink / raw)
To: Leon Romanovsky
Cc: netdev, michael.chan, pavan.chebbi, linux-kernel,
Marek Szyprowski
On Mon, Mar 16, 2026 at 09:44:19PM +0200, Leon Romanovsky wrote:
> On Thu, Mar 12, 2026 at 03:34:37PM -0700, Joe Damato wrote:
> > Greetings:
> >
> > This series extends net/tso to add a data structure and some helpers allowing
> > drivers to DMA map headers and packet payloads a single time. The helpers can
> > then be used to reference slices of shared mapping for each segment. This
> > helps to avoid the cost of repeated DMA mappings, especially on systems which
> > use an IOMMU.
>
> In modern kernels, it is done by using DMA IOVA API, see NVMe
> driver/block layer for the most comprehensive example.
>
> The pseudo code is:
> if (with_iommu)
> use dma_iova_link/dma_iova_unlink
> else
> use dma_map_phys()
>
> https://lore.kernel.org/all/cover.1746424934.git.leon@kernel.org/
> https://lore.kernel.org/all/20250623141259.76767-1-hch@lst.de/
> https://lwn.net/Articles/997563/
Thanks for the pointer.
I agree it's the right approach. Batching the IOVA allocation and IOTLB sync
across all regions is a clear win over the per-region
dma_map_single/skb_frag_dma_map calls I had in v2.
I'll submit a v3 with the tso_dma_map internals updated to use
dma_iova_try_alloc + dma_iova_link + dma_iova_sync, with a
dma_map_phys fallback.
^ permalink raw reply [flat|nested] 15+ messages in thread