* [RFC net-next 0/2] eth: bnxt: Implement rx-side device memory
@ 2025-03-31 11:47 Taehee Yoo
  2025-03-31 11:47 ` [RFC net-next 1/2] eth: bnxt: refactor buffer descriptor Taehee Yoo
  2025-03-31 11:47 ` [RFC net-next 2/2] eth: bnxt: add support rx side device memory TCP Taehee Yoo
  0 siblings, 2 replies; 16+ messages in thread
From: Taehee Yoo @ 2025-03-31 11:47 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, horms, michael.chan,
	pavan.chebbi, ilias.apalodimas, dw, netdev
  Cc: ap420073, kuniyu, sdf, aleksander.lobakin

This patchset implements device memory TCP using the netmem API
instead of the page API.
The bnxt_en driver already satisfies the requirements of devmem TCP;
the only change required is to switch from the page API to the netmem
API.

The first patch refactors the bnxt_en driver.
Its main purpose is to make the driver use the page_pool DMA sync API
instead of raw DMA sync calls.
page_pool_dma_sync_for_{cpu | device}() doesn't take a virtual
address, so the driver has to switch from tracking virtual addresses
to tracking pages.
The patch therefore replaces the virtual address in struct
bnxt_sw_rx_bd with a page pointer and offset.
With this change, struct bnxt_sw_rx_bd becomes identical to struct
bnxt_sw_rx_agg_bd, which makes the code much simpler.
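
Concretely, the refactor lets the raw DMA sync calls be replaced by
the page_pool helpers; for example, the bnxt_rx_skb() hunk in the
first patch changes the sync like this:

-	dma_sync_single_for_cpu(&bp->pdev->dev, dma_addr, bp->rx_buf_use_size,
-				bp->rx_dir);
+	page_pool_dma_sync_for_cpu(rxr->head_pool, page,
+				   offset + bp->rx_dma_offset,
+				   bp->rx_buf_use_size);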

The second patch switches the driver to the netmem API.
It adds the PP_FLAG_ALLOW_UNREADABLE_NETMEM flag to the page_pool
parameters.
With this flag set, the page_pool is initialized to support unreadable
netmem when the user enables devmem.
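
For illustration only, enabling this boils down to setting the flag
when the page_pool for the rx queue is created. A minimal sketch (not
the exact bnxt_en setup code; the surrounding parameter values and
rx_queue_idx are assumptions for the example):

	struct page_pool_params pp = {
		.pool_size	= bp->rx_ring_size,
		.nid		= dev_to_node(&bp->pdev->dev),
		.dev		= &bp->pdev->dev,
		.napi		= &rxr->bnapi->napi,
		.netdev		= bp->dev,
		.dma_dir	= bp->rx_dir,
		/* page_pool hands out unreadable (device memory) netmem
		 * once a user binds a dmabuf to this rx queue
		 */
		.flags		= PP_FLAG_DMA_MAP |
				  PP_FLAG_ALLOW_UNREADABLE_NETMEM,
		.queue_idx	= rx_queue_idx, /* queue served by this ring */
	};

	rxr->page_pool = page_pool_create(&pp);
	if (IS_ERR(rxr->page_pool))
		return PTR_ERR(rxr->page_pool);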

With this patchset, the bnxt_en driver supports rx-side device memory
TCP.
Only Thor+ NICs support it, and recent firmware is also required.
We can test device memory TCP with
tools/testing/selftests/drivers/net/hw/ncdevmem.c

This is tested with BCM57504-N425G and firmware version 232.0.155.8/pkg
232.1.132.8.

David Wei tested this patch on the io_uring side.
Thank you for testing, David.

Taehee Yoo (2):
  eth: bnxt: refactor buffer descriptor
  eth: bnxt: add support rx side device memory TCP

 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 520 ++++++++++--------
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  35 +-
 .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c |   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |  41 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h |   4 +-
 include/linux/netdevice.h                     |   1 +
 include/net/page_pool/helpers.h               |   6 +
 include/net/page_pool/types.h                 |   4 +-
 net/core/dev.c                                |   6 +
 net/core/page_pool.c                          |  23 +-
 10 files changed, 344 insertions(+), 298 deletions(-)

-- 
2.34.1



* [RFC net-next 1/2] eth: bnxt: refactor buffer descriptor
  2025-03-31 11:47 [RFC net-next 0/2] eth: bnxt: Implement rx-side device memory Taehee Yoo
@ 2025-03-31 11:47 ` Taehee Yoo
  2025-03-31 17:34   ` Jakub Kicinski
                     ` (2 more replies)
  2025-03-31 11:47 ` [RFC net-next 2/2] eth: bnxt: add support rx side device memory TCP Taehee Yoo
  1 sibling, 3 replies; 16+ messages in thread
From: Taehee Yoo @ 2025-03-31 11:47 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, horms, michael.chan,
	pavan.chebbi, ilias.apalodimas, dw, netdev
  Cc: ap420073, kuniyu, sdf, aleksander.lobakin

There are two kinds of rx buffer descriptors in bnxt, struct
bnxt_sw_rx_bd and struct bnxt_sw_rx_agg_bd (plus struct
bnxt_tpa_info).
bnxt_sw_rx_bd is the descriptor for the rx ring buffer, and
bnxt_sw_rx_agg_bd is the descriptor for the aggregation ring buffer.
Their purpose is the same, but the structures are slightly different.

struct bnxt_sw_rx_bd {
        void *data;
        u8 *data_ptr;
        dma_addr_t mapping;
};

struct bnxt_sw_rx_agg_bd {
        struct page *page;
        unsigned int offset;
        dma_addr_t mapping;
}

bnxt_sw_rx_bd->data is either a page pointer or page_address(page) +
offset. Under page mode (when XDP is set), data holds a page pointer;
otherwise it holds a virtual address.
Before Jakub's recent head_pool work, bnxt_sw_rx_bd->data was
allocated by kmalloc().
After that work, bnxt_sw_rx_bd->data is allocated by page_pool, so
there is no reason to keep handling virtual addresses.
The goal of this patch is to make bnxt_sw_rx_bd the same as
bnxt_sw_rx_agg_bd.
With this change, we can easily use page_pool APIs such as
page_pool_dma_sync_for_{cpu | device}(), and the later conversion
from page to netmem becomes straightforward.
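
After the change, call sites that still need a kernel virtual address
derive it from the page + offset pair through two small helpers added
to bnxt.h by this patch:

static inline u8 *bnxt_data_ptr(struct bnxt *bp, struct page *page,
				unsigned int offset)
{
	/* start of the packet data, past the rx headroom */
	return page_address(page) + offset + bp->rx_offset;
}

static inline u8 *bnxt_data(struct page *page, unsigned int offset)
{
	/* start of the buffer itself, e.g. for napi_build_skb() */
	return page_address(page) + offset;
}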

Tested-by: David Wei <dw@davidwei.uk>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 313 +++++++++---------
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  31 +-
 .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c |   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |  23 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h |   2 +-
 include/net/page_pool/types.h                 |   4 +-
 net/core/page_pool.c                          |  23 +-
 7 files changed, 199 insertions(+), 199 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 934ba9425857..198a42da3015 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -915,24 +915,24 @@ static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
 	if (!page)
 		return NULL;
 
-	*mapping = page_pool_get_dma_addr(page) + *offset;
+	*mapping = page_pool_get_dma_addr(page) + bp->rx_dma_offset + *offset;
 	return page;
 }
 
-static inline u8 *__bnxt_alloc_rx_frag(struct bnxt *bp, dma_addr_t *mapping,
-				       struct bnxt_rx_ring_info *rxr,
-				       gfp_t gfp)
+static struct page *__bnxt_alloc_rx_frag(struct bnxt *bp, dma_addr_t *mapping,
+					 struct bnxt_rx_ring_info *rxr,
+					 unsigned int *offset,
+					 gfp_t gfp)
 {
-	unsigned int offset;
 	struct page *page;
 
-	page = page_pool_alloc_frag(rxr->head_pool, &offset,
+	page = page_pool_alloc_frag(rxr->head_pool, offset,
 				    bp->rx_buf_size, gfp);
 	if (!page)
 		return NULL;
 
-	*mapping = page_pool_get_dma_addr(page) + bp->rx_dma_offset + offset;
-	return page_address(page) + offset;
+	*mapping = page_pool_get_dma_addr(page) + bp->rx_dma_offset + *offset;
+	return page;
 }
 
 int bnxt_alloc_rx_data(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
@@ -940,35 +940,27 @@ int bnxt_alloc_rx_data(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
 {
 	struct rx_bd *rxbd = &rxr->rx_desc_ring[RX_RING(bp, prod)][RX_IDX(prod)];
 	struct bnxt_sw_rx_bd *rx_buf = &rxr->rx_buf_ring[RING_RX(bp, prod)];
+	unsigned int offset;
 	dma_addr_t mapping;
+	struct page *page;
 
-	if (BNXT_RX_PAGE_MODE(bp)) {
-		unsigned int offset;
-		struct page *page =
-			__bnxt_alloc_rx_page(bp, &mapping, rxr, &offset, gfp);
-
-		if (!page)
-			return -ENOMEM;
-
-		mapping += bp->rx_dma_offset;
-		rx_buf->data = page;
-		rx_buf->data_ptr = page_address(page) + offset + bp->rx_offset;
-	} else {
-		u8 *data = __bnxt_alloc_rx_frag(bp, &mapping, rxr, gfp);
-
-		if (!data)
-			return -ENOMEM;
+	if (BNXT_RX_PAGE_MODE(bp))
+		page = __bnxt_alloc_rx_page(bp, &mapping, rxr, &offset, gfp);
+	else
+		page = __bnxt_alloc_rx_frag(bp, &mapping, rxr, &offset, gfp);
+	if (!page)
+		return -ENOMEM;
 
-		rx_buf->data = data;
-		rx_buf->data_ptr = data + bp->rx_offset;
-	}
+	rx_buf->page = page;
+	rx_buf->offset = offset;
 	rx_buf->mapping = mapping;
 
 	rxbd->rx_bd_haddr = cpu_to_le64(mapping);
 	return 0;
 }
 
-void bnxt_reuse_rx_data(struct bnxt_rx_ring_info *rxr, u16 cons, void *data)
+void bnxt_reuse_rx_data(struct bnxt_rx_ring_info *rxr, u16 cons,
+			struct page *page)
 {
 	u16 prod = rxr->rx_prod;
 	struct bnxt_sw_rx_bd *cons_rx_buf, *prod_rx_buf;
@@ -978,8 +970,8 @@ void bnxt_reuse_rx_data(struct bnxt_rx_ring_info *rxr, u16 cons, void *data)
 	prod_rx_buf = &rxr->rx_buf_ring[RING_RX(bp, prod)];
 	cons_rx_buf = &rxr->rx_buf_ring[cons];
 
-	prod_rx_buf->data = data;
-	prod_rx_buf->data_ptr = cons_rx_buf->data_ptr;
+	prod_rx_buf->page = page;
+	prod_rx_buf->offset = cons_rx_buf->offset;
 
 	prod_rx_buf->mapping = cons_rx_buf->mapping;
 
@@ -999,22 +991,22 @@ static inline u16 bnxt_find_next_agg_idx(struct bnxt_rx_ring_info *rxr, u16 idx)
 	return next;
 }
 
-static inline int bnxt_alloc_rx_page(struct bnxt *bp,
-				     struct bnxt_rx_ring_info *rxr,
-				     u16 prod, gfp_t gfp)
+static inline int bnxt_alloc_rx_agg_page(struct bnxt *bp,
+					 struct bnxt_rx_ring_info *rxr,
+					 u16 prod, gfp_t gfp)
 {
 	struct rx_bd *rxbd =
 		&rxr->rx_agg_desc_ring[RX_AGG_RING(bp, prod)][RX_IDX(prod)];
-	struct bnxt_sw_rx_agg_bd *rx_agg_buf;
-	struct page *page;
-	dma_addr_t mapping;
 	u16 sw_prod = rxr->rx_sw_agg_prod;
+	struct bnxt_sw_rx_bd *rx_agg_buf;
 	unsigned int offset = 0;
+	dma_addr_t mapping;
+	struct page *page;
 
 	page = __bnxt_alloc_rx_page(bp, &mapping, rxr, &offset, gfp);
-
 	if (!page)
 		return -ENOMEM;
+	mapping -= bp->rx_dma_offset;
 
 	if (unlikely(test_bit(sw_prod, rxr->rx_agg_bmap)))
 		sw_prod = bnxt_find_next_agg_idx(rxr, sw_prod);
@@ -1067,11 +1059,11 @@ static void bnxt_reuse_rx_agg_bufs(struct bnxt_cp_ring_info *cpr, u16 idx,
 		p5_tpa = true;
 
 	for (i = 0; i < agg_bufs; i++) {
-		u16 cons;
+		struct bnxt_sw_rx_bd *cons_rx_buf, *prod_rx_buf;
 		struct rx_agg_cmp *agg;
-		struct bnxt_sw_rx_agg_bd *cons_rx_buf, *prod_rx_buf;
 		struct rx_bd *prod_bd;
 		struct page *page;
+		u16 cons;
 
 		if (p5_tpa)
 			agg = bnxt_get_tpa_agg_p5(bp, rxr, idx, start + i);
@@ -1111,25 +1103,24 @@ static void bnxt_reuse_rx_agg_bufs(struct bnxt_cp_ring_info *cpr, u16 idx,
 
 static struct sk_buff *bnxt_rx_multi_page_skb(struct bnxt *bp,
 					      struct bnxt_rx_ring_info *rxr,
-					      u16 cons, void *data, u8 *data_ptr,
+					      u16 cons, struct page *page,
+					      unsigned int offset,
 					      dma_addr_t dma_addr,
 					      unsigned int offset_and_len)
 {
 	unsigned int len = offset_and_len & 0xffff;
-	struct page *page = data;
 	u16 prod = rxr->rx_prod;
 	struct sk_buff *skb;
 	int err;
 
 	err = bnxt_alloc_rx_data(bp, rxr, prod, GFP_ATOMIC);
 	if (unlikely(err)) {
-		bnxt_reuse_rx_data(rxr, cons, data);
+		bnxt_reuse_rx_data(rxr, cons, page);
 		return NULL;
 	}
-	dma_addr -= bp->rx_dma_offset;
-	dma_sync_single_for_cpu(&bp->pdev->dev, dma_addr, BNXT_RX_PAGE_SIZE,
-				bp->rx_dir);
-	skb = napi_build_skb(data_ptr - bp->rx_offset, BNXT_RX_PAGE_SIZE);
+	page_pool_dma_sync_for_cpu(rxr->page_pool, page, 0, BNXT_RX_PAGE_SIZE);
+
+	skb = napi_build_skb(bnxt_data(page, offset), BNXT_RX_PAGE_SIZE);
 	if (!skb) {
 		page_pool_recycle_direct(rxr->page_pool, page);
 		return NULL;
@@ -1143,26 +1134,26 @@ static struct sk_buff *bnxt_rx_multi_page_skb(struct bnxt *bp,
 
 static struct sk_buff *bnxt_rx_page_skb(struct bnxt *bp,
 					struct bnxt_rx_ring_info *rxr,
-					u16 cons, void *data, u8 *data_ptr,
+					u16 cons, struct page *page,
+					unsigned int offset,
 					dma_addr_t dma_addr,
 					unsigned int offset_and_len)
 {
 	unsigned int payload = offset_and_len >> 16;
 	unsigned int len = offset_and_len & 0xffff;
-	skb_frag_t *frag;
-	struct page *page = data;
 	u16 prod = rxr->rx_prod;
 	struct sk_buff *skb;
-	int off, err;
+	skb_frag_t *frag;
+	u8 *data_ptr;
+	int err;
 
 	err = bnxt_alloc_rx_data(bp, rxr, prod, GFP_ATOMIC);
 	if (unlikely(err)) {
-		bnxt_reuse_rx_data(rxr, cons, data);
+		bnxt_reuse_rx_data(rxr, cons, page);
 		return NULL;
 	}
-	dma_addr -= bp->rx_dma_offset;
-	dma_sync_single_for_cpu(&bp->pdev->dev, dma_addr, BNXT_RX_PAGE_SIZE,
-				bp->rx_dir);
+	data_ptr = bnxt_data_ptr(bp, page, offset);
+	page_pool_dma_sync_for_cpu(rxr->page_pool, page, 0, BNXT_RX_PAGE_SIZE);
 
 	if (unlikely(!payload))
 		payload = eth_get_headlen(bp->dev, data_ptr, len);
@@ -1174,8 +1165,8 @@ static struct sk_buff *bnxt_rx_page_skb(struct bnxt *bp,
 	}
 
 	skb_mark_for_recycle(skb);
-	off = (void *)data_ptr - page_address(page);
-	skb_add_rx_frag(skb, 0, page, off, len, BNXT_RX_PAGE_SIZE);
+	skb_add_rx_frag(skb, 0, page, bp->rx_offset + offset, len,
+			BNXT_RX_PAGE_SIZE);
 	memcpy(skb->data - NET_IP_ALIGN, data_ptr - NET_IP_ALIGN,
 	       payload + NET_IP_ALIGN);
 
@@ -1190,7 +1181,7 @@ static struct sk_buff *bnxt_rx_page_skb(struct bnxt *bp,
 
 static struct sk_buff *bnxt_rx_skb(struct bnxt *bp,
 				   struct bnxt_rx_ring_info *rxr, u16 cons,
-				   void *data, u8 *data_ptr,
+				   struct page *page, unsigned int offset,
 				   dma_addr_t dma_addr,
 				   unsigned int offset_and_len)
 {
@@ -1200,15 +1191,16 @@ static struct sk_buff *bnxt_rx_skb(struct bnxt *bp,
 
 	err = bnxt_alloc_rx_data(bp, rxr, prod, GFP_ATOMIC);
 	if (unlikely(err)) {
-		bnxt_reuse_rx_data(rxr, cons, data);
+		bnxt_reuse_rx_data(rxr, cons, page);
 		return NULL;
 	}
 
-	skb = napi_build_skb(data, bp->rx_buf_size);
-	dma_sync_single_for_cpu(&bp->pdev->dev, dma_addr, bp->rx_buf_use_size,
-				bp->rx_dir);
+	skb = napi_build_skb(bnxt_data(page, offset), bp->rx_buf_size);
+	page_pool_dma_sync_for_cpu(rxr->head_pool, page,
+				   offset + bp->rx_dma_offset,
+				   bp->rx_buf_use_size);
 	if (!skb) {
-		page_pool_free_va(rxr->head_pool, data, true);
+		page_pool_recycle_direct(rxr->head_pool, page);
 		return NULL;
 	}
 
@@ -1225,22 +1217,24 @@ static u32 __bnxt_rx_agg_pages(struct bnxt *bp,
 			       struct xdp_buff *xdp)
 {
 	struct bnxt_napi *bnapi = cpr->bnapi;
-	struct pci_dev *pdev = bp->pdev;
-	struct bnxt_rx_ring_info *rxr = bnapi->rx_ring;
-	u16 prod = rxr->rx_agg_prod;
+	struct bnxt_rx_ring_info *rxr;
 	u32 i, total_frag_len = 0;
 	bool p5_tpa = false;
+	u16 prod;
+
+	rxr = bnapi->rx_ring;
+	prod = rxr->rx_agg_prod;
 
 	if ((bp->flags & BNXT_FLAG_CHIP_P5_PLUS) && tpa)
 		p5_tpa = true;
 
 	for (i = 0; i < agg_bufs; i++) {
 		skb_frag_t *frag = &shinfo->frags[i];
-		u16 cons, frag_len;
+		struct bnxt_sw_rx_bd *cons_rx_buf;
 		struct rx_agg_cmp *agg;
-		struct bnxt_sw_rx_agg_bd *cons_rx_buf;
-		struct page *page;
+		u16 cons, frag_len;
 		dma_addr_t mapping;
+		struct page *page;
 
 		if (p5_tpa)
 			agg = bnxt_get_tpa_agg_p5(bp, rxr, idx, i);
@@ -1256,7 +1250,7 @@ static u32 __bnxt_rx_agg_pages(struct bnxt *bp,
 		shinfo->nr_frags = i + 1;
 		__clear_bit(cons, rxr->rx_agg_bmap);
 
-		/* It is possible for bnxt_alloc_rx_page() to allocate
+		/* It is possible for bnxt_alloc_rx_agg_page() to allocate
 		 * a sw_prod index that equals the cons index, so we
 		 * need to clear the cons entry now.
 		 */
@@ -1267,7 +1261,7 @@ static u32 __bnxt_rx_agg_pages(struct bnxt *bp,
 		if (xdp && page_is_pfmemalloc(page))
 			xdp_buff_set_frag_pfmemalloc(xdp);
 
-		if (bnxt_alloc_rx_page(bp, rxr, prod, GFP_ATOMIC) != 0) {
+		if (bnxt_alloc_rx_agg_page(bp, rxr, prod, GFP_ATOMIC) != 0) {
 			--shinfo->nr_frags;
 			cons_rx_buf->page = page;
 
@@ -1279,8 +1273,8 @@ static u32 __bnxt_rx_agg_pages(struct bnxt *bp,
 			return 0;
 		}
 
-		dma_sync_single_for_cpu(&pdev->dev, mapping, BNXT_RX_PAGE_SIZE,
-					bp->rx_dir);
+		page_pool_dma_sync_for_cpu(rxr->page_pool, page, 0,
+					   BNXT_RX_PAGE_SIZE);
 
 		total_frag_len += frag_len;
 		prod = NEXT_RX_AGG(prod);
@@ -1345,43 +1339,47 @@ static int bnxt_agg_bufs_valid(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 	return RX_AGG_CMP_VALID(agg, *raw_cons);
 }
 
-static struct sk_buff *bnxt_copy_data(struct bnxt_napi *bnapi, u8 *data,
-				      unsigned int len,
-				      dma_addr_t mapping)
+static struct sk_buff *bnxt_copy_data(struct bnxt_napi *bnapi,
+				      struct page *page,
+				      unsigned int offset,
+				      unsigned int len)
 {
+	struct bnxt_rx_ring_info *rxr = bnapi->rx_ring;
 	struct bnxt *bp = bnapi->bp;
-	struct pci_dev *pdev = bp->pdev;
 	struct sk_buff *skb;
 
 	skb = napi_alloc_skb(&bnapi->napi, len);
 	if (!skb)
 		return NULL;
 
-	dma_sync_single_for_cpu(&pdev->dev, mapping, bp->rx_copybreak,
-				bp->rx_dir);
+	page_pool_dma_sync_for_cpu(rxr->head_pool, page,
+				   offset + bp->rx_dma_offset,
+				   bp->rx_copybreak);
 
-	memcpy(skb->data - NET_IP_ALIGN, data - NET_IP_ALIGN,
+	memcpy(skb->data - NET_IP_ALIGN,
+	       bnxt_data_ptr(bp, page, offset) - NET_IP_ALIGN,
 	       len + NET_IP_ALIGN);
 
-	dma_sync_single_for_device(&pdev->dev, mapping, bp->rx_copybreak,
-				   bp->rx_dir);
-
+	page_pool_dma_sync_for_device(rxr->head_pool, page_to_netmem(page),
+				      bp->rx_dma_offset, bp->rx_copybreak);
 	skb_put(skb, len);
 
 	return skb;
 }
 
-static struct sk_buff *bnxt_copy_skb(struct bnxt_napi *bnapi, u8 *data,
-				     unsigned int len,
-				     dma_addr_t mapping)
+static struct sk_buff *bnxt_copy_skb(struct bnxt_napi *bnapi,
+				     struct page *page,
+				     unsigned int offset,
+				     unsigned int len)
 {
-	return bnxt_copy_data(bnapi, data, len, mapping);
+	return bnxt_copy_data(bnapi, page, offset, len);
 }
 
 static struct sk_buff *bnxt_copy_xdp(struct bnxt_napi *bnapi,
+				     struct page *page,
+				     unsigned int offset,
 				     struct xdp_buff *xdp,
-				     unsigned int len,
-				     dma_addr_t mapping)
+				     unsigned int len)
 {
 	unsigned int metasize = 0;
 	u8 *data = xdp->data;
@@ -1391,7 +1389,7 @@ static struct sk_buff *bnxt_copy_xdp(struct bnxt_napi *bnapi,
 	metasize = xdp->data - xdp->data_meta;
 	data = xdp->data_meta;
 
-	skb = bnxt_copy_data(bnapi, data, len, mapping);
+	skb = bnxt_copy_data(bnapi, page, offset, len);
 	if (!skb)
 		return skb;
 
@@ -1521,20 +1519,20 @@ static void bnxt_tpa_start(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
 		bnxt_sched_reset_rxr(bp, rxr);
 		return;
 	}
-	prod_rx_buf->data = tpa_info->data;
-	prod_rx_buf->data_ptr = tpa_info->data_ptr;
+	prod_rx_buf->page = tpa_info->bd.page;
+	prod_rx_buf->offset = tpa_info->bd.offset;
 
-	mapping = tpa_info->mapping;
+	mapping = tpa_info->bd.mapping;
 	prod_rx_buf->mapping = mapping;
 
 	prod_bd = &rxr->rx_desc_ring[RX_RING(bp, prod)][RX_IDX(prod)];
 
 	prod_bd->rx_bd_haddr = cpu_to_le64(mapping);
 
-	tpa_info->data = cons_rx_buf->data;
-	tpa_info->data_ptr = cons_rx_buf->data_ptr;
-	cons_rx_buf->data = NULL;
-	tpa_info->mapping = cons_rx_buf->mapping;
+	tpa_info->bd.page = cons_rx_buf->page;
+	tpa_info->bd.offset = cons_rx_buf->offset;
+	cons_rx_buf->page = NULL;
+	tpa_info->bd.mapping = cons_rx_buf->mapping;
 
 	tpa_info->len =
 		le32_to_cpu(tpa_start->rx_tpa_start_cmp_len_flags_type) >>
@@ -1568,9 +1566,9 @@ static void bnxt_tpa_start(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
 	rxr->rx_next_cons = RING_RX(bp, NEXT_RX(cons));
 	cons_rx_buf = &rxr->rx_buf_ring[cons];
 
-	bnxt_reuse_rx_data(rxr, cons, cons_rx_buf->data);
+	bnxt_reuse_rx_data(rxr, cons, cons_rx_buf->page);
 	rxr->rx_prod = NEXT_RX(rxr->rx_prod);
-	cons_rx_buf->data = NULL;
+	cons_rx_buf->page = NULL;
 }
 
 static void bnxt_abort_tpa(struct bnxt_cp_ring_info *cpr, u16 idx, u32 agg_bufs)
@@ -1796,13 +1794,13 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
 	struct bnxt_napi *bnapi = cpr->bnapi;
 	struct bnxt_rx_ring_info *rxr = bnapi->rx_ring;
 	struct net_device *dev = bp->dev;
-	u8 *data_ptr, agg_bufs;
-	unsigned int len;
 	struct bnxt_tpa_info *tpa_info;
-	dma_addr_t mapping;
+	unsigned int len, offset;
+	u8 *data_ptr, agg_bufs;
 	struct sk_buff *skb;
 	u16 idx = 0, agg_id;
-	void *data;
+	dma_addr_t mapping;
+	struct page *page;
 	bool gro;
 
 	if (unlikely(bnapi->in_reset)) {
@@ -1842,11 +1840,12 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
 		}
 		gro = !!TPA_END_GRO(tpa_end);
 	}
-	data = tpa_info->data;
-	data_ptr = tpa_info->data_ptr;
+	page = tpa_info->bd.page;
+	offset = tpa_info->bd.offset;
+	data_ptr = bnxt_data_ptr(bp, page, offset);
 	prefetch(data_ptr);
 	len = tpa_info->len;
-	mapping = tpa_info->mapping;
+	mapping = tpa_info->bd.mapping;
 
 	if (unlikely(agg_bufs > MAX_SKB_FRAGS || TPA_END_ERRORS(tpa_end1))) {
 		bnxt_abort_tpa(cpr, idx, agg_bufs);
@@ -1857,34 +1856,36 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
 	}
 
 	if (len <= bp->rx_copybreak) {
-		skb = bnxt_copy_skb(bnapi, data_ptr, len, mapping);
+		skb = bnxt_copy_skb(bnapi, page, offset, len);
 		if (!skb) {
 			bnxt_abort_tpa(cpr, idx, agg_bufs);
 			cpr->sw_stats->rx.rx_oom_discards += 1;
 			return NULL;
 		}
 	} else {
-		u8 *new_data;
+		unsigned int new_offset;
 		dma_addr_t new_mapping;
+		struct page *new_page;
 
-		new_data = __bnxt_alloc_rx_frag(bp, &new_mapping, rxr,
-						GFP_ATOMIC);
-		if (!new_data) {
+		new_page = __bnxt_alloc_rx_frag(bp, &new_mapping, rxr,
+						&new_offset, GFP_ATOMIC);
+		if (!new_page) {
 			bnxt_abort_tpa(cpr, idx, agg_bufs);
 			cpr->sw_stats->rx.rx_oom_discards += 1;
 			return NULL;
 		}
 
-		tpa_info->data = new_data;
-		tpa_info->data_ptr = new_data + bp->rx_offset;
-		tpa_info->mapping = new_mapping;
+		tpa_info->bd.page = new_page;
+		tpa_info->bd.offset = new_offset;
+		tpa_info->bd.mapping = new_mapping;
 
-		skb = napi_build_skb(data, bp->rx_buf_size);
-		dma_sync_single_for_cpu(&bp->pdev->dev, mapping,
-					bp->rx_buf_use_size, bp->rx_dir);
+		skb = napi_build_skb(bnxt_data(page, offset), bp->rx_buf_size);
+		page_pool_dma_sync_for_cpu(rxr->head_pool, page,
+					   offset + bp->rx_dma_offset,
+					   bp->rx_buf_use_size);
 
 		if (!skb) {
-			page_pool_free_va(rxr->head_pool, data, true);
+			page_pool_recycle_direct(rxr->head_pool, page);
 			bnxt_abort_tpa(cpr, idx, agg_bufs);
 			cpr->sw_stats->rx.rx_oom_discards += 1;
 			return NULL;
@@ -2047,25 +2048,28 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 		       u32 *raw_cons, u8 *event)
 {
 	struct bnxt_napi *bnapi = cpr->bnapi;
-	struct bnxt_rx_ring_info *rxr = bnapi->rx_ring;
 	struct net_device *dev = bp->dev;
-	struct rx_cmp *rxcmp;
-	struct rx_cmp_ext *rxcmp1;
-	u32 tmp_raw_cons = *raw_cons;
-	u16 cons, prod, cp_cons = RING_CMP(tmp_raw_cons);
+	u8 *data_ptr, agg_bufs, cmp_type;
+	struct bnxt_rx_ring_info *rxr;
 	struct skb_shared_info *sinfo;
+	u32 tmp_raw_cons = *raw_cons;
 	struct bnxt_sw_rx_bd *rx_buf;
-	unsigned int len;
-	u8 *data_ptr, agg_bufs, cmp_type;
+	struct rx_cmp_ext *rxcmp1;
+	unsigned int len, offset;
 	bool xdp_active = false;
+	u16 cons, prod, cp_cons;
+	struct rx_cmp *rxcmp;
 	dma_addr_t dma_addr;
 	struct sk_buff *skb;
 	struct xdp_buff xdp;
+	struct page *page;
 	u32 flags, misc;
 	u32 cmpl_ts;
-	void *data;
 	int rc = 0;
 
+	rxr = bnapi->rx_ring;
+	cp_cons = RING_CMP(tmp_raw_cons);
+
 	rxcmp = (struct rx_cmp *)
 			&cpr->cp_desc_ring[CP_RING(cp_cons)][CP_IDX(cp_cons)];
 
@@ -2130,8 +2134,9 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 		goto next_rx_no_prod_no_len;
 	}
 	rx_buf = &rxr->rx_buf_ring[cons];
-	data = rx_buf->data;
-	data_ptr = rx_buf->data_ptr;
+	page = rx_buf->page;
+	offset = rx_buf->offset;
+	data_ptr = bnxt_data_ptr(bp, page, offset);
 	prefetch(data_ptr);
 
 	misc = le32_to_cpu(rxcmp->rx_cmp_misc_v1);
@@ -2146,11 +2151,11 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 	}
 	*event |= BNXT_RX_EVENT;
 
-	rx_buf->data = NULL;
+	rx_buf->page = NULL;
 	if (rxcmp1->rx_cmp_cfa_code_errors_v2 & RX_CMP_L2_ERRORS) {
 		u32 rx_err = le32_to_cpu(rxcmp1->rx_cmp_cfa_code_errors_v2);
 
-		bnxt_reuse_rx_data(rxr, cons, data);
+		bnxt_reuse_rx_data(rxr, cons, page);
 		if (agg_bufs)
 			bnxt_reuse_rx_agg_bufs(cpr, cp_cons, 0, agg_bufs,
 					       false);
@@ -2173,7 +2178,7 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 	dma_addr = rx_buf->mapping;
 
 	if (bnxt_xdp_attached(bp, rxr)) {
-		bnxt_xdp_buff_init(bp, rxr, cons, data_ptr, len, &xdp);
+		bnxt_xdp_buff_init(bp, rxr, cons, page, len, &xdp);
 		if (agg_bufs) {
 			u32 frag_len = bnxt_rx_agg_pages_xdp(bp, cpr, &xdp,
 							     cp_cons, agg_bufs,
@@ -2186,7 +2191,7 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 	}
 
 	if (xdp_active) {
-		if (bnxt_rx_xdp(bp, rxr, cons, &xdp, data, &data_ptr, &len, event)) {
+		if (bnxt_rx_xdp(bp, rxr, cons, &xdp, page, &data_ptr, &len, event)) {
 			rc = 1;
 			goto next_rx;
 		}
@@ -2200,10 +2205,10 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 
 	if (len <= bp->rx_copybreak) {
 		if (!xdp_active)
-			skb = bnxt_copy_skb(bnapi, data_ptr, len, dma_addr);
+			skb = bnxt_copy_skb(bnapi, page, offset, len);
 		else
-			skb = bnxt_copy_xdp(bnapi, &xdp, len, dma_addr);
-		bnxt_reuse_rx_data(rxr, cons, data);
+			skb = bnxt_copy_xdp(bnapi, page, offset, &xdp, len);
+		bnxt_reuse_rx_data(rxr, cons, page);
 		if (!skb) {
 			if (agg_bufs) {
 				if (!xdp_active)
@@ -2217,11 +2222,11 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 	} else {
 		u32 payload;
 
-		if (rx_buf->data_ptr == data_ptr)
+		if (bnxt_data_ptr(bp, page, offset) == data_ptr)
 			payload = misc & RX_CMP_PAYLOAD_OFFSET;
 		else
 			payload = 0;
-		skb = bp->rx_skb_func(bp, rxr, cons, data, data_ptr, dma_addr,
+		skb = bp->rx_skb_func(bp, rxr, cons, page, offset, dma_addr,
 				      payload | len);
 		if (!skb)
 			goto oom_next_rx;
@@ -3424,16 +3429,13 @@ static void bnxt_free_one_rx_ring(struct bnxt *bp, struct bnxt_rx_ring_info *rxr
 
 	for (i = 0; i < max_idx; i++) {
 		struct bnxt_sw_rx_bd *rx_buf = &rxr->rx_buf_ring[i];
-		void *data = rx_buf->data;
+		struct page *page = rx_buf->page;
 
-		if (!data)
+		if (!page)
 			continue;
 
-		rx_buf->data = NULL;
-		if (BNXT_RX_PAGE_MODE(bp))
-			page_pool_recycle_direct(rxr->page_pool, data);
-		else
-			page_pool_free_va(rxr->head_pool, data, true);
+		rx_buf->page = NULL;
+		page_pool_recycle_direct(rxr->page_pool, page);
 	}
 }
 
@@ -3444,7 +3446,7 @@ static void bnxt_free_one_rx_agg_ring(struct bnxt *bp, struct bnxt_rx_ring_info
 	max_idx = bp->rx_agg_nr_pages * RX_DESC_CNT;
 
 	for (i = 0; i < max_idx; i++) {
-		struct bnxt_sw_rx_agg_bd *rx_agg_buf = &rxr->rx_agg_ring[i];
+		struct bnxt_sw_rx_bd *rx_agg_buf = &rxr->rx_agg_ring[i];
 		struct page *page = rx_agg_buf->page;
 
 		if (!page)
@@ -3464,13 +3466,13 @@ static void bnxt_free_one_tpa_info_data(struct bnxt *bp,
 
 	for (i = 0; i < bp->max_tpa; i++) {
 		struct bnxt_tpa_info *tpa_info = &rxr->rx_tpa[i];
-		u8 *data = tpa_info->data;
+		struct page *page = tpa_info->bd.page;
 
-		if (!data)
+		if (!page)
 			continue;
 
-		tpa_info->data = NULL;
-		page_pool_free_va(rxr->head_pool, data, false);
+		tpa_info->bd.page = NULL;
+		page_pool_put_full_page(rxr->head_pool, page, false);
 	}
 }
 
@@ -4330,7 +4332,7 @@ static void bnxt_alloc_one_rx_ring_page(struct bnxt *bp,
 
 	prod = rxr->rx_agg_prod;
 	for (i = 0; i < bp->rx_agg_ring_size; i++) {
-		if (bnxt_alloc_rx_page(bp, rxr, prod, GFP_KERNEL)) {
+		if (bnxt_alloc_rx_agg_page(bp, rxr, prod, GFP_KERNEL)) {
 			netdev_warn(bp->dev, "init'ed rx ring %d with %d/%d pages only\n",
 				    ring_nr, i, bp->rx_ring_size);
 			break;
@@ -4343,19 +4345,20 @@ static void bnxt_alloc_one_rx_ring_page(struct bnxt *bp,
 static int bnxt_alloc_one_tpa_info_data(struct bnxt *bp,
 					struct bnxt_rx_ring_info *rxr)
 {
+	unsigned int offset;
 	dma_addr_t mapping;
-	u8 *data;
+	struct page *page;
 	int i;
 
 	for (i = 0; i < bp->max_tpa; i++) {
-		data = __bnxt_alloc_rx_frag(bp, &mapping, rxr,
+		page = __bnxt_alloc_rx_frag(bp, &mapping, rxr, &offset,
 					    GFP_KERNEL);
-		if (!data)
+		if (!page)
 			return -ENOMEM;
 
-		rxr->rx_tpa[i].data = data;
-		rxr->rx_tpa[i].data_ptr = data + bp->rx_offset;
-		rxr->rx_tpa[i].mapping = mapping;
+		rxr->rx_tpa[i].bd.page = page;
+		rxr->rx_tpa[i].bd.offset = offset;
+		rxr->rx_tpa[i].bd.mapping = mapping;
 	}
 
 	return 0;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 21726cf56586..13db8b7bd4b7 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -807,7 +807,7 @@ struct nqe_cn {
 #define SW_RXBD_RING_SIZE (sizeof(struct bnxt_sw_rx_bd) * RX_DESC_CNT)
 #define HW_RXBD_RING_SIZE (sizeof(struct rx_bd) * RX_DESC_CNT)
 
-#define SW_RXBD_AGG_RING_SIZE (sizeof(struct bnxt_sw_rx_agg_bd) * RX_DESC_CNT)
+#define SW_RXBD_AGG_RING_SIZE (sizeof(struct bnxt_sw_rx_bd) * RX_DESC_CNT)
 
 #define SW_TXBD_RING_SIZE (sizeof(struct bnxt_sw_tx_bd) * TX_DESC_CNT)
 #define HW_TXBD_RING_SIZE (sizeof(struct tx_bd) * TX_DESC_CNT)
@@ -897,12 +897,6 @@ struct bnxt_sw_tx_bd {
 };
 
 struct bnxt_sw_rx_bd {
-	void			*data;
-	u8			*data_ptr;
-	dma_addr_t		mapping;
-};
-
-struct bnxt_sw_rx_agg_bd {
 	struct page		*page;
 	unsigned int		offset;
 	dma_addr_t		mapping;
@@ -1049,9 +1043,7 @@ struct bnxt_coal {
 };
 
 struct bnxt_tpa_info {
-	void			*data;
-	u8			*data_ptr;
-	dma_addr_t		mapping;
+	struct bnxt_sw_rx_bd	bd;
 	u16			len;
 	unsigned short		gso_type;
 	u32			flags2;
@@ -1102,7 +1094,7 @@ struct bnxt_rx_ring_info {
 	struct bnxt_sw_rx_bd	*rx_buf_ring;
 
 	struct rx_bd		*rx_agg_desc_ring[MAX_RX_AGG_PAGES];
-	struct bnxt_sw_rx_agg_bd	*rx_agg_ring;
+	struct bnxt_sw_rx_bd	*rx_agg_ring;
 
 	unsigned long		*rx_agg_bmap;
 	u16			rx_agg_bmap_size;
@@ -2350,7 +2342,8 @@ struct bnxt {
 
 	struct sk_buff *	(*rx_skb_func)(struct bnxt *,
 					       struct bnxt_rx_ring_info *,
-					       u16, void *, u8 *, dma_addr_t,
+					       u16, struct page *, unsigned int,
+					       dma_addr_t,
 					       unsigned int);
 
 	u16			max_tpa_v2;
@@ -2870,12 +2863,24 @@ static inline bool bnxt_sriov_cfg(struct bnxt *bp)
 #endif
 }
 
+static inline u8 *bnxt_data_ptr(struct bnxt *bp, struct page *page,
+				unsigned int offset)
+{
+	return page_address(page) + offset + bp->rx_offset;
+}
+
+static inline u8 *bnxt_data(struct page *page, unsigned int offset)
+{
+	return page_address(page) + offset;
+}
+
 extern const u16 bnxt_bstore_to_trace[];
 extern const u16 bnxt_lhint_arr[];
 
 int bnxt_alloc_rx_data(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
 		       u16 prod, gfp_t gfp);
-void bnxt_reuse_rx_data(struct bnxt_rx_ring_info *rxr, u16 cons, void *data);
+void bnxt_reuse_rx_data(struct bnxt_rx_ring_info *rxr, u16 cons,
+			struct page *page);
 u32 bnxt_fw_health_readl(struct bnxt *bp, int reg_idx);
 bool bnxt_bs_trace_avail(struct bnxt *bp, u16 type);
 void bnxt_set_tpa_flags(struct bnxt *bp);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 48dd5922e4dd..2938697aa381 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -4855,7 +4855,7 @@ static int bnxt_rx_loopback(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 		&cpr->cp_desc_ring[CP_RING(cp_cons)][CP_IDX(cp_cons)];
 	cons = rxcmp->rx_cmp_opaque;
 	rx_buf = &rxr->rx_buf_ring[cons];
-	data = rx_buf->data_ptr;
+	data = bnxt_data_ptr(bp, rx_buf->page, rx_buf->offset);
 	len = le32_to_cpu(rxcmp->rx_cmp_len_flags_type) >> RX_CMP_LEN_SHIFT;
 	if (len != pkt_size)
 		return -EIO;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index e675611777b5..1ba2ad7bf4b0 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -180,24 +180,12 @@ bool bnxt_xdp_attached(struct bnxt *bp, struct bnxt_rx_ring_info *rxr)
 }
 
 void bnxt_xdp_buff_init(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
-			u16 cons, u8 *data_ptr, unsigned int len,
+			u16 cons, struct page *page, unsigned int len,
 			struct xdp_buff *xdp)
 {
-	u32 buflen = BNXT_RX_PAGE_SIZE;
-	struct bnxt_sw_rx_bd *rx_buf;
-	struct pci_dev *pdev;
-	dma_addr_t mapping;
-	u32 offset;
-
-	pdev = bp->pdev;
-	rx_buf = &rxr->rx_buf_ring[cons];
-	offset = bp->rx_offset;
-
-	mapping = rx_buf->mapping - bp->rx_dma_offset;
-	dma_sync_single_for_cpu(&pdev->dev, mapping + offset, len, bp->rx_dir);
-
-	xdp_init_buff(xdp, buflen, &rxr->xdp_rxq);
-	xdp_prepare_buff(xdp, data_ptr - offset, offset, len, true);
+	page_pool_dma_sync_for_cpu(rxr->page_pool, page, bp->rx_offset, len);
+	xdp_init_buff(xdp, BNXT_RX_PAGE_SIZE, &rxr->xdp_rxq);
+	xdp_prepare_buff(xdp, page_address(page), bp->rx_offset, len, true);
 }
 
 void bnxt_xdp_buff_frags_free(struct bnxt_rx_ring_info *rxr,
@@ -284,8 +272,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 			return true;
 		}
 
-		dma_sync_single_for_device(&pdev->dev, mapping + offset, *len,
-					   bp->rx_dir);
+		page_pool_dma_sync_for_cpu(rxr->page_pool, page, offset, *len);
 
 		*event |= BNXT_TX_EVENT;
 		__bnxt_xmit_xdp(bp, txr, mapping + offset, *len,
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
index 220285e190fc..9592d04e0661 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
@@ -27,7 +27,7 @@ int bnxt_xdp_xmit(struct net_device *dev, int num_frames,
 bool bnxt_xdp_attached(struct bnxt *bp, struct bnxt_rx_ring_info *rxr);
 
 void bnxt_xdp_buff_init(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
-			u16 cons, u8 *data_ptr, unsigned int len,
+			u16 cons, struct page *page, unsigned int len,
 			struct xdp_buff *xdp);
 void bnxt_xdp_buff_frags_free(struct bnxt_rx_ring_info *rxr,
 			      struct xdp_buff *xdp);
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 36eb57d73abc..72e2960b2543 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -255,7 +255,9 @@ netmem_ref page_pool_alloc_frag_netmem(struct page_pool *pool,
 struct page_pool *page_pool_create(const struct page_pool_params *params);
 struct page_pool *page_pool_create_percpu(const struct page_pool_params *params,
 					  int cpuid);
-
+void page_pool_dma_sync_for_device(const struct page_pool *pool,
+				   netmem_ref netmem, u32 offset,
+				   u32 dma_sync_size);
 struct xdp_mem_info;
 
 #ifdef CONFIG_PAGE_POOL
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 7745ad924ae2..83b5883d3e8e 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -446,26 +446,29 @@ static netmem_ref __page_pool_get_cached(struct page_pool *pool)
 }
 
 static void __page_pool_dma_sync_for_device(const struct page_pool *pool,
-					    netmem_ref netmem,
+					    netmem_ref netmem, u32 offset,
 					    u32 dma_sync_size)
 {
 #if defined(CONFIG_HAS_DMA) && defined(CONFIG_DMA_NEED_SYNC)
 	dma_addr_t dma_addr = page_pool_get_dma_addr_netmem(netmem);
 
 	dma_sync_size = min(dma_sync_size, pool->p.max_len);
-	__dma_sync_single_for_device(pool->p.dev, dma_addr + pool->p.offset,
+	__dma_sync_single_for_device(pool->p.dev,
+				     dma_addr + pool->p.offset + offset,
 				     dma_sync_size, pool->p.dma_dir);
 #endif
 }
 
-static __always_inline void
-page_pool_dma_sync_for_device(const struct page_pool *pool,
-			      netmem_ref netmem,
-			      u32 dma_sync_size)
+void page_pool_dma_sync_for_device(const struct page_pool *pool,
+				   netmem_ref netmem,
+				   u32 offset,
+				   u32 dma_sync_size)
 {
 	if (pool->dma_sync && dma_dev_need_sync(pool->p.dev))
-		__page_pool_dma_sync_for_device(pool, netmem, dma_sync_size);
+		__page_pool_dma_sync_for_device(pool, netmem, offset,
+						dma_sync_size);
 }
+EXPORT_SYMBOL(page_pool_dma_sync_for_device);
 
 static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
 {
@@ -486,7 +489,7 @@ static bool page_pool_dma_map(struct page_pool *pool, netmem_ref netmem)
 	if (page_pool_set_dma_addr_netmem(netmem, dma))
 		goto unmap_failed;
 
-	page_pool_dma_sync_for_device(pool, netmem, pool->p.max_len);
+	page_pool_dma_sync_for_device(pool, netmem, 0, pool->p.max_len);
 
 	return true;
 
@@ -772,7 +775,7 @@ __page_pool_put_page(struct page_pool *pool, netmem_ref netmem,
 	if (likely(__page_pool_page_can_be_recycled(netmem))) {
 		/* Read barrier done in page_ref_count / READ_ONCE */
 
-		page_pool_dma_sync_for_device(pool, netmem, dma_sync_size);
+		page_pool_dma_sync_for_device(pool, netmem, 0, dma_sync_size);
 
 		if (allow_direct && page_pool_recycle_in_cache(netmem, pool))
 			return 0;
@@ -956,7 +959,7 @@ static netmem_ref page_pool_drain_frag(struct page_pool *pool,
 		return 0;
 
 	if (__page_pool_page_can_be_recycled(netmem)) {
-		page_pool_dma_sync_for_device(pool, netmem, -1);
+		page_pool_dma_sync_for_device(pool, netmem, 0, -1);
 		return netmem;
 	}
 
-- 
2.34.1



* [RFC net-next 2/2] eth: bnxt: add support rx side device memory TCP
  2025-03-31 11:47 [RFC net-next 0/2] eth: bnxt: Implement rx-side device memory Taehee Yoo
  2025-03-31 11:47 ` [RFC net-next 1/2] eth: bnxt: refactor buffer descriptor Taehee Yoo
@ 2025-03-31 11:47 ` Taehee Yoo
  2025-03-31 18:50   ` Jakub Kicinski
  2025-04-02 22:16   ` Mina Almasry
  1 sibling, 2 replies; 16+ messages in thread
From: Taehee Yoo @ 2025-03-31 11:47 UTC (permalink / raw)
  To: davem, kuba, pabeni, edumazet, andrew+netdev, horms, michael.chan,
	pavan.chebbi, ilias.apalodimas, dw, netdev
  Cc: ap420073, kuniyu, sdf, aleksander.lobakin

The bnxt_en driver already satisfies the requirement of device memory
TCP, which is HDS (header-data split).
So this patch implements rx-side device memory TCP for the bnxt_en
driver.
It only requires converting the page API to the netmem API:
`struct page` on the rx side is changed to `netmem_ref netmem` and the
corresponding functions are changed to their netmem variants.

It also passes the PP_FLAG_ALLOW_UNREADABLE_NETMEM flag in the
page_pool parameters.
Unreadable netmem is activated only when a user requests devmem TCP.

When netmem is activated, received data is unreadable; when netmem is
disabled, received data is readable.
Drivers don't need to handle both cases separately because the netmem
core API handles this properly.
So using the proper netmem API is enough for drivers.
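
For example, attaching an rx aggregation buffer uses the same call
regardless of whether the netmem is backed by host pages or device
memory; this is the pattern used in this patch, and the core keeps
unreadable payloads away from CPU accesses:

	/* works for both readable (page-backed) and unreadable
	 * (device memory) netmem
	 */
	skb_add_rx_frag_netmem(skb, i, cons_rx_buf->netmem,
			       cons_rx_buf->offset, frag_len,
			       BNXT_RX_PAGE_SIZE);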

Device memory TCP can be tested with
tools/testing/selftests/drivers/net/hw/ncdevmem.
This is tested with BCM57504-N425G and firmware version 232.0.155.8/pkg
232.1.132.8.

Tested-by: David Wei <dw@davidwei.uk>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 407 ++++++++++--------
 drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  18 +-
 .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c |   2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |  26 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h |   4 +-
 include/linux/netdevice.h                     |   1 +
 include/net/page_pool/helpers.h               |   6 +
 net/core/dev.c                                |   6 +
 8 files changed, 258 insertions(+), 212 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 198a42da3015..b3b4836862d3 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -893,46 +893,48 @@ static void bnxt_tx_int(struct bnxt *bp, struct bnxt_napi *bnapi, int budget)
 		bnapi->events &= ~BNXT_TX_CMP_EVENT;
 }
 
-static bool bnxt_separate_head_pool(void)
+static bool bnxt_separate_head_pool(struct bnxt_rx_ring_info *rxr)
 {
-	return PAGE_SIZE > BNXT_RX_PAGE_SIZE;
+	return rxr->need_head_pool || PAGE_SIZE > BNXT_RX_PAGE_SIZE;
 }
 
-static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
+static netmem_ref __bnxt_alloc_rx_netmem(struct bnxt *bp, dma_addr_t *mapping,
 					 struct bnxt_rx_ring_info *rxr,
 					 unsigned int *offset,
 					 gfp_t gfp)
 {
-	struct page *page;
+	netmem_ref netmem;
 
 	if (PAGE_SIZE > BNXT_RX_PAGE_SIZE) {
-		page = page_pool_dev_alloc_frag(rxr->page_pool, offset,
-						BNXT_RX_PAGE_SIZE);
+		netmem = page_pool_alloc_frag_netmem(rxr->page_pool, offset,
+						     BNXT_RX_PAGE_SIZE, gfp);
 	} else {
-		page = page_pool_dev_alloc_pages(rxr->page_pool);
+		netmem = page_pool_alloc_netmems(rxr->page_pool, gfp);
 		*offset = 0;
 	}
-	if (!page)
-		return NULL;
+	if (!netmem)
+		return 0;
 
-	*mapping = page_pool_get_dma_addr(page) + bp->rx_dma_offset + *offset;
-	return page;
+	*mapping = page_pool_get_dma_addr_netmem(netmem) + bp->rx_dma_offset +
+		   *offset;
+	return netmem;
 }
 
-static struct page *__bnxt_alloc_rx_frag(struct bnxt *bp, dma_addr_t *mapping,
-					 struct bnxt_rx_ring_info *rxr,
-					 unsigned int *offset,
-					 gfp_t gfp)
+static netmem_ref __bnxt_alloc_rx_frag(struct bnxt *bp, dma_addr_t *mapping,
+				       struct bnxt_rx_ring_info *rxr,
+				       unsigned int *offset,
+				       gfp_t gfp)
 {
-	struct page *page;
+	netmem_ref netmem;
 
-	page = page_pool_alloc_frag(rxr->head_pool, offset,
-				    bp->rx_buf_size, gfp);
-	if (!page)
-		return NULL;
+	netmem = page_pool_alloc_frag_netmem(rxr->head_pool, offset,
+					     bp->rx_buf_size, gfp);
+	if (!netmem)
+		return 0;
 
-	*mapping = page_pool_get_dma_addr(page) + bp->rx_dma_offset + *offset;
-	return page;
+	*mapping = page_pool_get_dma_addr_netmem(netmem) + bp->rx_dma_offset +
+		   *offset;
+	return netmem;
 }
 
 int bnxt_alloc_rx_data(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
@@ -942,16 +944,17 @@ int bnxt_alloc_rx_data(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
 	struct bnxt_sw_rx_bd *rx_buf = &rxr->rx_buf_ring[RING_RX(bp, prod)];
 	unsigned int offset;
 	dma_addr_t mapping;
-	struct page *page;
+	netmem_ref netmem;
 
 	if (BNXT_RX_PAGE_MODE(bp))
-		page = __bnxt_alloc_rx_page(bp, &mapping, rxr, &offset, gfp);
+		netmem = __bnxt_alloc_rx_netmem(bp, &mapping, rxr, &offset,
+						gfp);
 	else
-		page = __bnxt_alloc_rx_frag(bp, &mapping, rxr, &offset, gfp);
-	if (!page)
+		netmem = __bnxt_alloc_rx_frag(bp, &mapping, rxr, &offset, gfp);
+	if (!netmem)
 		return -ENOMEM;
 
-	rx_buf->page = page;
+	rx_buf->netmem = netmem;
 	rx_buf->offset = offset;
 	rx_buf->mapping = mapping;
 
@@ -959,8 +962,8 @@ int bnxt_alloc_rx_data(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
 	return 0;
 }
 
-void bnxt_reuse_rx_data(struct bnxt_rx_ring_info *rxr, u16 cons,
-			struct page *page)
+void bnxt_reuse_rx_netmem(struct bnxt_rx_ring_info *rxr, u16 cons,
+			  netmem_ref netmem)
 {
 	u16 prod = rxr->rx_prod;
 	struct bnxt_sw_rx_bd *cons_rx_buf, *prod_rx_buf;
@@ -970,7 +973,7 @@ void bnxt_reuse_rx_data(struct bnxt_rx_ring_info *rxr, u16 cons,
 	prod_rx_buf = &rxr->rx_buf_ring[RING_RX(bp, prod)];
 	cons_rx_buf = &rxr->rx_buf_ring[cons];
 
-	prod_rx_buf->page = page;
+	prod_rx_buf->netmem = netmem;
 	prod_rx_buf->offset = cons_rx_buf->offset;
 
 	prod_rx_buf->mapping = cons_rx_buf->mapping;
@@ -991,9 +994,9 @@ static inline u16 bnxt_find_next_agg_idx(struct bnxt_rx_ring_info *rxr, u16 idx)
 	return next;
 }
 
-static inline int bnxt_alloc_rx_agg_page(struct bnxt *bp,
-					 struct bnxt_rx_ring_info *rxr,
-					 u16 prod, gfp_t gfp)
+static inline int bnxt_alloc_rx_agg_netmem(struct bnxt *bp,
+					   struct bnxt_rx_ring_info *rxr,
+					   u16 prod, gfp_t gfp)
 {
 	struct rx_bd *rxbd =
 		&rxr->rx_agg_desc_ring[RX_AGG_RING(bp, prod)][RX_IDX(prod)];
@@ -1001,10 +1004,10 @@ static inline int bnxt_alloc_rx_agg_page(struct bnxt *bp,
 	struct bnxt_sw_rx_bd *rx_agg_buf;
 	unsigned int offset = 0;
 	dma_addr_t mapping;
-	struct page *page;
+	netmem_ref netmem;
 
-	page = __bnxt_alloc_rx_page(bp, &mapping, rxr, &offset, gfp);
-	if (!page)
+	netmem = __bnxt_alloc_rx_netmem(bp, &mapping, rxr, &offset, gfp);
+	if (!netmem)
 		return -ENOMEM;
 	mapping -= bp->rx_dma_offset;
 
@@ -1015,7 +1018,7 @@ static inline int bnxt_alloc_rx_agg_page(struct bnxt *bp,
 	rx_agg_buf = &rxr->rx_agg_ring[sw_prod];
 	rxr->rx_sw_agg_prod = RING_RX_AGG(bp, NEXT_RX_AGG(sw_prod));
 
-	rx_agg_buf->page = page;
+	rx_agg_buf->netmem = netmem;
 	rx_agg_buf->offset = offset;
 	rx_agg_buf->mapping = mapping;
 	rxbd->rx_bd_haddr = cpu_to_le64(mapping);
@@ -1062,7 +1065,7 @@ static void bnxt_reuse_rx_agg_bufs(struct bnxt_cp_ring_info *cpr, u16 idx,
 		struct bnxt_sw_rx_bd *cons_rx_buf, *prod_rx_buf;
 		struct rx_agg_cmp *agg;
 		struct rx_bd *prod_bd;
-		struct page *page;
+		netmem_ref netmem;
 		u16 cons;
 
 		if (p5_tpa)
@@ -1080,11 +1083,11 @@ static void bnxt_reuse_rx_agg_bufs(struct bnxt_cp_ring_info *cpr, u16 idx,
 		cons_rx_buf = &rxr->rx_agg_ring[cons];
 
 		/* It is possible for sw_prod to be equal to cons, so
-		 * set cons_rx_buf->page to NULL first.
+		 * set cons_rx_buf->netmem to 0 first.
 		 */
-		page = cons_rx_buf->page;
-		cons_rx_buf->page = NULL;
-		prod_rx_buf->page = page;
+		netmem = cons_rx_buf->netmem;
+		cons_rx_buf->netmem = 0;
+		prod_rx_buf->netmem = netmem;
 		prod_rx_buf->offset = cons_rx_buf->offset;
 
 		prod_rx_buf->mapping = cons_rx_buf->mapping;
@@ -1101,12 +1104,12 @@ static void bnxt_reuse_rx_agg_bufs(struct bnxt_cp_ring_info *cpr, u16 idx,
 	rxr->rx_sw_agg_prod = sw_prod;
 }
 
-static struct sk_buff *bnxt_rx_multi_page_skb(struct bnxt *bp,
-					      struct bnxt_rx_ring_info *rxr,
-					      u16 cons, struct page *page,
-					      unsigned int offset,
-					      dma_addr_t dma_addr,
-					      unsigned int offset_and_len)
+static struct sk_buff *bnxt_rx_multi_netmem_skb(struct bnxt *bp,
+						struct bnxt_rx_ring_info *rxr,
+						u16 cons, netmem_ref netmem,
+						unsigned int offset,
+						dma_addr_t dma_addr,
+						unsigned int offset_and_len)
 {
 	unsigned int len = offset_and_len & 0xffff;
 	u16 prod = rxr->rx_prod;
@@ -1115,14 +1118,15 @@ static struct sk_buff *bnxt_rx_multi_page_skb(struct bnxt *bp,
 
 	err = bnxt_alloc_rx_data(bp, rxr, prod, GFP_ATOMIC);
 	if (unlikely(err)) {
-		bnxt_reuse_rx_data(rxr, cons, page);
+		bnxt_reuse_rx_netmem(rxr, cons, netmem);
 		return NULL;
 	}
-	page_pool_dma_sync_for_cpu(rxr->page_pool, page, 0, BNXT_RX_PAGE_SIZE);
+	page_pool_dma_sync_netmem_for_cpu(rxr->page_pool, netmem, 0,
+					  BNXT_RX_PAGE_SIZE);
 
-	skb = napi_build_skb(bnxt_data(page, offset), BNXT_RX_PAGE_SIZE);
+	skb = napi_build_skb(bnxt_data(netmem, offset), BNXT_RX_PAGE_SIZE);
 	if (!skb) {
-		page_pool_recycle_direct(rxr->page_pool, page);
+		page_pool_recycle_direct_netmem(rxr->head_pool, netmem);
 		return NULL;
 	}
 	skb_mark_for_recycle(skb);
@@ -1132,12 +1136,12 @@ static struct sk_buff *bnxt_rx_multi_page_skb(struct bnxt *bp,
 	return skb;
 }
 
-static struct sk_buff *bnxt_rx_page_skb(struct bnxt *bp,
-					struct bnxt_rx_ring_info *rxr,
-					u16 cons, struct page *page,
-					unsigned int offset,
-					dma_addr_t dma_addr,
-					unsigned int offset_and_len)
+static struct sk_buff *bnxt_rx_netmem_skb(struct bnxt *bp,
+					  struct bnxt_rx_ring_info *rxr,
+					  u16 cons, netmem_ref netmem,
+					  unsigned int offset,
+					  dma_addr_t dma_addr,
+					  unsigned int offset_and_len)
 {
 	unsigned int payload = offset_and_len >> 16;
 	unsigned int len = offset_and_len & 0xffff;
@@ -1149,24 +1153,25 @@ static struct sk_buff *bnxt_rx_page_skb(struct bnxt *bp,
 
 	err = bnxt_alloc_rx_data(bp, rxr, prod, GFP_ATOMIC);
 	if (unlikely(err)) {
-		bnxt_reuse_rx_data(rxr, cons, page);
+		bnxt_reuse_rx_netmem(rxr, cons, netmem);
 		return NULL;
 	}
-	data_ptr = bnxt_data_ptr(bp, page, offset);
-	page_pool_dma_sync_for_cpu(rxr->page_pool, page, 0, BNXT_RX_PAGE_SIZE);
+	data_ptr = bnxt_data_ptr(bp, netmem, offset);
+	page_pool_dma_sync_netmem_for_cpu(rxr->page_pool, netmem, 0,
+					  BNXT_RX_PAGE_SIZE);
 
 	if (unlikely(!payload))
 		payload = eth_get_headlen(bp->dev, data_ptr, len);
 
 	skb = napi_alloc_skb(&rxr->bnapi->napi, payload);
 	if (!skb) {
-		page_pool_recycle_direct(rxr->page_pool, page);
+		page_pool_recycle_direct_netmem(rxr->head_pool, netmem);
 		return NULL;
 	}
 
 	skb_mark_for_recycle(skb);
-	skb_add_rx_frag(skb, 0, page, bp->rx_offset + offset, len,
-			BNXT_RX_PAGE_SIZE);
+	skb_add_rx_frag_netmem(skb, 0, netmem, bp->rx_offset + offset, len,
+			       BNXT_RX_PAGE_SIZE);
 	memcpy(skb->data - NET_IP_ALIGN, data_ptr - NET_IP_ALIGN,
 	       payload + NET_IP_ALIGN);
 
@@ -1181,7 +1186,7 @@ static struct sk_buff *bnxt_rx_page_skb(struct bnxt *bp,
 
 static struct sk_buff *bnxt_rx_skb(struct bnxt *bp,
 				   struct bnxt_rx_ring_info *rxr, u16 cons,
-				   struct page *page, unsigned int offset,
+				   netmem_ref netmem, unsigned int offset,
 				   dma_addr_t dma_addr,
 				   unsigned int offset_and_len)
 {
@@ -1191,16 +1196,17 @@ static struct sk_buff *bnxt_rx_skb(struct bnxt *bp,
 
 	err = bnxt_alloc_rx_data(bp, rxr, prod, GFP_ATOMIC);
 	if (unlikely(err)) {
-		bnxt_reuse_rx_data(rxr, cons, page);
+		bnxt_reuse_rx_netmem(rxr, cons, netmem);
 		return NULL;
 	}
 
-	skb = napi_build_skb(bnxt_data(page, offset), bp->rx_buf_size);
-	page_pool_dma_sync_for_cpu(rxr->head_pool, page,
-				   offset + bp->rx_dma_offset,
-				   bp->rx_buf_use_size);
+	skb = napi_build_skb(bnxt_data(netmem, offset), bp->rx_buf_size);
+	page_pool_dma_sync_netmem_for_cpu(rxr->head_pool, netmem,
+					  offset + bp->rx_dma_offset,
+					  bp->rx_buf_use_size);
+
 	if (!skb) {
-		page_pool_recycle_direct(rxr->head_pool, page);
+		page_pool_recycle_direct_netmem(rxr->head_pool, netmem);
 		return NULL;
 	}
 
@@ -1210,13 +1216,14 @@ static struct sk_buff *bnxt_rx_skb(struct bnxt *bp,
 	return skb;
 }
 
-static u32 __bnxt_rx_agg_pages(struct bnxt *bp,
-			       struct bnxt_cp_ring_info *cpr,
-			       struct skb_shared_info *shinfo,
-			       u16 idx, u32 agg_bufs, bool tpa,
-			       struct xdp_buff *xdp)
+static u32 __bnxt_rx_agg_netmems(struct bnxt *bp,
+				 struct bnxt_cp_ring_info *cpr,
+				 struct sk_buff *skb,
+				 u16 idx, u32 agg_bufs, bool tpa,
+				 struct xdp_buff *xdp)
 {
 	struct bnxt_napi *bnapi = cpr->bnapi;
+	struct skb_shared_info *shinfo;
 	struct bnxt_rx_ring_info *rxr;
 	u32 i, total_frag_len = 0;
 	bool p5_tpa = false;
@@ -1229,12 +1236,11 @@ static u32 __bnxt_rx_agg_pages(struct bnxt *bp,
 		p5_tpa = true;
 
 	for (i = 0; i < agg_bufs; i++) {
-		skb_frag_t *frag = &shinfo->frags[i];
 		struct bnxt_sw_rx_bd *cons_rx_buf;
 		struct rx_agg_cmp *agg;
 		u16 cons, frag_len;
 		dma_addr_t mapping;
-		struct page *page;
+		netmem_ref netmem;
 
 		if (p5_tpa)
 			agg = bnxt_get_tpa_agg_p5(bp, rxr, idx, i);
@@ -1245,25 +1251,42 @@ static u32 __bnxt_rx_agg_pages(struct bnxt *bp,
 			    RX_AGG_CMP_LEN) >> RX_AGG_CMP_LEN_SHIFT;
 
 		cons_rx_buf = &rxr->rx_agg_ring[cons];
-		skb_frag_fill_page_desc(frag, cons_rx_buf->page,
-					cons_rx_buf->offset, frag_len);
-		shinfo->nr_frags = i + 1;
+		if (skb) {
+			shinfo = skb_shinfo(skb);
+			skb_add_rx_frag_netmem(skb, i, cons_rx_buf->netmem,
+					       cons_rx_buf->offset,
+					       frag_len, BNXT_RX_PAGE_SIZE);
+		} else {
+			shinfo = xdp_get_shared_info_from_buff(xdp);
+			skb_frag_t *frag = &shinfo->frags[i];
+
+			skb_frag_fill_netmem_desc(frag, cons_rx_buf->netmem,
+						  cons_rx_buf->offset,
+						  frag_len);
+			shinfo->nr_frags = i + 1;
+		}
+
 		__clear_bit(cons, rxr->rx_agg_bmap);
 
-		/* It is possible for bnxt_alloc_rx_agg_page() to allocate
+		/* It is possible for bnxt_alloc_rx_agg_netmem() to allocate
 		 * a sw_prod index that equals the cons index, so we
 		 * need to clear the cons entry now.
 		 */
 		mapping = cons_rx_buf->mapping;
-		page = cons_rx_buf->page;
-		cons_rx_buf->page = NULL;
+		netmem = cons_rx_buf->netmem;
+		cons_rx_buf->netmem = 0;
 
-		if (xdp && page_is_pfmemalloc(page))
+		if (xdp && netmem_is_pfmemalloc(netmem))
 			xdp_buff_set_frag_pfmemalloc(xdp);
 
-		if (bnxt_alloc_rx_agg_page(bp, rxr, prod, GFP_ATOMIC) != 0) {
+		if (bnxt_alloc_rx_agg_netmem(bp, rxr, prod, GFP_ATOMIC) != 0) {
+			if (skb) {
+				skb->len -= frag_len;
+				skb->data_len -= frag_len;
+				skb->truesize -= BNXT_RX_PAGE_SIZE;
+			}
 			--shinfo->nr_frags;
-			cons_rx_buf->page = page;
+			cons_rx_buf->netmem = netmem;
 
 			/* Update prod since possibly some pages have been
 			 * allocated already.
@@ -1273,8 +1296,8 @@ static u32 __bnxt_rx_agg_pages(struct bnxt *bp,
 			return 0;
 		}
 
-		page_pool_dma_sync_for_cpu(rxr->page_pool, page, 0,
-					   BNXT_RX_PAGE_SIZE);
+		page_pool_dma_sync_netmem_for_cpu(rxr->page_pool, netmem, 0,
+						  BNXT_RX_PAGE_SIZE);
 
 		total_frag_len += frag_len;
 		prod = NEXT_RX_AGG(prod);
@@ -1283,32 +1306,24 @@ static u32 __bnxt_rx_agg_pages(struct bnxt *bp,
 	return total_frag_len;
 }
 
-static struct sk_buff *bnxt_rx_agg_pages_skb(struct bnxt *bp,
-					     struct bnxt_cp_ring_info *cpr,
-					     struct sk_buff *skb, u16 idx,
-					     u32 agg_bufs, bool tpa)
+static struct sk_buff *bnxt_rx_agg_netmems_skb(struct bnxt *bp,
+					       struct bnxt_cp_ring_info *cpr,
+					       struct sk_buff *skb, u16 idx,
+					       u32 agg_bufs, bool tpa)
 {
-	struct skb_shared_info *shinfo = skb_shinfo(skb);
-	u32 total_frag_len = 0;
-
-	total_frag_len = __bnxt_rx_agg_pages(bp, cpr, shinfo, idx,
-					     agg_bufs, tpa, NULL);
-	if (!total_frag_len) {
+	if (!__bnxt_rx_agg_netmems(bp, cpr, skb, idx, agg_bufs, tpa, NULL)) {
 		skb_mark_for_recycle(skb);
 		dev_kfree_skb(skb);
 		return NULL;
 	}
 
-	skb->data_len += total_frag_len;
-	skb->len += total_frag_len;
-	skb->truesize += BNXT_RX_PAGE_SIZE * agg_bufs;
 	return skb;
 }
 
-static u32 bnxt_rx_agg_pages_xdp(struct bnxt *bp,
-				 struct bnxt_cp_ring_info *cpr,
-				 struct xdp_buff *xdp, u16 idx,
-				 u32 agg_bufs, bool tpa)
+static u32 bnxt_rx_agg_netmems_xdp(struct bnxt *bp,
+				   struct bnxt_cp_ring_info *cpr,
+				   struct xdp_buff *xdp, u16 idx,
+				   u32 agg_bufs, bool tpa)
 {
 	struct skb_shared_info *shinfo = xdp_get_shared_info_from_buff(xdp);
 	u32 total_frag_len = 0;
@@ -1316,8 +1331,8 @@ static u32 bnxt_rx_agg_pages_xdp(struct bnxt *bp,
 	if (!xdp_buff_has_frags(xdp))
 		shinfo->nr_frags = 0;
 
-	total_frag_len = __bnxt_rx_agg_pages(bp, cpr, shinfo,
-					     idx, agg_bufs, tpa, xdp);
+	total_frag_len = __bnxt_rx_agg_netmems(bp, cpr, NULL,
+					       idx, agg_bufs, tpa, xdp);
 	if (total_frag_len) {
 		xdp_buff_set_frags_flag(xdp);
 		shinfo->nr_frags = agg_bufs;
@@ -1340,7 +1355,7 @@ static int bnxt_agg_bufs_valid(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 }
 
 static struct sk_buff *bnxt_copy_data(struct bnxt_napi *bnapi,
-				      struct page *page,
+				      netmem_ref netmem,
 				      unsigned int offset,
 				      unsigned int len)
 {
@@ -1352,15 +1367,15 @@ static struct sk_buff *bnxt_copy_data(struct bnxt_napi *bnapi,
 	if (!skb)
 		return NULL;
 
-	page_pool_dma_sync_for_cpu(rxr->head_pool, page,
-				   offset + bp->rx_dma_offset,
-				   bp->rx_copybreak);
+	page_pool_dma_sync_netmem_for_cpu(rxr->head_pool, netmem,
+					  offset + bp->rx_dma_offset,
+					  bp->rx_copybreak);
 
 	memcpy(skb->data - NET_IP_ALIGN,
-	       bnxt_data_ptr(bp, page, offset) - NET_IP_ALIGN,
+	       bnxt_data_ptr(bp, netmem, offset) - NET_IP_ALIGN,
 	       len + NET_IP_ALIGN);
 
-	page_pool_dma_sync_for_device(rxr->head_pool, page_to_netmem(page),
+	page_pool_dma_sync_for_device(rxr->head_pool, netmem,
 				      bp->rx_dma_offset, bp->rx_copybreak);
 	skb_put(skb, len);
 
@@ -1368,15 +1383,15 @@ static struct sk_buff *bnxt_copy_data(struct bnxt_napi *bnapi,
 }
 
 static struct sk_buff *bnxt_copy_skb(struct bnxt_napi *bnapi,
-				     struct page *page,
+				     netmem_ref netmem,
 				     unsigned int offset,
 				     unsigned int len)
 {
-	return bnxt_copy_data(bnapi, page, offset, len);
+	return bnxt_copy_data(bnapi, netmem, offset, len);
 }
 
 static struct sk_buff *bnxt_copy_xdp(struct bnxt_napi *bnapi,
-				     struct page *page,
+				     netmem_ref netmem,
 				     unsigned int offset,
 				     struct xdp_buff *xdp,
 				     unsigned int len)
@@ -1389,7 +1404,7 @@ static struct sk_buff *bnxt_copy_xdp(struct bnxt_napi *bnapi,
 	metasize = xdp->data - xdp->data_meta;
 	data = xdp->data_meta;
 
-	skb = bnxt_copy_data(bnapi, page, offset, len);
+	skb = bnxt_copy_data(bnapi, netmem, offset, len);
 	if (!skb)
 		return skb;
 
@@ -1519,7 +1534,7 @@ static void bnxt_tpa_start(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
 		bnxt_sched_reset_rxr(bp, rxr);
 		return;
 	}
-	prod_rx_buf->page = tpa_info->bd.page;
+	prod_rx_buf->netmem = tpa_info->bd.netmem;
 	prod_rx_buf->offset = tpa_info->bd.offset;
 
 	mapping = tpa_info->bd.mapping;
@@ -1529,9 +1544,9 @@ static void bnxt_tpa_start(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
 
 	prod_bd->rx_bd_haddr = cpu_to_le64(mapping);
 
-	tpa_info->bd.page = cons_rx_buf->page;
+	tpa_info->bd.netmem = cons_rx_buf->netmem;
 	tpa_info->bd.offset = cons_rx_buf->offset;
-	cons_rx_buf->page = NULL;
+	cons_rx_buf->netmem = 0;
 	tpa_info->bd.mapping = cons_rx_buf->mapping;
 
 	tpa_info->len =
@@ -1566,9 +1581,9 @@ static void bnxt_tpa_start(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
 	rxr->rx_next_cons = RING_RX(bp, NEXT_RX(cons));
 	cons_rx_buf = &rxr->rx_buf_ring[cons];
 
-	bnxt_reuse_rx_data(rxr, cons, cons_rx_buf->page);
+	bnxt_reuse_rx_netmem(rxr, cons, cons_rx_buf->netmem);
 	rxr->rx_prod = NEXT_RX(rxr->rx_prod);
-	cons_rx_buf->page = NULL;
+	cons_rx_buf->netmem = 0;
 }
 
 static void bnxt_abort_tpa(struct bnxt_cp_ring_info *cpr, u16 idx, u32 agg_bufs)
@@ -1800,7 +1815,7 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
 	struct sk_buff *skb;
 	u16 idx = 0, agg_id;
 	dma_addr_t mapping;
-	struct page *page;
+	netmem_ref netmem;
 	bool gro;
 
 	if (unlikely(bnapi->in_reset)) {
@@ -1840,9 +1855,9 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
 		}
 		gro = !!TPA_END_GRO(tpa_end);
 	}
-	page = tpa_info->bd.page;
+	netmem = tpa_info->bd.netmem;
 	offset = tpa_info->bd.offset;
-	data_ptr = bnxt_data_ptr(bp, page, offset);
+	data_ptr = bnxt_data_ptr(bp, netmem, offset);
 	prefetch(data_ptr);
 	len = tpa_info->len;
 	mapping = tpa_info->bd.mapping;
@@ -1856,7 +1871,7 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
 	}
 
 	if (len <= bp->rx_copybreak) {
-		skb = bnxt_copy_skb(bnapi, page, offset, len);
+		skb = bnxt_copy_skb(bnapi, netmem, offset, len);
 		if (!skb) {
 			bnxt_abort_tpa(cpr, idx, agg_bufs);
 			cpr->sw_stats->rx.rx_oom_discards += 1;
@@ -1865,27 +1880,27 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
 	} else {
 		unsigned int new_offset;
 		dma_addr_t new_mapping;
-		struct page *new_page;
+		netmem_ref new_netmem;
 
-		new_page = __bnxt_alloc_rx_frag(bp, &new_mapping, rxr,
-						&new_offset, GFP_ATOMIC);
-		if (!new_page) {
+		new_netmem = __bnxt_alloc_rx_frag(bp, &new_mapping, rxr,
+						  &new_offset, GFP_ATOMIC);
+		if (!new_netmem) {
 			bnxt_abort_tpa(cpr, idx, agg_bufs);
 			cpr->sw_stats->rx.rx_oom_discards += 1;
 			return NULL;
 		}
 
-		tpa_info->bd.page = new_page;
+		tpa_info->bd.netmem = new_netmem;
 		tpa_info->bd.offset = new_offset;
 		tpa_info->bd.mapping = new_mapping;
 
-		skb = napi_build_skb(bnxt_data(page, offset), bp->rx_buf_size);
-		page_pool_dma_sync_for_cpu(rxr->head_pool, page,
-					   offset + bp->rx_dma_offset,
-					   bp->rx_buf_use_size);
-
+		skb = napi_build_skb(bnxt_data(netmem, offset),
+				     bp->rx_buf_size);
+		page_pool_dma_sync_netmem_for_cpu(rxr->head_pool, netmem,
+						  offset + bp->rx_dma_offset,
+						  bp->rx_buf_use_size);
 		if (!skb) {
-			page_pool_recycle_direct(rxr->head_pool, page);
+			page_pool_recycle_direct_netmem(rxr->head_pool, netmem);
 			bnxt_abort_tpa(cpr, idx, agg_bufs);
 			cpr->sw_stats->rx.rx_oom_discards += 1;
 			return NULL;
@@ -1896,9 +1911,12 @@ static inline struct sk_buff *bnxt_tpa_end(struct bnxt *bp,
 	}
 
 	if (agg_bufs) {
-		skb = bnxt_rx_agg_pages_skb(bp, cpr, skb, idx, agg_bufs, true);
+		skb = bnxt_rx_agg_netmems_skb(bp, cpr, skb, idx, agg_bufs,
+					      true);
 		if (!skb) {
-			/* Page reuse already handled by bnxt_rx_pages(). */
+			/* Page reuse already handled by
+			 * bnxt_rx_agg_netmems_skb().
+			 */
 			cpr->sw_stats->rx.rx_oom_discards += 1;
 			return NULL;
 		}
@@ -2062,7 +2080,7 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 	dma_addr_t dma_addr;
 	struct sk_buff *skb;
 	struct xdp_buff xdp;
-	struct page *page;
+	netmem_ref netmem;
 	u32 flags, misc;
 	u32 cmpl_ts;
 	int rc = 0;
@@ -2134,9 +2152,9 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 		goto next_rx_no_prod_no_len;
 	}
 	rx_buf = &rxr->rx_buf_ring[cons];
-	page = rx_buf->page;
+	netmem = rx_buf->netmem;
 	offset = rx_buf->offset;
-	data_ptr = bnxt_data_ptr(bp, page, offset);
+	data_ptr = bnxt_data_ptr(bp, netmem, offset);
 	prefetch(data_ptr);
 
 	misc = le32_to_cpu(rxcmp->rx_cmp_misc_v1);
@@ -2151,11 +2169,11 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 	}
 	*event |= BNXT_RX_EVENT;
 
-	rx_buf->page = NULL;
+	rx_buf->netmem = 0;
 	if (rxcmp1->rx_cmp_cfa_code_errors_v2 & RX_CMP_L2_ERRORS) {
 		u32 rx_err = le32_to_cpu(rxcmp1->rx_cmp_cfa_code_errors_v2);
 
-		bnxt_reuse_rx_data(rxr, cons, page);
+		bnxt_reuse_rx_netmem(rxr, cons, netmem);
 		if (agg_bufs)
 			bnxt_reuse_rx_agg_bufs(cpr, cp_cons, 0, agg_bufs,
 					       false);
@@ -2178,11 +2196,12 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 	dma_addr = rx_buf->mapping;
 
 	if (bnxt_xdp_attached(bp, rxr)) {
-		bnxt_xdp_buff_init(bp, rxr, cons, page, len, &xdp);
+		bnxt_xdp_buff_init(bp, rxr, cons, netmem, len, &xdp);
 		if (agg_bufs) {
-			u32 frag_len = bnxt_rx_agg_pages_xdp(bp, cpr, &xdp,
-							     cp_cons, agg_bufs,
-							     false);
+			u32 frag_len = bnxt_rx_agg_netmems_xdp(bp, cpr, &xdp,
+							       cp_cons,
+							       agg_bufs,
+							       false);
 			if (!frag_len)
 				goto oom_next_rx;
 
@@ -2191,7 +2210,7 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 	}
 
 	if (xdp_active) {
-		if (bnxt_rx_xdp(bp, rxr, cons, &xdp, page, &data_ptr, &len, event)) {
+		if (bnxt_rx_xdp(bp, rxr, cons, &xdp, netmem, &data_ptr, &len, event)) {
 			rc = 1;
 			goto next_rx;
 		}
@@ -2205,10 +2224,10 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 
 	if (len <= bp->rx_copybreak) {
 		if (!xdp_active)
-			skb = bnxt_copy_skb(bnapi, page, offset, len);
+			skb = bnxt_copy_skb(bnapi, netmem, offset, len);
 		else
-			skb = bnxt_copy_xdp(bnapi, page, offset, &xdp, len);
-		bnxt_reuse_rx_data(rxr, cons, page);
+			skb = bnxt_copy_xdp(bnapi, netmem, offset, &xdp, len);
+		bnxt_reuse_rx_netmem(rxr, cons, netmem);
 		if (!skb) {
 			if (agg_bufs) {
 				if (!xdp_active)
@@ -2222,11 +2241,11 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 	} else {
 		u32 payload;
 
-		if (bnxt_data_ptr(bp, page, offset) == data_ptr)
+		if (bnxt_data_ptr(bp, netmem, offset) == data_ptr)
 			payload = misc & RX_CMP_PAYLOAD_OFFSET;
 		else
 			payload = 0;
-		skb = bp->rx_skb_func(bp, rxr, cons, page, offset, dma_addr,
+		skb = bp->rx_skb_func(bp, rxr, cons, netmem, offset, dma_addr,
 				      payload | len);
 		if (!skb)
 			goto oom_next_rx;
@@ -2234,7 +2253,8 @@ static int bnxt_rx_pkt(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 
 	if (agg_bufs) {
 		if (!xdp_active) {
-			skb = bnxt_rx_agg_pages_skb(bp, cpr, skb, cp_cons, agg_bufs, false);
+			skb = bnxt_rx_agg_netmems_skb(bp, cpr, skb, cp_cons,
+						      agg_bufs, false);
 			if (!skb)
 				goto oom_next_rx;
 		} else {
@@ -3429,13 +3449,13 @@ static void bnxt_free_one_rx_ring(struct bnxt *bp, struct bnxt_rx_ring_info *rxr
 
 	for (i = 0; i < max_idx; i++) {
 		struct bnxt_sw_rx_bd *rx_buf = &rxr->rx_buf_ring[i];
-		struct page *page = rx_buf->page;
+		netmem_ref netmem = rx_buf->netmem;
 
-		if (!page)
+		if (!netmem)
 			continue;
 
-		rx_buf->page = NULL;
-		page_pool_recycle_direct(rxr->page_pool, page);
+		rx_buf->netmem = 0;
+		page_pool_recycle_direct_netmem(rxr->head_pool, netmem);
 	}
 }
 
@@ -3447,15 +3467,15 @@ static void bnxt_free_one_rx_agg_ring(struct bnxt *bp, struct bnxt_rx_ring_info
 
 	for (i = 0; i < max_idx; i++) {
 		struct bnxt_sw_rx_bd *rx_agg_buf = &rxr->rx_agg_ring[i];
-		struct page *page = rx_agg_buf->page;
+		netmem_ref netmem = rx_agg_buf->netmem;
 
-		if (!page)
+		if (!netmem)
 			continue;
 
-		rx_agg_buf->page = NULL;
+		rx_agg_buf->netmem = 0;
 		__clear_bit(i, rxr->rx_agg_bmap);
 
-		page_pool_recycle_direct(rxr->page_pool, page);
+		page_pool_recycle_direct_netmem(rxr->page_pool, netmem);
 	}
 }
 
@@ -3466,13 +3486,13 @@ static void bnxt_free_one_tpa_info_data(struct bnxt *bp,
 
 	for (i = 0; i < bp->max_tpa; i++) {
 		struct bnxt_tpa_info *tpa_info = &rxr->rx_tpa[i];
-		struct page *page = tpa_info->bd.page;
+		netmem_ref netmem = tpa_info->bd.netmem;
 
-		if (!page)
+		if (!netmem)
 			continue;
 
-		tpa_info->bd.page = NULL;
-		page_pool_put_full_page(rxr->head_pool, page, false);
+		tpa_info->bd.netmem = 0;
+		page_pool_put_full_netmem(rxr->head_pool, netmem, false);
 	}
 }
 
@@ -3748,7 +3768,7 @@ static void bnxt_free_rx_rings(struct bnxt *bp)
 			xdp_rxq_info_unreg(&rxr->xdp_rxq);
 
 		page_pool_destroy(rxr->page_pool);
-		if (bnxt_separate_head_pool())
+		if (bnxt_separate_head_pool(rxr))
 			page_pool_destroy(rxr->head_pool);
 		rxr->page_pool = rxr->head_pool = NULL;
 
@@ -3763,9 +3783,9 @@ static void bnxt_free_rx_rings(struct bnxt *bp)
 	}
 }
 
-static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
-				   struct bnxt_rx_ring_info *rxr,
-				   int numa_node)
+static int bnxt_alloc_rx_netmem_pool(struct bnxt *bp,
+				     struct bnxt_rx_ring_info *rxr,
+				     int numa_node)
 {
 	struct page_pool_params pp = { 0 };
 	struct page_pool *pool;
@@ -3779,15 +3799,20 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
 	pp.dev = &bp->pdev->dev;
 	pp.dma_dir = bp->rx_dir;
 	pp.max_len = PAGE_SIZE;
-	pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
+	pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV |
+		   PP_FLAG_ALLOW_UNREADABLE_NETMEM;
+	pp.queue_idx = rxr->bnapi->index;
+	pp.order = 0;
 
 	pool = page_pool_create(&pp);
 	if (IS_ERR(pool))
 		return PTR_ERR(pool);
 	rxr->page_pool = pool;
 
-	if (bnxt_separate_head_pool()) {
+	rxr->need_head_pool = dev_is_mp_channel(bp->dev, rxr->bnapi->index);
+	if (bnxt_separate_head_pool(rxr)) {
 		pp.pool_size = max(bp->rx_ring_size, 1024);
+		pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
 		pool = page_pool_create(&pp);
 		if (IS_ERR(pool))
 			goto err_destroy_pp;
@@ -3837,7 +3862,7 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
 		cpu_node = cpu_to_node(cpu);
 		netdev_dbg(bp->dev, "Allocating page pool for rx_ring[%d] on numa_node: %d\n",
 			   i, cpu_node);
-		rc = bnxt_alloc_rx_page_pool(bp, rxr, cpu_node);
+		rc = bnxt_alloc_rx_netmem_pool(bp, rxr, cpu_node);
 		if (rc)
 			return rc;
 
@@ -4199,6 +4224,8 @@ static void bnxt_reset_rx_ring_struct(struct bnxt *bp,
 
 	rxr->page_pool->p.napi = NULL;
 	rxr->page_pool = NULL;
+	rxr->head_pool->p.napi = NULL;
+	rxr->head_pool = NULL;
 	memset(&rxr->xdp_rxq, 0, sizeof(struct xdp_rxq_info));
 
 	ring = &rxr->rx_ring_struct;
@@ -4332,7 +4359,7 @@ static void bnxt_alloc_one_rx_ring_page(struct bnxt *bp,
 
 	prod = rxr->rx_agg_prod;
 	for (i = 0; i < bp->rx_agg_ring_size; i++) {
-		if (bnxt_alloc_rx_agg_page(bp, rxr, prod, GFP_KERNEL)) {
+		if (bnxt_alloc_rx_agg_netmem(bp, rxr, prod, GFP_KERNEL)) {
 			netdev_warn(bp->dev, "init'ed rx ring %d with %d/%d pages only\n",
 				    ring_nr, i, bp->rx_ring_size);
 			break;
@@ -4347,16 +4374,16 @@ static int bnxt_alloc_one_tpa_info_data(struct bnxt *bp,
 {
 	unsigned int offset;
 	dma_addr_t mapping;
-	struct page *page;
+	netmem_ref netmem;
 	int i;
 
 	for (i = 0; i < bp->max_tpa; i++) {
-		page = __bnxt_alloc_rx_frag(bp, &mapping, rxr, &offset,
-					    GFP_KERNEL);
-		if (!page)
+		netmem = __bnxt_alloc_rx_frag(bp, &mapping, rxr, &offset,
+					      GFP_KERNEL);
+		if (!netmem)
 			return -ENOMEM;
 
-		rxr->rx_tpa[i].bd.page = page;
+		rxr->rx_tpa[i].bd.netmem = netmem;
 		rxr->rx_tpa[i].bd.offset = offset;
 		rxr->rx_tpa[i].bd.mapping = mapping;
 	}
@@ -4770,10 +4797,10 @@ static void __bnxt_set_rx_skb_mode(struct bnxt *bp, bool page_mode)
 				min_t(u16, bp->max_mtu, BNXT_MAX_PAGE_MODE_MTU);
 		if (dev->mtu > BNXT_MAX_PAGE_MODE_MTU) {
 			bp->flags |= BNXT_FLAG_JUMBO;
-			bp->rx_skb_func = bnxt_rx_multi_page_skb;
+			bp->rx_skb_func = bnxt_rx_multi_netmem_skb;
 		} else {
 			bp->flags |= BNXT_FLAG_NO_AGG_RINGS;
-			bp->rx_skb_func = bnxt_rx_page_skb;
+			bp->rx_skb_func = bnxt_rx_netmem_skb;
 		}
 		bp->rx_dir = DMA_BIDIRECTIONAL;
 	} else {
@@ -15711,8 +15738,9 @@ static int bnxt_queue_mem_alloc(struct net_device *dev, void *qmem, int idx)
 	clone->rx_agg_prod = 0;
 	clone->rx_sw_agg_prod = 0;
 	clone->rx_next_cons = 0;
+	clone->need_head_pool = false;
 
-	rc = bnxt_alloc_rx_page_pool(bp, clone, rxr->page_pool->p.nid);
+	rc = bnxt_alloc_rx_netmem_pool(bp, clone, rxr->page_pool->p.nid);
 	if (rc)
 		return rc;
 
@@ -15769,7 +15797,7 @@ static int bnxt_queue_mem_alloc(struct net_device *dev, void *qmem, int idx)
 	xdp_rxq_info_unreg(&clone->xdp_rxq);
 err_page_pool_destroy:
 	page_pool_destroy(clone->page_pool);
-	if (bnxt_separate_head_pool())
+	if (bnxt_separate_head_pool(rxr))
 		page_pool_destroy(clone->head_pool);
 	clone->page_pool = NULL;
 	clone->head_pool = NULL;
@@ -15788,7 +15816,7 @@ static void bnxt_queue_mem_free(struct net_device *dev, void *qmem)
 	xdp_rxq_info_unreg(&rxr->xdp_rxq);
 
 	page_pool_destroy(rxr->page_pool);
-	if (bnxt_separate_head_pool())
+	if (bnxt_separate_head_pool(rxr))
 		page_pool_destroy(rxr->head_pool);
 	rxr->page_pool = NULL;
 	rxr->head_pool = NULL;
@@ -15879,6 +15907,7 @@ static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx)
 	rxr->page_pool = clone->page_pool;
 	rxr->head_pool = clone->head_pool;
 	rxr->xdp_rxq = clone->xdp_rxq;
+	rxr->need_head_pool = clone->need_head_pool;
 
 	bnxt_copy_rx_ring(bp, rxr, clone);
 
@@ -15912,7 +15941,7 @@ static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx)
 			goto err_reset;
 	}
 
-	napi_enable(&bnapi->napi);
+	napi_enable_locked(&bnapi->napi);
 	bnxt_db_nq_arm(bp, &cpr->cp_db, cpr->cp_raw_cons);
 
 	for (i = 0; i < bp->nr_vnics; i++) {
@@ -15964,7 +15993,7 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
 	bnxt_hwrm_rx_ring_free(bp, rxr, false);
 	bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
 	page_pool_disable_direct_recycling(rxr->page_pool);
-	if (bnxt_separate_head_pool())
+	if (bnxt_separate_head_pool(rxr))
 		page_pool_disable_direct_recycling(rxr->head_pool);
 
 	if (bp->flags & BNXT_FLAG_SHARED_RINGS)
@@ -15974,7 +16003,7 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
 	 * completion is handled in NAPI to guarantee no more DMA on that ring
 	 * after seeing the completion.
 	 */
-	napi_disable(&bnapi->napi);
+	napi_disable_locked(&bnapi->napi);
 
 	if (bp->tph_mode) {
 		bnxt_hwrm_cp_ring_free(bp, rxr->rx_cpr);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 13db8b7bd4b7..9917a5e7c57e 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -897,7 +897,7 @@ struct bnxt_sw_tx_bd {
 };
 
 struct bnxt_sw_rx_bd {
-	struct page		*page;
+	netmem_ref		netmem;
 	unsigned int		offset;
 	dma_addr_t		mapping;
 };
@@ -1110,6 +1110,7 @@ struct bnxt_rx_ring_info {
 	struct xdp_rxq_info	xdp_rxq;
 	struct page_pool	*page_pool;
 	struct page_pool	*head_pool;
+	bool			need_head_pool;
 };
 
 struct bnxt_rx_sw_stats {
@@ -2342,7 +2343,8 @@ struct bnxt {
 
 	struct sk_buff *	(*rx_skb_func)(struct bnxt *,
 					       struct bnxt_rx_ring_info *,
-					       u16, struct page *, unsigned int,
+					       u16, netmem_ref netmem,
+					       unsigned int,
 					       dma_addr_t,
 					       unsigned int);
 
@@ -2863,15 +2865,15 @@ static inline bool bnxt_sriov_cfg(struct bnxt *bp)
 #endif
 }
 
-static inline u8 *bnxt_data_ptr(struct bnxt *bp, struct page *page,
+static inline u8 *bnxt_data_ptr(struct bnxt *bp, netmem_ref netmem,
 				unsigned int offset)
 {
-	return page_address(page) + offset + bp->rx_offset;
+	return netmem_address(netmem) + offset + bp->rx_offset;
 }
 
-static inline u8 *bnxt_data(struct page *page, unsigned int offset)
+static inline u8 *bnxt_data(netmem_ref netmem, unsigned int offset)
 {
-	return page_address(page) + offset;
+	return netmem_address(netmem) + offset;
 }
 
 extern const u16 bnxt_bstore_to_trace[];
@@ -2879,8 +2881,8 @@ extern const u16 bnxt_lhint_arr[];
 
 int bnxt_alloc_rx_data(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
 		       u16 prod, gfp_t gfp);
-void bnxt_reuse_rx_data(struct bnxt_rx_ring_info *rxr, u16 cons,
-			struct page *page);
+void bnxt_reuse_rx_netmem(struct bnxt_rx_ring_info *rxr, u16 cons,
+			  netmem_ref netmem);
 u32 bnxt_fw_health_readl(struct bnxt *bp, int reg_idx);
 bool bnxt_bs_trace_avail(struct bnxt *bp, u16 type);
 void bnxt_set_tpa_flags(struct bnxt *bp);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 2938697aa381..949872279f50 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -4855,7 +4855,7 @@ static int bnxt_rx_loopback(struct bnxt *bp, struct bnxt_cp_ring_info *cpr,
 		&cpr->cp_desc_ring[CP_RING(cp_cons)][CP_IDX(cp_cons)];
 	cons = rxcmp->rx_cmp_opaque;
 	rx_buf = &rxr->rx_buf_ring[cons];
-	data = bnxt_data_ptr(bp, rx_buf->page, rx_buf->offset);
+	data = bnxt_data_ptr(bp, rx_buf->netmem, rx_buf->offset);
 	len = le32_to_cpu(rxcmp->rx_cmp_len_flags_type) >> RX_CMP_LEN_SHIFT;
 	if (len != pkt_size)
 		return -EIO;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index 1ba2ad7bf4b0..e6f5d2a2bb60 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -180,12 +180,13 @@ bool bnxt_xdp_attached(struct bnxt *bp, struct bnxt_rx_ring_info *rxr)
 }
 
 void bnxt_xdp_buff_init(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
-			u16 cons, struct page *page, unsigned int len,
+			u16 cons, netmem_ref netmem, unsigned int len,
 			struct xdp_buff *xdp)
 {
-	page_pool_dma_sync_for_cpu(rxr->page_pool, page, bp->rx_offset, len);
+	page_pool_dma_sync_netmem_for_cpu(rxr->page_pool, netmem,
+					  bp->rx_offset, len);
 	xdp_init_buff(xdp, BNXT_RX_PAGE_SIZE, &rxr->xdp_rxq);
-	xdp_prepare_buff(xdp, page_address(page), bp->rx_offset, len, true);
+	xdp_prepare_buff(xdp, netmem_address(netmem), bp->rx_offset, len, true);
 }
 
 void bnxt_xdp_buff_frags_free(struct bnxt_rx_ring_info *rxr,
@@ -198,9 +199,9 @@ void bnxt_xdp_buff_frags_free(struct bnxt_rx_ring_info *rxr,
 		return;
 	shinfo = xdp_get_shared_info_from_buff(xdp);
 	for (i = 0; i < shinfo->nr_frags; i++) {
-		struct page *page = skb_frag_page(&shinfo->frags[i]);
+		netmem_ref netmem = skb_frag_netmem(&shinfo->frags[i]);
 
-		page_pool_recycle_direct(rxr->page_pool, page);
+		page_pool_recycle_direct_netmem(rxr->page_pool, netmem);
 	}
 	shinfo->nr_frags = 0;
 }
@@ -210,7 +211,7 @@ void bnxt_xdp_buff_frags_free(struct bnxt_rx_ring_info *rxr,
  * false   - packet should be passed to the stack.
  */
 bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
-		 struct xdp_buff *xdp, struct page *page, u8 **data_ptr,
+		 struct xdp_buff *xdp, netmem_ref netmem, u8 **data_ptr,
 		 unsigned int *len, u8 *event)
 {
 	struct bpf_prog *xdp_prog = READ_ONCE(rxr->xdp_prog);
@@ -268,16 +269,17 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 		if (tx_avail < tx_needed) {
 			trace_xdp_exception(bp->dev, xdp_prog, act);
 			bnxt_xdp_buff_frags_free(rxr, xdp);
-			bnxt_reuse_rx_data(rxr, cons, page);
+			bnxt_reuse_rx_netmem(rxr, cons, netmem);
 			return true;
 		}
 
-		page_pool_dma_sync_for_cpu(rxr->page_pool, page, offset, *len);
+		page_pool_dma_sync_netmem_for_cpu(rxr->page_pool, netmem,
+						  offset, *len);
 
 		*event |= BNXT_TX_EVENT;
 		__bnxt_xmit_xdp(bp, txr, mapping + offset, *len,
 				NEXT_RX(rxr->rx_prod), xdp);
-		bnxt_reuse_rx_data(rxr, cons, page);
+		bnxt_reuse_rx_netmem(rxr, cons, netmem);
 		return true;
 	case XDP_REDIRECT:
 		/* if we are calling this here then we know that the
@@ -289,13 +291,13 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 		if (bnxt_alloc_rx_data(bp, rxr, rxr->rx_prod, GFP_ATOMIC)) {
 			trace_xdp_exception(bp->dev, xdp_prog, act);
 			bnxt_xdp_buff_frags_free(rxr, xdp);
-			bnxt_reuse_rx_data(rxr, cons, page);
+			bnxt_reuse_rx_netmem(rxr, cons, netmem);
 			return true;
 		}
 
 		if (xdp_do_redirect(bp->dev, xdp, xdp_prog)) {
 			trace_xdp_exception(bp->dev, xdp_prog, act);
-			page_pool_recycle_direct(rxr->page_pool, page);
+			page_pool_recycle_direct_netmem(rxr->page_pool, netmem);
 			return true;
 		}
 
@@ -309,7 +311,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 		fallthrough;
 	case XDP_DROP:
 		bnxt_xdp_buff_frags_free(rxr, xdp);
-		bnxt_reuse_rx_data(rxr, cons, page);
+		bnxt_reuse_rx_netmem(rxr, cons, netmem);
 		break;
 	}
 	return true;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
index 9592d04e0661..85b6df6a9e7f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
@@ -18,7 +18,7 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
 				   struct xdp_buff *xdp);
 void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int budget);
 bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
-		 struct xdp_buff *xdp, struct page *page, u8 **data_ptr,
+		 struct xdp_buff *xdp, netmem_ref netmem, u8 **data_ptr,
 		 unsigned int *len, u8 *event);
 int bnxt_xdp(struct net_device *dev, struct netdev_bpf *xdp);
 int bnxt_xdp_xmit(struct net_device *dev, int num_frames,
@@ -27,7 +27,7 @@ int bnxt_xdp_xmit(struct net_device *dev, int num_frames,
 bool bnxt_xdp_attached(struct bnxt *bp, struct bnxt_rx_ring_info *rxr);
 
 void bnxt_xdp_buff_init(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
-			u16 cons, struct page *page, unsigned int len,
+			u16 cons, netmem_ref netmem, unsigned int len,
 			struct xdp_buff *xdp);
 void bnxt_xdp_buff_frags_free(struct bnxt_rx_ring_info *rxr,
 			      struct xdp_buff *xdp);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fa79145518d1..08aa6891e17b 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4225,6 +4225,7 @@ u8 dev_xdp_sb_prog_count(struct net_device *dev);
 u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode);
 
 u32 dev_get_min_mp_channel_count(const struct net_device *dev);
+bool dev_is_mp_channel(struct net_device *dev, int i);
 
 int __dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
 int dev_forward_skb(struct net_device *dev, struct sk_buff *skb);
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 582a3d00cbe2..9b7a3a996bbe 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -395,6 +395,12 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
 	page_pool_put_full_page(pool, page, true);
 }
 
+static inline void page_pool_recycle_direct_netmem(struct page_pool *pool,
+						   netmem_ref netmem)
+{
+	page_pool_put_full_netmem(pool, netmem, true);
+}
+
 #define PAGE_POOL_32BIT_ARCH_WITH_64BIT_DMA	\
 		(sizeof(dma_addr_t) > sizeof(unsigned long))
 
diff --git a/net/core/dev.c b/net/core/dev.c
index be17e0660144..f8cb53782dad 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10367,6 +10367,12 @@ u32 dev_get_min_mp_channel_count(const struct net_device *dev)
 	return 0;
 }
 
+bool dev_is_mp_channel(struct net_device *dev, int i)
+{
+	return !!dev->_rx[i].mp_params.mp_priv;
+}
+EXPORT_SYMBOL(dev_is_mp_channel);
+
 /**
  * dev_index_reserve() - allocate an ifindex in a namespace
  * @net: the applicable net namespace
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC net-next 1/2] eth: bnxt: refactor buffer descriptor
  2025-03-31 11:47 ` [RFC net-next 1/2] eth: bnxt: refactor buffer descriptor Taehee Yoo
@ 2025-03-31 17:34   ` Jakub Kicinski
  2025-04-01  6:48     ` Taehee Yoo
  2025-04-01  5:39   ` Michael Chan
  2025-04-01 15:20   ` David Wei
  2 siblings, 1 reply; 16+ messages in thread
From: Jakub Kicinski @ 2025-03-31 17:34 UTC (permalink / raw)
  To: Taehee Yoo
  Cc: davem, pabeni, edumazet, andrew+netdev, horms, michael.chan,
	pavan.chebbi, ilias.apalodimas, dw, netdev, kuniyu, sdf,
	aleksander.lobakin

On Mon, 31 Mar 2025 11:47:28 +0000 Taehee Yoo wrote:
> There are two kinds of buffer descriptors in bnxt, struct
> bnxt_sw_rx_bd and struct bnxt_sw_rx_agg_bd.(+ struct bnxt_tpa_info).
> The bnxt_sw_rx_bd is the bd for ring buffer, the bnxt_sw_rx_agg_bd is
> the bd for the aggregation ring buffer. The purpose of these bd are the
> same, but the structure is a little bit different.
> 
> struct bnxt_sw_rx_bd {
>         void *data;
>         u8 *data_ptr;
>         dma_addr_t mapping;
> };
> 
> struct bnxt_sw_rx_agg_bd {
>         struct page *page;
>         unsigned int offset;
>         dma_addr_t mapping;
> }
> 
> bnxt_sw_rx_bd->data would be either page pointer or page_address(page) +
> offset. Under page mode(xdp is set), data indicates page pointer,
> if not, it indicates virtual address.
> Before the recent head_pool work from Jakub, bnxt_sw_rx_bd->data was
> allocated by kmalloc().
> But after Jakub's work, bnxt_sw_rx_bd->data is allocated by page_pool.
> So, there is no reason to still keep handling virtual address anymore.
> The goal of this patch is to make bnxt_sw_rx_bd the same as
> the bnxt_sw_rx_agg_bd.
> By this change, we can easily use page_pool API like
> page_pool_dma_sync_for_{cpu | device}()
> Also, we can convert from page to the netmem very smoothly by this change.

LGTM, could you split this into two patches, tho?
One for the BD change and one for the syncing changes?

> -	dma_sync_single_for_device(&pdev->dev, mapping, bp->rx_copybreak,
> -				   bp->rx_dir);
> -
> +	page_pool_dma_sync_for_device(rxr->head_pool, page_to_netmem(page),
> +				      bp->rx_dma_offset, bp->rx_copybreak);

I think we should add a separate helper for this instead of extending
the existing page_pool_dma_sync_for_device(). Let's call it
page_pool_dma_sync_for_device_frag() ?

The use case here is that the driver recycles a frag directly, rather
than following the normal PP recycling path.
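
A minimal sketch of what such a helper could look like, purely as an
illustration (the name and exact signature are only proposals here;
nothing below exists in the page_pool API yet):

```
/* Hypothetical helper: sync a single frag back for the device when
 * the driver reuses it directly instead of letting the page_pool
 * recycling path do the sync. Whether the pool's own p.offset should
 * also be folded in is something review would need to decide.
 */
static inline void
page_pool_dma_sync_for_device_frag(const struct page_pool *pool,
				   netmem_ref netmem,
				   u32 offset, u32 dma_sync_size)
{
	dma_addr_t dma = page_pool_get_dma_addr_netmem(netmem);

	dma_sync_single_range_for_device(pool->p.dev, dma, offset,
					 dma_sync_size, pool->p.dma_dir);
}
```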

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC net-next 2/2] eth: bnxt: add support rx side device memory TCP
  2025-03-31 11:47 ` [RFC net-next 2/2] eth: bnxt: add support rx side device memory TCP Taehee Yoo
@ 2025-03-31 18:50   ` Jakub Kicinski
  2025-04-02 12:09     ` Taehee Yoo
  2025-04-02 22:11     ` Mina Almasry
  2025-04-02 22:16   ` Mina Almasry
  1 sibling, 2 replies; 16+ messages in thread
From: Jakub Kicinski @ 2025-03-31 18:50 UTC (permalink / raw)
  To: Taehee Yoo
  Cc: davem, pabeni, edumazet, andrew+netdev, horms, michael.chan,
	pavan.chebbi, ilias.apalodimas, dw, netdev, kuniyu, sdf,
	aleksander.lobakin

On Mon, 31 Mar 2025 11:47:29 +0000 Taehee Yoo wrote:
> Currently, bnxt_en driver satisfies the requirements of the Device
> memory TCP, which is HDS.
> So, it implements rx-side Device memory TCP for bnxt_en driver.
> It requires only converting the page API to netmem API.
> `struct page` for rx-size are changed to `netmem_ref netmem` and
> corresponding functions are changed to a variant of netmem API.
> 
> It also passes PP_FLAG_ALLOW_UNREADABLE_NETMEM flag to a parameter of
> page_pool.
> The netmem will be activated only when a user requests devmem TCP.
> 
> When netmem is activated, received data is unreadable and netmem is
> disabled, received data is readable.
> But drivers don't need to handle both cases because netmem core API will
> handle it properly.
> So, using proper netmem API is enough for drivers.

> @@ -1352,15 +1367,15 @@ static struct sk_buff *bnxt_copy_data(struct bnxt_napi *bnapi,
>  	if (!skb)
>  		return NULL;
>  
> -	page_pool_dma_sync_for_cpu(rxr->head_pool, page,
> -				   offset + bp->rx_dma_offset,
> -				   bp->rx_copybreak);
> +	page_pool_dma_sync_netmem_for_cpu(rxr->head_pool, netmem,
> +					  offset + bp->rx_dma_offset,
> +					  bp->rx_copybreak);
>  
>  	memcpy(skb->data - NET_IP_ALIGN,
> -	       bnxt_data_ptr(bp, page, offset) - NET_IP_ALIGN,
> +	       bnxt_data_ptr(bp, netmem, offset) - NET_IP_ALIGN,
>  	       len + NET_IP_ALIGN);
>  
> -	page_pool_dma_sync_for_device(rxr->head_pool, page_to_netmem(page),
> +	page_pool_dma_sync_for_device(rxr->head_pool, netmem,
>  				      bp->rx_dma_offset, bp->rx_copybreak);
>  	skb_put(skb, len);
>  

Do we check if rx copybreak is enabled before allowing ZC to be enabled?
We can't copybreak with unreadable memory.

> @@ -15912,7 +15941,7 @@ static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx)
>  			goto err_reset;
>  	}
>  
> -	napi_enable(&bnapi->napi);
> +	napi_enable_locked(&bnapi->napi);
>  	bnxt_db_nq_arm(bp, &cpr->cp_db, cpr->cp_raw_cons);
>  
>  	for (i = 0; i < bp->nr_vnics; i++) {
> @@ -15964,7 +15993,7 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
>  	bnxt_hwrm_rx_ring_free(bp, rxr, false);
>  	bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
>  	page_pool_disable_direct_recycling(rxr->page_pool);
> -	if (bnxt_separate_head_pool())
> +	if (bnxt_separate_head_pool(rxr))
>  		page_pool_disable_direct_recycling(rxr->head_pool);
>  
>  	if (bp->flags & BNXT_FLAG_SHARED_RINGS)
> @@ -15974,7 +16003,7 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
>  	 * completion is handled in NAPI to guarantee no more DMA on that ring
>  	 * after seeing the completion.
>  	 */
> -	napi_disable(&bnapi->napi);
> +	napi_disable_locked(&bnapi->napi);

This is a fix right? The IRQ code already calls the queue reset without
rtnl_lock. Let's split it up and submit for net.

> @@ -2863,15 +2865,15 @@ static inline bool bnxt_sriov_cfg(struct bnxt *bp)
>  #endif
>  }
>  
> -static inline u8 *bnxt_data_ptr(struct bnxt *bp, struct page *page,
> +static inline u8 *bnxt_data_ptr(struct bnxt *bp, netmem_ref netmem,
>  				unsigned int offset)
>  {
> -	return page_address(page) + offset + bp->rx_offset;
> +	return netmem_address(netmem) + offset + bp->rx_offset;
>  }
>  
> -static inline u8 *bnxt_data(struct page *page, unsigned int offset)
> +static inline u8 *bnxt_data(netmem_ref netmem, unsigned int offset)
>  {
> -	return page_address(page) + offset;
> +	return netmem_address(netmem) + offset;
>  }

This is not great, seems like the unification of normal vs agg bd struct
backfires here. unreadable netmem can only be populated in agg bds
right? So why don't we keep the structs separate and avoid the need
to convert from netmem back to a VA?

> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
> index 9592d04e0661..85b6df6a9e7f 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
> @@ -18,7 +18,7 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
>  				   struct xdp_buff *xdp);
>  void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int budget);
>  bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
> -		 struct xdp_buff *xdp, struct page *page, u8 **data_ptr,
> +		 struct xdp_buff *xdp, netmem_ref netmem, u8 **data_ptr,
>  		 unsigned int *len, u8 *event);
>  int bnxt_xdp(struct net_device *dev, struct netdev_bpf *xdp);
>  int bnxt_xdp_xmit(struct net_device *dev, int num_frames,
> @@ -27,7 +27,7 @@ int bnxt_xdp_xmit(struct net_device *dev, int num_frames,
>  bool bnxt_xdp_attached(struct bnxt *bp, struct bnxt_rx_ring_info *rxr);
>  
>  void bnxt_xdp_buff_init(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
> -			u16 cons, struct page *page, unsigned int len,
> +			u16 cons, netmem_ref netmem, unsigned int len,
>  			struct xdp_buff *xdp);

We also shouldn't pass netmem to XDP init, it's strange conceptually.
If we reach XDP it has to be a non-net_iov page.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC net-next 1/2] eth: bnxt: refactor buffer descriptor
  2025-03-31 11:47 ` [RFC net-next 1/2] eth: bnxt: refactor buffer descriptor Taehee Yoo
  2025-03-31 17:34   ` Jakub Kicinski
@ 2025-04-01  5:39   ` Michael Chan
  2025-04-01  7:17     ` Taehee Yoo
  2025-04-01 15:20   ` David Wei
  2 siblings, 1 reply; 16+ messages in thread
From: Michael Chan @ 2025-04-01  5:39 UTC (permalink / raw)
  To: Taehee Yoo
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, horms, pavan.chebbi,
	ilias.apalodimas, dw, netdev, kuniyu, sdf, aleksander.lobakin

On Mon, Mar 31, 2025 at 4:47 AM Taehee Yoo <ap420073@gmail.com> wrote:

> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 934ba9425857..198a42da3015 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> @@ -915,24 +915,24 @@ static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
>         if (!page)
>                 return NULL;
>
> -       *mapping = page_pool_get_dma_addr(page) + *offset;
> +       *mapping = page_pool_get_dma_addr(page) + bp->rx_dma_offset + *offset;

Why are we changing the logic here by adding bp->rx_dma_offset?
Please explain this and other similar offset changes in the rest of
the patch.  It may be more clear if you split this patch into smaller
patches.

> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> index 21726cf56586..13db8b7bd4b7 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> @@ -807,7 +807,7 @@ struct nqe_cn {
>  #define SW_RXBD_RING_SIZE (sizeof(struct bnxt_sw_rx_bd) * RX_DESC_CNT)
>  #define HW_RXBD_RING_SIZE (sizeof(struct rx_bd) * RX_DESC_CNT)
>
> -#define SW_RXBD_AGG_RING_SIZE (sizeof(struct bnxt_sw_rx_agg_bd) * RX_DESC_CNT)
> +#define SW_RXBD_AGG_RING_SIZE (sizeof(struct bnxt_sw_rx_bd) * RX_DESC_CNT)

I would just eliminate SW_RXBD_AGG_RING_SIZE since it is now identical
to SW_RXBD_RING_SIZE.
Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC net-next 1/2] eth: bnxt: refactor buffer descriptor
  2025-03-31 17:34   ` Jakub Kicinski
@ 2025-04-01  6:48     ` Taehee Yoo
  0 siblings, 0 replies; 16+ messages in thread
From: Taehee Yoo @ 2025-04-01  6:48 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, pabeni, edumazet, andrew+netdev, horms, michael.chan,
	pavan.chebbi, ilias.apalodimas, dw, netdev, kuniyu, sdf,
	aleksander.lobakin

On Tue, Apr 1, 2025 at 2:34 AM Jakub Kicinski <kuba@kernel.org> wrote:
>

Hi Jakub,
Thanks a lot for the review!

> On Mon, 31 Mar 2025 11:47:28 +0000 Taehee Yoo wrote:
> > There are two kinds of buffer descriptors in bnxt, struct
> > bnxt_sw_rx_bd and struct bnxt_sw_rx_agg_bd.(+ struct bnxt_tpa_info).
> > The bnxt_sw_rx_bd is the bd for ring buffer, the bnxt_sw_rx_agg_bd is
> > the bd for the aggregation ring buffer. The purpose of these bd are the
> > same, but the structure is a little bit different.
> >
> > struct bnxt_sw_rx_bd {
> >         void *data;
> >         u8 *data_ptr;
> >         dma_addr_t mapping;
> > };
> >
> > struct bnxt_sw_rx_agg_bd {
> >         struct page *page;
> >         unsigned int offset;
> >         dma_addr_t mapping;
> > }
> >
> > bnxt_sw_rx_bd->data would be either page pointer or page_address(page) +
> > offset. Under page mode(xdp is set), data indicates page pointer,
> > if not, it indicates virtual address.
> > Before the recent head_pool work from Jakub, bnxt_sw_rx_bd->data was
> > allocated by kmalloc().
> > But after Jakub's work, bnxt_sw_rx_bd->data is allocated by page_pool.
> > So, there is no reason to still keep handling virtual address anymore.
> > The goal of this patch is to make bnxt_sw_rx_bd the same as
> > the bnxt_sw_rx_agg_bd.
> > By this change, we can easily use page_pool API like
> > page_pool_dma_sync_for_{cpu | device}()
> > Also, we can convert from page to the netmem very smoothly by this change.
>
> LGTM, could you split this into two patches, tho?
> One for the BD change and one for the syncing changes?

Okay, I will split this patch.

>
> > -     dma_sync_single_for_device(&pdev->dev, mapping, bp->rx_copybreak,
> > -                                bp->rx_dir);
> > -
> > +     page_pool_dma_sync_for_device(rxr->head_pool, page_to_netmem(page),
> > +                                   bp->rx_dma_offset, bp->rx_copybreak);
>
> I think we should add a separate helper for this instead of extending
> the existing page_pool_dma_sync_for_device(). Let's call it
> page_pool_dma_sync_for_device_frag() ?

Thanks. I will add a separate helper,
the page_pool_dma_sync_for_device_frag().

>
> The use case here is that the driver recycles a frag directly, rather
> than following the normal PP recycling path.

I'm not sure that I understand this use case correctly.
I think it covers the case where a packet is dropped or fully
copied via rx-copybreak, so the driver can reuse the frag directly.
To reuse a frag directly, the driver can call
page_pool_dma_sync_for_device_frag() before putting the buffer back
on the ring, roughly as sketched below.
If I've misunderstood, please let me know.
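
Sketch of what I mean (the helper itself does not exist yet; the call
mirrors what bnxt_copy_data() already does in this series before the
buffer is reposted):

```
	/* packet was dropped or fully copied via rx-copybreak: sync the
	 * frag back for the device and reuse it on the ring directly,
	 * instead of going through the page_pool recycling path.
	 */
	page_pool_dma_sync_for_device_frag(rxr->head_pool, netmem,
					   bp->rx_dma_offset,
					   bp->rx_copybreak);
	bnxt_reuse_rx_netmem(rxr, cons, netmem);
```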

Thanks a lot!
Taehee Yoo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC net-next 1/2] eth: bnxt: refactor buffer descriptor
  2025-04-01  5:39   ` Michael Chan
@ 2025-04-01  7:17     ` Taehee Yoo
  2025-04-01 15:22       ` David Wei
  0 siblings, 1 reply; 16+ messages in thread
From: Taehee Yoo @ 2025-04-01  7:17 UTC (permalink / raw)
  To: Michael Chan
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, horms, pavan.chebbi,
	ilias.apalodimas, dw, netdev, kuniyu, sdf, aleksander.lobakin

On Tue, Apr 1, 2025 at 2:39 PM Michael Chan <michael.chan@broadcom.com> wrote:
>

Hi Michael,
Thanks a lot for the review!

> On Mon, Mar 31, 2025 at 4:47 AM Taehee Yoo <ap420073@gmail.com> wrote:
>
> > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > index 934ba9425857..198a42da3015 100644
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> > @@ -915,24 +915,24 @@ static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
> >         if (!page)
> >                 return NULL;
> >
> > -       *mapping = page_pool_get_dma_addr(page) + *offset;
> > +       *mapping = page_pool_get_dma_addr(page) + bp->rx_dma_offset + *offset;
>
> Why are we changing the logic here by adding bp->rx_dma_offset?
> Please explain this and other similar offset changes in the rest of
> the patch.  It may be more clear if you split this patch into smaller
> patches.

Apologies for a lack of explanation.
This change intends to make the two functions similar.
__bnxt_alloc_rx_page() and __bnxt_alloc_rx_frag().

Original code like this.
```
    __bnxt_alloc_rx_page()
        *mapping = page_pool_get_dma_addr(page) + *offset;
    __bnxt_alloc_rx_frag()
        *mapping = page_pool_get_dma_addr(page) + bp->rx_dma_offset + offset;

Then, we use a mapping value like below.
    bnxt_alloc_rx_data()
        if (BNXT_RX_PAGE_MODE(bp)) {
            ...
            mapping += bp->rx_dma_offset;
        }

    rx_buf->mapping = mapping;

    bnxt_alloc_rx_page()
        page = __bnxt_alloc_rx_page();
        // no mapping offset change.
```

So I changed this logic like below.
```
    __bnxt_alloc_rx_page()
        *mapping = page_pool_get_dma_addr(page) + bp->rx_dma_offset + *offset;
    __bnxt_alloc_rx_frag()
        *mapping = page_pool_get_dma_addr(page) + bp->rx_dma_offset + *offset;

    bnxt_alloc_rx_data()
        // no mapping offset change.
        rx_buf->mapping = mapping;

    bnxt_alloc_rx_page()
        page = __bnxt_alloc_rx_page();
        mapping -= bp->rx_dma_offset; //added for this change.
```

However, including this change is not necessary for this patchset.
Moreover, it makes the patch harder to review.
Therefore, as you mentioned, I would like to drop this change for now
and submit a separate patch for it later.

Also, if you think I missed other points, please let me know!

>
> > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> > index 21726cf56586..13db8b7bd4b7 100644
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
> > @@ -807,7 +807,7 @@ struct nqe_cn {
> >  #define SW_RXBD_RING_SIZE (sizeof(struct bnxt_sw_rx_bd) * RX_DESC_CNT)
> >  #define HW_RXBD_RING_SIZE (sizeof(struct rx_bd) * RX_DESC_CNT)
> >
> > -#define SW_RXBD_AGG_RING_SIZE (sizeof(struct bnxt_sw_rx_agg_bd) * RX_DESC_CNT)
> > +#define SW_RXBD_AGG_RING_SIZE (sizeof(struct bnxt_sw_rx_bd) * RX_DESC_CNT)
>
> I would just eliminate SW_RXBD_AGG_RING_SIZE since it is now identical
> to SW_RXBD_RING_SIZE.
> Thanks.

Okay, I will remove SW_RXBD_AGG_RING_SIZE and then use
SW_RXBD_RING_SIZE instead.

Thanks a lot!
Taehee Yoo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC net-next 1/2] eth: bnxt: refactor buffer descriptor
  2025-03-31 11:47 ` [RFC net-next 1/2] eth: bnxt: refactor buffer descriptor Taehee Yoo
  2025-03-31 17:34   ` Jakub Kicinski
  2025-04-01  5:39   ` Michael Chan
@ 2025-04-01 15:20   ` David Wei
  2 siblings, 0 replies; 16+ messages in thread
From: David Wei @ 2025-04-01 15:20 UTC (permalink / raw)
  To: Taehee Yoo, davem, kuba, pabeni, edumazet, andrew+netdev, horms,
	michael.chan, pavan.chebbi, ilias.apalodimas, netdev
  Cc: kuniyu, sdf, aleksander.lobakin

On 2025-03-31 04:47, Taehee Yoo wrote:
> There are two kinds of buffer descriptors in bnxt, struct
> bnxt_sw_rx_bd and struct bnxt_sw_rx_agg_bd.(+ struct bnxt_tpa_info).
> The bnxt_sw_rx_bd is the bd for ring buffer, the bnxt_sw_rx_agg_bd is
> the bd for the aggregation ring buffer. The purpose of these bd are the
> same, but the structure is a little bit different.
> 
> struct bnxt_sw_rx_bd {
>         void *data;
>         u8 *data_ptr;
>         dma_addr_t mapping;
> };
> 
> struct bnxt_sw_rx_agg_bd {
>         struct page *page;
>         unsigned int offset;
>         dma_addr_t mapping;
> }
> 
> bnxt_sw_rx_bd->data would be either page pointer or page_address(page) +
> offset. Under page mode(xdp is set), data indicates page pointer,
> if not, it indicates virtual address.
> Before the recent head_pool work from Jakub, bnxt_sw_rx_bd->data was
> allocated by kmalloc().
> But after Jakub's work, bnxt_sw_rx_bd->data is allocated by page_pool.
> So, there is no reason to still keep handling virtual address anymore.
> The goal of this patch is to make bnxt_sw_rx_bd the same as
> the bnxt_sw_rx_agg_bd.
> By this change, we can easily use page_pool API like
> page_pool_dma_sync_for_{cpu | device}()
> Also, we can convert from page to the netmem very smoothly by this change.
> 
> Tested-by: David Wei <dw@davidwei.uk>
> Signed-off-by: Taehee Yoo <ap420073@gmail.com>
> ---
>  drivers/net/ethernet/broadcom/bnxt/bnxt.c     | 313 +++++++++---------
>  drivers/net/ethernet/broadcom/bnxt/bnxt.h     |  31 +-
>  .../net/ethernet/broadcom/bnxt/bnxt_ethtool.c |   2 +-
>  drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |  23 +-
>  drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h |   2 +-
>  include/net/page_pool/types.h                 |   4 +-
>  net/core/page_pool.c                          |  23 +-
>  7 files changed, 199 insertions(+), 199 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> index 934ba9425857..198a42da3015 100644
> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
...
> @@ -1111,25 +1103,24 @@ static void bnxt_reuse_rx_agg_bufs(struct bnxt_cp_ring_info *cpr, u16 idx,
>  
>  static struct sk_buff *bnxt_rx_multi_page_skb(struct bnxt *bp,
>  					      struct bnxt_rx_ring_info *rxr,
> -					      u16 cons, void *data, u8 *data_ptr,
> +					      u16 cons, struct page *page,
> +					      unsigned int offset,
>  					      dma_addr_t dma_addr,
>  					      unsigned int offset_and_len)
>  {
>  	unsigned int len = offset_and_len & 0xffff;
> -	struct page *page = data;
>  	u16 prod = rxr->rx_prod;
>  	struct sk_buff *skb;
>  	int err;
>  
>  	err = bnxt_alloc_rx_data(bp, rxr, prod, GFP_ATOMIC);
>  	if (unlikely(err)) {
> -		bnxt_reuse_rx_data(rxr, cons, data);
> +		bnxt_reuse_rx_data(rxr, cons, page);
>  		return NULL;
>  	}
> -	dma_addr -= bp->rx_dma_offset;
> -	dma_sync_single_for_cpu(&bp->pdev->dev, dma_addr, BNXT_RX_PAGE_SIZE,
> -				bp->rx_dir);
> -	skb = napi_build_skb(data_ptr - bp->rx_offset, BNXT_RX_PAGE_SIZE);
> +	page_pool_dma_sync_for_cpu(rxr->page_pool, page, 0, BNXT_RX_PAGE_SIZE);

bnxt_alloc_rx_data() can allocate out of both head_pool and page_pool,
but page_pool_dma_sync_for_cpu() always takes page_pool. The only
difference is pool->p.offset; can this ever be different between the
two?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC net-next 1/2] eth: bnxt: refactor buffer descriptor
  2025-04-01  7:17     ` Taehee Yoo
@ 2025-04-01 15:22       ` David Wei
  0 siblings, 0 replies; 16+ messages in thread
From: David Wei @ 2025-04-01 15:22 UTC (permalink / raw)
  To: Taehee Yoo, Michael Chan
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, horms, pavan.chebbi,
	ilias.apalodimas, netdev, kuniyu, sdf, aleksander.lobakin

On 2025-04-01 00:17, Taehee Yoo wrote:
> On Tue, Apr 1, 2025 at 2:39 PM Michael Chan <michael.chan@broadcom.com> wrote:
>>
> 
> Hi Michael,
> Thanks a lot for the review!
> 
>> On Mon, Mar 31, 2025 at 4:47 AM Taehee Yoo <ap420073@gmail.com> wrote:
>>
>>> diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>> index 934ba9425857..198a42da3015 100644
>>> --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>> +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
>>> @@ -915,24 +915,24 @@ static struct page *__bnxt_alloc_rx_page(struct bnxt *bp, dma_addr_t *mapping,
>>>         if (!page)
>>>                 return NULL;
>>>
>>> -       *mapping = page_pool_get_dma_addr(page) + *offset;
>>> +       *mapping = page_pool_get_dma_addr(page) + bp->rx_dma_offset + *offset;
>>
>> Why are we changing the logic here by adding bp->rx_dma_offset?
>> Please explain this and other similar offset changes in the rest of
>> the patch.  It may be more clear if you split this patch into smaller
>> patches.
> 
> Apologies for a lack of explanation.
> This change intends to make the two functions similar.
> __bnxt_alloc_rx_page() and __bnxt_alloc_rx_frag().
> 
> Original code like this.
> ```
>     __bnxt_alloc_rx_page()
>         *mapping = page_pool_get_dma_addr(page) + *offset;
>     __bnxt_alloc_rx_frag()
>         *mapping = page_pool_get_dma_addr(page) + bp->rx_dma_offset + offset;
> 
> Then, we use a mapping value like below.
>     bnxt_alloc_rx_data()
>         if (BNXT_RX_PAGE_MODE(bp)) {
>             ...
>             mapping += bp->rx_dma_offset;
>         }
> 
>     rx_buf->mapping = mapping;
> 
>     bnxt_alloc_rx_page()
>         page = __bnxt_alloc_rx_page();
>         // no mapping offset change.
> ```
> 
> So I changed this logic like below.
> ```
>     __bnxt_alloc_rx_page()
>         *mapping = page_pool_get_dma_addr(page) + bp->rx_dma_offset + *offset;
>     __bnxt_alloc_rx_frag()
>         *mapping = page_pool_get_dma_addr(page) + bp->rx_dma_offset + *offset;
> 
>     bnxt_alloc_rx_data()
>         // no mapping offset change.
>         rx_buf->mapping = mapping;
> 
>     bnxt_alloc_rx_page()
>         page = __bnxt_alloc_rx_page();
>         mapping -= bp->rx_dma_offset; //added for this change.
> ```
> 
> However, including this change is not necessary for this patchset.
> Moreover, it makes the patch harder to review.
> Therefore, as you mentioned, I would like to drop this change for now
> and submit a separate patch for it later.

I double checked this when testing the patchset. The maths is correct,
though imo it just shifts the extra op (either add or sub
bp->rx_dma_offset) so I'm not sure how much it gains.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC net-next 2/2] eth: bnxt: add support rx side device memory TCP
  2025-03-31 18:50   ` Jakub Kicinski
@ 2025-04-02 12:09     ` Taehee Yoo
  2025-04-02 12:46       ` Jakub Kicinski
  2025-04-02 22:11     ` Mina Almasry
  1 sibling, 1 reply; 16+ messages in thread
From: Taehee Yoo @ 2025-04-02 12:09 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, pabeni, edumazet, andrew+netdev, horms, michael.chan,
	pavan.chebbi, ilias.apalodimas, dw, netdev, kuniyu, sdf,
	aleksander.lobakin

On Tue, Apr 1, 2025 at 3:50 AM Jakub Kicinski <kuba@kernel.org> wrote:
>

Hi Jakub,
Thank you so much for the review!

> On Mon, 31 Mar 2025 11:47:29 +0000 Taehee Yoo wrote:
> > Currently, bnxt_en driver satisfies the requirements of the Device
> > memory TCP, which is HDS.
> > So, it implements rx-side Device memory TCP for bnxt_en driver.
> > It requires only converting the page API to netmem API.
> > `struct page` for rx-size are changed to `netmem_ref netmem` and
> > corresponding functions are changed to a variant of netmem API.
> >
> > It also passes PP_FLAG_ALLOW_UNREADABLE_NETMEM flag to a parameter of
> > page_pool.
> > The netmem will be activated only when a user requests devmem TCP.
> >
> > When netmem is activated, received data is unreadable and netmem is
> > disabled, received data is readable.
> > But drivers don't need to handle both cases because netmem core API will
> > handle it properly.
> > So, using proper netmem API is enough for drivers.
>
> > @@ -1352,15 +1367,15 @@ static struct sk_buff *bnxt_copy_data(struct bnxt_napi *bnapi,
> >       if (!skb)
> >               return NULL;
> >
> > -     page_pool_dma_sync_for_cpu(rxr->head_pool, page,
> > -                                offset + bp->rx_dma_offset,
> > -                                bp->rx_copybreak);
> > +     page_pool_dma_sync_netmem_for_cpu(rxr->head_pool, netmem,
> > +                                       offset + bp->rx_dma_offset,
> > +                                       bp->rx_copybreak);
> >
> >       memcpy(skb->data - NET_IP_ALIGN,
> > -            bnxt_data_ptr(bp, page, offset) - NET_IP_ALIGN,
> > +            bnxt_data_ptr(bp, netmem, offset) - NET_IP_ALIGN,
> >              len + NET_IP_ALIGN);
> >
> > -     page_pool_dma_sync_for_device(rxr->head_pool, page_to_netmem(page),
> > +     page_pool_dma_sync_for_device(rxr->head_pool, netmem,
> >                                     bp->rx_dma_offset, bp->rx_copybreak);
> >       skb_put(skb, len);
> >
>
> Do we check if rx copybreak is enabled before allowing ZC to be enabled?
> We can't copybreak with unreadable memory.

For the bnxt driver, only the first page's data is copied by rx-copybreak;
agg buffers are not affected by it. So I think it's okay.
I tested rx-copybreak + devmem TCP with copybreak values from 0 to 1024
and MTUs from very small to large, and it works well.
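
To illustrate (simplified from the bnxt_rx_pkt() hunks above, not new
code): the copybreak copy only reads the head buffer, which comes from
the readable head_pool, while the agg frags, which may be unreadable
netmem, are attached to the skb without any copy:

```
	if (len <= bp->rx_copybreak)
		/* head buffer only, allocated from rxr->head_pool */
		skb = bnxt_copy_skb(bnapi, netmem, offset, len);
	/* ... */
	if (agg_bufs)
		/* agg frags (possibly unreadable netmem), never copied */
		skb = bnxt_rx_agg_netmems_skb(bp, cpr, skb, cp_cons,
					      agg_bufs, false);
```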

>
> > @@ -15912,7 +15941,7 @@ static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx)
> >                       goto err_reset;
> >       }
> >
> > -     napi_enable(&bnapi->napi);
> > +     napi_enable_locked(&bnapi->napi);
> >       bnxt_db_nq_arm(bp, &cpr->cp_db, cpr->cp_raw_cons);
> >
> >       for (i = 0; i < bp->nr_vnics; i++) {
> > @@ -15964,7 +15993,7 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
> >       bnxt_hwrm_rx_ring_free(bp, rxr, false);
> >       bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
> >       page_pool_disable_direct_recycling(rxr->page_pool);
> > -     if (bnxt_separate_head_pool())
> > +     if (bnxt_separate_head_pool(rxr))
> >               page_pool_disable_direct_recycling(rxr->head_pool);
> >
> >       if (bp->flags & BNXT_FLAG_SHARED_RINGS)
> > @@ -15974,7 +16003,7 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
> >        * completion is handled in NAPI to guarantee no more DMA on that ring
> >        * after seeing the completion.
> >        */
> > -     napi_disable(&bnapi->napi);
> > +     napi_disable_locked(&bnapi->napi);
>
> This is a fix right? The IRQ code already calls the queue reset without
> rtnl_lock. Let's split it up and submit for net.
>

Oh yes, it's a fix indeed.
I will send a patch to fix this.
Thanks a lot!

> > @@ -2863,15 +2865,15 @@ static inline bool bnxt_sriov_cfg(struct bnxt *bp)
> >  #endif
> >  }
> >
> > -static inline u8 *bnxt_data_ptr(struct bnxt *bp, struct page *page,
> > +static inline u8 *bnxt_data_ptr(struct bnxt *bp, netmem_ref netmem,
> >                               unsigned int offset)
> >  {
> > -     return page_address(page) + offset + bp->rx_offset;
> > +     return netmem_address(netmem) + offset + bp->rx_offset;
> >  }
> >
> > -static inline u8 *bnxt_data(struct page *page, unsigned int offset)
> > +static inline u8 *bnxt_data(netmem_ref netmem, unsigned int offset)
> >  {
> > -     return page_address(page) + offset;
> > +     return netmem_address(netmem) + offset;
> >  }
>
> This is not great, seems like the unification of normal vs agg bd struct
> backfires here. unreadable netmem can only be populated in agg bds
> right? So why don't we keep the structs separate and avoid the need
> to convert from netmem back to a VA?

Okay, I will drop patch 1/2 and convert only the agg ring to netmem,
so the normal ring will still handle a VA instead of a page.
If there are other refactoring opportunities in this driver,
I will send them separately.
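
Roughly, the descriptors would then stay split like this (sketch of the
plan only, not final code):

```
	/* head ring: always host memory, kept as a VA */
	struct bnxt_sw_rx_bd {
		void		*data;
		u8		*data_ptr;
		dma_addr_t	mapping;
	};

	/* agg ring: may hold unreadable netmem (devmem TCP) */
	struct bnxt_sw_rx_agg_bd {
		netmem_ref	netmem;
		unsigned int	offset;
		dma_addr_t	mapping;
	};
```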

>
> > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
> > index 9592d04e0661..85b6df6a9e7f 100644
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
> > @@ -18,7 +18,7 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
> >                                  struct xdp_buff *xdp);
> >  void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int budget);
> >  bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
> > -              struct xdp_buff *xdp, struct page *page, u8 **data_ptr,
> > +              struct xdp_buff *xdp, netmem_ref netmem, u8 **data_ptr,
> >                unsigned int *len, u8 *event);
> >  int bnxt_xdp(struct net_device *dev, struct netdev_bpf *xdp);
> >  int bnxt_xdp_xmit(struct net_device *dev, int num_frames,
> > @@ -27,7 +27,7 @@ int bnxt_xdp_xmit(struct net_device *dev, int num_frames,
> >  bool bnxt_xdp_attached(struct bnxt *bp, struct bnxt_rx_ring_info *rxr);
> >
> >  void bnxt_xdp_buff_init(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
> > -                     u16 cons, struct page *page, unsigned int len,
> > +                     u16 cons, netmem_ref netmem, unsigned int len,
> >                       struct xdp_buff *xdp);
>
> We also shouldn't pass netmem to XDP init, it's strange conceptually.
> If we reach XDP it has to be a non-net_iov page.

A normal page_pool will not be converted to netmem in the next version.
So, XDP will still handle a page.

Thanks a lot!
Taehee Yoo

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC net-next 2/2] eth: bnxt: add support rx side device memory TCP
  2025-04-02 12:09     ` Taehee Yoo
@ 2025-04-02 12:46       ` Jakub Kicinski
  0 siblings, 0 replies; 16+ messages in thread
From: Jakub Kicinski @ 2025-04-02 12:46 UTC (permalink / raw)
  To: Taehee Yoo
  Cc: davem, pabeni, edumazet, andrew+netdev, horms, michael.chan,
	pavan.chebbi, ilias.apalodimas, dw, netdev, kuniyu, sdf,
	aleksander.lobakin

On Wed, 2 Apr 2025 21:09:11 +0900 Taehee Yoo wrote:
> > Do we check if rx copybreak is enabled before allowing ZC to be enabled?
> > We can't copybreak with unreadable memory.  
> 
> For the bnxt driver, only the first page's data is copied by rx-copybreak;
> agg buffers are not affected by it. So I think it's okay.
> I tested rx-copybreak + devmem TCP with copybreak values from 0 to 1024
> and MTUs from very small to large, and it works well.

Ah, good point!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC net-next 2/2] eth: bnxt: add support rx side device memory TCP
  2025-03-31 18:50   ` Jakub Kicinski
  2025-04-02 12:09     ` Taehee Yoo
@ 2025-04-02 22:11     ` Mina Almasry
  2025-04-02 22:45       ` Jakub Kicinski
  1 sibling, 1 reply; 16+ messages in thread
From: Mina Almasry @ 2025-04-02 22:11 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Taehee Yoo, davem, pabeni, edumazet, andrew+netdev, horms,
	michael.chan, pavan.chebbi, ilias.apalodimas, dw, netdev, kuniyu,
	sdf, aleksander.lobakin

On Mon, Mar 31, 2025 at 11:50 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Mon, 31 Mar 2025 11:47:29 +0000 Taehee Yoo wrote:
> > @@ -2863,15 +2865,15 @@ static inline bool bnxt_sriov_cfg(struct bnxt *bp)
> >  #endif
> >  }
> >
> > -static inline u8 *bnxt_data_ptr(struct bnxt *bp, struct page *page,
> > +static inline u8 *bnxt_data_ptr(struct bnxt *bp, netmem_ref netmem,
> >                               unsigned int offset)
> >  {
> > -     return page_address(page) + offset + bp->rx_offset;
> > +     return netmem_address(netmem) + offset + bp->rx_offset;
> >  }
> >
> > -static inline u8 *bnxt_data(struct page *page, unsigned int offset)
> > +static inline u8 *bnxt_data(netmem_ref netmem, unsigned int offset)
> >  {
> > -     return page_address(page) + offset;
> > +     return netmem_address(netmem) + offset;
> >  }
>
> This is not great, seems like the unification of normal vs agg bd struct
> backfires here. unreadable netmem can only be populated in agg bds
> right? So why don't we keep the structs separate and avoid the need
> to convert from netmem back to a VA?
>

Another option for your consideration (I don't know if it's better):

static inline u8 *bnxt_data(netmem_ref netmem, unsigned int offset)
{
    u8 *addr = netmem_address(netmem); /* NULL for unreadable net_iov */

    if (!addr)
        return NULL;
    return addr + offset;
}

That way you can combine the structs, but every user of the return
value of bnxt_data() needs to NULL-check it.
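
e.g. a caller would then look roughly like this (purely illustrative,
not actual driver code; the error handling is just a placeholder):

        u8 *data = bnxt_data(netmem, offset);

        if (!data) {
                /* unreadable netmem (devmem/io_uring): no kernel VA, so
                 * skip anything that touches payload bytes (copybreak,
                 * XDP, header inspection)
                 */
                return -EOPNOTSUPP;
        }
        /* readable page, or a future readable net_iov: safe to deref */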

This would more naturally extend to possible future readable net_iovs.

> > diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
> > index 9592d04e0661..85b6df6a9e7f 100644
> > --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
> > +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.h
> > @@ -18,7 +18,7 @@ struct bnxt_sw_tx_bd *bnxt_xmit_bd(struct bnxt *bp,
> >                                  struct xdp_buff *xdp);
> >  void bnxt_tx_int_xdp(struct bnxt *bp, struct bnxt_napi *bnapi, int budget);
> >  bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
> > -              struct xdp_buff *xdp, struct page *page, u8 **data_ptr,
> > +              struct xdp_buff *xdp, netmem_ref netmem, u8 **data_ptr,
> >                unsigned int *len, u8 *event);
> >  int bnxt_xdp(struct net_device *dev, struct netdev_bpf *xdp);
> >  int bnxt_xdp_xmit(struct net_device *dev, int num_frames,
> > @@ -27,7 +27,7 @@ int bnxt_xdp_xmit(struct net_device *dev, int num_frames,
> >  bool bnxt_xdp_attached(struct bnxt *bp, struct bnxt_rx_ring_info *rxr);
> >
> >  void bnxt_xdp_buff_init(struct bnxt *bp, struct bnxt_rx_ring_info *rxr,
> > -                     u16 cons, struct page *page, unsigned int len,
> > +                     u16 cons, netmem_ref netmem, unsigned int len,
> >                       struct xdp_buff *xdp);
>
> We also shouldn't pass netmem to XDP init, it's strange conceptually.
> If we reach XDP it has to be a non-net_iov page.
>

Very noob question, but are XDP/netmem interactions completely
impossible for some reason? I was thinking XDP progs that only
touch/need the header might work with unreadable netmem, and if we
ever add readable net_iovs then those could perhaps be exposed to XDP,
no? Or am I completely off the rails here?

--
Thanks,
Mina

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC net-next 2/2] eth: bnxt: add support rx side device memory TCP
  2025-03-31 11:47 ` [RFC net-next 2/2] eth: bnxt: add support rx side device memory TCP Taehee Yoo
  2025-03-31 18:50   ` Jakub Kicinski
@ 2025-04-02 22:16   ` Mina Almasry
  2025-04-03  1:55     ` Taehee Yoo
  1 sibling, 1 reply; 16+ messages in thread
From: Mina Almasry @ 2025-04-02 22:16 UTC (permalink / raw)
  To: Taehee Yoo
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, horms, michael.chan,
	pavan.chebbi, ilias.apalodimas, dw, netdev, kuniyu, sdf,
	aleksander.lobakin

On Mon, Mar 31, 2025 at 4:48 AM Taehee Yoo <ap420073@gmail.com> wrote:

> -static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
> -                                  struct bnxt_rx_ring_info *rxr,
> -                                  int numa_node)
> +static int bnxt_alloc_rx_netmem_pool(struct bnxt *bp,
> +                                    struct bnxt_rx_ring_info *rxr,
> +                                    int numa_node)
>  {
>         struct page_pool_params pp = { 0 };
>         struct page_pool *pool;
> @@ -3779,15 +3799,20 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
>         pp.dev = &bp->pdev->dev;
>         pp.dma_dir = bp->rx_dir;
>         pp.max_len = PAGE_SIZE;
> -       pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
> +       pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV |
> +                  PP_FLAG_ALLOW_UNREADABLE_NETMEM;

I was expecting to see a check that hdr_split is enabled and threshold
is 0 before you allow unreadable netmem. Or are you relying on the
core check at devmem/io_uring binding time to do that for you?

-- 
Thanks,
Mina

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC net-next 2/2] eth: bnxt: add support rx side device memory TCP
  2025-04-02 22:11     ` Mina Almasry
@ 2025-04-02 22:45       ` Jakub Kicinski
  0 siblings, 0 replies; 16+ messages in thread
From: Jakub Kicinski @ 2025-04-02 22:45 UTC (permalink / raw)
  To: Mina Almasry
  Cc: Taehee Yoo, davem, pabeni, edumazet, andrew+netdev, horms,
	michael.chan, pavan.chebbi, ilias.apalodimas, dw, netdev, kuniyu,
	sdf, aleksander.lobakin

On Wed, 2 Apr 2025 15:11:39 -0700 Mina Almasry wrote:
> > We also shouldn't pass netmem to XDP init, it's strange conceptually.
> > If we reach XDP it has to be a non-net_iov page.
> 
> Very noob question, but are XDP/netmem interactions completely
> impossible for some reason? I was thinking XDP progs that only
> touch/need the header might work with unreadable netmem, and if we
> ever add readable net_iovs then those could perhaps be exposed to XDP,
> no? Or am I completely off the rails here?

Right, I was referring to the current state of things.
Extensions both to XDP semantics or net_iov could change
the picture.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC net-next 2/2] eth: bnxt: add support rx side device memory TCP
  2025-04-02 22:16   ` Mina Almasry
@ 2025-04-03  1:55     ` Taehee Yoo
  0 siblings, 0 replies; 16+ messages in thread
From: Taehee Yoo @ 2025-04-03  1:55 UTC (permalink / raw)
  To: Mina Almasry
  Cc: davem, kuba, pabeni, edumazet, andrew+netdev, horms, michael.chan,
	pavan.chebbi, ilias.apalodimas, dw, netdev, kuniyu, sdf,
	aleksander.lobakin

On Thu, Apr 3, 2025 at 7:17 AM Mina Almasry <almasrymina@google.com> wrote:
>

Hi Mina,
Thanks a lot for the review!

> On Mon, Mar 31, 2025 at 4:48 AM Taehee Yoo <ap420073@gmail.com> wrote:
>
> > -static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
> > -                                  struct bnxt_rx_ring_info *rxr,
> > -                                  int numa_node)
> > +static int bnxt_alloc_rx_netmem_pool(struct bnxt *bp,
> > +                                    struct bnxt_rx_ring_info *rxr,
> > +                                    int numa_node)
> >  {
> >         struct page_pool_params pp = { 0 };
> >         struct page_pool *pool;
> > @@ -3779,15 +3799,20 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
> >         pp.dev = &bp->pdev->dev;
> >         pp.dma_dir = bp->rx_dir;
> >         pp.max_len = PAGE_SIZE;
> > -       pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
> > +       pp.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV |
> > +                  PP_FLAG_ALLOW_UNREADABLE_NETMEM;
>
> I was expecting to see a check that hdr_split is enabled and threshold
> is 0 before you allow unreadable netmem. Or are you relying on the
> core check at devmem/io_uring binding time to do that for you?

Yes, it relies on the core.
The core already validates these conditions, including HDS, at binding
time, so I think drivers don't need to check them again.
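
Roughly, the core-side validation this relies on looks like the
following; this is written from memory, so the exact fields, message,
and location may differ from the real net/core code:

        /* paraphrased sketch of the core check at dmabuf/io_uring
         * binding time; see net/core/devmem.c for the real thing
         */
        if (dev->ethtool->hds_config != ETHTOOL_TCP_DATA_SPLIT_ENABLED ||
            dev->ethtool->hds_thresh) {
                NL_SET_ERR_MSG(extack,
                               "tcp-data-split must be enabled with hds-thresh 0");
                return -EINVAL;
        }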

Thanks a lot!
Taehee Yoo

>
> --
> Thanks,
> Mina

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread

Thread overview: 16+ messages
2025-03-31 11:47 [RFC net-next 0/2]eth: bnxt: Implement rx-side device memory Taehee Yoo
2025-03-31 11:47 ` [RFC net-next 1/2] eth: bnxt: refactor buffer descriptor Taehee Yoo
2025-03-31 17:34   ` Jakub Kicinski
2025-04-01  6:48     ` Taehee Yoo
2025-04-01  5:39   ` Michael Chan
2025-04-01  7:17     ` Taehee Yoo
2025-04-01 15:22       ` David Wei
2025-04-01 15:20   ` David Wei
2025-03-31 11:47 ` [RFC net-next 2/2] eth: bnxt: add support rx side device memory TCP Taehee Yoo
2025-03-31 18:50   ` Jakub Kicinski
2025-04-02 12:09     ` Taehee Yoo
2025-04-02 12:46       ` Jakub Kicinski
2025-04-02 22:11     ` Mina Almasry
2025-04-02 22:45       ` Jakub Kicinski
2025-04-02 22:16   ` Mina Almasry
2025-04-03  1:55     ` Taehee Yoo
