netdev.vger.kernel.org archive mirror
* [PATCH net v3 0/3] bnxt_en: support header page pool in queue API
@ 2024-12-04  4:10 David Wei
  2024-12-04  4:10 ` [PATCH net v3 1/3] bnxt_en: refactor tpa_info alloc/free into helpers David Wei
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: David Wei @ 2024-12-04  4:10 UTC (permalink / raw)
  To: netdev, Michael Chan, Andy Gospodarek, Somnath Kotur
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, David Wei

Commit 7ed816be35ab ("eth: bnxt: use page pool for head frags") added a
separate page pool for header frags. Now, frags are allocated from this
header page pool e.g. rxr->tpa_info.data.

The queue API did not properly handle rxr->tpa_info and so using the
queue API to e.g. reset any queues will result in pages being returned
to the incorrect page pool, causing inflight != 0 warnings.

Fix this bug by properly allocating/freeing tpa_info and copying/freeing
head_pool in the queue API implementation.
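
For context, head_pool is only a distinct pool when a full kernel page
is larger than the chunks used to fill the rx rings; otherwise it
aliases page_pool. A sketch of the helper the series keys off (my
paraphrase of the bnxt.c helper, not a verbatim copy):

/* Head frags need their own page pool only when PAGE_SIZE exceeds the
 * BNXT_RX_PAGE_SIZE chunks used for the rx rings; otherwise
 * rxr->head_pool simply points at rxr->page_pool.
 */
static bool bnxt_separate_head_pool(void)
{
	return PAGE_SIZE > BNXT_RX_PAGE_SIZE;
}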

The 1st patch is a prep patch that refactors helpers out to be used by
the implementation patch later.

The 2nd patch is a drive-by refactor. Happy to take it out and re-send
to net-next if there are any objections.

The 3rd patch is the implementation patch that will properly alloc/free
rxr->tpa_info.

---
v3:
 - use common helper bnxt_separate_head_pool() instead of comparing
   head_pool and page_pool
 - better document why TPA changes were needed in patch 3
v2:
 - remove unneeded struct bnxt_rx_ring_info *rxr declaration
 - restore unintended removal of page_pool_disable_direct_recycling()

David Wei (3):
  bnxt_en: refactor tpa_info alloc/free into helpers
  bnxt_en: refactor bnxt_alloc_rx_rings() to call
    bnxt_alloc_rx_agg_bmap()
  bnxt_en: handle tpa_info in queue API implementation

 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 205 ++++++++++++++--------
 1 file changed, 129 insertions(+), 76 deletions(-)

-- 
2.43.5



* [PATCH net v3 1/3] bnxt_en: refactor tpa_info alloc/free into helpers
  2024-12-04  4:10 [PATCH net v3 0/3] bnxt_en: support header page pool in queue API David Wei
@ 2024-12-04  4:10 ` David Wei
  2024-12-04  4:10 ` [PATCH net v3 2/3] bnxt_en: refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap() David Wei
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: David Wei @ 2024-12-04  4:10 UTC (permalink / raw)
  To: netdev, Michael Chan, Andy Gospodarek, Somnath Kotur
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, David Wei

Refactor bnxt_rx_ring_info->tpa_info operations into helpers that work
on a single tpa_info, in preparation for the queue API using them.

There are 2 pairs of operations:

* bnxt_alloc_one_tpa_info()
* bnxt_free_one_tpa_info()

These alloc/free the tpa_info array itself.

* bnxt_alloc_one_tpa_info_data()
* bnxt_free_one_tpa_info_data()

These alloc/free the frags stored in the tpa_info array.
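
Patch 3 pairs these up in the queue API callbacks; an abridged sketch
of that usage (flag checks and error handling elided, see patch 3):

	bnxt_alloc_one_tpa_info(bp, clone);		/* the rx_tpa array */
	bnxt_alloc_one_tpa_info_data(bp, clone);	/* the frags in it */

with the matching free helpers called on teardown.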

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: David Wei <dw@davidwei.uk>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 142 ++++++++++++++--------
 1 file changed, 90 insertions(+), 52 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 4ec4934a4edd..b85f22a4d1c3 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3421,15 +3421,11 @@ static void bnxt_free_one_rx_agg_ring(struct bnxt *bp, struct bnxt_rx_ring_info
 	}
 }
 
-static void bnxt_free_one_rx_ring_skbs(struct bnxt *bp, int ring_nr)
+static void bnxt_free_one_tpa_info_data(struct bnxt *bp,
+					struct bnxt_rx_ring_info *rxr)
 {
-	struct bnxt_rx_ring_info *rxr = &bp->rx_ring[ring_nr];
-	struct bnxt_tpa_idx_map *map;
 	int i;
 
-	if (!rxr->rx_tpa)
-		goto skip_rx_tpa_free;
-
 	for (i = 0; i < bp->max_tpa; i++) {
 		struct bnxt_tpa_info *tpa_info = &rxr->rx_tpa[i];
 		u8 *data = tpa_info->data;
@@ -3440,6 +3436,17 @@ static void bnxt_free_one_rx_ring_skbs(struct bnxt *bp, int ring_nr)
 		tpa_info->data = NULL;
 		page_pool_free_va(rxr->head_pool, data, false);
 	}
+}
+
+static void bnxt_free_one_rx_ring_skbs(struct bnxt *bp,
+				       struct bnxt_rx_ring_info *rxr)
+{
+	struct bnxt_tpa_idx_map *map;
+
+	if (!rxr->rx_tpa)
+		goto skip_rx_tpa_free;
+
+	bnxt_free_one_tpa_info_data(bp, rxr);
 
 skip_rx_tpa_free:
 	if (!rxr->rx_buf_ring)
@@ -3467,7 +3474,7 @@ static void bnxt_free_rx_skbs(struct bnxt *bp)
 		return;
 
 	for (i = 0; i < bp->rx_nr_rings; i++)
-		bnxt_free_one_rx_ring_skbs(bp, i);
+		bnxt_free_one_rx_ring_skbs(bp, &bp->rx_ring[i]);
 }
 
 static void bnxt_free_skbs(struct bnxt *bp)
@@ -3608,29 +3615,64 @@ static int bnxt_alloc_ring(struct bnxt *bp, struct bnxt_ring_mem_info *rmem)
 	return 0;
 }
 
+static void bnxt_free_one_tpa_info(struct bnxt *bp,
+				   struct bnxt_rx_ring_info *rxr)
+{
+	int i;
+
+	kfree(rxr->rx_tpa_idx_map);
+	rxr->rx_tpa_idx_map = NULL;
+	if (rxr->rx_tpa) {
+		for (i = 0; i < bp->max_tpa; i++) {
+			kfree(rxr->rx_tpa[i].agg_arr);
+			rxr->rx_tpa[i].agg_arr = NULL;
+		}
+	}
+	kfree(rxr->rx_tpa);
+	rxr->rx_tpa = NULL;
+}
+
 static void bnxt_free_tpa_info(struct bnxt *bp)
 {
-	int i, j;
+	int i;
 
 	for (i = 0; i < bp->rx_nr_rings; i++) {
 		struct bnxt_rx_ring_info *rxr = &bp->rx_ring[i];
 
-		kfree(rxr->rx_tpa_idx_map);
-		rxr->rx_tpa_idx_map = NULL;
-		if (rxr->rx_tpa) {
-			for (j = 0; j < bp->max_tpa; j++) {
-				kfree(rxr->rx_tpa[j].agg_arr);
-				rxr->rx_tpa[j].agg_arr = NULL;
-			}
-		}
-		kfree(rxr->rx_tpa);
-		rxr->rx_tpa = NULL;
+		bnxt_free_one_tpa_info(bp, rxr);
 	}
 }
 
+static int bnxt_alloc_one_tpa_info(struct bnxt *bp,
+				   struct bnxt_rx_ring_info *rxr)
+{
+	struct rx_agg_cmp *agg;
+	int i;
+
+	rxr->rx_tpa = kcalloc(bp->max_tpa, sizeof(struct bnxt_tpa_info),
+			      GFP_KERNEL);
+	if (!rxr->rx_tpa)
+		return -ENOMEM;
+
+	if (!(bp->flags & BNXT_FLAG_CHIP_P5_PLUS))
+		return 0;
+	for (i = 0; i < bp->max_tpa; i++) {
+		agg = kcalloc(MAX_SKB_FRAGS, sizeof(*agg), GFP_KERNEL);
+		if (!agg)
+			return -ENOMEM;
+		rxr->rx_tpa[i].agg_arr = agg;
+	}
+	rxr->rx_tpa_idx_map = kzalloc(sizeof(*rxr->rx_tpa_idx_map),
+				      GFP_KERNEL);
+	if (!rxr->rx_tpa_idx_map)
+		return -ENOMEM;
+
+	return 0;
+}
+
 static int bnxt_alloc_tpa_info(struct bnxt *bp)
 {
-	int i, j;
+	int i, rc;
 
 	bp->max_tpa = MAX_TPA;
 	if (bp->flags & BNXT_FLAG_CHIP_P5_PLUS) {
@@ -3641,25 +3683,10 @@ static int bnxt_alloc_tpa_info(struct bnxt *bp)
 
 	for (i = 0; i < bp->rx_nr_rings; i++) {
 		struct bnxt_rx_ring_info *rxr = &bp->rx_ring[i];
-		struct rx_agg_cmp *agg;
-
-		rxr->rx_tpa = kcalloc(bp->max_tpa, sizeof(struct bnxt_tpa_info),
-				      GFP_KERNEL);
-		if (!rxr->rx_tpa)
-			return -ENOMEM;
 
-		if (!(bp->flags & BNXT_FLAG_CHIP_P5_PLUS))
-			continue;
-		for (j = 0; j < bp->max_tpa; j++) {
-			agg = kcalloc(MAX_SKB_FRAGS, sizeof(*agg), GFP_KERNEL);
-			if (!agg)
-				return -ENOMEM;
-			rxr->rx_tpa[j].agg_arr = agg;
-		}
-		rxr->rx_tpa_idx_map = kzalloc(sizeof(*rxr->rx_tpa_idx_map),
-					      GFP_KERNEL);
-		if (!rxr->rx_tpa_idx_map)
-			return -ENOMEM;
+		rc = bnxt_alloc_one_tpa_info(bp, rxr);
+		if (rc)
+			return rc;
 	}
 	return 0;
 }
@@ -4268,10 +4295,31 @@ static void bnxt_alloc_one_rx_ring_page(struct bnxt *bp,
 	rxr->rx_agg_prod = prod;
 }
 
+static int bnxt_alloc_one_tpa_info_data(struct bnxt *bp,
+					struct bnxt_rx_ring_info *rxr)
+{
+	dma_addr_t mapping;
+	u8 *data;
+	int i;
+
+	for (i = 0; i < bp->max_tpa; i++) {
+		data = __bnxt_alloc_rx_frag(bp, &mapping, rxr,
+					    GFP_KERNEL);
+		if (!data)
+			return -ENOMEM;
+
+		rxr->rx_tpa[i].data = data;
+		rxr->rx_tpa[i].data_ptr = data + bp->rx_offset;
+		rxr->rx_tpa[i].mapping = mapping;
+	}
+
+	return 0;
+}
+
 static int bnxt_alloc_one_rx_ring(struct bnxt *bp, int ring_nr)
 {
 	struct bnxt_rx_ring_info *rxr = &bp->rx_ring[ring_nr];
-	int i;
+	int rc;
 
 	bnxt_alloc_one_rx_ring_skb(bp, rxr, ring_nr);
 
@@ -4281,19 +4329,9 @@ static int bnxt_alloc_one_rx_ring(struct bnxt *bp, int ring_nr)
 	bnxt_alloc_one_rx_ring_page(bp, rxr, ring_nr);
 
 	if (rxr->rx_tpa) {
-		dma_addr_t mapping;
-		u8 *data;
-
-		for (i = 0; i < bp->max_tpa; i++) {
-			data = __bnxt_alloc_rx_frag(bp, &mapping, rxr,
-						    GFP_KERNEL);
-			if (!data)
-				return -ENOMEM;
-
-			rxr->rx_tpa[i].data = data;
-			rxr->rx_tpa[i].data_ptr = data + bp->rx_offset;
-			rxr->rx_tpa[i].mapping = mapping;
-		}
+		rc = bnxt_alloc_one_tpa_info_data(bp, rxr);
+		if (rc)
+			return rc;
 	}
 	return 0;
 }
@@ -13663,7 +13701,7 @@ static void bnxt_rx_ring_reset(struct bnxt *bp)
 			bnxt_reset_task(bp, true);
 			break;
 		}
-		bnxt_free_one_rx_ring_skbs(bp, i);
+		bnxt_free_one_rx_ring_skbs(bp, rxr);
 		rxr->rx_prod = 0;
 		rxr->rx_agg_prod = 0;
 		rxr->rx_sw_agg_prod = 0;
-- 
2.43.5



* [PATCH net v3 2/3] bnxt_en: refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap()
  2024-12-04  4:10 [PATCH net v3 0/3] bnxt_en: support header page pool in queue API David Wei
  2024-12-04  4:10 ` [PATCH net v3 1/3] bnxt_en: refactor tpa_info alloc/free into helpers David Wei
@ 2024-12-04  4:10 ` David Wei
  2024-12-04  4:10 ` [PATCH net v3 3/3] bnxt_en: handle tpa_info in queue API implementation David Wei
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: David Wei @ 2024-12-04  4:10 UTC (permalink / raw)
  To: netdev, Michael Chan, Andy Gospodarek, Somnath Kotur
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, David Wei

Refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap() for
allocating rx_agg_bmap.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: David Wei <dw@davidwei.uk>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 36 ++++++++++-------------
 1 file changed, 16 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index b85f22a4d1c3..8031ff31f837 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3764,6 +3764,19 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
 	return PTR_ERR(pool);
 }
 
+static int bnxt_alloc_rx_agg_bmap(struct bnxt *bp, struct bnxt_rx_ring_info *rxr)
+{
+	u16 mem_size;
+
+	rxr->rx_agg_bmap_size = bp->rx_agg_ring_mask + 1;
+	mem_size = rxr->rx_agg_bmap_size / 8;
+	rxr->rx_agg_bmap = kzalloc(mem_size, GFP_KERNEL);
+	if (!rxr->rx_agg_bmap)
+		return -ENOMEM;
+
+	return 0;
+}
+
 static int bnxt_alloc_rx_rings(struct bnxt *bp)
 {
 	int numa_node = dev_to_node(&bp->pdev->dev);
@@ -3808,19 +3821,15 @@ static int bnxt_alloc_rx_rings(struct bnxt *bp)
 
 		ring->grp_idx = i;
 		if (agg_rings) {
-			u16 mem_size;
-
 			ring = &rxr->rx_agg_ring_struct;
 			rc = bnxt_alloc_ring(bp, &ring->ring_mem);
 			if (rc)
 				return rc;
 
 			ring->grp_idx = i;
-			rxr->rx_agg_bmap_size = bp->rx_agg_ring_mask + 1;
-			mem_size = rxr->rx_agg_bmap_size / 8;
-			rxr->rx_agg_bmap = kzalloc(mem_size, GFP_KERNEL);
-			if (!rxr->rx_agg_bmap)
-				return -ENOMEM;
+			rc = bnxt_alloc_rx_agg_bmap(bp, rxr);
+			if (rc)
+				return rc;
 		}
 	}
 	if (bp->flags & BNXT_FLAG_TPA)
@@ -15331,19 +15340,6 @@ static const struct netdev_stat_ops bnxt_stat_ops = {
 	.get_base_stats		= bnxt_get_base_stats,
 };
 
-static int bnxt_alloc_rx_agg_bmap(struct bnxt *bp, struct bnxt_rx_ring_info *rxr)
-{
-	u16 mem_size;
-
-	rxr->rx_agg_bmap_size = bp->rx_agg_ring_mask + 1;
-	mem_size = rxr->rx_agg_bmap_size / 8;
-	rxr->rx_agg_bmap = kzalloc(mem_size, GFP_KERNEL);
-	if (!rxr->rx_agg_bmap)
-		return -ENOMEM;
-
-	return 0;
-}
-
 static int bnxt_queue_mem_alloc(struct net_device *dev, void *qmem, int idx)
 {
 	struct bnxt_rx_ring_info *rxr, *clone;
-- 
2.43.5



* [PATCH net v3 3/3] bnxt_en: handle tpa_info in queue API implementation
  2024-12-04  4:10 [PATCH net v3 0/3] bnxt_en: support header page pool in queue API David Wei
  2024-12-04  4:10 ` [PATCH net v3 1/3] bnxt_en: refactor tpa_info alloc/free into helpers David Wei
  2024-12-04  4:10 ` [PATCH net v3 2/3] bnxt_en: refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap() David Wei
@ 2024-12-04  4:10 ` David Wei
  2024-12-10 12:25   ` Yunsheng Lin
  2024-12-04 22:43 ` [PATCH net v3 0/3] bnxt_en: support header page pool in queue API Michael Chan
  2024-12-05  3:30 ` patchwork-bot+netdevbpf
  4 siblings, 1 reply; 19+ messages in thread
From: David Wei @ 2024-12-04  4:10 UTC (permalink / raw)
  To: netdev, Michael Chan, Andy Gospodarek, Somnath Kotur
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni, David Wei

Commit 7ed816be35ab ("eth: bnxt: use page pool for head frags") added a
page pool for header frags, which may be distinct from the existing pool
for the aggregation ring. Prior to this change, frags used in the TPA
ring rx_tpa were allocated from system memory, e.g. via
napi_alloc_frag(), meaning their lifetimes were not associated with a
page pool. They could be returned at any time, so the queue API did not
alloc or free rx_tpa.

But now frags come from a separate head_pool, which may be different
from page_pool. Without allocating and freeing rx_tpa, frags allocated
from the old head_pool may be returned to a different new head_pool,
which causes a mismatch between the page pool hold/release counts.

Fix this problem by properly freeing and allocating rx_tpa in the queue
API implementation.
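
To make the accounting concrete, a minimal sketch (simplified names;
the real counters are the pages_state_hold_cnt/pages_state_release_cnt
pair in struct page_pool):

/* Each pool counts pages it handed out vs. pages returned to it; the
 * pool may only be torn down once this difference reaches zero.
 */
static s32 pool_inflight(u32 hold_cnt, u32 release_cnt)
{
	return (s32)(hold_cnt - release_cnt);
}

/* A frag allocated from the old head_pool but freed into the new
 * head_pool bumps the wrong release counter, so the old pool's
 * inflight count never reaches zero and the warning fires forever.
 */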

Signed-off-by: David Wei <dw@davidwei.uk>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 27 +++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 8031ff31f837..6b963086c1d3 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3710,7 +3710,7 @@ static void bnxt_free_rx_rings(struct bnxt *bp)
 			xdp_rxq_info_unreg(&rxr->xdp_rxq);
 
 		page_pool_destroy(rxr->page_pool);
-		if (rxr->page_pool != rxr->head_pool)
+		if (bnxt_separate_head_pool())
 			page_pool_destroy(rxr->head_pool);
 		rxr->page_pool = rxr->head_pool = NULL;
 
@@ -15388,15 +15388,25 @@ static int bnxt_queue_mem_alloc(struct net_device *dev, void *qmem, int idx)
 			goto err_free_rx_agg_ring;
 	}
 
+	if (bp->flags & BNXT_FLAG_TPA) {
+		rc = bnxt_alloc_one_tpa_info(bp, clone);
+		if (rc)
+			goto err_free_tpa_info;
+	}
+
 	bnxt_init_one_rx_ring_rxbd(bp, clone);
 	bnxt_init_one_rx_agg_ring_rxbd(bp, clone);
 
 	bnxt_alloc_one_rx_ring_skb(bp, clone, idx);
 	if (bp->flags & BNXT_FLAG_AGG_RINGS)
 		bnxt_alloc_one_rx_ring_page(bp, clone, idx);
+	if (bp->flags & BNXT_FLAG_TPA)
+		bnxt_alloc_one_tpa_info_data(bp, clone);
 
 	return 0;
 
+err_free_tpa_info:
+	bnxt_free_one_tpa_info(bp, clone);
 err_free_rx_agg_ring:
 	bnxt_free_ring(bp, &clone->rx_agg_ring_struct.ring_mem);
 err_free_rx_ring:
@@ -15404,9 +15414,11 @@ static int bnxt_queue_mem_alloc(struct net_device *dev, void *qmem, int idx)
 err_rxq_info_unreg:
 	xdp_rxq_info_unreg(&clone->xdp_rxq);
 err_page_pool_destroy:
-	clone->page_pool->p.napi = NULL;
 	page_pool_destroy(clone->page_pool);
+	if (bnxt_separate_head_pool())
+		page_pool_destroy(clone->head_pool);
 	clone->page_pool = NULL;
+	clone->head_pool = NULL;
 	return rc;
 }
 
@@ -15416,13 +15428,15 @@ static void bnxt_queue_mem_free(struct net_device *dev, void *qmem)
 	struct bnxt *bp = netdev_priv(dev);
 	struct bnxt_ring_struct *ring;
 
-	bnxt_free_one_rx_ring(bp, rxr);
-	bnxt_free_one_rx_agg_ring(bp, rxr);
+	bnxt_free_one_rx_ring_skbs(bp, rxr);
 
 	xdp_rxq_info_unreg(&rxr->xdp_rxq);
 
 	page_pool_destroy(rxr->page_pool);
+	if (bnxt_separate_head_pool())
+		page_pool_destroy(rxr->head_pool);
 	rxr->page_pool = NULL;
+	rxr->head_pool = NULL;
 
 	ring = &rxr->rx_ring_struct;
 	bnxt_free_ring(bp, &ring->ring_mem);
@@ -15504,7 +15518,10 @@ static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx)
 	rxr->rx_agg_prod = clone->rx_agg_prod;
 	rxr->rx_sw_agg_prod = clone->rx_sw_agg_prod;
 	rxr->rx_next_cons = clone->rx_next_cons;
+	rxr->rx_tpa = clone->rx_tpa;
+	rxr->rx_tpa_idx_map = clone->rx_tpa_idx_map;
 	rxr->page_pool = clone->page_pool;
+	rxr->head_pool = clone->head_pool;
 	rxr->xdp_rxq = clone->xdp_rxq;
 
 	bnxt_copy_rx_ring(bp, rxr, clone);
@@ -15563,6 +15580,8 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
 	bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
 	rxr->rx_next_cons = 0;
 	page_pool_disable_direct_recycling(rxr->page_pool);
+	if (bnxt_separate_head_pool())
+		page_pool_disable_direct_recycling(rxr->head_pool);
 
 	memcpy(qmem, rxr, sizeof(*rxr));
 	bnxt_init_rx_ring_struct(bp, qmem);
-- 
2.43.5



* Re: [PATCH net v3 0/3] bnxt_en: support header page pool in queue API
  2024-12-04  4:10 [PATCH net v3 0/3] bnxt_en: support header page pool in queue API David Wei
                   ` (2 preceding siblings ...)
  2024-12-04  4:10 ` [PATCH net v3 3/3] bnxt_en: handle tpa_info in queue API implementation David Wei
@ 2024-12-04 22:43 ` Michael Chan
  2024-12-05  3:30 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 19+ messages in thread
From: Michael Chan @ 2024-12-04 22:43 UTC (permalink / raw)
  To: David Wei
  Cc: netdev, Andy Gospodarek, Somnath Kotur, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni

On Tue, Dec 3, 2024 at 8:10 PM David Wei <dw@davidwei.uk> wrote:
>
> Commit 7ed816be35ab ("eth: bnxt: use page pool for head frags") added a
> separate page pool for header frags. Now, frags are allocated from this
> header page pool e.g. rxr->tpa_info.data.
>
> The queue API did not properly handle rxr->tpa_info and so using the
> queue API to e.g. reset any queues will result in pages being returned
> to the incorrect page pool, causing inflight != 0 warnings.
>
> Fix this bug by properly allocating/freeing tpa_info and copying/freeing
> head_pool in the queue API implementation.
>
> The 1st patch is a prep patch that refactors helpers out to be used by
> the implementation patch later.
>
> The 2nd patch is a drive-by refactor. Happy to take it out and re-send
> to net-next if there are any objections.
>
> The 3rd patch is the implementation patch that will properly alloc/free
> rxr->tpa_info.

Thanks.
For the series,
Reviewed-by: Michael Chan <michael.chan@broadcom.com>


* Re: [PATCH net v3 0/3] bnxt_en: support header page pool in queue API
  2024-12-04  4:10 [PATCH net v3 0/3] bnxt_en: support header page pool in queue API David Wei
                   ` (3 preceding siblings ...)
  2024-12-04 22:43 ` [PATCH net v3 0/3] bnxt_en: support header page pool in queue API Michael Chan
@ 2024-12-05  3:30 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 19+ messages in thread
From: patchwork-bot+netdevbpf @ 2024-12-05  3:30 UTC (permalink / raw)
  To: David Wei
  Cc: netdev, michael.chan, andrew.gospodarek, somnath.kotur,
	andrew+netdev, davem, edumazet, kuba, pabeni

Hello:

This series was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue,  3 Dec 2024 20:10:19 -0800 you wrote:
> Commit 7ed816be35ab ("eth: bnxt: use page pool for head frags") added a
> separate page pool for header frags. Now, frags are allocated from this
> header page pool e.g. rxr->tpa_info.data.
> 
> The queue API did not properly handle rxr->tpa_info and so using the
> queue API to e.g. reset any queues will result in pages being returned
> to the incorrect page pool, causing inflight != 0 warnings.
> 
> [...]

Here is the summary with links:
  - [net,v3,1/3] bnxt_en: refactor tpa_info alloc/free into helpers
    https://git.kernel.org/netdev/net/c/5883a3e0babf
  - [net,v3,2/3] bnxt_en: refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap()
    https://git.kernel.org/netdev/net/c/bf1782d70ddd
  - [net,v3,3/3] bnxt_en: handle tpa_info in queue API implementation
    https://git.kernel.org/netdev/net/c/bd649c5cc958

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH net v3 3/3] bnxt_en: handle tpa_info in queue API implementation
  2024-12-04  4:10 ` [PATCH net v3 3/3] bnxt_en: handle tpa_info in queue API implementation David Wei
@ 2024-12-10 12:25   ` Yunsheng Lin
  2024-12-10 18:14     ` David Wei
  0 siblings, 1 reply; 19+ messages in thread
From: Yunsheng Lin @ 2024-12-10 12:25 UTC (permalink / raw)
  To: David Wei, netdev, Michael Chan, Andy Gospodarek, Somnath Kotur
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni

On 2024/12/4 12:10, David Wei wrote:

>  	bnxt_copy_rx_ring(bp, rxr, clone);
> @@ -15563,6 +15580,8 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
>  	bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
>  	rxr->rx_next_cons = 0;
>  	page_pool_disable_direct_recycling(rxr->page_pool);
> +	if (bnxt_separate_head_pool())
> +		page_pool_disable_direct_recycling(rxr->head_pool);

Hi, David
As mentioned in [1], is the above page_pool_disable_direct_recycling()
really needed?

Is there any NAPI API called in the implementation of netdev_queue_mgmt_ops?
It doesn't seem obvious there is any NAPI API like napi_enable() &
____napi_schedule() that is called in bnxt_queue_start()/bnxt_queue_stop()/
bnxt_queue_mem_alloc()/bnxt_queue_mem_free() through code reading.

1. https://lore.kernel.org/all/c2b306af-4817-4169-814b-adbf25803919@huawei.com/
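
(For reference, the four callbacks in question come from struct
netdev_queue_mgmt_ops in include/net/netdev_queues.h; abbreviated here,
so treat the exact signatures as an approximation:)

struct netdev_queue_mgmt_ops {
	size_t	ndo_queue_mem_size;
	int	(*ndo_queue_mem_alloc)(struct net_device *dev,
				       void *per_queue_mem, int idx);
	void	(*ndo_queue_mem_free)(struct net_device *dev,
				      void *per_queue_mem);
	int	(*ndo_queue_start)(struct net_device *dev,
				   void *per_queue_mem, int idx);
	int	(*ndo_queue_stop)(struct net_device *dev,
				  void *per_queue_mem, int idx);
};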


* Re: [PATCH net v3 3/3] bnxt_en: handle tpa_info in queue API implementation
  2024-12-10 12:25   ` Yunsheng Lin
@ 2024-12-10 18:14     ` David Wei
  2024-12-11 12:32       ` Yunsheng Lin
  0 siblings, 1 reply; 19+ messages in thread
From: David Wei @ 2024-12-10 18:14 UTC (permalink / raw)
  To: Yunsheng Lin, netdev, Michael Chan, Andy Gospodarek,
	Somnath Kotur
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni

On 2024-12-10 04:25, Yunsheng Lin wrote:
> On 2024/12/4 12:10, David Wei wrote:
> 
>>  	bnxt_copy_rx_ring(bp, rxr, clone);
>> @@ -15563,6 +15580,8 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
>>  	bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
>>  	rxr->rx_next_cons = 0;
>>  	page_pool_disable_direct_recycling(rxr->page_pool);
>> +	if (bnxt_separate_head_pool())
>> +		page_pool_disable_direct_recycling(rxr->head_pool);
> 
> Hi, David
> As mentioned in [1], is the above page_pool_disable_direct_recycling()
> really needed?
> 
> Is there any NAPI API called in the implementation of netdev_queue_mgmt_ops?
> It doesn't seem obvious there is any NAPI API like napi_enable() &
> ____napi_schedule() that is called in bnxt_queue_start()/bnxt_queue_stop()/
> bnxt_queue_mem_alloc()/bnxt_queue_mem_free() through code reading.
> 
> 1. https://lore.kernel.org/all/c2b306af-4817-4169-814b-adbf25803919@huawei.com/

Hi Yunsheng, there are explicitly no napi_enable/disable() calls in the
bnxt implementation of netdev_queue_mgmt_ops due to ... let's say HW/FW
quirks. I looked back at my discussions w/ Broadcom, and IIU/RC
bnxt_hwrm_vnic_update() will prevent any work from coming into the rxq
that I'm trying to stop. Calling napi_disable() has unintended side
effects on the Tx side.

The intent of the call to page_pool_disable_direct_recycling() is to
prevent pages from the old page pool from being returned into the fast
cache. These pages must be returned via page_pool_return_page() so that
the pool can eventually be freed in page_pool_release_retry().
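
For reference, the function boils down to roughly this (paraphrased
from net/core/page_pool.c of this era, so minor details may differ):

void page_pool_disable_direct_recycling(struct page_pool *pool)
{
	/* Once cpuid and p.napi are cleared, page_pool_napi_local() can
	 * no longer match the recycling context, so every later release
	 * takes the slow path into the pool's ptr_ring instead of the
	 * lockless per-NAPI fast cache.
	 */
	WRITE_ONCE(pool->cpuid, -1);

	if (!pool->p.napi)
		return;

	/* Unlinking is only race-free while the NAPI cannot run. */
	WARN_ON(!test_bit(NAPI_STATE_SCHED, &pool->p.napi->state));
	WARN_ON(READ_ONCE(pool->p.napi->list_owner) != -1);

	WRITE_ONCE(pool->p.napi, NULL);
}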

I'm going to take a look at your discussions in [1] and respond there.


* Re: [PATCH net v3 3/3] bnxt_en: handle tpa_info in queue API implementation
  2024-12-10 18:14     ` David Wei
@ 2024-12-11 12:32       ` Yunsheng Lin
  2024-12-11 16:10         ` David Wei
  0 siblings, 1 reply; 19+ messages in thread
From: Yunsheng Lin @ 2024-12-11 12:32 UTC (permalink / raw)
  To: David Wei, netdev, Michael Chan, Andy Gospodarek, Somnath Kotur
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni

On 2024/12/11 2:14, David Wei wrote:
> On 2024-12-10 04:25, Yunsheng Lin wrote:
>> On 2024/12/4 12:10, David Wei wrote:
>>
>>>  	bnxt_copy_rx_ring(bp, rxr, clone);
>>> @@ -15563,6 +15580,8 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
>>>  	bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
>>>  	rxr->rx_next_cons = 0;
>>>  	page_pool_disable_direct_recycling(rxr->page_pool);
>>> +	if (bnxt_separate_head_pool())
>>> +		page_pool_disable_direct_recycling(rxr->head_pool);
>>
>> Hi, David
>> As mentioned in [1], is the above page_pool_disable_direct_recycling()
>> really needed?
>>
>> Is there any NAPI API called in the implementation of netdev_queue_mgmt_ops?
>> It doesn't seem obvious there is any NAPI API like napi_enable() &
>> ____napi_schedule() that is called in bnxt_queue_start()/bnxt_queue_stop()/
>> bnxt_queue_mem_alloc()/bnxt_queue_mem_free() through code reading.
>>
>> 1. https://lore.kernel.org/all/c2b306af-4817-4169-814b-adbf25803919@huawei.com/
> 
> Hi Yunsheng, there are explicitly no napi_enable/disable() calls in the
> bnxt implementation of netdev_queue_mgmt_ops due to ... let's say HW/FW
> quirks. I looked back at my discussions w/ Broadcom, and IIU/RC
> bnxt_hwrm_vnic_update() will prevent any work from coming into the rxq
> that I'm trying to stop. Calling napi_disable() has unintended side
> effects on the Tx side.

It seems that bnxt_hwrm_vnic_update() sends a VNIC_UPDATE cmd to disable
a VNIC, and a napi_disable() is not needed? Is it possible that there may
be some pending NAPI work still being processed after bnxt_hwrm_vnic_update()
is called?

> 
> The intent of the call to page_pool_disable_direct_recycling() is to
> prevent pages from the old page pool from being returned into the fast
> cache. These pages must be returned via page_pool_return_page() so that
> the pool can eventually be freed in page_pool_release_retry().
> 
> I'm going to take a look at your discussions in [1] and respond there.

Thanks.


* Re: [PATCH net v3 3/3] bnxt_en: handle tpa_info in queue API implementation
  2024-12-11 12:32       ` Yunsheng Lin
@ 2024-12-11 16:10         ` David Wei
  2024-12-11 17:11           ` Michael Chan
  0 siblings, 1 reply; 19+ messages in thread
From: David Wei @ 2024-12-11 16:10 UTC (permalink / raw)
  To: Yunsheng Lin, netdev, Michael Chan, Andy Gospodarek,
	Somnath Kotur
  Cc: Andrew Lunn, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni

On 2024-12-11 04:32, Yunsheng Lin wrote:
> On 2024/12/11 2:14, David Wei wrote:
>> On 2024-12-10 04:25, Yunsheng Lin wrote:
>>> On 2024/12/4 12:10, David Wei wrote:
>>>
>>>>  	bnxt_copy_rx_ring(bp, rxr, clone);
>>>> @@ -15563,6 +15580,8 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
>>>>  	bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
>>>>  	rxr->rx_next_cons = 0;
>>>>  	page_pool_disable_direct_recycling(rxr->page_pool);
>>>> +	if (bnxt_separate_head_pool())
>>>> +		page_pool_disable_direct_recycling(rxr->head_pool);
>>>
>>> Hi, David
>>> As mentioned in [1], is the above page_pool_disable_direct_recycling()
>>> really needed?
>>>
>>> Is there any NAPI API called in the implementation of netdev_queue_mgmt_ops?
>>> It doesn't seem obvious there is any NAPI API like napi_enable() &
>>> ____napi_schedule() that is called in bnxt_queue_start()/bnxt_queue_stop()/
>>> bnxt_queue_mem_alloc()/bnxt_queue_mem_free() through code reading.
>>>
>>> 1. https://lore.kernel.org/all/c2b306af-4817-4169-814b-adbf25803919@huawei.com/
>>
>> Hi Yunsheng, there are explicitly no napi_enable/disable() calls in the
>> bnxt implementation of netdev_queue_mgmt_ops due to ... let's say HW/FW
>> quirks. I looked back at my discussions w/ Broadcom, and IIU/RC
>> bnxt_hwrm_vnic_update() will prevent any work from coming into the rxq
>> that I'm trying to stop. Calling napi_disable() has unintended side
>> effects on the Tx side.
> 
> It seems that bnxt_hwrm_vnic_update() sends a VNIC_UPDATE cmd to disable
> a VNIC, and a napi_disable() is not needed?

Correct.

> Is it possible that there may
> be some pending NAPI work still being processed after bnxt_hwrm_vnic_update()
> is called?

Possibly, I don't know the details of how the HW works.

At the time I just wanted something to work, and not having
napi_enable/disable() made it work. :) Looking back though it does seem
odd, so I'll try putting it back.

> 
>>
>> The intent of the call to page_pool_disable_direct_recycling() is to
>> prevent pages from the old page pool from being returned into the fast
>> cache. These pages must be returned via page_pool_return_page() so that
>> the pool can eventually be freed in page_pool_release_retry().
>>
>> I'm going to take a look at your discussions in [1] and respond there.
> 
> Thanks.


* Re: [PATCH net v3 3/3] bnxt_en: handle tpa_info in queue API implementation
  2024-12-11 16:10         ` David Wei
@ 2024-12-11 17:11           ` Michael Chan
  2024-12-12  0:48             ` Jakub Kicinski
  2024-12-16 21:41             ` David Wei
  0 siblings, 2 replies; 19+ messages in thread
From: Michael Chan @ 2024-12-11 17:11 UTC (permalink / raw)
  To: David Wei
  Cc: Yunsheng Lin, netdev, Andy Gospodarek, Somnath Kotur, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni

On Wed, Dec 11, 2024 at 8:10 AM David Wei <dw@davidwei.uk> wrote:
>
> On 2024-12-11 04:32, Yunsheng Lin wrote:
> > On 2024/12/11 2:14, David Wei wrote:
> >> On 2024-12-10 04:25, Yunsheng Lin wrote:
> >>> On 2024/12/4 12:10, David Wei wrote:
> >>>
> >>>>    bnxt_copy_rx_ring(bp, rxr, clone);
> >>>> @@ -15563,6 +15580,8 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
> >>>>    bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
> >>>>    rxr->rx_next_cons = 0;
> >>>>    page_pool_disable_direct_recycling(rxr->page_pool);
> >>>> +  if (bnxt_separate_head_pool())
> >>>> +          page_pool_disable_direct_recycling(rxr->head_pool);
> >>>
> >>> Hi, David
> >>> As mentioned in [1], is the above page_pool_disable_direct_recycling()
> >>> really needed?
> >>>
> >>> Is there any NAPI API called in the implementation of netdev_queue_mgmt_ops?
> >>> It doesn't seem obvious there is any NAPI API like napi_enable() &
> >>> ____napi_schedule() that is called in bnxt_queue_start()/bnxt_queue_stop()/
> >>> bnxt_queue_mem_alloc()/bnxt_queue_mem_free() through code reading.
> >>>
> >>> 1. https://lore.kernel.org/all/c2b306af-4817-4169-814b-adbf25803919@huawei.com/
> >>
> >> Hi Yunsheng, there are explicitly no napi_enable/disable() calls in the
> >> bnxt implementation of netdev_queue_mgmt_ops due to ... let's say HW/FW
> >> quirks. I looked back at my discussions w/ Broadcom, and IIU/RC
> >> bnxt_hwrm_vnic_update() will prevent any work from coming into the rxq
> >> that I'm trying to stop. Calling napi_disable() has unintended side
> >> effects on the Tx side.
> >
> > It seems that bnxt_hwrm_vnic_update() sends a VNIC_UPDATE cmd to disable
> > a VNIC, and a napi_disable() is not needed?
>
> Correct.
>
> > Is it possible that there may
> > be some pending NAPI work still being processed after bnxt_hwrm_vnic_update()
> > is called?
>
> Possibly, I don't know the details of how the HW works.
>

bnxt_hwrm_vnic_update() only stops the HW from receiving more packets.
The completion ring may already have lots of RX entries and TPA entries.
NAPI may be behind and it can take a while to process a batch of 64
entries at a time to go through all the remaining entries.
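
(Sketch of the concern using the generic NAPI pattern rather than the
actual bnxt poll function; drain_completion_ring() is a made-up
stand-in:)

/* Even after the VNIC stops accepting packets, poll() keeps being
 * rescheduled until the already-completed entries are drained, at
 * most `budget` (typically 64) entries per invocation.
 */
static int poll(struct napi_struct *napi, int budget)
{
	int work_done = drain_completion_ring(napi, budget);

	if (work_done < budget)
		napi_complete_done(napi, work_done);

	return work_done;
}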

> At the time I just wanted something to work, and not having
> napi_enable/disable() made it work. :) Looking back though it does seem
> odd, so I'll try putting it back.

Yeah, I think it makes sense to add napi_disable().  Thanks.


* Re: [PATCH net v3 3/3] bnxt_en: handle tpa_info in queue API implementation
  2024-12-11 17:11           ` Michael Chan
@ 2024-12-12  0:48             ` Jakub Kicinski
  2024-12-12 11:23               ` Yunsheng Lin
  2024-12-16 21:41             ` David Wei
  1 sibling, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2024-12-12  0:48 UTC (permalink / raw)
  To: Michael Chan
  Cc: David Wei, Yunsheng Lin, netdev, Andy Gospodarek, Somnath Kotur,
	Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni

On Wed, 11 Dec 2024 09:11:55 -0800 Michael Chan wrote:
> > At the time I just wanted something to work, and not having
> > napi_enable/disable() made it work. :) Looking back though it does seem
> > odd, so I'll try putting it back.  
> 
> Yeah, I think it makes sense to add napi_disable().

+1, TBH I'm not sure how we avoid hitting the warning which checks
if NAPI is "scheduled" in page_pool_disable_direct_recycling().

But, Yunsheng, I hope it is clear that the RCU sync is needed even
if the driver disables NAPI for the reconfiguration. Unless you see an
RCU sync in napi_disable() / napi_enable()...


* Re: [PATCH net v3 3/3] bnxt_en: handle tpa_info in queue API implementation
  2024-12-12  0:48             ` Jakub Kicinski
@ 2024-12-12 11:23               ` Yunsheng Lin
  2024-12-12 14:56                 ` Jakub Kicinski
  0 siblings, 1 reply; 19+ messages in thread
From: Yunsheng Lin @ 2024-12-12 11:23 UTC (permalink / raw)
  To: Jakub Kicinski, Michael Chan
  Cc: David Wei, netdev, Andy Gospodarek, Somnath Kotur, Andrew Lunn,
	David S. Miller, Eric Dumazet, Paolo Abeni

On 2024/12/12 8:48, Jakub Kicinski wrote:
> But, Yunsheng, I hope it is clear that the RCU sync is needed even
> if the driver disables NAPI for the reconfiguration. Unless you see an
> RCU sync in napi_disable() / napi_enable()...

It seems that depends on the calling order of napi_disable()/napi_enable()
and page_pool_destroy() for the old page_pool.
It seems an extra RCU sync is not really needed if page_pool_destroy()
for the old page_pool is called between napi_disable() and napi_enable(),
as page_pool_destroy() already has an RCU sync.


* Re: [PATCH net v3 3/3] bnxt_en: handle tpa_info in queue API implementation
  2024-12-12 11:23               ` Yunsheng Lin
@ 2024-12-12 14:56                 ` Jakub Kicinski
  2024-12-13 12:17                   ` Yunsheng Lin
  0 siblings, 1 reply; 19+ messages in thread
From: Jakub Kicinski @ 2024-12-12 14:56 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Michael Chan, David Wei, netdev, Andy Gospodarek, Somnath Kotur,
	Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni

On Thu, 12 Dec 2024 19:23:52 +0800 Yunsheng Lin wrote:
> It seems an extra RCU sync is not really needed if page_pool_destroy()
> for the old page_pool is called between napi_disable() and napi_enable(),
> as page_pool_destroy() already has an RCU sync.

I did my best.


* Re: [PATCH net v3 3/3] bnxt_en: handle tpa_info in queue API implementation
  2024-12-12 14:56                 ` Jakub Kicinski
@ 2024-12-13 12:17                   ` Yunsheng Lin
  2024-12-31 11:49                     ` Yunsheng Lin
  0 siblings, 1 reply; 19+ messages in thread
From: Yunsheng Lin @ 2024-12-13 12:17 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Michael Chan, David Wei, netdev, Andy Gospodarek, Somnath Kotur,
	Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni

On 2024/12/12 22:56, Jakub Kicinski wrote:
> On Thu, 12 Dec 2024 19:23:52 +0800 Yunsheng Lin wrote:
>> It seems an extra RCU sync is not really needed if page_pool_destroy()
>> for the old page_pool is called between napi_disable() and napi_enable(),
>> as page_pool_destroy() already has an RCU sync.
> 
> I did my best.
> 

I am not sure how to interpret the above comment.

Anyway, I guess it can be said the patch in [1] is only trying to fix
a use-after-free problem, based on the assumption that softirq context
on the same CPU ensures sequential execution and that a specific NAPI
instance executed across different CPUs is also sequential. Under that
assumption, page_pool_napi_local() will only return true in a softirq
context on the CPU that is pool->p.napi->list_owner (when list_owner
!= -1 after napi_enable() is called), and will always return false
after napi_disable() is called, as pool->p.napi->list_owner is then
always -1, even though skb_defer_free_flush() can be called without
binding to any NAPI instance and can run in the softirq context of
any CPU.
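
For reference, the function being reasoned about looks roughly like
this (condensed from net/core/page_pool.c; a close paraphrase, not a
verbatim copy):

static bool page_pool_napi_local(const struct page_pool *pool)
{
	const struct napi_struct *napi;
	u32 cpuid;

	/* Direct recycling is only considered from softirq context, and
	 * only when we can prove we are on the same CPU as the pool's
	 * consumer.
	 */
	if (unlikely(!in_softirq()))
		return false;

	cpuid = smp_processor_id();
	if (READ_ONCE(pool->cpuid) == cpuid)
		return true;

	napi = READ_ONCE(pool->p.napi);

	return napi && READ_ONCE(napi->list_owner) == cpuid;
}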

If there is any timing window that you think might cause page_pool_napi_local()
to return true unexpectedly, it would be good to be more specific about it;
a bigger RCU lock might then be needed instead of the small RCU lock in [1].

1. https://lore.kernel.org/all/20241120103456.396577-2-linyunsheng@huawei.com/#r



* Re: [PATCH net v3 3/3] bnxt_en: handle tpa_info in queue API implementation
  2024-12-11 17:11           ` Michael Chan
  2024-12-12  0:48             ` Jakub Kicinski
@ 2024-12-16 21:41             ` David Wei
  2024-12-16 22:48               ` Michael Chan
  1 sibling, 1 reply; 19+ messages in thread
From: David Wei @ 2024-12-16 21:41 UTC (permalink / raw)
  To: Michael Chan, Somnath Kotur
  Cc: Yunsheng Lin, netdev, Andy Gospodarek, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni

On 2024-12-11 09:11, Michael Chan wrote:
> On Wed, Dec 11, 2024 at 8:10 AM David Wei <dw@davidwei.uk> wrote:
>>
>> On 2024-12-11 04:32, Yunsheng Lin wrote:
>>> On 2024/12/11 2:14, David Wei wrote:
>>>> On 2024-12-10 04:25, Yunsheng Lin wrote:
>>>>> On 2024/12/4 12:10, David Wei wrote:
>>>>>
>>>>>>    bnxt_copy_rx_ring(bp, rxr, clone);
>>>>>> @@ -15563,6 +15580,8 @@ static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx)
>>>>>>    bnxt_hwrm_rx_agg_ring_free(bp, rxr, false);
>>>>>>    rxr->rx_next_cons = 0;
>>>>>>    page_pool_disable_direct_recycling(rxr->page_pool);
>>>>>> +  if (bnxt_separate_head_pool())
>>>>>> +          page_pool_disable_direct_recycling(rxr->head_pool);
>>>>>
>>>>> Hi, David
>>>>> As mentioned in [1], is the above page_pool_disable_direct_recycling()
>>>>> really needed?
>>>>>
>>>>> Is there any NAPI API called in the implementation of netdev_queue_mgmt_ops?
>>>>> It doesn't seem obvious there is any NAPI API like napi_enable() &
>>>>> ____napi_schedule() that is called in bnxt_queue_start()/bnxt_queue_stop()/
>>>>> bnxt_queue_mem_alloc()/bnxt_queue_mem_free() through code reading.
>>>>>
>>>>> 1. https://lore.kernel.org/all/c2b306af-4817-4169-814b-adbf25803919@huawei.com/
>>>>
>>>> Hi Yunsheng, there are explicitly no napi_enable/disable() calls in the
>>>> bnxt implementation of netdev_queue_mgmt_ops due to ... let's say HW/FW
>>>> quirks. I looked back at my discussions w/ Broadcom, and IIU/RC
>>>> bnxt_hwrm_vnic_update() will prevent any work from coming into the rxq
>>>> that I'm trying to stop. Calling napi_disable() has unintended side
>>>> effects on the Tx side.
>>>
>>> It seems that bnxt_hwrm_vnic_update() sends a VNIC_UPDATE cmd to disable
>>> a VNIC, and a napi_disable() is not needed?
>>
>> Correct.
>>
>>> Is it possible that there may
>>> be some pending NAPI work still being processed after bnxt_hwrm_vnic_update()
>>> is called?
>>
>> Possibly, I don't know the details of how the HW works.
>>
> 
> bnxt_hwrm_vnic_update() only stops the HW from receiving more packets.
> The completion ring may already have lots of RX entries and TPA entries.
> NAPI may be behind and it can take a while to process a batch of 64
> entries at a time to go through all the remaining entries.
> 
>> At the time I just wanted something to work, and not having
>> napi_enable/disable() made it work. :) Looking back though it does seem
>> odd, so I'll try putting it back.
> 
> Yeah, I think it makes sense to add napi_disable().  Thanks.

Michael, Som. I can't add napi_disable()/enable() because the NAPI
instance is shared between the Rx and Tx queues. If I disable a NAPI
instance, then it affects the corresponding Tx queue because it is not
quiesced. Only the Rx queue is quiesced indirectly by preventing the HW
from receiving packets via the call to bnxt_hwrm_vnic_update().

The other implementation of queue API (gve) will quiesce all queues for
an individual queue stop/start operation. To call
napi_disable()/enable() I believe we will need the same thing for bnxt.


* Re: [PATCH net v3 3/3] bnxt_en: handle tpa_info in queue API implementation
  2024-12-16 21:41             ` David Wei
@ 2024-12-16 22:48               ` Michael Chan
  2024-12-16 22:59                 ` David Wei
  0 siblings, 1 reply; 19+ messages in thread
From: Michael Chan @ 2024-12-16 22:48 UTC (permalink / raw)
  To: David Wei
  Cc: Somnath Kotur, Yunsheng Lin, netdev, Andy Gospodarek, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni

On Mon, Dec 16, 2024 at 1:41 PM David Wei <dw@davidwei.uk> wrote:
>
> On 2024-12-11 09:11, Michael Chan wrote:
> > Yeah, I think it makes sense to add napi_disable().  Thanks.
>
> Michael, Som. I can't add napi_disable()/enable() because the NAPI
> instance is shared between the Rx and Tx queues. If I disable a NAPI
> instance, then it affects the corresponding Tx queue because it is not
> quiesced. Only the Rx queue is quiesced indirectly by preventing the HW
> from receiving packets via the call to bnxt_hwrm_vnic_update().
>
> The other implementation of queue API (gve) will quiesce all queues for
> an individual queue stop/start operation. To call
> napi_disable()/enable() I believe we will need the same thing for bnxt.

Som is working on a set of driver patches to make
queue_stop/queue_start work for the AMD PCIe TPH feature.  He will
have to disable and re-enable the TX queue for TPH to work.  I think
we can just include the napi_disable()/napi_enable() in Som's patches
then.  Thanks.


* Re: [PATCH net v3 3/3] bnxt_en: handle tpa_info in queue API implementation
  2024-12-16 22:48               ` Michael Chan
@ 2024-12-16 22:59                 ` David Wei
  0 siblings, 0 replies; 19+ messages in thread
From: David Wei @ 2024-12-16 22:59 UTC (permalink / raw)
  To: Michael Chan
  Cc: Somnath Kotur, Yunsheng Lin, netdev, Andy Gospodarek, Andrew Lunn,
	David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni

On 2024-12-16 14:48, Michael Chan wrote:
> On Mon, Dec 16, 2024 at 1:41 PM David Wei <dw@davidwei.uk> wrote:
>>
>> On 2024-12-11 09:11, Michael Chan wrote:
>>> Yeah, I think it makes sense to add napi_disable().  Thanks.
>>
>> Michael, Som. I can't add napi_disable()/enable() because the NAPI
>> instance is shared between the Rx and Tx queues. If I disable a NAPI
>> instance, then it affects the corresponding Tx queue because it is not
>> quiesced. Only the Rx queue is quiesced indirectly by preventing the HW
>> from receiving packets via the call to bnxt_hwrm_vnic_update().
>>
>> The other implementation of queue API (gve) will quiesce all queues for
>> an individual queue stop/start operation. To call
>> napi_disable()/enable() I believe we will need the same thing for bnxt.
> 
> Som is working on a set of driver patches to make
> queue_stop/queue_start work for the AMD PCIe TPH feature.  He will
> have to disable and re-enable the TX queue for TPH to work.  I think
> we can just include the napi_disable()/napi_enable() in Som's patches
> then.  Thanks.

Sounds good, thanks Michael.


* Re: [PATCH net v3 3/3] bnxt_en: handle tpa_info in queue API implementation
  2024-12-13 12:17                   ` Yunsheng Lin
@ 2024-12-31 11:49                     ` Yunsheng Lin
  0 siblings, 0 replies; 19+ messages in thread
From: Yunsheng Lin @ 2024-12-31 11:49 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Michael Chan, David Wei, netdev, Andy Gospodarek, Somnath Kotur,
	Andrew Lunn, David S. Miller, Eric Dumazet, Paolo Abeni

On 2024/12/13 20:17, Yunsheng Lin wrote:
> On 2024/12/12 22:56, Jakub Kicinski wrote:
>> On Thu, 12 Dec 2024 19:23:52 +0800 Yunsheng Lin wrote:
>>> It seems an extra RCU sync is not really needed if page_pool_destroy()
>>> for the old page_pool is called between napi_disable() and napi_enable(),
>>> as page_pool_destroy() already has an RCU sync.
>>
>> I did my best.
>>
> 
> I am not sure how to interpret the above comment.
> 
> Anyway, I guess it can be said the patch in [1] is only trying to fix
> a use-after-free problem, based on the assumption that softirq context
> on the same CPU ensures sequential execution and that a specific NAPI
> instance executed across different CPUs is also sequential. Under that
> assumption, page_pool_napi_local() will only return true in a softirq
> context on the CPU that is pool->p.napi->list_owner (when list_owner
> != -1 after napi_enable() is called), and will always return false
> after napi_disable() is called, as pool->p.napi->list_owner is then
> always -1, even though skb_defer_free_flush() can be called without
> binding to any NAPI instance and can run in the softirq context of
> any CPU.
> 
> If there is any timing window that you think might cause page_pool_napi_local()
> to return true unexpectedly, it would be good to be more specific about it;
> a bigger RCU lock might then be needed instead of the small RCU lock in [1].

Please let me know if there is any other obvious timing window that needs
fixing. If not, I am planning to keep that patch as part of the DMA
unmapping patchset, as the DMA unmapping fix depends on the
synchronize_rcu() added in that patch, and the time window is so small
that it doesn't seem to be an urgent fix, so it targets net-next as the
DMA unmapping fix does.

> 
> 1. https://lore.kernel.org/all/20241120103456.396577-2-linyunsheng@huawei.com/#r
> 

