BPF List
* [PATCH net-next v7 0/8] fix two bugs related to page_pool
@ 2025-01-10 13:06 Yunsheng Lin
  2025-01-10 13:06 ` [PATCH net-next v7 1/8] page_pool: introduce page_pool_get_pp() API Yunsheng Lin
  2025-01-14 14:31 ` [PATCH net-next v7 0/8] fix two bugs related to page_pool Jesper Dangaard Brouer
  0 siblings, 2 replies; 9+ messages in thread
From: Yunsheng Lin @ 2025-01-10 13:06 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: zhangkun09, liuyonglong, fanghaiqing, Yunsheng Lin,
	Alexander Lobakin, Robin Murphy, Alexander Duyck, Andrew Morton,
	IOMMU, MM, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, John Fastabend, Matthias Brugger,
	AngeloGioacchino Del Regno, netdev, intel-wired-lan, bpf,
	linux-kernel, linux-arm-kernel, linux-mediatek

This patchset fixes a possible time window problem in page_pool and
the DMA API misuse problem mentioned in [1], and tries to offset the
overhead of the fixes with some optimizations.

In the performance data below, the overhead is hard to distinguish from
run-to-run variation for time_bench_page_pool01_fast_path() and
time_bench_page_pool02_ptr_ring(), while time_bench_page_pool03_slow()
shows about 20ns of overhead from fixing the bug.
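
The per-element figures in the logs can be re-derived from the raw
counters. The sketch below assumes that "time_interval:" is in
nanoseconds (it matches the "measurement period time" in seconds) and
that "invoke count:" is the number of iterations. The helper names
per_elem_ns() and overhead_ns() are made up for illustration and are not
part of time_bench.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Average cost of one benchmark iteration in nanoseconds.
 * Inputs are the "time_interval:" and "invoke count:" fields of a
 * time_bench log line (assumed units: nanoseconds and iterations).
 */
double per_elem_ns(uint64_t time_interval_ns, uint64_t invoke_count)
{
	assert(invoke_count > 0);
	return (double)time_interval_ns / (double)invoke_count;
}

/*
 * Extra cost per element between two runs of the same test that used
 * the same invoke count.
 */
double overhead_ns(uint64_t before_ns, uint64_t after_ns,
		   uint64_t invoke_count)
{
	return per_elem_ns(after_ns, invoke_count) -
	       per_elem_ns(before_ns, invoke_count);
}
```

Feeding in the no-softirq-page_pool03 counters from the two runs below
(1818495880 and 2015141210 with an invoke count of 10000000) gives a
difference of roughly 19.7ns, which is where the "about 20ns" estimate
comes from.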

Before this patchset:
root@(none)$ insmod bench_page_pool_simple.ko
[  323.367627] bench_page_pool_simple: Loaded
[  323.448747] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.076997150 sec time_interval:76997150) - (invoke count:100000000 tsc_interval:7699707)
[  324.812884] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.468 ns (step:0) - (measurement period time:1.346855130 sec time_interval:1346855130) - (invoke count:100000000 tsc_interval:134685507)
[  324.980875] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.010 ns (step:0) - (measurement period time:0.150101270 sec time_interval:150101270) - (invoke count:10000000 tsc_interval:15010120)
[  325.652195] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.542 ns (step:0) - (measurement period time:0.654213000 sec time_interval:654213000) - (invoke count:100000000 tsc_interval:65421294)
[  325.669215] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  325.974848] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 29.633 ns (step:0) - (measurement period time:0.296338200 sec time_interval:296338200) - (invoke count:10000000 tsc_interval:29633814)
[  325.993517] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  326.576636] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.391 ns (step:0) - (measurement period time:0.573911820 sec time_interval:573911820) - (invoke count:10000000 tsc_interval:57391174)
[  326.595307] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  328.422661] time_bench: Type:no-softirq-page_pool03 Per elem: 18 cycles(tsc) 181.849 ns (step:0) - (measurement period time:1.818495880 sec time_interval:1818495880) - (invoke count:10000000 tsc_interval:181849581)
[  328.441681] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  328.449584] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  328.755031] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 29.632 ns (step:0) - (measurement period time:0.296327910 sec time_interval:296327910) - (invoke count:10000000 tsc_interval:29632785)
[  328.774308] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  329.578579] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 7 cycles(tsc) 79.523 ns (step:0) - (measurement period time:0.795236560 sec time_interval:795236560) - (invoke count:10000000 tsc_interval:79523650)
[  329.597769] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  331.507501] time_bench: Type:tasklet_page_pool03_slow Per elem: 19 cycles(tsc) 190.104 ns (step:0) - (measurement period time:1.901047510 sec time_interval:1901047510) - (invoke count:10000000 tsc_interval:190104743)

After this patchset:
root@(none)$ insmod bench_page_pool_simple.ko
[  138.634758] bench_page_pool_simple: Loaded
[  138.715879] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.076972720 sec time_interval:76972720) - (invoke count:100000000 tsc_interval:7697265)
[  140.079897] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:1.346735370 sec time_interval:1346735370) - (invoke count:100000000 tsc_interval:134673531)
[  140.247841] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:0.150055080 sec time_interval:150055080) - (invoke count:10000000 tsc_interval:15005497)
[  140.919072] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:0.654125000 sec time_interval:654125000) - (invoke count:100000000 tsc_interval:65412493)
[  140.936091] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  141.246985] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 30.159 ns (step:0) - (measurement period time:0.301598160 sec time_interval:301598160) - (invoke count:10000000 tsc_interval:30159812)
[  141.265654] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  141.976265] time_bench: Type:no-softirq-page_pool02 Per elem: 7 cycles(tsc) 70.140 ns (step:0) - (measurement period time:0.701405780 sec time_interval:701405780) - (invoke count:10000000 tsc_interval:70140573)
[  141.994933] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  144.018945] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 201.514 ns (step:0) - (measurement period time:2.015141210 sec time_interval:2015141210) - (invoke count:10000000 tsc_interval:201514113)
[  144.037966] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  144.045870] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  144.205045] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:0.150056510 sec time_interval:150056510) - (invoke count:10000000 tsc_interval:15005645)
[  144.224320] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  144.916044] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 68.269 ns (step:0) - (measurement period time:0.682693070 sec time_interval:682693070) - (invoke count:10000000 tsc_interval:68269300)
[  144.935234] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  146.997684] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 205.376 ns (step:0) - (measurement period time:2.053766310 sec time_interval:2053766310) - (invoke count:10000000 tsc_interval:205376624)

1. https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/

CC: Alexander Lobakin <aleksander.lobakin@intel.com>
CC: Robin Murphy <robin.murphy@arm.com>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: IOMMU <iommu@lists.linux.dev>
CC: MM <linux-mm@kvack.org>

Change log:
V7:
  1. Fix a use-after-free bug reported by KASAN as mentioned by Jakub.
  2. Fix the 'netmem' variable not being set up correctly, as mentioned
     by Simon.

V6:
  1. Repost based on latest net-next.
  2. Rename page_pool_to_pp() to page_pool_get_pp().

V5:
  1. Support an unlimited number of inflight pages.
  2. Add some optimizations to avoid the overhead of fixing the bug.

V4:
  1. Use scanning to do the unmapping.
  2. Split DMA sync skipping into a separate patch.

V3:
  1. Target net-next tree instead of net tree.
  2. Narrow the RCU lock scope as discussed in v2.
  3. Check the unmapping count against the inflight count.

V2:
  1. Add an item_full stat.
  2. Use container_of() for page_pool_to_pp().

Yunsheng Lin (8):
  page_pool: introduce page_pool_get_pp() API
  page_pool: fix timing for checking and disabling napi_local
  page_pool: fix IOMMU crash when driver has already unbound
  page_pool: support unlimited number of inflight pages
  page_pool: skip dma sync operation for inflight pages
  page_pool: use list instead of ptr_ring for ring cache
  page_pool: batch refilling pages to reduce atomic operation
  page_pool: use list instead of array for alloc cache

 drivers/net/ethernet/freescale/fec_main.c     |   8 +-
 .../ethernet/google/gve/gve_buffer_mgmt_dqo.c |   2 +-
 drivers/net/ethernet/intel/iavf/iavf_txrx.c   |   6 +-
 drivers/net/ethernet/intel/idpf/idpf_txrx.c   |  14 +-
 drivers/net/ethernet/intel/libeth/rx.c        |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |   3 +-
 drivers/net/netdevsim/netdev.c                |   6 +-
 drivers/net/wireless/mediatek/mt76/mt76.h     |   2 +-
 include/linux/mm_types.h                      |   2 +-
 include/linux/skbuff.h                        |   1 +
 include/net/libeth/rx.h                       |   3 +-
 include/net/netmem.h                          |  24 +-
 include/net/page_pool/helpers.h               |  11 +
 include/net/page_pool/types.h                 |  64 +-
 net/core/devmem.c                             |   4 +-
 net/core/netmem_priv.h                        |   5 +-
 net/core/page_pool.c                          | 664 ++++++++++++++----
 net/core/page_pool_priv.h                     |  12 +-
 18 files changed, 675 insertions(+), 158 deletions(-)

-- 
2.33.0



* [PATCH net-next v7 1/8] page_pool: introduce page_pool_get_pp() API
  2025-01-10 13:06 [PATCH net-next v7 0/8] fix two bugs related to page_pool Yunsheng Lin
@ 2025-01-10 13:06 ` Yunsheng Lin
  2025-01-14 14:31 ` [PATCH net-next v7 0/8] fix two bugs related to page_pool Jesper Dangaard Brouer
  1 sibling, 0 replies; 9+ messages in thread
From: Yunsheng Lin @ 2025-01-10 13:06 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: zhangkun09, liuyonglong, fanghaiqing, Yunsheng Lin, Wei Fang,
	Shenwei Wang, Clark Wang, Andrew Lunn, Eric Dumazet,
	Jeroen de Borst, Praveen Kaligineedi, Shailend Chand, Tony Nguyen,
	Przemek Kitszel, Alexander Lobakin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Felix Fietkau,
	Lorenzo Bianconi, Ryder Lee, Shayne Chen, Sean Wang, Kalle Valo,
	Matthias Brugger, AngeloGioacchino Del Regno, Simon Horman,
	Ilias Apalodimas, imx, netdev, linux-kernel, intel-wired-lan, bpf,
	linux-rdma, linux-wireless, linux-arm-kernel, linux-mediatek

Introduce the page_pool_get_pp() API so that callers no longer access
page->pp directly.
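
As a rough userspace illustration of the accessor pattern (not kernel
code): the structures below are simplified stand-ins with made-up
layouts, and rx_headroom() is a hypothetical caller modelled on the
driver changes in this patch. Only page_pool_get_pp() itself mirrors the
helper added here. Presumably the benefit is that a later change to how
the pool pointer is stored only needs to touch the helper, but that is
an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real layouts. */
struct page_pool_params {
	unsigned int offset;	/* headroom before the packet data */
};

struct page_pool {
	struct page_pool_params p;
};

struct page {
	struct page_pool *pp;	/* owning pool */
};

/* Same shape as the helper added in include/net/page_pool/helpers.h. */
static inline struct page_pool *page_pool_get_pp(struct page *page)
{
	return page->pp;
}

/*
 * Hypothetical caller: reads the headroom through the accessor instead
 * of spelling out page->pp->p.offset.
 */
unsigned int rx_headroom(struct page *page)
{
	struct page_pool *pool = page_pool_get_pp(page);

	assert(pool != NULL);
	return pool->p.offset;
}
```

Callers written this way never name the pp field, so they keep
compiling unchanged if the body of the helper is later rewritten.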

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 drivers/net/ethernet/freescale/fec_main.c          |  8 +++++---
 .../net/ethernet/google/gve/gve_buffer_mgmt_dqo.c  |  2 +-
 drivers/net/ethernet/intel/iavf/iavf_txrx.c        |  6 ++++--
 drivers/net/ethernet/intel/idpf/idpf_txrx.c        | 14 +++++++++-----
 drivers/net/ethernet/intel/libeth/rx.c             |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c   |  3 ++-
 drivers/net/netdevsim/netdev.c                     |  6 ++++--
 drivers/net/wireless/mediatek/mt76/mt76.h          |  2 +-
 include/net/libeth/rx.h                            |  3 ++-
 include/net/page_pool/helpers.h                    |  5 +++++
 10 files changed, 34 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index b2daed55bf6c..18d2119dbec1 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1009,7 +1009,8 @@ static void fec_enet_bd_init(struct net_device *dev)
 				struct page *page = txq->tx_buf[i].buf_p;
 
 				if (page)
-					page_pool_put_page(page->pp, page, 0, false);
+					page_pool_put_page(page_pool_get_pp(page),
+							   page, 0, false);
 			}
 
 			txq->tx_buf[i].buf_p = NULL;
@@ -1549,7 +1550,7 @@ fec_enet_tx_queue(struct net_device *ndev, u16 queue_id, int budget)
 			xdp_return_frame_rx_napi(xdpf);
 		} else { /* recycle pages of XDP_TX frames */
 			/* The dma_sync_size = 0 as XDP_TX has already synced DMA for_device */
-			page_pool_put_page(page->pp, page, 0, true);
+			page_pool_put_page(page_pool_get_pp(page), page, 0, true);
 		}
 
 		txq->tx_buf[index].buf_p = NULL;
@@ -3307,7 +3308,8 @@ static void fec_enet_free_buffers(struct net_device *ndev)
 			} else {
 				struct page *page = txq->tx_buf[i].buf_p;
 
-				page_pool_put_page(page->pp, page, 0, false);
+				page_pool_put_page(page_pool_get_pp(page),
+						   page, 0, false);
 			}
 
 			txq->tx_buf[i].buf_p = NULL;
diff --git a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
index 403f0f335ba6..87422b8828ff 100644
--- a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
+++ b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
@@ -210,7 +210,7 @@ void gve_free_to_page_pool(struct gve_rx_ring *rx,
 	if (!page)
 		return;
 
-	page_pool_put_full_page(page->pp, page, allow_direct);
+	page_pool_put_full_page(page_pool_get_pp(page), page, allow_direct);
 	buf_state->page_info.page = NULL;
 }
 
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index 26b424fd6718..e1bf5554f6e3 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -1050,7 +1050,8 @@ static void iavf_add_rx_frag(struct sk_buff *skb,
 			     const struct libeth_fqe *rx_buffer,
 			     unsigned int size)
 {
-	u32 hr = rx_buffer->page->pp->p.offset;
+	struct page_pool *pool = page_pool_get_pp(rx_buffer->page);
+	u32 hr = pool->p.offset;
 
 	skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page,
 			rx_buffer->offset + hr, size, rx_buffer->truesize);
@@ -1067,7 +1068,8 @@ static void iavf_add_rx_frag(struct sk_buff *skb,
 static struct sk_buff *iavf_build_skb(const struct libeth_fqe *rx_buffer,
 				      unsigned int size)
 {
-	u32 hr = rx_buffer->page->pp->p.offset;
+	struct page_pool *pool = page_pool_get_pp(rx_buffer->page);
+	u32 hr = pool->p.offset;
 	struct sk_buff *skb;
 	void *va;
 
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index 2fa9c36e33c9..04f2347716ca 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -385,7 +385,8 @@ static void idpf_rx_page_rel(struct libeth_fqe *rx_buf)
 	if (unlikely(!rx_buf->page))
 		return;
 
-	page_pool_put_full_page(rx_buf->page->pp, rx_buf->page, false);
+	page_pool_put_full_page(page_pool_get_pp(rx_buf->page), rx_buf->page,
+				false);
 
 	rx_buf->page = NULL;
 	rx_buf->offset = 0;
@@ -3098,7 +3099,8 @@ idpf_rx_process_skb_fields(struct idpf_rx_queue *rxq, struct sk_buff *skb,
 void idpf_rx_add_frag(struct idpf_rx_buf *rx_buf, struct sk_buff *skb,
 		      unsigned int size)
 {
-	u32 hr = rx_buf->page->pp->p.offset;
+	struct page_pool *pool = page_pool_get_pp(rx_buf->page);
+	u32 hr = pool->p.offset;
 
 	skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buf->page,
 			rx_buf->offset + hr, size, rx_buf->truesize);
@@ -3130,8 +3132,10 @@ static u32 idpf_rx_hsplit_wa(const struct libeth_fqe *hdr,
 	if (!libeth_rx_sync_for_cpu(buf, copy))
 		return 0;
 
-	dst = page_address(hdr->page) + hdr->offset + hdr->page->pp->p.offset;
-	src = page_address(buf->page) + buf->offset + buf->page->pp->p.offset;
+	dst = page_address(hdr->page) + hdr->offset +
+		page_pool_get_pp(hdr->page)->p.offset;
+	src = page_address(buf->page) + buf->offset +
+		page_pool_get_pp(buf->page)->p.offset;
 	memcpy(dst, src, LARGEST_ALIGN(copy));
 
 	buf->offset += copy;
@@ -3149,7 +3153,7 @@ static u32 idpf_rx_hsplit_wa(const struct libeth_fqe *hdr,
  */
 struct sk_buff *idpf_rx_build_skb(const struct libeth_fqe *buf, u32 size)
 {
-	u32 hr = buf->page->pp->p.offset;
+	u32 hr = page_pool_get_pp(buf->page)->p.offset;
 	struct sk_buff *skb;
 	void *va;
 
diff --git a/drivers/net/ethernet/intel/libeth/rx.c b/drivers/net/ethernet/intel/libeth/rx.c
index 66d1d23b8ad2..8de0c3a3b146 100644
--- a/drivers/net/ethernet/intel/libeth/rx.c
+++ b/drivers/net/ethernet/intel/libeth/rx.c
@@ -207,7 +207,7 @@ EXPORT_SYMBOL_NS_GPL(libeth_rx_fq_destroy, "LIBETH");
  */
 void libeth_rx_recycle_slow(struct page *page)
 {
-	page_pool_recycle_direct(page->pp, page);
+	page_pool_recycle_direct(page_pool_get_pp(page), page);
 }
 EXPORT_SYMBOL_NS_GPL(libeth_rx_recycle_slow, "LIBETH");
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 94b291662087..30baca49c71e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -716,7 +716,8 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
 				/* No need to check ((page->pp_magic & ~0x3UL) == PP_SIGNATURE)
 				 * as we know this is a page_pool page.
 				 */
-				page_pool_recycle_direct(page->pp, page);
+				page_pool_recycle_direct(page_pool_get_pp(page),
+							 page);
 			} while (++n < num);
 
 			break;
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index d013b6498539..05a04f4b51d7 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -810,7 +810,8 @@ nsim_pp_hold_write(struct file *file, const char __user *data,
 		if (!ns->page)
 			ret = -ENOMEM;
 	} else {
-		page_pool_put_full_page(ns->page->pp, ns->page, false);
+		page_pool_put_full_page(page_pool_get_pp(ns->page), ns->page,
+					false);
 		ns->page = NULL;
 	}
 
@@ -1022,7 +1023,8 @@ void nsim_destroy(struct netdevsim *ns)
 
 	/* Put this intentionally late to exercise the orphaning path */
 	if (ns->page) {
-		page_pool_put_full_page(ns->page->pp, ns->page, false);
+		page_pool_put_full_page(page_pool_get_pp(ns->page), ns->page,
+					false);
 		ns->page = NULL;
 	}
 
diff --git a/drivers/net/wireless/mediatek/mt76/mt76.h b/drivers/net/wireless/mediatek/mt76/mt76.h
index ca2dba3ac65d..4d0e41a7bf4a 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76.h
+++ b/drivers/net/wireless/mediatek/mt76/mt76.h
@@ -1688,7 +1688,7 @@ static inline void mt76_put_page_pool_buf(void *buf, bool allow_direct)
 {
 	struct page *page = virt_to_head_page(buf);
 
-	page_pool_put_full_page(page->pp, page, allow_direct);
+	page_pool_put_full_page(page_pool_get_pp(page), page, allow_direct);
 }
 
 static inline void *
diff --git a/include/net/libeth/rx.h b/include/net/libeth/rx.h
index 43574bd6612f..f4ae75f9cc1b 100644
--- a/include/net/libeth/rx.h
+++ b/include/net/libeth/rx.h
@@ -137,7 +137,8 @@ static inline bool libeth_rx_sync_for_cpu(const struct libeth_fqe *fqe,
 		return false;
 	}
 
-	page_pool_dma_sync_for_cpu(page->pp, page, fqe->offset, len);
+	page_pool_dma_sync_for_cpu(page_pool_get_pp(page), page, fqe->offset,
+				   len);
 
 	return true;
 }
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 543f54fa3020..9c4dbd2289b1 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -83,6 +83,11 @@ static inline u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats)
 }
 #endif
 
+static inline struct page_pool *page_pool_get_pp(struct page *page)
+{
+	return page->pp;
+}
+
 /**
  * page_pool_dev_alloc_pages() - allocate a page.
  * @pool:	pool from which to allocate
-- 
2.33.0



* Re: [PATCH net-next v7 0/8] fix two bugs related to page_pool
  2025-01-10 13:06 [PATCH net-next v7 0/8] fix two bugs related to page_pool Yunsheng Lin
  2025-01-10 13:06 ` [PATCH net-next v7 1/8] page_pool: introduce page_pool_get_pp() API Yunsheng Lin
@ 2025-01-14 14:31 ` Jesper Dangaard Brouer
  2025-01-15 11:33   ` Yunsheng Lin
  1 sibling, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2025-01-14 14:31 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba, pabeni
  Cc: zhangkun09, liuyonglong, fanghaiqing, Alexander Lobakin,
	Robin Murphy, Alexander Duyck, Andrew Morton, IOMMU, MM,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Matthias Brugger, AngeloGioacchino Del Regno, netdev,
	intel-wired-lan, bpf, linux-kernel, linux-arm-kernel,
	linux-mediatek



On 10/01/2025 14.06, Yunsheng Lin wrote:
> This patchset fixes a possible time window problem in page_pool and
> the DMA API misuse problem mentioned in [1], and tries to offset the
> overhead of the fixes with some optimizations.
> 
> In the performance data below, the overhead is hard to distinguish from
> run-to-run variation for time_bench_page_pool01_fast_path() and
> time_bench_page_pool02_ptr_ring(), while time_bench_page_pool03_slow()
> shows about 20ns of overhead from fixing the bug.
> 

My benchmarking on x86_64 CPUs looks significantly different.
  - CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz

Benchmark (bench_page_pool_simple) results from before and after patchset:

| Test name  | Cycles |       |    |Nanosec |        |       |      % |
| (tasklet_*)| Before | After |diff| Before |  After |  diff | change |
|------------+--------+-------+----+--------+--------+-------+--------|
| fast_path  |     19 |    24 |   5|  5.399 |  6.928 | 1.529 |   28.3 |
| ptr_ring   |     54 |    79 |  25| 15.090 | 21.976 | 6.886 |   45.6 |
| slow       |    238 |   299 |  61| 66.134 | 83.298 |17.164 |   26.0 |
#+TBLFM: $4=$3-$2::$7=$6-$5::$8=(($7/$5)*100);%.1f

My testing above shows a clear performance regression across three
different page_pool operating modes.
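
For reference, the diff and % change columns follow the table formula
(diff = after - before, change = diff / before * 100). A minimal sketch
of that arithmetic is below; the struct and function names are invented
for illustration.

```c
#include <assert.h>

/* Result of comparing one benchmark row. */
struct bench_delta {
	double diff;	/* after - before, same unit as the inputs */
	double pct;	/* diff relative to before, in percent */
};

/* Compare a "before" and "after" measurement of the same test. */
struct bench_delta bench_compare(double before, double after)
{
	struct bench_delta d;

	assert(before > 0.0);
	d.diff = after - before;
	d.pct = d.diff / before * 100.0;
	return d;
}
```

With the nanosecond values from the fast_path row (5.399 before, 6.928
after) this yields a diff of 1.529 and a change of about 28.3%, matching
the table.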


Data also available in:
  - https://github.com/xdp-project/xdp-project/blob/main/areas/mem/page_pool07_bench_DMA_fix.org

Raw data below

Before this patchset:

[  157.186644] bench_page_pool_simple: Loaded
[  157.475084] time_bench: Type:for_loop Per elem: 1 cycles(tsc) 0.284 ns (step:0) - (measurement period time:0.284327440 sec time_interval:284327440) - (invoke count:1000000000 tsc_interval:1023590451)
[  162.262752] time_bench: Type:atomic_inc Per elem: 17 cycles(tsc) 4.769 ns (step:0) - (measurement period time:4.769757001 sec time_interval:4769757001) - (invoke count:1000000000 tsc_interval:17171776113)
[  163.324091] time_bench: Type:lock Per elem: 37 cycles(tsc) 10.431 ns (step:0) - (measurement period time:1.043182161 sec time_interval:1043182161) - (invoke count:100000000 tsc_interval:3755514465)
[  163.341702] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  163.922466] time_bench: Type:no-softirq-page_pool01 Per elem: 20 cycles(tsc) 5.713 ns (step:0) - (measurement period time:0.571357387 sec time_interval:571357387) - (invoke count:100000000 tsc_interval:2056911063)
[  163.941429] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  165.506796] time_bench: Type:no-softirq-page_pool02 Per elem: 56 cycles(tsc) 15.560 ns (step:0) - (measurement period time:1.556080558 sec time_interval:1556080558) - (invoke count:100000000 tsc_interval:5601960921)
[  165.525978] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  171.811289] time_bench: Type:no-softirq-page_pool03 Per elem: 225 cycles(tsc) 62.763 ns (step:0) - (measurement period time:6.276301531 sec time_interval:6276301531) - (invoke count:100000000 tsc_interval:22594974468)
[  171.830646] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  171.838561] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  172.387597] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 19 cycles(tsc) 5.399 ns (step:0) - (measurement period time:0.539904228 sec time_interval:539904228) - (invoke count:100000000 tsc_interval:1943679246)
[  172.407130] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  173.925266] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 54 cycles(tsc) 15.090 ns (step:0) - (measurement period time:1.509075496 sec time_interval:1509075496) - (invoke count:100000000 tsc_interval:5432740575)
[  173.944878] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  180.567094] time_bench: Type:tasklet_page_pool03_slow Per elem: 238 cycles(tsc) 66.134 ns (step:0) - (measurement period time:6.613430605 sec time_interval:6613430605) - (invoke count:100000000 tsc_interval:23808654870)



After this patchset:
[  860.519918] bench_page_pool_simple: Loaded
[  860.781605] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.257 ns (step:0) - (measurement period time:0.257573336 sec time_interval:257573336) - (invoke count:1000000000 tsc_interval:927275355)
[  865.613893] time_bench: Type:atomic_inc Per elem: 17 cycles(tsc) 4.814 ns (step:0) - (measurement period time:4.814593429 sec time_interval:4814593429) - (invoke count:1000000000 tsc_interval:17332768494)
[  866.708420] time_bench: Type:lock Per elem: 38 cycles(tsc) 10.763 ns (step:0) - (measurement period time:1.076362960 sec time_interval:1076362960) - (invoke count:100000000 tsc_interval:3874955595)
[  866.726118] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  867.423572] time_bench: Type:no-softirq-page_pool01 Per elem: 24 cycles(tsc) 6.880 ns (step:0) - (measurement period time:0.688069107 sec time_interval:688069107) - (invoke count:100000000 tsc_interval:2477080260)
[  867.442517] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  869.436286] time_bench: Type:no-softirq-page_pool02 Per elem: 71 cycles(tsc) 19.844 ns (step:0) - (measurement period time:1.984451929 sec time_interval:1984451929) - (invoke count:100000000 tsc_interval:7144120329)
[  869.455492] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  877.071437] time_bench: Type:no-softirq-page_pool03 Per elem: 273 cycles(tsc) 76.069 ns (step:0) - (measurement period time:7.606911291 sec time_interval:7606911291) - (invoke count:100000000 tsc_interval:27385252251)
[  877.090762] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  877.098683] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  877.800696] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 24 cycles(tsc) 6.928 ns (step:0) - (measurement period time:0.692852876 sec time_interval:692852876) - (invoke count:100000000 tsc_interval:2494303293)
[  877.820224] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  880.026911] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 79 cycles(tsc) 21.976 ns (step:0) - (measurement period time:2.197615122 sec time_interval:2197615122) - (invoke count:100000000 tsc_interval:7911521190)
[  880.046528] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  888.385235] time_bench: Type:tasklet_page_pool03_slow Per elem: 299 cycles(tsc) 83.298 ns (step:0) - (measurement period time:8.329893717 sec time_interval:8329893717) - (invoke count:100000000 tsc_interval:29988024696)




> Before this patchset:
> root@(none)$ insmod bench_page_pool_simple.ko
> [  323.367627] bench_page_pool_simple: Loaded
> [  323.448747] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.076997150 sec time_interval:76997150) - (invoke count:100000000 tsc_interval:7699707)
> [  324.812884] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.468 ns (step:0) - (measurement period time:1.346855130 sec time_interval:1346855130) - (invoke count:100000000 tsc_interval:134685507)
> [  324.980875] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.010 ns (step:0) - (measurement period time:0.150101270 sec time_interval:150101270) - (invoke count:10000000 tsc_interval:15010120)
> [  325.652195] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.542 ns (step:0) - (measurement period time:0.654213000 sec time_interval:654213000) - (invoke count:100000000 tsc_interval:65421294)
> [  325.669215] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
> [  325.974848] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 29.633 ns (step:0) - (measurement period time:0.296338200 sec time_interval:296338200) - (invoke count:10000000 tsc_interval:29633814)
> [  325.993517] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
> [  326.576636] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.391 ns (step:0) - (measurement period time:0.573911820 sec time_interval:573911820) - (invoke count:10000000 tsc_interval:57391174)
> [  326.595307] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
> [  328.422661] time_bench: Type:no-softirq-page_pool03 Per elem: 18 cycles(tsc) 181.849 ns (step:0) - (measurement period time:1.818495880 sec time_interval:1818495880) - (invoke count:10000000 tsc_interval:181849581)
> [  328.441681] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
> [  328.449584] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
> [  328.755031] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 29.632 ns (step:0) - (measurement period time:0.296327910 sec time_interval:296327910) - (invoke count:10000000 tsc_interval:29632785)
> [  328.774308] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
> [  329.578579] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 7 cycles(tsc) 79.523 ns (step:0) - (measurement period time:0.795236560 sec time_interval:795236560) - (invoke count:10000000 tsc_interval:79523650)
> [  329.597769] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
> [  331.507501] time_bench: Type:tasklet_page_pool03_slow Per elem: 19 cycles(tsc) 190.104 ns (step:0) - (measurement period time:1.901047510 sec time_interval:1901047510) - (invoke count:10000000 tsc_interval:190104743)
> 
> After this patchset:
> root@(none)$ insmod bench_page_pool_simple.ko
> [  138.634758] bench_page_pool_simple: Loaded
> [  138.715879] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.076972720 sec time_interval:76972720) - (invoke count:100000000 tsc_interval:7697265)
> [  140.079897] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:1.346735370 sec time_interval:1346735370) - (invoke count:100000000 tsc_interval:134673531)
> [  140.247841] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:0.150055080 sec time_interval:150055080) - (invoke count:10000000 tsc_interval:15005497)
> [  140.919072] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:0.654125000 sec time_interval:654125000) - (invoke count:100000000 tsc_interval:65412493)
> [  140.936091] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
> [  141.246985] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 30.159 ns (step:0) - (measurement period time:0.301598160 sec time_interval:301598160) - (invoke count:10000000 tsc_interval:30159812)
> [  141.265654] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
> [  141.976265] time_bench: Type:no-softirq-page_pool02 Per elem: 7 cycles(tsc) 70.140 ns (step:0) - (measurement period time:0.701405780 sec time_interval:701405780) - (invoke count:10000000 tsc_interval:70140573)
> [  141.994933] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
> [  144.018945] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 201.514 ns (step:0) - (measurement period time:2.015141210 sec time_interval:2015141210) - (invoke count:10000000 tsc_interval:201514113)
> [  144.037966] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
> [  144.045870] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
> [  144.205045] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:0.150056510 sec time_interval:150056510) - (invoke count:10000000 tsc_interval:15005645)
> [  144.224320] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
> [  144.916044] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 68.269 ns (step:0) - (measurement period time:0.682693070 sec time_interval:682693070) - (invoke count:10000000 tsc_interval:68269300)
> [  144.935234] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
> [  146.997684] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 205.376 ns (step:0) - (measurement period time:2.053766310 sec time_interval:2053766310) - (invoke count:10000000 tsc_interval:205376624)
> 
> 1. https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/
> 
> CC: Alexander Lobakin <aleksander.lobakin@intel.com>
> CC: Robin Murphy <robin.murphy@arm.com>
> CC: Alexander Duyck <alexander.duyck@gmail.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: IOMMU <iommu@lists.linux.dev>
> CC: MM <linux-mm@kvack.org>
> 
> Change log:
> V7:
>    1. Fix a use-after-free bug reported by KASAN, as mentioned by Jakub.
>    2. Fix the 'netmem' variable not being set up correctly, as mentioned
>       by Simon.
> 
> V6:
>    1. Repost based on latest net-next.
>    2. Rename page_pool_to_pp() to page_pool_get_pp().
> 
> V5:
>    1. Support an unlimited number of inflight pages.
>    2. Add some optimizations to reduce the overhead of the bug fix.
> 
> V4:
>    1. Use scanning to do the unmapping.
>    2. Split DMA sync skipping into a separate patch.
> 
> V3:
>    1. Target net-next tree instead of net tree.
>    2. Narrow the RCU lock as discussed in v2.
>    3. Check the unmapping count against the inflight count.
> 
> V2:
>    1. Add an item_full stat.
>    2. Use container_of() for page_pool_to_pp().
> 
> Yunsheng Lin (8):
>    page_pool: introduce page_pool_get_pp() API
>    page_pool: fix timing for checking and disabling napi_local
>    page_pool: fix IOMMU crash when driver has already unbound
>    page_pool: support unlimited number of inflight pages
>    page_pool: skip dma sync operation for inflight pages
>    page_pool: use list instead of ptr_ring for ring cache
>    page_pool: batch refilling pages to reduce atomic operation
>    page_pool: use list instead of array for alloc cache
> 
>   drivers/net/ethernet/freescale/fec_main.c     |   8 +-
>   .../ethernet/google/gve/gve_buffer_mgmt_dqo.c |   2 +-
>   drivers/net/ethernet/intel/iavf/iavf_txrx.c   |   6 +-
>   drivers/net/ethernet/intel/idpf/idpf_txrx.c   |  14 +-
>   drivers/net/ethernet/intel/libeth/rx.c        |   2 +-
>   .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |   3 +-
>   drivers/net/netdevsim/netdev.c                |   6 +-
>   drivers/net/wireless/mediatek/mt76/mt76.h     |   2 +-
>   include/linux/mm_types.h                      |   2 +-
>   include/linux/skbuff.h                        |   1 +
>   include/net/libeth/rx.h                       |   3 +-
>   include/net/netmem.h                          |  24 +-
>   include/net/page_pool/helpers.h               |  11 +
>   include/net/page_pool/types.h                 |  64 +-
>   net/core/devmem.c                             |   4 +-
>   net/core/netmem_priv.h                        |   5 +-
>   net/core/page_pool.c                          | 664 ++++++++++++++----
>   net/core/page_pool_priv.h                     |  12 +-
>   18 files changed, 675 insertions(+), 158 deletions(-)
> 

* Re: [PATCH net-next v7 0/8] fix two bugs related to page_pool
  2025-01-14 14:31 ` [PATCH net-next v7 0/8] fix two bugs related to page_pool Jesper Dangaard Brouer
@ 2025-01-15 11:33   ` Yunsheng Lin
  2025-01-15 17:40     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 9+ messages in thread
From: Yunsheng Lin @ 2025-01-15 11:33 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, davem, kuba, pabeni
  Cc: zhangkun09, liuyonglong, fanghaiqing, Alexander Lobakin,
	Robin Murphy, Alexander Duyck, Andrew Morton, IOMMU, MM,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Matthias Brugger, AngeloGioacchino Del Regno, netdev,
	intel-wired-lan, bpf, linux-kernel, linux-arm-kernel,
	linux-mediatek

[-- Attachment #1: Type: text/plain, Size: 8662 bytes --]

On 2025/1/14 22:31, Jesper Dangaard Brouer wrote:
> 
> 
> On 10/01/2025 14.06, Yunsheng Lin wrote:
>> This patchset fixes a possible time window problem for page_pool and
>> the DMA API misuse problem mentioned in [1], and tries to avoid the
>> overhead of the fix using some optimizations.
>>
>> From the performance data below, the overhead is not obvious for
>> time_bench_page_pool01_fast_path() and time_bench_page_pool02_ptr_ring()
>> because of run-to-run performance variation, and there is about 20ns of
>> overhead for time_bench_page_pool03_slow() from fixing the bug.
>>
> 
> My benchmarking on x86_64 CPUs looks significantly different.
>  - CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
> 
> Benchmark (bench_page_pool_simple) results from before and after patchset:
> 
> | Test name  | Cycles |       |    |Nanosec |        |       |      % |
> | (tasklet_*)| Before | After |diff| Before |  After |  diff | change |
> |------------+--------+-------+----+--------+--------+-------+--------|
> | fast_path  |     19 |    24 |   5|  5.399 |  6.928 | 1.529 |   28.3 |
> | ptr_ring   |     54 |    79 |  25| 15.090 | 21.976 | 6.886 |   45.6 |
> | slow       |    238 |   299 |  61| 66.134 | 83.298 |17.164 |   26.0 |
> #+TBLFM: $4=$3-$2::$7=$6-$5::$8=(($7/$5)*100);%.1f
> 
> My testing above shows clear performance regressions across three
> different page_pool operating modes.
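
A minimal sketch of how the diff and % change columns in the quoted table
follow from its #+TBLFM formula (diff = After - Before, change = diff /
Before * 100). The nanosecond values are copied from the table; the names
`results` and `ns_diff_and_pct` are made up for illustration.

```python
# Before/after nanoseconds per element for the tasklet_* tests,
# copied from the quoted x86_64 table.
results = {
    "fast_path": (5.399, 6.928),
    "ptr_ring": (15.090, 21.976),
    "slow": (66.134, 83.298),
}


def ns_diff_and_pct(before, after):
    """Return (diff in ns, percent change relative to 'before')."""
    diff = after - before
    return round(diff, 3), round(diff / before * 100, 1)


# One (diff, pct) pair per test, keyed by test name.
table = {name: ns_diff_and_pct(b, a) for name, (b, a) in results.items()}

for name, (diff, pct) in table.items():
    print(f"{name:10s} diff={diff:7.3f} ns  change={pct:5.1f}%")
```

The percentages are relative to the "Before" column, so the same absolute
cost in nanoseconds shows up as a larger percentage on the cheaper paths.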

I retested on an arm64 server, patch by patch (the raw performance data
is in the attachment), and the results look similar to before.

Before this patchset:
            fast_path              ptr_ring            slow
1.         31.171 ns               60.980 ns          164.917 ns
2.         28.824 ns               60.891 ns          170.241 ns
3.         14.236 ns               60.583 ns          164.355 ns

With patch 1-4:
4.         31.443 ns               53.242 ns          210.148 ns
5.         31.406 ns               53.270 ns          210.189 ns

With patch 1-5:
6.         26.163 ns               53.781 ns          189.450 ns
7.         26.189 ns               53.798 ns          189.466 ns

With patch 1-8:
8.         28.108 ns               68.199 ns          202.516 ns
9.         16.128 ns               55.904 ns          202.711 ns
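
As a rough aid for reading the runs above, the sketch below computes the
per-column mean and the max-min spread for each configuration. The numbers
are copied from the table; the helper names are hypothetical, and with only
two or three runs per configuration the means are indicative at best.

```python
from statistics import mean

# (fast_path, ptr_ring, slow) in ns per element, copied from the runs above.
runs = {
    "before": [
        (31.171, 60.980, 164.917),
        (28.824, 60.891, 170.241),
        (14.236, 60.583, 164.355),
    ],
    "patch 1-4": [(31.443, 53.242, 210.148), (31.406, 53.270, 210.189)],
    "patch 1-5": [(26.163, 53.781, 189.450), (26.189, 53.798, 189.466)],
    "patch 1-8": [(28.108, 68.199, 202.516), (16.128, 55.904, 202.711)],
}


def column_means(rows):
    """Mean of each column (fast_path, ptr_ring, slow) across runs."""
    return tuple(mean(col) for col in zip(*rows))


def column_spread(rows):
    """Max minus min of each column, a crude measure of run-to-run noise."""
    return tuple(max(col) - min(col) for col in zip(*rows))


summary = {name: (column_means(r), column_spread(r)) for name, r in runs.items()}

for name, (means, spread) in summary.items():
    print(name, [f"{m:.3f}" for m in means], [f"{s:.3f}" for s in spread])
```

For the "before" runs the fast_path spread is 31.171 - 14.236 = 16.935 ns,
which is larger than most of the differences between configurations in that
column, so the fast_path numbers need more runs before drawing conclusions.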

I have not been able to get hold of an x86 server yet; I might be able
to get one over the weekend.

Theoretically, patches 1-4 or 1-5 should not have much performance
impact on fast_path and ptr_ring, except for the already-mentioned
rcu_lock in page_pool_napi_local(), so it would be good if patches 1-5
were also tested in your testlab with the rcu_lock removed from
page_pool_napi_local().

> 
> 
> Data also available in:
>  - https://github.com/xdp-project/xdp-project/blob/main/areas/mem/page_pool07_bench_DMA_fix.org
> 
> Raw data below
> 
> Before this patchset:
> 
> [  157.186644] bench_page_pool_simple: Loaded
> [  157.475084] time_bench: Type:for_loop Per elem: 1 cycles(tsc) 0.284 ns (step:0) - (measurement period time:0.284327440 sec time_interval:284327440) - (invoke count:1000000000 tsc_interval:1023590451)
> [  162.262752] time_bench: Type:atomic_inc Per elem: 17 cycles(tsc) 4.769 ns (step:0) - (measurement period time:4.769757001 sec time_interval:4769757001) - (invoke count:1000000000 tsc_interval:17171776113)
> [  163.324091] time_bench: Type:lock Per elem: 37 cycles(tsc) 10.431 ns (step:0) - (measurement period time:1.043182161 sec time_interval:1043182161) - (invoke count:100000000 tsc_interval:3755514465)
> [  163.341702] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
> [  163.922466] time_bench: Type:no-softirq-page_pool01 Per elem: 20 cycles(tsc) 5.713 ns (step:0) - (measurement period time:0.571357387 sec time_interval:571357387) - (invoke count:100000000 tsc_interval:2056911063)
> [  163.941429] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
> [  165.506796] time_bench: Type:no-softirq-page_pool02 Per elem: 56 cycles(tsc) 15.560 ns (step:0) - (measurement period time:1.556080558 sec time_interval:1556080558) - (invoke count:100000000 tsc_interval:5601960921)
> [  165.525978] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
> [  171.811289] time_bench: Type:no-softirq-page_pool03 Per elem: 225 cycles(tsc) 62.763 ns (step:0) - (measurement period time:6.276301531 sec time_interval:6276301531) - (invoke count:100000000 tsc_interval:22594974468)
> [  171.830646] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
> [  171.838561] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
> [  172.387597] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 19 cycles(tsc) 5.399 ns (step:0) - (measurement period time:0.539904228 sec time_interval:539904228) - (invoke count:100000000 tsc_interval:1943679246)
> [  172.407130] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
> [  173.925266] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 54 cycles(tsc) 15.090 ns (step:0) - (measurement period time:1.509075496 sec time_interval:1509075496) - (invoke count:100000000 tsc_interval:5432740575)
> [  173.944878] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
> [  180.567094] time_bench: Type:tasklet_page_pool03_slow Per elem: 238 cycles(tsc) 66.134 ns (step:0) - (measurement period time:6.613430605 sec time_interval:6613430605) - (invoke count:100000000 tsc_interval:23808654870)
> 
> 
> 
> After this patchset:
> [  860.519918] bench_page_pool_simple: Loaded
> [  860.781605] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.257 ns (step:0) - (measurement period time:0.257573336 sec time_interval:257573336) - (invoke count:1000000000 tsc_interval:927275355)
> [  865.613893] time_bench: Type:atomic_inc Per elem: 17 cycles(tsc) 4.814 ns (step:0) - (measurement period time:4.814593429 sec time_interval:4814593429) - (invoke count:1000000000 tsc_interval:17332768494)
> [  866.708420] time_bench: Type:lock Per elem: 38 cycles(tsc) 10.763 ns (step:0) - (measurement period time:1.076362960 sec time_interval:1076362960) - (invoke count:100000000 tsc_interval:3874955595)
> [  866.726118] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
> [  867.423572] time_bench: Type:no-softirq-page_pool01 Per elem: 24 cycles(tsc) 6.880 ns (step:0) - (measurement period time:0.688069107 sec time_interval:688069107) - (invoke count:100000000 tsc_interval:2477080260)
> [  867.442517] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
> [  869.436286] time_bench: Type:no-softirq-page_pool02 Per elem: 71 cycles(tsc) 19.844 ns (step:0) - (measurement period time:1.984451929 sec time_interval:1984451929) - (invoke count:100000000 tsc_interval:7144120329)
> [  869.455492] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
> [  877.071437] time_bench: Type:no-softirq-page_pool03 Per elem: 273 cycles(tsc) 76.069 ns (step:0) - (measurement period time:7.606911291 sec time_interval:7606911291) - (invoke count:100000000 tsc_interval:27385252251)
> [  877.090762] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
> [  877.098683] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
> [  877.800696] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 24 cycles(tsc) 6.928 ns (step:0) - (measurement period time:0.692852876 sec time_interval:692852876) - (invoke count:100000000 tsc_interval:2494303293)
> [  877.820224] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
> [  880.026911] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 79 cycles(tsc) 21.976 ns (step:0) - (measurement period time:2.197615122 sec time_interval:2197615122) - (invoke count:100000000 tsc_interval:7911521190)
> [  880.046528] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
> [  888.385235] time_bench: Type:tasklet_page_pool03_slow Per elem: 299 cycles(tsc) 83.298 ns (step:0) - (measurement period time:8.329893717 sec time_interval:8329893717) - (invoke count:100000000 tsc_interval:29988024696)
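
For anyone comparing these logs by script, here is a minimal sketch that
parses one time_bench line and recomputes the per-element figures from the
raw counters. The regular expression and the function name are assumptions
based only on the line format shown in this thread, not on the time_bench
source; the sample is the last "After this patchset" line quoted above.

```python
import re

# Pattern inferred from the log lines in this thread (an assumption).
LINE_RE = re.compile(
    r"time_bench: Type:(?P<type>\S+) Per elem: (?P<cycles>\d+) cycles\(tsc\) "
    r"(?P<ns>[\d.]+) ns .*?time_interval:(?P<interval>\d+)\) - "
    r"\(invoke count:(?P<count>\d+) tsc_interval:(?P<tsc>\d+)\)"
)


def parse_time_bench(line):
    """Return the parsed fields of a time_bench line, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    interval = int(m["interval"])  # nanoseconds for the whole measurement
    count = int(m["count"])        # number of invocations
    tsc = int(m["tsc"])            # cycle counter delta for the measurement
    return {
        "type": m["type"],
        "reported_cycles": int(m["cycles"]),
        "reported_ns": float(m["ns"]),
        "ns_per_elem": interval / count,
        "cycles_per_elem": tsc // count,
    }


sample = (
    "[  888.385235] time_bench: Type:tasklet_page_pool03_slow Per elem: "
    "299 cycles(tsc) 83.298 ns (step:0) - (measurement period "
    "time:8.329893717 sec time_interval:8329893717) - "
    "(invoke count:100000000 tsc_interval:29988024696)"
)
parsed = parse_time_bench(sample)
print(parsed)
```

On the sample line, time_interval / invoke count gives 83.29893717 ns and
tsc_interval / invoke count gives 299 after integer division, matching the
reported "299 cycles(tsc) 83.298 ns" with the fraction truncated.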

As mentioned by Toke, we may be able to reduce the performance difference
between the tasklet and non-tasklet test cases by removing the rcu_lock in
page_pool_napi_local() for patch 1, as the in_softirq() check in
page_pool_napi_local() should already ensure an RCU-bh read-side critical
section.

[-- Attachment #2: pp_inflight_fix_v7_perf_data.txt --]
[-- Type: text/plain, Size: 69039 bytes --]


07ea810753bd Revert "page_pool: introduce page_pool_get_pp() API"
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[  118.835127] bench_page_pool_simple: Loaded
[  119.608858] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769587320 sec time_interval:769587320) - (invoke count:1000000000 tsc_interval:76958720)
[  136.559273] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 16.932 ns (step:0) - (measurement period time:16.932925510 sec time_interval:16932925510) - (invoke count:1000000000 tsc_interval:1693292543)
[  138.078107] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500666520 sec time_interval:1500666520) - (invoke count:100000000 tsc_interval:150066646)
[  144.636732] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541323980 sec time_interval:6541323980) - (invoke count:1000000000 tsc_interval:654132391)
[  144.653948] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  147.780571] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 31.173 ns (step:0) - (measurement period time:3.117359810 sec time_interval:3117359810) - (invoke count:100000000 tsc_interval:311735974)
[  147.799427] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  153.566322] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.577 ns (step:0) - (measurement period time:5.757708010 sec time_interval:5757708010) - (invoke count:100000000 tsc_interval:575770795)
[  153.585178] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  171.732446] time_bench: Type:no-softirq-page_pool03 Per elem: 18 cycles(tsc) 181.384 ns (step:0) - (measurement period time:18.138436700 sec time_interval:18138436700) - (invoke count:100000000 tsc_interval:1813843661)
[  171.751744] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  171.759626] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  174.885885] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 3 cycles(tsc) 31.171 ns (step:0) - (measurement period time:3.117169710 sec time_interval:3117169710) - (invoke count:100000000 tsc_interval:311716965)
[  174.905345] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  181.012397] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 60.980 ns (step:0) - (measurement period time:6.098047810 sec time_interval:6098047810) - (invoke count:100000000 tsc_interval:609804775)
[  181.031770] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path

[  197.532151] time_bench: Type:tasklet_page_pool03_slow Per elem: 16 cycles(tsc) 164.917 ns (step:0) - (measurement period time:16.491723510 sec time_interval:16491723510) - (invoke count:100000000 tsc_interval:1649172345)
root@(none)$
root@(none)$
root@(none)$ rmmod bench_page_pool_simple.ko
[  209.510186] bench_page_pool_simple: Unloaded
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[  210.659129] bench_page_pool_simple: Loaded
[  211.432882] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769610810 sec time_interval:769610810) - (invoke count:1000000000 tsc_interval:76961072)
[  224.917831] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467473740 sec time_interval:13467473740) - (invoke count:1000000000 tsc_interval:1346747368)
[  226.436667] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500671210 sec time_interval:1500671210) - (invoke count:100000000 tsc_interval:150067117)
[  232.995372] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541405330 sec time_interval:6541405330) - (invoke count:1000000000 tsc_interval:654140528)
[  233.012586] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  236.139341] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 31.174 ns (step:0) - (measurement period time:3.117491630 sec time_interval:3117491630) - (invoke count:100000000 tsc_interval:311749159)
[  236.158197] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  241.926861] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.594 ns (step:0) - (measurement period time:5.759481900 sec time_interval:5759481900) - (invoke count:100000000 tsc_interval:575948185)
[  241.945717] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  259.747779] time_bench: Type:no-softirq-page_pool03 Per elem: 17 cycles(tsc) 177.932 ns (step:0) - (measurement period time:17.793230520 sec time_interval:17793230520) - (invoke count:100000000 tsc_interval:1779323045)
[  259.767070] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  259.774951] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  262.901276] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 3 cycles(tsc) 31.172 ns (step:0) - (measurement period time:3.117235450 sec time_interval:3117235450) - (invoke count:100000000 tsc_interval:311723540)
[  262.920737] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  269.016589] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 60.868 ns (step:0) - (measurement period time:6.086848810 sec time_interval:6086848810) - (invoke count:100000000 tsc_interval:608684876)
[  269.035963] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  285.540301] time_bench: Type:tasklet_page_pool03_slow Per elem: 16 cycles(tsc) 164.956 ns (step:0) - (measurement period time:16.495681400 sec time_interval:16495681400) - (invoke count:100000000 tsc_interval:1649568134)
root@(none)$ cat /proc/version
Linux version 6.13.0-rc6-00905-g07ea810753bd (linyunsheng@localhost.localdomain) (gcc (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #295 SMP PREEMPT Wed Jan 15 11:22:27 CST 2025

root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[  102.478309] bench_page_pool_simple: Loaded
[  103.252061] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769609840 sec time_interval:769609840) - (invoke count:1000000000 tsc_interval:76960976)
[  116.737122] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467584160 sec time_interval:13467584160) - (invoke count:1000000000 tsc_interval:1346758411)
[  118.255948] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500661720 sec time_interval:1500661720) - (invoke count:100000000 tsc_interval:150066166)
[  124.814672] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541425600 sec time_interval:6541425600) - (invoke count:1000000000 tsc_interval:654142555)
[  124.831887] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  126.355730] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 15.145 ns (step:0) - (measurement period time:1.514579980 sec time_interval:1514579980) - (invoke count:100000000 tsc_interval:151457991)
[  126.374588] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  132.139818] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.560 ns (step:0) - (measurement period time:5.756052820 sec time_interval:5756052820) - (invoke count:100000000 tsc_interval:575605276)
[  132.158674] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  149.943233] time_bench: Type:no-softirq-page_pool03 Per elem: 17 cycles(tsc) 177.757 ns (step:0) - (measurement period time:17.775726280 sec time_interval:17775726280) - (invoke count:100000000 tsc_interval:1777572621)
[  149.962525] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  149.970407] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  152.861903] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 28.824 ns (step:0) - (measurement period time:2.882405020 sec time_interval:2882405020) - (invoke count:100000000 tsc_interval:288240495)
[  152.881364] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  158.979512] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 60.891 ns (step:0) - (measurement period time:6.089144870 sec time_interval:6089144870) - (invoke count:100000000 tsc_interval:608914482)
[  158.998884] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  176.031659] time_bench: Type:tasklet_page_pool03_slow Per elem: 17 cycles(tsc) 170.241 ns (step:0) - (measurement period time:17.024117960 sec time_interval:17024117960) - (invoke count:100000000 tsc_interval:1702411789)

root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[  442.818325] bench_page_pool_simple: Loaded
[  443.592055] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769610330 sec time_interval:769610330) - (invoke count:1000000000 tsc_interval:76961025)
[  458.439817] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 14.830 ns (step:0) - (measurement period time:14.830285600 sec time_interval:14830285600) - (invoke count:1000000000 tsc_interval:1483028556)
[  459.958698] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.007 ns (step:0) - (measurement period time:1.500714240 sec time_interval:1500714240) - (invoke count:100000000 tsc_interval:150071418)
[  466.517515] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541516880 sec time_interval:6541516880) - (invoke count:1000000000 tsc_interval:654151682)
[  466.534728] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  468.047027] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 15.030 ns (step:0) - (measurement period time:1.503035130 sec time_interval:1503035130) - (invoke count:100000000 tsc_interval:150303507)
[  468.065883] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  473.829596] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.545 ns (step:0) - (measurement period time:5.754537290 sec time_interval:5754537290) - (invoke count:100000000 tsc_interval:575453724)
[  473.848452] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  491.124253] time_bench: Type:no-softirq-page_pool03 Per elem: 17 cycles(tsc) 172.669 ns (step:0) - (measurement period time:17.266968680 sec time_interval:17266968680) - (invoke count:100000000 tsc_interval:1726696861)
[  491.143550] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  491.151434] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  493.118656] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 19.581 ns (step:0) - (measurement period time:1.958131510 sec time_interval:1958131510) - (invoke count:100000000 tsc_interval:195813143)
[  493.138115] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  499.227968] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 60.808 ns (step:0) - (measurement period time:6.080847450 sec time_interval:6080847450) - (invoke count:100000000 tsc_interval:608084740)
[  499.247339] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  515.691157] time_bench: Type:tasklet_page_pool03_slow Per elem: 16 cycles(tsc) 164.351 ns (step:0) - (measurement period time:16.435160550 sec time_interval:16435160550) - (invoke count:100000000 tsc_interval:1643516048)
root@(none)$ rmmod bench_page_pool_simple.ko
[  683.197394] bench_page_pool_simple: Unloaded
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[  684.374311] bench_page_pool_simple: Loaded
[  685.148035] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769604180 sec time_interval:769604180) - (invoke count:1000000000 tsc_interval:76960410)
[  698.632947] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467434190 sec time_interval:13467434190) - (invoke count:1000000000 tsc_interval:1346743412)
[  700.151767] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500657020 sec time_interval:1500657020) - (invoke count:100000000 tsc_interval:150065696)
[  706.710339] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541272330 sec time_interval:6541272330) - (invoke count:1000000000 tsc_interval:654127227)
[  706.727553] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  709.619400] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 28.825 ns (step:0) - (measurement period time:2.882584100 sec time_interval:2882584100) - (invoke count:100000000 tsc_interval:288258403)
[  709.638256] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  715.411633] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.642 ns (step:0) - (measurement period time:5.764201050 sec time_interval:5764201050) - (invoke count:100000000 tsc_interval:576420099)
[  715.430493] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  732.168906] time_bench: Type:no-softirq-page_pool03 Per elem: 16 cycles(tsc) 167.295 ns (step:0) - (measurement period time:16.729578200 sec time_interval:16729578200) - (invoke count:100000000 tsc_interval:1672957815)
[  732.188197] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  732.196078] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  733.628852] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 14.236 ns (step:0) - (measurement period time:1.423682990 sec time_interval:1423682990) - (invoke count:100000000 tsc_interval:142368292)
[  733.648311] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  739.715700] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 60.583 ns (step:0) - (measurement period time:6.058384260 sec time_interval:6058384260) - (invoke count:100000000 tsc_interval:605838420)
[  739.735073] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  756.179270] time_bench: Type:tasklet_page_pool03_slow Per elem: 16 cycles(tsc) 164.355 ns (step:0) - (measurement period time:16.435539700 sec time_interval:16435539700) - (invoke count:100000000 tsc_interval:1643553963)
root@(none)$ cat /proc/version
Linux version 6.13.0-rc6-00905-g07ea810753bd (linyunsheng@localhost.localdomain) (gcc (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #295 SMP PREEMPT Wed Jan 15 11:22:27 CST 2025


c8cd65aea46f (HEAD -> pp-inflight-fix_v6_test) Revert "page_pool: fix IOMMU crash when driver has already unbound"
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[  112.284533] bench_page_pool_simple: Loaded
[  113.058250] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769595440 sec time_interval:769595440) - (invoke count:1000000000 tsc_interval:76959536)
[  126.543325] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467599580 sec time_interval:13467599580) - (invoke count:1000000000 tsc_interval:1346759954)
[  128.062178] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500688700 sec time_interval:1500688700) - (invoke count:100000000 tsc_interval:150068863)
[  134.620885] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541407810 sec time_interval:6541407810) - (invoke count:1000000000 tsc_interval:654140776)
[  134.638100] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  137.764295] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 31.169 ns (step:0) - (measurement period time:3.116932100 sec time_interval:3116932100) - (invoke count:100000000 tsc_interval:311693204)
[  137.783151] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  143.556498] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.641 ns (step:0) - (measurement period time:5.764165830 sec time_interval:5764165830) - (invoke count:100000000 tsc_interval:576416578)
[  143.575354] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  160.391936] time_bench: Type:no-softirq-page_pool03 Per elem: 16 cycles(tsc) 168.077 ns (step:0) - (measurement period time:16.807748380 sec time_interval:16807748380) - (invoke count:100000000 tsc_interval:1680774833)
[  160.411228] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  160.419110] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  163.025216] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 25.970 ns (step:0) - (measurement period time:2.597014370 sec time_interval:2597014370) - (invoke count:100000000 tsc_interval:259701433)
[  163.044675] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  169.169341] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 61.156 ns (step:0) - (measurement period time:6.115661410 sec time_interval:6115661410) - (invoke count:100000000 tsc_interval:611566136)
[  169.188712] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  185.721921] time_bench: Type:tasklet_page_pool03_slow Per elem: 16 cycles(tsc) 165.245 ns (step:0) - (measurement period time:16.524552130 sec time_interval:16524552130) - (invoke count:100000000 tsc_interval:1652455208)
root@(none)$ rmmod bench_page_pool_simple.ko
[  228.647567] bench_page_pool_simple: Unloaded
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[  229.756515] bench_page_pool_simple: Loaded
[  230.530211] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769571820 sec time_interval:769571820) - (invoke count:1000000000 tsc_interval:76957172)
[  244.015118] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467427880 sec time_interval:13467427880) - (invoke count:1000000000 tsc_interval:1346742782)
[  245.533931] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500649840 sec time_interval:1500649840) - (invoke count:100000000 tsc_interval:150064979)
[  252.092555] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541318290 sec time_interval:6541318290) - (invoke count:1000000000 tsc_interval:654131824)
[  252.109769] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  253.543110] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 14.240 ns (step:0) - (measurement period time:1.424077550 sec time_interval:1424077550) - (invoke count:100000000 tsc_interval:142407750)
[  253.561963] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  259.320132] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.489 ns (step:0) - (measurement period time:5.748989970 sec time_interval:5748989970) - (invoke count:100000000 tsc_interval:574898993)
[  259.338990] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  276.124086] time_bench: Type:no-softirq-page_pool03 Per elem: 16 cycles(tsc) 167.762 ns (step:0) - (measurement period time:16.776264180 sec time_interval:16776264180) - (invoke count:100000000 tsc_interval:1677626413)
[  276.143377] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  276.151259] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  277.584309] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 14.239 ns (step:0) - (measurement period time:1.423960790 sec time_interval:1423960790) - (invoke count:100000000 tsc_interval:142396074)
[  277.603769] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  283.675754] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 60.629 ns (step:0) - (measurement period time:6.062981570 sec time_interval:6062981570) - (invoke count:100000000 tsc_interval:606298151)
[  283.695128] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  300.180187] time_bench: Type:tasklet_page_pool03_slow Per elem: 16 cycles(tsc) 164.764 ns (step:0) - (measurement period time:16.476401670 sec time_interval:16476401670) - (invoke count:100000000 tsc_interval:1647640163)
root@(none)$ cat /proc/version
Linux version 6.13.0-rc6-00903-gc8cd65aea46f (linyunsheng@localhost.localdomain) (gcc (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #296 SMP PREEMPT Wed Jan 15 11:29:54 CST 2025



d8de0484ad23 page_pool: fix IOMMU crash when driver has already unbound
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[  352.981066] bench_page_pool_simple: Loaded
[  353.754833] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769612830 sec time_interval:769612830) - (invoke count:1000000000 tsc_interval:76961275)
[  367.239820] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467509700 sec time_interval:13467509700) - (invoke count:1000000000 tsc_interval:1346750932)
[  368.758688] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.007 ns (step:0) - (measurement period time:1.500703810 sec time_interval:1500703810) - (invoke count:100000000 tsc_interval:150070375)
[  375.317433] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541446010 sec time_interval:6541446010) - (invoke count:1000000000 tsc_interval:654144595)
[  375.334647] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  378.470719] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 31.268 ns (step:0) - (measurement period time:3.126808010 sec time_interval:3126808010) - (invoke count:100000000 tsc_interval:312680796)
[  378.489580] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  384.237992] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.392 ns (step:0) - (measurement period time:5.739235000 sec time_interval:5739235000) - (invoke count:100000000 tsc_interval:573923493)
[  384.256846] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  404.284227] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 200.185 ns (step:0) - (measurement period time:20.018549500 sec time_interval:20018549500) - (invoke count:100000000 tsc_interval:2001854942)
[  404.303523] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  404.311405] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  407.450798] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 3 cycles(tsc) 31.303 ns (step:0) - (measurement period time:3.130301150 sec time_interval:3130301150) - (invoke count:100000000 tsc_interval:313030109)
[  407.470257] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  413.117820] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 56.385 ns (step:0) - (measurement period time:5.638558540 sec time_interval:5638558540) - (invoke count:100000000 tsc_interval:563855847)
[  413.137192] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  433.250575] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 201.047 ns (step:0) - (measurement period time:20.104725790 sec time_interval:20104725790) - (invoke count:100000000 tsc_interval:2010472573)
root@(none)$ rmmod bench_page_pool_simple.ko
[  481.612067] bench_page_pool_simple: Unloaded
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[  482.525041] bench_page_pool_simple: Loaded
[  483.298777] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769612290 sec time_interval:769612290) - (invoke count:1000000000 tsc_interval:76961221)
[  496.783660] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467404470 sec time_interval:13467404470) - (invoke count:1000000000 tsc_interval:1346740441)
[  498.302476] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500651360 sec time_interval:1500651360) - (invoke count:100000000 tsc_interval:150065132)
[  504.861015] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541237000 sec time_interval:6541237000) - (invoke count:1000000000 tsc_interval:654123694)
[  504.878228] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  508.017855] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 31.303 ns (step:0) - (measurement period time:3.130363490 sec time_interval:3130363490) - (invoke count:100000000 tsc_interval:313036345)
[  508.036725] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  513.777554] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.316 ns (step:0) - (measurement period time:5.731647070 sec time_interval:5731647070) - (invoke count:100000000 tsc_interval:573164701)
[  513.796408] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  533.821092] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 200.158 ns (step:0) - (measurement period time:20.015853910 sec time_interval:20015853910) - (invoke count:100000000 tsc_interval:2001585384)
[  533.840385] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  533.848266] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  536.987413] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 3 cycles(tsc) 31.300 ns (step:0) - (measurement period time:3.130056990 sec time_interval:3130056990) - (invoke count:100000000 tsc_interval:313005695)
[  537.006870] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  542.553443] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 55.375 ns (step:0) - (measurement period time:5.537567730 sec time_interval:5537567730) - (invoke count:100000000 tsc_interval:553756767)
[  542.572814] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  562.622903] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 200.414 ns (step:0) - (measurement period time:20.041430960 sec time_interval:20041430960) - (invoke count:100000000 tsc_interval:2004143090)


b53806ee8b03 (HEAD -> pp-inflight-fix_v6_test) page_pool: support unlimited number of inflight pages
root@(none)$ insmod time_bench.ko
[   57.826902] time_bench: loading out-of-tree module taints kernel.
[   57.833978] time_bench: Loaded
root@(none)$  insmod bench_page_pool_simple.ko loops=100000000
[   66.015795] bench_page_pool_simple: Loaded
[   66.789504] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769581100 sec time_interval:769581100) - (invoke count:1000000000 tsc_interval:76958101)
[   85.985445] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 19.178 ns (step:0) - (measurement period time:19.178464890 sec time_interval:19178464890) - (invoke count:1000000000 tsc_interval:1917846484)
[   87.504318] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.007 ns (step:0) - (measurement period time:1.500707820 sec time_interval:1500707820) - (invoke count:100000000 tsc_interval:150070776)
[   94.062989] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541369880 sec time_interval:6541369880) - (invoke count:1000000000 tsc_interval:654136982)
[   94.080203] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[   97.229937] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 31.404 ns (step:0) - (measurement period time:3.140470140 sec time_interval:3140470140) - (invoke count:100000000 tsc_interval:314047009)
[   97.248793] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  102.967699] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.097 ns (step:0) - (measurement period time:5.709729700 sec time_interval:5709729700) - (invoke count:100000000 tsc_interval:570972963)
[  102.986554] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  123.332228] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 203.368 ns (step:0) - (measurement period time:20.336842600 sec time_interval:20336842600) - (invoke count:100000000 tsc_interval:2033684253)
[  123.351522] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  123.359404] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  126.512828] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 3 cycles(tsc) 31.443 ns (step:0) - (measurement period time:3.144333160 sec time_interval:3144333160) - (invoke count:100000000 tsc_interval:314433311)
[  126.532286] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  131.865545] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 53.242 ns (step:0) - (measurement period time:5.324254260 sec time_interval:5324254260) - (invoke count:100000000 tsc_interval:532425421)
[  131.884917] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  152.908467] time_bench: Type:tasklet_page_pool03_slow Per elem: 21 cycles(tsc) 210.148 ns (step:0) - (measurement period time:21.014892650 sec time_interval:21014892650) - (invoke count:100000000 tsc_interval:2101489259)
root@(none)$ rmmod bench_page_pool_simple.ko
[  163.826865] bench_page_pool_simple: Unloaded
root@(none)$  insmod bench_page_pool_simple.ko loops=100000000
[  164.867796] bench_page_pool_simple: Loaded
[  165.641522] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769607400 sec time_interval:769607400) - (invoke count:1000000000 tsc_interval:76960732)
[  179.126540] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467542660 sec time_interval:13467542660) - (invoke count:1000000000 tsc_interval:1346754260)
[  180.645378] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500671580 sec time_interval:1500671580) - (invoke count:100000000 tsc_interval:150067152)
[  187.204029] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541350520 sec time_interval:6541350520) - (invoke count:1000000000 tsc_interval:654135046)
[  187.221243] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  188.577413] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 13.468 ns (step:0) - (measurement period time:1.346892420 sec time_interval:1346892420) - (invoke count:100000000 tsc_interval:134689236)
[  188.596268] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  194.314705] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.092 ns (step:0) - (measurement period time:5.709260290 sec time_interval:5709260290) - (invoke count:100000000 tsc_interval:570926024)
[  194.333561] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  214.660328] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 203.179 ns (step:0) - (measurement period time:20.317934940 sec time_interval:20317934940) - (invoke count:100000000 tsc_interval:2031793485)
[  214.679620] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  214.687501] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  217.837259] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 3 cycles(tsc) 31.406 ns (step:0) - (measurement period time:3.140666230 sec time_interval:3140666230) - (invoke count:100000000 tsc_interval:314066616)
[  217.856720] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  223.192797] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 53.270 ns (step:0) - (measurement period time:5.327072820 sec time_interval:5327072820) - (invoke count:100000000 tsc_interval:532707276)
[  223.212169] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  244.239728] time_bench: Type:tasklet_page_pool03_slow Per elem: 21 cycles(tsc) 210.189 ns (step:0) - (measurement period time:21.018901830 sec time_interval:21018901830) - (invoke count:100000000 tsc_interval:2101890177)
root@(none)$ cat /proc/version
Linux version 6.13.0-rc6-00903-gb53806ee8b03 (linyunsheng@localhost.localdomain) (gcc (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #297 SMP PREEMPT Wed Jan 15 11:43:41 CST 2025
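
For anyone comparing these runs by hand, here is a minimal sketch of how the
"Per elem" figures could be pulled out of a saved dmesg capture. It is not part
of the patchset or of the time_bench module; the helper names
(parse_time_bench, mean_ns) and the SAMPLE string are made up for illustration,
and the regular expression only assumes the line layout visible in the logs
above. The two sample lines are copied from the run directly above and
shortened after the ns field.

```python
import re

# Matches the per-element summary lines seen in the logs above, e.g.
# "time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) ..."
# (layout assumed from this thread's output, not from a documented format)
LINE_RE = re.compile(
    r"time_bench: Type:(?P<type>\S+) Per elem: (?P<cycles>\d+) cycles\(tsc\) "
    r"(?P<ns>\d+\.\d+) ns"
)


def parse_time_bench(log_text):
    """Return {benchmark type: [ns per element, ...]} for every matching line."""
    results = {}
    for match in LINE_RE.finditer(log_text):
        results.setdefault(match.group("type"), []).append(float(match.group("ns")))
    return results


def mean_ns(results):
    """Average the ns-per-element samples of each benchmark type across runs."""
    return {name: sum(vals) / len(vals) for name, vals in results.items()}


# Hypothetical input: two lines from the log above, truncated after "(step:0)".
SAMPLE = """\
[  214.660328] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 203.179 ns (step:0)
[  244.239728] time_bench: Type:tasklet_page_pool03_slow Per elem: 21 cycles(tsc) 210.189 ns (step:0)
"""

if __name__ == "__main__":
    for name, ns in sorted(mean_ns(parse_time_bench(SAMPLE)).items()):
        print(f"{name}: {ns:.3f} ns")
```

Feeding it the full output of two insmod runs on the same kernel would give one
averaged number per benchmark type, which makes the run-to-run variation
easier to separate from the per-patch differences.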




249fa431270c (HEAD -> pp-inflight-fix_v6_test) page_pool: skip dma sync operation for inflight pages
root@(none)$ cat /proc/version
Linux version 6.13.0-rc6-00904-g249fa431270c (linyunsheng@localhost.localdomain) (gcc (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #300 SMP PREEMPT Wed Jan 15 14:21:51 CST 2025
root@(none)$ rmmod bench_page_pool_simple.ko
[  459.241973] bench_page_pool_simple: Unloaded
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[  462.674971] bench_page_pool_simple: Loaded
[  463.448730] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769614430 sec time_interval:769614430) - (invoke count:1000000000 tsc_interval:76961435)
[  476.933835] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467629020 sec time_interval:13467629020) - (invoke count:1000000000 tsc_interval:1346762898)
[  478.452709] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.007 ns (step:0) - (measurement period time:1.500710750 sec time_interval:1500710750) - (invoke count:100000000 tsc_interval:150071069)
[  485.011458] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541449970 sec time_interval:6541449970) - (invoke count:1000000000 tsc_interval:654144991)
[  485.028671] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  486.500170] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 14.622 ns (step:0) - (measurement period time:1.462234950 sec time_interval:1462234950) - (invoke count:100000000 tsc_interval:146223489)
[  486.519026] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  491.827181] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 52.989 ns (step:0) - (measurement period time:5.298974920 sec time_interval:5298974920) - (invoke count:100000000 tsc_interval:529897484)
[  491.846039] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  509.968937] time_bench: Type:no-softirq-page_pool03 Per elem: 18 cycles(tsc) 181.140 ns (step:0) - (measurement period time:18.114063050 sec time_interval:18114063050) - (invoke count:100000000 tsc_interval:1811406296)
[  509.988228] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  509.996109] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  512.621549] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 26.163 ns (step:0) - (measurement period time:2.616350750 sec time_interval:2616350750) - (invoke count:100000000 tsc_interval:261635069)
[  512.641009] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  518.028167] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 53.781 ns (step:0) - (measurement period time:5.378154590 sec time_interval:5378154590) - (invoke count:100000000 tsc_interval:537815454)
[  518.047541] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  537.001263] time_bench: Type:tasklet_page_pool03_slow Per elem: 18 cycles(tsc) 189.450 ns (step:0) - (measurement period time:18.945065660 sec time_interval:18945065660) - (invoke count:100000000 tsc_interval:1894506561)
root@(none)$ rmmod bench_page_pool_simple.ko
[  554.270004] bench_page_pool_simple: Unloaded
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[  555.334974] bench_page_pool_simple: Loaded
[  556.108716] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769622900 sec time_interval:769622900) - (invoke count:1000000000 tsc_interval:76962277)
[  569.593570] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467378920 sec time_interval:13467378920) - (invoke count:1000000000 tsc_interval:1346737886)
[  571.112408] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500672390 sec time_interval:1500672390) - (invoke count:100000000 tsc_interval:150067233)
[  577.671068] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541360400 sec time_interval:6541360400) - (invoke count:1000000000 tsc_interval:654136033)
[  577.688281] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  579.159760] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 14.622 ns (step:0) - (measurement period time:1.462214680 sec time_interval:1462214680) - (invoke count:100000000 tsc_interval:146221461)
[  579.178615] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  584.387107] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 51.993 ns (step:0) - (measurement period time:5.199315890 sec time_interval:5199315890) - (invoke count:100000000 tsc_interval:519931583)
[  584.405963] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  601.992462] time_bench: Type:no-softirq-page_pool03 Per elem: 17 cycles(tsc) 175.776 ns (step:0) - (measurement period time:17.577663130 sec time_interval:17577663130) - (invoke count:100000000 tsc_interval:1757766306)
[  602.011753] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  602.019634] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  604.647682] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 26.189 ns (step:0) - (measurement period time:2.618955910 sec time_interval:2618955910) - (invoke count:100000000 tsc_interval:261895585)
[  604.667141] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  610.055961] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 53.798 ns (step:0) - (measurement period time:5.379816080 sec time_interval:5379816080) - (invoke count:100000000 tsc_interval:537981602)
[  610.075334] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  629.030597] time_bench: Type:tasklet_page_pool03_slow Per elem: 18 cycles(tsc) 189.466 ns (step:0) - (measurement period time:18.946606280 sec time_interval:18946606280) - (invoke count:100000000 tsc_interval:1894660622)



bd05af7e28d2 (HEAD -> pp-inflight-fix_v6_test) page_pool: use list instead of ptr_ring for ring cache
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[  324.256893] bench_page_pool_simple: Loaded
[  325.030626] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769608510 sec time_interval:769608510) - (invoke count:1000000000 tsc_interval:76960843)
[  338.515544] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467442220 sec time_interval:13467442220) - (invoke count:1000000000 tsc_interval:1346744216)
[  340.034383] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500673080 sec time_interval:1500673080) - (invoke count:100000000 tsc_interval:150067302)
[  346.593168] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541486300 sec time_interval:6541486300) - (invoke count:1000000000 tsc_interval:654148625)
[  346.610383] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  349.198132] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 25.784 ns (step:0) - (measurement period time:2.578484390 sec time_interval:2578484390) - (invoke count:100000000 tsc_interval:257848433)
[  349.216987] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  358.266543] time_bench: Type:no-softirq-page_pool02 Per elem: 9 cycles(tsc) 90.403 ns (step:0) - (measurement period time:9.040378740 sec time_interval:9040378740) - (invoke count:100000000 tsc_interval:904037869)
[  358.285398] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  378.581275] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 202.870 ns (step:0) - (measurement period time:20.287047800 sec time_interval:20287047800) - (invoke count:100000000 tsc_interval:2028704772)
[  378.600567] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  378.608449] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  381.195830] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 25.782 ns (step:0) - (measurement period time:2.578291220 sec time_interval:2578291220) - (invoke count:100000000 tsc_interval:257829118)
[  381.215288] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  390.262793] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 9 cycles(tsc) 90.385 ns (step:0) - (measurement period time:9.038500040 sec time_interval:9038500040) - (invoke count:100000000 tsc_interval:903849999)
[  390.282165] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  410.602531] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 203.117 ns (step:0) - (measurement period time:20.311708230 sec time_interval:20311708230) - (invoke count:100000000 tsc_interval:2031170817)
root@(none)$ rmmod bench_page_pool_simple.ko
[  452.799939] bench_page_pool_simple: Unloaded
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[  454.932877] bench_page_pool_simple: Loaded
[  455.706590] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769596200 sec time_interval:769596200) - (invoke count:1000000000 tsc_interval:76959611)
[  469.191300] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467234550 sec time_interval:13467234550) - (invoke count:1000000000 tsc_interval:1346723449)
[  470.710117] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500652740 sec time_interval:1500652740) - (invoke count:100000000 tsc_interval:150065267)
[  477.268702] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541285540 sec time_interval:6541285540) - (invoke count:1000000000 tsc_interval:654128549)
[  477.285914] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  479.873572] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 25.783 ns (step:0) - (measurement period time:2.578394320 sec time_interval:2578394320) - (invoke count:100000000 tsc_interval:257839426)
[  479.892426] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  488.941591] time_bench: Type:no-softirq-page_pool02 Per elem: 9 cycles(tsc) 90.399 ns (step:0) - (measurement period time:9.039988700 sec time_interval:9039988700) - (invoke count:100000000 tsc_interval:903998864)
[  488.960458] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  509.252999] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 202.837 ns (step:0) - (measurement period time:20.283709920 sec time_interval:20283709920) - (invoke count:100000000 tsc_interval:2028370986)
[  509.275188] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  509.283069] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  511.870501] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 25.783 ns (step:0) - (measurement period time:2.578339900 sec time_interval:2578339900) - (invoke count:100000000 tsc_interval:257833985)
[  511.889959] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  520.937881] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 9 cycles(tsc) 90.389 ns (step:0) - (measurement period time:9.038917580 sec time_interval:9038917580) - (invoke count:100000000 tsc_interval:903891752)
[  520.957253] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  541.278328] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 203.124 ns (step:0) - (measurement period time:20.312417960 sec time_interval:20312417960) - (invoke count:100000000 tsc_interval:2031241790)
root@(none)$ cat /proc/version
Linux version 6.13.0-rc6-00905-gbd05af7e28d2 (linyunsheng@localhost.localdomain) (gcc (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #301 SMP PREEMPT Wed Jan 15 14:57:40 CST 2025



e8e4ef65fd4b (HEAD -> pp-inflight-fix_v6_test) page_pool: batch refilling pages to reduce atomic operation
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[   81.660612] bench_page_pool_simple: Loaded
[   82.434335] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769577370 sec time_interval:769577370) - (invoke count:1000000000 tsc_interval:76957728)
[   95.919455] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467643010 sec time_interval:13467643010) - (invoke count:1000000000 tsc_interval:1346764295)
[   97.438295] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500675620 sec time_interval:1500675620) - (invoke count:100000000 tsc_interval:150067556)
[  103.997112] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541514490 sec time_interval:6541514490) - (invoke count:1000000000 tsc_interval:654151443)
[  104.014327] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  105.524295] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 15.007 ns (step:0) - (measurement period time:1.500704660 sec time_interval:1500704660) - (invoke count:100000000 tsc_interval:150070459)
[  105.543183] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  111.935637] time_bench: Type:no-softirq-page_pool02 Per elem: 6 cycles(tsc) 63.832 ns (step:0) - (measurement period time:6.383276590 sec time_interval:6383276590) - (invoke count:100000000 tsc_interval:638327653)
[  111.954492] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  131.007329] time_bench: Type:no-softirq-page_pool03 Per elem: 19 cycles(tsc) 190.440 ns (step:0) - (measurement period time:19.044004630 sec time_interval:19044004630) - (invoke count:100000000 tsc_interval:1904400455)
[  131.026621] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  131.034503] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  132.544154] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:1.500558810 sec time_interval:1500558810) - (invoke count:100000000 tsc_interval:150055876)
[  132.563614] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  139.007314] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 64.346 ns (step:0) - (measurement period time:6.434695610 sec time_interval:6434695610) - (invoke count:100000000 tsc_interval:643469557)
[  139.026687] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  158.093560] time_bench: Type:tasklet_page_pool03_slow Per elem: 19 cycles(tsc) 190.582 ns (step:0) - (measurement period time:19.058215140 sec time_interval:19058215140) - (invoke count:100000000 tsc_interval:1905821508)
root@(none)$ rmmod bench_page_pool_simple.ko
[  172.671534] bench_page_pool_simple: Unloaded
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[  174.012461] bench_page_pool_simple: Loaded
[  174.786162] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769579310 sec time_interval:769579310) - (invoke count:1000000000 tsc_interval:76957922)
[  188.270731] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467093170 sec time_interval:13467093170) - (invoke count:1000000000 tsc_interval:1346709310)
[  189.789532] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500638040 sec time_interval:1500638040) - (invoke count:100000000 tsc_interval:150063795)
[  196.348065] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541234660 sec time_interval:6541234660) - (invoke count:1000000000 tsc_interval:654123460)
[  196.365281] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  197.875195] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500650210 sec time_interval:1500650210) - (invoke count:100000000 tsc_interval:150065016)
[  197.894050] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  203.394345] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 54.911 ns (step:0) - (measurement period time:5.491119700 sec time_interval:5491119700) - (invoke count:100000000 tsc_interval:549111964)
[  203.413201] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  222.522015] time_bench: Type:no-softirq-page_pool03 Per elem: 19 cycles(tsc) 190.999 ns (step:0) - (measurement period time:19.099982300 sec time_interval:19099982300) - (invoke count:100000000 tsc_interval:1909998222)
[  222.541306] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  222.549187] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  224.058807] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:1.500531720 sec time_interval:1500531720) - (invoke count:100000000 tsc_interval:150053166)
[  224.078267] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  229.638432] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 55.511 ns (step:0) - (measurement period time:5.551160500 sec time_interval:5551160500) - (invoke count:100000000 tsc_interval:555116045)
[  229.657805] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  248.720382] time_bench: Type:tasklet_page_pool03_slow Per elem: 19 cycles(tsc) 190.539 ns (step:0) - (measurement period time:19.053918960 sec time_interval:19053918960) - (invoke count:100000000 tsc_interval:1905391890)
root@(none)$ cat /proc/version
Linux version 6.13.0-rc6-00906-ge8e4ef65fd4b (linyunsheng@localhost.localdomain) (gcc (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #302 SMP PREEMPT Wed Jan 15 15:11:10 CST 2025
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[  493.008461] bench_page_pool_simple: Loaded
[  493.782195] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769607870 sec time_interval:769607870) - (invoke count:1000000000 tsc_interval:76960778)
[  507.266860] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467190060 sec time_interval:13467190060) - (invoke count:1000000000 tsc_interval:1346718999)
[  508.785667] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500643840 sec time_interval:1500643840) - (invoke count:100000000 tsc_interval:150064378)
[  515.344224] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541258530 sec time_interval:6541258530) - (invoke count:1000000000 tsc_interval:654125847)
[  515.361440] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  518.102903] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 27.321 ns (step:0) - (measurement period time:2.732199220 sec time_interval:2732199220) - (invoke count:100000000 tsc_interval:273219917)
[  518.121759] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  524.874604] time_bench: Type:no-softirq-page_pool02 Per elem: 6 cycles(tsc) 67.436 ns (step:0) - (measurement period time:6.743668740 sec time_interval:6743668740) - (invoke count:100000000 tsc_interval:674366869)
[  524.893460] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  543.980580] time_bench: Type:no-softirq-page_pool03 Per elem: 19 cycles(tsc) 190.782 ns (step:0) - (measurement period time:19.078288770 sec time_interval:19078288770) - (invoke count:100000000 tsc_interval:1907828868)
[  543.999871] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  544.007753] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  546.748829] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 27.319 ns (step:0) - (measurement period time:2.731985080 sec time_interval:2731985080) - (invoke count:100000000 tsc_interval:273198499)
[  546.768288] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  553.505522] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 67.282 ns (step:0) - (measurement period time:6.728229430 sec time_interval:6728229430) - (invoke count:100000000 tsc_interval:672822938)
[  553.524893] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  572.731687] time_bench: Type:tasklet_page_pool03_slow Per elem: 19 cycles(tsc) 191.981 ns (step:0) - (measurement period time:19.198137710 sec time_interval:19198137710) - (invoke count:100000000 tsc_inter

root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[  624.624453] bench_page_pool_simple: Loaded
[  625.398155] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769580100 sec time_interval:769580100) - (invoke count:1000000000 tsc_interval:76958003)
[  638.882758] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467127790 sec time_interval:13467127790) - (invoke count:1000000000 tsc_interval:1346712774)
[  640.401554] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500633000 sec time_interval:1500633000) - (invoke count:100000000 tsc_interval:150063294)
[  646.960100] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541244270 sec time_interval:6541244270) - (invoke count:1000000000 tsc_interval:654124421)
[  646.977313] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  649.718817] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 27.322 ns (step:0) - (measurement period time:2.732241230 sec time_interval:2732241230) - (invoke count:100000000 tsc_interval:273224117)
[  649.737673] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  656.485353] time_bench: Type:no-softirq-page_pool02 Per elem: 6 cycles(tsc) 67.385 ns (step:0) - (measurement period time:6.738504450 sec time_interval:6738504450) - (invoke count:100000000 tsc_interval:673850439)
[  656.504211] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  675.730226] time_bench: Type:no-softirq-page_pool03 Per elem: 19 cycles(tsc) 192.171 ns (step:0) - (measurement period time:19.217181040 sec time_interval:19217181040) - (invoke count:100000000 tsc_interval:1921718097)
[  675.749517] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  675.757399] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  678.498457] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 27.319 ns (step:0) - (measurement period time:2.731969810 sec time_interval:2731969810) - (invoke count:100000000 tsc_interval:273196975)
[  678.517917] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  685.272622] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 67.457 ns (step:0) - (measurement period time:6.745701080 sec time_interval:6745701080) - (invoke count:100000000 tsc_interval:674570103)
[  685.291993] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  704.535410] time_bench: Type:tasklet_page_pool03_slow Per elem: 19 cycles(tsc) 192.347 ns (step:0) - (measurement period time:19.234760880 sec time_interval:19234760880) - (invoke count:100000000 tsc_interval:1923476080)


5760bcdd3fef (HEAD -> pp-inflight-fix_v6_test) page_pool: use list instead of array for alloc cache
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[ 1378.118009] bench_page_pool_simple: Loaded
[ 1378.891760] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769629870 sec time_interval:769629870) - (invoke count:1000000000 tsc_interval:76962977)
[ 1392.376430] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467196340 sec time_interval:13467196340) - (invoke count:1000000000 tsc_interval:1346719628)
[ 1393.895253] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500659490 sec time_interval:1500659490) - (invoke count:100000000 tsc_interval:150065942)
[ 1400.453791] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541237910 sec time_interval:6541237910) - (invoke count:1000000000 tsc_interval:654123784)
[ 1400.471006] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[ 1402.135620] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 16.553 ns (step:0) - (measurement period time:1.655350930 sec time_interval:1655350930) - (invoke count:100000000 tsc_interval:165535087)
[ 1402.154474] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[ 1407.685584] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 55.219 ns (step:0) - (measurement period time:5.521934590 sec time_interval:5521934590) - (invoke count:100000000 tsc_interval:552193452)
[ 1407.704438] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[ 1427.906125] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 201.928 ns (step:0) - (measurement period time:20.192856910 sec time_interval:20192856910) - (invoke count:100000000 tsc_interval:2019285683)
[ 1427.925416] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[ 1427.933297] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[ 1429.519900] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 15.775 ns (step:0) - (measurement period time:1.577513290 sec time_interval:1577513290) - (invoke count:100000000 tsc_interval:157751323)
[ 1429.539358] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[ 1435.138765] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 55.904 ns (step:0) - (measurement period time:5.590404140 sec time_interval:5590404140) - (invoke count:100000000 tsc_interval:559040410)
[ 1435.158136] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[ 1455.411856] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 202.450 ns (step:0) - (measurement period time:20.245062650 sec time_interval:20245062650) - (invoke count:100000000 tsc_interval:2024506258)
root@(none)$ rmmod bench_page_pool_simple.ko
[ 1624.116972] bench_page_pool_simple: Unloaded
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[ 1625.254057] bench_page_pool_simple: Loaded
[ 1626.027804] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769627010 sec time_interval:769627010) - (invoke count:1000000000 tsc_interval:76962694)
[ 1639.512664] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467385750 sec time_interval:13467385750) - (invoke count:1000000000 tsc_interval:1346738568)
[ 1641.031493] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500664980 sec time_interval:1500664980) - (invoke count:100000000 tsc_interval:150066492)
[ 1647.590116] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541324190 sec time_interval:6541324190) - (invoke count:1000000000 tsc_interval:654132413)
[ 1647.607328] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[ 1649.211118] time_bench: Type:no-softirq-page_pool01 Per elem: 1 cycles(tsc) 15.945 ns (step:0) - (measurement period time:1.594526020 sec time_interval:1594526020) - (invoke count:100000000 tsc_interval:159452596)
[ 1649.229971] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[ 1654.761083] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 55.219 ns (step:0) - (measurement period time:5.521934830 sec time_interval:5521934830) - (invoke count:100000000 tsc_interval:552193476)
[ 1654.779937] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[ 1674.973459] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 201.846 ns (step:0) - (measurement period time:20.184690600 sec time_interval:20184690600) - (invoke count:100000000 tsc_interval:2018469053)
[ 1674.992751] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[ 1675.000632] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[ 1676.622598] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 16.128 ns (step:0) - (measurement period time:1.612877140 sec time_interval:1612877140) - (invoke count:100000000 tsc_interval:161287709)
[ 1676.642056] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[ 1682.241489] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 5 cycles(tsc) 55.904 ns (step:0) - (measurement period time:5.590428410 sec time_interval:5590428410) - (invoke count:100000000 tsc_interval:559042835)
[ 1682.260860] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[ 1702.540682] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 202.711 ns (step:0) - (measurement period time:20.271164760 sec time_interval:20271164760) - (invoke count:100000000 tsc_interval:2027116470)
root@(none)$ rmmod bench_page_pool_simple.ko
[ 3945.224975] bench_page_pool_simple: Unloaded
root@(none)$ insmod bench_page_pool_simple.ko loops=100000000
[ 3946.318072] bench_page_pool_simple: Loaded
[ 3947.091825] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.769631280 sec time_interval:769631280) - (invoke count:1000000000 tsc_interval:76963115)
[ 3960.576784] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:13.467483140 sec time_interval:13467483140) - (invoke count:1000000000 tsc_interval:1346748308)
[ 3962.095607] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.006 ns (step:0) - (measurement period time:1.500658780 sec time_interval:1500658780) - (invoke count:100000000 tsc_interval:150065872)
[ 3968.654285] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:6.541378830 sec time_interval:6541378830) - (invoke count:1000000000 tsc_interval:654137877)
[ 3968.671520] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[ 3971.491845] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 28.110 ns (step:0) - (measurement period time:2.811058810 sec time_interval:2811058810) - (invoke count:100000000 tsc_interval:281105875)
[ 3971.510703] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[ 3978.348581] time_bench: Type:no-softirq-page_pool02 Per elem: 6 cycles(tsc) 68.287 ns (step:0) - (measurement period time:6.828701400 sec time_interval:6828701400) - (invoke count:100000000 tsc_interval:682870134)
[ 3978.367435] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[ 3998.595188] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 202.189 ns (step:0) - (measurement period time:20.218922630 sec time_interval:20218922630) - (invoke count:100000000 tsc_interval:2021892255)
[ 3998.614480] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[ 3998.622362] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[ 4001.442253] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 28.108 ns (step:0) - (measurement period time:2.810802040 sec time_interval:2810802040) - (invoke count:100000000 tsc_interval:281080197)
[ 4001.461713] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[ 4008.290654] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 68.199 ns (step:0) - (measurement period time:6.819937430 sec time_interval:6819937430) - (invoke count:100000000 tsc_interval:681993738)
[ 4008.310026] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[ 4028.570377] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 202.516 ns (step:0) - (measurement period time:20.251693920 sec time_interval:20251693920) - (invoke count:100000000 tsc_interval:2025169387)
root@(none)$ cat /proc/version
Linux version 6.13.0-rc6-00907-g5760bcdd3fef (linyunsheng@localhost.localdomain) (gcc (GCC) 10.3.1, GNU ld (GNU Binutils) 2.37) #303 SMP PREEMPT Wed Jan 15 15:27:07 CST 2025

* Re: [PATCH net-next v7 0/8] fix two bugs related to page_pool
  2025-01-15 11:33   ` Yunsheng Lin
@ 2025-01-15 17:40     ` Jesper Dangaard Brouer
  2025-01-16 12:52       ` Yunsheng Lin
  0 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2025-01-15 17:40 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba, pabeni
  Cc: zhangkun09, liuyonglong, fanghaiqing, Alexander Lobakin,
	Robin Murphy, Alexander Duyck, Andrew Morton, IOMMU, MM,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Matthias Brugger, AngeloGioacchino Del Regno, netdev,
	intel-wired-lan, bpf, linux-kernel, linux-arm-kernel,
	linux-mediatek



On 15/01/2025 12.33, Yunsheng Lin wrote:
> On 2025/1/14 22:31, Jesper Dangaard Brouer wrote:
>>
>>
>> On 10/01/2025 14.06, Yunsheng Lin wrote:
>>> This patchset fix a possible time window problem for page_pool and
>>> the dma API misuse problem as mentioned in [1], and try to avoid the
>>> overhead of the fixing using some optimization.
>>>
>>>   From the below performance data, the overhead is not so obvious
>>> due to performance variations for time_bench_page_pool01_fast_path()
>>> and time_bench_page_pool02_ptr_ring, and there is about 20ns overhead
>>> for time_bench_page_pool03_slow() for fixing the bug.
>>>
>>
>> My benchmarking on x86_64 CPUs looks significantly different.
>>   - CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
>>
>> Benchmark (bench_page_pool_simple) results from before and after patchset:
>>
>> | Test name  | Cycles |       |    |Nanosec |        |       |      % |
>> | (tasklet_*)| Before | After |diff| Before |  After |  diff | change |
>> |------------+--------+-------+----+--------+--------+-------+--------|
>> | fast_path  |     19 |    24 |   5|  5.399 |  6.928 | 1.529 |   28.3 |
>> | ptr_ring   |     54 |    79 |  25| 15.090 | 21.976 | 6.886 |   45.6 |
>> | slow       |    238 |   299 |  61| 66.134 | 83.298 |17.164 |   26.0 |
>> #+TBLFM: $4=$3-$2::$7=$6-$5::$8=(($7/$5)*100);%.1f
>>
>> My above testing show a clear performance regressions across three
>> different page_pool operating modes.
> 
> I retested it on arm64 server patch by patch as the raw performance
> data in the attachment, it seems the result seemed similar as before.
> 
> Before this patchset:
>              fast_path              ptr_ring            slow
> 1.         31.171 ns               60.980 ns          164.917 ns
> 2.         28.824 ns               60.891 ns          170.241 ns
> 3.         14.236 ns               60.583 ns          164.355 ns
> 
> With patch 1-4:
> 4.         31.443 ns               53.242 ns          210.148 ns
> 5.         31.406 ns               53.270 ns          210.189 ns
> 
> With patch 1-5:
> 6.         26.163 ns               53.781 ns          189.450 ns
> 7.         26.189 ns               53.798 ns          189.466 ns
> 
> With patch 1-8:
> 8.         28.108 ns               68.199 ns          202.516 ns
> 9.         16.128 ns               55.904 ns          202.711 ns
> 
> I am not able to get hold of a x86 server yet, I might be able
> to get one during weekend.
> 
> Theoretically, patch 1-4 or 1-5 should not have much performance
> impact for fast_path and ptr_ring except for the rcu_lock mentioned
> in page_pool_napi_local(), so it would be good if patch 1-5 is also
> tested in your testlab with the rcu_lock removing in
> page_pool_napi_local().
> 

Which of these are you asking me to test?
  - (1) test patches 1-5
  - or (2) test patches 1-5 but revert patch 2 with page_pool_napi_local()

--Jesper

>>
>>
>> Data also available in:
>>   - https://github.com/xdp-project/xdp-project/blob/main/areas/mem/page_pool07_bench_DMA_fix.org
>>

* Re: [PATCH net-next v7 0/8] fix two bugs related to page_pool
  2025-01-15 17:40     ` Jesper Dangaard Brouer
@ 2025-01-16 12:52       ` Yunsheng Lin
  2025-01-16 18:02         ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 9+ messages in thread
From: Yunsheng Lin @ 2025-01-16 12:52 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, davem, kuba, pabeni
  Cc: zhangkun09, liuyonglong, fanghaiqing, Alexander Lobakin,
	Robin Murphy, Alexander Duyck, Andrew Morton, IOMMU, MM,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Matthias Brugger, AngeloGioacchino Del Regno, netdev,
	intel-wired-lan, bpf, linux-kernel, linux-arm-kernel,
	linux-mediatek

On 2025/1/16 1:40, Jesper Dangaard Brouer wrote:
> 
> 
> On 15/01/2025 12.33, Yunsheng Lin wrote:
>> On 2025/1/14 22:31, Jesper Dangaard Brouer wrote:
>>>
>>>
>>> On 10/01/2025 14.06, Yunsheng Lin wrote:
>>>> This patchset fix a possible time window problem for page_pool and
>>>> the dma API misuse problem as mentioned in [1], and try to avoid the
>>>> overhead of the fixing using some optimization.
>>>>
>>>>   From the below performance data, the overhead is not so obvious
>>>> due to performance variations for time_bench_page_pool01_fast_path()
>>>> and time_bench_page_pool02_ptr_ring, and there is about 20ns overhead
>>>> for time_bench_page_pool03_slow() for fixing the bug.
>>>>
>>>
>>> My benchmarking on x86_64 CPUs looks significantly different.
>>>   - CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
>>>
>>> Benchmark (bench_page_pool_simple) results from before and after patchset:
>>>
>>> | Test name  | Cycles |       |    |Nanosec |        |       |      % |
>>> | (tasklet_*)| Before | After |diff| Before |  After |  diff | change |
>>> |------------+--------+-------+----+--------+--------+-------+--------|
>>> | fast_path  |     19 |    24 |   5|  5.399 |  6.928 | 1.529 |   28.3 |
>>> | ptr_ring   |     54 |    79 |  25| 15.090 | 21.976 | 6.886 |   45.6 |
>>> | slow       |    238 |   299 |  61| 66.134 | 83.298 |17.164 |   26.0 |
>>> #+TBLFM: $4=$3-$2::$7=$6-$5::$8=(($7/$5)*100);%.1f
>>>
>>> My above testing show a clear performance regressions across three
>>> different page_pool operating modes.
>>
>> I retested it on arm64 server patch by patch as the raw performance
>> data in the attachment, it seems the result seemed similar as before.
>>
>> Before this patchset:
>>              fast_path              ptr_ring            slow
>> 1.         31.171 ns               60.980 ns          164.917 ns
>> 2.         28.824 ns               60.891 ns          170.241 ns
>> 3.         14.236 ns               60.583 ns          164.355 ns
>>
>> With patch 1-4:
>> 4.         31.443 ns               53.242 ns          210.148 ns
>> 5.         31.406 ns               53.270 ns          210.189 ns
>>
>> With patch 1-5:
>> 6.         26.163 ns               53.781 ns          189.450 ns
>> 7.         26.189 ns               53.798 ns          189.466 ns
>>
>> With patch 1-8:
>> 8.         28.108 ns               68.199 ns          202.516 ns
>> 9.         16.128 ns               55.904 ns          202.711 ns
>>
>> I am not able to get hold of a x86 server yet, I might be able
>> to get one during weekend.
>>
>> Theoretically, patch 1-4 or 1-5 should not have much performance
>> impact for fast_path and ptr_ring except for the rcu_lock mentioned
>> in page_pool_napi_local(), so it would be good if patch 1-5 is also
>> tested in your testlab with the rcu_lock removing in
>> page_pool_napi_local().
>>
> 
> What are you saying?
>  - (1) test patch 1-5
>  - or (2) test patch 1-5 but revert patch 2 with page_pool_napi_local()

Patches 1-5 with the change below applied.

--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -1207,10 +1207,8 @@ static bool page_pool_napi_local(const struct page_pool *pool)
        /* Synchronizated with page_pool_destory() to avoid use-after-free
         * for 'napi'.
         */
-       rcu_read_lock();
        napi = READ_ONCE(pool->p.napi);
        napi_local = napi && READ_ONCE(napi->list_owner) == cpuid;
-       rcu_read_unlock();

        return napi_local;
 }


* Re: [PATCH net-next v7 0/8] fix two bugs related to page_pool
  2025-01-16 12:52       ` Yunsheng Lin
@ 2025-01-16 18:02         ` Jesper Dangaard Brouer
  2025-01-17 11:35           ` Yunsheng Lin
  0 siblings, 1 reply; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2025-01-16 18:02 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba, pabeni
  Cc: zhangkun09, liuyonglong, fanghaiqing, Alexander Lobakin,
	Robin Murphy, Alexander Duyck, Andrew Morton, IOMMU, MM,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Matthias Brugger, AngeloGioacchino Del Regno, netdev,
	intel-wired-lan, bpf, linux-kernel, linux-arm-kernel,
	linux-mediatek



On 16/01/2025 13.52, Yunsheng Lin wrote:
> On 2025/1/16 1:40, Jesper Dangaard Brouer wrote:
>>
>>
>> On 15/01/2025 12.33, Yunsheng Lin wrote:
>>> On 2025/1/14 22:31, Jesper Dangaard Brouer wrote:
>>>>
>>>>
>>>> On 10/01/2025 14.06, Yunsheng Lin wrote:
>>>>> This patchset fix a possible time window problem for page_pool and
>>>>> the dma API misuse problem as mentioned in [1], and try to avoid the
>>>>> overhead of the fixing using some optimization.
>>>>>
>>>>>    From the below performance data, the overhead is not so obvious
>>>>> due to performance variations for time_bench_page_pool01_fast_path()
>>>>> and time_bench_page_pool02_ptr_ring, and there is about 20ns overhead
>>>>> for time_bench_page_pool03_slow() for fixing the bug.
>>>>>
>>>>
>>>> My benchmarking on x86_64 CPUs looks significantly different.
>>>>    - CPU: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
>>>>
>>>> Benchmark (bench_page_pool_simple) results from before and after patchset:
>>>>
>>>> | Test name  | Cycles |       |    |Nanosec |        |       |      % |
>>>> | (tasklet_*)| Before | After |diff| Before |  After |  diff | change |
>>>> |------------+--------+-------+----+--------+--------+-------+--------|
>>>> | fast_path  |     19 |    24 |   5|  5.399 |  6.928 | 1.529 |   28.3 |
>>>> | ptr_ring   |     54 |    79 |  25| 15.090 | 21.976 | 6.886 |   45.6 |
>>>> | slow       |    238 |   299 |  61| 66.134 | 83.298 |17.164 |   26.0 |
>>>> #+TBLFM: $4=$3-$2::$7=$6-$5::$8=(($7/$5)*100);%.1f
>>>>
>>>> My above testing show a clear performance regressions across three
>>>> different page_pool operating modes.
>>>
>>> I retested it on arm64 server patch by patch as the raw performance
>>> data in the attachment, it seems the result seemed similar as before.
>>>
>>> Before this patchset:
>>>               fast_path              ptr_ring            slow
>>> 1.         31.171 ns               60.980 ns          164.917 ns
>>> 2.         28.824 ns               60.891 ns          170.241 ns
>>> 3.         14.236 ns               60.583 ns          164.355 ns
>>>
>>> With patch 1-4:
>>> 4.         31.443 ns               53.242 ns          210.148 ns
>>> 5.         31.406 ns               53.270 ns          210.189 ns
>>>
>>> With patch 1-5:
>>> 6.         26.163 ns               53.781 ns          189.450 ns
>>> 7.         26.189 ns               53.798 ns          189.466 ns
>>>
>>> With patch 1-8:
>>> 8.         28.108 ns               68.199 ns          202.516 ns
>>> 9.         16.128 ns               55.904 ns          202.711 ns
>>>
>>> I am not able to get hold of a x86 server yet, I might be able
>>> to get one during weekend.
>>>
>>> Theoretically, patch 1-4 or 1-5 should not have much performance
>>> impact for fast_path and ptr_ring except for the rcu_lock mentioned
>>> in page_pool_napi_local(), so it would be good if patch 1-5 is also
>>> tested in your testlab with the rcu_lock removing in
>>> page_pool_napi_local().
>>>
>>
>> What are you saying?
>>   - (1) test patch 1-5
>>   - or (2) test patch 1-5 but revert patch 2 with page_pool_napi_local()
> 
> patch 1-5 with below applied.
> 
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -1207,10 +1207,8 @@ static bool page_pool_napi_local(const struct page_pool *pool)
>          /* Synchronizated with page_pool_destory() to avoid use-after-free
>           * for 'napi'.
>           */
> -       rcu_read_lock();
>          napi = READ_ONCE(pool->p.napi);
>          napi_local = napi && READ_ONCE(napi->list_owner) == cpuid;
> -       rcu_read_unlock();
> 
>          return napi_local;
>   }
> 

Benchmark (bench_page_pool_simple) results from before the patchset and
after applying patches 1-5 plus the rcu lock removal, as requested.

| Test name  |Cycles |   1-5 |    | Nanosec |    1-5 |        |      % |
| (tasklet_*)|Before | After |diff|  Before |  After |   diff | change |
|------------+-------+-------+----+---------+--------+--------+--------|
| fast_path  |    19 |    19 |   0|   5.399 |  5.492 |  0.093 |    1.7 |
| ptr_ring   |    54 |    57 |   3|  15.090 | 15.849 |  0.759 |    5.0 |
| slow       |   238 |   284 |  46|  66.134 | 78.909 | 12.775 |   19.3 |
#+TBLFM: $4=$3-$2::$7=$6-$5::$8=(($7/$5)*100);%.1f

This test with patches 1-5 looks much better regarding performance.
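
The diff and % change columns in the table follow the #+TBLFM formula
under it (diff = After - Before, change = diff / Before * 100). A minimal
C sketch of that arithmetic is below; the names compute_delta and
struct bench_delta are made up for illustration and are not part of
bench_page_pool_simple.

```c
/* Illustrative helper, not taken from the benchmark module. */
struct bench_delta {
	double diff;	/* after - before, same unit as the inputs */
	double pct;	/* diff relative to 'before', in percent */
};

/* Mirrors the table formula: $7=$6-$5 and $8=(($7/$5)*100).
 * Assumes 'before' is non-zero.
 */
struct bench_delta compute_delta(double before, double after)
{
	struct bench_delta d;

	d.diff = after - before;
	d.pct = (d.diff / before) * 100.0;
	return d;
}
```

For example, compute_delta(66.134, 78.909) gives a diff of about
12.775 ns and a change of about 19.3%, matching the "slow" row.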

--Jesper

https://github.com/xdp-project/xdp-project/blob/main/areas/mem/page_pool07_bench_DMA_fix.org#e5-1650-pp01-dma-fix-v7-p1-5

Kernel:
  - 6.13.0-rc6-pp01-DMA-fix-v7-p1-5+ #5 SMP PREEMPT_DYNAMIC Thu Jan 16 18:06:53 CET 2025 x86_64 GNU/Linux

Machine: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz

modprobe bench_page_pool_simple loops=100000000

Raw data:
[  187.309423] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[  187.872849] time_bench: Type:no-softirq-page_pool01 Per elem: 19 cycles(tsc) 5.539 ns (step:0) - (measurement period time:0.553906443 sec time_interval:553906443) - (invoke count:100000000 tsc_interval:1994123064)
[  187.892023] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[  189.611070] time_bench: Type:no-softirq-page_pool02 Per elem: 61 cycles(tsc) 17.095 ns (step:0) - (measurement period time:1.709580367 sec time_interval:1709580367) - (invoke count:100000000 tsc_interval:6154679394)
[  189.630414] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[  197.222387] time_bench: Type:no-softirq-page_pool03 Per elem: 272 cycles(tsc) 75.826 ns (step:0) - (measurement period time:7.582681388 sec time_interval:7582681388) - (invoke count:100000000 tsc_interval:27298499214)
[  197.241926] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[  197.249968] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[  197.808470] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 19 cycles(tsc) 5.492 ns (step:0) - (measurement period time:0.549225541 sec time_interval:549225541) - (invoke count:100000000 tsc_interval:1977272238)
[  197.828174] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[  199.422305] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 57 cycles(tsc) 15.849 ns (step:0) - (measurement period time:1.584920736 sec time_interval:1584920736) - (invoke count:100000000 tsc_interval:5705890830)
[  199.442087] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[  207.342120] time_bench: Type:tasklet_page_pool03_slow Per elem: 284 cycles(tsc) 78.909 ns (step:0) - (measurement period time:7.890955151 sec time_interval:7890955151) - (invoke count:100000000 tsc_interval:28408319289)
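
The per-element figures in the raw data can be cross-checked against the
intervals printed on the same line. The sketch below assumes, based only on
the numbers shown here, that cycles per element is tsc_interval / invoke count
and ns per element is time_interval / invoke count, both truncated rather than
rounded. The regular expression and the `parse` helper are illustrative and
not part of time_bench.

```python
import re

# Two time_bench lines copied from the raw data above (unwrapped).
LOG = """\
[  197.808470] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 19 cycles(tsc) 5.492 ns (step:0) - (measurement period time:0.549225541 sec time_interval:549225541) - (invoke count:100000000 tsc_interval:1977272238)
[  207.342120] time_bench: Type:tasklet_page_pool03_slow Per elem: 284 cycles(tsc) 78.909 ns (step:0) - (measurement period time:7.890955151 sec time_interval:7890955151) - (invoke count:100000000 tsc_interval:28408319289)
"""

# Illustrative pattern for the fields needed for the cross-check.
PATTERN = re.compile(
    r"Type:(?P<type>\S+) Per elem: (?P<cycles>\d+) cycles\(tsc\) (?P<ns>[\d.]+) ns"
    r".*?time_interval:(?P<time_interval>\d+)\)"
    r".*?invoke count:(?P<count>\d+) tsc_interval:(?P<tsc_interval>\d+)\)"
)


def parse(line):
    """Extract reported values and re-derive them from the raw intervals."""
    m = PATTERN.search(line)
    if m is None:
        return None
    count = int(m["count"])
    time_interval = int(m["time_interval"])  # nanoseconds
    tsc_interval = int(m["tsc_interval"])    # TSC cycles
    # Integer arithmetic so that the result is truncated, not rounded.
    ns_milli = time_interval * 1000 // count
    return {
        "type": m["type"],
        "reported_cycles": int(m["cycles"]),
        "reported_ns": m["ns"],
        "derived_cycles": tsc_interval // count,
        "derived_ns": "%d.%03d" % divmod(ns_milli, 1000),
        # Cycles per nanosecond, i.e. the effective TSC rate in GHz.
        "tsc_ghz": tsc_interval / time_interval,
    }


records = [parse(line) for line in LOG.splitlines()]
for rec in records:
    print(rec["type"], rec["derived_cycles"], rec["derived_ns"],
          f"{rec['tsc_ghz']:.3f} GHz")
```

For both lines the derived values match the reported ones, and
tsc_interval / time_interval comes out at roughly 3.6 cycles per ns, which is
consistent with the 3.60GHz CPU listed above.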



* Re: [PATCH net-next v7 0/8] fix two bugs related to page_pool
  2025-01-16 18:02         ` Jesper Dangaard Brouer
@ 2025-01-17 11:35           ` Yunsheng Lin
  2025-01-18  8:04             ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 9+ messages in thread
From: Yunsheng Lin @ 2025-01-17 11:35 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, davem, kuba, pabeni
  Cc: zhangkun09, liuyonglong, fanghaiqing, Alexander Lobakin,
	Robin Murphy, Alexander Duyck, Andrew Morton, IOMMU, MM,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Matthias Brugger, AngeloGioacchino Del Regno, netdev,
	intel-wired-lan, bpf, linux-kernel, linux-arm-kernel,
	linux-mediatek

On 2025/1/17 2:02, Jesper Dangaard Brouer wrote:

> 
> Benchmark (bench_page_pool_simple) results from before and after the
> patchset, with patches 1-5 applied and the rcu lock removed as requested.
> 
> | Test name  |Cycles |   1-5 |    | Nanosec |    1-5 |        |      % |
> | (tasklet_*)|Before | After |diff|  Before |  After |   diff | change |
> |------------+-------+-------+----+---------+--------+--------+--------|
> | fast_path  |    19 |    19 |   0|   5.399 |  5.492 |  0.093 |    1.7 |
> | ptr_ring   |    54 |    57 |   3|  15.090 | 15.849 |  0.759 |    5.0 |
> | slow       |   238 |   284 |  46|  66.134 | 78.909 | 12.775 |   19.3 |
> #+TBLFM: $4=$3-$2::$7=$6-$5::$8=(($7/$5)*100);%.1f
> 
> This test with patches 1-5 looks much better regarding performance.

Thanks for the testing.

Is there any notable performance variation between different test runs
of the same built kernel on your machine?

> 
> --Jesper
> 
> https://github.com/xdp-project/xdp-project/blob/main/areas/mem/page_pool07_bench_DMA_fix.org#e5-1650-pp01-dma-fix-v7-p1-5
> 
> Kernel:
>  - 6.13.0-rc6-pp01-DMA-fix-v7-p1-5+ #5 SMP PREEMPT_DYNAMIC Thu Jan 16 18:06:53 CET 2025 x86_64 GNU/Linux
> 
> Machine: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
> 
> modprobe bench_page_pool_simple loops=100000000
> 
> Raw data:
> [  187.309423] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
> [  187.872849] time_bench: Type:no-softirq-page_pool01 Per elem: 19 cycles(tsc) 5.539 ns (step:0) - (measurement period time:0.553906443 sec time_interval:553906443) - (invoke count:100000000 tsc_interval:1994123064)
> [  187.892023] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
> [  189.611070] time_bench: Type:no-softirq-page_pool02 Per elem: 61 cycles(tsc) 17.095 ns (step:0) - (measurement period time:1.709580367 sec time_interval:1709580367) - (invoke count:100000000 tsc_interval:6154679394)
> [  189.630414] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
> [  197.222387] time_bench: Type:no-softirq-page_pool03 Per elem: 272 cycles(tsc) 75.826 ns (step:0) - (measurement period time:7.582681388 sec time_interval:7582681388) - (invoke count:100000000 tsc_interval:27298499214)
> [  197.241926] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
> [  197.249968] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
> [  197.808470] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 19 cycles(tsc) 5.492 ns (step:0) - (measurement period time:0.549225541 sec time_interval:549225541) - (invoke count:100000000 tsc_interval:1977272238)
> [  197.828174] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
> [  199.422305] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 57 cycles(tsc) 15.849 ns (step:0) - (measurement period time:1.584920736 sec time_interval:1584920736) - (invoke count:100000000 tsc_interval:5705890830)
> [  199.442087] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
> [  207.342120] time_bench: Type:tasklet_page_pool03_slow Per elem: 284 cycles(tsc) 78.909 ns (step:0) - (measurement period time:7.890955151 sec time_interval:7890955151) - (invoke count:100000000 tsc_interval:28408319289)
> 


* Re: [PATCH net-next v7 0/8] fix two bugs related to page_pool
  2025-01-17 11:35           ` Yunsheng Lin
@ 2025-01-18  8:04             ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 9+ messages in thread
From: Jesper Dangaard Brouer @ 2025-01-18  8:04 UTC (permalink / raw)
  To: Yunsheng Lin, davem, kuba, pabeni
  Cc: zhangkun09, liuyonglong, fanghaiqing, Alexander Lobakin,
	Robin Murphy, Alexander Duyck, Andrew Morton, IOMMU, MM,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend,
	Matthias Brugger, AngeloGioacchino Del Regno, netdev,
	intel-wired-lan, bpf, linux-kernel, linux-arm-kernel,
	linux-mediatek



On 17/01/2025 12.35, Yunsheng Lin wrote:
> On 2025/1/17 2:02, Jesper Dangaard Brouer wrote:
> 
>>
>> Benchmark (bench_page_pool_simple) results from before and after the
>> patchset, with patches 1-5 applied and the rcu lock removed as requested.
>>
>> | Test name  |Cycles |   1-5 |    | Nanosec |    1-5 |        |      % |
>> | (tasklet_*)|Before | After |diff|  Before |  After |   diff | change |
>> |------------+-------+-------+----+---------+--------+--------+--------|
>> | fast_path  |    19 |    19 |   0|   5.399 |  5.492 |  0.093 |    1.7 |
>> | ptr_ring   |    54 |    57 |   3|  15.090 | 15.849 |  0.759 |    5.0 |
>> | slow       |   238 |   284 |  46|  66.134 | 78.909 | 12.775 |   19.3 |
>> #+TBLFM: $4=$3-$2::$7=$6-$5::$8=(($7/$5)*100);%.1f
>>
>> This test with patches 1-5 looks much better regarding performance.
> 
> Thanks for the testing.
> 
> Is there any notable performance variation between different test runs
> of the same built kernel on your machine?
> 

My machine has quite stable performance for this benchmark.


>> https://github.com/xdp-project/xdp-project/blob/main/areas/mem/page_pool07_bench_DMA_fix.org#e5-1650-pp01-dma-fix-v7-p1-5

As documented in the link above, I have also increased the loop count for
the test to make the results more stable, since each test is then measured
over a longer period.

  modprobe bench_page_pool_simple loops=100000000


>> Kernel:
>>   - 6.13.0-rc6-pp01-DMA-fix-v7-p1-5+ #5 SMP PREEMPT_DYNAMIC Thu Jan 16 18:06:53 CET 2025 x86_64 GNU/Linux
>>
>> Machine: Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz
>>
>> modprobe bench_page_pool_simple loops=100000000
>>
>> Raw data:
>> [  187.309423] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
>> [  187.872849] time_bench: Type:no-softirq-page_pool01 Per elem: 19 cycles(tsc) 5.539 ns (step:0) - (measurement period time:0.553906443 sec time_interval:553906443) - (invoke count:100000000 tsc_interval:1994123064)
>> [  187.892023] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
>> [  189.611070] time_bench: Type:no-softirq-page_pool02 Per elem: 61 cycles(tsc) 17.095 ns (step:0) - (measurement period time:1.709580367 sec time_interval:1709580367) - (invoke count:100000000 tsc_interval:6154679394)
>> [  189.630414] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
>> [  197.222387] time_bench: Type:no-softirq-page_pool03 Per elem: 272 cycles(tsc) 75.826 ns (step:0) - (measurement period time:7.582681388 sec time_interval:7582681388) - (invoke count:100000000 tsc_interval:27298499214)
>> [  197.241926] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
>> [  197.249968] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
>> [  197.808470] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 19 cycles(tsc) 5.492 ns (step:0) - (measurement period time:0.549225541 sec time_interval:549225541) - (invoke count:100000000 tsc_interval:1977272238)
>> [  197.828174] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
>> [  199.422305] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 57 cycles(tsc) 15.849 ns (step:0) - (measurement period time:1.584920736 sec time_interval:1584920736) - (invoke count:100000000 tsc_interval:5705890830)
>> [  199.442087] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
>> [  207.342120] time_bench: Type:tasklet_page_pool03_slow Per elem: 284 cycles(tsc) 78.909 ns (step:0) - (measurement period time:7.890955151 sec time_interval:7890955151) - (invoke count:100000000 tsc_interval:28408319289)
>>



Thread overview: 9+ messages
2025-01-10 13:06 [PATCH net-next v7 0/8] fix two bugs related to page_pool Yunsheng Lin
2025-01-10 13:06 ` [PATCH net-next v7 1/8] page_pool: introduce page_pool_get_pp() API Yunsheng Lin
2025-01-14 14:31 ` [PATCH net-next v7 0/8] fix two bugs related to page_pool Jesper Dangaard Brouer
2025-01-15 11:33   ` Yunsheng Lin
2025-01-15 17:40     ` Jesper Dangaard Brouer
2025-01-16 12:52       ` Yunsheng Lin
2025-01-16 18:02         ` Jesper Dangaard Brouer
2025-01-17 11:35           ` Yunsheng Lin
2025-01-18  8:04             ` Jesper Dangaard Brouer
