* [PATCH net-next v6 0/8] fix two bugs related to page_pool
From: Yunsheng Lin @ 2025-01-06 13:01 UTC
To: davem, kuba, pabeni
Cc: liuyonglong, fanghaiqing, zhangkun09, Yunsheng Lin,
Alexander Lobakin, Robin Murphy, Alexander Duyck, Andrew Morton,
IOMMU, MM, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Matthias Brugger,
AngeloGioacchino Del Regno, netdev, intel-wired-lan, bpf,
linux-kernel, linux-arm-kernel, linux-mediatek
This patchset fixes a possible time-window problem for page_pool and
the DMA API misuse problem mentioned in [1], and tries to avoid the
overhead of the fix with some optimizations.
From the performance data below, the overhead is not obvious for
time_bench_page_pool01_fast_path() and time_bench_page_pool02_ptr_ring()
due to performance variations, but there is about 20 ns of overhead in
time_bench_page_pool03_slow() from fixing the bug.
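(To read the time_bench lines below: each "Per elem" nanosecond figure
is simply time_interval divided by invoke count, e.g. 76,997,150 ns /
100,000,000 invocations ~= 0.769 ns for the for_loop baseline.)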
Before this patchset:
root@(none)$ insmod bench_page_pool_simple.ko
[ 323.367627] bench_page_pool_simple: Loaded
[ 323.448747] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.076997150 sec time_interval:76997150) - (invoke count:100000000 tsc_interval:7699707)
[ 324.812884] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.468 ns (step:0) - (measurement period time:1.346855130 sec time_interval:1346855130) - (invoke count:100000000 tsc_interval:134685507)
[ 324.980875] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.010 ns (step:0) - (measurement period time:0.150101270 sec time_interval:150101270) - (invoke count:10000000 tsc_interval:15010120)
[ 325.652195] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.542 ns (step:0) - (measurement period time:0.654213000 sec time_interval:654213000) - (invoke count:100000000 tsc_interval:65421294)
[ 325.669215] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[ 325.974848] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 29.633 ns (step:0) - (measurement period time:0.296338200 sec time_interval:296338200) - (invoke count:10000000 tsc_interval:29633814)
[ 325.993517] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[ 326.576636] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.391 ns (step:0) - (measurement period time:0.573911820 sec time_interval:573911820) - (invoke count:10000000 tsc_interval:57391174)
[ 326.595307] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[ 328.422661] time_bench: Type:no-softirq-page_pool03 Per elem: 18 cycles(tsc) 181.849 ns (step:0) - (measurement period time:1.818495880 sec time_interval:1818495880) - (invoke count:10000000 tsc_interval:181849581)
[ 328.441681] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[ 328.449584] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[ 328.755031] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 29.632 ns (step:0) - (measurement period time:0.296327910 sec time_interval:296327910) - (invoke count:10000000 tsc_interval:29632785)
[ 328.774308] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[ 329.578579] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 7 cycles(tsc) 79.523 ns (step:0) - (measurement period time:0.795236560 sec time_interval:795236560) - (invoke count:10000000 tsc_interval:79523650)
[ 329.597769] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[ 331.507501] time_bench: Type:tasklet_page_pool03_slow Per elem: 19 cycles(tsc) 190.104 ns (step:0) - (measurement period time:1.901047510 sec time_interval:1901047510) - (invoke count:10000000 tsc_interval:190104743)
After this patchset:
root@(none)$ insmod bench_page_pool_simple.ko
[ 138.634758] bench_page_pool_simple: Loaded
[ 138.715879] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.076972720 sec time_interval:76972720) - (invoke count:100000000 tsc_interval:7697265)
[ 140.079897] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:1.346735370 sec time_interval:1346735370) - (invoke count:100000000 tsc_interval:134673531)
[ 140.247841] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:0.150055080 sec time_interval:150055080) - (invoke count:10000000 tsc_interval:15005497)
[ 140.919072] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:0.654125000 sec time_interval:654125000) - (invoke count:100000000 tsc_interval:65412493)
[ 140.936091] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
[ 141.246985] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 30.159 ns (step:0) - (measurement period time:0.301598160 sec time_interval:301598160) - (invoke count:10000000 tsc_interval:30159812)
[ 141.265654] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
[ 141.976265] time_bench: Type:no-softirq-page_pool02 Per elem: 7 cycles(tsc) 70.140 ns (step:0) - (measurement period time:0.701405780 sec time_interval:701405780) - (invoke count:10000000 tsc_interval:70140573)
[ 141.994933] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
[ 144.018945] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 201.514 ns (step:0) - (measurement period time:2.015141210 sec time_interval:2015141210) - (invoke count:10000000 tsc_interval:201514113)
[ 144.037966] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[ 144.045870] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[ 144.205045] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:0.150056510 sec time_interval:150056510) - (invoke count:10000000 tsc_interval:15005645)
[ 144.224320] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[ 144.916044] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 68.269 ns (step:0) - (measurement period time:0.682693070 sec time_interval:682693070) - (invoke count:10000000 tsc_interval:68269300)
[ 144.935234] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[ 146.997684] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 205.376 ns (step:0) - (measurement period time:2.053766310 sec time_interval:2053766310) - (invoke count:10000000 tsc_interval:205376624)
1. https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
CC: Robin Murphy <robin.murphy@arm.com>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: IOMMU <iommu@lists.linux.dev>
CC: MM <linux-mm@kvack.org>
Change log:
V6:
1. Repost based on latest net-next.
2. Rename page_pool_to_pp() to page_pool_get_pp().
V5:
1. Support an unlimited number of inflight pages.
2. Add some optimizations to avoid the overhead of fixing the bug.
V4:
1. Use scanning to do the unmapping.
2. Split DMA sync skipping into a separate patch.
V3:
1. Target the net-next tree instead of the net tree.
2. Narrow the RCU lock as per the discussion in v2.
3. Check the unmapping count against the inflight count.
V2:
1. Add an item_full stat.
2. Use container_of() for page_pool_to_pp().
Yunsheng Lin (8):
page_pool: introduce page_pool_get_pp() API
page_pool: fix timing for checking and disabling napi_local
page_pool: fix IOMMU crash when driver has already unbound
page_pool: support unlimited number of inflight pages
page_pool: skip dma sync operation for inflight pages
page_pool: use list instead of ptr_ring for ring cache
page_pool: batch refilling pages to reduce atomic operation
page_pool: use list instead of array for alloc cache
drivers/net/ethernet/freescale/fec_main.c | 8 +-
.../ethernet/google/gve/gve_buffer_mgmt_dqo.c | 2 +-
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 6 +-
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 14 +-
drivers/net/ethernet/intel/libeth/rx.c | 2 +-
.../net/ethernet/mellanox/mlx5/core/en/xdp.c | 3 +-
drivers/net/netdevsim/netdev.c | 6 +-
drivers/net/wireless/mediatek/mt76/mt76.h | 2 +-
include/linux/mm_types.h | 2 +-
include/linux/skbuff.h | 1 +
include/net/libeth/rx.h | 3 +-
include/net/netmem.h | 24 +-
include/net/page_pool/helpers.h | 11 +
include/net/page_pool/types.h | 63 +-
net/core/devmem.c | 4 +-
net/core/netmem_priv.h | 5 +-
net/core/page_pool.c | 660 ++++++++++++++----
net/core/page_pool_priv.h | 12 +-
18 files changed, 664 insertions(+), 164 deletions(-)
--
2.33.0
* [PATCH net-next v6 1/8] page_pool: introduce page_pool_get_pp() API
From: Yunsheng Lin @ 2025-01-06 13:01 UTC
To: davem, kuba, pabeni
Cc: liuyonglong, fanghaiqing, zhangkun09, Yunsheng Lin, Wei Fang,
Shenwei Wang, Clark Wang, Andrew Lunn, Eric Dumazet,
Jeroen de Borst, Praveen Kaligineedi, Shailend Chand, Tony Nguyen,
Przemek Kitszel, Alexander Lobakin, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Felix Fietkau,
Lorenzo Bianconi, Ryder Lee, Shayne Chen, Sean Wang, Kalle Valo,
Matthias Brugger, AngeloGioacchino Del Regno, Simon Horman,
Ilias Apalodimas, imx, netdev, linux-kernel, intel-wired-lan, bpf,
linux-rdma, linux-wireless, linux-arm-kernel, linux-mediatek
Introduce the page_pool_get_pp() API to avoid callers accessing
page->pp directly.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
drivers/net/ethernet/freescale/fec_main.c | 8 +++++---
.../net/ethernet/google/gve/gve_buffer_mgmt_dqo.c | 2 +-
drivers/net/ethernet/intel/iavf/iavf_txrx.c | 6 ++++--
drivers/net/ethernet/intel/idpf/idpf_txrx.c | 14 +++++++++-----
drivers/net/ethernet/intel/libeth/rx.c | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c | 3 ++-
drivers/net/netdevsim/netdev.c | 6 ++++--
drivers/net/wireless/mediatek/mt76/mt76.h | 2 +-
include/net/libeth/rx.h | 3 ++-
include/net/page_pool/helpers.h | 5 +++++
10 files changed, 34 insertions(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index b2daed55bf6c..18d2119dbec1 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1009,7 +1009,8 @@ static void fec_enet_bd_init(struct net_device *dev)
struct page *page = txq->tx_buf[i].buf_p;
if (page)
- page_pool_put_page(page->pp, page, 0, false);
+ page_pool_put_page(page_pool_get_pp(page),
+ page, 0, false);
}
txq->tx_buf[i].buf_p = NULL;
@@ -1549,7 +1550,7 @@ fec_enet_tx_queue(struct net_device *ndev, u16 queue_id, int budget)
xdp_return_frame_rx_napi(xdpf);
} else { /* recycle pages of XDP_TX frames */
/* The dma_sync_size = 0 as XDP_TX has already synced DMA for_device */
- page_pool_put_page(page->pp, page, 0, true);
+ page_pool_put_page(page_pool_get_pp(page), page, 0, true);
}
txq->tx_buf[index].buf_p = NULL;
@@ -3307,7 +3308,8 @@ static void fec_enet_free_buffers(struct net_device *ndev)
} else {
struct page *page = txq->tx_buf[i].buf_p;
- page_pool_put_page(page->pp, page, 0, false);
+ page_pool_put_page(page_pool_get_pp(page),
+ page, 0, false);
}
txq->tx_buf[i].buf_p = NULL;
diff --git a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
index 403f0f335ba6..87422b8828ff 100644
--- a/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
+++ b/drivers/net/ethernet/google/gve/gve_buffer_mgmt_dqo.c
@@ -210,7 +210,7 @@ void gve_free_to_page_pool(struct gve_rx_ring *rx,
if (!page)
return;
- page_pool_put_full_page(page->pp, page, allow_direct);
+ page_pool_put_full_page(page_pool_get_pp(page), page, allow_direct);
buf_state->page_info.page = NULL;
}
diff --git a/drivers/net/ethernet/intel/iavf/iavf_txrx.c b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
index 26b424fd6718..e1bf5554f6e3 100644
--- a/drivers/net/ethernet/intel/iavf/iavf_txrx.c
+++ b/drivers/net/ethernet/intel/iavf/iavf_txrx.c
@@ -1050,7 +1050,8 @@ static void iavf_add_rx_frag(struct sk_buff *skb,
const struct libeth_fqe *rx_buffer,
unsigned int size)
{
- u32 hr = rx_buffer->page->pp->p.offset;
+ struct page_pool *pool = page_pool_get_pp(rx_buffer->page);
+ u32 hr = pool->p.offset;
skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buffer->page,
rx_buffer->offset + hr, size, rx_buffer->truesize);
@@ -1067,7 +1068,8 @@ static void iavf_add_rx_frag(struct sk_buff *skb,
static struct sk_buff *iavf_build_skb(const struct libeth_fqe *rx_buffer,
unsigned int size)
{
- u32 hr = rx_buffer->page->pp->p.offset;
+ struct page_pool *pool = page_pool_get_pp(rx_buffer->page);
+ u32 hr = pool->p.offset;
struct sk_buff *skb;
void *va;
diff --git a/drivers/net/ethernet/intel/idpf/idpf_txrx.c b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
index 2fa9c36e33c9..04f2347716ca 100644
--- a/drivers/net/ethernet/intel/idpf/idpf_txrx.c
+++ b/drivers/net/ethernet/intel/idpf/idpf_txrx.c
@@ -385,7 +385,8 @@ static void idpf_rx_page_rel(struct libeth_fqe *rx_buf)
if (unlikely(!rx_buf->page))
return;
- page_pool_put_full_page(rx_buf->page->pp, rx_buf->page, false);
+ page_pool_put_full_page(page_pool_get_pp(rx_buf->page), rx_buf->page,
+ false);
rx_buf->page = NULL;
rx_buf->offset = 0;
@@ -3098,7 +3099,8 @@ idpf_rx_process_skb_fields(struct idpf_rx_queue *rxq, struct sk_buff *skb,
void idpf_rx_add_frag(struct idpf_rx_buf *rx_buf, struct sk_buff *skb,
unsigned int size)
{
- u32 hr = rx_buf->page->pp->p.offset;
+ struct page_pool *pool = page_pool_get_pp(rx_buf->page);
+ u32 hr = pool->p.offset;
skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, rx_buf->page,
rx_buf->offset + hr, size, rx_buf->truesize);
@@ -3130,8 +3132,10 @@ static u32 idpf_rx_hsplit_wa(const struct libeth_fqe *hdr,
if (!libeth_rx_sync_for_cpu(buf, copy))
return 0;
- dst = page_address(hdr->page) + hdr->offset + hdr->page->pp->p.offset;
- src = page_address(buf->page) + buf->offset + buf->page->pp->p.offset;
+ dst = page_address(hdr->page) + hdr->offset +
+ page_pool_get_pp(hdr->page)->p.offset;
+ src = page_address(buf->page) + buf->offset +
+ page_pool_get_pp(buf->page)->p.offset;
memcpy(dst, src, LARGEST_ALIGN(copy));
buf->offset += copy;
@@ -3149,7 +3153,7 @@ static u32 idpf_rx_hsplit_wa(const struct libeth_fqe *hdr,
*/
struct sk_buff *idpf_rx_build_skb(const struct libeth_fqe *buf, u32 size)
{
- u32 hr = buf->page->pp->p.offset;
+ u32 hr = page_pool_get_pp(buf->page)->p.offset;
struct sk_buff *skb;
void *va;
diff --git a/drivers/net/ethernet/intel/libeth/rx.c b/drivers/net/ethernet/intel/libeth/rx.c
index 66d1d23b8ad2..8de0c3a3b146 100644
--- a/drivers/net/ethernet/intel/libeth/rx.c
+++ b/drivers/net/ethernet/intel/libeth/rx.c
@@ -207,7 +207,7 @@ EXPORT_SYMBOL_NS_GPL(libeth_rx_fq_destroy, "LIBETH");
*/
void libeth_rx_recycle_slow(struct page *page)
{
- page_pool_recycle_direct(page->pp, page);
+ page_pool_recycle_direct(page_pool_get_pp(page), page);
}
EXPORT_SYMBOL_NS_GPL(libeth_rx_recycle_slow, "LIBETH");
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index 94b291662087..30baca49c71e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -716,7 +716,8 @@ static void mlx5e_free_xdpsq_desc(struct mlx5e_xdpsq *sq,
/* No need to check ((page->pp_magic & ~0x3UL) == PP_SIGNATURE)
* as we know this is a page_pool page.
*/
- page_pool_recycle_direct(page->pp, page);
+ page_pool_recycle_direct(page_pool_get_pp(page),
+ page);
} while (++n < num);
break;
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c
index e068a9761c09..e583415f9088 100644
--- a/drivers/net/netdevsim/netdev.c
+++ b/drivers/net/netdevsim/netdev.c
@@ -632,7 +632,8 @@ nsim_pp_hold_write(struct file *file, const char __user *data,
if (!ns->page)
ret = -ENOMEM;
} else {
- page_pool_put_full_page(ns->page->pp, ns->page, false);
+ page_pool_put_full_page(page_pool_get_pp(ns->page), ns->page,
+ false);
ns->page = NULL;
}
@@ -831,7 +832,8 @@ void nsim_destroy(struct netdevsim *ns)
/* Put this intentionally late to exercise the orphaning path */
if (ns->page) {
- page_pool_put_full_page(ns->page->pp, ns->page, false);
+ page_pool_put_full_page(page_pool_get_pp(ns->page), ns->page,
+ false);
ns->page = NULL;
}
diff --git a/drivers/net/wireless/mediatek/mt76/mt76.h b/drivers/net/wireless/mediatek/mt76/mt76.h
index ca2dba3ac65d..4d0e41a7bf4a 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76.h
+++ b/drivers/net/wireless/mediatek/mt76/mt76.h
@@ -1688,7 +1688,7 @@ static inline void mt76_put_page_pool_buf(void *buf, bool allow_direct)
{
struct page *page = virt_to_head_page(buf);
- page_pool_put_full_page(page->pp, page, allow_direct);
+ page_pool_put_full_page(page_pool_get_pp(page), page, allow_direct);
}
static inline void *
diff --git a/include/net/libeth/rx.h b/include/net/libeth/rx.h
index 43574bd6612f..f4ae75f9cc1b 100644
--- a/include/net/libeth/rx.h
+++ b/include/net/libeth/rx.h
@@ -137,7 +137,8 @@ static inline bool libeth_rx_sync_for_cpu(const struct libeth_fqe *fqe,
return false;
}
- page_pool_dma_sync_for_cpu(page->pp, page, fqe->offset, len);
+ page_pool_dma_sync_for_cpu(page_pool_get_pp(page), page, fqe->offset,
+ len);
return true;
}
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 543f54fa3020..9c4dbd2289b1 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -83,6 +83,11 @@ static inline u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats)
}
#endif
+static inline struct page_pool *page_pool_get_pp(struct page *page)
+{
+ return page->pp;
+}
+
/**
* page_pool_dev_alloc_pages() - allocate a page.
* @pool: pool from which to allocate
--
2.33.0
* Re: [PATCH net-next v6 0/8] fix two bugs related to page_pool
From: Jakub Kicinski @ 2025-01-06 23:51 UTC
To: Yunsheng Lin
Cc: davem, pabeni, liuyonglong, fanghaiqing, zhangkun09,
Alexander Lobakin, Robin Murphy, Alexander Duyck, Andrew Morton,
IOMMU, MM, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Matthias Brugger,
AngeloGioacchino Del Regno, netdev, intel-wired-lan, bpf,
linux-kernel, linux-arm-kernel, linux-mediatek
On Mon, 6 Jan 2025 21:01:08 +0800 Yunsheng Lin wrote:
> This patchset fixes a possible time-window problem for page_pool and
> the DMA API misuse problem mentioned in [1], and tries to avoid the
> overhead of the fix with some optimizations.
>
> From the performance data below, the overhead is not obvious for
> time_bench_page_pool01_fast_path() and time_bench_page_pool02_ptr_ring()
> due to performance variations, but there is about 20 ns of overhead in
> time_bench_page_pool03_slow() from fixing the bug.
This appears to make the selftest from the drivers/net target implode.
[ 20.227775][ T218] BUG: KASAN: use-after-free in page_pool_item_uninit+0x100/0x130
Running the ping.py tests should be enough to repro.
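(For reference: assuming the standard kselftest flow, something like

	make -C tools/testing/selftests TARGETS=drivers/net run_tests

should exercise that target, ping.py included.)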
--
pw-bot: cr
* Re: [PATCH net-next v6 0/8] fix two bugs related to page_pool
From: Yunsheng Lin @ 2025-01-07 12:54 UTC
To: Jakub Kicinski
Cc: davem, pabeni, liuyonglong, fanghaiqing, zhangkun09,
Alexander Lobakin, Robin Murphy, Alexander Duyck, Andrew Morton,
IOMMU, MM, Alexei Starovoitov, Daniel Borkmann,
Jesper Dangaard Brouer, John Fastabend, Matthias Brugger,
AngeloGioacchino Del Regno, netdev, intel-wired-lan, bpf,
linux-kernel, linux-arm-kernel, linux-mediatek
On 2025/1/7 7:51, Jakub Kicinski wrote:
> On Mon, 6 Jan 2025 21:01:08 +0800 Yunsheng Lin wrote:
>> This patchset fixes a possible time-window problem for page_pool and
>> the DMA API misuse problem mentioned in [1], and tries to avoid the
>> overhead of the fix with some optimizations.
>>
>> From the performance data below, the overhead is not obvious for
>> time_bench_page_pool01_fast_path() and time_bench_page_pool02_ptr_ring()
>> due to performance variations, but there is about 20 ns of overhead in
>> time_bench_page_pool03_slow() from fixing the bug.
>
> This appears to make the selftest from the drivers/net target implode.
>
> [ 20.227775][ T218] BUG: KASAN: use-after-free in page_pool_item_uninit+0x100/0x130
>
> Running the ping.py tests should be enough to repro.
Thanks for the reminder.
Something like the below seems to fix the use-after-free bug; I will
enable more DEBUG configs when testing.
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -518,9 +518,13 @@ static void page_pool_items_unmap(struct page_pool *pool)
 static void page_pool_item_uninit(struct page_pool *pool)
 {
-	struct page_pool_item_block *block;
+	while (!list_empty(&pool->item_blocks)) {
+		struct page_pool_item_block *block;
 
-	list_for_each_entry(block, &pool->item_blocks, list) {
+		block = list_first_entry(&pool->item_blocks,
+					 struct page_pool_item_block,
+					 list);
+		list_del(&block->list);
 		WARN_ON(refcount_read(&block->ref));
 		put_page(virt_to_page(block));
 	}
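(For context: list_for_each_entry() fetches block->list.next after the
loop body runs, i.e. after put_page() has already freed the page that
embeds 'block', hence the use-after-free. An equivalent fix, sketched
here for illustration only, would be the kernel's
list_for_each_entry_safe() iterator, which caches the next node before
the body frees the current one:)

	struct page_pool_item_block *block, *tmp;

	/* 'tmp' is fetched before put_page() frees 'block', so the
	 * iterator never dereferences freed memory
	 */
	list_for_each_entry_safe(block, tmp, &pool->item_blocks, list) {
		list_del(&block->list);
		WARN_ON(refcount_read(&block->ref));
		put_page(virt_to_page(block));
	}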
* Re: [PATCH net-next v6 0/8] fix two bugs related to page_pool
From: Jesper Dangaard Brouer @ 2025-01-07 14:26 UTC
To: Yunsheng Lin, davem, kuba, pabeni
Cc: liuyonglong, fanghaiqing, zhangkun09, Alexander Lobakin,
Robin Murphy, Alexander Duyck, Andrew Morton, IOMMU, MM,
Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Matthias Brugger, AngeloGioacchino Del Regno, netdev,
intel-wired-lan, bpf, linux-kernel, linux-arm-kernel,
linux-mediatek
On 06/01/2025 14.01, Yunsheng Lin wrote:
> This patchset fixes a possible time-window problem for page_pool and
> the DMA API misuse problem mentioned in [1], and tries to avoid the
> overhead of the fix with some optimizations.
>
> From the performance data below, the overhead is not obvious for
> time_bench_page_pool01_fast_path() and time_bench_page_pool02_ptr_ring()
> due to performance variations, but there is about 20 ns of overhead in
> time_bench_page_pool03_slow() from fixing the bug.
>
> Before this patchset:
> root@(none)$ insmod bench_page_pool_simple.ko
> [ 323.367627] bench_page_pool_simple: Loaded
> [ 323.448747] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.076997150 sec time_interval:76997150) - (invoke count:100000000 tsc_interval:7699707)
> [ 324.812884] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.468 ns (step:0) - (measurement period time:1.346855130 sec time_interval:1346855130) - (invoke count:100000000 tsc_interval:134685507)
> [ 324.980875] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.010 ns (step:0) - (measurement period time:0.150101270 sec time_interval:150101270) - (invoke count:10000000 tsc_interval:15010120)
> [ 325.652195] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.542 ns (step:0) - (measurement period time:0.654213000 sec time_interval:654213000) - (invoke count:100000000 tsc_interval:65421294)
> [ 325.669215] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
> [ 325.974848] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 29.633 ns (step:0) - (measurement period time:0.296338200 sec time_interval:296338200) - (invoke count:10000000 tsc_interval:29633814)
(the comment below refers to the quoted line above)
> [ 325.993517] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
> [ 326.576636] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.391 ns (step:0) - (measurement period time:0.573911820 sec time_interval:573911820) - (invoke count:10000000 tsc_interval:57391174)
> [ 326.595307] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
> [ 328.422661] time_bench: Type:no-softirq-page_pool03 Per elem: 18 cycles(tsc) 181.849 ns (step:0) - (measurement period time:1.818495880 sec time_interval:1818495880) - (invoke count:10000000 tsc_interval:181849581)
> [ 328.441681] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
> [ 328.449584] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
> [ 328.755031] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 29.632 ns (step:0) - (measurement period time:0.296327910 sec time_interval:296327910) - (invoke count:10000000 tsc_interval:29632785)
It is strange that the fast path "tasklet_page_pool01_fast_path" isn't
faster than "no-softirq-page_pool01" above.
They are both 29.633 ns.
What hardware is this?
e.g. the cycle count of 2 cycles(tsc) seems strange.
On my testlab hardware (Intel CPU E5-1650 v4 @ 3.60GHz), my fast-path
numbers say 5.202 ns (18 cycles) for "tasklet_page_pool01_fast_path".
Raw data looks like this:
[Tue Jan 7 15:15:18 2025] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[Tue Jan 7 15:15:18 2025] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[Tue Jan 7 15:15:18 2025] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 18 cycles(tsc) 5.202 ns (step:0) - (measurement period time:0.052020430 sec time_interval:52020430) - (invoke count:10000000 tsc_interval:187272981)
[Tue Jan 7 15:15:18 2025] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[Tue Jan 7 15:15:19 2025] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 55 cycles(tsc) 15.343 ns (step:0) - (measurement period time:0.153438301 sec time_interval:153438301) - (invoke count:10000000 tsc_interval:552378168)
[Tue Jan 7 15:15:19 2025] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[Tue Jan 7 15:15:19 2025] time_bench: Type:tasklet_page_pool03_slow Per elem: 243 cycles(tsc) 67.725 ns (step:0) - (measurement period time:0.677255574 sec time_interval:677255574) - (invoke count:10000000 tsc_interval:2438124315)
> [ 328.774308] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
> [ 329.578579] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 7 cycles(tsc) 79.523 ns (step:0) - (measurement period time:0.795236560 sec time_interval:795236560) - (invoke count:10000000 tsc_interval:79523650)
> [ 329.597769] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
> [ 331.507501] time_bench: Type:tasklet_page_pool03_slow Per elem: 19 cycles(tsc) 190.104 ns (step:0) - (measurement period time:1.901047510 sec time_interval:1901047510) - (invoke count:10000000 tsc_interval:190104743)
>
> After this patchset:
> root@(none)$ insmod bench_page_pool_simple.ko
> [ 138.634758] bench_page_pool_simple: Loaded
> [ 138.715879] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.076972720 sec time_interval:76972720) - (invoke count:100000000 tsc_interval:7697265)
> [ 140.079897] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:1.346735370 sec time_interval:1346735370) - (invoke count:100000000 tsc_interval:134673531)
> [ 140.247841] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:0.150055080 sec time_interval:150055080) - (invoke count:10000000 tsc_interval:15005497)
> [ 140.919072] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:0.654125000 sec time_interval:654125000) - (invoke count:100000000 tsc_interval:65412493)
> [ 140.936091] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
> [ 141.246985] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 30.159 ns (step:0) - (measurement period time:0.301598160 sec time_interval:301598160) - (invoke count:10000000 tsc_interval:30159812)
> [ 141.265654] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
> [ 141.976265] time_bench: Type:no-softirq-page_pool02 Per elem: 7 cycles(tsc) 70.140 ns (step:0) - (measurement period time:0.701405780 sec time_interval:701405780) - (invoke count:10000000 tsc_interval:70140573)
> [ 141.994933] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
> [ 144.018945] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 201.514 ns (step:0) - (measurement period time:2.015141210 sec time_interval:2015141210) - (invoke count:10000000 tsc_interval:201514113)
> [ 144.037966] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
> [ 144.045870] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
> [ 144.205045] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:0.150056510 sec time_interval:150056510) - (invoke count:10000000 tsc_interval:15005645)
This 15.005 ns looks like a significant improvement over the 29.633 ns above.
> [ 144.224320] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
> [ 144.916044] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 68.269 ns (step:0) - (measurement period time:0.682693070 sec time_interval:682693070) - (invoke count:10000000 tsc_interval:68269300)
> [ 144.935234] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
> [ 146.997684] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 205.376 ns (step:0) - (measurement period time:2.053766310 sec time_interval:2053766310) - (invoke count:10000000 tsc_interval:205376624)
>
Looks like I should also try out this patchset on my testlab, as this
hardware seems significantly different from mine...
> 1. https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/
>
> CC: Alexander Lobakin <aleksander.lobakin@intel.com>
> CC: Robin Murphy <robin.murphy@arm.com>
> CC: Alexander Duyck <alexander.duyck@gmail.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: IOMMU <iommu@lists.linux.dev>
> CC: MM <linux-mm@kvack.org>
>
> Change log:
> V6:
> 1. Repost based on latest net-next.
> 2. Rename page_pool_to_pp() to page_pool_get_pp().
>
> V5:
> 1. Support an unlimited number of inflight pages.
> 2. Add some optimizations to avoid the overhead of fixing the bug.
>
> V4:
> 1. Use scanning to do the unmapping.
> 2. Split DMA sync skipping into a separate patch.
>
> V3:
> 1. Target the net-next tree instead of the net tree.
> 2. Narrow the RCU lock as per the discussion in v2.
> 3. Check the unmapping count against the inflight count.
>
> V2:
> 1. Add an item_full stat.
> 2. Use container_of() for page_pool_to_pp().
>
> Yunsheng Lin (8):
> page_pool: introduce page_pool_get_pp() API
> page_pool: fix timing for checking and disabling napi_local
> page_pool: fix IOMMU crash when driver has already unbound
> page_pool: support unlimited number of inflight pages
> page_pool: skip dma sync operation for inflight pages
> page_pool: use list instead of ptr_ring for ring cache
> page_pool: batch refilling pages to reduce atomic operation
> page_pool: use list instead of array for alloc cache
>
> drivers/net/ethernet/freescale/fec_main.c | 8 +-
> .../ethernet/google/gve/gve_buffer_mgmt_dqo.c | 2 +-
> drivers/net/ethernet/intel/iavf/iavf_txrx.c | 6 +-
> drivers/net/ethernet/intel/idpf/idpf_txrx.c | 14 +-
> drivers/net/ethernet/intel/libeth/rx.c | 2 +-
> .../net/ethernet/mellanox/mlx5/core/en/xdp.c | 3 +-
> drivers/net/netdevsim/netdev.c | 6 +-
> drivers/net/wireless/mediatek/mt76/mt76.h | 2 +-
> include/linux/mm_types.h | 2 +-
> include/linux/skbuff.h | 1 +
> include/net/libeth/rx.h | 3 +-
> include/net/netmem.h | 24 +-
> include/net/page_pool/helpers.h | 11 +
> include/net/page_pool/types.h | 63 +-
> net/core/devmem.c | 4 +-
> net/core/netmem_priv.h | 5 +-
> net/core/page_pool.c | 660 ++++++++++++++----
> net/core/page_pool_priv.h | 12 +-
> 18 files changed, 664 insertions(+), 164 deletions(-)
>
* Re: [PATCH net-next v6 1/8] page_pool: introduce page_pool_get_pp() API
From: Jesper Dangaard Brouer @ 2025-01-07 14:52 UTC
To: Yunsheng Lin, davem, kuba, pabeni
Cc: liuyonglong, fanghaiqing, zhangkun09, Wei Fang, Shenwei Wang,
Clark Wang, Andrew Lunn, Eric Dumazet, Jeroen de Borst,
Praveen Kaligineedi, Shailend Chand, Tony Nguyen, Przemek Kitszel,
Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, Shayne Chen,
Sean Wang, Kalle Valo, Matthias Brugger,
AngeloGioacchino Del Regno, Simon Horman, Ilias Apalodimas, imx,
netdev, linux-kernel, intel-wired-lan, bpf, linux-rdma,
linux-wireless, linux-arm-kernel, linux-mediatek
On 06/01/2025 14.01, Yunsheng Lin wrote:
> Introduce the page_pool_get_pp() API to avoid callers accessing
> page->pp directly.
>
[...]
> diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> index 543f54fa3020..9c4dbd2289b1 100644
> --- a/include/net/page_pool/helpers.h
> +++ b/include/net/page_pool/helpers.h
> @@ -83,6 +83,11 @@ static inline u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats)
> }
> #endif
>
> +static inline struct page_pool *page_pool_get_pp(struct page *page)
> +{
> + return page->pp;
> +}
IMHO the function name "page_pool_get_pp" is problematic, as calling it
"get" indicates to me that we are taking a reference on the pp object.
Is this your plan in later patches?
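(For contrast, a minimal sketch of the convention I mean -- kernel "get"
helpers such as get_device()/put_device() manage a reference, which a
plain accessor does not; 'pdev' below is just an arbitrary example
device:)

	struct device *dev = get_device(&pdev->dev);	/* takes a reference */
	/* ... use dev ... */
	put_device(dev);				/* balancing put */

	/* vs. the helper under discussion, which takes no reference: */
	struct page_pool *pp = page_pool_get_pp(page);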
If it is simply a dereference of page->pp ... then we could call it
page2pp ?
... but I'm uncertain why we need this change.
--Jesper
* Re: [PATCH net-next v6 0/8] fix two bugs related to page_pool
From: Yunsheng Lin @ 2025-01-08 9:36 UTC
To: Jesper Dangaard Brouer, davem, kuba, pabeni
Cc: liuyonglong, fanghaiqing, zhangkun09, Alexander Lobakin,
Robin Murphy, Alexander Duyck, Andrew Morton, IOMMU, MM,
Alexei Starovoitov, Daniel Borkmann, John Fastabend,
Matthias Brugger, AngeloGioacchino Del Regno, netdev,
intel-wired-lan, bpf, linux-kernel, linux-arm-kernel,
linux-mediatek
On 2025/1/7 22:26, Jesper Dangaard Brouer wrote:
>
>
> On 06/01/2025 14.01, Yunsheng Lin wrote:
>> This patchset fixes a possible time-window problem for page_pool and
>> the DMA API misuse problem mentioned in [1], and tries to avoid the
>> overhead of the fix with some optimizations.
>>
>> From the performance data below, the overhead is not obvious for
>> time_bench_page_pool01_fast_path() and time_bench_page_pool02_ptr_ring()
>> due to performance variations, but there is about 20 ns of overhead in
>> time_bench_page_pool03_slow() from fixing the bug.
>>
>> Before this patchset:
>> root@(none)$ insmod bench_page_pool_simple.ko
>> [ 323.367627] bench_page_pool_simple: Loaded
>> [ 323.448747] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.076997150 sec time_interval:76997150) - (invoke count:100000000 tsc_interval:7699707)
>> [ 324.812884] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.468 ns (step:0) - (measurement period time:1.346855130 sec time_interval:1346855130) - (invoke count:100000000 tsc_interval:134685507)
>> [ 324.980875] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.010 ns (step:0) - (measurement period time:0.150101270 sec time_interval:150101270) - (invoke count:10000000 tsc_interval:15010120)
>> [ 325.652195] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.542 ns (step:0) - (measurement period time:0.654213000 sec time_interval:654213000) - (invoke count:100000000 tsc_interval:65421294)
>> [ 325.669215] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
>> [ 325.974848] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 29.633 ns (step:0) - (measurement period time:0.296338200 sec time_interval:296338200) - (invoke count:10000000 tsc_interval:29633814)
>
> (the comment below refers to the quoted line above)
>
>> [ 325.993517] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
>> [ 326.576636] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.391 ns (step:0) - (measurement period time:0.573911820 sec time_interval:573911820) - (invoke count:10000000 tsc_interval:57391174)
>> [ 326.595307] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
>> [ 328.422661] time_bench: Type:no-softirq-page_pool03 Per elem: 18 cycles(tsc) 181.849 ns (step:0) - (measurement period time:1.818495880 sec time_interval:1818495880) - (invoke count:10000000 tsc_interval:181849581)
>> [ 328.441681] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
>> [ 328.449584] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
>> [ 328.755031] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 29.632 ns (step:0) - (measurement period time:0.296327910 sec time_interval:296327910) - (invoke count:10000000 tsc_interval:29632785)
>
> It is strange that the fast path "tasklet_page_pool01_fast_path" isn't
> faster than "no-softirq-page_pool01" above.
> They are both 29.633 ns.
>
> What hardware is this?
An arm64 server. As the testing module doesn't support arm64, get_cycles()
from [1] is used for timekeeping instead of the x86 asm instruction.
1. https://lore.kernel.org/lkml/caf31b5e-0e8f-4844-b7ba-ef59ed13b74e@arm.com/T/
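(For reference, a minimal sketch of the timing primitive this refers to;
the wrapper below is illustrative, not the actual bench-module code. On
arm64, get_cycles() reads the generic timer (CNTVCT_EL0) rather than a
CPU-frequency TSC; from the numbers above, 29,633,814 ticks over a
0.296338200 s period works out to roughly 100 MHz, which is why the
per-elem "cycles(tsc)" counts look so small.)

	#include <asm/timex.h>

	static cycles_t t_start;

	static void bench_start(void)
	{
		/* generic counter on arm64, TSC on x86 */
		t_start = get_cycles();
	}

	static cycles_t bench_elapsed(void)
	{
		/* elapsed ticks; ~100 MHz resolution on this box */
		return get_cycles() - t_start;
	}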
>
> e.g. the cycle count of 2 cycles(tsc) seems strange.
>
> On my testlab hardware (Intel CPU E5-1650 v4 @ 3.60GHz), my fast-path
> numbers say 5.202 ns (18 cycles) for "tasklet_page_pool01_fast_path".
>
>
> Raw data look like this
>
> [Tue Jan 7 15:15:18 2025] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
> [Tue Jan 7 15:15:18 2025] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
> [Tue Jan 7 15:15:18 2025] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 18 cycles(tsc) 5.202 ns (step:0) - (measurement period time:0.052020430 sec time_interval:52020430) - (invoke count:10000000 tsc_interval:187272981)
> [Tue Jan 7 15:15:18 2025] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
> [Tue Jan 7 15:15:19 2025] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 55 cycles(tsc) 15.343 ns (step:0) - (measurement period time:0.153438301 sec time_interval:153438301) - (invoke count:10000000 tsc_interval:552378168)
> [Tue Jan 7 15:15:19 2025] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
> [Tue Jan 7 15:15:19 2025] time_bench: Type:tasklet_page_pool03_slow Per elem: 243 cycles(tsc) 67.725 ns (step:0) - (measurement period time:0.677255574 sec time_interval:677255574) - (invoke count:10000000 tsc_interval:2438124315)
>
>
>> [ 328.774308] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
>> [ 329.578579] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 7 cycles(tsc) 79.523 ns (step:0) - (measurement period time:0.795236560 sec time_interval:795236560) - (invoke count:10000000 tsc_interval:79523650)
>> [ 329.597769] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
>> [ 331.507501] time_bench: Type:tasklet_page_pool03_slow Per elem: 19 cycles(tsc) 190.104 ns (step:0) - (measurement period time:1.901047510 sec time_interval:1901047510) - (invoke count:10000000 tsc_interval:190104743)
>>
>> After this patchset:
>> root@(none)$ insmod bench_page_pool_simple.ko
>> [ 138.634758] bench_page_pool_simple: Loaded
>> [ 138.715879] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.076972720 sec time_interval:76972720) - (invoke count:100000000 tsc_interval:7697265)
>> [ 140.079897] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:1.346735370 sec time_interval:1346735370) - (invoke count:100000000 tsc_interval:134673531)
>> [ 140.247841] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:0.150055080 sec time_interval:150055080) - (invoke count:10000000 tsc_interval:15005497)
>> [ 140.919072] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:0.654125000 sec time_interval:654125000) - (invoke count:100000000 tsc_interval:65412493)
>> [ 140.936091] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
>> [ 141.246985] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 30.159 ns (step:0) - (measurement period time:0.301598160 sec time_interval:301598160) - (invoke count:10000000 tsc_interval:30159812)
>> [ 141.265654] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
>> [ 141.976265] time_bench: Type:no-softirq-page_pool02 Per elem: 7 cycles(tsc) 70.140 ns (step:0) - (measurement period time:0.701405780 sec time_interval:701405780) - (invoke count:10000000 tsc_interval:70140573)
>> [ 141.994933] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
>> [ 144.018945] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 201.514 ns (step:0) - (measurement period time:2.015141210 sec time_interval:2015141210) - (invoke count:10000000 tsc_interval:201514113)
>> [ 144.037966] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
>> [ 144.045870] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
>> [ 144.205045] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:0.150056510 sec time_interval:150056510) - (invoke count:10000000 tsc_interval:15005645)
>
> This 15.005 ns looks like a significant improvement over the 29.633 ns above.
There seem to be some performance variations here. There are also
variations between running the test with 'taskset -c 0' and with
'taskset -c 1'. I haven't dug into the detailed reason for the
variations yet, as they seem to exist before this patchset too.
>
>> [ 144.224320] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
>> [ 144.916044] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 68.269 ns (step:0) - (measurement period time:0.682693070 sec time_interval:682693070) - (invoke count:10000000 tsc_interval:68269300)
>> [ 144.935234] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
>> [ 146.997684] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 205.376 ns (step:0) - (measurement period time:2.053766310 sec time_interval:2053766310) - (invoke count:10000000 tsc_interval:205376624)
>>
>
>
> Looks like I should also try out this patchset on my testlab, as this
> hardware seems significantly different from mine...
Yes, it would be much appreciated if this were also tested in your testlab.
* Re: [PATCH net-next v6 1/8] page_pool: introduce page_pool_get_pp() API
From: Yunsheng Lin @ 2025-01-08 9:37 UTC
To: Jesper Dangaard Brouer, davem, kuba, pabeni
Cc: liuyonglong, fanghaiqing, zhangkun09, Wei Fang, Shenwei Wang,
Clark Wang, Andrew Lunn, Eric Dumazet, Jeroen de Borst,
Praveen Kaligineedi, Shailend Chand, Tony Nguyen, Przemek Kitszel,
Alexander Lobakin, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, Saeed Mahameed, Leon Romanovsky, Tariq Toukan,
Felix Fietkau, Lorenzo Bianconi, Ryder Lee, Shayne Chen,
Sean Wang, Kalle Valo, Matthias Brugger,
AngeloGioacchino Del Regno, Simon Horman, Ilias Apalodimas, imx,
netdev, linux-kernel, intel-wired-lan, bpf, linux-rdma,
linux-wireless, linux-arm-kernel, linux-mediatek
On 2025/1/7 22:52, Jesper Dangaard Brouer wrote:
>
> On 06/01/2025 14.01, Yunsheng Lin wrote:
>> Introduce the page_pool_get_pp() API to avoid callers accessing
>> page->pp directly.
>>
> [...]
>
>> diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
>> index 543f54fa3020..9c4dbd2289b1 100644
>> --- a/include/net/page_pool/helpers.h
>> +++ b/include/net/page_pool/helpers.h
>> @@ -83,6 +83,11 @@ static inline u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats)
>> }
>> #endif
>> +static inline struct page_pool *page_pool_get_pp(struct page *page)
>> +{
>> + return page->pp;
>> +}
>
> IMHO the function name "page_pool_get_pp" is problematic, as calling it "get" indicates to me that we are taking a reference on the pp object. Is this your plan in later patches?
No, this patchset is not going to take a reference on the pp object.
>
> If it is simply a dereference of page->pp ... then we could call it page2pp ?
Before this version, page_pool_to_pp() was used; this version renamed it
to page_pool_get_pp() because there is an existing netmem_get_pp() in
patch 3, which is also simply a dereference of netmem->pp. Using
page_pool_to_pp() did not seem consistent with netmem from an API naming
point of view.
> ... but I'm uncertain why we need this change.
This patch is added to make patch 3 more reviewable, as page->pp is
renamed to page->pp_item in patch 3. If no helper were added in this
patch, patch 3 would need to touch all the places touched in this patch
too.
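(A hypothetical sketch of the payoff -- once patch 3 switches the field,
only the helper body changes while every caller converted in patch 1
stays untouched. The page_pool_item_to_pp() name below is made up for
illustration; it is not the actual patch 3 code:)

	static inline struct page_pool *page_pool_get_pp(struct page *page)
	{
		/* illustrative only: after patch 3, resolve the owning
		 * pool from page->pp_item, e.g. via container_of() on
		 * the block embedding the item
		 */
		return page_pool_item_to_pp(page->pp_item);
	}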
>
> --Jesper
>