From: Jesper Dangaard Brouer <hawk@kernel.org>
To: Yunsheng Lin <linyunsheng@huawei.com>,
davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com
Cc: liuyonglong@huawei.com, fanghaiqing@huawei.com,
zhangkun09@huawei.com,
Alexander Lobakin <aleksander.lobakin@intel.com>,
Robin Murphy <robin.murphy@arm.com>,
Alexander Duyck <alexander.duyck@gmail.com>,
Andrew Morton <akpm@linux-foundation.org>,
IOMMU <iommu@lists.linux.dev>, MM <linux-mm@kvack.org>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
John Fastabend <john.fastabend@gmail.com>,
Matthias Brugger <matthias.bgg@gmail.com>,
AngeloGioacchino Del Regno
<angelogioacchino.delregno@collabora.com>,
netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
linux-mediatek@lists.infradead.org
Subject: Re: [PATCH net-next v6 0/8] fix two bugs related to page_pool
Date: Tue, 7 Jan 2025 15:26:35 +0100
Message-ID: <f977c0ab-76f5-4869-9fb7-e111104e2fff@kernel.org>
In-Reply-To: <20250106130116.457938-1-linyunsheng@huawei.com>
On 06/01/2025 14.01, Yunsheng Lin wrote:
> This patchset fix a possible time window problem for page_pool and
> the dma API misuse problem as mentioned in [1], and try to avoid the
> overhead of the fixing using some optimization.
>
> From the below performance data, the overhead is not so obvious
> due to performance variations for time_bench_page_pool01_fast_path()
> and time_bench_page_pool02_ptr_ring, and there is about 20ns overhead
> for time_bench_page_pool03_slow() for fixing the bug.
>
> Before this patchset:
> root@(none)$ insmod bench_page_pool_simple.ko
> [ 323.367627] bench_page_pool_simple: Loaded
> [ 323.448747] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.076997150 sec time_interval:76997150) - (invoke count:100000000 tsc_interval:7699707)
> [ 324.812884] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.468 ns (step:0) - (measurement period time:1.346855130 sec time_interval:1346855130) - (invoke count:100000000 tsc_interval:134685507)
> [ 324.980875] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.010 ns (step:0) - (measurement period time:0.150101270 sec time_interval:150101270) - (invoke count:10000000 tsc_interval:15010120)
> [ 325.652195] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.542 ns (step:0) - (measurement period time:0.654213000 sec time_interval:654213000) - (invoke count:100000000 tsc_interval:65421294)
> [ 325.669215] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
> [ 325.974848] time_bench: Type:no-softirq-page_pool01 Per elem: 2 cycles(tsc) 29.633 ns (step:0) - (measurement period time:0.296338200 sec time_interval:296338200) - (invoke count:10000000 tsc_interval:29633814)
(I refer to the line above in my comment further down.)
> [ 325.993517] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
> [ 326.576636] time_bench: Type:no-softirq-page_pool02 Per elem: 5 cycles(tsc) 57.391 ns (step:0) - (measurement period time:0.573911820 sec time_interval:573911820) - (invoke count:10000000 tsc_interval:57391174)
> [ 326.595307] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
> [ 328.422661] time_bench: Type:no-softirq-page_pool03 Per elem: 18 cycles(tsc) 181.849 ns (step:0) - (measurement period time:1.818495880 sec time_interval:1818495880) - (invoke count:10000000 tsc_interval:181849581)
> [ 328.441681] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
> [ 328.449584] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
> [ 328.755031] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) 29.632 ns (step:0) - (measurement period time:0.296327910 sec time_interval:296327910) - (invoke count:10000000 tsc_interval:29632785)
It is strange that the fast-path "tasklet_page_pool01_fast_path" isn't
faster than the "no-softirq-page_pool01" result above.
They are both about 29.63 ns (29.632 vs. 29.633).

What hardware is this?
E.g. the cycle count of 2 cycles(tsc) seems strange.

On my testlab hardware, an Intel CPU E5-1650 v4 @3.60GHz, my fast-path
numbers say 5.202 ns (18 cycles) for "tasklet_page_pool01_fast_path".

Raw data looks like this:
[Tue Jan 7 15:15:18 2025] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
[Tue Jan 7 15:15:18 2025] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
[Tue Jan 7 15:15:18 2025] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 18 cycles(tsc) 5.202 ns (step:0) - (measurement period time:0.052020430 sec time_interval:52020430) - (invoke count:10000000 tsc_interval:187272981)
[Tue Jan 7 15:15:18 2025] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
[Tue Jan 7 15:15:19 2025] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 55 cycles(tsc) 15.343 ns (step:0) - (measurement period time:0.153438301 sec time_interval:153438301) - (invoke count:10000000 tsc_interval:552378168)
[Tue Jan 7 15:15:19 2025] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
[Tue Jan 7 15:15:19 2025] time_bench: Type:tasklet_page_pool03_slow Per elem: 243 cycles(tsc) 67.725 ns (step:0) - (measurement period time:0.677255574 sec time_interval:677255574) - (invoke count:10000000 tsc_interval:2438124315)
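
As a side note on the "2 cycles(tsc)" question: each summary line carries
enough information to estimate how fast the counter behind cycles(tsc)
ticks, because tsc_interval and time_interval (in ns) cover the same
measurement period. Below is a minimal Python sketch of that arithmetic.
The regex, the function name counter_stats() and the two constants are my
own, for illustration only, and are not part of time_bench; the two input
lines are copied from the logs in this mail.

```python
import re

# Matches the time_bench summary lines as they appear in this thread.
# The pattern is an assumption based on those lines, not a time_bench spec.
LINE_RE = re.compile(
    r"Type:(?P<type>\S+)\s+Per elem:\s+(?P<cycles>\d+) cycles\(tsc\)\s+"
    r"(?P<ns>[\d.]+) ns.*?time_interval:(?P<time_ns>\d+)\).*?"
    r"invoke count:(?P<count>\d+)\s+tsc_interval:(?P<tsc>\d+)\)"
)

# Quoted "Before" run (Yunsheng's machine).
QUOTED_LINE = (
    "time_bench: Type:tasklet_page_pool01_fast_path Per elem: 2 cycles(tsc) "
    "29.632 ns (step:0) - (measurement period time:0.296327910 sec "
    "time_interval:296327910) - (invoke count:10000000 tsc_interval:29632785)"
)

# My testlab run (E5-1650 v4).
TESTLAB_LINE = (
    "time_bench: Type:tasklet_page_pool01_fast_path Per elem: 18 cycles(tsc) "
    "5.202 ns (step:0) - (measurement period time:0.052020430 sec "
    "time_interval:52020430) - (invoke count:10000000 tsc_interval:187272981)"
)


def counter_stats(line):
    """Derive per-element cost and counter frequency from one summary line."""
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not a time_bench summary line")
    time_ns = int(m["time_ns"])   # measurement period in nanoseconds
    count = int(m["count"])       # number of invocations
    tsc = int(m["tsc"])           # counter ticks over the same period
    return {
        "type": m["type"],
        "ns_per_elem": time_ns / count,
        "ticks_per_elem": tsc / count,
        # ticks per ns, scaled by 1000 to get MHz
        "counter_mhz": tsc / time_ns * 1000.0,
    }


if __name__ == "__main__":
    for label, line in (("quoted", QUOTED_LINE), ("testlab", TESTLAB_LINE)):
        s = counter_stats(line)
        print(f"{label}: {s['ns_per_elem']:.3f} ns/elem, "
              f"{s['ticks_per_elem']:.2f} ticks/elem, "
              f"counter ~{s['counter_mhz']:.1f} MHz")
```

By this arithmetic the counter on the quoted machine ticks at roughly
100 MHz (about 10 ns per tick), while the one on my testlab ticks at
roughly 3.6 GHz. That would explain the very low integer cycle counts in
the quoted log, but it is an inference from the numbers, not a confirmed
property of that hardware.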
> [ 328.774308] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
> [ 329.578579] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 7 cycles(tsc) 79.523 ns (step:0) - (measurement period time:0.795236560 sec time_interval:795236560) - (invoke count:10000000 tsc_interval:79523650)
> [ 329.597769] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
> [ 331.507501] time_bench: Type:tasklet_page_pool03_slow Per elem: 19 cycles(tsc) 190.104 ns (step:0) - (measurement period time:1.901047510 sec time_interval:1901047510) - (invoke count:10000000 tsc_interval:190104743)
>
> After this patchset:
> root@(none)$ insmod bench_page_pool_simple.ko
> [ 138.634758] bench_page_pool_simple: Loaded
> [ 138.715879] time_bench: Type:for_loop Per elem: 0 cycles(tsc) 0.769 ns (step:0) - (measurement period time:0.076972720 sec time_interval:76972720) - (invoke count:100000000 tsc_interval:7697265)
> [ 140.079897] time_bench: Type:atomic_inc Per elem: 1 cycles(tsc) 13.467 ns (step:0) - (measurement period time:1.346735370 sec time_interval:1346735370) - (invoke count:100000000 tsc_interval:134673531)
> [ 140.247841] time_bench: Type:lock Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:0.150055080 sec time_interval:150055080) - (invoke count:10000000 tsc_interval:15005497)
> [ 140.919072] time_bench: Type:rcu Per elem: 0 cycles(tsc) 6.541 ns (step:0) - (measurement period time:0.654125000 sec time_interval:654125000) - (invoke count:100000000 tsc_interval:65412493)
> [ 140.936091] bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
> [ 141.246985] time_bench: Type:no-softirq-page_pool01 Per elem: 3 cycles(tsc) 30.159 ns (step:0) - (measurement period time:0.301598160 sec time_interval:301598160) - (invoke count:10000000 tsc_interval:30159812)
> [ 141.265654] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
> [ 141.976265] time_bench: Type:no-softirq-page_pool02 Per elem: 7 cycles(tsc) 70.140 ns (step:0) - (measurement period time:0.701405780 sec time_interval:701405780) - (invoke count:10000000 tsc_interval:70140573)
> [ 141.994933] bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
> [ 144.018945] time_bench: Type:no-softirq-page_pool03 Per elem: 20 cycles(tsc) 201.514 ns (step:0) - (measurement period time:2.015141210 sec time_interval:2015141210) - (invoke count:10000000 tsc_interval:201514113)
> [ 144.037966] bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
> [ 144.045870] bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
> [ 144.205045] time_bench: Type:tasklet_page_pool01_fast_path Per elem: 1 cycles(tsc) 15.005 ns (step:0) - (measurement period time:0.150056510 sec time_interval:150056510) - (invoke count:10000000 tsc_interval:15005645)
This 15.005 ns looks like a significant improvement over the 29.632 ns
this test showed before the patchset.
> [ 144.224320] bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
> [ 144.916044] time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 6 cycles(tsc) 68.269 ns (step:0) - (measurement period time:0.682693070 sec time_interval:682693070) - (invoke count:10000000 tsc_interval:68269300)
> [ 144.935234] bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
> [ 146.997684] time_bench: Type:tasklet_page_pool03_slow Per elem: 20 cycles(tsc) 205.376 ns (step:0) - (measurement period time:2.053766310 sec time_interval:2053766310) - (invoke count:10000000 tsc_interval:205376624)
>
Looks like I should also try out this patchset on my testlab, as this
hardware seems significantly different from mine...
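
To make the before/after comparison easier to eyeball, here is a small
Python sketch that computes the per-test delta. The numbers are copied
from the quoted "Before" and "After" logs; the names BEFORE, AFTER and
compare() are hypothetical helpers for illustration. These are single
runs, so the deltas say nothing about run-to-run variation.

```python
# Per-element times in ns, copied from the quoted "Before"/"After" logs.
BEFORE = {
    "no-softirq-page_pool01": 29.633,
    "no-softirq-page_pool02": 57.391,
    "no-softirq-page_pool03": 181.849,
    "tasklet_page_pool01_fast_path": 29.632,
    "tasklet_page_pool02_ptr_ring": 79.523,
    "tasklet_page_pool03_slow": 190.104,
}

AFTER = {
    "no-softirq-page_pool01": 30.159,
    "no-softirq-page_pool02": 70.140,
    "no-softirq-page_pool03": 201.514,
    "tasklet_page_pool01_fast_path": 15.005,
    "tasklet_page_pool02_ptr_ring": 68.269,
    "tasklet_page_pool03_slow": 205.376,
}


def compare(before, after):
    """Return (name, before_ns, after_ns, delta_ns, delta_percent) per test."""
    rows = []
    for name, b in before.items():
        a = after[name]
        delta = a - b
        rows.append((name, b, a, delta, 100.0 * delta / b))
    return rows


if __name__ == "__main__":
    for name, b, a, delta, pct in compare(BEFORE, AFTER):
        print(f"{name:32s} {b:8.3f} -> {a:8.3f} ns  "
              f"({delta:+8.3f} ns, {pct:+6.1f}%)")
```

For these logs it gives about -14.6 ns (about -49%) for
tasklet_page_pool01_fast_path and about +19.7 ns for
no-softirq-page_pool03, which is in line with the "about 20ns" mentioned
in the cover letter.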
> 1. https://lore.kernel.org/lkml/8067f204-1380-4d37-8ffd-007fc6f26738@kernel.org/T/
>
> CC: Alexander Lobakin <aleksander.lobakin@intel.com>
> CC: Robin Murphy <robin.murphy@arm.com>
> CC: Alexander Duyck <alexander.duyck@gmail.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: IOMMU <iommu@lists.linux.dev>
> CC: MM <linux-mm@kvack.org>
>
> Change log:
> V6:
> 1. Repost based on latest net-next.
> 2. Rename page_pool_to_pp() to page_pool_get_pp().
>
> V5:
> 1. Support unlimit inflight pages.
> 2. Add some optimization to avoid the overhead of fixing bug.
>
> V4:
> 1. use scanning to do the unmapping
> 2. spilt dma sync skipping into separate patch
>
> V3:
> 1. Target net-next tree instead of net tree.
> 2. Narrow the rcu lock as the discussion in v2.
> 3. Check the ummapping cnt against the inflight cnt.
>
> V2:
> 1. Add a item_full stat.
> 2. Use container_of() for page_pool_to_pp().
>
> Yunsheng Lin (8):
> page_pool: introduce page_pool_get_pp() API
> page_pool: fix timing for checking and disabling napi_local
> page_pool: fix IOMMU crash when driver has already unbound
> page_pool: support unlimited number of inflight pages
> page_pool: skip dma sync operation for inflight pages
> page_pool: use list instead of ptr_ring for ring cache
> page_pool: batch refilling pages to reduce atomic operation
> page_pool: use list instead of array for alloc cache
>
> drivers/net/ethernet/freescale/fec_main.c | 8 +-
> .../ethernet/google/gve/gve_buffer_mgmt_dqo.c | 2 +-
> drivers/net/ethernet/intel/iavf/iavf_txrx.c | 6 +-
> drivers/net/ethernet/intel/idpf/idpf_txrx.c | 14 +-
> drivers/net/ethernet/intel/libeth/rx.c | 2 +-
> .../net/ethernet/mellanox/mlx5/core/en/xdp.c | 3 +-
> drivers/net/netdevsim/netdev.c | 6 +-
> drivers/net/wireless/mediatek/mt76/mt76.h | 2 +-
> include/linux/mm_types.h | 2 +-
> include/linux/skbuff.h | 1 +
> include/net/libeth/rx.h | 3 +-
> include/net/netmem.h | 24 +-
> include/net/page_pool/helpers.h | 11 +
> include/net/page_pool/types.h | 63 +-
> net/core/devmem.c | 4 +-
> net/core/netmem_priv.h | 5 +-
> net/core/page_pool.c | 660 ++++++++++++++----
> net/core/page_pool_priv.h | 12 +-
> 18 files changed, 664 insertions(+), 164 deletions(-)
>