* [PATCH net-next v3 0/4] page_pool: Convert stats to u64_stats_t.
@ 2025-04-08 10:59 Sebastian Andrzej Siewior
From: Sebastian Andrzej Siewior @ 2025-04-08 10:59 UTC
  To: linux-rdma, linux-rt-devel, netdev
  Cc: David S. Miller, Andrew Lunn, Eric Dumazet, Ilias Apalodimas,
	Jakub Kicinski, Jesper Dangaard Brouer, Joe Damato,
	Leon Romanovsky, Paolo Abeni, Saeed Mahameed, Simon Horman,
	Tariq Toukan, Thomas Gleixner, Yunsheng Lin,
	Sebastian Andrzej Siewior

This is a follow-up on
        https://lore.kernel.org/all/20250213093925.x_ggH1aj@linutronix.de/

to convert the page_pool statistics to u64_stats_t in order to avoid
u64-related problems on 32bit architectures.
While looking over it, I noticed that the comment for recycle_stat_inc()
claims it is safe to use in preemptible context. On 32bit the 64bit
update is split into two 32bit writes. This is "okay" as long as the
value is observed from the current CPU, but cross-CPU reads may observe
an inconsistent value if the lower part overflows and the upper part has
not yet been written.
I explained this and added x86-32 assembly in
	https://lore.kernel.org/all/20250226102703.3F7Aa2oK@linutronix.de/

I don't know whether it is guaranteed that only *one* update can happen
at a time, given that the stats are per-CPU and per NAPI device. But
there will now be a warning on 32bit if an update is attempted in
preemptible context.
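For reference, below is a minimal sketch of the writer-side pattern the
counters are converted to, using the existing u64_stats_sync API. The
struct and function names (demo_recycle_stats,
demo_recycle_stat_inc_cached) are made up for illustration; this is not
the code from the series:

#include <linux/u64_stats_sync.h>

struct demo_recycle_stats {
	struct u64_stats_sync	syncp;
	u64_stats_t		cached;
};

static void demo_recycle_stat_inc_cached(struct demo_recycle_stats *stats)
{
	/*
	 * On 64bit the begin/end pair compiles away; on 32bit it wraps
	 * the update in a seqcount so a reader can detect a torn 64bit
	 * value and retry.
	 */
	u64_stats_update_begin(&stats->syncp);
	u64_stats_inc(&stats->cached);
	u64_stats_update_end(&stats->syncp);
}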

The placement of the counters is not affected by this change, except on
32bit where an additional sync member is added. For 64bit the pahole
output changes from
| struct page_pool_recycle_stats {
|         u64                        cached;               /*     0     8 */
|         u64                        cache_full;           /*     8     8 */
|         u64                        ring;                 /*    16     8 */
|         u64                        ring_full;            /*    24     8 */
|         u64                        released_refcnt;      /*    32     8 */
|
|         /* size: 40, cachelines: 1, members: 5 */
|         /* last cacheline: 40 bytes */
| };

to
| struct page_pool_recycle_stats {
|         struct u64_stats_sync      syncp;                /*     0     0 */
|         u64_stats_t                cached;               /*     0     8 */
|         u64_stats_t                cache_full;           /*     8     8 */
|         u64_stats_t                ring;                 /*    16     8 */
|         u64_stats_t                ring_full;            /*    24     8 */
|         u64_stats_t                released_refcnt;      /*    32     8 */
|
|         /* size: 40, cachelines: 1, members: 6 */
|         /* last cacheline: 40 bytes */
| };

On 32bit struct u64_stats_sync grows by 4 bytes (plus an additional 20
with lockdep).
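Not visible in the pahole output: the added syncp member also wants a
one-time u64_stats_init() when the stats are set up (a no-op on 64bit).
A minimal sketch, reusing the demo struct from above with a hypothetical
helper name:

static void demo_recycle_stats_init(struct demo_recycle_stats *stats)
{
	/* Initialises the seqcount (and lockdep map) on 32bit. */
	u64_stats_init(&stats->syncp);
}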

For bench_page_pool_simple.ko with loops=600000000 I ended up with the
following. Before:

| time_bench: Type:for_loop Per elem: 1 cycles(tsc) 0.501 ns (step:0)
| time_bench: Type:atomic_inc Per elem: 6 cycles(tsc) 3.303 ns (step:0)
| time_bench: Type:lock Per elem: 28 cycles(tsc) 14.038 ns (step:0)
| time_bench: Type:u64_stats_inc Per elem: 1 cycles(tsc) 0.565 ns (step:0)
| time_bench: Type:this_cpu_inc Per elem: 1 cycles(tsc) 0.503 ns (step:0)
| 
| bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
| time_bench: Type:no-softirq-page_pool01 Per elem: 19 cycles(tsc) 9.526 ns (step:0)
| bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
| time_bench: Type:no-softirq-page_pool02 Per elem: 46 cycles(tsc) 23.501 ns (step:0)
| bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
| time_bench: Type:no-softirq-page_pool03 Per elem: 121 cycles(tsc) 60.697 ns (step:0)
| bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
| bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
| time_bench: Type:tasklet_page_pool01_fast_path Per elem: 19 cycles(tsc) 9.531 ns (step:0)
| bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
| time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 45 cycles(tsc) 22.594 ns (step:0)
| bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
| time_bench: Type:tasklet_page_pool03_slow Per elem: 123 cycles(tsc) 61.969 ns (step:0)

after:
| time_bench: Type:for_loop Per elem: 1 cycles(tsc) 0.501 ns (step:0)
| time_bench: Type:atomic_inc Per elem: 6 cycles(tsc) 3.324 ns (step:0)
| time_bench: Type:lock Per elem: 28 cycles(tsc) 14.038 ns (step:0)
| time_bench: Type:u64_stats_inc Per elem: 1 cycles(tsc) 0.565 ns (step:0)
| time_bench: Type:this_cpu_inc Per elem: 1 cycles(tsc) 0.506 ns (step:0)
| 
| bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
| time_bench: Type:no-softirq-page_pool01 Per elem: 18 cycles(tsc) 9.028 ns (step:0)
| bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
| time_bench: Type:no-softirq-page_pool02 Per elem: 45 cycles(tsc) 22.714 ns (step:0)
| bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
| time_bench: Type:no-softirq-page_pool03 Per elem: 120 cycles(tsc) 60.428 ns (step:0)
| bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
| bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
| time_bench: Type:tasklet_page_pool01_fast_path Per elem: 18 cycles(tsc) 9.024 ns (step:0)
| bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
| time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 43 cycles(tsc) 22.028 ns (step:0)
| bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
| time_bench: Type:tasklet_page_pool03_slow Per elem: 121 cycles(tsc) 60.736 ns (step:0)

v2…v3: https://lore.kernel.org/all/20250307115722.705311-1-bigeasy@linutronix.de
  - Moved page_pool_ethtool_stats_get_strings_mq() to the mlx5 driver
    and renamed it mlx_page_pool_stats_get_strings_mq(). As per review,
    it will remain a Mellanox-only thing.

v1…v2: https://lore.kernel.org/all/20250221115221.291006-1-bigeasy@linutronix.de
  - Clarified the cover mail, added pahole output and numbers from
    bench_page_pool_simple.ko.
  - Corrected page_pool_alloc_stats vs page_pool_recycle_stats type in
    the last patch.
  - Copy the counter values out of the do {} while loop and add them up
    afterwards (see the sketch after this list).
  - Redid the mlnx5 patch to make it use the generic infrastructure,
    which is now extended as part of this series.
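To illustrate the "copy the counter values" change referenced above,
here is a minimal sketch of the reader side, again with made-up demo_*
names and reusing the demo struct from the earlier sketch. The value is
snapshotted inside the retry loop and only added to the total once the
snapshot is known to be consistent:

static void demo_fetch_recycle_stats(const struct demo_recycle_stats *stats,
				     u64 *cached_total)
{
	unsigned int start;
	u64 cached;

	do {
		start = u64_stats_fetch_begin(&stats->syncp);
		cached = u64_stats_read(&stats->cached);
	} while (u64_stats_fetch_retry(&stats->syncp, start));

	/* Add outside the loop so a retry cannot double-count. */
	*cached_total += cached;
}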

Sebastian Andrzej Siewior (4):
  mlnx5: Use generic code for page_pool statistics.
  page_pool: Provide an empty page_pool_stats for disabled stats.
  page_pool: Convert page_pool_recycle_stats to u64_stats_t.
  page_pool: Convert page_pool_alloc_stats to u64_stats_t.

 Documentation/networking/page_pool.rst        |  6 +-
 .../ethernet/mellanox/mlx5/core/en_stats.c    | 97 ++++++++----------
 .../ethernet/mellanox/mlx5/core/en_stats.h    | 26 +----
 include/linux/u64_stats_sync.h                |  5 +
 include/net/page_pool/helpers.h               |  6 ++
 include/net/page_pool/types.h                 | 31 +++---
 net/core/page_pool.c                          | 99 +++++++++++++------
 net/core/page_pool_user.c                     | 22 ++---
 8 files changed, 159 insertions(+), 133 deletions(-)

-- 
2.49.0


Thread overview: 14+ messages
2025-04-08 10:59 [PATCH net-next v3 0/4] page_pool: Convert stats to u64_stats_t Sebastian Andrzej Siewior
2025-04-08 10:59 ` [PATCH net-next v3 1/4] mlnx5: Use generic code for page_pool statistics Sebastian Andrzej Siewior
2025-04-10  7:16   ` Tariq Toukan
2025-04-10  8:58     ` Sebastian Andrzej Siewior
2025-04-10 11:23       ` Tariq Toukan
2025-04-08 10:59 ` [PATCH net-next v3 2/4] page_pool: Provide an empty page_pool_stats for disabled stats Sebastian Andrzej Siewior
2025-04-08 10:59 ` [PATCH net-next v3 3/4] page_pool: Convert page_pool_recycle_stats to u64_stats_t Sebastian Andrzej Siewior
2025-04-08 12:13   ` Yunsheng Lin
2025-04-09 16:11     ` Sebastian Andrzej Siewior
2025-04-08 10:59 ` [PATCH net-next v3 4/4] page_pool: Convert page_pool_alloc_stats " Sebastian Andrzej Siewior
2025-04-08 12:27   ` Yunsheng Lin
2025-04-08 12:33 ` [PATCH net-next v3 0/4] page_pool: Convert stats " Yunsheng Lin
2025-04-09  1:56 ` Jakub Kicinski
2025-04-09 16:15   ` Sebastian Andrzej Siewior
