* [PATCH net-next v3 1/4] mlnx5: Use generic code for page_pool statistics.
2025-04-08 10:59 [PATCH net-next v3 0/4] page_pool: Convert stats to u64_stats_t Sebastian Andrzej Siewior
@ 2025-04-08 10:59 ` Sebastian Andrzej Siewior
2025-04-10 7:16 ` Tariq Toukan
2025-04-08 10:59 ` [PATCH net-next v3 2/4] page_pool: Provide an empty page_pool_stats for disabled stats Sebastian Andrzej Siewior
` (4 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-04-08 10:59 UTC (permalink / raw)
To: linux-rdma, linux-rt-devel, netdev
Cc: David S. Miller, Andrew Lunn, Eric Dumazet, Ilias Apalodimas,
Jakub Kicinski, Jesper Dangaard Brouer, Joe Damato,
Leon Romanovsky, Paolo Abeni, Saeed Mahameed, Simon Horman,
Tariq Toukan, Thomas Gleixner, Yunsheng Lin,
Sebastian Andrzej Siewior
The statistics gathering code for page_pool statistics has multiple
steps:
- gather statistics from a channel via page_pool_get_stats() to an
on-stack structure.
- copy this data to dedicated rq_stats.
- copy the data from rq_stats to the global mlx5e_sw_stats structure,
merging per-queue statistics into one counter.
- Finally, copy the data in the specific order required by the ethtool
query (both per queue and all queues summed up).
The downside here is that the individual counter types are expected to
be u64, and if something changes, the code breaks. Also, if additional
counters are added to struct page_pool_stats, they are not
automatically picked up by the driver but need to be manually added in
all four spots.
Remove the page_pool_stats related description from sw_stats_desc and
rq_stats_desc.
Replace the counters in mlx5e_sw_stats and mlx5e_rq_stats with struct
page_pool_stats.
Let mlx5e_stats_update_stats_rq_page_pool() fetch the page_pool stats
twice: once for the summed-up data, once for the individual queue.
Publish the strings via page_pool_ethtool_stats_get_strings() and
mlx_page_pool_stats_get_strings_mq().
Publish the counters via page_pool_ethtool_stats_get().
Suggested-by: Joe Damato <jdamato@fastly.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
.../ethernet/mellanox/mlx5/core/en_stats.c | 97 ++++++++-----------
.../ethernet/mellanox/mlx5/core/en_stats.h | 26 +----
2 files changed, 43 insertions(+), 80 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 1c121b435016d..54303877adb1d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -194,17 +194,6 @@ static const struct counter_desc sw_stats_desc[] = {
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_arfs_err) },
#endif
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_recover) },
- { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_fast) },
- { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_slow) },
- { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_slow_high_order) },
- { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_empty) },
- { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_refill) },
- { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_waive) },
- { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_cached) },
- { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_cache_full) },
- { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_ring) },
- { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_ring_full) },
- { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_released_ref) },
#ifdef CONFIG_MLX5_EN_TLS
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_tls_decrypted_packets) },
{ MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_tls_decrypted_bytes) },
@@ -253,7 +242,7 @@ static const struct counter_desc sw_stats_desc[] = {
static MLX5E_DECLARE_STATS_GRP_OP_NUM_STATS(sw)
{
- return NUM_SW_COUNTERS;
+ return NUM_SW_COUNTERS + page_pool_ethtool_stats_get_count();
}
static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(sw)
@@ -262,6 +251,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(sw)
for (i = 0; i < NUM_SW_COUNTERS; i++)
ethtool_puts(data, sw_stats_desc[i].format);
+ *data = page_pool_ethtool_stats_get_strings(*data);
}
static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(sw)
@@ -272,6 +262,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(sw)
mlx5e_ethtool_put_stat(data,
MLX5E_READ_CTR64_CPU(&priv->stats.sw,
sw_stats_desc, i));
+ *data = page_pool_ethtool_stats_get(*data, &priv->stats.sw.page_pool_stats);
}
static void mlx5e_stats_grp_sw_update_stats_xdp_red(struct mlx5e_sw_stats *s,
@@ -373,17 +364,6 @@ static void mlx5e_stats_grp_sw_update_stats_rq_stats(struct mlx5e_sw_stats *s,
s->rx_arfs_err += rq_stats->arfs_err;
#endif
s->rx_recover += rq_stats->recover;
- s->rx_pp_alloc_fast += rq_stats->pp_alloc_fast;
- s->rx_pp_alloc_slow += rq_stats->pp_alloc_slow;
- s->rx_pp_alloc_empty += rq_stats->pp_alloc_empty;
- s->rx_pp_alloc_refill += rq_stats->pp_alloc_refill;
- s->rx_pp_alloc_waive += rq_stats->pp_alloc_waive;
- s->rx_pp_alloc_slow_high_order += rq_stats->pp_alloc_slow_high_order;
- s->rx_pp_recycle_cached += rq_stats->pp_recycle_cached;
- s->rx_pp_recycle_cache_full += rq_stats->pp_recycle_cache_full;
- s->rx_pp_recycle_ring += rq_stats->pp_recycle_ring;
- s->rx_pp_recycle_ring_full += rq_stats->pp_recycle_ring_full;
- s->rx_pp_recycle_released_ref += rq_stats->pp_recycle_released_ref;
#ifdef CONFIG_MLX5_EN_TLS
s->rx_tls_decrypted_packets += rq_stats->tls_decrypted_packets;
s->rx_tls_decrypted_bytes += rq_stats->tls_decrypted_bytes;
@@ -490,27 +470,13 @@ static void mlx5e_stats_grp_sw_update_stats_qos(struct mlx5e_priv *priv,
}
}
-static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
+static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_sw_stats *s,
+ struct mlx5e_channel *c)
{
struct mlx5e_rq_stats *rq_stats = c->rq.stats;
- struct page_pool *pool = c->rq.page_pool;
- struct page_pool_stats stats = { 0 };
- if (!page_pool_get_stats(pool, &stats))
- return;
-
- rq_stats->pp_alloc_fast = stats.alloc_stats.fast;
- rq_stats->pp_alloc_slow = stats.alloc_stats.slow;
- rq_stats->pp_alloc_slow_high_order = stats.alloc_stats.slow_high_order;
- rq_stats->pp_alloc_empty = stats.alloc_stats.empty;
- rq_stats->pp_alloc_waive = stats.alloc_stats.waive;
- rq_stats->pp_alloc_refill = stats.alloc_stats.refill;
-
- rq_stats->pp_recycle_cached = stats.recycle_stats.cached;
- rq_stats->pp_recycle_cache_full = stats.recycle_stats.cache_full;
- rq_stats->pp_recycle_ring = stats.recycle_stats.ring;
- rq_stats->pp_recycle_ring_full = stats.recycle_stats.ring_full;
- rq_stats->pp_recycle_released_ref = stats.recycle_stats.released_refcnt;
+ page_pool_get_stats(c->rq.page_pool, &s->page_pool_stats);
+ page_pool_get_stats(c->rq.page_pool, &rq_stats->page_pool_stats);
}
static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(sw)
@@ -520,15 +486,13 @@ static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(sw)
memset(s, 0, sizeof(*s));
- for (i = 0; i < priv->channels.num; i++) /* for active channels only */
- mlx5e_stats_update_stats_rq_page_pool(priv->channels.c[i]);
-
for (i = 0; i < priv->stats_nch; i++) {
struct mlx5e_channel_stats *channel_stats =
priv->channel_stats[i];
int j;
+ mlx5e_stats_update_stats_rq_page_pool(s, priv->channels.c[i]);
mlx5e_stats_grp_sw_update_stats_rq_stats(s, &channel_stats->rq);
mlx5e_stats_grp_sw_update_stats_xdpsq(s, &channel_stats->rq_xdpsq);
mlx5e_stats_grp_sw_update_stats_ch_stats(s, &channel_stats->ch);
@@ -2119,17 +2083,6 @@ static const struct counter_desc rq_stats_desc[] = {
{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, arfs_err) },
#endif
{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, recover) },
- { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_fast) },
- { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_slow) },
- { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_slow_high_order) },
- { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_empty) },
- { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_refill) },
- { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_waive) },
- { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_cached) },
- { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_cache_full) },
- { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_ring) },
- { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_ring_full) },
- { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_released_ref) },
#ifdef CONFIG_MLX5_EN_TLS
{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, tls_decrypted_packets) },
{ MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, tls_decrypted_bytes) },
@@ -2477,7 +2430,32 @@ static MLX5E_DECLARE_STATS_GRP_OP_NUM_STATS(channels)
(NUM_RQ_XDPSQ_STATS * max_nch) +
(NUM_XDPSQ_STATS * max_nch) +
(NUM_XSKRQ_STATS * max_nch * priv->xsk.ever_used) +
- (NUM_XSKSQ_STATS * max_nch * priv->xsk.ever_used);
+ (NUM_XSKSQ_STATS * max_nch * priv->xsk.ever_used) +
+ page_pool_ethtool_stats_get_count() * max_nch;
+}
+
+static const char pp_stats_mq[][ETH_GSTRING_LEN] = {
+ "rx%d_pp_alloc_fast",
+ "rx%d_pp_alloc_slow",
+ "rx%d_pp_alloc_slow_ho",
+ "rx%d_pp_alloc_empty",
+ "rx%d_pp_alloc_refill",
+ "rx%d_pp_alloc_waive",
+ "rx%d_pp_recycle_cached",
+ "rx%d_pp_recycle_cache_full",
+ "rx%d_pp_recycle_ring",
+ "rx%d_pp_recycle_ring_full",
+ "rx%d_pp_recycle_released_ref",
+};
+
+static void mlx_page_pool_stats_get_strings_mq(u8 **data, unsigned int queue)
+{
+ int i;
+
+ WARN_ON_ONCE(ARRAY_SIZE(pp_stats_mq) != page_pool_ethtool_stats_get_count());
+
+ for (i = 0; i < ARRAY_SIZE(pp_stats_mq); i++)
+ ethtool_sprintf(data, pp_stats_mq[i], queue);
}
static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(channels)
@@ -2493,6 +2471,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(channels)
for (i = 0; i < max_nch; i++) {
for (j = 0; j < NUM_RQ_STATS; j++)
ethtool_sprintf(data, rq_stats_desc[j].format, i);
+ mlx_page_pool_stats_get_strings_mq(data, i);
for (j = 0; j < NUM_XSKRQ_STATS * is_xsk; j++)
ethtool_sprintf(data, xskrq_stats_desc[j].format, i);
for (j = 0; j < NUM_RQ_XDPSQ_STATS; j++)
@@ -2527,11 +2506,13 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(channels)
ch_stats_desc, j));
for (i = 0; i < max_nch; i++) {
+ struct mlx5e_rq_stats *rq_stats = &priv->channel_stats[i]->rq;
+
for (j = 0; j < NUM_RQ_STATS; j++)
mlx5e_ethtool_put_stat(
- data, MLX5E_READ_CTR64_CPU(
- &priv->channel_stats[i]->rq,
+ data, MLX5E_READ_CTR64_CPU(rq_stats,
rq_stats_desc, j));
+ *data = page_pool_ethtool_stats_get(*data, &rq_stats->page_pool_stats);
for (j = 0; j < NUM_XSKRQ_STATS * is_xsk; j++)
mlx5e_ethtool_put_stat(
data, MLX5E_READ_CTR64_CPU(
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
index 8de6fcbd3a033..30c5c2a92508b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
@@ -33,6 +33,8 @@
#ifndef __MLX5_EN_STATS_H__
#define __MLX5_EN_STATS_H__
+#include <net/page_pool/types.h>
+
#define MLX5E_READ_CTR64_CPU(ptr, dsc, i) \
(*(u64 *)((char *)ptr + dsc[i].offset))
#define MLX5E_READ_CTR64_BE(ptr, dsc, i) \
@@ -215,17 +217,7 @@ struct mlx5e_sw_stats {
u64 ch_aff_change;
u64 ch_force_irq;
u64 ch_eq_rearm;
- u64 rx_pp_alloc_fast;
- u64 rx_pp_alloc_slow;
- u64 rx_pp_alloc_slow_high_order;
- u64 rx_pp_alloc_empty;
- u64 rx_pp_alloc_refill;
- u64 rx_pp_alloc_waive;
- u64 rx_pp_recycle_cached;
- u64 rx_pp_recycle_cache_full;
- u64 rx_pp_recycle_ring;
- u64 rx_pp_recycle_ring_full;
- u64 rx_pp_recycle_released_ref;
+ struct page_pool_stats page_pool_stats;
#ifdef CONFIG_MLX5_EN_TLS
u64 tx_tls_encrypted_packets;
u64 tx_tls_encrypted_bytes;
@@ -383,17 +375,7 @@ struct mlx5e_rq_stats {
u64 arfs_err;
#endif
u64 recover;
- u64 pp_alloc_fast;
- u64 pp_alloc_slow;
- u64 pp_alloc_slow_high_order;
- u64 pp_alloc_empty;
- u64 pp_alloc_refill;
- u64 pp_alloc_waive;
- u64 pp_recycle_cached;
- u64 pp_recycle_cache_full;
- u64 pp_recycle_ring;
- u64 pp_recycle_ring_full;
- u64 pp_recycle_released_ref;
+ struct page_pool_stats page_pool_stats;
#ifdef CONFIG_MLX5_EN_TLS
u64 tls_decrypted_packets;
u64 tls_decrypted_bytes;
--
2.49.0
* Re: [PATCH net-next v3 1/4] mlnx5: Use generic code for page_pool statistics.
2025-04-08 10:59 ` [PATCH net-next v3 1/4] mlnx5: Use generic code for page_pool statistics Sebastian Andrzej Siewior
@ 2025-04-10 7:16 ` Tariq Toukan
2025-04-10 8:58 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 14+ messages in thread
From: Tariq Toukan @ 2025-04-10 7:16 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, linux-rdma, linux-rt-devel, netdev
Cc: David S. Miller, Andrew Lunn, Eric Dumazet, Ilias Apalodimas,
Jakub Kicinski, Jesper Dangaard Brouer, Joe Damato,
Leon Romanovsky, Paolo Abeni, Saeed Mahameed, Simon Horman,
Tariq Toukan, Thomas Gleixner, Yunsheng Lin, Tariq Toukan
Fix subject module/submodule to be "net/mlx5e".
Also, it looks strange to me to see a dot at the end of the patch subject.
On 08/04/2025 13:59, Sebastian Andrzej Siewior wrote:
> The statistics gathering code for page_pool statistics has multiple
> steps:
> - gather statistics from a channel via page_pool_get_stats() to an
> on-stack structure.
> - copy this data to dedicated rq_stats.
> - copy the data from rq_stats to the global mlx5e_sw_stats structure,
> merging per-queue statistics into one counter.
> - Finally, copy the data in the specific order required by the ethtool
> query (both per queue and all queues summed up).
>
> The downside here is that the individual counter types are expected to
> be u64, and if something changes, the code breaks. Also, if additional
> counters are added to struct page_pool_stats, they are not
> automatically picked up by the driver but need to be manually added in
> all four spots.
>
> Remove the page_pool_stats related description from sw_stats_desc and
> rq_stats_desc.
> Replace the counters in mlx5e_sw_stats and mlx5e_rq_stats with struct
> page_pool_stats.
> Let mlx5e_stats_update_stats_rq_page_pool() fetch the page_pool stats
> twice: once for the summed-up data, once for the individual queue.
> Publish the strings via page_pool_ethtool_stats_get_strings() and
> mlx_page_pool_stats_get_strings_mq().
> Publish the counters via page_pool_ethtool_stats_get().
>
> Suggested-by: Joe Damato <jdamato@fastly.com>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
Thanks for your patch.
Was this tested?
Were you able to run with mlx5 device?
> .../ethernet/mellanox/mlx5/core/en_stats.c | 97 ++++++++-----------
> .../ethernet/mellanox/mlx5/core/en_stats.h | 26 +----
> 2 files changed, 43 insertions(+), 80 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> index 1c121b435016d..54303877adb1d 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> @@ -194,17 +194,6 @@ static const struct counter_desc sw_stats_desc[] = {
> { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_arfs_err) },
> #endif
> { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_recover) },
> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_fast) },
> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_slow) },
> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_slow_high_order) },
> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_empty) },
> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_refill) },
> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_waive) },
> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_cached) },
> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_cache_full) },
> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_ring) },
> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_ring_full) },
> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_released_ref) },
> #ifdef CONFIG_MLX5_EN_TLS
> { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_tls_decrypted_packets) },
> { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_tls_decrypted_bytes) },
> @@ -253,7 +242,7 @@ static const struct counter_desc sw_stats_desc[] = {
>
> static MLX5E_DECLARE_STATS_GRP_OP_NUM_STATS(sw)
> {
> - return NUM_SW_COUNTERS;
> + return NUM_SW_COUNTERS + page_pool_ethtool_stats_get_count();
> }
>
> static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(sw)
> @@ -262,6 +251,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(sw)
>
> for (i = 0; i < NUM_SW_COUNTERS; i++)
> ethtool_puts(data, sw_stats_desc[i].format);
> + *data = page_pool_ethtool_stats_get_strings(*data);
> }
>
> static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(sw)
> @@ -272,6 +262,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(sw)
> mlx5e_ethtool_put_stat(data,
> MLX5E_READ_CTR64_CPU(&priv->stats.sw,
> sw_stats_desc, i));
> + *data = page_pool_ethtool_stats_get(*data, &priv->stats.sw.page_pool_stats);
> }
>
> static void mlx5e_stats_grp_sw_update_stats_xdp_red(struct mlx5e_sw_stats *s,
> @@ -373,17 +364,6 @@ static void mlx5e_stats_grp_sw_update_stats_rq_stats(struct mlx5e_sw_stats *s,
> s->rx_arfs_err += rq_stats->arfs_err;
> #endif
> s->rx_recover += rq_stats->recover;
> - s->rx_pp_alloc_fast += rq_stats->pp_alloc_fast;
> - s->rx_pp_alloc_slow += rq_stats->pp_alloc_slow;
> - s->rx_pp_alloc_empty += rq_stats->pp_alloc_empty;
> - s->rx_pp_alloc_refill += rq_stats->pp_alloc_refill;
> - s->rx_pp_alloc_waive += rq_stats->pp_alloc_waive;
> - s->rx_pp_alloc_slow_high_order += rq_stats->pp_alloc_slow_high_order;
> - s->rx_pp_recycle_cached += rq_stats->pp_recycle_cached;
> - s->rx_pp_recycle_cache_full += rq_stats->pp_recycle_cache_full;
> - s->rx_pp_recycle_ring += rq_stats->pp_recycle_ring;
> - s->rx_pp_recycle_ring_full += rq_stats->pp_recycle_ring_full;
> - s->rx_pp_recycle_released_ref += rq_stats->pp_recycle_released_ref;
> #ifdef CONFIG_MLX5_EN_TLS
> s->rx_tls_decrypted_packets += rq_stats->tls_decrypted_packets;
> s->rx_tls_decrypted_bytes += rq_stats->tls_decrypted_bytes;
> @@ -490,27 +470,13 @@ static void mlx5e_stats_grp_sw_update_stats_qos(struct mlx5e_priv *priv,
> }
> }
>
> -static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
> +static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_sw_stats *s,
> + struct mlx5e_channel *c)
This per-RQ function should not get the general SW stats struct and
write to it.
Gathering stats from all RQs into the general SW stats should not be
done here.
> {
> struct mlx5e_rq_stats *rq_stats = c->rq.stats;
> - struct page_pool *pool = c->rq.page_pool;
> - struct page_pool_stats stats = { 0 };
>
> - if (!page_pool_get_stats(pool, &stats))
> - return;
> -
> - rq_stats->pp_alloc_fast = stats.alloc_stats.fast;
> - rq_stats->pp_alloc_slow = stats.alloc_stats.slow;
> - rq_stats->pp_alloc_slow_high_order = stats.alloc_stats.slow_high_order;
> - rq_stats->pp_alloc_empty = stats.alloc_stats.empty;
> - rq_stats->pp_alloc_waive = stats.alloc_stats.waive;
> - rq_stats->pp_alloc_refill = stats.alloc_stats.refill;
> -
> - rq_stats->pp_recycle_cached = stats.recycle_stats.cached;
> - rq_stats->pp_recycle_cache_full = stats.recycle_stats.cache_full;
> - rq_stats->pp_recycle_ring = stats.recycle_stats.ring;
> - rq_stats->pp_recycle_ring_full = stats.recycle_stats.ring_full;
> - rq_stats->pp_recycle_released_ref = stats.recycle_stats.released_refcnt;
> + page_pool_get_stats(c->rq.page_pool, &s->page_pool_stats);
> + page_pool_get_stats(c->rq.page_pool, &rq_stats->page_pool_stats);
> }
>
> static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(sw)
> @@ -520,15 +486,13 @@ static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(sw)
>
> memset(s, 0, sizeof(*s));
>
> - for (i = 0; i < priv->channels.num; i++) /* for active channels only */
> - mlx5e_stats_update_stats_rq_page_pool(priv->channels.c[i]);
> -
> for (i = 0; i < priv->stats_nch; i++) {
> struct mlx5e_channel_stats *channel_stats =
> priv->channel_stats[i];
>
> int j;
>
> + mlx5e_stats_update_stats_rq_page_pool(s, priv->channels.c[i]);
Should separate, do not mix two roles for the same function.
Moreover, this won't work, it's buggy:
You're looping up to priv->stats_nch (the largest number of channels
ever created) while accessing priv->channels.c[i] (valid only up to the
current number of channels), so you can go out of bounds, accessing
non-existing channels.
There's a design issue to be solved here:
Previously, the page_pool stats existed in RQ stats, also for RQs that
no longer exist. We must preserve this behavior here, even if it means
assigning zeros for the page_pool stats for non-existing RQs.
Worth mentioning: Due to the nature of the infrastructure, today the
page_pool stats are not-persistent, while all our other stats are. They
are reset to zeros upon channels down/up events. This is not optimal but
that's how it is today, and is not expected to be changed here in your
series.
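A minimal sketch of what this could look like (not part of the patch;
it assumes page_pool_get_stats() accumulates into the struct it is
given, which is why the snapshots are zeroed first):

	/* Per-RQ only: refresh this RQ's snapshot, don't touch sw stats. */
	static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
	{
		struct mlx5e_rq_stats *rq_stats = c->rq.stats;

		memset(&rq_stats->page_pool_stats, 0,
		       sizeof(rq_stats->page_pool_stats));
		page_pool_get_stats(c->rq.page_pool, &rq_stats->page_pool_stats);
	}

	/* In the sw update op: clear the persistent snapshots first so RQs
	 * that no longer exist keep reporting zeros, then refresh only the
	 * active channels.
	 */
	for (i = 0; i < priv->stats_nch; i++)
		memset(&priv->channel_stats[i]->rq.page_pool_stats, 0,
		       sizeof(priv->channel_stats[i]->rq.page_pool_stats));
	for (i = 0; i < priv->channels.num; i++)
		mlx5e_stats_update_stats_rq_page_pool(priv->channels.c[i]);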
> mlx5e_stats_grp_sw_update_stats_rq_stats(s, &channel_stats->rq);
> mlx5e_stats_grp_sw_update_stats_xdpsq(s, &channel_stats->rq_xdpsq);
> mlx5e_stats_grp_sw_update_stats_ch_stats(s, &channel_stats->ch);
> @@ -2119,17 +2083,6 @@ static const struct counter_desc rq_stats_desc[] = {
> { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, arfs_err) },
> #endif
> { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, recover) },
> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_fast) },
> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_slow) },
> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_slow_high_order) },
> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_empty) },
> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_refill) },
> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_waive) },
> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_cached) },
> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_cache_full) },
> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_ring) },
> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_ring_full) },
> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_released_ref) },
> #ifdef CONFIG_MLX5_EN_TLS
> { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, tls_decrypted_packets) },
> { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, tls_decrypted_bytes) },
> @@ -2477,7 +2430,32 @@ static MLX5E_DECLARE_STATS_GRP_OP_NUM_STATS(channels)
> (NUM_RQ_XDPSQ_STATS * max_nch) +
> (NUM_XDPSQ_STATS * max_nch) +
> (NUM_XSKRQ_STATS * max_nch * priv->xsk.ever_used) +
> - (NUM_XSKSQ_STATS * max_nch * priv->xsk.ever_used);
> + (NUM_XSKSQ_STATS * max_nch * priv->xsk.ever_used) +
> + page_pool_ethtool_stats_get_count() * max_nch;
Take this closer to the NUM_RQ_STATS * max_nch part.
> +}
> +
> +static const char pp_stats_mq[][ETH_GSTRING_LEN] = {
> + "rx%d_pp_alloc_fast",
> + "rx%d_pp_alloc_slow",
> + "rx%d_pp_alloc_slow_ho",
> + "rx%d_pp_alloc_empty",
> + "rx%d_pp_alloc_refill",
> + "rx%d_pp_alloc_waive",
> + "rx%d_pp_recycle_cached",
> + "rx%d_pp_recycle_cache_full",
> + "rx%d_pp_recycle_ring",
> + "rx%d_pp_recycle_ring_full",
> + "rx%d_pp_recycle_released_ref",
Why static? Isn't the whole point that we want automatic alignment for
changes in net/core/page_pool.c :: struct pp_stats ?
I suggest writing the code so that mlx5e pp_stats_mq strings are
generated and adjusted automatically from the generic struct pp_stats.
> +};
> +
> +static void mlx_page_pool_stats_get_strings_mq(u8 **data, unsigned int queue)
Use mlx5e prefix.
Can use pp to shorten page_pool in function/struct names.
> +{
> + int i;
> +
> + WARN_ON_ONCE(ARRAY_SIZE(pp_stats_mq) != page_pool_ethtool_stats_get_count());
Not good. We don't want to get a WARNING in case someone added new pp
stats without adding to mlx5e.
Should write the code so that this is not possible.
> +
> + for (i = 0; i < ARRAY_SIZE(pp_stats_mq); i++)
> + ethtool_sprintf(data, pp_stats_mq[i], queue);
> }
>
> static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(channels)
> @@ -2493,6 +2471,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(channels)
> for (i = 0; i < max_nch; i++) {
> for (j = 0; j < NUM_RQ_STATS; j++)
> ethtool_sprintf(data, rq_stats_desc[j].format, i);
> + mlx_page_pool_stats_get_strings_mq(data, i);
> for (j = 0; j < NUM_XSKRQ_STATS * is_xsk; j++)
> ethtool_sprintf(data, xskrq_stats_desc[j].format, i);
> for (j = 0; j < NUM_RQ_XDPSQ_STATS; j++)
> @@ -2527,11 +2506,13 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(channels)
> ch_stats_desc, j));
>
> for (i = 0; i < max_nch; i++) {
> + struct mlx5e_rq_stats *rq_stats = &priv->channel_stats[i]->rq;
> +
> for (j = 0; j < NUM_RQ_STATS; j++)
> mlx5e_ethtool_put_stat(
> - data, MLX5E_READ_CTR64_CPU(
> - &priv->channel_stats[i]->rq,
> + data, MLX5E_READ_CTR64_CPU(rq_stats,
> rq_stats_desc, j));
> + *data = page_pool_ethtool_stats_get(*data, &rq_stats->page_pool_stats);
> for (j = 0; j < NUM_XSKRQ_STATS * is_xsk; j++)
> mlx5e_ethtool_put_stat(
> data, MLX5E_READ_CTR64_CPU(
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
> index 8de6fcbd3a033..30c5c2a92508b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
> @@ -33,6 +33,8 @@
> #ifndef __MLX5_EN_STATS_H__
> #define __MLX5_EN_STATS_H__
>
> +#include <net/page_pool/types.h>
> +
> #define MLX5E_READ_CTR64_CPU(ptr, dsc, i) \
> (*(u64 *)((char *)ptr + dsc[i].offset))
> #define MLX5E_READ_CTR64_BE(ptr, dsc, i) \
> @@ -215,17 +217,7 @@ struct mlx5e_sw_stats {
> u64 ch_aff_change;
> u64 ch_force_irq;
> u64 ch_eq_rearm;
> - u64 rx_pp_alloc_fast;
> - u64 rx_pp_alloc_slow;
> - u64 rx_pp_alloc_slow_high_order;
> - u64 rx_pp_alloc_empty;
> - u64 rx_pp_alloc_refill;
> - u64 rx_pp_alloc_waive;
> - u64 rx_pp_recycle_cached;
> - u64 rx_pp_recycle_cache_full;
> - u64 rx_pp_recycle_ring;
> - u64 rx_pp_recycle_ring_full;
> - u64 rx_pp_recycle_released_ref;
> + struct page_pool_stats page_pool_stats;
Maybe call it rx_pp_stats ?
> #ifdef CONFIG_MLX5_EN_TLS
> u64 tx_tls_encrypted_packets;
> u64 tx_tls_encrypted_bytes;
> @@ -383,17 +375,7 @@ struct mlx5e_rq_stats {
> u64 arfs_err;
> #endif
> u64 recover;
> - u64 pp_alloc_fast;
> - u64 pp_alloc_slow;
> - u64 pp_alloc_slow_high_order;
> - u64 pp_alloc_empty;
> - u64 pp_alloc_refill;
> - u64 pp_alloc_waive;
> - u64 pp_recycle_cached;
> - u64 pp_recycle_cache_full;
> - u64 pp_recycle_ring;
> - u64 pp_recycle_ring_full;
> - u64 pp_recycle_released_ref;
> + struct page_pool_stats page_pool_stats;
Maybe call it pp_stats ?
> #ifdef CONFIG_MLX5_EN_TLS
> u64 tls_decrypted_packets;
> u64 tls_decrypted_bytes;
* Re: [PATCH net-next v3 1/4] mlnx5: Use generic code for page_pool statistics.
2025-04-10 7:16 ` Tariq Toukan
@ 2025-04-10 8:58 ` Sebastian Andrzej Siewior
2025-04-10 11:23 ` Tariq Toukan
0 siblings, 1 reply; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-04-10 8:58 UTC (permalink / raw)
To: Tariq Toukan
Cc: linux-rdma, linux-rt-devel, netdev, David S. Miller, Andrew Lunn,
Eric Dumazet, Ilias Apalodimas, Jakub Kicinski,
Jesper Dangaard Brouer, Joe Damato, Leon Romanovsky, Paolo Abeni,
Saeed Mahameed, Simon Horman, Tariq Toukan, Thomas Gleixner,
Yunsheng Lin
On 2025-04-10 10:16:54 [+0300], Tariq Toukan wrote:
> Thanks for your patch.
>
> Was this tested?
> Were you able to run with mlx5 device?
Kind of. The device is optical and I have nothing to bring up the
link; a few GBICs did not work, so the guys gave up back then. On link
up the driver sets up buffers, so I saw the stats incrementing, and the
numbers queried with ethtool made sense.
> > .../ethernet/mellanox/mlx5/core/en_stats.c | 97 ++++++++-----------
> > .../ethernet/mellanox/mlx5/core/en_stats.h | 26 +----
> > 2 files changed, 43 insertions(+), 80 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> > index 1c121b435016d..54303877adb1d 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> > @@ -194,17 +194,6 @@ static const struct counter_desc sw_stats_desc[] = {
> > { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_arfs_err) },
> > #endif
> > { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_recover) },
> > - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_fast) },
> > - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_slow) },
> > - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_slow_high_order) },
> > - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_empty) },
> > - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_refill) },
> > - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_waive) },
> > - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_cached) },
> > - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_cache_full) },
> > - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_ring) },
> > - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_ring_full) },
> > - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_released_ref) },
> > #ifdef CONFIG_MLX5_EN_TLS
> > { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_tls_decrypted_packets) },
> > { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_tls_decrypted_bytes) },
> > @@ -253,7 +242,7 @@ static const struct counter_desc sw_stats_desc[] = {
> > static MLX5E_DECLARE_STATS_GRP_OP_NUM_STATS(sw)
> > {
> > - return NUM_SW_COUNTERS;
> > + return NUM_SW_COUNTERS + page_pool_ethtool_stats_get_count();
> > }
> > static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(sw)
> > @@ -262,6 +251,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(sw)
> > for (i = 0; i < NUM_SW_COUNTERS; i++)
> > ethtool_puts(data, sw_stats_desc[i].format);
> > + *data = page_pool_ethtool_stats_get_strings(*data);
> > }
> > static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(sw)
> > @@ -272,6 +262,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(sw)
> > mlx5e_ethtool_put_stat(data,
> > MLX5E_READ_CTR64_CPU(&priv->stats.sw,
> > sw_stats_desc, i));
> > + *data = page_pool_ethtool_stats_get(*data, &priv->stats.sw.page_pool_stats);
> > }
> > static void mlx5e_stats_grp_sw_update_stats_xdp_red(struct mlx5e_sw_stats *s,
> > @@ -373,17 +364,6 @@ static void mlx5e_stats_grp_sw_update_stats_rq_stats(struct mlx5e_sw_stats *s,
> > s->rx_arfs_err += rq_stats->arfs_err;
> > #endif
> > s->rx_recover += rq_stats->recover;
> > - s->rx_pp_alloc_fast += rq_stats->pp_alloc_fast;
> > - s->rx_pp_alloc_slow += rq_stats->pp_alloc_slow;
> > - s->rx_pp_alloc_empty += rq_stats->pp_alloc_empty;
> > - s->rx_pp_alloc_refill += rq_stats->pp_alloc_refill;
> > - s->rx_pp_alloc_waive += rq_stats->pp_alloc_waive;
> > - s->rx_pp_alloc_slow_high_order += rq_stats->pp_alloc_slow_high_order;
> > - s->rx_pp_recycle_cached += rq_stats->pp_recycle_cached;
> > - s->rx_pp_recycle_cache_full += rq_stats->pp_recycle_cache_full;
> > - s->rx_pp_recycle_ring += rq_stats->pp_recycle_ring;
> > - s->rx_pp_recycle_ring_full += rq_stats->pp_recycle_ring_full;
> > - s->rx_pp_recycle_released_ref += rq_stats->pp_recycle_released_ref;
> > #ifdef CONFIG_MLX5_EN_TLS
> > s->rx_tls_decrypted_packets += rq_stats->tls_decrypted_packets;
> > s->rx_tls_decrypted_bytes += rq_stats->tls_decrypted_bytes;
> > @@ -490,27 +470,13 @@ static void mlx5e_stats_grp_sw_update_stats_qos(struct mlx5e_priv *priv,
> > }
> > }
> > -static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
> > +static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_sw_stats *s,
> > + struct mlx5e_channel *c)
>
> This per-RQ function should not get the general SW stats struct and write
> to it.
>
> Gathering stats from all RQs into the general SW stats should not be done
> here.
but I have to sum it for the global stats. The old code queried the
channels again for the global stats; this version queries the channels
twice, once for the per-channel stats and again for the global stats.
> > {
> > struct mlx5e_rq_stats *rq_stats = c->rq.stats;
> > - struct page_pool *pool = c->rq.page_pool;
> > - struct page_pool_stats stats = { 0 };
> > - if (!page_pool_get_stats(pool, &stats))
> > - return;
> > -
> > - rq_stats->pp_alloc_fast = stats.alloc_stats.fast;
> > - rq_stats->pp_alloc_slow = stats.alloc_stats.slow;
> > - rq_stats->pp_alloc_slow_high_order = stats.alloc_stats.slow_high_order;
> > - rq_stats->pp_alloc_empty = stats.alloc_stats.empty;
> > - rq_stats->pp_alloc_waive = stats.alloc_stats.waive;
> > - rq_stats->pp_alloc_refill = stats.alloc_stats.refill;
> > -
> > - rq_stats->pp_recycle_cached = stats.recycle_stats.cached;
> > - rq_stats->pp_recycle_cache_full = stats.recycle_stats.cache_full;
> > - rq_stats->pp_recycle_ring = stats.recycle_stats.ring;
> > - rq_stats->pp_recycle_ring_full = stats.recycle_stats.ring_full;
> > - rq_stats->pp_recycle_released_ref = stats.recycle_stats.released_refcnt;
> > + page_pool_get_stats(c->rq.page_pool, &s->page_pool_stats);
> > + page_pool_get_stats(c->rq.page_pool, &rq_stats->page_pool_stats);
> > }
> > static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(sw)
> > @@ -520,15 +486,13 @@ static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(sw)
> > memset(s, 0, sizeof(*s));
> > - for (i = 0; i < priv->channels.num; i++) /* for active channels only */
> > - mlx5e_stats_update_stats_rq_page_pool(priv->channels.c[i]);
> > -
> > for (i = 0; i < priv->stats_nch; i++) {
> > struct mlx5e_channel_stats *channel_stats =
> > priv->channel_stats[i];
> > int j;
> > + mlx5e_stats_update_stats_rq_page_pool(s, priv->channels.c[i]);
>
> Should separate, do not mix two roles for the same function.
>
> Moreover, this won't work, it's buggy:
> You're looping up to priv->stats_nch (the largest number of channels ever
> created) while accessing priv->channels.c[i] (valid only up to the current
> number of channels), so you can go out of bounds, accessing non-existing
> channels.
>
> There's a design issue to be solved here:
> Previously, the page_pool stats existed in RQ stats, also for RQs that no
> longer exist. We must preserve this behavior here, even if it means
> assigning zeros for the page_pool stats for non-existing RQs.
>
> Worth mentioning: Due to the nature of the infrastructure, today the
> page_pool stats are not-persistent, while all our other stats are. They are
> reset to zeros upon channels down/up events. This is not optimal but that's
> how it is today, and is not expected to be changed here in your series.
so just channels.num then.
> > mlx5e_stats_grp_sw_update_stats_rq_stats(s, &channel_stats->rq);
> > mlx5e_stats_grp_sw_update_stats_xdpsq(s, &channel_stats->rq_xdpsq);
> > mlx5e_stats_grp_sw_update_stats_ch_stats(s, &channel_stats->ch);
> > @@ -2119,17 +2083,6 @@ static const struct counter_desc rq_stats_desc[] = {
> > { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, arfs_err) },
> > #endif
> > { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, recover) },
> > - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_fast) },
> > - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_slow) },
> > - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_slow_high_order) },
> > - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_empty) },
> > - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_refill) },
> > - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_waive) },
> > - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_cached) },
> > - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_cache_full) },
> > - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_ring) },
> > - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_ring_full) },
> > - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_released_ref) },
> > #ifdef CONFIG_MLX5_EN_TLS
> > { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, tls_decrypted_packets) },
> > { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, tls_decrypted_bytes) },
> > @@ -2477,7 +2430,32 @@ static MLX5E_DECLARE_STATS_GRP_OP_NUM_STATS(channels)
> > (NUM_RQ_XDPSQ_STATS * max_nch) +
> > (NUM_XDPSQ_STATS * max_nch) +
> > (NUM_XSKRQ_STATS * max_nch * priv->xsk.ever_used) +
> > - (NUM_XSKSQ_STATS * max_nch * priv->xsk.ever_used);
> > + (NUM_XSKSQ_STATS * max_nch * priv->xsk.ever_used) +
> > + page_pool_ethtool_stats_get_count() * max_nch;
>
> Take this closer to the NUM_RQ_STATS * max_nch part.
>
> > +}
> > +
> > +static const char pp_stats_mq[][ETH_GSTRING_LEN] = {
> > + "rx%d_pp_alloc_fast",
> > + "rx%d_pp_alloc_slow",
> > + "rx%d_pp_alloc_slow_ho",
> > + "rx%d_pp_alloc_empty",
> > + "rx%d_pp_alloc_refill",
> > + "rx%d_pp_alloc_waive",
> > + "rx%d_pp_recycle_cached",
> > + "rx%d_pp_recycle_cache_full",
> > + "rx%d_pp_recycle_ring",
> > + "rx%d_pp_recycle_ring_full",
> > + "rx%d_pp_recycle_released_ref",
>
> Why static? Isn't the whole point that we want automatic alignment for
> changes in net/core/page_pool.c :: struct pp_stats ?
> I suggest writing the code so that mlx5e pp_stats_mq strings are generated
> and adjusted automatically from the generic struct pp_stats.
I've been told that it does not make sense to make this generic
because it is already possible to query per-queue stats, and the mlx5
driver uses its own approach here.
> > +};
> > +
> > +static void mlx_page_pool_stats_get_strings_mq(u8 **data, unsigned int queue)
>
> Use mlx5e prefix.
> Can use pp to shorten page_pool in function/struct names.
>
> > +{
> > + int i;
> > +
> > + WARN_ON_ONCE(ARRAY_SIZE(pp_stats_mq) != page_pool_ethtool_stats_get_count());
>
> Not good. We don't want to get a WARNING in case someone added new pp stats
> without adding to mlx5e.
>
> Should write the code so that this is not possible.
I can't use BUILD_BUG_ON() with page_pool_ethtool_stats_get_count() and
the array isn't exported.
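To illustrate (hedged, since this is exactly what cannot be written):
BUILD_BUG_ON() requires an integer constant expression, but the count
only exists behind an out-of-line function:

	/* Does not build: the right-hand side is a function call, not a
	 * compile-time constant, and the backing pp_stats[] array in
	 * net/core/page_pool.c is not exported for ARRAY_SIZE().
	 */
	BUILD_BUG_ON(ARRAY_SIZE(pp_stats_mq) !=
		     page_pool_ethtool_stats_get_count());

so the runtime WARN_ON_ONCE() is the only check available to the driver
today.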
> > +
> > + for (i = 0; i < ARRAY_SIZE(pp_stats_mq); i++)
> > + ethtool_sprintf(data, pp_stats_mq[i], queue);
> > }
> > static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(channels)
> > @@ -2493,6 +2471,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(channels)
> > for (i = 0; i < max_nch; i++) {
> > for (j = 0; j < NUM_RQ_STATS; j++)
> > ethtool_sprintf(data, rq_stats_desc[j].format, i);
> > + mlx_page_pool_stats_get_strings_mq(data, i);
> > for (j = 0; j < NUM_XSKRQ_STATS * is_xsk; j++)
> > ethtool_sprintf(data, xskrq_stats_desc[j].format, i);
> > for (j = 0; j < NUM_RQ_XDPSQ_STATS; j++)
> > @@ -2527,11 +2506,13 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(channels)
> > ch_stats_desc, j));
> > for (i = 0; i < max_nch; i++) {
> > + struct mlx5e_rq_stats *rq_stats = &priv->channel_stats[i]->rq;
> > +
> > for (j = 0; j < NUM_RQ_STATS; j++)
> > mlx5e_ethtool_put_stat(
> > - data, MLX5E_READ_CTR64_CPU(
> > - &priv->channel_stats[i]->rq,
> > + data, MLX5E_READ_CTR64_CPU(rq_stats,
> > rq_stats_desc, j));
> > + *data = page_pool_ethtool_stats_get(*data, &rq_stats->page_pool_stats);
> > for (j = 0; j < NUM_XSKRQ_STATS * is_xsk; j++)
> > mlx5e_ethtool_put_stat(
> > data, MLX5E_READ_CTR64_CPU(
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
> > index 8de6fcbd3a033..30c5c2a92508b 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
> > @@ -33,6 +33,8 @@
> > #ifndef __MLX5_EN_STATS_H__
> > #define __MLX5_EN_STATS_H__
> > +#include <net/page_pool/types.h>
> > +
> > #define MLX5E_READ_CTR64_CPU(ptr, dsc, i) \
> > (*(u64 *)((char *)ptr + dsc[i].offset))
> > #define MLX5E_READ_CTR64_BE(ptr, dsc, i) \
> > @@ -215,17 +217,7 @@ struct mlx5e_sw_stats {
> > u64 ch_aff_change;
> > u64 ch_force_irq;
> > u64 ch_eq_rearm;
> > - u64 rx_pp_alloc_fast;
> > - u64 rx_pp_alloc_slow;
> > - u64 rx_pp_alloc_slow_high_order;
> > - u64 rx_pp_alloc_empty;
> > - u64 rx_pp_alloc_refill;
> > - u64 rx_pp_alloc_waive;
> > - u64 rx_pp_recycle_cached;
> > - u64 rx_pp_recycle_cache_full;
> > - u64 rx_pp_recycle_ring;
> > - u64 rx_pp_recycle_ring_full;
> > - u64 rx_pp_recycle_released_ref;
> > + struct page_pool_stats page_pool_stats;
>
> Maybe call it rx_pp_stats ?
okay
> > #ifdef CONFIG_MLX5_EN_TLS
> > u64 tx_tls_encrypted_packets;
> > u64 tx_tls_encrypted_bytes;
> > @@ -383,17 +375,7 @@ struct mlx5e_rq_stats {
> > u64 arfs_err;
> > #endif
> > u64 recover;
> > - u64 pp_alloc_fast;
> > - u64 pp_alloc_slow;
> > - u64 pp_alloc_slow_high_order;
> > - u64 pp_alloc_empty;
> > - u64 pp_alloc_refill;
> > - u64 pp_alloc_waive;
> > - u64 pp_recycle_cached;
> > - u64 pp_recycle_cache_full;
> > - u64 pp_recycle_ring;
> > - u64 pp_recycle_ring_full;
> > - u64 pp_recycle_released_ref;
> > + struct page_pool_stats page_pool_stats;
>
> Maybe call it pp_stats ?
okay.
> > #ifdef CONFIG_MLX5_EN_TLS
> > u64 tls_decrypted_packets;
> > u64 tls_decrypted_bytes;
Sebastian
* Re: [PATCH net-next v3 1/4] mlnx5: Use generic code for page_pool statistics.
2025-04-10 8:58 ` Sebastian Andrzej Siewior
@ 2025-04-10 11:23 ` Tariq Toukan
0 siblings, 0 replies; 14+ messages in thread
From: Tariq Toukan @ 2025-04-10 11:23 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-rdma, linux-rt-devel, netdev, David S. Miller, Andrew Lunn,
Eric Dumazet, Ilias Apalodimas, Jakub Kicinski,
Jesper Dangaard Brouer, Joe Damato, Leon Romanovsky, Paolo Abeni,
Saeed Mahameed, Simon Horman, Tariq Toukan, Thomas Gleixner,
Yunsheng Lin
On 10/04/2025 11:58, Sebastian Andrzej Siewior wrote:
> On 2025-04-10 10:16:54 [+0300], Tariq Toukan wrote:
>> Thanks for your patch.
>>
>> Was this tested?
>> Were you able to run with mlx5 device?
>
> Kind of. The device is optical and I have nothing to bring up the
> link; a few GBICs did not work, so the guys gave up back then. On link up
> the driver sets up buffers, so I saw the stats incrementing, and the
> numbers queried with ethtool made sense.
>
>>> .../ethernet/mellanox/mlx5/core/en_stats.c | 97 ++++++++-----------
>>> .../ethernet/mellanox/mlx5/core/en_stats.h | 26 +----
>>> 2 files changed, 43 insertions(+), 80 deletions(-)
>>>
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
>>> index 1c121b435016d..54303877adb1d 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
>>> @@ -194,17 +194,6 @@ static const struct counter_desc sw_stats_desc[] = {
>>> { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_arfs_err) },
>>> #endif
>>> { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_recover) },
>>> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_fast) },
>>> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_slow) },
>>> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_slow_high_order) },
>>> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_empty) },
>>> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_refill) },
>>> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_alloc_waive) },
>>> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_cached) },
>>> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_cache_full) },
>>> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_ring) },
>>> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_ring_full) },
>>> - { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_pp_recycle_released_ref) },
>>> #ifdef CONFIG_MLX5_EN_TLS
>>> { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_tls_decrypted_packets) },
>>> { MLX5E_DECLARE_STAT(struct mlx5e_sw_stats, rx_tls_decrypted_bytes) },
>>> @@ -253,7 +242,7 @@ static const struct counter_desc sw_stats_desc[] = {
>>> static MLX5E_DECLARE_STATS_GRP_OP_NUM_STATS(sw)
>>> {
>>> - return NUM_SW_COUNTERS;
>>> + return NUM_SW_COUNTERS + page_pool_ethtool_stats_get_count();
>>> }
>>> static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(sw)
>>> @@ -262,6 +251,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(sw)
>>> for (i = 0; i < NUM_SW_COUNTERS; i++)
>>> ethtool_puts(data, sw_stats_desc[i].format);
>>> + *data = page_pool_ethtool_stats_get_strings(*data);
>>> }
>>> static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(sw)
>>> @@ -272,6 +262,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(sw)
>>> mlx5e_ethtool_put_stat(data,
>>> MLX5E_READ_CTR64_CPU(&priv->stats.sw,
>>> sw_stats_desc, i));
>>> + *data = page_pool_ethtool_stats_get(*data, &priv->stats.sw.page_pool_stats);
>>> }
>>> static void mlx5e_stats_grp_sw_update_stats_xdp_red(struct mlx5e_sw_stats *s,
>>> @@ -373,17 +364,6 @@ static void mlx5e_stats_grp_sw_update_stats_rq_stats(struct mlx5e_sw_stats *s,
>>> s->rx_arfs_err += rq_stats->arfs_err;
>>> #endif
>>> s->rx_recover += rq_stats->recover;
>>> - s->rx_pp_alloc_fast += rq_stats->pp_alloc_fast;
>>> - s->rx_pp_alloc_slow += rq_stats->pp_alloc_slow;
>>> - s->rx_pp_alloc_empty += rq_stats->pp_alloc_empty;
>>> - s->rx_pp_alloc_refill += rq_stats->pp_alloc_refill;
>>> - s->rx_pp_alloc_waive += rq_stats->pp_alloc_waive;
>>> - s->rx_pp_alloc_slow_high_order += rq_stats->pp_alloc_slow_high_order;
>>> - s->rx_pp_recycle_cached += rq_stats->pp_recycle_cached;
>>> - s->rx_pp_recycle_cache_full += rq_stats->pp_recycle_cache_full;
>>> - s->rx_pp_recycle_ring += rq_stats->pp_recycle_ring;
>>> - s->rx_pp_recycle_ring_full += rq_stats->pp_recycle_ring_full;
>>> - s->rx_pp_recycle_released_ref += rq_stats->pp_recycle_released_ref;
>>> #ifdef CONFIG_MLX5_EN_TLS
>>> s->rx_tls_decrypted_packets += rq_stats->tls_decrypted_packets;
>>> s->rx_tls_decrypted_bytes += rq_stats->tls_decrypted_bytes;
>>> @@ -490,27 +470,13 @@ static void mlx5e_stats_grp_sw_update_stats_qos(struct mlx5e_priv *priv,
>>> }
>>> }
>>> -static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
>>> +static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_sw_stats *s,
>>> + struct mlx5e_channel *c)
>>
>> This per-RQ function should not get the general SW stats struct and write
>> to it.
>>
>> Gathering stats from all RQs into the general SW stats should not be done
>> here.
>
> but I have to sum it for the global stats. The old code queried the
> channels again for the global stats; this version queries the channels
> twice, once for the per-channel stats and again for the global stats.
>
This changes the function role.
Keep the separation.
>>> {
>>> struct mlx5e_rq_stats *rq_stats = c->rq.stats;
>>> - struct page_pool *pool = c->rq.page_pool;
>>> - struct page_pool_stats stats = { 0 };
>>> - if (!page_pool_get_stats(pool, &stats))
>>> - return;
>>> -
>>> - rq_stats->pp_alloc_fast = stats.alloc_stats.fast;
>>> - rq_stats->pp_alloc_slow = stats.alloc_stats.slow;
>>> - rq_stats->pp_alloc_slow_high_order = stats.alloc_stats.slow_high_order;
>>> - rq_stats->pp_alloc_empty = stats.alloc_stats.empty;
>>> - rq_stats->pp_alloc_waive = stats.alloc_stats.waive;
>>> - rq_stats->pp_alloc_refill = stats.alloc_stats.refill;
>>> -
>>> - rq_stats->pp_recycle_cached = stats.recycle_stats.cached;
>>> - rq_stats->pp_recycle_cache_full = stats.recycle_stats.cache_full;
>>> - rq_stats->pp_recycle_ring = stats.recycle_stats.ring;
>>> - rq_stats->pp_recycle_ring_full = stats.recycle_stats.ring_full;
>>> - rq_stats->pp_recycle_released_ref = stats.recycle_stats.released_refcnt;
>>> + page_pool_get_stats(c->rq.page_pool, &s->page_pool_stats);
>>> + page_pool_get_stats(c->rq.page_pool, &rq_stats->page_pool_stats);
Also, what do you achieve here when reading the per-RQ pp stats into
&s->page_pool_stats? Isn't this immediately overwritten by the next
iteration? They do not accumulate.
>>> }
>>> static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(sw)
>>> @@ -520,15 +486,13 @@ static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(sw)
>>> memset(s, 0, sizeof(*s));
>>> - for (i = 0; i < priv->channels.num; i++) /* for active channels only */
>>> - mlx5e_stats_update_stats_rq_page_pool(priv->channels.c[i]);
>>> -
>>> for (i = 0; i < priv->stats_nch; i++) {
>>> struct mlx5e_channel_stats *channel_stats =
>>> priv->channel_stats[i];
>>> int j;
>>> + mlx5e_stats_update_stats_rq_page_pool(s, priv->channels.c[i]);
>>
You'd probably need to read the per-RQ stats into a local tmp struct,
and iteratively accumulate the values.
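Something along these lines (a sketch; it leans on
page_pool_get_stats() adding on top of whatever is already in the
passed struct, so the zeroed local doubles as the accumulator):

	struct page_pool_stats sum = { };
	int i;

	/* Accumulate all active RQs into one summed snapshot. */
	for (i = 0; i < priv->channels.num; i++)
		page_pool_get_stats(priv->channels.c[i]->rq.page_pool, &sum);
	s->page_pool_stats = sum;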
>> Should separate, do not mix two roles for the same function.
>>
>> Moreover, this won't work, it's buggy:
>> You're looping up to priv->stats_nch (the largest number of channels ever
>> created) while accessing priv->channels.c[i] (valid only up to the current
>> number of channels), so you can go out of bounds, accessing non-existing
>> channels.
>>
>> There's a design issue to be solved here:
>> Previously, the page_pool stats existed in RQ stats, also for RQs that no
>> longer exist. We must preserve this behavior here, even if it means
>> assigning zeros for the page_pool stats for non-existing RQs.
>>
>> Worth mentioning: Due to the nature of the infrastructure, today the
>> page_pool stats are not-persistent, while all our other stats are. They are
>> reset to zeros upon channels down/up events. This is not optimal but that's
>> how it is today, and is not expected to be changed here in your series.
>
> so just channels.num then.
>
For page_pool stats of existing RQs, yes.
BUT:
Previously, page_pool stats entries appeared (with zeroed values) also
for non-existing RQs; now you're going to drop them? Better not
introduce any behavior change.
>>> mlx5e_stats_grp_sw_update_stats_rq_stats(s, &channel_stats->rq);
>>> mlx5e_stats_grp_sw_update_stats_xdpsq(s, &channel_stats->rq_xdpsq);
>>> mlx5e_stats_grp_sw_update_stats_ch_stats(s, &channel_stats->ch);
>>> @@ -2119,17 +2083,6 @@ static const struct counter_desc rq_stats_desc[] = {
>>> { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, arfs_err) },
>>> #endif
>>> { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, recover) },
>>> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_fast) },
>>> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_slow) },
>>> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_slow_high_order) },
>>> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_empty) },
>>> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_refill) },
>>> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_alloc_waive) },
>>> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_cached) },
>>> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_cache_full) },
>>> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_ring) },
>>> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_ring_full) },
>>> - { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, pp_recycle_released_ref) },
>>> #ifdef CONFIG_MLX5_EN_TLS
>>> { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, tls_decrypted_packets) },
>>> { MLX5E_DECLARE_RX_STAT(struct mlx5e_rq_stats, tls_decrypted_bytes) },
>>> @@ -2477,7 +2430,32 @@ static MLX5E_DECLARE_STATS_GRP_OP_NUM_STATS(channels)
>>> (NUM_RQ_XDPSQ_STATS * max_nch) +
>>> (NUM_XDPSQ_STATS * max_nch) +
>>> (NUM_XSKRQ_STATS * max_nch * priv->xsk.ever_used) +
>>> - (NUM_XSKSQ_STATS * max_nch * priv->xsk.ever_used);
>>> + (NUM_XSKSQ_STATS * max_nch * priv->xsk.ever_used) +
>>> + page_pool_ethtool_stats_get_count() * max_nch;
>>
>> Take this closer to the NUM_RQ_STATS * max_nch part.
>>
>>> +}
>>> +
>>> +static const char pp_stats_mq[][ETH_GSTRING_LEN] = {
>>> + "rx%d_pp_alloc_fast",
>>> + "rx%d_pp_alloc_slow",
>>> + "rx%d_pp_alloc_slow_ho",
>>> + "rx%d_pp_alloc_empty",
>>> + "rx%d_pp_alloc_refill",
>>> + "rx%d_pp_alloc_waive",
>>> + "rx%d_pp_recycle_cached",
>>> + "rx%d_pp_recycle_cache_full",
>>> + "rx%d_pp_recycle_ring",
>>> + "rx%d_pp_recycle_ring_full",
>>> + "rx%d_pp_recycle_released_ref",
>>
>> Why static? Isn't the whole point that we want automatic alignment for
>> changes in net/core/page_pool.c :: struct pp_stats ?
>> I suggest writing the code so that mlx5e pp_stats_mq strings are generated
>> and adjusted automatically from the generic struct pp_stats.
>
> I've been told that it does not make sense to make this generic because it
> is already possible to query per-queue stats, and the mlx5 driver uses its
> own approach here.
>
I know. I didn't ask for this.
My comment is: let the mlx5 driver generate its own set of strings
(maybe at runtime) as a derivative of the generic struct pp_stats, so
that it automatically adapts to any change in struct pp_stats.
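A sketch of how that runtime generation might look (it assumes the
generic strings all carry the "rx_" prefix, as "rx_pp_alloc_fast" etc.
do today, and still fit ETH_GSTRING_LEN with the queue number
prepended):

	static void mlx5e_pp_stats_get_strings_mq(u8 **data, unsigned int queue)
	{
		int cnt = page_pool_ethtool_stats_get_count();
		u8 *strs, *pos;
		int i;

		strs = kcalloc(cnt, ETH_GSTRING_LEN, GFP_KERNEL);
		if (!strs)
			return;
		page_pool_ethtool_stats_get_strings(strs);

		/* Turn the generic "rx_pp_*" into per-queue "rx<q>_pp_*". */
		for (i = 0, pos = strs; i < cnt; i++, pos += ETH_GSTRING_LEN)
			ethtool_sprintf(data, "rx%d_%s", queue,
					pos + strlen("rx_"));

		kfree(strs);
	}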
>>> +};
>>> +
>>> +static void mlx_page_pool_stats_get_strings_mq(u8 **data, unsigned int queue)
>>
>> Use mlx5e prefix.
>> Can use pp to shorten page_pool in function/struct names.
>>
>>> +{
>>> + int i;
>>> +
>>> + WARN_ON_ONCE(ARRAY_SIZE(pp_stats_mq) != page_pool_ethtool_stats_get_count());
>>
>> Not good. We don't want to get a WARNING in case someone added new pp stats
>> without adding to mlx5e.
>>
>> Should write the code so that this is not possible.
>
> I can't use BUILD_BUG_ON() with page_pool_ethtool_stats_get_count() and
> the array isn't exported.
>
Got it.
Maybe we can add a non-ethtool getter function for the pp_stats strings?
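Hypothetically (no such core API exists today), something like:

	/* Hypothetical addition to include/net/page_pool/helpers.h. */
	const char *page_pool_ethtool_stats_get_string(unsigned int i);

would let mlx5e derive its per-queue strings directly:

	for (i = 0; i < page_pool_ethtool_stats_get_count(); i++)
		ethtool_sprintf(data, "rx%d_%s", queue,
				page_pool_ethtool_stats_get_string(i) +
				strlen("rx_"));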
>>> +
>>> + for (i = 0; i < ARRAY_SIZE(pp_stats_mq); i++)
>>> + ethtool_sprintf(data, pp_stats_mq[i], queue);
>>> }
>>> static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(channels)
>>> @@ -2493,6 +2471,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(channels)
>>> for (i = 0; i < max_nch; i++) {
>>> for (j = 0; j < NUM_RQ_STATS; j++)
>>> ethtool_sprintf(data, rq_stats_desc[j].format, i);
>>> + mlx_page_pool_stats_get_strings_mq(data, i);
>>> for (j = 0; j < NUM_XSKRQ_STATS * is_xsk; j++)
>>> ethtool_sprintf(data, xskrq_stats_desc[j].format, i);
>>> for (j = 0; j < NUM_RQ_XDPSQ_STATS; j++)
>>> @@ -2527,11 +2506,13 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(channels)
>>> ch_stats_desc, j));
>>> for (i = 0; i < max_nch; i++) {
>>> + struct mlx5e_rq_stats *rq_stats = &priv->channel_stats[i]->rq;
>>> +
>>> for (j = 0; j < NUM_RQ_STATS; j++)
>>> mlx5e_ethtool_put_stat(
>>> - data, MLX5E_READ_CTR64_CPU(
>>> - &priv->channel_stats[i]->rq,
>>> + data, MLX5E_READ_CTR64_CPU(rq_stats,
>>> rq_stats_desc, j));
>>> + *data = page_pool_ethtool_stats_get(*data, &rq_stats->page_pool_stats);
>>> for (j = 0; j < NUM_XSKRQ_STATS * is_xsk; j++)
>>> mlx5e_ethtool_put_stat(
>>> data, MLX5E_READ_CTR64_CPU(
>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
>>> index 8de6fcbd3a033..30c5c2a92508b 100644
>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.h
>>> @@ -33,6 +33,8 @@
>>> #ifndef __MLX5_EN_STATS_H__
>>> #define __MLX5_EN_STATS_H__
>>> +#include <net/page_pool/types.h>
>>> +
>>> #define MLX5E_READ_CTR64_CPU(ptr, dsc, i) \
>>> (*(u64 *)((char *)ptr + dsc[i].offset))
>>> #define MLX5E_READ_CTR64_BE(ptr, dsc, i) \
>>> @@ -215,17 +217,7 @@ struct mlx5e_sw_stats {
>>> u64 ch_aff_change;
>>> u64 ch_force_irq;
>>> u64 ch_eq_rearm;
>>> - u64 rx_pp_alloc_fast;
>>> - u64 rx_pp_alloc_slow;
>>> - u64 rx_pp_alloc_slow_high_order;
>>> - u64 rx_pp_alloc_empty;
>>> - u64 rx_pp_alloc_refill;
>>> - u64 rx_pp_alloc_waive;
>>> - u64 rx_pp_recycle_cached;
>>> - u64 rx_pp_recycle_cache_full;
>>> - u64 rx_pp_recycle_ring;
>>> - u64 rx_pp_recycle_ring_full;
>>> - u64 rx_pp_recycle_released_ref;
>>> + struct page_pool_stats page_pool_stats;
>>
>> Maybe call it rx_pp_stats ?
>
> okay
>
>>> #ifdef CONFIG_MLX5_EN_TLS
>>> u64 tx_tls_encrypted_packets;
>>> u64 tx_tls_encrypted_bytes;
>>> @@ -383,17 +375,7 @@ struct mlx5e_rq_stats {
>>> u64 arfs_err;
>>> #endif
>>> u64 recover;
>>> - u64 pp_alloc_fast;
>>> - u64 pp_alloc_slow;
>>> - u64 pp_alloc_slow_high_order;
>>> - u64 pp_alloc_empty;
>>> - u64 pp_alloc_refill;
>>> - u64 pp_alloc_waive;
>>> - u64 pp_recycle_cached;
>>> - u64 pp_recycle_cache_full;
>>> - u64 pp_recycle_ring;
>>> - u64 pp_recycle_ring_full;
>>> - u64 pp_recycle_released_ref;
>>> + struct page_pool_stats page_pool_stats;
>>
>> Maybe call it pp_stats ?
>
> okay.
>
>>> #ifdef CONFIG_MLX5_EN_TLS
>>> u64 tls_decrypted_packets;
>>> u64 tls_decrypted_bytes;
>
> Sebastian
* [PATCH net-next v3 2/4] page_pool: Provide an empty page_pool_stats for disabled stats.
2025-04-08 10:59 [PATCH net-next v3 0/4] page_pool: Convert stats to u64_stats_t Sebastian Andrzej Siewior
2025-04-08 10:59 ` [PATCH net-next v3 1/4] mlnx5: Use generic code for page_pool statistics Sebastian Andrzej Siewior
@ 2025-04-08 10:59 ` Sebastian Andrzej Siewior
2025-04-08 10:59 ` [PATCH net-next v3 3/4] page_pool: Convert page_pool_recycle_stats to u64_stats_t Sebastian Andrzej Siewior
` (3 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-04-08 10:59 UTC (permalink / raw)
To: linux-rdma, linux-rt-devel, netdev
Cc: David S. Miller, Andrew Lunn, Eric Dumazet, Ilias Apalodimas,
Jakub Kicinski, Jesper Dangaard Brouer, Joe Damato,
Leon Romanovsky, Paolo Abeni, Saeed Mahameed, Simon Horman,
Tariq Toukan, Thomas Gleixner, Yunsheng Lin,
Sebastian Andrzej Siewior
An empty struct page_pool_stats makes it possible to always embed it
in structs and pass it to functions like page_pool_ethtool_stats_get()
without the need for an ifdef.
Provide an empty struct page_pool_stats and page_pool_get_stats() for
!CONFIG_PAGE_POOL_STATS builds.
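For illustration, a driver-side sketch (hypothetical; my_rq_stats and
my_update_pp_stats are made-up names, not part of the patch) that now
compiles with and without CONFIG_PAGE_POOL_STATS:

	#include <net/page_pool/helpers.h>	/* page_pool_get_stats() stub */
	#include <net/page_pool/types.h>	/* struct page_pool_stats */

	/* Embedding is always possible; the struct is empty when
	 * CONFIG_PAGE_POOL_STATS is disabled.
	 */
	struct my_rq_stats {
		struct page_pool_stats pp_stats;
	};

	static void my_update_pp_stats(const struct page_pool *pool,
				       struct my_rq_stats *s)
	{
		/* The !CONFIG_PAGE_POOL_STATS stub simply returns false. */
		page_pool_get_stats(pool, &s->pp_stats);
	}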
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
include/net/page_pool/helpers.h | 6 ++++++
include/net/page_pool/types.h | 4 ++++
2 files changed, 10 insertions(+)
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 582a3d00cbe23..4622db90f88f2 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -81,6 +81,12 @@ static inline u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats)
{
return data;
}
+
+static inline bool page_pool_get_stats(const struct page_pool *pool,
+ struct page_pool_stats *stats)
+{
+ return false;
+}
#endif
/**
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 36eb57d73abc6..15557fe77af2c 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -140,6 +140,10 @@ struct page_pool_stats {
struct page_pool_alloc_stats alloc_stats;
struct page_pool_recycle_stats recycle_stats;
};
+
+#else /* !CONFIG_PAGE_POOL_STATS */
+
+struct page_pool_stats { };
#endif
/* The whole frag API block must stay within one cacheline. On 32-bit systems,
--
2.49.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH net-next v3 3/4] page_pool: Convert page_pool_recycle_stats to u64_stats_t.
2025-04-08 10:59 [PATCH net-next v3 0/4] page_pool: Convert stats to u64_stats_t Sebastian Andrzej Siewior
2025-04-08 10:59 ` [PATCH net-next v3 1/4] mlnx5: Use generic code for page_pool statistics Sebastian Andrzej Siewior
2025-04-08 10:59 ` [PATCH net-next v3 2/4] page_pool: Provide an empty page_pool_stats for disabled stats Sebastian Andrzej Siewior
@ 2025-04-08 10:59 ` Sebastian Andrzej Siewior
2025-04-08 12:13 ` Yunsheng Lin
2025-04-08 10:59 ` [PATCH net-next v3 4/4] page_pool: Convert page_pool_alloc_stats " Sebastian Andrzej Siewior
` (2 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-04-08 10:59 UTC (permalink / raw)
To: linux-rdma, linux-rt-devel, netdev
Cc: David S. Miller, Andrew Lunn, Eric Dumazet, Ilias Apalodimas,
Jakub Kicinski, Jesper Dangaard Brouer, Joe Damato,
Leon Romanovsky, Paolo Abeni, Saeed Mahameed, Simon Horman,
Tariq Toukan, Thomas Gleixner, Yunsheng Lin,
Sebastian Andrzej Siewior
Using u64 for statistics can lead to inconsistency on 32bit because an
update and a read each need to access two 32bit values.
This can be avoided by using u64_stats_t for the counters and
u64_stats_sync for the required synchronisation on 32bit platforms. The
synchronisation is a NOP on 64bit architectures.
Use u64_stats_t for the counters in page_pool_recycle_stats.
Add U64_STATS_ZERO, a static initializer for u64_stats_t.
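Condensed, the resulting update/read pattern is the usual u64_stats
recipe (a sketch of what the patch does; 's' stands for one per-CPU
struct page_pool_recycle_stats):

	unsigned int start;
	u64 ring;

	/* writer, e.g. recycle_stat_inc(pool, ring) */
	u64_stats_update_begin(&s->syncp);
	u64_stats_inc(&s->ring);
	u64_stats_update_end(&s->syncp);

	/* reader: retry until a consistent snapshot was read; the
	 * seqcount and the retry loop compile away on 64bit.
	 */
	do {
		start = u64_stats_fetch_begin(&s->syncp);
		ring = u64_stats_read(&s->ring);
	} while (u64_stats_fetch_retry(&s->syncp, start));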
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
Documentation/networking/page_pool.rst | 6 +--
include/linux/u64_stats_sync.h | 5 +++
include/net/page_pool/types.h | 13 ++++---
net/core/page_pool.c | 52 ++++++++++++++++++--------
net/core/page_pool_user.c | 10 ++---
5 files changed, 58 insertions(+), 28 deletions(-)
diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst
index 9d958128a57cb..5215fd51a334a 100644
--- a/Documentation/networking/page_pool.rst
+++ b/Documentation/networking/page_pool.rst
@@ -181,11 +181,11 @@ Stats
#ifdef CONFIG_PAGE_POOL_STATS
/* retrieve stats */
- struct page_pool_stats stats = { 0 };
+ struct page_pool_stats stats = { };
if (page_pool_get_stats(page_pool, &stats)) {
/* perhaps the driver reports statistics with ethool */
- ethtool_print_allocation_stats(&stats.alloc_stats);
- ethtool_print_recycle_stats(&stats.recycle_stats);
+ ethtool_print_allocation_stats(u64_stats_read(&stats.alloc_stats));
+ ethtool_print_recycle_stats(u64_stats_read(&stats.recycle_stats));
}
#endif
diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
index 457879938fc19..086bd4a51cfe9 100644
--- a/include/linux/u64_stats_sync.h
+++ b/include/linux/u64_stats_sync.h
@@ -94,6 +94,8 @@ static inline void u64_stats_inc(u64_stats_t *p)
local64_inc(&p->v);
}
+#define U64_STATS_ZERO(_member, _name) {}
+
static inline void u64_stats_init(struct u64_stats_sync *syncp) { }
static inline void __u64_stats_update_begin(struct u64_stats_sync *syncp) { }
static inline void __u64_stats_update_end(struct u64_stats_sync *syncp) { }
@@ -141,6 +143,9 @@ static inline void u64_stats_inc(u64_stats_t *p)
seqcount_init(&__s->seq); \
} while (0)
+#define U64_STATS_ZERO(_member, _name) \
+ _member.seq = SEQCNT_ZERO(#_name#_member.seq)
+
static inline void __u64_stats_update_begin(struct u64_stats_sync *syncp)
{
preempt_disable_nested();
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 15557fe77af2c..54c79a020b334 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -6,6 +6,7 @@
#include <linux/dma-direction.h>
#include <linux/ptr_ring.h>
#include <linux/types.h>
+#include <linux/u64_stats_sync.h>
#include <net/netmem.h>
#define PP_FLAG_DMA_MAP BIT(0) /* Should page_pool do the DMA
@@ -114,6 +115,7 @@ struct page_pool_alloc_stats {
/**
* struct page_pool_recycle_stats - recycling (freeing) statistics
+ * @syncp: synchronisation point for updates.
* @cached: recycling placed page in the page pool cache
* @cache_full: page pool cache was full
* @ring: page placed into the ptr ring
@@ -121,11 +123,12 @@ struct page_pool_alloc_stats {
* @released_refcnt: page released (and not recycled) because refcnt > 1
*/
struct page_pool_recycle_stats {
- u64 cached;
- u64 cache_full;
- u64 ring;
- u64 ring_full;
- u64 released_refcnt;
+ struct u64_stats_sync syncp;
+ u64_stats_t cached;
+ u64_stats_t cache_full;
+ u64_stats_t ring;
+ u64_stats_t ring_full;
+ u64_stats_t released_refcnt;
};
/**
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 7745ad924ae2d..eb2f5b995022f 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -40,21 +40,27 @@ DEFINE_STATIC_KEY_FALSE(page_pool_mem_providers);
#define BIAS_MAX (LONG_MAX >> 1)
#ifdef CONFIG_PAGE_POOL_STATS
-static DEFINE_PER_CPU(struct page_pool_recycle_stats, pp_system_recycle_stats);
+static DEFINE_PER_CPU(struct page_pool_recycle_stats, pp_system_recycle_stats) = {
+ U64_STATS_ZERO(.syncp, pp_system_recycle_stats),
+};
/* alloc_stat_inc is intended to be used in softirq context */
#define alloc_stat_inc(pool, __stat) (pool->alloc_stats.__stat++)
/* recycle_stat_inc is safe to use when preemption is possible. */
#define recycle_stat_inc(pool, __stat) \
do { \
- struct page_pool_recycle_stats __percpu *s = pool->recycle_stats; \
- this_cpu_inc(s->__stat); \
+ struct page_pool_recycle_stats *s = this_cpu_ptr(pool->recycle_stats); \
+ u64_stats_update_begin(&s->syncp); \
+ u64_stats_inc(&s->__stat); \
+ u64_stats_update_end(&s->syncp); \
} while (0)
#define recycle_stat_add(pool, __stat, val) \
do { \
- struct page_pool_recycle_stats __percpu *s = pool->recycle_stats; \
- this_cpu_add(s->__stat, val); \
+ struct page_pool_recycle_stats *s = this_cpu_ptr(pool->recycle_stats); \
+ u64_stats_update_begin(&s->syncp); \
+ u64_stats_add(&s->__stat, val); \
+ u64_stats_update_end(&s->syncp); \
} while (0)
static const char pp_stats[][ETH_GSTRING_LEN] = {
@@ -85,6 +91,7 @@ static const char pp_stats[][ETH_GSTRING_LEN] = {
bool page_pool_get_stats(const struct page_pool *pool,
struct page_pool_stats *stats)
{
+ unsigned int start;
int cpu = 0;
if (!stats)
@@ -99,14 +106,24 @@ bool page_pool_get_stats(const struct page_pool *pool,
stats->alloc_stats.waive += pool->alloc_stats.waive;
for_each_possible_cpu(cpu) {
+ u64 cached, cache_full, ring, ring_full, released_refcnt;
const struct page_pool_recycle_stats *pcpu =
per_cpu_ptr(pool->recycle_stats, cpu);
- stats->recycle_stats.cached += pcpu->cached;
- stats->recycle_stats.cache_full += pcpu->cache_full;
- stats->recycle_stats.ring += pcpu->ring;
- stats->recycle_stats.ring_full += pcpu->ring_full;
- stats->recycle_stats.released_refcnt += pcpu->released_refcnt;
+ do {
+ start = u64_stats_fetch_begin(&pcpu->syncp);
+ cached = u64_stats_read(&pcpu->cached);
+ cache_full = u64_stats_read(&pcpu->cache_full);
+ ring = u64_stats_read(&pcpu->ring);
+ ring_full = u64_stats_read(&pcpu->ring_full);
+ released_refcnt = u64_stats_read(&pcpu->released_refcnt);
+ } while (u64_stats_fetch_retry(&pcpu->syncp, start));
+
+ u64_stats_add(&stats->recycle_stats.cached, cached);
+ u64_stats_add(&stats->recycle_stats.cache_full, cache_full);
+ u64_stats_add(&stats->recycle_stats.ring, ring);
+ u64_stats_add(&stats->recycle_stats.ring_full, ring_full);
+ u64_stats_add(&stats->recycle_stats.released_refcnt, released_refcnt);
}
return true;
@@ -142,11 +159,11 @@ u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats)
*data++ = pool_stats->alloc_stats.empty;
*data++ = pool_stats->alloc_stats.refill;
*data++ = pool_stats->alloc_stats.waive;
- *data++ = pool_stats->recycle_stats.cached;
- *data++ = pool_stats->recycle_stats.cache_full;
- *data++ = pool_stats->recycle_stats.ring;
- *data++ = pool_stats->recycle_stats.ring_full;
- *data++ = pool_stats->recycle_stats.released_refcnt;
+ *data++ = u64_stats_read(&pool_stats->recycle_stats.cached);
+ *data++ = u64_stats_read(&pool_stats->recycle_stats.cache_full);
+ *data++ = u64_stats_read(&pool_stats->recycle_stats.ring);
+ *data++ = u64_stats_read(&pool_stats->recycle_stats.ring_full);
+ *data++ = u64_stats_read(&pool_stats->recycle_stats.released_refcnt);
return data;
}
@@ -250,9 +267,14 @@ static int page_pool_init(struct page_pool *pool,
#ifdef CONFIG_PAGE_POOL_STATS
if (!(pool->slow.flags & PP_FLAG_SYSTEM_POOL)) {
+ unsigned int cpu;
+
pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats);
if (!pool->recycle_stats)
return -ENOMEM;
+
+ for_each_possible_cpu(cpu)
+ u64_stats_init(&per_cpu_ptr(pool->recycle_stats, cpu)->syncp);
} else {
/* For system page pool instance we use a singular stats object
* instead of allocating a separate percpu variable for each
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index c82a95beceff8..86c22461b7fed 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -149,15 +149,15 @@ page_pool_nl_stats_fill(struct sk_buff *rsp, const struct page_pool *pool,
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_WAIVE,
stats.alloc_stats.waive) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHED,
- stats.recycle_stats.cached) ||
+ u64_stats_read(&stats.recycle_stats.cached)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHE_FULL,
- stats.recycle_stats.cache_full) ||
+ u64_stats_read(&stats.recycle_stats.cache_full)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING,
- stats.recycle_stats.ring) ||
+ u64_stats_read(&stats.recycle_stats.ring)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING_FULL,
- stats.recycle_stats.ring_full) ||
+ u64_stats_read(&stats.recycle_stats.ring_full)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_RELEASED_REFCNT,
- stats.recycle_stats.released_refcnt))
+ u64_stats_read(&stats.recycle_stats.released_refcnt)))
goto err_cancel_msg;
genlmsg_end(rsp, hdr);
--
2.49.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH net-next v3 3/4] page_pool: Convert page_pool_recycle_stats to u64_stats_t.
2025-04-08 10:59 ` [PATCH net-next v3 3/4] page_pool: Convert page_pool_recycle_stats to u64_stats_t Sebastian Andrzej Siewior
@ 2025-04-08 12:13 ` Yunsheng Lin
2025-04-09 16:11 ` Sebastian Andrzej Siewior
0 siblings, 1 reply; 14+ messages in thread
From: Yunsheng Lin @ 2025-04-08 12:13 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, linux-rdma, linux-rt-devel, netdev
Cc: David S. Miller, Andrew Lunn, Eric Dumazet, Ilias Apalodimas,
Jakub Kicinski, Jesper Dangaard Brouer, Joe Damato,
Leon Romanovsky, Paolo Abeni, Saeed Mahameed, Simon Horman,
Tariq Toukan, Thomas Gleixner
On 2025/4/8 18:59, Sebastian Andrzej Siewior wrote:
> Using u64 for statistics can lead to inconsistency on 32bit because an
> update and a read each need to access two 32bit values.
> This can be avoided by using u64_stats_t for the counters and
> u64_stats_sync for the required synchronisation on 32bit platforms. The
> synchronisation is a NOP on 64bit architectures.
>
> Use u64_stats_t for the counters in page_pool_recycle_stats.
> Add U64_STATS_ZERO, a static initializer for u64_stats_t.
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> Documentation/networking/page_pool.rst | 6 +--
> include/linux/u64_stats_sync.h | 5 +++
> include/net/page_pool/types.h | 13 ++++---
> net/core/page_pool.c | 52 ++++++++++++++++++--------
> net/core/page_pool_user.c | 10 ++---
> 5 files changed, 58 insertions(+), 28 deletions(-)
>
> diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst
> index 9d958128a57cb..5215fd51a334a 100644
> --- a/Documentation/networking/page_pool.rst
> +++ b/Documentation/networking/page_pool.rst
> @@ -181,11 +181,11 @@ Stats
>
> #ifdef CONFIG_PAGE_POOL_STATS
> /* retrieve stats */
> - struct page_pool_stats stats = { 0 };
> + struct page_pool_stats stats = { };
> if (page_pool_get_stats(page_pool, &stats)) {
> /* perhaps the driver reports statistics with ethool */
> - ethtool_print_allocation_stats(&stats.alloc_stats);
> - ethtool_print_recycle_stats(&stats.recycle_stats);
> + ethtool_print_allocation_stats(u64_stats_read(&stats.alloc_stats));
> + ethtool_print_recycle_stats(u64_stats_read(&stats.recycle_stats));
The above seems like an unnecessary change, as stats.alloc_stats and
stats.recycle_stats are not really of type 'u64_stats_t'.
Otherwise, LGTM.
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next v3 3/4] page_pool: Convert page_pool_recycle_stats to u64_stats_t.
2025-04-08 12:13 ` Yunsheng Lin
@ 2025-04-09 16:11 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-04-09 16:11 UTC (permalink / raw)
To: Yunsheng Lin
Cc: linux-rdma, linux-rt-devel, netdev, David S. Miller, Andrew Lunn,
Eric Dumazet, Ilias Apalodimas, Jakub Kicinski,
Jesper Dangaard Brouer, Joe Damato, Leon Romanovsky, Paolo Abeni,
Saeed Mahameed, Simon Horman, Tariq Toukan, Thomas Gleixner
On 2025-04-08 20:13:09 [+0800], Yunsheng Lin wrote:
> > index 9d958128a57cb..5215fd51a334a 100644
> > --- a/Documentation/networking/page_pool.rst
> > +++ b/Documentation/networking/page_pool.rst
> > @@ -181,11 +181,11 @@ Stats
> >
> > #ifdef CONFIG_PAGE_POOL_STATS
> > /* retrieve stats */
> > - struct page_pool_stats stats = { 0 };
> > + struct page_pool_stats stats = { };
> > if (page_pool_get_stats(page_pool, &stats)) {
> > /* perhaps the driver reports statistics with ethool */
> > - ethtool_print_allocation_stats(&stats.alloc_stats);
> > - ethtool_print_recycle_stats(&stats.recycle_stats);
> > + ethtool_print_allocation_stats(u64_stats_read(&stats.alloc_stats));
> > + ethtool_print_recycle_stats(u64_stats_read(&stats.recycle_stats));
>
> The above seems like an unnecessary change, as stats.alloc_stats and
Right. The ethtool_print_.*() don't exist either. Let me remove that.
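A sketch of the fixed snippet (assuming the example is reworked around
the existing page_pool_ethtool_stats_get() helper; 'data' stands for the
u64 array that ethtool hands to the driver):

	#ifdef CONFIG_PAGE_POOL_STATS
		/* retrieve stats */
		struct page_pool_stats stats = { };

		if (page_pool_get_stats(page_pool, &stats))
			data = page_pool_ethtool_stats_get(data, &stats);
	#endif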
> stats.recycle_stats are not really 'u64_stats_t' type.
>
> Otherwise, LGTM.
> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
Sebastian
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH net-next v3 4/4] page_pool: Convert page_pool_alloc_stats to u64_stats_t.
2025-04-08 10:59 [PATCH net-next v3 0/4] page_pool: Convert stats to u64_stats_t Sebastian Andrzej Siewior
` (2 preceding siblings ...)
2025-04-08 10:59 ` [PATCH net-next v3 3/4] page_pool: Convert page_pool_recycle_stats to u64_stats_t Sebastian Andrzej Siewior
@ 2025-04-08 10:59 ` Sebastian Andrzej Siewior
2025-04-08 12:27 ` Yunsheng Lin
2025-04-08 12:33 ` [PATCH net-next v3 0/4] page_pool: Convert stats " Yunsheng Lin
2025-04-09 1:56 ` Jakub Kicinski
5 siblings, 1 reply; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-04-08 10:59 UTC (permalink / raw)
To: linux-rdma, linux-rt-devel, netdev
Cc: David S. Miller, Andrew Lunn, Eric Dumazet, Ilias Apalodimas,
Jakub Kicinski, Jesper Dangaard Brouer, Joe Damato,
Leon Romanovsky, Paolo Abeni, Saeed Mahameed, Simon Horman,
Tariq Toukan, Thomas Gleixner, Yunsheng Lin,
Sebastian Andrzej Siewior
Using u64 for statistics can lead to inconsistency on 32bit because an
update and a read each need to access two 32bit values.
This can be avoided by using u64_stats_t for the counters and
u64_stats_sync for the required synchronisation on 32bit platforms. The
synchronisation is a NOP on 64bit architectures.
Use u64_stats_t for the counters in page_pool_alloc_stats.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
include/net/page_pool/types.h | 14 ++++++-----
net/core/page_pool.c | 47 +++++++++++++++++++++++++----------
net/core/page_pool_user.c | 12 ++++-----
3 files changed, 48 insertions(+), 25 deletions(-)
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 54c79a020b334..9406fa69232ee 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -96,6 +96,7 @@ struct page_pool_params {
#ifdef CONFIG_PAGE_POOL_STATS
/**
* struct page_pool_alloc_stats - allocation statistics
+ * @syncp: synchronisation point for updates.
* @fast: successful fast path allocations
* @slow: slow path order-0 allocations
* @slow_high_order: slow path high order allocations
@@ -105,12 +106,13 @@ struct page_pool_params {
* the cache due to a NUMA mismatch
*/
struct page_pool_alloc_stats {
- u64 fast;
- u64 slow;
- u64 slow_high_order;
- u64 empty;
- u64 refill;
- u64 waive;
+ struct u64_stats_sync syncp;
+ u64_stats_t fast;
+ u64_stats_t slow;
+ u64_stats_t slow_high_order;
+ u64_stats_t empty;
+ u64_stats_t refill;
+ u64_stats_t waive;
};
/**
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index eb2f5b995022f..46e3c56b76692 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -45,7 +45,14 @@ static DEFINE_PER_CPU(struct page_pool_recycle_stats, pp_system_recycle_stats) =
};
/* alloc_stat_inc is intended to be used in softirq context */
-#define alloc_stat_inc(pool, __stat) (pool->alloc_stats.__stat++)
+#define alloc_stat_inc(pool, __stat) \
+ do { \
+ struct page_pool_alloc_stats *s = &pool->alloc_stats; \
+ u64_stats_update_begin(&s->syncp); \
+ u64_stats_inc(&s->__stat); \
+ u64_stats_update_end(&s->syncp); \
+ } while (0)
+
/* recycle_stat_inc is safe to use when preemption is possible. */
#define recycle_stat_inc(pool, __stat) \
do { \
@@ -91,19 +98,32 @@ static const char pp_stats[][ETH_GSTRING_LEN] = {
bool page_pool_get_stats(const struct page_pool *pool,
struct page_pool_stats *stats)
{
+ u64 fast, slow, slow_high_order, empty, refill, waive;
+ const struct page_pool_alloc_stats *alloc_stats;
unsigned int start;
int cpu = 0;
if (!stats)
return false;
+ alloc_stats = &pool->alloc_stats;
/* The caller is responsible to initialize stats. */
- stats->alloc_stats.fast += pool->alloc_stats.fast;
- stats->alloc_stats.slow += pool->alloc_stats.slow;
- stats->alloc_stats.slow_high_order += pool->alloc_stats.slow_high_order;
- stats->alloc_stats.empty += pool->alloc_stats.empty;
- stats->alloc_stats.refill += pool->alloc_stats.refill;
- stats->alloc_stats.waive += pool->alloc_stats.waive;
+ do {
+ start = u64_stats_fetch_begin(&alloc_stats->syncp);
+ fast = u64_stats_read(&alloc_stats->fast);
+ slow = u64_stats_read(&alloc_stats->slow);
+ slow_high_order = u64_stats_read(&alloc_stats->slow_high_order);
+ empty = u64_stats_read(&alloc_stats->empty);
+ refill = u64_stats_read(&alloc_stats->refill);
+ waive = u64_stats_read(&alloc_stats->waive);
+ } while (u64_stats_fetch_retry(&alloc_stats->syncp, start));
+
+ u64_stats_add(&stats->alloc_stats.fast, fast);
+ u64_stats_add(&stats->alloc_stats.slow, slow);
+ u64_stats_add(&stats->alloc_stats.slow_high_order, slow_high_order);
+ u64_stats_add(&stats->alloc_stats.empty, empty);
+ u64_stats_add(&stats->alloc_stats.refill, refill);
+ u64_stats_add(&stats->alloc_stats.waive, waive);
for_each_possible_cpu(cpu) {
u64 cached, cache_full, ring, ring_full, released_refcnt;
@@ -153,12 +173,12 @@ u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats)
{
const struct page_pool_stats *pool_stats = stats;
- *data++ = pool_stats->alloc_stats.fast;
- *data++ = pool_stats->alloc_stats.slow;
- *data++ = pool_stats->alloc_stats.slow_high_order;
- *data++ = pool_stats->alloc_stats.empty;
- *data++ = pool_stats->alloc_stats.refill;
- *data++ = pool_stats->alloc_stats.waive;
+ *data++ = u64_stats_read(&pool_stats->alloc_stats.fast);
+ *data++ = u64_stats_read(&pool_stats->alloc_stats.slow);
+ *data++ = u64_stats_read(&pool_stats->alloc_stats.slow_high_order);
+ *data++ = u64_stats_read(&pool_stats->alloc_stats.empty);
+ *data++ = u64_stats_read(&pool_stats->alloc_stats.refill);
+ *data++ = u64_stats_read(&pool_stats->alloc_stats.waive);
*data++ = u64_stats_read(&pool_stats->recycle_stats.cached);
*data++ = u64_stats_read(&pool_stats->recycle_stats.cache_full);
*data++ = u64_stats_read(&pool_stats->recycle_stats.ring);
@@ -283,6 +303,7 @@ static int page_pool_init(struct page_pool *pool,
pool->recycle_stats = &pp_system_recycle_stats;
pool->system = true;
}
+ u64_stats_init(&pool->alloc_stats.syncp);
#endif
if (ptr_ring_init(&pool->ring, ring_qsize, GFP_KERNEL) < 0) {
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 86c22461b7fed..53c1ebe7cbe6b 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -137,17 +137,17 @@ page_pool_nl_stats_fill(struct sk_buff *rsp, const struct page_pool *pool,
nla_nest_end(rsp, nest);
if (nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_FAST,
- stats.alloc_stats.fast) ||
+ u64_stats_read(&stats.alloc_stats.fast)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW,
- stats.alloc_stats.slow) ||
+ u64_stats_read(&stats.alloc_stats.slow)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW_HIGH_ORDER,
- stats.alloc_stats.slow_high_order) ||
+ u64_stats_read(&stats.alloc_stats.slow_high_order)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_EMPTY,
- stats.alloc_stats.empty) ||
+ u64_stats_read(&stats.alloc_stats.empty)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_REFILL,
- stats.alloc_stats.refill) ||
+ u64_stats_read(&stats.alloc_stats.refill)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_WAIVE,
- stats.alloc_stats.waive) ||
+ u64_stats_read(&stats.alloc_stats.waive)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHED,
u64_stats_read(&stats.recycle_stats.cached)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHE_FULL,
--
2.49.0
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH net-next v3 4/4] page_pool: Convert page_pool_alloc_stats to u64_stats_t.
2025-04-08 10:59 ` [PATCH net-next v3 4/4] page_pool: Convert page_pool_alloc_stats " Sebastian Andrzej Siewior
@ 2025-04-08 12:27 ` Yunsheng Lin
0 siblings, 0 replies; 14+ messages in thread
From: Yunsheng Lin @ 2025-04-08 12:27 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, linux-rdma, linux-rt-devel, netdev
Cc: David S. Miller, Andrew Lunn, Eric Dumazet, Ilias Apalodimas,
Jakub Kicinski, Jesper Dangaard Brouer, Joe Damato,
Leon Romanovsky, Paolo Abeni, Saeed Mahameed, Simon Horman,
Tariq Toukan, Thomas Gleixner
On 2025/4/8 18:59, Sebastian Andrzej Siewior wrote:
> Using u64 for statistics can lead to inconsistency on 32bit because an
> update and a read each need to access two 32bit values.
> This can be avoided by using u64_stats_t for the counters and
> u64_stats_sync for the required synchronisation on 32bit platforms. The
> synchronisation is a NOP on 64bit architectures.
>
> Use u64_stats_t for the counters in page_pool_alloc_stats.
It seems a little overkill for page_pool_alloc_stats, as there is only
one updater (ensured by the NAPI context), but I am not able to think of
a better way, so:
Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> include/net/page_pool/types.h | 14 ++++++-----
> net/core/page_pool.c | 47 +++++++++++++++++++++++++----------
> net/core/page_pool_user.c | 12 ++++-----
> 3 files changed, 48 insertions(+), 25 deletions(-)
>
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 54c79a020b334..9406fa69232ee 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -96,6 +96,7 @@ struct page_pool_params {
> #ifdef CONFIG_PAGE_POOL_STATS
> /**
> * struct page_pool_alloc_stats - allocation statistics
> + * @syncp: synchronisation point for updates.
> * @fast: successful fast path allocations
> * @slow: slow path order-0 allocations
> * @slow_high_order: slow path high order allocations
> @@ -105,12 +106,13 @@ struct page_pool_params {
> * the cache due to a NUMA mismatch
> */
> struct page_pool_alloc_stats {
> - u64 fast;
> - u64 slow;
> - u64 slow_high_order;
> - u64 empty;
> - u64 refill;
> - u64 waive;
> + struct u64_stats_sync syncp;
> + u64_stats_t fast;
> + u64_stats_t slow;
> + u64_stats_t slow_high_order;
> + u64_stats_t empty;
> + u64_stats_t refill;
> + u64_stats_t waive;
> };
>
> /**
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index eb2f5b995022f..46e3c56b76692 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -45,7 +45,14 @@ static DEFINE_PER_CPU(struct page_pool_recycle_stats, pp_system_recycle_stats) =
> };
>
> /* alloc_stat_inc is intended to be used in softirq context */
> -#define alloc_stat_inc(pool, __stat) (pool->alloc_stats.__stat++)
> +#define alloc_stat_inc(pool, __stat) \
> + do { \
> + struct page_pool_alloc_stats *s = &pool->alloc_stats; \
> + u64_stats_update_begin(&s->syncp); \
> + u64_stats_inc(&s->__stat); \
> + u64_stats_update_end(&s->syncp); \
> + } while (0)
> +
> /* recycle_stat_inc is safe to use when preemption is possible. */
> #define recycle_stat_inc(pool, __stat) \
> do { \
> @@ -91,19 +98,32 @@ static const char pp_stats[][ETH_GSTRING_LEN] = {
> bool page_pool_get_stats(const struct page_pool *pool,
> struct page_pool_stats *stats)
> {
> + u64 fast, slow, slow_high_order, empty, refill, waive;
> + const struct page_pool_alloc_stats *alloc_stats;
> unsigned int start;
> int cpu = 0;
>
> if (!stats)
> return false;
>
> + alloc_stats = &pool->alloc_stats;
> /* The caller is responsible to initialize stats. */
> - stats->alloc_stats.fast += pool->alloc_stats.fast;
> - stats->alloc_stats.slow += pool->alloc_stats.slow;
> - stats->alloc_stats.slow_high_order += pool->alloc_stats.slow_high_order;
> - stats->alloc_stats.empty += pool->alloc_stats.empty;
> - stats->alloc_stats.refill += pool->alloc_stats.refill;
> - stats->alloc_stats.waive += pool->alloc_stats.waive;
> + do {
> + start = u64_stats_fetch_begin(&alloc_stats->syncp);
> + fast = u64_stats_read(&alloc_stats->fast);
> + slow = u64_stats_read(&alloc_stats->slow);
> + slow_high_order = u64_stats_read(&alloc_stats->slow_high_order);
> + empty = u64_stats_read(&alloc_stats->empty);
> + refill = u64_stats_read(&alloc_stats->refill);
> + waive = u64_stats_read(&alloc_stats->waive);
> + } while (u64_stats_fetch_retry(&alloc_stats->syncp, start));
> +
> + u64_stats_add(&stats->alloc_stats.fast, fast);
> + u64_stats_add(&stats->alloc_stats.slow, slow);
> + u64_stats_add(&stats->alloc_stats.slow_high_order, slow_high_order);
> + u64_stats_add(&stats->alloc_stats.empty, empty);
> + u64_stats_add(&stats->alloc_stats.refill, refill);
> + u64_stats_add(&stats->alloc_stats.waive, waive);
>
> for_each_possible_cpu(cpu) {
> u64 cached, cache_full, ring, ring_full, released_refcnt;
> @@ -153,12 +173,12 @@ u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats)
> {
> const struct page_pool_stats *pool_stats = stats;
>
> - *data++ = pool_stats->alloc_stats.fast;
> - *data++ = pool_stats->alloc_stats.slow;
> - *data++ = pool_stats->alloc_stats.slow_high_order;
> - *data++ = pool_stats->alloc_stats.empty;
> - *data++ = pool_stats->alloc_stats.refill;
> - *data++ = pool_stats->alloc_stats.waive;
> + *data++ = u64_stats_read(&pool_stats->alloc_stats.fast);
> + *data++ = u64_stats_read(&pool_stats->alloc_stats.slow);
> + *data++ = u64_stats_read(&pool_stats->alloc_stats.slow_high_order);
> + *data++ = u64_stats_read(&pool_stats->alloc_stats.empty);
> + *data++ = u64_stats_read(&pool_stats->alloc_stats.refill);
> + *data++ = u64_stats_read(&pool_stats->alloc_stats.waive);
> *data++ = u64_stats_read(&pool_stats->recycle_stats.cached);
> *data++ = u64_stats_read(&pool_stats->recycle_stats.cache_full);
> *data++ = u64_stats_read(&pool_stats->recycle_stats.ring);
> @@ -283,6 +303,7 @@ static int page_pool_init(struct page_pool *pool,
> pool->recycle_stats = &pp_system_recycle_stats;
> pool->system = true;
> }
> + u64_stats_init(&pool->alloc_stats.syncp);
> #endif
>
> if (ptr_ring_init(&pool->ring, ring_qsize, GFP_KERNEL) < 0) {
> diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
> index 86c22461b7fed..53c1ebe7cbe6b 100644
> --- a/net/core/page_pool_user.c
> +++ b/net/core/page_pool_user.c
> @@ -137,17 +137,17 @@ page_pool_nl_stats_fill(struct sk_buff *rsp, const struct page_pool *pool,
> nla_nest_end(rsp, nest);
>
> if (nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_FAST,
> - stats.alloc_stats.fast) ||
> + u64_stats_read(&stats.alloc_stats.fast)) ||
> nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW,
> - stats.alloc_stats.slow) ||
> + u64_stats_read(&stats.alloc_stats.slow)) ||
> nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW_HIGH_ORDER,
> - stats.alloc_stats.slow_high_order) ||
> + u64_stats_read(&stats.alloc_stats.slow_high_order)) ||
> nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_EMPTY,
> - stats.alloc_stats.empty) ||
> + u64_stats_read(&stats.alloc_stats.empty)) ||
> nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_REFILL,
> - stats.alloc_stats.refill) ||
> + u64_stats_read(&stats.alloc_stats.refill)) ||
> nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_WAIVE,
> - stats.alloc_stats.waive) ||
> + u64_stats_read(&stats.alloc_stats.waive)) ||
> nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHED,
> u64_stats_read(&stats.recycle_stats.cached)) ||
> nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHE_FULL,
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next v3 0/4] page_pool: Convert stats to u64_stats_t.
2025-04-08 10:59 [PATCH net-next v3 0/4] page_pool: Convert stats to u64_stats_t Sebastian Andrzej Siewior
` (3 preceding siblings ...)
2025-04-08 10:59 ` [PATCH net-next v3 4/4] page_pool: Convert page_pool_alloc_stats " Sebastian Andrzej Siewior
@ 2025-04-08 12:33 ` Yunsheng Lin
2025-04-09 1:56 ` Jakub Kicinski
5 siblings, 0 replies; 14+ messages in thread
From: Yunsheng Lin @ 2025-04-08 12:33 UTC (permalink / raw)
To: Sebastian Andrzej Siewior, linux-rdma, linux-rt-devel, netdev
Cc: David S. Miller, Andrew Lunn, Eric Dumazet, Ilias Apalodimas,
Jakub Kicinski, Jesper Dangaard Brouer, Joe Damato,
Leon Romanovsky, Paolo Abeni, Saeed Mahameed, Simon Horman,
Tariq Toukan, Thomas Gleixner
On 2025/4/8 18:59, Sebastian Andrzej Siewior wrote:
> This is a follow-up on
> https://lore.kernel.org/all/20250213093925.x_ggH1aj@linutronix.de/
>
> to convert the page_pool statistics to u64_stats_t to avoid u64 related
> problems on 32bit architectures.
> While looking over it, the comment for recycle_stat_inc() says that it
> is safe to use in preemptible context. The 32bit update is split into
> two 32bit writes. This is "okay" if the value is observed from the current
> CPU, but cross-CPU reads may observe inconsistencies if the lower part
> overflows and the upper part is not yet written.
> I explained this and added x86-32 assembly in
> https://lore.kernel.org/all/20250226102703.3F7Aa2oK@linutronix.de/
>
> I don't know if it is ensured that only *one* update can happen because
> the stats are per-CPU and per NAPI device. But there will now be a
> warning on 32bit if this is really attempted in preemptible context.
I don't think it is ensured that only *one* update can happen, at least
not for 'ring_full' and 'released_refcnt'.
>
> The placement of the counters is not affected by this change except on
> 32bit where an additional sync member is added. For 64bit pahole output
> changes from
> | struct page_pool_recycle_stats {
> | u64 cached; /* 0 8 */
> | u64 cache_full; /* 8 8 */
> | u64 ring; /* 16 8 */
> | u64 ring_full; /* 24 8 */
> | u64 released_refcnt; /* 32 8 */
> |
> | /* size: 40, cachelines: 1, members: 5 */
> | /* last cacheline: 40 bytes */
> | };
>
> to
> | struct page_pool_recycle_stats {
> | struct u64_stats_sync syncp; /* 0 0 */
> | u64_stats_t cached; /* 0 8 */
> | u64_stats_t cache_full; /* 8 8 */
> | u64_stats_t ring; /* 16 8 */
> | u64_stats_t ring_full; /* 24 8 */
> | u64_stats_t released_refcnt; /* 32 8 */
> |
> | /* size: 40, cachelines: 1, members: 6 */
> | /* last cacheline: 40 bytes */
> | };
>
> On 32bit, struct u64_stats_sync grows by 4 bytes (plus an additional 20
> with lockdep).
>
> For bench_page_pool_simple.ko loops=600000000 I ended up with, before:
Is the above test for 64 bit arches or 32 bit arches?
I guess the added overhead is only for the 32 bit arches?
>
> | time_bench: Type:for_loop Per elem: 1 cycles(tsc) 0.501 ns (step:0)
> | time_bench: Type:atomic_inc Per elem: 6 cycles(tsc) 3.303 ns (step:0)
> | time_bench: Type:lock Per elem: 28 cycles(tsc) 14.038 ns (step:0)
> | time_bench: Type:u64_stats_inc Per elem: 1 cycles(tsc) 0.565 ns (step:0)
> | time_bench: Type:this_cpu_inc Per elem: 1 cycles(tsc) 0.503 ns (step:0)
> |
> | bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
> | time_bench: Type:no-softirq-page_pool01 Per elem: 19 cycles(tsc) 9.526 ns (step:0)
> | bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
> | time_bench: Type:no-softirq-page_pool02 Per elem: 46 cycles(tsc) 23.501 ns (step:0)
> | bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
> | time_bench: Type:no-softirq-page_pool03 Per elem: 121 cycles(tsc) 60.697 ns (step:0)
> | bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
> | bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
> | time_bench: Type:tasklet_page_pool01_fast_path Per elem: 19 cycles(tsc) 9.531 ns (step:0)
> | bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
> | time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 45 cycles(tsc) 22.594 ns (step:0)
> | bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
> | time_bench: Type:tasklet_page_pool03_slow Per elem: 123 cycles(tsc) 61.969 ns (step:0)
>
> after:
> | time_bench: Type:for_loop Per elem: 1 cycles(tsc) 0.501 ns (step:0)
> | time_bench: Type:atomic_inc Per elem: 6 cycles(tsc) 3.324 ns (step:0)
> | time_bench: Type:lock Per elem: 28 cycles(tsc) 14.038 ns (step:0)
> | time_bench: Type:u64_stats_inc Per elem: 1 cycles(tsc) 0.565 ns (step:0)
> | time_bench: Type:this_cpu_inc Per elem: 1 cycles(tsc) 0.506 ns (step:0)
> |
> | bench_page_pool_simple: time_bench_page_pool01_fast_path(): Cannot use page_pool fast-path
> | time_bench: Type:no-softirq-page_pool01 Per elem: 18 cycles(tsc) 9.028 ns (step:0)
> | bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): Cannot use page_pool fast-path
> | time_bench: Type:no-softirq-page_pool02 Per elem: 45 cycles(tsc) 22.714 ns (step:0)
> | bench_page_pool_simple: time_bench_page_pool03_slow(): Cannot use page_pool fast-path
> | time_bench: Type:no-softirq-page_pool03 Per elem: 120 cycles(tsc) 60.428 ns (step:0)
> | bench_page_pool_simple: pp_tasklet_handler(): in_serving_softirq fast-path
> | bench_page_pool_simple: time_bench_page_pool01_fast_path(): in_serving_softirq fast-path
> | time_bench: Type:tasklet_page_pool01_fast_path Per elem: 18 cycles(tsc) 9.024 ns (step:0)
> | bench_page_pool_simple: time_bench_page_pool02_ptr_ring(): in_serving_softirq fast-path
> | time_bench: Type:tasklet_page_pool02_ptr_ring Per elem: 43 cycles(tsc) 22.028 ns (step:0)
> | bench_page_pool_simple: time_bench_page_pool03_slow(): in_serving_softirq fast-path
> | time_bench: Type:tasklet_page_pool03_slow Per elem: 121 cycles(tsc) 60.736 ns (step:0)
>
> v2…v3: https://lore.kernel.org/all/20250307115722.705311-1-bigeasy@linutronix.de
> - Moved the page_pool_ethtool_stats_get_strings_mq() to the mlx5
> driver and named it mlx_page_pool_stats_get_strings_mq(). As per
> review it will remain a Mellanox only thing.
>
> v1…v2: https://lore.kernel.org/all/20250221115221.291006-1-bigeasy@linutronix.de
> - Clarified the cover mail, added stat for pahole and from bench_page_pool_simple.ko
> - Corrected page_pool_alloc_stats vs page_pool_recycle_stats type in
> the last patch.
> - Copy the counter values outside of the do {} while loop and add them
> later.
> - Redid the mlnx5 patch to make it use generic infrastructure which is
> now extended as part of this series.
>
> Sebastian Andrzej Siewior (4):
> mlnx5: Use generic code for page_pool statistics.
> page_pool: Provide an empty page_pool_stats for disabled stats.
> page_pool: Convert page_pool_recycle_stats to u64_stats_t.
> page_pool: Convert page_pool_alloc_stats to u64_stats_t.
>
> Documentation/networking/page_pool.rst | 6 +-
> .../ethernet/mellanox/mlx5/core/en_stats.c | 97 ++++++++----------
> .../ethernet/mellanox/mlx5/core/en_stats.h | 26 +----
> include/linux/u64_stats_sync.h | 5 +
> include/net/page_pool/helpers.h | 6 ++
> include/net/page_pool/types.h | 31 +++---
> net/core/page_pool.c | 99 +++++++++++++------
> net/core/page_pool_user.c | 22 ++---
> 8 files changed, 159 insertions(+), 133 deletions(-)
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next v3 0/4] page_pool: Convert stats to u64_stats_t.
2025-04-08 10:59 [PATCH net-next v3 0/4] page_pool: Convert stats to u64_stats_t Sebastian Andrzej Siewior
` (4 preceding siblings ...)
2025-04-08 12:33 ` [PATCH net-next v3 0/4] page_pool: Convert stats " Yunsheng Lin
@ 2025-04-09 1:56 ` Jakub Kicinski
2025-04-09 16:15 ` Sebastian Andrzej Siewior
5 siblings, 1 reply; 14+ messages in thread
From: Jakub Kicinski @ 2025-04-09 1:56 UTC (permalink / raw)
To: Sebastian Andrzej Siewior
Cc: linux-rt-devel, netdev, David S. Miller, Andrew Lunn,
Eric Dumazet, Ilias Apalodimas, Jesper Dangaard Brouer,
Joe Damato, Paolo Abeni, Saeed Mahameed, Simon Horman,
Tariq Toukan, Thomas Gleixner, Yunsheng Lin
On Tue, 8 Apr 2025 12:59:17 +0200 Sebastian Andrzej Siewior wrote:
> I don't know if it is ensured that only *one* update can happen because
> the stats are per-CPU and per NAPI device. But there will now be a
> warning on 32bit if this is really attempted in preemptible context.
I think recycling may happen in preemptible context, and from BH.
Have you tried to test?
The net.core.skb_defer_max sysctl may mask it for TCP traffic.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH net-next v3 0/4] page_pool: Convert stats to u64_stats_t.
2025-04-09 1:56 ` Jakub Kicinski
@ 2025-04-09 16:15 ` Sebastian Andrzej Siewior
0 siblings, 0 replies; 14+ messages in thread
From: Sebastian Andrzej Siewior @ 2025-04-09 16:15 UTC (permalink / raw)
To: Jakub Kicinski
Cc: linux-rt-devel, netdev, David S. Miller, Andrew Lunn,
Eric Dumazet, Ilias Apalodimas, Jesper Dangaard Brouer,
Joe Damato, Paolo Abeni, Saeed Mahameed, Simon Horman,
Tariq Toukan, Thomas Gleixner, Yunsheng Lin
On 2025-04-08 18:56:36 [-0700], Jakub Kicinski wrote:
> On Tue, 8 Apr 2025 12:59:17 +0200 Sebastian Andrzej Siewior wrote:
> > I don't know if it is ensured that only *one* update can happen because
> > the stats are per-CPU and per NAPI device. But there will now be a
> > warning on 32bit if this is really attempted in preemptible context.
>
> I think recycling may happen in preemptible context, and from BH.
> Have you tried to test?
Let me try to find something for testing. My mlx box has no ethernet
cable plugged in.
Yunsheng also mentioned that there might be more than one writer; that
might call for a different approach.
> The net.core.skb_defer_max sysctl may mask it for TCP traffic.
Sebastian
^ permalink raw reply [flat|nested] 14+ messages in thread