* [PATCH net-next 0/2] page_pool: Convert stats to u64_stats_t.
From: Sebastian Andrzej Siewior @ 2025-02-21 11:52 UTC
To: linux-rdma, netdev
Cc: David S. Miller, Andrew Lunn, Eric Dumazet, Ilias Apalodimas,
Jakub Kicinski, Jesper Dangaard Brouer, Leon Romanovsky,
Paolo Abeni, Saeed Mahameed, Simon Horman, Tariq Toukan,
Thomas Gleixner, Yunsheng Lin, Sebastian Andrzej Siewior
This is a follow-up on
https://lore.kernel.org/all/20250213093925.x_ggH1aj@linutronix.de/
to convert the page_pool statistics to u64_stats_t to avoid u64 related
problems on 32bit architectures.
While looking over it I noticed that the comment for recycle_stat_inc()
says it is safe to use in preemptible context. On 32bit the u64 update
is split into two 32bit writes; if the writer is preempted in the middle
and another one makes an update, the value becomes inconsistent and the
earlier update can overwrite the later one. (Rare, but possible.)
I don't know whether it is guaranteed that only *one* update can happen
at a time, given that the stats are per-CPU and per NAPI device. But
with this series there will now be a warning on 32bit if an update is
really attempted in preemptible context.
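For reference, the u64_stats pattern the series converts to looks
roughly like this (a minimal sketch with made-up names, not code from
the patches; on 64bit the sync/retry parts compile away):

	#include <linux/u64_stats_sync.h>

	struct my_stats {
		struct u64_stats_sync syncp;	/* seqcount on 32bit, empty on 64bit */
		u64_stats_t packets;
	};

	/* Writer: the begin/end pair bumps the seqcount on 32bit only. */
	static void my_stats_inc(struct my_stats *s)
	{
		u64_stats_update_begin(&s->syncp);
		u64_stats_inc(&s->packets);
		u64_stats_update_end(&s->syncp);
	}

	/* Reader: retry the snapshot if a writer was active meanwhile. */
	static u64 my_stats_get(struct my_stats *s)
	{
		unsigned int start;
		u64 packets;

		do {
			start = u64_stats_fetch_begin(&s->syncp);
			packets = u64_stats_read(&s->packets);
		} while (u64_stats_fetch_retry(&s->syncp, start));

		return packets;
	}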
Sebastian Andrzej Siewior (2):
page_pool: Convert page_pool_recycle_stats to u64_stats_t.
page_pool: Convert page_pool_alloc_stats to u64_stats_t.
Documentation/networking/page_pool.rst | 4 +-
.../ethernet/mellanox/mlx5/core/en_stats.c | 24 ++---
include/linux/u64_stats_sync.h | 5 +
include/net/page_pool/types.h | 27 +++---
net/core/page_pool.c | 95 +++++++++++++------
net/core/page_pool_user.c | 22 ++---
6 files changed, 113 insertions(+), 64 deletions(-)
--
2.47.2
* [PATCH net-next 1/2] page_pool: Convert page_pool_recycle_stats to u64_stats_t.
From: Sebastian Andrzej Siewior @ 2025-02-21 11:52 UTC
To: linux-rdma, netdev
Cc: David S. Miller, Andrew Lunn, Eric Dumazet, Ilias Apalodimas,
Jakub Kicinski, Jesper Dangaard Brouer, Leon Romanovsky,
Paolo Abeni, Saeed Mahameed, Simon Horman, Tariq Toukan,
Thomas Gleixner, Yunsheng Lin, Sebastian Andrzej Siewior
Using u64 for statistics can lead to inconsistency on 32bit because an
update and a read each require accessing two 32bit values.
This can be avoided by using u64_stats_t for the counters and
u64_stats_sync for the required synchronisation on 32bit platforms. The
synchronisation is a NOP on 64bit architectures.
Use u64_stats_t for the counters in page_pool_recycle_stats.
Add U64_STATS_ZERO, a static initializer for u64_stats_t.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
Documentation/networking/page_pool.rst | 4 +-
.../ethernet/mellanox/mlx5/core/en_stats.c | 12 ++---
include/linux/u64_stats_sync.h | 5 ++
include/net/page_pool/types.h | 13 +++--
net/core/page_pool.c | 50 +++++++++++++------
net/core/page_pool_user.c | 10 ++--
6 files changed, 61 insertions(+), 33 deletions(-)
diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst
index 9d958128a57cb..00431bc88825f 100644
--- a/Documentation/networking/page_pool.rst
+++ b/Documentation/networking/page_pool.rst
@@ -184,8 +184,8 @@ Stats
struct page_pool_stats stats = { 0 };
if (page_pool_get_stats(page_pool, &stats)) {
/* perhaps the driver reports statistics with ethool */
- ethtool_print_allocation_stats(&stats.alloc_stats);
- ethtool_print_recycle_stats(&stats.recycle_stats);
+ ethtool_print_allocation_stats(u64_stats_read(&stats.alloc_stats));
+ ethtool_print_recycle_stats(u64_stats_read(&stats.recycle_stats));
}
#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 611ec4b6f3709..baff961970f25 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -501,7 +501,7 @@ static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
{
struct mlx5e_rq_stats *rq_stats = c->rq.stats;
struct page_pool *pool = c->rq.page_pool;
- struct page_pool_stats stats = { 0 };
+ struct page_pool_stats stats = { };
if (!page_pool_get_stats(pool, &stats))
return;
@@ -513,11 +513,11 @@ static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
rq_stats->pp_alloc_waive = stats.alloc_stats.waive;
rq_stats->pp_alloc_refill = stats.alloc_stats.refill;
- rq_stats->pp_recycle_cached = stats.recycle_stats.cached;
- rq_stats->pp_recycle_cache_full = stats.recycle_stats.cache_full;
- rq_stats->pp_recycle_ring = stats.recycle_stats.ring;
- rq_stats->pp_recycle_ring_full = stats.recycle_stats.ring_full;
- rq_stats->pp_recycle_released_ref = stats.recycle_stats.released_refcnt;
+ rq_stats->pp_recycle_cached = u64_stats_read(&stats.recycle_stats.cached);
+ rq_stats->pp_recycle_cache_full = u64_stats_read(&stats.recycle_stats.cache_full);
+ rq_stats->pp_recycle_ring = u64_stats_read(&stats.recycle_stats.ring);
+ rq_stats->pp_recycle_ring_full = u64_stats_read(&stats.recycle_stats.ring_full);
+ rq_stats->pp_recycle_released_ref = u64_stats_read(&stats.recycle_stats.released_refcnt);
}
#else
static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
index 457879938fc19..086bd4a51cfe9 100644
--- a/include/linux/u64_stats_sync.h
+++ b/include/linux/u64_stats_sync.h
@@ -94,6 +94,8 @@ static inline void u64_stats_inc(u64_stats_t *p)
local64_inc(&p->v);
}
+#define U64_STATS_ZERO(_member, _name) {}
+
static inline void u64_stats_init(struct u64_stats_sync *syncp) { }
static inline void __u64_stats_update_begin(struct u64_stats_sync *syncp) { }
static inline void __u64_stats_update_end(struct u64_stats_sync *syncp) { }
@@ -141,6 +143,9 @@ static inline void u64_stats_inc(u64_stats_t *p)
seqcount_init(&__s->seq); \
} while (0)
+#define U64_STATS_ZERO(_member, _name) \
+ _member.seq = SEQCNT_ZERO(#_name#_member.seq)
+
static inline void __u64_stats_update_begin(struct u64_stats_sync *syncp)
{
preempt_disable_nested();
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 7f405672b089d..c5ad80a542b7d 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -6,6 +6,7 @@
#include <linux/dma-direction.h>
#include <linux/ptr_ring.h>
#include <linux/types.h>
+#include <linux/u64_stats_sync.h>
#include <net/netmem.h>
#define PP_FLAG_DMA_MAP BIT(0) /* Should page_pool do the DMA
@@ -114,6 +115,7 @@ struct page_pool_alloc_stats {
/**
* struct page_pool_recycle_stats - recycling (freeing) statistics
+ * @syncp: synchronisation point for updates.
* @cached: recycling placed page in the page pool cache
* @cache_full: page pool cache was full
* @ring: page placed into the ptr ring
@@ -121,11 +123,12 @@ struct page_pool_alloc_stats {
* @released_refcnt: page released (and not recycled) because refcnt > 1
*/
struct page_pool_recycle_stats {
- u64 cached;
- u64 cache_full;
- u64 ring;
- u64 ring_full;
- u64 released_refcnt;
+ struct u64_stats_sync syncp;
+ u64_stats_t cached;
+ u64_stats_t cache_full;
+ u64_stats_t ring;
+ u64_stats_t ring_full;
+ u64_stats_t released_refcnt;
};
/**
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index f5e908c9e7ad8..36fa14a1e8441 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -37,21 +37,27 @@ DEFINE_STATIC_KEY_FALSE(page_pool_mem_providers);
#define BIAS_MAX (LONG_MAX >> 1)
#ifdef CONFIG_PAGE_POOL_STATS
-static DEFINE_PER_CPU(struct page_pool_recycle_stats, pp_system_recycle_stats);
+static DEFINE_PER_CPU(struct page_pool_recycle_stats, pp_system_recycle_stats) = {
+ U64_STATS_ZERO(.syncp, pp_system_recycle_stats),
+};
/* alloc_stat_inc is intended to be used in softirq context */
#define alloc_stat_inc(pool, __stat) (pool->alloc_stats.__stat++)
/* recycle_stat_inc is safe to use when preemption is possible. */
#define recycle_stat_inc(pool, __stat) \
do { \
- struct page_pool_recycle_stats __percpu *s = pool->recycle_stats; \
- this_cpu_inc(s->__stat); \
+ struct page_pool_recycle_stats *s = this_cpu_ptr(pool->recycle_stats); \
+ u64_stats_update_begin(&s->syncp); \
+ u64_stats_inc(&s->__stat); \
+ u64_stats_update_end(&s->syncp); \
} while (0)
#define recycle_stat_add(pool, __stat, val) \
do { \
- struct page_pool_recycle_stats __percpu *s = pool->recycle_stats; \
- this_cpu_add(s->__stat, val); \
+ struct page_pool_recycle_stats *s = this_cpu_ptr(pool->recycle_stats); \
+ u64_stats_update_begin(&s->syncp); \
+ u64_stats_add(&s->__stat, val); \
+ u64_stats_update_end(&s->syncp); \
} while (0)
static const char pp_stats[][ETH_GSTRING_LEN] = {
@@ -82,6 +88,7 @@ static const char pp_stats[][ETH_GSTRING_LEN] = {
bool page_pool_get_stats(const struct page_pool *pool,
struct page_pool_stats *stats)
{
+ unsigned int start;
int cpu = 0;
if (!stats)
@@ -99,11 +106,19 @@ bool page_pool_get_stats(const struct page_pool *pool,
const struct page_pool_recycle_stats *pcpu =
per_cpu_ptr(pool->recycle_stats, cpu);
- stats->recycle_stats.cached += pcpu->cached;
- stats->recycle_stats.cache_full += pcpu->cache_full;
- stats->recycle_stats.ring += pcpu->ring;
- stats->recycle_stats.ring_full += pcpu->ring_full;
- stats->recycle_stats.released_refcnt += pcpu->released_refcnt;
+ do {
+ start = u64_stats_fetch_begin(&pcpu->syncp);
+ u64_stats_add(&stats->recycle_stats.cached,
+ u64_stats_read(&pcpu->cached));
+ u64_stats_add(&stats->recycle_stats.cache_full,
+ u64_stats_read(&pcpu->cache_full));
+ u64_stats_add(&stats->recycle_stats.ring,
+ u64_stats_read(&pcpu->ring));
+ u64_stats_add(&stats->recycle_stats.ring_full,
+ u64_stats_read(&pcpu->ring_full));
+ u64_stats_add(&stats->recycle_stats.released_refcnt,
+ u64_stats_read(&pcpu->released_refcnt));
+ } while (u64_stats_fetch_retry(&pcpu->syncp, start));
}
return true;
@@ -139,11 +154,11 @@ u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats)
*data++ = pool_stats->alloc_stats.empty;
*data++ = pool_stats->alloc_stats.refill;
*data++ = pool_stats->alloc_stats.waive;
- *data++ = pool_stats->recycle_stats.cached;
- *data++ = pool_stats->recycle_stats.cache_full;
- *data++ = pool_stats->recycle_stats.ring;
- *data++ = pool_stats->recycle_stats.ring_full;
- *data++ = pool_stats->recycle_stats.released_refcnt;
+ *data++ = u64_stats_read(&pool_stats->recycle_stats.cached);
+ *data++ = u64_stats_read(&pool_stats->recycle_stats.cache_full);
+ *data++ = u64_stats_read(&pool_stats->recycle_stats.ring);
+ *data++ = u64_stats_read(&pool_stats->recycle_stats.ring_full);
+ *data++ = u64_stats_read(&pool_stats->recycle_stats.released_refcnt);
return data;
}
@@ -247,9 +262,14 @@ static int page_pool_init(struct page_pool *pool,
#ifdef CONFIG_PAGE_POOL_STATS
if (!(pool->slow.flags & PP_FLAG_SYSTEM_POOL)) {
+ unsigned int cpu;
+
pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats);
if (!pool->recycle_stats)
return -ENOMEM;
+
+ for_each_possible_cpu(cpu)
+ u64_stats_init(&per_cpu_ptr(pool->recycle_stats, cpu)->syncp);
} else {
/* For system page pool instance we use a singular stats object
* instead of allocating a separate percpu variable for each
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 6677e0c2e2565..0d038c0c8996d 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -149,15 +149,15 @@ page_pool_nl_stats_fill(struct sk_buff *rsp, const struct page_pool *pool,
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_WAIVE,
stats.alloc_stats.waive) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHED,
- stats.recycle_stats.cached) ||
+ u64_stats_read(&stats.recycle_stats.cached)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHE_FULL,
- stats.recycle_stats.cache_full) ||
+ u64_stats_read(&stats.recycle_stats.cache_full)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING,
- stats.recycle_stats.ring) ||
+ u64_stats_read(&stats.recycle_stats.ring)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_RING_FULL,
- stats.recycle_stats.ring_full) ||
+ u64_stats_read(&stats.recycle_stats.ring_full)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_RELEASED_REFCNT,
- stats.recycle_stats.released_refcnt))
+ u64_stats_read(&stats.recycle_stats.released_refcnt)))
goto err_cancel_msg;
genlmsg_end(rsp, hdr);
--
2.47.2
* [PATCH net-next 2/2] page_pool: Convert page_pool_alloc_stats to u64_stats_t.
From: Sebastian Andrzej Siewior @ 2025-02-21 11:52 UTC
To: linux-rdma, netdev
Cc: David S. Miller, Andrew Lunn, Eric Dumazet, Ilias Apalodimas,
Jakub Kicinski, Jesper Dangaard Brouer, Leon Romanovsky,
Paolo Abeni, Saeed Mahameed, Simon Horman, Tariq Toukan,
Thomas Gleixner, Yunsheng Lin, Sebastian Andrzej Siewior
Using u64 for statistics can lead to inconsistency on 32bit because an
update and a read each require accessing two 32bit values.
This can be avoided by using u64_stats_t for the counters and
u64_stats_sync for the required synchronisation on 32bit platforms. The
synchronisation is a NOP on 64bit architectures.
Use u64_stats_t for the counters in page_pool_recycle_stats.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
.../ethernet/mellanox/mlx5/core/en_stats.c | 12 ++---
include/net/page_pool/types.h | 14 +++---
net/core/page_pool.c | 45 +++++++++++++------
net/core/page_pool_user.c | 12 ++---
4 files changed, 52 insertions(+), 31 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index baff961970f25..afb5c135b68c1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -506,12 +506,12 @@ static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
if (!page_pool_get_stats(pool, &stats))
return;
- rq_stats->pp_alloc_fast = stats.alloc_stats.fast;
- rq_stats->pp_alloc_slow = stats.alloc_stats.slow;
- rq_stats->pp_alloc_slow_high_order = stats.alloc_stats.slow_high_order;
- rq_stats->pp_alloc_empty = stats.alloc_stats.empty;
- rq_stats->pp_alloc_waive = stats.alloc_stats.waive;
- rq_stats->pp_alloc_refill = stats.alloc_stats.refill;
+ rq_stats->pp_alloc_fast = u64_stats_read(&stats.alloc_stats.fast);
+ rq_stats->pp_alloc_slow = u64_stats_read(&stats.alloc_stats.slow);
+ rq_stats->pp_alloc_slow_high_order = u64_stats_read(&stats.alloc_stats.slow_high_order);
+ rq_stats->pp_alloc_empty = u64_stats_read(&stats.alloc_stats.empty);
+ rq_stats->pp_alloc_waive = u64_stats_read(&stats.alloc_stats.waive);
+ rq_stats->pp_alloc_refill = u64_stats_read(&stats.alloc_stats.refill);
rq_stats->pp_recycle_cached = u64_stats_read(&stats.recycle_stats.cached);
rq_stats->pp_recycle_cache_full = u64_stats_read(&stats.recycle_stats.cache_full);
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index c5ad80a542b7d..f45d55e6e8643 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -96,6 +96,7 @@ struct page_pool_params {
#ifdef CONFIG_PAGE_POOL_STATS
/**
* struct page_pool_alloc_stats - allocation statistics
+ * @syncp: synchronisation point for updates.
* @fast: successful fast path allocations
* @slow: slow path order-0 allocations
* @slow_high_order: slow path high order allocations
@@ -105,12 +106,13 @@ struct page_pool_params {
* the cache due to a NUMA mismatch
*/
struct page_pool_alloc_stats {
- u64 fast;
- u64 slow;
- u64 slow_high_order;
- u64 empty;
- u64 refill;
- u64 waive;
+ struct u64_stats_sync syncp;
+ u64_stats_t fast;
+ u64_stats_t slow;
+ u64_stats_t slow_high_order;
+ u64_stats_t empty;
+ u64_stats_t refill;
+ u64_stats_t waive;
};
/**
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 36fa14a1e8441..d69a03609613b 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -42,7 +42,14 @@ static DEFINE_PER_CPU(struct page_pool_recycle_stats, pp_system_recycle_stats) =
};
/* alloc_stat_inc is intended to be used in softirq context */
-#define alloc_stat_inc(pool, __stat) (pool->alloc_stats.__stat++)
+#define alloc_stat_inc(pool, __stat) \
+ do { \
+ struct page_pool_alloc_stats *s = &pool->alloc_stats; \
+ u64_stats_update_begin(&s->syncp); \
+ u64_stats_inc(&s->__stat); \
+ u64_stats_update_end(&s->syncp); \
+ } while (0)
+
/* recycle_stat_inc is safe to use when preemption is possible. */
#define recycle_stat_inc(pool, __stat) \
do { \
@@ -88,19 +95,30 @@ static const char pp_stats[][ETH_GSTRING_LEN] = {
bool page_pool_get_stats(const struct page_pool *pool,
struct page_pool_stats *stats)
{
+ const struct page_pool_alloc_stats *alloc_stats;
unsigned int start;
int cpu = 0;
if (!stats)
return false;
+ alloc_stats = &pool->alloc_stats;
/* The caller is responsible to initialize stats. */
- stats->alloc_stats.fast += pool->alloc_stats.fast;
- stats->alloc_stats.slow += pool->alloc_stats.slow;
- stats->alloc_stats.slow_high_order += pool->alloc_stats.slow_high_order;
- stats->alloc_stats.empty += pool->alloc_stats.empty;
- stats->alloc_stats.refill += pool->alloc_stats.refill;
- stats->alloc_stats.waive += pool->alloc_stats.waive;
+ do {
+ start = u64_stats_fetch_begin(&alloc_stats->syncp);
+ u64_stats_add(&stats->alloc_stats.fast,
+ u64_stats_read(&alloc_stats->fast));
+ u64_stats_add(&stats->alloc_stats.slow,
+ u64_stats_read(&alloc_stats->slow));
+ u64_stats_add(&stats->alloc_stats.slow_high_order,
+ u64_stats_read(&alloc_stats->slow_high_order));
+ u64_stats_add(&stats->alloc_stats.empty,
+ u64_stats_read(&alloc_stats->empty));
+ u64_stats_add(&stats->alloc_stats.refill,
+ u64_stats_read(&alloc_stats->refill));
+ u64_stats_add(&stats->alloc_stats.waive,
+ u64_stats_read(&alloc_stats->waive));
+ } while (u64_stats_fetch_retry(&alloc_stats->syncp, start));
for_each_possible_cpu(cpu) {
const struct page_pool_recycle_stats *pcpu =
@@ -148,12 +166,12 @@ u64 *page_pool_ethtool_stats_get(u64 *data, const void *stats)
{
const struct page_pool_stats *pool_stats = stats;
- *data++ = pool_stats->alloc_stats.fast;
- *data++ = pool_stats->alloc_stats.slow;
- *data++ = pool_stats->alloc_stats.slow_high_order;
- *data++ = pool_stats->alloc_stats.empty;
- *data++ = pool_stats->alloc_stats.refill;
- *data++ = pool_stats->alloc_stats.waive;
+ *data++ = u64_stats_read(&pool_stats->alloc_stats.fast);
+ *data++ = u64_stats_read(&pool_stats->alloc_stats.slow);
+ *data++ = u64_stats_read(&pool_stats->alloc_stats.slow_high_order);
+ *data++ = u64_stats_read(&pool_stats->alloc_stats.empty);
+ *data++ = u64_stats_read(&pool_stats->alloc_stats.refill);
+ *data++ = u64_stats_read(&pool_stats->alloc_stats.waive);
*data++ = u64_stats_read(&pool_stats->recycle_stats.cached);
*data++ = u64_stats_read(&pool_stats->recycle_stats.cache_full);
*data++ = u64_stats_read(&pool_stats->recycle_stats.ring);
@@ -278,6 +296,7 @@ static int page_pool_init(struct page_pool *pool,
pool->recycle_stats = &pp_system_recycle_stats;
pool->system = true;
}
+ u64_stats_init(&pool->alloc_stats.syncp);
#endif
if (ptr_ring_init(&pool->ring, ring_qsize, GFP_KERNEL) < 0) {
diff --git a/net/core/page_pool_user.c b/net/core/page_pool_user.c
index 0d038c0c8996d..c368cb141147f 100644
--- a/net/core/page_pool_user.c
+++ b/net/core/page_pool_user.c
@@ -137,17 +137,17 @@ page_pool_nl_stats_fill(struct sk_buff *rsp, const struct page_pool *pool,
nla_nest_end(rsp, nest);
if (nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_FAST,
- stats.alloc_stats.fast) ||
+ u64_stats_read(&stats.alloc_stats.fast)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW,
- stats.alloc_stats.slow) ||
+ u64_stats_read(&stats.alloc_stats.slow)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_SLOW_HIGH_ORDER,
- stats.alloc_stats.slow_high_order) ||
+ u64_stats_read(&stats.alloc_stats.slow_high_order)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_EMPTY,
- stats.alloc_stats.empty) ||
+ u64_stats_read(&stats.alloc_stats.empty)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_REFILL,
- stats.alloc_stats.refill) ||
+ u64_stats_read(&stats.alloc_stats.refill)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_ALLOC_WAIVE,
- stats.alloc_stats.waive) ||
+ u64_stats_read(&stats.alloc_stats.waive)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHED,
u64_stats_read(&stats.recycle_stats.cached)) ||
nla_put_uint(rsp, NETDEV_A_PAGE_POOL_STATS_RECYCLE_CACHE_FULL,
--
2.47.2
* Re: [PATCH net-next 0/2] page_pool: Convert stats to u64_stats_t.
From: Joe Damato @ 2025-02-21 17:10 UTC
To: Sebastian Andrzej Siewior
Cc: linux-rdma, netdev, David S. Miller, Andrew Lunn, Eric Dumazet,
Ilias Apalodimas, Jakub Kicinski, Jesper Dangaard Brouer,
Leon Romanovsky, Paolo Abeni, Saeed Mahameed, Simon Horman,
Tariq Toukan, Thomas Gleixner, Yunsheng Lin
On Fri, Feb 21, 2025 at 12:52:19PM +0100, Sebastian Andrzej Siewior wrote:
> This is a follow-up on
> https://lore.kernel.org/all/20250213093925.x_ggH1aj@linutronix.de/
>
> to convert the page_pool statistics to u64_stats_t to avoid u64 related
> problems on 32bit architectures.
> While looking over it I noticed that the comment for
> recycle_stat_inc() says it is safe to use in preemptible context.
I wrote that comment because it's an increment of a per-cpu counter.
The documentation in Documentation/core-api/this_cpu_ops.rst
explains in more depth, but this_cpu_inc is safe to use without
worrying about pre-emption and interrupts.
> On 32bit the u64 update is split into two 32bit writes; if the writer
> is preempted in the middle and another one makes an update, the value
> becomes inconsistent and the earlier update can overwrite the later
> one. (Rare, but possible.)
Have you seen this? Can you show the generated assembly which
suggests that this occurs? It would be helpful if you could show the
before and after 32-bit assembly code.
I am asking because in arch/x86/include/asm/percpu.h a lot of care
is taken to generate the correct assembly for various sizes and I
am skeptical that this_cpu_inc behaves correctly on 64bit but
incorrectly on 32bit x86. It's certainly possible, but IMHO, we
should be sure that this is the case.
If you could show that the generated assembly on 32bit was not
preempt/irq safe then probably we'd also want to update the
this_cpu_ops documentation?
> I don't know whether it is guaranteed that only *one* update can
> happen at a time, given that the stats are per-CPU and per NAPI
> device. But with this series there will now be a warning on 32bit if
> an update is really attempted in preemptible context.
Please see Documentation/core-api/this_cpu_ops.rst for a more
detailed explanation.
At a high level, only one per-cpu counter is incremented. The
individual per-cpu counters don't mean anything on their own
(because the increment could happen on any CPU); the sum of the
values is what has meaning.
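Roughly, the model is this (illustrative sketch with a made-up counter,
not the page_pool code):

	/* One instance of the counter per CPU. */
	static DEFINE_PER_CPU(u64, pkt_count);

	static void count_one(void)
	{
		/* Updates the CPU-local copy; the caller needs no explicit
		 * preempt/irq protection.
		 */
		this_cpu_inc(pkt_count);
	}

	/* Only the sum over all CPUs is meaningful. */
	static u64 count_total(void)
	{
		u64 sum = 0;
		int cpu;

		for_each_possible_cpu(cpu)
			sum += *per_cpu_ptr(&pkt_count, cpu);

		return sum;
	}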
* Re: [PATCH net-next 1/2] page_pool: Convert page_pool_recycle_stats to u64_stats_t.
From: Joe Damato @ 2025-02-21 17:21 UTC
To: Sebastian Andrzej Siewior
Cc: linux-rdma, netdev, David S. Miller, Andrew Lunn, Eric Dumazet,
Ilias Apalodimas, Jakub Kicinski, Jesper Dangaard Brouer,
Leon Romanovsky, Paolo Abeni, Saeed Mahameed, Simon Horman,
Tariq Toukan, Thomas Gleixner, Yunsheng Lin
On Fri, Feb 21, 2025 at 12:52:20PM +0100, Sebastian Andrzej Siewior wrote:
> Using u64 for statistics can lead to inconsistency on 32bit because an
> update and a read each require accessing two 32bit values.
> This can be avoided by using u64_stats_t for the counters and
> u64_stats_sync for the required synchronisation on 32bit platforms. The
> synchronisation is a NOP on 64bit architectures.
As mentioned in my response to the cover letter, I'd want to see
before/after 32bit assembly to ensure that this assertion is
correct.
[...]
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> index 611ec4b6f3709..baff961970f25 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> @@ -501,7 +501,7 @@ static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
> {
> struct mlx5e_rq_stats *rq_stats = c->rq.stats;
> struct page_pool *pool = c->rq.page_pool;
> - struct page_pool_stats stats = { 0 };
> + struct page_pool_stats stats = { };
>
> if (!page_pool_get_stats(pool, &stats))
> return;
> @@ -513,11 +513,11 @@ static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
> rq_stats->pp_alloc_waive = stats.alloc_stats.waive;
> rq_stats->pp_alloc_refill = stats.alloc_stats.refill;
>
> - rq_stats->pp_recycle_cached = stats.recycle_stats.cached;
> - rq_stats->pp_recycle_cache_full = stats.recycle_stats.cache_full;
> - rq_stats->pp_recycle_ring = stats.recycle_stats.ring;
> - rq_stats->pp_recycle_ring_full = stats.recycle_stats.ring_full;
> - rq_stats->pp_recycle_released_ref = stats.recycle_stats.released_refcnt;
> + rq_stats->pp_recycle_cached = u64_stats_read(&stats.recycle_stats.cached);
> + rq_stats->pp_recycle_cache_full = u64_stats_read(&stats.recycle_stats.cache_full);
> + rq_stats->pp_recycle_ring = u64_stats_read(&stats.recycle_stats.ring);
> + rq_stats->pp_recycle_ring_full = u64_stats_read(&stats.recycle_stats.ring_full);
> + rq_stats->pp_recycle_released_ref = u64_stats_read(&stats.recycle_stats.released_refcnt);
> }
> #else
> static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
It might be better to convert mlx5 to
page_pool_ethtool_stats_get_strings and
page_pool_ethtool_stats_get_count instead?
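The helpers typically get wired into the ethtool callbacks along these
lines (a rough sketch of the usual pattern, similar to what veth does,
not actual mlx5 code):

	/* .get_sset_count */
	count += page_pool_ethtool_stats_get_count();

	/* .get_strings */
	data = page_pool_ethtool_stats_get_strings(data);

	/* .get_ethtool_stats, with pp_stats filled via page_pool_get_stats() */
	data = page_pool_ethtool_stats_get(data, &pp_stats);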
* Re: [PATCH net-next 2/2] page_pool: Convert page_pool_alloc_stats to u64_stats_t.
From: Joe Damato @ 2025-02-21 17:30 UTC
To: Sebastian Andrzej Siewior
Cc: linux-rdma, netdev, David S. Miller, Andrew Lunn, Eric Dumazet,
Ilias Apalodimas, Jakub Kicinski, Jesper Dangaard Brouer,
Leon Romanovsky, Paolo Abeni, Saeed Mahameed, Simon Horman,
Tariq Toukan, Thomas Gleixner, Yunsheng Lin
On Fri, Feb 21, 2025 at 12:52:21PM +0100, Sebastian Andrzej Siewior wrote:
> Using u64 for statistics can lead to inconsistency on 32bit because an
> update and a read each require accessing two 32bit values.
> This can be avoided by using u64_stats_t for the counters and
> u64_stats_sync for the required synchronisation on 32bit platforms. The
> synchronisation is a NOP on 64bit architectures.
Same as in previous messages: I'd want to see clearly that this is
indeed an issue on 32bit systems, with before/after assembly.
> Use u64_stats_t for the counters in page_pool_recycle_stats.
Commit message says page_pool_recycle_stats, but code below is for
alloc stats.
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> .../ethernet/mellanox/mlx5/core/en_stats.c | 12 ++---
> include/net/page_pool/types.h | 14 +++---
> net/core/page_pool.c | 45 +++++++++++++------
> net/core/page_pool_user.c | 12 ++---
> 4 files changed, 52 insertions(+), 31 deletions(-)
[...]
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -96,6 +96,7 @@ struct page_pool_params {
> #ifdef CONFIG_PAGE_POOL_STATS
> /**
> * struct page_pool_alloc_stats - allocation statistics
> + * @syncp: synchronisation point for updates.
> * @fast: successful fast path allocations
> * @slow: slow path order-0 allocations
> * @slow_high_order: slow path high order allocations
> @@ -105,12 +106,13 @@ struct page_pool_params {
> * the cache due to a NUMA mismatch
> */
> struct page_pool_alloc_stats {
> - u64 fast;
> - u64 slow;
> - u64 slow_high_order;
> - u64 empty;
> - u64 refill;
> - u64 waive;
> + struct u64_stats_sync syncp;
> + u64_stats_t fast;
> + u64_stats_t slow;
> + u64_stats_t slow_high_order;
> + u64_stats_t empty;
> + u64_stats_t refill;
> + u64_stats_t waive;
> };
When I tried to get this in initially, Jesper had feelings about the
cacheline placement of the counters. I have no idea if that is still
the case or not.
My suggestion to you (assuming that your initial assertion is
correct that this_cpu_inc isn't safe on 32bit x86) would be to:
- include pahole output showing the placement of these counters
- include the same benchmarks I included in the original series
[1] that Jesper requested from me. I believe the code for the
benchmarks can be found here:
https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/lib
That would probably make it easier for the page pool people to
review / ack and would likely result in fewer revisions.
[1]: https://lore.kernel.org/all/1646172610-129397-1-git-send-email-jdamato@fastly.com/
* Re: [PATCH net-next 1/2] page_pool: Convert page_pool_recycle_stats to u64_stats_t.
From: Yunsheng Lin @ 2025-02-22 8:13 UTC
To: Sebastian Andrzej Siewior, linux-rdma, netdev
Cc: David S. Miller, Andrew Lunn, Eric Dumazet, Ilias Apalodimas,
Jakub Kicinski, Jesper Dangaard Brouer, Leon Romanovsky,
Paolo Abeni, Saeed Mahameed, Simon Horman, Tariq Toukan,
Thomas Gleixner
On 2025/2/21 19:52, Sebastian Andrzej Siewior wrote:
>
> static const char pp_stats[][ETH_GSTRING_LEN] = {
> @@ -82,6 +88,7 @@ static const char pp_stats[][ETH_GSTRING_LEN] = {
> bool page_pool_get_stats(const struct page_pool *pool,
> struct page_pool_stats *stats)
> {
> + unsigned int start;
> int cpu = 0;
>
> if (!stats)
> @@ -99,11 +106,19 @@ bool page_pool_get_stats(const struct page_pool *pool,
> const struct page_pool_recycle_stats *pcpu =
> per_cpu_ptr(pool->recycle_stats, cpu);
>
> - stats->recycle_stats.cached += pcpu->cached;
> - stats->recycle_stats.cache_full += pcpu->cache_full;
> - stats->recycle_stats.ring += pcpu->ring;
> - stats->recycle_stats.ring_full += pcpu->ring_full;
> - stats->recycle_stats.released_refcnt += pcpu->released_refcnt;
> + do {
> + start = u64_stats_fetch_begin(&pcpu->syncp);
> + u64_stats_add(&stats->recycle_stats.cached,
> + u64_stats_read(&pcpu->cached));
> + u64_stats_add(&stats->recycle_stats.cache_full,
> + u64_stats_read(&pcpu->cache_full));
> + u64_stats_add(&stats->recycle_stats.ring,
> + u64_stats_read(&pcpu->ring));
> + u64_stats_add(&stats->recycle_stats.ring_full,
> + u64_stats_read(&pcpu->ring_full));
> + u64_stats_add(&stats->recycle_stats.released_refcnt,
> + u64_stats_read(&pcpu->released_refcnt));
It seems the above u64_stats_add() may be called more than once
if the below u64_stats_fetch_retry() returns true, which might mean
the stats are added more times than needed.
It seems more correct to me to read the per-CPU stats into local
variables in the while loop and then do the addition outside
the while loop?
> + } while (u64_stats_fetch_retry(&pcpu->syncp, start));
> }
>
> return true;
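I.e. something like this (untested sketch):

	u64 cached, cache_full, ring, ring_full, released_refcnt;

	do {
		start = u64_stats_fetch_begin(&pcpu->syncp);
		cached = u64_stats_read(&pcpu->cached);
		cache_full = u64_stats_read(&pcpu->cache_full);
		ring = u64_stats_read(&pcpu->ring);
		ring_full = u64_stats_read(&pcpu->ring_full);
		released_refcnt = u64_stats_read(&pcpu->released_refcnt);
	} while (u64_stats_fetch_retry(&pcpu->syncp, start));

	u64_stats_add(&stats->recycle_stats.cached, cached);
	u64_stats_add(&stats->recycle_stats.cache_full, cache_full);
	u64_stats_add(&stats->recycle_stats.ring, ring);
	u64_stats_add(&stats->recycle_stats.ring_full, ring_full);
	u64_stats_add(&stats->recycle_stats.released_refcnt, released_refcnt);

That way a retry only repeats the snapshot and nothing is accumulated
twice.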
* Re: [PATCH net-next 2/2] page_pool: Convert page_pool_alloc_stats to u64_stats_t.
From: Yunsheng Lin @ 2025-02-22 8:13 UTC
To: Sebastian Andrzej Siewior, linux-rdma, netdev
Cc: David S. Miller, Andrew Lunn, Eric Dumazet, Ilias Apalodimas,
Jakub Kicinski, Jesper Dangaard Brouer, Leon Romanovsky,
Paolo Abeni, Saeed Mahameed, Simon Horman, Tariq Toukan,
Thomas Gleixner
On 2025/2/21 19:52, Sebastian Andrzej Siewior wrote:
...
> @@ -88,19 +95,30 @@ static const char pp_stats[][ETH_GSTRING_LEN] = {
> bool page_pool_get_stats(const struct page_pool *pool,
> struct page_pool_stats *stats)
> {
> + const struct page_pool_alloc_stats *alloc_stats;
> unsigned int start;
> int cpu = 0;
>
> if (!stats)
> return false;
>
> + alloc_stats = &pool->alloc_stats;
> /* The caller is responsible to initialize stats. */
> - stats->alloc_stats.fast += pool->alloc_stats.fast;
> - stats->alloc_stats.slow += pool->alloc_stats.slow;
> - stats->alloc_stats.slow_high_order += pool->alloc_stats.slow_high_order;
> - stats->alloc_stats.empty += pool->alloc_stats.empty;
> - stats->alloc_stats.refill += pool->alloc_stats.refill;
> - stats->alloc_stats.waive += pool->alloc_stats.waive;
> + do {
> + start = u64_stats_fetch_begin(&alloc_stats->syncp);
> + u64_stats_add(&stats->alloc_stats.fast,
> + u64_stats_read(&alloc_stats->fast));
> + u64_stats_add(&stats->alloc_stats.slow,
> + u64_stats_read(&alloc_stats->slow));
> + u64_stats_add(&stats->alloc_stats.slow_high_order,
> + u64_stats_read(&alloc_stats->slow_high_order));
> + u64_stats_add(&stats->alloc_stats.empty,
> + u64_stats_read(&alloc_stats->empty));
> + u64_stats_add(&stats->alloc_stats.refill,
> + u64_stats_read(&alloc_stats->refill));
> + u64_stats_add(&stats->alloc_stats.waive,
> + u64_stats_read(&alloc_stats->waive));
similar comment as patch 1.
> + } while (u64_stats_fetch_retry(&alloc_stats->syncp, start));
>
* Re: [PATCH net-next 1/2] page_pool: Convert page_pool_recycle_stats to u64_stats_t.
From: Paolo Abeni @ 2025-02-25 11:27 UTC
To: Yunsheng Lin, Sebastian Andrzej Siewior, linux-rdma, netdev
Cc: David S. Miller, Andrew Lunn, Eric Dumazet, Ilias Apalodimas,
Jakub Kicinski, Jesper Dangaard Brouer, Leon Romanovsky,
Saeed Mahameed, Simon Horman, Tariq Toukan, Thomas Gleixner
On 2/22/25 9:13 AM, Yunsheng Lin wrote:
> On 2025/2/21 19:52, Sebastian Andrzej Siewior wrote:
>> @@ -99,11 +106,19 @@ bool page_pool_get_stats(const struct page_pool *pool,
>> const struct page_pool_recycle_stats *pcpu =
>> per_cpu_ptr(pool->recycle_stats, cpu);
>>
>> - stats->recycle_stats.cached += pcpu->cached;
>> - stats->recycle_stats.cache_full += pcpu->cache_full;
>> - stats->recycle_stats.ring += pcpu->ring;
>> - stats->recycle_stats.ring_full += pcpu->ring_full;
>> - stats->recycle_stats.released_refcnt += pcpu->released_refcnt;
>> + do {
>> + start = u64_stats_fetch_begin(&pcpu->syncp);
>> + u64_stats_add(&stats->recycle_stats.cached,
>> + u64_stats_read(&pcpu->cached));
>> + u64_stats_add(&stats->recycle_stats.cache_full,
>> + u64_stats_read(&pcpu->cache_full));
>> + u64_stats_add(&stats->recycle_stats.ring,
>> + u64_stats_read(&pcpu->ring));
>> + u64_stats_add(&stats->recycle_stats.ring_full,
>> + u64_stats_read(&pcpu->ring_full));
>> + u64_stats_add(&stats->recycle_stats.released_refcnt,
>> + u64_stats_read(&pcpu->released_refcnt));
>
> It seems the above u64_stats_add() may be called more than once
> if the below u64_stats_fetch_retry() returns true, which might mean
> the stats are added more times than needed.
>
> It seems more correct to me to read the per-CPU stats into local
> variables in the while loop and then do the addition outside
> the while loop?
I also think the above code is incorrect, and local variables to store
the read stats are needed.
/P
* Re: [PATCH net-next 1/2] page_pool: Convert page_pool_recycle_stats to u64_stats_t.
From: Sebastian Andrzej Siewior @ 2025-02-26 9:28 UTC
To: Yunsheng Lin
Cc: linux-rdma, netdev, David S. Miller, Andrew Lunn, Eric Dumazet,
Ilias Apalodimas, Jakub Kicinski, Jesper Dangaard Brouer,
Leon Romanovsky, Paolo Abeni, Saeed Mahameed, Simon Horman,
Tariq Toukan, Thomas Gleixner
On 2025-02-22 16:13:35 [+0800], Yunsheng Lin wrote:
> > @@ -99,11 +106,19 @@ bool page_pool_get_stats(const struct page_pool *pool,
> > const struct page_pool_recycle_stats *pcpu =
> > per_cpu_ptr(pool->recycle_stats, cpu);
> >
> > - stats->recycle_stats.cached += pcpu->cached;
> > - stats->recycle_stats.cache_full += pcpu->cache_full;
> > - stats->recycle_stats.ring += pcpu->ring;
> > - stats->recycle_stats.ring_full += pcpu->ring_full;
> > - stats->recycle_stats.released_refcnt += pcpu->released_refcnt;
> > + do {
> > + start = u64_stats_fetch_begin(&pcpu->syncp);
> > + u64_stats_add(&stats->recycle_stats.cached,
> > + u64_stats_read(&pcpu->cached));
> > + u64_stats_add(&stats->recycle_stats.cache_full,
> > + u64_stats_read(&pcpu->cache_full));
> > + u64_stats_add(&stats->recycle_stats.ring,
> > + u64_stats_read(&pcpu->ring));
> > + u64_stats_add(&stats->recycle_stats.ring_full,
> > + u64_stats_read(&pcpu->ring_full));
> > + u64_stats_add(&stats->recycle_stats.released_refcnt,
> > + u64_stats_read(&pcpu->released_refcnt));
>
> It seems the above u64_stats_add() may be called more than once
> if the below u64_stats_fetch_retry() returns true, which might mean
> the stats are added more times than needed.
>
> It seems more correct to me to read the per-CPU stats into local
> variables in the while loop and then do the addition outside
> the while loop?
indeed, indeed, what was I thinking…
>
> > + } while (u64_stats_fetch_retry(&pcpu->syncp, start));
> > }
> >
> > return true;
Sebastian
* Re: [PATCH net-next 0/2] page_pool: Convert stats to u64_stats_t.
From: Sebastian Andrzej Siewior @ 2025-02-26 10:27 UTC
To: Joe Damato, linux-rdma, netdev, David S. Miller, Andrew Lunn,
Eric Dumazet, Ilias Apalodimas, Jakub Kicinski,
Jesper Dangaard Brouer, Leon Romanovsky, Paolo Abeni,
Saeed Mahameed, Simon Horman, Tariq Toukan, Thomas Gleixner,
Yunsheng Lin
On 2025-02-21 12:10:51 [-0500], Joe Damato wrote:
> On Fri, Feb 21, 2025 at 12:52:19PM +0100, Sebastian Andrzej Siewior wrote:
> > This is a follow-up on
> > https://lore.kernel.org/all/20250213093925.x_ggH1aj@linutronix.de/
> >
> > to convert the page_pool statistics to u64_stats_t to avoid u64 related
> > problems on 32bit architectures.
> > While looking over it I noticed that the comment for
> > recycle_stat_inc() says it is safe to use in preemptible context.
>
> I wrote that comment because it's an increment of a per-cpu counter.
>
> The documentation in Documentation/core-api/this_cpu_ops.rst
> explains in more depth, but this_cpu_inc is safe to use without
> worrying about pre-emption and interrupts.
I don't argue that it is unsafe to use in preemptible context today. I
am just saying that it is no longer safe after the rework: if it is
really used like that, the reworked code will warn.
> > On 32bit the u64 update is split into two 32bit writes; if the
> > writer is preempted in the middle and another one makes an update,
> > the value becomes inconsistent and the earlier update can overwrite
> > the later one. (Rare, but possible.)
>
> Have you seen this? Can you show the generated assembly which
> suggests that this occurs? It would be helpful if you could show the
> before and after 32-bit assembly code.
…
So there are two things:
First we have alloc_stat_inc(), which does
#define alloc_stat_inc(pool, __stat) (pool->alloc_stats.__stat++)
so on x86 32bit this turns into
| addl $1, 120(%ebx) #, pool_15(D)->alloc_stats.fast
| adcl $0, 124(%ebx) #, pool_15(D)->alloc_stats.fast
So the lower 4 bytes are incremented before the upper 4 bytes are.
recycle_stat_inc() is using this_cpu_inc() and performs a similar
update. On x86-32 it turns into
| movl 836(%ebx), %eax # pool_15(D)->recycle_stats, s
| pushf ; pop %edx # flags
| cli
| movl %fs:this_cpu_off, %ecx # *_20,
| addl %ecx, %eax #, _42
| addl $1, (%eax) #, *_42
| adcl $0, 4(%eax) #, *_42
| testb $2, %dh #, flags
| je .L508 #,
| sti
|.L508:
so the update can be performed safely in preemptible context: the CPU
is determined within the IRQ-off section and the increment itself is
performed there as well. It always updates the value belonging to the
local CPU.
Reading the values locally (on the CPU that is doing the update) is
okay, but reading the value from a remote CPU while an update is in
progress can result in reading the new lower 4 bytes before the new
upper 4 bytes are visible.
This can lead to an inconsistent value on 32bit which is likely
"corrected" on the next read.
Thus my initial question: do we care? If so, I suggest using u64_stats
like most of the networking stack. However, if we do so, the update
must be performed with BH disabled, as I assume the updates are done in
softirq context; it must be avoided that one update preempts another.
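On the updater side that would look roughly like this (a sketch based
on patch 1's struct; the BH guard provides the writer exclusion the
seqcount needs on 32bit):

	struct page_pool_recycle_stats *s;

	local_bh_disable();
	s = this_cpu_ptr(pool->recycle_stats);
	u64_stats_update_begin(&s->syncp);
	u64_stats_inc(&s->cached);
	u64_stats_update_end(&s->syncp);
	local_bh_enable();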
Sebastian
* Re: [PATCH net-next 1/2] page_pool: Convert page_pool_recycle_stats to u64_stats_t.
From: Sebastian Andrzej Siewior @ 2025-02-26 12:06 UTC
To: Joe Damato, linux-rdma, netdev, David S. Miller, Andrew Lunn,
Eric Dumazet, Ilias Apalodimas, Jakub Kicinski,
Jesper Dangaard Brouer, Leon Romanovsky, Paolo Abeni,
Saeed Mahameed, Simon Horman, Tariq Toukan, Thomas Gleixner,
Yunsheng Lin
On 2025-02-21 12:21:40 [-0500], Joe Damato wrote:
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> > index 611ec4b6f3709..baff961970f25 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
> > @@ -513,11 +513,11 @@ static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
> > rq_stats->pp_alloc_waive = stats.alloc_stats.waive;
> > rq_stats->pp_alloc_refill = stats.alloc_stats.refill;
> >
> > - rq_stats->pp_recycle_cached = stats.recycle_stats.cached;
> > - rq_stats->pp_recycle_cache_full = stats.recycle_stats.cache_full;
> > - rq_stats->pp_recycle_ring = stats.recycle_stats.ring;
> > - rq_stats->pp_recycle_ring_full = stats.recycle_stats.ring_full;
> > - rq_stats->pp_recycle_released_ref = stats.recycle_stats.released_refcnt;
> > + rq_stats->pp_recycle_cached = u64_stats_read(&stats.recycle_stats.cached);
> > + rq_stats->pp_recycle_cache_full = u64_stats_read(&stats.recycle_stats.cache_full);
> > + rq_stats->pp_recycle_ring = u64_stats_read(&stats.recycle_stats.ring);
> > + rq_stats->pp_recycle_ring_full = u64_stats_read(&stats.recycle_stats.ring_full);
> > + rq_stats->pp_recycle_released_ref = u64_stats_read(&stats.recycle_stats.released_refcnt);
> > }
> > #else
> > static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
>
> It might be better to convert mlx5 to
> page_pool_ethtool_stats_get_strings and
> page_pool_ethtool_stats_get_count instead?
You mean something like
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 611ec4b6f3709..76be86ed35b03 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -506,18 +506,7 @@ static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
if (!page_pool_get_stats(pool, &stats))
return;
- rq_stats->pp_alloc_fast = stats.alloc_stats.fast;
- rq_stats->pp_alloc_slow = stats.alloc_stats.slow;
- rq_stats->pp_alloc_slow_high_order = stats.alloc_stats.slow_high_order;
- rq_stats->pp_alloc_empty = stats.alloc_stats.empty;
- rq_stats->pp_alloc_waive = stats.alloc_stats.waive;
- rq_stats->pp_alloc_refill = stats.alloc_stats.refill;
-
- rq_stats->pp_recycle_cached = stats.recycle_stats.cached;
- rq_stats->pp_recycle_cache_full = stats.recycle_stats.cache_full;
- rq_stats->pp_recycle_ring = stats.recycle_stats.ring;
- rq_stats->pp_recycle_ring_full = stats.recycle_stats.ring_full;
- rq_stats->pp_recycle_released_ref = stats.recycle_stats.released_refcnt;
+ page_pool_ethtool_stats_get(&rq_stats->pp_alloc_fast, &stats);
}
#else
static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
?
Because I've been staring at this for a while and it seems that they
have their own logic around struct mlx5e_sw_stats for stats.
Sebastian