* [PATCH v3 net-next 00/15] net/sched: prepare RTNL removal from qdisc dumps
@ 2026-04-10 18:22 Eric Dumazet
2026-04-10 18:22 ` [PATCH v3 net-next 01/15] net/sched: rename qstats_overlimit_inc() to qstats_cpu_overlimit_inc() Eric Dumazet
` (13 more replies)
0 siblings, 14 replies; 15+ messages in thread
From: Eric Dumazet @ 2026-04-10 18:22 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
Eric Dumazet
We add annotations for data-races, so that most dump methods
can run in parallel with the data path.
Then change mq and mqprio to no longer acquire each child
qdisc's spinlock.
The next round of patches will wait for linux-7.2.
v2/v3: addressed most sashiko.dev feedback.
I think the remaining problems (in red offloads) are minor
and can be fixed later.
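For readers unfamiliar with these annotations, the recurring pattern
in the series boils down to the following minimal sketch (hypothetical
toy type, not taken from the patches): the writer side, still
serialized by the qdisc lock, publishes each update with WRITE_ONCE(),
and the dump side, which no longer holds that lock, loads with
READ_ONCE(), so neither side can tear or refetch the access.
	struct toy_qdisc {
		unsigned int qlen;	/* stands in for sch->q.qlen */
	};

	/* Writer: enqueue path, serialized by the qdisc lock. */
	static void toy_qlen_inc(struct toy_qdisc *q)
	{
		WRITE_ONCE(q->qlen, q->qlen + 1);	/* untorn store */
	}

	/* Reader: dump path, runs without the qdisc lock. */
	static unsigned int toy_dump_qlen(const struct toy_qdisc *q)
	{
		return READ_ONCE(q->qlen);		/* untorn load */
	}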
Eric Dumazet (15):
net/sched: rename qstats_overlimit_inc() to qstats_cpu_overlimit_inc()
net/sched: add qstats_cpu_drop_inc() helper
net/sched: add READ_ONCE() in gnet_stats_add_queue[_cpu]
net/sched: add qdisc_qlen_inc() and qdisc_qlen_dec()
net/sched: annotate data-races around sch->qstats.backlog
net/sched: sch_sfb: annotate data-races in sfb_dump_stats()
net/sched: sch_red: annotate data-races in red_dump_stats()
net/sched: sch_fq_codel: remove data-races from fq_codel_dump_stats()
net/sched: sch_pie: annotate data-races in pie_dump_stats()
net/sched: sch_fq_pie: annotate data-races in fq_pie_dump_stats()
net/sched: sch_hhf: annotate data-races in hhf_dump_stats()
net/sched: sch_choke: annotate data-races in choke_dump_stats()
net/sched: sch_cake: annotate data-races in cake_dump_stats()
net/sched: mq: no longer acquire qdisc spinlocks in dump operations
net/sched: taprio: prepare taprio_dump() for RTNL removal
include/net/act_api.h | 4 +-
include/net/gen_stats.h | 9 +-
include/net/pie.h | 2 +-
include/net/sch_generic.h | 71 +++++--
net/core/gen_estimator.c | 24 +--
net/core/gen_stats.c | 37 ++--
net/sched/act_api.c | 2 +-
net/sched/act_bpf.c | 2 +-
net/sched/act_ife.c | 12 +-
net/sched/act_mpls.c | 2 +-
net/sched/act_police.c | 4 +-
net/sched/act_skbedit.c | 2 +-
net/sched/act_skbmod.c | 2 +-
net/sched/sch_api.c | 4 +-
net/sched/sch_cake.c | 422 +++++++++++++++++++++-----------------
net/sched/sch_cbs.c | 6 +-
net/sched/sch_choke.c | 34 +--
net/sched/sch_codel.c | 2 +-
net/sched/sch_drr.c | 6 +-
net/sched/sch_dualpi2.c | 4 +-
net/sched/sch_etf.c | 8 +-
net/sched/sch_ets.c | 6 +-
net/sched/sch_fq.c | 6 +-
net/sched/sch_fq_codel.c | 16 +-
net/sched/sch_fq_pie.c | 27 +--
net/sched/sch_generic.c | 8 +-
net/sched/sch_gred.c | 4 +-
net/sched/sch_hfsc.c | 6 +-
net/sched/sch_hhf.c | 26 +--
net/sched/sch_htb.c | 6 +-
net/sched/sch_mq.c | 36 ++--
net/sched/sch_mqprio.c | 81 ++++----
net/sched/sch_multiq.c | 4 +-
net/sched/sch_netem.c | 12 +-
net/sched/sch_pie.c | 38 ++--
net/sched/sch_prio.c | 6 +-
net/sched/sch_qfq.c | 8 +-
net/sched/sch_red.c | 37 ++--
net/sched/sch_sfb.c | 54 +++--
net/sched/sch_sfq.c | 11 +-
net/sched/sch_skbprio.c | 4 +-
net/sched/sch_taprio.c | 46 +++--
net/sched/sch_tbf.c | 10 +-
net/sched/sch_teql.c | 2 +-
44 files changed, 624 insertions(+), 489 deletions(-)
--
2.53.0.1213.gd9a14994de-goog
* [PATCH v3 net-next 01/15] net/sched: rename qstats_overlimit_inc() to qstats_cpu_overlimit_inc()
2026-04-10 18:22 [PATCH v3 net-next 00/15] net/sched: prepare RTNL removal from qdisc dumps Eric Dumazet
@ 2026-04-10 18:22 ` Eric Dumazet
2026-04-10 18:22 ` [PATCH v3 net-next 02/15] net/sched: add qstats_cpu_drop_inc() helper Eric Dumazet
` (12 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2026-04-10 18:22 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
Eric Dumazet
qstats_overlimit_inc() is only used to increment per-cpu overlimit
counters.
It can use this_cpu_inc() to avoid the extra cost of this_cpu_ptr()
and potential store tearing.
Rename qstats_overlimit_inc() accordingly and change its argument type.
Also add a WRITE_ONCE() in qdisc_qstats_overlimit() to prevent
store tearing.
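As an illustration (a sketch of what the rename boils down to, not the
literal patch), the helper now increments the per-cpu slot directly
instead of materializing this CPU's pointer first:
	/* Sketch, assuming the struct tc_action layout from
	 * include/net/act_api.h.
	 */
	static inline void example_inc_overlimit(struct tc_action *a)
	{
		/* Before: this_cpu_ptr(a->cpu_qstats)->overlimits++;
		 * computes the per-cpu address, then does a plain,
		 * possibly-tearing read-modify-write.
		 *
		 * After: one annotated per-cpu increment.
		 */
		this_cpu_inc(a->cpu_qstats->overlimits);
	}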
$ scripts/bloat-o-meter -t vmlinux.0 vmlinux.1
add/remove: 0/0 grow/shrink: 0/5 up/down: 0/-72 (-72)
Function old new delta
tcf_skbmod_act 772 764 -8
tcf_police_act 733 725 -8
tcf_mirred_to_dev 1126 1114 -12
tcf_ife_act 1077 1061 -16
tcf_mirred_act 1324 1296 -28
Total: Before=29610901, After=29610829, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/act_api.h | 2 +-
include/net/sch_generic.h | 6 +++---
net/sched/act_ife.c | 4 ++--
net/sched/act_police.c | 2 +-
net/sched/act_skbmod.c | 2 +-
5 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/include/net/act_api.h b/include/net/act_api.h
index d11b791079302f50c47e174979767e0b24afc59a..2ec4ef9a5d0c8e9110f92f135cc3c31a38af0479 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -250,7 +250,7 @@ static inline void tcf_action_inc_drop_qstats(struct tc_action *a)
static inline void tcf_action_inc_overlimit_qstats(struct tc_action *a)
{
if (likely(a->cpu_qstats)) {
- qstats_overlimit_inc(this_cpu_ptr(a->cpu_qstats));
+ qstats_cpu_overlimit_inc(a->cpu_qstats);
return;
}
atomic_inc(&a->tcfa_overlimits);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 5af262ec4bbd2d5021904df127a849e52c26178a..3ee383c6fc3f66f1aecd9ebc675fbd143852c150 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -1004,9 +1004,9 @@ static inline void qstats_drop_inc(struct gnet_stats_queue *qstats)
qstats->drops++;
}
-static inline void qstats_overlimit_inc(struct gnet_stats_queue *qstats)
+static inline void qstats_cpu_overlimit_inc(struct gnet_stats_queue __percpu *qstats)
{
- qstats->overlimits++;
+ this_cpu_inc(qstats->overlimits);
}
static inline void qdisc_qstats_drop(struct Qdisc *sch)
@@ -1021,7 +1021,7 @@ static inline void qdisc_qstats_cpu_drop(struct Qdisc *sch)
static inline void qdisc_qstats_overlimit(struct Qdisc *sch)
{
- sch->qstats.overlimits++;
+ WRITE_ONCE(sch->qstats.overlimits, sch->qstats.overlimits + 1);
}
static inline int qdisc_qstats_copy(struct gnet_dump *d, struct Qdisc *sch)
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index d5e8a91bb4eb9f1f1f084e199b5ada4e7f7e7205..e1b825e14900d6f46bbfd1b7f72ab6cd554d8a73 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -750,7 +750,7 @@ static int tcf_ife_decode(struct sk_buff *skb, const struct tc_action *a,
*/
pr_info_ratelimited("Unknown metaid %d dlen %d\n",
mtype, dlen);
- qstats_overlimit_inc(this_cpu_ptr(ife->common.cpu_qstats));
+ qstats_cpu_overlimit_inc(ife->common.cpu_qstats);
}
}
@@ -814,7 +814,7 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
/* abuse overlimits to count when we allow packet
* with no metadata
*/
- qstats_overlimit_inc(this_cpu_ptr(ife->common.cpu_qstats));
+ qstats_cpu_overlimit_inc(ife->common.cpu_qstats);
return action;
}
/* could be stupid policy setup or mtu config
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 12ea9e5a600536b603ea73cc99b4c00381287219..8060f43e4d11c0a26e1475db06b76426f50c5975 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -307,7 +307,7 @@ TC_INDIRECT_SCOPE int tcf_police_act(struct sk_buff *skb,
}
inc_overlimits:
- qstats_overlimit_inc(this_cpu_ptr(police->common.cpu_qstats));
+ qstats_cpu_overlimit_inc(police->common.cpu_qstats);
inc_drops:
if (ret == TC_ACT_SHOT)
qstats_drop_inc(this_cpu_ptr(police->common.cpu_qstats));
diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
index 23ca46138f040d38de37684439873921bc9c86af..a464b0a3c1b81dba6c28c1141aa38c5c7cad3acb 100644
--- a/net/sched/act_skbmod.c
+++ b/net/sched/act_skbmod.c
@@ -87,7 +87,7 @@ TC_INDIRECT_SCOPE int tcf_skbmod_act(struct sk_buff *skb,
return p->action;
drop:
- qstats_overlimit_inc(this_cpu_ptr(d->common.cpu_qstats));
+ qstats_cpu_overlimit_inc(d->common.cpu_qstats);
return TC_ACT_SHOT;
}
--
2.53.0.1213.gd9a14994de-goog
* [PATCH v3 net-next 02/15] net/sched: add qstats_cpu_drop_inc() helper
2026-04-10 18:22 [PATCH v3 net-next 00/15] net/sched: prepare RTNL removal from qdisc dumps Eric Dumazet
2026-04-10 18:22 ` [PATCH v3 net-next 01/15] net/sched: rename qstats_overlimit_inc() to qstats_cpu_overlimit_inc() Eric Dumazet
@ 2026-04-10 18:22 ` Eric Dumazet
2026-04-10 18:22 ` [PATCH v3 net-next 03/15] net/sched: add READ_ONCE() in gnet_stats_add_queue[_cpu] Eric Dumazet
` (11 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2026-04-10 18:22 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
Eric Dumazet
1) Using this_cpu_inc() is better than going through this_cpu_ptr():
- Single instruction on x86.
- Store tearing prevention.
2) Change tcf_action_update_stats() to use this_cpu_add().
3) Add WRITE_ONCE() to __qdisc_qstats_drop() and qstats_drop_inc().
$ scripts/bloat-o-meter -t vmlinux.1 vmlinux.2
add/remove: 0/0 grow/shrink: 1/10 up/down: 16/-97 (-81)
Function old new delta
tcf_ife_act 1061 1077 +16
tcf_vlan_act 684 676 -8
tcf_skbedit_act 626 618 -8
tcf_police_act 725 717 -8
tcf_mpls_act 1297 1289 -8
tcf_gate_act 310 302 -8
tcf_gact_act 195 187 -8
tcf_csum_act 2905 2897 -8
tcf_bpf_act 749 741 -8
tcf_action_update_stats 124 115 -9
tcf_ct_act 2154 2130 -24
Total: Before=29739602, After=29739521, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/act_api.h | 2 +-
include/net/sch_generic.h | 9 +++++++--
net/sched/act_api.c | 2 +-
net/sched/act_bpf.c | 2 +-
net/sched/act_ife.c | 8 ++++----
net/sched/act_mpls.c | 2 +-
net/sched/act_police.c | 2 +-
net/sched/act_skbedit.c | 2 +-
net/sched/sch_cake.c | 2 +-
net/sched/sch_fq_codel.c | 2 +-
net/sched/sch_gred.c | 2 +-
11 files changed, 20 insertions(+), 15 deletions(-)
diff --git a/include/net/act_api.h b/include/net/act_api.h
index 2ec4ef9a5d0c8e9110f92f135cc3c31a38af0479..167435c5615e09f491a05d01ec86b0c9f9f4fd5b 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -241,7 +241,7 @@ static inline void tcf_action_update_bstats(struct tc_action *a,
static inline void tcf_action_inc_drop_qstats(struct tc_action *a)
{
if (likely(a->cpu_qstats)) {
- qstats_drop_inc(this_cpu_ptr(a->cpu_qstats));
+ qstats_cpu_drop_inc(a->cpu_qstats);
return;
}
atomic_inc(&a->tcfa_drops);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 3ee383c6fc3f66f1aecd9ebc675fbd143852c150..b22579671e4b4dd04c5dfa810b714daaac74af2a 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -996,12 +996,17 @@ static inline void qdisc_qstats_cpu_requeues_inc(struct Qdisc *sch)
static inline void __qdisc_qstats_drop(struct Qdisc *sch, int count)
{
- sch->qstats.drops += count;
+ WRITE_ONCE(sch->qstats.drops, sch->qstats.drops + count);
}
static inline void qstats_drop_inc(struct gnet_stats_queue *qstats)
{
- qstats->drops++;
+ WRITE_ONCE(qstats->drops, qstats->drops + 1);
+}
+
+static inline void qstats_cpu_drop_inc(struct gnet_stats_queue __percpu *qstats)
+{
+ this_cpu_inc(qstats->drops);
}
static inline void qstats_cpu_overlimit_inc(struct gnet_stats_queue __percpu *qstats)
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 332fd9695e54a1fc63bb869c28cacf5f2ed14971..551992683d9e69c247b8d9c613a69e2a897a1e79 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -1578,7 +1578,7 @@ void tcf_action_update_stats(struct tc_action *a, u64 bytes, u64 packets,
if (a->cpu_bstats) {
_bstats_update(this_cpu_ptr(a->cpu_bstats), bytes, packets);
- this_cpu_ptr(a->cpu_qstats)->drops += drops;
+ this_cpu_add(a->cpu_qstats->drops, drops);
if (hw)
_bstats_update(this_cpu_ptr(a->cpu_bstats_hw),
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index c2b5bc19e09118857d1ef3c4aed566b8225f2e9a..58a074651176730fd1bd370ba8420dfbed0d4e9c 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -76,7 +76,7 @@ TC_INDIRECT_SCOPE int tcf_bpf_act(struct sk_buff *skb,
break;
case TC_ACT_SHOT:
action = filter_res;
- qstats_drop_inc(this_cpu_ptr(prog->common.cpu_qstats));
+ qstats_cpu_drop_inc(prog->common.cpu_qstats);
break;
case TC_ACT_UNSPEC:
action = prog->tcf_action;
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index e1b825e14900d6f46bbfd1b7f72ab6cd554d8a73..065228026c58eb0f8ff3b3a08758e4ef0d6ea708 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -727,7 +727,7 @@ static int tcf_ife_decode(struct sk_buff *skb, const struct tc_action *a,
tlv_data = ife_decode(skb, &metalen);
if (unlikely(!tlv_data)) {
- qstats_drop_inc(this_cpu_ptr(ife->common.cpu_qstats));
+ qstats_cpu_drop_inc(ife->common.cpu_qstats);
return TC_ACT_SHOT;
}
@@ -740,7 +740,7 @@ static int tcf_ife_decode(struct sk_buff *skb, const struct tc_action *a,
curr_data = ife_tlv_meta_decode(tlv_data, ifehdr_end, &mtype,
&dlen, NULL);
if (!curr_data) {
- qstats_drop_inc(this_cpu_ptr(ife->common.cpu_qstats));
+ qstats_cpu_drop_inc(ife->common.cpu_qstats);
return TC_ACT_SHOT;
}
@@ -755,7 +755,7 @@ static int tcf_ife_decode(struct sk_buff *skb, const struct tc_action *a,
}
if (WARN_ON(tlv_data != ifehdr_end)) {
- qstats_drop_inc(this_cpu_ptr(ife->common.cpu_qstats));
+ qstats_cpu_drop_inc(ife->common.cpu_qstats);
return TC_ACT_SHOT;
}
@@ -821,7 +821,7 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
* so lets be conservative.. */
if ((action == TC_ACT_SHOT) || exceed_mtu) {
drop:
- qstats_drop_inc(this_cpu_ptr(ife->common.cpu_qstats));
+ qstats_cpu_drop_inc(ife->common.cpu_qstats);
return TC_ACT_SHOT;
}
diff --git a/net/sched/act_mpls.c b/net/sched/act_mpls.c
index 1abfaf9d99f1fce0fe7cafa2a9e35c80a3969ce7..4ea8b2e08c3a4dddfe1670af72a5d487a5219f5e 100644
--- a/net/sched/act_mpls.c
+++ b/net/sched/act_mpls.c
@@ -123,7 +123,7 @@ TC_INDIRECT_SCOPE int tcf_mpls_act(struct sk_buff *skb,
return p->action;
drop:
- qstats_drop_inc(this_cpu_ptr(m->common.cpu_qstats));
+ qstats_cpu_drop_inc(m->common.cpu_qstats);
return TC_ACT_SHOT;
}
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 8060f43e4d11c0a26e1475db06b76426f50c5975..b16468a98c55e32260e8d4cb1fe3d771fca65120 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -310,7 +310,7 @@ TC_INDIRECT_SCOPE int tcf_police_act(struct sk_buff *skb,
qstats_cpu_overlimit_inc(police->common.cpu_qstats);
inc_drops:
if (ret == TC_ACT_SHOT)
- qstats_drop_inc(this_cpu_ptr(police->common.cpu_qstats));
+ qstats_cpu_drop_inc(police->common.cpu_qstats);
end:
return ret;
}
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index a778cdba9258c2c776ee5ba0751cca1b73c984df..bfec6b66841031cd566d0c2bdc3d120cec41e3e4 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -86,7 +86,7 @@ TC_INDIRECT_SCOPE int tcf_skbedit_act(struct sk_buff *skb,
return params->action;
err:
- qstats_drop_inc(this_cpu_ptr(d->common.cpu_qstats));
+ qstats_cpu_drop_inc(d->common.cpu_qstats);
return TC_ACT_SHOT;
}
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index ffea9fbd522d8dd3311cbca0a55a3d133eaceae4..30c881dda6a366af40e94f746cf95af83d2f10d2 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1844,7 +1844,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
if (ack) {
b->ack_drops++;
- sch->qstats.drops++;
+ qdisc_qstats_drop(sch);
ack_pkt_len = qdisc_pkt_len(ack);
b->bytes += ack_pkt_len;
q->buffer_used += skb->truesize - ack->truesize;
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 2a3d758f67ab43d17128442fd8b51c6ba7775d52..f9ef95e69f72870c8a6a16b7e7911d1287b8e83e 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -176,7 +176,7 @@ static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max_packets,
flow->cvars.count += i;
q->backlogs[idx] -= len;
q->memory_usage -= mem;
- sch->qstats.drops += i;
+ __qdisc_qstats_drop(sch, i);
sch->qstats.backlog -= len;
sch->q.qlen -= i;
return idx;
diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 36d0cafac2063f3ba1133ad9b8fab08ce4468550..8ae65572162c188cca5ac8f030dc6f2054a7fcd0 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -389,7 +389,7 @@ static int gred_offload_dump_stats(struct Qdisc *sch)
packets += u64_stats_read(&hw_stats->stats.bstats[i].packets);
sch->qstats.qlen += hw_stats->stats.qstats[i].qlen;
sch->qstats.backlog += hw_stats->stats.qstats[i].backlog;
- sch->qstats.drops += hw_stats->stats.qstats[i].drops;
+ __qdisc_qstats_drop(sch, hw_stats->stats.qstats[i].drops);
sch->qstats.requeues += hw_stats->stats.qstats[i].requeues;
sch->qstats.overlimits += hw_stats->stats.qstats[i].overlimits;
}
--
2.53.0.1213.gd9a14994de-goog
* [PATCH v3 net-next 03/15] net/sched: add READ_ONCE() in gnet_stats_add_queue[_cpu]
2026-04-10 18:22 [PATCH v3 net-next 00/15] net/sched: prepare RTNL removal from qdisc dumps Eric Dumazet
2026-04-10 18:22 ` [PATCH v3 net-next 01/15] net/sched: rename qstats_overlimit_inc() to qstats_cpu_overlimit_inc() Eric Dumazet
2026-04-10 18:22 ` [PATCH v3 net-next 02/15] net/sched: add qstats_cpu_drop_inc() helper Eric Dumazet
@ 2026-04-10 18:22 ` Eric Dumazet
2026-04-10 18:22 ` [PATCH v3 net-next 04/15] net/sched: add qdisc_qlen_inc() and qdisc_qlen_dec() Eric Dumazet
` (10 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2026-04-10 18:22 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
Eric Dumazet
Stats are read locklessly; add READ_ONCE() to prevent load tearing.
The write side will be handled in separate patches.
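A minimal sketch of the reader side this protects (mirroring
gnet_stats_add_queue_cpu(), with only the drops field shown):
	static void example_sum_drops(struct gnet_stats_queue *qstats,
				      const struct gnet_stats_queue __percpu *q)
	{
		int i;

		for_each_possible_cpu(i) {
			const struct gnet_stats_queue *qcpu = per_cpu_ptr(q, i);

			/* Writers may bump qcpu->drops concurrently;
			 * READ_ONCE() yields exactly one untorn load.
			 */
			qstats->drops += READ_ONCE(qcpu->drops);
		}
	}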
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/gen_stats.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index b71ccaec0991461333dbe465ee619bca4a06e75b..1a2380e74272de8eaf3d4ef453e56105a31e9edf 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -345,11 +345,11 @@ static void gnet_stats_add_queue_cpu(struct gnet_stats_queue *qstats,
for_each_possible_cpu(i) {
const struct gnet_stats_queue *qcpu = per_cpu_ptr(q, i);
- qstats->qlen += qcpu->qlen;
- qstats->backlog += qcpu->backlog;
- qstats->drops += qcpu->drops;
- qstats->requeues += qcpu->requeues;
- qstats->overlimits += qcpu->overlimits;
+ qstats->qlen += READ_ONCE(qcpu->qlen);
+ qstats->backlog += READ_ONCE(qcpu->backlog);
+ qstats->drops += READ_ONCE(qcpu->drops);
+ qstats->requeues += READ_ONCE(qcpu->requeues);
+ qstats->overlimits += READ_ONCE(qcpu->overlimits);
}
}
@@ -360,11 +360,11 @@ void gnet_stats_add_queue(struct gnet_stats_queue *qstats,
if (cpu) {
gnet_stats_add_queue_cpu(qstats, cpu);
} else {
- qstats->qlen += q->qlen;
- qstats->backlog += q->backlog;
- qstats->drops += q->drops;
- qstats->requeues += q->requeues;
- qstats->overlimits += q->overlimits;
+ qstats->qlen += READ_ONCE(q->qlen);
+ qstats->backlog += READ_ONCE(q->backlog);
+ qstats->drops += READ_ONCE(q->drops);
+ qstats->requeues += READ_ONCE(q->requeues);
+ qstats->overlimits += READ_ONCE(q->overlimits);
}
}
EXPORT_SYMBOL(gnet_stats_add_queue);
--
2.53.0.1213.gd9a14994de-goog
* [PATCH v3 net-next 04/15] net/sched: add qdisc_qlen_inc() and qdisc_qlen_dec()
2026-04-10 18:22 [PATCH v3 net-next 00/15] net/sched: prepare RTNL removal from qdisc dumps Eric Dumazet
` (2 preceding siblings ...)
2026-04-10 18:22 ` [PATCH v3 net-next 03/15] net/sched: add READ_ONCE() in gnet_stats_add_queue[_cpu] Eric Dumazet
@ 2026-04-10 18:22 ` Eric Dumazet
2026-04-10 18:22 ` [PATCH v3 net-next 05/15] net/sched: annotate data-races around sch->qstats.backlog Eric Dumazet
` (9 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2026-04-10 18:22 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
Eric Dumazet
Add helpers to increment or decrement sch->q.qlen, with appropriate
WRITE_ONCE() to prevent store tearing.
Also add WRITE_ONCE() at the other places where sch->q.qlen is changed.
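One consumer visible in the mq/mqprio hunks below is the "accumulate
locally, publish once" pattern; a sketch with a hypothetical child
array (the real code walks the device's TX queues):
	static void example_publish_qlen(struct Qdisc *sch,
					 struct Qdisc **children, int n)
	{
		unsigned int qlen = 0;
		int i;

		for (i = 0; i < n; i++)
			qlen += qdisc_qlen(children[i]);

		/* One annotated store; lockless readers never observe
		 * a partially accumulated total.
		 */
		WRITE_ONCE(sch->q.qlen, qlen);
	}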
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/sch_generic.h | 26 ++++++++++++++++++--------
net/sched/sch_api.c | 2 +-
net/sched/sch_cake.c | 8 ++++----
net/sched/sch_cbs.c | 4 ++--
net/sched/sch_choke.c | 8 ++++----
net/sched/sch_drr.c | 4 ++--
net/sched/sch_dualpi2.c | 4 ++--
net/sched/sch_etf.c | 8 ++++----
net/sched/sch_ets.c | 4 ++--
net/sched/sch_fq.c | 6 +++---
net/sched/sch_fq_codel.c | 7 ++++---
net/sched/sch_fq_pie.c | 4 ++--
net/sched/sch_generic.c | 8 ++++----
net/sched/sch_hfsc.c | 4 ++--
net/sched/sch_hhf.c | 7 ++++---
net/sched/sch_htb.c | 4 ++--
net/sched/sch_mq.c | 5 +++--
net/sched/sch_mqprio.c | 18 ++++++++++--------
net/sched/sch_multiq.c | 4 ++--
net/sched/sch_netem.c | 10 +++++-----
net/sched/sch_prio.c | 4 ++--
net/sched/sch_qfq.c | 6 +++---
net/sched/sch_red.c | 4 ++--
net/sched/sch_sfb.c | 4 ++--
net/sched/sch_sfq.c | 9 +++++----
net/sched/sch_skbprio.c | 4 ++--
net/sched/sch_taprio.c | 4 ++--
net/sched/sch_tbf.c | 6 +++---
net/sched/sch_teql.c | 2 +-
29 files changed, 102 insertions(+), 86 deletions(-)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index b22579671e4b4dd04c5dfa810b714daaac74af2a..27d705246dbde99eda02f225dd745f14c73bd830 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -542,6 +542,16 @@ static inline int qdisc_qlen(const struct Qdisc *q)
return q->q.qlen;
}
+static inline void qdisc_qlen_inc(struct Qdisc *q)
+{
+ WRITE_ONCE(q->q.qlen, q->q.qlen + 1);
+}
+
+static inline void qdisc_qlen_dec(struct Qdisc *q)
+{
+ WRITE_ONCE(q->q.qlen, q->q.qlen - 1);
+}
+
static inline int qdisc_qlen_sum(const struct Qdisc *q)
{
__u32 qlen = q->qstats.qlen;
@@ -549,9 +559,9 @@ static inline int qdisc_qlen_sum(const struct Qdisc *q)
if (qdisc_is_percpu_stats(q)) {
for_each_possible_cpu(i)
- qlen += per_cpu_ptr(q->cpu_qstats, i)->qlen;
+ qlen += READ_ONCE(per_cpu_ptr(q->cpu_qstats, i)->qlen);
} else {
- qlen += q->q.qlen;
+ qlen += READ_ONCE(q->q.qlen);
}
return qlen;
@@ -1110,7 +1120,7 @@ static inline struct sk_buff *qdisc_dequeue_internal(struct Qdisc *sch, bool dir
skb = __skb_dequeue(&sch->gso_skb);
if (skb) {
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_qstats_backlog_dec(sch, skb);
return skb;
}
@@ -1256,7 +1266,7 @@ static inline struct sk_buff *qdisc_peek_dequeued(struct Qdisc *sch)
__skb_queue_head(&sch->gso_skb, skb);
/* it's still part of the queue */
qdisc_qstats_backlog_inc(sch, skb);
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
}
}
@@ -1273,7 +1283,7 @@ static inline void qdisc_update_stats_at_dequeue(struct Qdisc *sch,
} else {
qdisc_qstats_backlog_dec(sch, skb);
qdisc_bstats_update(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
}
}
@@ -1285,7 +1295,7 @@ static inline void qdisc_update_stats_at_enqueue(struct Qdisc *sch,
this_cpu_add(sch->cpu_qstats->backlog, pkt_len);
} else {
sch->qstats.backlog += pkt_len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
}
}
@@ -1301,7 +1311,7 @@ static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
qdisc_qstats_cpu_qlen_dec(sch);
} else {
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
}
} else {
skb = sch->dequeue(sch);
@@ -1322,7 +1332,7 @@ static inline void __qdisc_reset_queue(struct qdisc_skb_head *qh)
qh->head = NULL;
qh->tail = NULL;
- qh->qlen = 0;
+ WRITE_ONCE(qh->qlen, 0);
}
}
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ed869a5ffc7377b7c19e66ae5fc9788e709488da..0dd3efd86393870e9695dddb4a471c5bf854f81e 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -805,7 +805,7 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
cl = cops->find(sch, parentid);
cops->qlen_notify(sch, cl);
}
- sch->q.qlen -= n;
+ WRITE_ONCE(sch->q.qlen, sch->q.qlen - n);
sch->qstats.backlog -= len;
__qdisc_qstats_drop(sch, drops);
}
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 30c881dda6a366af40e94f746cf95af83d2f10d2..f2d9aded493264f6235d43c7e99db1d4c27408e6 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1605,7 +1605,7 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
cake_advance_shaper(q, b, skb, now, true);
qdisc_drop_reason(skb, sch, to_free, QDISC_DROP_OVERLIMIT);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
cake_heapify(q, 0);
@@ -1815,7 +1815,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
segs);
flow_queue_add(flow, segs);
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
numsegs++;
slen += segs->len;
q->buffer_used += segs->truesize;
@@ -1854,7 +1854,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
qdisc_tree_reduce_backlog(sch, 1, ack_pkt_len);
consume_skb(ack);
} else {
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
q->buffer_used += skb->truesize;
}
@@ -1980,7 +1980,7 @@ static struct sk_buff *cake_dequeue_one(struct Qdisc *sch)
b->tin_backlog -= len;
sch->qstats.backlog -= len;
q->buffer_used -= skb->truesize;
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
if (q->overflow_timeout)
cake_heapify(q, b->overflow_idx[q->cur_flow]);
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index 8c9a0400c8622c652db290796f2dd338eb61799c..a75e58876797952f2218725f6da5cff29f330ae2 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -97,7 +97,7 @@ static int cbs_child_enqueue(struct sk_buff *skb, struct Qdisc *sch,
return err;
sch->qstats.backlog += len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
@@ -168,7 +168,7 @@ static struct sk_buff *cbs_child_dequeue(struct Qdisc *sch, struct Qdisc *child)
qdisc_qstats_backlog_dec(sch, skb);
qdisc_bstats_update(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
return skb;
}
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 94df8e741a979191a06885ad3ee813f12650ff3c..cd0785ad8e74314e6d5c88144ffcf64f286e02dd 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -123,7 +123,7 @@ static void choke_drop_by_idx(struct Qdisc *sch, unsigned int idx,
if (idx == q->tail)
choke_zap_tail_holes(q);
- --sch->q.qlen;
+ qdisc_qlen_dec(sch);
qdisc_qstats_backlog_dec(sch, skb);
qdisc_tree_reduce_backlog(sch, 1, qdisc_pkt_len(skb));
qdisc_drop(skb, sch, to_free);
@@ -267,7 +267,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
if (sch->q.qlen < q->limit) {
q->tab[q->tail] = skb;
q->tail = (q->tail + 1) & q->tab_mask;
- ++sch->q.qlen;
+ qdisc_qlen_inc(sch);
qdisc_qstats_backlog_inc(sch, skb);
return NET_XMIT_SUCCESS;
}
@@ -294,7 +294,7 @@ static struct sk_buff *choke_dequeue(struct Qdisc *sch)
skb = q->tab[q->head];
q->tab[q->head] = NULL;
choke_zap_head_holes(q);
- --sch->q.qlen;
+ qdisc_qlen_dec(sch);
qdisc_qstats_backlog_dec(sch, skb);
qdisc_bstats_update(sch, skb);
@@ -392,7 +392,7 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt,
}
dropped += qdisc_pkt_len(skb);
qdisc_qstats_backlog_dec(sch, skb);
- --sch->q.qlen;
+ qdisc_qlen_dec(sch);
rtnl_qdisc_drop(skb, sch);
}
qdisc_tree_reduce_backlog(sch, oqlen - sch->q.qlen, dropped);
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 01335a49e091444747635ee8bc7e22ded504d571..925fa0cfd730ce72e45e8983ba02eb913afb1235 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -366,7 +366,7 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch,
}
sch->qstats.backlog += len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return err;
}
@@ -399,7 +399,7 @@ static struct sk_buff *drr_dequeue(struct Qdisc *sch)
bstats_update(&cl->bstats, skb);
qdisc_bstats_update(sch, skb);
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
return skb;
}
diff --git a/net/sched/sch_dualpi2.c b/net/sched/sch_dualpi2.c
index fe6f5e8896257674b9f175e01428b89e299a7dda..d093d058decbc577f3b311e1e8513260c167bff0 100644
--- a/net/sched/sch_dualpi2.c
+++ b/net/sched/sch_dualpi2.c
@@ -415,7 +415,7 @@ static int dualpi2_enqueue_skb(struct sk_buff *skb, struct Qdisc *sch,
dualpi2_skb_cb(skb)->apply_step = skb_apply_step(skb, q);
/* Keep the overall qdisc stats consistent */
- ++sch->q.qlen;
+ qdisc_qlen_inc(sch);
qdisc_qstats_backlog_inc(sch, skb);
++q->packets_in_l;
if (!q->l_head_ts)
@@ -530,7 +530,7 @@ static struct sk_buff *dequeue_packet(struct Qdisc *sch,
qdisc_qstats_backlog_dec(q->l_queue, skb);
/* Keep the global queue size consistent */
- --sch->q.qlen;
+ qdisc_qlen_dec(sch);
q->memory_used -= skb->truesize;
} else if (c_len) {
skb = __qdisc_dequeue_head(&sch->q);
diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
index c74d778c32a1eda639650df4d1d103c5338f14e6..ada87a81da6ac4c20e036b5391eb4efe9795ab91 100644
--- a/net/sched/sch_etf.c
+++ b/net/sched/sch_etf.c
@@ -189,7 +189,7 @@ static int etf_enqueue_timesortedlist(struct sk_buff *nskb, struct Qdisc *sch,
rb_insert_color_cached(&nskb->rbnode, &q->head, leftmost);
qdisc_qstats_backlog_inc(sch, nskb);
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
/* Now we may need to re-arm the qdisc watchdog for the next packet. */
reset_watchdog(sch);
@@ -222,7 +222,7 @@ static void timesortedlist_drop(struct Qdisc *sch, struct sk_buff *skb,
qdisc_qstats_backlog_dec(sch, skb);
qdisc_drop(skb, sch, &to_free);
qdisc_qstats_overlimit(sch);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
}
kfree_skb_list(to_free);
@@ -247,7 +247,7 @@ static void timesortedlist_remove(struct Qdisc *sch, struct sk_buff *skb)
q->last = skb->tstamp;
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
}
static struct sk_buff *etf_dequeue_timesortedlist(struct Qdisc *sch)
@@ -426,7 +426,7 @@ static void timesortedlist_clear(struct Qdisc *sch)
rb_erase_cached(&skb->rbnode, &q->head);
rtnl_kfree_skbs(skb, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
}
}
diff --git a/net/sched/sch_ets.c b/net/sched/sch_ets.c
index a4b07b661b7756a675d22c0f84f8f0a713cdb7eb..c817e0a6c14653a35f5ebb9de1a5ccc44d1a2f98 100644
--- a/net/sched/sch_ets.c
+++ b/net/sched/sch_ets.c
@@ -449,7 +449,7 @@ static int ets_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
}
sch->qstats.backlog += len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return err;
}
@@ -458,7 +458,7 @@ ets_qdisc_dequeue_skb(struct Qdisc *sch, struct sk_buff *skb)
{
qdisc_bstats_update(sch, skb);
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
return skb;
}
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index f2edcf872981fd8181dfb97a3bc665fd4a869115..1e34ac136b15cf24742f2810d201420cf763021a 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -497,7 +497,7 @@ static void fq_dequeue_skb(struct Qdisc *sch, struct fq_flow *flow,
fq_erase_head(sch, flow, skb);
skb_mark_not_on_list(skb);
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_bstats_update(sch, skb);
}
@@ -597,7 +597,7 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
flow_queue_add(f, skb);
qdisc_qstats_backlog_inc(sch, skb);
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
@@ -801,7 +801,7 @@ static void fq_reset(struct Qdisc *sch)
struct fq_flow *f;
unsigned int idx;
- sch->q.qlen = 0;
+ WRITE_ONCE(sch->q.qlen, 0);
sch->qstats.backlog = 0;
fq_flow_purge(&q->internal);
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index f9ef95e69f72870c8a6a16b7e7911d1287b8e83e..3c60a8ec4682b174ebf0df34f356eb6132356764 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -178,7 +178,7 @@ static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max_packets,
q->memory_usage -= mem;
__qdisc_qstats_drop(sch, i);
sch->qstats.backlog -= len;
- sch->q.qlen -= i;
+ WRITE_ONCE(sch->q.qlen, sch->q.qlen - i);
return idx;
}
@@ -215,7 +215,8 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch,
get_codel_cb(skb)->mem_usage = skb->truesize;
q->memory_usage += get_codel_cb(skb)->mem_usage;
memory_limited = q->memory_usage > q->memory_limit;
- if (++sch->q.qlen <= sch->limit && !memory_limited)
+ qdisc_qlen_inc(sch);
+ if (sch->q.qlen <= sch->limit && !memory_limited)
return NET_XMIT_SUCCESS;
prev_backlog = sch->qstats.backlog;
@@ -265,7 +266,7 @@ static struct sk_buff *dequeue_func(struct codel_vars *vars, void *ctx)
skb = dequeue_head(flow);
q->backlogs[flow - q->flows] -= qdisc_pkt_len(skb);
q->memory_usage -= get_codel_cb(skb)->mem_usage;
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
sch->qstats.backlog -= qdisc_pkt_len(skb);
}
return skb;
diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c
index 154c70f489f289066db5d61bb51e58aaf328f16e..dba49d44a5d2412b2deb983bf87428ade7944e51 100644
--- a/net/sched/sch_fq_pie.c
+++ b/net/sched/sch_fq_pie.c
@@ -185,7 +185,7 @@ static int fq_pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
q->stats.packets_in++;
q->memory_usage += skb->truesize;
sch->qstats.backlog += pkt_len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
flow_queue_add(sel_flow, skb);
if (list_empty(&sel_flow->flowchain)) {
list_add_tail(&sel_flow->flowchain, &q->new_flows);
@@ -263,7 +263,7 @@ static struct sk_buff *fq_pie_qdisc_dequeue(struct Qdisc *sch)
skb = dequeue_head(flow);
pkt_len = qdisc_pkt_len(skb);
sch->qstats.backlog -= pkt_len;
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_bstats_update(sch, skb);
}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index a93321db8fd75d30c61e146c290bbc139c37c913..32ace8659ab86457cd1b1655810e0f4105149c47 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -118,7 +118,7 @@ static inline struct sk_buff *__skb_dequeue_bad_txq(struct Qdisc *q)
qdisc_qstats_cpu_qlen_dec(q);
} else {
qdisc_qstats_backlog_dec(q, skb);
- q->q.qlen--;
+ qdisc_qlen_dec(q);
}
} else {
skb = SKB_XOFF_MAGIC;
@@ -159,7 +159,7 @@ static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *q,
qdisc_qstats_cpu_qlen_inc(q);
} else {
qdisc_qstats_backlog_inc(q, skb);
- q->q.qlen++;
+ qdisc_qlen_inc(q);
}
if (lock)
@@ -188,7 +188,7 @@ static inline void dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
} else {
q->qstats.requeues++;
qdisc_qstats_backlog_inc(q, skb);
- q->q.qlen++;
+ qdisc_qlen_inc(q);
}
skb = next;
@@ -294,7 +294,7 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
qdisc_qstats_cpu_qlen_dec(q);
} else {
qdisc_qstats_backlog_dec(q, skb);
- q->q.qlen--;
+ qdisc_qlen_dec(q);
}
} else {
skb = NULL;
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 83b2ca2e37fc82cfebf089e6c0e36f18af939887..e71a565100edf60881ca7542faa408c5bb1a0984 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1561,7 +1561,7 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
}
sch->qstats.backlog += len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
if (first && !cl_in_el_or_vttree(cl)) {
if (cl->cl_flags & HFSC_RSC)
@@ -1650,7 +1650,7 @@ hfsc_dequeue(struct Qdisc *sch)
qdisc_bstats_update(sch, skb);
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
return skb;
}
diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index 95e5d9bfd9c8c0cac08e080b8f1e0332e722aa3b..69b6f0a5471cb9a3b7b760144683f2b249091d89 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -359,7 +359,7 @@ static unsigned int hhf_drop(struct Qdisc *sch, struct sk_buff **to_free)
if (bucket->head) {
struct sk_buff *skb = dequeue_head(bucket);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_qstats_backlog_dec(sch, skb);
qdisc_drop(skb, sch, to_free);
}
@@ -399,7 +399,8 @@ static int hhf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
}
bucket->deficit = weight * q->quantum;
}
- if (++sch->q.qlen <= sch->limit)
+ qdisc_qlen_inc(sch);
+ if (sch->q.qlen <= sch->limit)
return NET_XMIT_SUCCESS;
prev_backlog = sch->qstats.backlog;
@@ -442,7 +443,7 @@ static struct sk_buff *hhf_dequeue(struct Qdisc *sch)
if (bucket->head) {
skb = dequeue_head(bucket);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_qstats_backlog_dec(sch, skb);
}
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index eb12381795ce1bb0f3b8c5f502e16ad64c4408c8..c22ccd8eae8c73323ccdf425e62857b3b851d74e 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -651,7 +651,7 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
}
sch->qstats.backlog += len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
@@ -951,7 +951,7 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
ok:
qdisc_bstats_update(sch, skb);
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
return skb;
}
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index a0133a7b9d3b09a0d2a6064234c8fdef60dbf955..ec8c91d3fde04e59daec2aecdb14d6bf50715e15 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -143,10 +143,10 @@ EXPORT_SYMBOL_NS_GPL(mq_attach, "NET_SCHED_INTERNAL");
void mq_dump_common(struct Qdisc *sch, struct sk_buff *skb)
{
struct net_device *dev = qdisc_dev(sch);
+ unsigned int qlen = 0;
struct Qdisc *qdisc;
unsigned int ntx;
- sch->q.qlen = 0;
gnet_stats_basic_sync_init(&sch->bstats);
memset(&sch->qstats, 0, sizeof(sch->qstats));
@@ -163,10 +163,11 @@ void mq_dump_common(struct Qdisc *sch, struct sk_buff *skb)
&qdisc->bstats, false);
gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
&qdisc->qstats);
- sch->q.qlen += qdisc_qlen(qdisc);
+ qlen += qdisc_qlen(qdisc);
spin_unlock_bh(qdisc_lock(qdisc));
}
+ WRITE_ONCE(sch->q.qlen, qlen);
}
EXPORT_SYMBOL_NS_GPL(mq_dump_common, "NET_SCHED_INTERNAL");
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 002add5ce9e0ab04a6260495d1bec02983c2a204..91a92992cd24ab6c30bf7db2288c08cd493c7bc3 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -555,10 +555,11 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
struct mqprio_sched *priv = qdisc_priv(sch);
struct nlattr *nla = (struct nlattr *)skb_tail_pointer(skb);
struct tc_mqprio_qopt opt = { 0 };
+ unsigned int qlen = 0;
struct Qdisc *qdisc;
unsigned int ntx;
- sch->q.qlen = 0;
+ qlen = 0;
gnet_stats_basic_sync_init(&sch->bstats);
memset(&sch->qstats, 0, sizeof(sch->qstats));
@@ -575,10 +576,11 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
&qdisc->bstats, false);
gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
&qdisc->qstats);
- sch->q.qlen += qdisc_qlen(qdisc);
+ qlen += qdisc_qlen(qdisc);
spin_unlock_bh(qdisc_lock(qdisc));
}
+ WRITE_ONCE(sch->q.qlen, qlen);
mqprio_qopt_reconstruct(dev, &opt);
opt.hw = priv->hw_offload;
@@ -663,12 +665,12 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
__acquires(d->lock)
{
if (cl >= TC_H_MIN_PRIORITY) {
- int i;
- __u32 qlen;
- struct gnet_stats_queue qstats = {0};
- struct gnet_stats_basic_sync bstats;
struct net_device *dev = qdisc_dev(sch);
struct netdev_tc_txq tc = dev->tc_to_txq[cl & TC_BITMASK];
+ struct gnet_stats_queue qstats = {0};
+ struct gnet_stats_basic_sync bstats;
+ u32 qlen = 0;
+ int i;
gnet_stats_basic_sync_init(&bstats);
/* Drop lock here it will be reclaimed before touching
@@ -689,11 +691,11 @@ static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
&qdisc->bstats, false);
gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
&qdisc->qstats);
- sch->q.qlen += qdisc_qlen(qdisc);
+ qlen += qdisc_qlen(qdisc);
spin_unlock_bh(qdisc_lock(qdisc));
}
- qlen = qdisc_qlen(sch) + qstats.qlen;
+ qlen = qlen + qstats.qlen;
/* Reclaim root sleeping lock before completing stats */
if (d->lock)
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 9f822fee113df6562ddac89092357434547a4599..4e465d11e3d75e36b875b66f8c8087c2e15cdad9 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -76,7 +76,7 @@ multiq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
ret = qdisc_enqueue(skb, qdisc, to_free);
if (ret == NET_XMIT_SUCCESS) {
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
if (net_xmit_drop_count(ret))
@@ -106,7 +106,7 @@ static struct sk_buff *multiq_dequeue(struct Qdisc *sch)
skb = qdisc->dequeue(qdisc);
if (skb) {
qdisc_bstats_update(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
return skb;
}
}
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 20df1c08b1e9d04e9495f1a69eff0dd96049f914..4498dd440a02ea7a089c92ebc005d5064b87e2d2 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -416,7 +416,7 @@ static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
rb_insert_color(&nskb->rbnode, &q->t_root);
}
q->t_len++;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
}
/* netem can't properly corrupt a megapacket (like we get from GSO), so instead
@@ -752,19 +752,19 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
if (net_xmit_drop_count(err))
qdisc_qstats_drop(sch);
sch->qstats.backlog -= pkt_len;
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_tree_reduce_backlog(sch, 1, pkt_len);
}
goto tfifo_dequeue;
}
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
goto deliver;
}
if (q->qdisc) {
skb = q->qdisc->ops->dequeue(q->qdisc);
if (skb) {
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
goto deliver;
}
}
@@ -777,7 +777,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
if (q->qdisc) {
skb = q->qdisc->ops->dequeue(q->qdisc);
if (skb) {
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
goto deliver;
}
}
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 9e2b9a490db23d858b27b7fc073b05a06535b05e..fe42ae3d6b696b2fc47f4d397af32e950eeec194 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -86,7 +86,7 @@ prio_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
ret = qdisc_enqueue(skb, qdisc, to_free);
if (ret == NET_XMIT_SUCCESS) {
sch->qstats.backlog += len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
if (net_xmit_drop_count(ret))
@@ -119,7 +119,7 @@ static struct sk_buff *prio_dequeue(struct Qdisc *sch)
if (skb) {
qdisc_bstats_update(sch, skb);
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
return skb;
}
}
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 699e45873f86145e96abd0d9ca77a6d0ff763b1b..195c434aae5f7e03d1a1238ed73bb64b3f04e105 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1152,12 +1152,12 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
if (!skb)
return NULL;
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
skb = agg_dequeue(in_serv_agg, cl, len);
if (!skb) {
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return NULL;
}
@@ -1265,7 +1265,7 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
_bstats_update(&cl->bstats, len, gso_segs);
sch->qstats.backlog += len;
- ++sch->q.qlen;
+ qdisc_qlen_inc(sch);
agg = cl->agg;
/* if the class is active, then done here */
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index c8d3d09f15e3919d6468964561130bfc79fb215b..61b9064d39f222bdfe5021e93e8172b7ae60c408 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -133,7 +133,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
ret = qdisc_enqueue(skb, child, to_free);
if (likely(ret == NET_XMIT_SUCCESS)) {
sch->qstats.backlog += len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
} else if (net_xmit_drop_count(ret)) {
q->stats.pdrop++;
qdisc_qstats_drop(sch);
@@ -159,7 +159,7 @@ static struct sk_buff *red_dequeue(struct Qdisc *sch)
if (skb) {
qdisc_bstats_update(sch, skb);
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
} else {
if (!red_is_idling(&q->vars))
red_start_of_idle_period(&q->vars);
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 01373866212894c57e7de58706ee464879303955..17b6ce223ad3a6f2d289c3ebe27cce8168c66b2b 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -407,7 +407,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
ret = qdisc_enqueue(skb, child, to_free);
if (likely(ret == NET_XMIT_SUCCESS)) {
sch->qstats.backlog += len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
increment_qlen(&cb, q);
} else if (net_xmit_drop_count(ret)) {
q->stats.childdrop++;
@@ -436,7 +436,7 @@ static struct sk_buff *sfb_dequeue(struct Qdisc *sch)
if (skb) {
qdisc_bstats_update(sch, skb);
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
decrement_qlen(skb, q);
}
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index c3f3181dba5424eb9d26362a1628653bb9392e89..132afe4d923c56e25bcb406a7cc47acef0f0e9b5 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -300,7 +300,7 @@ static unsigned int sfq_drop(struct Qdisc *sch, struct sk_buff **to_free)
len = qdisc_pkt_len(skb);
slot->backlog -= len;
sfq_dec(q, x);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_qstats_backlog_dec(sch, skb);
qdisc_drop_reason(skb, sch, to_free, QDISC_DROP_OVERLIMIT);
return len;
@@ -454,7 +454,8 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
/* We could use a bigger initial quantum for new flows */
slot->allot = q->quantum;
}
- if (++sch->q.qlen <= q->limit)
+ qdisc_qlen_inc(sch);
+ if (sch->q.qlen <= q->limit)
return NET_XMIT_SUCCESS;
qlen = slot->qlen;
@@ -495,7 +496,7 @@ sfq_dequeue(struct Qdisc *sch)
skb = slot_dequeue_head(slot);
sfq_dec(q, a);
qdisc_bstats_update(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_qstats_backlog_dec(sch, skb);
slot->backlog -= qdisc_pkt_len(skb);
/* Is the slot empty? */
@@ -594,7 +595,7 @@ static void sfq_rehash(struct Qdisc *sch)
slot->allot = q->quantum;
}
}
- sch->q.qlen -= dropped;
+ WRITE_ONCE(sch->q.qlen, sch->q.qlen - dropped);
qdisc_tree_reduce_backlog(sch, dropped, drop_len);
}
diff --git a/net/sched/sch_skbprio.c b/net/sched/sch_skbprio.c
index f485f62ab721ab8cde21230c60514708fb479982..52abfb4015a36408046d96b349497419ab5dacf8 100644
--- a/net/sched/sch_skbprio.c
+++ b/net/sched/sch_skbprio.c
@@ -93,7 +93,7 @@ static int skbprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
if (prio < q->lowest_prio)
q->lowest_prio = prio;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
@@ -145,7 +145,7 @@ static struct sk_buff *skbprio_dequeue(struct Qdisc *sch)
if (unlikely(!skb))
return NULL;
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_qstats_backlog_dec(sch, skb);
qdisc_bstats_update(sch, skb);
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 8e375281195061da848fb2bfaf79cf125afccac0..885a9bc859166dfb6d20aa0dfbb8f11194e02ba9 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -574,7 +574,7 @@ static int taprio_enqueue_one(struct sk_buff *skb, struct Qdisc *sch,
}
qdisc_qstats_backlog_inc(sch, skb);
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return qdisc_enqueue(skb, child, to_free);
}
@@ -755,7 +755,7 @@ static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq,
qdisc_bstats_update(sch, skb);
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
return skb;
}
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index f2340164f579a25431979e12ec3d23ab828edd16..25edf11a7d671fe63878b0995998c5920b86ef74 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -231,7 +231,7 @@ static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch,
len += seg_len;
}
}
- sch->q.qlen += nb;
+ WRITE_ONCE(sch->q.qlen, sch->q.qlen + nb);
sch->qstats.backlog += len;
if (nb > 0) {
qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
@@ -264,7 +264,7 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
}
sch->qstats.backlog += len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
@@ -309,7 +309,7 @@ static struct sk_buff *tbf_dequeue(struct Qdisc *sch)
q->tokens = toks;
q->ptokens = ptoks;
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_bstats_update(sch, skb);
return skb;
}
diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
index ec4039a201a2c2c502bc649fa5f6a0e4feee8fd5..bd10da46f5ddbc53f914648066dab526c8064e55 100644
--- a/net/sched/sch_teql.c
+++ b/net/sched/sch_teql.c
@@ -107,7 +107,7 @@ teql_dequeue(struct Qdisc *sch)
} else {
qdisc_bstats_update(sch, skb);
}
- sch->q.qlen = dat->q.qlen + q->q.qlen;
+ WRITE_ONCE(sch->q.qlen, dat->q.qlen + q->q.qlen);
return skb;
}
--
2.53.0.1213.gd9a14994de-goog
* [PATCH v3 net-next 05/15] net/sched: annotate data-races around sch->qstats.backlog
2026-04-10 18:22 [PATCH v3 net-next 00/15] net/sched: prepare RTNL removal from qdisc dumps Eric Dumazet
` (3 preceding siblings ...)
2026-04-10 18:22 ` [PATCH v3 net-next 04/15] net/sched: add qdisc_qlen_inc() and qdisc_qlen_dec() Eric Dumazet
@ 2026-04-10 18:22 ` Eric Dumazet
2026-04-10 18:22 ` [PATCH v3 net-next 06/15] net/sched: sch_sfb: annotate data-races in sfb_dump_stats() Eric Dumazet
` (8 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2026-04-10 18:22 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
Eric Dumazet
Add qstats_backlog_sub() and qstats_backlog_add() helpers
and use them instead of open-coding the updates.
These helpers use WRITE_ONCE() to prevent store tearing.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/sch_generic.h | 16 +++++++++++++---
net/sched/sch_api.c | 2 +-
net/sched/sch_cake.c | 8 ++++----
net/sched/sch_cbs.c | 2 +-
net/sched/sch_codel.c | 2 +-
net/sched/sch_drr.c | 2 +-
net/sched/sch_ets.c | 2 +-
net/sched/sch_fq_codel.c | 4 ++--
net/sched/sch_fq_pie.c | 4 ++--
net/sched/sch_gred.c | 2 +-
net/sched/sch_hfsc.c | 2 +-
net/sched/sch_htb.c | 2 +-
net/sched/sch_netem.c | 2 +-
net/sched/sch_prio.c | 2 +-
net/sched/sch_qfq.c | 2 +-
net/sched/sch_red.c | 2 +-
net/sched/sch_sfb.c | 4 ++--
net/sched/sch_sfq.c | 2 +-
net/sched/sch_tbf.c | 4 ++--
19 files changed, 38 insertions(+), 28 deletions(-)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 27d705246dbde99eda02f225dd745f14c73bd830..b0564a39caf4471619b74179a06a0e41e3765d94 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -965,10 +965,15 @@ static inline void qdisc_bstats_update(struct Qdisc *sch,
bstats_update(&sch->bstats, skb);
}
+static inline void qstats_backlog_sub(struct Qdisc *sch, u32 val)
+{
+ WRITE_ONCE(sch->qstats.backlog, sch->qstats.backlog - val);
+}
+
static inline void qdisc_qstats_backlog_dec(struct Qdisc *sch,
const struct sk_buff *skb)
{
- sch->qstats.backlog -= qdisc_pkt_len(skb);
+ qstats_backlog_sub(sch, qdisc_pkt_len(skb));
}
static inline void qdisc_qstats_cpu_backlog_dec(struct Qdisc *sch,
@@ -977,10 +982,15 @@ static inline void qdisc_qstats_cpu_backlog_dec(struct Qdisc *sch,
this_cpu_sub(sch->cpu_qstats->backlog, qdisc_pkt_len(skb));
}
+static inline void qstats_backlog_add(struct Qdisc *sch, u32 val)
+{
+ WRITE_ONCE(sch->qstats.backlog, sch->qstats.backlog + val);
+}
+
static inline void qdisc_qstats_backlog_inc(struct Qdisc *sch,
const struct sk_buff *skb)
{
- sch->qstats.backlog += qdisc_pkt_len(skb);
+ qstats_backlog_add(sch, qdisc_pkt_len(skb));
}
static inline void qdisc_qstats_cpu_backlog_inc(struct Qdisc *sch,
@@ -1294,7 +1304,7 @@ static inline void qdisc_update_stats_at_enqueue(struct Qdisc *sch,
qdisc_qstats_cpu_qlen_inc(sch);
this_cpu_add(sch->cpu_qstats->backlog, pkt_len);
} else {
- sch->qstats.backlog += pkt_len;
+ qstats_backlog_add(sch, pkt_len);
qdisc_qlen_inc(sch);
}
}
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 0dd3efd86393870e9695dddb4a471c5bf854f81e..292bc8bb7a79922a83865ed54083c04ff72742ff 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -806,7 +806,7 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
cops->qlen_notify(sch, cl);
}
WRITE_ONCE(sch->q.qlen, sch->q.qlen - n);
- sch->qstats.backlog -= len;
+ qstats_backlog_sub(sch, len);
__qdisc_qstats_drop(sch, drops);
}
rcu_read_unlock();
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index f2d9aded493264f6235d43c7e99db1d4c27408e6..32e672820c00a88c6d8fe77a6308405e016525ea 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1596,7 +1596,7 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
q->buffer_used -= skb->truesize;
b->backlogs[idx] -= len;
b->tin_backlog -= len;
- sch->qstats.backlog -= len;
+ qstats_backlog_sub(sch, len);
flow->dropped++;
b->tin_dropped++;
@@ -1826,7 +1826,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
b->bytes += slen;
b->backlogs[idx] += slen;
b->tin_backlog += slen;
- sch->qstats.backlog += slen;
+ qstats_backlog_add(sch, slen);
q->avg_window_bytes += slen;
qdisc_tree_reduce_backlog(sch, 1-numsegs, len-slen);
@@ -1863,7 +1863,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
b->bytes += len - ack_pkt_len;
b->backlogs[idx] += len - ack_pkt_len;
b->tin_backlog += len - ack_pkt_len;
- sch->qstats.backlog += len - ack_pkt_len;
+ qstats_backlog_add(sch, len - ack_pkt_len);
q->avg_window_bytes += len - ack_pkt_len;
}
@@ -1978,7 +1978,7 @@ static struct sk_buff *cake_dequeue_one(struct Qdisc *sch)
len = qdisc_pkt_len(skb);
b->backlogs[q->cur_flow] -= len;
b->tin_backlog -= len;
- sch->qstats.backlog -= len;
+ qstats_backlog_sub(sch, len);
q->buffer_used -= skb->truesize;
qdisc_qlen_dec(sch);
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index a75e58876797952f2218725f6da5cff29f330ae2..2cfa0fd92829ad7eba7454e09dc17eb8f22519b8 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -96,7 +96,7 @@ static int cbs_child_enqueue(struct sk_buff *skb, struct Qdisc *sch,
if (err != NET_XMIT_SUCCESS)
return err;
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
index 317aae0ec7bd6aedb4bae09b18423c981fed16e7..91dd2e629af8f2d1a29f439a6dbb5c186fa01d33 100644
--- a/net/sched/sch_codel.c
+++ b/net/sched/sch_codel.c
@@ -42,7 +42,7 @@ static struct sk_buff *dequeue_func(struct codel_vars *vars, void *ctx)
struct sk_buff *skb = __qdisc_dequeue_head(&sch->q);
if (skb) {
- sch->qstats.backlog -= qdisc_pkt_len(skb);
+ qstats_backlog_sub(sch, qdisc_pkt_len(skb));
prefetch(&skb->end); /* we'll need skb_shinfo() */
}
return skb;
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 925fa0cfd730ce72e45e8983ba02eb913afb1235..3f6687fa9666257952be5d44f9e3460845fe2a40 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -365,7 +365,7 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch,
cl->deficit = cl->quantum;
}
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
return err;
}
diff --git a/net/sched/sch_ets.c b/net/sched/sch_ets.c
index c817e0a6c14653a35f5ebb9de1a5ccc44d1a2f98..1cc559634ed27ce5a6630186a51a8ac8180dad96 100644
--- a/net/sched/sch_ets.c
+++ b/net/sched/sch_ets.c
@@ -448,7 +448,7 @@ static int ets_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
cl->deficit = cl->quantum;
}
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
return err;
}
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 3c60a8ec4682b174ebf0df34f356eb6132356764..3a348be18551033dcf41ce632eb4b563221040fa 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -177,7 +177,7 @@ static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max_packets,
q->backlogs[idx] -= len;
q->memory_usage -= mem;
__qdisc_qstats_drop(sch, i);
- sch->qstats.backlog -= len;
+ qstats_backlog_sub(sch, len);
WRITE_ONCE(sch->q.qlen, sch->q.qlen - i);
return idx;
}
@@ -267,7 +267,7 @@ static struct sk_buff *dequeue_func(struct codel_vars *vars, void *ctx)
q->backlogs[flow - q->flows] -= qdisc_pkt_len(skb);
q->memory_usage -= get_codel_cb(skb)->mem_usage;
qdisc_qlen_dec(sch);
- sch->qstats.backlog -= qdisc_pkt_len(skb);
+ qdisc_qstats_backlog_dec(sch, skb);
}
return skb;
}
diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c
index dba49d44a5d2412b2deb983bf87428ade7944e51..197f0df0a6eb06ab4ce25eefe01d32a35dbd84af 100644
--- a/net/sched/sch_fq_pie.c
+++ b/net/sched/sch_fq_pie.c
@@ -184,7 +184,7 @@ static int fq_pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
pkt_len = qdisc_pkt_len(skb);
q->stats.packets_in++;
q->memory_usage += skb->truesize;
- sch->qstats.backlog += pkt_len;
+ qstats_backlog_add(sch, pkt_len);
qdisc_qlen_inc(sch);
flow_queue_add(sel_flow, skb);
if (list_empty(&sel_flow->flowchain)) {
@@ -262,7 +262,7 @@ static struct sk_buff *fq_pie_qdisc_dequeue(struct Qdisc *sch)
if (flow->head) {
skb = dequeue_head(flow);
pkt_len = qdisc_pkt_len(skb);
- sch->qstats.backlog -= pkt_len;
+ qstats_backlog_sub(sch, pkt_len);
qdisc_qlen_dec(sch);
qdisc_bstats_update(sch, skb);
}
diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 8ae65572162c188cca5ac8f030dc6f2054a7fcd0..fcc1a4c0363624293986f221c70572ce6503e220 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -388,7 +388,7 @@ static int gred_offload_dump_stats(struct Qdisc *sch)
bytes += u64_stats_read(&hw_stats->stats.bstats[i].bytes);
packets += u64_stats_read(&hw_stats->stats.bstats[i].packets);
sch->qstats.qlen += hw_stats->stats.qstats[i].qlen;
- sch->qstats.backlog += hw_stats->stats.qstats[i].backlog;
+ qstats_backlog_add(sch, hw_stats->stats.qstats[i].backlog);
__qdisc_qstats_drop(sch, hw_stats->stats.qstats[i].drops);
sch->qstats.requeues += hw_stats->stats.qstats[i].requeues;
sch->qstats.overlimits += hw_stats->stats.qstats[i].overlimits;
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index e71a565100edf60881ca7542faa408c5bb1a0984..59409ee2d2ff9279d7439b744030c0e845386de0 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1560,7 +1560,7 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
return err;
}
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
if (first && !cl_in_el_or_vttree(cl)) {
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index c22ccd8eae8c73323ccdf425e62857b3b851d74e..1e600f65c8769a74286c4f060b0d45da9a13eeeb 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -650,7 +650,7 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
htb_activate(q, cl);
}
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 4498dd440a02ea7a089c92ebc005d5064b87e2d2..2a2cdd1e4cc206ba00b8dd1821bef87156050950 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -751,7 +751,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
if (err != NET_XMIT_SUCCESS) {
if (net_xmit_drop_count(err))
qdisc_qstats_drop(sch);
- sch->qstats.backlog -= pkt_len;
+ qstats_backlog_sub(sch, pkt_len);
qdisc_qlen_dec(sch);
qdisc_tree_reduce_backlog(sch, 1, pkt_len);
}
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index fe42ae3d6b696b2fc47f4d397af32e950eeec194..e4dd56a890725b4c14d6715c96f5b3fa44a8f4f2 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -85,7 +85,7 @@ prio_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
ret = qdisc_enqueue(skb, qdisc, to_free);
if (ret == NET_XMIT_SUCCESS) {
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 195c434aae5f7e03d1a1238ed73bb64b3f04e105..cb56787e1d258c06f2e86959c3b2cfaeb12df1ac 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1264,7 +1264,7 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
}
_bstats_update(&cl->bstats, len, gso_segs);
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
agg = cl->agg;
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 61b9064d39f222bdfe5021e93e8172b7ae60c408..7db97c96351309bc3e64fa50570a1928f2b2ce55 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -132,7 +132,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
len = qdisc_pkt_len(skb);
ret = qdisc_enqueue(skb, child, to_free);
if (likely(ret == NET_XMIT_SUCCESS)) {
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
} else if (net_xmit_drop_count(ret)) {
q->stats.pdrop++;
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 17b6ce223ad3a6f2d289c3ebe27cce8168c66b2b..2258567cbcaf70863eace85d347efda882a00145 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -406,7 +406,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
memcpy(&cb, sfb_skb_cb(skb), sizeof(cb));
ret = qdisc_enqueue(skb, child, to_free);
if (likely(ret == NET_XMIT_SUCCESS)) {
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
increment_qlen(&cb, q);
} else if (net_xmit_drop_count(ret)) {
@@ -582,7 +582,7 @@ static int sfb_dump(struct Qdisc *sch, struct sk_buff *skb)
.penalty_burst = q->penalty_burst,
};
- sch->qstats.backlog = q->qdisc->qstats.backlog;
+ WRITE_ONCE(sch->qstats.backlog, READ_ONCE(q->qdisc->qstats.backlog));
opts = nla_nest_start_noflag(skb, TCA_OPTIONS);
if (opts == NULL)
goto nla_put_failure;
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 132afe4d923c56e25bcb406a7cc47acef0f0e9b5..76885d813e6b270f00e052395c1f98fc130f3b49 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -425,7 +425,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
/* We know we have at least one packet in queue */
head = slot_dequeue_head(slot);
delta = qdisc_pkt_len(head) - qdisc_pkt_len(skb);
- sch->qstats.backlog -= delta;
+ qstats_backlog_sub(sch, delta);
slot->backlog -= delta;
qdisc_drop_reason(head, sch, to_free, QDISC_DROP_FLOW_LIMIT);
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 25edf11a7d671fe63878b0995998c5920b86ef74..67c7aaaf8f607e82ad13b7fdf177405a1dd075bb 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -232,7 +232,7 @@ static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch,
}
}
WRITE_ONCE(sch->q.qlen, sch->q.qlen + nb);
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
if (nb > 0) {
qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
consume_skb(skb);
@@ -263,7 +263,7 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
return ret;
}
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v3 net-next 06/15] net/sched: sch_sfb: annotate data-races in sfb_dump_stats()
2026-04-10 18:22 [PATCH v3 net-next 00/15] net/sched: prepare RTNL removal from qdisc dumps Eric Dumazet
` (4 preceding siblings ...)
2026-04-10 18:22 ` [PATCH v3 net-next 05/15] net/sched: annotate data-races around sch->qstats.backlog Eric Dumazet
@ 2026-04-10 18:22 ` Eric Dumazet
2026-04-10 18:22 ` [PATCH v3 net-next 07/15] net/sched: sch_red: annotate data-races in red_dump_stats() Eric Dumazet
` (7 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
Eric Dumazet
sfb_dump_stats() only runs with RTNL held,
reading fields that can be changed in qdisc fast path.
Add READ_ONCE()/WRITE_ONCE() annotations.
An alternative would be to acquire the qdisc spinlock, but our long-term
goal is to make qdisc dump operations as lockless as possible.
tc_sfb_xstats fields don't need to be latched atomically;
otherwise this bug would have been caught earlier.
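To make the pattern concrete, here is a minimal sketch (illustrative,
not part of the patch) using the names from the diff below. The fast
path is the only writer, so its read-modify-write does not need to be
atomic; it only needs a marked store that the lockless dump side pairs
with a marked load:

	/* writer, fast path (sfb_enqueue(), qdisc lock held) */
	WRITE_ONCE(q->stats.queuedrop, q->stats.queuedrop + 1);

	/* reader, dump path (sfb_dump_stats(), RTNL only) */
	st.queuedrop = READ_ONCE(q->stats.queuedrop);

Without the annotations the compiler may tear, fuse or re-load either
access, and KCSAN flags the pair as a data-race.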
Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/sched/sch_sfb.c | 46 +++++++++++++++++++++++++++------------------
1 file changed, 28 insertions(+), 18 deletions(-)
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 2258567cbcaf70863eace85d347efda882a00145..315edd7f87fcf1600d69a3a92733ddb9fee55e99 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -202,11 +202,14 @@ static u32 sfb_compute_qlen(u32 *prob_r, u32 *avgpm_r, const struct sfb_sched_da
const struct sfb_bucket *b = &q->bins[q->slot].bins[0][0];
for (i = 0; i < SFB_LEVELS * SFB_NUMBUCKETS; i++) {
- if (qlen < b->qlen)
- qlen = b->qlen;
- totalpm += b->p_mark;
- if (prob < b->p_mark)
- prob = b->p_mark;
+ u32 b_qlen = READ_ONCE(b->qlen);
+ u32 b_mark = READ_ONCE(b->p_mark);
+
+ if (qlen < b_qlen)
+ qlen = b_qlen;
+ totalpm += b_mark;
+ if (prob < b_mark)
+ prob = b_mark;
b++;
}
*prob_r = prob;
@@ -295,7 +298,8 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
if (unlikely(sch->q.qlen >= q->limit)) {
qdisc_qstats_overlimit(sch);
- q->stats.queuedrop++;
+ WRITE_ONCE(q->stats.queuedrop,
+ q->stats.queuedrop + 1);
goto drop;
}
@@ -348,7 +352,8 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
if (unlikely(minqlen >= q->max)) {
qdisc_qstats_overlimit(sch);
- q->stats.bucketdrop++;
+ WRITE_ONCE(q->stats.bucketdrop,
+ q->stats.bucketdrop + 1);
goto drop;
}
@@ -374,7 +379,8 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
}
if (sfb_rate_limit(skb, q)) {
qdisc_qstats_overlimit(sch);
- q->stats.penaltydrop++;
+ WRITE_ONCE(q->stats.penaltydrop,
+ q->stats.penaltydrop + 1);
goto drop;
}
goto enqueue;
@@ -390,14 +396,17 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
* In either case, we want to start dropping packets.
*/
if (r < (p_min - SFB_MAX_PROB / 2) * 2) {
- q->stats.earlydrop++;
+ WRITE_ONCE(q->stats.earlydrop,
+ q->stats.earlydrop + 1);
goto drop;
}
}
if (INET_ECN_set_ce(skb)) {
- q->stats.marked++;
+ WRITE_ONCE(q->stats.marked,
+ q->stats.marked + 1);
} else {
- q->stats.earlydrop++;
+ WRITE_ONCE(q->stats.earlydrop,
+ q->stats.earlydrop + 1);
goto drop;
}
}
@@ -410,7 +419,8 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
qdisc_qlen_inc(sch);
increment_qlen(&cb, q);
} else if (net_xmit_drop_count(ret)) {
- q->stats.childdrop++;
+ WRITE_ONCE(q->stats.childdrop,
+ q->stats.childdrop + 1);
qdisc_qstats_drop(sch);
}
return ret;
@@ -599,12 +609,12 @@ static int sfb_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
{
struct sfb_sched_data *q = qdisc_priv(sch);
struct tc_sfb_xstats st = {
- .earlydrop = q->stats.earlydrop,
- .penaltydrop = q->stats.penaltydrop,
- .bucketdrop = q->stats.bucketdrop,
- .queuedrop = q->stats.queuedrop,
- .childdrop = q->stats.childdrop,
- .marked = q->stats.marked,
+ .earlydrop = READ_ONCE(q->stats.earlydrop),
+ .penaltydrop = READ_ONCE(q->stats.penaltydrop),
+ .bucketdrop = READ_ONCE(q->stats.bucketdrop),
+ .queuedrop = READ_ONCE(q->stats.queuedrop),
+ .childdrop = READ_ONCE(q->stats.childdrop),
+ .marked = READ_ONCE(q->stats.marked),
};
st.maxqlen = sfb_compute_qlen(&st.maxprob, &st.avgprob, q);
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v3 net-next 07/15] net/sched: sch_red: annotate data-races in red_dump_stats()
2026-04-10 18:22 [PATCH v3 net-next 00/15] net/sched: prepare RTNL removal from qdisc dumps Eric Dumazet
` (5 preceding siblings ...)
2026-04-10 18:22 ` [PATCH v3 net-next 06/15] net/sched: sch_sfb: annotate data-races in sfb_dump_stats() Eric Dumazet
@ 2026-04-10 18:22 ` Eric Dumazet
2026-04-10 18:22 ` [PATCH v3 net-next 08/15] net/sched: sch_fq_codel: remove data-races from fq_codel_dump_stats() Eric Dumazet
` (6 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
Eric Dumazet
red_dump_stats() only runs with RTNL held,
reading fields that can be changed in qdisc fast path.
Add READ_ONCE()/WRITE_ONCE() annotations.
An alternative would be to acquire the qdisc spinlock, but our long-term
goal is to make qdisc dump operations as lockless as possible.
tc_red_xstats fields don't need to be latched atomically;
otherwise this bug would have been caught earlier.
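For illustration (this mirrors the dump-side code in the diff below):

	st.early = READ_ONCE(q->stats.prob_drop) +
		   READ_ONCE(q->stats.forced_drop);

The two loads can observe the counters at slightly different instants,
so the sum may be transiently off by a packet against a concurrent
enqueue. That imprecision is harmless for statistics, which is what
"don't need to be latched atomically" means here.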
Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/sched/sch_red.c | 31 +++++++++++++++++++++----------
1 file changed, 21 insertions(+), 10 deletions(-)
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 7db97c96351309bc3e64fa50570a1928f2b2ce55..268f1ba4520cd74da60e7b1ca08974fcfdced680 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -90,17 +90,20 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
case RED_PROB_MARK:
qdisc_qstats_overlimit(sch);
if (!red_use_ecn(q)) {
- q->stats.prob_drop++;
+ WRITE_ONCE(q->stats.prob_drop,
+ q->stats.prob_drop + 1);
goto congestion_drop;
}
if (INET_ECN_set_ce(skb)) {
- q->stats.prob_mark++;
+ WRITE_ONCE(q->stats.prob_mark,
+ q->stats.prob_mark + 1);
skb = tcf_qevent_handle(&q->qe_mark, sch, skb, to_free, &ret);
if (!skb)
return NET_XMIT_CN | ret;
} else if (!red_use_nodrop(q)) {
- q->stats.prob_drop++;
+ WRITE_ONCE(q->stats.prob_drop,
+ q->stats.prob_drop + 1);
goto congestion_drop;
}
@@ -111,17 +114,20 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
reason = QDISC_DROP_OVERLIMIT;
qdisc_qstats_overlimit(sch);
if (red_use_harddrop(q) || !red_use_ecn(q)) {
- q->stats.forced_drop++;
+ WRITE_ONCE(q->stats.forced_drop,
+ q->stats.forced_drop + 1);
goto congestion_drop;
}
if (INET_ECN_set_ce(skb)) {
- q->stats.forced_mark++;
+ WRITE_ONCE(q->stats.forced_mark,
+ q->stats.forced_mark + 1);
skb = tcf_qevent_handle(&q->qe_mark, sch, skb, to_free, &ret);
if (!skb)
return NET_XMIT_CN | ret;
} else if (!red_use_nodrop(q)) {
- q->stats.forced_drop++;
+ WRITE_ONCE(q->stats.forced_drop,
+ q->stats.forced_drop + 1);
goto congestion_drop;
}
@@ -135,7 +141,8 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
} else if (net_xmit_drop_count(ret)) {
- q->stats.pdrop++;
+ WRITE_ONCE(q->stats.pdrop,
+ q->stats.pdrop + 1);
qdisc_qstats_drop(sch);
}
return ret;
@@ -463,9 +470,13 @@ static int red_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_RED,
&hw_stats_request);
}
- st.early = q->stats.prob_drop + q->stats.forced_drop;
- st.pdrop = q->stats.pdrop;
- st.marked = q->stats.prob_mark + q->stats.forced_mark;
+ st.early = READ_ONCE(q->stats.prob_drop) +
+ READ_ONCE(q->stats.forced_drop);
+
+ st.pdrop = READ_ONCE(q->stats.pdrop);
+
+ st.marked = READ_ONCE(q->stats.prob_mark) +
+ READ_ONCE(q->stats.forced_mark);
return gnet_stats_copy_app(d, &st, sizeof(st));
}
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v3 net-next 08/15] net/sched: sch_fq_codel: remove data-races from fq_codel_dump_stats()
2026-04-10 18:22 [PATCH v3 net-next 00/15] net/sched: prepare RTNL removal from qdisc dumps Eric Dumazet
` (6 preceding siblings ...)
2026-04-10 18:22 ` [PATCH v3 net-next 07/15] net/sched: sch_red: annotate data-races in red_dump_stats() Eric Dumazet
@ 2026-04-10 18:22 ` Eric Dumazet
2026-04-10 18:22 ` [PATCH v3 net-next 09/15] net/sched: sch_pie: annotate data-races in pie_dump_stats() Eric Dumazet
` (5 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
Eric Dumazet
fq_codel_dump_stats() acquires the qdisc spinlock a bit too late.
Move this acquisition before we fill st.qdisc_stats with live data.
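In sketch form (assuming the function unlocks after the flow-list walk,
as the existing code already does), the fix is purely an ordering one:

	sch_tree_lock(sch);

	st.qdisc_stats.maxpacket = q->cstats.maxpacket;
	/* ... fill the remaining live fields ... */

	list_for_each(pos, &q->new_flows)
		st.qdisc_stats.new_flows_len++;

	sch_tree_unlock(sch);

Previously the scalar fields were sampled before the lock was taken and
could race with a concurrent fast path.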
Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/sched/sch_fq_codel.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 3a348be18551033dcf41ce632eb4b563221040fa..95769b19a04fb392e14a48353e94fb2ee299565a 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -586,6 +586,8 @@ static int fq_codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
};
struct list_head *pos;
+ sch_tree_lock(sch);
+
st.qdisc_stats.maxpacket = q->cstats.maxpacket;
st.qdisc_stats.drop_overlimit = q->drop_overlimit;
st.qdisc_stats.ecn_mark = q->cstats.ecn_mark;
@@ -594,7 +596,6 @@ static int fq_codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
st.qdisc_stats.memory_usage = q->memory_usage;
st.qdisc_stats.drop_overmemory = q->drop_overmemory;
- sch_tree_lock(sch);
list_for_each(pos, &q->new_flows)
st.qdisc_stats.new_flows_len++;
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v3 net-next 09/15] net/sched: sch_pie: annotate data-races in pie_dump_stats()
2026-04-10 18:22 [PATCH v3 net-next 00/15] net/sched: prepare RTNL removal from qdisc dumps Eric Dumazet
` (7 preceding siblings ...)
2026-04-10 18:22 ` [PATCH v3 net-next 08/15] net/sched: sch_fq_codel: remove data-races from fq_codel_dump_stats() Eric Dumazet
@ 2026-04-10 18:22 ` Eric Dumazet
2026-04-10 18:22 ` [PATCH v3 net-next 10/15] net/sched: sch_fq_pie: annotate data-races in fq_pie_dump_stats() Eric Dumazet
` (4 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
Eric Dumazet
pie_dump_stats() only runs with RTNL held,
reading fields that can be changed in qdisc fast path.
Add READ_ONCE()/WRITE_ONCE() annotations.
An alternative would be to acquire the qdisc spinlock, but our long-term
goal is to make qdisc dump operations as lockless as possible.
tc_pie_xstats fields don't need to be latched atomically;
otherwise this bug would have been caught earlier.
Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/pie.h | 2 +-
net/sched/sch_pie.c | 38 +++++++++++++++++++-------------------
2 files changed, 20 insertions(+), 20 deletions(-)
diff --git a/include/net/pie.h b/include/net/pie.h
index 01cbc66825a40bd21c0a044b1180cbbc346785df..1f3db0c355149b41823a891c9156cac625122031 100644
--- a/include/net/pie.h
+++ b/include/net/pie.h
@@ -104,7 +104,7 @@ static inline void pie_vars_init(struct pie_vars *vars)
vars->dq_tstamp = DTIME_INVALID;
vars->accu_prob = 0;
vars->dq_count = DQCOUNT_INVALID;
- vars->avg_dq_rate = 0;
+ WRITE_ONCE(vars->avg_dq_rate, 0);
}
static inline struct pie_skb_cb *get_pie_cb(const struct sk_buff *skb)
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index 16f3f629cb8e4be71431f7e50a278e3c7fdba8d0..fb53fbf0e328571be72b66ba4e75a938e1963422 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -90,7 +90,7 @@ static int pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
bool enqueue = false;
if (unlikely(qdisc_qlen(sch) >= sch->limit)) {
- q->stats.overlimit++;
+ WRITE_ONCE(q->stats.overlimit, q->stats.overlimit + 1);
goto out;
}
@@ -104,7 +104,7 @@ static int pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
/* If packet is ecn capable, mark it if drop probability
* is lower than 10%, else drop it.
*/
- q->stats.ecn_mark++;
+ WRITE_ONCE(q->stats.ecn_mark, q->stats.ecn_mark + 1);
enqueue = true;
}
@@ -114,15 +114,15 @@ static int pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
if (!q->params.dq_rate_estimator)
pie_set_enqueue_time(skb);
- q->stats.packets_in++;
+ WRITE_ONCE(q->stats.packets_in, q->stats.packets_in + 1);
if (qdisc_qlen(sch) > q->stats.maxq)
- q->stats.maxq = qdisc_qlen(sch);
+ WRITE_ONCE(q->stats.maxq, qdisc_qlen(sch));
return qdisc_enqueue_tail(skb, sch);
}
out:
- q->stats.dropped++;
+ WRITE_ONCE(q->stats.dropped, q->stats.dropped + 1);
q->vars.accu_prob = 0;
return qdisc_drop_reason(skb, sch, to_free, reason);
}
@@ -267,11 +267,11 @@ void pie_process_dequeue(struct sk_buff *skb, struct pie_params *params,
count = count / dtime;
if (vars->avg_dq_rate == 0)
- vars->avg_dq_rate = count;
+ WRITE_ONCE(vars->avg_dq_rate, count);
else
- vars->avg_dq_rate =
+ WRITE_ONCE(vars->avg_dq_rate,
(vars->avg_dq_rate -
- (vars->avg_dq_rate >> 3)) + (count >> 3);
+ (vars->avg_dq_rate >> 3)) + (count >> 3));
/* If the queue has receded below the threshold, we hold
* on to the last drain rate calculated, else we reset
@@ -381,7 +381,7 @@ void pie_calculate_probability(struct pie_params *params, struct pie_vars *vars,
if (delta > 0) {
/* prevent overflow */
if (vars->prob < oldprob) {
- vars->prob = MAX_PROB;
+ WRITE_ONCE(vars->prob, MAX_PROB);
/* Prevent normalization error. If probability is at
* maximum value already, we normalize it here, and
* skip the check to do a non-linear drop in the next
@@ -392,7 +392,7 @@ void pie_calculate_probability(struct pie_params *params, struct pie_vars *vars,
} else {
/* prevent underflow */
if (vars->prob > oldprob)
- vars->prob = 0;
+ WRITE_ONCE(vars->prob, 0);
}
/* Non-linear drop in probability: Reduce drop probability quickly if
@@ -403,7 +403,7 @@ void pie_calculate_probability(struct pie_params *params, struct pie_vars *vars,
/* Reduce drop probability to 98.4% */
vars->prob -= vars->prob / 64;
- vars->qdelay = qdelay;
+ WRITE_ONCE(vars->qdelay, qdelay);
vars->backlog_old = backlog;
/* We restart the measurement cycle if the following conditions are met
@@ -502,21 +502,21 @@ static int pie_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
struct pie_sched_data *q = qdisc_priv(sch);
struct tc_pie_xstats st = {
.prob = q->vars.prob << BITS_PER_BYTE,
- .delay = ((u32)PSCHED_TICKS2NS(q->vars.qdelay)) /
+ .delay = ((u32)PSCHED_TICKS2NS(READ_ONCE(q->vars.qdelay))) /
NSEC_PER_USEC,
- .packets_in = q->stats.packets_in,
- .overlimit = q->stats.overlimit,
- .maxq = q->stats.maxq,
- .dropped = q->stats.dropped,
- .ecn_mark = q->stats.ecn_mark,
+ .packets_in = READ_ONCE(q->stats.packets_in),
+ .overlimit = READ_ONCE(q->stats.overlimit),
+ .maxq = READ_ONCE(q->stats.maxq),
+ .dropped = READ_ONCE(q->stats.dropped),
+ .ecn_mark = READ_ONCE(q->stats.ecn_mark),
};
/* avg_dq_rate is only valid if dq_rate_estimator is enabled */
st.dq_rate_estimating = q->params.dq_rate_estimator;
/* unscale and return dq_rate in bytes per sec */
- if (q->params.dq_rate_estimator)
- st.avg_dq_rate = q->vars.avg_dq_rate *
+ if (st.dq_rate_estimating)
+ st.avg_dq_rate = READ_ONCE(q->vars.avg_dq_rate) *
(PSCHED_TICKS_PER_SEC) >> PIE_SCALE;
return gnet_stats_copy_app(d, &st, sizeof(st));
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v3 net-next 10/15] net/sched: sch_fq_pie: annotate data-races in fq_pie_dump_stats()
2026-04-10 18:22 [PATCH v3 net-next 00/15] net/sched: prepare RTNL removal from qdisc dumps Eric Dumazet
` (8 preceding siblings ...)
2026-04-10 18:22 ` [PATCH v3 net-next 09/15] net/sched: sch_pie: annotate data-races in pie_dump_stats() Eric Dumazet
@ 2026-04-10 18:22 ` Eric Dumazet
2026-04-10 18:22 ` [PATCH v3 net-next 11/15] net_sched: sch_hhf: annotate data-races in hhf_dump_stats() Eric Dumazet
` (3 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
Eric Dumazet
fq_pie_dump_stats() acquires the qdisc spinlock a bit too late.
Move this acquisition before we fill tc_fq_pie_xstats with live data.
An alternative would be to add READ_ONCE() and WRITE_ONCE() annotations,
but the spinlock is needed anyway.
Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/sched/sch_fq_pie.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c
index 197f0df0a6eb06ab4ce25eefe01d32a35dbd84af..72f48fa4010bebbe6be212938b457db21ff3c5a0 100644
--- a/net/sched/sch_fq_pie.c
+++ b/net/sched/sch_fq_pie.c
@@ -509,18 +509,19 @@ static int fq_pie_dump(struct Qdisc *sch, struct sk_buff *skb)
static int fq_pie_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
{
struct fq_pie_sched_data *q = qdisc_priv(sch);
- struct tc_fq_pie_xstats st = {
- .packets_in = q->stats.packets_in,
- .overlimit = q->stats.overlimit,
- .overmemory = q->overmemory,
- .dropped = q->stats.dropped,
- .ecn_mark = q->stats.ecn_mark,
- .new_flow_count = q->new_flow_count,
- .memory_usage = q->memory_usage,
- };
+ struct tc_fq_pie_xstats st = { 0 };
struct list_head *pos;
sch_tree_lock(sch);
+
+ st.packets_in = q->stats.packets_in;
+ st.overlimit = q->stats.overlimit;
+ st.overmemory = q->overmemory;
+ st.dropped = q->stats.dropped;
+ st.ecn_mark = q->stats.ecn_mark;
+ st.new_flow_count = q->new_flow_count;
+ st.memory_usage = q->memory_usage;
+
list_for_each(pos, &q->new_flows)
st.new_flows_len++;
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v3 net-next 11/15] net_sched: sch_hhf: annotate data-races in hhf_dump_stats()
2026-04-10 18:22 [PATCH v3 net-next 00/15] net/sched: prepare RTNL removal from qdisc dumps Eric Dumazet
` (9 preceding siblings ...)
2026-04-10 18:22 ` [PATCH v3 net-next 10/15] net/sched: sch_fq_pie: annotate data-races in fq_pie_dump_stats() Eric Dumazet
@ 2026-04-10 18:22 ` Eric Dumazet
2026-04-10 18:22 ` [PATCH v3 net-next 12/15] net/sched: sch_choke: annotate data-races in choke_dump_stats() Eric Dumazet
` (2 subsequent siblings)
13 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
Eric Dumazet
hhf_dump_stats() only runs with RTNL held,
reading fields that can be changed in qdisc fast path.
Add READ_ONCE()/WRITE_ONCE() annotations.
Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/sched/sch_hhf.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index 69b6f0a5471cb9a3b7b760144683f2b249091d89..1e25b75daae2e5de31bd212dfa1f6d7aea927174 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -198,7 +198,8 @@ static struct hh_flow_state *seek_list(const u32 hash,
return NULL;
list_del(&flow->flowchain);
kfree(flow);
- q->hh_flows_current_cnt--;
+ WRITE_ONCE(q->hh_flows_current_cnt,
+ q->hh_flows_current_cnt - 1);
} else if (flow->hash_id == hash) {
return flow;
}
@@ -226,7 +227,7 @@ static struct hh_flow_state *alloc_new_hh(struct list_head *head,
}
if (q->hh_flows_current_cnt >= q->hh_flows_limit) {
- q->hh_flows_overlimit++;
+ WRITE_ONCE(q->hh_flows_overlimit, q->hh_flows_overlimit + 1);
return NULL;
}
/* Create new entry. */
@@ -234,7 +235,7 @@ static struct hh_flow_state *alloc_new_hh(struct list_head *head,
if (!flow)
return NULL;
- q->hh_flows_current_cnt++;
+ WRITE_ONCE(q->hh_flows_current_cnt, q->hh_flows_current_cnt + 1);
INIT_LIST_HEAD(&flow->flowchain);
list_add_tail(&flow->flowchain, head);
@@ -309,7 +310,7 @@ static enum wdrr_bucket_idx hhf_classify(struct sk_buff *skb, struct Qdisc *sch)
return WDRR_BUCKET_FOR_NON_HH;
flow->hash_id = hash;
flow->hit_timestamp = now;
- q->hh_flows_total_cnt++;
+ WRITE_ONCE(q->hh_flows_total_cnt, q->hh_flows_total_cnt + 1);
/* By returning without updating counters in q->hhf_arrays,
* we implicitly implement "shielding" (see Optimization O1).
@@ -404,7 +405,7 @@ static int hhf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
return NET_XMIT_SUCCESS;
prev_backlog = sch->qstats.backlog;
- q->drop_overlimit++;
+ WRITE_ONCE(q->drop_overlimit, q->drop_overlimit + 1);
/* Return Congestion Notification only if we dropped a packet from this
* bucket.
*/
@@ -687,10 +688,10 @@ static int hhf_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
{
struct hhf_sched_data *q = qdisc_priv(sch);
struct tc_hhf_xstats st = {
- .drop_overlimit = q->drop_overlimit,
- .hh_overlimit = q->hh_flows_overlimit,
- .hh_tot_count = q->hh_flows_total_cnt,
- .hh_cur_count = q->hh_flows_current_cnt,
+ .drop_overlimit = READ_ONCE(q->drop_overlimit),
+ .hh_overlimit = READ_ONCE(q->hh_flows_overlimit),
+ .hh_tot_count = READ_ONCE(q->hh_flows_total_cnt),
+ .hh_cur_count = READ_ONCE(q->hh_flows_current_cnt),
};
return gnet_stats_copy_app(d, &st, sizeof(st));
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v3 net-next 12/15] net/sched: sch_choke: annotate data-races in choke_dump_stats()
2026-04-10 18:22 [PATCH v3 net-next 00/15] net/sched: prepare RTNL removal from qdisc dumps Eric Dumazet
` (10 preceding siblings ...)
2026-04-10 18:22 ` [PATCH v3 net-next 11/15] net_sched: sch_hhf: annotate data-races in hhf_dump_stats() Eric Dumazet
@ 2026-04-10 18:22 ` Eric Dumazet
2026-04-10 18:22 ` [PATCH v3 net-next 13/15] net/sched: sch_cake: annotate data-races in cake_dump_stats() Eric Dumazet
2026-04-10 18:22 ` [PATCH v3 net-next 14/15] net/sched: mq: no longer acquire qdisc spinlocks in dump operations Eric Dumazet
13 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
Eric Dumazet
choke_dump_stats() only runs with RTNL held,
reading fields that can be changed in qdisc fast path.
Add READ_ONCE()/WRITE_ONCE() annotations.
Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/sched/sch_choke.c | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index cd0785ad8e74314e6d5c88144ffcf64f286e02dd..73d3e673dc7b16cf2b9ac1d622da280c2ceb064a 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -229,7 +229,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
/* Draw a packet at random from queue and compare flow */
if (choke_match_random(q, skb, &idx)) {
- q->stats.matched++;
+ WRITE_ONCE(q->stats.matched, q->stats.matched + 1);
choke_drop_by_idx(sch, idx, to_free);
goto congestion_drop;
}
@@ -241,11 +241,13 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
qdisc_qstats_overlimit(sch);
if (use_harddrop(q) || !use_ecn(q) ||
!INET_ECN_set_ce(skb)) {
- q->stats.forced_drop++;
+ WRITE_ONCE(q->stats.forced_drop,
+ q->stats.forced_drop + 1);
goto congestion_drop;
}
- q->stats.forced_mark++;
+ WRITE_ONCE(q->stats.forced_mark,
+ q->stats.forced_mark + 1);
} else if (++q->vars.qcount) {
if (red_mark_probability(p, &q->vars, q->vars.qavg)) {
q->vars.qcount = 0;
@@ -253,11 +255,13 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
qdisc_qstats_overlimit(sch);
if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
- q->stats.prob_drop++;
+ WRITE_ONCE(q->stats.prob_drop,
+ q->stats.prob_drop + 1);
goto congestion_drop;
}
- q->stats.prob_mark++;
+ WRITE_ONCE(q->stats.prob_mark,
+ q->stats.prob_mark + 1);
}
} else
q->vars.qR = red_random(p);
@@ -272,7 +276,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
return NET_XMIT_SUCCESS;
}
- q->stats.pdrop++;
+ WRITE_ONCE(q->stats.pdrop, q->stats.pdrop + 1);
return qdisc_drop(skb, sch, to_free);
congestion_drop:
@@ -461,10 +465,12 @@ static int choke_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
{
struct choke_sched_data *q = qdisc_priv(sch);
struct tc_choke_xstats st = {
- .early = q->stats.prob_drop + q->stats.forced_drop,
- .marked = q->stats.prob_mark + q->stats.forced_mark,
- .pdrop = q->stats.pdrop,
- .matched = q->stats.matched,
+ .early = READ_ONCE(q->stats.prob_drop) +
+ READ_ONCE(q->stats.forced_drop),
+ .marked = READ_ONCE(q->stats.prob_mark) +
+ READ_ONCE(q->stats.forced_mark),
+ .pdrop = READ_ONCE(q->stats.pdrop),
+ .matched = READ_ONCE(q->stats.matched),
};
return gnet_stats_copy_app(d, &st, sizeof(st));
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH v3 net-next 13/15] net/sched: sch_cake: annotate data-races in cake_dump_stats()
2026-04-10 18:22 [PATCH v3 net-next 00/15] net/sched: prepare RTNL removal from qdisc dumps Eric Dumazet
` (11 preceding siblings ...)
2026-04-10 18:22 ` [PATCH v3 net-next 12/15] net/sched: sch_choke: annotate data-races in choke_dump_stats() Eric Dumazet
@ 2026-04-10 18:22 ` Eric Dumazet
2026-04-10 18:22 ` [PATCH v3 net-next 14/15] net/sched: mq: no longer acquire qdisc spinlocks in dump operations Eric Dumazet
13 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
Eric Dumazet, Toke Høiland-Jørgensen
cake_dump_stats() and cake_dump_class_stats() run without the qdisc
spinlock being held.
Add READ_ONCE()/WRITE_ONCE() annotations.
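One pattern in this patch is worth spelling out (sketch, using names
from the diff below): cobalt_should_drop() now operates on a local copy
of vars->count and publishes the result once, so a concurrent
cake_dump_class_stats() never observes the intermediate steps of the
control loop:

	u32 count = vars->count;	/* single writer: plain load is fine */

	/* ... the cobalt state machine adjusts 'count' ... */

	WRITE_ONCE(vars->count, count);	/* one visible store */

The dump side pairs this with READ_ONCE(flow->cvars.count).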
Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: "Toke Høiland-Jørgensen" <toke@toke.dk>
---
net/sched/sch_cake.c | 404 ++++++++++++++++++++++++-------------------
1 file changed, 225 insertions(+), 179 deletions(-)
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 32e672820c00a88c6d8fe77a6308405e016525ea..f523f0aa4d830e9d3ec4d43bb123e1dc4f8f289d 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -399,14 +399,14 @@ static void cake_configure_rates(struct Qdisc *sch, u64 rate, bool rate_adjust);
* Here, invsqrt is a fixed point number (< 1.0), 32bit mantissa, aka Q0.32
*/
-static void cobalt_newton_step(struct cobalt_vars *vars)
+static void cobalt_newton_step(struct cobalt_vars *vars, u32 count)
{
u32 invsqrt, invsqrt2;
u64 val;
invsqrt = vars->rec_inv_sqrt;
invsqrt2 = ((u64)invsqrt * invsqrt) >> 32;
- val = (3LL << 32) - ((u64)vars->count * invsqrt2);
+ val = (3LL << 32) - ((u64)count * invsqrt2);
val >>= 2; /* avoid overflow in following multiply */
val = (val * invsqrt) >> (32 - 2 + 1);
@@ -414,12 +414,12 @@ static void cobalt_newton_step(struct cobalt_vars *vars)
vars->rec_inv_sqrt = val;
}
-static void cobalt_invsqrt(struct cobalt_vars *vars)
+static void cobalt_invsqrt(struct cobalt_vars *vars, u32 count)
{
- if (vars->count < REC_INV_SQRT_CACHE)
- vars->rec_inv_sqrt = inv_sqrt_cache[vars->count];
+ if (count < REC_INV_SQRT_CACHE)
+ vars->rec_inv_sqrt = inv_sqrt_cache[count];
else
- cobalt_newton_step(vars);
+ cobalt_newton_step(vars, count);
}
static void cobalt_vars_init(struct cobalt_vars *vars)
@@ -449,16 +449,19 @@ static bool cobalt_queue_full(struct cobalt_vars *vars,
bool up = false;
if (ktime_to_ns(ktime_sub(now, vars->blue_timer)) > p->target) {
- up = !vars->p_drop;
- vars->p_drop += p->p_inc;
- if (vars->p_drop < p->p_inc)
- vars->p_drop = ~0;
- vars->blue_timer = now;
- }
- vars->dropping = true;
- vars->drop_next = now;
+ u32 p_drop = vars->p_drop;
+
+ up = !p_drop;
+ p_drop += p->p_inc;
+ if (p_drop < p->p_inc)
+ p_drop = ~0;
+ WRITE_ONCE(vars->p_drop, p_drop);
+ WRITE_ONCE(vars->blue_timer, now);
+ }
+ WRITE_ONCE(vars->dropping, true);
+ WRITE_ONCE(vars->drop_next, now);
if (!vars->count)
- vars->count = 1;
+ WRITE_ONCE(vars->count, 1);
return up;
}
@@ -474,21 +477,25 @@ static bool cobalt_queue_empty(struct cobalt_vars *vars,
if (vars->p_drop &&
ktime_to_ns(ktime_sub(now, vars->blue_timer)) > p->target) {
- if (vars->p_drop < p->p_dec)
- vars->p_drop = 0;
+ u32 p_drop = vars->p_drop;
+
+ if (p_drop < p->p_dec)
+ p_drop = 0;
else
- vars->p_drop -= p->p_dec;
- vars->blue_timer = now;
- down = !vars->p_drop;
+ p_drop -= p->p_dec;
+ WRITE_ONCE(vars->p_drop, p_drop);
+ WRITE_ONCE(vars->blue_timer, now);
+ down = !p_drop;
}
- vars->dropping = false;
+ WRITE_ONCE(vars->dropping, false);
if (vars->count && ktime_to_ns(ktime_sub(now, vars->drop_next)) >= 0) {
- vars->count--;
- cobalt_invsqrt(vars);
- vars->drop_next = cobalt_control(vars->drop_next,
- p->interval,
- vars->rec_inv_sqrt);
+ WRITE_ONCE(vars->count, vars->count - 1);
+ cobalt_invsqrt(vars, vars->count);
+ WRITE_ONCE(vars->drop_next,
+ cobalt_control(vars->drop_next,
+ p->interval,
+ vars->rec_inv_sqrt));
}
return down;
@@ -507,6 +514,7 @@ static enum qdisc_drop_reason cobalt_should_drop(struct cobalt_vars *vars,
bool next_due, over_target;
ktime_t schedule;
u64 sojourn;
+ u32 count;
/* The 'schedule' variable records, in its sign, whether 'now' is before or
* after 'drop_next'. This allows 'drop_next' to be updated before the next
@@ -528,45 +536,50 @@ static enum qdisc_drop_reason cobalt_should_drop(struct cobalt_vars *vars,
over_target = sojourn > p->target &&
sojourn > p->mtu_time * bulk_flows * 2 &&
sojourn > p->mtu_time * 4;
- next_due = vars->count && ktime_to_ns(schedule) >= 0;
+ count = vars->count;
+ next_due = count && ktime_to_ns(schedule) >= 0;
vars->ecn_marked = false;
if (over_target) {
if (!vars->dropping) {
- vars->dropping = true;
- vars->drop_next = cobalt_control(now,
- p->interval,
- vars->rec_inv_sqrt);
+ WRITE_ONCE(vars->dropping, true);
+ WRITE_ONCE(vars->drop_next,
+ cobalt_control(now,
+ p->interval,
+ vars->rec_inv_sqrt));
}
- if (!vars->count)
- vars->count = 1;
+ if (!count)
+ count = 1;
} else if (vars->dropping) {
- vars->dropping = false;
+ WRITE_ONCE(vars->dropping, false);
}
if (next_due && vars->dropping) {
/* Use ECN mark if possible, otherwise drop */
- if (!(vars->ecn_marked = INET_ECN_set_ce(skb)))
+ vars->ecn_marked = INET_ECN_set_ce(skb);
+ if (!vars->ecn_marked)
reason = QDISC_DROP_CONGESTED;
- vars->count++;
- if (!vars->count)
- vars->count--;
- cobalt_invsqrt(vars);
- vars->drop_next = cobalt_control(vars->drop_next,
- p->interval,
- vars->rec_inv_sqrt);
+ count++;
+ if (!count)
+ count--;
+ cobalt_invsqrt(vars, count);
+ WRITE_ONCE(vars->drop_next,
+ cobalt_control(vars->drop_next,
+ p->interval,
+ vars->rec_inv_sqrt));
schedule = ktime_sub(now, vars->drop_next);
} else {
while (next_due) {
- vars->count--;
- cobalt_invsqrt(vars);
- vars->drop_next = cobalt_control(vars->drop_next,
- p->interval,
- vars->rec_inv_sqrt);
+ count--;
+ cobalt_invsqrt(vars, count);
+ WRITE_ONCE(vars->drop_next,
+ cobalt_control(vars->drop_next,
+ p->interval,
+ vars->rec_inv_sqrt));
schedule = ktime_sub(now, vars->drop_next);
- next_due = vars->count && ktime_to_ns(schedule) >= 0;
+ next_due = count && ktime_to_ns(schedule) >= 0;
}
}
@@ -575,11 +588,12 @@ static enum qdisc_drop_reason cobalt_should_drop(struct cobalt_vars *vars,
get_random_u32() < vars->p_drop)
reason = QDISC_DROP_FLOOD_PROTECTION;
+ WRITE_ONCE(vars->count, count);
/* Overload the drop_next field as an activity timeout */
- if (!vars->count)
- vars->drop_next = ktime_add_ns(now, p->interval);
+ if (!count)
+ WRITE_ONCE(vars->drop_next, ktime_add_ns(now, p->interval));
else if (ktime_to_ns(schedule) > 0 && reason == QDISC_DROP_UNSPEC)
- vars->drop_next = now;
+ WRITE_ONCE(vars->drop_next, now);
return reason;
}
@@ -813,7 +827,7 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
i++, k = (k + 1) % CAKE_SET_WAYS) {
if (q->tags[outer_hash + k] == flow_hash) {
if (i)
- q->way_hits++;
+ WRITE_ONCE(q->way_hits, q->way_hits + 1);
if (!q->flows[outer_hash + k].set) {
/* need to increment host refcnts */
@@ -831,7 +845,7 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
for (i = 0; i < CAKE_SET_WAYS;
i++, k = (k + 1) % CAKE_SET_WAYS) {
if (!q->flows[outer_hash + k].set) {
- q->way_misses++;
+ WRITE_ONCE(q->way_misses, q->way_misses + 1);
allocate_src = cake_dsrc(flow_mode);
allocate_dst = cake_ddst(flow_mode);
goto found;
@@ -841,7 +855,7 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
/* With no empty queues, default to the original
* queue, accept the collision, update the host tags.
*/
- q->way_collisions++;
+ WRITE_ONCE(q->way_collisions, q->way_collisions + 1);
allocate_src = cake_dsrc(flow_mode);
allocate_dst = cake_ddst(flow_mode);
@@ -875,7 +889,8 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
q->flows[reduced_hash].srchost = srchost_idx;
if (q->flows[reduced_hash].set == CAKE_SET_BULK)
- cake_inc_srchost_bulk_flow_count(q, &q->flows[reduced_hash], flow_mode);
+ cake_inc_srchost_bulk_flow_count(q, &q->flows[reduced_hash],
+ flow_mode);
}
if (allocate_dst) {
@@ -899,7 +914,8 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
q->flows[reduced_hash].dsthost = dsthost_idx;
if (q->flows[reduced_hash].set == CAKE_SET_BULK)
- cake_inc_dsthost_bulk_flow_count(q, &q->flows[reduced_hash], flow_mode);
+ cake_inc_dsthost_bulk_flow_count(q, &q->flows[reduced_hash],
+ flow_mode);
}
}
@@ -1379,9 +1395,9 @@ static u32 cake_calc_overhead(struct cake_sched_data *qd, u32 len, u32 off)
len -= off;
if (qd->max_netlen < len)
- qd->max_netlen = len;
+ WRITE_ONCE(qd->max_netlen, len);
if (qd->min_netlen > len)
- qd->min_netlen = len;
+ WRITE_ONCE(qd->min_netlen, len);
len += q->rate_overhead;
@@ -1401,9 +1417,9 @@ static u32 cake_calc_overhead(struct cake_sched_data *qd, u32 len, u32 off)
}
if (qd->max_adjlen < len)
- qd->max_adjlen = len;
+ WRITE_ONCE(qd->max_adjlen, len);
if (qd->min_adjlen > len)
- qd->min_adjlen = len;
+ WRITE_ONCE(qd->min_adjlen, len);
return len;
}
@@ -1416,7 +1432,7 @@ static u32 cake_overhead(struct cake_sched_data *q, const struct sk_buff *skb)
u16 segs = qdisc_pkt_segs(skb);
u32 len = qdisc_pkt_len(skb);
- q->avg_netoff = cake_ewma(q->avg_netoff, off << 16, 8);
+ WRITE_ONCE(q->avg_netoff, cake_ewma(q->avg_netoff, off << 16, 8));
if (segs == 1)
return cake_calc_overhead(q, len, off);
@@ -1590,16 +1606,17 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
}
if (cobalt_queue_full(&flow->cvars, &b->cparams, now))
- b->unresponsive_flow_count++;
+ WRITE_ONCE(b->unresponsive_flow_count,
+ b->unresponsive_flow_count + 1);
len = qdisc_pkt_len(skb);
q->buffer_used -= skb->truesize;
- b->backlogs[idx] -= len;
- b->tin_backlog -= len;
+ WRITE_ONCE(b->backlogs[idx], b->backlogs[idx] - len);
+ WRITE_ONCE(b->tin_backlog, b->tin_backlog - len);
qstats_backlog_sub(sch, len);
- flow->dropped++;
- b->tin_dropped++;
+ WRITE_ONCE(flow->dropped, flow->dropped + 1);
+ WRITE_ONCE(b->tin_dropped, b->tin_dropped + 1);
if (q->config->rate_flags & CAKE_FLAG_INGRESS)
cake_advance_shaper(q, b, skb, now, true);
@@ -1795,7 +1812,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
}
if (unlikely(len > b->max_skblen))
- b->max_skblen = len;
+ WRITE_ONCE(b->max_skblen, len);
if (qdisc_pkt_segs(skb) > 1 && q->config->rate_flags & CAKE_FLAG_SPLIT_GSO) {
struct sk_buff *segs, *nskb;
@@ -1819,13 +1836,13 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
numsegs++;
slen += segs->len;
q->buffer_used += segs->truesize;
- b->packets++;
}
/* stats */
- b->bytes += slen;
- b->backlogs[idx] += slen;
- b->tin_backlog += slen;
+ WRITE_ONCE(b->bytes, b->bytes + slen);
+ WRITE_ONCE(b->packets, b->packets + numsegs);
+ WRITE_ONCE(b->backlogs[idx], b->backlogs[idx] + slen);
+ WRITE_ONCE(b->tin_backlog, b->tin_backlog + slen);
qstats_backlog_add(sch, slen);
q->avg_window_bytes += slen;
@@ -1843,10 +1860,10 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
ack = cake_ack_filter(q, flow);
if (ack) {
- b->ack_drops++;
+ WRITE_ONCE(b->ack_drops, b->ack_drops + 1);
qdisc_qstats_drop(sch);
ack_pkt_len = qdisc_pkt_len(ack);
- b->bytes += ack_pkt_len;
+ WRITE_ONCE(b->bytes, b->bytes + ack_pkt_len);
q->buffer_used += skb->truesize - ack->truesize;
if (q->config->rate_flags & CAKE_FLAG_INGRESS)
cake_advance_shaper(q, b, ack, now, true);
@@ -1859,10 +1876,10 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
}
/* stats */
- b->packets++;
- b->bytes += len - ack_pkt_len;
- b->backlogs[idx] += len - ack_pkt_len;
- b->tin_backlog += len - ack_pkt_len;
+ WRITE_ONCE(b->packets, b->packets + 1);
+ WRITE_ONCE(b->bytes, b->bytes + len - ack_pkt_len);
+ WRITE_ONCE(b->backlogs[idx], b->backlogs[idx] + len - ack_pkt_len);
+ WRITE_ONCE(b->tin_backlog, b->tin_backlog + len - ack_pkt_len);
qstats_backlog_add(sch, len - ack_pkt_len);
q->avg_window_bytes += len - ack_pkt_len;
}
@@ -1894,9 +1911,9 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
u64 b = q->avg_window_bytes * (u64)NSEC_PER_SEC;
b = div64_u64(b, window_interval);
- q->avg_peak_bandwidth =
- cake_ewma(q->avg_peak_bandwidth, b,
- b > q->avg_peak_bandwidth ? 2 : 8);
+ WRITE_ONCE(q->avg_peak_bandwidth,
+ cake_ewma(q->avg_peak_bandwidth, b,
+ b > q->avg_peak_bandwidth ? 2 : 8));
q->avg_window_bytes = 0;
q->avg_window_begin = now;
@@ -1917,27 +1934,30 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
if (!flow->set) {
list_add_tail(&flow->flowchain, &b->new_flows);
} else {
- b->decaying_flow_count--;
+ WRITE_ONCE(b->decaying_flow_count,
+ b->decaying_flow_count - 1);
list_move_tail(&flow->flowchain, &b->new_flows);
}
flow->set = CAKE_SET_SPARSE;
- b->sparse_flow_count++;
+ WRITE_ONCE(b->sparse_flow_count,
+ b->sparse_flow_count + 1);
- flow->deficit = cake_get_flow_quantum(b, flow, q->config->flow_mode);
+ WRITE_ONCE(flow->deficit,
+ cake_get_flow_quantum(b, flow, q->config->flow_mode));
} else if (flow->set == CAKE_SET_SPARSE_WAIT) {
/* this flow was empty, accounted as a sparse flow, but actually
* in the bulk rotation.
*/
flow->set = CAKE_SET_BULK;
- b->sparse_flow_count--;
- b->bulk_flow_count++;
+ WRITE_ONCE(b->sparse_flow_count, b->sparse_flow_count - 1);
+ WRITE_ONCE(b->bulk_flow_count, b->bulk_flow_count + 1);
cake_inc_srchost_bulk_flow_count(b, flow, q->config->flow_mode);
cake_inc_dsthost_bulk_flow_count(b, flow, q->config->flow_mode);
}
if (q->buffer_used > q->buffer_max_used)
- q->buffer_max_used = q->buffer_used;
+ WRITE_ONCE(q->buffer_max_used, q->buffer_used);
if (q->buffer_used <= q->buffer_limit)
return NET_XMIT_SUCCESS;
@@ -1976,8 +1996,8 @@ static struct sk_buff *cake_dequeue_one(struct Qdisc *sch)
if (flow->head) {
skb = dequeue_head(flow);
len = qdisc_pkt_len(skb);
- b->backlogs[q->cur_flow] -= len;
- b->tin_backlog -= len;
+ WRITE_ONCE(b->backlogs[q->cur_flow], b->backlogs[q->cur_flow] - len);
+ WRITE_ONCE(b->tin_backlog, b->tin_backlog - len);
qstats_backlog_sub(sch, len);
q->buffer_used -= skb->truesize;
qdisc_qlen_dec(sch);
@@ -2042,7 +2062,7 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
cake_configure_rates(sch, new_rate, true);
q->last_checked_active = now;
- q->active_queues = num_active_qs;
+ WRITE_ONCE(q->active_queues, num_active_qs);
}
begin:
@@ -2149,8 +2169,10 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
*/
if (flow->set == CAKE_SET_SPARSE) {
if (flow->head) {
- b->sparse_flow_count--;
- b->bulk_flow_count++;
+ WRITE_ONCE(b->sparse_flow_count,
+ b->sparse_flow_count - 1);
+ WRITE_ONCE(b->bulk_flow_count,
+ b->bulk_flow_count + 1);
cake_inc_srchost_bulk_flow_count(b, flow, q->config->flow_mode);
cake_inc_dsthost_bulk_flow_count(b, flow, q->config->flow_mode);
@@ -2165,7 +2187,8 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
}
}
- flow->deficit += cake_get_flow_quantum(b, flow, q->config->flow_mode);
+ WRITE_ONCE(flow->deficit,
+ flow->deficit + cake_get_flow_quantum(b, flow, q->config->flow_mode));
list_move_tail(&flow->flowchain, &b->old_flows);
goto retry;
@@ -2177,7 +2200,8 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
if (!skb) {
/* this queue was actually empty */
if (cobalt_queue_empty(&flow->cvars, &b->cparams, now))
- b->unresponsive_flow_count--;
+ WRITE_ONCE(b->unresponsive_flow_count,
+ b->unresponsive_flow_count - 1);
if (flow->cvars.p_drop || flow->cvars.count ||
ktime_before(now, flow->cvars.drop_next)) {
@@ -2187,16 +2211,22 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
list_move_tail(&flow->flowchain,
&b->decaying_flows);
if (flow->set == CAKE_SET_BULK) {
- b->bulk_flow_count--;
+ WRITE_ONCE(b->bulk_flow_count,
+ b->bulk_flow_count - 1);
- cake_dec_srchost_bulk_flow_count(b, flow, q->config->flow_mode);
- cake_dec_dsthost_bulk_flow_count(b, flow, q->config->flow_mode);
+ cake_dec_srchost_bulk_flow_count(b, flow,
+ q->config->flow_mode);
+ cake_dec_dsthost_bulk_flow_count(b, flow,
+ q->config->flow_mode);
- b->decaying_flow_count++;
+ WRITE_ONCE(b->decaying_flow_count,
+ b->decaying_flow_count + 1);
} else if (flow->set == CAKE_SET_SPARSE ||
flow->set == CAKE_SET_SPARSE_WAIT) {
- b->sparse_flow_count--;
- b->decaying_flow_count++;
+ WRITE_ONCE(b->sparse_flow_count,
+ b->sparse_flow_count - 1);
+ WRITE_ONCE(b->decaying_flow_count,
+ b->decaying_flow_count + 1);
}
flow->set = CAKE_SET_DECAYING;
} else {
@@ -2204,14 +2234,20 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
list_del_init(&flow->flowchain);
if (flow->set == CAKE_SET_SPARSE ||
flow->set == CAKE_SET_SPARSE_WAIT)
- b->sparse_flow_count--;
+ WRITE_ONCE(b->sparse_flow_count,
+ b->sparse_flow_count - 1);
else if (flow->set == CAKE_SET_BULK) {
- b->bulk_flow_count--;
+ WRITE_ONCE(b->bulk_flow_count,
+ b->bulk_flow_count - 1);
- cake_dec_srchost_bulk_flow_count(b, flow, q->config->flow_mode);
- cake_dec_dsthost_bulk_flow_count(b, flow, q->config->flow_mode);
- } else
- b->decaying_flow_count--;
+ cake_dec_srchost_bulk_flow_count(b, flow,
+ q->config->flow_mode);
+ cake_dec_dsthost_bulk_flow_count(b, flow,
+ q->config->flow_mode);
+ } else {
+ WRITE_ONCE(b->decaying_flow_count,
+ b->decaying_flow_count - 1);
+ }
flow->set = CAKE_SET_NONE;
}
@@ -2230,11 +2266,11 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
if (q->config->rate_flags & CAKE_FLAG_INGRESS) {
len = cake_advance_shaper(q, b, skb,
now, true);
- flow->deficit -= len;
+ WRITE_ONCE(flow->deficit, flow->deficit - len);
b->tin_deficit -= len;
}
- flow->dropped++;
- b->tin_dropped++;
+ WRITE_ONCE(flow->dropped, flow->dropped + 1);
+ WRITE_ONCE(b->tin_dropped, b->tin_dropped + 1);
qdisc_tree_reduce_backlog(sch, 1, qdisc_pkt_len(skb));
qdisc_qstats_drop(sch);
qdisc_dequeue_drop(sch, skb, reason);
@@ -2242,20 +2278,22 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
goto retry;
}
- b->tin_ecn_mark += !!flow->cvars.ecn_marked;
+ WRITE_ONCE(b->tin_ecn_mark, b->tin_ecn_mark + !!flow->cvars.ecn_marked);
qdisc_bstats_update(sch, skb);
WRITE_ONCE(q->last_active, now);
/* collect delay stats */
delay = ktime_to_ns(ktime_sub(now, cobalt_get_enqueue_time(skb)));
- b->avge_delay = cake_ewma(b->avge_delay, delay, 8);
- b->peak_delay = cake_ewma(b->peak_delay, delay,
- delay > b->peak_delay ? 2 : 8);
- b->base_delay = cake_ewma(b->base_delay, delay,
- delay < b->base_delay ? 2 : 8);
+ WRITE_ONCE(b->avge_delay, cake_ewma(b->avge_delay, delay, 8));
+ WRITE_ONCE(b->peak_delay,
+ cake_ewma(b->peak_delay, delay,
+ delay > b->peak_delay ? 2 : 8));
+ WRITE_ONCE(b->base_delay,
+ cake_ewma(b->base_delay, delay,
+ delay < b->base_delay ? 2 : 8));
len = cake_advance_shaper(q, b, skb, now, false);
- flow->deficit -= len;
+ WRITE_ONCE(flow->deficit, flow->deficit - len);
b->tin_deficit -= len;
if (ktime_after(q->time_next_packet, now) && sch->q.qlen) {
@@ -2329,9 +2367,8 @@ static void cake_set_rate(struct cake_tin_data *b, u64 rate, u32 mtu,
u8 rate_shft = 0;
u64 rate_ns = 0;
- b->flow_quantum = 1514;
if (rate) {
- b->flow_quantum = max(min(rate >> 12, 1514ULL), 300ULL);
+ WRITE_ONCE(b->flow_quantum, max(min(rate >> 12, 1514ULL), 300ULL));
rate_shft = 34;
rate_ns = ((u64)NSEC_PER_SEC) << rate_shft;
rate_ns = div64_u64(rate_ns, max(MIN_RATE, rate));
@@ -2339,9 +2376,11 @@ static void cake_set_rate(struct cake_tin_data *b, u64 rate, u32 mtu,
rate_ns >>= 1;
rate_shft--;
}
- } /* else unlimited, ie. zero delay */
-
- b->tin_rate_bps = rate;
+ } else {
+ /* else unlimited, ie. zero delay */
+ WRITE_ONCE(b->flow_quantum, 1514);
+ }
+ WRITE_ONCE(b->tin_rate_bps, rate);
b->tin_rate_ns = rate_ns;
b->tin_rate_shft = rate_shft;
@@ -2350,10 +2389,11 @@ static void cake_set_rate(struct cake_tin_data *b, u64 rate, u32 mtu,
byte_target_ns = (byte_target * rate_ns) >> rate_shft;
- b->cparams.target = max((byte_target_ns * 3) / 2, target_ns);
- b->cparams.interval = max(rtt_est_ns +
- b->cparams.target - target_ns,
- b->cparams.target * 2);
+ WRITE_ONCE(b->cparams.target,
+ max((byte_target_ns * 3) / 2, target_ns));
+ WRITE_ONCE(b->cparams.interval,
+ max(rtt_est_ns + b->cparams.target - target_ns,
+ b->cparams.target * 2));
b->cparams.mtu_time = byte_target_ns;
b->cparams.p_inc = 1 << 24; /* 1/256 */
b->cparams.p_dec = 1 << 20; /* 1/4096 */
@@ -2611,25 +2651,27 @@ static void cake_reconfigure(struct Qdisc *sch)
{
struct cake_sched_data *qd = qdisc_priv(sch);
struct cake_sched_config *q = qd->config;
+ u32 buffer_limit;
cake_configure_rates(sch, qd->config->rate_bps, false);
if (q->buffer_config_limit) {
- qd->buffer_limit = q->buffer_config_limit;
+ buffer_limit = q->buffer_config_limit;
} else if (q->rate_bps) {
u64 t = q->rate_bps * q->interval;
do_div(t, USEC_PER_SEC / 4);
- qd->buffer_limit = max_t(u32, t, 4U << 20);
+ buffer_limit = max_t(u32, t, 4U << 20);
} else {
- qd->buffer_limit = ~0;
+ buffer_limit = ~0;
}
sch->flags &= ~TCQ_F_CAN_BYPASS;
- qd->buffer_limit = min(qd->buffer_limit,
- max(sch->limit * psched_mtu(qdisc_dev(sch)),
- q->buffer_config_limit));
+ WRITE_ONCE(qd->buffer_limit,
+ min(buffer_limit,
+ max(sch->limit * psched_mtu(qdisc_dev(sch)),
+ q->buffer_config_limit)));
}
static int cake_config_change(struct cake_sched_config *q, struct nlattr *opt,
@@ -2774,10 +2816,10 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt,
return ret;
if (overhead_changed) {
- qd->max_netlen = 0;
- qd->max_adjlen = 0;
- qd->min_netlen = ~0;
- qd->min_adjlen = ~0;
+ WRITE_ONCE(qd->max_netlen, 0);
+ WRITE_ONCE(qd->max_adjlen, 0);
+ WRITE_ONCE(qd->min_netlen, ~0);
+ WRITE_ONCE(qd->min_adjlen, ~0);
}
if (qd->tins) {
@@ -2995,15 +3037,15 @@ static int cake_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
goto nla_put_failure; \
} while (0)
- PUT_STAT_U64(CAPACITY_ESTIMATE64, q->avg_peak_bandwidth);
- PUT_STAT_U32(MEMORY_LIMIT, q->buffer_limit);
- PUT_STAT_U32(MEMORY_USED, q->buffer_max_used);
- PUT_STAT_U32(AVG_NETOFF, ((q->avg_netoff + 0x8000) >> 16));
- PUT_STAT_U32(MAX_NETLEN, q->max_netlen);
- PUT_STAT_U32(MAX_ADJLEN, q->max_adjlen);
- PUT_STAT_U32(MIN_NETLEN, q->min_netlen);
- PUT_STAT_U32(MIN_ADJLEN, q->min_adjlen);
- PUT_STAT_U32(ACTIVE_QUEUES, q->active_queues);
+ PUT_STAT_U64(CAPACITY_ESTIMATE64, READ_ONCE(q->avg_peak_bandwidth));
+ PUT_STAT_U32(MEMORY_LIMIT, READ_ONCE(q->buffer_limit));
+ PUT_STAT_U32(MEMORY_USED, READ_ONCE(q->buffer_max_used));
+ PUT_STAT_U32(AVG_NETOFF, ((READ_ONCE(q->avg_netoff) + 0x8000) >> 16));
+ PUT_STAT_U32(MAX_NETLEN, READ_ONCE(q->max_netlen));
+ PUT_STAT_U32(MAX_ADJLEN, READ_ONCE(q->max_adjlen));
+ PUT_STAT_U32(MIN_NETLEN, READ_ONCE(q->min_netlen));
+ PUT_STAT_U32(MIN_ADJLEN, READ_ONCE(q->min_adjlen));
+ PUT_STAT_U32(ACTIVE_QUEUES, READ_ONCE(q->active_queues));
#undef PUT_STAT_U32
#undef PUT_STAT_U64
@@ -3029,38 +3071,38 @@ static int cake_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
if (!ts)
goto nla_put_failure;
- PUT_TSTAT_U64(THRESHOLD_RATE64, b->tin_rate_bps);
- PUT_TSTAT_U64(SENT_BYTES64, b->bytes);
- PUT_TSTAT_U32(BACKLOG_BYTES, b->tin_backlog);
+ PUT_TSTAT_U64(THRESHOLD_RATE64, READ_ONCE(b->tin_rate_bps));
+ PUT_TSTAT_U64(SENT_BYTES64, READ_ONCE(b->bytes));
+ PUT_TSTAT_U32(BACKLOG_BYTES, READ_ONCE(b->tin_backlog));
PUT_TSTAT_U32(TARGET_US,
- ktime_to_us(ns_to_ktime(b->cparams.target)));
+ ktime_to_us(ns_to_ktime(READ_ONCE(b->cparams.target))));
PUT_TSTAT_U32(INTERVAL_US,
- ktime_to_us(ns_to_ktime(b->cparams.interval)));
+ ktime_to_us(ns_to_ktime(READ_ONCE(b->cparams.interval))));
- PUT_TSTAT_U32(SENT_PACKETS, b->packets);
- PUT_TSTAT_U32(DROPPED_PACKETS, b->tin_dropped);
- PUT_TSTAT_U32(ECN_MARKED_PACKETS, b->tin_ecn_mark);
- PUT_TSTAT_U32(ACKS_DROPPED_PACKETS, b->ack_drops);
+ PUT_TSTAT_U32(SENT_PACKETS, READ_ONCE(b->packets));
+ PUT_TSTAT_U32(DROPPED_PACKETS, READ_ONCE(b->tin_dropped));
+ PUT_TSTAT_U32(ECN_MARKED_PACKETS, READ_ONCE(b->tin_ecn_mark));
+ PUT_TSTAT_U32(ACKS_DROPPED_PACKETS, READ_ONCE(b->ack_drops));
PUT_TSTAT_U32(PEAK_DELAY_US,
- ktime_to_us(ns_to_ktime(b->peak_delay)));
+ ktime_to_us(ns_to_ktime(READ_ONCE(b->peak_delay))));
PUT_TSTAT_U32(AVG_DELAY_US,
- ktime_to_us(ns_to_ktime(b->avge_delay)));
+ ktime_to_us(ns_to_ktime(READ_ONCE(b->avge_delay))));
PUT_TSTAT_U32(BASE_DELAY_US,
- ktime_to_us(ns_to_ktime(b->base_delay)));
+ ktime_to_us(ns_to_ktime(READ_ONCE(b->base_delay))));
- PUT_TSTAT_U32(WAY_INDIRECT_HITS, b->way_hits);
- PUT_TSTAT_U32(WAY_MISSES, b->way_misses);
- PUT_TSTAT_U32(WAY_COLLISIONS, b->way_collisions);
+ PUT_TSTAT_U32(WAY_INDIRECT_HITS, READ_ONCE(b->way_hits));
+ PUT_TSTAT_U32(WAY_MISSES, READ_ONCE(b->way_misses));
+ PUT_TSTAT_U32(WAY_COLLISIONS, READ_ONCE(b->way_collisions));
- PUT_TSTAT_U32(SPARSE_FLOWS, b->sparse_flow_count +
- b->decaying_flow_count);
- PUT_TSTAT_U32(BULK_FLOWS, b->bulk_flow_count);
- PUT_TSTAT_U32(UNRESPONSIVE_FLOWS, b->unresponsive_flow_count);
- PUT_TSTAT_U32(MAX_SKBLEN, b->max_skblen);
+ PUT_TSTAT_U32(SPARSE_FLOWS, READ_ONCE(b->sparse_flow_count) +
+ READ_ONCE(b->decaying_flow_count));
+ PUT_TSTAT_U32(BULK_FLOWS, READ_ONCE(b->bulk_flow_count));
+ PUT_TSTAT_U32(UNRESPONSIVE_FLOWS, READ_ONCE(b->unresponsive_flow_count));
+ PUT_TSTAT_U32(MAX_SKBLEN, READ_ONCE(b->max_skblen));
- PUT_TSTAT_U32(FLOW_QUANTUM, b->flow_quantum);
+ PUT_TSTAT_U32(FLOW_QUANTUM, READ_ONCE(b->flow_quantum));
nla_nest_end(d->skb, ts);
}
@@ -3128,7 +3170,7 @@ static int cake_dump_class_stats(struct Qdisc *sch, unsigned long cl,
flow = &b->flows[idx % CAKE_QUEUES];
- if (flow->head) {
+ if (READ_ONCE(flow->head)) {
sch_tree_lock(sch);
skb = flow->head;
while (skb) {
@@ -3137,13 +3179,15 @@ static int cake_dump_class_stats(struct Qdisc *sch, unsigned long cl,
}
sch_tree_unlock(sch);
}
- qs.backlog = b->backlogs[idx % CAKE_QUEUES];
- qs.drops = flow->dropped;
+ qs.backlog = READ_ONCE(b->backlogs[idx % CAKE_QUEUES]);
+ qs.drops = READ_ONCE(flow->dropped);
}
if (gnet_stats_copy_queue(d, NULL, &qs, qs.qlen) < 0)
return -1;
if (flow) {
ktime_t now = ktime_get();
+ bool dropping;
+ u32 p_drop;
stats = nla_nest_start_noflag(d->skb, TCA_STATS_APP);
if (!stats)
@@ -3158,21 +3202,23 @@ static int cake_dump_class_stats(struct Qdisc *sch, unsigned long cl,
goto nla_put_failure; \
} while (0)
- PUT_STAT_S32(DEFICIT, flow->deficit);
- PUT_STAT_U32(DROPPING, flow->cvars.dropping);
- PUT_STAT_U32(COBALT_COUNT, flow->cvars.count);
- PUT_STAT_U32(P_DROP, flow->cvars.p_drop);
- if (flow->cvars.p_drop) {
+ PUT_STAT_S32(DEFICIT, READ_ONCE(flow->deficit));
+ dropping = READ_ONCE(flow->cvars.dropping);
+ PUT_STAT_U32(DROPPING, dropping);
+ PUT_STAT_U32(COBALT_COUNT, READ_ONCE(flow->cvars.count));
+ p_drop = READ_ONCE(flow->cvars.p_drop);
+ PUT_STAT_U32(P_DROP, p_drop);
+ if (p_drop) {
PUT_STAT_S32(BLUE_TIMER_US,
ktime_to_us(
ktime_sub(now,
- flow->cvars.blue_timer)));
+ READ_ONCE(flow->cvars.blue_timer))));
}
- if (flow->cvars.dropping) {
+ if (dropping) {
PUT_STAT_S32(DROP_NEXT_US,
ktime_to_us(
ktime_sub(now,
- flow->cvars.drop_next)));
+ READ_ONCE(flow->cvars.drop_next))));
}
if (nla_nest_end(d->skb, stats) < 0)
@@ -3298,10 +3344,10 @@ static int cake_mq_change(struct Qdisc *sch, struct nlattr *opt,
struct cake_sched_data *qd = qdisc_priv(chld);
if (overhead_changed) {
- qd->max_netlen = 0;
- qd->max_adjlen = 0;
- qd->min_netlen = ~0;
- qd->min_adjlen = ~0;
+ WRITE_ONCE(qd->max_netlen, 0);
+ WRITE_ONCE(qd->max_adjlen, 0);
+ WRITE_ONCE(qd->min_netlen, ~0);
+ WRITE_ONCE(qd->min_adjlen, ~0);
}
if (qd->tins) {
--
2.53.0.1213.gd9a14994de-goog
* [PATCH v3 net-next 14/15] net/sched: mq: no longer acquire qdisc spinlocks in dump operations
2026-04-10 18:22 [PATCH v3 net-next 00/15] net/sched: prepare RTNL removal from qdisc dumps Eric Dumazet
` (12 preceding siblings ...)
2026-04-10 18:22 ` [PATCH v3 net-next 13/15] net/sched: sch_cake: annotate data-races in cake_dump_stats() Eric Dumazet
@ 2026-04-10 18:22 ` Eric Dumazet
13 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2026-04-10 18:22 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, netdev, eric.dumazet,
Eric Dumazet
Prepare mq_dump_common(), mqprio_dump() and mqprio_dump_class_stats()
for RTNL avoidance.
Use local (on-stack) variables instead of assuming sch->bstats and
sch->qstats can be used as scratch space when folding stats from the
children.
This means the child qdisc spinlocks no longer need to be acquired.
Add a qdisc_qlen_lockless() helper and change the gnet_stats_add_basic()
prototype.
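For readers new to this pattern, here is a minimal standalone sketch
(not kernel code: READ_ONCE()/WRITE_ONCE() are reduced to volatile
accesses, and fold_and_publish() is an illustrative name rather than an
API added by this patch) of the fold-then-publish approach the diff
below applies: sum the per-child counters locklessly into on-stack
totals, then publish each total with a single store so a concurrent
dump never observes a half-updated aggregate.
/* Build with: gcc -O2 -o fold fold.c */
#include <stdio.h>
#define READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))
struct counters { unsigned long bytes, packets, qlen; };
/* Fold the children into on-stack totals, then publish to the root. */
static void fold_and_publish(struct counters *root,
			     const struct counters *child, int n)
{
	unsigned long bytes = 0, packets = 0, qlen = 0;
	int i;
	for (i = 0; i < n; i++) {
		/* Lockless reads: tolerate concurrent writers, but
		 * never let the compiler re-read a field twice.
		 */
		bytes   += READ_ONCE(child[i].bytes);
		packets += READ_ONCE(child[i].packets);
		qlen    += READ_ONCE(child[i].qlen);
	}
	/* Publish each aggregate with a single store. */
	WRITE_ONCE(root->bytes, bytes);
	WRITE_ONCE(root->packets, packets);
	WRITE_ONCE(root->qlen, qlen);
}
int main(void)
{
	const struct counters child[2] = { { 100, 1, 1 }, { 200, 2, 2 } };
	struct counters root;
	fold_and_publish(&root, child, 2);
	printf("bytes=%lu packets=%lu qlen=%lu\n",
	       root.bytes, root.packets, root.qlen);
	return 0;
}
Note that in the actual patch the 64-bit byte/packet totals are
published via _bstats_set() under the root qdisc lock, because the
u64_stats writer side must stay serialized on 32-bit hosts; only the
plain u32 queue stats are published with WRITE_ONCE() as in the sketch.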
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/gen_stats.h | 9 +++--
include/net/sch_generic.h | 14 ++++++++
net/core/gen_estimator.c | 24 ++++++-------
net/core/gen_stats.c | 17 +++++-----
net/sched/sch_mq.c | 33 +++++++++++-------
net/sched/sch_mqprio.c | 71 +++++++++++++++++++--------------------
6 files changed, 95 insertions(+), 73 deletions(-)
diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index 7aa2b8e1fb298c4f994a745b114fc4da785ddf4b..5484b67298e3fe94fe84f0e929799362d21499df 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -21,6 +21,11 @@ struct gnet_stats_basic_sync {
struct u64_stats_sync syncp;
} __aligned(2 * sizeof(u64));
+struct gnet_stats {
+ u64 bytes;
+ u64 packets;
+};
+
struct net_rate_estimator;
struct gnet_dump {
@@ -49,9 +54,9 @@ int gnet_stats_start_copy_compat(struct sk_buff *skb, int type,
int gnet_stats_copy_basic(struct gnet_dump *d,
struct gnet_stats_basic_sync __percpu *cpu,
struct gnet_stats_basic_sync *b, bool running);
-void gnet_stats_add_basic(struct gnet_stats_basic_sync *bstats,
+void gnet_stats_add_basic(struct gnet_stats *bstats,
struct gnet_stats_basic_sync __percpu *cpu,
- struct gnet_stats_basic_sync *b, bool running);
+ const struct gnet_stats_basic_sync *b, bool running);
int gnet_stats_copy_basic_hw(struct gnet_dump *d,
struct gnet_stats_basic_sync __percpu *cpu,
struct gnet_stats_basic_sync *b, bool running);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index b0564a39caf4471619b74179a06a0e41e3765d94..92683be33527bb0a5147d095ba08f5f8494933dd 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -542,6 +542,11 @@ static inline int qdisc_qlen(const struct Qdisc *q)
return q->q.qlen;
}
+static inline int qdisc_qlen_lockless(const struct Qdisc *q)
+{
+ return READ_ONCE(q->q.qlen);
+}
+
static inline void qdisc_qlen_inc(struct Qdisc *q)
{
WRITE_ONCE(q->q.qlen, q->q.qlen + 1);
@@ -947,6 +952,15 @@ static inline void _bstats_update(struct gnet_stats_basic_sync *bstats,
u64_stats_update_end(&bstats->syncp);
}
+static inline void _bstats_set(struct gnet_stats_basic_sync *bstats,
+ u64 bytes, u64 packets)
+{
+ u64_stats_update_begin(&bstats->syncp);
+ u64_stats_set(&bstats->bytes, bytes);
+ u64_stats_set(&bstats->packets, packets);
+ u64_stats_update_end(&bstats->syncp);
+}
+
static inline void bstats_update(struct gnet_stats_basic_sync *bstats,
const struct sk_buff *skb)
{
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index c34e58c6c3e666743e72978f9a78cf7f95a360c3..40990aee45590f2c56c070b0d28f856fc82d1f55 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -60,9 +60,10 @@ struct net_rate_estimator {
};
static void est_fetch_counters(struct net_rate_estimator *e,
- struct gnet_stats_basic_sync *b)
+ struct gnet_stats *b)
{
- gnet_stats_basic_sync_init(b);
+ b->packets = 0;
+ b->bytes = 0;
if (e->stats_lock)
spin_lock(e->stats_lock);
@@ -76,18 +77,15 @@ static void est_fetch_counters(struct net_rate_estimator *e,
static void est_timer(struct timer_list *t)
{
struct net_rate_estimator *est = timer_container_of(est, t, timer);
- struct gnet_stats_basic_sync b;
- u64 b_bytes, b_packets;
+ struct gnet_stats b;
u64 rate, brate;
est_fetch_counters(est, &b);
- b_bytes = u64_stats_read(&b.bytes);
- b_packets = u64_stats_read(&b.packets);
- brate = (b_bytes - est->last_bytes) << (10 - est->intvl_log);
+ brate = (b.bytes - est->last_bytes) << (10 - est->intvl_log);
brate = (brate >> est->ewma_log) - (est->avbps >> est->ewma_log);
- rate = (b_packets - est->last_packets) << (10 - est->intvl_log);
+ rate = (b.packets - est->last_packets) << (10 - est->intvl_log);
rate = (rate >> est->ewma_log) - (est->avpps >> est->ewma_log);
preempt_disable_nested();
@@ -97,8 +95,8 @@ static void est_timer(struct timer_list *t)
write_seqcount_end(&est->seq);
preempt_enable_nested();
- est->last_bytes = b_bytes;
- est->last_packets = b_packets;
+ est->last_bytes = b.bytes;
+ est->last_packets = b.packets;
est->next_jiffies += ((HZ/4) << est->intvl_log);
@@ -138,7 +136,7 @@ int gen_new_estimator(struct gnet_stats_basic_sync *bstats,
{
struct gnet_estimator *parm = nla_data(opt);
struct net_rate_estimator *old, *est;
- struct gnet_stats_basic_sync b;
+ struct gnet_stats b;
int intvl_log;
if (nla_len(opt) < sizeof(*parm))
@@ -172,8 +170,8 @@ int gen_new_estimator(struct gnet_stats_basic_sync *bstats,
est_fetch_counters(est, &b);
if (lock)
local_bh_enable();
- est->last_bytes = u64_stats_read(&b.bytes);
- est->last_packets = u64_stats_read(&b.packets);
+ est->last_bytes = b.bytes;
+ est->last_packets = b.packets;
if (lock)
spin_lock_bh(lock);
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 1a2380e74272de8eaf3d4ef453e56105a31e9edf..14ee7a4e3709ad5c64a158d3c8d1177ada3a32b0 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -123,10 +123,9 @@ void gnet_stats_basic_sync_init(struct gnet_stats_basic_sync *b)
}
EXPORT_SYMBOL(gnet_stats_basic_sync_init);
-static void gnet_stats_add_basic_cpu(struct gnet_stats_basic_sync *bstats,
+static void gnet_stats_add_basic_cpu(struct gnet_stats *bstats,
struct gnet_stats_basic_sync __percpu *cpu)
{
- u64 t_bytes = 0, t_packets = 0;
int i;
for_each_possible_cpu(i) {
@@ -140,19 +139,18 @@ static void gnet_stats_add_basic_cpu(struct gnet_stats_basic_sync *bstats,
packets = u64_stats_read(&bcpu->packets);
} while (u64_stats_fetch_retry(&bcpu->syncp, start));
- t_bytes += bytes;
- t_packets += packets;
+ bstats->bytes += bytes;
+ bstats->packets += packets;
}
- _bstats_update(bstats, t_bytes, t_packets);
}
-void gnet_stats_add_basic(struct gnet_stats_basic_sync *bstats,
+void gnet_stats_add_basic(struct gnet_stats *bstats,
struct gnet_stats_basic_sync __percpu *cpu,
- struct gnet_stats_basic_sync *b, bool running)
+ const struct gnet_stats_basic_sync *b, bool running)
{
unsigned int start;
- u64 bytes = 0;
u64 packets = 0;
+ u64 bytes = 0;
WARN_ON_ONCE((cpu || running) && in_hardirq());
@@ -167,7 +165,8 @@ void gnet_stats_add_basic(struct gnet_stats_basic_sync *bstats,
packets = u64_stats_read(&b->packets);
} while (running && u64_stats_fetch_retry(&b->syncp, start));
- _bstats_update(bstats, bytes, packets);
+ bstats->bytes += bytes;
+ bstats->packets += packets;
}
EXPORT_SYMBOL(gnet_stats_add_basic);
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index ec8c91d3fde04e59daec2aecdb14d6bf50715e15..0d83e69f2f679988d56920c16acb659d2d1ba636 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -143,30 +143,39 @@ EXPORT_SYMBOL_NS_GPL(mq_attach, "NET_SCHED_INTERNAL");
void mq_dump_common(struct Qdisc *sch, struct sk_buff *skb)
{
struct net_device *dev = qdisc_dev(sch);
+ struct gnet_stats_queue qstats = { 0 };
+ struct gnet_stats bstats = { 0 };
+ const struct Qdisc *qdisc;
unsigned int qlen = 0;
- struct Qdisc *qdisc;
unsigned int ntx;
- gnet_stats_basic_sync_init(&sch->bstats);
- memset(&sch->qstats, 0, sizeof(sch->qstats));
-
/* MQ supports lockless qdiscs. However, statistics accounting needs
* to account for all, none, or a mix of locked and unlocked child
* qdiscs. Percpu stats are added to counters in-band and locking
* qdisc totals are added at end.
*/
+ rcu_read_lock();
for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
- qdisc = rtnl_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
- spin_lock_bh(qdisc_lock(qdisc));
+ qdisc = rcu_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
- gnet_stats_add_basic(&sch->bstats, qdisc->cpu_bstats,
- &qdisc->bstats, false);
- gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
+ gnet_stats_add_basic(&bstats, qdisc->cpu_bstats,
+ &qdisc->bstats, true);
+ gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
&qdisc->qstats);
- qlen += qdisc_qlen(qdisc);
-
- spin_unlock_bh(qdisc_lock(qdisc));
+ qlen += qdisc_qlen_lockless(qdisc);
}
+ rcu_read_unlock();
+
+ spin_lock_bh(qdisc_lock(sch));
+ _bstats_set(&sch->bstats, bstats.bytes, bstats.packets);
+ spin_unlock_bh(qdisc_lock(sch));
+
+ WRITE_ONCE(sch->qstats.qlen, qstats.qlen);
+ WRITE_ONCE(sch->qstats.backlog, qstats.backlog);
+ WRITE_ONCE(sch->qstats.drops, qstats.drops);
+ WRITE_ONCE(sch->qstats.requeues, qstats.requeues);
+ WRITE_ONCE(sch->qstats.overlimits, qstats.overlimits);
+
WRITE_ONCE(sch->q.qlen, qlen);
}
EXPORT_SYMBOL_NS_GPL(mq_dump_common, "NET_SCHED_INTERNAL");
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 91a92992cd24ab6c30bf7db2288c08cd493c7bc3..0f58b3a3e99a100df929de110fe0bda1a44cc7d6 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -554,32 +554,40 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
struct net_device *dev = qdisc_dev(sch);
struct mqprio_sched *priv = qdisc_priv(sch);
struct nlattr *nla = (struct nlattr *)skb_tail_pointer(skb);
+ struct gnet_stats_queue qstats = { 0 };
struct tc_mqprio_qopt opt = { 0 };
+ struct gnet_stats bstats = { 0 };
+ const struct Qdisc *qdisc;
unsigned int qlen = 0;
- struct Qdisc *qdisc;
unsigned int ntx;
- qlen = 0;
- gnet_stats_basic_sync_init(&sch->bstats);
- memset(&sch->qstats, 0, sizeof(sch->qstats));
-
/* MQ supports lockless qdiscs. However, statistics accounting needs
* to account for all, none, or a mix of locked and unlocked child
* qdiscs. Percpu stats are added to counters in-band and locking
* qdisc totals are added at end.
*/
+ rcu_read_lock();
for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
- qdisc = rtnl_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
- spin_lock_bh(qdisc_lock(qdisc));
+ qdisc = rcu_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
- gnet_stats_add_basic(&sch->bstats, qdisc->cpu_bstats,
- &qdisc->bstats, false);
- gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
+ gnet_stats_add_basic(&bstats, qdisc->cpu_bstats,
+ &qdisc->bstats, true);
+ gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
&qdisc->qstats);
- qlen += qdisc_qlen(qdisc);
-
- spin_unlock_bh(qdisc_lock(qdisc));
+ qlen += qdisc_qlen_lockless(qdisc);
}
+ rcu_read_unlock();
+
+ spin_lock_bh(qdisc_lock(sch));
+ _bstats_set(&sch->bstats, bstats.bytes, bstats.packets);
+ spin_unlock_bh(qdisc_lock(sch));
+
+ WRITE_ONCE(sch->qstats.qlen, qstats.qlen);
+ WRITE_ONCE(sch->qstats.backlog, qstats.backlog);
+ WRITE_ONCE(sch->qstats.drops, qstats.drops);
+ WRITE_ONCE(sch->qstats.requeues, qstats.requeues);
+ WRITE_ONCE(sch->qstats.overlimits, qstats.overlimits);
+
WRITE_ONCE(sch->q.qlen, qlen);
mqprio_qopt_reconstruct(dev, &opt);
@@ -661,45 +669,34 @@ static int mqprio_dump_class(struct Qdisc *sch, unsigned long cl,
static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
struct gnet_dump *d)
- __releases(d->lock)
- __acquires(d->lock)
{
if (cl >= TC_H_MIN_PRIORITY) {
struct net_device *dev = qdisc_dev(sch);
struct netdev_tc_txq tc = dev->tc_to_txq[cl & TC_BITMASK];
- struct gnet_stats_queue qstats = {0};
+ struct gnet_stats_queue qstats = { 0 };
struct gnet_stats_basic_sync bstats;
+ struct gnet_stats _bstats = { 0 };
u32 qlen = 0;
int i;
- gnet_stats_basic_sync_init(&bstats);
- /* Drop lock here it will be reclaimed before touching
- * statistics this is required because the d->lock we
- * hold here is the look on dev_queue->qdisc_sleeping
- * also acquired below.
- */
- if (d->lock)
- spin_unlock_bh(d->lock);
-
+ rcu_read_lock();
for (i = tc.offset; i < tc.offset + tc.count; i++) {
- struct netdev_queue *q = netdev_get_tx_queue(dev, i);
- struct Qdisc *qdisc = rtnl_dereference(q->qdisc);
-
- spin_lock_bh(qdisc_lock(qdisc));
+ const struct netdev_queue *q = netdev_get_tx_queue(dev, i);
+ const struct Qdisc *qdisc = rcu_dereference(q->qdisc);
- gnet_stats_add_basic(&bstats, qdisc->cpu_bstats,
- &qdisc->bstats, false);
+ gnet_stats_add_basic(&_bstats, qdisc->cpu_bstats,
+ &qdisc->bstats, true);
gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
&qdisc->qstats);
- qlen += qdisc_qlen(qdisc);
-
- spin_unlock_bh(qdisc_lock(qdisc));
+ qlen += qdisc_qlen_lockless(qdisc);
}
+ rcu_read_unlock();
+ u64_stats_init(&bstats.syncp);
+ u64_stats_set(&bstats.bytes, _bstats.bytes);
+ u64_stats_set(&bstats.packets, _bstats.packets);
+
qlen = qlen + qstats.qlen;
- /* Reclaim root sleeping lock before completing stats */
- if (d->lock)
- spin_lock_bh(d->lock);
if (gnet_stats_copy_basic(d, NULL, &bstats, false) < 0 ||
gnet_stats_copy_queue(d, NULL, &qstats, qlen) < 0)
return -1;
--
2.53.0.1213.gd9a14994de-goog