* [PATCH net-next 01/15] net/sched: rename qstats_overlimit_inc() to qstats_cpu_overlimit_inc()
2026-04-08 12:55 [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps Eric Dumazet
@ 2026-04-08 12:55 ` Eric Dumazet
2026-04-08 12:55 ` [PATCH net-next 02/15] net/sched: add qstats_cpu_drop_inc() helper Eric Dumazet
` (14 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2026-04-08 12:55 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, Kuniyuki Iwashima,
netdev, eric.dumazet, Eric Dumazet
qstats_overlimit_inc() is only used to increment per-cpu overlimits.
It can use this_cpu_inc() to avoid the extra cost of this_cpu_ptr()
and to avoid potential store tearing.
Rename qstats_overlimit_inc() and change its argument type accordingly.
Also add a WRITE_ONCE() in qdisc_qstats_overlimit() to prevent
store tearing.
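For illustration only (not part of the patch), the before/after forms
of the increment:

        /* before: materialize a per-cpu pointer, then a plain RMW
         * that the compiler may legally split into several stores.
         */
        qstats_overlimit_inc(this_cpu_ptr(a->cpu_qstats));

        /* after: one per-cpu increment, a single instruction on x86,
         * which cannot tear.
         */
        qstats_cpu_overlimit_inc(a->cpu_qstats);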
$ scripts/bloat-o-meter -t vmlinux.0 vmlinux.1
add/remove: 0/0 grow/shrink: 0/5 up/down: 0/-72 (-72)
Function old new delta
tcf_skbmod_act 772 764 -8
tcf_police_act 733 725 -8
tcf_mirred_to_dev 1126 1114 -12
tcf_ife_act 1077 1061 -16
tcf_mirred_act 1324 1296 -28
Total: Before=29610901, After=29610829, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/act_api.h | 2 +-
include/net/sch_generic.h | 6 +++---
net/sched/act_ife.c | 4 ++--
net/sched/act_police.c | 2 +-
net/sched/act_skbmod.c | 2 +-
5 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/include/net/act_api.h b/include/net/act_api.h
index d11b791079302f50c47e174979767e0b24afc59a..2ec4ef9a5d0c8e9110f92f135cc3c31a38af0479 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -250,7 +250,7 @@ static inline void tcf_action_inc_drop_qstats(struct tc_action *a)
static inline void tcf_action_inc_overlimit_qstats(struct tc_action *a)
{
if (likely(a->cpu_qstats)) {
- qstats_overlimit_inc(this_cpu_ptr(a->cpu_qstats));
+ qstats_cpu_overlimit_inc(a->cpu_qstats);
return;
}
atomic_inc(&a->tcfa_overlimits);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 5af262ec4bbd2d5021904df127a849e52c26178a..3ee383c6fc3f66f1aecd9ebc675fbd143852c150 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -1004,9 +1004,9 @@ static inline void qstats_drop_inc(struct gnet_stats_queue *qstats)
qstats->drops++;
}
-static inline void qstats_overlimit_inc(struct gnet_stats_queue *qstats)
+static inline void qstats_cpu_overlimit_inc(struct gnet_stats_queue __percpu *qstats)
{
- qstats->overlimits++;
+ this_cpu_inc(qstats->overlimits);
}
static inline void qdisc_qstats_drop(struct Qdisc *sch)
@@ -1021,7 +1021,7 @@ static inline void qdisc_qstats_cpu_drop(struct Qdisc *sch)
static inline void qdisc_qstats_overlimit(struct Qdisc *sch)
{
- sch->qstats.overlimits++;
+ WRITE_ONCE(sch->qstats.overlimits, sch->qstats.overlimits + 1);
}
static inline int qdisc_qstats_copy(struct gnet_dump *d, struct Qdisc *sch)
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index d5e8a91bb4eb9f1f1f084e199b5ada4e7f7e7205..e1b825e14900d6f46bbfd1b7f72ab6cd554d8a73 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -750,7 +750,7 @@ static int tcf_ife_decode(struct sk_buff *skb, const struct tc_action *a,
*/
pr_info_ratelimited("Unknown metaid %d dlen %d\n",
mtype, dlen);
- qstats_overlimit_inc(this_cpu_ptr(ife->common.cpu_qstats));
+ qstats_cpu_overlimit_inc(ife->common.cpu_qstats);
}
}
@@ -814,7 +814,7 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
/* abuse overlimits to count when we allow packet
* with no metadata
*/
- qstats_overlimit_inc(this_cpu_ptr(ife->common.cpu_qstats));
+ qstats_cpu_overlimit_inc(ife->common.cpu_qstats);
return action;
}
/* could be stupid policy setup or mtu config
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 12ea9e5a600536b603ea73cc99b4c00381287219..8060f43e4d11c0a26e1475db06b76426f50c5975 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -307,7 +307,7 @@ TC_INDIRECT_SCOPE int tcf_police_act(struct sk_buff *skb,
}
inc_overlimits:
- qstats_overlimit_inc(this_cpu_ptr(police->common.cpu_qstats));
+ qstats_cpu_overlimit_inc(police->common.cpu_qstats);
inc_drops:
if (ret == TC_ACT_SHOT)
qstats_drop_inc(this_cpu_ptr(police->common.cpu_qstats));
diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
index 23ca46138f040d38de37684439873921bc9c86af..a464b0a3c1b81dba6c28c1141aa38c5c7cad3acb 100644
--- a/net/sched/act_skbmod.c
+++ b/net/sched/act_skbmod.c
@@ -87,7 +87,7 @@ TC_INDIRECT_SCOPE int tcf_skbmod_act(struct sk_buff *skb,
return p->action;
drop:
- qstats_overlimit_inc(this_cpu_ptr(d->common.cpu_qstats));
+ qstats_cpu_overlimit_inc(d->common.cpu_qstats);
return TC_ACT_SHOT;
}
--
2.53.0.1213.gd9a14994de-goog
* [PATCH net-next 02/15] net/sched: add qstats_cpu_drop_inc() helper
2026-04-08 12:55 [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps Eric Dumazet
2026-04-08 12:55 ` [PATCH net-next 01/15] net/sched: rename qstats_overlimit_inc() to qstats_cpu_overlimit_inc() Eric Dumazet
@ 2026-04-08 12:55 ` Eric Dumazet
2026-04-08 12:55 ` [PATCH net-next 03/15] net/sched: add READ_ONCE() in gnet_stats_add_queue[_cpu] Eric Dumazet
` (13 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2026-04-08 12:55 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, Kuniyuki Iwashima,
netdev, eric.dumazet, Eric Dumazet
1) Using this_cpu_inc() is better than going through this_cpu_ptr():
- Single instruction on x86.
- Store tearing prevention.
2) Change tcf_action_update_stats() to use this_cpu_add().
3) Add WRITE_ONCE() to __qdisc_qstats_drop() and qstats_drop_inc().
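As a minimal sketch (illustrative, not part of the patch), the writer
and a lockless reader pair up as follows:

        /* writer, qdisc fast path */
        WRITE_ONCE(qstats->drops, qstats->drops + 1);

        /* lockless reader, dump path (annotated in the next patch) */
        u32 drops = READ_ONCE(qstats->drops);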
$ scripts/bloat-o-meter -t vmlinux.1 vmlinux.2
add/remove: 0/0 grow/shrink: 1/10 up/down: 16/-97 (-81)
Function old new delta
tcf_ife_act 1061 1077 +16
tcf_vlan_act 684 676 -8
tcf_skbedit_act 626 618 -8
tcf_police_act 725 717 -8
tcf_mpls_act 1297 1289 -8
tcf_gate_act 310 302 -8
tcf_gact_act 195 187 -8
tcf_csum_act 2905 2897 -8
tcf_bpf_act 749 741 -8
tcf_action_update_stats 124 115 -9
tcf_ct_act 2154 2130 -24
Total: Before=29739602, After=29739521, chg -0.00%
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/act_api.h | 2 +-
include/net/sch_generic.h | 9 +++++++--
net/sched/act_api.c | 2 +-
net/sched/act_bpf.c | 2 +-
net/sched/act_ife.c | 8 ++++----
net/sched/act_mpls.c | 2 +-
net/sched/act_police.c | 2 +-
net/sched/act_skbedit.c | 2 +-
8 files changed, 17 insertions(+), 12 deletions(-)
diff --git a/include/net/act_api.h b/include/net/act_api.h
index 2ec4ef9a5d0c8e9110f92f135cc3c31a38af0479..167435c5615e09f491a05d01ec86b0c9f9f4fd5b 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -241,7 +241,7 @@ static inline void tcf_action_update_bstats(struct tc_action *a,
static inline void tcf_action_inc_drop_qstats(struct tc_action *a)
{
if (likely(a->cpu_qstats)) {
- qstats_drop_inc(this_cpu_ptr(a->cpu_qstats));
+ qstats_cpu_drop_inc(a->cpu_qstats);
return;
}
atomic_inc(&a->tcfa_drops);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 3ee383c6fc3f66f1aecd9ebc675fbd143852c150..b22579671e4b4dd04c5dfa810b714daaac74af2a 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -996,12 +996,17 @@ static inline void qdisc_qstats_cpu_requeues_inc(struct Qdisc *sch)
static inline void __qdisc_qstats_drop(struct Qdisc *sch, int count)
{
- sch->qstats.drops += count;
+ WRITE_ONCE(sch->qstats.drops, sch->qstats.drops + count);
}
static inline void qstats_drop_inc(struct gnet_stats_queue *qstats)
{
- qstats->drops++;
+ WRITE_ONCE(qstats->drops, qstats->drops + 1);
+}
+
+static inline void qstats_cpu_drop_inc(struct gnet_stats_queue __percpu *qstats)
+{
+ this_cpu_inc(qstats->drops);
}
static inline void qstats_cpu_overlimit_inc(struct gnet_stats_queue __percpu *qstats)
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 332fd9695e54a1fc63bb869c28cacf5f2ed14971..551992683d9e69c247b8d9c613a69e2a897a1e79 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -1578,7 +1578,7 @@ void tcf_action_update_stats(struct tc_action *a, u64 bytes, u64 packets,
if (a->cpu_bstats) {
_bstats_update(this_cpu_ptr(a->cpu_bstats), bytes, packets);
- this_cpu_ptr(a->cpu_qstats)->drops += drops;
+ this_cpu_add(a->cpu_qstats->drops, drops);
if (hw)
_bstats_update(this_cpu_ptr(a->cpu_bstats_hw),
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index c2b5bc19e09118857d1ef3c4aed566b8225f2e9a..58a074651176730fd1bd370ba8420dfbed0d4e9c 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -76,7 +76,7 @@ TC_INDIRECT_SCOPE int tcf_bpf_act(struct sk_buff *skb,
break;
case TC_ACT_SHOT:
action = filter_res;
- qstats_drop_inc(this_cpu_ptr(prog->common.cpu_qstats));
+ qstats_cpu_drop_inc(prog->common.cpu_qstats);
break;
case TC_ACT_UNSPEC:
action = prog->tcf_action;
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index e1b825e14900d6f46bbfd1b7f72ab6cd554d8a73..065228026c58eb0f8ff3b3a08758e4ef0d6ea708 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -727,7 +727,7 @@ static int tcf_ife_decode(struct sk_buff *skb, const struct tc_action *a,
tlv_data = ife_decode(skb, &metalen);
if (unlikely(!tlv_data)) {
- qstats_drop_inc(this_cpu_ptr(ife->common.cpu_qstats));
+ qstats_cpu_drop_inc(ife->common.cpu_qstats);
return TC_ACT_SHOT;
}
@@ -740,7 +740,7 @@ static int tcf_ife_decode(struct sk_buff *skb, const struct tc_action *a,
curr_data = ife_tlv_meta_decode(tlv_data, ifehdr_end, &mtype,
&dlen, NULL);
if (!curr_data) {
- qstats_drop_inc(this_cpu_ptr(ife->common.cpu_qstats));
+ qstats_cpu_drop_inc(ife->common.cpu_qstats);
return TC_ACT_SHOT;
}
@@ -755,7 +755,7 @@ static int tcf_ife_decode(struct sk_buff *skb, const struct tc_action *a,
}
if (WARN_ON(tlv_data != ifehdr_end)) {
- qstats_drop_inc(this_cpu_ptr(ife->common.cpu_qstats));
+ qstats_cpu_drop_inc(ife->common.cpu_qstats);
return TC_ACT_SHOT;
}
@@ -821,7 +821,7 @@ static int tcf_ife_encode(struct sk_buff *skb, const struct tc_action *a,
* so lets be conservative.. */
if ((action == TC_ACT_SHOT) || exceed_mtu) {
drop:
- qstats_drop_inc(this_cpu_ptr(ife->common.cpu_qstats));
+ qstats_cpu_drop_inc(ife->common.cpu_qstats);
return TC_ACT_SHOT;
}
diff --git a/net/sched/act_mpls.c b/net/sched/act_mpls.c
index 1abfaf9d99f1fce0fe7cafa2a9e35c80a3969ce7..4ea8b2e08c3a4dddfe1670af72a5d487a5219f5e 100644
--- a/net/sched/act_mpls.c
+++ b/net/sched/act_mpls.c
@@ -123,7 +123,7 @@ TC_INDIRECT_SCOPE int tcf_mpls_act(struct sk_buff *skb,
return p->action;
drop:
- qstats_drop_inc(this_cpu_ptr(m->common.cpu_qstats));
+ qstats_cpu_drop_inc(m->common.cpu_qstats);
return TC_ACT_SHOT;
}
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 8060f43e4d11c0a26e1475db06b76426f50c5975..b16468a98c55e32260e8d4cb1fe3d771fca65120 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -310,7 +310,7 @@ TC_INDIRECT_SCOPE int tcf_police_act(struct sk_buff *skb,
qstats_cpu_overlimit_inc(police->common.cpu_qstats);
inc_drops:
if (ret == TC_ACT_SHOT)
- qstats_drop_inc(this_cpu_ptr(police->common.cpu_qstats));
+ qstats_cpu_drop_inc(police->common.cpu_qstats);
end:
return ret;
}
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index a778cdba9258c2c776ee5ba0751cca1b73c984df..bfec6b66841031cd566d0c2bdc3d120cec41e3e4 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -86,7 +86,7 @@ TC_INDIRECT_SCOPE int tcf_skbedit_act(struct sk_buff *skb,
return params->action;
err:
- qstats_drop_inc(this_cpu_ptr(d->common.cpu_qstats));
+ qstats_cpu_drop_inc(d->common.cpu_qstats);
return TC_ACT_SHOT;
}
--
2.53.0.1213.gd9a14994de-goog
* [PATCH net-next 03/15] net/sched: add READ_ONCE() in gnet_stats_add_queue[_cpu]
2026-04-08 12:55 [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps Eric Dumazet
2026-04-08 12:55 ` [PATCH net-next 01/15] net/sched: rename qstats_overlimit_inc() to qstats_cpu_overlimit_inc() Eric Dumazet
2026-04-08 12:55 ` [PATCH net-next 02/15] net/sched: add qstats_cpu_drop_inc() helper Eric Dumazet
@ 2026-04-08 12:55 ` Eric Dumazet
2026-04-08 12:56 ` [PATCH net-next 04/15] net/sched: add qdisc_qlen_inc() and qdisc_qlen_dec() Eric Dumazet
` (12 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2026-04-08 12:55 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, Kuniyuki Iwashima,
netdev, eric.dumazet, Eric Dumazet
Stats are read locklessly; add READ_ONCE() to prevent load tearing.
Write side will be handled in separate patches.
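A minimal sketch of the read side (illustrative only, variable names
as in gnet_stats_add_queue_cpu()): per-cpu counters are summed while
other cpus may still be updating them, so each load must be a single,
non-torn access:

        int i;

        for_each_possible_cpu(i) {
                const struct gnet_stats_queue *qcpu = per_cpu_ptr(q, i);

                /* one non-torn load of a concurrently written field */
                qstats->drops += READ_ONCE(qcpu->drops);
        }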
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/core/gen_stats.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index b71ccaec0991461333dbe465ee619bca4a06e75b..1a2380e74272de8eaf3d4ef453e56105a31e9edf 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -345,11 +345,11 @@ static void gnet_stats_add_queue_cpu(struct gnet_stats_queue *qstats,
for_each_possible_cpu(i) {
const struct gnet_stats_queue *qcpu = per_cpu_ptr(q, i);
- qstats->qlen += qcpu->qlen;
- qstats->backlog += qcpu->backlog;
- qstats->drops += qcpu->drops;
- qstats->requeues += qcpu->requeues;
- qstats->overlimits += qcpu->overlimits;
+ qstats->qlen += READ_ONCE(qcpu->qlen);
+ qstats->backlog += READ_ONCE(qcpu->backlog);
+ qstats->drops += READ_ONCE(qcpu->drops);
+ qstats->requeues += READ_ONCE(qcpu->requeues);
+ qstats->overlimits += READ_ONCE(qcpu->overlimits);
}
}
@@ -360,11 +360,11 @@ void gnet_stats_add_queue(struct gnet_stats_queue *qstats,
if (cpu) {
gnet_stats_add_queue_cpu(qstats, cpu);
} else {
- qstats->qlen += q->qlen;
- qstats->backlog += q->backlog;
- qstats->drops += q->drops;
- qstats->requeues += q->requeues;
- qstats->overlimits += q->overlimits;
+ qstats->qlen += READ_ONCE(q->qlen);
+ qstats->backlog += READ_ONCE(q->backlog);
+ qstats->drops += READ_ONCE(q->drops);
+ qstats->requeues += READ_ONCE(q->requeues);
+ qstats->overlimits += READ_ONCE(q->overlimits);
}
}
EXPORT_SYMBOL(gnet_stats_add_queue);
--
2.53.0.1213.gd9a14994de-goog
* [PATCH net-next 04/15] net/sched: add qdisc_qlen_inc() and qdisc_qlen_dec()
2026-04-08 12:55 [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps Eric Dumazet
` (2 preceding siblings ...)
2026-04-08 12:55 ` [PATCH net-next 03/15] net/sched: add READ_ONCE() in gnet_stats_add_queue[_cpu] Eric Dumazet
@ 2026-04-08 12:56 ` Eric Dumazet
2026-04-08 12:56 ` [PATCH net-next 05/15] net/sched: annotate data-races around sch->qstats.backlog Eric Dumazet
` (11 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2026-04-08 12:56 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, Kuniyuki Iwashima,
netdev, eric.dumazet, Eric Dumazet
Add helpers to increment or decrement sch->q.qlen, using WRITE_ONCE()
to prevent store tearing.
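Illustrative usage, assuming a classic enqueue/dequeue pair:

        /* enqueue path: WRITE_ONCE(sch->q.qlen, sch->q.qlen + 1) */
        qdisc_qlen_inc(sch);

        /* dequeue path: WRITE_ONCE(sch->q.qlen, sch->q.qlen - 1) */
        qdisc_qlen_dec(sch);

A lockless dumper can then pair these with READ_ONCE(sch->q.qlen),
as done in qdisc_qlen_sum() below.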
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/sch_generic.h | 24 +++++++++++++++++-------
net/sched/sch_api.c | 2 +-
net/sched/sch_cake.c | 8 ++++----
net/sched/sch_cbs.c | 4 ++--
net/sched/sch_choke.c | 8 ++++----
net/sched/sch_drr.c | 4 ++--
net/sched/sch_dualpi2.c | 4 ++--
net/sched/sch_etf.c | 8 ++++----
net/sched/sch_ets.c | 4 ++--
net/sched/sch_fq.c | 4 ++--
net/sched/sch_fq_codel.c | 7 ++++---
net/sched/sch_fq_pie.c | 4 ++--
net/sched/sch_generic.c | 8 ++++----
net/sched/sch_hfsc.c | 4 ++--
net/sched/sch_hhf.c | 7 ++++---
net/sched/sch_htb.c | 4 ++--
net/sched/sch_mqprio.c | 6 ++++--
net/sched/sch_multiq.c | 4 ++--
net/sched/sch_netem.c | 10 +++++-----
net/sched/sch_prio.c | 4 ++--
net/sched/sch_qfq.c | 6 +++---
net/sched/sch_red.c | 4 ++--
net/sched/sch_sfb.c | 4 ++--
net/sched/sch_sfq.c | 7 ++++---
net/sched/sch_skbprio.c | 4 ++--
net/sched/sch_taprio.c | 4 ++--
net/sched/sch_tbf.c | 6 +++---
net/sched/sch_teql.c | 2 +-
28 files changed, 90 insertions(+), 75 deletions(-)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index b22579671e4b4dd04c5dfa810b714daaac74af2a..84acf7ac42cb5173d151d98fa0ea603e9cc80d69 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -542,6 +542,16 @@ static inline int qdisc_qlen(const struct Qdisc *q)
return q->q.qlen;
}
+static inline void qdisc_qlen_inc(struct Qdisc *q)
+{
+ WRITE_ONCE(q->q.qlen, q->q.qlen + 1);
+}
+
+static inline void qdisc_qlen_dec(struct Qdisc *q)
+{
+ WRITE_ONCE(q->q.qlen, q->q.qlen - 1);
+}
+
static inline int qdisc_qlen_sum(const struct Qdisc *q)
{
__u32 qlen = q->qstats.qlen;
@@ -549,9 +559,9 @@ static inline int qdisc_qlen_sum(const struct Qdisc *q)
if (qdisc_is_percpu_stats(q)) {
for_each_possible_cpu(i)
- qlen += per_cpu_ptr(q->cpu_qstats, i)->qlen;
+ qlen += READ_ONCE(per_cpu_ptr(q->cpu_qstats, i)->qlen);
} else {
- qlen += q->q.qlen;
+ qlen += READ_ONCE(q->q.qlen);
}
return qlen;
@@ -1110,7 +1120,7 @@ static inline struct sk_buff *qdisc_dequeue_internal(struct Qdisc *sch, bool dir
skb = __skb_dequeue(&sch->gso_skb);
if (skb) {
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_qstats_backlog_dec(sch, skb);
return skb;
}
@@ -1256,7 +1266,7 @@ static inline struct sk_buff *qdisc_peek_dequeued(struct Qdisc *sch)
__skb_queue_head(&sch->gso_skb, skb);
/* it's still part of the queue */
qdisc_qstats_backlog_inc(sch, skb);
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
}
}
@@ -1273,7 +1283,7 @@ static inline void qdisc_update_stats_at_dequeue(struct Qdisc *sch,
} else {
qdisc_qstats_backlog_dec(sch, skb);
qdisc_bstats_update(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
}
}
@@ -1285,7 +1295,7 @@ static inline void qdisc_update_stats_at_enqueue(struct Qdisc *sch,
this_cpu_add(sch->cpu_qstats->backlog, pkt_len);
} else {
sch->qstats.backlog += pkt_len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
}
}
@@ -1301,7 +1311,7 @@ static inline struct sk_buff *qdisc_dequeue_peeked(struct Qdisc *sch)
qdisc_qstats_cpu_qlen_dec(sch);
} else {
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
}
} else {
skb = sch->dequeue(sch);
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ed869a5ffc7377b7c19e66ae5fc9788e709488da..0dd3efd86393870e9695dddb4a471c5bf854f81e 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -805,7 +805,7 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
cl = cops->find(sch, parentid);
cops->qlen_notify(sch, cl);
}
- sch->q.qlen -= n;
+ WRITE_ONCE(sch->q.qlen, sch->q.qlen - n);
sch->qstats.backlog -= len;
__qdisc_qstats_drop(sch, drops);
}
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index ffea9fbd522d8dd3311cbca0a55a3d133eaceae4..0a8b067ba8ecbb85bd6f96ee9e5e959ba5e2efae 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1605,7 +1605,7 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
cake_advance_shaper(q, b, skb, now, true);
qdisc_drop_reason(skb, sch, to_free, QDISC_DROP_OVERLIMIT);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
cake_heapify(q, 0);
@@ -1815,7 +1815,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
segs);
flow_queue_add(flow, segs);
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
numsegs++;
slen += segs->len;
q->buffer_used += segs->truesize;
@@ -1854,7 +1854,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
qdisc_tree_reduce_backlog(sch, 1, ack_pkt_len);
consume_skb(ack);
} else {
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
q->buffer_used += skb->truesize;
}
@@ -1980,7 +1980,7 @@ static struct sk_buff *cake_dequeue_one(struct Qdisc *sch)
b->tin_backlog -= len;
sch->qstats.backlog -= len;
q->buffer_used -= skb->truesize;
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
if (q->overflow_timeout)
cake_heapify(q, b->overflow_idx[q->cur_flow]);
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index 8c9a0400c8622c652db290796f2dd338eb61799c..a75e58876797952f2218725f6da5cff29f330ae2 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -97,7 +97,7 @@ static int cbs_child_enqueue(struct sk_buff *skb, struct Qdisc *sch,
return err;
sch->qstats.backlog += len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
@@ -168,7 +168,7 @@ static struct sk_buff *cbs_child_dequeue(struct Qdisc *sch, struct Qdisc *child)
qdisc_qstats_backlog_dec(sch, skb);
qdisc_bstats_update(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
return skb;
}
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index 94df8e741a979191a06885ad3ee813f12650ff3c..cd0785ad8e74314e6d5c88144ffcf64f286e02dd 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -123,7 +123,7 @@ static void choke_drop_by_idx(struct Qdisc *sch, unsigned int idx,
if (idx == q->tail)
choke_zap_tail_holes(q);
- --sch->q.qlen;
+ qdisc_qlen_dec(sch);
qdisc_qstats_backlog_dec(sch, skb);
qdisc_tree_reduce_backlog(sch, 1, qdisc_pkt_len(skb));
qdisc_drop(skb, sch, to_free);
@@ -267,7 +267,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
if (sch->q.qlen < q->limit) {
q->tab[q->tail] = skb;
q->tail = (q->tail + 1) & q->tab_mask;
- ++sch->q.qlen;
+ qdisc_qlen_inc(sch);
qdisc_qstats_backlog_inc(sch, skb);
return NET_XMIT_SUCCESS;
}
@@ -294,7 +294,7 @@ static struct sk_buff *choke_dequeue(struct Qdisc *sch)
skb = q->tab[q->head];
q->tab[q->head] = NULL;
choke_zap_head_holes(q);
- --sch->q.qlen;
+ qdisc_qlen_dec(sch);
qdisc_qstats_backlog_dec(sch, skb);
qdisc_bstats_update(sch, skb);
@@ -392,7 +392,7 @@ static int choke_change(struct Qdisc *sch, struct nlattr *opt,
}
dropped += qdisc_pkt_len(skb);
qdisc_qstats_backlog_dec(sch, skb);
- --sch->q.qlen;
+ qdisc_qlen_dec(sch);
rtnl_qdisc_drop(skb, sch);
}
qdisc_tree_reduce_backlog(sch, oqlen - sch->q.qlen, dropped);
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 01335a49e091444747635ee8bc7e22ded504d571..925fa0cfd730ce72e45e8983ba02eb913afb1235 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -366,7 +366,7 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch,
}
sch->qstats.backlog += len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return err;
}
@@ -399,7 +399,7 @@ static struct sk_buff *drr_dequeue(struct Qdisc *sch)
bstats_update(&cl->bstats, skb);
qdisc_bstats_update(sch, skb);
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
return skb;
}
diff --git a/net/sched/sch_dualpi2.c b/net/sched/sch_dualpi2.c
index fe6f5e8896257674b9f175e01428b89e299a7dda..d093d058decbc577f3b311e1e8513260c167bff0 100644
--- a/net/sched/sch_dualpi2.c
+++ b/net/sched/sch_dualpi2.c
@@ -415,7 +415,7 @@ static int dualpi2_enqueue_skb(struct sk_buff *skb, struct Qdisc *sch,
dualpi2_skb_cb(skb)->apply_step = skb_apply_step(skb, q);
/* Keep the overall qdisc stats consistent */
- ++sch->q.qlen;
+ qdisc_qlen_inc(sch);
qdisc_qstats_backlog_inc(sch, skb);
++q->packets_in_l;
if (!q->l_head_ts)
@@ -530,7 +530,7 @@ static struct sk_buff *dequeue_packet(struct Qdisc *sch,
qdisc_qstats_backlog_dec(q->l_queue, skb);
/* Keep the global queue size consistent */
- --sch->q.qlen;
+ qdisc_qlen_dec(sch);
q->memory_used -= skb->truesize;
} else if (c_len) {
skb = __qdisc_dequeue_head(&sch->q);
diff --git a/net/sched/sch_etf.c b/net/sched/sch_etf.c
index c74d778c32a1eda639650df4d1d103c5338f14e6..ada87a81da6ac4c20e036b5391eb4efe9795ab91 100644
--- a/net/sched/sch_etf.c
+++ b/net/sched/sch_etf.c
@@ -189,7 +189,7 @@ static int etf_enqueue_timesortedlist(struct sk_buff *nskb, struct Qdisc *sch,
rb_insert_color_cached(&nskb->rbnode, &q->head, leftmost);
qdisc_qstats_backlog_inc(sch, nskb);
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
/* Now we may need to re-arm the qdisc watchdog for the next packet. */
reset_watchdog(sch);
@@ -222,7 +222,7 @@ static void timesortedlist_drop(struct Qdisc *sch, struct sk_buff *skb,
qdisc_qstats_backlog_dec(sch, skb);
qdisc_drop(skb, sch, &to_free);
qdisc_qstats_overlimit(sch);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
}
kfree_skb_list(to_free);
@@ -247,7 +247,7 @@ static void timesortedlist_remove(struct Qdisc *sch, struct sk_buff *skb)
q->last = skb->tstamp;
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
}
static struct sk_buff *etf_dequeue_timesortedlist(struct Qdisc *sch)
@@ -426,7 +426,7 @@ static void timesortedlist_clear(struct Qdisc *sch)
rb_erase_cached(&skb->rbnode, &q->head);
rtnl_kfree_skbs(skb, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
}
}
diff --git a/net/sched/sch_ets.c b/net/sched/sch_ets.c
index a4b07b661b7756a675d22c0f84f8f0a713cdb7eb..c817e0a6c14653a35f5ebb9de1a5ccc44d1a2f98 100644
--- a/net/sched/sch_ets.c
+++ b/net/sched/sch_ets.c
@@ -449,7 +449,7 @@ static int ets_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
}
sch->qstats.backlog += len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return err;
}
@@ -458,7 +458,7 @@ ets_qdisc_dequeue_skb(struct Qdisc *sch, struct sk_buff *skb)
{
qdisc_bstats_update(sch, skb);
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
return skb;
}
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index f2edcf872981fd8181dfb97a3bc665fd4a869115..dd553d6f3e8e911e161c1440eb6d9ce94f65385a 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -497,7 +497,7 @@ static void fq_dequeue_skb(struct Qdisc *sch, struct fq_flow *flow,
fq_erase_head(sch, flow, skb);
skb_mark_not_on_list(skb);
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_bstats_update(sch, skb);
}
@@ -597,7 +597,7 @@ static int fq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
flow_queue_add(f, skb);
qdisc_qstats_backlog_inc(sch, skb);
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 2a3d758f67ab43d17128442fd8b51c6ba7775d52..183b1ea9d2076d8f709d50a38b39d28a2b14bad8 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -178,7 +178,7 @@ static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max_packets,
q->memory_usage -= mem;
sch->qstats.drops += i;
sch->qstats.backlog -= len;
- sch->q.qlen -= i;
+ WRITE_ONCE(sch->q.qlen, sch->q.qlen - i);
return idx;
}
@@ -215,7 +215,8 @@ static int fq_codel_enqueue(struct sk_buff *skb, struct Qdisc *sch,
get_codel_cb(skb)->mem_usage = skb->truesize;
q->memory_usage += get_codel_cb(skb)->mem_usage;
memory_limited = q->memory_usage > q->memory_limit;
- if (++sch->q.qlen <= sch->limit && !memory_limited)
+ qdisc_qlen_inc(sch);
+ if (sch->q.qlen <= sch->limit && !memory_limited)
return NET_XMIT_SUCCESS;
prev_backlog = sch->qstats.backlog;
@@ -265,7 +266,7 @@ static struct sk_buff *dequeue_func(struct codel_vars *vars, void *ctx)
skb = dequeue_head(flow);
q->backlogs[flow - q->flows] -= qdisc_pkt_len(skb);
q->memory_usage -= get_codel_cb(skb)->mem_usage;
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
sch->qstats.backlog -= qdisc_pkt_len(skb);
}
return skb;
diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c
index 154c70f489f289066db5d61bb51e58aaf328f16e..dba49d44a5d2412b2deb983bf87428ade7944e51 100644
--- a/net/sched/sch_fq_pie.c
+++ b/net/sched/sch_fq_pie.c
@@ -185,7 +185,7 @@ static int fq_pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
q->stats.packets_in++;
q->memory_usage += skb->truesize;
sch->qstats.backlog += pkt_len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
flow_queue_add(sel_flow, skb);
if (list_empty(&sel_flow->flowchain)) {
list_add_tail(&sel_flow->flowchain, &q->new_flows);
@@ -263,7 +263,7 @@ static struct sk_buff *fq_pie_qdisc_dequeue(struct Qdisc *sch)
skb = dequeue_head(flow);
pkt_len = qdisc_pkt_len(skb);
sch->qstats.backlog -= pkt_len;
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_bstats_update(sch, skb);
}
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index a93321db8fd75d30c61e146c290bbc139c37c913..32ace8659ab86457cd1b1655810e0f4105149c47 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -118,7 +118,7 @@ static inline struct sk_buff *__skb_dequeue_bad_txq(struct Qdisc *q)
qdisc_qstats_cpu_qlen_dec(q);
} else {
qdisc_qstats_backlog_dec(q, skb);
- q->q.qlen--;
+ qdisc_qlen_dec(q);
}
} else {
skb = SKB_XOFF_MAGIC;
@@ -159,7 +159,7 @@ static inline void qdisc_enqueue_skb_bad_txq(struct Qdisc *q,
qdisc_qstats_cpu_qlen_inc(q);
} else {
qdisc_qstats_backlog_inc(q, skb);
- q->q.qlen++;
+ qdisc_qlen_inc(q);
}
if (lock)
@@ -188,7 +188,7 @@ static inline void dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q)
} else {
q->qstats.requeues++;
qdisc_qstats_backlog_inc(q, skb);
- q->q.qlen++;
+ qdisc_qlen_inc(q);
}
skb = next;
@@ -294,7 +294,7 @@ static struct sk_buff *dequeue_skb(struct Qdisc *q, bool *validate,
qdisc_qstats_cpu_qlen_dec(q);
} else {
qdisc_qstats_backlog_dec(q, skb);
- q->q.qlen--;
+ qdisc_qlen_dec(q);
}
} else {
skb = NULL;
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index 83b2ca2e37fc82cfebf089e6c0e36f18af939887..e71a565100edf60881ca7542faa408c5bb1a0984 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1561,7 +1561,7 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
}
sch->qstats.backlog += len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
if (first && !cl_in_el_or_vttree(cl)) {
if (cl->cl_flags & HFSC_RSC)
@@ -1650,7 +1650,7 @@ hfsc_dequeue(struct Qdisc *sch)
qdisc_bstats_update(sch, skb);
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
return skb;
}
diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index 95e5d9bfd9c8c0cac08e080b8f1e0332e722aa3b..69b6f0a5471cb9a3b7b760144683f2b249091d89 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -359,7 +359,7 @@ static unsigned int hhf_drop(struct Qdisc *sch, struct sk_buff **to_free)
if (bucket->head) {
struct sk_buff *skb = dequeue_head(bucket);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_qstats_backlog_dec(sch, skb);
qdisc_drop(skb, sch, to_free);
}
@@ -399,7 +399,8 @@ static int hhf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
}
bucket->deficit = weight * q->quantum;
}
- if (++sch->q.qlen <= sch->limit)
+ qdisc_qlen_inc(sch);
+ if (sch->q.qlen <= sch->limit)
return NET_XMIT_SUCCESS;
prev_backlog = sch->qstats.backlog;
@@ -442,7 +443,7 @@ static struct sk_buff *hhf_dequeue(struct Qdisc *sch)
if (bucket->head) {
skb = dequeue_head(bucket);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_qstats_backlog_dec(sch, skb);
}
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index eb12381795ce1bb0f3b8c5f502e16ad64c4408c8..c22ccd8eae8c73323ccdf425e62857b3b851d74e 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -651,7 +651,7 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
}
sch->qstats.backlog += len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
@@ -951,7 +951,7 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
ok:
qdisc_bstats_update(sch, skb);
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
return skb;
}
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 002add5ce9e0ab04a6260495d1bec02983c2a204..d35624e5869a4a6a12612886b2cd9cdac7b0b471 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -555,10 +555,11 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
struct mqprio_sched *priv = qdisc_priv(sch);
struct nlattr *nla = (struct nlattr *)skb_tail_pointer(skb);
struct tc_mqprio_qopt opt = { 0 };
+ unsigned int qlen = 0;
struct Qdisc *qdisc;
unsigned int ntx;
- sch->q.qlen = 0;
+ qlen = 0;
gnet_stats_basic_sync_init(&sch->bstats);
memset(&sch->qstats, 0, sizeof(sch->qstats));
@@ -575,10 +576,11 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
&qdisc->bstats, false);
gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
&qdisc->qstats);
- sch->q.qlen += qdisc_qlen(qdisc);
+ qlen += qdisc_qlen(qdisc);
spin_unlock_bh(qdisc_lock(qdisc));
}
+ WRITE_ONCE(sch->q.qlen, qlen);
mqprio_qopt_reconstruct(dev, &opt);
opt.hw = priv->hw_offload;
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c
index 9f822fee113df6562ddac89092357434547a4599..4e465d11e3d75e36b875b66f8c8087c2e15cdad9 100644
--- a/net/sched/sch_multiq.c
+++ b/net/sched/sch_multiq.c
@@ -76,7 +76,7 @@ multiq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
ret = qdisc_enqueue(skb, qdisc, to_free);
if (ret == NET_XMIT_SUCCESS) {
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
if (net_xmit_drop_count(ret))
@@ -106,7 +106,7 @@ static struct sk_buff *multiq_dequeue(struct Qdisc *sch)
skb = qdisc->dequeue(qdisc);
if (skb) {
qdisc_bstats_update(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
return skb;
}
}
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 20df1c08b1e9d04e9495f1a69eff0dd96049f914..4498dd440a02ea7a089c92ebc005d5064b87e2d2 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -416,7 +416,7 @@ static void tfifo_enqueue(struct sk_buff *nskb, struct Qdisc *sch)
rb_insert_color(&nskb->rbnode, &q->t_root);
}
q->t_len++;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
}
/* netem can't properly corrupt a megapacket (like we get from GSO), so instead
@@ -752,19 +752,19 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
if (net_xmit_drop_count(err))
qdisc_qstats_drop(sch);
sch->qstats.backlog -= pkt_len;
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_tree_reduce_backlog(sch, 1, pkt_len);
}
goto tfifo_dequeue;
}
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
goto deliver;
}
if (q->qdisc) {
skb = q->qdisc->ops->dequeue(q->qdisc);
if (skb) {
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
goto deliver;
}
}
@@ -777,7 +777,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
if (q->qdisc) {
skb = q->qdisc->ops->dequeue(q->qdisc);
if (skb) {
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
goto deliver;
}
}
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index 9e2b9a490db23d858b27b7fc073b05a06535b05e..fe42ae3d6b696b2fc47f4d397af32e950eeec194 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -86,7 +86,7 @@ prio_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
ret = qdisc_enqueue(skb, qdisc, to_free);
if (ret == NET_XMIT_SUCCESS) {
sch->qstats.backlog += len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
if (net_xmit_drop_count(ret))
@@ -119,7 +119,7 @@ static struct sk_buff *prio_dequeue(struct Qdisc *sch)
if (skb) {
qdisc_bstats_update(sch, skb);
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
return skb;
}
}
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 699e45873f86145e96abd0d9ca77a6d0ff763b1b..195c434aae5f7e03d1a1238ed73bb64b3f04e105 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1152,12 +1152,12 @@ static struct sk_buff *qfq_dequeue(struct Qdisc *sch)
if (!skb)
return NULL;
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
skb = agg_dequeue(in_serv_agg, cl, len);
if (!skb) {
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return NULL;
}
@@ -1265,7 +1265,7 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
_bstats_update(&cl->bstats, len, gso_segs);
sch->qstats.backlog += len;
- ++sch->q.qlen;
+ qdisc_qlen_inc(sch);
agg = cl->agg;
/* if the class is active, then done here */
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index c8d3d09f15e3919d6468964561130bfc79fb215b..61b9064d39f222bdfe5021e93e8172b7ae60c408 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -133,7 +133,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
ret = qdisc_enqueue(skb, child, to_free);
if (likely(ret == NET_XMIT_SUCCESS)) {
sch->qstats.backlog += len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
} else if (net_xmit_drop_count(ret)) {
q->stats.pdrop++;
qdisc_qstats_drop(sch);
@@ -159,7 +159,7 @@ static struct sk_buff *red_dequeue(struct Qdisc *sch)
if (skb) {
qdisc_bstats_update(sch, skb);
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
} else {
if (!red_is_idling(&q->vars))
red_start_of_idle_period(&q->vars);
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 01373866212894c57e7de58706ee464879303955..17b6ce223ad3a6f2d289c3ebe27cce8168c66b2b 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -407,7 +407,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
ret = qdisc_enqueue(skb, child, to_free);
if (likely(ret == NET_XMIT_SUCCESS)) {
sch->qstats.backlog += len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
increment_qlen(&cb, q);
} else if (net_xmit_drop_count(ret)) {
q->stats.childdrop++;
@@ -436,7 +436,7 @@ static struct sk_buff *sfb_dequeue(struct Qdisc *sch)
if (skb) {
qdisc_bstats_update(sch, skb);
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
decrement_qlen(skb, q);
}
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index c3f3181dba5424eb9d26362a1628653bb9392e89..5eb6d8abd1c334938f72259f5fc41526597e792f 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -300,7 +300,7 @@ static unsigned int sfq_drop(struct Qdisc *sch, struct sk_buff **to_free)
len = qdisc_pkt_len(skb);
slot->backlog -= len;
sfq_dec(q, x);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_qstats_backlog_dec(sch, skb);
qdisc_drop_reason(skb, sch, to_free, QDISC_DROP_OVERLIMIT);
return len;
@@ -454,7 +454,8 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
/* We could use a bigger initial quantum for new flows */
slot->allot = q->quantum;
}
- if (++sch->q.qlen <= q->limit)
+ qdisc_qlen_inc(sch);
+ if (sch->q.qlen <= q->limit)
return NET_XMIT_SUCCESS;
qlen = slot->qlen;
@@ -495,7 +496,7 @@ sfq_dequeue(struct Qdisc *sch)
skb = slot_dequeue_head(slot);
sfq_dec(q, a);
qdisc_bstats_update(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_qstats_backlog_dec(sch, skb);
slot->backlog -= qdisc_pkt_len(skb);
/* Is the slot empty? */
diff --git a/net/sched/sch_skbprio.c b/net/sched/sch_skbprio.c
index f485f62ab721ab8cde21230c60514708fb479982..52abfb4015a36408046d96b349497419ab5dacf8 100644
--- a/net/sched/sch_skbprio.c
+++ b/net/sched/sch_skbprio.c
@@ -93,7 +93,7 @@ static int skbprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
if (prio < q->lowest_prio)
q->lowest_prio = prio;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
@@ -145,7 +145,7 @@ static struct sk_buff *skbprio_dequeue(struct Qdisc *sch)
if (unlikely(!skb))
return NULL;
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_qstats_backlog_dec(sch, skb);
qdisc_bstats_update(sch, skb);
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 8e375281195061da848fb2bfaf79cf125afccac0..885a9bc859166dfb6d20aa0dfbb8f11194e02ba9 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -574,7 +574,7 @@ static int taprio_enqueue_one(struct sk_buff *skb, struct Qdisc *sch,
}
qdisc_qstats_backlog_inc(sch, skb);
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return qdisc_enqueue(skb, child, to_free);
}
@@ -755,7 +755,7 @@ static struct sk_buff *taprio_dequeue_from_txq(struct Qdisc *sch, int txq,
qdisc_bstats_update(sch, skb);
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
return skb;
}
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index f2340164f579a25431979e12ec3d23ab828edd16..25edf11a7d671fe63878b0995998c5920b86ef74 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -231,7 +231,7 @@ static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch,
len += seg_len;
}
}
- sch->q.qlen += nb;
+ WRITE_ONCE(sch->q.qlen, sch->q.qlen + nb);
sch->qstats.backlog += len;
if (nb > 0) {
qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
@@ -264,7 +264,7 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
}
sch->qstats.backlog += len;
- sch->q.qlen++;
+ qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
@@ -309,7 +309,7 @@ static struct sk_buff *tbf_dequeue(struct Qdisc *sch)
q->tokens = toks;
q->ptokens = ptoks;
qdisc_qstats_backlog_dec(sch, skb);
- sch->q.qlen--;
+ qdisc_qlen_dec(sch);
qdisc_bstats_update(sch, skb);
return skb;
}
diff --git a/net/sched/sch_teql.c b/net/sched/sch_teql.c
index ec4039a201a2c2c502bc649fa5f6a0e4feee8fd5..bd10da46f5ddbc53f914648066dab526c8064e55 100644
--- a/net/sched/sch_teql.c
+++ b/net/sched/sch_teql.c
@@ -107,7 +107,7 @@ teql_dequeue(struct Qdisc *sch)
} else {
qdisc_bstats_update(sch, skb);
}
- sch->q.qlen = dat->q.qlen + q->q.qlen;
+ WRITE_ONCE(sch->q.qlen, dat->q.qlen + q->q.qlen);
return skb;
}
--
2.53.0.1213.gd9a14994de-goog
* [PATCH net-next 05/15] net/sched: annotate data-races around sch->qstats.backlog
2026-04-08 12:55 [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps Eric Dumazet
` (3 preceding siblings ...)
2026-04-08 12:56 ` [PATCH net-next 04/15] net/sched: add qdisc_qlen_inc() and qdisc_qlen_dec() Eric Dumazet
@ 2026-04-08 12:56 ` Eric Dumazet
2026-04-08 12:56 ` [PATCH net-next 06/15] net/sched: sch_sfb: annotate data-races in sfb_dump_stats() Eric Dumazet
` (10 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2026-04-08 12:56 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, Kuniyuki Iwashima,
netdev, eric.dumazet, Eric Dumazet
Add qstats_backlog_sub() and qstats_backlog_add() helpers
and use them instead of open-coded updates.
These helpers use WRITE_ONCE() to prevent store tearing.
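Illustrative usage (not part of the patch):

        /* enqueue: sch->qstats.backlog += len, without store tearing */
        qstats_backlog_add(sch, len);

        /* dequeue: sch->qstats.backlog -= len, without store tearing */
        qstats_backlog_sub(sch, len);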
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/sch_generic.h | 17 ++++++++++++++---
net/sched/sch_api.c | 2 +-
net/sched/sch_cake.c | 8 ++++----
net/sched/sch_cbs.c | 2 +-
net/sched/sch_codel.c | 2 +-
net/sched/sch_drr.c | 2 +-
net/sched/sch_ets.c | 2 +-
net/sched/sch_fq_codel.c | 4 ++--
net/sched/sch_fq_pie.c | 4 ++--
net/sched/sch_hfsc.c | 2 +-
net/sched/sch_htb.c | 2 +-
net/sched/sch_netem.c | 2 +-
net/sched/sch_prio.c | 2 +-
net/sched/sch_qfq.c | 2 +-
net/sched/sch_red.c | 2 +-
net/sched/sch_sfb.c | 4 ++--
net/sched/sch_sfq.c | 2 +-
net/sched/sch_tbf.c | 4 ++--
18 files changed, 38 insertions(+), 27 deletions(-)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 84acf7ac42cb5173d151d98fa0ea603e9cc80d69..8e4e12692a1048f5e626b89c4db9e0339b16265d 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -965,10 +965,15 @@ static inline void qdisc_bstats_update(struct Qdisc *sch,
bstats_update(&sch->bstats, skb);
}
+static inline void qstats_backlog_sub(struct Qdisc *sch, u32 val)
+{
+ WRITE_ONCE(sch->qstats.backlog, sch->qstats.backlog - val);
+}
+
static inline void qdisc_qstats_backlog_dec(struct Qdisc *sch,
const struct sk_buff *skb)
{
- sch->qstats.backlog -= qdisc_pkt_len(skb);
+ qstats_backlog_sub(sch, qdisc_pkt_len(skb));
}
static inline void qdisc_qstats_cpu_backlog_dec(struct Qdisc *sch,
@@ -977,10 +982,15 @@ static inline void qdisc_qstats_cpu_backlog_dec(struct Qdisc *sch,
this_cpu_sub(sch->cpu_qstats->backlog, qdisc_pkt_len(skb));
}
+static inline void qstats_backlog_add(struct Qdisc *sch, u32 val)
+{
+ WRITE_ONCE(sch->qstats.backlog, sch->qstats.backlog + val);
+}
+
static inline void qdisc_qstats_backlog_inc(struct Qdisc *sch,
const struct sk_buff *skb)
{
- sch->qstats.backlog += qdisc_pkt_len(skb);
+ qstats_backlog_add(sch, qdisc_pkt_len(skb));
}
static inline void qdisc_qstats_cpu_backlog_inc(struct Qdisc *sch,
@@ -1294,7 +1304,8 @@ static inline void qdisc_update_stats_at_enqueue(struct Qdisc *sch,
qdisc_qstats_cpu_qlen_inc(sch);
this_cpu_add(sch->cpu_qstats->backlog, pkt_len);
} else {
- sch->qstats.backlog += pkt_len;
+ WRITE_ONCE(sch->qstats.backlog,
+ sch->qstats.backlog + pkt_len);
qdisc_qlen_inc(sch);
}
}
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 0dd3efd86393870e9695dddb4a471c5bf854f81e..292bc8bb7a79922a83865ed54083c04ff72742ff 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -806,7 +806,7 @@ void qdisc_tree_reduce_backlog(struct Qdisc *sch, int n, int len)
cops->qlen_notify(sch, cl);
}
WRITE_ONCE(sch->q.qlen, sch->q.qlen - n);
- sch->qstats.backlog -= len;
+ qstats_backlog_sub(sch, len);
__qdisc_qstats_drop(sch, drops);
}
rcu_read_unlock();
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 0a8b067ba8ecbb85bd6f96ee9e5e959ba5e2efae..0104c29b20f8e43ffa025f0eb58bfe4e2b801010 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1596,7 +1596,7 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
q->buffer_used -= skb->truesize;
b->backlogs[idx] -= len;
b->tin_backlog -= len;
- sch->qstats.backlog -= len;
+ qstats_backlog_sub(sch, len);
flow->dropped++;
b->tin_dropped++;
@@ -1826,7 +1826,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
b->bytes += slen;
b->backlogs[idx] += slen;
b->tin_backlog += slen;
- sch->qstats.backlog += slen;
+ qstats_backlog_add(sch, slen);
q->avg_window_bytes += slen;
qdisc_tree_reduce_backlog(sch, 1-numsegs, len-slen);
@@ -1863,7 +1863,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
b->bytes += len - ack_pkt_len;
b->backlogs[idx] += len - ack_pkt_len;
b->tin_backlog += len - ack_pkt_len;
- sch->qstats.backlog += len - ack_pkt_len;
+ qstats_backlog_add(sch, len - ack_pkt_len);
q->avg_window_bytes += len - ack_pkt_len;
}
@@ -1978,7 +1978,7 @@ static struct sk_buff *cake_dequeue_one(struct Qdisc *sch)
len = qdisc_pkt_len(skb);
b->backlogs[q->cur_flow] -= len;
b->tin_backlog -= len;
- sch->qstats.backlog -= len;
+ qstats_backlog_sub(sch, len);
q->buffer_used -= skb->truesize;
qdisc_qlen_dec(sch);
diff --git a/net/sched/sch_cbs.c b/net/sched/sch_cbs.c
index a75e58876797952f2218725f6da5cff29f330ae2..2cfa0fd92829ad7eba7454e09dc17eb8f22519b8 100644
--- a/net/sched/sch_cbs.c
+++ b/net/sched/sch_cbs.c
@@ -96,7 +96,7 @@ static int cbs_child_enqueue(struct sk_buff *skb, struct Qdisc *sch,
if (err != NET_XMIT_SUCCESS)
return err;
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
index dc2be90666ffbd715550800790a0091acd28701d..0a0d07bc7055d29e56ef58c89a1b079967307177 100644
--- a/net/sched/sch_codel.c
+++ b/net/sched/sch_codel.c
@@ -42,7 +42,7 @@ static struct sk_buff *dequeue_func(struct codel_vars *vars, void *ctx)
struct sk_buff *skb = __qdisc_dequeue_head(&sch->q);
if (skb) {
- sch->qstats.backlog -= qdisc_pkt_len(skb);
+ qstats_backlog_sub(sch, qdisc_pkt_len(skb));
prefetch(&skb->end); /* we'll need skb_shinfo() */
}
return skb;
diff --git a/net/sched/sch_drr.c b/net/sched/sch_drr.c
index 925fa0cfd730ce72e45e8983ba02eb913afb1235..3f6687fa9666257952be5d44f9e3460845fe2a40 100644
--- a/net/sched/sch_drr.c
+++ b/net/sched/sch_drr.c
@@ -365,7 +365,7 @@ static int drr_enqueue(struct sk_buff *skb, struct Qdisc *sch,
cl->deficit = cl->quantum;
}
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
return err;
}
diff --git a/net/sched/sch_ets.c b/net/sched/sch_ets.c
index c817e0a6c14653a35f5ebb9de1a5ccc44d1a2f98..1cc559634ed27ce5a6630186a51a8ac8180dad96 100644
--- a/net/sched/sch_ets.c
+++ b/net/sched/sch_ets.c
@@ -448,7 +448,7 @@ static int ets_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
cl->deficit = cl->quantum;
}
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
return err;
}
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 183b1ea9d2076d8f709d50a38b39d28a2b14bad8..5c88278a30b3cd92ae525d1a88eb692ff0e296e0 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -177,7 +177,7 @@ static unsigned int fq_codel_drop(struct Qdisc *sch, unsigned int max_packets,
q->backlogs[idx] -= len;
q->memory_usage -= mem;
sch->qstats.drops += i;
- sch->qstats.backlog -= len;
+ qstats_backlog_sub(sch, len);
WRITE_ONCE(sch->q.qlen, sch->q.qlen - i);
return idx;
}
@@ -267,7 +267,7 @@ static struct sk_buff *dequeue_func(struct codel_vars *vars, void *ctx)
q->backlogs[flow - q->flows] -= qdisc_pkt_len(skb);
q->memory_usage -= get_codel_cb(skb)->mem_usage;
qdisc_qlen_dec(sch);
- sch->qstats.backlog -= qdisc_pkt_len(skb);
+ qdisc_qstats_backlog_dec(sch, skb);
}
return skb;
}
diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c
index dba49d44a5d2412b2deb983bf87428ade7944e51..197f0df0a6eb06ab4ce25eefe01d32a35dbd84af 100644
--- a/net/sched/sch_fq_pie.c
+++ b/net/sched/sch_fq_pie.c
@@ -184,7 +184,7 @@ static int fq_pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
pkt_len = qdisc_pkt_len(skb);
q->stats.packets_in++;
q->memory_usage += skb->truesize;
- sch->qstats.backlog += pkt_len;
+ qstats_backlog_add(sch, pkt_len);
qdisc_qlen_inc(sch);
flow_queue_add(sel_flow, skb);
if (list_empty(&sel_flow->flowchain)) {
@@ -262,7 +262,7 @@ static struct sk_buff *fq_pie_qdisc_dequeue(struct Qdisc *sch)
if (flow->head) {
skb = dequeue_head(flow);
pkt_len = qdisc_pkt_len(skb);
- sch->qstats.backlog -= pkt_len;
+ qstats_backlog_sub(sch, pkt_len);
qdisc_qlen_dec(sch);
qdisc_bstats_update(sch, skb);
}
diff --git a/net/sched/sch_hfsc.c b/net/sched/sch_hfsc.c
index e71a565100edf60881ca7542faa408c5bb1a0984..d61377e8551bcf60c8a47816112114684efe5e0b 100644
--- a/net/sched/sch_hfsc.c
+++ b/net/sched/sch_hfsc.c
@@ -1560,7 +1560,7 @@ hfsc_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
return err;
}
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
if (first && !cl_in_el_or_vttree(cl)) {
diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index c22ccd8eae8c73323ccdf425e62857b3b851d74e..1e600f65c8769a74286c4f060b0d45da9a13eeeb 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -650,7 +650,7 @@ static int htb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
htb_activate(q, cl);
}
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 4498dd440a02ea7a089c92ebc005d5064b87e2d2..2a2cdd1e4cc206ba00b8dd1821bef87156050950 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -751,7 +751,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
if (err != NET_XMIT_SUCCESS) {
if (net_xmit_drop_count(err))
qdisc_qstats_drop(sch);
- sch->qstats.backlog -= pkt_len;
+ qstats_backlog_sub(sch, pkt_len);
qdisc_qlen_dec(sch);
qdisc_tree_reduce_backlog(sch, 1, pkt_len);
}
diff --git a/net/sched/sch_prio.c b/net/sched/sch_prio.c
index fe42ae3d6b696b2fc47f4d397af32e950eeec194..e4dd56a890725b4c14d6715c96f5b3fa44a8f4f2 100644
--- a/net/sched/sch_prio.c
+++ b/net/sched/sch_prio.c
@@ -85,7 +85,7 @@ prio_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
ret = qdisc_enqueue(skb, qdisc, to_free);
if (ret == NET_XMIT_SUCCESS) {
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
diff --git a/net/sched/sch_qfq.c b/net/sched/sch_qfq.c
index 195c434aae5f7e03d1a1238ed73bb64b3f04e105..cb56787e1d258c06f2e86959c3b2cfaeb12df1ac 100644
--- a/net/sched/sch_qfq.c
+++ b/net/sched/sch_qfq.c
@@ -1264,7 +1264,7 @@ static int qfq_enqueue(struct sk_buff *skb, struct Qdisc *sch,
}
_bstats_update(&cl->bstats, len, gso_segs);
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
agg = cl->agg;
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 61b9064d39f222bdfe5021e93e8172b7ae60c408..7db97c96351309bc3e64fa50570a1928f2b2ce55 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -132,7 +132,7 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
len = qdisc_pkt_len(skb);
ret = qdisc_enqueue(skb, child, to_free);
if (likely(ret == NET_XMIT_SUCCESS)) {
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
} else if (net_xmit_drop_count(ret)) {
q->stats.pdrop++;
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 17b6ce223ad3a6f2d289c3ebe27cce8168c66b2b..2258567cbcaf70863eace85d347efda882a00145 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -406,7 +406,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
memcpy(&cb, sfb_skb_cb(skb), sizeof(cb));
ret = qdisc_enqueue(skb, child, to_free);
if (likely(ret == NET_XMIT_SUCCESS)) {
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
increment_qlen(&cb, q);
} else if (net_xmit_drop_count(ret)) {
@@ -582,7 +582,7 @@ static int sfb_dump(struct Qdisc *sch, struct sk_buff *skb)
.penalty_burst = q->penalty_burst,
};
- sch->qstats.backlog = q->qdisc->qstats.backlog;
+ WRITE_ONCE(sch->qstats.backlog, READ_ONCE(q->qdisc->qstats.backlog));
opts = nla_nest_start_noflag(skb, TCA_OPTIONS);
if (opts == NULL)
goto nla_put_failure;
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index 5eb6d8abd1c334938f72259f5fc41526597e792f..37410dfd7e3c554d3a8180b4887bdd5872ba8aab 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -425,7 +425,7 @@ sfq_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct sk_buff **to_free)
/* We know we have at least one packet in queue */
head = slot_dequeue_head(slot);
delta = qdisc_pkt_len(head) - qdisc_pkt_len(skb);
- sch->qstats.backlog -= delta;
+ qstats_backlog_sub(sch, delta);
slot->backlog -= delta;
qdisc_drop_reason(head, sch, to_free, QDISC_DROP_FLOW_LIMIT);
diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 25edf11a7d671fe63878b0995998c5920b86ef74..67c7aaaf8f607e82ad13b7fdf177405a1dd075bb 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -232,7 +232,7 @@ static int tbf_segment(struct sk_buff *skb, struct Qdisc *sch,
}
}
WRITE_ONCE(sch->q.qlen, sch->q.qlen + nb);
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
if (nb > 0) {
qdisc_tree_reduce_backlog(sch, 1 - nb, prev_len - len);
consume_skb(skb);
@@ -263,7 +263,7 @@ static int tbf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
return ret;
}
- sch->qstats.backlog += len;
+ qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
return NET_XMIT_SUCCESS;
}
--
2.53.0.1213.gd9a14994de-goog
* [PATCH net-next 06/15] net/sched: sch_sfb: annotate data-races in sfb_dump_stats()
2026-04-08 12:55 [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps Eric Dumazet
` (4 preceding siblings ...)
2026-04-08 12:56 ` [PATCH net-next 05/15] net/sched: annotate data-races around sch->qstats.backlog Eric Dumazet
@ 2026-04-08 12:56 ` Eric Dumazet
2026-04-08 12:56 ` [PATCH net-next 07/15] net/sched: sch_red: annotate data-races in red_dump_stats() Eric Dumazet
` (9 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2026-04-08 12:56 UTC
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, Kuniyuki Iwashima,
netdev, eric.dumazet, Eric Dumazet
sfb_dump_stats() runs with only RTNL held,
reading fields that can be changed concurrently in the qdisc fast path.
Add READ_ONCE()/WRITE_ONCE() annotations.
An alternative would be to acquire the qdisc spinlock, but our
long-term goal is to make qdisc dump operations as lockless as possible.
tc_sfb_xstats fields don't need to be latched atomically;
otherwise this bug would have been caught earlier.
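The pattern, as a minimal sketch: the fast path publishes each counter
with WRITE_ONCE() and the dump path loads it with READ_ONCE():

        /* sfb_enqueue() (fast path) */
        WRITE_ONCE(q->stats.queuedrop, q->stats.queuedrop + 1);

        /* sfb_dump_stats() (lockless dump) */
        st.queuedrop = READ_ONCE(q->stats.queuedrop);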
Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/sched/sch_sfb.c | 46 +++++++++++++++++++++++++++------------------
1 file changed, 28 insertions(+), 18 deletions(-)
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 2258567cbcaf70863eace85d347efda882a00145..315edd7f87fcf1600d69a3a92733ddb9fee55e99 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -202,11 +202,14 @@ static u32 sfb_compute_qlen(u32 *prob_r, u32 *avgpm_r, const struct sfb_sched_da
const struct sfb_bucket *b = &q->bins[q->slot].bins[0][0];
for (i = 0; i < SFB_LEVELS * SFB_NUMBUCKETS; i++) {
- if (qlen < b->qlen)
- qlen = b->qlen;
- totalpm += b->p_mark;
- if (prob < b->p_mark)
- prob = b->p_mark;
+ u32 b_qlen = READ_ONCE(b->qlen);
+ u32 b_mark = READ_ONCE(b->p_mark);
+
+ if (qlen < b_qlen)
+ qlen = b_qlen;
+ totalpm += b_mark;
+ if (prob < b_mark)
+ prob = b_mark;
b++;
}
*prob_r = prob;
@@ -295,7 +298,8 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
if (unlikely(sch->q.qlen >= q->limit)) {
qdisc_qstats_overlimit(sch);
- q->stats.queuedrop++;
+ WRITE_ONCE(q->stats.queuedrop,
+ q->stats.queuedrop + 1);
goto drop;
}
@@ -348,7 +352,8 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
if (unlikely(minqlen >= q->max)) {
qdisc_qstats_overlimit(sch);
- q->stats.bucketdrop++;
+ WRITE_ONCE(q->stats.bucketdrop,
+ q->stats.bucketdrop + 1);
goto drop;
}
@@ -374,7 +379,8 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
}
if (sfb_rate_limit(skb, q)) {
qdisc_qstats_overlimit(sch);
- q->stats.penaltydrop++;
+ WRITE_ONCE(q->stats.penaltydrop,
+ q->stats.penaltydrop + 1);
goto drop;
}
goto enqueue;
@@ -390,14 +396,17 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
* In either case, we want to start dropping packets.
*/
if (r < (p_min - SFB_MAX_PROB / 2) * 2) {
- q->stats.earlydrop++;
+ WRITE_ONCE(q->stats.earlydrop,
+ q->stats.earlydrop + 1);
goto drop;
}
}
if (INET_ECN_set_ce(skb)) {
- q->stats.marked++;
+ WRITE_ONCE(q->stats.marked,
+ q->stats.marked + 1);
} else {
- q->stats.earlydrop++;
+ WRITE_ONCE(q->stats.earlydrop,
+ q->stats.earlydrop + 1);
goto drop;
}
}
@@ -410,7 +419,8 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc *sch,
qdisc_qlen_inc(sch);
increment_qlen(&cb, q);
} else if (net_xmit_drop_count(ret)) {
- q->stats.childdrop++;
+ WRITE_ONCE(q->stats.childdrop,
+ q->stats.childdrop + 1);
qdisc_qstats_drop(sch);
}
return ret;
@@ -599,12 +609,12 @@ static int sfb_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
{
struct sfb_sched_data *q = qdisc_priv(sch);
struct tc_sfb_xstats st = {
- .earlydrop = q->stats.earlydrop,
- .penaltydrop = q->stats.penaltydrop,
- .bucketdrop = q->stats.bucketdrop,
- .queuedrop = q->stats.queuedrop,
- .childdrop = q->stats.childdrop,
- .marked = q->stats.marked,
+ .earlydrop = READ_ONCE(q->stats.earlydrop),
+ .penaltydrop = READ_ONCE(q->stats.penaltydrop),
+ .bucketdrop = READ_ONCE(q->stats.bucketdrop),
+ .queuedrop = READ_ONCE(q->stats.queuedrop),
+ .childdrop = READ_ONCE(q->stats.childdrop),
+ .marked = READ_ONCE(q->stats.marked),
};
st.maxqlen = sfb_compute_qlen(&st.maxprob, &st.avgprob, q);
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread

* [PATCH net-next 07/15] net/sched: sch_red: annotate data-races in red_dump_stats()
2026-04-08 12:55 [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps Eric Dumazet
` (5 preceding siblings ...)
2026-04-08 12:56 ` [PATCH net-next 06/15] net/sched: sch_sfb: annotate data-races in sfb_dump_stats() Eric Dumazet
@ 2026-04-08 12:56 ` Eric Dumazet
2026-04-08 12:56 ` [PATCH net-next 08/15] net/sched: sch_fq_codel: remove data-races from fq_codel_dump_stats() Eric Dumazet
` (8 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2026-04-08 12:56 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, Kuniyuki Iwashima,
netdev, eric.dumazet, Eric Dumazet
red_dump_stats() runs with only RTNL held, reading fields that can
be changed concurrently from the qdisc fast path.
Add READ_ONCE()/WRITE_ONCE() annotations.
An alternative would be to acquire the qdisc spinlock, but our
long-term goal is to make qdisc dump operations as lockless as
possible.
tc_red_xstats fields do not need to be latched atomically;
otherwise this bug would have been noticed earlier.
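Each counter is still sampled independently: READ_ONCE() only
prevents load tearing on a single field, it does not turn the sum
into an atomic snapshot. A sketch of the dump side, with fields from
this patch:

	/* prob_drop and forced_drop may be read at slightly
	 * different times; a transiently inconsistent sum is
	 * acceptable for statistics.
	 */
	st.early = READ_ONCE(q->stats.prob_drop) +
		   READ_ONCE(q->stats.forced_drop);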
Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/sched/sch_red.c | 31 +++++++++++++++++++++----------
1 file changed, 21 insertions(+), 10 deletions(-)
diff --git a/net/sched/sch_red.c b/net/sched/sch_red.c
index 7db97c96351309bc3e64fa50570a1928f2b2ce55..268f1ba4520cd74da60e7b1ca08974fcfdced680 100644
--- a/net/sched/sch_red.c
+++ b/net/sched/sch_red.c
@@ -90,17 +90,20 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
case RED_PROB_MARK:
qdisc_qstats_overlimit(sch);
if (!red_use_ecn(q)) {
- q->stats.prob_drop++;
+ WRITE_ONCE(q->stats.prob_drop,
+ q->stats.prob_drop + 1);
goto congestion_drop;
}
if (INET_ECN_set_ce(skb)) {
- q->stats.prob_mark++;
+ WRITE_ONCE(q->stats.prob_mark,
+ q->stats.prob_mark + 1);
skb = tcf_qevent_handle(&q->qe_mark, sch, skb, to_free, &ret);
if (!skb)
return NET_XMIT_CN | ret;
} else if (!red_use_nodrop(q)) {
- q->stats.prob_drop++;
+ WRITE_ONCE(q->stats.prob_drop,
+ q->stats.prob_drop + 1);
goto congestion_drop;
}
@@ -111,17 +114,20 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
reason = QDISC_DROP_OVERLIMIT;
qdisc_qstats_overlimit(sch);
if (red_use_harddrop(q) || !red_use_ecn(q)) {
- q->stats.forced_drop++;
+ WRITE_ONCE(q->stats.forced_drop,
+ q->stats.forced_drop + 1);
goto congestion_drop;
}
if (INET_ECN_set_ce(skb)) {
- q->stats.forced_mark++;
+ WRITE_ONCE(q->stats.forced_mark,
+ q->stats.forced_mark + 1);
skb = tcf_qevent_handle(&q->qe_mark, sch, skb, to_free, &ret);
if (!skb)
return NET_XMIT_CN | ret;
} else if (!red_use_nodrop(q)) {
- q->stats.forced_drop++;
+ WRITE_ONCE(q->stats.forced_drop,
+ q->stats.forced_drop + 1);
goto congestion_drop;
}
@@ -135,7 +141,8 @@ static int red_enqueue(struct sk_buff *skb, struct Qdisc *sch,
qstats_backlog_add(sch, len);
qdisc_qlen_inc(sch);
} else if (net_xmit_drop_count(ret)) {
- q->stats.pdrop++;
+ WRITE_ONCE(q->stats.pdrop,
+ q->stats.pdrop + 1);
qdisc_qstats_drop(sch);
}
return ret;
@@ -463,9 +470,13 @@ static int red_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_RED,
&hw_stats_request);
}
- st.early = q->stats.prob_drop + q->stats.forced_drop;
- st.pdrop = q->stats.pdrop;
- st.marked = q->stats.prob_mark + q->stats.forced_mark;
+ st.early = READ_ONCE(q->stats.prob_drop) +
+ READ_ONCE(q->stats.forced_drop);
+
+ st.pdrop = READ_ONCE(q->stats.pdrop);
+
+ st.marked = READ_ONCE(q->stats.prob_mark) +
+ READ_ONCE(q->stats.forced_mark);
return gnet_stats_copy_app(d, &st, sizeof(st));
}
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread

* [PATCH net-next 08/15] net/sched: sch_fq_codel: remove data-races from fq_codel_dump_stats()
2026-04-08 12:55 [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps Eric Dumazet
` (6 preceding siblings ...)
2026-04-08 12:56 ` [PATCH net-next 07/15] net/sched: sch_red: annotate data-races in red_dump_stats() Eric Dumazet
@ 2026-04-08 12:56 ` Eric Dumazet
2026-04-08 12:56 ` [PATCH net-next 09/15] net/sched: sch_pie: annotate data-races in pie_dump_stats() Eric Dumazet
` (7 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2026-04-08 12:56 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, Kuniyuki Iwashima,
netdev, eric.dumazet, Eric Dumazet
fq_codel_dump_stats() acquires the qdisc spinlock a bit too late:
st.qdisc_stats is filled with live data before the lock is taken.
Move the lock acquisition before these reads.
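The resulting ordering, as a simplified sketch of
fq_codel_dump_stats() after this patch:

	sch_tree_lock(sch);

	/* live fields are now sampled under the qdisc lock */
	st.qdisc_stats.maxpacket = q->cstats.maxpacket;
	st.qdisc_stats.drop_overlimit = q->drop_overlimit;
	...

	/* the flow list walk already required the lock */
	list_for_each(pos, &q->new_flows)
		st.qdisc_stats.new_flows_len++;

	sch_tree_unlock(sch);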
Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/sched/sch_fq_codel.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 5c88278a30b3cd92ae525d1a88eb692ff0e296e0..db5cf578cbd63e842381cb147ced145fed4cca3e 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -586,6 +586,8 @@ static int fq_codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
};
struct list_head *pos;
+ sch_tree_lock(sch);
+
st.qdisc_stats.maxpacket = q->cstats.maxpacket;
st.qdisc_stats.drop_overlimit = q->drop_overlimit;
st.qdisc_stats.ecn_mark = q->cstats.ecn_mark;
@@ -594,7 +596,6 @@ static int fq_codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
st.qdisc_stats.memory_usage = q->memory_usage;
st.qdisc_stats.drop_overmemory = q->drop_overmemory;
- sch_tree_lock(sch);
list_for_each(pos, &q->new_flows)
st.qdisc_stats.new_flows_len++;
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread

* [PATCH net-next 09/15] net/sched: sch_pie: annotate data-races in pie_dump_stats()
2026-04-08 12:55 [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps Eric Dumazet
` (7 preceding siblings ...)
2026-04-08 12:56 ` [PATCH net-next 08/15] net/sched: sch_fq_codel: remove data-races from fq_codel_dump_stats() Eric Dumazet
@ 2026-04-08 12:56 ` Eric Dumazet
2026-04-08 12:56 ` [PATCH net-next 10/15] net/sched: sch_fq_pie: annotate data-races in fq_pie_dump_stats() Eric Dumazet
` (6 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2026-04-08 12:56 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, Kuniyuki Iwashima,
netdev, eric.dumazet, Eric Dumazet
pie_dump_stats() runs with only RTNL held, reading fields that can
be changed concurrently from the qdisc fast path.
Add READ_ONCE()/WRITE_ONCE() annotations.
An alternative would be to acquire the qdisc spinlock, but our
long-term goal is to make qdisc dump operations as lockless as
possible.
tc_pie_xstats fields do not need to be latched atomically;
otherwise this bug would have been noticed earlier.
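Compound updates such as the avg_dq_rate EWMA keep the
read-modify-write local to the writer and publish only the final
result, so a concurrent reader observes either the old or the new
average, never a torn intermediate. Sketch from
pie_process_dequeue() in this patch:

	WRITE_ONCE(vars->avg_dq_rate,
		   (vars->avg_dq_rate -
		    (vars->avg_dq_rate >> 3)) + (count >> 3));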
Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/pie.h | 2 +-
net/sched/sch_pie.c | 38 +++++++++++++++++++-------------------
2 files changed, 20 insertions(+), 20 deletions(-)
diff --git a/include/net/pie.h b/include/net/pie.h
index 01cbc66825a40bd21c0a044b1180cbbc346785df..1f3db0c355149b41823a891c9156cac625122031 100644
--- a/include/net/pie.h
+++ b/include/net/pie.h
@@ -104,7 +104,7 @@ static inline void pie_vars_init(struct pie_vars *vars)
vars->dq_tstamp = DTIME_INVALID;
vars->accu_prob = 0;
vars->dq_count = DQCOUNT_INVALID;
- vars->avg_dq_rate = 0;
+ WRITE_ONCE(vars->avg_dq_rate, 0);
}
static inline struct pie_skb_cb *get_pie_cb(const struct sk_buff *skb)
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index 16f3f629cb8e4be71431f7e50a278e3c7fdba8d0..fb53fbf0e328571be72b66ba4e75a938e1963422 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -90,7 +90,7 @@ static int pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
bool enqueue = false;
if (unlikely(qdisc_qlen(sch) >= sch->limit)) {
- q->stats.overlimit++;
+ WRITE_ONCE(q->stats.overlimit, q->stats.overlimit + 1);
goto out;
}
@@ -104,7 +104,7 @@ static int pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
/* If packet is ecn capable, mark it if drop probability
* is lower than 10%, else drop it.
*/
- q->stats.ecn_mark++;
+ WRITE_ONCE(q->stats.ecn_mark, q->stats.ecn_mark + 1);
enqueue = true;
}
@@ -114,15 +114,15 @@ static int pie_qdisc_enqueue(struct sk_buff *skb, struct Qdisc *sch,
if (!q->params.dq_rate_estimator)
pie_set_enqueue_time(skb);
- q->stats.packets_in++;
+ WRITE_ONCE(q->stats.packets_in, q->stats.packets_in + 1);
if (qdisc_qlen(sch) > q->stats.maxq)
- q->stats.maxq = qdisc_qlen(sch);
+ WRITE_ONCE(q->stats.maxq, qdisc_qlen(sch));
return qdisc_enqueue_tail(skb, sch);
}
out:
- q->stats.dropped++;
+ WRITE_ONCE(q->stats.dropped, q->stats.dropped + 1);
q->vars.accu_prob = 0;
return qdisc_drop_reason(skb, sch, to_free, reason);
}
@@ -267,11 +267,11 @@ void pie_process_dequeue(struct sk_buff *skb, struct pie_params *params,
count = count / dtime;
if (vars->avg_dq_rate == 0)
- vars->avg_dq_rate = count;
+ WRITE_ONCE(vars->avg_dq_rate, count);
else
- vars->avg_dq_rate =
+ WRITE_ONCE(vars->avg_dq_rate,
(vars->avg_dq_rate -
- (vars->avg_dq_rate >> 3)) + (count >> 3);
+ (vars->avg_dq_rate >> 3)) + (count >> 3));
/* If the queue has receded below the threshold, we hold
* on to the last drain rate calculated, else we reset
@@ -381,7 +381,7 @@ void pie_calculate_probability(struct pie_params *params, struct pie_vars *vars,
if (delta > 0) {
/* prevent overflow */
if (vars->prob < oldprob) {
- vars->prob = MAX_PROB;
+ WRITE_ONCE(vars->prob, MAX_PROB);
/* Prevent normalization error. If probability is at
* maximum value already, we normalize it here, and
* skip the check to do a non-linear drop in the next
@@ -392,7 +392,7 @@ void pie_calculate_probability(struct pie_params *params, struct pie_vars *vars,
} else {
/* prevent underflow */
if (vars->prob > oldprob)
- vars->prob = 0;
+ WRITE_ONCE(vars->prob, 0);
}
/* Non-linear drop in probability: Reduce drop probability quickly if
@@ -403,7 +403,7 @@ void pie_calculate_probability(struct pie_params *params, struct pie_vars *vars,
/* Reduce drop probability to 98.4% */
vars->prob -= vars->prob / 64;
- vars->qdelay = qdelay;
+ WRITE_ONCE(vars->qdelay, qdelay);
vars->backlog_old = backlog;
/* We restart the measurement cycle if the following conditions are met
@@ -502,21 +502,21 @@ static int pie_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
struct pie_sched_data *q = qdisc_priv(sch);
struct tc_pie_xstats st = {
.prob = q->vars.prob << BITS_PER_BYTE,
- .delay = ((u32)PSCHED_TICKS2NS(q->vars.qdelay)) /
+ .delay = ((u32)PSCHED_TICKS2NS(READ_ONCE(q->vars.qdelay))) /
NSEC_PER_USEC,
- .packets_in = q->stats.packets_in,
- .overlimit = q->stats.overlimit,
- .maxq = q->stats.maxq,
- .dropped = q->stats.dropped,
- .ecn_mark = q->stats.ecn_mark,
+ .packets_in = READ_ONCE(q->stats.packets_in),
+ .overlimit = READ_ONCE(q->stats.overlimit),
+ .maxq = READ_ONCE(q->stats.maxq),
+ .dropped = READ_ONCE(q->stats.dropped),
+ .ecn_mark = READ_ONCE(q->stats.ecn_mark),
};
/* avg_dq_rate is only valid if dq_rate_estimator is enabled */
st.dq_rate_estimating = q->params.dq_rate_estimator;
/* unscale and return dq_rate in bytes per sec */
- if (q->params.dq_rate_estimator)
- st.avg_dq_rate = q->vars.avg_dq_rate *
+ if (st.dq_rate_estimating)
+ st.avg_dq_rate = READ_ONCE(q->vars.avg_dq_rate) *
(PSCHED_TICKS_PER_SEC) >> PIE_SCALE;
return gnet_stats_copy_app(d, &st, sizeof(st));
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread

* [PATCH net-next 10/15] net/sched: sch_fq_pie: annotate data-races in fq_pie_dump_stats()
2026-04-08 12:55 [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps Eric Dumazet
` (8 preceding siblings ...)
2026-04-08 12:56 ` [PATCH net-next 09/15] net/sched: sch_pie: annotate data-races in pie_dump_stats() Eric Dumazet
@ 2026-04-08 12:56 ` Eric Dumazet
2026-04-08 12:56 ` [PATCH net-next 11/15] net_sched: sch_hhf: annotate data-races in hhf_dump_stats() Eric Dumazet
` (5 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2026-04-08 12:56 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, Kuniyuki Iwashima,
netdev, eric.dumazet, Eric Dumazet
fq_pie_dump_stats() acquires the qdisc spinlock a bit too late.
Move this acquisition before we fill tc_fq_pie_xstats with live data.
An alternative would be to add READ_ONCE()/WRITE_ONCE() annotations,
but the spinlock is needed anyway (the dump also walks q->new_flows).
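After this patch the xstats structure is zero-initialized and then
filled entirely under the qdisc lock; a simplified sketch:

	struct tc_fq_pie_xstats st = { 0 };

	sch_tree_lock(sch);

	st.packets_in = q->stats.packets_in;
	...
	st.memory_usage = q->memory_usage;

	list_for_each(pos, &q->new_flows)
		st.new_flows_len++;

	sch_tree_unlock(sch);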
Fixes: ec97ecf1ebe4 ("net: sched: add Flow Queue PIE packet scheduler")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/sched/sch_fq_pie.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c
index 197f0df0a6eb06ab4ce25eefe01d32a35dbd84af..72f48fa4010bebbe6be212938b457db21ff3c5a0 100644
--- a/net/sched/sch_fq_pie.c
+++ b/net/sched/sch_fq_pie.c
@@ -509,18 +509,19 @@ static int fq_pie_dump(struct Qdisc *sch, struct sk_buff *skb)
static int fq_pie_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
{
struct fq_pie_sched_data *q = qdisc_priv(sch);
- struct tc_fq_pie_xstats st = {
- .packets_in = q->stats.packets_in,
- .overlimit = q->stats.overlimit,
- .overmemory = q->overmemory,
- .dropped = q->stats.dropped,
- .ecn_mark = q->stats.ecn_mark,
- .new_flow_count = q->new_flow_count,
- .memory_usage = q->memory_usage,
- };
+ struct tc_fq_pie_xstats st = { 0 };
struct list_head *pos;
sch_tree_lock(sch);
+
+ st.packets_in = q->stats.packets_in;
+ st.overlimit = q->stats.overlimit;
+ st.overmemory = q->overmemory;
+ st.dropped = q->stats.dropped;
+ st.ecn_mark = q->stats.ecn_mark;
+ st.new_flow_count = q->new_flow_count;
+ st.memory_usage = q->memory_usage;
+
list_for_each(pos, &q->new_flows)
st.new_flows_len++;
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread

* [PATCH net-next 11/15] net_sched: sch_hhf: annotate data-races in hhf_dump_stats()
2026-04-08 12:55 [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps Eric Dumazet
` (9 preceding siblings ...)
2026-04-08 12:56 ` [PATCH net-next 10/15] net/sched: sch_fq_pie: annotate data-races in fq_pie_dump_stats() Eric Dumazet
@ 2026-04-08 12:56 ` Eric Dumazet
2026-04-08 12:56 ` [PATCH net-next 12/15] net/sched: sch_choke: annotate data-races in choke_dump_stats() Eric Dumazet
` (4 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2026-04-08 12:56 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, Kuniyuki Iwashima,
netdev, eric.dumazet, Eric Dumazet
hhf_dump_stats() runs with only RTNL held, reading fields that can
be changed concurrently from the qdisc fast path.
Add READ_ONCE()/WRITE_ONCE() annotations.
Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/sched/sch_hhf.c | 19 ++++++++++---------
1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index 69b6f0a5471cb9a3b7b760144683f2b249091d89..1e25b75daae2e5de31bd212dfa1f6d7aea927174 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -198,7 +198,8 @@ static struct hh_flow_state *seek_list(const u32 hash,
return NULL;
list_del(&flow->flowchain);
kfree(flow);
- q->hh_flows_current_cnt--;
+ WRITE_ONCE(q->hh_flows_current_cnt,
+ q->hh_flows_current_cnt - 1);
} else if (flow->hash_id == hash) {
return flow;
}
@@ -226,7 +227,7 @@ static struct hh_flow_state *alloc_new_hh(struct list_head *head,
}
if (q->hh_flows_current_cnt >= q->hh_flows_limit) {
- q->hh_flows_overlimit++;
+ WRITE_ONCE(q->hh_flows_overlimit, q->hh_flows_overlimit + 1);
return NULL;
}
/* Create new entry. */
@@ -234,7 +235,7 @@ static struct hh_flow_state *alloc_new_hh(struct list_head *head,
if (!flow)
return NULL;
- q->hh_flows_current_cnt++;
+ WRITE_ONCE(q->hh_flows_current_cnt, q->hh_flows_current_cnt + 1);
INIT_LIST_HEAD(&flow->flowchain);
list_add_tail(&flow->flowchain, head);
@@ -309,7 +310,7 @@ static enum wdrr_bucket_idx hhf_classify(struct sk_buff *skb, struct Qdisc *sch)
return WDRR_BUCKET_FOR_NON_HH;
flow->hash_id = hash;
flow->hit_timestamp = now;
- q->hh_flows_total_cnt++;
+ WRITE_ONCE(q->hh_flows_total_cnt, q->hh_flows_total_cnt + 1);
/* By returning without updating counters in q->hhf_arrays,
* we implicitly implement "shielding" (see Optimization O1).
@@ -404,7 +405,7 @@ static int hhf_enqueue(struct sk_buff *skb, struct Qdisc *sch,
return NET_XMIT_SUCCESS;
prev_backlog = sch->qstats.backlog;
- q->drop_overlimit++;
+ WRITE_ONCE(q->drop_overlimit, q->drop_overlimit + 1);
/* Return Congestion Notification only if we dropped a packet from this
* bucket.
*/
@@ -687,10 +688,10 @@ static int hhf_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
{
struct hhf_sched_data *q = qdisc_priv(sch);
struct tc_hhf_xstats st = {
- .drop_overlimit = q->drop_overlimit,
- .hh_overlimit = q->hh_flows_overlimit,
- .hh_tot_count = q->hh_flows_total_cnt,
- .hh_cur_count = q->hh_flows_current_cnt,
+ .drop_overlimit = READ_ONCE(q->drop_overlimit),
+ .hh_overlimit = READ_ONCE(q->hh_flows_overlimit),
+ .hh_tot_count = READ_ONCE(q->hh_flows_total_cnt),
+ .hh_cur_count = READ_ONCE(q->hh_flows_current_cnt),
};
return gnet_stats_copy_app(d, &st, sizeof(st));
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread

* [PATCH net-next 12/15] net/sched: sch_choke: annotate data-races in choke_dump_stats()
2026-04-08 12:55 [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps Eric Dumazet
` (10 preceding siblings ...)
2026-04-08 12:56 ` [PATCH net-next 11/15] net_sched: sch_hhf: annotate data-races in hhf_dump_stats() Eric Dumazet
@ 2026-04-08 12:56 ` Eric Dumazet
2026-04-08 12:56 ` [PATCH net-next 13/15] net/sched: sch_cake: annotate data-races in cake_dump_stats() Eric Dumazet
` (3 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2026-04-08 12:56 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, Kuniyuki Iwashima,
netdev, eric.dumazet, Eric Dumazet
choke_dump_stats() runs with only RTNL held, reading fields that can
be changed concurrently from the qdisc fast path.
Add READ_ONCE()/WRITE_ONCE() annotations.
Fixes: edb09eb17ed8 ("net: sched: do not acquire qdisc spinlock in qdisc/class stats dump")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/sched/sch_choke.c | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)
diff --git a/net/sched/sch_choke.c b/net/sched/sch_choke.c
index cd0785ad8e74314e6d5c88144ffcf64f286e02dd..73d3e673dc7b16cf2b9ac1d622da280c2ceb064a 100644
--- a/net/sched/sch_choke.c
+++ b/net/sched/sch_choke.c
@@ -229,7 +229,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
/* Draw a packet at random from queue and compare flow */
if (choke_match_random(q, skb, &idx)) {
- q->stats.matched++;
+ WRITE_ONCE(q->stats.matched, q->stats.matched + 1);
choke_drop_by_idx(sch, idx, to_free);
goto congestion_drop;
}
@@ -241,11 +241,13 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
qdisc_qstats_overlimit(sch);
if (use_harddrop(q) || !use_ecn(q) ||
!INET_ECN_set_ce(skb)) {
- q->stats.forced_drop++;
+ WRITE_ONCE(q->stats.forced_drop,
+ q->stats.forced_drop + 1);
goto congestion_drop;
}
- q->stats.forced_mark++;
+ WRITE_ONCE(q->stats.forced_mark,
+ q->stats.forced_mark + 1);
} else if (++q->vars.qcount) {
if (red_mark_probability(p, &q->vars, q->vars.qavg)) {
q->vars.qcount = 0;
@@ -253,11 +255,13 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
qdisc_qstats_overlimit(sch);
if (!use_ecn(q) || !INET_ECN_set_ce(skb)) {
- q->stats.prob_drop++;
+ WRITE_ONCE(q->stats.prob_drop,
+ q->stats.prob_drop + 1);
goto congestion_drop;
}
- q->stats.prob_mark++;
+ WRITE_ONCE(q->stats.prob_mark,
+ q->stats.prob_mark + 1);
}
} else
q->vars.qR = red_random(p);
@@ -272,7 +276,7 @@ static int choke_enqueue(struct sk_buff *skb, struct Qdisc *sch,
return NET_XMIT_SUCCESS;
}
- q->stats.pdrop++;
+ WRITE_ONCE(q->stats.pdrop, q->stats.pdrop + 1);
return qdisc_drop(skb, sch, to_free);
congestion_drop:
@@ -461,10 +465,12 @@ static int choke_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
{
struct choke_sched_data *q = qdisc_priv(sch);
struct tc_choke_xstats st = {
- .early = q->stats.prob_drop + q->stats.forced_drop,
- .marked = q->stats.prob_mark + q->stats.forced_mark,
- .pdrop = q->stats.pdrop,
- .matched = q->stats.matched,
+ .early = READ_ONCE(q->stats.prob_drop) +
+ READ_ONCE(q->stats.forced_drop),
+ .marked = READ_ONCE(q->stats.prob_mark) +
+ READ_ONCE(q->stats.forced_mark),
+ .pdrop = READ_ONCE(q->stats.pdrop),
+ .matched = READ_ONCE(q->stats.matched),
};
return gnet_stats_copy_app(d, &st, sizeof(st));
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread

* [PATCH net-next 13/15] net/sched: sch_cake: annotate data-races in cake_dump_stats()
2026-04-08 12:55 [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps Eric Dumazet
` (11 preceding siblings ...)
2026-04-08 12:56 ` [PATCH net-next 12/15] net/sched: sch_choke: annotate data-races in choke_dump_stats() Eric Dumazet
@ 2026-04-08 12:56 ` Eric Dumazet
2026-04-08 12:56 ` [PATCH net-next 14/15] net/sched: mq: no longer acquire qdisc spinlocks in dump operations Eric Dumazet
` (2 subsequent siblings)
15 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2026-04-08 12:56 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, Kuniyuki Iwashima,
netdev, eric.dumazet, Eric Dumazet,
Toke Høiland-Jørgensen
cake_dump_stats() and cake_dump_class_stats() run without the qdisc
spinlock being held, reading fields that the fast path can change
concurrently.
Add READ_ONCE()/WRITE_ONCE() annotations.
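Where a field is both tested and updated, the patch stages the
computation in a local and publishes the result with a single
WRITE_ONCE(), as in this sketch of cobalt_queue_full():

	u32 p_drop = vars->p_drop;

	up = !p_drop;
	p_drop += p->p_inc;
	if (p_drop < p->p_inc)	/* overflow */
		p_drop = ~0;
	WRITE_ONCE(vars->p_drop, p_drop);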
Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: "Toke Høiland-Jørgensen" <toke@toke.dk>
---
net/sched/sch_cake.c | 337 ++++++++++++++++++++++++-------------------
1 file changed, 189 insertions(+), 148 deletions(-)
diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 0104c29b20f8e43ffa025f0eb58bfe4e2b801010..fcc3c6b1044f324b399c9e80340fea3429e37c16 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -449,14 +449,17 @@ static bool cobalt_queue_full(struct cobalt_vars *vars,
bool up = false;
if (ktime_to_ns(ktime_sub(now, vars->blue_timer)) > p->target) {
- up = !vars->p_drop;
- vars->p_drop += p->p_inc;
- if (vars->p_drop < p->p_inc)
- vars->p_drop = ~0;
- vars->blue_timer = now;
- }
- vars->dropping = true;
- vars->drop_next = now;
+ u32 p_drop = vars->p_drop;
+
+ up = !p_drop;
+ p_drop += p->p_inc;
+ if (p_drop < p->p_inc)
+ p_drop = ~0;
+ WRITE_ONCE(vars->p_drop, p_drop);
+ WRITE_ONCE(vars->blue_timer, now);
+ }
+ WRITE_ONCE(vars->dropping, true);
+ WRITE_ONCE(vars->drop_next, now);
if (!vars->count)
vars->count = 1;
@@ -474,21 +477,25 @@ static bool cobalt_queue_empty(struct cobalt_vars *vars,
if (vars->p_drop &&
ktime_to_ns(ktime_sub(now, vars->blue_timer)) > p->target) {
- if (vars->p_drop < p->p_dec)
- vars->p_drop = 0;
+ u32 p_drop = vars->p_drop;
+
+ if (p_drop < p->p_dec)
+ p_drop = 0;
else
- vars->p_drop -= p->p_dec;
- vars->blue_timer = now;
- down = !vars->p_drop;
+ p_drop -= p->p_dec;
+ WRITE_ONCE(vars->p_drop, p_drop);
+ WRITE_ONCE(vars->blue_timer, now);
+ down = !p_drop;
}
- vars->dropping = false;
+ WRITE_ONCE(vars->dropping, false);
if (vars->count && ktime_to_ns(ktime_sub(now, vars->drop_next)) >= 0) {
vars->count--;
cobalt_invsqrt(vars);
- vars->drop_next = cobalt_control(vars->drop_next,
- p->interval,
- vars->rec_inv_sqrt);
+ WRITE_ONCE(vars->drop_next,
+ cobalt_control(vars->drop_next,
+ p->interval,
+ vars->rec_inv_sqrt));
}
return down;
@@ -534,37 +541,41 @@ static enum qdisc_drop_reason cobalt_should_drop(struct cobalt_vars *vars,
if (over_target) {
if (!vars->dropping) {
- vars->dropping = true;
- vars->drop_next = cobalt_control(now,
- p->interval,
- vars->rec_inv_sqrt);
+ WRITE_ONCE(vars->dropping, true);
+ WRITE_ONCE(vars->drop_next,
+ cobalt_control(now,
+ p->interval,
+ vars->rec_inv_sqrt));
}
if (!vars->count)
vars->count = 1;
} else if (vars->dropping) {
- vars->dropping = false;
+ WRITE_ONCE(vars->dropping, false);
}
if (next_due && vars->dropping) {
/* Use ECN mark if possible, otherwise drop */
- if (!(vars->ecn_marked = INET_ECN_set_ce(skb)))
+ vars->ecn_marked = INET_ECN_set_ce(skb);
+ if (!vars->ecn_marked)
reason = QDISC_DROP_CONGESTED;
vars->count++;
if (!vars->count)
vars->count--;
cobalt_invsqrt(vars);
- vars->drop_next = cobalt_control(vars->drop_next,
- p->interval,
- vars->rec_inv_sqrt);
+ WRITE_ONCE(vars->drop_next,
+ cobalt_control(vars->drop_next,
+ p->interval,
+ vars->rec_inv_sqrt));
schedule = ktime_sub(now, vars->drop_next);
} else {
while (next_due) {
vars->count--;
cobalt_invsqrt(vars);
- vars->drop_next = cobalt_control(vars->drop_next,
- p->interval,
- vars->rec_inv_sqrt);
+ WRITE_ONCE(vars->drop_next,
+ cobalt_control(vars->drop_next,
+ p->interval,
+ vars->rec_inv_sqrt));
schedule = ktime_sub(now, vars->drop_next);
next_due = vars->count && ktime_to_ns(schedule) >= 0;
}
@@ -577,9 +588,9 @@ static enum qdisc_drop_reason cobalt_should_drop(struct cobalt_vars *vars,
/* Overload the drop_next field as an activity timeout */
if (!vars->count)
- vars->drop_next = ktime_add_ns(now, p->interval);
+ WRITE_ONCE(vars->drop_next, ktime_add_ns(now, p->interval));
else if (ktime_to_ns(schedule) > 0 && reason == QDISC_DROP_UNSPEC)
- vars->drop_next = now;
+ WRITE_ONCE(vars->drop_next, now);
return reason;
}
@@ -813,7 +824,7 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
i++, k = (k + 1) % CAKE_SET_WAYS) {
if (q->tags[outer_hash + k] == flow_hash) {
if (i)
- q->way_hits++;
+ WRITE_ONCE(q->way_hits, q->way_hits + 1);
if (!q->flows[outer_hash + k].set) {
/* need to increment host refcnts */
@@ -831,7 +842,7 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
for (i = 0; i < CAKE_SET_WAYS;
i++, k = (k + 1) % CAKE_SET_WAYS) {
if (!q->flows[outer_hash + k].set) {
- q->way_misses++;
+ WRITE_ONCE(q->way_misses, q->way_misses + 1);
allocate_src = cake_dsrc(flow_mode);
allocate_dst = cake_ddst(flow_mode);
goto found;
@@ -841,7 +852,7 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
/* With no empty queues, default to the original
* queue, accept the collision, update the host tags.
*/
- q->way_collisions++;
+ WRITE_ONCE(q->way_collisions, q->way_collisions + 1);
allocate_src = cake_dsrc(flow_mode);
allocate_dst = cake_ddst(flow_mode);
@@ -875,7 +886,8 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
q->flows[reduced_hash].srchost = srchost_idx;
if (q->flows[reduced_hash].set == CAKE_SET_BULK)
- cake_inc_srchost_bulk_flow_count(q, &q->flows[reduced_hash], flow_mode);
+ cake_inc_srchost_bulk_flow_count(q, &q->flows[reduced_hash],
+ flow_mode);
}
if (allocate_dst) {
@@ -899,7 +911,8 @@ static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
q->flows[reduced_hash].dsthost = dsthost_idx;
if (q->flows[reduced_hash].set == CAKE_SET_BULK)
- cake_inc_dsthost_bulk_flow_count(q, &q->flows[reduced_hash], flow_mode);
+ cake_inc_dsthost_bulk_flow_count(q, &q->flows[reduced_hash],
+ flow_mode);
}
}
@@ -1379,9 +1392,9 @@ static u32 cake_calc_overhead(struct cake_sched_data *qd, u32 len, u32 off)
len -= off;
if (qd->max_netlen < len)
- qd->max_netlen = len;
+ WRITE_ONCE(qd->max_netlen, len);
if (qd->min_netlen > len)
- qd->min_netlen = len;
+ WRITE_ONCE(qd->min_netlen, len);
len += q->rate_overhead;
@@ -1401,9 +1414,9 @@ static u32 cake_calc_overhead(struct cake_sched_data *qd, u32 len, u32 off)
}
if (qd->max_adjlen < len)
- qd->max_adjlen = len;
+ WRITE_ONCE(qd->max_adjlen, len);
if (qd->min_adjlen > len)
- qd->min_adjlen = len;
+ WRITE_ONCE(qd->min_adjlen, len);
return len;
}
@@ -1416,7 +1429,7 @@ static u32 cake_overhead(struct cake_sched_data *q, const struct sk_buff *skb)
u16 segs = qdisc_pkt_segs(skb);
u32 len = qdisc_pkt_len(skb);
- q->avg_netoff = cake_ewma(q->avg_netoff, off << 16, 8);
+ WRITE_ONCE(q->avg_netoff, cake_ewma(q->avg_netoff, off << 16, 8));
if (segs == 1)
return cake_calc_overhead(q, len, off);
@@ -1590,16 +1603,17 @@ static unsigned int cake_drop(struct Qdisc *sch, struct sk_buff **to_free)
}
if (cobalt_queue_full(&flow->cvars, &b->cparams, now))
- b->unresponsive_flow_count++;
+ WRITE_ONCE(b->unresponsive_flow_count,
+ b->unresponsive_flow_count + 1);
len = qdisc_pkt_len(skb);
q->buffer_used -= skb->truesize;
- b->backlogs[idx] -= len;
- b->tin_backlog -= len;
+ WRITE_ONCE(b->backlogs[idx], b->backlogs[idx] - len);
+ WRITE_ONCE(b->tin_backlog, b->tin_backlog - len);
qstats_backlog_sub(sch, len);
- flow->dropped++;
- b->tin_dropped++;
+ WRITE_ONCE(flow->dropped, flow->dropped + 1);
+ WRITE_ONCE(b->tin_dropped, b->tin_dropped + 1);
if (q->config->rate_flags & CAKE_FLAG_INGRESS)
cake_advance_shaper(q, b, skb, now, true);
@@ -1795,7 +1809,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
}
if (unlikely(len > b->max_skblen))
- b->max_skblen = len;
+ WRITE_ONCE(b->max_skblen, len);
if (qdisc_pkt_segs(skb) > 1 && q->config->rate_flags & CAKE_FLAG_SPLIT_GSO) {
struct sk_buff *segs, *nskb;
@@ -1819,13 +1833,13 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
numsegs++;
slen += segs->len;
q->buffer_used += segs->truesize;
- b->packets++;
}
/* stats */
- b->bytes += slen;
- b->backlogs[idx] += slen;
- b->tin_backlog += slen;
+ WRITE_ONCE(b->bytes, b->bytes + slen);
+ WRITE_ONCE(b->packets, b->packets + numsegs);
+ WRITE_ONCE(b->backlogs[idx], b->backlogs[idx] + slen);
+ WRITE_ONCE(b->tin_backlog, b->tin_backlog + slen);
qstats_backlog_add(sch, slen);
q->avg_window_bytes += slen;
@@ -1843,7 +1857,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
ack = cake_ack_filter(q, flow);
if (ack) {
- b->ack_drops++;
+ WRITE_ONCE(b->ack_drops, b->ack_drops + 1);
sch->qstats.drops++;
ack_pkt_len = qdisc_pkt_len(ack);
b->bytes += ack_pkt_len;
@@ -1859,10 +1873,10 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
}
/* stats */
- b->packets++;
- b->bytes += len - ack_pkt_len;
- b->backlogs[idx] += len - ack_pkt_len;
- b->tin_backlog += len - ack_pkt_len;
+ WRITE_ONCE(b->packets, b->packets + 1);
+ WRITE_ONCE(b->bytes, b->bytes + len - ack_pkt_len);
+ WRITE_ONCE(b->backlogs[idx], b->backlogs[idx] + len - ack_pkt_len);
+ WRITE_ONCE(b->tin_backlog, b->tin_backlog + len - ack_pkt_len);
qstats_backlog_add(sch, len - ack_pkt_len);
q->avg_window_bytes += len - ack_pkt_len;
}
@@ -1917,27 +1931,30 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc *sch,
if (!flow->set) {
list_add_tail(&flow->flowchain, &b->new_flows);
} else {
- b->decaying_flow_count--;
+ WRITE_ONCE(b->decaying_flow_count,
+ b->decaying_flow_count - 1);
list_move_tail(&flow->flowchain, &b->new_flows);
}
flow->set = CAKE_SET_SPARSE;
- b->sparse_flow_count++;
+ WRITE_ONCE(b->sparse_flow_count,
+ b->sparse_flow_count + 1);
- flow->deficit = cake_get_flow_quantum(b, flow, q->config->flow_mode);
+ WRITE_ONCE(flow->deficit,
+ cake_get_flow_quantum(b, flow, q->config->flow_mode));
} else if (flow->set == CAKE_SET_SPARSE_WAIT) {
/* this flow was empty, accounted as a sparse flow, but actually
* in the bulk rotation.
*/
flow->set = CAKE_SET_BULK;
- b->sparse_flow_count--;
- b->bulk_flow_count++;
+ WRITE_ONCE(b->sparse_flow_count, b->sparse_flow_count - 1);
+ WRITE_ONCE(b->bulk_flow_count, b->bulk_flow_count + 1);
cake_inc_srchost_bulk_flow_count(b, flow, q->config->flow_mode);
cake_inc_dsthost_bulk_flow_count(b, flow, q->config->flow_mode);
}
if (q->buffer_used > q->buffer_max_used)
- q->buffer_max_used = q->buffer_used;
+ WRITE_ONCE(q->buffer_max_used, q->buffer_used);
if (q->buffer_used <= q->buffer_limit)
return NET_XMIT_SUCCESS;
@@ -1976,8 +1993,8 @@ static struct sk_buff *cake_dequeue_one(struct Qdisc *sch)
if (flow->head) {
skb = dequeue_head(flow);
len = qdisc_pkt_len(skb);
- b->backlogs[q->cur_flow] -= len;
- b->tin_backlog -= len;
+ WRITE_ONCE(b->backlogs[q->cur_flow], b->backlogs[q->cur_flow] - len);
+ WRITE_ONCE(b->tin_backlog, b->tin_backlog - len);
qstats_backlog_sub(sch, len);
q->buffer_used -= skb->truesize;
qdisc_qlen_dec(sch);
@@ -2042,7 +2059,7 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
cake_configure_rates(sch, new_rate, true);
q->last_checked_active = now;
- q->active_queues = num_active_qs;
+ WRITE_ONCE(q->active_queues, num_active_qs);
}
begin:
@@ -2149,8 +2166,10 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
*/
if (flow->set == CAKE_SET_SPARSE) {
if (flow->head) {
- b->sparse_flow_count--;
- b->bulk_flow_count++;
+ WRITE_ONCE(b->sparse_flow_count,
+ b->sparse_flow_count - 1);
+ WRITE_ONCE(b->bulk_flow_count,
+ b->bulk_flow_count + 1);
cake_inc_srchost_bulk_flow_count(b, flow, q->config->flow_mode);
cake_inc_dsthost_bulk_flow_count(b, flow, q->config->flow_mode);
@@ -2165,7 +2184,8 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
}
}
- flow->deficit += cake_get_flow_quantum(b, flow, q->config->flow_mode);
+ WRITE_ONCE(flow->deficit,
+ flow->deficit + cake_get_flow_quantum(b, flow, q->config->flow_mode));
list_move_tail(&flow->flowchain, &b->old_flows);
goto retry;
@@ -2187,16 +2207,22 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
list_move_tail(&flow->flowchain,
&b->decaying_flows);
if (flow->set == CAKE_SET_BULK) {
- b->bulk_flow_count--;
+ WRITE_ONCE(b->bulk_flow_count,
+ b->bulk_flow_count - 1);
- cake_dec_srchost_bulk_flow_count(b, flow, q->config->flow_mode);
- cake_dec_dsthost_bulk_flow_count(b, flow, q->config->flow_mode);
+ cake_dec_srchost_bulk_flow_count(b, flow,
+ q->config->flow_mode);
+ cake_dec_dsthost_bulk_flow_count(b, flow,
+ q->config->flow_mode);
- b->decaying_flow_count++;
+ WRITE_ONCE(b->decaying_flow_count,
+ b->decaying_flow_count + 1);
} else if (flow->set == CAKE_SET_SPARSE ||
flow->set == CAKE_SET_SPARSE_WAIT) {
- b->sparse_flow_count--;
- b->decaying_flow_count++;
+ WRITE_ONCE(b->sparse_flow_count,
+ b->sparse_flow_count - 1);
+ WRITE_ONCE(b->decaying_flow_count,
+ b->decaying_flow_count + 1);
}
flow->set = CAKE_SET_DECAYING;
} else {
@@ -2204,14 +2230,20 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
list_del_init(&flow->flowchain);
if (flow->set == CAKE_SET_SPARSE ||
flow->set == CAKE_SET_SPARSE_WAIT)
- b->sparse_flow_count--;
+ WRITE_ONCE(b->sparse_flow_count,
+ b->sparse_flow_count - 1);
else if (flow->set == CAKE_SET_BULK) {
- b->bulk_flow_count--;
+ WRITE_ONCE(b->bulk_flow_count,
+ b->bulk_flow_count - 1);
- cake_dec_srchost_bulk_flow_count(b, flow, q->config->flow_mode);
- cake_dec_dsthost_bulk_flow_count(b, flow, q->config->flow_mode);
- } else
- b->decaying_flow_count--;
+ cake_dec_srchost_bulk_flow_count(b, flow,
+ q->config->flow_mode);
+ cake_dec_dsthost_bulk_flow_count(b, flow,
+ q->config->flow_mode);
+ } else {
+ WRITE_ONCE(b->decaying_flow_count,
+ b->decaying_flow_count - 1);
+ }
flow->set = CAKE_SET_NONE;
}
@@ -2230,11 +2262,11 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
if (q->config->rate_flags & CAKE_FLAG_INGRESS) {
len = cake_advance_shaper(q, b, skb,
now, true);
- flow->deficit -= len;
+ WRITE_ONCE(flow->deficit, flow->deficit - len);
b->tin_deficit -= len;
}
- flow->dropped++;
- b->tin_dropped++;
+ WRITE_ONCE(flow->dropped, flow->dropped + 1);
+ WRITE_ONCE(b->tin_dropped, b->tin_dropped + 1);
qdisc_tree_reduce_backlog(sch, 1, qdisc_pkt_len(skb));
qdisc_qstats_drop(sch);
qdisc_dequeue_drop(sch, skb, reason);
@@ -2242,20 +2274,22 @@ static struct sk_buff *cake_dequeue(struct Qdisc *sch)
goto retry;
}
- b->tin_ecn_mark += !!flow->cvars.ecn_marked;
+ WRITE_ONCE(b->tin_ecn_mark, b->tin_ecn_mark + !!flow->cvars.ecn_marked);
qdisc_bstats_update(sch, skb);
WRITE_ONCE(q->last_active, now);
/* collect delay stats */
delay = ktime_to_ns(ktime_sub(now, cobalt_get_enqueue_time(skb)));
- b->avge_delay = cake_ewma(b->avge_delay, delay, 8);
- b->peak_delay = cake_ewma(b->peak_delay, delay,
- delay > b->peak_delay ? 2 : 8);
- b->base_delay = cake_ewma(b->base_delay, delay,
- delay < b->base_delay ? 2 : 8);
+ WRITE_ONCE(b->avge_delay, cake_ewma(b->avge_delay, delay, 8));
+ WRITE_ONCE(b->peak_delay,
+ cake_ewma(b->peak_delay, delay,
+ delay > b->peak_delay ? 2 : 8));
+ WRITE_ONCE(b->base_delay,
+ cake_ewma(b->base_delay, delay,
+ delay < b->base_delay ? 2 : 8));
len = cake_advance_shaper(q, b, skb, now, false);
- flow->deficit -= len;
+ WRITE_ONCE(flow->deficit, flow->deficit - len);
b->tin_deficit -= len;
if (ktime_after(q->time_next_packet, now) && sch->q.qlen) {
@@ -2329,9 +2363,8 @@ static void cake_set_rate(struct cake_tin_data *b, u64 rate, u32 mtu,
u8 rate_shft = 0;
u64 rate_ns = 0;
- b->flow_quantum = 1514;
if (rate) {
- b->flow_quantum = max(min(rate >> 12, 1514ULL), 300ULL);
+ WRITE_ONCE(b->flow_quantum, max(min(rate >> 12, 1514ULL), 300ULL));
rate_shft = 34;
rate_ns = ((u64)NSEC_PER_SEC) << rate_shft;
rate_ns = div64_u64(rate_ns, max(MIN_RATE, rate));
@@ -2339,8 +2372,10 @@ static void cake_set_rate(struct cake_tin_data *b, u64 rate, u32 mtu,
rate_ns >>= 1;
rate_shft--;
}
- } /* else unlimited, ie. zero delay */
-
+ } else {
+ /* else unlimited, ie. zero delay */
+ WRITE_ONCE(b->flow_quantum, 1514);
+ }
b->tin_rate_bps = rate;
b->tin_rate_ns = rate_ns;
b->tin_rate_shft = rate_shft;
@@ -2611,25 +2646,27 @@ static void cake_reconfigure(struct Qdisc *sch)
{
struct cake_sched_data *qd = qdisc_priv(sch);
struct cake_sched_config *q = qd->config;
+ u32 buffer_limit;
cake_configure_rates(sch, qd->config->rate_bps, false);
if (q->buffer_config_limit) {
- qd->buffer_limit = q->buffer_config_limit;
+ buffer_limit = q->buffer_config_limit;
} else if (q->rate_bps) {
u64 t = q->rate_bps * q->interval;
do_div(t, USEC_PER_SEC / 4);
- qd->buffer_limit = max_t(u32, t, 4U << 20);
+ buffer_limit = max_t(u32, t, 4U << 20);
} else {
- qd->buffer_limit = ~0;
+ buffer_limit = ~0;
}
sch->flags &= ~TCQ_F_CAN_BYPASS;
- qd->buffer_limit = min(qd->buffer_limit,
- max(sch->limit * psched_mtu(qdisc_dev(sch)),
- q->buffer_config_limit));
+ WRITE_ONCE(qd->buffer_limit,
+ min(buffer_limit,
+ max(sch->limit * psched_mtu(qdisc_dev(sch)),
+ q->buffer_config_limit)));
}
static int cake_config_change(struct cake_sched_config *q, struct nlattr *opt,
@@ -2774,10 +2811,10 @@ static int cake_change(struct Qdisc *sch, struct nlattr *opt,
return ret;
if (overhead_changed) {
- qd->max_netlen = 0;
- qd->max_adjlen = 0;
- qd->min_netlen = ~0;
- qd->min_adjlen = ~0;
+ WRITE_ONCE(qd->max_netlen, 0);
+ WRITE_ONCE(qd->max_adjlen, 0);
+ WRITE_ONCE(qd->min_netlen, ~0);
+ WRITE_ONCE(qd->min_adjlen, ~0);
}
if (qd->tins) {
@@ -2995,15 +3032,15 @@ static int cake_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
goto nla_put_failure; \
} while (0)
- PUT_STAT_U64(CAPACITY_ESTIMATE64, q->avg_peak_bandwidth);
- PUT_STAT_U32(MEMORY_LIMIT, q->buffer_limit);
- PUT_STAT_U32(MEMORY_USED, q->buffer_max_used);
- PUT_STAT_U32(AVG_NETOFF, ((q->avg_netoff + 0x8000) >> 16));
- PUT_STAT_U32(MAX_NETLEN, q->max_netlen);
- PUT_STAT_U32(MAX_ADJLEN, q->max_adjlen);
- PUT_STAT_U32(MIN_NETLEN, q->min_netlen);
- PUT_STAT_U32(MIN_ADJLEN, q->min_adjlen);
- PUT_STAT_U32(ACTIVE_QUEUES, q->active_queues);
+ PUT_STAT_U64(CAPACITY_ESTIMATE64, READ_ONCE(q->avg_peak_bandwidth));
+ PUT_STAT_U32(MEMORY_LIMIT, READ_ONCE(q->buffer_limit));
+ PUT_STAT_U32(MEMORY_USED, READ_ONCE(q->buffer_max_used));
+ PUT_STAT_U32(AVG_NETOFF, ((READ_ONCE(q->avg_netoff) + 0x8000) >> 16));
+ PUT_STAT_U32(MAX_NETLEN, READ_ONCE(q->max_netlen));
+ PUT_STAT_U32(MAX_ADJLEN, READ_ONCE(q->max_adjlen));
+ PUT_STAT_U32(MIN_NETLEN, READ_ONCE(q->min_netlen));
+ PUT_STAT_U32(MIN_ADJLEN, READ_ONCE(q->min_adjlen));
+ PUT_STAT_U32(ACTIVE_QUEUES, READ_ONCE(q->active_queues));
#undef PUT_STAT_U32
#undef PUT_STAT_U64
@@ -3029,38 +3066,38 @@ static int cake_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
if (!ts)
goto nla_put_failure;
- PUT_TSTAT_U64(THRESHOLD_RATE64, b->tin_rate_bps);
- PUT_TSTAT_U64(SENT_BYTES64, b->bytes);
- PUT_TSTAT_U32(BACKLOG_BYTES, b->tin_backlog);
+ PUT_TSTAT_U64(THRESHOLD_RATE64, READ_ONCE(b->tin_rate_bps));
+ PUT_TSTAT_U64(SENT_BYTES64, READ_ONCE(b->bytes));
+ PUT_TSTAT_U32(BACKLOG_BYTES, READ_ONCE(b->tin_backlog));
PUT_TSTAT_U32(TARGET_US,
- ktime_to_us(ns_to_ktime(b->cparams.target)));
+ ktime_to_us(ns_to_ktime(READ_ONCE(b->cparams.target))));
PUT_TSTAT_U32(INTERVAL_US,
- ktime_to_us(ns_to_ktime(b->cparams.interval)));
+ ktime_to_us(ns_to_ktime(READ_ONCE(b->cparams.interval))));
- PUT_TSTAT_U32(SENT_PACKETS, b->packets);
- PUT_TSTAT_U32(DROPPED_PACKETS, b->tin_dropped);
- PUT_TSTAT_U32(ECN_MARKED_PACKETS, b->tin_ecn_mark);
- PUT_TSTAT_U32(ACKS_DROPPED_PACKETS, b->ack_drops);
+ PUT_TSTAT_U32(SENT_PACKETS, READ_ONCE(b->packets));
+ PUT_TSTAT_U32(DROPPED_PACKETS, READ_ONCE(b->tin_dropped));
+ PUT_TSTAT_U32(ECN_MARKED_PACKETS, READ_ONCE(b->tin_ecn_mark));
+ PUT_TSTAT_U32(ACKS_DROPPED_PACKETS, READ_ONCE(b->ack_drops));
PUT_TSTAT_U32(PEAK_DELAY_US,
- ktime_to_us(ns_to_ktime(b->peak_delay)));
+ ktime_to_us(ns_to_ktime(READ_ONCE(b->peak_delay))));
PUT_TSTAT_U32(AVG_DELAY_US,
- ktime_to_us(ns_to_ktime(b->avge_delay)));
+ ktime_to_us(ns_to_ktime(READ_ONCE(b->avge_delay))));
PUT_TSTAT_U32(BASE_DELAY_US,
- ktime_to_us(ns_to_ktime(b->base_delay)));
+ ktime_to_us(ns_to_ktime(READ_ONCE(b->base_delay))));
- PUT_TSTAT_U32(WAY_INDIRECT_HITS, b->way_hits);
- PUT_TSTAT_U32(WAY_MISSES, b->way_misses);
- PUT_TSTAT_U32(WAY_COLLISIONS, b->way_collisions);
+ PUT_TSTAT_U32(WAY_INDIRECT_HITS, READ_ONCE(b->way_hits));
+ PUT_TSTAT_U32(WAY_MISSES, READ_ONCE(b->way_misses));
+ PUT_TSTAT_U32(WAY_COLLISIONS, READ_ONCE(b->way_collisions));
- PUT_TSTAT_U32(SPARSE_FLOWS, b->sparse_flow_count +
- b->decaying_flow_count);
- PUT_TSTAT_U32(BULK_FLOWS, b->bulk_flow_count);
- PUT_TSTAT_U32(UNRESPONSIVE_FLOWS, b->unresponsive_flow_count);
- PUT_TSTAT_U32(MAX_SKBLEN, b->max_skblen);
+ PUT_TSTAT_U32(SPARSE_FLOWS, READ_ONCE(b->sparse_flow_count) +
+ READ_ONCE(b->decaying_flow_count));
+ PUT_TSTAT_U32(BULK_FLOWS, READ_ONCE(b->bulk_flow_count));
+ PUT_TSTAT_U32(UNRESPONSIVE_FLOWS, READ_ONCE(b->unresponsive_flow_count));
+ PUT_TSTAT_U32(MAX_SKBLEN, READ_ONCE(b->max_skblen));
- PUT_TSTAT_U32(FLOW_QUANTUM, b->flow_quantum);
+ PUT_TSTAT_U32(FLOW_QUANTUM, READ_ONCE(b->flow_quantum));
nla_nest_end(d->skb, ts);
}
@@ -3137,13 +3174,15 @@ static int cake_dump_class_stats(struct Qdisc *sch, unsigned long cl,
}
sch_tree_unlock(sch);
}
- qs.backlog = b->backlogs[idx % CAKE_QUEUES];
- qs.drops = flow->dropped;
+ qs.backlog = READ_ONCE(b->backlogs[idx % CAKE_QUEUES]);
+ qs.drops = READ_ONCE(flow->dropped);
}
if (gnet_stats_copy_queue(d, NULL, &qs, qs.qlen) < 0)
return -1;
if (flow) {
ktime_t now = ktime_get();
+ bool dropping;
+ u32 p_drop;
stats = nla_nest_start_noflag(d->skb, TCA_STATS_APP);
if (!stats)
@@ -3158,21 +3197,23 @@ static int cake_dump_class_stats(struct Qdisc *sch, unsigned long cl,
goto nla_put_failure; \
} while (0)
- PUT_STAT_S32(DEFICIT, flow->deficit);
- PUT_STAT_U32(DROPPING, flow->cvars.dropping);
- PUT_STAT_U32(COBALT_COUNT, flow->cvars.count);
- PUT_STAT_U32(P_DROP, flow->cvars.p_drop);
- if (flow->cvars.p_drop) {
+ PUT_STAT_S32(DEFICIT, READ_ONCE(flow->deficit));
+ dropping = READ_ONCE(flow->cvars.dropping);
+ PUT_STAT_U32(DROPPING, dropping);
+ PUT_STAT_U32(COBALT_COUNT, READ_ONCE(flow->cvars.count));
+ p_drop = READ_ONCE(flow->cvars.p_drop);
+ PUT_STAT_U32(P_DROP, p_drop);
+ if (p_drop) {
PUT_STAT_S32(BLUE_TIMER_US,
ktime_to_us(
ktime_sub(now,
- flow->cvars.blue_timer)));
+ READ_ONCE(flow->cvars.blue_timer))));
}
- if (flow->cvars.dropping) {
+ if (dropping) {
PUT_STAT_S32(DROP_NEXT_US,
ktime_to_us(
ktime_sub(now,
- flow->cvars.drop_next)));
+ READ_ONCE(flow->cvars.drop_next))));
}
if (nla_nest_end(d->skb, stats) < 0)
@@ -3298,10 +3339,10 @@ static int cake_mq_change(struct Qdisc *sch, struct nlattr *opt,
struct cake_sched_data *qd = qdisc_priv(chld);
if (overhead_changed) {
- qd->max_netlen = 0;
- qd->max_adjlen = 0;
- qd->min_netlen = ~0;
- qd->min_adjlen = ~0;
+ WRITE_ONCE(qd->max_netlen, 0);
+ WRITE_ONCE(qd->max_adjlen, 0);
+ WRITE_ONCE(qd->min_netlen, ~0);
+ WRITE_ONCE(qd->min_adjlen, ~0);
}
if (qd->tins) {
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread

* [PATCH net-next 14/15] net/sched: mq: no longer acquire qdisc spinlocks in dump operations
2026-04-08 12:55 [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps Eric Dumazet
` (12 preceding siblings ...)
2026-04-08 12:56 ` [PATCH net-next 13/15] net/sched: sch_cake: annotate data-races in cake_dump_stats() Eric Dumazet
@ 2026-04-08 12:56 ` Eric Dumazet
2026-04-08 12:56 ` [PATCH net-next 15/15] net/sched: convert tc_dump_qdisc() to RCU Eric Dumazet
2026-04-08 17:09 ` [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps Eric Dumazet
15 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2026-04-08 12:56 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, Kuniyuki Iwashima,
netdev, eric.dumazet, Eric Dumazet
Prepare mq_dump_common(), mqprio_dump() and mqprio_dump_class_stats()
for RTNL avoidance.
Use private variables instead of assuming sch->bstats and sch->qstats
can be used when folding stats from children.
This means the child qdisc spinlocks no longer need to be acquired.
Add a qdisc_qlen_lockless() helper, and change the
gnet_stats_add_basic() prototype.
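The folding loop now accumulates into private variables and publishes
the totals once, without touching the child qdisc locks; a simplified
sketch of mq_dump_common() after this patch:

	struct gnet_stats_queue qstats = { 0 };
	struct gnet_stats bstats = { 0 };
	unsigned int qlen = 0;

	for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
		qdisc = rtnl_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
		gnet_stats_add_basic(&bstats, qdisc->cpu_bstats,
				     &qdisc->bstats, false);
		gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
				     &qdisc->qstats);
		qlen += qdisc_qlen_lockless(qdisc);
	}

	/* publish the aggregated values with annotated stores */
	u64_stats_set(&sch->bstats.bytes, bstats.bytes);
	u64_stats_set(&sch->bstats.packets, bstats.packets);
	WRITE_ONCE(sch->q.qlen, qlen);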
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/gen_stats.h | 9 ++++--
include/net/sch_generic.h | 5 ++++
net/core/gen_estimator.c | 24 +++++++--------
net/core/gen_stats.c | 22 ++++++--------
net/sched/sch_mq.c | 28 ++++++++++-------
net/sched/sch_mqprio.c | 63 +++++++++++++++++----------------------
6 files changed, 76 insertions(+), 75 deletions(-)
diff --git a/include/net/gen_stats.h b/include/net/gen_stats.h
index 7aa2b8e1fb298c4f994a745b114fc4da785ddf4b..5484b67298e3fe94fe84f0e929799362d21499df 100644
--- a/include/net/gen_stats.h
+++ b/include/net/gen_stats.h
@@ -21,6 +21,11 @@ struct gnet_stats_basic_sync {
struct u64_stats_sync syncp;
} __aligned(2 * sizeof(u64));
+struct gnet_stats {
+ u64 bytes;
+ u64 packets;
+};
+
struct net_rate_estimator;
struct gnet_dump {
@@ -49,9 +54,9 @@ int gnet_stats_start_copy_compat(struct sk_buff *skb, int type,
int gnet_stats_copy_basic(struct gnet_dump *d,
struct gnet_stats_basic_sync __percpu *cpu,
struct gnet_stats_basic_sync *b, bool running);
-void gnet_stats_add_basic(struct gnet_stats_basic_sync *bstats,
+void gnet_stats_add_basic(struct gnet_stats *bstats,
struct gnet_stats_basic_sync __percpu *cpu,
- struct gnet_stats_basic_sync *b, bool running);
+ const struct gnet_stats_basic_sync *b, bool running);
int gnet_stats_copy_basic_hw(struct gnet_dump *d,
struct gnet_stats_basic_sync __percpu *cpu,
struct gnet_stats_basic_sync *b, bool running);
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 8e4e12692a1048f5e626b89c4db9e0339b16265d..f85aca5c1a445debd52a4e7b359141cf90cb7b55 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -542,6 +542,11 @@ static inline int qdisc_qlen(const struct Qdisc *q)
return q->q.qlen;
}
+static inline int qdisc_qlen_lockless(const struct Qdisc *q)
+{
+ return READ_ONCE(q->q.qlen);
+}
+
static inline void qdisc_qlen_inc(struct Qdisc *q)
{
WRITE_ONCE(q->q.qlen, q->q.qlen + 1);
diff --git a/net/core/gen_estimator.c b/net/core/gen_estimator.c
index c34e58c6c3e666743e72978f9a78cf7f95a360c3..40990aee45590f2c56c070b0d28f856fc82d1f55 100644
--- a/net/core/gen_estimator.c
+++ b/net/core/gen_estimator.c
@@ -60,9 +60,10 @@ struct net_rate_estimator {
};
static void est_fetch_counters(struct net_rate_estimator *e,
- struct gnet_stats_basic_sync *b)
+ struct gnet_stats *b)
{
- gnet_stats_basic_sync_init(b);
+ b->packets = 0;
+ b->bytes = 0;
if (e->stats_lock)
spin_lock(e->stats_lock);
@@ -76,18 +77,15 @@ static void est_fetch_counters(struct net_rate_estimator *e,
static void est_timer(struct timer_list *t)
{
struct net_rate_estimator *est = timer_container_of(est, t, timer);
- struct gnet_stats_basic_sync b;
- u64 b_bytes, b_packets;
+ struct gnet_stats b;
u64 rate, brate;
est_fetch_counters(est, &b);
- b_bytes = u64_stats_read(&b.bytes);
- b_packets = u64_stats_read(&b.packets);
- brate = (b_bytes - est->last_bytes) << (10 - est->intvl_log);
+ brate = (b.bytes - est->last_bytes) << (10 - est->intvl_log);
brate = (brate >> est->ewma_log) - (est->avbps >> est->ewma_log);
- rate = (b_packets - est->last_packets) << (10 - est->intvl_log);
+ rate = (b.packets - est->last_packets) << (10 - est->intvl_log);
rate = (rate >> est->ewma_log) - (est->avpps >> est->ewma_log);
preempt_disable_nested();
@@ -97,8 +95,8 @@ static void est_timer(struct timer_list *t)
write_seqcount_end(&est->seq);
preempt_enable_nested();
- est->last_bytes = b_bytes;
- est->last_packets = b_packets;
+ est->last_bytes = b.bytes;
+ est->last_packets = b.packets;
est->next_jiffies += ((HZ/4) << est->intvl_log);
@@ -138,7 +136,7 @@ int gen_new_estimator(struct gnet_stats_basic_sync *bstats,
{
struct gnet_estimator *parm = nla_data(opt);
struct net_rate_estimator *old, *est;
- struct gnet_stats_basic_sync b;
+ struct gnet_stats b;
int intvl_log;
if (nla_len(opt) < sizeof(*parm))
@@ -172,8 +170,8 @@ int gen_new_estimator(struct gnet_stats_basic_sync *bstats,
est_fetch_counters(est, &b);
if (lock)
local_bh_enable();
- est->last_bytes = u64_stats_read(&b.bytes);
- est->last_packets = u64_stats_read(&b.packets);
+ est->last_bytes = b.bytes;
+ est->last_packets = b.packets;
if (lock)
spin_lock_bh(lock);
diff --git a/net/core/gen_stats.c b/net/core/gen_stats.c
index 1a2380e74272de8eaf3d4ef453e56105a31e9edf..29a178bebcc13d199e71ef0b09d852bd9dbf3378 100644
--- a/net/core/gen_stats.c
+++ b/net/core/gen_stats.c
@@ -123,12 +123,13 @@ void gnet_stats_basic_sync_init(struct gnet_stats_basic_sync *b)
}
EXPORT_SYMBOL(gnet_stats_basic_sync_init);
-static void gnet_stats_add_basic_cpu(struct gnet_stats_basic_sync *bstats,
+static void gnet_stats_add_basic_cpu(struct gnet_stats *bstats,
struct gnet_stats_basic_sync __percpu *cpu)
{
- u64 t_bytes = 0, t_packets = 0;
int i;
+ bstats->bytes = 0;
+ bstats->packets = 0;
for_each_possible_cpu(i) {
struct gnet_stats_basic_sync *bcpu = per_cpu_ptr(cpu, i);
unsigned int start;
@@ -140,19 +141,16 @@ static void gnet_stats_add_basic_cpu(struct gnet_stats_basic_sync *bstats,
packets = u64_stats_read(&bcpu->packets);
} while (u64_stats_fetch_retry(&bcpu->syncp, start));
- t_bytes += bytes;
- t_packets += packets;
+ bstats->bytes += bytes;
+ bstats->packets += packets;
}
- _bstats_update(bstats, t_bytes, t_packets);
}
-void gnet_stats_add_basic(struct gnet_stats_basic_sync *bstats,
+void gnet_stats_add_basic(struct gnet_stats *bstats,
struct gnet_stats_basic_sync __percpu *cpu,
- struct gnet_stats_basic_sync *b, bool running)
+ const struct gnet_stats_basic_sync *b, bool running)
{
unsigned int start;
- u64 bytes = 0;
- u64 packets = 0;
WARN_ON_ONCE((cpu || running) && in_hardirq());
@@ -163,11 +161,9 @@ void gnet_stats_add_basic(struct gnet_stats_basic_sync *bstats,
do {
if (running)
start = u64_stats_fetch_begin(&b->syncp);
- bytes = u64_stats_read(&b->bytes);
- packets = u64_stats_read(&b->packets);
+ bstats->bytes = u64_stats_read(&b->bytes);
+ bstats->packets = u64_stats_read(&b->packets);
} while (running && u64_stats_fetch_retry(&b->syncp, start));
-
- _bstats_update(bstats, bytes, packets);
}
EXPORT_SYMBOL(gnet_stats_add_basic);
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index a0133a7b9d3b09a0d2a6064234c8fdef60dbf955..1e4522436620e5ee3a0417996476b37d9abca299 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -143,13 +143,12 @@ EXPORT_SYMBOL_NS_GPL(mq_attach, "NET_SCHED_INTERNAL");
void mq_dump_common(struct Qdisc *sch, struct sk_buff *skb)
{
struct net_device *dev = qdisc_dev(sch);
- struct Qdisc *qdisc;
+ struct gnet_stats_queue qstats = { 0 };
+ struct gnet_stats bstats = { 0 };
+ const struct Qdisc *qdisc;
+ unsigned int qlen = 0;
unsigned int ntx;
- sch->q.qlen = 0;
- gnet_stats_basic_sync_init(&sch->bstats);
- memset(&sch->qstats, 0, sizeof(sch->qstats));
-
/* MQ supports lockless qdiscs. However, statistics accounting needs
* to account for all, none, or a mix of locked and unlocked child
* qdiscs. Percpu stats are added to counters in-band and locking
@@ -157,16 +156,23 @@ void mq_dump_common(struct Qdisc *sch, struct sk_buff *skb)
*/
for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
qdisc = rtnl_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
- spin_lock_bh(qdisc_lock(qdisc));
- gnet_stats_add_basic(&sch->bstats, qdisc->cpu_bstats,
+ gnet_stats_add_basic(&bstats, qdisc->cpu_bstats,
&qdisc->bstats, false);
- gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
+ gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
&qdisc->qstats);
- sch->q.qlen += qdisc_qlen(qdisc);
-
- spin_unlock_bh(qdisc_lock(qdisc));
+ qlen += qdisc_qlen_lockless(qdisc);
}
+ u64_stats_set(&sch->bstats.bytes, bstats.bytes);
+ u64_stats_set(&sch->bstats.packets, bstats.packets);
+
+ WRITE_ONCE(sch->qstats.qlen, qstats.qlen);
+ WRITE_ONCE(sch->qstats.backlog, qstats.backlog);
+ WRITE_ONCE(sch->qstats.drops, qstats.drops);
+ WRITE_ONCE(sch->qstats.requeues, qstats.requeues);
+ WRITE_ONCE(sch->qstats.overlimits, qstats.overlimits);
+
+ WRITE_ONCE(sch->q.qlen, qlen);
}
EXPORT_SYMBOL_NS_GPL(mq_dump_common, "NET_SCHED_INTERNAL");
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index d35624e5869a4a6a12612886b2cd9cdac7b0b471..ac6d7d0a35309a5375425790f89f587601511cc7 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -554,15 +554,13 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
struct net_device *dev = qdisc_dev(sch);
struct mqprio_sched *priv = qdisc_priv(sch);
struct nlattr *nla = (struct nlattr *)skb_tail_pointer(skb);
+ struct gnet_stats_queue qstats = { 0 };
struct tc_mqprio_qopt opt = { 0 };
+ struct gnet_stats bstats = { 0 };
+ const struct Qdisc *qdisc;
unsigned int qlen = 0;
- struct Qdisc *qdisc;
unsigned int ntx;
- qlen = 0;
- gnet_stats_basic_sync_init(&sch->bstats);
- memset(&sch->qstats, 0, sizeof(sch->qstats));
-
/* MQ supports lockless qdiscs. However, statistics accounting needs
* to account for all, none, or a mix of locked and unlocked child
* qdiscs. Percpu stats are added to counters in-band and locking
@@ -570,16 +568,22 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
*/
for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
qdisc = rtnl_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
- spin_lock_bh(qdisc_lock(qdisc));
- gnet_stats_add_basic(&sch->bstats, qdisc->cpu_bstats,
+ gnet_stats_add_basic(&bstats, qdisc->cpu_bstats,
&qdisc->bstats, false);
- gnet_stats_add_queue(&sch->qstats, qdisc->cpu_qstats,
+ gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
&qdisc->qstats);
qlen += qdisc_qlen(qdisc);
-
- spin_unlock_bh(qdisc_lock(qdisc));
}
+ u64_stats_set(&sch->bstats.bytes, bstats.bytes);
+ u64_stats_set(&sch->bstats.packets, bstats.packets);
+
+ WRITE_ONCE(sch->qstats.qlen, qstats.qlen);
+ WRITE_ONCE(sch->qstats.backlog, qstats.backlog);
+ WRITE_ONCE(sch->qstats.drops, qstats.drops);
+ WRITE_ONCE(sch->qstats.requeues, qstats.requeues);
+ WRITE_ONCE(sch->qstats.overlimits, qstats.overlimits);
+
WRITE_ONCE(sch->q.qlen, qlen);
mqprio_qopt_reconstruct(dev, &opt);
@@ -661,45 +665,32 @@ static int mqprio_dump_class(struct Qdisc *sch, unsigned long cl,
static int mqprio_dump_class_stats(struct Qdisc *sch, unsigned long cl,
struct gnet_dump *d)
- __releases(d->lock)
- __acquires(d->lock)
{
if (cl >= TC_H_MIN_PRIORITY) {
- int i;
- __u32 qlen;
- struct gnet_stats_queue qstats = {0};
- struct gnet_stats_basic_sync bstats;
struct net_device *dev = qdisc_dev(sch);
struct netdev_tc_txq tc = dev->tc_to_txq[cl & TC_BITMASK];
-
- gnet_stats_basic_sync_init(&bstats);
- /* Drop lock here it will be reclaimed before touching
- * statistics this is required because the d->lock we
- * hold here is the look on dev_queue->qdisc_sleeping
- * also acquired below.
- */
- if (d->lock)
- spin_unlock_bh(d->lock);
+ struct gnet_stats_queue qstats = { 0 };
+ struct gnet_stats_basic_sync bstats;
+ struct gnet_stats _bstats = { 0 };
+ u32 qlen = 0;
+ int i;
for (i = tc.offset; i < tc.offset + tc.count; i++) {
struct netdev_queue *q = netdev_get_tx_queue(dev, i);
- struct Qdisc *qdisc = rtnl_dereference(q->qdisc);
+ const struct Qdisc *qdisc = rtnl_dereference(q->qdisc);
- spin_lock_bh(qdisc_lock(qdisc));
-
- gnet_stats_add_basic(&bstats, qdisc->cpu_bstats,
+ gnet_stats_add_basic(&_bstats, qdisc->cpu_bstats,
&qdisc->bstats, false);
gnet_stats_add_queue(&qstats, qdisc->cpu_qstats,
&qdisc->qstats);
- sch->q.qlen += qdisc_qlen(qdisc);
-
- spin_unlock_bh(qdisc_lock(qdisc));
+ qlen += qdisc_qlen_lockless(qdisc);
}
- qlen = qdisc_qlen(sch) + qstats.qlen;
+ u64_stats_init(&bstats.syncp);
+ u64_stats_set(&bstats.bytes, _bstats.bytes);
+ u64_stats_set(&bstats.packets, _bstats.packets);
+
+ qlen = qlen + qstats.qlen;
- /* Reclaim root sleeping lock before completing stats */
- if (d->lock)
- spin_lock_bh(d->lock);
if (gnet_stats_copy_basic(d, NULL, &bstats, false) < 0 ||
gnet_stats_copy_queue(d, NULL, &qstats, qlen) < 0)
return -1;
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread

* [PATCH net-next 15/15] net/sched: convert tc_dump_qdisc() to RCU
2026-04-08 12:55 [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps Eric Dumazet
` (13 preceding siblings ...)
2026-04-08 12:56 ` [PATCH net-next 14/15] net/sched: mq: no longer acquire qdisc spinlocks in dump operations Eric Dumazet
@ 2026-04-08 12:56 ` Eric Dumazet
2026-04-08 17:09 ` [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps Eric Dumazet
15 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2026-04-08 12:56 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, Kuniyuki Iwashima,
netdev, eric.dumazet, Eric Dumazet, Stanislav Fomichev
No longer acquire RTNL in qdisc dumps (tc -s -d qdisc show ...)
Notes:
- Add sch_tree_lock_rcu()/sch_tree_unlock_rcu() for dump methods
which need to acquire the qdisc spinlock (fq, fq_codel, fq_pie and
gred); a usage sketch follows these notes.
- Remove netdev_lock_ops()/netdev_unlock_ops() because dumps are read-only.
They were added in commit a0527ee2df3f ("net: hold netdev instance
lock during qdisc ndo_setup_tc").
- for_each_netdev_rcu() can later be replaced with the
more scalable for_each_netdev_dump().
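[Not part of the patch: a minimal sketch of the intended usage of the
new helpers in a dump method that now runs under rcu_read_lock()
instead of RTNL; the foo_* names, the flows_cnt field and the xstats
layout are placeholders:

/* Sketch only. Called from the RCU-protected dump path; taking the
 * qdisc spinlock here is legal because spin_lock_bh() does not sleep.
 */
static int foo_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
{
	struct foo_sched_data *q = qdisc_priv(sch);	/* placeholder type */
	struct tc_foo_xstats st = {};			/* placeholder layout */

	sch_tree_lock_rcu(sch);		/* locks the root sleeping qdisc,
					 * or sch itself if TCQ_F_MQROOT */
	st.flows = q->flows_cnt;	/* fields guarded by the qdisc lock */
	sch_tree_unlock_rcu(sch);

	return gnet_stats_copy_app(d, &st, sizeof(st));
}
]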
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stanislav Fomichev <sdf@fomichev.me>
---
include/net/sch_generic.h | 14 ++++++++++++++
net/sched/sch_api.c | 21 +++++++++------------
net/sched/sch_fq.c | 4 ++--
net/sched/sch_fq_codel.c | 4 ++--
net/sched/sch_fq_pie.c | 4 ++--
net/sched/sch_gred.c | 4 ++--
net/sched/sch_mq.c | 2 +-
net/sched/sch_mqprio.c | 2 +-
net/sched/sch_taprio.c | 16 +++++-----------
9 files changed, 38 insertions(+), 33 deletions(-)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index f85aca5c1a445debd52a4e7b359141cf90cb7b55..82cb1e639dcc404bffe4b2669bb5aacef605cf6d 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -628,6 +628,20 @@ static inline void sch_tree_unlock(struct Qdisc *q)
spin_unlock_bh(qdisc_root_sleeping_lock(q));
}
+static inline void sch_tree_lock_rcu(struct Qdisc *q)
+{
+ if (!(q->flags & TCQ_F_MQROOT))
+ q = qdisc_root_sleeping(q);
+ spin_lock_bh(qdisc_lock(q));
+}
+
+static inline void sch_tree_unlock_rcu(struct Qdisc *q)
+{
+ if (!(q->flags & TCQ_F_MQROOT))
+ q = qdisc_root_sleeping(q);
+ spin_unlock_bh(qdisc_lock(q));
+}
+
extern struct Qdisc noop_qdisc;
extern struct Qdisc_ops noop_qdisc_ops;
extern struct Qdisc_ops pfifo_fast_ops;
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index 292bc8bb7a79922a83865ed54083c04ff72742ff..adc2d499a03d9fb4667dc87ba06aa3abfce30214 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -909,7 +909,6 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
u32 block_index;
__u32 qlen;
- cond_resched();
nlh = nlmsg_put(skb, portid, seq, event, sizeof(*tcm), flags);
if (!nlh)
goto out_nlmsg_trim;
@@ -941,7 +940,7 @@ static int tc_fill_qdisc(struct sk_buff *skb, struct Qdisc *q, u32 clid,
goto nla_put_failure;
qlen = qdisc_qlen_sum(q);
- stab = rtnl_dereference(q->stab);
+ stab = rcu_dereference(q->stab);
if (stab && qdisc_dump_stab(skb, stab) < 0)
goto nla_put_failure;
@@ -1888,14 +1887,14 @@ static int tc_dump_qdisc(struct sk_buff *skb, struct netlink_callback *cb)
s_q_idx = q_idx = cb->args[1];
idx = 0;
- ASSERT_RTNL();
err = nlmsg_parse_deprecated(nlh, sizeof(struct tcmsg), tca, TCA_MAX,
rtm_tca_policy, cb->extack);
if (err < 0)
return err;
- for_each_netdev(net, dev) {
+ rcu_read_lock();
+ for_each_netdev_rcu(net, dev) {
struct netdev_queue *dev_queue;
if (idx < s_idx)
@@ -1904,29 +1903,26 @@ static int tc_dump_qdisc(struct sk_buff *skb, struct netlink_callback *cb)
s_q_idx = 0;
q_idx = 0;
- netdev_lock_ops(dev);
- if (tc_dump_qdisc_root(rtnl_dereference(dev->qdisc),
+ if (tc_dump_qdisc_root(rcu_dereference(dev->qdisc),
skb, cb, &q_idx, s_q_idx,
true, tca[TCA_DUMP_INVISIBLE]) < 0) {
- netdev_unlock_ops(dev);
goto done;
}
- dev_queue = dev_ingress_queue(dev);
+ dev_queue = dev_ingress_queue_rcu(dev);
if (dev_queue &&
- tc_dump_qdisc_root(rtnl_dereference(dev_queue->qdisc_sleeping),
+ tc_dump_qdisc_root(rcu_dereference(dev_queue->qdisc_sleeping),
skb, cb, &q_idx, s_q_idx, false,
tca[TCA_DUMP_INVISIBLE]) < 0) {
- netdev_unlock_ops(dev);
goto done;
}
- netdev_unlock_ops(dev);
cont:
idx++;
}
done:
+ rcu_read_unlock();
cb->args[0] = idx;
cb->args[1] = q_idx;
@@ -2487,7 +2483,8 @@ static const struct rtnl_msg_handler psched_rtnl_msg_handlers[] __initconst = {
{.msgtype = RTM_NEWQDISC, .doit = tc_modify_qdisc},
{.msgtype = RTM_DELQDISC, .doit = tc_get_qdisc},
{.msgtype = RTM_GETQDISC, .doit = tc_get_qdisc,
- .dumpit = tc_dump_qdisc},
+ .dumpit = tc_dump_qdisc,
+ .flags = RTNL_FLAG_DUMP_UNLOCKED},
{.msgtype = RTM_NEWTCLASS, .doit = tc_ctl_tclass},
{.msgtype = RTM_DELTCLASS, .doit = tc_ctl_tclass},
{.msgtype = RTM_GETTCLASS, .doit = tc_ctl_tclass,
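[Not part of the patch: RTNL_FLAG_DUMP_UNLOCKED makes rtnetlink invoke
the dumpit callback without holding RTNL, so the callback provides its
own protection. A sketch of the resulting shape, mirroring the
tc_dump_qdisc() changes above:

/* Sketch: an RTNL-free dump callback. Registered with
 * RTNL_FLAG_DUMP_UNLOCKED, it takes rcu_read_lock() itself and must
 * use rcu_dereference() (not rtnl_dereference()) on qdisc pointers.
 */
static int sketch_dump(struct sk_buff *skb, struct netlink_callback *cb)
{
	struct net *net = sock_net(skb->sk);
	struct net_device *dev;

	rcu_read_lock();
	for_each_netdev_rcu(net, dev) {
		/* fill skb; on overflow, record progress in cb->args[]
		 * and stop so the next recvmsg() resumes here.
		 */
	}
	rcu_read_unlock();
	return skb->len;
}
]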
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index dd553d6f3e8e911e161c1440eb6d9ce94f65385a..c13881a5c2b4ecb37b41b906ad96bb5de2653fe8 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -1286,7 +1286,7 @@ static int fq_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
st.pad = 0;
- sch_tree_lock(sch);
+ sch_tree_lock_rcu(sch);
st.gc_flows = q->stat_gc_flows;
st.highprio_packets = 0;
@@ -1310,7 +1310,7 @@ static int fq_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
st.band_drops[i] = q->stat_band_drops[i];
st.band_pkt_count[i] = q->band_pkt_count[i];
}
- sch_tree_unlock(sch);
+ sch_tree_unlock_rcu(sch);
return gnet_stats_copy_app(d, &st, sizeof(st));
}
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index db5cf578cbd63e842381cb147ced145fed4cca3e..e14d014041c2e33b344455f089ab69e93a13feca 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -586,7 +586,7 @@ static int fq_codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
};
struct list_head *pos;
- sch_tree_lock(sch);
+ sch_tree_lock_rcu(sch);
st.qdisc_stats.maxpacket = q->cstats.maxpacket;
st.qdisc_stats.drop_overlimit = q->drop_overlimit;
@@ -601,7 +601,7 @@ static int fq_codel_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
list_for_each(pos, &q->old_flows)
st.qdisc_stats.old_flows_len++;
- sch_tree_unlock(sch);
+ sch_tree_unlock_rcu(sch);
return gnet_stats_copy_app(d, &st, sizeof(st));
}
diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c
index 72f48fa4010bebbe6be212938b457db21ff3c5a0..93b2abdb6328f63e2dcbfbee463ddd689a9d733d 100644
--- a/net/sched/sch_fq_pie.c
+++ b/net/sched/sch_fq_pie.c
@@ -512,7 +512,7 @@ static int fq_pie_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
struct tc_fq_pie_xstats st = { 0 };
struct list_head *pos;
- sch_tree_lock(sch);
+ sch_tree_lock_rcu(sch);
st.packets_in = q->stats.packets_in;
st.overlimit = q->stats.overlimit;
@@ -527,7 +527,7 @@ static int fq_pie_dump_stats(struct Qdisc *sch, struct gnet_dump *d)
list_for_each(pos, &q->old_flows)
st.old_flows_len++;
- sch_tree_unlock(sch);
+ sch_tree_unlock_rcu(sch);
return gnet_stats_copy_app(d, &st, sizeof(st));
}
diff --git a/net/sched/sch_gred.c b/net/sched/sch_gred.c
index 36d0cafac2063f3ba1133ad9b8fab08ce4468550..28a99640da95a73117697cbf220109556a253343 100644
--- a/net/sched/sch_gred.c
+++ b/net/sched/sch_gred.c
@@ -377,7 +377,7 @@ static int gred_offload_dump_stats(struct Qdisc *sch)
/* Even if driver returns failure adjust the stats - in case offload
* ended but driver still wants to adjust the values.
*/
- sch_tree_lock(sch);
+ sch_tree_lock_rcu(sch);
for (i = 0; i < MAX_DPs; i++) {
if (!table->tab[i])
continue;
@@ -394,7 +394,7 @@ static int gred_offload_dump_stats(struct Qdisc *sch)
sch->qstats.overlimits += hw_stats->stats.qstats[i].overlimits;
}
_bstats_update(&sch->bstats, bytes, packets);
- sch_tree_unlock(sch);
+ sch_tree_unlock_rcu(sch);
kfree(hw_stats);
return ret;
diff --git a/net/sched/sch_mq.c b/net/sched/sch_mq.c
index 1e4522436620e5ee3a0417996476b37d9abca299..eaffac2453bbd81bcf2655804e22ab9167d0660a 100644
--- a/net/sched/sch_mq.c
+++ b/net/sched/sch_mq.c
@@ -155,7 +155,7 @@ void mq_dump_common(struct Qdisc *sch, struct sk_buff *skb)
* qdisc totals are added at end.
*/
for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
- qdisc = rtnl_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
+ qdisc = rcu_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
gnet_stats_add_basic(&bstats, qdisc->cpu_bstats,
&qdisc->bstats, false);
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index ac6d7d0a35309a5375425790f89f587601511cc7..b585b94d0cef8e13a416d3c2f969e421bc11cbf5 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -567,7 +567,7 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
* qdisc totals are added at end.
*/
for (ntx = 0; ntx < dev->num_tx_queues; ntx++) {
- qdisc = rtnl_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
+ qdisc = rcu_dereference(netdev_get_tx_queue(dev, ntx)->qdisc_sleeping);
gnet_stats_add_basic(&bstats, qdisc->cpu_bstats,
&qdisc->bstats, false);
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 885a9bc859166dfb6d20aa0dfbb8f11194e02ba9..93a366ba793bf92afe8359d2844b7673326ef79c 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -2404,23 +2404,21 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
nla_put_u32(skb, TCA_TAPRIO_ATTR_TXTIME_DELAY, q->txtime_delay))
goto options_error;
- rcu_read_lock();
-
- oper = rtnl_dereference(q->oper_sched);
- admin = rtnl_dereference(q->admin_sched);
+ oper = rcu_dereference(q->oper_sched);
+ admin = rcu_dereference(q->admin_sched);
if (oper && taprio_dump_tc_entries(skb, q, oper))
- goto options_error_rcu;
+ goto options_error;
if (oper && dump_schedule(skb, oper))
- goto options_error_rcu;
+ goto options_error;
if (!admin)
goto done;
sched_nest = nla_nest_start_noflag(skb, TCA_TAPRIO_ATTR_ADMIN_SCHED);
if (!sched_nest)
- goto options_error_rcu;
+ goto options_error;
if (dump_schedule(skb, admin))
goto admin_error;
@@ -2428,15 +2426,11 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
nla_nest_end(skb, sched_nest);
done:
- rcu_read_unlock();
return nla_nest_end(skb, nest);
admin_error:
nla_nest_cancel(skb, sched_nest);
-options_error_rcu:
- rcu_read_unlock();
-
options_error:
nla_nest_cancel(skb, nest);
--
2.53.0.1213.gd9a14994de-goog
^ permalink raw reply related [flat|nested] 17+ messages in thread

* Re: [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps
2026-04-08 12:55 [PATCH net-next 00/15] net/sched: no longer acquire RTNL in qdisc dumps Eric Dumazet
` (14 preceding siblings ...)
2026-04-08 12:56 ` [PATCH net-next 15/15] net/sched: convert tc_dump_qdisc() to RCU Eric Dumazet
@ 2026-04-08 17:09 ` Eric Dumazet
15 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2026-04-08 17:09 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Simon Horman, Jamal Hadi Salim, Jiri Pirko, Kuniyuki Iwashima,
netdev, eric.dumazet
On Wed, Apr 8, 2026 at 5:56 AM Eric Dumazet <edumazet@google.com> wrote:
>
> This (large) series brings RTNL avoidance to qdisc dumps.
>
> We first add annotations for data-races, so that most dump methods
> can run in parallel with data path.
>
> Then change mq and mqprio to no longer acquire each child
> qdisc's spinlock.
>
> Last patch replaces RTNL with RCU for tc_dump_qdisc() and the
> qdisc ops->dump() methods.
>
> The series was already large; RTNL avoidance for class dumps will be done later.
>
> Eric Dumazet (15):
> net/sched: rename qstats_overlimit_inc() to qstats_cpu_overlimit_inc()
> net/sched: add qstats_cpu_drop_inc() helper
> net/sched: add READ_ONCE() in gnet_stats_add_queue[_cpu]
> net/sched: add qdisc_qlen_inc() and qdisc_qlen_dec()
> net/sched: annotate data-races around sch->qstats.backlog
> net/sched: sch_sfb: annotate data-races in sfb_dump_stats()
> net/sched: sch_red: annotate data-races in red_dump_stats()
> net/sched: sch_fq_codel: remove data-races from fq_codel_dump_stats()
> net/sched: sch_pie: annotate data-races in pie_dump_stats()
> net/sched: sch_fq_pie: annotate data-races in fq_pie_dump_stats()
> net_sched: sch_hhf: annotate data-races in hhf_dump_stats()
> net/sched: sch_choke: annotate data-races in choke_dump_stats()
> net/sched: sch_cake: annotate data-races in cake_dump_stats()
> net/sched: mq: no longer acquire qdisc spinlocks in dump operations
> net/sched: convert tc_dump_qdisc() to RCU
Do not spend too much time on this V1; I will send a V2 tomorrow
addressing some issues.
pw-bot: cr
^ permalink raw reply [flat|nested] 17+ messages in thread