* [PATCH net 1/2] net/sched: Fix backlog accounting in qdisc_dequeue_internal
From: William Liu @ 2025-07-19 18:08 UTC (permalink / raw)
To: netdev
Cc: jhs, xiyou.wangcong, pabeni, kuba, jiri, davem, edumazet, horms,
savy, William Liu
This issue applies to the following qdiscs: hhf, fq, fq_codel, and
fq_pie, and occurs in their change handlers when adjusting to the new
limit. The problems lie in the values passed to the subsequent
qdisc_tree_reduce_backlog call:
1. When the tbf parent runs out of tokens, skbs of these qdiscs will
be placed in gso_skb. Their peek handlers are qdisc_peek_dequeued,
which accounts for both qlen and backlog. However, in the case of
qdisc_dequeue_internal, ONLY qlen is accounted for when pulling
from gso_skb. This means that these qdiscs are missing a
qdisc_qstats_backlog_dec when dropping packets to satisfy the
new limit in their change handlers.
One can observe this issue with the following (with tc patched to
support a limit of 0):
export TARGET=fq
tc qdisc del dev lo root
tc qdisc add dev lo root handle 1: tbf rate 8bit burst 100b latency 1ms
tc qdisc replace dev lo handle 3: parent 1:1 $TARGET limit 1000
echo ''; echo 'add child'; tc -s -d qdisc show dev lo
ping -I lo -f -c2 -s32 -W0.001 127.0.0.1 2>&1 >/dev/null
echo ''; echo 'after ping'; tc -s -d qdisc show dev lo
tc qdisc change dev lo handle 3: parent 1:1 $TARGET limit 0
echo ''; echo 'after limit drop'; tc -s -d qdisc show dev lo
tc qdisc replace dev lo handle 2: parent 1:1 sfq
echo ''; echo 'post graft'; tc -s -d qdisc show dev lo
The second-to-last show command shows 0 packets but a positive
number (74) of backlog bytes. The problem becomes clearer in the
last show command, where qdisc_purge_queue triggers
qdisc_tree_reduce_backlog with the positive backlog and causes an
underflow in the tbf parent's backlog (the unsigned counter wraps
to 4096 Mb instead of 0).
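For reference, the pre-patch gso_skb path in qdisc_dequeue_internal
looked roughly as follows (reconstructed from the removed lines in
the diff below); note that only qlen is adjusted:

	static inline struct sk_buff *
	qdisc_dequeue_internal(struct Qdisc *sch, bool direct)
	{
		struct sk_buff *skb;

		skb = __skb_dequeue(&sch->gso_skb);
		if (skb) {
			sch->q.qlen--;	/* qlen is decremented... */
			return skb;	/* ...but qstats.backlog is not */
		}
		if (direct)
			return __qdisc_dequeue_head(&sch->q);
		else
			return sch->dequeue(sch);
	}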
2. fq_codel_change is also wrong in the non-gso_skb case. It tracks
the amount to drop after the limit adjustment loop through
cstats.drop_count and cstats.drop_len, but these are also updated
in fq_codel_dequeue, which resets them every time they are
non-zero after a call to qdisc_tree_reduce_backlog.
If the drop path ever occurs in fq_codel_dequeue and
qdisc_dequeue_internal takes the non-gso_skb path, then we would
reduce the backlog by an extra packet.
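For point 2, the pre-patch fq_codel_change loop (lifted from the
removed lines in the diff below) shared q->cstats with the dequeue
path:

	while (sch->q.qlen > sch->limit ||
	       q->memory_usage > q->memory_limit) {
		/* may call fq_codel_dequeue(), which itself reduces
		 * the backlog by q->cstats and resets it if non-zero
		 */
		struct sk_buff *skb = qdisc_dequeue_internal(sch, false);

		q->cstats.drop_len += qdisc_pkt_len(skb);
		rtnl_kfree_skbs(skb, skb);
		q->cstats.drop_count++;
	}
	qdisc_tree_reduce_backlog(sch, q->cstats.drop_count,
				  q->cstats.drop_len);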
To fix these issues, the code path for all clients of
qdisc_dequeue_internal (codel, pie, hhf, fq, fq_pie, and fq_codel)
has been simplified: qdisc_dequeue_internal now handles the backlog
adjustments for all cases that do not directly use the dequeue
handler.
Special care is taken for fq_codel_dequeue to account for the
qdisc_tree_reduce_backlog call in its dequeue handler. The
cstats reset is moved from the end to the beginning of
fq_codel_dequeue, so the change handler can use cstats for
proper backlog reduction accounting. The drop_len and
drop_count fields are not used elsewhere, so this reordering in
fq_codel_dequeue is safe.
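Condensed, the pattern now shared by the change handlers looks like
the sketch below (codel and pie pass direct=true and use
rtnl_qdisc_drop; fq_codel additionally subtracts the amounts its own
dequeue handler already reduced):

	prev_qlen = sch->q.qlen;
	prev_backlog = sch->qstats.backlog;
	while (sch->q.qlen > sch->limit) {
		struct sk_buff *skb = qdisc_dequeue_internal(sch, false);

		rtnl_kfree_skbs(skb, skb);
	}
	qdisc_tree_reduce_backlog(sch, prev_qlen - sch->q.qlen,
				  prev_backlog - sch->qstats.backlog);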
Fixes: 2d3cbfd6d54a ("net_sched: Flush gso_skb list too during ->change()")
Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM")
Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
Signed-off-by: William Liu <will@willsroot.io>
Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io>
---
include/net/sch_generic.h | 9 +++++++--
net/sched/sch_codel.c | 10 +++++-----
net/sched/sch_fq.c | 14 +++++++-------
net/sched/sch_fq_codel.c | 22 +++++++++++++++-------
net/sched/sch_fq_pie.c | 10 +++++-----
net/sched/sch_hhf.c | 6 +++---
net/sched/sch_pie.c | 10 +++++-----
7 files changed, 47 insertions(+), 34 deletions(-)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 638948be4c50..a24094a638dc 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -1038,10 +1038,15 @@ static inline struct sk_buff *qdisc_dequeue_internal(struct Qdisc *sch, bool dir
skb = __skb_dequeue(&sch->gso_skb);
if (skb) {
sch->q.qlen--;
+ qdisc_qstats_backlog_dec(sch, skb);
+ return skb;
+ }
+ if (direct) {
+ skb = __qdisc_dequeue_head(&sch->q);
+ if (skb)
+ qdisc_qstats_backlog_dec(sch, skb);
return skb;
}
- if (direct)
- return __qdisc_dequeue_head(&sch->q);
else
return sch->dequeue(sch);
}
diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
index c93761040c6e..8dc467f665bb 100644
--- a/net/sched/sch_codel.c
+++ b/net/sched/sch_codel.c
@@ -103,7 +103,7 @@ static int codel_change(struct Qdisc *sch, struct nlattr *opt,
{
struct codel_sched_data *q = qdisc_priv(sch);
struct nlattr *tb[TCA_CODEL_MAX + 1];
- unsigned int qlen, dropped = 0;
+ unsigned int prev_qlen, prev_backlog;
int err;
err = nla_parse_nested_deprecated(tb, TCA_CODEL_MAX, opt,
@@ -142,15 +142,15 @@ static int codel_change(struct Qdisc *sch, struct nlattr *opt,
WRITE_ONCE(q->params.ecn,
!!nla_get_u32(tb[TCA_CODEL_ECN]));
- qlen = sch->q.qlen;
+ prev_qlen = sch->q.qlen;
+ prev_backlog = sch->qstats.backlog;
while (sch->q.qlen > sch->limit) {
struct sk_buff *skb = qdisc_dequeue_internal(sch, true);
- dropped += qdisc_pkt_len(skb);
- qdisc_qstats_backlog_dec(sch, skb);
rtnl_qdisc_drop(skb, sch);
}
- qdisc_tree_reduce_backlog(sch, qlen - sch->q.qlen, dropped);
+ qdisc_tree_reduce_backlog(sch, prev_qlen - sch->q.qlen,
+ prev_backlog - sch->qstats.backlog);
sch_tree_unlock(sch);
return 0;
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 902ff5470607..986e71e3362c 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -1014,10 +1014,10 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt,
struct netlink_ext_ack *extack)
{
struct fq_sched_data *q = qdisc_priv(sch);
+ unsigned int prev_qlen, prev_backlog;
struct nlattr *tb[TCA_FQ_MAX + 1];
- int err, drop_count = 0;
- unsigned drop_len = 0;
u32 fq_log;
+ int err;
err = nla_parse_nested_deprecated(tb, TCA_FQ_MAX, opt, fq_policy,
NULL);
@@ -1135,16 +1135,16 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt,
err = fq_resize(sch, fq_log);
sch_tree_lock(sch);
}
+
+ prev_qlen = sch->q.qlen;
+ prev_backlog = sch->qstats.backlog;
while (sch->q.qlen > sch->limit) {
struct sk_buff *skb = qdisc_dequeue_internal(sch, false);
- if (!skb)
- break;
- drop_len += qdisc_pkt_len(skb);
rtnl_kfree_skbs(skb, skb);
- drop_count++;
}
- qdisc_tree_reduce_backlog(sch, drop_count, drop_len);
+ qdisc_tree_reduce_backlog(sch, prev_qlen - sch->q.qlen,
+ prev_backlog - sch->qstats.backlog);
sch_tree_unlock(sch);
return err;
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 2a0f3a513bfa..f9e6d76a1712 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -286,6 +286,10 @@ static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
struct fq_codel_flow *flow;
struct list_head *head;
+ /* reset these here, as change needs them for proper accounting */
+ q->cstats.drop_count = 0;
+ q->cstats.drop_len = 0;
+
begin:
head = &q->new_flows;
if (list_empty(head)) {
@@ -319,8 +323,6 @@ static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
if (q->cstats.drop_count) {
qdisc_tree_reduce_backlog(sch, q->cstats.drop_count,
q->cstats.drop_len);
- q->cstats.drop_count = 0;
- q->cstats.drop_len = 0;
}
return skb;
}
@@ -366,8 +368,10 @@ static const struct nla_policy fq_codel_policy[TCA_FQ_CODEL_MAX + 1] = {
static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt,
struct netlink_ext_ack *extack)
{
+ unsigned int dropped_qlen = 0, dropped_backlog = 0;
struct fq_codel_sched_data *q = qdisc_priv(sch);
struct nlattr *tb[TCA_FQ_CODEL_MAX + 1];
+ unsigned int prev_qlen, prev_backlog;
u32 quantum = 0;
int err;
@@ -439,17 +443,21 @@ static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt,
WRITE_ONCE(q->memory_limit,
min(1U << 31, nla_get_u32(tb[TCA_FQ_CODEL_MEMORY_LIMIT])));
+ prev_qlen = sch->q.qlen;
+ prev_backlog = sch->qstats.backlog;
while (sch->q.qlen > sch->limit ||
q->memory_usage > q->memory_limit) {
struct sk_buff *skb = qdisc_dequeue_internal(sch, false);
- q->cstats.drop_len += qdisc_pkt_len(skb);
+ if (q->cstats.drop_count) {
+ dropped_qlen += q->cstats.drop_count;
+ dropped_backlog += q->cstats.drop_len;
+ }
+
rtnl_kfree_skbs(skb, skb);
- q->cstats.drop_count++;
}
- qdisc_tree_reduce_backlog(sch, q->cstats.drop_count, q->cstats.drop_len);
- q->cstats.drop_count = 0;
- q->cstats.drop_len = 0;
+ qdisc_tree_reduce_backlog(sch, prev_qlen - dropped_qlen - sch->q.qlen,
+ prev_backlog - dropped_backlog - sch->qstats.backlog);
sch_tree_unlock(sch);
return 0;
diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c
index b0e34daf1f75..8f49e9ff4f4c 100644
--- a/net/sched/sch_fq_pie.c
+++ b/net/sched/sch_fq_pie.c
@@ -289,8 +289,7 @@ static int fq_pie_change(struct Qdisc *sch, struct nlattr *opt,
{
struct fq_pie_sched_data *q = qdisc_priv(sch);
struct nlattr *tb[TCA_FQ_PIE_MAX + 1];
- unsigned int len_dropped = 0;
- unsigned int num_dropped = 0;
+ unsigned int prev_qlen, prev_backlog;
int err;
err = nla_parse_nested(tb, TCA_FQ_PIE_MAX, opt, fq_pie_policy, extack);
@@ -365,14 +364,15 @@ static int fq_pie_change(struct Qdisc *sch, struct nlattr *opt,
nla_get_u32(tb[TCA_FQ_PIE_DQ_RATE_ESTIMATOR]));
/* Drop excess packets if new limit is lower */
+ prev_qlen = sch->q.qlen;
+ prev_backlog = sch->qstats.backlog;
while (sch->q.qlen > sch->limit) {
struct sk_buff *skb = qdisc_dequeue_internal(sch, false);
- len_dropped += qdisc_pkt_len(skb);
- num_dropped += 1;
rtnl_kfree_skbs(skb, skb);
}
- qdisc_tree_reduce_backlog(sch, num_dropped, len_dropped);
+ qdisc_tree_reduce_backlog(sch, prev_qlen - sch->q.qlen,
+ prev_backlog - sch->qstats.backlog);
sch_tree_unlock(sch);
return 0;
diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index 5aa434b46707..011d1330aea5 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -509,8 +509,8 @@ static int hhf_change(struct Qdisc *sch, struct nlattr *opt,
struct netlink_ext_ack *extack)
{
struct hhf_sched_data *q = qdisc_priv(sch);
+ unsigned int prev_qlen, prev_backlog;
struct nlattr *tb[TCA_HHF_MAX + 1];
- unsigned int qlen, prev_backlog;
int err;
u64 non_hh_quantum;
u32 new_quantum = q->quantum;
@@ -561,14 +561,14 @@ static int hhf_change(struct Qdisc *sch, struct nlattr *opt,
usecs_to_jiffies(us));
}
- qlen = sch->q.qlen;
+ prev_qlen = sch->q.qlen;
prev_backlog = sch->qstats.backlog;
while (sch->q.qlen > sch->limit) {
struct sk_buff *skb = qdisc_dequeue_internal(sch, false);
rtnl_kfree_skbs(skb, skb);
}
- qdisc_tree_reduce_backlog(sch, qlen - sch->q.qlen,
+ qdisc_tree_reduce_backlog(sch, prev_qlen - sch->q.qlen,
prev_backlog - sch->qstats.backlog);
sch_tree_unlock(sch);
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index ad46ee3ed5a9..af2646545a8a 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -142,8 +142,8 @@ static int pie_change(struct Qdisc *sch, struct nlattr *opt,
struct netlink_ext_ack *extack)
{
struct pie_sched_data *q = qdisc_priv(sch);
+ unsigned int prev_qlen, prev_backlog;
struct nlattr *tb[TCA_PIE_MAX + 1];
- unsigned int qlen, dropped = 0;
int err;
err = nla_parse_nested_deprecated(tb, TCA_PIE_MAX, opt, pie_policy,
@@ -193,15 +193,15 @@ static int pie_change(struct Qdisc *sch, struct nlattr *opt,
nla_get_u32(tb[TCA_PIE_DQ_RATE_ESTIMATOR]));
/* Drop excess packets if new limit is lower */
- qlen = sch->q.qlen;
+ prev_qlen = sch->q.qlen;
+ prev_backlog = sch->qstats.backlog;
while (sch->q.qlen > sch->limit) {
struct sk_buff *skb = qdisc_dequeue_internal(sch, true);
- dropped += qdisc_pkt_len(skb);
- qdisc_qstats_backlog_dec(sch, skb);
rtnl_qdisc_drop(skb, sch);
}
- qdisc_tree_reduce_backlog(sch, qlen - sch->q.qlen, dropped);
+ qdisc_tree_reduce_backlog(sch, prev_qlen - sch->q.qlen,
+ prev_backlog - sch->qstats.backlog);
sch_tree_unlock(sch);
return 0;
--
2.43.0
* [PATCH net 2/2] selftests/tc-testing: Check backlog stats in gso_skb case
From: William Liu @ 2025-07-19 18:08 UTC (permalink / raw)
To: netdev
Cc: jhs, xiyou.wangcong, pabeni, kuba, jiri, davem, edumazet, horms,
savy, William Liu
Add tests to ensure proper backlog accounting in hhf, codel, pie, fq,
fq_pie, and fq_codel qdiscs. For hhf, codel, and pie, we check for
the correct packet count and packet size. For fq, fq_pie, and fq_codel,
we check that the backlog statistics in tbf do not underflow after
removing those qdiscs, which was a symptom of the original bug.
Signed-off-by: William Liu <will@willsroot.io>
Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io>
---
.../tc-testing/tc-tests/infra/qdiscs.json | 201 ++++++++++++++++++
1 file changed, 201 insertions(+)
diff --git a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json
index c6db7fa94f55..867654a31a95 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json
@@ -185,6 +185,207 @@
"$IP addr del 10.10.10.10/24 dev $DUMMY || true"
]
},
+ {
+ "id": "34c0",
+ "name": "Test TBF with HHF Backlog Accounting in gso_skb case",
+ "category": [
+ "qdisc",
+ "tbf",
+ "hhf"
+ ],
+ "plugins": {
+ "requires": [
+ "nsPlugin",
+ "scapyPlugin"
+ ]
+ },
+ "setup": [
+ "$IP link set dev $DUMMY up || true",
+ "$IP addr add 10.10.11.10/24 dev $DUMMY || true",
+ "$TC qdisc add dev $DUMMY root handle 1: tbf rate 8bit burst 100b latency 100ms",
+ "$TC qdisc replace dev $DUMMY handle 2: parent 1:1 hhf limit 1000",
+ [
+ "ping -I $DUMMY -f -c8 -s32 -W0.001 10.10.11.11",
+ 1
+ ]
+ ],
+ "cmdUnderTest": "$TC qdisc change dev $DUMMY handle 2: parent 1:1 hhf limit 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC -s qdisc show dev $DUMMY",
+ "matchPattern": "backlog 74b 1p",
+ "matchCount": "2",
+ "teardown": [
+ "$TC qdisc del dev $DUMMY handle 1: root"
+ ]
+ },
+ {
+ "id": "fd68",
+ "name": "Test TBF with CODEL Backlog Accounting in gso_skb case",
+ "category": [
+ "qdisc",
+ "tbf",
+ "codel"
+ ],
+ "plugins": {
+ "requires": [
+ "nsPlugin",
+ "scapyPlugin"
+ ]
+ },
+ "setup": [
+ "$IP link set dev $DUMMY up || true",
+ "$IP addr add 10.10.11.10/24 dev $DUMMY || true",
+ "$TC qdisc add dev $DUMMY root handle 1: tbf rate 8bit burst 100b latency 100ms",
+ "$TC qdisc replace dev $DUMMY handle 2: parent 1:1 codel limit 1000",
+ [
+ "ping -I $DUMMY -f -c8 -s32 -W0.001 10.10.11.11",
+ 1
+ ]
+ ],
+ "cmdUnderTest": "$TC qdisc change dev $DUMMY handle 2: parent 1:1 codel limit 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC -s qdisc show dev $DUMMY",
+ "matchPattern": "backlog 74b 1p",
+ "matchCount": "2",
+ "teardown": [
+ "$TC qdisc del dev $DUMMY handle 1: root"
+ ]
+ },
+ {
+ "id": "514e",
+ "name": "Test TBF with PIE Backlog Accounting in gso_skb case",
+ "category": [
+ "qdisc",
+ "tbf",
+ "pie"
+ ],
+ "plugins": {
+ "requires": [
+ "nsPlugin",
+ "scapyPlugin"
+ ]
+ },
+ "setup": [
+ "$IP link set dev $DUMMY up || true",
+ "$IP addr add 10.10.11.10/24 dev $DUMMY || true",
+ "$TC qdisc add dev $DUMMY root handle 1: tbf rate 8bit burst 100b latency 100ms",
+ "$TC qdisc replace dev $DUMMY handle 2: parent 1:1 pie limit 1000",
+ [
+ "ping -I $DUMMY -f -c8 -s32 -W0.001 10.10.11.11",
+ 1
+ ]
+ ],
+ "cmdUnderTest": "$TC qdisc change dev $DUMMY handle 2: parent 1:1 pie limit 1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC -s qdisc show dev $DUMMY",
+ "matchPattern": "backlog 74b 1p",
+ "matchCount": "2",
+ "teardown": [
+ "$TC qdisc del dev $DUMMY handle 1: root"
+ ]
+ },
+ {
+ "id": "6c97",
+ "name": "Test TBF with FQ Backlog Accounting in gso_skb case against underflow",
+ "category": [
+ "qdisc",
+ "tbf",
+ "fq"
+ ],
+ "plugins": {
+ "requires": [
+ "nsPlugin",
+ "scapyPlugin"
+ ]
+ },
+ "setup": [
+ "$IP link set dev $DUMMY up || true",
+ "$IP addr add 10.10.11.10/24 dev $DUMMY || true",
+ "$TC qdisc add dev $DUMMY root handle 1: tbf rate 8bit burst 100b latency 100ms",
+ "$TC qdisc replace dev $DUMMY handle 2: parent 1:1 fq limit 1000",
+ [
+ "ping -I $DUMMY -f -c8 -s32 -W0.001 10.10.11.11",
+ 1
+ ],
+ "$TC qdisc change dev $DUMMY handle 2: parent 1:1 fq limit 1"
+ ],
+ "cmdUnderTest": "$TC qdisc del dev $DUMMY handle 2: parent 1:1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC -s qdisc show dev $DUMMY",
+ "matchPattern": "backlog 0b 0p",
+ "matchCount": "1",
+ "teardown": [
+ "$TC qdisc del dev $DUMMY handle 1: root"
+ ]
+ },
+ {
+ "id": "5d0b",
+ "name": "Test TBF with FQ_CODEL Backlog Accounting in gso_skb case against underflow",
+ "category": [
+ "qdisc",
+ "tbf",
+ "fq_codel"
+ ],
+ "plugins": {
+ "requires": [
+ "nsPlugin",
+ "scapyPlugin"
+ ]
+ },
+ "setup": [
+ "$IP link set dev $DUMMY up || true",
+ "$IP addr add 10.10.11.10/24 dev $DUMMY || true",
+ "$TC qdisc add dev $DUMMY root handle 1: tbf rate 8bit burst 100b latency 100ms",
+ "$TC qdisc replace dev $DUMMY handle 2: parent 1:1 fq_codel limit 1000",
+ [
+ "ping -I $DUMMY -f -c8 -s32 -W0.001 10.10.11.11",
+ 1
+ ],
+ "$TC qdisc change dev $DUMMY handle 2: parent 1:1 fq_codel limit 1"
+ ],
+ "cmdUnderTest": "$TC qdisc del dev $DUMMY handle 2: parent 1:1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC -s qdisc show dev $DUMMY",
+ "matchPattern": "backlog 0b 0p",
+ "matchCount": "1",
+ "teardown": [
+ "$TC qdisc del dev $DUMMY handle 1: root"
+ ]
+ },
+ {
+ "id": "21c3",
+ "name": "Test TBF with FQ_PIE Backlog Accounting in gso_skb case against underflow",
+ "category": [
+ "qdisc",
+ "tbf",
+ "fq_pie"
+ ],
+ "plugins": {
+ "requires": [
+ "nsPlugin",
+ "scapyPlugin"
+ ]
+ },
+ "setup": [
+ "$IP link set dev $DUMMY up || true",
+ "$IP addr add 10.10.11.10/24 dev $DUMMY || true",
+ "$TC qdisc add dev $DUMMY root handle 1: tbf rate 8bit burst 100b latency 100ms",
+ "$TC qdisc replace dev $DUMMY handle 2: parent 1:1 fq_pie limit 1000",
+ [
+ "ping -I $DUMMY -f -c8 -s32 -W0.001 10.10.11.11",
+ 1
+ ],
+ "$TC qdisc change dev $DUMMY handle 2: parent 1:1 fq_pie limit 1"
+ ],
+ "cmdUnderTest": "$TC qdisc del dev $DUMMY handle 2: parent 1:1",
+ "expExitCode": "0",
+ "verifyCmd": "$TC -s qdisc show dev $DUMMY",
+ "matchPattern": "backlog 0b 0p",
+ "matchCount": "1",
+ "teardown": [
+ "$TC qdisc del dev $DUMMY handle 1: root"
+ ]
+ },
{
"id": "a4bb",
"name": "Test FQ_CODEL with HTB parent - force packet drop with empty queue",
--
2.43.0
* Re: [PATCH net 1/2] net/sched: Fix backlog accounting in qdisc_dequeue_internal
From: Paolo Abeni @ 2025-07-22 13:32 UTC (permalink / raw)
To: William Liu, netdev
Cc: jhs, xiyou.wangcong, kuba, jiri, davem, edumazet, horms, savy
On 7/19/25 8:08 PM, William Liu wrote:
> This issue applies to the following qdiscs: hhf, fq, fq_codel, and
> fq_pie, and occurs in their change handlers when adjusting to the new
> limit. The problems lie in the values passed to the subsequent
> qdisc_tree_reduce_backlog call:
>
> 1. When the tbf parent runs out of tokens, skbs of these qdiscs will
> be placed in gso_skb. Their peek handlers are qdisc_peek_dequeued,
> which accounts for both qlen and backlog. However, in the case of
> qdisc_dequeue_internal, ONLY qlen is accounted for when pulling
> from gso_skb. This means that these qdiscs are missing a
> qdisc_qstats_backlog_dec when dropping packets to satisfy the
> new limit in their change handlers.
>
> One can observe this issue with the following (with tc patched to
> support a limit of 0):
>
> export TARGET=fq
> tc qdisc del dev lo root
> tc qdisc add dev lo root handle 1: tbf rate 8bit burst 100b latency 1ms
> tc qdisc replace dev lo handle 3: parent 1:1 $TARGET limit 1000
> echo ''; echo 'add child'; tc -s -d qdisc show dev lo
> ping -I lo -f -c2 -s32 -W0.001 127.0.0.1 2>&1 >/dev/null
> echo ''; echo 'after ping'; tc -s -d qdisc show dev lo
> tc qdisc change dev lo handle 3: parent 1:1 $TARGET limit 0
> echo ''; echo 'after limit drop'; tc -s -d qdisc show dev lo
> tc qdisc replace dev lo handle 2: parent 1:1 sfq
> echo ''; echo 'post graft'; tc -s -d qdisc show dev lo
>
> The second-to-last show command shows 0 packets but a positive
> number (74) of backlog bytes. The problem becomes clearer in the
> last show command, where qdisc_purge_queue triggers
> qdisc_tree_reduce_backlog with the positive backlog and causes an
> underflow in the tbf parent's backlog (the unsigned counter wraps
> to 4096 Mb instead of 0).
>
> 2. fq_codel_change is also wrong in the non-gso_skb case. It tracks
> the amount to drop after the limit adjustment loop through
> cstats.drop_count and cstats.drop_len, but these are also updated
> in fq_codel_dequeue, which resets them every time they are
> non-zero after a call to qdisc_tree_reduce_backlog.
> If the drop path ever occurs in fq_codel_dequeue and
> qdisc_dequeue_internal takes the non-gso_skb path, then we would
> reduce the backlog by an extra packet.
>
> To fix these issues, the code path for all clients of
> qdisc_dequeue_internal (codel, pie, hhf, fq, fq_pie, and fq_codel)
> has been simplified: qdisc_dequeue_internal now handles the backlog
> adjustments for all cases that do not directly use the dequeue
> handler.
>
> Special care is taken for fq_codel_dequeue to account for the
> qdisc_tree_reduce_backlog call in its dequeue handler. The
> cstats reset is moved from the end to the beginning of
> fq_codel_dequeue, so the change handler can use cstats for
> proper backlog reduction accounting. The drop_len and
> drop_count fields are not used elsewhere, so this reordering in
> fq_codel_dequeue is safe.
>
> Fixes: 2d3cbfd6d54a ("net_sched: Flush gso_skb list too during ->change()")
> Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM")
> Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
>
> Signed-off-by: William Liu <will@willsroot.io>
> Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io>
Please avoid blank lines in the tag area, i.e. between 'Fixes' and the SoB.
Also I think this could/should be split into several patches, one for
each affected qdisc.
> diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
> index 2a0f3a513bfa..f9e6d76a1712 100644
> --- a/net/sched/sch_fq_codel.c
> +++ b/net/sched/sch_fq_codel.c
> @@ -286,6 +286,10 @@ static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
> struct fq_codel_flow *flow;
> struct list_head *head;
>
> + /* reset these here, as change needs them for proper accounting */
> + q->cstats.drop_count = 0;
> + q->cstats.drop_len = 0;
> +
> begin:
> head = &q->new_flows;
> if (list_empty(head)) {
> @@ -319,8 +323,6 @@ static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
> if (q->cstats.drop_count) {
Why is this 'if' still needed? Isn't drop_count always 0 here?
/P
* Re: [PATCH net 1/2] net/sched: Fix backlog accounting in qdisc_dequeue_internal
From: William Liu @ 2025-07-22 15:52 UTC (permalink / raw)
To: Paolo Abeni
Cc: netdev, jhs, xiyou.wangcong, kuba, jiri, davem, edumazet, horms,
savy
On Tuesday, July 22nd, 2025 at 1:32 PM, Paolo Abeni <pabeni@redhat.com> wrote:
Thank you for the review!
>
>
> On 7/19/25 8:08 PM, William Liu wrote:
>
> > This issue applies to the following qdiscs: hhf, fq, fq_codel, and
> > fq_pie, and occurs in their change handlers when adjusting to the new
> > limit. The problems lie in the values passed to the subsequent
> > qdisc_tree_reduce_backlog call:
> >
> > 1. When the tbf parent runs out of tokens, skbs of these qdiscs will
> > be placed in gso_skb. Their peek handlers are qdisc_peek_dequeued,
> > which accounts for both qlen and backlog. However, in the case of
> > qdisc_dequeue_internal, ONLY qlen is accounted for when pulling
> > from gso_skb. This means that these qdiscs are missing a
> > qdisc_qstats_backlog_dec when dropping packets to satisfy the
> > new limit in their change handlers.
> >
> > One can observe this issue with the following (with tc patched to
> > support a limit of 0):
> >
> > export TARGET=fq
> > tc qdisc del dev lo root
> > tc qdisc add dev lo root handle 1: tbf rate 8bit burst 100b latency 1ms
> > tc qdisc replace dev lo handle 3: parent 1:1 $TARGET limit 1000
> > echo ''; echo 'add child'; tc -s -d qdisc show dev lo
> > ping -I lo -f -c2 -s32 -W0.001 127.0.0.1 2>&1 >/dev/null
> > echo ''; echo 'after ping'; tc -s -d qdisc show dev lo
> > tc qdisc change dev lo handle 3: parent 1:1 $TARGET limit 0
> > echo ''; echo 'after limit drop'; tc -s -d qdisc show dev lo
> > tc qdisc replace dev lo handle 2: parent 1:1 sfq
> > echo ''; echo 'post graft'; tc -s -d qdisc show dev lo
> >
> > The second-to-last show command shows 0 packets but a positive
> > number (74) of backlog bytes. The problem becomes clearer in the
> > last show command, where qdisc_purge_queue triggers
> > qdisc_tree_reduce_backlog with the positive backlog and causes an
> > underflow in the tbf parent's backlog (the unsigned counter wraps
> > to 4096 Mb instead of 0).
> >
> > 2. fq_codel_change is also wrong in the non-gso_skb case. It tracks
> > the amount to drop after the limit adjustment loop through
> > cstats.drop_count and cstats.drop_len, but these are also updated
> > in fq_codel_dequeue, which resets them every time they are
> > non-zero after a call to qdisc_tree_reduce_backlog.
> > If the drop path ever occurs in fq_codel_dequeue and
> > qdisc_dequeue_internal takes the non-gso_skb path, then we would
> > reduce the backlog by an extra packet.
> >
> > To fix these issues, the code path for all clients of
> > qdisc_dequeue_internal (codel, pie, hhf, fq, fq_pie, and fq_codel)
> > has been simplified: qdisc_dequeue_internal now handles the backlog
> > adjustments for all cases that do not directly use the dequeue
> > handler.
> >
> > Special care is taken for fq_codel_dequeue to account for the
> > qdisc_tree_reduce_backlog call in its dequeue handler. The
> > cstats reset is moved from the end to the beginning of
> > fq_codel_dequeue, so the change handler can use cstats for
> > proper backlog reduction accounting. The drop_len and
> > drop_count fields are not used elsewhere, so this reordering in
> > fq_codel_dequeue is safe.
> >
> > Fixes: 2d3cbfd6d54a ("net_sched: Flush gso_skb list too during ->change()")
> > Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM")
> > Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
> >
> > Signed-off-by: William Liu <will@willsroot.io>
> > Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io>
>
>
> Please avoid blank lines in the tag area, i.e. between 'Fixes' and the SoB.
>
Ok noted.
> Also I think this could/should be split into several patches, one for
> each affected qdisc.
>
I considered doing that originally, but the commit that introduced qdisc_dequeue_internal had all the changes in one patch: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2d3cbfd6d54a2c39ce3244f33f85c595844bd7b8
Plus, splitting this change up per affected qdisc, along with the change in qdisc_dequeue_internal, would leave codel and pie with broken backlog accounting unless I group them into the same patch as the qdisc_dequeue_internal change. Should I still split it in this case?
> > diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
> > index 2a0f3a513bfa..f9e6d76a1712 100644
> > --- a/net/sched/sch_fq_codel.c
> > +++ b/net/sched/sch_fq_codel.c
> > @@ -286,6 +286,10 @@ static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
> > struct fq_codel_flow *flow;
> > struct list_head *head;
> >
> > + /* reset these here, as change needs them for proper accounting */
> > + q->cstats.drop_count = 0;
> > + q->cstats.drop_len = 0;
> > +
> > begin:
> > head = &q->new_flows;
> > if (list_empty(head)) {
> > @@ -319,8 +323,6 @@ static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
> > if (q->cstats.drop_count) {
>
>
> Why is this 'if' still needed? Isn't drop_count always 0 here?
>
No, the codel_dequeue call in the middle can adjust that value.
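For context, a simplified sketch of that path in fq_codel_dequeue
(the exact codel_dequeue signature may differ across kernel
versions):

	skb = codel_dequeue(sch, &sch->qstats.backlog, &q->cparams,
			    &flow->cvars, &q->cstats, qdisc_pkt_len,
			    codel_get_enqueue_time, drop_func,
			    dequeue_func);
	/* codel_dequeue() may drop packets and bump
	 * q->cstats.drop_count, so it can be non-zero by the time we
	 * reach the 'if' quoted above
	 */
	if (q->cstats.drop_count)
		qdisc_tree_reduce_backlog(sch, q->cstats.drop_count,
					  q->cstats.drop_len);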
> /P
Best,
Will