* [PATCH net 1/2] net/sched: Fix backlog accounting in qdisc_dequeue_internal
From: William Liu @ 2025-07-19 18:08 UTC
  To: netdev
  Cc: jhs, xiyou.wangcong, pabeni, kuba, jiri, davem, edumazet, horms,
	savy, William Liu

This issue applies to the following qdiscs: hhf, fq, fq_codel, and
fq_pie, and occurs in their change handlers when adjusting to the new
limit. Two problems affect the values passed to the subsequent
qdisc_tree_reduce_backlog call:

1. When the tbf parent runs out of tokens, skbs of these qdiscs will
   be placed in gso_skb. Their peek handlers are qdisc_peek_dequeued,
   which accounts for both qlen and backlog. However,
   qdisc_dequeue_internal accounts ONLY for qlen when pulling from
   gso_skb. This means that these qdiscs are missing a
   qdisc_qstats_backlog_dec when dropping packets to satisfy the new
   limit in their change handlers (both helpers are sketched after
   this list).

   One can observe this issue with the following (with tc patched to
   support a limit of 0):

   export TARGET=fq
   tc qdisc del dev lo root
   tc qdisc add dev lo root handle 1: tbf rate 8bit burst 100b latency 1ms
   tc qdisc replace dev lo handle 3: parent 1:1 $TARGET limit 1000
   echo ''; echo 'add child'; tc -s -d qdisc show dev lo
   ping -I lo -f -c2 -s32 -W0.001 127.0.0.1 2>&1 >/dev/null
   echo ''; echo 'after ping'; tc -s -d qdisc show dev lo
   tc qdisc change dev lo handle 3: parent 1:1 $TARGET limit 0
   echo ''; echo 'after limit drop'; tc -s -d qdisc show dev lo
   tc qdisc replace dev lo handle 2: parent 1:1 sfq
   echo ''; echo 'post graft'; tc -s -d qdisc show dev lo

   The second-to-last show command reports 0 packets but a positive
   number (74) of backlog bytes. The problem becomes clearer in the
   last show command, where qdisc_purge_queue triggers
   qdisc_tree_reduce_backlog with the positive backlog and causes an
   underflow in the tbf parent's backlog: the u32 counter wraps from
   0 - 74 to 4294967222 bytes, which tc renders as roughly 4096Mb
   instead of 0.

2. fq_codel_change is also wrong in the non gso_skb case. It tracks
   the amount to drop after the limit adjustment loop through
   cstats.drop_count and cstats.drop_len, but these are also updated
   in fq_codel_dequeue, which resets them whenever they are non-zero
   after reporting them via qdisc_tree_reduce_backlog.
   If the drop path ever runs in fq_codel_dequeue while
   qdisc_dequeue_internal takes the non gso_skb path, we would
   reduce the backlog by an extra packet.
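
For reference, the two helpers at the heart of problem 1 look like
this before this patch (a simplified sketch of
include/net/sch_generic.h, trimmed to the relevant lines):

  /* peek: requeues onto gso_skb and accounts for BOTH counters */
  static inline struct sk_buff *qdisc_peek_dequeued(struct Qdisc *sch)
  {
          struct sk_buff *skb = skb_peek(&sch->gso_skb);

          if (!skb) {
                  skb = sch->dequeue(sch);
                  if (skb) {
                          __skb_queue_head(&sch->gso_skb, skb);
                          /* it's still part of the queue */
                          qdisc_qstats_backlog_inc(sch, skb);
                          sch->q.qlen++;
                  }
          }
          return skb;
  }

  /* dequeue: only qlen is rolled back on the gso_skb path */
  static inline struct sk_buff *qdisc_dequeue_internal(struct Qdisc *sch,
                                                       bool direct)
  {
          struct sk_buff *skb;

          skb = __skb_dequeue(&sch->gso_skb);
          if (skb) {
                  sch->q.qlen--;  /* no qdisc_qstats_backlog_dec here */
                  return skb;
          }
          if (direct)
                  return __qdisc_dequeue_head(&sch->q);
          else
                  return sch->dequeue(sch);
  }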

To fix these issues, the code path for every client of
qdisc_dequeue_internal (codel, pie, hhf, fq, fq_pie, and fq_codel)
has been simplified: qdisc_dequeue_internal now handles the backlog
adjustment for every case that does not go directly through the
dequeue handler.

Special care is taken for fq_codel_dequeue to account for the
qdisc_tree_reduce_backlog call in its dequeue handler. The cstats
reset is moved from the end to the beginning of fq_codel_dequeue, so
the change handler can use cstats for proper backlog reduction
accounting. The drop_len and drop_count fields are not used
elsewhere, so this reordering in fq_codel_dequeue is safe (see the
worked trace below).
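
A worked trace of the resulting accounting in fq_codel_change, with
hypothetical numbers (74-byte packets, gso_skb empty, limit lowered
so the loop drains down to qlen 1):

  /* prev_qlen = 5, prev_backlog = 370
   *
   * iteration 1: qdisc_dequeue_internal() -> fq_codel_dequeue();
   *   codel drops one packet internally (cstats.drop_count = 1,
   *   already reported via qdisc_tree_reduce_backlog on the dequeue
   *   path) and returns one packet
   *   -> dropped_qlen = 1, dropped_backlog = 74
   * iterations 2-3: one packet returned each, no internal drops
   *
   * the loop ends with sch->q.qlen = 1, sch->qstats.backlog = 74:
   *
   *   qdisc_tree_reduce_backlog(sch,
   *       prev_qlen - dropped_qlen - sch->q.qlen,   -> 5 - 1 - 1 = 3
   *       prev_backlog - dropped_backlog
   *                    - sch->qstats.backlog);      -> 370 - 74 - 74 = 222
   *
   * so the drop already reported inside fq_codel_dequeue() is not
   * counted twice.
   */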

Fixes: 2d3cbfd6d54a ("net_sched: Flush gso_skb list too during ->change()")
Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM")
Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")

Signed-off-by: William Liu <will@willsroot.io>
Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io>
---
 include/net/sch_generic.h |  9 +++++++--
 net/sched/sch_codel.c     | 10 +++++-----
 net/sched/sch_fq.c        | 14 +++++++-------
 net/sched/sch_fq_codel.c  | 22 +++++++++++++++-------
 net/sched/sch_fq_pie.c    | 10 +++++-----
 net/sched/sch_hhf.c       |  6 +++---
 net/sched/sch_pie.c       | 10 +++++-----
 7 files changed, 47 insertions(+), 34 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 638948be4c50..a24094a638dc 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -1038,10 +1038,15 @@ static inline struct sk_buff *qdisc_dequeue_internal(struct Qdisc *sch, bool dir
 	skb = __skb_dequeue(&sch->gso_skb);
 	if (skb) {
 		sch->q.qlen--;
+		qdisc_qstats_backlog_dec(sch, skb);
+		return skb;
+	}
+	if (direct) {
+		skb = __qdisc_dequeue_head(&sch->q);
+		if (skb)
+			qdisc_qstats_backlog_dec(sch, skb);
 		return skb;
 	}
-	if (direct)
-		return __qdisc_dequeue_head(&sch->q);
 	else
 		return sch->dequeue(sch);
 }
diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
index c93761040c6e..8dc467f665bb 100644
--- a/net/sched/sch_codel.c
+++ b/net/sched/sch_codel.c
@@ -103,7 +103,7 @@ static int codel_change(struct Qdisc *sch, struct nlattr *opt,
 {
 	struct codel_sched_data *q = qdisc_priv(sch);
 	struct nlattr *tb[TCA_CODEL_MAX + 1];
-	unsigned int qlen, dropped = 0;
+	unsigned int prev_qlen, prev_backlog;
 	int err;
 
 	err = nla_parse_nested_deprecated(tb, TCA_CODEL_MAX, opt,
@@ -142,15 +142,15 @@ static int codel_change(struct Qdisc *sch, struct nlattr *opt,
 		WRITE_ONCE(q->params.ecn,
 			   !!nla_get_u32(tb[TCA_CODEL_ECN]));
 
-	qlen = sch->q.qlen;
+	prev_qlen = sch->q.qlen;
+	prev_backlog = sch->qstats.backlog;
 	while (sch->q.qlen > sch->limit) {
 		struct sk_buff *skb = qdisc_dequeue_internal(sch, true);
 
-		dropped += qdisc_pkt_len(skb);
-		qdisc_qstats_backlog_dec(sch, skb);
 		rtnl_qdisc_drop(skb, sch);
 	}
-	qdisc_tree_reduce_backlog(sch, qlen - sch->q.qlen, dropped);
+	qdisc_tree_reduce_backlog(sch, prev_qlen - sch->q.qlen,
+				  prev_backlog - sch->qstats.backlog);
 
 	sch_tree_unlock(sch);
 	return 0;
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index 902ff5470607..986e71e3362c 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -1014,10 +1014,10 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt,
 		     struct netlink_ext_ack *extack)
 {
 	struct fq_sched_data *q = qdisc_priv(sch);
+	unsigned int prev_qlen, prev_backlog;
 	struct nlattr *tb[TCA_FQ_MAX + 1];
-	int err, drop_count = 0;
-	unsigned drop_len = 0;
 	u32 fq_log;
+	int err;
 
 	err = nla_parse_nested_deprecated(tb, TCA_FQ_MAX, opt, fq_policy,
 					  NULL);
@@ -1135,16 +1135,16 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt,
 		err = fq_resize(sch, fq_log);
 		sch_tree_lock(sch);
 	}
+
+	prev_qlen = sch->q.qlen;
+	prev_backlog = sch->qstats.backlog;
 	while (sch->q.qlen > sch->limit) {
 		struct sk_buff *skb = qdisc_dequeue_internal(sch, false);
 
-		if (!skb)
-			break;
-		drop_len += qdisc_pkt_len(skb);
 		rtnl_kfree_skbs(skb, skb);
-		drop_count++;
 	}
-	qdisc_tree_reduce_backlog(sch, drop_count, drop_len);
+	qdisc_tree_reduce_backlog(sch, prev_qlen - sch->q.qlen,
+				  prev_backlog - sch->qstats.backlog);
 
 	sch_tree_unlock(sch);
 	return err;
diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
index 2a0f3a513bfa..f9e6d76a1712 100644
--- a/net/sched/sch_fq_codel.c
+++ b/net/sched/sch_fq_codel.c
@@ -286,6 +286,10 @@ static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
 	struct fq_codel_flow *flow;
 	struct list_head *head;
 
+	/* reset these here, as change needs them for proper accounting */
+	q->cstats.drop_count = 0;
+	q->cstats.drop_len = 0;
+
 begin:
 	head = &q->new_flows;
 	if (list_empty(head)) {
@@ -319,8 +323,6 @@ static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
 	if (q->cstats.drop_count) {
 		qdisc_tree_reduce_backlog(sch, q->cstats.drop_count,
 					  q->cstats.drop_len);
-		q->cstats.drop_count = 0;
-		q->cstats.drop_len = 0;
 	}
 	return skb;
 }
@@ -366,8 +368,10 @@ static const struct nla_policy fq_codel_policy[TCA_FQ_CODEL_MAX + 1] = {
 static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt,
 			   struct netlink_ext_ack *extack)
 {
+	unsigned int dropped_qlen = 0, dropped_backlog = 0;
 	struct fq_codel_sched_data *q = qdisc_priv(sch);
 	struct nlattr *tb[TCA_FQ_CODEL_MAX + 1];
+	unsigned int prev_qlen, prev_backlog;
 	u32 quantum = 0;
 	int err;
 
@@ -439,17 +443,21 @@ static int fq_codel_change(struct Qdisc *sch, struct nlattr *opt,
 		WRITE_ONCE(q->memory_limit,
 			   min(1U << 31, nla_get_u32(tb[TCA_FQ_CODEL_MEMORY_LIMIT])));
 
+	prev_qlen = sch->q.qlen;
+	prev_backlog = sch->qstats.backlog;
 	while (sch->q.qlen > sch->limit ||
 	       q->memory_usage > q->memory_limit) {
 		struct sk_buff *skb = qdisc_dequeue_internal(sch, false);
 
-		q->cstats.drop_len += qdisc_pkt_len(skb);
+		if (q->cstats.drop_count) {
+			dropped_qlen += q->cstats.drop_count;
+			dropped_backlog += q->cstats.drop_len;
+		}
+
 		rtnl_kfree_skbs(skb, skb);
-		q->cstats.drop_count++;
 	}
-	qdisc_tree_reduce_backlog(sch, q->cstats.drop_count, q->cstats.drop_len);
-	q->cstats.drop_count = 0;
-	q->cstats.drop_len = 0;
+	qdisc_tree_reduce_backlog(sch, prev_qlen - dropped_qlen - sch->q.qlen,
+				  prev_backlog - dropped_backlog - sch->qstats.backlog);
 
 	sch_tree_unlock(sch);
 	return 0;
diff --git a/net/sched/sch_fq_pie.c b/net/sched/sch_fq_pie.c
index b0e34daf1f75..8f49e9ff4f4c 100644
--- a/net/sched/sch_fq_pie.c
+++ b/net/sched/sch_fq_pie.c
@@ -289,8 +289,7 @@ static int fq_pie_change(struct Qdisc *sch, struct nlattr *opt,
 {
 	struct fq_pie_sched_data *q = qdisc_priv(sch);
 	struct nlattr *tb[TCA_FQ_PIE_MAX + 1];
-	unsigned int len_dropped = 0;
-	unsigned int num_dropped = 0;
+	unsigned int prev_qlen, prev_backlog;
 	int err;
 
 	err = nla_parse_nested(tb, TCA_FQ_PIE_MAX, opt, fq_pie_policy, extack);
@@ -365,14 +364,15 @@ static int fq_pie_change(struct Qdisc *sch, struct nlattr *opt,
 			   nla_get_u32(tb[TCA_FQ_PIE_DQ_RATE_ESTIMATOR]));
 
 	/* Drop excess packets if new limit is lower */
+	prev_qlen = sch->q.qlen;
+	prev_backlog = sch->qstats.backlog;
 	while (sch->q.qlen > sch->limit) {
 		struct sk_buff *skb = qdisc_dequeue_internal(sch, false);
 
-		len_dropped += qdisc_pkt_len(skb);
-		num_dropped += 1;
 		rtnl_kfree_skbs(skb, skb);
 	}
-	qdisc_tree_reduce_backlog(sch, num_dropped, len_dropped);
+	qdisc_tree_reduce_backlog(sch, prev_qlen - sch->q.qlen,
+				  prev_backlog - sch->qstats.backlog);
 
 	sch_tree_unlock(sch);
 	return 0;
diff --git a/net/sched/sch_hhf.c b/net/sched/sch_hhf.c
index 5aa434b46707..011d1330aea5 100644
--- a/net/sched/sch_hhf.c
+++ b/net/sched/sch_hhf.c
@@ -509,8 +509,8 @@ static int hhf_change(struct Qdisc *sch, struct nlattr *opt,
 		      struct netlink_ext_ack *extack)
 {
 	struct hhf_sched_data *q = qdisc_priv(sch);
+	unsigned int prev_qlen, prev_backlog;
 	struct nlattr *tb[TCA_HHF_MAX + 1];
-	unsigned int qlen, prev_backlog;
 	int err;
 	u64 non_hh_quantum;
 	u32 new_quantum = q->quantum;
@@ -561,14 +561,14 @@ static int hhf_change(struct Qdisc *sch, struct nlattr *opt,
 			   usecs_to_jiffies(us));
 	}
 
-	qlen = sch->q.qlen;
+	prev_qlen = sch->q.qlen;
 	prev_backlog = sch->qstats.backlog;
 	while (sch->q.qlen > sch->limit) {
 		struct sk_buff *skb = qdisc_dequeue_internal(sch, false);
 
 		rtnl_kfree_skbs(skb, skb);
 	}
-	qdisc_tree_reduce_backlog(sch, qlen - sch->q.qlen,
+	qdisc_tree_reduce_backlog(sch, prev_qlen - sch->q.qlen,
 				  prev_backlog - sch->qstats.backlog);
 
 	sch_tree_unlock(sch);
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index ad46ee3ed5a9..af2646545a8a 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -142,8 +142,8 @@ static int pie_change(struct Qdisc *sch, struct nlattr *opt,
 		      struct netlink_ext_ack *extack)
 {
 	struct pie_sched_data *q = qdisc_priv(sch);
+	unsigned int prev_qlen, prev_backlog;
 	struct nlattr *tb[TCA_PIE_MAX + 1];
-	unsigned int qlen, dropped = 0;
 	int err;
 
 	err = nla_parse_nested_deprecated(tb, TCA_PIE_MAX, opt, pie_policy,
@@ -193,15 +193,15 @@ static int pie_change(struct Qdisc *sch, struct nlattr *opt,
 			   nla_get_u32(tb[TCA_PIE_DQ_RATE_ESTIMATOR]));
 
 	/* Drop excess packets if new limit is lower */
-	qlen = sch->q.qlen;
+	prev_qlen = sch->q.qlen;
+	prev_backlog = sch->qstats.backlog;
 	while (sch->q.qlen > sch->limit) {
 		struct sk_buff *skb = qdisc_dequeue_internal(sch, true);
 
-		dropped += qdisc_pkt_len(skb);
-		qdisc_qstats_backlog_dec(sch, skb);
 		rtnl_qdisc_drop(skb, sch);
 	}
-	qdisc_tree_reduce_backlog(sch, qlen - sch->q.qlen, dropped);
+	qdisc_tree_reduce_backlog(sch, prev_qlen - sch->q.qlen,
+				  prev_backlog - sch->qstats.backlog);
 
 	sch_tree_unlock(sch);
 	return 0;
-- 
2.43.0




* [PATCH net 2/2] selftests/tc-testing: Check backlog stats in gso_skb case
From: William Liu @ 2025-07-19 18:08 UTC
  To: netdev
  Cc: jhs, xiyou.wangcong, pabeni, kuba, jiri, davem, edumazet, horms,
	savy, William Liu

Add tests to ensure proper backlog accounting in the hhf, codel, pie,
fq, fq_pie, and fq_codel qdiscs. For hhf, codel, and pie, we check for
the correct packet count and packet size. For fq, fq_pie, and
fq_codel, we check that the backlog statistics in tbf do not underflow
after removing those qdiscs, which was a symptom of the original bug.
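
The new cases can be run individually with tdc, e.g. (from
tools/testing/selftests/tc-testing; the exact invocation may vary by
tree):

  ./tdc.py -f tc-tests/infra/qdiscs.json -e 34c0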

Signed-off-by: William Liu <will@willsroot.io>
Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io>
---
 .../tc-testing/tc-tests/infra/qdiscs.json     | 201 ++++++++++++++++++
 1 file changed, 201 insertions(+)

diff --git a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json
index c6db7fa94f55..867654a31a95 100644
--- a/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json
+++ b/tools/testing/selftests/tc-testing/tc-tests/infra/qdiscs.json
@@ -185,6 +185,207 @@
             "$IP addr del 10.10.10.10/24 dev $DUMMY || true"
         ]
     },
+    {
+        "id": "34c0",
+        "name": "Test TBF with HHF Backlog Accounting in gso_skb case",
+        "category": [
+            "qdisc",
+            "tbf",
+            "hhf"
+        ],
+        "plugins": {
+            "requires": [
+                "nsPlugin",
+                "scapyPlugin"
+            ]
+        },
+        "setup": [
+            "$IP link set dev $DUMMY up || true",
+            "$IP addr add 10.10.11.10/24 dev $DUMMY || true",
+            "$TC qdisc add dev $DUMMY root handle 1: tbf rate 8bit burst 100b latency 100ms",
+            "$TC qdisc replace dev $DUMMY handle 2: parent 1:1 hhf limit 1000",
+            [
+                "ping -I $DUMMY  -f -c8 -s32 -W0.001 10.10.11.11",
+                1
+            ]
+        ],
+        "cmdUnderTest": "$TC qdisc change dev $DUMMY handle 2: parent 1:1 hhf limit 1",
+        "expExitCode": "0",
+        "verifyCmd": "$TC -s qdisc show dev $DUMMY",
+        "matchPattern": "backlog 74b 1p",
+        "matchCount": "2",
+        "teardown": [
+            "$TC qdisc del dev $DUMMY handle 1: root"
+        ]
+    },
+    {
+        "id": "fd68",
+        "name": "Test TBF with CODEL Backlog Accounting in gso_skb case",
+        "category": [
+            "qdisc",
+            "tbf",
+            "codel"
+        ],
+        "plugins": {
+            "requires": [
+                "nsPlugin",
+                "scapyPlugin"
+            ]
+        },
+        "setup": [
+            "$IP link set dev $DUMMY up || true",
+            "$IP addr add 10.10.11.10/24 dev $DUMMY || true",
+            "$TC qdisc add dev $DUMMY root handle 1: tbf rate 8bit burst 100b latency 100ms",
+            "$TC qdisc replace dev $DUMMY handle 2: parent 1:1 codel limit 1000",
+            [
+                "ping -I $DUMMY  -f -c8 -s32 -W0.001 10.10.11.11",
+                1
+            ]
+        ],
+        "cmdUnderTest": "$TC qdisc change dev $DUMMY handle 2: parent 1:1 codel limit 1",
+        "expExitCode": "0",
+        "verifyCmd": "$TC -s qdisc show dev $DUMMY",
+        "matchPattern": "backlog 74b 1p",
+        "matchCount": "2",
+        "teardown": [
+            "$TC qdisc del dev $DUMMY handle 1: root"
+        ]
+    },
+    {
+        "id": "514e",
+        "name": "Test TBF with PIE Backlog Accounting in gso_skb case",
+        "category": [
+            "qdisc",
+            "tbf",
+            "pie"
+        ],
+        "plugins": {
+            "requires": [
+                "nsPlugin",
+                "scapyPlugin"
+            ]
+        },
+        "setup": [
+            "$IP link set dev $DUMMY up || true",
+            "$IP addr add 10.10.11.10/24 dev $DUMMY || true",
+            "$TC qdisc add dev $DUMMY root handle 1: tbf rate 8bit burst 100b latency 100ms",
+            "$TC qdisc replace dev $DUMMY handle 2: parent 1:1 pie limit 1000",
+            [
+                "ping -I $DUMMY  -f -c8 -s32 -W0.001 10.10.11.11",
+                1
+            ]
+        ],
+        "cmdUnderTest": "$TC qdisc change dev $DUMMY handle 2: parent 1:1 pie limit 1",
+        "expExitCode": "0",
+        "verifyCmd": "$TC -s qdisc show dev $DUMMY",
+        "matchPattern": "backlog 74b 1p",
+        "matchCount": "2",
+        "teardown": [
+            "$TC qdisc del dev $DUMMY handle 1: root"
+        ]
+    },
+    {
+        "id": "6c97",
+        "name": "Test TBF with FQ Backlog Accounting in gso_skb case against underflow",
+        "category": [
+            "qdisc",
+            "tbf",
+            "fq"
+        ],
+        "plugins": {
+            "requires": [
+                "nsPlugin",
+                "scapyPlugin"
+            ]
+        },
+        "setup": [
+            "$IP link set dev $DUMMY up || true",
+            "$IP addr add 10.10.11.10/24 dev $DUMMY || true",
+            "$TC qdisc add dev $DUMMY root handle 1: tbf rate 8bit burst 100b latency 100ms",
+            "$TC qdisc replace dev $DUMMY handle 2: parent 1:1 fq limit 1000",
+            [
+                "ping -I $DUMMY  -f -c8 -s32 -W0.001 10.10.11.11",
+                1
+            ],
+            "$TC qdisc change dev $DUMMY handle 2: parent 1:1 fq limit 1"
+        ],
+        "cmdUnderTest": "$TC qdisc del dev $DUMMY handle 2: parent 1:1",
+        "expExitCode": "0",
+        "verifyCmd": "$TC -s qdisc show dev $DUMMY",
+        "matchPattern": "backlog 0b 0p",
+        "matchCount": "1",
+        "teardown": [
+            "$TC qdisc del dev $DUMMY handle 1: root"
+        ]
+    },
+    {
+        "id": "5d0b",
+        "name": "Test TBF with FQ_CODEL Backlog Accounting in gso_skb case against underflow",
+        "category": [
+            "qdisc",
+            "tbf",
+            "fq_codel"
+        ],
+        "plugins": {
+            "requires": [
+                "nsPlugin",
+                "scapyPlugin"
+            ]
+        },
+        "setup": [
+            "$IP link set dev $DUMMY up || true",
+            "$IP addr add 10.10.11.10/24 dev $DUMMY || true",
+            "$TC qdisc add dev $DUMMY root handle 1: tbf rate 8bit burst 100b latency 100ms",
+            "$TC qdisc replace dev $DUMMY handle 2: parent 1:1 fq_codel limit 1000",
+            [
+                "ping -I $DUMMY  -f -c8 -s32 -W0.001 10.10.11.11",
+                1
+            ],
+            "$TC qdisc change dev $DUMMY handle 2: parent 1:1 fq_codel limit 1"
+        ],
+        "cmdUnderTest": "$TC qdisc del dev $DUMMY handle 2: parent 1:1",
+        "expExitCode": "0",
+        "verifyCmd": "$TC -s qdisc show dev $DUMMY",
+        "matchPattern": "backlog 0b 0p",
+        "matchCount": "1",
+        "teardown": [
+            "$TC qdisc del dev $DUMMY handle 1: root"
+        ]
+    },
+    {
+        "id": "21c3",
+        "name": "Test TBF with FQ_PIE Backlog Accounting in gso_skb case against underflow",
+        "category": [
+            "qdisc",
+            "tbf",
+            "fq_pie"
+        ],
+        "plugins": {
+            "requires": [
+                "nsPlugin",
+                "scapyPlugin"
+            ]
+        },
+        "setup": [
+            "$IP link set dev $DUMMY up || true",
+            "$IP addr add 10.10.11.10/24 dev $DUMMY || true",
+            "$TC qdisc add dev $DUMMY root handle 1: tbf rate 8bit burst 100b latency 100ms",
+            "$TC qdisc replace dev $DUMMY handle 2: parent 1:1 fq_pie limit 1000",
+            [
+                "ping -I $DUMMY  -f -c8 -s32 -W0.001 10.10.11.11",
+                1
+            ],
+            "$TC qdisc change dev $DUMMY handle 2: parent 1:1 fq_pie limit 1"
+        ],
+        "cmdUnderTest": "$TC qdisc del dev $DUMMY handle 2: parent 1:1",
+        "expExitCode": "0",
+        "verifyCmd": "$TC -s qdisc show dev $DUMMY",
+        "matchPattern": "backlog 0b 0p",
+        "matchCount": "1",
+        "teardown": [
+            "$TC qdisc del dev $DUMMY handle 1: root"
+        ]
+    },
     {
         "id": "a4bb",
         "name": "Test FQ_CODEL with HTB parent - force packet drop with empty queue",
-- 
2.43.0




* Re: [PATCH net 1/2] net/sched: Fix backlog accounting in qdisc_dequeue_internal
From: Paolo Abeni @ 2025-07-22 13:32 UTC
  To: William Liu, netdev
  Cc: jhs, xiyou.wangcong, kuba, jiri, davem, edumazet, horms, savy

On 7/19/25 8:08 PM, William Liu wrote:
> [...]
> 
> Fixes: 2d3cbfd6d54a ("net_sched: Flush gso_skb list too during ->change()")
> Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM")
> Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
> 
> Signed-off-by: William Liu <will@willsroot.io>
> Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io>

Please avoid blank lines in the tag area, i.e. between 'Fixes' and the SoB.

Also I think this could/should be split into several patches, one for
each affected qdisc.
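
That is, the trailers should form one contiguous block:

Fixes: 2d3cbfd6d54a ("net_sched: Flush gso_skb list too during ->change()")
Fixes: 4b549a2ef4be ("fq_codel: Fair Queue Codel AQM")
Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc")
Signed-off-by: William Liu <will@willsroot.io>
Reviewed-by: Savino Dicanosa <savy@syst3mfailure.io>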

> diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
> index 2a0f3a513bfa..f9e6d76a1712 100644
> --- a/net/sched/sch_fq_codel.c
> +++ b/net/sched/sch_fq_codel.c
> @@ -286,6 +286,10 @@ static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
>  	struct fq_codel_flow *flow;
>  	struct list_head *head;
>  
> +	/* reset these here, as change needs them for proper accounting */
> +	q->cstats.drop_count = 0;
> +	q->cstats.drop_len = 0;
> +
>  begin:
>  	head = &q->new_flows;
>  	if (list_empty(head)) {
> @@ -319,8 +323,6 @@ static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
>  	if (q->cstats.drop_count) {

Why is this 'if' still needed? Isn't drop_count always 0 here?

/P



* Re: [PATCH net 1/2] net/sched: Fix backlog accounting in qdisc_dequeue_internal
From: William Liu @ 2025-07-22 15:52 UTC
  To: Paolo Abeni
  Cc: netdev, jhs, xiyou.wangcong, kuba, jiri, davem, edumazet, horms,
	savy

On Tuesday, July 22nd, 2025 at 1:32 PM, Paolo Abeni <pabeni@redhat.com> wrote:

Thank you for the review!

> 
> 
> On 7/19/25 8:08 PM, William Liu wrote:
> 
> > [...]
> 
> 
> Please avoid blank lines in the tag area, i.e. between 'Fixes' and the SoB.
> 

Ok noted.

> Also I think this could/should be split into several patches, one for
> each affected qdisc.
> 

I considered doing that originally, but the commit that introduced qdisc_dequeue_internal had all the changes in one patch: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2d3cbfd6d54a2c39ce3244f33f85c595844bd7b8

Plus, splitting this change per affected qdisc would leave codel and pie with broken backlog accounting once the qdisc_dequeue_internal change lands, unless I group them into the same patch as the qdisc_dequeue_internal change. Should I still split it in this case?

> > diff --git a/net/sched/sch_fq_codel.c b/net/sched/sch_fq_codel.c
> > index 2a0f3a513bfa..f9e6d76a1712 100644
> > --- a/net/sched/sch_fq_codel.c
> > +++ b/net/sched/sch_fq_codel.c
> > @@ -286,6 +286,10 @@ static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
> >  	struct fq_codel_flow *flow;
> >  	struct list_head *head;
> >  
> > +	/* reset these here, as change needs them for proper accounting */
> > +	q->cstats.drop_count = 0;
> > +	q->cstats.drop_len = 0;
> > +
> >  begin:
> >  	head = &q->new_flows;
> >  	if (list_empty(head)) {
> > @@ -319,8 +323,6 @@ static struct sk_buff *fq_codel_dequeue(struct Qdisc *sch)
> >  	if (q->cstats.drop_count) {
> 
> 
> Why is this 'if' still needed? Isn't drop_count always 0 here?
> 

No, the codel_dequeue call in the middle can adjust that value.
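
Roughly, the relevant part of fq_codel_dequeue() (simplified; see
net/sched/sch_fq_codel.c):

  skb = codel_dequeue(sch, &q->backlogs[flow - q->flows], &q->cparams,
                      &flow->cvars, &q->cstats, qdisc_pkt_len,
                      codel_get_enqueue_time, drop_func, dequeue_func);

codel_dequeue() bumps q->cstats.drop_count/drop_len for every packet
its AQM logic drops, so drop_count can be non-zero at the tail check
even though it was zeroed at function entry.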

> /P

Best,
Will

