public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Cong Wang <xiyou.wangcong@gmail.com>
To: William Liu <will@willsroot.io>
Cc: "netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	Jiri Pirko <jiri@resnulli.us>,
	Jamal Hadi Salim <jhs@mojatatu.com>, Savy <savy@syst3mfailure.io>,
	Paolo Abeni <pabeni@redhat.com>
Subject: Re: [BUG]  Inconsistency between qlen and backlog in hhf, fq, fq_codel, and fq_pie causing WARNING in qdisc_tree_reduce_backlog
Date: Mon, 7 Jul 2025 11:51:25 -0700	[thread overview]
Message-ID: <aGwXLe8djsE0H3Ed@pop-os.localdomain> (raw)
In-Reply-To: <X6Q8WFnkbyNTRaSQ07hgoBUIihJJdm7GIDvCCY0prplSeVDZPKXquiH2as4hPXAWn1J1a2tyP5RnBm8tjKWNM881yDlMlx0pMg9vioBRY1w=@willsroot.io>

On Sun, Jul 06, 2025 at 01:43:08PM +0000, William Liu wrote:
> On Saturday, July 5th, 2025 at 1:04 AM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> 
> > 
> > 
> > On Wed, Jul 02, 2025 at 08:39:45PM +0000, William Liu wrote:
> > 
> > > Hi,
> > > 
> > > We write to report a bug in qlen and backlog consistency affecting hhf, fq, fq_codel, and fq_pie when acting as a child of tbf. The cause of this bug was introduced by the following fix last month designed to address a null dereference bug caused by gso segmentation and a temporary inconsistency in queue state when tbf peeks at its child while running out of tokens during tbf_dequeue [1]. We actually reported that bug but did not realize the subtle problem in the fix until now. We are aware of bugs with similar symptoms reported by Mingi [3] and Lion [4], but those are of a different root cause (at least what we can see of Mingi's report).
> > 
> > 
> > Thanks for your report.
> > 
> > > This works on the upstream kernel, and we have the following reproducer.
> > > 
> > > ./tc qdisc del dev lo root
> > > ./tc qdisc add dev lo root handle 1: tbf rate 8bit burst 100b latency 1ms || echo TBF
> > > ./tc qdisc add dev lo handle 3: parent 1:1 hhf limit 1000 || echo HH
> > > ping -I lo -f -c1 -s32 -W0.001 127.0.0.1 2>&1 >/dev/null
> > > ./tc qdisc change dev lo handle 3: parent 1:1 hhf limit 0 || echo HH
> > > ./tc qdisc replace dev lo handle 2: parent 1:1 sfq || echo SFQ
> > > 
> > > Note that a patched version of tc that supports 0 limits must be built. The symptom of the bug arises in the WARN_ON_ONCE check in qdisc_tree_reduce_backlog [2], where n is 0. You can replace hhf with fq, fq_codel, and fq_pie to trigger warnings as well, though the success rate may vary.
> > > 
> > > The root cause comes from the newly introduced function qdisc_dequeue_internal, which the change handler will trigger in the affected qdiscs [5]. When dequeuing from a non empty gso in this peek function, only qlen is decremented, and backlog is not considered. The gso insertion is triggered by qdisc_peek_dequeued, which tbf calls for these qdiscs when they are its child.
> > > 
> > > When replacing the qdisc, tbf_graft triggers, and qdisc_purge_queue triggers qdisc_tree_reduce_backlog with the inconsistent values, which one can observe by adding printk to the passed qlen backlog values.
> > 
> > 
> > If I understand you correctly, the problem is the inconsistent behavior
> > between qdisc_purge_queue() and qdisc_dequeue_internal()? And it is
> > because the former does not take care of ->gso_skb?
> > 
> 
> No, there are 2 points of inconsistent behavior. 
> 
> 1. qdisc_dequeue_internal and qdisc_peek_dequeued. In qdisc_peek_dequeued, when a skb comes from the dequeue handler, it gets added to the gso_skb with qlen and backlog increased. In qdisc_dequeue_internal, only qlen is decreased when removing from gso.
> 

Yes, because qlen is handled by qdisc_dequeue_internal()'s callers to
control their loop of sch->limit.


> 2. The dequeue limit loop in change handlers and qdisc_dequeue_internal. Every time those loops call qdisc_dequeue_internal, the loops track their own version of dropped items and total dropped packet sizes, before using those values for qdisc_tree_reduce_backlog. The hhf qdisc is an exception as it just uses a before and after loop in the limit change loop. Would this not lead to double counting in the gso_skb dequeue case for qdisc_dequeue_internal? 

Right, I think hhf qdisc should align with other Qdisc's to track the
qlen.

Note: my patch didn't change its behavior, so this issue existed even
before my patch.

> 
> Also, I took a look at some of the dequeue handlers (which qdisc_dequeue_internal call when direct is false), and fq_codel_dequeue touches the drop_count and drop_len statistics, which the limit adjustment loop also uses. 


Right, this is why it is hard to move the qlen tracking into
qdisc_dequeue_internal().

Thanks.

  reply	other threads:[~2025-07-07 18:51 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-02 20:39 [BUG] Inconsistency between qlen and backlog in hhf, fq, fq_codel, and fq_pie causing WARNING in qdisc_tree_reduce_backlog William Liu
2025-07-05  1:04 ` Cong Wang
2025-07-06 13:43   ` William Liu
2025-07-07 18:51     ` Cong Wang [this message]
2025-07-08 20:21       ` William Liu
2025-07-10 22:40         ` Cong Wang
2025-07-12  0:08           ` William Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aGwXLe8djsE0H3Ed@pop-os.localdomain \
    --to=xiyou.wangcong@gmail.com \
    --cc=jhs@mojatatu.com \
    --cc=jiri@resnulli.us \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=savy@syst3mfailure.io \
    --cc=will@willsroot.io \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox