* [LARTC] [HTB] htb_dequeue_tree assertion (kernel 2.4.21-ac4)
From: Wilfried Weissmann @ 2003-07-18 19:53 UTC (permalink / raw)
  To: lartc

Hello,

I think the BUG_TRAP() in htb_dequeue_tree() is wrong. First it checks
whether the class pointer "cl" is NULL, which is obviously right. But I
do not understand why it also checks whether the queue length of the
leaf queue, "cl->un.leaf.q->q.qlen", is zero. I would have put that in
the condition of the "if" statement that comes afterwards. A queue
length of 0 is not an error condition that should be reported (please
correct me if I have misunderstood the code).

I can trigger the assertion fairly reliably on a heavily utilized
gigabit Ethernet link when I flush and reload the TC configuration
every 3 seconds. It looks like the error occurs only when configuration
changes are made.

I will run some more tests on Monday, when I am back at the office, to
verify whether or not the queue length is the problem.

bye,
wilfried

static struct sk_buff *
htb_dequeue_tree(struct htb_sched *q,int prio,int level)
{
	struct sk_buff *skb = NULL;
	//struct htb_sched *q = (struct htb_sched *)sch->data;
	struct htb_class *cl,*start;
	/* look initial class up in the row */
	start = cl = htb_lookup_leaf (q->row[level]+prio,prio,q->ptr[level]+prio);
	
	do {
		BUG_TRAP(cl && cl->un.leaf.q->q.qlen); if (!cl) return NULL;
		HTB_DBG(4,1,"htb_deq_tr prio=%d lev=%d cl=%X defic=%d\n",
				prio,level,cl->classid,cl->un.leaf.deficit[level]);
	
		if (likely((skb = cl->un.leaf.q->dequeue(cl->un.leaf.q)) != NULL))
			break;
		if (!cl->warned) {

_______________________________________________
LARTC mailing list / LARTC@mailman.ds9a.nl
http://mailman.ds9a.nl/mailman/listinfo/lartc HOWTO: http://lartc.org/


* Re: [LARTC] [HTB] htb_dequeue_tree assertion (kernel 2.4.21-ac4)
From: devik @ 2003-07-19  9:25 UTC (permalink / raw)
  To: lartc

If you read the comment above htb_dequeue_tree, the function should be
called only when it is certain that there are packets at that
level/prio. That is tracked by another HTB mechanism (the per-level
activity lists).

So the BUG_TRAP is there to catch the case where a class was inserted
into an activity list because it had packets in its sub-qdisc, but has
no packets when we actually decide to dequeue. That is weird: can a
qdisc lose packets even when dequeue was not called?
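
The invariant described above can be sketched in user space. This is
only a toy model, not code from sch_htb.c: the names inv_class,
inv_enqueue, inv_dequeue, inv_purge and inv_holds are hypothetical. It
shows how an "active" flag goes stale when a queue is emptied by any
path other than dequeue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an HTB leaf class; not the kernel struct. */
struct inv_class {
	int qlen;	/* packets waiting in the leaf queue */
	bool active;	/* class is on a per-level activity list */
};

/* Enqueue marks the class active, as the activity lists assume. */
void inv_enqueue(struct inv_class *cl)
{
	cl->qlen++;
	cl->active = true;
}

/* Normal dequeue path: the class is deactivated once it drains. */
bool inv_dequeue(struct inv_class *cl)
{
	assert(cl->qlen >= 0);
	if (cl->qlen == 0)
		return false;
	if (--cl->qlen == 0)
		cl->active = false;
	return true;
}

/* A path that empties the queue behind the scheduler's back (for
 * example a reset or a replaced leaf). The flag is left untouched. */
void inv_purge(struct inv_class *cl)
{
	cl->qlen = 0;
}

/* The condition the BUG_TRAP relies on: active implies non-empty. */
bool inv_holds(const struct inv_class *cl)
{
	return !cl->active || cl->qlen > 0;
}
```

In this model the invariant survives any mix of inv_enqueue() and
inv_dequeue(), and breaks as soon as inv_purge() runs on an active
class.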

-------------------------------
    Martin Devera aka devik
Linux kernel QoS/HTB maintainer
  http://luxik.cdi.cz/~devik/



* Re: [LARTC] [HTB] htb_dequeue_tree assertion (kernel 2.4.21-ac4)
From: Wilfried Weissmann @ 2003-07-19 11:42 UTC (permalink / raw)
  To: lartc

devik wrote:
> If you read comment above htb_dequeue_tree, it should be called
> only when it is sure that there are packets inside of the level/prio.
> It is known by other HTB mechanism (per-level activity lists).
> 
> Thus the bugtrap is to catch case where class was inserted
> into activity list because it had packets in its sub-qdisc
> but when we actually decide to dequeue - it has no packet.
> It is weird - can qdisc lose packets even when dequeue was
> not called ??

Packets can be dropped if you change the depth of the leaf queue, or if
you replace the queue completely. That would also explain why the
assertion only occurs when the configuration is altered.

Greetings,
Wilfried




* Re: [LARTC] [HTB] htb_dequeue_tree assertion (kernel 2.4.21-ac4)
From: devik @ 2003-07-20  7:28 UTC (permalink / raw)
  To: lartc

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1171 bytes --]

> > If you read comment above htb_dequeue_tree, it should be called
> > only when it is sure that there are packets inside of the level/prio.
> > It is known by other HTB mechanism (per-level activity lists).
> >
> > Thus the bugtrap is to catch case where class was inserted
> > into activity list because it had packets in its sub-qdisc
> > but when we actually decide to dequeue - it has no packet.
> > It is weird - can qdisc lose packets even when dequeue was
> > not called ??
>
> If you change the depth of the leave queue then it is possible to drop
> packets or if you completely exchange the queue. Which would also
> explain why the assertion only occurs when the configuration is altered.

Well, I agree that there is something wrong. Now it is necessary to
find a scenario where it happens, so that it can be fixed in the right
way. I do not have much time these days to test these cases, but your
information leads to the following hypothesis:

A class's child qdisc is replaced while the old one still has a nonzero
queue, and a new, empty qdisc is grafted under the class instead. What
about the attached patch? (It is against my latest version, so you may
see offset warnings.)

devik

[-- Attachment #2: Type: TEXT/PLAIN, Size: 472 bytes --]

--- sch_htb.c	2003/07/05 10:37:11	1.21
+++ sch_htb.c	2003/07/20 07:24:59
@@ -1286,6 +1286,10 @@ static int htb_graft(struct Qdisc *sch, 
 					return -ENOBUFS;
 		sch_tree_lock(sch);
 		if ((*old = xchg(&cl->un.leaf.q, new)) != NULL) {
+			/* TODO: test it */
+			if (cl->prio_activity)
+				htb_deactivate ((struct htb_sched*)sch->data,cl);
+
 			/* TODO: is it correct ? Why CBQ doesn't do it ? */
 			sch->q.qlen -= (*old)->q.qlen;	
 			qdisc_reset(*old);
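
The effect of the added htb_deactivate() call can be sketched with a
small user-space model. This is an illustration under assumptions, not
the kernel code: graft_qdisc, graft_leaf and graft_swap are
hypothetical names, and the new queue is assumed to be empty when it is
grafted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a leaf qdisc and its owning class. */
struct graft_qdisc {
	int qlen;
};

struct graft_leaf {
	struct graft_qdisc *q;
	bool active;	/* class is on an activity list */
};

/* Swap in a new (assumed empty) leaf queue and return the old one.
 * Clearing `active` models the htb_deactivate() call in the patch. */
struct graft_qdisc *graft_swap(struct graft_leaf *cl,
			       struct graft_qdisc *new_q)
{
	struct graft_qdisc *old = cl->q;

	assert(new_q != NULL && new_q->qlen == 0);
	cl->q = new_q;
	if (cl->active)
		cl->active = false;	/* nothing left to dequeue */
	if (old)
		old->qlen = 0;		/* models qdisc_reset(*old) */
	return old;
}
```

Without the deactivation step, the class in this model would stay
marked active while pointing at an empty queue.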


* Re: [LARTC] [HTB] htb_dequeue_tree assertion (kernel 2.4.21-ac4)
From: Wilfried Weissmann @ 2003-07-20 20:59 UTC (permalink / raw)
  To: lartc

devik wrote:
>>>If you read comment above htb_dequeue_tree, it should be called
>>>only when it is sure that there are packets inside of the level/prio.
>>>It is known by other HTB mechanism (per-level activity lists).
>>>
>>>Thus the bugtrap is to catch case where class was inserted
>>>into activity list because it had packets in its sub-qdisc
>>>but when we actually decide to dequeue - it has no packet.
>>>It is weird - can qdisc lose packets even when dequeue was
>>>not called ??
>>
>>If you change the depth of the leave queue then it is possible to drop
>>packets or if you completely exchange the queue. Which would also
>>explain why the assertion only occurs when the configuration is altered.
> 
> 
> Well, I agree that there is something wrong. Now it is neccessary to
> find scenario where it does happen so that it is fixed in right way.
> I have not much time these days to test these cases but your informations
> would lead to following hypothesis:
> 
> Classe's child qdisc is replaced while old one still has nonzero queue.
> New empty qdisc is grafted under class instead. What about attached
> patch (it is against my latest version so you can see offset warnings) ?

This would not work if there are several intermediate HTB queues
between the device and the leaf queue. In that case every queue on the
path from the changed queue up to the root has to be notified about the
change. (The setup we want to use involves such a configuration.) Maybe
it is better to just deactivate a class when a dequeue from its leaf
fails because the queue length is zero. If you are concerned about
performance, an audit process could be implemented instead: for
example, check one leaf queue every 64 packets, plus or minus an
initial random offset to add some entropy, similar to the maximum mount
count in the ext2 filesystem. Maybe there are better ways to do this; I
am not so familiar with the code.
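
The audit idea can be sketched as a simple countdown. This is a minimal
user-space illustration under assumptions, not proposed kernel code:
audit_state, audit_init, audit_tick and AUDIT_INTERVAL are hypothetical
names, and the caller is assumed to supply the random offset.

```c
#include <assert.h>
#include <stdbool.h>

#define AUDIT_INTERVAL 64	/* packets between two audits */

struct audit_state {
	unsigned int countdown;	/* packets left until the next audit */
};

/* Place the first audit somewhere in [1, AUDIT_INTERVAL] so that
 * several schedulers do not audit in lockstep. */
void audit_init(struct audit_state *a, unsigned int random_offset)
{
	a->countdown = 1 + random_offset % AUDIT_INTERVAL;
}

/* Call once per dequeued packet; returns true when an audit is due. */
bool audit_tick(struct audit_state *a)
{
	assert(a->countdown > 0);
	if (--a->countdown > 0)
		return false;
	a->countdown = AUDIT_INTERVAL;
	return true;
}
```

When audit_tick() returns true, the caller would check one leaf queue
and deactivate its class if the queue turns out to be empty.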

I will make some tests with the patch tomorrow. If my theory is true 
then it should still help a lot.

bye,
wilfried






* Re: [LARTC] [HTB] htb_dequeue_tree assertion (kernel 2.4.21-ac4)
From: Wilfried.Weissmann @ 2003-07-21  8:49 UTC (permalink / raw)
  To: lartc

> devik wrote:
> >>>If you read comment above htb_dequeue_tree, it should be called
> >>>only when it is sure that there are packets inside of the level/prio.
> >>>It is known by other HTB mechanism (per-level activity lists).
> >>>
> >>>Thus the bugtrap is to catch case where class was inserted
> >>>into activity list because it had packets in its sub-qdisc
> >>>but when we actually decide to dequeue - it has no packet.
> >>>It is weird - can qdisc lose packets even when dequeue was
> >>>not called ??
> >>
> >>If you change the depth of the leave queue then it is possible to drop
> >>packets or if you completely exchange the queue. Which would also
> >>explain why the assertion only occurs when the configuration is altered.

I have now verified that the problem is indeed the zero queue length
and not a NULL class pointer.

> > 
> > 
> > Well, I agree that there is something wrong. Now it is neccessary to
> > find scenario where it does happen so that it is fixed in right way.
> > I have not much time these days to test these cases but your
> informations
> > would lead to following hypothesis:
> > 
> > Classe's child qdisc is replaced while old one still has nonzero queue.
> > New empty qdisc is grafted under class instead. What about attached
> > patch (it is against my latest version so you can see offset warnings) ?
> 
> This would not work if there are several intermediates HTB queues from 
> the device to the leave queue. In this case every queue from the queue 
> that was changed to the root has to be notified about the change. (The 
> setup we want to use involves such a configuration.) Maybe it is better 
> to just deactivate a class when a dequeue from its leave failes due to a 
> zero queue length. If you are concerned about performance then an audit 
> process could be implemented. For example to check one leave queue every 
> 64 packets +/- initial random offset to create some entropy similar to 
> the maximum mount count in the ext2 filesystem. Maybe there are better 
> ways to do this. I am not so familiar with the code.
> 
> I will make some tests with the patch tomorrow. If my theory is true 
> then it should still help a lot.

With the patch applied it is much harder to find ceil settings that
trigger the assertion, but it does not fix the problem. I also got the
following log entry:

HTB: dequeue bug (8,270045,270045), report it please !

Maybe this message is just a side effect of the bug.

Greetings,
Wilfried



* Re: [LARTC] [HTB] htb_dequeue_tree assertion (kernel 2.4.21-ac4)
From: devik @ 2003-07-21  9:10 UTC (permalink / raw)
  To: lartc

Yes, I agree with you regarding the zero queue size. I plan to make a
patch similar to your proposal. I hope it will be ready today.

-------------------------------
    Martin Devera aka devik
Linux kernel QoS/HTB maintainer
  http://luxik.cdi.cz/~devik/



* Re: [LARTC] [HTB] htb_dequeue_tree assertion (kernel 2.4.21-ac4)
From: devik @ 2003-07-23  7:39 UTC (permalink / raw)
  To: lartc

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2972 bytes --]

Hi,

Please try the attached fix (it also includes the previous patch, so
you might get a reject).

-------------------------------
    Martin Devera aka devik
Linux kernel QoS/HTB maintainer
  http://luxik.cdi.cz/~devik/


[-- Attachment #2: Type: TEXT/PLAIN, Size: 1766 bytes --]

--- sch_htb.c	2003/07/05 10:37:11	1.21
+++ sch_htb.c	2003/07/23 07:37:52
@@ -947,15 +947,24 @@ static struct sk_buff *
 htb_dequeue_tree(struct htb_sched *q,int prio,int level)
 {
 	struct sk_buff *skb = NULL;
-	//struct htb_sched *q = (struct htb_sched *)sch->data;
 	struct htb_class *cl,*start;
 	/* look initial class up in the row */
 	start = cl = htb_lookup_leaf (q->row[level]+prio,prio,q->ptr[level]+prio);
 	
 	do {
-		BUG_TRAP(cl && cl->un.leaf.q->q.qlen); if (!cl) return NULL;
+		BUG_TRAP(cl); 
+		if (!cl) return NULL;
 		HTB_DBG(4,1,"htb_deq_tr prio=%d lev=%d cl=%X defic=%d\n",
 				prio,level,cl->classid,cl->un.leaf.deficit[level]);
+
+		/* class can be empty - it is unlikely but can be true if leaf
+		   qdisc drops packets in enqueue routine or if someone used
+		   graft operation on the leaf since last dequeue; 
+		   simply deactivate and skip such class */
+		if (unlikely(cl->un.leaf.q->q.qlen == 0)) {
+			htb_deactivate(q,cl);
+			goto new_lookup;
+		}
 	
 		if (likely((skb = cl->un.leaf.q->dequeue(cl->un.leaf.q)) != NULL)) 
 			break;
@@ -965,6 +974,7 @@ htb_dequeue_tree(struct htb_sched *q,int
 		}
 		q->nwc_hit++;
 		htb_next_rb_node((level?cl->parent->un.inner.ptr:q->ptr[0])+prio);
+new_lookup:
 		cl = htb_lookup_leaf (q->row[level]+prio,prio,q->ptr[level]+prio);
 	} while (cl != start);
 
@@ -1286,6 +1296,10 @@ static int htb_graft(struct Qdisc *sch, 
 					return -ENOBUFS;
 		sch_tree_lock(sch);
 		if ((*old = xchg(&cl->un.leaf.q, new)) != NULL) {
+			/* TODO: test it */
+			if (cl->prio_activity)
+				htb_deactivate ((struct htb_sched*)sch->data,cl);
+
 			/* TODO: is it correct ? Why CBQ doesn't do it ? */
 			sch->q.qlen -= (*old)->q.qlen;	
 			qdisc_reset(*old);


* Re: [LARTC] [HTB] htb_dequeue_tree assertion (kernel 2.4.21-ac4)
From: Wilfried Weissmann @ 2003-07-23 18:35 UTC (permalink / raw)
  To: lartc

devik wrote:
> Hi,
> 
> try attached fix please (it duplicates last one too so that
> you might get a reject).

Thanks, but now the rb_tree may become empty, and this causes an oops
in htb_lookup_leaf() (tree->rb_node == NULL). I think the kernel
crashes in "while ((*sp->pptr)->rb_left)". Catching that case is easy,
but we must not forget to leave the do{}while() loop in
htb_dequeue_tree() when an empty tree is detected.
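
The control flow described above can be sketched in user space. This is
a toy round-robin model under assumptions, not a patch for sch_htb.c:
an array stands in for the rb-tree, and rr_class, rr_sched, rr_lookup
and rr_dequeue are hypothetical names. Stale (empty but active) classes
are deactivated and skipped, and the loop ends once nothing is active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RR_NCLASS 4

struct rr_class {
	int qlen;
	bool active;
};

struct rr_sched {
	struct rr_class cls[RR_NCLASS];
	size_t next;	/* round-robin cursor */
};

/* Return the next active class at or after the cursor, or NULL if no
 * class is active (the "empty tree" case). */
struct rr_class *rr_lookup(struct rr_sched *s)
{
	size_t i;

	for (i = 0; i < RR_NCLASS; i++) {
		struct rr_class *cl = &s->cls[(s->next + i) % RR_NCLASS];

		if (cl->active)
			return cl;
	}
	return NULL;
}

/* Dequeue one packet; returns the index of the class served, or -1. */
int rr_dequeue(struct rr_sched *s)
{
	struct rr_class *cl;

	while ((cl = rr_lookup(s)) != NULL) {
		if (cl->qlen == 0) {
			cl->active = false;	/* stale entry: skip it */
			continue;
		}
		if (--cl->qlen == 0)
			cl->active = false;
		s->next = (size_t)(cl - s->cls + 1) % RR_NCLASS;
		return (int)(cl - s->cls);
	}
	return -1;	/* nothing active: leave the loop, no dereference */
}
```

Each pass through the loop either returns a packet or deactivates one
class, so the loop runs at most RR_NCLASS + 1 times.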

I cannot provide you any patches right now. I will send them tomorrow if 
everything works.

Greetings,
Wilfried



