* Re: ACPI/HT or Packet Scheduler BUG?
  [not found] ` <1113601029.4294.80.camel@localhost.localdomain>
@ 2005-04-15 21:44 ` jamal
  2005-04-15 21:54   ` Steven Rostedt
  0 siblings, 1 reply; 13+ messages in thread
From: jamal @ 2005-04-15 21:44 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: netdev, Tarhon-Onu Victor, kuznet, devik, linux-kernel

Didn't see the beginnings of this thread - please post network-related
questions on netdev instead of lkml.

The real cause seems to be an ARP issue from what I saw in the oops
posted a while back:
--
[4294692.342000] Call Trace:
[4294692.342000]  [<c0104d76>] show_stack+0xa6/0xe0
[4294692.342000]  [<c0104f2b>] show_registers+0x15b/0x1f0
[4294692.342000]  [<c01051a1>] die+0x141/0x2d0
[4294692.342000]  [<c011e13e>] do_page_fault+0x22e/0x6a6
[4294692.342000]  [<c0104817>] error_code+0x4f/0x54
[4294692.342000]  [<c04236da>] qdisc_restart+0xba/0x730
[4294692.342000]  [<c04136fe>] dev_queue_xmit+0x13e/0x640
[4294692.342000]  [<c0454c4c>] arp_solicit+0xfc/0x210
[4294692.342000]  [<c041a6ee>] neigh_timer_handler+0x13e/0x320
[4294692.342000]  [<c0137450>] run_timer_softirq+0x130/0x490
[4294692.342000]  [<c0131ad2>] __do_softirq+0x42/0xa0
[4294692.342000]  [<c01066e1>] do_softirq+0x51/0x60
-----

Is this the same issue?
Can you describe how you create this issue; kernel version etc.

cheers,
jamal

On Fri, 2005-15-04 at 17:37 -0400, Steven Rostedt wrote:
> On Thu, 2005-04-14 at 18:46 +0300, Tarhon-Onu Victor wrote:
> > On Tue, 12 Apr 2005, Tarhon-Onu Victor wrote:
> >
> > > So the problem should be looked for in those changes to the pkt
> > > sched API; the patch containing only those changes is at
> >
> > The bug is in this portion of code from net/sched/sch_generic.c,
> > in the qdisc_destroy() function:
> >
> > ==
> > 	list_for_each_entry(cq, &cql, list)
> > 		list_for_each_entry_safe(q, n, &qdisc->dev->qdisc_list, list)
> > 			if (TC_H_MAJ(q->parent) == TC_H_MAJ(cq->handle)) {
> > 				if (q->ops->cl_ops == NULL)
> > 					list_del_init(&q->list);
> > 				else
> > 					list_move_tail(&q->list, &cql);
> > 			}
> > 	list_for_each_entry_safe(cq, n, &cql, list)
> > 		list_del_init(&cq->list);
> > ==
> >
> > ...and it happens when q->ops->cl_ops is NULL and
> > list_del_init(&q->list) is executed.
> >
> > The stuff from include/linux/list.h looks OK; it seems like one of
> > those two iterations (list_for_each_entry() and
> > list_for_each_entry_safe()) enters an endless loop when an element
> > is removed from the list under some circumstances.
>
> There's a comment above qdisc_destroy that says:
>
> /* Under dev->queue_lock and BH! */
>
> I'm not so sure this is the case. I've included the emails of those
> listed as authors of sch_generic.c and sch_htb.c; hopefully they are
> the ones who can help (if not, sorry to bother you).
>
> The list.h code is fine, but if another task walks the list while the
> list_del_init is done, there's a chance that the reading task gets to
> the deleted item just as it is being deleted and has pointed itself
> at itself: p->next == p. This would go into an infinite loop.
>
> The reason SysRq works is that this doesn't stop interrupts. But put
> a local_irq_save around that list walk and run your test; I bet you
> won't be able to do anything but power off with the big button.
>
> Hope someone can help. I don't know the queue disciplines well enough
> to make a proper fix.
>
> -- Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: ACPI/HT or Packet Scheduler BUG?
  2005-04-15 21:44 ` ACPI/HT or Packet Scheduler BUG? jamal
@ 2005-04-15 21:54 ` Steven Rostedt
  2005-04-15 22:54   ` Thomas Graf
  0 siblings, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2005-04-15 21:54 UTC (permalink / raw)
  To: hadi; +Cc: netdev, Tarhon-Onu Victor, kuznet, devik, linux-kernel

I don't think the issue is the same. This problem is happening to
Tarhon-Onu Victor. I'm including his previous two posts from LKML here
so you can read the whole thing in one message.

He states that the problem started in 2.6.10-rc2, and looking at a diff
between rc1 and rc2, the list walk was added in qdisc_destroy. Take a
look at his scripts and you may see that on an SMP machine this may
have a race condition.

-- Steve

From April 8th:

I am not subscribed to this list so please CC me if you post a reply,
if you need additional info or if you suggest a patch (in which case I
would be very interested).

Occurrence:
- the kernel must be compiled with Hyper-Threading support (and the
  CPU/motherboard must support it);
- a (tc) process is reading statistics from the HTB tree while another
  tries to delete, or deletes, the tree.

The bug will not occur if the system is booted with acpi=off or if it's
not SMP (HT) capable.

I reproduced the bug on 2.6.10-1.770_FCsmp (Fedora Core update package)
and vanilla 2.6.11, 2.6.11.5, 2.6.11.6, 2.6.12-rc1 and 2.6.12-rc2
compiled with SMP and ACPI support in order to enable Hyper-Threading
(with and without preempt, with or without SMT support). The compiler I
used is GCC version 3.4.2 (from Fedora Core 3).

Result: almost all the time the system hangs but can still execute
SysRq commands.

How I tested
~~~~~~~~~~~~
On one console I ran a script that initializes an HTB tree on an
interface (dummy0) in an endless loop:

  while /bin/true; do . htbdown.dummy0.sh; done

...and on another console I ran tc -s, also in an endless loop:

  while /bin/true; do tc -s class sh dev dummy0; done

After a while (sometimes after 2 runs of the htbdown.dummy0.sh script,
sometimes after 20) the system hangs. It still responds to SysRq
commands and I was able to see the two tc processes running. Sometimes
I still get ping replies from it, and one time - just one time in 2
days - I was still able to log in remotely. There are no printk
messages or other warnings or errors printed in the system log or
kernel log. It just hangs, and it seems like something is wasting all
the CPU power: when I was still able to log in I noticed that one of
the two virtual CPUs was at 100% with system interrupts and the other
was at 100% system. I wasn't able to strace either of the two running
tc's.

What I was able to paste with the mouse in my console, to catch in a
typescript, and also the script that initializes the HTB tree on dummy0
can be found at ftp://blackblue.iasi.rdsnet.ro/pub/various/k/ .

The test host is a 3.0GHz Intel Prescott and I first noticed the bug on
a system with a 2.8GHz Intel Northwood, both having motherboards with
an Intel chipset (865GBF). I am not able to test it in other SMP
environments (Intel Xeon or Itanium, AMD Opteron, dual P3, etc.), so
I'm not able to tell if it's an ACPI bug, an SMP bug or a Packet
Scheduler bug.

From April 12th:

On Fri, 8 Apr 2005, Tarhon-Onu Victor wrote:

> I am not subscribed to this list so please CC me if you post a reply,
> if you need additional info or if you suggest a patch (in which case
> I would be very interested).
[snip]
> (Intel Xeon or Itanium, AMD Opteron, dual P3, etc), so I'm not able
> to tell if it's an ACPI bug, an SMP bug or a Packet Scheduler bug.

It seems like this bug is a packet scheduler one and it was introduced
in 2.6.10-rc2. In the summary of changes from 2.6.10-rc1 to 2.6.10-rc2
there are a lot of changes announced for the packet scheduler.

I removed all the changes to the packet scheduler files from the
incremental patch from 2.6.10-rc1 to 2.6.10-rc2, applied it to
2.6.10-rc1, and the resulting 2.6.10-rc2-without-sched-changes does not
hang. So the problem should be looked for in those changes to the pkt
sched API; the patch containing only those changes is at
ftp://blackblue.iasi.rdsnet.ro/pub/various/k/patch-2.6.10-sched_changes-from_rc1-to-rc2.gz

On Fri, 2005-04-15 at 17:44 -0400, jamal wrote:
> Didn't see the beginnings of this thread - please post
> network-related questions on netdev instead of lkml.
>
> The real cause seems to be an ARP issue from what I saw in the oops
> posted a while back:
> --
> [4294692.342000] Call Trace:
> [4294692.342000]  [<c0104d76>] show_stack+0xa6/0xe0
> [4294692.342000]  [<c0104f2b>] show_registers+0x15b/0x1f0
> [4294692.342000]  [<c01051a1>] die+0x141/0x2d0
> [4294692.342000]  [<c011e13e>] do_page_fault+0x22e/0x6a6
> [4294692.342000]  [<c0104817>] error_code+0x4f/0x54
> [4294692.342000]  [<c04236da>] qdisc_restart+0xba/0x730
> [4294692.342000]  [<c04136fe>] dev_queue_xmit+0x13e/0x640
> [4294692.342000]  [<c0454c4c>] arp_solicit+0xfc/0x210
> [4294692.342000]  [<c041a6ee>] neigh_timer_handler+0x13e/0x320
> [4294692.342000]  [<c0137450>] run_timer_softirq+0x130/0x490
> [4294692.342000]  [<c0131ad2>] __do_softirq+0x42/0xa0
> [4294692.342000]  [<c01066e1>] do_softirq+0x51/0x60
> -----
>
> Is this the same issue?
> Can you describe how you create this issue; kernel version etc.
>
> cheers,
> jamal
>
> On Fri, 2005-15-04 at 17:37 -0400, Steven Rostedt wrote:
> > On Thu, 2005-04-14 at 18:46 +0300, Tarhon-Onu Victor wrote:
> > > On Tue, 12 Apr 2005, Tarhon-Onu Victor wrote:
> > >
> > > > So the problem should be looked for in those changes to the pkt
> > > > sched API; the patch containing only those changes is at
> > >
> > > The bug is in this portion of code from net/sched/sch_generic.c,
> > > in the qdisc_destroy() function:
> > >
> > > ==
> > > 	list_for_each_entry(cq, &cql, list)
> > > 		list_for_each_entry_safe(q, n, &qdisc->dev->qdisc_list, list)
> > > 			if (TC_H_MAJ(q->parent) == TC_H_MAJ(cq->handle)) {
> > > 				if (q->ops->cl_ops == NULL)
> > > 					list_del_init(&q->list);
> > > 				else
> > > 					list_move_tail(&q->list, &cql);
> > > 			}
> > > 	list_for_each_entry_safe(cq, n, &cql, list)
> > > 		list_del_init(&cq->list);
> > > ==
> > >
> > > ...and it happens when q->ops->cl_ops is NULL and
> > > list_del_init(&q->list) is executed.
> > >
> > > The stuff from include/linux/list.h looks OK; it seems like one
> > > of those two iterations (list_for_each_entry() and
> > > list_for_each_entry_safe()) enters an endless loop when an
> > > element is removed from the list under some circumstances.
> >
> > There's a comment above qdisc_destroy that says:
> >
> > /* Under dev->queue_lock and BH! */
> >
> > I'm not so sure this is the case. I've included the emails of those
> > listed as authors of sch_generic.c and sch_htb.c; hopefully they
> > are the ones who can help (if not, sorry to bother you).
> >
> > The list.h code is fine, but if another task walks the list while
> > the list_del_init is done, there's a chance that the reading task
> > gets to the deleted item just as it is being deleted and has
> > pointed itself at itself: p->next == p. This would go into an
> > infinite loop.
> >
> > The reason SysRq works is that this doesn't stop interrupts. But
> > put a local_irq_save around that list walk and run your test; I
> > bet you won't be able to do anything but power off with the big
> > button.
> >
> > Hope someone can help. I don't know the queue disciplines well
> > enough to make a proper fix.
> >
> > -- Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: ACPI/HT or Packet Scheduler BUG?
  2005-04-15 21:54 ` Steven Rostedt
@ 2005-04-15 22:54 ` Thomas Graf
  2005-04-16  1:49   ` Herbert Xu
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Graf @ 2005-04-15 22:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: hadi, netdev, Tarhon-Onu Victor, kuznet, devik, linux-kernel,
	Patrick McHardy

* Steven Rostedt <1113602052.4294.89.camel@localhost.localdomain> 2005-04-15 17:54
> > ==
> > 	list_for_each_entry(cq, &cql, list)
> > 		list_for_each_entry_safe(q, n, &qdisc->dev->qdisc_list, list)
> > 			if (TC_H_MAJ(q->parent) == TC_H_MAJ(cq->handle)) {
> > 				if (q->ops->cl_ops == NULL)
> > 					list_del_init(&q->list);
> > 				else
> > 					list_move_tail(&q->list, &cql);
> > 			}
> > 	list_for_each_entry_safe(cq, n, &cql, list)
> > 		list_del_init(&cq->list);
> > ==
> >
> > ...and it happens when q->ops->cl_ops is NULL and
> > list_del_init(&q->list) is executed.
> >
> > The stuff from include/linux/list.h looks OK; it seems like one
> > of those two iterations (list_for_each_entry() and
> > list_for_each_entry_safe()) enters an endless loop when an element
> > is removed from the list under some circumstances.
>
> There's a comment above qdisc_destroy that says:
>
> /* Under dev->queue_lock and BH! */
>
> I'm not so sure this is the case.

It's not; we should change the comment. qdisc_destroy() calls for
inner leaf qdiscs coming from the RCU-scheduled __qdisc_destroy ->
qdisc::destroy() -> qdisc_destroy() will not hold queue_lock, but that
shouldn't be a problem since the qdiscs cannot be found anymore.

Another case where it's not locked is upon deletion of a class, where
the class deletes its inner qdisc; but there is only one way this can
happen and that's under the rtnl semaphore, so there is no way we can
have a dumper running at the same time.

A wild guess would be that the problem is one of the many places with
wrong locking, where dev->queue_lock is acquired but qdisc_tree_lock
is not. tc_dump_qdisc assumed that locking on qdisc_tree_lock is
enough, which is absolutely right; still, most callers of
qdisc_destroy only take dev->queue_lock for the modification of the
tree, which _was_ fine before we used list.h, since the unlinking was
atomic.

This is nothing new, and Patrick's patch in 2.6.10 ought to have fixed
it by unlinking inner leaf qdiscs from dev->qdisc_list and thus
preventing them from being found by anyone during unlocked calls to
qdisc_destroy. So... we might have an unlocked qdisc_destroy call for
a qdisc not properly unlinked from dev->qdisc_list.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: ACPI/HT or Packet Scheduler BUG?
  2005-04-15 22:54 ` Thomas Graf
@ 2005-04-16  1:49 ` Herbert Xu
  2005-04-16  5:01   ` Steven Rostedt
  2005-04-16 11:06   ` Thomas Graf
  0 siblings, 2 replies; 13+ messages in thread
From: Herbert Xu @ 2005-04-16  1:49 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Steven Rostedt, hadi, netdev, Tarhon-Onu Victor, kuznet, devik,
	linux-kernel, Patrick McHardy, David S. Miller

[-- Attachment #1: Type: text/plain, Size: 2081 bytes --]

On Fri, Apr 15, 2005 at 10:54:22PM +0000, Thomas Graf wrote:
>
> Another case where it's not locked is upon deletion of a class, where
> the class deletes its inner qdisc; but there is only one way this can
> happen and that's under the rtnl semaphore, so there is no way we can
> have a dumper running at the same time.

Sorry, that's where tc went astray :)

The assumption that the rtnl is held during dumping is false. It is
only true for the initial dump call. All subsequent dumps are not
protected by the rtnl.

The solution is certainly not to place the dumpers under the rtnl :)
The dump operation is read-only and should not need any exclusive
locks.

In fact the whole qdisc locking is a mess. It's a cross-breed between
locking with ad-hoc reference counting and RCU. What's more, the RCU
is completely useless too, because for the case where we actually have
a queue we still end up taking the spin lock on each transmit! I think
someone's been benchmarking the loopback device again :)

It needs to be reengineered.

Here is a quick'n'dirty fix to the problem at hand. What happened
between 2.6.10-rc1 and 2.6.10-rc2 is that qdisc_destroy started
changing the next pointer of qdisc entries, which totally confuses
the readers because qdisc_destroy doesn't always take the tree lock.

This patch tries to ensure that all top-level calls to qdisc_destroy
come under the tree lock. As Thomas correctly pointed out, most of
the other qdisc_destroy calls occur after the top qdisc has been
unlinked from the device qdisc_list. However, someone should go
through each one of the remaining ones (they're all in the individual
sch_* implementations) and make sure that this assumption is really
true.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

If anyone has cycles to spare and a stomach strong enough for this
stuff, here is your chance :)

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: p --]
[-- Type: text/plain, Size: 966 bytes --]

===== net/sched/sch_api.c 1.47 vs edited =====
--- 1.47/net/sched/sch_api.c	2005-04-01 14:35:56 +10:00
+++ edited/net/sched/sch_api.c	2005-04-16 08:47:16 +10:00
@@ -608,9 +608,9 @@
 			return err;
 		if (q) {
 			qdisc_notify(skb, n, clid, q, NULL);
-			spin_lock_bh(&dev->queue_lock);
+			qdisc_lock_tree(dev);
 			qdisc_destroy(q);
-			spin_unlock_bh(&dev->queue_lock);
+			qdisc_unlock_tree(dev);
 		}
 	} else {
 		qdisc_notify(skb, n, clid, NULL, q);
@@ -743,17 +743,17 @@
 	err = qdisc_graft(dev, p, clid, q, &old_q);
 	if (err) {
 		if (q) {
-			spin_lock_bh(&dev->queue_lock);
+			qdisc_lock_tree(dev);
 			qdisc_destroy(q);
-			spin_unlock_bh(&dev->queue_lock);
+			qdisc_unlock_tree(dev);
 		}
 		return err;
 	}
 	qdisc_notify(skb, n, clid, old_q, q);
 	if (old_q) {
-		spin_lock_bh(&dev->queue_lock);
+		qdisc_lock_tree(dev);
 		qdisc_destroy(old_q);
-		spin_unlock_bh(&dev->queue_lock);
+		qdisc_unlock_tree(dev);
 	}
 }
 return 0;

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: ACPI/HT or Packet Scheduler BUG?
  2005-04-16  1:49 ` Herbert Xu
@ 2005-04-16  5:01 ` Steven Rostedt
  2005-04-16 11:06 ` Thomas Graf
  1 sibling, 0 replies; 13+ messages in thread
From: Steven Rostedt @ 2005-04-16  5:01 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Thomas Graf, hadi, netdev, Tarhon-Onu Victor, kuznet, devik,
	linux-kernel, Patrick McHardy, David S. Miller

On Sat, 2005-04-16 at 11:49 +1000, Herbert Xu wrote:
> Here is a quick'n'dirty fix to the problem at hand. What happened
> between 2.6.10-rc1 and 2.6.10-rc2 is that qdisc_destroy started
> changing the next pointer of qdisc entries, which totally confuses
> the readers because qdisc_destroy doesn't always take the tree lock.
>
> This patch tries to ensure that all top-level calls to qdisc_destroy
> come under the tree lock. As Thomas correctly pointed out, most of
> the other qdisc_destroy calls occur after the top qdisc has been
> unlinked from the device qdisc_list. However, someone should go
> through each one of the remaining ones (they're all in the individual
> sch_* implementations) and make sure that this assumption is really
> true.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
>
> If anyone has cycles to spare and a stomach strong enough for
> this stuff, here is your chance :)

FYI, I ran the test case that Tarhon-Onu had, but had to change his tc
execution from batch to single lines, since the version of tc I have
segfaults on newlines. Anyway, I did see the lockup with 2.6.11.2 after
7 iterations. I applied your patch, and it ran for 30 iterations before
I manually killed it. I didn't test any more than that, but this seems
to be the quick fix for now.

-- Steve

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: ACPI/HT or Packet Scheduler BUG?
  2005-04-16  1:49 ` Herbert Xu
  2005-04-16  5:01 ` Steven Rostedt
@ 2005-04-16 11:06 ` Thomas Graf
  2005-04-16 11:12   ` Herbert Xu
  2005-04-16 11:23   ` Herbert Xu
  1 sibling, 2 replies; 13+ messages in thread
From: Thomas Graf @ 2005-04-16 11:06 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Steven Rostedt, hadi, netdev, Tarhon-Onu Victor, kuznet, devik,
	linux-kernel, Patrick McHardy, David S. Miller

* Herbert Xu <20050416014906.GA3291@gondor.apana.org.au> 2005-04-16 11:49
> On Fri, Apr 15, 2005 at 10:54:22PM +0000, Thomas Graf wrote:
> >
> > Another case where it's not locked is upon deletion of a class,
> > where the class deletes its inner qdisc; but there is only one way
> > this can happen and that's under the rtnl semaphore, so there is no
> > way we can have a dumper running at the same time.
>
> Sorry, that's where tc went astray :)
>
> The assumption that the rtnl is held during dumping is false. It is
> only true for the initial dump call. All subsequent dumps are not
> protected by the rtnl.

Ahh... it's the unlocked subsequent dump calls that are _still_
running when the destroy is invoked. That's where Patrick and I went
wrong when we tried to fix this issue: we set our t0 to qdisc_destroy
and didn't really consider any prior unlocked tasks still running.

> In fact the whole qdisc locking is a mess. It's a cross-breed between
> locking with ad-hoc reference counting and RCU. What's more, the RCU
> is completely useless too, because for the case where we actually
> have a queue we still end up taking the spin lock on each transmit!
> I think someone's been benchmarking the loopback device again :)

It's not completely useless; it speeds up the deletion of classful
qdiscs having some depth. However, it's not worth the locking trouble
I guess.

> Here is a quick'n'dirty fix to the problem at hand.

I think it's pretty clean, but it doesn't solve the problem
completely; see below.

> This patch tries to ensure that all top-level calls to qdisc_destroy
> come under the tree lock. As Thomas correctly pointed out, most of
> the other qdisc_destroy calls occur after the top qdisc has been
> unlinked from the device qdisc_list. However, someone should go
> through each one of the remaining ones (they're all in the individual
> sch_* implementations) and make sure that this assumption is really
> true.

qdisc_destroy can still be invoked without qdisc_tree_lock via the
deletion of a class, when it calls qdisc_destroy to destroy its leaf
qdisc.

> If anyone has cycles to spare and a stomach strong enough for
> this stuff, here is your chance :)

I will look into this.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: ACPI/HT or Packet Scheduler BUG?
  2005-04-16 11:06 ` Thomas Graf
@ 2005-04-16 11:12 ` Herbert Xu
  2005-04-17 17:46   ` Patrick McHardy
  0 siblings, 1 reply; 13+ messages in thread
From: Herbert Xu @ 2005-04-16 11:12 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Steven Rostedt, hadi, netdev, Tarhon-Onu Victor, kuznet, devik,
	linux-kernel, Patrick McHardy, David S. Miller

On Sat, Apr 16, 2005 at 01:06:39PM +0200, Thomas Graf wrote:
>
> qdisc_destroy can still be invoked without qdisc_tree_lock via the
> deletion of a class, when it calls qdisc_destroy to destroy its leaf
> qdisc.

Indeed. Fortunately HTB seems to be safe, as it calls sch_tree_lock,
which is another name for qdisc_tree_lock. CBQ on the other hand needs
a little tweak.

> I will look into this.

Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: ACPI/HT or Packet Scheduler BUG?
  2005-04-16 11:12 ` Herbert Xu
@ 2005-04-17 17:46 ` Patrick McHardy
  2005-04-17 21:37   ` Herbert Xu
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick McHardy @ 2005-04-17 17:46 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Thomas Graf, Steven Rostedt, hadi, netdev, Tarhon-Onu Victor,
	kuznet, devik, linux-kernel, David S. Miller

Herbert Xu wrote:
> On Sat, Apr 16, 2005 at 01:06:39PM +0200, Thomas Graf wrote:
>
> > qdisc_destroy can still be invoked without qdisc_tree_lock via the
> > deletion of a class, when it calls qdisc_destroy to destroy its
> > leaf qdisc.
>
> Indeed. Fortunately HTB seems to be safe, as it calls sch_tree_lock,
> which is another name for qdisc_tree_lock. CBQ on the other hand
> needs a little tweak.

HTB also needs to be fixed. Destruction is usually deferred by the
refcnt until ->put(), and htb_put() doesn't lock the tree. Same for
HFSC and CBQ.

Regards
Patrick

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: ACPI/HT or Packet Scheduler BUG?
  2005-04-17 17:46 ` Patrick McHardy
@ 2005-04-17 21:37 ` Herbert Xu
  0 siblings, 0 replies; 13+ messages in thread
From: Herbert Xu @ 2005-04-17 21:37 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Thomas Graf, Steven Rostedt, hadi, netdev, Tarhon-Onu Victor,
	kuznet, devik, linux-kernel, David S. Miller

On Sun, Apr 17, 2005 at 07:46:16PM +0200, Patrick McHardy wrote:
>
> HTB also needs to be fixed. Destruction is usually deferred by the
> refcnt until ->put(), and htb_put() doesn't lock the tree. Same for
> HFSC and CBQ.

Yes, you're absolutely right.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: ACPI/HT or Packet Scheduler BUG?
  2005-04-16 11:06 ` Thomas Graf
  2005-04-16 11:12 ` Herbert Xu
@ 2005-04-16 11:23 ` Herbert Xu
  2005-04-16 11:34   ` Thomas Graf
  1 sibling, 1 reply; 13+ messages in thread
From: Herbert Xu @ 2005-04-16 11:23 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Steven Rostedt, hadi, netdev, Tarhon-Onu Victor, kuznet, devik,
	linux-kernel, Patrick McHardy, David S. Miller

On Sat, Apr 16, 2005 at 01:06:39PM +0200, Thomas Graf wrote:
>
> It's not completely useless; it speeds up the deletion of classful
> qdiscs having some depth. However, it's not worth the locking trouble
> I guess.

RCU is meant to optimise the common reader path. In this case that's
the packet transmission code. Unfortunately it fails miserably when
judged by that criterion.
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: ACPI/HT or Packet Scheduler BUG?
  2005-04-16 11:23 ` Herbert Xu
@ 2005-04-16 11:34 ` Thomas Graf
  2005-04-16 16:04   ` jamal
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Graf @ 2005-04-16 11:34 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Steven Rostedt, hadi, netdev, Tarhon-Onu Victor, kuznet, devik,
	linux-kernel, Patrick McHardy, David S. Miller

* Herbert Xu <20050416112329.GA31847@gondor.apana.org.au> 2005-04-16 21:23
> On Sat, Apr 16, 2005 at 01:06:39PM +0200, Thomas Graf wrote:
> >
> > It's not completely useless; it speeds up the deletion of classful
> > qdiscs having some depth. However, it's not worth the locking
> > trouble I guess.
>
> RCU is meant to optimise the common reader path. In this case that's
> the packet transmission code. Unfortunately it fails miserably when
> judged by that criterion.

There is one case where it can help latency, which is per-flow qdiscs
or any other scenario implying hundreds or thousands of leaf qdiscs,
where the destruction of one such qdisc tree will take up quite some
CPU to traverse all the classes under dev->queue_lock. I don't have
any numbers on this, but I don't completely dislike the method of
hiding the qdiscs under the lock and doing the expensive traversal
unlocked.

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: ACPI/HT or Packet Scheduler BUG?
  2005-04-16 11:34 ` Thomas Graf
@ 2005-04-16 16:04 ` jamal
  2005-04-16 18:21   ` Thomas Graf
  0 siblings, 1 reply; 13+ messages in thread
From: jamal @ 2005-04-16 16:04 UTC (permalink / raw)
  To: Thomas Graf
  Cc: Herbert Xu, Steven Rostedt, netdev, Tarhon-Onu Victor, kuznet,
	devik, linux-kernel, Patrick McHardy, David S. Miller

On Sat, 2005-16-04 at 13:34 +0200, Thomas Graf wrote:
> * Herbert Xu <20050416112329.GA31847@gondor.apana.org.au> 2005-04-16 21:23
> > On Sat, Apr 16, 2005 at 01:06:39PM +0200, Thomas Graf wrote:
> > >
> > > It's not completely useless; it speeds up the deletion of classful
> > > qdiscs having some depth. However, it's not worth the locking
> > > trouble I guess.
> >
> > RCU is meant to optimise the common reader path. In this case that's
> > the packet transmission code. Unfortunately it fails miserably when
> > judged by that criterion.
>
> There is one case where it can help latency, which is per-flow qdiscs
> or any other scenario implying hundreds or thousands of leaf qdiscs,
> where the destruction of one such qdisc tree will take up quite some
> CPU to traverse all the classes under dev->queue_lock. I don't have
> any numbers on this, but I don't completely dislike the method of
> hiding the qdiscs under the lock and doing the expensive traversal
> unlocked.

The rule of "optimize for the common case" fails miserably here
because this is not a common case/usage of qdiscs. I have a feeling,
though, that the patch went in due to dude-optimizing-loopback, as
pointed out by Herbert. It could also be that it was done because
RCU-is-so-cool; I don't recall. Maybe it's worth reverting to the
earlier scheme if it is going to continue to be problematic.

cheers,
jamal

^ permalink raw reply	[flat|nested] 13+ messages in thread
* Re: ACPI/HT or Packet Scheduler BUG?
  2005-04-16 16:04 ` jamal
@ 2005-04-16 18:21 ` Thomas Graf
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Graf @ 2005-04-16 18:21 UTC (permalink / raw)
  To: jamal
  Cc: Herbert Xu, Steven Rostedt, netdev, Tarhon-Onu Victor, kuznet,
	devik, linux-kernel, Patrick McHardy, David S. Miller

* jamal <1113667447.7419.9.camel@localhost.localdomain> 2005-04-16 12:04
> The rule of "optimize for the common case" fails miserably here
> because this is not a common case/usage of qdiscs.

I tend to agree. OTOH, I use exactly such setups... ;->

> I have a feeling, though, that the patch went in due to
> dude-optimizing-loopback, as pointed out by Herbert.

I checked; it was in fact during the lockless loopback optimizations.

> Maybe it's worth reverting to the earlier scheme if it is going to
> continue to be problematic.

Let me first check and see how the locking can best be done; it
doesn't match the principles in sch_generic.c at the moment anyway.
Once we know how to do the locking efficiently and how to remove the
error-proneness, we can see if the optimization fits in without
problems and make a call.

^ permalink raw reply	[flat|nested] 13+ messages in thread