* PROBLEM: kernel lockup while changing TC rules
@ 2008-05-01 15:49 Jan 'yanek' Bortl
  2008-05-03  7:16 ` Jarek Poplawski
  0 siblings, 1 reply; 12+ messages in thread
From: Jan 'yanek' Bortl @ 2008-05-01 15:49 UTC (permalink / raw)
  To: netdev

Hi,

I have found this problem with today's git (2.6.25-00000-ge4c576b) and 2.6.25:

Network configuration:

+--------+         vlan20 +---------+ vlan80
| laptop |----------------| router  |-----
+--------+                +---------+ 
laptop has address 192.168.243.10

Reproduce HOWTO:

1. run http://ya.bofh.cz/archive/kernel-2.6-htbcrash/init.sh
2. start a packet generator on the laptop (any traffic targeted at the router's vlan80)
3. run http://ya.bofh.cz/archive/kernel-2.6-htbcrash/crash.sh

Then the machine locks up and after a while prints these messages:

[  395.137697] BUG: soft lockup - CPU#0 stuck for 61s! [ksoftirqd/0:4]
...
[  395.139871] Call Trace:
[  395.139940]  <IRQ>  [<ffffffff8035ea7c>] ? rb_insert_color+0xbc/0xf0
[  395.140012]  [<ffffffffa01f9cf6>] ? :sch_htb:htb_add_to_wait_tree+0xa6/0xc0
[  395.140061]  [<ffffffffa01fb46f>] ? :sch_htb:htb_dequeue+0x47f/0x7f0
[  395.140111]  [<ffffffffa01f9ec2>] ? :sch_htb:htb_activate_prios+0x122/0x140
[  395.140160]  [<ffffffff804682f6>] ? __qdisc_run+0x216/0x240
[  395.140207]  [<ffffffff804584e3>] ? dev_queue_xmit+0x2c3/0x390
[  395.140253]  [<ffffffff8047db07>] ? ip_finish_output+0x117/0x2a0
[  395.140300]  [<ffffffff8047dfe0>] ? ip_output+0x70/0xb0
[  395.140344]  [<ffffffff8047ac68>] ? ip_forward_finish+0x38/0x50

and

[  527.210416] BUG: soft lockup - CPU#1 stuck for 61s! [tc:2848]
...
[  527.212644] Call Trace:
[  527.212715]  [<ffffffff803617ff>] ? __delay+0xf/0x30
[  527.212759]  [<ffffffff80365a4c>] ? _raw_spin_lock+0x10c/0x180
[  527.212805]  [<ffffffff804d90a6>] ? _spin_lock_bh+0x56/0x70
[  527.212849]  [<ffffffff80467a3f>] ? qdisc_lock_tree+0x1f/0x30
[  527.212895]  [<ffffffffa0228bb4>] ? :sch_sfq:sfq_init+0xf4/0x240
[  527.212942]  [<ffffffff804693e4>] ? qdisc_create+0x154/0x250
[  527.212987]  [<ffffffff804710d3>] ? nla_parse+0x33/0xf0
[  527.213031]  [<ffffffff80469fd0>] ? tc_modify_qdisc+0x90/0x420
[  527.213079]  [<ffffffff804603d9>] ? rtnetlink_rcv_msg+0x1e9/0x230
[  527.213125]  [<ffffffff804601f0>] ? rtnetlink_rcv_msg+0x0/0x230

(full output here: http://ya.bofh.cz/archive/kernel-2.6-htbcrash/netconsole.txt)

kernel's config here: http://ya.bofh.cz/archive/kernel-2.6-htbcrash/config.txt
dmesg after boot here: http://ya.bofh.cz/archive/kernel-2.6-htbcrash/dmesg.txt

-- 
Jan 'yanek' Bortl <yanek [at] ya.bofh. cz>
http://ya.bofh.cz/ | jab: yanek [at] mitranet. cz
-----------------------------------------------------------------
"Maybe one day you will learn that your way is not the only way."
                                        Opher [StarGate: The Nox]


* Re: PROBLEM: kernel lockup while changing TC rules
       [not found] <20080501145239.GA20284@atlantis.mitranet.cz>
@ 2008-05-01 21:33 ` David Miller
  2008-05-03  0:26 ` David Miller
  1 sibling, 0 replies; 12+ messages in thread
From: David Miller @ 2008-05-01 21:33 UTC (permalink / raw)
  To: yanek; +Cc: linux-net, netdev

From: Jan 'yanek' Bortl <yanek@ya.bofh.cz>
Date: Thu, 1 May 2008 16:52:39 +0200

CC:'ing netdev@vger.kernel.org, where such reports belong.

> I have found this problem with today's git (2.6.25-00000-ge4c576b) and 2.6.25:
> 
> Network configuration:
> 
> +--------+         vlan20 +---------+ vlan80
> | laptop |----------------| router  |-----
> +--------+                +---------+ 
> laptop has address 192.168.243.10
> 
> Reproduce HOWTO:
> 
> 1. run http://ya.bofh.cz/archive/kernel-2.6-htbcrash/init.sh
> 2. start a packet generator on the laptop (any traffic targeted at the router's vlan80)
> 3. run http://ya.bofh.cz/archive/kernel-2.6-htbcrash/crash.sh
> 
> Then the machine locks up and after a while prints these messages:
> 
> [  395.137697] BUG: soft lockup - CPU#0 stuck for 61s! [ksoftirqd/0:4]
> ...
> [  395.139871] Call Trace:
> [  395.139940]  <IRQ>  [<ffffffff8035ea7c>] ? rb_insert_color+0xbc/0xf0
> [  395.140012]  [<ffffffffa01f9cf6>] ? :sch_htb:htb_add_to_wait_tree+0xa6/0xc0
> [  395.140061]  [<ffffffffa01fb46f>] ? :sch_htb:htb_dequeue+0x47f/0x7f0
> [  395.140111]  [<ffffffffa01f9ec2>] ? :sch_htb:htb_activate_prios+0x122/0x140
> [  395.140160]  [<ffffffff804682f6>] ? __qdisc_run+0x216/0x240
> [  395.140207]  [<ffffffff804584e3>] ? dev_queue_xmit+0x2c3/0x390
> [  395.140253]  [<ffffffff8047db07>] ? ip_finish_output+0x117/0x2a0
> [  395.140300]  [<ffffffff8047dfe0>] ? ip_output+0x70/0xb0
> [  395.140344]  [<ffffffff8047ac68>] ? ip_forward_finish+0x38/0x50
> 
> and
> 
> [  527.210416] BUG: soft lockup - CPU#1 stuck for 61s! [tc:2848]
> ...
> [  527.212644] Call Trace:
> [  527.212715]  [<ffffffff803617ff>] ? __delay+0xf/0x30
> [  527.212759]  [<ffffffff80365a4c>] ? _raw_spin_lock+0x10c/0x180
> [  527.212805]  [<ffffffff804d90a6>] ? _spin_lock_bh+0x56/0x70
> [  527.212849]  [<ffffffff80467a3f>] ? qdisc_lock_tree+0x1f/0x30
> [  527.212895]  [<ffffffffa0228bb4>] ? :sch_sfq:sfq_init+0xf4/0x240
> [  527.212942]  [<ffffffff804693e4>] ? qdisc_create+0x154/0x250
> [  527.212987]  [<ffffffff804710d3>] ? nla_parse+0x33/0xf0
> [  527.213031]  [<ffffffff80469fd0>] ? tc_modify_qdisc+0x90/0x420
> [  527.213079]  [<ffffffff804603d9>] ? rtnetlink_rcv_msg+0x1e9/0x230
> [  527.213125]  [<ffffffff804601f0>] ? rtnetlink_rcv_msg+0x0/0x230
> 
> (full output here: http://ya.bofh.cz/archive/kernel-2.6-htbcrash/netconsole.txt)
> 
> kernel's config here: http://ya.bofh.cz/archive/kernel-2.6-htbcrash/config.txt
> dmesg after boot here: http://ya.bofh.cz/archive/kernel-2.6-htbcrash/dmesg.txt
> 
> 
> -- 
> Jan 'yanek' Bortl <yanek [at] ya.bofh. cz>
> http://ya.bofh.cz/ | jab: yanek [at] mitranet. cz
> -----------------------------------------------------------------
> "Maybe one day you will learn that your way is not the only way."
>                                         Opher [StarGate: The Nox]


* Re: PROBLEM: kernel lockup while changing TC rules
       [not found] <20080501145239.GA20284@atlantis.mitranet.cz>
  2008-05-01 21:33 ` PROBLEM: kernel lockup while changing TC rules David Miller
@ 2008-05-03  0:26 ` David Miller
  2008-05-03  6:16   ` Stephen Hemminger
  1 sibling, 1 reply; 12+ messages in thread
From: David Miller @ 2008-05-03  0:26 UTC (permalink / raw)
  To: yanek; +Cc: netdev

From: Jan 'yanek' Bortl <yanek@ya.bofh.cz>
Date: Thu, 1 May 2008 17:49:14 +0200

> I have found this problem with today's git (2.6.25-00000-ge4c576b) and 2.6.25:

Thanks for this report, I'll try to figure it out.


* Re: PROBLEM: kernel lockup while changing TC rules
  2008-05-03  0:26 ` David Miller
@ 2008-05-03  6:16   ` Stephen Hemminger
  0 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2008-05-03  6:16 UTC (permalink / raw)
  To: David Miller; +Cc: yanek, netdev

On Fri, 02 May 2008 17:26:00 -0700 (PDT)
David Miller <davem@davemloft.net> wrote:

> From: Jan 'yanek' Bortl <yanek@ya.bofh.cz>
> Date: Thu, 1 May 2008 17:49:14 +0200
> 
> > I have found this problem with today's git (2.6.25-00000-ge4c576b) and 2.6.25:
> 
> Thanks for this report, I'll try to figure it out.

This problem isn't new (I think it is even buried in the kernel bugzilla).


* Re: PROBLEM: kernel lockup while changing TC rules
  2008-05-01 15:49 PROBLEM: kernel lockup while changing TC rules Jan 'yanek' Bortl
@ 2008-05-03  7:16 ` Jarek Poplawski
  2008-05-03  9:43   ` Jan 'yanek' Bortl
  0 siblings, 1 reply; 12+ messages in thread
From: Jarek Poplawski @ 2008-05-03  7:16 UTC (permalink / raw)
  To: Jan 'yanek' Bortl; +Cc: netdev

Jan 'yanek' Bortl wrote, On 05/01/2008 05:49 PM:

> Hi,

Hi,
> 
> I have found this problem with today's git (2.6.25-00000-ge4c576b) and 2.6.25:

Does this mean the bug triggers in 2.6.25 and later, but not in 2.6.24?
If so, is it possible to reproduce this e.g. with 2.6.25-rc6?

> 
> Network configuration:
> 
> +--------+         vlan20 +---------+ vlan80
> | laptop |----------------| router  |-----
> +--------+                +---------+ 
> laptop has address 192.168.243.10
> 
> Reproduce HOWTO:

Very nice description and logs, but alas I'm not able to test it, so
a few questions:

- there are quite a lot of networking modules loaded like bonding or
ifb: are there some other scripts (especially with virtual devices)?
- could you send vlan and routing rules on this router?
- does it always break with the same traces?

Thanks,
Jarek P.


* Re: PROBLEM: kernel lockup while changing TC rules
  2008-05-03  7:16 ` Jarek Poplawski
@ 2008-05-03  9:43   ` Jan 'yanek' Bortl
  2008-05-03 12:11     ` Jarek Poplawski
  2008-05-03 16:39     ` Jarek Poplawski
  0 siblings, 2 replies; 12+ messages in thread
From: Jan 'yanek' Bortl @ 2008-05-03  9:43 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev

Jarek Poplawski wrote:
> Jan 'yanek' Bortl wrote, On 05/01/2008 05:49 PM:
> 
>> Hi,
> 
> Hi,
>> I have found this problem with today's git (2.6.25-00000-ge4c576b) and 2.6.25:
> 
> Does this mean the bug triggers in 2.6.25 and later, but not in 2.6.24?
> If so, is it possible to reproduce this e.g. with 2.6.25-rc6?

I first discovered this on 2.6.22-6~bpo40+1 (Debian's backports), but I'm 
not sure whether it was the same thing (it was long ago).

Now I have tested on a test machine with these kernels:
2.6.24, 2.6.24.6 
(http://ya.bofh.cz/archive/kernel-2.6-htbcrash/2/netconsole-2.6.24.6-slon.txt), 
2.6.25

I can test anything you want.

> Very nice description and logs, but alas I'm not able to test it, so
> a few questions:
> 
> - there are quite a lot of networking modules loaded like bonding or
> ifb: are there some other scripts (especially with virtual devices)?

I have removed them now (ifb, vlan, bonding). The problem persists.

  +--------+           eth1 +---------+ eth2
  | laptop |----------------| router  |-----
  +--------+                +---------+
  laptop has address 192.168.243.10


> - could you send vlan and routing rules on this router?

http://ya.bofh.cz/archive/kernel-2.6-htbcrash/2/ipa-2.6.25-slon-00000-ge4c576b.txt
http://ya.bofh.cz/archive/kernel-2.6-htbcrash/2/ipro-2.6.25-slon-00000-ge4c576b.txt

> - does it always break with the same traces?

Yes.

Another run (without those modules): 
http://ya.bofh.cz/archive/kernel-2.6-htbcrash/2/netconsole-2.6.25-slon-00000-ge4c576b.txt


-- 
Jan 'yanek' Bortl <yanek [at] ya.bofh. cz>
http://ya.bofh.cz/ | jab: yanek [at] mitranet. cz
-----------------------------------------------------------------
"Maybe one day you will learn that your way is not the only way."
                                         Opher [StarGate: The Nox]


* Re: PROBLEM: kernel lockup while changing TC rules
  2008-05-03  9:43   ` Jan 'yanek' Bortl
@ 2008-05-03 12:11     ` Jarek Poplawski
  2008-05-03 12:42       ` Jan 'yanek' Bortl
  2008-05-03 16:39     ` Jarek Poplawski
  1 sibling, 1 reply; 12+ messages in thread
From: Jarek Poplawski @ 2008-05-03 12:11 UTC (permalink / raw)
  To: Jan 'yanek' Bortl; +Cc: netdev

On Sat, May 03, 2008 at 11:43:46AM +0200, Jan 'yanek' Bortl wrote:
> Jarek Poplawski wrote:
>> Jan 'yanek' Bortl wrote, On 05/01/2008 05:49 PM:
>>
>>> Hi,
>>
>> Hi,
>>> I have found this problem with today's git (2.6.25-00000-ge4c576b) and 2.6.25:
>>
>> Does this mean the bug triggers in 2.6.25 and later, but not in 2.6.24?
>> If so, is it possible to reproduce this e.g. with 2.6.25-rc6?
>
> I first discovered this on 2.6.22-6~bpo40+1 (Debian's backports), but 
> I'm not sure whether it was the same thing (it was long ago).
>
> Now I have tested on a test machine with these kernels:
> 2.6.24, 2.6.24.6  
> (http://ya.bofh.cz/archive/kernel-2.6-htbcrash/2/netconsole-2.6.24.6-slon.txt), 
> 2.6.25
>
> I can test anything you want.

Great! I really appreciate it! You're very helpful in catching this rare
bug. Alas, there is still nothing obvious, at least to me, so I need
more time to come up with an idea...

BTW, one little doubt: are you really sure vanilla 2.6.24 (without .6
etc.) gives the same result? (There were some changes backported from
2.6.25 which I'd like to exclude.)

Regards,
Jarek P.


* Re: PROBLEM: kernel lockup while changing TC rules
  2008-05-03 12:11     ` Jarek Poplawski
@ 2008-05-03 12:42       ` Jan 'yanek' Bortl
  0 siblings, 0 replies; 12+ messages in thread
From: Jan 'yanek' Bortl @ 2008-05-03 12:42 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev

Jarek Poplawski wrote:
> On Sat, May 03, 2008 at 11:43:46AM +0200, Jan 'yanek' Bortl wrote:
>> Jarek Poplawski wrote:
>>> Jan 'yanek' Bortl wrote, On 05/01/2008 05:49 PM:
>>>
>>>> Hi,
>>> Hi,
>>>> I have found this problem with today's git (2.6.25-00000-ge4c576b) and 2.6.25:
>>> Does this mean the bug triggers in 2.6.25 and later, but not in 2.6.24?
>>> If so, is it possible to reproduce this e.g. with 2.6.25-rc6?
>> I first discovered this on 2.6.22-6~bpo40+1 (Debian's backports), but 
>> I'm not sure whether it was the same thing (it was long ago).
>>
>> Now I have tested on a test machine with these kernels:
>> 2.6.24, 2.6.24.6  
>> (http://ya.bofh.cz/archive/kernel-2.6-htbcrash/2/netconsole-2.6.24.6-slon.txt), 
>> 2.6.25
>>
>> I can test anything you want.
> 
> Great! I really appreciate it! You're very helpful in catching this rare
> bug. Alas, there is still nothing obvious, at least to me, so I need
> more time to come up with an idea...
> 
> BTW, one little doubt: are you really sure vanilla 2.6.24 (without .6
> etc.) gives the same result? (There were some changes backported from
> 2.6.25 which I'd like to exclude.)

Yes. http://ya.bofh.cz/archive/kernel-2.6-htbcrash/3/netconsole.txt
(config: http://ya.bofh.cz/archive/kernel-2.6-htbcrash/3/config-2.6.24-slon2)

-- 
Jan 'yanek' Bortl <yanek [at] ya.bofh. cz>
http://ya.bofh.cz/ | jab: yanek [at] mitranet. cz
-----------------------------------------------------------------
"Maybe one day you will learn that your way is not the only way."
                                         Opher [StarGate: The Nox]


* Re: PROBLEM: kernel lockup while changing TC rules
  2008-05-03  9:43   ` Jan 'yanek' Bortl
  2008-05-03 12:11     ` Jarek Poplawski
@ 2008-05-03 16:39     ` Jarek Poplawski
  2008-05-03 17:26       ` Jan 'yanek' Bortl
  1 sibling, 1 reply; 12+ messages in thread
From: Jarek Poplawski @ 2008-05-03 16:39 UTC (permalink / raw)
  To: Jan 'yanek' Bortl; +Cc: netdev

On Sat, May 03, 2008 at 11:43:46AM +0200, Jan 'yanek' Bortl wrote:
...
>>> I have found this problem with today's git (2.6.25-00000-ge4c576b) and 2.6.25:
>>
>> Does this mean the bug triggers in 2.6.25 and later, but not in 2.6.24?
>> If so, is it possible to reproduce this e.g. with 2.6.25-rc6?
>
> I first discovered this on 2.6.22-6~bpo40+1 (Debian's backports), but 
> I'm not sure whether it was the same thing (it was long ago).
>
> Now I have tested on a test machine with these kernels:
> 2.6.24, 2.6.24.6  
> (http://ya.bofh.cz/archive/kernel-2.6-htbcrash/2/netconsole-2.6.24.6-slon.txt), 
> 2.6.25
>
> I can test anything you want.

Here is suspect #1. (BTW, this place reminds me of something...)

Thanks,
Jarek P.

---

 net/sched/sch_htb.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 66148cc..5bc1ed4 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1197,12 +1197,16 @@ static inline int htb_parent_last_child(struct htb_class *cl)
 	return 1;
 }
 
-static void htb_parent_to_leaf(struct htb_class *cl, struct Qdisc *new_q)
+static void htb_parent_to_leaf(struct htb_sched *q, struct htb_class *cl,
+			       struct Qdisc *new_q)
 {
 	struct htb_class *parent = cl->parent;
 
 	BUG_TRAP(!cl->level && cl->un.leaf.q && !cl->prio_activity);
 
+	if (parent->cmode != HTB_CAN_SEND)
+		htb_safe_rb_erase(&parent->pq_node, q->wait_pq + parent->level);
+
 	parent->level = 0;
 	memset(&parent->un.inner, 0, sizeof(parent->un.inner));
 	INIT_LIST_HEAD(&parent->un.leaf.drop_list);
@@ -1300,7 +1304,7 @@ static int htb_delete(struct Qdisc *sch, unsigned long arg)
 		htb_deactivate(q, cl);
 
 	if (last_child)
-		htb_parent_to_leaf(cl, new_q);
+		htb_parent_to_leaf(q, cl, new_q);
 
 	if (--cl->refcnt == 0)
 		htb_destroy_class(sch, cl);


* Re: PROBLEM: kernel lockup while changing TC rules
  2008-05-03 16:39     ` Jarek Poplawski
@ 2008-05-03 17:26       ` Jan 'yanek' Bortl
  2008-05-03 18:42         ` [PATCH][NET_SCHED] sch_htb: remove from event queue in htb_parent_to_leaf() Jarek Poplawski
  0 siblings, 1 reply; 12+ messages in thread
From: Jan 'yanek' Bortl @ 2008-05-03 17:26 UTC (permalink / raw)
  To: Jarek Poplawski; +Cc: netdev

Jarek Poplawski wrote:
> ...
> 
> Here is suspect #1. (BTW, this place reminds me of something...)

Great! Seems to solve my problem. I'll do some tests.

Thank you!

> 
> Thanks,
> Jarek P.
> 
> ---
> 
>  net/sched/sch_htb.c |    8 ++++++--
>  1 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
> index 66148cc..5bc1ed4 100644
> --- a/net/sched/sch_htb.c
> +++ b/net/sched/sch_htb.c
> @@ -1197,12 +1197,16 @@ static inline int htb_parent_last_child(struct htb_class *cl)
>  	return 1;
>  }
>  
> -static void htb_parent_to_leaf(struct htb_class *cl, struct Qdisc *new_q)
> +static void htb_parent_to_leaf(struct htb_sched *q, struct htb_class *cl,
> +			       struct Qdisc *new_q)
>  {
>  	struct htb_class *parent = cl->parent;
>  
>  	BUG_TRAP(!cl->level && cl->un.leaf.q && !cl->prio_activity);
>  
> +	if (parent->cmode != HTB_CAN_SEND)
> +		htb_safe_rb_erase(&parent->pq_node, q->wait_pq + parent->level);
> +
>  	parent->level = 0;
>  	memset(&parent->un.inner, 0, sizeof(parent->un.inner));
>  	INIT_LIST_HEAD(&parent->un.leaf.drop_list);
> @@ -1300,7 +1304,7 @@ static int htb_delete(struct Qdisc *sch, unsigned long arg)
>  		htb_deactivate(q, cl);
>  
>  	if (last_child)
> -		htb_parent_to_leaf(cl, new_q);
> +		htb_parent_to_leaf(q, cl, new_q);
>  
>  	if (--cl->refcnt == 0)
>  		htb_destroy_class(sch, cl);

-- 
Jan 'yanek' Bortl <yanek [at] ya.bofh. cz>
http://ya.bofh.cz/ | jab: yanek [at] mitranet. cz
-----------------------------------------------------------------
"Maybe one day you will learn that your way is not the only way."
                                         Opher [StarGate: The Nox]


* [PATCH][NET_SCHED] sch_htb: remove from event queue in htb_parent_to_leaf()
  2008-05-03 17:26       ` Jan 'yanek' Bortl
@ 2008-05-03 18:42         ` Jarek Poplawski
  2008-05-04  1:58           ` David Miller
  0 siblings, 1 reply; 12+ messages in thread
From: Jarek Poplawski @ 2008-05-03 18:42 UTC (permalink / raw)
  To: David Miller; +Cc: Jan 'yanek' Bortl, netdev

On Sat, May 03, 2008 at 07:26:15PM +0200, Jan 'yanek' Bortl wrote:
> Jarek Poplawski wrote:
>> ...
>>
>> Here is suspect #1. (BTW, this place reminds me of something...)
>
> Great! Seems to solve my problem. I'll do some tests.

Hi David,

IMHO this patch is needed even if there is something more.

Thanks,
Jarek P.

-------------------->

[NET_SCHED] sch_htb: remove from event queue in htb_parent_to_leaf()

A class is not removed from the event queue when it changes from
parent to leaf, which can corrupt this rb tree. This patch fixes a bug
introduced by my patch "sch_htb: turn intermediate classes into
leaves", commit 160d5e10f87b1dc88fd9b84b31b1718e0fd76398.

Many thanks to Jan 'yanek' Bortl for finding a way to reproduce this
rare bug and narrowing the test case, which made a proper diagnosis
possible.

This patch is recommended for all kernels starting from 2.6.20.

Reported-and-tested-by: Jan 'yanek' Bortl <yanek@ya.bofh.cz>
Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>

---

 net/sched/sch_htb.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 66148cc..5bc1ed4 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -1197,12 +1197,16 @@ static inline int htb_parent_last_child(struct htb_class *cl)
 	return 1;
 }
 
-static void htb_parent_to_leaf(struct htb_class *cl, struct Qdisc *new_q)
+static void htb_parent_to_leaf(struct htb_sched *q, struct htb_class *cl,
+			       struct Qdisc *new_q)
 {
 	struct htb_class *parent = cl->parent;
 
 	BUG_TRAP(!cl->level && cl->un.leaf.q && !cl->prio_activity);
 
+	if (parent->cmode != HTB_CAN_SEND)
+		htb_safe_rb_erase(&parent->pq_node, q->wait_pq + parent->level);
+
 	parent->level = 0;
 	memset(&parent->un.inner, 0, sizeof(parent->un.inner));
 	INIT_LIST_HEAD(&parent->un.leaf.drop_list);
@@ -1300,7 +1304,7 @@ static int htb_delete(struct Qdisc *sch, unsigned long arg)
 		htb_deactivate(q, cl);
 
 	if (last_child)
-		htb_parent_to_leaf(cl, new_q);
+		htb_parent_to_leaf(q, cl, new_q);
 
 	if (--cl->refcnt == 0)
 		htb_destroy_class(sch, cl);
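
To illustrate the failure mode described in the commit message, here is a
minimal user-space sketch. It is a simplified model, not code from
sch_htb.c: a doubly linked list stands in for the kernel rb tree, and all
names (struct cls, safe_erase, parent_to_leaf, throttle,
demo_delete_last_child) are hypothetical. It assumes that the class is
queued again at its new level 0 some time after the conversion to a leaf.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEVEL 2

enum cmode { CANT_SEND, MAY_BORROW, CAN_SEND };

/* Intrusive list node; a stand-in (assumption) for the kernel's rb_node. */
struct node { struct node *prev, *next; };

struct cls {
	struct node pq_node;	/* linkage in one per-level wait queue */
	int level;
	enum cmode cmode;
};

struct sched { struct node wait_pq[MAX_LEVEL]; };

static void queue_init(struct node *head) { head->prev = head->next = head; }

static void queue_add(struct node *head, struct node *n)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Unlink the node if it is queued; do nothing if it is not. */
static void safe_erase(struct node *n)
{
	if (!n->next)
		return;
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

/* Walk the queue and check that every forward link has a matching back link. */
static int queue_is_consistent(const struct node *head)
{
	const struct node *n = head;
	int steps = 0;

	do {
		if (n->next->prev != n)
			return 0;
		n = n->next;
		if (++steps > 16)	/* guard against walking a broken ring forever */
			return 0;
	} while (n != head);
	return 1;
}

/* Put a class that may not send right now on the wait queue of its level. */
static void throttle(struct sched *q, struct cls *c)
{
	assert(c->level >= 0 && c->level < MAX_LEVEL);
	c->cmode = CANT_SEND;
	queue_add(&q->wait_pq[c->level], &c->pq_node);
}

/* Model of turning an inner class into a leaf, with or without the fix. */
static void parent_to_leaf(struct cls *parent, int apply_fix)
{
	if (apply_fix && parent->cmode != CAN_SEND)
		safe_erase(&parent->pq_node);	/* unlink while the old level is known */
	parent->level = 0;
	parent->cmode = CAN_SEND;
}

/* Returns 1 if both wait queues are still well formed afterwards, else 0. */
int demo_delete_last_child(int apply_fix)
{
	struct sched q;
	struct cls parent = { { NULL, NULL }, 1, CAN_SEND };
	int i;

	for (i = 0; i < MAX_LEVEL; i++)
		queue_init(&q.wait_pq[i]);

	throttle(&q, &parent);			/* inner class waits at level 1 */
	parent_to_leaf(&parent, apply_fix);	/* its last child is deleted */
	throttle(&q, &parent);			/* later queued again, now at level 0 */

	return queue_is_consistent(&q.wait_pq[0]) &&
	       queue_is_consistent(&q.wait_pq[1]);
}
```

In this model, demo_delete_last_child(0) leaves the node linked from the
level 1 queue while its own pointers refer to the level 0 queue, so the
consistency check fails; demo_delete_last_child(1) unlinks the node first,
as the patch does, and both queues stay well formed.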


* Re: [PATCH][NET_SCHED] sch_htb: remove from event queue in htb_parent_to_leaf()
  2008-05-03 18:42         ` [PATCH][NET_SCHED] sch_htb: remove from event queue in htb_parent_to_leaf() Jarek Poplawski
@ 2008-05-04  1:58           ` David Miller
  0 siblings, 0 replies; 12+ messages in thread
From: David Miller @ 2008-05-04  1:58 UTC (permalink / raw)
  To: jarkao2; +Cc: yanek, netdev

From: Jarek Poplawski <jarkao2@gmail.com>
Date: Sat, 3 May 2008 20:42:45 +0200

> IMHO this patch is needed even if there is something more.

Thanks a lot for this work Jarek, I'll certainly look at this
patch later and apply it.

