* Re: [Bugme-new] [Bug 11571] New: u32_classify Kernel Panic [not found] <bug-11571-10286@http.bugzilla.kernel.org/> @ 2008-09-16 16:15 ` Andrew Morton 2008-09-17 19:38 ` Jarek Poplawski 0 siblings, 1 reply; 13+ messages in thread From: Andrew Morton @ 2008-09-16 16:15 UTC (permalink / raw) To: netdev; +Cc: bugme-daemon, m0sia (switched to email. Please respond via emailed reply-to-all, not via the bugzilla web interface). On Mon, 15 Sep 2008 03:35:16 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote: > http://bugzilla.kernel.org/show_bug.cgi?id=11571 > > Summary: u32_classify Kernel Panic > Product: Networking > Version: 2.5 > KernelVersion: 2.6.26.5 > Platform: All > OS/Version: Linux > Tree: Mainline > Status: NEW > Severity: high > Priority: P1 > Component: Other > AssignedTo: acme@ghostprotocols.net > ReportedBy: m0sia@plotinka.ru > > > Distribution: Debian > > Problem Description: > Kernel panic > [<c023f2d8>] dev_queue_xmit+0x175/0x2a1 > [<c0243861>] neigh_resolve_output+0x1f8/0x2la > [<c025a784>] ip_finish_output+0x1d7/0x200 > [<c025aa2f>] ip_output+0x6f/0x81 > [<c0258218>] ip_forwardjinish+0x2c/0x2e > [<c0257223>] ip_rcv_f inish+0x263/0x27f > [<c023cc62>] netif_receive_skb+0x2c1/0x32b > [<f886f26d>] e1000_clean_rx_irq+0x395/0x46f [e1000] > [<f886f5f7>] e1000_clean+0x52/0x1db [e1000] > [<c013e8e4>] net_rx_action+0x8a/0x153 > [<c0128bfa>] __do_softirq+0x5d/0xc1 > [<c0128c98>] do_softirq+0x32/0x36 > [<c0185cb8>] do_IRQ+0x52/0x66 > [<c01887fa>] mwait_idle+0x8/0x32 > [<c018418b>] common_interrupt+0x23/0x28 > [<c01887fa>] mwait_idle+0xB/0x32 > [<c0188829>] muait_idle+0x2f/0x32 > [<c0182545>] cpu_ i d1e+0x88/0x9c > > > Code: 0c 8b 80 90 00 00 00 c7 44 24 14 00 00 00 00 c7 44 24 18 00 00 00 00 89 > 44 > 24 18 8b 54 24 0c 8b 74 aa 18 85 f6 0f 84 a0 01 00 00 <8b> 46 38 83 00 01 83 50 > 04 00 8b 4c 24 04 8b 46 38 23 81 88 00 > EIP: [<f8bf3670>] u32_classify+0x41/0x23f [cls_u32] SS:ESP 8868:f746fd44 > Kernel panic - not syncing: Fatal exception in interrupt > > Steps to reproduce: > > tc qdisc add dev eth1 root handle 1: htb > > tc class add dev eth1 parent 1: classid 1:1 htb rate 3600Kbit > tc class add dev eth1 parent 1:1 classid 1:11 htb rate 2800Kbit prio 0 > tc class add dev eth1 parent 1:1 classid 1:15 htb rate 100Kbit ceil 2800Kbit > prio 0 > tc class add dev eth1 parent 1:1 classid 1:19 htb rate 100Kbit ceil 500Kbit > prio 2 > > > N from 10 to 2000 > tc class add dev eth1 parent 1:{11,15,19} classid 1:$N htb rate 1Kbit ceil > {$SPEED}Kbit > tc filter add dev eth1 parent 1: protocol ip pref $N u32 match ip dst $IP > flowid 1:$N > > Everything worked with N smaller then 2000. The problem first acquired with > kernel 2.6.18-6-686 and is still present in 2.6.26.5 > > > -- > Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email > ------- You are receiving this mail because: ------- > You are on the CC list for the bug, or are watching someone who is. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Bugme-new] [Bug 11571] New: u32_classify Kernel Panic 2008-09-16 16:15 ` [Bugme-new] [Bug 11571] New: u32_classify Kernel Panic Andrew Morton @ 2008-09-17 19:38 ` Jarek Poplawski 2008-09-17 22:08 ` Jarek Poplawski ` (2 more replies) 0 siblings, 3 replies; 13+ messages in thread From: Jarek Poplawski @ 2008-09-17 19:38 UTC (permalink / raw) To: Andrew Morton; +Cc: netdev, bugme-daemon, m0sia Andrew Morton wrote, On 09/16/2008 06:15 PM: > (switched to email. Please respond via emailed reply-to-all, not via the > bugzilla web interface). > > On Mon, 15 Sep 2008 03:35:16 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote: > >> http://bugzilla.kernel.org/show_bug.cgi?id=11571 >> >> Summary: u32_classify Kernel Panic ... Could you add more details: - .config - gzipped cls_u32.o (compiled with CONFIG_DEBUG_INFO on) - the first part of OOPS if possible. - more exactly these tc commands from your script Does this happen while creating or deleting something and is this easy to reproduce? Thanks, Jarek P. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Bugme-new] [Bug 11571] New: u32_classify Kernel Panic 2008-09-17 19:38 ` Jarek Poplawski @ 2008-09-17 22:08 ` Jarek Poplawski 2008-09-18 7:05 ` m0sia 2008-10-11 11:17 ` pkt_sched: cls_u32: Fix locking in u32_delete() Jarek Poplawski 2 siblings, 0 replies; 13+ messages in thread From: Jarek Poplawski @ 2008-09-17 22:08 UTC (permalink / raw) To: Andrew Morton; +Cc: netdev, bugme-daemon, m0sia On Wed, Sep 17, 2008 at 09:38:32PM +0200, Jarek Poplawski wrote: ... > >> http://bugzilla.kernel.org/show_bug.cgi?id=11571 > Does this happen while creating or deleting something and is this > easy to reproduce? If accidentally there is any deleting around try this patch, please. Jarek P. --- net/sched/cls_u32.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 246f906..9912ad5 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -433,7 +433,9 @@ static int u32_delete(struct tcf_proto *tp, unsigned long arg) if (ht->refcnt == 1) { ht->refcnt--; + tcf_tree_lock(tp); u32_destroy_hnode(tp, ht); + tcf_tree_unlock(tp); } else { return -EBUSY; } ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [Bugme-new] [Bug 11571] New: u32_classify Kernel Panic 2008-09-17 19:38 ` Jarek Poplawski 2008-09-17 22:08 ` Jarek Poplawski @ 2008-09-18 7:05 ` m0sia 2008-09-18 7:53 ` Jarek Poplawski 2008-10-11 11:17 ` pkt_sched: cls_u32: Fix locking in u32_delete() Jarek Poplawski 2 siblings, 1 reply; 13+ messages in thread From: m0sia @ 2008-09-18 7:05 UTC (permalink / raw) To: Jarek Poplawski; +Cc: Andrew Morton, netdev, bugme-daemon Jarek Poplawski пишет: > Andrew Morton wrote, On 09/16/2008 06:15 PM: > > >> (switched to email. Please respond via emailed reply-to-all, not via the >> bugzilla web interface). >> >> On Mon, 15 Sep 2008 03:35:16 -0700 (PDT) bugme-daemon@bugzilla.kernel.org wrote: >> >> >>> http://bugzilla.kernel.org/show_bug.cgi?id=11571 >>> >>> Summary: u32_classify Kernel Panic >>> > ... > > Could you add more details: > - .config > - gzipped cls_u32.o (compiled with CONFIG_DEBUG_INFO on) > - the first part of OOPS if possible. > - more exactly these tc commands from your script > > Does this happen while creating or deleting something and is this > easy to reproduce? > > Thanks, > Jarek P. > It happens on a production server and i can't experiment with this bug. Now i'm using MARK iptable target and fw mark filter, because of this bug. I think it happens when deleting or adding filters(it occur automatically by script, when adding new user or user change speed). I'll try this patch on test server. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [Bugme-new] [Bug 11571] New: u32_classify Kernel Panic 2008-09-18 7:05 ` m0sia @ 2008-09-18 7:53 ` Jarek Poplawski 2009-01-05 13:52 ` [PATCH] " Jarek Poplawski 0 siblings, 1 reply; 13+ messages in thread From: Jarek Poplawski @ 2008-09-18 7:53 UTC (permalink / raw) To: m0sia; +Cc: Andrew Morton, netdev, bugme-daemon On Thu, Sep 18, 2008 at 01:05:28PM +0600, m0sia wrote: ... >>>> http://bugzilla.kernel.org/show_bug.cgi?id=11571 >>>> >>>> Summary: u32_classify Kernel Panic ... > It happens on a production server and i can't experiment with this bug. > Now i'm using MARK iptable target and fw mark filter, because of this > bug. I think it happens when deleting or adding filters(it occur > automatically by script, when adding new user or user change speed). > I'll try this patch on test server. OK, no hurry. BTW, it looks like some traffic is needed in a qdisc while its filters are modified to trigger this. Probably, turning off CONFIG_CLS_U32_PERF can make this less visible. On the other hand, turning on memory debugging: CONFIG_DEBUG_SLAB or CONFIG_SLUB_DEBUG_ON should be helpful here. Jarek P. ^ permalink raw reply [flat|nested] 13+ messages in thread
* [PATCH] Re: [Bugme-new] [Bug 11571] New: u32_classify Kernel Panic 2008-09-18 7:53 ` Jarek Poplawski @ 2009-01-05 13:52 ` Jarek Poplawski 2009-01-06 2:14 ` David Miller 0 siblings, 1 reply; 13+ messages in thread From: Jarek Poplawski @ 2009-01-05 13:52 UTC (permalink / raw) To: David Miller; +Cc: Andrew Morton, netdev, bugme-daemon > On Thu, Sep 18, 2008 at 01:05:28PM +0600, m0sia wrote: > ... > >>>> http://bugzilla.kernel.org/show_bug.cgi?id=11571 ... (take 2) It seems there could be a problem with testing if this patch fixes this bug, but IMHO it's quite probable, and needed anyway. Jarek P. -----------------> pkt_sched: cls_u32: Fix locking in u32_change() New nodes are inserted in u32_change() under rtnl_lock() with wmb(), so without tcf_tree_lock() like in other classifiers (e.g. cls_fw). This isn't enough without rmb() on the read side, but on the other hand adding such barriers doesn't give any savings, so the lock is added instead. Reported-by: m0sia <m0sia@plotinka.ru> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> --- net/sched/cls_u32.c | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 05d1780..07372f6 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -638,8 +638,9 @@ static int u32_change(struct tcf_proto *tp, unsigned long base, u32 handle, break; n->next = *ins; - wmb(); + tcf_tree_lock(tp); *ins = n; + tcf_tree_unlock(tp); *arg = (unsigned long)n; return 0; ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [PATCH] Re: [Bugme-new] [Bug 11571] New: u32_classify Kernel Panic 2009-01-05 13:52 ` [PATCH] " Jarek Poplawski @ 2009-01-06 2:14 ` David Miller 0 siblings, 0 replies; 13+ messages in thread From: David Miller @ 2009-01-06 2:14 UTC (permalink / raw) To: jarkao2; +Cc: akpm, netdev, bugme-daemon From: Jarek Poplawski <jarkao2@gmail.com> Date: Mon, 5 Jan 2009 13:52:45 +0000 > pkt_sched: cls_u32: Fix locking in u32_change() > > New nodes are inserted in u32_change() under rtnl_lock() with wmb(), > so without tcf_tree_lock() like in other classifiers (e.g. cls_fw). > This isn't enough without rmb() on the read side, but on the other > hand adding such barriers doesn't give any savings, so the lock is > added instead. > > Reported-by: m0sia <m0sia@plotinka.ru> > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Applied and queued up for -stable, thanks Jarek. ^ permalink raw reply [flat|nested] 13+ messages in thread
* pkt_sched: cls_u32: Fix locking in u32_delete() 2008-09-17 19:38 ` Jarek Poplawski 2008-09-17 22:08 ` Jarek Poplawski 2008-09-18 7:05 ` m0sia @ 2008-10-11 11:17 ` Jarek Poplawski 2008-10-11 14:10 ` Herbert Xu 2 siblings, 1 reply; 13+ messages in thread From: Jarek Poplawski @ 2008-10-11 11:17 UTC (permalink / raw) To: David Miller; +Cc: netdev, bugme-daemon, m0sia, Andrew Morton pkt_sched: cls_u32: Fix locking in u32_delete() While looking for a possible reason of bugzilla [Bug 11571] "u32_classify Kernel Panic" reported by m0sia@plotinka.ru I found that tcf_tree_lock() is missing in u32_delete() during u32_destroy_hnode() call. Other paths calling this function use this lock. It haven't been acknowledged this fixes the bug, but I think this patch is needed here anyway. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> --- net/sched/cls_u32.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c index 246f906..9912ad5 100644 --- a/net/sched/cls_u32.c +++ b/net/sched/cls_u32.c @@ -433,7 +433,9 @@ static int u32_delete(struct tcf_proto *tp, unsigned long arg) if (ht->refcnt == 1) { ht->refcnt--; + tcf_tree_lock(tp); u32_destroy_hnode(tp, ht); + tcf_tree_unlock(tp); } else { return -EBUSY; } ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: pkt_sched: cls_u32: Fix locking in u32_delete() 2008-10-11 11:17 ` pkt_sched: cls_u32: Fix locking in u32_delete() Jarek Poplawski @ 2008-10-11 14:10 ` Herbert Xu 2008-10-11 19:24 ` David Miller 0 siblings, 1 reply; 13+ messages in thread From: Herbert Xu @ 2008-10-11 14:10 UTC (permalink / raw) To: Jarek Poplawski; +Cc: davem, netdev, bugme-daemon, m0sia, akpm Jarek Poplawski <jarkao2@gmail.com> wrote: > pkt_sched: cls_u32: Fix locking in u32_delete() > > While looking for a possible reason of bugzilla [Bug 11571] > "u32_classify Kernel Panic" reported by m0sia@plotinka.ru I found that > tcf_tree_lock() is missing in u32_delete() during u32_destroy_hnode() > call. Other paths calling this function use this lock. It haven't been > acknowledged this fixes the bug, but I think this patch is needed here > anyway. > > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> > > --- > > net/sched/cls_u32.c | 2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c > index 246f906..9912ad5 100644 > --- a/net/sched/cls_u32.c > +++ b/net/sched/cls_u32.c > @@ -433,7 +433,9 @@ static int u32_delete(struct tcf_proto *tp, unsigned long arg) > > if (ht->refcnt == 1) { > ht->refcnt--; > + tcf_tree_lock(tp); > u32_destroy_hnode(tp, ht); > + tcf_tree_unlock(tp); Well if you were going to protect you'd need to lock before the reference count check. However, this is actually unecessary because the reference count can only be increased the RTNL which we're already holding. Also, if the reference count is 1, then there must be no live references in the system to the hash table so we can safely delete it. So whatever the problem is this isn't it :) Cheers, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: pkt_sched: cls_u32: Fix locking in u32_delete() 2008-10-11 14:10 ` Herbert Xu @ 2008-10-11 19:24 ` David Miller 2008-10-11 22:04 ` Jarek Poplawski 0 siblings, 1 reply; 13+ messages in thread From: David Miller @ 2008-10-11 19:24 UTC (permalink / raw) To: herbert; +Cc: jarkao2, netdev, bugme-daemon, m0sia, akpm From: Herbert Xu <herbert@gondor.apana.org.au> Date: Sat, 11 Oct 2008 22:10:22 +0800 > Jarek Poplawski <jarkao2@gmail.com> wrote: > > pkt_sched: cls_u32: Fix locking in u32_delete() > > > > While looking for a possible reason of bugzilla [Bug 11571] > > "u32_classify Kernel Panic" reported by m0sia@plotinka.ru I found that > > tcf_tree_lock() is missing in u32_delete() during u32_destroy_hnode() > > call. Other paths calling this function use this lock. It haven't been > > acknowledged this fixes the bug, but I think this patch is needed here > > anyway. > > > > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> > > > > --- > > > > net/sched/cls_u32.c | 2 ++ > > 1 files changed, 2 insertions(+), 0 deletions(-) > > > > diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c > > index 246f906..9912ad5 100644 > > --- a/net/sched/cls_u32.c > > +++ b/net/sched/cls_u32.c > > @@ -433,7 +433,9 @@ static int u32_delete(struct tcf_proto *tp, unsigned long arg) > > > > if (ht->refcnt == 1) { > > ht->refcnt--; > > + tcf_tree_lock(tp); > > u32_destroy_hnode(tp, ht); > > + tcf_tree_unlock(tp); > > Well if you were going to protect you'd need to lock before the > reference count check. However, this is actually unecessary > because the reference count can only be increased the RTNL which > we're already holding. > > Also, if the reference count is 1, then there must be no live > references in the system to the hash table so we can safely > delete it. > > So whatever the problem is this isn't it :) Agreed, the synchronization is already what is necessary here. As Herbert stated, the refcounts only change under RTNL and when we see it hit 0 we can be sure we are the only reference to it. Next, my understanding is that: 1) tc_h_common is a per sched tree object 2) we quiesced the whole sched tree, from the root, before getting to this code Which means that the hash list deletion in u32_destroy_hnode() is safe as well. But hey, we could be missing something here, so I'd be happy to hear that Jarek can still see some hole here :) Because it is true that we have seen some weird crashes still and u32 seems common amongst those report. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: pkt_sched: cls_u32: Fix locking in u32_delete() 2008-10-11 19:24 ` David Miller @ 2008-10-11 22:04 ` Jarek Poplawski 2008-10-11 22:05 ` David Miller 0 siblings, 1 reply; 13+ messages in thread From: Jarek Poplawski @ 2008-10-11 22:04 UTC (permalink / raw) To: David Miller; +Cc: herbert, netdev, bugme-daemon, m0sia, akpm On Sat, Oct 11, 2008 at 12:24:00PM -0700, David Miller wrote: > From: Herbert Xu <herbert@gondor.apana.org.au> > Date: Sat, 11 Oct 2008 22:10:22 +0800 ... > But hey, we could be missing something here, so I'd be happy to > hear that Jarek can still see some hole here :) Because it is true > that we have seen some weird crashes still and u32 seems common > amongst those report. > Yes, you could be missing something, but I've forgotten what, because when I found this I was much younger and brighter... and alas now I've a busy weekend, so let it wait until monday. Thanks you both for looking at this, Jarek P. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: pkt_sched: cls_u32: Fix locking in u32_delete() 2008-10-11 22:04 ` Jarek Poplawski @ 2008-10-11 22:05 ` David Miller 2008-10-13 6:46 ` Jarek Poplawski 0 siblings, 1 reply; 13+ messages in thread From: David Miller @ 2008-10-11 22:05 UTC (permalink / raw) To: jarkao2; +Cc: herbert, netdev, bugme-daemon, m0sia, akpm From: Jarek Poplawski <jarkao2@gmail.com> Date: Sun, 12 Oct 2008 00:04:18 +0200 > Yes, you could be missing something, but I've forgotten what, because > when I found this I was much younger and brighter... and alas now I've > a busy weekend, so let it wait until monday. You euros and your "weekends". :-) ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: pkt_sched: cls_u32: Fix locking in u32_delete() 2008-10-11 22:05 ` David Miller @ 2008-10-13 6:46 ` Jarek Poplawski 0 siblings, 0 replies; 13+ messages in thread From: Jarek Poplawski @ 2008-10-13 6:46 UTC (permalink / raw) To: David Miller; +Cc: herbert, netdev, bugme-daemon, m0sia, akpm On 12-10-2008 00:05, David Miller wrote: > From: Jarek Poplawski <jarkao2@gmail.com> > Date: Sun, 12 Oct 2008 00:04:18 +0200 > >> Yes, you could be missing something, but I've forgotten what, because >> when I found this I was much younger and brighter... and alas now I've >> a busy weekend, so let it wait until monday. OK, I give up! You both didn't miss anything and this patch is useless. > You euros and your "weekends". :-) I wish you were right with this too... Thanks again, Jarek P. ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2009-01-06 2:14 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <bug-11571-10286@http.bugzilla.kernel.org/> 2008-09-16 16:15 ` [Bugme-new] [Bug 11571] New: u32_classify Kernel Panic Andrew Morton 2008-09-17 19:38 ` Jarek Poplawski 2008-09-17 22:08 ` Jarek Poplawski 2008-09-18 7:05 ` m0sia 2008-09-18 7:53 ` Jarek Poplawski 2009-01-05 13:52 ` [PATCH] " Jarek Poplawski 2009-01-06 2:14 ` David Miller 2008-10-11 11:17 ` pkt_sched: cls_u32: Fix locking in u32_delete() Jarek Poplawski 2008-10-11 14:10 ` Herbert Xu 2008-10-11 19:24 ` David Miller 2008-10-11 22:04 ` Jarek Poplawski 2008-10-11 22:05 ` David Miller 2008-10-13 6:46 ` Jarek Poplawski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).