* possible Bug in ip_conntrack
From: Maik Hentsche @ 2006-08-14 8:58 UTC
To: netfilter-devel

Hi!
While debugging a new version of conntrackd, I get the following error
(nearly) every time: "BUG: soft lockup detected on CPU#1!", which
completely hangs the system. Here is what I do to reproduce the error:
I use keepalived to manage a virtual IP that is the default gw for two
other machines, which communicate over this IP. I use a client-server
program to count connections, which simply opens a TCP socket and
immediately closes it. The gateway uses keepalived and conntrackd-0.8.3
for hot standby. The HS-master (that produces the error) has kernel
2.6.18-rc4 running (with no other patches), the slave is a 2.6.17.3
(also vanilla). I use valgrind --leak-check=yes for debugging. After
some time, usually between 1 and 5 minutes, the system hangs and writes
something like this out on the serial console.

BUG: soft lockup detected on CPU#1!

Call Trace:
 <IRQ> [<ffffffff80252bc9>] softlockup_tick+0xf9/0x140
 [<ffffffff80239367>] update_process_times+0x57/0x90
 [<ffffffff80216243>] smp_local_timer_interrupt+0x23/0x50
 [<ffffffff802167a8>] smp_apic_timer_interrupt+0x38/0x40
 [<ffffffff8020a962>] apic_timer_interrupt+0x66/0x6c
 <EOI> [<ffffffff8042028d>] _write_lock_irqsave+0x6d/0x90
 [<ffffffff8042026a>] _write_lock_irqsave+0x4a/0x90
 [<ffffffff80420416>] _write_lock_bh+0x6/0x20
 [<ffffffff8811b2a9>] :ip_conntrack:destroy_conntrack+0x69/0x100
 [<ffffffff88137da6>] :ip_conntrack_netlink:ctnetlink_dump_table+0x116/0x160
 [<ffffffff803c5582>] netlink_dump+0x82/0x1c0
 [<ffffffff803c6ff7>] netlink_recvmsg+0x197/0x2a0
 [<ffffffff803ab13e>] sock_recvmsg+0xde/0x100
 [<ffffffff802264fc>] task_rq_lock+0x4c/0x90
 [<ffffffff80244580>] autoremove_wake_function+0x0/0x30
 [<ffffffff802346db>] current_fs_time+0x3b/0x40
 [<ffffffff80226433>] __wake_up+0x43/0x70
 [<ffffffff8027e3ae>] fget_light+0xae/0xe0
 [<ffffffff803acf7a>] sys_recvfrom+0xfa/0x190
 [<ffffffff80209dbe>] system_call+0x7e/0x83

The call trace differs every time, but ip_conntrack is always included.
Because of that, and because it has so far only occurred while I was
debugging conntrackd, I assume it's an ip_conntrack problem. More log
messages and the buggy version of conntrackd can be found here:
http://www-user.tu-chemnitz.de/~hmai/ip_conntrack/
If you need more information, please let me know.

so long
Maik
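The connection-count test program mentioned in the report is not part of
this thread. As a rough illustration of the load it describes (open a TCP
socket, close it immediately, repeat), here is a minimal client-side sketch.
The function name count_connections and its parameters are hypothetical and
are not taken from the reporter's program.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): open `attempts` TCP
 * connections to ip:port, closing each one right after connect().
 * Returns the number of successful connects, or -1 if `ip` is not a
 * valid dotted-quad IPv4 address.
 */
static int count_connections(const char *ip, uint16_t port, int attempts)
{
	struct sockaddr_in addr;
	int ok = 0;

	memset(&addr, 0, sizeof(addr));
	addr.sin_family = AF_INET;
	addr.sin_port = htons(port);
	if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
		return -1;

	for (int n = 0; n < attempts; n++) {
		int fd = socket(AF_INET, SOCK_STREAM, 0);

		if (fd < 0)
			continue;
		/* Every successful connect creates one conntrack entry on
		 * a gateway that forwards and tracks this traffic. */
		if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
			ok++;
		close(fd);
	}
	return ok;
}
```

Run in a loop against a listener behind the gateway, a client like this
keeps creating and expiring conntrack entries while conntrackd dumps the
table, which is presumably the condition that exposes the lockup.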
* Re: possible Bug in ip_conntrack
From: Patrick McHardy @ 2006-08-14 12:50 UTC
To: Maik Hentsche; +Cc: netfilter-devel

Maik Hentsche wrote:
> Hi!
> While debugging a new version of conntrackd, I get the following error
> (nearly) every time: "BUG: soft lockup detected on CPU#1!", which
> completely hangs the system. Here is what I do to reproduce the error:
> I use keepalived to manage a virtual IP that is the default gw for two
> other machines, which communicate over this IP. I use a client-server
> program to count connections, which simply opens a TCP socket and
> immediately closes it. The gateway uses keepalived and conntrackd-0.8.3
> for hot standby. The HS-master (that produces the error) has kernel
> 2.6.18-rc4 running (with no other patches), the slave is a 2.6.17.3
> (also vanilla). I use valgrind --leak-check=yes for debugging. After
> some time, usually between 1 and 5 minutes, the system hangs and writes
> something like this out on the serial console.
>
> BUG: soft lockup detected on CPU#1!
> Call Trace:
>  <IRQ> [<ffffffff80252bc9>] softlockup_tick+0xf9/0x140
>  [<ffffffff80239367>] update_process_times+0x57/0x90
>  [<ffffffff80216243>] smp_local_timer_interrupt+0x23/0x50
>  [<ffffffff802167a8>] smp_apic_timer_interrupt+0x38/0x40
>  [<ffffffff8020a962>] apic_timer_interrupt+0x66/0x6c
>  <EOI> [<ffffffff8042028d>] _write_lock_irqsave+0x6d/0x90
>  [<ffffffff8042026a>] _write_lock_irqsave+0x4a/0x90
>  [<ffffffff80420416>] _write_lock_bh+0x6/0x20
>  [<ffffffff8811b2a9>] :ip_conntrack:destroy_conntrack+0x69/0x100
>  [<ffffffff88137da6>] :ip_conntrack_netlink:ctnetlink_dump_table+0x116/0x160

Can you test current -git please? We had some changes in that area ..
I don't recall any explicit bugfixes, but who knows ..
* Re: possible Bug in ip_conntrack
From: Maik Hentsche @ 2006-08-15 8:29 UTC
To: Patrick McHardy; +Cc: netfilter-devel

Quoting Patrick McHardy <kaber@trash.net>:

> Can you test current -git please? We had some changes in that area ..
> I don't recall any explicit bugfixes, but who knows ..

Unfortunately, it still occurs with -git.

so long
Maik
* Re: possible Bug in ip_conntrack
From: Patrick McHardy @ 2006-08-16 11:31 UTC
To: Maik Hentsche; +Cc: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 314 bytes --]

Maik Hentsche wrote:
> Quoting Patrick McHardy <kaber@trash.net>:
>
> > Can you test current -git please? We had some changes in that area ..
> > I don't recall any explicit bugfixes, but who knows ..
>
> Unfortunately, it still occurs with -git.

Can you try this patch please? It should fix the problem.

[-- Attachment #2: x --]
[-- Type: text/plain, Size: 3503 bytes --]

[NETFILTER]: ctnetlink: fix deadlock in table dumping

ip_conntrack_put must not be called while holding ip_conntrack_lock
since destroy_conntrack takes it again.

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 8ebd6bb0f469f2759f39e73adee6916a3d975393
tree ebd73ee261508d483654416b03610910a2968e21
parent 338fe5c67e8fb799c9e3470331db6f3c60a31b1e
author Patrick McHardy <kaber@trash.net> Wed, 16 Aug 2006 13:32:27 +0200
committer Patrick McHardy <kaber@trash.net> Wed, 16 Aug 2006 13:32:27 +0200

 net/ipv4/netfilter/ip_conntrack_netlink.c |   17 +++++++----------
 net/netfilter/nf_conntrack_netlink.c      |   17 +++++++----------
 2 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/net/ipv4/netfilter/ip_conntrack_netlink.c b/net/ipv4/netfilter/ip_conntrack_netlink.c
index 33891bb..0d4cc92 100644
--- a/net/ipv4/netfilter/ip_conntrack_netlink.c
+++ b/net/ipv4/netfilter/ip_conntrack_netlink.c
@@ -415,21 +415,18 @@ ctnetlink_dump_table(struct sk_buff *skb
 			cb->args[0], *id);

 	read_lock_bh(&ip_conntrack_lock);
+	last = (struct ip_conntrack *)cb->args[1];
 	for (; cb->args[0] < ip_conntrack_htable_size; cb->args[0]++) {
 restart:
-		last = (struct ip_conntrack *)cb->args[1];
 		list_for_each_prev(i, &ip_conntrack_hash[cb->args[0]]) {
 			h = (struct ip_conntrack_tuple_hash *) i;
 			if (DIRECTION(h) != IP_CT_DIR_ORIGINAL)
 				continue;
 			ct = tuplehash_to_ctrack(h);
-			if (last != NULL) {
-				if (ct == last) {
-					ip_conntrack_put(last);
-					cb->args[1] = 0;
-					last = NULL;
-				} else
+			if (cb->args[1]) {
+				if (ct != last)
 					continue;
+				cb->args[1] = 0;
 			}
 			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
 						cb->nlh->nlmsg_seq,
@@ -440,17 +437,17 @@ restart:
 				goto out;
 			}
 		}
-		if (last != NULL) {
-			ip_conntrack_put(last);
+		if (cb->args[1]) {
 			cb->args[1] = 0;
 			goto restart;
 		}
 	}
 out:
 	read_unlock_bh(&ip_conntrack_lock);
+	if (last)
+		ip_conntrack_put(last);

 	DEBUGP("leaving, last bucket=%lu id=%u\n", cb->args[0], *id);
-
 	return skb->len;
 }

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index af48459..6527d4e 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -429,9 +429,9 @@ ctnetlink_dump_table(struct sk_buff *skb
 			cb->args[0], *id);

 	read_lock_bh(&nf_conntrack_lock);
+	last = (struct nf_conn *)cb->args[1];
 	for (; cb->args[0] < nf_conntrack_htable_size; cb->args[0]++) {
 restart:
-		last = (struct nf_conn *)cb->args[1];
 		list_for_each_prev(i, &nf_conntrack_hash[cb->args[0]]) {
 			h = (struct nf_conntrack_tuple_hash *) i;
 			if (DIRECTION(h) != IP_CT_DIR_ORIGINAL)
@@ -442,13 +442,10 @@ restart:
 			 * then dump everything. */
 			if (l3proto && L3PROTO(ct) != l3proto)
 				continue;
-			if (last != NULL) {
-				if (ct == last) {
-					nf_ct_put(last);
-					cb->args[1] = 0;
-					last = NULL;
-				} else
+			if (cb->args[1]) {
+				if (ct != last)
 					continue;
+				cb->args[1] = 0;
 			}
 			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
 						cb->nlh->nlmsg_seq,
@@ -459,17 +456,17 @@ restart:
 				goto out;
 			}
 		}
-		if (last != NULL) {
-			nf_ct_put(last);
+		if (cb->args[1]) {
 			cb->args[1] = 0;
 			goto restart;
 		}
 	}
 out:
 	read_unlock_bh(&nf_conntrack_lock);
+	if (last)
+		nf_ct_put(last);

 	DEBUGP("leaving, last bucket=%lu id=%u\n", cb->args[0], *id);
-
 	return skb->len;
 }
* Re: possible Bug in ip_conntrack
From: Maik Hentsche @ 2006-08-16 14:29 UTC
To: Patrick McHardy; +Cc: netfilter-devel

Quoting Patrick McHardy <kaber@trash.net>:

> Can you try this patch please? It should fix the problem.

Done. The bug did not reappear. Thanks for your fast reaction.

so long
Maik