* possible Bug in ip_conntrack
@ 2006-08-14 8:58 Maik Hentsche
2006-08-14 12:50 ` Patrick McHardy
0 siblings, 1 reply; 5+ messages in thread
From: Maik Hentsche @ 2006-08-14 8:58 UTC (permalink / raw)
To: netfilter-devel
Hi!
While debugging a new version of conntrackd, I get the following error
(nearly) every time: "BUG: soft lockup detected on CPU#1!", which
completely hangs the system. Here is what I do to reproduce the error:
I use keepalived to manage a virtual IP that is the default gateway for
two other machines, which communicate over this IP. I use a
client-server program to count connections, which simply opens a TCP
socket and immediately closes it. The gateway uses keepalived and
conntrackd-0.8.3 for hot standby. The hot-standby (HS) master (which
produces the error) runs kernel 2.6.18-rc4 (with no other patches); the
slave runs 2.6.17.3 (also vanilla). I use valgrind --leak-check=yes for
debugging. After some time, usually between 1 and 5 minutes, the system
hangs and writes something like this to the serial console:
BUG: soft lockup detected on CPU#1!
Call Trace:
<IRQ> [<ffffffff80252bc9>] softlockup_tick+0xf9/0x140
[<ffffffff80239367>] update_process_times+0x57/0x90
[<ffffffff80216243>] smp_local_timer_interrupt+0x23/0x50
[<ffffffff802167a8>] smp_apic_timer_interrupt+0x38/0x40
[<ffffffff8020a962>] apic_timer_interrupt+0x66/0x6c
<EOI> [<ffffffff8042028d>] _write_lock_irqsave+0x6d/0x90
[<ffffffff8042026a>] _write_lock_irqsave+0x4a/0x90
[<ffffffff80420416>] _write_lock_bh+0x6/0x20
[<ffffffff8811b2a9>] :ip_conntrack:destroy_conntrack+0x69/0x100
[<ffffffff88137da6>] :ip_conntrack_netlink:ctnetlink_dump_table+0x116/0x160
[<ffffffff803c5582>] netlink_dump+0x82/0x1c0
[<ffffffff803c6ff7>] netlink_recvmsg+0x197/0x2a0
[<ffffffff803ab13e>] sock_recvmsg+0xde/0x100
[<ffffffff802264fc>] task_rq_lock+0x4c/0x90
[<ffffffff80244580>] autoremove_wake_function+0x0/0x30
[<ffffffff802346db>] current_fs_time+0x3b/0x40
[<ffffffff80226433>] __wake_up+0x43/0x70
[<ffffffff8027e3ae>] fget_light+0xae/0xe0
[<ffffffff803acf7a>] sys_recvfrom+0xfa/0x190
[<ffffffff80209dbe>] system_call+0x7e/0x83
The call trace differs every time, but ip_conntrack is always included.
Because of that, and because it has so far only occurred while I was
debugging conntrackd, I assume it is an ip_conntrack problem.
More log messages and the buggy version of conntrackd can be found here:
http://www-user.tu-chemnitz.de/~hmai/ip_conntrack/
If you need more information, please let me know.
so long
Maik
* Re: possible Bug in ip_conntrack
2006-08-14 8:58 possible Bug in ip_conntrack Maik Hentsche
@ 2006-08-14 12:50 ` Patrick McHardy
2006-08-15 8:29 ` Maik Hentsche
0 siblings, 1 reply; 5+ messages in thread
From: Patrick McHardy @ 2006-08-14 12:50 UTC (permalink / raw)
To: Maik Hentsche; +Cc: netfilter-devel
Maik Hentsche wrote:
> Hi!
> While debugging a new version of conntrackd, I get the following error
> (nearly) every time: "BUG: soft lockup detected on CPU#1!", which
> completely hangs the system. Here is what I do to reproduce the error:
> I use keepalived to manage a virtual IP that is the default gateway for
> two other machines, which communicate over this IP. I use a
> client-server program to count connections, which simply opens a TCP
> socket and immediately closes it. The gateway uses keepalived and
> conntrackd-0.8.3 for hot standby. The hot-standby (HS) master (which
> produces the error) runs kernel 2.6.18-rc4 (with no other patches); the
> slave runs 2.6.17.3 (also vanilla). I use valgrind --leak-check=yes for
> debugging. After some time, usually between 1 and 5 minutes, the system
> hangs and writes something like this to the serial console:
>
> BUG: soft lockup detected on CPU#1!
> Call Trace:
> <IRQ> [<ffffffff80252bc9>] softlockup_tick+0xf9/0x140
> [<ffffffff80239367>] update_process_times+0x57/0x90
> [<ffffffff80216243>] smp_local_timer_interrupt+0x23/0x50
> [<ffffffff802167a8>] smp_apic_timer_interrupt+0x38/0x40
> [<ffffffff8020a962>] apic_timer_interrupt+0x66/0x6c
> <EOI> [<ffffffff8042028d>] _write_lock_irqsave+0x6d/0x90
> [<ffffffff8042026a>] _write_lock_irqsave+0x4a/0x90
> [<ffffffff80420416>] _write_lock_bh+0x6/0x20
> [<ffffffff8811b2a9>] :ip_conntrack:destroy_conntrack+0x69/0x100
> [<ffffffff88137da6>] :ip_conntrack_netlink:ctnetlink_dump_table+0x116/0x160
Can you test current -git please? We had some changes in that area ..
I don't recall any explicit bugfixes, but who knows ..
* Re: possible Bug in ip_conntrack
2006-08-14 12:50 ` Patrick McHardy
@ 2006-08-15 8:29 ` Maik Hentsche
2006-08-16 11:31 ` Patrick McHardy
0 siblings, 1 reply; 5+ messages in thread
From: Maik Hentsche @ 2006-08-15 8:29 UTC (permalink / raw)
To: Patrick McHardy; +Cc: netfilter-devel
Quoting Patrick McHardy <kaber@trash.net>:
> Can you test current -git please? We had some changes in that area ..
> I don't recall any explicit bugfixes, but who knows ..
Unfortunately, it still occurs with -git.
so long
Maik
* Re: possible Bug in ip_conntrack
2006-08-15 8:29 ` Maik Hentsche
@ 2006-08-16 11:31 ` Patrick McHardy
2006-08-16 14:29 ` Maik Hentsche
0 siblings, 1 reply; 5+ messages in thread
From: Patrick McHardy @ 2006-08-16 11:31 UTC (permalink / raw)
To: Maik Hentsche; +Cc: netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 314 bytes --]
Maik Hentsche wrote:
> Quoting Patrick McHardy <kaber@trash.net>:
>
>
>>Can you test current -git please? We had some changes in that area ..
>>I don't recall any explicit bugfixes, but who knows ..
>
>
> Unfortunately, it still occurs with -git.
Can you try this patch please? It should fix the problem.
[-- Attachment #2: x --]
[-- Type: text/plain, Size: 3503 bytes --]
[NETFILTER]: ctnetlink: fix deadlock in table dumping
ip_conntrack_put must not be called while holding ip_conntrack_lock
since destroy_conntrack takes it again.
Signed-off-by: Patrick McHardy <kaber@trash.net>
---
commit 8ebd6bb0f469f2759f39e73adee6916a3d975393
tree ebd73ee261508d483654416b03610910a2968e21
parent 338fe5c67e8fb799c9e3470331db6f3c60a31b1e
author Patrick McHardy <kaber@trash.net> Wed, 16 Aug 2006 13:32:27 +0200
committer Patrick McHardy <kaber@trash.net> Wed, 16 Aug 2006 13:32:27 +0200
net/ipv4/netfilter/ip_conntrack_netlink.c | 17 +++++++----------
net/netfilter/nf_conntrack_netlink.c | 17 +++++++----------
2 files changed, 14 insertions(+), 20 deletions(-)
diff --git a/net/ipv4/netfilter/ip_conntrack_netlink.c b/net/ipv4/netfilter/ip_conntrack_netlink.c
index 33891bb..0d4cc92 100644
--- a/net/ipv4/netfilter/ip_conntrack_netlink.c
+++ b/net/ipv4/netfilter/ip_conntrack_netlink.c
@@ -415,21 +415,18 @@ ctnetlink_dump_table(struct sk_buff *skb
cb->args[0], *id);
read_lock_bh(&ip_conntrack_lock);
+ last = (struct ip_conntrack *)cb->args[1];
for (; cb->args[0] < ip_conntrack_htable_size; cb->args[0]++) {
restart:
- last = (struct ip_conntrack *)cb->args[1];
list_for_each_prev(i, &ip_conntrack_hash[cb->args[0]]) {
h = (struct ip_conntrack_tuple_hash *) i;
if (DIRECTION(h) != IP_CT_DIR_ORIGINAL)
continue;
ct = tuplehash_to_ctrack(h);
- if (last != NULL) {
- if (ct == last) {
- ip_conntrack_put(last);
- cb->args[1] = 0;
- last = NULL;
- } else
+ if (cb->args[1]) {
+ if (ct != last)
continue;
+ cb->args[1] = 0;
}
if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
cb->nlh->nlmsg_seq,
@@ -440,17 +437,17 @@ restart:
goto out;
}
}
- if (last != NULL) {
- ip_conntrack_put(last);
+ if (cb->args[1]) {
cb->args[1] = 0;
goto restart;
}
}
out:
read_unlock_bh(&ip_conntrack_lock);
+ if (last)
+ ip_conntrack_put(last);
DEBUGP("leaving, last bucket=%lu id=%u\n", cb->args[0], *id);
-
return skb->len;
}
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index af48459..6527d4e 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -429,9 +429,9 @@ ctnetlink_dump_table(struct sk_buff *skb
cb->args[0], *id);
read_lock_bh(&nf_conntrack_lock);
+ last = (struct nf_conn *)cb->args[1];
for (; cb->args[0] < nf_conntrack_htable_size; cb->args[0]++) {
restart:
- last = (struct nf_conn *)cb->args[1];
list_for_each_prev(i, &nf_conntrack_hash[cb->args[0]]) {
h = (struct nf_conntrack_tuple_hash *) i;
if (DIRECTION(h) != IP_CT_DIR_ORIGINAL)
@@ -442,13 +442,10 @@ restart:
* then dump everything. */
if (l3proto && L3PROTO(ct) != l3proto)
continue;
- if (last != NULL) {
- if (ct == last) {
- nf_ct_put(last);
- cb->args[1] = 0;
- last = NULL;
- } else
+ if (cb->args[1]) {
+ if (ct != last)
continue;
+ cb->args[1] = 0;
}
if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
cb->nlh->nlmsg_seq,
@@ -459,17 +456,17 @@ restart:
goto out;
}
}
- if (last != NULL) {
- nf_ct_put(last);
+ if (cb->args[1]) {
cb->args[1] = 0;
goto restart;
}
}
out:
read_unlock_bh(&nf_conntrack_lock);
+ if (last)
+ nf_ct_put(last);
DEBUGP("leaving, last bucket=%lu id=%u\n", cb->args[0], *id);
-
return skb->len;
}
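
The locking rule behind this fix can be sketched in user space. The following is only an illustration under stated assumptions, not kernel code: `table_lock`, `struct entry`, `entry_put`, `dump_buggy` and `dump_fixed` are hypothetical stand-ins for ip_conntrack_lock, a conntrack entry, ip_conntrack_put and the old and new ordering in ctnetlink_dump_table. A POSIX error-checking mutex replaces the kernel rwlock so that the self-deadlock shows up as EDEADLK instead of a hang.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a conntrack entry. */
struct entry {
    int refcnt;    /* atomic in the kernel; plain int in this single-threaded sketch */
    int destroyed; /* set once the destroy path has run */
};

/* Hypothetical stand-in for ip_conntrack_lock. */
static pthread_mutex_t table_lock;

/* Error-checking mutex: relocking from the owning thread returns
 * EDEADLK instead of blocking forever. */
static void table_lock_init(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&table_lock, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Drop one reference. The last put runs the destroy path, which needs
 * table_lock itself. Returns 0, or EDEADLK if the caller already holds
 * the lock. */
static int entry_put(struct entry *e)
{
    int err;

    assert(e->refcnt > 0);
    if (--e->refcnt > 0)
        return 0;

    err = pthread_mutex_lock(&table_lock);
    if (err != 0) {
        e->refcnt++; /* undo, so the sketch can retry the put later */
        return err;
    }
    e->destroyed = 1;
    pthread_mutex_unlock(&table_lock);
    return 0;
}

/* Old ordering: the reference is dropped while the lock is held. */
static int dump_buggy(struct entry *last)
{
    int err;

    pthread_mutex_lock(&table_lock);
    err = last ? entry_put(last) : 0;
    pthread_mutex_unlock(&table_lock);
    return err;
}

/* New ordering: only compare against 'last' under the lock and drop
 * the reference after unlocking. */
static int dump_fixed(struct entry *last)
{
    pthread_mutex_lock(&table_lock);
    /* ... walk the table here, using 'last' only for comparison ... */
    pthread_mutex_unlock(&table_lock);
    return last ? entry_put(last) : 0;
}
```

After `table_lock_init()`, calling `dump_buggy()` on an entry whose last reference is held by the dump reports EDEADLK, while `dump_fixed()` on the same entry succeeds and destroys it. In the kernel the first case does not return an error; it spins on the lock, which matches the soft lockup in the trace above.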
* Re: possible Bug in ip_conntrack
2006-08-16 11:31 ` Patrick McHardy
@ 2006-08-16 14:29 ` Maik Hentsche
0 siblings, 0 replies; 5+ messages in thread
From: Maik Hentsche @ 2006-08-16 14:29 UTC (permalink / raw)
To: Patrick McHardy; +Cc: netfilter-devel
Quoting Patrick McHardy <kaber@trash.net>:
> Can you try this patch please? It should fix the problem.
Done. The bug did not reappear. Thanks for your fast reaction.
so long
Maik