* possible Bug in ip_conntrack
@ 2006-08-14  8:58 Maik Hentsche
  2006-08-14 12:50 ` Patrick McHardy
  0 siblings, 1 reply; 5+ messages in thread
From: Maik Hentsche @ 2006-08-14  8:58 UTC (permalink / raw)
  To: netfilter-devel

Hi!
While debugging a new version of conntrackd, I get the following error
almost every time: "BUG: soft lockup detected on CPU#1!", which
completely hangs the system. Here is what I do to reproduce the error:
I use keepalived to manage a virtual IP that is the default gateway for
two other machines, which communicate over this IP. I use a
client-server program to count connections; it simply opens a TCP
socket and immediately closes it. The gateway uses keepalived and
conntrackd-0.8.3 for hot standby. The hot-standby master (the machine
that produces the error) runs kernel 2.6.18-rc4 (with no other
patches); the slave runs 2.6.17.3 (also vanilla). I use
valgrind --leak-check=yes for debugging. After some time, usually
between 1 and 5 minutes, the system hangs and writes something like
this to the serial console:

BUG: soft lockup detected on CPU#1!
Call Trace:
 <IRQ> [<ffffffff80252bc9>] softlockup_tick+0xf9/0x140
 [<ffffffff80239367>] update_process_times+0x57/0x90
 [<ffffffff80216243>] smp_local_timer_interrupt+0x23/0x50
 [<ffffffff802167a8>] smp_apic_timer_interrupt+0x38/0x40
 [<ffffffff8020a962>] apic_timer_interrupt+0x66/0x6c
 <EOI> [<ffffffff8042028d>] _write_lock_irqsave+0x6d/0x90
 [<ffffffff8042026a>] _write_lock_irqsave+0x4a/0x90
 [<ffffffff80420416>] _write_lock_bh+0x6/0x20
 [<ffffffff8811b2a9>] :ip_conntrack:destroy_conntrack+0x69/0x100
 [<ffffffff88137da6>] :ip_conntrack_netlink:ctnetlink_dump_table+0x116/0x160
 [<ffffffff803c5582>] netlink_dump+0x82/0x1c0
 [<ffffffff803c6ff7>] netlink_recvmsg+0x197/0x2a0
 [<ffffffff803ab13e>] sock_recvmsg+0xde/0x100
 [<ffffffff802264fc>] task_rq_lock+0x4c/0x90
 [<ffffffff80244580>] autoremove_wake_function+0x0/0x30
 [<ffffffff802346db>] current_fs_time+0x3b/0x40
 [<ffffffff80226433>] __wake_up+0x43/0x70
 [<ffffffff8027e3ae>] fget_light+0xae/0xe0
 [<ffffffff803acf7a>] sys_recvfrom+0xfa/0x190
 [<ffffffff80209dbe>] system_call+0x7e/0x83

The call trace differs every time, but ip_conntrack is always included.
Because of that, and because it has so far only occurred while I was
debugging conntrackd, I assume it is an ip_conntrack problem.
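
The connection-churning client described above can be sketched roughly
as follows. This is not the program actually used for the test: the
function name churn_connections and its parameters are assumptions made
for illustration, and it only covers the "open a TCP socket and
immediately close it" part, not the counting server.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `count` TCP connections to ip:port and close
 * each one immediately. Returns the number of connections that were
 * established, or -1 if `ip` is not a valid dotted-quad IPv4 address.
 */
int churn_connections(const char *ip, unsigned short port, int count)
{
    struct sockaddr_in addr;
    int established = 0;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;

    for (int i = 0; i < count; i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            continue;
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
            established++;
        close(fd); /* each short-lived connection creates a conntrack entry */
    }
    return established;
}
```

Run in a loop from one of the machines behind the virtual IP, something
like this should keep the conntrack table on the gateway changing while
conntrackd dumps it.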

More log messages and the buggy version of conntrackd can be found here:
http://www-user.tu-chemnitz.de/~hmai/ip_conntrack/

If you need more information, please let me know.

so long
Maik

* Re: possible Bug in ip_conntrack
  2006-08-14  8:58 possible Bug in ip_conntrack Maik Hentsche
@ 2006-08-14 12:50 ` Patrick McHardy
  2006-08-15  8:29   ` Maik Hentsche
  0 siblings, 1 reply; 5+ messages in thread
From: Patrick McHardy @ 2006-08-14 12:50 UTC (permalink / raw)
  To: Maik Hentsche; +Cc: netfilter-devel

Maik Hentsche wrote:
> Hi!
> While debugging a new version of conntrackd, I get the following error
> almost every time: "BUG: soft lockup detected on CPU#1!", which
> completely hangs the system. Here is what I do to reproduce the error:
> I use keepalived to manage a virtual IP that is the default gateway for
> two other machines, which communicate over this IP. I use a
> client-server program to count connections; it simply opens a TCP
> socket and immediately closes it. The gateway uses keepalived and
> conntrackd-0.8.3 for hot standby. The hot-standby master (the machine
> that produces the error) runs kernel 2.6.18-rc4 (with no other
> patches); the slave runs 2.6.17.3 (also vanilla). I use
> valgrind --leak-check=yes for debugging. After some time, usually
> between 1 and 5 minutes, the system hangs and writes something like
> this to the serial console:
> 
> BUG: soft lockup detected on CPU#1!
> Call Trace:
>  <IRQ> [<ffffffff80252bc9>] softlockup_tick+0xf9/0x140
>  [<ffffffff80239367>] update_process_times+0x57/0x90
>  [<ffffffff80216243>] smp_local_timer_interrupt+0x23/0x50
>  [<ffffffff802167a8>] smp_apic_timer_interrupt+0x38/0x40
>  [<ffffffff8020a962>] apic_timer_interrupt+0x66/0x6c
>  <EOI> [<ffffffff8042028d>] _write_lock_irqsave+0x6d/0x90
>  [<ffffffff8042026a>] _write_lock_irqsave+0x4a/0x90
>  [<ffffffff80420416>] _write_lock_bh+0x6/0x20
>  [<ffffffff8811b2a9>] :ip_conntrack:destroy_conntrack+0x69/0x100
>  [<ffffffff88137da6>] :ip_conntrack_netlink:ctnetlink_dump_table+0x116/0x160


Can you test current -git please? We had some changes in that area ..
I don't recall any explicit bugfixes, but who knows ..

* Re: possible Bug in ip_conntrack
  2006-08-14 12:50 ` Patrick McHardy
@ 2006-08-15  8:29   ` Maik Hentsche
  2006-08-16 11:31     ` Patrick McHardy
  0 siblings, 1 reply; 5+ messages in thread
From: Maik Hentsche @ 2006-08-15  8:29 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

Quoting Patrick McHardy <kaber@trash.net>:

> Can you test current -git please? We had some changes in that area ..
> I don't recall any explicit bugfixes, but who knows ..

Unfortunately, it still occurs with -git.

so long
Maik

* Re: possible Bug in ip_conntrack
  2006-08-15  8:29   ` Maik Hentsche
@ 2006-08-16 11:31     ` Patrick McHardy
  2006-08-16 14:29       ` Maik Hentsche
  0 siblings, 1 reply; 5+ messages in thread
From: Patrick McHardy @ 2006-08-16 11:31 UTC (permalink / raw)
  To: Maik Hentsche; +Cc: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 314 bytes --]

Maik Hentsche wrote:
> Quoting Patrick McHardy <kaber@trash.net>:
> 
> 
>>Can you test current -git please? We had some changes in that area ..
>>I don't recall any explicit bugfixes, but who knows ..
> 
> 
> Unfortunately, it still occurs with -git.

Can you try this patch please? It should fix the problem.
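
The lock-ordering problem behind the soft lockup can be modelled in
user space. The following is only a minimal sketch, not kernel code:
toy_lock, toy_conn, toy_put and the two dump_* functions are
hypothetical stand-ins for ip_conntrack_lock, struct ip_conntrack,
ip_conntrack_put and ctnetlink_dump_table, and the toy lock reports a
would-be deadlock instead of spinning forever as the real lock does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy non-recursive lock: only records whether it is currently held. */
struct toy_lock { bool held; };

/* Toy connection entry with a reference count. */
struct toy_conn {
    int refcnt;
    bool destroyed;
};

static struct toy_lock table_lock;
static int deadlocks_detected;

/* Returns false where a real lock would spin forever (the soft lockup). */
bool toy_lock_acquire(struct toy_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

void toy_lock_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

/* Models destroy_conntrack(): it needs the table lock to unlink the entry. */
void toy_destroy(struct toy_conn *ct)
{
    if (!toy_lock_acquire(&table_lock)) {
        deadlocks_detected++; /* the caller already holds the lock */
        return;
    }
    ct->destroyed = true;
    toy_lock_release(&table_lock);
}

/* Models ip_conntrack_put(): dropping the last reference destroys the entry. */
void toy_put(struct toy_conn *ct)
{
    if (--ct->refcnt == 0)
        toy_destroy(ct);
}

/* Old ordering: the reference is dropped while the table lock is held. */
void dump_put_under_lock(struct toy_conn *last)
{
    (void)toy_lock_acquire(&table_lock);
    toy_put(last);
    toy_lock_release(&table_lock);
}

/* Patched ordering: release the table lock first, then drop the reference. */
void dump_put_after_unlock(struct toy_conn *last)
{
    (void)toy_lock_acquire(&table_lock);
    /* ... walk the table here ... */
    toy_lock_release(&table_lock);
    if (last)
        toy_put(last);
}
```

With a reference count of 1, dump_put_under_lock() runs into the
would-be deadlock and leaves the entry undestroyed, while
dump_put_after_unlock() destroys it cleanly.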


[-- Attachment #2: x --]
[-- Type: text/plain, Size: 3503 bytes --]

[NETFILTER]: ctnetlink: fix deadlock in table dumping

ip_conntrack_put must not be called while holding ip_conntrack_lock
since destroy_conntrack takes it again.

Signed-off-by: Patrick McHardy <kaber@trash.net>

---
commit 8ebd6bb0f469f2759f39e73adee6916a3d975393
tree ebd73ee261508d483654416b03610910a2968e21
parent 338fe5c67e8fb799c9e3470331db6f3c60a31b1e
author Patrick McHardy <kaber@trash.net> Wed, 16 Aug 2006 13:32:27 +0200
committer Patrick McHardy <kaber@trash.net> Wed, 16 Aug 2006 13:32:27 +0200

 net/ipv4/netfilter/ip_conntrack_netlink.c |   17 +++++++----------
 net/netfilter/nf_conntrack_netlink.c      |   17 +++++++----------
 2 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/net/ipv4/netfilter/ip_conntrack_netlink.c b/net/ipv4/netfilter/ip_conntrack_netlink.c
index 33891bb..0d4cc92 100644
--- a/net/ipv4/netfilter/ip_conntrack_netlink.c
+++ b/net/ipv4/netfilter/ip_conntrack_netlink.c
@@ -415,21 +415,18 @@ ctnetlink_dump_table(struct sk_buff *skb
 			cb->args[0], *id);
 
 	read_lock_bh(&ip_conntrack_lock);
+	last = (struct ip_conntrack *)cb->args[1];
 	for (; cb->args[0] < ip_conntrack_htable_size; cb->args[0]++) {
 restart:
-		last = (struct ip_conntrack *)cb->args[1];
 		list_for_each_prev(i, &ip_conntrack_hash[cb->args[0]]) {
 			h = (struct ip_conntrack_tuple_hash *) i;
 			if (DIRECTION(h) != IP_CT_DIR_ORIGINAL)
 				continue;
 			ct = tuplehash_to_ctrack(h);
-			if (last != NULL) {
-				if (ct == last) {
-					ip_conntrack_put(last);
-					cb->args[1] = 0;
-					last = NULL;
-				} else
+			if (cb->args[1]) {
+				if (ct != last)
 					continue;
+				cb->args[1] = 0;
 			}
 			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
 		                        	cb->nlh->nlmsg_seq,
@@ -440,17 +437,17 @@ restart:
 				goto out;
 			}
 		}
-		if (last != NULL) {
-			ip_conntrack_put(last);
+		if (cb->args[1]) {
 			cb->args[1] = 0;
 			goto restart;
 		}
 	}
 out:
 	read_unlock_bh(&ip_conntrack_lock);
+	if (last)
+		ip_conntrack_put(last);
 
 	DEBUGP("leaving, last bucket=%lu id=%u\n", cb->args[0], *id);
-
 	return skb->len;
 }
 
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index af48459..6527d4e 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -429,9 +429,9 @@ ctnetlink_dump_table(struct sk_buff *skb
 			cb->args[0], *id);
 
 	read_lock_bh(&nf_conntrack_lock);
+	last = (struct nf_conn *)cb->args[1];
 	for (; cb->args[0] < nf_conntrack_htable_size; cb->args[0]++) {
 restart:
-		last = (struct nf_conn *)cb->args[1];
 		list_for_each_prev(i, &nf_conntrack_hash[cb->args[0]]) {
 			h = (struct nf_conntrack_tuple_hash *) i;
 			if (DIRECTION(h) != IP_CT_DIR_ORIGINAL)
@@ -442,13 +442,10 @@ restart:
 			 * then dump everything. */
 			if (l3proto && L3PROTO(ct) != l3proto)
 				continue;
-			if (last != NULL) {
-				if (ct == last) {
-					nf_ct_put(last);
-					cb->args[1] = 0;
-					last = NULL;
-				} else
+			if (cb->args[1]) {
+				if (ct != last)
 					continue;
+				cb->args[1] = 0;
 			}
 			if (ctnetlink_fill_info(skb, NETLINK_CB(cb->skb).pid,
 		                        	cb->nlh->nlmsg_seq,
@@ -459,17 +456,17 @@ restart:
 				goto out;
 			}
 		}
-		if (last != NULL) {
-			nf_ct_put(last);
+		if (cb->args[1]) {
 			cb->args[1] = 0;
 			goto restart;
 		}
 	}
 out:
 	read_unlock_bh(&nf_conntrack_lock);
+	if (last)
+		nf_ct_put(last);
 
 	DEBUGP("leaving, last bucket=%lu id=%u\n", cb->args[0], *id);
-
 	return skb->len;
 }
 

* Re: possible Bug in ip_conntrack
  2006-08-16 11:31     ` Patrick McHardy
@ 2006-08-16 14:29       ` Maik Hentsche
  0 siblings, 0 replies; 5+ messages in thread
From: Maik Hentsche @ 2006-08-16 14:29 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: netfilter-devel

Quoting Patrick McHardy <kaber@trash.net>:

> Can you try this patch please? It should fix the problem.

Done. The bug did not reappear. Thanks for your fast reaction.

so long
Maik
