netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* crash in death_by_timeout()
@ 2008-11-17 22:18 BORBELY Zoltan
  2008-11-18 11:07 ` Patrick McHardy
  0 siblings, 1 reply; 13+ messages in thread
From: BORBELY Zoltan @ 2008-11-17 22:18 UTC (permalink / raw)
  To: Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 362 bytes --]

Hi,

There's a race in the nfct netlink code, the result is a crash in the
death_by_timeout() function. When a timer interrupt occures during a
new entry addition, the kernel will crash due to a NULL deref. The
attached patch has solved the problem for us. I haven't tested it on
the latest kernels, but the problem still seems to be there.

Bye,
Zoltan Borbely

[-- Attachment #2: nf_conntrack_netlink.patch --]
[-- Type: text/plain, Size: 317 bytes --]

--- /tmp/nf_conntrack_netlink.c-orig	2008-09-29 23:28:55.000000000 +0200
+++ /tmp/nf_conntrack_netlink.c	2008-09-29 23:29:11.000000000 +0200
@@ -1177,8 +1177,8 @@
 		ct->master = master_ct;
 	}
 
-	add_timer(&ct->timeout);
 	nf_conntrack_hash_insert(ct);
+	add_timer(&ct->timeout);
 	rcu_read_unlock();
 
 	return 0;

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: crash in death_by_timeout()
  2008-11-17 22:18 crash in death_by_timeout() BORBELY Zoltan
@ 2008-11-18 11:07 ` Patrick McHardy
  2008-11-18 12:38   ` BORBELY Zoltan
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick McHardy @ 2008-11-18 11:07 UTC (permalink / raw)
  To: BORBELY Zoltan; +Cc: Netfilter Development Mailinglist

BORBELY Zoltan wrote:
> Hi,
> 
> There's a race in the nfct netlink code, the result is a crash in the
> death_by_timeout() function. When a timer interrupt occures during a
> new entry addition, the kernel will crash due to a NULL deref. The
> attached patch has solved the problem for us. I haven't tested it on
> the latest kernels, but the problem still seems to be there.

> --- /tmp/nf_conntrack_netlink.c-orig	2008-09-29 23:28:55.000000000 +0200
> +++ /tmp/nf_conntrack_netlink.c	2008-09-29 23:29:11.000000000 +0200
> @@ -1177,8 +1177,8 @@
>  		ct->master = master_ct;
>  	}
>  
> -	add_timer(&ct->timeout);
>  	nf_conntrack_hash_insert(ct);
> +	add_timer(&ct->timeout);
>  	rcu_read_unlock();

That code looks very fishy. We should be holding the conntrack lock,
otherwise the addition is not only racy against the timer, but also
against addition of identical conntracks. Let me look into what
happened here.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: crash in death_by_timeout()
  2008-11-18 11:07 ` Patrick McHardy
@ 2008-11-18 12:38   ` BORBELY Zoltan
  2008-11-18 13:19     ` Patrick McHardy
  0 siblings, 1 reply; 13+ messages in thread
From: BORBELY Zoltan @ 2008-11-18 12:38 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 1229 bytes --]

Hi,

On Tue, Nov 18, 2008 at 12:07:20PM +0100, Patrick McHardy wrote:
>> --- /tmp/nf_conntrack_netlink.c-orig	2008-09-29 23:28:55.000000000 +0200
>> +++ /tmp/nf_conntrack_netlink.c	2008-09-29 23:29:11.000000000 +0200
>> @@ -1177,8 +1177,8 @@
>>  		ct->master = master_ct;
>>  	}
>>  -	add_timer(&ct->timeout);
>>  	nf_conntrack_hash_insert(ct);
>> +	add_timer(&ct->timeout);
>>  	rcu_read_unlock();
>
> That code looks very fishy. We should be holding the conntrack lock,
> otherwise the addition is not only racy against the timer, but also
> against addition of identical conntracks. Let me look into what
> happened here.

We have experienced a lot of kernel crashes, _every time_ in the
death_by_timeout() function while we were trying to add a new conntrack
entry from userspace via netlink (attached the disassembled version
of the function, ===> points to the EIP upon the crash). There was a
possibility, that we tried to add conntrack entries with zero timeout
value, maybe it's necessary to trigger this crash. The previous patch
has definitly solved the problem for us.

I've got photos from various crashes, but it takes a little time to
find them. Please let me know if you want to see them.

Thanks,
Zoltan Borbely

[-- Attachment #2: 2.6.26.3-death_by_timeout.txt --]
[-- Type: text/plain, Size: 4551 bytes --]

00000350 <death_by_timeout>:
     350:	55                   	push   %ebp
     351:	89 e5                	mov    %esp,%ebp
     353:	56                   	push   %esi
     354:	53                   	push   %ebx
     355:	89 c3                	mov    %eax,%ebx
     357:	83 ec 0c             	sub    $0xc,%esp

     35a:	8b 90 cc 00 00 00    	mov    0xcc(%eax),%edx
     360:	85 d2                	test   %edx,%edx
     362:	74 08                	je     36c <death_by_timeout+0x1c>
     364:	0f b6 42 08          	movzbl 0x8(%edx),%eax
     368:	84 c0                	test   %al,%al
     36a:	75 74                	jne    3e0 <death_by_timeout+0x90>

     36c:	b8 00 00 00 00       	mov    <nf_conntrack_lock>,%eax
     371:	e8 fc ff ff ff       	call   <_spin_lock_bh>

     376:	8d 4b 04             	lea    0x4(%ebx),%ecx
     379:	ff 05 18 00 00 00    	incl   <per_cpu__nf_conntrack_stat>

     					// hlist_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnode)
     37f:	8b 43 04             	mov    0x4(%ebx),%eax
     382:	8b 51 04             	mov    0x4(%ecx),%edx
     385:	85 c0                	test   %eax,%eax
===> 387:	89 02                	mov    %eax,(%edx)
     389:	74 03                	je     38e <death_by_timeout+0x3e>
     38b:	89 50 04             	mov    %edx,0x4(%eax)
     38e:	c7 41 04 00 02 20 00 	movl   $0x200200,0x4(%ecx)	// LIST_POISON2

     					// hlist_del_rcu(&ct->tuplehash[IP_CT_DIR_REPLY].hnode)
     395:	8b 43 34             	mov    0x34(%ebx),%eax
     398:	8d 4b 34             	lea    0x34(%ebx),%ecx
     39b:	8b 51 04             	mov    0x4(%ecx),%edx
     39e:	85 c0                	test   %eax,%eax
     3a0:	89 02                	mov    %eax,(%edx)
     3a2:	74 03                	je     3a7 <death_by_timeout+0x57>
     3a4:	89 50 04             	mov    %edx,0x4(%eax)
     3a7:	c7 41 04 00 02 20 00 	movl   $0x200200,0x4(%ecx)	// LIST_POISON2

     3ae:	89 d8                	mov    %ebx,%eax
     3b0:	e8 fc ff ff ff       	call   <nf_ct_remove_expectations>

     3b5:	b8 00 00 00 00       	mov    <nf_conntrack_lock>,%eax
     3ba:	e8 fc ff ff ff       	call   <_spin_unlock_bh>

     3bf:	85 db                	test   %ebx,%ebx
     3c1:	74 77                	je     43a <death_by_timeout+0xea>
     3c3:	ff 0b                	decl   (%ebx)
     3c5:	0f 94 c0             	sete   %al
     3c8:	84 c0                	test   %al,%al
     3ca:	74 07                	je     3d3 <death_by_timeout+0x83>
     3cc:	89 d8                	mov    %ebx,%eax
     3ce:	e8 fc ff ff ff       	call   <nf_conntrack_destroy>
     3d3:	83 c4 0c             	add    $0xc,%esp
     3d6:	5b                   	pop    %ebx
     3d7:	5e                   	pop    %esi
     3d8:	5d                   	pop    %ebp
     3d9:	c3                   	ret    

     3da:	8d b6 00 00 00 00    	lea    0x0(%esi),%esi
     3e0:	0f b6 c0             	movzbl %al,%eax
     3e3:	89 d6                	mov    %edx,%esi
     3e5:	01 c6                	add    %eax,%esi
     3e7:	74 83                	je     36c <death_by_timeout+0x1c>
     3e9:	b8 e9 03 00 00       	mov    $0x3e9,%eax
     3ee:	31 c9                	xor    %ecx,%ecx
     3f0:	89 44 24 08          	mov    %eax,0x8(%esp)
     3f4:	b8 01 00 00 00       	mov    $0x1,%eax
     3f9:	31 d2                	xor    %edx,%edx
     3fb:	89 44 24 04          	mov    %eax,0x4(%esp)
     3ff:	b8 00 00 00 00       	mov    <rcu_lock_map>,%eax
     404:	c7 04 24 02 00 00 00 	movl   $0x2,(%esp)
     40b:	e8 fc ff ff ff       	call   <lock_acquire>
     410:	8b 06                	mov    (%esi),%eax
     412:	85 c0                	test   %eax,%eax
     414:	74 0b                	je     421 <death_by_timeout+0xd1>
     416:	8b 50 40             	mov    0x40(%eax),%edx
     419:	85 d2                	test   %edx,%edx
     41b:	74 04                	je     421 <death_by_timeout+0xd1>
     41d:	89 d8                	mov    %ebx,%eax
     41f:	ff d2                	call   *%edx
     421:	b9 21 04 00 00       	mov    $0x421,%ecx
     426:	ba 01 00 00 00       	mov    $0x1,%edx
     42b:	b8 00 00 00 00       	mov    <rcu_lock_map>,%eax
     430:	e8 fc ff ff ff       	call   <lock_release>
     435:	e9 32 ff ff ff       	jmp    36c <death_by_timeout+0x1c>
     43a:	ba b2 00 00 00       	mov    $0xb2,%edx
     43f:	b8 00 00 00 00       	mov    $0x0,%eax
     444:	e8 fc ff ff ff       	call   <warn_on_slowpath>
     449:	eb 88                	jmp    3d3 <death_by_timeout+0x83>
     44b:	90                   	nop    
     44c:	8d 74 26 00          	lea    0x0(%esi),%esi

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: crash in death_by_timeout()
  2008-11-18 12:38   ` BORBELY Zoltan
@ 2008-11-18 13:19     ` Patrick McHardy
  2008-11-18 13:27       ` Patrick McHardy
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick McHardy @ 2008-11-18 13:19 UTC (permalink / raw)
  To: BORBELY Zoltan; +Cc: Netfilter Development Mailinglist

BORBELY Zoltan wrote:
> Hi,
> 
> On Tue, Nov 18, 2008 at 12:07:20PM +0100, Patrick McHardy wrote:
>>> --- /tmp/nf_conntrack_netlink.c-orig	2008-09-29 23:28:55.000000000 +0200
>>> +++ /tmp/nf_conntrack_netlink.c	2008-09-29 23:29:11.000000000 +0200
>>> @@ -1177,8 +1177,8 @@
>>>  		ct->master = master_ct;
>>>  	}
>>>  -	add_timer(&ct->timeout);
>>>  	nf_conntrack_hash_insert(ct);
>>> +	add_timer(&ct->timeout);
>>>  	rcu_read_unlock();
>> That code looks very fishy. We should be holding the conntrack lock,
>> otherwise the addition is not only racy against the timer, but also
>> against addition of identical conntracks. Let me look into what
>> happened here.
> 
> We have experienced a lot of kernel crashes, _every time_ in the
> death_by_timeout() function while we were trying to add a new conntrack
> entry from userspace via netlink (attached the disassembled version
> of the function, ===> points to the EIP upon the crash). There was a
> possibility, that we tried to add conntrack entries with zero timeout
> value, maybe it's necessary to trigger this crash. The previous patch
> has definitly solved the problem for us.
> 
> I've got photos from various crashes, but it takes a little time to
> find them. Please let me know if you want to see them.

Thats not necessary, the problem is pretty obvious, I was mainly
wondering at what point we broke it.

I'll send you a patch soon.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: crash in death_by_timeout()
  2008-11-18 13:19     ` Patrick McHardy
@ 2008-11-18 13:27       ` Patrick McHardy
  2008-11-18 22:25         ` Pablo Neira Ayuso
  2008-11-25  8:09         ` BORBELY Zoltan
  0 siblings, 2 replies; 13+ messages in thread
From: Patrick McHardy @ 2008-11-18 13:27 UTC (permalink / raw)
  To: BORBELY Zoltan; +Cc: Netfilter Development Mailinglist, Pablo Neira Ayuso

[-- Attachment #1: Type: text/plain, Size: 1623 bytes --]

Patrick McHardy wrote:
> BORBELY Zoltan wrote:
>> On Tue, Nov 18, 2008 at 12:07:20PM +0100, Patrick McHardy wrote:
>>>> --- /tmp/nf_conntrack_netlink.c-orig    2008-09-29 
>>>> 23:28:55.000000000 +0200
>>>> +++ /tmp/nf_conntrack_netlink.c    2008-09-29 23:29:11.000000000 +0200
>>>> @@ -1177,8 +1177,8 @@
>>>>          ct->master = master_ct;
>>>>      }
>>>>  -    add_timer(&ct->timeout);
>>>>      nf_conntrack_hash_insert(ct);
>>>> +    add_timer(&ct->timeout);
>>>>      rcu_read_unlock();
>>> That code looks very fishy. We should be holding the conntrack lock,
>>> otherwise the addition is not only racy against the timer, but also
>>> against addition of identical conntracks. Let me look into what
>>> happened here.
>>
>> We have experienced a lot of kernel crashes, _every time_ in the
>> death_by_timeout() function while we were trying to add a new conntrack
>> entry from userspace via netlink (attached the disassembled version
>> of the function, ===> points to the EIP upon the crash). There was a
>> possibility, that we tried to add conntrack entries with zero timeout
>> value, maybe it's necessary to trigger this crash. The previous patch
>> has definitly solved the problem for us.
>>
>> I've got photos from various crashes, but it takes a little time to
>> find them. Please let me know if you want to see them.
> 
> Thats not necessary, the problem is pretty obvious, I was mainly
> wondering at what point we broke it.
> 
> I'll send you a patch soon.

Could you try whether this patch fixes the problem?

Pablo, do you recall the reason why the lock isn't held in
ctnetlink_create_conntrack()?

[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1540 bytes --]

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 622d7c6..233fdd2 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -305,9 +305,7 @@ void nf_conntrack_hash_insert(struct nf_conn *ct)
 	hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
 	repl_hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_REPLY].tuple);
 
-	spin_lock_bh(&nf_conntrack_lock);
 	__nf_conntrack_hash_insert(ct, hash, repl_hash);
-	spin_unlock_bh(&nf_conntrack_lock);
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_hash_insert);
 
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index a040d46..3b009a3 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1090,7 +1090,7 @@ ctnetlink_create_conntrack(struct nlattr *cda[],
 	struct nf_conn_help *help;
 	struct nf_conntrack_helper *helper;
 
-	ct = nf_conntrack_alloc(&init_net, otuple, rtuple, GFP_KERNEL);
+	ct = nf_conntrack_alloc(&init_net, otuple, rtuple, GFP_ATOMIC);
 	if (ct == NULL || IS_ERR(ct))
 		return -ENOMEM;
 
@@ -1212,13 +1212,14 @@ ctnetlink_new_conntrack(struct sock *ctnl, struct sk_buff *skb,
 			atomic_inc(&master_ct->ct_general.use);
 		}
 
-		spin_unlock_bh(&nf_conntrack_lock);
 		err = -ENOENT;
 		if (nlh->nlmsg_flags & NLM_F_CREATE)
 			err = ctnetlink_create_conntrack(cda,
 							 &otuple,
 							 &rtuple,
 							 master_ct);
+		spin_unlock_bh(&nf_conntrack_lock);
+
 		if (err < 0 && master_ct)
 			nf_ct_put(master_ct);
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: crash in death_by_timeout()
  2008-11-18 13:27       ` Patrick McHardy
@ 2008-11-18 22:25         ` Pablo Neira Ayuso
  2008-11-19 12:04           ` Patrick McHardy
  2008-11-25  8:09         ` BORBELY Zoltan
  1 sibling, 1 reply; 13+ messages in thread
From: Pablo Neira Ayuso @ 2008-11-18 22:25 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: BORBELY Zoltan, Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 2351 bytes --]

Patrick McHardy wrote:
> Patrick McHardy wrote:
>> BORBELY Zoltan wrote:
>>> On Tue, Nov 18, 2008 at 12:07:20PM +0100, Patrick McHardy wrote:
>>>>> --- /tmp/nf_conntrack_netlink.c-orig    2008-09-29
>>>>> 23:28:55.000000000 +0200
>>>>> +++ /tmp/nf_conntrack_netlink.c    2008-09-29 23:29:11.000000000 +0200
>>>>> @@ -1177,8 +1177,8 @@
>>>>>          ct->master = master_ct;
>>>>>      }
>>>>>  -    add_timer(&ct->timeout);
>>>>>      nf_conntrack_hash_insert(ct);
>>>>> +    add_timer(&ct->timeout);
>>>>>      rcu_read_unlock();
>>>> That code looks very fishy. We should be holding the conntrack lock,
>>>> otherwise the addition is not only racy against the timer, but also
>>>> against addition of identical conntracks. Let me look into what
>>>> happened here.
>>>
>>> We have experienced a lot of kernel crashes, _every time_ in the
>>> death_by_timeout() function while we were trying to add a new conntrack
>>> entry from userspace via netlink (attached the disassembled version
>>> of the function, ===> points to the EIP upon the crash). There was a
>>> possibility, that we tried to add conntrack entries with zero timeout
>>> value, maybe it's necessary to trigger this crash. The previous patch
>>> has definitly solved the problem for us.
>>>
>>> I've got photos from various crashes, but it takes a little time to
>>> find them. Please let me know if you want to see them.
>>
>> Thats not necessary, the problem is pretty obvious, I was mainly
>> wondering at what point we broke it.
>>
>> I'll send you a patch soon.
> 
> Could you try whether this patch fixes the problem?
> 
> Pablo, do you recall the reason why the lock isn't held in
> ctnetlink_create_conntrack()?

The creation is done under the nfnl_mutex so that requests to create
identical entries cannot race. Of course, this is not enough to avoid
the race with the timer if we set a very small timer for a conntrack :(.

AFAICS, we don't need to enclose the whole conntrack creation path.
Would you prefer the patch attached? This patch should apply fine to
2.6.28-rc.

I can send this patch to -stable. BTW, this patch may conflict with my
patch enqueued for 2.6.29 that adds userspace reporting, let me know if
I can give you a hand in some way (like sending it on top of this one or
yours again, whatever).

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

[-- Attachment #2: fix-creation.patch --]
[-- Type: text/x-diff, Size: 2555 bytes --]

netfilter: ctnetlink: fix racy timer in the creation path

If we set a very small timer for new conntrack created via
ctnetlink, the entry may expire before it is in the hashes,
resulting in a crash. To fix this problem, the timer
addition and the insertion into the hashes is done holding
the nf_conntrack spin lock bh.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---

 include/net/netfilter/nf_conntrack.h |    2 +-
 net/netfilter/nf_conntrack_core.c    |    7 ++-----
 net/netfilter/nf_conntrack_netlink.c |    4 +++-
 3 files changed, 6 insertions(+), 7 deletions(-)


diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index b76a868..a05e2e2 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -197,7 +197,7 @@ extern void nf_ct_free_hashtable(struct hlist_head *hash, int vmalloced,
 extern struct nf_conntrack_tuple_hash *
 __nf_conntrack_find(struct net *net, const struct nf_conntrack_tuple *tuple);
 
-extern void nf_conntrack_hash_insert(struct nf_conn *ct);
+extern void __nf_conntrack_insert(struct nf_conn *ct);
 
 extern void nf_conntrack_flush(struct net *net);
 
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 622d7c6..22246ec 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -298,18 +298,15 @@ static void __nf_conntrack_hash_insert(struct nf_conn *ct,
 			   &net->ct.hash[repl_hash]);
 }
 
-void nf_conntrack_hash_insert(struct nf_conn *ct)
+void __nf_conntrack_insert(struct nf_conn *ct)
 {
 	unsigned int hash, repl_hash;
 
 	hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
 	repl_hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_REPLY].tuple);
-
-	spin_lock_bh(&nf_conntrack_lock);
 	__nf_conntrack_hash_insert(ct, hash, repl_hash);
-	spin_unlock_bh(&nf_conntrack_lock);
 }
-EXPORT_SYMBOL_GPL(nf_conntrack_hash_insert);
+EXPORT_SYMBOL_GPL(__nf_conntrack_insert);
 
 /* Confirm a connection given skb; places it in hash table */
 int
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index b132124..2eb8fd3 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1151,8 +1151,10 @@ ctnetlink_create_conntrack(struct nlattr *cda[],
 		ct->master = master_ct;
 	}
 
+	spin_lock_bh(&nf_conntrack_lock);
 	add_timer(&ct->timeout);
-	nf_conntrack_hash_insert(ct);
+	__nf_conntrack_insert(ct);
+	spin_unlock_bh(&nf_conntrack_lock);
 	rcu_read_unlock();
 
 	return 0;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: crash in death_by_timeout()
  2008-11-18 22:25         ` Pablo Neira Ayuso
@ 2008-11-19 12:04           ` Patrick McHardy
  2008-11-19 12:37             ` Pablo Neira Ayuso
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick McHardy @ 2008-11-19 12:04 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: BORBELY Zoltan, Netfilter Development Mailinglist

Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>> Pablo, do you recall the reason why the lock isn't held in
>> ctnetlink_create_conntrack()?
> 
> The creation is done under the nfnl_mutex so that requests to create
> identical entries cannot race. Of course, this is not enough to avoid
> the race with the timer if we set a very small timer for a conntrack :(.

Its also not enough to avoid the race against packet processing,
which takes nf_conntrack_lock.

> AFAICS, we don't need to enclose the whole conntrack creation path.
> Would you prefer the patch attached? This patch should apply fine to
> 2.6.28-rc.

That fixes the timer race, but the race between lookup and creation
remains. We really need to either hold the lock the entire time or
redo the lookup before inserting the entry into the hash tables.

> I can send this patch to -stable. BTW, this patch may conflict with my
> patch enqueued for 2.6.29 that adds userspace reporting, let me know if
> I can give you a hand in some way (like sending it on top of this one or
> yours again, whatever).

I'll take care of any merge issues.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: crash in death_by_timeout()
  2008-11-19 12:04           ` Patrick McHardy
@ 2008-11-19 12:37             ` Pablo Neira Ayuso
  2008-11-19 12:47               ` Patrick McHardy
  0 siblings, 1 reply; 13+ messages in thread
From: Pablo Neira Ayuso @ 2008-11-19 12:37 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: BORBELY Zoltan, Netfilter Development Mailinglist

Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>> Patrick McHardy wrote:
>>> Pablo, do you recall the reason why the lock isn't held in
>>> ctnetlink_create_conntrack()?
>>
>> The creation is done under the nfnl_mutex so that requests to create
>> identical entries cannot race. Of course, this is not enough to avoid
>> the race with the timer if we set a very small timer for a conntrack :(.
> 
> Its also not enough to avoid the race against packet processing,
> which takes nf_conntrack_lock.
> 
>> AFAICS, we don't need to enclose the whole conntrack creation path.
>> Would you prefer the patch attached? This patch should apply fine to
>> 2.6.28-rc.
> 
> That fixes the timer race, but the race between lookup and creation
> remains. We really need to either hold the lock the entire time or
> redo the lookup before inserting the entry into the hash tables.

I see, I forgot about that case. Your patch should be fine then.

-- 
"Los honestos son inadaptados sociales" -- Les Luthiers

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: crash in death_by_timeout()
  2008-11-19 12:37             ` Pablo Neira Ayuso
@ 2008-11-19 12:47               ` Patrick McHardy
  0 siblings, 0 replies; 13+ messages in thread
From: Patrick McHardy @ 2008-11-19 12:47 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: BORBELY Zoltan, Netfilter Development Mailinglist

Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>> Pablo Neira Ayuso wrote:
>>> Patrick McHardy wrote:
>>>> Pablo, do you recall the reason why the lock isn't held in
>>>> ctnetlink_create_conntrack()?
>>> The creation is done under the nfnl_mutex so that requests to create
>>> identical entries cannot race. Of course, this is not enough to avoid
>>> the race with the timer if we set a very small timer for a conntrack :(.
>> Its also not enough to avoid the race against packet processing,
>> which takes nf_conntrack_lock.
>>
>>> AFAICS, we don't need to enclose the whole conntrack creation path.
>>> Would you prefer the patch attached? This patch should apply fine to
>>> 2.6.28-rc.
>> That fixes the timer race, but the race between lookup and creation
>> remains. We really need to either hold the lock the entire time or
>> redo the lookup before inserting the entry into the hash tables.
> 
> I see, I forgot about that case. Your patch should be fine then.

Thanks, I've queued it up, but will wait for confirmation from Zoltan.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: crash in death_by_timeout()
  2008-11-18 13:27       ` Patrick McHardy
  2008-11-18 22:25         ` Pablo Neira Ayuso
@ 2008-11-25  8:09         ` BORBELY Zoltan
  2008-11-25 11:11           ` Patrick McHardy
  1 sibling, 1 reply; 13+ messages in thread
From: BORBELY Zoltan @ 2008-11-25  8:09 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist

Hi,

On Tue, Nov 18, 2008 at 02:27:44PM +0100, Patrick McHardy wrote:
> Could you try whether this patch fixes the problem?
>
> Pablo, do you recall the reason why the lock isn't held in
> ctnetlink_create_conntrack()?

> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 622d7c6..233fdd2 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -305,9 +305,7 @@ void nf_conntrack_hash_insert(struct nf_conn *ct)
>  	hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
>  	repl_hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_REPLY].tuple);
>  
> -	spin_lock_bh(&nf_conntrack_lock);
>  	__nf_conntrack_hash_insert(ct, hash, repl_hash);
> -	spin_unlock_bh(&nf_conntrack_lock);
>  }
>  EXPORT_SYMBOL_GPL(nf_conntrack_hash_insert);
>  
> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> index a040d46..3b009a3 100644
> --- a/net/netfilter/nf_conntrack_netlink.c
> +++ b/net/netfilter/nf_conntrack_netlink.c
> @@ -1090,7 +1090,7 @@ ctnetlink_create_conntrack(struct nlattr *cda[],
>  	struct nf_conn_help *help;
>  	struct nf_conntrack_helper *helper;
>  
> -	ct = nf_conntrack_alloc(&init_net, otuple, rtuple, GFP_KERNEL);
> +	ct = nf_conntrack_alloc(&init_net, otuple, rtuple, GFP_ATOMIC);
>  	if (ct == NULL || IS_ERR(ct))
>  		return -ENOMEM;
>  
> @@ -1212,13 +1212,14 @@ ctnetlink_new_conntrack(struct sock *ctnl, struct sk_buff *skb,
>  			atomic_inc(&master_ct->ct_general.use);
>  		}
>  
> -		spin_unlock_bh(&nf_conntrack_lock);
>  		err = -ENOENT;
>  		if (nlh->nlmsg_flags & NLM_F_CREATE)
>  			err = ctnetlink_create_conntrack(cda,
>  							 &otuple,
>  							 &rtuple,
>  							 master_ct);
> +		spin_unlock_bh(&nf_conntrack_lock);
> +
>  		if (err < 0 && master_ct)
>  			nf_ct_put(master_ct);
>  

We didn't see any kernel crashes during a half day heavy work (without the
patch the kernel crashed in 3-4 hours every time), but we found a lot of
BUG messages in the log (maybe for every new entry):

    Nov 24 14:45:43 test kernel: BUG: sleeping function called from invalid context at mm/slab.c:3043
    Nov 24 14:45:43 test kernel: in_atomic():1, irqs_disabled():0
    Nov 24 14:45:43 test kernel: 3 locks held by test/3586:
    Nov 24 14:45:43 test kernel:  #0:  (nfnl_mutex){--..}, at: [<d081500f>] nfnetlink_rcv+0xf/0x30 [nfnetlink]
    Nov 24 14:45:43 test kernel:  #1:  (nf_conntrack_lock){-+..}, at: [<d08c979f>] ctnetlink_new_conntrack+0x7f/0x770 [nf_conntrack_netlink]
    Nov 24 14:45:43 test kernel:  #2:  (rcu_read_lock){..--}, at: [<d08c98ee>] ctnetlink_new_conntrack+0x1ce/0x770 [nf_conntrack_netlink]
    Nov 24 14:45:43 test kernel: Pid: 3586, comm: test Not tainted 2.6.27.6bozotest #1
    Nov 24 14:45:43 test kernel:  [<c027a566>] __kmalloc_track_caller+0x126/0x160
    Nov 24 14:45:43 test kernel:  [<c052a7a5>] __nf_ct_ext_add+0xb5/0x290
    Nov 24 14:45:43 test kernel:  [<c026411d>] __krealloc+0x5d/0x80
    Nov 24 14:45:44 test kernel:  [<c052a7a5>] __nf_ct_ext_add+0xb5/0x290
    Nov 24 14:45:44 test kernel:  [<c052a71d>] __nf_ct_ext_add+0x2d/0x290
    Nov 24 14:45:44 test kernel:  [<d08c9af8>] ctnetlink_new_conntrack+0x3d8/0x770 [nf_conntrack_netlink]
    Nov 24 14:45:44 test kernel:  [<d08c98ee>] ctnetlink_new_conntrack+0x1ce/0x770 [nf_conntrack_netlink]
    Nov 24 14:45:44 test kernel:  [<c0248910>] validate_chain+0x380/0xed0
    Nov 24 14:45:44 test kernel:  [<d0815220>] nfnetlink_rcv_msg+0xf0/0x180 [nfnetlink]
    Nov 24 14:45:44 test kernel:  [<d0815130>] nfnetlink_rcv_msg+0x0/0x180 [nfnetlink]
    Nov 24 14:45:44 test kernel:  [<c0520ebc>] netlink_rcv_skb+0x7c/0xa0
    Nov 24 14:45:44 test kernel:  [<d081501b>] nfnetlink_rcv+0x1b/0x30 [nfnetlink]
    Nov 24 14:45:44 test kernel:  [<c0520c50>] netlink_unicast+0x250/0x280
    Nov 24 14:45:44 test kernel:  [<c052145e>] netlink_sendmsg+0x1ee/0x2c0
    Nov 24 14:45:44 test kernel:  [<c04fad7f>] sock_sendmsg+0xbf/0xf0
    Nov 24 14:45:44 test kernel:  [<c02496e5>] __lock_acquire+0x285/0x9e0
    Nov 24 14:45:44 test kernel:  [<c0239790>] autoremove_wake_function+0x0/0x50
    Nov 24 14:45:44 test kernel:  [<c0248910>] validate_chain+0x380/0xed0
    Nov 24 14:45:44 test kernel:  [<c027ee33>] fget_light+0xd3/0xf0
    Nov 24 14:45:44 test kernel:  [<c031bea8>] copy_from_user+0x38/0x80
    Nov 24 14:45:44 test kernel:  [<c031bea8>] copy_from_user+0x38/0x80
    Nov 24 14:45:44 test kernel:  [<c0502e2a>] verify_iovec+0x2a/0x90
    Nov 24 14:45:44 test kernel:  [<c04faf14>] sys_sendmsg+0x164/0x280
    Nov 24 14:45:44 test kernel:  [<c027ee33>] fget_light+0xd3/0xf0
    Nov 24 14:45:44 test kernel:  [<c031c16a>] copy_to_user+0x3a/0x70
    Nov 24 14:45:44 test kernel:  [<c04fb98f>] move_addr_to_user+0x5f/0x70
    Nov 24 14:45:44 test kernel:  [<c04fbf0d>] sys_getsockname+0xcd/0xd0
    Nov 24 14:45:44 test kernel:  [<c022ad6c>] local_bh_enable_ip+0x7c/0xc0
    Nov 24 14:45:44 test kernel:  [<c0247e64>] trace_hardirqs_on_caller+0xc4/0x140
    Nov 24 14:45:44 test kernel:  [<c022ad6c>] local_bh_enable_ip+0x7c/0xc0
    Nov 24 14:45:44 test kernel:  [<c04fe578>] sock_setsockopt+0x128/0x590
    Nov 24 14:45:44 test kernel:  [<c027edb3>] fget_light+0x53/0xf0
    Nov 24 14:45:44 test kernel:  [<c04fa552>] sockfd_lookup_light+0x32/0x60
    Nov 24 14:45:44 test kernel:  [<c04fc39b>] sys_socketcall+0x25b/0x2b0
    Nov 24 14:45:44 test kernel:  [<c031ba44>] trace_hardirqs_on_thunk+0xc/0x10
    Nov 24 14:45:44 test kernel:  [<c031ba44>] trace_hardirqs_on_thunk+0xc/0x10
    Nov 24 14:45:44 test kernel:  [<c0203029>] sysenter_do_call+0x12/0x35
    Nov 24 14:45:44 test kernel:  =======================

Bye,
Zoltan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: crash in death_by_timeout()
  2008-11-25  8:09         ` BORBELY Zoltan
@ 2008-11-25 11:11           ` Patrick McHardy
  2008-11-25 22:48             ` BORBELY Zoltan
  0 siblings, 1 reply; 13+ messages in thread
From: Patrick McHardy @ 2008-11-25 11:11 UTC (permalink / raw)
  To: BORBELY Zoltan; +Cc: Netfilter Development Mailinglist

[-- Attachment #1: Type: text/plain, Size: 1044 bytes --]

BORBELY Zoltan wrote:
> Hi,
> 
> On Tue, Nov 18, 2008 at 02:27:44PM +0100, Patrick McHardy wrote:
>> Could you try whether this patch fixes the problem?
>>
> We didn't see any kernel crashes during a half day heavy work (without the
> patch the kernel crashed in 3-4 hours every time), but we found a lot of
> BUG messages in the log (maybe for every new entry):
> 
>     Nov 24 14:45:43 test kernel: BUG: sleeping function called from invalid context at mm/slab.c:3043
>     Nov 24 14:45:43 test kernel: in_atomic():1, irqs_disabled():0
>     Nov 24 14:45:43 test kernel: 3 locks held by test/3586:
>     Nov 24 14:45:43 test kernel:  #0:  (nfnl_mutex){--..}, at: [<d081500f>] nfnetlink_rcv+0xf/0x30 [nfnetlink]
>     Nov 24 14:45:43 test kernel:  #1:  (nf_conntrack_lock){-+..}, at: [<d08c979f>] ctnetlink_new_conntrack+0x7f/0x770 [nf_conntrack_netlink]
>     Nov 24 14:45:43 test kernel:  #2:  (rcu_read_lock){..--}, at: [<d08c98ee>] ctnetlink_new_conntrack+0x1ce/0x770 [nf_conntrack_netlink]

Crap, I missed a spot. Does this patch help?



[-- Attachment #2: x --]
[-- Type: text/plain, Size: 425 bytes --]

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 3b009a3..5f4a651 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1138,7 +1138,7 @@ ctnetlink_create_conntrack(struct nlattr *cda[],
 		}
 	}
 
-	nf_ct_acct_ext_add(ct, GFP_KERNEL);
+	nf_ct_acct_ext_add(ct, GFP_ATOMIC);
 
 #if defined(CONFIG_NF_CONNTRACK_MARK)
 	if (cda[CTA_MARK])

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: crash in death_by_timeout()
  2008-11-25 11:11           ` Patrick McHardy
@ 2008-11-25 22:48             ` BORBELY Zoltan
  2008-11-26 11:16               ` Patrick McHardy
  0 siblings, 1 reply; 13+ messages in thread
From: BORBELY Zoltan @ 2008-11-25 22:48 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: Netfilter Development Mailinglist

Hi,

On Tue, Nov 25, 2008 at 12:11:13PM +0100, Patrick McHardy wrote:
> Crap, I missed a spot. Does this patch help?
>
>

> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> index 3b009a3..5f4a651 100644
> --- a/net/netfilter/nf_conntrack_netlink.c
> +++ b/net/netfilter/nf_conntrack_netlink.c
> @@ -1138,7 +1138,7 @@ ctnetlink_create_conntrack(struct nlattr *cda[],
>  		}
>  	}
>  
> -	nf_ct_acct_ext_add(ct, GFP_KERNEL);
> +	nf_ct_acct_ext_add(ct, GFP_ATOMIC);
>  
>  #if defined(CONFIG_NF_CONNTRACK_MARK)
>  	if (cda[CTA_MARK])

Thanks, the BUG messages have disappeared.

Bye,
Zoltan

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: crash in death_by_timeout()
  2008-11-25 22:48             ` BORBELY Zoltan
@ 2008-11-26 11:16               ` Patrick McHardy
  0 siblings, 0 replies; 13+ messages in thread
From: Patrick McHardy @ 2008-11-26 11:16 UTC (permalink / raw)
  To: BORBELY Zoltan; +Cc: Netfilter Development Mailinglist

BORBELY Zoltan wrote:
> Hi,
> 
> On Tue, Nov 25, 2008 at 12:11:13PM +0100, Patrick McHardy wrote:
>> Crap, I missed a spot. Does this patch help?
>>
>>
> 
>> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
>> index 3b009a3..5f4a651 100644
>> --- a/net/netfilter/nf_conntrack_netlink.c
>> +++ b/net/netfilter/nf_conntrack_netlink.c
>> @@ -1138,7 +1138,7 @@ ctnetlink_create_conntrack(struct nlattr *cda[],
>>  		}
>>  	}
>>  
>> -	nf_ct_acct_ext_add(ct, GFP_KERNEL);
>> +	nf_ct_acct_ext_add(ct, GFP_ATOMIC);
>>  
>>  #if defined(CONFIG_NF_CONNTRACK_MARK)
>>  	if (cda[CTA_MARK])
> 
> Thanks, the BUG messages have disappeared.

Great, thanks for testing.

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2008-11-26 11:16 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-17 22:18 crash in death_by_timeout() BORBELY Zoltan
2008-11-18 11:07 ` Patrick McHardy
2008-11-18 12:38   ` BORBELY Zoltan
2008-11-18 13:19     ` Patrick McHardy
2008-11-18 13:27       ` Patrick McHardy
2008-11-18 22:25         ` Pablo Neira Ayuso
2008-11-19 12:04           ` Patrick McHardy
2008-11-19 12:37             ` Pablo Neira Ayuso
2008-11-19 12:47               ` Patrick McHardy
2008-11-25  8:09         ` BORBELY Zoltan
2008-11-25 11:11           ` Patrick McHardy
2008-11-25 22:48             ` BORBELY Zoltan
2008-11-26 11:16               ` Patrick McHardy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).