* crash in death_by_timeout()
@ 2008-11-17 22:18 BORBELY Zoltan
2008-11-18 11:07 ` Patrick McHardy
0 siblings, 1 reply; 13+ messages in thread
From: BORBELY Zoltan @ 2008-11-17 22:18 UTC (permalink / raw)
To: Netfilter Development Mailinglist
[-- Attachment #1: Type: text/plain, Size: 362 bytes --]
Hi,
There's a race in the nfct netlink code, the result is a crash in the
death_by_timeout() function. When a timer interrupt occures during a
new entry addition, the kernel will crash due to a NULL deref. The
attached patch has solved the problem for us. I haven't tested it on
the latest kernels, but the problem still seems to be there.
Bye,
Zoltan Borbely
[-- Attachment #2: nf_conntrack_netlink.patch --]
[-- Type: text/plain, Size: 317 bytes --]
--- /tmp/nf_conntrack_netlink.c-orig 2008-09-29 23:28:55.000000000 +0200
+++ /tmp/nf_conntrack_netlink.c 2008-09-29 23:29:11.000000000 +0200
@@ -1177,8 +1177,8 @@
ct->master = master_ct;
}
- add_timer(&ct->timeout);
nf_conntrack_hash_insert(ct);
+ add_timer(&ct->timeout);
rcu_read_unlock();
return 0;
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: crash in death_by_timeout()
2008-11-17 22:18 crash in death_by_timeout() BORBELY Zoltan
@ 2008-11-18 11:07 ` Patrick McHardy
2008-11-18 12:38 ` BORBELY Zoltan
0 siblings, 1 reply; 13+ messages in thread
From: Patrick McHardy @ 2008-11-18 11:07 UTC (permalink / raw)
To: BORBELY Zoltan; +Cc: Netfilter Development Mailinglist
BORBELY Zoltan wrote:
> Hi,
>
> There's a race in the nfct netlink code, the result is a crash in the
> death_by_timeout() function. When a timer interrupt occures during a
> new entry addition, the kernel will crash due to a NULL deref. The
> attached patch has solved the problem for us. I haven't tested it on
> the latest kernels, but the problem still seems to be there.
> --- /tmp/nf_conntrack_netlink.c-orig 2008-09-29 23:28:55.000000000 +0200
> +++ /tmp/nf_conntrack_netlink.c 2008-09-29 23:29:11.000000000 +0200
> @@ -1177,8 +1177,8 @@
> ct->master = master_ct;
> }
>
> - add_timer(&ct->timeout);
> nf_conntrack_hash_insert(ct);
> + add_timer(&ct->timeout);
> rcu_read_unlock();
That code looks very fishy. We should be holding the conntrack lock,
otherwise the addition is not only racy against the timer, but also
against addition of identical conntracks. Let me look into what
happened here.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: crash in death_by_timeout()
2008-11-18 11:07 ` Patrick McHardy
@ 2008-11-18 12:38 ` BORBELY Zoltan
2008-11-18 13:19 ` Patrick McHardy
0 siblings, 1 reply; 13+ messages in thread
From: BORBELY Zoltan @ 2008-11-18 12:38 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Netfilter Development Mailinglist
[-- Attachment #1: Type: text/plain, Size: 1229 bytes --]
Hi,
On Tue, Nov 18, 2008 at 12:07:20PM +0100, Patrick McHardy wrote:
>> --- /tmp/nf_conntrack_netlink.c-orig 2008-09-29 23:28:55.000000000 +0200
>> +++ /tmp/nf_conntrack_netlink.c 2008-09-29 23:29:11.000000000 +0200
>> @@ -1177,8 +1177,8 @@
>> ct->master = master_ct;
>> }
>> - add_timer(&ct->timeout);
>> nf_conntrack_hash_insert(ct);
>> + add_timer(&ct->timeout);
>> rcu_read_unlock();
>
> That code looks very fishy. We should be holding the conntrack lock,
> otherwise the addition is not only racy against the timer, but also
> against addition of identical conntracks. Let me look into what
> happened here.
We have experienced a lot of kernel crashes, _every time_ in the
death_by_timeout() function while we were trying to add a new conntrack
entry from userspace via netlink (attached the disassembled version
of the function, ===> points to the EIP upon the crash). There was a
possibility, that we tried to add conntrack entries with zero timeout
value, maybe it's necessary to trigger this crash. The previous patch
has definitly solved the problem for us.
I've got photos from various crashes, but it takes a little time to
find them. Please let me know if you want to see them.
Thanks,
Zoltan Borbely
[-- Attachment #2: 2.6.26.3-death_by_timeout.txt --]
[-- Type: text/plain, Size: 4551 bytes --]
00000350 <death_by_timeout>:
350: 55 push %ebp
351: 89 e5 mov %esp,%ebp
353: 56 push %esi
354: 53 push %ebx
355: 89 c3 mov %eax,%ebx
357: 83 ec 0c sub $0xc,%esp
35a: 8b 90 cc 00 00 00 mov 0xcc(%eax),%edx
360: 85 d2 test %edx,%edx
362: 74 08 je 36c <death_by_timeout+0x1c>
364: 0f b6 42 08 movzbl 0x8(%edx),%eax
368: 84 c0 test %al,%al
36a: 75 74 jne 3e0 <death_by_timeout+0x90>
36c: b8 00 00 00 00 mov <nf_conntrack_lock>,%eax
371: e8 fc ff ff ff call <_spin_lock_bh>
376: 8d 4b 04 lea 0x4(%ebx),%ecx
379: ff 05 18 00 00 00 incl <per_cpu__nf_conntrack_stat>
// hlist_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnode)
37f: 8b 43 04 mov 0x4(%ebx),%eax
382: 8b 51 04 mov 0x4(%ecx),%edx
385: 85 c0 test %eax,%eax
===> 387: 89 02 mov %eax,(%edx)
389: 74 03 je 38e <death_by_timeout+0x3e>
38b: 89 50 04 mov %edx,0x4(%eax)
38e: c7 41 04 00 02 20 00 movl $0x200200,0x4(%ecx) // LIST_POISON2
// hlist_del_rcu(&ct->tuplehash[IP_CT_DIR_REPLY].hnode)
395: 8b 43 34 mov 0x34(%ebx),%eax
398: 8d 4b 34 lea 0x34(%ebx),%ecx
39b: 8b 51 04 mov 0x4(%ecx),%edx
39e: 85 c0 test %eax,%eax
3a0: 89 02 mov %eax,(%edx)
3a2: 74 03 je 3a7 <death_by_timeout+0x57>
3a4: 89 50 04 mov %edx,0x4(%eax)
3a7: c7 41 04 00 02 20 00 movl $0x200200,0x4(%ecx) // LIST_POISON2
3ae: 89 d8 mov %ebx,%eax
3b0: e8 fc ff ff ff call <nf_ct_remove_expectations>
3b5: b8 00 00 00 00 mov <nf_conntrack_lock>,%eax
3ba: e8 fc ff ff ff call <_spin_unlock_bh>
3bf: 85 db test %ebx,%ebx
3c1: 74 77 je 43a <death_by_timeout+0xea>
3c3: ff 0b decl (%ebx)
3c5: 0f 94 c0 sete %al
3c8: 84 c0 test %al,%al
3ca: 74 07 je 3d3 <death_by_timeout+0x83>
3cc: 89 d8 mov %ebx,%eax
3ce: e8 fc ff ff ff call <nf_conntrack_destroy>
3d3: 83 c4 0c add $0xc,%esp
3d6: 5b pop %ebx
3d7: 5e pop %esi
3d8: 5d pop %ebp
3d9: c3 ret
3da: 8d b6 00 00 00 00 lea 0x0(%esi),%esi
3e0: 0f b6 c0 movzbl %al,%eax
3e3: 89 d6 mov %edx,%esi
3e5: 01 c6 add %eax,%esi
3e7: 74 83 je 36c <death_by_timeout+0x1c>
3e9: b8 e9 03 00 00 mov $0x3e9,%eax
3ee: 31 c9 xor %ecx,%ecx
3f0: 89 44 24 08 mov %eax,0x8(%esp)
3f4: b8 01 00 00 00 mov $0x1,%eax
3f9: 31 d2 xor %edx,%edx
3fb: 89 44 24 04 mov %eax,0x4(%esp)
3ff: b8 00 00 00 00 mov <rcu_lock_map>,%eax
404: c7 04 24 02 00 00 00 movl $0x2,(%esp)
40b: e8 fc ff ff ff call <lock_acquire>
410: 8b 06 mov (%esi),%eax
412: 85 c0 test %eax,%eax
414: 74 0b je 421 <death_by_timeout+0xd1>
416: 8b 50 40 mov 0x40(%eax),%edx
419: 85 d2 test %edx,%edx
41b: 74 04 je 421 <death_by_timeout+0xd1>
41d: 89 d8 mov %ebx,%eax
41f: ff d2 call *%edx
421: b9 21 04 00 00 mov $0x421,%ecx
426: ba 01 00 00 00 mov $0x1,%edx
42b: b8 00 00 00 00 mov <rcu_lock_map>,%eax
430: e8 fc ff ff ff call <lock_release>
435: e9 32 ff ff ff jmp 36c <death_by_timeout+0x1c>
43a: ba b2 00 00 00 mov $0xb2,%edx
43f: b8 00 00 00 00 mov $0x0,%eax
444: e8 fc ff ff ff call <warn_on_slowpath>
449: eb 88 jmp 3d3 <death_by_timeout+0x83>
44b: 90 nop
44c: 8d 74 26 00 lea 0x0(%esi),%esi
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: crash in death_by_timeout()
2008-11-18 12:38 ` BORBELY Zoltan
@ 2008-11-18 13:19 ` Patrick McHardy
2008-11-18 13:27 ` Patrick McHardy
0 siblings, 1 reply; 13+ messages in thread
From: Patrick McHardy @ 2008-11-18 13:19 UTC (permalink / raw)
To: BORBELY Zoltan; +Cc: Netfilter Development Mailinglist
BORBELY Zoltan wrote:
> Hi,
>
> On Tue, Nov 18, 2008 at 12:07:20PM +0100, Patrick McHardy wrote:
>>> --- /tmp/nf_conntrack_netlink.c-orig 2008-09-29 23:28:55.000000000 +0200
>>> +++ /tmp/nf_conntrack_netlink.c 2008-09-29 23:29:11.000000000 +0200
>>> @@ -1177,8 +1177,8 @@
>>> ct->master = master_ct;
>>> }
>>> - add_timer(&ct->timeout);
>>> nf_conntrack_hash_insert(ct);
>>> + add_timer(&ct->timeout);
>>> rcu_read_unlock();
>> That code looks very fishy. We should be holding the conntrack lock,
>> otherwise the addition is not only racy against the timer, but also
>> against addition of identical conntracks. Let me look into what
>> happened here.
>
> We have experienced a lot of kernel crashes, _every time_ in the
> death_by_timeout() function while we were trying to add a new conntrack
> entry from userspace via netlink (attached the disassembled version
> of the function, ===> points to the EIP upon the crash). There was a
> possibility, that we tried to add conntrack entries with zero timeout
> value, maybe it's necessary to trigger this crash. The previous patch
> has definitly solved the problem for us.
>
> I've got photos from various crashes, but it takes a little time to
> find them. Please let me know if you want to see them.
Thats not necessary, the problem is pretty obvious, I was mainly
wondering at what point we broke it.
I'll send you a patch soon.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: crash in death_by_timeout()
2008-11-18 13:19 ` Patrick McHardy
@ 2008-11-18 13:27 ` Patrick McHardy
2008-11-18 22:25 ` Pablo Neira Ayuso
2008-11-25 8:09 ` BORBELY Zoltan
0 siblings, 2 replies; 13+ messages in thread
From: Patrick McHardy @ 2008-11-18 13:27 UTC (permalink / raw)
To: BORBELY Zoltan; +Cc: Netfilter Development Mailinglist, Pablo Neira Ayuso
[-- Attachment #1: Type: text/plain, Size: 1623 bytes --]
Patrick McHardy wrote:
> BORBELY Zoltan wrote:
>> On Tue, Nov 18, 2008 at 12:07:20PM +0100, Patrick McHardy wrote:
>>>> --- /tmp/nf_conntrack_netlink.c-orig 2008-09-29
>>>> 23:28:55.000000000 +0200
>>>> +++ /tmp/nf_conntrack_netlink.c 2008-09-29 23:29:11.000000000 +0200
>>>> @@ -1177,8 +1177,8 @@
>>>> ct->master = master_ct;
>>>> }
>>>> - add_timer(&ct->timeout);
>>>> nf_conntrack_hash_insert(ct);
>>>> + add_timer(&ct->timeout);
>>>> rcu_read_unlock();
>>> That code looks very fishy. We should be holding the conntrack lock,
>>> otherwise the addition is not only racy against the timer, but also
>>> against addition of identical conntracks. Let me look into what
>>> happened here.
>>
>> We have experienced a lot of kernel crashes, _every time_ in the
>> death_by_timeout() function while we were trying to add a new conntrack
>> entry from userspace via netlink (attached the disassembled version
>> of the function, ===> points to the EIP upon the crash). There was a
>> possibility, that we tried to add conntrack entries with zero timeout
>> value, maybe it's necessary to trigger this crash. The previous patch
>> has definitly solved the problem for us.
>>
>> I've got photos from various crashes, but it takes a little time to
>> find them. Please let me know if you want to see them.
>
> Thats not necessary, the problem is pretty obvious, I was mainly
> wondering at what point we broke it.
>
> I'll send you a patch soon.
Could you try whether this patch fixes the problem?
Pablo, do you recall the reason why the lock isn't held in
ctnetlink_create_conntrack()?
[-- Attachment #2: x --]
[-- Type: text/plain, Size: 1540 bytes --]
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 622d7c6..233fdd2 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -305,9 +305,7 @@ void nf_conntrack_hash_insert(struct nf_conn *ct)
hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
repl_hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_REPLY].tuple);
- spin_lock_bh(&nf_conntrack_lock);
__nf_conntrack_hash_insert(ct, hash, repl_hash);
- spin_unlock_bh(&nf_conntrack_lock);
}
EXPORT_SYMBOL_GPL(nf_conntrack_hash_insert);
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index a040d46..3b009a3 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1090,7 +1090,7 @@ ctnetlink_create_conntrack(struct nlattr *cda[],
struct nf_conn_help *help;
struct nf_conntrack_helper *helper;
- ct = nf_conntrack_alloc(&init_net, otuple, rtuple, GFP_KERNEL);
+ ct = nf_conntrack_alloc(&init_net, otuple, rtuple, GFP_ATOMIC);
if (ct == NULL || IS_ERR(ct))
return -ENOMEM;
@@ -1212,13 +1212,14 @@ ctnetlink_new_conntrack(struct sock *ctnl, struct sk_buff *skb,
atomic_inc(&master_ct->ct_general.use);
}
- spin_unlock_bh(&nf_conntrack_lock);
err = -ENOENT;
if (nlh->nlmsg_flags & NLM_F_CREATE)
err = ctnetlink_create_conntrack(cda,
&otuple,
&rtuple,
master_ct);
+ spin_unlock_bh(&nf_conntrack_lock);
+
if (err < 0 && master_ct)
nf_ct_put(master_ct);
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: crash in death_by_timeout()
2008-11-18 13:27 ` Patrick McHardy
@ 2008-11-18 22:25 ` Pablo Neira Ayuso
2008-11-19 12:04 ` Patrick McHardy
2008-11-25 8:09 ` BORBELY Zoltan
1 sibling, 1 reply; 13+ messages in thread
From: Pablo Neira Ayuso @ 2008-11-18 22:25 UTC (permalink / raw)
To: Patrick McHardy; +Cc: BORBELY Zoltan, Netfilter Development Mailinglist
[-- Attachment #1: Type: text/plain, Size: 2351 bytes --]
Patrick McHardy wrote:
> Patrick McHardy wrote:
>> BORBELY Zoltan wrote:
>>> On Tue, Nov 18, 2008 at 12:07:20PM +0100, Patrick McHardy wrote:
>>>>> --- /tmp/nf_conntrack_netlink.c-orig 2008-09-29
>>>>> 23:28:55.000000000 +0200
>>>>> +++ /tmp/nf_conntrack_netlink.c 2008-09-29 23:29:11.000000000 +0200
>>>>> @@ -1177,8 +1177,8 @@
>>>>> ct->master = master_ct;
>>>>> }
>>>>> - add_timer(&ct->timeout);
>>>>> nf_conntrack_hash_insert(ct);
>>>>> + add_timer(&ct->timeout);
>>>>> rcu_read_unlock();
>>>> That code looks very fishy. We should be holding the conntrack lock,
>>>> otherwise the addition is not only racy against the timer, but also
>>>> against addition of identical conntracks. Let me look into what
>>>> happened here.
>>>
>>> We have experienced a lot of kernel crashes, _every time_ in the
>>> death_by_timeout() function while we were trying to add a new conntrack
>>> entry from userspace via netlink (attached the disassembled version
>>> of the function, ===> points to the EIP upon the crash). There was a
>>> possibility, that we tried to add conntrack entries with zero timeout
>>> value, maybe it's necessary to trigger this crash. The previous patch
>>> has definitly solved the problem for us.
>>>
>>> I've got photos from various crashes, but it takes a little time to
>>> find them. Please let me know if you want to see them.
>>
>> Thats not necessary, the problem is pretty obvious, I was mainly
>> wondering at what point we broke it.
>>
>> I'll send you a patch soon.
>
> Could you try whether this patch fixes the problem?
>
> Pablo, do you recall the reason why the lock isn't held in
> ctnetlink_create_conntrack()?
The creation is done under the nfnl_mutex so that requests to create
identical entries cannot race. Of course, this is not enough to avoid
the race with the timer if we set a very small timer for a conntrack :(.
AFAICS, we don't need to enclose the whole conntrack creation path.
Would you prefer the patch attached? This patch should apply fine to
2.6.28-rc.
I can send this patch to -stable. BTW, this patch may conflict with my
patch enqueued for 2.6.29 that adds userspace reporting, let me know if
I can give you a hand in some way (like sending it on top of this one or
yours again, whatever).
--
"Los honestos son inadaptados sociales" -- Les Luthiers
[-- Attachment #2: fix-creation.patch --]
[-- Type: text/x-diff, Size: 2555 bytes --]
netfilter: ctnetlink: fix racy timer in the creation path
If we set a very small timer for new conntrack created via
ctnetlink, the entry may expire before it is in the hashes,
resulting in a crash. To fix this problem, the timer
addition and the insertion into the hashes is done holding
the nf_conntrack spin lock bh.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
include/net/netfilter/nf_conntrack.h | 2 +-
net/netfilter/nf_conntrack_core.c | 7 ++-----
net/netfilter/nf_conntrack_netlink.c | 4 +++-
3 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index b76a868..a05e2e2 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -197,7 +197,7 @@ extern void nf_ct_free_hashtable(struct hlist_head *hash, int vmalloced,
extern struct nf_conntrack_tuple_hash *
__nf_conntrack_find(struct net *net, const struct nf_conntrack_tuple *tuple);
-extern void nf_conntrack_hash_insert(struct nf_conn *ct);
+extern void __nf_conntrack_insert(struct nf_conn *ct);
extern void nf_conntrack_flush(struct net *net);
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 622d7c6..22246ec 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -298,18 +298,15 @@ static void __nf_conntrack_hash_insert(struct nf_conn *ct,
&net->ct.hash[repl_hash]);
}
-void nf_conntrack_hash_insert(struct nf_conn *ct)
+void __nf_conntrack_insert(struct nf_conn *ct)
{
unsigned int hash, repl_hash;
hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
repl_hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_REPLY].tuple);
-
- spin_lock_bh(&nf_conntrack_lock);
__nf_conntrack_hash_insert(ct, hash, repl_hash);
- spin_unlock_bh(&nf_conntrack_lock);
}
-EXPORT_SYMBOL_GPL(nf_conntrack_hash_insert);
+EXPORT_SYMBOL_GPL(__nf_conntrack_insert);
/* Confirm a connection given skb; places it in hash table */
int
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index b132124..2eb8fd3 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1151,8 +1151,10 @@ ctnetlink_create_conntrack(struct nlattr *cda[],
ct->master = master_ct;
}
+ spin_lock_bh(&nf_conntrack_lock);
add_timer(&ct->timeout);
- nf_conntrack_hash_insert(ct);
+ __nf_conntrack_insert(ct);
+ spin_unlock_bh(&nf_conntrack_lock);
rcu_read_unlock();
return 0;
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: crash in death_by_timeout()
2008-11-18 22:25 ` Pablo Neira Ayuso
@ 2008-11-19 12:04 ` Patrick McHardy
2008-11-19 12:37 ` Pablo Neira Ayuso
0 siblings, 1 reply; 13+ messages in thread
From: Patrick McHardy @ 2008-11-19 12:04 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: BORBELY Zoltan, Netfilter Development Mailinglist
Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>> Pablo, do you recall the reason why the lock isn't held in
>> ctnetlink_create_conntrack()?
>
> The creation is done under the nfnl_mutex so that requests to create
> identical entries cannot race. Of course, this is not enough to avoid
> the race with the timer if we set a very small timer for a conntrack :(.
Its also not enough to avoid the race against packet processing,
which takes nf_conntrack_lock.
> AFAICS, we don't need to enclose the whole conntrack creation path.
> Would you prefer the patch attached? This patch should apply fine to
> 2.6.28-rc.
That fixes the timer race, but the race between lookup and creation
remains. We really need to either hold the lock the entire time or
redo the lookup before inserting the entry into the hash tables.
> I can send this patch to -stable. BTW, this patch may conflict with my
> patch enqueued for 2.6.29 that adds userspace reporting, let me know if
> I can give you a hand in some way (like sending it on top of this one or
> yours again, whatever).
I'll take care of any merge issues.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: crash in death_by_timeout()
2008-11-19 12:04 ` Patrick McHardy
@ 2008-11-19 12:37 ` Pablo Neira Ayuso
2008-11-19 12:47 ` Patrick McHardy
0 siblings, 1 reply; 13+ messages in thread
From: Pablo Neira Ayuso @ 2008-11-19 12:37 UTC (permalink / raw)
To: Patrick McHardy; +Cc: BORBELY Zoltan, Netfilter Development Mailinglist
Patrick McHardy wrote:
> Pablo Neira Ayuso wrote:
>> Patrick McHardy wrote:
>>> Pablo, do you recall the reason why the lock isn't held in
>>> ctnetlink_create_conntrack()?
>>
>> The creation is done under the nfnl_mutex so that requests to create
>> identical entries cannot race. Of course, this is not enough to avoid
>> the race with the timer if we set a very small timer for a conntrack :(.
>
> Its also not enough to avoid the race against packet processing,
> which takes nf_conntrack_lock.
>
>> AFAICS, we don't need to enclose the whole conntrack creation path.
>> Would you prefer the patch attached? This patch should apply fine to
>> 2.6.28-rc.
>
> That fixes the timer race, but the race between lookup and creation
> remains. We really need to either hold the lock the entire time or
> redo the lookup before inserting the entry into the hash tables.
I see, I forgot about that case. Your patch should be fine then.
--
"Los honestos son inadaptados sociales" -- Les Luthiers
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: crash in death_by_timeout()
2008-11-19 12:37 ` Pablo Neira Ayuso
@ 2008-11-19 12:47 ` Patrick McHardy
0 siblings, 0 replies; 13+ messages in thread
From: Patrick McHardy @ 2008-11-19 12:47 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: BORBELY Zoltan, Netfilter Development Mailinglist
Pablo Neira Ayuso wrote:
> Patrick McHardy wrote:
>> Pablo Neira Ayuso wrote:
>>> Patrick McHardy wrote:
>>>> Pablo, do you recall the reason why the lock isn't held in
>>>> ctnetlink_create_conntrack()?
>>> The creation is done under the nfnl_mutex so that requests to create
>>> identical entries cannot race. Of course, this is not enough to avoid
>>> the race with the timer if we set a very small timer for a conntrack :(.
>> Its also not enough to avoid the race against packet processing,
>> which takes nf_conntrack_lock.
>>
>>> AFAICS, we don't need to enclose the whole conntrack creation path.
>>> Would you prefer the patch attached? This patch should apply fine to
>>> 2.6.28-rc.
>> That fixes the timer race, but the race between lookup and creation
>> remains. We really need to either hold the lock the entire time or
>> redo the lookup before inserting the entry into the hash tables.
>
> I see, I forgot about that case. Your patch should be fine then.
Thanks, I've queued it up, but will wait for confirmation from Zoltan.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: crash in death_by_timeout()
2008-11-18 13:27 ` Patrick McHardy
2008-11-18 22:25 ` Pablo Neira Ayuso
@ 2008-11-25 8:09 ` BORBELY Zoltan
2008-11-25 11:11 ` Patrick McHardy
1 sibling, 1 reply; 13+ messages in thread
From: BORBELY Zoltan @ 2008-11-25 8:09 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Netfilter Development Mailinglist
Hi,
On Tue, Nov 18, 2008 at 02:27:44PM +0100, Patrick McHardy wrote:
> Could you try whether this patch fixes the problem?
>
> Pablo, do you recall the reason why the lock isn't held in
> ctnetlink_create_conntrack()?
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 622d7c6..233fdd2 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -305,9 +305,7 @@ void nf_conntrack_hash_insert(struct nf_conn *ct)
> hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
> repl_hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_REPLY].tuple);
>
> - spin_lock_bh(&nf_conntrack_lock);
> __nf_conntrack_hash_insert(ct, hash, repl_hash);
> - spin_unlock_bh(&nf_conntrack_lock);
> }
> EXPORT_SYMBOL_GPL(nf_conntrack_hash_insert);
>
> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> index a040d46..3b009a3 100644
> --- a/net/netfilter/nf_conntrack_netlink.c
> +++ b/net/netfilter/nf_conntrack_netlink.c
> @@ -1090,7 +1090,7 @@ ctnetlink_create_conntrack(struct nlattr *cda[],
> struct nf_conn_help *help;
> struct nf_conntrack_helper *helper;
>
> - ct = nf_conntrack_alloc(&init_net, otuple, rtuple, GFP_KERNEL);
> + ct = nf_conntrack_alloc(&init_net, otuple, rtuple, GFP_ATOMIC);
> if (ct == NULL || IS_ERR(ct))
> return -ENOMEM;
>
> @@ -1212,13 +1212,14 @@ ctnetlink_new_conntrack(struct sock *ctnl, struct sk_buff *skb,
> atomic_inc(&master_ct->ct_general.use);
> }
>
> - spin_unlock_bh(&nf_conntrack_lock);
> err = -ENOENT;
> if (nlh->nlmsg_flags & NLM_F_CREATE)
> err = ctnetlink_create_conntrack(cda,
> &otuple,
> &rtuple,
> master_ct);
> + spin_unlock_bh(&nf_conntrack_lock);
> +
> if (err < 0 && master_ct)
> nf_ct_put(master_ct);
>
We didn't see any kernel crashes during a half day heavy work (without the
patch the kernel crashed in 3-4 hours every time), but we found a lot of
BUG messages in the log (maybe for every new entry):
Nov 24 14:45:43 test kernel: BUG: sleeping function called from invalid context at mm/slab.c:3043
Nov 24 14:45:43 test kernel: in_atomic():1, irqs_disabled():0
Nov 24 14:45:43 test kernel: 3 locks held by test/3586:
Nov 24 14:45:43 test kernel: #0: (nfnl_mutex){--..}, at: [<d081500f>] nfnetlink_rcv+0xf/0x30 [nfnetlink]
Nov 24 14:45:43 test kernel: #1: (nf_conntrack_lock){-+..}, at: [<d08c979f>] ctnetlink_new_conntrack+0x7f/0x770 [nf_conntrack_netlink]
Nov 24 14:45:43 test kernel: #2: (rcu_read_lock){..--}, at: [<d08c98ee>] ctnetlink_new_conntrack+0x1ce/0x770 [nf_conntrack_netlink]
Nov 24 14:45:43 test kernel: Pid: 3586, comm: test Not tainted 2.6.27.6bozotest #1
Nov 24 14:45:43 test kernel: [<c027a566>] __kmalloc_track_caller+0x126/0x160
Nov 24 14:45:43 test kernel: [<c052a7a5>] __nf_ct_ext_add+0xb5/0x290
Nov 24 14:45:43 test kernel: [<c026411d>] __krealloc+0x5d/0x80
Nov 24 14:45:44 test kernel: [<c052a7a5>] __nf_ct_ext_add+0xb5/0x290
Nov 24 14:45:44 test kernel: [<c052a71d>] __nf_ct_ext_add+0x2d/0x290
Nov 24 14:45:44 test kernel: [<d08c9af8>] ctnetlink_new_conntrack+0x3d8/0x770 [nf_conntrack_netlink]
Nov 24 14:45:44 test kernel: [<d08c98ee>] ctnetlink_new_conntrack+0x1ce/0x770 [nf_conntrack_netlink]
Nov 24 14:45:44 test kernel: [<c0248910>] validate_chain+0x380/0xed0
Nov 24 14:45:44 test kernel: [<d0815220>] nfnetlink_rcv_msg+0xf0/0x180 [nfnetlink]
Nov 24 14:45:44 test kernel: [<d0815130>] nfnetlink_rcv_msg+0x0/0x180 [nfnetlink]
Nov 24 14:45:44 test kernel: [<c0520ebc>] netlink_rcv_skb+0x7c/0xa0
Nov 24 14:45:44 test kernel: [<d081501b>] nfnetlink_rcv+0x1b/0x30 [nfnetlink]
Nov 24 14:45:44 test kernel: [<c0520c50>] netlink_unicast+0x250/0x280
Nov 24 14:45:44 test kernel: [<c052145e>] netlink_sendmsg+0x1ee/0x2c0
Nov 24 14:45:44 test kernel: [<c04fad7f>] sock_sendmsg+0xbf/0xf0
Nov 24 14:45:44 test kernel: [<c02496e5>] __lock_acquire+0x285/0x9e0
Nov 24 14:45:44 test kernel: [<c0239790>] autoremove_wake_function+0x0/0x50
Nov 24 14:45:44 test kernel: [<c0248910>] validate_chain+0x380/0xed0
Nov 24 14:45:44 test kernel: [<c027ee33>] fget_light+0xd3/0xf0
Nov 24 14:45:44 test kernel: [<c031bea8>] copy_from_user+0x38/0x80
Nov 24 14:45:44 test kernel: [<c031bea8>] copy_from_user+0x38/0x80
Nov 24 14:45:44 test kernel: [<c0502e2a>] verify_iovec+0x2a/0x90
Nov 24 14:45:44 test kernel: [<c04faf14>] sys_sendmsg+0x164/0x280
Nov 24 14:45:44 test kernel: [<c027ee33>] fget_light+0xd3/0xf0
Nov 24 14:45:44 test kernel: [<c031c16a>] copy_to_user+0x3a/0x70
Nov 24 14:45:44 test kernel: [<c04fb98f>] move_addr_to_user+0x5f/0x70
Nov 24 14:45:44 test kernel: [<c04fbf0d>] sys_getsockname+0xcd/0xd0
Nov 24 14:45:44 test kernel: [<c022ad6c>] local_bh_enable_ip+0x7c/0xc0
Nov 24 14:45:44 test kernel: [<c0247e64>] trace_hardirqs_on_caller+0xc4/0x140
Nov 24 14:45:44 test kernel: [<c022ad6c>] local_bh_enable_ip+0x7c/0xc0
Nov 24 14:45:44 test kernel: [<c04fe578>] sock_setsockopt+0x128/0x590
Nov 24 14:45:44 test kernel: [<c027edb3>] fget_light+0x53/0xf0
Nov 24 14:45:44 test kernel: [<c04fa552>] sockfd_lookup_light+0x32/0x60
Nov 24 14:45:44 test kernel: [<c04fc39b>] sys_socketcall+0x25b/0x2b0
Nov 24 14:45:44 test kernel: [<c031ba44>] trace_hardirqs_on_thunk+0xc/0x10
Nov 24 14:45:44 test kernel: [<c031ba44>] trace_hardirqs_on_thunk+0xc/0x10
Nov 24 14:45:44 test kernel: [<c0203029>] sysenter_do_call+0x12/0x35
Nov 24 14:45:44 test kernel: =======================
Bye,
Zoltan
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: crash in death_by_timeout()
2008-11-25 8:09 ` BORBELY Zoltan
@ 2008-11-25 11:11 ` Patrick McHardy
2008-11-25 22:48 ` BORBELY Zoltan
0 siblings, 1 reply; 13+ messages in thread
From: Patrick McHardy @ 2008-11-25 11:11 UTC (permalink / raw)
To: BORBELY Zoltan; +Cc: Netfilter Development Mailinglist
[-- Attachment #1: Type: text/plain, Size: 1044 bytes --]
BORBELY Zoltan wrote:
> Hi,
>
> On Tue, Nov 18, 2008 at 02:27:44PM +0100, Patrick McHardy wrote:
>> Could you try whether this patch fixes the problem?
>>
> We didn't see any kernel crashes during a half day heavy work (without the
> patch the kernel crashed in 3-4 hours every time), but we found a lot of
> BUG messages in the log (maybe for every new entry):
>
> Nov 24 14:45:43 test kernel: BUG: sleeping function called from invalid context at mm/slab.c:3043
> Nov 24 14:45:43 test kernel: in_atomic():1, irqs_disabled():0
> Nov 24 14:45:43 test kernel: 3 locks held by test/3586:
> Nov 24 14:45:43 test kernel: #0: (nfnl_mutex){--..}, at: [<d081500f>] nfnetlink_rcv+0xf/0x30 [nfnetlink]
> Nov 24 14:45:43 test kernel: #1: (nf_conntrack_lock){-+..}, at: [<d08c979f>] ctnetlink_new_conntrack+0x7f/0x770 [nf_conntrack_netlink]
> Nov 24 14:45:43 test kernel: #2: (rcu_read_lock){..--}, at: [<d08c98ee>] ctnetlink_new_conntrack+0x1ce/0x770 [nf_conntrack_netlink]
Crap, I missed a spot. Does this patch help?
[-- Attachment #2: x --]
[-- Type: text/plain, Size: 425 bytes --]
diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 3b009a3..5f4a651 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1138,7 +1138,7 @@ ctnetlink_create_conntrack(struct nlattr *cda[],
}
}
- nf_ct_acct_ext_add(ct, GFP_KERNEL);
+ nf_ct_acct_ext_add(ct, GFP_ATOMIC);
#if defined(CONFIG_NF_CONNTRACK_MARK)
if (cda[CTA_MARK])
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: crash in death_by_timeout()
2008-11-25 11:11 ` Patrick McHardy
@ 2008-11-25 22:48 ` BORBELY Zoltan
2008-11-26 11:16 ` Patrick McHardy
0 siblings, 1 reply; 13+ messages in thread
From: BORBELY Zoltan @ 2008-11-25 22:48 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Netfilter Development Mailinglist
Hi,
On Tue, Nov 25, 2008 at 12:11:13PM +0100, Patrick McHardy wrote:
> Crap, I missed a spot. Does this patch help?
>
>
> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> index 3b009a3..5f4a651 100644
> --- a/net/netfilter/nf_conntrack_netlink.c
> +++ b/net/netfilter/nf_conntrack_netlink.c
> @@ -1138,7 +1138,7 @@ ctnetlink_create_conntrack(struct nlattr *cda[],
> }
> }
>
> - nf_ct_acct_ext_add(ct, GFP_KERNEL);
> + nf_ct_acct_ext_add(ct, GFP_ATOMIC);
>
> #if defined(CONFIG_NF_CONNTRACK_MARK)
> if (cda[CTA_MARK])
Thanks, the BUG messages have disappeared.
Bye,
Zoltan
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: crash in death_by_timeout()
2008-11-25 22:48 ` BORBELY Zoltan
@ 2008-11-26 11:16 ` Patrick McHardy
0 siblings, 0 replies; 13+ messages in thread
From: Patrick McHardy @ 2008-11-26 11:16 UTC (permalink / raw)
To: BORBELY Zoltan; +Cc: Netfilter Development Mailinglist
BORBELY Zoltan wrote:
> Hi,
>
> On Tue, Nov 25, 2008 at 12:11:13PM +0100, Patrick McHardy wrote:
>> Crap, I missed a spot. Does this patch help?
>>
>>
>
>> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
>> index 3b009a3..5f4a651 100644
>> --- a/net/netfilter/nf_conntrack_netlink.c
>> +++ b/net/netfilter/nf_conntrack_netlink.c
>> @@ -1138,7 +1138,7 @@ ctnetlink_create_conntrack(struct nlattr *cda[],
>> }
>> }
>>
>> - nf_ct_acct_ext_add(ct, GFP_KERNEL);
>> + nf_ct_acct_ext_add(ct, GFP_ATOMIC);
>>
>> #if defined(CONFIG_NF_CONNTRACK_MARK)
>> if (cda[CTA_MARK])
>
> Thanks, the BUG messages have disappeared.
Great, thanks for testing.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2008-11-26 11:16 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-11-17 22:18 crash in death_by_timeout() BORBELY Zoltan
2008-11-18 11:07 ` Patrick McHardy
2008-11-18 12:38 ` BORBELY Zoltan
2008-11-18 13:19 ` Patrick McHardy
2008-11-18 13:27 ` Patrick McHardy
2008-11-18 22:25 ` Pablo Neira Ayuso
2008-11-19 12:04 ` Patrick McHardy
2008-11-19 12:37 ` Pablo Neira Ayuso
2008-11-19 12:47 ` Patrick McHardy
2008-11-25 8:09 ` BORBELY Zoltan
2008-11-25 11:11 ` Patrick McHardy
2008-11-25 22:48 ` BORBELY Zoltan
2008-11-26 11:16 ` Patrick McHardy
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).