From: Eric Dumazet <eric.dumazet@gmail.com>
To: Patrick McHardy <kaber@trash.net>
Cc: Ingo Molnar <mingo@elte.hu>, David Miller <davem@davemloft.net>,
Thomas Gleixner <tglx@linutronix.de>,
torvalds@linux-foundation.org, akpm@linux-foundation.org,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
Pablo Neira Ayuso <pablo@netfilter.org>
Subject: [PATCH] netfilter: conntrack: death_by_timeout() fix
Date: Fri, 19 Jun 2009 00:46:11 +0200 [thread overview]
Message-ID: <4A3AC3B3.2030002@gmail.com> (raw)
In-Reply-To: <4A3A66CC.4090205@trash.net>
Patrick McHardy a écrit :
> Eric Dumazet wrote:
>> In my own analysis, I found death_by_timeout() might be problematic,
>> with RCU and lockless lookups.
>>
>> static void death_by_timeout(unsigned long ul_conntrack)
>> {
>> struct nf_conn *ct = (void *)ul_conntrack;
>>
>> if (!test_bit(IPS_DYING_BIT, &ct->status) &&
>> unlikely(nf_conntrack_event(IPCT_DESTROY, ct) < 0)) {
>> /* destroy event was not delivered */
>> nf_ct_delete_from_lists(ct);
>> << HERE >>
>>
>> nf_ct_insert_dying_list(ct);
>> return;
>> }
>> set_bit(IPS_DYING_BIT, &ct->status);
>> nf_ct_delete_from_lists(ct);
>> nf_ct_put(ct);
>> }
>>
>>
>> We delete ct from a list and insert it in a new list.
>>
>> I believe a reader could "*catch*" ct while doing a lookup and miss
>> the end
>> of its chain. (nulls algo check the null value at the end of lookup
>> and can
>> decide to restart the lookup if the null value is not the expected one)
>>
>> We need to change nf_conntrack_init_net() and use a different "null"
>> value,
>> guaranteed not being used in regular lists
>
> Good catch. This is a new bug, but it shouldn't matter in this case
> since nf_conntrack_event() can't fail unless you have a userspace
> listener that makes use of reliable delivery, which I think hasn't
> even been released yet.
>
>> Patch follows :
>
> Looks good. If you send me a Signed-off-by: I'll already apply it.
Sure, here it is.
Thank you
[PATCH] netfilter: conntrack: death_by_timeout() fix
death_by_timeout() might delete a conntrack from hash list
and insert it in dying list.
nf_ct_delete_from_lists(ct);
nf_ct_insert_dying_list(ct);
I believe a (lockless) reader could *catch* ct while doing a lookup
and miss the end of its chain.
(nulls lookup algo must check the null value at the end of lookup and
should restart if the null value is not the expected one.
cf Documentation/RCU/rculist_nulls.txt for details)
We need to change nf_conntrack_init_net() and use a different "null" value,
guaranteed not being used in regular lists. Choose very large values, since
hash table uses [0..size-1] null values.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Patrick McHardy <kaber@trash.net>
---
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 5f72b94..5276a2d 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1267,13 +1267,19 @@ err_cache:
return ret;
}
+/*
+ * We need to use special "null" values, not used in hash table
+ */
+#define UNCONFIRMED_NULLS_VAL ((1<<30)+0)
+#define DYING_NULLS_VAL ((1<<30)+1)
+
static int nf_conntrack_init_net(struct net *net)
{
int ret;
atomic_set(&net->ct.count, 0);
- INIT_HLIST_NULLS_HEAD(&net->ct.unconfirmed, 0);
- INIT_HLIST_NULLS_HEAD(&net->ct.dying, 0);
+ INIT_HLIST_NULLS_HEAD(&net->ct.unconfirmed, UNCONFIRMED_NULLS_VAL);
+ INIT_HLIST_NULLS_HEAD(&net->ct.dying, DYING_NULLS_VAL);
net->ct.stat = alloc_percpu(struct ip_conntrack_stat);
if (!net->ct.stat) {
ret = -ENOMEM;
next prev parent reply other threads:[~2009-06-18 22:47 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-06-15 12:04 [GIT]: Networking David Miller
2009-06-16 9:15 ` Ingo Molnar
2009-06-16 9:19 ` David Miller
2009-06-16 9:33 ` Ingo Molnar
2009-06-16 9:44 ` Ingo Molnar
2009-06-16 9:44 ` Alan Cox
2009-06-16 9:48 ` Ingo Molnar
2009-06-16 9:56 ` David Miller
2009-06-16 10:11 ` Ingo Molnar
2009-06-16 10:35 ` David Miller
2009-06-16 13:38 ` Eric Dumazet
2009-06-16 14:19 ` Eric Dumazet
2009-06-16 18:59 ` Linus Torvalds
2009-06-16 19:08 ` David Miller
2009-06-16 19:37 ` Eric Dumazet
2009-06-16 20:12 ` Eric Dumazet
2009-06-17 6:41 ` [PATCH] net: correct off-by-one write allocations reports Eric Dumazet
2009-06-17 11:31 ` [PATCH] atm: sk_wmem_alloc initial value is one Eric Dumazet
2009-06-18 2:06 ` David Miller
2009-06-18 2:05 ` [PATCH] net: correct off-by-one write allocations reports David Miller
2009-06-17 11:32 ` [GIT]: Networking David Miller
2009-06-16 9:21 ` David Miller
2009-06-16 9:26 ` Ingo Molnar
2009-06-16 10:47 ` David Miller
2009-06-16 10:53 ` Ingo Molnar
2009-06-16 12:24 ` Ingo Molnar
2009-06-16 12:39 ` David Miller
2009-06-17 9:21 ` [bug] __nf_ct_refresh_acct(): WARNING: at lib/list_debug.c:30 __list_add+0x7d/0xad() Ingo Molnar
2009-06-17 10:18 ` Eric Dumazet
2009-06-17 11:08 ` Ingo Molnar
2009-06-17 11:35 ` David Miller
2009-06-18 5:23 ` Ingo Molnar
2009-06-18 5:58 ` Eric Dumazet
2009-06-18 9:47 ` Patrick McHardy
2009-06-18 14:56 ` Patrick McHardy
2009-06-18 15:46 ` Eric Dumazet
2009-06-18 16:09 ` Patrick McHardy
2009-06-18 16:13 ` Pablo Neira Ayuso
2009-06-18 22:46 ` Eric Dumazet [this message]
2009-06-19 11:15 ` [PATCH] netfilter: conntrack: death_by_timeout() fix Patrick McHardy
2009-06-20 15:47 ` [bug] __nf_ct_refresh_acct(): WARNING: at lib/list_debug.c:30 __list_add+0x7d/0xad() Ingo Molnar
2009-06-20 15:56 ` Patrick McHardy
2009-06-17 11:38 ` Patrick McHardy
2009-06-17 11:51 ` Patrick McHardy
2009-06-17 11:55 ` Eric Dumazet
2009-06-17 12:00 ` Patrick McHardy
2009-06-17 12:33 ` Eric Dumazet
2009-06-17 12:36 ` Patrick McHardy
2009-06-17 13:27 ` Eric Dumazet
2009-06-17 13:29 ` Patrick McHardy
2009-06-17 14:23 ` Patrick McHardy
2009-06-17 15:29 ` Eric Dumazet
2009-06-17 15:34 ` Patrick McHardy
2009-06-18 23:18 ` [GIT]: Networking Tilman Schmidt
2009-06-19 3:36 ` David Miller
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4A3AC3B3.2030002@gmail.com \
--to=eric.dumazet@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=davem@davemloft.net \
--cc=kaber@trash.net \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@elte.hu \
--cc=netdev@vger.kernel.org \
--cc=pablo@netfilter.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).