* Netfilter Connection Tracking Race Condition in Kernel 2.4.x
@ 2006-07-25 0:31 Bob Halley
2006-07-25 1:07 ` Patrick McHardy
0 siblings, 1 reply; 10+ messages in thread
From: Bob Halley @ 2006-07-25 0:31 UTC (permalink / raw)
To: netfilter-devel
This is bugzilla 495, resent to netfilter-devel by request.
Background
Our application uses ip_queue in prerouting to divert DNS UDP packets
to a userland daemon which inspects them and then issues an NF_ACCEPT
or NF_DROP verdict back to the kernel.
We found that if several packets with the same conntrack tuple,
i.e. the same src addr, src port, dst addr, and dst port, arrive very
close together, then only the first one accepted by our software
actually makes it back out to the wire; the others are silently
dropped.
Analysis
We instrumented the kernel to find out where the drop was occurring.
The code doing the dropping was ip_refrag() in
net/ipv4/netfilter/ip_conntrack_standalone.c, specifically:
/* We've seen it coming out the other side: confirm */
if (ip_confirm(hooknum, pskb, in, out, okfn) != NF_ACCEPT)
return NF_DROP;
The dropping is caused by a race between the first packet of a given
tuple making it to confirmed state, and the arrival of another packet
with the same tuple. If a second packet arrives before the first is
confirmed, it is assigned a new connection tracking context instead of
joining that of the first unconfirmed packet. When the second packet
is finally handled by ip_refrag(), the call to ip_confirm() finds that
there is already a confirmed entry in the table, and returns NF_DROP.
From the comments in __ip_conntrack_confirm(), we infer that this is to
deal with duplicated datagrams and some REJECT case, but it's the
wrong thing in this case because the subsequent packets are neither
duplicates nor REJECTs.
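
The drop described above can be reproduced in a toy userspace model. This is only a sketch: `struct tuple`, `struct conn`, `find_confirmed()` and `confirm()` are hypothetical stand-ins, not the kernel's structures or functions, and the only assumption carried over from the analysis is that lookups see confirmed entries only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct tuple {
	unsigned src_ip, dst_ip;
	unsigned short src_port, dst_port;
};

struct conn {
	struct tuple t;
	bool confirmed;
};

#define TABLE_MAX 16
static struct conn *table[TABLE_MAX];	/* confirmed entries only */
static size_t table_len;

static bool tuple_eq(const struct tuple *a, const struct tuple *b)
{
	return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port;
}

/* Models the hash lookup: unconfirmed conntracks are invisible to it. */
static struct conn *find_confirmed(const struct tuple *t)
{
	for (size_t i = 0; i < table_len; i++)
		if (tuple_eq(&table[i]->t, t))
			return table[i];
	return NULL;
}

/* Models confirmation: false means the packet would be dropped. */
static bool confirm(struct conn *c)
{
	if (c->confirmed)
		return true;
	if (find_confirmed(&c->t) || table_len == TABLE_MAX)
		return false;
	c->confirmed = true;
	table[table_len++] = c;
	return true;
}
```

If two packets with the same tuple each get their own `struct conn` before either is confirmed, the first `confirm()` succeeds and the second returns false, which corresponds to the silent drop we observed.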
We were initially using the RHEL 3 kernel 2.4.21-40. We looked at later
2.4.x kernels and found a promising-looking change, namely the
addition of an unconfirmed list. We built
a 2.4.32 kernel and tested it, but the problem remained. We looked into
the nature of the unconfirmed list and discovered that it was solving a
different problem, but could be a useful starting point for a fix.
Fix
We decided to eliminate the race by having subsequent packets with the
same conntrack tuple join the conntrack context of the first packet
instead of creating a new conntrack context for each of them. Here's
the patch:
--- linux-2.4.32/net/ipv4/netfilter/ip_conntrack_core.c.orig	2005-04-03 18:42:20.000000000 -0700
+++ linux-2.4.32/net/ipv4/netfilter/ip_conntrack_core.c	2006-07-24 13:23:25.000000000 -0700
@@ -777,6 +777,14 @@
 	/* look for tuple match */
 	h = ip_conntrack_find_get(&tuple, NULL);
 	if (!h) {
+		READ_LOCK(&ip_conntrack_lock);
+		h = LIST_FIND(&unconfirmed, conntrack_tuple_cmp,
+			      struct ip_conntrack_tuple_hash *, &tuple, NULL);
+		if (h)
+			atomic_inc(&h->ctrack->ct_general.use);
+		READ_UNLOCK(&ip_conntrack_lock);
+	}
+	if (!h) {
 		h = init_conntrack(&tuple, proto, skb);
 		if (!h)
 			return NULL;
This patch reliably ends the race, and we no longer have mysteriously
disappearing packets. Not being netfilter experts, we're not certain
that this patch has no other side effects, and would appreciate any
advice or alternative fixes that people who know more than we do have
to offer.
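
For readers who want the lookup order of the fix without the kernel context, here is a minimal userspace sketch. All names (`struct conn`, `resolve()`, `tuple_id`, the two lists and the fixed-size pool) are hypothetical, and locking is left out on purpose; it shows only the order of the three steps.

```c
#include <stddef.h>

/* Hypothetical stand-in: tuple_id replaces the real address/port tuple. */
struct conn {
	unsigned tuple_id;
	int use;		/* reference count */
	struct conn *next;
};

static struct conn *confirmed_list;	/* stands in for the hash table */
static struct conn *unconfirmed_list;
static struct conn pool[8];
static size_t pool_used;

static struct conn *find(struct conn *head, unsigned tuple_id)
{
	for (; head; head = head->next)
		if (head->tuple_id == tuple_id)
			return head;
	return NULL;
}

/* Confirmed entries first, then unconfirmed ones, and only then a new one. */
static struct conn *resolve(unsigned tuple_id)
{
	struct conn *c = find(confirmed_list, tuple_id);

	if (!c)
		c = find(unconfirmed_list, tuple_id);
	if (c) {
		c->use++;
		return c;
	}
	if (pool_used == sizeof(pool) / sizeof(pool[0]))
		return NULL;
	c = &pool[pool_used++];
	c->tuple_id = tuple_id;
	c->use = 1;
	c->next = unconfirmed_list;
	unconfirmed_list = c;
	return c;
}
```

Calling `resolve()` twice with the same `tuple_id` returns the same entry with `use` raised to 2, instead of creating a second context that would later collide at confirmation time.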
Regards,
Bob Halley <Bob.Halley@nominum.com>
Brian Wellington <Brian.Wellington@nominum.com>
* Re: Netfilter Connection Tracking Race Condition in Kernel 2.4.x
2006-07-25 0:31 Netfilter Connection Tracking Race Condition in Kernel 2.4.x Bob Halley
@ 2006-07-25 1:07 ` Patrick McHardy
2006-07-26 0:54 ` Phil Oester
2006-07-28 13:16 ` [PATCH 4/8][CTNETLINK] Fix race condition on conntrack creation Yasuyuki KOZAKAI
0 siblings, 2 replies; 10+ messages in thread
From: Patrick McHardy @ 2006-07-25 1:07 UTC (permalink / raw)
To: Bob Halley; +Cc: netfilter-devel
Bob Halley wrote:
> This is bugzilla 495, resent to netfilter-devel by request.
Thanks.
> Background
>
> Our application uses ip_queue in prerouting to divert DNS UDP packets
> to a userland daemon which inspects them and then issues a NF_ACCEPT
> or NF_DROP verdict back to the kernel.
>
> We found that if several packets with the same conntrack tuple,
> i.e. the same src addr, src port, dst addr, and dst port, arrive very
> close together, then only the first one accepted by our software
> actually makes it back out to the wire; the others are silently
> dropped.
>
>
> Analysis
>
> We instrumented the kernel to find out where the drop was occurring.
> The code doing the dropping was ip_refrag() in
> net/ipv4/netfilter/ip_conntrack_standalone.c, specifically:
>
> /* We've seen it coming out the other side: confirm */
> if (ip_confirm(hooknum, pskb, in, out, okfn) != NF_ACCEPT)
> return NF_DROP;
>
> The dropping is caused by a race between the first packet of a given
> tuple making it to confirmed state, and the arrival of another packet
> with the same tuple. If a second packet arrives before the first is
> confirmed, it is assigned a new connection tracking context instead of
> joining that of the first unconfirmed packet. When the second packet
> is finally handled by ip_refrag(), the call to ip_confirm() finds that
> there is already a confirmed entry in the table, and returns NF_DROP.
> From the comments in __ip_conntrack_confirm(), we infer that this is to
> deal with duplicated datagrams and some REJECT case, but it's the
> wrong thing in this case because the subsequent packets are neither
> duplicates nor REJECTs.
>
> We were using RHEL 3 kernel 2.4.21-40 initially. We looked at later
> 2.4.x kernels and found some promising looking changes, namely the
> addition of an unconfirmed list, in more recent 2.4.x kernels. We built
> a 2.4.32 kernel and tested it, but the problem remained. We looked into
> the nature of the unconfirmed list and discovered that it was solving a
> different problem, but could be a useful starting point for a fix.
>
>
> Fix
>
> We decided to eliminate the race by having subsequent packets with the
> same conntrack tuple join the conntrack context of the first packet
> instead of creating a new conntrack context for each of them. Here's
> the patch:
>
> --- linux-2.4.32/net/ipv4/netfilter/ip_conntrack_core.c.orig
> 2005-04-03 18:42:20.000000000 -0700
> +++ linux-2.4.32/net/ipv4/netfilter/ip_conntrack_core.c 2006-07-24
> 13:23:25.000000000 -0700
> @@ -777,6 +777,14 @@
> /* look for tuple match */
> h = ip_conntrack_find_get(&tuple, NULL);
> if (!h) {
> + READ_LOCK(&ip_conntrack_lock);
> + h = LIST_FIND(&unconfirmed, conntrack_tuple_cmp,
> + struct ip_conntrack_tuple_hash *, &tuple, NULL);
> + if (h)
> + atomic_inc(&h->ctrack->ct_general.use);
> + READ_UNLOCK(&ip_conntrack_lock);
> + }
> + if (!h) {
> h = init_conntrack(&tuple, proto, skb);
> if (!h)
> return NULL;
>
> This patch reliably ends the race, and we no longer have mysteriously
> disappearing packets. Not being netfilter experts, we're not certain
> that this patch has no other side effects, and would appreciate any
> advice or alternative fixes that people who know more than we do have
> to offer.
The patch itself still contains a race: when a conntrack is confirmed
between the hash lookup and the unconfirmed lookup, it will still
create a new conntrack entry. About the general concept:
Using the unconfirmed lists looks like the only way to do this
besides changing the conntrack of the packet after noticing a
clash, but this would get tricky as the packets could already
cause state transitions or need sequence number adjustments.
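
The remaining window in the posted patch, between the hash lookup and the unconfirmed-list lookup, can be illustrated with a deterministic toy model. Everything here is a hypothetical stand-in: a single entry that is either unconfirmed or confirmed, and a `between` callback that plays the role of another CPU running in the gap between the two lookups.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-tuple model: where does the existing conntrack live? */
enum where { NOWHERE, UNCONFIRMED, CONFIRMED };
static enum where entry = NOWHERE;

static bool in_hash(void)
{
	return entry == CONFIRMED;
}

static bool in_unconfirmed(void)
{
	return entry == UNCONFIRMED;
}

/* Models another CPU confirming the first packet's conntrack. */
static void confirm_entry(void)
{
	if (entry == UNCONFIRMED)
		entry = CONFIRMED;
}

/*
 * Two separate lookups with a gap in between.
 * Returns true if a duplicate conntrack would be created.
 */
static bool lookup_two_step(void (*between)(void))
{
	if (in_hash())
		return false;
	if (between)
		between();	/* the entry may move while we are not looking */
	if (in_unconfirmed())
		return false;
	return true;
}

/* Both checks in one critical section: nothing can run in between. */
static bool lookup_one_step(void)
{
	return !in_hash() && !in_unconfirmed();
}
```

With `entry` set to `UNCONFIRMED`, `lookup_two_step(confirm_entry)` misses the entry in both places and reports that a duplicate would be created, while `lookup_one_step()` finds it.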
Walking the unconfirmed list (which should usually be very
small, but can grow large with queueing) for every new connection
sounds like too large a performance impact, so at the very least we
need to use a hash instead. Unfortunately it's hard to specify a
reasonable size since it depends purely on userspace. The two other
choices I see are:
- confirm entries manually (using a target) or automatically
before queueing packets. Not a good choice IMO since dropped
connections will stay in the hash table until timing out.
- change conntrack to always put connections in the hash immediately
and remove them again if the connection is dropped before being
confirmed.
The last option looks like the best to me for 2.6 (but could be
tricky to implement; I haven't looked at that code in a while).
For 2.4 I think all these possibilities are too intrusive, so you
will probably need to maintain this patch yourself.
Any better suggestions?
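
To make the last option concrete, here is a rough userspace sketch of "insert immediately, remove again if dropped before confirmation". It is not kernel code: the table, the `confirmed` flag, the `refs` counter of packets still in flight and all function names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define SLOTS 8

/* Hypothetical entry: visible to lookups from creation onwards. */
struct conn {
	unsigned tuple_id;
	bool used;
	bool confirmed;
	int refs;		/* packets still in flight for this entry */
};

static struct conn hash[SLOTS];

static struct conn *lookup(unsigned id)
{
	for (size_t i = 0; i < SLOTS; i++)
		if (hash[i].used && hash[i].tuple_id == id)
			return &hash[i];
	return NULL;
}

/* The first packet creates an unconfirmed entry; later packets join it. */
static struct conn *get_or_create(unsigned id)
{
	struct conn *c = lookup(id);

	if (c) {
		c->refs++;
		return c;
	}
	for (size_t i = 0; i < SLOTS; i++) {
		if (!hash[i].used) {
			hash[i] = (struct conn){ id, true, false, 1 };
			return &hash[i];
		}
	}
	return NULL;
}

static void verdict_accept(struct conn *c)
{
	c->confirmed = true;
	c->refs--;
}

/* Remove the entry only if nothing confirmed it and no packet is pending. */
static void verdict_drop(struct conn *c)
{
	c->refs--;
	if (!c->confirmed && c->refs == 0)
		c->used = false;
}
```

In this model a second packet with the same tuple finds the unconfirmed entry with a single lookup, and an entry whose only packet is dropped disappears again instead of waiting for a timeout.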
* [PATCH 4/8][CTNETLINK] Fix race condition on conntrack creation
@ 2006-07-25 13:18 Pablo Neira Ayuso
0 siblings, 0 replies; 10+ messages in thread
From: Pablo Neira Ayuso @ 2006-07-25 13:18 UTC (permalink / raw)
To: Netfilter Development Mailinglist; +Cc: Harald Welte, Patrick McHardy
[-- Attachment #1: Type: text/plain, Size: 488 bytes --]
The current conntrack creation path can run into rare race conditions;
the insertion into the hashes and timer activation must be done atomically.
This patch also:
- remove functions helper_[find_get|put] that have no clients anymore
- rework get_features facility to avoid a softlockup
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
--
The dawn of the fourth age of Linux firewalling is coming; a time of
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris
[-- Attachment #2: 04racy.patch --]
[-- Type: text/plain, Size: 14081 bytes --]
[CTNETLINK] Fix race condition on conntrack creation
The current conntrack creation path can run into rare race conditions;
the insertion into the hashes and timer activation must be done atomically.
This patch also:
- remove functions helper_[find_get|put] that have no clients anymore
- rework get_features facility to avoid a softlockup
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Index: net-2.6/net/ipv4/netfilter/ip_conntrack_netlink.c
===================================================================
--- net-2.6.orig/net/ipv4/netfilter/ip_conntrack_netlink.c 2006-07-23 15:23:39.000000000 +0200
+++ net-2.6/net/ipv4/netfilter/ip_conntrack_netlink.c 2006-07-23 15:23:42.000000000 +0200
@@ -1059,13 +1059,9 @@ ctnetlink_create_conntrack(struct nfattr
ct->mark = ntohl(*(u_int32_t *)NFA_DATA(cda[CTA_MARK-1]));
#endif
- ct->helper = ip_conntrack_helper_find_get(rtuple);
-
- add_timer(&ct->timeout);
+ ct->helper = ip_conntrack_helper_find(rtuple);
ip_conntrack_hash_insert(ct);
-
- if (ct->helper)
- ip_conntrack_helper_put(ct->helper);
+ add_timer(&ct->timeout);
DEBUGP("conntrack with id %u inserted\n", ct->id);
return 0;
@@ -1107,11 +1103,11 @@ ctnetlink_new_conntrack(struct sock *ctn
h = __ip_conntrack_find(&rtuple, NULL);
if (h == NULL) {
- write_unlock_bh(&ip_conntrack_lock);
DEBUGP("no such conntrack, create new\n");
err = -ENOENT;
if (nlh->nlmsg_flags & NLM_F_CREATE)
err = ctnetlink_create_conntrack(cda, &otuple, &rtuple);
+ write_unlock_bh(&ip_conntrack_lock);
return err;
}
/* implicit 'else' */
Index: net-2.6/net/netfilter/nf_conntrack_netlink.c
===================================================================
--- net-2.6.orig/net/netfilter/nf_conntrack_netlink.c 2006-07-23 15:23:39.000000000 +0200
+++ net-2.6/net/netfilter/nf_conntrack_netlink.c 2006-07-23 18:11:43.000000000 +0200
@@ -1049,6 +1049,7 @@ ctnetlink_create_conntrack(struct nfattr
struct nf_conntrack_tuple *rtuple)
{
struct nf_conn *ct;
+ struct nf_conn_help *help;
int err = -EINVAL;
DEBUGP("entered %s\n", __FUNCTION__);
@@ -1079,8 +1080,11 @@ ctnetlink_create_conntrack(struct nfattr
ct->mark = ntohl(*(u_int32_t *)NFA_DATA(cda[CTA_MARK-1]));
#endif
- add_timer(&ct->timeout);
+ help = nfct_help(ct);
+ if (help)
+ help->helper = nf_ct_helper_find(rtuple);
nf_conntrack_hash_insert(ct);
+ add_timer(&ct->timeout);
DEBUGP("conntrack with id %u inserted\n", ct->id);
return 0;
@@ -1124,11 +1128,11 @@ ctnetlink_new_conntrack(struct sock *ctn
h = __nf_conntrack_find(&rtuple, NULL);
if (h == NULL) {
- write_unlock_bh(&nf_conntrack_lock);
DEBUGP("no such conntrack, create new\n");
err = -ENOENT;
if (nlh->nlmsg_flags & NLM_F_CREATE)
err = ctnetlink_create_conntrack(cda, &otuple, &rtuple);
+ write_unlock_bh(&nf_conntrack_lock);
return err;
}
/* implicit 'else' */
Index: net-2.6/include/linux/netfilter_ipv4/ip_conntrack.h
===================================================================
--- net-2.6.orig/include/linux/netfilter_ipv4/ip_conntrack.h 2006-07-23 15:23:39.000000000 +0200
+++ net-2.6/include/linux/netfilter_ipv4/ip_conntrack.h 2006-07-23 15:23:42.000000000 +0200
@@ -255,8 +255,7 @@ ip_ct_iterate_cleanup(int (*iter)(struct
extern struct ip_conntrack_helper *
__ip_conntrack_helper_find_byname(const char *);
extern struct ip_conntrack_helper *
-ip_conntrack_helper_find_get(const struct ip_conntrack_tuple *tuple);
-extern void ip_conntrack_helper_put(struct ip_conntrack_helper *helper);
+ip_conntrack_helper_find(const struct ip_conntrack_tuple *tuple);
extern struct ip_conntrack_protocol *
__ip_conntrack_proto_find(u_int8_t protocol);
Index: net-2.6/include/net/netfilter/nf_conntrack.h
===================================================================
--- net-2.6.orig/include/net/netfilter/nf_conntrack.h 2006-07-23 15:23:39.000000000 +0200
+++ net-2.6/include/net/netfilter/nf_conntrack.h 2006-07-23 15:23:42.000000000 +0200
@@ -221,8 +221,7 @@ extern void nf_ct_remove_expectations(st
extern void nf_conntrack_flush(void);
extern struct nf_conntrack_helper *
-nf_ct_helper_find_get( const struct nf_conntrack_tuple *tuple);
-extern void nf_ct_helper_put(struct nf_conntrack_helper *helper);
+nf_ct_helper_find(const struct nf_conntrack_tuple *tuple);
extern struct nf_conntrack_helper *
__nf_conntrack_helper_find_byname(const char *name);
Index: net-2.6/net/ipv4/netfilter/ip_conntrack_core.c
===================================================================
--- net-2.6.orig/net/ipv4/netfilter/ip_conntrack_core.c 2006-07-23 15:23:39.000000000 +0200
+++ net-2.6/net/ipv4/netfilter/ip_conntrack_core.c 2006-07-23 15:23:42.000000000 +0200
@@ -428,12 +428,12 @@ void ip_conntrack_hash_insert(struct ip_
{
unsigned int hash, repl_hash;
+ ASSERT_WRITE_LOCK(&ip_conntrack_lock);
+
hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
repl_hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_REPLY].tuple);
- write_lock_bh(&ip_conntrack_lock);
__ip_conntrack_hash_insert(ct, hash, repl_hash);
- write_unlock_bh(&ip_conntrack_lock);
}
/* Confirm a connection given skb; places it in hash table */
@@ -566,42 +566,14 @@ static inline int helper_cmp(const struc
return ip_ct_tuple_mask_cmp(rtuple, &i->tuple, &i->mask);
}
-static struct ip_conntrack_helper *
-__ip_conntrack_helper_find( const struct ip_conntrack_tuple *tuple)
+struct ip_conntrack_helper *
+ip_conntrack_helper_find(const struct ip_conntrack_tuple *tuple)
{
return LIST_FIND(&helpers, helper_cmp,
struct ip_conntrack_helper *,
tuple);
}
-struct ip_conntrack_helper *
-ip_conntrack_helper_find_get( const struct ip_conntrack_tuple *tuple)
-{
- struct ip_conntrack_helper *helper;
-
- /* need ip_conntrack_lock to assure that helper exists until
- * try_module_get() is called */
- read_lock_bh(&ip_conntrack_lock);
-
- helper = __ip_conntrack_helper_find(tuple);
- if (helper) {
- /* need to increase module usage count to assure helper will
- * not go away while the caller is e.g. busy putting a
- * conntrack in the hash that uses the helper */
- if (!try_module_get(helper->me))
- helper = NULL;
- }
-
- read_unlock_bh(&ip_conntrack_lock);
-
- return helper;
-}
-
-void ip_conntrack_helper_put(struct ip_conntrack_helper *helper)
-{
- module_put(helper->me);
-}
-
struct ip_conntrack_protocol *
__ip_conntrack_proto_find(u_int8_t protocol)
{
@@ -730,7 +702,7 @@ init_conntrack(struct ip_conntrack_tuple
nf_conntrack_get(&conntrack->master->ct_general);
CONNTRACK_STAT_INC(expect_new);
} else {
- conntrack->helper = __ip_conntrack_helper_find(&repl_tuple);
+ conntrack->helper = ip_conntrack_helper_find(&repl_tuple);
CONNTRACK_STAT_INC(new);
}
@@ -1055,7 +1027,7 @@ void ip_conntrack_alter_reply(struct ip_
conntrack->tuplehash[IP_CT_DIR_REPLY].tuple = *newreply;
if (!conntrack->master && conntrack->expecting == 0)
- conntrack->helper = __ip_conntrack_helper_find(newreply);
+ conntrack->helper = ip_conntrack_helper_find(newreply);
write_unlock_bh(&ip_conntrack_lock);
}
Index: net-2.6/net/ipv4/netfilter/ip_conntrack_standalone.c
===================================================================
--- net-2.6.orig/net/ipv4/netfilter/ip_conntrack_standalone.c 2006-07-23 15:23:39.000000000 +0200
+++ net-2.6/net/ipv4/netfilter/ip_conntrack_standalone.c 2006-07-23 15:23:42.000000000 +0200
@@ -954,8 +954,7 @@ EXPORT_SYMBOL_GPL(ip_conntrack_hash_inse
EXPORT_SYMBOL_GPL(ip_ct_remove_expectations);
-EXPORT_SYMBOL_GPL(ip_conntrack_helper_find_get);
-EXPORT_SYMBOL_GPL(ip_conntrack_helper_put);
+EXPORT_SYMBOL_GPL(ip_conntrack_helper_find);
EXPORT_SYMBOL_GPL(__ip_conntrack_helper_find_byname);
EXPORT_SYMBOL_GPL(ip_conntrack_proto_find_get);
Index: net-2.6/net/netfilter/nf_conntrack_core.c
===================================================================
--- net-2.6.orig/net/netfilter/nf_conntrack_core.c 2006-07-23 15:23:39.000000000 +0200
+++ net-2.6/net/netfilter/nf_conntrack_core.c 2006-07-23 18:44:42.000000000 +0200
@@ -678,12 +678,12 @@ void nf_conntrack_hash_insert(struct nf_
{
unsigned int hash, repl_hash;
+ ASSERT_WRITE_LOCK(&nf_conntrack_lock);
+
hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
repl_hash = hash_conntrack(&ct->tuplehash[IP_CT_DIR_REPLY].tuple);
- write_lock_bh(&nf_conntrack_lock);
__nf_conntrack_hash_insert(ct, hash, repl_hash);
- write_unlock_bh(&nf_conntrack_lock);
}
/* Confirm a connection given skb; places it in hash table */
@@ -817,50 +817,51 @@ static inline int helper_cmp(const struc
return nf_ct_tuple_mask_cmp(rtuple, &i->tuple, &i->mask);
}
-static struct nf_conntrack_helper *
-__nf_ct_helper_find(const struct nf_conntrack_tuple *tuple)
+struct nf_conntrack_helper *
+nf_ct_helper_find(const struct nf_conntrack_tuple *tuple)
{
return LIST_FIND(&helpers, helper_cmp,
struct nf_conntrack_helper *,
tuple);
}
-struct nf_conntrack_helper *
-nf_ct_helper_find_get( const struct nf_conntrack_tuple *tuple)
+static u_int32_t __get_features(const struct nf_conntrack_tuple *orig,
+ const struct nf_conntrack_tuple *repl,
+ const struct nf_conntrack_l3proto *l3proto)
{
+ u_int32_t features = l3proto->get_features(orig);
struct nf_conntrack_helper *helper;
- /* need nf_conntrack_lock to assure that helper exists until
- * try_module_get() is called */
- read_lock_bh(&nf_conntrack_lock);
-
- helper = __nf_ct_helper_find(tuple);
- if (helper) {
- /* need to increase module usage count to assure helper will
- * not go away while the caller is e.g. busy putting a
- * conntrack in the hash that uses the helper */
- if (!try_module_get(helper->me))
- helper = NULL;
- }
+ helper = nf_ct_helper_find(repl);
+ if (helper)
+ features |= NF_CT_F_HELP;
- read_unlock_bh(&nf_conntrack_lock);
+ DEBUGP("nf_conntrack_alloc: features=0x%x\n", features);
- return helper;
+ return features;
}
-void nf_ct_helper_put(struct nf_conntrack_helper *helper)
+static u_int32_t get_features(const struct nf_conntrack_tuple *orig,
+ const struct nf_conntrack_tuple *repl,
+ const struct nf_conntrack_l3proto *l3proto)
{
- module_put(helper->me);
+ u_int32_t features;
+
+ /* Protect access to helper list */
+ read_lock_bh(&nf_conntrack_lock);
+ features = __get_features(orig, repl, l3proto);
+ read_unlock_bh(&nf_conntrack_lock);
+
+ return features;
}
static struct nf_conn *
__nf_conntrack_alloc(const struct nf_conntrack_tuple *orig,
const struct nf_conntrack_tuple *repl,
- const struct nf_conntrack_l3proto *l3proto)
+ const struct nf_conntrack_l3proto *l3proto,
+ u_int32_t features)
{
struct nf_conn *conntrack = NULL;
- u_int32_t features = 0;
- struct nf_conntrack_helper *helper;
if (unlikely(!nf_conntrack_hash_rnd_initted)) {
get_random_bytes(&nf_conntrack_hash_rnd, 4);
@@ -880,18 +881,6 @@ __nf_conntrack_alloc(const struct nf_con
}
}
- /* find features needed by this conntrack. */
- features = l3proto->get_features(orig);
-
- /* FIXME: protect helper list per RCU */
- read_lock_bh(&nf_conntrack_lock);
- helper = __nf_ct_helper_find(repl);
- if (helper)
- features |= NF_CT_F_HELP;
- read_unlock_bh(&nf_conntrack_lock);
-
- DEBUGP("nf_conntrack_alloc: features=0x%x\n", features);
-
read_lock_bh(&nf_ct_cache_lock);
if (unlikely(!nf_ct_cache[features].use)) {
@@ -908,11 +897,6 @@ __nf_conntrack_alloc(const struct nf_con
memset(conntrack, 0, nf_ct_cache[features].size);
conntrack->features = features;
- if (helper) {
- struct nf_conn_help *help = nfct_help(conntrack);
- NF_CT_ASSERT(help);
- help->helper = helper;
- }
atomic_set(&conntrack->ct_general.use, 1);
conntrack->ct_general.destroy = destroy_conntrack;
@@ -933,9 +917,11 @@ struct nf_conn *nf_conntrack_alloc(const
const struct nf_conntrack_tuple *repl)
{
struct nf_conntrack_l3proto *l3proto;
+ u_int32_t features;
l3proto = __nf_ct_l3proto_find(orig->src.l3num);
- return __nf_conntrack_alloc(orig, repl, l3proto);
+ features = __get_features(orig, repl, l3proto);
+ return __nf_conntrack_alloc(orig, repl, l3proto, features);
}
void nf_conntrack_free(struct nf_conn *conntrack)
@@ -960,13 +946,17 @@ init_conntrack(const struct nf_conntrack
struct nf_conn *conntrack;
struct nf_conntrack_tuple repl_tuple;
struct nf_conntrack_expect *exp;
+ u_int32_t features;
if (!nf_ct_invert_tuple(&repl_tuple, tuple, l3proto, protocol)) {
DEBUGP("Can't invert tuple.\n");
return NULL;
}
- conntrack = __nf_conntrack_alloc(tuple, &repl_tuple, l3proto);
+ /* find features needed by this conntrack */
+ features = get_features(tuple, &repl_tuple, l3proto);
+
+ conntrack = __nf_conntrack_alloc(tuple, &repl_tuple, l3proto, features);
if (conntrack == NULL || IS_ERR(conntrack)) {
DEBUGP("Can't allocate conntrack.\n");
return (struct nf_conntrack_tuple_hash *)conntrack;
@@ -995,8 +985,12 @@ init_conntrack(const struct nf_conntrack
#endif
nf_conntrack_get(&conntrack->master->ct_general);
NF_CT_STAT_INC(expect_new);
- } else
+ } else {
+ struct nf_conn_help *help = nfct_help(conntrack);
+ if (help)
+ help->helper = nf_ct_helper_find(&repl_tuple);
NF_CT_STAT_INC(new);
+ }
/* Overload tuple linked list to put us in unconfirmed list. */
list_add(&conntrack->tuplehash[IP_CT_DIR_ORIGINAL].list, &unconfirmed);
Index: net-2.6/net/netfilter/nf_conntrack_standalone.c
===================================================================
--- net-2.6.orig/net/netfilter/nf_conntrack_standalone.c 2006-07-23 15:23:39.000000000 +0200
+++ net-2.6/net/netfilter/nf_conntrack_standalone.c 2006-07-23 15:23:42.000000000 +0200
@@ -889,8 +889,7 @@ EXPORT_SYMBOL(nf_conntrack_alloc);
EXPORT_SYMBOL(nf_conntrack_free);
EXPORT_SYMBOL(nf_conntrack_flush);
EXPORT_SYMBOL(nf_ct_remove_expectations);
-EXPORT_SYMBOL(nf_ct_helper_find_get);
-EXPORT_SYMBOL(nf_ct_helper_put);
+EXPORT_SYMBOL(nf_ct_helper_find);
EXPORT_SYMBOL(__nf_conntrack_helper_find_byname);
EXPORT_SYMBOL(__nf_conntrack_find);
EXPORT_SYMBOL(nf_ct_unlink_expect);
* Re: Netfilter Connection Tracking Race Condition in Kernel 2.4.x
2006-07-25 1:07 ` Patrick McHardy
@ 2006-07-26 0:54 ` Phil Oester
2006-07-26 3:56 ` Patrick McHardy
2006-07-28 13:16 ` [PATCH 4/8][CTNETLINK] Fix race condition on conntrack creation Yasuyuki KOZAKAI
1 sibling, 1 reply; 10+ messages in thread
From: Phil Oester @ 2006-07-26 0:54 UTC (permalink / raw)
To: Patrick McHardy; +Cc: Bob Halley, netfilter-devel
On Tue, Jul 25, 2006 at 03:07:24AM +0200, Patrick McHardy wrote:
> - change conntrack to always put connections in the hash immediately
> and remove them again if the connection is dropped before being
> confirmed.
This could in theory be implemented via an IPS_UNCONFIRMED_BIT (ignoring
the sure to be complicated implementation details). But would there be
any concern about a DOS against the hash if unconfirmed connections
were allowed to enter?
Phil
* Re: Netfilter Connection Tracking Race Condition in Kernel 2.4.x
2006-07-26 0:54 ` Phil Oester
@ 2006-07-26 3:56 ` Patrick McHardy
2006-07-26 4:49 ` Yasuyuki KOZAKAI
0 siblings, 1 reply; 10+ messages in thread
From: Patrick McHardy @ 2006-07-26 3:56 UTC (permalink / raw)
To: Phil Oester; +Cc: Bob Halley, netfilter-devel
Phil Oester wrote:
> On Tue, Jul 25, 2006 at 03:07:24AM +0200, Patrick McHardy wrote:
>
>>- change conntrack to always put connections in the hash immediately
>> and remove them again if the connection is dropped before being
>> confirmed.
>
>
> This could in theory be implemented via an IPS_UNCONFIRMED_BIT (ignoring
> the sure to be complicated implementation details). But would there be
> any concern about a DOS against the hash if unconfirmed connections
> were allowed to enter?
There isn't really a difference from keeping them in the unconfirmed
list, besides better scalability. The same properties for unconfirmed
entries hold here: usually there should be very few (max 2 per CPU
without preemption), except if queueing is involved. I don't think
there is an increased risk of DOS by using the conntrack hash vs.
using a separate hash, but with the conntrack hash we can do it all
in one lookup and use the existing eviction mechanism.
* Re: Netfilter Connection Tracking Race Condition in Kernel 2.4.x
2006-07-26 3:56 ` Patrick McHardy
@ 2006-07-26 4:49 ` Yasuyuki KOZAKAI
0 siblings, 0 replies; 10+ messages in thread
From: Yasuyuki KOZAKAI @ 2006-07-26 4:49 UTC (permalink / raw)
To: kaber; +Cc: kernel, Bob.Halley, netfilter-devel
From: Patrick McHardy <kaber@trash.net>
Date: Wed, 26 Jul 2006 05:56:04 +0200
> Phil Oester wrote:
> > On Tue, Jul 25, 2006 at 03:07:24AM +0200, Patrick McHardy wrote:
> >
> >>- change conntrack to always put connections in the hash immediately
> >> and remove them again if the connection is dropped before being
> >> confirmed.
> >
> >
> > This could in theory be implemented via an IPS_UNCONFIRMED_BIT (ignoring
> > the sure to be complicated implementation details). But would there be
> > any concern about a DOS against the hash if unconfirmed connections
> > were allowed to enter?
>
>
> There isn't really a difference to keeping them in the unconfirmed
> list besides better scalability. The same properties for unconfirmed
> entries hold here, usually there should be very few (max 2 per CPU
> without preemption), except if queueing is involved. I don't think
> there is an increased risk of DOS by using the conntrack hash vs.
> using a separate hash, but with the conntrack hash we can do it all
> in one lookup and use the existing eviction mechanism.
I agree.
BTW, that would also solve the race condition between multiple CPUs at an early point.
-- Yasuyuki Kozakai
* Re: [PATCH 4/8][CTNETLINK] Fix race condition on conntrack creation
2006-07-25 1:07 ` Patrick McHardy
2006-07-26 0:54 ` Phil Oester
@ 2006-07-28 13:16 ` Yasuyuki KOZAKAI
2006-07-31 11:15 ` Pablo Neira Ayuso
1 sibling, 1 reply; 10+ messages in thread
From: Yasuyuki KOZAKAI @ 2006-07-28 13:16 UTC (permalink / raw)
To: pablo; +Cc: laforge, netfilter-devel, kaber
Hi, Pablo,
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Tue, 25 Jul 2006 15:18:38 +0200
> - rework get_features facility to avoid a softlockup
This looks like a nice cleanup, but __nf_conntrack_alloc() cannot
be called while holding nf_conntrack_lock, because it may call
early_drop(), which takes nf_conntrack_lock and may also call
nf_ct_put().
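
The lock-ordering problem can be sketched in userspace with a toy non-recursive lock. This is an illustration under assumptions, not the kernel code: `table_trylock()` returns false where a real lock would spin forever, and `evict_one()`, `alloc_entry()` and the two `create_*()` functions are hypothetical stand-ins for an eviction helper that takes the table lock, an allocator that may call it, and two possible call orders.

```c
#include <stdbool.h>

/* Toy non-recursive lock: a second acquisition fails instead of spinning. */
static bool table_locked;

static bool table_trylock(void)
{
	if (table_locked)
		return false;
	table_locked = true;
	return true;
}

static void table_unlock(void)
{
	table_locked = false;
}

/* Stand-in for an eviction helper that takes the table lock itself. */
static bool evict_one(void)
{
	if (!table_trylock())
		return false;	/* a real lock would deadlock here */
	/* ... an old entry would be unlinked here ... */
	table_unlock();
	return true;
}

/* Stand-in for an allocator that may have to evict when the table is full. */
static bool alloc_entry(void)
{
	return evict_one();
}

/* Problematic order: allocate while already holding the lock. */
static bool create_while_locked(void)
{
	bool ok;

	if (!table_trylock())
		return false;
	ok = alloc_entry();
	table_unlock();
	return ok;
}

/* Safer order: allocate first, take the lock only to publish the entry. */
static bool create_then_lock(void)
{
	if (!alloc_entry())
		return false;
	if (!table_trylock())
		return false;
	/* ... the new entry would be inserted here ... */
	table_unlock();
	return true;
}
```

In this model `create_while_locked()` fails because the eviction path cannot take the lock its caller already holds, while `create_then_lock()` succeeds.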
> static struct nf_conn *
> __nf_conntrack_alloc(const struct nf_conntrack_tuple *orig,
> const struct nf_conntrack_tuple *repl,
> - const struct nf_conntrack_l3proto *l3proto)
> + const struct nf_conntrack_l3proto *l3proto,
> + u_int32_t features)
> {
You've moved "features = l3proto->get_features(orig);" out of
this function, so the argument 'l3proto' is no longer necessary.
BTW, I think this race is similar to the situation in init_conntrack().
It doesn't care about the race, and __nf_conntrack_confirm() handles it
instead.
One more concern of mine is Patrick's recent proposal.
https://lists.netfilter.org/pipermail/netfilter-devel/2006-July/025107.html
> - change conntrack to always put connections in the hash immediately
> and remove them again if the connection is dropped before being
> confirmed.
If we do this, we need to solve the same issue in init_conntrack().
It might be time to consider reorganizing the handling of the hash and the lock.
-- Yasuyuki Kozakai
* Re: [PATCH 4/8][CTNETLINK] Fix race condition on conntrack creation
2006-07-28 13:16 ` [PATCH 4/8][CTNETLINK] Fix race condition on conntrack creation Yasuyuki KOZAKAI
@ 2006-07-31 11:15 ` Pablo Neira Ayuso
2006-08-04 14:43 ` Amin Azez
0 siblings, 1 reply; 10+ messages in thread
From: Pablo Neira Ayuso @ 2006-07-31 11:15 UTC (permalink / raw)
To: Yasuyuki KOZAKAI; +Cc: laforge, netfilter-devel, kaber
Hi Yasuyuki,
Yasuyuki KOZAKAI wrote:
> From: Pablo Neira Ayuso <pablo@netfilter.org>
> Date: Tue, 25 Jul 2006 15:18:38 +0200
>
>
>> - rework get_features facility to avoid a softlockup
>
>
> This looks nice cleanup, but __nf_conntrack_alloc() cannot
> be called while holding nf_conntrack_lock, because it may call
> early_drop(), which holds nf_conntrack_lock and also may call
> nf_ct_put().
>
>
>> static struct nf_conn *
>> __nf_conntrack_alloc(const struct nf_conntrack_tuple *orig,
>> const struct nf_conntrack_tuple *repl,
>>- const struct nf_conntrack_l3proto *l3proto)
>>+ const struct nf_conntrack_l3proto *l3proto,
>>+ u_int32_t features)
>> {
>
>
> You've moved "features = l3proto->get_features(orig);" out of
> this function, then the argument 'l3proto' isn't necessary.
Indeed, I also detected another problem related to the NAT code in
ip_conntrack_netlink, so this patch needs to be dropped.
I'm questioning the usefulness of this patch since nfnetlink
serializes the creation of two new conntracks.
> BTW, I think this race is similar situation in init_conntrack().
> It doesn't care about race and __nf_conntrack_confirm() does it
> instead.
>
> One more my concern is recent Patrick's proposal.
>
> https://lists.netfilter.org/pipermail/netfilter-devel/2006-July/025107.html
>
>
>>- change conntrack to always put connections in the hash immediately
>> and remove them again if the connection is dropped before being
>> confirmed.
>
> If we do this, we need to solve the same issue in init_conntrack().
> It might be time to consider to organize processing of hash and lock.
As for the implementation, we can play around with refcounting: set the
refcount to 1 if the conntrack is unconfirmed and increase it to 2 once
it gets confirmed.
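
One possible reading of that refcounting idea, as a userspace sketch only: the packet holds one reference from creation, and the hash table takes a second one at confirmation. The structure and the `conn_*()` helpers are hypothetical, and the `freed` flag merely records where a real implementation would release the object.

```c
#include <stdbool.h>

/* Hypothetical conntrack with a plain (non-atomic) reference count. */
struct conn {
	int refcnt;
	bool in_hash;
	bool freed;
};

/* A new, unconfirmed conntrack is owned by its packet only. */
static void conn_init(struct conn *c)
{
	c->refcnt = 1;
	c->in_hash = false;
	c->freed = false;
}

/* Confirmation adds the hash table's reference: 1 -> 2. */
static void conn_confirm(struct conn *c)
{
	c->refcnt++;
	c->in_hash = true;
}

/* Dropping the last reference releases the conntrack. */
static void conn_put(struct conn *c)
{
	if (--c->refcnt == 0) {
		c->in_hash = false;
		c->freed = true;
	}
}
```

Under this reading, a conntrack whose packet is dropped before confirmation is released by the packet's own `conn_put()`, while a confirmed one survives it because the hash still holds a reference.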
--
The dawn of the fourth age of Linux firewalling is coming; a time of
great struggle and heroic deeds -- J.Kadlecsik got inspired by J.Morris
* Re: [PATCH 4/8][CTNETLINK] Fix race condition on conntrack creation
2006-07-31 11:15 ` Pablo Neira Ayuso
@ 2006-08-04 14:43 ` Amin Azez
2006-08-08 10:19 ` Patrick McHardy
0 siblings, 1 reply; 10+ messages in thread
From: Amin Azez @ 2006-08-04 14:43 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: laforge, netfilter-devel, kaber
* Pablo Neira Ayuso wrote, On 31/07/06 12:15:
> Hi Yasuyuki,
>
> Yasuyuki KOZAKAI wrote:
>> From: Pablo Neira Ayuso <pablo@netfilter.org>
>> Date: Tue, 25 Jul 2006 15:18:38 +0200
>>
>>
>>> - rework get_features facility to avoid a softlockup
>>
>>
>> This looks nice cleanup, but __nf_conntrack_alloc() cannot
>> be called while holding nf_conntrack_lock, because it may call
>> early_drop(), which holds nf_conntrack_lock and also may call
>> nf_ct_put().
>>
>>
>>> static struct nf_conn *
>>> __nf_conntrack_alloc(const struct nf_conntrack_tuple *orig,
>>> const struct nf_conntrack_tuple *repl,
>>> - const struct nf_conntrack_l3proto *l3proto)
>>> + const struct nf_conntrack_l3proto *l3proto,
>>> + u_int32_t features)
>>> {
>>
>>
>> You've moved "features = l3proto->get_features(orig);" out of
>> this function, then the argument 'l3proto' isn't necessary.
>
> Indeed, I also detected another problem related with the NAT code in
> ip_conntrack_netlink, so this patch needs to be dropped.
>
> I'm questioning the usefulness of this patch since nfnetlink
> serializes the creation of two new conntracks.
I'm finding it hard to drop this patch from the series; I'm having
trouble applying patch 7 without it.
I find it difficult to be comfortable with dropping some features of
this patch. In this fragment, the second chunk does what looks like an
important re-ordering of locking and conntrack creation; i.e. the lock
is now retained until after conntrack creation.
So I don't think it is safe to entirely drop this patch; Pablo?
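
To spell out what I mean by the re-ordering, here is a small userspace sketch (hypothetical names, a boolean standing in for the write lock): the lookup and the insert happen under one lock hold, and the insert helper refuses to run without the lock, loosely in the spirit of the ASSERT_WRITE_LOCK added by the patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ENTRIES 8

static bool write_locked;		/* toy stand-in for the write lock */
static unsigned entries[MAX_ENTRIES];
static size_t n_entries;

static void write_lock(void)
{
	write_locked = true;
}

static void write_unlock(void)
{
	write_locked = false;
}

static bool find_locked(unsigned id)
{
	for (size_t i = 0; i < n_entries; i++)
		if (entries[i] == id)
			return true;
	return false;
}

/* Returns -1 if the caller does not hold the lock or the table is full. */
static int insert_locked(unsigned id)
{
	if (!write_locked || n_entries == MAX_ENTRIES)
		return -1;
	entries[n_entries++] = id;
	return 0;
}

/* Returns 0 if created, 1 if it already existed, -1 on error. */
static int create_if_absent(unsigned id)
{
	int ret = 1;

	write_lock();
	if (!find_locked(id))
		ret = insert_locked(id);	/* still under the same lock hold */
	write_unlock();
	return ret;
}
```

Because the lock is held across both steps, a second `create_if_absent()` for the same id cannot slip in between the lookup and the insert.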
Index: linux-2.6.17.1/net/ipv4/netfilter/ip_conntrack_netlink.c
===================================================================
--- linux-2.6.17.1.orig/net/ipv4/netfilter/ip_conntrack_netlink.c
+++ linux-2.6.17.1/net/ipv4/netfilter/ip_conntrack_netlink.c
@@ -1212,13 +1212,9 @@ ctnetlink_create_conntrack(struct nfattr
ct->mark = ntohl(*(u_int32_t *)NFA_DATA(cda[CTA_MARK-1]));
#endif
- ct->helper = ip_conntrack_helper_find_get(rtuple);
-
- add_timer(&ct->timeout);
+ ct->helper = ip_conntrack_helper_find(rtuple);
ip_conntrack_hash_insert(ct);
-
- if (ct->helper)
- ip_conntrack_helper_put(ct->helper);
+ add_timer(&ct->timeout);
DEBUGP("conntrack with id %u inserted\n", ct->id);
return 0;
@@ -1260,11 +1256,11 @@ ctnetlink_new_conntrack(struct sock *ctn
h = __ip_conntrack_find(&rtuple, NULL);
if (h == NULL) {
- write_unlock_bh(&ip_conntrack_lock);
DEBUGP("no such conntrack, create new\n");
err = -ENOENT;
if (nlh->nlmsg_flags & NLM_F_CREATE)
err = ctnetlink_create_conntrack(cda, &otuple,
&rtuple);
+ write_unlock_bh(&ip_conntrack_lock);
return err;
}
/* implicit 'else' */
* Re: [PATCH 4/8][CTNETLINK] Fix race condition on conntrack creation
2006-08-04 14:43 ` Amin Azez
@ 2006-08-08 10:19 ` Patrick McHardy
0 siblings, 0 replies; 10+ messages in thread
From: Patrick McHardy @ 2006-08-08 10:19 UTC (permalink / raw)
To: Amin Azez; +Cc: laforge, netfilter-devel, Pablo Neira Ayuso
Amin Azez wrote:
> * Pablo Neira Ayuso wrote, On 31/07/06 12:15:
>
> I find it difficult to be comfortable with dropping some features of
> this patch. In this fragment, the second chunk does what looks like an
> important re-ordering of locking and conntrack creation; i.e. of course
> the lock is retained till after conntrack creation.
>
> So I don't think it is safe to entirely drop this patch; Pablo?
I'll drop it and see which of the other ones still make sense.