* [PATCH] netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt
@ 2014-02-03 22:09 Pablo Neira Ayuso
2014-02-03 23:29 ` Florian Westphal
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Pablo Neira Ayuso @ 2014-02-03 22:09 UTC (permalink / raw)
To: netfilter-devel; +Cc: avagin, fw, eric.dumazet
With this patch, the conntrack refcount is initially set to zero and
it is bumped once it is added to any of the list, so we fulfill
Eric's golden rule which is that all released objects always have a
refcount that equals zero.
Andrey Vagin reports that nf_conntrack_free can't be called for a
conntrack with non-zero ref-counter, because it can race with
nf_conntrack_find_get().
A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero
ref-counter says that this conntrack is used. So when we release
a conntrack with non-zero counter, we break this assumption.
CPU1 CPU2
____nf_conntrack_find()
nf_ct_put()
destroy_conntrack()
...
init_conntrack
__nf_conntrack_alloc (set use = 1)
atomic_inc_not_zero(&ct->use) (use = 2)
if (!l4proto->new(ct, skb, dataoff, timeouts))
nf_conntrack_free(ct); (use = 2 !!!)
...
__nf_conntrack_alloc (set use = 1)
if (!nf_ct_key_equal(h, tuple, zone))
nf_ct_put(ct); (use = 0)
destroy_conntrack()
/* continue to work with CT */
After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU
race in nf_conntrack_find_get" another bug was triggered in
destroy_conntrack():
<4>[67096.759334] ------------[ cut here ]------------
<2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
...
<4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G C --------------- 2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
<4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>] [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
<4>[67096.760255] Call Trace:
<4>[67096.760255] [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30
<4>[67096.760255] [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
<4>[67096.760255] [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
<4>[67096.760255] [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
<4>[67096.760255] [<ffffffff81484419>] nf_iterate+0x69/0xb0
<4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255] [<ffffffff814845d4>] nf_hook_slow+0x74/0x110
<4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20
<4>[67096.760255] [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910
<4>[67096.760255] [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0
<4>[67096.760255] [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140
<4>[67096.760255] [<ffffffff81444f97>] sock_sendmsg+0x117/0x140
<4>[67096.760255] [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60
<4>[67096.760255] [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20
<4>[67096.760255] [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40
<4>[67096.760255] [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
<4>[67096.760255] [<ffffffff814457c9>] sys_sendto+0x139/0x190
<4>[67096.760255] [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200
<4>[67096.760255] [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290
<4>[67096.760255] [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210
<4>[67096.760255] [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Andrew Vagin <avagin@parallels.com>
Reported-by: Andrew Vagin <avagin@parallels.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
I have reused the original title for the RFC patch that Andrey posted and
most of the original patch description.
Thanks to Eric Dumazet for indicating changes regarding memory barriers.
include/net/netfilter/nf_conntrack.h | 2 ++
net/netfilter/nf_conntrack_core.c | 34 +++++++++++++++++++++++++++++-----
net/netfilter/nf_synproxy_core.c | 5 ++---
net/netfilter/xt_CT.c | 7 +------
4 files changed, 34 insertions(+), 14 deletions(-)
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 01ea6ee..b2ac624 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -284,6 +284,8 @@ extern unsigned int nf_conntrack_max;
extern unsigned int nf_conntrack_hash_rnd;
void init_nf_conntrack_hash_rnd(void);
+void nf_conntrack_tmpl_insert(struct net *net, struct nf_conn *tmpl);
+
#define NF_CT_STAT_INC(net, count) __this_cpu_inc((net)->ct.stat->count)
#define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 4d1fb5d..356bef5 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -448,7 +448,9 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
goto out;
add_timer(&ct->timeout);
- nf_conntrack_get(&ct->ct_general);
+ smp_wmb();
+ /* The caller holds a reference to this object */
+ atomic_set(&ct->ct_general.use, 2);
__nf_conntrack_hash_insert(ct, hash, repl_hash);
NF_CT_STAT_INC(net, insert);
spin_unlock_bh(&nf_conntrack_lock);
@@ -462,6 +464,21 @@ out:
}
EXPORT_SYMBOL_GPL(nf_conntrack_hash_check_insert);
+/* deletion from this larval template list happens via nf_ct_put() */
+void nf_conntrack_tmpl_insert(struct net *net, struct nf_conn *tmpl)
+{
+ __set_bit(IPS_TEMPLATE_BIT, &tmpl->status);
+ __set_bit(IPS_CONFIRMED_BIT, &tmpl->status);
+ nf_conntrack_get(&tmpl->ct_general);
+
+ spin_lock_bh(&nf_conntrack_lock);
+ /* Overload tuple linked list to put us in template list. */
+ hlist_nulls_add_head_rcu(&tmpl->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
+ &net->ct.tmpl);
+ spin_unlock_bh(&nf_conntrack_lock);
+}
+EXPORT_SYMBOL_GPL(nf_conntrack_tmpl_insert);
+
/* Confirm a connection given skb; places it in hash table */
int
__nf_conntrack_confirm(struct sk_buff *skb)
@@ -733,11 +750,10 @@ __nf_conntrack_alloc(struct net *net, u16 zone,
nf_ct_zone->id = zone;
}
#endif
- /*
- * changes to lookup keys must be done before setting refcnt to 1
+ /* Because we use RCU lookups, we set ct_general.use to zero before
+ * this is inserted in any list.
*/
- smp_wmb();
- atomic_set(&ct->ct_general.use, 1);
+ atomic_set(&ct->ct_general.use, 0);
return ct;
#ifdef CONFIG_NF_CONNTRACK_ZONES
@@ -761,6 +777,11 @@ void nf_conntrack_free(struct nf_conn *ct)
{
struct net *net = nf_ct_net(ct);
+ /* A freed object has refcnt == 0, that's
+ * the golden rule for SLAB_DESTROY_BY_RCU
+ */
+ NF_CT_ASSERT(atomic_read(&ct->ct_general.use) == 0);
+
nf_ct_ext_destroy(ct);
nf_ct_ext_free(ct);
kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
@@ -856,6 +877,9 @@ init_conntrack(struct net *net, struct nf_conn *tmpl,
NF_CT_STAT_INC(net, new);
}
+ /* Now it is inserted into the unconfirmed list, bump refcount */
+ nf_conntrack_get(&ct->ct_general);
+
/* Overload tuple linked list to put us in unconfirmed list. */
hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
&net->ct.unconfirmed);
diff --git a/net/netfilter/nf_synproxy_core.c b/net/netfilter/nf_synproxy_core.c
index 9858e3e..52e20c9 100644
--- a/net/netfilter/nf_synproxy_core.c
+++ b/net/netfilter/nf_synproxy_core.c
@@ -363,9 +363,8 @@ static int __net_init synproxy_net_init(struct net *net)
goto err2;
if (!nfct_synproxy_ext_add(ct))
goto err2;
- __set_bit(IPS_TEMPLATE_BIT, &ct->status);
- __set_bit(IPS_CONFIRMED_BIT, &ct->status);
+ nf_conntrack_tmpl_insert(net, ct);
snet->tmpl = ct;
snet->stats = alloc_percpu(struct synproxy_stats);
@@ -390,7 +389,7 @@ static void __net_exit synproxy_net_exit(struct net *net)
{
struct synproxy_net *snet = synproxy_pernet(net);
- nf_conntrack_free(snet->tmpl);
+ nf_ct_put(snet->tmpl);
synproxy_proc_exit(net);
free_percpu(snet->stats);
}
diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c
index 5929be6..75747ae 100644
--- a/net/netfilter/xt_CT.c
+++ b/net/netfilter/xt_CT.c
@@ -228,12 +228,7 @@ static int xt_ct_tg_check(const struct xt_tgchk_param *par,
goto err3;
}
- __set_bit(IPS_TEMPLATE_BIT, &ct->status);
- __set_bit(IPS_CONFIRMED_BIT, &ct->status);
-
- /* Overload tuple linked list to put us in template list. */
- hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
- &par->net->ct.tmpl);
+ nf_conntrack_tmpl_insert(par->net, ct);
out:
info->ct = ct;
return 0;
--
1.7.10.4
^ permalink raw reply related [flat|nested] 9+ messages in thread* Re: [PATCH] netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt
2014-02-03 22:09 [PATCH] netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt Pablo Neira Ayuso
@ 2014-02-03 23:29 ` Florian Westphal
2014-02-04 10:21 ` Pablo Neira Ayuso
2014-02-04 0:39 ` Eric Dumazet
2014-02-04 13:53 ` Andrew Vagin
2 siblings, 1 reply; 9+ messages in thread
From: Florian Westphal @ 2014-02-03 23:29 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, avagin, fw, eric.dumazet
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 4d1fb5d..356bef5 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -448,7 +448,9 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
> goto out;
>
> add_timer(&ct->timeout);
> - nf_conntrack_get(&ct->ct_general);
> + smp_wmb();
> + /* The caller holds a reference to this object */
> + atomic_set(&ct->ct_general.use, 2);
What happens when the skb is dropped before confirmation?
How is skb_clone etc. solved (it increments refcnt of underlying
conntrack object?)
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt
2014-02-03 23:29 ` Florian Westphal
@ 2014-02-04 10:21 ` Pablo Neira Ayuso
2014-02-04 12:46 ` Florian Westphal
0 siblings, 1 reply; 9+ messages in thread
From: Pablo Neira Ayuso @ 2014-02-04 10:21 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel, avagin, eric.dumazet
On Tue, Feb 04, 2014 at 12:29:25AM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > index 4d1fb5d..356bef5 100644
> > --- a/net/netfilter/nf_conntrack_core.c
> > +++ b/net/netfilter/nf_conntrack_core.c
> > @@ -448,7 +448,9 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
> > goto out;
> >
> > add_timer(&ct->timeout);
> > - nf_conntrack_get(&ct->ct_general);
> > + smp_wmb();
> > + /* The caller holds a reference to this object */
> > + atomic_set(&ct->ct_general.use, 2);
>
> What happens when the skb is dropped before confirmation?
If you refer to the spot above, conntracks added via ctnetlink are
always confirmed and the refcount will become 1 after insertion into
the hashes. In the packet path, the refcount is 1 for unconfirmed
conntracks, so if you release the skbuff, nf_conntrack_put() is called
and the conntrack is also released (fulfilling that refcount equals 0).
> How is skb_clone etc. solved (it increments refcnt of underlying
> conntrack object?)
I think this works as before, this patch is just delaying the initial
refcount bumping to 1 to make it once the conntrack is inserted in any
of the lists.
Please, let me know if you still see any problematic case. Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt
2014-02-04 10:21 ` Pablo Neira Ayuso
@ 2014-02-04 12:46 ` Florian Westphal
2014-02-04 13:16 ` Pablo Neira Ayuso
0 siblings, 1 reply; 9+ messages in thread
From: Florian Westphal @ 2014-02-04 12:46 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, avagin, eric.dumazet
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Tue, Feb 04, 2014 at 12:29:25AM +0100, Florian Westphal wrote:
> > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > > index 4d1fb5d..356bef5 100644
> > > --- a/net/netfilter/nf_conntrack_core.c
> > > +++ b/net/netfilter/nf_conntrack_core.c
> > > @@ -448,7 +448,9 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
> > > goto out;
> > >
> > > add_timer(&ct->timeout);
> > > - nf_conntrack_get(&ct->ct_general);
> > > + smp_wmb();
> > > + /* The caller holds a reference to this object */
> > > + atomic_set(&ct->ct_general.use, 2);
> >
> > What happens when the skb is dropped before confirmation?
>
> If you refer to the spot above, conntracks added via ctnetlink are
> always confirmed and the refcount will become 1 after insertion into
> the hashes. In the packet path, the refcount is 1 for unconfirmed
> conntracks,
Right, sorry. I missed this. You only want to delay the init-to-1
until the conntrack is inserted into the unconfirmed list.
However, I think there is still a problem with the atomic_set to 2
above.
What about e.g.
Prerouting -> set refcnt to 1
(clone)
(clone)
(refcnt is 3)
forward/postrouting -> confirmation (refcnt reset to two)
I think this could happen at least with netfilter on-top-of bridge
when dealing with e.g. udp mcast. (i am looking at br_forward and
deliver_clone in particular).
I think you could just leave hash_check_insert() alone?
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt
2014-02-04 12:46 ` Florian Westphal
@ 2014-02-04 13:16 ` Pablo Neira Ayuso
2014-02-04 13:27 ` Florian Westphal
0 siblings, 1 reply; 9+ messages in thread
From: Pablo Neira Ayuso @ 2014-02-04 13:16 UTC (permalink / raw)
To: Florian Westphal; +Cc: netfilter-devel, avagin, eric.dumazet
On Tue, Feb 04, 2014 at 01:46:37PM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Tue, Feb 04, 2014 at 12:29:25AM +0100, Florian Westphal wrote:
> > > Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > > > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > > > index 4d1fb5d..356bef5 100644
> > > > --- a/net/netfilter/nf_conntrack_core.c
> > > > +++ b/net/netfilter/nf_conntrack_core.c
> > > > @@ -448,7 +448,9 @@ nf_conntrack_hash_check_insert(struct nf_conn *ct)
> > > > goto out;
> > > >
> > > > add_timer(&ct->timeout);
> > > > - nf_conntrack_get(&ct->ct_general);
> > > > + smp_wmb();
> > > > + /* The caller holds a reference to this object */
> > > > + atomic_set(&ct->ct_general.use, 2);
> > >
> > > What happens when the skb is dropped before confirmation?
> >
> > If you refer to the spot above, conntracks added via ctnetlink are
> > always confirmed and the refcount will become 1 after insertion into
> > the hashes. In the packet path, the refcount is 1 for unconfirmed
> > conntracks,
>
> Right, sorry. I missed this. You only want to delay the init-to-1
> until the conntrack is inserted into the unconfirmed list.
>
> However, I think there is still a problem with the atomic_set to 2
> above.
>
> What about e.g.
> Prerouting -> set refcnt to 1
> (clone)
> (clone)
> (refcnt is 3)
> forward/postrouting -> confirmation (refcnt reset to two)
>
> I think this could happen at least with netfilter on-top-of bridge
> when dealing with e.g. udp mcast. (i am looking at br_forward and
> deliver_clone in particular).
>
> I think you could just leave hash_check_insert() alone?
hash_check_insert() is only used by ctnetlink in the creation path for
new conntracks.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt
2014-02-04 13:16 ` Pablo Neira Ayuso
@ 2014-02-04 13:27 ` Florian Westphal
0 siblings, 0 replies; 9+ messages in thread
From: Florian Westphal @ 2014-02-04 13:27 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, avagin, eric.dumazet
Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > I think you could just leave hash_check_insert() alone?
>
> hash_check_insert() is only used by ctnetlink in the creation path for
> new conntracks.
I see. In that case I see nothing wrong with your patch.
Thanks for explaining!
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt
2014-02-03 22:09 [PATCH] netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt Pablo Neira Ayuso
2014-02-03 23:29 ` Florian Westphal
@ 2014-02-04 0:39 ` Eric Dumazet
2014-02-05 23:00 ` Pablo Neira Ayuso
2014-02-04 13:53 ` Andrew Vagin
2 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2014-02-04 0:39 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, avagin, fw
On Mon, 2014-02-03 at 23:09 +0100, Pablo Neira Ayuso wrote:
> With this patch, the conntrack refcount is initially set to zero and
> it is bumped once it is added to any of the list, so we fulfill
> Eric's golden rule which is that all released objects always have a
> refcount that equals zero.
>
> Andrey Vagin reports that nf_conntrack_free can't be called for a
> conntrack with non-zero ref-counter, because it can race with
> nf_conntrack_find_get().
>
> A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero
> ref-counter says that this conntrack is used. So when we release
> a conntrack with non-zero counter, we break this assumption.
...
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Florian Westphal <fw@strlen.de>
> Cc: Andrew Vagin <avagin@parallels.com>
> Reported-by: Andrew Vagin <avagin@parallels.com>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> ---
SGTM !
Reviewed-by: Eric Dumazet <edumazet@google.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt
2014-02-04 0:39 ` Eric Dumazet
@ 2014-02-05 23:00 ` Pablo Neira Ayuso
0 siblings, 0 replies; 9+ messages in thread
From: Pablo Neira Ayuso @ 2014-02-05 23:00 UTC (permalink / raw)
To: Eric Dumazet; +Cc: netfilter-devel, avagin, fw
On Mon, Feb 03, 2014 at 04:39:22PM -0800, Eric Dumazet wrote:
> On Mon, 2014-02-03 at 23:09 +0100, Pablo Neira Ayuso wrote:
> > With this patch, the conntrack refcount is initially set to zero and
> > it is bumped once it is added to any of the list, so we fulfill
> > Eric's golden rule which is that all released objects always have a
> > refcount that equals zero.
> >
> > Andrey Vagin reports that nf_conntrack_free can't be called for a
> > conntrack with non-zero ref-counter, because it can race with
> > nf_conntrack_find_get().
> >
> > A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero
> > ref-counter says that this conntrack is used. So when we release
> > a conntrack with non-zero counter, we break this assumption.
> ...
> > Cc: Eric Dumazet <eric.dumazet@gmail.com>
> > Cc: Florian Westphal <fw@strlen.de>
> > Cc: Andrew Vagin <avagin@parallels.com>
> > Reported-by: Andrew Vagin <avagin@parallels.com>
> > Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
> > ---
>
> SGTM !
>
> Reviewed-by: Eric Dumazet <edumazet@google.com>
Applied, thanks everyone!
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt
2014-02-03 22:09 [PATCH] netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt Pablo Neira Ayuso
2014-02-03 23:29 ` Florian Westphal
2014-02-04 0:39 ` Eric Dumazet
@ 2014-02-04 13:53 ` Andrew Vagin
2 siblings, 0 replies; 9+ messages in thread
From: Andrew Vagin @ 2014-02-04 13:53 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel, fw, eric.dumazet
On Mon, Feb 03, 2014 at 11:09:57PM +0100, Pablo Neira Ayuso wrote:
> With this patch, the conntrack refcount is initially set to zero and
> it is bumped once it is added to any of the list, so we fulfill
> Eric's golden rule which is that all released objects always have a
> refcount that equals zero.
>
> Andrey Vagin reports that nf_conntrack_free can't be called for a
> conntrack with non-zero ref-counter, because it can race with
> nf_conntrack_find_get().
>
> A conntrack slab is created with SLAB_DESTROY_BY_RCU. Non-zero
> ref-counter says that this conntrack is used. So when we release
> a conntrack with non-zero counter, we break this assumption.
>
> CPU1 CPU2
> ____nf_conntrack_find()
> nf_ct_put()
> destroy_conntrack()
> ...
> init_conntrack
> __nf_conntrack_alloc (set use = 1)
> atomic_inc_not_zero(&ct->use) (use = 2)
> if (!l4proto->new(ct, skb, dataoff, timeouts))
> nf_conntrack_free(ct); (use = 2 !!!)
> ...
> __nf_conntrack_alloc (set use = 1)
> if (!nf_ct_key_equal(h, tuple, zone))
> nf_ct_put(ct); (use = 0)
> destroy_conntrack()
> /* continue to work with CT */
>
> After applying the path "[PATCH] netfilter: nf_conntrack: fix RCU
> race in nf_conntrack_find_get" another bug was triggered in
> destroy_conntrack():
>
> <4>[67096.759334] ------------[ cut here ]------------
> <2>[67096.759353] kernel BUG at net/netfilter/nf_conntrack_core.c:211!
> ...
> <4>[67096.759837] Pid: 498649, comm: atdd veid: 666 Tainted: G C --------------- 2.6.32-042stab084.18 #1 042stab084_18 /DQ45CB
> <4>[67096.759932] RIP: 0010:[<ffffffffa03d99ac>] [<ffffffffa03d99ac>] destroy_conntrack+0x15c/0x190 [nf_conntrack]
> <4>[67096.760255] Call Trace:
> <4>[67096.760255] [<ffffffff814844a7>] nf_conntrack_destroy+0x17/0x30
> <4>[67096.760255] [<ffffffffa03d9bb5>] nf_conntrack_find_get+0x85/0x130 [nf_conntrack]
> <4>[67096.760255] [<ffffffffa03d9fb2>] nf_conntrack_in+0x352/0xb60 [nf_conntrack]
> <4>[67096.760255] [<ffffffffa048c771>] ipv4_conntrack_local+0x51/0x60 [nf_conntrack_ipv4]
> <4>[67096.760255] [<ffffffff81484419>] nf_iterate+0x69/0xb0
> <4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20
> <4>[67096.760255] [<ffffffff814845d4>] nf_hook_slow+0x74/0x110
> <4>[67096.760255] [<ffffffff814b5b00>] ? dst_output+0x0/0x20
> <4>[67096.760255] [<ffffffff814b66d5>] raw_sendmsg+0x775/0x910
> <4>[67096.760255] [<ffffffff8104c5a8>] ? flush_tlb_others_ipi+0x128/0x130
> <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
> <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
> <4>[67096.760255] [<ffffffff814c136a>] inet_sendmsg+0x4a/0xb0
> <4>[67096.760255] [<ffffffff81444e93>] ? sock_sendmsg+0x13/0x140
> <4>[67096.760255] [<ffffffff81444f97>] sock_sendmsg+0x117/0x140
> <4>[67096.760255] [<ffffffff8102e299>] ? native_smp_send_reschedule+0x49/0x60
> <4>[67096.760255] [<ffffffff81519beb>] ? _spin_unlock_bh+0x1b/0x20
> <4>[67096.760255] [<ffffffff8109d930>] ? autoremove_wake_function+0x0/0x40
> <4>[67096.760255] [<ffffffff814960f0>] ? do_ip_setsockopt+0x90/0xd80
> <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
> <4>[67096.760255] [<ffffffff8100bc4e>] ? apic_timer_interrupt+0xe/0x20
> <4>[67096.760255] [<ffffffff814457c9>] sys_sendto+0x139/0x190
> <4>[67096.760255] [<ffffffff810efa77>] ? audit_syscall_entry+0x1d7/0x200
> <4>[67096.760255] [<ffffffff810ef7c5>] ? __audit_syscall_exit+0x265/0x290
> <4>[67096.760255] [<ffffffff81474daf>] compat_sys_socketcall+0x13f/0x210
> <4>[67096.760255] [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Florian Westphal <fw@strlen.de>
> Cc: Andrew Vagin <avagin@parallels.com>
Acked-by: Andrew Vagin <avagin@parallels.com>
> Reported-by: Andrew Vagin <avagin@parallels.com>
> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2014-02-05 23:00 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-02-03 22:09 [PATCH] netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt Pablo Neira Ayuso
2014-02-03 23:29 ` Florian Westphal
2014-02-04 10:21 ` Pablo Neira Ayuso
2014-02-04 12:46 ` Florian Westphal
2014-02-04 13:16 ` Pablo Neira Ayuso
2014-02-04 13:27 ` Florian Westphal
2014-02-04 0:39 ` Eric Dumazet
2014-02-05 23:00 ` Pablo Neira Ayuso
2014-02-04 13:53 ` Andrew Vagin
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).