From: Andrew Vagin <avagin@gmail.com>
To: Eric Dumazet <eric.dumazet@gmail.com>, Florian Westphal <fw@strlen.de>
Cc: Andrey Vagin <avagin@openvz.org>,
	netfilter-devel@vger.kernel.org, netfilter@vger.kernel.org,
	coreteam@netfilter.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, vvs@openvz.org,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	Patrick McHardy <kaber@trash.net>,
	Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>,
	"David S. Miller" <davem@davemloft.net>,
	Cyrill Gorcunov <gorcunov@openvz.org>
Subject: Re: [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
Date: Thu, 9 Jan 2014 09:24:54 +0400
Message-ID: <20140109052454.GA16743@gmail.com>
In-Reply-To: <1389107305.26646.20.camel@edumazet-glaptop2.roam.corp.google.com>

On Tue, Jan 07, 2014 at 07:08:25AM -0800, Eric Dumazet wrote:
> On Tue, 2014-01-07 at 14:31 +0400, Andrey Vagin wrote:
> > Lets look at destroy_conntrack:
> > 
> > hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
> > ...
> > nf_conntrack_free(ct)
> > 	kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
> > 
> > net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.
> > 
> > The hash is protected by rcu, so readers look up conntracks without
> > locks.
> > A conntrack is removed from the hash, but at this moment a few readers
> > can still be using it. Then the conntrack is released, and another
> > thread creates a conntrack at the same address with an equal tuple.
> > After this a reader starts to validate the conntrack:
> > * It's not dying, because a new conntrack was created
> > * nf_ct_tuple_equal() returns true.
> > 
> > But this conntrack is not initialized yet, so it can not be used by two
> > threads concurrently. In this case BUG_ON may be triggered from
> > nf_nat_setup_info().
> > 
> > Florian Westphal suggested to check the confirm bit too. I think it's
> > right.
> > 
> > task 1			task 2			task 3
> > 			nf_conntrack_find_get
> > 			 ____nf_conntrack_find
> > destroy_conntrack
> >  hlist_nulls_del_rcu
> >  nf_conntrack_free
> >  kmem_cache_free
> > 						__nf_conntrack_alloc
> > 						 kmem_cache_alloc
> > 						 memset(&ct->tuplehash[IP_CT_DIR_MAX],
> > 			 if (nf_ct_is_dying(ct))
> > 			 if (!nf_ct_tuple_equal()
> > 
> > I'm not sure that I have ever seen this race condition in real life.
> > Currently we are investigating a bug which is reproduced on a few nodes.
> > In our case one conntrack is initialized from a few tasks concurrently;
> > we don't have any other explanation for this.
> 
> > 
> > Cc: Florian Westphal <fw@strlen.de>
> > Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> > Cc: Patrick McHardy <kaber@trash.net>
> > Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Cyrill Gorcunov <gorcunov@openvz.org>
> > Signed-off-by: Andrey Vagin <avagin@openvz.org>
> > ---
> >  net/netfilter/nf_conntrack_core.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> > index 43549eb..7a34bb2 100644
> > --- a/net/netfilter/nf_conntrack_core.c
> > +++ b/net/netfilter/nf_conntrack_core.c
> > @@ -387,8 +387,12 @@ begin:
> >  			     !atomic_inc_not_zero(&ct->ct_general.use)))
> >  			h = NULL;
> >  		else {
> > +			/* A conntrack can be recreated with the equal tuple,
> > +			 * so we need to check that the conntrack is initialized
> > +			 */
> >  			if (unlikely(!nf_ct_tuple_equal(tuple, &h->tuple) ||
> > -				     nf_ct_zone(ct) != zone)) {
> > +				     nf_ct_zone(ct) != zone) ||
> > +				     !nf_ct_is_confirmed(ct)) {
> >  				nf_ct_put(ct);
> >  				goto begin;
> >  			}
> 
> I do not think this is the right way to fix this problem (if said
> problem is confirmed)
> 
> Remember the rule about SLAB_DESTROY_BY_RCU :
> 
> When a struct is freed, then reused, it's important to set its refcnt
> (from 0 to 1) only when the structure is fully ready for use.
> 
> If a lookup finds a structure which is not yet setup, the
> atomic_inc_not_zero() will fail.
> 
> Take a look at sk_clone_lock() and Documentation/RCU/rculist_nulls.txt
> 
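
To make sure I read the rule quoted above correctly, here is a minimal
userspace sketch of it. It is not the conntrack code: struct obj,
obj_init(), obj_lookup() and the other names are hypothetical, the key
stands in for the tuple, and C11 atomics stand in for
atomic_inc_not_zero(). The writer sets the refcnt from 0 to 1 only as
its last step, and the reader rechecks the key after taking a reference.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical object standing in for a conntrack entry. */
struct obj {
	atomic_int refcnt;	/* 0 means "freed or not yet ready" */
	int key;		/* stands in for the tuple */
};

/* Take a reference only if the count is non-zero,
 * in the spirit of atomic_inc_not_zero().
 */
static bool obj_get_unless_zero(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

static void obj_put(struct obj *o)
{
	assert(atomic_load(&o->refcnt) > 0);
	atomic_fetch_sub(&o->refcnt, 1);
}

/* Writer: fill in every field first, publish with refcnt = 1 last. */
static void obj_init(struct obj *o, int key)
{
	o->key = key;
	atomic_store_explicit(&o->refcnt, 1, memory_order_release);
}

/* Reader: take a reference, then recheck the key, because the memory
 * may have been reused for a different object in the meantime.
 */
static bool obj_lookup(struct obj *o, int key)
{
	if (!obj_get_unless_zero(o))
		return false;	/* freed, or still being set up */
	if (o->key != key) {
		obj_put(o);
		return false;	/* reused for another key */
	}
	return true;
}
```

With this ordering a reader that finds a half-initialized object fails
in obj_get_unless_zero() and never looks at its fields.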

I have one more question. It looks like I have found another problem.

init_conntrack:
        hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
                      &net->ct.unconfirmed);

__nf_conntrack_hash_insert:
	hlist_nulls_add_head_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode,
		   &net->ct.hash[hash]);

We use the same hlist_nulls_node to add a conntrack to two different
lists. As far as I understand, this is unacceptable with
SLAB_DESTROY_BY_RCU.

Let's imagine that we have two threads. The first one enumerates objects
from the first list; the second one recreates an object and inserts it
into the second list. If the first thread is positioned on that object
at this moment, it can read "next" when the object is already in the
second list, so it will continue to enumerate objects from the second
list. That is not what we want, is it?

cpu1				cpu2
				hlist_nulls_for_each_entry_rcu(ct)
destroy_conntrack
 kmem_cache_free

init_conntrack
 hlist_nulls_add_head_rcu
				ct = ct->next
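
For reference, Documentation/RCU/rculist_nulls.txt handles a reader
drifting onto another chain by encoding the chain's identity in the
end-of-list marker and restarting the lookup when the marker does not
match. Below is a minimal single-threaded userspace sketch of that
check. The names (struct node, make_nulls(), walk_to_end(),
must_restart()) are hypothetical, not the kernel helpers, and whether
this check covers the unconfirmed list is exactly what I am asking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node: next is either a node pointer or a tagged end marker. */
struct node {
	uintptr_t next;
	int key;
};

/* End marker for list id: low bit set, id in the remaining bits. */
static uintptr_t make_nulls(unsigned long id)
{
	return ((uintptr_t)id << 1) | 1u;
}

static int is_nulls(uintptr_t p)
{
	return (int)(p & 1u);
}

static unsigned long nulls_value(uintptr_t p)
{
	return (unsigned long)(p >> 1);
}

/* Follow next from pos and report which list's end marker was reached. */
static unsigned long walk_to_end(const struct node *pos)
{
	uintptr_t p;

	assert(pos != NULL);
	p = pos->next;
	while (!is_nulls(p))
		p = ((const struct node *)p)->next;
	return nulls_value(p);
}

/* A reader that started on list expected must restart its lookup
 * if the walk finished on some other list's marker.
 */
static int must_restart(const struct node *pos, unsigned long expected)
{
	return walk_to_end(pos) != expected;
}
```

If node a sits on list 0 and is then re-linked at the head of list 1
while a reader is positioned on it, walk_to_end(&a) returns 1 and
must_restart(&a, 0) is non-zero, so the reader knows it left list 0.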
