From: Andrew Vagin <avagin@gmail.com>
To: Florian Westphal <fw@strlen.de>
Cc: Andrey Vagin <avagin@openvz.org>,
	netfilter-devel@vger.kernel.org, netfilter@vger.kernel.org,
	coreteam@netfilter.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, vvs@openvz.org,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	Patrick McHardy <kaber@trash.net>,
	Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>,
	"David S. Miller" <davem@davemloft.net>,
	Cyrill Gorcunov <gorcunov@openvz.org>
Subject: Re: [PATCH] netfilter: nf_conntrack: release conntrack from rcu callback
Date: Tue, 7 Jan 2014 00:54:15 +0400
Message-ID: <20140106205414.GA19788@gmail.com>
In-Reply-To: <20140106170235.GJ28854@breakpoint.cc>

On Mon, Jan 06, 2014 at 06:02:35PM +0100, Florian Westphal wrote:
> Andrey Vagin <avagin@openvz.org> wrote:
> > Let's look at destroy_conntrack:
> > 
> > hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
> > ...
> > nf_conntrack_free(ct)
> > 	kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
> > 
> > The hash is protected by RCU, so readers look up conntracks without
> > locks.
> > A conntrack is removed from the hash, but at this moment a few readers
> > may still be using the conntrack, so if we call kmem_cache_free now,
> > those readers will access a released object.
> > 
> > Below you can find a trickier race condition involving three tasks.
> > 
> > task 1			task 2			task 3
> > 			nf_conntrack_find_get
> > 			 ____nf_conntrack_find
> > destroy_conntrack
> >  hlist_nulls_del_rcu
> >  nf_conntrack_free
> >  kmem_cache_free
> > 						__nf_conntrack_alloc
> > 						 kmem_cache_alloc
> > 						 memset(&ct->tuplehash[IP_CT_DIR_MAX],
> > 			 if (nf_ct_is_dying(ct))
> > 
> > In this case task 2 will not realize that it is using the wrong
> > conntrack.
> 
> Can you elaborate?
> Yes, nf_ct_is_dying(ct) might be called for the wrong conntrack.
> 
> But, in case we _think_ that it's the right one, we call
> nf_ct_tuple_equal() to verify we indeed found the right one:

Ok. Task 3 creates a new conntrack and nf_ct_tuple_equal() returns true
on it. Looks like it's possible. In this case we have two threads
holding the same uninitialized conntrack. That is really bad, because
the code assumes that a conntrack cannot be initialized by two threads
concurrently. For example, a BUG can be triggered from
nf_nat_setup_info():

BUG_ON(nf_nat_initialized(ct, maniptype));


> 
>        h = ____nf_conntrack_find(net, zone, tuple, hash);
>        if (h) { // might be released right now, but page won't go away (SLAB_BY_RCU)

I did not take SLAB_DESTROY_BY_RCU into account. Thank you. But it
doesn't mean that we don't have a race condition here. It explains why
we don't hit the really bad case where a completely wrong conntrack is
used. We always get the "right" conntrack, but sometimes it is
uninitialized, and that is the problem.
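
(For reference: the conntrack cache is created with SLAB_DESTROY_BY_RCU,
roughly like this in nf_conntrack_core.c of that era:

	net->ct.nf_conntrack_cachep = kmem_cache_create(net->ct.slabname,
						sizeof(struct nf_conn), 0,
						SLAB_DESTROY_BY_RCU, NULL);

The flag only guarantees that the memory keeps holding struct nf_conn
objects until an RCU grace period passes; a particular object may be
freed and reused for a new conntrack immediately, which is why the
lookup cited below must re-check the tuple after taking a reference.)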

The race window is tiny, because we first check that the conntrack is
not initialized and only then execute its initialization. We don't hold
any locks between these two steps.

Task2					| Task3
if (!nf_nat_initialized(ct))		|
					| if (!nf_nat_initialized(ct))
 alloc_null_binding			|
					|  alloc_null_binding
  nf_nat_setup_info			|
   ct->status |= IPS_SRC_NAT_DONE	|
					|   nf_nat_setup_info
					|    BUG_ON(nf_nat_initialized(ct));
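
This is the classic unlocked check-then-act pattern; schematically it
looks roughly like this (a simplified sketch, not the exact kernel
code):

	/* Both tasks can pass the check before either one sets
	 * IPS_SRC_NAT_DONE, so both reach nf_nat_setup_info() and the
	 * loser hits BUG_ON(nf_nat_initialized(ct, maniptype)).
	 */
	if (!nf_nat_initialized(ct, NF_NAT_MANIP_SRC)) {
		/* window: the other task can run the same check here */
		ret = alloc_null_binding(ct, hooknum); /* -> nf_nat_setup_info() */
	}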

>                 ct = nf_ct_tuplehash_to_ctrack(h);
>                 if (unlikely(nf_ct_is_dying(ct) ||
>                              !atomic_inc_not_zero(&ct->ct_general.use)))
> 			// which means we should hit this path (0 ref).
>                         h = NULL;
>                 else {
> 			// otherwise, it cannot go away from under us, since
> 			// we own a reference now.
>                         if (unlikely(!nf_ct_tuple_equal(tuple, &h->tuple) ||
>                                      nf_ct_zone(ct) != zone)) {
> 			// if we get here, the entry got recycled on other cpu
> 			// for a different tuple, we can bail out and drop
> 			// the reference safely and re-try the lookup
>                                 nf_ct_put(ct);
>                                 goto begin;
>                         }
>                 }

Thanks,
Andrey
