From: Florian Westphal <fw@strlen.de>
To: Andrew Vagin <avagin@gmail.com>
Cc: Florian Westphal <fw@strlen.de>, Andrey Vagin <avagin@openvz.org>,
	netfilter-devel@vger.kernel.org, netfilter@vger.kernel.org,
	coreteam@netfilter.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, vvs@openvz.org,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	Patrick McHardy <kaber@trash.net>,
	Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>,
	"David S. Miller" <davem@davemloft.net>,
	Cyrill Gorcunov <gorcunov@openvz.org>
Subject: Re: [PATCH] netfilter: nf_conntrack: release conntrack from rcu callback
Date: Mon, 6 Jan 2014 22:53:35 +0100
Message-ID: <20140106215335.GC9894@breakpoint.cc>
In-Reply-To: <20140106205414.GA19788@gmail.com>

Andrew Vagin <avagin@gmail.com> wrote:
> On Mon, Jan 06, 2014 at 06:02:35PM +0100, Florian Westphal wrote:
> > Andrey Vagin <avagin@openvz.org> wrote:
> > > Let's look at destroy_conntrack:
> > > 
> > > hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
> > > ...
> > > nf_conntrack_free(ct)
> > > 	kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
> > > 
> > > The hash is protected by rcu, so readers look up conntracks without
> > > locks.
> > > A conntrack is removed from the hash, but at this moment a few readers
> > > can still be using the conntrack, so if we call kmem_cache_free now,
> > > those readers will read a released object.
> > > 
> > > Below is a trickier race condition involving three tasks.
> > > 
> > > task 1			task 2			task 3
> > > 			nf_conntrack_find_get
> > > 			 ____nf_conntrack_find
> > > destroy_conntrack
> > >  hlist_nulls_del_rcu
> > >  nf_conntrack_free
> > >  kmem_cache_free
> > > 						__nf_conntrack_alloc
> > > 						 kmem_cache_alloc
> > > 						 memset(&ct->tuplehash[IP_CT_DIR_MAX],
> > > 			 if (nf_ct_is_dying(ct))
> > > 
> > > In this case task 2 will not notice that it is using the wrong
> > > conntrack.
> > 
> > Can you elaborate?
> > Yes, nf_ct_is_dying(ct) might be called for the wrong conntrack.
> > 
> > But, in case we _think_ that it's the right one we call
> > nf_ct_tuple_equal() to verify we indeed found the right one:
> 
> Ok. task3 creates a new conntrack and nf_ct_tuple_equal() returns true on
> it. Looks like it's possible.

IFF we're recycling the exact same tuple (i.e., the flow was destroyed/terminated
AND has been re-created in identical fashion on another cpu)
AND it is not yet confirmed (i.e., it's no longer in the hash table but on the
unconfirmed list) then, yes, I think you're right.
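
The condition above turns on what the reference count can and cannot
guarantee. A minimal userspace sketch, assuming C11 atomics stand in for the
kernel's atomic_t; sketch_obj, sketch_inc_not_zero() and sketch_put() are
hypothetical names for illustration, not kernel API:

```c
/* Userspace illustration only; not kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_obj {
	atomic_int use;		/* stands in for ct->ct_general.use */
};

/*
 * Take a reference unless the count is already zero, mirroring the
 * intent of atomic_inc_not_zero().
 */
static bool sketch_inc_not_zero(struct sketch_obj *o)
{
	int old = atomic_load(&o->use);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->use, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference previously taken. */
static void sketch_put(struct sketch_obj *o)
{
	int old = atomic_fetch_sub(&o->use, 1);

	assert(old > 0);
	(void)old;
}
```

An object whose count has dropped to zero is rejected, but an object that was
freed and then re-allocated elsewhere with its count set back to 1 passes this
check, so the reference count alone cannot tell a recycled object from the
original one.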

> uninitialized conntrack. It's really bad, because the code supposes that
> conntrack can not be initialized in two threads concurrently. For
> example BUG can be triggered from nf_nat_setup_info():
> 
> BUG_ON(nf_nat_initialized(ct, maniptype));

Right, since a new conntrack entry is not supposed to be in the hash
table.

> >                 ct = nf_ct_tuplehash_to_ctrack(h);
> >                 if (unlikely(nf_ct_is_dying(ct) ||
> >                              !atomic_inc_not_zero(&ct->ct_general.use)))
> >                         // which means we should hit this path (0 ref).
> >                         h = NULL;
> >                 else {
> >                         // otherwise, it cannot go away from under us, since
> >                         // we own a reference now.
> >                         if (unlikely(!nf_ct_tuple_equal(tuple, &h->tuple) ||
> >                                      nf_ct_zone(ct) != zone)) {

Perhaps this needs an additional !nf_ct_is_confirmed() check?

It would cover your case: we found a recycled element with the same tuple
that another cpu has already put on the unconfirmed list (refcnt already
set to 1, ct->tuple set, extensions possibly not yet fully initialised).
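
The suggested re-check can be sketched in userspace as follows. This is only
an illustration under stated assumptions: sk_conn, sk_tuple, the SK_* flags
and every sk_*() helper are hypothetical stand-ins, the status word is read
without the atomic bit operations the kernel would use, and a failed re-check
simply returns NULL where a real lookup might retry instead.

```c
/* Userspace illustration only; not kernel code. All names are hypothetical. */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { SK_CONFIRMED = 1u << 0, SK_DYING = 1u << 1 };

struct sk_tuple {
	uint32_t src, dst;
	uint16_t sport, dport;
	uint8_t proto;
};

struct sk_conn {
	atomic_int use;		/* reference count */
	unsigned int status;	/* SK_* flags */
	uint16_t zone;
	struct sk_tuple tuple;
};

static bool sk_tuple_equal(const struct sk_tuple *a, const struct sk_tuple *b)
{
	return a->src == b->src && a->dst == b->dst &&
	       a->sport == b->sport && a->dport == b->dport &&
	       a->proto == b->proto;
}

/* Take a reference unless the count is already zero. */
static bool sk_get_unless_zero(struct sk_conn *c)
{
	int old = atomic_load(&c->use);

	while (old != 0) {
		if (atomic_compare_exchange_weak(&c->use, &old, old + 1))
			return true;
	}
	return false;
}

static void sk_put(struct sk_conn *c)
{
	int old = atomic_fetch_sub(&c->use, 1);

	assert(old > 0);
	(void)old;
}

/*
 * Validate a candidate found by a lockless lookup. Returns the entry with
 * a reference held, or NULL if it is dying, already released, or turns out
 * to be a recycled entry that does not match or is not yet confirmed.
 */
static struct sk_conn *sk_find_get(struct sk_conn *c,
				   const struct sk_tuple *want, uint16_t zone)
{
	if ((c->status & SK_DYING) || !sk_get_unless_zero(c))
		return NULL;

	/* We hold a reference now; re-check that this is the right entry. */
	if (!sk_tuple_equal(want, &c->tuple) || c->zone != zone ||
	    !(c->status & SK_CONFIRMED)) {
		sk_put(c);
		return NULL;
	}
	return c;
}
```

With the extra confirmed test, an entry that carries the same tuple but has
only just been re-allocated (count 1, not yet confirmed) is dropped again
instead of being handed to the caller.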

Regards,
Florian
