netfilter-devel.vger.kernel.org archive mirror
From: Ani Sinha <ani@arista.com>
To: Ani Sinha <ani@arista.com>
Cc: Florian Westphal <fw@strlen.de>,
	Patrick McHardy <kaber@trash.net>,
	"David S. Miller" <davem@davemloft.net>,
	netfilter-devel@vger.kernel.org, netfilter@vger.kernel.org,
	coreteam@netfilter.org,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: linux 3.4.43 : kernel crash at __nf_conntrack_confirm
Date: Sun, 18 Oct 2015 14:12:15 -0700 (PDT)
Message-ID: <alpine.OSX.2.20.1510181409060.87917@athabasca.local>
In-Reply-To: <CAOxq_8NLLFyNCSDJ68+VjxFGpNSex8ShdhGFNBHK29g_+UBW6g@mail.gmail.com>



> 
> On Sun, Oct 18, 2015 at 1:07 AM, Florian Westphal <fw@strlen.de> wrote:
> > Ani Sinha <ani@arista.com> wrote:
> >> Coming back to this crash, I see something interesting in the
> >> conntrack code in linux 3.4.109 (a supported kernel version). I see
> >> that the hash table manipulations are protected by a spinlock. Also
> >> lookups/reads are protected by RCU. However allocation and
> >> deallocation of conntrack objects happen outside of both the locks.
> >> It seems to me that a conntrack object can be deallocated and a new
> >> object can be allocated and initialized within the same RCU grace
> >> period, while the hash table is being read.
> >
> > Yes.  We need to use SLAB_DESTROY_BY_RCU instead of kfree_rcu because
> > there could be hundreds of thousands of alloc/free pairs within a short
> > time period.
> >
> >> It looks like a bug to me.
> >
> > No, as long as readers detect object reuse.
 
Right.
 
> >
> >> > Looking upstream, I see a couple of patches which fixes race condition
> >> > around the use of the conntrack hash table with RCU (lock free read)
> >> > primitives :
> >> >
> >> > commit c6825c0976fa7893692e0e43b09740b419b23c09
> >> > Author: Andrey Vagin <avagin@openvz.org>
> >> > Date:   Wed Jan 29 19:34:14 2014 +0100
> >> >      netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
> >> >
> >> > and a followup patch :
> >> >
> >> > commit e53376bef2cd97d3e3f61fdc677fb8da7d03d0da
> >> > Author: Pablo Neira Ayuso <pablo@netfilter.org>
> >> > Date:   Mon Feb 3 20:01:53 2014 +0100
> >> >         netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt
> >> >
> >
> > These for instance fix such bugs.
> 
Indeed. So it seems to me that we have run into another such case.
In patch c6825c0976fa7893692, I see we have added an additional check
(along with comparing the tuple and zone) to verify that the conntrack
is confirmed.
 
+       return nf_ct_tuple_equal(tuple, &h->tuple) &&
+               nf_ct_zone(ct) == zone &&
+               nf_ct_is_confirmed(ct);
 
 
This is necessary since a conntrack can be recreated with the same zone.
Unfortunately, we leave a hole open in __nf_conntrack_confirm() because
this routine _is_ responsible for confirming the conntrack, so we cannot
use the same logic here.
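
The lookup-side check described above can be sketched in user space as
follows. This is only an illustration of the "take a reference, then
recheck the key" pattern, not the kernel code: struct entry,
entry_get_unless_zero(), entry_put(), entry_key_equal() and
entry_find_get() are hypothetical names, the tuple is reduced to an int,
and C11 atomics stand in for the kernel's atomic_inc_not_zero().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a conntrack entry; not struct nf_conn. */
struct entry {
	atomic_int refcnt;	/* 0 means the object is being freed */
	int tuple;		/* stand-in for the connection tuple */
	int zone;
	bool confirmed;		/* set once the entry is in the hash table */
};

/* Take a reference only if the object is still live (refcnt != 0). */
static bool entry_get_unless_zero(struct entry *e)
{
	int old = atomic_load(&e->refcnt);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&e->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference previously taken by the caller. */
static void entry_put(struct entry *e)
{
	int old = atomic_fetch_sub(&e->refcnt, 1);

	assert(old > 0);
}

/* Same three conditions as the check quoted above. */
static bool entry_key_equal(const struct entry *e, int tuple, int zone)
{
	return e->tuple == tuple && e->zone == zone && e->confirmed;
}

/*
 * 'candidate' is whatever a lockless hash walk returned. The memory may
 * have been recycled for a new object in the meantime, so the key is
 * compared again after the reference is held.
 */
static struct entry *entry_find_get(struct entry *candidate,
				    int tuple, int zone)
{
	if (!candidate || !entry_get_unless_zero(candidate))
		return NULL;

	if (!entry_key_equal(candidate, tuple, zone)) {
		entry_put(candidate);	/* recycled object: back off */
		return NULL;
	}
	return candidate;
}
```

In this sketch, an entry that was recycled with the same tuple and zone
but is not yet confirmed is rejected by entry_find_get(), and the
reference taken on it is dropped again.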
 
Should I send a patch along the lines of:
 
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 71935fc..6ff4088 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -535,6 +535,12 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 		    zone == nf_ct_zone(nf_ct_tuplehash_to_ctrack(h)))
 			goto out;
 
+	/* We might be racing against a case where the conntrack was deleted
+	 * and a new conntrack was initialized with the exact same zone. We
+	 * need to make sure that the conntrack node is in the hashtable. */
+	if (hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode))
+		goto out;
+
 	/* Remove from unconfirmed list */
 	hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
 


