From: Ani Sinha <ani@arista.com>
To: Ani Sinha <ani@arista.com>
Cc: Florian Westphal <fw@strlen.de>,
Patrick McHardy <kaber@trash.net>,
"David S. Miller" <davem@davemloft.net>,
netfilter-devel@vger.kernel.org, netfilter@vger.kernel.org,
coreteam@netfilter.org,
"netdev@vger.kernel.org" <netdev@vger.kernel.org>
Subject: Re: linux 3.4.43 : kernel crash at __nf_conntrack_confirm
Date: Sun, 18 Oct 2015 14:12:15 -0700 (PDT)
Message-ID: <alpine.OSX.2.20.1510181409060.87917@athabasca.local>
In-Reply-To: <CAOxq_8NLLFyNCSDJ68+VjxFGpNSex8ShdhGFNBHK29g_+UBW6g@mail.gmail.com>
>
> On Sun, Oct 18, 2015 at 1:07 AM, Florian Westphal <fw@strlen.de> wrote:
> > Ani Sinha <ani@arista.com> wrote:
> >> Coming back to this crash, I see something interesting in the
> >> conntrack code in linux 3.4.109 (a supported kernel version). I see
> >> that the hash table manipulations are protected by a spinlock. Also
> >> lookups/reads are protected by RCU. However allocation and
> >> deallocation of conntrack objects happen outside of both the locks.
> >> It seems to me that a conntrack object can be deallocated and a new
> >> object can be allocated and initialized within the same RCU grace
> >> period, while the hash table is being read.
> >
> > Yes. We need to use SLAB_DESTROY_BY_RCU instead of kfree_rcu because
> > there could be hundreds of thousands of alloc/free pairs within a short
> > time period.
> >
> >> It looks like a bug to me.
> >
> > No, as long as readers detect object reuse.
Right.
> >
> >> > Looking upstream, I see a couple of patches which fixes race condition
> >> > around the use of the conntrack hash table with RCU (lock free read)
> >> > primitives :
> >> >
> >> > commit c6825c0976fa7893692e0e43b09740b419b23c09
> >> > Author: Andrey Vagin <avagin@openvz.org>
> >> > Date: Wed Jan 29 19:34:14 2014 +0100
> >> > netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
> >> >
> >> > and a followup patch :
> >> >
> >> > commit e53376bef2cd97d3e3f61fdc677fb8da7d03d0da
> >> > Author: Pablo Neira Ayuso <pablo@netfilter.org>
> >> > Date: Mon Feb 3 20:01:53 2014 +0100
> >> > netfilter: nf_conntrack: don't release a conntrack with non-zero refcnt
> >> >
> >
> > These for instance fix such bugs.
>
Indeed. So it seems to me that we have run into another such case.
In patch c6825c0976fa7893692, I see we have added an additional check
(along with comparing tuple and zone) to verify that the conntrack is
confirmed:

+	return nf_ct_tuple_equal(tuple, &h->tuple) &&
+		nf_ct_zone(ct) == zone &&
+		nf_ct_is_confirmed(ct);

This is necessary since it's possible that a conntrack can be recreated with the same zone.
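
To make the reader-side rule concrete, here is a minimal user-space
sketch of "take a reference, then compare the key again". It is not
kernel code: struct entry, lookup() and the race hook are hypothetical
names made up for illustration, the refcount is a plain int rather than
an atomic, and the hook only simulates the slot being recycled between
the first match and the reference grab.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a conntrack entry (not struct nf_conn). */
struct entry {
	int tuple;      /* stands in for the connection tuple */
	int zone;       /* stands in for the conntrack zone */
	bool confirmed; /* true only once the entry is in the hash table */
	int refcnt;     /* 0 means the object is free or being freed */
};

/* Same shape as the three-part test quoted above. */
static bool key_equal(const struct entry *e, int tuple, int zone)
{
	return e->tuple == tuple && e->zone == zone && e->confirmed;
}

/* Plain-int model of an atomic_inc_not_zero() style reference grab. */
static bool get_unless_zero(struct entry *e)
{
	if (e->refcnt == 0)
		return false;
	e->refcnt++;
	return true;
}

static void put(struct entry *e)
{
	assert(e->refcnt > 0);
	e->refcnt--;
}

/* Test hook: runs in the window where the slot could be recycled. */
typedef void (*race_hook)(struct entry *e);

/*
 * Look up one slot. The key is compared twice: once to find the
 * candidate, and once more after the reference is held, because the
 * memory may have been freed and reused in between.
 */
static struct entry *lookup(struct entry *slot, int tuple, int zone,
			    race_hook hook)
{
	assert(slot != NULL);
	if (!key_equal(slot, tuple, zone))
		return NULL;
	if (hook)
		hook(slot);
	if (!get_unless_zero(slot))
		return NULL;    /* object was on its way out */
	if (!key_equal(slot, tuple, zone)) {
		put(slot);      /* slot was reused: drop it */
		return NULL;    /* a real reader would retry the walk */
	}
	return slot;
}

/* Simulates free + reallocation as a new, still unconfirmed entry. */
static void recycle_unconfirmed(struct entry *e)
{
	e->confirmed = false;
	e->refcnt = 1;
}
```

Calling lookup() with recycle_unconfirmed as the hook returns NULL even
though tuple and zone still match, which is the case the confirmed
check is meant to catch.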
Unfortunately, we leave a hole open in __nf_conntrack_confirm() because this routine _is_ responsible
for confirming the conntrack. We cannot use the same logic here.
Should I send a patch along the lines of:

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 71935fc..6ff4088 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -535,6 +535,12 @@ __nf_conntrack_confirm(struct sk_buff *skb)
 		    zone == nf_ct_zone(nf_ct_tuplehash_to_ctrack(h)))
 			goto out;
+	/* we might be racing against a case where the conntrack was deleted
+	   and a new conntrack was initialized with the exact same zone. We
+	   need to make sure that the conntrack node is in the hashtable */
+	if (hlist_nulls_unhashed(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode))
+		goto out;
+
 	/* Remove from unconfirmed list */
 	hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
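
The shape of the guard I have in mind, "only unlink the node if it is
actually on a list", can be sketched in user space as below. This is a
simplified model and not the kernel's hlist_nulls code: struct node,
node_del() and remove_if_listed() are hypothetical names, the list ends
in a plain NULL rather than a nulls marker, and the model clears pprev
on removal, which is an assumption of the sketch and may not match what
the kernel's deletion helpers leave behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified hlist-style node: pprev is NULL while not on any list. */
struct node {
	struct node *next;
	struct node **pprev; /* address of the pointer that points at us */
};

struct head {
	struct node *first;
};

/* Same test shape as an "unhashed" check: no back pointer, no list. */
static bool node_unhashed(const struct node *n)
{
	return n->pprev == NULL;
}

static void node_add_head(struct node *n, struct head *h)
{
	assert(node_unhashed(n));
	n->next = h->first;
	n->pprev = &h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
}

/* Unlink a listed node and mark it unhashed (sketch assumption). */
static void node_del(struct node *n)
{
	assert(!node_unhashed(n));
	*n->pprev = n->next;
	if (n->next)
		n->next->pprev = n->pprev;
	n->next = NULL;
	n->pprev = NULL;
}

/* Guarded removal: bail out instead of unlinking an unlisted node. */
static bool remove_if_listed(struct node *n)
{
	if (node_unhashed(n))
		return false;
	node_del(n);
	return true;
}
```

With this guard a second removal of the same node, or removal of a
freshly initialized node that was never added, is a no-op instead of a
write through a stale back pointer.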