From: Florian Westphal <fw@strlen.de>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Florian Westphal <fw@strlen.de>, Andrey Vagin <avagin@openvz.org>,
netfilter-devel@vger.kernel.org, netfilter@vger.kernel.org,
coreteam@netfilter.org, netdev@vger.kernel.org,
linux-kernel@vger.kernel.org, vvs@openvz.org,
Pablo Neira Ayuso <pablo@netfilter.org>,
Patrick McHardy <kaber@trash.net>,
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>,
"David S. Miller" <davem@davemloft.net>,
Cyrill Gorcunov <gorcunov@openvz.org>
Subject: Re: [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
Date: Wed, 8 Jan 2014 21:18:38 +0100 [thread overview]
Message-ID: <20140108201838.GI9894@breakpoint.cc> (raw)
In-Reply-To: <1389202287.26646.95.camel@edumazet-glaptop2.roam.corp.google.com>
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > The confirmed bit should always be set here.
>
> So why are you testing it ?
To detect ct object recycling when tuple is identical.
This is my understanding of how we can end up with two
cpus thinking they have exclusive ownership of the same ct:
A cpu0: starts lookup: find ct for tuple t
B cpu1: starts lookup: find ct for tuple t
C cpu0: finds ct c for tuple t, no refcnt taken yet
cpu1: finds ct c for tuple t, no refcnt taken yet
cpu2: destroys ct c, removes from hash table, calls ->destroy function
D cpu0: tries to increment refcnt; fails since its 0: lookup ends
E cpu0: allocates a new ct object since no acceptable ct was found for t
F cpu0: allocator gives us just-freed ct c
G cpu0: initialises ct, sets refcnt to 1
H cpu0: adds extensions, ct object is put on unconfirmed list and
assigned to skb->nfct
I cpu0: skb continues through network stack
J cpu1: tries to increment refcnt, ok
K cpu1: checks if ct matches requested tuple t: it does
L cpu0: sets refcnt conntrack tuple, allocates extensions, etc.
cpu1: sets skb->nfct to ct, skb continues through network stack
-> both cpu0 and cpu1 reference a ct object that was not in hash table
cpu0 and cpu1 will then race, for example in
net/ipv4/netfilter/iptable_nat.c:nf_nat_rule_find():
if (!nf_nat_initialized(ct, HOOK2MANIP(hooknum)))
ret = alloc_null_binding(ct, hooknum);
[ Its possible that I misunderstand and that there is something that
precents this from happening. Usually its the 'tuple equal' test
that is performed post-atomic-inc-not-zero that detects the recycling,
so step K above would fail ]
The idea of the 'confirmed bit test' is that when its not set then the
conntrack was recycled and should not be used before the cpu that
currently 'owns' that entry has put it into the hash table again.
> I did this RCU conversion, so I think I know what I am talking about.
Yes, I know. If you have any suggestions on how to fix it, I'd be very
interested to hear about them.
> The entry should not be put into hash table (or refcnt set to 1),
> if its not ready. It is that simple.
I understand what you're saying, but I don't see how we can do it.
I think the assumption that we have a refcnt on skb->nfct is everywhere.
If I understand you correctly we'd have to differentiate between
'can be used fully (e.g. nf_nat_setup_info done for both directions)'
and 'a new conntrack was created (extensions may not be ready yet)'.
But currently in both cases the ct is assigned to the skb, and in both
cases a refcnt is taken. I am out of ideas, except perhaps using
ct->lock to serialize the nat setup (but I don't like that since
I'm not sure that the nat race is the only one).
> We need to address this problem without adding an extra test and
> possible loop for the lookups.
Agreed. I don't like the extra test either.
Many thanks for looking into this Eric!
next prev parent reply other threads:[~2014-01-08 20:18 UTC|newest]
Thread overview: 33+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-07 10:31 [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get Andrey Vagin
2014-01-07 11:42 ` Vasily Averin
2014-01-07 15:08 ` Eric Dumazet
2014-01-07 15:25 ` Florian Westphal
2014-01-08 13:42 ` Eric Dumazet
2014-01-08 14:04 ` Florian Westphal
2014-01-08 17:31 ` Eric Dumazet
2014-01-08 20:18 ` Florian Westphal [this message]
2014-01-08 20:23 ` Florian Westphal
2014-01-09 20:32 ` Andrew Vagin
2014-01-09 20:56 ` Florian Westphal
2014-01-09 21:07 ` Andrew Vagin
2014-01-09 21:26 ` Florian Westphal
2014-01-09 5:24 ` Andrew Vagin
2014-01-09 15:23 ` Eric Dumazet
2014-01-09 21:46 ` Andrey Wagin
2014-01-08 13:17 ` [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get (v2) Andrey Vagin
2014-01-08 13:47 ` Eric Dumazet
2014-01-12 17:50 ` [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get (v3) Andrey Vagin
2014-01-12 20:21 ` Eric Dumazet
2014-01-14 10:51 ` Andrew Vagin
2014-01-14 11:10 ` Andrey Wagin
2014-01-14 14:36 ` Eric Dumazet
2014-01-14 17:35 ` [PATCH] [RFC] netfilter: nf_conntrack: don't relase a conntrack with non-zero refcnt Andrey Vagin
2014-01-14 17:44 ` Cyrill Gorcunov
2014-01-14 18:53 ` Florian Westphal
2014-01-15 18:08 ` Andrew Vagin
2014-01-16 9:23 ` Florian Westphal
2014-02-02 23:30 ` Pablo Neira Ayuso
2014-02-03 13:59 ` Andrew Vagin
2014-02-03 16:22 ` Eric Dumazet
2014-01-27 13:44 ` Andrew Vagin
2014-01-29 19:21 ` [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get (v3) Pablo Neira Ayuso
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140108201838.GI9894@breakpoint.cc \
--to=fw@strlen.de \
--cc=avagin@openvz.org \
--cc=coreteam@netfilter.org \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=gorcunov@openvz.org \
--cc=kaber@trash.net \
--cc=kadlec@blackhole.kfki.hu \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=netfilter-devel@vger.kernel.org \
--cc=netfilter@vger.kernel.org \
--cc=pablo@netfilter.org \
--cc=vvs@openvz.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).