All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vasily Averin <vvs@parallels.com>
To: netfilter-devel@vger.kernel.org
Cc: Andrey Vagin <avagin@openvz.org>,
	netfilter@vger.kernel.org, coreteam@netfilter.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	vvs@openvz.org, Florian Westphal <fw@strlen.de>,
	Pablo Neira Ayuso <pablo@netfilter.org>,
	Patrick McHardy <kaber@trash.net>,
	Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>,
	"David S. Miller" <davem@davemloft.net>,
	Cyrill Gorcunov <gorcunov@openvz.org>
Subject: Re: [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
Date: Tue, 07 Jan 2014 15:42:50 +0400	[thread overview]
Message-ID: <52CBE83A.3080105@parallels.com> (raw)
In-Reply-To: <1389090711-15843-1-git-send-email-avagin@openvz.org>

On 01/07/2014 02:31 PM, Andrey Vagin wrote:
> Lets look at destroy_conntrack:
> 
> hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
> ...
> nf_conntrack_free(ct)
> 	kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
> 
> net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.
> 
> The hash is protected by rcu, so readers look up conntracks without
> locks.
> A conntrack is removed from the hash, but in this moment a few readers
> still can use the conntrack. Then this conntrack is released and another
> thread creates conntrack with the same address and the equal tuple.
> After this a reader starts to validate the conntrack:
> * It's not dying, because a new conntrack was created
> * nf_ct_tuple_equal() returns true.
> 
> But this conntrack is not initialized yet, so it can not be used by two
> threads concurrently. In this case BUG_ON may be triggered from
> nf_nat_setup_info().
> 
> Florian Westphal suggested to check the confirm bit too. I think it's
> right.
> 
> task 1			task 2			task 3
> 			nf_conntrack_find_get
> 			 ____nf_conntrack_find
> destroy_conntrack
>  hlist_nulls_del_rcu
>  nf_conntrack_free
>  kmem_cache_free
> 						__nf_conntrack_alloc
> 						 kmem_cache_alloc
> 						 memset(&ct->tuplehash[IP_CT_DIR_MAX],
> 			 if (nf_ct_is_dying(ct))
> 			 if (!nf_ct_tuple_equal()
> 
> I'm not sure, that I have ever seen this race condition in a real life.
> Currently we are investigating a bug, which is reproduced on a few node.
> In our case one conntrack is initialized from a few tasks concurrently,
> we don't have any other explanation for this.

I would add; this strange traffic is generated by botnet clients (/etc/atdd and /boot/.IptabLes)
Several threads are send lot of identical tcp packets via RAW socket.
https://bugzilla.openvz.org/show_bug.cgi?id=2851

> <2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322!
> ...
> <4>[46267.083951] RIP: 0010:[<ffffffffa01e00a4>]  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 [nf_nat]
> ...
> <4>[46267.085549] Call Trace:
> <4>[46267.085622]  [<ffffffffa023421b>] alloc_null_binding+0x5b/0xa0 [iptable_nat]
> <4>[46267.085697]  [<ffffffffa02342bc>] nf_nat_rule_find+0x5c/0x80 [iptable_nat]
> <4>[46267.085770]  [<ffffffffa0234521>] nf_nat_fn+0x111/0x260 [iptable_nat]
> <4>[46267.085843]  [<ffffffffa0234798>] nf_nat_out+0x48/0xd0 [iptable_nat]
> <4>[46267.085919]  [<ffffffff814841b9>] nf_iterate+0x69/0xb0
> <4>[46267.085991]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
> <4>[46267.086063]  [<ffffffff81484374>] nf_hook_slow+0x74/0x110
> <4>[46267.086133]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
> <4>[46267.086207]  [<ffffffff814b5890>] ? dst_output+0x0/0x20
> <4>[46267.086277]  [<ffffffff81495204>] ip_output+0xa4/0xc0
> <4>[46267.086346]  [<ffffffff814b65a4>] raw_sendmsg+0x8b4/0x910
> <4>[46267.086419]  [<ffffffff814c10fa>] inet_sendmsg+0x4a/0xb0
> <4>[46267.086491]  [<ffffffff814459aa>] ? sock_update_classid+0x3a/0x50
> <4>[46267.086562]  [<ffffffff81444d67>] sock_sendmsg+0x117/0x140
> <4>[46267.086638]  [<ffffffff8151997b>] ? _spin_unlock_bh+0x1b/0x20
> <4>[46267.086712]  [<ffffffff8109d370>] ? autoremove_wake_function+0x0/0x40
> <4>[46267.086785]  [<ffffffff81495e80>] ? do_ip_setsockopt+0x90/0xd80
> <4>[46267.086858]  [<ffffffff8100be0e>] ? call_function_interrupt+0xe/0x20
> <4>[46267.086936]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
> <4>[46267.087006]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
> <4>[46267.087081]  [<ffffffff8118f2e8>] ? kmem_cache_alloc+0xd8/0x1e0
> <4>[46267.087151]  [<ffffffff81445599>] sys_sendto+0x139/0x190
> <4>[46267.087229]  [<ffffffff81448c0d>] ? sock_setsockopt+0x16d/0x6f0
> <4>[46267.087303]  [<ffffffff810efa47>] ? audit_syscall_entry+0x1d7/0x200
> <4>[46267.087378]  [<ffffffff810ef795>] ? __audit_syscall_exit+0x265/0x290
> <4>[46267.087454]  [<ffffffff81474885>] ? compat_sys_setsockopt+0x75/0x210
> <4>[46267.087531]  [<ffffffff81474b5f>] compat_sys_socketcall+0x13f/0x210
> <4>[46267.087607]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
> <4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74
> <1>[46267.088023] RIP  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590
> 
> Cc: Florian Westphal <fw@strlen.de>
> Cc: Pablo Neira Ayuso <pablo@netfilter.org>
> Cc: Patrick McHardy <kaber@trash.net>
> Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Cyrill Gorcunov <gorcunov@openvz.org>
> Signed-off-by: Andrey Vagin <avagin@openvz.org>
> ---
>  net/netfilter/nf_conntrack_core.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 43549eb..7a34bb2 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -387,8 +387,12 @@ begin:
>  			     !atomic_inc_not_zero(&ct->ct_general.use)))
>  			h = NULL;
>  		else {
> +			/* A conntrack can be recreated with the equal tuple,
> +			 * so we need to check that the conntrack is initialized
> +			 */
>  			if (unlikely(!nf_ct_tuple_equal(tuple, &h->tuple) ||
> -				     nf_ct_zone(ct) != zone)) {
> +				     nf_ct_zone(ct) != zone) ||
> +				     !nf_ct_is_confirmed(ct)) {
>  				nf_ct_put(ct);
>  				goto begin;
>  			}
> 

  reply	other threads:[~2014-01-07 11:42 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-01-07 10:31 [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get Andrey Vagin
2014-01-07 11:42 ` Vasily Averin [this message]
2014-01-07 15:08 ` Eric Dumazet
2014-01-07 15:25   ` Florian Westphal
2014-01-08 13:42     ` Eric Dumazet
2014-01-08 14:04       ` Florian Westphal
2014-01-08 17:31         ` Eric Dumazet
2014-01-08 20:18           ` Florian Westphal
2014-01-08 20:23             ` Florian Westphal
2014-01-09 20:32     ` Andrew Vagin
2014-01-09 20:32       ` Andrew Vagin
2014-01-09 20:56       ` Florian Westphal
2014-01-09 21:07         ` Andrew Vagin
2014-01-09 21:07           ` Andrew Vagin
2014-01-09 21:26           ` Florian Westphal
2014-01-09  5:24   ` Andrew Vagin
2014-01-09 15:23     ` Eric Dumazet
2014-01-09 21:46       ` Andrey Wagin
2014-01-08 13:17 ` [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get (v2) Andrey Vagin
2014-01-08 13:47   ` Eric Dumazet
2014-01-12 17:50     ` [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get (v3) Andrey Vagin
2014-01-12 20:21       ` Eric Dumazet
2014-01-14 10:51         ` Andrew Vagin
2014-01-14 10:51           ` Andrew Vagin
2014-01-14 10:51           ` Andrew Vagin
2014-01-14 11:10           ` Andrey Wagin
2014-01-14 14:36           ` Eric Dumazet
2014-01-14 17:35             ` [PATCH] [RFC] netfilter: nf_conntrack: don't relase a conntrack with non-zero refcnt Andrey Vagin
2014-01-14 17:44               ` Cyrill Gorcunov
2014-01-14 18:53               ` Florian Westphal
2014-01-15 18:08                 ` Andrew Vagin
2014-01-15 18:08                   ` Andrew Vagin
2014-01-16  9:23                   ` Florian Westphal
2014-02-02 23:30                     ` Pablo Neira Ayuso
2014-02-03 13:59                       ` Andrew Vagin
2014-02-03 13:59                         ` Andrew Vagin
2014-02-03 16:22                       ` Eric Dumazet
2014-01-27 13:44               ` Andrew Vagin
2014-01-27 13:44                 ` Andrew Vagin
2014-01-29 19:21         ` [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get (v3) Pablo Neira Ayuso

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52CBE83A.3080105@parallels.com \
    --to=vvs@parallels.com \
    --cc=avagin@openvz.org \
    --cc=coreteam@netfilter.org \
    --cc=davem@davemloft.net \
    --cc=fw@strlen.de \
    --cc=gorcunov@openvz.org \
    --cc=kaber@trash.net \
    --cc=kadlec@blackhole.kfki.hu \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=netfilter@vger.kernel.org \
    --cc=pablo@netfilter.org \
    --cc=vvs@openvz.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.