All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get
@ 2014-01-07 10:31 Andrey Vagin
  2014-01-07 11:42 ` Vasily Averin
                   ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Andrey Vagin @ 2014-01-07 10:31 UTC (permalink / raw)
  To: netfilter-devel
  Cc: netfilter, coreteam, netdev, linux-kernel, vvs, Andrey Vagin,
	Florian Westphal, Pablo Neira Ayuso, Patrick McHardy,
	Jozsef Kadlecsik, David S. Miller, Cyrill Gorcunov

Lets look at destroy_conntrack:

hlist_nulls_del_rcu(&ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode);
...
nf_conntrack_free(ct)
	kmem_cache_free(net->ct.nf_conntrack_cachep, ct);

net->ct.nf_conntrack_cachep is created with SLAB_DESTROY_BY_RCU.

The hash is protected by rcu, so readers look up conntracks without
locks.
A conntrack is removed from the hash, but in this moment a few readers
still can use the conntrack. Then this conntrack is released and another
thread creates conntrack with the same address and the equal tuple.
After this a reader starts to validate the conntrack:
* It's not dying, because a new conntrack was created
* nf_ct_tuple_equal() returns true.

But this conntrack is not initialized yet, so it can not be used by two
threads concurrently. In this case BUG_ON may be triggered from
nf_nat_setup_info().

Florian Westphal suggested to check the confirm bit too. I think it's
right.

task 1			task 2			task 3
			nf_conntrack_find_get
			 ____nf_conntrack_find
destroy_conntrack
 hlist_nulls_del_rcu
 nf_conntrack_free
 kmem_cache_free
						__nf_conntrack_alloc
						 kmem_cache_alloc
						 memset(&ct->tuplehash[IP_CT_DIR_MAX],
			 if (nf_ct_is_dying(ct))
			 if (!nf_ct_tuple_equal()

I'm not sure, that I have ever seen this race condition in a real life.
Currently we are investigating a bug, which is reproduced on a few node.
In our case one conntrack is initialized from a few tasks concurrently,
we don't have any other explanation for this.

<2>[46267.083061] kernel BUG at net/ipv4/netfilter/nf_nat_core.c:322!
...
<4>[46267.083951] RIP: 0010:[<ffffffffa01e00a4>]  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590 [nf_nat]
...
<4>[46267.085549] Call Trace:
<4>[46267.085622]  [<ffffffffa023421b>] alloc_null_binding+0x5b/0xa0 [iptable_nat]
<4>[46267.085697]  [<ffffffffa02342bc>] nf_nat_rule_find+0x5c/0x80 [iptable_nat]
<4>[46267.085770]  [<ffffffffa0234521>] nf_nat_fn+0x111/0x260 [iptable_nat]
<4>[46267.085843]  [<ffffffffa0234798>] nf_nat_out+0x48/0xd0 [iptable_nat]
<4>[46267.085919]  [<ffffffff814841b9>] nf_iterate+0x69/0xb0
<4>[46267.085991]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
<4>[46267.086063]  [<ffffffff81484374>] nf_hook_slow+0x74/0x110
<4>[46267.086133]  [<ffffffff81494e70>] ? ip_finish_output+0x0/0x2f0
<4>[46267.086207]  [<ffffffff814b5890>] ? dst_output+0x0/0x20
<4>[46267.086277]  [<ffffffff81495204>] ip_output+0xa4/0xc0
<4>[46267.086346]  [<ffffffff814b65a4>] raw_sendmsg+0x8b4/0x910
<4>[46267.086419]  [<ffffffff814c10fa>] inet_sendmsg+0x4a/0xb0
<4>[46267.086491]  [<ffffffff814459aa>] ? sock_update_classid+0x3a/0x50
<4>[46267.086562]  [<ffffffff81444d67>] sock_sendmsg+0x117/0x140
<4>[46267.086638]  [<ffffffff8151997b>] ? _spin_unlock_bh+0x1b/0x20
<4>[46267.086712]  [<ffffffff8109d370>] ? autoremove_wake_function+0x0/0x40
<4>[46267.086785]  [<ffffffff81495e80>] ? do_ip_setsockopt+0x90/0xd80
<4>[46267.086858]  [<ffffffff8100be0e>] ? call_function_interrupt+0xe/0x20
<4>[46267.086936]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
<4>[46267.087006]  [<ffffffff8118cb10>] ? ub_slab_ptr+0x20/0x90
<4>[46267.087081]  [<ffffffff8118f2e8>] ? kmem_cache_alloc+0xd8/0x1e0
<4>[46267.087151]  [<ffffffff81445599>] sys_sendto+0x139/0x190
<4>[46267.087229]  [<ffffffff81448c0d>] ? sock_setsockopt+0x16d/0x6f0
<4>[46267.087303]  [<ffffffff810efa47>] ? audit_syscall_entry+0x1d7/0x200
<4>[46267.087378]  [<ffffffff810ef795>] ? __audit_syscall_exit+0x265/0x290
<4>[46267.087454]  [<ffffffff81474885>] ? compat_sys_setsockopt+0x75/0x210
<4>[46267.087531]  [<ffffffff81474b5f>] compat_sys_socketcall+0x13f/0x210
<4>[46267.087607]  [<ffffffff8104dea3>] ia32_sysret+0x0/0x5
<4>[46267.087676] Code: 91 20 e2 01 75 29 48 89 de 4c 89 f7 e8 56 fa ff ff 85 c0 0f 84 68 fc ff ff 0f b6 4d c6 41 8b 45 00 e9 4d fb ff ff e8 7c 19 e9 e0 <0f> 0b eb fe f6 05 17 91 20 e2 80 74 ce 80 3d 5f 2e 00 00 00 74
<1>[46267.088023] RIP  [<ffffffffa01e00a4>] nf_nat_setup_info+0x564/0x590

Cc: Florian Westphal <fw@strlen.de>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
---
 net/netfilter/nf_conntrack_core.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 43549eb..7a34bb2 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -387,8 +387,12 @@ begin:
 			     !atomic_inc_not_zero(&ct->ct_general.use)))
 			h = NULL;
 		else {
+			/* A conntrack can be recreated with the equal tuple,
+			 * so we need to check that the conntrack is initialized
+			 */
 			if (unlikely(!nf_ct_tuple_equal(tuple, &h->tuple) ||
-				     nf_ct_zone(ct) != zone)) {
+				     nf_ct_zone(ct) != zone) ||
+				     !nf_ct_is_confirmed(ct)) {
 				nf_ct_put(ct);
 				goto begin;
 			}
-- 
1.8.4.2

^ permalink raw reply related	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2014-02-03 16:22 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-07 10:31 [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get Andrey Vagin
2014-01-07 11:42 ` Vasily Averin
2014-01-07 15:08 ` Eric Dumazet
2014-01-07 15:25   ` Florian Westphal
2014-01-08 13:42     ` Eric Dumazet
2014-01-08 14:04       ` Florian Westphal
2014-01-08 17:31         ` Eric Dumazet
2014-01-08 20:18           ` Florian Westphal
2014-01-08 20:23             ` Florian Westphal
2014-01-09 20:32     ` Andrew Vagin
2014-01-09 20:32       ` Andrew Vagin
2014-01-09 20:56       ` Florian Westphal
2014-01-09 21:07         ` Andrew Vagin
2014-01-09 21:07           ` Andrew Vagin
2014-01-09 21:26           ` Florian Westphal
2014-01-09  5:24   ` Andrew Vagin
2014-01-09 15:23     ` Eric Dumazet
2014-01-09 21:46       ` Andrey Wagin
2014-01-08 13:17 ` [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get (v2) Andrey Vagin
2014-01-08 13:47   ` Eric Dumazet
2014-01-12 17:50     ` [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get (v3) Andrey Vagin
2014-01-12 20:21       ` Eric Dumazet
2014-01-14 10:51         ` Andrew Vagin
2014-01-14 10:51           ` Andrew Vagin
2014-01-14 10:51           ` Andrew Vagin
2014-01-14 11:10           ` Andrey Wagin
2014-01-14 14:36           ` Eric Dumazet
2014-01-14 17:35             ` [PATCH] [RFC] netfilter: nf_conntrack: don't relase a conntrack with non-zero refcnt Andrey Vagin
2014-01-14 17:44               ` Cyrill Gorcunov
2014-01-14 18:53               ` Florian Westphal
2014-01-15 18:08                 ` Andrew Vagin
2014-01-15 18:08                   ` Andrew Vagin
2014-01-16  9:23                   ` Florian Westphal
2014-02-02 23:30                     ` Pablo Neira Ayuso
2014-02-03 13:59                       ` Andrew Vagin
2014-02-03 13:59                         ` Andrew Vagin
2014-02-03 16:22                       ` Eric Dumazet
2014-01-27 13:44               ` Andrew Vagin
2014-01-27 13:44                 ` Andrew Vagin
2014-01-29 19:21         ` [PATCH] netfilter: nf_conntrack: fix RCU race in nf_conntrack_find_get (v3) Pablo Neira Ayuso

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.