Re: [PATCH] netfilter: per netns nf_conntrack_cachep

netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: Patrick McHardy <kaber@trash.net>
To: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>,
	Jon Masters <jonathan@jonmasters.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	netdev <netdev@vger.kernel.org>,
	netfilter-devel <netfilter-devel@vger.kernel.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [PATCH] netfilter: per netns nf_conntrack_cachep
Date: Thu, 04 Feb 2010 15:00:44 +0100	[thread overview]
Message-ID: <4B6AD30C.8050107@trash.net> (raw)
In-Reply-To: <1265035970.2848.50.camel@edumazet-laptop>

[-- Attachment #1: Type: text/plain, Size: 994 bytes --]

Eric Dumazet wrote:
> [PATCH] netfilter: per netns nf_conntrack_cachep
> 
> nf_conntrack_cachep is currently shared by all netns instances, but
> because of SLAB_DESTROY_BY_RCU special semantics, this is wrong.
> 
> If we use a shared slab cache, one object can instantly flight between
> one hash table (netns ONE) to another one (netns TWO), and concurrent
> reader (doing a lookup in netns ONE, 'finding' an object of netns TWO)
> can be fooled without notice, because no RCU grace period has to be
> observed between object freeing and its reuse.
> 
> We dont have this problem with UDP/TCP slab caches because TCP/UDP
> hashtables are global to the machine (and each object has a pointer to
> its netns).
> 
> If we use per netns conntrack hash tables, we also *must* use per netns
> conntrack slab caches, to guarantee an object can not escape from one
> namespace to another one.

Applied with the discussed change to allocate a unique name (attached
again for reference), thanks Eric.


[-- Attachment #2: x --]
[-- Type: text/plain, Size: 5090 bytes --]

commit ab59b19be78aac65cdd599fb5002c9019885e061
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Thu Feb 4 14:54:05 2010 +0100

    netfilter: nf_conntrack: per netns nf_conntrack_cachep
    
    nf_conntrack_cachep is currently shared by all netns instances, but
    because of SLAB_DESTROY_BY_RCU special semantics, this is wrong.
    
    If we use a shared slab cache, one object can instantly flight between
    one hash table (netns ONE) to another one (netns TWO), and concurrent
    reader (doing a lookup in netns ONE, 'finding' an object of netns TWO)
    can be fooled without notice, because no RCU grace period has to be
    observed between object freeing and its reuse.
    
    We dont have this problem with UDP/TCP slab caches because TCP/UDP
    hashtables are global to the machine (and each object has a pointer to
    its netns).
    
    If we use per netns conntrack hash tables, we also *must* use per netns
    conntrack slab caches, to guarantee an object can not escape from one
    namespace to another one.
    
    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    [Patrick: added unique slab name allocation]
    Signed-off-by: Patrick McHardy <kaber@trash.net>

diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index ba1ba0c..aed23b6 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -11,6 +11,7 @@ struct nf_conntrack_ecache;
 struct netns_ct {
 	atomic_t		count;
 	unsigned int		expect_count;
+	struct kmem_cache	*nf_conntrack_cachep;
 	struct hlist_nulls_head	*hash;
 	struct hlist_head	*expect_hash;
 	struct hlist_nulls_head	unconfirmed;
@@ -28,5 +29,6 @@ struct netns_ct {
 #endif
 	int			hash_vmalloc;
 	int			expect_vmalloc;
+	char			*slabname;
 };
 #endif
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 37e2b88..9de4bd4 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -63,8 +63,6 @@ EXPORT_SYMBOL_GPL(nf_conntrack_max);
 struct nf_conn nf_conntrack_untracked __read_mostly;
 EXPORT_SYMBOL_GPL(nf_conntrack_untracked);
 
-static struct kmem_cache *nf_conntrack_cachep __read_mostly;
-
 static int nf_conntrack_hash_rnd_initted;
 static unsigned int nf_conntrack_hash_rnd;
 
@@ -572,7 +570,7 @@ struct nf_conn *nf_conntrack_alloc(struct net *net,
 	 * Do not use kmem_cache_zalloc(), as this cache uses
 	 * SLAB_DESTROY_BY_RCU.
 	 */
-	ct = kmem_cache_alloc(nf_conntrack_cachep, gfp);
+	ct = kmem_cache_alloc(net->ct.nf_conntrack_cachep, gfp);
 	if (ct == NULL) {
 		pr_debug("nf_conntrack_alloc: Can't alloc conntrack.\n");
 		atomic_dec(&net->ct.count);
@@ -611,7 +609,7 @@ void nf_conntrack_free(struct nf_conn *ct)
 	nf_ct_ext_destroy(ct);
 	atomic_dec(&net->ct.count);
 	nf_ct_ext_free(ct);
-	kmem_cache_free(nf_conntrack_cachep, ct);
+	kmem_cache_free(net->ct.nf_conntrack_cachep, ct);
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_free);
 
@@ -1119,7 +1117,6 @@ static void nf_conntrack_cleanup_init_net(void)
 
 	nf_conntrack_helper_fini();
 	nf_conntrack_proto_fini();
-	kmem_cache_destroy(nf_conntrack_cachep);
 }
 
 static void nf_conntrack_cleanup_net(struct net *net)
@@ -1137,6 +1134,8 @@ static void nf_conntrack_cleanup_net(struct net *net)
 	nf_conntrack_ecache_fini(net);
 	nf_conntrack_acct_fini(net);
 	nf_conntrack_expect_fini(net);
+	kmem_cache_destroy(net->ct.nf_conntrack_cachep);
+	kfree(net->ct.slabname);
 	free_percpu(net->ct.stat);
 }
 
@@ -1272,15 +1271,6 @@ static int nf_conntrack_init_init_net(void)
 	       NF_CONNTRACK_VERSION, nf_conntrack_htable_size,
 	       nf_conntrack_max);
 
-	nf_conntrack_cachep = kmem_cache_create("nf_conntrack",
-						sizeof(struct nf_conn),
-						0, SLAB_DESTROY_BY_RCU, NULL);
-	if (!nf_conntrack_cachep) {
-		printk(KERN_ERR "Unable to create nf_conn slab cache\n");
-		ret = -ENOMEM;
-		goto err_cache;
-	}
-
 	ret = nf_conntrack_proto_init();
 	if (ret < 0)
 		goto err_proto;
@@ -1302,8 +1292,6 @@ static int nf_conntrack_init_init_net(void)
 err_helper:
 	nf_conntrack_proto_fini();
 err_proto:
-	kmem_cache_destroy(nf_conntrack_cachep);
-err_cache:
 	return ret;
 }
 
@@ -1325,6 +1313,21 @@ static int nf_conntrack_init_net(struct net *net)
 		ret = -ENOMEM;
 		goto err_stat;
 	}
+
+	net->ct.slabname = kasprintf(GFP_KERNEL, "nf_conntrack_%p", net);
+	if (!net->ct.slabname) {
+		ret = -ENOMEM;
+		goto err_slabname;
+	}
+
+	net->ct.nf_conntrack_cachep = kmem_cache_create(net->ct.slabname,
+							sizeof(struct nf_conn), 0,
+							SLAB_DESTROY_BY_RCU, NULL);
+	if (!net->ct.nf_conntrack_cachep) {
+		printk(KERN_ERR "Unable to create nf_conn slab cache\n");
+		ret = -ENOMEM;
+		goto err_cache;
+	}
 	net->ct.hash = nf_ct_alloc_hashtable(&nf_conntrack_htable_size,
 					     &net->ct.hash_vmalloc, 1);
 	if (!net->ct.hash) {
@@ -1352,6 +1355,10 @@ err_expect:
 	nf_ct_free_hashtable(net->ct.hash, net->ct.hash_vmalloc,
 			     nf_conntrack_htable_size);
 err_hash:
+	kmem_cache_destroy(net->ct.nf_conntrack_cachep);
+err_cache:
+	kfree(net->ct.slabname);
+err_slabname:
 	free_percpu(net->ct.stat);
 err_stat:
 	return ret;

next prev parent reply	other threads:[~2010-02-04 14:00 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-01-30  1:10 debug: nt_conntrack and KVM crash Jon Masters
2010-01-30  1:57 ` Jon Masters
2010-01-30  1:59   ` Jon Masters
2010-01-30  6:58     ` Eric Dumazet
2010-01-30  7:36       ` Jon Masters
2010-01-30  7:40         ` Jon Masters
2010-01-30  8:33         ` Eric Dumazet
2010-01-30 10:03           ` Jon Masters
2010-02-01  9:32       ` Jon Masters
2010-02-01  9:36         ` Alexey Dobriyan
2010-02-01 10:12           ` Eric Dumazet
2010-02-01 10:25             ` Alexey Dobriyan
2010-02-01 10:38               ` Jon Masters
2010-02-01 11:23               ` Eric Dumazet
2010-02-01 14:48                 ` Alexey Dobriyan
2010-02-01 14:57                   ` Eric Dumazet
2010-02-01 14:52                 ` [PATCH] netfilter: per netns nf_conntrack_cachep Eric Dumazet
2010-02-01 14:58                   ` Alexey Dobriyan
2010-02-01 15:02                     ` Eric Dumazet
2010-02-02 11:04                       ` Jon Masters
2010-02-02 11:35                         ` Jon Masters
2010-02-02 16:46                           ` Jon Masters
2010-02-02 16:48                             ` Patrick McHardy
2010-02-02 17:07                               ` Jon Masters
2010-02-02 17:58                                 ` Alexey Dobriyan
2010-02-02 18:16                                   ` Jon Masters
2010-02-02 18:34                                     ` Jon Masters
2010-02-02 18:36                                     ` Patrick McHardy
2010-02-02 18:39                                       ` Jon Masters
2010-02-02 18:42                                         ` Jon Masters
2010-02-03 12:10                                       ` Patrick McHardy
2010-02-03 18:38                                         ` Jon Masters
2010-02-03 19:09                                           ` Alexey Dobriyan
2010-02-03 19:43                                             ` Jon Masters
2010-02-03 19:46                                               ` Jon Masters
2010-02-03 19:53                                                 ` Alexey Dobriyan
2010-02-03 20:04                                                   ` Jon Masters
2010-02-03 19:51                                               ` Alexey Dobriyan
2010-02-03 19:53                                                 ` Jon Masters
2010-02-03 20:01                                                   ` Alexey Dobriyan
2010-02-04 12:25                                               ` Patrick McHardy
2010-02-04 12:27                                                 ` Alexey Dobriyan
2010-02-04 12:30                                                   ` Patrick McHardy
2010-02-04 12:35                                                     ` Alexey Dobriyan
2010-02-04 13:04                                                       ` Patrick McHardy
2010-02-04 13:18                                                         ` Jon Masters
2010-02-04 13:37                                                           ` Patrick McHardy
2010-02-04 13:42                                                             ` Jon Masters
2010-02-03 20:21                                         ` Jon Masters
2010-02-04 12:24                                           ` Patrick McHardy
2010-02-02 16:58                             ` PROBLEM with summary: " Jon Masters
2010-02-02 17:04                               ` Patrick McHardy
2010-02-02 17:16                                 ` Eric Dumazet
2010-02-02 17:23                                   ` Jon Masters
2010-02-02  4:36                   ` Jon Masters
2010-02-02  7:02                     ` Jon Masters
2010-02-02 10:47                   ` Jon Masters
2010-02-04 14:00                   ` Patrick McHardy [this message]
2010-02-01 10:35           ` debug: nt_conntrack and KVM crash Jon Masters
2010-02-01 10:44             ` Alexey Dobriyan
2010-02-01 10:47               ` Alexey Dobriyan
2010-02-01 10:49                 ` Alexey Dobriyan
2010-02-01 10:53                   ` Jon Masters

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:ba1ba0c dfblob:aed23b6 dfblob:37e2b88 dfblob:9de4bd4 )
 OR (
bs:"Re: [PATCH] netfilter: per netns nf_conntrack_cachep" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4B6AD30C.8050107@trash.net \
    --to=kaber@trash.net \
    --cc=adobriyan@gmail.com \
    --cc=eric.dumazet@gmail.com \
    --cc=jonathan@jonmasters.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=netfilter-devel@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).