All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <eric.dumazet@gmail.com>
To: Patrick McHardy <kaber@trash.net>
Cc: David Miller <davem@davemloft.net>,
	netdev@vger.kernel.org, paulmck@linux.vnet.ibm.com
Subject: [PATCH] net: nf_conntrack_alloc() fixes
Date: Wed, 15 Jul 2009 21:54:01 +0200	[thread overview]
Message-ID: <4A5E33D9.2030602@gmail.com> (raw)
In-Reply-To: <4A5DF5B4.5090809@trash.net>

Patrick McHardy a écrit :
> Eric Dumazet wrote:
>> [PATCH] net: nf_conntrack_alloc() should not use kmem_cache_zalloc()
>>
>> When a slab cache uses SLAB_DESTROY_BY_RCU, we must be careful when allocating
>> objects, since slab allocator could give a freed object still used by lockless
>> readers.
>>
>> In particular, nf_conntrack RCU lookups rely on ct->tuplehash[xxx].hnnode.next
>> being always valid (ie containing a valid 'nulls' value, or a valid pointer to next
>> object in hash chain.)
>>
>> kmem_cache_zalloc() setups object with NULL values, but a NULL value is not valid
>> for ct->tuplehash[xxx].hnnode.next.
>>
>> Fix is to call kmem_cache_alloc() and do the zeroing ourself.
> 
> I think this is still racy, please see below:

Nice catch indeed !

> 
>> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
>> index 7508f11..23feafa 100644
>> --- a/net/netfilter/nf_conntrack_core.c
>> +++ b/net/netfilter/nf_conntrack_core.c
>> @@ -561,17 +561,28 @@ struct nf_conn *nf_conntrack_alloc(struct net *net,
>>  		}
>>  	}
>>  
>> -	ct = kmem_cache_zalloc(nf_conntrack_cachep, gfp);
>> +	/*
>> +	 * Do not use kmem_cache_zalloc(), as this cache uses
>> +	 * SLAB_DESTROY_BY_RCU.
>> +	 */
>> +	ct = kmem_cache_alloc(nf_conntrack_cachep, gfp);
>>  	if (ct == NULL) {
>>  		pr_debug("nf_conntrack_alloc: Can't alloc conntrack.\n");
>>  		atomic_dec(&net->ct.count);
>>  		return ERR_PTR(-ENOMEM);
>>  	}
>> -
> 
> __nf_conntrack_find() on another CPU finds the entry at this point.
> 
>> +	/*
>> +	 * Let ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode.next
>> +	 * and ct->tuplehash[IP_CT_DIR_REPLY].hnnode.next unchanged.
>> +	 */
>> +	memset(&ct->tuplehash[IP_CT_DIR_MAX], 0,
>> +	       sizeof(*ct) - offsetof(struct nf_conn, tuplehash[IP_CT_DIR_MAX]));
>>  	spin_lock_init(&ct->lock);
>>  	atomic_set(&ct->ct_general.use, 1);
> 
> nf_conntrack_find_get() successfully tries to atomic_inc_not_zero()
> at this point, following by another tuple comparison which is also
> successful.
> 
> Am I missing something? I think we need to make sure the reference
> count is not increased until the new tuples are visible.

Yes you are right, and Documentation/RCU/rculist_nulls.txt should be
updated to reflect this as well (in insert algo, must use smp_wmb()
between obj->key assignment and refcnt assigment)

We'll have to change socket allocation too, this will be addressed
by a followup patch

Thanks Patrick !

[PATCH] net: nf_conntrack_alloc() fixes

When a slab cache uses SLAB_DESTROY_BY_RCU, we must be careful when allocating
objects, since slab allocator could give a freed object still used by lockless
readers.

In particular, nf_conntrack RCU lookups rely on ct->tuplehash[xxx].hnnode.next
being always valid (ie containing a valid 'nulls' value, or a valid pointer to next
object in hash chain.)

kmem_cache_zalloc() setups object with NULL values, but a NULL value is not valid
for ct->tuplehash[xxx].hnnode.next.

Fix is to call kmem_cache_alloc() and do the zeroing ourself.

As spotted by Patrick, we also need to make sure lookup keys are committed to
memory before setting refcount to 1, or a lockless reader could get a reference
on the old version of the object. Its key re-check could then pass the barrier. 

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 Documentation/RCU/rculist_nulls.txt |    7 ++++++-
 net/netfilter/nf_conntrack_core.c   |   21 ++++++++++++++++++---
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/Documentation/RCU/rculist_nulls.txt b/Documentation/RCU/rculist_nulls.txt
index 93cb28d..18f9651 100644
--- a/Documentation/RCU/rculist_nulls.txt
+++ b/Documentation/RCU/rculist_nulls.txt
@@ -83,11 +83,12 @@ not detect it missed following items in original chain.
 obj = kmem_cache_alloc(...);
 lock_chain(); // typically a spin_lock()
 obj->key = key;
-atomic_inc(&obj->refcnt);
 /*
  * we need to make sure obj->key is updated before obj->next
+ * or obj->refcnt
  */
 smp_wmb();
+atomic_set(&obj->refcnt, 1);
 hlist_add_head_rcu(&obj->obj_node, list);
 unlock_chain(); // typically a spin_unlock()
 
@@ -159,6 +160,10 @@ out:
 obj = kmem_cache_alloc(cachep);
 lock_chain(); // typically a spin_lock()
 obj->key = key;
+/*
+ * changes to obj->key must be visible before refcnt one
+ */
+smp_wmb();
 atomic_set(&obj->refcnt, 1);
 /*
  * insert obj in RCU way (readers might be traversing chain)
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 7508f11..b5869b9 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -561,23 +561,38 @@ struct nf_conn *nf_conntrack_alloc(struct net *net,
 		}
 	}
 
-	ct = kmem_cache_zalloc(nf_conntrack_cachep, gfp);
+	/*
+	 * Do not use kmem_cache_zalloc(), as this cache uses
+	 * SLAB_DESTROY_BY_RCU.
+	 */
+	ct = kmem_cache_alloc(nf_conntrack_cachep, gfp);
 	if (ct == NULL) {
 		pr_debug("nf_conntrack_alloc: Can't alloc conntrack.\n");
 		atomic_dec(&net->ct.count);
 		return ERR_PTR(-ENOMEM);
 	}
-
+	/*
+	 * Let ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode.next
+	 * and ct->tuplehash[IP_CT_DIR_REPLY].hnnode.next unchanged.
+	 */
+	memset(&ct->tuplehash[IP_CT_DIR_MAX], 0,
+	       sizeof(*ct) - offsetof(struct nf_conn, tuplehash[IP_CT_DIR_MAX]));
 	spin_lock_init(&ct->lock);
-	atomic_set(&ct->ct_general.use, 1);
 	ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple = *orig;
+	ct->tuplehash[IP_CT_DIR_ORIGINAL].hnnode.pprev = NULL;
 	ct->tuplehash[IP_CT_DIR_REPLY].tuple = *repl;
+	ct->tuplehash[IP_CT_DIR_REPLY].hnnode.pprev = NULL;
 	/* Don't set timer yet: wait for confirmation */
 	setup_timer(&ct->timeout, death_by_timeout, (unsigned long)ct);
 #ifdef CONFIG_NET_NS
 	ct->ct_net = net;
 #endif
 
+	/*
+	 * changes to lookup keys must be done before setting refcnt to 1
+	 */
+	smp_wmb();
+	atomic_set(&ct->ct_general.use, 1);
 	return ct;
 }
 EXPORT_SYMBOL_GPL(nf_conntrack_alloc);

  reply	other threads:[~2009-07-15 19:54 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-07-01 18:08 WARNING: at include/net/sock.h:417 udp_lib_unhash Tantilov, Emil S
2009-07-02  6:10 ` Eric Dumazet
2009-07-07  0:54   ` Emil S Tantilov
2009-07-07  7:21     ` Eric Dumazet
2009-07-07  7:40       ` Eric Dumazet
2009-07-07 16:14         ` [PATCH] net: sk_alloc() should not blindly overwrite memory Eric Dumazet
2009-07-07 18:33           ` Tantilov, Emil S
2009-07-07 22:33             ` [PATCH] net: sk_prot_alloc() " Eric Dumazet
2009-07-08  2:14               ` David Miller
2009-07-08  6:50                 ` Eric Dumazet
2009-07-09  5:36                   ` Eric Dumazet
2009-07-09 17:13                     ` Paul E. McKenney
2009-07-09 20:50                       ` Eric Dumazet
2009-07-12  3:27                     ` David Miller
2009-07-12  7:07                       ` Eric Dumazet
2009-07-15 12:28                         ` [PATCH] net: nf_conntrack_alloc() should not use kmem_cache_zalloc() Eric Dumazet
2009-07-15 15:28                           ` Patrick McHardy
2009-07-15 19:54                             ` Eric Dumazet [this message]
2009-07-16  9:13                               ` [PATCH] net: sock_copy() fixes Eric Dumazet
2009-07-17  1:09                                 ` David Miller
2009-07-16 12:05                               ` [PATCH] net: nf_conntrack_alloc() fixes Patrick McHardy
2009-07-08 17:02                 ` [PATCH] net: sk_prot_alloc() should not blindly overwrite memory Tantilov, Emil S
2009-07-08 17:45                   ` David Miller
2009-07-08 23:21                     ` Eric Dumazet
2009-07-08 23:35                       ` Tantilov, Emil S
2009-07-09  0:20                         ` [PATCH] net: ip_push_pending_frames() fix Eric Dumazet
2009-07-09 14:32                           ` Tantilov, Emil S
2009-07-09 14:38                             ` Eric Dumazet
2009-07-12  3:27                           ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A5E33D9.2030602@gmail.com \
    --to=eric.dumazet@gmail.com \
    --cc=davem@davemloft.net \
    --cc=kaber@trash.net \
    --cc=netdev@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.