All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <eric.dumazet@gmail.com>
To: David Miller <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>,
	netdev@vger.kernel.org, paulmck@linux.vnet.ibm.com
Subject: [PATCH] net: sock_copy() fixes
Date: Thu, 16 Jul 2009 11:13:10 +0200	[thread overview]
Message-ID: <4A5EEF26.5040209@gmail.com> (raw)
In-Reply-To: <4A5E33D9.2030602@gmail.com>

David, I believe following patch could be 2.6.31 candidate
(and stable (2.6.29/2.6.30) as well)

Note : Because we clear all sock fields in sk_prot_alloc(),
setting sk_refcnt to 1 in sock_init_data() is safe, even if
sock not yet added to hash table (with correct keys).
Lockless readers have to re-check their key after getting a
reference on the found object anyway, their re-check
should not catch old key values.

Next problem to fix (but it can wait 2.6.32 I believe) is
to delay sk_refcnt initialization *after* sock lookup keys
are setup for new socket, not after being set to 0.

A lockless reader could :
- find an old sock, with old keys
      (at this point, another cpu re-allocate this sock, clear fields,
       and sets sk_refcnt to 1)
- succeed to increment sk_refcnt
      (at this point, other cpu is filling lookup fields (remote address/port, local address...)
- re-check keys and find a match, while socket still being initialized by other cpu.

Only way to fix this is to set sk_refcnt to a non null value only at the
point it is safe for a reader to find the socket and use it.

Thanks

[PATCH] net: sock_copy() fixes

Commit e912b1142be8f1e2c71c71001dc992c6e5eb2ec1
(net: sk_prot_alloc() should not blindly overwrite memory)
took care of not zeroing whole new socket at allocation time.

sock_copy() is another spot where we should be very careful.
We should not set refcnt to a non null value, until
we are sure other fields are correctly setup, or
a lockless reader could catch this socket by mistake,
while not fully (re)initialized.

This patch puts sk_node & sk_refcnt to the very beginning
of struct sock to ease sock_copy() & sk_prot_alloc() job.

We add appropriate smp_wmb() before sk_refcnt initializations
to match our RCU requirements (changes to sock keys should
be committed to memory before sk_refcnt setting)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sock.h |   32 +++++++++++++++++++-------------
 net/core/sock.c    |   20 ++++++++++++++++++--
 2 files changed, 37 insertions(+), 15 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 2c0da92..ae0906e 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -104,15 +104,15 @@ struct net;
 
 /**
  *	struct sock_common - minimal network layer representation of sockets
+ *	@skc_node: main hash linkage for various protocol lookup tables
+ *	@skc_nulls_node: main hash linkage for UDP/UDP-Lite protocol
+ *	@skc_refcnt: reference count
+ *	@skc_hash: hash value used with various protocol lookup tables
  *	@skc_family: network address family
  *	@skc_state: Connection state
  *	@skc_reuse: %SO_REUSEADDR setting
  *	@skc_bound_dev_if: bound device index if != 0
- *	@skc_node: main hash linkage for various protocol lookup tables
- *	@skc_nulls_node: main hash linkage for UDP/UDP-Lite protocol
  *	@skc_bind_node: bind hash linkage for various protocol lookup tables
- *	@skc_refcnt: reference count
- *	@skc_hash: hash value used with various protocol lookup tables
  *	@skc_prot: protocol handlers inside a network family
  *	@skc_net: reference to the network namespace of this socket
  *
@@ -120,17 +120,21 @@ struct net;
  *	for struct sock and struct inet_timewait_sock.
  */
 struct sock_common {
-	unsigned short		skc_family;
-	volatile unsigned char	skc_state;
-	unsigned char		skc_reuse;
-	int			skc_bound_dev_if;
+	/*
+	 * first fields are not copied in sock_copy()
+	 */
 	union {
 		struct hlist_node	skc_node;
 		struct hlist_nulls_node skc_nulls_node;
 	};
-	struct hlist_node	skc_bind_node;
 	atomic_t		skc_refcnt;
+
 	unsigned int		skc_hash;
+	unsigned short		skc_family;
+	volatile unsigned char	skc_state;
+	unsigned char		skc_reuse;
+	int			skc_bound_dev_if;
+	struct hlist_node	skc_bind_node;
 	struct proto		*skc_prot;
 #ifdef CONFIG_NET_NS
 	struct net	 	*skc_net;
@@ -208,15 +212,17 @@ struct sock {
 	 * don't add nothing before this first member (__sk_common) --acme
 	 */
 	struct sock_common	__sk_common;
+#define sk_node			__sk_common.skc_node
+#define sk_nulls_node		__sk_common.skc_nulls_node
+#define sk_refcnt		__sk_common.skc_refcnt
+
+#define sk_copy_start		__sk_common.skc_hash
+#define sk_hash			__sk_common.skc_hash
 #define sk_family		__sk_common.skc_family
 #define sk_state		__sk_common.skc_state
 #define sk_reuse		__sk_common.skc_reuse
 #define sk_bound_dev_if		__sk_common.skc_bound_dev_if
-#define sk_node			__sk_common.skc_node
-#define sk_nulls_node		__sk_common.skc_nulls_node
 #define sk_bind_node		__sk_common.skc_bind_node
-#define sk_refcnt		__sk_common.skc_refcnt
-#define sk_hash			__sk_common.skc_hash
 #define sk_prot			__sk_common.skc_prot
 #define sk_net			__sk_common.skc_net
 	kmemcheck_bitfield_begin(flags);
diff --git a/net/core/sock.c b/net/core/sock.c
index ba5d211..d9eec15 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -919,13 +919,19 @@ static inline void sock_lock_init(struct sock *sk)
 			af_family_keys + sk->sk_family);
 }
 
+/*
+ * Copy all fields from osk to nsk but nsk->sk_refcnt must not change yet,
+ * even temporarly, because of RCU lookups. sk_node should also be left as is.
+ */
 static void sock_copy(struct sock *nsk, const struct sock *osk)
 {
 #ifdef CONFIG_SECURITY_NETWORK
 	void *sptr = nsk->sk_security;
 #endif
-
-	memcpy(nsk, osk, osk->sk_prot->obj_size);
+	BUILD_BUG_ON(offsetof(struct sock, sk_copy_start) !=
+		     sizeof(osk->sk_node) + sizeof(osk->sk_refcnt));
+	memcpy(&nsk->sk_copy_start, &osk->sk_copy_start,
+	       osk->sk_prot->obj_size - offsetof(struct sock, sk_copy_start));
 #ifdef CONFIG_SECURITY_NETWORK
 	nsk->sk_security = sptr;
 	security_sk_clone(osk, nsk);
@@ -1140,6 +1146,11 @@ struct sock *sk_clone(const struct sock *sk, const gfp_t priority)
 
 		newsk->sk_err	   = 0;
 		newsk->sk_priority = 0;
+		/*
+		 * Before updating sk_refcnt, we must commit prior changes to memory
+		 * (Documentation/RCU/rculist_nulls.txt for details)
+		 */
+		smp_wmb();
 		atomic_set(&newsk->sk_refcnt, 2);
 
 		/*
@@ -1855,6 +1866,11 @@ void sock_init_data(struct socket *sock, struct sock *sk)
 
 	sk->sk_stamp = ktime_set(-1L, 0);
 
+	/*
+	 * Before updating sk_refcnt, we must commit prior changes to memory
+	 * (Documentation/RCU/rculist_nulls.txt for details)
+	 */
+	smp_wmb();
 	atomic_set(&sk->sk_refcnt, 1);
 	atomic_set(&sk->sk_wmem_alloc, 1);
 	atomic_set(&sk->sk_drops, 0);

  reply	other threads:[~2009-07-16  9:13 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-07-01 18:08 WARNING: at include/net/sock.h:417 udp_lib_unhash Tantilov, Emil S
2009-07-02  6:10 ` Eric Dumazet
2009-07-07  0:54   ` Emil S Tantilov
2009-07-07  7:21     ` Eric Dumazet
2009-07-07  7:40       ` Eric Dumazet
2009-07-07 16:14         ` [PATCH] net: sk_alloc() should not blindly overwrite memory Eric Dumazet
2009-07-07 18:33           ` Tantilov, Emil S
2009-07-07 22:33             ` [PATCH] net: sk_prot_alloc() " Eric Dumazet
2009-07-08  2:14               ` David Miller
2009-07-08  6:50                 ` Eric Dumazet
2009-07-09  5:36                   ` Eric Dumazet
2009-07-09 17:13                     ` Paul E. McKenney
2009-07-09 20:50                       ` Eric Dumazet
2009-07-12  3:27                     ` David Miller
2009-07-12  7:07                       ` Eric Dumazet
2009-07-15 12:28                         ` [PATCH] net: nf_conntrack_alloc() should not use kmem_cache_zalloc() Eric Dumazet
2009-07-15 15:28                           ` Patrick McHardy
2009-07-15 19:54                             ` [PATCH] net: nf_conntrack_alloc() fixes Eric Dumazet
2009-07-16  9:13                               ` Eric Dumazet [this message]
2009-07-17  1:09                                 ` [PATCH] net: sock_copy() fixes David Miller
2009-07-16 12:05                               ` [PATCH] net: nf_conntrack_alloc() fixes Patrick McHardy
2009-07-08 17:02                 ` [PATCH] net: sk_prot_alloc() should not blindly overwrite memory Tantilov, Emil S
2009-07-08 17:45                   ` David Miller
2009-07-08 23:21                     ` Eric Dumazet
2009-07-08 23:35                       ` Tantilov, Emil S
2009-07-09  0:20                         ` [PATCH] net: ip_push_pending_frames() fix Eric Dumazet
2009-07-09 14:32                           ` Tantilov, Emil S
2009-07-09 14:38                             ` Eric Dumazet
2009-07-12  3:27                           ` David Miller

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4A5EEF26.5040209@gmail.com \
    --to=eric.dumazet@gmail.com \
    --cc=davem@davemloft.net \
    --cc=kaber@trash.net \
    --cc=netdev@vger.kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.