netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/5] Further SCTP changes
@ 2013-06-18  8:55 Daniel Borkmann
  2013-06-18  8:55 ` [PATCH net-next 1/5] net: sctp: remove TEST_FRAME ifdef Daniel Borkmann
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Daniel Borkmann @ 2013-06-18  8:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-sctp

Mostly it's all about getting rid of unnecessary code, and/or making
it more clean and readable.

Code has been tested with lksctp-tools functional test suite and
also my small sctp stress test helper that was used to discover
the recent null-ptr derefs. This test was also sucessfully run against
auth_enabled=0 and auth_enabled=1, where I also previously applied
the following two patches:

  - http://marc.info/?l=linux-crypto-vger&m=137121898331017&w=2
    -> not yet applied, still pending review for crypto tree
  - 1abd165 (net: sctp: fix NULL pointer dereference in socket destruction)
    -> cherry picked from net tree

Daniel Borkmann (5):
  net: sctp: remove TEST_FRAME ifdef
  ktime: add ms_to_ktime() and ktime_add_ms() helpers
  net: sctp: migrate cookie life from timeval to ktime
  net: sctp: decouple cleaning socket data from endpoint
  net: sctp: minor: sctp_seq_dump_local_addrs add missing newline

 include/linux/ktime.h      | 13 +++++++++++++
 include/net/sctp/sctp.h    | 25 -------------------------
 include/net/sctp/structs.h |  6 +++---
 net/sctp/associola.c       |  8 +-------
 net/sctp/endpointola.c     |  7 -------
 net/sctp/proc.c            |  2 +-
 net/sctp/sm_make_chunk.c   | 19 ++++++++-----------
 net/sctp/socket.c          | 34 ++++++++++++++++++++++------------
 8 files changed, 48 insertions(+), 66 deletions(-)

-- 
1.7.11.7

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH net-next 1/5] net: sctp: remove TEST_FRAME ifdef
  2013-06-18  8:55 [PATCH net-next 0/5] Further SCTP changes Daniel Borkmann
@ 2013-06-18  8:55 ` Daniel Borkmann
  2013-06-18  8:55 ` [PATCH net-next 2/5] ktime: add ms_to_ktime() and ktime_add_ms() helpers Daniel Borkmann
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Daniel Borkmann @ 2013-06-18  8:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-sctp

We do neither ship a test_frame.h, nor will this be compatible with
the 2.5 out-of-tree lksctp kernel test suite anyway. So remove this
artefact.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 include/net/sctp/sctp.h | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 6321c08..15214a8 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -188,11 +188,6 @@ extern struct kmem_cache *sctp_bucket_cachep __read_mostly;
  *  Section:  Macros, externs, and inlines
  */
 
-
-#ifdef TEST_FRAME
-#include <test_frame.h>
-#else
-
 /* spin lock wrappers. */
 #define sctp_spin_lock_irqsave(lock, flags) spin_lock_irqsave(lock, flags)
 #define sctp_spin_unlock_irqrestore(lock, flags)  \
@@ -218,8 +213,6 @@ extern struct kmem_cache *sctp_bucket_cachep __read_mostly;
 #define SCTP_INC_STATS_USER(net, field) SNMP_INC_STATS_USER((net)->sctp.sctp_statistics, field)
 #define SCTP_DEC_STATS(net, field)      SNMP_DEC_STATS((net)->sctp.sctp_statistics, field)
 
-#endif /* !TEST_FRAME */
-
 /* sctp mib definitions */
 enum {
 	SCTP_MIB_NUM = 0,
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 2/5] ktime: add ms_to_ktime() and ktime_add_ms() helpers
  2013-06-18  8:55 [PATCH net-next 0/5] Further SCTP changes Daniel Borkmann
  2013-06-18  8:55 ` [PATCH net-next 1/5] net: sctp: remove TEST_FRAME ifdef Daniel Borkmann
@ 2013-06-18  8:55 ` Daniel Borkmann
  2013-06-18  8:55 ` [PATCH net-next 3/5] net: sctp: migrate cookie life from timeval to ktime Daniel Borkmann
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Daniel Borkmann @ 2013-06-18  8:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-sctp

Add two ktime helper functions that i) convert a given msec value to
a ktime structure and ii) that adds a msec value to a ktime structure.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 include/linux/ktime.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/linux/ktime.h b/include/linux/ktime.h
index bbca128..b4fa5e4 100644
--- a/include/linux/ktime.h
+++ b/include/linux/ktime.h
@@ -323,6 +323,11 @@ static inline ktime_t ktime_add_us(const ktime_t kt, const u64 usec)
 	return ktime_add_ns(kt, usec * 1000);
 }
 
+static inline ktime_t ktime_add_ms(const ktime_t kt, const u64 msec)
+{
+	return ktime_add_ns(kt, msec * NSEC_PER_MSEC);
+}
+
 static inline ktime_t ktime_sub_us(const ktime_t kt, const u64 usec)
 {
 	return ktime_sub_ns(kt, usec * 1000);
@@ -366,7 +371,15 @@ extern void ktime_get_ts(struct timespec *ts);
 static inline ktime_t ns_to_ktime(u64 ns)
 {
 	static const ktime_t ktime_zero = { .tv64 = 0 };
+
 	return ktime_add_ns(ktime_zero, ns);
 }
 
+static inline ktime_t ms_to_ktime(u64 ms)
+{
+	static const ktime_t ktime_zero = { .tv64 = 0 };
+
+	return ktime_add_ms(ktime_zero, ms);
+}
+
 #endif
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 3/5] net: sctp: migrate cookie life from timeval to ktime
  2013-06-18  8:55 [PATCH net-next 0/5] Further SCTP changes Daniel Borkmann
  2013-06-18  8:55 ` [PATCH net-next 1/5] net: sctp: remove TEST_FRAME ifdef Daniel Borkmann
  2013-06-18  8:55 ` [PATCH net-next 2/5] ktime: add ms_to_ktime() and ktime_add_ms() helpers Daniel Borkmann
@ 2013-06-18  8:55 ` Daniel Borkmann
  2013-06-18  8:55 ` [PATCH net-next 4/5] net: sctp: decouple cleaning socket data from endpoint Daniel Borkmann
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 20+ messages in thread
From: Daniel Borkmann @ 2013-06-18  8:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-sctp

Currently, SCTP code defines its own timeval functions (since timeval
is rarely used inside the kernel by others), namely tv_lt() and
TIMEVAL_ADD() macros, that operate on SCTP cookie expiration.

We might as well remove all those, and operate directly on ktime
structures for a couple of reasons: ktime is available on all archs;
complexity of ktime calculations depending on the arch is less than
(reduces to simpler arithmetic operations on archs with
BITS_PER_LONG == 64 or CONFIG_KTIME_SCALAR) or equal to timeval
functions; code becomes more readable; macros can be thrown out.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 include/net/sctp/sctp.h    | 18 ------------------
 include/net/sctp/structs.h |  6 +++---
 net/sctp/associola.c       |  8 +-------
 net/sctp/sm_make_chunk.c   | 19 ++++++++-----------
 net/sctp/socket.c          | 14 +++-----------
 5 files changed, 15 insertions(+), 50 deletions(-)

diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 15214a8..e6b95bc 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -560,24 +560,6 @@ for (pos = chunk->subh.fwdtsn_hdr->skip;\
 /* Round an int up to the next multiple of 4.  */
 #define WORD_ROUND(s) (((s)+3)&~3)
 
-/* Compare two timevals.  */
-#define tv_lt(s, t) \
-   (s.tv_sec < t.tv_sec || (s.tv_sec == t.tv_sec && s.tv_usec < t.tv_usec))
-
-/* Add tv1 to tv2. */
-#define TIMEVAL_ADD(tv1, tv2) \
-({ \
-        suseconds_t usecs = (tv2).tv_usec + (tv1).tv_usec; \
-        time_t secs = (tv2).tv_sec + (tv1).tv_sec; \
-\
-        if (usecs >= 1000000) { \
-                usecs -= 1000000; \
-                secs++; \
-        } \
-        (tv2).tv_sec = secs; \
-        (tv2).tv_usec = usecs; \
-})
-
 /* External references. */
 
 extern struct proto sctp_prot;
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 1bd4c41..e745c92 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -54,7 +54,7 @@
 #ifndef __sctp_structs_h__
 #define __sctp_structs_h__
 
-#include <linux/time.h>		/* We get struct timespec.    */
+#include <linux/ktime.h>
 #include <linux/socket.h>	/* linux/in.h needs this!!    */
 #include <linux/in.h>		/* We get struct sockaddr_in. */
 #include <linux/in6.h>		/* We get struct in6_addr     */
@@ -284,7 +284,7 @@ struct sctp_cookie {
 	__u32 peer_ttag;
 
 	/* When does this cookie expire? */
-	struct timeval expiration;
+	ktime_t expiration;
 
 	/* Number of inbound/outbound streams which are set
 	 * and negotiated during the INIT process.
@@ -1537,7 +1537,7 @@ struct sctp_association {
 	sctp_state_t state;
 
 	/* The cookie life I award for any cookie.  */
-	struct timeval cookie_life;
+	ktime_t cookie_life;
 
 	/* Overall     : The overall association error count.
 	 * Error Count : [Clear this any time I get something.]
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index bf6e6bd..9a383a8 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -102,13 +102,7 @@ static struct sctp_association *sctp_association_init(struct sctp_association *a
 	sctp_bind_addr_init(&asoc->base.bind_addr, ep->base.bind_addr.port);
 
 	asoc->state = SCTP_STATE_CLOSED;
-
-	/* Set these values from the socket values, a conversion between
-	 * millsecons to seconds/microseconds must also be done.
-	 */
-	asoc->cookie_life.tv_sec = sp->assocparams.sasoc_cookie_life / 1000;
-	asoc->cookie_life.tv_usec = (sp->assocparams.sasoc_cookie_life % 1000)
-					* 1000;
+	asoc->cookie_life = ms_to_ktime(sp->assocparams.sasoc_cookie_life);
 	asoc->frag_point = 0;
 	asoc->user_frag = sp->user_frag;
 
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index fc85487..dd71f1f 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -1630,8 +1630,8 @@ static sctp_cookie_param_t *sctp_pack_cookie(const struct sctp_endpoint *ep,
 	cookie->c.adaptation_ind = asoc->peer.adaptation_ind;
 
 	/* Set an expiration time for the cookie.  */
-	do_gettimeofday(&cookie->c.expiration);
-	TIMEVAL_ADD(asoc->cookie_life, cookie->c.expiration);
+	cookie->c.expiration = ktime_add(asoc->cookie_life,
+					 ktime_get());
 
 	/* Copy the peer's init packet.  */
 	memcpy(&cookie->c.peer_init[0], init_chunk->chunk_hdr,
@@ -1680,7 +1680,7 @@ struct sctp_association *sctp_unpack_cookie(
 	unsigned int len;
 	sctp_scope_t scope;
 	struct sk_buff *skb = chunk->skb;
-	struct timeval tv;
+	ktime_t kt;
 	struct hash_desc desc;
 
 	/* Header size is static data prior to the actual cookie, including
@@ -1757,11 +1757,11 @@ no_hmac:
 	 * down the new association establishment instead of every packet.
 	 */
 	if (sock_flag(ep->base.sk, SOCK_TIMESTAMP))
-		skb_get_timestamp(skb, &tv);
+		kt = skb_get_ktime(skb);
 	else
-		do_gettimeofday(&tv);
+		kt = ktime_get();
 
-	if (!asoc && tv_lt(bear_cookie->expiration, tv)) {
+	if (!asoc && ktime_compare(bear_cookie->expiration, kt) < 0) {
 		/*
 		 * Section 3.3.10.3 Stale Cookie Error (3)
 		 *
@@ -1773,9 +1773,7 @@ no_hmac:
 		len = ntohs(chunk->chunk_hdr->length);
 		*errp = sctp_make_op_error_space(asoc, chunk, len);
 		if (*errp) {
-			suseconds_t usecs = (tv.tv_sec -
-				bear_cookie->expiration.tv_sec) * 1000000L +
-				tv.tv_usec - bear_cookie->expiration.tv_usec;
+			suseconds_t usecs = ktime_to_us(ktime_sub(kt, bear_cookie->expiration));
 			__be32 n = htonl(usecs);
 
 			sctp_init_cause(*errp, SCTP_ERROR_STALE_COOKIE,
@@ -2514,8 +2512,7 @@ do_addr_param:
 		/* Suggested Cookie Life span increment's unit is msec,
 		 * (1/1000sec).
 		 */
-		asoc->cookie_life.tv_sec += stale / 1000;
-		asoc->cookie_life.tv_usec += (stale % 1000) * 1000;
+		asoc->cookie_life = ktime_add_ms(asoc->cookie_life, stale);
 		break;
 
 	case SCTP_PARAM_HOST_NAME_ADDRESS:
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 75fe92a..68a6b70 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -2910,13 +2910,8 @@ static int sctp_setsockopt_associnfo(struct sock *sk, char __user *optval, unsig
 			asoc->max_retrans = assocparams.sasoc_asocmaxrxt;
 		}
 
-		if (assocparams.sasoc_cookie_life != 0) {
-			asoc->cookie_life.tv_sec =
-					assocparams.sasoc_cookie_life / 1000;
-			asoc->cookie_life.tv_usec =
-					(assocparams.sasoc_cookie_life % 1000)
-					* 1000;
-		}
+		if (assocparams.sasoc_cookie_life != 0)
+			asoc->cookie_life = ms_to_ktime(assocparams.sasoc_cookie_life);
 	} else {
 		/* Set the values to the endpoint */
 		struct sctp_sock *sp = sctp_sk(sk);
@@ -5068,10 +5063,7 @@ static int sctp_getsockopt_associnfo(struct sock *sk, int len,
 		assocparams.sasoc_asocmaxrxt = asoc->max_retrans;
 		assocparams.sasoc_peer_rwnd = asoc->peer.rwnd;
 		assocparams.sasoc_local_rwnd = asoc->a_rwnd;
-		assocparams.sasoc_cookie_life = (asoc->cookie_life.tv_sec
-						* 1000) +
-						(asoc->cookie_life.tv_usec
-						/ 1000);
+		assocparams.sasoc_cookie_life = ktime_to_ms(asoc->cookie_life);
 
 		list_for_each(pos, &asoc->peer.transport_addr_list) {
 			cnt ++;
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 4/5] net: sctp: decouple cleaning socket data from endpoint
  2013-06-18  8:55 [PATCH net-next 0/5] Further SCTP changes Daniel Borkmann
                   ` (2 preceding siblings ...)
  2013-06-18  8:55 ` [PATCH net-next 3/5] net: sctp: migrate cookie life from timeval to ktime Daniel Borkmann
@ 2013-06-18  8:55 ` Daniel Borkmann
  2013-06-18 14:22   ` Neil Horman
  2013-06-20 14:35   ` Vlad Yasevich
  2013-06-18  8:55 ` [PATCH net-next 5/5] net: sctp: minor: sctp_seq_dump_local_addrs add missing newline Daniel Borkmann
  2013-06-20  1:39 ` [PATCH net-next 0/5] Further SCTP changes David Miller
  5 siblings, 2 replies; 20+ messages in thread
From: Daniel Borkmann @ 2013-06-18  8:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-sctp

Rather instead of having the endpoint clean the garbage for the
socket, use a sk_destruct handler sctp_destruct_sock(), that does
the job for that when there are no more references on the socket.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 net/sctp/endpointola.c |  7 -------
 net/sctp/socket.c      | 20 +++++++++++++++++++-
 2 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
index a8b2674..9a9ebd2 100644
--- a/net/sctp/endpointola.c
+++ b/net/sctp/endpointola.c
@@ -249,9 +249,6 @@ static void sctp_endpoint_destroy(struct sctp_endpoint *ep)
 {
 	SCTP_ASSERT(ep->base.dead, "Endpoint is not dead", return);
 
-	/* Free up the HMAC transform. */
-	crypto_free_hash(sctp_sk(ep->base.sk)->hmac);
-
 	/* Free the digest buffer */
 	kfree(ep->digest);
 
@@ -271,10 +268,6 @@ static void sctp_endpoint_destroy(struct sctp_endpoint *ep)
 
 	memset(ep->secret_key, 0, sizeof(ep->secret_key));
 
-	/* Remove and free the port */
-	if (sctp_sk(ep->base.sk)->bind_hash)
-		sctp_put_port(ep->base.sk);
-
 	/* Give up our hold on the sock. */
 	if (ep->base.sk)
 		sock_put(ep->base.sk);
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 68a6b70..67f721e 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -93,6 +93,7 @@ static int sctp_wait_for_packet(struct sock * sk, int *err, long *timeo_p);
 static int sctp_wait_for_connect(struct sctp_association *, long *timeo_p);
 static int sctp_wait_for_accept(struct sock *sk, long timeo);
 static void sctp_wait_for_close(struct sock *sk, long timeo);
+static void sctp_destruct_sock(struct sock *sk);
 static struct sctp_af *sctp_sockaddr_af(struct sctp_sock *opt,
 					union sctp_addr *addr, int len);
 static int sctp_bindx_add(struct sock *, struct sockaddr *, int);
@@ -3966,6 +3967,8 @@ static int sctp_init_sock(struct sock *sk)
 
 	sp->hmac = NULL;
 
+	sk->sk_destruct = sctp_destruct_sock;
+
 	SCTP_DBG_OBJCNT_INC(sock);
 
 	local_bh_disable();
@@ -4002,6 +4005,21 @@ static void sctp_destroy_sock(struct sock *sk)
 	local_bh_enable();
 }
 
+/* Triggered when there are no references on the socket anymore */
+static void sctp_destruct_sock(struct sock *sk)
+{
+	struct sctp_sock *sp = sctp_sk(sk);
+
+	/* Free up the HMAC transform. */
+	crypto_free_hash(sp->hmac);
+
+	/* Remove and free the port */
+	if (sp->bind_hash)
+		sctp_put_port(sk);
+
+	inet_sock_destruct(sk);
+}
+
 /* API 4.1.7 shutdown() - TCP Style Syntax
  *     int shutdown(int socket, int how);
  *
@@ -6842,7 +6860,7 @@ void sctp_copy_sock(struct sock *newsk, struct sock *sk,
 	newsk->sk_reuse = sk->sk_reuse;
 
 	newsk->sk_shutdown = sk->sk_shutdown;
-	newsk->sk_destruct = inet_sock_destruct;
+	newsk->sk_destruct = sctp_destruct_sock;
 	newsk->sk_family = sk->sk_family;
 	newsk->sk_protocol = IPPROTO_SCTP;
 	newsk->sk_backlog_rcv = sk->sk_prot->backlog_rcv;
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [PATCH net-next 5/5] net: sctp: minor: sctp_seq_dump_local_addrs add missing newline
  2013-06-18  8:55 [PATCH net-next 0/5] Further SCTP changes Daniel Borkmann
                   ` (3 preceding siblings ...)
  2013-06-18  8:55 ` [PATCH net-next 4/5] net: sctp: decouple cleaning socket data from endpoint Daniel Borkmann
@ 2013-06-18  8:55 ` Daniel Borkmann
  2013-06-20  1:39 ` [PATCH net-next 0/5] Further SCTP changes David Miller
  5 siblings, 0 replies; 20+ messages in thread
From: Daniel Borkmann @ 2013-06-18  8:55 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-sctp

A trailing newline has been forgotten to add into the WARN().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
---
 net/sctp/proc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/proc.c b/net/sctp/proc.c
index 0c83162..62526c4 100644
--- a/net/sctp/proc.c
+++ b/net/sctp/proc.c
@@ -138,7 +138,7 @@ static void sctp_seq_dump_local_addrs(struct seq_file *seq, struct sctp_ep_commo
 
 		peer = asoc->peer.primary_path;
 		if (unlikely(peer == NULL)) {
-			WARN(1, "Association %p with NULL primary path!", asoc);
+			WARN(1, "Association %p with NULL primary path!\n", asoc);
 			return;
 		}
 
-- 
1.7.11.7

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 4/5] net: sctp: decouple cleaning socket data from endpoint
  2013-06-18  8:55 ` [PATCH net-next 4/5] net: sctp: decouple cleaning socket data from endpoint Daniel Borkmann
@ 2013-06-18 14:22   ` Neil Horman
  2013-06-18 16:02     ` Daniel Borkmann
  2013-06-20 14:35   ` Vlad Yasevich
  1 sibling, 1 reply; 20+ messages in thread
From: Neil Horman @ 2013-06-18 14:22 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, netdev, linux-sctp

On Tue, Jun 18, 2013 at 10:55:19AM +0200, Daniel Borkmann wrote:
> Rather instead of having the endpoint clean the garbage for the
> socket, use a sk_destruct handler sctp_destruct_sock(), that does
> the job for that when there are no more references on the socket.
> 
> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> ---
>  net/sctp/endpointola.c |  7 -------
>  net/sctp/socket.c      | 20 +++++++++++++++++++-
>  2 files changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
> index a8b2674..9a9ebd2 100644
> --- a/net/sctp/endpointola.c
> +++ b/net/sctp/endpointola.c
> @@ -249,9 +249,6 @@ static void sctp_endpoint_destroy(struct sctp_endpoint *ep)
>  {
>  	SCTP_ASSERT(ep->base.dead, "Endpoint is not dead", return);
>  
> -	/* Free up the HMAC transform. */
> -	crypto_free_hash(sctp_sk(ep->base.sk)->hmac);
> -
>  	/* Free the digest buffer */
>  	kfree(ep->digest);
>  
> @@ -271,10 +268,6 @@ static void sctp_endpoint_destroy(struct sctp_endpoint *ep)
>  
>  	memset(ep->secret_key, 0, sizeof(ep->secret_key));
>  
> -	/* Remove and free the port */
> -	if (sctp_sk(ep->base.sk)->bind_hash)
> -		sctp_put_port(ep->base.sk);
> -
>  	/* Give up our hold on the sock. */
>  	if (ep->base.sk)
>  		sock_put(ep->base.sk);
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 68a6b70..67f721e 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -93,6 +93,7 @@ static int sctp_wait_for_packet(struct sock * sk, int *err, long *timeo_p);
>  static int sctp_wait_for_connect(struct sctp_association *, long *timeo_p);
>  static int sctp_wait_for_accept(struct sock *sk, long timeo);
>  static void sctp_wait_for_close(struct sock *sk, long timeo);
> +static void sctp_destruct_sock(struct sock *sk);
>  static struct sctp_af *sctp_sockaddr_af(struct sctp_sock *opt,
>  					union sctp_addr *addr, int len);
>  static int sctp_bindx_add(struct sock *, struct sockaddr *, int);
> @@ -3966,6 +3967,8 @@ static int sctp_init_sock(struct sock *sk)
>  
>  	sp->hmac = NULL;
>  
> +	sk->sk_destruct = sctp_destruct_sock;
> +
>  	SCTP_DBG_OBJCNT_INC(sock);
>  
>  	local_bh_disable();
> @@ -4002,6 +4005,21 @@ static void sctp_destroy_sock(struct sock *sk)
>  	local_bh_enable();
>  }
>  
> +/* Triggered when there are no references on the socket anymore */
> +static void sctp_destruct_sock(struct sock *sk)
> +{
> +	struct sctp_sock *sp = sctp_sk(sk);
> +
> +	/* Free up the HMAC transform. */
> +	crypto_free_hash(sp->hmac);
> +
> +	/* Remove and free the port */
> +	if (sp->bind_hash)
> +		sctp_put_port(sk);
> +
> +	inet_sock_destruct(sk);
> +}
> +
>  /* API 4.1.7 shutdown() - TCP Style Syntax
>   *     int shutdown(int socket, int how);
>   *
> @@ -6842,7 +6860,7 @@ void sctp_copy_sock(struct sock *newsk, struct sock *sk,
>  	newsk->sk_reuse = sk->sk_reuse;
>  
>  	newsk->sk_shutdown = sk->sk_shutdown;
> -	newsk->sk_destruct = inet_sock_destruct;
> +	newsk->sk_destruct = sctp_destruct_sock;
>  	newsk->sk_family = sk->sk_family;
>  	newsk->sk_protocol = IPPROTO_SCTP;
>  	newsk->sk_backlog_rcv = sk->sk_prot->backlog_rcv;
> -- 
> 1.7.11.7
> 

I like this idea, but I think I'm maybe missing something from it - we reference
the socket in both the receive and send paths (sctp_unpack_cookie, is
specifically called from the rx path, which makes use of sp->hmac).  a socket
destructor can be called from __sk_free when sk_wmem_alloc reaches zero, but we
use sk_refcnt in the rx path to prevent premature socket cleanup.  If we drain
our send queeue while wer'e still processing rx messages, what prevents us from
freeing the socket in the tx path, via sk_free while we're still using the
socket in the rx path.  Note I don't think this patch is wrong per-se, but it
seems to me there is more work to do to properly interlock the use of sk_refcnt
and sk_wmem_alloc here (unless I'm just missing something obvious, which is
entirely possible, I've been in the sun alot lately :) ).

Neil

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 4/5] net: sctp: decouple cleaning socket data from endpoint
  2013-06-18 14:22   ` Neil Horman
@ 2013-06-18 16:02     ` Daniel Borkmann
  2013-06-18 16:24       ` Vlad Yasevich
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Borkmann @ 2013-06-18 16:02 UTC (permalink / raw)
  To: Neil Horman; +Cc: davem, netdev, linux-sctp

On 06/18/2013 04:22 PM, Neil Horman wrote:
> I like this idea, but I think I'm maybe missing something from it - we reference
> the socket in both the receive and send paths (sctp_unpack_cookie, is
> specifically called from the rx path, which makes use of sp->hmac).  a socket
> destructor can be called from __sk_free when sk_wmem_alloc reaches zero, but we
> use sk_refcnt in the rx path to prevent premature socket cleanup.  If we drain
> our send queeue while wer'e still processing rx messages, what prevents us from
> freeing the socket in the tx path, via sk_free while we're still using the
> socket in the rx path.  Note I don't think this patch is wrong per-se, but it
> seems to me there is more work to do to properly interlock the use of sk_refcnt
> and sk_wmem_alloc here (unless I'm just missing something obvious, which is
> entirely possible, I've been in the sun alot lately :) ).

Hm, __sk_free() calls sk_prot_free() which frees our socket structure and in
sctp_wfree() we do a sctp_association_put(asoc) after sock_wfree(skb).

So no matter if having this patch or not, couldn't this use-after-free like
scenario already happen with the current code?

F.e. through a given call graph like that:

sctp_wfree(skb):
  1) sock_wfree(skb)
     -> __sk_free()
      -> sk_prot_free(.., sk)
       -> kmem_cache_free(.., sk) or kfree(sk)
  2) __sctp_write_space(asoc)
  3) sctp_association_put(asoc)
     -> sctp_association_destroy(asoc)
      -> sctp_endpoint_put(asoc->ep)
       -> sctp_endpoint_destroy(ep)
        -> crypto_free_hash(sctp_sk(ep->base.sk)->hmac)
           (etc, all unconditionally accessed while sk is
            already dead/freed)

Then, this might need a fix in general. :-)

Assuming you would reduce the buffer space via setsockopt(.., SO_SNDBUF, ..),
you might end up with a minimum buffer space of SOCK_MIN_SNDBUF [*] and a call to
sk->sk_write_space(sk), which is sctp_write_space() and calls __sctp_write_space()
on all asocs belonging to the socket, but it seems not to alter the current
sk->sk_wmem_alloc I think, but rather sk->sk_sndbuf.

[*] Btw, shouldn't this rather be (2048 + sizeof(struct sk_buff)) or
     SKB_TRUESIZE(2048), at least like in SOCK_MIN_RCVBUF since we operate
     on skb->truesize as well?

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 4/5] net: sctp: decouple cleaning socket data from endpoint
  2013-06-18 16:02     ` Daniel Borkmann
@ 2013-06-18 16:24       ` Vlad Yasevich
  2013-06-18 17:45         ` Neil Horman
  0 siblings, 1 reply; 20+ messages in thread
From: Vlad Yasevich @ 2013-06-18 16:24 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Neil Horman, davem, netdev, linux-sctp

On 06/18/2013 12:02 PM, Daniel Borkmann wrote:
> On 06/18/2013 04:22 PM, Neil Horman wrote:
>> I like this idea, but I think I'm maybe missing something from it - we
>> reference
>> the socket in both the receive and send paths (sctp_unpack_cookie, is
>> specifically called from the rx path, which makes use of sp->hmac).  a
>> socket
>> destructor can be called from __sk_free when sk_wmem_alloc reaches
>> zero, but we
>> use sk_refcnt in the rx path to prevent premature socket cleanup.  If
>> we drain
>> our send queeue while wer'e still processing rx messages, what
>> prevents us from
>> freeing the socket in the tx path, via sk_free while we're still using
>> the
>> socket in the rx path.  Note I don't think this patch is wrong per-se,
>> but it
>> seems to me there is more work to do to properly interlock the use of
>> sk_refcnt
>> and sk_wmem_alloc here (unless I'm just missing something obvious,
>> which is
>> entirely possible, I've been in the sun alot lately :) ).
>
> Hm, __sk_free() calls sk_prot_free() which frees our socket structure
> and in
> sctp_wfree() we do a sctp_association_put(asoc) after sock_wfree(skb).
>
> So no matter if having this patch or not, couldn't this use-after-free like
> scenario already happen with the current code?
>
> F.e. through a given call graph like that:
>
> sctp_wfree(skb):
>   1) sock_wfree(skb)
>      -> __sk_free()

I don't think this can happen.  sk_wmem_alloc is set to 1 in sk_alloc()
and that acts as a guard to make sure that sk_free() has been called
before we try to free things up.  So, in this partcular case, for 
__sk_free() to be called, sk_free() had to be called meaning the
last ref on the socket was released.  However, that's not possible since
we are still holding the association and thus holding the socket
associated with it.

-vlad

>       -> sk_prot_free(.., sk)
>        -> kmem_cache_free(.., sk) or kfree(sk)
>   2) __sctp_write_space(asoc)
>   3) sctp_association_put(asoc)
>      -> sctp_association_destroy(asoc)
>       -> sctp_endpoint_put(asoc->ep)
>        -> sctp_endpoint_destroy(ep)
>         -> crypto_free_hash(sctp_sk(ep->base.sk)->hmac)
>            (etc, all unconditionally accessed while sk is
>             already dead/freed)
>
> Then, this might need a fix in general. :-)
>
> Assuming you would reduce the buffer space via setsockopt(.., SO_SNDBUF,
> ..),
> you might end up with a minimum buffer space of SOCK_MIN_SNDBUF [*] and
> a call to
> sk->sk_write_space(sk), which is sctp_write_space() and calls
> __sctp_write_space()
> on all asocs belonging to the socket, but it seems not to alter the current
> sk->sk_wmem_alloc I think, but rather sk->sk_sndbuf.
>
> [*] Btw, shouldn't this rather be (2048 + sizeof(struct sk_buff)) or
>      SKB_TRUESIZE(2048), at least like in SOCK_MIN_RCVBUF since we operate
>      on skb->truesize as well?
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 4/5] net: sctp: decouple cleaning socket data from endpoint
  2013-06-18 16:24       ` Vlad Yasevich
@ 2013-06-18 17:45         ` Neil Horman
  2013-06-18 18:15           ` Vlad Yasevich
  0 siblings, 1 reply; 20+ messages in thread
From: Neil Horman @ 2013-06-18 17:45 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: Daniel Borkmann, davem, netdev, linux-sctp

On Tue, Jun 18, 2013 at 12:24:31PM -0400, Vlad Yasevich wrote:
> On 06/18/2013 12:02 PM, Daniel Borkmann wrote:
> >On 06/18/2013 04:22 PM, Neil Horman wrote:
> >>I like this idea, but I think I'm maybe missing something from it - we
> >>reference
> >>the socket in both the receive and send paths (sctp_unpack_cookie, is
> >>specifically called from the rx path, which makes use of sp->hmac).  a
> >>socket
> >>destructor can be called from __sk_free when sk_wmem_alloc reaches
> >>zero, but we
> >>use sk_refcnt in the rx path to prevent premature socket cleanup.  If
> >>we drain
> >>our send queeue while wer'e still processing rx messages, what
> >>prevents us from
> >>freeing the socket in the tx path, via sk_free while we're still using
> >>the
> >>socket in the rx path.  Note I don't think this patch is wrong per-se,
> >>but it
> >>seems to me there is more work to do to properly interlock the use of
> >>sk_refcnt
> >>and sk_wmem_alloc here (unless I'm just missing something obvious,
> >>which is
> >>entirely possible, I've been in the sun alot lately :) ).
> >
> >Hm, __sk_free() calls sk_prot_free() which frees our socket structure
> >and in
> >sctp_wfree() we do a sctp_association_put(asoc) after sock_wfree(skb).
> >
> >So no matter if having this patch or not, couldn't this use-after-free like
> >scenario already happen with the current code?
> >
> >F.e. through a given call graph like that:
> >
> >sctp_wfree(skb):
> >  1) sock_wfree(skb)
> >     -> __sk_free()
> 
> I don't think this can happen.  sk_wmem_alloc is set to 1 in sk_alloc()
> and that acts as a guard to make sure that sk_free() has been called
> before we try to free things up.  So, in this partcular case, for
> __sk_free() to be called, sk_free() had to be called meaning the
> last ref on the socket was released.  However, that's not possible since
> we are still holding the association and thus holding the socket
> associated with it.
> 
I see what your saying, and I agree, with that bias added in sk_alloc, it looks
like we won't ever call __sk_free until sk_wmem_alloc is 1 _and_ sk_refcnt is 0.
It still seems messy and confusing though.  It would make more sense to me to
increment the refcount an additional time when the socket is initalized, and
then decrement it again when the socket is closed and sk_wmem_alloc reaches
zero.  That would isolate the refcounting to a single variable.
Neil

> -vlad
> 
> >      -> sk_prot_free(.., sk)
> >       -> kmem_cache_free(.., sk) or kfree(sk)
> >  2) __sctp_write_space(asoc)
> >  3) sctp_association_put(asoc)
> >     -> sctp_association_destroy(asoc)
> >      -> sctp_endpoint_put(asoc->ep)
> >       -> sctp_endpoint_destroy(ep)
> >        -> crypto_free_hash(sctp_sk(ep->base.sk)->hmac)
> >           (etc, all unconditionally accessed while sk is
> >            already dead/freed)
> >
> >Then, this might need a fix in general. :-)
> >
> >Assuming you would reduce the buffer space via setsockopt(.., SO_SNDBUF,
> >..),
> >you might end up with a minimum buffer space of SOCK_MIN_SNDBUF [*] and
> >a call to
> >sk->sk_write_space(sk), which is sctp_write_space() and calls
> >__sctp_write_space()
> >on all asocs belonging to the socket, but it seems not to alter the current
> >sk->sk_wmem_alloc I think, but rather sk->sk_sndbuf.
> >
> >[*] Btw, shouldn't this rather be (2048 + sizeof(struct sk_buff)) or
> >     SKB_TRUESIZE(2048), at least like in SOCK_MIN_RCVBUF since we operate
> >     on skb->truesize as well?
> >--
> >To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> >the body of a message to majordomo@vger.kernel.org
> >More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 4/5] net: sctp: decouple cleaning socket data from endpoint
  2013-06-18 17:45         ` Neil Horman
@ 2013-06-18 18:15           ` Vlad Yasevich
  2013-06-18 19:12             ` Neil Horman
  0 siblings, 1 reply; 20+ messages in thread
From: Vlad Yasevich @ 2013-06-18 18:15 UTC (permalink / raw)
  To: Neil Horman; +Cc: Daniel Borkmann, davem, netdev, linux-sctp

On 06/18/2013 01:45 PM, Neil Horman wrote:
> On Tue, Jun 18, 2013 at 12:24:31PM -0400, Vlad Yasevich wrote:
>> On 06/18/2013 12:02 PM, Daniel Borkmann wrote:
>>> On 06/18/2013 04:22 PM, Neil Horman wrote:
>>>> I like this idea, but I think I'm maybe missing something from it - we
>>>> reference
>>>> the socket in both the receive and send paths (sctp_unpack_cookie, is
>>>> specifically called from the rx path, which makes use of sp->hmac).  a
>>>> socket
>>>> destructor can be called from __sk_free when sk_wmem_alloc reaches
>>>> zero, but we
>>>> use sk_refcnt in the rx path to prevent premature socket cleanup.  If
>>>> we drain
>>>> our send queeue while wer'e still processing rx messages, what
>>>> prevents us from
>>>> freeing the socket in the tx path, via sk_free while we're still using
>>>> the
>>>> socket in the rx path.  Note I don't think this patch is wrong per-se,
>>>> but it
>>>> seems to me there is more work to do to properly interlock the use of
>>>> sk_refcnt
>>>> and sk_wmem_alloc here (unless I'm just missing something obvious,
>>>> which is
>>>> entirely possible, I've been in the sun alot lately :) ).
>>>
>>> Hm, __sk_free() calls sk_prot_free() which frees our socket structure
>>> and in
>>> sctp_wfree() we do a sctp_association_put(asoc) after sock_wfree(skb).
>>>
>>> So no matter if having this patch or not, couldn't this use-after-free like
>>> scenario already happen with the current code?
>>>
>>> F.e. through a given call graph like that:
>>>
>>> sctp_wfree(skb):
>>>   1) sock_wfree(skb)
>>>      -> __sk_free()
>>
>> I don't think this can happen.  sk_wmem_alloc is set to 1 in sk_alloc()
>> and that acts as a guard to make sure that sk_free() has been called
>> before we try to free things up.  So, in this partcular case, for
>> __sk_free() to be called, sk_free() had to be called meaning the
>> last ref on the socket was released.  However, that's not possible since
>> we are still holding the association and thus holding the socket
>> associated with it.
>>
> I see what your saying, and I agree, with that bias added in sk_alloc, it looks
> like we won't ever call __sk_free until sk_wmem_alloc is 1 _and_ sk_refcnt is 0.
> It still seems messy and confusing though.  It would make more sense to me to
> increment the refcount an additional time when the socket is initalized, and
> then decrement it again when the socket is closed and sk_wmem_alloc reaches
> zero.  That would isolate the refcounting to a single variable.

See commit 2b85a34e911bf483c27cfdd124aeb1605145dc80.  The whole 
sk_wmem_alloc tric was done so that we dont have to do 
sock_hold/sock_put on transmits.

It might be good to see if we can do that in sctp as well.

-vlad


> Neil
>
>> -vlad
>>
>>>       -> sk_prot_free(.., sk)
>>>        -> kmem_cache_free(.., sk) or kfree(sk)
>>>   2) __sctp_write_space(asoc)
>>>   3) sctp_association_put(asoc)
>>>      -> sctp_association_destroy(asoc)
>>>       -> sctp_endpoint_put(asoc->ep)
>>>        -> sctp_endpoint_destroy(ep)
>>>         -> crypto_free_hash(sctp_sk(ep->base.sk)->hmac)
>>>            (etc, all unconditionally accessed while sk is
>>>             already dead/freed)
>>>
>>> Then, this might need a fix in general. :-)
>>>
>>> Assuming you would reduce the buffer space via setsockopt(.., SO_SNDBUF,
>>> ..),
>>> you might end up with a minimum buffer space of SOCK_MIN_SNDBUF [*] and
>>> a call to
>>> sk->sk_write_space(sk), which is sctp_write_space() and calls
>>> __sctp_write_space()
>>> on all asocs belonging to the socket, but it seems not to alter the current
>>> sk->sk_wmem_alloc I think, but rather sk->sk_sndbuf.
>>>
>>> [*] Btw, shouldn't this rather be (2048 + sizeof(struct sk_buff)) or
>>>      SKB_TRUESIZE(2048), at least like in SOCK_MIN_RCVBUF since we operate
>>>      on skb->truesize as well?
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 4/5] net: sctp: decouple cleaning socket data from endpoint
  2013-06-18 18:15           ` Vlad Yasevich
@ 2013-06-18 19:12             ` Neil Horman
  2013-06-18 22:55               ` Vlad Yasevich
  0 siblings, 1 reply; 20+ messages in thread
From: Neil Horman @ 2013-06-18 19:12 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: Daniel Borkmann, davem, netdev, linux-sctp

On Tue, Jun 18, 2013 at 02:15:21PM -0400, Vlad Yasevich wrote:
> On 06/18/2013 01:45 PM, Neil Horman wrote:
> >On Tue, Jun 18, 2013 at 12:24:31PM -0400, Vlad Yasevich wrote:
> >>On 06/18/2013 12:02 PM, Daniel Borkmann wrote:
> >>>On 06/18/2013 04:22 PM, Neil Horman wrote:
> >>>>I like this idea, but I think I'm maybe missing something from it - we
> >>>>reference
> >>>>the socket in both the receive and send paths (sctp_unpack_cookie, is
> >>>>specifically called from the rx path, which makes use of sp->hmac).  a
> >>>>socket
> >>>>destructor can be called from __sk_free when sk_wmem_alloc reaches
> >>>>zero, but we
> >>>>use sk_refcnt in the rx path to prevent premature socket cleanup.  If
> >>>>we drain
> >>>>our send queeue while wer'e still processing rx messages, what
> >>>>prevents us from
> >>>>freeing the socket in the tx path, via sk_free while we're still using
> >>>>the
> >>>>socket in the rx path.  Note I don't think this patch is wrong per-se,
> >>>>but it
> >>>>seems to me there is more work to do to properly interlock the use of
> >>>>sk_refcnt
> >>>>and sk_wmem_alloc here (unless I'm just missing something obvious,
> >>>>which is
> >>>>entirely possible, I've been in the sun alot lately :) ).
> >>>
> >>>Hm, __sk_free() calls sk_prot_free() which frees our socket structure
> >>>and in
> >>>sctp_wfree() we do a sctp_association_put(asoc) after sock_wfree(skb).
> >>>
> >>>So no matter if having this patch or not, couldn't this use-after-free like
> >>>scenario already happen with the current code?
> >>>
> >>>F.e. through a given call graph like that:
> >>>
> >>>sctp_wfree(skb):
> >>>  1) sock_wfree(skb)
> >>>     -> __sk_free()
> >>
> >>I don't think this can happen.  sk_wmem_alloc is set to 1 in sk_alloc()
> >>and that acts as a guard to make sure that sk_free() has been called
> >>before we try to free things up.  So, in this partcular case, for
> >>__sk_free() to be called, sk_free() had to be called meaning the
> >>last ref on the socket was released.  However, that's not possible since
> >>we are still holding the association and thus holding the socket
> >>associated with it.
> >>
> >I see what your saying, and I agree, with that bias added in sk_alloc, it looks
> >like we won't ever call __sk_free until sk_wmem_alloc is 1 _and_ sk_refcnt is 0.
> >It still seems messy and confusing though.  It would make more sense to me to
> >increment the refcount an additional time when the socket is initalized, and
> >then decrement it again when the socket is closed and sk_wmem_alloc reaches
> >zero.  That would isolate the refcounting to a single variable.
> 
> See commit 2b85a34e911bf483c27cfdd124aeb1605145dc80.  The whole
> sk_wmem_alloc tric was done so that we dont have to do
> sock_hold/sock_put on transmits.
> 
I know about why it was done, I'm just proposing that we don't have to do it
that way to get the speedup it gives us, and make the code a bit more clear at
the same time.  If we set the refcnt to 2 at init time, we can decrement that
added initial bias when the sk is marked as SOCK_DEAD and the sk_wmem_queued
reaches 0.  We still don't have to do a sock_hold/put on every tx, and we keep
all the reference counting in one location.

Alternatively we can move all the reference counting to sk_wmem_alloc, and have
sock_hold/put increment that field by one instead of sk_refcnt, meaning we could
remove that field

We don't really have to do any of this, of course, but not having looked at it
in some time, it seems confusing to me to have have a single additional ref
count tracked in sk_wmem_alloc.

> It might be good to see if we can do that in sctp as well.
Not sure what you mean by "that" here.  You mean remove the additional uses of
sctp_hold/put?

Neil

> 
> -vlad
> 
> 
> >Neil
> >
> >>-vlad
> >>
> >>>      -> sk_prot_free(.., sk)
> >>>       -> kmem_cache_free(.., sk) or kfree(sk)
> >>>  2) __sctp_write_space(asoc)
> >>>  3) sctp_association_put(asoc)
> >>>     -> sctp_association_destroy(asoc)
> >>>      -> sctp_endpoint_put(asoc->ep)
> >>>       -> sctp_endpoint_destroy(ep)
> >>>        -> crypto_free_hash(sctp_sk(ep->base.sk)->hmac)
> >>>           (etc, all unconditionally accessed while sk is
> >>>            already dead/freed)
> >>>
> >>>Then, this might need a fix in general. :-)
> >>>
> >>>Assuming you would reduce the buffer space via setsockopt(.., SO_SNDBUF,
> >>>..),
> >>>you might end up with a minimum buffer space of SOCK_MIN_SNDBUF [*] and
> >>>a call to
> >>>sk->sk_write_space(sk), which is sctp_write_space() and calls
> >>>__sctp_write_space()
> >>>on all asocs belonging to the socket, but it seems not to alter the current
> >>>sk->sk_wmem_alloc I think, but rather sk->sk_sndbuf.
> >>>
> >>>[*] Btw, shouldn't this rather be (2048 + sizeof(struct sk_buff)) or
> >>>     SKB_TRUESIZE(2048), at least like in SOCK_MIN_RCVBUF since we operate
> >>>     on skb->truesize as well?
> >>>--
> >>>To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> >>>the body of a message to majordomo@vger.kernel.org
> >>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>
> >>
> 
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 4/5] net: sctp: decouple cleaning socket data from endpoint
  2013-06-18 19:12             ` Neil Horman
@ 2013-06-18 22:55               ` Vlad Yasevich
  2013-06-19 11:55                 ` Neil Horman
  0 siblings, 1 reply; 20+ messages in thread
From: Vlad Yasevich @ 2013-06-18 22:55 UTC (permalink / raw)
  To: Neil Horman; +Cc: Daniel Borkmann, davem, netdev, linux-sctp

On 06/18/2013 03:12 PM, Neil Horman wrote:
> On Tue, Jun 18, 2013 at 02:15:21PM -0400, Vlad Yasevich wrote:
>> On 06/18/2013 01:45 PM, Neil Horman wrote:
>>> On Tue, Jun 18, 2013 at 12:24:31PM -0400, Vlad Yasevich wrote:
>>>> On 06/18/2013 12:02 PM, Daniel Borkmann wrote:
>>>>> On 06/18/2013 04:22 PM, Neil Horman wrote:
>>>>>> I like this idea, but I think I'm maybe missing something from it - we
>>>>>> reference
>>>>>> the socket in both the receive and send paths (sctp_unpack_cookie, is
>>>>>> specifically called from the rx path, which makes use of sp->hmac).  a
>>>>>> socket
>>>>>> destructor can be called from __sk_free when sk_wmem_alloc reaches
>>>>>> zero, but we
>>>>>> use sk_refcnt in the rx path to prevent premature socket cleanup.  If
>>>>>> we drain
>>>>>> our send queeue while wer'e still processing rx messages, what
>>>>>> prevents us from
>>>>>> freeing the socket in the tx path, via sk_free while we're still using
>>>>>> the
>>>>>> socket in the rx path.  Note I don't think this patch is wrong per-se,
>>>>>> but it
>>>>>> seems to me there is more work to do to properly interlock the use of
>>>>>> sk_refcnt
>>>>>> and sk_wmem_alloc here (unless I'm just missing something obvious,
>>>>>> which is
>>>>>> entirely possible, I've been in the sun alot lately :) ).
>>>>>
>>>>> Hm, __sk_free() calls sk_prot_free() which frees our socket structure
>>>>> and in
>>>>> sctp_wfree() we do a sctp_association_put(asoc) after sock_wfree(skb).
>>>>>
>>>>> So no matter if having this patch or not, couldn't this use-after-free like
>>>>> scenario already happen with the current code?
>>>>>
>>>>> F.e. through a given call graph like that:
>>>>>
>>>>> sctp_wfree(skb):
>>>>>   1) sock_wfree(skb)
>>>>>      -> __sk_free()
>>>>
>>>> I don't think this can happen.  sk_wmem_alloc is set to 1 in sk_alloc()
>>>> and that acts as a guard to make sure that sk_free() has been called
>>>> before we try to free things up.  So, in this partcular case, for
>>>> __sk_free() to be called, sk_free() had to be called meaning the
>>>> last ref on the socket was released.  However, that's not possible since
>>>> we are still holding the association and thus holding the socket
>>>> associated with it.
>>>>
>>> I see what your saying, and I agree, with that bias added in sk_alloc, it looks
>>> like we won't ever call __sk_free until sk_wmem_alloc is 1 _and_ sk_refcnt is 0.
>>> It still seems messy and confusing though.  It would make more sense to me to
>>> increment the refcount an additional time when the socket is initalized, and
>>> then decrement it again when the socket is closed and sk_wmem_alloc reaches
>>> zero.  That would isolate the refcounting to a single variable.
>>
>> See commit 2b85a34e911bf483c27cfdd124aeb1605145dc80.  The whole
>> sk_wmem_alloc tric was done so that we dont have to do
>> sock_hold/sock_put on transmits.
>>
> I know about why it was done, I'm just proposing that we don't have to do it
> that way to get the speedup it gives us, and make the code a bit more clear at
> the same time.  If we set the refcnt to 2 at init time, we can decrement that
> added initial bias when the sk is marked as SOCK_DEAD and the sk_wmem_queued
> reaches 0.  We still don't have to do a sock_hold/put on every tx, and we keep
> all the reference counting in one location.

sorry, not following..  regardless, this seems to have gotten a bit off 
topic.  I am going to take a deeper look at Daniel's to make sure it 
doesn't introduce any races.

-vlad

>
> Alternatively we can move all the reference counting to sk_wmem_alloc, and have
> sock_hold/put increment that field by one instead of sk_refcnt, meaning we could
> remove that field
>
> We don't really have to do any of this, of course, but not having looked at it
> in some time, it seems confusing to me to have have a single additional ref
> count tracked in sk_wmem_alloc.
>
>> It might be good to see if we can do that in sctp as well.
> Not sure what you mean by "that" here.  You mean remove the additional uses of
> sctp_hold/put?
>
> Neil
>
>>
>> -vlad
>>
>>
>>> Neil
>>>
>>>> -vlad
>>>>
>>>>>       -> sk_prot_free(.., sk)
>>>>>        -> kmem_cache_free(.., sk) or kfree(sk)
>>>>>   2) __sctp_write_space(asoc)
>>>>>   3) sctp_association_put(asoc)
>>>>>      -> sctp_association_destroy(asoc)
>>>>>       -> sctp_endpoint_put(asoc->ep)
>>>>>        -> sctp_endpoint_destroy(ep)
>>>>>         -> crypto_free_hash(sctp_sk(ep->base.sk)->hmac)
>>>>>            (etc, all unconditionally accessed while sk is
>>>>>             already dead/freed)
>>>>>
>>>>> Then, this might need a fix in general. :-)
>>>>>
>>>>> Assuming you would reduce the buffer space via setsockopt(.., SO_SNDBUF,
>>>>> ..),
>>>>> you might end up with a minimum buffer space of SOCK_MIN_SNDBUF [*] and
>>>>> a call to
>>>>> sk->sk_write_space(sk), which is sctp_write_space() and calls
>>>>> __sctp_write_space()
>>>>> on all asocs belonging to the socket, but it seems not to alter the current
>>>>> sk->sk_wmem_alloc I think, but rather sk->sk_sndbuf.
>>>>>
>>>>> [*] Btw, shouldn't this rather be (2048 + sizeof(struct sk_buff)) or
>>>>>      SKB_TRUESIZE(2048), at least like in SOCK_MIN_RCVBUF since we operate
>>>>>      on skb->truesize as well?
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>>
>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 4/5] net: sctp: decouple cleaning socket data from endpoint
  2013-06-18 22:55               ` Vlad Yasevich
@ 2013-06-19 11:55                 ` Neil Horman
  0 siblings, 0 replies; 20+ messages in thread
From: Neil Horman @ 2013-06-19 11:55 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: Daniel Borkmann, davem, netdev, linux-sctp

On Tue, Jun 18, 2013 at 06:55:27PM -0400, Vlad Yasevich wrote:
> On 06/18/2013 03:12 PM, Neil Horman wrote:
> >On Tue, Jun 18, 2013 at 02:15:21PM -0400, Vlad Yasevich wrote:
> >>On 06/18/2013 01:45 PM, Neil Horman wrote:
> >>>On Tue, Jun 18, 2013 at 12:24:31PM -0400, Vlad Yasevich wrote:
> >>>>On 06/18/2013 12:02 PM, Daniel Borkmann wrote:
> >>>>>On 06/18/2013 04:22 PM, Neil Horman wrote:
> >>>>>>I like this idea, but I think I'm maybe missing something from it - we
> >>>>>>reference
> >>>>>>the socket in both the receive and send paths (sctp_unpack_cookie, is
> >>>>>>specifically called from the rx path, which makes use of sp->hmac).  a
> >>>>>>socket
> >>>>>>destructor can be called from __sk_free when sk_wmem_alloc reaches
> >>>>>>zero, but we
> >>>>>>use sk_refcnt in the rx path to prevent premature socket cleanup.  If
> >>>>>>we drain
> >>>>>>our send queeue while wer'e still processing rx messages, what
> >>>>>>prevents us from
> >>>>>>freeing the socket in the tx path, via sk_free while we're still using
> >>>>>>the
> >>>>>>socket in the rx path.  Note I don't think this patch is wrong per-se,
> >>>>>>but it
> >>>>>>seems to me there is more work to do to properly interlock the use of
> >>>>>>sk_refcnt
> >>>>>>and sk_wmem_alloc here (unless I'm just missing something obvious,
> >>>>>>which is
> >>>>>>entirely possible, I've been in the sun alot lately :) ).
> >>>>>
> >>>>>Hm, __sk_free() calls sk_prot_free() which frees our socket structure
> >>>>>and in
> >>>>>sctp_wfree() we do a sctp_association_put(asoc) after sock_wfree(skb).
> >>>>>
> >>>>>So no matter if having this patch or not, couldn't this use-after-free like
> >>>>>scenario already happen with the current code?
> >>>>>
> >>>>>F.e. through a given call graph like that:
> >>>>>
> >>>>>sctp_wfree(skb):
> >>>>>  1) sock_wfree(skb)
> >>>>>     -> __sk_free()
> >>>>
> >>>>I don't think this can happen.  sk_wmem_alloc is set to 1 in sk_alloc()
> >>>>and that acts as a guard to make sure that sk_free() has been called
> >>>>before we try to free things up.  So, in this partcular case, for
> >>>>__sk_free() to be called, sk_free() had to be called meaning the
> >>>>last ref on the socket was released.  However, that's not possible since
> >>>>we are still holding the association and thus holding the socket
> >>>>associated with it.
> >>>>
> >>>I see what your saying, and I agree, with that bias added in sk_alloc, it looks
> >>>like we won't ever call __sk_free until sk_wmem_alloc is 1 _and_ sk_refcnt is 0.
> >>>It still seems messy and confusing though.  It would make more sense to me to
> >>>increment the refcount an additional time when the socket is initalized, and
> >>>then decrement it again when the socket is closed and sk_wmem_alloc reaches
> >>>zero.  That would isolate the refcounting to a single variable.
> >>
> >>See commit 2b85a34e911bf483c27cfdd124aeb1605145dc80.  The whole
> >>sk_wmem_alloc tric was done so that we dont have to do
> >>sock_hold/sock_put on transmits.
> >>
> >I know about why it was done, I'm just proposing that we don't have to do it
> >that way to get the speedup it gives us, and make the code a bit more clear at
> >the same time.  If we set the refcnt to 2 at init time, we can decrement that
> >added initial bias when the sk is marked as SOCK_DEAD and the sk_wmem_queued
> >reaches 0.  We still don't have to do a sock_hold/put on every tx, and we keep
> >all the reference counting in one location.
> 
> sorry, not following..  regardless, this seems to have gotten a bit
SOCK_DEAD set in sk->flags IIRC should indicate that the socket has been
orphaned from any user space file descriptors (i.e. the same meaning that the
initial bias to sk_wmem_alloc is telling us).  I'm just saying we can use that
to replace the sk_wmem_alloc == 0 test.

> off topic.  I am going to take a deeper look at Daniel's to make
> sure it doesn't introduce any races.
Yes, this is off topic, I was really just complaining that the current scheme
makes it harder to evaluate patches like this.

Neil

> 
> -vlad
> 
> >
> >Alternatively we can move all the reference counting to sk_wmem_alloc, and have
> >sock_hold/put increment that field by one instead of sk_refcnt, meaning we could
> >remove that field
> >
> >We don't really have to do any of this, of course, but not having looked at it
> >in some time, it seems confusing to me to have have a single additional ref
> >count tracked in sk_wmem_alloc.
> >
> >>It might be good to see if we can do that in sctp as well.
> >Not sure what you mean by "that" here.  You mean remove the additional uses of
> >sctp_hold/put?
> >
> >Neil
> >
> >>
> >>-vlad
> >>
> >>
> >>>Neil
> >>>
> >>>>-vlad
> >>>>
> >>>>>      -> sk_prot_free(.., sk)
> >>>>>       -> kmem_cache_free(.., sk) or kfree(sk)
> >>>>>  2) __sctp_write_space(asoc)
> >>>>>  3) sctp_association_put(asoc)
> >>>>>     -> sctp_association_destroy(asoc)
> >>>>>      -> sctp_endpoint_put(asoc->ep)
> >>>>>       -> sctp_endpoint_destroy(ep)
> >>>>>        -> crypto_free_hash(sctp_sk(ep->base.sk)->hmac)
> >>>>>           (etc, all unconditionally accessed while sk is
> >>>>>            already dead/freed)
> >>>>>
> >>>>>Then, this might need a fix in general. :-)
> >>>>>
> >>>>>Assuming you would reduce the buffer space via setsockopt(.., SO_SNDBUF,
> >>>>>..),
> >>>>>you might end up with a minimum buffer space of SOCK_MIN_SNDBUF [*] and
> >>>>>a call to
> >>>>>sk->sk_write_space(sk), which is sctp_write_space() and calls
> >>>>>__sctp_write_space()
> >>>>>on all asocs belonging to the socket, but it seems not to alter the current
> >>>>>sk->sk_wmem_alloc I think, but rather sk->sk_sndbuf.
> >>>>>
> >>>>>[*] Btw, shouldn't this rather be (2048 + sizeof(struct sk_buff)) or
> >>>>>     SKB_TRUESIZE(2048), at least like in SOCK_MIN_RCVBUF since we operate
> >>>>>     on skb->truesize as well?
> >>>>>--
> >>>>>To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> >>>>>the body of a message to majordomo@vger.kernel.org
> >>>>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>>
> >>>>
> >>
> >>
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 0/5] Further SCTP changes
  2013-06-18  8:55 [PATCH net-next 0/5] Further SCTP changes Daniel Borkmann
                   ` (4 preceding siblings ...)
  2013-06-18  8:55 ` [PATCH net-next 5/5] net: sctp: minor: sctp_seq_dump_local_addrs add missing newline Daniel Borkmann
@ 2013-06-20  1:39 ` David Miller
  2013-06-20 13:24   ` Neil Horman
  5 siblings, 1 reply; 20+ messages in thread
From: David Miller @ 2013-06-20  1:39 UTC (permalink / raw)
  To: dborkman; +Cc: netdev, linux-sctp

From: Daniel Borkmann <dborkman@redhat.com>
Date: Tue, 18 Jun 2013 10:55:15 +0200

> Mostly it's all about getting rid of unnecessary code, and/or making
> it more clean and readable.
> 
> Code has been tested with lksctp-tools functional test suite and
> also my small sctp stress test helper that was used to discover
> the recent null-ptr derefs. This test was also sucessfully run against
> auth_enabled=0 and auth_enabled=1, where I also previously applied
> the following two patches:
> 
>   - http://marc.info/?l=linux-crypto-vger&m=137121898331017&w=2
>     -> not yet applied, still pending review for crypto tree
>   - 1abd165 (net: sctp: fix NULL pointer dereference in socket destruction)
>     -> cherry picked from net tree
> 
> Daniel Borkmann (5):
>   net: sctp: remove TEST_FRAME ifdef
>   ktime: add ms_to_ktime() and ktime_add_ms() helpers
>   net: sctp: migrate cookie life from timeval to ktime
>   net: sctp: decouple cleaning socket data from endpoint
>   net: sctp: minor: sctp_seq_dump_local_addrs add missing newline

The discussion of this patch series has taken a tangent towards a
talk about how to do TX based socket accounting differently.

While that's interesting, it'd be good to get ACKs for Daniel's
patches meanwhile, I think they are fine personally.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 0/5] Further SCTP changes
  2013-06-20  1:39 ` [PATCH net-next 0/5] Further SCTP changes David Miller
@ 2013-06-20 13:24   ` Neil Horman
  0 siblings, 0 replies; 20+ messages in thread
From: Neil Horman @ 2013-06-20 13:24 UTC (permalink / raw)
  To: David Miller; +Cc: dborkman, netdev, linux-sctp

On Wed, Jun 19, 2013 at 06:39:03PM -0700, David Miller wrote:
> From: Daniel Borkmann <dborkman@redhat.com>
> Date: Tue, 18 Jun 2013 10:55:15 +0200
> 
> > Mostly it's all about getting rid of unnecessary code, and/or making
> > it more clean and readable.
> > 
> > Code has been tested with lksctp-tools functional test suite and
> > also my small sctp stress test helper that was used to discover
> > the recent null-ptr derefs. This test was also sucessfully run against
> > auth_enabled=0 and auth_enabled=1, where I also previously applied
> > the following two patches:
> > 
> >   - http://marc.info/?l=linux-crypto-vger&m=137121898331017&w=2
> >     -> not yet applied, still pending review for crypto tree
> >   - 1abd165 (net: sctp: fix NULL pointer dereference in socket destruction)
> >     -> cherry picked from net tree
> > 
> > Daniel Borkmann (5):
> >   net: sctp: remove TEST_FRAME ifdef
> >   ktime: add ms_to_ktime() and ktime_add_ms() helpers
> >   net: sctp: migrate cookie life from timeval to ktime
> >   net: sctp: decouple cleaning socket data from endpoint
> >   net: sctp: minor: sctp_seq_dump_local_addrs add missing newline
> 
> The discussion of this patch series has taken a tangent towards a
> talk about how to do TX based socket accounting differently.
> 
> While that's interesting, it'd be good to get ACKs for Daniel's
> patches meanwhile, I think they are fine personally.
Yeah, given that the sk_wmem_alloc thing has been clarified for me, it looks ok
to me

For the series
Acked-by: Neil Horman <nhorman@tuxdriver.com>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 4/5] net: sctp: decouple cleaning socket data from endpoint
  2013-06-18  8:55 ` [PATCH net-next 4/5] net: sctp: decouple cleaning socket data from endpoint Daniel Borkmann
  2013-06-18 14:22   ` Neil Horman
@ 2013-06-20 14:35   ` Vlad Yasevich
  2013-06-20 17:29     ` Daniel Borkmann
  1 sibling, 1 reply; 20+ messages in thread
From: Vlad Yasevich @ 2013-06-20 14:35 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, netdev, linux-sctp

On 06/18/2013 04:55 AM, Daniel Borkmann wrote:
> Rather instead of having the endpoint clean the garbage for the
> socket, use a sk_destruct handler sctp_destruct_sock(), that does
> the job for that when there are no more references on the socket.
>

With this patch it is possible to run sctp_put_port while the socket
is not locked.

The flow goes something like this:

sctp_close()
   sk_bh_lock_sock();
   sk_common_release()
     sctp_destroy_sock()
       endpoint_put()
         endpoint_destroy() <-- This is where we used to do sctp_put_port
   sk_bh_unlock_sock();
   sock_put()
     sk_free()
       __sk_free()
         sctp_destruct_sock()
           sctp_put_port()


I haven't found any race conditions yet, but that doesn't mean they 
don't exist.

I think an easy solution would be to do sctp_put_port in sctp_unhash(),
but I haven't traced all the paths.

Daniel, please take a look at this.

Thanks
-vlad


> Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
> ---
>   net/sctp/endpointola.c |  7 -------
>   net/sctp/socket.c      | 20 +++++++++++++++++++-
>   2 files changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/net/sctp/endpointola.c b/net/sctp/endpointola.c
> index a8b2674..9a9ebd2 100644
> --- a/net/sctp/endpointola.c
> +++ b/net/sctp/endpointola.c
> @@ -249,9 +249,6 @@ static void sctp_endpoint_destroy(struct sctp_endpoint *ep)
>   {
>   	SCTP_ASSERT(ep->base.dead, "Endpoint is not dead", return);
>
> -	/* Free up the HMAC transform. */
> -	crypto_free_hash(sctp_sk(ep->base.sk)->hmac);
> -
>   	/* Free the digest buffer */
>   	kfree(ep->digest);
>
> @@ -271,10 +268,6 @@ static void sctp_endpoint_destroy(struct sctp_endpoint *ep)
>
>   	memset(ep->secret_key, 0, sizeof(ep->secret_key));
>
> -	/* Remove and free the port */
> -	if (sctp_sk(ep->base.sk)->bind_hash)
> -		sctp_put_port(ep->base.sk);
> -
>   	/* Give up our hold on the sock. */
>   	if (ep->base.sk)
>   		sock_put(ep->base.sk);
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 68a6b70..67f721e 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -93,6 +93,7 @@ static int sctp_wait_for_packet(struct sock * sk, int *err, long *timeo_p);
>   static int sctp_wait_for_connect(struct sctp_association *, long *timeo_p);
>   static int sctp_wait_for_accept(struct sock *sk, long timeo);
>   static void sctp_wait_for_close(struct sock *sk, long timeo);
> +static void sctp_destruct_sock(struct sock *sk);
>   static struct sctp_af *sctp_sockaddr_af(struct sctp_sock *opt,
>   					union sctp_addr *addr, int len);
>   static int sctp_bindx_add(struct sock *, struct sockaddr *, int);
> @@ -3966,6 +3967,8 @@ static int sctp_init_sock(struct sock *sk)
>
>   	sp->hmac = NULL;
>
> +	sk->sk_destruct = sctp_destruct_sock;
> +
>   	SCTP_DBG_OBJCNT_INC(sock);
>
>   	local_bh_disable();
> @@ -4002,6 +4005,21 @@ static void sctp_destroy_sock(struct sock *sk)
>   	local_bh_enable();
>   }
>
> +/* Triggered when there are no references on the socket anymore */
> +static void sctp_destruct_sock(struct sock *sk)
> +{
> +	struct sctp_sock *sp = sctp_sk(sk);
> +
> +	/* Free up the HMAC transform. */
> +	crypto_free_hash(sp->hmac);
> +
> +	/* Remove and free the port */
> +	if (sp->bind_hash)
> +		sctp_put_port(sk);
> +
> +	inet_sock_destruct(sk);
> +}
> +
>   /* API 4.1.7 shutdown() - TCP Style Syntax
>    *     int shutdown(int socket, int how);
>    *
> @@ -6842,7 +6860,7 @@ void sctp_copy_sock(struct sock *newsk, struct sock *sk,
>   	newsk->sk_reuse = sk->sk_reuse;
>
>   	newsk->sk_shutdown = sk->sk_shutdown;
> -	newsk->sk_destruct = inet_sock_destruct;
> +	newsk->sk_destruct = sctp_destruct_sock;
>   	newsk->sk_family = sk->sk_family;
>   	newsk->sk_protocol = IPPROTO_SCTP;
>   	newsk->sk_backlog_rcv = sk->sk_prot->backlog_rcv;
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 4/5] net: sctp: decouple cleaning socket data from endpoint
  2013-06-20 14:35   ` Vlad Yasevich
@ 2013-06-20 17:29     ` Daniel Borkmann
  2013-06-21  1:12       ` Vlad Yasevich
  0 siblings, 1 reply; 20+ messages in thread
From: Daniel Borkmann @ 2013-06-20 17:29 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: davem, netdev, linux-sctp, Neil Horman

On 06/20/2013 04:35 PM, Vlad Yasevich wrote:
> On 06/18/2013 04:55 AM, Daniel Borkmann wrote:
>> Rather instead of having the endpoint clean the garbage for the
>> socket, use a sk_destruct handler sctp_destruct_sock(), that does
>> the job for that when there are no more references on the socket.
>
> With this patch it is possible to run sctp_put_port while the socket
> is not locked.
>
> The flow goes something like this:
>
> sctp_close()
>    sk_bh_lock_sock();
>    sk_common_release()
>      sctp_destroy_sock()
>        endpoint_put()
>          endpoint_destroy() <-- This is where we used to do sctp_put_port
>    sk_bh_unlock_sock();
>    sock_put()
>      sk_free()
>        __sk_free()
>          sctp_destruct_sock()
>            sctp_put_port()
>
> I haven't found any race conditions yet, but that doesn't mean they don't exist.
>
> I think an easy solution would be to do sctp_put_port in sctp_unhash(),
> but I haven't traced all the paths.

Hm, compared to the current (pre-patch) solution, sctp_put_port() does not
necessarily need to be called at sk_common_release() time if refs are still
on the endpoint, so that endpoint_destroy() is further deferred in time. Thus,
if we would do the sctp_put_port() in sctp_unhash(), we could free it at an
earlier time than with endpoint_destroy(). This does not necessarily need to
be a bad or wrong way, but with the current approach it's done at an later
point in time afaik. If it's only about the locking, what if we just hold
that socket lock around sctp_put_port() in the current patch?

But besides that, if at such a late point in time someone still has access to
that socket member (right before we do the kfree(sk)), we would be pretty much
screwed. :-) Despite having the socket lock or not, the port hashtable has it's
own protection from what I see.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 4/5] net: sctp: decouple cleaning socket data from endpoint
  2013-06-20 17:29     ` Daniel Borkmann
@ 2013-06-21  1:12       ` Vlad Yasevich
  2013-06-22 21:38         ` Daniel Borkmann
  0 siblings, 1 reply; 20+ messages in thread
From: Vlad Yasevich @ 2013-06-21  1:12 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: davem, netdev, linux-sctp, Neil Horman

On 06/20/2013 01:29 PM, Daniel Borkmann wrote:
> On 06/20/2013 04:35 PM, Vlad Yasevich wrote:
>> On 06/18/2013 04:55 AM, Daniel Borkmann wrote:
>>> Rather instead of having the endpoint clean the garbage for the
>>> socket, use a sk_destruct handler sctp_destruct_sock(), that does
>>> the job for that when there are no more references on the socket.
>>
>> With this patch it is possible to run sctp_put_port while the socket
>> is not locked.
>>
>> The flow goes something like this:
>>
>> sctp_close()
>>    sk_bh_lock_sock();
>>    sk_common_release()
>>      sctp_destroy_sock()
>>        endpoint_put()
>>          endpoint_destroy() <-- This is where we used to do sctp_put_port
>>    sk_bh_unlock_sock();
>>    sock_put()
>>      sk_free()
>>        __sk_free()
>>          sctp_destruct_sock()
>>            sctp_put_port()
>>
>> I haven't found any race conditions yet, but that doesn't mean they
>> don't exist.
>>
>> I think an easy solution would be to do sctp_put_port in sctp_unhash(),
>> but I haven't traced all the paths.
>
> Hm, compared to the current (pre-patch) solution, sctp_put_port() does not
> necessarily need to be called at sk_common_release() time if refs are still
> on the endpoint, so that endpoint_destroy() is further deferred in time.
> Thus,
> if we would do the sctp_put_port() in sctp_unhash(), we could free it at an
> earlier time than with endpoint_destroy(). This does not necessarily
> need to
> be a bad or wrong way, but with the current approach it's done at an later
> point in time afaik.

You are right. Doing it in sctp_unhash() could be possibly too early.

> If it's only about the locking, what if we just hold
> that socket lock around sctp_put_port() in the current patch?
>
> But besides that, if at such a late point in time someone still has
> access to
> that socket member (right before we do the kfree(sk)), we would be
> pretty much
> screwed. :-) Despite having the socket lock or not, the port hashtable
> has it's
> own protection from what I see.

Yes it does and I've been looking to see if this is sufficient enough
for our purposes.  It looks like our saving grace is the fact that
we set the sk_state to CLOSED sctp_endpoint_free().  Otherwise, we'd
have a race between sctp_endpoint_destroy() and conflict detection
in sctp_get_port_local.  This seems a bit fragile and we are making
it a bit so more with this patch.

I think it would be better to see if we can remove the socket from
the port table a bit earlier if possbile.

-vlad

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH net-next 4/5] net: sctp: decouple cleaning socket data from endpoint
  2013-06-21  1:12       ` Vlad Yasevich
@ 2013-06-22 21:38         ` Daniel Borkmann
  0 siblings, 0 replies; 20+ messages in thread
From: Daniel Borkmann @ 2013-06-22 21:38 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: davem, netdev, linux-sctp, Neil Horman

On 06/21/2013 03:12 AM, Vlad Yasevich wrote:
> On 06/20/2013 01:29 PM, Daniel Borkmann wrote:
>> On 06/20/2013 04:35 PM, Vlad Yasevich wrote:
>>> On 06/18/2013 04:55 AM, Daniel Borkmann wrote:
>>>> Rather instead of having the endpoint clean the garbage for the
>>>> socket, use a sk_destruct handler sctp_destruct_sock(), that does
>>>> the job for that when there are no more references on the socket.
>>>
>>> With this patch it is possible to run sctp_put_port while the socket
>>> is not locked.
>>>
>>> The flow goes something like this:
>>>
>>> sctp_close()
>>>    sk_bh_lock_sock();
>>>    sk_common_release()
>>>      sctp_destroy_sock()
>>>        endpoint_put()
>>>          endpoint_destroy() <-- This is where we used to do sctp_put_port
>>>    sk_bh_unlock_sock();
>>>    sock_put()
>>>      sk_free()
>>>        __sk_free()
>>>          sctp_destruct_sock()
>>>            sctp_put_port()
>>>
>>> I haven't found any race conditions yet, but that doesn't mean they
>>> don't exist.
>>>
>>> I think an easy solution would be to do sctp_put_port in sctp_unhash(),
>>> but I haven't traced all the paths.
>>
>> Hm, compared to the current (pre-patch) solution, sctp_put_port() does not
>> necessarily need to be called at sk_common_release() time if refs are still
>> on the endpoint, so that endpoint_destroy() is further deferred in time.
>> Thus,
>> if we would do the sctp_put_port() in sctp_unhash(), we could free it at an
>> earlier time than with endpoint_destroy(). This does not necessarily
>> need to
>> be a bad or wrong way, but with the current approach it's done at an later
>> point in time afaik.
>
> You are right. Doing it in sctp_unhash() could be possibly too early.
>
>> If it's only about the locking, what if we just hold
>> that socket lock around sctp_put_port() in the current patch?
>>
>> But besides that, if at such a late point in time someone still has
>> access to
>> that socket member (right before we do the kfree(sk)), we would be
>> pretty much
>> screwed. :-) Despite having the socket lock or not, the port hashtable
>> has it's
>> own protection from what I see.
>
> Yes it does and I've been looking to see if this is sufficient enough
> for our purposes.  It looks like our saving grace is the fact that
> we set the sk_state to CLOSED sctp_endpoint_free().  Otherwise, we'd
> have a race between sctp_endpoint_destroy() and conflict detection
> in sctp_get_port_local.  This seems a bit fragile and we are making
> it a bit so more with this patch.
>
> I think it would be better to see if we can remove the socket from
> the port table a bit earlier if possbile.

Ok, let me come back to this on Monday.

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2013-06-22 21:39 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-06-18  8:55 [PATCH net-next 0/5] Further SCTP changes Daniel Borkmann
2013-06-18  8:55 ` [PATCH net-next 1/5] net: sctp: remove TEST_FRAME ifdef Daniel Borkmann
2013-06-18  8:55 ` [PATCH net-next 2/5] ktime: add ms_to_ktime() and ktime_add_ms() helpers Daniel Borkmann
2013-06-18  8:55 ` [PATCH net-next 3/5] net: sctp: migrate cookie life from timeval to ktime Daniel Borkmann
2013-06-18  8:55 ` [PATCH net-next 4/5] net: sctp: decouple cleaning socket data from endpoint Daniel Borkmann
2013-06-18 14:22   ` Neil Horman
2013-06-18 16:02     ` Daniel Borkmann
2013-06-18 16:24       ` Vlad Yasevich
2013-06-18 17:45         ` Neil Horman
2013-06-18 18:15           ` Vlad Yasevich
2013-06-18 19:12             ` Neil Horman
2013-06-18 22:55               ` Vlad Yasevich
2013-06-19 11:55                 ` Neil Horman
2013-06-20 14:35   ` Vlad Yasevich
2013-06-20 17:29     ` Daniel Borkmann
2013-06-21  1:12       ` Vlad Yasevich
2013-06-22 21:38         ` Daniel Borkmann
2013-06-18  8:55 ` [PATCH net-next 5/5] net: sctp: minor: sctp_seq_dump_local_addrs add missing newline Daniel Borkmann
2013-06-20  1:39 ` [PATCH net-next 0/5] Further SCTP changes David Miller
2013-06-20 13:24   ` Neil Horman

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).