[PATCH net-next 00/10] tipc: more updates for the v3.8 content

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH net-next 00/10] tipc: more updates for the v3.8 content
@ 2012-12-07 14:28 Paul Gortmaker
  2012-12-07 14:28 ` [PATCH net-next 01/10] tipc: remove obsolete flush of stale reassembly buffer Paul Gortmaker
                   ` (9 more replies)
  0 siblings, 10 replies; 22+ messages in thread
From: Paul Gortmaker @ 2012-12-07 14:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jon Maloy, Paul Gortmaker

Here is another batch of TIPC changes.  The most interesting
thing is probably the non-blocking socket connect - I'm told
there were several users looking forward to seeing this.

Also there were some resource limitation changes that had
the right intent back in 2005, but were now apparently causing
needless limitations to people's real use cases; those have
been relaxed/removed.

There is a lockdep splat fix, but no need for a stable backport,
since it is virtually impossible to trigger in mainline; you
have to essentially modify code to force the probabilities
in your favour to see it.

The rest can largely be categorized as general cleanup of things
seen in the process of getting the above changes done.

Tested between 64 and 32 bit nodes with the test suite.  I've
also compile tested all the individual commits on the chain.

I'd originally figured on this queue not being ready for 3.8, but
the extended stabilization window of 3.7 has changed that.  On
the other hand, this can still be 3.9 material, if that simply
works better for folks - no problem for me to defer it to 2013.
If anyone spots any problems then I'll definitely defer it,
rather than rush a last minute respin.

Thanks,
Paul.
---

The following changes since commit b93196dc5af7729ff7cc50d3d322ab1a364aa14f:

  net: fix some compiler warning in net/core/neighbour.c (2012-12-05 21:50:37 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux.git tipc_net-next

for you to fetch changes up to c8c0055db5be549dbd7eef2f207c53bcd556de9e:

  tipc: refactor accept() code for improved readability (2012-12-06 17:29:53 -0500)

----------------------------------------------------------------
Erik Hugne (1):
      tipc: remove obsolete flush of stale reassembly buffer

Jon Maloy (1):
      tipc: change sk_receive_queue upper limit

Paul Gortmaker (2):
      tipc: standardize across connect/disconnect function naming
      tipc: refactor accept() code for improved readability

Ying Xue (6):
      tipc: eliminate aggregate sk_receive_queue limit
      tipc: sk_recv_queue size check only for connectionless sockets
      tipc: consolidate connection-oriented message reception in one function
      tipc: introduce non-blocking socket connect
      tipc: eliminate connection setup for implied connect in recv_msg()
      tipc: add lock nesting notation to quiet lockdep warning

 net/tipc/link.c   |  44 -------
 net/tipc/port.c   |  32 +++--
 net/tipc/port.h   |   6 +-
 net/tipc/socket.c | 360 +++++++++++++++++++++++++++++++-----------------------
 net/tipc/subscr.c |   2 +-
 5 files changed, 232 insertions(+), 212 deletions(-)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH net-next 01/10] tipc: remove obsolete flush of stale reassembly buffer
  2012-12-07 14:28 [PATCH net-next 00/10] tipc: more updates for the v3.8 content Paul Gortmaker
@ 2012-12-07 14:28 ` Paul Gortmaker
  2012-12-07 14:28 ` [PATCH net-next 02/10] tipc: eliminate aggregate sk_receive_queue limit Paul Gortmaker
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Paul Gortmaker @ 2012-12-07 14:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jon Maloy, Erik Hugne, Paul Gortmaker

From: Erik Hugne <erik.hugne@ericsson.com>

Each link instance has a periodic job checking if there is a stale
ongoing message reassembly associated to the link. If no new
fragment has been received during the last 4*[link_tolerance] period,
it is assumed the missing fragment will never arrive. As a consequence,
the reassembly buffer is discarded, and a gap in the message sequence
occurs.

This assumption is wrong. After we abandoned our ambition to develop
packet routing for multi-cluster networks, only single-hop packet
transfer remains as an option. For those, all packets are guaranteed
to be delivered in sequence to the defragmentation layer. Any failure
to achieve sequenced delivery will eventually lead to link reset, and
the reassembly buffer will be flushed anyway.

So we just remove this periodic check, which is now obsolete.

Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: also delete get/inc_timer count, since they are now unused]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/link.c | 44 --------------------------------------------
 1 file changed, 44 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 87bf5aa..daa6080 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -97,7 +97,6 @@ static int  link_send_sections_long(struct tipc_port *sender,
 				    struct iovec const *msg_sect,
 				    u32 num_sect, unsigned int total_len,
 				    u32 destnode);
-static void link_check_defragm_bufs(struct tipc_link *l_ptr);
 static void link_state_event(struct tipc_link *l_ptr, u32 event);
 static void link_reset_statistics(struct tipc_link *l_ptr);
 static void link_print(struct tipc_link *l_ptr, const char *str);
@@ -271,7 +270,6 @@ static void link_timeout(struct tipc_link *l_ptr)
 	}
 
 	/* do all other link processing performed on a periodic basis */
-	link_check_defragm_bufs(l_ptr);
 
 	link_state_event(l_ptr, TIMEOUT_EVT);
 
@@ -2497,16 +2495,6 @@ static void set_expected_frags(struct sk_buff *buf, u32 exp)
 	msg_set_bcast_ack(buf_msg(buf), exp);
 }
 
-static u32 get_timer_cnt(struct sk_buff *buf)
-{
-	return msg_reroute_cnt(buf_msg(buf));
-}
-
-static void incr_timer_cnt(struct sk_buff *buf)
-{
-	msg_incr_reroute_cnt(buf_msg(buf));
-}
-
 /*
  * tipc_link_recv_fragment(): Called with node lock on. Returns
  * the reassembled buffer if message is complete.
@@ -2585,38 +2573,6 @@ int tipc_link_recv_fragment(struct sk_buff **pending, struct sk_buff **fb,
 	return 0;
 }
 
-/**
- * link_check_defragm_bufs - flush stale incoming message fragments
- * @l_ptr: pointer to link
- */
-static void link_check_defragm_bufs(struct tipc_link *l_ptr)
-{
-	struct sk_buff *prev = NULL;
-	struct sk_buff *next = NULL;
-	struct sk_buff *buf = l_ptr->defragm_buf;
-
-	if (!buf)
-		return;
-	if (!link_working_working(l_ptr))
-		return;
-	while (buf) {
-		u32 cnt = get_timer_cnt(buf);
-
-		next = buf->next;
-		if (cnt < 4) {
-			incr_timer_cnt(buf);
-			prev = buf;
-		} else {
-			if (prev)
-				prev->next = buf->next;
-			else
-				l_ptr->defragm_buf = buf->next;
-			kfree_skb(buf);
-		}
-		buf = next;
-	}
-}
-
 static void link_set_supervision_props(struct tipc_link *l_ptr, u32 tolerance)
 {
 	if ((tolerance < TIPC_MIN_LINK_TOL) || (tolerance > TIPC_MAX_LINK_TOL))
-- 
1.7.12.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 02/10] tipc: eliminate aggregate sk_receive_queue limit
  2012-12-07 14:28 [PATCH net-next 00/10] tipc: more updates for the v3.8 content Paul Gortmaker
  2012-12-07 14:28 ` [PATCH net-next 01/10] tipc: remove obsolete flush of stale reassembly buffer Paul Gortmaker
@ 2012-12-07 14:28 ` Paul Gortmaker
  2012-12-07 16:07   ` Neil Horman
  2012-12-07 14:28 ` [PATCH net-next 03/10] tipc: sk_recv_queue size check only for connectionless sockets Paul Gortmaker
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Paul Gortmaker @ 2012-12-07 14:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jon Maloy, Ying Xue, Paul Gortmaker

From: Ying Xue <ying.xue@windriver.com>

As a complement to the per-socket sk_recv_queue limit, TIPC keeps a
global atomic counter for the sum of sk_recv_queue sizes across all
tipc sockets. When incremented, the counter is compared to an upper
threshold value, and if this is reached, the message is rejected
with error code TIPC_OVERLOAD.

This check was originally meant to protect the node against
buffer exhaustion and general CPU overload. However, all experience
indicates that the feature not only is redundant on Linux, but even
harmful. Users run into the limit very often, causing disturbances
for their applications, while removing it seems to have no negative
effects at all. We have also seen that overall performance is
boosted significantly when this bottleneck is removed.

Furthermore, we don't see any other network protocols maintaining
such a mechanism, something strengthening our conviction that this
control can be eliminated.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/socket.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 1a720c8..a059ed0 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2,7 +2,7 @@
  * net/tipc/socket.c: TIPC socket API
  *
  * Copyright (c) 2001-2007, Ericsson AB
- * Copyright (c) 2004-2008, 2010-2011, Wind River Systems
+ * Copyright (c) 2004-2008, 2010-2012, Wind River Systems
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -1241,11 +1241,6 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
 	}

 	/* Reject message if there isn't room to queue it */
-	recv_q_len = (u32)atomic_read(&tipc_queue_size);
-	if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) {
-		if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE))
-			return TIPC_ERR_OVERLOAD;
-	}
 	recv_q_len = skb_queue_len(&sk->sk_receive_queue);
 	if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
 		if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2))
-- 
1.7.12.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 02/10] tipc: eliminate aggregate sk_receive_queue limit
  2012-12-07 14:28 ` [PATCH net-next 02/10] tipc: eliminate aggregate sk_receive_queue limit Paul Gortmaker
@ 2012-12-07 16:07   ` Neil Horman
  2012-12-07 17:36     ` David Miller
  0 siblings, 1 reply; 22+ messages in thread
From: Neil Horman @ 2012-12-07 16:07 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: David Miller, netdev, Jon Maloy, Ying Xue

On Fri, Dec 07, 2012 at 09:28:10AM -0500, Paul Gortmaker wrote:
> From: Ying Xue <ying.xue@windriver.com>
> 
> As a complement to the per-socket sk_recv_queue limit, TIPC keeps a
> global atomic counter for the sum of sk_recv_queue sizes across all
> tipc sockets. When incremented, the counter is compared to an upper
> threshold value, and if this is reached, the message is rejected
> with error code TIPC_OVERLOAD.
> 
> This check was originally meant to protect the node against
> buffer exhaustion and general CPU overload. However, all experience
> indicates that the feature not only is redundant on Linux, but even
> harmful. Users run into the limit very often, causing disturbances
> for their applications, while removing it seems to have no negative
> effects at all. We have also seen that overall performance is
> boosted significantly when this bottleneck is removed.
> 
> Furthermore, we don't see any other network protocols maintaining
> such a mechanism, something strengthening our conviction that this
> control can be eliminated.
> 
> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
>  net/tipc/socket.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 1a720c8..a059ed0 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -2,7 +2,7 @@
>   * net/tipc/socket.c: TIPC socket API
>   *
>   * Copyright (c) 2001-2007, Ericsson AB
> - * Copyright (c) 2004-2008, 2010-2011, Wind River Systems
> + * Copyright (c) 2004-2008, 2010-2012, Wind River Systems
>   * All rights reserved.
>   *
>   * Redistribution and use in source and binary forms, with or without
> @@ -1241,11 +1241,6 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
>  	}
>  
>  	/* Reject message if there isn't room to queue it */
> -	recv_q_len = (u32)atomic_read(&tipc_queue_size);
> -	if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) {
> -		if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE))
> -			return TIPC_ERR_OVERLOAD;
> -	}
If you're going to remove the one place that you read this variable, don't you
also want to remove the points where you increment/decrement the atomic as well,
and for that matter eliminate the definition itself?

Neil

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 02/10] tipc: eliminate aggregate sk_receive_queue limit
  2012-12-07 16:07   ` Neil Horman
@ 2012-12-07 17:36     ` David Miller
  2012-12-07 19:54       ` Paul Gortmaker
  0 siblings, 1 reply; 22+ messages in thread
From: David Miller @ 2012-12-07 17:36 UTC (permalink / raw)
  To: nhorman; +Cc: paul.gortmaker, netdev, jon.maloy, ying.xue

From: Neil Horman <nhorman@tuxdriver.com>
Date: Fri, 7 Dec 2012 11:07:33 -0500

> On Fri, Dec 07, 2012 at 09:28:10AM -0500, Paul Gortmaker wrote:
>> From: Ying Xue <ying.xue@windriver.com>
>> 
>> As a complement to the per-socket sk_recv_queue limit, TIPC keeps a
>> global atomic counter for the sum of sk_recv_queue sizes across all
>> tipc sockets. When incremented, the counter is compared to an upper
>> threshold value, and if this is reached, the message is rejected
>> with error code TIPC_OVERLOAD.
>> 
>> This check was originally meant to protect the node against
>> buffer exhaustion and general CPU overload. However, all experience
>> indicates that the feature not only is redundant on Linux, but even
>> harmful. Users run into the limit very often, causing disturbances
>> for their applications, while removing it seems to have no negative
>> effects at all. We have also seen that overall performance is
>> boosted significantly when this bottleneck is removed.
>> 
>> Furthermore, we don't see any other network protocols maintaining
>> such a mechanism, something strengthening our conviction that this
>> control can be eliminated.
>> 
>> Signed-off-by: Ying Xue <ying.xue@windriver.com>
>> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
>> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
 ...
>> @@ -1241,11 +1241,6 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
>>  	}
>>  
>>  	/* Reject message if there isn't room to queue it */
>> -	recv_q_len = (u32)atomic_read(&tipc_queue_size);
>> -	if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) {
>> -		if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE))
>> -			return TIPC_ERR_OVERLOAD;
>> -	}
> If you're going to remove the one place that you read this variable, don't you
> also want to remove the points where you increment/decrement the atomic as well,
> and for that matter eliminate the definition itself?

There's another reader, a getsockopt() call.

I would just make it return zero or similar.

Paul please do so and respin this series.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 02/10] tipc: eliminate aggregate sk_receive_queue limit
  2012-12-07 17:36     ` David Miller
@ 2012-12-07 19:54       ` Paul Gortmaker
  0 siblings, 0 replies; 22+ messages in thread
From: Paul Gortmaker @ 2012-12-07 19:54 UTC (permalink / raw)
  To: David Miller; +Cc: nhorman, netdev, jon.maloy, ying.xue

[Re: [PATCH net-next 02/10] tipc: eliminate aggregate sk_receive_queue limit] On 07/12/2012 (Fri 12:36) David Miller wrote:

> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Fri, 7 Dec 2012 11:07:33 -0500
> 
> > On Fri, Dec 07, 2012 at 09:28:10AM -0500, Paul Gortmaker wrote:
> >> From: Ying Xue <ying.xue@windriver.com>
> >> 
> >> As a complement to the per-socket sk_recv_queue limit, TIPC keeps a
> >> global atomic counter for the sum of sk_recv_queue sizes across all
> >> tipc sockets. When incremented, the counter is compared to an upper
> >> threshold value, and if this is reached, the message is rejected
> >> with error code TIPC_OVERLOAD.
> >> 
> >> This check was originally meant to protect the node against
> >> buffer exhaustion and general CPU overload. However, all experience
> >> indicates that the feature not only is redundant on Linux, but even
> >> harmful. Users run into the limit very often, causing disturbances
> >> for their applications, while removing it seems to have no negative
> >> effects at all. We have also seen that overall performance is
> >> boosted significantly when this bottleneck is removed.
> >> 
> >> Furthermore, we don't see any other network protocols maintaining
> >> such a mechanism, something strengthening our conviction that this
> >> control can be eliminated.
> >> 
> >> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> >> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
> >> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
>  ...
> >> @@ -1241,11 +1241,6 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
> >>  	}
> >>  
> >>  	/* Reject message if there isn't room to queue it */
> >> -	recv_q_len = (u32)atomic_read(&tipc_queue_size);
> >> -	if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) {
> >> -		if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE))
> >> -			return TIPC_ERR_OVERLOAD;
> >> -	}
> > If you're going to remove the one place that you read this variable, don't you
> > also want to remove the points where you increment/decrement the atomic as well,
> > and for that matter eliminate the definition itself?
> 
> There's another reader, a getsockopt() call.
> 
> I would just make it return zero or similar.

Updated commit below for reference, just to double check that I'm
doing what is requested properly.

Paul.
--

 From 9da3d475874f4da49057767913af95ce01063ba3 Mon Sep 17 00:00:00 2001
From: Ying Xue <ying.xue@windriver.com>
Date: Tue, 27 Nov 2012 06:15:27 -0500
Subject: [PATCH] tipc: eliminate aggregate sk_receive_queue limit

As a complement to the per-socket sk_recv_queue limit, TIPC keeps a
global atomic counter for the sum of sk_recv_queue sizes across all
tipc sockets. When incremented, the counter is compared to an upper
threshold value, and if this is reached, the message is rejected
with error code TIPC_OVERLOAD.

This check was originally meant to protect the node against
buffer exhaustion and general CPU overload. However, all experience
indicates that the feature not only is redundant on Linux, but even
harmful. Users run into the limit very often, causing disturbances
for their applications, while removing it seems to have no negative
effects at all. We have also seen that overall performance is
boosted significantly when this bottleneck is removed.

Furthermore, we don't see any other network protocols maintaining
such a mechanism, something strengthening our conviction that this
control can be eliminated.

As a result, the atomic variable tipc_queue_size is now unused
and so it can be deleted.  There is a getsockopt call that used
to allow reading it; we retain that but just return zero for
maximum compatibility.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
[PG: phase out tipc_queue_size as pointed out by Neil Horman]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 1a720c8..848be69 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2,7 +2,7 @@
  * net/tipc/socket.c: TIPC socket API
  *
  * Copyright (c) 2001-2007, Ericsson AB
- * Copyright (c) 2004-2008, 2010-2011, Wind River Systems
+ * Copyright (c) 2004-2008, 2010-2012, Wind River Systems
  * All rights reserved.
  *
  * Redistribution and use in source and binary forms, with or without
@@ -73,8 +73,6 @@ static struct proto tipc_proto;
 
 static int sockets_enabled;
 
-static atomic_t tipc_queue_size = ATOMIC_INIT(0);
-
 /*
  * Revised TIPC socket locking policy:
  *
@@ -128,7 +126,6 @@ static atomic_t tipc_queue_size = ATOMIC_INIT(0);
 static void advance_rx_queue(struct sock *sk)
 {
 	kfree_skb(__skb_dequeue(&sk->sk_receive_queue));
-	atomic_dec(&tipc_queue_size);
 }
 
 /**
@@ -140,10 +137,8 @@ static void discard_rx_queue(struct sock *sk)
 {
 	struct sk_buff *buf;
 
-	while ((buf = __skb_dequeue(&sk->sk_receive_queue))) {
-		atomic_dec(&tipc_queue_size);
+	while ((buf = __skb_dequeue(&sk->sk_receive_queue)))
 		kfree_skb(buf);
-	}
 }
 
 /**
@@ -155,10 +150,8 @@ static void reject_rx_queue(struct sock *sk)
 {
 	struct sk_buff *buf;
 
-	while ((buf = __skb_dequeue(&sk->sk_receive_queue))) {
+	while ((buf = __skb_dequeue(&sk->sk_receive_queue)))
 		tipc_reject_msg(buf, TIPC_ERR_NO_PORT);
-		atomic_dec(&tipc_queue_size);
-	}
 }
 
 /**
@@ -280,7 +273,6 @@ static int release(struct socket *sock)
 		buf = __skb_dequeue(&sk->sk_receive_queue);
 		if (buf == NULL)
 			break;
-		atomic_dec(&tipc_queue_size);
 		if (TIPC_SKB_CB(buf)->handle != 0)
 			kfree_skb(buf);
 		else {
@@ -1241,11 +1233,6 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
 	}
 
 	/* Reject message if there isn't room to queue it */
-	recv_q_len = (u32)atomic_read(&tipc_queue_size);
-	if (unlikely(recv_q_len >= OVERLOAD_LIMIT_BASE)) {
-		if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE))
-			return TIPC_ERR_OVERLOAD;
-	}
 	recv_q_len = skb_queue_len(&sk->sk_receive_queue);
 	if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
 		if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2))
@@ -1254,7 +1241,6 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
 
 	/* Enqueue message (finally!) */
 	TIPC_SKB_CB(buf)->handle = 0;
-	atomic_inc(&tipc_queue_size);
 	__skb_queue_tail(&sk->sk_receive_queue, buf);
 
 	/* Initiate connection termination for an incoming 'FIN' */
@@ -1578,7 +1564,6 @@ restart:
 		/* Disconnect and send a 'FIN+' or 'FIN-' message to peer */
 		buf = __skb_dequeue(&sk->sk_receive_queue);
 		if (buf) {
-			atomic_dec(&tipc_queue_size);
 			if (TIPC_SKB_CB(buf)->handle != 0) {
 				kfree_skb(buf);
 				goto restart;
@@ -1717,7 +1702,7 @@ static int getsockopt(struct socket *sock,
 		/* no need to set "res", since already 0 at this point */
 		break;
 	case TIPC_NODE_RECVQ_DEPTH:
-		value = (u32)atomic_read(&tipc_queue_size);
+		value = 0; /* was tipc_queue_size, now obsolete */
 		break;
 	case TIPC_SOCK_RECVQ_DEPTH:
 		value = skb_queue_len(&sk->sk_receive_queue);
-- 
1.7.12.1

> 
> Paul please do so and respin this series.
> 
> Thanks.

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 03/10] tipc: sk_recv_queue size check only for connectionless sockets
  2012-12-07 14:28 [PATCH net-next 00/10] tipc: more updates for the v3.8 content Paul Gortmaker
  2012-12-07 14:28 ` [PATCH net-next 01/10] tipc: remove obsolete flush of stale reassembly buffer Paul Gortmaker
  2012-12-07 14:28 ` [PATCH net-next 02/10] tipc: eliminate aggregate sk_receive_queue limit Paul Gortmaker
@ 2012-12-07 14:28 ` Paul Gortmaker
  2012-12-07 19:20   ` Neil Horman
  2012-12-07 14:28 ` [PATCH net-next 04/10] tipc: change sk_receive_queue upper limit Paul Gortmaker
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 22+ messages in thread
From: Paul Gortmaker @ 2012-12-07 14:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jon Maloy, Ying Xue, Paul Gortmaker

From: Ying Xue <ying.xue@windriver.com>

The sk_receive_queue limit control is currently performed for
all arriving messages, disregarding socket and message type.
But for connected sockets this check is redundant, since the protocol
flow control already makes queue overflow impossible.

We move the sk_receive_queue limit control so that it is only performed
for connectionless sockets, i.e. SOCK_RDM and SOCK_DGRAM type sockets.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/socket.c | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index a059ed0..4d6a448 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1188,9 +1188,6 @@ static int rx_queue_full(struct tipc_msg *msg, u32 queue_size, u32 base)
 	else
 		return 0;
 
-	if (msg_connected(msg))
-		threshold *= 4;
-
 	return queue_size >= threshold;
 }
 
@@ -1219,6 +1216,15 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
 	if (sock->state == SS_READY) {
 		if (msg_connected(msg))
 			return TIPC_ERR_NO_PORT;
+		/* Reject SOCK_DGRAM and SOCK_RDM messages if there isn't room
+		 * to queue it.
+		 */
+		recv_q_len = skb_queue_len(&sk->sk_receive_queue);
+		if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
+			if (rx_queue_full(msg, recv_q_len,
+			    OVERLOAD_LIMIT_BASE / 2))
+				return TIPC_ERR_OVERLOAD;
+		}
 	} else {
 		if (msg_mcast(msg))
 			return TIPC_ERR_NO_PORT;
@@ -1240,13 +1246,6 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
 		}
 	}
 
-	/* Reject message if there isn't room to queue it */
-	recv_q_len = skb_queue_len(&sk->sk_receive_queue);
-	if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2))) {
-		if (rx_queue_full(msg, recv_q_len, OVERLOAD_LIMIT_BASE / 2))
-			return TIPC_ERR_OVERLOAD;
-	}
-
 	/* Enqueue message (finally!) */
 	TIPC_SKB_CB(buf)->handle = 0;
 	atomic_inc(&tipc_queue_size);
-- 
1.7.12.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 03/10] tipc: sk_recv_queue size check only for connectionless sockets
  2012-12-07 14:28 ` [PATCH net-next 03/10] tipc: sk_recv_queue size check only for connectionless sockets Paul Gortmaker
@ 2012-12-07 19:20   ` Neil Horman
  2012-12-07 22:30     ` Jon Maloy
  0 siblings, 1 reply; 22+ messages in thread
From: Neil Horman @ 2012-12-07 19:20 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: David Miller, netdev, Jon Maloy, Ying Xue

On Fri, Dec 07, 2012 at 09:28:11AM -0500, Paul Gortmaker wrote:
> From: Ying Xue <ying.xue@windriver.com>
> 
> The sk_receive_queue limit control is currently performed for
> all arriving messages, disregarding socket and message type.
> But for connected sockets this check is redundant, since the protocol
> flow control already makes queue overflow impossible.
> 
Can you explain where that occurs?  I see where the tipc dispatch function calls
sk_add_backlog, which checks the per socket recieve queue (regardless of weather
the receiving socket is connection oriented or connectionless), but if the
receiver doesn't call receive very often, This just adds a check against your
global limit, doing nothing for your per-socket limits.  In fact it seems to
repeat the same check twice, as in the worst case of the incomming message being
TIPC_LOW_IMPORTANCE, its just going to check that the global limit is exactly
OVERLOAD_LIMIT_BASE/2 again.

Neil

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 03/10] tipc: sk_recv_queue size check only for connectionless sockets
  2012-12-07 19:20   ` Neil Horman
@ 2012-12-07 22:30     ` Jon Maloy
  2012-12-09 16:50       ` Neil Horman
  0 siblings, 1 reply; 22+ messages in thread
From: Jon Maloy @ 2012-12-07 22:30 UTC (permalink / raw)
  To: Neil Horman; +Cc: Paul Gortmaker, David Miller, netdev, Ying Xue

On 12/07/2012 02:20 PM, Neil Horman wrote:
> On Fri, Dec 07, 2012 at 09:28:11AM -0500, Paul Gortmaker wrote:
>> From: Ying Xue <ying.xue@windriver.com>
>>
>> The sk_receive_queue limit control is currently performed for
>> all arriving messages, disregarding socket and message type.
>> But for connected sockets this check is redundant, since the protocol
>> flow control already makes queue overflow impossible.
>>
> Can you explain where that occurs?  

It happens in the functions port_dispatcher_sigh() and tipc_send(), 
among other places. Both are to be found in the file port.c, which 
was supposed to contain the 'generic' (i.e., API independent) part 
of the send/receive code.
Now that we have only one API left, the socket API, we are 
planning to merge the code in socket.c and port.c, and get rid of 
some code overhead.

The flow control in TIPC is message based, where the sender
requires to receive an explicit acknowledge message for each 
512 message the receiver reads to user space.
If the sender has more than 1024 messages outstanding without having
received an acknowledge he will be suspended or receive EAGAIN until 
he does.
The plan going forward is to replace this mechanism with a more 
standard looking byte based flow control, while maintaining 
backwards compatibility.

> I see where the tipc dispatch function calls
> sk_add_backlog, which checks the per socket recieve queue (regardless of weather
> the receiving socket is connection oriented or connectionless), but if the
> receiver doesn't call receive very often, This just adds a check against your
> global limit, doing nothing for your per-socket limits. 

OVERLOAD_LIMIT_BASE is tested against a per-socket message counter, so it _is_
our per-socket limit. In fact, TIPC connectionless overflow control currently 
is a kind of a hybrid, based on a message counter when the socket is not locked, 
and based on sk_rcv_queue's byte limit when a message has to be added to the 
backlog.
We are planning to fix this inconsistency too.

 In fact it seems to
> repeat the same check twice, as in the worst case of the incomming message being
> TIPC_LOW_IMPORTANCE, its just going to check that the global limit is exactly
> OVERLOAD_LIMIT_BASE/2 again.

Yes, you are right. The intention is that only the first test, 
if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2)){..}
will be run for the vast majority of messages, since we must assume
that there is no overload most of the time.
An inelegant optimization, perhaps, but not logically wrong.

///jon

> 
> Neil
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 03/10] tipc: sk_recv_queue size check only for connectionless sockets
  2012-12-07 22:30     ` Jon Maloy
@ 2012-12-09 16:50       ` Neil Horman
  2012-12-10  6:27         ` Ying Xue
  0 siblings, 1 reply; 22+ messages in thread
From: Neil Horman @ 2012-12-09 16:50 UTC (permalink / raw)
  To: Jon Maloy; +Cc: Paul Gortmaker, David Miller, netdev, Ying Xue

On Fri, Dec 07, 2012 at 05:30:11PM -0500, Jon Maloy wrote:
> On 12/07/2012 02:20 PM, Neil Horman wrote:
> > On Fri, Dec 07, 2012 at 09:28:11AM -0500, Paul Gortmaker wrote:
> >> From: Ying Xue <ying.xue@windriver.com>
> >>
> >> The sk_receive_queue limit control is currently performed for
> >> all arriving messages, disregarding socket and message type.
> >> But for connected sockets this check is redundant, since the protocol
> >> flow control already makes queue overflow impossible.
> >>
> > Can you explain where that occurs?  
> 
> It happens in the functions port_dispatcher_sigh() and tipc_send(), 
> among other places. Both are to be found in the file port.c, which 
> was supposed to contain the 'generic' (i.e., API independent) part 
> of the send/receive code.
> Now that we have only one API left, the socket API, we are 
> planning to merge the code in socket.c and port.c, and get rid of 
> some code overhead.
> 
> The flow control in TIPC is message based, where the sender
> requires to receive an explicit acknowledge message for each 
> 512 message the receiver reads to user space.
> If the sender has more than 1024 messages outstanding without having
> received an acknowledge he will be suspended or receive EAGAIN until 
> he does.
> The plan going forward is to replace this mechanism with a more 
> standard looking byte based flow control, while maintaining 
> backwards compatibility.
> 
Ok, That makes more sense, thank you.  Although I still don't think this is
safe (but the problem may not be solely introduced by this patch).  Using a
global limit that assumes the sender will block when the congestion window is
reached just doesn't seem sane to me.  It clearly works with the Linux
implementation, as it conforms to your expectations, but an alternate
implementation could create a DOS situation by simply ignoring the window limit,
and continuing to send.  I see that we drop frames over the global limit in
filter_rcv, but the check in rx_queue_full bumps up that limit based on the
value of msg_importance(msg), but that threshold is ignored if the value of
msg_importance is invalid.  All a sender needs to do is flood a receiver with
frames containing an invalid set of message importance bits, and you will queue
frames indefinately.  In fact that will also happen if you send message of
CRITICAL importance as well, so you don't even need to supply an invalid value
here.

> 
> > I see where the tipc dispatch function calls
> > sk_add_backlog, which checks the per socket recieve queue (regardless of weather
> > the receiving socket is connection oriented or connectionless), but if the
> > receiver doesn't call receive very often, This just adds a check against your
> > global limit, doing nothing for your per-socket limits. 
> 
> OVERLOAD_LIMIT_BASE is tested against a per-socket message counter, so it _is_
> our per-socket limit. In fact, TIPC connectionless overflow control currently 
> is a kind of a hybrid, based on a message counter when the socket is not locked, 
> and based on sk_rcv_queue's byte limit when a message has to be added to the 
> backlog.
> We are planning to fix this inconsistency too.
Good, thank you,  that was seeming quite wrong to me.

> 
>  In fact it seems to
> > repeat the same check twice, as in the worst case of the incomming message being
> > TIPC_LOW_IMPORTANCE, its just going to check that the global limit is exactly
> > OVERLOAD_LIMIT_BASE/2 again.
> 
> Yes, you are right. The intention is that only the first test, 
> if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2)){..}
> will be run for the vast majority of messages, since we must assume
> that there is no overload most of the time.
> An inelegant optimization, perhaps, but not logically wrong.
No, not logically wrong, but not an optimization either.  With this change,
your only use of rx_queue_full passes OVERLOAD_LIMIT_BASE/2 as the base value to
rx_queue_full, and then you do some multiplication based on that.  If you really
want to optimize this, leave OVERLOAD_LIMIT_BASE where it is (rather than
doubling it like this patch series does), mark rx_queue_full as inline, and just
pass OVERLOAD_LIMIT_BASE as the argument, it will save you a division opration,
the conditional branch and a call instruction.  If you add a multiplication
factor table, you can eliminate the if/else clauses in rx_queue_full as well.

Neil

> 
> ///jon
> 
> > 
> > Neil
> > 
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 03/10] tipc: sk_recv_queue size check only for connectionless sockets
  2012-12-09 16:50       ` Neil Horman
@ 2012-12-10  6:27         ` Ying Xue
  2012-12-10  8:46           ` Jon Maloy
  2012-12-10 14:22           ` Neil Horman
  0 siblings, 2 replies; 22+ messages in thread
From: Ying Xue @ 2012-12-10  6:27 UTC (permalink / raw)
  To: Neil Horman; +Cc: Jon Maloy, Paul Gortmaker, David Miller, netdev, Ying Xue

Neil Horman wrote:
> On Fri, Dec 07, 2012 at 05:30:11PM -0500, Jon Maloy wrote:
>   
>> On 12/07/2012 02:20 PM, Neil Horman wrote:
>>     
>>> On Fri, Dec 07, 2012 at 09:28:11AM -0500, Paul Gortmaker wrote:
>>>       
>>>> From: Ying Xue <ying.xue@windriver.com>
>>>>
>>>> The sk_receive_queue limit control is currently performed for
>>>> all arriving messages, disregarding socket and message type.
>>>> But for connected sockets this check is redundant, since the protocol
>>>> flow control already makes queue overflow impossible.
>>>>
>>>>         
>>> Can you explain where that occurs?  
>>>       
>> It happens in the functions port_dispatcher_sigh() and tipc_send(), 
>> among other places. Both are to be found in the file port.c, which 
>> was supposed to contain the 'generic' (i.e., API independent) part 
>> of the send/receive code.
>> Now that we have only one API left, the socket API, we are 
>> planning to merge the code in socket.c and port.c, and get rid of 
>> some code overhead.
>>
>> The flow control in TIPC is message based, where the sender
>> requires to receive an explicit acknowledge message for each 
>> 512 message the receiver reads to user space.
>> If the sender has more than 1024 messages outstanding without having
>> received an acknowledge he will be suspended or receive EAGAIN until 
>> he does.
>> The plan going forward is to replace this mechanism with a more 
>> standard looking byte based flow control, while maintaining 
>> backwards compatibility.
>>
>>     
> Ok, That makes more sense, thank you.  Although I still don't think this is
> safe (but the problem may not be solely introduced by this patch).  Using a
> global limit that assumes the sender will block when the congestion window is
> reached just doesn't seem sane to me.  It clearly works with the Linux
> implementation, as it conforms to your expectations, but an alternate
> implementation could create a DOS situation by simply ignoring the window limit,
> and continuing to send.  I see that we drop frames over the global limit in
> filter_rcv, but the check in rx_queue_full bumps up that limit based on the
> value of msg_importance(msg), but that threshold is ignored if the value of
> msg_importance is invalid.  All a sender needs to do is flood a receiver with
> frames containing an invalid set of message importance bits, and you will queue
> frames indefinately.  In fact that will also happen if you send message of
> CRITICAL importance as well, so you don't even need to supply an invalid value
> here.
>
>   

You are absolutely right. I will correct these drawbacks in next version.

>>> I see where the tipc dispatch function calls
>>> sk_add_backlog, which checks the per socket recieve queue (regardless of weather
>>> the receiving socket is connection oriented or connectionless), but if the
>>> receiver doesn't call receive very often, This just adds a check against your
>>> global limit, doing nothing for your per-socket limits. 
>>>       
>> OVERLOAD_LIMIT_BASE is tested against a per-socket message counter, so it _is_
>> our per-socket limit. In fact, TIPC connectionless overflow control currently 
>> is a kind of a hybrid, based on a message counter when the socket is not locked, 
>> and based on sk_rcv_queue's byte limit when a message has to be added to the 
>> backlog.
>> We are planning to fix this inconsistency too.
>>     
> Good, thank you,  that was seeming quite wrong to me.
>
>   
>>  In fact it seems to
>>     
>>> repeat the same check twice, as in the worst case of the incomming message being
>>> TIPC_LOW_IMPORTANCE, its just going to check that the global limit is exactly
>>> OVERLOAD_LIMIT_BASE/2 again.
>>>       
>> Yes, you are right. The intention is that only the first test, 
>> if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2)){..}
>> will be run for the vast majority of messages, since we must assume
>> that there is no overload most of the time.
>> An inelegant optimization, perhaps, but not logically wrong.
>>     
> No, not logically wrong, but not an optimization either.  With this change,
> your only use of rx_queue_full passes OVERLOAD_LIMIT_BASE/2 as the base value to
> rx_queue_full, and then you do some multiplication based on that.  If you really
> want to optimize this, leave OVERLOAD_LIMIT_BASE where it is (rather than
> doubling it like this patch series does), mark rx_queue_full as inline, and just
> pass OVERLOAD_LIMIT_BASE as the argument, it will save you a division opration,
> the conditional branch and a call instruction.  If you add a multiplication
> factor table, you can eliminate the if/else clauses in rx_queue_full as well.
>
>   

Good suggestion with a factor table. Maybe it's unnecessary to 
explicitly mark rx_queue_full as inline. Currently it sounds like we let 
complier decide whether a function is defined as inline or not.

Regards,
Ying

> Neil
>
>   
>> ///jon
>>
>>     
>>> Neil
>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>       
>>     
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>   

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 03/10] tipc: sk_recv_queue size check only for connectionless sockets
  2012-12-10  6:27         ` Ying Xue
@ 2012-12-10  8:46           ` Jon Maloy
  2012-12-10 14:22           ` Neil Horman
  1 sibling, 0 replies; 22+ messages in thread
From: Jon Maloy @ 2012-12-10  8:46 UTC (permalink / raw)
  To: Ying Xue; +Cc: Neil Horman, Paul Gortmaker, David Miller, netdev

On 12/10/2012 01:27 AM, Ying Xue wrote:
> Neil Horman wrote:
>> On Fri, Dec 07, 2012 at 05:30:11PM -0500, Jon Maloy wrote:
>>   
>>> On 12/07/2012 02:20 PM, Neil Horman wrote:
>>>     
>>>> On Fri, Dec 07, 2012 at 09:28:11AM -0500, Paul Gortmaker wrote:
>>>>       
>>>>> From: Ying Xue <ying.xue@windriver.com>
>>>>>
>>>>> The sk_receive_queue limit control is currently performed for
>>>>> all arriving messages, disregarding socket and message type.
>>>>> But for connected sockets this check is redundant, since the protocol
>>>>> flow control already makes queue overflow impossible.
>>>>>
>>>>>         
>>>> Can you explain where that occurs?  
>>>>       
>>> It happens in the functions port_dispatcher_sigh() and tipc_send(), 
>>> among other places. Both are to be found in the file port.c, which 
>>> was supposed to contain the 'generic' (i.e., API independent) part 
>>> of the send/receive code.
>>> Now that we have only one API left, the socket API, we are 
>>> planning to merge the code in socket.c and port.c, and get rid of 
>>> some code overhead.
>>>
>>> The flow control in TIPC is message based, where the sender
>>> requires to receive an explicit acknowledge message for each 
>>> 512 message the receiver reads to user space.
>>> If the sender has more than 1024 messages outstanding without having
>>> received an acknowledge he will be suspended or receive EAGAIN until 
>>> he does.
>>> The plan going forward is to replace this mechanism with a more 
>>> standard looking byte based flow control, while maintaining 
>>> backwards compatibility.
>>>
>>>     
>> Ok, That makes more sense, thank you.  Although I still don't think this is
>> safe (but the problem may not be solely introduced by this patch).  Using a
>> global limit that assumes the sender will block when the congestion window is
>> reached just doesn't seem sane to me.  It clearly works with the Linux
>> implementation, as it conforms to your expectations, but an alternate
>> implementation could create a DOS situation by simply ignoring the window limit,
>> and continuing to send.  I see that we drop frames over the global limit in
>> filter_rcv, but the check in rx_queue_full bumps up that limit based on the
>> value of msg_importance(msg), but that threshold is ignored if the value of
>> msg_importance is invalid.  All a sender needs to do is flood a receiver with
>> frames containing an invalid set of message importance bits, and you will queue
>> frames indefinately.  In fact that will also happen if you send message of
>> CRITICAL importance as well, so you don't even need to supply an invalid value
>> here.
>>
>>   
> 
> You are absolutely right. I will correct these drawbacks in next version.

I think we should rather just drop this patch. We introduce a major vulnerability,
as Neil correctly points out. We will anyway have to do a rework of this code.

> 
>>>> I see where the tipc dispatch function calls
>>>> sk_add_backlog, which checks the per socket recieve queue (regardless of weather
>>>> the receiving socket is connection oriented or connectionless), but if the
>>>> receiver doesn't call receive very often, This just adds a check against your
>>>> global limit, doing nothing for your per-socket limits. 
>>>>       
>>> OVERLOAD_LIMIT_BASE is tested against a per-socket message counter, so it _is_
>>> our per-socket limit. In fact, TIPC connectionless overflow control currently 
>>> is a kind of a hybrid, based on a message counter when the socket is not locked, 
>>> and based on sk_rcv_queue's byte limit when a message has to be added to the 
>>> backlog.
>>> We are planning to fix this inconsistency too.
>>>     
>> Good, thank you,  that was seeming quite wrong to me.
>>
>>   
>>>  In fact it seems to
>>>     
>>>> repeat the same check twice, as in the worst case of the incomming message being
>>>> TIPC_LOW_IMPORTANCE, its just going to check that the global limit is exactly
>>>> OVERLOAD_LIMIT_BASE/2 again.
>>>>       
>>> Yes, you are right. The intention is that only the first test, 
>>> if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2)){..}
>>> will be run for the vast majority of messages, since we must assume
>>> that there is no overload most of the time.
>>> An inelegant optimization, perhaps, but not logically wrong.
>>>     
>> No, not logically wrong, but not an optimization either.  With this change,
>> your only use of rx_queue_full passes OVERLOAD_LIMIT_BASE/2 as the base value to
>> rx_queue_full, and then you do some multiplication based on that.  

It is still in the "unlikely" (in fact, very unlikely) branch. And the multiplication
is by two, i.e. just a left-shift operation. Our approach was rather to let the 
compiler decide about inlining, which in this case might be a sub-optimization.

If you really
>> want to optimize this, leave OVERLOAD_LIMIT_BASE where it is (rather than
>> doubling it like this patch series does), mark rx_queue_full as inline, and just
>> pass OVERLOAD_LIMIT_BASE as the argument, it will save you a division opration,
>> the conditional branch and a call instruction.  If you add a multiplication
>> factor table, you can eliminate the if/else clauses in rx_queue_full as well.
>>
>>   
> 
> Good suggestion with a factor table. Maybe it's unnecessary to 
> explicitly mark rx_queue_full as inline. Currently it sounds like we let 
> complier decide whether a function is defined as inline or not.

One approach I had in mind was to just left-shift OVERLOAD_LIMIT_BASE with
message priority, and compare that to the per-socket counter. This way,
we obtain the limit set [10000,20000,30000,40000] without having to read
data memory. The limits will not be the same as now, but probably good
enough. We don't even need a separate function for this check.
Something we should look into when we move on to make this mechanism 
byte-based.

> 
> Regards,
> Ying
> 
>> Neil
>>
>>   
>>> ///jon
>>>
>>>     
>>>> Neil
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>       
>>>     
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>   
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 03/10] tipc: sk_recv_queue size check only for connectionless sockets
  2012-12-10  6:27         ` Ying Xue
  2012-12-10  8:46           ` Jon Maloy
@ 2012-12-10 14:22           ` Neil Horman
  1 sibling, 0 replies; 22+ messages in thread
From: Neil Horman @ 2012-12-10 14:22 UTC (permalink / raw)
  To: Ying Xue; +Cc: Jon Maloy, Paul Gortmaker, David Miller, netdev

On Mon, Dec 10, 2012 at 02:27:50PM +0800, Ying Xue wrote:
> Neil Horman wrote:
> >On Fri, Dec 07, 2012 at 05:30:11PM -0500, Jon Maloy wrote:
> >>On 12/07/2012 02:20 PM, Neil Horman wrote:
> >>>On Fri, Dec 07, 2012 at 09:28:11AM -0500, Paul Gortmaker wrote:
> >>>>From: Ying Xue <ying.xue@windriver.com>
> >>>>
> >>>>The sk_receive_queue limit control is currently performed for
> >>>>all arriving messages, disregarding socket and message type.
> >>>>But for connected sockets this check is redundant, since the protocol
> >>>>flow control already makes queue overflow impossible.
> >>>>
> >>>Can you explain where that occurs?
> >>It happens in the functions port_dispatcher_sigh() and
> >>tipc_send(), among other places. Both are to be found in the
> >>file port.c, which was supposed to contain the 'generic' (i.e.,
> >>API independent) part of the send/receive code.
> >>Now that we have only one API left, the socket API, we are
> >>planning to merge the code in socket.c and port.c, and get rid
> >>of some code overhead.
> >>
> >>The flow control in TIPC is message based, where the sender
> >>requires to receive an explicit acknowledge message for each 512
> >>message the receiver reads to user space.
> >>If the sender has more than 1024 messages outstanding without having
> >>received an acknowledge he will be suspended or receive EAGAIN
> >>until he does.
> >>The plan going forward is to replace this mechanism with a more
> >>standard looking byte based flow control, while maintaining
> >>backwards compatibility.
> >>
> >Ok, That makes more sense, thank you.  Although I still don't think this is
> >safe (but the problem may not be solely introduced by this patch).  Using a
> >global limit that assumes the sender will block when the congestion window is
> >reached just doesn't seem sane to me.  It clearly works with the Linux
> >implementation, as it conforms to your expectations, but an alternate
> >implementation could create a DOS situation by simply ignoring the window limit,
> >and continuing to send.  I see that we drop frames over the global limit in
> >filter_rcv, but the check in rx_queue_full bumps up that limit based on the
> >value of msg_importance(msg), but that threshold is ignored if the value of
> >msg_importance is invalid.  All a sender needs to do is flood a receiver with
> >frames containing an invalid set of message importance bits, and you will queue
> >frames indefinately.  In fact that will also happen if you send message of
> >CRITICAL importance as well, so you don't even need to supply an invalid value
> >here.
> >
> 
> You are absolutely right. I will correct these drawbacks in next version.
> 
> >>>I see where the tipc dispatch function calls
> >>>sk_add_backlog, which checks the per socket recieve queue (regardless of weather
> >>>the receiving socket is connection oriented or connectionless), but if the
> >>>receiver doesn't call receive very often, This just adds a check against your
> >>>global limit, doing nothing for your per-socket limits.
> >>OVERLOAD_LIMIT_BASE is tested against a per-socket message counter, so it _is_
> >>our per-socket limit. In fact, TIPC connectionless overflow
> >>control currently is a kind of a hybrid, based on a message
> >>counter when the socket is not locked, and based on
> >>sk_rcv_queue's byte limit when a message has to be added to the
> >>backlog.
> >>We are planning to fix this inconsistency too.
> >Good, thank you,  that was seeming quite wrong to me.
> >
> >> In fact it seems to
> >>>repeat the same check twice, as in the worst case of the incomming message being
> >>>TIPC_LOW_IMPORTANCE, its just going to check that the global limit is exactly
> >>>OVERLOAD_LIMIT_BASE/2 again.
> >>Yes, you are right. The intention is that only the first test,
> >>if (unlikely(recv_q_len >= (OVERLOAD_LIMIT_BASE / 2)){..}
> >>will be run for the vast majority of messages, since we must assume
> >>that there is no overload most of the time.
> >>An inelegant optimization, perhaps, but not logically wrong.
> >No, not logically wrong, but not an optimization either.  With this change,
> >your only use of rx_queue_full passes OVERLOAD_LIMIT_BASE/2 as the base value to
> >rx_queue_full, and then you do some multiplication based on that.  If you really
> >want to optimize this, leave OVERLOAD_LIMIT_BASE where it is (rather than
> >doubling it like this patch series does), mark rx_queue_full as inline, and just
> >pass OVERLOAD_LIMIT_BASE as the argument, it will save you a division opration,
> >the conditional branch and a call instruction.  If you add a multiplication
> >factor table, you can eliminate the if/else clauses in rx_queue_full as well.
> >
> 
> Good suggestion with a factor table. Maybe it's unnecessary to
> explicitly mark rx_queue_full as inline. Currently it sounds like we
> let complier decide whether a function is defined as inline or not.
> 
Thats correct, the compiler usually decides if something should be inlined
(unless you use the __always_inline) attribute.  In this case, given a single
call site, it most like will just inline anyway.  But if you're interested in
optimizing here, it might be worth taking the extra steps to make sure.  In
fact, since this is your only call site, it may be worthwhile to just remove the
function entirely, and manually inline the check.
Neil

> Regards,
> Ying
> 
> >Neil
> >
> >>///jon
> >>
> >>>Neil
> >>>
> >>>--
> >>>To unsubscribe from this list: send the line "unsubscribe netdev" in
> >>>the body of a message to majordomo@vger.kernel.org
> >>>More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >>>
> >--
> >To unsubscribe from this list: send the line "unsubscribe netdev" in
> >the body of a message to majordomo@vger.kernel.org
> >More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH net-next 04/10] tipc: change sk_receive_queue upper limit
  2012-12-07 14:28 [PATCH net-next 00/10] tipc: more updates for the v3.8 content Paul Gortmaker
                   ` (2 preceding siblings ...)
  2012-12-07 14:28 ` [PATCH net-next 03/10] tipc: sk_recv_queue size check only for connectionless sockets Paul Gortmaker
@ 2012-12-07 14:28 ` Paul Gortmaker
  2012-12-07 14:28 ` [PATCH net-next 05/10] tipc: standardize across connect/disconnect function naming Paul Gortmaker
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Paul Gortmaker @ 2012-12-07 14:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jon Maloy, Paul Gortmaker

From: Jon Maloy <jon.maloy@ericsson.com>

The sk_recv_queue upper limit for connectionless sockets has empirically
turned out to be too low. When we double the current limit we get much
fewer rejected messages and no noticable negative side-effects.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/socket.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 4d6a448..65a6bfc 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1,7 +1,7 @@
 /*
  * net/tipc/socket.c: TIPC socket API
  *
- * Copyright (c) 2001-2007, Ericsson AB
+ * Copyright (c) 2001-2007, 2012 Ericsson AB
  * Copyright (c) 2004-2008, 2010-2012, Wind River Systems
  * All rights reserved.
  *
@@ -43,7 +43,7 @@
 #define SS_LISTENING	-1	/* socket is listening */
 #define SS_READY	-2	/* socket is connectionless */
 
-#define OVERLOAD_LIMIT_BASE	5000
+#define OVERLOAD_LIMIT_BASE	10000
 #define CONN_TIMEOUT_DEFAULT	8000	/* default connect timeout = 8s */
 
 struct tipc_sock {
-- 
1.7.12.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 05/10] tipc: standardize across connect/disconnect function naming
  2012-12-07 14:28 [PATCH net-next 00/10] tipc: more updates for the v3.8 content Paul Gortmaker
                   ` (3 preceding siblings ...)
  2012-12-07 14:28 ` [PATCH net-next 04/10] tipc: change sk_receive_queue upper limit Paul Gortmaker
@ 2012-12-07 14:28 ` Paul Gortmaker
  2012-12-07 14:28 ` [PATCH net-next 06/10] tipc: consolidate connection-oriented message reception in one function Paul Gortmaker
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Paul Gortmaker @ 2012-12-07 14:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jon Maloy, Paul Gortmaker

Currently we have tipc_disconnect and tipc_disconnect_port.  It is
not clear from the names alone, what they do or how they differ.
It turns out that tipc_disconnect just deals with the port locking
and then calls tipc_disconnect_port which does all the work.

If we rename as follows: tipc_disconnect_port --> __tipc_disconnect
then we will be following typical linux convention, where:

   __tipc_disconnect: "raw" function that does all the work.

   tipc_disconnect: wrapper that deals with locking and then calls
		    the real core __tipc_disconnect function

With this, the difference is immediately evident, and locking
violations are more apt to be spotted by chance while working on,
or even just while reading the code.

On the connect side of things, we currently only have the single
"tipc_connect2port" function.  It does both the locking at enter/exit,
and the core of the work.  Pending changes will make it desireable to
have the connect be a two part locking wrapper + worker function,
just like the disconnect is already.

Here, we make the connect look just like the updated disconnect case,
for the above reason, and for consistency.  In the process, we also
get rid of the "2port" suffix that was on the original name, since
it adds no descriptive value.

On close examination, one might notice that the above connect
changes implicitly move the call to tipc_link_get_max_pkt() to be
within the scope of tipc_port_lock() protected region; when it was
not previously.  We don't see any issues with this, and it is in
keeping with __tipc_connect doing the work and tipc_connect just
handling the locking.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/port.c   | 32 +++++++++++++++++++++++---------
 net/tipc/port.h   |  6 ++++--
 net/tipc/socket.c |  6 +++---
 net/tipc/subscr.c |  2 +-
 4 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/net/tipc/port.c b/net/tipc/port.c
index 07c42fb..18098ca 100644
--- a/net/tipc/port.c
+++ b/net/tipc/port.c
@@ -726,7 +726,7 @@ static void port_dispatcher_sigh(void *dummy)
 				if (unlikely(!cb))
 					goto reject;
 				if (unlikely(!connected)) {
-					if (tipc_connect2port(dref, &orig))
+					if (tipc_connect(dref, &orig))
 						goto reject;
 				} else if (peer_invalid)
 					goto reject;
@@ -1036,15 +1036,30 @@ int tipc_withdraw(u32 ref, unsigned int scope, struct tipc_name_seq const *seq)
 	return res;
 }
 
-int tipc_connect2port(u32 ref, struct tipc_portid const *peer)
+int tipc_connect(u32 ref, struct tipc_portid const *peer)
 {
 	struct tipc_port *p_ptr;
-	struct tipc_msg *msg;
-	int res = -EINVAL;
+	int res;
 
 	p_ptr = tipc_port_lock(ref);
 	if (!p_ptr)
 		return -EINVAL;
+	res = __tipc_connect(ref, p_ptr, peer);
+	tipc_port_unlock(p_ptr);
+	return res;
+}
+
+/*
+ * __tipc_connect - connect to a remote peer
+ *
+ * Port must be locked.
+ */
+int __tipc_connect(u32 ref, struct tipc_port *p_ptr,
+			struct tipc_portid const *peer)
+{
+	struct tipc_msg *msg;
+	int res = -EINVAL;
+
 	if (p_ptr->published || p_ptr->connected)
 		goto exit;
 	if (!peer->ref)
@@ -1067,17 +1082,16 @@ int tipc_connect2port(u32 ref, struct tipc_portid const *peer)
 			  (net_ev_handler)port_handle_node_down);
 	res = 0;
 exit:
-	tipc_port_unlock(p_ptr);
 	p_ptr->max_pkt = tipc_link_get_max_pkt(peer->node, ref);
 	return res;
 }
 
-/**
- * tipc_disconnect_port - disconnect port from peer
+/*
+ * __tipc_disconnect - disconnect port from peer
  *
  * Port must be locked.
  */
-int tipc_disconnect_port(struct tipc_port *tp_ptr)
+int __tipc_disconnect(struct tipc_port *tp_ptr)
 {
 	int res;
 
@@ -1104,7 +1118,7 @@ int tipc_disconnect(u32 ref)
 	p_ptr = tipc_port_lock(ref);
 	if (!p_ptr)
 		return -EINVAL;
-	res = tipc_disconnect_port(p_ptr);
+	res = __tipc_disconnect(p_ptr);
 	tipc_port_unlock(p_ptr);
 	return res;
 }
diff --git a/net/tipc/port.h b/net/tipc/port.h
index 4660e30..fb66e2e 100644
--- a/net/tipc/port.h
+++ b/net/tipc/port.h
@@ -190,7 +190,7 @@ int tipc_publish(u32 portref, unsigned int scope,
 int tipc_withdraw(u32 portref, unsigned int scope,
 		struct tipc_name_seq const *name_seq);
 
-int tipc_connect2port(u32 portref, struct tipc_portid const *port);
+int tipc_connect(u32 portref, struct tipc_portid const *port);
 
 int tipc_disconnect(u32 portref);
 
@@ -200,7 +200,9 @@ int tipc_shutdown(u32 ref);
 /*
  * The following routines require that the port be locked on entry
  */
-int tipc_disconnect_port(struct tipc_port *tp_ptr);
+int __tipc_disconnect(struct tipc_port *tp_ptr);
+int __tipc_connect(u32 ref, struct tipc_port *p_ptr,
+		   struct tipc_portid const *peer);
 int tipc_port_peer_msg(struct tipc_port *p_ptr, struct tipc_msg *msg);
 
 /*
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 65a6bfc..4d56eae 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -791,7 +791,7 @@ static int auto_connect(struct socket *sock, struct tipc_msg *msg)
 
 	tsock->peer_name.ref = msg_origport(msg);
 	tsock->peer_name.node = msg_orignode(msg);
-	tipc_connect2port(tsock->p->ref, &tsock->peer_name);
+	tipc_connect(tsock->p->ref, &tsock->peer_name);
 	tipc_set_portimportance(tsock->p->ref, msg_importance(msg));
 	sock->state = SS_CONNECTED;
 	return 0;
@@ -1254,7 +1254,7 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
 	/* Initiate connection termination for an incoming 'FIN' */
 	if (unlikely(msg_errcode(msg) && (sock->state == SS_CONNECTED))) {
 		sock->state = SS_DISCONNECTING;
-		tipc_disconnect_port(tipc_sk_port(sk));
+		__tipc_disconnect(tipc_sk_port(sk));
 	}
 
 	sk->sk_data_ready(sk, 0);
@@ -1514,7 +1514,7 @@ static int accept(struct socket *sock, struct socket *new_sock, int flags)
 		/* Connect new socket to it's peer */
 		new_tsock->peer_name.ref = msg_origport(msg);
 		new_tsock->peer_name.node = msg_orignode(msg);
-		tipc_connect2port(new_ref, &new_tsock->peer_name);
+		tipc_connect(new_ref, &new_tsock->peer_name);
 		new_sock->state = SS_CONNECTED;
 
 		tipc_set_portimportance(new_ref, msg_importance(msg));
diff --git a/net/tipc/subscr.c b/net/tipc/subscr.c
index 0f7d0d0..6b42d47 100644
--- a/net/tipc/subscr.c
+++ b/net/tipc/subscr.c
@@ -462,7 +462,7 @@ static void subscr_named_msg_event(void *usr_handle,
 		kfree(subscriber);
 		return;
 	}
-	tipc_connect2port(subscriber->port_ref, orig);
+	tipc_connect(subscriber->port_ref, orig);
 
 	/* Lock server port (& save lock address for future use) */
 	subscriber->lock = tipc_port_lock(subscriber->port_ref)->lock;
-- 
1.7.12.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 06/10] tipc: consolidate connection-oriented message reception in one function
  2012-12-07 14:28 [PATCH net-next 00/10] tipc: more updates for the v3.8 content Paul Gortmaker
                   ` (4 preceding siblings ...)
  2012-12-07 14:28 ` [PATCH net-next 05/10] tipc: standardize across connect/disconnect function naming Paul Gortmaker
@ 2012-12-07 14:28 ` Paul Gortmaker
  2012-12-07 14:28 ` [PATCH net-next 07/10] tipc: introduce non-blocking socket connect Paul Gortmaker
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Paul Gortmaker @ 2012-12-07 14:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jon Maloy, Ying Xue, Paul Gortmaker

From: Ying Xue <ying.xue@windriver.com>

Handling of connection-related message reception is currently scattered
around at different places in the code. This makes it harder to verify
that things are handled correctly in all possible scenarios.
So we consolidate the existing processing of connection-oriented
message reception in a single routine.  In the process, we convert the
chain of if/else into a switch/case for improved readability.

A cast on the socket_state in the switch is needed to avoid compile
warnings on 32 bit, like "net/tipc/socket.c:1252:2: warning: case value
‘4294967295’ not in enumerated type".  This happens because existing
tipc code pseudo extends the default linux socket state values with:

	#define SS_LISTENING    -1      /* socket is listening */
	#define SS_READY        -2      /* socket is connectionless */

It may make sense to add these as _positive_ values to the existing
socket state enum list someday, vs. these already existing defines.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: add cast to fix warning; remove returns from middle of switch]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/socket.c | 75 +++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 51 insertions(+), 24 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 4d56eae..fc0aa4f 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1192,6 +1192,53 @@ static int rx_queue_full(struct tipc_msg *msg, u32 queue_size, u32 base)
 }
 
 /**
+ * filter_connect - Handle all incoming messages for a connection-based socket
+ * @tsock: TIPC socket
+ * @msg: message
+ *
+ * Returns TIPC error status code and socket error status code
+ * once it encounters some errors
+ */
+static u32 filter_connect(struct tipc_sock *tsock, struct sk_buff **buf)
+{
+	struct socket *sock = tsock->sk.sk_socket;
+	struct tipc_msg *msg = buf_msg(*buf);
+	u32 retval = TIPC_ERR_NO_PORT;
+
+	if (msg_mcast(msg))
+		return retval;
+
+	switch ((int)sock->state) {
+	case SS_CONNECTED:
+		/* Accept only connection-based messages sent by peer */
+		if (msg_connected(msg) && tipc_port_peer_msg(tsock->p, msg)) {
+			if (unlikely(msg_errcode(msg))) {
+				sock->state = SS_DISCONNECTING;
+				__tipc_disconnect(tsock->p);
+			}
+			retval = TIPC_OK;
+		}
+		break;
+	case SS_CONNECTING:
+		/* Accept only ACK or NACK message */
+		if (msg_connected(msg) || (msg_errcode(msg)))
+			retval = TIPC_OK;
+		break;
+	case SS_LISTENING:
+	case SS_UNCONNECTED:
+		/* Accept only SYN message */
+		if (!msg_connected(msg) && !(msg_errcode(msg)))
+			retval = TIPC_OK;
+		break;
+	case SS_DISCONNECTING:
+		break;
+	default:
+		pr_err("Unknown socket state %u\n", sock->state);
+	}
+	return retval;
+}
+
+/**
  * filter_rcv - validate incoming message
  * @sk: socket
  * @buf: message
@@ -1208,6 +1255,7 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
 	struct socket *sock = sk->sk_socket;
 	struct tipc_msg *msg = buf_msg(buf);
 	u32 recv_q_len;
+	u32 res = TIPC_OK;
 
 	/* Reject message if it is wrong sort of message for socket */
 	if (msg_type(msg) > TIPC_DIRECT_MSG)
@@ -1226,24 +1274,9 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
 				return TIPC_ERR_OVERLOAD;
 		}
 	} else {
-		if (msg_mcast(msg))
-			return TIPC_ERR_NO_PORT;
-		if (sock->state == SS_CONNECTED) {
-			if (!msg_connected(msg) ||
-			    !tipc_port_peer_msg(tipc_sk_port(sk), msg))
-				return TIPC_ERR_NO_PORT;
-		} else if (sock->state == SS_CONNECTING) {
-			if (!msg_connected(msg) && (msg_errcode(msg) == 0))
-				return TIPC_ERR_NO_PORT;
-		} else if (sock->state == SS_LISTENING) {
-			if (msg_connected(msg) || msg_errcode(msg))
-				return TIPC_ERR_NO_PORT;
-		} else if (sock->state == SS_DISCONNECTING) {
-			return TIPC_ERR_NO_PORT;
-		} else /* (sock->state == SS_UNCONNECTED) */ {
-			if (msg_connected(msg) || msg_errcode(msg))
-				return TIPC_ERR_NO_PORT;
-		}
+		res = filter_connect(tipc_sk(sk), &buf);
+		if (res != TIPC_OK || buf == NULL)
+			return res;
 	}
 
 	/* Enqueue message (finally!) */
@@ -1251,12 +1284,6 @@ static u32 filter_rcv(struct sock *sk, struct sk_buff *buf)
 	atomic_inc(&tipc_queue_size);
 	__skb_queue_tail(&sk->sk_receive_queue, buf);
 
-	/* Initiate connection termination for an incoming 'FIN' */
-	if (unlikely(msg_errcode(msg) && (sock->state == SS_CONNECTED))) {
-		sock->state = SS_DISCONNECTING;
-		__tipc_disconnect(tipc_sk_port(sk));
-	}
-
 	sk->sk_data_ready(sk, 0);
 	return TIPC_OK;
 }
-- 
1.7.12.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 07/10] tipc: introduce non-blocking socket connect
  2012-12-07 14:28 [PATCH net-next 00/10] tipc: more updates for the v3.8 content Paul Gortmaker
                   ` (5 preceding siblings ...)
  2012-12-07 14:28 ` [PATCH net-next 06/10] tipc: consolidate connection-oriented message reception in one function Paul Gortmaker
@ 2012-12-07 14:28 ` Paul Gortmaker
  2012-12-07 14:28 ` [PATCH net-next 08/10] tipc: eliminate connection setup for implied connect in recv_msg() Paul Gortmaker
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 22+ messages in thread
From: Paul Gortmaker @ 2012-12-07 14:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jon Maloy, Ying Xue, Paul Gortmaker

From: Ying Xue <ying.xue@windriver.com>

TIPC has so far only supported blocking connect(), meaning that a call
to connect() doesn't return until either the connection is fully
established, or an error occurs. This has proved insufficient for many
users, so we now introduce non-blocking connect(), analogous to how
this is done in TCP and other protocols.

With this feature, if a connection cannot be established instantly,
connect() will return the error code "-EINPROGRESS".
If the user later calls connect() again, he will either have the
return code "-EALREADY" or "-EISCONN", depending on whether the
connection has been established or not.

The user must have explicitly set the socket to be non-blocking
(SOCK_NONBLOCK or O_NONBLOCK, depending on method used), so unless
for some reason they had set this already (the socket would anyway
remain blocking in current TIPC) this change should be completely
backwards compatible.

It is also now possible to call select() or poll() to wait for the
completion of a connection.

An effect of the above is that the actual completion of a connection
may now be performed asynchronously, independent of the calls from
user space. Therefore, we now execute this code in BH context, in
the function filter_rcv(), which is executed upon reception of
messages in the socket.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
[PG: minor refactoring for improved connect/disconnect function names]
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/socket.c | 158 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 93 insertions(+), 65 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index fc0aa4f..1ba3b6f 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -783,16 +783,19 @@ exit:
 static int auto_connect(struct socket *sock, struct tipc_msg *msg)
 {
 	struct tipc_sock *tsock = tipc_sk(sock->sk);
-
-	if (msg_errcode(msg)) {
-		sock->state = SS_DISCONNECTING;
-		return -ECONNREFUSED;
-	}
+	struct tipc_port *p_ptr;
 
 	tsock->peer_name.ref = msg_origport(msg);
 	tsock->peer_name.node = msg_orignode(msg);
-	tipc_connect(tsock->p->ref, &tsock->peer_name);
-	tipc_set_portimportance(tsock->p->ref, msg_importance(msg));
+	p_ptr = tipc_port_deref(tsock->p->ref);
+	if (!p_ptr)
+		return -EINVAL;
+
+	__tipc_connect(tsock->p->ref, p_ptr, &tsock->peer_name);
+
+	if (msg_importance(msg) > TIPC_CRITICAL_IMPORTANCE)
+		return -EINVAL;
+	msg_set_importance(&p_ptr->phdr, (u32)msg_importance(msg));
 	sock->state = SS_CONNECTED;
 	return 0;
 }
@@ -1203,7 +1206,9 @@ static u32 filter_connect(struct tipc_sock *tsock, struct sk_buff **buf)
 {
 	struct socket *sock = tsock->sk.sk_socket;
 	struct tipc_msg *msg = buf_msg(*buf);
+	struct sock *sk = &tsock->sk;
 	u32 retval = TIPC_ERR_NO_PORT;
+	int res;
 
 	if (msg_mcast(msg))
 		return retval;
@@ -1221,8 +1226,36 @@ static u32 filter_connect(struct tipc_sock *tsock, struct sk_buff **buf)
 		break;
 	case SS_CONNECTING:
 		/* Accept only ACK or NACK message */
-		if (msg_connected(msg) || (msg_errcode(msg)))
+		if (unlikely(msg_errcode(msg))) {
+			sock->state = SS_DISCONNECTING;
+			sk->sk_err = -ECONNREFUSED;
+			retval = TIPC_OK;
+			break;
+		}
+
+		if (unlikely(!msg_connected(msg)))
+			break;
+
+		res = auto_connect(sock, msg);
+		if (res) {
+			sock->state = SS_DISCONNECTING;
+			sk->sk_err = res;
 			retval = TIPC_OK;
+			break;
+		}
+
+		/* If an incoming message is an 'ACK-', it should be
+		 * discarded here because it doesn't contain useful
+		 * data. In addition, we should try to wake up
+		 * connect() routine if sleeping.
+		 */
+		if (msg_data_sz(msg) == 0) {
+			kfree_skb(*buf);
+			*buf = NULL;
+			if (waitqueue_active(sk_sleep(sk)))
+				wake_up_interruptible(sk_sleep(sk));
+		}
+		retval = TIPC_OK;
 		break;
 	case SS_LISTENING:
 	case SS_UNCONNECTED:
@@ -1369,8 +1402,6 @@ static int connect(struct socket *sock, struct sockaddr *dest, int destlen,
 	struct sock *sk = sock->sk;
 	struct sockaddr_tipc *dst = (struct sockaddr_tipc *)dest;
 	struct msghdr m = {NULL,};
-	struct sk_buff *buf;
-	struct tipc_msg *msg;
 	unsigned int timeout;
 	int res;
 
@@ -1382,26 +1413,6 @@ static int connect(struct socket *sock, struct sockaddr *dest, int destlen,
 		goto exit;
 	}
 
-	/* For now, TIPC does not support the non-blocking form of connect() */
-	if (flags & O_NONBLOCK) {
-		res = -EOPNOTSUPP;
-		goto exit;
-	}
-
-	/* Issue Posix-compliant error code if socket is in the wrong state */
-	if (sock->state == SS_LISTENING) {
-		res = -EOPNOTSUPP;
-		goto exit;
-	}
-	if (sock->state == SS_CONNECTING) {
-		res = -EALREADY;
-		goto exit;
-	}
-	if (sock->state != SS_UNCONNECTED) {
-		res = -EISCONN;
-		goto exit;
-	}
-
 	/*
 	 * Reject connection attempt using multicast address
 	 *
@@ -1413,49 +1424,66 @@ static int connect(struct socket *sock, struct sockaddr *dest, int destlen,
 		goto exit;
 	}
 
-	/* Reject any messages already in receive queue (very unlikely) */
-	reject_rx_queue(sk);
+	timeout = (flags & O_NONBLOCK) ? 0 : tipc_sk(sk)->conn_timeout;
+
+	switch (sock->state) {
+	case SS_UNCONNECTED:
+		/* Send a 'SYN-' to destination */
+		m.msg_name = dest;
+		m.msg_namelen = destlen;
+
+		/* If connect is in non-blocking case, set MSG_DONTWAIT to
+		 * indicate send_msg() is never blocked.
+		 */
+		if (!timeout)
+			m.msg_flags = MSG_DONTWAIT;
+
+		res = send_msg(NULL, sock, &m, 0);
+		if ((res < 0) && (res != -EWOULDBLOCK))
+			goto exit;
 
-	/* Send a 'SYN-' to destination */
-	m.msg_name = dest;
-	m.msg_namelen = destlen;
-	res = send_msg(NULL, sock, &m, 0);
-	if (res < 0)
+		/* Just entered SS_CONNECTING state; the only
+		 * difference is that return value in non-blocking
+		 * case is EINPROGRESS, rather than EALREADY.
+		 */
+		res = -EINPROGRESS;
+		break;
+	case SS_CONNECTING:
+		res = -EALREADY;
+		break;
+	case SS_CONNECTED:
+		res = -EISCONN;
+		break;
+	default:
+		res = -EINVAL;
 		goto exit;
+	}
 
-	/* Wait until an 'ACK' or 'RST' arrives, or a timeout occurs */
-	timeout = tipc_sk(sk)->conn_timeout;
-	release_sock(sk);
-	res = wait_event_interruptible_timeout(*sk_sleep(sk),
-			(!skb_queue_empty(&sk->sk_receive_queue) ||
-			(sock->state != SS_CONNECTING)),
-			timeout ? (long)msecs_to_jiffies(timeout)
-				: MAX_SCHEDULE_TIMEOUT);
-	lock_sock(sk);
+	if (sock->state == SS_CONNECTING) {
+		if (!timeout)
+			goto exit;
 
-	if (res > 0) {
-		buf = skb_peek(&sk->sk_receive_queue);
-		if (buf != NULL) {
-			msg = buf_msg(buf);
-			res = auto_connect(sock, msg);
-			if (!res) {
-				if (!msg_data_sz(msg))
-					advance_rx_queue(sk);
-			}
-		} else {
-			if (sock->state == SS_CONNECTED)
-				res = -EISCONN;
+		/* Wait until an 'ACK' or 'RST' arrives, or a timeout occurs */
+		release_sock(sk);
+		res = wait_event_interruptible_timeout(*sk_sleep(sk),
+				sock->state != SS_CONNECTING,
+				timeout ? (long)msecs_to_jiffies(timeout)
+					: MAX_SCHEDULE_TIMEOUT);
+		lock_sock(sk);
+		if (res <= 0) {
+			if (res == 0)
+				res = -ETIMEDOUT;
 			else
-				res = -ECONNREFUSED;
+				; /* leave "res" unchanged */
+			goto exit;
 		}
-	} else {
-		if (res == 0)
-			res = -ETIMEDOUT;
-		else
-			; /* leave "res" unchanged */
-		sock->state = SS_DISCONNECTING;
 	}
 
+	if (unlikely(sock->state == SS_DISCONNECTING))
+		res = sock_error(sk);
+	else
+		res = 0;
+
 exit:
 	release_sock(sk);
 	return res;
-- 
1.7.12.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 08/10] tipc: eliminate connection setup for implied connect in recv_msg()
  2012-12-07 14:28 [PATCH net-next 00/10] tipc: more updates for the v3.8 content Paul Gortmaker
                   ` (6 preceding siblings ...)
  2012-12-07 14:28 ` [PATCH net-next 07/10] tipc: introduce non-blocking socket connect Paul Gortmaker
@ 2012-12-07 14:28 ` Paul Gortmaker
  2012-12-07 14:28 ` [PATCH net-next 09/10] tipc: add lock nesting notation to quiet lockdep warning Paul Gortmaker
  2012-12-07 14:28 ` [PATCH net-next 10/10] tipc: refactor accept() code for improved readability Paul Gortmaker
  9 siblings, 0 replies; 22+ messages in thread
From: Paul Gortmaker @ 2012-12-07 14:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jon Maloy, Ying Xue, Paul Gortmaker

From: Ying Xue <ying.xue@windriver.com>

As connection setup is now completed asynchronously in BH context,
in the function filter_connect(), the corresponding code in recv_msg()
becomes redundant.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/socket.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 1ba3b6f..0df42fa 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -954,13 +954,6 @@ restart:
 	sz = msg_data_sz(msg);
 	err = msg_errcode(msg);
 
-	/* Complete connection setup for an implied connect */
-	if (unlikely(sock->state == SS_CONNECTING)) {
-		res = auto_connect(sock, msg);
-		if (res)
-			goto exit;
-	}
-
 	/* Discard an empty non-errored message & try again */
 	if ((!sz) && (!err)) {
 		advance_rx_queue(sk);
-- 
1.7.12.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 09/10] tipc: add lock nesting notation to quiet lockdep warning
  2012-12-07 14:28 [PATCH net-next 00/10] tipc: more updates for the v3.8 content Paul Gortmaker
                   ` (7 preceding siblings ...)
  2012-12-07 14:28 ` [PATCH net-next 08/10] tipc: eliminate connection setup for implied connect in recv_msg() Paul Gortmaker
@ 2012-12-07 14:28 ` Paul Gortmaker
  2012-12-07 14:28 ` [PATCH net-next 10/10] tipc: refactor accept() code for improved readability Paul Gortmaker
  9 siblings, 0 replies; 22+ messages in thread
From: Paul Gortmaker @ 2012-12-07 14:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jon Maloy, Ying Xue, Paul Gortmaker

From: Ying Xue <ying.xue@windriver.com>

TIPC accept() call grabs the socket lock on a newly allocated
socket while holding the socket lock on an old socket. But lockdep
worries that this might be a recursive lock attempt:

  [ INFO: possible recursive locking detected ]
  ---------------------------------------------
  kworker/u:0/6 is trying to acquire lock:
  (sk_lock-AF_TIPC){+.+.+.}, at: [<c8c1226c>] accept+0x15c/0x310 [tipc]

  but task is already holding lock:
  (sk_lock-AF_TIPC){+.+.+.}, at: [<c8c12138>] accept+0x28/0x310 [tipc]

  other info that might help us debug this:
  Possible unsafe locking scenario:

          CPU0
          ----
          lock(sk_lock-AF_TIPC);
          lock(sk_lock-AF_TIPC);

          *** DEADLOCK ***

  May be due to missing lock nesting notation
  [...]

Tell lockdep that this locking is safe by using lock_sock_nested().
This is similar to what was done in commit 5131a184a3458d9 for
SCTP code ("SCTP: lock_sock_nested in sctp_sock_migrate").

Also note that this is isn't something that is seen normally,
as it was uncovered with some experimental work-in-progress
code not yet ready for mainline.  So no need for stable
backports or similar of this commit.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/socket.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 0df42fa..38613cf 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1551,7 +1551,8 @@ static int accept(struct socket *sock, struct socket *new_sock, int flags)
 		u32 new_ref = new_tport->ref;
 		struct tipc_msg *msg = buf_msg(buf);
 
-		lock_sock(new_sk);
+		/* we lock on new_sk; but lockdep sees the lock on sk */
+		lock_sock_nested(new_sk, SINGLE_DEPTH_NESTING);
 
 		/*
 		 * Reject any stray messages received by new socket
-- 
1.7.12.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH net-next 10/10] tipc: refactor accept() code for improved readability
  2012-12-07 14:28 [PATCH net-next 00/10] tipc: more updates for the v3.8 content Paul Gortmaker
                   ` (8 preceding siblings ...)
  2012-12-07 14:28 ` [PATCH net-next 09/10] tipc: add lock nesting notation to quiet lockdep warning Paul Gortmaker
@ 2012-12-07 14:28 ` Paul Gortmaker
  2012-12-07 19:42   ` Neil Horman
  9 siblings, 1 reply; 22+ messages in thread
From: Paul Gortmaker @ 2012-12-07 14:28 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Jon Maloy, Paul Gortmaker

In TIPC's accept() routine, there is a large block of code relating
to initialization of a new socket, all within an if condition checking
if the allocation succeeded.

Here, we simply factor out that init code within the accept() function
to its own separate function, which improves readability, and makes
it easier to track which lock_sock() calls are operating on existing
vs. new sockets.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
---
 net/tipc/socket.c | 93 +++++++++++++++++++++++++++++--------------------------
 1 file changed, 49 insertions(+), 44 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 38613cf..56661c8 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1507,6 +1507,53 @@ static int listen(struct socket *sock, int len)
 	return res;
 }
 
+static void tipc_init_socket(struct sock *sk, struct socket *new_sock,
+			     struct sk_buff *buf)
+{
+
+	struct sock *new_sk = new_sock->sk;
+	struct tipc_sock *new_tsock = tipc_sk(new_sk);
+	struct tipc_port *new_tport = new_tsock->p;
+	u32 new_ref = new_tport->ref;
+	struct tipc_msg *msg = buf_msg(buf);
+
+	/* we lock on new_sk; but lockdep sees accept's lock on sk */
+	lock_sock_nested(new_sk, SINGLE_DEPTH_NESTING);
+
+	/*
+	 * Reject any stray messages received by new socket
+	 * before the socket lock was taken (very, very unlikely)
+	 */
+	reject_rx_queue(new_sk);
+
+	/* Connect new socket to it's peer */
+	new_tsock->peer_name.ref = msg_origport(msg);
+	new_tsock->peer_name.node = msg_orignode(msg);
+	tipc_connect(new_ref, &new_tsock->peer_name);
+	new_sock->state = SS_CONNECTED;
+
+	tipc_set_portimportance(new_ref, msg_importance(msg));
+	if (msg_named(msg)) {
+		new_tport->conn_type = msg_nametype(msg);
+		new_tport->conn_instance = msg_nameinst(msg);
+	}
+
+	/*
+	 * Respond to 'SYN-' by discarding it & returning 'ACK'-.
+	 * Respond to 'SYN+' by queuing it on new socket.
+	 */
+	if (!msg_data_sz(msg)) {
+		struct msghdr m = {NULL,};
+
+		advance_rx_queue(sk);
+		send_packet(NULL, new_sock, &m, 0);
+	} else {
+		__skb_dequeue(&sk->sk_receive_queue);
+		__skb_queue_head(&new_sk->sk_receive_queue, buf);
+	}
+	release_sock(new_sk);
+}
+
 /**
  * accept - wait for connection request
  * @sock: listening socket
@@ -1542,51 +1589,9 @@ static int accept(struct socket *sock, struct socket *new_sock, int flags)
 	}
 
 	buf = skb_peek(&sk->sk_receive_queue);
-
 	res = tipc_create(sock_net(sock->sk), new_sock, 0, 0);
-	if (!res) {
-		struct sock *new_sk = new_sock->sk;
-		struct tipc_sock *new_tsock = tipc_sk(new_sk);
-		struct tipc_port *new_tport = new_tsock->p;
-		u32 new_ref = new_tport->ref;
-		struct tipc_msg *msg = buf_msg(buf);
-
-		/* we lock on new_sk; but lockdep sees the lock on sk */
-		lock_sock_nested(new_sk, SINGLE_DEPTH_NESTING);
-
-		/*
-		 * Reject any stray messages received by new socket
-		 * before the socket lock was taken (very, very unlikely)
-		 */
-		reject_rx_queue(new_sk);
-
-		/* Connect new socket to it's peer */
-		new_tsock->peer_name.ref = msg_origport(msg);
-		new_tsock->peer_name.node = msg_orignode(msg);
-		tipc_connect(new_ref, &new_tsock->peer_name);
-		new_sock->state = SS_CONNECTED;
-
-		tipc_set_portimportance(new_ref, msg_importance(msg));
-		if (msg_named(msg)) {
-			new_tport->conn_type = msg_nametype(msg);
-			new_tport->conn_instance = msg_nameinst(msg);
-		}
-
-		/*
-		 * Respond to 'SYN-' by discarding it & returning 'ACK'-.
-		 * Respond to 'SYN+' by queuing it on new socket.
-		 */
-		if (!msg_data_sz(msg)) {
-			struct msghdr m = {NULL,};
-
-			advance_rx_queue(sk);
-			send_packet(NULL, new_sock, &m, 0);
-		} else {
-			__skb_dequeue(&sk->sk_receive_queue);
-			__skb_queue_head(&new_sk->sk_receive_queue, buf);
-		}
-		release_sock(new_sk);
-	}
+	if (!res)
+		tipc_init_socket(sk, new_sock, buf);
 exit:
 	release_sock(sk);
 	return res;
-- 
1.7.12.1

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 10/10] tipc: refactor accept() code for improved readability
  2012-12-07 14:28 ` [PATCH net-next 10/10] tipc: refactor accept() code for improved readability Paul Gortmaker
@ 2012-12-07 19:42   ` Neil Horman
  2012-12-07 19:56     ` Paul Gortmaker
  0 siblings, 1 reply; 22+ messages in thread
From: Neil Horman @ 2012-12-07 19:42 UTC (permalink / raw)
  To: Paul Gortmaker; +Cc: David Miller, netdev, Jon Maloy

On Fri, Dec 07, 2012 at 09:28:18AM -0500, Paul Gortmaker wrote:
> In TIPC's accept() routine, there is a large block of code relating
> to initialization of a new socket, all within an if condition checking
> if the allocation succeeded.
> 
> Here, we simply factor out that init code within the accept() function
> to its own separate function, which improves readability, and makes
> it easier to track which lock_sock() calls are operating on existing
> vs. new sockets.
> 
> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> ---
>  net/tipc/socket.c | 93 +++++++++++++++++++++++++++++--------------------------
>  1 file changed, 49 insertions(+), 44 deletions(-)
> 
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index 38613cf..56661c8 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -1507,6 +1507,53 @@ static int listen(struct socket *sock, int len)
>  	return res;
>  }
>  
> +static void tipc_init_socket(struct sock *sk, struct socket *new_sock,
> +			     struct sk_buff *buf)
> +{
Can you rename this to something more specific to its purpose?  tipc_init_socket
makes me wonder why you're not calling it internally from tipc_create.  This
seems more like a tipc_init_accept_sock, or some such.  Alternatively, since
you're ony using it from your accept call, you might consider not factoring it
out at all, and just reversing the logic in your accept function so that you do:

 res = tipc_create(...)
 if (res)
	goto exit;
 <rest of tipc_init_socket goes here>

That gives you the same level of readability, without the additional potential
call instruction, plus you put the unlikely case inside the if conditional.

Neil

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH net-next 10/10] tipc: refactor accept() code for improved readability
  2012-12-07 19:42   ` Neil Horman
@ 2012-12-07 19:56     ` Paul Gortmaker
  0 siblings, 0 replies; 22+ messages in thread
From: Paul Gortmaker @ 2012-12-07 19:56 UTC (permalink / raw)
  To: Neil Horman; +Cc: David Miller, netdev, Jon Maloy

[Re: [PATCH net-next 10/10] tipc: refactor accept() code for improved readability] On 07/12/2012 (Fri 14:42) Neil Horman wrote:

> On Fri, Dec 07, 2012 at 09:28:18AM -0500, Paul Gortmaker wrote:
> > In TIPC's accept() routine, there is a large block of code relating
> > to initialization of a new socket, all within an if condition checking
> > if the allocation succeeded.
> > 
> > Here, we simply factor out that init code within the accept() function
> > to its own separate function, which improves readability, and makes
> > it easier to track which lock_sock() calls are operating on existing
> > vs. new sockets.
> > 
> > Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
> > ---
> >  net/tipc/socket.c | 93 +++++++++++++++++++++++++++++--------------------------
> >  1 file changed, 49 insertions(+), 44 deletions(-)
> > 
> > diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> > index 38613cf..56661c8 100644
> > --- a/net/tipc/socket.c
> > +++ b/net/tipc/socket.c
> > @@ -1507,6 +1507,53 @@ static int listen(struct socket *sock, int len)
> >  	return res;
> >  }
> >  
> > +static void tipc_init_socket(struct sock *sk, struct socket *new_sock,
> > +			     struct sk_buff *buf)
> > +{
> Can you rename this to something more specific to its purpose?  tipc_init_socket
> makes me wonder why you're not calling it internally from tipc_create.  This
> seems more like a tipc_init_accept_sock, or some such.  Alternatively, since
> you're ony using it from your accept call, you might consider not factoring it
> out at all, and just reversing the logic in your accept function so that you do:
> 
>  res = tipc_create(...)
>  if (res)
> 	goto exit;
>  <rest of tipc_init_socket goes here>
> 
> That gives you the same level of readability, without the additional potential
> call instruction, plus you put the unlikely case inside the if conditional.

I'm inclined to do the latter and just flip the sense of the "if" since
I already scratched my head trying to think of a good name (and
apparently failed in the end).

Thanks,
Paul.

> 
> Neil
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2012-12-10 14:23 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-07 14:28 [PATCH net-next 00/10] tipc: more updates for the v3.8 content Paul Gortmaker
2012-12-07 14:28 ` [PATCH net-next 01/10] tipc: remove obsolete flush of stale reassembly buffer Paul Gortmaker
2012-12-07 14:28 ` [PATCH net-next 02/10] tipc: eliminate aggregate sk_receive_queue limit Paul Gortmaker
2012-12-07 16:07   ` Neil Horman
2012-12-07 17:36     ` David Miller
2012-12-07 19:54       ` Paul Gortmaker
2012-12-07 14:28 ` [PATCH net-next 03/10] tipc: sk_recv_queue size check only for connectionless sockets Paul Gortmaker
2012-12-07 19:20   ` Neil Horman
2012-12-07 22:30     ` Jon Maloy
2012-12-09 16:50       ` Neil Horman
2012-12-10  6:27         ` Ying Xue
2012-12-10  8:46           ` Jon Maloy
2012-12-10 14:22           ` Neil Horman
2012-12-07 14:28 ` [PATCH net-next 04/10] tipc: change sk_receive_queue upper limit Paul Gortmaker
2012-12-07 14:28 ` [PATCH net-next 05/10] tipc: standardize across connect/disconnect function naming Paul Gortmaker
2012-12-07 14:28 ` [PATCH net-next 06/10] tipc: consolidate connection-oriented message reception in one function Paul Gortmaker
2012-12-07 14:28 ` [PATCH net-next 07/10] tipc: introduce non-blocking socket connect Paul Gortmaker
2012-12-07 14:28 ` [PATCH net-next 08/10] tipc: eliminate connection setup for implied connect in recv_msg() Paul Gortmaker
2012-12-07 14:28 ` [PATCH net-next 09/10] tipc: add lock nesting notation to quiet lockdep warning Paul Gortmaker
2012-12-07 14:28 ` [PATCH net-next 10/10] tipc: refactor accept() code for improved readability Paul Gortmaker
2012-12-07 19:42   ` Neil Horman
2012-12-07 19:56     ` Paul Gortmaker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).