netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [net-next 0/5] tipc: make connection setup more robust
@ 2018-09-28 18:23 Jon Maloy
  2018-09-28 18:23 ` [net-next 1/5] tipc: refactor function tipc_msg_reverse() Jon Maloy
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: Jon Maloy @ 2018-09-28 18:23 UTC (permalink / raw)
  To: davem, netdev
  Cc: gordan.mihaljevic, tung.q.nguyen, hoang.h.le, jon.maloy,
	canh.d.luu, ying.xue, tipc-discussion

In this series we make a few improvements to the connection setup and
probing mechanism, culminating in the last commit where we make it 
possible for a client socket to make multiple setup attempts in case
it encounters receive buffer overflow at the listener socket.

Jon Maloy (4):
  tipc: refactor function tipc_msg_reverse()
  tipc: refactor function tipc_sk_timeout()
  tipc: refactor function tipc_sk_filter_connect()
  tipc: add SYN bit to connection setup messages

Tung Nguyen (1):
  tipc: buffer overflow handling in listener socket

 net/tipc/msg.c    |  78 ++++++++++++--------
 net/tipc/msg.h    |  11 +++
 net/tipc/node.h   |  12 ++--
 net/tipc/socket.c | 207 ++++++++++++++++++++++++++++++++----------------------
 4 files changed, 190 insertions(+), 118 deletions(-)

-- 
2.1.4

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [net-next 1/5] tipc: refactor function tipc_msg_reverse()
  2018-09-28 18:23 [net-next 0/5] tipc: make connection setup more robust Jon Maloy
@ 2018-09-28 18:23 ` Jon Maloy
  2018-09-28 18:23 ` [net-next 2/5] tipc: refactor function tipc_sk_timeout() Jon Maloy
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Jon Maloy @ 2018-09-28 18:23 UTC (permalink / raw)
  To: davem, netdev
  Cc: gordan.mihaljevic, tung.q.nguyen, hoang.h.le, jon.maloy,
	canh.d.luu, ying.xue, tipc-discussion

The function tipc_msg_reverse() is reversing the header of a message
while reusing the original buffer. We have seen at several occasions
that this may have unfortunate side effects when the buffer to be
reversed is a clone.

In one of the following commits we will again need to reverse cloned
buffers, so this is the right time to permanently eliminate this
problem. In this commit we let the said function always consume the
original buffer and replace it with a new one when applicable.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/msg.c | 58 ++++++++++++++++++++++++++++------------------------------
 1 file changed, 28 insertions(+), 30 deletions(-)

diff --git a/net/tipc/msg.c b/net/tipc/msg.c
index b618910..00fbb5c 100644
--- a/net/tipc/msg.c
+++ b/net/tipc/msg.c
@@ -499,54 +499,52 @@ bool tipc_msg_make_bundle(struct sk_buff **skb,  struct tipc_msg *msg,
 /**
  * tipc_msg_reverse(): swap source and destination addresses and add error code
  * @own_node: originating node id for reversed message
- * @skb:  buffer containing message to be reversed; may be replaced.
+ * @skb:  buffer containing message to be reversed; will be consumed
  * @err:  error code to be set in message, if any
- * Consumes buffer at failure
+ * Replaces consumed buffer with new one when successful
  * Returns true if success, otherwise false
  */
 bool tipc_msg_reverse(u32 own_node,  struct sk_buff **skb, int err)
 {
 	struct sk_buff *_skb = *skb;
-	struct tipc_msg *hdr;
-	struct tipc_msg ohdr;
-	int dlen;
+	struct tipc_msg *_hdr, *hdr;
+	int hlen, dlen;
 
 	if (skb_linearize(_skb))
 		goto exit;
-	hdr = buf_msg(_skb);
-	dlen = min_t(uint, msg_data_sz(hdr), MAX_FORWARD_SIZE);
-	if (msg_dest_droppable(hdr))
+	_hdr = buf_msg(_skb);
+	dlen = min_t(uint, msg_data_sz(_hdr), MAX_FORWARD_SIZE);
+	hlen = msg_hdr_sz(_hdr);
+
+	if (msg_dest_droppable(_hdr))
 		goto exit;
-	if (msg_errcode(hdr))
+	if (msg_errcode(_hdr))
 		goto exit;
 
-	/* Take a copy of original header before altering message */
-	memcpy(&ohdr, hdr, msg_hdr_sz(hdr));
-
-	/* Never return SHORT header; expand by replacing buffer if necessary */
-	if (msg_short(hdr)) {
-		*skb = tipc_buf_acquire(BASIC_H_SIZE + dlen, GFP_ATOMIC);
-		if (!*skb)
-			goto exit;
-		memcpy((*skb)->data + BASIC_H_SIZE, msg_data(hdr), dlen);
-		kfree_skb(_skb);
-		_skb = *skb;
-		hdr = buf_msg(_skb);
-		memcpy(hdr, &ohdr, BASIC_H_SIZE);
-		msg_set_hdr_sz(hdr, BASIC_H_SIZE);
-	}
+	/* Never return SHORT header */
+	if (hlen == SHORT_H_SIZE)
+		hlen = BASIC_H_SIZE;
 
-	/* Now reverse the concerned fields */
+	/* Allocate new buffer to return */
+	*skb = tipc_buf_acquire(hlen + dlen, GFP_ATOMIC);
+	if (!*skb)
+		goto exit;
+	memcpy((*skb)->data, _skb->data, msg_hdr_sz(_hdr));
+	memcpy((*skb)->data + hlen, msg_data(_hdr), dlen);
+
+	/* Build reverse header in new buffer */
+	hdr = buf_msg(*skb);
+	msg_set_hdr_sz(hdr, hlen);
 	msg_set_errcode(hdr, err);
 	msg_set_non_seq(hdr, 0);
-	msg_set_origport(hdr, msg_destport(&ohdr));
-	msg_set_destport(hdr, msg_origport(&ohdr));
-	msg_set_destnode(hdr, msg_prevnode(&ohdr));
+	msg_set_origport(hdr, msg_destport(_hdr));
+	msg_set_destport(hdr, msg_origport(_hdr));
+	msg_set_destnode(hdr, msg_prevnode(_hdr));
 	msg_set_prevnode(hdr, own_node);
 	msg_set_orignode(hdr, own_node);
-	msg_set_size(hdr, msg_hdr_sz(hdr) + dlen);
-	skb_trim(_skb, msg_size(hdr));
+	msg_set_size(hdr, hlen + dlen);
 	skb_orphan(_skb);
+	kfree_skb(_skb);
 	return true;
 exit:
 	kfree_skb(_skb);
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [net-next 2/5] tipc: refactor function tipc_sk_timeout()
  2018-09-28 18:23 [net-next 0/5] tipc: make connection setup more robust Jon Maloy
  2018-09-28 18:23 ` [net-next 1/5] tipc: refactor function tipc_msg_reverse() Jon Maloy
@ 2018-09-28 18:23 ` Jon Maloy
  2018-09-28 18:23 ` [net-next 3/5] tipc: refactor function tipc_sk_filter_connect() Jon Maloy
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Jon Maloy @ 2018-09-28 18:23 UTC (permalink / raw)
  To: davem, netdev
  Cc: gordan.mihaljevic, tung.q.nguyen, hoang.h.le, jon.maloy,
	canh.d.luu, ying.xue, tipc-discussion

We refactor this function as a preparation for the coming commits in
the same series.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/socket.c | 62 ++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 38 insertions(+), 24 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 3f03ddd..8152a81 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2545,43 +2545,57 @@ static int tipc_shutdown(struct socket *sock, int how)
 	return res;
 }
 
+static void tipc_sk_check_probing_state(struct sock *sk,
+					struct sk_buff_head *list)
+{
+	struct tipc_sock *tsk = tipc_sk(sk);
+	u32 pnode = tsk_peer_node(tsk);
+	u32 pport = tsk_peer_port(tsk);
+	u32 self = tsk_own_node(tsk);
+	u32 oport = tsk->portid;
+	struct sk_buff *skb;
+
+	if (tsk->probe_unacked) {
+		tipc_set_sk_state(sk, TIPC_DISCONNECTING);
+		sk->sk_err = ECONNABORTED;
+		tipc_node_remove_conn(sock_net(sk), pnode, pport);
+		sk->sk_state_change(sk);
+		return;
+	}
+	/* Prepare new probe */
+	skb = tipc_msg_create(CONN_MANAGER, CONN_PROBE, INT_H_SIZE, 0,
+			      pnode, self, pport, oport, TIPC_OK);
+	if (skb)
+		__skb_queue_tail(list, skb);
+	tsk->probe_unacked = true;
+	sk_reset_timer(sk, &sk->sk_timer, jiffies + CONN_PROBING_INTV);
+}
+
 static void tipc_sk_timeout(struct timer_list *t)
 {
 	struct sock *sk = from_timer(sk, t, sk_timer);
 	struct tipc_sock *tsk = tipc_sk(sk);
-	u32 peer_port = tsk_peer_port(tsk);
-	u32 peer_node = tsk_peer_node(tsk);
-	u32 own_node = tsk_own_node(tsk);
-	u32 own_port = tsk->portid;
-	struct net *net = sock_net(sk);
-	struct sk_buff *skb = NULL;
+	u32 pnode = tsk_peer_node(tsk);
+	struct sk_buff_head list;
 
+	skb_queue_head_init(&list);
 	bh_lock_sock(sk);
-	if (!tipc_sk_connected(sk))
-		goto exit;
 
 	/* Try again later if socket is busy */
 	if (sock_owned_by_user(sk)) {
 		sk_reset_timer(sk, &sk->sk_timer, jiffies + HZ / 20);
-		goto exit;
+		bh_unlock_sock(sk);
+		return;
 	}
 
-	if (tsk->probe_unacked) {
-		tipc_set_sk_state(sk, TIPC_DISCONNECTING);
-		tipc_node_remove_conn(net, peer_node, peer_port);
-		sk->sk_state_change(sk);
-		goto exit;
-	}
-	/* Send new probe */
-	skb = tipc_msg_create(CONN_MANAGER, CONN_PROBE, INT_H_SIZE, 0,
-			      peer_node, own_node, peer_port, own_port,
-			      TIPC_OK);
-	tsk->probe_unacked = true;
-	sk_reset_timer(sk, &sk->sk_timer, jiffies + CONN_PROBING_INTV);
-exit:
+	if (sk->sk_state == TIPC_ESTABLISHED)
+		tipc_sk_check_probing_state(sk, &list);
+
 	bh_unlock_sock(sk);
-	if (skb)
-		tipc_node_xmit_skb(net, skb, peer_node, own_port);
+
+	if (!skb_queue_empty(&list))
+		tipc_node_xmit(sock_net(sk), &list, pnode, tsk->portid);
+
 	sock_put(sk);
 }
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [net-next 3/5] tipc: refactor function tipc_sk_filter_connect()
  2018-09-28 18:23 [net-next 0/5] tipc: make connection setup more robust Jon Maloy
  2018-09-28 18:23 ` [net-next 1/5] tipc: refactor function tipc_msg_reverse() Jon Maloy
  2018-09-28 18:23 ` [net-next 2/5] tipc: refactor function tipc_sk_timeout() Jon Maloy
@ 2018-09-28 18:23 ` Jon Maloy
  2018-09-28 18:23 ` [net-next 4/5] tipc: add SYN bit to connection setup messages Jon Maloy
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Jon Maloy @ 2018-09-28 18:23 UTC (permalink / raw)
  To: davem, netdev
  Cc: gordan.mihaljevic, tung.q.nguyen, hoang.h.le, jon.maloy,
	canh.d.luu, ying.xue, tipc-discussion

We refactor the function tipc_sk_filter_connect(), both to make it
more readable and as a preparation for the next commit.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/socket.c | 101 +++++++++++++++++++++++-------------------------------
 1 file changed, 43 insertions(+), 58 deletions(-)

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 8152a81..094cfdc 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1959,91 +1959,76 @@ static void tipc_sk_proto_rcv(struct sock *sk,
 }
 
 /**
- * tipc_filter_connect - Handle incoming message for a connection-based socket
+ * tipc_sk_filter_connect - check incoming message for a connection-based socket
  * @tsk: TIPC socket
- * @skb: pointer to message buffer. Set to NULL if buffer is consumed
- *
- * Returns true if everything ok, false otherwise
+ * @skb: pointer to message buffer.
+ * Returns true if message should be added to receive queue, false otherwise
  */
 static bool tipc_sk_filter_connect(struct tipc_sock *tsk, struct sk_buff *skb)
 {
 	struct sock *sk = &tsk->sk;
 	struct net *net = sock_net(sk);
 	struct tipc_msg *hdr = buf_msg(skb);
-	u32 pport = msg_origport(hdr);
-	u32 pnode = msg_orignode(hdr);
+	bool con_msg = msg_connected(hdr);
+	u32 pport = tsk_peer_port(tsk);
+	u32 pnode = tsk_peer_node(tsk);
+	u32 oport = msg_origport(hdr);
+	u32 onode = msg_orignode(hdr);
+	int err = msg_errcode(hdr);
 
 	if (unlikely(msg_mcast(hdr)))
 		return false;
 
 	switch (sk->sk_state) {
 	case TIPC_CONNECTING:
-		/* Accept only ACK or NACK message */
-		if (unlikely(!msg_connected(hdr))) {
-			if (pport != tsk_peer_port(tsk) ||
-			    pnode != tsk_peer_node(tsk))
-				return false;
-
-			tipc_set_sk_state(sk, TIPC_DISCONNECTING);
-			sk->sk_err = ECONNREFUSED;
-			sk->sk_state_change(sk);
-			return true;
-		}
-
-		if (unlikely(msg_errcode(hdr))) {
-			tipc_set_sk_state(sk, TIPC_DISCONNECTING);
-			sk->sk_err = ECONNREFUSED;
-			sk->sk_state_change(sk);
-			return true;
-		}
-
-		if (unlikely(!msg_isdata(hdr))) {
-			tipc_set_sk_state(sk, TIPC_DISCONNECTING);
-			sk->sk_err = EINVAL;
-			sk->sk_state_change(sk);
-			return true;
+		/* Setup ACK */
+		if (likely(con_msg)) {
+			if (err)
+				break;
+			tipc_sk_finish_conn(tsk, oport, onode);
+			msg_set_importance(&tsk->phdr, msg_importance(hdr));
+			/* ACK+ message with data is added to receive queue */
+			if (msg_data_sz(hdr))
+				return true;
+			/* Empty ACK-, - wake up sleeping connect() and drop */
+			sk->sk_data_ready(sk);
+			msg_set_dest_droppable(hdr, 1);
+			return false;
 		}
+		/* Ignore connectionless message if not from listening socket */
+		if (oport != pport || onode != pnode)
+			return false;
 
-		tipc_sk_finish_conn(tsk, msg_origport(hdr), msg_orignode(hdr));
-		msg_set_importance(&tsk->phdr, msg_importance(hdr));
-
-		/* If 'ACK+' message, add to socket receive queue */
-		if (msg_data_sz(hdr))
-			return true;
-
-		/* If empty 'ACK-' message, wake up sleeping connect() */
-		sk->sk_data_ready(sk);
-
-		/* 'ACK-' message is neither accepted nor rejected: */
-		msg_set_dest_droppable(hdr, 1);
-		return false;
-
+		/* Rejected SYN - abort */
+		break;
 	case TIPC_OPEN:
 	case TIPC_DISCONNECTING:
-		break;
+		return false;
 	case TIPC_LISTEN:
 		/* Accept only SYN message */
-		if (!msg_connected(hdr) && !(msg_errcode(hdr)))
+		if (!con_msg && !err)
 			return true;
-		break;
+		return false;
 	case TIPC_ESTABLISHED:
 		/* Accept only connection-based messages sent by peer */
-		if (unlikely(!tsk_peer_msg(tsk, hdr)))
+		if (likely(con_msg && !err && pport == oport && pnode == onode))
+			return true;
+		if (!tsk_peer_msg(tsk, hdr))
 			return false;
-
-		if (unlikely(msg_errcode(hdr))) {
-			tipc_set_sk_state(sk, TIPC_DISCONNECTING);
-			/* Let timer expire on it's own */
-			tipc_node_remove_conn(net, tsk_peer_node(tsk),
-					      tsk->portid);
-			sk->sk_state_change(sk);
-		}
+		if (!err)
+			return true;
+		tipc_set_sk_state(sk, TIPC_DISCONNECTING);
+		tipc_node_remove_conn(net, pnode, tsk->portid);
+		sk->sk_state_change(sk);
 		return true;
 	default:
 		pr_err("Unknown sk_state %u\n", sk->sk_state);
 	}
-
-	return false;
+	/* Abort connection setup attempt */
+	tipc_set_sk_state(sk, TIPC_DISCONNECTING);
+	sk->sk_err = ECONNREFUSED;
+	sk->sk_state_change(sk);
+	return true;
 }
 
 /**
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [net-next 4/5] tipc: add SYN bit to connection setup messages
  2018-09-28 18:23 [net-next 0/5] tipc: make connection setup more robust Jon Maloy
                   ` (2 preceding siblings ...)
  2018-09-28 18:23 ` [net-next 3/5] tipc: refactor function tipc_sk_filter_connect() Jon Maloy
@ 2018-09-28 18:23 ` Jon Maloy
  2018-09-28 18:23 ` [net-next 5/5] tipc: buffer overflow handling in listener socket Jon Maloy
  2018-09-29 18:25 ` [net-next 0/5] tipc: make connection setup more robust David Miller
  5 siblings, 0 replies; 7+ messages in thread
From: Jon Maloy @ 2018-09-28 18:23 UTC (permalink / raw)
  To: davem, netdev
  Cc: gordan.mihaljevic, tung.q.nguyen, hoang.h.le, jon.maloy,
	canh.d.luu, ying.xue, tipc-discussion

Messages intended for intitating a connection are currently
indistinguishable from regular datagram messages. The TIPC
protocol specification defines bit 17 in word 0 as a SYN bit
to allow sanity check of such messages in the listening socket,
but this has so far never been implemented.

We do that in this commit.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/msg.h    | 10 ++++++++++
 net/tipc/node.h   | 12 +++++++-----
 net/tipc/socket.c |  5 +++++
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/net/tipc/msg.h b/net/tipc/msg.h
index a4e944d..cd64d7b 100644
--- a/net/tipc/msg.h
+++ b/net/tipc/msg.h
@@ -216,6 +216,16 @@ static inline void msg_set_non_seq(struct tipc_msg *m, u32 n)
 	msg_set_bits(m, 0, 20, 1, n);
 }
 
+static inline int msg_is_syn(struct tipc_msg *m)
+{
+	return msg_bits(m, 0, 17, 1);
+}
+
+static inline void msg_set_syn(struct tipc_msg *m, u32 d)
+{
+	msg_set_bits(m, 0, 17, 1, d);
+}
+
 static inline int msg_dest_droppable(struct tipc_msg *m)
 {
 	return msg_bits(m, 0, 19, 1);
diff --git a/net/tipc/node.h b/net/tipc/node.h
index 48b3298..03f5efb 100644
--- a/net/tipc/node.h
+++ b/net/tipc/node.h
@@ -45,6 +45,7 @@
 /* Optional capabilities supported by this code version
  */
 enum {
+	TIPC_SYN_BIT          = (1),
 	TIPC_BCAST_SYNCH      = (1 << 1),
 	TIPC_BCAST_STATE_NACK = (1 << 2),
 	TIPC_BLOCK_FLOWCTL    = (1 << 3),
@@ -53,11 +54,12 @@ enum {
 	TIPC_LINK_PROTO_SEQNO = (1 << 6)
 };
 
-#define TIPC_NODE_CAPABILITIES (TIPC_BCAST_SYNCH       |  \
-				TIPC_BCAST_STATE_NACK  |  \
-				TIPC_BCAST_RCAST       |  \
-				TIPC_BLOCK_FLOWCTL     |  \
-				TIPC_NODE_ID128        |  \
+#define TIPC_NODE_CAPABILITIES (TIPC_SYN_BIT           |  \
+				TIPC_BCAST_SYNCH       |   \
+				TIPC_BCAST_STATE_NACK  |   \
+				TIPC_BCAST_RCAST       |   \
+				TIPC_BLOCK_FLOWCTL     |   \
+				TIPC_NODE_ID128        |   \
 				TIPC_LINK_PROTO_SEQNO)
 #define INVALID_BEARER_ID -1
 
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 094cfdc..89d6dc0 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -1319,6 +1319,7 @@ static int __tipc_sendmsg(struct socket *sock, struct msghdr *m, size_t dlen)
 			tsk->conn_type = dest->addr.name.name.type;
 			tsk->conn_instance = dest->addr.name.name.instance;
 		}
+		msg_set_syn(hdr, 1);
 	}
 
 	seq = &dest->addr.nameseq;
@@ -1478,6 +1479,7 @@ static void tipc_sk_finish_conn(struct tipc_sock *tsk, u32 peer_port,
 	struct net *net = sock_net(sk);
 	struct tipc_msg *msg = &tsk->phdr;
 
+	msg_set_syn(msg, 0);
 	msg_set_destnode(msg, peer_node);
 	msg_set_destport(msg, peer_port);
 	msg_set_type(msg, TIPC_CONN_MSG);
@@ -2006,6 +2008,9 @@ static bool tipc_sk_filter_connect(struct tipc_sock *tsk, struct sk_buff *skb)
 		return false;
 	case TIPC_LISTEN:
 		/* Accept only SYN message */
+		if (!msg_is_syn(hdr) &&
+		    tipc_node_get_capabilities(net, onode) & TIPC_SYN_BIT)
+			return false;
 		if (!con_msg && !err)
 			return true;
 		return false;
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [net-next 5/5] tipc: buffer overflow handling in listener socket
  2018-09-28 18:23 [net-next 0/5] tipc: make connection setup more robust Jon Maloy
                   ` (3 preceding siblings ...)
  2018-09-28 18:23 ` [net-next 4/5] tipc: add SYN bit to connection setup messages Jon Maloy
@ 2018-09-28 18:23 ` Jon Maloy
  2018-09-29 18:25 ` [net-next 0/5] tipc: make connection setup more robust David Miller
  5 siblings, 0 replies; 7+ messages in thread
From: Jon Maloy @ 2018-09-28 18:23 UTC (permalink / raw)
  To: davem, netdev
  Cc: gordan.mihaljevic, tung.q.nguyen, hoang.h.le, jon.maloy,
	canh.d.luu, ying.xue, tipc-discussion

From: Tung Nguyen <tung.q.nguyen@dektech.com.au>

Default socket receive buffer size for a listener socket is 2Mb. For
each arriving empty SYN, the linux kernel allocates a 768 bytes buffer.
This means that a listener socket can serve maximum 2700 simultaneous
empty connection setup requests before it hits a receive buffer
overflow, and much fewer if the SYN is carrying any significant
amount of data.

When this happens the setup request is rejected, and the client
receives an ECONNREFUSED error.

This commit mitigates this problem by letting the client socket try to
retransmit the SYN message multiple times when it sees it rejected with
the code TIPC_ERR_OVERLOAD. Retransmission is done at random intervals
in the range of [100 ms, setup_timeout / 4], as many times as there is
room for within the setup timeout limit.

Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/msg.c    | 20 ++++++++++++++++++++
 net/tipc/msg.h    |  1 +
 net/tipc/socket.c | 49 +++++++++++++++++++++++++++++++++++++++++++------
 3 files changed, 64 insertions(+), 6 deletions(-)

diff --git a/net/tipc/msg.c b/net/tipc/msg.c
index 00fbb5c..f48e585 100644
--- a/net/tipc/msg.c
+++ b/net/tipc/msg.c
@@ -525,6 +525,10 @@ bool tipc_msg_reverse(u32 own_node,  struct sk_buff **skb, int err)
 	if (hlen == SHORT_H_SIZE)
 		hlen = BASIC_H_SIZE;
 
+	/* Don't return data along with SYN+, - sender has a clone */
+	if (msg_is_syn(_hdr) && err == TIPC_ERR_OVERLOAD)
+		dlen = 0;
+
 	/* Allocate new buffer to return */
 	*skb = tipc_buf_acquire(hlen + dlen, GFP_ATOMIC);
 	if (!*skb)
@@ -552,6 +556,22 @@ bool tipc_msg_reverse(u32 own_node,  struct sk_buff **skb, int err)
 	return false;
 }
 
+bool tipc_msg_skb_clone(struct sk_buff_head *msg, struct sk_buff_head *cpy)
+{
+	struct sk_buff *skb, *_skb;
+
+	skb_queue_walk(msg, skb) {
+		_skb = skb_clone(skb, GFP_ATOMIC);
+		if (!_skb) {
+			__skb_queue_purge(cpy);
+			pr_err_ratelimited("Failed to clone buffer chain\n");
+			return false;
+		}
+		__skb_queue_tail(cpy, _skb);
+	}
+	return true;
+}
+
 /**
  * tipc_msg_lookup_dest(): try to find new destination for named message
  * @skb: the buffer containing the message.
diff --git a/net/tipc/msg.h b/net/tipc/msg.h
index cd64d7b..a2879e6 100644
--- a/net/tipc/msg.h
+++ b/net/tipc/msg.h
@@ -980,6 +980,7 @@ bool tipc_msg_pskb_copy(u32 dst, struct sk_buff_head *msg,
 			struct sk_buff_head *cpy);
 void __tipc_skb_queue_sorted(struct sk_buff_head *list, u16 seqno,
 			     struct sk_buff *skb);
+bool tipc_msg_skb_clone(struct sk_buff_head *msg, struct sk_buff_head *cpy);
 
 static inline u16 buf_seqno(struct sk_buff *skb)
 {
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 89d6dc0..595c500 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -47,7 +47,7 @@
 #include "netlink.h"
 #include "group.h"
 
-#define CONN_TIMEOUT_DEFAULT	8000	/* default connect timeout = 8s */
+#define CONN_TIMEOUT_DEFAULT    8000    /* default connect timeout = 8s */
 #define CONN_PROBING_INTV	msecs_to_jiffies(3600000)  /* [ms] => 1 h */
 #define TIPC_FWD_MSG		1
 #define TIPC_MAX_PORT		0xffffffff
@@ -80,7 +80,6 @@ struct sockaddr_pair {
  * @publications: list of publications for port
  * @blocking_link: address of the congested link we are currently sleeping on
  * @pub_count: total # of publications port has made during its lifetime
- * @probing_state:
  * @conn_timeout: the time we can wait for an unresponded setup request
  * @dupl_rcvcnt: number of bytes counted twice, in both backlog and rcv queue
  * @cong_link_cnt: number of congested links
@@ -102,8 +101,8 @@ struct tipc_sock {
 	struct list_head cong_links;
 	struct list_head publications;
 	u32 pub_count;
-	uint conn_timeout;
 	atomic_t dupl_rcvcnt;
+	u16 conn_timeout;
 	bool probe_unacked;
 	u16 cong_link_cnt;
 	u16 snt_unacked;
@@ -507,6 +506,9 @@ static void __tipc_shutdown(struct socket *sock, int error)
 	tipc_wait_for_cond(sock, &timeout, (!tsk->cong_link_cnt &&
 					    !tsk_conn_cong(tsk)));
 
+	/* Remove any pending SYN message */
+	__skb_queue_purge(&sk->sk_write_queue);
+
 	/* Reject all unreceived messages, except on an active connection
 	 * (which disconnects locally & sends a 'FIN+' to peer).
 	 */
@@ -1362,6 +1364,8 @@ static int __tipc_sendmsg(struct socket *sock, struct msghdr *m, size_t dlen)
 	rc = tipc_msg_build(hdr, m, 0, dlen, mtu, &pkts);
 	if (unlikely(rc != dlen))
 		return rc;
+	if (unlikely(syn && !tipc_msg_skb_clone(&pkts, &sk->sk_write_queue)))
+		return -ENOMEM;
 
 	rc = tipc_node_xmit(net, &pkts, dnode, tsk->portid);
 	if (unlikely(rc == -ELINKCONG)) {
@@ -1491,6 +1495,7 @@ static void tipc_sk_finish_conn(struct tipc_sock *tsk, u32 peer_port,
 	tipc_node_add_conn(net, peer_node, tsk->portid, peer_port);
 	tsk->max_pkt = tipc_node_get_mtu(net, peer_node, tsk->portid);
 	tsk->peer_caps = tipc_node_get_capabilities(net, peer_node);
+	__skb_queue_purge(&sk->sk_write_queue);
 	if (tsk->peer_caps & TIPC_BLOCK_FLOWCTL)
 		return;
 
@@ -1977,6 +1982,7 @@ static bool tipc_sk_filter_connect(struct tipc_sock *tsk, struct sk_buff *skb)
 	u32 oport = msg_origport(hdr);
 	u32 onode = msg_orignode(hdr);
 	int err = msg_errcode(hdr);
+	unsigned long delay;
 
 	if (unlikely(msg_mcast(hdr)))
 		return false;
@@ -2001,8 +2007,18 @@ static bool tipc_sk_filter_connect(struct tipc_sock *tsk, struct sk_buff *skb)
 		if (oport != pport || onode != pnode)
 			return false;
 
-		/* Rejected SYN - abort */
-		break;
+		/* Rejected SYN */
+		if (err != TIPC_ERR_OVERLOAD)
+			break;
+
+		/* Prepare for new setup attempt if we have a SYN clone */
+		if (skb_queue_empty(&sk->sk_write_queue))
+			break;
+		get_random_bytes(&delay, 2);
+		delay %= (tsk->conn_timeout / 4);
+		delay = msecs_to_jiffies(delay + 100);
+		sk_reset_timer(sk, &sk->sk_timer, jiffies + delay);
+		return false;
 	case TIPC_OPEN:
 	case TIPC_DISCONNECTING:
 		return false;
@@ -2561,12 +2577,26 @@ static void tipc_sk_check_probing_state(struct sock *sk,
 	sk_reset_timer(sk, &sk->sk_timer, jiffies + CONN_PROBING_INTV);
 }
 
+static void tipc_sk_retry_connect(struct sock *sk, struct sk_buff_head *list)
+{
+	struct tipc_sock *tsk = tipc_sk(sk);
+
+	/* Try again later if dest link is congested */
+	if (tsk->cong_link_cnt) {
+		sk_reset_timer(sk, &sk->sk_timer, msecs_to_jiffies(100));
+		return;
+	}
+	/* Prepare SYN for retransmit */
+	tipc_msg_skb_clone(&sk->sk_write_queue, list);
+}
+
 static void tipc_sk_timeout(struct timer_list *t)
 {
 	struct sock *sk = from_timer(sk, t, sk_timer);
 	struct tipc_sock *tsk = tipc_sk(sk);
 	u32 pnode = tsk_peer_node(tsk);
 	struct sk_buff_head list;
+	int rc = 0;
 
 	skb_queue_head_init(&list);
 	bh_lock_sock(sk);
@@ -2580,12 +2610,19 @@ static void tipc_sk_timeout(struct timer_list *t)
 
 	if (sk->sk_state == TIPC_ESTABLISHED)
 		tipc_sk_check_probing_state(sk, &list);
+	else if (sk->sk_state == TIPC_CONNECTING)
+		tipc_sk_retry_connect(sk, &list);
 
 	bh_unlock_sock(sk);
 
 	if (!skb_queue_empty(&list))
-		tipc_node_xmit(sock_net(sk), &list, pnode, tsk->portid);
+		rc = tipc_node_xmit(sock_net(sk), &list, pnode, tsk->portid);
 
+	/* SYN messages may cause link congestion */
+	if (rc == -ELINKCONG) {
+		tipc_dest_push(&tsk->cong_links, pnode, 0);
+		tsk->cong_link_cnt = 1;
+	}
 	sock_put(sk);
 }
 
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [net-next 0/5] tipc: make connection setup more robust
  2018-09-28 18:23 [net-next 0/5] tipc: make connection setup more robust Jon Maloy
                   ` (4 preceding siblings ...)
  2018-09-28 18:23 ` [net-next 5/5] tipc: buffer overflow handling in listener socket Jon Maloy
@ 2018-09-29 18:25 ` David Miller
  5 siblings, 0 replies; 7+ messages in thread
From: David Miller @ 2018-09-29 18:25 UTC (permalink / raw)
  To: jon.maloy
  Cc: netdev, gordan.mihaljevic, tung.q.nguyen, hoang.h.le, canh.d.luu,
	ying.xue, tipc-discussion

From: Jon Maloy <jon.maloy@ericsson.com>
Date: Fri, 28 Sep 2018 20:23:17 +0200

> In this series we make a few improvements to the connection setup and
> probing mechanism, culminating in the last commit where we make it 
> possible for a client socket to make multiple setup attempts in case
> it encounters receive buffer overflow at the listener socket.

Series applied, thanks Jon.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-09-30  0:54 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-09-28 18:23 [net-next 0/5] tipc: make connection setup more robust Jon Maloy
2018-09-28 18:23 ` [net-next 1/5] tipc: refactor function tipc_msg_reverse() Jon Maloy
2018-09-28 18:23 ` [net-next 2/5] tipc: refactor function tipc_sk_timeout() Jon Maloy
2018-09-28 18:23 ` [net-next 3/5] tipc: refactor function tipc_sk_filter_connect() Jon Maloy
2018-09-28 18:23 ` [net-next 4/5] tipc: add SYN bit to connection setup messages Jon Maloy
2018-09-28 18:23 ` [net-next 5/5] tipc: buffer overflow handling in listener socket Jon Maloy
2018-09-29 18:25 ` [net-next 0/5] tipc: make connection setup more robust David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).