netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH net-next 0/4] tipc: some small fixes
@ 2015-02-03 13:59 Jon Maloy
  2015-02-03 13:59 ` [PATCH net-next 1/4] tipc: add reference count to struct tipc_link Jon Maloy
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Jon Maloy @ 2015-02-03 13:59 UTC (permalink / raw)
  To: davem; +Cc: Jon Maloy, netdev, Paul Gortmaker, tipc-discussion

During extensive testing and analysis of running dual links between
nodes, we have encountered some issues that potentially may cause
problems. We choose to fix those proactively in this series. 

Jon Maloy (4):
  tipc: add reference count to struct tipc_link
  tipc: avoid stale link after aborted failover
  tipc: eliminate race during node creation
  tipc: separate link starting event from link timeout event

 net/tipc/discover.c |  9 ++----
 net/tipc/link.c     | 83 ++++++++++++++++++++++++++++++++---------------------
 net/tipc/link.h     |  3 ++
 net/tipc/node.c     | 15 ++++++----
 4 files changed, 65 insertions(+), 45 deletions(-)

-- 
1.9.1


------------------------------------------------------------------------------
Dive into the World of Parallel Programming. The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH net-next 1/4] tipc: add reference count to struct tipc_link
  2015-02-03 13:59 [PATCH net-next 0/4] tipc: some small fixes Jon Maloy
@ 2015-02-03 13:59 ` Jon Maloy
  2015-02-03 13:59 ` [PATCH net-next 2/4] tipc: avoid stale link after aborted failover Jon Maloy
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Jon Maloy @ 2015-02-03 13:59 UTC (permalink / raw)
  To: davem; +Cc: Jon Maloy, netdev, Paul Gortmaker, tipc-discussion

When a bearer is disabled, all pertaining links will be reset and
deleted. However, if there is a second active link towards a killed
link's destination, the delete has to be postponed until the failover
is finished. During this interval, we currently put the link in zombie
mode, i.e., we take it out of traffic, delete its timer, but leave it
attached to the owner node structure until all missing packets have
been received.  When this is done, we detach the link from its node
and delete it, assuming that the synchronous timer deletion that was
initiated earlier in a different thread has finished.

This is unsafe, as the failover may finish before del_timer_sync()
has returned in the other thread.

We fix this by adding an atomic reference counter of type kref in
struct tipc_link. The counter keeps track of the references kept
to the link by the owner node and the timer. We then do a conditional
delete, based on the reference counter, both after the failover has
been finished and when the timer expires, if applicable. Whoever
comes last, will actually delete the link. This approach also implies
that we can make the deletion of the timer asynchronous.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/link.c | 79 +++++++++++++++++++++++++++++++++++----------------------
 net/tipc/link.h |  2 ++
 2 files changed, 50 insertions(+), 31 deletions(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 2846ad80..46aa599 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -127,6 +127,21 @@ static unsigned int align(unsigned int i)
 	return (i + 3) & ~3u;
 }
 
+static void tipc_link_release(struct kref *kref)
+{
+	kfree(container_of(kref, struct tipc_link, ref));
+}
+
+static void tipc_link_get(struct tipc_link *l_ptr)
+{
+	kref_get(&l_ptr->ref);
+}
+
+static void tipc_link_put(struct tipc_link *l_ptr)
+{
+	kref_put(&l_ptr->ref, tipc_link_release);
+}
+
 static void link_init_max_pkt(struct tipc_link *l_ptr)
 {
 	struct tipc_node *node = l_ptr->owner;
@@ -222,11 +237,13 @@ static void link_timeout(unsigned long data)
 		tipc_link_push_packets(l_ptr);
 
 	tipc_node_unlock(l_ptr->owner);
+	tipc_link_put(l_ptr);
 }
 
 static void link_set_timer(struct tipc_link *link, unsigned long time)
 {
-	mod_timer(&link->timer, jiffies + time);
+	if (!mod_timer(&link->timer, jiffies + time))
+		tipc_link_get(link);
 }
 
 /**
@@ -267,7 +284,7 @@ struct tipc_link *tipc_link_create(struct tipc_node *n_ptr,
 		pr_warn("Link creation failed, no memory\n");
 		return NULL;
 	}
-
+	kref_init(&l_ptr->ref);
 	l_ptr->addr = peer;
 	if_name = strchr(b_ptr->name, ':') + 1;
 	sprintf(l_ptr->name, "%u.%u.%u:%s-%u.%u.%u:unknown",
@@ -305,46 +322,48 @@ struct tipc_link *tipc_link_create(struct tipc_node *n_ptr,
 	skb_queue_head_init(&l_ptr->waiting_sks);
 
 	link_reset_statistics(l_ptr);
-
 	tipc_node_attach_link(n_ptr, l_ptr);
-
 	setup_timer(&l_ptr->timer, link_timeout, (unsigned long)l_ptr);
-
 	link_state_event(l_ptr, STARTING_EVT);
 
 	return l_ptr;
 }
 
+/**
+ * link_delete - Conditional deletion of link.
+ *               If timer still running, real delete is done when it expires
+ * @link: link to be deleted
+ */
+void tipc_link_delete(struct tipc_link *link)
+{
+	tipc_link_reset_fragments(link);
+	tipc_node_detach_link(link->owner, link);
+	tipc_link_put(link);
+}
+
 void tipc_link_delete_list(struct net *net, unsigned int bearer_id,
 			   bool shutting_down)
 {
 	struct tipc_net *tn = net_generic(net, tipc_net_id);
-	struct tipc_link *l_ptr;
-	struct tipc_node *n_ptr;
+	struct tipc_link *link;
+	struct tipc_node *node;
 
 	rcu_read_lock();
-	list_for_each_entry_rcu(n_ptr, &tn->node_list, list) {
-		tipc_node_lock(n_ptr);
-		l_ptr = n_ptr->links[bearer_id];
-		if (l_ptr) {
-			tipc_link_reset(l_ptr);
-			if (shutting_down || !tipc_node_is_up(n_ptr)) {
-				tipc_node_detach_link(l_ptr->owner, l_ptr);
-				tipc_link_reset_fragments(l_ptr);
-				tipc_node_unlock(n_ptr);
-
-				/* Nobody else can access this link now: */
-				del_timer_sync(&l_ptr->timer);
-				kfree(l_ptr);
-			} else {
-				/* Detach/delete when failover is finished: */
-				l_ptr->flags |= LINK_STOPPED;
-				tipc_node_unlock(n_ptr);
-				del_timer_sync(&l_ptr->timer);
-			}
+	list_for_each_entry_rcu(node, &tn->node_list, list) {
+		tipc_node_lock(node);
+		link = node->links[bearer_id];
+		if (!link) {
+			tipc_node_unlock(node);
 			continue;
 		}
-		tipc_node_unlock(n_ptr);
+		tipc_link_reset(link);
+		if (del_timer(&link->timer))
+			tipc_link_put(link);
+		link->flags |= LINK_STOPPED;
+		/* Delete link now, or when failover is finished: */
+		if (shutting_down || !tipc_node_is_up(node))
+			tipc_link_delete(link);
+		tipc_node_unlock(node);
 	}
 	rcu_read_unlock();
 }
@@ -1837,10 +1856,8 @@ static struct sk_buff *tipc_link_failover_rcv(struct tipc_link *l_ptr,
 		}
 	}
 exit:
-	if ((l_ptr->exp_msg_count == 0) && (l_ptr->flags & LINK_STOPPED)) {
-		tipc_node_detach_link(l_ptr->owner, l_ptr);
-		kfree(l_ptr);
-	}
+	if ((!l_ptr->exp_msg_count) && (l_ptr->flags & LINK_STOPPED))
+		tipc_link_delete(l_ptr);
 	return buf;
 }
 
diff --git a/net/tipc/link.h b/net/tipc/link.h
index 9df7fa4..f06b779 100644
--- a/net/tipc/link.h
+++ b/net/tipc/link.h
@@ -103,6 +103,7 @@ struct tipc_stats {
  * @media_addr: media address to use when sending messages over link
  * @timer: link timer
  * @owner: pointer to peer node
+ * @refcnt: reference counter for permanent references (owner node & timer)
  * @flags: execution state flags for link endpoint instance
  * @checkpoint: reference point for triggering link continuity checking
  * @peer_session: link session # being used by peer end of link
@@ -142,6 +143,7 @@ struct tipc_link {
 	struct tipc_media_addr media_addr;
 	struct timer_list timer;
 	struct tipc_node *owner;
+	struct kref ref;
 
 	/* Management and link supervision data */
 	unsigned int flags;
-- 
1.9.1


------------------------------------------------------------------------------
Dive into the World of Parallel Programming. The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net/

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net-next 2/4] tipc: avoid stale link after aborted failover
  2015-02-03 13:59 [PATCH net-next 0/4] tipc: some small fixes Jon Maloy
  2015-02-03 13:59 ` [PATCH net-next 1/4] tipc: add reference count to struct tipc_link Jon Maloy
@ 2015-02-03 13:59 ` Jon Maloy
  2015-02-03 13:59 ` [PATCH net-next 3/4] tipc: eliminate race during node creation Jon Maloy
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Jon Maloy @ 2015-02-03 13:59 UTC (permalink / raw)
  To: davem; +Cc: Jon Maloy, netdev, Paul Gortmaker, tipc-discussion

During link failover it may happen that the remaining link goes
down while it is still in the process of taking over traffic
from a previously failed link. When this happens, we currently
abort the failover procedure and reset the first failed link to
non-failover mode, so that it will be ready to re-establish
contact with its peer when it comes available.

However, if the first link goes down because its bearer was manually
disabled, it is not enough to reset it; it must also be deleted;
which is supposed to happen when the failover procedure is finished.
Otherwise it will remain a zombie link: attached to the owner node
structure, in mode LINK_STOPPED, and permanently blocking any re-
establishing of the link to the peer via the interface in question.

We fix this by amending the failover abort procedure. Apart from
resetting the link to non-failover state, we test if the link is
also in LINK_STOPPED mode. If so, we delete it, using the conditional
tipc_link_delete() function introduced in the previous commit.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/link.h | 1 +
 net/tipc/node.c | 4 ++++
 2 files changed, 5 insertions(+)

diff --git a/net/tipc/link.h b/net/tipc/link.h
index f06b779..3e3432b 100644
--- a/net/tipc/link.h
+++ b/net/tipc/link.h
@@ -202,6 +202,7 @@ struct tipc_port;
 struct tipc_link *tipc_link_create(struct tipc_node *n_ptr,
 			      struct tipc_bearer *b_ptr,
 			      const struct tipc_media_addr *media_addr);
+void tipc_link_delete(struct tipc_link *link);
 void tipc_link_delete_list(struct net *net, unsigned int bearer_id,
 			   bool shutting_down);
 void tipc_link_failover_send_queue(struct tipc_link *l_ptr);
diff --git a/net/tipc/node.c b/net/tipc/node.c
index ee5d33c..d4cb8c1 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -406,6 +406,10 @@ static void node_lost_contact(struct tipc_node *n_ptr)
 		l_ptr->reset_checkpoint = l_ptr->next_in_no;
 		l_ptr->exp_msg_count = 0;
 		tipc_link_reset_fragments(l_ptr);
+
+		/* Link marked for deletion after failover? => do it now */
+		if (l_ptr->flags & LINK_STOPPED)
+			tipc_link_delete(l_ptr);
 	}
 
 	n_ptr->action_flags &= ~TIPC_WAIT_OWN_LINKS_DOWN;
-- 
1.9.1


------------------------------------------------------------------------------
Dive into the World of Parallel Programming. The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net/

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net-next 3/4] tipc: eliminate race during node creation
  2015-02-03 13:59 [PATCH net-next 0/4] tipc: some small fixes Jon Maloy
  2015-02-03 13:59 ` [PATCH net-next 1/4] tipc: add reference count to struct tipc_link Jon Maloy
  2015-02-03 13:59 ` [PATCH net-next 2/4] tipc: avoid stale link after aborted failover Jon Maloy
@ 2015-02-03 13:59 ` Jon Maloy
  2015-02-03 13:59 ` [PATCH net-next 4/4] tipc: separate link starting event from link timeout event Jon Maloy
  2015-02-05  0:13 ` [PATCH net-next 0/4] tipc: some small fixes David Miller
  4 siblings, 0 replies; 6+ messages in thread
From: Jon Maloy @ 2015-02-03 13:59 UTC (permalink / raw)
  To: davem; +Cc: Jon Maloy, netdev, Paul Gortmaker, tipc-discussion

Instances of struct node are created in the function tipc_disc_rcv()
under the assumption that there is no race between received discovery
messages arriving from the same node. This assumption is wrong.
When we use more than one bearer, it is possible that discovery
messages from the same node arrive at the same moment, resulting in
creation of two instances of struct tipc_node. This may later cause
confusion during link establishment, and may result in one of the links
never becoming activated.

We fix this by making lookup and potential creation of nodes atomic.
Instead of first looking up the node, and in case of failure, create it,
we now start with looking up the node inside node_link_create(), and
return a reference to that one if found. Otherwise, we go ahead and
create the node as we did before.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/discover.c |  9 ++-------
 net/tipc/node.c     | 11 +++++------
 2 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/net/tipc/discover.c b/net/tipc/discover.c
index 5b40cb8..a580a40 100644
--- a/net/tipc/discover.c
+++ b/net/tipc/discover.c
@@ -1,7 +1,7 @@
 /*
  * net/tipc/discover.c
  *
- * Copyright (c) 2003-2006, 2014, Ericsson AB
+ * Copyright (c) 2003-2006, 2014-2015, Ericsson AB
  * Copyright (c) 2005-2006, 2010-2011, Wind River Systems
  * All rights reserved.
  *
@@ -47,7 +47,6 @@
 /* indicates no timer in use */
 #define TIPC_LINK_REQ_INACTIVE	0xffffffff
 
-
 /**
  * struct tipc_link_req - information about an ongoing link setup request
  * @bearer_id: identity of bearer issuing requests
@@ -163,13 +162,9 @@ void tipc_disc_rcv(struct net *net, struct sk_buff *buf,
 	if (!tipc_in_scope(bearer->domain, onode))
 		return;
 
-	/* Locate, or if necessary, create, node: */
-	node = tipc_node_find(net, onode);
-	if (!node)
-		node = tipc_node_create(net, onode);
+	node = tipc_node_create(net, onode);
 	if (!node)
 		return;
-
 	tipc_node_lock(node);
 	link = node->links[bearer->identity];
 
diff --git a/net/tipc/node.c b/net/tipc/node.c
index d4cb8c1..842bd7a 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -96,14 +96,14 @@ struct tipc_node *tipc_node_create(struct net *net, u32 addr)
 	struct tipc_node *n_ptr, *temp_node;
 
 	spin_lock_bh(&tn->node_list_lock);
-
+	n_ptr = tipc_node_find(net, addr);
+	if (n_ptr)
+		goto exit;
 	n_ptr = kzalloc(sizeof(*n_ptr), GFP_ATOMIC);
 	if (!n_ptr) {
-		spin_unlock_bh(&tn->node_list_lock);
 		pr_warn("Node creation failed, no memory\n");
-		return NULL;
+		goto exit;
 	}
-
 	n_ptr->addr = addr;
 	n_ptr->net = net;
 	spin_lock_init(&n_ptr->lock);
@@ -123,9 +123,8 @@ struct tipc_node *tipc_node_create(struct net *net, u32 addr)
 	list_add_tail_rcu(&n_ptr->list, &temp_node->list);
 	n_ptr->action_flags = TIPC_WAIT_PEER_LINKS_DOWN;
 	n_ptr->signature = INVALID_NODE_SIG;
-
 	tn->num_nodes++;
-
+exit:
 	spin_unlock_bh(&tn->node_list_lock);
 	return n_ptr;
 }
-- 
1.9.1


------------------------------------------------------------------------------
Dive into the World of Parallel Programming. The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net/

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH net-next 4/4] tipc: separate link starting event from link timeout event
  2015-02-03 13:59 [PATCH net-next 0/4] tipc: some small fixes Jon Maloy
                   ` (2 preceding siblings ...)
  2015-02-03 13:59 ` [PATCH net-next 3/4] tipc: eliminate race during node creation Jon Maloy
@ 2015-02-03 13:59 ` Jon Maloy
  2015-02-05  0:13 ` [PATCH net-next 0/4] tipc: some small fixes David Miller
  4 siblings, 0 replies; 6+ messages in thread
From: Jon Maloy @ 2015-02-03 13:59 UTC (permalink / raw)
  To: davem; +Cc: Jon Maloy, netdev, Paul Gortmaker, tipc-discussion

When a new link instance is created, it is trigged to start by
sending it a TIPC_STARTING_EVT, whereafter a regular link
reset is applied to it.

The starting event is codewise treated as a timeout event, and prompts
a link RESET message to be sent to the peer node, carrying a link
session identifier. The later link_reset() call nudges this session
identifier, whereafter all subsequent RESET messages will be sent out
with the new identifier. The latter session number overrides the former,
causing the peer to unconditionally accept it irrespective of its
current working state.

We don't think that this causes any problem, but it is not in accordance
with the protocol spec, and may cause confusion when debugging TIPC
sessions.

To avoid this, we make the starting event distinct from the
subsequent timeout events, by not allowing the former to send
out any RESET message. This eliminates the described problem.

Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
 net/tipc/link.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/tipc/link.c b/net/tipc/link.c
index 46aa599..77c7ccd 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -649,7 +649,9 @@ static void link_state_event(struct tipc_link *l_ptr, unsigned int event)
 			break;
 		case STARTING_EVT:
 			l_ptr->flags |= LINK_STARTED;
-			/* fall through */
+			l_ptr->fsm_msg_cnt++;
+			link_set_timer(l_ptr, cont_intv);
+			break;
 		case TIMEOUT_EVT:
 			tipc_link_proto_xmit(l_ptr, RESET_MSG, 0, 0, 0, 0, 0);
 			l_ptr->fsm_msg_cnt++;
-- 
1.9.1


------------------------------------------------------------------------------
Dive into the World of Parallel Programming. The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net/

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH net-next 0/4] tipc: some small fixes
  2015-02-03 13:59 [PATCH net-next 0/4] tipc: some small fixes Jon Maloy
                   ` (3 preceding siblings ...)
  2015-02-03 13:59 ` [PATCH net-next 4/4] tipc: separate link starting event from link timeout event Jon Maloy
@ 2015-02-05  0:13 ` David Miller
  4 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2015-02-05  0:13 UTC (permalink / raw)
  To: jon.maloy
  Cc: netdev, paul.gortmaker, erik.hugne, ying.xue, maloy,
	tipc-discussion

From: Jon Maloy <jon.maloy@ericsson.com>
Date: Tue,  3 Feb 2015 08:59:16 -0500

> During extensive testing and analysis of running dual links between
> nodes, we have encountered some issues that potentially may cause
> problems. We choose to fix those proactively in this series. 

Series applied, thanks.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2015-02-05  0:13 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-02-03 13:59 [PATCH net-next 0/4] tipc: some small fixes Jon Maloy
2015-02-03 13:59 ` [PATCH net-next 1/4] tipc: add reference count to struct tipc_link Jon Maloy
2015-02-03 13:59 ` [PATCH net-next 2/4] tipc: avoid stale link after aborted failover Jon Maloy
2015-02-03 13:59 ` [PATCH net-next 3/4] tipc: eliminate race during node creation Jon Maloy
2015-02-03 13:59 ` [PATCH net-next 4/4] tipc: separate link starting event from link timeout event Jon Maloy
2015-02-05  0:13 ` [PATCH net-next 0/4] tipc: some small fixes David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).