* [PATCH net-next 0/3] tipc: fix link failover/synch problems
@ 2015-08-20 6:12 Jon Maloy
2015-08-20 6:12 ` [PATCH net-next 1/3] tipc: eliminate risk of premature link setup during failover Jon Maloy
` (3 more replies)
0 siblings, 4 replies; 5+ messages in thread
From: Jon Maloy @ 2015-08-20 6:12 UTC (permalink / raw)
To: davem
Cc: netdev, Paul Gortmaker, erik.hugne, ying.xue, maloy,
tipc-discussion, Jon Maloy
We fix three problems with the new link failover/synch implementation,
which was introduced earlier in this release cycle. They are all related
to situations where there is a very short interval between the disabling
and enabling of interfaces.
Jon Maloy (3):
tipc: eliminate risk of premature link setup during failover
tipc: interrupt link synchronization when a link goes down
tipc: fix stale link problem during synchronization
net/tipc/link.c | 5 +++--
net/tipc/node.c | 27 +++++++++++++++++++++------
2 files changed, 24 insertions(+), 8 deletions(-)
--
1.9.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH net-next 1/3] tipc: eliminate risk of premature link setup during failover
2015-08-20 6:12 [PATCH net-next 0/3] tipc: fix link failover/synch problems Jon Maloy
@ 2015-08-20 6:12 ` Jon Maloy
2015-08-20 6:12 ` [PATCH net-next 2/3] tipc: interrupt link synchronization when a link goes down Jon Maloy
` (2 subsequent siblings)
3 siblings, 0 replies; 5+ messages in thread
From: Jon Maloy @ 2015-08-20 6:12 UTC (permalink / raw)
To: davem
Cc: netdev, Paul Gortmaker, erik.hugne, ying.xue, maloy,
tipc-discussion, Jon Maloy
When a link goes down, and there is still a working link towards its
destination node, a failover is initiated, and the failed link is not
allowed to re-establish until that procedure is finished. To ensure
this, the concerned link endpoints are set to state LINK_FAILINGOVER,
and the node endpoints to NODE_FAILINGOVER during the failover period.
However, if the link reset is due to a disabled bearer, the corres-
ponding link endpoint is deleted, and only the node endpoint knows
about the ongoing failover. Now, if the disabled bearer is re-enabled
during the failover period, the discovery mechanism may create a new
link endpoint that is ready to be established, despite that this is not
permitted. This situation may cause both the ongoing failover and any
subsequent link synchronization to fail.
In this commit, we ensure that a newly created link goes directly to
state LINK_FAILINGOVER if the corresponding node state is
NODE_FAILINGOVER. This eliminates the problem described above.
Furthermore, we tighten the criteria for which packets are allowed
to end a failover state in the function tipc_node_check_state().
By checking that the receiving link is up and running, instead of just
checking that it is not in failover mode, we eliminate the risk that
protocol packets from the re-created link may cause the failover to
be prematurely terminated.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
net/tipc/node.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/tipc/node.c b/net/tipc/node.c
index 7c19164..004834b 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -565,6 +565,8 @@ void tipc_node_check_dest(struct net *net, u32 onode,
goto exit;
}
tipc_link_reset(l);
+ if (n->state == NODE_FAILINGOVER)
+ tipc_link_fsm_evt(l, LINK_FAILOVER_BEGIN_EVT);
le->link = l;
n->link_cnt++;
tipc_node_calculate_timer(n, l);
@@ -1129,7 +1131,7 @@ static bool tipc_node_check_state(struct tipc_node *n, struct sk_buff *skb,
}
/* Open parallel link when tunnel link reaches synch point */
- if ((n->state == NODE_FAILINGOVER) && !tipc_link_is_failingover(l)) {
+ if ((n->state == NODE_FAILINGOVER) && tipc_link_is_up(l)) {
if (!more(rcv_nxt, n->sync_point))
return true;
tipc_node_fsm_evt(n, NODE_FAILOVER_END_EVT);
--
1.9.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH net-next 2/3] tipc: interrupt link synchronization when a link goes down
2015-08-20 6:12 [PATCH net-next 0/3] tipc: fix link failover/synch problems Jon Maloy
2015-08-20 6:12 ` [PATCH net-next 1/3] tipc: eliminate risk of premature link setup during failover Jon Maloy
@ 2015-08-20 6:12 ` Jon Maloy
2015-08-20 6:12 ` [PATCH net-next 3/3] tipc: fix stale link problem during synchronization Jon Maloy
2015-08-23 23:15 ` [PATCH net-next 0/3] tipc: fix link failover/synch problems David Miller
3 siblings, 0 replies; 5+ messages in thread
From: Jon Maloy @ 2015-08-20 6:12 UTC (permalink / raw)
To: davem; +Cc: Jon Maloy, netdev, Paul Gortmaker, tipc-discussion
When we introduced the new link failover/synch mechanism
in commit 6e498158a827fd515b514842e9a06bdf0f75ab86
("tipc: move link synch and failover to link aggregation level"),
we missed the case when the non-tunnel link goes down during the link
synchronization period. In this case the tunnel link will remain in
state LINK_SYNCHING, something leading to unpredictable behavior when
the failover procedure is initiated.
In this commit, we ensure that the node and remaining link goes
back to regular communication state (SELF_UP_PEER_UP/LINK_ESTABLISHED)
when one of the parallel links goes down. We also ensure that we don't
re-enter synch mode if subsequent SYNCH packets arrive on the remaining
link.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
net/tipc/link.c | 2 +-
net/tipc/node.c | 11 ++++++++---
2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/net/tipc/link.c b/net/tipc/link.c
index f067e54..7058c86 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -351,11 +351,11 @@ int tipc_link_fsm_evt(struct tipc_link *l, int evt)
l->state = LINK_RESET;
break;
case LINK_ESTABLISH_EVT:
+ case LINK_SYNCH_END_EVT:
break;
case LINK_SYNCH_BEGIN_EVT:
l->state = LINK_SYNCHING;
break;
- case LINK_SYNCH_END_EVT:
case LINK_FAILOVER_BEGIN_EVT:
case LINK_FAILOVER_END_EVT:
default:
diff --git a/net/tipc/node.c b/net/tipc/node.c
index 004834b..937cc61 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -423,6 +423,8 @@ static void __tipc_node_link_down(struct tipc_node *n, int *bearer_id,
/* There is still a working link => initiate failover */
tnl = node_active_link(n, 0);
+ tipc_link_fsm_evt(tnl, LINK_SYNCH_END_EVT);
+ tipc_node_fsm_evt(n, NODE_SYNCH_END_EVT);
n->sync_point = tnl->rcv_nxt + (U16_MAX / 2 - 1);
tipc_link_tnl_prepare(l, tnl, FAILOVER_MSG, xmitq);
tipc_link_reset(l);
@@ -1140,6 +1142,10 @@ static bool tipc_node_check_state(struct tipc_node *n, struct sk_buff *skb,
return true;
}
+ /* No synching needed if only one link */
+ if (!pl || !tipc_link_is_up(pl))
+ return true;
+
/* Initiate or update synch mode if applicable */
if ((usr == TUNNEL_PROTOCOL) && (mtyp == SYNCH_MSG)) {
syncpt = iseqno + exp_pkts - 1;
@@ -1158,9 +1164,8 @@ static bool tipc_node_check_state(struct tipc_node *n, struct sk_buff *skb,
/* Open tunnel link when parallel link reaches synch point */
if ((n->state == NODE_SYNCHING) && tipc_link_is_synching(l)) {
- if (pl)
- dlv_nxt = mod(pl->rcv_nxt - skb_queue_len(pl->inputq));
- if (!pl || more(dlv_nxt, n->sync_point)) {
+ dlv_nxt = pl->rcv_nxt - mod(skb_queue_len(pl->inputq));
+ if (more(dlv_nxt, n->sync_point)) {
tipc_link_fsm_evt(l, LINK_SYNCH_END_EVT);
tipc_node_fsm_evt(n, NODE_SYNCH_END_EVT);
return true;
--
1.9.1
------------------------------------------------------------------------------
^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH net-next 3/3] tipc: fix stale link problem during synchronization
2015-08-20 6:12 [PATCH net-next 0/3] tipc: fix link failover/synch problems Jon Maloy
2015-08-20 6:12 ` [PATCH net-next 1/3] tipc: eliminate risk of premature link setup during failover Jon Maloy
2015-08-20 6:12 ` [PATCH net-next 2/3] tipc: interrupt link synchronization when a link goes down Jon Maloy
@ 2015-08-20 6:12 ` Jon Maloy
2015-08-23 23:15 ` [PATCH net-next 0/3] tipc: fix link failover/synch problems David Miller
3 siblings, 0 replies; 5+ messages in thread
From: Jon Maloy @ 2015-08-20 6:12 UTC (permalink / raw)
To: davem
Cc: netdev, Paul Gortmaker, erik.hugne, ying.xue, maloy,
tipc-discussion, Jon Maloy
Recent changes to the link synchronization means that we can now just
drop packets arriving on the synchronizing link before the synch point
is reached. This has lead to significant simplifications to the
implementation, but also turns out to have a flip side that we need
to consider.
Under unlucky circumstances, the two endpoints may end up
repeatedly dropping each other's packets, while immediately
asking for retransmission of the same packets, just to drop
them once more. This pattern will eventually be broken when
the synch point is reached on the other link, but before that,
the endpoints may have arrived at the retransmission limit
(stale counter) that indicates that the link should be broken.
We see this happen at rare occasions.
The fix for this is to not ask for retransmissions when a link is in
state LINK_SYNCHING. The fact that the link has reached this state
means that it has already received the first SYNCH packet, and that it
knows the synch point. Hence, it doesn't need any more packets until the
other link has reached the synch point, whereafter it can go ahead and
ask for the missing packets.
However, because of the reduced traffic on the synching link that
follows this change, it may now take longer to discover that the
synch point has been reached. We compensate for this by letting all
packets, on any of the links, trig a check for synchronization
termination. This is possible because the packets themselves don't
contain any information that is needed for discovering this condition.
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
---
net/tipc/link.c | 3 ++-
net/tipc/node.c | 12 ++++++++++--
2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/net/tipc/link.c b/net/tipc/link.c
index 7058c86..75db07c 100644
--- a/net/tipc/link.c
+++ b/net/tipc/link.c
@@ -1330,6 +1330,7 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
u16 peers_snd_nxt = msg_next_sent(hdr);
u16 peers_tol = msg_link_tolerance(hdr);
u16 peers_prio = msg_linkprio(hdr);
+ u16 rcv_nxt = l->rcv_nxt;
char *if_name;
int rc = 0;
@@ -1393,7 +1394,7 @@ static int tipc_link_proto_rcv(struct tipc_link *l, struct sk_buff *skb,
break;
/* Send NACK if peer has sent pkts we haven't received yet */
- if (more(peers_snd_nxt, l->rcv_nxt))
+ if (more(peers_snd_nxt, rcv_nxt) && !tipc_link_is_synching(l))
rcvgap = peers_snd_nxt - l->rcv_nxt;
if (rcvgap || (msg_probe(hdr)))
tipc_link_build_proto_msg(l, STATE_MSG, 0, rcvgap,
diff --git a/net/tipc/node.c b/net/tipc/node.c
index 937cc61..703875f 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -1079,7 +1079,7 @@ static bool tipc_node_check_state(struct tipc_node *n, struct sk_buff *skb,
u16 exp_pkts = msg_msgcnt(hdr);
u16 rcv_nxt, syncpt, dlv_nxt;
int state = n->state;
- struct tipc_link *l, *pl = NULL;
+ struct tipc_link *l, *tnl, *pl = NULL;
struct tipc_media_addr *maddr;
int i, pb_id;
@@ -1164,12 +1164,20 @@ static bool tipc_node_check_state(struct tipc_node *n, struct sk_buff *skb,
/* Open tunnel link when parallel link reaches synch point */
if ((n->state == NODE_SYNCHING) && tipc_link_is_synching(l)) {
+ if (tipc_link_is_synching(l)) {
+ tnl = l;
+ } else {
+ tnl = pl;
+ pl = l;
+ }
dlv_nxt = pl->rcv_nxt - mod(skb_queue_len(pl->inputq));
if (more(dlv_nxt, n->sync_point)) {
- tipc_link_fsm_evt(l, LINK_SYNCH_END_EVT);
+ tipc_link_fsm_evt(tnl, LINK_SYNCH_END_EVT);
tipc_node_fsm_evt(n, NODE_SYNCH_END_EVT);
return true;
}
+ if (l == pl)
+ return true;
if ((usr == TUNNEL_PROTOCOL) && (mtyp == SYNCH_MSG))
return true;
if (usr == LINK_PROTOCOL)
--
1.9.1
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [PATCH net-next 0/3] tipc: fix link failover/synch problems
2015-08-20 6:12 [PATCH net-next 0/3] tipc: fix link failover/synch problems Jon Maloy
` (2 preceding siblings ...)
2015-08-20 6:12 ` [PATCH net-next 3/3] tipc: fix stale link problem during synchronization Jon Maloy
@ 2015-08-23 23:15 ` David Miller
3 siblings, 0 replies; 5+ messages in thread
From: David Miller @ 2015-08-23 23:15 UTC (permalink / raw)
To: jon.maloy
Cc: netdev, paul.gortmaker, erik.hugne, ying.xue, maloy,
tipc-discussion
From: Jon Maloy <jon.maloy@ericsson.com>
Date: Thu, 20 Aug 2015 02:12:53 -0400
> We fix three problems with the new link failover/synch implementation,
> which was introduced earlier in this release cycle. They are all related
> to situations where there is a very short interval between the disabling
> and enabling of interfaces.
Series applied, thanks Jon.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2015-08-23 23:15 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-20 6:12 [PATCH net-next 0/3] tipc: fix link failover/synch problems Jon Maloy
2015-08-20 6:12 ` [PATCH net-next 1/3] tipc: eliminate risk of premature link setup during failover Jon Maloy
2015-08-20 6:12 ` [PATCH net-next 2/3] tipc: interrupt link synchronization when a link goes down Jon Maloy
2015-08-20 6:12 ` [PATCH net-next 3/3] tipc: fix stale link problem during synchronization Jon Maloy
2015-08-23 23:15 ` [PATCH net-next 0/3] tipc: fix link failover/synch problems David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).