Date: Mon, 11 Feb 2019 21:26:37 -0800 (PST)
Message-Id: <20190211.212637.697481727085947933.davem@davemloft.net>
To: tuong.t.lien@dektech.com.au
Cc: jon.maloy@ericsson.com, ying.xue@windriver.com, netdev@vger.kernel.org, tipc-discussion@lists.sourceforge.net
Subject: Re: [net] tipc: fix link session and re-establish issues
From: David Miller
In-Reply-To: <20190211062943.4864-1-tuong.t.lien@dektech.com.au>
References: <20190211062943.4864-1-tuong.t.lien@dektech.com.au>

From: Tuong Lien
Date: Mon, 11 Feb 2019 13:29:43 +0700

> When a link endpoint is re-created (e.g. after a node reboot or
> interface reset), its link session number is changed randomly, and the
> peer endpoint will be synced with this new session number before the
> link is re-established.
>
> However, there is a shortcoming in this mechanism that can lead to the
> link never being re-established, or failing shortly afterwards. It
> happens when the peer endpoint is already in ESTABLISHING state, with
> 'peer_session' and the 'in_session' flag already set, but this link
> endpoint suddenly leaves. When it comes back with a random session
> number, two situations are possible:
>
> 1/ If the random session number is larger than (or equal to) the
> previous one, the peer endpoint will be updated with this new session
> upon receipt of a RESET_MSG from this endpoint, and the link can be
> re-established as normal. Otherwise, all the RESET_MSGs from this
> endpoint will be rejected by the peer. In turn, when this link endpoint
> receives one ACTIVATE_MSG from the peer, it will move to ESTABLISHED and
> start to send STATE_MSGs, but again these messages will be dropped by
> the peer due to the wrong session.
> The peer link endpoint can still become ESTABLISHED after receiving a
> traffic message from this endpoint (e.g. a BCAST_PROTOCOL or
> NAME_DISTRIBUTOR message), but since all the STATE_MSGs are invalid, the
> link will be forced down sooner or later!
>
> Even if the random session number is larger than the previous one, the
> ACTIVATE_MSG from the peer may arrive first, so that this link endpoint
> moves quickly to ESTABLISHED without having sent out any RESET_MSG yet.
> Consequently, the peer link will not be updated with the new session
> number, and the same link failure scenario as above will happen.
>
> 2/ In another situation, the peer link endpoint is reset for some reason
> in the meantime: its link state is set to RESET from ESTABLISHING, but it
> is still in session, i.e. the 'in_session' flag is not reset.
> Now, if the random session number from this endpoint is less than the
> previous one, all the RESET_MSGs from this endpoint will be rejected by
> the peer. In the other direction, when this link endpoint receives a
> RESET_MSG from the peer, it moves to ESTABLISHING and starts to send
> ACTIVATE_MSGs, but all these messages will be rejected by the peer too.
> As a result, the link cannot be re-established and gets stuck, with this
> link endpoint in state ESTABLISHING and the peer in RESET!
>
> Solution:
> ===========
>
> This link endpoint should not go directly to ESTABLISHED when getting an
> ACTIVATE_MSG from the peer, since that message may belong to the old
> session if the link was re-created. To ensure that the session is correct
> before the link is re-established, the peer endpoint in ESTABLISHING
> state will send back the last session number in its ACTIVATE_MSG for
> verification at this endpoint. Then, if needed, a new and more
> appropriate session number will be regenerated to force a re-sync first.
>
> In addition, when a link in ESTABLISHING state is reset, its state will
> move to RESET according to the link FSM, and the 'in_session' flag (and
> the other data) will be reset as in a normal link reset; the link will
> also be deleted if requested.
>
> The solution is backward compatible.
>
> Acked-by: Jon Maloy
> Acked-by: Ying Xue
> Signed-off-by: Tuong Lien

Applied.
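For readers following the session logic in the quoted description, here is a minimal C sketch of the receiver-side session check and of the fix as described. It is a hypothetical, simplified model, not the code from the patch: the struct, the state names, the helper functions (sess_less(), session_ok(), rcv_reset(), rcv_activate(), link_reset()) and the use of 16-bit serial-number comparison are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of one link endpoint; these names are
 * illustrative and are NOT the kernel's identifiers. */
enum link_state { LINK_RESET, LINK_ESTABLISHING, LINK_ESTABLISHED };

struct link_ep {
	enum link_state state;
	uint16_t session;      /* our own session number */
	uint16_t peer_session; /* last session accepted from the peer */
	bool in_session;       /* peer_session is valid */
};

/* "a is older than b" using 16-bit serial-number arithmetic (assumed). */
static bool sess_less(uint16_t a, uint16_t b)
{
	return a != b && (uint16_t)(b - a) < 0x8000u;
}

/* Receiver-side check: while in session, a RESET_MSG or ACTIVATE_MSG
 * carrying an older session number is dropped. */
static bool session_ok(const struct link_ep *l, uint16_t msg_session)
{
	return !(l->in_session && sess_less(msg_session, l->peer_session));
}

/* Accept a RESET_MSG: record the sender's session (simplified). */
static void rcv_reset(struct link_ep *l, uint16_t msg_session)
{
	if (!session_ok(l, msg_session))
		return;
	l->peer_session = msg_session;
	l->in_session = true;
	l->state = LINK_ESTABLISHING;
}

/* The fix as described: the ACTIVATE_MSG echoes the session number the
 * peer last accepted from us. On a mismatch we do not go ESTABLISHED;
 * if our session is the older one, jump past the echoed value so that
 * our next RESET_MSG is accepted. Returns true if now ESTABLISHED. */
static bool rcv_activate(struct link_ep *l, uint16_t echoed)
{
	if (echoed != l->session) {
		if (sess_less(l->session, echoed))
			l->session = (uint16_t)(echoed + 1);
		return false; /* stay put and keep sending RESET_MSGs */
	}
	l->state = LINK_ESTABLISHED;
	return true;
}

/* Resetting a link (e.g. one in ESTABLISHING) also clears in_session. */
static void link_reset(struct link_ep *l)
{
	l->state = LINK_RESET;
	l->in_session = false;
}

/* Situation 1/: the peer still holds session 1000 for us, and we come
 * back with the smaller random session 500. */
bool demo_resync(void)
{
	struct link_ep peer = { LINK_ESTABLISHING, 0, 1000, true };
	struct link_ep me = { LINK_RESET, 500, 0, false };

	rcv_reset(&peer, me.session);       /* dropped: 500 is older */
	assert(peer.peer_session == 1000);

	bool up = rcv_activate(&me, peer.peer_session); /* echoes 1000 */
	assert(!up && me.session == 1001);  /* re-synced, not ESTABLISHED */

	rcv_reset(&peer, me.session);       /* accepted now */
	return peer.peer_session == 1001 &&
	       rcv_activate(&me, peer.peer_session);
}

/* Situation 2/: the peer is reset while ESTABLISHING; with in_session
 * cleared, our RESET_MSG with a lower session is no longer rejected. */
bool demo_reset(void)
{
	struct link_ep peer = { LINK_ESTABLISHING, 0, 1000, true };

	link_reset(&peer);
	rcv_reset(&peer, 500);
	return peer.state == LINK_ESTABLISHING && peer.peer_session == 500;
}
```

In this model, demo_resync() returns true once both sides agree on session 1001, and demo_reset() returns true because clearing 'in_session' on reset lets the lower session number through. The serial-number comparison is only one plausible way to order wrapping 16-bit session numbers; it is used here so that regenerating `echoed + 1` stays "newer" even across the 65535 to 0 wrap.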