lustre-devel-lustre.org archive mirror
 help / color / mirror / Atom feed
From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
	Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Serguei Smirnov <ssmirnov@whamcloud.com>,
	Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 14/15] lnet: don't use hops to determine the route state
Date: Mon,  8 Nov 2021 10:07:42 -0500	[thread overview]
Message-ID: <1636384063-13838-15-git-send-email-jsimmons@infradead.org> (raw)
In-Reply-To: <1636384063-13838-1-git-send-email-jsimmons@infradead.org>

From: Serguei Smirnov <ssmirnov@whamcloud.com>

NodeA <-tcp1-> GW1 <-tcp2-> GW2 <-tcp3-> NodeB

Assuming GW1 knows how to reach tcp3 network and GW2 knows
how to reach tcp1 network, it should be possible to add routes
without specifying hop=2 on nodes A and B to reach tcp3 and tcp1
respectively and then be able to lnetctl ping between them.
Changes introduced by LU-13785 interpret default hops to be
equivalent to hop=1 set explicitly for the purpose of determining
route aliveness, which results in the routes created as described
above to be considered "down".

Fix it so that default hop setting doesn't prevent
the multi-hop scenario from working.

Fixes: 64d703ca18 ("lnet: Use lr_hops for avoid_asym_router_failure")
WC-bug-id: https://jira.whamcloud.com/browse/LU-14945
Lustre-commit: 3f2844dc9333c8645 ("LU-14945 lnet: don't use hops to determine the route state")
Signed-off-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/44674
Reviewed-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-by: Chris Horn <chris.horn@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/router.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/net/lnet/lnet/router.c b/net/lnet/lnet/router.c
index 7ce33eb..97e5ab2 100644
--- a/net/lnet/lnet/router.c
+++ b/net/lnet/lnet/router.c
@@ -318,7 +318,7 @@ bool lnet_is_route_alive(struct lnet_route *route)
 	 * routes the next-hop will not have the remote net.
 	 */
 	if (avoid_asym_router_failure &&
-	    (route->lr_hops == 1 || route->lr_hops == LNET_UNDEFINED_HOPS)) {
+	    (route->lr_hops == 1 || route->lr_single_hop)) {
 		rlpn = lnet_peer_get_net_locked(gw, route->lr_net);
 		if (!rlpn)
 			return false;
@@ -470,8 +470,7 @@ bool lnet_is_route_alive(struct lnet_route *route)
 
 		route->lr_single_hop = single_hop;
 		if (avoid_asym_router_failure &&
-		    (route->lr_hops == 1 ||
-		     route->lr_hops == LNET_UNDEFINED_HOPS))
+		    (route->lr_hops == 1 || route->lr_single_hop))
 			lnet_set_route_aliveness(route, net_up);
 		else
 			lnet_set_route_aliveness(route, true);
@@ -764,6 +763,14 @@ static void lnet_shuffle_seed(void)
 	lnet_peer_ni_decref_locked(lpni);
 	lnet_net_unlock(LNET_LOCK_EX);
 
+	/* If avoid_asym_router_failure is enabled and hop count is not
+	 * set to 1 for a route that is actually single-hop, then the
+	 * feature will fail to prevent the router from being selected
+	 * if it is missing a NI on the remote network due to misconfiguration.
+	 */
+	if (avoid_asym_router_failure && hops == LNET_UNDEFINED_HOPS)
+		CWARN("Use hops = 1 for a single-hop route when avoid_asym_router_failure feature is enabled\n");
+
 	rc = 0;
 
 	if (!add_route) {
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

  parent reply	other threads:[~2021-11-08 15:08 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-08 15:07 [lustre-devel] [PATCH 00/15] lustre: update to OpenSFS tree Nov 8, 2021 James Simmons
2021-11-08 15:07 ` [lustre-devel] [PATCH 01/15] lustre: sec: keep encryption context in xattr cache James Simmons
2021-11-08 15:07 ` [lustre-devel] [PATCH 02/15] lustre: mdc: add support for grant shrink James Simmons
2021-11-08 15:07 ` [lustre-devel] [PATCH 03/15] lnet: Fix reference leak in lnet_parse James Simmons
2021-11-08 15:07 ` [lustre-devel] [PATCH 04/15] lnet: socklnd: lock ksnc_tx_queue list processing James Simmons
2021-11-08 15:07 ` [lustre-devel] [PATCH 05/15] lustre: ptlrpc: align function names with param names James Simmons
2021-11-08 15:07 ` [lustre-devel] [PATCH 06/15] lnet: don't retry allocating router buffers James Simmons
2021-11-08 15:07 ` [lustre-devel] [PATCH 07/15] lustre: ptlrpc: recalc timer on EINPROGRESS reply James Simmons
2021-11-08 15:07 ` [lustre-devel] [PATCH 08/15] lustre: obdclass: add start time to stats files James Simmons
2021-11-08 15:07 ` [lustre-devel] [PATCH 09/15] lustre: dne: dir migrate in QOS mode James Simmons
2021-11-08 15:07 ` [lustre-devel] [PATCH 10/15] lustre: lov: fix error handling in lov_new_pool James Simmons
2021-11-08 15:07 ` [lustre-devel] [PATCH 11/15] lustre: vfs: set_nlink() is not race-safe James Simmons
2021-11-08 15:07 ` [lustre-devel] [PATCH 12/15] lustre: ptlrpc: remove LASSERT in nrs_polices debugfs handler James Simmons
2021-11-08 15:07 ` [lustre-devel] [PATCH 13/15] lnet: socklnd: default conns_per_peer to 0 James Simmons
2021-11-08 15:07 ` James Simmons [this message]
2021-11-08 15:07 ` [lustre-devel] [PATCH 15/15] lustre: lmv: update default LMV upon any change James Simmons

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1636384063-13838-15-git-send-email-jsimmons@infradead.org \
    --to=jsimmons@infradead.org \
    --cc=adilger@whamcloud.com \
    --cc=green@whamcloud.com \
    --cc=lustre-devel@lists.lustre.org \
    --cc=neilb@suse.de \
    --cc=ssmirnov@whamcloud.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).