lustre-devel-lustre.org archive mirror
 help / color / mirror / Atom feed
From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
	Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Amir Shehata <ashehata@whamcloud.com>,
	Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 01/24] lnet: Lock primary NID logic
Date: Tue, 21 Sep 2021 22:19:38 -0400	[thread overview]
Message-ID: <1632277201-6920-2-git-send-email-jsimmons@infradead.org> (raw)
In-Reply-To: <1632277201-6920-1-git-send-email-jsimmons@infradead.org>

From: Amir Shehata <ashehata@whamcloud.com>

If a peer is created by Lustre make sure to lock that peer's
primary NID. This peer can be discovered in the background.
There is no need to block until discovery is complete, as Lustre
can continue on with the primary NID it provided.

Discovery will populate the peer with other interfaces the peer has
but will not change the peer's primary NID. It can also delete
peer's NIDs which Lustre told it about (not the Primary NID).

WC-bug-id: https://jira.whamcloud.com/browse/LU-14668
Lustre-commit: 024f9303bc6f32a31 ("LU-14668 lnet: Lock primary NID logic")
Signed-off-by: Amir Shehata <ashehata@whamcloud.com>
Reviewed-on: https://review.whamcloud.com/43563
Reviewed-by: Serguei Smirnov <ssmirnov@whamcloud.com>
Reviewed-by: Alexander Boyko <alexander.boyko@hpe.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 69 ++++++++++++++++++++++++++++++++++++++++------------
 1 file changed, 53 insertions(+), 16 deletions(-)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index c2f5d8b..720af99 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -530,6 +530,16 @@ static void lnet_peer_cancel_discovery(struct lnet_peer *lp)
 			goto out;
 		}
 	}
+
+	/* If we're asked to lock down the primary NID we shouldn't be
+	 * deleting it
+	 */
+	if (lp->lp_state & LNET_PEER_LOCK_PRIMARY &&
+	    primary_nid == nid) {
+		rc = -EPERM;
+		goto out;
+	}
+
 	lpni = lnet_find_peer_ni_locked(nid);
 	if (!lpni) {
 		rc = -ENOENT;
@@ -1388,13 +1398,18 @@ struct lnet_peer_ni *
 	 * down then this discovery can introduce long delays into the mount
 	 * process, so skip it if it isn't necessary.
 	 */
-	while (!lnet_peer_discovery_disabled && !lnet_peer_is_uptodate(lp)) {
+	if (!lnet_peer_discovery_disabled && !lnet_peer_is_uptodate(lp)) {
 		spin_lock(&lp->lp_lock);
 		/* force a full discovery cycle */
-		lp->lp_state |= LNET_PEER_FORCE_PING | LNET_PEER_FORCE_PUSH;
+		lp->lp_state |= LNET_PEER_FORCE_PING | LNET_PEER_FORCE_PUSH |
+				LNET_PEER_LOCK_PRIMARY;
 		spin_unlock(&lp->lp_lock);
 
-		rc = lnet_discover_peer_locked(lpni, cpt, true);
+		/* start discovery in the background. Messages to that
+		 * peer will not go through until the discovery is
+		 * complete
+		 */
+		rc = lnet_discover_peer_locked(lpni, cpt, false);
 		if (rc)
 			goto out_decref;
 		/* The lpni (or lp) for this NID may have changed and our ref is
@@ -1408,14 +1423,6 @@ struct lnet_peer_ni *
 			goto out_unlock;
 		}
 		lp = lpni->lpni_peer_net->lpn_peer;
-
-		/* If we find that the peer has discovery disabled then we will
-		 * not modify whatever primary NID is currently set for this
-		 * peer. Thus, we can break out of this loop even if the peer
-		 * is not fully up to date.
-		 */
-		if (lnet_is_discovery_disabled(lp))
-			break;
 	}
 	primary_nid = lp->lp_primary_nid;
 out_decref:
@@ -1522,6 +1529,8 @@ struct lnet_peer_net *
 			lnet_peer_clr_non_mr_pref_nids(lp);
 		}
 	}
+	if (flags & LNET_PEER_LOCK_PRIMARY)
+		lp->lp_state |= LNET_PEER_LOCK_PRIMARY;
 	spin_unlock(&lp->lp_lock);
 
 	lp->lp_nnis++;
@@ -1676,9 +1685,27 @@ struct lnet_peer_net *
 		}
 		/* If this is the primary NID, destroy the peer. */
 		if (lnet_peer_ni_is_primary(lpni)) {
-			struct lnet_peer *rtr_lp =
+			struct lnet_peer *lp2 =
 				lpni->lpni_peer_net->lpn_peer;
-			int rtr_refcount = rtr_lp->lp_rtr_refcount;
+			int rtr_refcount = lp2->lp_rtr_refcount;
+
+			/* If the new peer that this NID belongs to is
+			 * a primary NID for another peer which we're
+			 * suppose to preserve the Primary for then we
+			 * don't want to mess with it. But the
+			 * configuration is wrong at this point, so we
+			 * should flag both of these peers as in a bad
+			 * state
+			 */
+			if (lp2->lp_state & LNET_PEER_LOCK_PRIMARY) {
+				spin_lock(&lp->lp_lock);
+				lp->lp_state |= LNET_PEER_BAD_CONFIG;
+				spin_unlock(&lp->lp_lock);
+				spin_lock(&lp2->lp_lock);
+				lp2->lp_state |= LNET_PEER_BAD_CONFIG;
+				spin_unlock(&lp2->lp_lock);
+				goto out_free_lpni;
+			}
 
 			/* if we're trying to delete a router it means
 			 * we're moving this peer NI to a new peer so must
@@ -1686,9 +1713,9 @@ struct lnet_peer_net *
 			 */
 			if (rtr_refcount > 0) {
 				flags |= LNET_PEER_RTR_NI_FORCE_DEL;
-				lnet_rtr_transfer_to_peer(rtr_lp, lp);
+				lnet_rtr_transfer_to_peer(lp2, lp);
 			}
-			lnet_peer_del(lpni->lpni_peer_net->lpn_peer);
+			lnet_peer_del(lp2);
 			lnet_peer_ni_decref_locked(lpni);
 			lpni = lnet_peer_ni_alloc(nid);
 			if (!lpni) {
@@ -1746,7 +1773,8 @@ struct lnet_peer_net *
 	if (lp->lp_primary_nid == nid)
 		goto out;
 
-	lp->lp_primary_nid = nid;
+	if (!(lp->lp_state & LNET_PEER_LOCK_PRIMARY))
+		lp->lp_primary_nid = nid;
 
 	rc = lnet_peer_add_nid(lp, nid, flags);
 	if (rc) {
@@ -1754,8 +1782,17 @@ struct lnet_peer_net *
 		goto out;
 	}
 out:
+	/* if this is a configured peer or the primary for that peer has
+	 * been locked, then we don't want to flag this scenario as
+	 * a failure
+	 */
+	if (lp->lp_state & LNET_PEER_CONFIGURED ||
+	    lp->lp_state & LNET_PEER_LOCK_PRIMARY)
+		return 0;
+
 	CDEBUG(D_NET, "peer %s NID %s: %d\n",
 	       libcfs_nid2str(old), libcfs_nid2str(nid), rc);
+
 	return rc;
 }
 
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

  reply	other threads:[~2021-09-22  2:20 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-22  2:19 [lustre-devel] [PATCH 00/24] lustre: Update to OpenSFS Sept 21, 2021 James Simmons
2021-09-22  2:19 ` James Simmons [this message]
2021-09-22  2:19 ` [lustre-devel] [PATCH 02/24] lustre: quota: enforce block quota for chgrp James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 03/24] lnet: introduce struct lnet_nid James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 04/24] lnet: add string formating/parsing for IPv6 nids James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 05/24] lnet: change lpni_nid in lnet_peer_ni to lnet_nid James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 06/24] lnet: change lp_primary_nid to struct lnet_nid James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 07/24] lnet: change lp_disc_*_nid " James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 08/24] lnet: socklnd: factor out key calculation for ksnd_peers James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 09/24] lnet: introduce lnet_processid for ksock_peer_ni James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 10/24] lnet: enhance connect/accept to support large addr James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 11/24] lnet: change lr_nid to struct lnet_nid James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 12/24] lnet: extend rspt_next_hop_nid in lnet_rsp_tracker James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 13/24] lustre: ptlrpc: two replay lock threads James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 14/24] lustre: llite: Always do lookup on ENOENT in open James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 15/24] lustre: llite: Remove inode locking in ll_fsync James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 16/24] lnet: socklnd: fix link state detection James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 17/24] lustre: llite: check read only mount for setquota James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 18/24] lustre: llite: don't touch vma after filemap_fault James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 19/24] lnet: Check for -ESHUTDOWN in lnet_parse James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 20/24] lustre: obdclass: EAGAIN after rhashtable_walk_next() James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 21/24] lustre: sec: filename encryption James Simmons
2021-09-22  2:19 ` [lustre-devel] [PATCH 22/24] lustre: uapi: fixup UAPI headers for native Linux client James Simmons
2021-09-22  2:20 ` [lustre-devel] [PATCH 23/24] lustre: ptlrpc: separate out server code for wiretest James Simmons
2021-09-22  2:20 ` [lustre-devel] [PATCH 24/24] lustre: pcc: VM_WRITE should not trigger layout write James Simmons

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1632277201-6920-2-git-send-email-jsimmons@infradead.org \
    --to=jsimmons@infradead.org \
    --cc=adilger@whamcloud.com \
    --cc=ashehata@whamcloud.com \
    --cc=green@whamcloud.com \
    --cc=lustre-devel@lists.lustre.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).