lustre-devel-lustre.org archive mirror
 help / color / mirror / Atom feed
From: James Simmons <jsimmons@infradead.org>
To: Andreas Dilger <adilger@whamcloud.com>,
	Oleg Drokin <green@whamcloud.com>, NeilBrown <neilb@suse.de>
Cc: Chris Horn <chris.horn@hpe.com>,
	Lustre Development List <lustre-devel@lists.lustre.org>
Subject: [lustre-devel] [PATCH 09/15] lnet: Discovery queue and deletion race
Date: Thu, 27 Oct 2022 10:05:36 -0400	[thread overview]
Message-ID: <1666879542-10737-10-git-send-email-jsimmons@infradead.org> (raw)
In-Reply-To: <1666879542-10737-1-git-send-email-jsimmons@infradead.org>

From: Chris Horn <chris.horn@hpe.com>

lnet_peer_deletion() can race with another thread calling
lnet_peer_queue_for_discovery.

Discovery thread:
  - Calls lnet_peer_deletion():
  - LNET_PEER_DISCOVERING bit is cleared from lnet_peer::lp_state
  - releases lnet_peer::lp_lock

Another thread:
  - Acquires lnet_net_lock/EX
  - Calls lnet_peer_queue_for_discovery()
  - Takes lnet_peer::lp_lock
  - Sets LNET_PEER_DISCOVERING bit
  - Releases lnet_peer::lp_lock
  - Sees lnet_peer::lp_dc_list is not empty, so it does not add peer
    to dc request queue
  - lnet_peer_queue_for_discovery() returns, lnet_net_lock/EX releases

Discovery thread:
  - Acquires lnet_net_lock/EX
  - Deletes peer from ln_dc_working list
  - performs the peer deletion

At this point, the peer is not on any discovery list, and it has
LNET_PEER_DISCOVERING bit set. This peer is now stranded, and any
messages on the peer's lnet_peer::lp_dc_pendq are likewise stranded.

To solve this, we modify lnet_peer_deletion() so that it waits to
clear the LNET_PEER_DISCOVERING bit until it has completed deleting
the peer and re-acquired the lnet_peer::lp_lock. This ensures we
cannot race with any other thread that may add the
LNET_PEER_DISCOVERING bit back to the peer. We also avoid deleting
the peer from the ln_dc_working list in lnet_peer_deletion(). This is
already done by lnet_peer_discovery_complete().

There is another window where the LNET_PEER_DISCOVERING bit can be
added when the discovery thread drops the lp_lock just before
acquiring the net_lock/EX and calling lnet_peer_discovery_complete().
Have lnet_peer_discovery_complete() clear LNET_PEER_DISCOVERING to
deal with this (it already does this for the case where discovery hit
an error). Also move the deletion of lp_dc_list to after we clear the
DISCOVERING bit. This is to mirror the behavior of
lnet_peer_queue_for_discovery() which sets the DISCOVERING bit and
then manipulates the lp_dc_list.

Also tweak the logic in lnet_peer_deletion() to call
lnet_peer_del_locked() in order to avoid extra calls to
lnet_net_lock()/lnet_net_unlock().

HPE-bug-id: LUS-11237
WC-bug-id: https://jira.whamcloud.com/browse/LU-16149
Lustre-commit: a966b624ac76e34e8 ("LU-16149 lnet: Discovery queue and deletion race")
Signed-off-by: Chris Horn <chris.horn@hpe.com>
Reviewed-on: https://review.whamcloud.com/c/fs/lustre-release/+/48532
Reviewed-by: Frank Sehr <fsehr@whamcloud.com>
Reviewed-by: Cyril Bordage <cbordage@whamcloud.com>
Reviewed-by: Oleg Drokin <green@whamcloud.com>
Signed-off-by: James Simmons <jsimmons@infradead.org>
---
 net/lnet/lnet/peer.c | 39 ++++++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/net/lnet/lnet/peer.c b/net/lnet/lnet/peer.c
index 9b20660..d8d1857 100644
--- a/net/lnet/lnet/peer.c
+++ b/net/lnet/lnet/peer.c
@@ -2206,15 +2206,19 @@ static void lnet_peer_discovery_complete(struct lnet_peer *lp, int dc_error)
 	CDEBUG(D_NET, "Discovery complete. Dequeue peer %s\n",
 	       libcfs_nidstr(&lp->lp_primary_nid));
 
-	list_del_init(&lp->lp_dc_list);
 	spin_lock(&lp->lp_lock);
+	/* Our caller dropped lp_lock which may have allowed another thread to
+	 * set LNET_PEER_DISCOVERING, or it may be set if dc_error is non-zero.
+	 * Ensure it is cleared.
+	 */
+	lp->lp_state &= ~LNET_PEER_DISCOVERING;
 	if (dc_error) {
 		lp->lp_dc_error = dc_error;
-		lp->lp_state &= ~LNET_PEER_DISCOVERING;
 		lp->lp_state |= LNET_PEER_REDISCOVER;
 	}
 	list_splice_init(&lp->lp_dc_pendq, &pending_msgs);
 	spin_unlock(&lp->lp_lock);
+	list_del_init(&lp->lp_dc_list);
 	wake_up(&lp->lp_dc_waitq);
 
 	if (lp->lp_rtr_refcount > 0)
@@ -3152,18 +3156,16 @@ static int lnet_peer_deletion(struct lnet_peer *lp)
 	struct list_head rlist;
 	struct lnet_route *route, *tmp;
 	int sensitivity = lp->lp_health_sensitivity;
-	int rc;
+	int rc = 0;
 
 	INIT_LIST_HEAD(&rlist);
 
-	lp->lp_state &= ~(LNET_PEER_DISCOVERING | LNET_PEER_FORCE_PING |
-			  LNET_PEER_FORCE_PUSH);
 	CDEBUG(D_NET, "peer %s(%p) state %#x\n",
 	       libcfs_nidstr(&lp->lp_primary_nid), lp, lp->lp_state);
 
 	/* no-op if lnet_peer_del() has already been called on this peer */
 	if (lp->lp_state & LNET_PEER_MARK_DELETED)
-		return 0;
+		goto clear_discovering;
 
 	spin_unlock(&lp->lp_lock);
 
@@ -3172,28 +3174,25 @@ static int lnet_peer_deletion(struct lnet_peer *lp)
 	    the_lnet.ln_dc_state != LNET_DC_STATE_RUNNING) {
 		mutex_unlock(&the_lnet.ln_api_mutex);
 		spin_lock(&lp->lp_lock);
-		return -ESHUTDOWN;
+		rc = -ESHUTDOWN;
+		goto clear_discovering;
 	}
 
+	lnet_peer_cancel_discovery(lp);
 	lnet_net_lock(LNET_LOCK_EX);
-	/* remove the peer from the discovery work
-	 * queue if it's on there in preparation
-	 * of deleting it.
-	 */
-	if (!list_empty(&lp->lp_dc_list))
-		list_del_init(&lp->lp_dc_list);
 	list_for_each_entry_safe(route, tmp,
 				 &lp->lp_routes,
 				 lr_gwlist)
 		lnet_move_route(route, NULL, &rlist);
-	lnet_net_unlock(LNET_LOCK_EX);
 
-	/* lnet_peer_del() deletes all the peer NIs owned by this peer */
-	rc = lnet_peer_del(lp);
+	/* lnet_peer_del_locked() deletes all the peer NIs owned by this peer */
+	rc = lnet_peer_del_locked(lp);
 	if (rc)
 		CNETERR("Internal error: Unable to delete peer %s rc %d\n",
 			libcfs_nidstr(&lp->lp_primary_nid), rc);
 
+	lnet_net_unlock(LNET_LOCK_EX);
+
 	list_for_each_entry_safe(route, tmp,
 				 &rlist, lr_list) {
 		/* re-add these routes */
@@ -3209,7 +3208,13 @@ static int lnet_peer_deletion(struct lnet_peer *lp)
 
 	spin_lock(&lp->lp_lock);
 
-	return 0;
+	rc = 0;
+
+clear_discovering:
+	lp->lp_state &= ~(LNET_PEER_DISCOVERING | LNET_PEER_FORCE_PING |
+			  LNET_PEER_FORCE_PUSH);
+
+	return rc;
 }
 
 /*
-- 
1.8.3.1

_______________________________________________
lustre-devel mailing list
lustre-devel@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-devel-lustre.org

  parent reply	other threads:[~2022-10-27 14:21 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-27 14:05 [lustre-devel] [PATCH 00/15] lustre: sync to OpenSFS Oct 27, 2022 James Simmons
2022-10-27 14:05 ` [lustre-devel] [PATCH 01/15] lnet: o2iblnd: Avoid NULL md deref James Simmons
2022-10-27 14:05 ` [lustre-devel] [PATCH 02/15] lnet: support IPv6 in lnet_inet_enumerate() James Simmons
2022-10-27 14:05 ` [lustre-devel] [PATCH 03/15] lustre: sec: retry ro mount if read-only flag set James Simmons
2022-10-27 14:05 ` [lustre-devel] [PATCH 04/15] lustre: ptlrpc: reduce lock contention in ptlrpc_free_committed James Simmons
2022-10-27 14:05 ` [lustre-devel] [PATCH 05/15] lustre: llite: only statfs for projid if PROJINHERIT set James Simmons
2022-10-27 14:05 ` [lustre-devel] [PATCH 06/15] lustre: llite: revert: "lustre: llite: prevent mulitple group locks" James Simmons
2022-10-27 14:05 ` [lustre-devel] [PATCH 07/15] lustre: ldlm: group lock fix James Simmons
2022-10-27 14:05 ` [lustre-devel] [PATCH 08/15] lustre: llite: adjust read count as file got truncated James Simmons
2022-10-27 14:05 ` James Simmons [this message]
2022-10-27 14:05 ` [lustre-devel] [PATCH 10/15] lustre: statahead: avoid to block ptlrpcd interpret context James Simmons
2022-10-27 14:05 ` [lustre-devel] [PATCH 11/15] lnet: add mechanism for dumping lnd peer debug info James Simmons
2022-10-27 14:05 ` [lustre-devel] [PATCH 12/15] lnet: ksocklnd: fix irq lock inversion while calling sk_data_ready() James Simmons
2022-10-27 14:05 ` [lustre-devel] [PATCH 13/15] lustre: obdclass: fix race in class_del_profile James Simmons
2022-10-27 14:05 ` [lustre-devel] [PATCH 14/15] lnet: use 'fallthrough' pseudo keyword for switch James Simmons
2022-10-27 14:05 ` [lustre-devel] [PATCH 15/15] lustre: " James Simmons

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1666879542-10737-10-git-send-email-jsimmons@infradead.org \
    --to=jsimmons@infradead.org \
    --cc=adilger@whamcloud.com \
    --cc=chris.horn@hpe.com \
    --cc=green@whamcloud.com \
    --cc=lustre-devel@lists.lustre.org \
    --cc=neilb@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).