From mboxrd@z Thu Jan 1 00:00:00 1970
From: Andreas Gruenbacher
Date: Fri, 17 Apr 2020 16:27:26 +0200
Subject: [Cluster-devel] [gfs2:gfs2-iopen 12/12] fs/gfs2/util.c:126:3: error: implicit declaration of function
In-Reply-To: <1706994570.22510265.1587128956368.JavaMail.zimbra@redhat.com>
References: <202004171521.Z9s8gKSV%lkp@intel.com>
 <1706994570.22510265.1587128956368.JavaMail.zimbra@redhat.com>
Message-ID: <20200417142726.188190-1-agruenba@redhat.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi Bob,

On Fri, Apr 17, 2020 at 3:09 PM Bob Peterson wrote:
> ----- Original Message -----
> > Bob,
> >
> > commit "gfs2: Force withdraw to replay journals and wait for it to
> > finish" adds three new users of gfs2_glock_dq_wait in function
> > signal_our_withdraw. Is the waiting there done for a reason, or can we
> > replace gfs2_glock_dq_wait with gfs2_glock_dq / gfs2_glock_dq_uninit
> > in that function?
> >
> > Thanks,
> > Andreas
>
> Hi Andreas,
>
> I remember debugging cases in which we needed to wait.
> When we didn't wait for glocks to be demoted, the node just reacquired
> the glocks again too quickly, before other nodes could attempt recovery.
> When the withdrawing node tried to reacquire them, DLM simply granted
> them on the same node, which is the only node that must not do recovery.
>
> Addressing each instance separately:
>
> (1) We would dequeue our journal glock, then dequeue the live glock.
>     The other nodes would see the callback for the "live" glock and
>     attempt recovery, however they were denied access to the journal
>     glock because it was still held on the recovering node. That's
>     because the glock state machine (but more importantly the dlm)
>     had not yet demoted the lock completely when this took place.
>     So none of the nodes performed recovery.

Hmm, sdp->sd_journal_gh has the GL_NOCACHE flag set, so the demote
should happen immediately. On the other hand, sdp->sd_live_gh doesn't
have that flag set, so that may be the actual problem here.

> (2) We might be able to get rid of the "wait" for the "live" glock.
>     I can't think of a case in which that would behave badly, but
>     bear in mind it's been more than a year since I originally wrote
>     that. It's probably closer to 2 years now.
>
> (3) We might be able to get rid of the third "wait" which is also for
>     the "live" glock. I don't remember the circumstances.
>
> TBH, I wouldn't feel comfortable getting rid of any of those waits
> until I do some heavy experimentation on my 5-node cluster with the
> recovery tests.

I agree that testing will be needed. What do you think of the below
patch?
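
For context: as far as I remember (please double-check against
fs/gfs2/glock.c), gfs2_glock_dq_uninit() is just the non-waiting dequeue
followed by tearing down the holder, roughly:

	/* Sketch of the helper the first util.c hunk switches to; the
	 * real definition lives in fs/gfs2/glock.c. */
	void gfs2_glock_dq_uninit(struct gfs2_holder *gh)
	{
		gfs2_glock_dq(gh);	/* dequeue without waiting for the demote */
		gfs2_holder_uninit(gh);	/* drop the holder's glock reference */
	}

so the first util.c hunk below amounts to dropping the wait and nothing
else; the GL_NOCACHE flag is what should make the demote happen right
away.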
Thanks,
Andreas

---
 fs/gfs2/ops_fstype.c |  2 +-
 fs/gfs2/util.c       | 16 ++++++++--------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index e2b69ffcc6a8..98b2577b815b 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -405,7 +405,7 @@ static int init_locking(struct gfs2_sbd *sdp, struct gfs2_holder *mount_gh,
 	error = gfs2_glock_nq_num(sdp,
 				  GFS2_LIVE_LOCK, &gfs2_nondisk_glops,
 				  LM_ST_SHARED,
-				  LM_FLAG_NOEXP | GL_EXACT,
+				  LM_FLAG_NOEXP | GL_EXACT | GL_NOCACHE,
 				  &sdp->sd_live_gh);
 	if (error) {
 		fs_err(sdp, "can't acquire live glock: %d\n", error);
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index 9b64d40ab379..6e48cef79c53 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -122,10 +122,8 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
 	/*
 	 * Drop the glock for our journal so another node can recover it.
 	 */
-	if (gfs2_holder_initialized(&sdp->sd_journal_gh)) {
-		gfs2_glock_dq_wait(&sdp->sd_journal_gh);
-		gfs2_holder_uninit(&sdp->sd_journal_gh);
-	}
+	if (gfs2_holder_initialized(&sdp->sd_journal_gh))
+		gfs2_glock_dq_uninit(&sdp->sd_journal_gh);
 	sdp->sd_jinode_gh.gh_flags |= GL_NOCACHE;
 	gfs2_glock_dq(&sdp->sd_jinode_gh);
 	if (test_bit(SDF_FS_FROZEN, &sdp->sd_flags)) {
@@ -167,7 +165,7 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
 	 * Dequeue the "live" glock, but keep a reference so it's never freed.
 	 */
 	gfs2_glock_hold(gl);
-	gfs2_glock_dq_wait(&sdp->sd_live_gh);
+	gfs2_glock_dq(&sdp->sd_live_gh);
 	/*
 	 * We enqueue the "live" glock in EX so that all other nodes
 	 * get a demote request and act on it. We don't really want the
@@ -175,7 +173,8 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
 	 */
 	fs_warn(sdp, "Requesting recovery of jid %d.\n",
 		sdp->sd_lockstruct.ls_jid);
-	gfs2_holder_reinit(LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB | LM_FLAG_NOEXP,
+	gfs2_holder_reinit(LM_ST_EXCLUSIVE,
+			   LM_FLAG_TRY_1CB | LM_FLAG_NOEXP | GL_NOCACHE,
 			   &sdp->sd_live_gh);
 	msleep(GL_GLOCK_MAX_HOLD);
 	/*
@@ -199,8 +198,9 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
 		if (gfs2_recover_journal(sdp->sd_jdesc, 1))
 			fs_warn(sdp, "Unable to recover our journal jid %d.\n",
 				sdp->sd_lockstruct.ls_jid);
-		gfs2_glock_dq_wait(&sdp->sd_live_gh);
-		gfs2_holder_reinit(LM_ST_SHARED, LM_FLAG_NOEXP | GL_EXACT,
+		gfs2_glock_dq(&sdp->sd_live_gh);
+		gfs2_holder_reinit(LM_ST_SHARED,
+				   LM_FLAG_NOEXP | GL_EXACT | GL_NOCACHE,
 				   &sdp->sd_live_gh);
 		gfs2_glock_nq(&sdp->sd_live_gh);
 	}

base-commit: bd437e630fee648b79999e617d48bb07ae76f8eb
prerequisite-patch-id: e3b7fbfc1e67b4065cb6c928c29c7ed8bee0fc1c
prerequisite-patch-id: 35e9388872b2d6ffc70c4fc5497d3fdec97d6392
-- 
2.25.2