[Cluster-devel] [gfs2:gfs2-iopen 12/12] fs/gfs2/util.c:126:3: error: implicit declaration of function

cluster-devel.redhat.com archive mirror
 help / color / mirror / Atom feed

From: Andreas Gruenbacher <agruenba@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [gfs2:gfs2-iopen 12/12] fs/gfs2/util.c:126:3: error: implicit declaration of function
Date: Fri, 17 Apr 2020 16:27:26 +0200	[thread overview]
Message-ID: <20200417142726.188190-1-agruenba@redhat.com> (raw)
In-Reply-To: <1706994570.22510265.1587128956368.JavaMail.zimbra@redhat.com>

Hi Bob,

On Fri, Apr 17, 2020 at 3:09 PM Bob Peterson <rpeterso@redhat.com> wrote:
> ----- Original Message -----
> > Bob,
> >
> > commit "gfs2: Force withdraw to replay journals and wait for it to
> > finish" adds three new users of gfs2_glock_dq_wait in function
> > signal_our_withdraw. Is the waiting there done for a reason, or can we
> > replace gfs2_glock_dq_wait with gfs2_glock_dq / gfs2_glock_dq_uninit
> > in that function?
> >
> > Thanks,
> > Andreas
>
> Hi Andreas,
>
> I remember debugging cases in which we needed to wait.
> When we didn't wait for glocks to be demoted, the node just reacquired
> the glocks again too quickly, before other nodes could attempt recovery.
> When the withdrawing node tried to reacquire them, DLM simply granted
> them on the same node, which is the only node that must not do recovery.
>
> Addressing each instance separately:
>
> (1) We would dequeue our journal glock, then dequeue the live glock.
>     The other nodes would see the callback for the "live" glock and
>     attempt recovery, however they were denied access to the journal
>     glock because it was still held on the recovering node. That's
>     because the glock state machine (but more importantly the dlm)
>     had not yet demoted the lock completely when this took place.
>     So none of the nodes performed recovery.

Hmm, sdp->sd_journal_gh has the GL_NOCACHE flag set, so the demote
should happen immediately.  On the other hand, sdp->sd_live_gh doesn't
have that flag set, so that may be the actual problem here.

> (2) We might be able to get rid of the "wait" for the "live" glock.
>     I can't think of a case in which that would behave badly, but
>     bear in mind it's been more than a year since I originally wrote
>     that. It's probably closer to 2 years now.
> (3) We might be able to get rid of the third "wait" which is also for
>     the "live" glock. I don't remember the circumstances.
>
> TBH, I wouldn't feel comfortable getting rid of any of those waits
> until I do some heavy experimentation on my 5-node cluster with the
> recovery tests.

I agree that testing will be needed.  What do you think of the below
patch?

Thanks,
Andreas

---
 fs/gfs2/ops_fstype.c |  2 +-
 fs/gfs2/util.c       | 16 ++++++++--------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index e2b69ffcc6a8..98b2577b815b 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -405,7 +405,7 @@ static int init_locking(struct gfs2_sbd *sdp, struct gfs2_holder *mount_gh,
 	error = gfs2_glock_nq_num(sdp,
 				  GFS2_LIVE_LOCK, &gfs2_nondisk_glops,
 				  LM_ST_SHARED,
-				  LM_FLAG_NOEXP | GL_EXACT,
+				  LM_FLAG_NOEXP | GL_EXACT | GL_NOCACHE,
 				  &sdp->sd_live_gh);
 	if (error) {
 		fs_err(sdp, "can't acquire live glock: %d\n", error);
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index 9b64d40ab379..6e48cef79c53 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -122,10 +122,8 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
 	/*
 	 * Drop the glock for our journal so another node can recover it.
 	 */
-	if (gfs2_holder_initialized(&sdp->sd_journal_gh)) {
-		gfs2_glock_dq_wait(&sdp->sd_journal_gh);
-		gfs2_holder_uninit(&sdp->sd_journal_gh);
-	}
+	if (gfs2_holder_initialized(&sdp->sd_journal_gh))
+		gfs2_glock_dq_uninit(&sdp->sd_journal_gh);
 	sdp->sd_jinode_gh.gh_flags |= GL_NOCACHE;
 	gfs2_glock_dq(&sdp->sd_jinode_gh);
 	if (test_bit(SDF_FS_FROZEN, &sdp->sd_flags)) {
@@ -167,7 +165,7 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
 	 * Dequeue the "live" glock, but keep a reference so it's never freed.
 	 */
 	gfs2_glock_hold(gl);
-	gfs2_glock_dq_wait(&sdp->sd_live_gh);
+	gfs2_glock_dq(&sdp->sd_live_gh);
 	/*
 	 * We enqueue the "live" glock in EX so that all other nodes
 	 * get a demote request and act on it. We don't really want the
@@ -175,7 +173,8 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
 	 */
 	fs_warn(sdp, "Requesting recovery of jid %d.\n",
 		sdp->sd_lockstruct.ls_jid);
-	gfs2_holder_reinit(LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB | LM_FLAG_NOEXP,
+	gfs2_holder_reinit(LM_ST_EXCLUSIVE,
+			   LM_FLAG_TRY_1CB | LM_FLAG_NOEXP | GL_NOCACHE,
 			   &sdp->sd_live_gh);
 	msleep(GL_GLOCK_MAX_HOLD);
 	/*
@@ -199,8 +198,9 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
 		if (gfs2_recover_journal(sdp->sd_jdesc, 1))
 			fs_warn(sdp, "Unable to recover our journal jid %d.\n",
 				sdp->sd_lockstruct.ls_jid);
-		gfs2_glock_dq_wait(&sdp->sd_live_gh);
-		gfs2_holder_reinit(LM_ST_SHARED, LM_FLAG_NOEXP | GL_EXACT,
+		gfs2_glock_dq(&sdp->sd_live_gh);
+		gfs2_holder_reinit(LM_ST_SHARED,
+				   LM_FLAG_NOEXP | GL_EXACT | GL_NOCACHE,
 				   &sdp->sd_live_gh);
 		gfs2_glock_nq(&sdp->sd_live_gh);
 	}

base-commit: bd437e630fee648b79999e617d48bb07ae76f8eb
prerequisite-patch-id: e3b7fbfc1e67b4065cb6c928c29c7ed8bee0fc1c
prerequisite-patch-id: 35e9388872b2d6ffc70c4fc5497d3fdec97d6392
-- 
2.25.2

next prev parent reply	other threads:[~2020-04-17 14:27 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-04-17  7:36 [Cluster-devel] [gfs2:gfs2-iopen 12/12] fs/gfs2/util.c:126:3: error: implicit declaration of function 'gfs2_glock_dq_wait'; did you mean 'gfs2_glock_nq_init'? kbuild test robot
2020-04-17 12:23 ` Andreas Gruenbacher
2020-04-17 13:09   ` Bob Peterson
2020-04-17 14:27     ` Andreas Gruenbacher [this message]
2020-04-17 14:35       ` [Cluster-devel] [gfs2:gfs2-iopen 12/12] fs/gfs2/util.c:126:3: error: implicit declaration of function Bob Peterson

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:e2b69ffcc6a dfblob:98b2577b815 dfblob:9b64d40ab37
dfblob:6e48cef79c5 )
 OR (
bs:"[Cluster-devel] [gfs2:gfs2-iopen 12/12] fs/gfs2/util.c:126:3: error: implicit declaration of function" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200417142726.188190-1-agruenba@redhat.com \
    --to=agruenba@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).