From: Andreas Gruenbacher <agruenba@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [gfs2:gfs2-iopen 12/12] fs/gfs2/util.c:126:3: error: implicit declaration of function
Date: Fri, 17 Apr 2020 16:27:26 +0200 [thread overview]
Message-ID: <20200417142726.188190-1-agruenba@redhat.com> (raw)
In-Reply-To: <1706994570.22510265.1587128956368.JavaMail.zimbra@redhat.com>
Hi Bob,
On Fri, Apr 17, 2020 at 3:09 PM Bob Peterson <rpeterso@redhat.com> wrote:
> ----- Original Message -----
> > Bob,
> >
> > commit "gfs2: Force withdraw to replay journals and wait for it to
> > finish" adds three new users of gfs2_glock_dq_wait in function
> > signal_our_withdraw. Is the waiting there done for a reason, or can we
> > replace gfs2_glock_dq_wait with gfs2_glock_dq / gfs2_glock_dq_uninit
> > in that function?
> >
> > Thanks,
> > Andreas
>
> Hi Andreas,
>
> I remember debugging cases in which we needed to wait.
> When we didn't wait for glocks to be demoted, the node just reacquired
> the glocks again too quickly, before other nodes could attempt recovery.
> When the withdrawing node tried to reacquire them, DLM simply granted
> them on the same node, which is the only node that must not do recovery.
>
> Addressing each instance separately:
>
> (1) We would dequeue our journal glock, then dequeue the live glock.
> The other nodes would see the callback for the "live" glock and
> attempt recovery, however they were denied access to the journal
> glock because it was still held on the recovering node. That's
> because the glock state machine (but more importantly the dlm)
> had not yet demoted the lock completely when this took place.
> So none of the nodes performed recovery.
Hmm, sdp->sd_journal_gh has the GL_NOCACHE flag set, so the demote
should happen immediately. On the other hand, sdp->sd_live_gh doesn't
have that flag set, so that may be the actual problem here.
> (2) We might be able to get rid of the "wait" for the "live" glock.
> I can't think of a case in which that would behave badly, but
> bear in mind it's been more than a year since I originally wrote
> that. It's probably closer to 2 years now.
> (3) We might be able to get rid of the third "wait" which is also for
> the "live" glock. I don't remember the circumstances.
>
> TBH, I wouldn't feel comfortable getting rid of any of those waits
> until I do some heavy experimentation on my 5-node cluster with the
> recovery tests.
I agree that testing will be needed. What do you think of the below
patch?
Thanks,
Andreas
---
fs/gfs2/ops_fstype.c | 2 +-
fs/gfs2/util.c | 16 ++++++++--------
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c
index e2b69ffcc6a8..98b2577b815b 100644
--- a/fs/gfs2/ops_fstype.c
+++ b/fs/gfs2/ops_fstype.c
@@ -405,7 +405,7 @@ static int init_locking(struct gfs2_sbd *sdp, struct gfs2_holder *mount_gh,
error = gfs2_glock_nq_num(sdp,
GFS2_LIVE_LOCK, &gfs2_nondisk_glops,
LM_ST_SHARED,
- LM_FLAG_NOEXP | GL_EXACT,
+ LM_FLAG_NOEXP | GL_EXACT | GL_NOCACHE,
&sdp->sd_live_gh);
if (error) {
fs_err(sdp, "can't acquire live glock: %d\n", error);
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index 9b64d40ab379..6e48cef79c53 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -122,10 +122,8 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
/*
* Drop the glock for our journal so another node can recover it.
*/
- if (gfs2_holder_initialized(&sdp->sd_journal_gh)) {
- gfs2_glock_dq_wait(&sdp->sd_journal_gh);
- gfs2_holder_uninit(&sdp->sd_journal_gh);
- }
+ if (gfs2_holder_initialized(&sdp->sd_journal_gh))
+ gfs2_glock_dq_uninit(&sdp->sd_journal_gh);
sdp->sd_jinode_gh.gh_flags |= GL_NOCACHE;
gfs2_glock_dq(&sdp->sd_jinode_gh);
if (test_bit(SDF_FS_FROZEN, &sdp->sd_flags)) {
@@ -167,7 +165,7 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
* Dequeue the "live" glock, but keep a reference so it's never freed.
*/
gfs2_glock_hold(gl);
- gfs2_glock_dq_wait(&sdp->sd_live_gh);
+ gfs2_glock_dq(&sdp->sd_live_gh);
/*
* We enqueue the "live" glock in EX so that all other nodes
* get a demote request and act on it. We don't really want the
@@ -175,7 +173,8 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
*/
fs_warn(sdp, "Requesting recovery of jid %d.\n",
sdp->sd_lockstruct.ls_jid);
- gfs2_holder_reinit(LM_ST_EXCLUSIVE, LM_FLAG_TRY_1CB | LM_FLAG_NOEXP,
+ gfs2_holder_reinit(LM_ST_EXCLUSIVE,
+ LM_FLAG_TRY_1CB | LM_FLAG_NOEXP | GL_NOCACHE,
&sdp->sd_live_gh);
msleep(GL_GLOCK_MAX_HOLD);
/*
@@ -199,8 +198,9 @@ static void signal_our_withdraw(struct gfs2_sbd *sdp)
if (gfs2_recover_journal(sdp->sd_jdesc, 1))
fs_warn(sdp, "Unable to recover our journal jid %d.\n",
sdp->sd_lockstruct.ls_jid);
- gfs2_glock_dq_wait(&sdp->sd_live_gh);
- gfs2_holder_reinit(LM_ST_SHARED, LM_FLAG_NOEXP | GL_EXACT,
+ gfs2_glock_dq(&sdp->sd_live_gh);
+ gfs2_holder_reinit(LM_ST_SHARED,
+ LM_FLAG_NOEXP | GL_EXACT | GL_NOCACHE,
&sdp->sd_live_gh);
gfs2_glock_nq(&sdp->sd_live_gh);
}
base-commit: bd437e630fee648b79999e617d48bb07ae76f8eb
prerequisite-patch-id: e3b7fbfc1e67b4065cb6c928c29c7ed8bee0fc1c
prerequisite-patch-id: 35e9388872b2d6ffc70c4fc5497d3fdec97d6392
--
2.25.2
next prev parent reply other threads:[~2020-04-17 14:27 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-17 7:36 [Cluster-devel] [gfs2:gfs2-iopen 12/12] fs/gfs2/util.c:126:3: error: implicit declaration of function 'gfs2_glock_dq_wait'; did you mean 'gfs2_glock_nq_init'? kbuild test robot
2020-04-17 12:23 ` Andreas Gruenbacher
2020-04-17 13:09 ` Bob Peterson
2020-04-17 14:27 ` Andreas Gruenbacher [this message]
2020-04-17 14:35 ` [Cluster-devel] [gfs2:gfs2-iopen 12/12] fs/gfs2/util.c:126:3: error: implicit declaration of function Bob Peterson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200417142726.188190-1-agruenba@redhat.com \
--to=agruenba@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).