From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [GFS2 PATCH v6 26/26] gfs2: Check for log write errors before telling dlm to unlock
Date: Thu, 23 May 2019 08:04:21 -0500 [thread overview]
Message-ID: <20190523130421.21003-27-rpeterso@redhat.com> (raw)
In-Reply-To: <20190523130421.21003-1-rpeterso@redhat.com>
Before this patch, function do_xmote just assumed all the writes
submitted to the journal were finished and successful, and it
called the go_unlock function to release the dlm lock. But if
they're not, and a revoke failed to make its way to the journal,
a journal replay on another node will cause corruption if we
let the go_inval function continue and tell dlm to release the
glock to another node. This patch adds a couple checks for errors
in do_xmote after the calls to go_sync and go_inval. If an error
is found, we cannot withdraw yet, because the withdraw itself
uses glocks to make the file system read-only. Instead, we flag
the error. Later, asserts should cause another node to replay
the journal before continuing, thus protecting rgrp and dinode
glocks and maintaining the integrity of the metadata. Note that
we only need to do this for journaled glocks. System glocks
should be able to progress even under withdrawn conditions.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
fs/gfs2/glock.c | 32 ++++++++++++++++++++++++++++----
1 file changed, 28 insertions(+), 4 deletions(-)
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index cb378df8217b..d1805285ed73 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -140,7 +140,7 @@ void gfs2_glock_free(struct gfs2_glock *gl)
{
struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
- BUG_ON(test_bit(GLF_REVOKES, &gl->gl_flags));
+ GLOCK_BUG_ON(gl, test_bit(GLF_REVOKES, &gl->gl_flags));
rhashtable_remove_fast(&gl_hash_table, &gl->gl_node, ht_parms);
smp_mb();
wake_up_glock(gl);
@@ -588,6 +588,31 @@ __acquires(&gl->gl_lockref.lock)
}
gfs2_glock_hold(gl);
+ /**
+ * Check for an error encountered since we called go_sync and go_inval.
+ * If so, we can't withdraw from the glock code because the withdraw
+ * code itself uses glocks (see function signal_our_withdraw) to
+ * change the mount to read-only. Most importantly, we must not call
+ * dlm to unlock the glock until the journal is in a known good state
+ * (after journal replay) otherwise other nodes may use the object
+ * (rgrp or dinode) and then later, journal replay will corrupt the
+ * file system. The best we can do here is wait for the logd daemon
+ * to see sd_log_error and withdraw, and in the meantime, requeue the
+ * work for later.
+ *
+ * However, if we're just unlocking the lock (say, for unmount, when
+ * gfs2_gl_hash_clear calls clear_glock) and recovery is complete
+ * then it's okay to tell dlm to unlock it.
+ */
+ if (unlikely(sdp->sd_log_error &&
+ (glops->go_flags & GLOF_JOURNALED))) {
+ if (target != LM_ST_UNLOCKED ||
+ test_bit(SDF_WITHDRAW_RECOVERY, &sdp->sd_flags)) {
+ gfs2_glock_queue_work(gl, GL_GLOCK_DFT_HOLD);
+ goto out;
+ }
+ }
+
if (sdp->sd_lockstruct.ls_ops->lm_lock) {
/* lock_dlm */
ret = sdp->sd_lockstruct.ls_ops->lm_lock(gl, target, lck_flags);
@@ -596,8 +621,7 @@ __acquires(&gl->gl_lockref.lock)
test_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags)) {
finish_xmote(gl, target);
gfs2_glock_queue_work(gl, 0);
- }
- else if (ret) {
+ } else if (ret) {
fs_err(sdp, "lm_lock ret %d\n", ret);
GLOCK_BUG_ON(gl, !gfs2_withdrawn(sdp));
}
@@ -605,7 +629,7 @@ __acquires(&gl->gl_lockref.lock)
finish_xmote(gl, target);
gfs2_glock_queue_work(gl, 0);
}
-
+out:
spin_lock(&gl->gl_lockref.lock);
}
--
2.21.0
prev parent reply other threads:[~2019-05-23 13:04 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 01/26] gfs2: kthread and remount improvements Bob Peterson
2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 02/26] gfs2: eliminate tr_num_revoke_rm Bob Peterson
2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 03/26] gfs2: log which portion of the journal is replayed Bob Peterson
2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 04/26] gfs2: Warn when a journal replay overwrites a rgrp with buffers Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 05/26] gfs2: Change SDF_SHUTDOWN to SDF_WITHDRAWN Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 06/26] gfs2: simplify gfs2_freeze by removing case Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 07/26] gfs2: dump fsid when dumping glock problems Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 08/26] gfs2: replace more printk with calls to fs_info and friends Bob Peterson
2019-05-29 16:20 ` Andreas Gruenbacher
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 09/26] gfs2: Introduce concept of a pending withdraw Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 10/26] gfs2: fix infinite loop in gfs2_ail1_flush on io error Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 11/26] gfs2: log error reform Bob Peterson
2019-08-20 14:09 ` Andreas Gruenbacher
2019-09-04 16:59 ` Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 12/26] gfs2: Only complain the first time an io error occurs in quota or log Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 13/26] gfs2: Stop ail1 wait loop when withdrawn Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 14/26] gfs2: Ignore dlm recovery requests if gfs2 is withdrawn Bob Peterson
2019-08-27 11:20 ` Andreas Gruenbacher
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 15/26] gfs2: move check_journal_clean to util.c for future use Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 16/26] gfs2: Allow some glocks to be used during withdraw Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 17/26] gfs2: Don't loop forever in gfs2_freeze if withdrawn Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 18/26] gfs2: Make secondary withdrawers wait for first withdrawer Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 19/26] gfs2: Don't write log headers after file system withdraw Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 20/26] gfs2: Force withdraw to replay journals and wait for it to finish Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 21/26] gfs2: fix infinite loop when checking ail item count before go_inval Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 22/26] gfs2: Add verbose option to check_journal_clean Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 23/26] gfs2: Abort gfs2_freeze if io error is seen Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 24/26] gfs2: Issue revokes more intelligently Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 25/26] gfs2: Prepare to withdraw as soon as an IO error occurs in log write Bob Peterson
2019-05-23 13:04 ` Bob Peterson [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190523130421.21003-27-rpeterso@redhat.com \
--to=rpeterso@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).