cluster-devel.redhat.com archive mirror
From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [GFS2 v8 PATCH 09/22] gfs2: Make secondary withdrawers wait for first withdrawer
Date: Mon,  9 Dec 2019 09:36:47 -0600
Message-ID: <20191209153700.700208-10-rpeterso@redhat.com>
In-Reply-To: <20191209153700.700208-1-rpeterso@redhat.com>

Before this patch, if a process encountered an error and decided to
withdraw while another process was already withdrawing, the secondary
withdraw was silently ignored, which left the second process free to
proceed with its processing, unlock any locks, etc. That is correct
behavior if the original withdrawer encounters further errors down
the road. However, secondary withdrawers need to wait for the first
withdrawer to finish its withdraw before proceeding. If they don't
wait, they could end up assuming everything is all right, unlocking
glocks and telling other nodes they can have them, even though a
withdraw is still ongoing and may require a journal replay before
any locks are released. For example, if an rgrp glock is released
by a process that didn't wait for the withdraw, a journal replay
could introduce file system corruption by replaying an rgrp block
that has already been granted to another node.

This patch makes secondary withdrawers wait until the primary
withdrawer is finished with its processing before proceeding.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/incore.h |  3 +++
 fs/gfs2/util.c   | 21 +++++++++++++++++++--
 2 files changed, 22 insertions(+), 2 deletions(-)
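
As an illustration only, the wait/wake handshake described in the
commit message can be sketched in userspace C. This is a hypothetical
analog, not GFS2 code: a pthread mutex and condition variable stand in
for the kernel's wait_on_bit()/wake_up_bit(), and fs_state,
fs_withdraw(), worker_fn() and run_demo() are invented names. The
sketch sets both flags while holding one mutex, so a secondary caller
never sees "withdrawn" set without "in progress" also being set.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_WORKERS 64

/* Hypothetical userspace stand-in for the gfs2_sbd fields this patch uses. */
struct fs_state {
	pthread_mutex_t lock;
	pthread_cond_t withdraw_done; /* plays the role of the bit wait queue */
	bool withdrawn;               /* like SDF_WITHDRAWN */
	bool withdraw_in_prog;        /* like SDF_WITHDRAW_IN_PROG */
	int withdrawer;               /* like sd_withdrawer: id of the primary */
	bool primary_finished;        /* demo-only: set when the primary is done */
};

/* Returns 0 for the primary withdrawer, -1 for every secondary caller. */
static int fs_withdraw(struct fs_state *s, int id)
{
	pthread_mutex_lock(&s->lock);
	if (s->withdrawn) {
		if (s->withdraw_in_prog)
			fprintf(stderr, "caller %d waiting for caller %d to withdraw\n",
				id, s->withdrawer);
		/* Secondary: sleep until the primary clears the in-progress flag. */
		while (s->withdraw_in_prog)
			pthread_cond_wait(&s->withdraw_done, &s->lock);
		pthread_mutex_unlock(&s->lock);
		return -1;
	}
	/* Primary: claim the withdraw and mark it in progress under one lock. */
	s->withdrawn = true;
	s->withdraw_in_prog = true;
	s->withdrawer = id;
	pthread_mutex_unlock(&s->lock);

	/* ... the real withdraw work (log flush, journal replay, ...) goes here ... */

	pthread_mutex_lock(&s->lock);
	s->primary_finished = true;
	s->withdraw_in_prog = false;
	pthread_cond_broadcast(&s->withdraw_done); /* wake every waiting secondary */
	pthread_mutex_unlock(&s->lock);
	return 0;
}

struct worker {
	struct fs_state *s;
	int id;
	int ret;
	bool returned_after_primary; /* primary had finished when we returned */
};

static void *worker_fn(void *arg)
{
	struct worker *w = arg;

	w->ret = fs_withdraw(w->s, w->id);
	pthread_mutex_lock(&w->s->lock);
	w->returned_after_primary = w->s->primary_finished;
	pthread_mutex_unlock(&w->s->lock);
	return NULL;
}

/* Runs n concurrent withdrawers. Returns the number of primaries, or -1
 * if any caller returned before the primary had finished its withdraw. */
static int run_demo(int n)
{
	struct fs_state s = { 0 };
	pthread_t tids[MAX_WORKERS];
	struct worker w[MAX_WORKERS];
	int started = 0, primaries = 0, early = 0;

	assert(n >= 1 && n <= MAX_WORKERS);
	pthread_mutex_init(&s.lock, NULL);
	pthread_cond_init(&s.withdraw_done, NULL);

	for (int i = 0; i < n; i++) {
		w[i] = (struct worker){ .s = &s, .id = i + 1 };
		if (pthread_create(&tids[i], NULL, worker_fn, &w[i]) != 0)
			break;
		started++;
	}
	for (int i = 0; i < started; i++) {
		pthread_join(tids[i], NULL);
		if (w[i].ret == 0)
			primaries++;
		if (!w[i].returned_after_primary)
			early++;
	}
	pthread_cond_destroy(&s.withdraw_done);
	pthread_mutex_destroy(&s.lock);
	return early ? -1 : primaries;
}
```

Calling run_demo(8) from a main() built with -pthread should return 1:
exactly one caller becomes the primary, and every secondary returns
only after the primary has cleared the in-progress flag and broadcast
the wakeup.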

diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index f6ec52776408..6e713bf536a1 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -619,6 +619,7 @@ enum {
 	SDF_FORCE_AIL_FLUSH     = 9,
 	SDF_FS_FROZEN           = 10,
 	SDF_WITHDRAWING		= 11, /* Will withdraw eventually */
+	SDF_WITHDRAW_IN_PROG	= 12, /* Withdraw is in progress */
 };
 
 enum gfs2_freeze_state {
@@ -829,6 +830,8 @@ struct gfs2_sbd {
 	struct bio *sd_log_bio;
 	wait_queue_head_t sd_log_flush_wait;
 	int sd_log_error; /* First log error */
+	atomic_t sd_withdrawer;
+	wait_queue_head_t sd_withdraw_wait;
 
 	atomic_t sd_reserving_log;
 	wait_queue_head_t sd_reserving_log_wait;
diff --git a/fs/gfs2/util.c b/fs/gfs2/util.c
index 13068968a958..4ef5218303d7 100644
--- a/fs/gfs2/util.c
+++ b/fs/gfs2/util.c
@@ -86,9 +86,23 @@ int gfs2_lm_withdraw(struct gfs2_sbd *sdp, const char *fmt, ...)
 	struct va_format vaf;
 
 	if (sdp->sd_args.ar_errors == GFS2_ERRORS_WITHDRAW &&
-	    test_and_set_bit(SDF_WITHDRAWN, &sdp->sd_flags))
-		return 0;
+	    test_and_set_bit(SDF_WITHDRAWN, &sdp->sd_flags)) {
+		if (!test_bit(SDF_WITHDRAW_IN_PROG, &sdp->sd_flags))
+			return -1;
+
+		fs_warn(sdp, "Pid %d waiting for process %d to withdraw.\n",
+			pid_nr(task_pid(current)),
+			atomic_read(&sdp->sd_withdrawer));
+		wait_on_bit(&sdp->sd_flags, SDF_WITHDRAW_IN_PROG,
+			    TASK_UNINTERRUPTIBLE);
+		fs_warn(sdp, "Pid %d done waiting for process %d.\n",
+			pid_nr(task_pid(current)),
+			atomic_read(&sdp->sd_withdrawer));
+		return -1;
+	}
 
+	set_bit(SDF_WITHDRAW_IN_PROG, &sdp->sd_flags);
+	atomic_set(&sdp->sd_withdrawer, pid_nr(task_pid(current)));
 	if (fmt) {
 		va_start(args, fmt);
 
@@ -116,6 +130,9 @@ int gfs2_lm_withdraw(struct gfs2_sbd *sdp, const char *fmt, ...)
 		set_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags);
 		fs_err(sdp, "withdrawn\n");
 		dump_stack();
+		clear_bit(SDF_WITHDRAW_IN_PROG, &sdp->sd_flags);
+		smp_mb__after_atomic();
+		wake_up_bit(&sdp->sd_flags, SDF_WITHDRAW_IN_PROG);
 	}
 
 	if (sdp->sd_args.ar_errors == GFS2_ERRORS_PANIC)
-- 
2.23.0




Thread overview: 25+ messages
2019-12-09 15:36 [Cluster-devel] [GFS2 v8 PATCH 00/22] GFS2 Recovery corruption patches v8 Bob Peterson
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 01/22] gfs2: Introduce concept of a pending withdraw Bob Peterson
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 02/22] gfs2: clear ail1 list when gfs2 withdraws Bob Peterson
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 03/22] gfs2: Rework how rgrp buffer_heads are managed Bob Peterson
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 04/22] gfs2: log error reform Bob Peterson
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 05/22] gfs2: Only complain the first time an io error occurs in quota or log Bob Peterson
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 06/22] gfs2: Ignore dlm recovery requests if gfs2 is withdrawn Bob Peterson
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 07/22] gfs2: move check_journal_clean to util.c for future use Bob Peterson
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 08/22] gfs2: Allow some glocks to be used during withdraw Bob Peterson
2019-12-09 15:36 ` Bob Peterson [this message]
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 10/22] gfs2: Force withdraw to replay journals and wait for it to finish Bob Peterson
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 11/22] gfs2: fix infinite loop when checking ail item count before go_inval Bob Peterson
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 12/22] gfs2: Add verbose option to check_journal_clean Bob Peterson
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 13/22] gfs2: Issue revokes more intelligently Bob Peterson
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 14/22] gfs2: Prepare to withdraw as soon as an IO error occurs in log write Bob Peterson
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 15/22] gfs2: Check for log write errors before telling dlm to unlock Bob Peterson
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 16/22] gfs2: new slab for transactions Bob Peterson
2020-01-23 22:22   ` Andreas Gruenbacher
2020-01-24 13:45     ` Bob Peterson
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 17/22] gfs2: Do log_flush in gfs2_ail_empty_gl even if ail list is empty Bob Peterson
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 18/22] gfs2: Don't skip log flush if glock still has revokes Bob Peterson
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 19/22] gfs2: Withdraw in gfs2_ail1_flush if write_cache_pages returns error Bob Peterson
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 20/22] gfs2: drain the ail2 list after io errors Bob Peterson
2019-12-09 15:36 ` [Cluster-devel] [GFS2 v8 PATCH 21/22] gfs2: Don't demote a glock until its revokes are written Bob Peterson
2019-12-09 15:37 ` [Cluster-devel] [GFS2 v8 PATCH 22/22] gfs2: Do proper error checking for go_sync family of glops functions Bob Peterson
