cluster-devel.redhat.com archive mirror
From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [GFS2 PATCH] gfs2: Panic when an io error occurs writing to the journal
Date: Mon, 17 Dec 2018 08:54:17 -0500 (EST)
Message-ID: <1033351102.55836224.1545054857301.JavaMail.zimbra@redhat.com>
In-Reply-To: <142516811.55835189.1545054789753.JavaMail.zimbra@redhat.com>

Hi,

Before this patch, gfs2 would try to withdraw when it encountered
I/O errors writing to its journal. That's incorrect behavior
because if it can't write to the journal, it cannot write revokes
for the metadata it sends down. A withdraw causes gfs2 to
unmount the file system from dlm, which is a controlled shutdown,
but the I/O error means it cannot write the UNMOUNT log header
to the journal. The controlled shutdown causes dlm to release
all its locks, allowing other nodes to update the metadata.
When the node rejoins the cluster and sees no UNMOUNT log header,
it treats the journal as dirty and replays it, but by then the
other nodes may have changed the metadata, so the replay
corrupts the file system.
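
To make the ordering problem concrete, here is a toy user-space
model of the two sequences. It is only a sketch: struct toy_fs and
every toy_* name are invented for illustration and do not exist in
gfs2 or dlm, and a single integer stands in for a metadata block.

```c
#include <stdbool.h>

/* Toy model (hypothetical): one shared metadata block plus one journal. */
struct toy_fs {
	int  block;          /* current on-disk metadata value */
	int  journal_value;  /* value the failing node had logged */
	bool journal_dirty;  /* true: no UNMOUNT log header was written */
	bool locks_frozen;   /* true: lock recovery is still holding the locks */
};

/* Another node updates the block; refused while the locks are frozen. */
static bool toy_other_node_write(struct toy_fs *fs, int value)
{
	if (fs->locks_frozen)
		return false;
	fs->block = value;
	return true;
}

/* Replay copies the logged value over the block, then thaws the locks. */
static void toy_replay_journal(struct toy_fs *fs)
{
	if (fs->journal_dirty) {
		fs->block = fs->journal_value;
		fs->journal_dirty = false;
	}
	fs->locks_frozen = false;
}

/* Withdraw: locks are released first, replay happens later on rejoin. */
static int toy_withdraw_scenario(void)
{
	struct toy_fs fs = { .block = 1, .journal_value = 2,
			     .journal_dirty = true, .locks_frozen = false };

	toy_other_node_write(&fs, 3);  /* succeeds: locks were released */
	toy_replay_journal(&fs);       /* stale replay overwrites the update */
	return fs.block;
}

/* Panic: locks stay frozen until the journal has been replayed. */
static int toy_panic_scenario(void)
{
	struct toy_fs fs = { .block = 1, .journal_value = 2,
			     .journal_dirty = true, .locks_frozen = true };

	toy_other_node_write(&fs, 3);  /* refused while frozen */
	toy_replay_journal(&fs);       /* replay first, then locks thaw */
	toy_other_node_write(&fs, 3);  /* applied on top of replayed state */
	return fs.block;
}
```

In this model toy_withdraw_scenario() returns 2, so the other node's
update (3) is lost to the late replay, while toy_panic_scenario()
returns 3 because the replay is ordered before the update.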

If we get an I/O error writing to the journal, the only correct
thing to do is to panic the kernel. That forces dlm to go through
its full recovery process on the other cluster nodes, freeze all
locks, and make sure the journal is replayed by a node in the
cluster before any other nodes get the affected locks and try to
modify the metadata in the unfinished portion of the journal.

This patch changes the behavior so that I/O errors encountered
while writing to the journal cause an immediate kernel panic
with a message. However, quota update errors are still allowed
to withdraw as before.
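
The resulting policy can be summarized as a small decision function.
This is a sketch only: the enums and toy_io_error_action() are
hypothetical names made up for illustration, not gfs2 code, and the
actual patch simply calls panic() from gfs2_end_log_write().

```c
/* Hypothetical error sources, limited to the two cases named above. */
enum toy_err_source {
	TOY_ERR_JOURNAL_WRITE,  /* I/O error writing to the journal */
	TOY_ERR_QUOTA_UPDATE,   /* error updating quota */
};

/* Hypothetical responses. */
enum toy_err_action {
	TOY_ACTION_PANIC,       /* immediate kernel panic with a message */
	TOY_ACTION_WITHDRAW,    /* controlled withdraw, as before */
};

/* Journal write errors panic; quota update errors still withdraw. */
static enum toy_err_action toy_io_error_action(enum toy_err_source src)
{
	return src == TOY_ERR_JOURNAL_WRITE ? TOY_ACTION_PANIC
					    : TOY_ACTION_WITHDRAW;
}
```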

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/lops.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 94dcab655bc0..44b85f7675d4 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -209,11 +209,9 @@ static void gfs2_end_log_write(struct bio *bio)
 	struct page *page;
 	int i;
 
-	if (bio->bi_status) {
-		fs_err(sdp, "Error %d writing to journal, jid=%u\n",
-		       bio->bi_status, sdp->sd_jdesc->jd_jid);
-		wake_up(&sdp->sd_logd_waitq);
-	}
+	if (bio->bi_status)
+		panic("Error %d writing to journal, jid=%u\n", bio->bi_status,
+		      sdp->sd_jdesc->jd_jid);
 
 	bio_for_each_segment_all(bvec, bio, i) {
 		page = bvec->bv_page;



Thread overview: 5+ messages
     [not found] <142516811.55835189.1545054789753.JavaMail.zimbra@redhat.com>
2018-12-17 13:54 ` Bob Peterson [this message]
2018-12-17 14:04   ` [Cluster-devel] [GFS2 PATCH] gfs2: Panic when an io error occurs writing to the journal Edwin Török
2018-12-17 14:11     ` Steven Whitehouse
2018-12-17 14:58       ` Bob Peterson
2018-12-17 16:46         ` David Teigland
