[Cluster-devel] [GFS2 PATCH v6 04/26] gfs2: Warn when a journal replay overwrites a rgrp with buffers

cluster-devel.redhat.com archive mirror
 help / color / mirror / Atom feed

From: Bob Peterson <rpeterso@redhat.com>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [GFS2 PATCH v6 04/26] gfs2: Warn when a journal replay overwrites a rgrp with buffers
Date: Thu, 23 May 2019 08:03:59 -0500	[thread overview]
Message-ID: <20190523130421.21003-5-rpeterso@redhat.com> (raw)
In-Reply-To: <20190523130421.21003-1-rpeterso@redhat.com>

This patch adds some instrumentation in gfs2's journal replay that
indicates when we're about to overwrite a rgrp for which we already
have a valid buffer_head.

When this problem occurs, it's a situation in which this node has
been granted a rgrp glock and subsequently read in buffer_heads for
it, and possibly even made changes to the rgrp bits and/or
allocation values. But now another node has failed and forced us to
replay its journal, but its journal contains a copy of the same
rgrp, without a revoke, which means we're about to overwrite a
rgrp that we now rightfully own, with an obsolete copy. That is
always a problem. It means the other node (which failed and left
its journal to be replayed) failed to flush out its rgrp buffers,
write out the revoke, and invalidate its copy before it released
the glock to our possession.

No node should ever release a glock until its metadata has been
written to the journal and revoked and invalidated..

We also kludge around the problem and refuse to replace our good
copy with the journals bad copy by not marking the buffer dirty,
but never do it silently. That's wallpapering over a larger problem
that still exists. IOW, if this situation can happen to this node,
it can also happen to a different node and we wouldn't even know it
or be able to circumvent it: Suppose we have a 3-node cluster:
Node 1 fails, leaving an obsolete rgrp block in its journal without
a revoke. Node 2 grabs the rgrp as soon as the rgrp glock is
released and starts making changes, allocating and freeing blocks
from the rgrp, etc. Node 3 replays the journal from node 1,
oblivious and unaware that it's about to overwrite node 2's changes.
So we still need to be vocal and log the error to make it apparent
that a corruption path still exists in gfs2.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/lops.c | 22 ++++++++++++++++++++--
 1 file changed, 20 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 33ab662c9aac..ac73aa48674b 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -762,9 +762,27 @@ static int buf_lo_scan_elements(struct gfs2_jdesc *jd, u32 start,

 		if (gfs2_meta_check(sdp, bh_ip))
 			error = -EIO;
-		else
+		else {
+			struct gfs2_meta_header *mh =
+				(struct gfs2_meta_header *)bh_ip->b_data;
+
+			if (mh->mh_type == cpu_to_be32(GFS2_METATYPE_RG)) {
+				struct gfs2_rgrpd *rgd;
+
+				rgd = gfs2_blk2rgrpd(sdp, blkno, false);
+				if (rgd && rgd->rd_addr == blkno &&
+				    rgd->rd_bits && rgd->rd_bits->bi_bh) {
+					fs_info(sdp, "Replaying 0x%llx but we "
+						"already have a bh!\n",
+						(unsigned long long)blkno);
+					fs_info(sdp, "busy:%d, pinned:%d\n",
+						buffer_busy(rgd->rd_bits->bi_bh) ? 1 : 0,
+						buffer_pinned(rgd->rd_bits->bi_bh));
+					gfs2_dump_glock(NULL, rgd->rd_gl);
+				}
+			}
 			mark_buffer_dirty(bh_ip);
-
+		}
 		brelse(bh_log);
 		brelse(bh_ip);

-- 
2.21.0

next prev parent reply	other threads:[~2019-05-23 13:03 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-23 13:03 [Cluster-devel] [GFS2 PATCH v6 00/26] gfs2: misc recovery patch collection Bob Peterson
2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 01/26] gfs2: kthread and remount improvements Bob Peterson
2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 02/26] gfs2: eliminate tr_num_revoke_rm Bob Peterson
2019-05-23 13:03 ` [Cluster-devel] [GFS2 PATCH v6 03/26] gfs2: log which portion of the journal is replayed Bob Peterson
2019-05-23 13:03 ` Bob Peterson [this message]
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 05/26] gfs2: Change SDF_SHUTDOWN to SDF_WITHDRAWN Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 06/26] gfs2: simplify gfs2_freeze by removing case Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 07/26] gfs2: dump fsid when dumping glock problems Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 08/26] gfs2: replace more printk with calls to fs_info and friends Bob Peterson
2019-05-29 16:20   ` Andreas Gruenbacher
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 09/26] gfs2: Introduce concept of a pending withdraw Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 10/26] gfs2: fix infinite loop in gfs2_ail1_flush on io error Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 11/26] gfs2: log error reform Bob Peterson
2019-08-20 14:09   ` Andreas Gruenbacher
2019-09-04 16:59     ` Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 12/26] gfs2: Only complain the first time an io error occurs in quota or log Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 13/26] gfs2: Stop ail1 wait loop when withdrawn Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 14/26] gfs2: Ignore dlm recovery requests if gfs2 is withdrawn Bob Peterson
2019-08-27 11:20   ` Andreas Gruenbacher
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 15/26] gfs2: move check_journal_clean to util.c for future use Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 16/26] gfs2: Allow some glocks to be used during withdraw Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 17/26] gfs2: Don't loop forever in gfs2_freeze if withdrawn Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 18/26] gfs2: Make secondary withdrawers wait for first withdrawer Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 19/26] gfs2: Don't write log headers after file system withdraw Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 20/26] gfs2: Force withdraw to replay journals and wait for it to finish Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 21/26] gfs2: fix infinite loop when checking ail item count before go_inval Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 22/26] gfs2: Add verbose option to check_journal_clean Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 23/26] gfs2: Abort gfs2_freeze if io error is seen Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 24/26] gfs2: Issue revokes more intelligently Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 25/26] gfs2: Prepare to withdraw as soon as an IO error occurs in log write Bob Peterson
2019-05-23 13:04 ` [Cluster-devel] [GFS2 PATCH v6 26/26] gfs2: Check for log write errors before telling dlm to unlock Bob Peterson

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:33ab662c9aa dfblob:ac73aa48674 )
 OR (
bs:"[Cluster-devel] [GFS2 PATCH v6 04/26] gfs2: Warn when a journal replay overwrites a rgrp with buffers" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190523130421.21003-5-rpeterso@redhat.com \
    --to=rpeterso@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).