From mboxrd@z Thu Jan 1 00:00:00 1970
From: Bob Peterson
Date: Fri, 15 Nov 2019 09:42:46 -0500 (EST)
Subject: [Cluster-devel] [GFS2 PATCH v2] gfs2: Abort gfs2_freeze if io error is seen
In-Reply-To: <1083483282.30233245.1573828936820.JavaMail.zimbra@redhat.com>
Message-ID: <1945050534.30233372.1573828966829.JavaMail.zimbra@redhat.com>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi,

Revised to implement Andreas's suggestions.

Bob
---
Before this patch, an io error, such as -EIO writing to the journal
would cause function gfs2_freeze to go into an infinite loop,
continuously retrying the freeze operation. But nothing ever clears
the -EIO except unmount after withdraw, which is impossible if the
freeze operation never ends (fails). Instead you get:

[ 6499.767994] gfs2: fsid=dm-32.0: error freezing FS: -5
[ 6499.773058] gfs2: fsid=dm-32.0: retrying...
[ 6500.791957] gfs2: fsid=dm-32.0: error freezing FS: -5
[ 6500.797015] gfs2: fsid=dm-32.0: retrying...

This patch adds a check for -EIO in gfs2_freeze, and if seen, it
dequeues the freeze glock, aborts the loop and returns the error.
Also, there's no need to pass the freeze holder to function
gfs2_lock_fs_check_clean since it's only called in one place and
it's a well-known superblock pointer, so this simplifies that.

Signed-off-by: Bob Peterson
---
 fs/gfs2/super.c | 19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 8154c38e488b..68cc7c291a81 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -399,8 +399,7 @@ struct lfcc {
  * Returns: errno
  */
 
-static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp,
-				    struct gfs2_holder *freeze_gh)
+static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp)
 {
 	struct gfs2_inode *ip;
 	struct gfs2_jdesc *jd;
@@ -425,7 +424,9 @@ static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp,
 	}
 
 	error = gfs2_glock_nq_init(sdp->sd_freeze_gl, LM_ST_EXCLUSIVE,
-				   GL_NOCACHE, freeze_gh);
+				   GL_NOCACHE, &sdp->sd_freeze_gh);
+	if (error)
+		goto out;
 
 	list_for_each_entry(jd, &sdp->sd_jindex_list, jd_list) {
 		error = gfs2_jdesc_check(jd);
@@ -441,7 +442,7 @@ static int gfs2_lock_fs_check_clean(struct gfs2_sbd *sdp,
 	}
 
 	if (error)
-		gfs2_glock_dq_uninit(freeze_gh);
+		gfs2_glock_dq_uninit(&sdp->sd_freeze_gh);
 
 out:
 	while (!list_empty(&list)) {
@@ -767,15 +768,19 @@ static int gfs2_freeze(struct super_block *sb)
 			goto out;
 		}
 
-		error = gfs2_lock_fs_check_clean(sdp, &sdp->sd_freeze_gh);
+		error = gfs2_lock_fs_check_clean(sdp);
 		if (!error)
 			break;
 
 		if (error == -EBUSY)
 			fs_err(sdp, "waiting for recovery before freeze\n");
-		else
+		else if (error == -EIO) {
+			fs_err(sdp, "Fatal IO error: cannot freeze gfs2 due "
+			       "to recovery error.\n");
+			goto out;
+		} else {
 			fs_err(sdp, "error freezing FS: %d\n", error);
-
+		}
 		fs_err(sdp, "retrying...\n");
 		msleep(1000);
 	}
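
For readers who want to see the control flow in isolation, here is a
minimal userspace sketch of the retry loop as it behaves after this
patch. It is not kernel code and makes several assumptions:
freeze_retry(), lock_attempt_fn, always_eio() and busy_twice_then_ok()
are hypothetical names invented for illustration, the callback stands
in for gfs2_lock_fs_check_clean(), and max_tries exists only so the
demo terminates (the loop in gfs2_freeze has no such bound and sleeps
one second between attempts).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for gfs2_lock_fs_check_clean():
 * returns 0 on success or a negative errno. */
typedef int (*lock_attempt_fn)(void *ctx);

/*
 * Retry the lock attempt until it succeeds, but give up at once on
 * -EIO, since nothing inside the loop can clear that condition.
 * max_tries is an assumption of this demo only.
 */
static int freeze_retry(lock_attempt_fn attempt, void *ctx, int max_tries)
{
	int error = -EAGAIN;

	assert(attempt != NULL);

	for (int i = 0; i < max_tries; i++) {
		error = attempt(ctx);
		if (!error)
			break;

		if (error == -EBUSY) {
			fprintf(stderr, "waiting for recovery before freeze\n");
		} else if (error == -EIO) {
			/* Fatal: retrying would spin forever. */
			fprintf(stderr, "fatal IO error: cannot freeze\n");
			return error;
		} else {
			fprintf(stderr, "error freezing FS: %d\n", error);
		}
		fprintf(stderr, "retrying...\n");
	}
	return error;
}

/* Demo callback: counts calls and always fails with -EIO. */
static int always_eio(void *ctx)
{
	int *calls = ctx;

	++*calls;
	return -EIO;
}

/* Demo callback: reports -EBUSY twice, then succeeds. */
static int busy_twice_then_ok(void *ctx)
{
	int *calls = ctx;

	return ++*calls <= 2 ? -EBUSY : 0;
}
```

With always_eio() the loop returns -EIO after a single attempt, while
busy_twice_then_ok() is retried until its third call succeeds, which
is the distinction the patch draws between -EBUSY and -EIO.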