* [Cluster-devel] [GFS2 PATCH] GFS2: Withdraw for IO errors writing to the journal or statfs
From: Bob Peterson @ 2017-08-16 17:04 UTC
To: cluster-devel.redhat.com
Hi,

Before this patch, if GFS2 encountered IO errors while writing to
the journal, it would not report them, so they would go
unnoticed, sometimes for many hours. Sometimes they would only be
noticed later, when recovery tried to do journal replay and failed
due to invalid metadata at the blocks that resulted in IO errors.

This patch makes GFS2's log daemon check for IO errors. If it
encounters one, it withdraws from the file system and reports
why in dmesg. A similar action is taken when IO errors occur while
writing to the system statfs file.

These errors are also reported back to any callers of fsync, since
that requires the journal to be flushed. Therefore, any IO errors
that would previously go unnoticed are now noticed and the file
system is withdrawn as early as possible, thus preventing further
file system damage.

Also note that this reintroduces superblock variable sd_log_error,
which Christoph removed with commit f729b66fca.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
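For readers following the control flow, here is a minimal userspace
model of the pattern the description above relies on: a completion
handler records an error and wakes a daemon, and the daemon checks for
a recorded error on each pass and withdraws. It is only a sketch, not
kernel code. struct fake_sbd, fake_end_log_write(), fake_logd() and
run_model() are hypothetical names, and a pthread condition variable
stands in for sd_logd_waitq. The model assumes the completion handler
stores the error code before waking the daemon, as the quota.c hunk
does; the lops.c hunk as posted shows the fs_err() and wake_up() calls
but no assignment to sd_log_error.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the struct gfs2_sbd fields the patch uses. */
struct fake_sbd {
	pthread_mutex_t lock;
	pthread_cond_t logd_waitq;	/* models sd_logd_waitq */
	int log_error;			/* models sd_log_error */
	bool withdrawn;			/* set once the daemon has "withdrawn" */
	bool stop;			/* models kthread_should_stop() */
};

/* Models an IO completion handler: record the error, then wake the daemon. */
static void fake_end_log_write(struct fake_sbd *sdp, int status)
{
	if (!status)
		return;
	pthread_mutex_lock(&sdp->lock);
	if (!sdp->log_error)
		sdp->log_error = status;	/* keep the first error seen */
	pthread_cond_signal(&sdp->logd_waitq);
	pthread_mutex_unlock(&sdp->lock);
}

/* Models the log daemon loop: look for a recorded error on every pass. */
static void *fake_logd(void *data)
{
	struct fake_sbd *sdp = data;

	pthread_mutex_lock(&sdp->lock);
	for (;;) {
		if (sdp->log_error) {
			fprintf(stderr, "error %d: withdrawing\n",
				sdp->log_error);
			sdp->withdrawn = true;
			break;
		}
		if (sdp->stop)
			break;
		pthread_cond_wait(&sdp->logd_waitq, &sdp->lock);
	}
	pthread_mutex_unlock(&sdp->lock);
	return NULL;
}

/* Runs the model once and reports whether the daemon withdrew. */
static bool run_model(int status)
{
	struct fake_sbd sbd = { .log_error = 0 };
	pthread_t logd;
	int rc;

	rc = pthread_mutex_init(&sbd.lock, NULL);
	assert(rc == 0);
	rc = pthread_cond_init(&sbd.logd_waitq, NULL);
	assert(rc == 0);
	rc = pthread_create(&logd, NULL, fake_logd, &sbd);
	assert(rc == 0);
	(void)rc;

	fake_end_log_write(&sbd, status);

	/* Ask the daemon to stop; a pending error is still handled first. */
	pthread_mutex_lock(&sbd.lock);
	sbd.stop = true;
	pthread_cond_signal(&sbd.logd_waitq);
	pthread_mutex_unlock(&sbd.lock);

	pthread_join(logd, NULL);
	pthread_cond_destroy(&sbd.logd_waitq);
	pthread_mutex_destroy(&sbd.lock);
	return sbd.withdrawn;
}
```

In this model, run_model(-EIO) reports a withdraw and run_model(0) does
not. The daemon checks the error before the stop flag so that an error
recorded just before shutdown is not lost.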
diff --git a/fs/gfs2/incore.h b/fs/gfs2/incore.h
index a7b0331c549d..0ce0b334f412 100644
--- a/fs/gfs2/incore.h
+++ b/fs/gfs2/incore.h
@@ -817,6 +817,7 @@ struct gfs2_sbd {
atomic_t sd_log_in_flight;
struct bio *sd_log_bio;
wait_queue_head_t sd_log_flush_wait;
+ int sd_log_error;
atomic_t sd_reserving_log;
wait_queue_head_t sd_reserving_log_wait;
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 31585c2d22fe..f72c44231406 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -923,6 +923,15 @@ int gfs2_logd(void *data)
while (!kthread_should_stop()) {
+ /* Check for errors writing to the journal */
+ if (sdp->sd_log_error) {
+ gfs2_lm_withdraw(sdp,
+ "GFS2: fsid=%s: error %d: "
+ "withdrawing the file system to "
+ "prevent further damage.\n",
+ sdp->sd_fsname, sdp->sd_log_error);
+ }
+
did_flush = false;
if (gfs2_jrnl_flush_reqd(sdp) || t == 0) {
gfs2_ail1_empty(sdp);
diff --git a/fs/gfs2/lops.c b/fs/gfs2/lops.c
index 3010f9edd177..7dabbe721dba 100644
--- a/fs/gfs2/lops.c
+++ b/fs/gfs2/lops.c
@@ -207,8 +207,11 @@ static void gfs2_end_log_write(struct bio *bio)
struct page *page;
int i;
- if (bio->bi_status)
- fs_err(sdp, "Error %d writing to log\n", bio->bi_status);
+ if (bio->bi_status) {
+ fs_err(sdp, "Error %d writing to journal, jid=%u\n",
+ bio->bi_status, sdp->sd_jdesc->jd_jid);
+ wake_up(&sdp->sd_logd_waitq);
+ }
bio_for_each_segment_all(bvec, bio, i) {
page = bvec->bv_page;
diff --git a/fs/gfs2/quota.c b/fs/gfs2/quota.c
index 739adf105d7f..e647938432bd 100644
--- a/fs/gfs2/quota.c
+++ b/fs/gfs2/quota.c
@@ -1474,8 +1474,11 @@ static void quotad_error(struct gfs2_sbd *sdp, const char *msg, int error)
{
if (error == 0 || error == -EROFS)
return;
- if (!test_bit(SDF_SHUTDOWN, &sdp->sd_flags))
+ if (!test_bit(SDF_SHUTDOWN, &sdp->sd_flags)) {
fs_err(sdp, "gfs2_quotad: %s error %d\n", msg, error);
+ sdp->sd_log_error = error;
+ wake_up(&sdp->sd_logd_waitq);
+ }
}
static void quotad_check_timeo(struct gfs2_sbd *sdp, const char *msg,
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index a83fe8260d2e..be26569b08e6 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -946,7 +946,7 @@ static int gfs2_sync_fs(struct super_block *sb, int wait)
gfs2_quota_sync(sb, -1);
if (wait && sdp)
gfs2_log_flush(sdp, NULL, NORMAL_FLUSH);
- return 0;
+ return sdp->sd_log_error;
}
void gfs2_freeze_func(struct work_struct *work)
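Since the description says these errors are reported back to callers of
fsync, a userspace test can observe them by checking the fsync() return
value instead of ignoring it. The following is a minimal sketch using
only POSIX calls; write_and_sync() is a hypothetical helper name, and
which errno value a failing journal write would produce on GFS2 is not
something this sketch assumes.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Write len bytes from buf to path and flush them to stable storage.
 * Returns 0 on success, or a negative errno value if open(), write(),
 * fsync() or close() fails.
 */
static int write_and_sync(const char *path, const char *buf, size_t len)
{
	int fd, err = 0;
	ssize_t n;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -errno;

	while (len > 0) {
		n = write(fd, buf, len);
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			err = -errno;
			goto out;
		}
		buf += n;
		len -= (size_t)n;
	}

	/* This is the call where a flush failure would surface. */
	if (fsync(fd) < 0)
		err = -errno;
out:
	if (close(fd) < 0 && !err)
		err = -errno;
	return err;
}
```

A caller would treat any nonzero return as a sign that the data may not
be on disk, for example by logging it and stopping further writes.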
* [Cluster-devel] [GFS2 PATCH] [resend] GFS2: Withdraw for IO errors writing to the journal or statfs
From: Bob Peterson @ 2017-08-21 13:00 UTC
To: cluster-devel.redhat.com
Hi,
I didn't receive any comments on this patch, so I'm reposting it.
If I don't get any comments, I'll just push it to for-next.
I have tested it with dm-flakey.
Bob Peterson