cluster-devel.redhat.com archive mirror
 help / color / mirror / Atom feed
* [Cluster-devel] [GFS2 PATCH 0/4] Fix revoke processing at unmount and ro
@ 2023-04-21 19:07 Bob Peterson
  2023-04-21 19:07 ` [Cluster-devel] [GFS2 PATCH 1/4] gfs2: return errors from gfs2_ail_empty_gl Bob Peterson
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Bob Peterson @ 2023-04-21 19:07 UTC (permalink / raw)
  To: cluster-devel.redhat.com

This series of patches fixes a set of corner cases regarding how revokes are
handled during unmount and transitions to read-only. Return codes were
dropped, errors were not reported, and revokes were not written properly.

Bob Peterson (4):
  gfs2: return errors from gfs2_ail_empty_gl
  gfs2: Perform second log flush in gfs2_make_fs_ro
  gfs2: Issue message when revokes cannot be written
  gfs2: gfs2_ail_empty_gl no log flush on error

 fs/gfs2/glops.c | 23 ++++++++++++++---------
 fs/gfs2/super.c |  9 +++++++++
 2 files changed, 23 insertions(+), 9 deletions(-)

-- 
2.39.2


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Cluster-devel] [GFS2 PATCH 1/4] gfs2: return errors from gfs2_ail_empty_gl
  2023-04-21 19:07 [Cluster-devel] [GFS2 PATCH 0/4] Fix revoke processing at unmount and ro Bob Peterson
@ 2023-04-21 19:07 ` Bob Peterson
  2023-04-24 14:07   ` Andreas Gruenbacher
  2023-04-21 19:07 ` [Cluster-devel] [GFS2 PATCH 2/4] gfs2: Perform second log flush in gfs2_make_fs_ro Bob Peterson
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Bob Peterson @ 2023-04-21 19:07 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, function gfs2_ail_empty_gl did not return errors it
encountered from __gfs2_trans_begin. Those errors usually came from the
fact that the file system was made read-only, often due to unmount,
(but theoretically could be due to -o remount,ro) which prevented
the transaction from starting.

The inability to start a transaction prevented its revokes from being
properly written to the journal for glocks during unmount (and
transition to ro).

That meant glocks could be unlocked without the metadata properly
revoked in the journal. So other nodes could grab the glock thinking
that their lvb values were correct, but in fact corresponded to the
glock without its revokes properly synced. That presented as lvb
mismatch errors.

This patch allows gfs2_ail_empty_gl to return the error properly to
the caller.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glops.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 6e33c8058059..8e245d793e6b 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -90,7 +90,7 @@ static int gfs2_ail_empty_gl(struct gfs2_glock *gl)
 	struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
 	struct gfs2_trans tr;
 	unsigned int revokes;
-	int ret;
+	int ret = 0;
 
 	revokes = atomic_read(&gl->gl_ail_count);
 
@@ -124,15 +124,15 @@ static int gfs2_ail_empty_gl(struct gfs2_glock *gl)
 	memset(&tr, 0, sizeof(tr));
 	set_bit(TR_ONSTACK, &tr.tr_flags);
 	ret = __gfs2_trans_begin(&tr, sdp, 0, revokes, _RET_IP_);
-	if (ret)
-		goto flush;
-	__gfs2_ail_flush(gl, 0, revokes);
-	gfs2_trans_end(sdp);
+	if (!ret) {
+		__gfs2_ail_flush(gl, 0, revokes);
+		gfs2_trans_end(sdp);
+	}
 
 flush:
 	gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL |
 		       GFS2_LFC_AIL_EMPTY_GL);
-	return 0;
+	return ret;
 }
 
 void gfs2_ail_flush(struct gfs2_glock *gl, bool fsync)
@@ -326,7 +326,9 @@ static int inode_go_sync(struct gfs2_glock *gl)
 	ret = gfs2_inode_metasync(gl);
 	if (!error)
 		error = ret;
-	gfs2_ail_empty_gl(gl);
+	ret = gfs2_ail_empty_gl(gl);
+	if (!error)
+		error = ret;
 	/*
 	 * Writeback of the data mapping may cause the dirty flag to be set
 	 * so we have to clear it again here.
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Cluster-devel] [GFS2 PATCH 2/4] gfs2: Perform second log flush in gfs2_make_fs_ro
  2023-04-21 19:07 [Cluster-devel] [GFS2 PATCH 0/4] Fix revoke processing at unmount and ro Bob Peterson
  2023-04-21 19:07 ` [Cluster-devel] [GFS2 PATCH 1/4] gfs2: return errors from gfs2_ail_empty_gl Bob Peterson
@ 2023-04-21 19:07 ` Bob Peterson
  2023-04-24 14:08   ` Andreas Gruenbacher
  2023-04-21 19:07 ` [Cluster-devel] [GFS2 PATCH 3/4] gfs2: Issue message when revokes cannot be written Bob Peterson
  2023-04-21 19:07 ` [Cluster-devel] [GFS2 PATCH 4/4] gfs2: gfs2_ail_empty_gl no log flush on error Bob Peterson
  3 siblings, 1 reply; 8+ messages in thread
From: Bob Peterson @ 2023-04-21 19:07 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch function gfs2_make_fs_ro called gfs2_log_flush once to
finalize the log. However, if there's dirty metadata, log flushes tend
to sync the metadata and formulate revokes. Before this patch, those
revokes may not be written out to the journal immediately, which meant
unresolved glocks could still have revokes in their ail lists. When the
glock worker runs, it tries to transition the glock, but the unresolved
revokes in the ail still need to be written, so it tries to start a
transaction. It's impossible to start a transaction because at that
point the GFS2_LOG_HEAD_FLUSH_SHUTDOWN has been set by gfs2_make_fs_ro.
That cause the glock worker to get an error, and unable to write the
revokes. The calling sequence looked something like this:

gfs2_make_fs_ro
   gfs2_log_flush - with GFS2_LOG_HEAD_FLUSH_SHUTDOWN flag set
	if (flags & GFS2_LOG_HEAD_FLUSH_SHUTDOWN)
		clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);
...meanwhile...
glock_work_func
   do_xmote
      rgrp_go_sync (or possibly inode_go_sync)
         ...
         gfs2_ail_empty_gl
            __gfs2_trans_begin
               if (unlikely(!test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags))) {
               ...
                  return -EROFS;

The previous patch in the series ("gfs2: return errors from
gfs2_ail_empty_gl") now causes the transaction error to no longer be
ignored, so it causes a warning from MOST of the xfstests:

WARNING: CPU: 11 PID: X at fs/gfs2/super.c:603 gfs2_put_super [gfs2]

which corresponds to:

WARN_ON(gfs2_withdrawing(sdp));

The withdraw was triggered silently from do_xmote by:

	if (unlikely(sdp->sd_log_error && !gfs2_withdrawn(sdp)))
		gfs2_withdraw_delayed(sdp);

This patch adds a second log_flush to gfs2_make_fs_ro: one to sync the
data and one to sync any outstanding revokes and finalize the journal.
Note that both of these log flushes need to be "special," in other
words, not GFS2_LOG_HEAD_FLUSH_NORMAL.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/super.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index a83fa62106f0..5eed8c237500 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -552,6 +552,15 @@ void gfs2_make_fs_ro(struct gfs2_sbd *sdp)
 		gfs2_quota_sync(sdp->sd_vfs, 0);
 		gfs2_statfs_sync(sdp->sd_vfs, 0);
 
+		/* We do two log flushes here. The first one commits dirty inodes
+		 * and rgrps to the journal, but queues up revokes to the ail list.
+		 * The second flush writes out and removes the revokes.
+		 *
+		 * The first must be done before the FLUSH_SHUTDOWN code
+		 * clears the LIVE flag, otherwise it will not be able to start
+		 * a transaction to write its revokes, and the error will cause
+		 * a withdraw of the file system. */
+		gfs2_log_flush(sdp, NULL, GFS2_LFC_MAKE_FS_RO);
 		gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_SHUTDOWN |
 			       GFS2_LFC_MAKE_FS_RO);
 		wait_event_timeout(sdp->sd_log_waitq,
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Cluster-devel] [GFS2 PATCH 3/4] gfs2: Issue message when revokes cannot be written
  2023-04-21 19:07 [Cluster-devel] [GFS2 PATCH 0/4] Fix revoke processing at unmount and ro Bob Peterson
  2023-04-21 19:07 ` [Cluster-devel] [GFS2 PATCH 1/4] gfs2: return errors from gfs2_ail_empty_gl Bob Peterson
  2023-04-21 19:07 ` [Cluster-devel] [GFS2 PATCH 2/4] gfs2: Perform second log flush in gfs2_make_fs_ro Bob Peterson
@ 2023-04-21 19:07 ` Bob Peterson
  2023-04-21 19:07 ` [Cluster-devel] [GFS2 PATCH 4/4] gfs2: gfs2_ail_empty_gl no log flush on error Bob Peterson
  3 siblings, 0 replies; 8+ messages in thread
From: Bob Peterson @ 2023-04-21 19:07 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch, function gfs2_ail_empty_gl would silently return an
error to the caller. This would get silently set into sd_log_error which
would cause a withdraw, but there was no indication why the file system
was withdrawn. This patch adds a fs_err to log the appropriate error
message.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glops.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 8e245d793e6b..4b3949f2020b 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -127,6 +127,8 @@ static int gfs2_ail_empty_gl(struct gfs2_glock *gl)
 	if (!ret) {
 		__gfs2_ail_flush(gl, 0, revokes);
 		gfs2_trans_end(sdp);
+	} else {
+		fs_err(sdp, "Transaction error %d: Unable to write revokes.", ret);
 	}
 
 flush:
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Cluster-devel] [GFS2 PATCH 4/4] gfs2: gfs2_ail_empty_gl no log flush on error
  2023-04-21 19:07 [Cluster-devel] [GFS2 PATCH 0/4] Fix revoke processing at unmount and ro Bob Peterson
                   ` (2 preceding siblings ...)
  2023-04-21 19:07 ` [Cluster-devel] [GFS2 PATCH 3/4] gfs2: Issue message when revokes cannot be written Bob Peterson
@ 2023-04-21 19:07 ` Bob Peterson
  3 siblings, 0 replies; 8+ messages in thread
From: Bob Peterson @ 2023-04-21 19:07 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Before this patch function gfs2_ail_empty_gl called gfs2_log_flush even
in cases where it encountered an error. It should probably skip the log
flush and leave the file system in an inconsistent state, letting a
subsequent withdraw force the journal to be replayed to reestablish
metadata consistency.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
---
 fs/gfs2/glops.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
index 4b3949f2020b..ef31218060aa 100644
--- a/fs/gfs2/glops.c
+++ b/fs/gfs2/glops.c
@@ -132,8 +132,9 @@ static int gfs2_ail_empty_gl(struct gfs2_glock *gl)
 	}
 
 flush:
-	gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL |
-		       GFS2_LFC_AIL_EMPTY_GL);
+	if (!ret)
+		gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL |
+				GFS2_LFC_AIL_EMPTY_GL);
 	return ret;
 }
 
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Cluster-devel] [GFS2 PATCH 1/4] gfs2: return errors from gfs2_ail_empty_gl
  2023-04-21 19:07 ` [Cluster-devel] [GFS2 PATCH 1/4] gfs2: return errors from gfs2_ail_empty_gl Bob Peterson
@ 2023-04-24 14:07   ` Andreas Gruenbacher
  0 siblings, 0 replies; 8+ messages in thread
From: Andreas Gruenbacher @ 2023-04-24 14:07 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Fri, Apr 21, 2023 at 9:07?PM Bob Peterson <rpeterso@redhat.com> wrote:
> Before this patch, function gfs2_ail_empty_gl did not return errors it
> encountered from __gfs2_trans_begin. Those errors usually came from the
> fact that the file system was made read-only, often due to unmount,
> (but theoretically could be due to -o remount,ro) which prevented
> the transaction from starting.
>
> The inability to start a transaction prevented its revokes from being
> properly written to the journal for glocks during unmount (and
> transition to ro).
>
> That meant glocks could be unlocked without the metadata properly
> revoked in the journal. So other nodes could grab the glock thinking
> that their lvb values were correct, but in fact corresponded to the
> glock without its revokes properly synced. That presented as lvb
> mismatch errors.
>
> This patch allows gfs2_ail_empty_gl to return the error properly to
> the caller.

Alright,

> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>  fs/gfs2/glops.c | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/fs/gfs2/glops.c b/fs/gfs2/glops.c
> index 6e33c8058059..8e245d793e6b 100644
> --- a/fs/gfs2/glops.c
> +++ b/fs/gfs2/glops.c
> @@ -90,7 +90,7 @@ static int gfs2_ail_empty_gl(struct gfs2_glock *gl)
>         struct gfs2_sbd *sdp = gl->gl_name.ln_sbd;
>         struct gfs2_trans tr;
>         unsigned int revokes;
> -       int ret;
> +       int ret = 0;
>
>         revokes = atomic_read(&gl->gl_ail_count);
>
> @@ -124,15 +124,15 @@ static int gfs2_ail_empty_gl(struct gfs2_glock *gl)
>         memset(&tr, 0, sizeof(tr));
>         set_bit(TR_ONSTACK, &tr.tr_flags);
>         ret = __gfs2_trans_begin(&tr, sdp, 0, revokes, _RET_IP_);
> -       if (ret)
> -               goto flush;
> -       __gfs2_ail_flush(gl, 0, revokes);
> -       gfs2_trans_end(sdp);
> +       if (!ret) {
> +               __gfs2_ail_flush(gl, 0, revokes);
> +               gfs2_trans_end(sdp);
> +       }

but the above hunk doesn't help, so let me skip that.

>  flush:
>         gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_NORMAL |
>                        GFS2_LFC_AIL_EMPTY_GL);
> -       return 0;
> +       return ret;
>  }
>
>  void gfs2_ail_flush(struct gfs2_glock *gl, bool fsync)
> @@ -326,7 +326,9 @@ static int inode_go_sync(struct gfs2_glock *gl)
>         ret = gfs2_inode_metasync(gl);
>         if (!error)
>                 error = ret;
> -       gfs2_ail_empty_gl(gl);
> +       ret = gfs2_ail_empty_gl(gl);
> +       if (!error)
> +               error = ret;
>         /*
>          * Writeback of the data mapping may cause the dirty flag to be set
>          * so we have to clear it again here.
> --
> 2.39.2
>

Thanks,
Andreas


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Cluster-devel] [GFS2 PATCH 2/4] gfs2: Perform second log flush in gfs2_make_fs_ro
  2023-04-21 19:07 ` [Cluster-devel] [GFS2 PATCH 2/4] gfs2: Perform second log flush in gfs2_make_fs_ro Bob Peterson
@ 2023-04-24 14:08   ` Andreas Gruenbacher
  2023-04-24 16:08     ` Bob Peterson
  0 siblings, 1 reply; 8+ messages in thread
From: Andreas Gruenbacher @ 2023-04-24 14:08 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On Fri, Apr 21, 2023 at 9:07?PM Bob Peterson <rpeterso@redhat.com> wrote:
> Before this patch function gfs2_make_fs_ro called gfs2_log_flush once to
> finalize the log. However, if there's dirty metadata, log flushes tend
> to sync the metadata and formulate revokes. Before this patch, those
> revokes may not be written out to the journal immediately, which meant
> unresolved glocks could still have revokes in their ail lists. When the
> glock worker runs, it tries to transition the glock, but the unresolved
> revokes in the ail still need to be written, so it tries to start a
> transaction. It's impossible to start a transaction because at that
> point the GFS2_LOG_HEAD_FLUSH_SHUTDOWN has been set by gfs2_make_fs_ro.

Do you mean that at that point, the SDF_JOURNAL_LIVE flag has already
been cleared?

> That cause the glock worker to get an error, and unable to write the
> revokes. The calling sequence looked something like this:
>
> gfs2_make_fs_ro
>    gfs2_log_flush - with GFS2_LOG_HEAD_FLUSH_SHUTDOWN flag set
>         if (flags & GFS2_LOG_HEAD_FLUSH_SHUTDOWN)
>                 clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);
> ...meanwhile...
> glock_work_func
>    do_xmote
>       rgrp_go_sync (or possibly inode_go_sync)
>          ...
>          gfs2_ail_empty_gl
>             __gfs2_trans_begin
>                if (unlikely(!test_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags))) {
>                ...
>                   return -EROFS;
>
> The previous patch in the series ("gfs2: return errors from
> gfs2_ail_empty_gl") now causes the transaction error to no longer be
> ignored, so it causes a warning from MOST of the xfstests:
>
> WARNING: CPU: 11 PID: X at fs/gfs2/super.c:603 gfs2_put_super [gfs2]
>
> which corresponds to:
>
> WARN_ON(gfs2_withdrawing(sdp));
>
> The withdraw was triggered silently from do_xmote by:
>
>         if (unlikely(sdp->sd_log_error && !gfs2_withdrawn(sdp)))
>                 gfs2_withdraw_delayed(sdp);
>
> This patch adds a second log_flush to gfs2_make_fs_ro: one to sync the
> data and one to sync any outstanding revokes and finalize the journal.
> Note that both of these log flushes need to be "special," in other
> words, not GFS2_LOG_HEAD_FLUSH_NORMAL.
>
> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
> ---
>  fs/gfs2/super.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index a83fa62106f0..5eed8c237500 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -552,6 +552,15 @@ void gfs2_make_fs_ro(struct gfs2_sbd *sdp)
>                 gfs2_quota_sync(sdp->sd_vfs, 0);
>                 gfs2_statfs_sync(sdp->sd_vfs, 0);
>
> +               /* We do two log flushes here. The first one commits dirty inodes
> +                * and rgrps to the journal, but queues up revokes to the ail list.
> +                * The second flush writes out and removes the revokes.
> +                *
> +                * The first must be done before the FLUSH_SHUTDOWN code
> +                * clears the LIVE flag, otherwise it will not be able to start
> +                * a transaction to write its revokes, and the error will cause
> +                * a withdraw of the file system. */
> +               gfs2_log_flush(sdp, NULL, GFS2_LFC_MAKE_FS_RO);
>                 gfs2_log_flush(sdp, NULL, GFS2_LOG_HEAD_FLUSH_SHUTDOWN |
>                                GFS2_LFC_MAKE_FS_RO);
>                 wait_event_timeout(sdp->sd_log_waitq,
> --
> 2.39.2
>

Thanks,
Andreas


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Cluster-devel] [GFS2 PATCH 2/4] gfs2: Perform second log flush in gfs2_make_fs_ro
  2023-04-24 14:08   ` Andreas Gruenbacher
@ 2023-04-24 16:08     ` Bob Peterson
  0 siblings, 0 replies; 8+ messages in thread
From: Bob Peterson @ 2023-04-24 16:08 UTC (permalink / raw)
  To: cluster-devel.redhat.com

On 4/24/23 10:08 AM, Andreas Gruenbacher wrote:
>> point the GFS2_LOG_HEAD_FLUSH_SHUTDOWN has been set by gfs2_make_fs_ro.
> 
> Do you mean that at that point, the SDF_JOURNAL_LIVE flag has already
> been cleared?

Ah, yes, you are correct. That was a think-o. Please adjust as appropriate.

Regards,

Bob Peterson


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-04-24 16:08 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-21 19:07 [Cluster-devel] [GFS2 PATCH 0/4] Fix revoke processing at unmount and ro Bob Peterson
2023-04-21 19:07 ` [Cluster-devel] [GFS2 PATCH 1/4] gfs2: return errors from gfs2_ail_empty_gl Bob Peterson
2023-04-24 14:07   ` Andreas Gruenbacher
2023-04-21 19:07 ` [Cluster-devel] [GFS2 PATCH 2/4] gfs2: Perform second log flush in gfs2_make_fs_ro Bob Peterson
2023-04-24 14:08   ` Andreas Gruenbacher
2023-04-24 16:08     ` Bob Peterson
2023-04-21 19:07 ` [Cluster-devel] [GFS2 PATCH 3/4] gfs2: Issue message when revokes cannot be written Bob Peterson
2023-04-21 19:07 ` [Cluster-devel] [GFS2 PATCH 4/4] gfs2: gfs2_ail_empty_gl no log flush on error Bob Peterson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).