* [Cluster-devel] [PATCH] gfs2: clear journal live bit in gfs2_log_flush
@ 2015-12-09 3:21 Benjamin Marzinski
2015-12-09 13:51 ` Bob Peterson
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Benjamin Marzinski @ 2015-12-09 3:21 UTC (permalink / raw)
To: cluster-devel.redhat.com
When gfs2 was unmounting filesystems or changing them to read-only it
was clearing the SDF_JOURNAL_LIVE bit before the final log flush. This
caused a race. If an inode glock got demoted in the gap between
clearing the bit and the shutdown flush, it would be unable to reserve
log space to clear out the active items list in inode_go_sync, causing an
error in inode_go_inval because the glock was still dirty.

To solve this, the SDF_JOURNAL_LIVE bit is now cleared inside the
shutdown log flush. This means that, because of the locking on the log
blocks, either inode_go_sync will be able to reserve space to clean the
glock before the shutdown flush, or the shutdown flush will clean the
glock itself, before inode_go_sync fails to reserve the space. Either
way, the glock will be clean before inode_go_inval.
Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
---
fs/gfs2/log.c | 3 +++
fs/gfs2/super.c | 4 ----
2 files changed, 3 insertions(+), 4 deletions(-)
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 536e7a6..0ff028c 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -716,6 +716,9 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl,
 	}
 	trace_gfs2_log_flush(sdp, 1);
 
+	if (type == SHUTDOWN_FLUSH)
+		clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);
+
 	sdp->sd_log_flush_head = sdp->sd_log_head;
 	sdp->sd_log_flush_wrapped = 0;
 	tr = sdp->sd_log_tr;
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 894fb01..e55c9b6 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -842,10 +842,6 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
 	gfs2_quota_sync(sdp->sd_vfs, 0);
 	gfs2_statfs_sync(sdp->sd_vfs, 0);
 
-	down_write(&sdp->sd_log_flush_lock);
-	clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);
-	up_write(&sdp->sd_log_flush_lock);
-
 	gfs2_log_flush(sdp, NULL, SHUTDOWN_FLUSH);
 	wait_event(sdp->sd_reserving_log_wait, atomic_read(&sdp->sd_reserving_log) == 0);
 	gfs2_assert_warn(sdp, atomic_read(&sdp->sd_log_blks_free) == sdp->sd_jdesc->jd_blocks);
--
1.8.3.1
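
The ordering argument in the commit message can be illustrated with a small, single-threaded user-space model. This is only a sketch under stated assumptions, not the gfs2 code: every toy_ name is hypothetical, log reservation is assumed to fail as soon as the live bit is cleared, and the shutdown flush is assumed to leave the glock clean, as the description above says. The serialisation that the real code gets from its locking is represented simply by the order in which the steps run.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the real gfs2 types. */
enum toy_flush_type { TOY_NORMAL_FLUSH, TOY_SHUTDOWN_FLUSH };

/* Point at which a glock demote (sync, then inval) runs during shutdown. */
enum toy_demote_when {
	TOY_DEMOTE_BEFORE,      /* before shutdown starts */
	TOY_DEMOTE_IN_GAP,      /* between the old clear_bit and the flush */
	TOY_DEMOTE_AFTER_FLUSH  /* after the shutdown flush has finished */
};

struct toy_sbd {
	bool journal_live; /* models SDF_JOURNAL_LIVE */
	bool glock_dirty;  /* glock still has items on its active items list */
};

/* Assumption: log space can only be reserved while the journal is live. */
static bool toy_log_reserve(const struct toy_sbd *sdp)
{
	return sdp->journal_live;
}

/* Models inode_go_sync: cleans the glock only if log space is available. */
static void toy_go_sync(struct toy_sbd *sdp)
{
	if (sdp->glock_dirty && toy_log_reserve(sdp))
		sdp->glock_dirty = false;
}

/* Models inode_go_inval: reports an error if the glock is still dirty. */
static int toy_go_inval(const struct toy_sbd *sdp)
{
	return sdp->glock_dirty ? -1 : 0;
}

static int toy_demote(struct toy_sbd *sdp)
{
	toy_go_sync(sdp);
	return toy_go_inval(sdp);
}

/* Models the flush; clear_inside selects the patched behaviour. */
static void toy_log_flush(struct toy_sbd *sdp, enum toy_flush_type type,
			  bool clear_inside)
{
	if (clear_inside && type == TOY_SHUTDOWN_FLUSH)
		sdp->journal_live = false;
	sdp->glock_dirty = false; /* the flush writes everything out */
}

/* Runs one shutdown sequence and returns the demote result (0 or -1). */
int toy_shutdown(bool patched, enum toy_demote_when when)
{
	struct toy_sbd sdp = { .journal_live = true, .glock_dirty = true };
	int err = 0;

	if (when == TOY_DEMOTE_BEFORE)
		err = toy_demote(&sdp);

	if (!patched)
		sdp.journal_live = false; /* old order: bit cleared first */

	if (when == TOY_DEMOTE_IN_GAP)
		err = toy_demote(&sdp);

	toy_log_flush(&sdp, TOY_SHUTDOWN_FLUSH, patched);
	assert(!sdp.journal_live); /* either order ends with the bit clear */

	if (when == TOY_DEMOTE_AFTER_FLUSH)
		err = toy_demote(&sdp);

	return err;
}
```

In this model, toy_shutdown(false, TOY_DEMOTE_IN_GAP) returns -1, which corresponds to the inode_go_inval error described above, while the patched variant returns 0 for all three demote points because the gap no longer exists.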
* [Cluster-devel] [PATCH] gfs2: clear journal live bit in gfs2_log_flush
2015-12-09 3:21 [Cluster-devel] [PATCH] gfs2: clear journal live bit in gfs2_log_flush Benjamin Marzinski
@ 2015-12-09 13:51 ` Bob Peterson
2015-12-09 14:02 ` Andreas Gruenbacher
2015-12-09 14:31 ` Steven Whitehouse
2 siblings, 0 replies; 4+ messages in thread
From: Bob Peterson @ 2015-12-09 13:51 UTC (permalink / raw)
To: cluster-devel.redhat.com
----- Original Message -----
> When gfs2 was unmounting filesystems or changing them to read-only it
> was clearing the SDF_JOURNAL_LIVE bit before the final log flush. This
> caused a race. If an inode glock got demoted in the gap between
> clearing the bit and the shutdown flush, it would be unable to reserve
> log space to clear out the active items list in inode_go_sync, causing an
> error in inode_go_inval because the glock was still dirty.
>
> To solve this, the SDF_JOURNAL_LIVE bit is now cleared inside the
> shutdown log flush. This means that, because of the locking on the log
> blocks, either inode_go_sync will be able to reserve space to clean the
> glock before the shutdown flush, or the shutdown flush will clean the
> glock itself, before inode_go_sync fails to reserve the space. Either
> way, the glock will be clean before inode_go_inval.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> fs/gfs2/log.c | 3 +++
> fs/gfs2/super.c | 4 ----
> 2 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
> index 536e7a6..0ff028c 100644
> --- a/fs/gfs2/log.c
> +++ b/fs/gfs2/log.c
> @@ -716,6 +716,9 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct
> gfs2_glock *gl,
> }
> trace_gfs2_log_flush(sdp, 1);
>
> + if (type == SHUTDOWN_FLUSH)
> + clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);
> +
> sdp->sd_log_flush_head = sdp->sd_log_head;
> sdp->sd_log_flush_wrapped = 0;
> tr = sdp->sd_log_tr;
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 894fb01..e55c9b6 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -842,10 +842,6 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
> gfs2_quota_sync(sdp->sd_vfs, 0);
> gfs2_statfs_sync(sdp->sd_vfs, 0);
>
> - down_write(&sdp->sd_log_flush_lock);
> - clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);
> - up_write(&sdp->sd_log_flush_lock);
> -
> gfs2_log_flush(sdp, NULL, SHUTDOWN_FLUSH);
> wait_event(sdp->sd_reserving_log_wait, atomic_read(&sdp->sd_reserving_log)
> == 0);
> gfs2_assert_warn(sdp, atomic_read(&sdp->sd_log_blks_free) ==
> sdp->sd_jdesc->jd_blocks);
> --
> 1.8.3.1
>
>
Hi,
Thanks. This is now applied to the for-next branch of the linux-gfs2 tree:
https://git.kernel.org/cgit/linux/kernel/git/gfs2/linux-gfs2.git/commit/fs/gfs2?h=for-next&id=3ceb22a7b7a1e50658cce8c43d942e9f31f654bd
Regards,
Bob Peterson
Red Hat File Systems
* [Cluster-devel] [PATCH] gfs2: clear journal live bit in gfs2_log_flush
2015-12-09 3:21 [Cluster-devel] [PATCH] gfs2: clear journal live bit in gfs2_log_flush Benjamin Marzinski
2015-12-09 13:51 ` Bob Peterson
@ 2015-12-09 14:02 ` Andreas Gruenbacher
2015-12-09 14:31 ` Steven Whitehouse
2 siblings, 0 replies; 4+ messages in thread
From: Andreas Gruenbacher @ 2015-12-09 14:02 UTC (permalink / raw)
To: cluster-devel.redhat.com
Ben,
the fix is looking good and I can no longer trigger this bug, thanks.
Andreas
* [Cluster-devel] [PATCH] gfs2: clear journal live bit in gfs2_log_flush
2015-12-09 3:21 [Cluster-devel] [PATCH] gfs2: clear journal live bit in gfs2_log_flush Benjamin Marzinski
2015-12-09 13:51 ` Bob Peterson
2015-12-09 14:02 ` Andreas Gruenbacher
@ 2015-12-09 14:31 ` Steven Whitehouse
2 siblings, 0 replies; 4+ messages in thread
From: Steven Whitehouse @ 2015-12-09 14:31 UTC (permalink / raw)
To: cluster-devel.redhat.com
Hi,
Looks good to me. We should really add the "type" into the log flush
tracing though. I'd quite like to also be able to differentiate between
log flushes caused by timeouts, and those caused by glock dropping (and
possibly other causes too) but I know that is a much more tricky thing
to do. It would be very handy for performance analysis though,
Steve.
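
As a rough illustration of the suggestion above, the following user-space C sketch shows what a flush trace record carrying the type might look like. It is not the kernel tracepoint: the toy_ names, the record format, and the set of flush types are all hypothetical, and the start argument is only assumed to play the role of the second argument to trace_gfs2_log_flush in the patch.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical flush types; only a shutdown flush appears in the patch. */
enum toy_flush_type { TOY_NORMAL_FLUSH, TOY_SHUTDOWN_FLUSH };

/* Maps a flush type to a short name for the trace record. */
static const char *toy_flush_type_name(enum toy_flush_type type)
{
	switch (type) {
	case TOY_NORMAL_FLUSH:
		return "normal";
	case TOY_SHUTDOWN_FLUSH:
		return "shutdown";
	}
	return "unknown";
}

/*
 * Formats one trace record into buf and returns what snprintf returns,
 * i.e. the length the full record needs, excluding the terminating NUL.
 * start is nonzero at the beginning of a flush and zero at the end.
 */
int toy_trace_log_flush(char *buf, size_t len, int start,
			enum toy_flush_type type)
{
	return snprintf(buf, len, "log flush %s type=%s",
			start ? "start" : "end", toy_flush_type_name(type));
}
```

With a record like this, a shutdown flush can be told apart from an ordinary one when reading a trace. Distinguishing timeout-driven flushes from glock-driven ones would need the caller to pass in more information, which this sketch does not attempt.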
On 09/12/15 03:21, Benjamin Marzinski wrote:
> When gfs2 was unmounting filesystems or changing them to read-only it
> was clearing the SDF_JOURNAL_LIVE bit before the final log flush. This
> caused a race. If an inode glock got demoted in the gap between
> clearing the bit and the shutdown flush, it would be unable to reserve
> log space to clear out the active items list in inode_go_sync, causing an
> error in inode_go_inval because the glock was still dirty.
>
> To solve this, the SDF_JOURNAL_LIVE bit is now cleared inside the
> shutdown log flush. This means that, because of the locking on the log
> blocks, either inode_go_sync will be able to reserve space to clean the
> glock before the shutdown flush, or the shutdown flush will clean the
> glock itself, before inode_go_sync fails to reserve the space. Either
> way, the glock will be clean before inode_go_inval.
>
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
> fs/gfs2/log.c | 3 +++
> fs/gfs2/super.c | 4 ----
> 2 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
> index 536e7a6..0ff028c 100644
> --- a/fs/gfs2/log.c
> +++ b/fs/gfs2/log.c
> @@ -716,6 +716,9 @@ void gfs2_log_flush(struct gfs2_sbd *sdp, struct gfs2_glock *gl,
> }
> trace_gfs2_log_flush(sdp, 1);
>
> + if (type == SHUTDOWN_FLUSH)
> + clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);
> +
> sdp->sd_log_flush_head = sdp->sd_log_head;
> sdp->sd_log_flush_wrapped = 0;
> tr = sdp->sd_log_tr;
> diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
> index 894fb01..e55c9b6 100644
> --- a/fs/gfs2/super.c
> +++ b/fs/gfs2/super.c
> @@ -842,10 +842,6 @@ static int gfs2_make_fs_ro(struct gfs2_sbd *sdp)
> gfs2_quota_sync(sdp->sd_vfs, 0);
> gfs2_statfs_sync(sdp->sd_vfs, 0);
>
> - down_write(&sdp->sd_log_flush_lock);
> - clear_bit(SDF_JOURNAL_LIVE, &sdp->sd_flags);
> - up_write(&sdp->sd_log_flush_lock);
> -
> gfs2_log_flush(sdp, NULL, SHUTDOWN_FLUSH);
> wait_event(sdp->sd_reserving_log_wait, atomic_read(&sdp->sd_reserving_log) == 0);
> gfs2_assert_warn(sdp, atomic_read(&sdp->sd_log_blks_free) == sdp->sd_jdesc->jd_blocks);