[Cluster-devel] [GFS2 PATCH] GFS2: Flush work queue before clearing glock hash tables


From: Bob Peterson @ 2013-04-25 16:49 UTC
  To: cluster-devel.redhat.com

Hi,

There was a timing window during unmount of a GFS2 file system
that caused GFS2 to call BUG() and panic the kernel. The call
to BUG() is meant to ensure that the glock reference count,
gl_ref, never drops to zero and then bounces back up again. What was
happening during umount is that gfs2_put_super was dequeuing
its glocks for well-known files. In particular, we saw it on the
journal glock, sd_jinode_gh. The dequeue caused delayed work to be
queued for the glock state machine, to transition the lock to an
"unlocked" state. While that work was still queued, gfs2_put_super
called gfs2_gl_hash_clear to clear out the glock hash tables.
If the timing was just so, the glock work function would drop the
reference count at the moment it was being checked for zero,
and that caused BUG() to be called. This patch calls
flush_workqueue before clearing the glock hash tables, thereby
ensuring that the delayed work has executed before the hash tables
are cleared, so the reference count never goes to zero
until the glock is cleared.

Regards,

Bob Peterson
Red Hat File Systems

Signed-off-by: Bob Peterson <rpeterso@redhat.com> 
---
diff --git a/fs/gfs2/glock.c b/fs/gfs2/glock.c
index 3b9e178..b777691 100644
--- a/fs/gfs2/glock.c
+++ b/fs/gfs2/glock.c
@@ -1577,6 +1577,7 @@ static void dump_glock_func(struct gfs2_glock *gl)
 void gfs2_gl_hash_clear(struct gfs2_sbd *sdp)
 {
 	set_bit(SDF_SKIP_DLM_UNLOCK, &sdp->sd_flags);
+	flush_workqueue(glock_workqueue);
 	glock_hash_walk(clear_glock, sdp);
 	flush_workqueue(glock_workqueue);
 	wait_event(sdp->sd_glock_wait, atomic_read(&sdp->sd_glock_disposal) == 0);


From: Steven Whitehouse @ 2013-04-26  9:43 UTC
  To: cluster-devel.redhat.com

Hi,

Now in the -nmw tree. Thanks,

Steve.
