cluster-devel.redhat.com archive mirror
* [Cluster-devel] [PATCH 0 of 5]Bz #248176: GFS2: invalid metadata block, gfs2_meta_indirect_buffer
@ 2007-07-24  5:07 Bob Peterson
  2007-07-24 13:21 ` Steven Whitehouse
  0 siblings, 1 reply; 2+ messages in thread
From: Bob Peterson @ 2007-07-24  5:07 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Here is a set of five patches designed to fix the "invalid metadata
block" and hang problems encountered when running the revolver test.

In order, the five patches are:

1. There were still some critical variables being manipulated outside
   the log_lock spinlock.  That usually resulted in more hangs.
2. The list_move code previously concocted in log.c for bug #238162
   (see https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=238162#c23)
   seems to be causing a problem.  That section was reverted.  HOWEVER,
   I need to rerun the test cases listed in that bug to make sure
   removing it doesn't cause anything to break.  I haven't had time yet.
3. The try_rgrp_unlink code in rgrp.c had an infinite loop.  This was
   caused because the bitmap function rgblk_search can return a block
   less than the "goal" block, in which case it was looping.  The fix is
   to make it always march forward as needed.
4. There was metadata corruption caused because the clone bitmaps weren't
   being kept in synch with the regular bitmaps in some cases.
   Code was added to keep them in synch.
5. Metadata corruption was occurring because page references weren't
   being removed in all cases.  I previously added a function called
   detach_bufdata, but I discovered there already WAS a function out
   there to do the job.  It's called gfs2_meta_cache_flush.  So I added
   a call to that to remove the page references.
   Recently I had been thinking that this was entirely unnecessary, but
   when I removed the code, the metadata corruption problem returned
   immediately.  It might be that there is a timing window where the
   pages can be referenced before gfs2_meta_cache_flush is called and
   my patch cleans them up sooner.
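
The loop problem in item 3 can be illustrated with a small standalone
sketch. This is not the rgrp.c code: search_wrapping() is a hypothetical
stand-in that assumes the search wraps around to the start of the bitmap,
which is one way a search can return a block below the goal. The caller
treats any result below the goal as "nothing left ahead" and stops, so
the goal only ever moves forward.

```c
#include <assert.h>
#include <stddef.h>

#define NO_BLOCK ((size_t)-1)

/*
 * Hypothetical stand-in for a bitmap search (not the real rgblk_search).
 * Scans from `goal`, wraps around to index 0, and returns the first index
 * whose entry equals `state`, or NO_BLOCK if there is none.
 */
static size_t search_wrapping(const unsigned char *map, size_t len,
                              size_t goal, unsigned char state)
{
    assert(goal < len);
    for (size_t i = 0; i < len; i++) {
        size_t blk = (goal + i) % len;
        if (map[blk] == state)
            return blk;
    }
    return NO_BLOCK;
}

/*
 * Visit every block in `state` exactly once and return how many were seen.
 * Without the `blk < goal` check, a wrapped result would move the goal
 * backwards and the loop could revisit the same block forever.
 */
static size_t visit_forward(const unsigned char *map, size_t len,
                            unsigned char state)
{
    size_t goal = 0, visited = 0;

    while (goal < len) {
        size_t blk = search_wrapping(map, len, goal, state);
        if (blk == NO_BLOCK || blk < goal)
            break;          /* nothing at or after goal: stop */
        visited++;
        goal = blk + 1;     /* always march forward */
    }
    return visited;
}
```

With the map {1, 0, 0} and state 1, the second search wraps back to
block 0; the check ends the loop instead of resetting the goal to 1
over and over.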





* [Cluster-devel] [PATCH 0 of 5]Bz #248176: GFS2: invalid metadata block, gfs2_meta_indirect_buffer
  2007-07-24  5:07 [Cluster-devel] [PATCH 0 of 5]Bz #248176: GFS2: invalid metadata block, gfs2_meta_indirect_buffer Bob Peterson
@ 2007-07-24 13:21 ` Steven Whitehouse
  0 siblings, 0 replies; 2+ messages in thread
From: Steven Whitehouse @ 2007-07-24 13:21 UTC (permalink / raw)
  To: cluster-devel.redhat.com

Hi,

None of these patches apply due to whitespace breakage.

On Tue, 2007-07-24 at 00:07 -0500, Bob Peterson wrote:
> Here is a set of five patches designed to fix the "invalid metadata
> block" and hang problems encountered when running the revolver test.
> 
> In order, the five patches are:
> 
> 1. There were still some critical variables being manipulated outside
>    the log_lock spinlock.  That usually resulted in more hangs.
The BUG_ON(!buffer_mapped()) line in this patch can be removed as it's
only debugging.

> 2. The list_move code previously concocted in log.c for bug #238162
>    (see https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=238162#c23)
>    seems to be causing a problem.  That section was reverted.  HOWEVER,
>    I need to rerun the test cases listed in that bug to make sure
>    removing it doesn't cause anything to break.  I haven't had time yet.
Moving the assignment head = &sdp->sd_ail1_list; doesn't give us
anything since head is a pointer and will be constant whether we have
the lock or not.
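
A minimal standalone sketch of that point, using hypothetical names
(struct sbd_like, ail1_list) rather than the real GFS2 structures: the
address of a list head embedded in a structure is fixed for the life of
the structure, so taking it does not need the lock. Only the links stored
in the list change when entries are added or removed, and those are what
the lock has to cover.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified doubly linked list node, modelled on the kernel's list_head. */
struct list_node {
    struct list_node *next, *prev;
};

/* Hypothetical stand-in for a superblock structure with an embedded list. */
struct sbd_like {
    int other_field;
    struct list_node ail1_list;
};

static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

static void list_add_front(struct list_node *item, struct list_node *head)
{
    assert(head->next != NULL);
    item->next = head->next;
    item->prev = head;
    head->next->prev = item;
    head->next = item;
}

/*
 * Returns 1 if the address of the embedded head is the same before and
 * after the list contents change, and the change itself is visible
 * through that unchanged pointer.
 */
static int head_address_is_stable(void)
{
    struct sbd_like sd;
    struct list_node item;
    struct list_node *before, *after;

    list_init(&sd.ail1_list);
    before = &sd.ail1_list;             /* taken "outside the lock" */
    list_add_front(&item, &sd.ail1_list);
    after = &sd.ail1_list;              /* taken "inside the lock" */

    return before == after && before->next == &item;
}
```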

> 3. The try_rgrp_unlink code in rgrp.c had an infinite loop.  This was
>    caused because the bitmap function rgblk_search can return a block
>    less than the "goal" block, in which case it was looping.  The fix is
>    to make it always march forward as needed.
Ok.

> 4. There was metadata corruption caused because the clone bitmaps weren't
>    being kept in synch with the regular bitmaps in some cases.
>    Code was added to keep them in synch.
I need to look at this in more detail. You might be right, but it's worth
being rather careful in this part of the code.

> 5. Metadata corruption was occurring because page references weren't
>    being removed in all cases.  I previously added a function called
>    detach_bufdata, but I discovered there already WAS a function out
>    there to do the job.  It's called gfs2_meta_cache_flush.  So I added
>    a call to that to remove the page references.
>    Recently I had been thinking that this was entirely unnecessary, but
>    when I removed the code, the metadata corruption problem returned
>    immediately.  It might be that there is a timing window where the
>    pages can be referenced before gfs2_meta_cache_flush is called and
>    my patch cleans them up sooner.
Yes, this looks good.

Steve.



