From mboxrd@z Thu Jan 1 00:00:00 1970
From: Steven Whitehouse
Date: Tue, 24 Jul 2007 14:21:52 +0100
Subject: [Cluster-devel] [PATCH 0 of 5]Bz #248176: GFS2: invalid metadata block, gfs2_meta_indirect_buffer
In-Reply-To: <1185253671.517.60.camel@technetium.msp.redhat.com>
References: <1185253671.517.60.camel@technetium.msp.redhat.com>
Message-ID: <1185283312.8765.431.camel@quoit>
List-Id:
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Hi,

None of these patches apply due to whitespace breakage.

On Tue, 2007-07-24 at 00:07 -0500, Bob Peterson wrote:
> Here is a set of five patches designed to fix the "invalid metadata
> block" and hang problems encountered when running the revolver test.
>
> In order, the five patches are:
>
> 1. There were still some critical variables being manipulated outside
>    the log_lock spinlock. That usually resulted in more hangs.

The BUG_ON(!buffer_mapped()) line in this patch can be removed as it's
only debugging.

> 2. The list_move code previously concocted in log.c for bug #238162
>    (see https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=238162#c23)
>    seems to be causing a problem. That section was reverted. HOWEVER,
>    I need to rerun the test cases listed in that bug to make sure
>    removing it doesn't cause anything to break. I haven't had time yet.

Moving the assignment head = &sdp->sd_ail1_list; doesn't give us anything
since head is a pointer and will be constant whether we have the lock or
not.

> 3. The try_rgrp_unlink code in rgrp.c had an infinite loop. This was
>    caused because the bitmap function rgblk_search can return a block
>    less than the "goal" block, in which case it was looping. The fix is
>    to make it always march forward as needed.

Ok.

> 4. There was metadata corruption caused because the clone bitmaps weren't
>    being kept in synch with the regular bitmaps in some cases.
>    Code was added to keep them in synch.

I need to look at this in more detail. You might be right, but it's worth
being rather careful in this part of the code.

> 5. Metadata corruption was occurring because page references weren't
>    being removed in all cases. I previously added a function called
>    detach_bufdata, but I discovered there already WAS a function out
>    there to do the job. It's called gfs2_meta_cache_flush. So I added
>    a call to that to remove the page references.
>    Recently I had been thinking that this was entirely unnecessary, but
>    when I removed the code, the metadata corruption problem returned
>    immediately. It might be that there is a timing window where the
>    pages can be referenced before gfs2_meta_cache_flush is called and
>    my patch cleans them up sooner.

Yes, this looks good,

Steve.