From mboxrd@z Thu Jan  1 00:00:00 1970
From: Wendy Cheng <wcheng@redhat.com>
Date: Fri, 10 Aug 2007 09:12:14 -0400
Subject: [Cluster-devel] [PATCH 4 of 5] Bz #248176: GFS2:	invalid	metadata
	block - REVISED
In-Reply-To: <1186734409.8765.757.camel@quoit>
References: <1186609929.25269.46.camel@technetium.msp.redhat.com>	
	<46BB1AC4.9030509@redhat.com>	
	<1186673842.25269.81.camel@technetium.msp.redhat.com>	
	<46BB5B2B.2040405@redhat.com> <1186734409.8765.757.camel@quoit>
Message-ID: <46BC642E.5080606@redhat.com>
List-Id: <cluster-devel.redhat.com>
To: cluster-devel.redhat.com
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

Steven Whitehouse wrote:

>Hi,
>
>On Thu, 2007-08-09 at 14:21 -0400, Wendy Cheng wrote:
>  
>
>>Bob Peterson wrote:
>>    
>>
>>>On Thu, 2007-08-09 at 09:46 -0400, Wendy Cheng wrote:
>>>  
>>>      
>>>
>>>>Set aside "after this patch, the problem goes away" thing ...
>>>>
>>>>I haven't checked previous three patches yet so I may not have the 
>>>>overall picture ... but why adding the journal flush spin lock here 
>>>>could prevent the new inode to get re-used before its associated buffer 
>>>>are flushed to the logs ? Could you elaborate more ?
>>>>
>>>>    
>>>>        
>>>>
>>>>>+		down_write(&sdp->sd_log_flush_lock);
>>>>> 		block = rgblk_search(rgd, goal, GFS2_BLKST_UNLINKED,
>>>>> 				     GFS2_BLKST_UNLINKED);
>>>>>+		up_write(&sdp->sd_log_flush_lock);
>>>>>      
>>>>>          
>>>>>
>>>IIRC, if we don't protect rgblk_search from finding GFS2_BLKST_UNLINKED
>>>blocks, a "deleted" inode may be returned to function
>>>gfs2_inplace_reserve_i which will do an iput on the inode,
>>>which may reference buffers that are being flushed to disk.
>>>If almost all blocks in that bitmap are allocated, I think the
>>>deleted block may sometimes be reused and the buffer 
>>>associated with the reused block may be changed before it's
>>>actually written out to disk.
>>>  
>>>      
>>>
>>Log flushing is an asynchronous event. I still don't see how this can 
>>*protect* the condition you just described (i.e., prevents the block 
>>being assigned to someone else before log flush occurs).  Or do I 
>>understand your statement right (i.e., the log flushing must occur 
>>before the block is used by someone else) ? It may *reduce* the 
>>possibility (if log flushing happens at the same time as this 
>>assignment) but I don't see how it can *stop* the condition.
>>
>>You may "reduce" the (rare) possibility but the real issue is still 
>>hanging there ? If this is true, then I don't agree we have to pay the 
>>price of moving a journal flushing lock into resource handling code.
>>
>>-- Wendy
>>
>>    
>>
>
>Due to the way in which the locking is defined, the journal lock is also
>used to keep other processes out of the rgrp bitmaps. This prevents the
>state of the rgrp bitmaps changing while we are scanning them in case a
>journal flush might occur.
>
>The sd_log_flush_lock is an rwsem which is held in read mode by each and
>every transaction and in write mode when flushing the journal. Log
>flushing ought to be an asynchronous event, but due to the design of the
>journaling, it unfortunately isn't in GFS2. It is something that we need
>to review in the future,
>
>
>  
>

It is still not clear what exactly does this lock protect ? The unlinked 
rgrp bitmap itself or the buffers associated with these disk blocks ? If 
it is later (the buffers as Bob said),  it implies for every block GFS2 
takes from this unlinked bitmap, journal flush *has* to happen before it 
can be used ? Could you elaborate more ?

-- Wendy