David Chinner wrote:
> On Fri, Aug 31, 2007 at 02:01:37PM +1000, Mark Goodwin wrote:
>> Lachlan McIlroy wrote:
>>> Timothy Shimmin wrote:
>>>> Timothy Shimmin wrote:
>>>>>>> But I'm not sure this is an error...
>>>>>>> Hmmmm...I'm a bit confused.
>>>>>>> So you are _almost_ combining an error check with a flushiter check?
>>>>>>> If one buffer is an inode magic# and the other isn't then we
>>>>>>> have an error right - and could report it - but we are not doing
>>>>>>> that here.
>>>>>> Not exactly. If what's on disk is not an inode but the log item is
>>>>>> then that could be because we haven't written the inode to disk yet
>>>>>> and we need to perform recovery.
>>>>> Yeah, I was thinking about that afterward.
>>>>> The item's format which gives the blk# for the buf to read could
>>>>> be a block which hasn't been used for an inode yet.
>>>>>
>>>> Well, if what's on disk is not an inode but some other data
>>>> and it happens to have the inode magic# which is remotely possible,
>>>> then we are making a bad assumption.
>>>> i.e. if we're not sure what the block/buffer should be, then testing the
>>>> MAGIC# isn't a guarantee it's an inode then.
>>>> Well not for the freeing of inode clusters case I would assume.
>>>> Or am I missing something?
>>> I don't think you're missing anything!
>>>
>>> You're right though - a magic number check is no guarantee. On the same
>>> vein, adding a generation number check isn't much better.
>> unlink will have to invalidate the on-disk inode magic number? Or only
>> when the whole cluster is free'd?
>
> An unlinked inode is only detectable by the mode parameter being zero.
> The rest of the inode will look valid.
>
> To detect the difference between a newly allocated inode *chunk*
> that has been written to and a stale inode chunk that we have
> just allocated and not written to yet, you need to walk every inode
> in the chunk and determine if the mode parameter is zero in every
> inode.
>
> If the mode is zero for all inodes and there are generation numbers
> that are not zero, then you've detected a stale buffer and you should
> replay the inode cluster buffer initialisation.
>

Thanks for this info Dave.

I looked into it and came up with a solution that looks at the ondisk
inode buffer and determines if it has been written to since being
logged. It iterates through all the inodes and checks each one with:
- if the magic number is wrong the buffer is stale
- if the mode is non-zero then the buffer is newer than the log
- if the mode is zero and the generation count is non-zero then the
  buffer is stale
If the end result is a stale buffer then the buffer is replayed
otherwise it is skipped. I added a new flag that gets logged with a new
inode cluster so that we can identify a buffer of inodes from something
else.

This fix is passing all the tests we have. Is this a better approach
than the last fix?

Lachlan
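
For illustration only (this is not the actual patch), the per-inode
checks listed above could be sketched in C roughly as follows. The
struct, the function name and the SKETCH_ prefix are hypothetical
stand-ins rather than the real XFS definitions, and the sketch ignores
on-disk byte order. It also makes two assumptions the thread does not
spell out: a bad magic number anywhere in the buffer wins over a
non-zero mode elsewhere, and a buffer whose inodes are all zero mode and
zero generation is treated as not stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_DINODE_MAGIC 0x494e	/* "IN" */

/* Hypothetical, simplified stand-in for the on-disk inode core. */
struct sketch_dinode {
	uint16_t magic;	/* inode magic number */
	uint16_t mode;	/* zero for a free or unlinked inode */
	uint32_t gen;	/* generation count */
};

/*
 * Walk every inode in the buffer and decide whether the buffer is
 * stale.  Returns true if the logged cluster initialisation should be
 * replayed, false if the buffer should be skipped.
 */
static bool sketch_inode_buf_is_stale(const struct sketch_dinode *dip,
				      size_t ninodes)
{
	bool in_use = false;	/* saw an inode with a non-zero mode */
	bool gen_seen = false;	/* saw a non-zero generation count */
	size_t i;

	assert(dip != NULL || ninodes == 0);

	for (i = 0; i < ninodes; i++) {
		/* Wrong magic number: the buffer is stale. */
		if (dip[i].magic != SKETCH_DINODE_MAGIC)
			return true;
		/* Non-zero mode: the buffer is newer than the log. */
		if (dip[i].mode != 0)
			in_use = true;
		/* Remember any non-zero generation count. */
		else if (dip[i].gen != 0)
			gen_seen = true;
	}

	if (in_use)
		return false;
	/*
	 * Every mode is zero: stale only if some generation count is
	 * non-zero.  The all-zero case returning false is an assumption.
	 */
	return gen_seen;
}
```

Flags are collected over the whole walk instead of returning on the
first non-zero mode, so the result does not depend on the order of the
inodes within the buffer.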