From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: with ECARTIS (v1.0.0; list xfs); Thu, 30 Aug 2007 19:12:56 -0700 (PDT) Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by oss.sgi.com (8.12.10/8.12.10/SuSE Linux 0.7) with SMTP id l7V2Ci4p030880 for ; Thu, 30 Aug 2007 19:12:50 -0700 Message-ID: <46D7799F.5030609@sgi.com> Date: Fri, 31 Aug 2007 12:14:55 +1000 From: Lachlan McIlroy MIME-Version: 1.0 Subject: Re: [PATCH] log replay should not overwrite newer ondisk inodes References: <46D6279F.40601@sgi.com> <46D6480F.4040307@sgi.com> <46D64CAD.6050705@sgi.com> <46D67FE6.20205@sgi.com> In-Reply-To: <46D67FE6.20205@sgi.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: xfs-bounce@oss.sgi.com Errors-to: xfs-bounce@oss.sgi.com List-Id: xfs To: Timothy Shimmin Cc: xfs-dev , xfs-oss Timothy Shimmin wrote: > Lachlan McIlroy wrote: >> Timothy Shimmin wrote: >>> Lachlan McIlroy wrote: >>>> Log replay of clustered inodes currently ignores the flushiter >>>> field in the inode that is used to determine if the on-disk inode >>>> is more up to date than the copy in the log. As a result during >>>> log replay the newer inode is being overwritten with an older >>>> version and file size updates are being lost. >>>> >>>> I haven't handled the case of the flushiter counter overflowing >>>> but that shouldn't be a problem in this case. The log buffer >>>> contains newly created inodes so their flushiter values will be 0 >>>> and the on-disk inodes should not be much greater. >>>> >>>> Lachlan >>>> >>> >>> Still would want to understand why blf_flags doesn't have >>> XFS_BLI_INODE_ALLOC_BUF set and so we could test that >>> - I didn't understand Dave's (dgc) response about that earlier. >>> But anyway...:) >>> >>> >>> Just some nit-picks: >>> >>> * instead of a comment and then calling xfs_recover_inode >>> why don't we just give the function a better name. >>> When I see the name I think it is going to recover the inode >>> but instead it is just a flush_iter check and matching >>> the magic#s. >> What would you like the function to be called? >> > Dunno :) > xfs_flushiter_check > >>> >>> * My style preference is for option 2 over option 1: >>> 1. >>> if (a) { >>> return 1; >>> } >>> return 0; >>> >>> 2. >>> return a; >>> >>> But I understand the predicate is long and sometimes people like >>> to add tracing/debug for the alternate paths in which case (1) >>> is handy for that. I just like simplicity. >> I can change that. It just looks odd when the body of the function >> is after the return statement. >> > It doesn't look funny to me :-) > But I'm not really fussed - do what you prefer. > >>> >>> * Also it is funny to return a boolean for error in the call. >>> Although, I see that is consistent with the rest of the function. >> Yep, keeping consistent with coding standards. > Well consistent with that function at least :) > >> >>> But I'm not sure this is an error... >>> Hmmmm...I'm a bit confused. >>> So you are _almost_ combining an error check with a flushiter check? >>> If one buffer is an inode magic# and the other isn't then we >>> have an error right - and could report it - but we are not doing >>> that here. >> Not exactly. If what's on disk is not an inode but the log item is >> then that could be because we haven't written the inode to disk yet >> and we need to perform recovery. > Yeah, I was thinking about that afterward. > The item's format which gives the blk# for the buf to read could > be a block which hasn't been used for an inode yet. > > OOI, > Say the inode block was freed in the past and we freshly allocate one using > this inode buf item in replay, so that the old block has the inode magic# > but it has an old large flushiter. We then say it is newer when > really it is older. Yeah that could be a problem. Knew this flushiter thing was a hack - if we logged every inode change we wouldn't have to worry what is on disk we could just play back the log. > We could check for zero mode or different gen#. That might work, thanks. I suppose the existing flushiter check will need this extra sanity too. > Or perhaps we reset the flushiter when we free the inode or > do we go thru a different replay path. Not sure how this would work. > > > > If what's on disk is an inode but >> the log item is not an inode then what we are about to recover is not >> an inode and we abort the check. > Well yes we are only interested in inode buf items. > >> If neither on-disk nor log item are >> inodes then we've got no business continuing the check. > Well as above it aint even an inode in the log. > >> >>> If we have an inode buf and the flushiter on the item is older than >>> the ondisk buffer >>> one then it is not an error but we just don't want to recover it. >> Exactly. >> > So its not an error then yet we are calling it that. > So perhaps we could have: > > if (!error && !old_inode_item) { > .... > > --Tim > > > > > > >>> --- fs/xfs/xfs_log_recover.c_1.322 2007-08-27 17:45:45.000000000 >>> +1000 >>> +++ fs/xfs/xfs_log_recover.c 2007-08-30 11:50:44.000000000 +1000 >>> @@ -1866,6 +1866,27 @@ xlog_recover_do_inode_buffer( >>> } >>> >>> /* >>> + * Check if we need to recover an inode from a buffer >>> + */ >>> +int >>> +xfs_recover_inode( >>> + char *dest, >>> + char *src) >>> +{ >>> + xfs_dinode_t *dip = (xfs_dinode_t *)dest; >>> + xfs_dinode_t *dilp = (xfs_dinode_t*)src; >>> + >>> + if ((be16_to_cpu(dip->di_core.di_magic) == XFS_DINODE_MAGIC) && >>> + (be16_to_cpu(dilp->di_core.di_magic) == XFS_DINODE_MAGIC) && >>> + (be16_to_cpu(dilp->di_core.di_flushiter) < >>> + be16_to_cpu(dip->di_core.di_flushiter))) { >>> + return 1; >>> + } >>> + >>> + return 0; >>> +} >>> + >>> +/* >>> * Perform a 'normal' buffer recovery. Each logged region of the >>> * buffer should be copied over the corresponding region in the >>> * given buffer. The bitmap in the buf log format structure indicates >>> @@ -1917,6 +1938,13 @@ xlog_recover_do_reg_buffer( >>> -1, 0, XFS_QMOPT_DOWARN, >>> "dquot_buf_recover"); >>> } >>> + /* >>> + * Sanity check if this is an inode buffer. >>> + */ >>> + if (!error) >>> + error = xfs_recover_inode(xfs_buf_offset(bp, >>> + (uint)bit << XFS_BLI_SHIFT), >>> + item->ri_buf[i].i_addr); >>> if (!error) >>> memcpy(xfs_buf_offset(bp, >>> (uint)bit << XFS_BLI_SHIFT), /* dest */ >>> >>> > > >