From: Lachlan McIlroy <lachlan@sgi.com>
To: Timothy Shimmin <tes@sgi.com>
Cc: xfs-dev <xfs-dev@sgi.com>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: [PATCH] log replay should not overwrite newer ondisk inodes
Date: Fri, 31 Aug 2007 12:14:55 +1000 [thread overview]
Message-ID: <46D7799F.5030609@sgi.com> (raw)
In-Reply-To: <46D67FE6.20205@sgi.com>
Timothy Shimmin wrote:
> Lachlan McIlroy wrote:
>> Timothy Shimmin wrote:
>>> Lachlan McIlroy wrote:
>>>> Log replay of clustered inodes currently ignores the flushiter
>>>> field in the inode that is used to determine if the on-disk inode
>>>> is more up to date than the copy in the log. As a result during
>>>> log replay the newer inode is being overwritten with an older
>>>> version and file size updates are being lost.
>>>>
>>>> I haven't handled the case of the flushiter counter overflowing
>>>> but that shouldn't be a problem in this case. The log buffer
>>>> contains newly created inodes so their flushiter values will be 0
>>>> and the on-disk inodes should not be much greater.
>>>>
>>>> Lachlan
>>>>
>>>
>>> Still would want to understand why blf_flags doesn't have
>>> XFS_BLI_INODE_ALLOC_BUF set and so we could test that
>>> - I didn't understand Dave's (dgc) response about that earlier.
>>> But anyway...:)
>>>
>>>
>>> Just some nit-picks:
>>>
>>> * instead of a comment and then calling xfs_recover_inode
>>> why don't we just give the function a better name.
>>> When I see the name I think it is going to recover the inode
>>> but instead it is just a flush_iter check and matching
>>> the magic#s.
>> What would you like the function to be called?
>>
> Dunno :)
> xfs_flushiter_check
>
>>>
>>> * My style preference is for option 2 over option 1:
>>> 1.
>>> if (a) {
>>> return 1;
>>> }
>>> return 0;
>>>
>>> 2.
>>> return a;
>>>
>>> But I understand the predicate is long and sometimes people like
>>> to add tracing/debug for the alternate paths in which case (1)
>>> is handy for that. I just like simplicity.
>> I can change that. It just looks odd when the body of the function
>> is after the return statement.
>>
> It doesn't look funny to me :-)
> But I'm not really fussed - do what you prefer.
>
>>>
>>> * Also it is funny to return a boolean for error in the call.
>>> Although, I see that is consistent with the rest of the function.
>> Yep, keeping consistent with coding standards.
> Well consistent with that function at least :)
>
>>
>>> But I'm not sure this is an error...
>>> Hmmmm...I'm a bit confused.
>>> So you are _almost_ combining an error check with a flushiter check?
>>> If one buffer is an inode magic# and the other isn't then we
>>> have an error right - and could report it - but we are not doing
>>> that here.
>> Not exactly. If what's on disk is not an inode but the log item is
>> then that could be because we haven't written the inode to disk yet
>> and we need to perform recovery.
> Yeah, I was thinking about that afterward.
> The item's format which gives the blk# for the buf to read could
> be a block which hasn't been used for an inode yet.
>
> OOI,
> Say the inode block was freed in the past and we freshly allocate one using
> this inode buf item in replay, so that the old block has the inode magic#
> but it has an old large flushiter. We then say it is newer when
> really it is older.
Yeah that could be a problem. Knew this flushiter thing was a hack - if
we logged every inode change we wouldn't have to worry what is on disk we
could just play back the log.
> We could check for zero mode or different gen#.
That might work, thanks. I suppose the existing flushiter check will need
this extra sanity too.
> Or perhaps we reset the flushiter when we free the inode or
> do we go thru a different replay path.
Not sure how this would work.
>
>
> > If what's on disk is an inode but
>> the log item is not an inode then what we are about to recover is not
>> an inode and we abort the check.
> Well yes we are only interested in inode buf items.
>
>> If neither on-disk nor log item are
>> inodes then we've got no business continuing the check.
> Well as above it aint even an inode in the log.
>
>>
>>> If we have an inode buf and the flushiter on the item is older than
>>> the ondisk buffer
>>> one then it is not an error but we just don't want to recover it.
>> Exactly.
>>
> So its not an error then yet we are calling it that.
> So perhaps we could have:
>
> if (!error && !old_inode_item) {
> ....
>
> --Tim
>
>
>
>
>
>
>>> --- fs/xfs/xfs_log_recover.c_1.322 2007-08-27 17:45:45.000000000
>>> +1000
>>> +++ fs/xfs/xfs_log_recover.c 2007-08-30 11:50:44.000000000 +1000
>>> @@ -1866,6 +1866,27 @@ xlog_recover_do_inode_buffer(
>>> }
>>>
>>> /*
>>> + * Check if we need to recover an inode from a buffer
>>> + */
>>> +int
>>> +xfs_recover_inode(
>>> + char *dest,
>>> + char *src)
>>> +{
>>> + xfs_dinode_t *dip = (xfs_dinode_t *)dest;
>>> + xfs_dinode_t *dilp = (xfs_dinode_t*)src;
>>> +
>>> + if ((be16_to_cpu(dip->di_core.di_magic) == XFS_DINODE_MAGIC) &&
>>> + (be16_to_cpu(dilp->di_core.di_magic) == XFS_DINODE_MAGIC) &&
>>> + (be16_to_cpu(dilp->di_core.di_flushiter) <
>>> + be16_to_cpu(dip->di_core.di_flushiter))) {
>>> + return 1;
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +/*
>>> * Perform a 'normal' buffer recovery. Each logged region of the
>>> * buffer should be copied over the corresponding region in the
>>> * given buffer. The bitmap in the buf log format structure indicates
>>> @@ -1917,6 +1938,13 @@ xlog_recover_do_reg_buffer(
>>> -1, 0, XFS_QMOPT_DOWARN,
>>> "dquot_buf_recover");
>>> }
>>> + /*
>>> + * Sanity check if this is an inode buffer.
>>> + */
>>> + if (!error)
>>> + error = xfs_recover_inode(xfs_buf_offset(bp,
>>> + (uint)bit << XFS_BLI_SHIFT),
>>> + item->ri_buf[i].i_addr);
>>> if (!error)
>>> memcpy(xfs_buf_offset(bp,
>>> (uint)bit << XFS_BLI_SHIFT), /* dest */
>>>
>>>
>
>
>
next prev parent reply other threads:[~2007-08-31 2:12 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2007-08-30 2:12 [PATCH] log replay should not overwrite newer ondisk inodes Lachlan McIlroy
2007-08-30 4:31 ` Timothy Shimmin
2007-08-30 4:50 ` Lachlan McIlroy
2007-08-30 8:29 ` Timothy Shimmin
2007-08-30 8:51 ` Timothy Shimmin
2007-08-31 2:22 ` Lachlan McIlroy
2007-08-31 4:01 ` Mark Goodwin
2007-08-31 15:48 ` David Chinner
2007-09-02 22:50 ` Vlad Apostolov
2007-09-03 8:49 ` David Chinner
2007-09-07 2:03 ` Lachlan McIlroy
2007-09-07 14:05 ` David Chinner
2007-09-10 4:43 ` Lachlan McIlroy
2007-08-31 2:14 ` Lachlan McIlroy [this message]
2007-08-30 14:02 ` David Chinner
2007-09-04 23:05 ` Shailendra Tripathi
2007-09-04 23:49 ` David Chinner
2007-09-04 23:51 ` David Chinner
2007-09-05 1:19 ` Timothy Shimmin
2007-09-05 1:40 ` Lachlan McIlroy
2007-09-05 6:54 ` David Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=46D7799F.5030609@sgi.com \
--to=lachlan@sgi.com \
--cc=tes@sgi.com \
--cc=xfs-dev@sgi.com \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox