Re: [PATCH] log replay should not overwrite newer ondisk inodes

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Timothy Shimmin <tes@sgi.com>
To: Lachlan McIlroy <lachlan@sgi.com>
Cc: xfs-dev <xfs-dev@sgi.com>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: [PATCH] log replay should not overwrite newer ondisk inodes
Date: Thu, 30 Aug 2007 18:29:26 +1000	[thread overview]
Message-ID: <46D67FE6.20205@sgi.com> (raw)
In-Reply-To: <46D64CAD.6050705@sgi.com>

Lachlan McIlroy wrote:
> Timothy Shimmin wrote:
>> Lachlan McIlroy wrote:
>>> Log replay of clustered inodes currently ignores the flushiter
>>> field in the inode that is used to determine if the on-disk inode
>>> is more up to date than the copy in the log.  As a result during
>>> log replay the newer inode is being overwritten with an older
>>> version and file size updates are being lost.
>>>
>>> I haven't handled the case of the flushiter counter overflowing
>>> but that shouldn't be a problem in this case.  The log buffer
>>> contains newly created inodes so their flushiter values will be 0
>>> and the on-disk inodes should not be much greater.
>>>
>>> Lachlan
>>>
>>
>> Still would want to understand why blf_flags doesn't have
>> XFS_BLI_INODE_ALLOC_BUF set and so we could test that
>> - I didn't understand Dave's (dgc) response about that earlier.
>> But anyway...:)
>>
>>
>> Just some nit-picks:
>>
>> * instead of a comment and then calling xfs_recover_inode
>>   why don't we just give the function a better name.
>>   When I see the name I think it is going to recover the inode
>>   but instead it is just a flush_iter check and matching
>>   the magic#s.
> What would you like the function to be called?
> 
Dunno :)
xfs_flushiter_check

>>
>> * My style preference is for option 2 over option 1:
>>   1.
>>     if (a) {
>>        return 1;
>>     }
>>     return 0;
>>
>>   2.
>>     return a;
>>
>>   But I understand the predicate is long and sometimes people like
>>   to add tracing/debug for the alternate paths in which case (1)
>>   is handy for that. I just like simplicity.
> I can change that.  It just looks odd when the body of the function
> is after the return statement.
> 
It doesn't look funny to me :-)
But I'm not really fussed - do what you prefer.

>>
>> * Also it is funny to return a boolean for error in the call.
>>   Although, I see that is consistent with the rest of the function.
> Yep, keeping consistent with coding standards.
Well consistent with that function at least :)

> 
>>   But I'm not sure this is an error...
>>   Hmmmm...I'm a bit confused.
>>   So you are _almost_ combining an error check with a flushiter check?
>>   If one buffer is an inode magic# and the other isn't then we
>>   have an error right - and could report it - but we are not doing 
>> that here.
> Not exactly.  If what's on disk is not an inode but the log item is
> then that could be because we haven't written the inode to disk yet
> and we need to perform recovery.
Yeah, I was thinking about that afterward.
The item's format which gives the blk# for the buf to read could
be a block which hasn't been used for an inode yet.

OOI,
Say the inode block was freed in the past and we freshly allocate one using
this inode buf item in replay, so that the old block has the inode magic#
but it has an old large flushiter. We then say it is newer when
really it is older.
We could check for zero mode or different gen#.
Or perhaps we reset the flushiter when we free the inode or
do we go thru a different replay path.


 > If what's on disk is an inode but
> the log item is not an inode then what we are about to recover is not
> an inode and we abort the check.
Well yes we are only interested in inode buf items.

> If neither on-disk nor log item are
> inodes then we've got no business continuing the check.
Well as above it aint even an inode in the log.

> 
>>   If we have an inode buf and the flushiter on the item is older than 
>> the ondisk buffer
>>   one then it is not an error but we just don't want to recover it.
> Exactly.
> 
So its not an error then yet we are calling it that.
So perhaps we could have:

if (!error && !old_inode_item) {
....

--Tim






>> --- fs/xfs/xfs_log_recover.c_1.322    2007-08-27 17:45:45.000000000 +1000
>> +++ fs/xfs/xfs_log_recover.c    2007-08-30 11:50:44.000000000 +1000
>> @@ -1866,6 +1866,27 @@ xlog_recover_do_inode_buffer(
>>  }
>>
>>  /*
>> + * Check if we need to recover an inode from a buffer
>> + */
>> +int
>> +xfs_recover_inode(
>> +    char    *dest,
>> +    char    *src)
>> +{
>> +    xfs_dinode_t    *dip = (xfs_dinode_t *)dest;
>> +    xfs_dinode_t    *dilp = (xfs_dinode_t*)src;
>> +
>> +    if ((be16_to_cpu(dip->di_core.di_magic) == XFS_DINODE_MAGIC) &&
>> +        (be16_to_cpu(dilp->di_core.di_magic) == XFS_DINODE_MAGIC) &&
>> +        (be16_to_cpu(dilp->di_core.di_flushiter) <
>> +            be16_to_cpu(dip->di_core.di_flushiter))) {
>> +        return 1;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/*
>>   * Perform a 'normal' buffer recovery.  Each logged region of the
>>   * buffer should be copied over the corresponding region in the
>>   * given buffer.  The bitmap in the buf log format structure indicates
>> @@ -1917,6 +1938,13 @@ xlog_recover_do_reg_buffer(
>>                             -1, 0, XFS_QMOPT_DOWARN,
>>                             "dquot_buf_recover");
>>          }
>> +        /*
>> +         * Sanity check if this is an inode buffer.
>> +         */
>> +        if (!error)
>> +            error = xfs_recover_inode(xfs_buf_offset(bp,
>> +                    (uint)bit << XFS_BLI_SHIFT),
>> +                    item->ri_buf[i].i_addr);
>>          if (!error)
>>              memcpy(xfs_buf_offset(bp,
>>                  (uint)bit << XFS_BLI_SHIFT),    /* dest */
>>
>>

next prev parent reply	other threads:[~2007-08-30  8:46 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-08-30  2:12 [PATCH] log replay should not overwrite newer ondisk inodes Lachlan McIlroy
2007-08-30  4:31 ` Timothy Shimmin
2007-08-30  4:50   ` Lachlan McIlroy
2007-08-30  8:29     ` Timothy Shimmin [this message]
2007-08-30  8:51       ` Timothy Shimmin
2007-08-31  2:22         ` Lachlan McIlroy
2007-08-31  4:01           ` Mark Goodwin
2007-08-31 15:48             ` David Chinner
2007-09-02 22:50               ` Vlad Apostolov
2007-09-03  8:49                 ` David Chinner
2007-09-07  2:03               ` Lachlan McIlroy
2007-09-07 14:05                 ` David Chinner
2007-09-10  4:43                   ` Lachlan McIlroy
2007-08-31  2:14       ` Lachlan McIlroy
2007-08-30 14:02   ` David Chinner
2007-09-04 23:05 ` Shailendra Tripathi
2007-09-04 23:49   ` David Chinner
2007-09-04 23:51     ` David Chinner
2007-09-05  1:19   ` Timothy Shimmin
2007-09-05  1:40     ` Lachlan McIlroy
2007-09-05  6:54       ` David Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=46D67FE6.20205@sgi.com \
    --to=tes@sgi.com \
    --cc=lachlan@sgi.com \
    --cc=xfs-dev@sgi.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox