public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Lachlan McIlroy <lachlan@sgi.com>
To: Timothy Shimmin <tes@sgi.com>
Cc: xfs-dev <xfs-dev@sgi.com>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: [PATCH] log replay should not overwrite newer ondisk inodes
Date: Fri, 31 Aug 2007 12:14:55 +1000	[thread overview]
Message-ID: <46D7799F.5030609@sgi.com> (raw)
In-Reply-To: <46D67FE6.20205@sgi.com>

Timothy Shimmin wrote:
> Lachlan McIlroy wrote:
>> Timothy Shimmin wrote:
>>> Lachlan McIlroy wrote:
>>>> Log replay of clustered inodes currently ignores the flushiter
>>>> field in the inode that is used to determine if the on-disk inode
>>>> is more up to date than the copy in the log.  As a result during
>>>> log replay the newer inode is being overwritten with an older
>>>> version and file size updates are being lost.
>>>>
>>>> I haven't handled the case of the flushiter counter overflowing
>>>> but that shouldn't be a problem in this case.  The log buffer
>>>> contains newly created inodes so their flushiter values will be 0
>>>> and the on-disk inodes should not be much greater.
>>>>
>>>> Lachlan
>>>>
>>>
>>> Still would want to understand why blf_flags doesn't have
>>> XFS_BLI_INODE_ALLOC_BUF set and so we could test that
>>> - I didn't understand Dave's (dgc) response about that earlier.
>>> But anyway...:)
>>>
>>>
>>> Just some nit-picks:
>>>
>>> * instead of a comment and then calling xfs_recover_inode
>>>   why don't we just give the function a better name.
>>>   When I see the name I think it is going to recover the inode
>>>   but instead it is just a flush_iter check and matching
>>>   the magic#s.
>> What would you like the function to be called?
>>
> Dunno :)
> xfs_flushiter_check
> 
>>>
>>> * My style preference is for option 2 over option 1:
>>>   1.
>>>     if (a) {
>>>        return 1;
>>>     }
>>>     return 0;
>>>
>>>   2.
>>>     return a;
>>>
>>>   But I understand the predicate is long and sometimes people like
>>>   to add tracing/debug for the alternate paths in which case (1)
>>>   is handy for that. I just like simplicity.
>> I can change that.  It just looks odd when the body of the function
>> is after the return statement.
>>
> It doesn't look funny to me :-)
> But I'm not really fussed - do what you prefer.
> 
>>>
>>> * Also it is funny to return a boolean for error in the call.
>>>   Although, I see that is consistent with the rest of the function.
>> Yep, keeping consistent with coding standards.
> Well consistent with that function at least :)
> 
>>
>>>   But I'm not sure this is an error...
>>>   Hmmmm...I'm a bit confused.
>>>   So you are _almost_ combining an error check with a flushiter check?
>>>   If one buffer is an inode magic# and the other isn't then we
>>>   have an error right - and could report it - but we are not doing 
>>> that here.
>> Not exactly.  If what's on disk is not an inode but the log item is
>> then that could be because we haven't written the inode to disk yet
>> and we need to perform recovery.
> Yeah, I was thinking about that afterward.
> The item's format which gives the blk# for the buf to read could
> be a block which hasn't been used for an inode yet.
> 
> OOI,
> Say the inode block was freed in the past and we freshly allocate one using
> this inode buf item in replay, so that the old block has the inode magic#
> but it has an old large flushiter. We then say it is newer when
> really it is older.
Yeah that could be a problem.  Knew this flushiter thing was a hack - if
we logged every inode change we wouldn't have to worry what is on disk we
could just play back the log.

> We could check for zero mode or different gen#.
That might work, thanks.  I suppose the existing flushiter check will need
this extra sanity too.

> Or perhaps we reset the flushiter when we free the inode or
> do we go thru a different replay path.
Not sure how this would work.

> 
> 
>  > If what's on disk is an inode but
>> the log item is not an inode then what we are about to recover is not
>> an inode and we abort the check.
> Well yes we are only interested in inode buf items.
> 
>> If neither on-disk nor log item are
>> inodes then we've got no business continuing the check.
> Well as above it aint even an inode in the log.
> 
>>
>>>   If we have an inode buf and the flushiter on the item is older than 
>>> the ondisk buffer
>>>   one then it is not an error but we just don't want to recover it.
>> Exactly.
>>
> So its not an error then yet we are calling it that.
> So perhaps we could have:
> 
> if (!error && !old_inode_item) {
> ....
> 
> --Tim
> 
> 
> 
> 
> 
> 
>>> --- fs/xfs/xfs_log_recover.c_1.322    2007-08-27 17:45:45.000000000 
>>> +1000
>>> +++ fs/xfs/xfs_log_recover.c    2007-08-30 11:50:44.000000000 +1000
>>> @@ -1866,6 +1866,27 @@ xlog_recover_do_inode_buffer(
>>>  }
>>>
>>>  /*
>>> + * Check if we need to recover an inode from a buffer
>>> + */
>>> +int
>>> +xfs_recover_inode(
>>> +    char    *dest,
>>> +    char    *src)
>>> +{
>>> +    xfs_dinode_t    *dip = (xfs_dinode_t *)dest;
>>> +    xfs_dinode_t    *dilp = (xfs_dinode_t*)src;
>>> +
>>> +    if ((be16_to_cpu(dip->di_core.di_magic) == XFS_DINODE_MAGIC) &&
>>> +        (be16_to_cpu(dilp->di_core.di_magic) == XFS_DINODE_MAGIC) &&
>>> +        (be16_to_cpu(dilp->di_core.di_flushiter) <
>>> +            be16_to_cpu(dip->di_core.di_flushiter))) {
>>> +        return 1;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +/*
>>>   * Perform a 'normal' buffer recovery.  Each logged region of the
>>>   * buffer should be copied over the corresponding region in the
>>>   * given buffer.  The bitmap in the buf log format structure indicates
>>> @@ -1917,6 +1938,13 @@ xlog_recover_do_reg_buffer(
>>>                             -1, 0, XFS_QMOPT_DOWARN,
>>>                             "dquot_buf_recover");
>>>          }
>>> +        /*
>>> +         * Sanity check if this is an inode buffer.
>>> +         */
>>> +        if (!error)
>>> +            error = xfs_recover_inode(xfs_buf_offset(bp,
>>> +                    (uint)bit << XFS_BLI_SHIFT),
>>> +                    item->ri_buf[i].i_addr);
>>>          if (!error)
>>>              memcpy(xfs_buf_offset(bp,
>>>                  (uint)bit << XFS_BLI_SHIFT),    /* dest */
>>>
>>>
> 
> 
> 

  parent reply	other threads:[~2007-08-31  2:12 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-08-30  2:12 [PATCH] log replay should not overwrite newer ondisk inodes Lachlan McIlroy
2007-08-30  4:31 ` Timothy Shimmin
2007-08-30  4:50   ` Lachlan McIlroy
2007-08-30  8:29     ` Timothy Shimmin
2007-08-30  8:51       ` Timothy Shimmin
2007-08-31  2:22         ` Lachlan McIlroy
2007-08-31  4:01           ` Mark Goodwin
2007-08-31 15:48             ` David Chinner
2007-09-02 22:50               ` Vlad Apostolov
2007-09-03  8:49                 ` David Chinner
2007-09-07  2:03               ` Lachlan McIlroy
2007-09-07 14:05                 ` David Chinner
2007-09-10  4:43                   ` Lachlan McIlroy
2007-08-31  2:14       ` Lachlan McIlroy [this message]
2007-08-30 14:02   ` David Chinner
2007-09-04 23:05 ` Shailendra Tripathi
2007-09-04 23:49   ` David Chinner
2007-09-04 23:51     ` David Chinner
2007-09-05  1:19   ` Timothy Shimmin
2007-09-05  1:40     ` Lachlan McIlroy
2007-09-05  6:54       ` David Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=46D7799F.5030609@sgi.com \
    --to=lachlan@sgi.com \
    --cc=tes@sgi.com \
    --cc=xfs-dev@sgi.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox