public inbox for linux-xfs@vger.kernel.org
From: Shailendra Tripathi <stripathi@agami.com>
To: Lachlan McIlroy <lachlan@sgi.com>
Cc: xfs-dev <xfs-dev@sgi.com>, xfs-oss <xfs@oss.sgi.com>
Subject: Re: [PATCH] log replay should not overwrite newer ondisk inodes
Date: Tue, 04 Sep 2007 16:05:06 -0700
Message-ID: <46DDE4A2.1070204@agami.com>
In-Reply-To: <46D6279F.40601@sgi.com>

Hi,
      Can someone explain how not checking the flushiter can lose 
file size updates?
Let me take the case mentioned here in the fix statement:

a. Clustered inode create --> flush iter = X (0)
b. Size update            --> flush iter = Y

X <= Y will always hold; that is, it is not possible to have X > Y 
(unless the size update is non-transactional, and as far as I know, a 
size update is always transactional).

There are 2 cases here:
a) The log of Y reached the disk --> either 1) the inode with flush 
iter Y also reached the disk, or 2) the inode didn't make it.
b) The log of Y didn't reach the disk --> flush_iter Y should never 
have reached the disk.

In neither of these cases can I see how the size update could be lost 
by replaying the logs in sequential order. If the log of Y didn't reach 
the disk, does it not make sense to have the flush_iter and size set to 
match the last known transaction on the disk? To me, it appears unsafe 
to do otherwise, as the inode state after recovery will not match the 
state of the last known transaction.
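
To make the comparison under discussion concrete, here is a minimal
sketch of the check the patch adds, as I read it. It is not kernel code:
struct toy_dinode and the toy_* functions are hypothetical stand-ins for
xfs_dinode_t and the recovery path, the fields are kept in host byte
order (so no be16_to_cpu), and, like the patch, it does not handle the
flushiter counter wrapping around.

```c
#include <assert.h>
#include <stdint.h>

#define DINODE_MAGIC 0x494e	/* "IN", the value of XFS_DINODE_MAGIC */

/* Hypothetical, simplified stand-in for the on-disk inode core. */
struct toy_dinode {
	uint16_t magic;
	uint16_t flushiter;
	uint64_t size;
};

/*
 * Mirrors the condition in the patch: the logged copy is treated as
 * stale only when both copies carry the inode magic and the logged
 * flushiter is strictly lower than the on-disk one.
 */
static int toy_log_copy_is_stale(const struct toy_dinode *disk,
				 const struct toy_dinode *log)
{
	assert(disk && log);
	return disk->magic == DINODE_MAGIC &&
	       log->magic == DINODE_MAGIC &&
	       log->flushiter < disk->flushiter;
}

/* Replay one logged inode image; returns 1 if copied, 0 if skipped. */
static int toy_replay(struct toy_dinode *disk, const struct toy_dinode *log)
{
	if (toy_log_copy_is_stale(disk, log))
		return 0;	/* keep the on-disk inode, including its size */
	*disk = *log;		/* otherwise overwrite, as replay does today */
	return 1;
}
```

With an on-disk inode at flushiter 3 and a logged create image at
flushiter 0, toy_replay() leaves the on-disk size alone; without the
check, the older image would be copied over it. Whether that situation
can actually arise during sequential replay is exactly the question
above.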

Regards,
Shailendra

Lachlan McIlroy wrote:
> Log replay of clustered inodes currently ignores the flushiter
> field in the inode that is used to determine if the on-disk inode
> is more up to date than the copy in the log.  As a result during
> log replay the newer inode is being overwritten with an older
> version and file size updates are being lost.
>
> I haven't handled the case of the flushiter counter overflowing
> but that shouldn't be a problem in this case.  The log buffer
> contains newly created inodes so their flushiter values will be 0
> and the on-disk inodes should not be much greater.
>
> Lachlan
> ------------------------------------------------------------------------
>
> --- fs/xfs/xfs_log_recover.c_1.322	2007-08-27 17:45:45.000000000 +1000
> +++ fs/xfs/xfs_log_recover.c	2007-08-30 11:50:44.000000000 +1000
> @@ -1866,6 +1866,27 @@ xlog_recover_do_inode_buffer(
>  }
>  
>  /*
> + * Check if we need to recover an inode from a buffer
> + */
> +int
> +xfs_recover_inode(
> +	char	*dest,
> +	char	*src)
> +{
> +	xfs_dinode_t	*dip = (xfs_dinode_t *)dest;
> +	xfs_dinode_t	*dilp = (xfs_dinode_t*)src;
> +
> +	if ((be16_to_cpu(dip->di_core.di_magic) == XFS_DINODE_MAGIC) &&
> +		(be16_to_cpu(dilp->di_core.di_magic) == XFS_DINODE_MAGIC) &&
> +		(be16_to_cpu(dilp->di_core.di_flushiter) <
> +			be16_to_cpu(dip->di_core.di_flushiter))) {
> +		return 1;
> +	}
> +
> +	return 0;
> +}
> +
> +/*
>   * Perform a 'normal' buffer recovery.  Each logged region of the
>   * buffer should be copied over the corresponding region in the
>   * given buffer.  The bitmap in the buf log format structure indicates
> @@ -1917,6 +1938,13 @@ xlog_recover_do_reg_buffer(
>  					       -1, 0, XFS_QMOPT_DOWARN,
>  					       "dquot_buf_recover");
>  		}
> +		/*
> +		 * Sanity check if this is an inode buffer.
> +		 */
> +		if (!error)
> +			error = xfs_recover_inode(xfs_buf_offset(bp,
> +					(uint)bit << XFS_BLI_SHIFT),
> +					item->ri_buf[i].i_addr);
>  		if (!error)
>  			memcpy(xfs_buf_offset(bp,
>  				(uint)bit << XFS_BLI_SHIFT),	/* dest */
>   
