public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 6/8] xfs: DIO write completion size updates race
Date: Tue, 14 Apr 2015 10:35:23 -0400	[thread overview]
Message-ID: <20150414143523.GG36198@bfoster.bfoster> (raw)
In-Reply-To: <1428996411-1507-7-git-send-email-david@fromorbit.com>

On Tue, Apr 14, 2015 at 05:26:49PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> xfs_end_io_direct_write() can race with other IO completions when
> updating the in-core inode size. The IO completion processing is not
> serialised for direct IO - they are done either under the
> IOLOCK_SHARED for non-AIO DIO, and without any IOLOCK held at all
> during AIO DIO completion. Hence the non-atomic test-and-set update
> of the in-core inode size is racy and can result in the in-core
> inode size going backwards if the race if hit just right.
> 
> If the inode size goes backwards, this can trigger the EOF zeroing
> code to run incorrectly on the next IO, which then will zero data
> that has successfully been written to disk by a previous DIO.
> 
> To fix this bug, we need to serialise the test/set updates of the
> in-core inode size. This first patch introduces locking around the
> relevant updates and checks in the DIO path. Because we now have an
> ioend in xfs_end_io_direct_write(), we know exactly then we are
> doing an IO that requires an in-core EOF update, and we know that
> they are not running in interrupt context. As such, we do not need to
> use irqsave() spinlock variants to protect against interrupts while
> the lock is held.
> 
> Hence we can use an existing spinlock in the inode to do this
> serialisation and so not need to grow the struct xfs_inode just to
> work around this problem.
> 
> This patch does not address the test/set EOF update in
> generic_file_write_direct() for various reasons - that will be done
> as a followup with separate explanation.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_aops.c |  7 +++++++
>  fs/xfs/xfs_file.c | 13 ++++++++++++-
>  2 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 55356f6..cd6b2e0 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -1577,9 +1577,16 @@ xfs_end_io_direct_write(
>  	 * with the on-disk inode size being outside the in-core inode size. We
>  	 * have no other method of updating EOF for AIO, so always do it here
>  	 * if necessary.
> +	 *
> +	 * We need to lock the test/set EOF update as we can be racing with
> +	 * other IO completions here to update the EOF. Failing to serialise
> +	 * here can result in EOF moving backwards and Bad Things Happen when
> +	 * that occurs.
>  	 */
> +	spin_lock(&ip->i_flags_lock);
>  	if (offset + size > i_size_read(inode))
>  		i_size_write(inode, offset + size);
> +	spin_unlock(&ip->i_flags_lock);
>  
>  	/*
>  	 * If we are doing an append IO that needs to update the EOF on disk,
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index c203839..5d5b4ba 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -569,10 +569,20 @@ restart:
>  	 * write.  If zeroing is needed and we are currently holding the
>  	 * iolock shared, we need to update it to exclusive which implies
>  	 * having to redo all checks before.
> +	 *
> +	 * We need to serialise against EOF updates that occur in IO
> +	 * completions here. We want to make sure that nobody is changing the
> +	 * size while we do this check until we have placed an IO barrier (i.e.
> +	 * hold the XFS_IOLOCK_EXCL) that prevents new IO from being dispatched.
> +	 * The spinlock effectively forms a memory barrier once we have the
> +	 * XFS_IOLOCK_EXCL so we are guaranteed to see the latest EOF value
> +	 * and hence be able to correctly determine if we need to run zeroing.
>  	 */
> +	spin_lock(&ip->i_flags_lock);
>  	if (*pos > i_size_read(inode)) {
>  		bool	zero = false;
>  
> +		spin_unlock(&ip->i_flags_lock);
>  		if (*iolock == XFS_IOLOCK_SHARED) {
>  			xfs_rw_iunlock(ip, *iolock);
>  			*iolock = XFS_IOLOCK_EXCL;
> @@ -582,7 +592,8 @@ restart:
>  		error = xfs_zero_eof(ip, *pos, i_size_read(inode), &zero);
>  		if (error)
>  			return error;
> -	}
> +	} else
> +		spin_unlock(&ip->i_flags_lock);
>  
>  	/*
>  	 * Updating the timestamps will grab the ilock again from
> -- 
> 2.0.0
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2015-04-14 14:35 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-04-14  7:26 [PATCH 0/8 v2] xfs: fix direct IO completion issues Dave Chinner
2015-04-14  7:26 ` [PATCH 1/8] xfs: factor DIO write mapping from get_blocks Dave Chinner
2015-04-14 14:23   ` Brian Foster
2015-04-14 20:06     ` Dave Chinner
2015-04-14  7:26 ` [PATCH 2/8] xfs: move DIO mapping size calculation Dave Chinner
2015-04-14 14:24   ` Brian Foster
2015-04-14  7:26 ` [PATCH 3/8] xfs: DIO needs an ioend for writes Dave Chinner
2015-04-14 14:24   ` Brian Foster
2015-04-14  7:26 ` [PATCH 4/8] xfs: handle DIO overwrite EOF update completion correctly Dave Chinner
2015-04-14 14:35   ` Brian Foster
2015-04-14 15:35     ` Brian Foster
2015-04-14 20:12     ` Dave Chinner
2015-04-14  7:26 ` [PATCH 5/8] xfs: DIO writes within EOF don't need an ioend Dave Chinner
2015-04-14 14:35   ` Brian Foster
2015-04-14 20:18     ` Dave Chinner
2015-04-14  7:26 ` [PATCH 6/8] xfs: DIO write completion size updates race Dave Chinner
2015-04-14 14:35   ` Brian Foster [this message]
2015-04-14  7:26 ` [PATCH 7/8] xfs: direct IO EOF zeroing needs to drain AIO Dave Chinner
2015-04-14 14:35   ` Brian Foster
2015-04-14  7:26 ` [PATCH 8/8] xfs: using generic_file_direct_write() is unnecessary Dave Chinner
2015-04-14 14:35   ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150414143523.GG36198@bfoster.bfoster \
    --to=bfoster@redhat.com \
    --cc=david@fromorbit.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox