All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Sandeen <sandeen@sandeen.net>
To: xfs@oss.sgi.com
Subject: Re: [PATCH 1/2] xfs: always drain dio before extending aio write submission
Date: Tue, 29 Sep 2015 17:25:02 -0500	[thread overview]
Message-ID: <560B0FBE.5010205@sandeen.net> (raw)
In-Reply-To: <1443465481-63460-2-git-send-email-bfoster@redhat.com>

Looks good to me, and tested w/ a testcase I'll send in a bit.

Reviewed-by: Eric Sandeen <sandeen@redhat.com>


On 9/28/15 1:38 PM, Brian Foster wrote:
> XFS supports and typically allows concurrent asynchronous direct I/O
> submission to a single file. One exception to the rule is that file
> extending dio writes that start beyond the current EOF (e.g.,
> potentially create a hole at EOF) require exclusive I/O access to the
> file. This is because such writes must zero any pre-existing blocks
> beyond EOF that are exposed by virtue of now residing within EOF as a
> result of the write about to be submitted.
> 
> Before EOF zeroing can occur, the current file i_size must be stabilized
> to avoid data corruption. In this scenario, XFS upgrades the iolock to
> exclude any further I/O submission, waits on in-flight I/O to complete
> to ensure i_size is up to date (i_size is updated on dio write
> completion) and restarts the various checks against the state of the
> file. The problem is that this protection sequence is triggered only
> when the iolock is currently held shared. While this is true for async
> dio in most cases, the caller may upgrade the lock in advance based on
> arbitrary circumstances with respect to EOF zeroing. For example, the
> iolock is always acquired exclusively if the start offset is not block
> aligned. This means that even though the iolock is already held
> exclusive for such I/Os, pending I/O is not drained and thus EOF zeroing
> can occur based on an unstable i_size.
> 
> This problem has been reproduced as guest data corruption in virtual
> machines with file-backed qcow2 virtual disks hosted on an XFS
> filesystem. The virtual disks must be configured with aio=native mode
> and the must not be truncated out to the maximum file size (as some virt
> managers will do).
> 
> Update xfs_file_aio_write_checks() to unconditionally drain in-flight
> dio before EOF zeroing can occur. Rather than trigger the wait based on
> iolock state, use a new flag and upgrade the iolock when necessary. Note
> that this results in a full restart of the inode checks even when the
> iolock was already held exclusive when technically it is only required
> to recheck i_size. This should be a rare enough occurrence that it is
> preferable to keep the code simple rather than create an alternate
> restart jump target.
> 
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
>  fs/xfs/xfs_file.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index e78feb4..347b3e0 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -574,6 +574,7 @@ xfs_file_aio_write_checks(
>  	struct xfs_inode	*ip = XFS_I(inode);
>  	ssize_t			error = 0;
>  	size_t			count = iov_iter_count(from);
> +	bool			drained_dio = false;
>  
>  restart:
>  	error = generic_write_checks(iocb, from);
> @@ -611,12 +612,13 @@ restart:
>  		bool	zero = false;
>  
>  		spin_unlock(&ip->i_flags_lock);
> -		if (*iolock == XFS_IOLOCK_SHARED) {
> -			xfs_rw_iunlock(ip, *iolock);
> -			*iolock = XFS_IOLOCK_EXCL;
> -			xfs_rw_ilock(ip, *iolock);
> -			iov_iter_reexpand(from, count);
> -
> +		if (!drained_dio) {
> +			if (*iolock == XFS_IOLOCK_SHARED) {
> +				xfs_rw_iunlock(ip, *iolock);
> +				*iolock = XFS_IOLOCK_EXCL;
> +				xfs_rw_ilock(ip, *iolock);
> +				iov_iter_reexpand(from, count);
> +			}
>  			/*
>  			 * We now have an IO submission barrier in place, but
>  			 * AIO can do EOF updates during IO completion and hence
> @@ -626,6 +628,7 @@ restart:
>  			 * no-op.
>  			 */
>  			inode_dio_wait(inode);
> +			drained_dio = true;
>  			goto restart;
>  		}
>  		error = xfs_zero_eof(ip, iocb->ki_pos, i_size_read(inode), &zero);
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  reply	other threads:[~2015-09-29 22:25 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-28 18:37 [PATCH 0/2] xfs: fix eof zeroing i_size race Brian Foster
2015-09-28 18:38 ` [PATCH 1/2] xfs: always drain dio before extending aio write submission Brian Foster
2015-09-29 22:25   ` Eric Sandeen [this message]
2015-09-28 18:38 ` [PATCH 2/2] xfs: add an xfs_zero_eof() tracepoint Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=560B0FBE.5010205@sandeen.net \
    --to=sandeen@sandeen.net \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.