From: Eric Sandeen <sandeen@sandeen.net>
To: xfs@oss.sgi.com
Subject: Re: [PATCH 1/2] xfs: always drain dio before extending aio write submission
Date: Tue, 29 Sep 2015 17:25:02 -0500 [thread overview]
Message-ID: <560B0FBE.5010205@sandeen.net> (raw)
In-Reply-To: <1443465481-63460-2-git-send-email-bfoster@redhat.com>
Looks good to me, and tested w/ a testcase I'll send in a bit.
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
On 9/28/15 1:38 PM, Brian Foster wrote:
> XFS supports and typically allows concurrent asynchronous direct I/O
> submission to a single file. One exception to the rule is that file
> extending dio writes that start beyond the current EOF (e.g.,
> potentially create a hole at EOF) require exclusive I/O access to the
> file. This is because such writes must zero any pre-existing blocks
> beyond EOF that are exposed by virtue of now residing within EOF as a
> result of the write about to be submitted.
>
> Before EOF zeroing can occur, the current file i_size must be stabilized
> to avoid data corruption. In this scenario, XFS upgrades the iolock to
> exclude any further I/O submission, waits on in-flight I/O to complete
> to ensure i_size is up to date (i_size is updated on dio write
> completion) and restarts the various checks against the state of the
> file. The problem is that this protection sequence is triggered only
> when the iolock is currently held shared. While this is true for async
> dio in most cases, the caller may upgrade the lock in advance based on
> arbitrary circumstances with respect to EOF zeroing. For example, the
> iolock is always acquired exclusively if the start offset is not block
> aligned. This means that even though the iolock is already held
> exclusive for such I/Os, pending I/O is not drained and thus EOF zeroing
> can occur based on an unstable i_size.
>
> This problem has been reproduced as guest data corruption in virtual
> machines with file-backed qcow2 virtual disks hosted on an XFS
> filesystem. The virtual disks must be configured with aio=native mode
> and the must not be truncated out to the maximum file size (as some virt
> managers will do).
>
> Update xfs_file_aio_write_checks() to unconditionally drain in-flight
> dio before EOF zeroing can occur. Rather than trigger the wait based on
> iolock state, use a new flag and upgrade the iolock when necessary. Note
> that this results in a full restart of the inode checks even when the
> iolock was already held exclusive when technically it is only required
> to recheck i_size. This should be a rare enough occurrence that it is
> preferable to keep the code simple rather than create an alternate
> restart jump target.
>
> Signed-off-by: Brian Foster <bfoster@redhat.com>
> ---
> fs/xfs/xfs_file.c | 15 +++++++++------
> 1 file changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index e78feb4..347b3e0 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -574,6 +574,7 @@ xfs_file_aio_write_checks(
> struct xfs_inode *ip = XFS_I(inode);
> ssize_t error = 0;
> size_t count = iov_iter_count(from);
> + bool drained_dio = false;
>
> restart:
> error = generic_write_checks(iocb, from);
> @@ -611,12 +612,13 @@ restart:
> bool zero = false;
>
> spin_unlock(&ip->i_flags_lock);
> - if (*iolock == XFS_IOLOCK_SHARED) {
> - xfs_rw_iunlock(ip, *iolock);
> - *iolock = XFS_IOLOCK_EXCL;
> - xfs_rw_ilock(ip, *iolock);
> - iov_iter_reexpand(from, count);
> -
> + if (!drained_dio) {
> + if (*iolock == XFS_IOLOCK_SHARED) {
> + xfs_rw_iunlock(ip, *iolock);
> + *iolock = XFS_IOLOCK_EXCL;
> + xfs_rw_ilock(ip, *iolock);
> + iov_iter_reexpand(from, count);
> + }
> /*
> * We now have an IO submission barrier in place, but
> * AIO can do EOF updates during IO completion and hence
> @@ -626,6 +628,7 @@ restart:
> * no-op.
> */
> inode_dio_wait(inode);
> + drained_dio = true;
> goto restart;
> }
> error = xfs_zero_eof(ip, iocb->ki_pos, i_size_read(inode), &zero);
>
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2015-09-29 22:25 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-28 18:37 [PATCH 0/2] xfs: fix eof zeroing i_size race Brian Foster
2015-09-28 18:38 ` [PATCH 1/2] xfs: always drain dio before extending aio write submission Brian Foster
2015-09-29 22:25 ` Eric Sandeen [this message]
2015-09-28 18:38 ` [PATCH 2/2] xfs: add an xfs_zero_eof() tracepoint Brian Foster
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=560B0FBE.5010205@sandeen.net \
--to=sandeen@sandeen.net \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox