Re: [PATCH] xfs: serialize unaligned dio writes against all other dio writes

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Brian Foster <bfoster@redhat.com>
To: Zorro Lang <zlang@redhat.com>
Cc: linux-xfs@vger.kernel.org, Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH] xfs: serialize unaligned dio writes against all other dio writes
Date: Mon, 25 Mar 2019 09:45:13 -0400	[thread overview]
Message-ID: <20190325134513.GB52167@bfoster> (raw)
In-Reply-To: <20190325034721.GT7200@dhcp-12-102.nay.redhat.com>

On Mon, Mar 25, 2019 at 11:47:22AM +0800, Zorro Lang wrote:
> On Sat, Mar 23, 2019 at 06:29:30PM +0800, Zorro Lang wrote:
> > On Fri, Mar 22, 2019 at 12:52:42PM -0400, Brian Foster wrote:
> > > XFS applies more strict serialization constraints to unaligned
> > > direct writes to accommodate things like direct I/O layer zeroing,
> > > unwritten extent conversion, etc. Unaligned submissions acquire the
> > > exclusive iolock and wait for in-flight dio to complete to ensure
> > > multiple submissions do not race on the same block and cause data
> > > corruption.
> > > 
> > > This generally works in the case of an aligned dio followed by an
> > > unaligned dio, but the serialization is lost if I/Os occur in the
> > > opposite order. If an unaligned write is submitted first and
> > > immediately followed by an overlapping, aligned write, the latter
> > > submits without the typical unaligned serialization barriers because
> > > there is no indication of an unaligned dio still in-flight. This can
> > > lead to unpredictable results.
> > > 
> > > To provide proper unaligned dio serialization, require that such
> > > direct writes are always the only dio allowed in-flight at one time
> > > for a particular inode. We already acquire the exclusive iolock and
> > > drain pending dio before submitting the unaligned dio. Wait once
> > > more after the dio submission to hold the iolock across the I/O and
> > > prevent further submissions until the unaligned I/O completes. This
> > > is heavy handed, but consistent with the current pre-submission
> > > serialization for unaligned direct writes.
> > > 
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > > 
> > > I was originally going to deal with this problem by hacking in an inode
> > > flag to track unaligned dio writes in-flight and use that to block any
> > > follow on dio writes until cleared. Dave suggested we could use the
> > > iolock to serialize by converting unaligned async dio writes to sync dio
> > > writes and just letting the unaligned dio itself always block. That
> > > seemed reasonable to me, but I morphed the approach slightly to just use
> > > inode_dio_wait() because it seemed a bit cleaner. Thoughts?
> > > 
> > > Zorro,
> > > 
> > > You reproduced this problem originally. It addresses the problem in the
> > > test case that reproduced for me. Care to confirm whether this patch
> > > fixes the problem for you? Thanks.
> > 
> > Hi Brian,
> > 
> > Sure, but I can't reproduce this bug on upstream kernel. I have to merge
> > it into an older kernel(you know that:), to verify if it works.
> 
> I merged your patch into an older kernel which I can reproduce this bug.
> Then test passed.
> 

Excellent, thanks.

> BTW, Eryu said he hit this bug on upstream v5.0, on a KVM machine with LVM
> devices, when he ran the case which I sent to fstests@. So I think
> it's reproducible on upstream kernel. Just we need some conditions to trigger
> that. So if you know how to make this 'condition', please tell me, I'll
> think about if I can write anther case to cover this bug specially.
> 

Ok. I could only reproduce with your custom reproducer (over loop) on
kernels between v4.14 and v4.20 (inclusive). More specifically, I could
reproduce between commits 546e7be824 ("iomap_dio_rw: Allocate AIO
completion queue before submitting dio") and a79d40e9b0 ("aio: only use
blk plugs for > 2 depth submissions"). As discussed, these commits
mostly just alter timing and thus affect the issue indirectly.

I haven't taken a closer look at the fstest yet beyond the custom
variant you provided. I wonder if a test could reproduce this more
effectively by increasing the load of the test..? For example, can you
reproduce by running many iterations of the I/O? What about running a
deeper queue and submitting many such overlapping aligned/unaligned I/Os
all at once (to the same or different offsets of the file)? Just a
thought..

Brian

> Thanks,
> Zorro
> 
> > 
> > If upstream kernel has this issue too, do you have a better idea to reproduce
> > it on upstream? Maybe I can improve my case to cover more.
> > 
> > Thanks,
> > Zorro
> > 
> > > 
> > > Brian
> > > 
> > >  fs/xfs/xfs_file.c | 21 ++++++++++++---------
> > >  1 file changed, 12 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > > index 770cc2edf777..8b2aaed82343 100644
> > > --- a/fs/xfs/xfs_file.c
> > > +++ b/fs/xfs/xfs_file.c
> > > @@ -529,18 +529,19 @@ xfs_file_dio_aio_write(
> > >  	count = iov_iter_count(from);
> > >  
> > >  	/*
> > > -	 * If we are doing unaligned IO, wait for all other IO to drain,
> > > -	 * otherwise demote the lock if we had to take the exclusive lock
> > > -	 * for other reasons in xfs_file_aio_write_checks.
> > > +	 * If we are doing unaligned IO, we can't allow any other IO in-flight
> > > +	 * at the same time or we risk data corruption. Wait for all other IO to
> > > +	 * drain, submit and wait for completion before we release the iolock.
> > > +	 *
> > > +	 * If the IO is aligned, demote the iolock if we had to take the
> > > +	 * exclusive lock in xfs_file_aio_write_checks() for other reasons.
> > >  	 */
> > >  	if (unaligned_io) {
> > > -		/* If we are going to wait for other DIO to finish, bail */
> > > -		if (iocb->ki_flags & IOCB_NOWAIT) {
> > > -			if (atomic_read(&inode->i_dio_count))
> > > -				return -EAGAIN;
> > > -		} else {
> > > +		/* unaligned dio always waits, bail */
> > > +		if (iocb->ki_flags & IOCB_NOWAIT)
> > > +			return -EAGAIN;
> > > +		else
> > >  			inode_dio_wait(inode);
> > > -		}
> > >  	} else if (iolock == XFS_IOLOCK_EXCL) {
> > >  		xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
> > >  		iolock = XFS_IOLOCK_SHARED;
> > > @@ -548,6 +549,8 @@ xfs_file_dio_aio_write(
> > >  
> > >  	trace_xfs_file_direct_write(ip, count, iocb->ki_pos);
> > >  	ret = iomap_dio_rw(iocb, from, &xfs_iomap_ops, xfs_dio_write_end_io);
> > > +	if (unaligned_io && !is_sync_kiocb(iocb))
> > > +		inode_dio_wait(inode);
> > >  out:
> > >  	xfs_iunlock(ip, iolock);
> > >  
> > > -- 
> > > 2.17.2
> > >

next prev parent reply	other threads:[~2019-03-25 13:45 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-22 16:52 [PATCH] xfs: serialize unaligned dio writes against all other dio writes Brian Foster
2019-03-22 20:46 ` Allison Henderson
2019-03-23 10:29 ` Zorro Lang
2019-03-25  3:47   ` Zorro Lang
2019-03-25 13:45     ` Brian Foster [this message]
2019-03-24 20:59 ` Dave Chinner
2019-03-25 13:48   ` Brian Foster
2019-03-25 11:06 ` Christoph Hellwig
2019-03-25 13:51   ` Brian Foster

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190325134513.GB52167@bfoster \
    --to=bfoster@redhat.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=zlang@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.