Re: [PATCH] xfs: properly serialise fallocate against AIO+DIO

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH] xfs: properly serialise fallocate against AIO+DIO
Date: Tue, 29 Oct 2019 15:41:33 +1100	[thread overview]
Message-ID: <20191029044133.GN4614@dread.disaster.area> (raw)
In-Reply-To: <20191029041908.GB15222@magnolia>

On Mon, Oct 28, 2019 at 09:19:08PM -0700, Darrick J. Wong wrote:
> On Tue, Oct 29, 2019 at 02:48:50PM +1100, Dave Chinner wrote:
> > --- a/fs/xfs/xfs_bmap_util.c
> > +++ b/fs/xfs/xfs_bmap_util.c
> > @@ -1040,6 +1040,7 @@ xfs_unmap_extent(
> >  	goto out_unlock;
> >  }
> >  
> > +/* Caller must first wait for the completion of any pending DIOs if required. */
> >  int
> >  xfs_flush_unmap_range(
> >  	struct xfs_inode	*ip,
> > @@ -1051,9 +1052,6 @@ xfs_flush_unmap_range(
> >  	xfs_off_t		rounding, start, end;
> >  	int			error;
> >  
> > -	/* wait for the completion of any pending DIOs */
> > -	inode_dio_wait(inode);
> 
> Does xfs_reflink_remap_prep still need this function to call inode_dio_wait
> before zapping the page cache prior to reflinking into an existing file?

No, because that is done in generic_remap_file_range_prep() after we
have locked the inodes and broken leases in
xfs_reflink_remap_prep().

> > diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> > index 525b29b99116..865543e41fb4 100644
> > --- a/fs/xfs/xfs_file.c
> > +++ b/fs/xfs/xfs_file.c
> > @@ -817,6 +817,36 @@ xfs_file_fallocate(
> >  	if (error)
> >  		goto out_unlock;
> >  
> > +	/*
> > +	 * Must wait for all AIO to complete before we continue as AIO can
> > +	 * change the file size on completion without holding any locks we
> > +	 * currently hold. We must do this first because AIO can update both
> > +	 * the on disk and in memory inode sizes, and the operations that follow
> > +	 * require the in-memory size to be fully up-to-date.
> > +	 */
> > +	inode_dio_wait(inode);
> > +
> > +	/*
> > +	 * Now AIO and DIO has drained we flush and (if necessary) invalidate
> > +	 * the cached range over the first operation we are about to run.
> > +	 *
> > +	 * We care about zero and collapse here because they both run a hole
> > +	 * punch over the range first. Because that can zero data, and the range
> > +	 * of invalidation for the shift operations is much larger, we still do
> > +	 * the required flush for collapse in xfs_prepare_shift().
> > +	 *
> > +	 * Insert has the same range requirements as collapse, and we extend the
> > +	 * file first which can zero data. Hence insert has the same
> > +	 * flush/invalidate requirements as collapse and so they are both
> > +	 * handled at the right time by xfs_prepare_shift().
> > +	 */
> > +	if (mode & (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_ZERO_RANGE |
> > +		    FALLOC_FL_COLLAPSE_RANGE)) {
> 
> Er... "Insert has the same requirements as collapse", but we don't test
> for that here?  Also ... xfs_prepare_shift handles flushing for both
> collapse and insert range, but we still have to flush here for collapse?
>
> <confused but suspecting this has something to do with the fact that we
> only do insert range after updating the isize?>

Yes, exactly.

The flush for collapse here is for the hole punch part of collapse,
before we start shifting extents. insert does not hole punch, so it
doesn't need flushing here but it still needs flush/inval before
shifting. i.e.:

collapse				insert

flush_unmap(off, len)
punch hole(off, len)			extends EOF
  writes zeros around (off,len)		  writes zeros around EOF
collapse(off, len)			insert(off, len)
  flush_unmap(off, EOF)			  flush_unmap(off, EOF)
  shift extents down			  shift extents up

So once we start the actual extent shift operation (up or down)
the flush/unmap requirements are identical.

> I think the third paragraph of the comment is just confusing me more.
> Does the following describe what's going on?
> 
> "Insert range has the same range [should this be "page cache flushing"?]
> requirements as collapse.  Because we can zero data as part of extending
> the file size, we skip the flush here and let the flush in
> xfs_prepare_shift take care of invalidating the page cache." ?

It's a bit better - that's kinda what I was trying to describe - but
I'll try to reword it more clearly after I've let it settle in my
head for a little while....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

next prev parent reply	other threads:[~2019-10-29  4:41 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-29  3:48 [PATCH] xfs: properly serialise fallocate against AIO+DIO Dave Chinner
2019-10-29  4:02 ` Dave Chinner
2019-10-29  4:19 ` Darrick J. Wong
2019-10-29  4:41   ` Dave Chinner [this message]
2019-10-29 10:03     ` Brian Foster
2019-10-29 20:22       ` Darrick J. Wong
2019-10-29 22:29       ` Dave Chinner
2019-10-29 20:22     ` Darrick J. Wong
2019-10-29 20:23 ` [RFC PATCH] generic: test race between appending AIO DIO and fallocate Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191029044133.GN4614@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox