All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <djwong@kernel.org>
To: Wengang Wang <wen.gang.wang@oracle.com>
Cc: linux-xfs@vger.kernel.org, david@fromorbit.com
Subject: Re: [PATCH V2] xfs: make src file readable during reflink
Date: Tue, 5 Jul 2022 11:37:54 -0700	[thread overview]
Message-ID: <YsSFAmc70npnoCbM@magnolia> (raw)
In-Reply-To: <20220629060755.25537-1-wen.gang.wang@oracle.com>

On Tue, Jun 28, 2022 at 11:07:55PM -0700, Wengang Wang wrote:
> During a reflink operation, the IOLOCK and MMAPLOCK of the source file
> are held in exclusive mode for the duration. This prevents reads on the
> source file, which could be a very long time if the source file has
> millions of extents.
> 
> As the source of copy, besides some necessary modification (say dirty page
> flushing), it plays readonly role. Locking source file exclusively through
> out the full reflink copy is unreasonable.
> 
> This patch downgrades exclusive locks on source file to shared modes after
> page cache flushing and before cloning the extents. To avoid source file
> change after lock downgradation, direct write paths take IOLOCK_EXCL on
> seeing reflink copy happening to the files.

This is going to complicate the synchronization logic between reflink
and everything else quite a bit -- right now we generally allow multiple
concurrent direct writers (IOLOCK) and write faults (MMAPLOCK) per file,
so space mapping operations (fallocate/reflink) can lock out those
writers in a simple manner.

> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
> V2 changes:
>  Commit message
>  Make direct write paths take IOLOCK_EXCL when reflink copy is happening
>  Tiny changes
> ---
>  fs/xfs/xfs_file.c  | 33 ++++++++++++++++++++++++++++++---
>  fs/xfs/xfs_inode.c | 31 +++++++++++++++++++++++++++++++
>  fs/xfs/xfs_inode.h | 11 +++++++++++
>  3 files changed, 72 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 5a171c0b244b..6ca7118ee274 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -514,8 +514,10 @@ xfs_file_dio_write_aligned(
>  	struct iov_iter		*from)
>  {
>  	unsigned int		iolock = XFS_IOLOCK_SHARED;
> +	int			remapping;

bool?

>  	ssize_t			ret;
>  
> +relock:
>  	ret = xfs_ilock_iocb(iocb, iolock);
>  	if (ret)
>  		return ret;
> @@ -523,14 +525,25 @@ xfs_file_dio_write_aligned(
>  	if (ret)
>  		goto out_unlock;
>  
> +	remapping = xfs_iflags_test(ip, XFS_IREMAPPING);

remapping = xfs_has_reflink(mp) && xfs_iflags_test(ip, XFS_IREMAPPING);

so that you can skip the locked test on filesystems where remapping
isn't possible.

> +
>  	/*
>  	 * We don't need to hold the IOLOCK exclusively across the IO, so demote
>  	 * the iolock back to shared if we had to take the exclusive lock in
>  	 * xfs_file_write_checks() for other reasons.
> +	 * But take IOLOCK_EXCL when reflink copy is going on
>  	 */
>  	if (iolock == XFS_IOLOCK_EXCL) {
> -		xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
> -		iolock = XFS_IOLOCK_SHARED;
> +		if (!remapping) {
> +			xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
> +			iolock = XFS_IOLOCK_SHARED;
> +		}

Hm.  So the logic in the IOLOCK_EXCL case is that if a directio write
takes IOLOCK_EXCL, there can't possibly be a remap operation running.
Remap operations themselves always start by taking IOLOCK_EXCL before
setting IREMAPPING, so a remap operation cannot set IREMAPPING until
after this directio completes.

In the IOLOCK_SHARED case below, the directio upgrades to IOLOCK_EXCL if
a remap operation is detected.  We're protected against IREMAPPING
getting set while we hold IOLOCK_SHARED (because remap operations start
by taking IOLOCK_EXCL), though in theory we could race with the end of a
remapping operation, which at worst will result in an unnecessary
IOLOCK_EXCL acquisition, right?

There can only be one remapping operation in progress at a time because
they will take IOLOCK_EXCL initially and demote to _SHARED, so there
shouldn't be any races to setting and clearing IREMAPPING.

So I /think/ this works, but concurrency is hard to think about. :/

> +	} else { /* iolock == XFS_ILOCK_SHARED */

IOLOCK_SHARED, not ILOCK_SHARED?

> +		if (remapping) {
> +			xfs_iunlock(ip, iolock);
> +			iolock = XFS_IOLOCK_EXCL;
> +			goto relock;
> +		}
>  	}
>  	trace_xfs_file_direct_write(iocb, from);
>  	ret = iomap_dio_rw(iocb, from, &xfs_direct_write_iomap_ops,
> @@ -1125,6 +1138,19 @@ xfs_file_remap_range(

Aren't changes necessary for xfs_file_dio_write_unaligned too?

>  	if (ret || len == 0)
>  		return ret;
>  
> +	/*
> +	 * Set XFS_IREMAPPING flag to source file before we downgrade
> +	 * the locks, so that all direct writes know they have to take
> +	 * IOLOCK_EXCL.
> +	 */
> +	xfs_iflags_set(src, XFS_IREMAPPING);
> +
> +	/*
> +	 * From now on, we read only from src, so downgrade locks to allow
> +	 * read operations go.
> +	 */
> +	xfs_ilock_io_mmap_downgrade_src(src, dest);
> +
>  	trace_xfs_reflink_remap_range(src, pos_in, len, dest, pos_out);
>  
>  	ret = xfs_reflink_remap_blocks(src, pos_in, dest, pos_out, len,
> @@ -1152,7 +1178,8 @@ xfs_file_remap_range(
>  	if (xfs_file_sync_writes(file_in) || xfs_file_sync_writes(file_out))
>  		xfs_log_force_inode(dest);
>  out_unlock:
> -	xfs_iunlock2_io_mmap(src, dest);
> +	xfs_iflags_clear(src, XFS_IREMAPPING);
> +	xfs_iunlock2_io_mmap_src_shared(src, dest);
>  	if (ret)
>  		trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_);
>  	return remapped > 0 ? remapped : ret;
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 52d6f2c7d58b..1cbd4a594f28 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3786,6 +3786,16 @@ xfs_ilock2_io_mmap(
>  	return 0;
>  }
>  
> +/* Downgrade the locks on src file if src and dest are not the same one. */
> +void
> +xfs_ilock_io_mmap_downgrade_src(
> +	struct xfs_inode	*src,
> +	struct xfs_inode	*dest)
> +{
> +	if (src != dest)
> +		xfs_ilock_demote(src, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL);

Oh, you're downgrading MMAPLOCK_EXCL (aka invalidate_lock) too?

That's going to be tricky to figure out -- write page faults (and
apparently all DAX faults) take MMAPLOCK_SHARED (invalidate_lock_shared)
so I think you'd have to add similar "upgrade/downgrade lock" logic to
__xfs_filemap_fault?

I'm not 100% sure I'm correct about that statement, my head is starting
to spin and I'm not sure it's worth the complexity.

I /do/ wonder if range locking would be a better solution here, since we
can safely unlock file ranges that we've already remapped?

--D

> +}
> +
>  /* Unlock both inodes to allow IO and mmap activity. */
>  void
>  xfs_iunlock2_io_mmap(
> @@ -3798,3 +3808,24 @@ xfs_iunlock2_io_mmap(
>  	if (ip1 != ip2)
>  		inode_unlock(VFS_I(ip1));
>  }
> +
> +/*
> + * Unlock the exclusive locks on dest file.
> + * Also unlock the shared locks on src if src and dest are not the same one
> + */
> +void
> +xfs_iunlock2_io_mmap_src_shared(
> +	struct xfs_inode	*src,
> +	struct xfs_inode	*dest)
> +{
> +	struct inode	*src_inode = VFS_I(src);
> +	struct inode	*dest_inode = VFS_I(dest);
> +
> +	inode_unlock(dest_inode);
> +	filemap_invalidate_unlock(dest_inode->i_mapping);
> +	if (src == dest)
> +		return;
> +
> +	inode_unlock_shared(src_inode);
> +	filemap_invalidate_unlock_shared(src_inode->i_mapping);
> +}
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 7be6f8e705ab..c07d4b42cf9d 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -262,6 +262,13 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
>   */
>  #define XFS_INACTIVATING	(1 << 13)
>  
> +/*
> + * A flag indicating reflink copy / remapping is happening to the file as
> + * source. When set, all direct IOs should take IOLOCK_EXCL to avoid
> + * interphering the remapping.
> + */
> +#define XFS_IREMAPPING		(1 << 14)
> +
>  /* All inode state flags related to inode reclaim. */
>  #define XFS_ALL_IRECLAIM_FLAGS	(XFS_IRECLAIMABLE | \
>  				 XFS_IRECLAIM | \
> @@ -512,5 +519,9 @@ void xfs_end_io(struct work_struct *work);
>  
>  int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
>  void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
> +void xfs_ilock_io_mmap_downgrade_src(struct xfs_inode *src,
> +					struct xfs_inode *dest);
> +void xfs_iunlock2_io_mmap_src_shared(struct xfs_inode *src,
> +					struct xfs_inode *dest);
>  
>  #endif	/* __XFS_INODE_H__ */
> -- 
> 2.21.0 (Apple Git-122.2)
> 

  parent reply	other threads:[~2022-07-05 18:38 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-29  6:07 [PATCH V2] xfs: make src file readable during reflink Wengang Wang
2022-07-05 16:34 ` Wengang Wang
2022-07-05 18:37 ` Darrick J. Wong [this message]
2022-07-06  1:35   ` Dave Chinner
2022-08-04  2:50     ` Wengang Wang
2022-07-06  1:24 ` Dave Chinner
     [not found] <20220624191037.23683-1-wen.gang.wang@oracle.com>
2022-06-28 15:57 ` Wengang Wang
2022-06-28 22:21   ` Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YsSFAmc70npnoCbM@magnolia \
    --to=djwong@kernel.org \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=wen.gang.wang@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.