From: "Darrick J. Wong" <djwong@kernel.org>
To: Wengang Wang <wen.gang.wang@oracle.com>
Cc: linux-xfs@vger.kernel.org, david@fromorbit.com
Subject: Re: [PATCH V2] xfs: make src file readable during reflink
Date: Tue, 5 Jul 2022 11:37:54 -0700
Message-ID: <YsSFAmc70npnoCbM@magnolia>
In-Reply-To: <20220629060755.25537-1-wen.gang.wang@oracle.com>

On Tue, Jun 28, 2022 at 11:07:55PM -0700, Wengang Wang wrote:
> During a reflink operation, the IOLOCK and MMAPLOCK of the source file
> are held in exclusive mode for the duration. This prevents reads on the
> source file, which could be a very long time if the source file has
> millions of extents.
>
> As the source of copy, besides some necessary modification (say dirty page
> flushing), it plays readonly role. Locking source file exclusively through
> out the full reflink copy is unreasonable.
>
> This patch downgrades exclusive locks on source file to shared modes after
> page cache flushing and before cloning the extents. To avoid source file
> change after lock downgradation, direct write paths take IOLOCK_EXCL on
> seeing reflink copy happening to the files.

This is going to complicate the synchronization logic between reflink
and everything else quite a bit -- right now we generally allow multiple
concurrent direct writers (IOLOCK) and write faults (MMAPLOCK) per file,
so space mapping operations (fallocate/reflink) can lock out those
writers in a simple manner.

> Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
> ---
> V2 changes:
> Commit message
> Make direct write paths take IOLOCK_EXCL when reflink copy is happening
> Tiny changes
> ---
> fs/xfs/xfs_file.c | 33 ++++++++++++++++++++++++++++++---
> fs/xfs/xfs_inode.c | 31 +++++++++++++++++++++++++++++++
> fs/xfs/xfs_inode.h | 11 +++++++++++
> 3 files changed, 72 insertions(+), 3 deletions(-)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 5a171c0b244b..6ca7118ee274 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -514,8 +514,10 @@ xfs_file_dio_write_aligned(
> struct iov_iter *from)
> {
> unsigned int iolock = XFS_IOLOCK_SHARED;
> + int remapping;
bool?
> ssize_t ret;
>
> +relock:
> ret = xfs_ilock_iocb(iocb, iolock);
> if (ret)
> return ret;
> @@ -523,14 +525,25 @@ xfs_file_dio_write_aligned(
> if (ret)
> goto out_unlock;
>
> + remapping = xfs_iflags_test(ip, XFS_IREMAPPING);

	remapping = xfs_has_reflink(mp) && xfs_iflags_test(ip, XFS_IREMAPPING);

so that you can skip the locked test on filesystems where remapping
isn't possible.

> +
> /*
> * We don't need to hold the IOLOCK exclusively across the IO, so demote
> * the iolock back to shared if we had to take the exclusive lock in
> * xfs_file_write_checks() for other reasons.
> + * But take IOLOCK_EXCL when reflink copy is going on
> */
> if (iolock == XFS_IOLOCK_EXCL) {
> - xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
> - iolock = XFS_IOLOCK_SHARED;
> + if (!remapping) {
> + xfs_ilock_demote(ip, XFS_IOLOCK_EXCL);
> + iolock = XFS_IOLOCK_SHARED;
> + }

Hm. So the logic in the IOLOCK_EXCL case is that if a directio write
takes IOLOCK_EXCL, there can't possibly be a remap operation running.
Remap operations themselves always start by taking IOLOCK_EXCL before
setting IREMAPPING, so a remap operation cannot set IREMAPPING until
after this directio completes.

In the IOLOCK_SHARED case below, the directio upgrades to IOLOCK_EXCL if
a remap operation is detected. We're protected against IREMAPPING
getting set while we hold IOLOCK_SHARED (because remap operations start
by taking IOLOCK_EXCL), though in theory we could race with the end of a
remapping operation, which at worst will result in an unnecessary
IOLOCK_EXCL acquisition, right?

There can only be one remapping operation in progress at a time because
they will take IOLOCK_EXCL initially and demote to _SHARED, so there
shouldn't be any races to setting and clearing IREMAPPING.

So I /think/ this works, but concurrency is hard to think about. :/

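
The cases reasoned through above can be written down as a small decision
table. What follows is only a userspace sketch of the branch structure in
the patched xfs_file_dio_write_aligned(), not kernel code: the names
iolock_mode, dio_action and dio_lock_action are hypothetical and exist
nowhere in XFS.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum iolock_mode { IOLOCK_SHARED, IOLOCK_EXCL };

enum dio_action {
	DIO_PROCEED,		/* issue the write with the lock as held */
	DIO_DEMOTE,		/* demote EXCL to SHARED, then issue the write */
	DIO_RELOCK_EXCL,	/* drop SHARED, retake EXCL, test the flag again */
};

/*
 * Model of the lock decision made after the write checks.
 * 'held' is the IOLOCK mode the writer currently holds and 'remapping'
 * is the sampled state of the IREMAPPING flag on the inode.
 */
static enum dio_action
dio_lock_action(enum iolock_mode held, bool remapping)
{
	if (held == IOLOCK_EXCL) {
		/*
		 * Per the reasoning above, EXCL with the flag set should
		 * not normally be observed; the patch simply keeps EXCL.
		 */
		return remapping ? DIO_PROCEED : DIO_DEMOTE;
	}

	/* Holding SHARED: only a remap in progress forces the upgrade. */
	return remapping ? DIO_RELOCK_EXCL : DIO_PROCEED;
}
```

In this model a writer that gets DIO_RELOCK_EXCL goes around again with
IOLOCK_EXCL held, which in the real code would block until the remap has
dropped its shared lock; the second pass then normally sees the flag
clear and demotes.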
> + } else { /* iolock == XFS_ILOCK_SHARED */
IOLOCK_SHARED, not ILOCK_SHARED?
> + if (remapping) {
> + xfs_iunlock(ip, iolock);
> + iolock = XFS_IOLOCK_EXCL;
> + goto relock;
> + }
> }
> trace_xfs_file_direct_write(iocb, from);
> ret = iomap_dio_rw(iocb, from, &xfs_direct_write_iomap_ops,
> @@ -1125,6 +1138,19 @@ xfs_file_remap_range(
Aren't changes necessary for xfs_file_dio_write_unaligned too?
> if (ret || len == 0)
> return ret;
>
> + /*
> + * Set XFS_IREMAPPING flag to source file before we downgrade
> + * the locks, so that all direct writes know they have to take
> + * IOLOCK_EXCL.
> + */
> + xfs_iflags_set(src, XFS_IREMAPPING);
> +
> + /*
> + * From now on, we read only from src, so downgrade locks to allow
> + * read operations go.
> + */
> + xfs_ilock_io_mmap_downgrade_src(src, dest);
> +
> trace_xfs_reflink_remap_range(src, pos_in, len, dest, pos_out);
>
> ret = xfs_reflink_remap_blocks(src, pos_in, dest, pos_out, len,
> @@ -1152,7 +1178,8 @@ xfs_file_remap_range(
> if (xfs_file_sync_writes(file_in) || xfs_file_sync_writes(file_out))
> xfs_log_force_inode(dest);
> out_unlock:
> - xfs_iunlock2_io_mmap(src, dest);
> + xfs_iflags_clear(src, XFS_IREMAPPING);
> + xfs_iunlock2_io_mmap_src_shared(src, dest);
> if (ret)
> trace_xfs_reflink_remap_range_error(dest, ret, _RET_IP_);
> return remapped > 0 ? remapped : ret;
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 52d6f2c7d58b..1cbd4a594f28 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3786,6 +3786,16 @@ xfs_ilock2_io_mmap(
> return 0;
> }
>
> +/* Downgrade the locks on src file if src and dest are not the same one. */
> +void
> +xfs_ilock_io_mmap_downgrade_src(
> + struct xfs_inode *src,
> + struct xfs_inode *dest)
> +{
> + if (src != dest)
> + xfs_ilock_demote(src, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL);

Oh, you're downgrading MMAPLOCK_EXCL (aka invalidate_lock) too?
That's going to be tricky to figure out -- write page faults (and
apparently all DAX faults) take MMAPLOCK_SHARED (invalidate_lock_shared)
so I think you'd have to add similar "upgrade/downgrade lock" logic to
__xfs_filemap_fault?

I'm not 100% sure I'm correct about that statement, my head is starting
to spin and I'm not sure it's worth the complexity.

I /do/ wonder if range locking would be a better solution here, since we
can safely unlock file ranges that we've already remapped?

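
To make the range-locking idea a little more concrete, here is a rough
userspace model. It assumes a hypothetical struct remap_range that tracks
the part of the source file still waiting to be remapped; nothing here is
an existing XFS interface, and a real range lock would also need waiting
and wakeup, which this sketch leaves out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: [start, end) is the part of src still to be remapped. */
struct remap_range {
	uint64_t start;
	uint64_t end;
};

/* Release the prefix that has been remapped; 'done' bytes were completed. */
static void
remap_range_advance(struct remap_range *r, uint64_t done)
{
	uint64_t left = r->end - r->start;

	/* Never move past the end of the range. */
	r->start += done < left ? done : left;
}

/*
 * Would a write to [pos, pos + len) overlap the still-locked range?
 * No overflow handling for pos + len; this is only an illustration.
 */
static bool
remap_range_conflicts(const struct remap_range *r, uint64_t pos, uint64_t len)
{
	if (len == 0 || r->start >= r->end)
		return false;
	return pos < r->end && pos + len > r->start;
}
```

A writer would only have to wait when remap_range_conflicts() is true, so
writes to the already-remapped prefix could go ahead while the rest of the
remap is still running.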
--D
> +}
> +
> /* Unlock both inodes to allow IO and mmap activity. */
> void
> xfs_iunlock2_io_mmap(
> @@ -3798,3 +3808,24 @@ xfs_iunlock2_io_mmap(
> if (ip1 != ip2)
> inode_unlock(VFS_I(ip1));
> }
> +
> +/*
> + * Unlock the exclusive locks on dest file.
> + * Also unlock the shared locks on src if src and dest are not the same one
> + */
> +void
> +xfs_iunlock2_io_mmap_src_shared(
> + struct xfs_inode *src,
> + struct xfs_inode *dest)
> +{
> + struct inode *src_inode = VFS_I(src);
> + struct inode *dest_inode = VFS_I(dest);
> +
> + inode_unlock(dest_inode);
> + filemap_invalidate_unlock(dest_inode->i_mapping);
> + if (src == dest)
> + return;
> +
> + inode_unlock_shared(src_inode);
> + filemap_invalidate_unlock_shared(src_inode->i_mapping);
> +}
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 7be6f8e705ab..c07d4b42cf9d 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -262,6 +262,13 @@ static inline bool xfs_inode_has_large_extent_counts(struct xfs_inode *ip)
> */
> #define XFS_INACTIVATING (1 << 13)
>
> +/*
> + * A flag indicating reflink copy / remapping is happening to the file as
> + * source. When set, all direct IOs should take IOLOCK_EXCL to avoid
> + * interphering the remapping.
> + */
> +#define XFS_IREMAPPING (1 << 14)
> +
> /* All inode state flags related to inode reclaim. */
> #define XFS_ALL_IRECLAIM_FLAGS (XFS_IRECLAIMABLE | \
> XFS_IRECLAIM | \
> @@ -512,5 +519,9 @@ void xfs_end_io(struct work_struct *work);
>
> int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
> void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
> +void xfs_ilock_io_mmap_downgrade_src(struct xfs_inode *src,
> + struct xfs_inode *dest);
> +void xfs_iunlock2_io_mmap_src_shared(struct xfs_inode *src,
> + struct xfs_inode *dest);
>
> #endif /* __XFS_INODE_H__ */
> --
> 2.21.0 (Apple Git-122.2)
>