From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@lst.de>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
avi@scylladb.com, Dave Chinner <dchinner@redhat.com>,
Brian Foster <bfoster@redhat.com>
Subject: Re: [PATCH 11/11] xfs: reduce exclusive locking on unaligned dio
Date: Fri, 22 Jan 2021 09:24:28 -0800 [thread overview]
Message-ID: <20210122172428.GF1282159@magnolia> (raw)
In-Reply-To: <20210122162043.616755-12-hch@lst.de>
On Fri, Jan 22, 2021 at 05:20:43PM +0100, Christoph Hellwig wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Attempt shared locking for unaligned DIO, but only if the the
> underlying extent is already allocated and in written state. On
> failure, retry with the existing exclusive locking.
>
> Test case is fio randrw of 512 byte IOs using AIO and an iodepth of
> 32 IOs.
>
> Vanilla:
>
> READ: bw=4560KiB/s (4670kB/s), 4560KiB/s-4560KiB/s (4670kB/s-4670kB/s), io=134MiB (140MB), run=30001-30001msec
> WRITE: bw=4567KiB/s (4676kB/s), 4567KiB/s-4567KiB/s (4676kB/s-4676kB/s), io=134MiB (140MB), run=30001-30001msec
>
> Patched:
> READ: bw=37.6MiB/s (39.4MB/s), 37.6MiB/s-37.6MiB/s (39.4MB/s-39.4MB/s), io=1127MiB (1182MB), run=30002-30002msec
> WRITE: bw=37.6MiB/s (39.4MB/s), 37.6MiB/s-37.6MiB/s (39.4MB/s-39.4MB/s), io=1128MiB (1183MB), run=30002-30002msec
>
> That's an improvement from ~18k IOPS to a ~150k IOPS, which is
> about the IOPS limit of the VM block device setup I'm testing on.
>
> 4kB block IO comparison:
>
> READ: bw=296MiB/s (310MB/s), 296MiB/s-296MiB/s (310MB/s-310MB/s), io=8868MiB (9299MB), run=30002-30002msec
> WRITE: bw=296MiB/s (310MB/s), 296MiB/s-296MiB/s (310MB/s-310MB/s), io=8878MiB (9309MB), run=30002-30002msec
>
> Which is ~150k IOPS, same as what the test gets for sub-block
> AIO+DIO writes with this patch.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> [hch: rebased, split unaligned from nowait]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
Looks good to me now,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> ---
> fs/xfs/xfs_file.c | 58 +++++++++++++++++++++++++++++++++++-----------
> fs/xfs/xfs_iomap.c | 29 ++++++++++++++++-------
> 2 files changed, 66 insertions(+), 21 deletions(-)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index c60ff7b5dd829e..39695b59dfcc92 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -544,10 +544,13 @@ xfs_file_dio_write_aligned(
> * to do sub-block zeroing and that requires serialisation against other direct
> * I/O to the same block. In this case we need to serialise the submission of
> * the unaligned I/O so that we don't get racing block zeroing in the dio layer.
> + * In the case where sub-block zeroing is not required, we can do concurrent
> + * sub-block dios to the same block successfully.
> *
> - * This means that unaligned dio writes always block. There is no "nowait" fast
> - * path in this code - if IOCB_NOWAIT is set we simply return -EAGAIN up front
> - * and we don't have to worry about that anymore.
> + * Optimistically submit the I/O using the shared lock first, but use the
> + * IOMAP_DIO_OVERWRITE_ONLY flag to tell the lower layers to return -EAGAIN
> + * if block allocation or partial block zeroing would be required. In that case
> + * we try again with the exclusive lock.
> */
> static noinline ssize_t
> xfs_file_dio_write_unaligned(
> @@ -555,13 +558,28 @@ xfs_file_dio_write_unaligned(
> struct kiocb *iocb,
> struct iov_iter *from)
> {
> - int iolock = XFS_IOLOCK_EXCL;
> + size_t isize = i_size_read(VFS_I(ip));
> + size_t count = iov_iter_count(from);
> + int iolock = XFS_IOLOCK_SHARED;
> + unsigned int flags = IOMAP_DIO_OVERWRITE_ONLY;
> ssize_t ret;
>
> - /* unaligned dio always waits, bail */
> - if (iocb->ki_flags & IOCB_NOWAIT)
> - return -EAGAIN;
> - xfs_ilock(ip, iolock);
> + /*
> + * Extending writes need exclusivity because of the sub-block zeroing
> + * that the DIO code always does for partial tail blocks beyond EOF, so
> + * don't even bother trying the fast path in this case.
> + */
> + if (iocb->ki_pos > isize || iocb->ki_pos + count >= isize) {
> +retry_exclusive:
> + if (iocb->ki_flags & IOCB_NOWAIT)
> + return -EAGAIN;
> + iolock = XFS_IOLOCK_EXCL;
> + flags = IOMAP_DIO_FORCE_WAIT;
> + }
> +
> + ret = xfs_ilock_iocb(iocb, iolock);
> + if (ret)
> + return ret;
>
> /*
> * We can't properly handle unaligned direct I/O to reflink files yet,
> @@ -578,15 +596,29 @@ xfs_file_dio_write_unaligned(
> goto out_unlock;
>
> /*
> - * If we are doing unaligned I/O, this must be the only I/O in-flight.
> - * Otherwise we risk data corruption due to unwritten extent conversions
> - * from the AIO end_io handler. Wait for all other I/O to drain first.
> + * If we are doing exclusive unaligned I/O, this must be the only I/O
> + * in-flight. Otherwise we risk data corruption due to unwritten extent
> + * conversions from the AIO end_io handler. Wait for all other I/O to
> + * drain first.
> */
> - inode_dio_wait(VFS_I(ip));
> + if (flags & IOMAP_DIO_FORCE_WAIT)
> + inode_dio_wait(VFS_I(ip));
>
> trace_xfs_file_direct_write(iocb, from);
> ret = iomap_dio_rw(iocb, from, &xfs_direct_write_iomap_ops,
> - &xfs_dio_write_ops, IOMAP_DIO_FORCE_WAIT);
> + &xfs_dio_write_ops, flags);
> +
> + /*
> + * Retry unaligned I/O with exclusive blocking semantics if the DIO
> + * layer rejected it for mapping or locking reasons. If we are doing
> + * nonblocking user I/O, propagate the error.
> + */
> + if (ret == -EAGAIN && !(iocb->ki_flags & IOCB_NOWAIT)) {
> + ASSERT(flags & IOMAP_DIO_OVERWRITE_ONLY);
> + xfs_iunlock(ip, iolock);
> + goto retry_exclusive;
> + }
> +
> out_unlock:
> if (iolock)
> xfs_iunlock(ip, iolock);
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 7b9ff824e82d48..ef76f775fabf11 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -784,15 +784,28 @@ xfs_direct_write_iomap_begin(
> goto allocate_blocks;
>
> /*
> - * NOWAIT IO needs to span the entire requested IO with a single map so
> - * that we avoid partial IO failures due to the rest of the IO range not
> - * covered by this map triggering an EAGAIN condition when it is
> - * subsequently mapped and aborting the IO.
> + * NOWAIT and OVERWRITE I/O needs to span the entire requested I/O with
> + * a single map so that we avoid partial IO failures due to the rest of
> + * the I/O range not covered by this map triggering an EAGAIN condition
> + * when it is subsequently mapped and aborting the I/O.
> */
> - if ((flags & IOMAP_NOWAIT) &&
> - !imap_spans_range(&imap, offset_fsb, end_fsb)) {
> + if (flags & (IOMAP_NOWAIT | IOMAP_OVERWRITE_ONLY)) {
> error = -EAGAIN;
> - goto out_unlock;
> + if (!imap_spans_range(&imap, offset_fsb, end_fsb))
> + goto out_unlock;
> + }
> +
> + /*
> + * For overwrite only I/O, we cannot convert unwritten extents without
> + * requiring sub-block zeroing. This can only be done under an
> + * exclusive IOLOCK, hence return -EAGAIN if this is not a written
> + * extent to tell the caller to try again.
> + */
> + if (flags & IOMAP_OVERWRITE_ONLY) {
> + error = -EAGAIN;
> + if (imap.br_state != XFS_EXT_NORM &&
> + ((offset | length) & mp->m_blockmask))
> + goto out_unlock;
> }
>
> xfs_iunlock(ip, lockmode);
> @@ -801,7 +814,7 @@ xfs_direct_write_iomap_begin(
>
> allocate_blocks:
> error = -EAGAIN;
> - if (flags & IOMAP_NOWAIT)
> + if (flags & (IOMAP_NOWAIT | IOMAP_OVERWRITE_ONLY))
> goto out_unlock;
>
> /*
> --
> 2.29.2
>
next prev parent reply other threads:[~2021-01-22 18:17 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-22 16:20 reduce sub-block DIO serialisation v4 Christoph Hellwig
2021-01-22 16:20 ` [PATCH 01/11] xfs: factor out a xfs_ilock_iocb helper Christoph Hellwig
2021-01-22 16:20 ` [PATCH 02/11] xfs: make xfs_file_aio_write_checks IOCB_NOWAIT-aware Christoph Hellwig
2021-01-22 16:20 ` [PATCH 03/11] xfs: cleanup the read/write helper naming Christoph Hellwig
2021-01-22 16:20 ` [PATCH 04/11] xfs: remove the buffered I/O fallback assert Christoph Hellwig
2021-01-22 16:20 ` [PATCH 05/11] xfs: simplify the read/write tracepoints Christoph Hellwig
2021-01-22 16:20 ` [PATCH 06/11] xfs: improve the reflink_bounce_dio_write tracepoint Christoph Hellwig
2021-01-22 16:20 ` [PATCH 07/11] xfs: split the unaligned DIO write code out Christoph Hellwig
2021-01-22 16:20 ` [PATCH 08/11] iomap: rename the flags variable in __iomap_dio_rw Christoph Hellwig
2021-01-22 16:20 ` [PATCH 09/11] iomap: pass a flags argument to iomap_dio_rw Christoph Hellwig
2021-01-22 16:20 ` [PATCH 10/11] iomap: add a IOMAP_DIO_OVERWRITE_ONLY flag Christoph Hellwig
2021-01-22 16:20 ` [PATCH 11/11] xfs: reduce exclusive locking on unaligned dio Christoph Hellwig
2021-01-22 17:24 ` Darrick J. Wong [this message]
2021-01-23 18:57 ` reduce sub-block DIO serialisation v4 Darrick J. Wong
2021-02-01 16:58 ` Christoph Hellwig
2021-02-01 17:51 ` Darrick J. Wong
-- strict thread matches above, loose matches on Subject: below --
2021-01-21 8:58 reduce sub-block DIO serialisation v3 Christoph Hellwig
2021-01-21 8:59 ` [PATCH 11/11] xfs: reduce exclusive locking on unaligned dio Christoph Hellwig
2021-01-21 9:35 ` Dave Chinner
2021-01-22 10:20 ` Christoph Hellwig
2021-01-21 13:33 ` Brian Foster
2021-01-21 19:01 ` Darrick J. Wong
2021-01-18 19:35 reduce sub-block DIO serialisation v2 Christoph Hellwig
2021-01-18 19:35 ` [PATCH 11/11] xfs: reduce exclusive locking on unaligned dio Christoph Hellwig
2021-01-18 20:55 ` Dave Chinner
2021-01-20 16:36 ` Christoph Hellwig
2021-01-20 18:40 ` Darrick J. Wong
2021-01-20 18:44 ` Christoph Hellwig
2021-01-20 19:58 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210122172428.GF1282159@magnolia \
--to=djwong@kernel.org \
--cc=avi@scylladb.com \
--cc=bfoster@redhat.com \
--cc=dchinner@redhat.com \
--cc=hch@lst.de \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).