From: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
To: John Garry <john.g.garry@oracle.com>,
brauner@kernel.org, djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-kernel@vger.kernel.org, ojaswin@linux.ibm.com,
martin.petersen@oracle.com, tytso@mit.edu,
linux-ext4@vger.kernel.org, John Garry <john.g.garry@oracle.com>
Subject: Re: [PATCH v4 09/12] xfs: Add xfs_file_dio_write_atomic()
Date: Mon, 10 Mar 2025 19:09:39 +0530 [thread overview]
Message-ID: <877c4x57j8.fsf@gmail.com> (raw)
In-Reply-To: <20250303171120.2837067-10-john.g.garry@oracle.com>
John Garry <john.g.garry@oracle.com> writes:
> Add xfs_file_dio_write_atomic() for dedicated handling of atomic writes.
>
> In case of -EAGAIN being returned from iomap_dio_rw(), reissue the write
> in CoW-based atomic write mode.
>
> For CoW-based mode, ensure that we have no outstanding IOs which we
> may trample on.
>
> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
> fs/xfs/xfs_file.c | 42 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 42 insertions(+)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 51b4a43d15f3..70eb6928cf63 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -619,6 +619,46 @@ xfs_file_dio_write_aligned(
> return ret;
> }
>
> +static noinline ssize_t
> +xfs_file_dio_write_atomic(
> + struct xfs_inode *ip,
> + struct kiocb *iocb,
> + struct iov_iter *from)
> +{
> + unsigned int iolock = XFS_IOLOCK_SHARED;
> + unsigned int dio_flags = 0;
> + ssize_t ret;
> +
> +retry:
> + ret = xfs_ilock_iocb_for_write(iocb, &iolock);
> + if (ret)
> + return ret;
> +
> + ret = xfs_file_write_checks(iocb, from, &iolock);
> + if (ret)
> + goto out_unlock;
> +
> + if (dio_flags & IOMAP_DIO_FORCE_WAIT)
> + inode_dio_wait(VFS_I(ip));
> +
> + trace_xfs_file_direct_write(iocb, from);
> + ret = iomap_dio_rw(iocb, from, &xfs_atomic_write_iomap_ops,
> + &xfs_dio_write_ops, dio_flags, NULL, 0);
> +
> + if (ret == -EAGAIN && !(iocb->ki_flags & IOCB_NOWAIT) &&
> + !(dio_flags & IOMAP_DIO_ATOMIC_SW)) {
> + xfs_iunlock(ip, iolock);
> + dio_flags = IOMAP_DIO_ATOMIC_SW | IOMAP_DIO_FORCE_WAIT;
> + iolock = XFS_IOLOCK_EXCL;
> + goto retry;
> + }
IIUC typically filesystems can now implement support for IOMAP_ATOMIC_SW
as a fallback mechanism, by returning -EAGAIN error during
IOMAP_ATOMIC_HW handling from their ->iomap_begin() routine. They can
then retry the entire DIO operation of iomap_dio_rw() by passing
IOMAP_DIO_ATOMIC_SW flag in their dio_flags argument and handle
IOMAP_ATOMIC_SW fallback differently in their ->iomap_begin() routine.
However, -EAGAIN can also be returned when there is a race with mmap
writes for the same range. We return the same -EAGAIN error during page
cache invalidation failure for IOCB_ATOMIC writes too. However, current
code does not differentiate between these two types of failures. Therefore,
we always retry by falling back to SW CoW based atomic write even for
page cache invalidation failures.
__iomap_dio_rw()
{
<...>
/*
* Try to invalidate cache pages for the range we are writing.
* If this invalidation fails, let the caller fall back to
* buffered I/O.
*/
ret = kiocb_invalidate_pages(iocb, iomi.len);
if (ret) {
if (ret != -EAGAIN) {
trace_iomap_dio_invalidate_fail(inode, iomi.pos,
iomi.len);
if (iocb->ki_flags & IOCB_ATOMIC) {
/*
* folio invalidation failed, maybe
* this is transient, unlock and see if
* the caller tries again.
*/
ret = -EAGAIN;
} else {
/* fall back to buffered write */
ret = -ENOTBLK;
}
}
goto out_free_dio;
}
<...>
}
It's easy to miss such error handling conditions. If this is something
which was already discussed earlier, then perhaps it is better if
document this. BTW - Is this something that we already know of and has
been kept as such intentionally?
-ritesh
> +
> +out_unlock:
> + if (iolock)
> + xfs_iunlock(ip, iolock);
> + return ret;
> +}
> +
> /*
> * Handle block unaligned direct I/O writes
> *
> @@ -723,6 +763,8 @@ xfs_file_dio_write(
> return -EINVAL;
> if ((iocb->ki_pos | count) & ip->i_mount->m_blockmask)
> return xfs_file_dio_write_unaligned(ip, iocb, from);
> + if (iocb->ki_flags & IOCB_ATOMIC)
> + return xfs_file_dio_write_atomic(ip, iocb, from);
> return xfs_file_dio_write_aligned(ip, iocb, from);
> }
>
> --
> 2.31.1
next prev parent reply other threads:[~2025-03-10 14:05 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-03 17:11 [PATCH v4 00/12] large atomic writes for xfs with CoW John Garry
2025-03-03 17:11 ` [PATCH v4 01/12] xfs: Pass flags to xfs_reflink_allocate_cow() John Garry
2025-03-03 17:11 ` [PATCH v4 02/12] iomap: Rename IOMAP_ATOMIC -> IOMAP_ATOMIC_HW John Garry
2025-03-05 12:57 ` Carlos Maiolino
2025-03-06 8:50 ` Christian Brauner
2025-03-03 17:11 ` [PATCH v4 03/12] xfs: Switch atomic write size check in xfs_file_write_iter() John Garry
2025-03-03 17:11 ` [PATCH v4 04/12] xfs: Refactor xfs_reflink_end_cow_extent() John Garry
2025-03-03 17:11 ` [PATCH v4 05/12] iomap: Support SW-based atomic writes John Garry
2025-03-09 21:51 ` Dave Chinner
2025-03-10 10:44 ` John Garry
2025-03-10 17:21 ` Darrick J. Wong
2025-03-03 17:11 ` [PATCH v4 06/12] iomap: Lift blocksize restriction on " John Garry
2025-03-03 17:11 ` [PATCH v4 07/12] xfs: Reflink CoW-based atomic write support John Garry
2025-03-03 17:11 ` [PATCH v4 08/12] xfs: Iomap SW-based " John Garry
2025-03-03 17:11 ` [PATCH v4 09/12] xfs: Add xfs_file_dio_write_atomic() John Garry
2025-03-10 13:39 ` Ritesh Harjani [this message]
2025-03-10 15:24 ` John Garry
2025-03-10 17:25 ` Darrick J. Wong
2025-03-03 17:11 ` [PATCH v4 10/12] xfs: Commit CoW-based atomic writes atomically John Garry
2025-03-03 17:11 ` [PATCH v4 11/12] xfs: Update atomic write max size John Garry
2025-03-10 10:06 ` Carlos Maiolino
2025-03-10 10:54 ` John Garry
2025-03-10 11:11 ` Carlos Maiolino
2025-03-10 11:20 ` John Garry
2025-03-10 12:38 ` Carlos Maiolino
2025-03-10 12:59 ` John Garry
2025-03-03 17:11 ` [PATCH v4 12/12] xfs: Allow block allocator to take an alignment hint John Garry
2025-03-09 22:03 ` Dave Chinner
2025-03-10 12:10 ` John Garry
2025-03-12 19:51 ` Dave Chinner
2025-03-12 21:00 ` John Garry
2025-03-06 8:47 ` (subset) [PATCH v4 00/12] large atomic writes for xfs with CoW Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=877c4x57j8.fsf@gmail.com \
--to=ritesh.list@gmail.com \
--cc=brauner@kernel.org \
--cc=cem@kernel.org \
--cc=djwong@kernel.org \
--cc=john.g.garry@oracle.com \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=ojaswin@linux.ibm.com \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox