From: "Darrick J. Wong" <djwong@kernel.org>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 4/9] xfs,iomap: move delalloc punching to iomap
Date: Thu, 17 Nov 2022 09:57:13 -0800 [thread overview]
Message-ID: <Y3Z1+aDVRZFwDUvL@magnolia> (raw)
In-Reply-To: <20221117055810.498014-5-david@fromorbit.com>
On Thu, Nov 17, 2022 at 04:58:05PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Because that's what Christoph wants for this error handling path
> only XFS uses.
>
> It requires a new iomap export for handling errors over delalloc
> ranges. This is basically the XFS code as is stands, but even though
> Christoph wants this as iomap funcitonality, we still have
> to call it from the filesystem specific ->iomap_end callback, and
> call into the iomap code with yet another filesystem specific
> callback to punch the delalloc extent within the defined ranges.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
> fs/iomap/buffered-io.c | 60 ++++++++++++++++++++++++++++++++++++++++++
> fs/xfs/xfs_iomap.c | 47 ++++++---------------------------
> include/linux/iomap.h | 6 +++++
> 3 files changed, 74 insertions(+), 39 deletions(-)
>
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 91ee0b308e13..77f391fd90ca 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -832,6 +832,66 @@ iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *i,
> }
> EXPORT_SYMBOL_GPL(iomap_file_buffered_write);
>
> +/*
> + * When a short write occurs, the filesystem may need to remove reserved space
> + * that was allocated in ->iomap_begin from it's ->iomap_end method. For
> + * filesystems that use delayed allocation, we need to punch out delalloc
> + * extents from the range that are not dirty in the page cache. As the write can
> + * race with page faults, there can be dirty pages over the delalloc extent
> + * outside the range of a short write but still within the delalloc extent
> + * allocated for this iomap.
> + *
> + * This function uses [start_byte, end_byte) intervals (i.e. open ended) to
> + * simplify range iterations, but converts them back to {offset,len} tuples for
> + * the punch callback.
> + */
> +int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
> + struct iomap *iomap, loff_t offset, loff_t length,
Nit: loff_t pos, not 'offset', to (try to) be consistent with the rest
of the codebase.
I'll fix this on commit if you don't beat me to the punch, so
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
--D
> + ssize_t written,
> + int (*punch)(struct inode *inode, loff_t offset, loff_t length))
> +{
> + loff_t start_byte;
> + loff_t end_byte;
> + int blocksize = 1 << inode->i_blkbits;
> + int error = 0;
> +
> + if (iomap->type != IOMAP_DELALLOC)
> + return 0;
> +
> + /* If we didn't reserve the blocks, we're not allowed to punch them. */
> + if (!(iomap->flags & IOMAP_F_NEW))
> + return 0;
> +
> + /*
> + * start_byte refers to the first unused block after a short write. If
> + * nothing was written, round offset down to point at the first block in
> + * the range.
> + */
> + if (unlikely(!written))
> + start_byte = round_down(offset, blocksize);
> + else
> + start_byte = round_up(offset + written, blocksize);
> + end_byte = round_up(offset + length, blocksize);
> +
> + /* Nothing to do if we've written the entire delalloc extent */
> + if (start_byte >= end_byte)
> + return 0;
> +
> + /*
> + * Lock the mapping to avoid races with page faults re-instantiating
> + * folios and dirtying them via ->page_mkwrite between the page cache
> + * truncation and the delalloc extent removal. Failing to do this can
> + * leave dirty pages with no space reservation in the cache.
> + */
> + filemap_invalidate_lock(inode->i_mapping);
> + truncate_pagecache_range(inode, start_byte, end_byte - 1);
> + error = punch(inode, start_byte, end_byte - start_byte);
> + filemap_invalidate_unlock(inode->i_mapping);
> +
> + return error;
> +}
> +EXPORT_SYMBOL_GPL(iomap_file_buffered_write_punch_delalloc);
> +
> static loff_t iomap_unshare_iter(struct iomap_iter *iter)
> {
> struct iomap *iomap = &iter->iomap;
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 7bb55dbc19d3..ea96e8a34868 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1123,12 +1123,12 @@ xfs_buffered_write_iomap_begin(
> static int
> xfs_buffered_write_delalloc_punch(
> struct inode *inode,
> - loff_t start_byte,
> - loff_t end_byte)
> + loff_t offset,
> + loff_t length)
> {
> struct xfs_mount *mp = XFS_M(inode->i_sb);
> - xfs_fileoff_t start_fsb = XFS_B_TO_FSBT(mp, start_byte);
> - xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, end_byte);
> + xfs_fileoff_t start_fsb = XFS_B_TO_FSBT(mp, offset);
> + xfs_fileoff_t end_fsb = XFS_B_TO_FSB(mp, offset + length);
>
> return xfs_bmap_punch_delalloc_range(XFS_I(inode), start_fsb,
> end_fsb - start_fsb);
> @@ -1143,13 +1143,9 @@ xfs_buffered_write_iomap_end(
> unsigned flags,
> struct iomap *iomap)
> {
> - struct xfs_mount *mp = XFS_M(inode->i_sb);
> - loff_t start_byte;
> - loff_t end_byte;
> - int error = 0;
>
> - if (iomap->type != IOMAP_DELALLOC)
> - return 0;
> + struct xfs_mount *mp = XFS_M(inode->i_sb);
> + int error;
>
> /*
> * Behave as if the write failed if drop writes is enabled. Set the NEW
> @@ -1160,35 +1156,8 @@ xfs_buffered_write_iomap_end(
> written = 0;
> }
>
> - /* If we didn't reserve the blocks, we're not allowed to punch them. */
> - if (!(iomap->flags & IOMAP_F_NEW))
> - return 0;
> -
> - /*
> - * start_fsb refers to the first unused block after a short write. If
> - * nothing was written, round offset down to point at the first block in
> - * the range.
> - */
> - if (unlikely(!written))
> - start_byte = round_down(offset, mp->m_sb.sb_blocksize);
> - else
> - start_byte = round_up(offset + written, mp->m_sb.sb_blocksize);
> - end_byte = round_up(offset + length, mp->m_sb.sb_blocksize);
> -
> - /* Nothing to do if we've written the entire delalloc extent */
> - if (start_byte >= end_byte)
> - return 0;
> -
> - /*
> - * Lock the mapping to avoid races with page faults re-instantiating
> - * folios and dirtying them via ->page_mkwrite between the page cache
> - * truncation and the delalloc extent removal. Failing to do this can
> - * leave dirty pages with no space reservation in the cache.
> - */
> - filemap_invalidate_lock(inode->i_mapping);
> - truncate_pagecache_range(inode, start_byte, end_byte - 1);
> - error = xfs_buffered_write_delalloc_punch(inode, start_byte, end_byte);
> - filemap_invalidate_unlock(inode->i_mapping);
> + error = iomap_file_buffered_write_punch_delalloc(inode, iomap, offset,
> + length, written, &xfs_buffered_write_delalloc_punch);
> if (error && !xfs_is_shutdown(mp)) {
> xfs_alert(mp, "%s: unable to clean up ino 0x%llx",
> __func__, XFS_I(inode)->i_ino);
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index 238a03087e17..6bbed915c83a 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -226,6 +226,12 @@ static inline const struct iomap *iomap_iter_srcmap(const struct iomap_iter *i)
>
> ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from,
> const struct iomap_ops *ops);
> +int iomap_file_buffered_write_punch_delalloc(struct inode *inode,
> + struct iomap *iomap, loff_t offset, loff_t length,
> + ssize_t written,
> + int (*punch)(struct inode *inode,
> + loff_t offset, loff_t length));
> +
> int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops);
> void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops);
> bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count);
> --
> 2.37.2
>
next prev parent reply other threads:[~2022-11-17 17:57 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-17 5:58 [PATCH 0/8 v3] xfs, iomap: fix data corrupton due to stale cached iomaps Dave Chinner
2022-11-17 5:58 ` [PATCH 1/9] xfs: write page faults in iomap are not buffered writes Dave Chinner
2022-11-17 5:58 ` [PATCH 2/9] xfs: punching delalloc extents on write failure is racy Dave Chinner
2022-11-17 17:50 ` Darrick J. Wong
2022-11-17 5:58 ` [PATCH 3/9] xfs: use byte ranges for write cleanup ranges Dave Chinner
2022-11-17 5:58 ` [PATCH 4/9] xfs,iomap: move delalloc punching to iomap Dave Chinner
2022-11-17 17:57 ` Darrick J. Wong [this message]
2022-11-17 5:58 ` [PATCH 5/9] iomap: buffered write failure should not truncate the page cache Dave Chinner
2022-11-17 18:36 ` Darrick J. Wong
2022-11-17 5:58 ` [PATCH 6/9] xfs: xfs_bmap_punch_delalloc_range() should take a byte range Dave Chinner
2022-11-17 18:37 ` Darrick J. Wong
2022-11-17 5:58 ` [PATCH 7/9] iomap: write iomap validity checks Dave Chinner
2022-11-17 18:40 ` Darrick J. Wong
2022-11-17 5:58 ` [PATCH 8/9] xfs: use iomap_valid method to detect stale cached iomaps Dave Chinner
2022-11-17 18:59 ` Darrick J. Wong
2022-11-17 21:36 ` Dave Chinner
2022-11-17 5:58 ` [PATCH 9/9] xfs: drop write error injection is unfixable, remove it Dave Chinner
2022-11-17 18:01 ` Darrick J. Wong
-- strict thread matches above, loose matches on Subject: below --
2022-11-23 5:58 [PATCH 0/9 V4] xfs, iomap: fix data corrupton due to stale cached iomaps Dave Chinner
2022-11-23 5:58 ` [PATCH 4/9] xfs,iomap: move delalloc punching to iomap Dave Chinner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y3Z1+aDVRZFwDUvL@magnolia \
--to=djwong@kernel.org \
--cc=david@fromorbit.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).