linux-fsdevel.vger.kernel.org archive mirror
From: "Darrick J. Wong" <djwong@kernel.org>
To: Christoph Hellwig <hch@lst.de>
Cc: Chandan Babu R <chandan.babu@oracle.com>,
	Christian Brauner <brauner@kernel.org>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 02/10] iomap: remove iomap_file_buffered_write_punch_delalloc
Date: Mon, 23 Sep 2024 09:18:25 -0700
Message-ID: <20240923161825.GE21877@frogsfrogsfrogs>
In-Reply-To: <20240923152904.1747117-3-hch@lst.de>

On Mon, Sep 23, 2024 at 05:28:16PM +0200, Christoph Hellwig wrote:
> Currently iomap_file_buffered_write_punch_delalloc can be called from
> XFS either with the invalidate lock held or not.  To fix this while
> keeping the locking in the file system and not the iomap library
> code we'll need to lift the locking up into the file system.
> 
> To prepare for that, open code iomap_file_buffered_write_punch_delalloc
> in the only caller, and instead export iomap_write_delalloc_release.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  .../filesystems/iomap/operations.rst          |  2 +-
>  fs/iomap/buffered-io.c                        | 85 ++++++-------------
>  fs/xfs/xfs_iomap.c                            | 16 +++-
>  include/linux/iomap.h                         |  6 +-
>  4 files changed, 46 insertions(+), 63 deletions(-)
> 
> diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
> index 8e6c721d233010..b93115ab8748ae 100644
> --- a/Documentation/filesystems/iomap/operations.rst
> +++ b/Documentation/filesystems/iomap/operations.rst
> @@ -208,7 +208,7 @@ The filesystem must arrange to `cancel
>  such `reservations
>  <https://lore.kernel.org/linux-xfs/20220817093627.GZ3600936@dread.disaster.area/>`_
>  because writeback will not consume the reservation.
> -The ``iomap_file_buffered_write_punch_delalloc`` can be called from a
> +The ``iomap_write_delalloc_release`` can be called from a
>  ``->iomap_end`` function to find all the clean areas of the folios
>  caching a fresh (``IOMAP_F_NEW``) delalloc mapping.
>  It takes the ``invalidate_lock``.
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index 884891ac7a226c..237aeb883166df 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1149,6 +1149,32 @@ static void iomap_write_delalloc_scan(struct inode *inode,
>   * have dirty data still pending in the page cache - those are going to be
>   * written and so must still retain the delalloc backing for writeback.
>   *
> + * When a short write occurs, the filesystem may need to remove reserved space
> + * that was allocated in ->iomap_begin from its ->iomap_end method. For

"When a short write occurs, the filesystem may need to remove space
reservations created in ->iomap_begin."

> + * filesystems that use delayed allocation, we need to punch out delalloc
> + * extents from the range that are not dirty in the page cache. As the write can
> + * race with page faults, there can be dirty pages over the delalloc extent
> + * outside the range of a short write but still within the delalloc extent
> + * allocated for this iomap.
> + *
> + * The punch() callback *must* only punch delalloc extents in the range passed
> + * to it. It must skip over all other types of extents in the range and leave
> + * them completely unchanged. It must do this punch atomically with respect to
> + * other extent modifications.
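To make the punch() contract above concrete, here is a small userspace model of a conforming callback. Everything in it (struct ext_map, toy_punch_delalloc, the EXT_* types) is hypothetical and is not XFS's xfs_buffered_write_delalloc_punch: it only trims or splits delalloc extents that overlap [off, off + len) and copies every other extent through untouched. The "atomic with respect to other extent modifications" requirement is trivial in this single-threaded sketch; per the lock order in the comment, a real filesystem would get it from its internal allocation lock.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical extent types for this model; not the kernel's IOMAP_* values. */
enum ext_type { EXT_DELALLOC, EXT_WRITTEN, EXT_UNWRITTEN };

struct ext {
	int64_t start;		/* byte offset of the extent */
	int64_t len;		/* length in bytes */
	enum ext_type type;
};

#define MAX_EXTS 16

/* A sorted, non-overlapping list of extents for one file. */
struct ext_map {
	struct ext e[MAX_EXTS];
	int nr;
};

static void ext_push(struct ext_map *m, struct ext x)
{
	assert(m->nr < MAX_EXTS);
	m->e[m->nr++] = x;
}

/*
 * Model punch() callback: remove only the delalloc parts of
 * [off, off + len).  Written and unwritten extents, and anything
 * outside the range, are copied through unchanged; a delalloc extent
 * straddling an edge is trimmed, or split in two if it covers both.
 */
static void toy_punch_delalloc(struct ext_map *map, int64_t off, int64_t len)
{
	struct ext_map out = { .nr = 0 };
	int64_t end = off + len;

	for (int i = 0; i < map->nr; i++) {
		struct ext x = map->e[i];
		int64_t xend = x.start + x.len;

		if (x.type != EXT_DELALLOC || xend <= off || x.start >= end) {
			ext_push(&out, x);	/* not ours to touch */
			continue;
		}
		if (x.start < off)	/* keep the head before the range */
			ext_push(&out, (struct ext){ x.start, off - x.start, EXT_DELALLOC });
		if (xend > end)		/* keep the tail after the range */
			ext_push(&out, (struct ext){ end, xend - end, EXT_DELALLOC });
	}
	*map = out;	/* publish the new map in one step */
}
```

Punching [6144, 10240) out of a map of written [0, 4096), delalloc [4096, 12288) and unwritten [12288, 16384) leaves the written and unwritten extents alone and splits the delalloc extent into [4096, 6144) and [10240, 12288).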

Can a failing buffered write race with a write fault to the same file
range?

write() thread:			page_mkwrite thread:
---------------			--------------------
take i_rwsem
->iomap_begin
create da reservation
lock folio
fail to write
unlock folio
				take invalidation lock
				lock folio
				->iomap_begin
				sees da reservation
				mark folio dirty
				unlock folio
				drop invalidation lock
->iomap_end
take invalidation lock
iomap_write_delalloc_release
drop invalidation lock

Can we end up in this situation, where the write fault thinks it has a
dirty page backed by a delalloc reservation, yet the delalloc punch
logic removes that reservation?  I think the answer is no: the write
fault dirties the folio, so iomap_write_delalloc_release sees dirty
data there and leaves that part of the reservation alone, correct?
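A toy model of the outcome described above, assuming the fault has already dirtied the folio by the time the release scan runs (the ordering the invalidation lock imposes in the diagram). Per-block booleans stand in for the page cache and the delalloc extent map; the real code walks folios and finds dirty ranges with mapping_seek_hole_data(), and toy_delalloc_release()/toy_punch() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define NBLK 8

/* Hypothetical per-block state; not kernel data structures. */
struct toy_file {
	bool delalloc[NBLK];	/* block has a delalloc reservation */
	bool dirty[NBLK];	/* page cache holds dirty data for the block */
};

/* Drop the reservation for the clean blocks [first, last). */
static void toy_punch(struct toy_file *f, int first, int last)
{
	for (int b = first; b < last; b++)
		f->delalloc[b] = false;
}

/*
 * Model of the release step: walk [start, end) in blocks and punch every
 * maximal run of clean blocks, leaving dirty blocks (and therefore their
 * reservations) alone.  Returns the number of runs punched.
 */
static int toy_delalloc_release(struct toy_file *f, int start, int end)
{
	int runs = 0;
	int b = start;

	while (b < end) {
		/* skip dirty data: writeback still needs its reservation */
		while (b < end && f->dirty[b])
			b++;
		int first = b;
		while (b < end && !f->dirty[b])
			b++;
		if (b > first) {
			toy_punch(f, first, b);
			runs++;
		}
	}
	return runs;
}
```

Replaying the diagram: write() reserves blocks 0-3 and fails without dirtying anything, the fault dirties block 2, and the release then punches [0, 2) and [3, 4) but keeps the reservation under block 2.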

Unrelated question about iomap_write_begin: Can we get rid of the
!mapping_large_folio_support if-body just prior to __iomap_get_folio?
filemap_get_folio won't return large folios if
!mapping_large_folio_support, so I think the separate check in iomap
isn't needed anymore?
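A worked check of the reasoning behind this question, under two assumptions: that without large folio support the page cache only hands back order-0 (PAGE_SIZE) folios, and that iomap_write_begin also trims len to the end of the folio it gets back. If both hold, the page-boundary clamp and the folio-boundary clamp always agree. TOY_PAGE_SIZE, clamp_to_page() and clamp_to_folio() are hypothetical helpers for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u

/* Explicit clamp: never let [pos, pos + len) cross a page boundary. */
static size_t clamp_to_page(int64_t pos, size_t len)
{
	size_t room = TOY_PAGE_SIZE - (size_t)(pos % TOY_PAGE_SIZE);

	return len < room ? len : room;
}

/* Generic clamp: never cross the end of the folio that contains pos. */
static size_t clamp_to_folio(int64_t pos, size_t len, size_t folio_size)
{
	int64_t fsize = (int64_t)folio_size;
	int64_t folio_start = pos - pos % fsize;	/* folios are naturally aligned */
	int64_t folio_end = folio_start + fsize;

	assert(folio_size >= TOY_PAGE_SIZE && (folio_size & (folio_size - 1)) == 0);
	if (pos + (int64_t)len > folio_end)
		len = (size_t)(folio_end - pos);
	return len;
}
```

With order-0 folios both clamps cut a 1000-byte write at offset 4000 down to 96 bytes; only a 16 KiB folio lets the whole 1000 bytes through.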

This push-down looks fine though, so
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> + *
> + * The punch() callback may be called with a folio locked to prevent writeback
> + * extent allocation racing at the edge of the range we are currently punching.
> + * The locked folio may or may not cover the range being punched, so it is not
> + * safe for the punch() callback to lock folios itself.
> + *
> + * Lock order is:
> + *
> + * inode->i_rwsem (shared or exclusive)
> + *   inode->i_mapping->invalidate_lock (exclusive)
> + *     folio_lock()
> + *       ->punch
> + *         internal filesystem allocation lock
> + *
>   * As we are scanning the page cache for data, we don't need to reimplement the
>   * wheel - mapping_seek_hole_data() does exactly what we need to identify the
>   * start and end of data ranges correctly even for sub-folio block sizes. This
> @@ -1177,7 +1203,7 @@ static void iomap_write_delalloc_scan(struct inode *inode,
>   * require sprinkling this code with magic "+ 1" and "- 1" arithmetic and expose
>   * the code to subtle off-by-one bugs....
>   */
> -static void iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
> +void iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
>  		loff_t end_byte, unsigned flags, struct iomap *iomap,
>  		iomap_punch_t punch)
>  {
> @@ -1243,62 +1269,7 @@ static void iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
>  out_unlock:
>  	filemap_invalidate_unlock(inode->i_mapping);
>  }
> -
> -/*
> - * When a short write occurs, the filesystem may need to remove reserved space
> - * that was allocated in ->iomap_begin from it's ->iomap_end method. For
> - * filesystems that use delayed allocation, we need to punch out delalloc
> - * extents from the range that are not dirty in the page cache. As the write can
> - * race with page faults, there can be dirty pages over the delalloc extent
> - * outside the range of a short write but still within the delalloc extent
> - * allocated for this iomap.
> - *
> - * This function uses [start_byte, end_byte) intervals (i.e. open ended) to
> - * simplify range iterations.
> - *
> - * The punch() callback *must* only punch delalloc extents in the range passed
> - * to it. It must skip over all other types of extents in the range and leave
> - * them completely unchanged. It must do this punch atomically with respect to
> - * other extent modifications.
> - *
> - * The punch() callback may be called with a folio locked to prevent writeback
> - * extent allocation racing at the edge of the range we are currently punching.
> - * The locked folio may or may not cover the range being punched, so it is not
> - * safe for the punch() callback to lock folios itself.
> - *
> - * Lock order is:
> - *
> - * inode->i_rwsem (shared or exclusive)
> - *   inode->i_mapping->invalidate_lock (exclusive)
> - *     folio_lock()
> - *       ->punch
> - *         internal filesystem allocation lock
> - */
> -void iomap_file_buffered_write_punch_delalloc(struct inode *inode,
> -		loff_t pos, loff_t length, ssize_t written, unsigned flags,
> -		struct iomap *iomap, iomap_punch_t punch)
> -{
> -	loff_t			start_byte;
> -	loff_t			end_byte;
> -
> -	if (iomap->type != IOMAP_DELALLOC)
> -		return;
> -
> -	/* If we didn't reserve the blocks, we're not allowed to punch them. */
> -	if (!(iomap->flags & IOMAP_F_NEW))
> -		return;
> -
> -	start_byte = iomap_last_written_block(inode, pos, written);
> -	end_byte = round_up(pos + length, i_blocksize(inode));
> -
> -	/* Nothing to do if we've written the entire delalloc extent */
> -	if (start_byte >= end_byte)
> -		return;
> -
> -	iomap_write_delalloc_release(inode, start_byte, end_byte, flags, iomap,
> -			punch);
> -}
> -EXPORT_SYMBOL_GPL(iomap_file_buffered_write_punch_delalloc);
> +EXPORT_SYMBOL_GPL(iomap_write_delalloc_release);
>  
>  static loff_t iomap_unshare_iter(struct iomap_iter *iter)
>  {
> diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> index 1e11f48814c0d0..30f2530b6d5461 100644
> --- a/fs/xfs/xfs_iomap.c
> +++ b/fs/xfs/xfs_iomap.c
> @@ -1227,8 +1227,20 @@ xfs_buffered_write_iomap_end(
>  	unsigned		flags,
>  	struct iomap		*iomap)
>  {
> -	iomap_file_buffered_write_punch_delalloc(inode, offset, length, written,
> -			flags, iomap, &xfs_buffered_write_delalloc_punch);
> +	loff_t			start_byte, end_byte;
> +
> +	/* If we didn't reserve the blocks, we're not allowed to punch them. */
> +	if (iomap->type != IOMAP_DELALLOC || !(iomap->flags & IOMAP_F_NEW))
> +		return 0;
> +
> +	/* Nothing to do if we've written the entire delalloc extent */
> +	start_byte = iomap_last_written_block(inode, offset, written);
> +	end_byte = round_up(offset + length, i_blocksize(inode));
> +	if (start_byte >= end_byte)
> +		return 0;
> +
> +	iomap_write_delalloc_release(inode, start_byte, end_byte, flags, iomap,
> +			xfs_buffered_write_delalloc_punch);
>  	return 0;
>  }
>  
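The range arithmetic now open coded in xfs_buffered_write_iomap_end() can be checked in isolation. This sketch assumes iomap_last_written_block() (added in patch 01, not shown here) behaves like the earlier open-coded logic: round pos + written up to a block boundary, or round pos down when nothing was written. punch_range(), rdown() and rup() are hypothetical names, and the two booleans stand in for the IOMAP_DELALLOC type and IOMAP_F_NEW flag checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Power-of-two rounding, equivalent to the kernel's round_down()/round_up(). */
static int64_t rdown(int64_t x, int64_t bs) { return x & ~(bs - 1); }
static int64_t rup(int64_t x, int64_t bs) { return (x + bs - 1) & ~(bs - 1); }

/*
 * Assumed behaviour of iomap_last_written_block(): the first block after
 * a short write, or the block containing pos if nothing was written.
 */
static int64_t last_written_block(int64_t pos, int64_t written, int64_t bs)
{
	if (written == 0)
		return rdown(pos, bs);
	return rup(pos + written, bs);
}

/*
 * Mirrors the checks in xfs_buffered_write_iomap_end(): returns true and
 * sets [*start, *end) if there is an unwritten tail to punch.
 */
static bool punch_range(bool is_delalloc, bool is_new, int64_t offset,
			int64_t length, int64_t written, int64_t bs,
			int64_t *start, int64_t *end)
{
	assert(bs > 0 && (bs & (bs - 1)) == 0);
	/* only punch delalloc blocks this call reserved (IOMAP_F_NEW) */
	if (!is_delalloc || !is_new)
		return false;
	*start = last_written_block(offset, written, bs);
	*end = rup(offset + length, bs);
	return *start < *end;
}
```

For example, a 16 KiB write at offset 0 with 4 KiB blocks that stops after 5000 bytes keeps blocks 0 and 1 (block 1 is partly written) and hands [8192, 16384) to iomap_write_delalloc_release.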
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index e62df5d93f04de..137e0783faa224 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -290,9 +290,9 @@ vm_fault_t iomap_page_mkwrite(struct vm_fault *vmf,
>  
>  typedef void (*iomap_punch_t)(struct inode *inode, loff_t offset, loff_t length,
>  		struct iomap *iomap);
> -void iomap_file_buffered_write_punch_delalloc(struct inode *inode, loff_t pos,
> -		loff_t length, ssize_t written, unsigned flag,
> -		struct iomap *iomap, iomap_punch_t punch);
> +void iomap_write_delalloc_release(struct inode *inode, loff_t start_byte,
> +		loff_t end_byte, unsigned flags, struct iomap *iomap,
> +		iomap_punch_t punch);
>  
>  int iomap_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
>  		u64 start, u64 len, const struct iomap_ops *ops);
> -- 
> 2.45.2
> 
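The lock order documented in the new comment can be expressed as a rank check in the spirit of lockdep (a simplified stand-in, not lockdep's actual algorithm). Locks are ranked outermost first; taking a lock while already holding it or any lock ranked after it is reported as a violation. The TOY_* names are hypothetical, and the model does not distinguish shared from exclusive i_rwsem.

```c
#include <assert.h>
#include <stdbool.h>

/* Ranks follow the documented order, outermost first. */
enum toy_lock {
	TOY_I_RWSEM,		/* inode->i_rwsem (shared or exclusive) */
	TOY_INVALIDATE_LOCK,	/* inode->i_mapping->invalidate_lock */
	TOY_FOLIO_LOCK,		/* folio_lock() */
	TOY_FS_ALLOC_LOCK,	/* internal fs allocation lock, taken in ->punch */
	TOY_NR_LOCKS
};

struct toy_held {
	bool held[TOY_NR_LOCKS];
};

/* Take @l; returns false (and takes nothing) if that breaks the order. */
static bool toy_lock(struct toy_held *h, enum toy_lock l)
{
	for (int i = l; i < TOY_NR_LOCKS; i++)
		if (h->held[i])
			return false;	/* already hold this or an inner lock */
	h->held[l] = true;
	return true;
}

static void toy_unlock(struct toy_held *h, enum toy_lock l)
{
	assert(h->held[l]);
	h->held[l] = false;
}
```

The check also captures the rule that the punch() callback must not lock folios itself: with the folio lock already held by the caller, a second folio lock attempt is rejected.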
