public inbox for linux-btrfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: fdmanana@kernel.org, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 1/3] btrfs: fix -ENOSPC mmap write failure on NOCOW files/extents
Date: Wed, 9 Jul 2025 19:32:21 +0930	[thread overview]
Message-ID: <bb1cb822-39fc-4fe9-83b3-25d44eafbc50@gmx.com> (raw)
In-Reply-To: <91b4d80da9b7938b6f7b0c628a6c0cf896f97061.1752050704.git.fdmanana@suse.com>



在 2025/7/9 18:23, fdmanana@kernel.org 写道:
> From: Filipe Manana <fdmanana@suse.com>
> 
> If we attempt a mmap write into a NOCOW file or a prealloc extent when
> there is no more available data space (or unallocated space to allocate a
> new data block group) and we can do a NOCOW write (there are no reflinks
> for the target extent or snapshots), we always fail due to -ENOSPC, unlike
> for the regular buffered write and direct IO paths where we check that we
> can do a NOCOW write in case we can't reserve data space.
> 
> Simple reproducer:
> 
>    $ cat test.sh
>    #!/bin/bash
> 
>    DEV=/dev/sdi
>    MNT=/mnt/sdi
> 
>    umount $DEV &> /dev/null
>    mkfs.btrfs -f -b $((512 * 1024 * 1024)) $DEV
>    mount $DEV $MNT
> 
>    touch $MNT/foobar
>    # Make it a NOCOW file.
>    chattr +C $MNT/foobar
> 
>    # Add initial data to file.
>    xfs_io -c "pwrite -S 0xab 0 1M" $MNT/foobar
> 
>    # Fill all the remaining data space and unallocated space with data.
>    dd if=/dev/zero of=$MNT/filler bs=4K &> /dev/null
> 
>    # Overwrite the file with a mmap write. Should succeed.
>    xfs_io -c "mmap -w 0 1M"        \
>           -c "mwrite -S 0xcd 0 1M" \
>           -c "munmap"              \
>           $MNT/foobar
> 
>    # Unmount, mount again and verify the new data was persisted.
>    umount $MNT
>    mount $DEV $MNT
> 
>    od -A d -t x1 $MNT/foobar
> 
>    umount $MNT
> 
> Running this:
> 
>    $ ./test.sh
>    (...)
>    wrote 1048576/1048576 bytes at offset 0
>    1 MiB, 256 ops; 0.0008 sec (1.188 GiB/sec and 311435.5231 ops/sec)
>    ./test.sh: line 24: 234865 Bus error               xfs_io -c "mmap -w 0 1M" -c "mwrite -S 0xcd 0 1M" -c "munmap" $MNT/foobar
>    0000000 ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
>    *
>    1048576
> 
> Fix this by not failing in case we can't allocate data space and we can
> NOCOW into the target extent - reserving only metadata space in this case.
> 
> After this change the test passes:
> 
>    $ ./test.sh
>    (...)
>    wrote 1048576/1048576 bytes at offset 0
>    1 MiB, 256 ops; 0.0007 sec (1.262 GiB/sec and 330749.3540 ops/sec)
>    0000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
>    *
>    1048576
> 
> A test case for fstests will be added soon.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

With large data folios, I'm afraid we may fail the nocow check more 
frequently than the regular page sized folios.
But that's unavoidable, we have to ensure the whole folio can be written 
back NOCOW.

Thanks,
Qu

> ---
>   fs/btrfs/file.c | 49 +++++++++++++++++++++++++++++++++++++++++--------
>   1 file changed, 41 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 05b046c6806f..f08c72dbb530 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -1841,6 +1841,7 @@ static vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
>   	loff_t size;
>   	size_t fsize = folio_size(folio);
>   	int ret;
> +	bool only_release_metadata = false;
>   	u64 reserved_space;
>   	u64 page_start;
>   	u64 page_end;
> @@ -1861,10 +1862,28 @@ static vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
>   	 * end up waiting indefinitely to get a lock on the page currently
>   	 * being processed by btrfs_page_mkwrite() function.
>   	 */
> -	ret = btrfs_delalloc_reserve_space(BTRFS_I(inode), &data_reserved,
> -					   page_start, reserved_space);
> -	if (ret < 0)
> +	ret = btrfs_check_data_free_space(BTRFS_I(inode), &data_reserved,
> +					  page_start, reserved_space, false);
> +	if (ret < 0) {
> +		size_t write_bytes = reserved_space;
> +
> +		if (btrfs_check_nocow_lock(BTRFS_I(inode), page_start,
> +					   &write_bytes, false) <= 0)
> +			goto out_noreserve;
> +
> +		if (write_bytes < reserved_space)
> +			goto out_noreserve;
> +
> +		only_release_metadata = true;
> +	}
> +	ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), reserved_space,
> +					      reserved_space, false);
> +	if (ret < 0) {
> +		if (!only_release_metadata)
> +			btrfs_free_reserved_data_space(BTRFS_I(inode), data_reserved,
> +						       page_start, reserved_space);
>   		goto out_noreserve;
> +	}
>   
>   	ret = file_update_time(vmf->vma->vm_file);
>   	if (ret < 0)
> @@ -1906,9 +1925,11 @@ static vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
>   		reserved_space = round_up(size - page_start, fs_info->sectorsize);
>   		if (reserved_space < fsize) {
>   			end = page_start + reserved_space - 1;
> -			btrfs_delalloc_release_space(BTRFS_I(inode),
> -					data_reserved, end + 1,
> -					fsize - reserved_space, true);
> +			if (!only_release_metadata)
> +				btrfs_delalloc_release_space(BTRFS_I(inode),
> +							     data_reserved, end + 1,
> +							     fsize - reserved_space,
> +							     true);
>   		}
>   	}
>   
> @@ -1945,10 +1966,16 @@ static vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
>   
>   	btrfs_set_inode_last_sub_trans(BTRFS_I(inode));
>   
> +	if (only_release_metadata)
> +		btrfs_set_extent_bit(io_tree, page_start, end, EXTENT_NORESERVE,
> +				     &cached_state);
> +
>   	btrfs_unlock_extent(io_tree, page_start, page_end, &cached_state);
>   	up_read(&BTRFS_I(inode)->i_mmap_lock);
>   
>   	btrfs_delalloc_release_extents(BTRFS_I(inode), fsize);
> +	if (only_release_metadata)
> +		btrfs_check_nocow_unlock(BTRFS_I(inode));
>   	sb_end_pagefault(inode->i_sb);
>   	extent_changeset_free(data_reserved);
>   	return VM_FAULT_LOCKED;
> @@ -1958,10 +1985,16 @@ static vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
>   	up_read(&BTRFS_I(inode)->i_mmap_lock);
>   out:
>   	btrfs_delalloc_release_extents(BTRFS_I(inode), fsize);
> -	btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved, page_start,
> -				     reserved_space, true);
> +	if (only_release_metadata)
> +		btrfs_delalloc_release_metadata(BTRFS_I(inode), reserved_space, true);
> +	else
> +		btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved,
> +					     page_start, reserved_space, true);
>   	extent_changeset_free(data_reserved);
>   out_noreserve:
> +	if (only_release_metadata)
> +		btrfs_check_nocow_unlock(BTRFS_I(inode));
> +
>   	sb_end_pagefault(inode->i_sb);
>   
>   	if (ret < 0)


  reply	other threads:[~2025-07-09 10:02 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-09  8:53 [PATCH 0/3] btrfs: fix mmap writes for NOCOW files/extents fdmanana
2025-07-09  8:53 ` [PATCH 1/3] btrfs: fix -ENOSPC mmap write failure on " fdmanana
2025-07-09 10:02   ` Qu Wenruo [this message]
2025-07-09 10:36     ` Filipe Manana
2025-07-09  8:53 ` [PATCH 2/3] btrfs: use variable for io_tree when clearing range in btrfs_page_mkwrite() fdmanana
2025-07-09 10:02   ` Qu Wenruo
2025-07-09  8:53 ` [PATCH 3/3] btrfs: use btrfs_inode local variable at btrfs_page_mkwrite() fdmanana
2025-07-09 10:04   ` Qu Wenruo
2025-07-09 10:39     ` Filipe Manana
2025-07-09 11:13 ` [PATCH v2 0/3] btrfs: fix mmap writes for NOCOW files/extents fdmanana
2025-07-09 11:13   ` [PATCH v2 1/3] btrfs: fix -ENOSPC mmap write failure on " fdmanana
2025-07-09 11:13   ` [PATCH v2 2/3] btrfs: use variable for io_tree when clearing range in btrfs_page_mkwrite() fdmanana
2025-07-09 11:13   ` [PATCH v2 3/3] btrfs: use btrfs_inode local variable at btrfs_page_mkwrite() fdmanana
2025-07-09 11:45 ` [PATCH v3 0/3] btrfs: fix mmap writes for NOCOW files/extents fdmanana
2025-07-09 11:45   ` [PATCH v3 1/3] btrfs: fix -ENOSPC mmap write failure on " fdmanana
2025-07-09 11:45   ` [PATCH v3 2/3] btrfs: use variable for io_tree when clearing range in btrfs_page_mkwrite() fdmanana
2025-07-09 11:45   ` [PATCH v3 3/3] btrfs: use btrfs_inode local variable at btrfs_page_mkwrite() fdmanana

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bb1cb822-39fc-4fe9-83b3-25d44eafbc50@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox