Linux kernel -stable discussions
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: patches@lists.linux.dev, stable@vger.kernel.org
Cc: Filipe Manana <fdmanana@suse.com>, Qu Wenruo <wqu@suse.com>,
	David Sterba <dsterba@suse.com>, Sasha Levin <sashal@kernel.org>,
	clm@fb.com, josef@toxicpanda.com, linux-btrfs@vger.kernel.org
Subject: [PATCH AUTOSEL 6.16 10/35] btrfs: fix -ENOSPC mmap write failure on NOCOW files/extents
Date: Sun,  3 Aug 2025 17:17:10 -0400	[thread overview]
Message-ID: <20250803211736.3545028-10-sashal@kernel.org> (raw)
In-Reply-To: <20250803211736.3545028-1-sashal@kernel.org>

From: Filipe Manana <fdmanana@suse.com>

[ Upstream commit 6599716de2d68ab7b3c0429221deca306dbda878 ]

If we attempt a mmap write into a NOCOW file or a prealloc extent when
there is no more available data space (or unallocated space to allocate a
new data block group) and we can do a NOCOW write (there are no reflinks
for the target extent or snapshots), we always fail due to -ENOSPC, unlike
for the regular buffered write and direct IO paths where we check that we
can do a NOCOW write in case we can't reserve data space.

Simple reproducer:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/sdi
  MNT=/mnt/sdi

  umount $DEV &> /dev/null
  mkfs.btrfs -f -b $((512 * 1024 * 1024)) $DEV
  mount $DEV $MNT

  touch $MNT/foobar
  # Make it a NOCOW file.
  chattr +C $MNT/foobar

  # Add initial data to file.
  xfs_io -c "pwrite -S 0xab 0 1M" $MNT/foobar

  # Fill all the remaining data space and unallocated space with data.
  dd if=/dev/zero of=$MNT/filler bs=4K &> /dev/null

  # Overwrite the file with a mmap write. Should succeed.
  xfs_io -c "mmap -w 0 1M"        \
         -c "mwrite -S 0xcd 0 1M" \
         -c "munmap"              \
         $MNT/foobar

  # Unmount, mount again and verify the new data was persisted.
  umount $MNT
  mount $DEV $MNT

  od -A d -t x1 $MNT/foobar

  umount $MNT

Running this:

  $ ./test.sh
  (...)
  wrote 1048576/1048576 bytes at offset 0
  1 MiB, 256 ops; 0.0008 sec (1.188 GiB/sec and 311435.5231 ops/sec)
  ./test.sh: line 24: 234865 Bus error               xfs_io -c "mmap -w 0 1M" -c "mwrite -S 0xcd 0 1M" -c "munmap" $MNT/foobar
  0000000 ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab
  *
  1048576

Fix this by not failing in case we can't allocate data space and we can
NOCOW into the target extent - reserving only metadata space in this case.

After this change the test passes:

  $ ./test.sh
  (...)
  wrote 1048576/1048576 bytes at offset 0
  1 MiB, 256 ops; 0.0007 sec (1.262 GiB/sec and 330749.3540 ops/sec)
  0000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd
  *
  1048576

A test case for fstests will be added soon.

Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---

LLM Generated explanations, may be completely bogus:

Based on my analysis of this commit, here is my determination:

**Backport Status: YES**

This commit should be backported to stable kernel trees for the
following reasons:

1. **Fixes a real user-visible bug**: The commit fixes a clear
   functional regression where mmap writes to NOCOW files fail with
   -ENOSPC even when they shouldn't. The bug causes legitimate write
   operations to fail when the filesystem is nearly full, even though
   NOCOW writes should succeed without requiring new data space
   allocation.

2. **Contains a simple reproducer**: The commit message includes a clear
   test case that demonstrates the problem, showing it's a reproducible
   issue that users can encounter in normal usage scenarios.

3. **Limited scope and risk**: The fix is confined to the
   `btrfs_page_mkwrite()` function in fs/btrfs/file.c. It doesn't touch
   multiple subsystems or introduce architectural changes. The
   modification follows the same pattern already used in buffered write
   and direct IO paths.

4. **Follows existing patterns**: The fix applies the same NOCOW
   handling logic that already exists for regular buffered writes and
   direct IO. It's essentially fixing an inconsistency where mmap writes
   didn't have the same NOCOW fallback logic.

5. **Clear fix with minimal changes**: The patch adds proper NOCOW
   checking when data space reservation fails, similar to what's done in
   other write paths. It uses existing functions like
   `btrfs_check_nocow_lock()` and `btrfs_check_nocow_unlock()`.

6. **No new features**: This is purely a bug fix that restores expected
   behavior. It doesn't add any new functionality or APIs.


The code changes show a straightforward fix that:
- Adds `only_release_metadata` flag to track NOCOW write scenarios
- When data space reservation fails, checks if NOCOW write is possible
- If NOCOW is possible, only reserves metadata space instead of failing
- Properly releases resources in error paths based on whether it's a
  NOCOW operation

This is exactly the type of bug fix that stable kernels should receive -
it fixes a specific functional problem without introducing new risks or
complexity.

 fs/btrfs/file.c | 59 ++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 51 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 8ce6f45f45e0..0a468bbfe015 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1842,6 +1842,7 @@ static vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	loff_t size;
 	size_t fsize = folio_size(folio);
 	int ret;
+	bool only_release_metadata = false;
 	u64 reserved_space;
 	u64 page_start;
 	u64 page_end;
@@ -1862,10 +1863,34 @@ static vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	 * end up waiting indefinitely to get a lock on the page currently
 	 * being processed by btrfs_page_mkwrite() function.
 	 */
-	ret = btrfs_delalloc_reserve_space(BTRFS_I(inode), &data_reserved,
-					   page_start, reserved_space);
-	if (ret < 0)
+	ret = btrfs_check_data_free_space(BTRFS_I(inode), &data_reserved,
+					  page_start, reserved_space, false);
+	if (ret < 0) {
+		size_t write_bytes = reserved_space;
+
+		if (btrfs_check_nocow_lock(BTRFS_I(inode), page_start,
+					   &write_bytes, false) <= 0)
+			goto out_noreserve;
+
+		only_release_metadata = true;
+
+		/*
+		 * Can't write the whole range, there may be shared extents or
+		 * holes in the range, bail out with @only_release_metadata set
+		 * to true so that we unlock the nocow lock before returning the
+		 * error.
+		 */
+		if (write_bytes < reserved_space)
+			goto out_noreserve;
+	}
+	ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), reserved_space,
+					      reserved_space, false);
+	if (ret < 0) {
+		if (!only_release_metadata)
+			btrfs_free_reserved_data_space(BTRFS_I(inode), data_reserved,
+						       page_start, reserved_space);
 		goto out_noreserve;
+	}
 
 	ret = file_update_time(vmf->vma->vm_file);
 	if (ret < 0)
@@ -1906,10 +1931,16 @@ static vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	if (folio_contains(folio, (size - 1) >> PAGE_SHIFT)) {
 		reserved_space = round_up(size - page_start, fs_info->sectorsize);
 		if (reserved_space < fsize) {
+			const u64 to_free = fsize - reserved_space;
+
 			end = page_start + reserved_space - 1;
-			btrfs_delalloc_release_space(BTRFS_I(inode),
-					data_reserved, end + 1,
-					fsize - reserved_space, true);
+			if (only_release_metadata)
+				btrfs_delalloc_release_metadata(BTRFS_I(inode),
+								to_free, true);
+			else
+				btrfs_delalloc_release_space(BTRFS_I(inode),
+							     data_reserved, end + 1,
+							     to_free, true);
 		}
 	}
 
@@ -1946,10 +1977,16 @@ static vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 
 	btrfs_set_inode_last_sub_trans(BTRFS_I(inode));
 
+	if (only_release_metadata)
+		btrfs_set_extent_bit(io_tree, page_start, end, EXTENT_NORESERVE,
+				     &cached_state);
+
 	btrfs_unlock_extent(io_tree, page_start, page_end, &cached_state);
 	up_read(&BTRFS_I(inode)->i_mmap_lock);
 
 	btrfs_delalloc_release_extents(BTRFS_I(inode), fsize);
+	if (only_release_metadata)
+		btrfs_check_nocow_unlock(BTRFS_I(inode));
 	sb_end_pagefault(inode->i_sb);
 	extent_changeset_free(data_reserved);
 	return VM_FAULT_LOCKED;
@@ -1959,10 +1996,16 @@ static vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf)
 	up_read(&BTRFS_I(inode)->i_mmap_lock);
 out:
 	btrfs_delalloc_release_extents(BTRFS_I(inode), fsize);
-	btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved, page_start,
-				     reserved_space, true);
+	if (only_release_metadata)
+		btrfs_delalloc_release_metadata(BTRFS_I(inode), reserved_space, true);
+	else
+		btrfs_delalloc_release_space(BTRFS_I(inode), data_reserved,
+					     page_start, reserved_space, true);
 	extent_changeset_free(data_reserved);
 out_noreserve:
+	if (only_release_metadata)
+		btrfs_check_nocow_unlock(BTRFS_I(inode));
+
 	sb_end_pagefault(inode->i_sb);
 
 	if (ret < 0)
-- 
2.39.5


  parent reply	other threads:[~2025-08-03 21:18 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-03 21:17 [PATCH AUTOSEL 6.16 01/35] hfs: fix general protection fault in hfs_find_init() Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 02/35] hfs: fix slab-out-of-bounds in hfs_bnode_read() Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 03/35] hfsplus: fix slab-out-of-bounds in hfsplus_bnode_read() Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 04/35] hfsplus: fix slab-out-of-bounds read in hfsplus_uni2asc() Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 05/35] hfsplus: don't use BUG_ON() in hfsplus_create_attributes_file() Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 06/35] arm64: Handle KCOV __init vs inline mismatches Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 07/35] tpm: Check for completion after timeout Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 08/35] tpm: tpm_crb_ffa: try to probe tpm_crb_ffa when it's built-in Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 09/35] firmware: arm_ffa: Change initcall level of ffa_init() to rootfs_initcall Sasha Levin
2025-08-03 21:17 ` Sasha Levin [this message]
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 11/35] smb/server: avoid deadlock when linking with ReplaceIfExists Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 12/35] nvme-pci: try function level reset on init failure Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 13/35] dm-stripe: limit chunk_sectors to the stripe size Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 14/35] md/raid10: set chunk_sectors limit Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 15/35] nvme-tcp: log TLS handshake failures at error level Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 16/35] gfs2: Validate i_depth for exhash directories Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 17/35] gfs2: Set .migrate_folio in gfs2_{rgrp,meta}_aops Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 18/35] md: call del_gendisk in control path Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 19/35] loop: Avoid updating block size under exclusive owner Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 20/35] udf: Verify partition map count Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 21/35] drbd: add missing kref_get in handle_write_conflicts Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 22/35] hfs: fix not erasing deleted b-tree node issue Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 23/35] better lockdep annotations for simple_recursive_removal() Sasha Levin
2025-08-03 21:17 ` [PATCH AUTOSEL 6.16 24/35] ata: ahci: Disallow LPM policy control if not supported Sasha Levin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250803211736.3545028-10-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=clm@fb.com \
    --cc=dsterba@suse.com \
    --cc=fdmanana@suse.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=patches@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox